Wikipedia:Bots/Requests for approval/H3llBot 11


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

H3llBot 11
Operator:

Time filed: 11:03, Friday August 23, 2013 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): C#, custom API

Source code available: No

Function overview: below

Links to relevant discussions (where appropriate): --

Edit period(s): Continuous

Estimated number of pages affected: <500 per Category:Pages with archiveurl citation errors then as they come up

Exclusion compliant (Yes/No): Y

Already has a bot flag (Yes/No): Y

Function details:

Appending H3llBot 4 (User:H3llBot/U2A):

In citations, when archiveurl or url is set to an archive service link, but the corresponding url is not set or archiveurl isn't used, set the missing fields and fill in the date if needed. H3llBot 4 already covers this for URLs and dates I can parse out of the citations. However, the majority of pages in Category:Pages with archiveurl citation errors use shorthand archive URLs, so I need to actually browse the archive pages and retrieve the URL and date.
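As an illustration of the "parse out of the citation" case, here is a minimal Python sketch (not the bot's code, which is C# and unpublished). It assumes the common Wayback Machine link shape `web.archive.org/web/<timestamp>/<original url>`; the function name `parse_wayback` and the `example.com` address are hypothetical.

```python
import re
from datetime import datetime

# Assumed link shape: a timestamp of 8 to 14 digits, an optional two-letter
# modifier such as "id_", then the original URL.
_WAYBACK = re.compile(
    r"^https?://web\.archive\.org/web/(\d{8})\d{0,6}(?:[a-z]{2}_)?/(.+)$"
)


def parse_wayback(archive_url):
    """Return (original_url, archive_date) or None if the link is not parseable."""
    match = _WAYBACK.match(archive_url)
    if match is None:
        return None
    stamp, original = match.groups()
    # Only the date part of the timestamp is kept (YYYY-MM-DD).
    date = datetime.strptime(stamp, "%Y%m%d").date().isoformat()
    return original, date


result = parse_wayback(
    "https://web.archive.org/web/20120126000000/http://example.com/page.htm"
)
```

A shorthand link carries neither value, so `parse_wayback` returns `None` for it and the archive page itself would have to be fetched.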

For example, Aaron Van Cleave has 2 errors. Citations have http://www.webcitation.org/64zXFfeH5 and http://www.webcitation.org/6B2tdaqFt links, which need browsing to get the actual values -- http://www.isuresults.com/bios/isufs00012936.htm at 2012-01-26 and http://www.isuresults.com/bios/isufs00012936.htm at 2012-09-29.
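The difference between parseable and shorthand links can be sketched with a rough heuristic: if no `http://` or `https://` appears after the archive host, the original URL is not in the markup and the page has to be browsed. This is a hedged Python illustration, not the bot's per-site logic; `needs_fetch` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit, unquote


def needs_fetch(archive_url):
    """Return True when the link embeds no original URL after the host."""
    parts = urlsplit(archive_url)
    tail = parts.path
    if parts.query:
        tail += "?" + parts.query
    # Decode percent-escapes so an encoded "http%3A%2F%2F" is also detected.
    return re.search(r"https?://", unquote(tail)) is None


shorthand = needs_fetch("http://www.webcitation.org/64zXFfeH5")
long_form = needs_fetch(
    "https://web.archive.org/web/20120126000000/http://example.com/"
)
```

Here `shorthand` is `True` and `long_form` is `False`. A real implementation would still need a site-specific check per provider, since the heuristic says nothing about where the archive date is stored.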

I feel this is different enough in technology (actually reliably browsing the websites, editors can't tell the url/date from markup, and I need to implement each site-specific check) that this warrants a BRFA.

I'll try and add all the major/accepted archive providers I come across, including Wayback (Internet Archive), Webcite, Archive.is, Google Cache, etc.

Here is a sandbox edit with common providers converted/filled in (webcitation is down atm, but that one can be seen in previous edits).

For the record, I have also upgraded the original task to handle a few other parameter misuse cases. A common one is setting archiveurl but not url. The logic is exactly the same, except that the archive URL itself is already in the correct location. You can see this in recent contribs.
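The two misuse cases described here (archive link placed in url, or archiveurl set without url) could be handled roughly as follows. This is a sketch under assumptions: citation parameters are modelled as a plain dict, the archive date parameter is assumed to be named `archivedate`, the host list is illustrative, and `repair_citation` and `is_archive_link` are hypothetical names. The original URL and date are taken as inputs, however they were obtained.

```python
from urllib.parse import urlsplit

# Illustrative host list only; an assumption, not the bot's actual provider set.
ARCHIVE_HOSTS = ("web.archive.org", "webcitation.org", "archive.is")


def is_archive_link(url):
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host in ARCHIVE_HOSTS


def repair_citation(params, original_url, archive_date):
    """Return a copy of params with url/archiveurl/archivedate filled in."""
    fixed = dict(params)
    url = fixed.get("url", "")
    archiveurl = fixed.get("archiveurl", "")
    if is_archive_link(url) and not archiveurl:
        # Archive link sits in |url=: move it and restore the original URL.
        fixed["archiveurl"] = url
        fixed["url"] = original_url
    elif archiveurl and not url:
        # Archive link is already in the right place; only |url= is missing.
        fixed["url"] = original_url
    if fixed.get("archiveurl") and not fixed.get("archivedate"):
        fixed["archivedate"] = archive_date
    return fixed


fixed = repair_citation(
    {"url": "http://www.webcitation.org/64zXFfeH5"},
    "http://www.isuresults.com/bios/isufs00012936.htm",
    "2012-01-26",
)
```

The values in the usage example are the ones quoted above for the first Aaron Van Cleave citation; citations that already have both fields set are returned unchanged.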

Discussion
As an outsider, this looks good to me Hasteur (talk) 14:40, 20 September 2013 (UTC)

Anomie⚔ 00:04, 17 October 2013 (UTC)

Has this trial taken place? Josh Parris 11:13, 5 November 2013 (UTC)

I made a batch of edits, they are last in contributions (along with earlier trial and previous incremental task upgrades before I decided this should be a full BRFA). Here's a good example of massive url misuse and lost original urls. Also found a blacklisted url. — HELL KNOWZ  ▎TALK 11:32, 5 November 2013 (UTC)
 * For posterity, a permalink to the edits Josh Parris 12:08, 5 November 2013 (UTC)
 * The URL injected in this edit broke the wikitext; you'll need to do some additional escaping for certain characters that might be used in URLs but that MediaWiki doesn't recognize. I also see that in a few of the earlier edits the URL was present but in a misnamed parameter.
 * Anyway, since all that seems rare and easy to fix and I have confidence you will fix them, Approved. Anomie⚔ 20:23, 5 November 2013 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.