Wikipedia:Bots/Requests for approval/GreenC bot 3


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

GreenC bot 3
Operator:

Time filed: 04:40, Thursday, November 3, 2016 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): GNU awk

Source code available: https://github.com/greencardamom/WebArchiveMerge

Function overview: TfM consensus to merge 4 templates into a 5th template, of which the bot will merge two and I will manually merge the other two.

Links to relevant discussions (where appropriate): Templates_for_discussion/Log/2016_October_24

Edit period(s): Periodic batch runs until complete.

Estimated number of pages affected: 100,000

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

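Function details: Merge 2 templates, Wayback and Webcite, into the target template. The TfM also includes the merger of two other templates, but for various reasons I'll be doing these manually. One of the two bot-merged templates accounts for about 95% of the merger; the other accounts for the remaining 5%.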

A typical merger will look like this:


 * old:
 * new:

The bot makes sure a date argument exists; if one is missing, it decodes the date from the URL. Webcite IDs are Unix time in base62 encoding. The bot preserves the date formats iso, dmy, mdy and ymd, interprets positional arguments and converts them to named arguments, and converts short-form Webcite URLs to long-form per RfC, using the API.
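The date decoding described above can be sketched roughly as follows. This is an illustration in Python, not the bot's GNU awk source. The request only states that Webcite IDs are base62-encoded Unix time; the alphabet order (digits, uppercase, lowercase) and the microsecond resolution used here are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: digits first, then uppercase, then lowercase.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_decode(text):
    """Convert a base62 string to an integer (raises ValueError on bad characters)."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value


def base62_encode(value):
    """Inverse of base62_decode, included only to build round-trip examples."""
    if value == 0:
        return ALPHABET[0]
    digits = []
    while value > 0:
        value, rem = divmod(value, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def webcite_id_to_date(archive_id):
    """Hypothetical helper: decode an ID to a UTC datetime.

    Assumption: the decoded integer is microseconds since the Unix epoch.
    """
    micros = base62_decode(archive_id)
    return datetime.fromtimestamp(micros // 1_000_000, tz=timezone.utc)


if __name__ == "__main__":
    # Round trip on a made-up ID built from a known timestamp.
    stamp = datetime(2016, 11, 3, 4, 40, tzinfo=timezone.utc)
    made_up_id = base62_encode(int(stamp.timestamp()) * 1_000_000)
    print(made_up_id, webcite_id_to_date(made_up_id).date().isoformat())
```

Once the timestamp is decoded, the date can be rendered in whichever of the iso, dmy, mdy or ymd formats the article already uses.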

Discussion

 * Please review this request - there is one conflict between the summary and the description - I don't think you mean to touch citeweb? — xaosflux  Talk 10:58, 3 November 2016 (UTC)
 * Fixed. Definitely don't want to merge citeweb :) -- Green  C  14:12, 3 November 2016 (UTC)
 * I think the proposer meant WebCite. Also, the overview says "merge 4 templates", but the bot appears to merge two templates into a third. Minorly confusing. – Jonesey95 (talk) 12:53, 3 November 2016 (UTC)
 * Yeah the other two I'm doing manually. -- Green  C  14:12, 3 November 2016 (UTC)


 * Approved for trial (50 of each template). Please post results below. —  xaosflux  Talk 15:20, 3 November 2016 (UTC)


 * Trial complete. The trial is 50 articles containing Webcite and 50 containing Wayback. There is overlap with some articles containing both templates, but anyway 100 articles total.
 * Webcite: 50 edits (Migration to Xinjiang to Les Valses de Vienne)
 * Wayback: 50 edits (Fetal rights to Barat College). Actually 51 edits because Fetal rights was done twice to fix garbage data.


 * The new template has tracking categories for error checking so problems will usually show up there and those cats are clean post-trial. I also manually checked each edit and they seem OK.
 * -- Green  C  17:33, 4 November 2016 (UTC)
 * Why are these getting encoded in different formats? When possible, : is preferable to %3A. —  xaosflux  Talk 02:11, 5 November 2016 (UTC)
 * Ok that's in the query portion of the string (following the "?") which requires encoding. I'm following RFC 3986. In section 2.3 the ':' is not listed as unreserved (ie. characters that should not be percent-encoded). According to section 3.4 on query strings, the '/' and '?' should be encoded, but because the "value is [often] a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters." Thus only the ':' needs to be encoded. See similar behavior with IABot. --  Green  C  03:45, 5 November 2016 (UTC)
 * I started a Village Pump thread to see if anyone has more thoughts: Village_pump_(technical) -- Green  C  04:51, 5 November 2016 (UTC)
 * Thanks - I just noticed it looked a bit odd, ping me back after the VPT discussion runs its course. — xaosflux  Talk 13:34, 5 November 2016 (UTC)
 * There was a good answer there, and I will go ahead and not encode the : or / for webcitation.org queries, unless something else comes up. But this question is likely even more relevant to User:Cyberpower678's IABot which is doing thousands of new webcitation.org URLs encoding : and / (example). -- Green  C  15:43, 5 November 2016 (UTC)
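
The two encoding choices discussed in this thread can be compared with a short Python sketch. It is only an illustration, not the bot's code. The helper name, the `keep_colon` flag and the long-form URL shape (`<base>/<id>?url=<target>`) are assumptions made for the example; `urllib.parse.quote` is the standard-library call that does the percent-encoding.

```python
from urllib.parse import quote


def build_long_url(archive_id, target_url, keep_colon=True):
    """Hypothetical helper: append the archived page's URL as a query value.

    keep_colon=True leaves ':' and '/' readable (the outcome of the thread).
    keep_colon=False percent-encodes ':' but keeps '/' (the trial behaviour).
    """
    safe = ":/" if keep_colon else "/"
    # Assumed long-form shape; other characters such as spaces are still encoded.
    return "https://www.webcitation.org/" + archive_id + "?url=" + quote(target_url, safe=safe)


if __name__ == "__main__":
    target = "http://example.com/a b"
    print(build_long_url("abc", target))
    print(build_long_url("abc", target, keep_colon=False))
```

Either form stays a valid URI under RFC 3986; the difference is readability of the embedded URL.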


 * Approved for extended trial with updated parameters. — xaosflux  Talk 17:48, 5 November 2016 (UTC)
 * Wayback (250): (Talk:Flag of Northern Ireland to List of districts in Kerala)
 * Webcite (250): (Richard B. Teitelman to Big Sandy Creek (Cheat River))
 * — Preceding unsigned comment added by Green Cardamom (talk • contribs)
 * Thank you, I'd like to let this sit for 48 hours in the event there are any issues brought up by editors; barring any, this will be approved. — xaosflux  Talk 16:32, 6 November 2016 (UTC)


 * Task approved. — xaosflux  Talk 23:21, 8 November 2016 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.