Wikipedia:Bots/Requests for approval/Bender the Bot 2


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Bender the Bot 2
Operator:

Time filed: 19:48, Saturday, August 20, 2016 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AutoWikiBrowser

Source code available:

Function overview: HTTP → HTTPS conversion for Google News and Google Books links

Links to relevant discussions (where appropriate): Village pump (proposals)/Archive 127

Edit period(s): one time run

Estimated number of pages affected: conservatively guessed 100k (but possibly 300k or more)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Since the transition of Internet Archive links to HTTPS is finished and WaybackMedic will take care of Wayback Machine, I want to now fix links to Google services, starting with Google News and Google Books. The bot should find the string
 * and  (see below)
 * and

replaced with
 * and, respectively

The reasons for the change to HTTPS in general have already been elaborated in the RfC. In this particular case, note that  automatically redirects to HTTPS (ever since 2012 or so). That means links from Wikipedia (which is HTTPS by default) go HTTPS→HTTP→HTTPS, which is not only slower than HTTPS→HTTPS, but also drops the HTTP Referer header (per RFC 2616 §15.1.3).

Furthermore, I wanted to combine the HTTPS move with a change in the TLD to, especially for those international TLDs considered "sensitive" in certain regions (like   in Arab countries, or   in China).

Discussion
Isn't (editor) the regex that should get replaced with  ?--Joel Amos (talk) 18:34, 22 August 2016 (UTC)
 * Yes it is. Sorry, I had that wrong. Fixed above. Thanks. --bender235 (talk) 19:01, 22 August 2016 (UTC)
 * That's fine. Also, the brackets aren't needed around the "s" and a backward slash should precede the first "." (my bad). Also, you'll want to remove the trailing slash from the replacement string so that it doesn't change to   edit: beat me to it :D --Joel Amos (talk) 19:39, 22 August 2016 (UTC)
 * Fixed the backslash (although it worked fine when I tested it). --bender235 (talk) 19:53, 22 August 2016 (UTC)
 * An un-escaped dot means "any character," so the old regex would've matched false positives (e.g. news@google.com).--Joel Amos (talk) 02:09, 23 August 2016 (UTC)
 * Fair enough. --bender235 (talk) 14:35, 23 August 2016 (UTC)
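
The point about the un-escaped dot can be checked with a minimal sketch. The two patterns below are hypothetical stand-ins written only for illustration; they are not the bot's actual find-and-replace rule, which is not reproduced here.

```python
import re

# Hypothetical patterns for illustration only (not the bot's real regex).
UNESCAPED = re.compile(r"news.google.com")    # "." matches any character
ESCAPED = re.compile(r"news\.google\.com")    # "\." matches a literal dot only

samples = [
    "http://news.google.com/newspapers?id=abc",  # a genuine link
    "contact: news@google.com",                  # the false positive from the discussion
]

for text in samples:
    # Report whether each pattern finds a match in the sample text.
    print(text, bool(UNESCAPED.search(text)), bool(ESCAPED.search(text)))
```

With the dots escaped, the e-mail-like string is no longer matched, while the genuine link still is.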


 * What now? Should I have a trial run of 100 articles like with the previous Internet Archive conversion? --bender235 (talk) 23:39, 26 August 2016 (UTC)
 * This may require multiple rounds of trials (hopefully increasing in size). Please run a short trial and post the initial results below. Please include in all edit summaries either a link to this BRFA trial or some other way for concerned editors to easily see what is going on and reply. —  xaosflux  Talk 02:51, 27 August 2016 (UTC)
 * — xaosflux  Talk 02:51, 27 August 2016 (UTC)


 * Results are in edit history. Found one issue, on E. R. Cowell: the Regex not only caught the URL, but also the pseudo-URL in the   parameter and crippled the rest of the citation template (ran manually, didn't save). Best solution would be to have things like   replaced with   (obviously Google Books is not the publisher of the books). Or, and that is the easier option for now, make the   in the Regex non-optional, so that it only replaces true URLs. Actually, I suggest the latter to keep this bot as simple as possible. --bender235 (talk) 22:53, 27 August 2016 (UTC)
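
A rough sketch of why a non-optional prefix avoids this kind of damage, assuming the optional part of the pattern is the URL scheme. Both patterns, the helper `to_https`, and the sample citation are hypothetical and are not taken from the bot's configuration or from the affected article.

```python
import re

# Hypothetical patterns; treating the scheme prefix as the optional part is an assumption.
OPTIONAL_SCHEME = re.compile(r"(?:http://)?books\.google\.com")
REQUIRED_SCHEME = re.compile(r"http://books\.google\.com")

# Made-up citation with a real URL and a bare pseudo-URL in another parameter.
citation = (
    "{{cite book |url=http://books.google.com/books?id=X "
    "|publisher=books.google.com}}"
)

def to_https(pattern, text):
    """Replace every match of `pattern` with the HTTPS form of the host."""
    return pattern.sub("https://books.google.com", text)

print(to_https(OPTIONAL_SCHEME, citation))  # also rewrites the publisher value
print(to_https(REQUIRED_SCHEME, citation))  # touches only the true URL
```

Requiring the scheme keeps the rule simple, at the cost of leaving pseudo-URLs in other parameters for a separate cleanup.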


 * So, any further requests or can this bot go live? --bender235 (talk) 20:56, 6 September 2016 (UTC)


 * Due to the huge size of your bot run, I'd like you to run a longer trial to give more opportunity for any odd issues to come up and get caught by other editors. — xaosflux  Talk 04:43, 15 September 2016 (UTC)


 * — xaosflux  Talk 04:43, 15 September 2016 (UTC)


 * Fair enough. --bender235 (talk) 14:16, 15 September 2016 (UTC)
 * Didn't spot any unusual behavior. --bender235 (talk) 15:49, 15 September 2016 (UTC)


 * Due to your large run size, please ramp up in the following stages; this will allow brief periods for unknown issues to be brought to your attention.
 * 3000 edits, 24 hour pause
 * 4000 edits, 24 hour pause
 * 5000 edits, 24 hour pause
 * 10000 edits, 24 hour pause
 * 50000 edits, 24 hour pause
 * Rest of run. — xaosflux  Talk 01:19, 19 September 2016 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.