Wikipedia:Bots/Requests for approval/Bender the Bot 7


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved

Bender the Bot 7
Operator:

Time filed: 01:06, Friday, January 20, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: upon request

Function overview: replace http:// with https:// for the New York Times domain.

Links to relevant discussions (where appropriate): WPR: Why we should convert external links to HTTPS wherever possible and WPR: Should we convert existing Google and Internet Archive links to HTTPS?

Edit period(s): one-time run

Estimated number of pages affected: about 100,000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Bgwhite recently pointed me at Secure The News, a project of the Freedom of the Press Foundation, which conveniently lists all major news outlets that enable HTTPS access. Having already converted The Guardian links earlier, I want to work through that list one by one, starting with The New York Times, which proudly announced its activation of HTTPS a week ago.

We have a lot of NYT links (my conservative guess is 100k pages), and while the NYT announcement says that so far only "articles published in 2014 and later" are accessible over HTTPS, I want to convert them all right now, for two reasons: (1) it does not break older links (for example); they simply redirect back to HTTP, and if NYT performs that redirect on their side, at least the HTTP Referer information is preserved. And (2) they announced that they "intend to bring the rest of our site under the HTTPS umbrella", so it's only a matter of time.
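The conversion itself is a simple find-and-replace. A minimal sketch in Python follows; the actual task used AWB's regex find-and-replace, and the exact pattern here is an assumption for illustration, not the bot's real rule:

```python
import re

# Illustrative pattern for the http -> https rewrite of nytimes.com
# links; the real AWB rule may differ (e.g. in subdomain handling).
NYT_HTTP = re.compile(r"http://((?:www\.)?nytimes\.com)")

def convert_nyt_links(wikitext: str) -> str:
    """Rewrite plain-HTTP links to nytimes.com as HTTPS, leaving
    links to other domains untouched."""
    return NYT_HTTP.sub(r"https://\1", wikitext)
```

For example, `convert_nyt_links("[http://www.nytimes.com/page Title]")` returns `"[https://www.nytimes.com/page Title]"`, while links to other domains pass through unchanged.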

Discussion

 * Here is a trial approval to test your code, do you have any statistics as to how many of the links you will change will end up being currently useless due to the remote server changing them back to http? — xaosflux  Talk 05:09, 21 January 2017 (UTC)


 * I don't have an accurate number, but I would guess as of today about 70-80% of the links would be re-routed to HTTP on the NYT server. This number will gradually go to zero over the next couple of months. (By the way, the example link above already works with HTTPS on mobile; desktop will follow soon.) --bender235 (talk) 12:07, 21 January 2017 (UTC)
 * Edit history obviously here. --bender235 (talk) 16:21, 21 January 2017 (UTC)
 * Due to the massive size of this request, I have posted a link to it at WP:VPR. Placing on hold for any initial community comments. —  xaosflux  Talk 17:20, 26 January 2017 (UTC)


 * I have no problem with this; it's a great idea, and thanks to BB for doing this important work. My question is: why not run the bot for multiple sites? It would reduce the number of edits if more than one site could be processed at once. -- Green  C  17:57, 26 January 2017 (UTC)
 * So you're saying I should also convert, say, Techdirt and Bloomberg and others in the same bot run? I figured it would be more organized if I do one domain at a time. --bender235 (talk) 22:14, 26 January 2017 (UTC)


 * Support I support this proposal. In addition to the 50-edit trial, I'll throw out another possibility, but ignore it if other bot experts think this is overkill. Given the large number of affected pages (approximately 100,000), I think it would be wise to run it for 1000 or so and then pause for a couple of days just to see if anything odd has happened.-- S Philbrick (Talk)  20:58, 26 January 2017 (UTC)
 * Generally, when approving massive jobs like this, I do it with a ramp-up throttle (e.g. 1000 edits, 24-hour wait; 2000 edits, 24-hour wait) with increasingly large steps depending on the max size (the last step is "open"). — xaosflux  Talk 00:00, 27 January 2017 (UTC)
 * Sounds good. Thanks.-- S Philbrick (Talk)  01:05, 27 January 2017 (UTC)


 * Support Clear-cut and helpful. My only feedback would be to change this from a one-time run to a large initial run followed by periodic runs afterward, as non-HTTPS links will certainly be re-introduced by editors, either through bad copy/pastes or through restoring content from page history, etc. Avic ennasis @ 22:26, 28 Tevet 5777 / 22:26, 26 January 2017 (UTC)
 * I don't think that will be necessary, since nytimes.com is now HTTPS-by-default, so all copy-pasted URLs will be HTTPS from now on. --bender235 (talk) 23:37, 26 January 2017 (UTC)
 * All "good" copy-pasted URLs will be HTTPS from now on, but that wasn't my example. An editor who copied "://www.foo.com" by accident might just type in http at the beginning of that URL, vs. going back and re-copy/pasting. And that still doesn't address stuff pulled from history and the like. But that's your call to make. I still support it either way. Avic ennasis @ 02:57, 29 Tevet 5777 / 02:57, 27 January 2017 (UTC)
 * I don't think there will be more than a few cases. I'll keep an eye on it. --bender235 (talk) 19:48, 28 January 2017 (UTC)


 * Support I can't find any issues with this proposal, overall net positive. Iazyges   Consermonor   Opus meum  04:19, 27 January 2017 (UTC)


 * Support I don't see any problems. HTTPS is important for the privacy and security of our readers. Thank you for the good work! Tony Tan · talk  05:31, 29 January 2017 (UTC)


 * Please update with results. — xaosflux  Talk 20:11, 30 January 2017 (UTC)


 * Edit history again here. --bender235 (talk) 06:44, 31 January 2017 (UTC)


 * Task approved with ramp up schedule:
 * 1000 edits, 24hr hold (already completed above)
 * 2000 edits, 24hr hold
 * 3000 edits, 24hr hold
 * 5000 edits, 24hr hold
 * Open editing.
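The approved schedule above can be illustrated with a small driver loop. This is a hypothetical sketch: `run_batch`, `total_pages`, and `hold_seconds` are stand-in names for illustration, not the bot's actual code:

```python
import time

# Batch sizes from the approved ramp-up schedule; each batch is
# followed by a 24-hour hold before the next, then open editing.
RAMP_UP = [1000, 2000, 3000, 5000]

def run_with_ramp_up(run_batch, total_pages, hold_seconds=24 * 60 * 60):
    """Drive run_batch(n) through the ramp-up steps, pausing
    hold_seconds between them, then finish with open editing."""
    done = 0
    for batch in RAMP_UP:
        n = min(batch, total_pages - done)
        if n <= 0:
            return  # ran out of pages before finishing the ramp-up
        run_batch(n)
        done += n
        time.sleep(hold_seconds)  # 24-hour hold between steps
    remaining = total_pages - done
    if remaining > 0:
        run_batch(remaining)  # open (unthrottled) editing for the rest
```

With the estimated 100,000 pages, this yields batches of 1000, 2000, 3000, and 5000, then one open run of 89,000. Restarting the schedule after a correction, as described below, just means calling the driver again from the top.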


 * Should any minor issues come up during the ramp-up that are easily correctable, make the corrections and restart the ramp-up schedule. You may certainly use a slower ramp-up schedule at your discretion. —  xaosflux  Talk 16:32, 3 February 2017 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.