Wikipedia:Bots/Requests for approval/Bender the Bot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Bender the Bot
Operator:

Time filed: 22:16, Friday, August 5, 2016 (UTC)

Automatic, Supervised, or Manual: Supervised Automatic

Programming language(s): AutoWikiBrowser

Source code available:

Function overview: HTTP → HTTPS conversion for Internet Archive links

Links to relevant discussions (where appropriate): Village pump (proposals)/Archive 127

Edit period(s): One time run

Estimated number of pages affected: unsure (I guess 50,000 but possibly 100,000+)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No):

Function details: Basically all it does is find
 * and

and replace with

The above-listed WP:VPP discussion already determined that this is a useful endeavor for several reasons, but let me add another one: since Wikipedia is HTTPS-only, browsers generally omit the HTTP referer on all outbound links to HTTP pages (per RFC 2616 §15.1.3). That means fixing these links is also in the Internet Archive's interest (quite apart from their active encouragement to use HTTPS links). (Compare a related request by Newspapers.com to have their inbound links from Wikipedia switched from HTTP to HTTPS, a task that I already completed.)

I have been doing this task with my regular account so far, but it was suggested that I apply for bot approval with a new account. --bender235 (talk) 23:05, 5 August 2016 (UTC)
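
The exact find-and-replace strings are not preserved in this archive. The following is only a rough Python sketch of the kind of rewrite described, assuming the rule turns `http://archive.org` and `http://www.archive.org` into `https://archive.org` and leaves other subdomains (such as `web.archive.org`) alone. The pattern and the function name `to_https` are hypothetical and are not the rule the bot actually used in AutoWikiBrowser.

```python
import re

# Hypothetical pattern (an assumption, not the bot's actual AWB rule).
# - the lookbehind skips URLs embedded in another URL's path or host
# - "(?:www\.)?" also matches the www. form, which the replacement drops
# - the lookahead rejects lookalike hosts such as archive.org.uk
IA_HTTP = re.compile(r"(?<![\w/.-])http://(?:www\.)?archive\.org(?![\w-]|\.\w)")


def to_https(wikitext: str) -> str:
    """Rewrite plain-HTTP links to archive.org as HTTPS links."""
    return IA_HTTP.sub("https://archive.org", wikitext)


if __name__ == "__main__":
    sample = "[http://www.archive.org/details/foo Foo] via http://archive.org/"
    print(to_https(sample))
```

The sketch is deliberately conservative: anything that is not exactly the bare or `www.` host is left unchanged, so a mistaken pattern fails by skipping a link, not by corrupting one.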

Discussion

 * Fully support this. As a note of history, Bender has been running this task semi-automatically for some time from his main account, but he's been occasionally hitting rates that were raising some eyebrows (30+ edits/min). He was very receptive to turning this into an automated task instead, and I don't think the edits from his main should reflect negatively on his competence to run a bot. The issue seemed to be a lack of awareness of WP:BOTASSIST rather than apathy toward the bot policy. ~ Rob 13 Talk 23:14, 5 August 2016 (UTC)
 * 30+ epm is high, even for a bot that is performing a lower urgency task like this. What sort of edit rate do you propose to run this bot at? —  xaosflux  Talk 02:41, 6 August 2016 (UTC)
 * At this speed, or preferably higher. Since it's a clear-cut task I don't see any damage that could be done (by false positives). Also, as for the "urgency", remember that  (even though there are probably 50k or more) is just the tip of the iceberg. There's also   (the Wayback Machine), which are probably more than a million links still using HTTP. As of today, the Internet Archive is basically blind in terms of the HTTP referer for all these links, and I wanted to finish this task at some point this year or next year. The good news is that there are no new HTTP links to Internet Archive being added, since now they redirect to HTTPS from the main page. That means this task does have a clear endpoint. It's only the "legacy links" that we need to take care of. --bender235 (talk) 14:02, 6 August 2016 (UTC)
 * It's extraordinarily unlikely you'll be approved for that edit speed, mostly because if all bots on the wiki operated at such speeds, there would be significant lag as a result. 6 edits/minute is the "norm" for non-urgent tasks and 12 edits/minute is a hard cap for urgent tasks. Keep in mind, though, that this can be 6 edits/minute 24/7/365, so it will go much faster than what you've been doing so far. At that constant rate of 6 EPM, you'd clear a quarter million edits each month, so you'd likely finish the task in January 2017 or thereabouts, right on your desired schedule. I can say for certain that 30 EPM will be declined, but even 12 EPM is asking a lot and really unnecessary, in my opinion. ~ Rob 13 Talk 23:45, 9 August 2016 (UTC)
 * Well, it's a simple task that requires minimal computing power. How can it be slowed down artificially if I'm using AWB? --bender235 (talk) 23:51, 9 August 2016 (UTC)
 * (a) It's about the server power, not your own computing power. Say we have 10 bots editing at 30 edits/minute. That alone is 300 edits/minute. To place that in perspective, I just checked, and in the last minute right now (basically primetime), the total number of edits was ~100, including bots, so allowing such a high throttle would increase the load on the servers by a factor of 4. We have a lot more than 10 bots. (b) There's an option in bot mode of AWB that lets you set a delay (in seconds) for the bot to wait after each edit. If you set the delay to ~8 seconds, you'd wind up with ~6 edits/minute (keeping in mind that it takes a couple seconds for the actual action of loading/saving the page). ~ Rob 13 Talk 23:59, 9 August 2016 (UTC)
 * Ok, that sounds reasonable. I didn't know AWB had this built-in option available. If an 8-second delay keeps the edit frequency within the speed limits, I'm fine with that. --bender235 (talk) 00:03, 10 August 2016 (UTC)
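
The figures quoted in the thread above can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming roughly 2 seconds of load/save overhead per edit (the "couple seconds" mentioned above) and a 30-day month; the function names are illustrative only.

```python
def edits_per_minute(delay_s: float, overhead_s: float) -> float:
    """Sustained rate when each edit costs delay_s of waiting plus overhead_s of load/save."""
    return 60.0 / (delay_s + overhead_s)


def edits_per_month(epm: float, days: int = 30) -> float:
    """Edits completed in `days` days of uninterrupted running at `epm` edits/minute."""
    return epm * 60 * 24 * days


def load_factor(baseline_epm: float, bots: int, bot_epm: float) -> float:
    """Total edit rate relative to the baseline once `bots` extra bots are added."""
    return (baseline_epm + bots * bot_epm) / baseline_epm


if __name__ == "__main__":
    print(edits_per_minute(8, 2))     # 6.0 edits/minute with an 8 s delay
    print(edits_per_month(6))         # 259200, about a quarter million
    print(load_factor(100, 10, 30))   # 4.0 times the observed ~100 edits/minute
```

Under these assumptions the numbers in the discussion are consistent: an 8-second delay gives about 6 edits/minute, which is roughly a quarter million edits per month.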


 * OK to trial; please post results below when done. — xaosflux  Talk 16:39, 12 August 2016 (UTC)
 * Added to WP:AWB/CP for trial. — xaosflux  Talk 16:41, 12 August 2016 (UTC)
 * Thanks. Two more questions though: running AWB as a bot, do I still apply "AWB general fixes" or not? And also, what exactly do you mean by "results"? Is there some log file I need to post here? --bender235 (talk) 17:51, 12 August 2016 (UTC)
 * As a general rule, I apply genfixes only if they're necessary for the task I'm doing (i.e. my find-and-replace is not robust for handling white space). If you don't need the genfixes, it's a good idea to keep them off. The somewhat-annoying thing about genfixes is that you're still responsible for your bot's edits but any bugs in the genfixes are completely outside your control, so I consider it a best practice to avoid automating those unless you have a reason to do so. As for results, you can just link to the 100 contributions from your bot. Note that once you log into your bot account on AWB, you can click the "Bot" tab, check "Auto-save", set the delay, and type in a maximum number of edits (which should be 100 for this trial). Let me know if you have any questions about the bot mode. ~ Rob 13 Talk 17:56, 12 August 2016 (UTC)
 * Alright, thanks for the advice. Will run it now. --bender235 (talk) 17:59, 12 August 2016 (UTC)
 * Ok, finished 100 edits. --bender235 (talk) 18:43, 12 August 2016 (UTC)
 * Contributions, for reference. Haven't checked them yet, but I will shortly. Would you mind noting that you run User:Bender the Bot on your user page? That's typical practice and more-or-less expected under our rules for legitimate alternative accounts. ~ Rob 13 Talk 18:45, 12 August 2016 (UTC)

I pulled a sample of 10 edits and found no errors (being mindful to sample some of the ones that didn't have the same change in byte size). The regex Bender supplied is also obviously correct with no real edge cases to consider. Note that the bot removes "www.", but that has no effect on anything; we might want to keep that in, though, per typical practice in writing URLs? ~ Rob 13 Talk 18:48, 12 August 2016 (UTC)
 * Mentioned the bot on my user page.
 * As for the : note that http://www.archive.org/ redirects you to https://archive.org/. It seems as if they do not want you to use the subdomain, unless it's   for the Wayback Machine. --bender235 (talk) 18:53, 12 August 2016 (UTC)
 * Fair point, thanks. ~ Rob 13 Talk 19:06, 12 August 2016 (UTC)
 * Another thing: sometimes there were links like:
 * UCSF Tobacco Industry Videos Collection via Internet Archive
 * where the second link is just IA's main page. I feel like all these things should be converted to a Wiki link, like
 * Is that something I could add to the bot's functionality? --bender235 (talk) 19:18, 12 August 2016 (UTC)
 * Given the scale of this task, anything you do here is going to come under lots of scrutiny. I'd recommend possibly addressing that at a later time with another bot task. There's no reason to complicate a simple but high-edit task with something like that, in my opinion. Dividing and conquering is best, especially when starting out as a botop. ~ Rob 13 Talk 19:20, 12 August 2016 (UTC)
 * Fair enough. I'll keep it in mind though, for future bot tasks. --bender235 (talk) 19:26, 12 August 2016 (UTC)
 * After the test run, do I have permission to run this bot now? --bender235 (talk) 21:51, 13 August 2016 (UTC)

As discussed. One extra thing: I'd like for you to change the edit summary to clarify that the only http->https transition being made is for archive.org. —  Earwig   talk 20:16, 16 August 2016 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.