Wikipedia:Bots/Requests for approval/HasteurBot 10


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

HasteurBot 10
Operator:

Time filed: 00:29, Thursday, June 11, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: Pywikibot with special driver above it. Driver file is

Function overview: Source referenced in many pages has relocated to a new server and changed their page location format. The new format is somewhat predictable but requires poking to figure out which day the old page points at.

Links to relevant discussions (where appropriate): Bot_requests

Edit period(s): One-time run, but it may need to be run again if a large collection of new links pops up.

Estimated number of pages affected: According to the requesting user, 136 pages.

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: The bot asks for all weblinks matching *.okazu.blogspot.com, then goes through them to evaluate whether each page should be adjusted and which exemptions apply. Exempt pages include BotReq, user pages, and any page whose title contains "Archive". Once the exemptions are dealt with, we gather the text of the page and run a regex to find any string where the site is mentioned, extracting the year, month, and nominal title. We build a compound key from those three pieces of information and look it up to see whether we've already searched for that reference on the new site; if so, we don't ask the site again for the same information. If we haven't yet found the new location of the reference, we brute-force ask the site "For this year, month, day, and title, do you have a page?" The site returns a 404 if we haven't guessed right and a 200 when we have. We store the successful URL in our cache of already-found replacements and return the new URL so that the string can be replaced in the text. The last step is to save the page with an appropriate edit summary (something to the effect of "HasteurBot 10: Replacing okazu.blogspot.com refs with yuricon.com equivalents"). Once the bot task is run, there should be no need to run it again, as the maintenance levels will be much more manageable. This task is not exclusion eligible, as we're fixing links to make them point at something that works correctly.
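The cache-and-probe loop described above can be sketched as follows. This is a minimal illustration, not the bot's actual driver code: the regex, the new-site URL layout, and the function names are assumptions for demonstration, and a real run would issue HTTP HEAD requests where the `probe` callback is used here.

```python
import re

# Hypothetical pattern for the old blogspot links; the real regex lives
# in the driver file mentioned above.
OLD_LINK_RE = re.compile(
    r"https?://okazu\.blogspot\.com/(?P<year>\d{4})/(?P<month>\d{2})/"
    r"(?P<title>[\w-]+)\.html"
)

# Cache keyed on (year, month, title) so the new site is only probed
# once per distinct reference.
_cache = {}

def extract_keys(page_text):
    """Return the compound (year, month, title) keys found in the text."""
    return [(m.group("year"), m.group("month"), m.group("title"))
            for m in OLD_LINK_RE.finditer(page_text)]

def find_new_url(key, probe):
    """Return the relocated URL for a key, brute-forcing the day of month.

    `probe(url)` should return an HTTP status code: 200 on a hit,
    404 otherwise. Successful guesses are cached so repeat references
    never hit the site twice.
    """
    if key in _cache:
        return _cache[key]
    year, month, title = key
    for day in range(1, 32):
        url = f"https://okazu.yuricon.com/{year}/{month}/{day:02d}/{title}/"
        if probe(url) == 200:
            _cache[key] = url
            return url
    return None
```

With the keys and replacement URLs in hand, the bot would substitute each old string in the page text and save with the edit summary noted above.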

Discussion
CC as the editor primarily championing this cause. Hasteur (talk) 00:35, 11 June 2015 (UTC)
 * Thanks. ··· 日本穣 ? · 投稿  · Talk to Nihonjoe ·  Join WP Japan ! 00:51, 11 June 2015 (UTC)

The number of edits is small. Let's give it a try. -- Magioladitis (talk) 22:20, 11 June 2015 (UTC)

 * In order:
 * - Wrong configuration, undid in the very next revision manually
 * - Correct
 * - Partially correct. The remainders are where the short name has changed. We can do better than this
 * At this point I stopped my run and started poking into how the user did the remapping and found that they have a blogger2wordpress addon that gets the new home of the content. Hasteur (talk) 23:00, 11 June 2015 (UTC)
 * Second Test Run
 * - Cleans up the Hana no Asuka-gumi! talk page
 * - Correct (changes some of the archive links which I'm not wild about)
 * - Correct
 * - Correct
 * At this point I think I've provided a good second demonstration. Hasteur (talk) 23:18, 11 June 2015 (UTC)

Hasteur do you think it has to be automated, and then every single edit reviewed? -- Magioladitis (talk) 06:39, 12 June 2015 (UTC)
 * As part of a bot trial, I always review every single diff (as I'm a perfectionist). I feel that it could run unattended, but having a log page of every diff this task makes so that a human set of eyes can review them would be wise. Obviously it's up to  if this is something to be added to the request. If there are concerns about it running 100% unattended I can kick the task off and review each replacement to verify that it's not doing anything unintentional. Hasteur (talk) 11:51, 12 June 2015 (UTC)
 * I think it would be good to have a log page for the task so it can easily be reviewed. I trust Hasteur, though, and if he is comfortable the bot is going to do exactly what was requested, I'm fine either way. ··· 日本穣 ? · 投稿  · Talk to Nihonjoe ·  Join WP Japan ! 19:16, 12 June 2015 (UTC)

Hasteur Let's complete the bot trial. If everything works fine, I can approve a fully automated run. I'll need you around because there are more links that need fixing. -- Magioladitis (talk) 11:57, 12 June 2015 (UTC)
 * New trial revealed Wikipedia:Peer review/Kashimashi: Girl Meets Girl/archive1, which the bot dutifully tried to work on. I reversed the change and added a condition to skip any page with "Archive" or "archive" in its title. Updated the bot's code here. Hasteur (talk) 15:00, 13 June 2015 (UTC)
 * Encountered Wikipedia:Articles for deletion/GirlFriends (manga) automatically. Reversed the bot's actions and put a special guard against AfD discussions. Coded in here. Hasteur (talk) 15:10, 13 June 2015 (UTC)
 * Ok, after 24 edits (and a few corrections from surprises) I am standing down and waiting for feedback. I think doing the gruntwork of the replacements (but having each replacement verified by me before the bot moves on) is a reasonable compromise between the needs of WP and the needs of the editors/readers at large. Hasteur (talk) 15:18, 13 June 2015 (UTC)
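Taken together, the exemptions accumulated during the trial (the original BotReq/user-page/"Archive" rules plus the AfD guard added above) amount to a single title predicate. A minimal sketch, assuming page titles are the plain strings Pywikibot's page.title() returns; the exact prefixes checked by the bot's code may differ:

```python
def is_exempt(title):
    """True if the bot should skip this page: bot requests, user pages,
    archives, and AfD discussions, per the trial corrections above."""
    if title.startswith(("Wikipedia:Bot requests", "User:", "User talk:")):
        return True
    # Guard added after the Kashimashi peer-review archive incident.
    if "Archive" in title or "archive" in title:
        return True
    # Guard added after the GirlFriends (manga) AfD incident.
    if title.startswith("Wikipedia:Articles for deletion/"):
        return True
    return False
```

The bot would call this before gathering a page's text, so exempt pages never reach the link-replacement stage.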

Hasteur I trust you to check the edits while the bot is running or after the bot is done. It's clear that there are some edge cases we did not cover with the bot trial. -- Magioladitis (talk) 20:56, 13 June 2015 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.