Wikipedia:Bots/Requests for approval/MalnadachBot 13


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard. The result of the discussion was

MalnadachBot 13
Operator:

Time filed: 06:07, Saturday, June 11, 2022 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AutoWikiBrowser

Source code available: AWB, regexes given below, query/64398

Function overview: Blank inactive talkpages of inactive IPs which are not currently blocked and replace their content with Blanked IP talk

Links to relevant discussions (where appropriate): Community consensus was established at Village pump (proposals) (permanent link)

Edit period(s): One time run

Estimated number of pages affected: at least 1.5 million, exact number unknown

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: The bot will edit IP talkpages which meet all of the following conditions:
 * 1) The IP talkpage has not received edits in the last 5 years
 * 2) The IP address is not currently blocked (including range blocks)
 * 3) There have been no edits from the IP address in the last 5 years
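
As a rough illustration only, the three conditions above can be read as a single predicate. The bot itself selects pages with a database query rather than with code like this; the function names, parameters and sample addresses below are hypothetical and exist only to make the logic concrete.

```python
from datetime import datetime, timezone
from ipaddress import ip_address, ip_network


def five_years_before(now):
    """Return the same calendar date five years earlier (hypothetical helper)."""
    try:
        return now.replace(year=now.year - 5)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to the 28th.
        return now.replace(year=now.year - 5, day=28)


def is_eligible(ip, last_talk_edit, last_ip_edit, blocked_ranges, now):
    """Check the three conditions for one IP talkpage.

    last_talk_edit: timestamp of the newest edit to the talkpage
    last_ip_edit:   timestamp of the newest edit made by the IP, or None
    blocked_ranges: currently blocked addresses or CIDR ranges (assumed input)
    """
    cutoff = five_years_before(now)
    # 1) The talkpage has not been edited in the last 5 years.
    if last_talk_edit >= cutoff:
        return False
    # 3) The IP itself has not edited in the last 5 years.
    if last_ip_edit is not None and last_ip_edit >= cutoff:
        return False
    # 2) The IP is not covered by any current block, including range blocks.
    addr = ip_address(ip)
    for block in blocked_ranges:
        net = ip_network(block, strict=False)
        if addr.version == net.version and addr in net:
            return False
    return True


now = datetime(2022, 6, 11, tzinfo=timezone.utc)
old = datetime(2015, 1, 1, tzinfo=timezone.utc)
print(is_eligible("192.0.2.7", old, None, ["198.51.100.0/24"], now))
```

A single blocked address is treated here as a range of one, so individual blocks and range blocks go through the same membership test.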

The list of pages that meet these criteria will be fetched using query/64398. Since there are millions of IP addresses to check, I will fetch pages by targeting a smaller range of IPs at a time so that the query does not time out.

The pages in the list will be matched using AWB's find and replace in advanced mode. The regex used is  → . This regex will match everything and replace it with nothing, thereby blanking the page. Then AWB's append function is used to add Blanked IP talk and the edit will be saved.
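
The effect of this step on a single page can be sketched in Python. This is not the AWB configuration itself: the pattern below is a stand-in chosen only because it matches an entire page, and the function name is hypothetical. The template is written in ordinary wikitext transclusion syntax.

```python
import re

# Stand-in pattern (an assumption, not the bot's actual regex):
# with DOTALL, "." also matches newlines, so the whole page is one match.
BLANKING_PATTERN = re.compile(r"\A.*\Z", re.DOTALL)
TEMPLATE = "{{Blanked IP talk}}"


def blank_and_tag(wikitext):
    """Replace the whole page text with nothing, then append the template."""
    blanked = BLANKING_PATTERN.sub("", wikitext)
    return blanked + TEMPLATE


print(blank_and_tag("== Warning ==\nPlease stop.\n"))
```

Because the find/replace leaves an empty page, appending the template is equivalent to replacing the page content with the template alone.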

query/64398 takes a long time to execute, so there is an alternate way of fetching pages over a broader range. It is documented here as a backup for the purpose of this BRFA, and I do not expect to use it much.
 * Alternate way to get list of pages

This involves using query/64414, query/64388 and User:MalnadachBot/expand ip.py. query/64414 gives a list of IP talkpages which have received no edits in the last 5 years and whose IP has made no edits in the last 5 years. query/64388 gives a list of blocked IP addresses (including IP ranges); its result will be fed to expand_ip.py so that I can get all the individual IPs that fall within range blocks. Then I will use AWB's list comparator to get A ∩ B' of query/64414 (A) and the expanded IP list (B), i.e. inactive IP talkpages of inactive IPs which are not currently blocked. This final list will then be processed by the same find/replace and append procedure as described above.
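
The A ∩ B' step can be illustrated with a short Python sketch. It is not the contents of expand_ip.py, which is not reproduced here; the function names and sample addresses are hypothetical, and the sketch assumes IPv4 ranges small enough to enumerate and addresses written in the same string form in both lists.

```python
from ipaddress import ip_network


def expand_blocked(blocks):
    """Expand blocked addresses and CIDR ranges into individual IP strings (set B)."""
    expanded = set()
    for block in blocks:
        # Iterating a network yields every address in it, including the
        # network and broadcast addresses.
        for addr in ip_network(block, strict=False):
            expanded.add(str(addr))
    return expanded


def unblocked_inactive(inactive_ips, blocks):
    """Return A minus B: inactive IPs that are not covered by any block."""
    return sorted(set(inactive_ips) - expand_blocked(blocks))


inactive = ["192.0.2.1", "192.0.2.5", "198.51.100.9"]  # set A (sample data)
blocked = ["192.0.2.0/30"]                              # covers 192.0.2.0 to 192.0.2.3
print(unblocked_inactive(inactive, blocked))
```

Enumerating every address is only practical for modest range sizes; for very large ranges a membership test against each network would avoid building the expanded list.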

Discussion

 * Comment: I notice that the first criterion here (no edits in the last 5 years) is different from the RFC's criterion (Have not received any messages in the last 5 years). I suspect that there are many IP talk pages that meet the RFC criteria but do not meet the bot's proposed criteria, because a bot or gnome has come by to tidy the page sometime in the last five years. I don't know if it is possible to exclude these tidying edits somehow, but if so, it would probably lead to a larger pool of pages to be cleaned up. I support the approval of this task, whichever set of criteria it operates under. This comment should not be read as attempting to impede bot task approval in any way. – Jonesey95 (talk) 14:50, 11 June 2022 (UTC)
 * Yes, since this is a narrower criteria than what there is consensus for, I don't expect it to be a problem. The thing is quarry already struggles to generate this list of pages, trying to exclude gnome edits will make it harder. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 16:13, 11 June 2022 (UTC)
 * I imagine as the total number of pages quarry returns reduces it would be easier to then craft something for excluding gnome edits? -- The SandDoctor Talk 15:29, 19 June 2022 (UTC)
 * Yeah, I expect it will be easier after some time. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 16:46, 19 June 2022 (UTC)


 * -- The SandDoctor Talk 15:31, 19 June 2022 (UTC)
 * 50 edits. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 12:58, 20 June 2022 (UTC)
 * Comment/Praise: Thank you for publishing everything so that it was easy to follow along. The code you posted on wmcloud was a great introduction to that system for me, so thanks for that. Did you run into any problems with running this task? I ask purely out of my own interest, as I'm getting started with AWB and writing some code for my own bot.  Dr vulpes  (💬 • 📝) 22:56, 21 June 2022 (UTC)
 * Thanks. The actual operation performed on a page in this task is very simple - blank the page and add a template. The complicated part is in fetching the list of pages since it will have to filter from millions of IP addresses. As said above, quarry currently cannot do that, so I am getting the list from small ranges at a time. Once the number of IP talkpages with no edits in 5 years has decreased, it will be easier. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 04:15, 22 June 2022 (UTC)

Under normal circumstances, I would prefer to leave the close for someone else. However, given the backlog, lack of recent BAG activity (myself included), and the fact that this task is uncontroversial and based on how well the trial went, I am inclined to make an exception for this. As per usual, if amendments to - or clarifications regarding - this approval are needed, please start a discussion on the talk page and ping. -- The SandDoctor Talk 18:14, 9 July 2022 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard.