Wikipedia:Bots/Requests for approval/KadaneBot 3


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

KadaneBot 3
Operator:

Time filed: 16:10, Tuesday, March 19, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: Not published yet

Function overview: Tags redirects with R to disambiguation page, R from unnecessary disambiguation, or R from incomplete disambiguation if they meet the criteria described in the function details.

Links to relevant discussions (where appropriate): Bot_requests

Edit period(s): Monthly

Estimated number of pages affected: ~56,417 first run

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Note: This BRFA only covers the functionality mentioned in Case 2. Case 1 and Case 3 have been stricken.

Case 1: If Foo (bar) redirects to Foo, where bar does not equal disambiguation AND Foo is NOT a disambiguation page, then tag Foo (bar) with R from unnecessary disambiguation. Currently 39,963 articles fit this case.

Case 2: If Foo (bar) redirects to Foo, where bar does not equal disambiguation AND Foo IS a disambiguation page, then tag Foo (bar) with R from incomplete disambiguation. Currently 16,427 articles fit this case.

Case 3: If Foo (bar) redirects to Foo, where bar equals disambiguation AND Foo is a disambiguation page AND Foo (disambiguation) is NOT malformed, then tag Foo (bar) with R to disambiguation page. Currently 27 articles fit this case.

The following functionality/logic exists for all 3 cases:
 * If the redirect page is already tagged with R with possibilities, R to disambiguation page, R from unnecessary disambiguation, or R from incomplete disambiguation, skip.
 * If the redirect page is in Category:Printworthy redirects, skip.
 * For Case 2: If these templates are present, replace with R from incomplete disambiguation.
 * If a redirect exists and disambiguation is malformed, log to User:KadaneBot/Task3/Malformed disambiguations.
 * In any case that results in adding a redirect template to a page, if there will be 2 or more redirect templates, nest the tags in Redirect category shell.
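
The case selection described in the function details can be sketched roughly as follows. This is only an illustration, not the bot's unpublished source: the function name `choose_tag`, its parameters, and the regular expression are assumptions, and the caller is assumed to already know whether the target is a disambiguation page.

```python
import re
from typing import Optional

# Matches "Foo (bar)" and captures the base title and the qualifier.
# Hypothetical pattern; the bot's real matching rules are not published.
TITLE_RE = re.compile(r"^(?P<base>.+) \((?P<qualifier>[^()]+)\)$")


def choose_tag(redirect_title: str, target_title: str, target_is_dab: bool) -> Optional[str]:
    """Return the redirect template to add, or None to skip the page."""
    match = TITLE_RE.match(redirect_title)
    if match is None or match.group("base") != target_title:
        return None  # not a "Foo (bar)" -> "Foo" redirect, or a malformed title
    qualifier = match.group("qualifier")
    if qualifier == "disambiguation":
        # Case 3: "Foo (disambiguation)" pointing at the dab page "Foo"
        return "R to disambiguation page" if target_is_dab else None
    if "disambiguation" in qualifier.lower():
        return None  # e.g. "(Disambiguation)": treated as malformed and only logged
    if target_is_dab:
        return "R from incomplete disambiguation"  # Case 2
    return "R from unnecessary disambiguation"  # Case 1
```

For instance, `choose_tag("13 (EP)", "13", True)` picks the Case 2 tag, while a title such as `Molly(fish)` is skipped because it lacks the space before the parenthesis.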

Discussion

 * A sample of 1000 edits the bot would make (under current functional details) along with the template it would add to the page is listed at User:KadaneBot/Sandbox Kadane (talk) 16:11, 19 March 2019 (UTC)

Comment The following should be tagged as R from incomplete disambiguation instead of R from unnecessary disambiguation


 * 11th Division (United States) → 11th Division
 * 13 (EP) → 13
 * 16th Brigade (United States) → 16th Brigade
 * 16th Brigade (United Kingdom) → 16th Brigade
 * 23 (EP) → 23
 * 24 Hours (song) → 24 Hours
 * 3.0 (album) → 3.0
 * 34th Street station (Philadelphia) → 34th Street station
 * 40th Street (SEPTA Subway-Surface Trolley station) → 40th Street
 * 40th Street (SEPTA Subway–Surface Trolley station) → 40th Street
 * 49th Street (SEPTA station) → 49th Street
 * 63rd Street station (Philadelphia) → 63rd Street station
 * A10 (bus route) → A10

Those can be identified by the landing page being a disambiguation page.

This one should be skipped, or tagged with something else (investigating)
 * AI for Good (disambiguation) → AI for Good

These ones should be skipped as malformed DAB pages (missing space, capital D), but collecting them so they can be RFD'd would be good.
 * 212th Division(disambiguation) → 212th Division
 * 2nd Avenue (Disambiguation) → 2nd Avenue
 * A&B (Disambiguation) → A&B

Headbomb {t · c · p · b} 17:11, 19 March 2019 (UTC)
 * Okay I have updated the functional details of the bot to fix the cases you brought up. I will update the table of edits when I make it home. Kadane (talk) 19:23, 19 March 2019 (UTC)
 * I have uploaded new edits to User:KadaneBot/Sandbox. It contains 100 edits of each of the cases, with the exception of R to disambiguation page which only has 22 edits total. I have also included all of the malformed disambiguation pages (these will not be modified by the bot, just included in the log). Kadane (talk) 05:48, 20 March 2019 (UTC)

Better, although the following should be tagged with R from incomplete disambiguation instead of R from unnecessary disambiguation. Headbomb {t · c · p · b} 09:31, 20 March 2019 (UTC)
 * 02 (album) → 02
 * 03 (album) → 03
 * 1. Liga (football) → 1. Liga
 * 118th Regiment of Foot (1761) → 118th Regiment of Foot
 * There was an error in my CSV parsing from the database dump. I forgot to set the parameter, which resulted in some lines being skipped when the database query was being scanned. Because of this, some articles and disambiguation pages were being ignored. This is fixed now. I clicked through most of the cases and I can't find any errors. User:KadaneBot/Sandbox is updated. Kadane (talk) 15:17, 20 March 2019 (UTC)


 * Of all cases, the following aren't really disambiguation pages.

Maybe a full list should be created so we can purge all cases that shouldn't be tagged. Everything else looks fine though. Headbomb {t · c · p · b} 18:03, 20 March 2019 (UTC)
 * .hack//G.U. (Volume 1: Rebirth) → .hack//G.U.
 * 112th Special Operations Signal Battalion (Airborne) → 112th Special Operations Signal Battalion
 * 104th Regiment Royal Artillery (Volunteers) → 104th Regiment Royal Artillery
 * 105th Regiment Royal Artillery (Volunteers) → 105th Regiment Royal Artillery
 * To save time, that full list to review could exclude things that end in  since those are safe. Headbomb {t · c · p · b} 21:02, 20 March 2019 (UTC)

Alright, all edits have been saved, with the articles that end in what you listed above removed. Kadane (talk) 21:52, 20 March 2019 (UTC)
 * See
 * See
 * See
 * Case 3 are all fine, I'll review Case 1 and 2. Headbomb {t · c · p · b} 22:09, 20 March 2019 (UTC)
 * Actually Always(song)) and a few others with )) are malformed. Headbomb {t · c · p · b} 22:12, 20 March 2019 (UTC)

So are


 * Ahmed Ali(footballer)
 * Always(song))
 * Blinded(movie)
 * Chris Collins(Politician)
 * City of Angels(TV Show)
 * Daredevil(comics)
 * Everlasting(BoA)
 * Expm(x)
 * Marlborough(car)
 * Molly(fish)
 * One in a Million(TV series)
 * Paul Hamilton(Footballer)
 * Point(unit)
 * Reckless(2010 novel)
 * Relentless(CD)
 * The Brothers(TV Series)

Headbomb {t · c · p · b} 22:19, 20 March 2019 (UTC)
 * Ah I was under the impression that we only checked malformed disambig on case 3 (when name ends with (disambiguation)). Updated the logic to check for malformed disambigs for all cases. Kadane (talk) 22:37, 20 March 2019 (UTC)
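
A check for the malformed titles listed above (missing space before the parenthesis, a doubled closing parenthesis, or a capitalised "Disambiguation") might look something like this. It is a minimal sketch under assumed rules; `is_malformed` and the pattern are hypothetical and not taken from the bot.

```python
import re

# Well-formed shape assumed here: base title, one space, then a single
# parenthesised qualifier at the very end, e.g. "24 Hours (song)".
WELL_FORMED_RE = re.compile(r"^.*\S \([^()]+\)$")


def is_malformed(title: str) -> bool:
    """True if a title has parentheses but not in the "Foo (bar)" shape."""
    if "(" not in title and ")" not in title:
        return False  # no disambiguator at all, nothing to check
    if WELL_FORMED_RE.match(title) is None:
        return True  # missing space, doubled or unbalanced parenthesis
    qualifier = title[title.rindex("(") + 1 : -1]
    # "(Disambiguation)" with a capital letter is also treated as malformed.
    return qualifier != "disambiguation" and qualifier.lower() == "disambiguation"
```

Note that a purely structural rule like this also flags `Expm(x)`, which matches the list above, so titles caught this way are better logged for human review than acted on.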

There are actually a few more, which I've sent to RFD.


 * CCI (Prison disambiguation)
 * Euso (disambugation)
 * First Army (Poland - disambiguation)
 * Gary (disambiguation page)
 * Gradius 2 (Disambiguition)
 * Gradius 2 (disambiguition)
 * Lake Mamacocha (dosambiguation)
 * Lancaster (disambiguation page)
 * Le Mont (disambigution)
 * Momochi (disambuiguation)
 * Pook (disambiguaton)
 * Rizwan Ahmed (disambiguation page)
 * Roger Graham (disambituation)
 * Sarah Palmer (disambiguation page)
 * Shirani, Iran (dismabiguation)
 * Social Justice Coalition (disambiguation page)
 * St. Thomas' Church (disambigaution)
 * Ten (album disambiguation)
 * Tiw (disabiguation)
 * Upstage (disabiguation)
 * Victoria (geographical disambiguation)
 * Wanne (isambiguation)

Headbomb {t · c · p · b} 22:49, 20 March 2019 (UTC)

Actually, could you break User:KadaneBot/Task3/Case 1 into sections of 100 KB tops? Those pages are pretty slow to load/edit (I have scripts that classify type of links, which slow down these pages considerably). Headbomb {t · c · p · b} 23:06, 20 March 2019 (UTC)
 * ✅ Also I am catching disambiguation misspellings as well as other words appearing next to disambiguation between parenthesis. If there are any other misspellings they should probably be excluded manually unless there is a pattern. Kadane (talk) 23:15, 20 March 2019 (UTC)
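
One way to catch misspellings of "disambiguation", and extra words next to it inside the parentheses, is fuzzy matching with Python's standard `difflib` module. The sketch below is an assumption about how this could be done, not the bot's actual method; the function name and the 0.8 cutoff are illustrative choices.

```python
import difflib
import re

# Captures the text inside the final pair of parentheses.
QUALIFIER_RE = re.compile(r"\(([^()]*)\)$")


def looks_like_disambiguation_typo(title: str, cutoff: float = 0.8) -> bool:
    """Flag qualifiers that misspell, or add words to, "disambiguation"."""
    match = QUALIFIER_RE.search(title)
    if match is None:
        return False
    qualifier = match.group(1).lower()
    if qualifier == "disambiguation":
        return False  # exact spelling; capitalisation is a separate check
    words = re.findall(r"[a-z]+", qualifier)
    if "disambiguation" in words:
        return True  # extra words, e.g. "(disambiguation page)"
    # Fuzzy match each word against the correct spelling (cutoff is a guess).
    return bool(difflib.get_close_matches("disambiguation", words, n=1, cutoff=cutoff))
```

A similarity cutoff will miss badly mangled spellings and could in principle flag an unrelated long word, which fits the suggestion above that remaining cases be excluded manually.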

Could you also break down redirects into 'species', e.g. all those ending with \s\(.*album\) into a subpage (or section), all those ending with \s\(.*song\) into another, and so on (and everything else considered "Other")? At least for the endings listed below, all case insensitive. Headbomb {t · c · p · b} 23:18, 20 March 2019 (UTC)
 * \d (i.e. ends with digits, like Typhoon Haikui (2012)); album; AM; band; book; channel; comics; company; cricketer; decade; district; EP; episode; film; FM; footballer; game; gene; Germany; German Empire; journal; magazine; name; network; newspaper; novel; number; numeral; politician; publisher; series; show; single; song; soundtrack; station; United States; video; website
 * and could you also put the target page in those lists? Headbomb {t · c · p · b} 23:21, 20 March 2019 (UTC)
 * I am on my way to class but I can do that in a couple hours. Kadane (talk) 23:23, 20 March 2019 (UTC)
 * No rush. Enjoy class. Headbomb {t · c · p · b} 23:24, 20 March 2019 (UTC)
 * any update? Headbomb {t · c · p · b} 20:41, 22 March 2019 (UTC)
 * I got sick and fell behind. This is on my to do list today. Kadane (talk) 21:31, 22 March 2019 (UTC)

Okay all edits have been sorted by 'species' and a list of all pages can be found here. Kadane (talk) 00:09, 23 March 2019 (UTC)
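
Sorting by 'species' as requested above could be sketched like this, using case-insensitive regular expressions on the end of the redirect title. The names `SPECIES`, `species_of`, and `group_by_species` are hypothetical, and only a handful of the requested endings are included for brevity.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A few of the endings named in the discussion; the requested list is longer.
SPECIES = ["album", "song", "film", "footballer", "cricketer", "newspaper"]


def species_of(title: str) -> str:
    """Return the bucket name for a redirect title, or "other"."""
    if re.search(r"\d\)$", title):
        return "digits"  # e.g. "Typhoon Haikui (2012)"
    for name in SPECIES:
        # Qualifier ends with the species word, compared case-insensitively.
        if re.search(r"\s\(.*" + re.escape(name) + r"\)$", title, re.IGNORECASE):
            return name
    return "other"


def group_by_species(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (redirect, target) pairs by the species of the redirect title."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for redirect, target in pairs:
        groups[species_of(redirect)].append((redirect, target))
    return dict(groups)
```

Keeping the target alongside each redirect in the grouped output reflects the request to list the target page as well.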

Let's start with everything in User:KadaneBot/Task3/Edits/other/Case_3. This is something that could safely be automated. Make sure to run on the most recent version of the pages, since things may be updated. Headbomb {t · c · p · b} 00:11, 23 March 2019 (UTC)
 * Come to find out Case 3 is already taken care of by RussBot, and it ran through and tagged every article in case 3 with R to disambiguation. I could run another database query to see if there are any cases that RussBot has missed, but a task for case 3 seems redundant. What do you think?
 * Also I made 1 trial edit, which resulted in an error because of a misplaced quotation mark in my code. Going forward it will check (correctly) to see if the category has been added since the last database scan. Kadane (talk) 01:20, 23 March 2019 (UTC)
 * If Case 3 is taken care of by RussBot, then let's leave it to RussBot. We can revisit this if RussBot goes dead. Let's trial case 2 on everything in User:KadaneBot/Task3/Edits/newspaper/Case 2 then. Headbomb {t · c · p · b} 01:23, 23 March 2019 (UTC)

Okay. I found another error in my code for case 2 that resulted in articles that were already tagged being reported in the edit cases. I have fixed that bug, and it has resulted in a large reduction in edits for case 2. This error only affected the database scan and was caught during editing, when the algorithm double-checks whether it should edit.
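
The double check at edit time could resemble the sketch below, which re-reads the live wikitext and skips the page if one of the blocking tags from the function details is present. It is an assumption-laden illustration: it uses a plain regular expression rather than a wikitext parser, does not resolve template aliases, and expects the caller to supply the page's categories (the parameter `categories` is hypothetical).

```python
import re
from typing import Iterable, Set

# Tags that make the bot skip a page, per the function details.
SKIP_TEMPLATES = {
    "r with possibilities",
    "r to disambiguation page",
    "r from unnecessary disambiguation",
    "r from incomplete disambiguation",
}

# Captures the template name right after "{{", up to "|" or "}}".
TEMPLATE_NAME_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?=\||\}\})")


def template_names(wikitext: str) -> Set[str]:
    """Lower-cased template names in the wikitext, underscores read as spaces."""
    return {m.group(1).replace("_", " ").lower() for m in TEMPLATE_NAME_RE.finditer(wikitext)}


def should_edit(live_wikitext: str, categories: Iterable[str]) -> bool:
    """Re-check the live page: False if a blocking tag or category is present."""
    if template_names(live_wikitext) & SKIP_TEMPLATES:
        return False
    return "Printworthy redirects" not in set(categories)
```

Running this against the current page text, rather than the database scan, is what lets stale scan results be discarded safely.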

I have completed the trial edits. The rest were false positives. I am hesitant to mark the trial as done with only 3 edits.

May I suggest trialing either User:KadaneBot/Task3/Edits/cricketer/Case 2 (135 edits), User:KadaneBot/Task3/Edits/footballer/Case 2 (60 edits), or User:KadaneBot/Task3/Edits/politician/Case 2 (40 edits)? Kadane (talk) 01:47, 23 March 2019 (UTC)
 * I picked that category on purpose to see how it would handle those cases and not blow everything up. Side note // this is a much much better format. And while you don't have to do this, when making edits, you might as well add if you find a #Whatever in the redirect. Headbomb {t · c · p · b} 01:51, 23 March 2019 (UTC)
 * For a follow up trial, you can do 25 edits in User:KadaneBot/Task3/Edits/other/Case_2/1. Headbomb {t · c · p · b} 01:59, 23 March 2019 (UTC)


 * All edits are here. There was one error, which added R from section when it shouldn't have. I fixed this and subsequently tested it. The whitespace looks off, but that is because the template Redirect category shell already exists and the whitespace was already malformed from my removal. The bot also edited pages from another 'species'. This was operator error. My database isn't structured by species, and the view and edit code are separate. I had to introduce new code to edit just the 'other' species, since there is no specific regex for an article that fits into other. Kadane (talk) 03:10, 23 March 2019 (UTC)
 * You can do the rest of User:KadaneBot/Task3/Edits/other/Case_2/1 to see if all the kinks are worked out. Headbomb {t · c · p · b} 03:14, 23 March 2019 (UTC)

Small whitespace issues. Headbomb {t · c · p · b} 04:55, 23 March 2019 (UTC)
 * Dupe disambiguation category. Headbomb {t · c · p · b} 05:00, 23 March 2019 (UTC)
 * Weird R catshell thing. Headbomb {t · c · p · b} 05:02, 23 March 2019 (UTC)
 * Missed an R catshell opportunity. Headbomb {t · c · p · b} 05:07, 23 March 2019 (UTC)
 * Those with 'alternative' dabs should likely be skipped, or compiled in a separate list for human review. Headbomb {t · c · p · b} 05:11, 23 March 2019 (UTC)
 * should remove the dupe category for incomplete dabs. Headbomb {t · c · p · b} 05:17, 23 March 2019 (UTC)


 * Okay, I have implemented logic to fix everything you have put here so far except for the whitespace issue. I am not quite sure how to fix that using MWParserFromHell. It only affects a small number of pages; if this is something that needs to be fixed, I will figure something out in the coming days. Kadane (talk) 05:21, 23 March 2019 (UTC)
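
For reference, the nesting rule from the function details (wrap the tags in Redirect category shell once there are 2 or more) could be sketched with plain string handling as below. This is not the bot's MWParserFromHell code; `add_rcat` and the pattern are assumptions, and pages that already contain the shell are deliberately left untouched by this sketch.

```python
import re

SHELL_NAME = "Redirect category shell"
# A whole redirect-category template with no nested braces, e.g. "{{R from move}}".
RCAT_RE = re.compile(r"\{\{\s*R (?:from|to|with)[^{}]*\}\}", re.IGNORECASE)


def add_rcat(wikitext: str, new_tag: str) -> str:
    """Append {{new_tag}}, wrapping in the shell when 2+ rcats would result."""
    if SHELL_NAME.lower() in wikitext.lower():
        # Shell already present: this sketch leaves such pages to other logic.
        return wikitext
    existing = RCAT_RE.findall(wikitext)
    # Remove the loose tags, then rebuild them in one place with fixed spacing.
    stripped = RCAT_RE.sub("", wikitext).rstrip()
    tags = existing + ["{{" + new_tag + "}}"]
    if len(tags) == 1:
        return stripped + "\n\n" + tags[0] + "\n"
    body = "\n".join(tags)
    return stripped + "\n\n{{" + SHELL_NAME + "|\n" + body + "\n}}\n"
```

Rebuilding the tag block from scratch, instead of editing it in place, is one way to sidestep leftover whitespace from removed templates.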


 * One more: (see all aliases) Headbomb {t · c · p · b} 05:23, 23 March 2019 (UTC)

For the whitespace issue, I think you can have something similar to  →   and   →. Headbomb {t · c · p · b} 05:29, 23 March 2019 (UTC)


 * if you're ready to continue trial, you can tackle User:KadaneBot/Task3/Edits/other/Case_2/3. Headbomb {t · c · p · b} 23:43, 27 March 2019 (UTC)
 * Okay everything is ready. I have several deadlines in the coming days and will run the trial when real life permits. Should be no later than Saturday 6th and I am hoping that it's much earlier than that. Kadane (talk) 01:16, 28 March 2019 (UTC)


 * Here are the edits from the bot trial. I started the trial off on an old version of the source, which resulted in an error in the first 5 edits. I reverted this edit, restarted, and the bot worked as expected. Also during the trial I realized that there may be an issue with and . The bot will now skip pages in Category:Printworthy redirects or containing the template R with possibilities. I have updated the functional details. Kadane (talk) 00:08, 15 April 2019 (UTC)
 * Looks all good to me. Could you update the function overview section to reflect what the BRFA is for 'case 2' only? I'll approve after. Headbomb {t · c · p · b} 16:54, 15 April 2019 (UTC)
 * done. Kadane (talk) 17:02, 15 April 2019 (UTC)
 * Headbomb {t · c · p · b} 19:27, 15 April 2019 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.