Wikipedia:Bots/Requests for approval/WikiCleanerBot 5


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

WikiCleanerBot 5
Operator:

Time filed: 08:49, Saturday, June 15, 2019 (UTC)

Function overview: Fix some WP:WCW errors using WPCleaner

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (WPCleaner)

Source code available: On GitHub

Links to relevant discussions (where appropriate): Bots/Requests for approval/PkbwcgsBot

Edit period(s): Twice a month, with the dump analysis that I already perform, see Bots/Requests for approval/WikiCleanerBot.

Estimated number of pages affected: A few thousand articles for the initial runs spread over a few sessions, then normally only a few dozen or hundreds each time.

Namespace(s): Main

Exclusion compliant (Yes/No): Yes

Function details: As PkbwcgsBot hasn't been run for several months, I'd like to take over some of the tasks that Pkbwcgs was performing with WPCleaner. This request covers part of Bots/Requests for approval/PkbwcgsBot. It consists of automatically fixing the simple cases of the following WP:WCW errors:
 * Tags with incorrect syntax. The bot will check the articles in CheckWiki list #2 (currently 617 articles) and in the CHECKWIKI/WPC 002 dump (currently 725 articles). Only the simple cases will be fixed (like false  tags).
 * Unicode control characters. The bot will check the articles in CheckWiki list #16 (currently 2508 articles). Only the simple cases will be fixed.
 * Category duplication. The bot will check the articles in CheckWiki list #17 (currently 6328 articles) and in the CHECKWIKI/WPC 017 dump (currently 8449 articles). Only the simple cases will be fixed (like exact category duplication with the same sort key). For example, of the first 100 articles in the CHECKWIKI/WPC 017 dump, 70 are modified.
 * Tags without content. The bot will check the articles in CheckWiki list #85 (currently 831 articles). Only the simple cases will be fixed.
 * DEFAULTSORT with a blank at first position. The bot will check the articles in CheckWiki list #88 (currently 349 articles). Only the simple cases will be fixed.
 * Internal link written as an external link. The bot will check the articles in CheckWiki list #90 (currently 5715 articles). Only the simple cases will be fixed.
 * Interwiki link written as an external link. The bot will check the articles in CheckWiki list #91 (currently 2100 articles). Only the simple cases will be fixed.
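
The "exact category duplication with the same sort key" case can be illustrated with a minimal Java sketch. This is not WPCleaner's actual code: the class name `DuplicateCategorySketch`, the method `removeExactDuplicates`, and the simplified regular expression are assumptions made for illustration only. The sketch drops a category link only when the same name and the same sort key have already appeared, and leaves duplicates with differing sort keys for a human to resolve.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not WPCleaner's implementation.
class DuplicateCategorySketch {

    // Simplified pattern: [[Category:Name]] or [[Category:Name|sort key]],
    // optionally followed by one line break. Name normalization (case of the
    // first letter, underscores) is deliberately ignored here.
    private static final Pattern CATEGORY = Pattern.compile(
        "\\[\\[\\s*Category\\s*:\\s*([^\\]|]+?)\\s*(?:\\|([^\\]]*))?\\]\\]\\n?");

    static String removeExactDuplicates(String wikitext) {
        Set<String> seen = new HashSet<>();
        Matcher matcher = CATEGORY.matcher(wikitext);
        StringBuilder result = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            String name = matcher.group(1);
            String sortKey = matcher.group(2); // null when no sort key is given
            // The name cannot contain '|', so this key is unambiguous.
            String key = name + (sortKey == null ? "" : "|" + sortKey);
            result.append(wikitext, last, matcher.start());
            if (seen.add(key)) {
                // First occurrence of this name and sort key: keep as written.
                result.append(matcher.group());
            }
            // Later exact duplicates are skipped, including their line break.
            last = matcher.end();
        }
        result.append(wikitext.substring(last));
        return result.toString();
    }
}
```

With this sketch, a second `[[Category:A]]` is removed, while `[[Category:B|x]]` and `[[Category:B|y]]` are both kept because their sort keys differ.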

Discussion
Please run 20 edits for each proposed task. Primefac (talk) 12:31, 15 June 2019 (UTC)
 * Thanks! Here are the results:
 * (tags with incorrect syntax): 20 edits, no problems detected.
 * (unicode control characters): 20 edits, no problems detected.
 * (category duplication): 20 edits, no problems detected.
 * (tags without content): 20 edits. I'm wondering what to do when there are comments inside a tag without content (gallery tags: Ana Vidjen, Andrews County Veterans Memorial, Battle of Naseby, Catherine Marks; noinclude tags: Barnet Copthall): keep the automatic fix as it is now, comment out the tag itself, or do nothing. The answer can differ depending on the tag.
 * With respect to commented-out markup, I'd leave them alone in case the comment markup is ever removed (e.g. if the file(s) is/are restored). Jo-Jo Eumerus (talk, contributions) 14:22, 16 June 2019 (UTC)
 * Jo-Jo Eumerus: to be on the safe side, I've modified WPC so that it no longer automatically removes tags without content when there are comments inside them. --NicoV (Talk on frwiki) 17:22, 17 June 2019 (UTC)
 * (DEFAULTSORT with a blank at first position): 20 edits, no problems detected.
 * (internal link written as an external link): 20 edits, no problems detected.
 * (interwiki link written as an external link):
 * 4 edits, a problem detected on the 4th edit on Azerbaijan State Philharmonic Hall. I've modified WPC not to automatically replace the external link when it's not surrounded by square brackets.
 * 6 edits, a problem detected on the 6th edit on Counties of Norway. I've modified WPC not to automatically replace the external link when there's no text provided.
 * 10 edits, no problems detected.
 * --NicoV (Talk on frwiki) 14:17, 15 June 2019 (UTC)
 * BAG assistance needed --NicoV (Talk on frwiki) 13:57, 25 July 2019 (UTC)
 * 20 edits each for and . Headbomb {t · c · p · b} 04:07, 6 August 2019 (UTC)
 * Thanks Headbomb. Here are the results:
 * (tag without content): 20 more edits, no problems detected.
 * (interwiki link written as an external link): 20 more edits, no problems detected.
 * --NicoV (Talk on frwiki) 20:48, 6 August 2019 (UTC)
 * This would be much better than this. Headbomb {t · c · p · b} 21:01, 6 August 2019 (UTC)
 * If you want, I can also remove the line break when the empty tag is on the first line and alone on that line. For other cases (not on the first line), removing the line break may have side effects. What do you say? --NicoV (Talk on frwiki) 21:22, 6 August 2019 (UTC)
 * Should be for otherwise empty lines only. Headbomb {t · c · p · b} 21:34, 6 August 2019 (UTC)
 * The problem is that it will change the display in some situations, see below. --NicoV (Talk on frwiki) 22:53, 6 August 2019 (UTC)
 * I've modified WPC to remove extra blank lines (if there are 2 or more in a row, or if they are at the beginning or the end of the article). Result on the same article that you reported. --NicoV (Talk on frwiki) 19:32, 7 August 2019 (UTC)
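
The two safeguards added during the interwiki-link trial (replace only when the external link is surrounded by square brackets, and only when link text is provided) can be sketched in Java as follows. This is an illustration under stated assumptions, not WPCleaner's code: the class `InterwikiLinkSketch`, the method `convert`, the restriction to `xx.wikipedia.org` hosts, and the regular expression are all hypothetical simplifications.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not WPCleaner's implementation.
class InterwikiLinkSketch {

    // Matches only bracketed links that carry a non-empty label:
    //   [https://xx.wikipedia.org/wiki/Title label]
    // Bare URLs and bracketed URLs without a label never match.
    // URLs with a query, fragment, or percent-encoding are skipped as "not simple".
    private static final Pattern BRACKETED_WITH_TEXT = Pattern.compile(
        "\\[https?://([a-z]{2,3})\\.wikipedia\\.org/wiki/([^\\s\\]\\[?#%]+) +"
        + "([^\\]\\[\\n]*[^\\]\\[\\s])\\]");

    static String convert(String wikitext) {
        Matcher matcher = BRACKETED_WITH_TEXT.matcher(wikitext);
        StringBuilder result = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            String lang = matcher.group(1);
            String title = matcher.group(2).replace('_', ' ');
            String label = matcher.group(3);
            result.append(wikitext, last, matcher.start());
            // Inline interlanguage link syntax: [[:lang:Title|label]]
            result.append("[[:").append(lang).append(':').append(title)
                  .append('|').append(label).append("]]");
            last = matcher.end();
        }
        result.append(wikitext.substring(last));
        return result.toString();
    }
}
```

In this sketch, `[https://fr.wikipedia.org/wiki/Tour_Eiffel the tower]` becomes `[[:fr:Tour Eiffel|the tower]]`, while a bare URL or a bracketed URL without text is left untouched.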

Example:

Line 1 before noinclude tag
<noinclude></noinclude>
Line 2 after noinclude tag

Before removal of the empty tag:

Line 1 before noinclude tag Line 2 after noinclude tag

After removal of the empty tag (keeping the empty line): same display

Line 1 before noinclude tag

Line 2 after noinclude tag

After removal of the empty tag (removing the empty line): modified display

Line 1 before noinclude tag Line 2 after noinclude tag
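
The behaviour discussed above (remove an empty tag, skip it when it contains a comment, then tidy up blank lines) can be sketched in Java. This is only an assumption-laden illustration, not WPCleaner's code: the class `EmptyTagSketch` and the method `removeEmptyNoinclude` are hypothetical, only `noinclude` tags are handled, and blank lines are normalized across the whole text rather than just around the removed tag.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not WPCleaner's implementation.
class EmptyTagSketch {

    // A noinclude pair containing nothing but whitespace. A pair that contains
    // an HTML comment does not match, so it is left alone.
    private static final Pattern EMPTY_NOINCLUDE =
        Pattern.compile("<noinclude>\\s*</noinclude>");

    static String removeEmptyNoinclude(String wikitext) {
        String text = EMPTY_NOINCLUDE.matcher(wikitext).replaceAll("");
        if (text.equals(wikitext)) {
            return wikitext; // nothing removed, so do not touch blank lines
        }
        // Collapse 2 or more consecutive blank lines into a single blank line.
        text = text.replaceAll("\\n{3,}", "\n\n");
        // Drop blank lines at the very beginning of the article.
        text = text.replaceAll("\\A\\n+", "");
        // Reduce trailing line breaks at the end of the article to one.
        text = text.replaceAll("\\n+\\z", "\n");
        return text;
    }
}
```

Applied to the example above, the sketch removes the tag but keeps the single empty line between Line 1 and Line 2, so the two lines remain separate paragraphs.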


 * 20 edits to check whether that case is handled correctly and whether the change causes oddities elsewhere. Have a mix of that case and others in the trial if possible. Headbomb {t · c · p · b} 20:10, 7 August 2019 (UTC)
 * Here are the new edits:
 * Feminism in Sweden: span tags in the middle of a sentence
 * FK Dubnica: gallery tags in their own lines
 * FC Epfendorf 1929: center tags in table cells
 * Eretz Yisrael Shelanu: div tags at the end of table
 * Elsa Cladera de Bravo: includeonly tags in the middle of a sentence
 * Dominic Fotia: gallery tags at the beginning of the article
 * Domadugu: div tags on their own lines
 * District of Columbia and United States Territories Quarter: noinclude tags at the beginning of the article
 * History of agriculture: div tags spanning 2 lines
 * Hidden message: includeonly tags in the middle of a sentence
 * Heritage Day (South Africa): includeonly tags in the middle of a sentence
 * Heidi Quante: gallery tags spanning 2 lines
 * H. M. Khoja: gallery tags spanning 2 lines
 * God's Favorite Customer: includeonly tags at the beginning of the article
 * Geneva fusillade of 9 November 1932: span tags in the middle of a sentence
 * Fredericton shooting: includeonly tags at the beginning of a sentence
 * Frank Dorsa: gallery tags at the beginning of the article
 * Jacques Delors: includeonly tags at the beginning of the article
 * Jakobstad Museum: gallery tags at the end of the article
 * Isleworth Mona Lisa: includeonly tags at the beginning of the article
 * --NicoV (Talk on frwiki) 21:29, 7 August 2019 (UTC)

is technically cosmetic in many cases, but I feel it's editor-hostile enough to deal with it through a bot. Ping me if there's pushback on that task. Headbomb {t · c · p · b} 22:12, 7 August 2019 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.