Wikipedia:Bots/Requests for approval/WikiCleanerBot 16


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

WikiCleanerBot 16
Operator:

Time filed: 14:49, Saturday, April 25, 2020 (UTC)

Function overview: Fix CHECKWIKI error 92 (Headline double).

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (WPCleaner)

Source code available: On GitHub (especially algorithm 92)

Links to relevant discussions (where appropriate): Wikipedia_talk:WikiProject_Check_Wikipedia

Edit period(s): Twice a month

Estimated number of pages affected: At first, I included only Main-namespace pages from the CHECKWIKI/WPC 092 dump (a dry run on the 11598 pages resulted in modifications to 386 pages). I then also included File-namespace pages in the dump analysis, keeping only pages where the duplicate headings were consecutive (a dry run on the 12040 pages resulted in modifications to 5455 pages).

Namespace(s): Main + File

Exclusion compliant (Yes/No): Yes

Function details: The bot will remove some of the redundant duplicated headlines found in pages, but only when the duplicates are consecutive.

I have already run a similar task on frwiki, with 23 edits in Main and around 200 edits in File.

Discussion
Couple of questions
 * a) If this task is "off" at WP:CWERRORS, why do we need a bot to run this task?
 * b) Will the bot only remove headers where it's ==<header>== <whitespace> ==<header>==?

Primefac (talk) 19:05, 11 May 2020 (UTC)
 * Hi Primefac.
 * a) I think the task is currently "off" at WP:CWERRORS because it was producing too many false positives. WPCleaner's detection is restricted to consecutive titles and a maximum level of 3 for the titles.
 * Reactivating this detection was requested by Jonteemil in this discussion.
 * I tested this detection on frwiki, and all the pages reported had actual problems with the headlines or the content (various situations). I fixed all of them, either automatically for simple situations (most of the pages in File: were in such situations, as seems to be the case here), or manually for the others.
 * b) The bot will only remove headers for non-ambiguous situations, leaving more complex situations for humans to fix.
 * Non-ambiguous situations can also include things like ==<header>== <text> ==<header>== <text> <other_text> (both sections have the same content, or one section starts with the same text as the other and then continues with other text), but this is less frequent.
 * --NicoV (Talk on frwiki) 10:48, 12 May 2020 (UTC)
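
For readers who want to see the simplest case above in concrete terms, here is a minimal Java sketch of the ==<header>== <whitespace> ==<header>== rule. It is an illustration only, not WPCleaner's algorithm 92: the class name `DuplicateHeadingSketch`, the method `removeConsecutiveDuplicates`, and the `maxLevel` parameter are hypothetical, and it assumes that "duplicate" means same level and same title, and that "a maximum level of 3" means levels 1 to 3 are eligible. It does not attempt the less frequent case where two sections share the same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; NOT the actual WPCleaner implementation. */
final class DuplicateHeadingSketch {

    // A heading line such as "== Summary ==": group 1 is the run of '=', group 2 the title.
    private static final Pattern HEADING =
            Pattern.compile("^(=+)\\s*(.+?)\\s*\\1\\s*$");

    private DuplicateHeadingSketch() {}

    /**
     * Drops a heading when it repeats the previous heading (same level, same title)
     * with only blank lines in between. Headings deeper than maxLevel are never
     * removed (assumed reading of "maximum level of 3").
     */
    static String removeConsecutiveDuplicates(String wikitext, int maxLevel) {
        List<String> out = new ArrayList<>();
        List<String> pendingBlanks = new ArrayList<>();
        // "level:title" of the last kept heading; null once other content follows it.
        String lastKey = null;

        for (String line : wikitext.split("\n", -1)) {
            if (line.trim().isEmpty()) {
                // Hold blank lines until we know whether a duplicate heading follows.
                pendingBlanks.add(line);
                continue;
            }
            Matcher m = HEADING.matcher(line);
            String key = null;
            if (m.matches()) {
                int level = m.group(1).length();
                key = level + ":" + m.group(2);
                if (level <= maxLevel && key.equals(lastKey)) {
                    // Consecutive duplicate: drop it together with the blank lines before it.
                    pendingBlanks.clear();
                    continue;
                }
            }
            out.addAll(pendingBlanks);
            pendingBlanks.clear();
            out.add(line);
            lastKey = key;
        }
        out.addAll(pendingBlanks);
        return String.join("\n", out);
    }
}
```

With `maxLevel` set to 3, a page containing `== Summary ==`, a blank line, and a second `== Summary ==` keeps only the first heading, while the same heading repeated after intervening text is left for a human to review.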


 * Primefac (talk) 18:01, 22 May 2020 (UTC)
 * Thanks, I've done 50 edits. I didn't see any problems in the edits. Fixes also take into account cases like:
 * is simplified into: 1922 Manitoba general election, ...
 * --NicoV (Talk on frwiki) 15:24, 24 May 2020 (UTC)

As per usual, if amendments to - or clarifications regarding - this approval are needed, please start a discussion on the talk page and ping. -- The SandDoctor Talk 05:30, 27 May 2020 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.