Wikipedia:Bots/Requests for approval/WikiCleanerBot 2


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

WikiCleanerBot 2
Operator:

Time filed: 17:25, Monday, February 25, 2019 (UTC)

Function overview: To fix ISSNs that use incorrect syntax. As described in ISSN, the correct syntax for an ISSN is "an eight digit code, divided by a hyphen into two four-digit numbers".

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (WPCleaner)

Source code available: On Github

Links to relevant discussions (where appropriate): Maintenance task for

Edit period(s): At most, twice a month, following the dump analysis that I already perform, see Bots/Requests for approval/WikiCleanerBot.

Estimated number of pages affected: At most a few hundred pages for the first complete run, revised down from an initial estimate of around a thousand. Pages with such problems are listed in CHECKWIKI/WPC 106 dump, which currently contains 420 pages (1315 before the list was refined). Probably no more than a few dozen on each run after that, given how the number of pages in the list evolves.

Namespace(s): Main namespace

Exclusion compliant (Yes/No): No, because there's no reason to use an incorrect syntax for an ISSN instead of the correct one.

Function details: Based on the list generated on CHECKWIKI/WPC 106 dump, the bot will only fix trivial problems (like a missing hyphen in the ISSN number, extra whitespace characters...) and will leave the more complex ones to be fixed by a human. This will shorten the list considerably, so human editors can concentrate on the remaining problems.
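The kind of trivial fix described above can be sketched in Java as follows. This is only an illustration and not WPCleaner's actual code: the class name `IssnSyntaxFixer`, its methods, and the exact set of dash characters accepted are all assumptions made for the example. It also assumes that the last character may be the check character X, which goes slightly beyond the "eight digit code" wording quoted in the overview. Anything that does not fit the trivial pattern is left untouched for a human editor.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of WPCleaner: proposes a fix for trivially malformed ISSNs. */
final class IssnSyntaxFixer {

    // Canonical form: four digits, an ASCII hyphen, three digits, then a digit or X.
    private static final Pattern CANONICAL = Pattern.compile("\\d{4}-\\d{3}[\\dX]");

    // Trivially fixable form: the same eight characters, with optional whitespace and
    // either no separator or a hyphen/dash/minus-like character (assumed set).
    private static final Pattern FIXABLE = Pattern.compile(
            "\\s*(\\d{4})\\s*[-\\u2010-\\u2015\\u2212]?\\s*(\\d{3}[\\dXx])\\s*");

    private IssnSyntaxFixer() {
    }

    static boolean isCanonical(String issn) {
        return CANONICAL.matcher(issn).matches();
    }

    /**
     * Returns the corrected ISSN when the input needs a fix and the fix is trivial.
     * Returns an empty Optional when the input is already correct or is too
     * irregular to be repaired automatically.
     */
    static Optional<String> fixTrivial(String raw) {
        if (isCanonical(raw)) {
            return Optional.empty(); // already correct: editing would change nothing
        }
        Matcher m = FIXABLE.matcher(raw);
        if (!m.matches()) {
            return Optional.empty(); // complex case: leave it to a human editor
        }
        return Optional.of(m.group(1) + "-" + m.group(2).toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] samples = {"00062510", " 0006 - 2510 ", "0006-2510", "0006-25"};
        for (String sample : samples) {
            System.out.println("[" + sample + "] -> " + fixTrivial(sample).orElse("(no automatic fix)"));
        }
    }
}
```

Returning an empty result both for correct input and for input that cannot be repaired keeps the caller simple: a page is edited only when a replacement string is actually proposed.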

For the bot flag, I currently don't have it, and I would like to keep it that way (or if need be, only added temporarily for the first run).

Discussion
If you will be operating from the dump, could you not do a dry run outputting to CHECKWIKI/WPC 106 dump so its handling of the pathological cases there can be inspected? --Xover (talk) 17:48, 25 February 2019 (UTC)
 * Hi Xover. The dump analysis is performed independently and produces several analyses (CHECKWIKI/WPC all); I would prefer to keep it separate from automatic fixing. --NicoV (Talk on frwiki) 18:05, 25 February 2019 (UTC)
 * But if you want to know which pages won't be fixed by the bot, I can do a dry run on my computer and give the list of fixed pages. --NicoV (Talk on frwiki) 18:06, 25 February 2019 (UTC)
 * I was more interested in seeing the before→after list. Several of the instances listed in the WPC 106 dump looked like they would be hard to fix automatically, so if the output of a dry run could be inspected it might provide a priori confidence that the task won't mess anything up. A dry run might be more efficient / reduce the need for a trial period with live edits (but I speak only for myself: the BAG may see it differently). --Xover (talk) 18:24, 25 February 2019 (UTC)
 * Ok, I understand. I will see if I can do something. The idea is to fix only trivial cases automatically, the hard ones will be left to human editors, and I will check what the results are before doing an actual run. --NicoV (Talk on frwiki) 09:28, 26 February 2019 (UTC)

Comment: The dump list appears to have some false positives on it. I picked one page at random, Pocket Dwellers, and there is an ISSN of 00062510 listed within a citation template. This ISSN is valid within a CS1 template; articles with invalid ISSNs are placed in. The template handles this unhyphenated ISSN format with no trouble, displaying properly with a hyphen. It should not be "corrected"; the bot would be making a cosmetic edit, leaving the rendered page unchanged. Perhaps the dump analysis should be corrected before this bot attempts to modify articles based on the list. – Jonesey95 (talk) 17:56, 25 February 2019 (UTC)
 * Hi Jonesey95. On other wikis like frwiki, the templates don't add the hyphen by themselves. If ISSNs without the hyphen have to be considered correct on enwiki for some templates, then I will first need to add an option in WPCleaner for this (and then regenerate the page CHECKWIKI/WPC 106 dump to check that the false positives are removed) before implementing the automatic replacement. I will post here when this part is done. --NicoV (Talk on frwiki) 18:05, 25 February 2019 (UTC)
 * Thanks. It looks like ISSN does not add the hyphen, but the CS1 citation templates do so. Just to see if I had gotten unlucky, I picked four more articles at semi-random from the list, limiting my "random" choices to articles that were displaying eight digits as the erroneous string. All four articles: Acritogramma metaleuca, Capri (cigarette), David Mba, and Ensoniq VFX contain no ISSN errors. I believe that the dump analysis needs to be debugged before this task can be run. It is possibly telling that there are only 65 pages in the three ISSN error categories combined. – Jonesey95 (talk) 18:16, 25 February 2019 (UTC)
 * Jonesey95. I've modified my code so that WPCleaner can be told that some templates automatically add the hyphen if it's missing, so the articles you mentioned won't be reported anymore. I'm currently running an update of CHECKWIKI/WPC 106 dump to see what will be left. --NicoV (Talk on frwiki) 09:24, 26 February 2019 (UTC)

Page CHECKWIKI/WPC 106 dump has been updated to avoid reporting a missing hyphen when the template automatically adds it to the displayed result. Only 420 pages remain, compared to the initial 1315. I could probably also remove reports for internal links to pages like ISSN 1175-5326 which exist, but even if they are reported, the bot won't fix anything there. With the current algorithm, a dry run modifies 115 of the 420 pages.


 * Anatis ocellata
 * 1919 in women's history
 * Acta Musicologica
 * Adalia decempunctata
 * Abraham Moss Community School
 * Ancient astronauts
 * Anisosticta novemdecimpunctata
 * Aphidecta obliterata
 * Arms and the Man
 * Arthur de la Mare
 * Avenger (1981 video game)
 * Bernard Skinner (entomologist)
 * CFE CFE738
 * Canadian Independent Record Production Association
 * Carolyn Muessig
 * Charles A. Alluaud
 * Charles I Louis, Elector Palatine
 * Chen Jian (academic)
 * Chilocorus bipustulatus
 * Clark University
 * Coast Mountain Bus Company
 * Coccidula rufa
 * Coccinella hieroglyphica
 * Coccinella septempunctata
 * Coccinella undecimpunctata
 * Court of Justice of the European Union
 * Cream-spot ladybird
 * DNM2
 * Darasuram
 * Diaphorodoris olakhalafi
 * Don't Make Me Go
 * Dow University of Health Sciences
 * Edgemere Landfill
 * Eighteen-spotted ladybird
 * Eublemma ostrina
 * Executive order
 * Ford Bronco
 * Forgiveness
 * Gangaikonda Cholapuram
 * Garcilaso de la Vega (poet)
 * General Staff Corps
 * George Seldes
 * Georgia Makhlouf
 * Great Kei River
 * Great Lakes Twa
 * Halyzia sedecimguttata
 * HeRAMS
 * Hippodamia tredecimpunctata
 * Human sacrifice in Aztec culture
 * Hypericum canadense
 * Iran–Iraq War
 * Jan Miodek
 * Jan-Gunnar Isberg
 * Jim Forest
 * Julieanna Preston
 * Junzaburō Nishiwaki
 * Kaprun disaster
 * Konik
 * Krymchak language
 * Kuala Terengganu
 * Laodice of the Sameans
 * LaserPacific
 * Lee Cazort
 * Mahir Tomruk
 * Marienbad (video game)
 * Martin Yeoman
 * McKinsey & Company
 * Microbial oxidation of sulfur
 * Middle-range theory (archaeology)
 * Minchiate
 * Miranda Esmonde-White
 * Mpenjati Nature Reserve
 * Myzia oblongoguttata
 * Nephus redtenbacheri
 * New Zealand Threat Classification System
 * New Zealand bat flea
 * Next in Line (Johnny Cash song)
 * Nicole de Weever
 * Nijat Sirel
 * Operation Thunderbolt (1997)
 * Phase-change memory
 * Propylea quatuordecimpunctata
 * Quagga
 * RCA Records
 * Renault FT
 * Resonant inductive coupling
 * Reza Pahlavi, Crown Prince of Iran
 * Ripple Rock
 * Rosemary Bryant Mariner
 * Sarasota School of Architecture
 * School Without Walls (Canberra)
 * Scymnus auritus
 * Scymnus suturalis
 * Second Sudanese Civil War
 * Shamordino Convent
 * Simon Zadek
 * Sino-Indian War
 * Sir William Curtius
 * Stanley, the Ugly Duckling
 * Striker (comic)
 * Subcoccinella vigintiquatuorpunctata
 * Symplocos paniculata
 * Ted Nugent
 * The Mauritius Command
 * Timeline of women in science
 * Tropiocolotes wolfgangboehmei
 * Tulbaghia violacea
 * Tytthaspis sedecimpunctata
 * Universal Music Group Nashville
 * Urengoy–Pomary–Uzhhorod pipeline
 * Watsons
 * Wilfrid Cracroft Ash
 * Winston Branch
 * Yohimbine
 * You Are Happy

--NicoV (Talk on frwiki) 12:36, 26 February 2019 (UTC)
 * That list looks much more reasonable. There are still some weird ones in there, like You Are Happy, where issn was being used in a WorldCat template, which doesn't support that parameter. Also, it looks like dashes, as in Iran–Iraq War and The Mauritius Command and Resonant inductive coupling, are also silently converted to hyphens by CS1 templates, so those don't need to be fixed and should be removed from the WPCleaner report.
 * I can also add an option to ignore such cases where the dash is automatically replaced, like I did for the missing hyphen. But is it a good idea to keep incorrect syntax just because the template itself will fix it?
 * For the nonexistent parameter in a WorldCat template, I think I will leave it as it is and a hyphen will be added; there are only a few pages like that. --NicoV (Talk on frwiki) 14:02, 26 February 2019 (UTC)


 * In a case like Tytthaspis sedecimpunctata, will the bot/script apply the ISSN template, making the ISSN actually useful, or will it just replace the dash with a hyphen? – Jonesey95 (talk) 13:23, 26 February 2019 (UTC)
 * Currently, it will simply replace the dash with a hyphen, but I can add a feature to use a template instead. --NicoV (Talk on frwiki) 14:02, 26 February 2019 (UTC)
 * I think replacing a plain-text ISSN with a template is a good idea in nearly every case.
 * I don't want to rain on your parade, but at this point, it looks like a periodic supervised AWB run, combined with a bit more tweaking of the WPCleaner report, might be the best option. The risk of cosmetic edits by the bot (and AWB, unless it is watched carefully) is high. With considerably fewer than 100 pages fixable by the proposed bot, a script may be better. If you still want to get this task bot-flagged in order to avoid cluttering people's watchlists, of course, I would support that. – Jonesey95 (talk) 14:04, 26 February 2019 (UTC)
 * I will try several modifications to limit the number of false positives in the generated list (which is good in itself), and we'll see then what is the best course of action. --NicoV (Talk on frwiki) 16:38, 26 February 2019 (UTC)


 * Even if it means fixing fewer than 100 pages in the end, I'm still interested in running at least a test run. For the test run, if it's accepted, I will proceed one page at a time (after each modification, WPCleaner will ask me if it should proceed, so I will be able to check thoroughly before going to the next article). Running a script would be a good idea, but as no one is proposing to create it and run it (the list has been available for years), I think it's worthwhile to run WPCleaner on this. After the test run, we can still decide whether it's worth running it periodically or not. --NicoV (Talk on frwiki) 11:05, 23 March 2019 (UTC)
 * BAG assistance needed : can I do a test run? As explained, after each modification, WPCleaner will ask me if it should proceed, so I will check each edit before letting it do the next one. If it's ok, tell me how many modifications I can make. --NicoV (Talk on frwiki) 13:21, 1 April 2019 (UTC)

Primefac (talk) 20:32, 4 April 2019 (UTC)

Primefac I've done the 50 edits, they can be checked in this list. I've seen no problem in the edits. --NicoV (Talk on frwiki) 14:03, 8 May 2019 (UTC)
 * So I'm a bit confused here on one part. This account currently does have the bot flag as a result of another task approval - why would you NOT want to flag repeatable minor edits as bot (thus flooding recent changes and watchlists unnecessarily)? Keep in mind that the bot flag gives you access to the bot attribute on edits, but you don't have to assert it (if you are depending on someone else's framework you may not have the choice). If you don't want the bot flag for some tasks, but do for others - but you don't have the capability of controlling this in your requests - you will need to create a separate account. How do you want to deal with this? —  xaosflux  Talk 13:10, 15 May 2019 (UTC)
 * When I submitted this request, I hadn't considered whether I would submit others, and this one took a long time to do. I removed the message about the bot flag; it's ok if I run this task with the bot flag and tag my edits as such. I will check my other tasks, especially Bots/Requests for approval/WikiCleanerBot, to see if they would be better run without tagging the edits as bot: if so, I will manage it on my side. --NicoV (Talk on frwiki) 13:17, 15 May 2019 (UTC)

BAG assistance needed: Any decision? --NicoV (Talk on frwiki) 17:15, 12 June 2019 (UTC)
 * Primefac (talk) 12:23, 15 June 2019 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.