Wikipedia:Bots/Requests for approval/WikiCleanerBot 18


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

WikiCleanerBot 18
Operator:

Time filed: 13:40, Friday, June 12, 2020 (UTC)

Function overview: Fix some nowiki tags after internal links (cf. CHECKWIKI/WPC 553 dump).

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (WPCleaner)

Source code available: On GitHub (especially algorithm 553)

Links to relevant discussions (where appropriate):

Edit period(s): Twice a month

Estimated number of pages affected: About 10k pages were found during the dump analysis; not all can be fixed automatically, so expect a few thousand edits.

Namespace(s): Main

Exclusion compliant (Yes/No): Yes

Function details: Tools like VE or CX tend to create internal links with incorrect formatting (the hyperlink does not cover all the letters), because the user doesn't always select exactly the text the link should apply to. Some of these errors can be fixed automatically (see for example what my bot did on frwiki for several thousand articles). Examples of situations where the bot can automatically fix the internal link:
 * ’Ori tahiti,  replaced by  : displayed text is the same as the target of the link
 * Şabran (raion),  replaced by  : "s" is configured on frwiki as a possible extension (plural). The configuration for enwiki will also include "s"; after a first pass, I will see from what is left whether other extensions can be added.
 * Œdipe et le Sphinx,  replaced by  : whitespace after the nowiki makes it useless.
 * İbrahim Tatlıses,  replaced by  : "s" is configured on frwiki as a possible extension (plural).

After the first run on frwiki, I'm adding some other automatic fixing abilities to the bot:
 * Albert Rhys Williams,  replaced by  : displayed text is the same as the target of the link minus the text after the opening parenthesis
 * Amarok (mythologie),  replaced by  : displayed text is the same as the target of the link, regardless of uppercase/lowercase
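
The rules listed above can be sketched roughly as follows. This is only an illustration in Python, not WPCleaner's actual algorithm 553 (which is written in Java): the wikitext shapes `[[target|text]]<nowiki/>suffix`, the function name `fix_nowiki_after_link` and the `EXTENSIONS` setting are assumptions made for the example.

```python
import re

# Hypothetical configuration: suffixes accepted as a plain extension of a link.
EXTENSIONS = ("s",)

# Assumed shape: [[target]] or [[target|text]], then <nowiki/>, then trailing letters.
PATTERN = re.compile(
    r"\[\[(?P<target>[^\[\]|]+)(?:\|(?P<text>[^\[\]|]*))?\]\]"
    r"<nowiki\s*/>"
    r"(?P<suffix>\w*)"
)


def fix_nowiki_after_link(wikitext: str) -> str:
    """Apply the automatic fixes sketched in the function details."""

    def repl(m: re.Match) -> str:
        target = m.group("target")
        text = m.group("text") or target
        suffix = m.group("suffix")
        link = m.group(0)[: m.group(0).index("<nowiki")]

        if not suffix:
            # Whitespace (or end of text) after the nowiki makes it useless.
            nxt = m.string[m.end(): m.end() + 1]
            return link if nxt == "" or nxt.isspace() else m.group(0)

        full = text + suffix
        # Target without the text after the opening parenthesis.
        base = re.sub(r"\s*\(.*$", "", target)
        if full.lower() in (target.lower(), base.lower()):
            # Displayed text matches the target (case ignored): merge it.
            return f"[[{target}]]" if full == target else f"[[{target}|{full}]]"
        if suffix in EXTENSIONS:
            # Configured extension: drop the nowiki, keep the suffix after the link.
            return link + suffix
        return m.group(0)  # Not safe to fix automatically.

    return PATTERN.sub(repl, wikitext)
```

Anything that does not fit one of these cases is left untouched, which matches the remark that not all pages can be fixed automatically.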

Discussion
Primefac (talk) 23:55, 15 June 2020 (UTC)
 * Comment: Thanks for taking this on. It looks uncontroversial. Do you know if there is a phabricator bug report so that this can get fixed in VE? – Jonesey95 (talk) 17:35, 12 June 2020 (UTC)
 * Hi Jonesey95. I don't know if there's a specific phabricator bug report for this, but I know the subject of incorrect links created by VE has been a long-standing issue... For example, you also have many links that are to an unrelated article (see for example, the list I'm generating on each dump analysis for frwiki for internal links like ). --NicoV (Talk on frwiki) 14:24, 14 June 2020 (UTC)
 * I haven't seen a bug report for that issue. I will be happy to file one. Do you have links to diffs? We don't link to years on en.WP, but I imagine that there are incorrect links being generated somewhere, given all of the other link-related bugs with VE. – Jonesey95 (talk) 15:27, 14 June 2020 (UTC)
 * Hi Jonesey95. A few examples gathered from Recent changes with nowiki tag, just by looking at the last 20 edits:
 * Mesta:  replaced by
 * Chilean Army:  added.
 * Manuel Romero Rubio:  replaced by
 * Fox Networks Group:  replaced by
 * Attica Scott:  added
 * New Democratic Party:  and   added.
 * As you can see, it's quite frequent (and most of the other nowiki tags are just different problems...). I gave up on reporting this kind of thing to the VE team; I reported them years ago... --NicoV (Talk on frwiki) 06:07, 15 June 2020 (UTC)
 * Jonesey95. If you were speaking about examples of links with an incorrect target, I don't have diffs. I noticed articles with such problems when doing some trial edits, but I didn't try to find where they come from:
 * 2015 New South Wales Cup:
 * Akaoni Studio:
 * They are hard to track by a bot (except for the dates, that's why I added #526 for frwiki). --NicoV (Talk on frwiki) 06:21, 15 June 2020 (UTC)
 * Jonesey95. Even if you're not supposed to link to years on en.WP, I just started a dump analysis for #526, and it quickly found articles with such problems... Maybe some are false positives.
 * Australian Labor Party:
 * Clement Attlee:
 * Ducati Motor Holding S.p.A.:
 * European Free Trade Association:  and
 * Spenser (character):
 * William Ewart Gladstone:
 * 549:
 * CHECKWIKI/WPC 526 dump should be generated in a few hours. --NicoV (Talk on frwiki) 06:53, 15 June 2020 (UTC)
 * Jonesey95. More than 6k pages listed in CHECKWIKI/WPC 526 dump. --NicoV (Talk on frwiki) 18:06, 15 June 2020 (UTC)
 * Most of those links in the WPC 526 dump look OK to me, per WP:YEARLINK. (sorry, I misread the first few links; I see that most of them appear to link to the wrong year.) It is links like 1999 that are typically (but not always) discouraged, per WP:YEARLINK. The real problem links are the ones like . Over 6,000! Wow. – Jonesey95 (talk) 18:57, 15 June 2020 (UTC)
 * One note: I believe that many of the links related to sports seasons are intentional, like, because, as the article says, "The 1998 Pro Bowl was the NFL's all-star game for the 1997 season." In the US, American football seasons take place almost entirely in the second half of a given year, with the post-season games at the beginning of the following year but designated as part of the previous year's "season". If that makes sense. If there is any way to avoid changing links where the link text is one number higher than the target year, please do so pending further discussion. – Jonesey95 (talk) 03:33, 16 June 2020 (UTC)
 * Hi Jonesey95. I can try to ignore  when xxxx=yyyy+1. Do you think the same reason applies to the election links (2 in the above examples), or would those be real problems that get missed? Or do I need to configure the list of "..." for which xxxx=yyyy+1 should be ignored? The incorrect-link problem for years is just the tip of the iceberg for incorrect links, but I don't know how I can find all the other ones... --NicoV (Talk on frwiki) 09:22, 16 June 2020 (UTC)
 * The election links generally take the form xxxx=yyyy-1, like 1836 United States presidential election, where the election took place in one year (in November), but the dispute over it took place while votes were being counted in the following months. I think the bot might need to ignore all cases where the years are different by one (higher or lower), since it will run into context problems. The links that differ by more than one look like they are mostly typos and copy/paste errors. – Jonesey95 (talk) 14:10, 16 June 2020 (UTC)
 * Hi Jonesey95. I've modified the detection to allow configuring the minimum difference, so next time the list is generated, it will be trimmed down a bit. I think we should continue the discussion elsewhere, like Wikipedia talk:WPCleaner. I don't think it's possible to fix this error automatically (sometimes the link is correct, sometimes the displayed year is correct): on frwiki, I'm just adding a template after the link to request help from editors to fix the link. --NicoV (Talk on frwiki) 06:00, 17 June 2020 (UTC)
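
For illustration, a detection with a configurable minimum difference might look like the sketch below. It is not the WPCleaner code: the link shape `[[yyyy ...|xxxx]]`, the function name `suspicious_year_links` and the `min_difference` parameter are assumptions made for the example.

```python
import re

# Assumed shape: a piped link whose target starts with a 4-digit year
# and whose displayed text is exactly a 4-digit year.
YEAR_LINK = re.compile(
    r"\[\[(?P<target_year>\d{4})\b[^\[\]|]*\|(?P<text_year>\d{4})\]\]"
)


def suspicious_year_links(wikitext: str, min_difference: int = 2) -> list[str]:
    """Return links whose displayed year differs from the target year
    by at least min_difference (hypothetical setting)."""
    found = []
    for m in YEAR_LINK.finditer(wikitext):
        diff = abs(int(m.group("target_year")) - int(m.group("text_year")))
        if diff >= min_difference:
            found.append(m.group(0))
    return found
```

With `min_difference=2`, links that are off by one in either direction (sports seasons, elections) are skipped, while larger gaps are still listed for a human to review, since the sketch only reports links and does not try to decide which year is right.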
 * Thanks Primefac. I've done the 50 edits, and bot behaved as expected. --NicoV (Talk on frwiki) 18:46, 16 June 2020 (UTC)
 * I looked through all 50 test edits, and they all looked fine to me. In diff 1, I would have changed the link to "Wake Forest's" (I think this is the expected format on en.WP, although I can't find the guideline at the moment; I don't think you'll get any complaints), but the bot's "Wake Forest's" is acceptable. — Preceding unsigned comment added by NicoV (talk • contribs) 06:00, 17 June 2020 (UTC)
 * Primefac (talk) 17:12, 19 June 2020 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.