Wikipedia:Bots/Requests for approval/WikiCleanerBot 17


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

WikiCleanerBot 17
Operator:

Time filed: 14:57, Monday, May 25, 2020 (UTC)

Function overview: Fix errors listed at Special:LintErrors/wikilink-in-extlink (links in links).

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (WPCleaner)

Source code available: On GitHub (especially algorithm 513)

Links to relevant discussions (where appropriate):

Edit period(s): Twice a month

Estimated number of pages affected: Special:LintErrors/wikilink-in-extlink currently reports about 60k errors (for all namespaces), and the bot will only fix some situations, so I expect the number of pages affected to range from a few thousand to 20k. I will also generate a dump analysis in CHECKWIKI/WPC 513 dump for a better view of the problems (it will display the problematic links).

Namespace(s): Main

Exclusion compliant (Yes/No): Yes

Function details: The bot will fix some of the problems due to internal links inside external links (like  ) which result in poor display. It will only be able to fix part of the errors. The behavior of the fixes can be customized per wiki (see configuration of error 513).

The fixes and the configuration will be done progressively: run the bot on Special:LintErrors/wikilink-in-extlink or on CHECKWIKI/WPC 513 dump, check what is fixed, extend the configuration or improve the algorithm if needed, update CHECKWIKI/WPC 513 dump if needed, and start again.

I already run a similar task on frwiki, with a few thousand edits so far (over several runs, which allowed me to improve the range of detection and automatic fixing).

Examples of automatic fixes that show what the algorithm does in different situations:
 * 1956 Eilat bus ambush:  is replaced by   (the comma before the internal link makes the shortening of the external link safe enough to be automatic)
 * 1975 State of the Union Address:  is replaced by   (same as previous, and the dot after is also accepted as punctuation)
 * 1981 Vienna synagogue attack:  is replaced by   (same as previous, and   is accepted as matching a configured regular expression)
 * 2012 Dhivehi League Round 2:  is replaced by   (same as previous but with the opening parenthesis, and   is accepted as a configured text)

If interested in details, the algorithm currently works as follows, but it may evolve if I find enhancements along the way:
 * Analysis of external links created directly in wikitext (like ):
   * It looks for the first instance of:
     * an internal link (like )
     * a template creating an internal link (like, the list of templates WPCleaner looks for is configured with variable
   * If it's a template, and a replacement template has been configured for this template (on frwiki, for example,   can be replaced by  ; the first creates links to dates, the latter does not):
     * The only suggestion is to replace the template.
     * The replacement is automatic only if it has been configured to be automatic.
   * If it's an internal link or a template without replacement:
     * The bot goes backward from the beginning of the link/template to see where the external link could be shortened: it takes into account whitespace, some punctuation ( currently) or some configured texts (in variable  ). If a punctuation mark or a configured text with the automatic flag set is found, the position to shorten the external link is deemed safe enough.
     * The bot goes forward from the end of the link/template to see if it can safely reach the end of the external link: it takes into account whitespace, some punctuation ( currently) or some configured regular expressions (in variable  ).
     * If the position to shorten the external link is deemed safe enough and the bot could reach the end of the external link, the external link is shortened.
   * If it's an internal link at the beginning of the external link, and the link is configured (in variable ), the internal link is moved before the external link.
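
The backward and forward scans described above can be sketched roughly as follows. This is only an illustration, not WPCleaner's actual code: the class name `ExtLinkShortener`, the method `shorten` and the two punctuation sets are assumptions, and the configured texts, regular expressions and templates are left out entirely.

```java
/** Illustrative sketch only; names and punctuation sets are assumptions, not WPCleaner's code. */
class ExtLinkShortener {

    // Assumed punctuation accepted before and after the internal link.
    private static final String PUNCT_BEFORE = ",(";
    private static final String PUNCT_AFTER = ".)";

    /** Returns the shortened wikitext, or the input unchanged when no safe fix is found. */
    static String shorten(String extLink) {
        if (!extLink.startsWith("[") || extLink.startsWith("[[") || !extLink.endsWith("]")) {
            return extLink; // not a bracketed external link
        }
        int urlEnd = extLink.indexOf(' ');      // the URL ends at the first space
        int linkStart = extLink.indexOf("[[");  // first internal link
        int linkEnd = (linkStart < 0) ? -1 : extLink.indexOf("]]", linkStart);
        if (urlEnd < 0 || linkStart < 0 || linkEnd < 0) {
            return extLink;
        }
        linkEnd += 2;
        int closing = extLink.length() - 1;     // closing bracket of the external link

        // Backward scan: skip whitespace and punctuation; punctuation makes the cut safe.
        int cut = linkStart;
        boolean safe = false;
        while (cut > 0) {
            char c = extLink.charAt(cut - 1);
            if (PUNCT_BEFORE.indexOf(c) >= 0) {
                safe = true;
                cut--;
            } else if (Character.isWhitespace(c)) {
                cut--;
            } else {
                break;
            }
        }

        // Forward scan: only whitespace and punctuation may follow the internal link.
        int pos = linkEnd;
        while (pos < closing) {
            char c = extLink.charAt(pos);
            if (PUNCT_AFTER.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                pos++;
            } else {
                break;
            }
        }

        // Shorten only when both scans succeed and some link text remains after the URL.
        if (!safe || pos != closing || cut <= urlEnd) {
            return extLink;
        }
        return extLink.substring(0, cut) + "]" + extLink.substring(cut, closing);
    }

    public static void main(String[] args) {
        System.out.println(shorten("[https://example.org/report Annual report, [[Example Publisher]]]"));
    }
}
```

In this sketch the closing bracket is placed before the punctuation, so the comma and the internal link both end up outside the external link. Anything the scans do not recognise leaves the text untouched, which matches the intent that only part of the errors is fixed automatically.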


 * Analysis of external links created through the use of templates (like using its url and title parameters to create an external link). The list of template/parameter is configured in variable
   * It looks for the first instance of an internal link or a template creating an internal link (same as above).
   * If it's a template, and a replacement template has been configured... (same as above)
   * If it's an internal link and the template/parameter is configured for automatic removal of the links, the internal link is replaced by its displayed text.
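
Replacing an internal link by its displayed text, as in the last step above, could look like the sketch below. It is an assumption-laden illustration rather than the bot's implementation: the class `LinkUnwrapper`, the method `toDisplayedText` and the regular expression are hypothetical, the check that a template/parameter is configured for automatic removal is omitted, and every link in the value is replaced rather than only the first.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; not taken from WPCleaner. */
class LinkUnwrapper {

    // Matches [[Target]] or [[Target|Label]]; group 1 is the text the reader sees.
    private static final Pattern INTERNAL_LINK =
            Pattern.compile("\\[\\[(?:[^\\[\\]|]+\\|)?([^\\[\\]|]+)\\]\\]");

    /** Replaces each internal link in a parameter value by its displayed text. */
    static String toDisplayedText(String parameterValue) {
        return INTERNAL_LINK.matcher(parameterValue).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(toDisplayedText("Report by [[World Health Organization|WHO]]"));
    }
}
```

Links with an empty label (the pipe trick) do not match this pattern and are left as they are, since their displayed text cannot be derived from the wikitext alone.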

Discussion
What namespaces will this bot operate in? The bot should not fix deliberate errors, which means that operating in Template, Help, and Talk spaces is probably not advisable. I support its use in article space and Draft space. I have fixed a few thousand of these errors, which can be tricky to figure out, and I look forward to seeing some test edits to see how well the algorithm works. – Jonesey95 (talk) 15:38, 25 May 2020 (UTC)
 * Hi Jonesey95. For the moment, only the Main namespace. Maybe other namespaces in the future, but I will open a new request for approval then. I agree that Template and Talk are too tricky; I'm not sure about Help, but I would rather start with namespaces like Category, File, Reference... first.
 * If you want to see some results, I've already done several thousand modifications on frwiki: here, here, here... (look for "Lien interne dans un lien externe", with "2.02b", the "b" is for bot). --NicoV (Talk on frwiki) 16:48, 25 May 2020 (UTC)
 * I clicked on many of those corrections, but they are all wikilinks in titre parameters of citation templates. We do not have any of those. Those errors would appear in, which is currently empty (I fixed many thousands of articles a few years ago, and a couple of diligent editors watch the category for new errors). Do you have fixes for Linter errors in regular URL links? If not, I can wait for the bot trial. Merci. – Jonesey95 (talk) 18:11, 25 May 2020 (UTC)
 * Hi Jonesey95. I proceeded step by step on frwiki, so each list may mostly contain one type of modification. I think this list may be closer to what you're looking for (older list with actual internal links). But I think I'll find ideas for improvements once I have really started working on enwiki for this. For example, among the improvements, I am thinking of adding a list of internal links that can be safely put before the external link when they are at the beginning (like in 1953 Milwaukee Braves season for  replaced by   ). --NicoV (Talk on frwiki) 18:45, 25 May 2020 (UTC)
 * And in fact, there are maybe templates like URL with wikilinks in 2, for example in Åbyhøj Church. --NicoV (Talk on frwiki) 18:58, 25 May 2020 (UTC)
 * Hi Jonesey95. I've implemented the improvement mentioned just above, most of the modifications in this list are for the same internal link (to Élections Nouveau-Brunswick) at the beginning of the external link. --NicoV (Talk on frwiki) 15:35, 29 May 2020 (UTC)


 * Mainspace only. Primefac (talk) 14:55, 29 May 2020 (UTC)
 * Thanks Primefac. I've done 50 edits, and I didn't see big problems, just 2 very minor tweaks. For this edit, I've added "&nbsp;" to the texts before, so in similar cases, the closing bracket will be before it. For this edit, I've modified the detection of the texts before to be case insensitive. Jonesey95, if you're interested to check the edits. --NicoV (Talk on frwiki) 16:29, 29 May 2020 (UTC)
 * Edited after bot approval: I also checked the edits, and they look great! Thanks for taking on this task, . Ping me if you need help. – Jonesey95 (talk) 00:09, 31 May 2020 (UTC)

I looked over the edits and this performs as expected. As per usual, if amendments to - or clarifications regarding - this approval are needed, please start a discussion on the talk page and ping. -- The SandDoctor Talk 18:42, 30 May 2020 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.