Wikipedia:Bots/Requests for approval/NihlusBOT 5


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

NihlusBOT 5
Operator:

Time filed: 03:10, Friday, October 13, 2017 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Fix double colons in internal links (i.e. :Test ); see Special:LintErrors/multi-colon-escape

Links to relevant discussions (where appropriate): Bot requests

Edit period(s): One-time run, then monthly runs for Version 1.0 Editorial Team pages

Estimated number of pages affected: ~25,000?

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Matching  and replacing with. This is an error that changes how the link is rendered on the page (i.e. it doesn't render as a link, it renders as just text). I'm unsure about the number of edits required since the error page lists about 3k while a database scan returns 25k.
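The exact pattern and replacement used for this task are not preserved in this archive. As a rough illustration only, the fix described here could be sketched in Python as follows; the pattern, the function name `fix_double_colons`, and the choice to leave links with three or more colons alone are all assumptions, not the operator's actual AWB rule.

```python
import re

# Hypothetical pattern: "[[" followed by exactly two colons.
# The negative lookahead leaves "[[:::" (three or more colons) untouched.
DOUBLE_COLON = re.compile(r"\[\[::(?!:)")


def fix_double_colons(wikitext: str) -> str:
    """Collapse a leading double colon in an internal link to a single colon."""
    return DOUBLE_COLON.sub("[[:", wikitext)


print(fix_double_colons("See [[::Test]] for details."))
```

Under these assumptions, a link written with two leading colons ends up with one, while ordinary links and triple-colon links are not modified.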

Discussion

 * Comment: 25K is probably closer to the real number. An insource search for "insource:/\[\[::/" finds 19,468 pages. It appears that Linter tagging is subject to the same problems described in : "Changes to MediaWiki code related to parsing can leave links tables out of date". For example, Special:LintErrors/multi-colon-escape was empty in article space until I null-edited Ida Dehmel, and now that page is listed. That page was edited on 17 July 2017, less than three months ago; many WP pages have not been edited for years. – Jonesey95 (talk) 04:17, 13 October 2017 (UTC)
 * Yeah, that's why I went with the higher number. My database is a couple weeks old and someone fixed a lot of them via meatbotting, but I wanted to be conservative on my estimates. A lot of the ones in that search though are brought over from Commons on the file pages, which was another reason for me to be unsure. Nihlus 14:38, 13 October 2017 (UTC)
 * Second comment: Will the regex above be able to avoid the false positive in Array slicing? I'm not a good enough regex parser to figure it out. – Jonesey95 (talk) 04:17, 13 October 2017 (UTC)
 * That's a good point. An easy fix would be to tack  to the end of the regex and then change the replace to
 * ...Actually, since that's the only instance in article space, the bot could just ignore Article space and the issue would be avoided. Primefac (talk) 12:39, 13 October 2017 (UTC)
 * There are only two instances of Article space errors, both previously mentioned: Array slicing and Ida Dehmel. Nihlus 14:38, 13 October 2017 (UTC)
 * I actually fixed Ida just before making my previous comment. Primefac (talk) 14:54, 13 October 2017 (UTC)
 * Comment: Someone might want to do a manual fix to pages containing \[\[::: (three colons) before setting this bot loose. – Jonesey95 (talk) 04:24, 13 October 2017 (UTC)
 * There are six pages with three or more colons, should be easy to deal with before this happens. Primefac (talk) 12:32, 13 October 2017 (UTC)
 * Fixed. Nihlus 14:38, 13 October 2017 (UTC)
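The two concerns raised in this thread (a false positive in a code sample, and links with three or more colons that should be handled by hand) could be screened for before a run. The following is a minimal sketch, not the approach proposed in the discussion, whose regex is not preserved here. It assumes that false positives sit inside `<nowiki>`, `<pre>`, `<code>`, `<syntaxhighlight>` or `<source>` tags, which may not hold for every page; the function name `classify` and its return labels are hypothetical.

```python
import re

# Assumption: code samples that legitimately contain "[[::" are wrapped in
# one of these tags, so their contents can be ignored when scanning.
PROTECTED = re.compile(
    r"<(nowiki|pre|code|syntaxhighlight|source)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)
TRIPLE_COLON = re.compile(r"\[\[:::")
DOUBLE_COLON = re.compile(r"\[\[::(?!:)")


def classify(wikitext: str) -> str:
    """Return 'manual', 'fix' or 'skip' for a page's wikitext."""
    visible = PROTECTED.sub("", wikitext)
    if TRIPLE_COLON.search(visible):
        return "manual"  # three or more colons: leave for a human
    if DOUBLE_COLON.search(visible):
        return "fix"     # plain double colon outside protected tags
    return "skip"        # nothing to do, or only inside code samples


print(classify("<code>a[[::2]]</code>"))
```

Pages classified as `manual` would be set aside for hand editing, as was done for the six triple-colon pages mentioned above.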


 * There are only about 1700 pages here; your estimate is wildly off. How come you are so far out?
 * I'm not sure you have demonstrated consensus; the message you refer to was about a subset of these pages, which have been manually fixed. If you do demonstrate consensus, please ping me.
 * All the best: Rich Farmbrough, 19:09, 13 October 2017 (UTC).


 * My guess would be the 6k edits you performed about 24 hours ago, and the system just hadn't caught up yet when Nihlus put this through. Primefac (talk) 20:58, 13 October 2017 (UTC)
 * Good guess, but wrong! They partially explain it at the end of the request, which I initially missed: "I'm unsure about the number of edits required since the error page lists about 3k while a database scan returns 25k." If the database scan is a downloaded copy of the database, which is most likely, it's woefully out of date, and it is worrying that they didn't realise this. The other factor is that each page typically has at least 2-3 entries in the table: user, user talk and special. So the figure on the special page is not enough data either. All the best: Rich Farmbrough, 22:21, 13 October 2017 (UTC).


 * I null-edited a few hundred pages sometime in the last day or two, which increased the count by many hundreds. This means that the count is not accurate, and is definitely low (the phab link at the top explains why). Rich appears to be correct that the Linter page counts instances of the error, rather than pages, FWIW. When I search for "insource:/\[\[::/" in User Talk space, I get 18,446 pages, some of which are false positives, but most of which have errors that need fixing. This means that the number of affected pages in all other namespaces combined is less than 1,000. If I had to bet, I would bet that the number of affected pages is in the 10,000 to 20,000 range. – Jonesey95 (talk) 23:54, 13 October 2017 (UTC)
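The distinction drawn here between instances of the error and affected pages can be illustrated with a small sketch. The entries below are invented sample data with hypothetical page titles, not real Linter output, and `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(entries):
    """Return (instance_count, page_count) for (page_title, link_text) pairs."""
    per_page = Counter(title for title, _ in entries)
    return sum(per_page.values()), len(per_page)


# Hypothetical entries: one talk page with three bad links (user, user talk,
# special), and a second page with one.
sample = [
    ("User talk:Example", "[[::User:Example]]"),
    ("User talk:Example", "[[::User talk:Example]]"),
    ("User talk:Example", "[[::Special:Contributions/Example]]"),
    ("User talk:Other", "[[::User:Other]]"),
]

instances, pages = summarize(sample)
print(instances, pages)
```

With this sample, the instance count is twice the page count, which is why a per-instance total and a per-page search result are not directly comparable.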

 * ~ Rob 13 Talk 15:08, 15 October 2017 (UTC)
 * See here. One issue arose from my list getting refreshed so that it included Array slicing as mentioned above, although it has been fixed. Nihlus 15:26, 15 October 2017 (UTC)
 * Comment: I fully support this bot for cleaning up most of the double-colon errors across Wikipedia. However, in relation to my original bot request, this would actually break links on "log" subpages of Version 1.0 Editorial Team. These pages have double colons wherever the WP 1.0 bot didn't recognize the namespace. I have cleaned up the older pages that had issues with the File, WikiProject, and Category namespaces, and while those problems with the WP 1.0 bot were fixed, that bot is still not recognizing the Draft namespace. However, since none of that bot's maintainers are still active, it is continually adding new links to draft articles with double colons. Therefore, on subpages of Version 1.0 Editorial Team, the bot should be replacing  with  . --Ahecht (TALK PAGE) 15:25, 16 October 2017 (UTC)
 * Thanks, . I was going to exclude those pages since I figured they were already fixed (my database is older than the most recent fixes), but I can do those too. Would you like an additional trial? These pages are just a small subset of my list, so I can do a specific run on some of these pages if you would like. Nihlus 15:43, 16 October 2017 (UTC)
 * Just run the code once semi-auto from your main account on one of those pages to verify it works, then ping me with the diff. ~ Rob 13 Talk 16:09, 16 October 2017 (UTC)
 * . Nihlus 16:17, 16 October 2017 (UTC)
 * Looks fine to me, but can you check that diff as well to make sure it's behaving as you'd expect? ~ Rob 13 Talk 16:35, 16 October 2017 (UTC)
 * Looks good to me. There is one more corner case that I completely forgot to mention, which is when the WP 1.0 bot reports a renamed page, which would need a regex from →  (as in this diff). --Ahecht (TALK PAGE) 17:36, 16 October 2017 (UTC)
 * Those only have one colon, though, right? Nihlus 17:42, 16 October 2017 (UTC)
 * Oops, that was a bit of a misleading example, because had already removed the first colon here. A better example would be Special:Diff/804902874. --Ahecht (TALK PAGE) 18:29, 16 October 2017 (UTC)
 * I figured that was the case. The code above works, so I can add it to my list for when I fix Version 1.0 Editorial Team pages. Do you know if is going to be fixed soon? If not, does this need to be an ongoing task rather than a one-time run? Nihlus 18:34, 16 October 2017 (UTC)
 * Neither of the developers of WP 1.0 bot is still active. I have pinged both of them, and filed a bug report on the GitHub page, but I haven't heard of any progress. I would assume for now that this has to be an ongoing task. --Ahecht (TALK PAGE) 18:44, 16 October 2017 (UTC)

Any update on this one? Nihlus 20:00, 19 October 2017 (UTC)
 * Bit of a busy week, but I'll check over the trial tonight. Sorry for the wait. ~ Rob 13 Talk 21:48, 19 October 2017 (UTC)
 * Can you explain this edit? There's more going on here than I would expect. ~ Rob 13 Talk 13:38, 20 October 2017 (UTC)
 * It looks like it's the pipe trick, not something that the bot did specifically. | got converted (properly) to |: when the page was saved. It didn't do this before because it didn't recognize : as a valid wikilink. No idea about the de: thing, though. Primefac (talk) 13:42, 20 October 2017 (UTC)
 * The :de: to de: also looks like the bot removed the colon and the MW software automatically processed a WP:PIPETRICK link. Harmless, I'd say. – Jonesey95 (talk) 14:13, 20 October 2017 (UTC)
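The pipe-trick behaviour mentioned in these replies, where a link saved with an empty label after the pipe has its label filled in automatically, can be approximated with a short sketch. This is a deliberately simplified model and not MediaWiki's actual pre-save transform: it assumes at most one namespace or interwiki prefix (optionally preceded by a single colon) and an optional trailing parenthetical, and the name `expand_pipe_trick` is hypothetical.

```python
import re

# Simplified model of an empty-pipe link: [[prefix:Name (disambiguator)|]]
EMPTY_PIPE = re.compile(
    r"\[\[(:?[^\[\]|:]+:)?"      # optional ":ns:" or "ns:" prefix
    r"([^\[\]|]+?)"              # page name (lazy)
    r"( \([^\[\]|()]+\))?"       # optional trailing " (parenthetical)"
    r"\|\]\]"                    # empty label
)


def expand_pipe_trick(wikitext: str) -> str:
    """Fill in the label of empty-pipe links, dropping prefix and parenthetical."""
    def repl(match):
        prefix = match.group(1) or ""
        name = match.group(2)
        suffix = match.group(3) or ""
        return f"[[{prefix}{name}{suffix}|{name}]]"

    return EMPTY_PIPE.sub(repl, wikitext)


print(expand_pipe_trick("[[:de:Foo|]]"))
```

In this model, an expansion of this kind appears in the diff alongside the bot's own colon removal even though the bot did not request it, which is consistent with the explanation given above.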

Pinging for an update. Nihlus 02:14, 23 October 2017 (UTC)
 * ~ Rob 13 Talk 11:25, 23 October 2017 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.