Wikipedia:Bots/Requests for approval/Mdann52 bot 13


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

Mdann52 bot 13
Operator:

Time filed: 10:35, Monday, September 4, 2017 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Pywikibot

Source code available: https://github.com/Mdann52/wikipedia/blob/master/iso4bot.py

Function overview: Help clear up the backlog in Category:Articles with missing ISO 4 redirects

Links to relevant discussions (where appropriate): Bot_requests

Edit period(s): One time run

Estimated number of pages affected: ~1000

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: From Bot_requests:

To help clear up the backlog in Category:Articles with missing ISO 4 redirects, if a bot could
 * Parse every article containing Infobox journal, retrieve J. Foo. Some articles will contain multiple infoboxes.
 * If J. Foo. exists and is tagged with R from ISO 4
 * Create J Foo with
 * #REDIRECT Article containing Infobox journal

Thanks! Headbomb {t · c · p · b} 11:57, 31 August 2017 (UTC)
 * If J Foo already exists, make sure it is tagged with R from ISO 4, and remove any other R from ... templates present (like R from abbreviation/R from acronym).
 * Null edit the original article containing the infobox with J. Foo.
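The requested steps can be sketched in plain Python. This is only an illustration, not the code in iso4bot.py: the function names are hypothetical, the page lookups that Pywikibot would perform are replaced by a boolean argument, and tagging the new redirect with R from ISO 4 is an assumption read off the request above.

```python
def dotless_title(dotted: str) -> str:
    """Drop every full stop from an abbreviation, e.g. 'J. Foo.' -> 'J Foo'."""
    return dotted.replace(".", "")


def redirect_wikitext(target: str) -> str:
    """Wikitext for a redirect to `target`, tagged as an ISO 4 redirect (assumed)."""
    return f"#REDIRECT [[{target}]]\n{{{{R from ISO 4}}}}"


def plan_dotless_redirect(dotted: str, target: str, dotted_is_iso4_redirect: bool):
    """Return (title, wikitext) for the dotless redirect, or None to skip.

    `dotted_is_iso4_redirect` stands in for the on-wiki check that the dotted
    page exists and carries R from ISO 4.
    """
    if not dotted_is_iso4_redirect:
        return None
    title = dotless_title(dotted)
    if title == dotted:
        # The abbreviation contains no dots, so there is nothing new to create.
        return None
    return title, redirect_wikitext(target)
```

A real run would still need to save the page, handle the case where J Foo already exists, and null edit the source article; those steps depend on live wiki state and are left out here.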

Discussion
Sample edits - here. Mdann52 (talk) 10:40, 4 September 2017 (UTC)
 * When I saw the function description, I thought you were creating ISO redirects as well as tagging them. Isn't that what the bot request is for?  Pinging .— CYBERPOWER  ( Chat ) 10:56, 4 September 2017 (UTC)
 * Never mind. I didn't read your diffs properly.— CYBERPOWER  ( Chat ) 10:59, 4 September 2017 (UTC)


 * — CYBERPOWER  ( Chat ) 10:59, 4 September 2017 (UTC)

Handling dotted vs dotless abbreviations
Three things: First, I believe this bot request was about adding a dotless redirect only when the dotted version redirect already exists. The code seems to add dotted versions based only on the abbreviation parameter, which might be too much GIGO: the parameter is incorrect in about 1/8 of cases (there's an effort to fix that, see below). Tokenzero (talk)

Second, there's a long discussion on how to exactly categorize the dotless redirects, which seems to have now settled down. Headbomb, should we use the template R from dotless ISO 4 (currently placing the article in the same category as R from ISO 4) just in case this rebounds, or just keep things simple?

Third, we now have an automatic tool that computes abbreviations. It has an error rate of ~5%, but it detects virtually all errors made in human-edited abbreviations (the 1/8 garbage, see the list of mismatches). So we could handle both dotted and dotless redirects automatically, creating them only when the human-edited abbreviation parameter matches the computed abbreviation. This should handle most redirects (eventually all but ~5%, as editors will fix the mismatches), with virtually no GIGO and without introducing any new errors. This would be a bit more complicated and there are a few more corner cases (e.g. the bot should not overwrite pages like Ann. Phys., or any redirects to unexpected pages; it should find all infobox journals when a page has several - I did that with mwparserfromhell for scraping the list). I could write the code for that and give it here, or just submit my own bot. What do you think? Tokenzero (talk) 09:32, 13 September 2017 (UTC)
 * I don't personally see a consensus to use R from dotless ISO 4 at all [such a consensus may develop in the future, of course, but I don't see it as better than 50-50 that it will]. I also agree that using the automatic tool to verify abbreviations is the superior approach to what I suggested above. If the infobox abbreviation matches the tool's 'probable abbreviation', the bot should create both dotted and undotted versions, and then null edit the original article. Headbomb {t · c · p · b} 11:22, 13 September 2017 (UTC)


 * Any update on this?— CYBERPOWER  ( Message ) 23:37, 18 September 2017 (UTC)
 * Working on this when I can - some issues have come up (as alluded to below), so I'm trying to find the time to make the fixes. Mdann52 (talk) 18:00, 20 September 2017 (UTC)
 * The bot function specification should be changed, as discussed above, and I believe the easiest way would be if I write and submit my own bot for BRFA, which would replace your bot. Do you agree? If you prefer to make the changes yourself, I can send some code for handling infoboxes cleanly and details on the automatic tool. To clarify the proposed changes in the bot function:
 * 1) It should add redirects only when the infobox abbreviation matches the one given by the automated tool OR if the dotted version redirect already exists (and is categorized as ISO-4),
 * 2) It should not replace existing pages unless they are just miscategorized redirects to the page we came from (e.g. it should keep disambiguation pages and dotless redirects to them),
 * 3) Perhaps we'll want to change dotless redirects to R from dotless ISO 4, but there's no consensus on that, just a thing to keep in mind. Tokenzero (talk) 18:53, 20 September 2017 (UTC)
 * either works ok for me - if you wish to take over the task, feel free. Mdann52 (talk) 19:40, 20 September 2017 (UTC)
 * When I made the original request, the tool didn't exist yet. There's a better way of doing things now, so we should do that. Makes no difference to me who codes it, but Tokenzero could probably code it more quickly as they made the tool the bot would be based on. Headbomb {t · c · p · b} 20:01, 20 September 2017 (UTC)
 * Ok, I'll take it over then. Since it changes the maintainer, bot account, specification, etc. I think I'll just submit a new BRFA when I'm done with some technicals. Tokenzero (talk) 20:40, 20 September 2017 (UTC)
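Points 1) and 2) of the proposed specification above amount to two small decisions, which could be sketched as follows. This is a hedged illustration only: `ExistingPage`, `should_add_redirects` and `action_for_title` are hypothetical names, and the page state that a bot would read from the wiki is passed in as plain data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExistingPage:
    """Minimal stand-in for what the bot would learn about a page on the wiki."""
    is_redirect: bool
    redirect_target: Optional[str] = None
    has_iso4_tag: bool = False


def should_add_redirects(infobox_abbr: str, computed_abbr: Optional[str],
                         dotted_page: Optional[ExistingPage]) -> bool:
    """Point 1: trust the infobox if the tool agrees, or if a dotted
    redirect already exists and is categorized as ISO 4."""
    if infobox_abbr and infobox_abbr == computed_abbr:
        return True
    return (dotted_page is not None
            and dotted_page.is_redirect
            and dotted_page.has_iso4_tag)


def action_for_title(existing: Optional[ExistingPage], source_article: str) -> str:
    """Point 2: decide what to do with one candidate redirect title."""
    if existing is None:
        return "create"
    if existing.is_redirect and existing.redirect_target == source_article:
        # Only redirects back to the article we came from may be touched.
        return "keep" if existing.has_iso4_tag else "retag"
    # Articles, disambiguation pages and redirects elsewhere are left alone.
    return "skip"
```

Returning a label rather than editing directly keeps the policy separate from the wiki calls, so the "skip" cases (such as disambiguation pages) can be logged for human review.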

False positives
Moved from User talk:Mdann52
Hey there -- thanks so much for setting up Mdann52 bot to tag ISO 4 redirects. I wanted to call your attention to a false positive that I recently noticed; I figure you may want to know about these things. Berkeley J. Emp. & Lab. L. was tagged as an ISO 4 redirect, but this is actually the Bluebook abbreviation and should be tagged as such. I think the mistake occurred because, at the time the redirect was initially tagged, the "abbreviation" field in the main article's infobox (which is used for ISO 4 abbreviations) erroneously contained the Bluebook abbreviation. If the bot is relying solely upon data in the "abbreviation" field in the infobox, and that information is incorrect, then the bot may be creating redirects from incorrect titles. There may not be anything that can be done about it, but I wanted to give you the heads up. Best, -- Notecardforfree (talk) 11:31, 4 September 2017 (UTC)
 * Another example of an erroneous redirect is Berk. J. Int. Law., which was just created by the bot. Like the example I mentioned above, it looks like the bot relied upon incorrect data in the article's infobox. -- Notecardforfree (talk) 11:46, 4 September 2017 (UTC)
 * It is purely relying on the infobox entry, yes, so if these are incorrect, then the wrong redirect will be created. I'm not too sure how I can resolve these false positives - I'll stop the run for now and am open to suggestions. Mdann52 (talk) 15:30, 4 September 2017 (UTC)
 * AFAICT, that's fine by me. The error already exists, which means the bot isn't doing anything worse. It'll create a badly categorized redirect, based on a badly categorized redirect. I'll be going through Category:Redirects from ISO 4, and having two such redirects means I'm more likely to catch the error. However, the bot should make sure that R from ISO 4 is present on the dotted redirect before creating the dotless one. Headbomb {t · c · p · b} 15:34, 4 September 2017 (UTC)
 * I think the bot is doing good work, and ultimately this task will save countless hours of human editors' time. I understand that a few false positives will occur now and then, but I think the utility of having the bot perform this function far outweighs any harm that would occur from creating a few false positives every now and then. I didn't mean for my message to throw a wrench in the works; I simply wanted to bring this to your attention in case it was relevant to maintaining the bot. I think that we should have the bot continue to perform this task and then have human editors review the redirects for accuracy once they have been created (checking for accuracy will take far less time than creating the redirects/tags). Thanks again for your work with this! Best, -- Notecardforfree (talk) 15:40, 4 September 2017 (UTC)
 * Not an issue - you've actually pointed out an interesting bug before I noticed (namely I wasn't checking the template name when I extracted the parameter!). I'm looking into resolving this in the next few days. Mdann52 (talk) 16:37, 8 September 2017 (UTC)
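The kind of check described above, reading the abbreviation parameter only from templates that really are Infobox journal, might look like the sketch below. It is not the bot's actual fix: `journal_abbreviations` is a hypothetical helper, and the regular expression is a deliberately simplified stand-in for a real wikitext parser such as mwparserfromhell (it ignores templates that contain nested templates and would mis-split piped links).

```python
import re

# Simplified assumption: only matches templates with no nested {{...}} inside.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}", re.DOTALL)


def journal_abbreviations(wikitext: str) -> list:
    """Return the abbreviation of every Infobox journal found in the text."""
    found = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name, *params = match.group(1).split("|")
        # Template names treat underscores as spaces and ignore the
        # case of the first letter only.
        name = " ".join(name.replace("_", " ").split())
        if name[:1].lower() + name[1:] != "infobox journal":
            continue  # the template-name check: skip every other template
        for param in params:
            key, sep, value = param.partition("=")
            if sep and key.strip() == "abbreviation" and value.strip():
                found.append(value.strip())
    return found
```

Because the function returns a list, a page with several journal infoboxes yields one abbreviation per infobox, which also covers the multiple-infobox case raised in the original request.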


 * in favor of Bots/Requests for approval/TokenzeroBot.— CYBERPOWER  ( Chat ) 13:43, 9 October 2017 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.