Wikipedia:Bots/Requests for approval/Monkbot 2


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Monkbot 2
Operator:

Time filed: 15:23, Tuesday March 4, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: Yes (source)

Function overview: Scans for citations that use the deprecated parameters coauthor or coauthors and where:
 * these parameters are empty, removes the parameter;
 * the parameter contains one 1–4 segment name, replaces coauthor or coauthors with author2
 * the parameter contains multiple (2–9) semicolon delimited names, replaces coauthor or coauthors and the semicolons with author2 – authorn (where n is 3–10)
 * template contains harv, does nothing (does not apply when coauthor or coauthors is empty)
 * template contains lastn or authorn where n is greater than 1, does nothing
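The rules listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the bot's actual AWB script: the function name `split_coauthors`, the constants, and the choice to apply the four-segment limit to every name in a multi-name list are assumptions made for the example.

```python
MAX_SEGMENTS = 4  # a name may have 1-4 whitespace-separated segments (assumed to apply to every name)
MAX_NAMES = 9     # at most nine semicolon-delimited coauthors


def split_coauthors(value):
    """Map the value of coauthor/coauthors to replacement parameters.

    Returns [] when the parameter is empty (it can simply be removed),
    a list of (parameter, name) pairs when the value can be converted,
    or None when the citation should be left for a human editor.
    """
    value = value.strip()
    if not value:
        return []

    names = [name.strip() for name in value.split(";")]
    if any(not name for name in names):
        return None  # stray or trailing semicolon: do not guess
    if len(names) > MAX_NAMES:
        return None
    if any(len(name.split()) > MAX_SEGMENTS for name in names):
        return None

    # The first coauthor becomes author2, the next author3, and so on.
    return [(f"author{n}", name) for n, name in enumerate(names, start=2)]
```

For example, `split_coauthors("A. Smith; B. Jones")` yields `author2` and `author3` pairs, while a single value with five or more segments is skipped.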

Links to relevant discussions (where appropriate):

Edit period(s): Occasionally after initial run through the category

Estimated number of pages affected: At the time of this writing, the deprecated parameter category contains 102,700 pages

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: A full and detailed description of the task's functionality is available with its source.

Discussion
I looked through the code, and maybe I missed the following nuance. contains articles in which coauthors exists in a citation in the absence of a populated author1 or last1. Does the bot's code account for the possibility that coauthors can exist without author1 or last1? If you take a citation with no populated author parameters and replace coauthors with author2, the citation will not display any authors.

Related: I have put in a feature request to detect author2 without author1 or first1 without last1, but it hasn't been implemented yet. If we implement that feature, I'd be OK with this bot's current code, since any replacements of coauthors with author2 in the absence of author1 would simply move the article to a new, easily-fixable error category. I want to avoid "fixing" an article by replacing coauthors with author2 in the absence of author1, since the article will not currently have any error tracking after the "fix" even though it still contains a broken citation.

I have been working on cleaning up via an AutoEd script. AWB should also work fine if someone wants to run through the category. These articles are easy to fix, except when they contain a mix of citations that are "coauthors-only" and "author1 plus coauthors". – Jonesey95 (talk) 21:55, 4 March 2014 (UTC)


 * Good catch. I've edited about 3000 articles using variants of this script and haven't seen (yet) the case of coauthor without author. I've added the item to my todo list. It essentially means that there will be 2x the number of regexes that there are now.


 * —Trappist the monk (talk) 22:43, 4 March 2014 (UTC)


 * A tricky situation to watch out for in the above category is the presence of a populated first, an empty or missing last, and a populated coauthors. The intent of the original editor was to list the "first" author followed by coauthors. The fix is to replace first with author1 and coauthors with author2. I have fixed about 600 articles in the category and have seen this arrangement a few dozen times.


 * The more I think about this proposed bot, the more I think that it should fix only the most obvious of low-hanging fruit, at least at first. Will it behave properly if there are four authors, one missing author, three more authors, and coauthors? Would the bot create a duplicate author6 in this case? We don't have code to flag repeated parameters–the citation simply displays the final one–so if the bot created a second author6, nobody would ever know. Since we also don't have code to flag the errant "missing author" situation, I suggest leaving citations like this alone for a human to sort out. I think it would be reasonable, at least for a first pass through the category, to fix only those situations in which there is a single populated author or author1 or last1/first1 followed immediately by coauthors. Run through the 100,000 articles fixing only that condition, see what the category looks like when the bot is done, and make refinements to the bot's code. That would be a conservative approach. – Jonesey95 (talk) 00:07, 5 March 2014 (UTC)


 * If I understand the essence of your coauthors-without-author comment that opened this discussion, the citation must have last, last1, author or author1 and that parameter must have a value. If none of those parameters are present, or one is but is empty, then the citation shall be skipped. Because one of the last or author parameters is required, the first issue is not an issue, right?


 * As written right now, if values are assigned to any lastn or authorn (where n is greater than 1 and less than 100) and that parameter is located ahead of the coauthor parameter:
 * then, there is no replacement because the simple First Coauthor ... Fifth Coauthor replacement would bugger up the citation. However, if all of the existing author2 – authorn parameters in my example are empty, then there is no reason not to proceed with the replacement.


 * The other case, where lastn or authorn follow coauthor, there is no reason to do the replacement because the existing authorn parameters will override the new.


 * Will it behave properly if there are four authors, one missing author, three more authors, and coauthors? Yes, because the script found a match in step 2 and so protected that citation from step 3 editing. No duplicate author6.


 * We can do as you suggest and limit the search and replace to the case where last, last1, author or author1 precedes coauthor (that is the most common case). I have a test version of the script that is doing just that for the one-coauthor case.


 * —Trappist the monk (talk) 00:58, 5 March 2014 (UTC)
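The protection rules discussed in this comment might look something like the sketch below. It is a hypothetical Python illustration rather than the AWB regexes themselves; the function name `safe_to_convert` and the representation of a citation as an ordered list of (name, value) pairs are assumptions. It covers only the conversion of a populated coauthor value, not the separate removal of empty parameters.

```python
import re

PRIMARY = {"last", "last1", "author", "author1"}
COAUTHOR = {"coauthor", "coauthors"}
NUMBERED = re.compile(r"^(?:last|author)(\d+)$")


def safe_to_convert(params):
    """Decide whether coauthor(s) in one citation may be converted.

    params is an ordered list of (name, value) pairs. Conversion is allowed
    only when a populated last, last1, author or author1 precedes the
    coauthor parameter and no populated lastn/authorn with n > 1 exists.
    """
    primary_seen = False
    coauthor_seen = False
    for name, value in params:
        populated = bool(value.strip())
        numbered = NUMBERED.match(name)
        if numbered and int(numbered.group(1)) > 1 and populated:
            return False  # existing author2..authorn would clash or override
        if name in PRIMARY and populated:
            primary_seen = True
        elif name in COAUTHOR:
            if not primary_seen:
                return False  # coauthors without a preceding first author
            coauthor_seen = True
    return coauthor_seen
```

Under this sketch an empty author2 ahead of coauthors does not block the conversion, while a populated last2 anywhere in the template does.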


 * I follow your logic and am satisfied that the "first/coauthors" citations will be left alone.


 * I do think it would be conservative and reasonable to start the bot with a simple task, run through the category knowing that the bot should not make any mistakes because the code is so straightforward, then see what is left in the category. At that point, as we did with BattyBot 25, we can work to suggest refinements that will take care of known problems that appear to be easy to add to the existing code.


 * This is a coding philosophy, and you do not have to agree with it. I prefer to roll out simple code, make sure it works and is bug-free, then add complexity from there based on known needs. One can try to build a program that performs complex actions right from the start, and if one is very clever, one might succeed, but I am not that clever. I expect that addressing the most common case will be easy and will take care of two-thirds to three-quarters of the errors in the category. Once it does, it will be easier to find the odd situations that require additional complexity.


 * I am ready to see some test edits if there is an admin around who can approve them. I will be happy to check all of the edits. – Jonesey95 (talk) 01:21, 5 March 2014 (UTC)


 * You'll get no argument from me that simple is good. I think that this is the simple case that isn't so simple that it's trivial.  The challenge is still ahead of us: Last, First M., First M. Last, ... – I had much grander visions when I started down this path.


 * —Trappist the monk (talk) 02:00, 5 March 2014 (UTC)

Manual test edits

 * All of the Step 3 regexes now require last, last1, author or author1 to precede coauthor. All of the AWB edits in Special:Contributions/Trappist the monk from 11:39, 5 March 2014 were made with this version of the script.


 * —Trappist the monk (talk) 12:06, 5 March 2014 (UTC)

I looked at the first 35 edits by the script. Comments:
 * This edit and this edit did not fix any errors. They only deleted empty coauthors parameters. Editors will probably object if a bot does only that to an article. Perhaps the script should exit without editing if all citations end up protected.
 * This edit shows that the protection is working as intended. The script is being very conservative. That's good.
 * This edit has some GIGO going on ("foreword by Mark L."). The script worked fine.
 * This edit also has GIGO. The script worked fine. The output is no worse than the input.
 * This edit and a couple of others resulted in a citation with exactly nine authors, which triggers the displayauthors CS1 error. That's OK. Another bot or editor can fix that problem. I think Citation Bot is being programmed to work on those errors, which it should be able to fix easily.

Good work. On a side note, if you could run some test edits on the Q-Z section of the alphabet in, that would help me clear out that category. The end of the alphabet contains articles with a mix of coauthors-related errors, and the script should be able to get the articles down to just one type of error that is easier for me to fix. – Jonesey95 (talk) 17:02, 5 March 2014 (UTC)


 * I have made about 3500 edits with various versions of this script. There are a lot of pages with empty coauthor parameters that have been removed. There have been no complaints – no doubt, now that I have written that, someone will complain.


 * It is trivial to add 9 to the replacement when there are 9 authors. Is that the correct solution to that problem?  Is it a problem?  Is it something that a bot should be doing?


 * —Trappist the monk (talk) 17:38, 5 March 2014 (UTC)


 * A human (I assume you are a human) making the change with a script is one thing. A bot doing it is another. I'm looking at WP:COSMETICBOT, which I have seen people cite when making objections to edits by bots.


 * As for 9, the problem is that the original source may have more than nine authors, but the editor inserting the citation may have listed only nine because of the previous nine-author limit in cite journal. Citation Bot goes out to check the original source (if a DOI or PMID is available) and adds the remaining authors (or, pending a feature request, adds 9). The solution, in any case, is to refer to the original source before deciding the number of authors to display. – Jonesey95 (talk) 18:50, 5 March 2014 (UTC)


 * I don't think that the removal of empty deprecated parameters qualifies as cosmetic – cosmetic implies appearance. The script is only removing something that isn't seen anyway. I look at it more as instructive and preventive. Instructive because editors will see that coauthor is deprecated, and preventive because editors aren't tempted to fill in the empty blank.


 * I have run the script through . It fixed about 475 pages.


 * —Trappist the monk (talk) 20:57, 5 March 2014 (UTC)

Ready for trial, approval needed
I am ready to see some test edits if there is an admin around who can approve them. I will be happy to check all of the edits. This bot task owner has a track record of being a conservative, responsible, and responsive bot owner. – Jonesey95 (talk) 05:41, 13 March 2014 (UTC)

—Trappist the monk (talk) 15:30, 16 March 2014 (UTC)

 MBisanz  talk 20:06, 29 March 2014 (UTC)

Fifty-seven edits made (I started without getting the edit summary right). They are listed here: Special:Contributions/Monkbot beginning at 21:35, 29 March 2014 and ending at 21:46, 29 March 2014 (times in UTC). Except for the first six, these edits are marked with this edit summary: Task 2: Fix CS1 deprecated coauthor parameter errors (bot trial)

It's a rather uninteresting collection of edits, though all of Task 2's features are demonstrated except the longer strings of coauthor names (3–9). But, it does illustrate the most common edits. I didn't see anything untoward in these edits.

Pinging Editor Jonesey95.

—Trappist the monk (talk) 22:09, 29 March 2014 (UTC)


 * So that I can let Monkbot continue to work on Task 1, here is a link to the wmflabs edit-summary search tool results that lists the edits made in this trial.


 * —Trappist the monk (talk) 11:28, 30 March 2014 (UTC)

I inspected the 50 edits linked immediately above. I noticed the following:
 * The bot removed empty coauthors parameters, as described above. This will discourage editors from filling in this deprecated parameter.
 * The bot appeared to limit itself to names containing no more than four segments, as described above. For example, this edit skipped, as it should have.
 * The bot operated correctly on coauthors parameters containing multiple (2 or 3) semicolon delimited names, as described above. This is evidenced in this edit. The test edits did not include a coauthors parameter with more than three authors.
 * I do not have an easy way to confirm that the bot ignores citations containing harv or that it ignores citations in which a template contains lastn or authorn where n is greater than 1, but I did not see any evidence in the test edits that the bot modified any such citations.

I found no errors in the test edits I inspected. The bot appears to be conservative in operation, as it should be. – Jonesey95 (talk) 02:29, 31 March 2014 (UTC)


 * Thank you for doing that.


 * —Trappist the monk (talk) 13:30, 31 March 2014 (UTC)

Per a conversation at BRFA Monkbot 3, I have changed the script to add 9 when the replacement results in nine authors listed in the citation. This prevents the script from adding the page to.

—Trappist the monk (talk) 11:38, 1 April 2014 (UTC)

—Trappist the monk (talk) 11:08, 10 April 2014 (UTC)

-- slakr \ talk / 06:49, 12 April 2014 (UTC)
 * I checked a little over 50 of these test edits, and I found zero errors.


 * Here is an example of the bot correctly adding 9 to a cite template.


 * It handles ampersands and "and" gracefully.


 * It avoids wikilinked coauthor values, as it should.


 * I recommend approval. – Jonesey95 (talk) 13:59, 12 April 2014 (UTC)

Thank you. Every edit through edit 200 inspected, thereafter frequent random inspections.

is flawed. Monkbot should have removed the 'and ' from Y. Hasegawa; and Y. Azuma. I reverted, tweaked the script and let Monkbot ; this time successful (this reedit makes the total trial edit count 501).
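A minimal sketch of the kind of cleanup described here, written in Python for illustration; the actual fix was made in the AWB script, and the names `LEADING_CONJUNCTION` and `clean_names` are hypothetical.

```python
import re

# A leading "and" (as a whole word) or "&" in front of a name.
LEADING_CONJUNCTION = re.compile(r"^(?:and\b|&)\s*", re.IGNORECASE)


def clean_names(value):
    """Split a semicolon-delimited coauthors value and drop a leading
    'and' or '&' from each name; empty pieces are discarded."""
    names = []
    for piece in value.split(";"):
        name = LEADING_CONJUNCTION.sub("", piece.strip()).strip()
        if name:
            names.append(name)
    return names
```

The word-boundary check keeps surnames that merely begin with "And", such as Anderson, intact.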

Not a bad edit by task 5, but rather an editor's.

where the editor's choice mystifies Monkbot:
 * McBurnie MA, Newman A, Tracy RP, Kop WJ, Hirsch CH, Gottdiener J, Fried LP; Cardiovascular Health Study
 * → McBurnie MA, Newman A, Tracy RP, Kop WJ, Hirsch CH, Gottdiener J, Fried LPCardiovascular Health Study

In this case, it looks like the editor merely copy/pasted the author list from Pubmed:. Still, I reverted, tweaked the script. All rules enabled for ten edits, Monkbot reedited with. From this point through edit 150, only the multiple coauthors rules were enabled.

For edit 151, I disabled all rules except the 9 coauthor rule in order to make sure to find a. After which, all rules were enabled for the duration of the test. I found no other questionable edits.

The edits are listed at Special:Contributions/Monkbot beginning at 11:30, 12 April 2014 and ending at 15:09, 12 April 2014 (times in UTC) and have this edit summary: Task 2: Fix CS1 deprecated coauthor parameter errors (bot trial). Also edit summary search results.

—Trappist the monk (talk) 15:29, 12 April 2014 (UTC)

—Trappist the monk (talk) 11:54, 27 April 2014 (UTC)
 *  MBisanz  talk 05:02, 4 May 2014 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.