Wikipedia:Bots/Requests for approval/Monkbot 1


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Monkbot 1
Operator:

Time filed: 14:49, Saturday January 4, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AutoWikiBrowser

Source code available: User:Monkbot/CS1 deprecated parameters (AWB)

Function overview: Concatenate the values of individual, adjacent template parameters (date or day, with month and year) into a new date value. Replace the source parameters with the single date parameter.

Links to relevant discussions (where appropriate): Help talk:Citation Style 1/Archive 4

Edit period(s): In bursts

Estimated number of pages affected: The bot will be run through the pages listed at which at the time of this request contained 163,762 pages.

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): No

Function details: Citation Style 1 templates utilize either the wiki-markup or the newer Lua Module:Citation/CS1 engines to render individual citations in a consistent manner. This script does not modify templates that use because  does not support date parameters with a CITEREF disambiguator.

As I understand it, the parameters day, month and year were created to overcome limitations in the MediaWiki  function. The specific reasons are somewhat hazy. Whatever the problem with, it has been resolved, rendering the parameters day and month unnecessary. Parameter day has been deprecated for quite some time and month is recently deprecated – both because they are no longer required to serve their original intended purpose. The parameter year is still required for those CS1-based templates that are used with short form citations that use and the  family of templates.

This script mimics the actions taken by the various CS1 templates that use and by Module:Citation/CS1. In all of these cases, the values from day, month, and year are concatenated into a WP:DATESNO compliant dmy format date which is then used for display. Often, CS1 citations contain date, month, and year where date is a 1- or 2-digit day number. I suspect that this is caused by the template as produced by the enhanced editing toolbar – editors fill in the month, year and date fields assuming that date means day. When date is present and has a value, and Module:Citation/CS1 use that value for the citation's rendered date and ignore month and year. When date contains a 1- or 2-digit number, that is the displayed date.
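The precedence described above can be illustrated with a minimal Python sketch. It is not the templates' or the module's actual code; the function name `rendered_date` and the dictionary of parameters are assumptions made for illustration only.

```python
def rendered_date(params):
    """Return the date string a CS1-style renderer would display (illustrative only)."""
    date = params.get("date", "").strip()
    if date:
        # A non-empty date wins; month and year are ignored.
        return date
    # Otherwise day, month and year are joined in dmy order, skipping empty values.
    parts = [params.get(name, "").strip() for name in ("day", "month", "year")]
    return " ".join(part for part in parts if part)
```

With this sketch, a citation that has date set to a bare day number alongside month and year displays only that number, which is the mistake the bot task is meant to repair.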

Monkbot task 1 looks for Module:Citation/CS1-based templates that have adjacent (in any order):
 * date and month and year
 * day and month and year
 * month and year

The individual parameters are further constrained:
 * date and day must be a 1- or 2-digit number;
 * month may be a single month, season, or gibberish text – the content is not evaluated except to determine if:
 * month represents a range of months or seasons where the two members of the range are separated by spaced or unspaced hyphen, solidus, endash, or the html entity, or,
 * month contains a leading or trailing 1- or 2-digit day number – where this occurs the day number is extracted and, with the month text, concatenated with the content of year;
 * year must be a 3- or 4-digit number with or without a single lowercase alpha character for use as a CITEREF disambiguator to be used with short form referencing templates and the  family.
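The matching rules above can be sketched in Python as follows. This is not the AWB rule set itself (that is at the page linked under "Source code available"), and it uses Python's `re` module rather than the .NET regex engine that AWB uses. The names `RUN`, `FIELD`, `merge_run` and `merge_date_parameters` are hypothetical. The sketch assumes that parameter values contain no pipes or braces, that a year disambiguator is simply carried into the new value, and it omits the handling of day numbers embedded in month. It also does not check that the parameters sit inside a CS1 template.

```python
import re

NAMES = "date|day|month|year"
# Two or three adjacent "|name=value" parameters drawn from NAMES, in any order.
RUN = re.compile(r"(?:\|\s*(?:%s)\s*=[^|{}]*){2,3}" % NAMES)
FIELD = re.compile(r"\|\s*(%s)\s*=([^|{}]*)" % NAMES)
DAY = re.compile(r"\d{1,2}")
YEAR = re.compile(r"\d{3,4}[a-z]?")
ACCEPTED = ({"date", "month", "year"}, {"day", "month", "year"}, {"month", "year"})


def merge_run(match):
    """Replace one run of adjacent parameters with a single date, or leave it alone."""
    original = match.group(0)
    pairs = [(name, value.strip()) for name, value in FIELD.findall(original)]
    fields = dict(pairs)
    # Skip duplicated names and combinations outside the three accepted sets.
    if len(fields) != len(pairs) or set(fields) not in ACCEPTED:
        return original
    if not fields["month"] or not YEAR.fullmatch(fields["year"]):
        return original
    parts = [fields["month"], fields["year"]]
    day_name = "date" if "date" in fields else "day" if "day" in fields else None
    if day_name:
        if not DAY.fullmatch(fields[day_name]):
            return original
        parts.insert(0, fields[day_name])
    # Whitespace around the original parameters is not preserved.
    return "|date=" + " ".join(parts)


def merge_date_parameters(wikitext):
    return RUN.sub(merge_run, wikitext)
```

Because the month text is copied without evaluation, a call such as `merge_date_parameters("{{cite web|day=99|month=Nosuchmonth|year=2525}}")` yields a citation with `date=99 Nosuchmonth 2525`.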

The script does not check for spelling, capitalization, or for rational dates: 99 Nosuchmonth 2525 produces 99 Nosuchmonth 2525. It is anticipated that the script will create date values that have improper format, spelling, punctuation, capitalization, etc. These malformed dates are most likely the result of malformed original data and not flaws in the script. Such errors are detectable by Module:Citation/CS1 and will be added to. There are other bots that operate on the pages listed there and which are designed to make appropriate repairs (see BattyBot task 25).

It is not anticipated that this bot will do general fixes.

Discussion
"The script does not check for spelling, capitalization, or for rational dates." It seems pretty straightforward to check for those (unless you are using just AWB search-replace, but even then some clever regex would do). So the bot can exclude things like  or   or even. In many cases, it becomes harder to look for these once you merge them. I expect (i.e. have encountered with bot work) a lot of these, especially from 160k pages. — HELL KNOWZ  ▎TALK 15:04, 4 January 2014 (UTC)


 * The script is an AWB regex find and replace.


 * Re:  The script produces this (presuming that YYYY precedes date):
 * – the new date parameter is no more broken than it was before; the citation no longer causes the page to be part of . Script now ignores citations like this.


 * Re: : If the parameter order is year day month or month day year nothing changes because month and year are not adjacent to each other and the 4-digit day value causes the match to fail.


 * In the other four cases, dmy, myd, ymd, dym, month and year are adjacent so other regex patterns intended for templates with only month and year match those parameters and ignore day. The script produces this (assuming Month and YYYY):
 * →  – same when source month and year are transposed
 * →  – same when source month and year are transposed
 * The script ignores citations that contain year, month, and date or day but failed a match because day / date wasn't 1 or 2 digits.


 * Re: : Ignored when month precedes year because the extraneous text is not expected.  When year precedes month the script produces this (assumes YYYY):
 * – the intent of the extraneous text is lost. Script now ignores citations like this.


 * I have had no success in concocting a regex pattern that would prevent a match when month contains extraneous text. If there is a way and someone out there knows what it is, please share.


 * Is this from a real citation? I can think of no reason why month should not be part of date.  Module:Citation/CS1 and all of the remaining CS1 templates that use  concatenate the content of month and year to create the displayed date.


 * —Trappist the monk (talk) 20:32, 4 January 2014 (UTC)


 * So you are not doing any kind of field checking? What if there is a date already, or what if there are several year fields, or fields just aren't next to each other? Personally, I don't think AWB+Regex is the right tool for this. — HELL KNOWZ  ▎TALK 20:57, 4 January 2014 (UTC)
 * Try changing the end of your find statement from  to   - I believe this will skip citations with extraneous text as in the example above.  I also suggest you use an edit summary that provides a link where editors who don't know what "CS1 deprecated date parameter errors" are could get more information, such as "Fix CS1 deprecated date parameter errors".
 * Looking at the code, if the fields aren't next to each other, it appears the bot wouldn't change it. GoingBatty (talk) 23:04, 4 January 2014 (UTC)


 * Changed the edit summary. Your suggested fix doesn't solve the problem.  I think that what needs to happen is that everything between the equals sign that follows the parameter label and the next pipe symbol (less leading and trailing white space) is captured.  There is one exception: when something enclosed in html remark tags follows the "month/season" text, the entire match should fail and the script should ignore the citation.
 * → the capture is:
 * → should fail to match so that the script does nothing with this citation


 * The purpose of capturing everything between the = and | (less leading and trailing white space) is to keep parts of a month together if they should have gotten separated somehow: Dec ember.


 * I have not noodled this out. Surely there is a way to do it.


 * —Trappist the monk (talk) 19:39, 5 January 2014 (UTC)
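One way to express the requirement above is sketched here in Python. It is not the rule that was eventually adopted (which, per the reply that follows, relies on word boundaries), and it uses Python's `re` syntax rather than AWB's .NET regex. The pattern name `MONTH` is hypothetical. The sketch excludes `<` from the captured value and requires the value to be followed directly by a pipe or the closing braces, so a trailing html remark makes the whole match fail.

```python
import re

# Capture everything between "=" and the next pipe (or the closing braces),
# less leading and trailing white space. "<" is excluded from the value, so a
# trailing <!-- remark --> leaves the lookahead unsatisfied and nothing matches.
MONTH = re.compile(r"\|\s*month\s*=\s*([^|<{}]*?)\s*(?=\||\}\})")

match = MONTH.search("|month= Dec ember |year=2013")
captured = match.group(1) if match else None

with_remark = MONTH.search("|month=June <!-- approx --> |year=2013")
```

Here `captured` keeps the separated parts of the month together, and `with_remark` is `None`, so a script built on this pattern would leave the second citation alone.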
 * - OK, load User:GoingBatty/Monkbot settings and try the rule marked "GB ydm cite xxx" on User:GoingBatty/Monkbot tests. GoingBatty (talk) 23:16, 5 January 2014 (UTC)


 * Ding! Ding! Ding! I was just beginning to wonder about what word boundaries meant and if it could be used to solve this problem and here you are with the answer.  I changed the capture   to   so that full stops in the month value would be copied into date.  It could probably be left as you did it so that BattyBot 25 wouldn't need to repair that citation.


 * I have since made 200+ supervised edits with the new script.


 * —Trappist the monk (talk) 15:03, 6 January 2014 (UTC)


 * Tweaked to replace hyphen, solidus, html  entity in month ranges with endash.  Also, when abbreviated months are followed by a terminal period, the period is removed.


 * —Trappist the monk (talk) 16:27, 8 January 2014 (UTC)
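The tweak described above might look like the following Python sketch. It is an illustration, not the AWB rule. The function name `normalize_month` and the abbreviation list are assumptions, the endash is assumed to be unspaced, and the html entity mentioned above is left out because the text does not name it.

```python
import re

# Abbreviated month names followed by a period (assumed list; "Sept" before "Sep").
ABBREVIATED = re.compile(r"\b(Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sept|Sep|Oct|Nov|Dec)\.")
# Spaced or unspaced hyphen or solidus between the two members of a range.
RANGE_SEPARATOR = re.compile(r"\s*[-/]\s*")


def normalize_month(month):
    """Tidy a month value before it is combined with the year (illustrative only)."""
    month = ABBREVIATED.sub(r"\1", month.strip())
    return RANGE_SEPARATOR.sub("\u2013", month)
```

This naive version converts every hyphen or solidus in the value, so it would also alter text that is not a range.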
 * I have checked 50 or so of these supervised edits. I found no errors and no cause for concern. It appears to do what it says on the tin. If it merges parameters that result in an invalid date, BattyBot task 25 or a human editor will clean it up. – Jonesey95 (talk) 14:18, 10 January 2014 (UTC)
 * Leaving things for other editors/bots to fix is something we don't approve unless there are special circumstances. — HELL KNOWZ  ▎TALK 14:27, 10 January 2014 (UTC)
 * I will rephrase in an attempt at being more clear: This bot does not appear to create new errors. If there is already an invalid date, this bot will not fix that error. It fixes only the deprecated parameter error, which allows it to be a focused bot with limited complexity (i.e. it has a lower chance of unexpected and undesired output). Fixing invalid dates is the purview of a bot that is already approved and active. – Jonesey95 (talk) 18:22, 10 January 2014 (UTC)

— HELL KNOWZ  ▎TALK 14:27, 10 January 2014 (UTC)

Comment: I believe that this bot should operate only in the Article namespace, at least at first. I am new to BRFA and don't see a standard header for the BRFA request form that asks about namespaces. Is it assumed that all new bots will operate only in the Article namespace? What is the right venue for this question (I assume it's not this page)? Thanks. – Jonesey95 (talk) 21:28, 10 January 2014 (UTC)
 * We usually assume it is article space. There is no syntax guide for any other space and they might have examples, tests, etc. that have nothing to do with article usage. Maybe the "number of pages affected" should really be just "pages affected" for namespaces and estimates. — HELL KNOWZ  ▎TALK 21:39, 10 January 2014 (UTC)


 * Module:Citation/CS1 excludes several different namespaces from which is the list of pages that Monkbot task 1 will work on.  The list of excluded namespaces is at the top of Module:Citation/CS1/Configuration in the table.


 * —Trappist the monk (talk) 01:50, 11 January 2014 (UTC)

Bot trial results
The bot has completed 200 edits. I checked the diffs for all of them. Here is what I observed:
 * I saw zero cases in which the bot made an erroneous edit.
 * The bot is able to detect (and combine with year to make a valid date) month names, season names, and month ranges like "March–April".
 * The bot preserves the original editor's version of valid month names and ranges. If the original month value is a valid abbreviated month like "Sep", that is preserved and combined with year to result in a date parameter with the same format as the original citation. The bot fixes minor problems that caused the original month values to result in CS1 date errors, thereby fixing two errors with one edit.
 * The bot edited at a rate of exactly 100 edits per hour for the first 100 edits, then at about 200 edits per hour for the second hundred edits.
I see no problems. Other editors may see something that I missed. – Jonesey95 (talk) 23:30, 11 January 2014 (UTC)

Special:Contributions/Monkbot which see.

Editor Jonesey95 is quick, ne? Those extra reliable eyes are much appreciated. Thanks for giving it a look.

I did not find any improper edits. I did, however, find a weakness in the script that allowed fixable citations to go unfixed. Cite note 8 should have been fixed with. That weakness has been fixed and the citation repaired by the script with.

Another weakness that I've observed is that the script doesn't recognize redirect CS1 names: is a redirect to  but it wasn't repaired. I'll research and add those names to the script.

—Trappist the monk (talk) 01:52, 12 January 2014 (UTC)


 * Ok, I'm not going to be adding CS1 redirects,, for example, has ,  has , etc.  Better to leave Monkbot task 1 as it is.


 * —Trappist the monk (talk) 12:48, 12 January 2014 (UTC)
 * If you turn on AWB's general fixes, that will also enable AWB's Template redirects functionality, which will convert those redirects for you. You could then set up your find & replace rules to run after general fixes (see AutoWikiBrowser/Order of procedures).  For example, try Lycoming ALF 502 with and without general fixes on.  GoingBatty (talk) 15:50, 12 January 2014 (UTC)


 * Thanks for that. But, because I am responsible for every change that Monkbot makes, I choose to not take responsibility for code someone else has developed.  And, while this trial is ongoing, verification of Monkbot is much easier when the only changes in a page are those made by Monkbot and not hidden amongst those made by AWB general fixes.


 * —Trappist the monk (talk) 18:25, 12 January 2014 (UTC)

For reference:
 * becomes
 * becomes
 * No whitespace around fields is preserved
Not saying these are issues, just pointing out. — HELL KNOWZ  ▎TALK 15:12, 12 January 2014 (UTC)


 * Correct. The regex does not capture the pattern   between the parameter identifier and the parameter value – there are two or three of those that could be captured; which one should it be?


 * —Trappist the monk (talk) 18:25, 12 January 2014 (UTC)


 * Ideally, all of them. But we have not required this (mostly). — HELL KNOWZ  ▎TALK 19:34, 12 January 2014 (UTC)

Can you please run it on 100 random pages from the category, not the first ones? Those ended up being the same groups: almost all are genera or chemicals/drugs, which all have almost the same syntax. — HELL KNOWZ  ▎TALK 15:12, 12 January 2014 (UTC)


 * Special:Contributions/Monkbot which see.


 * I made a list of about a thousand pages from various locations in . That was much more than I needed.  Still, perhaps what Monkbot edited is sufficiently random.  I found no errors, nor anything untoward.


 * —Trappist the monk (talk) 18:25, 12 January 2014 (UTC)
 * I checked all 100 of these edits and found zero erroneous edits. Nice work. – Jonesey95 (talk) 18:39, 12 January 2014 (UTC)

All edits checked, no issues. — HELL KNOWZ  ▎TALK 19:34, 12 January 2014 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.