Wikipedia:Bots/Requests for approval/GreenC bot 5


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

GreenC bot 5
Operator:

Time filed: 02:52, Tuesday, April 24, 2018 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): BotWikiAwk

Source code available: accdate.awk

Function overview: The proposal is for 'accdate bot' to remove access-date from citations in the tracking category using targeted strategies.

Links to relevant discussions (where appropriate): Help_talk:Citation_Style_1 - also CS1 documentation which supports use of access-date for url only.

Edit period(s): one-time run during first pass as standalone bot; then semi-continually as part of a module of WaybackMedic

Estimated number of pages affected: 25,000 (57% of 43,719)

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details:

Of the Category:CS1 errors, the tracking category with the most entries is (43,719). There is no silver bullet solution to clearing the cat, so this will break it down by targeting known types of problems within that category. There have been many discussions about it over the years.

The proposal is for 'accdate bot' to remove access-date from citations in the tracking category using the following strategies:


 * 1. Remove accessdate in CS1|2 templates that don't have a url but do have a value assigned to any of the various 'permanent-record' identifiers. Excluding templates, , and . Normally isbn would be excluded from the identifier list, but if a it would be included.
 * 2. Remove accessdate in, and  with no url. Per the documentation, "Access dates are not required for links to published research papers, published books, or news articles with publication dates." If a publication date is provided, remove accessdate.
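The two strategies above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the bot's accdate.awk source. The identifier list, the template sets, and all function names are hypothetical placeholders, because the template names are elided in the text above. It also assumes citations with no nested templates or piped wikilinks.

```python
# Hypothetical 'permanent-record' identifiers; the bot's real list is not given.
# isbn is left out, as the description says it normally would be.
PERMANENT_IDS = {"doi", "pmid", "pmc", "jstor", "bibcode", "arxiv", "zbl"}
ACCESS_DATE_NAMES = {"accessdate", "access-date"}
DATE_NAMES = {"date", "year"}

# Placeholders: the template names for both rules are elided in the proposal.
ID_RULE_EXCLUDED = set()
DATE_RULE_TEMPLATES = {"cite book", "cite journal", "cite news"}


def split_params(template):
    """Split '{{cite x |a=1 |b=2}}' into (head, [raw parameter strings])."""
    head, *params = template[2:-2].split("|")
    return head, params


def param_name(raw):
    """Return the lower-cased parameter name of a raw 'name=value' string."""
    return raw.partition("=")[0].strip().lower()


def strip_access_date(template):
    """Remove access-date when there is no url and rule 1 or rule 2 applies."""
    head, params = split_params(template)
    values = {param_name(p): p.partition("=")[2].strip() for p in params if "=" in p}

    if values.get("url"):
        return template  # a URL is present, so access-date is legitimate
    if not any(values.get(n) for n in ACCESS_DATE_NAMES):
        return template  # nothing to remove

    name = head.strip().lower()
    has_id = any(values.get(n) for n in PERMANENT_IDS)
    has_date = any(values.get(n) for n in DATE_NAMES)

    rule1 = name not in ID_RULE_EXCLUDED and has_id       # strategy 1
    rule2 = name in DATE_RULE_TEMPLATES and has_date      # strategy 2
    if not (rule1 or rule2):
        return template

    # Other parameters are left untouched, including their spacing.
    kept = [p for p in params if param_name(p) not in ACCESS_DATE_NAMES]
    return "{{" + "|".join([head] + kept) + "}}"
```

In this sketch an empty `|url=` counts as "no url" and is left in place; only the access-date parameter is dropped.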

Discussion

 * The bot has been updated to the specifications above. A dry run of 1,000 articles found a fix in 574, or about 57%. The total number of cites fixed is 1,165; of those, 1,121 are of type #2 and 44 are of type #1. I manually checked about 100 diffs offline and don't see any problems, but will manually check these 574 once they are uploaded, or whatever number is approved for trial. --  Green  C  15:44, 1 May 2018 (UTC)


 * See User talk:CitationCleanerBot/Archive 1. You may want to link to that, or provide a similar explanation on the bot's page, because those questions will happen A LOT. Headbomb {t · c · p · b} 16:39, 1 May 2018 (UTC)
 * Agreed, it's a good idea to have a FAQ, since access-date is a common source of confusion: what it's for and why it exists. --  Green  C  04:11, 3 May 2018 (UTC)

Since no one from BAG seems interested in this, I'll take it despite having been involved in the discussion a bit. Headbomb {t · c · p · b} 16:45, 16 May 2018 (UTC)
 * Edits (toolserver). Or Special:Contributions of May 7. -- Green  C  21:06, 17 May 2018 (UTC)
 * The edits look good to me. A very minor cosmetic issue: for edits like these where the accessdate parameter is the last parameter in the citation, ideally the bot should also be removing the white space in front of the pipe character rather than leaving some extra white space at the end of the citation. —RP88 (talk) 21:37, 17 May 2018 (UTC)
 * The space is there because the preceding argument has a trailing space and the bot leaves other arguments alone for safety. I understand personal preferences for spacing, but I can't program for every contingency; cites are often a mix of spacing styles. Whether removing the preceding argument's trailing space is always the right decision, I don't know. Arguably in this case the spacing is consistent because every other argument has both a leading and trailing space. The bot retained the existing style, though that was a coincidence. --  Green  C  22:15, 17 May 2018 (UTC)
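To make the spacing point concrete, here is a minimal Python sketch (again not the bot's code; the regex and the `trim_trailing` flag are assumptions introduced for illustration). Removing only the access-date parameter leaves the preceding argument's trailing space in place, while the optional trim shows the alternative behaviour suggested above.

```python
import re

# Matches '|accessdate=...' or '|access-date=...' up to the next pipe or the
# closing braces. Assumes no nested templates inside the parameter value.
ACCESS_DATE = re.compile(r"\|\s*access-?date\s*=[^|{}]*(?=\||\}\})", re.IGNORECASE)


def remove_access_date(cite, trim_trailing=False):
    """Drop the access-date parameter; optionally trim whitespace before '}}'."""
    out = ACCESS_DATE.sub("", cite)
    if trim_trailing:
        # This touches the preceding argument's trailing space.
        out = re.sub(r"\s+\}\}$", "}}", out)
    return out


cite = "{{cite book | title = X | year = 1999 | accessdate = 1 May 2018 }}"
print(remove_access_date(cite))                      # keeps the space after 1999
print(remove_access_date(cite, trim_trailing=True))  # removes it
```

The default keeps every other argument byte-for-byte as written, which is the conservative choice described in the comment above; the trim variant edits text outside the removed parameter.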

In edits like these (and I could pick several examples), the bot also removes empty url parameters, and I do not see the wisdom in doing that. This discourages finding free URLs and makes it (slightly) harder to add them. Empty parameters should be left alone. Headbomb {t · c · p · b} 16:23, 18 May 2018 (UTC)
 * I concur with User:Trappist the monk in the discussion, and also generally about removing them when they might cause confusion - in this case empty url have actually created some of the problem this bot is attempting to resolve. There is no evidence empty arguments encourage users to fill them in (nudge theory); there's no way future editors can know why the empty argument exists: did it once have something and was deleted? Was the citation copy-pasted in with other empty args and lazily the empties were kept? Was it always empty? There's no nudge factor because there are so many possibilities of why it exists. If the empty url included a wikicomment saying "A URL might exist; please fill me in, or delete this notice and empty arg" that would be more clear. Do we want to do it? It seems like it would be true for any citation without a url and goes down the rabbit hole of trying to direct users what to do.  --  Green  C  18:09, 18 May 2018 (UTC)


 * By that rationale, every empty parameter should be removed, and that's not something I feel bots should be doing, save in fairly controlled situations, or with strong consensus to do so (in which case the functionality could be implemented in AWB). I picked a clean edit, but I could have picked an edit where the bot removed an empty url parameter, but left a slew of other empty parameters alone (jstor/zbl/etc...) such as . The problem the bot is trying to solve is stray accessdates, so it should stick to that IMO. Open to other BAG opinion here since I'm partly involved here. I will point out that in the discussion that led to this, no one suggested/supported removing empty url parameters from citations. Headbomb {t · c · p · b} 18:18, 18 May 2018 (UTC)
 * "In this case empty url have actually created some of the problem this bot is attempting to resolve." Removal is relevant to the purpose of the bot, and it's limited to the citation it edits as a secondary - it doesn't seek out other empty arguments in other citations. To nudge the community to do things with signals of encouragement is not the bot's intention. OTOH removal of url within the citations it edits is relevant to the bot's purpose. -- Green  C  19:41, 18 May 2018 (UTC)
 * Personally, I have no issue with removal. The empty args are a waste of space and accomplish nothing from my viewpoint. Also, basic bots working with cite templates may encounter issues with empty URL parameters, though good coding can easily work around that.— CYBERPOWER  ( Chat ) 20:30, 18 May 2018 (UTC)
 * WP:COSMETICBOT says «changes that do not [change output] are typically considered cosmetic». Sometimes this means that it's taken for granted they can be performed alongside bigger changes, sometimes it means they raise more complaints than the bigger change. :) --Nemo 23:58, 18 May 2018 (UTC)

To be clear I'm recusing myself from making the final call here. I have listed some objections above, but I'll note for the record they are not a personal deal breaker for me, simply a concern I have. Headbomb {t · c · p · b} 16:37, 1 June 2018 (UTC)
 * I have no issues with the removal of |url=, as mentions above, they can cause issues.   SQL Query me!  15:53, 28 August 2018 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.