Wikipedia:Bots/Requests for approval/Snotbot 4


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.

Snotbot 4
Operator: Snottywong

Time filed: 23:24, Monday March 14, 2011 (UTC)

Automatic or Manually assisted: Automatic

Programming language(s): Python

Source code available: Pywikipedia

Function overview: Fix duplicate references.

Links to relevant discussions (where appropriate): Per CITE. Let me know if centralized discussion is necessary for this task.

Edit period(s): Will run once to clear out the current backlog, then run intermittently thereafter if the backlog becomes large again.

Estimated number of pages affected: Current backlog is 5,589 per toolserver.

Exclusion compliant (Y/N): No

Already has a bot flag (Y/N): Yes

Function details: The bot will work off the list of articles provided at the reference duplication toolserver script. Specifically, it will check each article for duplicate references. If it finds multiple copies of the text:

it will replace the first instance with:

and it will replace all subsequent instances with:

Unless someone has a better idea for a ref naming scheme.
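The wikitext samples that originally accompanied the three steps above did not survive in this archive. As an illustration only, the described behaviour might be sketched as below. This is not Snotbot's actual source: the function name `group_duplicate_refs`, the `dupref` name prefix, and the choice to match only unnamed ref tags are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Matches an unnamed <ref>...</ref> pair. Refs that already carry a name
# (or any other attribute) and self-closing refs are deliberately skipped.
REF_RE = re.compile(r"<ref\s*>(.*?)</ref\s*>", re.DOTALL | re.IGNORECASE)


def group_duplicate_refs(wikitext, prefix="dupref"):
    """Name exact-duplicate unnamed refs.

    Returns the rewritten wikitext and the number of duplicate groups found.
    The naming scheme (prefix plus a counter) is hypothetical.
    """
    # Count how often each reference body occurs, compared verbatim
    # (no whitespace or case normalisation).
    counts = defaultdict(int)
    for match in REF_RE.finditer(wikitext):
        counts[match.group(1)] += 1

    names = {}    # reference body -> assigned name
    seen = set()  # bodies whose first, full instance was already written

    def replace(match):
        body = match.group(1)
        if counts[body] < 2:
            return match.group(0)  # unique reference: leave untouched
        if body not in names:
            names[body] = f"{prefix}{len(names) + 1}"
        name = names[body]
        if body not in seen:
            seen.add(body)
            return f'<ref name="{name}">{body}</ref>'  # first instance keeps the text
        return f'<ref name="{name}" />'                # later instances point back

    return REF_RE.sub(replace, wikitext), len(names)
```

A real implementation would also need to avoid names already used in the article and to skip ref-like text inside comments or nowiki sections; the sketch does neither.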

Discussion

 * In general, it is a good idea. The only issue I can think of is that, not very often, I'll use duplicate references when I want to use two different URL links to two different pages in the same book posted at Google. If the bot were to remove one of my two URLs, that would be a mistake. -- Uzma Gamal (talk) 10:04, 15 March 2011 (UTC)
 * I assume this deals with exact same content between refs (+- whitespace)? — HELL KNOWZ  ▎TALK 10:06, 15 March 2011 (UTC)
 * The use of named references is opposed by some editors, see Village pump (policy)/Archive 74 for one example (there are links to other examples in there too). Anomie⚔ 11:27, 15 March 2011 (UTC)
 * To be clear, the bot is only intended to find references that match exactly. It will find each pair of ref tags, take note of what's between them, and look for exact matches elsewhere in the article; references with even minor differences are left alone.  So, in both Uzma Gamal's case and the cases brought up at the VP thread, those potential issues will not be a problem.  I'll start a discussion to be sure though.  &mdash;SW&mdash; yak 14:23, 15 March 2011 (UTC)
 * Discussion started at Village pump (proposals)/Archive 108. &mdash;SW&mdash; squeal 16:57, 15 March 2011 (UTC)


 * Very strongly Oppose. Whether we should have named references or not (and correspondingly, whether we should have multiple footnotes at a single point) are matters of editorial judgment; an article repeating one reference exactly is not a problem - and will avoid other problems. The examples given in the discussion show clearly that the creator envisages only articles using web sources (for which the system of named footnotes is usually appropriate); but we have many articles which are not.  Septentrionalis PMAnderson 17:21, 16 March 2011 (UTC)
 * Naming references is an editorial decision, and not one that I am trying to interfere with. There is no logical editorial process that would lead to the decision to have multiple references in an article that are 100% identical.  There are plenty of reasons why you'd want to have multiple references to a single source, where each reference is slightly different (i.e. different page numbers, chapters, comments, quotes, etc.), and this bot will not affect those articles at all.  In the unlikely event that duplicate references were created as a result of a conscious editorial decision, then it was the result of a bad editorial decision which should be corrected.  Furthermore, the table that I posted at the village pump discussion clearly shows that the intention is not to only affect articles using web sources.  Several of the examples use the cite book template, which is clearly not for web sources.  The bot will not differentiate between varying types of sources, it will only look for identical wikitext between the tags.  I'm interested to learn about specific example cases where you believe the operation of this bot will cause problems.  &mdash;SW&mdash; gossip 17:56, 16 March 2011 (UTC)
 *  Absolutely oppose. [See comment at the end of this section] Of course there is a logical process which would lead to absolutely identical footnotes: citing precisely the same source at different points in the article. Print sources decrease this slightly with ibid. and loc. cit., but there is consensus that this is too dangerous for us, since any rearrangement may make these into errors - and they repeat ibid. anyway.  Septentrionalis PMAnderson 02:23, 18 March 2011 (UTC)
 * Please get a clue, if you are going to cite the same ref you should not be using ibid or anything like that, what you should be doing is creating a named reference and then using its name to refer to it. Anything else will lead to a fuck up. If you're using ibid and someone else adds an additional reference between your original source and the use of ibid, ibid now refers to the wrong ref. The person who decides to add a single ref should not have to worry about not breaking all the refs on a given page. ΔT The only constant 02:29, 18 March 2011 (UTC)
 * Please read WP:FOOTNOTE. Named footnotes are one solution; they are not mandatory; depending on the article, they may be distinct disimprovements. If this bot is equally badly written in other respects, it should be stopped summarily. Septentrionalis PMAnderson 02:34, 18 March 2011 (UTC)
 * Let me quote that back to you since you cannot seem to bother reading the full thing

'''"Do not use ibid., Id., or similar abbreviations in footnotes." '''
 * Not sure you can get much clearer than that. Thus your argument is dead. ΔT The only constant 02:41, 18 March 2011 (UTC)
 * There is now an expansive thread on Village pump (proposals)/Archive 108 with very broad support and a small but vocal minority of opposition (consisting of 2 editors). If you read through the entire thread, you'll see that none of your arguments are even remotely persuasive.  You can say all day long that "named footnotes may be distinct disimprovements" and that "sequentially numbered footnotes are sometimes preferable", but until you actually provide evidence that any of these things are true, your argument falls on deaf ears.  "Evidence" doesn't mean "your opinion", nor does it mean pointing to irrelevant external style guides.  I've said it too many times to count, but I'll say it one more time because it has not yet been refuted:  There is no logical reason to ever have duplicate references that have not been grouped by naming them.  There's just not.  Take a look at every FA on the site, I guarantee you won't find one example.  If you do find an example and point it out, it would be quickly fixed.  If you submitted an article to WP:FAR with duplicate references, it would not pass until they were fixed.  For these and so many other reasons, I still believe this bot should run its task.  I believe the thread on the village pump shows consensus to continue, and I hope someone from BAG will allow a trial to commence soon.  Also note that Pmanderson apparently has some kind of problem with me and/or my bot (despite never having run into him before to my knowledge), as he has taken it upon himself to oppose every bot request I have open at the moment.  &mdash;SW&mdash; express 03:09, 18 March 2011 (UTC)
 * See also Help:Footnotes, which does not mention that this is optional or subject to stylistic interpretation in any way. Case closed.  &mdash;SW&mdash; gab 03:46, 18 March 2011 (UTC)
 * Note that Pmanderson was just blocked for 1 week in an unrelated incident, and therefore won't be able to respond further. &mdash;SW&mdash; express 04:16, 18 March 2011 (UTC)

Oppose proposal. A few years ago it became popular to separate references in endnotes, and to use the same endnote when the same reference is used. This manner of citation has become very popular but it is not, as PMAnderson says, policy (a good thing too if you want to print articles with foot- rather than end-notes!). The passing of this proposal would give it the force of policy, and WP:BRFA doesn't have the power to do that. Incidentally, I really dislike the apparent bullying here. PMAnderson is entitled to his opinions without those whose proposals he opposes jumping on to his talk page and kicking him while he's down. @Snottywong, we know you support your own proposal; that you believe your own arguments are the best is no surprise either. The question here is what others think. Deacon of Pndapetzim ( Talk ) 18:00, 18 March 2011 (UTC)
 * I'm not sure I fully understand. If you have two endnotes which reference the exact same footnote, then why would it be problematic for those endnotes to be named and grouped?  &mdash;SW&mdash; gossip 18:31, 18 March 2011 (UTC)
 * I think I'm confused because I don't know exactly what you mean by "endnotes". Could you show me an example file that uses endnotes?  &mdash;SW&mdash; yak 20:39, 18 March 2011 (UTC)
 * I use the term endnotes because when you print out they are endnotes, not footnotes (which would appear at the bottom of each page rather than the end). The grouping of references in the same endnote is foreign to printed texts, a valid reason for an individual editor to wish to avoid their use. Deacon of Pndapetzim ( Talk ) 12:54, 20 March 2011 (UTC)
 * I've created some alternate proposals so that the bot won't interfere with articles that it shouldn't. Please see Village_pump_(proposals).  &mdash;SW&mdash; chat 13:57, 20 March 2011 (UTC)
 * Endnotes appear at the end of articles; footnotes at the bottom of each page. For articles which occupy one page (like our standard format) endnotes and footnotes have to be distinguished by typography; but see Pericles for the two different types. Septentrionalis PMAnderson 19:45, 25 March 2011 (UTC)


 * Comment. A question about this: these bots only add ref name when the repeated URL stands alone between ref tags, is that right? I ask because I often bundle citations, so that a URL might be repeated deliberately—but in those cases it never stands alone. SlimVirgin  TALK |  CONTRIBS 22:55, 18 March 2011 (UTC)
 * You are correct, this bot would not affect bundled citations. Basically, the bot looks at the raw wikitext between a pair of ref tags.  If the raw wikitext in between is identical, then it groups them.  Otherwise, it leaves it alone and moves on.  So, let's say you had a reference at the top of the article, and then you created a bundled citation which included the exact same reference as part of the bundle.  In this case, the bot would not touch either reference.  &mdash;SW&mdash; squeal 23:17, 18 March 2011 (UTC)

Stats
I ran an analysis on the first 500 articles from the toolserver list, and found the following statistics: &mdash;SW&mdash; speak 16:26, 18 March 2011 (UTC)
 * 672 distinct references were duplicated at least once.
 * A total of 1,820 duplications were detected.
 * The maximum number of times a reference was duplicated was 24.
 * The average number of times each duplicated reference was duplicated was 2.7.
 * 197 of the 500 articles (39%) already had at least one named reference.
 * Out of the 197 articles with named references, the average number of named references per article is 27.1.
 * Out of the 197 articles with named references, the maximum number of named references in an article is 466.
 * 156 articles had 5 or more named references.
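As a quick consistency check on the figures above, the quoted average and percentage can be recomputed from the raw counts. This is only a sanity-check sketch; the variable names are invented for the example, and the numbers are copied from the list.

```python
# Figures copied from the statistics list above.
distinct_duplicated = 672    # distinct references duplicated at least once
total_duplications = 1820    # total duplications detected
articles_sampled = 500
articles_with_named = 197    # articles already using at least one named reference

# Average number of duplications per duplicated reference.
average_duplications = total_duplications / distinct_duplicated

# Share of sampled articles that already had a named reference.
named_share = articles_with_named / articles_sampled

print(f"{average_duplications:.1f}")  # 2.7
print(f"{named_share:.0%}")           # 39%
```

Both values agree with the 2.7 and 39% reported in the list.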

The usual guideline that we have for these things is that editors (and bots) should not change from one optional style to another (WP:CITEVAR). The use of named references is completely optional, not required; as long as that remains the case, bots should not be changing the citation style. The right place to make changes to our requirements for inline citations is WP:CITE, not in a bot request. &mdash; Carl (CBM · talk) 17:52, 18 March 2011 (UTC)
 * WP:CITE doesn't actually say that it is optional, and Help:Footnotes clearly implies that it is not optional. &mdash;SW&mdash; spout 17:54, 18 March 2011 (UTC)
 * See also Footnotes which clearly describes how to deal with identical references. &mdash;SW&mdash; express 18:13, 18 March 2011 (UTC)
 * WP:FOOTNOTE says,
 * "However, some editors prefer to repeat the entire footnote, to prevent inadvertent removal of the only full copy of the reference, although this approach requires that updates to the footnote be made to all the footnote instances if all the instances are to reflect the current displayed text."
 * That pretty clearly describes a style in which duplicate footnotes are used instead of named references. WP:FOOTNOTE allows that style to be used. The Help: namespace is not part of the MOS, and is not well watched at all, so it can't really be used authoritatively. &mdash; Carl (CBM · talk) 18:57, 18 March 2011 (UTC)
 * Wrong. Read WP:FOOTNOTE again.  It's saying that if you're worried about the inadvertent removal of the only full copy of the reference, then you can still provide the entire reference within a named ref tag.  It would look like this:
 * Sam went to the store. He liked to go to the store on Monday.
 * You're using the exact same ref, and spelling the entire ref out both times, but since you've also named it both times, it only shows up as one ref in the references section. If this is a problem you're concerned about, it is easily addressable.  The bot can certainly be programmed to provide the entire reference each time instead of the shortened version.  However, if you read WP:FOOTNOTE closely, you'll see it is not encouraging anyone to allow duplicate identical references to appear in the references section.  &mdash;SW&mdash; talk 19:12, 18 March 2011 (UTC)
 * I think AWB actually removes the text from subsequent references with the same name; so if someone wanted the text to appear in each note they would achieve it by not using named references. &mdash; Carl (CBM · talk) 21:12, 18 March 2011 (UTC)
 * Ok, well that's a problem with AWB that needs to be fixed, but it has no bearing on this discussion. AWB shouldn't be removing the reference material from subsequent named references.  Obviously, the solution to that problem is not to start creating duplicate identical references, the solution is to fix AWB.  How is that not completely obvious?  &mdash;SW&mdash; chatter 21:21, 18 March 2011 (UTC)
 * AWB does that because of the practice of keeping the established citation style in each article... which might include duplicating some references. &mdash; Carl (CBM · talk) 01:18, 19 March 2011 (UTC)
 * I think Citation bot and Rjwilmsibot already do this too, probably because they use AWB, but I wanted to mention it. --Kumioko (talk) 14:15, 19 March 2011 (UTC)
 * If other bots are already doing this, then #1, this bot should be speedy approved per prior precedent, and #2, those bots must not be doing a great job because there are nearly 6,000 articles with duplicate references and that number grows by about 50 per day. It's depressing how difficult it is to make improvements to WP anymore.  It kinda makes me want to stop trying to make improvements at all.  &mdash;SW&mdash; converse 15:09, 19 March 2011 (UTC)
 * Yes, you're right, it's extremely hard. Personally, as long as this bot does not enforce a certain citation standard (which it doesn't appear to do), I don't see a problem with it. --Kumioko (talk) 19:39, 19 March 2011 (UTC)
 * See also Bots/Requests for approval/Citation bot 6. &mdash;SW&mdash; communicate 15:53, 19 March 2011 (UTC)
 * Sometimes BAG makes mistakes. &mdash; Carl (CBM · talk) 19:12, 19 March 2011 (UTC)


 * I have made some other suggestions at Village_pump_(proposals). I'd appreciate if any of the opposers could take a look at those suggestions and see if they eliminate your concerns.  &mdash;SW&mdash; prattle 02:11, 20 March 2011 (UTC)
 * The first suggestion addresses the most serious concerns; the other two are sound procedure, and should have been included to begin with. But this is still an unnecessary task, an attempt by one editor to impose stylistic uniformity, despite objections by editors who prefer another style. Oppose. Septentrionalis PMAnderson 19:45, 25 March 2011 (UTC)
 * No one intends to impose a stylistic uniformity, rather we are trying to fix thousands of unintentional mistakes. The comparatively minuscule quantity of articles which have intentionally used duplicate references as a stylistic choice will admittedly also be affected, but the amount of effort it will take to revert the bot's edits is tiny.  And, considering that the bot will be prohibited from editing any article on which it has been previously reverted, I think that the "collateral damage" caused by the bot is almost immeasurably small when compared to the improvements it will make.  &mdash;SW&mdash; converse 20:42, 25 March 2011 (UTC)

Function details update

 * Because some users have opposed the function of this bot, I have made some changes to the way it will operate to minimize any unwanted changes that the bot might perform (see the updates to the original bot request above). The function of the bot remains the same: it will look for identical text between tags and group duplicate references by naming them (as described above in the original bot request).  However, the bot will also be exclusion compliant and restricted to 1RR.
 * For exclusion compliance, it will not touch any articles which have or  in the wikitext of the article.
 * The bot will also keep track of which articles it modified the last time it ran. About a week or so after the last time it was run, it will check all of the articles it modified to see if there are still duplicate references.  For any articles that have duplicate references, I will manually inspect the history of the article to see if the bot's edits were actually reverted or if new duplicate references were accidentally introduced.  If the bot's edits were actually reverted, then that article will be placed in a blacklist that the bot will no longer edit.
 * I will also add some instructions to User:Snotbot on how to prevent Snotbot from modifying your article (i.e. by adding the bots template), and I will link to these instructions in each edit summary.
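The pre-edit check implied by the points above could look roughly like the sketch below. The template names are missing from this archived copy, so the `{{nobots}}` and `{{bots|deny=...}}` forms used here are an assumption based on the usual bot-exclusion convention, not a statement of what the operator implemented. The `may_edit` function and the in-memory `blacklist` set are likewise hypothetical.

```python
import re

BOT_NAME = "Snotbot"

# Assumed exclusion markers: {{nobots}} and {{bots|deny=Name1,Name2}}.
NOBOTS_RE = re.compile(r"\{\{\s*nobots\s*\}\}", re.IGNORECASE)
DENY_RE = re.compile(r"\{\{\s*bots\s*\|\s*deny\s*=\s*([^}|]*)", re.IGNORECASE)


def may_edit(title, wikitext, blacklist):
    """Return True if the bot is allowed to edit the article.

    `blacklist` holds titles where the operator confirmed, after manual
    review, that an earlier bot edit was reverted (the 1RR restriction).
    """
    if title in blacklist:
        return False
    if NOBOTS_RE.search(wikitext):
        return False
    match = DENY_RE.search(wikitext)
    if match:
        denied = {name.strip().lower() for name in match.group(1).split(",")}
        if "all" in denied or BOT_NAME.lower() in denied:
            return False
    return True
```

In this sketch the blacklist is only consulted, never populated: adding a title stays a manual step, matching the description that the operator inspects the article history before blacklisting.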

There was initial broad support for the bot with minimal (but vocal) opposition. I then amended the proposal for the bot to include exclusion compliance and the 1RR requirement, and there was more support for it (see Village_pump_(proposals)). I have posted a notice here as well as contacted the opposers directly, asking them to comment on the amended proposal, but none of them have done so. I take that to mean that they have no further opposition to the amended proposal. I'd like to get a ruling from BAG as to whether this bot can be approved for a trial, or if I should stop wasting my time on it. Thanks. BAG assistance needed &mdash;SW&mdash; chatter 23:00, 21 March 2011 (UTC)


 * This is a good task for automation, in that manually performing such edits is time consuming and error prone, and is widely needed (see the stats). The few cases where existing editors are "negatively" impacted by the operation of this bot can be easily handled manually, as the bot will not return to the pages in question. Snottywong appears to be a responsible, and responsive, bot operator; I therefore see no significant hurdles to approving this task. — V = IR (Talk • Contribs) 02:29, 24 March 2011 (UTC)
 * And more support is pouring into the VP thread lately. &mdash;SW&mdash; speak 17:32, 29 March 2011 (UTC)

To sum up what I gather, I see rough consensus for the initial task and support for subsequent restrictions. This is a long BRFA with several parallel threads, so pardon me if I oversimplify the issues. The proposal is based on the points (1) "grouped references are more readable", (2) "there is no reason/value in duplicating reference content", and (3) "existing cases are almost exclusively mistakes or editor didn't know how to group them". The opposition points seem to be (4) "sequential references are more readable", (5) "naming and consolidating duplicate references isn't required by MoS", and (6) "not using duplicate references has been an editorial decision (CITEVAR)".

I believe that "Restrict the bot to working on articles that already use named references." would answer the last two opposition concerns -- for (6) per CITEVAR the existing style would be to use the named references and for (5) the bot wouldn't be the one introducing new style, as it would already be introduced by previous editors. With that, I believe there would be sufficient support for at least a lengthier trial run to see editor response to the edits. — HELL KNOWZ  ▎TALK 18:35, 29 March 2011 (UTC)


 * Restricting the bot from working on articles that don't already use named references was one suggestion put forward to address opposition issues, but I think that there were several supporters who noted that going to that length was unnecessary if the bot was going to check for reversions and be exclusion compliant. The problem with that restriction is that it drastically limits the usefulness of the bot, as it almost completely prohibits the bot from operating on articles in category #3: existing cases which are almost exclusively mistakes or the editor didn't know how to group them.  Particularly for newer articles, if the primary author didn't know that grouping references was an option, then the article almost certainly won't have grouped references and therefore wouldn't get fixed by the bot under this restriction.  Could I suggest that we do some extended trials without that restriction (but with the updated Function Details above) and see if any editors complain?  &mdash;SW&mdash; comment 19:30, 29 March 2011 (UTC)

(It seems this request has stalled...) BAG assistance needed &mdash;SW&mdash; spill the beans 18:15, 19 April 2011 (UTC)


 * Question from SpinningSpark: if an editor informs you that they think your bot has made a mistake, what action will you take?  SpinningSpark  18:20, 21 April 2011 (UTC)
 * Is this a trick question? I will take whatever action is required per WP:BOTPOL.  &mdash;SW&mdash; communicate 19:06, 21 April 2011 (UTC)
 * No, it was not a trick question, and regarding your edit summary "stop wasting everyone's time" I do not consider it a waste of time to determine whether or not bot operators actually understand the requirements of WP:BOTPOL as opposed to merely knowing that it exists.  SpinningSpark  00:23, 22 April 2011 (UTC)
 * It is a waste of people's time to ask editors if they intend to adhere to policy. Wikipedia is not a bureaucracy.  You don't need to ask me (and every other editor with an open BRFA) if I intend to adhere to policy so that you have a paper trail to point to when you want to block me.  Policy is policy.  If I'm violating policy, block me.  No pre-emptive questions required.  I understand you recently had a problem with another bot owner that ended up on ANI, but that is no reason to take it out on uninvolved editors who have done nothing wrong.  &mdash;SW&mdash; communicate 00:40, 22 April 2011 (UTC)

How will you name the references? To address at least some of the discussion concerns, the name has to be meaningful, not "duplicateref1". As an example, you could use the domain name for bare urls, author/year for citations, maybe parse text-only ones for common syntax. It would also be great to have a report file for all refs where the bot couldn't figure out the best name, so you can review them later and add more rules. That would certainly go a long way to editor-friendly naming, which is what most of the opposition states as one of the arguments. In fact, you could even "fix" references called "duplicateref1" in the future. — HELL KNOWZ  ▎TALK 09:13, 22 May 2011 (UTC)
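One way the naming rules suggested above might be prototyped is sketched below: author and year for citation templates, the domain for bare URLs, and no name (for a later report) when neither rule applies. Everything here is illustrative. The function name `suggest_ref_name`, the `last`/`year` parameter names, and the regular expressions are assumptions for the example, and nothing of the sort was confirmed as implemented in this request.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns: the first URL in the reference, and the
# last=/year= parameters of a citation template.
URL_RE = re.compile(r"https?://[^\s\]|}<]+")
LAST_RE = re.compile(r"\|\s*last\s*=\s*([^|}]+)")
YEAR_RE = re.compile(r"\|\s*year\s*=\s*(\d{4})")


def suggest_ref_name(body):
    """Return a readable name for a reference body, or None if no rule applies."""
    last = LAST_RE.search(body)
    year = YEAR_RE.search(body)
    if last and year:
        # Citation template with an author surname and year, e.g. "Smith2001".
        return f"{last.group(1).strip()}{year.group(1)}"
    url = URL_RE.search(body)
    if url:
        # Bare or bracketed URL: fall back to the host name without "www.".
        host = urlparse(url.group(0)).netloc.lower()
        return host[4:] if host.startswith("www.") else host
    return None  # caller can log this body to a report file for manual review
```

Bodies for which the function returns `None` are the ones that would go into the suggested report file; a fuller version would also have to make names unique within an article and strip characters that are awkward inside a ref name.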

Any response to H above?  MBisanz  talk 00:11, 13 June 2011 (UTC)


 * Withdraw bot request, for now. I no longer have any time to dedicate to this task.  I might re-open a new BRFA on this at a later date.  &mdash;SW&mdash; comment 16:11, 19 June 2011 (UTC)
 * Anomie⚔ 18:39, 19 June 2011 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.