User talk:Citation bot/Archive 19

Unuseful edit?
Please see this (diff) edit. Changing an empty url to chapter-url does not seem to be really useful like this, combined with the fact that the only other change with the edit was ISSN --> issn. --Redalert2fan (talk) 14:33, 26 October 2019 (UTC)
 * Fixing minor typos and encouraging people to do the right thing does have value. But perhaps the change description should not treat ISSN to issn as a removal and an addition. Also, how did you activate the script, since it says “other”? AManWithNoPlan (talk) 15:33, 26 October 2019 (UTC)
 * https://tools.wmflabs.org/citations/index.html using the process page feature. Redalert2fan (talk) 15:43, 26 October 2019 (UTC)
 * I should fix that! AManWithNoPlan (talk) 15:44, 26 October 2019 (UTC)
 * UCB_Other fix https://github.com/ms609/citation-bot/pull/2219 AManWithNoPlan (talk) 16:27, 26 October 2019 (UTC)
 * Overly exaggerated edit summary fix (ISSN to issn no longer called a removal and an addition) https://github.com/ms609/citation-bot/pull/2220 AManWithNoPlan (talk) 16:33, 26 October 2019 (UTC)
 * fixed for the better. AManWithNoPlan (talk) 22:09, 26 October 2019 (UTC)
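
A minimal sketch of the edit-summary issue discussed above, assuming the summary is derived by comparing parameter names before and after the edit. The function name and the returned structure are hypothetical and are not taken from the bot's code; the point is only that names differing by case alone (ISSN to issn) are reported as a rename, not as a removal plus an addition.

```python
def summarize_parameter_changes(before, after):
    """Classify parameter-name changes, treating case-only changes as renames.

    `before` and `after` are lists of parameter names (hypothetical input shape).
    """
    before_by_key = {name.lower(): name for name in before}
    after_by_key = {name.lower(): name for name in after}

    # Names present only after the edit are genuine additions.
    added = sorted(after_by_key[k] for k in after_by_key.keys() - before_by_key.keys())
    # Names present only before the edit are genuine removals.
    removed = sorted(before_by_key[k] for k in before_by_key.keys() - after_by_key.keys())
    # Same name ignoring case, but spelled differently: a case-only rename.
    recased = sorted(
        (before_by_key[k], after_by_key[k])
        for k in before_by_key.keys() & after_by_key.keys()
        if before_by_key[k] != after_by_key[k]
    )
    return {"added": added, "removed": removed, "recased": recased}
```

With this shape, the edit in question would be summarised as one rename (ISSN to issn), one addition (chapter-url) and one removal (url).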

Character � added
That should be fixable. Odd use of a Unicode character. AManWithNoPlan (talk) 15:45, 26 October 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2223 AManWithNoPlan (talk) 22:14, 26 October 2019 (UTC)

Accept Terms and Conditions on JSTOR
Can the bot be programmed to fix this: to be changed to

Jonatan Svensson Glad (talk) 17:31, 26 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2222 and https://github.com/ms609/citation-bot/pull/2221 AManWithNoPlan (talk) 18:30, 26 October 2019 (UTC)

fixed

adds another HDL
https://github.com/ms609/citation-bot/pull/2225 AManWithNoPlan (talk) 23:51, 26 October 2019 (UTC)
 * Much better code now. Mostly fixed AManWithNoPlan (talk) 00:14, 27 October 2019 (UTC)
 * This will seal the deal https://github.com/ms609/citation-bot/pull/2226 AManWithNoPlan (talk) 00:21, 27 October 2019 (UTC)

support more RIS usages
 TY - CHAP
 TI - [Part Three: Introduction]
 T2 - Ladies of Soul

https://github.com/ms609/citation-bot/pull/2228 AManWithNoPlan (talk) 00:03, 28 October 2019 (UTC)

More JSTOR formats
https://github.com/ms609/citation-bot/pull/2227/ AManWithNoPlan (talk) 23:46, 27 October 2019 (UTC)

New handle
https://github.com/ms609/citation-bot/pull/2227/ AManWithNoPlan (talk) 23:46, 27 October 2019 (UTC)

Are you a robot?
See https://en.wikipedia.org/w/index.php?search=insource%3A%2FAre+you+a+robot%2F&title=Special%3ASearch&go=Go Jonatan Svensson Glad (talk) 01:24, 28 October 2019 (UTC)


 * why yes we are 🤣😂.  Will add to bad titles list. AManWithNoPlan (talk) 01:33, 28 October 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2227 AManWithNoPlan (talk) 01:40, 28 October 2019 (UTC)

URL
Is there any way to decrypt url to https://www.bloomberg.com/news/articles/2019-06-10/hong-kong-vows-to-pursue-extradition-bill-despite-huge-protest or at least blacklist extracting info from URLs with "tosv2.html"? Jonatan Svensson Glad (talk) 05:30, 28 October 2019 (UTC)


 * The other problem is that they prevent archive bots from archiving the page, so when the page dies there will be no archive. This might be better discussed at Village Pump technical to see if anyone has ideas for decryption or determining the underlying URL somehow. -- Green  C  17:51, 28 October 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2228 this will block the bot from looking at them. AManWithNoPlan (talk) 17:54, 28 October 2019 (UTC)


 * It's BASE64 encoded. VERY easy to decode. AManWithNoPlan (talk) 17:55, 28 October 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2234 AManWithNoPlan (talk) 21:40, 29 October 2019 (UTC)


 * fixed https://en.wikipedia.org/w/index.php?title=Tulsi_Gabbard&type=revision&diff=923664735&oldid=923664366 AManWithNoPlan (talk) 00:19, 30 October 2019 (UTC)
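
A rough sketch of the decoding step mentioned above. It assumes the interstitial "tosv2.html" link carries the original target Base64-encoded in a query parameter; the parameter name `url` and the function name are assumptions made for illustration, not details confirmed in this thread.

```python
import base64
from urllib.parse import urlparse, parse_qs


def decode_tosv2_target(url, param="url"):
    """Return the decoded target of a tosv2.html interstitial link, or None.

    `param` is an assumed query-parameter name holding the Base64 payload.
    """
    parsed = urlparse(url)
    if "tosv2.html" not in parsed.path:
        return None
    values = parse_qs(parsed.query).get(param)
    if not values:
        return None
    # parse_qs turns '+' into a space, so undo that before decoding.
    token = values[0].replace(" ", "+")
    # Restore any '=' padding that was stripped from the token.
    token += "=" * (-len(token) % 4)
    try:
        return base64.urlsafe_b64decode(token).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # Not valid Base64 or not text: leave the citation alone.
        return None
```

Returning `None` on any failure keeps the fallback simple: a URL that cannot be decoded can still be skipped entirely, as the earlier pull request does.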

Editor is not an author
This should be a blacklisted author. Jonatan Svensson Glad (talk) 22:02, 16 October 2019 (UTC)
 * Probably a good thing that wasn't blacklisted. That flags this citation as having bad data. (If we had simply said, don't allow "Editor" through, then the first name would be wrong, as it includes "Diplomatic".) --Izno (talk) 23:40, 16 October 2019 (UTC)

ISBN-10 and not ISBN-13 from Amazon URL
We don’t convert old ISBNs. We don’t get the year until late in the process, so I need to add another ISBN check at the end. AManWithNoPlan (talk) 00:45, 28 October 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2235 AManWithNoPlan (talk) 01:13, 30 October 2019 (UTC)
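
For reference, a small sketch of the conversion being discussed. The ISBN-13 check-digit arithmetic is standard; the year rule (convert only when the year is 2007 or later, when ISBN-13 became the standard) and both function names are assumptions made for illustration, not the bot's actual logic.

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10 or not all(c in "0123456789" for c in chars[:9]):
        raise ValueError("not an ISBN-10: %r" % isbn10)
    core = "978" + chars[:9]  # the old check digit is dropped
    # ISBN-13 weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


def upgrade_isbn_if_recent(isbn, year, cutoff=2007):
    """Upgrade only when the year is known and not before the assumed cutoff."""
    chars = isbn.replace("-", "").replace(" ", "")
    if year is None or year < cutoff or len(chars) != 10:
        return isbn
    return isbn10_to_isbn13(chars)
```

Because the decision depends on the year, a check like `upgrade_isbn_if_recent` only makes sense once the year has been filled in, which matches the point above about re-checking the ISBN at the end.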

Reuters x2
https://github.com/ms609/citation-bot/pull/2236 AManWithNoPlan (talk) 11:30, 30 October 2019 (UTC)

Ignore roman numeral 'parts' in title for title matching purposes
For instance, in the title is listed as XI.—On q-Functions and a certain Difference Operator. This should be treated as equivalent to On q-Functions and a certain Difference Operator. Headbomb {t · c · p · b} 04:33, 18 October 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2240 AManWithNoPlan (talk) 15:11, 31 October 2019 (UTC)


 * fixed AManWithNoPlan (talk) 15:42, 31 October 2019 (UTC)
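
A sketch of the kind of normalisation requested above, assuming the numbered "part" always appears as an upper-case Roman numeral followed by a period and a dash at the start of the title. The pattern and function names are illustrative only and are not taken from the pull request.

```python
import re

# Leading Roman numeral, a period, then an em dash, en dash or hyphen.
_ROMAN_PART_PREFIX = re.compile(r"^\s*[IVXLCDM]+\.\s*[\u2014\u2013-]\s*")


def normalize_title(title):
    """Drop a leading Roman-numeral part marker, then fold case and whitespace."""
    without_part = _ROMAN_PART_PREFIX.sub("", title)
    return " ".join(without_part.lower().split())


def titles_match(first, second):
    """Compare two titles while ignoring a leading Roman-numeral part marker."""
    return normalize_title(first) == normalize_title(second)
```

Requiring both the period and the dash keeps ordinary titles that merely start with capital letters such as "I" or "MD" from being truncated.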

More betterly cleaning up of garbage volumes/issues
https://github.com/ms609/citation-bot/pull/2241 AManWithNoPlan (talk) 15:52, 31 October 2019 (UTC)

MOS:FOREIGNTITLE violations
We capitalize Latin titles normally. The Études one would have been caught if the accent was used. The other ones should have their exceptions coded. Headbomb {t · c · p · b} 02:10, 7 November 2019 (UTC)
 * That said, TTR capitalizes itself normally (https://www.erudit.org/fr/revues/ttr/) Headbomb {t · c · p · b} 02:10, 7 November 2019 (UTC)
 * There are actually conflicting rules for this in Wikipedia's style guides. I will add exceptions. AManWithNoPlan (talk) 02:13, 7 November 2019 (UTC)

Fix broken doi
Unrelated, but WOW! The doi and the jstor are the same, but point to different websites. And that, my friends, is why they are not redundant identifiers. AManWithNoPlan (talk) 00:00, 8 November 2019 (UTC)
 * As I understand it, theoretically at least, where the doi goes can depend on who is asking (so that if the same resource is offered by different publishers, then different readers could be directed to the ones for which they have subscriptions). Anyway, in this case the right thing to do seems obvious, but how are we to know that some crazy publisher won't put # characters into their dois? The ones with parentheses, angle brackets, colons, and semicolons are bad enough. Maybe, if we are to do this sort of processing, there should be some sanity check that the pre-fix doi is broken and the post-fix doi is not? —David Eppstein (talk) 02:08, 8 November 2019 (UTC)
 * we do lots of sanity checking. It’s nuts.  Just added more. AManWithNoPlan (talk) 02:19, 8 November 2019 (UTC)
 * don’t forget DOIs with emojis in them 🤨 AManWithNoPlan (talk) 03:19, 8 November 2019 (UTC)
 * I wouldn't be surprised. —David Eppstein (talk) 05:57, 8 November 2019 (UTC)
 * The DOI always goes to the same URL for everyone on the official resolver. To get people to different URLs based on the DOI or other searches, universities use OpenURL or another resolver before the DOI.org resolution, or the publisher has its own DOI resolver after doi.org which might be doing anything (a few hundred publishers have one, and even CrossRef has no idea how many there are or what they're doing).
 * Additionally, Google Scholar has agreements with some universities to use/prefer their local URL resolver instead of the URL it would normally point to, for users connecting from institutional IP addresses. Hence, one might think they're clicking the "usual" publisher or GS-preferred link when they're actually clicking a link provided by the library. Nemo 07:37, 8 November 2019 (UTC)
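
The sanity check suggested earlier in this thread (only accept a repaired DOI if the original is broken and the repaired one is not) could look roughly like the sketch below. The `resolves` callable stands in for whatever lookup is used to test a DOI, and `strip_fragment` is a hypothetical repair step; neither is taken from the bot's code.

```python
def strip_fragment(doi):
    """Hypothetical repair: drop everything from the first '#' onward."""
    return doi.split("#", 1)[0]


def pick_doi(original, repaired, resolves):
    """Return `repaired` only when `original` fails to resolve and `repaired` succeeds.

    `resolves` is any callable taking a DOI string and returning True or False.
    """
    if original == repaired:
        return original
    if resolves(original):
        # The "odd-looking" DOI is genuine; do not touch it.
        return original
    if resolves(repaired):
        return repaired
    # Neither form works, so keep what the editor wrote.
    return original
```

Passing the resolver in as a parameter keeps the decision logic testable without network access, and it covers the case of a publisher that really does put a "#" in its DOIs.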

Replacement of URL with doi-parameter causes dead-link
Springer changes their URLs regularly, which is why DOIs are better in the long run. We have special code to make sure the above does actually work. SpringerLink lies to us. Will add more code. AManWithNoPlan (talk) 10:51, 19 October 2019 (UTC)

I have whined to Springer and CrossRef. AManWithNoPlan (talk) 11:54, 26 October 2019 (UTC)
 * Thank you. Since the broken springer-URL is referenced in hundreds of articles, how long do you expect me to wait for a fix to happen before I start restoring the original URL myself?  R fassbind  – talk  04:41, 29 October 2019 (UTC)
 * Publishers are slow dinosaurs. I would wait at least a month before starting to worry. Nemo 16:03, 29 October 2019 (UTC)

Caps: Geologiska Föreningen i Stockholm Förhandlingar

 * The Swedish word  is like the English word  . Jonatan Svensson Glad (talk) 20:44, 9 November 2019 (UTC)


 * a quick check of Reddit reveals Sweden does not exist https://www.reddit.com/r/finlandConspiracy/comments/8jceqb/finnish_propaganda_trying_to_get_us_to_think/?utm_source=amp&utm_medium=&utm_content=comments_view_all 🙄🙄🙄🙄🙄🙄.  I will work on this. AManWithNoPlan (talk) 01:03, 10 November 2019 (UTC)
 * Nah, that's Norway that has disappeared. Jonatan Svensson Glad (talk) 03:14, 10 November 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2246 AManWithNoPlan (talk) 16:53, 11 November 2019 (UTC)

Deliberate reference to review database erroneously converted into reference to the article it reviews

 * Don't provide the DOI/JSTOR then, since those are about the article it reviews, and not the review. Headbomb {t · c · p · b} 00:04, 14 November 2019 (UTC)
 * I didn't provide them. They were added erroneously by Citation bot in a pass a week earlier that I didn't catch until now.
 * That's the real bug then. Bot shouldn't add non-MR identifiers if Mathematical Reviews. Headbomb {t · c · p · b} 00:17, 14 November 2019 (UTC)
 * Or MathSciNet, or Mathematical Reviews, or Zentralblatt, or who knows how many other ways people might choose to write the same thing or how many other non-mathematical ones there might be that I haven't heard of. Instead of adding special rules like that, how about noticing that the journal name and author are totally different from the article the added data is for and not changing citations that violate those expectations? —David Eppstein (talk) 00:22, 14 November 2019 (UTC)
 * I'm sort of wondering if is appropriate for use with mr in this application.  The link created by 870473 links to this thing that MathSciNet calls Relay Station. At the Relay Station, readers get bibliographic detail for a journal article and if available, a link to the online article via doi or whatever.  There isn't any bibliographic detail there for the review which, I presume, is linked through the 'Username/Password Subscribers access MathSciNet here' link (it has the same identifier value).  Perhaps this is a case for a  template that accepts and requires only mr as an identifier along with the typical reviewing author, review title, review date, etc bibliographic details and links to the login page instead of the Relay Station; mr used in any other cs1|2 template continues to act as it does now.
 * —Trappist the monk (talk) 00:51, 14 November 2019 (UTC) 12:45, 14 November 2019 (UTC) (withdrawn)
 * Wouldn't it be convenient if you could wish away your bugs by making other people do the work of choosing and using different templates for the buggy cases. Perhaps you are unaware, but for people with access the MR link goes to an actual published review, not just the relay thing that the unsubscribed see. It is more or less the same as for most subscription-only dois: people marked as subscribers by their IP address or cookies see the full content, and everyone else gets a weaker substitute. Also, reviews from that time period were published in a physical journal, titled Mathematical Reviews. It is only later that they were converted into database entries in the MathSciNet database. That's why the abbreviation is "MR". As such, they are content published in a journal, is appropriate, and mr is the correct way to link to the review. Also, the citations in question actually used  so another cite-series template wouldn't have been appropriate. There is no login link that we should be directing people to instead, and your assumption that there is a different place to find the review, that should be linked differently, is false. —David Eppstein (talk) 01:11, 14 November 2019 (UTC)
 * PS sometimes the meta goes to even deeper levels. Here's an MR entry containing a review by Albert C. Lewis of a review by Victor Pambuccian of a book of Hilbert's lectures: . —David Eppstein (talk) 01:29, 14 November 2019 (UTC)
 * David, the time has come for you to read those reviews and write a reply letter and add a level of meta. Added bonus points for deliberately sneaking a subtle error into your letter to leave open the possibility of a reply article. AManWithNoPlan (talk) 12:16, 14 November 2019 (UTC)
 * I never meta-joke that I didn't like. XOR'easter (talk) 17:13, 14 November 2019 (UTC)

Adds weird journal for some IEEE conferences
https://github.com/ms609/citation-bot/pull/2248 AManWithNoPlan (talk) 19:03, 16 November 2019 (UTC)

Wrong authors for cite arXiv

 * Likely a case of bad metadata. Headbomb {t · c · p · b} 14:21, 23 November 2019 (UTC)


 * arXiv changed their API. https://github.com/ms609/citation-bot/pull/2252 AManWithNoPlan (talk) 15:24, 23 November 2019 (UTC)
 * we will no longer use the multi search API. AManWithNoPlan (talk) 15:25, 23 November 2019 (UTC)

Bad publisher data from archive.org

 * Internet Archive metadata is highly variable in format, completeness and reliability. I'd be super cautious about dumping their metadata into Wikipedia. -- Green  C  23:46, 23 November 2019 (UTC)


 * We are very selective about what we take from them. Adding this soon:  https://github.com/ms609/citation-bot/pull/2254 AManWithNoPlan (talk) 21:32, 24 November 2019 (UTC)

Example publisher data from this book:

One of many variations. The publisher can also appear in multiple locations on the page. This is basic code for extracting it from the HTML, but I know there are other varieties it misses.

 # itemprop="publisher">New York : Viking
 if match(fp, "(?i)itemprop[ ]*[=][ ]*\"[ ]*publisher[ ]*\"[ ]*[>][^<]*[^<]", dest) > 0:
   gsub("(?i)itemprop[ ]*[=][ ]*\"[ ]*publisher[ ]*\"[ ]*[>]", "", dest)
   addKeyPairValue(iaTable, id, "IApub", strip(dest))
 # >dc.publisher: Longmans Green And Co. Bombay<
 elif match(fp, "(?i)[>][ ]*dc[.]publisher[ ]*[:][^<]*[^<]", dest) > 0:
   gsub("(?i)[>][ ]*dc[.]publisher[ ]*[:]", "", dest)
   addKeyPairValue(iaTable, id, "IApub", strip(dest))
 # >Publisher: The Clarendon Press; Oxford; 1909<
 elif match(fp, "(?i)[>][ ]*publisher[ ]*[:][^<]*[^<]", dest) > 0:
   gsub("(?i)[>][ ]*publisher[ ]*[:]", "", dest)
   addKeyPairValue(iaTable, id, "IApub", strip(dest))
-- Green  C  23:40, 24 November 2019 (UTC)

Caps: vir
https://github.com/ms609/citation-bot/pull/2253 AManWithNoPlan (talk) 21:27, 24 November 2019 (UTC)

Better ieeexplore support
This was achieved by replacing the ieeexplore.org url with the doi found on the corresponding ieeexplore.org page. Headbomb {t · c · p · b} 05:36, 15 November 2019 (UTC)
 * Sometimes it's possible to find the DOI from CrossRef or derivatives, looking for a URL which ends in "arnumber=8386824" or a DOI which ends in "8386824" (in the example). Nemo 08:08, 15 November 2019 (UTC)
 * If not actually parsing the page to search for the doi on the page, then make sure that the prefix is 10.1109 for IEEE journals. Headbomb {t · c · p · b} 09:16, 15 November 2019 (UTC)
 * IEEE takes pride in blocking bots. Sometimes we work, sometimes we don’t. I will investigate reverse lookup of the URL in CrossRef. AManWithNoPlan (talk) 16:58, 15 November 2019 (UTC)
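
The two checks proposed in this thread (match on the document number, and insist on the 10.1109 prefix) can be sketched as follows. The host test, the `/document/<number>` path form, and the function names are assumptions made for illustration; a real lookup against CrossRef is deliberately left out.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_arnumber(url):
    """Pull the IEEE document number out of an Xplore-style URL, or return None."""
    parsed = urlparse(url)
    if "ieeexplore" not in parsed.netloc.lower():
        return None
    # Older links carry the number in the query string: ...?arnumber=8386824
    values = parse_qs(parsed.query).get("arnumber")
    if values and re.fullmatch(r"[0-9]+", values[0]):
        return values[0]
    # Assumed alternative form: .../document/8386824
    match = re.search(r"/document/([0-9]+)", parsed.path)
    return match.group(1) if match else None


def doi_plausible_for(doi, arnumber):
    """Accept a candidate DOI only if it has the IEEE prefix and ends in the number."""
    return doi.startswith("10.1109/") and doi.endswith(arnumber)
```

A candidate DOI found by reverse lookup would be used only when `doi_plausible_for` returns true, so an unrelated DOI that happens to end in the same digits under another prefix is rejected.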

Removes URL for IUCN Red List

 * No, the static page is best, per WP:SAYWHEREYOUGOTIT, and per the information listed at the redlist at the time it was cited. If you follow the 'old' link, the page will mention there is an update, so if you need the updated information, you can check it then. Headbomb {t · c · p · b} 00:26, 21 November 2019 (UTC)
 * This was discussed at length before (example). I still didn't get confirmation of whether it's true that IUCN reuses the DOI for significantly different documents (i.e. that an assessment can change content without a new assessment being released, and that this results in a new ID in the URL but not a new DOI). Nemo 09:26, 21 November 2019 (UTC)
 * Let us continue this discussion, but in the meantime https://github.com/ms609/citation-bot/pull/2264 AManWithNoPlan (talk) 17:05, 26 November 2019 (UTC)
 * The DOIs are static. When there's a new version, it has a new doi, as can be seen in 10.2305/IUCN.UK.2012.RLTS.T195519A2383117.en. Headbomb {t · c · p · b} 18:38, 26 November 2019 (UTC)

Access date removal bug
So annoying when parameters are used wrong. AManWithNoPlan (talk) 11:51, 22 November 2019 (UTC)
 * The template also treats the access-date as wrong. Will look at fixing bad templates, but this is not a bug. AManWithNoPlan (talk) 16:53, 26 November 2019 (UTC)

JSTOR book meta data
What I see is this. So, they use the Chapter field for secondary sub-title when doing books. Will work on.
 TY - BOOK
 TI - Benevolent Assimilation
 T1 - The American Conquest of the Philippines, 1899-1903
AManWithNoPlan (talk) 22:45, 25 November 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2258 AManWithNoPlan (talk) 22:53, 25 November 2019 (UTC)

Fails to convert a JSTOR
https://github.com/ms609/citation-bot/pull/2255 AManWithNoPlan (talk) 22:25, 25 November 2019 (UTC)

Remove soft hyphens
https://github.com/ms609/citation-bot/pull/2257 AManWithNoPlan (talk) 22:41, 25 November 2019 (UTC)
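
For context, a soft hyphen is the invisible character U+00AD. A minimal sketch of removing it from a field value follows; the function name is hypothetical and the linked pull request may handle more cases.

```python
SOFT_HYPHEN = "\u00ad"  # invisible "optional hyphenation point" character


def strip_soft_hyphens(text):
    """Remove every soft hyphen from a citation field value."""
    return text.replace(SOFT_HYPHEN, "")
```

Ordinary hyphens and dashes are left untouched, since only U+00AD is targeted.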

Series: Advances in Pharmacology
https://github.com/ms609/citation-bot/pull/2256 AManWithNoPlan (talk) 22:28, 25 November 2019 (UTC)

Series: Inorganic Syntheses
https://github.com/ms609/citation-bot/pull/2262 AManWithNoPlan (talk) 12:14, 26 November 2019 (UTC)

ZooKeys issues
ZooKeys is like that. You can safely TNT issue every time for those. Headbomb {t · c · p · b} 07:53, 12 November 2019 (UTC)
 * What makes you think zookeys is unique with issue=1 data entry error, or are you just saying that since Zookeys has no volumes it is very unlikely to be correct? AManWithNoPlan (talk) 12:34, 16 November 2019 (UTC)
 * ZooKeys has issues, no volumes. Whenever you have a volume for ZooKeys, the bot should discard volume/issue/pages and re-populate the fields. Or something to that effect. Headbomb {t · c · p · b} 15:48, 16 November 2019 (UTC)
 * "we already blow away volumes." The issue isn't that you are not blowing away volumes, but rather that when you blow away volumes, you should also blow away issues. Otherwise you remove volumes, and more often than not leave an erroneous issue in. Headbomb {t · c · p · b} 20:55, 26 November 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2266 AManWithNoPlan (talk) 21:44, 26 November 2019 (UTC)
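
The rule requested above can be sketched as follows, assuming a citation is held as a plain dictionary of parameter names to values. The function name and data shape are hypothetical; re-populating the cleared fields from a metadata source is a separate step not shown here.

```python
def reset_zookeys_numbering(citation):
    """If a ZooKeys citation carries a volume, clear volume, issue and pages together.

    Returns a new dict; the caller is expected to re-populate the cleared fields.
    """
    cleaned = dict(citation)
    is_zookeys = cleaned.get("journal", "").strip().lower() == "zookeys"
    if is_zookeys and cleaned.get("volume"):
        # A volume is a sign the numbering was entered wrongly, so the issue
        # and pages that came with it are not trusted either.
        for key in ("volume", "issue", "pages"):
            cleaned.pop(key, None)
    return cleaned
```

Clearing the three fields together avoids the situation described above, where the volume is removed but a wrong issue number is left behind.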

Need to run twice?
https://github.com/ms609/citation-bot/pull/2265 AManWithNoPlan (talk) 20:48, 26 November 2019 (UTC)

Don’t remove rubbish URLs if someone grabbed an archive of it
https://github.com/ms609/citation-bot/pull/2263 AManWithNoPlan (talk) 16:49, 26 November 2019 (UTC)

Caps: NeuroReport
https://github.com/ms609/citation-bot/pull/2265 AManWithNoPlan (talk) 12:56, 27 November 2019 (UTC)

Replaced ProQuest URL with ID field, leaving cite web with no URL field
True. Should change template type also. AManWithNoPlan (talk) 02:04, 30 November 2019 (UTC)
 * doubly true since cite web was wrong to begin with. AManWithNoPlan (talk) 02:06, 30 November 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2269 AManWithNoPlan (talk) 14:13, 30 November 2019 (UTC)

Remove redundant ingentaconnect.com/content/
https://github.com/ms609/citation-bot/pull/2268 AManWithNoPlan (talk) 13:59, 30 November 2019 (UTC)

support existing others
Don’t add more AManWithNoPlan (talk) 16:58, 29 November 2019 (UTC)
 * Don't add more what? Headbomb {t · c · p · b} 17:32, 29 November 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2270 AManWithNoPlan (talk) 15:44, 30 November 2019 (UTC)
 * fixed AManWithNoPlan (talk) 19:37, 1 December 2019 (UTC)

lww.com is redundant with ovid.com
https://github.com/ms609/citation-bot/pull/2272 AManWithNoPlan (talk) 23:21, 1 December 2019 (UTC)

meta.wkhealth.com is dead
https://github.com/ms609/citation-bot/pull/2272 AManWithNoPlan (talk) 23:21, 1 December 2019 (UTC)

Missed a DOI expansion when not already in a template, just wrapped in ref tags
Oddly, that is not in the CrossRef database. AManWithNoPlan (talk) 14:00, 30 November 2019 (UTC)
 * Weird, shoving it in a cite journal with doi gave the first time, and I had to add the title here. Headbomb {t · c · p · b} 14:14, 30 November 2019 (UTC)
 * Maybe it’s back? Crossref is not perfect? AManWithNoPlan (talk) 15:18, 30 November 2019 (UTC)
 * A plain  still won't expand. So wondering if something's possible, like shove in   or   before trying to expand. Headbomb {t · c · p · b} 15:22, 30 November 2019 (UTC)
 * notabug we require some type of title to be found for a plain url to be replaced. AManWithNoPlan (talk) 01:05, 3 December 2019 (UTC)

GIGO with named references
notabug. References from an included page mess things up. AManWithNoPlan (talk) 00:57, 3 December 2019 (UTC)
 * I don't think "notabug" is a correct evaluation. Before Citation bot edited the page, it had no errors. After Citation bot edited the page, it was in worse shape, showing user-visible red error text in the references section when there was none before. "Garbage in" also mis-characterizes the situation and I think demonstrates a lack of willingness to consider the situation fully. If we were to stipulate that the input was "garbage", then we should expect an automated process to either reject that input, not make things worse, or repair the bad input directly. -- Mikeblas (talk) 02:33, 3 December 2019 (UTC)

Strip Bloomberg URL
https://github.com/ms609/citation-bot/pull/2273 AManWithNoPlan (talk) 21:10, 2 December 2019 (UTC)

Caps: USGS WRIR
https://github.com/ms609/citation-bot/pull/2275 AManWithNoPlan (talk) 19:32, 3 December 2019 (UTC)

author link and inventive editors
https://github.com/ms609/citation-bot/pull/2279 AManWithNoPlan (talk) 15:35, 4 December 2019 (UTC)

Leading zero in IEEE document numbers

 * I'll remove those 39 broken URLs later if nobody beats me to it. Nemo 17:55, 4 December 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2279 AManWithNoPlan (talk) 19:54, 4 December 2019 (UTC)

Dotted year cleanup
https://github.com/ms609/citation-bot/pull/2279 AManWithNoPlan (talk) 12:09, 5 December 2019 (UTC)

Series: Advances in Enzymology and Related Areas of Molecular Biology
This is possibly caused by the hyphen difference in Advances in Enzymology and Related Areas of Molecular Biology and Advances in Enzymology - and Related Areas of Molecular Biology. Headbomb {t · c · p · b} 11:59, 5 December 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2279 AManWithNoPlan (talk) 12:08, 5 December 2019 (UTC)

Regular expression failure when extracting Templates
Cannot perfectly fix, since it seems to be an out-of-memory bug, but I have an idea. AManWithNoPlan (talk) 15:43, 6 December 2019 (UTC)

Converts volumes to issues for books
https://github.com/ms609/citation-bot/pull/2281 AManWithNoPlan (talk) 15:45, 6 December 2019 (UTC)

PMID website changing
Might want to have a look at this post and see if anything needs to change in Citation bot. maybe also for Citoid change? --Izno (talk) 03:36, 6 December 2019 (UTC)


 * This for now. https://github.com/ms609/citation-bot/pull/2281 AManWithNoPlan (talk) 15:30, 6 December 2019 (UTC)


 * fixed for now. AManWithNoPlan (talk) 20:49, 6 December 2019 (UTC)

Removal of html comments
Why does the bot remove html comments from references, as here? – Uanfala (talk) 12:45, 6 December 2019 (UTC)
 * I don't know, but isn't it unusual to have the comment "inside" the parameter name? Usually it's at the end of a parameter content (i.e. before the next pipe). Nemo 12:54, 6 December 2019 (UTC)
 * Sometimes horribly set up references do lead to such problems. notabug, since so rare. AManWithNoPlan (talk) 15:17, 6 December 2019 (UTC)
 * See for instance special:diff/929597433 where the comment to the non-empty parameter was left. Nemo 22:04, 6 December 2019 (UTC)

orphans |chapter-url-access=
https://github.com/ms609/citation-bot/pull/2281 AManWithNoPlan (talk) 15:30, 6 December 2019 (UTC)

Also remove empty month and day when date is set

 * I was about to report this myself. Headbomb {t · c · p · b} 20:32, 6 December 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2284 AManWithNoPlan (talk) 20:53, 6 December 2019 (UTC)

Bad title
Translates to: Log in or register to view (from Dutch). Redalert2fan (talk) 20:34, 6 December 2019 (UTC)
 * Apart from Facebook not being a good ref to use, this might come up on other Dutch sites. Redalert2fan (talk) 20:36, 6 December 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2283 AManWithNoPlan (talk) 20:48, 6 December 2019 (UTC)

Japanese characters in title
https://github.com/ms609/citation-bot/pull/2285 AManWithNoPlan (talk) 21:16, 6 December 2019 (UTC)

International Astronomical Union Circular

 * Cite journal is fine. Headbomb {t · c · p · b} 13:25, 7 December 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2287 AManWithNoPlan (talk) 21:24, 7 December 2019 (UTC)

Remove search.serialssolutions.com proxy links
https://github.com/ms609/citation-bot/pull/2287 AManWithNoPlan (talk) 21:23, 7 December 2019 (UTC)

Remove broken www.informaworld.com/smpp/ when redundant
https://github.com/ms609/citation-bot/pull/2287 AManWithNoPlan (talk) 21:23, 7 December 2019 (UTC)

Remove broken www.sciencedirect.com/science when redundant
https://github.com/ms609/citation-bot/pull/2287 AManWithNoPlan (talk) 21:23, 7 December 2019 (UTC)
 * Thanks. I've not checked the code in depth, should perhaps the "?" be escaped if that's used in a regex? Nemo 21:59, 7 December 2019 (UTC)
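
To illustrate the concern about an unescaped "?": in a regular expression it makes the preceding character optional instead of matching a literal question mark. The URL fragment below is a made-up example, and the sketch uses Python's `re.escape` only to show the effect; it says nothing about how the bot's own code is written.

```python
import re

# Hypothetical literal fragment that we want to find inside URLs.
LITERAL = "example.com/science?_ob"

# Unescaped: "e?" means "an optional e", and "." matches any character.
unescaped = re.compile(LITERAL)

# Escaped: every character is matched literally.
escaped = re.compile(re.escape(LITERAL))


def contains_literal(url):
    """True when the URL contains the literal fragment exactly as written."""
    return escaped.search(url) is not None
```

With the unescaped pattern, the intended string "example.com/science?_ob" is not matched at all, while unintended strings such as "example.com/scienc_ob" are, so escaping matters in both directions.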

Caps: SCH, JPN
Probably a good idea to leave all SCH and JPN alone, either way. Headbomb {t · c · p · b} 04:02, 9 December 2019 (UTC)

Erroneous move of publication-place to location

 * You're right that Template:Citation currently lists "publication-place" first when naming the variants of this parameter, however I don't see any specific discussion of whether or why it should be preferred. Help:Citation_Style_1 quite clearly prefers "location" and Help:Citation Style 2 doesn't list "publication-place" among the intended differences, so it's not unreasonable to interpret that "location" is preferred or acceptable for both classes of templates.
 * If the consensus is different, of course, I'm sorry for the mistake. I found some 2007 and 2010 discussions indicating that "publication-place" was the earlier parameter and "location" was added later, and various discussions where there was some confusion about place of publication vs. place inside the work, but as recently as 2014 a need for documentation was expressed, specifically with regard to the unclear nature of "publication-place". The discussion went on to other topics but the lack of a clear answer back then suggests that there is no specific consensus preferring this form of the parameter over another, otherwise someone would have mentioned it immediately. I don't know if some other discussion happened more recently which made "publication-place" preferred. Nemo 11:05, 3 December 2019 (UTC)
 * I take your point about the two help pages, but when using a template I normally go to the template's documentation as authoritative. There is a subtle distinction between the two parameters:
 * Citation (1) has both a location where the report was written, and a publication place where the newspaper went to press. In (2) the location has been deleted, and the citation formats correctly.  In (3) the publication place has been omitted, and the location now is assumed to be the publication place, which in this case it isn't.  Sorry I can't expand further, I've just had a text calling me away. Back tomorrow evening. Martin of Sheffield (talk) 13:23, 3 December 2019 (UTC)
 * The correct place to seek a clarification is Help talk:CS1, not to request the bot to do X/Y/Z. The template documentation is more-or-less not authoritative, especially when there are apparently ambiguities, though you might prefer otherwise. --Izno (talk) 13:27, 3 December 2019 (UTC)
 * related conversation started:
 * —Trappist the monk (talk) 14:38, 3 December 2019 (UTC)
 * Actually I'm requesting that the bot DOES NOT undo the work of editors, not that it DOES anything. If you think the documentation for the template is wrong, perhaps that is the place to take the discussion and ask the template maintainers to modify the template?  Would you like me to raise the issue there for you?  Regards, Martin of Sheffield (talk) 23:06, 4 December 2019 (UTC)
 * Ttm started the discussion for you as to the validity of the publication-place parameter in toto. Please feel free to participate. --Izno (talk) 01:20, 5 December 2019 (UTC)
 * I've suggested a change to the documentation for citation to align it with your preferences. BTW, I didn't see anything there from TTM, isn't template talk:citation the correct place for documentation errors?  Probably best to close off this bug report if it is the documentation at citation that is the problem.  Regards, Martin of Sheffield (talk) 11:39, 6 December 2019 (UTC)
 * Umm, I answered that discussion at . Because I had already started another discussion about publication-place, location, and place at  (mentioned in my post above) I think that the discussion at Template talk:Citation should be closed and made part of the earlier discussion at WT:CS1.
 * —Trappist the monk (talk) 19:41, 8 December 2019 (UTC)

ProQuest
Not a bug; url-access applies to url which links title. Identifiers (id in this case) do not link title so url-access would be misapplied. Additionally, sources linked through identifiers are presumed to lie behind some sort of paywall / registration barrier. cs1|2 does not highlight the norm so retaining some sort of subscription information would be contrary to the way cs1|2 works.

—Trappist the monk (talk) 19:35, 8 December 2019 (UTC)


 * also, there is work (or at least talk of work) of having the ProQuest template automatically rewrite URLs based upon being at a library. AManWithNoPlan (talk) 21:07, 8 December 2019 (UTC)
 * Based on some JavaScript, I suppose? HTML is cached and the same for all (unregistered) users, templates can't do much. Nemo 06:31, 9 December 2019 (UTC)
 * It's rare (and good) for url-access to be compiled in relation to ProQuest URLs, although it should be. If the issue is the loss of the lock icon, maybe we can continue at Template talk:ProQuest: I think it's never open access, so maybe we can just add the lock by default? Nemo 06:31, 9 December 2019 (UTC)

Invalid ISBN added

 * It's technically invalid because the check digit fails the checksum, but ISBN 1595939934 is widely used and most booksources links happily return results, including Open Library and Karlsruhe. These conference proceedings often end up being poorly edited volumes, so it doesn't surprise me if this was printed with a wrong ISBN. There is no ideal solution here.
 * Help:CS1_errors recommends adding |ignore-isbn-error=true in such a case if there is no alternative. I guess here we can use the alternate ISBN 1581139934 which is a bit more used. Nemo 07:45, 12 December 2019 (UTC)
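
For anyone who wants to check the two numbers mentioned above, here is a minimal sketch of the standard ISBN-10 check-digit test (weights 10 down to 1, sum divisible by 11). The function name is made up for illustration and this is not the bot's own code.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Return True if the string passes the ISBN-10 check-digit test."""
    # Ignore hyphens and spaces, which are only presentation.
    chars = [c for c in isbn.upper() if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # "X" stands for 10 and is only allowed as the check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        # Weights run from 10 (first digit) down to 1 (check digit).
        total += (10 - position) * value
    return total % 11 == 0


print(isbn10_is_valid("1595939934"))  # False: weighted sum is 304, remainder 7
print(isbn10_is_valid("1581139934"))  # True: weighted sum is 220, divisible by 11
```

So the first ISBN really does fail the checksum, while the alternate one passes.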

Missed a title
Possibly a database issue. Headbomb {t · c · p · b} 14:25, 13 December 2019 (UTC)

CAPS NBER
https://github.com/ms609/citation-bot/pull/2304 AManWithNoPlan (talk) 19:16, 13 December 2019 (UTC)

Fix TODO
There are a couple in the code. Note not to forget them and the master build failure. AManWithNoPlan (talk) 23:56, 25 November 2019 (UTC)
 * and code coverage too AManWithNoPlan (talk) 01:47, 9 December 2019 (UTC)

fixed

Wrong title?
Now, I don't speak Russian but it seems to happen because a redirect happens from https://www.kinopoisk.ru/film/verni-moyu-lyubov-2014-846894 to https://www.kinopoisk.ru/film/846894/. I think Ой! (Oh!) is being used as an error message here, or making it clear to wait for loading.
 * In this other edit regarding the same website no redirect happened and the correct title was added. Redalert2fan (talk) 13:49, 14 December 2019 (UTC)

class deprecated warning
AManWithNoPlan (talk) 20:14, 19 December 2019 (UTC)
 * soon fixed https://github.com/ms609/citation-bot/pull/2319 AManWithNoPlan (talk) 14:01, 20 December 2019 (UTC)

|DOI= is a legitimate alias of |doi=
https://github.com/ms609/citation-bot/pull/2318 AManWithNoPlan (talk) 14:04, 20 December 2019 (UTC)

fix master build PageTest::testBotExpandWrite test
AManWithNoPlan (talk) 19:48, 21 December 2019 (UTC)
 * fixed Travis IP addresses are blocked. Disable test. AManWithNoPlan (talk) 20:56, 21 December 2019 (UTC)

fix wikipediabottest::testCategoryMembers test
fixed category had been cleaned. Changed category we were checking in tests suite. AManWithNoPlan (talk) 19:48, 21 December 2019 (UTC)

incomplete edit summary
https://github.com/ms609/citation-bot/pull/2321 AManWithNoPlan (talk) 02:16, 21 December 2019 (UTC)

Better series handling: Antibiotics and Chemotherapy
https://github.com/ms609/citation-bot/pull/2345 🎅🏻 AManWithNoPlan (talk) 17:17, 25 December 2019 (UTC)

Better series handling: Studies in Bilingualism
https://github.com/ms609/citation-bot/pull/2345 🎅🏻 AManWithNoPlan (talk) 17:19, 25 December 2019 (UTC)

JSTOR cleanup
https://github.com/ms609/citation-bot/pull/2367 AManWithNoPlan (talk) 01:15, 31 December 2019 (UTC)

Bot down
It fails on every page. Both gadget and bot itself. Hiccup? Bigger issue? Headbomb {t · c · p · b} 23:36, 26 December 2019 (UTC)
 * Same here. When I click on the Citations button it gives me "Error: Citations request failed". Trying to use the bot directly redirects to a 503 page. --Ihaveacatonmydesk (talk) 16:51, 27 December 2019 (UTC)
 * it appears that over Christmas those with power are on vacation. AManWithNoPlan (talk) 20:34, 27 December 2019 (UTC)
 * The development version is still live (https://tools.wmflabs.org/citations-dev/), but it doesn't seem to do everything that https://tools.wmflabs.org/citations does, as I was trying to use it to expand abbreviated journal titles, which it didn't.  Seppi  333  (Insert 2¢) 01:39, 28 December 2019 (UTC)
 * I wouldn’t use that version. AManWithNoPlan (talk) 01:44, 28 December 2019 (UTC)
 * Also, the bot never expanded abbreviated journals, not on its own at least. Headbomb {t · c · p · b} 04:14, 28 December 2019 (UTC)
 * Then how were you doing it?  Seppi  333  (Insert 2¢) 04:14, 28 December 2019 (UTC)
 * Deleting abbreviations manually and letting the bot fill them. Then taking care of what the bot didn't do. Headbomb {t · c · p · b} 04:38, 28 December 2019 (UTC)

That seems to be it. Sad. BernardoSulzbach (talk) 19:04, 28 December 2019 (UTC)
 * Anything that can be done here? You're listed as contact people on the error message/toolabs page. Headbomb {t · c · p · b} 12:48, 30 December 2019 (UTC)
 * I think that maintane_files.php corrupted the files. I have removed that tool from the source tree so it cannot happen again.   AManWithNoPlan (talk) 13:12, 30 December 2019 (UTC)
 * I'm getting a 503 message whenever I try to run the bot; I don't think that fixed it, at least on my end. --Nathan2055talk - contribs 21:38, 30 December 2019 (UTC)
 * Yup, still not fixed when I tried it today. Tgeorgescu (talk) 10:06, 31 December 2019 (UTC)

Please don't ping me, I am not an operator. I can’t reboot it. AManWithNoPlan (talk) 11:52, 31 December 2019 (UTC)
 * Seems like this incident shows it's time to extend reboot privileges to a few other people. --Ihaveacatonmydesk (talk) 17:38, 31 December 2019 (UTC)
 * It's rather important to have this running. --Ozzie10aaaa (talk) 17:56, 1 January 2020 (UTC)

I asked to restart the service, so it should be working now, however, there is a syntax error somewhere causing the tool to kill itself. Jonatan Svensson Glad (talk)
 * I've tweaked it a bit, try now. If it doesn't work, or it breaks itself again, we should probably wait for a real maintainer of the tool to sort things out. -- Krenair (talk • contribs) 01:03, 2 January 2020 (UTC)
 * After some more fiddling around I believe it's working without any more local hacks from me. It seems the tool on toolforge had a broken file from an automatic update mechanism that is being removed. FYI maintainers: I've reset the repository in public_html from ef1ea17a4d1d2bc0adbcce6032a768f91b53ec40 to 8d755d36a9e5e023c690c47be7bf10bd5422f00 to drop the automatic local commits to constants/capitalization.php. -- Krenair  (talk • contribs)
 * Can confirm things are running now. Headbomb {t · c · p · b} 02:37, 2 January 2020 (UTC)
 * Thanks for fixing it. Grimes2 (talk) 09:22, 2 January 2020 (UTC)

fixed

Bot down again
Same as before. Headbomb {t · c · p · b} 23:24, 6 January 2020 (UTC)
 * Odd. AManWithNoPlan (talk) 00:13, 7 January 2020 (UTC)
 * https://en.wikipedia.org/wiki/User_talk:Krenair#Citation_bot fixed AManWithNoPlan (talk) 01:29, 7 January 2020 (UTC)

Bloomberg
When the bot goes to the Bloomberg website, it returns the title "Are you a robot?" See https://en.wikipedia.org/w/index.php?title=Pankisi&oldid=882816353  Kaltenmeyer (talk)

fixed AManWithNoPlan (talk) 11:48, 7 January 2020 (UTC)
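
One way to guard against this class of problem is to reject page titles that are known placeholders before they reach a citation. The sketch below is only an illustration: the list and function name are hypothetical, and the two entries are simply the titles reported on this page ("Are you a robot?" from Bloomberg and "Ой!" from kinopoisk.ru), not the bot's actual list.

```python
# Hypothetical deny-list; entries are stored case-folded for comparison.
BAD_TITLES = {"are you a robot?", "ой!"}


def is_usable_title(title: str) -> bool:
    """Return False for empty titles and for known anti-bot or error-page titles."""
    # Collapse whitespace and fold case so trivial variations still match.
    cleaned = " ".join(title.split()).casefold()
    return bool(cleaned) and cleaned not in BAD_TITLES


print(is_usable_title("Are you a robot?"))  # False
print(is_usable_title("Peter Newsham confirmed as chief of DC police"))  # True
```

A fixed list like this only catches titles someone has already reported, so it complements rather than replaces checking the HTTP response itself.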

removing links to worldcat
I suspect this is probably a feature rather than a bug, but I don't understand why this should be a feature... seems very counter-intuitive. The difference appears to be that the deleted url led directly to the full-text whereas the OCLC field does not lead to /viewport. (not sure where to click to get there either) 🌿  SashiRolls t ·  c 20:09, 24 November 2019 (UTC)
 * "Preview this book" right below the image. AManWithNoPlan (talk) 21:22, 24 November 2019 (UTC)
 * "deleted url led directly to the full-text " that is simply untrue. It leads to a limited google books preview. AManWithNoPlan (talk) 21:23, 24 November 2019 (UTC)
 * OK, I understand a bit better now. Clicking on preview this book, and then clicking on google preview is what I missed... because I thought it was a worldcat digitization.  I only scrolled through the first few fifteen-twenty pages, so did not realize it was partial.  I have to say it's not very user friendly to have a link to (partial) full-text labelled 1113896227 instead of just directly linked from the reference title, but then I suppose we are expecting wiki-readers to be sufficiently geeky to know that 1113896227 will lead them to more info whereas the secret code  978-1-496-21803-2 leads nowhere useful (like the bluelinks to ISBN and OCLC ). Thanks for looking into it and explaining the odd logic. :)  🌿   SashiRolls t ·  c 22:09, 24 November 2019 (UTC)
 * One of the objections to including links to Google Books is that what different readers will see varies unpredictably, and may change. These WorldCat digitized previews are stable, which is a major plus. They don't allow linking to the specific page, which we've come to do, but I think the stability can only be a plus. ISBNs and OCLC numbers lead to full bibliographic info, but the reader is still stymied if they can't get access to the book, which is quite common. (Interlibrary loan is very limited for readers in most places, and we can hardly expect readers to always buy a book, or even to be able to do so in whatever country they live in.) So where's the downside of also adding a link that guarantees they can scroll to the relevant page? In particular, it's hardly a duplication at all, especially since this OCLC link is largely unknown; I had no idea it existed. Yngvadottir (talk) 22:44, 24 November 2019 (UTC)
 * These WorldCat digitized previews are stable, which is a major plus. Not true.  The OCLC viewport link is just a link to a Google Book preview.  Google books did the scanning.  Worldcat simply builds a little box and links to the google scan in that box.  This is the same mechanism that other websites (unrelated to google maps) use to display a little box with google maps content.  The problems with google books preview that you describe above are still there.  My vote is to always remove worldcat links from url when there is a matching oclc identifier.
 * —Trappist the monk (talk) 23:16, 24 November 2019 (UTC)


 * This is not a vote. Wikipedia already says to not link to google books, unless it is a complete and free preview.  Can someone find that policy and link it here.  These are worse than google book links.  They point to some random page instead of a front page or a specifically chosen page. AManWithNoPlan (talk) 17:12, 25 November 2019 (UTC)
 * I agree that it would be good for someone (perhaps you?) to dig up this policy that you say you've seen, as it would directly contradict the Citing Sources guideline, which I'm more familiar with. (NB:  it says quite clearly that the OCLC, ISBN, etc. can coexist with the link in the citation as of this writing).  🌿   SashiRolls t ·  c 19:59, 25 November 2019 (UTC)
 * It seems part of a wider problem of what exactly should be the algorithm for handling these identifiers and links which may not be as perfect as they sometimes make out to be ... should there be a policy of using only the restrictive and perfect identifiers available as per here albeit at the result denying access to those who have no such access to the source which is available elsewhere ... this seems in line with the url blue linking approach at Bots/Noticeboard and Bots/Noticeboard where the approach is not to use the ol= identifier and use the URL.  I can see there may be reasons for the approach but I would like to see evidence of clear guidelines rather than people's opinions.  It is not unknown for me to go to a library or purchase a resource so oclc has its uses.  Thank you. Djm-leighpark (talk) 20:17, 25 November 2019 (UTC)
 * This is unfortunate because archive.org is 100% viewable for free (with 1-time registration) - which is not the case for Google where you only get a partial view. With archive.org you can link to any page within a book for a free 2-page preview (no registration), which is not the case with Google which can only preview certain pages. However, understood archive.org does not have every book that Google might. In my experience Google Book scans come and go, they are not a library and take books (or previews) offline for commercial reasons so no guarantee those scans will be accessible in the future. Also Wikipedia and archive.org are non-profits with close overlap of goals, while Google is a commercial book seller with different goals, we will favor non-profits over commercial given the choice.  --  Green  C  20:45, 25 November 2019 (UTC)
 * In some ways I'd prefer to use "open library" rather than archive.org as archive.org is at least dual purpose, one is for storing/OCR'ing and provisioning either unrestricted free or by limited library lending; the other for archival of web pages. There perhaps may be no clashing between these BOTs but having had two articles where it has broken syntax'ed on me I'm not confident everything on the same page and perhaps guidelines should be updated so the old algorithms can be written and checked against them? (I may have strayed from the original bug) Djm-leighpark (talk) 20:59, 25 November 2019 (UTC)
 * I think you're a bit confused: openlibrary.org is a collection of catalog records to aid in the discovery of books; archive.org is the actual digital library. The archival of web pages is at web.archive.org. Nemo 21:34, 25 November 2019 (UTC)
 * I am sorry I am somewhat confused and ask stupid questions. It is in my nature and training. Djm-leighpark (talk) 21:43, 25 November 2019 (UTC)
 * Don't be sorry! It's essential to surface such misunderstandings, otherwise we're just going to talk past each other. A lot of people are confused by archive.org vs. web.archive.org etc., almost as many as wikimedia.org vs. mediawiki.org. ;-) Nemo 22:06, 25 November 2019 (UTC)

For me personally, the links to worldcat.org are completely useless because they don't load any preview at all unless I allow a series of cookies and third-party resources. Links to the splash page leading to a full text (for instance on biodiversitylibrary.org) are often useful, but I've yet to encounter a case where worldcat.org is the best link available for a given content. Nemo 21:34, 25 November 2019 (UTC)
 * ? Djm-leighpark (talk) 22:26, 25 November 2019 (UTC)

Mobile web
Is it possible for the bot to replace links to mobile sites such as https://m.washingtontimes.com/news/2017/may/2/peter-newsham-confirmed-as-chief-of-dc-police/ with https://www.washingtontimes.com/news/2017/may/2/peter-newsham-confirmed-as-chief-of-dc-police/ (see Special:Diff/930509786&oldid=930509645)? Jonatan Svensson Glad (talk) 00:19, 13 December 2019 (UTC)


 * I think so, but that might be a better task for a different bot. AManWithNoPlan (talk) 11:54, 13 December 2019 (UTC)

notabug Best for a single mass run with a different bot. AManWithNoPlan (talk) 18:32, 8 January 2020 (UTC)
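
For whoever picks this up with another bot, the rewrite itself could be sketched roughly as follows. This assumes that a host starting with "m." has an equivalent "www." host, which is true for the Washington Times example above but would need checking per site; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def desktop_url(url: str) -> str:
    """Swap a leading "m." host label for "www."; leave every other URL untouched."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("m."):
        # Assumption: the desktop site lives at www.<rest of host>.
        host = "www." + host[2:]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(desktop_url("https://m.washingtontimes.com/news/2017/may/2/peter-newsham-confirmed-as-chief-of-dc-police/"))
# https://www.washingtontimes.com/news/2017/may/2/peter-newsham-confirmed-as-chief-of-dc-police/
```

Hosts that put the mobile label elsewhere (for example en.m.wikipedia.org) are deliberately not handled here.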

Incorrect PMC added
Here the bot added to the existing citation for, the PMC is for a different paper. The PMC may be for a reprint of the cited paper but is in a different journal (also different year, volume, pages) so should not be added. What validation is the bot doing to determine that a PMC (that presumably has been found from a keyword search of PMC database) is for the correct paper? Thanks Rjwilmsi  15:53, 31 December 2019 (UTC)


 * I have reported the error to the database. AManWithNoPlan (talk) 17:11, 31 December 2019 (UTC)
 * Interesting, I can't see any data issue on the pubmed side (PMID 19741352 and PMC 3435945) - what am I missing? Thanks Rjwilmsi  18:12, 31 December 2019 (UTC)
 * it’s in the DOI to open source resolver. We do have lots of checks, but when the title and other things match we get fooled.  AManWithNoPlan (talk) 19:50, 31 December 2019 (UTC)

notabug that we can fix. AManWithNoPlan (talk) 18:32, 8 January 2020 (UTC)

Fails on Ion channel
Has something to do with Biorxiv / doi = 10.1101/... Headbomb {t · c · p · b} 13:36, 7 January 2020 (UTC)
 * I am currently adding hundreds of test cases to the code base and have removed several functions that are not called and fixed a half dozen minor bugs. The file containing the bioRxiv code is next.  I will jump ahead to that file and fix this. AManWithNoPlan (talk) 12:29, 8 January 2020 (UTC)

! User is either invalid or blocked on en.wikipedia.org
I will flag as fixed, since it seems to work now. Not really our problem since it points to a wiki server problem. AManWithNoPlan (talk) 18:37, 8 January 2020 (UTC)

If chapter/title are identical, TNT them
https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:15, 15 January 2020 (UTC)

Change year to date when it makes sense
https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:14, 15 January 2020 (UTC)

Ignore diacritics for title matching
These were after TNTing both the title and the journal. TNTing the journal alone isn't enough. Headbomb {t · c · p · b} 15:33, 6 January 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:14, 15 January 2020 (UTC)
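
To illustrate what "ignore diacritics" can mean in practice, here is a minimal sketch using Unicode decomposition. It is not taken from the pull request above; the function names are invented for the example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters and drop the combining marks (accents, umlauts, ...)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def titles_match(a: str, b: str) -> bool:
    """Compare two titles while ignoring diacritics and letter case."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()


print(titles_match("Zeitschrift für physiologische Chemie",
                   "Zeitschrift fur physiologische Chemie"))  # True
```

Letters that have no decomposition, such as "ø", pass through unchanged, so this handles accents but is not full transliteration.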

Removing intended colons from citation title
I came across this edit just now, where the title of the cited page is ":: ISG ::" and your bot changed it to ":: ISG". Now ":: ISG ::" isn't a very good title for a citation, but I wanted to alert you to this behaviour.


 * https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:14, 15 January 2020 (UTC)

Semantic Scholar
I think it's not fair to replace public repository like zenodo or that belong to university with Semantic Scholar. Look at the privacy policy https://allenai.org/privacy-policy.html and the trackers. Regards,

LaMèreVeille (talk) 15:22, 16 January 2020 (UTC)

notabug we do not do that. AManWithNoPlan (talk) 16:13, 16 January 2020 (UTC)

Timing out, not processing URLs
I fed the bot 2017–18 Chelsea F.C. season with "Thorough mode" ticked, and got back: > Using Zotero translation server to retrieve details from URLs. ! Operation timed out after 5001 milliseconds with 0 bytes received  For URL: http://www.statto.com/football/teams/chelsea/history ! Operation timed out after 5000 milliseconds with 0 bytes received  For URL: http://www.skysports.com/football/news/11668/10870337/billy-gilmour-completes-move-to-chelsea-from-rangers ! Operation timed out after 5001 milliseconds with 0 bytes received  For URL: https://metro.co.uk/2017/05/09/daishawn-redan-reportedly-agrees-to-join-chelsea-over-manchester-united-6625627/ ! Operation timed out after 5001 milliseconds with 0 bytes received  For URL: http://www.chelseafc.com/news/latest-news/2017/07/new-nike-kits-available-now.html ! Operation timed out after 5001 milliseconds with 0 bytes received  For URL: http://www.chelseafc.com/news/latest-news/2017/07/caballero-signs.html ! Operation timed out after 5001 milliseconds with 0 bytes received  For URL: http://www.chelseafc.com/news/latest-news/2017/07/loan-return-for-palmer.html ! Giving up on URL expansion for a while

It's been doing this for all pages I feed it since yesterday.

Running the page through without "Thorough mode" ticked does nothing with the bare URLs - David Gerard (talk) 18:12, 16 January 2020 (UTC)

notabug just high usage. AManWithNoPlan (talk) 20:48, 16 January 2020 (UTC)

Fails to decapitalize
I had to whack on the bot to make this happen. It should have decapitalized FRONTIERS IN IMMUNOLOGY and BIOGERONTOLOGY on its own (adding the '(journal)' pipe was me, I don't expect the bot to do that). Headbomb {t · c · p · b} 01:40, 7 November 2019 (UTC)


 * I think you posted the wrong edit link. But it sounds like you want us to fix fully capitalized journal names like we do titles that are all caps.  Is that correct? AManWithNoPlan (talk) 15:43, 7 November 2019 (UTC)
 * Yes that's the wrong link. However, we already decapitalize all caps journals usually, see e.g. . Headbomb {t · c · p · b} 18:45, 7 November 2019 (UTC)
 * fixing links currently does not work via the gadget since the bot is not logged in to query the database. It should be possible to use curl to get the same information. AManWithNoPlan (talk) 00:04, 8 November 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2422 AManWithNoPlan (talk) 20:16, 15 January 2020 (UTC)
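
As a rough illustration of the requested behaviour (not the code in the pull request), an all-caps journal name can be converted to title case while keeping short function words lowercase. The word list and function name below are assumptions made for this sketch.

```python
# Hypothetical list of words kept lowercase when they are not the first word.
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def decapitalize_journal(name: str) -> str:
    """Title-case a journal name, but only if it is written entirely in capitals."""
    if not name.isupper():
        return name  # mixed-case names such as "iScience" are left alone
    words = name.lower().split()
    fixed = []
    for index, word in enumerate(words):
        if index > 0 and word in SMALL_WORDS:
            fixed.append(word)
        else:
            fixed.append(word.capitalize())
    return " ".join(fixed)


print(decapitalize_journal("FRONTIERS IN IMMUNOLOGY"))  # Frontiers in Immunology
print(decapitalize_journal("BIOGERONTOLOGY"))           # Biogerontology
```

A rule this simple would also mangle genuine acronyms (a journal called "JAMA" would become "Jama"), so a real implementation needs an exception list.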

Adds empty placeholder parameters
As best as I can tell, the bot ran during a git update and had mixed state files. AManWithNoPlan (talk) 17:58, 17 January 2020 (UTC)
 * Should consider temporarily blocking the bot from running during an update. --Izno (talk) 19:05, 17 January 2020 (UTC)
 * fixed and block added AManWithNoPlan (talk) 13:50, 18 January 2020 (UTC)
 * This isn't really fixed. The bot will fail to edit if you have an empty title/journal, probably because it's still trying to add the placeholders, and gets blocked from doing so. Headbomb {t · c · p · b} 17:15, 18 January 2020 (UTC)

That’s now fixed also. AManWithNoPlan (talk) 21:34, 18 January 2020 (UTC)

Fix apostrophe
Not sure what character that is, but it should be fixed. At least for this journal if this doesn't generalize. Headbomb {t · c · p · b} 17:55, 17 January 2020 (UTC)


 * wontfix it is a Greek mark being misused. We cannot just get rid of it. AManWithNoPlan (talk) 13:53, 18 January 2020 (UTC)
 * It's a random acute accent. Like I said, if this doesn't generalize, then at least for this journal. Headbomb {t · c · p · b} 16:53, 18 January 2020 (UTC)
 * Why? Is there some reason that people who reference this journal are character use impaired? AManWithNoPlan (talk) 21:35, 18 January 2020 (UTC)
 * It's in the database, so if you provide you will get Hoppe-Seyler´s Zeitschrift für physiologische Chemie. Headbomb {t · c · p · b} 22:08, 18 January 2020 (UTC)

presentation of handles loses useful information
Someone has been running this bot over Queensland content, e.g and it is stripping out the name of the website (which denies the reader the knowledge that it comes from a reliable source -- The State Library of Queensland) in favour of making the rather ugly handle visible to the reader. I don't have a problem with the URL being replaced with a handle, but could we make the visible text of the handle the name of the handle naming authority (if available) or, alternatively, the website/publisher (State Library of Queensland), or simply retain the name of the website/publisher where provided? Thanks Kerry (talk) 07:49, 30 December 2019 (UTC)
 * Actually, it looks like there are too many primary sources and not enough secondary sources and these primary sources are missing more important info like author, work & publisher. Is the library that holds the records even important? — Chris Capoccia 💬 11:40, 30 December 2019 (UTC)
 * If we were talking about a random library, I would agree it probably didn't matter, but when it is the library with the statutory obligation to collect and preserve Queensland content, then I think it is a matter of relevance/reliability that it is included in their collection. That's why (if it were technically possible), then the name of the handle provider should be automatically included in the citation, but if that's not possible, then leaving the website/publisher in place is the 2nd best solution. Kerry (talk) 03:17, 2 January 2020 (UTC)
 * The website information is incorrect. The website is hdl.handle.net.  So, the original template has the library in the wrong place.  Should probably be in a different field. AManWithNoPlan (talk) 18:30, 8 January 2020 (UTC)

notabug

Extra long list of bogus parameter changes
I will look into that. After years of people wanting the summary to add more and more; this is a new one where there seems a lot of extra stuff listed. AManWithNoPlan (talk) 18:51, 25 January 2020 (UTC)
 * Since it seems to be triggering a lot with my batches, maybe it has something to do with consecutive large category scans? --~ ฅ(ↀωↀ=) neko-channyan 19:31, 25 January 2020 (UTC)
 * Here is another example. The bot adds two parameters, publisher and type (the type addition is somewhat dubious, by the way). The edit summary is:
 * "Alter: doi-broken-date, title, template type, author, url, id, pages. Add: type, publisher, title-link, bibcode, doi, archive-date, archive-url, pmid, pages, issue, volume, author-link, newspaper, year, url, chapter-url, date, title. Removed parameters. Formatted dashes. Some additions/deletions were actually parameter name changes."
 * It is impossible to guess what actually happened from this summary. —David Eppstein (talk) 20:19, 25 January 2020 (UTC)
 * It's only during category runs. I figured it out and will fix it soon. AManWithNoPlan (talk) 21:55, 25 January 2020 (UTC)

ZooKeys issues, still not fixed
We only change it if it’s set to one. The problem is that the existing data looks reasonable with 12. AManWithNoPlan (talk) 11:59, 28 November 2019 (UTC)
 * ZooKeys will always have issues that match the bold part of 10.3897/zookeys.772.24410. Headbomb {t · c · p · b} 12:47, 28 November 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2471 AManWithNoPlan (talk) 19:45, 23 January 2020 (UTC)
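
The bold formatting in the DOI above did not survive archiving. Assuming the highlighted part was the first number after "zookeys." (772 in the example), the rule could be sketched like this; the pattern and function name are illustrative and not taken from the pull request.

```python
import re
from typing import Optional

# Assumption: ZooKeys DOIs have the shape 10.3897/zookeys.<issue>.<article id>.
ZOOKEYS_DOI = re.compile(r"^10\.3897/zookeys\.(\d+)\.\d+$", re.IGNORECASE)


def zookeys_issue(doi: str) -> Optional[str]:
    """Return the issue number embedded in a ZooKeys DOI, or None if it does not match."""
    match = ZOOKEYS_DOI.match(doi.strip())
    return match.group(1) if match else None


print(zookeys_issue("10.3897/zookeys.772.24410"))  # 772
```

Under that assumption an existing issue value such as 12 can be overwritten with confidence, since the DOI itself says what the issue must be.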

Fix apostrophes in links
https://github.com/ms609/citation-bot/pull/2473 AManWithNoPlan (talk) 20:13, 23 January 2020 (UTC)

Bbc.com
https://github.com/ms609/citation-bot/pull/2472 AManWithNoPlan (talk) 19:44, 23 January 2020 (UTC)

Caps: Sch
https://github.com/ms609/citation-bot/pull/2519 AManWithNoPlan (talk) 17:57, 28 January 2020 (UTC)

Caps: iScience
https://github.com/ms609/citation-bot/pull/2519 AManWithNoPlan (talk) 17:57, 28 January 2020 (UTC)

Don't remove : in authorlink
This breaks interwikilinks. Probably the same in title-link and other -link parameters. Headbomb {t · c · p · b} 00:32, 29 January 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2523 AManWithNoPlan (talk) 16:09, 29 January 2020 (UTC)

Don't add ISSNs
These add little to no value, and there is no consensus to add those by bots. Headbomb {t · c · p · b} 00:37, 29 January 2020 (UTC)
 * It shouldn't be, unless the Open-Access DOI resolved to worldcat url, which would be an issn. That's my only thought without looking at the code.  We do add ISSN when removing worldcat urls since we are not "adding" it.  But, if the url came in as new the code does not detect that.  It's probably something else since the code is odd at times.  AManWithNoPlan (talk) 11:58, 29 January 2020 (UTC)

bbc titles
Also, is it possible to see if the bot can fetch a "clean" title if it ends with  as in Omar al-Bashir: How Sudan's military strongmen stayed in power - BBC News Jonatan Svensson Glad (talk) 01:11, 15 December 2019 (UTC)
 * wontfix since it is too likely that the new title will not be any better, and in fact worse since time has passed. AManWithNoPlan (talk) 17:18, 30 January 2020 (UTC)

Bot edits cause articles to be added to cleanup category, despite being okay
Hi, came across a number of errors in the Australian project cleanup listing (see here under "New Articles"): Citation bot has changed cite web to cite document without including a periodical name here, here, here, here. (There are most likely more, as the cleanup list has over 1400 "no periodical title" entries listed; I've noticed that the number of this error has increased over the last few months). Coolabahapple (talk) 02:04, 29 January 2020 (UTC)
 * So what's the issue? Periodicals aren't required. Headbomb {t · c · p · b} 02:13, 29 January 2020 (UTC)
 * From the update name of this bug, it sounds like something to take to Help talk:CS1. Headbomb {t · c · p · b} 03:17, 29 January 2020 (UTC)
 * cite document redirects to cite journal which does require work or one of its aliases. The edit the bot made is incorrect. --Izno (talk) 04:11, 29 January 2020 (UTC)
 * The issue is that the template needs fixing. It's a leftover/oversight from the mandatory periodical thing from a few months back. It's also why those 'errors' aren't visible, several aren't actually errors. Headbomb {t · c · p · b} 04:30, 29 January 2020 (UTC)
 * notabug please finish discussion in the citation template talk space. AManWithNoPlan (talk) 13:25, 30 January 2020 (UTC)

Google book
Website= Google Books gets automatically removed, but when it is (misspelled as) "Google book" the bot misses it. Redalert2fan (talk) 15:27, 1 February 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2542 AManWithNoPlan (talk) 17:03, 1 February 2020 (UTC)

Pages= null
Hello, could (or should) the bot auto remove pages= null as I did myself here? It does not seem particularly useful to include and is probably a mistake made while being imported by someone or some tool. Redalert2fan (talk) 17:16, 1 February 2020 (UTC)
 * Seems like unless the book is written by a geek that thinks they are cute, that would never be right. Probably a tool or database that turned NULL into a string. AManWithNoPlan (talk) 17:36, 1 February 2020 (UTC)
 * fixed. AManWithNoPlan (talk) 18:28, 1 February 2020 (UTC)

Proquest
This was one of the edits where it completely removed a publisher of content and dumped in a random date. Not quite sure why this was done, but I reverted the edit. -  Neutralhomer •  Talk  • 04:12 on February 4, 2020 (UTC)
 * That's because ProQuest is not the publisher. Headbomb {t · c · p · b} 06:11, 4 February 2020 (UTC)
 * Actually, here it is. That's very unusual, given it's 99%+ abused to mean something hosted in a ProQuest database, rather than being actually published by ProQuest LLC. The bot should avoid removing ProQuest LLC, given that's clearly not the database. Headbomb {t · c · p · b} 06:13, 4 February 2020 (UTC)
 * Now looks for "LLC" fixed AManWithNoPlan (talk) 14:59, 4 February 2020 (UTC)

online books
In Fabian S. Woodley Citation bot changed this:

To this:

It obscures the fact that the reference is to the on-line version of the Oxford Dictionary of National Biography. And the Oxford Dictionary of National Biography is not a journal. Topo122 (talk) 19:44, 19 December 2019 (UTC)
 * Should be a cite dictionary / cite book. Headbomb {t · c · p · b} 20:22, 19 December 2019 (UTC)
 * I think whether it's online or offline is immaterial. At the end of the day, it's still a dictionary/book (and happens to be available online which was the source accessed). I do agree that it shouldn't be cite journal, but similarly it shouldn't be cite web. --Izno (talk) 22:11, 19 December 2019 (UTC)

is the wrong template; better is (which itself uses ):

—Trappist the monk (talk) 22:33, 19 December 2019 (UTC)
 * Given https://api.crossref.org/works/10.1093/ref:odnb/29929, can all items with a DOI of type "reference-entry" use cite encyclopedia, with whatever is the "container-title" going into encyclopedia? The manual says it shouldn't be used for just any book with multiple authors; on the other hand, not all reference works are books. Nemo 05:15, 20 December 2019 (UTC)
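
The proposal above could be sketched as a small mapping from a Crossref work record to cite encyclopedia fields. The field names "type", "title", "container-title" and "DOI" are the ones Crossref uses in its JSON; the function name, the output shape and the sample record are made up for illustration, and the sample is a hand-written stand-in rather than a copy of the real API response.

```python
from typing import Optional


def cite_encyclopedia_fields(work: dict) -> Optional[dict]:
    """Map a Crossref work of type "reference-entry" to cite encyclopedia parameters."""
    if work.get("type") != "reference-entry":
        return None  # leave every other record type to the existing logic
    fields = {}
    titles = work.get("title") or []
    containers = work.get("container-title") or []
    if titles:
        fields["title"] = titles[0]
    if containers:
        # The container title is what would go into |encyclopedia=.
        fields["encyclopedia"] = containers[0]
    if work.get("DOI"):
        fields["doi"] = work["DOI"]
    return fields


# Hand-written sample record, for illustration only.
sample = {
    "type": "reference-entry",
    "title": ["Example entry"],
    "container-title": ["Oxford Dictionary of National Biography"],
    "DOI": "10.1093/ref:odnb/29929",
}
print(cite_encyclopedia_fields(sample))
```

Whether every "reference-entry" really belongs in cite encyclopedia is the open question raised above; the sketch only shows that the mapping itself is mechanical.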

Didn't know about - very useful - I'll use it in future! Topo122 (talk) 11:42, 21 December 2019 (UTC)

fixed AManWithNoPlan (talk) 19:27, 4 February 2020 (UTC)

Zombie DOI
The doi should be tested to verify it resolves properly before removing doi-broken-date.

Consider verifying that the page reached via both the url and doi parameters agree before removing the url in the citation.
 * We do verify them. But the process is not 100% perfect.  This is one reason why DOIs are superior to URLs.  URLs move around, and DOIs eventually follow them, and a quick Google search finds them.  These transition periods are annoying. AManWithNoPlan (talk) 13:37, 22 January 2020 (UTC)
 * in this case the doi is in crossref. The journal is on Elsevier, the doi is owned by Blackwell, and the wrong url is Springer!  The real problem is that the doi is not inactive, but wrong!  The doi needs removed and replaced with a comment.  AManWithNoPlan (talk) 13:50, 22 January 2020 (UTC)
 * What is the verification? Why does it remove the doi-broken-date for a doi that resolves to a page with a 404 status? Whywhenwhohow (talk) 21:29, 23 January 2020 (UTC)
 * We verify several things. But, when CrossRef has it, we take that as almost gold.  AManWithNoPlan (talk) 23:23, 23 January 2020 (UTC)
 * will look at 404 AManWithNoPlan (talk) 23:24, 23 January 2020 (UTC)
 * You got me thinking. This will help and maybe fully fix it. https://github.com/ms609/citation-bot/pull/2476 AManWithNoPlan (talk) 00:50, 24 January 2020 (UTC)
 * And now this: https://github.com/ms609/citation-bot/pull/2477 AManWithNoPlan (talk) 01:11, 24 January 2020 (UTC)
 * Thanks. Whywhenwhohow (talk) 03:52, 24 January 2020 (UTC)
 * Still trying to figure out how to treat this properly. I have reported the DOI problem.  I have done this three times before.  They all have gotten resolved.  Hopefully this fourth DOI complaint gets fixed too. AManWithNoPlan (talk) 22:56, 31 January 2020 (UTC)
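
The check requested at the top of this section (do not clear doi-broken-date when the DOI lands on a 404) could be sketched as below. This is an assumption-laden illustration, not what the two pull requests do: the function name is invented, and the HTTP lookup is passed in as a callable so the decision rule can be shown without committing to any particular HTTP library.

```python
from typing import Callable, Optional


def may_clear_broken_date(
    doi: str,
    in_crossref: bool,
    fetch_status: Callable[[str], Optional[int]],
) -> bool:
    """Decide whether doi-broken-date can be removed for this DOI.

    fetch_status is a caller-supplied function returning the final HTTP status
    for a URL (after redirects), or None if the request failed.
    """
    if not in_crossref:
        return False
    status = fetch_status("https://doi.org/" + doi)
    # Being registered is not enough: the landing page must also load.
    return status is not None and 200 <= status < 400


# Stand-in lookups for the example; "10.1000/example" is a placeholder DOI.
print(may_clear_broken_date("10.1000/example", True, lambda url: 404))  # False
print(may_clear_broken_date("10.1000/example", True, lambda url: 200))  # True
```

Some publishers return error codes to automated requests even when the page is fine for readers, so a real check would probably need retries or per-publisher exceptions.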

Failed to fix linked caps
We don't fix links with more dead links at this time. AManWithNoPlan (talk) 17:46, 4 February 2020 (UTC)
 * Red links are irrelevant, the point is that those should be capitalized, just like everything else. There is nothing special about a linked term vs an unlinked one. Headbomb {t · c · p · b} 22:24, 4 February 2020 (UTC)

bot adds |editor1-last= and |editor1-first= when |editor-last1= and |editor-first1= are already present
The CrossRef code has been broken for years. I just fixed it. That seems to have exposed the lack of support for the five bazillion aliases for the same thing. Will have a fix out soon. AManWithNoPlan (talk) 22:16, 4 February 2020 (UTC)
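
One way to avoid adding a duplicate is to reduce every spelling to a canonical form before checking whether an editor is already present. The sketch below is hypothetical and deliberately partial: the helper names are invented for illustration, and the pattern covers only a few of the aliases the templates accept.

```python
import re

# Partial, illustrative alias pattern; the real templates accept more spellings.
_ALIAS = re.compile(r"^editor(\d*)-(last|first|surname|given)(\d*)$")

def canonical_editor_param(name):
    """Map e.g. 'editor-last1' or 'editor1-surname' to ('last', 1)."""
    m = _ALIAS.match(name)
    if not m:
        return None
    pre, kind, post = m.groups()
    if pre and post:
        return None  # digits on both sides is not a valid spelling
    kind = {"surname": "last", "given": "first"}.get(kind, kind)
    # A missing number means the first editor
    return kind, int(pre or post or 1)

def has_editor(params, kind, n):
    """True if any alias of editor<n>-<kind> already has a non-empty value."""
    return any(canonical_editor_param(p) == (kind, n) and v.strip()
               for p, v in params.items())
```

With this, a template that already carries editor-last1 reports `has_editor(params, "last", 1)` as true, so editor1-last would not be added a second time.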

Author field unlinking

 * It doesn't, it links them via authorlink. Headbomb {t · c · p · b} 18:11, 5 February 2020 (UTC)
 * this edit fixed the invalid COINS data. AManWithNoPlan (talk) 18:57, 5 February 2020 (UTC)
 * There was no invalid COinS metadata as a result of wikilinking author. The edit added pmc, pmid, and doi to an unrelated template so the edit as a whole was not a cosmetic edit.  Metadata was never an issue with the author-link 'fixes'.
 * —Trappist the monk (talk) 23:53, 5 February 2020 (UTC)

Changing ISBN to isbn

 * isbn is the canonical parameter. While ISBN isn't wrong, having lowercase identifier parameters is better and reduces problems for AWB routines and other bots. Headbomb {t · c · p · b} 18:14, 5 February 2020 (UTC)
 * they have also dropped some all caps aliases recently. AManWithNoPlan (talk) 19:00, 5 February 2020 (UTC)

Curious regarding Citation Bot unlinking and relinking and changing ISBN to isbn

 * Copied over from User:Smith609's talkpage:

Hi. I find your bot really useful in sorting out and updating cites. However, I'm curious regarding two things.

Why does Citation Bot unlink and then relink authors, as here: when it changed this: "author=Roger Protz" to this: "author=Roger Protz|author-link=Roger Protz". I checked, and the end result is the same. I understand that some people use author-link because they feel that authors in cites should have their names displayed backwards, as in "author=Protz, Roger", which can't be linked, so author link is required. (I'm not clear as to why some editors do this as citations are not listed alphabetically, and it is harder to read and recognise someone's name when it is presented backwards, but they do and others copy, so be it.) But if a name is linked in the author field, it is generally because the name is presented with the words in the right order. Your bot would only need to make changes if the name was incorrectly linked, but your bot would not know that, as even if you asked it to detect if a comma was in the link, there are names which have commas, such as Prince Edward, Earl of Wessex (though I suppose your bot could check the link to see if it is working?). If there is a problem with directly linking authors in the author field it would be useful to know, so I can make adjustments. But if there is no problem then it might be worth having a think as article watchers could be having their watchlist light up and go check over what your bot has done pointlessly.

And the ISBN number. The bot changes ISBN to isbn, but both display on the page as ISBN, and link appropriately. Is there something unseen which means that some bots or some software will not function if isbn is shown as ISBN in the cite template? Again, it would be useful to know, so I could adjust my own editing to avoid causing problems. But if it serves no real purpose, then doing it can cause a mild nuisance to article watchers.

Regards, and thanks for the work you do. SilkTork (talk) 12:17, 3 February 2020 (UTC)
 * Re linking of author names, see the documentation for the cite templates, e.g. Template:Cite book, which say last: Surname of a single author. Do not wikilink—use author-link instead. I don't have an answer for the ISBN change, except to say that you do not need to use the lower-case form. – Jonesey95 (talk) 13:49, 3 February 2020 (UTC)
 * Your link is to when surname only is used - such as "Protz" (which wikilinking would rarely result in arriving at the correct article - Protz), not as in the example I gave above where the whole name is given - Roger Protz. Separating surname and first name is done by a number of editors, though it is not helpful, as it presents the author's name backwards in a non-alphabetical list. SilkTork (talk) 21:33, 3 February 2020 (UTC)
 * author is an alias of last, so the instructions apply to both parameters. Wikilinks should not be used in either parameter, or in first. – Jonesey95 (talk) 23:40, 3 February 2020 (UTC)
 * Cool. You sound as if you know something Jonesey95. Why shouldn't links be used in author? They can be used, and they do work. So what problems are being caused by using them that aren't caused by using them in author-link= ? SilkTork (talk) 11:56, 4 February 2020 (UTC)
 * There is a partial explanation at Template:Cite book. Tools that use author information from WP citations end up with bad data. – Jonesey95 (talk) 15:51, 4 February 2020 (UTC)
 * I'm not understanding the explanation in your link as it doesn't appear to relate to your view that it is OK to put a wiki link in author-link= but not in author=. Is there some special coding in the author-link= field that makes it OK, but that special coding is absent in author= which causes "bad data"? Surely the solution (if there is such a problem) would be to put the special coding in the author= field as well? My own understanding of why we have the author-link= field is not because it has special coding to allow a wiki-link but because a) a number of editors like to place the author's name backwards in a belief that this is how we display author information in citations, but a backwards name can't be linked without piping, and I understand piping is a problem in templates, and b) because some names are disambiguated; so a separate field was created. But if the author's name is not backwards or contains a disamb - such as Michael Jackson (writer), then it can be linked. I have done this for years without, to my knowledge, breaking the internet. But as a bot has been designed to undo perfectly correct links in author= and place them elsewhere in author-link=, I'd like to know - for certain, from someone who knows - if that is actually necessary because then I will stop putting links in the author= field. But if it's just a mistaken assumption that we can't link a name in the author= field that is correctly displayed, then this bot should be adjusted. If,  Jonesey95 (or anyone else), you do know for certain that harm will be done by linking a correct article name in the author= field, please point it out to me. SilkTork (talk) 18:22, 4 February 2020 (UTC)
 * lastn and firstn render the author's name in surname given name order. This is very commonly used in bibliographic listings so that readers can quickly locate the source when the article uses short-form (Harvard) referencing.  When this form is used, author-linkn wikilinks both names.  While it is possible to separately wikilink both the surname and the given name, that is redundant so should be avoided.  I have occasionally seen cs1|2 templates where editors have only wikilinked lastn.  I know of no technical reason why this should not be allowed.
 * When using authorn, wikilinking the assigned value is allowed because Module:Citation/CS1 (the engine that drives the cs1|2 templates) is smart enough to extract the important bits from the wikilink, piped or no, for rendering and for the citation's metadata. In author, contributor, editor, interviewer, and translator name lists any of these forms is allowed:
 * <author name>
 * <author name>
 * <author name> <author article link>
 * I know of no technical reason to prefer any one of the above over the others.
 * —Trappist the monk (talk) 19:27, 4 February 2020 (UTC)
 * ISBN is an alias for isbn in the template, and there are other tools that do the same change for that reason. By the way, you should probably discuss bot related issues on the bot page. AManWithNoPlan (talk) 14:03, 3 February 2020 (UTC)
 * I assumed this bot was run by Smith609, so I thought I'd reach out here first as this isn't a bot broken report, more of a query regarding the bot's operation. At this point I don't know if it is a bot problem, or if I am doing something incorrect. But I will take your advice and copy this discussion to the bot page. Thanks. SilkTork (talk) 18:33, 4 February 2020 (UTC)


 * I've just noticed on the diff that it says: Activated by User:Nemo bis. I'm not familiar with how this bot works - but is it likely that Nemo bis is the one who set up the instructions for the bot to delink "author=Roger Protz", and create "author-link=Roger Protz"? SilkTork (talk) 18:38, 4 February 2020 (UTC)
 * Like all bots, it is activated by someone or something. Nemo bis activated the bot, but Nemo has no control over what the bot does. AManWithNoPlan (talk) 18:56, 4 February 2020 (UTC)
 * It means I asked the bot to work on that page. (I'm no longer using the bot.) I do agree with those changes, especially changing parameter names from "ISBN" to "isbn": it has no visible effect I know of, but it reduces confusion and errors with some things which expect the standard parameter name. "Roger Protz" was not unlinked either (I can't see any occurrence which isn't a link): the link was just expressed in another way which is more compatible with some things and which is apparently recommended by the documentation. Nemo 18:56, 4 February 2020 (UTC)
 * Roger Protz was unlinked. It was unlinked from one field and then created in another link. Which apparently serves no purpose according to Trappist the monk, and I'm inclined to believe them as they seem to speak knowledgeably about the template. As the edits serve no purpose, they shouldn't be done as the bot is then just making work for no valid reason; but doing those unnecessary changes will prompt article page watchers to check over the edits to make sure nothing has been broken. So, as neither the ISBN change nor the author field change do anything necessary, per WP:COSMETICBOT, could someone adjust the bot so it stops tampering with those fields. SilkTork (talk) 16:31, 5 February 2020 (UTC)
 * I've filled in bot reports for the author field change and the ISBN change. I've probably worded it incorrectly, but I think the intent is clear. SilkTork (talk) 16:35, 5 February 2020 (UTC)
 * If these are cosmetic changes, then the edit you flagged is fine: "Such changes should not usually be done on their own, but may be allowed in an edit that also includes a substantive change". Nemo 17:31, 5 February 2020 (UTC)

These are not cosmetic to users of COINS information. Other all caps aliases have been removed and template simplification is a goal as for the ISBN change. AManWithNoPlan (talk) 18:56, 5 February 2020 (UTC)

notabug since COINS data is repaired. AManWithNoPlan (talk) 15:38, 6 February 2020 (UTC)

ClueBot III escapes status templates when archiving
The archive configuration includes many of the status templates (e.g., ) in the archivenow parm. As a result, ClueBot III turns  into   when it archives. Why is this desirable/necessary? —[ Alan M 1 (talk) ]— 09:55, 7 February 2020 (UTC)

bot changes cite web to cite ODNB but leaves |work= parameter
is a wrapper template of. As such it sets certain parameters to default values so that editors don't have to. One of those is:

encyclopedia is one of a few parameters that masquerade as periodicals but aren't. Someday there may be a fix for that in cs1|2.

—Trappist the monk (talk) 12:52, 7 February 2020 (UTC)

File breaking

 * Why do you believe it was Citation bot that made the error here, ? There were 3 or 4 tools used in the mix there. --Izno (talk) 19:12, 7 February 2020 (UTC)

Actually, I'm not sure. But this is where the 'report bug' link went. - Sumanuil (talk) 19:20, 7 February 2020 (UTC)


 * That is https://en.wikipedia.org/wiki/Wikipedia:AutoEd I am 99% sure. I use it often, but you really have to check it, since it is not reliable AManWithNoPlan (talk) 21:18, 7 February 2020 (UTC)

converts bare arxiv url to cite document when a bibcode is found
cite arxiv does not support bibcode. I wonder if simply dropping the extra and mostly useless bibcode is better. AManWithNoPlan (talk) 14:41, 8 February 2020 (UTC)
 * I see why that code does not always run. Fixing it now. https://github.com/ms609/citation-bot/pull/2596 AManWithNoPlan (talk) 14:53, 8 February 2020 (UTC)

Makes up URL for fatally incomplete cite web
Obviously, when https://XXXX.YYY.ZZZ.zzz/DSFADS/SDFDSF/DSFAD then conversion to url makes sense. Probably via makes sense for URLs that are just the hostname. AManWithNoPlan (talk) 14:18, 8 February 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2595 AManWithNoPlan (talk) 14:27, 8 February 2020 (UTC)

Creates broken citations by adding urls to templates with title=none
Only cite journal/citation in journal mode permits none. --Izno (talk) 03:10, 8 February 2020 (UTC)
 * At June Barrow-Green, I've twice had to remove URLs added by Citation bot that point to the wrong place in addition to breaking the templates. The links were supposed to be to reviews of a book, but they pointed to a scanned copy of the doctoral thesis the book was made from. XOR'easter (talk) 04:37, 8 February 2020 (UTC)
 * Thanks for catching that. In the specific case of Barrow-Green, I've added a temporary exclusion for this bot until the problem is fixed. But that's a different bug: whatever algorithm the bot is using to match up these things is faulty in this case as well. By the way, if you were wondering why one might use title=none: Because none of these reviews really has its own separate title, and because making up something like "Review of [book title]" would be redundant (they are part of a list of reviews of that book labeled as such at the top of the list). 237789, for instance, is labeled by jstor as "[Untitled]", labeled by doi.org as "Poincare and the Three-Body Problem. June Barrow-Green", or labeled on the actual journal page as "June Barrow-Green. Poincaré and the Three-Body Problem. (History of Mathematics, 11.) xvi + 272 pp., illus., figs., apps., bibl., index. Providence/London: American Mathematical Society/London Mathematical Society, 1997. $49." Which of those do you use as the title? Better just to omit it. And I have also seen similar examples where the big long listing of metadata is what you get as a title from doi.org, even including the price at the end. —David Eppstein (talk) 05:03, 8 February 2020 (UTC)
 * I have clamped down on the OA url adding and it now requires a higher match probability before adding. The unpaywall is sometimes overly optimistic.  AManWithNoPlan (talk) 13:45, 8 February 2020 (UTC)
 * thanks for the note on title=none not allowing a url. https://github.com/ms609/citation-bot/pull/2593 AManWithNoPlan (talk) 14:14, 8 February 2020 (UTC)
 * Note, removing OAI-PMH matches by title and author affect over 3 million records. This is really an issue about reviews (and bad cataloguing thereof by publishers), which is important but affects a tiny minority of those 3 million records. The best way to handle it is to report issues to Unpaywall (I've already reported this): they are very responsive and everyone can see and share their code and data. Nemo 23:53, 8 February 2020 (UTC)
 * If your bot consistently uses a source of data known to be bad for a certain class of citations, and consistently breaks those citations, the problem is with your bot and its choice of data to use. Do not pass it off to other people and make it other people's work to correct your mistakes. —David Eppstein (talk) 00:46, 9 February 2020 (UTC)
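
The title=none restriction noted in this thread amounts to a guard before any url is added. A minimal sketch, assuming the template parameters are available as a plain dict of strings; the function name is hypothetical and this is not the code from the pull request linked above.

```python
def may_add_url(params):
    """Refuse to add a url when the citation uses title=none or already has one."""
    title = params.get("title", "").strip().lower()
    if title == "none":
        return False  # a url has nothing to attach to without a title
    return not params.get("url", "").strip()
```

This only covers the broken-template half of the report; whether a candidate link points to the right document is a separate matching problem.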

Removed "::" from a title (I have no idea why)
I've reverted this part of the automated edit in. BernardoSulzbach (talk) 13:02, 8 February 2020 (UTC)


 * Double colons will no longer be removed. fixed AManWithNoPlan (talk) 14:13, 8 February 2020 (UTC)

Caps: Avtomatika I Telemekhanika → Avtomatika i Telemekhanika
Self-explanatory. Headbomb {t · c · p · b} 13:33, 11 February 2020 (UTC)

fixed

look at code coverage and TODO
AManWithNoPlan (talk) 22:31, 15 December 2019 (UTC)

fixed about a dozen bugs AManWithNoPlan (talk) 22:43, 12 February 2020 (UTC)

caps

 * Also Fizika Goreniya I Vzryva → Fizika Goreniya i Vzryva Headbomb {t · c · p · b} 13:35, 11 February 2020 (UTC)

fixed
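
Both capitalization reports concern a standalone "i" in a transliterated Russian journal name, which a blanket rule cannot tell apart from the English pronoun. One hedged way to handle such cases is a hand-checked exception list, sketched below; the table and function name are hypothetical, and the bot's actual fix may work differently.

```python
# Hypothetical exception list: journal names in which the standalone "i"
# is a transliterated conjunction and stays lowercase.
JOURNAL_CAPS_EXCEPTIONS = {
    "avtomatika i telemekhanika": "Avtomatika i Telemekhanika",
    "fizika goreniya i vzryva": "Fizika Goreniya i Vzryva",
}

def fix_journal_caps(name):
    """Return the hand-checked spelling if one exists, else the name unchanged."""
    key = " ".join(name.split()).lower()  # normalise spacing and case
    return JOURNAL_CAPS_EXCEPTIONS.get(key, name)
```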

Semantic scholar
Since when has Citation bot been authorized to add Semantic Scholar URLs to citations, as it did in Special:Diff/937558164? Semantic Scholar is a web scraper that sometimes (unintentionally) copies pirated copies of papers. Because its copies do not include any information about where it found its files, they cannot be checked for being free of copyright violations and it cannot be trusted as a source for automatically-generated links. See WP:RSN. Please immediately stop adding these links. —David Eppstein (talk) 20:34, 25 January 2020 (UTC)
 * It's definitely been doing this for quite a while. I've noticed the addition for the past week or so, and never considered checking to see if it was supposed to. I probably should have made a ticket when someone messaged me to complain about it, instead of assuming the bot was infallible and the human mistaken. --~ ฅ(ↀωↀ=) neko-channyan 20:43, 25 January 2020 (UTC)
 * To clarify the addition of Semantic Scholar links to Wikipedia citations: Semantic Scholar is a free, non-profit academic search and discovery engine developed by the Allen Institute for AI (AI2). Semantic Scholar is committed to providing high-quality results that respect copyright. We have licensing agreements to index scientific content from 550+ publishers, pre-print servers and academic societies and we are integrated with multiple data partners including PubMed, Microsoft Academic, Unpaywall and others that provide us with high-quality metadata for our results. As you mention, we also crawl the web for publicly accessible open-access PDFs, but we have procedures in place to address any copyright issues that may arise (please feel free to contact us at feedback@semanticscholar.org if you notice any issues).
 * Our goal in incorporating links to Semantic Scholar in Wikipedia citations is to provide an additional discovery entry point for Wikipedia users to explore our open literature graph and find additional relevant information that they are unlikely to find elsewhere. For example, we provide AI-based features such as citation classifications and high-quality supplemental content like videos, presentation slides, and links to code libraries (you can see an example here). If you have any additional questions or concerns please let us know, we are happy to provide additional information. Sebaskohl (talk) 00:50, 29 January 2020 (UTC)
 * To address similar concerns that were highlighted here and to satisfy copyright requirements for linking we plan to do the following:
 * 1. Add an "is_publisher_licensed" boolean flag to the Semantic Scholar API to indicate when a paper has been licensed to us for indexing by one of our 550+ publisher and academic society partners via a signed indexing licensing agreement.
 * 2. Add logic to only insert links via the Citation Bot if the flag is set to ensure that we are linking only to licensed content (this will prevent links to content that was crawled).
 * Let us know if this will address the concerns that have been raised in this discussion. Sebaskohl (talk) 17:31, 29 January 2020 (UTC)
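
Step 2 of the plan above can be sketched as a single guard. This is an illustration under stated assumptions: `paper` is taken to be the decoded JSON object for one paper, the flag name is the one proposed in step 1, and the function name is hypothetical.

```python
def may_add_semantic_scholar_link(paper):
    """Return True only when the record is explicitly flagged as publisher licensed.

    A missing flag, or any value other than the boolean True, is treated
    as "not licensed" so that crawled content is never linked by default.
    """
    return paper.get("is_publisher_licensed") is True
```

Treating a missing flag as unlicensed is a design choice of this sketch: it fails closed if the API reply lacks the field.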

fixed AManWithNoPlan (talk) 13:24, 30 January 2020 (UTC)
 * The fix is incorrectly coded, I've left some comments. Nemo 14:17, 30 January 2020 (UTC)
 * please don't comment there. I will not read any more comments hidden on a merged pull request.  They are hard to find. AManWithNoPlan (talk) 15:17, 30 January 2020 (UTC)
 * This update will reduce things quite a bit. https://github.com/ms609/citation-bot/pull/2532 AManWithNoPlan (talk) 15:37, 30 January 2020 (UTC)
 * Using Semantic Scholar as the URL for the citation is counter-intuitive and confusing. When I click on the title of the citation I end up at Semantic Scholar instead of the article. Folks need to know they need to click on the DOI to reach the actual article. The Semantic Scholar URL should be in a different field/parameter of the citation. Whywhenwhohow (talk) 20:04, 31 January 2020 (UTC)
 * Where is the discussion and consensus to use Semantic Scholar as the URL for a citation? Whywhenwhohow (talk) 20:08, 31 January 2020 (UTC)
 * Adding publicly available links was discussed as long as licensed. AManWithNoPlan (talk) 22:43, 31 January 2020 (UTC)
 * I would like to read the discussion. Can you provide a link? According to the cite journal doc, when a DOI is present the URL parameter is supposed to be used for its prime purpose of providing a convenience link to an open access copy which would not otherwise be obviously accessible. The Semantic Scholar pages are not open access copies of the articles. It takes multiple steps to reach the actual article for users that don't know to click the DOI link instead of the title link. If the Semantic Scholar links are useful they should be provided in a separate parameter. Whywhenwhohow (talk) 00:02, 1 February 2020 (UTC)
 * I don’t have time to find it. But, these links are not added when the open-access system reports that the publisher DOI is free nor does it get added if the doi is flagged in the template as free. AManWithNoPlan (talk) 01:46, 1 February 2020 (UTC)
 * I should add that if the CiteSeerX or PMC or arXiv is already present then it won’t add either. It’s a very last resort thing now that we filter them. AManWithNoPlan (talk) 01:50, 1 February 2020 (UTC)
 * If the DOI is unfree, it is very unlikely that the SemanticScholar pdf is an exact copy of the publisher journal version, free, and properly licensed. When I found a SemanticScholar copy of a paper that was otherwise paywalled a couple weeks back, and (indirectly) queried SemanticScholar about how they had obtained and licensed it, their immediate response was to take it down. So my strong impression is that any use you might make of direct links to their pdfs is likely to be inappropriate: either something free elsewhere, something they would take down if they only knew about it, or something that does not accurately represent the publication. It also does not appear to match the intent discussed by their representative above, of providing links to their indexing services. Such a link would only be provided by going to their landing page for a paper, rather than a direct link to the pdf, and could be useful even for paywalled papers that they do not provide pdfs for. But it would only make sense to link to this using an id, not through the url parameter of a citation. —David Eppstein (talk) 08:10, 4 February 2020 (UTC)


 * Much of this discussion does not reflect the current state of the bot code. It will not add the link if there is an existing arxiv, pmc, CiteSeerX, doi-free=yup, url, or if OA database reports the publisher is free, or if the Semantic Scholar link is scraped instead of licensed. It’s actually rare to add one.   AManWithNoPlan (talk) 15:20, 4 February 2020 (UTC)
 * The bot appears to have changed to link to the index page of Semantic Scholar rather than to bare pdfs (or maybe it always did this when no pdf is linked): see e.g. Special:Diff/939500674. I think the link additions in this diff are completely ok from the copyright point of view. However, it is an inappropriate use of the url parameter, which should only be for links from which readers can find the paper itself. I agree with above that this is a problem, and I would like to repeat the question: where, in the bot approval process, was this bot approved to add links of this nature to citations? —David Eppstein (talk) 21:29, 6 February 2020 (UTC)
 * I agree that linking to a semantic scholar page that doesn't contain a link to a PDF is utterly pointless. Headbomb {t · c · p · b} 21:32, 6 February 2020 (UTC)
 * I wouldn't say completely pointless: you can use those pages to find other works that cite the source, for instance. But because it is not actually a link to the paper itself it belongs in the id parameter (for lack of a designated special parameter for these links) rather than in the url parameter. —David Eppstein (talk) 21:42, 6 February 2020 (UTC)


 * While I can see possible utility in linking to a page that doesn't contain a link to a PDF, and am more relaxed about the use of url=, yet I will join David in questioning why (and how?) this Semantic Scholar "feature" came about. ♦ J. Johnson (JJ) (talk) 21:45, 6 February 2020 (UTC)
 * It appears it was added as a result of this change request on Github. I think it would probably be a good idea for the bot operator and maintainers to request community feedback on this talk page about possible implementations of "new sources" and similar, given past history. It's not okay that this was inserted entirely off Wikipedia and flies somewhat in the face of WP:Consensus, and if I didn't feel as INVOLVED as I do about citation bot I'd have blocked the bot by now. --Izno (talk) 21:54, 6 February 2020 (UTC)
 * Isn't the Bot Approval Group supposed to approve significant changes in functionality? Where is their approval for this change? —David Eppstein (talk) 05:39, 7 February 2020 (UTC)
 * Here are some recent examples
 * https://en.wikipedia.org/w/index.php?title=Calcium_supplement&diff=928593068&oldid=918812972
 * https://en.wikipedia.org/w/index.php?title=Ceftriaxone&diff=931067948&oldid=923108617
 * https://en.wikipedia.org/w/index.php?title=Crohn%27s_disease&diff=929682634&oldid=929461108
 * https://en.wikipedia.org/w/index.php?title=DPT_vaccine&diff=928756663&oldid=927109877
 * https://en.wikipedia.org/w/index.php?title=Fecal_occult_blood&diff=931031388&oldid=928936257
 * https://en.wikipedia.org/w/index.php?title=Influenza&diff=938504673&oldid=938461227
 * https://en.wikipedia.org/w/index.php?title=Isoniazid&diff=937406134&oldid=936739097
 * https://en.wikipedia.org/w/index.php?title=Laryngopharyngeal_reflux&diff=928608763&oldid=912819138
 * https://en.wikipedia.org/w/index.php?title=MMR_vaccine&diff=928730115&oldid=927586891
 * https://en.wikipedia.org/w/index.php?title=Nicotinamide&diff=928945223&oldid=921045063
 * https://en.wikipedia.org/w/index.php?title=Nifedipine&diff=931143486&oldid=917258760
 * https://en.wikipedia.org/w/index.php?title=Oseltamivir&diff=934660610&oldid=933206612
 * https://en.wikipedia.org/w/index.php?title=Peanut&diff=931014845&oldid=926127787
 * https://en.wikipedia.org/w/index.php?title=Psoriasis&diff=928537694&oldid=927675315
 * https://en.wikipedia.org/w/index.php?title=Tamoxifen&diff=928017090&oldid=927879423
 * https://en.wikipedia.org/w/index.php?title=Hand_sanitizer&diff=939563747&oldid=939034797
 * https://en.wikipedia.org/w/index.php?title=Zinc_pyrithione&diff=939564935&oldid=935650865
 * https://en.wikipedia.org/w/index.php?title=Acetic_acid&diff=939565429&oldid=938713061
 * Whywhenwhohow (talk) 03:32, 7 February 2020 (UTC)

So that's why there's suddenly been an increase in semantic scholar links. An obscure repository is nowhere to get consensus, that needs to be done on Wikipedia, and there clearly isn't support for blindly adding semantic scholar links willy nilly, especially when there's no freely accessible PDF at the end of it. Headbomb {t · c · p · b} 05:49, 7 February 2020 (UTC)
 * I agree it's a bug to add a link if there's no PDF. appears to be such a case. Unpaywall for 10.1080/10915810152630729 has no such error. Nemo 07:13, 7 February 2020 (UTC)
 * The increase came from unpaywall. We then had code implemented to greatly reduce that number being added.  I will stop it for now.  AManWithNoPlan (talk) 21:22, 7 February 2020 (UTC)
 * One problem is that the paywall lies https://api.unpaywall.org/v2/10.1080/10915810152630729?email=k@x.com AManWithNoPlan (talk) 21:24, 7 February 2020 (UTC)
 * Thanks AManWithNoPlan for disabling the API call for now until we figure out the right way to link to ensure there is consensus. Based on the follow-up discussion here it sounds like the right thing to do is to propose to add links to Semantic Scholar IDs as a new identifier type in the Citation Template which can then be used by the Citation Bot. This avoids instances where the URL doesn't give users direct access to the PDF, but will still give users the ability to access licensed content and leverage Semantic Scholar's discovery experience to find and discover research paper content (e.g. ability to browse citations/references, view figures and tables, view extracted snippets of information such as classified citation contexts, find supplemental content such as code libraries, videos, slides, clinical trials and more, etc. [see example]). If yes, it would be great if someone could point me to the right place where I should submit this request (I'm assuming the Citation Template Talk page?). We can then work with the Citation Bot owners to update the Semantic Scholar API call logic. Sebaskohl (talk) 21:27, 7 February 2020 (UTC)
 * That's a good idea. They seem to have a good set up images, links, etc.  AManWithNoPlan (talk) 21:46, 7 February 2020 (UTC)
 * The paywall as in the big publishers? Sure, their metadata lies all the time. Unpaywall not quite: it sometimes has false negatives or false positives but that's a very small minority thanks to painstaking work over a number of years, open source code and thousands of libraries which use the software and report errors. In the example you link, has no OA links to offer, just a generic link to the publisher and to the pubmed abstract. The publisher happens to have made this PDF available for now (Unpaywall would call it "bronze OA") but such PDFs vanish all the time on the publishers' websites, which are not as reliable as university-provided open archives. Nemo 21:49, 7 February 2020 (UTC)
 * (edit conflict) Sebaskohl, sure, you can ask a new identifier at Help_talk:Citation_Style_1; I expect there will be some questions but it can continue there. Because the identifier sometimes comes with a full text and sometimes not, it will also need an -access field, similar to the "hdl" field. Speaking of which, maybe it would be easier if you joined the Handle System, then you'd nicely fit in the existing identifiers. Nemo 21:49, 7 February 2020 (UTC)

If we have semantic scholar people here looking at creating a new identifier, I'd be thrilled for that. A few things though. Make the identifier short and snappy, like SemID (because SSID would be very confusing), and have a clear structure to the identifier, whether it's pure numbers (0123456789), or something more elaborate (1998.02.01.012345). Having those allows us to have validation and makes it much easier to maintain and code bots for. Instead of something like 1fa190b60988a4ad272e39e132bcc12b00429464 which is way too long and human-unreadable. Headbomb {t · c · p · b} 22:37, 7 February 2020 (UTC)
 * Thank you for the great suggestions Nemo and Headbomb! I will submit a request early next week after collecting some more feedback. The Semantic Scholar API supports redirects using a doi (e.g. http://api.semanticscholar.org/10.1038/nrn3241) which we can use as the identifier instead of our long IDs: (10.1038/nrn3241). Sebaskohl (talk) 23:16, 7 February 2020 (UTC)
 * While it's a nifty feature to implement a DOI resolver (it makes it easy to find papers on SS, at least those with DOIs), several papers hosted on SS won't have DOIs, and it would generally make for a poor identifier and cause increased confusion between what is a semantic scholar link, and what's a non-semantic scholar link. Headbomb {t · c · p · b} 23:22, 7 February 2020 (UTC)
 * I also find it concerning that someone appearing to represent Semantic Scholar is here, apparently working with the goal of incorporating more links to their commercial site into the encyclopedia rather than with the goal of improving the encyclopedia, and with no user-page disclosure of the WP:COI. That is not what the encyclopedia is for and it appears to be a violation of the Wikimedia policies on undisclosed paid editing. —David Eppstein (talk) 00:26, 8 February 2020 (UTC)
 * Disclosure would be nice, but let's not throw unnecessary epithets. Semantic Scholar is proprietary, but it's not commercial as far as I can tell; moreover, the Allen Institute is a 501(c)(3). Nemo 00:40, 8 February 2020 (UTC)
 * Honestly I wish ResearchGate and IEEE would do the same and help us expand their references. AManWithNoPlan (talk) 02:22, 8 February 2020 (UTC)
 * Thank you for the additional feedback Headbomb! Much appreciated. We'll hold off on submitting the request for the identifier until we have a good solution in place that makes sense in terms of best practices. Also, apologies for not making it clearer earlier that I'm part of the Semantic Scholar team (I was hoping the initial overview that I provided in the conversation was sufficient). Semantic Scholar is a free and non-profit academic search and discovery engine developed by the Allen Institute for AI that does not generate any revenue (our site is free of advertising and always will be). Our mission is to contribute to humanity through high-impact AI research and engineering. Here's an example of a sub-project called Supp.ai that we launched last year to identify supplement-drug interactions in scientific literature (a highly unregulated industry) that showcases the type of research that we work on. We also open source our data and code whenever possible (subject to our content licensing agreements). I'm happy to provide additional context as needed! Sebaskohl (talk) 15:54, 10 February 2020 (UTC)
 * Perhaps Y would be the way to go, and it would use the DOI to make the link. That way the parameter could not be vandalized.  Also, it would require a DOI first.  Lastly, it would make for a pretty link, when all it said was something like See on SS, but better phrased than SS AManWithNoPlan (talk) 19:10, 10 February 2020 (UTC)

Flagging as fixed to archive it: most of the issues were resolved. The remaining ones are more of a template format/design issue. AManWithNoPlan (talk) 12:33, 15 February 2020 (UTC)
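
As a rough illustration of the point made in this thread that a structured identifier is easy to validate, here is a minimal Python sketch. The two patterns are hypothetical shapes taken from the examples in the discussion (pure digits, or a dotted 1998.02.01.012345 style); neither is an actual Semantic Scholar format, and the function name is made up for this example.

```python
import re

# Hypothetical identifier shapes borrowed from the examples in the thread.
# Neither is a real Semantic Scholar identifier format.
PURE_DIGITS = re.compile(r"\d{1,10}")
DOTTED = re.compile(r"\d{4}\.\d{2}\.\d{2}\.\d{6}")


def looks_valid(identifier: str) -> bool:
    """Return True if the identifier matches one of the assumed shapes."""
    return bool(PURE_DIGITS.fullmatch(identifier) or DOTTED.fullmatch(identifier))
```

Under these assumptions, a 40-character hexadecimal ID is rejected, which is the kind of cheap check a bot or template could run before accepting a value.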

Citation bot
When the Citation bot is changing ODNB citations to the cite ODNB template, is there a way it could also remove the ODNBsub template associated with the citation? If not, we end up with references that look like this: "Amery, John (1912–1945)". Oxford Dictionary of National Biography (online ed.). Oxford University Press. 2006. doi:10.1093/ref:odnb/37112. (Subscription or UK public library membership required.) (subscription or UK public library membership required). Thanks - SchroCat (talk) 17:16, 12 February 2020 (UTC)
 * I think this will do it once I deploy it. https://github.com/ms609/citation-bot/pull/2632 AManWithNoPlan (talk) 21:29, 13 February 2020 (UTC)


 * fixed AManWithNoPlan (talk) 12:31, 15 February 2020 (UTC)

Deleting used parameters?
I noticed one change that puzzled me, and I found some more. (This bot does smart and useful work, but it might be hiccuping.)
 * Special:Diff/940164519 2020-02-10T17:34:18, Special:Diff/940799133 2020-02-14T13:28:21
 * It deleted "|format= PDF" for a working link to a PDF file. (A direct link, not an abstract or download page.) This is bad optics, if it's not actually wrong.


 * Special:Diff/940798177 2020-02-14T13:28:21
 * It deleted a blank "|first=" (okay), but it left (in the same cite) the subsequent "|date=|website=" (not sure of guidelines).
 * It deleted "|date=2017-09-08" from inside "|url-status=live|date=2017-09-08}}". (It was after "|url-status=live". (But "|url-status=live" shouldn't even be last; it should precede "archive-url=" and "archive-date=".)) - A876 (talk) 18:54, 14 February 2020 (UTC)
 * The removal of format=PDF has been widely discussed. As for the date, I see the opposite: it added the date. Nemo 20:11, 14 February 2020 (UTC)
 * Re. "|format= PDF": Having been "widely discussed" is not a passive act on the part of deleting "|format= PDF". (Less cryptically,) I looked at the documentation for Template:Cite web and I saw nothing about "|format=" being deprecated, disused, or delete-able. Instead, it is still included in two examples. That means the bot is deleting this field even as humans are adding it. If deleting "|format=" has been so "widely discussed" that a consensus was found and a decision was made (citation needed), then that fact must first be added to the template's documentation, and then the template must be altered to ignore the parameter. After that, bots can incidentally or systematically delete the disused field. Anything else LOOKS LIKE a bot running amok, carrying out an unsourced, un-agreed, counterproductive directive.
 * Re. "|date= ...": Oops. I might have seen another problem in a random look, but I lost the place because new edits keep pushing old entries down on the user-contributions page, and in too much of a hurry I mis-grabbed what I mis-perceived as another example. - A876 (talk) 06:06, 15 February 2020 (UTC)
 * That's all good. I would much rather you show up and say "I see a bug" and have it turn out not to be one than have you not mention it.  We have people show up and say 'there is this bug I have been seeing for years and ....' and that is annoying.  AManWithNoPlan (talk) 12:37, 15 February 2020 (UTC)

Update year when adding pagination
I will look into this. This year I am not giving up Wikipedia for Lent. AManWithNoPlan (talk) 17:06, 16 February 2020 (UTC)

Grove

 * Grove Music Online was converted to cite journal here, removing the access date, the subscription info, and I think causing "CS1 errors: missing periodical".
 * This also removed the access date and the subscription info, and probably generated the same error message. Grove isn't a journal. EddieHugh (talk) 22:53, 4 February 2020 (UTC)


 * Probably should be cite document instead. AManWithNoPlan (talk) 23:01, 4 February 2020 (UTC)
 * Perhaps is a better choice.   is merely a redirect to  which requires journal or some other periodical parameter.
 * —Trappist the monk (talk) 16:39, 5 February 2020 (UTC)
 * has problems of its own. EddieHugh (talk) 18:00, 5 February 2020 (UTC)
 * cite document shouldn't require journal. Headbomb {t · c · p · b} 18:18, 5 February 2020 (UTC)
 * There is no template called . The thing that is  is just a redirect to .  It is  that  s Module:Citation/CS1.  The module has no knowledge of ; never has.
 * —Trappist the monk (talk) 23:46, 5 February 2020 (UTC)
 * I'm aware of what the current status is. I'm saying what it should be. &#32; Headbomb {t · c · p · b} 00:32, 6 February 2020 (UTC)

10.1093 DOI non-journal special case code added for when converting url to doi. fixed AManWithNoPlan (talk) 12:29, 16 February 2020 (UTC)

Should not automatically convert work= to publisher=
In the diff that I gave above, the conversion from publisher to work is correct: Forbes is a magazine, not a magazine publisher. However, in many cases cite web is used for stand-alone titles that are not part of any larger work, or with a listed publisher (such as a news organization) instead of a work (the name of the publication in which the news organization published the reference). In those cases, leaving work empty and having a non-empty publisher field can be correct. Help:Citation Style 1 says that the publisher field should not be used to italicize metadata that really is the name of the work or website, but it does not say (and should not say) to avoid empty work and nonempty publisher. So unless Citation bot is a lot smarter than I expect it to be in understanding which publisher names really are work names and which are not, it should leave this field alone. —David Eppstein (talk) 01:38, 16 February 2020 (UTC)
 * Hmm. In all the examples I found the converted publisher/work is the name of a major newspaper or magazine. If this conversion is only done for a short whitelist of titles, rather than for all empty work nonempty publisher combinations, I think it could be ok (not a bug). I did find one other example, that puzzled me, though: in Special:Diff/940842479 the bot failed to convert "Los Angeles Times" from publisher to work. Is the LA Times not on the whitelist, or was it confused by the explicit empty work parameter that it removed? —David Eppstein (talk) 01:54, 16 February 2020 (UTC)


 * you are correct, it is a whitelist. LA Times added. For almost nothing except actual books most people mean work when they say publisher, but we use a whitelist since it’s not 100%. notabug AManWithNoPlan (talk) 12:25, 16 February 2020 (UTC)
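
For readers following along, the whitelist approach described above can be sketched roughly as follows in Python. The set contents and the function name are assumptions made for illustration; the bot's actual list and code are not reproduced here.

```python
# Hypothetical whitelist; only the two titles named in this thread are listed.
KNOWN_PERIODICALS = {"forbes", "los angeles times"}


def publisher_to_work(params: dict) -> dict:
    """Move publisher to work only when work is empty and the name is whitelisted."""
    result = dict(params)  # leave the caller's dictionary untouched
    publisher = result.get("publisher", "").strip()
    work = result.get("work", "").strip()
    if publisher and not work and publisher.lower() in KNOWN_PERIODICALS:
        result["work"] = publisher
        del result["publisher"]
    return result
```

A publisher that is not on the list (a news organization, for instance) is left alone, which matches the concern raised at the top of this section.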

It deletes "|format=_", even though {cite _} documentation shows it in examples
not a bug

This bot deleted "|format= PDF" for working links to PDF files. (Direct links, not links to abstracts or download pages.) At Special:Diff/940164519 2020-02-10T17:34:18 and Special:Diff/940799133 2020-02-14T13:28:21.

I looked in the one location that makes sense to me, the documentation for Template:Cite web. It includes "|format=PDF" in three examples! I saw nothing about "|format=_" being deprecated, disused, or delete-able.

Citation bot deletes a parameter that humans are still advised to add.

IMHO, WTH? This is not a technical error; the bot isn't running amok. This is a policy error; it looks like a coder has run amok, giving the bot a directive that has no basis. (Does Wikipedia need another layer of watchers?)

(A reply to my prior report was "The removal of format=PDF has been widely discussed. ...." Nemo 20:11, 14 February 2020 (UTC)".) (I updated my prior report, but it got archived.)

I don't know whether deleting "|format=_" "has been widely discussed", or where to look. (I looked in one location that must agree.) Presumably a consensus was found and a decision was made. (Link please?)

Either way, action is required:
 * If it was decided to delete "|format=_", then it must be carried out sensibly (if retroactively). 1) The template's documentation must be adjusted. 2) The template must be altered to ignore the parameter. 3) After that, it is legitimate for editors and bots to incidentally (or systematically) delete the disused parameter.
 * If it was not decided, then this bot must stop undoing what the documentation suggests.

This bot does smart and useful work. Why does it also do something that is contradicted by template documentation? - A876 (talk) 20:26, 16 February 2020 (UTC)


 * PDF is automatically added by templates, the documentation is out of date and having it in the edit window serves no purpose whatsoever. &#32; Headbomb {t · c · p · b} 20:35, 16 February 2020 (UTC)
 * See User talk:Citation bot/Archive 13 for more details. &#32; Headbomb {t · c · p · b} 20:48, 16 February 2020 (UTC)

Explicit pdf is not required as indeed the module automatically sets the parameter where it detects the file to be a PDF (to wit, I believe that is only URLs ending with --that's just from memory and it would be trivial to find the function in the code). There may be some cases where PDF is preferred, as in the case of something like ; I do not believe Citation bot makes changes on such citations, but I could be wrong. --Izno (talk) 22:02, 16 February 2020 (UTC)
 * It doesn't.&#32; Headbomb {t · c · p · b} 22:06, 16 February 2020 (UTC)

Fails to add bibcodes
Gotta love changing APIs. AManWithNoPlan (talk) 23:07, 18 February 2020 (UTC)

More DOIs for IEEE citations
According to a query, IEEE URLs remain among the most intractable for Citation bot: there are some 2-3000 which resist metadata fixes, largely because they don't have a DOI and the usual technical limitations make it hard to find one. Matching over the document/AR number in the CrossRef dump, I believe I can make a list of URLs linked in our articles and their corresponding DOI. Then, it would need to be added by a bot, probably with a regex replacement: is there some bot or AWB operator here interested in doing it? Nemo 10:12, 16 December 2019 (UTC)


 * Sounds like an AWB bot. It seems that https://ieeexplore.ieee.org/document/##### often has a DOI of 10.1109/JOURNAL_CODE.YEAR.#####, which means that there is probably a unique number there. AManWithNoPlan (talk) 17:39, 17 February 2020 (UTC)


 * Looks like one can get an account (probably a good job for AWB to do the initial run, but this bot could use a key for long-term purposes). https://developer.ieee.org/docs/read/Metadata_API_details  https://developer.ieee.org/member/register AManWithNoPlan (talk) 17:45, 17 February 2020 (UTC)


 * Maybe IEEE would give you a spreadsheet that has ALL DOIs and numerical IDs in them? AManWithNoPlan (talk) 17:52, 17 February 2020 (UTC)


 * No need, I can make such a spreadsheet myself from a CrossRef dump. As soon as someone has a use for it, I can produce the list of substitutions needed. Nemo 20:39, 17 February 2020 (UTC)


 * How big a dump? AManWithNoPlan (talk) 22:29, 17 February 2020 (UTC)


 * Some tens of GB IIRC, why? Nemo 00:06, 18 February 2020 (UTC)


 * Just enhanced bot to get a lot more IEEE doi's. AManWithNoPlan (talk) 01:15, 18 February 2020 (UTC)
 * Tens of GBs seems like a lot more than I would think. AManWithNoPlan (talk) 01:30, 18 February 2020 (UTC)

Flagging as fixed for this bot. I have made some significant improvements to the bot, but a massive table is not our style, nor would it hit the non-template URLs. AManWithNoPlan (talk) 16:32, 20 February 2020 (UTC)
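
The URL/DOI pattern observed in this thread (a document number in the IEEE Xplore URL that often reappears as the last part of a 10.1109 DOI) could be checked with something like the Python sketch below. This is only a heuristic under that stated assumption, not the bot's code, and the function names are invented for the example.

```python
import re
from typing import Optional

# Matches the URL shape quoted in the thread: .../document/<number>
IEEE_DOCUMENT_URL = re.compile(r"https?://ieeexplore\.ieee\.org/document/(\d+)")


def ieee_document_number(url: str) -> Optional[str]:
    """Extract the numeric document ID from an IEEE Xplore URL, if present."""
    match = IEEE_DOCUMENT_URL.match(url)
    return match.group(1) if match else None


def doi_matches_document(doi: str, url: str) -> bool:
    """Heuristic: a 10.1109 DOI whose last dot-separated part equals the document number."""
    number = ieee_document_number(url)
    if number is None or not doi.startswith("10.1109/"):
        return False
    return doi.rsplit(".", 1)[-1] == number
```

Since the thread only says the pattern holds "often", a match like this would be a candidate to verify against CrossRef rather than a substitution to apply blindly.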

Removed a freely accessible url link that wasn't actually a repeated unique identifier

 * The DOI resolves to the URL in question. --Izno (talk) 16:29, 20 February 2020 (UTC)

Keep authors together
That's a cosmetic bug that predates me. https://github.com/ms609/citation-bot/pull/2681 AManWithNoPlan (talk) 14:02, 21 February 2020 (UTC)

incorrect converted bare reference on Macbeth (1948 film)
hello incorrectly converted bare reference on Macbeth (1948 film). see diff Special:Diff/941880753. Leela52452 (talk) 06:48, 21 February 2020 (UTC)

This will fix that. https://github.com/ms609/citation-bot/pull/2682 AManWithNoPlan (talk) 14:14, 21 February 2020 (UTC)


 * fixed AManWithNoPlan (talk) 16:58, 21 February 2020 (UTC)

Don't use arxiv to supersede existing dates
had an = where an === should have been AManWithNoPlan (talk) 11:55, 24 February 2020 (UTC)

Handles list expansion
will provide a list of Handle providers that we will add to our constants files AManWithNoPlan (talk) 19:03, 16 October 2019 (UTC)🤔
 * Time to call in Leeroy Jenkins to extract the handles. AManWithNoPlan (talk) 22:18, 31 October 2019 (UTC)
 * Feel free to work on User:Headbomb/Sandbox and see which prefix resolves or not. &#32; Headbomb {t · c · p · b} 23:45, 31 October 2019 (UTC)

wontfix; will just add as needed. AManWithNoPlan (talk) 17:28, 24 February 2020 (UTC)

Disruptive line break replaced with space

 * Pretty hard to know what causes the error, given if you have a line break in the middle of year, it's very likely that the field is further garbage, like '20 08' for 20 August. &#32; Headbomb {t · c · p · b} 14:21, 24 February 2020 (UTC)


 * I do not see how a bot could intelligently fix this any better. AManWithNoPlan (talk) 14:30, 24 February 2020 (UTC)

Expand arxiv into cite arxiv similar to doi→cite journal/book

 * I assume you mean to do this only when the arxiv template is the only content of a footnote. Otherwise, we'll run into big trouble expanding it in citations where only the arXiv identifier is intended (for instance, as part of larger manually-formatted citations). This happens very very rarely: I did a search for insource:" →, same as → &#32; Headbomb {t · c · p · b} 04:31, 21 February 2020 (UTC)
 * Same for the other identifier templates it can handle, bibcode, jstor, etc... &#32; Headbomb {t · c · p · b} 15:43, 21 February 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2697 AManWithNoPlan (talk) 17:43, 24 February 2020 (UTC)

FDA web site is not a journal
https://github.com/ms609/citation-bot/pull/2694 AManWithNoPlan (talk) 14:09, 24 February 2020 (UTC)

arxiv links should expand to cite arxiv, not cite documents
https://github.com/ms609/citation-bot/pull/2695 AManWithNoPlan (talk) 14:10, 24 February 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2696 AManWithNoPlan (talk) 14:29, 24 February 2020 (UTC)

Bot removed URL, then complained of missing URL
The bot's actions are correct; they are just a little odd in the phrasing. AManWithNoPlan (talk) 18:15, 26 February 2020 (UTC)

Adding broken bioRxiv DOIs
Thank you. AManWithNoPlan (talk) 18:39, 29 February 2020 (UTC)

mixer formatting
The cite book references were split in half already by a separate line without any reason, but the bot added the date/year on a new line itself. Redalert2fan (talk) 22:26, 24 February 2020 (UTC)
 * We cannot easily fix that. The bot tries to figure out the best thing based upon existing line breaks.  We make a guess.  I will at some point look at the guessing code again. AManWithNoPlan (talk) 22:46, 24 February 2020 (UTC)

Bot removes archive links
Archive links must be deleted when the URL is removed because there is nothing to archive if there is no url. Also, there are full copies at the DOI, PMC, etc. No need for a junky archive copy. AManWithNoPlan (talk) 17:01, 5 March 2020 (UTC)

Incorrect cite book
The reference in question is a link to a page to buy a book with some information on it, no content from the actual book is being cited or used as a citation, this is not a link to a readable copy of the book. This should probably stay as cite web. Redalert2fan (talk) 22:34, 24 February 2020 (UTC)
 * See User:Citation_bot/use &#32; Headbomb {t · c · p · b} 23:25, 24 February 2020 (UTC)

Conflicting removals and insertions of redundant external links
not sure whether the pings above reached you (as the template placed my signature above the pings), so re-pinging. --Francis Schonken (talk) 15:42, 5 March 2020 (UTC)


 * I'm not sure what's your point. InternetArchiveBot doesn't add redundant links in the "url" parameter. Nemo 15:46, 5 March 2020 (UTC)
 * In the first example above InternetArchiveBot (instructed by ) removed a redundant link, which doubled with the doi, complete with url parameter; in the second example above InternetArchiveBot (instructed by ) inserted a redundant link, which doubles with the link from the chapter-url parameter. --Francis Schonken (talk) 16:01, 5 March 2020 (UTC)

The issue with GreenC bot was already reported at its talk so I have no idea why it's being reported on this page, it is clearly a bug to have 3 copies of a URL, it will be fixed but it has nothing to do with Citation bot. Also I don't see any problem with Citation bot's edit. -- Green  C  15:49, 5 March 2020 (UTC)
 * please discuss removals and insertions of "doubles" of external links in a citation template on the talk pages of the respective articles: the bot has no business there. --Francis Schonken (talk) 16:01, 5 March 2020 (UTC)


 * Citation bot's removals are well supported by wiki styles and template documentation. Not sure what GreenC is up to. AManWithNoPlan (talk) 17:04, 5 March 2020 (UTC)
 * "it is clearly a bug to have 3 copies of a URL, it will be fixed" -- Green  C  17:57, 5 March 2020 (UTC)

incorrectly added dates ? on Natalie Batalha
see https://en.m.wikipedia.org/wiki/Special:MobileDiff/943812210

i am sure about ref name="NASA-bio", however i am not sure about ref name="Kepler-bio".

if this is wrong, please excuse Leela52452 (talk) 01:59, 4 March 2020 (UTC)


 * Those are the dates on the pages. I am not sure what you mean by wrong. AManWithNoPlan (talk) 12:04, 4 March 2020 (UTC)

hello again,

https://web.archive.org/web/20150915203353/http://www.nasa.gov/web/20150915000257/http://www.nasa.gov/mission_pages/kepler/team/batalha.html contains march 12, 2012 and new version of site is slightly different from archived version and cite button is adding 2015 year. excuse for noise Leela52452 (talk) 13:34, 5 March 2020 (UTC)


 * Since getting dates from archive pages is impossible, perhaps we should not add dates when it is later than archive date. AManWithNoPlan (talk) 01:59, 6 March 2020 (UTC)


 * https://github.com/ms609/citation-bot/pull/2722 AManWithNoPlan (talk) 18:51, 8 March 2020 (UTC)

fixed
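
The rule proposed above (do not add a date that is later than the archive date) can be sketched in Python as below. It assumes the standard Wayback Machine URL layout, where a 14-digit YYYYMMDDhhmmss timestamp follows /web/; the function names are hypothetical and this is not the code from the linked pull request.

```python
import re
from datetime import date, datetime
from typing import Optional

# Wayback Machine snapshot URLs carry a 14-digit timestamp after /web/.
WAYBACK = re.compile(r"https?://web\.archive\.org/web/(\d{14})")


def archive_date(url: str) -> Optional[date]:
    """Return the snapshot date encoded in a Wayback Machine URL, if any."""
    match = WAYBACK.match(url)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S").date()


def should_add_date(candidate: date, archive_url: str) -> bool:
    """Reject a candidate publication date that falls after the snapshot date."""
    snapshot = archive_date(archive_url)
    return snapshot is None or candidate <= snapshot
```

With the archive URL quoted in this thread, the snapshot date is 15 September 2015, so a 2012 date would be accepted and anything scraped from a later version of the live page would be rejected.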

Publisher that isn't one
https://github.com/ms609/citation-bot/pull/2720 AManWithNoPlan (talk) 18:39, 8 March 2020 (UTC)

Cover the PLOS caps to all PLOS-related journals
https://github.com/ms609/citation-bot/pull/2721 AManWithNoPlan (talk) 18:44, 8 March 2020 (UTC)

adds |editor= params when template already has |veditors= param
https://github.com/ms609/citation-bot/pull/2719 AManWithNoPlan (talk) 18:35, 8 March 2020 (UTC)

go looking for bugs
https://en.wikipedia.org/wiki/User:AnomieBOT/Nobots_Hall_of_Shame/0 AManWithNoPlan (talk) 12:29, 24 December 2019 (UTC)

notabug left, just opinions mostly.

Mass DOI finder by CrossRef
Converting unstructured references is much more fun using https://doi.crossref.org/SimpleTextQuery ! I don't know about you, but I get tired of copy-and-pasting from articles to a search engine and back. For days I failed to get anything out of it, until I realised that I must paste my list of references into LibreOffice, click the "numbered list" button, and paste the numbered list into the tool. If you have no numbers, or if you add them manually like a human would do, it's not going to do anything.

Although there is no shortage of citation farms and messy citation sections, I wondered if there's a faster way to find the low hanging fruit. So I made a file with 25k lines from the latest English Wikipedia dump, which look like they might be titles of some work by some very simplistic grepping. If you copy up to 1000 lines into https://doi.crossref.org/SimpleTextQuery, you get a decent amount of DOIs and then you can go look for those titles in articles. I did the biggest chunks in the first 2k lines so far. Nemo 21:32, 2 December 2019 (UTC)
 * I pasted some examples at User:Nemo bis/Missing cite journal. Nemo 13:00, 3 December 2019 (UTC)

notabug wrong tool. Good luck. AManWithNoPlan (talk) 21:53, 8 March 2020 (UTC)

pp. and p. in page= or pages=
https://github.com/ms609/citation-bot/pull/2723 AManWithNoPlan (talk) 21:56, 8 March 2020 (UTC)

Special CS2 code
Does not take into account comments in parameters AManWithNoPlan (talk) 21:03, 12 March 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2727 AManWithNoPlan (talk) 21:41, 12 March 2020 (UTC)
 * fixed

Removes no-break-space from the middle of a multi-digit number in citation title
The character in question is U+2008 punctuation space. This character is a 'breakable' space; see the unicode properties. From General Punctuation, 'space equal to narrow punctuation of a font'. MOS:DIGITS notes that use of spaces for digit grouping may be problematic for screen readers. Perhaps this is a case where spaces used for digit grouping should be replaced not with other spaces but with commas which do not break.

—Trappist the monk (talk) 22:05, 14 March 2020 (UTC)
 * User-unreadable whitespace is deprecated in general across Wikipedia, see MOS:NBSP. If it's intentional, hardcode it via &nbsp; or nbsp, like everywhere else. Headbomb {t · c · p · b} 06:47, 15 March 2020 (UTC)
 * Do not use in cs1|2 parameters that are included in the citation's metadata.
 * —Trappist the monk (talk) 10:24, 15 March 2020 (UTC)
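
To make the character in question concrete, here is a small Python sketch that identifies U+2008 and applies the comma suggestion floated above to digit groups only. Whether commas are the right replacement is the open question of this thread; the function name is hypothetical.

```python
import re
import unicodedata

PUNCTUATION_SPACE = "\u2008"

# Match U+2008 only when it sits between two digits (digit grouping).
DIGIT_GROUP_SPACE = re.compile("(?<=\\d)" + PUNCTUATION_SPACE + "(?=\\d)")


def group_with_commas(title: str) -> str:
    """Replace U+2008 used as a digit-group separator with a comma."""
    return DIGIT_GROUP_SPACE.sub(",", title)


# The Unicode database confirms the name and the space-separator category.
assert unicodedata.name(PUNCTUATION_SPACE) == "PUNCTUATION SPACE"
assert unicodedata.category(PUNCTUATION_SPACE) == "Zs"
```

Occurrences of the character outside a number are left untouched, so a title that uses it for some other purpose is not rewritten.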

Garbage volumes
Same for issues/pages if that's found in there. &#32; Headbomb {t · c · p · b} 17:05, 4 March 2020 (UTC)

Replaces publication-place= with location=
The parameter "location" is ambiguous. In some Citation Style 1 templates, it only refers to publication place, but in others, such as "cite journal" (which is an alias of "cite news"), when both "location" and "publication-place" are present, "location" refers to the byline dateline, that is, the place the story was written. The bot should not replace a correct unambiguous parameter with a potentially incorrect parameter. Jc3s5h (talk) 17:28, 8 March 2020 (UTC) Fixed 8 March 2020 18:52 UTC
 * We already discussed this several times: User_talk:Citation_bot/Archive_15, User_talk:Citation_bot/Archive_19. This talk page is unlikely to be the correct forum to achieve such a change. Nemo 18:20, 8 March 2020 (UTC)

Remove via from cite arxiv, cite biorxiv, cite citeseerx, cite ssrn
The via serves no purpose in those, empty or filled, and should be removed as pointless clutter. &#32; Headbomb {t · c · p · b} 21:48, 9 March 2020 (UTC)
 * Obviously this is WP:COSMETICBOT stuff, so should be treated like an optional edit, to be suggested to users, but only made automatically when there's other things to do. &#32; Headbomb {t · c · p · b} 22:02, 9 March 2020 (UTC)

Other more general issues
I would like to suggest, for sake of manual follow-up in editing, that the actions of this and various other citation-fixing bots result in the presentation of the fields in the  markup so that they roughly follow the presentation of the citation's formatted content. That is, rather than appearing, after your work, as, etc., that the citation appears in markup as   with other fields inserted in similarly logical order. I would also strongly suggest introducing spaces, as shown in this example (see following). The odd and sometimes semi-random order in which the fields are presented, alongside the run-on nature of the content, make it very difficult to catch mistakes in fields, and to catch all empty fields, and so—for the significantly amplified work involved in trying to improve citation completeness—the work simply does not get done. Making the automated output easier to work with should at least be worth a beta test. Cheers, a prof and former logging editor. 2601:246:C700:19D:F47B:FAEC:3C25:6306 (talk) 05:24, 15 March 2020 (UTC)
 * After writing it zillions of times by hand, I prefer, so the pipe, the parm, and the value try to stay together when it (inevitably) has to line-wrap, and having to do with my programming sense, error likelihood, logical equivalence to the "vertical" format (thinking of the pipe as a prefix to the parm), aesthetics, etc.. As far as parm order, author first is pretty uncommon in existing usage, too, since people that create cites manually usually start with a URL and then read and enter the title, author, date, etc. I wouldn't object to that order coming out of automated tools, though. —[  Alan M 1  (talk) ]— 08:52, 15 March 2020 (UTC)
 * There is an effort to keep things in a reasonable order, but when adding to existing there is nothing reasonable usually. Also, reordering of existing parameters is something that has been talked about, but we will never do because it ticks way too many people off. AManWithNoPlan (talk) 11:01, 15 March 2020 (UTC)
 * Ensuring a space to the left of each "|" would be helpful in creating reasonable line breaks.
 * I can understand people getting upset with automated reordering of parameters; if there was any logical ordering in the article, there won't be when citation bot gets done with it. But if it comes up in this or another automated process, I wouldn't agree with "as far as parm order, author first is pretty uncommon in existing usage...." (AlanM1) The citations may be in an alphabetical list, or may be so rearranged later. In such cases, the authors should come first, in the same order as in the publication, then the date. If there are no authors, the title should come first. This facilitates manual alphabetical ordering when working with wikitext. If the process doesn't have access to the publication, the authors should be kept in the same order before the alteration. Jc3s5h (talk) 12:15, 15 March 2020 (UTC)
 * What you ask for will require a major discussion for bot approval and huge buy-in from the template crowd, etc. And you will never get it.  The bot makes these things better, but we cannot achieve perfection since no one agrees on what that is. AManWithNoPlan (talk) 13:33, 15 March 2020 (UTC)
 * It would probably be best handled as a script. (Although I still support a TNT checkbox for use on individual articles through the button, since that's functionally a script.) &#32; Headbomb {t · c · p · b} 17:08, 15 March 2020 (UTC)
 * I'm not entirely sure why, but WP:CITEVAR has generally been interpreted as asking for the formatting of citation templates themselves to be preserved, not just the visible results of the template. —David Eppstein (talk) 17:19, 15 March 2020 (UTC)

Mostly to prevent pointless edit wars and arguments about multiline vs single line presentations and between sane variants of parameter order like last/first or first/last in the edit window. It would be very hard for a bot to know that is ridiculous formatting, but that is entirely fine, just as would be. &#32; Headbomb {t · c · p · b} 17:29, 15 March 2020 (UTC)
 * By the way, my default parameter orderings (plural!) are: (1) authors first, everything else alphabetical, so that I can find them quickly without having to remember how the "logical" ordering of parameters works, or (2) whatever order I get them from the site I'm getting the citation from, so that I don't have to put effort into hand-ordering the parameters. —David Eppstein (talk) 17:33, 15 March 2020 (UTC)

Authors first + everything else alphabetical like is a pretty ridiculous ordering. Best practice is something that somewhat resembles presentation order and groups similar things together. Authors/Editors, dates, chapter/title/journal/series/publisher, volume/issue/pages, identifiers, urls. &#32; Headbomb {t · c · p · b} 17:54, 15 March 2020 (UTC)
 * It is a useful ordering, because that way I can use alphabetization to quickly spot the parameter I'm looking for. I would be annoyed if a bot started making cosmetic changes to reorder it. —David Eppstein (talk) 18:47, 15 March 2020 (UTC)

I'm going to mark this as a wontfix since there's just too many problems with this. &#32; Headbomb {t · c · p · b} 14:48, 16 March 2020 (UTC)
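
For reference only, since this was closed as wontfix: the grouping suggested above (authors/editors, dates, titles, volume/issue/pages, identifiers, URLs) could be expressed as a stable sort like the Python sketch below. The specific parameter names placed in each group are an assumption made for illustration, and nothing like this is part of the bot.

```python
from typing import List, Tuple

# Assumed assignment of parameter names to the groups listed in the discussion.
GROUPS = [
    ("last", "first", "author", "editor"),
    ("date", "year"),
    ("chapter", "title", "journal", "series", "publisher"),
    ("volume", "issue", "page", "pages"),
    ("doi", "pmid", "arxiv", "bibcode", "jstor", "hdl"),
    ("url", "archive-url", "archive-date", "access-date"),
]


def group_index(name: str) -> int:
    """Return the group position for a parameter; unknown names sort last."""
    base = name.rstrip("0123456789")  # treat last1, last2, ... like last
    for index, group in enumerate(GROUPS):
        if base in group:
            return index
    return len(GROUPS)


def reorder(params: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort parameters by group; sorted() is stable, so in-group order is kept."""
    return sorted(params, key=lambda pair: group_index(pair[0]))
```

Because the sort is stable, parameters within the same group keep whatever order the editor gave them, which is about as far as a tool could go without stepping on the preferences voiced in this thread.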

Wrong year
I seem to remember you arguing for updating dates with newer Crossref based dates a while ago. I will investigate what this AManWithNoPlan (talk) 11:07, 16 March 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2736. Crossref is not God.  AManWithNoPlan (talk) 11:26, 16 March 2020 (UTC)
 * I argued for that when upgrading from cite arxiv --> cite journal. &#32; Headbomb {t · c · p · b} 14:46, 16 March 2020 (UTC)

p. or page in |page= or |pages=
There are several variations to treat: page, Page, pages, Pages, p. Grimes2 (talk) 07:48, 16 March 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2737 AManWithNoPlan (talk) 11:36, 16 March 2020 (UTC)
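
A minimal Python sketch of the clean-up requested here, covering the variants listed above (page, Page, pages, Pages, p.) plus pp. from the earlier section. This is an assumption about one reasonable approach, not the code in the linked pull request; the abbreviated forms require the trailing dot so that page labels such as "P5" are not altered.

```python
import re

# Strip a leading "page", "pages", "p." or "pp." (any case) when a digit follows.
PAGE_PREFIX = re.compile(r"^\s*(?:pages?|pp?\.)\s*(?=\d)", re.IGNORECASE)


def strip_page_prefix(value: str) -> str:
    """Remove a redundant page prefix from a |page= or |pages= value."""
    return PAGE_PREFIX.sub("", value)
```

Values that do not start with one of these prefixes followed by a digit, such as "Preface" or "S12", pass through unchanged.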

Italic or bold in |publisher=
This would fix markup errors:

Italic ('') or bold (''') markup not allowed in: |<param>n=


 * |publisher=
 * |journal=
 * |magazine=
 * |newspaper=
 * |periodical=
 * |website=
 * |work=

Grimes2 (talk) 13:18, 16 March 2020 (UTC)
 * To do it right isn't as simple as just stripping the markup as you did. Don't do that.  In your example, Metropolitan Barcelona is a magazine.  So, what should happen is:
 * Barcelona Metropolitan (Barcelona's magazine in English)
 * should be changed to:
 * Barcelona Metropolitan
 * and the template changed from to
 * If this bot does anything with this category of errors, for those templates with improper italic markup, it should (and I believe that it does to some extent) maintain a dictionary of periodicals from which it can determine the correct template name and periodical parameter. In your example, publisher also contains editorial commentary which it should not.  We should not expect this, or any other, bot to know what to do with that kind of improper parameter content.  The bot can remove bold markup outright – though that markup is, when compared to italic markup, somewhat rare.
 * —Trappist the monk (talk) 13:42, 16 March 2020 (UTC)


 * Too complicated for a bot. It's better to do it manually. wontfix Grimes2 (talk) 15:14, 16 March 2020 (UTC)


 * we have a very short whitelist we use for this type of thing. AManWithNoPlan (talk) 19:06, 16 March 2020 (UTC)

Caps: I, U, Y
Too many of those to assume any default behaviour when not on a whitelist. 'I' and 'i' should be left alone. Headbomb {t · c · p · b} 14:17, 25 February 2020 (UTC)
 * Same for U and Y. Headbomb {t · c · p · b} 05:05, 6 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2741 AManWithNoPlan (talk) 12:17, 17 March 2020 (UTC)

Expand journals if title=none
Example of a failure please. AManWithNoPlan (talk) 18:32, 8 March 2020 (UTC)
 * Try on this one
 * vs
 * Headbomb {t · c · p · b} 17:20, 15 March 2020 (UTC)

Citation bot userbox
Hello, I've created a userbox for those who use Citation bot. Jerm (talk) 01:23, 9 March 2020 (UTC)

notabug flag to archive and copying to non talk page AManWithNoPlan (talk) 00:44, 17 March 2020 (UTC)

Another incorrect capitalization of stop words in a non-English journal title
https://github.com/ms609/citation-bot/pull/2741 AManWithNoPlan (talk) 12:17, 17 March 2020 (UTC)

Edit-warring
The bot seems currently engaged in a slow edit-war at the Christmas Oratorio page. Please stop that behaviour. The proper way is to take the issue up at the article's talk page. Tx. --Francis Schonken (talk) 10:41, 17 March 2020 (UTC)

notabug Cannot stop three different people AManWithNoPlan (talk) 11:45, 17 March 2020 (UTC)
 * How would a bot go and discuss its edits on a talk page btw? I think it is made pretty clear that this bot is user activated... --Redalert2fan (talk) 12:11, 17 March 2020 (UTC)

Adsabs issue
https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&slow=1&page=Siphonostomites

is throwing an AdsAbs issue that looks like it might be fixable:

> Checking AdsAbs database ! Error 400 in query_adsabs: org.apache.solr.search.SyntaxError: Query exceed maxAllowedDepth of 100 tokens for query redistribution: Message with key:Query exceed maxAllowedDepth of 100 tokens for query redistribution and locale: en_US not found. - URL was: https://api.adsabs.harvard.edu/v1/search/query?q=title:%22Excursion+guidebook+CBEP+2014-EPPC+2014-EAVP+2014-Taphos+2014+Conferences%3A+The+Bolca+Fossil-Lagerst%C3%A4tten%3A+A+window+into+the+Eocene+World%22&fl=arxiv_class,author,bibcode,doi,doctype,identifier,issue,page,pub,pubdate,title,volume,year

Martin  (Smith609 – Talk)  10:47, 17 March 2020 (UTC)


 * We cannot get around it since it is internal error. fixed This will make the message no longer red text and will make the text more accurate. https://github.com/ms609/citation-bot/pull/2740 AManWithNoPlan (talk) 12:07, 17 March 2020 (UTC)

Replacement of `publication` with `publicationdate`
https://github.com/ms609/citation-bot/pull/2739 AManWithNoPlan (talk) 11:54, 17 March 2020 (UTC)

Still multiple errors
For instance in Viola (plant) it changed a book url to a chapter-url, when no such url exists, the reference was to the book. Michael Goodyear ✐ ✉  23:55, 18 March 2020 (UTC)


 * Thank you. Added some more code fixed. AManWithNoPlan (talk) 13:06, 19 March 2020 (UTC)

chapter / title error and editor-list error
fixed editor problem. Added comments to the article itself to deal with usage of complete book DOI with chapter, etc. AManWithNoPlan (talk) 13:50, 19 March 2020 (UTC)

books.google.com/books?id= not clean enough

 * I would like these urls stripped down to id= and pg=. All else is unnecessary. —David Eppstein (talk) 04:53, 12 March 2020 (UTC)
 * It's tricky and easy to cause damage. They don't always have page numbers, and when they do there can be multiple page number arguments. I believe the last one takes priority? As for removing the quote argument dq=, it is unclear that should be done, as it allows highlighting of passages; there are also multiple types of quote arguments, e.g. ldq=, and it is unclear which takes priority: position in the URL or name of argument. There is no documentation for Google URLs, so everything is based on supposition, and in my experience when you think you understand it you then find exceptions where it works differently. It would be great if someone took this on to find all the permutations and rules and document them for the world. --  Green  C  15:41, 12 March 2020 (UTC)
 * One thing for sure that should not be done is to convert the quote parts of a google url into quote when the google url is converted to archive.org url; highlighted search string is not a quotation as quote is a quotation. ( – yeah, I know, not this bot ...)
 * —Trappist the monk (talk) 15:54, 12 March 2020 (UTC)
 * Unless someone can document this better, we simply will continue to not touch the post hash stuff. Although maybe everything AFTER the hash should be deleted?   AManWithNoPlan (talk) 22:24, 14 March 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2749 AManWithNoPlan (talk) 14:19, 21 March 2020 (UTC)
 * Compare:
 * https://books.google.com/books?id=4ZpVntUTZfkC&pg=PA39&dq=I+have+often+thought+that+i+am+the+most+clever+woman+that+ever+lived,+and+others+cannot+compare+with+me&cd=1#v=onepage&q=customs%20surplus%20merchants%20levy%20taxes&f=false
 * https://books.google.com/books?id=4ZpVntUTZfkC&pg=PA39&dq=I+have+often+thought+that+i+am+the+most+clever+woman+that+ever+lived,+and+others+cannot+compare+with+me&cd=1
 * The first case is how it exists on Wikipedia. The second case is how it would be if the fragment were removed. Another:
 * https://books.google.com/books?id=sLEMdjRhDgQC&pg=PA193&dq=little+pad+beach+boys&hl=en&sa=X&ei=LUntU8CDKa3lsASF54KwAQ&ved=0CDMQ6AEwAQ#v=onepage&q=little%20pad&f=false
 * https://books.google.com/books?id=sLEMdjRhDgQC&pg=PA193&dq=little+pad+beach+boys&hl=en&sa=X&ei=LUntU8CDKa3lsASF54KwAQ&ved=0CDMQ6AEwAQ
 * Different results. (User:AManWithNoPlan) --  Green  C  20:41, 21 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2750 AManWithNoPlan (talk) 20:52, 21 March 2020 (UTC)
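
To make the comparison above easier to reproduce, here is a minimal Python sketch (not the bot's code) that separates the query arguments from the arguments after the hash, and a second helper that performs the "strip down to id= and pg=" cleanup that was requested. The function names are hypothetical, and keeping the last value when an argument repeats is an assumption, since Google does not document these URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_params(url):
    """Return (query_pairs, fragment_pairs) as lists of (name, value)."""
    parts = urlsplit(url)
    return (
        parse_qsl(parts.query, keep_blank_values=True),
        parse_qsl(parts.fragment, keep_blank_values=True),
    )


def keep_id_and_pg(url):
    """Drop the fragment and every query argument except id= and pg=.

    Assumption: if an argument repeats, the last value is the one kept.
    """
    parts = urlsplit(url)
    kept = {}
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in ("id", "pg"):
            kept[name] = value
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
```

Running `split_params` on the second pair of URLs above shows why removing the fragment changes the result: the query carries dq=little+pad+beach+boys while the fragment carries its own q=little%20pad, so the two halves ask for different highlighting.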

Edits at Sociology of language
I'm not certain whether this is a bug or some interaction with human error. This [//en.wikipedia.org/w/index.php?title=Sociology_of_language&type=revision&diff=930247644&oldid=926471459 December 2019 edit] to Sociology of language repeated the book title as chapter. A human editor removed that parameter the next day. Citation bot changed then title= to chapter= in [//en.wikipedia.org/w/index.php?title=Sociology_of_language&diff=next&oldid=930451731 March 2020]. Cnilep (talk) 03:18, 21 March 2020 (UTC)
 * Seems related to a bad url/doi, which eventually got fixed? Headbomb {t · c · p · b} 04:32, 21 March 2020 (UTC)

notabug since bad DOI AManWithNoPlan (talk) 01:47, 22 March 2020 (UTC)

Adds new dates in non-ideal format
You are incorrect or at least have a significantly different interpretation of the interesting MOS rule. MOS:DATEUNIFY permits these also in citation publication dates per

Of which ISO 8601 is one of the included date formats. --Izno (talk) 21:03, 14 March 2020 (UTC)
 * You are the one that is incorrect. The added dates are not all in the same format as the rest of the article's citations. This is not allowed by the part of the MOS that you directly quoted. —David Eppstein (talk) 21:11, 14 March 2020 (UTC)
 * The issue you reported was not inconsistency, it was that the date format was simply wrong for use as a publication date. It is not. Which is the issue you have? --Izno (talk) 21:11, 14 March 2020 (UTC)


 * Since the standard templates for date style are not present, how does someone suggest we proceed? AManWithNoPlan (talk) 02:02, 15 March 2020 (UTC)
 * Better dates with a different style than no dates at all. AManWithNoPlan (talk) 02:04, 15 March 2020 (UTC)
 * You just keep convincing yourself that your bot is doing good instead of making work for others. —David Eppstein (talk) 04:55, 15 March 2020 (UTC)
 * The work was there to be done before since the date was missing. The bot facilitates the work by putting said missing date. Could the bot's logic be improved? Possibly. Maybe by seeing what other citations use for date format. But a date is better than no date. Headbomb {t · c · p · b} 06:50, 15 March 2020 (UTC)
 * Adding a date in YYYY-MM-DD format where there is none is clearly an improvement, while venturing a guess to what other formats to use sounds dangerous. Nemo 17:51, 15 March 2020 (UTC)
 * Most of the dates added are from web pages rather than dated publications. It is not at all obvious to me that they are helpful or improvements. (For most web pages, accessdates are more important than the date the web page claims to have been created or updated.) —David Eppstein (talk) 00:49, 17 March 2020 (UTC)
 * I completely agree. That’s why we don’t add dates that are after archive or access date.  People really need to add access dates. AManWithNoPlan (talk) 01:53, 17 March 2020 (UTC)

https://github.com/ms609/citation-bot/pull/2754 AManWithNoPlan (talk) 20:46, 22 March 2020 (UTC)
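
The rule mentioned above (do not add a date that is later than the archive or access date) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the bot's PHP implementation; the function name and signature are hypothetical.

```python
from datetime import date
from typing import Optional


def should_add_date(
    candidate: date,
    access_date: Optional[date] = None,
    archive_date: Optional[date] = None,
) -> bool:
    """Return False when the candidate date is later than a known limit.

    A page date newer than the access or archive date probably describes
    a later revision than the one that was actually cited.
    """
    for limit in (access_date, archive_date):
        if limit is not None and candidate > limit:
            return False
    return True
```

When neither limit is known the sketch allows the date, which matches the remark above that people really need to add access dates for the check to be useful.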

OAuth requests
No idea why. It's only a minor inconvenience. It might be related to the bot requesting both identity and edit permissions. We actually only need identity. AManWithNoPlan (talk) 23:08, 21 March 2020 (UTC)
 * It's really annoying for sure. Never seems to remember its permission for much more than 5-10 minutes. Super annoying when you're asking the bot to process 20-25 distinct pages and then they each fail, and then you have to reload each page, ask for the first one to be processed, wait for OAuth to ask for permission, and then request the other 19-24 pages to be processed. Headbomb {t · c · p · b} 23:12, 21 March 2020 (UTC)
 * I think I figured it out. Stay tuned. AManWithNoPlan (talk) 23:58, 21 March 2020 (UTC)
 * give it a shot. AManWithNoPlan (talk) 00:26, 22 March 2020 (UTC)
 * Will take some time to know for sure, but after a day or two, I'll likely know if things have improved. Headbomb {t · c · p · b} 00:51, 22 March 2020 (UTC)

WP:CITEVAR violation using citation bot
fixed - copy from my talk page

When using citation bot: please be more careful about not changing instances of {citation} to {cite book} (especially where the source is not a book) where the former is the established usage, as done at Puget Sound faults, and other places. (Haven't I mentioned this before?) Nor should the first author's first/last be concatenated with preceding line, as it makes it harder to scan the citation for accuracy. Your attention to this would be appreciated. &diams; J. Johnson (JJ) (talk) 20:46, 12 March 2020 (UTC)
 * I see the problem. It has a journal set which is invalid for citation, so it has to be changed to cite book.  BUT, the journal is set to a comment which is a strange edge case. AManWithNoPlan (talk) 21:00, 12 March 2020 (UTC)
 * https://github.com/ms609/citation-bot/pull/2727 AManWithNoPlan (talk) 21:41, 12 March 2020 (UTC)

Question...
Why is citationbot stripping notable authors of being wikilinked, instead adding new authorlink fields?

Did someone decide this was a good idea? Doesn't it lapse from the long honoured engineering principle of "Don't fix it if it ain't broke"? Geo Swan (talk) 01:47, 25 March 2020 (UTC)
 * It's not? Diff? Headbomb {t · c · p · b} 02:53, 25 March 2020 (UTC)
 * I did see one diff recently where it took two consecutive authors (author2 and author3) with linked names, moved the link in author3 from that parameter to an author3-link parameter just before where it was (good), and moved the link in author2 from that parameter to an author2-link parameter placed all the way at the end of the citation (bad). Unfortunately I don't remember which article it was and didn't save a bookmark. But that's cosmetic, not at all the same as stripping links. —David Eppstein (talk) 05:06, 25 March 2020 (UTC)
 * It is not just cosmetic, it fixes the COINS data. AManWithNoPlan (talk) 12:40, 25 March 2020 (UTC)
 * By "cosmetic", I meant the bad placement of the link parameter, not the choice to put the link in a different parameter than the name. —David Eppstein (talk) 06:43, 26 March 2020 (UTC)
 * Umm, not true. It is ok to wikilink authorn using either style of wikilink; both of these produce acceptable metadata:
 * author-linkn is intended for cs1|2 templates that use lastn and firstn so that the whole name may be rendered as a single wikilink. I have seen cases like this:
 * Lincoln Abraham
 * That kind of construct should probably be changed to:
 * Lincoln Abraham Abraham Lincoln
 * —Trappist the monk (talk) 13:17, 25 March 2020 (UTC)

Wikilinks in authors used to generate corrupt COINS data. Interesting that this has been fixed. AManWithNoPlan (talk) 21:26, 25 March 2020 (UTC)


 * Since this is no longer a COINS problem and is fixed. Will no longer do author links, but still do last links. https://github.com/ms609/citation-bot/pull/2755 AManWithNoPlan (talk) 00:28, 26 March 2020 (UTC)

Minor edits
Should the minor flag be removed? AManWithNoPlan (talk) 21:47, 25 March 2020 (UTC)


 * As soon as this is accepted, the minor flag will be removed from edits. Too many things being done now to qualify as minor.  https://github.com/ms609/citation-bot/pull/2756 AManWithNoPlan (talk) 22:06, 25 March 2020 (UTC)
 * The context is that I asked not to mark edits as minor when they may require review. The policy on WP:Minor edits is "A minor edit is one that the editor believes requires no review and could never be the subject of a dispute."
 * The edit in question changed a reference from cite web to cite document, which was incorrect. The source used originally was a web page, but latterly we have been able to access an online archive of the original magazine article, so it has ended up as cite magazine. The problem is that marking these edits as "minor" should be a guarantee that they require no review, yet this edit evidently required review, and could have been missed.
 * I appreciate that the scope of the bot has expanded over time, to the extent that some of its edits may benefit from review, but it is already flagged as a "bot" edit, which will allow editors who don't want to review bot edits to ignore them. Flagging these edits as "minor" as well is surely disadvantageous, as it is no longer possible to be sure that the changes produced need no scrutiny. --RexxS (talk) 22:13, 25 March 2020 (UTC)
 * There's really nothing that changes in the appearance from changing a cite web to a cite document, save for correctly displaying the volume/issue information, so I don't really see why that's something that should particularly require review. Compare
 * Headbomb {t · c · p · b} 22:48, 25 March 2020 (UTC)
 * The point is not whether a particular edit, such as the one that triggered the request was significant. It wasn't. This is what we should see:
 * The point is that the bot demonstrably makes mistakes and edits that may require review, so the minor flag is inappropriate (as well as unnecessary). --RexxS (talk) 23:08, 25 March 2020 (UTC)
 * Yeah, remove the minor flag. I guess I'm surprised this bot's actions were ever considered minor since they always made a non-negligible change to the pages it visited (even when all it was doing was expanding cite dois and others). --Izno (talk) 22:48, 25 March 2020 (UTC)

fixed this decade old oddity AManWithNoPlan (talk) 23:11, 25 March 2020 (UTC)


 * Now wait for the crowd soon coming to demand that the bot edits be marked minor. :) Nemo 06:15, 26 March 2020 (UTC)

Suggest modifying Zotero timeout

 * In normal circumstances I'd say that anything above 1000 ms is crazy slow. However I have no idea what's the median response time from our Zotero server. Do you know? Nemo 20:07, 3 March 2020 (UTC)
 * This is the total time from initiating the connection until data is received and the connection is closed. There is a separate timeout for just connecting.  The more urls on the page, the shorter the timeout. AManWithNoPlan (talk) 20:25, 3 March 2020 (UTC)

 if ($url_count < 5) {
   curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 15);
 } elseif ($url_count < 25) {
   curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 10);
 } else {
   curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 5);
 }
 * If we reduced that to, say, 3, 2 and 1 respectively, would we be able to tell from the logs or something whether the success rate (however defined) increases? Nemo 20:56, 3 March 2020 (UTC)
 * I had something similar for User:Bibcode Bot, but I had increasing timeouts (5/10/15 seconds) for the ADSABS database before failure. But this was a bot doing its own thing, without anyone waiting after it. For what's essentially a communal tool, I'd say 10 seconds total wait time for a single url should be more than enough. And if multiple distinct Zotero calls fail in succession, maybe skip Zotero for the next 5 minutes so we're not constantly querying a dead connection during a server hiccup or something. Headbomb {t · c · p · b} 22:23, 3 March 2020 (UTC)
 * We do skip after enough fails, but that is per run and not global. AManWithNoPlan (talk) 22:53, 3 March 2020 (UTC)
 * I don't see this warning right now. AManWithNoPlan (talk) 22:54, 3 March 2020 (UTC)

if (!$is_a_man_with_no_plan) $this->expand_templates_from_identifier('url',    $our_templates);
 * long-term it would be good to take advantage of the bulk API and submit all urls at once AManWithNoPlan (talk) 00:57, 4 March 2020 (UTC)
 * True for all APIs. Headbomb {t · c · p · b} 19:17, 16 March 2020 (UTC)
 * Already true for the slow ones that allow it (other than zotero). AManWithNoPlan (talk) 12:20, 17 March 2020 (UTC)
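
As a summary of the two mechanisms discussed in this thread, here is a minimal Python sketch. The timeout tiers mirror the PHP snippet quoted above; the per-run skip logic is only an illustration of "skip after enough fails", and the class name, the threshold of 5, and counting consecutive rather than total failures are all assumptions.

```python
def zotero_timeout_seconds(url_count: int) -> int:
    """Total transfer timeout: the more URLs on the page, the shorter it is."""
    if url_count < 5:
        return 15
    if url_count < 25:
        return 10
    return 5


class ZoteroRun:
    """Per-run failure counter (hypothetical): stop querying after
    too many failures in a row. Nothing is shared between runs."""

    def __init__(self, max_failures: int = 5):  # assumed threshold
        self.max_failures = max_failures
        self.failures = 0

    def should_query(self) -> bool:
        return self.failures < self.max_failures

    def record(self, success: bool) -> None:
        # A success resets the streak; a failure extends it.
        self.failures = 0 if success else self.failures + 1
```

Making the skip global, as suggested above, would need the counter stored somewhere shared between runs, which this sketch deliberately does not attempt.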

remove website and synonyms from cite arxiv
https://github.com/ms609/citation-bot/pull/2760 AManWithNoPlan (talk) 11:41, 26 March 2020 (UTC)

Flagging edits as "bot"
It appears that the wikipedia API is ignoring the bot=1 flag we are passing it. Someone with wikipedia superpowers needs to flag this account as a bot, so that this flag is accepted. I know this flag is not required for bots, but it would be nice. AManWithNoPlan (talk) 11:36, 26 March 2020 (UTC)
 * any insights here? Headbomb {t · c · p · b} 11:58, 26 March 2020 (UTC)
 * The edits are being correctly flagged as bot. Remember that only the recentchanges table stores this information, so you can see it |flags|user&rclimit=5 from the recentchanges API or your own watchlist.
 * Example which show the flag is correctly registered:
 * Nemo 12:52, 26 March 2020 (UTC)
 * My watchlist was not showing the "b" flag. I don't know why, but now it is.  That was really weird.  notabug AManWithNoPlan (talk) 14:26, 26 March 2020 (UTC)
 * In hindsight, I should have noticed that NO bots were flagged as "b". AManWithNoPlan (talk) 14:41, 26 March 2020 (UTC)

Removes valid partial title link
Partial wikilinks should not be used (according to the styles), and are 99% of the time invalid (ie. they link to IBM in the title instead of the actual thing, for example) AManWithNoPlan (talk) 12:41, 19 March 2020 (UTC)

Incorrect change from "url=" to "chapter-url="

 * Manual bypass seems the solution here. Headbomb {t · c · p · b} 15:27, 22 December 2019 (UTC)
 * Not making bot edits beyond the capacity of the bot to understand the actual meaning of the content at the link seems to be the answer to me. If we're going to have two different url parameters with different meanings and one of them is chosen as the correct one by a human editor, why should the bot be second-guessing that? —David Eppstein (talk) 18:52, 22 December 2019 (UTC)
 * Because in 99%+ of cases, humans are wrong and use url instead of chapter url. Headbomb {t · c · p · b} 20:45, 22 December 2019 (UTC)
 * This is directly counter to the philosophy according to which, several years ago, the url parameter was changed from being a catch-all parameter that would by default bind to the tightest title in the template, and instead became split into several parameters that each had a specific meaning. If I want to use a parameter with its correct meaning, and the bot refuses to let me, that seems like the very definition of a bug to me. —David Eppstein (talk) 01:11, 31 December 2019 (UTC)
 * Just encountered this again at Modern Jazz Quartet. The bot should have some code that helps it figure out that this, added by InternetArchiveBot, is most definitely not a chapter URL. Graham 87 04:24, 12 March 2020 (UTC)
 * Here's another one. If Citation bot is too stupid to recognize that an archive.org url like this, without any extra page-number complications, is going to be a link to the whole book, it is too stupid to be making these changes at all. url= without chapter-url= is a perfectly valid combination of parameters and should not need special bot-exclusion code to prevent it from being broken by marauding bots. —David Eppstein (talk) 06:36, 20 March 2020 (UTC)
 * I agree. Book URLs are not rare, and blindly changing url to chapter-url is introducing a significant number of errors.  It needs to be stopped.  Kanguole 10:17, 26 March 2020 (UTC)


 * this should help a lot https://github.com/ms609/citation-bot/pull/2765 AManWithNoPlan (talk) 17:07, 27 March 2020 (UTC)
 * Checking for google.com and archive.org will reduce the number of errors, but the bot will still be making many erroneous edits. This is not an edit that can be safely automated.  Kanguole 22:29, 27 March 2020 (UTC)


 * please point me to examples where there is a problem now. AManWithNoPlan (talk) 22:55, 27 March 2020 (UTC)
 * Sorry, I misread it. The new approach, only moving for the few websites where you know the format of URLs for parts of books, is what I wanted.  Kanguole 23:36, 27 March 2020 (UTC)
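
The approach agreed on above (only move url to chapter-url for the few sites whose URL format for parts of books is known) might look roughly like this. It is a Python sketch under loud assumptions, not the code in the linked pull request: the helper names are hypothetical, and treating a Google Books pg= argument or an archive.org /page/ path segment as "points inside the book" is an assumed rule chosen for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def points_inside_book(url: str) -> bool:
    """True only for known hosts whose URL clearly targets part of a book."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "books.google.com":
        # Assumption: a pg= argument means a specific page was linked.
        return "pg" in parse_qs(parts.query)
    if host in ("archive.org", "www.archive.org"):
        # Assumption: page-level links contain a /page/ path segment.
        return "/page/" in parts.path
    # Unknown site: do not second-guess the human editor.
    return False


def url_parameter_for(url: str) -> str:
    """Pick the parameter name; default to leaving the link in url=."""
    return "chapter-url" if points_inside_book(url) else "url"
```

The important design choice is the default: anything the sketch does not recognise stays in url=, so a plain whole-book link is never moved.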

Discussion at Village Pump
Of possible interest: Village_pump_(technical) -- Green  C  14:18, 18 March 2020 (UTC)

fixed - flag for archive. AManWithNoPlan (talk) 21:12, 28 March 2020 (UTC)

Clean up todo's and fix code coverage
Aggressive fixing of bugs has left the code with some technical debt. Need to fix. AManWithNoPlan (talk) 11:41, 26 March 2020 (UTC)

fixed for now. Will look again in the future. AManWithNoPlan (talk) 21:11, 28 March 2020 (UTC)

ANI notice
There is currently a discussion at Administrators' noticeboard/Incidents regarding an issue with which you may have been involved. The thread is AManWithNoPlan and Citation bot. HJ Mitchell | Penny for your thoughts? 10:08, 28 March 2020 (UTC)

fixed

URL added for journal title
I know this is some kind of GIGO, but a sanity check to not add URLs to the journal field could be applied. Jonatan Svensson Glad (talk) 16:36, 28 March 2020 (UTC)
 * I am once again disappointed in zotero's error checking. We do a lot of data sanitization.  https://github.com/ms609/citation-bot/pull/2767 AManWithNoPlan (talk) 19:49, 28 March 2020 (UTC)
 * This one isn't even a journal! cite encyclopedia would have been a better choice. —David Eppstein (talk) 20:10, 28 March 2020 (UTC)
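
A sanity check of the kind requested here can be very small. The following Python sketch is an illustration only, not the sanitization added in the linked pull request; the function names and the pattern are hypothetical.

```python
import re
from typing import Optional

# Hypothetical test: value starts with a scheme or with "www."
_URL_LIKE = re.compile(r"^\s*(?:https?://|www\.)", re.IGNORECASE)


def looks_like_url(value: str) -> bool:
    return bool(_URL_LIKE.match(value))


def safe_journal(value: str) -> Optional[str]:
    """Return a cleaned journal name, or None if the value is really a URL."""
    if looks_like_url(value):
        return None
    return value.strip()
```

A caller would simply skip filling the journal field when `safe_journal` returns `None`.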