User talk:Citation bot/Archive 18

Dropping parameter "access-date" from other templates without URLs like cite episode, cite AV media notes & cite ODNB
wontfix since often the url is in the website parameter or someplace else and we won’t remove access dates unless template is something we work with in general. AManWithNoPlan (talk) 11:52, 11 August 2019 (UTC)

Web site changed to book
archive.org has revamped the website. Will look more. AManWithNoPlan (talk) 18:34, 9 August 2019 (UTC)


 * URL structure the same. The media type can be determined with an API call. --  Green  C  20:45, 9 August 2019 (UTC)


 * what do you mean by api call? AManWithNoPlan (talk) 21:02, 9 August 2019 (UTC)


 * "Advanced Search returning JSON, XML, and more", enter the ID in the search field (BritishNuclearTestOperationHurricaneDeclassifiedReportsToWinston), choose "mediatype" in the fields to return box, choose a format (JSON etc): return. This work is unusual because it is a multi-file so it gives a media type for each one (all the same: "texts") but there may be cases where it is mixed (texts and audio). A more typical book eg. raven01poegoog has a single mediatype on return. -- Green  C  22:31, 9 August 2019 (UTC)
 * we might eventually do that. Depending upon free time and the need level. AManWithNoPlan (talk) 23:14, 9 August 2019 (UTC)

The media type is pretty generic: https://archive.org/advancedsearch.php?fl[]=mediatype&output=xml&rows=5000&page=1&q=random AManWithNoPlan (talk) 02:10, 10 August 2019 (UTC)


 * Hmm it might not work to distinguish between books and other printed non-book texts. -- Green  C  05:15, 10 August 2019 (UTC)


 * Closing as fixed as best we can, since the types archive.org uses (audio, web, account, movies, collection, texts, image, software) are pretty useless. AManWithNoPlan (talk) 18:10, 11 August 2019 (UTC)
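
The lookup discussed above can be sketched roughly as follows. This is only an illustration, not the bot's code: it assumes the advancedsearch.php endpoint quoted in the thread accepts an `identifier:` query and returns JSON shaped like `{"response": {"docs": [...]}}`. The helper names and the sample payload are hypothetical.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in the thread.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def build_mediatype_query(identifier: str, rows: int = 50) -> str:
    """Build a search URL that asks only for the mediatype of one item."""
    params = [
        ("q", f"identifier:{identifier}"),  # assumed query syntax
        ("fl[]", "mediatype"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


def extract_mediatypes(payload: str) -> set:
    """Collect the distinct mediatype values from a JSON response body."""
    docs = json.loads(payload).get("response", {}).get("docs", [])
    return {doc["mediatype"] for doc in docs if "mediatype" in doc}


# Hand-written sample in the assumed response shape (not real API output).
sample = '{"response": {"docs": [{"mediatype": "texts"}, {"mediatype": "texts"}]}}'
```

A mixed item would yield more than one value from `extract_mediatypes`, which is the ambiguous case raised in the thread. A single value of "texts" still would not separate books from other printed material.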

JSTOR books
expands to when it should expand to

You can use the fact that the JSTOR start with  to know it's a book. Headbomb {t · c · p · b} 17:12, 12 August 2019 (UTC)


 * The j. is irrelevant. We just need to parse the RIS data better. Which I am working on.  https://github.com/ms609/citation-bot/pull/2079 AManWithNoPlan (talk) 23:45, 12 August 2019 (UTC)

fixed

Please fix titles with volumes, issue, etc in them
This is probably tricky to implement, but if a pattern can be generalized, e.g. (untested pseudocode, lacking some punctuation)

that could be worth it. A more limited scope could also be easier to implement. Headbomb {t · c · p · b} 04:13, 23 June 2019 (UTC)


 * Some care necessary; V. might show up in regard to law cases and any variation of pages in the context of book reviews. --Izno (talk) 13:45, 23 June 2019 (UTC)
 * Well, it's not just V. alone, but rather  + V.# + nothing that isn't issue/pages/"Special Issue...". So unless you have something like Journal of Physics v. 1993 Special Issue Ford Mustang, that shouldn't happen. Headbomb {t · c · p · b} 15:48, 23 June 2019 (UTC)
 * there is always which includes that in the title AManWithNoPlan (talk) 17:36, 8 August 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2075 AManWithNoPlan (talk) 18:28, 11 August 2019 (UTC)
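
As a rough illustration of the kind of pattern discussed in this thread (not the code in the pull request above), the sketch below strips a trailing volume/issue marker from a journal name only when nothing else follows it. The regular expression and the function name are hypothetical.

```python
import re

# Hypothetical pattern: a trailing "V. 12", "Vol 12" or "Volume 12",
# optionally followed by "No. 3" / "Issue 3", and then nothing else.
TRAILING_VOLUME = re.compile(
    r"""[\s,;:]+                      # separator before the volume marker
        (?:v|vol|volume)\.?\s*(?P<volume>\d+)
        (?:[\s,;:]+(?:no|iss|issue)\.?\s*(?P<issue>\d+))?
        \s*$""",
    re.IGNORECASE | re.VERBOSE,
)


def split_volume_from_title(title: str):
    """Return (clean_title, volume, issue); volume and issue are None if absent."""
    match = TRAILING_VOLUME.search(title)
    if not match:
        return title, None, None
    clean = title[: match.start()].rstrip()
    return clean, match.group("volume"), match.group("issue")
```

Because the marker must be followed by digits and then the end of the string, a law case such as "Roe v. Wade" is left untouched, and so is the "Journal of Physics v. 1993 Special Issue Ford Mustang" example above.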

Remove via= if it's the same as journal= or publisher=
With the usual 'The Foobar' = 'Foobar' and other similar variations. Headbomb {t · c · p · b} 17:15, 12 August 2019 (UTC)
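
A minimal sketch of the comparison being requested, assuming a citation is available as a plain dictionary of parameter names to values. The normalization rules (case, leading "The", punctuation) and both function names are illustrative guesses, not the bot's behaviour.

```python
import re


def normalize_name(name: str) -> str:
    """Lower-case, drop a leading 'The', and strip punctuation and extra spaces."""
    text = name.strip().lower()
    text = re.sub(r"^the\s+", "", text)
    text = re.sub(r"[^\w\s]", "", text)  # strip punctuation
    return re.sub(r"\s+", " ", text).strip()


def via_is_redundant(citation: dict) -> bool:
    """True when |via= repeats |journal= or |publisher= after normalization."""
    target = normalize_name(citation.get("via", ""))
    if not target:
        return False  # nothing meaningful in |via=, so nothing to remove
    return any(
        normalize_name(citation.get(key, "")) == target
        for key in ("journal", "publisher")
    )
```

For example, `via_is_redundant({"via": "The Foobar", "journal": "Foobar"})` is true, while a `via` of "JSTOR" next to the same journal is not.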

Bot down
The bot is just down in general. Gadget doesn't work. API doesn't work. Nothing works. Headbomb {t · c · p · b} 23:13, 13 August 2019 (UTC)
 * Any ideas? I have a password-protected restart.php that I ran, and the bot went away for a while, but when it came back it was still dead. AManWithNoPlan (talk) 00:28, 14 August 2019 (UTC)
 * Looks like it won't execute any PHP files. Not sure why. Kaldari (talk) 08:01, 14 August 2019 (UTC)
 * I rolled back the most recent change and it seems to work again. (The head is now at 3f21e6a.) Kaldari (talk) 08:29, 14 August 2019 (UTC)
 * Going forward, can we test new changes at citations-dev first? Kaldari (talk) 08:49, 14 August 2019 (UTC)
 * or some one else would need to get that up to date. Also, the branch that works seems to be newer than the bot dying.  Lastly the Bot was alive enough that html files, gitpull, and restart all worked.  I suspect something wrong with Authenticator. AManWithNoPlan (talk) 11:06, 14 August 2019 (UTC)
 * can we get back on the git master branch for now. AManWithNoPlan (talk) 18:05, 14 August 2019 (UTC)
 * The master branch doesn't work. If I set the repo to the current master branch or anything after 3f21e6a, PHP pages won't execute. Kaldari (talk) 10:09, 15 August 2019 (UTC)
 * I imagine it's an issue with the webservice configuration, i.e. lighttpd.conf, but I'm not sure. Kaldari (talk) 10:12, 15 August 2019 (UTC)
 * could it be a missing chmod +x ? AManWithNoPlan (talk) 11:19, 15 August 2019 (UTC)
 * try master now. AManWithNoPlan (talk) 18:56, 15 August 2019 (UTC)
 * No... not master! Headbomb {t · c · p · b} 20:12, 15 August 2019 (UTC)
 * Seems to be fixed now! Kaldari (talk) 14:23, 16 August 2019 (UTC)

fixed ALL php files need to start with magic php keyword, even unused ones. AManWithNoPlan (talk) 17:27, 16 August 2019 (UTC)
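
A check of this kind could be scripted along the following lines. The sketch assumes the "magic php keyword" means the `<?php` opening tag; the function names are hypothetical and nothing here is taken from the bot's repository.

```python
from pathlib import Path

PHP_OPEN_TAG = "<?php"  # assumed meaning of the "magic php keyword"


def starts_with_php_tag(source: str) -> bool:
    """True if the file content begins with the PHP opening tag."""
    return source.startswith(PHP_OPEN_TAG)


def find_offending_files(root: str) -> list:
    """List .php files under root whose content does not begin with the tag."""
    offenders = []
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not starts_with_php_tag(text):
            offenders.append(str(path))
    return offenders
```

Running `find_offending_files(".")` from a checkout would list every PHP file, used or not, that begins with anything other than the tag, including a stray blank line or byte-order mark.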

JSTOR book chapters
https://github.com/ms609/citation-bot/pull/2086 RIS is not a well-standardized format. AManWithNoPlan (talk) 18:05, 14 August 2019 (UTC)

More garbage volume/issue cleanup
https://github.com/ms609/citation-bot/pull/2091 AManWithNoPlan (talk) 17:46, 16 August 2019 (UTC)

If DOI = JSTOR, and DOI = Inactive, remove DOI / DOI-BROKEN-DATE, remove URL/CHAPTER-URL
https://github.com/ms609/citation-bot/pull/2089

Caps: La Trobe
https://github.com/ms609/citation-bot/pull/2088

CAPS: Nyt Tidsskrift
https://github.com/ms609/citation-bot/pull/2088

Carnegie Institute Washington D.c. Publication
As a side note, it should just capitalize "D.c." across the board. Headbomb {t · c · p · b} 14:48, 16 August 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2090 AManWithNoPlan (talk) 17:45, 16 August 2019 (UTC)

If you have a broken DOI, try it as a JSTOR
could have waited until this one was fixed before running on Category:Pages with DOIs inactive as of 2019...

Capitalize after spaced bracket
https://github.com/ms609/citation-bot/pull/2105 AManWithNoPlan (talk) 19:52, 23 August 2019 (UTC)

Caps: eCrypt
https://github.com/ms609/citation-bot/pull/2105 AManWithNoPlan (talk) 19:52, 23 August 2019 (UTC)

Do not add titles with | (or its HTML representation)
It's often part of the title. AManWithNoPlan (talk) 18:53, 9 August 2019 (UTC)
 * I've never seen a single place where this was part of a title (and worth keeping; I always strip this). Jonatan Svensson Glad (talk) 18:55, 9 August 2019 (UTC)
 * Titles with pipes are better than no titles. Headbomb {t · c · p · b} 19:37, 9 August 2019 (UTC)
 * If done properly, yes. But I'd rather the bot not make any changes, or only "correct" changes. Not 'better than nothing' changes, since a human still needs to clean it up. Jonatan Svensson Glad (talk) 21:15, 9 August 2019 (UTC)
 * Pipes should convert to | . A literal pipe is a reserved character in CS1|2 and shouldn't exist at all. HTML pipes could ideally be converted to | . -- Green  C  20:31, 9 August 2019 (UTC)


 * I would prefer not to add “fixing this” to the bots tasks. AManWithNoPlan (talk) 21:33, 9 August 2019 (UTC)


 * The website cannot seem to make up its mind about what the better title is. AManWithNoPlan (talk) 21:50, 9 August 2019 (UTC)

One Direction Tour Tickets Sell Out In Minutes | MTV UK  {"@context":"http:\/\/schema.org","@type":"NewsArticle","headline":"One Direction Tour Tickets Sell Out In Minutes","url":"http:\/\/www.mtv.co.uk\/one-direction\/news\/one-direction-tour-tickets-sell-out-in-minutes","keywords":["one direction"],"dateCreated":"2013-05-25T12:32:44+01:00","articleSection":"One Direction"} 


 * reFill and Citoid give the same title we do. AManWithNoPlan (talk) 21:52, 9 August 2019 (UTC)


 * there is no reliable way to determine if the after the pipe stuff is part of the title or not. It is actually more of a philosophical question than a factual question.  AManWithNoPlan (talk) 18:04, 11 August 2019 (UTC)
 * I would return  or the HTML string   rather than avoiding fixing this, whether it's part of a title or elsewhere (except in URLs). --Izno (talk) 18:10, 11 August 2019 (UTC)
 * Just for the records, Citoid also blindly adds titles with pipes, resulting in stray text without a parameter. At least escaping the pipes as Izno says should be uncontroversial. I have no opinions on stripping them (I often do so manually as Josve says). Nemo 09:18, 12 August 2019 (UTC)
 * one problem we run into is websites that pipe parts the opposite direction host|section|title AManWithNoPlan (talk) 10:53, 12 August 2019 (UTC)
 * Converting to  is all that should happen here. Headbomb {t · c · p · b} 11:44, 12 August 2019 (UTC)
 * is there any reason to prefer the pseudo template over html? AManWithNoPlan (talk) 11:18, 20 August 2019 (UTC)
 * It's more wikitextish, but beyond that, no. --Izno (talk) 12:22, 20 August 2019 (UTC)
 * It's more recognizable in edit window. It's a rather cosmetic and not really a critical issue though. Headbomb {t · c · p · b} 17:00, 20 August 2019 (UTC)
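
The escaping discussed in this thread can be sketched as below. It assumes the two candidate replacements are the wikitext magic word `{{!}}` (the "pseudo template") and the HTML entity `&#124;`; the function name and the `use_template` switch are hypothetical and do not describe what the bot ended up doing.

```python
PIPE_TEMPLATE = "{{!}}"   # wikitext magic word that renders a literal pipe
PIPE_ENTITY = "&#124;"    # HTML numeric entity for the same character


def escape_title_pipes(title: str, use_template: bool = True) -> str:
    """Replace literal or HTML-encoded pipes so a title is safe inside a template."""
    replacement = PIPE_TEMPLATE if use_template else PIPE_ENTITY
    # Normalize the encoded forms to a bare pipe first, then escape once.
    for encoded in ("&#124;", "&#x7c;", "&#x7C;"):
        title = title.replace(encoded, "|")
    return title.replace("|", replacement)
```

This only escapes the character. Whether the text after the pipe belongs in the title at all is the separate question raised above, and the sketch makes no attempt to answer it.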

more details needed on "! CrossRef title did not match existing title"
Hello Martin. Thanks for this amazing power tool, which makes life so much easier!

I wonder if you might add details to the debug messages for "! CrossRef title did not match existing title", which cropped up thrice today on Mefloquine. I suspect that the existing title was an all-caps version of the correctly capitalised title, however, it is not immediately apparent to the casual user. It would be nice if you added a printout of the prior title and the new title, so that users might compare the two on the debug-screen page and, if necessary, act to remedy the inconsistencies.

I doubt that this link will show you what I mean because it is likely to vaporise, but here goes: Mefloquine title did not match

Have I explained myself well? Please contact me if not. And thanks again!

Magnoffiq (talk) 15:46, 23 August 2019 (UTC)


 * We cannot do that without confusing people more. The number of incoming data pieces and existing data pieces is quite large and we would have to print them all.  You can click on the DOI link and see what CrossRef shows and search for it on the existing page and see what the page already has. AManWithNoPlan (talk) 18:54, 23 August 2019 (UTC)


 * gamma vs the greek character. Extra ": a review" at the end.  AManWithNoPlan (talk) 19:12, 23 August 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2104 AManWithNoPlan (talk) 19:53, 23 August 2019 (UTC)
 * fixed with a few more pulls. AManWithNoPlan (talk) 22:10, 24 August 2019 (UTC)

Books and their reviews
No idea how widespread this is... only noticed as it is on a page I watch.  Catfish  Jim  and the soapdish  08:39, 22 August 2019 (UTC)
 * Thank you. This is a problem in less than 0.5% of cases according to studies of Unpaywall data, but it can happen. The best solution is to report any bug to Unpaywall, the worst solution is to try second-guessing ourselves. Nemo 10:10, 22 August 2019 (UTC)
 * One can reduce the likelihood of this by using cite book instead of citation and by adding things that are “bookish” like isbn, olcn, etc since we use all those to guess whether the reference is a book or a review when querying AdsAbs. Lastly, add a comment in the bibcode area to stop it.  This bug is very rare since I added about a dozen lines of code to guess if it's a book or a review. AManWithNoPlan (talk) 11:10, 22 August 2019 (UTC)

location vs publication-place
Dear Programmer. Nice tool. I tried your bot on the page Antoine Hamilton. It verified the doi very well. However, it also replaced all the '|publication-place=' parameters in the citation template with '|location=' parameters. My understanding was that location is now old and should be replaced by publication-place. Finally it said it could not find the isbn 9780198613741, which is however the 13-digit version of the isbn 0-19-861374-1 marked in the book. Johannes Schade (talk) 14:51, 10 August 2019 (UTC)
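
For reference, the standard ISBN-10 to ISBN-13 conversion prefixes 978 and recomputes the check digit instead of keeping the old one. The sketch below is a generic illustration with hypothetical function names, not the bot's lookup code.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13: prefix 978 and recompute the check digit."""
    chars = [c for c in isbn10 if c.isdigit() or c in "xX"]
    if len(chars) != 10:
        raise ValueError("expected 10 ISBN characters")
    core = "978" + "".join(chars[:9])  # the old ISBN-10 check digit is dropped
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def is_valid_isbn13(isbn13: str) -> bool:
    """Check that the weighted sum (weights 1, 3, 1, 3, ...) is a multiple of 10."""
    digits = [int(c) for c in isbn13 if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0
```

By this rule `isbn10_to_isbn13("0-19-861374-1")` gives 9780198613749, and 9780198613741 fails `is_valid_isbn13`, which might be why the lookup reported above found nothing.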

Better series handling
Please explain. AManWithNoPlan (talk) 13:54, 26 August 2019 (UTC)
 * What's to explain? Methods of Molecular Biology is a book series, and it should be handled as such? Convert to cite book, use series over journal and remove duplications, and use chapter+title? Headbomb {t · c · p · b} 14:08, 26 August 2019 (UTC)

Uses cite book for web site about a journal
That is a very interesting problem. Are we referencing the book-like object itself, of which the website is a copy, or are we referencing the website itself? AManWithNoPlan (talk) 19:55, 23 August 2019 (UTC)
 * What book-like object? What copy? That's a link to the publisher's web site about a journal that they publish. It's a web site. Not a book. Not even a journal. Just a web site. —David Eppstein (talk) 07:13, 25 August 2019 (UTC)
 * my point is that a journal series is a book like object. We convert amazon links and google books links to books. If you are referencing the journal then one can debate if book/journal/web is best.  Unfortunately that website in its meta data presents itself more like a book than a website.  We query citoid, so we can’t really fix that because it happens outside of our code. AManWithNoPlan (talk) 11:11, 25 August 2019 (UTC)
 * I can probably black list that domain. AManWithNoPlan (talk) 11:21, 25 August 2019 (UTC)
 * It's not a reference to a journal. It's a reference to a web site about a journal. It's used to source some information about the journal, not used to source information published in the journal. You are misinterpreting the metadata about the journal as being metadata about the web site about the journal. And much as preventing the creation of links to Elsevier might amuse me, I think it would be a bad idea. Or does "blacklist" merely mean to prevent the bot from touching that link? —David Eppstein (talk) 01:52, 26 August 2019 (UTC)
 * black listed in that any url with ‘journal’ in the hostname will not be changed from web to book. AManWithNoPlan (talk) 02:23, 26 August 2019 (UTC)
 * Journal in the hostname is a bad blacklisting, too many journal articles will use it. Blacklisting  however would be fine. Headbomb {t · c · p · b} 03:50, 26 August 2019 (UTC)
 * Indeed, for instance journals.cambridge.org and tons of university-run journals, some of which often act as books. Nemo 05:15, 26 August 2019 (UTC)
 * It’s only the Zotero-based changing. Chapters/isbn/etc will still trigger it.  AManWithNoPlan (talk) 13:54, 26 August 2019 (UTC)

Capitalization of journal titles
I will add some Lithuanian words to the list of foreign words. By the way whether what the bot did is wrong or right depends upon the style. Many styles specify capitalization independently of the native language. AManWithNoPlan (talk) 18:47, 26 August 2019 (UTC)
 * Would it be possible to ignore title capitalization if the language parameter says Lithuanian? That would seem to be a more efficient solution. Renata (talk) 19:11, 26 August 2019 (UTC)
 * not really. It’s rarely set and often journal titles are English even when the articles are not. AManWithNoPlan (talk) 22:54, 26 August 2019 (UTC)

"Removed URL that duplicated unique identifier"
I'm getting these all the time now and I think they arguably make the citation sections worse. There's no way that [edit: general readers] know to click on the linked "doi" when the citation's title itself is unlinked. I'll note that the cite journal documentation examples keep the url parameter even when a doi is provided.

Where is the consensus to make this edit en masse? czar 13:26, 20 July 2019 (UTC)
 * I'm going out now but I'll leave a quick answer to one of your points: a lot of people do, in fact, know to click the DOI. We know for sure from CrossRef data: https://www.crossref.org/blog/https-and-wikipedia/ https://www.crossref.org/blog/real-time-stream-of-dois-being-cited-in-wikipedia/ Nemo 13:36, 20 July 2019 (UTC)


 * unless the url is free to download without logging in, you should not add them unless there are no other links out. AManWithNoPlan (talk) 13:39, 20 July 2019 (UTC)


 * there is even movement afoot to remove the automatic linking of titles when a PMC is present. AManWithNoPlan (talk) 13:47, 20 July 2019 (UTC)
 * My question was where this consensus has been established, or if this is just a practice localized to editors using this bot/tool. czar  15:17, 20 July 2019 (UTC)
 * I don’t have time to look it up, but the links are in the talk archives somewhere—hopefully someone not in an auto parts store can respond better. AManWithNoPlan (talk) 15:31, 20 July 2019 (UTC)
 * The general idea is that these links are redundant with the DOI/other identifiers, which are clear about where they take you (doi: version of record, jstor = jstor repository, etc... If you don't know what those are, we have the wikilinks). url is then freed up to be used for freely-available full text versions-of-record of the paper hosted on an author's website, or similar. If the DOI version is free, you can use free to mark it as free, etc. Headbomb {t · c · p · b} 17:29, 20 July 2019 (UTC)

Please see the usage page for why notabug AManWithNoPlan (talk) 15:50, 21 July 2019 (UTC)
 * Sorry, where on the usage page is the consensus/discussion to remove url parameters when a doi is provided? czar  23:21, 21 July 2019 (UTC)
 * I thought someone added it. Weird.  AManWithNoPlan (talk) 23:51, 21 July 2019 (UTC)
 * It's long standing practice to do this, for the reasons outlined above. Many bots have been approved for this sort of cleanup too, e.g. User:CitationCleanerBot. If you want the title always linked, go to Help talk:CS1 and request that url is automatically set to https://doi.org/10.1234/1234567890 whenever a DOI is present. Likewise for other identifiers of record. Headbomb {t · c · p · b} 00:10, 22 July 2019 (UTC)
 * So is the answer to my question that there is no documented discussion of consensus? czar  12:44, 27 July 2019 (UTC)
 * the answer is that people are too busy to dig it up. AManWithNoPlan (talk) 14:35, 27 July 2019 (UTC)
 * Another answer might just be that there has been no 'formal' discussion because formal discussion is not a requirement for something that, it would appear, has silent consensus.  I would guess that thousands of edits of this type have been made by the bot and by individual editors (I am one).  As far as I know, there has been little to no discussion about removing urls that duplicate the named identifiers.  I've done it a lot and have seen quite a few where the url had rotted on the vine while the named-identifier link worked properly.
 * —Trappist the monk (talk) 15:04, 27 July 2019 (UTC)
 * And bots like User:CitationCleanerBot which has explicit approval for such things. Headbomb {t · c · p · b} 20:18, 27 July 2019 (UTC)
 * Per Trappist, I would ask Czar to find any past discussion (with a few users who argued) against this established practice. I'm sure you can find some and it would help focus the discussion, because there are various ways to look at it.
 * I've searched a bit at the village pump and I couldn't find any, although I did find a rather surreal discussion of 2010 on the relationship between DOI and promotion to publishers (you can presumably find many variants of that argument in discussions on Credo and other similar schemes) plus a few discussions with relevant comments in passing such as "urls to dois should generally not be placed in |url= when there is |doi= because that constitutes overlinking and because most most dois are behind paywalls"
 * In general, in my opinion policies and guidelines contain two signs that the removal of URL redundant with DOI is desirable.
 * The very fact that there is consensus on adding a parameter for a certain identifier in cite journal or others proves that there is a desire to have that identifier presented in a structured way (see Citation templates now support more identifiers, 2011). It follows logically that there is a desire for the identifier information/link to be moved to the structured parameter rather than left lingering in N other ways it can be inserted (the id and url parameter, free text after the citation template, other templates after the citation templates etc. etc.). Nobody ever complained of people removing links to PubMed or CiteSeerX to use the corresponding identifier parameters instead, after they were introduced: it was the logical expectation. The same for the DOI, especially when doi-access was introduced to give more granular information about it and its target.
 * At Help:Citation Style 1 elsewhere you can see that the URL parameter is generally expected to point to a full text of the cited document, open for everyone to see. So strong is the assumption, that in a few places you find a note that yes, a paywalled URL is acceptable if necessary for verifiability: it's clearly considered an exception, because nowhere you will find a general statement that paywalled and commercial copies are preferred over the others. (Such notes were added relatively late in the life cycle of the citation guidelines, around 2009; see also 2010, 2011, 2014 discussions.) The official publisher URL (to which the DOI leads when resolved with doi.org) is generally paywalled so it would by default not be the ideal content of an URL parameter even if the DOI parameter didn't exist.
 * Nemo 09:09, 28 July 2019 (UTC)


 * As I was pointed here after I had reverted a "Dup URL" edit, I think if there is consensus, then another change should be made to the templates particularly cite journal that both doi= and url= should not be present, that doi= takes precedence and should automagically populate the URL field with the correct DOI URL, and that this can be flagged in red text in the reflist as other errors. You can still have the bot go around cleaning it up, too, but this helps users to clean it faster (those red errors are easy to spot). I do note that even for paywalled URLs, you still get that the cited journal article exists, its abstract, and sufficient citation details to meet WP:V, but ideally the DOI URL should get you there too. --Masem (t) 17:26, 2 August 2019 (UTC)


 * Found this thread from 2015: But yes, not an easy discussion topic to query, hence why I thought I'd have better luck with meatspace. My concern is essentially the same one I'm quoting (and as Masem alludes). I have no strong opinion on removing url when a total duplicate for the doi but from my experience watching people use Wikipedia, when the citation's title is unlinked, readers with no knowledge of DOIs aren't going to click through the links unless they're interested in figuring out what a DOI is (same for ISSNs, ISBNs, and similar identifiers). Maybe that makes this more of a CS1 discussion now? There is also a separate discussion to be had re: the edit I first cited above, which removed a url that linked to the full text but no free was replaced in its stead.  czar  21:47, 3 August 2019 (UTC)
 * Currently |doi-access=free does not turn the title into a link, so I'm personally not especially motivated to add it. Nemo 19:02, 4 August 2019 (UTC)
 * you may be interested in this RFC which would make that option a reality. Headbomb {t · c · p · b} 21:20, 4 August 2019 (UTC)
 * Another discussion was Bots/Requests for approval/DOI bot 2 "the usual style in articles I edit is that url= is reserved for articles where the entire text is freely readable, and that url= is not used for articles where just the abstract is readable (for that, you can just live with the DOI or PMID or whatever)". Nemo 13:12, 6 August 2019 (UTC)
 * If that is the common sentiment, shouldn't it be added to the CS1 documentation? It's hard to have a discussion for/against the practice because the current standard isn't documented in a central location. czar  20:40, 11 August 2019 (UTC)

notabug Headbomb {t · c · p · b} 23:13, 29 August 2019 (UTC)
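
To make the scope of this cleanup concrete, the sketch below shows one way to test whether a `url` value is merely a resolver link for the same DOI. The list of resolver hosts and the function name are assumptions for illustration; the bot's own check may well differ.

```python
from urllib.parse import unquote, urlparse

# Assumed set of hosts that simply resolve a DOI.
DOI_RESOLVER_HOSTS = {"doi.org", "dx.doi.org", "www.doi.org"}


def url_duplicates_doi(url: str, doi: str) -> bool:
    """True when url is just a resolver link for the same DOI."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in DOI_RESOLVER_HOSTS:
        return False
    # The DOI is the path after the leading slash; DOIs are case-insensitive.
    return unquote(parsed.path.lstrip("/")).lower() == doi.strip().lower()
```

With the example DOI from the thread, `url_duplicates_doi("https://doi.org/10.1234/1234567890", "10.1234/1234567890")` is true, while a link to a full-text PDF hosted elsewhere is not treated as a duplicate.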

Surgeon General of the United States
The bot should undo this edit. QuackGuru ( talk ) 18:56, 5 August 2019 (UTC)
 * According to the website, the preferred citation includes that information: U.S. Department of Health and Human Services. The Health Consequences of Smoking: 50 Years of Progress. A Report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health, 2014.  AManWithNoPlan (talk) 18:18, 11 August 2019 (UTC)
 * The bot is listing "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health" as an "author". QuackGuru ( talk ) 18:34, 11 August 2019 (UTC)


 * if you do not like who the publisher lists as the author, then block the bot with author1= AManWithNoPlan (talk) 17:34, 12 August 2019 (UTC)
 * I thought "author" parameters were reserved for a person's name. Is there a publisher1 and publisher2 and so on for co-publishers or could this be created for co-publishers when other authors are not a person's name? QuackGuru ( talk ) 18:29, 12 August 2019 (UTC)
 * Well, that's an interesting question. There is a loose hierarchy publisher > editor > author, similar to series > title > chapter in books.  But that's still pretty vague, and non-humans can be authors. AManWithNoPlan (talk) 23:50, 12 August 2019 (UTC)
 * See Template:Cite book. I could not find where it mentions "author" for non-humans. There is no solution for when there are multiple co-publishers. Just let it be or someone could propose creating new parameters for co-publishers. QuackGuru ( talk ) 00:35, 13 August 2019 (UTC)
 * The use of author for organizational authors is permitted. --Izno (talk) 23:25, 13 August 2019 (UTC)
 * I prefer the creation of new parameters for publisher1 and publisher2 and so on. QuackGuru ( talk ) 23:38, 13 August 2019 (UTC)


 * I have seen cases of co-publishers, but that is not the case here. Looking at the edit QG links to, it appears to me that the essential problem is citing the "Surgeon General of the United States" as the publisher. It is quite unlikely that the Surgeon General has personally published that item. (I can conceive of the office "of the Surgeon General" doing so, but unlikely.) Someone should take a closer look at the source to sort out who the (possibly "corporate" or institutional) author is, and who actually managed the publication. At any rate, this case does not warrant citing multiple publishers when the real issue is who is the (singular) publisher. And certainly does not warrant multiple publisher parameters. ♦ J. Johnson (JJ) (talk) 23:55, 13 August 2019 (UTC)
 * It is a report of the SG. Adding "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health" is fine. But what is the best way to add it? How can I list it as a co-publisher? QuackGuru ( talk ) 16:19, 14 August 2019 (UTC)


 * "Report of the SG" is ambiguous as to authorship (responsibility), publisher, etc. You want the "best way to add" something, where I would say it is not clear as to exactly what should be added. (And unless the document says "co-published" I rather doubt that is the case.) What you need to do is examine the document closely, perhaps with the help of a medical librarian. Or look at how other publications cite it. But be cautious. E.g., I would not go with citoid's identification of the author as "General". ♦ J. Johnson (JJ) (talk) 23:22, 14 August 2019 (UTC)


 * if there is only a single non-human author from citoid, we reject it. This author is from the pubmed API based upon the PMID. AManWithNoPlan (talk) 00:00, 15 August 2019 (UTC)


 * So is the Surgeon General a non-human author? [Caution! lots of sharp edges in that question; handle with care.] ♦ J. Johnson (JJ) (talk) 19:04, 15 August 2019 (UTC)
 * See how other publications cite it. For example, see "While the most recent Surgeon General's Report on the "Health Consequences of Smoking"..." QuackGuru ( talk ) 19:38, 15 August 2019 (UTC)


 * Isn't that just what I said? ("Or look at how other publications cite it.")
 * Note that what you just quoted is not a citation. A citation – more precisely, a full citation – has bibliographic details, etc. Which medical journals tend to pare down to what is minimally sufficient (such as leaving off the publisher), but if you search for this report on Google Scholar you should find lots of hits, and quite likely some useful examples.
 * There is no bot issue here, so I think we're done. ♦ J. Johnson (JJ) (talk) 21:36, 16 August 2019 (UTC)
 * It is not a bot issue unless there is a new way to format it for organizational authors in the future. For now this is the way to cite it. QuackGuru ( talk ) 21:42, 16 August 2019 (UTC)
 * Not quite; cs1|2 has chapter and chapter-url; use them:
 * I left 107–138 but do your readers a favor: for in-line citations like this one, use an appropriate in-source location parameter and value to identify where in the source the supporting information is; don't make readers search through 32ish pages to find the supporting information.
 * —Trappist the monk (talk) 22:06, 16 August 2019 (UTC)
 * For the page numbers I had to re-format it. QuackGuru ( talk ) 23:12, 16 August 2019 (UTC)
 * Four things about that:
 * SGUS is not an author listed in so readers who might read a printed copy of the article won't be able to find it without a special decoder-ring that tells them that SGUS = National Center for Chronic Disease ...
 * items in §Bibliography should be listed in alpha order by author
 * clicking this title link The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General I don't expect to land at "Nicotine". Don't astonish readers.
 * why is it https://stacks.cdc.gov/view/cdc/21569/Share (not a dead link) but https://web.archive.org/web/20150915172434/http://www.surgeongeneral.gov/library/reports/50-years-of-progress/sgr50-chap-5.pdf? The root url should be the same in both.
 * —Trappist the monk (talk) 23:59, 16 August 2019 (UTC)
 * SGUS stands for Surgeon General of the United States. They are the publisher. The National Center for Chronic Disease... is a co-publisher/author. I listed it by year. I removed the archived link. The other link has a PDF file. QuackGuru ( talk ) 00:29, 17 August 2019 (UTC)

Date formats
I would say that this is not a bug. Add either of the or  to an article and the cs1|2 templates will render dates in the chosen format; see the  documentation.

—Trappist the monk (talk) 23:42, 30 August 2019 (UTC)


 * definitely not a bug. AManWithNoPlan (talk) 01:17, 31 August 2019 (UTC)

Convert &#x2013; to &ndash;
https://github.com/ms609/citation-bot/pull/2130 AManWithNoPlan (talk) 18:56, 31 August 2019 (UTC)
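
The conversion itself is a simple substitution. The sketch below is illustrative only and is not the code in the pull request above; handling the decimal form `&#8211;` as well is an added assumption, since the heading mentions only the hexadecimal reference.

```python
import re

# Matches the hexadecimal (&#x2013;) and decimal (&#8211;) references to U+2013.
EN_DASH_NUMERIC = re.compile(r"&#(?:x2013|8211);", re.IGNORECASE)


def normalize_en_dash_entities(text: str) -> str:
    """Rewrite numeric en-dash character references as the named &ndash; entity."""
    return EN_DASH_NUMERIC.sub("&ndash;", text)
```

For example, a page range written as `107&#x2013;138` becomes `107&ndash;138`, while other entities such as `&mdash;` are left alone.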

Caps: B/gcvs instead of B/GCVS or whatever other variations (B/Gcvs, b/Gcvs...)
For sources like. Headbomb {t · c · p · b} 19:53, 2 September 2019 (UTC)

fixed

Archive-url & associated parameters stripped out.
That’s a good thing. Archive URLS are copies stored on remote archive websites. Not the original URLS. AManWithNoPlan (talk) 10:52, 3 September 2019 (UTC)
 * Yes, but shouldn't they stay when the original is dead? Quuux (talk) 11:28, 3 September 2019 (UTC)
 * The uses that AMWNP removed with this edit are incorrect. The intent of archive-url is to hold a web archiving webpage, such as the same page hosted at Internet Archive. --Izno (talk) 13:55, 3 September 2019 (UTC)

Request: add "subscription required" tag
How would we reliably know? AManWithNoPlan (talk) 23:50, 31 August 2019 (UTC)

Titles where the original title has quotes
A more precise style guide https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Titles#Typographic_effects AManWithNoPlan (talk) 21:27, 4 September 2019 (UTC)
 * it is up to the editor to take special care in these extraordinarily limited cases and flag the title with a comment to this effect.   AManWithNoPlan (talk) 21:30, 4 September 2019 (UTC)
 * I have now done that. AManWithNoPlan (talk) 22:18, 4 September 2019 (UTC)

WebCite query strings
This statement on that page, “This is followed by the original URL which helps protect against malicious code that is hiding an inappropriate link, such as spam.”, is blatantly wrong. The truth is the opposite: “the url at the end is meaningless and allows people to append anything they want to and lie to people”. AManWithNoPlan (talk) 00:51, 9 September 2019 (UTC)


 * No. The reason we do this is to prevent malicious URLs from getting past Wikipedia edit list filters during page-save. This is done per an RfC as a solution to the WikiCite problem since web shortening is otherwise disallowed on Wikipedia because it avoids the edit filters. Thus we tack on the URL at the end so the Wikipedia edit filters can process it. The URL is not a "lie"; it is checked and rechecked by bots using the WebCite API to ensure they match, and our bots are continually checking. Please do not remove this URL, otherwise we are in a bot war: our bots will just re-add it per the RfC requirement and policy about web shortening and edit filters. -- Green  C  03:51, 9 September 2019 (UTC)
 * If the URL parameter needs to be checked/enforced by bots anyway, why can't that job just be performed on the actual url parameter of the template? That's what AbuseFilter rules should be targeting. Nemo 06:20, 9 September 2019 (UTC)
 * AbuseFilters can not target templates as far as I know. Archive URLs often exist outside templates (external links, bare links etc) -- Green  C  06:34, 9 September 2019 (UTC)
 * so the ones I found with the URL set to the wrong thing would have gotten caught eventually. Interesting.  Those bots should probably change the URL to the other non-time stamp format to fix this problem for good. AManWithNoPlan (talk) 10:53, 9 September 2019 (UTC)
 * Yes. When people inject a false ?url=http:// into the WebCite URL it will get caught eventually. I run into them and it's always a clever way to avoid a blacklist filter. Not super common but they do exist. Or they don't have a ?url=http:// at all, in which case the bot will try to add it (based on data from the WebCite API, not the url field) and the bot gets blocked on page save due to the blacklist filter, which is a flag of this problem. -- Green  C  14:45, 9 September 2019 (UTC)
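
A rough sketch of the check described in this thread, in Python: pull the `url` query parameter off a WebCite link and compare it with the original URL that the archive is known to hold. The function names are made up for illustration, and `expected_url` stands in for whatever the WebCite API returns; the API call itself is not shown because its details are not given here.

```python
from urllib.parse import urlsplit, parse_qs

def appended_url(webcite_url):
    """Return the value of the ?url= query parameter, or None if it is absent."""
    query = parse_qs(urlsplit(webcite_url).query)
    values = query.get("url")
    return values[0] if values else None

def url_matches(webcite_url, expected_url):
    """True if the appended URL equals the URL the archive is known to hold.

    expected_url is assumed to come from a trusted lookup (hypothetical here).
    """
    return appended_url(webcite_url) == expected_url
```

A link with no `?url=` at all returns `None` from `appended_url`, which corresponds to the case where the bot would try to add the parameter.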

Temporary Block
A 1-hour block has been placed related to Administrators'_noticeboard. — xaosflux  Talk 18:15, 10 September 2019 (UTC)

fixed for now...

Regular expression failure

 * Also "Regular expression failure in Palermo when extracting Templates" Jonatan Svensson Glad (talk) 08:04, 12 September 2019 (UTC)

Request: more citation templates
Why? What is the need that is not adequately handled with the existing tools? &diams; J. Johnson (JJ) (talk) 20:51, 12 September 2019 (UTC)


 * If you just want more templates, this is the wrong place to ask for those. Try Help:CS1. However, many of those already exist, see Citation Style 1 and Category:Citation Style 1 specific-source templates, which will contain things like cite tweet and cite Youtube (an alias of cite AV media). &#32; Headbomb {t · c · p · b} 21:09, 12 September 2019 (UTC)


 * I think that the user is referring to the options provided in the dropdown menu when you edit a page via a web browser. Cite web, news, book, and journal are the only available options. These provide a user-friendly field form that is much easier to fill out than the wikitext templates. Still not the right place to request, though, I'm guessing. -2pou (talk) 05:06, 13 September 2019 (UTC)
 * Yes. I guess I shouldn't have added it to the Archive page (I didn't think I had), but I don't know where else such a request would go. -- TrottieTrue (talk) 23:06, 14 September 2019 (UTC)

Missing space in title
There is nothing we can do about GIGO. The meta data in the crossref database does not have the space. So, the authoritative answer is missing the space. AManWithNoPlan (talk) 17:09, 14 September 2019 (UTC)

The Satanic Bible
It cannot possibly be intended behavior for this bot to be doing this... right? Pinging Chris Capoccia since that is the user who has "activated" the majority of these edits. Happy to try to address the DOI issue when I get a second, but I can't see any reason why the date as of which the DOI is broken needs to be updated approximately daily. Is there no logic to avoid this kind of watchlist/page history-clogging, unhelpful edit? GorillaWarfare (talk) 02:08, 14 September 2019 (UTC)
 * Perhaps if the date is within the last month then do not update; I will look into that.  Generally speaking, running the bot on the same page over and over again should be pointless.  We consider running the bot again and getting more results to be a bug.  AManWithNoPlan (talk) 02:16, 14 September 2019 (UTC)
 * Chris should also not be asking the same categories to be pointlessly processed over and over. &#32; Headbomb {t · c · p · b} 02:21, 14 September 2019 (UTC)
 * and the bot should not be giving people a false sense of usefulness by changing dates by less than a month. AManWithNoPlan (talk) 02:35, 14 September 2019 (UTC)
 * For sure. If a DOI is marked as broken, there's no real need to update the date all that often. Through the gadget, sure, since people choose to save or not. But through batch runs? Once a year should be enough. &#32; Headbomb {t · c · p · b} 04:05, 14 September 2019 (UTC)
 * fixed the trigger. Perlmutter ref had an access date with no URL. — Chris Capoccia 💬 12:59, 14 September 2019 (UTC)
 * also fixed the doi. — Chris Capoccia 💬 13:02, 14 September 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2145 AManWithNoPlan (talk) 20:16, 15 September 2019 (UTC)


 * fixed: minimum of a month's change in date. AManWithNoPlan (talk) 23:15, 15 September 2019 (UTC)
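
For readers wondering what a "minimum of a month's change" rule might look like, here is a minimal Python sketch. It is not the bot's code (the bot is not written in Python); the function name and the 30-day approximation of "a month" are assumptions.

```python
from datetime import date

def should_update_broken_date(stored, today, min_days=30):
    """Return True only if the stored date is at least min_days older than today.

    min_days=30 approximates "a month"; the real threshold is an assumption.
    """
    if stored is None:
        return True  # nothing recorded yet, so record today's date
    return (today - stored).days >= min_days
```

With a rule like this, a daily batch run leaves a recently stamped date alone, which avoids the watchlist churn reported above.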

most likely bad meta data used
The meta data is “interesting”, I will think about filtering. AManWithNoPlan (talk) 17:21, 14 September 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2144 and https://github.com/ms609/citation-bot/pull/2143 AManWithNoPlan (talk) 19:40, 15 September 2019 (UTC)

Chapter url
https://github.com/ms609/citation-bot/pull/2142 AManWithNoPlan (talk) 19:21, 15 September 2019 (UTC)
 * Thank you. DuncanHill (talk) 20:31, 15 September 2019 (UTC)

If you remove firstn/lastn, also remove author-linkn/authorn-link
That's more annoying than it sounds since we have to check a lot of name parameters. AManWithNoPlan (talk) 18:25, 9 August 2019 (UTC)

Purple background in this page makes it hard to read
I use the blackscreen gadget (it gives green text on a black screen) as it makes Wikipedia much more readable for me. The purpleish background on parts of this page makes it almost impossible to read the text. DuncanHill (talk) 12:21, 15 September 2019 (UTC)


 * feel free to suggest coding changes to the bot bug template. Do ANY other templates detect your non-standard style? AManWithNoPlan (talk) 18:11, 15 September 2019 (UTC)


 * also, please point to information about this gadget, I have no knowledge of it. AManWithNoPlan (talk) 19:22, 15 September 2019 (UTC)


 * 1) I am not a coder, 2) NONE cause any problems that I am aware of at the moment, there have been a few in the past but people have been very helpful in making their templates compliant when asked, and 3) it's called blackskin, it's one of the gadgets available in preferences ("Use a black background with green text" in Appearance), and is listed at Gadget. DuncanHill (talk) 20:30, 15 September 2019 (UTC)


 * do you remember any of the template names? I am curious what they did. AManWithNoPlan (talk) 23:19, 15 September 2019 (UTC)


 * How is this? AManWithNoPlan (talk) 14:22, 16 September 2019 (UTC)


 * It does look clearer now, thank you. DuncanHill (talk) 14:24, 16 September 2019 (UTC)

fixed

Convert Template:PMID when in ref tags with eight digits
https://github.com/ms609/citation-bot/pull/2146 AManWithNoPlan (talk) 20:53, 16 September 2019 (UTC)

Strip semicolons
This should perhaps not apply to title however. Also might not be safe to do in some identifiers. &#32; Headbomb {t · c · p · b} 00:14, 18 June 2019 (UTC)
 * and as always, title's good friend chapter too. AManWithNoPlan (talk) 13:57, 23 June 2019 (UTC)
 * And contribution and other aliases. &#32; Headbomb {t · c · p · b} 15:49, 23 June 2019 (UTC)
 * And NOT & a m p ; and his friends. AManWithNoPlan (talk) 01:25, 30 June 2019 (UTC)
 * Rather than a blacklist, we would want a white list of parameters. AManWithNoPlan (talk) 15:17, 6 July 2019 (UTC)
 * Get the list of all parameters and remove those then. &#32; Headbomb {t · c · p · b} 20:07, 6 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2130 AManWithNoPlan (talk) 19:06, 31 August 2019 (UTC)
 * Should also handle author/editors/contributors/others (and their variants) &#32; Headbomb {t · c · p · b} 19:55, 31 August 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2153 AManWithNoPlan (talk) 18:37, 18 September 2019 (UTC)
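
One way to picture the whitelist approach discussed in this thread is the Python sketch below. It assumes the goal is to drop a single trailing semicolon from a parameter value, which the thread does not spell out, and the whitelist contents are purely illustrative. Values ending in an HTML entity such as `&amp;` keep their semicolon, as requested above.

```python
import re

# Illustrative whitelist only; the bot's actual parameter list is not given in the thread.
STRIP_ALLOWED = {"journal", "publisher", "volume", "issue", "pages"}

# A value ending in an HTML entity such as "&amp;" or "&#x2013;" must keep its semicolon.
ENTITY_AT_END = re.compile(r"&(#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);$")

def strip_trailing_semicolon(param, value):
    """Drop one trailing semicolon from whitelisted parameters, sparing entities."""
    if param not in STRIP_ALLOWED:
        return value  # title, chapter, contribution, etc. are left untouched
    trimmed = value.rstrip()
    if trimmed.endswith(";") and not ENTITY_AT_END.search(trimmed):
        return trimmed[:-1].rstrip()
    return value
```

A whitelist fails safe: a parameter nobody thought about is simply left alone, whereas a blacklist would strip it by default.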

Garbage archive-url cleanup
https://github.com/ms609/citation-bot/pull/2154 AManWithNoPlan (talk) 18:56, 18 September 2019 (UTC)

Bot dies
https://github.com/ms609/citation-bot/pull/2156 AManWithNoPlan (talk) 21:24, 18 September 2019 (UTC)

Request: Capitalize linked journals
That is very dangerous territory. We would have to verify that the old page did not exist at all and that the new page did exist. We really have not ever got in the business of fixing red links. AManWithNoPlan (talk) 15:01, 2 May 2019 (UTC)
 * It's not a matter of fixing redlinks, it's a matter of capitalization. E.g. Journal of physics vs Journal of Physics or INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY vs International Journal of Systematic and Evolutionary Microbiology. Or Developmental neuroscience vs Developmental Neuroscience. &#32; Headbomb {t · c · p · b} 15:43, 2 May 2019 (UTC)
 * And in the rare case that the capitalized version links to a different page, it will link to the correct page instead of the wrong one. &#32; Headbomb {t · c · p · b} 15:48, 2 May 2019 (UTC)
 * Unless it’s a foreign-language title — the bot sometimes gets a little overzealous capitalizing words, and a redirect from a title with extra capitalization might not exist yet for articles about some publications. Umimmak (talk) 14:23, 4 September 2019 (UTC)
 * That's mostly taken care of through this + a custom list of foreign titles. This is just bringing the bot inline with what it would do to an unlinked title, so if there's an issue with capitalization, it wouldn't be specific to the linked version. &#32; Headbomb {t · c · p · b} 14:33, 4 September 2019 (UTC)

Incorrectly capitalized word in piped link

 * This should apply to all disambiguators e.g. (Hindawi journal), (magazine), (website), (musicology journal), ... &#32; Headbomb {t · c · p · b} 23:07, 22 September 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2158 AManWithNoPlan (talk) 23:18, 22 September 2019 (UTC)

Unexpected conversion from cite web to cite journal
The Stanford Encyclopedia of Philosophy entry looked more like a website than a journal, and I was surprised by this conversion by Citation bot diff. Is it supposed to work this way? Final output is OK, but I am not sure what was wrong with cite web. — Chris Capoccia 💬 12:14, 23 September 2019 (UTC)
 * This one should probably go to cite encyclopedia instead. --Izno (talk) 16:21, 23 September 2019 (UTC)

fixed

Caps: The De Paulia; dell'
De Paulia done. Waiting on this for dell words: Italian dell'xxx words AManWithNoPlan (talk) 17:41, 24 September 2019 (UTC)
 * Why waiting on for ? &#32; Headbomb {t · c · p · b} 19:02, 24 September 2019 (UTC)
 * Wiktionary is not aware of any other language using dell'. Of all the preposizioni articolate, degli and delle should also be rather safe. Nemo 19:25, 24 September 2019 (UTC)

Hep Lib.web and other arXiv-mirrors are not journals
HEP Lib.Web. does seem to be a journal or work of some kind, e.g.. Full name is High Energy Physics Libraries Webzine. &#32; Headbomb {t · c · p · b} 19:30, 25 September 2019 (UTC)
 * 🙄 AManWithNoPlan (talk) 21:35, 25 September 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2165 AManWithNoPlan (talk) 21:45, 25 September 2019 (UTC)
 * Ok, thanks. So the only action needed is to spell out the full name in such a way as to make it look like a name of a journal and not a web site to avoid the same confusion befalling others. —David Eppstein (talk) 22:28, 25 September 2019 (UTC)

Διδακτορική Διατριβή
https://github.com/ms609/citation-bot/pull/2164 AManWithNoPlan (talk) 21:34, 25 September 2019 (UTC)

Talk pages
Is the bot supposed to work on talk pages such as Talk:Lockheed SR-71 Blackbird ?  Stepho  talk 10:39, 22 September 2019 (UTC)
 * Only if requested. Here User:GeneralPoxter asked the bot to make an edit. &#32; Headbomb {t · c · p · b} 22:14, 22 September 2019 (UTC)
 * If a category includes it then yes also. AManWithNoPlan (talk) 11:07, 23 September 2019 (UTC)

notabug, I guess....

I think that the "main" function when running the bot on a category should be to only run on pages in article namespace (ns 0), and require a manual opt-in (such as &allNS=true) or something, since maintenance categories mistakenly have talk pages and template documentation pages in them. Jonatan Svensson Glad (talk) 20:12, 26 September 2019 (UTC)
 * It wouldn't be a bad idea to have category runs only allow for Main/Draft spaces by default. &#32; Headbomb {t · c · p · b} 22:24, 26 September 2019 (UTC)
 * fixed no longer in category mode. AManWithNoPlan (talk) 14:51, 27 September 2019 (UTC)
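
The proposal above (article namespace by default, everything else only on explicit opt-in) could be sketched in Python roughly as follows. The function name, the `all_namespaces` flag (standing in for the suggested `&allNS=true`), and the input shape of (title, namespace) pairs are assumptions for illustration; what the actual fix does is not detailed in this thread.

```python
MAIN_NS = 0  # article namespace, as stated in the thread

def pages_to_process(pages, all_namespaces=False):
    """Filter (title, namespace) pairs down to articles unless the caller opts in."""
    if all_namespaces:
        return [title for title, _ns in pages]
    return [title for title, ns in pages if ns == MAIN_NS]
```

For example, a maintenance category holding an article, its talk page (namespace 1) and a template page (namespace 10) would yield only the article by default.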

For purpose of title matching, strip sub/sup markup
If you have

This should be treated as equivalent to

For purpose of title-matching. &#32; Headbomb {t · c · p · b} 22:44, 26 September 2019 (UTC)

fixed
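
As an illustration of the request above, a minimal Python sketch of title matching that ignores sub/sup markup. The helper names are hypothetical, and collapsing whitespace is an extra assumption that was not part of the request.

```python
import re

# Matches opening and closing <sub> and <sup> tags, in any letter case.
SUB_SUP_TAG = re.compile(r"</?(?:sub|sup)\s*>", re.IGNORECASE)

def normalize_title(title):
    """Remove sub/sup tags and collapse runs of whitespace."""
    no_tags = SUB_SUP_TAG.sub("", title)
    return " ".join(no_tags.split())

def titles_match(a, b):
    """Compare two titles after stripping sub/sup markup from both."""
    return normalize_title(a) == normalize_title(b)
```

Only the tags are removed, not their contents, so a subscripted "2" still has to appear as a plain "2" in the other title for the two to match.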

Author-link vs Agency
https://github.com/ms609/citation-bot/pull/2171 AManWithNoPlan (talk) 17:44, 29 September 2019 (UTC)

A question on removal of my edits
You removed my edits from Proximity space with the log message "Removing self promotion".

Please explain which Wikipedia policy (the page and the exact quote, please) strictly disallows self-promotion.

I do realize that self-promotion should be restricted, but I see no rule in Wikipedia policy that would completely disallow it and thus would justify your removal.

If you don't explain it soon and do not restore it back, I will dispute it with the Wikipedia authorities.

--VictorPorton (talk) 21:22, 1 October 2019 (UTC)
 * You are mistaken. As you can see from, Citation bot did not revert your edits.
 * —Trappist the monk (talk) 21:28, 1 October 2019 (UTC)


 * notabug, just a user who cannot read edit logs. AManWithNoPlan (talk) 22:21, 1 October 2019 (UTC)

Idea: Usage stats
Not really anything pressing, but now that we have OAuth in, it would be neat to have usage statistics. Who makes use of the bot. If the bot is activated via the web interface, scripts, etc... Or whatever else is trackable. &#32; Headbomb {t · c · p · b} 06:15, 27 June 2019 (UTC)
 * I guess one could sort the bot contributions based on if the edit summary said “category” and one could query Wikipedia and search for edit summaries with the “use this tool” text in them. AManWithNoPlan (talk) 13:40, 27 June 2019 (UTC)
 * Having a  in the API would likely be a better way of tracking things, but right now I'm mostly thinking about something very non-critical. I'll take any bug fix and things that actually affect the edits of the bots over usage stats though. Just figured if one of the talk page stalkers felt like compiling stats, or building a sub-module that would export information into an external database after every edit, well that's a nice little project. &#32; Headbomb {t · c · p · b} 17:17, 27 June 2019 (UTC)
 * We currently have no logging, so any logging would have to be done in the edit summaries. AManWithNoPlan (talk) 14:32, 28 June 2019 (UTC)
 * "Currently", yup. But if there was logging, we could have graphs/stats like |User:AManWithNoPlan|User:Marianne_Zimmerman, except for citation bot usage, instead of pageviews.
 * Anyway, it's an idea more than anything. Not critical by far, and I'd rather have someone else work on that if that ever gets done (unless we suddenly run out of edit-related bug fixes and feature requests). &#32; Headbomb {t · c · p · b} 15:15, 28 June 2019 (UTC)
 * Could it be enough to add a hashtag and rely on hashtags/? Nemo 15:37, 28 June 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2170 AManWithNoPlan (talk) 17:27, 29 September 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2173 AManWithNoPlan (talk) 19:58, 30 September 2019 (UTC)
 * fixed using hashtags. AManWithNoPlan (talk) 11:15, 5 October 2019 (UTC)

OAuth every time I use Citebot?
Why does Wikipedia citation bot (WP:UCB) require WP:OAuth every time I use it? I know there was a stir regarding the tool a while back but...? 18:49, 18 September 2019 (UTC)


 * I don't know. Your granting permission should stick.  It might be related to the fact that we require permission to edit pages "as the user".  We never actually use that, and the keeper of the OAuth setting could probably fix that.  AManWithNoPlan (talk) 18:58, 18 September 2019 (UTC)


 * Waddie96, do you have cookies enabled for tools.wmflabs.org? Nemo 19:38, 18 September 2019 (UTC)

I agree this is something rather annoying. It didn't do that initially, but started to require it every time a few weeks/month ish after the rollout. &#32; Headbomb {t · c · p · b} 20:20, 18 September 2019 (UTC)


 * Every time there is a bug fix applied, ALL cookies are lost on the server side. 20:24, 18 September 2019 (UTC)


 * I think it is mostly fixed now. AManWithNoPlan (talk) 21:04, 5 October 2019 (UTC)
 * It certainly seems better. &#32; Headbomb {t · c · p · b} 21:13, 5 October 2019 (UTC)

fixed for the better. AManWithNoPlan (talk) 01:42, 7 October 2019 (UTC)

Caps:NDT & e International → NDT & E International
Self explanatory &#32; Headbomb {t · c · p · b} 02:52, 6 October 2019 (UTC)

fixed AManWithNoPlan (talk) 01:41, 7 October 2019 (UTC)

Cleanup: Remove empty journal/issue from cite book, remove empty ISBNs from cite journals
Those parameters should not be present on a cite book, likewise isbn should not (normally) be present in a cite journal. Only remove the empty ones though. &#32; Headbomb {t · c · p · b} 19:26, 5 October 2019 (UTC)


 * fixed AManWithNoPlan (talk) 17:17, 7 October 2019 (UTC)
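
A small Python sketch of the cleanup requested above: drop `journal`/`issue` from cite book and `isbn` from cite journal, but only when they are empty. The dictionary representation of a template and the function name are assumptions made for the example.

```python
# Parameters that do not belong on each template type, per the request above.
UNWANTED_IF_EMPTY = {
    "cite book": {"journal", "issue"},
    "cite journal": {"isbn"},
}

def drop_empty_misplaced(template_name, params):
    """Return a copy of params without empty parameters that are misplaced."""
    unwanted = UNWANTED_IF_EMPTY.get(template_name.strip().lower(), set())
    return {
        name: value
        for name, value in params.items()
        if not (name in unwanted and value.strip() == "")
    }
```

A non-empty `journal` on a cite book is deliberately kept, since removing it would discard information that an editor may need to sort out by hand.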

500 error
Citation bot server keeps throwing a 500 error. &#32; Headbomb {t · c · p · b} 15:46, 7 October 2019 (UTC)
 * I don’t see it. You might need to delete your tools.wikimedoa.org cookies  AManWithNoPlan (talk) 17:20, 7 October 2019 (UTC)
 * Probably a temporary hiccup. Things are fine now. &#32; Headbomb {t · c · p · b} 17:24, 7 October 2019 (UTC)

fixed now. AManWithNoPlan (talk) 18:06, 7 October 2019 (UTC)

Need to run twice to decapitalize
https://github.com/ms609/citation-bot/pull/2186  JSTOR's were not on the list of reliable information. AManWithNoPlan (talk) 21:40, 7 October 2019 (UTC)

Category/batch whitelist
Category/batch runs are being abused. Possibilities on dealing with this are:
 * a) a whitelist of people allowed to ask for unlimited category/batch runs
 * This could just be something like extended confirmed.  Edit: Template editors might be a better idea. &#32; Headbomb {t · c · p · b} 03:25, 11 September 2019 (UTC) 
 * That is not what the permission is for. Please do not attach things to random permissions. --Izno (talk) 13:25, 11 September 2019 (UTC)
 * b) a whitelist for limited category/batch runs (say ~250 pages at once, tops)
 * This could just be something like autoconfirmed/confirmed.
 * c) a way to kill inappropriate category/batch runs

And have category/batch runs disabled/greatly limited (~25 articles) for non-confirmed/whitelisted users. &#32; Headbomb {t · c · p · b} 17:48, 7 August 2019 (UTC)


 * A/B may also prevent sock puppets and "suspicious" new users that may intend to use the bot in ways that are undesired from doing so. Users without edits or very few edits might not check their edits or won't see possible mistakes by the bot and as such won't report them. Proposal B seems like a good one to go forward with in any case in my opinion. For proposal C it might be good to define who could use that option (only maintainers and the operator, or also some "trusted" users), and we would also need to define what is considered inappropriate. For option A it might also be an idea to allow extended confirmed users up to 1000 pages, and then have a further whitelist of users who can do unlimited runs, i.e. bureaucrats, administrators and "normal" users who have proven to understand what the bot does and the impact of extremely large runs (i.e. don't run during high usage times), and possibly are also actively reporting bugs and joining in discussion here. Just a few things to think about. -Redalert2fan (talk) 20:05, 16 August 2019 (UTC)

The bot has been effectively disabled for the last week or so due to Chris Capoccia's insanely large-category requests (e.g. Category:Pages with citations having bare URLs) that hog all the resources. Please implement better parallelism à la first "Extended content" box in the see also link above, or something similar enough that one or two large requests doesn't disable the bot for everyone else. &#32; Headbomb {t · c · p · b} 15:40, 1 September 2019 (UTC)
 * maybe some simple intermediate steps would be good. currently the bot is still churning on something from a couple days ago. it doesn't even appear in any of my browser windows and there's no way for me to stop it. maybe the bot could refuse to do large requests. or even eliminate the category box altogether. — Chris Capoccia 💬 14:12, 2 September 2019 (UTC)
 * For intermediate steps, see . &#32; Headbomb {t · c · p · b} 19:55, 2 September 2019 (UTC)
 * it would be nice to have multiple bot/zotero accounts AManWithNoPlan (talk) 20:36, 2 September 2019 (UTC)

And again, because of the massive run against Category:CS1 errors: missing periodical, with over 300K articles in it. Please kill this! &#32; Headbomb {t · c · p · b} 14:54, 10 September 2019 (UTC)
 * I requested a bot block at WP:AN for this. This is way too large an unsupervised bot run. &#32; Headbomb {t · c · p · b} 17:43, 10 September 2019 (UTC)
 * Made another such request. &#32; Headbomb {t · c · p · b} 01:18, 14 September 2019 (UTC)


 * 500 is a typical number for API request blocks, and the size of a page of diffs. Checking 500 diffs is some work, a common max size with the option to request another 500 etc.. --  Green  C
 * I suggest a soft limit of 100 to 250 save for a handful of users. That will usually take over an hour to process, and quite long to review as well. &#32; Headbomb {t · c · p · b} 01:44, 14 September 2019 (UTC)
 * When folks were doing large runs, I routinely checked thousands of edits a day out of curiosity. I'm not sure that a limit of few hundreds would be appropriate. Nemo 13:25, 14 September 2019 (UTC)
 * The issue here isn't so much the lack of checking as that asking for runs larger than 100 or so disables the bot for everyone else for several hours. &#32; Headbomb {t · c · p · b} 16:11, 14 September 2019 (UTC)

Changes made. Should be mostly fixed AManWithNoPlan (talk) 00:27, 10 October 2019 (UTC)
 * What was done, exactly? &#32; Headbomb {t · c · p · b} 00:54, 10 October 2019 (UTC)
 * will refuse to run on large categories AManWithNoPlan (talk) 02:40, 10 October 2019 (UTC)
 * Yes, but what is the definition of a 'large category' here? &#32; Headbomb {t · c · p · b} 02:58, 10 October 2019 (UTC)
 * According to https://github.com/ms609/citation-bot/pull/2189 > 10000 pages. --Redalert2fan (talk) 04:23, 10 October 2019 (UTC)
 * That's way too big a limit. &#32; Headbomb {t · c · p · b} 17:47, 11 October 2019 (UTC)
 * Cut to 1000 AManWithNoPlan (talk) 19:48, 11 October 2019 (UTC)
 * Call that fixed for now AManWithNoPlan (talk) 20:44, 11 October 2019 (UTC)
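
The size guard that this thread ends on can be pictured with the Python sketch below. The 1000-page limit is the figure quoted above; the function name, the returned message, and the choice of a strict "greater than" comparison at the boundary are assumptions.

```python
MAX_CATEGORY_PAGES = 1000  # limit reported at the end of the thread

def check_category_size(page_count, limit=MAX_CATEGORY_PAGES):
    """Return (allowed, message); refuse categories with more than limit pages."""
    if page_count > limit:
        return False, f"Category has {page_count} pages; limit is {limit}."
    return True, ""
```

A run against a category with over 300K members, like the one complained about above, would be refused before any page is fetched.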

url jstor cleanup
https://github.com/ms609/citation-bot/pull/2193 AManWithNoPlan (talk) 19:51, 11 October 2019 (UTC)

HTML vs real character and weird interaction with ref tags
I don’t see the bold. AManWithNoPlan (talk) 17:48, 29 September 2019 (UTC)
 * See ref 17 in that diff above. Jonatan Svensson Glad (talk) 19:37, 29 September 2019 (UTC)
 * Only when immediately preceded with a ref tag on the same line. AManWithNoPlan (talk) 20:05, 29 September 2019 (UTC)
 * Looks like a bug in the handling of references in general in wikiland. AManWithNoPlan (talk) 18:49, 30 September 2019 (UTC)

Daily News is a newspaper so its online presence should be treated as such. 'Fixing' the wiki markup error by the simple expedient of stripping the markup without also ensuring that the parameters in the template are used correctly only masks the underlying problem: newspaper name in publisher. The other wiki markup fixes are probably correct but Citation bot should not just strip markup as it appears that it does.

—Trappist the monk (talk) 12:52, 12 October 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/2201 adds more newspapers to the publisher confusion list. AManWithNoPlan (talk) 14:12, 12 October 2019 (UTC)
 * Such a short list is PUBLISHERS_ARE_WORKS. In the code for Monkbot/task 14 I have a list of 1800+ newspapers (canonical names and redirects) all of which I have found in use at en.wiki.  Perhaps it would be best to not attempt to make these kinds of fixes.
 * —Trappist the monk (talk) 15:46, 12 October 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2205 AManWithNoPlan (talk) 14:54, 15 October 2019 (UTC)


 * fixed

Missed a jstor cleanup
we don’t handle jstor.or AManWithNoPlan (talk) 21:53, 15 October 2019 (UTC)
 * Well look at that. I completely missed the typo. &#32; Headbomb {t · c · p · b} 02:03, 16 October 2019 (UTC)

Question about handles
I'm building a list of various handle links, e.g.
 * http://digilib.gmu.edu/bitstream/handle/1920/1595/483_15_01.pdf
 * http://digilib.gmu.edu/dspace/handle/1920/2931
 * http://digilib.gmu.edu/handle/1920/505

What do you need to know to implement hdl conversion? Do you need to know all root paths domains? Or just or even just ? &#32; Headbomb {t · c · p · b} 07:24, 27 June 2019 (UTC)
 * http://digilib.gmu.edu/bitstream/handle/...
 * http://digilib.gmu.edu/dspace/handle/...
 * http://digilib.gmu.edu/handle/...
 * http://digilib.gmu.edu/handle/...
 * http://digilib.gmu.edu/...

Also does knowing http vs https matter? &#32; Headbomb {t · c · p · b} 07:25, 27 June 2019 (UTC)


 * http and https is irrelevant. Right now, each and every URL path is specific.  I should change it to be hosts and paths separate.  Hosts is probably enough, unless you find a new file path beyond the usual suspects. Please verify each host actually works though;  http://oasis.postech.ac.kr/handle/2014.oak/9965 is not a handle 🙄. AManWithNoPlan (talk) 13:45, 27 June 2019 (UTC)

, well, I'm building a massive list with the help of others (e.g. ), so I want to know what's the most useful format. Right now, if I have something like I'll eliminate things that only differ after the /handle/ part, and have something like and currently have 2169 such paths. Which I could reduce to (after checking that they indeed work inside a hdl) But I was wondering if there was a way to trim that down further to something more manageable/less redundant. &#32; Headbomb {t · c · p · b} 17:05, 27 June 2019 (UTC)
 * http://digilib.gmu.edu/bitstream/handle/1920/1595/483_15_01.pdf
 * http://digilib.gmu.edu/dspace/handle/1920/2931
 * http://digilib.gmu.edu/dspace/handle/1920/2932
 * http://digilib.gmu.edu/dspace/handle/1920/2933
 * http://digilib.gmu.edu/handle/1920/505
 * http://digilib.gmu.edu/handle/1920/525
 * http://digilib.gmu.edu/bitstream/handle/1920/1595/483_15_01.pdf
 * http://digilib.gmu.edu/dspace/handle/1920/2931
 * http://digilib.gmu.edu/handle/1920/505
 * http://digilib.gmu.edu/bitstream/handle/...
 * http://digilib.gmu.edu/dspace/handle/...
 * http://digilib.gmu.edu/handle/...

While it is true that some of them probably do not have all these possibilities, I doubt that we would run into a case where http://digilib.gmu.edu/dspace/handle/ works, but http://digilib.gmu.edu/bitstream/handle/ is not a handle but something else. So, what I need are three lists:
 * 1)  Protocol: http and https (short list)
 * 2)  Host names (HUGE list)
 * 3)  Suffix list (/handle/, /bitstream/handle/, ....) (medium sized list).

The code can then accept and convert any combination. AManWithNoPlan (talk) 17:22, 27 June 2019 (UTC)
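
The three-list idea above can be sketched in Python as follows. This is only an illustration, not the bot's implementation: the host set contains just the one host used as an example in this thread, and treating the first two path segments after the suffix as the handle is an assumption based on the example URLs.

```python
from urllib.parse import urlsplit

PROTOCOLS = {"http", "https"}                                # short list
HOSTS = {"digilib.gmu.edu"}                                  # the real list would be far longer
SUFFIXES = ("/bitstream/handle/", "/dspace/handle/", "/handle/")  # medium-sized list

def url_to_hdl(url):
    """Return 'prefix/suffix' for a recognized handle URL, else None."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in PROTOCOLS:
        return None
    if (parts.hostname or "") not in HOSTS:
        return None  # unknown host: /handle/ in the path alone proves nothing
    for suffix in SUFFIXES:
        if parts.path.startswith(suffix):
            segments = parts.path[len(suffix):].split("/")
            if len(segments) >= 2 and segments[0] and segments[1]:
                return f"{segments[0]}/{segments[1]}"
    return None
```

Keeping the host list separate from the suffix list means any combination is accepted, while a host that merely has `/handle/` in its paths without being a real handle server is rejected until someone verifies it.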


 * That works. &#32; Headbomb {t · c · p · b} 17:29, 27 June 2019 (UTC)

The easy stuff
 * Protocols:
 * Suffix:

Going to build the host names list. It's in the ballpark of 1228 domains. &#32; Headbomb {t · c · p · b} 17:55, 27 June 2019 (UTC)


 * currently we use a single Regex. I will need to change that. I already have a plan for some simple fast code.  AManWithNoPlan (talk) 18:58, 27 June 2019 (UTC)
 * Code written, now for testing. AManWithNoPlan (talk) 21:19, 27 June 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1856 AManWithNoPlan (talk) 21:20, 27 June 2019 (UTC)
 * More https://github.com/ms609/citation-bot/pull/1857 AManWithNoPlan (talk) 23:37, 27 June 2019 (UTC)
 * when you have a host list post the link. AManWithNoPlan (talk) 03:49, 28 June 2019 (UTC)
 * A preview is in User:Headbomb/Sandbox. User:Betacommand will run a script to see which handle links resolve when put into a hdl. I'll then be able to give you a list of domains that could be converted. It likely won't cover everything, but it'll probably cover 95%+ of cases. &#32; Headbomb {t · c · p · b} 03:56, 28 June 2019 (UTC)
 * Got a final list yet? AManWithNoPlan (talk) 15:31, 3 July 2019 (UTC)

Still chugging at it. The list of HDL urls that didn't work needs manual review still, because some of the servers were only temporarily down and the list was not in the most convenient of formats. Should have it by the end of the week though.&#32; Headbomb {t · c · p · b} 15:47, 3 July 2019 (UTC)
 * Got a final list yet? AManWithNoPlan (talk) 14:27, 19 July 2019 (UTC)
 * Still working on it. Not forgotten though. I was travelling for a while, then had computer issues (dead PSU) which prevented me from. Hoping to have it done this weekend. &#32; Headbomb {t · c · p · b} 16:08, 19 July 2019 (UTC)
 * any progress? AManWithNoPlan (talk) 17:28, 16 August 2019 (UTC)
 * It's still on the to-do list. &#32; Headbomb {t · c · p · b} 22:30, 24 August 2019 (UTC)
 * the year (fiscal) is almost done. AManWithNoPlan (talk) 22:07, 29 September 2019 (UTC)

Cough Cough.... AManWithNoPlan (talk) 19:52, 11 October 2019 (UTC)

Flag as notabug to archive discussion and split off from discussion of getting data. AManWithNoPlan (talk) 19:03, 16 October 2019 (UTC)

Don't change case if linked name is correct/not a redirect

 * The behaviour is correct, we use title case, regardless of whatever style the publication uses. The article should be moved to either Montana: The Magazine of Western History or Montana (magazine), per the usual convention for magazine titles. &#32; Headbomb {t · c · p · b} 23:23, 16 October 2019 (UTC)
 * No matter our naming standards, if a magazine is linked and is named a certain way on Wikipedia (there may have been discussions about specific articles), we should not change the actual link. We could change to Montana the Magazine of Western History (adding a pipe to the link) in order to not break links, in case there isn't a redirect for that new capitalization. Jonatan Svensson Glad (talk) 10:35, 17 October 2019 (UTC)
 * The link wasn't broken. &#32; Headbomb {t · c · p · b} 13:33, 17 October 2019 (UTC)
 * In this case no, but we really should have to do a patch-work to see if something breaks or not. Jonatan Svensson Glad (talk) 13:58, 17 October 2019 (UTC)
 * This doesn't break things any more than changing an unlinked 'Journal Of Foobar' to 'Journal of Foobar' breaks things. If it did, then a redirect is missing somewhere / a page is located at the wrong title, and it would get picked up by WP:JCW/Miscapitalisations. &#32; Headbomb {t · c · p · b} 18:19, 17 October 2019 (UTC)
 * While that is true, in revision 1 the link works, in revision 2 a bot has changed the link to a red-link, breaking the link. That is not what the bot is either intended to do (break things), nor accepted bot behavior (even if common sense would accept it). I just feel it is not ok for a bot to change a working link to a possible non-working link. Jonatan Svensson Glad (talk) 18:33, 17 October 2019 (UTC)
 * The second revision had a working link, not a broken one. &#32; Headbomb {t · c · p · b} 18:43, 17 October 2019 (UTC)
 * This time. Jonatan Svensson Glad (talk) 20:24, 17 October 2019 (UTC)

Don't get title from dead URLs
https://github.com/ms609/citation-bot/pull/2210 AManWithNoPlan (talk) 22:18, 17 October 2019 (UTC)

handle Methods in Molecular Biology better
This is a book series, so should be converted to a cite book, with series, and drop journal. See whatever you are doing with Methods in Enzymology for reference. &#32; Headbomb {t · c · p · b} 11:52, 14 October 2019 (UTC)
 * It works better after stripping (Clifton, N.J.) from journal, but the conversion to cite book isn't complete. Running again converts to cite book. At this point however, it doesn't add chapter/title correctly. &#32; Headbomb {t · c · p · b} 11:57, 14 October 2019 (UTC)
 * I will have to write special code. AManWithNoPlan (talk) 23:29, 17 October 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2212 AManWithNoPlan (talk) 14:45, 18 October 2019 (UTC)
 * Please look and flag as fixed or point out more issues. AManWithNoPlan (talk) 16:01, 18 October 2019 (UTC)

caps: eGEMs
https://github.com/ms609/citation-bot/pull/2213 AManWithNoPlan (talk) 14:44, 25 October 2019 (UTC)

Also remove empty location/place from cite journal
https://github.com/ms609/citation-bot/pull/2214 AManWithNoPlan (talk) 14:50, 25 October 2019 (UTC)

Crossref search truncates name
Actually it is bad Crossref data for or perhaps a badly formatted response. AManWithNoPlan (talk) 01:27, 26 October 2019 (UTC)

Hyphens and dashes and accents
I notice lots of edits like this one assisted by Citation bot have created date ranges in titles (e.g. title=Arthur Erdelyi. 2 October 1908-12 December 1977) with unspaced hyphens where the source has a spaced hyphen and Wikipedia style would be to use a spaced en dash. If this hasn't been fixed in recent years, maybe we can work on it. I have no idea what it takes, but will help as I can; I've been fixing a ton of these by hand. That particular example also dropped the accent from Erdélyi; is that expected? Dicklyon (talk) 03:20, 26 October 2019 (UTC)
 * Those were, I believe, simple imports of the various cite doi subtemplates. A more recent diff would be better here. &#32; Headbomb {t · c · p · b} 03:30, 26 October 2019 (UTC)
 * If you're saying this is a thing of the past only, I'm happy. If I find a newer one, I'll be back. Dicklyon (talk) 04:33, 26 October 2019 (UTC)
 * notabug or fixed. Impossible to tell which, since the meta-data gets better over time and the bot gets better over time. AManWithNoPlan (talk) 11:14, 26 October 2019 (UTC)
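
For reference, the date-range cleanup described at the top of this thread (unspaced hyphen between two full dates in a title, replaced with a spaced en dash) could be sketched roughly as below. This is only an illustration in Python, not Citation bot's actual code; the function name `space_date_range` and the "day Month year" pattern are assumptions made for the example, and it does not touch accents or any other kind of range.

```python
import re

# Hypothetical helper, not part of Citation bot: matches "D Month YYYY" dates.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE = rf"\d{{1,2}} (?:{MONTHS}) \d{{4}}"

# Two full dates joined by a hyphen or en dash, with or without spaces.
RANGE = re.compile(rf"({DATE})\s*[-\u2013]\s*({DATE})")


def space_date_range(title: str) -> str:
    """Rewrite 'date-date' as 'date – date' (spaced en dash) in a title."""
    return RANGE.sub("\\1 \u2013 \\2", title)


print(space_date_range("Arthur Erdelyi. 2 October 1908-12 December 1977"))
```

Titles without two full dates on either side of the hyphen are left unchanged, so ordinary hyphenated words and page ranges are not affected by this sketch.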

Caps: Off
https://github.com/ms609/citation-bot/pull/2217 AManWithNoPlan (talk) 11:20, 26 October 2019 (UTC)