User talk:Citation bot/Archive 17

APIs not working: PubMed, PMID

 * I have not figured out pubmed yet. AManWithNoPlan (talk) 02:04, 15 June 2019 (UTC)
 * We have been black listed by pubmed. I have emailed them.  AManWithNoPlan (talk) 21:41, 16 June 2019 (UTC)
 * A bug report has also been filed. Apparently the Wikimedia tool server which hosts many of these citations tools has been blocked from accessing the PubMed name server. Wikimedia Cloud Services has contacted the NIH with a request to lift the block. Boghog (talk) 16:54, 22 June 2019 (UTC)

Seems to be working now. Headbomb {t · c · p · b} 07:23, 24 June 2019 (UTC)
 * Tool server is still blocked. Boghog (talk) 12:08, 24 June 2019 (UTC)


 * I have a test page and one expanded and one didn’t. It changes while running the page!!! AManWithNoPlan (talk) 13:06, 24 June 2019 (UTC)


 * I have seen that too. With the citation filling tool that also downloads data from PubMed and runs on the tool server, it occasionally works, but the vast majority of the time, it doesn't. Boghog (talk) 14:45, 24 June 2019 (UTC)


 * Can't they just give the toolserver an /etc/hosts file or backup DNS to 8.8.8.8? Is it really that hard?  AManWithNoPlan (talk) 18:45, 28 June 2019 (UTC)

include cite news and thesis in comma, colon, semicolon removal
https://github.com/ms609/citation-bot/pull/1900 AManWithNoPlan (talk) 23:57, 5 July 2019 (UTC)

Fails to add volume/page (Zookeys)

 * Possibly issue/page instead of volume/page Headbomb {t · c · p · b} 03:43, 1 July 2019 (UTC)
 * that’s what happens when the title is massively different than crossref. AManWithNoPlan (talk) 03:59, 1 July 2019 (UTC)
 * I have to say I'm consistently puzzled by this logic of not adding missing information based on an already-provided DOI because of a title mismatch. I get not adding a missing DOI based on a title mismatch, but once the DOI is provided, it should be used. Headbomb {t · c · p · b} 04:09, 1 July 2019 (UTC)
 * because all people are imperfect and careless and a larger source of gigo than we want to deal with. I will think about perhaps if the title is a subset 🤔.  AManWithNoPlan (talk) 04:16, 1 July 2019 (UTC)
 * This code will attempt to remove stuff after roman numerals from titles before doing the comparison. https://github.com/ms609/citation-bot/pull/1898 AManWithNoPlan (talk) 15:46, 5 July 2019 (UTC)
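A rough sketch of the idea in the last reply, for illustration only. This is not the bot's code (the real change is in the linked pull request); the function names, the normalization, and the exact-match rule are all assumptions.

```python
import re

# Assumption: a "roman numeral" is a standalone run of the letters IVXLCDM.
# This is deliberately naive: a lone "I" or "C" in a title also matches.
_ROMAN_TAIL = re.compile(r"\b([IVXLCDM]+)\b.*$")


def strip_after_roman(title: str) -> str:
    """Keep the title up to and including the first standalone roman numeral."""
    return _ROMAN_TAIL.sub(r"\1", title).strip()


def normalize(title: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def titles_match(wiki_title: str, crossref_title: str) -> bool:
    """Hypothetical comparison: equal once the tails after roman numerals are cut."""
    return normalize(strip_after_roman(wiki_title)) == normalize(
        strip_after_roman(crossref_title)
    )
```

With this sketch, a citation titled "Revision of the genus Foo. Part II. The species of Bar" would compare equal to a database title "Revision of the Genus Foo, part II" (both titles are made up for the example).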

Remove trailing &nbsp;
Hardcoded or softcoded ones. Headbomb {t · c · p · b} 18:22, 5 July 2019 (UTC)
 * what do you mean by soft and hard? Are you referring to the html thingy and the actual utf-8 character? AManWithNoPlan (talk) 21:23, 5 July 2019 (UTC)


 * Both "&nbsp;" and " " (the entity and the literal character) Headbomb {t · c · p · b} 21:45, 5 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1899 AManWithNoPlan (talk) 23:51, 5 July 2019 (UTC)
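For illustration, a minimal sketch of stripping both forms from the end of a parameter value. It is not taken from the pull request; the function name is hypothetical, and treating ordinary trailing whitespace the same way is an assumption.

```python
import re

# Matches any mix of the "&nbsp;" entity, the literal U+00A0 character,
# and ordinary whitespace at the very end of the value.
_TRAILING_NBSP = re.compile(r"(?:&nbsp;|\u00a0|\s)+$")


def strip_trailing_nbsp(value: str) -> str:
    """Remove trailing non-breaking spaces, leaving interior ones alone."""
    return _TRAILING_NBSP.sub("", value)
```

Non-breaking spaces in the middle of a value are left untouched, since those may be deliberate.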

Adding URL when there's a DOI
See here where the bot adds a URL to a reference that has a doi and a pmid. I thought that in such cases a URL was not desired. --Randykitty (talk) 09:33, 6 July 2019 (UTC)
 * PS: just noted that a little bit lower, the bot removes a URL. I'm puzzled. --Randykitty (talk) 09:34, 6 July 2019 (UTC)
 * The URL points to the same place as the DOI, so it's redundant. Where it adds a url, it should be a free full version of the article. Headbomb {t · c · p · b} 10:04, 6 July 2019 (UTC)
 * I don't see any redundant URL added by the bot in this diff, can you be clearer? ucl.ac.uk and caltech.edu are institutional repository URLs, which are not redirected to by the DOI resolved with doi.org. Nemo 13:05, 6 July 2019 (UTC)

notabug, since FREE links are added. Links that are the same as an identifier are removed. AManWithNoPlan (talk) 17:21, 6 July 2019 (UTC)
 * if you find a dead link or pay link or DOI equivalent link added, then please report that. We can feed it back to the free DOI system and possibly black list it. AManWithNoPlan (talk) 17:22, 6 July 2019 (UTC)

Title = Loading
It seems that this once was a website for the station but now it redirects to multiple spam websites. "Loading" nevertheless does not seem like a title we should accept in other cases as well. --Redalert2fan (talk) 10:31, 6 July 2019 (UTC)

Japanese titles removed while they appear to be correct
title = マートン＆ゴメス大暴れ　先制３ランだダメ押し打だ title = 阪神ドラ2石崎が仮契約151キロ超えだ title = JNR/JR 25年の大アルバム title =トラ番担当記者コラム

and many more

--Redalert2fan (talk) 11:52, 6 July 2019 (UTC)
 * I would add to this. The bot also adds redundant similarly named The Japan Times Online when The Japan Times; see .  The correct action here is to rename publisher to newspaper and not add publisher.
 * When title is primarily CJK script, in the best of all possible worlds, replace title with <language code>:<title text>. Yeah, this is a best of all possible worlds thing because it isn't always easy or even possible to know what the language is.  At the next release of Module:Citation/CS1, script-title will require a valid language code for non-Latin scripts (a limited list) so writing script-title without the language code will just result in a profusion of errors.
 * —Trappist the monk (talk) 12:15, 6 July 2019 (UTC)
 * the utf-8 stuff is the problem, i will get the patch added ASAP. Then I will add a test to make sure this never occurs again.  AManWithNoPlan (talk) 12:58, 6 July 2019 (UTC)

Slavic names
The bot is incorrectly capitalizing non-English journal names (as here). The correctly formatted Ekolist: revija o okolju and Acta geographica Slovenica were changed to the incorrect Ekolist: Revija O Okolju and Acta Geographica Slovenica. Doremo (talk) 05:34, 8 July 2019 (UTC)
 * I don't really know the rules for Slovenian, but at the very least the O in "Ekolist: Revija o Okolju" should be lowercase. Latin should be capitalized however. Headbomb {t · c · p · b} 06:00, 8 July 2019 (UTC)
 * fixed by adding relevant words to list of non-English words. AManWithNoPlan (talk) 17:22, 8 July 2019 (UTC)

Pubmed not available right now -- not a bot bug though
To discuss go to https://phabricator.wikimedia.org/T226088
 * I've seen the bot add pmid/pmc today... wonder if this is fixed, or just a hiccup. Headbomb {t · c · p · b} 19:50, 10 July 2019 (UTC)
 * Yup, is closed as resolved! Headbomb {t · c · p · b} 19:50, 10 July 2019 (UTC)

I pointed them to the root cause and they fixed it. AManWithNoPlan (talk) 23:44, 10 July 2019 (UTC)

removal of url
When a url is not a free copy, then it must be removed IF there is another identifier according to Wikipedia style guides (we don’t do this with google books, but we should). Also, if the url matches the doi, then it should be removed. AManWithNoPlan (talk) 21:46, 9 July 2019 (UTC)
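A minimal sketch of the "url matches the doi" half of that rule, assuming the simplest case where the URL is a doi.org resolver link. The function name and the list of resolver hosts are assumptions; publisher URLs that the DOI merely redirects to are not handled here, and the bot's real check may differ.

```python
from urllib.parse import unquote, urlparse

# Assumption: only these resolver hosts count as "the same as the DOI".
_DOI_HOSTS = {"doi.org", "dx.doi.org"}


def url_duplicates_doi(url: str, doi: str) -> bool:
    """True if the URL is just a resolver link for the given DOI."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in _DOI_HOSTS:
        return False
    # DOIs are case-insensitive, so compare in lower case.
    return unquote(parsed.path).lstrip("/").lower() == doi.strip().lower()
```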

Postscript thing in cite news
https://github.com/ms609/citation-bot/pull/1915 AManWithNoPlan (talk) 17:09, 11 July 2019 (UTC)

API tweaks: Put diff | history in a fixed location
If you do a multiple bot run, you will have a list of stuff like

Written to Hoyt Vandenberg diff | history ... Written to Hubert Winthrop Young diff | history ... Written to Humanity and Paper Balloons diff | history

So to review the diffs, you search for "diff | history" in the page, and you press Ctrl+G (in Firefox) to jump around. However, because Title in

Written to Title diff | history

isn't of fixed length, you need to spend time aligning your mouse with the diff link. Now this isn't the worst thing in the world, but if you have a list of 100 diffs, that's making a task that could take 20 seconds take 5 minutes. So instead, I suggest either of

[diff | history] Written to Title

Written [diff | history] to Title

Written to Title [diff | history]

As a better presentation that would allow more efficient reviewing of multiple diffs. Headbomb {t · c · p · b} 18:09, 10 July 2019 (UTC)


 * However, see also User talk:Citation bot/Archive_17 below, which may be a better way of doing this. Headbomb {t · c · p · b} 19:41, 10 July 2019 (UTC)


 * This is easy. https://github.com/ms609/citation-bot/pull/1914 AManWithNoPlan (talk) 16:21, 11 July 2019 (UTC)

fixed

API tweaks: Redundancy elimination
Instead of

> Expanding 'Jakob Ackeret'; will commit edits. --- [17:56:00] Processing page 'Jakob Ackeret' — edit—history

This could be combined in one single line

--- [17:56:00] Processing page 'Jakob Ackeret' — edit—history; will commit edits

Headbomb {t · c · p · b} 18:17, 10 July 2019 (UTC)

Good catch. It was already fixed in the other API interfaces. https://github.com/ms609/citation-bot/pull/1914 AManWithNoPlan (talk) 16:36, 11 July 2019 (UTC)

fixed

GIGO journal stuff?
This is possibly GIGO. Headbomb {t · c · p · b} 18:29, 10 July 2019 (UTC)


 * https://github.com/ms609/citation-bot/pull/1912 Zotero works and then doesn't work. They change too often for us to support.  Will add to blacklist. AManWithNoPlan (talk) 15:52, 11 July 2019 (UTC)

Caps JR for journal
JR is in this case short for Japan Rail. --Redalert2fan (talk) 18:29, 10 July 2019 (UTC)
 * So it seemed to be fixed before for some time today, but just now I got numerous of the same changes again. --Redalert2fan (talk) 21:01, 11 July 2019 (UTC)
 * --Redalert2fan (talk) 21:05, 11 July 2019 (UTC)

Support OL
https://github.com/ms609/citation-bot/pull/1911 AManWithNoPlan (talk) 15:46, 11 July 2019 (UTC)

Ignore Template:full and Template:Deadlink in otherwise bare url refs
Why deadlink? That seems like asking for trouble. Dead links often end up pointing to the wrong thing. FindArticles.com and such. AManWithNoPlan (talk) 16:10, 11 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1913 AManWithNoPlan (talk) 16:11, 11 July 2019 (UTC)
 * Please note that full is only an alias/redirect of Full citation needed (t) Josve05a  (c) 16:22, 11 July 2019 (UTC)
 * thanks, I added the full template name also. Today's lesson in irony is that full is not the full name.... AManWithNoPlan (talk) 16:28, 11 July 2019 (UTC)
 * Well, I was mostly thinking if you find something like, it could maybe be parsed as a DOI link were deadlink not there, even if the full url didn't resolve. I figured that if the link was dead, there would be nothing to be parsed and it wouldn't expand. Maybe I'm wrong there. Headbomb {t · c · p · b} 18:14, 11 July 2019 (UTC)
 * I will look at DOIs. Links are marked as dead often when the title has changed to “girls girls girls!!!!!” and such.  AManWithNoPlan (talk) 19:46, 11 July 2019 (UTC)
 * Same for other identifiers if possible. It's not the most critical of things, so not toooo much thought needs to be put on this. But I figured if a link could be parsed when a deadlink template wasn't there, it'd be nice to have the bot do something with the link when the template was there. Headbomb {t · c · p · b} 20:57, 11 July 2019 (UTC)

GIT problems
Not sure which fixes are making it to tool server at this time. AManWithNoPlan (talk) 17:55, 11 July 2019 (UTC)
 * fatal: unable to look up current user in the passwd file: No such file or directory 🤔 AManWithNoPlan (talk) 19:38, 11 July 2019 (UTC)
 * fixed

ScienceDirect stuff
It looks like you got unlucky. AManWithNoPlan (talk) 00:56, 10 July 2019 (UTC)
 * It's happening on other articles too. There's this sequence for example, + . It's possibly the ?via=ihub that throws things off.  Headbomb {t · c · p · b} 02:24, 10 July 2019 (UTC)
 * Not sure. It works sometimes. AManWithNoPlan (talk) 04:14, 10 July 2019 (UTC)
 * Very hit and miss. For now, I'm just running it multiple times until it finds nothing else to do. Headbomb {t · c · p · b} 04:17, 10 July 2019 (UTC)
 * I had just noticed it in another article where the expansion then succeeded as I entered the DOI manually. Did you submit many articles with sciencedirect.com URLs at once? Maybe we got throttled? Nemo 07:45, 10 July 2019 (UTC)
 * I suspect the expansion would have worked the next time because ?via=ihub was stripped from the URL in the previous bot's run. Headbomb {t · c · p · b} 07:53, 10 July 2019 (UTC)
 * Not sure. Right now I can't test because the tool doesn't have any spare capacity. Nemo 08:59, 10 July 2019 (UTC)
 * As in it's going through too many requests? I could hold on for a bit, it's at the end of ~100 article run or so. Headbomb {t · c · p · b} 09:03, 10 July 2019 (UTC)
 * not really sure, we are probably just getting throttled some place. AManWithNoPlan (talk) 14:18, 10 July 2019 (UTC)

Bad Title
What happens is when a page is no longer available on japantimes.co.jp you get redirected to https://www.japantimes.co.jp/article-expired/ which states: "The article you have been looking for has expired and is not longer available on our system. This is due to newswire licensing terms." and has the title "Article expired". This is clearly not the title we are looking for. --Redalert2fan (talk) 23:53, 11 July 2019 (UTC)

Character encoding issue for author name "Fürst"
GIGO. Literally nothing we can do. We have complained to crossref and the publisher and they promised to fix the data someday. AManWithNoPlan (talk) 20:45, 12 July 2019 (UTC)

Bad title
"OpenId transaction in progress" diff --Redalert2fan (talk) 20:49, 12 July 2019 (UTC)

fixed
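The placeholder titles reported on this page ("Loading", "Article expired", "OpenId transaction in progress") suggest a simple reject list. The sketch below is only an illustration of that idea; the set contents come from the reports here, while the function name and the normalization are assumptions, not the bot's actual check.

```python
# Titles reported on this page as placeholders rather than real article titles.
BAD_TITLES = {
    "loading",
    "article expired",
    "openid transaction in progress",
}


def is_bad_title(title: str) -> bool:
    """True if the fetched title is a known placeholder (case-insensitive)."""
    # Drop surrounding whitespace and trailing dots/ellipsis, e.g. "Loading...".
    cleaned = title.strip().strip(".…").strip()
    return cleaned.casefold() in BAD_TITLES
```

Only whole-title matches are rejected, so a genuine title that merely contains one of these words is kept.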

"Catpostrophe"
This is proper behavior since Wikipedia style guides mandate non-fancy punctuation be used. AManWithNoPlan (talk) 03:39, 14 July 2019 (UTC)
 * In a citation of a source, you can't go around changing the apostrophes as they are found in the original source. Geographyinitiative (talk) 04:02, 14 July 2019 (UTC)
 * Both are apostrophes, curly vs straight is a stylistic typographic change, not a semantic one. On Wikipedia, we mandate straight quotes and apostrophes. Even in quoted material. Even in citations. Headbomb {t · c · p · b} 04:29, 14 July 2019 (UTC)
 * Thanks for your reply. You may be right, but if English Wikipedia outright bans all of the curly apostrophes, the readers will never get a chance to find out on their own whether or not there is a stylistic, semantic or other difference between the two types of apostrophes. Don't be so quick to assume things that you don't know for a fact. Just because they are similar doesn't mean they are the same- in fact, calling them 'similar' implies that they are 'different', otherwise we would call them 'identical'. No, I'm sorry, you can't change the name of cited sources randomly. I strongly believe that you are dead, dead wrong on this one- you don't know what you are talking about in fact. Why have the two code points if there's no difference? I have to strongly rebuke you here otherwise you might not realize the error you are perpetrating on English Wikipedia. Thanks for your time. Geographyinitiative (talk) 04:51, 14 July 2019 (UTC)
 * This will not do, this will not do at all. Let the author give you the apostrophes they want to. What is this garbage? Geographyinitiative (talk) 05:00, 14 July 2019 (UTC)
 * No semantic difference, eh? Alright buddy, you look at this edit and tell me there's no semantic or stylistic difference: . The authors are using curly apostrophes that curl inward from both directions. That's the author's way of writing in English. The author doesn't need your fascist hand to come down on them when someone uses this citation bot. So everything has to be simplified now- what is this, 1984? Just let the apostrophes alone. Geographyinitiative (talk) 05:03, 14 July 2019 (UTC)
 * Right, there is no semantic or stylistic difference there. The difference in orthography does not change the meaning or style of the word. Nothing "fascist" about these edits, and it is entirely inappropriate to refer to other editors in that way - please do not do that. And follow the consensus even if you do not agree with it, until and unless you have been able to change the existing consensus (which at the moment seems unlikely). Regards, --bonadea contributions talk 08:54, 14 July 2019 (UTC)
 * Some things are the way they are. Leave them alone. That's my opinion. Thanks for your work here. Geographyinitiative (talk) 05:06, 14 July 2019 (UTC)
 * Wait. The curly apostrophe isn't even there in the source linked in the article (this diff from the original report above) - it is a translation of the actual Chinese title. The "author" thus appears to be yourself, since you were the one to add the link with the translated title. --bonadea contributions talk 10:01, 14 July 2019 (UTC)
 * For reference the relevant manual of style can be found here: WP:MOSCQ. --Redalert2fan (talk) 11:09, 14 July 2019 (UTC)
 * There is a similar-looking character, ʻOkina (U+02BB), commonly used in Polynesian languages. That character should not be converted to typewriter apostrophe.
 * —Trappist the monk (talk) 14:37, 14 July 2019 (UTC)
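To make the distinction concrete, here is a small sketch of straightening curly quotes while leaving the ʻOkina alone. It is only an illustration: the mapping covers just the four common curly marks, and the function name is hypothetical.

```python
# Curly marks mapped to their typewriter equivalents. U+02BB (ʻOkina) is a
# letter, not punctuation, so it is deliberately absent from this table.
_STRAIGHTEN = str.maketrans(
    {
        "\u2018": "'",  # left single quotation mark
        "\u2019": "'",  # right single quotation mark / curly apostrophe
        "\u201c": '"',  # left double quotation mark
        "\u201d": '"',  # right double quotation mark
    }
)


def straighten_quotes(text: str) -> str:
    """Replace curly quotes and apostrophes; leave every other character as is."""
    return text.translate(_STRAIGHTEN)
```

Because the table lists the characters to change rather than matching on appearance, look-alikes such as U+02BB pass through untouched.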

partial removal of "subscription" and "via" parameters
Looking at this diff, the bot appears to be removing the via= and subscription= parameters from citations. I personally find those useful, but I'm not too fussed about it. However, the bot has only carried out a partial removal; other citations that include the first of these have not been touched. Is this intentional? Vanamonde (Talk) 23:05, 13 July 2019 (UTC)
 * they get removed when the associated url is removed and they no longer serve a valid purpose. URLs that duplicate doi are removed in accordance with style guides. AManWithNoPlan (talk) 23:23, 13 July 2019 (UTC)

notabug

nature.com down for the bot
Up until an hour ago, running the bot on bare URLs from nature.com (such as https://www.nature.com/articles/nature05769) worked. Now we get "Operation timed out after 10000 milliseconds with 0 bytes received". Have they blocked/blacklisted/throttled us? See for example with existing identifiers in template and bare URL (t) Josve05a  (c) 20:13, 14 July 2019 (UTC)
 * Citoid in Visual Editor does not seem to have any issues at all with this link. (t) Josve05a  (c) 20:13, 14 July 2019 (UTC)
 * notabug seems to be working now (t) Josve05a  (c) 21:44, 14 July 2019 (UTC)

Ovid / pmid redundancy
Works fine in the case of a doi redundancy, btw. Headbomb {t · c · p · b} 01:13, 12 July 2019 (UTC)
 * Ovid is a pain, since they have two websites that you get to choose from, so nothing is ever redundant. I will have to write specific code.  IF(doi && pmid=OvidUrl)THEN drop url.  AManWithNoPlan (talk) 14:12, 12 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1923 AManWithNoPlan (talk) 14:56, 15 July 2019 (UTC)

Better handling of edit conflicts
This would be particularly helpful for batch runs. Headbomb {t · c · p · b} 01:25, 12 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1924 AManWithNoPlan (talk) 14:54, 15 July 2019 (UTC)

duplicate ieeexplore URLs
Looks like we need to deal with https://ieeexplore.ieee.org/document/5671934?reload=true&arnumber=5671934 vs without all the extra stuff. AManWithNoPlan (talk) 22:35, 15 July 2019 (UTC)
 * IEEE is very annoying: some of their URLs redirect and some others don't. Their rate limits are also horrendous and their staff even boasts about how mean they are towards their users. Nemo 10:27, 16 July 2019 (UTC)
 * in that case https://github.com/ms609/citation-bot/pull/1928 AManWithNoPlan (talk) 15:00, 16 July 2019 (UTC)
 * I am interested in learning more about your comment about IEEE staff boasting about being mean to users. Do you know of links documenting this? —David Eppstein (talk) 19:25, 16 July 2019 (UTC)
 * David Eppstein, try and ask some university administrator who has received some "education" from IEEE. Nemo 19:30, 16 July 2019 (UTC)
 * I fail to see how copyright education is being 'hostile'. Headbomb {t · c · p · b} 21:03, 16 July 2019 (UTC)
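Returning to the duplicate-URL problem at the top of this section, one way to compare the two forms is to reduce both to the bare /document/ path. This is a sketch under that assumption only; the function name is hypothetical and the linked pull request may handle it differently.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_ieee_url(url: str) -> str:
    """Drop the query string and fragment from ieeexplore /document/ URLs."""
    parts = urlsplit(url)
    is_document = (
        parts.netloc.lower() == "ieeexplore.ieee.org"
        and parts.path.startswith("/document/")
    )
    if not is_document:
        return url  # leave anything else untouched
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))
```

Two URLs can then be treated as duplicates when their canonical forms are equal.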

Broken Bx-y-z Elsevier URLs
All these URLs are broken: insource:"www.sciencedirect.com" insource:/science\/article\/B....-.{7}-../. Nemo 19:07, 16 July 2019 (UTC)
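For anyone wanting to find these outside the search box, the same pattern can be written as an ordinary regular expression. This is a direct transliteration of the insource: regex above into Python; the sample URLs in the usage note are invented placeholders, not real articles.

```python
import re

# Same shape as the insource: pattern: "B", four characters, a dash,
# seven characters, a dash, two characters.
OLD_ELSEVIER_PATH = re.compile(r"science/article/B.{4}-.{7}-.{2}")


def is_old_elsevier_url(url: str) -> bool:
    """True if the URL uses the old B....-.......-.. article path."""
    return "www.sciencedirect.com" in url and bool(OLD_ELSEVIER_PATH.search(url))
```

For example, a placeholder such as `http://www.sciencedirect.com/science/article/BXXXX-YYYYYYY-ZZ` is flagged, while a `/science/article/pii/...` URL is not.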

fixed I think

User:Marianne Zimmerman
notabug


 * Cross-posted here, at User talk:Marianne Zimmerman, Bots/Noticeboard, and at User talk:Smith609

This account has made tens of thousands of edits by proxy using the Citation bot. It is still ongoing while I'm writing this. The account itself has made only 11 edits so far.

It is obvious that this 'Marianne Zimmerman' account is a bot, since it is working around the clock, 24/7. The account is not labeled as such, and has not been authorized by the Bot Approvals Group. In itself not a big deal, because the account has been making only positive edits and has not caused disruption. Still, it is technically violating policy, and I'm wondering why a bot would use another bot to make bot edits. That seems rather silly. I hope the author of the 'Marianne bot' can come forward so that we can work things out. Cheers, Manifestation (talk) 12:04, 14 July 2019 (UTC)
 * The one thing I wonder about is whether the user checks their edits for possible bugs or mistakes, but seeing not a single revert of citation bot by this user makes me believe they absolutely do not. This means that it would be quite possible that actual bad edits are made... If you can run 24/7 and check your edits, in the end it might technically not be a problem (policy aside), but I can tell you that I spend quite some time checking every edit made by the bot under my request and then posting bug reports here, even on 1000 page category runs.
 * Further point I wonder how they run citation bot in an automatic way, they either must use a very large input of pages via the web interface ( pages separated with |) or made some sort of script that interacts with the web interface, it is clear with the edit summaries that category mode is not used. In any case basically running an unauthorized bot aside, it is possible that bad edits have and will be made, unless Marianne can kindly prove that they check the bot's edits as is requested. Redalert2fan (talk) 12:15, 14 July 2019 (UTC)


 * Yeah, I think you're right. This 'Marianne Zimmerman' account must be blocked, at least for now, even though the owner seems to be acting in good faith. There is a reason why Wikipedia bots have a trial period. But more importantly, this 'Zimmerman' bot seems a bit redundant. It appears to just roam around, randomly cleaning up articles it encounters. Can't the Citation bot itself do that? Cheers, Manifestation (talk) 12:24, 14 July 2019 (UTC)
 * Well currently citation bot does not operate by itself and is only user activated so without activation nothing will happen. "Editors who activate this bot should carefully check the results to make sure that they are as expected. While the bot does the best it can, it cannot anticipate the existing misuse of template parameters or anticipate bad/incomplete metadata from citation databases." is clearly stated on the bots userpage. Since you activate it yourself you should check the edits made. As for why citation bot does not operate in automatic mode, I suspect that is exactly the reason currently. Maybe the maintainers can further explain this, since I'm not totally clear on that. Of course Citation bot has long passed its trial period but you can see many pages of bug reports in the archives just because things change on the internet and the maintainers/operator can not predict every variation in templates, characters, languages etc. Which is why it is so important to check these edits. Anyone can run the bot for any reason, including random runs or just pages/categories of interest so that's not a problem as long as edits are checked in my opinion. Thanks, Redalert2fan (talk) 12:37, 14 July 2019 (UTC)
 * Oh, I didn't see that. I believed the Citation bot made relatively simple changes, so I thought it wasn't a big deal if someone makes mass-edits with it. But you're right, this may not be a good idea after all. I've reported 'Marianne Zimmerman' to WP:ANI. Thanks, Manifestation (talk) 12:58, 14 July 2019 (UTC)
 * it was pretty conclusively decided in the last discussion that the bot is responsible for its edits. I am not the operator. AManWithNoPlan (talk) 13:22, 14 July 2019 (UTC)

Black list?
would it be possible to have an onwiki page of blacklisted users for citation bot? Now that activation requires authentication, having an admin-editable page of "blacklisted" users would help in situations like this (while not requiring a full block of the activator). — xaosflux  Talk 13:16, 14 July 2019 (UTC)


 * That would be a good idea, but it would still require us to check the edits/activations manually. Perhaps you could build in some kind of limit per user, with an internal warning being triggered if that user surpasses it. I've been scrolling through the thousands and thousands of edits of the Citation bot commissioned by the Marianne bot, and the earliest activation by the Marianne bot I could find was at 20:24, 24 June 2019. Go here, press, and search for "Marianne". Save for a few pauses, the bot had been running non-stop for 20 days straight, with no one noticing until now. - Manifestation (talk) 16:35, 14 July 2019 (UTC)


 * Oh, and maybe captcha would be a good idea? - Manifestation (talk) 16:40, 14 July 2019 (UTC)


 * “no one noticed” — actually I noticed and I am sure many others noticed too. Just no one cared. AManWithNoPlan (talk) 18:48, 14 July 2019 (UTC)
 * The edits I checked looked harmless enough. And if (as in this case) a sock is really just running the bot on randomly selected articles, I don't see the problem. But we do want to block activations from blocked users, to head off more problematic behavior like stalking other users (using the bot to send the message that the user is being stalked, and by whom) or to mask bad edits (by running the bot afterwards to hide the edits from watchlists and make them harder to roll back). —David Eppstein (talk) 19:02, 14 July 2019 (UTC)
 * I noticed a lot came from the Marianne account, didn't really consider them harmful, although the volume is more than you'd expect from manual activations. Blacklisting is a good feature to have, although if it needs to be deployed here, I got no real opinion on. Headbomb {t · c · p · b} 19:49, 14 July 2019 (UTC)
 * I, too, had sampled a number of those diffs (a few thousands, I think; an addictive game which consumed many hours of my time). I reported the issues I found, which were very few. It would be nice to have server-side runs on larger sets of "safe" articles (such as bare refs) so that the bot would become a no-op on those. Nemo 22:39, 14 July 2019 (UTC)
 * Alternatively, you could use https://tools.wmflabs.org/iabot as a model. It automatically blocks all on wiki blocked users and sysops here have admin privileges on the web interface.  They can block users there too.  Everything is permissions based, and new users simply cannot do as much as established users and still much less than admins.  See https://tools.wmflabs.org/iabot/index.php?page=metainfo&wiki=enwiki for info.  Not to mention plenty of other abuse counter measures in place.— CYBERPOWER  ( Chat ) 13:47, 15 July 2019 (UTC)


 * blocked users are blocked by the bot already. Better controls might be in order though.  AManWithNoPlan (talk) 14:15, 15 July 2019 (UTC)


 * notabug

Markup in linked title
This:
 * A Real Options Approach to valuing the Risk Transfer in a Multi-Year Procurement Contract. Arnold, Scot, and Marius Vassiliou (2010). Ch. 25 in Thawar T. Arif (ed), Aerospace Technologies Advancements. Zagreb, Croatia: INTECH. ISBN 978-953-7619-96-1

Becomes this:

—Trappist the monk (talk) 14:46, 17 July 2019 (UTC)
 * There are other errors in ; see in particular @ Don Chance and @ D. Mauer.
 * —Trappist the monk (talk) 14:51, 17 July 2019 (UTC)
 * These are not errors introduced by the bot, it was all manual work. (Thank you for finishing it, I was tired.) Nemo 15:20, 17 July 2019 (UTC)

better edit summary when bot deliberately removes the citation url
The edit summary the bot produced was: "Add: issue. Removed accessdate with no specified URL. Removed parameters. | You can use this bot yourself. Report bugs here. | User-activated." Perhaps before "Removed accessdate ..." could be added something like "Removed URL that matched DOI." or "Removed nonfree URL." (or at least "Removed URL."). Rayhartung (talk) 12:29, 18 July 2019 (UTC)
 * Thanks for the suggestion. That edit is 5 months old, this was already fixed in March. Now the edit summary states "Removed URL that duplicated unique identifier" (example). Nemo 12:41, 18 July 2019 (UTC)

Ambiguous edit summary
Please see this diff: where the edit summary is: "Add: date. Removed parameters." The actual change made was: publication-date=August 2018 was changed to date=August 2018. While a part of the parameter was removed, no full parameters were removed and no new date was added, making the summary a bit inaccurate. If possible, could the summary for edits like these be changed to something that describes the specific action more closely? --Redalert2fan (talk) 20:20, 18 July 2019 (UTC)
 * we walk a thin line between logging everything in horrible detail and not describing everything. We might want a “parameter name changed” at some point.  AManWithNoPlan (talk) 21:10, 18 July 2019 (UTC)
 * Mostly fixed. Now warns users some add/dels are actually changes. AManWithNoPlan (talk) 00:51, 19 July 2019 (UTC)

Running bot 2 times gives more changes
The bot ran 2 times on the following pages; diff 1 and diff 2 on this page only dates were added. diff 3 and diff 4 multiple actions were performed on this 2nd page. --Redalert2fan (talk) 20:58, 18 July 2019 (UTC)


 * website parsing goes through a separate server process and sometimes that times out. AManWithNoPlan (talk) 21:13, 18 July 2019 (UTC)


 * wontfix

Figure out missing archive date
The date can easily be determined through the webarchive url. Headbomb {t · c · p · b} 09:26, 12 July 2019 (UTC)
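As a sketch of what "determined through the webarchive url" could look like: Wayback Machine URLs embed a timestamp after /web/, so the date can be read from its first eight digits. The function name, the ISO output format, and the restriction to web.archive.org are assumptions made for this example.

```python
import re
from datetime import datetime
from typing import Optional

# /web/<timestamp>/ where the timestamp starts with YYYYMMDD; an optional
# two-letter flag such as "if_" may follow the digits.
_WAYBACK = re.compile(
    r"^https?://web\.archive\.org/web/(\d{8})\d{0,6}(?:[a-z]{2}_)?/"
)


def archive_date_from_url(archive_url: str) -> Optional[str]:
    """Return the snapshot date as YYYY-MM-DD, or None if it cannot be read."""
    match = _WAYBACK.match(archive_url)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:  # eight digits that do not form a real date
        return None
    return stamp.strftime("%Y-%m-%d")
```

Other archive services use different URL layouts and would need their own patterns.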

archive-date added while archivedate is present
Started since https://github.com/ms609/citation-bot/pull/1947 was merged. --Redalert2fan (talk) 14:59, 19 July 2019 (UTC)
 * GRRRRR. I checked for "archive-date" and "archive-date".  Will be fixed soon. AManWithNoPlan (talk) 15:41, 19 July 2019 (UTC)
 * Ah I just came to report this. Nemo 15:46, 19 July 2019 (UTC)

Thanks for the quick fix! --Redalert2fan (talk) 16:05, 19 July 2019 (UTC)

If/when deployed, it's worth re-running on all pages in Category:Pages with citations having redundant parameters, which swelled a bit today (thanks Trappist the monk for reporting). Nemo 16:05, 19 July 2019 (UTC)


 * if they are identical, now the bot removes the extra one. AManWithNoPlan (talk) 18:13, 19 July 2019 (UTC)
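A sketch of the "identical, so remove the extra one" rule, assuming the template parameters are held in a plain dictionary. Which of the two spellings counts as "the extra one" is an assumption here (the unhyphenated alias is dropped), and the function name is hypothetical.

```python
from typing import Dict


def drop_identical_alias(
    params: Dict[str, str],
    canonical: str = "archive-date",
    alias: str = "archivedate",
) -> Dict[str, str]:
    """Return a copy of params without the alias if it repeats the canonical value."""
    result = dict(params)
    if canonical in result and alias in result:
        if result[canonical].strip() == result[alias].strip():
            del result[alias]
    return result
```

When the two values differ, both are kept, since choosing between them needs a human.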

Proxy subzero.lib.uoguelph.ca
Should be generalized to cover every http(s)://www.sciencedirect.com(proxycrap)/ possible. Headbomb {t · c · p · b} 16:11, 19 July 2019 (UTC)
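A sketch of the generalization, assuming the proxy shows up as extra text appended to the hostname, as in the pattern above (for example www.sciencedirect.com.subzero.lib.uoguelph.ca, a hostname pieced together from this section's heading). Proxies that rewrite the hostname in other ways are not covered, and the function name is hypothetical.

```python
import re

# "www.sciencedirect.com" followed by at least one extra character before
# the first slash, i.e. a proxy suffix glued onto the hostname.
_PROXIED = re.compile(r"^(https?)://www\.sciencedirect\.com[^/]+/", re.IGNORECASE)


def deproxy_sciencedirect(url: str) -> str:
    """Strip a proxy suffix from a sciencedirect.com hostname, if present."""
    return _PROXIED.sub(r"\1://www.sciencedirect.com/", url, count=1)
```

Unproxied URLs are returned unchanged because the pattern requires something between ".com" and the first slash.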

LIPIcs

 * The "problem" here is that in cite journal the "journal" field is really used to mean serial. I doubt we even have templates to precisely replicate all the FRBR and host/components hierarchies. Nemo 19:36, 19 July 2019 (UTC)
 * Well, this is a cite conference with a series already present. That should be enough to figure out that adding a journal to that likely doesn't make much sense. Headbomb {t · c · p · b} 20:36, 19 July 2019 (UTC)

Re 'ref' and 'mode' parameters
Could the bot drivers be requested to not add line breaks where ref or mode are on the same line with (that is, immediately following) " {{cite xxx " or " {{citation "? These parameters change the behavior of those templates in very significant ways, effectively changing the template. Having these parameters deeper into the argument makes them less visible, and creates confusion. Where an editor sees fit to put them on the same line, that should be respected. ♦ J. Johnson (JJ) (talk) 22:10, 18 July 2019 (UTC)


 * Could you point out a case where that was done? AManWithNoPlan (talk) 22:51, 18 July 2019 (UTC)


 * The instance I have at hand was actually InternetArchiveBot's doing, whereas the instance I thought(?) was Citation_bot's doing is not readily at hand. Okay, maybe not a problem here. &diams; J. Johnson (JJ) (talk) 23:20, 19 July 2019 (UTC)


 * {{tl|notabug}} for now. If you find one, then bring it up again. AManWithNoPlan (talk) 23:28, 19 July 2019 (UTC)

Vol/Issue cleanup
See the 24/2 --> 24 + 2 type of stuff. &#32; Headbomb {t · c · p · b} 19:46, 14 July 2019 (UTC)


 * Might be too tricky to separate from cases like 18/19 → 18–19 however. &#32; Headbomb {t · c · p · b} 23:13, 14 July 2019 (UTC)
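To make the ambiguity concrete, here is a deliberately conservative sketch. The rule it uses (only split when the second number is smaller than the first) is an assumption chosen for illustration, not something agreed in this thread; it leaves values like 18/19 alone because they may be ranges.

```python
import re
from typing import Optional, Tuple

def split_volume_issue(value: str) -> Optional[Tuple[str, str]]:
    """Split 'volume/issue' strings such as '24/2' into ('24', '2').

    Returns None when the value is not of the form N/M, or when M >= N,
    since something like '18/19' could just as well mean the range 18-19.
    """
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*", value)
    if match is None:
        return None
    first, second = int(match.group(1)), int(match.group(2))
    if second >= first:
        return None  # possibly a range: leave for a human
    return match.group(1), match.group(2)
```

This would still miss genuine cases such as volume 2, issue 4, which is the price of avoiding false positives.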

cite theses vs cite document for same link
At cite web was changed to cite thesis. Also type = Thesis was added, but at Cite web was changed to Cite document. As far as I can see, the only difference beforehand was c vs C in cite. Further, do we need type = Thesis if we have cite thesis? --Redalert2fan (talk) 19:47, 19 July 2019 (UTC)
 * thesis differentiates it from dissertation. (tiny difference). AManWithNoPlan (talk) 21:49, 19 July 2019 (UTC)
 * the difference comes from the bot not getting consistent meta data for odd reasons. AManWithNoPlan (talk) 21:56, 19 July 2019 (UTC)
 * wontfix but odd. AManWithNoPlan (talk) 22:58, 20 July 2019 (UTC)

DOIs that CrossRef does not resolve, but some other provider resolves

 * I have encountered and manually corrected a few of these too. Another common pattern is Wiley DOIs losing a central <> element like "<839::AID-NME423>" and DOIs truncated after a dot or missing dots between digits. Nemo 08:34, 20 July 2019 (UTC)
 * even worse, DOIs that end with a dot where the dot is part of the DOI. AManWithNoPlan (talk) 18:34, 20 July 2019 (UTC)
 * Never saw those. throws an error, and there are no such errors found on Wikipedia. &#32; Headbomb {t · c · p · b} 18:51, 20 July 2019 (UTC)
 * The evil period ending doi required some magic to avoid the error (it might have been encoding OR the URL was used, but both the url and doi fields had bot stopping comments added). I do not remember what I did. AManWithNoPlan (talk) 20:05, 20 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1962 The problem is not the period, the code already fixed that. The problem was that the DOI is not in CrossRef, so that code did not realize that dropping the period fixed it. AManWithNoPlan (talk) 20:05, 20 July 2019 (UTC)
 * Also, add this which is extra aggressive, since period ending dois generate errors. https://github.com/ms609/citation-bot/pull/1966 AManWithNoPlan (talk) 21:31, 20 July 2019 (UTC)
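The problem described above (dropping the period was only "confirmed" when CrossRef knew the DOI) can be sketched as trying each candidate against several resolvers. Everything here is hypothetical: the function name, and the resolvers, which are passed in as plain callables so that no real API is assumed.

```python
from typing import Callable, Iterable, Optional

def first_resolving_doi(
    doi: str, resolvers: Iterable[Callable[[str], bool]]
) -> Optional[str]:
    """Return the first form of `doi` that any resolver accepts.

    Candidates are the DOI as given, then the DOI with trailing periods removed.
    """
    resolvers = list(resolvers)
    doi = doi.strip()
    candidates = [doi]
    if doi.endswith("."):
        candidates.append(doi.rstrip("."))
    for candidate in candidates:
        if candidate and any(resolves(candidate) for resolves in resolvers):
            return candidate
    return None
```

With only a CrossRef-like resolver in the list, a DOI registered elsewhere would never be confirmed, which matches the bug described.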

Question about API output / double work?
Hello, I just ran the bot on this revision of TRAPPIST-1 giving these results. In the API output this section caught my eye;

> Remedial work to prepare citations > Trying to convert ID parameter to parameterized identifiers. > Trying to convert ID parameter to parameterized identifiers. ~ Renamed "date" -> "CITATION_BOT_PLACEHOLDER_date" ~ Renamed "CITATION_BOT_PLACEHOLDER_date" -> "date" > Trying to convert ID parameter to parameterized identifiers. ~ Renamed "year" -> "CITATION_BOT_PLACEHOLDER_year" ~ Renamed "CITATION_BOT_PLACEHOLDER_year" -> "year" ~ Renamed "date" -> "CITATION_BOT_PLACEHOLDER_date" ~ Renamed "CITATION_BOT_PLACEHOLDER_date" -> "date"

In the end no dates were changed or added. Is this intended behavior or is there some accidental double work going on? --Redalert2fan (talk) 11:43, 20 July 2019 (UTC)


 * You have just seen inside the machine where the sausage is being made. We have to temporarily move some things out of the way and then put them back during some API calls. AManWithNoPlan (talk) 14:11, 20 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1961 AManWithNoPlan (talk) 14:18, 20 July 2019 (UTC)
 * fixed
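For readers curious about the rename/restore pairs in the log, the pattern is roughly "park a parameter under a placeholder name, do the work, put it back". A minimal sketch, assuming parameters live in a dict; only the `CITATION_BOT_PLACEHOLDER_` prefix comes from the log above, the rest is illustrative.

```python
from contextlib import contextmanager
from typing import Dict, Iterable, Iterator

PREFIX = "CITATION_BOT_PLACEHOLDER_"  # prefix as seen in the API output

@contextmanager
def parked(params: Dict[str, str], names: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Temporarily rename the given parameters, restoring them on exit."""
    moved = [name for name in names if name in params]
    for name in moved:
        params[PREFIX + name] = params.pop(name)
    try:
        yield params
    finally:
        # Restored keys are re-inserted at the end of the dict,
        # so their position may change even though their values do not.
        for name in moved:
            params[name] = params.pop(PREFIX + name)
```

Inside `with parked(p, ["date", "year"]):` the date and year are hidden from whatever expansion step runs; afterwards they are back unchanged, which is why no dates changed in the end.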

Remove truncated Elsevier URLs
https://github.com/ms609/citation-bot/pull/1964 AManWithNoPlan (talk) 20:02, 20 July 2019 (UTC)

Had to run the bot twice
Running the bot another time (3rd time), it also finds PMC ID: https://en.wikipedia.org/w/index.php?title=Pestalotiopsis&diff=prev&oldid=907103709 (t) Josve05a  (c) 14:31, 20 July 2019 (UTC)
 * pubmed is acting funny right now. AManWithNoPlan (talk) 15:32, 20 July 2019 (UTC)
 * annoyingly notabug on our part. AManWithNoPlan (talk) 16:26, 20 July 2019 (UTC)

API: Batch run summaries
Currently when a run is completed you get:
 * Done all 100 pages in Category:X.

Could the number of pages edited also be added to this? Giving:


 * Done all 100 pages in Category:X. Made changes to 25 pages.

Or some sort of a variation of that?

--Redalert2fan (talk) 18:24, 10 July 2019 (UTC)

Or a summary with diffs

Batch completed, 145 page(s) processed, 2 page(s) skipped, 24 edit(s) made. Report issues/suggestions.
 * [diff | history] Hoyt Vandenberg – ''Add: title. Converted bare reference to cite template.''
 * [diff | history] Title2 – Edit summary
 * [diff | history] Title3 – Edit summary
 * [diff | history] Title4 – Skipped, page is fully protected!
 * [diff | history] Title5 – Edit summary
 * ...
 * [diff | history] Title25 – Skipped, found!
 * [diff | history] Title26 – Edit summary

To get the best results, see our helpful user guide!

Suppressing the ''| You can use this bot yourself. Report bugs here. | Activated by User:Username'' part of the edit summary. &#32; Headbomb {t · c · p · b} 19:37, 10 July 2019 (UTC)
 * I suppose you mean suppressing it in the API only and not in the actual edit summary posted by the bot? This would massively help with checking the edits, so I support this. But if it's quick to implement, my original suggestion would at least help a bit already, in my opinion. Redalert2fan (talk) 19:52, 10 July 2019 (UTC)
 * Yes, in the API only. Whoever activated the bot knows they activated the bot and that it's possible for them to do so. &#32; Headbomb {t · c · p · b} 20:05, 10 July 2019 (UTC)


 * fixed a lot for now. AManWithNoPlan (talk) 23:30, 19 July 2019 (UTC)
 * still missing for multiple articles (e.g. ). &#32; Headbomb {t · c · p · b} 06:53, 20 July 2019 (UTC)

fixed AManWithNoPlan (talk) 13:27, 20 July 2019 (UTC)
 * the summary diffs don't contain the oldids and you get stuff like . &#32; Headbomb {t · c · p · b} 18:59, 20 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1963 copy/pasted code leads to a variable not being defined. AManWithNoPlan (talk) 20:06, 20 July 2019 (UTC)
 * fixed
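The one-line summary requested at the top of this thread amounts to counting edited pages alongside processed ones. A small sketch, assuming the run keeps one boolean per page (True if an edit was saved); the function name is made up for illustration.

```python
from typing import Sequence

def batch_summary(category: str, edited_flags: Sequence[bool]) -> str:
    """Build the proposed summary line from per-page 'was edited' flags."""
    total = len(edited_flags)
    edited = sum(1 for flag in edited_flags if flag)
    return (
        f"Done all {total} pages in Category:{category}. "
        f"Made changes to {edited} pages."
    )
```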

"Removed URL that duplicated unique identifier"
I'm getting these all the time now and I think they arguably make the citation sections worse. There's no way that [edit: general readers] know to click on the linked "doi" when the citation's title itself is unlinked. I'll note that the cite journal documentation examples keep the url parameter even when a doi is provided.

Where is the consensus to make this edit en masse? czar 13:26, 20 July 2019 (UTC)
 * I'm going out now but I'll leave a quick answer to one of your points: a lot of people do, in fact, know to click the DOI. We know for sure from CrossRef data: https://www.crossref.org/blog/https-and-wikipedia/ https://www.crossref.org/blog/real-time-stream-of-dois-being-cited-in-wikipedia/ Nemo 13:36, 20 July 2019 (UTC)


 * unless the url is free to download without logging in, you should not add it unless there are no other links out. AManWithNoPlan (talk) 13:39, 20 July 2019 (UTC)


 * there is even movement afoot to remove the automatic linking of titles when a PMC is present. AManWithNoPlan (talk) 13:47, 20 July 2019 (UTC)
 * My question was where this consensus has been established, or if this is just a practice localized to editors using this bot/tool. czar  15:17, 20 July 2019 (UTC)
 * I don’t have time to look it up, but the links are in the talk archives somewhere—hopefully someone not in an auto parts store can respond better. AManWithNoPlan (talk) 15:31, 20 July 2019 (UTC)
 * The general idea is that these links are redundant with the DOI/other identifiers, which are clear about where they take you (doi: version of record, jstor = jstor repository, etc. If you don't know what those are, we have the wikilinks). url is then freed up to be used for freely-available full text versions-of-record of the paper hosted on an author's website, or similar. If the DOI version is free, you can use free to mark it as free, etc. &#32; Headbomb {t · c · p · b} 17:29, 20 July 2019 (UTC)

Please see the usage page for why notabug AManWithNoPlan (talk) 15:50, 21 July 2019 (UTC)

103 more proxies
Proxies for www.sciencedirect.com which we currently link somewhere: query/37794. Nemo 00:00, 21 July 2019 (UTC)
 * fixed got them all. AManWithNoPlan (talk) 15:50, 21 July 2019 (UTC)
 * Thanks. Only for ScienceDirect itself though? I suppose these proxies are used for other publishers too. Nemo 15:54, 21 July 2019 (UTC)
 * ouch, that hurts. 😂🤣AManWithNoPlan (talk) 15:55, 21 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1975 this will get most of those and many others. AManWithNoPlan (talk) 16:53, 21 July 2019 (UTC)
 * fixed

Adds two weird DOIs
I checked an older version and it did it back in the day too. Will investigate. AManWithNoPlan (talk) 16:54, 21 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1977 AManWithNoPlan (talk) 19:17, 21 July 2019 (UTC)

Weird and capitalization
https://github.com/ms609/citation-bot/pull/1974 AManWithNoPlan (talk) 16:54, 21 July 2019 (UTC)

Privacy settings is not a good title
Comes from a (redirect to a) cookie consent popup. --Redalert2fan (talk) 21:22, 21 July 2019 (UTC)

Caps
The bot is incorrectly changing non-English capitalization (as here, where društva za should be lower case). Doremo (talk) 02:44, 23 July 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/1989 AManWithNoPlan (talk) 11:32, 23 July 2019 (UTC)

Bad title: WebCite query result
https://github.com/ms609/citation-bot/pull/1990 AManWithNoPlan (talk) 11:31, 23 July 2019 (UTC)

Untitled_new_bug
Probably time to fix that blank publisher removal. AManWithNoPlan (talk) 12:20, 26 July 2019 (UTC)
 * Probably could be general to all blank removals. If only blank stuff is done, skip. &#32; Headbomb {t · c · p · b} 12:41, 26 July 2019 (UTC)
 * I found a couple places we called “tidy” on blank parameters. Will update soon. The only blank thing we remove then will be some postscript parameters when meaningless and the empty via parameter since its presence is rare and leads to misuse and parameters that duplicate set parameters (remove blank year if date is set)  AManWithNoPlan (talk) 13:10, 26 July 2019 (UTC)
 * May be worth keeping removal of empty deprecated parameters such as coauthor. Keith D (talk) 17:17, 26 July 2019 (UTC)

Support template:Cite LSA
I just undid your edit. Seeing as the template does not support doi. AManWithNoPlan (talk) 17:17, 26 July 2019 (UTC)
 * Converted them to citation in that article. No reason to use such a feature-poor template on a non-linguistics related article. &#32; Headbomb {t · c · p · b} 19:46, 26 July 2019 (UTC)

Convert cite web to journal and add DOI
Yet another reason to drop urls and use doi instead as discussed above. Probably science direct being grumpy. AManWithNoPlan (talk) 16:22, 27 July 2019 (UTC)
 * Same with cite article and cite conference: are they supposed to be left alone or did something go wrong? special:diff/908234528. Nemo 11:17, 28 July 2019 (UTC)

Bracketed issues
https://github.com/ms609/citation-bot/pull/2034 AManWithNoPlan (talk) 11:51, 30 July 2019 (UTC)

CAPS: JAMA
https://github.com/ms609/citation-bot/pull/2031 AManWithNoPlan (talk) 02:14, 30 July 2019 (UTC)

Science (New York, N.y.)
https://github.com/ms609/citation-bot/pull/2033 AManWithNoPlan (talk) 11:51, 30 July 2019 (UTC)
 * This should just be streamlined to Science, yup. I could find other similar cases too. &#32; Headbomb {t · c · p · b} 12:07, 30 July 2019 (UTC)
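A sketch of the kind of streamlining meant here, assuming the unwanted part is always a trailing parenthetical such as "(New York, N.y.)" or "(Clifton, NJ)". That assumption will be wrong for journals whose real title ends in parentheses, so treat it as an illustration rather than a rule; both function names are hypothetical.

```python
import re

# Assumption: a place qualifier is a single trailing "(...)" group.
_TRAILING_PAREN = re.compile(r"\s*\([^()]*\)\s*$")

def strip_place_suffix(name: str) -> str:
    """Drop one trailing parenthetical from a journal or series name."""
    return _TRAILING_PAREN.sub("", name).strip()

def same_serial(a: str, b: str) -> bool:
    """Compare two names while ignoring case and a trailing parenthetical."""
    return strip_place_suffix(a).casefold() == strip_place_suffix(b).casefold()
```

The same comparison could serve as the "equivalence" check when a journal and a series value differ only by the place qualifier.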

Consider linkinghub.elsevier.com always redundant
https://github.com/ms609/citation-bot/pull/2032 AManWithNoPlan (talk) 11:51, 30 July 2019 (UTC)
 * Thanks, it works! Now I know 3300 articles on which the bot should be run. :) Nemo 20:23, 30 July 2019 (UTC)

journal/series clash
This mostly due to the (Clifton, NJ) thing in one but not the other. Probably should be a hardcoded exception/equivalence. &#32; Headbomb {t · c · p · b} 04:16, 27 July 2019 (UTC)
 * Isn't that really a series of books so the citation should be something more like:
 * —Trappist the monk (talk) 00:53, 31 July 2019 (UTC)
 * Yes, with Methods in Molecular Biology and 1961. —David Eppstein (talk) 01:38, 31 July 2019 (UTC)
 * Also, if the 'fix' is to keep the template as, the next release of Module:Citation/CS1 suite will require to have journal.
 * —Trappist the monk (talk) 00:56, 31 July 2019 (UTC)

Minor change which should not be done as a stand-alone edit
See also User talk:Citation bot/Archive 17. Jonatan Svensson Glad (talk) 19:55, 30 July 2019 (UTC)

Fails on Soil

 * Not sure I'd call it crashing, but it does quit before it finishes the page. Is the page too big at 389kb? When I tried, it got about halfway through, up to checking AdsAbs for the citation of DOI 10.1111/j.1438-8677.1971.tb00715.x  — Chris Capoccia 💬 01:39, 1 August 2019 (UTC)
 * 600 references is quite near the point where I've often seen citation bot fail. Nemo 06:55, 1 August 2019 (UTC)

Full run (sadly tests can not save) Time: 43.14 minutes, Memory: 42.01MB AManWithNoPlan (talk) 14:54, 2 August 2019 (UTC)
 * Does it go down if we reduce the timeout? 20 seconds multiplied by 500 makes for a very high upper limit. Nemo 15:35, 2 August 2019 (UTC)

It can work if the article is split in half and run the bot two times for each half. QuackGuru ( talk ) 17:18, 2 August 2019 (UTC)
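To put a number on the "very high upper limit" mentioned above: if every one of 500 requests ran into a 20-second timeout, the page would take close to three hours. A tiny sketch of that arithmetic; the function name is illustrative and the figures are the ones quoted in the thread.

```python
from datetime import timedelta

def worst_case_duration(timeout_seconds: int, request_count: int) -> timedelta:
    """Upper bound on run time if every request hits its timeout, run one after another."""
    return timedelta(seconds=timeout_seconds * request_count)

# 20 s x 500 requests = 10000 s
print(worst_case_duration(20, 500))  # prints 2:46:40
```

Halving the timeout halves this bound, although the measured 43-minute run suggests most requests finish well before timing out.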

Foreign language capitalization
The bot is incorrectly capitalizing non-English journal names, as here, where razgledi should not be capitalized. Doremo (talk) 07:29, 1 August 2019 (UTC)

Compatible license
See this edit. I would like the bot to do this automatically without having to summon the bot.

This is not a bot bug. Is it possible to program the bot to automatically restore the required proper attribution in accordance with WP:MEDCOPY? If this bot can't be programmed to do this, then which bot on Wikipedia can be? QuackGuru ( talk ) 17:34, 2 August 2019 (UTC)


 * The link is not required by the license: what matters is that you name the authors and the license. So, personally I prefer to leave the URL in the citation. Nemo 17:42, 2 August 2019 (UTC)


 * Spanning over multiple templates is beyond the scope of this bot. Not really sure what the distinction between referencing something and adding a separate "we stole a bunch of text from this freely copyable source" template also is.  The extra template relisting the exact same information is quite ugly. AManWithNoPlan (talk) 17:50, 2 August 2019 (UTC)


 * Seems like this would be better:  AManWithNoPlan (talk) 17:53, 2 August 2019 (UTC)
 * That is missing the authors and a link to the full paper. Proper and full attribution is required for each license. See Template:CC-notice. QuackGuru ( talk ) 18:31, 2 August 2019 (UTC)

BU RoBOT disabled. There may be something useful to salvage. QuackGuru ( talk ) 20:05, 2 August 2019 (UTC)

wontfix this bot is a poor choice. AManWithNoPlan (talk) 20:23, 4 August 2019 (UTC)

More references with DOI but no template
Can the bot be slightly more comprehensive in catching references with unstructured citations like this? (Where I had to manually remove everything and replace with cite journal + doi.) Nemo 16:23, 12 July 2019 (UTC)


 * the bot does this kind of thing, when it sees that citation templates dominate over non-citation templates. We avoid running a bulldozer over citevar AManWithNoPlan (talk) 16:46, 12 July 2019 (UTC)


 * Yes, I'm asking if it would be fine to catch a case like this one I linked. If so, I could submit a patch. Nemo 17:05, 12 July 2019 (UTC)


 * I verified that the specific case you mention is not supported. A patch would be good. AManWithNoPlan (talk) 17:40, 12 July 2019 (UTC)


 * wontfix that's just too complicated and risky for an automated process. AManWithNoPlan (talk) 13:35, 19 July 2019 (UTC)

Sorry, dearchived because I'm still looking for good examples to treat: special:diff/907215103, special:diff/907215476, special:Diff/907219641. We can also send an entire line to Citoid and it will use the CrossRef service to get suggestions on what that might be. Sometimes the result is far off, but we can try and make sure it's similar enough. Nemo 10:09, 21 July 2019 (UTC)
 * Brockliss, Laurence W B, The University of Oxford: A History, Oxford University Press (Oxford, 2016); 11th century to present; online

wontfix because of risks of deleting notes, etc. and CITEVAR rules. AManWithNoPlan (talk) 15:23, 5 August 2019 (UTC)

Do not use in title=
A quite weird instance: it seems the reference uses 2 titles because the press release discusses multiple things, so the actual given title is " Bombardier Announces Financial Results for the Third Quarter Ended September 30, 2015 Government of Québec Partners with Bombardier for $1 billion in C Series as Certification Nears ". However, I think this is clearly unwanted because it adds unnecessary blank lines in the reflist. --Redalert2fan (talk) 12:29, 31 July 2019 (UTC)

More minor changes that should not be done as single edit
These edits are done to prevent future errors. The better parameter is website not work for this citation, so we fix it now. AManWithNoPlan (talk) 14:25, 31 July 2019 (UTC)
 * Wouldn't it be better then to remove it completely in cases like these citations when it is empty? That would remove possibilities for future errors. --Redalert2fan (talk) 14:36, 31 July 2019 (UTC)
 * That assumes the website is empty on purpose, rather than by omission. &#32; Headbomb {t · c · p · b} 19:53, 31 July 2019 (UTC)
 * In my experience a large fraction of cite web templates should really be cite journal, cite magazine, cite news, etc., and their work parameters should really be the title of the journal, magazine, or newspaper. Calling it a website makes a stupid use of the wrong template even stupider, and will no doubt encourage users to fill it in with the url or hostname instead of the actual title of the collective work. I think switching the name of the parameter in this way is a bad idea. —David Eppstein (talk) 01:51, 1 August 2019 (UTC)

Adding chapter-url identical to URL
Never seen that before. Probably should also detect and fix this too. AManWithNoPlan (talk) 20:31, 4 August 2019 (UTC)

garbage publisher
This is because people think journals with a PMC/PMID entry are published by 'National Center for Biotechnology Information, U.S. National Library of Medicine'. &#32; Headbomb {t · c · p · b} 03:25, 27 July 2019 (UTC)
 * there are a lot of things where that is the valid publisher. Will have to think about it. AManWithNoPlan (talk) 21:28, 5 August 2019 (UTC)
 * It won't be a legit publisher for a journal, though. &#32; Headbomb {t · c · p · b} 21:30, 5 August 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2051 AManWithNoPlan (talk) 14:16, 6 August 2019 (UTC)

If title = journal, TNT both and refill
https://github.com/ms609/citation-bot/pull/2054 AManWithNoPlan (talk) 14:54, 6 August 2019 (UTC)

If a title is in allcaps (and long?), TNT and reget
https://github.com/ms609/citation-bot/pull/2054 AManWithNoPlan (talk) 14:54, 6 August 2019 (UTC)
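A sketch of what "in allcaps (and long?)" could mean in practice. The 20-letter threshold is an arbitrary assumption chosen so that short acronym-only titles are left alone; it is not taken from the linked pull request.

```python
def looks_all_caps(title: str, min_letters: int = 20) -> bool:
    """True if the title has at least `min_letters` letters and all are upper case.

    Digits, spaces and punctuation are ignored. The threshold is a guess.
    """
    letters = [ch for ch in title if ch.isalpha()]
    return len(letters) >= min_letters and all(ch.isupper() for ch in letters)
```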

More JSTOR
Only stable JSTORs that match specific patterns are processed. I will have to look into adding more Regex. AManWithNoPlan (talk) 21:27, 5 August 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2052 AManWithNoPlan (talk) 14:16, 6 August 2019 (UTC)

Broken AcademiaEdu and Silverchair URLs
https://github.com/ms609/citation-bot/pull/2053 AManWithNoPlan (talk) 14:28, 6 August 2019 (UTC)


 * URL change requests can also be made at WP:URLREQ which can do many URL-specific issues like URLs located outside of CS1|2 templates, templates, archive-url additions and deletions, fixing bad encoding, updating IABot database, URLs on Commons, etc..  --  Green  C  14:47, 6 August 2019 (UTC)
 * Thank you. However, the last time I understood it doesn't handle outright removal of URLs which are pure garbage. Adding wayback machine links to those garbage URLs only multiplies the garbage for no benefit. Nemo 16:48, 6 August 2019 (UTC)

Ugh there's another one, onlinelibrarystatic.wiley.com/store/ Nemo 18:15, 6 August 2019 (UTC)

MR Support
I find the bot an extremely good idea; many thanks to its developers! Here is one feature request: the bot should also apply to snippets such as

by referring to MathSciNet (in this case to and retrieve the information from there (or possibly retrieve the doi from there and then proceed as usual). Thanks for considering this extension! Jakob.scholbach (talk) 09:56, 23 July 2019 (UTC)
 * The big issue is that MathSciNet is subscription only, and will likely not allow bots to scrape the data. &#32; Headbomb {t · c · p · b} 11:03, 23 July 2019 (UTC)
 * we already support this, but explicitly do not enable this feature because you have to manually do a captcha. AManWithNoPlan (talk) 11:12, 23 July 2019 (UTC)


 * OK, I am entirely ignorant about how the bot works, but if I am able to access the mathscinet content in my browser, is there no way of giving the same access to the bot?
 * Of course I would not expect it to work for someone without a subscription. Jakob.scholbach (talk) 11:42, 23 July 2019 (UTC)


 * I just saw at the github page that the bot is able to pull data from JSTOR. JSTOR is also behind a pay-wall, so what precisely is the difference here? Jakob.scholbach (talk) 11:44, 23 July 2019 (UTC)


 * jstor is not behind a paywall. You can visit it all day and night. Secondly, the meta-data is per my request not protected by any captcha on jstor.  AManWithNoPlan (talk) 11:57, 23 July 2019 (UTC)


 * I have a couple ideas and will look again. I must admit I was wrong.  MR is NOT CAPTCHA protected. AManWithNoPlan (talk) 11:22, 25 July 2019 (UTC)
 * I feel you will likely get hit by the MR banhammer for bots, but it's at least worth investigating using existing MR to complete the rest of the information. I think the confusion happened because Zbl is captcha protected. You won't be able to search MR though, that's a subscription only thing. &#32; Headbomb {t · c · p · b} 13:17, 25 July 2019 (UTC)

Am I seeing it in action? special:diff/908301084 at least fixed the case. Thanks, Nemo 21:00, 28 July 2019 (UTC)
 * nope. AManWithNoPlan (talk) 00:25, 7 August 2019 (UTC)
 * I will look at. For example, some even have doi information https://mathscinet.ams.org/mathscinet-getitem?mr=22222 AManWithNoPlan (talk) 01:10, 7 August 2019 (UTC)
 * will now see if MR linked page has a DOI and will add that. fixed as we can since refs are mostly free format AManWithNoPlan (talk) 19:21, 7 August 2019 (UTC)

Bot down?
I can't seem to make the bot edit for a few hours now. &#32; Headbomb {t · c · p · b} 01:52, 7 August 2019 (UTC)
 * Very much so. I have asked for a reboot. AManWithNoPlan (talk) 14:03, 7 August 2019 (UTC)
 * Webservice restarted and bot appears operational. Martin  (Smith609 – Talk)  15:24, 7 August 2019 (UTC)

fixed

Better batch/queuing handling
When you ask the bot to run on, say, Category:Foobar, the entire category will enter the job queue and get processed. So it means that if you have something like


 * 10:00:00 am Category:Foobar A is requested to be processed by User:A
 * 10:00:01 am Foobar B is requested to be processed by User:B
 * 10:00:02 am Category:Foobar C is requested to be processed by User:C
 * 10:00:03 am Foobar D is requested to be processed by User:D

You could very well have
 * 10:00 Foobar A1 is being processed
 * 10:01 Foobar A2 is being processed
 * 10:02 Foobar A3 is being processed
 * 10:03 Foobar A4 is being processed
 * 10:04 Foobar A5 is being processed
 * 10:05 Foobar A6 is being processed
 * 10:06 Foobar A7 is being processed
 * 10:07 Foobar A8 is being processed
 * 10:08 Foobar A9 is being processed
 * 10:10 Foobar A10 is being processed
 * 10:11 Foobar A11 is being processed
 * 10:12 Foobar A12 is being processed
 * 10:13 Foobar A13 is being processed
 * 10:14 Foobar A14 is being processed
 * 10:15 Foobar A15 is being processed
 * 10:16 Foobar A16 is being processed
 * 10:17 Foobar B is being processed
 * 10:18 Foobar C1 is being processed
 * ...
 * 13:14 Foobar C235 is being processed
 * 13:15 Foobar D is being processed

Leading to massive delays for User B and User D. A fairer queuing process would be to put each request into a bin


 * Bin A [16 articles]
 * Bin B [1 article]
 * Bin C [235 articles]
 * Bin D [1 article]

And cycle between active 'bins' until each is empty. So you'd have a queue that looks like
 * 10:00 Foobar A1 is being processed
 * 10:01 Foobar B1 is being processed
 * 10:02 Foobar C1 is being processed
 * 10:03 Foobar D1 is being processed
 * 10:04 Foobar A2 is being processed
 * 10:05 Foobar C2 is being processed
 * 10:06 Foobar A3 is being processed
 * 10:07 Foobar C3 is being processed
 * ...
 * 10:35 Foobar A16 is being processed
 * 10:36 Foobar C16 is being processed
 * 10:37 Foobar C17 is being processed
 * 10:38 Foobar C18 is being processed
 * 10:39 Foobar C19 is being processed
 * ...
 * 13:15 Foobar C235 is being processed

&#32; Headbomb {t · c · p · b} 21:27, 29 June 2019 (UTC)
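The binning proposal above is essentially round-robin scheduling across requests. A minimal sketch, assuming each request is already expanded into a list of page titles; the function name and data shapes are illustrative and say nothing about how the bot or Toolforge actually queue work.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple

def round_robin(bins: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (bin name, page) pairs, taking one page from each non-empty bin in turn."""
    active = deque((name, deque(pages)) for name, pages in bins.items() if pages)
    while active:
        name, pages = active.popleft()
        yield name, pages.popleft()
        if pages:  # bin still has work: send it to the back of the line
            active.append((name, pages))

bins = {
    "A": [f"Foobar A{i}" for i in range(1, 17)],   # 16 articles
    "B": ["Foobar B1"],                            # 1 article
    "C": [f"Foobar C{i}" for i in range(1, 236)],  # 235 articles
    "D": ["Foobar D1"],                            # 1 article
}
order = list(round_robin(bins))
```

With these bins, the single-page requests B and D are served second and fourth instead of waiting behind the whole of A and C, while the total amount of work stays the same.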

I would have to think about that. Your description of the current mode of operation is off, but there could be improvements. AManWithNoPlan (talk) 22:58, 29 June 2019 (UTC)
 * Whatever the current logic is, the taxonbar run right now is blocking anyone else from requesting edits. Similar things happen whenever I requested category runs.&#32; Headbomb {t · c · p · b} 23:30, 29 June 2019 (UTC)
 * There is no logic. Tasks are processed—for the most part—as received.  Category and multiple page runs are treated as multiple tasks. AManWithNoPlan (talk) 02:02, 30 June 2019 (UTC)
 * The bot just does the entire category in one PHP request which has no knowledge of other people waiting. I see two possibilities: 1) ask Toolforge sysadmins how to get it to handle more requests concurrently; 2) split up the category job in multiple requests, e.g. by making the category page redirect to the process page URL for one title which will process just one page and redirect to the next and so on.
 * Arguably, the fundamental problem with category runs is that they mostly encounter pages which don't need to be treated. This run presumably went through over 1000 pages, checking all their URLs and identifiers and everything, but was only needed for less than 200. On the other hand, the whole point of the bot is that it saves time to a human who would otherwise have to do the hard work, such as selecting the pages which need some edits. Nemo 06:10, 30 June 2019 (UTC)
 * Refill2 uses celery to manage workers. If we go that type of route, then the category API would be changed to a list generator that then calls the page API with a list.  Single point of entry.   AManWithNoPlan (talk) 15:12, 30 June 2019 (UTC)
 * Yes, with a tiny bit of additional complexity (preferably handled by some external library) the multi-page editing could be handled much better. Nemo 15:47, 30 June 2019 (UTC)

On top of binning, there could be some parallel processing of some kind, like having multiple instances of Citation bot running on the tool server, and when one of them was ready to make an edit, it would get queued. This way if you run on an article that takes ~10 minutes to process, other articles could still get dealt with. &#32; Headbomb {t · c · p · b} 23:19, 2 July 2019 (UTC)
 * Job Arrays work on Toolforge, it will run 16 slots at a time filling in empty slots until the submitted job queue is done (unlimited size). Requires something like ZOT to do file locking on disk writes, or application-level file locking. -- Green  C  01:17, 3 July 2019 (UTC)


 * I have to say that with the way the bot is being used right now, this really, really would help. I made a ~50 article request last night that took something like 3-4 hours to process. Would have been nice to be able to use the bot on select articles while the large run was going on. &#32; Headbomb {t · c · p · b} 15:51, 3 July 2019 (UTC)
 * Since usage has gone up recently by quite a bit this would definitely help (obviously). This actually would enable more people to use the bot at the same time, or if a single person splits their request to do said request faster. We do have to then look out for people who might (ab)use this by just submitting 10x the jobs and taking everything for themselves either by accident or lack of patience but it seems that that might be mitigated by keeping some "slots" free for "Expand citations" via the toolbar, AFCH templates and single page request via the api/interface if that would be possible. --Redalert2fan (talk) 12:50, 14 July 2019 (UTC)
 * Usage should go down substantially now the Marianne account is blocked. I and a few others will still make big requests, but at least it won't be constant. &#32; Headbomb {t · c · p · b} 20:03, 14 July 2019 (UTC)
 * Definitely better since the block, yes. However, when 2 users run it (like you and I at the moment of posting) there is already a noticeable delay. While not constant right now, it could become a problem if even more people use the bot. In my opinion it would be better to "future proof" the bot. I understand this takes a lot of work, but again, in my opinion it would be helpful if it can be done. --Redalert2fan (talk) 17:21, 19 July 2019 (UTC)
 * Yes, even small-ish ~100 article runs are nightmares to do sometimes. I found that asking for more than that often leads to large delays and timeout errors. Which is a shame because you can find articles in need of highly probable cleanup/tidying that the bot could do (like running on all pages that contain ), but those often number in the thousands. &#32; Headbomb {t · c · p · b} 17:30, 19 July 2019 (UTC)
 * I just waited an hour for a batch of about 10 to start and got a 504 timeout, not very encouraging to operate. I have no problem with waiting and running again but others might not and lose interest. Productivity is being lost sadly. I'm not particularly looking for the bot to be quicker, when one person runs it the time it takes to check is fine, what I would be looking for is that multiple users can use it at the same time. Would it be possible to run more instances at the same time? Yes the bot might have to throttle its edits but that's better than having user jobs not starting within a reasonable time. Redalert2fan (talk) 18:42, 19 July 2019 (UTC)

I suggest you do not run it in slow mode. Disables AdsAbs and zotero AManWithNoPlan (talk) 18:05, 19 July 2019 (UTC)
 * Not sure what more would be lost, so I'd rather run the full gamut of fixes. Also ADSABS is very desirable. &#32; Headbomb {t · c · p · b} 18:18, 19 July 2019 (UTC)
 * Speaking of AdsAbs, seems we might run out of uses again today. Redalert2fan (talk) 18:42, 19 July 2019 (UTC)
 * The easiest solution is to further reduce the timeout on individual requests to Zotero and others: it helps avoid traffic jams when there are too many requests and/or a single URL inside a page is especially slow.
 * Let me also remind that when citation bot doesn't make you happy you can always spend some time on OABOT! Nemo 18:38, 19 July 2019 (UTC)
 * OABot is good after cleanup has been done usually. &#32; Headbomb {t · c · p · b} 20:39, 19 July 2019 (UTC)
 * Yes, and many articles are now ready for an OAbot run: there are about 30k articles in the queue as of now, with 35k link suggestions. Nemo 08:53, 20 July 2019 (UTC)
 * I added a note on the bot's userpage, letting people know of OAbot. Feel free to tweak it. &#32; Headbomb {t · c · p · b} 19:32, 20 July 2019 (UTC)
 * Surprised to see that nobody blocked OABot for adding CiteSeerX links. Go figure. — kashmīrī  TALK  03:49, 27 July 2019 (UTC)

Some statistics on the busiest months, just for context:

 MariaDB [enwiki_p]> select substr(rev_timestamp, 1, 6) as date, count(rev_id) AS count from revision_userindex where rev_actor=307 group by date having count > 2000;
 +--------+-------+
 | date   | count |
 +--------+-------+
 | 200810 |  2260 |
 | 200812 | 47504 |
 | 200903 |  2963 |
 | 200904 | 16344 |
 | 200905 |  7279 |
 | 201001 |  4072 |
 | 201003 |  4356 |
 | 201012 |  4251 |
 | 201103 |  2818 |
 | 201105 |  2398 |
 | 201302 |  2453 |
 | 201303 |  2244 |
 | 201403 |  2935 |
 | 201410 |  2059 |
 | 201708 |  6116 |
 | 201805 |  3245 |
 | 201808 |  6531 |
 | 201809 | 10076 |
 | 201810 |  7365 |
 | 201811 | 11001 |
 | 201812 | 10289 |
 | 201901 | 19332 |
 | 201902 | 47795 |
 | 201903 | 28010 |
 | 201906 |  7557 |
 | 201907 | 26383 |
 +--------+-------+
 26 rows in set (5 min 21.82 sec)
Nemo 09:33, 22 July 2019 (UTC)
 * Stupid large runs hogging all the resources... a category with 5K+ articles is not great. &#32; Headbomb {t · c · p · b} 23:38, 29 July 2019 (UTC)
 * It used to have 16k! When either you or Chris are using the bot for batch runs, I just go do something else. :) Nemo 06:54, 30 July 2019 (UTC)
 * Yeah, well not much choice. I limit mine to bunches of 100 usually, this way any other request made will not be delayed for too long and won't time out. But it would be nice to just be able to say "Alright, deal with those X thousand pages with this stuff that's completely fixable". &#32; Headbomb {t · c · p · b} 12:05, 30 July 2019 (UTC)

It is getting really, really annoying to have requests constantly time out for hours because large categories are being requested. Please prioritize this. &#32; Headbomb {t · c · p · b} 09:09, 5 August 2019 (UTC)


 * I have no idea how the tool servers handle multiple requests. It seems as if they all run in parallel and the tool server only gives so much CPU to the tools as a whole.  AManWithNoPlan (talk) 12:06, 5 August 2019 (UTC)
 * any ideas/feedback here? &#32; Headbomb {t · c · p · b} 12:35, 5 August 2019 (UTC)


 * I am willing to bet money that this is 95% zotero/citoid and 5% the bot. I have an idea. AManWithNoPlan (talk) 17:34, 6 August 2019 (UTC)


 * I've implemented Nemo's suggestion for citation bot and restarted the webservice. Let me know if it makes any difference. Martin  (Smith609 – Talk)  15:54, 7 August 2019 (UTC)


 * Well, so far, nope. But someone just requested Category:Living people to be processed, with 900K articles in it. Please kill that run! &#32; Headbomb {t · c · p · b} 17:40, 7 August 2019 (UTC)

Today it feels better for me: I managed to use the gadget with very good response times even as Headbomb was doing some batches. Nemo 12:57, 8 August 2019 (UTC)
 * The tool does feel better/faster. However, I've yet to see different batches run alternatively. Nemo's success is possibly due to breaks in my requests (I ask for ~100 articles at a time which gives the bot a chance to catch up on other requests without timeouts). I do recall being able to use the citation helper script while the bot was doing a batch run though. &#32; Headbomb {t · c · p · b} 13:56, 8 August 2019 (UTC)
 * Nope, I mean I get speedy responses from the gadget in the midst of one of your runs, in the same minutes when I see the bot perform several edits. The requests for single pages are much more efficient than batch requests, yes. Nemo 14:08, 8 August 2019 (UTC)
 * I confirm speedy response via the gadget. Doesn't work through toolbar link/API however. &#32; Headbomb {t · c · p · b} 16:23, 8 August 2019 (UTC)
 * Many small requests work better than few huge ones, if you want I can write you a small script to do it efficiently. Email me to have it in your inbox. Nemo 17:21, 8 August 2019 (UTC)
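
For illustration only, the "many small requests" approach could look roughly like the following Python sketch. It is not Nemo's script: the `batches` helper and the placeholder page titles are assumptions, and the step that actually submits each batch to the bot is deliberately left out.

```python
from typing import Iterator, List


def batches(titles: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` titles (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


# Placeholder page list; real titles would come from a category or a search.
pages = [f"Article {n}" for n in range(1, 251)]

# Each chunk would be submitted as its own request, ideally with a pause
# in between so that other users' requests get a turn.
chunks = list(batches(pages))
```

With 250 titles and the default size this gives two batches of 100 and one of 50, which matches the ~100 article runs mentioned above.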

Flagging as fixed for now. Will loop back as needed. Continue discussion under white list topic as needed. AManWithNoPlan (talk) 15:07, 9 August 2019 (UTC)

Links to search.proquest.com
What's the point of all those search.proquest.com links? When I click one from an otherwise complete citation template, I'm not even presented with a title for the resource, so I can't be sure whether the link points to something else entirely. I see they're sometimes pasted as part of some ready made textual citation with a "Retrieved from" link, so I doubt the editors were actually interested in keeping such links. Are they fine to remove? Nemo 18:12, 12 July 2019 (UTC)


 * if you are at the library (or have a library card), you can log in and get them. Also, the link sometimes leads to a preview.  Often when logged in with my library card, I can get a preview.  AManWithNoPlan (talk) 20:44, 12 July 2019 (UTC)
 * But how does one verify the link leads to the correct resource, without access? Nemo 21:40, 12 July 2019 (UTC)


 * I think it might be better to use Template:ProQuest within id instead of placing the paywalled URL in url, but agree with that there’s no need to remove the link altogether. Umimmak (talk) 20:48, 12 July 2019 (UTC)
 * Using id seems indeed superior to me. Is that something we can do systematically? Nemo 21:40, 12 July 2019 (UTC)


 * some other bot needs to change all the proquest.umi.com links into the equivalent search.proquest.com urls too (the document numbers are not the same 🙄) AManWithNoPlan (talk) 03:08, 14 July 2019 (UTC)
 * the bot now does extensive ProQuest URL cleanup. The umi.com ones are now fixed and most proxies and session-specific information should be removed.  AManWithNoPlan (talk) 12:49, 20 July 2019 (UTC)

fixed AManWithNoPlan (talk) 15:13, 9 August 2019 (UTC)

If title ends with 'on JSTOR', TNT title and reget
To be clear, this isn't simply stripping 'on JSTOR' from the title, but rather resetting it entirely. &#32; Headbomb {t · c · p · b} 02:21, 8 August 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2060 AManWithNoPlan (talk) 13:51, 8 August 2019 (UTC)
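
As a rough sketch of the requested behaviour (not the code in the pull request above), the rule could be written like this in Python. The citation is modelled as a plain dict and `reset_junk_title` is a hypothetical name; refetching the title afterwards is assumed to happen elsewhere.

```python
from typing import Dict

# Suffixes that mark a title as scraped page chrome rather than a real title.
JUNK_SUFFIXES = ("on JSTOR",)


def reset_junk_title(citation: Dict[str, str]) -> bool:
    """Drop the whole title if it ends with a junk suffix.

    Returns True when the title was removed, so the caller knows it has
    to look the title up again from the identifiers that remain.
    """
    title = citation.get("title", "").strip()
    if title.endswith(JUNK_SUFFIXES):
        del citation["title"]
        return True
    return False


# Made-up citation for demonstration.
cite = {"title": "Some article title on JSTOR", "jstor": "1234567"}
was_reset = reset_junk_title(cite)
```

The title is removed rather than trimmed, which is the distinction made above: whatever precedes 'on JSTOR' is not trusted either.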

title / encyclopedia duplication in cite encyclopedia
Code did not realize that encyclopeAdia was an alias. AManWithNoPlan (talk) 15:08, 9 August 2019 (UTC)

10.5555 / Global Plants DOI invalid
10.5555 is a test doi prefix and will never resolve. On Wikipedia, the vast majority of them are for JSTOR Global Plants. In fact, nearly all 10.5555/... DOIs can probably be removed and converted to https://plants.jstor.org/stable/10.5555/.... They should check if that url resolves however, since there are some 10.5555 DOIs that are tests for other things. &#32; Headbomb {t · c · p · b} 09:31, 8 August 2019 (UTC)
 * Any other 10.5555 DOI should be removed if there's a working URL provided. In total, those account for 435/3,281 ≈ 13.26% of Category:Pages with DOIs inactive as of 2019. &#32; Headbomb {t · c · p · b} 09:40, 8 August 2019 (UTC)
 * https://github.com/ms609/citation-bot/pull/2059 AManWithNoPlan (talk) 13:38, 8 August 2019 (UTC)
 * that should also remove doi-broken-date, e.g. ... &#32; Headbomb {t · c · p · b} 16:10, 9 August 2019 (UTC)
 * Very true indeed. https://github.com/ms609/citation-bot/pull/2066 AManWithNoPlan (talk) 17:48, 9 August 2019 (UTC)
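
The request in this thread can be sketched in Python as follows. This is an assumption-laden illustration, not the code from the pull requests: the citation is a plain dict, `convert_test_doi` is a hypothetical name, and `url_resolves` is a caller-supplied check (for instance an HTTP request) so that the sketch itself makes no network calls.

```python
from typing import Callable, Dict

TEST_PREFIX = "10.5555/"
GLOBAL_PLANTS_BASE = "https://plants.jstor.org/stable/"


def convert_test_doi(citation: Dict[str, str],
                     url_resolves: Callable[[str], bool]) -> bool:
    """Replace a 10.5555 test DOI with a URL that actually resolves.

    Returns True if the DOI (and any doi-broken-date) was removed.
    """
    doi = citation.get("doi", "")
    if not doi.startswith(TEST_PREFIX):
        return False
    existing = citation.get("url")
    if existing:
        # Only drop the DOI if the URL already provided works.
        if not url_resolves(existing):
            return False
    else:
        candidate = GLOBAL_PLANTS_BASE + doi
        # Some 10.5555 DOIs are tests for other things, so verify first.
        if not url_resolves(candidate):
            return False
        citation["url"] = candidate
    del citation["doi"]
    citation.pop("doi-broken-date", None)
    return True


# Made-up DOI suffix for demonstration; the stub pretends every URL resolves.
cite = {"doi": "10.5555/al.ap.specimen.example", "doi-broken-date": "2019-08-08"}
changed = convert_test_doi(cite, url_resolves=lambda url: True)
```

Passing the resolver in as a parameter keeps the rule itself testable without touching plants.jstor.org.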

If title ends with 'IEEE Xplore Document', TNT title and reget
Not fixed. At least not fully. I had to manually TNT +. &#32; Headbomb {t · c · p · b} 18:39, 9 August 2019 (UTC)
 * Not sure what the delay was. AManWithNoPlan (talk) 18:51, 9 August 2019 (UTC)

remove doi-broken-date if no doi
Kinda duplicate with one above, but this should be generalized behaviour, not just specific to 10.5555 broken DOIs. &#32; Headbomb {t · c · p · b} 16:12, 9 August 2019 (UTC)
 * Already thought of that. https://github.com/ms609/citation-bot/pull/2066 AManWithNoPlan (talk) 17:47, 9 August 2019 (UTC)
 * Not fixed +  &#32; Headbomb {t · c · p · b} 18:46, 9 August 2019 (UTC)
 * it can take a second for new source code to start executing. Slower than usual today (or you are faster). AManWithNoPlan (talk) 18:52, 9 August 2019 (UTC)
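
The generalized rule asked for here is small enough to state as a Python sketch. Again this is only an illustration under the assumption that a citation is a dict of parameter names to values; `drop_orphan_broken_date` is a hypothetical name, not a function from the bot.

```python
from typing import Dict


def drop_orphan_broken_date(citation: Dict[str, str]) -> bool:
    """Remove doi-broken-date when the citation has no (non-empty) doi.

    Returns True if a doi-broken-date parameter was actually removed.
    """
    if citation.get("doi", "").strip():
        return False  # a DOI is present, so the broken date still applies
    return citation.pop("doi-broken-date", None) is not None


# Made-up citations for demonstration.
orphan = {"title": "Example", "doi-broken-date": "2019-08-09"}
removed = drop_orphan_broken_date(orphan)

kept = {"doi": "10.1000/example", "doi-broken-date": "2019-08-09"}
untouched = drop_orphan_broken_date(kept)
```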