User talk:Citation bot/Archive 6

Bot does not handle aliases
The bot also adds "pages=" when there is already a "p=" or "pp=" or "page=" card. AManWithNoPlan (talk) 21:00, 26 December 2015 (UTC)
 * ...or "at=". Lithopsian (talk) 16:36, 29 December 2015 (UTC)
 * https://en.wikipedia.org/w/index.php?title=Calutron&type=revision&diff=688482098&oldid=688481880  Hawkeye7 (talk) 07:02, 1 November 2015 (UTC)


 * Another example of pages= being added when page= already exists: https://en.wikipedia.org/w/index.php?title=Inflow_%28meteorology%29&diff=next&oldid=698774275


 * Issue and number too: https://en.wikipedia.org/w/index.php?title=Ferruccio_Busoni&diff=prev&oldid=724431335 AManWithNoPlan (talk) 20:04, 16 July 2016 (UTC)


 * Here it hoses things by replacing a page range with just the first page: https://en.wikipedia.org/w/index.php?title=History_of_Kentucky&diff=731670614&oldid=731670055   Stevie is the man!  Talk • Work

The solution is to edit the add_if_new function in objects.php so that the existing cases also check for the aliases. Some parameters will also need cases of their own, since they are currently caught in the catch-all. AManWithNoPlan (talk) 15:06, 9 August 2016 (UTC)
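For illustration only, here is a minimal Python sketch of the alias check described above. The bot itself is written in PHP; the alias groups and the dictionary representation of a citation are assumptions made for this example, not the bot's actual data structures.

```python
# Hypothetical alias groups; the real templates accept more names than these.
ALIAS_GROUPS = [
    {"page", "pages", "p", "pp", "at"},
    {"issue", "number"},
]


def aliases_of(param):
    """Return every name that should block adding `param`."""
    for group in ALIAS_GROUPS:
        if param in group:
            return group
    return {param}


def add_if_new(citation, param, value):
    """Add param=value only when neither param nor any alias is already set."""
    for name in aliases_of(param):
        if citation.get(name, "").strip():
            return False  # something equivalent is already present
    citation[param] = value
    return True


cite = {"pp": "101–110"}
added = add_if_new(cite, "pages", "101")  # refused: pp= is already set
```

With a check like this, the existing page range in pp= would be left alone rather than joined by a conflicting pages= value.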

See this diff, which results in a slew of citation errors for having both pages and pp, and note that in many of those entries, it munges the page range into an (inaccurate) single page. Squeamish Ossifrage (talk) 13:18, 18 October 2016 (UTC)
 * This is really a bug in the citation templates for allowing a bazillion different ways to say the same thing, but the bot needs to deal with it. The code to fix it is in the git repository.  No one with the power to upload it to wmflabs has done so.  So, it's also a bug in us meat bags too.  AManWithNoPlan (talk) 14:25, 18 October 2016 (UTC)

Bot is running but bug is not fixed. https://en.wikipedia.org/w/index.php?title=S-50_%28Manhattan_Project%29&type=revision&diff=773923091&oldid=773516462  Bot must be shut down until bugs can be fixed. Hawkeye7 (talk) 06:53, 5 April 2017 (UTC)
 * Anyone with the power to stop the bot probably has the power to upload the fixes. AManWithNoPlan (talk) 15:16, 5 April 2017 (UTC)

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:17, 7 September 2017 (UTC)

Edits citations inside of nowiki tags
The solution is to deal with this at the same time that the code escapes out comments. AManWithNoPlan (talk) 04:42, 6 August 2016 (UTC)

In objects.php, add these lines right after the equivalent comment lines: AManWithNoPlan (talk) 16:08, 9 August 2016 (UTC)
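As a rough Python sketch of that idea (the bot is PHP, and the placeholder token format here is made up for the example): comments and nowiki blocks are swapped out for placeholders in one pass, the citations are processed, and the protected text is put back afterwards.

```python
import re

# Matches HTML comments and <nowiki> blocks; both are matched non-greedily.
PROTECTED = re.compile(r"<!--.*?-->|<nowiki>.*?</nowiki>", re.DOTALL | re.IGNORECASE)


def protect(text):
    """Replace each protected span with a numbered placeholder."""
    stash = []

    def swap(match):
        stash.append(match.group(0))
        return f"%%PROTECTED_{len(stash) - 1}%%"

    return PROTECTED.sub(swap, text), stash


def restore(text, stash):
    """Put the original spans back in place of their placeholders."""
    return re.sub(r"%%PROTECTED_(\d+)%%", lambda m: stash[int(m.group(1))], text)


source = "<nowiki>{{cite journal|doi=10.1/x}}</nowiki> and {{cite journal|doi=10.1/y}}"
masked, stash = protect(source)
# Only the second template is still visible to whatever edits citations.
```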

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:17, 7 September 2017 (UTC)

Duplicating jstor
The bad code is these lines of get_identifiers_from_url in objects.php: They should match the doi code, which is a forget followed by a set: I can't explain why one works and the other does not, but that is what happens. AManWithNoPlan (talk) 03:13, 9 August 2016 (UTC)

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:17, 7 September 2017 (UTC)

Google books data is sometimes rubbish
Also: https://en.wikipedia.org/w/index.php?title=Homing_pigeon&diff=prev&oldid=682284024
 * the bot thinks it can interpret Google Books metadata, and fails badly for journal articles that are published within journal issues listed as books by Google Books. —David Eppstein (talk) 04:56, 23 September 2015 (UTC)
 * (EC) I think you have to propose a solution if you want this fixed - the bot took the "title" from the Google books link, which is generally appropriate. Example of solution: ask the bot to leave the title untouched IF the template type is "cite journal" AND the url contains "books.google" AND the citation is not retrievable through crossref/pmid/etc databases, but still fix the title if the template is "cite book"? (I admit this criterion is somewhat too complex.). Materialscientist (talk) 04:57, 23 September 2015 (UTC)
 * In my experience the metadata at Google books is too unreliable to ever use without human intervention. It's often a good starting point, but it regularly does things like replacing the actual publisher name with the name of a business entity that later bought the publisher, using publication years that are much later than the actual publication, mangling author names, listing minor contributors (e.g. the author of a preface) as the author of a whole book, listing multiple book series for a book only one of which is correct, listing publisher names as authors and author names as publishers, filling in the "edition" field with descriptive text instead of the edition number, listing only one author or editor for a book that has more than one, etc. —David Eppstein (talk) 05:33, 23 September 2015 (UTC)


 * Yes. I think we should avoid any automated, or even semi-automated, extractions from Google metadata. Even having a human pass on such extractions is too slack, as, at best, such data is in no way authoritative, and suitable only as hints for further research. ~ J. Johnson (JJ) (talk) 21:51, 23 September 2015 (UTC)
 * I think manual extractions are ok as long as they are doublechecked against either the preview or a hardcopy. And editors who don't have a preview or a hardcopy shouldn't be adding the citation at all. But the bot can't do any of that, it can only copy what Google already has wrong, and that's not good enough. —David Eppstein (talk) 03:04, 30 September 2015 (UTC)
 * In such cases we are not doublechecking the metadata; we're using it to find an authoritative instance from which to extract the data directly. At any rate, I think we are agreed that a bot should not be making any changes or additions based on the Google metadata. ~ J. Johnson (JJ) (talk) 22:01, 2 October 2015 (UTC)

In objects.php AManWithNoPlan (talk) 02:49, 7 August 2016 (UTC) Change to:

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:17, 7 September 2017 (UTC)

Erroneously reports DOI as broken
I thought this was fixed and marked it as so. Currently, a doi is flagged as invalid if crossref fails, which is reasonable, but the bot also needs to check whether dx.doi.org failed too. AManWithNoPlan (talk) 00:42, 18 November 2015 (UTC)
 * I encounter this bug quite often and find it annoying, because in my naive thinking it should be easy to make the bot check the dx.doi.org/xxx link for a "broken" doi. A fresh example: run doi bot on Africanized bee, it will mark as inactive. Materialscientist (talk) 03:34, 5 February 2016 (UTC)
 * Maybe the solution is to change this code. I think this code only adds a broken date if there is no redirect information in the dx.doi.org headers (lack of a redirect implies a dead doi):

to: and change this code: to: AManWithNoPlan (talk) 16:28, 9 August 2016 (UTC)
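A small Python sketch of the two-step check discussed here, with the caveat that the bot is PHP and both lookup functions below are stand-ins passed in by the caller (no real CrossRef or dx.doi.org request is made in this example):

```python
def doi_looks_broken(doi, crossref_has_record, resolver_redirects):
    """Flag a DOI only when both independent checks fail.

    crossref_has_record and resolver_redirects are hypothetical callables
    that take a DOI string and return True or False.
    """
    if crossref_has_record(doi):
        return False
    # CrossRef failed (possibly a transient error), so ask the resolver too.
    return not resolver_redirects(doi)


# Stubbed example: CrossRef has no record, but the resolver still redirects.
broken = doi_looks_broken("10.1000/example", lambda d: False, lambda d: True)
```

Requiring both failures means a transient CrossRef outage on its own would not get a working DOI marked as inactive.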

Is this the same bug? Another editor reverted before I could act, but I checked and the doi is not broken at all. Hawkeye7 (talk) 22:25, 4 October 2016 (UTC)
 * Hard to tell. It works now.  Probably a transient cross-ref failure. AManWithNoPlan (talk) 23:49, 4 October 2016 (UTC)

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

Bot created arXiv= parameter error
The bot removed the class portion of the arXiv parameter value. It should not have done so. There are two kinds of arXiv parameters, explained in the documentation as follows:
 * arxiv or eprint (Mandatory): arXiv/Eprint identifier, without any "arXiv:" prefix. Prior to April 2007, the identifiers included a classification, an optional two-letter subdivision, and a 7-digit YYMMNNN year, month, and sequence number of submission in that category.  E.g. gr-qc/0610068 or math.GT/0309136.  After April 2007, the format was changed to a simple YYMM.NNNN. Starting in January 2015, the identifier was changed to be 5 digits: YYMM.NNNNN.
 * class: arXiv classification, e.g. hep-th. Optional.  To be used only with new-style (2007 and later) eprint identifiers that do not include the classification.

The bot should not modify valid arxiv or eprint parameters. – Jonesey95 (talk) 03:56, 7 December 2015 (UTC)
 * Here's a minimal diff showing this problem. Lithopsian (talk) 00:02, 29 December 2015 (UTC)
 * This is still happening. – Jonesey95 (talk) 13:30, 27 July 2016 (UTC)
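To make the two identifier formats quoted from the documentation concrete, here is a Python sketch that classifies an identifier as old-style or new-style. The regular expressions are my reading of that description, not the bot's code (which is PHP), and they deliberately ignore anything the description does not mention.

```python
import re

# Old style: classification, optional two-letter subdivision, 7-digit number.
OLD_STYLE = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}$")
# New style: YYMM.NNNN (from April 2007) or YYMM.NNNNN (from January 2015).
NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}$")


def arxiv_style(identifier):
    """Return 'old', 'new', or 'unknown' for an arxiv=/eprint= value."""
    if OLD_STYLE.match(identifier):
        return "old"
    if NEW_STYLE.match(identifier):
        return "new"
    return "unknown"


def is_valid_eprint(identifier):
    """A value in either documented format should be left untouched."""
    return arxiv_style(identifier) != "unknown"
```

An old-style value such as gr-qc/0610068 carries its classification inside the identifier, so stripping that part leaves an invalid value.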

Here is an example of one that gets broken. AManWithNoPlan (talk) 15:49, 9 August 2016 (UTC)

Here is the offending source code from objects.php: that should be: AManWithNoPlan (talk) 15:56, 9 August 2016 (UTC)

This only occurs if class is set AManWithNoPlan (talk) 00:26, 14 October 2016 (UTC)

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 03:19, 7 September 2017 (UTC)

Link at top of results page leads to error
This code in objects.php : needs changed to AManWithNoPlan (talk) 21:10, 6 August 2016 (UTC)

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

Error converting url to arxiv parameter
Just need to strip the .pdf off of url when converting url to eprint. Super easy code change. AManWithNoPlan (talk) 19:19, 9 January 2016 (UTC)

Change in objects.php to and change this: to:
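A Python sketch of the ".pdf" stripping, for illustration. The URL shapes accepted by the pattern (arxiv.org/abs/... and arxiv.org/pdf/...) are an assumption about what the bot sees, and the function name is hypothetical; the real change is in the bot's PHP.

```python
import re

# Captures the identifier lazily so an optional trailing ".pdf" is left out.
ARXIV_URL = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(.+?)(?:\.pdf)?$",
    re.IGNORECASE,
)


def eprint_from_url(url):
    """Return the eprint identifier from an arXiv URL, or None if no match."""
    match = ARXIV_URL.match(url.strip())
    return match.group(1) if match else None


eprint = eprint_from_url("https://arxiv.org/pdf/1501.00001.pdf")
```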

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

JSTOR plant link mistaken for journal
That's annoying that JSTOR has chosen to add a new type of stable link (although it does start with plant) AManWithNoPlan (talk) 19:21, 7 February 2016 (UTC)

The fix needs put in objects.php the third through fifth lines AManWithNoPlan (talk) 21:00, 6 August 2016 (UTC)
 * new github pull that changes plants to plants.jstor.com AManWithNoPlan (talk) 22:33, 4 September 2017 (UTC)
 * That pull is accepted, but pull request #116 by kaldari has now been added, which fixes the missing bracket before the elseif. AManWithNoPlan (talk) 16:42, 5 September 2017 (UTC)

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

When bibcodes ends with a dot, it leaves the dot out
I think the solution is to modify objects.php to add a special case for bibcodes, to sit above the catch all code: such as: AManWithNoPlan (talk) 21:34, 6 August 2016 (UTC)
 * Here's another diff showing this bug. – Jonesey95 (talk) 16:17, 10 August 2016 (UTC)
 * And another diff showing this bug. GoingBatty (talk) 13:38, 19 August 2016 (UTC)
 * new github pull fixes this better by changing V-9 to 9-V. Which is correct fix.  AManWithNoPlan (talk) 22:35, 4 September 2017 (UTC)
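A Python sketch of the special case proposed at the top of this section. It assumes the catch-all trims trailing punctuation from values, which is a guess at why the dot goes missing; the bibcode in the example is made up, and the real fix is in the bot's PHP.

```python
def clean_value(param, value):
    """Tidy a parameter value, leaving bibcodes exactly as supplied."""
    value = value.strip()
    if param == "bibcode":
        # A trailing dot is part of the fixed-width identifier, so keep it.
        return value
    # Hypothetical catch-all behaviour: drop trailing punctuation.
    return value.rstrip(".,;")


kept = clean_value("bibcode", " 2000Sci...287.1234. ")
```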

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

Comments cause trouble
As far as I can tell, there were no duplicated parameters when the bot did its edit. – Jonesey95 (talk) 02:54, 9 November 2014 (UTC)
 * How did you get this? The bot is not currently working.-- Auric    talk  13:49, 9 November 2014 (UTC)
 * The edit is date-stamped 15 October 2014. I just discovered it yesterday while going through . – Jonesey95 (talk) 15:48, 9 November 2014 (UTC)
 * Here's another similar one, adding DUPLICATE to archiveurl and archivedate. – Jonesey95 (talk) 19:53, 10 November 2014 (UTC)
 * This looks like it is related to comments in the references in all cases. This appears to be a common thread in bot bugs on this page. AManWithNoPlan (talk) 04:45, 1 February 2015 (UTC)

Adding bogus year https://en.wikipedia.org/w/index.php?title=Wealden_Line&diff=629805699&oldid=629545497

DUPLICATE_ added: https://en.wikipedia.org/w/index.php?title=509th_Composite_Group&diff=636859536&oldid=636220208

DUPLICATE_ added: https://en.wikipedia.org/w/index.php?title=Shapley%E2%80%93Folkman_lemma&diff=655089982&oldid=651991293


 * This bug appears to still be present in the current version, as of this date stamp. Pinging . – Jonesey95 (talk) 03:46, 22 September 2015 (UTC)
 * Give it another try? I tested the dev version (now the actual version) on testwiki and it didn't add DUPLICATE: https://test.wikipedia.org/w/index.php?title=User%3AFhocutt_%28WMF%29%2FCitation_bot_test&type=revision&diff=243602&oldid=243601 . --Fhocutt (WMF) (talk) 23:04, 9 October 2015 (UTC)
 * It's still doing it here on en.WP. – Jonesey95 (talk) 23:15, 9 October 2015 (UTC)


 * Here is a very simple reproducer

Here are a variety of lines from the bot source code (i might have missed one) I think the problem is the first one. It is greedy. The .* needs to be .*? like number three. AManWithNoPlan (talk) 20:41, 7 August 2016 (UTC)
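The difference between the greedy and non-greedy patterns can be shown with a short Python example (the bot's own patterns are PHP and are not reproduced here; the sample wikitext is invented):

```python
import re

text = "{{cite web|url=a<!-- one -->|title=T}} {{cite web|url=b<!-- two -->}}"

# Greedy: .* runs from the first "<!--" to the LAST "-->" on the line.
greedy = re.findall(r"<!--.*-->", text)

# Non-greedy: .*? stops at the first "-->" after each "<!--".
lazy = re.findall(r"<!--.*?-->", text)
```

The greedy version returns a single match that swallows title=T and the start of the second template, which is the kind of collateral damage reported in this section. The non-greedy version returns the two comments separately.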


 * Still happening Hawkeye7 (talk) 02:36, 14 February 2017 (UTC)

https://en.wikipedia.org/w/index.php?title=2010_New_York_Yankees_season&type=revision&diff=797318586&oldid=796799252 Plastikspork

https://en.wikipedia.org/w/index.php?title=Alpha_particle&diff=795641460&oldid=795641155 Headbomb

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

Google data is not always right, and the bot is not telepathic
The date is grabbed from Google and not massaged at all. AManWithNoPlan (talk) 00:40, 23 January 2017 (UTC)
 * Pull added to github that detects Time Inc AManWithNoPlan (talk) 00:14, 6 September 2017 (UTC)

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

Bot generated invalid cite data "# # # comment"
This is because the search and replace is case sensitive, which is fine and dandy 99.9% of the time. Obviously, 0.1% of the time it fails. AManWithNoPlan (talk) 15:16, 5 April 2017 (UTC)
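One way to picture the fix, as a Python sketch: restore the stashed comments with a case-insensitive match, so a placeholder whose capitalization was altered along the way is still recognised. The placeholder wording is invented for the example, and the bot's real code is PHP.

```python
import re


def restore_comments(text, comments):
    """Swap numbered placeholders back to comments, ignoring letter case."""
    for index, comment in enumerate(comments):
        token = f"# # # comment placeholder {index} # # #"
        # A function replacement avoids backslash handling in the comment text.
        text = re.sub(re.escape(token), lambda m, c=comment: c, text,
                      flags=re.IGNORECASE)
    return text


restored = restore_comments("Title # # # Comment Placeholder 0 # # #",
                            ["<!-- note -->"])
```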

resolved in development branch. Live soon. AManWithNoPlan (talk) 02:33, 7 September 2017 (UTC)

Incorrect DOI removal
This is the comments bug. The bot uses a greedy search for comments. AManWithNoPlan (talk) 13:13, 22 July 2017 (UTC)
 * I just proved it by undoing the previous edit, then removing the comments, and finally running the bot again. https://en.wikipedia.org/w/index.php?title=Referred_itch&diff=prev&oldid=791790626 AManWithNoPlan (talk) 14:21, 22 July 2017 (UTC)

resolved in development branch. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

Authors must be people, not companies
Perhaps the bot could look for keywords like 'magazine', 'journal', 'newspaper', etc and common variations (eg upper/lowercase, plurals).  Stepho  talk 09:29, 16 August 2017 (UTC)
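A Python sketch of that keyword idea, for illustration only. The word list is just the three examples from the suggestion, and the function name is hypothetical; the bot itself is PHP.

```python
import re

# Keywords from the suggestion above; plurals and letter case are tolerated.
PUBLICATION_WORDS = ("magazine", "journal", "newspaper")
PATTERN = re.compile(r"\b(" + "|".join(PUBLICATION_WORDS) + r")s?\b",
                     re.IGNORECASE)


def looks_like_publication(author):
    """Return True when an 'author' value is probably a publication name."""
    return bool(PATTERN.search(author))
```

A check like this will also catch the rare person whose name happens to contain one of the keywords, so it is probably safer to skip adding the author than to rewrite it.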

resolved in development branch for a few select authors. Soon to be live. AManWithNoPlan (talk) 02:18, 7 September 2017 (UTC)

issue vs. volume confusion for journals with no volumes
http://search.crossref.org/?q=10.3897/zookeys.445.7778 The cross-ref data is wrong. So, it is not a bot bug, but the bot could easily fix it. AManWithNoPlan (talk) 19:15, 2 October 2015 (UTC)
 * The bot needs to add special code for journals like this, and then internally store a list of such journals.  AManWithNoPlan (talk) 00:13, 3 January 2016 (UTC)

The solution is to add code to objects.php in the public function add_if_new($param, $value) AManWithNoPlan (talk) 02:10, 7 August 2016 (UTC) And change this code: to

Might be best long term to have a global array of such journals rather than having to keep adding them one by one.
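A Python sketch of the "global array" idea. The set name, the helper, and in particular the direction of the remapping (storing an incoming volume as issue) are assumptions for illustration; the thread does not spell out which way the bot should move the value, and the bot's code is PHP.

```python
# Hypothetical shared list of journals that number issues but have no volumes.
HAS_NO_VOLUME = frozenset({"zookeys"})


def target_param(journal, param):
    """Pick the parameter name under which an incoming value is stored."""
    if param == "volume" and journal.strip().lower() in HAS_NO_VOLUME:
        return "issue"  # assumed direction of the remapping
    return param
```

Keeping the names in one lowercase set means a new journal is a one-line addition rather than another special case in add_if_new.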


 * New github pull request added, with zookeys spelled correctly. AManWithNoPlan (talk) 16:56, 7 September 2017 (UTC)


 * resolved AManWithNoPlan (talk) 17:42, 7 September 2017 (UTC)

is not a journal name

 * In this case it looks like bad data at ADS rather than the bot's fault. —David Eppstein (talk) 06:56, 24 September 2015 (UTC)
 * Yes, but I think that the bot can have one line of code that refuses to add a journal name that is unknown. AManWithNoPlan (talk) 15:22, 1 October 2015 (UTC)
 * I think the fix needed in objects.php is the second and fourth lines AManWithNoPlan (talk) 20:59, 6 August 2016 (UTC)


 * A new git pull has been submitted by someone to add the == 0 part to it that I missed. I guess the fact that I do not know php is showing.  AManWithNoPlan (talk) 16:56, 5 September 2017 (UTC)


 * New pull added that checks in more places. AManWithNoPlan (talk) 16:57, 7 September 2017 (UTC)


 * resolved AManWithNoPlan (talk) 17:41, 7 September 2017 (UTC)

Special characters in data need escaped
This is a pretty obscure bug, but if someone wanted to fix it, they could run the title through a regex to look for "Kaldari (talk) 20:56, 22 September 2015 (UTC)

And pipes too: https://en.wikipedia.org/w/index.php?title=User%3AJonesey95%2Fsandbox2&diff=prev&oldid=694077824
 * The problem is that the source of the metadata, http://adsabs.harvard.edu/abs/1991bsc..book.....H, has a vbar within an author's name, I think erroneously as the author in question doesn't use a middle name or initial, and the bot doesn't recognize it and quote it to prevent it becoming a parameter delimiter. So I think there are really two issues here: (1) bad data elsewhere that we can't do much about, and (2) better bot handling of special characters in external data. —David Eppstein (talk) 21:39, 6 December 2015 (UTC)
 * I have added a diff in the bug description above. When vertical bars occur in URLs, replace each vertical bar with . When vertical bars occur in parameter values that are not URLs, replace each vertical bar with  . – Jonesey95 (talk) 23:46, 6 December 2015 (UTC)
 * Yes that's it. Sounds like a sensible solution.  I've not seen one of these where the vertical bar is anything other than a mistake, but I suppose it is possible in some cases.  Even for a mistake, it is perhaps best for the bot to keep the character, without breaking the formatting, and someone to take it out by hand if it is really obnoxious. Lithopsian (talk) 12:25, 7 December 2015 (UTC)
 * Sometimes for news site or web site sources, the pipe character or spaced dash may come up in title values, where it should really be treated as a field delimiter between title and publisher. I'm not sure if citationbot checks for that, but certainly there are some other tools that are getting it wrong. It would be good if citationbot caught and corrected those errors, rather than just converting the character to have a less-obvious error. LeadSongDog come howl!  17:06, 7 December 2015 (UTC)
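A Python sketch of escaping vertical bars differently for URLs and other values. The replacement strings are assumptions chosen for the example: %7C is the percent-encoding of the bar, and &#124; is its numeric HTML entity. The list of URL-holding parameters is a hypothetical subset, and the bot's own code is PHP.

```python
# Hypothetical subset of parameters whose values are URLs.
URL_PARAMS = {"url", "archiveurl", "chapterurl"}


def escape_pipes(param, value):
    """Stop a literal vertical bar from acting as a parameter delimiter."""
    if param in URL_PARAMS:
        return value.replace("|", "%7C")   # percent-encode inside URLs
    return value.replace("|", "&#124;")    # HTML entity elsewhere


escaped = escape_pipes("title", "Catalogue | 5th edition")
```

This keeps the character rather than dropping it, in line with the suggestion above that a human can remove it later if it turns out to be a mistake.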

Need to add the second line here in expandFns.php AManWithNoPlan (talk) 15:22, 9 August 2016 (UTC) also in objects.php need to do a lot of changing this: to this: within these areas:


 * the top bug fix is missing the semicolon at the end of the line. I had GlazerMann submit a new pull request to github.  AManWithNoPlan (talk) 16:04, 31 August 2017 (UTC)


 * New github pull submitted that does this for more types of data sources (DOI, PMID, etc.) AManWithNoPlan (talk) 16:56, 7 September 2017 (UTC)

resolved in development branch. AManWithNoPlan (talk) 15:20, 11 September 2017 (UTC)

Inline "Citations" button does not work as well as calling the bot through link
Button on the left sets slow=1, inline editing does not. Maybe slow should be removed from bot, made default, or added to button. AManWithNoPlan (talk) 13:48, 15 September 2017 (UTC)
 * I have whined at https://en.wikipedia.org/wiki/MediaWiki_talk:Gadget-citations.js AManWithNoPlan (talk) 15:03, 21 September 2017 (UTC)
 * you know it’s a big deal when the bot writer reports a bug. AManWithNoPlan (talk) 03:41, 23 September 2017 (UTC)

resolved The page is edited. AManWithNoPlan (talk) 15:02, 25 September 2017 (UTC)

Adding invalid field (|DUPLICATE_work)
This is not a bug. Check archive discussion to see why this is a good thing. AManWithNoPlan (talk) 16:07, 27 September 2017 (UTC)
 * https://en.wikipedia.org/wiki/User_talk:Citation_bot/Archive_5#DUPLICATE_parameters AManWithNoPlan (talk) 16:24, 27 September 2017 (UTC)

notabug

Standardize and Customize Journal Capitalization
Data on NCBI seems to be ok: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC83919/ where the Journal is written as "Mol Cell Biol." on the webpage and as "MOLECULAR AND CELLULAR BIOLOGY" in the full text pdf.

What to do in those cases? Include "Molecular and Cellular Biology" in: https://en.wikipedia.org/wiki/User:Citation_bot/capitalisation_exclusions in such cases?

The same with

-"The Journal of biological chemistry" e.g.

-"The Journal of cell biology" e.g.

and other cases seen in https://en.wikipedia.org/wiki/Special:RecentChangesLinked/Category:Cite_doi_templates ? Thanks--Saimondo (talk) 16:21, 3 August 2014 (UTC)
 * Actually PubMed lists the journal as "Molecular and cellular biology" in the webpage meta data. A very minor case of GIGO. AManWithNoPlan (talk) 02:31, 4 August 2014 (UTC)


 * Perhaps it's worth quoting the University of Chicago Manual of Style (14th ed.) on this matter:
 * "In regular title capitalization, also known as headline style, the first and last words and all nouns, pronouns, adjectives, verbs, adverbs, and subordinating conjunctions (if, because, as, that, etc.) are capitalized. Articles (a, an, the), coordinating conjunctions (and, but, or, for, nor), and prepositions, regardless of length are lowercased unless they are the first or last word of the title or subtitle.  The to in infinitives is also lowercased."
 * On the other hand, it is common in library cataloging following MARC format to capitalize only the initial word, proper nouns, and, if the title begins with an article, that article and the following noun.
 * Wikipedia citations should follow citation style, rather than library cataloging style. In this case, the appropriate form would be "Molecular and Cellular Biology".  The Wikipedia Manual of Style provides much the same advice on the capitalization of titles.  SteveMcCluskey (talk) 18:40, 4 August 2014 (UTC)


 * I am not very familiar with PHP (the language that Citation Bot is coded in), but it would appear that there is a mb_convert_case function:  that can transform a string into title case (i.e., capitalize the first and last words of the title and all nouns, pronouns, adjectives, verbs, adverbs, and subordinating conjunctions).  This function would probably work well for most journal names. Boghog (talk) 19:15, 4 August 2014 (UTC)
 * This should be easy to implement, but I anticipate that some time down the line it will upset someone. Before I implement it, could we establish consensus and file a bot approval request if necessary? Thanks. Martin  (Smith609 – Talk)  08:49, 25 August 2014 (UTC)
 * How about you implement it for adding journal titles, but don't implement it for changing existing entries? Eventually, the list of titles that violate the rules will be built up, and then you can make it a fix for existing journal titles. AManWithNoPlan (talk) 01:48, 4 September 2014 (UTC)

You are of course right, it's not an error; it's the catalog style NCBI is using. I don't have a complete overview of what capitalization format is obtained by the doi or issn vs pmid queries. But if you use the cite-> templates-> cite journal option here in the edit window and use autofill with the  you get "Molecular and Cellular Biology"; if you use the same publications  with autofill you get "Molecular and cellular biology". If capitalization means also harmonization I think few wikipedians would be against it.

Furthermore, as far as I understand https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Titles_of_works the capitalization format like above should be ok (I have the impression that most journals use capitalization for their own names on their homepages/pdfs). Should we ask on the Manual of style talk page to see if there's a consensus for capitalization? In case someone is interested, here is a recent reply of an email I (re-)sent to NCBI some time ago:

''"...Standard cataloging requires that the first word in the full journal title begins with an upper case letter and remaining words (except for proper nouns) begin with lower case. Journal title abbreviations begin with all upper-case letters. I checked the XML data for several journals and found that each of the title listed in this manner. You can see several examples at the bottom of this document:

''Fact Sheet: Construction of the National Library of Medicine Title Abbreviations http://www.nlm.nih.gov/pubs/factsheets/constructitle.html ''Sincerely, Ellen M. L. ...

''-Original Message-

''Dear NCBI Team, ''in the xml data of a specific article https://www.ncbi.nlm.nih.gov/pubmed/9858585?dopt=Abstract&report=xml&format=text ''the journal name is written "Molecular and cellular biology" and the abbreviation is "Mol Cell Biol.". I think the correct journal name should be "Molecular and Cellular Biology" as written on the journal homepage http://mcb.asm.org/content/19/1/612.long ."'' Saimondo (talk) 17:29, 10 September 2014 (UTC)


 * https://en.wikipedia.org/wiki/User:Citation_bot/capitalisation_exclusions seems to be being ignored by the latest bot also. AManWithNoPlan (talk) 16:12, 27 September 2015 (UTC)


 * So, the conclusion is:
 * 1) convert journal names to title case when adding them
 * 2) Add back in exclusions list support https://en.wikipedia.org/wiki/User:Citation_bot/capitalisation_exclusions AManWithNoPlan (talk) 20:43, 2 January 2016 (UTC)
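For illustration, a Python sketch of headline-style capitalization with an exclusions table, following the style rule quoted earlier in this section. Note that a plain per-word title-case conversion (which is what PHP's mb_convert_case with MB_CASE_TITLE does) capitalizes every word, including "and" and "of", so some list of small words is needed on top of it. The small-word list and the exclusions entry below are assumptions for the example, not the contents of the bot's exclusions page.

```python
# Words left lowercase unless they are the first or last word.
SMALL_WORDS = {"a", "an", "the", "and", "but", "or", "for", "nor",
               "of", "in", "on", "to"}

# Hypothetical exclusions, keyed by lowercase name.
EXCLUSIONS = {"plos one": "PLOS ONE"}


def journal_title_case(name):
    """Convert a journal name to headline-style capitalization."""
    key = " ".join(name.split()).lower()
    if key in EXCLUSIONS:
        return EXCLUSIONS[key]
    words = key.split()
    result = []
    for index, word in enumerate(words):
        if 0 < index < len(words) - 1 and word in SMALL_WORDS:
            result.append(word)
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)
```

Because the sketch lowercases everything first, acronyms in journal names would be flattened unless they are listed in the exclusions table, which is one reason the exclusions support matters.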


 * I think the solution is to change add_if_new in objects.php like this:

changing

into

AManWithNoPlan (talk) 20:58, 6 August 2016 (UTC)


 * New github pull submitted that applies title case in more locations. AManWithNoPlan (talk) 16:55, 7 September 2017 (UTC)


 * Need to add option $title = mb_convert_case($title, MB_CASE_TITLE, "UTF-8") AManWithNoPlan (talk) 13:22, 10 September 2017 (UTC)

resolved in Dev AManWithNoPlan (talk) 18:25, 2 October 2017 (UTC)

citing using pmid creates author1 instead of last1
author1 is an alias of last1. This would be a cosmetic fix (in the code) only. – Jonesey95 (talk) 22:55, 31 May 2016 (UTC)
 * Agreed, but since it's such a simple fix it would be a shame not to do it. Also I actively search for "author" when most of the refs are lastn/firstn to edit them for consistency, and that creates false positives. Ihaveacatonmydesk (talk) 08:28, 1 June 2016 (UTC)
 * When splitting an author into last and first, it keeps the original type when setting the last name. Pull request done to switch to last.  https://github.com/ms609/citation-bot/pull/169 AManWithNoPlan (talk) 15:14, 25 September 2017 (UTC)

resolved in Dev AManWithNoPlan (talk) 18:42, 2 October 2017 (UTC)

URL in the website field instead of the URL field (common newbie error)
The access date that is deleted is not actually shown to humans. Attempt to have bot do this: https://github.com/ms609/citation-bot/pull/172 AManWithNoPlan (talk) 21:37, 25 September 2017 (UTC)

resolved in Dev AManWithNoPlan (talk) 18:26, 2 October 2017 (UTC)

lowercasing "the" as the first word in a subtitle?
Is it correct for this bot to remove capitalization from the word "the" when it's immediately following a colon as the first word in a subtitle? That's a wordy sentence, and might be confusing, so I'll also ask: is it correct for this bot to do this: https://en.wikipedia.org/w/index.php?title=Initiations_%28Star_Trek%3A_Voyager%29&type=revision&diff=795130102&oldid=795049756 ? —  fourthords  | =Λ= |  15:11, 12 August 2017 (UTC)
 * That is a good question. I edited https://en.wikipedia.org/wiki/User:Citation_bot/capitalisation_exclusions to make this Star Trek magazine have a capital The.  Generally, a the is not capitalized in the middle of a sentence, but this is a weird case where a colon really is being used more like a period than a colon.   AManWithNoPlan (talk) 13:44, 13 August 2017 (UTC)
 * This is in general true of all words following colons and dashes, not just 'The'. 'A' and "An" are very common, as are many others. The bot should leave the capitalization of all follow-up words alone. Headbomb {t · c · p · b} 14:40, 13 August 2017 (UTC)
 * The convention I've usually seen is that the word following a colon in a complete English sentence is not capitalized (although I think in earlier styles it might have been) but the word following a colon in the title of a publication is capitalized. For instance the mathematics publication database MathSciNet, which aggressively lowercases even words after the first in titles of books (unlike most other bibliographic sources), nevertheless follows this convention. —David Eppstein (talk) 18:23, 13 August 2017 (UTC)
 * There is a style out there which will capitalize the first word after a colon even in a full sentence, but that's a rare one; most of my experience has been the same as David's. --Izno (talk) 19:58, 25 August 2017 (UTC)
 * This might be working in the development version on github. Not yet deployed to wiki land. AManWithNoPlan (talk) 21:37, 12 September 2017 (UTC)

resolved in Dev AManWithNoPlan (talk) 18:33, 2 October 2017 (UTC)

Creates invalid ISO date

 * This looks like a GIGO error. "2007-08-01" should not be in year. A format like that should be in date. The bot could perhaps ignore this incorrect format, leaving it for a human editor to fix. In this case, the bot did human editors a favor by highlighting an erroneous parameter value. – Jonesey95 (talk) 22:12, 18 August 2017 (UTC)

I guess we agree on this AManWithNoPlan (talk) 19:24, 5 September 2017 (UTC)
 * Have to disagree with this conclusion, something should be done in the code to stop this happening even when there is incorrect usage of fields. Keith D (talk) 21:05, 5 September 2017 (UTC)
 * Garbage in; garbage out. I will write a patch to detect more than one dash.  That way the original garbage stays put. AManWithNoPlan (talk) 00:46, 6 September 2017 (UTC)
 * Need to add this  (this means that if more than one dash is found, then do not change, unless there are dashes next to each other).  Probably change year to date. AManWithNoPlan (talk) 16:20, 13 September 2017 (UTC)
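The rule in the parenthesis can be sketched in Python as follows. This is an illustration of the stated rule only, not the patch itself (which is PHP and is not reproduced here); the function name is hypothetical.

```python
import re


def normalize_range(value):
    """Turn hyphens into an en dash, unless the value has several
    separate hyphens (for example an ISO date)."""
    if value.count("-") > 1 and "--" not in value:
        return value  # more than one dash, none adjacent: leave it alone
    # One dash, or a run of adjacent dashes, becomes a single en dash.
    return re.sub(r"-+", "–", value)


unchanged = normalize_range("2007-08-01")
```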

resolved in Dev AManWithNoPlan (talk) 18:36, 2 October 2017 (UTC)

Bot broke a URL
This edit altered a dash to an ndash in a URL within a  parameter. You need to check that if the page or pages parameter includes an open square bracket, nothing is changed before a space or a close square bracket. -- PBS (talk) 16:29, 27 August 2017 (UTC)
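A Python sketch of the check requested here: everything from an open square bracket up to the next space or close bracket is treated as a URL and left untouched, and hyphens are converted only in the rest of the value. The function name and sample URL are invented for the example, and the bot's real code is PHP.

```python
import re

# An open bracket followed by everything up to a space or close bracket.
URL_PART = re.compile(r"(\[[^\s\]]+)")


def dash_pages(value):
    """Convert hyphens to en dashes, except inside bracketed URLs."""
    parts = URL_PART.split(value)
    # With one capture group, odd indexes hold the "[url" pieces.
    return "".join(
        part if index % 2 else part.replace("-", "–")
        for index, part in enumerate(parts)
    )


linked = dash_pages("[https://example.org/a-b?pg=10-12 10-12]")
```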


 * GIGO. That is a misuse of the template.  I will fix it.  There is no way for the bot to deal with all the ways that templates can be used wrong  AManWithNoPlan (talk) 16:37, 27 August 2017 (UTC)

GIGO, "garbage in, garbage out": do you mean "rubbish in, rubbish out"? It is not rubbish in to use a URL link for a page number.

It is not a misuse of the template; it is a misuse of the bot. Please fix the bot. I have only had a limited time to sample the bot's output. Here are some other problems:
 * 14:07, 27 August 2017‎ broke url
 * Revision as of 13:07, 27 August 2017 broke url

This is something generated by the Google Books tool. While it is not a bug to change a dash to an ndash, the correct thing to do if the parameter is  is to remove the trailing dash, not change it to an mdash.
 * 16:13, 27 August 2017

These should probably not have been touched: --PBS (talk) 22:17, 27 August 2017 (UTC)
 * 16:03, 27 August 2017 questionable
 * 14:32, 27 August 2017 questionable

"There is no way for the bot to deal with all the ways that templates can be used wrong" The template is no being used "wrong" do you need help fixeing the bot? -- PBS (talk) 22:20, 27 August 2017 (UTC)

BTW I am very concerned about the string of edits you made to Murder Act 1751 after I raised the problem with the bot. Please explain -- PBS (talk) 22:40, 27 August 2017 (UTC)
 * The documentation for the parallel series of templates sfn etc explicitly recommends using a url in its page parameter: see Template:Sfnp. So characterizing this usage as "garbage" seems overly harsh. And Citation bot should clearly detect page parameter formats that it doesn't understand and not break them. I think this is indeed a bug. —David Eppstein (talk) 22:52, 27 August 2017 (UTC)
 * This is definitely a bug. The bot is altering a URL that it shouldn't be touching. – Jonesey95 (talk) 22:57, 27 August 2017 (UTC)
 * citation do not want url's in page numbers. They have an explicit URL parameter for that.  sfn does not have that.  I do not see how the bot could deal with people putting stuff in the wrong places. AManWithNoPlan (talk) 00:32, 28 August 2017 (UTC)
 * [citation needed]. I know that is cs2, but cs1|2 share Module:Citation/CS1.  A lot of the documentation at Help:Citation Style 1 applies to cs2.  Particularly, and pertinent to this discussion: this.  There we have an example that shows page externally linked with a URL.
 * —Trappist the monk (talk) 01:07, 28 August 2017 (UTC)
 * Interesting. That documentation is poorly organized, but clear. It also says that you should use a template for dashes that you do not want changed. AManWithNoPlan (talk) 01:20, 28 August 2017 (UTC)
 * Of course, unless someone is willing to test new code change by change, there is no reason for anyone to fix this "bug". AManWithNoPlan (talk) 01:22, 28 August 2017 (UTC)
 * it seems to be discouraged https://en.wikipedia.org/wiki/Help_talk:Citation_Style_1/Archive_2 AManWithNoPlan (talk) 01:30, 28 August 2017 (UTC)
 * https://en.wikipedia.org/w/index.php?title=Help:Citation_Style_1&diff=next&oldid=774956610
 * A lot has happened since that discussion from May 2013. Editor Redrose64's point about corrupted COinS metadata for the   keyword has been addressed long since.  The original GoingBatty template implementation now produces:
 * in which you will find:
 * which is slightly off because page numbers should be using page or pages, whichever is appropriate. Use of the correct parameter will render the metadata as:
 * And, please don't remove content from other editor's posts. You made a claim that citation do not want url's in page numbers.  I wanted to know then, and still want to know now, what it is that you believe supports that claim.
 * —Trappist the monk (talk) 09:56, 28 August 2017 (UTC)
 * Added in these two edits in 2011. It would be hard to suggest that's not supported by at least a few people simply due to its age in the documentation. I would guess that phab:T151301 can/will put an end to the practice, since if you're only citing the work once on one page, you can link the page directly in the URL field; and if you're citing it multiple times, you'd rather use the official extension mechanism. --Izno (talk) 03:41, 28 August 2017 (UTC)

, your claim that the url parameter makes links in pages unnecessary is, simply, wrong. A link in the url parameter will show the link on the title of the work, as a link to the whole work. A link in the page parameter will put the link on the page number, making clear that it is a link to that specific page. There is no way to achieve that effect with any of the url parameters. —David Eppstein (talk) 08:59, 28 August 2017 (UTC)
 * It looks like the standard for the citation templates has changed (again -- the documentation has only suggested this for less than half a year). I will concede that it is now okay to have URLs in the page number area.  The templates really should have a page-url option for that, but until they do, this seems okay.  This discussion is moot until we get the latest bot git source debugged and amended.  Fixing this in the source should be easy:  If page_number contains either "[" or "http" then do not change.  AManWithNoPlan (talk) 15:06, 28 August 2017 (UTC)
 * Adding urls to page parameters has been done for years, whether or not it has been documented. -- PBS (talk) 19:59, 29 August 2017 (UTC)
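
The rule proposed above (leave the page value alone if it contains "[" or "http") can be sketched roughly as follows. This is an illustrative Python sketch only, not the bot's PHP code; the function names and the hyphen-to-en-dash tidy-up step are assumptions made for the example.

```python
import re

def contains_link(page_value: str) -> bool:
    """True when the page value looks like it carries a wikilink or URL.

    Mirrors the rule stated in the discussion: a "[" or "http" anywhere
    in the value means the bot should not touch it.
    """
    return "[" in page_value or "http" in page_value

def tidy_page_range(page_value: str) -> str:
    """Hypothetical tidy-up step: hyphen between digits becomes an en dash.

    Values that contain a link are returned unchanged, so a URL such as
    http://example.org/p-12 is never rewritten.
    """
    if contains_link(page_value):
        return page_value
    # Only a hyphen sitting between two digits is treated as a range.
    return re.sub(r"(?<=\d)\s*-\s*(?=\d)", "\u2013", page_value)
```

With this guard, a plain range such as `12-15` would still be normalised, while `[http://example.org/p-12 12]` would pass through untouched.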


 * Enough discussion of this good/evil practice. The end result is that the bot needs to be updated.  The existing code in Template.php is:

Should be:

I will have a git pull submitted. AManWithNoPlan (talk) 20:38, 5 September 2017 (UTC)

resolved in Dev AManWithNoPlan (talk) 18:40, 2 October 2017 (UTC)

Linefeeds
Arxiv often has linefeeds in titles and journal names. Need to strip them out and probably replace with a space. AManWithNoPlan (talk) 04:03, 10 September 2017 (UTC)
 * Something like find ' ' replace ' ' should do the trick. Headbomb {t · c · p · b} 14:53, 12 September 2017 (UTC)
 * I have added code to github to replace  each with a single space (all four are valid depending upon your OS).  Once the dev version is updated, I will test it out.  AManWithNoPlan (talk) 15:12, 12 September 2017 (UTC)
 * it should strip tabs too. Headbomb {t · c · p · b} 15:38, 12 September 2017 (UTC)
 * $v = preg_replace('/(\s\s+|\t|\n)/', ' ', $v);  I think this grabs all of them and all spaces and cuts them down to one space.   AManWithNoPlan (talk) 16:10, 12 September 2017 (UTC)
 * Wouldn't  cover all of that though? Headbomb {t · c · p · b} 16:43, 12 September 2017 (UTC)
 * There you go being right. AManWithNoPlan (talk) 16:52, 12 September 2017 (UTC)
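
For illustration, one common way to cover linefeeds, carriage returns, tabs and runs of spaces in a single pass is to collapse every run of whitespace to one space. The sketch below is Python rather than the bot's PHP, and the function name is an assumption for the example; it is not necessarily the pattern that was committed.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace (spaces, tabs, CR, LF) with one space."""
    return re.sub(r"\s+", " ", text)
```

Because `\r\n` is a single run of whitespace, it becomes one space rather than two, so the OS-specific line endings do not need separate handling.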

resolved in Dev AManWithNoPlan (talk) 18:39, 2 October 2017 (UTC)

First parameter gets deleted
lead to this:

or more likely   in   is to blame.
 * This becomes  because the bot deletes the first entry. AManWithNoPlan (talk) 19:30, 19 September 2017 (UTC)

resolved in Dev AManWithNoPlan (talk) 18:39, 2 October 2017 (UTC)

bot adds |year= when |date= already holds valid date
The article title is funny (I thought you were being funny).. is enough to get the bug. AManWithNoPlan (talk) 13:57, 27 September 2017 (UTC)
 * It is the SICI data that was used. https://github.com/ms609/citation-bot/pull/176  AManWithNoPlan (talk) 01:53, 29 September 2017 (UTC)
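
A guard against this kind of duplicate can be sketched as an "add only if no equivalent parameter is already filled" check. The Python below is a hypothetical illustration, not the bot's code: the function name, the dictionary representation of a template, and the grouping of parameters are all assumptions.

```python
# Hypothetical groups of parameters that should not be filled twice.
EQUIVALENT_GROUPS = [
    ("date", "year"),
    ("page", "pages", "p", "pp", "at"),
]

def add_if_absent(params: dict, name: str, value: str) -> bool:
    """Set params[name] unless it, or an equivalent parameter, has a value.

    Returns True when the parameter was added, False when it was skipped.
    """
    group = next((g for g in EQUIVALENT_GROUPS if name in g), (name,))
    if any(params.get(other, "").strip() for other in group):
        return False
    params[name] = value
    return True
```

Under these assumptions, a template that already holds `date` would not gain a `year` taken from SICI data, while unrelated parameters such as `volume` are still added.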

resolved in Dev AManWithNoPlan (talk) 18:37, 2 October 2017 (UTC)

Update jstor links
Old links include a SICI and redirect to stable JSTOR links, for example https://www.jstor.org/sici?sici=0003-0279(196101%2F03)81%3A1%3C43%3AWLIMP%3E2.0.CO%3B2-9  The bot should figure that out and update them. AManWithNoPlan (talk) 04:08, 30 September 2017 (UTC)
 * https://github.com/ms609/citation-bot/pull/201 and test later with this: AManWithNoPlan (talk) 04:19, 2 October 2017 (UTC)
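
As a rough sketch of the first step, recognising such a link and pulling out the SICI, the Python below uses only the standard library. The function name is hypothetical, and the check on host and path is an assumption based on the single example URL above; resolving the SICI to a stable JSTOR URL would require following the redirect, which is not shown.

```python
from urllib.parse import parse_qs, urlsplit

def extract_sici(url: str):
    """Return the decoded SICI from an old-style JSTOR link, or None."""
    parts = urlsplit(url)
    # Assumption: old links look like https://www.jstor.org/sici?sici=...
    if not parts.netloc.endswith("jstor.org") or parts.path != "/sici":
        return None
    values = parse_qs(parts.query).get("sici")
    return values[0] if values else None
```

The decoded value starts with the ISSN (here `0003-0279`), which is presumably how SICI data ended up supplying extra parameters in the year/date bug above.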

resolved in GitHub. AManWithNoPlan (talk) 02:14, 3 October 2017 (UTC)