User talk:Citation bot/Archive 37

weird cite arxiv conversion
Possibly from garbage Pubmed metadata Headbomb {t · c · p · b} 21:57, 2 October 2023 (UTC)
 * Really weird that PubMed is now tracking arxiv stuff. AManWithNoPlan (talk) 15:14, 3 October 2023 (UTC)

Journal capitalization
The issue here should be to recognize language=rup Headbomb {t · c · p · b} 22:41, 13 October 2023 (UTC)
 * Thanks for the fix! Super Dromaeosaurus (talk) 22:33, 15 October 2023 (UTC)

Moving Jstor and Worldcat URLs to parameters
From discussions (1, 2, 3) on stopping useless cruft – for example this useless blank archive of a Jstor article – from semi-automated mass archiving, a number of editors have noted their support for a bot to parse Jstor and Worldcat URLs (eg https://www.jstor.org/stable/24432812) for their respective jstor and oclc parameters where relevant and purge URLs, archive URLs, and archive metadata from CS1 templates.

Is this something that can be done with citation bot? I will note that I'm not saying to purge all URLs – they can be useful if the full text is separately hosted elsewhere – just URLs and archives thereof (almost always useless blank pages) that are duplicative of the generated parameter URLs. Tagging. Ifly6 (talk) 06:19, 22 September 2023 (UTC)


 * The bot got blocked for doing this (although the person who led the charge on this themselves eventually got banned). The main argument was that the users of wikipedia are only capable of clicking on title-links, and numbers after the reference are above their IQ level.  Although I would argue that having these as title links is misleading since they almost never lead to the source, but just a page listing the source.  AManWithNoPlan (talk) 13:09, 22 September 2023 (UTC)
 * That policy feels like insanity. Is it possible to determine whether the Jstor link leads to a full source and remove the URL (metadata, archive URL, and archive metadata) only if it does not lead to a full source? Worldcat is easier because it never(?) leads thereto. Ifly6 (talk) 14:11, 22 September 2023 (UTC)
 * I feel there's a case to remove links that will never host the full text, like PMID, OCLC, etc... because they mislead the reader into thinking there's a full text available at the end. But that would require an RFC. Headbomb {t · c · p · b} 03:34, 23 September 2023 (UTC)
 * Is it really the case that we cannot do anything to change this (to me at least) absurdist combination where the following series of events keep occurring:
 * People use Citoid which places Jstor links into cite journal
 * Citation bot comes around and extracts the Jstor ID etc but doesn't remove the URL
 * Some NPC hits ARCHIVE EVERYTHING with the IA Bot check box (eg IA Bot) and now we have a massive pile of archive URL cruft (nb the check box does not actually archive anything)
 * After this rigmarole an editor can now see the result, which is:
 * A main URL that doesn't give you full text
 * A duplicated parameter which renders an identical URL link
 * An archive URL which is a literally blank page
 * A mark up reference which is now 70 per cent longer than it needs to be to do the exact same thing
 * Ifly6 (talk) 15:17, 25 September 2023 (UTC)
 * The bot used to do this until the argument was made that: our users were too stupid to figure out non-title links, and yet so smart that they needed links to scientific journals, since wikipedia was too simple for them. AManWithNoPlan (talk) 16:26, 25 September 2023 (UTC)

Is there really nothing we can do on this without an RFC? Ifly6 (talk) 17:13, 25 September 2023 (UTC)


 * Getting blocked twice for the same thing is probably an existential risk.
 * I think Headbomb makes a good point, removing title-links that don't contain full content and that can be replaced with non-title-links. Sometimes JSTOR has the full content sometimes not, sometimes freely accessible (pre-1923), sometimes not. As for archive URLs, this will depend what is cited, if the content is available in the archive URL. It's context sensitive. I would be careful with an RfC, they can be counter-productive with complex matters. An RfC might codify a minority opinion that bots should not be used at all due to "context sensitive" and the "community" will take care of it, which dooms the whole thing to fantasy land due to the reality of the scale.
 * It's possible a bot (this one or another) could start on JSTOR, determine content availability, url-status, and edit accordingly. It might also check archive URLs for possible problems. This is going to be a slow process, and it might run into bot blockers at JSTOR, rate limiting, which further complicates. If true that would leave the "blind" edit option of simply removing all JSTOR links from the title-link as the only viable method, unless someone has another idea how to determine content availability. --  Green  C  20:01, 25 September 2023 (UTC)
 * Some people have deeper access to JSTOR resources than others, depending on where they are. Surely when a JSTOR resource is cited, no-one is seriously suggesting that only open-access ones may be given? Is anyone suggesting that we deprecate ISBNs because some readers might have to buy the actual book? Or have I completely missed the point? --𝕁𝕄𝔽 (talk) 22:57, 25 September 2023 (UTC)
 * Nobody is saying that Jstor should not be cited. The dispute here is whether a link to the Jstor page should be included in the URL parameter. For me this emerges from the really pointless practice of adding the "archive" version of Jstor links so you can get the glory of gazing upon a blank page. Removing the entry would prevent "archive" links from being added. It is a dispute between whether a reference should look like this:
 * Or, by almost inevitable accretion through inaction, like this:
 * The portions at the end after entirely duplicate existing links in the citation and regardless add nothing for the unprivileged reader while clogging up the mark up and making it difficult to do the edit part of "editor". Even if I have Ivy League library access and be able to read all full texts through proxies (eg Penn Libraries), that doesn't mean that linking the proxy page whole (like https://www-jstor-org.wikipedialibrary.idm.oclc.org/) does any good for readers without Penn or Wikipedia library privileges. Ifly6 (talk) 23:37, 25 September 2023 (UTC)
 * The number of Wikipedians who potentially have access to JSTOR sources that are hidden by paywalls may be larger than you think. "Veteran" Wikipedians (I believe the cut-off is 500 life-time edits) can avail themselves of access to JSTOR (and many other sources barred to the hoi polloi) via the Wikipedia library. So I think for these relatively "privileged" people giving a link to a page that contains a doi is still useful. I have no problem doing it, also for sources like Cambridge U.P and the like. Ereunetes (talk) 23:37, 25 September 2023 (UTC)
 * If the purpose of adding is for the "average" reader this link does nothing because they will not have a Jstor subscription. If adding it is to help the "average" university student, the link also does nothing because they will have to go through their university proxy. If it is to help the privileged editor with WP:LIBRARY access, it also does nothing because we have to go through a proxy too. The only people it supports are those few who have direct access to Jstor (which ironically includes me via the Federal Reserve). Ifly6 (talk) 23:43, 25 September 2023 (UTC)
 * What do you mean "the link does nothing"? If someone has access to JSTOR, via WP:LIBRARY, their local public library, an academic library, or whatever, seeing that there is in the jstor parameter lets them know that the article is on JSTOR and they will likely have access to it, and once they click on the link they can easily log in via whatever gives them access via whatever proxy, or if they're physically at their library just click the link and access it. The JSTOR link also provides metabibliographic information, a first page preview, and abstract. Plus JSTOR allows independent researchers 100 free articles each month, and if someone so chooses they have the option to buy it à la carte. Anything which helps a reader access a source is useful, and quite often JSTOR is  electronic place of record for a journal.  [Edit: sorry I'm following more closely now, I still think it should be in jstor -- that's why we have that parameter; it does not also belong in url.] Umimmak (talk) 23:49, 25 September 2023 (UTC)
 * What do you mean "the link does nothing"? The link to the native Jstor website in is not the proper one and will not yield the full text unless you have direct Jstor access. If you access it through a proxy, you would have to copy the Jstor ID and paste it in after ../static/. Putting the direct URL in  is not very useful and largely facilitates WP:MEATBOTs crufting up articles with unnecessary mark up pointing to blank archive pages. Ifly6 (talk) 01:13, 26 September 2023 (UTC)
 * OK. Forgive my ignorance; I didn't know about the "jstor=" parameter and will use it in future if the case applies, instead of the "url=" parameter. Would it be possible to enable the Citation bot to change "url=" to "jstor=" if that would be appropriate? Or am I stupid again? Ereunetes (talk) 20:42, 27 September 2023 (UTC)
 * The discussion we are in is whether citation bot should extract Jstor URLs and put them into the jstor parameter. Apparently there was an RFD, ban, or something of the sort which has led to the maintainer(s) of the bot not being willing to re-enable that previously-present functionality. Ifly6 (talk) 21:54, 27 September 2023 (UTC)
 * The RFC you are looking for is this one.
 * Perhaps the maintainers of the bots should put together an FAQ somewhere about why the bot does some things that it does and some things that it does not with links to appropriate major discussions. Izno (talk) 00:37, 30 September 2023 (UTC)
 * I think it might be possible to effect a change like this if we take it slowly. If we can start with getting consensus that archives of paywall landing pages (like Jstor) should be removed, and access-date in cite book and cite journal (and maybe others) should be removed, we'll have solved almost the entire problem of these kinds of URLs without needing to determine whether or not readers / editors will understand the alternative stable identifiers. Folly Mox (talk) 04:44, 30 September 2023 (UTC)
 * While I agree that those two should be done, it doesn't appear to me to solve the problem of someone driving by to blindly hit the check box and add those archives back in. Ifly6 (talk) 23:28, 30 September 2023 (UTC)
 * Right, the prevention is more difficult than the cure, but if we have consensus to remove archives to paywall landing pages, we could get a bot to do it, and getting consensus to remove would be a step towards consensus against adding. I don't think this is a one-step recipe. Folly Mox (talk) 00:39, 1 October 2023 (UTC)
 * And there's no way to prevent the URLs and prefer custom stable identifiers. Citoid guarantees a valid URL in its output, and works across multiple projects, most of which don't implement custom stable identifiers. We'd have to get every maintainer of every automated referencing script, including VisualEditor, to build in functionality to reach our end goal here, which it's unclear if there's even consensus for in all facets. Folly Mox (talk) 00:43, 1 October 2023 (UTC)

Well that issue is why we're here at Citation bot. Do you think it's actually impossible to get a decision for Citation bot to remove those URLs? A bot to remove those archives would produce even more watchlist events, which people in the discussion below seem to be adamantly against, while also probably being impossible to implement per GreenC's comment above. Ifly6 (talk) 19:36, 1 October 2023 (UTC)


 * I don't know what venue should generate the consensus, but we do need the theoretical underpinnings of a discussion reaching consensus regarding archives of paywall landing pages before a Bot request or BRFA for a new task could be submitted. I wouldn't necessarily frame it as something that Citation bot in particular needs to handle, instead of some other bot, and I wouldn't want it to take place in absence of other constructive edits even though it doesn't violate COSMETICBOT. So, I'd try to frame this bit of the discussion as "archives to paywall landing pages are useless cruft: they don't archive the content and you can't use them to navigate to the content", not "proposal for a one-time bot run to have User:Citation bot remove archives to paywall landing pages in 1,700,000 articles". So no, I don't think it's actually impossible, and I think setting jstor.org to permalive for IABot is also a reasonable first step. Folly Mox (talk) 22:09, 1 October 2023 (UTC)
 * And I do appreciate that the discussion you opened on Wikipedia talk:Link rot is essentially a superset of the discussion I just proposed. Folly Mox (talk) 22:11, 1 October 2023 (UTC)


 * I'm not sure if this has been mentioned before, but just wanted to note that resources in JSTOR: Global Plants have URLs of the form https://plants.jstor.org/stable/10.5555/al.ap.person.bm000000658, and that if a bot naively took any URL including a "jstor.org/stable/XXXXX" to turn it into a jstor parameter, this would not work; occasionally JSTOR the website gets cited instead of a book/article it is hosting, so bots should just be aware of this. Umimmak (talk) 21:20, 5 October 2023 (UTC)
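The caveat above can be illustrated with a minimal Python sketch. This is not Citation bot's actual code: the function name `jstor_id_from_url` and the rule of accepting only purely numeric stable IDs on the main JSTOR host are assumptions made for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allow-list: only the main JSTOR host, so that subdomains
# such as plants.jstor.org are never treated as article links.
JSTOR_HOSTS = {"jstor.org", "www.jstor.org"}


def jstor_id_from_url(url: str) -> Optional[str]:
    """Return a numeric JSTOR stable ID, or None if the URL does not qualify."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host not in JSTOR_HOSTS:
        return None
    # Assumption: only purely numeric IDs are accepted in this sketch.
    match = re.fullmatch(r"/stable/(\d+)/?", parts.path)
    return match.group(1) if match else None
```

Matching on the parsed hostname rather than searching for the substring "jstor.org/stable/" is what keeps the Global Plants URLs out; a link to the JSTOR home page is rejected as well because it has no stable path.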

wontfix because people are whiners. AManWithNoPlan (talk) 20:41, 24 October 2023 (UTC)

Bot is not respecting Template:inuse
The bot never respected in use because very often people who use in use will also use the bot to expand citations. Headbomb {t · c · p · b} 22:42, 13 October 2023 (UTC)
 * Then they can remove the tag and then run the bot. The bot should not cause edit conflicts by interfering when articles have this tag on them. ―Justin ( koa v f ) ❤T☮C☺M☯ 22:47, 13 October 2023 (UTC)
 * The tag is often put to tell others to not edit the article so the bot can make its edits. Headbomb {t · c · p · b} 09:14, 14 October 2023 (UTC)
 * That's not how it should work: all bots should respect the tag. Other bots do and one should expect it to not edit with the tag on an article. ―Justin ( koa v f ) ❤T☮C☺M☯ 10:31, 14 October 2023 (UTC)

Michael Metcalf
From my discussion page: Hi, I see that you have used citation bot to add dates to references to numismatics.org.uk webpages here. I am not familiar with the bot, so could you explain what the dates mean? The pages seem to be updated regularly.

I think the bot is wrong. Grimes2 (talk) 14:41, 13 October 2023 (UTC)


 * Something (probably upstream in the Zotero libraries) is using meta property="article:published_time" instead of meta property="article:modified_time". Folly Mox (talk) 16:00, 13 October 2023 (UTC)
 * @Grimes2 No.bot 49.237.203.59 (talk) 06:06, 24 October 2023 (UTC)


 * fixed by adding to NO_DATE_WEBSITES array. AManWithNoPlan (talk) 20:35, 24 October 2023 (UTC)
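For readers unfamiliar with the two meta properties mentioned in this thread, here is a rough Python sketch of how they can be read from a page and how a per-site exclusion list could suppress dates. It is illustrative only: the class `MetaDates`, the function `date_allowed`, and the contents of the tuple are assumptions, and only the name `NO_DATE_WEBSITES` comes from the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed contents; the real list lives in the bot's own source code.
NO_DATE_WEBSITES = ("numismatics.org.uk",)


class MetaDates(HTMLParser):
    """Collect article:published_time and article:modified_time values."""

    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop in ("article:published_time", "article:modified_time"):
            self.dates[prop] = attr.get("content")


def date_allowed(url: str) -> bool:
    """False when the host is on the exclusion list, including subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return not any(host == d or host.endswith("." + d) for d in NO_DATE_WEBSITES)
```

On pages that are updated regularly, the two properties can differ widely, which is why a blanket exclusion for such sites may be simpler than choosing between them.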

Finna and Elonet are not book refs
2001:14BA:9CE5:8400:20AB:2C62:7318:4F88 (talk) 04:35, 22 October 2023 (UTC)


 * This is the third thread about this behaviour visible on this talkpage, and I'm beginning to wonder why, when editors cite a source to establish the existence of a book, it is ever less preferable to include the full publication information of the book, even when the route chosen to establish the book's existence is a website somewhere. I think the root solution here might be additional guidance about writing about books. Like, in an article about a book, just have a section in the appendix called . For articles about authors, put their books in . I don't think this is the right kind of information for inside a citation template inside a pair of ref tags. Folly Mox (talk) 05:00, 22 October 2023 (UTC)
 * I agree with Folly Mox. If the purpose of the citation is to establish the existence of a book, full publication information should always be preferred to a website which says the exact same thing. If I need to establish that Erich S Gruen wrote Last generation of the Roman republic surely the best way to do that would be to give you all the information you would need to find that book in a library and confirm on the cover, title page, and verso for yourself. Ifly6 (talk) 15:17, 23 October 2023 (UTC)


 * And some more:
 * edit 1 (a Finna web page about an image)
 * edit 2 (a Elonet web page about a film)
 * edit 3 (a Finna web page about an image)
 * edit 4 (a Finna database entry about different versions of a book)
 * Probably some more can be found among these causing ref errors, e.g. ,
 * 2001:14BA:9CE5:8400:8CDE:6F36:A6DA:6CE6 (talk) 18:14, 23 October 2023 (UTC)


 * Also, not just  but also   it seems:,  (probably some more). Please stop the bot from changing the citation templates of elonet.fi and finna.fi from "cite web" to "cite book", thank you. 2001:14BA:9CE5:8400:79D9:9129:F234:CDFA (talk) 20:03, 24 October 2023 (UTC)


 * fixed AManWithNoPlan (talk) 20:27, 24 October 2023 (UTC)

Bot adding articles to Category:CS1 errors: missing periodical
At the Kenny Clarke article in the oral history ref, the bot changes "Cite web" to "Cite journal" without changing any other parameters, causing this error message. While checking hidden categories on that page, I discovered that the bot did this in June 2022 and I reproduced the problem just now. Graham87 (talk) 06:50, 23 October 2023 (UTC)


 * fixed with special code for 10.7282 DOIs that have odd meta-data. AManWithNoPlan (talk) 20:33, 24 October 2023 (UTC)
 * fixed for all cases with no work AManWithNoPlan (talk) 20:56, 24 October 2023 (UTC)

web > book template bug
Spinixster  (chat!)  12:53, 30 September 2023 (UTC)

Created {dead link...} with invalid parameter.
The first instance, starting from my edit:
 * (by me) (and 3 more like it) (Not official, though I don't see why.) (Nannyware keeps me from viewing any "archive" websites and/or I didn't have time.)
 * (by Citation bot, Misc citation tidying...) (Made it official, but wrong format) (only got 3 of 4 instances) (disoptimal - should be no space before the closing.)
 * (by AnomieBOT, Dating maintenance tags...) (Corrected the 3 changed by Citation bot.)
 * (by AManWithNoPlan, properly flag dead link) (Fixed the 4th instance to match the other 3.)
 * (by AManWithNoPlan), Rescuing 4 sources and tagging 0 as dead.) #IABot (v2.0.9.5)) (disoptimal - field order should be |access-date= |url-status= |archive-url= |archive-date=.)

Config issue
Today I do not see "Expand citations" in my tools menu. I do not know what caused it to disappear.


 * Today I logged off and logged back onto Wikipedia, and Expand citation was back on my Tools menu.

MDPI url ending in /pdf-vor or /pdf
Treat as if /pdf-vor or /pdf isn't there. Headbomb {t · c · p · b} 01:33, 2 November 2023 (UTC)
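A minimal Python sketch of the requested normalisation follows. It assumes the suffix is the final path segment and that no query string follows it; the function name `strip_mdpi_pdf_suffix` and the example URL in the usage note are made up for illustration.

```python
import re
from urllib.parse import urlparse


def strip_mdpi_pdf_suffix(url: str) -> str:
    """Drop a trailing /pdf or /pdf-vor from MDPI URLs; leave other URLs alone."""
    host = (urlparse(url).hostname or "").lower()
    if host != "mdpi.com" and not host.endswith(".mdpi.com"):
        return url
    # Remove only a final "/pdf" or "/pdf-vor" segment (optional trailing slash).
    return re.sub(r"/pdf(?:-vor)?/?$", "", url)
```

For example, a placeholder URL such as https://www.mdpi.com/1234-5678/1/2/3/pdf-vor would be reduced to the article landing page, which can then be expanded like any other MDPI link.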

Incorrect article title from archive
https://web.archive.org/web/20180312205216/http://www.zdnet.co.kr/news/news_view.asp?artice_id=20140408103154
 * Another example: Special:Diff/1182437641 - https://web.archive.org/web/20150314222820/http://kharkivoda.gov.ua/en/ GoingBatty (talk) 22:26, 29 October 2023 (UTC)

watch out for usurped websites, importing spam text

 * This is a WP:JUDI gambling site. They should be reported to WP:JUDI, so they can be repaired by the automated processes there. There are many 100s of domains, and more being added all the time. It's never ending. If you do match on these titles, it's important to know which domains, so I can usurp them. Currently I find them by looking for spam titles in the wiki code, most often added by Citation bot. If CB were to stop adding these titles, it would be very difficult to find the usurped domains. I've added khyber.org to the list: Special:Diff/1179566502/1183150376 --  Green  C  13:57, 2 November 2023 (UTC)

Cosmetic edit
That is super odd. AManWithNoPlan (talk) 21:24, 2 November 2023 (UTC)
 * The bot adds something and then removes it during the clean-up phase at the end. The net result is that the missing space is added.  AManWithNoPlan (talk) 00:56, 5 November 2023 (UTC)

Bot breaking refs
changing a date to today and breaking a ref in the process. I am sick and tired of Citation bot going around breaking sfn refs willy nilly. DuncanHill (talk) 22:05, 2 November 2023 (UTC)

fixed the bug that was causing extra book clean-ups. But, no idea where that date came from. AManWithNoPlan (talk) 00:55, 5 November 2023 (UTC)

Strip underscore from authorlinks
And similar wikilink parameters. Headbomb {t · c · p · b} 09:12, 17 October 2023 (UTC)

The following? (this would be compared after removing numbers and dashes): authorlink chapterlink contributorlink editorlink episodelink interviewerlink inventorlink serieslink subjectlink titlelink translatorlink AManWithNoPlan (talk) 13:22, 25 October 2023 (UTC)


 * I think so, yes. Headbomb {t · c · p · b} 20:39, 25 October 2023 (UTC)
 * No such thing as chapter-link.
 * —Trappist the monk (talk) 20:47, 25 October 2023 (UTC)
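The comparison described in this thread (match parameter names after removing numbers and dashes, then strip underscores from the value) could look roughly like the Python sketch below. The function names are hypothetical, the set follows the list proposed above with chapterlink left out per the reply, and replacing underscores with spaces is an assumption about the intended result.

```python
import re

# Parameter names as proposed in the thread, minus "chapterlink".
LINK_PARAMS = {
    "authorlink", "contributorlink", "editorlink", "episodelink",
    "interviewerlink", "inventorlink", "serieslink", "subjectlink",
    "titlelink", "translatorlink",
}


def normalize_param(name: str) -> str:
    """Lower-case the name and drop digits and dashes: 'author-link1' -> 'authorlink'."""
    return re.sub(r"[\d\-]", "", name.lower())


def clean_link_value(name: str, value: str) -> str:
    """Replace underscores with spaces, but only for wikilink-style parameters."""
    if normalize_param(name) in LINK_PARAMS:
        return value.replace("_", " ").strip()
    return value
```

Restricting the change to a known set of parameter names keeps underscores intact in values such as URLs, where they are significant.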

Creates title with replacement character
That is the title CrossRef has https://search.crossref.org/?from_ui=yes&q=10.1063%2Fpt.6.4.20200327a AManWithNoPlan (talk) 12:23, 5 November 2023 (UTC)
 * Is there a way Citation bot could check to see if its output matches any characters like that, and decline to fill the value if it does? GIGO is still undesirable behaviour that can be addressed. Folly Mox (talk) 13:42, 5 November 2023 (UTC)
 * A title with partial garbage beats no title at all. Especially since corrupt characters are easily reviewable and rarely last more than a few hours on Wikipedia. Headbomb {t · c · p · b} 18:39, 5 November 2023 (UTC)
 * How will Citation bot know not to alter an existing title to an incorrect one if it doesn't check its output for obvious errors, like a glyph not used in any language? I don't agree that known incorrect output is superior to declining to return a value, but I understand that's probably a philosophical position: software limitations are safer than software errors. Folly Mox (talk) 19:28, 5 November 2023 (UTC)

Below is a method to detect replacement characters. (Not pretty but works.) I added some inline comments because it's an obscure language

proc isbinary*(s: string): bool {.discardable.} =
  # Return true if string contains a 'replacement' or binary character (black diamond with ? in middle)
  #   Based on: https://unix.stackexchange.com/questions/474709/how-to-grep-for-unicode-in-a-bash-script/474812#474812
  #   Requires a secondary shell layer so UTF-8 works
  #   tcsh -s 'grep -axv ".*" '

  result = false  # default return value
  let tmpfile = mktempname(GX.ramdir & "isbinary.")  # Generate a temporary and unique filename "isbinary.xxx" to be located in a ramdisk directory for speed
  s >* tmpfile  # Write the string to the tempfile
  let command1 = "tcsh -c 'grep -axv \".*\" \"" & tmpfile & "\" | wc -l'"  # need to use tcsh -c for UTF-8 to work. Bash with similar -c might also work.
  let c1 = runshellBasic(command1)  # run the shell command and capture output to c1
  if strip(c1) !~ "^0$":  # If the output is not "0" (only) then it contains a replacement character.
    result = true
  removeFile(tmpfile)  # Delete the temp file and return 'result'

— Preceding unsigned comment added by GreenC (talk • contribs) 16:32, 5 November 2023 (UTC)


 * Thank you for the explanation. I reported the issue to Crossref.  GoingBatty (talk) 19:26, 5 November 2023 (UTC)
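As a point of comparison with the shell-based approach in this thread, the same kind of output check can be sketched in a few lines of Python without a temporary file. This is only an illustration, not something the bot is known to do; both function names are hypothetical. The first function looks for the literal replacement character U+FFFD in already-decoded text, and the second reports raw bytes that are not valid UTF-8.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, usually rendered as a diamond with a question mark


def has_replacement_char(text: str) -> bool:
    """True if decoded text already contains U+FFFD."""
    return REPLACEMENT_CHAR in text


def contains_invalid_utf8(raw: bytes) -> bool:
    """True if the bytes cannot be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

A caller could decline to fill or overwrite a title when either check returns True, which is the behaviour requested earlier in the thread.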

Enhancement request - support more pdfs from major organisations
As you know the "automatic" option in the Visual Editor cite button does not support any pdfs, perhaps because it would be too slow. As this bot is not constrained as much for time it would be great if the bot could expand more pdfs from major organisations. For example the second cite in Agriculture_in_Turkey namely https://www.g20.org/content/dam/gtwenty/gtwenty_new/document/G20_Report_on_Macroeconomic_impacts.pdf Chidgk1 (talk) 09:08, 6 November 2023 (UTC)

wontfix - PDF files, and the bot uses https://en.wikipedia.org/api/rest_v1/#/Citation/getCitation which is outside our control

weird journal overwriting

 * This is the sort of thing that happens when you have a bot whose entire philosophy is "anything the publisher says must be correct and any deviation from that by the person who formatted the citation in the first place must be an error". That aside, the citation was garbage to begin with, as you might have guessed. The doi and editors go to the book "Casimir physics", the arxiv goes to a chapter inside the book (whose author and title are not mentioned), and the pmid and authors go to a different paper "Casimir physics". It looks like the sort of thing that happens when one bot makes a mistake and adds the wrong id to a citation and another bot runs with it to fill in all the details and remove the other details that don't fit. In the long run as humans get tired of chasing after bot mistakes all our citations will become this garbled. —David Eppstein (talk) 07:48, 7 November 2023 (UTC)
 * So much GIGO in that reference. I have cleaned up the reference and removed the bogus PMID, etc. AManWithNoPlan (talk) 14:10, 7 November 2023 (UTC)
 * I didn't even notice the GIGO. That was a bad one. Headbomb {t · c · p · b} 21:26, 7 November 2023 (UTC)

Could a simple overview be added to the "function summary" please
Although this is a very useful bot I am struggling to understand what it can and cannot do and how it works.

I recently submitted a bug report, and a couple of enhancement requests asking if the bot could be run on pdf files and they were immediately closed because there is an api which is "outside our control".

Could the "function summary" be rewritten with a first paragraph to explain what the bot does in very simple terms and a second para to explain how it works in very simple terms and relegate the technical explanation to later paragraphs?

Also it would be useful if the "won't fix" could be left here for a couple of days for us to read rather than being immediately archived.

Chidgk1 (talk) 12:21, 7 November 2023 (UTC)
 * You can always read the archives. AManWithNoPlan (talk) 14:13, 7 November 2023 (UTC)
 * Lack of PDF support added to description. AManWithNoPlan (talk) 15:45, 7 November 2023 (UTC)
 * documentation fixed AManWithNoPlan (talk) 02:23, 11 November 2023 (UTC)

Caps: Antibiotiki i Khimioterapiia
And it should leave every other 'I' alone too. This is particularly annoying. The only 'I' that needs capitalization are those from Part I, Section I, etc... Headbomb {t · c · p · b} 22:41, 10 November 2023 (UTC)
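The rule stated here can be sketched as follows, assuming that the only contexts in which a standalone "i" is a Roman numeral are the words "Part" and "Section" (the "etc." in the request suggests the real list is longer). The function name `capitalize_numeral_i` is hypothetical.

```python
import re

# Assumed trigger words; extend as needed.
_NUMERAL_CONTEXT = re.compile(r"\b(Part|Section)(\s+)i\b")


def capitalize_numeral_i(title: str) -> str:
    """Upper-case a standalone 'i' only when it follows Part or Section."""
    return _NUMERAL_CONTEXT.sub(lambda m: m.group(1) + m.group(2) + "I", title)
```

With this rule the conjunction in "Antibiotiki i Khimioterapiia" is left untouched, while a numeral after "Part" or "Section" is still capitalized.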

Adding CS1|2 templates to bare URLs badly
If the goal is to wrap a URL in a citation template so Internet Archive picks it up, and there's no good translators available for the domain, just set the (URL) so it's obvious the citation is incomplete and needs work. This sort of lazy not-citation is essentially worthless, and encourages people to use scripts for tasks the scripts are not ready to handle, instead of putting in the one minute of work it takes to create a real citation by looking at the source. If Citation bot can't figure out anything from the URL except the title, it should either leave the link alone, set the title to the URL, or tag its change with a template like citation needs human review so this sort of garbage can be tracked. Apologies for the strong language, but if we train a whole generation of editors to rely on pushbutton non-solutions, the maintenance burden of trash citations is going to outpace our capacity and never be fixed. Folly Mox (talk) 12:06, 20 November 2023 (UTC)
 * I hate this, and I'm not sure it has consensus, but even I know it's not a bug. Folly Mox (talk) 15:12, 26 November 2023 (UTC)

Allow 1-click activation of category run on Category:CS1 maint: unflagged free DOI, much like Category:CS1 errors: DOI and the others
Also Category:CS1 errors: dates Headbomb {t · c · p · b} 12:23, 26 November 2023 (UTC)

last1=(punctuation mark)

 * Another instance of this same error at Special:MobileDiff/1186954603. Folly Mox (talk) 16:36, 26 November 2023 (UTC)

Better IEEE Xplore handling
The garbage human-entered title prevented the full expansion. Wondering if we can't just yeet the title out when converting a cite web to a cite journal/book for ieeexplore links. It's a highly-reliable database. Either way, the website= parameter should be nuked. Headbomb {t · c · p · b} 00:02, 16 November 2023 (UTC)

Messing with correct citation

 * Last week or the week before, I fixed a lot of instances of this error introduced by Citation bot, and I noticed but did not mention that the "BnF catalog" source seemed to be a major stumbling block. I'm not sure why Citation bot thinks it's a book, but it's common enough and invariably incorrect enough that the source could probably be put on some sort of exclusion list rather than coding more complicated logic. Folly Mox (talk) 12:55, 1 December 2023 (UTC)

Journal capitalization
On Wikipedia, we follow MOS:TITLECAPS. If journals want to style themselves differently, that's up to them, but we're not bound to follow. Headbomb {t · c · p · b} 20:54, 2 December 2023 (UTC)
 * The bot isn't even following this policy you've cited. It capitalised the indefinite article "a" which is not allowed by the policy. Super Dromaeosaurus (talk) 21:01, 2 December 2023 (UTC)
 * You capitalize the start of subtitles. Headbomb {t · c · p · b} 21:10, 2 December 2023 (UTC)
 * That would make sense if it were a subtitle set off by a colon. But here, it follows a comma, which would not usually force capitalization of the next word. —David Eppstein (talk) 21:13, 2 December 2023 (UTC)

Enable 1-click activation of Category:CS1 errors: dates
Per the update description in the category. Headbomb {t · c · p · b} 23:04, 2 December 2023 (UTC)

AManWithNoPlan (talk) 02:13, 3 December 2023 (UTC)

Leaves journal= parameter in cite arXiv
Not sure how to handle that, in general. AManWithNoPlan (talk) 01:53, 4 December 2023 (UTC)

Replaces – with �
That URL was already broken before Citation bot got to it. Truncating it at the dash glyph might have fixed it. Folly Mox (talk) 01:19, 3 December 2023 (UTC)
 * Before Citation bot got to it, the existing URL redirected to a different Google Books page. Truncating the URL at the dash glyph would not have changed the redirect, and I don't expect Citation bot to try to do that.  However, Citation bot made it worse, and this edit added the article for Category:CS1 errors: invisible characters  GoingBatty (talk) 03:00, 4 December 2023 (UTC)
 * I agree that the edit was a disimprovement, definitely. Both the before and after versions of the gbooks url redirect to the same page for me, but I might be doing it wrong. Folly Mox (talk) 03:27, 4 December 2023 (UTC)
 * @Folly Mox: I agree that the before and after versions (and truncating it at the dash glyph) redirect to the same page. GoingBatty (talk) 04:23, 4 December 2023 (UTC)
 * It definitely looks like a bug. I'm not sure why I hopped in to defend Citation bot here, apart from the fact that I've been showing up here as a critic or reporter of bugs more days than not. I think what should have happened was for me to do nothing. Folly Mox (talk) 04:45, 4 December 2023 (UTC)
 * @Folly Mox: Reporting bugs is important, as is discussing them. I hope the maintainers find our discussion helpful. GoingBatty (talk) 04:54, 4 December 2023 (UTC)

"series" parameter
See also User:Citation_bot/use. &#32; Headbomb {t · c · p · b} 00:18, 15 November 2023 (UTC)
 * The Michigan State University library catalog lists this as having Publications of the Surtees Society rather than the shorter removed series name. No idea whether this would affect the bot's attempted removals. It also needs a publisher; following the same catalog entry, it looks like Andrews & co. and B. Quaritch for the Surtees Society would be accurate. —David Eppstein (talk) 01:29, 15 November 2023 (UTC)

nonsense journal to cite book
Failing to understand "OUP Academic" as "Oxford University Press" (already present, correctly in publisher) is one thing; adding an unsupported journal parameter to cite book is something I thought Citation bot was better than. Folly Mox (talk) 20:28, 11 November 2023 (UTC)
 * Sassy and unconstructive. Struck with apologies. I've been getting back into ReferenceExpander repair, hoping that maybe we'll get those first 2500 diffs checked and fixed by the end of the year. There are another ~3000 we've barely started on, and I feel scared and hurt when I see high volume citation scripts making errors that could be avoided by careful output checking. My own error rate is probably higher. Folly Mox (talk) 13:19, 12 November 2023 (UTC)

page generated un error
and are both of the new problematic "article number" type. issue is wrong, and page is less than ideal, but the best the CS1 and 2 have for us at the moment. The journals clearly state to not use issue for these in the "how to cite" areas. AManWithNoPlan (talk) 20:54, 7 November 2023 (UTC)
 * Umm, nope, supports article-number:
 * —Trappist the monk (talk) 21:02, 7 November 2023 (UTC)
 * I will look into adding support for this parameter. Adding is easy, but dealing with all the edge cases (removing matching pages, etc), will require some work.  AManWithNoPlan (talk) 22:11, 29 November 2023 (UTC)

Use Project MUSE book parameter instead of adding URL

 * If CS1|2 has a supported ID, it should be used. If there is a template for the ID, it could also be used, although I am personally against URL templates because they create link rot - archive bots have to be programmed to support them, but there are so many thousands of different templates it is impractical to provide automated support. And in this case, the template is more characters than simply using the plain URL, which all standard tools can recognize and support. -- Green  C  17:35, 26 October 2023 (UTC)

cosmetic feature request suggestion
Maybe 5 years ago, IABot had a bug that added a "#" to the end of every archive URL, and sometimes the source URL. The bug is long fixed, and WaybackMedic has been removing the errant #'s, but it's a cosmetic edit that can only be done when making another edit to the page, so it's been a long process. There are a lot of them. An example: Special:Diff/1183493290/1185983128 (second change). My code below if interested, no edge cases, simply removing any trailing # from the URLs. It won't break the archive URL.

 # Fix trailing # in |url and |archive-url added by IABot 2.0 beta10
 psplit(GX.articlework, GX.cite2, p):
   if isarg("archive-url", "value", p.field[i]) and isarg("url", "value", p.field[i]):
     archiveurl = getarg("archive-url", "clean", p.field[i])
     sourceurl = getarg("url", "clean", p.field[i])
     j = 0
     if archiveurl ~ "[#]$":
       inc(j)
       sub("[#]$", "", archiveurl)
       p.field[i] = replacearg(p.field[i], "archive-url", archiveurl, "cosmetic1.1")
     if sourceurl ~ "[#]$":
       inc(j)
       sub("[#]$", "", sourceurl)
       p.field[i] = replacearg(p.field[i], "url", sourceurl, "cosmetic1.2")
     if j > 0:
       p.ok += inclog("cosmetic1.1", GX.esformat, Project.logiats, &"{archiveurl} remove trailing #")

psplit iterates over every cite template; each one is held in p.field[i]. Green C  17:35, 20 November 2023 (UTC)
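For anyone who wants to try the same cleanup outside WaybackMedic, here is a minimal Python sketch of the logic in the snippet above. It is only an illustration: the function name `strip_trailing_hash` and the plain dict of parameters are assumptions of this sketch, not part of WaybackMedic or Citation bot.

```python
import re
from typing import Dict, Tuple


def strip_trailing_hash(params: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Return a copy of params with one trailing '#' removed from url and archive-url.

    Hypothetical helper: params maps citation parameter names to their values.
    The second return value counts how many parameters were changed.
    """
    fixed = dict(params)
    changes = 0
    # Mirror the guard in the snippet above: act only when both parameters exist.
    if "url" in fixed and "archive-url" in fixed:
        for key in ("archive-url", "url"):
            cleaned = re.sub(r"#$", "", fixed[key])
            if cleaned != fixed[key]:
                fixed[key] = cleaned
                changes += 1
    return fixed, changes
```

A "#" followed by a real fragment (for example `page#section`) is left alone, since only a "#" at the very end of the value is matched.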


 * AManWithNoPlan (talk) 17:27, 4 December 2023 (UTC)

Surtees Society
Still needs fixed. Fix did not work. AManWithNoPlan (talk) 00:41, 5 December 2023 (UTC)


 * AManWithNoPlan (talk) 17:12, 5 December 2023 (UTC)

10.22323 is open access
Covers both Proceedings of Science and Journal of Science Communication, from SISSA. The other SISSA journals have different prefixes. &#32; Headbomb {t · c · p · b} 02:25, 5 December 2023 (UTC)
 * AManWithNoPlan (talk) 18:07, 5 December 2023 (UTC)

10.15347 is free to read
WikiJournals &#32; Headbomb {t · c · p · b} 15:36, 10 December 2023 (UTC)


 * AManWithNoPlan (talk) 21:03, 10 December 2023 (UTC)

Adds doi-access=free for broken DOI
Unfortunately this journal is not preserved so there are no archived copies either. Nemo 11:19, 3 December 2023 (UTC)
 * For the cases where the DOI used to provide a gratis copy but no longer does, see . Nemo 11:37, 3 December 2023 (UTC)
 * That the DOI is broken is a separate issue from its free-to-read status. Once repaired, the DOI will be free. &#32; Headbomb {t · c · p · b} 12:12, 3 December 2023 (UTC)
 * And how do you know that? Nemo 13:31, 3 December 2023 (UTC)
 * It's originally from Medknow. All Medknow journals/DOIs are open access. &#32; Headbomb {t · c · p · b} 13:54, 3 December 2023 (UTC)
 * Or were. Now that they've been migrated, anything could happen. This journal has a nonfree license so it could vanish unless someone archives it. If all Medknow DOIs are broken right now, I agree it's likely they'll be fixed within a few months by LWW, but in the meanwhile they're not a suitable link target so it makes no sense to add doi-access=free. Nemo 14:33, 3 December 2023 (UTC)
 * Actually, not all Medknow DOIs are broken, for example The Journal of Indian Prosthodontic Society has functioning DOIs issued by Springer, like 10.1007/s13191-013-0262-x, for 2010–2014. (Didn't check the rest of the archive.) Have you sampled the DOIs under non-Springer prefix to see how many are working? Nemo 14:47, 3 December 2023 (UTC)
 * "This journal has a nonfree license" CC BY-NC-SA is a free-to-read license. &#32; Headbomb {t · c · p · b} 15:12, 3 December 2023 (UTC)
 * "Actually, not all Medknow DOIs are broken" I compliment you on finding one that actually works. AManWithNoPlan (talk) 22:09, 3 December 2023 (UTC)
 * This just blew up because of this https://en.wikipedia.org/wiki/Category:CS1_maint:_DOI_inactive_as_of_December_2023 AManWithNoPlan (talk) 18:53, 4 December 2023 (UTC)
 * I personally patrol this page and report ALL bad DOIs. Many of them point to the wrong place since the journal has been purchased. Or they are data DOIs that are not part of crossref, so who knows.  Or they are MedDontKnow.  AManWithNoPlan (talk) 19:19, 4 December 2023 (UTC)
 * I have no idea what a "free-to-read license" is. A free license is a well-defined concept. A "free-to-read" source is an English-Wikipedia specific moving concept vaguely defined at Access indicators for url-holding parameters. Mixing the two expressions serves no purpose. Nemo 22:36, 4 December 2023 (UTC)
 * I think that the idea of open-source journals that you cannot find is funny, but I do think that keeping the DOIs in the articles is good, since you can sometimes google them and find a copy online. AManWithNoPlan (talk) 19:16, 5 December 2023 (UTC)
 * Keeping the DOI is useful, making it auto-link less so. Nemo 09:20, 8 December 2023 (UTC)
 * If the bot thinks the DOI works, then it will not add the free. AManWithNoPlan (talk) 14:23, 8 December 2023 (UTC)
 * That's an issue for the template to handle, not a reason to not flag things that should be flagged. And the template disables automatic linking via doi-broken-date. &#32; Headbomb {t · c · p · b} 15:01, 8 December 2023 (UTC)
 * Again, we can't know whether the DOI provides a free-to-read copy when we don't even know where the copy is supposed to be. (Yes I know we were discussing this elsewhere, I'm in a hurry now.) But good the autolinking is disabled by the broken-doi parameter; the green lock should be as well. Nemo 15:09, 8 December 2023 (UTC)
 * "we can't know whether the DOI provides a free-to-read copy"
 * Yes we can. &#32; Headbomb {t · c · p · b} 15:33, 8 December 2023 (UTC)
 * How? Nemo 22:08, 10 December 2023 (UTC)
 * Look at the registrant. Is the registrant a fully open access publisher? If yes, then yes. &#32; Headbomb {t · c · p · b} 00:06, 11 December 2023 (UTC)

unsupported parameters when changing template type to cite document
This was my fix: changing back to cite web, adding the url of the source, and an unrelated fix to publisher. I'm not sure this is really Citation bot's fault, or if maybe the parameter set supported by cite document ought be expanded to allow for more stable identifiers. Pinging as the template maintainer, to see if they have input. Folly Mox (talk) 19:11, 9 December 2023 (UTC)
 * Yeah, converting to  is not going to work when url, citeseerx, and s2cid have assigned values.  s2cid is excluded from  because links to readable copies of the source from that identifier are hit-or-miss at best (recall the plethora of complaints about the bot adding s2cid that have been voiced on this talk page).  citeseerx is excluded because we have.
 * Because the original template had 10.1.1.42.3374, an alternate fix might be:
 * is a 'last resort' sort of template when absolutely none of the other cs1|2 templates apply. The bot should avoid using  because, almost always, there is a better choice.
 * —Trappist the monk (talk) 19:47, 9 December 2023 (UTC)

10.1074 and 10.1194 are open access
&#32; Headbomb {t · c · p · b} 01:20, 11 December 2023 (UTC)

AManWithNoPlan (talk) 13:27, 11 December 2023 (UTC)

One cache to rule them all.
Note to me for when I have time. AManWithNoPlan (talk) 16:26, 10 December 2023 (UTC)

Incorrect title
Here's another edit in Gujarati. — Preceding unsigned comment added by GoingBatty (talk • contribs) 20:11, 12 December 2023 (UTC)

Fails to remove invisible character
The character is &#8203; (zero width space). &#32; Headbomb {t · c · p · b} 00:07, 13 December 2023 (UTC)
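As a rough illustration of the cleanup being asked for, the following Python sketch removes the zero width space (U+200B) from a parameter value, both as a literal character and as a numeric character reference. The function name and the list of entity spellings are assumptions of this sketch and say nothing about how Citation bot actually does it.

```python
ZERO_WIDTH_SPACE = "\u200b"  # the same code point that &#8203; refers to
ENTITY_FORMS = ("&#8203;", "&#x200B;", "&#x200b;")  # assumed spellings to catch


def strip_zero_width_space(value: str) -> str:
    """Remove U+200B from value, whether literal or written as an entity."""
    cleaned = value.replace(ZERO_WIDTH_SPACE, "")
    for entity in ENTITY_FORMS:
        cleaned = cleaned.replace(entity, "")
    return cleaned
```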

After inspection, all of 10.5210 are free access, not just 10.5210/fm
&#32; Headbomb {t · c · p · b} 01:22, 15 December 2023 (UTC)
 * AManWithNoPlan (talk) 15:23, 15 December 2023 (UTC)
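The prefix-based requests on this page all follow the "look at the registrant" idea: if every DOI under a prefix is free to read, the prefix alone decides. A minimal Python sketch of that lookup, assuming a hand-maintained set, could look like this. The set below only collects the prefixes named in the threads on this page; the names `FREE_DOI_PREFIXES` and `is_free_by_prefix` are made up for the illustration and are not the bot's own.

```python
# Prefixes reported on this page as open access / free to read.
FREE_DOI_PREFIXES = {"10.22323", "10.15347", "10.1074", "10.1194", "10.5210"}


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix, i.e. everything before the first slash."""
    return doi.strip().split("/", 1)[0]


def is_free_by_prefix(doi: str) -> bool:
    """True when the DOI's registrant prefix is on the free-to-read list."""
    return doi_prefix(doi) in FREE_DOI_PREFIXES
```

Whether a flag should still be added while the DOI itself is broken is the separate question argued in the doi-access thread; this sketch does not try to answer it.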

Specifying name list style for newly-added name entries
There is a pull request that allows specifying name list style for newly-added name entries: https://github.com/ms609/citation-bot/pull/4236

It adds an option to the already existing style of first1/last1, first2/last2, etc.

This pull request introduces the following functionality. If a page contains template, then the bot will use |vauthors= and |veditors= attributes rather than firstN/lastN and editor-firstN/editor-lastN when adding name entries for a citation template if the names were not specified in this template. This is similar to template when the bot uses date format as specified on the page. To reproduce this behaviour, edit a page on Wikipedia, add template (or  ), delete author names (firstN/lastN) and run the bot. It will fill the names as vauthors. Maxim Masiutin (talk) 16:48, 7 December 2023 (UTC)
 * Why does exist?  Was there any discussion that brought it into existence?  cs1|2 doesn't know anything about that template but will understand .  Why create a new otherwise non-functional template?
 * —Trappist the monk (talk) 18:26, 7 December 2023 (UTC)
 * I agree, we can use . Should we use  ? If yes, I will update the pull request. Anyway,  is not currently supported by the Citations Bot. Maxim Masiutin (talk) 18:30, 7 December 2023 (UTC)
 * However the templates and  are different.  controls how the names are displayed during the render, whereas  does not affect the rendering but is a hint on whether the templates should use first/last or vauthors attribute, in analogy to  which also does not control the output but hints how the dates should be specified in the source. This answers your question on why  exist and how it is different from  . Maxim Masiutin (talk) 15:33, 9 December 2023 (UTC)
 * You are mistaken. cs1|2 uses  and  to control date formatting when cs1|2 templates are rendered.  See  for example.  I see no reason to keep.
 * —Trappist the monk (talk) 15:41, 9 December 2023 (UTC)
 * @Trappist the monk thank you for letting me know! Why then are there separate templates for use dmy dates if this can be solved by " "? That is the same question you asked me about the name list style.
 * Anyway, my proposal is not about a particular template but about the functionality of the bot to adhere to the name list style specified for the page. My pull request can be adjusted to any template, and we need a consensus. Maxim Masiutin (talk) 16:16, 9 December 2023 (UTC)
 * The templates came first (January 2009).  Development of Module:Citation (the predecessor to Module:Citation/CS1) began August 2012.  Auto date formatting was added to Module:Citation/CS1 April 2019.  Support for  was added August 2023.   applies only to cs1|2 templates but the  templates apply to both the article body and to article referencing (regardless of how referencing is implemented).
 * —Trappist the monk (talk) 16:37, 9 December 2023 (UTC)
 * My initial proposal for discussion was on a pull request that allows specifying name list style for newly-added name entries: https://github.com/ms609/citation-bot/pull/4236
 * It adds an option of specifying vauthors to the already existing style of first1/last1, first2/last2, etc.
 * Is my understanding correct that you support the feature based on but not on  so that this template should never be used at all.
 * If you support the feature based on, I will modify the pull request. Maxim Masiutin (talk) 21:16, 11 December 2023 (UTC)
 * I am indifferent about whether or not Citation bot applies Vancouver style to cs1|2 templates. From the number of participants in this discussion it would seem that the response to the proposal is a resounding 'meh'.  If the bot is going to apply Vancouver style based on some sort of flag template, that template should be  because that template has functionality beyond being a simple flag template.
 * —Trappist the monk (talk) 00:13, 12 December 2023 (UTC)
 * I updated the pull request and I hope that the maintainers will accept it. Maxim Masiutin (talk) 22:00, 13 December 2023 (UTC)
 * Thank you for your guidance! I updated the pull request to support and the maintainers of the citations bot accepted this pull request, also making the necessary adjustments. So if you now take the citations bot from Github, it will support this feature. Thank you again for your feedback and guidance. Maxim Masiutin (talk) 04:41, 15 December 2023 (UTC)

Expand non-templated refs
Would it be possible to expand from non-templated reference, as long as title would be exactly the same  which already exists for the URL specified as if the bot would try to expand the bare URL (as long as there is no other content in the ref)? Jonatan Svensson Glad (talk) 17:16, 24 July 2023 (UTC)


 * Example here, I had to remove the brackets and the already provided title prior to running the bot. The outcome provided the exact same title as was already present prior to me doing the removal, causing a lot of manual labor in order to get the bot to attempt to expand the citation. Jonatan Svensson Glad (talk) 17:19, 24 July 2023 (UTC)


 * How close should the titles have to be? Also, it seems that from my experience, the title is often some mix of the title and journal and authors.  AManWithNoPlan (talk) 20:08, 14 August 2023 (UTC)
 * Well, a first start could be exact "only-title" match inside square brackets (with only a preceding period/dot inside or outside the brackets being the difference). To later build upon with more possibilities... Jonatan Svensson Glad (talk) 21:01, 14 August 2023 (UTC)
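To make the proposed "first start" concrete, here is a small Python sketch of the check: a reference qualifies only if it is nothing but a bracketed external link whose label equals the title the bot would fetch anyway, with a period just inside or outside the brackets as the only tolerated difference. The function name, the regular expression, and the idea of passing in the fetched title are all assumptions of this sketch, not existing bot behaviour.

```python
import re
from typing import Optional

# Matches a ref that contains only "[URL label]", optionally followed by a period.
BRACKETED_LINK = re.compile(r"^\s*\[(\S+)\s+(.+?)\]\s*\.?\s*$")


def expandable_bare_link(ref_text: str, fetched_title: str) -> Optional[str]:
    """Return the URL if the ref is safe to expand under the exact-match rule."""
    match = BRACKETED_LINK.match(ref_text)
    if match is None:
        return None  # other content in the ref, or no label at all
    url, label = match.group(1), match.group(2).strip()
    # Ignore a trailing period on either side before comparing.
    if label.rstrip(".").strip() == fetched_title.rstrip(".").strip():
        return url
    return None
```

Anything that returns None would be left untouched, which keeps the mixed "title plus journal plus authors" labels mentioned above out of scope.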

arxiv is not a journal
If you run the bot again, then it does clean up. I will look at having it not take two times. AManWithNoPlan (talk) 15:10, 9 December 2023 (UTC)

Bug? The bot should not replace first/last to first1/last1 when there is just one author
According to Help:Citation Style, An author may be cited using separate parameters for the author's surname and given name by using

However, the bot replaces |last= and |first= with |last1= and |first1= even when there is just one author, which is contrary to the description of the CS1 Citation Style.

The bot probably should not go back and revert the ones it has already changed, but it should definitely avoid making that change in the future. Also, when no authors were specified and there is a single author, the bot should use |last= and |first=.

If you agree with that, I can try to submit a pull request. Maxim Masiutin (talk) 15:38, 9 December 2023 (UTC)


 * Could you give an example of where the bot changed last to last1 when there is no second author? AManWithNoPlan (talk) 16:34, 9 December 2023 (UTC)
 * wontfix, since the complexity of going back and changing them will just make the bot's author handling that much more insane, and it is already complicated enough. AManWithNoPlan (talk) 15:26, 15 December 2023 (UTC)

Use of template "ODNB"
Citation bot changed one of the source descriptions in the article James Hamilton (English Army officer) from:

""

to:

"" I wondered why. I read up on Template:ODNB. It says it is a wrapper around Template:Cite encyclopedia. Well, perhaps I should not have used "Cite web" but "Cite encyclopedia" and Citation bot should probably have corrected me to:

""

However, I do not understand why we should be forced to use a wrapper around Cite encyclopedia rather than the original. I thought the use of the ODNB template was voluntary and not obligatory. With thanks and best regards Johannes Schade (talk) 13:10, 19 November 2023 (UTC)

treat #invoke:cite foo | as cite foo
Note, it shouldn't remove the extra pipe. &#32; Headbomb {t · c · p · b} 18:54, 17 December 2023 (UTC)
 * I have had this on my to-do list for a while. Those huge pages are rare, but seem to be the ones with the insane number of citations that need fixed. AManWithNoPlan (talk) 13:39, 18 December 2023 (UTC)
 * For those wondering why, does the same as  . AManWithNoPlan (talk) 13:43, 18 December 2023 (UTC)
 * My tasks : add tests, change wikiname function, and make sure extra pipe does not get removed, since in normal templates that is an error. AManWithNoPlan (talk) 13:45, 18 December 2023 (UTC)
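The name-normalisation part of those tasks can be sketched in a few lines of Python. This is only an illustration of the idea that the text before the first pipe, such as `#invoke:cite web`, should be treated like `cite web`; the function name `effective_template_name` and the regular expression are assumptions of the sketch and are not the bot's wikiname function.

```python
import re

# Text before the first pipe of a template call, e.g. "#invoke:cite web".
INVOKE_CITE = re.compile(r"^\s*#invoke:\s*(cite [a-z ]+?)\s*$", re.IGNORECASE)


def effective_template_name(raw_name: str) -> str:
    """Map '#invoke:cite foo' to 'cite foo'; other names are just normalised."""
    match = INVOKE_CITE.match(raw_name)
    if match:
        return match.group(1).lower()
    return raw_name.strip().lower()
```

The sketch deliberately looks only at the name. The empty parameter that follows in the `#invoke` form has to be kept as it is, as noted above.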

When databases collide
This edit changed a proceedings title from the version given by DBLP ("Proceedings of the 22nd Annual European Symposium on Algorithms (ESA 2014), Wroclaw, Poland, September 8–10, 2014") to a much more concise version from another source ("Algorithms - ESA 2014"), maybe the publisher or maybe MathSciNet (both list it that way). Note that the actual publisher page for the full proceedings lists it as having the more detailed title "Algorithms - ESA 2014: 22th Annual European Symposium, Wrocław, Poland, September 8-10, 2014. Proceedings". The DBLP title is more or less what you get if you put that into a more intelligible order. Curiously, the bot left the DBLP title in place for the other citation it touched, from WG '92. I think that the DBLP version is better and that this level of change (not the correction of any actual error in a citation) constitutes WP:CITEVAR. Please stop. —David Eppstein (talk) 06:46, 30 October 2023 (UTC)

Conversion to MathML conflicts with the Math extension
The title in CrossRef is "Measurement of lepton universality parameters in \nB<\/mml:mi>+<\/mml:mo><\/mml:msup>\u2192<\/mml:mo>K<\/mml:mi>+<\/mml:mo><\/mml:msup>\u2113<\/mml:mo>+<\/mml:mo><\/mml:msup>\u2113<\/mml:mo>\u2212<\/mml:mo><\/mml:msup><\/mml:math>\n and \n<mml:mi>B<\/mml:mi><mml:mn>0<\/mml:mn><\/mml:msup><mml:mo stretchy=\"false\">\u2192<\/mml:mo><mml:msup><mml:mi>K<\/mml:mi><mml:mrow><mml:mo>*<\/mml:mo><mml:mn>0<\/mml:mn><\/mml:mrow><\/mml:msup><mml:msup><mml:mo>\u2113<\/mml:mo><mml:mo>+<\/mml:mo><\/mml:msup><mml:msup><mml:mo>\u2113<\/mml:mo><mml:mo>\u2212<\/mml:mo><\/mml:msup><\/mml:math>\n decays" which makes it difficult to clean up. The tags that are not used are the annoying ones. https://github.com/ms609/citation-bot/commit/36648e552b4bf9b4f1e7ff1c88383701e79c95e0 AManWithNoPlan (talk) 21:02, 10 December 2023 (UTC)
 * Why not just like, not alter an existing title if the one pulled from Crossref is substantially different? or contains difficult markup? Existing, human-generated titles are more likely to be accurate: database titles often have non-compliant casing, poor OCR, and incompatible markup. No fault against adding a title where none exists, but changing them to match Crossref is problematic. Folly Mox (talk) 21:46, 10 December 2023 (UTC)
 * The problem is that a DOI was added to an arXiv citation, and with the update to the published version the published title is used instead. Clearly a rare event, with the new title being a problem, but I will look into actually parsing the math.  The few math tags that are ignored pre-date my work on the bot, and were flagged as a someday wish-list item.  AManWithNoPlan (talk) 21:53, 10 December 2023 (UTC)
 * It's going to be tricky parsing the mathml, and converting it back to Wikipedia format LaTeX. What might work is to simply enclose the foreign tags inside a . There is another citation in the same revision (actually the same paper) where this has been done. Probably anything with mathematics in it is going to need human attention. Some sort of tracking for these occurrences could be useful. --Salix alba (talk): 12:15, 12 December 2023 (UTC)
 * "Going to need human attention" implies "bot should not think it knows how to change the title". It should give up and not produce garbage.
 * In this instance, arXiv actually has a usable title: the command line
 * produces
 * which requires only changing the dollar signs to math tags to render correctly. The other thing with the mathml is unfit for human consumption and not a useful start to producing a readable title. —David Eppstein (talk) 18:52, 12 December 2023 (UTC)
 * doi.org titles are generally much worse quality than crossref api. AManWithNoPlan (talk) 21:05, 12 December 2023 (UTC)
 * Set them to be wrapped in nowiki tags. AManWithNoPlan (talk) 15:37, 15 December 2023 (UTC)
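For readers wondering what "wrapped in nowiki tags" could look like in practice, here is a minimal Python sketch. It assumes the whole title is wrapped whenever any `mml:` tag is found; whether the bot wraps the whole title or only the individual tags is not stated in this thread, and the names `MATHML_TAG` and `protect_mathml` are invented for the example.

```python
import re

# Any opening or closing tag in the mml: namespace, with optional attributes.
MATHML_TAG = re.compile(r"</?mml:[A-Za-z]+[^>]*>")


def protect_mathml(title: str) -> str:
    """Wrap a title in nowiki tags if it contains MathML markup (assumed policy)."""
    if MATHML_TAG.search(title) and "<nowiki>" not in title:
        return "<nowiki>" + title + "</nowiki>"
    return title
```

This only stops the markup from being interpreted as wikitext; it does not make the title readable, so such citations would still need human attention as noted above.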

invoke other template
Bot rebooted to make sure no running jobs continue to use old code. AManWithNoPlan (talk) 15:31, 21 December 2023 (UTC)

Fails to fix date
Date/year/access-date/archive-date/etc.

AManWithNoPlan (talk) 00:52, 22 December 2023 (UTC)

please link to the new Google books web pages
This edit changed links that consistently lead to the new Google books web pages to ones that do not. 50.47.144.129 (talk) 19:49, 30 October 2023 (UTC)
 * Good question. Right now wikipedia prefers https://books.google.com/books?id=fp9wrkMYHvMC but should this be swapped to https://www.google.com/books/edition/_/fp9wrkMYHvMC AManWithNoPlan (talk) 15:41, 7 November 2023 (UTC)

Replace hyphen-like with hyphen in author names
The culprit is U+2010 : HYPHEN, which should be replaced with the standard U+002D : HYPHEN-MINUS. &#32; Headbomb {t · c · p · b} 03:56, 21 December 2023 (UTC)
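The requested replacement is a one-line substitution. A minimal Python sketch, with a function name chosen only for this example:

```python
HYPHEN = "\u2010"        # U+2010 HYPHEN, the character causing the problem
HYPHEN_MINUS = "\u002d"  # U+002D HYPHEN-MINUS, the ordinary keyboard hyphen


def normalize_name_hyphens(name: str) -> str:
    """Replace U+2010 with U+002D in an author name; leave everything else alone."""
    return name.replace(HYPHEN, HYPHEN_MINUS)
```

Other dash-like characters, such as the en dash, are deliberately not touched here, since the request only names U+2010.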

Repair url=www...
i.e. change url=www.

to

url=https://www.

(same for chapter-url, archive-url, etc...)

&#32; Headbomb {t · c · p · b} 15:57, 16 December 2023 (UTC)


 * How common is this? AManWithNoPlan (talk) 01:53, 20 December 2023 (UTC)
 * Possibly a lot, I haven't really checked, but a quick search shows this . There are a lot of other templates polluting this search, but from the first 100 results, it's at least an issue in, so around 1% of 1500? 15 articles ish?
 * It's more an issue of those getting normally fixed fairly easily by AWB runs or the like, which could be also done by this bot. &#32; Headbomb {t · c · p · b} 02:03, 20 December 2023 (UTC)
 * Note to self, look at ALL_URL_TYPES array AManWithNoPlan (talk) 23:08, 21 December 2023 (UTC)
 * @Headbomb: I have an AWB bot that goes through Category:CS1 errors: URL that I run a few times a month to fix issues like this, including the example you provided in this edit. The category only has 5,176 articles out of  articles, and most of those require manual intervention, so the commonality is well below 1%.  Your search captures false positives, such as url in infoboxes (which should probably use URL) and URLs that contain "url=" in the middle.  This modified search finds no articles to be fixed.  However, if you find patterns in the category that bots can fix, please let me know.  GoingBatty (talk) 23:11, 22 December 2023 (UTC)
 * @Headbomb: ...and I just manually used AWB to fix the drafts found in the category. GoingBatty (talk) 23:22, 22 December 2023 (UTC)
 * Will only work if url starts with www. since I recognize that it might not actually be a url. Now off to play with grandkiddo.  Happy advent to all. AManWithNoPlan (talk) 23:35, 22 December 2023 (UTC)
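The rule as settled in this thread (only touch values that start with "www.") can be sketched in Python as follows. The tuple of parameter names is an illustrative subset taken from the request above, not the contents of the bot's ALL_URL_TYPES array, and `repair_bare_www` is a hypothetical name.

```python
from typing import Dict

# Illustrative subset of URL-holding parameters named in the request.
URL_PARAMS = ("url", "chapter-url", "archive-url")


def repair_bare_www(params: Dict[str, str]) -> Dict[str, str]:
    """Prefix https:// to URL parameters whose value starts with 'www.'."""
    fixed = dict(params)
    for key in URL_PARAMS:
        value = fixed.get(key, "")
        # Anything not starting with "www." might not be a URL at all, so skip it.
        if value.startswith("www."):
            fixed[key] = "https://" + value
    return fixed
```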

Oversimplification of title

 * Does Citation bot have consensus to be making changes to existing, human-added titles based solely on the metadata it scrapes? This doesn't seem like a good outcome most of the time. Folly Mox (talk) 22:21, 23 November 2023 (UTC)

There the bot is right though. The title is "Graph Drawing". As Springer themselves say, the suggested way to cite this is "Eppstein, D. (2009). Isometric Diamond Subgraphs. In: Tollis, I.G., Patrignani, M. (eds) Graph Drawing. GD 2008. Lecture Notes in Computer Science, vol 5417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00219-9_37"

"16th International Symposium...." is the expanded subtitle of GD 2008. One could replace it with "Graph Drawing: 16th International Symposium..." instead of ""Graph Drawing. GD 2008."

But the word "Proceedings" is nowhere in there, and shouldn't be. &#32; Headbomb {t · c · p · b} 00:42, 24 November 2023 (UTC)


 * The title is not "Graph Drawing". The title suggested at the top of the publisher web page for the individual doi is "International Symposium on Graph Drawing, GD 2008: Graph Drawing". The title given on the landing page for the book doi is "Graph Drawing 16th International Symposium, GD 2008, Heraklion, Crete, Greece, September 21-24, 2008, Revised Papers". The title printed on the cover of the book is similar but with line breaks replacing more of the punctuation. The title given in DBLP  is almost the same, "Graph Drawing, 16th International Symposium, GD 2008, Heraklion, Crete, Greece, September 21-24, 2008. Revised Papers". The title given in zbMATH  is again almost the same, "Graph drawing. 16th international symposium, GD 2008, Heraklion, Crete, Greece, September 21–24, 2008. Revised papers". The title given in MathSciNet  is "Graph drawing. Revised papers from the 16th International Symposium (GD 2008) held in Heraklion, September 2008".
 * All of these are vastly preferable to "Graph Drawing" because they actually identify the precise volume that the work in question comes from, which "Graph Drawing" alone does not. Their preferability should be obvious to anyone who puts actual thought into what citations are for rather than thinking of them as mechanical reproductions of flawed databases. It is exactly that unthinking "we must do what our database of publisher titles says even when it is stupid and uninformative" attitude that I am objecting to here and will continue to strongly object to on individual articles where this attitude translates into disimprovements.
 * As well, the bot dropped the wikilink on the title into the bit bucket, when it would have been preferable to keep it or move it to a title-link parameter. —David Eppstein (talk) 01:43, 24 November 2023 (UTC)
 * I keep seeing more of these on my watchlist, and have begun completely blocking citation bot from the affected articles. It won't take much more of this continued damage for me to switch to completely blocking citation bot from all articles that I edit. —David Eppstein (talk) 22:54, 24 November 2023 (UTC)
 * I'm not there yet, but I did recently turn off the "hide bot edits" watchlist toggle for the first time in a decade or so because of this script. I don't think Citation bot is a bad tool – despite my accumulating complaints on this talk page – but it's not better than a human: just faster. I do note that the BRFA that supported Citation bot adding missing parameters (Bots/Requests for approval/DOI bot 2, 2008) specifically says This seems wise, and I'm wondering when the behaviour was changed, and where the consensus for the change arose. Folly Mox (talk) 23:26, 24 November 2023 (UTC)
 * If a citation includes a  parameter or a wikilink in the   itself, that seems like a pretty good sign that a human took the trouble to get the information right. A bot shouldn't override that. XOR&#39;easter (talk) 18:57, 25 November 2023 (UTC)
 * This is a subgenre of the issue: existing parameters should be known before an edit is made. If title-link is present, title should not be altered outside of punctuation changes. If periodical (or one of its aliases) is present, the wrapper template should not be changed to cite book. If adding chapter, and journal or issue is present, the wrapper template should be changed to cite conference rather than cite book. If none of title and chapter match the existing title (delta punctuation), there's a mismatch between the database record and the work intending to be cited. Folly Mox (talk) 20:54, 25 November 2023 (UTC)
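The four checks listed in the comment above can be written down as a pre-edit guard. The Python sketch below is one reading of them, not a description of how Citation bot works: the function name `objections`, the alias tuple, and the way "delta punctuation" is approximated (compare the words after dropping punctuation) are all assumptions.

```python
import re
from typing import Dict, List, Optional

# Assumed aliases for a periodical parameter; illustrative only.
PERIODICAL_ALIASES = ("periodical", "journal", "magazine", "newspaper", "work", "website")


def _words(text: str) -> List[str]:
    """Split into words after dropping punctuation, so only wording differences remain."""
    return re.sub(r"[^\w\s]", " ", text).split()


def objections(existing: Dict[str, str],
               new_title: Optional[str] = None,
               new_chapter: Optional[str] = None,
               new_template: Optional[str] = None) -> List[str]:
    """Return the reasons, if any, why a proposed edit should not go ahead."""
    found = []
    old_title = existing.get("title", "")
    has_periodical = any(existing.get(key) for key in PERIODICAL_ALIASES)

    if (existing.get("title-link") and new_title is not None
            and _words(new_title) != _words(old_title)):
        found.append("title-link present: change punctuation only")
    if has_periodical and new_template == "cite book":
        found.append("periodical present: do not switch to cite book")
    if (new_chapter is not None and new_template == "cite book"
            and (existing.get("journal") or existing.get("issue"))):
        found.append("chapter with journal/issue: prefer cite conference")

    candidates = [t for t in (new_title, new_chapter) if t is not None]
    if old_title and candidates and all(_words(t) != _words(old_title) for t in candidates):
        found.append("record titles do not match existing title")
    return found
```

An empty list means none of the four checks fired; it does not by itself mean the edit is an improvement.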

Ok, after seeing this keep going and going with no effort to fix or address the problem, I am going to start adding to all new articles I create, instead of merely the ones where I see this happening. —David Eppstein (talk) 07:55, 3 December 2023 (UTC)
 * If this goes on for too much longer, the next step will be to ask for a full block of the bot. This continued non-response to this problem is unacceptable. —David Eppstein (talk) 01:11, 4 December 2023 (UTC)
 * I have added "graph drawing" to the rejection list. AManWithNoPlan (talk) 01:38, 4 December 2023 (UTC)
 * This applies to all Springer LNCS proceedings, not just that one. —David Eppstein (talk) 19:15, 5 December 2023 (UTC)
 * https://github.com/ms609/citation-bot/commit/6d644b3bbd7fa038c174e8977cb1ad3e09a60ba7 AManWithNoPlan (talk) 19:51, 5 December 2023 (UTC)

More date format repairs
April-May 1995 to April–May 1995

December 7 2023 to December 7, 2023

AManWithNoPlan (talk) 20:57, 23 December 2023 (UTC)
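The two repairs listed above can be expressed as a pair of regular expressions. This Python sketch covers only those two patterns, with English month names, and the function name `repair_date` is made up for the example; it is not taken from the bot's source.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# "April-May 1995": two month names joined by a hyphen, then a year.
MONTH_RANGE = re.compile(rf"^({MONTHS})-({MONTHS}) (\d{{4}})$")
# "December 7 2023": month, day, year with the comma missing.
MDY_NO_COMMA = re.compile(rf"^({MONTHS}) (\d{{1,2}}) (\d{{4}})$")


def repair_date(date: str) -> str:
    """Apply the two repairs; return the input unchanged if neither pattern fits."""
    match = MONTH_RANGE.match(date)
    if match:
        # Month ranges take an en dash (U+2013) rather than a hyphen.
        return f"{match.group(1)}\u2013{match.group(2)} {match.group(3)}"
    match = MDY_NO_COMMA.match(date)
    if match:
        return f"{match.group(1)} {match.group(2)}, {match.group(3)}"
    return date
```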

More invisible character cleanup
Unicode is one of the most useful tools ever invented that is also pure evil. AManWithNoPlan (talk) 23:36, 22 December 2023 (UTC)

Forced "editor" parameter changes
It is just really dandy that the publisher reports that information as the editors. https://api.crossref.org/works/10.1093/gmo/9781561592630.article.48611 AManWithNoPlan (talk) 21:31, 25 December 2023 (UTC)
 * The editor of Grove is Deane Root; the subject editors are other people as well. Grove is clearly inconsistent on how they label the revisers; if you use their citation generator, it just lists them all as authors, yet they note that "James Holmes, revised by Anthony Tommasini and Arlys McDonald" at the article's top. I don't like to list all as authors, since one can easily look at the past revision and see that the article has a clear primary author, whose text was either slightly amended or added to. Readers are expecting an "editor" in a citation to be one in the traditional sense, which these people are certainly not.  Aza24  (talk)   00:45, 26 December 2023 (UTC)
 * It is just really dandy that a bot, whose raison d'etre is to clean up citations on Wikipedia that humans have messed up by cramming metadata into the wrong fields, puts all its trust into citations messed up in exactly the same way on other sites, and uses that messed-up data to replace better data in cases when the human editors here have already put care into getting it right. —David Eppstein (talk) 00:57, 26 December 2023 (UTC)

Christian Science Monitor is not an academic journal
Isn't cite news, and probably The Christian Science Monitor as the work, a better choice? At The Christian Science Monitor we describe the organization as a 'nonprofit news organization that publishes daily articles both in electronic format and a weekly print edition' originally established 'as a daily newspaper'.

—Trappist the monk (talk) 16:14, 26 December 2023 (UTC)
 * Yes, I see now they don't have issue numbers. Fixed. Folly Mox (talk) 17:19, 26 December 2023 (UTC)


 * Swapped to cite news. AManWithNoPlan (talk) 17:33, 26 December 2023 (UTC)

Bot down?
Seems to not be working ATM. Headbomb {t · c · p · b} 20:14, 5 January 2024 (UTC)


 * Also experiencing this - the main page loads, but any attempt to run the bot on a specific page results in an HTTP ERROR 500 message. —Ganesha811 (talk) 23:22, 5 January 2024 (UTC)


 * Rebooted and AManWithNoPlan (talk) 02:34, 6 January 2024 (UTC)

Enable 1-click activation of Category:CS1 maint: date format
Would be useful to clear most of that category. Headbomb {t · c · p · b} 06:30, 5 January 2024 (UTC)

Same for:
 * Category:CS1 errors: extra text: edition
 * Category:CS1 errors: extra text: issue
 * Category:CS1 errors: extra text: pages
 * Category:CS1 errors: extra text: volume
 * Category:CS1 errors: chapter ignored
Headbomb {t · c · p · b} 07:09, 5 January 2024 (UTC)
 * fixed AManWithNoPlan (talk) 15:21, 5 January 2024 (UTC)
 * Nope, not fixed. Headbomb {t · c · p · b} 06:10, 6 January 2024 (UTC)
 * I did not see the one in the title. AManWithNoPlan (talk) 15:28, 6 January 2024 (UTC)