User talk:GreenC bot/Archive 4

Balearic Islands
Just to let you know, this bot’s edit of the population info in the infobox of the Balearic Islands article made it so that article didn’t display any population info, so I reverted it. Blaylockjam10 (talk) 19:37, 18 May 2019 (UTC)
 * I have added a tracking category to Spain metadata Wikidata so that these errors can be caught and fixed. The same sort of category should be used for all of these metadata Wikidata templates so that they do not fail silently. Pinging, who created these templates. I have no objection to the use of a single tracking category by all of these templates, if that is easier. – Jonesey95 (talk) 21:21, 18 May 2019 (UTC)
 * , great idea. There are two other templates, and I agree it would be easier to track in a single category. -- Green  C  00:04, 19 May 2019 (UTC)
 * Actually maybe it would be better to keep separate and have a parent cat to hold them. -- Green  C  00:06, 19 May 2019 (UTC)
 * DE metadata Wikidata seems to be causing most of the errors - this is because I transcluded it directly from Infobox German location, which fails for places like villages and city districts. This has been undone now; it would be better to let the bot handle it starting from this query as was done for other countries.--eh bien mon prince (talk) 15:12, 19 May 2019 (UTC)

Hi, about Afghanistan population
Hi, on the page https://en.wikipedia.org/wiki/Afghanistan, three sections give the wrong population. Number one: the population figure in the right-hand sidebar of the page shows 31,575,018. It is supposed to be 37,135,635 instead of 31,575,018. Number two: the sentence that reads "Afghanistan is a unitary presidential Islamic republic with a population of 31 million" should say at least 37 million. Number three: under the Demographics section, where it shows "The population of Afghanistan was estimated at 31.6 million in 2018. Of this, 16.1 million are males and 15.5 million females.", the population of Afghanistan should be 37,135,635.

Here is my reference https://www.worldometers.info/world-population/afghanistan-population/

Can you change the three section that I mentioned? 37,135,635 or 37 million is the most updated information of Afghanistan population currently. — Preceding unsigned comment added by 198.178.118.50 (talk) 12:37, 30 May 2019 (UTC)


 * You have been using Wikipedia for years. Please register and sign up for an account and do it yourself. -- Green  C  13:30, 30 May 2019 (UTC)

CouchNoise links
Please can your bot work its magic on links matching the pattern  couchnoise.com/articles/ , as found, for example, near the foot of Out of Water? they all seem to be dead; compare and. Andy Mabbett ( Pigsonthewing ); Talk to Andy; Andy's edits 11:24, 1 June 2019 (UTC)

Andy Mabbett, it is in 9 articles and appears the entire domain is dead not just a sub-path portion. I logged into IABot ("Fix dead link" from history tab), "Manage URL Data", set domain global state to "Blacklisted" then "Run on All Pages". The job is queued and should run in the next day or so. -- Green  C  12:56, 1 June 2019 (UTC)

Bot removes infobox closing braces under certain conditions
During this edit the bot removed the article infobox's closing braces, making it unrenderable. Would you check the bot's logic circuits? Dhtwiki (talk) 00:15, 2 June 2019 (UTC)
 * Not sure if this bot will run again but if it does will check into it. -- Green  C  02:37, 2 June 2019 (UTC)

Job 13 bug
In this edit the bot doesn't seem capable of handling a link that uses templates for the display text. Modulus12 (talk) 01:00, 2 June 2019 (UTC)
 * Right, it wasn't programmed for that. It seems to have impacted two articles, the other being List of Sunrisers Hyderabad cricketers. -- Green  C  01:33, 2 June 2019 (UTC)

Job 11/Disambiguation pages
Just noticed this edit. Per WP:DABREF, disambiguation pages should not have references. Is there an additional check that could be made for this? Caeciliusinhorto (talk) 13:25, 29 June 2019 (UTC)
 * There's nothing in this page to indicate it is a dab page, which is why it got tagged. If the first line had ended in a ":" instead of "." it would have been skipped (or any number of other indicators). So this was an edge case. Recommend doing something to indicate a dab page, normally or a variant. --  Green  C  13:41, 29 June 2019 (UTC)
 * It already has, which is in Category:Set index article templates. If I am reading User:GreenC bot/Job 11/How right this should already be filtered out, so maybe this is a bug? Caeciliusinhorto (talk) 14:57, 29 June 2019 (UTC)
 * was just added. -- Green  C  00:19, 30 June 2019 (UTC)
 * Ah, mea culpa. Caeciliusinhorto (talk) 10:15, 30 June 2019 (UTC)

Popbot not parsing template syntax, resulting in an error
Hello GreenC, I've just come across this edit from last month which added France metadata Wikidata. As you can see it overwrote the template's closing tags and left the page a bit wonky. I didn't notice other pages with this problem, but maybe it would be good to check if any other of these 30,000 articles are no longer transcluding Infobox French commune in case the bot removed other tags. Best, --213.220.68.67 (talk) 00:08, 30 June 2019 (UTC)


 * Okay I found 13 other articles with this problem. Tried to fix some, but the abusefilter decided to stop me. --213.220.68.67 (talk) 00:30, 30 June 2019 (UTC)


 * Fixed, thank you. -- Green  C  02:40, 30 June 2019 (UTC)

Bot adding inaccurate archived versions
Hey...just noticed the bot made some changes to 2015 NCAA Division I FCS football rankings to update archive links since WebCite is currently offline. Unfortunately, it's pulling in Wayback archives from different dates than the original grabs, so incorrect archived versions are being linked.

Many of the references came from a static URL of a college football poll which was updated with the latest poll each week, and each was archived through WebCite shortly after the page was updated in order to preserve the data from the page before it got wiped in favor of the following week's poll.

So any archive that wasn't captured within a week of the original poll publication displays the wrong version of the poll. I appreciate the intent of trying to convert these archived links to a more stable source, but an incorrect archived version of the page isn't any better than not having an archived version at all. In fact, it might be worse, since it gives the impression that the link will show the correct information, when it in fact does not. WildCowboy (talk) 04:03, 2 July 2019 (UTC)


 * I adjusted the algorithm; it's now getting much closer to the previous date. I cannot guarantee tight ranges like 7 days: it will depend on what is available at archive.org and how their API responds, which is a black box. It will probably require manual fine-tuning by checking what is available at archive.org. Another option is to add a next to each and hope that webcitation comes back online, or a mix of these two options. --  Green  C  13:19, 2 July 2019 (UTC)


 * Thanks. Unfortunately, accurate archived versions of a number of them simply don't exist on archive.org, as it wasn't capturing the page frequently enough back then. I'll have to think about the second option...don't have a ton of confidence in WebCite's future, so these references may just be lost forever. WildCowboy (talk) 16:03, 2 July 2019 (UTC)
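The date-matching problem discussed above can be sketched as follows. This is only an illustration, not the bot's actual algorithm: the function name `closest_snapshot`, the `max_days` tolerance, and the idea of passing in a pre-fetched list of snapshot timestamps are all assumptions. It relies only on the 14-digit `YYYYMMDDhhmmss` timestamps that appear in Wayback Machine URLs.

```python
from datetime import datetime, timedelta

WAYBACK_FMT = "%Y%m%d%H%M%S"  # 14-digit timestamps as used in Wayback Machine URLs


def closest_snapshot(target, snapshots, max_days=7):
    """Return the snapshot timestamp nearest to target, or None if none is within max_days.

    target and snapshots are 14-digit timestamp strings (hypothetical, pre-fetched input).
    """
    target_dt = datetime.strptime(target, WAYBACK_FMT)
    best = None
    best_gap = None
    for ts in snapshots:
        gap = abs(datetime.strptime(ts, WAYBACK_FMT) - target_dt)
        if best_gap is None or gap < best_gap:
            best, best_gap = ts, gap
    # Reject the nearest capture if it is too far from the original archive date.
    if best is None or best_gap > timedelta(days=max_days):
        return None
    return best
```

Returning `None` rather than a distant capture reflects the point made in this thread: for a page that changes weekly, a capture from the wrong week may be worse than no archive link at all.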

IABot and GreenCBot keep adding back disabled archivelink
Hi, I have marked a link as a dead link in an attempt to stop the archive bots from adding an archivelink to it. The newspaper that ran the article has pulled it from its live site. It has also apparently disabled the archivelink from working. How can I stop the archivelink from being added over and over? It is not helpful because the article shows up but then it is pulled. Here is the link to the diffs:. Thank you. dawnleelynn(talk) 18:24, 7 July 2019 (UTC)
 * Also same issue here, thanks. dawnleelynn(talk) 19:15, 7 July 2019 (UTC)

, use to keep the bot off a citation. BTW even if the newspaper pulled the story, why wouldn't we include an archive link so readers can still see it? That is the purpose of archive links, when links are no longer available (ie. WP:LINKROT). -- Green  C  22:02, 7 July 2019 (UTC)


 * Also the archive link is not disabled I don't understand. -- Green  C  22:04, 7 July 2019 (UTC)
 * As I explained, when I view the archivelink, the article comes up and then is suddenly pulled away and I am taken to the Wayback Machine home page. I thought this was happening to all users. Like when the owner of the website puts a robots.txt file on their website so the articles can't be used. It also tells me that the article is available live, but that is not true either. If these things are not true, then perhaps it is an issue with my security software that I use like Anti-malwarebytes or 360 Total Security. But it doesn't happen with any other archivelinks and I use archivelinks all the time and fix broken links on a regular basis; I'm no newbie. hmmm... thanks for checking. I will look into this further. dawnleelynn(talk) 22:55, 7 July 2019 (UTC)


 * That doesn't happen to me, it works OK. Must be something in the local browser cache. Try clearing the cache, or a different browser. -- Green  C  23:05, 7 July 2019 (UTC)


 * Ok, I was mistaken. What it actually says after it displays the article in the Wayback Machine and then takes it away and goes back to the Wayback Machine home page is, "The Wayback Machine has not archived that URL." and "This page is not available on the web because page does not exist." Anyway, I cleared completely all caches today and tried again. It didn't make any difference on the browser I use all the time, Chrome. I also cleared the cache for Edge, which I don't use and haven't installed any plugins or add-ins onto, and got the same thing in Edge. I also ran CCleaner yesterday, as well as flushed the Flash cache trying to get Flash to run on a particular page. So, I have done pretty much everything you can do. I also checked again that I disabled my ad blocker on the Wayback Machine, even though I know I had done that a long time ago. Using uBlock Origin. Obviously, if everyone else can see the article, I will leave the archivelink alone. Thanks. dawnleelynn(talk) 03:29, 8 July 2019 (UTC)


 * Try It's an archive.today save of the archive.org link. It demonstrates the archive.org page works when accessed from somewhere else. Another option is try from a 'private window' (incognito) --  Green  C  03:53, 8 July 2019 (UTC)


 * Yes, it does work for me in the archive.is site which you linked. I do use that site sometimes too when I can't locate one in the Wayback Machine. Good suggestion. I tried an incognito window, but it still didn't work. I'm going to try another computer in our household tomorrow. Thanks a bunch though! Btw, yesterday I actually got Flash to run on a certain page by searching and searching the Internet. I tried all the suggestions: allowing Flash, clearing caches, changing the setting in the browser, reinstalling Flash, etc., and nothing worked. I finally found a suggestion in the Chrome help forum where you run a "New Incognito Window" and it worked like a charm. It was super hard to find this information. Anyway, thanks again! dawnleelynn(talk) 04:04, 8 July 2019 (UTC)
 * Just a guess, but maybe GreenC has Javascript disabled, while Dawnleelynn doesn't? I also get redirected to a Wayback Machine error page, and I think it has something to do with bad Javascript tracking code in the original page that doesn't play well with the archive URL. Turning off Javascript in Chrome loaded the archived page correctly for me. Modulus12 (talk) 22:01, 8 July 2019 (UTC)

Hi, I disabled Javascript completely and cleared my cache in Chrome. You were right, the archivelinks do work after doing that. However, something else strange happens too. I no longer see the Wayback Machine header at the top of the archivelink pages. I tried several and none of them show the header so you know you are on an archivelink site; of course the URL lets you know that you are, but that's it. Thanks though, it's a good way to test when I run into this issue. I'll probably keep Javascript on and just turn it off to test for issues. So, your help has been very useful to me; thanks a bunch! Already I can see some UI items missing from this page because I have Javascript off, such as the one that signs my signature for me. dawnleelynn(talk) 23:25, 8 July 2019 (UTC)


 * Here is a Chrome extension that can disable JavaScript on a per-domain basis so it won't run when accessing archive.org. I have not tried it; there may be others like it. -- Green  C  13:46, 9 July 2019 (UTC)
 * Thanks for the thought. However, in the JavaScript settings in Chrome, you can enable or disable it completely. Or, you can add or block by sites. I actually tried to block by site first; it didn't seem to work. But now I think I should have cleared the cache when I tried it and restarted the browser. I will try it again later. Will write a short message here if it works. However, it will still prevent the top header portion of the Wayback Machine from displaying when loading archive links. dawnleelynn(talk) 21:22, 9 July 2019 (UTC)
 * I use Firefox and the wayback header shows correctly, with javascript enabled. There must be something else blocking the redirect. --  Green  C  03:04, 10 July 2019 (UTC)
 * I was using Firefox some time ago. I have a number of reasons I am not using Firefox right now. Maybe some day. I tried adding just the Wayback Machine IP to the block list in JavaScript in Site Settings in Chrome. It doesn't seem to work at enabling the archivelink site for Bull Riders Only and Bodacious. The only thing that works is disabling JavaScript completely. For now, I will stick with just disabling it when I want to test if JavaScript is the culprit that is keeping an archivelink from working. Again, thanks for your help, everyone, and it has seriously been very useful. I may try that Chrome extension later, I need a break from this right now. dawnleelynn(talk) 19:40, 11 July 2019 (UTC)

NewsBank and ProQuest links
In this edit, GreenC bot changed a number of NewsBank citations to non-working links.

For example, a WebCite link:



was changed to Archive.org:



The Archive.org link says "Error: Your Search session has expired", whereas the WebCite link contains the article. NewsBank links fall under two domains: (1) http://infoweb.newsbank.com/ and (2) http://docs.newsbank.com/. Would you fix GreenC bot to check for when the NewsBank link is expired? Thanks, Cunard (talk) 05:54, 10 July 2019 (UTC)

In this edit, GreenC bot changed a number of ProQuest citations to non-working links.

For example, a WebCite link:



was changed to an Archive.org link:



The Archive.org link redirects to https://search.proquest.com/news and does not contain a PDF of the article, whereas the WebCite link contains a PDF of the article. Cunard (talk) 06:06, 10 July 2019 (UTC)

Yeah. These are soft 404s, which are difficult to detect. It sometimes works. Let me think about it. -- Green  C  13:23, 10 July 2019 (UTC)
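As a rough illustration of why soft 404s are hard to catch, here is a minimal heuristic sketch. It is not how GreenC bot works; the function name, the phrase list, and the "redirected to a shallower path" rule are all assumptions chosen to match the two symptoms reported in this thread (an error message in the page body, and a redirect to a landing page).

```python
from urllib.parse import urlsplit

# Phrases assumed to signal an expired or missing page; this one is quoted in the report above.
ERROR_PHRASES = ("your search session has expired",)


def _segments(path):
    """Split a URL path into its non-empty segments."""
    return [part for part in path.split("/") if part]


def looks_like_soft404(original_url, final_url, body):
    """Heuristic: True if an archived capture probably lacks the cited article."""
    text = body.lower()
    if any(phrase in text for phrase in ERROR_PHRASES):
        return True
    orig, final = urlsplit(original_url), urlsplit(final_url)
    moved = (orig.netloc.lower(), orig.path) != (final.netloc.lower(), final.path)
    # A redirect to a shorter path (for example a section landing page) is suspicious.
    shallower = len(_segments(final.path)) < len(_segments(orig.path))
    return moved and shallower
```

A check like this will miss many cases and flag some good captures, which is consistent with the observation that detection "sometimes works"; any flagged link would still need manual review.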

there is a flaw
With the bot converted this:

to this:

By adding http://www.pbs.org/mormons/view/15.html? when title was already wikilinked, caused cs1|2 to emit the error message. title (and other title-holding parameters) cannot have both wikilink and external link. Wikilinks can be added directly, as in these examples, but also with title-link – this applies only to title.

—Trappist the monk (talk) 21:48, 15 July 2019 (UTC)

Still broken. With, bot converted this:

to this:

Again inserting inappropriate http://www1.bartleby.com/65/br/Bradfd1722.html when title is already wikilinked.

—Trappist the monk (talk) 19:32, 16 July 2019 (UTC)

Trappist the monk, normally it skips expansion of url when another URL field exists but contribution-url was missing from the check list. It now has that plus chapter-url, conference-url, map-url, transcript-url and lay-url. -- Green  C  20:59, 16 July 2019 (UTC)
 * I think that for this discussion, the url-holding parameters that work with archive-url are:
 * chapter-url, chapterurl, contribution-url, contributionurl, entry-url, article-url, section-url, sectionurl – these are all aliases of chapter-url so apply to all aliases of chapter
 * map-url – only but is specific to map which can act as title (italicized) or as chapter (upright quoted)
 * url – of course
 * So, if I understand what the bot is supposed to do, the above parameters, when present and set, should suppress the creation of url from archive-url.
 * These are parameter-specific and don't work with archive-url:
 * conference-url is specific to conference
 * transcript-url is specific to transcript
 * lay-url is specific to the plain-text label 'Lay summary'
 * These parameters, when present and set, do not suppress creation of url from archive-url
 * —Trappist the monk (talk) 22:16, 16 July 2019 (UTC)


 * Done, thanks! -- Green  C  00:52, 17 July 2019 (UTC)
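The rule summarised in the list above can be sketched like this. It is a simplified illustration and not the bot's code: it assumes the citation has already been parsed into a dictionary of parameter names and values (the `params` argument is hypothetical), and it takes the parameter names from Trappist the monk's list.

```python
# Parameters that work with archive-url, per the list above:
# url, map-url, and chapter-url with its aliases.
SUPPRESSING_PARAMS = frozenset({
    "url", "chapter-url", "chapterurl", "contribution-url", "contributionurl",
    "entry-url", "article-url", "section-url", "sectionurl", "map-url",
})


def should_create_url(params):
    """Return True if url may be derived from archive-url for this citation.

    params maps citation parameter names to values (a hypothetical, pre-parsed form).
    """
    if not params.get("archive-url", "").strip():
        return False  # nothing to derive a url from
    # Any present-and-set suppressing parameter blocks creation of url.
    return not any(params.get(name, "").strip() for name in SUPPRESSING_PARAMS)
```

Note that conference-url, transcript-url and lay-url are deliberately absent from the set, so a citation carrying only those still gets a url created from archive-url.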

Something is wrong
GreenC bot has been making changes to Latter Day Saints articles and causing some problems. One recent example is over these four changes, where the bot seems to be trying to avoid links that are redirected at the target site. Thing is, the edits caused duplicate reference errors because not all of the references with the same name (and the same content) were changed. Why is the bot making these changes in the first place? And if it must make them, why isn't it consistently applying the changes within the article -- why does it only fix some of these links? -- Mikeblas (talk) 03:29, 18 July 2019 (UTC)
 * Another change was done here: https://en.wikipedia.org/w/index.php?title=List_of_members_of_the_Quorum_of_the_Twelve_Apostles_(LDS_Church)&oldid=906540163 with similar undesirable results. -- Mikeblas (talk) 03:45, 18 July 2019 (UTC)

Hi Mike, discussed here. I am also perplexed why it changes some links and not others during the same edit, I suspect something to do with LDS website bot control because it is intermittent (the bot won't change a link if it can't determine the redirect). So I have been reprocessing pages multiple times but apparently some are still missed. This is to prevent future breakage if the redirect is lost, and get Wayback machine tracking at the new URL. It's been a complex conversion but trying to do it in the future without accurate redirect information would be impossible (a problem which happens). Sorry about the trouble it caused. The bulk of it is done. -- Green  C  04:14, 18 July 2019 (UTC)

Errol NH, USA
Hi. On the page for Errol New Hampshire I noticed that a date may be wrong. How do I send you a screenshot of what I'm talking about? Thanks. Tim User7998 (talk) 03:35, 9 August 2019 (UTC)

Destroying lots of Chart Stats links
Have a look what you are doing. Just two examples:
 * Special:Diff/915382192 : You Are The Quarry instead of Into Dust? The result: "Sorry, there are no Official Singles Chart results for "you are the quarry""


 * Special:Diff/915550268 : The link is now useless: "Type in artist name"

That's vandalism, isn't it? --95.116.186.122 (talk) 00:00, 14 September 2019 (UTC)


 * The edit summary has a link to the page chartstats.com (ie. Link rot/cases/chartarchive.org and chartstats.com) which has more info. -- Green  C  00:39, 14 September 2019 (UTC)


 * Okay, no vandalism. This wikilink, however, is missing in the edit summary of the first example. --95.116.186.122 (talk) 01:04, 14 September 2019 (UTC)


 * Fixed. --  Green  C  01:20, 14 September 2019 (UTC)

Dead-url is deprecated
Hi, deadurl and dead-url have been deprecated in cite web and its variants in favor of url-status, and a red citation error message now appears in citations which use the deprecated syntax. However, I notice that GreenC bot is still generating dead-url, which has contributed to the growth of the maintenance category Category:CS1_errors: deprecated parameters. Are you able to update the bot accordingly? The particulars of the cite template update that applies here are listed at the maintenance category page (and also at Template:Cite_web), but note that the "yes" or "no" values also need to be changed to "live", "dead", "unfit", or "usurped", as necessary. Thanks.— TAnthonyTalk 17:07, 3 October 2019 (UTC)
 * The last time WaybackMedic ran was September 15. It was completing work already started at WP:URLREQ in August, I was not going to stop that promised work to make this change as it is a significant modification to the software, and the number of added dead-url's was very low. It has not run since and won't until this is repaired. -- Green  C  18:40, 3 October 2019 (UTC)
 * OK great, just wanted to make sure you were aware. Thanks.— TAnthonyTalk 01:46, 4 October 2019 (UTC)
 * TAnthony, this is fixed. WaybackMedic also makes the conversion in case it comes across them, though it appears they are well and truly dead now except for occasional restores from reverts. -- Green  C  15:40, 5 October 2019 (UTC)
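For readers curious what such a conversion involves, here is a minimal sketch. It is not WaybackMedic's implementation. The value mapping is an assumption (yes becomes dead, no becomes live, while unfit and usurped carry over), and the regex works on raw wikitext without parsing templates, so it would also touch matching text outside citations.

```python
import re

# Assumed value mapping: yes/no become dead/live; unfit and usurped carry over unchanged.
VALUE_MAP = {"yes": "dead", "no": "live", "unfit": "unfit", "usurped": "usurped"}

# Matches |dead-url= or |deadurl= followed by a single word value, keeping the spacing.
_DEAD_URL = re.compile(r"\|(\s*)dead-?url(\s*)=(\s*)([A-Za-z]+)\b", re.IGNORECASE)


def convert_dead_url(wikitext):
    """Rewrite |dead-url= / |deadurl= to |url-status= where the value is recognised."""
    def repl(match):
        value = VALUE_MAP.get(match.group(4).lower())
        if value is None:
            return match.group(0)  # leave unknown values for a human to review
        return "|{}url-status{}={}{}".format(
            match.group(1), match.group(2), match.group(3), value
        )
    return _DEAD_URL.sub(repl, wikitext)
```

Empty or unrecognised values are left untouched on purpose, since guessing a status for them could hide a real problem with the citation.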

Blue linking URLs
I have made a comment about an action relating to blue linking seemingly initiated by your bot at Bots/Noticeboard to which so far I have had one response which was not particularly helpful. I would be grateful if you would respond there. Thankyou. Djm-leighpark (talk) 07:37, 14 November 2019 (UTC)
 * GreenC, I understand you're handling the bug fixes for the footnote links to IA books, is that right? I'd love to see the code (even privately if you're not ready to publish it). Edsu was curious too, don't miss the opportunity to get his eyes on your code!
 * Anyway, I came to report a small issue with roman numbers, which should be either ignored or converted as long as the /page/ feature on archive.org doesn't handle them. Nemo 22:01, 25 November 2019 (UTC)
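The "ignore" option for roman-numbered pages could look something like the sketch below. The function names are hypothetical and this is not the bot's code; it only shows one way to recognise a well-formed roman numeral so that no page link is generated for it.

```python
import re

# Well-formed roman numerals from 1 to 3999 (upper case).
_ROMAN = re.compile(r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")


def classify_page(page):
    """Classify a page value as 'arabic', 'roman', or 'other'."""
    page = page.strip()
    if page.isascii() and page.isdigit():
        return "arabic"
    # The regex also matches the empty string, so require at least one character.
    if page and _ROMAN.match(page.upper()):
        return "roman"
    return "other"


def should_link_page(page):
    """Only plain arabic page numbers are linked in this sketch."""
    return classify_page(page) == "arabic"
```

Converting the numeral to an integer is deliberately left out: front-matter page xiv is not the same leaf as page 14, so a numeric conversion alone would not produce a correct link.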

For the record, because the new actor/comment tables are so slow this is the only query I found so far that manages to complete and count the edits:

 MariaDB [enwiki_p]> select count(rev_id) from revision_userindex JOIN comment ON rev_comment_id = comment_id AND comment_id > 284004931 AND comment_text LIKE "Bluelink%#IABot%" AND rev_actor = 177;
 +---------------+
 | count(rev_id) |
 +---------------+
 |        131884 |
 +---------------+
 1 row in set (52 min 4.25 sec)

Nemo 09:28, 5 December 2019 (UTC)
 * A bit faster this way: query/42113 (it might also be enough to use <tt>LIKE "Bluelink%"</tt>). Nemo 21:20, 12 February 2020 (UTC)

broken citation templates
See and its result. If you are going to remove url you must also remove access-date and url-access (and any other parameter that relies on url). This same applies for the chapter (and alias) parameters.

—Trappist the monk (talk) 18:06, 23 November 2019 (UTC)
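The dependency described above can be expressed as a small lookup. This is a sketch under stated assumptions, not the bot's code: the citation is assumed to be pre-parsed into a dictionary, and the dependency table lists only the parameters named in this comment plus the `accessdate` alias and `chapter-url-access`; a complete table would need to be checked against the CS1 documentation.

```python
# Parameters assumed to rely on each URL-holding parameter; extend as needed.
URL_DEPENDENTS = {
    "url": ("access-date", "accessdate", "url-access"),
    "chapter-url": ("chapter-url-access",),
}


def remove_url(params, url_param="url"):
    """Return a copy of params without url_param and the parameters that rely on it."""
    doomed = {url_param, *URL_DEPENDENTS.get(url_param, ())}
    return {name: value for name, value in params.items() if name not in doomed}
```

Building a new dictionary rather than deleting keys in place keeps the original citation available for comparison or for an edit summary.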
 * Fixed 40 articles . -- Green  C  21:04, 23 November 2019 (UTC)
 * Nice! Now that you have a process for this, it would be great to remove all those broken links to www3.interscience.wiley.com and www.informaworld.com/smpp/ (and web archivals thereof, also empty). Nemo 09:09, 26 November 2019 (UTC)
 * See my last comment here. Not sure what to do about unreliable doi.org links. Do we replace a (possibly) working archive URL with a (possibly) non-working DOI link? In the case of Blackwell-Synergy it is a usurped domain so removed regardless of doi status. It would help if we could find a way to reliably determine whether the doi links to a working page, possibly by page scraping. I'd need to investigate and find some non-working examples. The page headers always return 200 so can't be used. -- Green  C  17:16, 26 November 2019 (UTC)
 * I'm not sure what you mean by unreliable doi.org links: as soon as you leave doi.org itself you're in a jungle. The www3.interscience.wiley.com and www.informaworld.com/smpp/ URLs are pure garbage and should be removed period. A DOI can be added later based on the title by citation bot or other methods. Nemo 19:53, 26 November 2019 (UTC)
 * Are you proposing removing links only if in a CS1|2 template? When I followed the doi.org link for some doi it didn't work. -- Green  C  01:39, 27 November 2019 (UTC)
 * Yes, I would expect only in templates for now because otherwise citation bot often cannot clean up the citation later. Or, you could templatify unstructured citations with broken links by querying Citoid, which uses the CrossRef API to get a DOI from an unstructured citation. (You need to remove the URL that Citoid adds, though.)
 * Bad publishers often have broken DOIs, but websites of bad publishers tend to be the least reliable too, so it's even more urgent to remove those URLs. They can be replaced with better ones on open archives. If there's reason to think those URLs contain some meaningful ID, that could be moved to an id. Nemo 13:03, 27 November 2019 (UTC)
 * I am unfamiliar with academic publishing. Can you explain what is a bad publisher and why are they bad, how did this mess come about? Are they rogue copyright violations, copied from other sites and re-branded illegally with made-up DOI numbers? If we can establish these sites as unreliable (for reasons) then deleting them entirely is no problem. In that case I'd prefer to clear it through WP:RSN so that no one complains about the bot deleting thousands of links without a DOI url replacement.
 * Otherwise, your last sentence I think holds the key. Given a URL to wiley.com etc., is it possible to determine what the DOI is? If so, the same strategy as Blackwell-Synergy could be used, because in that case most of the Blackwell URLs had a DOI as part of the url itself. So it was easy to extract the DOI from the URL, delete the url and archiveurl and replace it with the doi / and for unstructured references simply replace the original link with a link to doi.org/doi_# -- no parsing of unstructured citations required. The key is finding the DOI for the original URL. I think it is possible by web scraping the archiveurl. But if you're saying bad DOIs are common then this presents problems for unstructured citations (my bot can't convert them to structured), and anyway in that case it should be option 1, RSN. --  Green  C  14:42, 27 November 2019 (UTC)
 * The URLs of which I asked the removal have no information value whatsoever, otherwise I would not have asked their removal. When an URL has some informational value, citation bot is already able to convert it (e.g. URLs which contain a DOI).
 * Bad publishers are usually giant legacy publishers with ultra-outdated and broken technology and processes, like Wiley, Elsevier, OUP. I'm not talking about WP:RS badness. If a publisher left us with thousands of broken meaningless URLs they're clearly a bad publishers and their garbage metadata should be removed. Nemo 15:19, 27 November 2019 (UTC)
 * Right the URLs themselves don't, but retrieve the archived page and web scrape the DOI number, same thing. The information can be retrieved in many cases so long as the archive URL still exists. Then check the doi.org link make sure it works, and if everything checks out, only then remove the archiveurl and replace with a doi (or a doi.org url for non-CS1|2). It wouldn't delete archiveurls unless it can be replaced with a valid DOI at the same time. --  Green  C  15:58, 27 November 2019 (UTC)
 * The archived version for those URLs is also garbage in most cases, from what I've seen. It's much better to start from scratch with current metadata on CrossRef and other trusted providers. The less garbage in the input, the less garbage in the output. By the way, the replacement of the URL with the DOI allowed to rescue a bare ref. Nemo 17:03, 27 November 2019 (UTC)
 * If that's the case, please open a case at RSN because the links are, according to you, entirely unreliable ("garbage"). I'm not doing CrossRef or Citoid sorry ask Citation Bot. -- Green  C  18:40, 27 November 2019 (UTC)
 * Garbage URLs, not garbage sources. It's fine to not do CrossRef or Citoid, it can be left to a later stage once the garbage metadata is removed. Nemo 19:27, 27 November 2019 (UTC)
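The Blackwell-Synergy approach mentioned in this thread, pulling a DOI out of a URL that already contains one, can be sketched as below. The regex is a common approximation of DOI syntax and not an official grammar, the function names are hypothetical, and the example URLs used in testing are made up.

```python
import re
from urllib.parse import unquote

# Approximation: "10.", a registrant code of 4-9 digits, a slash, then a suffix.
_DOI = re.compile(r"10\.\d{4,9}/[^\s?#&\"<>]+")


def extract_doi(url):
    """Return the first DOI-like string embedded in url, or None."""
    match = _DOI.search(unquote(url))  # decode %2F and similar escapes first
    if match is None:
        return None
    return match.group(0).rstrip(".,;")  # drop trailing punctuation


def doi_link(url):
    """Return a doi.org link for the DOI embedded in url, or None."""
    doi = extract_doi(url)
    return None if doi is None else "https://doi.org/" + doi
```

As discussed above, a syntactically valid DOI does not guarantee a working landing page, so the resulting doi.org link would still need to be checked before an existing archive URL is replaced.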

I'm at a loss. Consensus discussion opened at Reliable_sources/Noticeboard -- Green  C  14:40, 28 November 2019 (UTC)
 * Citation bot took care of most of the obvious cases. (I'm now casting a larger net to catch some more but it probably won't manage to remove many more.) Nemo 21:46, 9 December 2019 (UTC)

Reference replace
Hello,

For this link: https://worldtracker.org/media/library/Reference/Encyclopedia's/Encyclopedia%20of%20Irises.pdf, can I get a reference replacement?

Anywhere where that link is found, replace the text between with ? See this as an example:. Thanks! --evrik (talk) 20:29, 9 December 2019 (UTC)
 * Looks like a book worth sponsoring! Only 50 $ at https://openlibrary.org/books/OL8176432M/Irises Nemo 21:44, 9 December 2019 (UTC)
 * Thanks for the OL suggestion. I am trying to get rid of the dead link. --evrik (talk) 21:46, 9 December 2019 (UTC)
 * Yes. If someone sponsors the scanning by Internet Archive, we can link a preview for each cited page. With almost a hundred citing articles, I'd say it's worth it. Nemo 22:18, 9 December 2019 (UTC)
 * Looks like the book title is a little different: "Irises: A Gardener's Encyclopedia" .. many of them have page numbers (Iris bucharica note #6 pp. 274-275) but the numbers don't sync with the book edition which discusses Iris bucharica on pp. 298-299 (see ). An offset of 24 pages though if that offset holds for every page I have no idea.  Given there are only 75 cases and not all with page numbers ideally someone could determine the new page numbers for the 2005 edition before a bot deleted the original page info. (BTW the PDF is pirate but useful for finding page numbers).  --  Green  C  22:15, 9 December 2019 (UTC)
 * GreenC, you are correct on the title. I fixed that. My goal is a link swap. If you can swap the dead link for the live link that would be acceptable. --evrik (talk) 22:20, 9 December 2019 (UTC)
 * Can't link swap for two reasons: it's a pirated book, the page numbers don't sync. -- Green  C  22:23, 9 December 2019 (UTC)
 * Okay, well I fall back on the original request as I want to remove the dead link. I don't think the page numbers are that important. If you're going to get the book, you can use the index. --evrik (talk) 22:28, 9 December 2019 (UTC)
 * Fair enough. This is your request, and you believe this is not an important issue since the flower names are well indexed. I will let you be responsible for fixing page numbers if it gets raised by the community after the bot runs. You could check the index of the pirated PDF and add the page numbers, manually, should someone request it. Are you OK with that? I doubt anyone will but I don't want to be responsible for page complaints. Also I'm having computer hardware problems so may not get to this for a week or so, currently on laptop. --  Green  C  22:50, 9 December 2019 (UTC)

This was a blunt patch but it's done. In case anyone is interested in the "1-line" unix command: awk -ilibrary '{IGNORECASE=1; f=sys2var("wikiget -w " shquote($0)); c=patsplit(f,field, /[{][{][ ]*cite (web|book)[^}]+[}]/, sep); for(i=1;i<=c;i++){if(field[i] ~ /Encyclopedia%20of%20Irises[.]pdf/) {p = ""; if(match(field[i], /[|][ ]pages?[ ]*[=][ ]*[^\|}]*[^\|}]/,d) > 0){sub(/[|][ ]*pages?[ ]*[=][ ]*/,"",d[0]);p=""}; field[i] = ""; f = unpatsplit(field,sep); print f > "/tmp/out"; close("/tmp/out"); print $0 " " sys2var("wikiget -E " shquote($0) " -S " shquote("Fix cite per request") " -P /tmp/out") }} }' pages.txt
 * Any news? --evrik (talk) 03:16, 9 January 2020 (UTC)

-- Green  C  05:24, 9 January 2020 (UTC)

Bot is deleting a working archive-URL
In this edit the bot deleted a working archive-URL and posted a dead link notice. Toddy1 (talk) 23:53, 8 January 2020 (UTC)
 * URLs end with a space character, the bot sees it as https://archive.is/http://www.artek.org/History%20Artek/history .. which does not work. Fixed. -- Green  C  00:37, 9 January 2020 (UTC)

archive.today doesn't load
The archive.today site doesn't appear to be working.

The archive link added by this edit doesn't load. Same with the archive link added by this edit.

Whywhenwhohow (talk) 22:23, 12 January 2020 (UTC)
 * Works for me. Are you able to access archive.today at all? -- Green  C  04:16, 13 January 2020 (UTC)
 * No. It looks like the problem is DNS lookup for archive.today and archive.is. Whywhenwhohow (talk) 05:23, 13 January 2020 (UTC)
 * This is a known problem for some users; it relates to a dispute between archive.today and Cloudflare. The only known solution is to use a different DNS resolver that isn't Cloudflare-based. --  Green  C  14:47, 13 January 2020 (UTC)
 * Can we use archive.org instead of archive.is/archive.today? Whywhenwhohow (talk) 18:00, 13 January 2020 (UTC)
 * The bot goes down a list of archive providers and searches each one till it finds a provider that has the page. archive.org is first on the list and archive.today is last, and there are about 15 others in between. It just so happens that archive.today is the second-largest archive provider, so they get a lot of matches when the others don't have it. -- Green  C  17:26, 14 January 2020 (UTC)
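
The provider-by-provider search described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the `find_archive` function, the stub lookups, and the placeholder snapshot URL are all hypothetical, and the real list has many more providers.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical lookup signature: takes the original URL and returns an
# archive URL, or None when that provider has no copy.
Lookup = Callable[[str], Optional[str]]


def find_archive(url: str, providers: List[Tuple[str, Lookup]]) -> Optional[Tuple[str, str]]:
    """Return (provider_name, archive_url) from the first provider that has the page."""
    for name, lookup in providers:
        archived = lookup(url)
        if archived:
            return name, archived
    return None  # no provider on the list has a copy


# Stub lookups standing in for real provider queries (assumptions for the demo).
def first_provider_stub(url: str) -> Optional[str]:
    return None  # pretend the first provider has no snapshot


def last_provider_stub(url: str) -> Optional[str]:
    return "https://archive.today/example"  # placeholder, not a real snapshot


providers = [("archive.org", first_provider_stub), ("archive.today", last_provider_stub)]
result = find_archive("http://example.com/page", providers)
```

Because the loop returns on the first hit, the order of the list decides which provider wins when several hold a copy; a provider late in the list is only used when every earlier one comes up empty.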

comment re Good articles/mismatches
I wonder if you'd consider having the bot update this page more often (currently once a week). It is quite common for there to be new errors, and it can be hard to gauge progress as any items that one corrects improperly will not be re-listed for some time. I was thinking every two or three days? Another advantage is that the updates will show on watchlists more often. (I have been working with this page, but I forget about it as it hits the watchlist infrequently... and IMO Wikipedia's non-improvement of watchlist mechanisms is the real issue but...) Thanks, Outriggr (talk) 08:01, 18 January 2020 (UTC)
 * now set for 3 days a week at 6:30am Sunday, Wednesday and Friday. -- Green  C  17:01, 18 January 2020 (UTC)

Dog house
You deleted the section "disappearing" with a reason that is not understandable: "no source".

I think the connection of watchdogs and dog houses should be noticed. In countries where watch dogs are forbidden (like Sweden the last 50 years) there are no dog houses anymore. In fact watch dog culture and dog house culture are the same.

This topic is the only content in the Swedish Wikipedia page of dog house. --Zzalpha (talk) 11:09, 7 February 2020 (UTC)

Replacing operational Google Books links by (subscription!) archive.org links
Please stop immediately with your replacements of operational Google Books links by (subscription!) archive.org links. For the ones I encountered, I saw no advantage in the replacement subscription link, and this should be brought up on the individual article's talk pages before bot-implementing in mainspace. Consequently, I'll stop your bot. --Francis Schonken (talk) 07:48, 10 February 2020 (UTC)
 * Francis, I suggest to be more careful with your reverts. With this revert, you've restored a URL which is significantly less functional: for me happens to load the entire page, while at  (without being logged in) I can also see the next page and finish the sentence which helps explain the referenced concept. Nemo 09:09, 10 February 2020 (UTC)


 * Internet Archive is not "subscription"; it is "free registration" to read the full book, and "no registration" to read the 2-page preview just like Google - either way there is no money required (unless you buy the book through Google, who loves that we have so many Google Books links). Google Books has a lot of problems, see WP:GOOGLEBOOKS. There is general consensus for non-profit over for-profit when available. There is general consensus for using Internet Archive for linking books. Internet Archive links are more stable and offer users a higher level of access, i.e. full book access for free (Google charges money) - and non-registration full-page previews vs Google snippets (noted by Nemo). Internet Archive is a non-profit, academic-level library archive vs. a commercial book seller. Google Books and Amazon "Look Inside" are the same in terms of what they offer and why - to sell you books. -- Green  C  15:46, 10 February 2020 (UTC)
 * Sorry, no, if you want to change such link, discuss it on the talk page. In the above case the bot edit was wrong from about every angle, starting with an edit summary that does not represent what the bot was doing. --Francis Schonken (talk) 19:45, 2 March 2020 (UTC)

... and by links that have no advantage whatsoever
I don't know what the intention of this edit was, but please stop the bot permanently when it continues to apply deteriorations to references. --Francis Schonken (talk) 19:43, 2 March 2020 (UTC)


 * Fixed. -- Green  C  21:36, 2 March 2020 (UTC)

And again
This is a revert of a completely useless edit of your bot. I stopped the bot. --Francis Schonken (talk) 12:43, 5 March 2020 (UTC)

Note, that as I mentioned above, the edit summaries produced by the bot are completely inappropriate: the book was already bluelinked before the edit, and the edit didn't bluelink any book. --Francis Schonken (talk) 12:45, 5 March 2020 (UTC)


 * I have no problem with bug reports, I fix them, but stopping the bot is quite disruptive to a lot of people and you may lose that ability if it is abused. --  Green  C  15:15, 5 March 2020 (UTC)
 * Anyhow, let's continue this discussion at User talk:Citation bot/Archive_19. Tx. --Francis Schonken (talk) 15:45, 5 March 2020 (UTC)

bot removes |url= but does not remove accompanying |url-access=
See. When deleting url, the bot should make sure that it deletes all parameters that require the presence of url. These include: access-date, archive-url, format, url-access.

Also, the edit summary is a bit misleading: Move 2 urls suggests that the urls were retained in a different position. Instead the urls were deleted; removed.

—Trappist the monk (talk) 12:16, 13 February 2020 (UTC)


 * Ok re: deleting args. The move is a canned summary, or I could just say "Fixing Springer links".. it is moving the URL from a direct link to Springer to the URL in the doi, which then redirects to a new Springer site; it's a sort of move by way of deletion. -- Green  C  14:54, 13 February 2020 (UTC)
 * Also needs to remove any templates --  Green  C  14:57, 13 February 2020 (UTC)
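
The request above (when url goes, the parameters that depend on it go too) can be sketched like this. It is a minimal illustration that assumes a citation has already been parsed into a plain dict; the `drop_url` helper is hypothetical, and the dependent list is the one given in the report, without aliases such as accessdate.

```python
# Parameters that only make sense alongside |url= (list taken from the report above).
URL_DEPENDENTS = ("access-date", "archive-url", "format", "url-access")


def drop_url(params: dict) -> dict:
    """Return a copy of the citation parameters without url and its dependents."""
    removed = {"url", *URL_DEPENDENTS}
    return {key: value for key, value in params.items() if key not in removed}


cite = {
    "title": "Example",
    "doi": "10.1000/example",
    "url": "http://example.com/article",
    "url-access": "subscription",
    "access-date": "13 February 2020",
}
cleaned = drop_url(cite)
```

Returning a new dict rather than editing in place keeps the original parameters available, which helps when building an edit summary that says what was actually removed.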

Non-mainspace edits
Hi! Should the bot really be editing outside article space with edits like this or this or this? — HELL KNOWZ   ▎TALK 22:06, 13 February 2020 (UTC)
 * It was actually a mistake to include non-mainspace but it seemed to be doing good things so I went with it. I guess in a few spots it could be off. It's only for the Springer URL dead links. -- Green  C  22:33, 13 February 2020 (UTC)

Blue link on Great Expectations does not work
Hello. This edit here at Great Expectations does not work. The correct link for the source is there; I copied it and saw the book, but it does not go to the page where the quote is found. I do not understand how to add a blue link to a short format reference or I would fix it. Can you? --Prairieplant (talk) 11:50, 8 March 2020 (UTC)
 * The link is not shown because it was hidden in an HTML comment and remains so.
 * You can use the full text search from archive.org: search for the quote and you will get a link to the search within the specific book, which then redirects to the specific page with the quotation after you click "borrow". From that you find that the link to the specific page is https://archive.org/details/flintflame00earl/page/262 . The bot cannot guess this because the previous link did not include a reference to a page number (although the harvnb reference did). Nemo 12:57, 8 March 2020 (UTC)

Changing the link to a different edition, changing the year and breaking sfn
On the 8 March 2020 the bot made this edit to the article about René Caillié. The article cited the 1799 edition of a book by Park using sfn with ref=harv and provided a link to the Google scan. The bot changed the year to 1815 and provided a link to an Internet Archive scan of an 1815 edition of the book. This broke the sfn link and the page number. The bot should be programmed not to change links to scans when the year is different. Sometimes more than one edition of a book is published in one year so great care is needed. In general I prefer to cite the Internet Archive but a scan of the edition that I was citing was not available on that site. - Aa77zz (talk) 15:42, 9 March 2020 (UTC)
 * Note that the year in the original reference was 1799b - ie not a number - to distinguish from 1799a using sfn. Could this have confused the bot? - Aa77zz (talk) 15:51, 9 March 2020 (UTC)
 * I think you are right. It didn't recognize 1799 as a valid year due to the "b" and so it found the best match for the book sans date and reset the year to 1815. I will look into this. Normally it matches the year, publisher, author and any other metadata like Volume and Series information. -- Green  C  16:03, 9 March 2020 (UTC)
 * BTW is this the same? --  Green  C  16:09, 9 March 2020 (UTC)
 * Or this --  Green  C  18:19, 9 March 2020 (UTC)
 * Both look good - I cited page 195 which looks the same in both. The 1st one is the 3rd edition - but the same year. For my purposes it doesn't matter which. (I frequently edit bird articles where for the taxonomy the actual edition is all important) - Aa77zz (talk) 18:41, 9 March 2020 (UTC)
 * Ok the bot picked up the second one in a test edit with the fix. --  Green  C  21:06, 9 March 2020 (UTC)
 * Great - many thanks for fixing this. It is fairly standard to add letters to the year when using the sfn template or similar. - Aa77zz (talk) 21:50, 9 March 2020 (UTC)
 * FWIW I read this book about 6 months ago .. - Green  C  22:16, 9 March 2020 (UTC)
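
For anyone curious how a year such as 1799b can be read without losing the number, here is a small sketch. It is not the bot's code; the `numeric_year` helper and its regular expression are assumptions, and it only handles a four-digit year with an optional single lowercase letter.

```python
import re
from typing import Optional

# Four digits, optionally followed by one lowercase disambiguation letter (1799a, 1799b).
YEAR_RE = re.compile(r"^\s*(\d{4})([a-z]?)\s*$")


def numeric_year(value: str) -> Optional[int]:
    """Return the numeric part of a year value, or None if it is not recognizable."""
    match = YEAR_RE.match(value)
    return int(match.group(1)) if match else None


year = numeric_year("1799b")
```

Returning None for anything unrecognized, instead of guessing, lets a caller skip the edition match entirely rather than fall back to a different year.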

When the URL and archive-url are the same ...
I don't imagine this would happen very often, but this edit wasn't ideal; this is a better fix. Graham 87 02:54, 23 March 2020 (UTC)
 * There is so much going on here I won't try to explain it. But I've added some code to try and prevent this from happening. It is an NLA specific problem. -- Green  C  12:56, 23 March 2020 (UTC)

Incorrect edit summary
Hello, in this edit to Joan Baez, the bot said it was reformatting two archive links when, unless I'm completely losing the plot, it only reformatted one. Graham 87 14:36, 30 March 2020 (UTC)
 * In this case it converted from http->https and from short to long form (2 changes). It's true only one citation was modified, but I don't have a good way to know that: the way the bot is designed, it processes the page sequentially and increments a counter for each change. --  Green  C  14:51, 30 March 2020 (UTC)

Incorrect archive dates
Hello the BOT appears to be adding a date in an invalid format as the archivedate for example here. The date also does not appear to be the archivedate for the URL. Keith D (talk) 17:39, 30 March 2020 (UTC)
 * Bug fixed. Caused when missing archivedate and when the archive URL timestamp and date are the same date. -- Green  C  21:02, 30 March 2020 (UTC)
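
As an illustration of deriving a missing archivedate from the archive URL, the sketch below reads the 14-digit timestamp in a Wayback Machine URL. It is an assumption-laden example, not the bot's code: the `archive_date_from_url` helper is hypothetical, and shortened timestamps or other archive providers are not handled.

```python
import re
from datetime import datetime
from typing import Optional

# Wayback URLs carry a YYYYMMDDhhmmss timestamp right after /web/.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})")

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def archive_date_from_url(archive_url: str, dmy: bool = True) -> Optional[str]:
    """Return the snapshot date as '30 March 2020' (dmy) or '2020-03-30' (ISO)."""
    match = WAYBACK_RE.match(archive_url)
    if not match:
        return None
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    if dmy:
        return f"{stamp.day} {MONTHS[stamp.month - 1]} {stamp.year}"
    return stamp.strftime("%Y-%m-%d")


date = archive_date_from_url("https://web.archive.org/web/20200330170000/http://example.com/")
```

The `dmy` switch is there because the date format should follow the style already used in the citation, for example matching access-date.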

Ellicott City, Maryland revision
This revision to Ellicott City, Maryland was just wrong: for no apparent reason it replaced a working archive link for the Ellicott City CDP with an irrelevant archive link for the North Potomac CDP. -- Pemilligan (talk) 15:01, 12 April 2020 (UTC)
 * This is a bug. Looks like it affected 171 articles (out of 50k or so). I'll work on a script to find and restore them. -- Green  C  17:11, 12 April 2020 (UTC)

Why is this happening?
Diff, why is the bot removing perfectly valid archive links from sources? This is the second one I have reverted? « Gonzo fan2007 (talk) @ 17:30, 11 May 2020 (UTC)
 * Here is the other one (note the date fix was correct, the removal of url-status doesn't seem to make sense). « Gonzo fan2007 (talk) @ 17:34, 11 May 2020 (UTC)
 * url-status is only used when there is an archive-url, it has no purpose or function other than to tell the cite web template which order URLs are displayed (archive or url first). It does not mean "this URL is dead", you would use for that. A url-status without an archive-url is superfluous. --  Green  C  17:43, 11 May 2020 (UTC)

User:Gonzo_fan2007: Two problems: 1) https://www.newspapers.com/clip/24972268/mrs_kelly_death_notice is not a web archive. Web archives are archive.org, archive.is, etc., as listed at List_of_web_archives_on_Wikipedia. 2) The url and archive-url are identical URLs, which is not the purpose of archive-url. The correct solution for archive-url is https://archive.is/xAVsp which means that if the url ever dies, there is a web archive backup. -- Green  C  17:40, 11 May 2020 (UTC)
 * So in the first diff, it was an error. I.e., the actual url was put in accidentally, instead of the archive-url (which was created here). I would imagine that this is a (common?) error that occurs when putting references together. Is there any way that instead of the bot just removing the url, it could put it into an error category for further review (maybe only if archive-date and url-status are properly filled out, thus when it appears that an editor made an attempt to archive the link)?
 * Regarding the second diff, the bot appears to have removed url-status=live for two properly formatted references that both have archive-url and archive-date. Am I missing something (which is definitely possible)? « Gonzo fan2007 (talk) @ 20:16, 11 May 2020 (UTC)
 * When the url and archive-url are exactly the same there isn't anything to do but remove it. If the url is dead it would have been replaced with an archive URL. If in the future the url dies a bot will replace it. The problem is when there is a URL taking the archive-url real estate: if/when the url dies, the main archive bot (InternetArchiveBot) will skip it, so it never gets saved and you end up with link rot.
 * Sorry about the second diff I didn't look closely you are right, that is a bug, I know what caused it (it doesn't see beyond the template). I'll fix it. --  Green  C  21:00, 11 May 2020 (UTC)
 * Makes sense. Thanks for the assistance. « Gonzo fan2007 (talk) @ 21:10, 11 May 2020 (UTC)
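
The two rules discussed in this thread (an archive-url identical to url is removed, and url-status is dropped when there is no archive-url) can be sketched as below. This is a simplified illustration over a plain dict of parameters; the `tidy_archive_params` helper is hypothetical and ignores parameter aliases.

```python
def tidy_archive_params(params: dict) -> dict:
    """Return a cleaned copy of citation parameters following the two rules above."""
    out = dict(params)
    # An archive-url identical to url is not an archive; drop it and its date
    # so the slot stays free for a real web archive link later.
    if out.get("archive-url") and out.get("archive-url") == out.get("url"):
        out.pop("archive-url")
        out.pop("archive-date", None)
    # url-status only controls whether url or archive-url is displayed first,
    # so it has nothing to do once archive-url is absent.
    if "archive-url" not in out:
        out.pop("url-status", None)
    return out


duplicate = {
    "title": "T",
    "url": "https://example.com/a",
    "archive-url": "https://example.com/a",
    "archive-date": "11 May 2020",
    "url-status": "live",
}
tidied = tidy_archive_params(duplicate)
```

A citation with a genuine archive-url passes through unchanged, url-status included, which is the behaviour the second diff in this thread should have had.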

Your bot is replacing good archives with bad ones
I've reverted two edits so far, 1 and 2, where the article was using a scanned copy of the original article as a source and your bot replaced it with a web.archive.org link based on the date that the scan was taken (ie, 2011) rather than the date of the article (ie, 1986), and of course the 2011 archive doesn't have the actual article from 25 years earlier. Robman94 (talk) 18:39, 11 May 2020 (UTC)


 * It moved the URL as an alternative URL at the end of the cite to make room for a proper web archive link. When the alt URL dies, which it will, it will also need its own web archive link. This is why we only use web archive links in the archive-url field, because they are specialized sites for saving copies of links on the Internet. By putting non-web-archive links in the archive-url field, that link will die and (probably) never get saved because there is no place to put the web archive link, and no bots or processes that will maintain it. In this case it should be http://www.rockabilly.net/wikiscans/rs-12-18-1986.shtml and https://archive.is/gdM5 (archive.is in this case but you could use any listed at WP:WEBARCHIVES). --  Green  C  21:09, 11 May 2020 (UTC)
 * I wasn't aware that we had a rule stating that we only use web archive links in the archive-url field, at any rate, the 2 web archive links that the bot substituted are no good as they are from 2011 not 1986. I will update the articles accordingly.  Thanks, Robman94 (talk) 21:08, 12 May 2020 (UTC)

hidden text and |url-status=
Re:, if GreenC bot is clever enough to remove comments around two parameters, perhaps it could be made clever enough to make sure that url-status has valid parameter values?

—Trappist the monk (talk) 13:17, 14 May 2020 (UTC)
 * Normally it would, but the clever insertion of a wikicomment in the middle of a key=value pair threw it off. -- Green  C  15:00, 15 May 2020 (UTC)
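
A rough sketch of checking a url-status value that may have a wikicomment embedded in it follows. It is only an illustration: the `clean_url_status` helper is hypothetical, and the set of accepted values is an assumption that should be checked against the current citation template documentation.

```python
import re
from typing import Optional

# Matches a wikitext/HTML comment, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Assumed set of accepted values; verify against the template documentation.
VALID_URL_STATUS = {"live", "dead", "unfit", "usurped", "deviated"}


def clean_url_status(raw: str) -> Optional[str]:
    """Strip comments from the raw value and return it only if it is recognized."""
    value = COMMENT_RE.sub("", raw).strip().lower()
    return value if value in VALID_URL_STATUS else None


status = clean_url_status("dead<!-- checked by hand -->")
```

Stripping comments before validating means a value that exists only inside a comment comes back as None, so the parameter can be treated as empty.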

Partially mangled
had to fix [//en.wikipedia.org/w/index.php?title=Dick_Dreissigacker&type=revision&diff=957659680&oldid=957624824] [//en.wikipedia.org/w/index.php?title=Alan_Brahmst&diff=prev&oldid=957659119] ... Frietjes (talk) 22:53, 19 May 2020 (UTC)
 * name is an unrecognized alias of title, will fix. --  Green  C  23:22, 19 May 2020 (UTC)

Data.tab created seems not completed
Hi !

Maybe you should take a look at the page the bot created, Data:Wikipedia statistics/data.tab; it seems to me that there is something missing in it. Alexcalamaro (talk) 10:26, 23 May 2020 (UTC)


 * The file lives on Commons, probably an old typo at some point caused the bot to create a page here. I issued a speedy delete. -- Green  C  13:06, 23 May 2020 (UTC)

Incorrectly formatted ISO date
Hello, in this edit the BOT added an incorrectly formatted ISO date to archive-date parameter. Keith D (talk) 11:24, 31 May 2020 (UTC)


 * Keith D, code fixed, thanks. It was actually the wrong date, and the format in this case should match access-date (dmy), so it was all-around fubar. -- Green  C  13:20, 31 May 2020 (UTC)

Chancellor Whiting
Chancellor Whiting passed yesterday June 4, 2020 at the age of 102.

Ashley Elder — Preceding unsigned comment added by 96.240.140.43 (talk) 19:23, 5 June 2020 (UTC)


 * You mean Albert N. Whiting which my bot last edited 2 months ago. -- Green  C  19:53, 5 June 2020 (UTC)