User talk:GreenC bot/Archive 6

whitehouse.gov
Hi there! Edits such as, , and are adding whitehouse.gov to references that already have a work parameter (or one of its aliases), which adds the article to the maintenance Category:CS1 errors: redundant parameter. Could you please fix the references your bot broke and tweak your bot so it doesn't make similar edits in the future? Thanks! GoingBatty (talk) 01:18, 7 March 2021 (UTC)
 * The bot job is done, so you don't need to hit the stop button. I was not aware that newspaper and work are aliases. I'll check into fixing them. -- 02:07, 7 March 2021 (UTC)
 * The fix is done for 30 cites in 26 articles. -- Green  C  22:20, 7 March 2021 (UTC)
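A minimal sketch of the kind of guard discussed above, so that a site name is only added when no alias of work is already present. The alias set reflects the CS1 documentation as commonly described; the function names and the dictionary representation of a citation are hypothetical and are not taken from the bot's code.

```python
# Parameter names that CS1 treats as aliases of |work= (assumed from the
# CS1 documentation; not copied from the bot's source).
WORK_ALIASES = {"work", "website", "newspaper", "journal", "magazine", "periodical"}


def can_add_website(params: dict) -> bool:
    """Return True only if no non-empty work alias is already set."""
    return not any(str(params.get(name, "")).strip() for name in WORK_ALIASES)


def add_website(params: dict, site: str) -> dict:
    """Return a copy of params with |website= added when that is safe."""
    if can_add_website(params):
        return {**params, "website": site}
    return dict(params)
```

With this guard a citation that already carries newspaper is returned unchanged, which avoids the redundant-parameter category.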

Archived url removed
Hi! With this edit your bot removed a working archived url. Is that a one-off glitch, or is there a mistake in your programming? Can we be sure that isn't going to happen again? Justlettersandnumbers (talk) 21:16, 13 March 2021 (UTC)
 * The source link was changed and made live again, so the archive URL is no longer needed. -- Green  C  21:47, 13 March 2021 (UTC)

Nobots issue
Is it possible to make Shadowbot respect ? Unless is OK with removing the local files there is little reason to constantly flag them as shadowed, especially since the enwiki file and Commons file only differ in the version. Jo-Jo Eumerus (talk) 16:07, 29 March 2021 (UTC)


 * Hi. It is programmed to look for shadows: since GreenC bot has so many different tasks, each has its own name. I just added "GreenC bot" in addition, since that is also a valid name to block, covering all tasks by the bot. If you still see it editing, let me know. Thanks. --  Green  C

Did GreenC bot join the oversight committee?
There sure are a lot of anonymous users on the List of Wikipedians by article count/1–1000 now! No rush, but I figured I'd point it out. Thanks and take care. – Novem Linguae (talk) 12:16, 16 April 2021 (UTC)

Task 9
You can probably stop task 9 now. No-one is likely to use the old magic word after all this time. MichaelMaggs (talk) 11:14, 29 April 2021 (UTC)
 * Done. It's still on Toolforge if it could be adapted to a similar purpose in the future. --  Green  C  16:33, 29 April 2021 (UTC)

A kitten for you!
Thanks for making all these archive links for us <3

~the.one.and.the.only~ (talk) 02:41, 8 May 2021 (UTC) 

Obviously wrong archive link addition
Hi, with this edit on it.Wikipedia, your bot added a link to a book on archive.org which was completely unrelated to the book the source was citing. Could you please explain what criterion you use to decide whether a link should be added or not? Could you also help me and the it.Wikipedia community find and correct other false additions like that one?--Ferdi2005 (talk) 14:30, 21 May 2021 (UTC)

Strangely formatted date by bot
In the change at https://en.wikipedia.org/w/index.php?title=My_Little_Pony:_The_Movie_(2017_film)&curid=51068079&diff=1028920093&oldid=1026037502, the bot used "date=1.506825145863e+15 October 1, 2017" as the date of a link that was changed to the Internet Wayback Machine. This number seems to be the number of microseconds since January 1, 1970 (the Unix epoch), which is likely used by the bot for internal calculations, but it should not appear in the final edit. You might want to check whether this happened on more occasions and whether it can be fixed. Gial Ackbar (talk) 20:25, 16 June 2021 (UTC)
 * OH, big trouble. It happened in 655 citations in 118 articles. Debugging code wasn't removed during production run. Well, now to fix them. Thank you for the report. --  Green  C  20:57, 16 June 2021 (UTC)
 * All fixed. -- Green  C  23:54, 16 June 2021 (UTC)
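For readers checking the report above, a short sketch that interprets the leaked number as microseconds since the Unix epoch and converts it to a calendar date. The function name is hypothetical; the bot's own date handling is not shown on this page.

```python
from datetime import datetime, timezone


def micros_to_date(raw: str) -> str:
    """Interpret raw as microseconds since 1970-01-01 UTC; return an ISO date."""
    micros = float(raw)  # accepts scientific notation such as 1.5e+15
    moment = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
    return moment.date().isoformat()


print(micros_to_date("1.506825145863e+15"))  # prints 2017-10-01
```

The result agrees with the "October 1, 2017" that followed the number in the edit, which supports the reading that the value was an internal timestamp left in by debugging output.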

bot forgot to add |archive-date=
In, bot forgot to add archive-date. Really?

—Trappist the monk (talk) 12:05, 26 June 2021 (UTC)
 * The archive.today API returned unexpected results the bot was not prepared for. Fixed.  --  Green  C  00:48, 27 June 2021 (UTC)
 * Apparently still broken; see
 * —Trappist the monk (talk) 13:11, 30 June 2021 (UTC)
 * Same issue with the fragment not returned by the API confusing the bot, which has been fixed. But unable to duplicate. Suspect the article was processed before the fix. There can be days between processing and upload (on edit conflict it reprocesses in a new batch). The NASA batch took a long time. WaybackMedic is ironically programmed to add missing archive-date in case you think it makes sense to run it on a tracking category backlog. --  Green  C  15:56, 30 June 2021 (UTC)

also missing archive-date but not archive.today and no fragment.

—Trappist the monk (talk) 17:35, 5 July 2021 (UTC)
 * Two bugs fixed that were previously hidden by error correction but surfaced here due to combined cite complications (and your report!). Thanks. -- Green  C  19:06, 5 July 2021 (UTC)

Crime in Washington, D.C.
Hello GreenC bot, doesn't seem to be rescued looking at the result. Can you fix it? Thank you for your time. Lotje (talk) 14:01, 27 June 2021 (UTC)


 * Fixed in article and code, thank you for the report. -- Green  C  18:04, 27 June 2021 (UTC)

bot used shortened archive url when creating |url=
See where the bot created:   and omitted archive-date.

It's good that the bot is removing unnecessary archive-date parameters.

—Trappist the monk (talk) 14:30, 28 June 2021 (UTC)


 * Fixed. Four problems: url has a /image - url should be archive.today - url should be in the archive-url field - url should be long format .. the /image tripped it up when combo with the others. A lot of unnecessary archive-date parameters, some predating the existence of the web. --  Green  C  20:44, 28 June 2021 (UTC)

I don't understand what the Bot is trying to accomplish here
I don't understand what the Bot is trying to accomplish with this edit. The link is not dead. Hawkeye7  (discuss)  00:34, 30 June 2021 (UTC)

And I don't understand why washingtonpost.com is consistently timing out with a header check, and only for this URL:

Starting headers (1) for https://www.washingtonpost.com/archive/politics/1979/09/04/nasa-weighs-deferring-1982-mission-to-jupiter/bfe8bb4a-20fe-41f5-af14-d6b1c003b470/
Ending headers (0 / -1)
Headers time out

It appears to be an issue with the agent string (!). This works: This does not: The difference being the word "bot". But it's not only "bot"; other words can cause it to fail. Normally, an agent string exists for human consumption and would not interfere with anything. At a loss. Anyway, this is not happening to all washingtonpost.com URLs, only this one that I know of. Look under the surface, the Internet is pretty weird. I'm going to change the agent string to something less likely to trigger aggressive word filters. Zulu might work:, conveying enough contact information. -- Green  C  03:33, 30 June 2021 (UTC)
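A rough sketch of how a header check with a configurable agent string could be reproduced using only the Python standard library. The function names, the timeout, and the convention of returning -1 on failure are assumptions for illustration; the agent strings actually tested are not preserved above, so any value passed in is a placeholder.

```python
import urllib.error
import urllib.request


def build_head_request(url: str, agent: str) -> urllib.request.Request:
    """Build a HEAD request that carries the given User-Agent string."""
    return urllib.request.Request(url, method="HEAD", headers={"User-Agent": agent})


def head_status(url: str, agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a HEAD request, or -1 on timeout or network error."""
    request = build_head_request(url, agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except OSError:
        return -1  # covers timeouts and connection failures
```

Calling head_status twice on the same URL with two different agent strings is enough to see whether the server treats them differently.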

probably GIGO but ...
, bot removed archive-date and url-status from this:

to make this:

The value in archive-url is completely bogus, but even so, should the bot have removed archive-date? Generally ok to remove dead as redundant.

History: —Trappist the monk (talk) 16:54, 5 July 2021 (UTC)
 * Editor Dwanyewest added the original http://www.webcitation.org/3IK with – presumably a copy/paste from pl:Ateizm w Polsce where that parameter was added by an ip editor with this edit.
 * IAbot changed it to http://www.webcitation.org/3IK?url=http://www.pewinternet.org/css/layoutstyles.css with – of course in the best of all possible worlds, that edit would have been inspected by an interested editor and the bogus archive url replaced with something meaningful ...


 * Added a check for 3IK as the WebCite ID (Unix time 0 in base62) and will delete the three fields when found. It could try to fix it with some difficulty but it is such a rare one-off bug nowhere else that I can find this should be enough for now. --  Green  C  20:07, 5 July 2021 (UTC)

highbeam
At bot added archive-date and dead (not really necessary because cs1|2 presumes   when archive-url has a value) for one Highbeam citation yet did not do the same for the other.

But, because Highbeam is dead long since, and because archives of Highbeam pages show only the first paragraph or so and then prompt the reader to subscribe or login (to no benefit to the reader), perhaps the archive-url, archive-date, and url-status parameters should be removed and  added.

—Trappist the monk (talk) 13:44, 6 July 2021 (UTC)
 * Looks like it might be the same bug as above with combined cites; it was processed before the fix was made. There are so many HighBeam links that I can't delete them without consensus. There is an argument they have some value, short of replacing with something better. Sometimes the cited fact is in the snippet, which can't be known without a manual check; it might have useful metadata; it helps verify the source exist(ed). -- Green  C  15:15, 6 July 2021 (UTC)

Minor vandalism by GreenC bot
In, GreenC bot did some useful expansions of archive URLs (I checked a few, which seem OK), but also removed perfectly valid content from example dummy citations safely enclosed in. This shows my revert of the content that was, it seems to me, invalidly removed. GreenC bot seems to have done this on other articles too. Boud (talk) 00:01, 14 July 2021 (UTC)
 * I'd never heard of before and in 6+ years of work with over a million edits no one has ever brought it to my attention until now. It will need to a-void the void. BTW accusing someone of vandalism is not cool, WP:VANDALISM has a specific meaning, the bot does not do vandalism, which is intentionally causing damage, vandalism is not why I am here. See WP:AGF before assuming otherwise unless there is reason not to AGF. --  Green  C  01:03, 14 July 2021 (UTC)

Weight? (Circumcision controversies)
The organization you removed is the only one of its kind in Germany. intaktiv e.V. regularly organizes protests and gives interviews on the topic, and some public figures are advocates for this organization. I would say this organization has the same weight as the rest of those anti-circumcision organizations. And some diversity would do the table good. So far there are only organizations from English-speaking countries. So it would be nice if the entry were restored. I can see that it's a bot. And still it has removed this organisation on the ground of "Weight", which is strange if you look at the other groups in there: they are all small and hardly known outside of the circles of people who care about that topic.

Like I said before, you got the wrong user. Look at the history tab. HERE. Notice who reverted you: User:Alexbrn. They included my name in the edit summary to mean "reverting to the last version by GreenC bot" but neglected to say those words precisely causing some confusion. You can see the transaction of the edits, GreenC bot has nothing to do with it. -- Green  C  00:40, 19 July 2021 (UTC)

Oh, sorry, you're right.
 * no problem easy to get confused by Wikipedia at first, good luck. -- Green  C  21:34, 21 July 2021 (UTC)

archive.is > archive.today ...???
Hello:

In a 2021-07-21T01:37:18 GreenC bot edit to the Wikipedia article on Daniel Ellsberg, "archive.is" was replaced by "archive.today". I checked and found the following:


 * The "archive.is" link still seemed to give valid content.
 * "archive.today" was auto-forwarded to "archive.ph" with content that looked to be equivalent to "archive.is".
 * Neither the archive.is nor archive.today (or archive.ph) link supported the claims in the Ellsberg article. I found the original link in archive.org with content that seemed to support the claims in the article.

I do not know what if anything you think it might be appropriate to do about this. Thanks for your support of Wikipedia. DavidMCEddy (talk) 06:23, 21 July 2021 (UTC)


 * Archive.today is a front-end re-router to one of the other domains currently active. It's the front door to access one of the 7 other domains. It will work using .is or .ph but maybe not in the future. We had a problem in 2019 where it stopped working for one of the 7 domains (for about a month). The owner of archive.today requested we use it so they are flexible on domain availability. The content is the same regardless of which domain.  --  Green  C  13:51, 21 July 2021 (UTC)
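A small sketch of the host rewrite described above, in which mirror domains are replaced by the archive.today front door. Only the two mirror hosts named in this thread are listed; the full set of seven is not given here, so the set below is deliberately incomplete and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Only the mirrors mentioned in this thread; the real list is longer.
MIRROR_HOSTS = {"archive.is", "archive.ph"}


def to_archive_today(url: str) -> str:
    """Rewrite a known mirror host to archive.today, leaving other URLs alone."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in MIRROR_HOSTS:
        parts = parts._replace(netloc="archive.today")
    return urlunsplit(parts)
```

Because only the host changes, the path that identifies the snapshot is preserved, consistent with the statement that the content is the same regardless of domain.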

July 2021
Hello, I'm Picard's Facepalm. Your recent edit(s) to the page Miami Vice appear to have added incorrect information, so they have been reverted for now. If you believe the information was correct, please cite a reliable source or discuss your change on the article's talk page. If you would like to experiment, please use your sandbox. If you think I made a mistake, or if you have any questions, you can leave me a message on my talk page. Thank you. Picard's Facepalm (talk) 16:42, 21 July 2021 (UTC)


 * You are incorrect. The URL is dead. --  Green  C  19:35, 21 July 2021 (UTC)
 * Then why am I looking at the web page right now, again, and from a totally different computer? I can send you a screenshot if you like.  Perhaps the site is blocking your IP because your bot keeps banging against it? Picard's Facepalm (talk) 00:17, 22 July 2021 (UTC)
 * The bot is correct, the link is dead and time.com is returning 404 for this url (look at the html source where time.com uses 404 in several places):
 * http://www.time.com/time/magazine/article/0,9171,959822,00.html
 * The archive snapshot url isn't a whole lot better:
 * https://web.archive.org/web/20130822235037/http://www.time.com/time/magazine/article/0,9171,959822,00.html
 * The archive snapshot url was added to the article with using an automated process.  A better archive snapshot is:
 * https://web.archive.org/web/20071211221401/http://www.time.com/time/magazine/article/0,9171,959822,00.html
 * —Trappist the monk (talk) 00:31, 22 July 2021 (UTC)
 * https://web.archive.org/web/20130822235037/http://www.time.com/time/magazine/article/0,9171,959822,00.html is working just fine for me. Not sure what to tell you guys. Picard's Facepalm (talk) 00:39, 22 July 2021 (UTC)
 * url is tied to url-status .. you're looking at archive-url, which is always live; that's why we have them. --  Green  C  01:09, 22 July 2021 (UTC)
 * Then you don't understand how live works (see template documentation). Compare these:
 * live:
 * dead:
 * And, that archive snapshot that you say is working just fine, while it does 'work', doesn't work well because it is just a teaser requiring login to read the rest of that article. That is why I suggested the better archive snapshot url ...  Also, these templates should be rewritten as  because Time is not a scholarly or academic journal.
 * —Trappist the monk (talk) 01:17, 22 July 2021 (UTC)

was this a good fix?
Was a good fix? cs1|2 expects 14-digit timestamps so converting this:

to this:

creates broken cs1|2 templates. Because the archive url does not have a 14-digit timestamp, cs1|2 suppresses the archive-url link so that title is linked with url, the presumably dead url. In preview, cs1|2 creates a  timestamp so that archive.org will show the calendar display for that year. That no longer works and archive.org just returns a "We're sorry — something's gone wrong" message. Apparently, archive.org no longer recognizes the wildcard character unless the timestamp is zero-filled to 14 digits. I'll fix that in the cs1|2 module.

I suppose, to answer my own question, that was as good a fix as should be expected (no need to include dead because that is the default when archive-url has a value). I don't think that automated tools should be choosing which of (possibly) many archive snapshots to use in archive-url so showing the error messages may attract interested editors to make the necessary repairs... or not.

—Trappist the monk (talk) 14:07, 5 August 2021 (UTC)
 * In this particular case, normally Medic would have filled in the snapshot date when going to  which is 20170728023443. With the 8-digit snapshot it is a working URL that redirects to 20170728023443 and the bot confirms that by filling it in, it's not deciding anything. However, it didn't work this time and I know why - sort of intentional but also unintentional.


 * Regarding choosing snapshots, unfortunately it doesn't work to rely solely on the community. For example at dewiki they rejected use of IABot for years, and now people are upset because of the number of dead unmaintained links. On enwiki, in 2015 before IABot existed, there were around 4 million archive links added in the entire 15 year history of Wikipedia (much of that by older bots). Within two years IABot had added over 8 million more. True it would be good if people did it, the best solution, but people don't, evidently, at the scale required. It's hard, repetitive, boring, etc.. and endless, thousands of URLs are dying every day and new ones being added. There is strong community demand for a solution that is not 100% manual.


 * I would love to hear if you have any big picture ideas, you have good experience, bots adding archive URLs doesn't need to be the (only) solution, it certainly has problems and I am keeping a list of them. Relying entirely on manual doesn't work well and the community wants something more. What other solutions might there be? Serious question, there must be other ways that are practical to implement (not require major changes to Mediawiki). I have some ideas, and I'm sure others do as well. -- Green  C  15:29, 5 August 2021 (UTC)
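To illustrate the 14-digit timestamp point raised at the start of this thread, here is a sketch that extracts the timestamp from a Wayback Machine snapshot URL and derives an archive-date only when the timestamp is complete. The regular expression and function names are assumptions for illustration and are not taken from WaybackMedic or the cs1|2 module.

```python
import re
from datetime import datetime
from typing import Optional

# Timestamp segment of a Wayback URL, optionally followed by a two-letter flag.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d+)(?:[a-z]{2}_)?/")


def snapshot_timestamp(archive_url: str) -> Optional[str]:
    """Return the digit run after /web/, or None if the URL does not match."""
    match = WAYBACK_RE.match(archive_url)
    return match.group(1) if match else None


def archive_date(archive_url: str) -> Optional[str]:
    """Return an ISO date for a full 14-digit snapshot timestamp, else None."""
    ts = snapshot_timestamp(archive_url)
    if ts is None or len(ts) != 14:
        return None  # short timestamps only redirect; they do not name a snapshot
    return datetime.strptime(ts, "%Y%m%d%H%M%S").date().isoformat()
```

A caller that gets None back for an 8-digit URL knows it still has to resolve the redirect to a full timestamp before filling in archive-date.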

Gambot tasks
Recent changes to some of the Good Article list pages have broken one of the bot tasks. A bit more information at Wikipedia talk:Good articles. I am also unclear why it added Arizona State Route 88. CMD (talk) 09:10, 8 August 2021 (UTC)
 * I actually found a bug in the code, the same variable name used twice for different purposes, why the program ever worked I don't understand. Something probably changed on the incoming data that exposed the bug. It might explain the Arizona 88 also. -- Green  C  14:25, 8 August 2021 (UTC)
 * The magic of code. The Arizona 88 edit also suggested to me the code still produces the desired results in the bugged section, but just reads all the headers as well. Nonetheless, if you have the time to see if the code could be fixed that would be appreciated. CMD (talk) 15:32, 8 August 2021 (UTC)
 * It was a simple fix once identified. If you see any more problems let me know. -- Green  C  15:47, 8 August 2021 (UTC)

Bot is labeling dead links that are NOT dead
Your bot just labelled two links "dead" that are both still good links. I just clicked on both of them. The article is Space Pioneer, and this is the diff. Cheers. N2e (talk) 19:35, 9 August 2021 (UTC)


 * thanks for the report. I think the trouble has been identified and fixed. It's more complex than not detecting the URL status correctly, having to do with soft404s and the way this particular run was configured. There are some others in spacenews.com possibly as many as 300 with false dead links. I'll post here when fixed. --  Green  C  01:01, 10 August 2021 (UTC)


 * Super. So am I hearing you that you will send the bot back to clean it up?  Or should I revert on that article?  Sounds like maybe you are "on it" and will end up fixing those two and many more from SpaceNews (actually, a source I regularly use).  Cheers.  N2e (talk) 01:28, 10 August 2021 (UTC)


 * It was actually only 17 links, I miscalculated with 300. The dead link tags are now removed for the 17. If you see any more problems let me know. -- Green  C  01:43, 10 August 2021 (UTC)


 * Cool. I'm here to say I wandered back to that article today and found GreenC_bot has nicely dropped by and fixed the problem it had created.  diff  Thanks! N2e (talk) 11:48, 17 August 2021 (UTC)

This still seems to be happening – and. SpinningSpark 16:05, 1 June 2022 (UTC)


 * I'm working on the dtic.mil domain and it's complicated. Looks like out of about 5,000 links, 375 are not dead (containing  in the URL). The bot assumed they were all dead.    --  Green  C  16:57, 1 June 2022 (UTC)

Invisible character error
This edit seems to introduce an invisible character error for cite news: "replacement character in |url= at position 246 (help); replacement character in |archive-url= at position 288 (help)". I have reverted it. Regards.—Bagumba (talk) 07:07, 11 August 2021 (UTC)
 * Thanks for the error report. Another edge case bug fixed in my urldecode function. -- Green  C  15:28, 11 August 2021 (UTC)

Bot removing Wayback "id_" identity flag
Hi. The bot appears to be removing the Wayback "id_" identity flag, which is intended to "perform no alterations of the original resource, return it as it was archived." Removing this is harmless in many cases, but in many others it is not. I have purposely linked to these id_ archive links in the past because of text becoming unreadable with the normal archive link. Here are two examples where the cited text otherwise malfunctions for the reader unless "id_" is used: this vs this, and this vs this. In the latter, the text becomes otherwise truncated. Thank you. Οἶδα (talk) 10:42, 19 August 2021 (UTC)
 * Thank you for bringing this up with the examples. I have not wanted to do this because removing the nav box sort of locks in the archive page so users can't easily navigate around in case the page doesn't verify or changes (WaybackMachine is not static, snapshots change and move). Still, you have shown there are sometimes good reasons for the flags. The bot should yield right of way to whatever flags users want. The code to preserve the flags is done (more complicated than it might seem) but has not been tested at scale yet; the next batch job will tell. BTW recent docs suggest if_ instead of id_ -- Green  C  04:19, 20 August 2021 (UTC)
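As an illustration of preserving a flag while changing a snapshot timestamp, the following sketch keeps any two-letter modifier such as id_ or if_ in place. The pattern and the function name are hypothetical; the bot's real implementation is described above as more complicated.

```python
import re

# prefix, timestamp, optional two-letter flag (for example id_ or if_), rest
FLAGGED_RE = re.compile(r"^(https?://web\.archive\.org/web/)(\d{1,14})([a-z]{2}_)?(/.*)$")


def set_timestamp(archive_url: str, new_ts: str) -> str:
    """Replace the snapshot timestamp, keeping any flag the editor chose."""
    match = FLAGGED_RE.match(archive_url)
    if not match:
        return archive_url  # not a recognised Wayback URL; leave untouched
    prefix, _old_ts, flag, rest = match.groups()
    return f"{prefix}{new_ts}{flag or ''}{rest}"
```

Treating the flag as its own capture group means a timestamp rewrite cannot silently drop it.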

Reformatting PDFs cited to archive.org
I undid an edit of yours that reformatted a PDF link on archive.org. I don't think it was useful; it seems pointless. I've seen this on a number of articles. What it does is require the reader to click two times to open the PDF. Please do stop doing this.--Dr Silverstein (talk) 03:11, 10 October 2021 (UTC)
 * like this. Those are machine-specific URLs that will break in the future (when the machine moves or is taken offline); they are ephemeral and should not be linked on Wikipedia. There might be a way to make a permanent link to a PDF but I'm not sure how, and I'm not sure it's a good idea as the book reader is better for the general reader. The PDF link is still there if they want it. I know you personally prefer PDF but think of other people around the world on dialup modems, slow connections, costly cellular, etc.. From the main page readers can individually choose what format they want, assuming they even want to download it at all. The main page also has metadata that is not visible when going direct to PDF. --  Green  C  03:33, 10 October 2021 (UTC)
 * This is not about my personal preference. PDF is much clearer and enlarged and does not require any enlarging, page turning or any such thing. Fortunately most people do not use dial-up modems, slow connections, costly cellular and what not. The majority of the world uses Wi-Fi and lives in the year 2021. By removing PDFs to accommodate users of outdated devices, you are disrupting the ability for the vast majority of readers to read clearly. The purpose of these PDFs is not to provide a search option on WayBack but to provide the actual reading content. What do you do to PDFs that are on other websites? I hope you get my concern. As you probably know, I am not a very active user, but as a reader, I cannot access the content of any PDF link to WayBack Machine and finding the PDF option is not as easy as you think. If you look on the right side options, it's unknown to the average reader that the PDF format is available. I see this reformatting as a problem on a number of articles. The book reader is too small for the average reader. It requires a lot of clicking, while the PDF automatically fits the screen size of the device of a reader. This PDF link is permanent as it's not hosted on sites like Research Gate. I suggest you leave it as it is. If it was meant to be uploaded as another format, that would have been done. The book format is outdated and requires multiple navigation as opposed to the direct PDF. Please do not reformat it anymore. If you are unsure, then leave it until you are certain it will stay permanent, which it will because it's not a Research Gate link.--Dr Silverstein (talk) 08:27, 10 October 2021 (UTC)
 * FYI I have reformatted another PDF back to its proper format. Please do not reformat them. The link to WayBack is not temporary as it's not a Research Gate paper. There are countless PDFs linked on Wikipedia and they are viewed as PDFs, not complicated menus. Please do not make any more such changes in the future. It is not helpful at all, unfortunately.--Dr Silverstein (talk) 01:09, 13 October 2021 (UTC)

Adding url-status=dead to citations with |archive-url=... present
Hello. I was wondering why the bot's going round adding dead to citations with value present, as its only edit to a page. E.g. here or here. I thought dead was the default in that case? The edit summary "Move 2 urls. Wayback Medic 2.5" doesn't explain. cheers, Struway2 (talk) 08:48, 20 October 2021 (UTC)
 * Fixed going forward. A few more came through today in the queue. -- Green  C  17:14, 20 October 2021 (UTC)

Adding Usurped and pipes in URL titles
Hi! In this edit, the bot added Usurped around links that have a pipe character in the link text. This breaks the template syntax. Could the bot replace the pipe character with the  magic word? --rchard2scout (talk) 15:04, 8 November 2021 (UTC)
 * Done. Do you know if there any other characters that require escaping in a square-link title? -- Green  C  15:28, 8 November 2021 (UTC)
 * I think basically anything that needs to be escaped when turned into a template argument, so I'd guess the equals sign as well? rchard2scout (talk) 20:06, 8 November 2021 (UTC)
 * Let's try:


 * looks like 1,2,7 (pipe, equals, right-square-bracket). I guess equal would be  - right-bracket would never appear since that would be impossible for the bot to parse. --  Green  C  20:28, 8 November 2021 (UTC)
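A minimal sketch of the escaping discussed in this thread, applied to link text before it is passed as a template argument. It assumes {{!}} (MediaWiki's magic word for a literal pipe) and {{=}} (assumed to be available on the wiki for a literal equals sign); the function name is hypothetical and this is not the bot's code.

```python
def escape_template_arg(text: str) -> str:
    """Escape characters that would be parsed as template syntax."""
    # A raw pipe would start a new parameter; a raw equals sign would turn
    # the preceding text into a parameter name.
    return text.replace("|", "{{!}}").replace("=", "{{=}}")
```

The right square bracket is left alone here, following the observation above that it would not appear in a title the bot could parse.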

Altering intro text
Hi GreenC! I tried altering the intro text at List of Wikipedians by article count/1–1000‎, but it appears it's baked into the bot so it reverted me on the next update. Could you make that text configurable? {{u|Sdkb}} talk 18:19, 15 November 2021 (UTC)


 * Courtesy pinging GreenC. {{u|Sdkb}} talk 23:30, 30 November 2021 (UTC)
 * No because this tool works across multiple language sites and it's not so easy as making a template due to grammar and numerical formatting issues. And also, I like the footnote, sorry you do not, it's harmless anyway. This is a third party tool, it's not an official tool or page, anyone can make their own tool or list. --  Green  C  06:10, 1 December 2021 (UTC)

bot is breaking citations
See. Bot added 2007-06-15 (with the hyphen) when the citation already has 1 February 2017 (without the hyphen).

—Trappist the monk (talk) 01:09, 21 December 2021 (UTC)
 * Also added two templates.  One is sufficient, right?
 * —Trappist the monk (talk) 01:11, 21 December 2021 (UTC)
 * And what is the real archive date for that citation anyway. The url seems to suggest 2017-01-31 which is different from the dates in the archive-date and archivedate parameters.  Only one can be correct, so which one is it?
 * —Trappist the monk (talk) 01:16, 21 December 2021 (UTC)
 * Ah yeah, there is a Pandora link in the page field. I saw it during testing and thought it was fixed; guess not, fubar. The archive URL in this case is the https://webarchive.nla.gov.au one; no idea how it came up with 2007-06-15.  --  Green  C  01:33, 21 December 2021 (UTC)
 * All pages fixed. The solution is to remove webarchive (or pandora) .nla.gov.au URLs in a page field when it is the same as the URL in archive-url. Because web archives do not open a PDF to a page number, they drop the fragment. It's redundant. --  Green  C  03:27, 21 December 2021 (UTC)

Sentongo haruna, list of ugandan by net worth
Please can you help me? My article was blocked and deleted. I need someone to help me. 41.210.145.202 (talk) 14:22, 20 January 2022 (UTC)

Date problem
Hello, there is a problem with the archive-date field in this edit. Keith D (talk) 21:31, 20 January 2022 (UTC)


 * Fixed, logs show it was singular. Can't say what caused it yet. -- Green  C  22:26, 20 January 2022 (UTC)


 * Thanks for the fix. Keith D (talk) 22:44, 20 January 2022 (UTC)

archive.org
This just in from the Channel 37 newsroom: You are taking valid links to archive.org and disabling them by misusing usurped. Case in point and again here. There may be others. There used to be a Clarke Ingram site on uhftelevision.com / dumonthistory.com with a fair amount of good information on a long list of individual stations (and one entire network) which failed in the early days of television (1950s and early 1960s) because TV manufacturers weren't required to include UHF tuners in new TV sets until 1964, leaving only room for two or three main networks over-the-air. Many of the individual station history articles here rely on sources like this as there's relatively little online from that distant era. Sadly, the Ingram domains were allowed to expire and at least one has been cybersquatted with hardcore pornography, leaving the archive.org versions the only readily-available copy of the material. And no, the usual strategy of linking to both the original URL and the archive link present makes no sense (and is actively harmful) because it's linking to a domain which is not under control of the original site and that domain is being abused. We should never be linking to cybersquatted or expired domains, as it only encourages abusive registrations intended to take traffic meant for the cited site and redirect it elsewhere: ads, spam, porn, the occasional attempt to "ticket-scalp" domain registrations by speculatively tying up hundreds or thousands of domains, putting each up for sale for four figures or worse. This garbage is the scourge of the Internet; let a domain expire and it's gone not in sixty seconds but sixty milliseconds from when it becomes available to new registrants.

It would be better if you leave links to legitimately-archived content alone and remove links to cybersquatted or hijacked domains, instead of the inverse. This sort of edit is not helping the project. Link rot is a problem and archive.org a useful tool in damage control. 66.102.87.40 (talk) 18:26, 7 February 2022 (UTC)
 * does not disable links. I think you misunderstand what this template is for and how it works. -- Green  C  18:39, 7 February 2022 (UTC)

For example. Given this citation:

If seen by Citation bot or reFill or other tools, they will automatically convert to:

This is a problem since the domain is usurped. So we need a way to communicate when a bare or square archived URL is usurped. Thus is to flag other bots and tools (and people) that the underlying source URL is usurped. --  Green  C  18:54, 7 February 2022 (UTC)


 * The edits appear with summaries like (Remove 3 citations per WP:USURPSOURCE. Wayback Medic 2.5); look up WP:USURPSOURCE and that page isn't about link rot. It's about scraper sites, which steal content from other websites, mangle it to slip past the search engine duplicate content penalties and then repost it without attribution. Entirely different animal. If we're dealing with a scraper site vs. a live originating site, we want the original. If we're dealing with archive.org vs. a cybersquatted domain we don't want a clickable link to the cybersquatters. Maybe usurped is legit for dealing with cybersquatting, but WP:USURPSOURCE is the wrong documentation as it applies to a different issue. 66.102.87.40 (talk) 18:55, 7 February 2022 (UTC)
 * Ah sorry that edit summary is totally wrong. I made a mistake and didn't see it until it was almost done (only 34 edits). These edits have nothing to do with WP:USURPSOURCE. I apologize for the confusion caused by that. --  Green  C  18:58, 7 February 2022 (UTC)
 * I think there are four different WP:USURP policies, guidelines or procedures. The one you're looking for seems to be Link rot/Usurpations aka WP:USURPURL. Labelling archive.org as a scraper site isn't what you want. 66.102.87.40 (talk) 19:00, 7 February 2022 (UTC)
 * I am the author of WP:USURPURL and that is exactly the procedure that was followed (except the erroneous edit summary). --   Green  C  19:33, 7 February 2022 (UTC)

Edits showing template code
For example in Special:Diff/1067557821, where the bot tries to wrap a URL containing an equals sign in usurped. The equals sign gets interpreted as a template parameter, resulting in the literal text [usurped!] showing up. Consider using  instead. * Pppery * it has begun... 22:23, 10 February 2022 (UTC)


 * Oh yeah not good. The problem exists on 426 pages, you are the first to notice and report it over many months. -- Green  C  22:43, 10 February 2022 (UTC)
 * Script running now, example. Bot code updated. Template docs updated. --  Green  C  23:02, 10 February 2022 (UTC)
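The equals-sign problem above can be sketched in a few lines. This is only an illustration, not the bot's actual code: the helper names are hypothetical, and it assumes the suggested fix was to pass the link as an explicitly numbered parameter (1=), which is the usual way to stop MediaWiki from reading "name=value" out of an unnamed template argument.

```python
def needs_explicit_name(value: str) -> bool:
    """An unnamed template argument containing '=' is parsed as name=value."""
    return "=" in value


def wrap_usurped(link_markup: str) -> str:
    """Wrap link markup in the usurped template.

    Hypothetical helper: always emits the explicit 1= form, which is
    harmless when the value has no '=' and required when it does.
    """
    return "{{usurped|1=" + link_markup + "}}"


link = "[https://web.archive.org/web/2015/http://example.com/page?id=7 Title]"
print(needs_explicit_name(link))   # the query string contains '='
print(wrap_usurped(link))
```

Always emitting 1= avoids having to decide case by case whether a URL's query string would be misread.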

Bot added 2 reftalk templates
Hey GreenC! I noticed that on Talk:Rocket League, your bot added two reftalk templates. I'm guessing it did this because it saw there were 2 sections (1 was a sub-section) and refs and so it assumed the refs were in 2 different sections. Maybe it should look to see if the refs are part of the same section (or a subsection) and only add 1 reftalk template if the refs are all in the same section? ― Blaze WolfTalkBlaze Wolf#6545 01:29, 11 March 2022 (UTC)


 * Thanks for the fix and notification. Based on the edit summary, the bot thought one was for the 2-level comment one for the 3rd-level comment. Given the location of the refs, free-floating outside any text block, either could be right - there is no indication which section the refs belong to. This is a GIGO situation. I don't know how to fix it, but, it does appear to be pretty rare as I have never seen it before. -- Green  C  03:37, 11 March 2022 (UTC)
 * Alright sounds good. I've fixed your fix since the section is meant to be part of the edit request. ― Blaze WolfTalkBlaze Wolf#6545 05:30, 11 March 2022 (UTC)

Bot adding non-working archive link with incorrect archive date
... at Julie Higgins. The link it added goes to a 404 page (presumably a soft 404) from 2014. Graham 87 07:03, 8 May 2022 (UTC)


 * Thanks. NLA has a new URL scheme; the bot should have handled this. Not sure why it didn't, will investigate. -- Green  C  10:14, 8 May 2022 (UTC)

Bot claiming links are dead when they are not
This morning my watchlist was flooded with GreenC bot edits like this claiming that a link was dead. Except that it isn't. Something has gone wrong. Hawkeye7  (discuss)  19:53, 1 June 2022 (UTC)


 * Yeah I'm working on dtic.mil and the site fooled me into thinking entire subdomains are dead but it's actually only some links. Will go back over them and return what is working to live. -- Green  C  21:00, 1 June 2022 (UTC)

https://apps.dtic.mil/sti/pdfs/ADA546200.pdf is another example of this problem. --Ancheta Wis   (talk  &#124; contribs) 10:43, 2 June 2022 (UTC)


 * It will be posted soon. From the bot log last night:
 * syslog:United States Army Futures Commandhttps://apps.dtic.mil/docs/citations/ADA546200 MAKELIVE  remove  from squarelink
 * -- Green  C  14:14, 2 June 2022 (UTC)

Thank you
Thank you for the good work this bot is doing! Combating link rot is key to the long-term viability of Wikipedia and the preservation of knowledge; I thank you for your efforts in this direction. Al83tito (talk) 16:22, 15 June 2022 (UTC)


 * Thanks you are welcome. Agree keeping up with archiving is vital for Wikipedia to work. --  Green  C  17:16, 15 June 2022 (UTC)

Great work on archive links
Hi there, thanks for the great work the BOT (and you!) are doing on fixing up many archive links in articles I have done a lot of work on. I see that sometimes you use 'download' when it is png or jpeg and 'detail' if is a pdf. Now, I can also see what I was doing to cause issues. If I uploaded a pdf to internet archive and my page arrived, I was downloading the PDF and using that as the link - when it should have been the window with 'detail'. I see you have made these changes and while they are not easy to navigate on a small device, I have my head around the window now. All good. So, in future I always load the 'detail' link right? What about Jpeg or png uploads? Can I download them and use the link? I am sure you have many articles on the go at the moment, so this one you did is a good example of what I am talking about: Parrs Park. You seem to move from 'detail' to 'download'? Thanks again for getting me on track with this process and I like not having so many pdfs floating around in my references! I hope this makes sense?Realitylink (talk) 21:23, 21 June 2022 (UTC)


 * Thanks! Archive.org has some complications, but is also a powerful system that is open. Generally it's designed so the /details/ is the default landing page where users can then branch out to other options: accessing individual files like the .pdf, reading via the in-page flipbook reader, access metadata. For media files like jpg or mp3, it's often better to use /download/ because that is a direct link to the file it may be part of a larger package, it works better usually to open those directly as they are not textual in nature. Unless the /details/ page is the same content in which case it might be better to use /details/. The other thing is not to use machine-specific links like  .. these are temporary addresses that will change they have a limited lifespan - typically a few years. They are not designed as permanent links. Finally the url field should not contain a web archive link eg. archive.today or web.archive.org  .. only in the archive-url field. Hope that helps. --  Green  C  04:27, 22 June 2022 (UTC)
 * Yes that is a terrific help. Your Bot is doing a lot of work fixing up my links...and not using the machine-specific link (something I have done a lot  previously - but never again promise!!) is a useful piece of information.  I might get back to you if further questions arise.  Appreciate your response.Realitylink (talk) 05:30, 22 June 2022 (UTC)
 * Thanks again. A couple more questions...you say it is better to use the download for media files, but that still relates back to the machine-specific link...is that a problem?  Will the link still have a limited life?  And...the url= field not containing a web archive link...what if a downloaded url is the only link there is?  Can we use it as the url?  We can't use it as an archive link, because I am sure that needs url to be archived against....or is it ok to use links from Internet Archive as urls?  I see the bot has done that a few times... for  example in the page for Alex Hassilev.  It looks good and is accessible...but is still an archived url being used as a url.  Or am I overthinking this??  cheers Greg  Realitylink (talk) 00:18, 23 June 2022 (UTC)
 * archive.org has a lot of services. One is web archiving ie. Wayback Machine at http://web.archive.org Archive.org also has a service where it scans books and media, exactly like Google Books. It's at https://archive.org/details ..  One is sort of like a library of digital holdings. The other is a collection of website scrapes.  So the digital book scans are primary urls and reside in the url field since that is the source URL, it's not archiving a different website somewhere, it's the original destination link. --  Green  C  00:53, 23 June 2022 (UTC)
 * That's most interesting and makes a lot of sense. So I should continue displaying snapshots of news clippings etc as a url? And judging from an earlier statement you made, 'detail' is probably the best way to go? Are there any advantages in using the 'download' function?  Does either compromise the long term safety of the link, or once it is Wayback, is it secure?  I am thinking of going back over the Hassilev article (and some of the others the bot has fixed)  and converting all of those 'download' links to 'detail'.  That won't pose any problems I am thinking?  Realitylink (talk) 02:27, 23 June 2022 (UTC)
 * Yes newsclips in url. They are not Wayback Machine links just normal book links (or "text collection" is the precise name). Anything on archive.org is secure except when it gets taken down :) Like due to a copyright holder request. A good way to deal with that is use archive.today to save the details or download link, then place the archive.today link in the archive-url field. You could also use ghostarchive.org for same purpose as secondary archive. That would be three places which is pretty secure. Could also save a copy on your local disk.  --  Green  C  03:00, 23 June 2022 (UTC)
 * Ah, ok so you can have the 'details' saved as a URL and an archive today link as an archive-url? I did do that for some, but an editor changed it...so for those news clippings if I use the 'detail link' and then an archive today link won't that look like its been archived twice?  Would that matter?  Hey, appreciate you patience and knowledge. Realitylink (talk) 03:12, 23 June 2022 (UTC)
 * No because the details link is not a web archive link. It's not the Wayback Machine, it's a different service. -- Green  C  03:25, 23 June 2022 (UTC)
 * I can see why myself and others get confused...but hey...all good. So they look the same because archive.today just captures the image, and hopefully will retain the link.Realitylink (talk) 03:36, 23 June 2022 (UTC)

Sorry, just one more point to clarify: So I upload a pdf of a scientific paper to the internet archive, and when it comes through, I use the 'detail' link, either as a url or an archive url. Now, is the safety of this detail link related to the machine-specific links like  which if I have got it right, will just give somebody a chance to download and read the paper, but it won't be permanent. Correct? I am thinking that each person who downloads it from the detail page would get a different machine-specific link? And I keep using the 'detail' link. I appreciate your patience and will leave you in peace - hopefully not pieces! - after this. Kind regardsRealitylink (talk) 03:21, 25 June 2022 (UTC)


 * If it does not contain "web.archive.org" it's not an archive URL, and should not go in the archive-url. Only web archive URLs go in the archive-url, and only if they contain web.archive.org are they considered web archive URLs. To illustrate:
 * https://web.archive.org = web archive URL ---> archive-url
 * https://archive.org = non-web archive URL ---> url
 * Quiz: given the URL https://archive.org/details/wonderfulwizardo00baumiala .. would go in the url or archive-url? Or given the URL  https://web.archive.org/web/20220101000646/http://example.com/ would it go in the url or archive-url?
 * Not sure I understand question about machine-specific links but the rule there is not use them as they expire. Everyone has a chance to download and read the paper from the details link. On the right side of the page it says "Download Options" with links to PDF, Epub, etc.. or they can read via the flip book.
 * -- Green  C  05:54, 25 June 2022 (UTC)
 * So links from archive.today, one I just used looked like this: https://archive.ph/XUik2 can't be used as an archive link?  I have been using them as archive links...but they need to go now?  It gets more confusing indeed...hang on, I just looked about and here is something you said:  "A good way to deal with that is use archive.today to save the details or download link, then place the archive.today link in the |archive-url= field."  It seems I can use archive.today links as archive-links!  I hope so, 'cos I have hundreds of them in place...and I just checked Ghostarchive doesn't have web.archive.org thing either....so can it be used as an archive-link?Realitylink (talk) 07:45, 25 June 2022 (UTC)
 * Yes archive.today, ghost and other web archive's are also archive links.. I was just focusing on the archive.org domain since that is what we were discussing which is a source of confusion. -- Green  C  15:28, 25 June 2022 (UTC)
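The rule worked out in this thread (web archive links go in archive-url, everything else including archive.org/details goes in url) can be written as a small lookup. A minimal sketch, assuming a hand-picked host list: the set below contains only the services named in this discussion and is not a complete list of web archives, and the function name is made up for illustration.

```python
from urllib.parse import urlsplit

# Assumption: only the web-archive hosts mentioned in this thread.
WEB_ARCHIVE_HOSTS = {
    "web.archive.org",
    "archive.today",
    "archive.ph",
    "ghostarchive.org",
}


def citation_field(url: str) -> str:
    """Return 'archive-url' for web archive links, otherwise 'url'."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return "archive-url" if host in WEB_ARCHIVE_HOSTS else "url"


# The two quiz URLs from above, plus the archive.today example.
print(citation_field("https://archive.org/details/wonderfulwizardo00baumiala"))
print(citation_field("https://web.archive.org/web/20220101000646/http://example.com/"))
print(citation_field("https://archive.ph/XUik2"))
```

Matching on the exact host rather than on a substring keeps plain archive.org (the book and media library) out of the archive-url bucket.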

Thanks, I think we are getting there. I used the Ghost one yesterday and it is pretty quick. So I noticed that an editor changed some links for archive today by removing the 'ph/wip' and adding 'today', the message was: (clean up archive.today work-in-progress links, replaced: https://archive.ph/wip → https://archive.today)  It didn't change how it looked, but does having 'today' there mean the link is more archive-url friendly? And is there a way to default to it in archive today ?

Another interesting thing this editor did was (replaced 12 archive.today URL(s) with more transparent URL from <link rel="bookmark") and they changed this: [archive-url=https://archive.ph/mkqjZ to this: archive-url=http://archive.today/20220508021803/https://ia902502.us.archive.org/23/items/tallahassee-democrat/Tallahassee%20Democrat.jpg]. So is this the long format, the one that seems to be recommended. Help talk:Using archive.today (See the Long format link issue section) Should we use short or long format URLs? If so, I am not sure how to activate it, short of doing the full manual thing and even that is  unclear. I can carry on manually changing the ph/wip to today...but would be interested in what has been done with this long version. Realitylink (talk) 22:02, 25 June 2022 (UTC)


 * That was BrownHairedGirl. Yes please use archive.today as this is a special gateway server they want us to use on Wikipedia - it redirects to one of the servers where the content is hosted (such as .ph). This way, if one of the content servers goes offline he can redirect to a different one quickly by making a change in the archive.today server. The /wip/ is not a correct URL; it's a temporary one until the page is fully saved. You have to wait until the page is saved and it will give the right URL. There is no easy way to get the long form. What you have to do: right-click on the page, select "View source", Ctrl-F and search on "long". Then hit the right arrow a few times till you see the long form URL. Copy and paste. It will have a date like 2002-09-01.01.01 you can remove all the "-" and "." so it's just 14 digits long.
 * I see you are still using machine-specific URLs https://ia902502.us.archive.org/23/items/tallahassee-democrat/Tallahassee%20Democrat.jpg - this is not recommended. Why not use https://archive.org/details/tallahassee-democrat/Tallahassee%20Democrat.jpg ? By "machine specific" I mean the name of the machine, ia902502, which is a temporary location. No reason to do so when a permanent long term link is available. --  Green  C  03:05, 26 June 2022 (UTC)
 * Thanks again, you have totally cleared that up for me.  No, I am not using machine specific urls, that was from an earlier edit window.  I am using 'detail' for them all now. It's easy to change all my 'ph's' to 'todays'...I will play around with looking for the long form, and try a few more on Ghost...I really appreciate your generosity in sharing this expertise.  Kind regards. Realitylink (talk) 03:25, 26 June 2022 (UTC)
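For anyone cleaning up many machine-specific links, the rewrite described above is mechanical. The sketch below is not the bot's code; the URL pattern is an assumption inferred from the single ia902502 example in this thread, and the function name is hypothetical.

```python
import re

# Assumed shape of a machine-specific link, inferred from one example:
#   https://ia<digits>.us.archive.org/<digits>/items/<identifier>/<file>
_MACHINE_URL = re.compile(
    r"^https?://ia\d+\.us\.archive\.org/\d+/items/(?P<rest>.+)$"
)


def to_permanent(url: str, view: str = "details") -> str:
    """Rewrite a machine-specific archive.org link to a permanent one.

    view is 'details' or 'download'; URLs that do not match the assumed
    pattern are returned unchanged.
    """
    match = _MACHINE_URL.match(url)
    if match is None:
        return url
    return f"https://archive.org/{view}/{match.group('rest')}"


print(to_permanent(
    "https://ia902502.us.archive.org/23/items/tallahassee-democrat/Tallahassee%20Democrat.jpg"
))
```

Passing view="download" gives the direct-file form that was suggested earlier for media files.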

All good now with the long form. One thing I was wondering...is it necessary to change the http to https in this long form before pasting? Or does it just change into this over time? Realitylink (talk) 20:36, 28 June 2022 (UTC)


 * Yes! https is best.. if you forget a bot will eventually fix it. -- Green  C  22:25, 30 June 2022 (UTC)
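The manual clean-up steps for archive.today links discussed above (use the archive.today gateway host, use https, strip the "-" and "." from the long-form date) could be scripted roughly as follows. This is a sketch under assumptions: the mirror list holds only the two hosts named in this thread, the sample date string is a hypothetical one chosen to match the 20220508021803 link quoted earlier, and /wip/ links are deliberately not handled since they are temporary.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: only the hosts mentioned in this thread.
MIRROR_HOSTS = {"archive.ph", "archive.today"}


def compact_timestamp(stamp: str) -> str:
    """Remove '-' and '.' separators and require the 14-digit result."""
    digits = re.sub(r"[-.]", "", stamp)
    if not re.fullmatch(r"\d{14}", digits):
        raise ValueError(f"expected 14 digits, got {digits!r}")
    return digits


def to_gateway(url: str) -> str:
    """Point a mirror link at the https archive.today gateway."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() not in MIRROR_HOSTS:
        return url
    return urlunsplit(
        ("https", "archive.today", parts.path, parts.query, parts.fragment)
    )


print(compact_timestamp("2022.05.08-021803"))  # hypothetical long-form date
print(to_gateway("https://archive.ph/XUik2"))
```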

Removing dead link tags
It seems that the bot is removing Dead link tags from refs to webcitation.org

See e.g. this edit to 2008 North Indian Ocean cyclone season, which removed the dead link on the bare URL ref to https://www.webcitation.org/5bgwta1al?url=http://www.imd.gov.in/section/nhac/dynamic/endseasonreport.pdf ... which I had added in this edit three weeks previously?

Please can the bot stop doing this? Removing the Dead link tag means that the ref is an untagged bare URL, which ends up in my cleanup lists that get fed to citation bot, or in this case tagged as Bare URL PDF ... which is also unhelpful, 'cos it invites editors to fill a ref which is known to be dead. Brown HairedGirl (talk) • (contribs) 01:29, 24 June 2022 (UTC)


 * was not meant to be used to tag bare archive URLs. You're kind of hacking the system a bit to trigger CitationBot to process the cite, a custom process flow. Normally once a citation has an archive URL, it's no longer tagged with a - it has been "Saved".  The dead link documentation says "Before considering whether to use the  template it is often useful to make a search for an archive copy of the dead link and thereby avoid using the tag altogether." We've always removed these tags once an archive URL is added. I understand this messes with your process flow since you do things in stages often with a long period between steps. Would suggest finding another way such as running citation bot sooner than later on these cases, to avoid overlap with other maintenance bots. Another option is when adding the  it would look like:   .. I can program my bot to avoid removing those dead link tags with that bot and for whatever dates you want. The issue is that there are other cases where these tags need to be removed, that were added in error by users who don't realize that an archive URL + dead link tag is redundant/non-standard. --  Green  C  02:09, 24 June 2022 (UTC)
 * @GreenC, you are missing the point: webcitation.org is dead.
 * There is no system hack, and the whole point of my comment is that I do not want CB to process these dead links to webcitation.org, and having these dead links correctly tagged as dead takes them out of that workflow.
 * I don't usually tag archive links as dead, because they are live links to an archive site. It would be silly to tag as dead a link for example to https://web.archive.org/web/20120726135924/http://stonewall.org.uk/documents/stonewall_mp_voting_records_2010_1.pdf ... because that is a live link.
 * But webcitation.org is dead, so it is completely appropriate to tag those links as dead ... and no custom variant of the tag is needed. Dead is dead.
 * So I have just re-tagged as Dead link all the bare URL refs to webcitation.org. Please do not remove those tags.
 * It's great to remove deadlink tags from bare URL links to live archives, such as archive.today and archive.org.  That is very helpful.
 * But webcitation.org is a dead site, so please stop removing the Dead link tags from links to webcitation.org. You need to amend your code to make it stop treating the dead site webcitation.org in the same way as it treats live archiving sites.
 * Also, if you reply, please ping me. It's a pain to have to check for a reply, when a simple mechanism exists to notify.   Brown HairedGirl  (talk) • (contribs) 04:03, 24 June 2022 (UTC)

Help if you can
Hi, can you please help me on the wikipedia page of the Beta Israel community? The manager of the website is making up whatever he wants, and he is deleting sources and locking pages. 2A02:6680:1106:1FCE:250E:E104:93A1:5652 (talk) 22:14, 30 June 2022 (UTC)


 * You were blocked so can't help. I'm just a bot. -- Green  C  22:24, 30 June 2022 (UTC)

Why can't you help? I was Blocked for no reason 2A02:6680:1106:1FCE:250E:E104:93A1:5652 (talk) 22:53, 30 June 2022 (UTC)

Bot added erroneous archive URL
I would like to report that the bot added the following erroneous archive URL for a journal citation in this edit:

I already manually fixed the archive URL but just wanted to report the error. Regards. Sanglahi86 (talk) 21:04, 14 July 2022 (UTC)
 * Sanglahi86: The problem is the https://web.archive.org/web/2021*/https://www.iseas.edu.sg/wp-content/uploads/2021/10/ISEAS_Perspective_2021_145.pdf which confused the bot. It should be https://www.iseas.edu.sg/wp-content/uploads/2021/10/ISEAS_Perspective_2021_145.pdf . The bot was trying to make that switch but the "2021*" stumped it. There should not be an archive URL in the url field. -- Green  C  21:11, 14 July 2022 (UTC)
 * Sorry, had not seen that citation's  detail. Thank you for clarifying. Regards. –Sanglahi86 (talk) 21:15, 14 July 2022 (UTC)
 * No problem thanks for noticing and reporting and fixing! - Green  C  21:27, 14 July 2022 (UTC)
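To illustrate the case above, here is one way a tool might pull the original URL out of a Wayback Machine link that was placed in the url field, including the calendar-style "2021*" form that tripped the bot. It is a sketch only, not the bot's logic; the pattern covers just the plain /web/timestamp/ layout seen in this thread and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Timestamp segment may be digits, a wildcard, or digits plus a wildcard.
_WAYBACK = re.compile(
    r"^https?://web\.archive\.org/web/(?P<stamp>[0-9*]+)/(?P<original>.+)$"
)


def split_wayback(url: str) -> Optional[Tuple[str, str]]:
    """Return (timestamp, original_url), or None if not a Wayback link."""
    match = _WAYBACK.match(url)
    if match is None:
        return None
    return match.group("stamp"), match.group("original")


stamp, original = split_wayback(
    "https://web.archive.org/web/2021*/https://www.iseas.edu.sg/"
    "wp-content/uploads/2021/10/ISEAS_Perspective_2021_145.pdf"
)
print(original)        # belongs in the url field
print("*" in stamp)    # wildcard: not a specific snapshot
```

A wildcard timestamp points at the calendar page rather than a snapshot, so it would need resolving to a real capture before being used in archive-url.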

Flagging non-dead link as dead
This edit flagged this URL as dead even though it isn't. Jo-Jo Eumerus (talk) 11:17, 18 July 2022 (UTC)


 * Same with these edits:
 * https://en.wikipedia.org/w/index.php?title=Tiberius_Gracchus&oldid=1098930968
 * https://en.wikipedia.org/w/index.php?title=Caesar%27s_civil_war&oldid=1098935280


 * I appreciate it probably has to do with some kind of automatic PDF link serving in Javascript that Academia.edu uses wouldn't be readily captured with a bot; I don't know how fixable it is, but the links noted are not dead at all; I reverted both edits that the bot flagged. Ifly6 (talk) 14:35, 18 July 2022 (UTC)
 * The url that Editor Jo-Jo Eumerus linked:
 * https://www.academia.edu/download/30869670/Turismo_y_Territorio_en_Salta-_Caceres_et_al-_CONICET-UBA_2012.pdf – dead for me
 * Both of the urls that Editor Ifly6 links:
 * https://www.academia.edu/download/31557049/Peter_Russell_-_Babeuf_and_the_Gracchi_(MHJ_Vol._36_(2008)__pp._41-57).pdf – dead for me
 * https://www.academia.edu/download/51344857/Iris-_Fall_of_the_Roman_Republic.pdf – dead for me
 * There was some discussion about these kinds of academia links at
 * —Trappist the monk (talk) 14:43, 18 July 2022 (UTC) 14:46, 18 July 2022 (UTC)


 * Jo-Jo Eumerus & User:Ifly6 they are dead for me (USA). Example. Are you getting a redirect to a cloudfront URL? Wondering if there is some kind of location-aware policy that determines when to serve the cloudfront URL vs a 404. If the cloudfront URL was known, it would be possible to save it at the Wayback Machine, then use the Cloudfront-Wayback URL on Wikipedia treated as a dead link (due to its &Expires self-destruct mechanism see WP:AWSURL). However, I wonder about copyright if academia.edu is making them unavailable in the US and possibly elsewhere, question why have that policy if not a rights issue. -- Green  C  15:04, 18 July 2022 (UTC)
 * I'm in the US and am getting the links promptly. The links I am getting are Cloudfront ones with an expiry; I used the Academia.edu links to avoid the known expiry. Ifly6 (talk) 15:41, 18 July 2022 (UTC)
 * Ah I see you use British English so I assumed you are not US. What browser do you use? Do you have any plugins that might affect javascript? This is impacting archive providers as well, such as Wayback Machine and Ghostarchive (US-based),  they also get 404.  Archive.today it "works" (global IP pool)  but they are unable to correctly save the PDF. --  Green  C  16:00, 18 July 2022 (UTC)
 * I do get a "d1wqtxts1xzle7.cloudfront.net" sort of thing. Jo-Jo Eumerus (talk) 17:33, 18 July 2022 (UTC)
 * Language heuristics are always right 99pc of the time haha. I've confirmed on Edge (Windows 10) and Safari (macOS) that the Academia.edu links work. I don't have any plugins installed other than ad blockers that would affect something like this. The specific link that got generated for me with Rafferty was https://d1wqtxts1xzle7.cloudfront.net/51344857/Iris-_Fall_of_the_Roman_Republic-with-cover-page-v2.pdf. There were then a pile of GET parameters that I've excerpted – they change every time anyway – but are necessary to get the file served properly. Ifly6 (talk) 19:24, 18 July 2022 (UTC)
 * Jo-Jo Eumerus do you use Edge or Safari? -- Green  C  19:38, 18 July 2022 (UTC)
 * Village_pump_(technical) .. seeing if anything comes up here. -- Green  C  19:52, 18 July 2022 (UTC)
 * Ifly6 in the above thread someone suggested perhaps you had signed up for account on academia.edu at some point? Or some old cookies that are giving permission. One way to test is try to access from a private window. -- Green  C  20:46, 18 July 2022 (UTC)
 * Yea, that's probably it. I opened it in a private window and got the 404. Ifly6 (talk) 20:57, 18 July 2022 (UTC)
 * Same for me (Firefox) Jo-Jo Eumerus (talk) 21:12, 18 July 2022 (UTC)
 * Cool, glad it is figured out what is causing it. My thinking is to replace the academia.edu links with a Wayback version of the cloudfront URL so it's accessible for everyone. Or second option is to use registration but that 404 page is confusing and will result in bots marking it dead. --  Green  C  21:30, 18 July 2022 (UTC)

Jo-Jo Eumerus, Ifly6, Biogeographist: I would like to propose this solution: Special:Diff/1098978075/1099315632. It's only for academia.edu/download links, which are about 1,000 on enwiki. This is what I can do somewhat easily right away. There are limits to what can be done, due to bot design and coding effort. -- Green  C  04:15, 20 July 2022 (UTC)
 * academia.edu returns a 404 when a user is not registered and logged in, which is most users. It does not say "log in to access paper", rather a misleading 404 dead link page. This causes problems:
 * Archive bots will determine the links are dead (404) and mark them with a dead link tag.
 * Users will be confused thinking the link is dead and not behind a registration wall.
 * Should the link ever actually die for real, there would be no archive available since the Wayback Machine sees only a dead 404 page - the Wayback machine is not an academia.edu registered user.
 * While possible to use registration this does not solve the misleading 404 problems.
 * The cloudfront link is an AWS container with an &Expires self-destruct mechanism. It's where the paper is actually located (not on academia.edu which redirects to cloudfront).
 * The proposal is to determine the active cloudfront link via bot magic, immediately create a Wayback Machine save of the cloudfront URL, and change the citation to the Wayback-cloudfront link. eg. Special:Diff/1098978075/1099315632
 * Hmm. It seems a bit complex and I wonder if people will be deleting the "expires" part of the link. Jo-Jo Eumerus (talk) 10:22, 20 July 2022 (UTC)
 * It's a complex situation. If they delete the &Expires the URL will break (404). It will break anyway, due to the Expires, that is why the archive URL version is made the primary. The archive URL is accessible to everyone - academia.edu account not required. --  Green  C  15:30, 20 July 2022 (UTC)
 * I think it's a problem that the  parameter points to CloudFront instead of Academia.edu; it would offer more transparency for the reader if the domain was academia.edu. Is it possible to retain the clean academia.edu URL (without the expires part) in the   parameter and use the long CloudFront URL in the   parameter? Or to use a separate Webarchive template for the long CloudFront URL? Biogeographist (talk) 16:45, 23 July 2022 (UTC)
 * hi sorry I should have deleted this entire thread, because it was moved to the main talk page (User_talk:GreenC_bot) from this archive page (User_talk:GreenC_bot/Archive 6). I posted an updated situation. Can you follow up there? -- Green  C  17:14, 23 July 2022 (UTC)
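As a footnote to the &Expires point in the academia.edu thread above: the expiry of a cloudfront link can be read from the URL itself. The sketch below assumes the Expires query parameter holds a Unix timestamp in seconds, which is how CloudFront signed URLs normally carry it; the sample query values are invented for illustration and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def expiry_of(url: str) -> Optional[datetime]:
    """Return the expiry time from an Expires= parameter, if present."""
    values = parse_qs(urlsplit(url).query).get("Expires")
    if not values or not values[0].isdigit():
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the link carries an Expires time that has already passed."""
    expiry = expiry_of(url)
    if expiry is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


# Hypothetical query string; real links carry more parameters.
sample = (
    "https://d1wqtxts1xzle7.cloudfront.net/51344857/"
    "Iris-_Fall_of_the_Roman_Republic-with-cover-page-v2.pdf"
    "?Expires=1658300000&Signature=abc"
)
print(expiry_of(sample))
print(is_expired(sample))
```

This is why the proposal saves the cloudfront link at the Wayback Machine immediately: the live link stops working once the Expires time passes, while the archived copy does not.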