Wikipedia:Link rot/URL change requests/Archives/2022/July

The Undefeated is now Andscape
Hello. I was wondering if links from The Undefeated, an ESPN website, could be updated to its new name at Andscape. For example, this is now at this new URL. The URL format is exactly the same. I also wouldn't mind an archived copy of The Undefeated links if they haven't been moved to Andscape. Thanks! --MrLinkinPark333 (talk) 23:34, 17 June 2022 (UTC)


 * That's straightforward, no problem. The bot will verify each link has been migrated and, where not, add an archive link or a dead-link tag. Right now, the bot is tied up with a large webcitation.org migration which will take at least another 2-3 weeks. Then xinhuanet.com above. Thanks for your patience; the bottleneck is how many queries the bot can make to the Wayback Machine. Actually I might try to run this in parallel, since it likely won't make many Wayback queries, assuming most of the links were migrated successfully to the new domain. -- Green  C  03:49, 18 June 2022 (UTC)


 * Done Edited 487 pages, changed 617 links, including metadata. Example. -- Green  C  15:14, 16 July 2022 (UTC)
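Because the URL format is unchanged, the migration discussed in this thread amounts to swapping the hostname and then verifying the new link. A minimal Python sketch of the rewrite step follows. The hostnames `theundefeated.com` and `andscape.com`, the example path, and the function name `migrate_url` are assumptions for illustration and are not taken from the bot's code; the verification and archive lookup that the bot performs are not shown.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed hostnames; the thread does not spell out either domain.
OLD_HOST = "theundefeated.com"
NEW_HOST = "andscape.com"


def migrate_url(url: str) -> str:
    """Swap the old host for the new one, keeping scheme, path and query."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in (OLD_HOST, "www." + OLD_HOST):
        # Preserve a leading "www." if the original link had one.
        prefix = "www." if host.startswith("www.") else ""
        parts = parts._replace(netloc=prefix + NEW_HOST)
    # URLs on any other host are returned unchanged.
    return urlunsplit(parts)


# Hypothetical example path, for illustration only.
print(migrate_url("https://theundefeated.com/features/example-story/"))
```

A rewritten link would still need a live check before being saved, since the thread notes that links which were not migrated should get an archive URL instead.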

www.susangibney.com
The site www.susangibney.com leads to a page that contains the message "This domain has expired . . ." It's used on Susan Gibney. That may be the only WP page that has it, since the title is "Susan Gibney Fan Site". Eddie Blick (talk) 01:05, 24 June 2022 (UTC)
 * Done -- Green  C  20:48, 16 July 2022 (UTC)

Deprecate webcitation.org on enwiki
Deprecated webcitation.org URLs due to extended outage (over 6 months) and unlikely odds of it coming back at all.


 * Total number of links: 230,251
 * Total articles: 93,759


 * Total links converted: 199,439
 * Total links not converted: 30,812
 * Percentage converted: 86.6
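The share of converted links can be recomputed from the two counts in the list above. A quick Python check, using only the totals reported in this section (the variable names are illustrative):

```python
total_links = 230_251  # total webcitation.org links reported above
converted = 199_439    # links converted to another archive

not_converted = total_links - converted
pct_converted = round(100 * converted / total_links, 1)

print(f"not converted: {not_converted:,}")
print(f"percentage converted: {pct_converted}")
```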

The remaining 30k still exist in about 25k articles. A large percentage of these have archives available at archive.today, but they require manual verification due to the high rate of soft-404s there.


 * Done -- Green  C  06:45, 29 June 2022 (UTC)

news.xinhuanet.com
Many links to news.xinhuanet.com are currently broken. Jarble (talk) 07:49, 15 June 2022 (UTC)


 * Not sure what to do. Take as an example http://www.xinhuanet.com/english/2018-09/11/c_137460450.htm : it doesn't work, it redirects to http://www.xinhuanet.com/webSkipping.htm . However, try it from Google Translate and it works: https://www-xinhuanet-com.translate.goog/english/2018-09/11/c_137460450.htm?_x_tr_sch=http&_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp . It also works if you disable JavaScript in the browser. So they have added JavaScript that forces a redirect for unknown reasons, but the underlying page still exists. Is it temporary? I can't read the language of this "web skipping" error message page http://www.xinhuanet.com/webSkipping.htm ; it might hold some clues. This is why most of them are not working. -- Green  C  19:54, 3 July 2022 (UTC)
 * Translation request:
 * The content you viewed has expired and been archived; thank you for your attention to Xinhuanet.
 * You can also:
 * Looks like a permanent situation. It's odd that the content is still there when you disable JavaScript; they could make it live again with a flip of the switch. Also, there is some kind of logic involved in determining when the redirect occurs. I can't tell what the criteria are from the code.  --  Green  C  16:44, 4 July 2022 (UTC)


 * There is no way to determine page status without a JavaScript-enabled query, i.e. web scraping with a headless browser. -- Green  C  04:35, 8 July 2022 (UTC)
 * Good news: now able to determine dead pages via headless browser. Bad news: it takes forever, about 4x slower than normal, and there are 10,000 articles to check. The site will require regular maintenance as they take pages down quickly, within a few years. Currently processing enwiki. --  Green  C  15:56, 11 July 2022 (UTC)


 * Done on enwiki. Processed about 10,000 articles and added about 9,000 new archive URLs and flipped a bunch of existing ones from live to dead. --  Green  C  01:49, 16 July 2022 (UTC)
 * Done on IABot database. The IABot database had about 50,000 unique links it had discovered in about 200 wikis. They are now processed and set to the appropriate status (live, dead etc.), so when the bot encounters the links on those wikis they will be saved or not as needed. -- Green  C  03:22, 22 July 2022 (UTC)
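The dead-page check discussed in this thread hinges on where the browser ends up once JavaScript has run. A minimal Python sketch of that last step follows, assuming some headless browser has already loaded the page and reported its final URL. `classify_final_url` is a hypothetical helper; the redirect target is the `webSkipping.htm` page mentioned in the thread, and the bot's actual logic is not shown in the discussion.

```python
from urllib.parse import urlsplit

# Path of the "content has expired" page quoted in the thread.
SKIP_PATH = "/webSkipping.htm"


def classify_final_url(final_url: str) -> str:
    """Classify a page by the URL the headless browser ended on.

    Returns "dead" if the site redirected to its expiry page, otherwise
    "live". Other failure modes (HTTP errors, timeouts) are not covered.
    """
    parts = urlsplit(final_url)
    host = (parts.hostname or "").lower()
    on_site = host == "xinhuanet.com" or host.endswith(".xinhuanet.com")
    if on_site and parts.path == SKIP_PATH:
        return "dead"
    return "live"


print(classify_final_url("http://www.xinhuanet.com/webSkipping.htm"))
```

A plain HTTP fetch would not be enough here, since the thread notes that the underlying page is still served and the redirect only happens once the script runs.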

Pandora
pandora.nla.gov.au links need to be reprocessed with the new code base. -- Green  C  02:24, 16 June 2022 (UTC)

www.academia.edu/download/
Most of these links don't seem to work. These pages load correctly when they're linked from Google Scholar, but they display an "Error: 404" page when they're linked from Wikipedia. Jarble (talk) 19:43, 2 July 2022 (UTC)

I am academic-journal-hesitant, without input from more knowledgeable users, before adding archive URLs. -- Green  C  21:06, 16 July 2022 (UTC)
 * Academia.edu is not a journal. The papers available there may be journal articles (or author copies of journal articles) or not. The URLs from that search that I tested (only a few) 'work' if you remove the /download/ portion of the path. My experience with academia.edu URLs is that everything after the numeric 'identifier' can also be removed. If there is a preview, the preview can usually be read in its entirety. To download the paper, you have to register (which is apparently free). So this:
 * https://www.academia.edu/download/48901732/1977_Kingdom_of_Ladakh_c_950-1842_AD_by_Petech_s.pdf – does not work
 * but, strip /download/ and /1977_Kingdom_of_Ladakh_c_950-1842_AD_by_Petech_s.pdf:
 * https://www.academia.edu/48901732 – readable preview; register to download
 * —Trappist the monk (talk) 21:36, 16 July 2022 (UTC)
 * Makes sense, good catch, will test first without the download and filename. Thanks! --  Green  C  23:18, 16 July 2022 (UTC)
 * Looks like they reuse identifiers: Note different name of paper in URL vs. on page. --  Green  C  03:44, 17 July 2022 (UTC)
 * Another -->  from Decolonisation of Africa. Paper should be "Copying informal institutions: the role of British colonial officers during the decolonization of British Africa." Odd how this problem only exists in URLs with "/download/"; a spot check of other academia.edu URLs looks OK. From the Google Scholar link, as suggested by User:Jarble, there is a link to academia.edu that works, although it is a redirect to an AWS container with a self-destruct &Expires tag (see WP:AWSURL). In theory it is possible to save the Google Scholar-obtained AWS link at Wayback (done) and then redo all the URLs in the citation to match (done), but that would require a whole lot of programming. --  Green  C  04:30, 17 July 2022 (UTC)
 * After more investigation, I don't see a feasible way to automate this, and there are too many to convert manually. As such, the best for now is to mark them with a dead-link tag. None of them have Wayback links; I suspect they were expunged from the Wayback Machine. Should anyone want to work on manual conversions, the method is outlined above with the example for Decolonisation of Africa. --  Green  C  00:36, 18 July 2022 (UTC)
 * I saw this issue mentioned at WP:VPT but am responding here since this discussion has more information that I want to address: I have known about this issue with academia.edu links for a long time. Whenever I link to a file on academia.edu (which is rarely), I always link to the readable preview page, as explained by Trappist above, and not to the URL that ends with ".pdf", which gives a 404 error in my browser (Firefox) unless the HTTP referer header is from Google Scholar. As mentioned above, it's possible to shorten the URL just to the identifier number (example edit). Since content on academia.edu is user-uploaded, it's also always necessary to make sure that the uploader has the right to distribute the work; I have encountered many files hosted on academia.edu that are clear copyright violations. Biogeographist (talk) 20:24, 18 July 2022 (UTC)
 * User:Biogeographist: The method doesn't work for URLs containing /download/. For example, take https://www.academia.edu/download/48901732/1977_Kingdom_of_Ladakh_c_950-1842_AD_by_Petech_s.pdf and strip /download/ and /1977_Kingdom_of_Ladakh_c_950-1842_AD_by_Petech_s.pdf:
 * https://www.academia.edu/48901732
 * "The low temperature specific heat of Lu-Cu-Y metallic glasses" is not the Kingdom of Ladakh. Every /download/ link tested this way has the same problem: it leads to the wrong article.  --  Green  C  01:37, 19 July 2022 (UTC)
 * How annoying is that? There are some design choices that I find rather dubious on academia.edu... Biogeographist (talk) 02:26, 19 July 2022 (UTC)
 * Parallel discussion at User_talk:GreenC_bot. Result: no good solutions available. --  Green  C  05:19, 24 July 2022 (UTC)
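The shortening step discussed in this thread, together with the reason it cannot be trusted for /download/ links, can be sketched in Python. This is only an illustration under stated assumptions: `strip_download` and `title_matches_filename` are hypothetical helpers, the word-overlap test is a crude heuristic invented for this example, and the page title would have to be obtained separately. The thread concluded that no automated solution was feasible, so this is not the bot's method.

```python
import re
from typing import Optional, Tuple

# Matches academia.edu download URLs of the shape quoted in the thread.
DOWNLOAD_RE = re.compile(
    r"^https?://www\.academia\.edu/download/(\d+)/([^/?#]+)$"
)


def strip_download(url: str) -> Optional[Tuple[str, str]]:
    """Return (short_url, filename) for a /download/ URL, else None."""
    match = DOWNLOAD_RE.match(url)
    if match is None:
        return None
    identifier, filename = match.groups()
    return f"https://www.academia.edu/{identifier}", filename


def _words(text: str) -> set:
    # Lowercased runs of 4+ letters or digits; short words like "of" are ignored.
    return set(re.findall(r"[a-z0-9]{4,}", text.lower()))


def title_matches_filename(filename: str, page_title: str,
                           min_shared: int = 2) -> bool:
    """Heuristic: do the filename and the page title share enough words?"""
    return len(_words(filename) & _words(page_title)) >= min_shared


short_url, filename = strip_download(
    "https://www.academia.edu/download/48901732/"
    "1977_Kingdom_of_Ladakh_c_950-1842_AD_by_Petech_s.pdf"
)
# Title reported in the thread for the shortened URL.
title = "The low temperature specific heat of Lu-Cu-Y metallic glasses"
print(short_url, title_matches_filename(filename, title))
```

For the example from the thread, the filename and the title found at the shortened URL share no significant words, which is the mismatch the discussion describes. A check of this kind could flag wrong targets but could not find the right one.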

www.cia.gov
I found many broken links to this site throughout Wikipedia (here, for example). Jarble (talk) 01:33, 27 February 2022 (UTC)
 * Done for IABot db - updated about 8,000 URLs, which will propagate to many wikis. Working on enwiki next. --  Green  C  17:46, 24 July 2022 (UTC)
 * Done for Enwiki - edited about 3,200 articles, added about 3,500 archive URLs, and flipped some links to dead.  --  Green  C