Wikipedia:Link rot/URL change requests/Archives/2021/July

seedmagazine.com - defunct site with unexpected link redirects
Hello. Seems that Seed (magazine)'s old URL has been acquired by somebody and the old references containing seedmagazine.com are being redirected to a new outfit presumably in the chase for the clicks. This is redirecting readers in unexpected ways and preventing normal link-rot bots from recognising the domain is dead. Google search for  suggests there are 130ish mentions remaining, with 30ish on English Wikipedia. I'm sure that somebody who's better versed in the arcane language of Wikipedia in-text search could come up with more usable numbers.

Could a bot writer consider executing the following:
 * Add archive URL where it exists and is missing
 * Flip all instances of  -->   (if there are any)
 * Wrap the external links in  tags to discourage readers from clicking the links, so that they are not redirected unexpectedly

Thanks. Mel ma nn  18:58, 24 June 2021 (UTC)


 * This is in the parlance a "usurped" domain (or "hijacked"). What I can do for these is Blacklist in the IABot database for each URL along with an archive URL - this will cause the bot to always treat it as dead and archive even if it pings alive, this is across 80+ wiki languages. On enwiki only, can add usurped to any in a CS1|2 template. For bare and square links it will try to convert to an archive URL. If no archive is available, it will require manual attention. -- Green  C  21:34, 24 June 2021 (UTC)

Results done. Links were in about 130 articles on enwiki. Also manually converted a dozen to cite templates with unfit, cleaned up the Seed (magazine) article, and updated the IABot db etc. -- Green  C  03:37, 27 June 2021 (UTC)


 * I did not know there was an  parameter. Should really read the documentation more carefully. Thank you for your efforts, much appreciated.   Mel ma nn   21:51, 29 June 2021 (UTC)

www.mod.go.jp only allows https access now
Without a fix, each URL gets an error page and then, after 10 seconds, a redirect to the home page (in Japanese) rather than the (for the most part) English page relevant to the topic.

I fixed one article and found there are probably 100-200 articles affected.

Is there a kind soul with a robot who could change these programmatically? It seems a lot to do by hand.

http://www.mod.go.jp/* to https://www.mod.go.jp/* ? Alex Sims (talk) 05:00, 2 July 2021 (UTC)


 * Referring to and  . --  Green  C  06:19, 2 July 2021 (UTC)


 * Will do. --bender235 (talk) 12:56, 2 July 2021 (UTC)
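The scheme change requested in this thread can be sketched as a plain text substitution. This is only an illustration, not the code any bot here actually ran; the function name `upgrade_mod_links` is hypothetical, and the lookbehind that skips URLs embedded in archive links is an assumption about what a careful rewrite would want.

```python
import re

# Match plain-http links to www.mod.go.jp, but not when the URL is embedded
# in another URL (e.g. after the "/" of an archive link). Assumption: embedded
# copies should be left untouched.
MOD_HTTP = re.compile(r"(?<!/)http://www\.mod\.go\.jp/", re.IGNORECASE)


def upgrade_mod_links(wikitext: str) -> str:
    """Rewrite http://www.mod.go.jp/... to https://www.mod.go.jp/... in wikitext."""
    return MOD_HTTP.sub("https://www.mod.go.jp/", wikitext)


if __name__ == "__main__":
    print(upgrade_mod_links("[http://www.mod.go.jp/e/index.html Ministry of Defense]"))
```

Only the scheme changes; the path and query string are left as they were.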

Convert 200.57.183.69 to info.guadalajara2011.org.mx
Example. Only convert when there is an archive URL available for the version at info.guadalajara2011.org.mx otherwise add. Update IABot database for the IP URL with the archive URL at info.guadalajara2011.org.mx - mark "domain" (IP) blacklisted. -- Green  C  19:38, 4 July 2021 (UTC)

Results Converted 401 links and added 726. Example. -- Green  C  15:25, 12 July 2021 (UTC)
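A rough sketch of the host swap described above, assuming the caller has already checked whether an archive exists for the info.guadalajara2011.org.mx version. The function name `swap_host` and the `archive_available` flag are hypothetical; the archive lookup and the IABot database update are not shown.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "200.57.183.69"
NEW_HOST = "info.guadalajara2011.org.mx"


def swap_host(url: str, archive_available: bool) -> str:
    """Replace the bare IP host with the named host, only if an archive exists."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST or not archive_available:
        return url  # leave the link alone; it needs other handling
    # Keep an explicit port if the original URL had one.
    netloc = NEW_HOST if parts.port is None else f"{NEW_HOST}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Path, query string and fragment are carried over unchanged.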

columbiabusinesstimes.com
I found several articles with broken links to columbiabusinesstimes.com. Can these links be automatically archived? Jarble (talk) 14:52, 5 July 2021 (UTC)


 * 32 articles. Set dead in IABot and created queue to process. -- Green  C  17:04, 12 July 2021 (UTC)

www.ctv.com
I found at least 100 broken links to this site: can they be automatically archived? Jarble (talk) 14:55, 5 July 2021 (UTC)
 * The site has links in over 3000 articles (not just /servlet). Many appear to be soft 404s that are difficult to detect due to JavaScript. There are working links too example. -- Green  C  06:22, 13 July 2021 (UTC)
 * Looks like most links were moved to a new domain, ctvnews.ca, sometime before 2016. For example this became this. No redirects and no patterns - everything in the new URL is different. Nevertheless I'm finding a way to make some of them live again.   --  Green  C  02:17, 14 July 2021 (UTC)

Results Every link in the ctv.ca domain on enwiki was processed. In case anyone wants to replicate this on other wikis in the future: this was difficult. It is hard to detect 404 status, and hard to find redirect URLs. The redirects to ctvnews.ca, if not found in the header, can be found by searching for the ctv.ca URL at Wayback with a timestamp of 20160101 (i.e. the site had working redirects at one time but then deleted them; some redirect URLs, though not the pages, were luckily saved in the Wayback Machine). Since the site uses JS and does not emit a proper 404, the method I used is the w3m utility in -dump mode: if the output is < 43 lines it is probably a 404. -- Green  C  01:31, 15 July 2021 (UTC)
 * moved 2,048 links to ctvnews.ca
 * converted 679 ctv.com to archive URLs
 * added 346 tags
 * many other misc fixes
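The two tricks described in the results above (the line-count test on `w3m -dump` output and the Wayback lookup at timestamp 20160101) might look roughly like this. It is a sketch under stated assumptions, not the bot's actual code: the function names are hypothetical, `dump_page` needs the w3m utility installed, and the Wayback URL is built in the usual `web.archive.org/web/<timestamp>/<url>` form.

```python
import subprocess

SOFT_404_MAX_LINES = 43  # threshold quoted in the thread


def looks_like_soft_404(dump_text: str, max_lines: int = SOFT_404_MAX_LINES) -> bool:
    """Heuristic: a rendered page with fewer than max_lines lines is probably a 404."""
    return len(dump_text.splitlines()) < max_lines


def dump_page(url: str, timeout: int = 30) -> str:
    """Render a page to plain text with `w3m -dump` (w3m must be installed)."""
    result = subprocess.run(
        ["w3m", "-dump", url],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout


def wayback_lookup_url(url: str, timestamp: str = "20160101") -> str:
    """Build a Wayback Machine URL asking for the capture nearest to timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{url}"
```

A caller would pass `dump_page(url)` to `looks_like_soft_404`, and for pages flagged as dead, fetch `wayback_lookup_url(url)` to see whether an old redirect was captured. The line threshold is specific to this site's layout and would need re-tuning elsewhere.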

theinquirer.net domain taken over; all deep links dead


theinquirer.net is gone; what we have linked now redirects to a somewhat generic trustedreviews.com URI. We should be looking to "url-status" neuter any citations linked to it, and to kill any "External links". Thanks if someone is set up to do that. — billinghurst  sDrewth  07:16, 8 July 2021 (UTC)
 * Special:LinkSearch/https://*.theinquirer.net
 * Special:LinkSearch/*.theinquirer.net
 * You should probably post this at WP:URLREQ instead. * Pppery * it has begun... 12:07, 8 July 2021 (UTC)

This is done. It toggled around 500 unfit or bare URL to archive URL. I manually converted about 6 bare URLs without archives to cite templates, and 2 dozen to cite web. Changed domain status to Blacklisted in IABot database. -- Green  C  22:04, 9 July 2021 (UTC)

Living Books please help!
I wonder if you could help me with some linkrot that has crept into my article Living Books? I have put a lot of effort into putting this mammoth article together, though admittedly citations are not my strong suit. Many of these sources have now fallen to linkrot. Your assistance would be invaluable. :)--Coin945 (talk) 13:28, 13 July 2021 (UTC)
 * Go to the History tab; on the top row is a "Fix dead links" link to run InternetArchiveBot. Run it a few times over the next 6 weeks, say 3-4 times, because it delays adding an archive until it gets a dead link result multiple times (unless it already knows the link is dead in its database). My bot WaybackMedic is more targeted to certain domains with known problems, or links that already have a tag. --  Green  C  16:30, 13 July 2021 (UTC)

Links to World Gazetteer don't work


Hi all, World Gazetteer or  is used as reference for city population sizes on a lot of pages like List of countries by largest and second largest cities, List of highest cities, List of cities in Ghana and many more (wikipedia search). Links to World Gazetteer don't work and many archived links on Wayback Machine don't work too, a message "Sorry, no offline reader allowed. You can use the download function." is returned. A message on Talk:List_of_countries_by_largest_and_second_largest_cities indicates that the links don't work since at least 31 July 2019. Some archived links work though, for example at List of Nigerian states by area. Maybe www.citypopulation.de can be used as an alternative source.

Because there are a lot of pages with links to World Gazetteer, I ask here how to proceed. Difool (talk) 09:10, 9 July 2021 (UTC)
 * Post the above at WP:URLREQ and someone will help you. Headbomb {t · c · p · b} 18:13, 10 July 2021 (UTC)

OK. Will get to it. -- Green  C  20:21, 11 July 2021 (UTC)

It is done. Any problems let me know. -- Green  C  21:28, 16 July 2021 (UTC)

Results
 * Added 514 new working archive URLs. Example
 * Removed 440 archive URLs due to the "Sorry, no offline reader allowed" soft-404. These had no other archive providers available, the ones that did are in the 514 group.
 * Wow, thanks a lot! The only problem I saw is that an archived "bevoelkerungsstatistik.de" can give a German soft-404: "Leider kein Offline-Reader erlaubt. Dafür gibt es die Download-Funktion.". For example, this one. Difool (talk) 03:29, 17 July 2021 (UTC)
 * lol oh man should have guessed. Will see if I can find these. You are welcome. This actually was a new thing for the program: finding soft-404s in archive URLs pre-existing on-wiki, has never come up before. --  Green  C  03:51, 17 July 2021 (UTC)
 * Found 9, fixed manually. -- Green  C  04:21, 17 July 2021 (UTC)
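The soft-404 check discussed in this thread comes down to looking for the two messages quoted above in the archived page text. A minimal sketch, assuming the page body has already been fetched as a string; the function name is hypothetical and this is not the program actually used.

```python
# Marker phrases quoted in the thread (English and German variants).
SOFT_404_MARKERS = (
    "Sorry, no offline reader allowed",
    "Leider kein Offline-Reader erlaubt",
)


def is_offline_reader_soft_404(page_text: str) -> bool:
    """True if the archived page contains one of the known soft-404 messages."""
    lowered = page_text.casefold()
    return any(marker.casefold() in lowered for marker in SOFT_404_MARKERS)
```

Matching is case-insensitive so that minor capitalisation differences in a capture do not hide the message.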

Reuters (again)


Hi! It seems like Reuters has changed their URL structure a bit, from e.g. https://www.reuters.com/article/2013/12/05/us-sweden-spying-idUSBRE9B40AG20131205 to https://www.reuters.com/article/us-sweden-spying-idUSBRE9B40AG20131205 (removing the date). The old URLs are dead, and IABot is archiving them as such. Could your bot perhaps "update" the URLs to the new ones instead, if the new ones are 200 and the old ones are 404? Jonatan Svensson Glad (talk) 20:50, 1 July 2021 (UTC)
 * See e.g. Special:Diff/1031471416&oldid=1022265762. Jonatan Svensson Glad (talk) 20:51, 1 July 2021 (UTC)
 * Yes, will do. But only for enwiki. And it will take probably 3-4 days before edits. Estimate these are in around 10,000 articles. -- Green  C  01:50, 2 July 2021 (UTC)
 * Great, thanks! Jonatan Svensson Glad (talk) 19:00, 2 July 2021 (UTC)
 * Giving a status update [and move of thread]. The first run is uploading diffs right now, incorporating the great discovery you made of the dates, plus fixing 404s, soft-404s, and repairing sub-domain issues first noted by here. It processed 18,290 articles and found changes in 16,506. There are an additional 31,074 articles with Reuters links in a second set. The bot is beginning to process those. --  Green  C  04:22, 5 July 2021 (UTC)

This is done for now. -- Green  C  20:20, 11 July 2021 (UTC)
 * Great work! Jonatan Svensson Glad (talk) 21:33, 18 July 2021 (UTC)
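The date-removal rewrite requested at the top of this thread can be sketched as follows, using the example URL from the request. The function names are hypothetical, and the status codes are passed in by the caller rather than fetched, so this shows only the decision rule ("new is 200 and old is 404"), not the bot's actual implementation.

```python
import re

# Dated form: /article/YYYY/MM/DD/<slug>  ->  /article/<slug>
REUTERS_DATED = re.compile(
    r"^(https?://www\.reuters\.com/article)/\d{4}/\d{2}/\d{2}/(.+)$"
)


def strip_reuters_date(url: str) -> str:
    """Drop the /YYYY/MM/DD/ segment from a dated Reuters article URL."""
    return REUTERS_DATED.sub(r"\1/\2", url)


def pick_url(old_url: str, old_status: int, new_status: int) -> str:
    """Use the undated URL only when the old one is 404 and the new one is 200."""
    new_url = strip_reuters_date(old_url)
    if new_url != old_url and old_status == 404 and new_status == 200:
        return new_url
    return old_url
```

URLs that do not match the dated pattern are returned unchanged, so the function is safe to run over every Reuters link in an article.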