User talk:GreenC bot

Flagging non-dead link as dead
This edit flagged this URL as dead even though it isn't. Jo-Jo Eumerus (talk) 11:17, 18 July 2022 (UTC)


 * Same with these edits:
 * https://en.wikipedia.org/w/index.php?title=Tiberius_Gracchus&oldid=1098930968
 * https://en.wikipedia.org/w/index.php?title=Caesar%27s_civil_war&oldid=1098935280


 * I appreciate it probably has to do with some kind of automatic PDF link serving in Javascript that Academia.edu uses wouldn't be readily captured with a bot; I don't know how fixable it is, but the links noted are not dead at all; I reverted both edits that the bot flagged. Ifly6 (talk) 14:35, 18 July 2022 (UTC)
 * The url that Editor Jo-Jo Eumerus linked:
 * https://www.academia.edu/download/30869670/Turismo_y_Territorio_en_Salta-_Caceres_et_al-_CONICET-UBA_2012.pdf – dead for me
 * Both of the urls that Editor Ifly6 links:
 * https://www.academia.edu/download/31557049/Peter_Russell_-_Babeuf_and_the_Gracchi_(MHJ_Vol._36_(2008)__pp._41-57).pdf – dead for me
 * https://www.academia.edu/download/51344857/Iris-_Fall_of_the_Roman_Republic.pdf – dead for me
 * There was some discussion about these kinds of academia links at
 * —Trappist the monk (talk) 14:43, 18 July 2022 (UTC) 14:46, 18 July 2022 (UTC)


 * Jo-Jo Eumerus & User:Ifly6: they are dead for me (USA). Example. Are you getting a redirect to a cloudfront URL? I wonder if there is some kind of location-aware policy that determines when to serve the cloudfront URL vs. a 404. If the cloudfront URL were known, it would be possible to save it at the Wayback Machine, then use the Cloudfront-Wayback URL on Wikipedia, treating the original as a dead link (due to its &Expires self-destruct mechanism; see WP:AWSURL). However, I wonder about copyright: if academia.edu is making them unavailable in the US and possibly elsewhere, why have that policy if not for a rights issue? -- Green  C  15:04, 18 July 2022 (UTC)
 * I'm in the US and am getting the links promptly. The links I am getting are Cloudfront ones with an expiry; I used the Academia.edu links to avoid the known expiry. Ifly6 (talk) 15:41, 18 July 2022 (UTC)
 * Ah, I see you use British English, so I assumed you are not in the US. What browser do you use? Do you have any plugins that might affect javascript? This is impacting archive providers as well, such as the Wayback Machine and Ghostarchive (US-based); they also get a 404. Archive.today "works" (global IP pool) but is unable to correctly save the PDF. --  Green  C  16:00, 18 July 2022 (UTC)
 * I do get a "d1wqtxts1xzle7.cloudfront.net" sort of thing. Jo-Jo Eumerus (talk) 17:33, 18 July 2022 (UTC)
 * Language heuristics are always right 99pc of the time haha. I've confirmed on Edge (Windows 10) and Safari (macOS) that the Academia.edu links work. I don't have any plugins installed other than ad blockers that would affect something like this. The specific link that got generated for me with Rafferty was https://d1wqtxts1xzle7.cloudfront.net/51344857/Iris-_Fall_of_the_Roman_Republic-with-cover-page-v2.pdf. There was then a pile of GET parameters that I've excerpted – they change every time anyway – but they are necessary to get the file served properly. Ifly6 (talk) 19:24, 18 July 2022 (UTC)
 * Jo-Jo Eumerus do you use Edge or Safari? -- Green  C  19:38, 18 July 2022 (UTC)
 * Village_pump_(technical) .. seeing if anything comes up here. -- Green  C  19:52, 18 July 2022 (UTC)
 * Ifly6 in the above thread someone suggested perhaps you had signed up for account on academia.edu at some point? Or some old cookies that are giving permission. One way to test is try to access from a private window. -- Green  C  20:46, 18 July 2022 (UTC)
 * Yea, that's probably it. I opened it in a private window and got the 404. Ifly6 (talk) 20:57, 18 July 2022 (UTC)
 * Same for me (Firefox) Jo-Jo Eumerus (talk) 21:12, 18 July 2022 (UTC)
 * Cool, glad we figured out what is causing it. My thinking is to replace the academia.edu links with a Wayback version of the cloudfront URL so they're accessible for everyone. The second option is to rely on registration, but that 404 page is confusing and will result in bots marking the link dead. --  Green  C  21:30, 18 July 2022 (UTC)
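To see why the cloudfront links "self-destruct", here is a minimal sketch of reading the &Expires parameter (a Unix epoch timestamp) from a URL of the kind discussed above; the URL and signature are illustrative, not an actual served link:

```python
from urllib.parse import urlparse, parse_qs
from datetime import datetime, timezone

def cloudfront_expiry(url):
    """Return the expiry time encoded in a CloudFront URL's Expires
    query parameter, or None if the parameter is absent."""
    qs = parse_qs(urlparse(url).query)
    if "Expires" not in qs:
        return None
    return datetime.fromtimestamp(int(qs["Expires"][0]), tz=timezone.utc)

# Hypothetical URL of the kind served by academia.edu's download redirect
url = ("https://d1wqtxts1xzle7.cloudfront.net/51344857/"
       "Iris-_Fall_of_the_Roman_Republic-with-cover-page-v2.pdf"
       "?Expires=1658188800&Signature=abc123")
print(cloudfront_expiry(url))  # 2022-07-19 00:00:00+00:00
```

Once the timestamp passes, the signed URL stops resolving, which is why archiving it promptly matters.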

User:Jo-Jo Eumerus|User:Ifly6|User:Biogeographist: I would like to propose this solution: Special:Diff/1098978075/1099315632. It's only for academia.edu/download links, of which there are about 1,000 on enwiki. This is what I can do somewhat easily right away; there are limits, due to bot design and coding effort, on what can be done. -- Green  C  04:15, 20 July 2022 (UTC)
 * academia.edu returns a 404 when a user is not registered and logged in, which is most users. It does not say "log in to access this paper"; rather, it shows a misleading 404 dead-link page. This causes problems:
 * Archive bots will determine the links are dead (404) and mark them as dead.
 * Users will be confused thinking the link is dead and not behind a registration wall.
 * Should the link ever actually die for real, there would be no archive available, since the Wayback Machine sees only a dead 404 page - the Wayback Machine is not a registered academia.edu user.
 * While it is possible to use registration, this does not solve the misleading 404 problems.
 * The cloudfront link is an AWS container with an &Expires self-destruct mechanism. It's where the paper is actually located (not on academia.edu which redirects to cloudfront).
 * The proposal is to determine the active cloudfront link via bot magic, immediately create a Wayback Machine save of the cloudfront URL, and change the citation to the Wayback-cloudfront link. eg. Special:Diff/1098978075/1099315632
 * Hmm. It seems a bit complex and I wonder if people will be deleting the "expires" part of the link. Jo-Jo Eumerus (talk) 10:22, 20 July 2022 (UTC)
 * It's a complex situation. If they delete the &Expires, the URL will break (404). It will break anyway due to the Expires; that is why the archive URL version is made the primary. The archive URL is accessible to everyone - an academia.edu account is not required. --  Green  C  15:30, 20 July 2022 (UTC)
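The proposed flow - capture the short-lived cloudfront URL at the Wayback Machine, then cite the archived copy - can be sketched as below. This is a hypothetical helper, not the bot's actual code; it just builds the Save Page Now trigger URL and the archive URL a citation would then use (the timestamp is whatever the Wayback Machine assigns to the capture):

```python
def wayback_urls(cloudfront_url, timestamp="20220720000000"):
    """Given a (short-lived) CloudFront URL, build the Wayback Machine
    'save' URL used to trigger a capture, and the archive URL the
    citation would point at afterwards."""
    save_url = "https://web.archive.org/save/" + cloudfront_url
    archived_url = f"https://web.archive.org/web/{timestamp}/{cloudfront_url}"
    return save_url, archived_url

save, archived = wayback_urls(
    "https://d1wqtxts1xzle7.cloudfront.net/51344857/example.pdf")
```

The archived copy keeps working after the &Expires deadline passes, which is the whole point of making it the primary link.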

Unfortunately there is something preventing cloudfront pages from being saved at Wayback. Not all pages, but most. So we have a bad situation with academia.edu/download links - ideally they should be converted to non-/download/ links - but that can't be done by bot; it requires manual searching. The /download/ links probably originate from Google Scholar copy-pasting. --  Green  C  15:56, 23 July 2022 (UTC)

Backlinks report
User:Certes/Backlinks/Report seems to have stopped, but User:GoingBatty/Backlinks/Report is running normally. I've not added any new backlinks recently. Can you see anything else that I may have broken? Certes (talk) 11:17, 25 July 2022 (UTC)
 * It aborted for unknown reasons. I increased the memory allocation by 10x in case that is the problem. The data may be messed up from the abort. I've restarted the process and will see over the next hour or so if it can recover. Worst case, I will just delete all the data and it will rebuild from scratch, but that will result in a missed day. --  Green  C  15:34, 25 July 2022 (UTC)
 * Thanks. Let me know if I'm checking too many targets or if some produce exceptionally big reports, and I'll remove the less productive ones. Certes (talk) 15:45, 25 July 2022 (UTC)
 * It was crashing at "m", then after increasing memory it made it to "v". Odd, because it should not run out of memory, and there are no error messages, system or program, to suggest why it's silently halting, so it might be something different. I added debug statements; it takes a while to replicate, an hour or more. Thanks for holding. --  Green  C  04:26, 26 July 2022 (UTC)
 * Odd: "m" and "v" are early in my list, and neither they nor anything earlier have many incoming links. If it's taking an hour then we may need to remove the entries with lowest benefit per second.  A few entries have never triggered a fix and could probably be removed, but I've already removed the resource-heavy ones.  Maybe I need to rate them all by fixes done per 1000 incoming links or similar and chop those scoring lowest. "v" is an oddity because it can indicate that the editor failed to press  when pasting: easy to spot, but hard to fix as you need to guess what was in their clipboard. Certes (talk) 12:39, 26 July 2022 (UTC)
 * The memory problem appears to be cumulative: if I run m or v in isolation they do fine, but when running the whole bunch there is a massive spike in memory use that occurs at the same spot, around v or x, and the others also don't release their claims, so it builds up. It could be related to the Sun Grid Engine caching for performance reasons. I've checked the program for errant global vars and it's fine; there is nothing holding onto data. I might try separating the backlinks-retrieval portion into a different program so it exits between each item, clearing any memory claims. --  Green  C  16:48, 26 July 2022 (UTC)
 * I think it is fixed. A combination of repetitive backlinks reported by the API and inefficiencies in the program magnifying those repetitions. It should never use more than about 25MB of RAM, but with "V" (and "v") it was as high as 1 gigabyte. Why V? I suspect it's due to WP:V, which is so commonly linked outside mainspace. V exposed the problem, but it was occurring at a smaller scale with everything else. (The API typically and erroneously reports hundreds of copies of the same backlink - I don't know why it's always done this.) "V" had 2.5 million non-unique occurrences. Add to this that the program was inefficient in how it dealt with the repetitions; it added up, and the Grid Engine said nope and dropped the job. Right now it's starting over rebuilding the database; it should be back to normal soon. -- Green  C  05:44, 27 July 2022 (UTC)
 * Thanks very much. The current version looks right, considering that it's for a few hours rather than the usual 24.  Is it possible to add the namespace of the link target to the query?  I'm not sure how you're extracting the data but, for example, Quarry would run its SQL much faster with "and pl_namespace=0". Certes (talk) 11:21, 27 July 2022 (UTC)
 * API:Backlinks. When I first made this program (not your fork of it) around April 2015, Quarry was only about 6 months old, I think; anyway, I wasn't aware of it, and I wanted something that would run from anywhere, which left the API. Speed is not an issue when running daily, unless it takes > 24hrs. Your job completes in about 2 hours; it is exceptionally big. The API behavior of multiple results is weird but can be adjusted for. If it continues to be a problem I can look into Quarry; getting a JSON file would be nice. -- Green  C  15:41, 27 July 2022 (UTC)
 * In that case, blnamespace is what I meant, but I'm not clear what it should be set to: the several namespaces in which relevant links appear, or ns 0 to which relevant links lead. If my job is taking two hours then I should be checking fewer targets; any clues as to which entries take the most time would help with that. Certes (talk) 18:27, 27 July 2022 (UTC)
 * Below is an 'ls' of the data files. The timestamps show how long each took to complete. The file size is misleading, as the program filters out namespaces. For example, "V" (and "v"; they are identical to the API) is not a very large file, but took almost 25 minutes to complete. It took about 85m to finish, not 120m; my mistake. V/v is about 50 minutes, U/u 20 minutes, N/n 10 minutes. Those are the big three and use 95% of the time (is that right?). Probably due to WP:V, WP:U and WP:N. -- Green  C  19:28, 27 July 2022 (UTC)
 * Thanks. I'll take V/v, U/u and N/n out then.  U and N rarely get a hit.  V gets more but I'm less confident about fixing them as most of them require me to guess what article the editor was thinking of. Certes (talk) 20:57, 27 July 2022 (UTC)
 * All working as normal today, and an hour faster than previously. Thanks again for your help. Certes (talk) 10:03, 28 July 2022 (UTC)
 * Yes, finished in 25 minutes. No single one took very long (or much memory!). You are welcome and thanks for reporting it because it uncovered a problem in the program that only became evident at scale. -- Green  C  15:52, 28 July 2022 (UTC)

Excerpt of the listing (size, date, time, file); the entries discussed above stand out in both size and elapsed time:

22930   Jul 27  09:11  0.new
13174   Jul 27  09:14  M.new
39805   Jul 27  09:18  N.new
15718   Jul 27  09:21  U.new
96856   Jul 27  09:45  V.new
12403   Jul 27  09:45  W.new
13174   Jul 27  09:46  m.new
39805   Jul 27  09:51  n.new
15718   Jul 27  09:53  u.new
96856   Jul 27  10:16  v.new
12403   Jul 27  10:16  w.new
5430    Jul 27  10:34  goes.new

(Listing abridged; the full output ran to several hundred data files, the last finishing at 10:34.)

Bot updating Webarchive template is adding "url" same as existing "url2"
This bot made a group of WaybackMedic 2.5 edits in June where it "rescued" an archive link in the url parameter of Webarchive, replacing it with a link which was already in the url2 parameter. Two examples of this are and. Can the bot remove the duplicate url2/date2/title2 parameters and renumber any subsequent url3/date3/title3, etc.? I've fixed over 500 of these edits myself, but there are still over 700 remaining to be fixed. Thanks. -- Zyxw (talk) 03:54, 9 August 2022 (UTC)


 * That was part of the deprecation of WebCite, which is a dead archive provider. It didn't account for dups. It's complicated here because even though url and url2 are the same, title and title2 are different - which do you choose? I think the best course is to keep the url set and remove the url2 set, at least based on the two examples. Renumbering is not required, as the webarchive template is designed to allow any numbers up to 10; the only requirement is that there is a url (aka url1). I'll start looking at this today. --  Green  C  15:35, 9 August 2022 (UTC)


 * I agree with keeping the url set and removing the url2 set when there is a duplicate URL and that is what I did for the 500+ already fixed. I also thought Webarchive might automatically handle the missing url2 set and display the url3 set, but as per these tests that is not the case:
 * archive with url/date/title, url2/date2/title2, and url3/date3/title3
 * url2/date2/title2 removed with url3/date3/title3 remaining
 * url2/date2/title2 removed and url3/date3/title3 renumbered
 * -- Zyxw (talk) 16:15, 9 August 2022 (UTC)
 * Reported at Template_talk:Webarchive. I wrote the template originally but Trappist did a major rewrite so I'm not sure if that is my bug or his. I processed the first 500 articles and there are only 3 with a url3 suggesting 40 or 50 at most in the whole bunch. Anyway it won't be difficult to renumber them. -- Green  C  16:26, 9 August 2022 (UTC)
 * Ah, miscalculated: it's 733, not 7,330 :) It's done; if you see anything more let me know. -- Green  C  17:08, 9 August 2022 (UTC)
 * Fixed the webarchive bug. -- Green  C  18:06, 9 August 2022 (UTC)
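The dedup-and-renumber logic agreed above can be sketched as follows, working on a dict of already-parsed template parameters (parsing the Webarchive wikitext itself is omitted; the function name is hypothetical):

```python
def drop_duplicate_url2(params):
    """If |url2= duplicates |url=, remove the url2/date2/title2 set
    (keeping the url set, per the discussion above) and shift any
    url3/date3/title3 and higher down one slot so numbering stays
    contiguous."""
    if params.get("url2") != params.get("url"):
        return params
    out = {k: v for k, v in params.items()
           if not (k.endswith("2") and k[:-1] in ("url", "date", "title"))}
    n = 3
    while any(f"{k}{n}" in out for k in ("url", "date", "title")):
        for k in ("url", "date", "title"):
            if f"{k}{n}" in out:
                out[f"{k}{n-1}"] = out.pop(f"{k}{n}")
        n += 1
    return out
```

As noted, renumbering isn't strictly required by the template, but it keeps the parameter sets tidy for later editors.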

Bad webcitation link replacement
So I've just found out that GreenC bot made edits like this, replacing a dead archive link with another dead archive link. Would it be possible to replace that archive link with, say, this one that actually works? Thanks very much! Graham 87 11:48, 26 August 2022 (UTC)


 * Bots are not 100% perfect. This one relies on the Wayback API to determine live links, and it is not perfect, so for those errors it depends on human intervention to correct. The alternative is not to use bots at all, in which case most links never get fixed due to the scale; it's boring back-end work people want bots to do, but there is no guarantee that bots, or for that matter people, will not make mistakes. The question is the scale of mistakes. --  Green  C  15:08, 26 August 2022 (UTC)
 * Yeah, fair enough, soft 404s and all. On re-reading my message, I spectacularly failed at phrasing it clearly ... there are nearly a hundred more such links; could you instruct the bot to replace them with a working archive (i.e. the one linked above)? I thought that would be the easiest way to fix this problem. I tried changing the archive link on InternetArchiveBot's side and asking it to fix the affected articles, but that didn't do what I intended. Graham 87 13:34, 27 August 2022 (UTC)
 * OK, it's done. Yeah, there's no way to automate replacement of one archive with another via IABot. That would be a good feature though, when finding soft-404s. --  Green  C  16:16, 27 August 2022 (UTC)
 * Opened Phab .. no idea if or when. --  Green  C  16:34, 27 August 2022 (UTC)
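The fix applied here - swapping one known-dead archive URL for a known-working one across affected pages - reduces to a literal text replacement in the wikitext. A trivial hedged sketch (the URLs would be supplied by a human, as in this thread; not the bot's actual code):

```python
def swap_archive(wikitext, dead_archive, working_archive):
    """Replace every occurrence of a dead archive URL (e.g. a soft-404
    snapshot) with a working archive URL, returning the new text and
    the number of replacements made."""
    count = wikitext.count(dead_archive)
    return wikitext.replace(dead_archive, working_archive), count

text = "see https://www.webcitation.org/abc123 for details"
fixed, n = swap_archive(text,
                        "https://www.webcitation.org/abc123",
                        "https://web.archive.org/web/2009/https://example.com/")
```

The hard part, as discussed, is not the replacement but detecting that the original archive is a soft-404 in the first place.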

Avoid editing inside HTML comments
GreenC bot now edits inside HTML comments, e.g. Special:Diff/1107954452, but I suggest that it not do so. Although the edit in this example happened to be harmless (even useful), in general comments can be used for a wide range of reasons, so there is a higher risk that automatic edits could break their intent. Wotheina (talk) 03:49, 2 September 2022 (UTC)


 * That's true, but there is a positive trade-off, so for a couple of reasons I am OK fixing certain (not all) link rot in comments, as I have been doing for 7 years. If someone wants to preserve a block of immutable wikitext they should use the talk page, a user page, or keep it offline - otherwise anyone can edit the comment or delete it entirely. Comments can be strangely formatted; I take measures, automatic and manual, to check commented text before posting a live diff. --  Green  C  05:39, 2 September 2022 (UTC)
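A bot that wants to treat commented-out wikitext specially first needs to know whether a match falls inside a comment. A minimal sketch of such a guard (a simplification; real comment handling in wikitext has more edge cases, and this is not the bot's actual code):

```python
import re

# HTML comments, possibly spanning multiple lines
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def inside_comment(wikitext, pos):
    """True if character offset pos falls inside an HTML comment, so a
    bot can skip (or specially vet) an edit at that position."""
    return any(m.start() < pos < m.end()
               for m in COMMENT_RE.finditer(wikitext))

text = "before <!-- http://old.example.com --> after"
print(inside_comment(text, text.index("old")))  # True
```

A bot could use this either to skip commented spans entirely (as suggested above) or to flag them for manual review before saving.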

Stopping backlinks report during wikibreak
Hello, and thanks again for the useful Backlinks reports. I'm currently taking a Wikibreak and have attempted to exclude my list from the bot's tasks but it  today. It's not a problem for me if the reports continue but, if you'd like to save some resources by stopping it properly, please go ahead. Certes (talk) 11:25, 5 September 2022 (UTC)


 * Fixed, it was seeing  in the "#" comment. First time this code has been tested :) Have a good break. --  Green  C  05:14, 6 September 2022 (UTC)

Please Update the monthly list of Top 10000 wikipedia users by Article Count
Please update the monthly list of the top 10000 Wikipedia users by article count, which is updated on the 1st and 15th of each month. Abbasulu (talk) 07:52, 3 October 2022 (UTC)
 * It's still running, for some reason very slowly; in 3 days it has only completed 19%. -- Green  C  12:51, 3 October 2022 (UTC)

Exactly what purpose did this edit serve? Edit summary is misleading at best
https://en.wikipedia.org/w/index.php?title=Rodney_Marks&diff=1095741886&oldid=1091111369 108.246.204.20 (talk) 20:17, 3 October 2022 (UTC)


 * Don't use if the citation has a working archive-url. --  Green  C  20:46, 3 October 2022 (UTC)
 * it doesn't. "this page is not available". 108.246.204.20 (talk) 04:15, 14 October 2022 (UTC)
 * Ah, a soft-404. Removed. I also updated the IABot database. -- Green  C  04:24, 14 October 2022 (UTC)

A cookie for you!

 * Thank you. For the Cookie. -- Green  C  14:12, 9 October 2022 (UTC)

RSSSF
Why is this bot changing "website=rsssf.com" to "website=RSSSF" where there is already a "publisher=RSSSF" parameter? On many pages you then get a stupid outcome like this, with double RSSSF linking. Snowflake91 (talk) 10:27, 7 February 2023 (UTC)


 * Yeah, it's not ideal, a work in progress. In any case, the problem is that there should not be both work and publisher; use one or the other, not both. And it should not use a domain name; using the name of the site is best practice on Wikipedia. There are so many RSSSF citations, and so many problems with them; I've done a lot of work to fix them but there are still things that need more work. -- Green  C  15:22, 7 February 2023 (UTC)
 * Prefer website over publisher. cs1|2 does not include publisher in the citation's metadata.
 * —Trappist the monk (talk) 16:18, 7 February 2023 (UTC)
 * Special:Diff/1038698982/1138241646 -- Green  C  21:44, 8 February 2023 (UTC)

I think all the doubles are cleared, if you see any more or other problems let me know. -- Green  C  21:45, 8 February 2023 (UTC)

WaybackMedic
It seems that WaybackMedic 2.5 is run by GreenC bot 2. However, I can't find the source code of version 2.5 in the GitHub repo. I need to read the latest code to learn its current behavior. Have you published it yet? -- NmWTfs85lXusaybq (talk) 14:04, 24 March 2023 (UTC)


 * I can send snippets or functions if you want for anything you are interested in. The entire codebase is not currently available to the public because it contains some proprietary information. It's written in Nim, plus some awk utils. -- Green  C  14:44, 24 March 2023 (UTC)
 * The bot detection of businessweek.com you mentioned in Village_pump_(technical)/Archive_203 may be bypassed by simply assigning a user agent of a web browser in the headers of HTTP requests, such as Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36. As far as I know from version 2.1, WaybackMedic may execute external commands (via execCmdEx) to determine page status, and the assignment of the user agent should be easily implemented via some available parameters. By the way, as of version 2.1, I can see the validate_robots function is implemented in medicapi.nim. -- NmWTfs85lXusaybq (talk) 16:55, 24 March 2023 (UTC)
 * Thank you for the suggestion to use a browser agent. I tried it; they appear to limit based on query rate, and it's pretty sensitive. I was able to trigger it by manually requesting 8 headers rapidly, after which it stopped working, sending a header with "HTTP/1.1 307 s2s_high_score" and redirecting to a JavaScript challenge ("press and hold button"). Maybe I could slow the bot down enough between queries, but it would be difficult and extremely slow, perhaps a month or longer for 10k articles, and it would need to verify every header is not a 307, otherwise abort and manually clear the challenge. Green  C  21:36, 24 March 2023 (UTC)
 * If they limit the query rate based on ip, you can find some web proxies to accelerate this procedure as your bot may behave like a web crawler. After you collect and validate some free proxies, you can just apply them alternately to your bot, although their stability is not guaranteed. -- NmWTfs85lXusaybq (talk) 03:47, 25 March 2023 (UTC)
 * I have access to a web proxy that uses home-based IPs and it still didn't work. Maybe the solution is to pull every URL into a file and process them outside the bot with a simple script that waits x seconds between each header query, then feed the results back to the bot, flagging which URLs are dead. It can run for however long; it wouldn't matter. Trying to do it inside the bot is too error-prone, too complicated, and ties up the bot too long. -- Green  C  04:11, 25 March 2023 (UTC)
 * It's a good idea to run this job outside the bot. However, I'm not sure what you mean by . Have you tried high-anonymity proxies? Did you change proxy IP every time you made a new request? NmWTfs85lXusaybq (talk) 04:45, 25 March 2023 (UTC)
 * The IPs change with every request, and the IPs are sourced to home broadband users globally, so they are not detectable by CIDR block. I don't know how they got blocked, maybe Cloudflare is on this service and recorded all of the IPs. -- Green  C  14:46, 25 March 2023 (UTC)
 * Then I suppose your proxy strategy is OK. Please make sure your web proxy has high anonymity if all of your configuration works fine. -- NmWTfs85lXusaybq (talk) 15:20, 25 March 2023 (UTC)
 * I ran this bot-block avoidance script and it took forever. What I discovered is that just about every link should be archived: either 404, soft-404 or better-off-dead. The latter because the links went to content that was behind a paywall or otherwise messed up in some way - so the archived version is better in nearly every case. -- Green  C  14:17, 3 April 2023 (UTC)
 * I see you mentioned some awk scripts as a workaround at Link_rot/URL_change_requests. However, I can't find the meta directory businessweek.00000-10000 you referred to in the Github repo of InternetArchiveBot and WaybackMedic. NmWTfs85lXusaybq (talk) 07:15, 24 April 2023 (UTC)
 * Oh, that's a note to myself. If you want the awk script let me know; it's nothing more than going through a list of URLs, pausing between each to avoid rate limiting, getting the headers and recording the results, and if it's a bot-block header, notifying and aborting the script. It also shuffles the agent string. The site seemed to learn agent strings and block based on those, which could be avoided by retiring an agent and adding a new one. -- Green  C  13:47, 24 April 2023 (UTC)
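A sketch of that workflow (iterate URLs, pause between header requests, rotate agent strings, abort on a bot-block response), written in Python rather than awk; the agent pool, delay, and 307 check are illustrative assumptions, not the actual script:

```python
import random
import time
import urllib.error
import urllib.request

# Hypothetical pool of browser agent strings to rotate through.
AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

def head_status(url, agent):
    """Request only the headers of a URL and return the HTTP status code."""
    req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

def check_urls(urls, fetch=head_status, delay=10):
    """Check each URL slowly; abort if the site answers with a bot-block 307."""
    results = {}
    for url in urls:
        status = fetch(url, random.choice(AGENTS))
        if status == 307:          # e.g. "HTTP/1.1 307 s2s_high_score"
            raise RuntimeError(f"bot block detected at {url}, aborting")
        results[url] = status
        time.sleep(delay)          # pause between queries to stay under the rate limit
    return results
```

The dead/alive results could then be fed back to the bot, which never has to touch the rate-limited site itself.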

Backlinks report 2023
User:Certes/Backlinks/Report has stopped updating. The bot is running, as User:GoingBatty/Backlinks/Report still updates. I've not changed the job list in User:Certes/Backlinks since 8 May, nor pressed the stopbutton. Do you know how to restart the report please? Certes (talk) 12:17, 4 June 2023 (UTC)


 * The process from June 2nd crashed for an unknown reason and turned into a zombie, preventing future runs. I can't kill it, so I contacted Toolforge admins for help. -- Green  C  14:17, 4 June 2023 (UTC)
 * Working again now – thanks! Certes (talk) 21:50, 4 June 2023 (UTC)

Archiving chapter urls
This is a bit of an edge case with GreenC bot's archive repair task, so I wanted to get your opinion. In several articles where I'm citing an archived book that has separate PDFs for each chapter, I use the |archive-url= parameter for the chapter url (since that's the most important one) and have a Wayback url for the book url in the |url= field. It's not ideal, but I'm not sure how else to handle it. My brief search also found this thread where you indicated that |archive-url= was okay to use for the chapter url. However, GreenC bot switches the |archive-url= field to be the archive of the |url= field (example here).

Is there a better way to format these citations? I'm not able to find any. Otherwise, is there any way I can mark the citations to be ignored by the bot? This seems like a relatively rare case; I imagine it's not worth modifying the bot to handle. Thanks, Pi.1415926535 (talk) 22:14, 14 August 2023 (UTC)


 * Special:Diff/1170358971/1170410520. Another option:
 * I like this better because it doesn't hack the cite book template arguments. The downside is the display is a little messier. Another way with some duplication:
 * From
 * To keep the bot off the citation, add the {{cbignore}} template after the end of the cite book but inside the ref tags. --  Green  C  02:17, 15 August 2023 (UTC)
 * Thanks, much appreciated. Pi.1415926535 (talk) 17:15, 15 August 2023 (UTC)
 * Please take a look at Special:Diff/1171111146, where the bot edited several citations already tagged with cbignore. Thanks, Pi.1415926535 (talk) 06:35, 21 August 2023 (UTC)
 * I found two problems. 1) The {{cbignore}} should follow directly after the template it targets: Special:Diff/1171510462/1171514730 - I think the cbignore docs have this. 2) My bot has a known limitation. Within any block of text between new lines (i.e. a paragraph of text), if there is more than one cbignore, the citations the cbignore follows all need to be unique. In this case the two citations are mirror copies. The bot ignored the cbignore for that reason (it has to do with disambiguation; it needs to know which citation to target). So, I modified one of the citations, and they are now unique: Special:Diff/1171514730/1171514803 (changed the semi-colon to a colon in the publisher field for the first citation) -- a bit quirky, but tested and it works now. I do recommend using the alt suggestions above though, because while my bot honors cbignore, most other bots do not, and eventually some other tool will probably try to "fix" what it detects as an error (an archive URL in the url field). --  Green  C  15:45, 21 August 2023 (UTC)
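For reference, the placement rule in practice looks like this (illustrative citation values, not the actual article's):

```wikitext
<ref>{{cite book |title=Example Title |url=https://web.archive.org/web/20200101000000/http://example.com/chapter1.pdf |access-date=2023-08-21}}{{cbignore}}</ref>
```

The {{cbignore}} sits immediately after the closing braces of the cite template, still inside the ref tags.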

Incorrect dead flags and archive.today
Hello ! Your bot recently made this strange edit to Pokémon. In it, the bot changed "archive.is" and "archive.ph" to "archive.today". I'm not sure what purpose this has. The task is not explained on User:GreenC bot.

Furthermore, the bot flagged these three sources as dead:


 * https://www.theguardian.com/technology/gamesblog/2013/oct/11/pokemon-blockbuster-game-technology
 * https://order.mandarake.co.jp/order/detailPage/item?itemCode=1052117728
 * https://www.nytimes.com/1997/12/20/news/big-firms-failure-rattles-japan-asian-tremors-spread.html

But as you can see, the above links are not dead. So something must've gone wrong there. I've remarked these refs as live. Cheers, Manifestation (talk) 11:04, 19 August 2023 (UTC)


 * Archive.today is what the owner of archive.today wants us to use; it's a redirector that sends traffic to other domains as they are available. The reason those three got marked dead is that there was an archive URL in the url field; the bot moved it to the archive-url field, and the bot assumes that if someone put an archive URL in the main url field it was probably a dead URL. -- Green  C  14:47, 19 August 2023 (UTC)
 * Aaah! So that's why. I wrote the text, so I take full responsibility for the url / archive-url mixup. As for archive.today: I looked at our article, and it cites this tweet from 4 January '19 in which the owner states that the .is domain might stop working soon. However, the domain is still active. In fact, the '@' handle used by the account to this day is still "@archiveis". I've used archive.today many times, including this year. It always gave me either a .is or a .ph link. Cheers, Manifestation (talk) 15:07, 19 August 2023 (UTC)
 * Yeah, it redirects to one of the 6 domains like .is or .ph, but if one of those domains gets shut down by the registrar, he can switch where it redirects to easily, without having to change every link on Wikipedia. -- Green  C  15:24, 19 August 2023 (UTC)
 * Hmm ok. Well I guess we should honor his/her request then. For the sake of clarity, maybe the description of Job #2 / WaybackMedic 2.5 on User:GreenC bot could be expanded a little to include a mention of archive.today? archive.today is not part of the Internet Archive, so the term "WaybackMedic" is a bit misleading. - Manifestation (talk) 16:03, 19 August 2023 (UTC)
 * Alright I updated fix #21 which also now links to Help:Using_archive.today. It started out as Wayback-specific then expanded to all archive providers but I kept the original name anyway. -- Green  C  16:41, 19 August 2023 (UTC)
 * @GreenC Hi! I know that .today is the domain to be used, but every time i try to open a link with .today it returns me a "This site cannot be reached" type of error, and the same goes with .ph links. The only active links i get are the one with .is Astubudustu (talk) 10:55, 2 April 2024 (UTC)
 * This is because the DNS resolver you are using is hosted on CloudFlare, and that won't work (well) with archive.today domains; see Archive.today. -- Green  C  15:38, 2 April 2024 (UTC)

WaybackMedic 2.5 adding unnecessary URLs
I saw the bot's task run on Guardians of the Galaxy (film) here, and it made edits to three references that used Cite Metacritic, Cite Box Office Mojo, and Cite The Numbers, adding unnecessary URLs and marking the links as dead. The citation templates construct the URLs from the given parameters (as most follow a common format on those sites), and the links were not dead. I didn't know if this was a bot issue, or the templates themselves doing something that flagged the citations and made the bot adjust them. I can look into the templates to see what the issues may be if that is ultimately the case (and to know what to look for). - Favre1fan93 (talk) 14:16, 24 August 2023 (UTC)


 * That is a bot error. It is in 9 articles. I rolled them back (you got 2). Thanks for the report. -- Green  C  15:00, 24 August 2023 (UTC)
 * No problem, thank you! - Favre1fan93 (talk) 15:26, 24 August 2023 (UTC)

Timestamp mismatch
This bot is changing the archive-url as seen here, but it is not changing the archive-date as required, creating a timestamp mismatch error, as seen here. I just recently emptied this category and now it has over 80 articles (when I wrote this) in it again. Your help would be appreciated. Thanks. Isaidnoway (talk) 05:57, 2 September 2023 (UTC)


 * I am aware; I did it in two steps because, given the way this particular job was programmed, it was easier that way. You saw it in that 30-minute gap between runs. -- Green  C  16:11, 2 September 2023 (UTC)

My bot can empty that category easily. It was 40,000 a week ago. Got it down to a few hundred edge cases, which I assume you fixed manually, thank you. I'd like to fully automate it, but right now it's all integrated into WP:WAYBACKMEDIC, which can't be fully automated, so I run it on request. -- Green  C  16:16, 2 September 2023 (UTC)

User:Isaidnoway, I'm running a bot job to convert archive.today URLs from short-form to long-form. Example. It is exposing old problems with date mismatches that are showing up in Category:CS1 errors: archive-url -- after this bot job completes, I'll run another bot to fix the date mismatches, it will clear the tracking cat. No need to do anything manually. -- Green  C  04:57, 8 September 2023 (UTC)


 * Hi GreenC! My bot is following yours today. There were several instances when your bot reformatted archive URLs like this edit, and mine fixed the archive dates as in the following edit. My bot is running on Category:CS1 errors: dates, and pulling the archive date from the archive URL. Any chance your bot could do it all in one edit? Thanks! GoingBatty (talk) 18:25, 8 September 2023 (UTC)
 * I used to be able to fix archive.today problems and date mismatches in the same process, but it was semi-automated. Fixing archive.today problems can and should be full-auto, so I separated that out to its own process that uses EventStream to monitor real-time when a new short-form link shows up, log the article name, and once a month or so fix them - all full-auto. Across 100s of wikis. The downside is this program can't fix date mismatch problems. I want to fix date mismatches automatically, and hope to do that eventually with its own process. Once I have that developed I can see about including it in the archive.today program, so it saves the extra edit, when the source of the date mismatch is archive.today short to long conversion.
 * The tracking category will be cleared in the next few hours, it's currently generating diffs. This is a one-off event clearing out the backlog of archive.today problems which exposed a lot of problems. Going forward there will be much smaller numbers. We both currently have bots that can clear that category on request, do you know how to update the docs for the category page? --  Green  C  23:41, 8 September 2023 (UTC)
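A rough sketch of the short-form detection step described above; the mirror-domain list and code-length assumptions are mine, not necessarily what the bot uses:

```python
import re

# archive.today mirror domains the redirector serves (assumed list).
_ARCHIVE_TODAY = r"(?:archive\.(?:today|is|ph|md|li|vn|fo))"

# Long-form URLs carry a timestamp path (14 digits or dotted form);
# short-form URLs are just a short alphanumeric code.
SHORT_FORM = re.compile(
    r"https?://" + _ARCHIVE_TODAY + r"/(?!\d{14}/|\d{4}\.)([A-Za-z0-9]{4,6})\b"
)

def find_short_links(wikitext):
    """Return the short codes of any short-form archive.today links in the text."""
    return [m.group(1) for m in SHORT_FORM.finditer(wikitext)]
```

In a full pipeline this filter would run against the wikitext of each changed page coming off the event stream, and matching article names would be logged for the monthly fix-up pass.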
 * Not sure which category page you're referring to, but most of the text on these category pages comes from Help:CS1 errors, so if you updated the help page, it would also appear on the appropriate category page. GoingBatty (talk) 03:15, 9 September 2023 (UTC)
 * Category:CS1 errors: archive-url. Do you want me to include your bot in the doc as available to clear the cat on-request? I'm going to mention WaybackMedic is available, but only if there are more than 500 entries. -- Green  C  14:25, 9 September 2023 (UTC)
 * I don't have a bot to clear Category:CS1 errors: archive-url. GoingBatty (talk) 18:21, 9 September 2023 (UTC)
 * Oh, I see, I misinterpreted what you said above; I thought it was fixing mismatched dates, but it was actually fixing an incomplete date. -- Green  C  19:12, 9 September 2023 (UTC)

Economy of Zimbabwean
I need some help Mindthem (talk) 21:13, 25 September 2023 (UTC)


 * @Mindthem: How would you like the bot to help with the Economy of Zimbabwe article? GoingBatty (talk) 19:20, 29 September 2023 (UTC)

Backlinks
Hi there! I see your bot delivered a new Backlinks report for Certes, but I didn't receive an update today. Could you please give the bot a nudge? Thanks! GoingBatty (talk) 19:21, 29 September 2023 (UTC)


 * I saw some messages this morning that Toolforge was down due to NFS; likely your run didn't complete before the outage. I see it aborted around 09:32 GMT and Certes finished at 09:28, with minutes to spare. I'll run yours again now. -- Green  C  19:37, 29 September 2023 (UTC)
 * Report received - thank you! GoingBatty (talk) 02:49, 30 September 2023 (UTC)

Bot put italics in strange places
I don't know what happened here, but the bot appears to have put italics in place where they didn't belong, and then missed putting them in where they did belong. Given that the bot had to edit three times, I imagine this bot run was stressful for you. If this code is still active, it might need yet another debugging. – Jonesey95 (talk) 18:26, 19 October 2023 (UTC)


 * Yeah this was a pain, every time I thought it was done, some new issue came up. And getting those ticks right, in the right place, after the fact, wasn't easy. Anyway this task is done for me (1,200 articles deletion of ). If you see any problems they need manual adjustment. I don't think the number of problems is very large from spot checking. -- Green  C  18:35, 19 October 2023 (UTC)
 * I think you are correct, based on my perusal of the list of Linter errors. – Jonesey95 (talk) 18:54, 19 October 2023 (UTC)

Flagging non-dead link as dead (2)
Hello. Why did GreenC bot rewrite url-status=live to url-status=dead in Special:Diff/1186567077 for a live URL? The URL is alive, at least from Japan as of 2023-11-24 04:50 UTC (checked with Firefox and Chrome on Windows 10). Wotheina (talk) 05:05, 24 November 2023 (UTC)


 * It's freemium content. Open an incognito window and see if it gives a different result. I tried to archive premium content pages for NatGeo because they use a freemium wall. View the page source and search on "freemiumContentGatingEnabled". -- Green  C  05:42, 24 November 2023 (UTC)
 * I see. I agree on switching from paywalls to archives, but for such unintuitive edits please write the intention somewhere, as in edit summary or embedded comment, or at least in User:GreenC/WaybackMedic 2.5. I think url-access= is the best way, but I guess you are not using that because there is no option "url-access=freemium" yet. Wotheina (talk) 06:46, 24 November 2023 (UTC)
 * freemium is a great idea. Until it appears, I think live is less bad, or for a bonus point live<!--freemium--> which can be converted in bulk later. I can see the goats too, but I block a lot of third-party scripts which might hide them in standard browsing. Certes (talk) 16:25, 8 December 2023 (UTC)
 * Regarding "dead-url=live is less bad", did you mean "url-status=live is less bad"? Wotheina (talk) 17:24, 8 December 2023 (UTC)
 * Yes, sorry, I was confusing the two parameters. live seems more accurate than dead here.  The least bad value for status might be limited.  I can't find a definition of limited to determine whether freemium falls within its scope. Certes (talk) 18:34, 8 December 2023 (UTC)
 * When I did NatGeo, I didn't have the ability to add archive URLs with url-status=live, so unfortunately they were all set to dead. I have since added this ability after it was requested at Link_rot/URL_change_requests by User:Alexis Jazz. I'm not sure about going back and resetting the NatGeo links that are freemium from dead to live; that would probably require some special one-off code and a lot of time to recheck all the links. But it's the kind of thing anyone could probably do pretty easily, if you have code to parse and edit CS1 templates. --  Green  C  17:34, 8 December 2023 (UTC)
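The freemium check mentioned above (view page source, search for the gating flag) amounts to a marker test. A minimal sketch; only the marker string comes from the thread, the rest is an assumption:

```python
# Marker observed in NatGeo page source per the discussion above.
FREEMIUM_MARKER = "freemiumContentGatingEnabled"

def looks_freemium(html: str) -> bool:
    """True if the page source contains the freemium gating marker."""
    return FREEMIUM_MARKER in html
```

A rechecking pass could fetch each NatGeo URL, apply this test, and flip dead back to live only for pages that carry the marker.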

Backlinks timing
Hi there! I noticed that the Backlinks report hasn't run yet today for me. Looking at the bot's contributions, I see the report is running later each day this week. Could you please check the bots to see what's going on? Thank you! GoingBatty (talk) 15:22, 8 December 2023 (UTC)


 * I started monitoring Buenos Aires as an experiment, not because its new links are likely to be wrong but because socks of a certain puppetmaster love linking to it. I've just removed it from my list, in case this widely-linked page is causing problems. Certes (talk) 16:18, 8 December 2023 (UTC)
 * They are forks of the same script; they run on different cron jobs and directories, so it should not be possible for them to affect each other. If both are not working, I dunno, I'll check. -- Green  C  17:40, 8 December 2023 (UTC)

GoingBatty & Certes, I found a bug that only shows up when running from cron. It wasn't apparent when the script was on Toolforge, because there you signify the working directory with the jsub command, which masked the problem. The effect of the bug was to create duplicate entries in the list at /Backlinks, which is why it kept taking longer each run. For example, GoingBatty had 7 instances of "hamlet" (from the script's perspective): one for the original and 6 more, one for each day the script ran. So I think the best solution is to wipe out the data files again and start over; the data files look kind of weird anyway. The usual: you'll see the message about new entries, then the next one should be good. -- Green  C  18:20, 8 December 2023 (UTC)


 * On December 8, the bot started over and published a report, but didn't publish a report for December 9. Could you please check it again?  Thanks! GoingBatty (talk) 04:34, 10 December 2023 (UTC)

GoingBatty, I don't know what happened. Nevertheless, it is working now. It looks system-level: cron logs show the process ran, but it didn't. No apparent reason, and I can't replicate it. Weird. Let me know if it doesn't run again; I enabled verbose logging. Also, during testing I moved the job time to around 5:30 GMT - or do you want the previous 8:30? Or some other time? -- Green  C  06:01, 10 December 2023 (UTC)


 * Thank you! I'd prefer the previous 8:30, as I'm likely to see the 5:30 job right before I should be going to sleep, and then be tempted to stay up too late to address them immediately.  Thanks! GoingBatty (talk) 07:04, 10 December 2023 (UTC)

User:Certes during testing your most recent report lost some data, seen below. -- Green  C  06:01, 10 December 2023 (UTC)


 * Thanks; I'll take a look at those. I've a slight preference for 0830 over 0530, as I tend to look at the entries about 1000-1200 UTC and the fresher the better. Certes (talk) 16:07, 10 December 2023 (UTC)

It didn't run again. The logging helped. I'm narrowing in on the problem and made some changes. We'll see what happens next run. -- Green  C  21:18, 11 December 2023 (UTC)

At some point when this issue is resolved, are you willing to open Backlinks to other users? For example, see Help desk. Thanks! GoingBatty (talk) 04:18, 12 December 2023 (UTC)

So, it does appear my IP is being rate limited by WMF. I moved all my tools off-site and it's generating a lot of traffic. The solution is to add a retry loop with pauses. Will try that next. -- Green  C  14:42, 12 December 2023 (UTC)


 * Would moving the tools on-site be a solution? I know they just made that a whole lot more difficult by deprecating GridEngine. Certes (talk) 14:49, 12 December 2023 (UTC)
 * That will take time because I think it will require building a custom Kubernetes image, which is a learning curve. I have a ticket open asking them about this but no reply yet. I should have been using a retry loop anyway, so this will help either way; I have a function, but was apparently lazy and didn't call it. -- Green  C  15:26, 12 December 2023 (UTC)
 * A lot of people will be climbing the same learning curve. It would be nice if we had a page for giving each other a leg up.  Sadly (or perhaps gratefully), I've never had to use Kubernetes and so can't be of much assistance. Certes (talk) 16:17, 12 December 2023 (UTC)
 * I hope to learn the system eventually, probably good thing to know. -- Green  C  18:02, 12 December 2023 (UTC)

Ran both manually with the new code. It will keep requesting when it gets a 429 ("Too many requests"). It tries 20 times with a 2 second delay. I have seen it make up to 5 requests, but it will depend on WMF server load. The jobs will run on the regular morning schedule tomorrow. -- Green  C  18:02, 12 December 2023 (UTC)


 * If it's not too much work, escalating the delay might be good for both the program and the server, e.g. if the nth try fails, wait n seconds. (Exponential is recommended but seems extreme.) Certes (talk) 18:15, 12 December 2023 (UTC)
 * There are too many tools making constant requests; it almost doesn't matter, they are going to saturate regardless. I'm concerned because if it's slowed down too much the work never gets done. Will keep on it. It will email if/when it reaches 20. --  Green  C  19:49, 12 December 2023 (UTC)
 * Hmmm. It sounds as if they need a bigger computer.  They can afford it. Certes (talk) 22:33, 12 December 2023 (UTC)
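The retry behaviour described above (up to 20 attempts on HTTP 429 with a pause between tries, plus the escalating-delay suggestion) might look like this; the function shape is illustrative, not the bot's actual code:

```python
import time

def fetch_with_retry(do_request, max_tries=20, base_delay=2, escalate=False):
    """Retry a request while it returns 429 ("Too many requests").

    do_request() returns a (status, body) tuple. With escalate=True the
    wait grows linearly (n seconds after the nth failure), per the
    suggestion above; otherwise it is a fixed base_delay seconds.
    """
    for attempt in range(1, max_tries + 1):
        status, body = do_request()
        if status != 429:
            return status, body
        if attempt < max_tries:
            time.sleep(attempt if escalate else base_delay)
    raise RuntimeError(f"still rate-limited after {max_tries} attempts")
```

Raising after the final attempt matches the "email if/when it reaches 20" behaviour: the caller gets a hard signal instead of a silently dropped request.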

Everything looks good today. Thank you. The only difference from before is that the output now appears alphabetically by target rather than sorted as in the parent page, but that's not a problem. Certes (talk) 10:13, 13 December 2023 (UTC)


 * Because there were duplicates in the parent page I had to unique the list, which required a sort. I tried to unique it in a way that doesn't require a sort, but for some reason it dropped one of the entries. I didn't have time to investigate, so I went with the tried and true method. You can try this yourself with the list of entries and see if the results differ in the number of entries on output compared to input. --  Green  C  15:43, 13 December 2023 (UTC)
 * That sounds very reasonable. (  may work on your system too.) Certes (talk) 16:33, 13 December 2023 (UTC)

Buck Goldstein
Hi there! In this edit, your bot changed an incorrect url parameter, which added the article to Category:CS1 errors: URL‎‎. Should the bot have done something different, or should it ignore the url parameter and only update the archiveurl/archive-url parameter? Thanks! GoingBatty (talk) 06:02, 18 December 2023 (UTC)


 * You mean Special:Diff/1187499427/1190066019. The bot that runs this process is a global bot; it is not programmed to handle templates in different languages, and it only operates on the URL itself, not with template knowledge. The bot didn't do anything wrong; the error was already there. Its only purpose is to normalize archive.today URLs wherever they happen to be. If that caused the pre-existing error to be exposed in the tracking cat, it's a step forward. --  Green  C  06:32, 18 December 2023 (UTC)

Preserving the correct archived version of archive.today links
In this edit, WaybackMedic 2.5 attempted to reformat a link to archive.today that had multiple different archives, but used the archive of the wrong date. The pre-existing link https://archive.is/2Ljk6 is an archive from 24 November 2023. The link should have been converted to http://archive.today/2023.11.24-014538/https://www.bloomberg.com/press-releases/1999-11-08/pokefans-can-now-eat-their-hearts-out-with-candy-planet-s (the "long link" for the page), but was instead converted to https://archive.today/20231124014538/https://www.bloomberg.com/press-releases/1999-11-08/pokefans-can-now-eat-their-hearts-out-with-candy-planet-s, which corresponds to the 6 December 2023 archive. This resulted in the new archive link leading to an archive of a 404 page instead of the successfully archived page, and the |archive-date= parameter not matching the timestamp on the page or in the long URL.

Ideally, the bot would notice when the new URL's archive date does not match the old URL's archive date and not make the edit if it cannot resolve this. Also, ideally it would catch when the citation template's |archive-date= doesn't match the URL's archive date, and either adjust the template's |archive-date= or display some kind of warning. Snorlax Monster  12:09, 1 January 2024 (UTC)
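The validation proposed here could be as simple as comparing the timestamp embedded in the archive URL against the template's archive date. A sketch, assuming long URLs carry either the dotted (2023.11.24-014538) or the compact (20231124014538) form:

```python
import re
from datetime import date

# Timestamp in an archive.today long URL: dotted ("2023.11.24-014538")
# or compact ("20231124014538") form.
TS = re.compile(r"/(\d{4})\.?(\d{2})\.?(\d{2})-?\d{6}/")

def archive_url_date(url):
    """Extract the snapshot date from an archive.today long URL, or None."""
    m = TS.search(url)
    if not m:
        return None
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))

def dates_match(url, archive_date):
    """True when the template's archive date agrees with the date in the URL."""
    d = archive_url_date(url)
    return d is not None and d == archive_date
```

No extra request to archive.today is needed for this check; both values are already in hand when the conversion is made.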
 * Actually, there also appears to be an issue on archive.today's end. While the page https://archive.md/2Ljk6 does have a share option that says that http://archive.today/2023.11.24-014538/https://www.bloomberg.com/press-releases/1999-11-08/pokefans-can-now-eat-their-hearts-out-with-candy-planet-s is the correct long URL, as it turns out, that long URL redirects to the 404 archive as well. In cases like that, I think WaybackMedic 2.5 should not change the URL to the long version, until archive.today corrects their long URLs for URLs with multiple archives. -- Snorlax Monster  12:12, 1 January 2024 (UTC)


 * That's strange. Looks like a one-off error at archive.today .. never seen it before. I can't verify every new long archive.today URL is the same, because the resource load on archive.today servers would double, as would the time it takes for the bot to finish - unless there is evidence of a widespread problem, but in 7 years and over half a million conversions this is the first time it's been reported. All I can do for now is add a static string to the code to skip processing when it sees 2Ljk6. Other tools might try to do the same conversion, like IABot or possibly Citation Bot. This is a tricky problem to solve long term. Ideally archive.today would be notified; that is the correct solution. --  Green  C  19:23, 1 January 2024 (UTC)
 * I notified archive.today about the specific issue with the long URL via their "report bug or abuse" button, but I have no idea how likely those reports are to get read. I think just manually excluding that specific case is the best option for now.
 * With regards to validating that the target page is the same, I think it should be as simple as checking the timestamp is the same (ignoring that bug I mentioned in my second message, where the long URL can redirect to the wrong version). I assume whatever API you're using to get the long URL from the short URL returns the archive date of the short URL in the request you are already making—the long URL has the archive date in the URL itself, so to me it seems like it should be possible to validate that the archive date hasn't changed by just comparing those two values, without needing any additional API requests to archive.today. But I also don't know what the code your bot uses, so I can't verify my assumptions about how it works. (I tried taking a look at the GitHub page linked on User:GreenC/WaybackMedic 2.5, but it appears that it is for Wayback Medic 2.1 and doesn't include the  function that's included in Wayback Medic 2.5.) -- Snorlax  Monster  13:22, 2 January 2024 (UTC)
 * There is no API for this. You download the HTML of the short URL page, and the long form is there towards the top (view source, search on "long link"). The GitHub code is old, but you can see it here at line 173. If the long-form URL goes to a different version of the HTML, as in this case, I would need to download both the short and long HTML pages and run a string comparison to see whether they are approximately the same. Thus downloading the HTML twice. --  Green  C  22:28, 2 January 2024 (UTC)
 * Ah okay, I suspected it could just be plain web scraping. Anyway, what I was trying to suggest was just comparing the date in the URL with the date on the HTML page (so there would be no need to resolve the long link). However, I had missed that the date in the long URL you retrieved was the correct one—the issue was entirely that archive.today redirects it. -- Snorlax Monster  23:34, 2 January 2024 (UTC)
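The comparison suggested in this thread can be done without a second download. Here is an illustrative Python sketch, not the bot's actual code; the function names are hypothetical, and it assumes the long URL embeds the snapshot date either as 14 compact digits or in the dotted 2023.11.24-014538 style seen above.

```python
import re

# Matches the snapshot-date segment of an archive.today long URL, in
# either compact (20231124014538) or dotted (2023.11.24-014538) form.
TS_PATTERN = re.compile(r'/((?:\d{4}\.\d{2}\.\d{2}-\d{6})|\d{14})/')

def extract_timestamp(long_url):
    """Return the long URL's snapshot date normalized to YYYYMMDDHHMMSS,
    or None when no date segment is present (e.g. a short URL)."""
    m = TS_PATTERN.search(long_url)
    if not m:
        return None
    return re.sub(r'[.\-]', '', m.group(1))

def timestamps_match(long_url, short_page_timestamp):
    """True when the date embedded in the long URL equals the archive
    date already known from scraping the short URL's page."""
    return extract_timestamp(long_url) == short_page_timestamp
```

With the Bloomberg example at the top of this thread, `extract_timestamp` yields 20231124014538, so only two date strings need comparing and no extra page needs to be fetched from archive.today.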

bug report
At this edit, GreenC bot copied a malformed Wayback Machine URL from url into archive-url. It ought not to have done that.

The Wayback Machine URL is malformed because its timestamp is not an acceptable length (14 digits preferred, 4 or 6 tolerated). cs1|2 emits an error message for single-digit timestamps and another error message when the values assigned to url and archive-url are the same.

—Trappist the monk (talk) 01:46, 30 January 2024 (UTC)
 * Also, not clear where 2007-06-15 came from.
 * —Trappist the monk (talk) 01:49, 30 January 2024 (UTC)
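The length rule described above can be sketched as follows. This is a hypothetical Python illustration, not cs1|2's actual Lua code: a Wayback timestamp should be 14 digits (YYYYMMDDHHMMSS), with 4 or 6 tolerated, so the single-digit value in that diff would be rejected.

```python
import re

def valid_wayback_timestamp(ts):
    """Accept 14-digit (YYYYMMDDHHMMSS) timestamps, tolerating the
    4-digit (YYYY) and 6-digit (YYYYMM) short forms."""
    return bool(re.fullmatch(r'\d{14}|\d{6}|\d{4}', ts))

def check_archive_url(url):
    """Pull the timestamp out of a web.archive.org URL and validate its
    length. Returns (timestamp, ok), or (None, False) when the URL has
    no timestamp at all."""
    m = re.search(r'web\.archive\.org/web/(\d+)', url)
    if not m:
        return None, False
    ts = m.group(1)
    return ts, valid_wayback_timestamp(ts)
```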

Bug report: Incorrect archive-date
Hi there! In this edit, the bot added 18990101080101. Is there something you could add to the bot to prevent the addition of incorrect dates such as this? Thanks! GoingBatty (talk) 18:22, 30 January 2024 (UTC)


 * I do have warnings but apparently was lazy and forgot to check the logs. -- Green  C  20:08, 30 January 2024 (UTC)

bug report (2)
recently bloomed. I have just fixed these four articles broken by Wayback Medic 2.5: every error was an archive-date mismatch with the archive-url timestamp. archive-date was always off by one day, always earlier than the timestamp except for this one from 2024 Noto earthquake.
 * 2023–24 Australian region cyclone season
 * 2024 in science
 * 2024 Noto earthquake
 * 101955 Bennu

—Trappist the monk (talk) 18:57, 1 February 2024 (UTC)
 * And then there is this one that is off by a couple of weeks, this one off by a year. So it looks like what I wrote above may not hold much water...
 * —Trappist the monk (talk) 19:08, 1 February 2024 (UTC) 19:37, 1 February 2024 (UTC)

The date mismatch errors preexisted. The bot only made them more obvious, so that CS1|2 error-checking is now able to see them. I would prefer to fix archive-date at the same time as expanding archive.today URLs from short to long form (per RfC requirement). However, this task is universal: it operates on many wiki language sites and has no knowledge of template names or arguments in other languages. It only expands a URL wherever it may be; it doesn't look at templates. That would require another universal bot, I guess, one that can operate on CS1|2 templates in multiple languages. If you want to write one, I have the approval to run it. The reason the dates are frequently offset by one day: users add an archive.today link they just created and set archive-date to the date at their own location, but archive.today uses UTC time, which has already rolled over into a new day. The ones offset by a week or a year are user entry errors. -- Green  C  21:49, 1 February 2024 (UTC)
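The UTC explanation above implies the correct archive-date can always be recomputed from the archive-url timestamp itself. A hypothetical Python sketch (the function name and regex are mine, not the bot's):

```python
import re
from datetime import datetime

def archive_date_from_url(archive_url):
    """Derive |archive-date= from the 14-digit UTC timestamp embedded
    in a Wayback or archive.today long-form URL; None if no timestamp
    is found."""
    m = re.search(r'/(\d{14})', archive_url)
    if not m:
        return None
    return datetime.strptime(m.group(1), '%Y%m%d%H%M%S').strftime('%Y-%m-%d')

# A snapshot taken at 01:30 UTC on 2 January is still 1 January for an
# editor in, say, UTC-5 -- exactly the off-by-one-day pattern above.
```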

bug report (3) Bot ignores cbignore
Here [] I noticed that the bot edited an external link with cbignore after it. I compared the links before and after the edit to see why the cbignore template was there. The long and short links are from different dates and display different content. The altered link no longer contained the relevant content. This would not matter if the bot had observed the cbignore.--198.111.57.100 (talk) 17:05, 4 June 2024 (UTC)


 * OK this problem is complicated. There are multiple things going on.
 * All short-form archive.today links need to be expanded to long form. This is required because Wikipedia does not allow URL shorteners, which have security problems.
 * Archive.today has a bug: when saving links from WebCite, it gives an incorrect long form.
 * Incorrect: http://archive.today/UfV6G --> https://archive.today/20121120012223/http://romeoareateaparty.org/wordpress/2012-candidates-2/races/u-s-senate/
 * Correct: http://archive.today/UfV6G --> https://archive.today/20121120012223/https://www.webcitation.org/6CIutMLaZ?url=http://romeoareateaparty.org/wordpress/2012-candidates-2/races/u-s-senate/
 * Notice the "Correct" version includes the original WebCite URL. The "Incorrect" version excludes the WebCite URL.
 * GreenC bot has a bug in that it can't see cbignore when making these changes.
 * GreenC bot has a bug insofar as it doesn't detect the Archive.today bug.
 * So I need to make some adjustments to work around the Archive.today bug. I also need to report the bug to Archive.today though there is no guarantee they will fix it. -- Green  C  17:28, 4 June 2024 (UTC)
 * Update the bug is reported to Archive.today -- Green  C  18:14, 4 June 2024 (UTC)
 * Archive.today fixed it. -- Green  C  21:01, 4 June 2024 (UTC)
 * Thank you!--198.111.57.100 (talk) 16:27, 6 June 2024 (UTC)
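For anyone writing a similar short-to-long expansion, the Archive.today WebCite bug above suggests a simple sanity check: the expanded long URL should still contain the URL that was actually archived. A rough Python sketch under that assumption (the function name is hypothetical):

```python
def long_url_preserves_original(long_url, original_url):
    """True when the archived page's original URL still appears in the
    expanded long-form URL. The scheme is ignored because archive.today
    sometimes normalizes http vs https."""
    def strip_scheme(u):
        return u.split('://', 1)[-1]
    return strip_scheme(original_url) in strip_scheme(long_url)
```

Against the UfV6G example above, the "Correct" expansion passes this check, while the "Incorrect" one fails because the webcitation.org URL has been dropped.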

Please don't convert old Google patents links to archive.today
This is a very unhelpful change: special:diff/1227937929. The links on the archived page to PDFs and drawings all 404, meaning that the actual content of the patent is not accessible. Nor are any of the other features originally presented by Google patent search. This type of archive page should not ever be used for patents. You should either fix the Google patent URLs, which is fairly trivial (you can see the fix for this page at special:diff/1227941924), or switch to links to the US patent office or similar.

Can you please revert or properly fix all of the similar recent edits you have made across Wikipedia? (Judging from your recent contribution list it seems like there were a lot.) Otherwise you're just creating work for someone else / leaving confused readers. –jacobolus (t) 16:41, 8 June 2024 (UTC)


 * 1. You should post this in the forum linked in the edit summary: WP:URLREQ - that's the community forum for this task that everyone is reading.
 * 2. There is nothing my bot does that is permanent or that can't be changed or undone. Do not panic or become upset.
 * 3. Give me details and I will do it. But I need information. You gave a diff saying the fix is trivial, but how do I determine that https://archive.today/20121211035219/http://www.google.com/patents?id=lvNwAAAAEBAJ is the same patent as https://patents.google.com/patent/US417831A ? There is a code in the second URL that does not exist in the first URL.
 * Anyway, please follow at URLREQ so others can know what's going on. -- Green  C  16:52, 8 June 2024 (UTC)

Job 18 showing up in WPCleaner
I'm running WPCleaner and noticed that Error 95 (Editor's signature or link to user space) has flagged the bot, specifically Job 18, on a ton of pages (Arundhathi Subramaniam is one example). It looks like the bot signature is in the "reason" field of the template:

{{verify source |date=September 2019 |reason=This ref was deleted Special:Diff/893567847 by a bug in VisualEditor and later restored by a bot from the original cite located at Special:Permalink/893405019 cite #4 - verify the cite is accurate and delete this template. User:GreenC bot/Job 18}}

I don't have a count of the pages, but it's not an insignificant amount from what I can see. Lindsey40186 (talk) 02:16, 11 June 2024 (UTC)


 * I don't know about WPCleaner, or what the error message means. It was an old bot job, that no longer runs. It was a peculiar and difficult situation. -- Green  C  03:56, 11 June 2024 (UTC)

Typo
After Link_rot/URL_change_requests, the bot is adding links to Deccan Chronical instead of Deccan Chronicle. See and. DareshMohan (talk) 18:59, 14 June 2024 (UTC)


 * Oh sheesh, thanks. Fixed Special:Diff/1228320785/1229089609 in 829 pages. -- Green  C  20:17, 14 June 2024 (UTC)

Thanks
Hey, I just want to say thank you for using the Wayback Machine for MTV News for my citations. Can you do that for Drag-On's album Hell and Back? Ill post the original link. JuanBoss105 (talk) 13:30, 2 July 2024 (UTC)


 * Hey, I found a link to a MTV.com source that can be used for Rocafella. Can you add it using the wayback machine?
 * https://www.mtv.com/news/c1psz3/state-property-members-stress-independence-dont-take-orders&ved=2ahUKEwiS1cGYwIiHAxUdD1kFHf0oCVYQFnoECCIQAQ&usg=AOvVaw1m9yMSZqvcQC7xuV2PKS9D JuanBoss105 (talk) 13:53, 2 July 2024 (UTC)
 * User:JuanBoss105: I found an archive URL with a different source URL: https://web.archive.org/web/20150122173241/http://www.mtv.com/news/1498885/state-property-members-stress-independence-dont-take-orders/
 * I found it using the archive's search feature: Search: "State Property Members Stress Independence".
 * You can find other archive URLs at MTV.com this way.
 * For example in Special:Diff/1231668617/1232196891 you added https://www.mtv.com/news/v0uzg8/norah-jones-tops-a-mil-at-1-kanye-west-settles-for-2 you can find the archive URL by going to this search page: Search: "Norah Jones tops a mil". --  Green  C  16:07, 2 July 2024 (UTC)

Tampabay.com
Stop running this right now on tampabay.com links. Every one I've checked is wrong. It is adding archive links (okay) to currently live articles and tagging them as dead (wrong). It is also overriding explicit |url-status=dead to |url-status=live when it encounters redirects to the main page of tampabay.com. Tired of fixing these because GreenC bot is on a roll. ▶ I am Grorp ◀ 00:21, 12 July 2024 (UTC)
 * Clarification: Not every single instance, but too many, for sure.   ▶ I am Grorp ◀  00:31, 12 July 2024 (UTC)

Oh shoot, looks like they used an exotic redirect mechanism that fooled the bot. I have a way around it, but this is the first I've become aware of it. I'll have to reprocess. Anyway, thanks for the info. BTW, you should post error reports in the section linked in the edit summary; that is the discussion page for this job. -- Green  C  00:38, 12 July 2024 (UTC)
 * That was gibberish to me so I found this talk page. I just now put a link from there to here. You're welcome to copy this over there, and delete this thread, if that makes more sense. I'll watchlist both.   ▶ I am Grorp ◀  00:42, 12 July 2024 (UTC)
 * Not all of the edits were incorrect or needed correcting. If you want a list of which ones I corrected, they're in my contributions list from 22:10, 11 July 2024 to 00:37, 12 July 2024 (UTC). All but the first of my corrections have "GreenC bot" in the edit summary. (I edit in a topic area that relies heavily on tampabay.com, many of which are on my watchlist.)   ▶ I am Grorp ◀  00:53, 12 July 2024 (UTC)

Grorp,
 * 1) Special:Diff/1233941553/1233989259 - this appears to be a one-off, maybe a network transient. When I run the page again (locally) the problem does not happen. I'd be surprised if there are more like this. It can happen, but I don't think it's systematic or common. If you see more, let me know.
 * 2) Special:Diff/1233948702/1233989465 - exotic redirect problem noted above
 * 3) Special:Diff/1233957098/1233990527 - ditto
 * 4) Special:Diff/1233959661/1233991011 - archive.today URLs I manually verify beforehand. This one is a manual verification error, which is rare but not impossible. I can provide a list of the archive.today URLs that were added (193).

I can address the exotic redirect, which looks to be limited to URLs ending in .ece --  Green  C  01:29, 12 July 2024 (UTC)


 * Update: I found 29 instances of the exotic redirect among the set of 6,846 pages, or less than half of one percent. Of the archive.today errors, there was one in 193, about the same half of one percent. Thanks for the report; if you find any other problems, let me know. -- Green  C  02:42, 12 July 2024 (UTC)


 * Thanks. Will do.   ▶ I am Grorp ◀  05:45, 12 July 2024 (UTC)

I have no idea how to decipher/restore/resurrect these old pqarchiver links (like in your fourth example above). If there's a writeup, or some tips, please point me in the right direction. I do come across these fairly regularly in this topic area I edit; many point to old sptimes.com news articles (St Petersburg Times was bought out by Tampa Bay Times). If there is any way I can resurrect an actual copy of some of these old articles, I'd like to try to fix some of them. ▶ I am Grorp ◀ 05:45, 12 July 2024 (UTC)


 * I found 63 pqarchiver links (out of the 193 archive.today links added) and they all worked, except this one. If it doesn't exist at archive.org or archive.today, it's probably gone forever; you'd need to find an alternate source. -- Green  C  06:09, 12 July 2024 (UTC)