Wikipedia:Citation Archive

Wikipedia, perhaps even Wikimedia, needs a self-sustainable web archiving infrastructure as a long-term solution to link rot.

Wikipedia doesn't do entropy as well as it could. Organisations merge, get renamed and close, and their websites tend to reflect that. Much of Wikipedia is referenced to other websites, and over time all of those websites are likely to be deleted, amended or just renamed.

It is technically feasible to have a bot that checks all new cites and, where there is a website to link to, archives a copy of that page. The bot would then monitor all Wikipedia cites on a regular basis and, when a link goes red or the content at the other end of the link changes, redirect the citation to the citation archive. Bonus points if your citation tools in future let you highlight the actual text, segment of video or segment of audio you are citing and have that reflected in the archive. That would also enable more sophisticated archiving that allows for conditions such as: the cited page has been changed but the cited text is unchanged; the cited text appears to have moved to a different page within the same website; the video has been rerecorded with a different narrator; or the audio has been edited to remove ums, ahs and unnecessary pauses.
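The monitoring step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a design for an actual bot: the `Snapshot` record, the status names and the `should_use_archive` rule are all hypothetical, fetching the live page is left out, and plain substring matching stands in for whatever comparison a real tool would use for text, video or audio.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Archived copy of a cited page (hypothetical record layout)."""
    url: str
    content: str
    cited_text: Optional[str] = None  # passage highlighted by the editor, if any

    @property
    def digest(self) -> str:
        # Fingerprint of the archived page, used to detect any change.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


def check_citation(snapshot: Snapshot, live_content: Optional[str]) -> str:
    """Compare the live page with the archived copy.

    live_content is None when the fetch failed (the link has "gone red").
    Returns "dead", "unchanged", "changed_cited_text_intact" or "changed".
    """
    if live_content is None:
        return "dead"
    live_digest = hashlib.sha256(live_content.encode("utf-8")).hexdigest()
    if live_digest == snapshot.digest:
        return "unchanged"
    # The page differs; see whether the cited passage itself survived.
    if snapshot.cited_text and snapshot.cited_text in live_content:
        return "changed_cited_text_intact"
    return "changed"


def should_use_archive(status: str) -> bool:
    """Assumed policy: point the citation at the archive only when the
    source is gone or the cited material can no longer be found."""
    return status in {"dead", "changed"}
```

For example, a page whose footer changed but whose cited sentence is still present would be reported as "changed_cited_text_intact", and under the assumed policy the citation would keep pointing at the live page.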

I'm not a lawyer, and this would need a good one, and probably a legal system such as that of the US with its "Fair Use" provision. It might even require us to limit access to the bluelinked part of the archive site to Wikimedians with userrights such as reviewer and admin, just as one reason why only admins have access to deleted edits is that they contain copyrighted information, including copyright violations. But hopefully the redlinked parts of the archive site could be published under Fair Use, as the original website would by then be gone. You'd also need a facility for organisations to tell us that they had reorganised parts of their website and give us the mapping logic to update our citations with.
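As a rough idea of what such mapping logic might look like, here is a minimal sketch that assumes an organisation supplies a list of old-prefix to new-prefix rules. The rule format, the `remap_url` name and the example URLs are all hypothetical; a real facility might need patterns or per-page tables instead.

```python
from typing import List, Optional, Tuple

# Hypothetical rule format: (old URL prefix, new URL prefix).
Rule = Tuple[str, str]


def remap_url(url: str, rules: List[Rule]) -> Optional[str]:
    """Rewrite a cited URL using the longest matching old prefix.

    Returns None when no rule applies, so the citation is left alone.
    """
    best: Optional[Rule] = None
    for old, new in rules:
        if url.startswith(old) and (best is None or len(old) > len(best[0])):
            best = (old, new)
    if best is None:
        return None
    old, new = best
    return new + url[len(old):]


rules = [
    ("http://old.example.org/", "https://example.org/archive/"),
    ("http://old.example.org/news/", "https://example.org/press/"),
]
print(remap_url("http://old.example.org/news/2010/merger.html", rules))
```

Choosing the longest matching prefix lets a specific rule (the news section) override a general one (the whole old site).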

I'm inclined to think that one citation archive should work, Wikimedia Commons style, for all versions of Wikipedia and those of our sister projects that cite sources. If there are legal, technical or cultural reasons why we need separate ones for Wiktionary as opposed to Wikipedia, or separate ones for different language versions of Wikipedia, then I'd be interested to hear them.