User:DASHBot/Wayback

DASHBot periodically scans Wikipedia's most important articles (Featured articles, for example) for dead external links. Dead external links are useless to the reader and detract from the credibility of articles. DASHBot is part of an effort to combat "link rot": the bot finds dead external links and then finds suitable archived copies from internet archiving services (such as the Internet Archive).

How does DASHBot work?
DASHBot is coded in Python by Tim1357, using the pywikipedia framework.


 * The bot first takes a list of articles and downloads their text. It parses out the set of all external links used in references (between <ref> and </ref> tags).


 * The bot then tests these links to determine which are dead. Only links that return an HTTP 404 error twice within 5 seconds are considered dead.


 * Then, it saves the list of dead links, along with the time.


 * Some time later (usually a week), the bot revisits the page and reassesses the links that were determined to be dead. (This prevents false positives due to temporary server outages.)


 * The bot then looks for an access-date that corresponds to the URL. For example, an access-date in a Cite web template would suffice, as would a string along the lines of "Retrieved on _________" or "Accessed ____________".


 * If none is present in the article, the bot scans the article's history. The date on which the link first appears is taken to be the access-date.


 * The bot uses this access-date to query WebCitation and the Internet Archive. The archived copy closest to that date (usually within a few weeks) is used.


 * Finally, the bot updates all references in the article with the new archive URL. If the citation template has an archive-url parameter (as Cite web does), that parameter is filled in. (If the archive parameters are already filled, the bot skips the reference.) Otherwise, the bot appends a Wayback template to the end of the reference.


 * If any step in that process fails, but the bot has verified that the link is dead, it appends a Dead link template to the reference.
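
A minimal Python sketch of three of the steps above: collecting the links used in references, the repeated-404 check, and choosing the archived copy nearest the access-date. It is illustrative only and is not DASHBot's actual code. The function names, the regular expressions, the injected fetch_status callable, and the (timestamp, URL) snapshot tuples are all assumptions made for the example. The sketch also assumes that references are wrapped in <ref> tags, and it does not model the 5-second window or the week-long recheck.

```python
import re
from datetime import datetime

# Assumption: references are wrapped in <ref ...>...</ref> tags in the wikitext.
# Self-closing tags such as <ref name="x" /> carry no body and are skipped.
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)

# Assumption: a URL ends at whitespace or at common wikitext delimiters.
URL_RE = re.compile(r'https?://[^\s\]|}<>"]+')


def reference_links(wikitext):
    """Return the set of external URLs that appear inside references."""
    links = set()
    for body in REF_RE.findall(wikitext):
        links.update(URL_RE.findall(body))
    return links


def is_dead(url, fetch_status, attempts=2):
    """Treat a link as dead only if every attempt returns HTTP 404.

    fetch_status is a hypothetical callable (url -> HTTP status code). It is
    injected so that the check can be exercised without network access.
    """
    return all(fetch_status(url) == 404 for _ in range(attempts))


def closest_snapshot(access_date, snapshots):
    """Pick the snapshot whose timestamp is nearest to the access date.

    snapshots is a list of (datetime, archive_url) tuples; this shape is an
    assumption, not the response format of any archiving service.
    """
    if not snapshots:
        return None
    return min(snapshots, key=lambda snap: abs(snap[0] - access_date))
```

In use, reference_links would be run over each article's text, is_dead over each returned link with a real HTTP client supplied as fetch_status, and closest_snapshot over whatever list of archived copies the archiving service reports for a dead link.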

Note: General fixes are applied, where applicable.

How can I keep DASHBot away from an article?
Keeping DASHBot from editing an article is easy. Simply put the following template anywhere on the page:

If there is already a Bots template on the page, you may edit it to display the following:

DASHBot needs to be turned off
To turn off the bot, change "YES" to anything but "YES":

FIX DEAD LINKS = YES
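
The shut-off rule can be sketched as a small check that a bot might run before each batch of edits. This is a hedged illustration, not DASHBot's actual code: the function name bot_enabled, the regular expression, and the choice to compare the value case-sensitively are assumptions.

```python
import re

# Assumption: the flag appears in the control text as "FIX DEAD LINKS = <value>".
FLAG_RE = re.compile(r"FIX DEAD LINKS\s*=\s*(\S+)")


def bot_enabled(control_text):
    """Return True only when the flag value is exactly "YES".

    Any other value, or a missing flag, is treated as a request to stop.
    """
    match = FLAG_RE.search(control_text)
    return match is not None and match.group(1) == "YES"
```

Treating a missing or malformed flag as "off" is the conservative choice: an accidental edit to the flag stops the bot rather than letting it keep running.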

AFTER SHUTTING THE BOT OFF: Promptly leave a message on Tim1357's talk page. Thanks.