User:GreenC/WaybackMedic 2.5

WaybackMedic by GreenC

WaybackMedic 2.5 is a bot that adds and maintains links from the list of known web archive services in use on the English Wikipedia.

Edits made after 2018-12-04 are by version 2.5.

The bot operator is User:GreenC. The bot account is User:GreenC bot. The bot (software) is "WaybackMedic".




 * Technical details:


 * Changes to URLs are checked against the remote site to ensure they are working
 * Real-time link checks, no link database. However, links are checked over a 24-hour period before the final upload of the diff.
 * Supports many APIs including the Internet Archive, Memento, WebCite and "Timemap" APIs at individual services
 * Multiple HTTP header status code checks at the application (WaybackMedic) layer
 * Additional timeouts and retries are built into the web transfer libraries.
 * Additional operating-procedure level checks against network and other errors – bot is semi-supervised in known trouble areas.
 * Multiple redundant checks of the APIs using multiple dates to ensure a page really is unavailable
 * Accepts API results but then verifies by looking at page headers and/or contents
 * The bot is primarily written in Nim (compiles to C source) with support utilities in Awk. The libraries were custom-made and include a string primitives library for regex, a wiki template parsing library, an OAuth library (in Awk), a MediaWiki API interface library, and a soft-404 detector.
 * Due to the nature of the task, running the bot involves a fair amount of supervisory overhead and requires operator training; the steps are documented in the source package.
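
The layered checks in the list above (retries, status code checks, content verification, and redundant lookups across multiple dates) can be illustrated with a minimal sketch. It is written in Python for readability and is not the bot's Nim source. Every name in it (`fetch_with_retries`, `looks_like_soft404`, `is_working`, `find_snapshot`, the phrase list) is hypothetical, and the `fetch` and `lookup` callables stand in for whatever HTTP client and archive API the operator actually uses.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical phrases that suggest an error page served with HTTP 200.
SOFT404_PHRASES = ("page not found", "404 not found", "no longer available")

Fetch = Callable[[str], Tuple[int, str]]          # url -> (status, body)
Lookup = Callable[[str, str], Optional[str]]      # (url, date) -> snapshot URL


def fetch_with_retries(fetch: Fetch, url: str, retries: int = 3):
    """Call fetch(url) up to `retries` times; return (status, body) or None."""
    for _ in range(retries):
        try:
            return fetch(url)
        except OSError:
            # Covers TimeoutError and ConnectionError: try again.
            continue
    return None


def looks_like_soft404(body: str) -> bool:
    """Crude content check: does the page text read like an error page?"""
    text = body.lower()
    return any(phrase in text for phrase in SOFT404_PHRASES)


def is_working(fetch: Fetch, url: str, retries: int = 3) -> bool:
    """Accept a URL only if it returns 200 and does not look like a soft 404."""
    result = fetch_with_retries(fetch, url, retries)
    if result is None:
        return False
    status, body = result
    return status == 200 and not looks_like_soft404(body)


def find_snapshot(lookup: Lookup, fetch: Fetch, url: str,
                  dates: Iterable[str]) -> Optional[str]:
    """Query the archive for several dates; return the first verified hit.

    The page is treated as unavailable only if every date fails.
    """
    for date in dates:
        candidate = lookup(url, date)
        if candidate and is_working(fetch, candidate):
            return candidate
    return None
```

Passing `fetch` and `lookup` in as parameters keeps the sketch independent of any particular archive service and lets the logic be exercised offline with stub functions.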

Running
The bot takes requests at WP:URLREQ on a per-domain basis. You can request a domain name for the bot to process.

General sources

 * The GitHub repository is an older public version. The most current version is not public. The bot is written in Nim and GNU Awk.

Links

 * WaybackMedic 2.1
 * WaybackMedic 2.0
 * WaybackMedic 1.0
 * Bot Approval
 * Trial runs