Wikipedia:Wikipedia Signpost/2010-12-20/Technology report

Old Wikipedia archive uncovered
This is the new WikiPedia! The idea here is to write a complete encyclopedia from scratch, without peer review process, etc. Some people think that this may be a hopeless endeavor, that the result will necessarily suck. We aren't so sure. So, let's get to work!

Tim Starling, a developer and system administrator working for the Wikimedia Foundation, announced this week his discovery of backups of Wikipedia pages from February, March and August 2001, which, he said, were assumed to be permanently lost. Though it was originally thought possible that the later backups might include early revisions of Wikipedias in other major languages such as French and German, it now seems that the ad hoc nature of the backups means that they cover only the English-language "WikiPedia".

"I've long been interested in Wikipedia's history, and I've tried in the past to locate such backups", he said. "I asked various people who might have had one. I had given up hope." However, he uncovered two UseModWiki files which contained a record of every change made to Wikipedia from January 15 to August 17, 2001. The files are available for download. Brian Mingus has created an online index of the articles in the encyclopedia at the time (including, for example, the early revisions to the 'boat' article), as has Wikipedia researcher Joseph Reagle, who also compiled a list of the top 20 Wikipedia contributors from these early stages. Given the discovery's timing, weeks before the tenth anniversary of the English-language (and first) Wikipedia, interesting snippets (such as the one at the top of this article) are also being collated on a special "Wikipedia in the Beginning" page.

As for what happens now with the revisions – which are invaluable for correctly attributing work to the project's contributors in line with its copyleft licence – Tim writes, "I'm developing a script which will import the dump into a modified MediaWiki instance, the idea being that I can then export XML from it [i.e. transform it into a modern style database dump] ... I'm not sure when that will be."

In brief
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.
 * The CodeReview software, which eases development of the MediaWiki software, was briefly broken (bug #24270).
 * Following hardware-related disruption to the availability of database dumps, Google has donated some of its storage space to provide a backup facility. Ariel Glenn, writing for the Foundation, said that "we expect to run copies once every two weeks, keeping the four latest copies as well as one permanent copy at every six month interval" (wikitech-l mailing list).
 * With the resolution of bug #26339, users will now be warned if their API result has been truncated.
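
The retention schedule quoted above for the dump backups (a copy every two weeks, the four latest kept, plus one permanent copy per six-month interval) can be sketched as a small selection rule. This is only an illustration and not the Foundation's actual tooling: the function name `select_copies_to_keep`, its parameters, and the reading of "every six month interval" as "the first copy made in each calendar half-year" are all assumptions.

```python
from datetime import date, timedelta


def select_copies_to_keep(copy_dates, latest=4, permanent_every_months=6):
    """Return the set of copy dates that the schedule would retain.

    Hypothetical helper: the interpretation of the permanent copies
    (first copy in each calendar half-year) is an assumption.
    """
    ordered = sorted(copy_dates)

    # Keep the most recent `latest` copies.
    keep = set(ordered[-latest:])

    # Keep the first copy that falls in each six-month period.
    seen_periods = set()
    for d in ordered:
        period = (d.year, (d.month - 1) // permanent_every_months)
        if period not in seen_periods:
            seen_periods.add(period)
            keep.add(d)
    return keep


# Example: 30 copies taken every two weeks from an arbitrary start date.
start = date(2011, 1, 3)
copies = [start + timedelta(days=14 * i) for i in range(30)]
kept = select_copies_to_keep(copies)

for d in sorted(kept):
    print(d.isoformat())
```

With these made-up dates, 6 of the 30 copies survive: one permanent copy for each half-year covered, plus the four most recent (one of which doubles as a permanent copy).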