Wikipedia:Wikipedia Signpost/2005-03-28/Return of stats

Last Monday, Erik Zachte announced the return of his statistical tables for Wikipedia. A new effort to produce regular snapshots of Wikipedia's current status was also started by Alterego.

For several months, Zachte had been unable to provide fresh statistics because of changes to the database. Last December, the developers began reshuffling the already compressed database archives to conserve even more disk space. This prevented the generation of database dumps for a few weeks. As a result, the last update to the statistical tables was dated 30 December 2004.

Even once the database dumps were available again, Zachte indicated that the scripts used to generate his statistics needed significant revision to deal with the new layout of the archives. Now that he has completed this work, the updated statistics have been generated from a database dump dated 9 March.

A number of people expressed their appreciation for Zachte's work in making the statistics available again. Eloquence commented, "I can't begin to count the number of times I've made use of your stats and pointed other people to them."

New stats features
As part of the latest update, Zachte announced the availability of some new statistics. These include a count of the total number of records (i.e., pages, including redirects) along with the percentage of articles that have been categorised. Note that the latter does not include categories that are inserted via a template.
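The distinction between categories written directly on a page and categories inserted via a template can be illustrated with a minimal Python sketch. This is not Zachte's script: the `summarize` function, the regular expressions, and the sample pages are all hypothetical, and it assumes that each page is available as raw wikitext and that a redirect begins with `#REDIRECT`.

```python
import re

# Hypothetical pattern: an explicit category link such as [[Category:Physics]].
# A link like [[:Category:Physics]] only points at the category page and is not matched.
CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:[^\]]+\]\]", re.IGNORECASE)

# Assumption: a redirect page starts with "#REDIRECT".
REDIRECT = re.compile(r"^\s*#REDIRECT", re.IGNORECASE)


def summarize(pages):
    """Return (total records, articles, categorised articles, percent categorised).

    `pages` maps a page title to its raw wikitext. Categories that would only
    appear after template expansion are not seen, because templates are never expanded.
    """
    total_records = len(pages)  # every page, redirects included
    articles = [text for text in pages.values() if not REDIRECT.match(text)]
    categorised = sum(1 for text in articles if CATEGORY_LINK.search(text))
    percent = 100.0 * categorised / len(articles) if articles else 0.0
    return total_records, len(articles), categorised, percent


# Made-up sample data for illustration only.
sample = {
    "Amsterdam": "Amsterdam is a city. [[Category:Cities in the Netherlands]]",
    "Mokum": "#REDIRECT [[Amsterdam]]",
    "Some stub": "A short stub. {{stub}}",  # any category added by the template is invisible here
    "Plain page": "No categories here.",
}

total, articles, categorised, percent = summarize(sample)
print(f"{total} records, {articles} articles, {categorised} categorised ({percent:.1f}%)")
```

Because the sketch inspects only the raw text, a page whose sole category comes from a template counts as uncategorised, which mirrors the caveat noted above.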

Also available are hierarchical category trees and an index of timelines used on Wikipedia. The timelines are created with the EasyTimeline feature, which was also developed by Zachte.

In the future, changes to the database design planned for version 1.5 of the MediaWiki software (see related story) will require additional adaptation for the statistics to be collected. Zachte has said to "expect a similar outage" when that time comes. Tentative indications are that the approach will change from relying on database dumps to using direct SQL access to generate statistics incrementally.

WikiPulse, another stats and status source
For those interested in tracking a variety of current information, including some statistics, Alterego has created a new page called WikiPulse. Currently the statistics are drawn only from the English Wikipedia, although translations have been invited so that information from other Wikipedias can be added.

WikiPulse also has information on the status of the Wikimedia server network and even some data about IRC and mailing list activity. Zachte jokingly said he thought extra servers might soon be needed if Alterego's snapshot proved to be as popular as he expected (in fact the OpenFacts Wikipedia status page has already experienced difficulty with crashdotting during Wikipedia downtime).