Wikipedia:Statistics server


There are many reasons to set up a dedicated statistics server, which exists only to pull down and analyse the latest project data dumps.

There may be enough demand for a variety of stats to justify both an internal stats server (which includes non-public data such as raw referrer and other logs) and an external stats server (which sits outside the Wikimedia clusters and includes only public data, but always has the latest updates and stats scripts).

Most requested

 * Traffic stats : # of independent IPs, users, and sessions per day/week/month/year.
 * Page popularity : # of visits per page per day/week/month/year.
 * Page unpopularity : Orphaned, dead-end, or single-editor pages.
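As an illustration of the traffic stats requested above, here is a minimal sketch in Python of counting independent IPs per day. It assumes the logs have already been parsed into (IP, timestamp) pairs; the function name `unique_ips_per_day` and that input shape are hypothetical, not part of any existing stats script.

```python
from collections import defaultdict
from datetime import datetime

def unique_ips_per_day(hits):
    """Count distinct IPs per calendar day.

    `hits` is an iterable of (ip, timestamp) pairs; this input shape is an
    assumption made for illustration, not a real log format.
    """
    seen = defaultdict(set)
    for ip, ts in hits:
        # Group by calendar date so repeat visits on one day count once
        seen[ts.date()].add(ip)
    return {day: len(ips) for day, ips in sorted(seen.items())}
```

The same grouping would work per week, month, or year by swapping `ts.date()` for a coarser key.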

Broken special pages

 * Lonelypages (orphans), Ancientpages (not edited for years), &c -- many stop showing results past the first 1000, and have not evolved with the growth of large wikipedias.
 * Recentchanges -- hard to go back further than the last 5000 changes.
 * Longpages -- should let admins flag particular pages as "OK", so that the default view shows only long pages that should not be that long.

Popular requests
Current goings-on:
 * Currently active topics : over 10 edits in the last 1|5|24|72 hours
 * Currently active blockers / deleters : over 3|10|30 blocks/deletes in the last 1|5|24|72 hours
 * Active newbies : new accounts with over 10|30|100 edits in the last 1|5|24|72 hours
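Each of the requests above amounts to a count over a sliding time window compared against a threshold. A minimal sketch in Python for the "currently active topics" case, assuming edits are available as (page title, timestamp) pairs; the function name `active_topics` and its parameters are hypothetical:

```python
from collections import Counter
from datetime import datetime, timedelta

def active_topics(edits, now, window_hours=24, min_edits=10):
    """Return titles with more than `min_edits` edits in the last `window_hours`.

    `edits` is an iterable of (title, timestamp) pairs; this input shape is
    an assumption made for illustration.
    """
    cutoff = now - timedelta(hours=window_hours)
    counts = Counter(title for title, ts in edits if cutoff <= ts <= now)
    # "over 10 edits" is read here as strictly more than min_edits
    return sorted(title for title, n in counts.items() if n > min_edits)
```

The blocker/deleter and newbie lists could presumably reuse the same shape, counting log actions per admin or edits per new account instead of edits per page.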

New requests

 * # of pages on over 100 watchlists
 * # of pages on 0 watchlists [rather than worrying endlessly about 'letting spammers see [such a] list', it should be kept empty!]
 * Pages with 0 categories
 * Pages with over 10 categories
 * Pages that have been "temporarily" protected for over 3 days
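The two category requests above could be answered in a single pass. A minimal sketch in Python, assuming a mapping from page title to its list of categories has already been extracted from a dump; the function name `category_outliers` and the mapping are hypothetical:

```python
def category_outliers(page_categories, max_categories=10):
    """Split out pages with no categories and pages with too many.

    `page_categories` maps page title -> list of category names; this
    structure is an assumption made for illustration.
    """
    uncategorised = sorted(
        title for title, cats in page_categories.items() if len(cats) == 0
    )
    # "over 10 categories" is read here as strictly more than max_categories
    overcategorised = sorted(
        title for title, cats in page_categories.items()
        if len(cats) > max_categories
    )
    return uncategorised, overcategorised
```

The watchlist counts (0 watchers, over 100 watchers) follow the same pattern with a page-to-watcher-count mapping.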

External research requests
Many researchers working on WP data outside the projects -- sociologists, economists, computer scientists, authors, &c. -- have statistical needs that are currently hard to meet. Some of their requests include:
 * A way to purchase a hard drive with a recent full DB dump, to be shipped via snail- or air-mail.
 * A way to submit stats queries that can be queued to run on a central server, either once or regularly over a few weeks or months.
 * A way to add questions for WP users to be answered by at least a few dozen participants, and ideally more, targeting a reasonably random cross-section of {the community, the active editors, the anon editors, [other]}.