User:Shadowjams/Stats

A statistical analysis of pages

This is a rudimentary categorization of a statistical sample of Wikipedia pages.

I analyzed 121 pages drawn at random from the Wikipedia mainspace in May 2010, using the "random article" API function. I then assigned each page a category and recorded whether it met certain criteria (below).
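A sampling step like this could be sketched as follows. This is a minimal illustration, not the script actually used for this page: it assumes the MediaWiki `list=random` query module restricted to namespace 0 (the mainspace), and the helper name `build_random_query` is hypothetical. It only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_random_query(limit=10):
    """Build a URL asking the API for `limit` random mainspace pages.

    Hypothetical helper for illustration; it does not send the request.
    """
    params = {
        "action": "query",
        "list": "random",      # random-page query module
        "rnnamespace": 0,      # namespace 0 is the article mainspace
        "rnlimit": limit,      # number of random titles per request
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

Fetching the resulting URL repeatedly until 121 titles are collected would give a sample of the same size as the one described here.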

These results should be taken as a rough overview of the encyclopedia. I may have made mistakes in my analysis, and the sample size is relatively small. However, the sample may still reveal some patterns in the encyclopedia.

The largest category is geographic articles about towns, states, and geographic features. Of those, 71% are stubs. Of the biographies, about 29% are stubs. Sports articles make up about 10% of the sample, with half of those covering soccer/football topics. Regardless of the sport, 67% of the sports articles are biographies.

Similar projects and comparisons
Others have done similar reports at different times. While the methodologies differ, some statistics come out strikingly similar. For example, the 17% share of geography articles appears to be a consistent finding, as do the frequencies of some other categories. Dantheox's assessment that 35% of articles were stubs is also close to the 38% figure I found four years later.
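To give a sense of how much sampling noise a sample of 121 pages carries, here is a rough sketch of a 95% confidence interval for the 38% stub figure. It assumes a simple random sample and uses the normal-approximation (Wald) interval; the function name `wald_interval` is illustrative, and this calculation was not part of the original analysis.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion.

    p_hat: observed proportion, n: sample size, z: critical value
    (1.96 corresponds to roughly 95% confidence).
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# 38% stubs observed in a sample of 121 pages.
low, high = wald_interval(0.38, 121)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 29% to 47%, so a 35% estimate from another sample is well within the range this sample size can distinguish.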

Other editors' reports

 * User:Knulclunk/Random (2008)
 * WikiProject Video games/Article statistics
 * WikiProject creation trends (inactive)
 * User:Dragons flight/Log analysis (2007)
 * User:Dantheox/Stub percentages (2006)
 * User:Dr pda/Article referencing statistics

Results
A page may have more than one attribute.

Categories are exclusive: each page was assigned exactly one category.