Wikipedia:Wikipedia Signpost/2008-08-11/Growth study

According to a new study, Wikipedia has a pattern of growth that may indicate unlimited potential.

In a study entitled "The Collaborative Organization of Knowledge" (abstract; working draft), published in the August issue of Communications of the ACM, computer scientists Diomidis Spinellis and Panagiotis Louridas analyze the relationship between references to non-existent articles (redlinks) and the creation of new articles.

The study, based on the February 2006 dump of English Wikipedia, finds that the link rate from complete (i.e., non-stub) articles to incomplete (non-existent or stub) articles remained nearly constant between 2003 and 2006 (about 1.8 incomplete articles linked from every complete article). A long-term trend in either direction, according to the authors, would indicate an unsustainable growth pattern. If the average number of redlinks per article is increasing, it means that Wikipedia is becoming diffuse and will become less useful as more and more of the terms in the average article are not covered. If the average number is decreasing, it suggests that Wikipedia's growth will slow or stop as the number of links to uncreated articles approaches zero. The stable redlink ratio suggests that Wikipedia is a scale-free network, in principle capable of unlimited growth.
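To make the ratio concrete, here is a minimal sketch of one way such a figure could be computed from a link table. It is not the authors' code: the function name, the input shapes (`links`, `stubs`), and the choice to count distinct incomplete targets are all assumptions made for illustration, and the study may define the measure differently.

```python
from typing import Dict, Set


def incomplete_to_complete_ratio(links: Dict[str, Set[str]], stubs: Set[str]) -> float:
    """Return distinct incomplete articles linked from complete articles,
    divided by the number of complete articles.

    links: hypothetical mapping of each existing article title to the set
           of titles it links to.
    stubs: hypothetical set of existing articles that are marked as stubs.
    """
    existing = set(links)
    complete = existing - stubs  # complete = exists and is not a stub
    if not complete:
        raise ValueError("no complete articles to measure from")

    incomplete_targets = set()
    for title in complete:
        for target in links[title]:
            # Incomplete = non-existent (a redlink) or an existing stub.
            if target not in existing or target in stubs:
                incomplete_targets.add(target)

    return len(incomplete_targets) / len(complete)


# Toy data: A and B are complete, C is a stub, X/Y/Z do not exist.
toy_links = {"A": {"B", "C", "X"}, "B": {"A", "Y"}, "C": {"Z"}}
toy_stubs = {"C"}
print(incomplete_to_complete_ratio(toy_links, toy_stubs))  # 1.5
```

In the toy data, the two complete articles link to three distinct incomplete ones (C, X and Y), giving 1.5. Tracking this number across successive dumps is what would reveal whether it is rising, falling or stable.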

The study also notes that most new articles were created within the first month that they were referenced in another article. Furthermore, only 3% of new articles were created by the same user who created the first link to that article (whether as a redlink or a bluelink). This implies that the connection between redlinks and new articles is a collaborative one, and that adding redlinks actually spurs others to create new articles.
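The "same user" figure can be illustrated with a short sketch, assuming one already knows who added the first link to each title and who created the article. The function and both dictionaries are hypothetical; the study's actual method for extracting these from the dump is not described here.

```python
from typing import Dict


def same_author_share(first_linker: Dict[str, str], creator: Dict[str, str]) -> float:
    """Return the fraction of articles whose creator also added the first
    link to them. Only titles present in both mappings are counted.

    first_linker: hypothetical mapping of title -> user who first linked to it.
    creator:      hypothetical mapping of title -> user who created the article.
    """
    shared = [title for title in creator if title in first_linker]
    if not shared:
        raise ValueError("no articles with both a first linker and a creator")
    same = sum(1 for title in shared if creator[title] == first_linker[title])
    return same / len(shared)


# Toy data: only article X was created by the user who first linked to it.
toy_creator = {"X": "alice", "Y": "bob", "Z": "carol", "W": "dave"}
toy_first_linker = {"X": "alice", "Y": "erin", "Z": "frank", "W": "gina"}
print(same_author_share(toy_first_linker, toy_creator))  # 0.25
```

A low share, like the 3% reported in the study, means that the person who adds a link and the person who writes the article are usually different editors.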

The statistics were re-run with a more recent dump (from January 3, 2008), with results that "don't appear to differ from the ones based on the study's 2006 data set", according to Spinellis (User:Diomidis Spinellis). Wikipedia's growth rate peaked in late 2006, and it declined slightly in 2007 and in the first 7 months of 2008. According to the updated statistics, the incomplete:complete ratio has been dropping gradually since early 2006, and was less than 1.4 in January 2008. However, Spinellis argues that "As long as the ratio is above 1.0, growth as we know it should continue."

Earlier studies
A 2006 study, "Preferential attachment in the growth of social networks: The case of Wikipedia", showed that Wikipedia's early growth (through June 2004) demonstrated preferential attachment: highly linked articles were more likely to be the target of new links. According to the authors, this indicates one of two things: either Wikipedia editors failed to take full advantage of the wiki model to create a more balanced network, or preferential attachment to highly linked articles results from "the intrinsic organization of the underlying knowledge". The former case would indicate that Wikipedia's structure cannot overcome the "bounded rationality" of its contributors, each of whom may have limited knowledge beyond his or her area of activity.

The new study is consistent with a "bounded rationality" model, since the creation of new articles depends significantly on the topics editors choose to link to from existing articles. However, it also suggests a possible mechanism for achieving more balanced coverage, as less-covered areas will contain more redlinks, leading to more coverage and even more redlinks.

In contrast to many of the academic studies of Wikipedia, long-term observers within the community have tended to analyze Wikipedia's growth trends in terms of changing content conventions and social dynamics. For example, in a series of blog posts from 2007 ("Wikipedia Plateau?", "Unwanted: New articles in Wikipedia", and "Two Million English Wikipedia articles! Celebrate?") Andrew Lih (User:Fuzheado) examined some of the community factors limiting new article creation. An analysis of article creation and deletion logs by User:Dragons flight from late 2007 showed that for every three articles created, one article was deleted.

A more complete picture of how the size and activity level of the English Wikipedia community have evolved in recent months and years should be available once Erik Zachte updates his statistics website with a recent dump. Zachte was recently hired as a Data Analyst by the Wikimedia Foundation.