Wikipedia:Wikipedia Signpost/2009-01-31/Orphans

Almost 30% of Wikipedia articles are "orphans", with few or no incoming links from other articles, according to WikiProject Orphanage. According to an analysis by JaGa dated January 24, 2009, that figure includes 133,515 articles with zero links from other articles and another 92,031 linked only from lists or chronology pages. A further 533,411 articles have links from only one or two articles (excluding lists and chronology pages); WikiProject Orphanage classifies these as orphans as well. Only 42,936 articles have been tagged with the orphan template. By JaGa's count there are 2,575,308 articles when disambiguation pages are excluded (compared to 2,700,000+ counted by Special:Statistics).
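The "almost 30%" figure can be checked against the counts quoted above. The sketch below assumes the three categories (zero links, list-only links, one or two links) do not overlap and together make up the orphan total; the article does not say so explicitly, but the sum is consistent with the headline share. The variable names are illustrative only.

```python
# Counts quoted above from JaGa's January 24, 2009 analysis.
zero_links = 133_515        # no links from other articles
list_only = 92_031          # linked only from lists or chronology pages
one_or_two = 533_411        # linked from only one or two articles
total_articles = 2_575_308  # JaGa's count, excluding disambiguation pages
tagged = 42_936             # articles carrying the orphan template

# Assumption: the three categories are disjoint, so they can simply be summed.
orphans = zero_links + list_only + one_or_two
orphan_share = orphans / total_articles
tagged_share = tagged / orphans

print(f"orphans: {orphans:,} ({orphan_share:.1%} of articles)")
print(f"tagged with the orphan template: {tagged_share:.1%} of orphans")
```

Under that assumption the total comes to 758,957 orphans, or about 29.5% of articles, of which fewer than 6% carry the orphan template.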

The distribution of links per article is a characteristic long-tail distribution that roughly follows the Pareto principle: articles with 50 or more links make up 20% of all articles, but account for 84% of all links. JaGa's list of the top 5000 articles by link count shows that many of the very top articles are ones commonly linked from templates, such as biography, geographic coordinate system, list of sovereign states, and music genre. Major nations are also among the most-linked articles; United States holds the top spot, with 16% of all articles linking to it.
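As a rough illustration of how a "20% of articles, 84% of links" statistic can be derived, the sketch below computes both shares from a list of per-article link counts. It assumes "links" means incoming links from other articles; the function name `head_share` and the sample counts are made up for illustration and are not JaGa's data or method.

```python
def head_share(incoming_links, threshold=50):
    """Return (share of articles, share of links) for the articles that
    have at least `threshold` incoming links."""
    total_articles = len(incoming_links)
    total_links = sum(incoming_links)
    head = [n for n in incoming_links if n >= threshold]
    return len(head) / total_articles, sum(head) / total_links


# Made-up link counts for ten articles, NOT taken from JaGa's analysis.
toy_counts = [0, 0, 1, 2, 3, 5, 8, 20, 60, 300]
article_share, link_share = head_share(toy_counts)
print(f"{article_share:.0%} of articles hold {link_share:.0%} of links")
```

The function expects a non-empty list with at least one link in total; otherwise the divisions would fail.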

The long-tail distribution of links is consistent with a 2008 academic study of the network structure of Wikipedia, which showed that, like networks of scientific publications, Wikipedia linkage demonstrates preferential attachment and appears to be a scale-free network (see earlier story). That study focused on red links and the creation of new articles, and follow-up work showed a troubling trend that may also help explain the scale of the orphan problem revealed by JaGa's data. Computer scientist Diomidis Spinellis showed that while Wikipedia was growing exponentially from 2003 to 2006, there was a stable average rate of 1.8 links to "incomplete" articles (red links and stubs) per non-stub article, but that rate had declined to 1.4 by early 2008. This indicates that linkage patterns became more "top-heavy" and that articles became relatively less likely to point to undeveloped articles. Orphaned articles tend to be stubs, and because they have few related articles linking to them, they are likely to remain underdeveloped for longer than well-linked stubs.

Partly to blame may be a pernicious trend noted by User:Raul654, James F. and others: contrary to the red links guideline, red links are frequently being removed for aesthetic reasons. The 2008 linkage study showed that new articles tend to be created soon after the first link pointing to them appears. Red links thus drive growth and allow new articles to avoid orphan status right from the start.