User:Wikid77/Overlinking

This essay addresses the overlink crisis: the problems caused by loading Wikipedia pages with excessive wikilinks, especially in navboxes or infoboxes.

Overlinking in Wikipedia pages (or other hyperlinked text) means having too many internal wikilinks or hyperlinks to external webpages.

Aspects of overlinking
Overlinking takes a few typical forms. It is characterized by:


 * a large proportion of the words in each sentence being rendered as links;
 * using links that have little related content, such as linking specific years like 1995, or unnecessarily linking common words used in the common way, where the reader can be expected to understand the word's full meaning in context without any hyperlink help;
 * excessively repeating a link for any single term (other than for date formats) in the same article. "Excessive" usually means more than one link for the same term in a line or a short paragraph, since in that case one or more duplicate links will almost certainly appear needlessly on the viewer's screen.
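
The repeated-link criterion lends itself to a mechanical check. The following Python sketch is illustrative only: the function name `repeated_links` and the sample paragraph are hypothetical, and the pattern handles only simple `[[Target]]` and `[[Target|label]]` links. It ignores templates and first-letter case folding, and it treats a link with a section anchor (such as `[[Paris#History]]`) as a separate target.

```python
import re
from collections import Counter

# Matches [[Target]] and [[Target|label]]; group 1 is the link target.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def repeated_links(paragraph):
    """Return {target: count} for targets linked more than once."""
    targets = [m.group(1).strip() for m in WIKILINK.finditer(paragraph)]
    counts = Counter(targets)
    return {target: n for target, n in counts.items() if n > 1}


# Hypothetical sample paragraph with two duplicated targets.
sample = (
    "[[Paris]] is the capital of [[France]]. "
    "Visitors to [[Paris|the city]] often tour [[France]] by rail."
)
print(repeated_links(sample))
```

A paragraph that links each term only once yields an empty result, so any non-empty result marks a candidate for trimming.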

Overlink crisis
During 2007, many thousands of articles were modified to use various navigation-boxes (navboxes) or infobox templates to link to related sets of other articles. However, the usage expanded:
 * the links in navboxes were gradually increased to link 100, 200, 500 (or more) articles, in each of thousands of other articles; and
 * multiple navboxes were placed on almost any article remotely related to the subject.

Because a navbox is repeated in every article that uses it, the total is the number of links in the box multiplied by the number of articles carrying it. For example, if a navbox linked 50 major cities and was used in each of those 50 city-articles, the total generated was only 50*50 = 2,500 wikilinks; when the cities were increased to 200, the total exploded to 200*200 = 40,000 wikilinks. For larger navboxes the problem expands rapidly:
 * for navboxes containing 500 wikilinks used in 600 articles, the total becomes 500*600 = 300,000 wikilinks;
 * for navboxes containing 650 wikilinks used in 2,500 articles, the total becomes 650*2,500 = 1,625,000 (about 1.6 million) wikilinks.
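
The arithmetic above is a simple product, which a short Python sketch can reproduce. The function name `total_wikilinks` is hypothetical, and the model assumes every article carrying the navbox generates one wikilink per entry in the box.

```python
def total_wikilinks(links_per_navbox, articles_using_navbox):
    """Total wikilinks generated by one navbox across all articles that use it."""
    return links_per_navbox * articles_using_navbox


# The four cases quoted in this section: (links in navbox, articles using it).
cases = [(50, 50), (200, 200), (500, 600), (650, 2500)]
for links, articles in cases:
    total = total_wikilinks(links, articles)
    print(f"{links} links x {articles} articles = {total:,} wikilinks")
```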

Boxifying articles
Rather than limiting a navbox to the major related topics, some navboxes have become the condensed key contents of an entire article, in a "boxified form" to be appended to another article. Such navboxes are the total opposite of the wikilink concept: details should be kept separate by linking to another article via a single wikilink, rather than repeating portions of that article in the current article. For example, an article that mentions a singer who often performed in a famous concert hall needs just one link to that singer's name, not an entire navbox linking that singer's albums, singles, co-singers, songwriters, tours, and TV specials.

Solution: avoid or limit navboxes/infoboxes
The greatest source of overlinking is large navboxes or infoboxes used in hundreds of thousands of articles. There are several ways to limit the impact of those boxes:
 * If possible, avoid navboxes completely in articles that are only remotely related to the topic.
 * Link just a few related pages as see-also links.
 * Use a set of smaller navboxes to cover a topic, and place each smaller navbox only where it is directly related, such as a navbox for cities or one for counties, but rarely both.
 * Emphasize that any major overall navbox should be kept limited in size, to perhaps no more than 200 total wikilinks, recommending smaller navboxes to link specialized sub-topics, not all joined into a single massive navbox.
 * Remove common-word links from navboxes or infoboxes: avoid linking "city" or "county" or "km" or other common words. Readers can type "km" and look it up.

Those are some ways to limit the growing overlink crisis.
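
To see why a set of smaller navboxes helps, here is a rough Python sketch under stated assumptions: each navbox is placed only on the articles it links, so a box of n entries generates n*n wikilinks. The function name `total_links` and the split into four groups of 50 are hypothetical illustrations, not figures from any actual navbox.

```python
def total_links(group_sizes):
    """Total wikilinks when each navbox of n entries appears on its own n articles."""
    return sum(size * size for size in group_sizes)


one_big_box = total_links([200])             # one navbox linking all 200 cities
four_small_boxes = total_links([50, 50, 50, 50])  # hypothetical split into 4 regional boxes

print(f"single navbox:   {one_big_box:,} wikilinks")
print(f"four navboxes:   {four_small_boxes:,} wikilinks")
print(f"reduction:       {one_big_box // four_small_boxes}x fewer")
```

Under these assumptions, splitting one box into k equal boxes cuts the total by a factor of k, because each article then carries only the entries of its own group.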

Analogy: indexing indexes
Several analogies help show how overlinking has drastically expanded the total wikilinks:


 * If each navbox is thought of as a small index to related subjects (with the total wikilinks as an index of "What links here"), then the combined wikilinks of all the navboxes become an "index of indexes", because each navbox is a mini-index of the whole. That is why the total wikilinks are so numerous.
 * If each chapter in the Bible ended with a mini-concordance to related chapters (like a navbox), then the cross-referencing of all chapters would generate a "concordance of concordances" as a massive tome.

Similar analogies illustrate the n-squared problem: if Wikipedia only contained the contents of a single Bible, the massive concordance of concordances would be manageable; however, the cross-referencing of hundreds of thousands of pages has generated a Tower of Babel in wikilinks, choking the performance of the Wikipedia servers.

Why a crisis exists
With newer computer disk drives allowing more capacity every year, the extra millions of wikilinks might seem acceptable. However, the situation is a crisis because total wikilinks, formerly at the level of 50 page-links per list, have grown to 200*200 = 40,000 crossed page-links, which is 40,000 / 50 = 800 times as many page-links as a single page that listed 50 wikilinks. The problem is NOT simply twice the number of links, or 10 times the links since last year; the total is effectively 800 times larger. Where one disk drive could formerly have held a set amount of page-link data, 800 disk drives would now be needed.
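
The 800-fold figure follows from comparing a cross-linked navbox with a single flat list. A minimal Python sketch, assuming a navbox of n entries placed on each of the n articles it links and a baseline list of 50 links (the function name `growth_factor` is hypothetical):

```python
def growth_factor(navbox_size, baseline_links=50):
    """Ratio of crossed page-links (n*n) to a single list of baseline_links."""
    crossed_links = navbox_size * navbox_size
    return crossed_links / baseline_links


# The case from the text: 200*200 = 40,000 crossed links versus a 50-link list.
print(growth_factor(200))

# The ratio grows with the square of the navbox size.
for size in (50, 100, 200, 500):
    print(size, growth_factor(size))
```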

The problem can be seen when a navbox of 250 box-links, used in 2,000 articles, is reduced to just 10 links: the Wikipedia servers will pause (after a few minutes) as the page-link database(s) are updated to unlink most of the 500,000 total page-links between those 2,001 articles (which had been cross-linked as 250*2,000 = 500,000 page-links). The situation is a crisis because it is a self-generated resource drain on a massive scale.
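
The size of that update can be estimated with a short Python sketch. It assumes one page-link record per (article, linked page) pair and that none of the removed links also appears in the body of an article; the function name `shrink_cost` is hypothetical.

```python
def shrink_cost(old_links, new_links, articles):
    """Return (records removed, records remaining) when a navbox is trimmed."""
    before = old_links * articles
    after = new_links * articles
    return before - after, after


removed, remaining = shrink_cost(old_links=250, new_links=10, articles=2000)
share = removed / (removed + remaining)

print(f"page-links removed:   {removed:,}")
print(f"page-links remaining: {remaining:,}")
print(f"share removed:        {share:.0%}")
```

Under these assumptions the edit removes 480,000 of the 500,000 page-links, which matches the "most of" in the text.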