User:Vnurshaba/Information network

An information network is also called a knowledge network. The classic example of an information network is the network of citations between academic papers. Most learned articles cite previous work by others on related topics. These citations form a network in which the vertices are articles and a directed edge from article A to article B indicates that A cites B. Citation networks are acyclic because papers can only cite other papers that have already been written, not those that have yet to be written. All edges in the network therefore point backwards in time, making closed loops impossible, or at least extremely rare.
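The acyclic property can be checked mechanically. The following is a minimal sketch, assuming the citation data is held as a dictionary that maps each paper to the list of papers it cites; the function name `is_acyclic` and the sample papers are hypothetical, chosen only for illustration. It repeatedly removes vertices that nothing points to (Kahn's algorithm) and reports a loop if some vertices can never be removed.

```python
from collections import deque

def is_acyclic(citations):
    """Return True if the directed citation graph contains no closed loops.

    `citations` maps each paper to the list of papers it cites.
    """
    # Collect every vertex, including papers that are cited but cite nothing.
    vertices = set(citations)
    for cited in citations.values():
        vertices.update(cited)

    # In-degree of a paper = number of citations it receives.
    in_degree = {v: 0 for v in vertices}
    for cited in citations.values():
        for b in cited:
            in_degree[b] += 1

    # Start from papers that nobody cites and peel the graph away.
    queue = deque(v for v in vertices if in_degree[v] == 0)
    removed = 0
    while queue:
        a = queue.popleft()
        removed += 1
        for b in citations.get(a, []):
            in_degree[b] -= 1
            if in_degree[b] == 0:
                queue.append(b)

    # Vertices on a closed loop never reach in-degree zero.
    return removed == len(vertices)

# Hypothetical data: C (newest) cites B and A; B cites A.
papers = {"C": ["B", "A"], "B": ["A"], "A": []}
# Hypothetical loop: two papers that cite each other.
looped = {"A": ["B"], "B": ["A"]}

print(is_acyclic(papers))
print(is_acyclic(looped))
```

The second data set corresponds to the rare case mentioned above, for example two papers that cite each other.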

Citation networks have a great advantage in the copious and accurate data available for them. Quantitative study of publication patterns stretches back at least as far as Alfred Lotka’s 1926 discovery of the so-called Law of Scientific Productivity, which states that the distribution of the numbers of papers written by individual scientists follows a power law. That is, the number of scientists who have written k papers falls off as k^(−α) for some constant α. (In fact, this result extends to the arts and humanities as well.) The first serious work on citation patterns was conducted in the 1960s, as large citation databases became available through the work of Eugene Garfield and other pioneers in the field of bibliometrics. Many other studies of citation networks have been performed since then, using the ever better resources available in citation databases. Of particular note are the studies by Seglen and Redner.
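Written out as a formula, the power law reads as follows. The symbols N(k) for the number of scientists with k papers and C for a normalising constant are introduced here only for illustration; they are not part of the original statement.

```latex
N(k) = C\,k^{-\alpha}
\quad\Longrightarrow\quad
\log N(k) = \log C - \alpha \log k
```

Taking logarithms shows that a power law appears as a straight line of slope −α when N(k) is plotted against k on doubly logarithmic axes.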

Another very important example of an information network is the World Wide Web, which is a network of Web pages containing information, linked together by hyperlinks from one page to another. The World Wide Web, or simply the Web, should not be confused with the Internet, which is a physical network of computers linked together by optical fibre and other data connections. Unlike a citation network, the World Wide Web is cyclic; there is no natural ordering of sites and no constraint that prevents the appearance of closed loops. The World Wide Web is so large that no exact count of its pages exists; for example, a Google search for 'the' reports about 10 billion pages.

An important point to note about the Web is that our data about it come from crawls of the network, in which Web pages are found by following hyperlinks from other pages. Our picture of the network structure of the World Wide Web is therefore necessarily biased. A page will only be found if another page points to it, and in a crawl that covers only a part of the Web (as all crawls do at present), pages are more likely to be found the more other pages point to them. This suggests, for instance, that our measurements of the fraction of pages with low in-degree might be an underestimate. This behavior contrasts with that of a citation network.
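The bias can be seen in a toy model. The sketch below assumes a tiny hypothetical Web stored as a dictionary from each page to the pages it links to; the page names, the `crawl` and `in_degrees` helpers, and the `max_pages` limit are all illustrative assumptions, not a description of how any real crawler works. A breadth-first crawl started from one seed page never reaches a page that no other page links to.

```python
from collections import deque

def crawl(links, seed, max_pages):
    """Breadth-first crawl from `seed`, recording at most `max_pages` pages."""
    found = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(found) < max_pages:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen and len(found) < max_pages:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found

def in_degrees(links):
    """Count how many hyperlinks point to each page."""
    degree = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            degree[target] = degree.get(target, 0) + 1
    return degree

# Hypothetical toy Web: "orphan" links out, but nothing links to it.
web = {
    "home": ["news", "about"],
    "news": ["home", "about"],
    "about": ["home"],
    "orphan": ["home"],
}

found = crawl(web, "home", max_pages=10)
missed = [page for page in web if page not in found]
degree = in_degrees(web)

print(found)
print(missed)
print({page: degree[page] for page in missed})
```

In this example the only page the crawl misses is the one with in-degree zero, so a degree distribution measured from the crawled pages alone would under-count low in-degree pages.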