User:StanfordLinkPredictor

Task
This Wikipedia bot inserts links between Wikipedia pages based on statistical inference from human navigational traces.

The bot inserts a link from a source page to a target page, given the mention in the source page that should link to the target page. The input is a tab-separated file. To stay version-agnostic, the bot searches for the mention in the source article on a best-effort basis: if the mention exists, the link is added at its first occurrence; otherwise the page is left unchanged. The bot does not accept a position for the mention (such as the number of words preceding it), because that position shifts as the article is edited.
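As an illustration of the best-effort insertion described above, here is a minimal Python sketch. It is not the bot's actual code: the function names `read_tasks` and `insert_link`, the column order (source, target, mention), and the choice to skip mentions that already sit inside a link or template are all assumptions made for this example.

```python
import csv
import io
import re


def read_tasks(tsv_text):
    """Parse tab-separated rows into (source, target, mention) tuples.

    The column order is an assumption; rows without exactly three
    fields are skipped.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [tuple(row) for row in reader if len(row) == 3]


def insert_link(wikitext, mention, target):
    """Link the first unlinked occurrence of `mention` to `target`.

    Returns (new_wikitext, True) if a link was added, or
    (wikitext, False) if no suitable mention was found.
    """
    # Assumption: text inside existing [[links]] or {{templates}} is left
    # alone. Nested templates are not handled by this simple pattern.
    protected = [m.span() for m in
                 re.finditer(r"\[\[.*?\]\]|\{\{.*?\}\}", wikitext, flags=re.DOTALL)]

    # Match the mention only as a whole word or phrase.
    pattern = re.compile(r"(?<!\w)" + re.escape(mention) + r"(?!\w)")

    for m in pattern.finditer(wikitext):
        inside_protected = any(start < m.end() and m.start() < end
                               for start, end in protected)
        if inside_protected:
            continue
        found = m.group(0)
        # Use a piped link only when the visible text differs from the title.
        link = f"[[{target}]]" if found == target else f"[[{target}|{found}]]"
        return wikitext[:m.start()] + link + wikitext[m.end():], True

    return wikitext, False
```

When the mention is absent, `insert_link` returns the text unchanged together with `False`, which mirrors the behaviour of not editing the page in that case. Because the search runs against whatever wikitext is current at edit time, no stored position is needed.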

The link prediction algorithm was developed in a research project that is part of a collaboration between Stanford University and the Wikimedia Foundation. The project page can be found here. A paper describing the algorithm and results is under submission to the World Wide Web Conference; if you would like a confidential preprint, please get in touch with Bob West.

Link Prediction Method
We propose a novel approach to identifying missing links on Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia’s navigability. We leverage a data set of navigation paths collected through a Wikipedia-based human-computation game called The Wiki Game, in which users must find a short path from a start article to a target article by clicking only links encountered along the way. We use these human navigational traces to identify a set of candidates for missing links and then rank the candidates according to various metrics. We further validate our predictions by recruiting human raters from Amazon Mechanical Turk and setting up an evaluation task that asks them to guess which links should exist in Wikipedia, based on the Linking Guidelines. Our evaluation (see above for how to obtain a preprint of the paper) shows that the links predicted by our method are of higher quality than those suggested by alternative methods.
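The candidate-mining step can be pictured with a short Python sketch. The description above does not spell out how candidates are defined or which ranking metrics are used, so everything here is an assumption for illustration: a candidate is taken to be a pair (source, target) where the source was visited on a path that ends at the target and no direct link exists yet, and candidates are ranked simply by the number of paths that support them. The function name `rank_missing_links` is hypothetical.

```python
from collections import Counter


def rank_missing_links(paths, existing_links):
    """Rank candidate (source, target) pairs by supporting path count.

    paths          -- list of navigation paths, each a list of page titles
                      whose last element is the page the path ended on
    existing_links -- set of (source, target) pairs already linked
    """
    counts = Counter()
    for path in paths:
        if len(path) < 2:
            continue  # a single page carries no navigation signal
        target = path[-1]
        # Count each source at most once per path.
        for source in set(path[:-1]):
            if source != target and (source, target) not in existing_links:
                counts[(source, target)] += 1
    # Most frequently supported candidates first.
    return counts.most_common()


paths = [["A", "B", "C"], ["A", "D", "C"], ["B", "C"]]
existing = {("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")}
ranked = rank_missing_links(paths, existing)
```

In this toy input, the pair ("A", "C") is supported by two paths and is not an existing link, so it is the only candidate returned. Path count is only a stand-in here for the metrics mentioned above.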