Wikipedia:Bots/Requests for approval/StanfordLinkPredictor

StanfordLinkPredictor
Operator:

Time filed: 22:46, Wednesday December 3, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s):  Python

Source code available:  GitHub. The repository is currently empty; source code will be added after bot approval.

Function overview: Insert links between Wikipedia pages based on statistical inference on human navigational traces.

Links to relevant discussions (where appropriate): Research:Improving_link_coverage is an ongoing research effort that describes the underlying idea used to predict links.

Edit period(s):  One-time run to test the efficacy of a specific method.

Estimated number of pages affected: 600

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No):

Function details:

The job of this bot is to insert a link from a source page to a target page, given the mention in the source page that should link to the target page. The input is a tab-separated file. To remain independent of the article revision, the bot searches for the mention in the current text of the source article on a best-effort basis. If the mention exists, the link is added at its first occurrence; otherwise no edit is made. The bot does not support specifying the location of the mention (in terms of the number of words preceding it), because that location is subject to change due to edits.
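As the repository is still empty, the following is only a minimal sketch of how the first-mention linking described above might look, not the bot's actual code. The function names, the column order (source, target, mention), and the decision to skip mentions inside existing links or templates are all assumptions made for illustration.

```python
import csv
import io
import re

# Spans that should not be edited: existing links and (non-nested) templates.
# Treating these as off-limits is an assumption, not part of the request.
PROTECTED = re.compile(r"\[\[.*?\]\]|\{\{.*?\}\}", re.DOTALL)


def add_link_at_first_mention(wikitext, mention, target):
    """Return wikitext with the first unlinked mention linked, or None."""
    protected = [m.span() for m in PROTECTED.finditer(wikitext)]
    # Match the mention only as a whole word or phrase.
    pattern = re.compile(r"(?<!\w)" + re.escape(mention) + r"(?!\w)")
    for m in pattern.finditer(wikitext):
        if any(start < m.end() and m.start() < end for start, end in protected):
            continue  # mention overlaps an existing link or template
        if mention == target:
            link = "[[" + target + "]]"
        else:
            link = "[[" + target + "|" + mention + "]]"
        return wikitext[:m.start()] + link + wikitext[m.end():]
    return None  # best effort: mention not found, so no edit is made


def read_link_rows(tsv_text):
    """Parse rows of (source, target, mention); the column order is assumed."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [tuple(row) for row in reader if len(row) == 3]
```

A caller would fetch the current wikitext of each source page, pass it through `add_link_at_first_mention`, and save the page only when the result is not `None`.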

The link prediction algorithm was developed in a research project that is part of a collaboration between Stanford University and the Wikimedia Foundation. The project page can be found here. A paper describing the algorithm and results is under submission to the World Wide Web Conference; if you would like a confidential preprint, please get in touch with Bob West.

Link Prediction Method
We propose a novel approach to identifying missing links on Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia’s navigability. We leverage a data set of navigation paths collected through a Wikipedia-based human-computation game called The Wiki Game, in which users must find a short path from a start article to a target article by clicking only links encountered along the way. We use these human navigational traces to identify a set of candidates for missing links and then rank the candidates according to various metrics. We further validate our predictions by recruiting human raters from Amazon Mechanical Turk for an evaluation task that asks them to guess which links should exist in Wikipedia, based on the Linking Guidelines. Our evaluation (see above for how to obtain a preprint of the paper) shows that the links predicted by our method are of higher quality than those predicted by alternative methods.
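The candidate-and-rank idea can be illustrated with a rough sketch. This is not the method from the paper: the candidate definition (any earlier page on a path that does not already link to the path's final target) and the ranking metric (raw path count) are assumptions chosen for simplicity, and the function name is hypothetical.

```python
from collections import Counter


def rank_missing_link_candidates(paths, existing_links):
    """Count (source, target) pairs seen on paths that lack a direct link.

    paths: list of page-title lists, each ending at the navigation target.
    existing_links: set of (source, target) pairs that already exist.
    """
    counts = Counter()
    for path in paths:
        if len(path) < 2:
            continue
        target = path[-1]
        # Every distinct earlier page on the path is a potential source.
        for source in set(path[:-1]):
            if source != target and (source, target) not in existing_links:
                counts[(source, target)] += 1
    # Most frequently observed candidates come first.
    return counts.most_common()
```

Each returned pair would still need a mention in the source article before a link could be inserted.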

Discussion

 * The user account this request is for is also listed as the Operator, but the account name does not clearly indicate that the account is a bot and the account has very few edits. Please note that WP:Bot policy states that a bot account's username should make it immediately clear that the account is in fact a bot, which is normally done by having the account name end with the word "Bot". Also note that a bot may not operate itself, so the Operator field should identify the account of the human running the bot. AnomieBOT ⚡  23:01, 3 December 2014 (UTC)


 * Anyone passing by: see Bots/Requests for approval/StanfordLinkBot. This request was never submitted. —  Earwig   talk 03:33, 4 December 2015 (UTC)