User:Tony1/International survey of overlinking/Methodology

How the survey is conducted and how the data are expressed have not been finalised; both are subject to agreement on the main talk page, where everyone is invited to provide feedback.

Thus far, the plan is that survey editors do four things for their chosen WP edition: (i) identify any guidelines, rules or even essays on wikilinking in the edition and, if there are any, provide a link to them; (ii) review six "benchmark" articles in the edition; (iii) provide a summary statement about wikilinking in the edition; and (iv) respond to a set of standardised questions in the discussion sections.

Survey the benchmark articles
*We invite survey editors to determine, for a representative part of each of the six articles, the percentage of (i) common "dictionary" items (e.g., in most contexts, "tourism", "United States", "music", "sport") and (ii) chronological items (e.g., years, months, decades, and centuries) that are unnecessarily linked. The six articles are chosen on the basis of wide cross-cultural recognition; substantial article treatment over many WP editions; and dissimilarity of topic (sportsperson, nation, artist, etc.).
 * We ask that survey editors analyse (i) the lead of the article, and (ii) one other, substantial and representative section below the lead (i.e., a section that generally appears to be linked about as much as the article as a whole).
 * Since it is essential that a single set of criteria be used throughout the survey (without this, a comparison is invalid), we suggest that the English WP's guidelines be used to identify overlinked items. In no way should this be regarded as a prescription for any edition other than en.WP: naturally, each edition has been and will continue to be free to determine its own guidelines and practices for wikilinking. The aim of the survey is simply to produce good data, comparative and absolute.
 * The survey should include the main text, titles and subtitles, table items [?], and image and figure captions.
 * The survey should exclude direct quotations; links to daughter articles that are set off at the start of sections; text within navboxes and infoboxes; and the sections at the bottom, including References and See also (there are separate questions about these aspects in the discussion sections below).
 * Reviews should be based on the article version at 23:59, 30 April 2009 (UTC). Survey editors are also asked to make brief summary observations about the WP edition in question in the summary section.
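
The percentage calculation described above could be sketched as follows. This is only an illustration, not an agreed part of the methodology: it assumes that the percentage means "linked occurrences divided by all occurrences of that item type in the sampled text", that the lead and the one other section are pooled, and that the counts are made by hand. The names `SectionTally`, `percent` and `article_percentages` are hypothetical, and the numbers in the example are invented purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class SectionTally:
    """Hand-counted tallies for one surveyed section (hypothetical record)."""
    common_items: int    # occurrences of common "dictionary" items
    common_linked: int   # how many of those occurrences are wikilinked
    chrono_items: int    # occurrences of chronological items (years, months, ...)
    chrono_linked: int   # how many of those occurrences are wikilinked


def percent(linked, total):
    """Return linked/total as a percentage, or None if nothing was counted."""
    if total == 0:
        return None
    return round(100.0 * linked / total, 1)


def article_percentages(sections):
    """Pool the lead and the other sampled section, then compute both figures."""
    common_total = sum(s.common_items for s in sections)
    common_linked = sum(s.common_linked for s in sections)
    chrono_total = sum(s.chrono_items for s in sections)
    chrono_linked = sum(s.chrono_linked for s in sections)
    return {
        "common": percent(common_linked, common_total),
        "chronological": percent(chrono_linked, chrono_total),
    }


# Invented example counts for one benchmark article.
lead = SectionTally(common_items=20, common_linked=5, chrono_items=8, chrono_linked=6)
body = SectionTally(common_items=30, common_linked=5, chrono_items=12, chrono_linked=3)
result = article_percentages([lead, body])
print(result)
```

Pooling the two sections weights each by its length; averaging the two section percentages instead would be an equally defensible choice, so whichever is used should be stated with the results.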

Write your summary statement
Insert this statement under the table row allocated for your edition. It should be succinct: if possible, fewer than 150 words. More can be said underneath in the discussion section for your edition.

Respond to the standardised questions
We have posed five [?] key questions about overlinking in the discussion section for each edition. We ask that survey editors respond to these questions.

Ratings of the overall level of overlinking (OL)
Ratings are on a nine-point scale from 0 (the ultimate goal for all Wikipedias) to 8 (extreme overlinking), applied separately to dates and other words. [These need to be illustrated by examples.]


 * 0 (very low): very few instances of dates linked in body or reference sections; linking of relevant words on first occurrences only; no linking of commonly used terms
 * 1: most date links restricted to infoboxes; multiple repeated linking of a relevant term; no linking of commonly used terms
 * 2 (low): occasional/selective dates linked in infoboxes and body; multiple repeated linking of fewer than 5 relevant terms; no linking of commonly used terms
 * 3:
 * 4 (moderate): few links to mmdd or ddmm dates in body or reference sections, years mostly linked; multiple repeated linking of fewer than 10 relevant terms; some linking of commonly used terms
 * 5:
 * 6 (high): many instances of dates linked in infoboxes and body and reference sections; extensive multiple repeated linking of relevant terms; pervasive linking of commonly used terms
 * 7:
 * 8 (extreme): almost all instances of dates linked in body and reference sections; multiple-linking of terms; linking of commonly used terms which appears to be indiscriminate
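
For editors who keep their results in a script or spreadsheet export, one possible way to record the two ratings is sketched below. It assumes only what the scale above states: whole numbers from 0 to 8, applied separately to dates and to other words, with verbal labels at 0, 2, 4, 6 and 8. The function names `describe_rating` and `record_ratings` are hypothetical.

```python
# Verbal labels taken from the scale above; odd values have no label there.
LABELS = {0: "very low", 2: "low", 4: "moderate", 6: "high", 8: "extreme"}


def describe_rating(score):
    """Validate a rating on the 0-8 scale and return its label ('' if unlabelled)."""
    if not isinstance(score, int) or not 0 <= score <= 8:
        raise ValueError("rating must be a whole number from 0 to 8")
    return LABELS.get(score, "")


def record_ratings(dates, other_words):
    """Record the two separate ratings: one for dates, one for other words."""
    return {
        "dates": (dates, describe_rating(dates)),
        "other words": (other_words, describe_rating(other_words)),
    }


ratings = record_ratings(dates=6, other_words=3)
print(ratings)
```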