User:Ohconfucius/script/Common Terms

Use with AWB
A module adapted for use with AWB is available at User:Ohconfucius/AWB modules/Unlinking.

General principles
Main objectives are as follows:

Whilst it might be said that "blue words are as easy to read as black words", the underlying 'information' a blue word imparts on Wikipedia (cf. unlinked black words, which have no such connotation) makes blue words more weighty, with the concomitant risk that they will detract or distract if used inappropriately or excessively. As an editor, I see it as an important part of my job to ensure that the contents of all articles add value. Overlinking was a serious problem in the past, but has declined somewhat thanks to the growing awareness that less is often more.

The script has brought about considerable standardisation as to the words that ought not to be linked in the vast majority of cases. New terms which are added to the repertoire to unlink are nowadays infrequent. Words that I notice have been linked to gloss their meanings within certain contexts are linked to Wiktionary instead; also I aim to focus more on removing contextual chain-linking. I have started on the more blatant examples as follows:
 * 1) A particular bane of my life is chain-linking, which is a frequent sight. It is detrimental to parsing and to click-through, as well as to the aesthetics of an article. Links to articles on currently existing countries that form part of a chain of links are just as common. The script removes every country link that is preceded by another linked term. Specific piped links to US cities, districts of London, and Oxford and Cambridge colleges are simplified to remove the unnecessary parts of 'chain links' rather than being unlinked entirely (for example, '[[Wichita, Kansas|Wichita]], [[Kansas]]' becomes '[[Wichita, Kansas]]'). Chain links involving certain countries (namely France, England, Spain, Italy, Germany, Canada and China) and their subdivisions are removed: thus [[Munich]], [[Bavaria]], [[Germany]] is simplified to [[Munich]], Bavaria, Germany.
 * 2) I don't see why place names, where they are merely incidental or where they are well known, should be linked by default. If text in the article mentions that [person] went on a performing tour to '[country 1], [country 2], [country 3], [country 4]', or that a corporation has offices in '[country W], [country X], [country Y], [country Z]', I'd say the encyclopaedia is better off without those links, as they bring little of value to the subject of the article. If, however, the place is important to the person, then the link may be worth retaining: for example, where someone went on a spiritual pilgrimage to, say, Tibet and the bio mentions some lasting influence on the person's spirituality, the place would be important enough to warrant a link.
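
The country rule in point 1 can be pictured with a short regex. What follows is only a minimal sketch, not the script's actual code: the function name `unlinkChainedCountries` and the `COUNTRIES` list are hypothetical, and the sketch handles only unpiped country links that follow another link and a comma, not subdivisions or piped variants.

```javascript
// Hypothetical list, taken from the countries named in point 1.
const COUNTRIES = ["France", "England", "Spain", "Italy", "Germany", "Canada", "China"];

// Unlink a country only when it directly follows another link and a comma,
// e.g. "[[Paris]], [[France]]" becomes "[[Paris]], France".
function unlinkChainedCountries(wikitext) {
  const names = COUNTRIES.join("|");
  // Group 1: the end of the preceding link plus the comma and any spaces.
  // Group 2: the country name inside its own link.
  const pattern = new RegExp("(\\]\\],\\s*)\\[\\[(" + names + ")\\]\\]", "g");
  return wikitext.replace(pattern, "$1$2");
}

console.log(unlinkChainedCountries("born in [[Paris]], [[France]]"));
```

A country link that stands on its own, such as "born in [[Germany]]", is left untouched by this pattern, because nothing linked precedes it.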

Actions and test
Use of the Safari browser is highly recommended. I have found it consistently executes much faster in Safari than Firefox, but feel free to give me your feedback on that issue. It runs quickly as it is composed of fairly straightforward regexes.

Once you are in edit mode, there is ONE button from this script in the toolbox in the left margin:
 * 1) Delink COMMON terms

By consensus, articles should be suitably linked. In my experience, a large number of Wikipedia articles link to common terms, even when this does not enhance the readers' understanding of the subject in question. Common terms like 'English', 'President of the United States', 'United States dollar', 'singer', 'newspaper', 'sitcom', 'divorce' and 'heart attack' are routinely linked, usually just because they are low-hanging fruit. Although such links appear relevant at first glance, it is obvious that the editor meant only to impart a definition, which gives the reader no greater understanding of the subject in the context of the article.
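
As a rough illustration of what delinking common terms involves, the sketch below unlinks a handful of the terms mentioned above, keeping the visible label when a link is piped. It is an assumption-laden example rather than the script's own regexes: `COMMON_TERMS`, `escapeRegExp` and `delinkCommonTerms` are hypothetical names, and the real list of terms is far longer.

```javascript
// Hypothetical sample of terms, taken from the examples in the text.
const COMMON_TERMS = ["singer", "newspaper", "sitcom", "divorce", "heart attack"];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace [[term]] with term, and [[term|label]] with label.
function delinkCommonTerms(wikitext) {
  const targets = COMMON_TERMS.map(escapeRegExp).join("|");
  // Group 1: the link target. Group 2 (optional): the piped label.
  const pattern = new RegExp("\\[\\[(" + targets + ")(?:\\|([^\\]|]*))?\\]\\]", "gi");
  return wikitext.replace(pattern, (match, target, label) => label || target);
}

console.log(delinkCommonTerms("a [[singer]] who died of a [[heart attack|cardiac arrest]]"));
```

Because the pattern requires the target to match a listed term exactly, a link such as [[singer-songwriter]] is not affected.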

Users are reminded of the following provisions of Manual of Style/Linking:

Overlinking and underlinking
Provide links that aid navigation and understanding, but avoid adding obvious or redundant links. An article is said to be underlinked if words are not linked that aid understanding of the article. However, overlinking should be avoided, as it can make it more difficult for the reader to identify and follow those links which are likely to be of value.


 * Do not link to a page that redirects back to the page the link is on.
 * Do not be afraid to create links to potential articles that do not yet exist (see Red links below).
 * Think before removing a link—it may be useful to other readers.
 * If you feel that a certain link does not belong in the body of the text, consider moving it to a "See also" section at the bottom of the article. (Remember that links can also be useful when applying the "What links here" feature from the target page.)

Some editors feel that the lead section is a special case. It may be desirable to have a smaller proportion of links in the lead section than in the main text; while some links make it easier to scan a lead by highlighting key terms, too many make it harder. On the other hand, in technical articles that use many uncommon terms in the introduction, a higher-than-usual link density in the lead section may be necessary to facilitate understanding. In such cases, try to provide an informal explanation in the lead, avoiding using too many technical terms until later in the article—see WP:Make technical articles accessible and point 5 of WP:NOT.

Known limitations
The script
 * has been criticised as a rather 'blunt tool' in dealing with overlinking. It is binary in that links conforming to a certain pattern are removed wholesale and without exception. Users may need to reinsert the first occurrence of a link if they judge it necessary. Consider reinstating the infobox links and those in the lead section of some articles where appropriate, adding a colon after the opening brackets (like [[:Hong Kong]]).
 * unlinks some countries (the permanent members of the UN Security Council), cities (major capitals) and states (Florida, California, New York) that the author deems so well known that they need no link.
 * may be non-exhaustive in unlinking piped variants of terms, as some very creative pipings are created. Beware of Easter eggs: there are limitless permutations editors use to pipe links, and not all are removed.
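
The colon workaround mentioned in the first limitation can be sketched as follows. This assumes that the script's patterns only match links whose target starts immediately after the opening brackets, so that a link written with a leading colon is skipped; `unlinkTerm` is a hypothetical stand-in for the script and `protectFirstLink` is a hypothetical helper, neither being the script's own code.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Stand-in unlinker for one term: turns every plain [[term]] into term.
function unlinkTerm(wikitext, term) {
  const pattern = new RegExp("\\[\\[" + escapeRegExp(term) + "\\]\\]", "g");
  return wikitext.replace(pattern, term);
}

// Add a colon to the first plain link to `term` so the unlinker skips it.
// A string pattern in replace() only touches the first occurrence.
function protectFirstLink(wikitext, term) {
  return wikitext.replace("[[" + term + "]]", "[[:" + term + "]]");
}

const text = "[[Hong Kong]] is a city. [[Hong Kong]] has a harbour.";
console.log(unlinkTerm(protectFirstLink(text, "Hong Kong"), "Hong Kong"));
```

In this sketch the first occurrence stays linked and the repeat is unlinked, which matches the advice to reinstate only the first link where it is judged necessary.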

Disclaimer
Users are expected to exercise careful judgement in the context of each article in which they run this script. Use at your own risk and make sure you check the edit changes before you save. It's not my fault if someone misuses this script.

Test page
A test page is available at User:Ohconfucius/Common Terms test page. The list is provided for indicative purposes only, as terms may be added to it or removed from it.