User:Amire80/Thoughts about disambiguation

On Fri, Aug 15, 2008 at 6:18 PM, Lars Aronsson wrote:
> Is any similar innovation going on in disambiguation pages and
> list articles?

For a few months now i have been thinking about writing an email with my thoughts about disambiguation, so i guess that now is the time.

I've been doing a lot of work improving interwiki links, mostly manually. There's a grave problem with adding interwiki links to disambig pages - very often a word that is a homonym in one language is not a homonym in another, or has a completely different set of meanings. Examples of different kinds are abundant:

Simple cases would be:
 * Grossmann vs. Grossman - In English these are separate disambig pages, but in Hebrew they would be one.
 * Kirov - In English this would be the Soviet politician and a bunch of things named after him, but in Hebrew (קירוב) this would also be "approximation".
 * Due to the peculiarities of Hebrew spelling, דאון (pron. "daon") can be interwiki-linked to the various meanings of "down" and "daun" and also to Glider and Flying fish.

Harder examples:
 * In general, any such link between languages that are written in different scripts is a wild approximation. One extreme example, which is beyond my comprehension, is the Japanese バーンスタイン and ベルンシュタイン, which seem to be different spellings of "Bernstein", but i can only guess.
 * All the variations of John, Johan, Juan, Ivan etc. - how to interwiki-link them? Different spellings are one problem, and cultural implications make it harder (think about saints, fairy tale heroes, Catalan Joan vs. Spanish Juan, etc.)
 * In Russian, Коса (pron. "kosa") is a disambiguation between Queue (hairstyle), Scythe, Spit (landform), Xhosa, Braid theory and a few other things. Should it be linked in any way to the English "kosa" or "cosa"? Probably not, as it would be completely arbitrary. Should it be linked to the Ukrainian Коса? The spelling of Ukrainian is reasonably close to that of Russian and so are its word meanings and disambiguations ... but where does it stop?

The interim solution that was more or less agreed upon in the Hebrew Wikipedia is to mark disambig pages which are too specific to the Hebrew language with an invisible template that tells the bots not to add interwiki links to them. The technical details of the implementation of this solution are still in flux, but you can see a preliminary list of such pages here: ויקיפדיה:דפים ללא בינוויקי/פירושונים/פירושונים רק בעברית
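As a rough illustration of how a bot could honor such a marker, here is a minimal Python sketch. The template name `no interwiki disambig` and the function `bot_may_add_interwiki` are hypothetical; the actual template and bot logic used in the Hebrew Wikipedia are not described here and may well work differently.

```python
import re

# Hypothetical template name, for illustration only; the real marker is not given here.
NO_INTERWIKI_MARKER = "no interwiki disambig"


def bot_may_add_interwiki(wikitext: str, marker: str = NO_INTERWIKI_MARKER) -> bool:
    """Return False if the page carries the opt-out template, True otherwise."""
    # Match {{marker}} or {{marker|...}}, ignoring case and surrounding whitespace.
    pattern = r"\{\{\s*" + re.escape(marker) + r"\s*(\|[^{}]*)?\}\}"
    return re.search(pattern, wikitext, flags=re.IGNORECASE) is None
```

A bot would call this on the page text before touching its interwiki links and simply skip the page when the result is False.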

Personally i would go further. Since most often disambiguation has little encyclopedic meaning and is essentially a feature of each language, i would put all disambig pages into a new separate namespace and prevent the adding of interwiki links to it.

The only disadvantage that i can think of is that there are a lot of links from the article space to disambig pages. This can be solved by making the "Disambiguation:" space the second option for searching; in pseudo-code it would be something like:

    if (exists(article) or exists("Disambiguation:" + article)) {
        output(blue_link(article));
    } else {
        output(red_link(article));
    }
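The same lookup can be written as a small runnable Python sketch. This is not how MediaWiki resolves links; it is only an illustration, assuming the set of existing page titles is available as a plain set of strings. The names `classify_link`, `LinkKind` and `DISAMBIG_PREFIX` are made up for the example. Unlike the pseudo-code, it returns a separate value when only a disambiguation page exists, so that such links could be styled differently.

```python
from enum import Enum

# Namespace prefix taken from the proposal; it does not exist in MediaWiki today.
DISAMBIG_PREFIX = "Disambiguation:"


class LinkKind(Enum):
    BLUE = "blue"          # an article exists under this title
    DISAMBIG = "disambig"  # only a disambiguation page exists
    RED = "red"            # no page exists at all


def classify_link(title: str, existing_pages: set[str]) -> LinkKind:
    """Check the article space first, then fall back to the disambiguation namespace."""
    if title in existing_pages:
        return LinkKind.BLUE
    if DISAMBIG_PREFIX + title in existing_pages:
        return LinkKind.DISAMBIG
    return LinkKind.RED
```

For example, with `pages = {"Cancer", "Disambiguation:Kirov"}`, a link to "Cancer" is classified as a normal blue link, a link to "Kirov" falls back to the disambiguation page, and a link to "Grossmann" is red.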

There are several other advantages:


 * It will make the work of the scripts that prepare the lists for WP:DPL much easier. (There are similar projects in several other Wikipedias.)
 * A link to a disambig page can be made in a different color, and thus help the editors to fix it.
 * It will clearly separate purely technical, homonymic disambiguations from those that have some encyclopedic meaning. The latter can go to the article space. (Cancer is a possible example.)

Any other thoughts are welcome.