Wikipedia:WikiProject Interlanguage Links/Ideas from the Hebrew Wikipedia

This page explains how WikiProject Interlanguage Links works in the Hebrew Wikipedia. Your comments are welcome.

The original project page: he:ויקיפדיה:דפים ללא בינוויקי

Introduction
This project aims to add relevant interwiki links to all possible pages and categories in the Hebrew Wikipedia. We have gained considerable experience with this in recent months and would like to share it. If you have done anything similar in the Wikipedia in your language, we would like to learn from your experience. We hope that this will improve cooperation between Wikimedia projects in different languages.

In line with the Freedom spirit of Wikipedia, all related tools are distributed as Free Software.

N.B.: This is not about interwiki bots. It is about a careful manual process aided by a semi-automatic search tool.

Translations of this page into other languages would be greatly appreciated!

The general principle
As with most other things in Wikipedia - Be bold.

Why is this needed
The special page Special:WithoutInterwiki is very limited: because of load issues it shows only the first several thousand pages, and it lists them alphabetically rather than by topic. In addition, the script that searches for pages without interwiki links can also find problems in existing links, such as redundant links and encoding problems.

The types
The pages without interwiki links are sorted according to their "type". Types are similar to categories but are intentionally kept separate from them, because we feel that a manual process makes us treat the pages with more care. A type does not necessarily categorize a page encyclopedically; it adds the page to a list whose readers are more likely to find a corresponding article in a foreign Wikipedia.

For example, if an interwiki link is missing from an article about a plant, an average user will find it hard to locate the corresponding article in a foreign Wikipedia, because this requires knowledge of biology. A user who does know biology can go over the list and find corresponding articles or, if they don't exist, write them.

Another nice side effect of those lists is that they effectively act as WikiProjects for creating articles in foreign Wikipedias about local topics. In the Hebrew Wikipedia these are articles about Israel and Jewish history and culture, but you can adapt the idea to your own language and thus improve the coverage of your culture in foreign Wikipedias; such a process is called "countering systemic bias" in the English Wikipedia.

The types are added using a very simple invisible template, called he:תבנית:אין בינוויקי here (the name means "no interwiki" and can be changed very easily). It accepts two parameters: the date, used for calculating the cooling period (see below), and the names of the types. Unlike categories, several type names may be listed at once in a single template, which is another advantage of the type system.
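To make the two parameters concrete, here is a minimal Python sketch of how a tool might read them from page text. It is an illustration only, not the code of the actual Perl script. The English template name `no interwiki`, the parameter names `date` and `type`, and the comma-separated type list are all assumptions made for this example; the real template has a Hebrew name and may use a different syntax.

```python
import re

# Hypothetical English rendering of the template. The template name and the
# parameter names "date" and "type" are assumptions made for this sketch.
TEMPLATE_RE = re.compile(r"\{\{\s*no interwiki\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_no_interwiki(wikitext):
    """Return (date_string, [type names]) for the first template, or None."""
    match = TEMPLATE_RE.search(wikitext)
    if match is None:
        return None
    params = {}
    for part in match.group(1).split("|"):
        name, sep, value = part.partition("=")
        if sep:  # keep only named parameters
            params[name.strip().lower()] = value.strip()
    # Assumed convention: several type names separated by commas.
    types = [t.strip() for t in params.get("type", "").split(",") if t.strip()]
    return params.get("date", ""), types


example = "{{no interwiki|date=2008-03-01|type=plants, Israel}}\nSome article text."
print(parse_no_interwiki(example))
```

Returning `None` for pages without the template lets a caller tell untagged pages apart from tagged pages with an empty type list.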

Pages that definitely don't need an interwiki link
It is possible that some pages will never need an interwiki link. We intentionally don't say this about pages that appear to be important only to the local culture, although we may define degrees of importance for this in the future.

One area we identified in which interwiki links are most likely not needed is disambiguation pages for terms that are homographs in our language and are irrelevant to other languages. We simply mark such pages with yet another type. In English it would look like this:

This keeps all those pages in a separate list, which can be checked from time to time.

Suggestions about this are welcome.

The technical process

 * A dump from http://download.wikimedia.org is scanned using a Perl script (it is Free Software and the source is available upon request).
 * All pages which don't have interwiki links and don't have the "no interwiki" template are added to the list of type "other".
 * Pages which have the "no interwiki" template are added to lists of their respective types.
 * Editors go over the created lists.
 * If a corresponding page is found in a foreign Wikipedia, interwiki links are added to the Hebrew page, and a link to the Hebrew Wikipedia is added to the foreign page. (Usually, a link to the Hebrew page added in one foreign Wikipedia is propagated by bots to the relevant pages in other Wikipedias within days.)
 * If an editor cannot find a corresponding page in any foreign Wikipedia and it doesn't have a "no interwiki" template, then the editor adds the template with relevant types and today's date, like this:

In the example, the names of the template and its parameters are translated into English and can be changed. Five tildes (~~~~~) are expanded to today's date.
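The sorting steps of this process (pages without links and without the template go to "other", tagged pages go to the lists of their types) can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual Perl script: the input is assumed to be already extracted from the dump as `(title, wikitext, types)` tuples, and `LANGUAGE_CODES` is a small illustrative subset of language prefixes rather than the full list.

```python
import re
from collections import defaultdict

# Illustrative subset of language prefixes; a real run would need the full list.
LANGUAGE_CODES = ("en", "de", "fr", "ru", "eo", "ar")
# Matches interlanguage links written in the page text, such as [[en:Title]].
INTERWIKI_RE = re.compile(r"\[\[(%s):[^\]|]+\]\]" % "|".join(LANGUAGE_CODES))


def build_type_lists(pages):
    """Map each type name to the titles of pages that lack interwiki links.

    `pages` is an iterable of (title, wikitext, types) tuples, where `types`
    is the list of type names from the "no interwiki" template, or None if
    the page does not carry the template (an assumed input format).
    """
    lists = defaultdict(list)
    for title, wikitext, types in pages:
        if INTERWIKI_RE.search(wikitext):
            continue  # the page already has an interwiki link
        for type_name in (types or ["other"]):
            lists[type_name].append(title)
    return dict(lists)


sample_pages = [
    ("Page A", "Text. [[en:Page A]]", None),
    ("Page B", "Text without links.", None),
    ("Page C", "Text without links.", ["plants", "Israel"]),
]
type_lists = build_type_lists(sample_pages)
print(type_lists)
```

A page tagged with several types appears in every one of those lists, which matches the idea that types exist to help editors find pages rather than to classify them strictly.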

"Cooling date"
Since there are quite a lot of pages without interwiki links, a cooling date is recorded in the template on the page. A page is not listed in the project during the cooling period, which is currently defined as 120 days. The implementation details of this feature are incomplete and suggestions are welcome.
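Since the details are still open, the following Python sketch shows only one possible reading of the rule: a page is hidden while fewer than 120 days have passed since the date in its template. The function name and the choice to list the page again on day 120 exactly are assumptions of this example.

```python
from datetime import date, timedelta

COOLING_PERIOD = timedelta(days=120)  # the value currently defined on this page


def is_cooling(template_date, today):
    """Return True while the page should stay off the project lists."""
    # Assumption: the page reappears once a full 120 days have passed.
    return today - template_date < COOLING_PERIOD


print(is_cooling(date(2008, 1, 1), date(2008, 3, 1)))  # 60 days after tagging
print(is_cooling(date(2008, 1, 1), date(2008, 6, 1)))  # 152 days after tagging
```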

Interwiki conflicts and problems
The Perl script also finds several types of problems in interwiki links:
 * 1) Multiple links to one foreign article from several local articles. (see an example in eo.wiki: eo:Vikipedio:Interlingvaj ligiloj/Ligiloj de pluraj esperantaj artikoloj al unu alilingva artikolo)
 * 2) Multiple links to several foreign articles from one local article. (see an example in eo.wiki: eo:Vikipedio:Interlingvaj ligiloj/Ligiloj de unu esperanta artikolo al pluraj artikoloj en unu lingvo)
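Both kinds of conflict can be found by grouping the links in two directions. The Python sketch below illustrates the idea only and is not taken from the Perl script; it assumes the links have already been extracted as `(local_title, language, foreign_title)` tuples, which is a hypothetical input format.

```python
from collections import defaultdict


def find_conflicts(links):
    """Find both conflict kinds in (local_title, language, foreign_title) tuples."""
    sources = defaultdict(set)  # (language, foreign title) -> local titles
    targets = defaultdict(set)  # (local title, language) -> foreign titles
    for local, lang, foreign in links:
        sources[(lang, foreign)].add(local)
        targets[(local, lang)].add(foreign)
    # Kind 1: several local articles link to one foreign article.
    many_to_one = {k: sorted(v) for k, v in sources.items() if len(v) > 1}
    # Kind 2: one local article links to several articles in one language.
    one_to_many = {k: sorted(v) for k, v in targets.items() if len(v) > 1}
    return many_to_one, one_to_many


sample_links = [
    ("Local 1", "en", "Foreign X"),
    ("Local 2", "en", "Foreign X"),
    ("Local 3", "de", "Foreign Y"),
    ("Local 3", "de", "Foreign Z"),
    ("Local 4", "fr", "Foreign W"),
]
many_to_one, one_to_many = find_conflicts(sample_links)
print(many_to_one)
print(one_to_many)
```

Not every hit is necessarily an error; the output is only a list of candidates for an editor to review by hand.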

Some other comments

 * As noted above, types don't have to be sorted as meticulously as categories. Usually, whenever someone thinks that a "no interwiki" type is relevant and useful for a page, it is added. Categories are visible to users and make encyclopedic statements about the subject. Types are hidden, and they are there to help editors and maintainers find pages without interwiki links.
 * Sometimes it makes sense to add a page explicitly to the "other" type. This may happen with esoteric pages that don't fit any other type.
 * The English Wikipedia, even though it is the oldest and the biggest by article count, is not necessarily the most accurate. More than once, systematic interwiki work led us to improve pages in the English Wikipedia. Sometimes we even got bold and made fixes in Wikipedias in languages that we hardly know; we heard few complaints, so we suppose that we did improve them. Of course, it also works the other way around: in many cases we improved the Hebrew Wikipedia to match the foreign one when we felt that the foreign one was more correct.
 * If a page doesn't have an interwiki link, this may sometimes mean that the page shouldn't exist at all and should be deleted or merged.

Plans for the future
Suggestions are welcome!


 * To add an "importance" parameter. This has not been done yet, because we haven't worked out the importance criteria. In a perfect world all Wikipedias would have the same pages and notability would not be local, but practice shows that listing pages by importance may help editors find pages to translate into other languages.
 * To add statistics (thanks to LA2 for some of the ideas). At every run of the search tool these statistics will be collected:
   * How many pages have no interwiki links and no template (plus the percentage of the total article count).
   * How many pages have no interwiki links but do have the template (plus the percentage of the total article count).
   * How many pages have had interwiki links added to them since the last run.
   * How many new pages without interwiki links have been created.
   * (Feel free to add more ideas for statistics!)
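As a rough illustration of how these numbers could be derived, here is a Python sketch that compares the current run with the previous one. It is not part of the existing tool, and the function and key names are made up for the example. It also assumes that a page which disappears from the lists between runs received interwiki links, which ignores deleted or merged pages.

```python
def run_statistics(total_articles, untagged, tagged, previous_missing):
    """Summarize one run; the last three arguments are sets of page titles.

    untagged: pages without interwiki links and without the template
    tagged: pages without interwiki links but with the template
    previous_missing: all pages without interwiki links in the previous run
    """
    current_missing = untagged | tagged

    def percent(count):
        return round(100.0 * count / total_articles, 2) if total_articles else 0.0

    return {
        "untagged": len(untagged),
        "untagged_percent": percent(len(untagged)),
        "tagged": len(tagged),
        "tagged_percent": percent(len(tagged)),
        # Assumption: a page that left the lists got interwiki links.
        "linked_since_last_run": len(previous_missing - current_missing),
        # Approximation: pages missing links now that were not listed before.
        "new_without_links": len(current_missing - previous_missing),
    }


stats = run_statistics(
    total_articles=200,
    untagged={"A", "B", "C"},
    tagged={"D"},
    previous_missing={"A", "D", "E", "F"},
)
print(stats)
```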

The code
The Perl code for the program that searches for pages without interlanguage links is available at SourceForge:


 * http://sourceforge.net/projects/perlwikibot/

It resides in SVN, under trunk/no-interwiki. There is no convenient installation script, makefile or test suite yet; these are all still in the planning stage. If you want to try it out, check out the SVN tree:

Then read the perldoc:

Feel free to contact the developer, Amir E. Aharoni, for any assistance.