Eurotra

Eurotra was a machine translation project established and funded by the European Commission from 1978 until 1992.

History
In 1976, the European Commission started using the commercially developed machine translation system SYSTRAN, with the aim of extending it to languages beyond those it was originally developed for (Russian-English and English-French); this, however, proved difficult. This difficulty, together with the potential of existing systems at European research centres, led in 1978 to the decision to launch the Eurotra project, initially through a preparatory Eurotra Coordination Group. Four years later, the European Commission and the coordination group obtained the approval of the European Parliament.

The goal of the project was to create a machine translation system for the official languages of the European Community, which at the time were Danish, Dutch, English, French, German and Italian; Greek, Spanish and Portuguese were added later.

However, as time passed, expectations were tempered: it became clear that "Fully Automatic High Quality Translation" was not a realistically attainable goal. Eurotra was eventually acknowledged to be pre-competitive research rather than prototype development.

The project was motivated by one of the founding principles of the European Community: that all citizens had the right to read any and all proceedings of the Commission in their own language. As more countries joined, this produced a combinatorial explosion in the number of language pairs involved, since each language had to be translatable into every other. The need to translate every paper, speech and even set of meeting minutes produced by the Community into the other eight languages meant that translation rapidly became the largest single component of the administrative budget. Eurotra was devised to address this problem.
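The growth in language pairs can be made concrete with a short calculation. If every language must be translatable into every other, n languages give n(n - 1) ordered pairs: 30 for the original six languages and 72 for nine, one pair for each of "the other eight languages" per source language. For comparison, a design in which each language had one analysis and one generation module meeting at a shared intermediate representation would need only 2n modules. The function names below are hypothetical, and the second function is an idealised comparison rather than a description of how Eurotra was actually built.

```python
def directed_pairs(n_languages: int) -> int:
    """Ordered translation pairs when every language is translated into every other."""
    # Each of the n source languages needs a route to the n - 1 remaining languages.
    return n_languages * (n_languages - 1)


def pivot_modules(n_languages: int) -> int:
    """Hypothetical comparison: one analysis plus one generation module per language,
    all meeting at a shared intermediate representation."""
    return 2 * n_languages
```

For example, `directed_pairs(9)` returns 72, whereas `pivot_modules(9)` returns 18; the gap widens quadratically as languages are added.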

The project was unusual in its organisation: rather than a single research team, it consisted of member groups distributed across the member countries and organised along linguistic rather than national lines (for example, the Dutch-language groups in Leuven and Utrecht worked closely together). The secretariat was based at the European Commission in Luxembourg.

The design of the system was also unusual by machine translation standards. Older systems, such as SYSTRAN, were heavily dictionary-based, with only limited support for rearranging word order. More recent systems have often taken a probabilistic approach based on parallel corpora. Eurotra instead analysed the structure of the text to be translated in three stages: a syntactic parse producing a constituent structure, a second parse producing a dependency structure, and a final parse with a third grammar producing what was referred to internally as the Intermediate Representation (IR). Because all three modules were implemented as Prolog programs, which can in principle be run in either direction, it would then have been possible to pass this structure backwards through the corresponding modules for another language and so produce a translation into any of the other languages. In practice, however, language pairs were not implemented this way.
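The idea of analysing a sentence up to a shared representation and then running another language's modules in reverse can be illustrated with a deliberately tiny sketch. Everything in it is hypothetical: the single three-word sentence pattern, the concept labels in the lexicons, the Danish rendering and all function names are assumptions for illustration. Eurotra's real levels and formalism were far richer, and here each reverse step is written as an explicit inverse function in Python rather than obtained by running Prolog clauses backwards.

```python
# Hypothetical toy lexicons mapping surface words to language-neutral concepts.
LEXICON = {
    "en": {"Japan": "JAPAN", "makes": "MAKE", "computers": "COMPUTER_PL"},
    "da": {"Japan": "JAPAN", "laver": "MAKE", "computere": "COMPUTER_PL"},
}


def parse_svo(words):
    """Level 1 (toy): words -> constituent tree for the single pattern S -> NP V NP."""
    subj, verb, obj = words
    return ("S", ("NP", subj), ("VP", ("V", verb), ("NP", obj)))


def unparse_svo(tree):
    """Inverse of level 1: constituent tree -> word list."""
    _, (_, subj), (_, (_, verb), (_, obj)) = tree
    return [subj, verb, obj]


def to_dependency(tree):
    """Level 2 (toy): constituent tree -> head/dependent structure."""
    subj, verb, obj = unparse_svo(tree)
    return {"head": verb, "subj": subj, "obj": obj}


def from_dependency(dep):
    """Inverse of level 2: dependency structure -> constituent tree."""
    return parse_svo([dep["subj"], dep["head"], dep["obj"]])


def to_ir(dep, lang):
    """Level 3 (toy): replace language-specific words with shared concepts."""
    return {role: LEXICON[lang][word] for role, word in dep.items()}


def from_ir(ir, lang):
    """Inverse of level 3: shared concepts -> words of the target language."""
    reverse = {concept: word for word, concept in LEXICON[lang].items()}
    return {role: reverse[concept] for role, concept in ir.items()}


def translate(sentence, source, target):
    """Analyse up to the IR with source modules, then run target modules in reverse."""
    ir = to_ir(to_dependency(parse_svo(sentence.split())), source)
    return " ".join(unparse_svo(from_dependency(from_ir(ir, target))))
```

With these toy lexicons, `translate("Japan makes computers", "en", "da")` yields `"Japan laver computere"`, and the same functions translate in the opposite direction, because only the lexicon lookup differs between languages.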

The first "live" translation, from English into Danish, occupied a MicroVAX with 4 MB of memory running Ultrix and C-Prolog for an entire weekend some time in early 1987. The sentence translated was "Japan makes computers". The main problem faced by the system was the generation of so-called "parse forests": often many different grammar rules could be applied to a given phrase, producing hundreds or even thousands of (often identical) parse trees. This consumed huge amounts of computer memory and slowed the whole process down unnecessarily.
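How redundant rules multiply parse trees can be shown with a small sketch. Eurotra's grammars and parser are not reproduced here: the grammar, the lexicon and the function names are hypothetical, and the remedy shown, memoising results by span and category in a packed chart as in CKY-style parsing, is a standard technique included for illustration rather than a description of what Eurotra itself did. A single noun-compounding rule is listed twice to mimic two different grammar rules that build the same structure.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy grammar (not Eurotra's): binary rules written as (parent, left, right).
# The rule appears twice to mimic two different rules that build identical structure.
BINARY_RULES = [
    ("NP", "NP", "NP"),
    ("NP", "NP", "NP"),
]
# Hypothetical lexicon: every word is a one-word noun phrase.
LEXICON = {"computer": "NP", "network": "NP", "system": "NP", "design": "NP"}


def enumerate_parses(words, label):
    """Naive parser: one tree per way of applying the rules, duplicates included."""
    if len(words) == 1:
        return [(label, words[0])] if LEXICON.get(words[0]) == label else []
    trees = []
    for split in range(1, len(words)):
        for parent, left, right in BINARY_RULES:
            if parent != label:
                continue
            for left_tree, right_tree in product(
                enumerate_parses(words[:split], left),
                enumerate_parses(words[split:], right),
            ):
                trees.append((parent, left_tree, right_tree))
    return trees


def count_packed(words, label="NP"):
    """Counts distinct trees via a chart keyed by (start, end, category), without building them."""
    rules = set(BINARY_RULES)  # identical rules collapse into one entry

    @lru_cache(maxsize=None)
    def count(start, end, cat):
        if end - start == 1:
            return 1 if LEXICON.get(words[start]) == cat else 0
        return sum(
            count(start, mid, left) * count(mid, end, right)
            for mid in range(start + 1, end)
            for parent, left, right in rules
            if parent == cat
        )

    return count(0, len(words), label)
```

For the four words `["computer", "network", "system", "design"]`, the naive parser returns 40 trees, of which only 5 are distinct, while the packed count is 5 and each span is analysed only once; the gap between the two grows rapidly with sentence length.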

While Eurotra never delivered a "working" MT system, the project had a far-reaching, long-term impact on the nascent language industries of the European member states, particularly in the southern countries of Greece, Italy, Spain and Portugal. At least one commercial MT system, developed by an academic and commercial consortium in Denmark, is derived from Eurotra technology.