
Named entities in Machine Translation
Named entities (NEs) are a problem for state-of-the-art commercial machine translation (MT) systems and lead to translation failures. Translating proper names therefore requires different approaches and methods than translating other types of words. An NE mistakenly translated as a common noun requires extensive post-editing, because the failure often affects not only the local context, but also the global syntactic and lexical structure. On the one hand, developers of commercial MT systems pay insufficient attention to the correct automatic identification of certain types of NEs; on the other hand, the problem of correctly identifying NEs is specifically addressed and benchmarked by developers of information extraction (IE) systems, such as GATE (General Architecture for Text Engineering). The following sections demonstrate how this problem can be addressed, using an experiment conducted by Babych & Hartley as an example.

The idea of the experiment

 * high-quality automatic NE recognition, as produced by GATE, could be used to create do-not-translate (DNT) lists of organisation names, a specific type of NE which in human translation practice is often left untranslated
 * the baseline translations (produced without NE DNT-processing) should be compared with translations produced with DNT lists (created with the GATE-1 NE recognition system)

Problems of NEs for MT
Taking Russian as an example: creating DNT lists manually requires much effort from the user of an MT system. However, the high accuracy in NE tagging of current IE systems, including GATE, means that DNT lists for MT can be created automatically. The following results are based entirely on automatically created DNT lists.
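As a minimal sketch of what automatic DNT-list creation involves, organisation names can be collected from NE-annotated text. The MUC-style `<ENAMEX TYPE="ORGANIZATION">` markup below is the annotation convention from the MUC evaluations; the exact GATE output format may differ.

```python
import re

def extract_dnt_list(annotated_text):
    """Collect unique organisation names from MUC-style <ENAMEX> tags."""
    pattern = r'<ENAMEX TYPE="ORGANIZATION">(.*?)</ENAMEX>'
    names = re.findall(pattern, annotated_text)
    # Deduplicate while preserving first-seen order
    return list(dict.fromkeys(names))

sample = ('<ENAMEX TYPE="ORGANIZATION">Pan Am</ENAMEX> reached an agreement '
          'with <ENAMEX TYPE="ORGANIZATION">Pan Am</ENAMEX> unions.')
print(extract_dnt_list(sample))  # ['Pan Am']
```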
 * foreign person names in Russian should be transcribed and written in Cyrillic
 * names that coincide with common nouns should not be looked up in the general dictionary
 * the names of organisations are often not translated and preserve Roman orthography within Russian Cyrillic text
 * e.g. in Russian BBC news, ‘Nestle’, ‘Burger King’ and others are neither translated nor transliterated

Description of the experiment
In order to measure the effect of NE recognition on MT quality, Babych and Hartley took 30 texts (news articles) from the DARPA (Defense Advanced Research Projects Agency) MUC-6 evaluation set. These texts were selected because they are relatively rich in NEs, and because clean NE annotation is available for them. They used the following linguistic resources of the Sheffield NLP group:
 * DARPA ‘keys’ — texts manually annotated with NEs;
 * GATE ‘responses’ — the output of the automatic NE annotation of the GATE-1 system, which participated in MUC-6.

Some statistical parameters of this corpus: NEs (organisation names) occur 544 times in the manually annotated DARPA ‘keys’ and 510 times in the GATE ‘responses’. These figures are very close.

The average occurrence per document:
 * DARPA ‘keys’ = 18.1
 * GATE ‘responses’ = 17.0

The average occurrence per paragraph:
 * DARPA ‘keys’ = 1.9
 * GATE ‘responses’ = 1.8

The average occurrence per sentence:
 * DARPA ‘keys’ = 1.0
 * GATE ‘responses’ = 0.9
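The per-document averages follow directly from the total counts over the 30-text corpus:

```python
# Organisation-name counts reported for the 30-text MUC-6 corpus
docs = 30
keys_total = 544      # manually annotated DARPA 'keys'
response_total = 510  # automatic GATE 'responses'

print(round(keys_total / docs, 1))      # 18.1
print(round(response_total / docs, 1))  # 17.0
```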

After DNT lists of organisation names had been automatically generated from the GATE ‘responses’ annotation, the texts were translated using three commercial MT systems:
 * English-Russian ‘ProMT 98’ v4.0, released in 1998 (Softissimo)
 * English-French ‘ProMT’ (Reverso) v5.01, released in 2001 (Softissimo)
 * English-French ‘Systran Professional Premium’ v3.0b, released in 2000 (Systran)

Two translations were generated by each MT system:
 * 1) a baseline translation without a DNT list
 * 2) a DNT-processed translation with the automatically created DNT list of organisation names
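One common way to apply a DNT list around an MT system is to mask the listed names with placeholders before translation and restore them afterwards. This is a sketch of the general technique, not the actual mechanism of ProMT or Systran, which accept DNT lists directly:

```python
def protect(text, dnt_list):
    """Replace each DNT entry with a placeholder the MT system will leave alone."""
    mapping = {}
    for i, name in enumerate(dnt_list):
        placeholder = f"__DNT{i}__"
        if name in text:
            text = text.replace(name, placeholder)
            mapping[placeholder] = name
    return text, mapping

def restore(translated, mapping):
    """Put the original untranslated names back into the MT output."""
    for placeholder, name in mapping.items():
        translated = translated.replace(placeholder, name)
    return translated

masked, mapping = protect("four of Pan Am's five unions", ["Pan Am"])
print(masked)                    # four of __DNT0__'s five unions
print(restore(masked, mapping))  # four of Pan Am's five unions
```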

The baseline translations were then compared with the DNT-processed translations with respect to the morphosyntactic well-formedness of the context surrounding the NEs. Paragraphs with contextual differences were automatically selected, and differing strings in these paragraphs were highlighted.
 * ‘ORI’ indicates the original English string in the DARPA corpus;
 * ‘TWS’ (baseline translation) indicates a String Translated Without the do-not-translate list;
 * ‘TDS’ (DNT-processed translation) indicates a String Translated with the Do-not-translate list.

The segmentation was done in two stages.
 * 1) First, tagged NEs from the ‘ORI’ paragraph were identified and searched for in the ‘TDS’ paragraph. Then they were used as separators for the TDS: parts of the TDS between (untranslated) NEs were identified and searched for in the ‘TWS’ paragraph.
 * 2) If any such sub-string was not found in the TWS, it was printed and also highlighted in bold in the TDS. This shows that strings in the context of the NE differ between the DNT-processed translation and the baseline translation.
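The two-stage comparison above can be sketched roughly as follows. This is a simplification: the real procedure works on tagged paragraphs and would need tokenisation and fuzzier matching.

```python
def find_context_differences(ori_nes, tds, tws):
    """Stage 1: split the DNT-processed translation (TDS) on the untranslated NEs.
    Stage 2: report TDS segments that do not reappear in the baseline (TWS)."""
    segments = [tds]
    for ne in ori_nes:
        parts = []
        for seg in segments:
            parts.extend(seg.split(ne))
        segments = parts
    # Segments absent from the TWS mark places where the NE's context changed
    return [s.strip() for s in segments if s.strip() and s.strip() not in tws]

tds = "koaliciej chetyreh iz pyati soyuzov Pan Am"
tws = "koaliciej chetyreh Kastryuli pyat' soyuzov Ama"
print(find_context_differences(["Pan Am"], tds, tws))
# ['koaliciej chetyreh iz pyati soyuzov']
```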

This difference was then manually scored:

 * +1 — baseline translation not well-formed; DNT-processed translation well-formed
 * +0.5 — both translations not well-formed, but some features of the DNT-processed translation are more correct
 * 0 — both translations equally (not) well-formed
 * –0.5 — both translations not well-formed, but some features of the baseline translation are more correct
 * –1 — baseline translation well-formed; DNT-processed translation not well-formed

The terms ‘well-formed’ and ‘not well-formed’ refer to the local morphosyntactic or lexical context within a segment where differences occur. It remains possible that well-formed structures require post-editing at a higher level in the translated text. The term ‘features’ refers to morphosyntactic or lexical features of certain words in the context of the NE. ‘More correct’ means that the features considered in the context are correct while the corresponding features in the compared text are wrong.
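To make the scale concrete, the per-difference scores can be aggregated into a single figure. Treating the improvement as the mean score per scored difference is an assumption for illustration; the exact normalisation used in the experiment is not given here, and the scores below are hypothetical.

```python
def improvement(scores):
    """Average the per-difference scores; positive means DNT-processing helped overall."""
    return sum(scores) / len(scores)

# Hypothetical manual scores for six highlighted differences
scores = [1, 1, 0.5, 0, -0.5, 1]
print(f"{improvement(scores):+.0%}")  # +50%
```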

For example, the word Labour could be an organisation name (‘the party’), a part of a larger NE, often of a type other than organisation name (Federal Railway Labour Act), or a common noun (‘work’, as in the phrase rise in labour costs). Nevertheless, the difference in this experiment is relatively low (less than 10% for the worst case). Given that there are on average only about 2 NE occurrences per paragraph in the corpus, over-generation does not greatly affect the evaluation results.

Results of the experiment

Texts with NE DNT-processing showed consistent improvement (not lower than 20%) for all systems:

 * ProMT 1998 E-R = +29%
 * ProMT E-F = +22%
 * Systran 2000 E-F = +32%

Combining present-day MT systems with specific IE modules has a beneficial effect on overall MT quality. Here is an example of a sentence where improvement was achieved in the DNT-processed translation for all three MT systems on several levels: morphological, syntactic and lexical.


 * Original: The agreement was reached by a coalition of four of Pan Am’s five unions.

E-R ProMT


 * Baseline translation:
 * Soglashenie bylo dostignuto koaliciej chetyreh Kastryuli pyat’ soyuzov Ama.
 * The agreement was reached by a coalition of four of a Saucepan five unions of Am.
 * DNT-processed translation:
 * Soglashenie bylo dostignuto koaliciej chetyreh iz pyati soyuzov Pan Am.
 * The agreement was reached by a coalition of four out of five unions of Pan Am.

E-F ProMT
 * Baseline translation:
 * L’accord a été atteint par une coalition de quatre de casserole cinq unions d’Am.
 * The agreement was reached by a coalition of four of saucepan five unions of Am.
 * DNT-processed translation:
 * L’accord a été atteint par une coalition de quatre de cinq unions de Pan Am.
 * The agreement was reached by a coalition of four of five unions of Pan Am.

Conclusions
Resolving specific linguistic problems, like NEs, improves not only the treatment of that phenomenon by MT, but also morphosyntactic and lexical well-formedness more generally in the wider context of the target text. The results show that modern MT systems still leave room for considerable improvement. Further gains in performance may be anticipated by harnessing other focused technologies, such as word sense disambiguation, to MT. It was also noted that the scale of the improvement for a particular MT system correlates with its baseline quality: it is more difficult to achieve improvement for a system which already produces high-quality, well-formed structures without DNT-processing. For example, the improvement possible with NE DNT-processing is lowest for the English-French ProMT (Reverso) system; this system was ranked higher than English-French Systran by human evaluators in an experiment conducted using data from DARPA’s 1992–1994 series of MT evaluations.