User talk:Narcheung2/sandbox

Brief History of Machine Translation (MT)

The field of “machine translation” appeared in Warren Weaver’s Memorandum on Translation (1949). The first researcher in the field, Yehoshua Bar-Hillel, began his research at MIT (1951). A Georgetown MT research team followed (1951) and gave a public demonstration of its system in 1954. MT research programs appeared in Japan and Russia (1955), and the first MT conference was held in London (1956). Researchers continued to join the field as the Association for Machine Translation and Computational Linguistics was formed in the U.S. (1962) and the National Academy of Sciences formed the Automatic Language Processing Advisory Committee (ALPAC) to study MT (1964). The French Textile Institute used MT to translate abstracts from and into French, English, German and Spanish (1970); Brigham Young University started a project to translate Mormon texts by automated translation (1971); and Xerox used SYSTRAN to translate technical manuals (1978). Various MT companies were launched, including Trados (1984), which was the first to develop and market translation memory technology (1989). The first commercial MT system for Russian/English/German-Ukrainian was developed at Kharkov State University (1991). MT on the web started with SYSTRAN offering free translation of small texts (1996), followed by AltaVista Babel Fish, which handled 500,000 requests a day (1997). Franz-Josef Och (the future head of Translation Development at Google) won DARPA’s speed MT competition (2003). Later innovations included MOSES, the open-source statistical MT engine (2007), a text/SMS translation service for mobile phones in Japan (2008), and a mobile phone with built-in speech-to-speech translation for English, Japanese and Chinese (2009). Recently, Google announced that Google Translate translates roughly enough text to fill 1 million books in one day (2012).

Development of Internet and Machine Translation (MT)

The Internet is an electronic communications network that connects computer networks and organizational computer facilities around the world (Merriam-Webster Online Dictionary, 2011). MT is the application of computers to the translation of texts from one natural language into another (Hutchins & Somers, 1992). MT has now moved into the Internet era, where it is used to translate web pages. The Internet has become a source of information that is available in many languages from countries around the world. Translation is urgently needed because Internet users come from different cultures and have different native languages.

Online Translation Tools

The technologies available for translation keep increasing and improving. Well-known machine translators available online for free include Google Translate (http://translate.google.com), Bing Translator (http://www.microsofttranslator.com/), and SYSTRAN (http://www.SYSTRANsoft.com).

Advantages and Disadvantages of Online Translation Tools

On the advantages side, the huge capacity of modern technology allows online tools to combine data from various specialized dictionaries to support their translation. The high speed of computers also delivers translation results within a few seconds. Moreover, the Internet provides a good platform for translators, because it enables users to build and share the database of an MT system. Users can also help optimize MT output: their choices help the most suitable translation results survive, and they can offer better suggestions to the translation system when they are not satisfied with any of the provided results. Apart from constantly updating their translation systems, service providers are trying to improve their interfaces with assisting functions such as translation memory, so as to attract more users.

However, the high volume of input may lead to low translation accuracy. The output of instant online translation still needs to be improved manually. The expanding and extensive data sources on the Internet make it hard to ensure the quality of the corpus, which directly influences the translation.

Challenges of Machine Translation

Machine translation systems can currently be divided into three types: traditional rule-based systems, statistical systems and hybrid systems. Achieving high translation quality remains the biggest challenge they face. Madsen (2009) stated that it is almost impossible for machine translation to be fully accurate and that machine translators will continue to make errors. However, this does not mean that machine translators are useless. What matters is how MT systems cope with their challenges, which include the different structures and grammars of languages, language change over time, and cultural barriers.

Features of Online Machine Translation Approaches

Online machine translation tools may adopt different machine translation approaches:

Rule-Based Machine Translation (RBMT) System

RBMT was the first approach pursued in machine translation research (Charoenpornsawat, Sornlertlamvanich, & Charoenporn, 2002). The rule-based approach is also known as the rationalist approach. An RBMT system works with linguistic knowledge, so obtaining that knowledge is the primary task. Traditionally, most MT systems have been rule-based. They can be further divided into three types: a.) direct systems, b.) transfer systems and c.) interlingua systems.

a.) The direct approach replaces source words or sentences with corresponding target words and sentences and adjusts the word order when necessary (Hutchins & Somers, 1992). Traditional SYSTRAN is a typical direct system. The simplest direct MT system is the electronic dictionary, while more complicated ones can be divided into syntactic direct MT systems and semantic direct MT systems.
b.) The transfer approach sets up an intermediate representation between the source language (SL) and the target language (TL) that captures their semantic relationship to a certain extent (Feng Zhiwei, 1995).
c.) The interlingua approach assumes that all languages share a universal structure. It parses the SL and converts it into a syntactic-semantic representation that can be applied to all languages (Feng Zhiwei, 1995).

The linguistic knowledge in a rule-based MT system is compiled by linguists, which is hard work, and the results are frequently disappointing. Because natural language is complex, exceptions can almost always be found, so more rules have to be created. The growing number of rules makes the MT system harder to develop and its quality harder to improve. In addition, as society advances and languages come into frequent contact, language itself keeps changing, which makes compiling rules even more difficult.
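The direct approach described in a.) can be illustrated with a minimal Python sketch. It is not taken from SYSTRAN or any real system: the tiny Spanish-English dictionary, its part-of-speech tags and the single reordering rule are all assumptions made up for illustration.

```python
# Hypothetical bilingual dictionary: source word -> (target word, part of speech).
DICTIONARY = {
    "la": ("the", "DET"),
    "casa": ("house", "NOUN"),
    "blanca": ("white", "ADJ"),
}


def direct_translate(sentence):
    """Replace each source word, then adjust word order with one rule."""
    # Step 1: word-for-word replacement; unknown words pass through unchanged.
    tagged = [DICTIONARY.get(word, (word, "UNK")) for word in sentence.lower().split()]

    # Step 2: reordering rule (assumed): noun + adjective becomes adjective + noun.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1

    return " ".join(word for word, _ in tagged)


print(direct_translate("la casa blanca"))
```

Every further construction needs another hand-written rule of this kind, which is how the rule count grows as described above.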

Statistical Machine Translation (SMT) System

SMT is an MT approach based on probability. Every sentence in the target language is treated as a possible translation of the source sentence, with some probability. SMT uses an algorithm that learns from translations produced by human translators (Lopez, 2008). The hard part of SMT is that it needs a large amount of human-translated text to learn from and to build a model of. The concept was first put forward by Weaver (1955) but was abandoned after heavy criticism from Chomsky and others. With increases in computer speed and capacity, and with successful applications in speech recognition and lexicography, the approach is now becoming popular.

Given the knowledge-acquisition limits of RBMT, automatic learning techniques have gradually been applied to acquire linguistic knowledge. A statistical MT system builds a model by statistically analyzing corpora of two languages. Through this model the machine can learn by itself, eventually achieving automatic translation between the two languages in both directions. According to Och, developing a statistical MT system for a new language pair requires collecting the following data: a bilingual text corpus (or parallel corpus collection) of millions of words, and monolingual corpora of the two languages, each with more than a billion words. The translations for the language pair are then generated by the statistical model obtained from these data.
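To make the idea of learning from a parallel corpus concrete, the following Python sketch estimates word-translation probabilities with expectation-maximization, in the style of IBM Model 1. The choice of IBM Model 1 is an assumption for illustration (the text above does not name a specific model), and the three-sentence German-English corpus is a toy stand-in for the millions of words a real system needs.

```python
from collections import defaultdict


def train_word_model(pairs, iterations=10):
    """Estimate prob[(target_word, source_word)] from sentence pairs with EM."""
    source_vocab = {s for src, _ in pairs for s in src}
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start with every target word equally likely for every source word.
    uniform = 1.0 / len(target_vocab)
    prob = {(t, s): uniform for t in target_vocab for s in source_vocab}

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each target word among the source words in its sentence.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # M-step: renormalize the expected counts into probabilities.
        for (t, s) in prob:
            prob[(t, s)] = count[(t, s)] / total[s]
    return prob


def best_translation(prob, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


# Toy parallel corpus (assumed): tokenized source and target sentences.
CORPUS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

model = train_word_model(CORPUS)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied here: the word pairings emerge only from which words occur together across sentence pairs, which is the sense in which the machine learns by itself.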

Hybrid Machine Translation (HMT) System

HMT combines the core of the RBMT and SMT systems. In one form, the translation is obtained by bringing the literal meaning into the statistical output (Boretz, 2009). In many other hybrids, the output of rule-based translation is used and then adjusted with statistics.
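One way to picture a hybrid in which rule-based output is adjusted with statistics is sketched below in Python. This is an illustrative assumption, not the design of SYSTRAN or any other product: a hypothetical rule-based lexicon proposes candidate translations, and made-up bigram counts, standing in for a model learned from a corpus, choose among them.

```python
from itertools import product

# Hypothetical rule-based lexicon: an ambiguous source word has several candidates.
LEXICON = {
    "el": ["the"],
    "banco": ["bank", "bench"],
    "abre": ["opens"],
}

# Hypothetical bigram counts standing in for statistics learned from a corpus.
BIGRAM_COUNTS = {
    ("the", "bank"): 8,
    ("bank", "opens"): 5,
    ("the", "bench"): 6,
}


def rule_based_candidates(words):
    """Rule-based step: expand every combination of lexicon entries."""
    options = [LEXICON.get(word, [word]) for word in words]
    return [list(choice) for choice in product(*options)]


def bigram_score(sentence):
    """Statistical step: product of add-one smoothed bigram counts."""
    score = 1.0
    for pair in zip(sentence, sentence[1:]):
        score *= BIGRAM_COUNTS.get(pair, 0) + 1
    return score


def hybrid_translate(source):
    """Pick the rule-based candidate that the statistics prefer."""
    candidates = rule_based_candidates(source.split())
    return " ".join(max(candidates, key=bigram_score))


print(hybrid_translate("el banco abre"))
```

The rules guarantee that every candidate is a valid dictionary rendering, while the statistics resolve the ambiguity that the rules alone leave open.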

Online translators use different translation approaches. Google Translate is a free online machine translator that uses SMT. It is one of the most popular translators and keeps adding language options and expanding its usability. In the beginning, Google Translate used an RBMT system from SYSTRAN. SYSTRAN is also a well-known machine translator, and it has adopted HMT.

References

(1) Anja S. (2001). The (Im)Possibilities of Machine Translation. Peter L., European University Studies.
(2) Charoenpornsawat, P., Sornlertlamvanich, V., & Charoenporn, T. (2002). Improving Translation Quality of Rule-Based Machine Translation. COLING-02: Machine Translation in Asia.
(3) Eisele, A., Federmann, C., Saint-Amand, H., Jellinghaus, M., Hermann, T., & Chen, Y. (2008). Using Moses to Integrate Multiple Rule-Based Machine Translation Engines into a Hybrid System. Proceedings of the Third Workshop on Statistical Machine Translation (pp. 179-182). Columbus, Ohio: ACL.
(4) Hutchins, W.J. (1986). Machine Translation: Past, Present, Future. England: Ellis Horwood Limited.
(5) Hutchins, W.J., & Somers, H.L. (1992). An Introduction to Machine Translation. London: Academic Press.
(6) Koehn, P. (2009). 20 Years of Statistical Machine Translation. University of Edinburgh.
(7) Lopez, A. (2008). Statistical Machine Translation. ACM Computing Surveys, Vol. 40, No. 3.
(8) Madsen, M.W. (2009, December 23). The Limits of Machine Translation. Department of Scandinavian Studies and Linguistics, Faculty of Humanities, University of Copenhagen.
(9) Peters, S. (2001, November). SYSTRAN – Past and Present: A Brief History of SYSTRAN Translation Software. (http://pages.unibas.ch/LIlab/staff/tenhacken/Applied-CL/3_Systran/3_Systran.html#history)
(10) Feng Zhiwei (1995). Introduction to Computational Linguistics (计算语言学导论). Beijing: The Commercial Press.