User:FrancisTyers/Proposal

Inducing grammatical transfer rules and bilingual lexicon from comparable corpora.

The proposal is to research generalised algorithms for inducing bilingual lexicons, and subsequently grammatical transfer rules, from comparable corpora for three under-resourced pairs of closely related languages. The language pairs in question are: Farsi->Tajik, Bulgarian->Macedonian and Turkish->Tatar.

The research will build upon existing work on comparable corpora, text alignment, example-based machine translation, rule generation from aligned corpora and

How will evaluation fit in?

Tajik

 * Valentina S. Rastorgueva, A Short Sketch of Tajik Grammar [translated and edited by Herbert H. Paper], Bloomington: Indiana University, 1963, 110 p.
 * Azim Baizoyev, John Hayward, The Official Beginners' Guide to Tajiki, Dushanbe: Star Publications, 2001, 448 p.
 * John Perry (2005) A Tajik Persian Reference Grammar
 * Lutz Rzehak, Vom Persischen zum Tadschikischen: sprachliches Handeln und Sprachplanung in Transoxanien zwischen Tradition, Moderne und Sowjetmacht (1900-1956), Wiesbaden: Reichert, 2001, 456 p. (in German).

Macedonian

 * Victor Friedman "Macedonian Grammar"
 * Lunt, H. (1952) Grammar of the Macedonian Literary Language (Skopje)

Related projects

 * METIS project
 * ASSIST project
 * CRATER project

Online resources

 * The Jon Safari site - Persian Link Grammar syntax parser, stemmer and basic morphological parser for Persian
 * Jon Dehdari & Deryle Lonsdale (2005) A link grammar parser for Persian