User:Alvations/multilingual and crosslingual WSD

Multilingual and Crosslingual Word Sense Disambiguation (WSD) evaluation tasks focus on WSD across two or more languages simultaneously. The Multilingual WSD evaluation task uses a fixed sense inventory (BabelNet). The sense inventory for the Crosslingual WSD evaluation task, by contrast, is built up from parallel corpora, e.g. the Europarl corpus.

Multilingual WSD
The Multilingual WSD task was introduced for the upcoming SemEval-2013 workshop. It evaluates Word Sense Disambiguation systems in a multilingual scenario, using BabelNet as the sense inventory. Similar tasks such as crosslingual WSD or the multilingual lexical substitution task specify no fixed sense inventory; Multilingual WSD differs from them by relying on BabelNet. Prior to the development of BabelNet, a bilingual lexical-sample WSD evaluation task was carried out in SemEval-2007 on Chinese-English bitexts.

The Multilingual WSD task follows the all-words variant of classic WSD: participating systems are expected to link all occurrences of noun phrases within arbitrary texts in different languages to their corresponding Babel synsets.
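To make the all-words setting concrete, the sketch below tags every known noun in a sentence with one sense. It does not implement any participant's system. It swaps in a simplified Lesk heuristic (choose the sense whose gloss shares the most words with the context) and runs it over a small hypothetical inventory. The sense IDs (`SENSE_bank_finance`, etc.), glosses, stopword list and function names are all illustrative assumptions, not Babel synset IDs or a BabelNet API. Only single-word nouns are handled, not full noun phrases.

```python
import re
from typing import Dict, Set

# Hypothetical toy inventory: lemma -> {sense_id: gloss}.
# These IDs are placeholders, NOT real Babel synset identifiers.
TOY_INVENTORY: Dict[str, Dict[str, str]] = {
    "bank": {
        "SENSE_bank_finance": "financial institution that accepts deposits and lends money",
        "SENSE_bank_river": "sloping land beside a body of water such as a river",
    },
    "deposit": {
        "SENSE_deposit_money": "money placed in a bank account",
        "SENSE_deposit_sediment": "sediment or rock left by water or wind",
    },
}

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "at",
             "such", "as", "that", "by"}


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate_all(sentence: str) -> Dict[str, str]:
    """All-words tagging: assign a sense to every inventory noun in the sentence
    using simplified Lesk (maximum gloss/context word overlap)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    context = tokens(sentence)
    result: Dict[str, str] = {}
    for word in words:
        senses = TOY_INVENTORY.get(word)
        if not senses:
            continue  # not a target noun in this toy inventory
        ctx = context - {word}
        # max() keeps the first-listed sense on ties, so inventory order is the fallback.
        result[word] = max(senses, key=lambda s: len(tokens(senses[s]) & ctx))
    return result


print(disambiguate_all("He made a deposit of money at the bank"))
```

A gloss-overlap heuristic is used only because it needs no training data. A real system would draw candidate senses from BabelNet itself.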

The Multilingual WSD task is evaluated with the standard precision, recall and F1 measures used for classic WSD.
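As a reminder of how these measures are usually computed in WSD, the sketch below follows the standard definitions. Precision counts correct answers among the instances a system attempted, and recall counts them among all gold instances. F1 is their harmonic mean. It assumes one gold sense per instance. The instance and sense IDs (`t1`, `s1`, ...) are made up, and the official scorer may handle multiple gold senses differently.

```python
from typing import Dict, Tuple


def wsd_scores(gold: Dict[str, str], predicted: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 as commonly defined for WSD.

    gold: instance id -> correct sense (assumed single-valued here)
    predicted: instance id -> system sense; unanswered instances are absent.
    """
    attempted = [i for i in predicted if i in gold]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical data: 4 gold instances, 3 answered, 2 of them correctly.
gold = {"t1": "s1", "t2": "s2", "t3": "s1", "t4": "s3"}
pred = {"t1": "s1", "t2": "s1", "t3": "s1"}
p, r, f = wsd_scores(gold, pred)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.6667 0.5 0.5714
```

Separating precision from recall rewards systems that abstain on hard instances with higher precision, while recall still penalizes low coverage.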

BabelNet
BabelNet is a very large multilingual semantic network with millions of concepts obtained from:
 * an integration of WordNet and Wikipedia based on an automatic mapping algorithm and
 * translations of the concepts (i.e. English Wikipedia pages and WordNet synsets) based on Wikipedia cross-language links and the output of a machine translation system

An example of a sense label in BabelNet is as follows:

 * Target polysemous English word: bank
 * Occurs in the phrase/sentence: "the bank of Scotland"
 * Princeton WordNet (3.0) synset (not necessarily used in the task): {08420278-n} depository financial institution
 * BabelNet (1.0) synset: {bn:00008364n} depository_financial_institution; ES: banco, CA: banc, IT: banca, DE: bank, FR: banque
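The example label pairs one synset ID with lexicalizations in several languages. That pairing is what lets a word in any covered language be linked to the same concept. The sketch below stores the example as a plain record and builds a (language, lemma) index over it. The `SenseLabel` class and `build_lemma_index` function are hypothetical illustrations, not part of any BabelNet API. The values are copied from the example above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass
class SenseLabel:
    """Illustrative record for one BabelNet sense label (hypothetical, not BabelNet's API)."""
    synset_id: str                    # e.g. "bn:00008364n"
    main_lemma: str                   # English lemma of the synset
    wordnet_id: Optional[str] = None  # Princeton WordNet 3.0 offset, if any
    lexicalizations: Dict[str, str] = field(default_factory=dict)  # language code -> lemma

    def translate(self, lang: str) -> Optional[str]:
        """Return the lemma lexicalizing this concept in `lang`, if known."""
        return self.lexicalizations.get(lang.upper())


def build_lemma_index(labels: Iterable[SenseLabel]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (language, lemma) -> synset IDs, so words in different languages
    can be resolved to the same multilingual concept."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for label in labels:
        index[("EN", label.main_lemma)].add(label.synset_id)
        for lang, lemma in label.lexicalizations.items():
            index[(lang, lemma)].add(label.synset_id)
    return index


bank = SenseLabel(
    synset_id="bn:00008364n",
    main_lemma="depository_financial_institution",
    wordnet_id="08420278-n",
    lexicalizations={"ES": "banco", "CA": "banc", "IT": "banca", "DE": "bank", "FR": "banque"},
)
index = build_lemma_index([bank])
print(bank.translate("it"))     # banca
print(index[("FR", "banque")])  # {'bn:00008364n'}
```

A set of synset IDs per key leaves room for polysemy: once more labels are added, one lemma can point to several synsets, and a WSD system must then choose among them.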

Crosslingual WSD
The Crosslingual WSD task was introduced in the SemEval-2010 evaluation workshop and re-proposed in the upcoming SemEval-2013 workshop. Its purpose is to ease the integration of WSD systems into other Natural Language Processing (NLP) applications, such as Machine Translation and multilingual Information Retrieval. To that end, the crosslingual WSD evaluation task was introduced as a language-independent and knowledge-lean approach to WSD.

The task is an unsupervised Word Sense Disambiguation task for English nouns that relies on parallel corpora. It follows the lexical-sample variant of the classic WSD task, restricted to 20 polysemous nouns.

The evaluation uses weighted versions of the precision and recall metrics, inspired by the English lexical substitution task of SemEval-2007.
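The text does not spell out the weighting. The sketch below assumes the "best" scoring style of the English lexical substitution task, in which each gold translation carries a frequency (for instance, how many annotators proposed it). An item's score is the summed gold frequency of the system's answers, divided by the number of answers and by the total gold frequency. Precision averages over attempted items and recall over all items. The exact CLWSD scorer may differ, and the frequencies and item IDs below are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def weighted_best_scores(gold: Dict[str, Counter], answers: Dict[str, List[str]]) -> Tuple[float, float]:
    """Frequency-weighted precision/recall in the style of lexical-substitution 'best' scoring.

    gold: item id -> Counter of gold translations with their frequencies
    answers: item id -> system translations (unanswered items are absent or empty)
    """
    total = 0.0
    attempted = 0
    for item, guesses in answers.items():
        if item not in gold or not guesses:
            continue
        guesses = list(dict.fromkeys(guesses))  # drop duplicate guesses, keep order
        attempted += 1
        h = gold[item]
        hits = sum(h[g] for g in guesses)  # Counter yields 0 for unseen translations
        total += hits / len(guesses) / sum(h.values())
    precision = total / attempted if attempted else 0.0
    recall = total / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical French gold data for three occurrences of "bank".
gold = {
    "bank.1": Counter({"banque": 3, "établissement de crédit": 1}),
    "bank.2": Counter({"rive": 2, "bord": 2}),
    "bank.3": Counter({"banque": 1}),
}
answers = {"bank.1": ["banque"], "bank.2": ["rive", "banque"]}  # bank.3 left unanswered
prec, rec = weighted_best_scores(gold, answers)
print(prec, round(rec, 4))  # 0.5 0.3333
```

Dividing by the number of guesses means hedging with several translations lowers the credit for each correct one. Dividing by the gold total normalizes items with many annotations.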

Europarl Sense Inventory
Participating systems in this evaluation task use the Europarl corpus to build up the sense inventory. They then perform WSD on the polysemous English target words based on that inventory. For evaluation, a sense inventory for all target nouns was built manually from all translations of those nouns retrieved from the Europarl corpus. The translations of each polysemous English word are grouped into clusters, and each cluster corresponds to one sense of that word.
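The inventory-building step can be sketched in two parts: first collect the translations of the target word from word-aligned sentence pairs, then group them into sense clusters. The sketch assumes the word alignments are already given (for example by an external word aligner). It joins the foreign tokens aligned to one target occurrence so that multiword translations such as "établissement de crédit" stay whole. Because the task clustered translations manually, clustering is represented here by a hand-written, hypothetical translation-to-cluster map. The function names, cluster labels and toy sentences are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

AlignedPair = Tuple[List[str], List[str], List[Tuple[int, int]]]  # (english, foreign, links)


def collect_translations(target: str, pairs: Iterable[AlignedPair]) -> Counter:
    """Count the foreign translations aligned to each occurrence of `target`."""
    counts: Counter = Counter()
    for en, foreign, links in pairs:
        for i, tok in enumerate(en):
            if tok.lower() != target:
                continue
            js = sorted(j for ei, j in links if ei == i)  # foreign positions linked to this token
            if js:
                counts[" ".join(foreign[j] for j in js).lower()] += 1
    return counts


def group_into_clusters(counts: Counter, cluster_of: Dict[str, str]) -> Dict[str, Counter]:
    """Group translations into sense clusters using a (hypothetical) manual mapping."""
    clusters: Dict[str, Counter] = defaultdict(Counter)
    for translation, n in counts.items():
        clusters[cluster_of.get(translation, "UNCLUSTERED")][translation] += n
    return dict(clusters)


# Toy English-French pairs with hand-made alignments (illustrative only).
pairs: List[AlignedPair] = [
    (["the", "bank", "of", "Scotland"], ["la", "banque", "d'", "Écosse"],
     [(0, 0), (1, 1), (2, 2), (3, 3)]),
    (["the", "bank", "lends", "money"],
     ["l'", "établissement", "de", "crédit", "prête", "de", "l'", "argent"],
     [(0, 0), (1, 1), (1, 2), (1, 3), (2, 4), (3, 7)]),
    (["the", "river", "bank"], ["la", "rive", "du", "fleuve"], [(0, 0), (1, 3), (2, 1)]),
]
counts = collect_translations("bank", pairs)
cluster_of = {"banque": "financial institution",
              "établissement de crédit": "financial institution",
              "rive": "side of a river"}
clusters = group_into_clusters(counts, cluster_of)
print(clusters)
```

Translations with no entry in the mapping fall into an `UNCLUSTERED` bucket. This makes gaps in the manual clustering easy to spot.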

An example of a sense label in the Europarl sense inventory is as follows:

 * Target polysemous English word: bank
 * Occurs in the phrase/sentence: "the bank of Scotland"
 * Princeton WordNet (3.0) synset (not necessarily used in the task): {08420278-n} depository financial institution
 * Europarl sense inventory synset {Dutch, French, German, Italian, Spanish}: {bank/kredietinstelling, banque/établissement de crédit, Bank/Kreditinstitut, banca, banco}