User:Alvations/Semeval-unwikified

SemEval (originally Senseval) is a series of workshops conducted to evaluate semantic analysis systems. Traditionally, computational semantic analysis focused on Word Sense Disambiguation (WSD). WSD is an open problem in natural language processing: the task of identifying which sense of a word (i.e. which meaning) is used in a sentence when the word has multiple meanings (polysemy). ACL-SIGLEX (the Special Interest Group on the LEXicon of the Association for Computational Linguistics) is the umbrella organization for the SemEval semantic evaluations and the Senseval word-sense evaluation exercises.

The first three evaluation workshops, Senseval-1, Senseval-2 and Senseval-3, focused on word sense disambiguation systems. Senseval has since become SemEval, a series of evaluation exercises for semantic annotation involving a much larger and more diverse set of tasks: beginning with the 4th workshop, SemEval-1, the tasks evolved to include semantic analysis tasks outside of word sense disambiguation. The framework of the SemEval/Senseval evaluation workshops emulates the Message Understanding Conferences (MUCs) and other evaluation workshops run by ARPA (the Advanced Research Projects Agency, later renamed the Defense Advanced Research Projects Agency, DARPA).
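
To make the WSD task concrete, below is a minimal sketch using NLTK's implementation of the simplified Lesk algorithm. This is just one illustrative WSD method, not the approach of any particular Senseval system, and it assumes NLTK and its WordNet data are installed.

```python
# Minimal illustration of WSD: pick a WordNet sense for "bank" in context
# using NLTK's simplified Lesk implementation (one method among many).
# Setup (once): pip install nltk; then nltk.download('wordnet')
from nltk.wsd import lesk

# "bank" is polysemous: a financial institution, a river bank, etc.
context = "I went to the bank to deposit my pay cheque".split()
sense = lesk(context, "bank", "n")  # restrict to noun senses

print(sense)               # the WordNet Synset chosen for this context
print(sense.definition())  # its dictionary gloss
```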

Stages of SemEval/Senseval evaluation workshops
 * 1) First, all likely participants were invited to express their interest and to participate in the exercise design.
 * 2) A timetable towards a final workshop was worked out.
 * 3) A plan for selecting evaluation materials was agreed.
 * 4) 'Gold standards' for the individual tasks were acquired; human annotations typically served as the gold standard against which the precision and recall scores of computer systems were measured (a minimal scoring sketch follows this list). These 'gold standards' are what the computational systems strive towards. In WSD tasks, human annotators were set the task of generating a set of correct WSD answers (i.e. the correct sense for a given word in a given context).
 * 5) The gold standard materials, without answers, were released to participants, who then had a short time to run their programs over them and return their sets of answers to the organizers.
 * 6) The organizers then scored the answers, and the scores were announced and discussed at a workshop.
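
As a hedged illustration of the scoring in step 4: under the convention commonly used in Senseval scoring, precision is computed over the instances a system attempted and recall over all gold-standard instances, since systems may decline to answer some instances. The dict-based answer format and instance IDs below are hypothetical, chosen only for brevity.

```python
# Minimal sketch of Senseval-style scoring against a gold standard.
# Convention assumed here: precision is computed over instances the
# system attempted; recall over all gold-standard instances.
def score(system_answers, gold):
    """Both arguments: dicts mapping instance IDs to sense labels."""
    attempted = {i: s for i, s in system_answers.items() if s is not None}
    correct = sum(1 for i, s in attempted.items() if gold.get(i) == s)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical instance IDs and sense labels, for illustration only.
gold = {"art.01": "art%1", "art.02": "art%2", "art.03": "art%1"}
system = {"art.01": "art%1", "art.02": "art%1"}   # art.03 left unanswered
print(score(system, gold))  # (0.5, 0.333...): 1 correct of 2 tried, of 3 total
```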

"-Eval" Etymology
"-Eval" is a fairly recent morpheme for conferences, workshops and algorithms related to computational evaluations. The "-Eval" innovation originate from the evaluation metric for computational grammar systems. Grammar Evaluation Interest Group (GEIG) evaluation metric, also termed as the Parseval metric, , a blend of grammatical "pars"ing and system "eval"uation. Progessively, a series of well intended puns motivates the popular use of the "-eval" morpheme:


 * Parseval (commonly spelled Percival), one of King Arthur's legendary Knights of the Round Table, took part in the quest for the Holy Grail; his quest symbolizes computational linguists' ultimate quest for computers to understand natural language.


 * Parseval coincides with Parseval's theorem (a Fourier-series-related theorem that most computer scientists are familiar with).
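
Since the Parseval metric is only named above, here is a rough sketch of its core idea, hedged: the full PARSEVAL scoring scheme has additional conventions (e.g. for punctuation and unary brackets), but at heart it compares a candidate parse against a gold parse as sets of constituent spans.

```python
# Rough sketch of the core of Parseval bracket scoring. Simplified:
# the full metric has extra conventions (punctuation, unary brackets, ...).
# A parse is represented as a set of (label, start, end) constituent spans.
def parseval(candidate, gold):
    matched = len(candidate & gold)        # brackets present in both parses
    return matched / len(candidate), matched / len(gold)

# "the cat sat": the candidate parse mis-brackets the noun phrase.
gold = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}
candidate = {("S", 0, 3), ("NP", 1, 2), ("VP", 2, 3)}
print(parseval(candidate, gold))  # (0.666..., 0.666...)
```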

Pre-WSD evaluations
From the earliest days, assessing the quality of WSD algorithms had been primarily a matter of intrinsic evaluation, and “almost no attempts had been made to evaluate embedded WSD components”. Only recently have extrinsic evaluations begun to provide some evidence for the value of WSD in end-user applications. Until about 1990, discussions of the sense disambiguation task focused mainly on illustrative examples rather than comprehensive evaluation. The early 1990s saw the beginnings of more systematic and rigorous intrinsic evaluations, including more formal experimentation on small sets of ambiguous words.

Senseval to SemEval
In April 1997, a workshop entitled Tagging with Lexical Semantics: Why, What, and How? was held in conjunction with the Conference on Applied Natural Language Processing. At the time, there was a clear recognition that manually annotated corpora had revolutionized other areas of NLP, such as part-of-speech tagging and parsing, and that corpus-driven approaches had the potential to revolutionize automatic semantic analysis as well. Kilgarriff recalls that there was “a high degree of consensus that the field needed evaluation,” and several practical proposals by Resnik and Yarowsky kicked off a discussion that led to the creation of the Senseval evaluation exercises. Senseval-1 took place in the summer of 1998 for English, French, and Italian, culminating in a workshop held at Herstmonceux Castle, Sussex, England on September 2–4.

Senseval-2 took place in the summer of 2001, and was followed by a workshop held in July 2001 in Toulouse, in conjunction with ACL 2001. Senseval-2 included tasks for Basque, Chinese, Czech, Danish, Dutch, English, Estonian, Italian, Japanese, Korean, Spanish, and Swedish.

Senseval-3 took place in March–April 2004, followed by a workshop held in July 2004 in Barcelona, in conjunction with ACL 2004. Senseval-3 included 14 different tasks for core word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, and subcategorization acquisition.

SemEval-1/Senseval-4 took place in 2007, followed by a workshop held in conjunction with ACL in Prague. SemEval-1 included 18 different tasks targeting the evaluation of systems for the semantic analysis of text. SemEval-2 took place in 2010, followed by a workshop held in conjunction with ACL in Uppsala. SemEval-2 included 18 different tasks targeting the evaluation of semantic analysis systems.

Senseval & SemEval Tasks
Senseval-1 and Senseval-2 focused on evaluating WSD systems for major languages for which corpora and computerized dictionaries were available. Senseval-3 looked beyond the lexeme and began to evaluate systems addressing wider areas of semantics, viz. semantic roles (technically known as theta roles in formal semantics) and Logic Form Transformation (where the semantics of phrases, clauses or sentences are commonly represented in first-order logic forms); Senseval-3 also explored the performance of semantic analysis in machine translation.
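
As a purely illustrative example of a logic form (hedged: the exact notation varies from task to task), the sentence "John loves Mary" might be transformed into a first-order representation along these lines:

```latex
% Illustrative only; notation varies across logic-form tasks.
% The verb contributes an event variable e_1; the nouns bind entity variables.
$$ \mathit{John}(x_1) \;\land\; \mathit{love}(e_1, x_1, x_2) \;\land\; \mathit{Mary}(x_2) $$
```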

As the variety of computational semantic systems grew beyond the coverage of WSD, Senseval evolved into SemEval, where more aspects of computational semantic systems were evaluated. The tables below (1) reflect the workshop growth from Senseval to SemEval and (2) give an overview of which areas of computational semantics were evaluated throughout the Senseval/SemEval workshops.

Senseval-1
The Senseval-1 evaluation exercise attempted, for the first time, to run an ARPA-like competition between WSD systems, under the auspices of ACL-SIGLEX, EURALEX (the European Association for Lexicography), ELSNET, ECRAN (Extraction of Content Research At Near market) and SPARKLE (Shallow Parsing and Knowledge extraction for Language Engineering). There were two variants of the computational WSD task, viz. "all-words" and "lexical-sample". In the all-words variant, participating systems have to disambiguate all words (or all open-class words) in a set of texts. In the lexical-sample variant, a sample of words is first selected; then, for each sample word, a number of corpus instances are selected, and participating systems have to disambiguate just the sample-word instances (a sketch contrasting the two designs follows the list below). For Senseval-1, the lexical-sample variant was chosen for the following reasons:
 * 1) Cost-effectiveness of producing 'gold standards' (human annotation of sense tags)
 * 2) Unavailability of a full dictionary at low or no cost
 * 3) Many systems interested in participating were not ready for an all-words task.
 * 4) The lexical-sample task would be more informative about the strengths and failings of WSD research at that point in time. (The all-words task would provide too little data about the problems presented by any particular word.)
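
To make the two task designs concrete, here is a minimal sketch. The function names and the instance format are illustrative, not the actual Senseval formats, and the disambiguate helper simply reuses the Lesk-based approach from the introductory example.

```python
# Sketch contrasting the two task designs; function names and the
# instance format are illustrative, not the actual Senseval formats.
from nltk.wsd import lesk  # reuses the Lesk sketch from the introduction

def disambiguate(word, context_tokens):
    return lesk(context_tokens, word)  # any WSD method could be plugged in

# All-words: systems tag every word (ideally every open-class word).
def all_words_task(tokens):
    return {i: disambiguate(w, tokens) for i, w in enumerate(tokens)}

# Lexical-sample: systems tag only pre-selected target words,
# each occurring in a separately sampled corpus instance.
def lexical_sample_task(instances):
    """instances: list of (target_word, context_tokens) pairs."""
    return [disambiguate(word, ctx) for word, ctx in instances]

text = "I went to the bank to deposit money".split()
print(all_words_task(text))                   # one answer per token
print(lexical_sample_task([("bank", text)]))  # answers for targets only
```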

Senseval-1 Tasks

Senseval-2
Senseval-2 evaluated WSD systems on three types of task over 12 languages. In the "all-words" task, the evaluation covered almost all of the content words in a sample of texts. In the "lexical-sample" task, a sample of the lexicon was first selected, then corpus instances of the sample words were selected, and WSD systems competed to disambiguate the sense in these instances. In the "translation task" (Japanese only), senses corresponded to distinct translations of a word into another language.

Senseval-2 Tasks

Senseval-3
Senseval-3 was a follow-up to Senseval-1 and Senseval-2. It included 14 different tasks covering core word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, and subcategorization acquisition.

Senseval-3 Tasks

SemEval-1
Beginning with the 4th workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation. SemEval-1 included 18 different tasks targeting the evaluation of systems for the semantic analysis of text. The tasks were more elaborate than Senseval's, as they cut across different areas of study in NLP.

SemEval-1 Tasks

SemEval-2
SemEval-2010 (SemEval-2) was the 5th workshop on semantic evaluation. SemEval-2 added tasks from new areas of study in computational semantics, viz. Coreference, Ellipsis, Keyphrase Extraction, Noun Compounds and Textual Entailment. The first three workshops, Senseval-1 through Senseval-3, focused on word sense disambiguation, each time growing in the number of languages offered in the tasks and in the number of participating teams. In the 4th workshop, SemEval-2007, the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation.

SemEval-2 Tasks