User:Vít Baisa/sandbox

= Tasks =


 * EU parallel data (Europarl, EUR-Lex, TMX, ...)
 * Machine translation
 * http://www.mendelssohninscotland.com/why-scotland Check the article and add missing info

= List of languages by processing properties and tools =


 * CAP: language distinguishes lower and upper case
 * SEG: segmentation tool available
 * LEM: lemmatization tool available
 * MOR: morphological analysis tool available
 * SYL: syllabification tool available
 * POS: Part-of-Speech tagger available
 * UD: Universal Dependencies (universal POS tags and dependency relations) available
 * ACC: language has accents
 * NACC: no accents form
 * Script: script the language is written in
 * DIR: direction of the writing system (left-to-right or right-to-left)
 * Country: where the language is used as an official or unofficial language
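The legend above could be held in a small record type, one instance per language. The following is only a minimal sketch: the class name `LanguageProperties`, its field names and the example values are assumptions made for illustration and are not part of any existing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LanguageProperties:
    """One row of the language table (hypothetical structure).

    Field names mirror the abbreviations in the legend.
    """
    name: str
    cap: bool = False       # CAP: distinguishes lower and upper case
    seg: bool = False       # SEG: segmentation tool available
    lem: bool = False       # LEM: lemmatization tool available
    mor: bool = False       # MOR: morphology analysis tool
    syl: bool = False       # SYL: syllabification tool
    pos: bool = False       # POS: part-of-speech tagger available
    ud: bool = False        # UD column of the legend
    acc: bool = False       # ACC: language has accents
    nacc: bool = False      # NACC column of the legend
    script: str = ""        # Script: name of the script
    direction: str = "ltr"  # DIR: "ltr" or "rtl"
    countries: List[str] = field(default_factory=list)  # Country

    def available_tools(self) -> List[str]:
        """Return the labels of the tool columns that are set."""
        flags = [
            ("SEG", self.seg),
            ("LEM", self.lem),
            ("MOR", self.mor),
            ("SYL", self.syl),
            ("POS", self.pos),
        ]
        return [label for label, is_set in flags if is_set]


# Example row; the tool flags are left unset because they are
# placeholders here, not verified entries.
czech = LanguageProperties(
    name="Czech",
    cap=True,
    acc=True,
    script="Latin",
    direction="ltr",
    countries=["Czech Republic"],
)
```

Keeping the tool columns as plain booleans makes it easy to filter the list, for example to find all languages that still lack a lemmatizer.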

= Natural Language Processing Centre =

NLP Centre is a research centre at Masaryk University in Brno, Czech Republic. The director of the Centre is Karel Pala. The Centre focuses on several topics in natural language processing, computational linguistics and corpus linguistics, namely:
 * computational lexicography: development of lexicographical software, building lexicons,
 * corpus linguistics: developing the corpus manager Sketch Engine, building very large web corpora in many languages,
 * syntactic analysis: developing various parsers for the Czech language,
 * semantic and logic analysis: TIL, WordNets.

Tools and resources developed in the centre

 * Sketch Engine - corpus manager tool
 * VerbaLex - valency lexicon of Czech verbs
 * Czech WordNet - semantic lexicon of Czech
 * SET - parser of Czech
 * Synt - chart parser of Czech
 * DEB - dictionary editing and browsing platform
 * Corpus Pattern Analysis - a corpus-based method for discovering senses of English verbs
 * Jazyková příručka

Collaboration
The centre participates in various international projects together with other research institutions and departments, mainly with
 * University of Wolverhampton (Patrick Hanks),
 * Faculty of Arts, Masaryk University (James Thomas, Dana Hlaváčková),
 * Khokhlova St. Petersburg

Projects

 * BalkaNet, VerbaLex

= Comparison of corpus managers =

This is a comparison of corpus managers that process large corpora. The comparison was carried out on the British National Corpus (a 100-million-word text corpus of written and spoken English).

Corpus managers

 * BNCweb – a web-based interface to the British National Corpus
 * BYU-BNC – a website for searching the British National Corpus and other corpora created at Brigham Young University
 * Sketch Engine – a text corpus management and analysis software

Speed of corpus managers
The three architectures are equally fast for single word forms, lemmas and collocates: most queries of this type take a second or less. The differences appear when searching for strings of words.
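
A comparison of this kind can be reproduced with a small timing harness. The sketch below is an assumption about how one might measure query times, not the method used for the comparison above: `time_queries` is a hypothetical helper, and `toy_search` is a stand-in backend that scans a tiny token list in place of a real corpus manager.

```python
import time
from statistics import median
from typing import Callable, Dict, Iterable, List


def time_queries(
    search: Callable[[str], object],
    queries: Iterable[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Run each query several times and return the median time in seconds."""
    timings: Dict[str, float] = {}
    for query in queries:
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            search(query)
            runs.append(time.perf_counter() - start)
        timings[query] = median(runs)
    return timings


def toy_search(query: str) -> List[int]:
    """Stand-in backend: return start positions of the word sequence."""
    tokens = "the cat sat on the mat".split()
    words = query.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


# One single-word query and one multi-word string, as in the comparison.
results = time_queries(toy_search, ["the", "on the mat"])
for query, seconds in results.items():
    print(f"{query!r}: {seconds:.6f} s")
```

To compare real systems, `toy_search` would be replaced by a function that sends the query to the corpus manager being tested. The median is used so that a single slow run does not distort the result.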