User:Gorka lozano/Spell-checking of other languages

In English, apart from some jargon and modified words, most words have a single spelling that can be checked in a typical dictionary. In many languages, however, it is common to combine words in different ways. In German, for instance, compound nouns are frequently used to create new words. Some scripts do not clearly separate one word from another, which requires word-splitting algorithms. Consequently, each of these features presents a challenge to ordinary spell checkers.
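As a rough illustration of why compounds are hard for a dictionary-based checker, the following minimal sketch tries to split an unknown word into known parts. The function name `split_compound`, the tiny `LEXICON` and the minimum part length are assumptions made for this example; a real German checker would also have to handle linking elements between the parts, which this sketch ignores.

```python
def split_compound(word, lexicon, min_part=3):
    """Return a list of lexicon entries that concatenate to `word`, or None.

    Hypothetical helper: greedy depth-first search over split points.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try every split point that leaves at least `min_part` letters on each side.
    for i in range(min_part, len(word) - min_part + 1):
        head = word[:i]
        if head in lexicon:
            rest = split_compound(word[i:], lexicon, min_part)
            if rest is not None:
                return [head] + rest
    return None


# Toy lexicon, invented for the example.
LEXICON = {"haus", "tür", "schlüssel"}

print(split_compound("Haustürschlüssel", LEXICON))  # ['haus', 'tür', 'schlüssel']
print(split_compound("Hausxyz", LEXICON))           # None
```

A checker that only looks up whole words would reject "Haustürschlüssel" unless that exact compound happened to be listed.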

A spell checker, either a separate program or a word-processing function, tests whether words are spelled correctly. It can check a marked block, every word in a document, or a group of documents. Advanced systems check spelling as the user types and can correct common typos and misspellings. Most automatic spell checkers check each word in isolation, which means that they do not take the context into account, even though it is very relevant. Moreover, as each language presents its own difficulties, correcting the spelling of English is not the same as correcting that of Spanish. Among other things, the gender of words, their order and whether or not they are polysemous represent difficulties for the spell checker.
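The limitation of checking words in isolation can be sketched as follows. The function `misspelled` and the toy `LEXICON` are assumptions made for illustration, not the behaviour of any particular product.

```python
import re


def misspelled(text, lexicon):
    """Return the words of `text` that are not in `lexicon` (isolated-word check)."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in lexicon]


# Toy lexicon, invented for the example.
LEXICON = {"i", "want", "a", "piece", "peace", "of", "cake"}

print(misspelled("I want a peice of cake", LEXICON))  # ['peice']
print(misspelled("I want a peace of cake", LEXICON))  # []
```

The second sentence passes because "peace" is a valid word on its own; only a checker that looks at the surrounding words could flag it.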

This article looks at several languages and describes the difficulties each of them presents for spell checkers.

English
English is spoken by more than 1,000 million people all around the world and, in contrast with many other languages, it has no language academy to define the officially accepted words and spellings. This fact affects the task of spell checkers. Nevertheless, well-known dictionaries such as "The Oxford English Dictionary" tell us that there are more than 600,000 words in this language.

With regard to English grammar, inflection is minimal. There is no grammatical gender or adjectival agreement. Besides, case marking has disappeared except in some pronouns. However, there are some strong, or irregular, verbs that survive from the language's Germanic origins and have their own forms, which differ from the root. Nevertheless, English has developed features that make it easy to tell whether a word is a subject, an object or a verb, since it has a very strict structure of subject + verb + complements. It should also be mentioned that this strict word order changes when auxiliary verbs are involved, as in questions, the passive and the progressive aspect, and in negation, where the negative comes before the main verb or attaches to the auxiliary.

Summing up, English is an easy language to check despite the absence of an institution to regulate it: on the one hand there are no genders, and on the other there are no cases, so words undergo few alterations that could make checking them difficult.

=Middle difficulty languages for spell-checkers=

Italian
Italian can be considered a language of middle difficulty in modern Europe. Modern Italian developed in the 13th and 14th centuries out of Latin and numerous local dialects. At that time Dante, Boccaccio and Petrarch revolutionized literature. They all lived in Florence or Tuscany and turned the Florentine dialect into the standard variety. Nowadays, many terms from art history and music theory are Italian. In fact, it is the perfect language in which to discover European history. The following characteristics of Italian make it a language of middle difficulty for the spell checker:
 * It is pronounced phonetically.
 * Every letter corresponds to a distinguishable sound. What is more, there are hardly any differences between pronunciation and spelling. However, pronunciation can vary considerably from one region to another.
 * Deference and politeness are expressed by the switch between formal 'lei' and informal 'tu'.
 * Italian shares roughly 85% lexical similarity with Spanish and French, which are also of middle difficulty.
 * Italian has a highly musical quality, as almost all words end in a vowel.
 * Besides, inflection, declension and grammatical gender are also relevant features of Italian grammar.
 * Vowels can feature diacritic marks: à, è, ì, ò, and ù.
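Because the diacritic marks above are easy to omit when typing, a checker can look for dictionary entries that differ from the typed word only in their accents. The sketch below illustrates that idea under stated assumptions: `strip_accents`, `accent_suggestions` and the toy `LEXICON` are names invented for this example.

```python
import unicodedata


def strip_accents(word):
    """Remove combining marks, e.g. 'città' -> 'citta'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_suggestions(word, lexicon):
    """Return lexicon entries that match `word` once accents are ignored."""
    typed = word.lower()
    key = strip_accents(typed)
    return sorted(w for w in lexicon if strip_accents(w) == key and w != typed)


# Toy lexicon, invented for the example.
LEXICON = {"città", "perché", "e", "è", "più"}

print(accent_suggestions("citta", LEXICON))  # ['città']
print(accent_suggestions("e", LEXICON))      # ['è']
```

The second call shows the limit of an isolated check: both "e" (and) and "è" (is) are valid words, so only the context can decide which one was intended.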

Finally, the dialects spoken in Italy should be mentioned. Some inhabitants speak only standard Italian, while others mix it with dialects:
 * 44% of them speak only Italian.
 * 51% of them mix Italian with a dialect.
 * 5% of them speak only a dialect.

French
French is considered one of the most important languages in the world, since it is spoken by almost 500 million people. Roughly 90 million people speak it as their first language, around 190 million as their second language and around 200 million as an acquired tongue. It is spoken in 54 countries around the world. In addition, there is a community named La Francophonie, which is formed by French-speaking countries and a large number of islands across the Pacific Ocean.

Owing to its Latin origin, French is quite similar to some other European languages, as some of its characteristics show:
 * Gender: nouns are classified as masculine or feminine. Generally, adjectives used to describe feminine nouns end in e.
 * The article: le (the masculine form of the English article the) is used with masculine words, while la (the feminine form) is used with feminine words. There is also a third form, l’, which is used when a word begins with a vowel. For instance, the word enfant, which means child, whether masculine or feminine, is written l'enfant.
 * Accents: French has five kinds of diacritical marks (acute, grave, circumflex, diaeresis and cedilla). They are placed over vowels or under the letter c to indicate a change in pronunciation. The resulting characters include à, â, é, è, ê, ë, î, ï, ô, û, ù, ü and ç. The ç is pronounced as an s. Accents are so important that a single one can change the whole meaning of a sentence.
 * Plural forms of French words are usually created by adding s or x to the singular. For example, frère becomes frères and beau becomes beaux, so the plural of beau-frère (brother-in-law) is beaux-frères (brothers-in-law).
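The elided article described above means that a token such as l'enfant has to be split before the noun can be looked up in a dictionary. A minimal sketch of that step follows; `split_elision` is a hypothetical helper, and the list of elided prefixes other than l' is an assumption added for the example, not something the text above specifies.

```python
import re

# Elided forms followed by a straight or typographic apostrophe.
# Only l' comes from the text above; the other prefixes are assumptions.
ELISION = re.compile(r"^([cdjlmnst]|qu)['’](.+)$", re.IGNORECASE)


def split_elision(token):
    """Split "l'enfant" into ["l'", "enfant"]; return other tokens unchanged."""
    match = ELISION.match(token)
    if match:
        return [match.group(1) + "'", match.group(2)]
    return [token]


print(split_elision("l'enfant"))  # ["l'", 'enfant']
print(split_elision("frère"))     # ['frère']
```

After the split, each part can be checked against the dictionary separately.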

Moreover, as in English, the forms of some words vary according to how they are used in a sentence, so a spell checker needs to be aware that some words vary with usage.

Even so, probably one of the greatest difficulties of French is that words are not pronounced the way they are written. This makes the language harder to learn, write and speak.

Spanish
General characteristics


 * Spanish is written using the Latin alphabet, with the addition of the character ñ (eñe, representing the phoneme /ɲ/, a letter distinct from n, although typographically composed of an n with a tilde) and the digraphs ch (che, representing the phoneme /tʃ/) and ll (elle, representing the phoneme /ʎ/). However, the digraph rr (erre fuerte, "strong r", erre doble, "double r", or simply erre), which also represents a distinct phoneme /r/, is not similarly regarded as a single letter. Since 1994, the digraphs ch and ll are to be treated as letter pairs for collation purposes, though they remain a part of the alphabet.
 * It is right-branching, uses prepositions, and usually, though not always, places adjectives after nouns, like most other Romance languages. Its syntax is generally Subject Verb Object, though variations are common.
 * It is a pro-drop language (allows the deletion of pronouns when pragmatically unnecessary) and verb-framed.
 * Spanish is a syllable-timed language, so each syllable has approximately the same duration regardless of stress. Stress most often occurs on one of the last three syllables of a word, with some rare exceptions on the fourth-to-last. The tendencies of stress assignment are described under Written Accents.
 * Spanish is a relatively inflected language, with a two-gender system and about fifty conjugated forms per verb, but limited inflection of nouns, adjectives, and determiners.

Written Accents
 * Pronunciation can be entirely determined from spelling. A typical Spanish word is stressed on the syllable before the last if it ends with a vowel (not including y) or with a vowel followed by n or s; it is stressed on the last syllable otherwise. Exceptions to this rule are indicated by placing an acute accent on the stressed vowel. The acute accent is also used to distinguish between certain homophones, especially when one of them is a stressed word and the other is a clitic.
 * The interrogative pronouns also receive accents in direct or indirect questions, and some demonstratives can be accented when used as pronouns. The conjunction "o" is written with an accent between numerals so as not to be confused with a zero.
 * When u is written between g and a front vowel (e or i) and should be pronounced, it is written with a diaeresis (ü) to indicate that it is not silent, as it normally would be.
 * In words ending in a vowel, /n/ or /s/, stress most often falls on the penultimate syllable. In words ending in any other consonant, stress more often falls on the last syllable. Preantepenultimate stress occurs rarely, where a clitic follows certain verbal forms.
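The default stress rule above depends only on how a word ends, so it can be sketched without splitting the word into syllables. The function name `expected_stress` and its return labels are assumptions made for this example; the sketch ignores monosyllables and the homophone cases, and it does not locate the stressed syllable itself.

```python
ACCENTED = set("áéíóú")
VOWELS = set("aeiou")  # y is deliberately not counted as a vowel here


def expected_stress(word):
    """Classify a Spanish word by the stress rule described above.

    Returns 'marked' if a written accent overrides the default,
    otherwise 'penultimate' or 'last' according to the word ending.
    """
    w = word.lower()
    if not w:
        raise ValueError("empty word")
    if any(ch in ACCENTED for ch in w):
        return "marked"
    if w[-1] in VOWELS:
        return "penultimate"
    if len(w) >= 2 and w[-1] in "ns" and w[-2] in VOWELS:
        return "penultimate"
    return "last"


for example in ["casa", "hablan", "reloj", "canción", "estoy"]:
    print(example, expected_stress(example))
```

A checker could compare this default with the stress recorded for a word in its dictionary to detect a missing or superfluous written accent.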

=High difficulty languages for spell-checkers=

German
German could be considered a pain in the neck for spell checkers because of its difficulty. Despite being spoken by more than 100 million people in the world and being commonly used in business, since Germany is one of the most industrialized countries in the world, it is far less popular than languages such as French, Spanish or Italian.

This complexity derives from, among other things, the existence of multiple noun constructions, the order of the words in sentences and the use of three genders. Even native speakers acknowledge its difficulty, let alone students.

Further information on these obstacles, which face anyone trying to learn the language, is given below:
 * Multiple noun constructions: unlike in languages such as English, every German noun carries a grammatical gender, and a few nouns change meaning with it. For example, der See means "lake" while die See means "sea".
 * Word order: in contrast with the simpler subject-verb order of other languages, German word order depends on the clause type and the verb form. The finite verb takes second position in main clauses, while participles and infinitives move to the end. Complex rules therefore have to be learnt to avoid sounding ridiculous in front of a native speaker.
 * Genders: these are, to a certain extent, the basis of the language. Although there are just three (der, die, das: masculine, feminine, neuter), the language may become a headache for anyone who does not learn each word together with its gender.

Basque
The Basque language is spoken by about 660,000 people at the western end of the Pyrenees, on the Bay of Biscay. The Franco-Spanish frontier runs through the middle of the country, leaving 80,000 speakers on the French side and half a million on the Spanish side.

Studies have tried to link Basque to other languages, but it is considered a language isolate, since it has no demonstrable relationship with any other language. However, Basque has words from other languages such as Latin, Spanish, French, Celtic and Arabic. Furthermore, it is written with the Latin alphabet, but the letters 'c q v w y' are not considered part of the alphabet.

Anyone learning Basque as a second language will meet some of the more obvious difficulties of a language that differs greatly from English.
 * First and foremost, the sentence structure is not similar to that of English. For instance, subjects of transitive verbs carry an ergative marker that distinguishes them from subjects of intransitive verbs.
 * Secondly, the verb carries complexities of meaning that English speakers need verbs, prepositions, subject pronouns, and direct and indirect object pronouns to express.
 * Moreover, the structural difficulties of the language are compounded by the wide variety of existing dialects.
 * Furthermore, Basque has neither grammatical gender nor noun classes. What is more, morphological sex-marking is almost absent.
 * Besides, only noun phrases are inflected. Noun phrases contain determiners that can be definite or indefinite. There are four definite determiners: three demonstratives and the definite article (a suffix). These four distinguish number (singular and plural). The other determiners are indefinite and cannot distinguish number.
 * Finally, with regard to word order, it is Subject-Object-Verb, but it is not rigid. The major phrases, including the verb, can be permuted freely, and this variation is used for thematic purposes. The order of elements within major phrases, however, is strict. Basque is head-final: all modifiers precede their heads, except lexical adjectives. This holds even for syntactically complex modifiers, such as relative clauses. Basque is exclusively postpositional.

In brief, Basque is a highly difficult language, since its grammar differs in many ways from that of other languages in the world.

=Notes=

=See also=
 * Spell checker
 * English language
 * Basque language
 * Italian language
 * French language
 * Spanish language

=External links=
 * What is a spell-checker?. In Answers.com. Retrieved 13:51, May 1, 2009, from http://www.answers.com/What%20is%20a%20spell-checker%3F
 * The Oxford English Dictionary. 2nd ed. 1989. OED Online. Oxford University Press.
 * The triumph of English (Dec 20th 2001) From Economist.com. Retrieved 3rd May 2009, 20:25. In http://www.economist.com/world/europe/displayStory.cfm?Story_ID=883997
 * The Cambridge Grammar of the English Language. From Cambridge University Press. 3rd May 2009. 20:40 In http://catdir.loc.gov/catdir/samples/cam033/2001025630.pdf
 * A Simple Spanish Part of Speech Tagger for Detection and Correction of Accentuation Error Springer Berlin / Heidelberg https://commerce.metapress.com/content/g8rcrulhycffdagm/resource-secured/?target=fulltext.pdf&sid=4ftola455d24e155uyerzfun&sh=www.springerlink.com
 * Buber's Basque Page: Basque. From Buber by Larry Trask. Retrieved 20:10, May 3, 2009, from http://www.buber.net/Basque/Euskara/lang.lt.html
 * Basque Language. From English Pen. Retrieved 20:20, May 3, 2009, from http://www.englishpen.org/writersintranslation/magazineofliteratureintranslat/basquecountry/basquelanguage/
 * White, L. & McClanahan, T. Amerikanuan! Basques in the High Desert. Translating the culture. From http://www.sde.idaho.gov/InternationalEducation/docs/Basque/TranslatingtheBasqueCulture.pdf
 * The Italian language. In Language Capitals (LC). Retrieved 12:04, April 4, 2009, from http://www.language-capitals.com/italian_facts.php
 * Language characteristics. In Research Guidance: French – Genealogical Word List. Retrieved 11:41, May 6, 2009, from http://www.familysearch.org/Eng/search/RG/guide/WLFrench.asp

=Further reading=
 * Oliver Farrar Emerson: The History of the English Language, Adamant Media Corporation, 2005.
 * Albert Croll Baugh: A History of the English Language, 2nd Edition, D. Appleton-Century Company, 2007.