Estonian orthography

Estonian orthography is the system used for writing the Estonian language and is based on the Latin alphabet. The Estonian orthography is generally guided by phonemic principles, with each grapheme corresponding to one phoneme.

Alphabet
Due to German and Swedish influence, the Estonian alphabet (eesti tähestik) has the letters Ä, Ö, and Ü (A, O, and U with diaeresis), which represent the vowel sounds /æ/, /ø/, and /y/, respectively. Unlike German umlauts, they are considered, and alphabetised as, separate letters. The most distinctive letter in the Estonian alphabet, however, is Õ (O with tilde), which was added to the alphabet in the 19th century by Otto Wilhelm Masing and stands for the vowel /ɤ/. In addition, the alphabet differs from the basic Latin alphabet by the addition of the letters Š and Ž (S and Z with caron/háček), and by the position of Z: it has been moved from the end to between S and T (or Š and Ž).

The official Estonian alphabet has 27 letters: A, B, D, E, F, G, H, I, J, K, L, M, N, O, P, R, S, Š, Z, Ž, T, U, V, Õ, Ä, Ö, Ü. The letters F, Š, Z, Ž are so-called "foreign letters" (võõrtähed), and occur only in loanwords and foreign proper names. Occasionally, the alphabet is recited without them, and thus has only 23 letters: A, B, D, E, G, H, I, J, K, L, M, N, O, P, R, S, T, U, V, Õ, Ä, Ö, Ü.
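Because Z, Ž, Õ, Ä, Ö, and Ü do not sit where a plain code-point sort would put them, alphabetising Estonian words needs an explicit letter order. The following is a minimal sketch, assuming only the 27-letter order listed above; the names `ESTONIAN_ALPHABET`, `RANK`, and `estonian_sort_key` are illustrative, and characters outside the official alphabet are simply sorted last.

```python
# Official 27-letter order as listed above; Z and Ž sit between Š and T.
ESTONIAN_ALPHABET = "abdefghijklmnoprsšzžtuvõäöü"
RANK = {letter: index for index, letter in enumerate(ESTONIAN_ALPHABET)}


def estonian_sort_key(word: str) -> list[int]:
    """Map each letter to its alphabet position; unknown characters sort last."""
    return [RANK.get(ch, len(ESTONIAN_ALPHABET)) for ch in word.lower()]


words = ["öö", "zooloog", "tuba", "õun", "šokolaad", "sai", "äär", "vesi"]
sorted_words = sorted(words, key=estonian_sort_key)
print(sorted_words)
```

A production system would more likely rely on a locale-aware collation library; the explicit table is used here only to make the letter order visible.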

Additionally, C, Q, W, X, and Y are used in writing foreign proper names. They do not occur in Estonian words, and are not officially part of the alphabet. Including all the foreign letters, the alphabet consists of the following 32 letters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, Š, Z, Ž, T, U, V, W, Õ, Ä, Ö, Ü, X, Y.

In Blackletter script W was used instead of V. In some reference works (e. g. Estonian Soviet Encyclopedia), V and W were sorted as if they were one and the same letter.

Johannes Aavik suggested that the letter Ü be replaced by Y, as it has been in the Finnish alphabet.

Double letters are used to write half-long and overlong vowels and consonants, e.g. aa /aː/ or /aːː/, nn /nː/ or /nːː/, kk /kːː/. For more information, see below.

As the distinction between voiced and voiceless plosives is not native to Estonian, the names of the letters 'b', 'd', 'g' may be pronounced [peː], [teː], [keː], so the letters 'b' and 'd' are also named nõrk B (weak B) and nõrk D (weak D) to distinguish them from tugev P (strong P) and tugev T (strong T). For the usage of these letters, see below.

Orthographic principles
Although Estonian orthography is generally guided by phonemic principles, with each grapheme corresponding to one phoneme, there are some historical and morphological deviations from this: for example, the initial letter 'h' in words, the preservation of the morpheme in declension (writing b, g, d in places where p, k, t are pronounced), and the use of 'i' and 'j'. Where it is impractical or impossible to type š and ž, they are substituted with sh and zh in some written texts, although this is considered incorrect. Otherwise, the h in sh represents a voiceless glottal fricative, as in Pasha (pas-ha); this also applies to some foreign names.

Some features of the modern Estonian orthography are:
 * Word-initial b, d, g occur only in loanwords and are normally pronounced as [p], [t], [k]. Some old loanwords are spelled with p, t, k instead of etymological b, d, g: pank 'bank'. Word-medially and word-finally, b, d, g represent short plosives /p, t, k/ (which may be pronounced as partially voiced consonants), p, t, k represent half-long plosives /pː, tː, kː/, and pp, tt, kk represent overlong plosives /pːː, tːː, kːː/; for example: kabi /kɑpi/ 'hoof', kapi /kɑpːi/ 'wardrobe [gen sg]', kappi /kɑpːːi/ 'wardrobe [ill sg]'.
 * Before and after b, p, d, t, g, k, s, h, f, š, z, ž, the sounds [p], [t], [k] are written as p, t, k, with some exceptions due to morphology or etymology. For example, the suffixed particle -gi 'too, also' may become -ki, but does not alter the spelling of the stem, so kõrb 'desert' + -gi becomes kõrbki.
 * Word-initial h is usually dropped in spontaneous speech, but should be represented in writing.
 * The letter j is used at the beginning of syllables, but i is used at the end of diphthongs. Double j is used only in some illative case forms. The spelling üü before vowels corresponds to the pronunciation [yij]: müüa 'to sell' (da-infinitive of müüma 'to sell'). The spelling üi is used only in the loanwords rüiu, rüiuvaip, süit. Between i and vowels, the epenthetic sound [j] is pronounced but not written. It is, however, written in the suffix -ja.
 * Vowels and the consonants h, j, l, m, n, r, s, v are written single when they are short, double when they are half-long or overlong: vere /vere/ 'blood [gen sg]', veere /veːre/ 'edge [gen sg]', veere /veːːre/ 'roll [imp 2nd sg]'; lina /linɑ/ 'sheet', linna /linːɑ/ 'town [gen sg]', linna /linːːɑ/ 'town [ill sg]'.
 * Diphthongs and consonant combinations are written as combinations of single letters, regardless of whether they are pronounced short or long. Only s after l, m, n, r may be doubled if not followed by another consonant (valss 'waltz'); otherwise the combinations "consonant + double consonant" and "double consonant + consonant" occur only at morpheme boundaries, e.g. modernne 'modern' (-ne is a suffix), pappkarp 'cardboard box' (from papp 'cardboard' and karp 'box'). However, a double consonant at the end of a root is simplified before a suffix beginning with a consonant (except -gi/-ki): linlane 'townsman' (from linn 'town').
 * The single word-medial or word-final letters f and š represent the half-long consonants /fː/, /ʃː/; the double letters ff and šš represent the overlong consonants /fːː/, /ʃːː/. After consonants, f and š are always written single, regardless of whether they are pronounced half-long or overlong.
 * Palatalization is not indicated in writing, e. g. kann /ˈkɑnː/ 'jug' — kann /ˈkɑnʲː/ 'toy'. It occurs in words that have i in declension: kanni 'toy [gen sg and part sg]'.
 * Stress is not indicated in writing. Usually it falls on the first syllable, but there are a few exceptions with the stress on the second syllable: aitäh 'thanks', sõbranna 'female friend'. Often the original stress is preserved in loanwords, such as ideaal 'ideal', professor 'professor'; presence of long vowels (as in ideaal) also shows stress.
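The three-way length contrast for plosives described in the list above can be summarised as a small lookup. This is only a sketch of the word-medial and word-final rule (short written b, d, g; half-long written p, t, k; overlong written pp, tt, kk); it deliberately ignores the exceptions next to other obstruents and in loanwords, and the function name `spell_plosive` and its string labels are assumptions made for illustration.

```python
# "Weak" letters used for short plosives in word-medial and word-final position.
WEAK_LETTER = {"p": "b", "t": "d", "k": "g"}


def spell_plosive(phoneme: str, length: str) -> str:
    """Return the spelling of /p/, /t/ or /k/ for a given phonological length."""
    if phoneme not in WEAK_LETTER:
        raise ValueError(f"not a plosive handled here: {phoneme!r}")
    if length == "short":
        return WEAK_LETTER[phoneme]   # kabi
    if length == "half-long":
        return phoneme                # kapi
    if length == "overlong":
        return phoneme * 2            # kappi
    raise ValueError(f"unknown length: {length!r}")


for length in ("short", "half-long", "overlong"):
    print("ka" + spell_plosive("p", length) + "i")
```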

Syllabification
One consonant between two vowels belongs to the following syllable: kala 'fish' is syllabified ka-la. Consonant combinations are syllabified before the last consonant: linna 'town [gen sg]' is syllabified lin-na, tutvus 'acquaintance' is syllabified tut-vus. Consonant digraphs and trigraphs in foreign names are regarded as single consonants: Manchester is syllabified Man-ches-ter. Two vowels usually form a long vowel or a diphthong, e.g. laulu 'song [gen sg]' is syllabified lau-lu. However, a hiatus is formed at morpheme boundaries, e.g. avaus 'opening' is syllabified a-va-us, as the word is composed of the root ava- and the suffix -us. Combinations of three vowel letters represent a hiatus of a long vowel or a diphthong and another vowel, e.g. põuane 'dry, droughty, arid (lacking rain)' is syllabified põu-a-ne; but some loanwords have a hiatus of a short vowel followed by a long vowel: oaas 'oasis' is syllabified o-aas. Compound words are syllabified as combinations of their parts: vanaema 'grandmother' is syllabified va-na-e-ma, as the word is composed of vana 'old' and ema 'mother'. Etymologically compound loanwords and foreign names may be syllabified as compound or simple words: fotograaf 'photographer' is syllabified fo-to-graaf or fo-tog-raaf, Petrograd is syllabified Pet-ro-grad or Pet-rog-rad.

These syllabification rules are used for hyphenating words at the end of a line, with the additional rule that a single letter is not left alone on a line.
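The basic rules above (a single consonant goes with the following vowel, a consonant cluster is split before its last consonant, two vowel letters form one nucleus, and a third vowel letter starts a new syllable) can be sketched as a short function. This is an illustrative simplification, not a complete syllabifier: the function name `syllabify` is hypothetical, and the sketch knows nothing about morpheme boundaries, compounds, loanwords, or foreign digraphs, so words such as avaus, vanaema, oaas, and Manchester come out wrong.

```python
VOWELS = set("aeiouõäöü")


def syllabify(word: str) -> str:
    """Split a simple native word into syllables joined by hyphens."""
    # Group the letters into alternating runs of vowels and consonants.
    runs: list[str] = []
    for ch in word.lower():
        if runs and (ch in VOWELS) == (runs[-1][0] in VOWELS):
            runs[-1] += ch
        else:
            runs.append(ch)

    syllables: list[str] = []
    current = ""
    for index, run in enumerate(runs):
        if run[0] in VOWELS:
            # Two vowel letters form one nucleus; further vowels start new syllables.
            current += run[:2]
            rest = run[2:]
            while rest:
                syllables.append(current)
                current, rest = rest[:2], rest[2:]
        elif current and index < len(runs) - 1:
            # Between vowels: split before the last consonant of the cluster.
            syllables.append(current + run[:-1])
            current = run[-1]
        else:
            # Word-initial or word-final consonants stay in their syllable.
            current += run
    if current:
        syllables.append(current)
    return "-".join(syllables)


for example in ("kala", "linna", "tutvus", "laulu", "põuane"):
    print(example, "->", syllabify(example))
```

Handling the exceptions would require a lexicon or morphological analysis, which is why the sketch stops at the purely orthographic rules.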

Foreign words
Loanwords are normally adapted to Estonian spelling: veeb 'web', džäss 'jazz'. However, foreign words and phrases, such as Latin phrases, Italian musical terms, and exotic words, may sometimes be used in their original spelling. Such citations are typographically emphasized using italics and declined using an apostrophe: croissant'id 'croissants'.

Foreign proper names from Latin-script languages are written in their original spelling: Margaret Thatcher, Bordeaux. Names from non-Latin-script languages are written using either Estonian orthographic transcription or established romanization systems. Some geographical names (and some names of historical personalities, such as monarchs) have traditional Estonian forms (including some adapted spellings such as Viin for German Wien 'Vienna').

Derivations from foreign proper names with the suffixes -lik, -lane, -lus, -ism, -ist usually conserve the spelling of names (e. g. thatcherism, bordeaux'lane), but a few are adapted by established tradition: marksism, darvinism, luterlus. Derivations without suffixes or with other suffixes are adapted to Estonian spelling: njuuton 'newton' (physical unit), haimoriit 'maxillary sinusitis' (inflammation of antrum of Highmore), üterbium 'ytterbium', šeikspiroloog 'Shakespearologist', etc.

Expressions such as Celsiuse kraad 'degree Celsius', Cheddari juust 'Cheddar cheese' conserve the spelling of proper names (adding case endings). However, names of plants and animals are usually written in adapted forms, e.g. koloraado mardikas 'Colorado beetle'.

The apostrophe is used when adding case endings to proper names with unusual grapheme-to-phoneme correspondences (such as ending in a consonant orthographically but in a vowel phonetically, or vice versa), e.g. Provence'i (genitive of Provence).

Capitalization
Capital letters are written at the beginning of the first word in a sentence, in proper names, and in official names functioning as proper names. They may also be used in the pronouns Sina 'you (singular)' and Teie 'you (plural, also used as formal singular)' to show respect.

Names of months, days of the week, holidays, Chinese zodiac years, and titles of people such as professor are not capitalized.

Titles of books, films, etc. are written in quotation marks with only the first word and proper names capitalized.

Compound words
Compound words are written as one word, but they are often composed of genitive + nominative and are hard to distinguish from simple word combinations. A compound is considered a single word and written together when: 1) it has a separate meaning, e.g. peatükk 'chapter' but pea tükk 'part of a head'; 2) it is different from the genitive + nominative combination, e.g. vesiveski (nominative + nominative) 'watermill'. In addition, 3) some combinations may be written together or separately, but writing them together is preferred in more complex phrases: erakonna liige 'member of a party' but iga erakonnaliige 'every member of the party'. Rare and long word combinations are typically written separately.

The hyphen is used: 1) in compounds where one of the parts is a letter (C-vitamiin 'vitamin C'), an initialism (teksti-TV 'text TV'), a foreign citation (nalja-show 'joke show') or a word part (kuni-sõna 'word containing kuni'); 2) in compound adjectives where the first part is a proper name; 3) in compound geographical names such as Lõuna-Eesti 'South Estonia'; 4) as a suspended hyphen, e.g. kuld- ja hõbeesemed 'gold and silver things' (also in compound words such as ekspordi-impordipank 'export-import bank'); 5) in "nominative + ablative" adverbs, e.g. päev-päevalt 'day after day'; 6) in dvandva compounds, e.g. isa-ema 'father and mother'; 7) in compound adjectives from word phrases, e.g. katselis-foneetiline 'related to experimental phonetics'; 8) in compound adjectives with coordinating meaning, e.g. eesti-inglise sõnaraamat 'Estonian-English dictionary'; 9) in double names such as Ulla-Liisa. It can optionally be used in unusual compounds such as karusmarja-jahukaste 'gooseberry disease'; in compounds with three or four identical letters in a row (e.g. iga-aastane 'yearly', luu-uure 'bone groove'); in compounds with numbers (see below) or with signs (e.g. +-märk '+ sign'); in the construction 'genitive of a proper name + nominative' after another genitive (e.g. Venemaa Euroopa-osa 'European part of Russia'); in the colloquial construction 'genitive of a proper name + noun' instead of 'noun + proper name', e.g. Kuuse-onu instead of onu Kuusk 'Uncle Kuusk'; in ad hoc compounds such as aega-küll-meeleolu; and in words derived from proper names with two or more components, e.g. françois-villon'lik, buenos-aireslane.

Abbreviations
The abbreviation period (full stop) may be used, but it is not mandatory. Commonly used abbreviations are usually written without the abbreviation period: t, tn, or tän for tänav 'street'; vt for vaata 'see'; jpt for ja paljud teised 'and many others'. Using the abbreviation period is recommended when an abbreviation may be misread as another word: joon. for joonis 'figure, draft' but joon 'line'. If an abbreviation of a word phrase may be mistaken for a word or for another abbreviation, periods are used after every letter but the last one, and spaces are not used: e.m.a for enne meie ajaarvamist but ema 'mother', m.a.j for meie ajaarvamise järgi but maj for majandus 'economy'.

The hyphen is used in some abbreviations of compound words, e.g. ped-dr for pedagoogikadoktor 'doctor of pedagogy', kpt-ltn for kaptenleitnant 'captain lieutenant', especially in the construction abbreviation + complete word, such as rb-paneelid for raudbetoonpaneelid 'reinforced concrete panels'.

Numerals
Numerals may be written in words (üks 'one', kaks 'two', kolm 'three'...) or in figures (1, 2, 3, ...). In Estonian texts, the comma is used as the decimal separator, and the space is used as the thousands separator (in financial documents, the point can be used as the thousands separator to prevent an extra digit from being inserted). The point as a separator is used for dates, daytime, prices, and sports results in meters and centimeters. For prices in euros and cents, writing € 84.95 as well as 84,95 € is accepted. Daytime in hours and minutes (24-hour format) may be written using the point or the colon (without spaces): 16.15 or 16:15; but seconds are separated by the point: 16:15.25. The colon with spaces is used for ratios, e.g. 2 : 3.

When written in words, numerals with -teist or -teistkümment (11 to 19), -kümmend (tens) and -sada (hundreds) are written together, e. g. viisteist(kümment) 'fifteen', viiskümmend 'fifty', viissada 'five hundred'. Other compound numerals are written separately: kakskümmend viis 'twenty-five'.
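The joining rule can be illustrated with a small number-to-words sketch for 1 to 999. The function name `estonian_numeral` is hypothetical, and the numeral words beyond those quoted in the text (neli, kuus, seitse, kaheksa, üheksa, kümme, sada) are supplied from general knowledge of Estonian rather than from this article; the sketch uses the short -teist forms and writes 100 as sada.

```python
UNITS = ["", "üks", "kaks", "kolm", "neli", "viis",
         "kuus", "seitse", "kaheksa", "üheksa"]


def estonian_numeral(n: int) -> str:
    """Spell out a cardinal numeral from 1 to 999 in the nominative."""
    if not 1 <= n <= 999:
        raise ValueError("only 1..999 are supported in this sketch")
    parts: list[str] = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        # Hundreds are written together with their multiplier: viissada.
        parts.append(("" if hundreds == 1 else UNITS[hundreds]) + "sada")
    if rest == 10:
        parts.append("kümme")
    elif 11 <= rest <= 19:
        # Teens are written together: viisteist.
        parts.append(UNITS[rest - 10] + "teist")
    else:
        tens, units = divmod(rest, 10)
        if tens:
            # Tens are written together: viiskümmend.
            parts.append(UNITS[tens] + "kümmend")
        if units:
            parts.append(UNITS[units])
    # The remaining components are written as separate words.
    return " ".join(parts)


for number in (15, 50, 500, 25):
    print(number, estonian_numeral(number))
```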

For writing ordinal numbers in figures, the ordinal dot is used: 16. for kuueteistkümnes 'the sixteenth'. In some cases, ordinals are written as Roman numerals (without the ordinal dot). Roman numerals followed by a dot may be used in numbered lists.

Case forms of cardinal and ordinal numerals may be written in the form "figures+case ending" with or without a hyphen: 16s or 16-s for kuueteistkümnes 'sixteen [inessive]', 16ndas or 16-ndas for kuueteistkümnendas 'the sixteenth [inessive]'. For case endings beginning with the letter l, the hyphen is mandatory to avoid confusion with the digit 1: 16-le for kuueteistkümnele 'sixteen [allative]'. Case endings after figures are not used when a cardinal or ordinal numeral is in a case concordance with a following noun. Likewise, compound words with numbers written in figures may be written with or without the hyphen: 60vatine lamp or 60-vatine lamp for kuuekümnevatine lamp '60-watt light'.
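The hyphenation rule for case endings after figures reduces to one condition: the hyphen is optional except before endings that begin with l. A minimal sketch follows, with a hypothetical helper name `attach_case_ending` and a `prefer_hyphen` flag standing in for the writer's stylistic choice.

```python
def attach_case_ending(figures: str, ending: str, prefer_hyphen: bool = False) -> str:
    """Attach a case ending to a numeral written in figures."""
    # A hyphen is mandatory before "l" so the letter is not read as the digit 1.
    if ending.startswith("l") or prefer_hyphen:
        return f"{figures}-{ending}"
    return f"{figures}{ending}"


print(attach_case_ending("16", "s"))                      # 16s
print(attach_case_ending("16", "s", prefer_hyphen=True))  # 16-s
print(attach_case_ending("16", "le"))                     # 16-le
```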

Punctuation
The period (full stop) is used at the end of sentences, as an ordinal mark and sometimes as an abbreviation mark and as a number separator (see above).

The comma is used for appositions (but appositions in genitive require the comma only before them), for more than one attribute after a determined word, for enumerations (but the serial comma is not used), between coordinated or subordinated clauses, between direct speech and author's words, before and after parenthetic or vocative phrases, and before and after some other constructions. It is also used between placenames and dates in the nominative case (but not in locative cases); between a surname and a given name, if they are written in this order; before parts of an address; and as a decimal mark.

The colon is used before lists, before direct speech, before explanations, and also in writing daytime and ratios (see above).

The semicolon is used between weakly related parts of sentences, especially containing commas.

The hyphen is used for writing compound words (see above). It is also used for hyphenating words at the end of a line, for declining letters and abbreviations, and optionally for declining acronyms/initialisms, numbers, and symbols.

The dash is used when there appears a generalizing word after an enumeration; instead of the comma for accenting clauses and appositions or for relatively long parenthetical constructions; before words indicating surprise; for slight pauses (interchangeably with the ellipsis); in the meaning "from...to" (instead of the word kuni); for indicating lines or routes (when in attributive function, the hyphen is also accepted); between coordinated attributes if at least one attribute has a hyphen or a space; between remarks of a dialogue written as one line without author's words; as a marker before enumeration items. The dash is not used to indicate omission of a word that would be repeated.

The exclamation and question marks are used at the end of exclamative and interrogative sentences. Occasionally, they may be parenthesized and written after words within sentences to show doubt or surprise. The exclamation mark is also used for addressing people in letters, e.g. Austatud professor Pirk! Using the comma or the colon in this case is considered inappropriate.

The quotation marks, written as „ ”, are used for direct speech, citations, scare quotes, and names of books, documents, episodes, enterprises, etc. Names of plant cultivars may be written in double or in single quotation marks (looking like apostrophes: ’ ’) and are normally italicized. For cited words and phrases, including words in a linguistic context, quotation marks or italics may be used. Quotation marks are not used in the names of institutions, periodicals, awards, wares, and vehicles.

The apostrophe is used for adding case endings and suffixes to foreign names with unusual grapheme-to-phoneme correspondences and to foreign citations in the original spelling (see above). Sometimes the apostrophe is used for adding case endings and suffixes to Estonian names, to make the original form clear: Metsa’le (allative of the surname Metsa), mutt’lik (the apostrophe is used to conserve the spelling of the surname Mutt, otherwise the double consonant would become a single consonant). Also, the apostrophe is sometimes used in poetry to indicate omission of a sound: õitsel', mull', sull' instead of õitsele, mulle, sulle are found in Lydia Koidula's poems. Single quotation marks (’ ’) are used for word meanings in a linguistic context.

The parentheses are used for parenthetical words or sentences, and also for optional parts of words in a linguistic context.

The square brackets are used for the quoting author's notes within citations and for showing pronunciation in linguistic and reference works.

The slash is used for division in fractions and unit symbols, for connecting alternatives, to show line breaks when citing poetry in the single-line format, and for non-calendar years. In practice, it occasionally appears in abbreviations made of more than one word (e.g. õ/a for õppeaasta 'school year'), but this usage is considered nonstandard (correct abbreviation: õa). Spaces are used before and after the slash only if it separates text fragments of more than one word.

The ellipsis is used for slight pauses and for unfinished thoughts. It is surrounded by spaces. Also, the ellipsis is used for bowdlerizing obscene words.

History
Modern Estonian orthography is based on the Newer Orthography created by Eduard Ahrens in the second half of the 19th century on the basis of Finnish orthography. The Older Orthography it replaced was created in the 17th century by Bengt Gottfried Forselius and Johann Hornung on the basis of standard German orthography. In the old orthography, single consonants following short vowels were written double even if they were short (kala 'fish' was written as kalla) and long vowels in an open syllable were written single (looma 'to create' was written as loma). Before Otto Wilhelm Masing introduced the letter õ in the early 19th century, its sound had not been distinguished in writing from ö. Earlier writing in Estonian had by and large used an ad hoc orthography based on Latin and Middle Low German orthography. Some influences of the standard German orthography, for example writing 'W'/'w' instead of 'V'/'v', persisted well into the 1930s.

In Fraktur typesetting (which was common in Estonian publications before the first half of the 20th century), two kinds of the small letter s were distinguished: the short s and the long ſ. The long ſ was used at the beginning and in the middle of syllables, and the short s was used at the end of syllables. For example: kaſs 'cat' — kasſi 'cat [gen. sg., part. sg.]'.

Estonian words and names quoted in international publications from Soviet sources were often back-transliterations from the Russian transliteration. Examples are the use of я ("ya") for ä (e.g. Pyarnu (Пярну) for Pärnu), ы ("y") for õ (e.g. Pylva (Пылва) for Põlva) and ю ("yu") for ü (e.g. Pyussi (Пюсси) for Püssi). Even in the Encyclopædia Britannica one can find "ostrov Khiuma", where "ostrov" means "island" in Russian and "Khiuma" is a back-transliteration from Russian instead of "Hiiumaa" (Hiiumaa > Хийума(а) > Khiuma).