Orthography

An orthography is a set of conventions for writing a language, including norms of spelling, hyphenation, capitalization, word boundaries, emphasis, and punctuation.

Most national and international languages have an established writing system that has undergone substantial standardization, thus exhibiting less dialect variation than the spoken language. These processes can fossilize pronunciation patterns that are no longer routinely observed in speech (e.g., "would" and "should"); they can also reflect deliberate efforts to introduce variability for the sake of national identity, as seen in Noah Webster's efforts to introduce easily noticeable differences between American and British spelling (e.g., "honor" and "honour").

Orthographic norms develop through social and political influence at various levels, such as encounters with print in education, the workplace, and the state. Some nations have established language academies, such as the Académie Française in France and the Royal Spanish Academy in Spain, in an attempt to regulate aspects of the national language, including its orthography. No such authority exists for most languages, including English. Some non-state organizations, such as newspapers of record and academic journals, pursue greater orthographic homogeneity by enforcing a particular style guide or spelling standard, such as Oxford spelling.

Etymology and meaning
The English word orthography dates from the 15th century. It comes from the French orthographie, from Latin orthographia, which derives from Ancient Greek ὀρθός (orthós, 'correct') and γράφειν (gráphein, 'to write').

Orthography is largely concerned with matters of spelling, and in particular the relationship between phonemes and graphemes in a language. Other elements that may be considered part of orthography include hyphenation, capitalization, word breaks/boundaries, emphasis, and punctuation. Orthography thus describes or defines the set of symbols used in writing a language and the conventions that broadly regulate their use.

Most natural languages developed as oral languages, and writing systems have usually been crafted or adapted as ways of representing the spoken language. The rules for doing this tend to become standardized for a given language, leading to the development of an orthography that is generally considered "correct". In linguistics, the term orthography is often used to refer to any method of writing a language, without judgment as to right and wrong, with a scientific understanding that orthographic standardization exists on a spectrum of strength of convention. The original sense of the word, though, implies a dichotomy of correct and incorrect, and the word is still most often used to refer specifically to a thoroughly standardized, prescriptively correct way of writing a language. A distinction may be made here between etic and emic viewpoints: the purely descriptive (etic) view, which simply considers any system that is actually used, and the emic view, which takes account of language users' perceptions of correctness.

Units and notation
Orthographic units, such as letters of an alphabet, are technically called graphemes. These are a type of abstraction, analogous to the phonemes of spoken languages; different physical forms of written symbols are considered to represent the same grapheme if the differences between them are not significant for meaning. Thus, a grapheme can be regarded as an abstraction of a collection of glyphs that are all functionally equivalent. For example, in written English (or other languages using the Latin alphabet), there are two different physical representations (glyphs) of the lowercase letter 'a': a and ɑ. Since, however, the substitution of either of them for the other cannot change the meaning of a word, they are considered to be allographs of the same grapheme, which can be written ⟨a⟩. The italic and boldface forms are also allographic. Graphemes or sequences of them are sometimes placed between angle brackets, as in ⟨b⟩ or ⟨back⟩. This distinguishes them from phonemic transcription, which is placed between slashes, and from phonetic transcription, which is placed between square brackets.
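The relationship between glyphs and graphemes can be sketched in code. The following is a minimal illustration rather than a linguistic tool. It assumes that Unicode compatibility normalization (NFKC) is an acceptable stand-in for "functionally equivalent" in the case of styled letters, and it uses a hand-written table, here given the hypothetical name `ALLOGRAPHS`, for the single-storey ɑ, which Unicode encodes as a separate character.

```python
import unicodedata

# Hypothetical hand-made table: glyph variants that NFKC does not fold,
# but that are treated here as allographs of the grapheme <a>.
ALLOGRAPHS = {"\u0251": "a"}  # U+0251 LATIN SMALL LETTER ALPHA (single-storey form)


def grapheme_of(glyph: str) -> str:
    """Map one glyph to an abstract grapheme label (illustrative only)."""
    folded = unicodedata.normalize("NFKC", glyph)
    return ALLOGRAPHS.get(folded, folded)


variants = [
    "a",           # plain double-storey a
    "\u0251",      # single-storey form
    "\U0001D44E",  # MATHEMATICAL ITALIC SMALL A
    "\U0001D41A",  # MATHEMATICAL BOLD SMALL A
]

# All four glyphs collapse to the same grapheme label.
graphemes = {grapheme_of(v) for v in variants}
print(graphemes)
```

Whether two glyphs count as allographs depends on the writing system in question: in the International Phonetic Alphabet, for instance, a and ɑ are distinct symbols, so a table like the one above would have to be chosen per orthography.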

Types
The writing systems on which orthographies are based can be divided into a number of types, depending on what type of unit each symbol serves to represent. The principal types are logographic (with symbols representing words or morphemes), syllabic (with symbols representing syllables), and alphabetic (with symbols roughly representing phonemes). Many writing systems combine features of more than one of these types, and a number of detailed classifications have been proposed. Japanese is an example of a language that is written using a combination of logographic kanji characters and syllabic hiragana and katakana characters; as with many languages that do not use an alphabetic script, alphabetic romaji characters may also be used as needed.
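The mixture of types in written Japanese can be made concrete with a rough sketch. The code below assumes that Unicode character names are an adequate proxy for the type of each symbol; the function name `script_type` and the sample string are invented for illustration.

```python
import unicodedata
from collections import Counter


def script_type(ch: str) -> str:
    """Classify a character by its Unicode name (a rough proxy only)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "logographic (kanji)"
    if name.startswith(("HIRAGANA LETTER", "KATAKANA LETTER")):
        return "syllabic (kana)"
    if name.startswith("LATIN"):
        return "alphabetic (romaji)"
    return "other"


# Made-up sample mixing two kanji, one hiragana, two katakana, three Latin letters.
sample = "漢字とカナabc"
counts = Counter(script_type(ch) for ch in sample)
for kind, n in counts.items():
    print(kind, n)
```

A classification by character name says nothing about how a symbol functions in a particular word, so this only approximates the logographic, syllabic, and alphabetic distinction described above.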

Correspondence with pronunciation
Orthographies that use alphabets and syllabaries are based on the principle that the written symbols (graphemes) correspond to units of sound of the spoken language: phonemes in the former case, and syllables in the latter. However, in virtually all cases, this correspondence is not exact. Different languages' orthographies offer different degrees of correspondence between spelling and pronunciation. English, French, Danish, and Thai orthographies, for example, are highly irregular, whereas the orthographies of languages such as Russian, German, and Spanish represent pronunciation much more faithfully, although the correspondence between letters and phonemes is still not exact. Finnish, Turkish, and Serbo-Croatian orthographies more consistently approximate the principle of "one letter per sound."

An orthography in which the correspondences between spelling and pronunciation are highly complex or inconsistent is called a deep orthography (or less formally, the language is said to have irregular spelling). An orthography with relatively simple and consistent correspondences is called shallow (and the language has regular spelling).

One of the main reasons why spelling and pronunciation diverge is that sound changes taking place in the spoken language are not always reflected in the orthography, and hence spellings correspond to historical rather than present-day pronunciation. One consequence of this is that many spellings come to reflect a word's morphophonemic structure rather than its purely phonemic structure (for example, the English regular past tense morpheme is consistently spelled -ed in spite of its different pronunciations in various words).
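The -ed example can be sketched as a small rule: one spelling, three pronunciations chosen by the last sound of the stem. The version below is a simplified assumption, not a full phonological description. The set `VOICELESS` lists only some common voiceless consonants, the function name `past_tense_suffix` is hypothetical, and the transcription of the syllabic variant as /ɪd/ follows one common convention (some accents have /əd/).

```python
# Stem-final sounds are given as IPA strings. This set is a simplified
# assumption covering common voiceless English consonants other than /t/.
VOICELESS = {"p", "k", "f", "s", "ʃ", "tʃ", "θ"}


def past_tense_suffix(final_sound: str) -> str:
    """Return the pronunciation of regular -ed after a given stem-final sound."""
    if final_sound in {"t", "d"}:
        return "ɪd"  # extra syllable, as in "wanted"
    if final_sound in VOICELESS:
        return "t"   # as in "walked"
    return "d"       # after voiced consonants and vowels, as in "played"


# Word -> final sound of its stem (hand-transcribed for this example).
examples = {"wanted": "t", "walked": "k", "played": "eɪ"}
for word, final in examples.items():
    print(word, "-ed is pronounced", "/" + past_tense_suffix(final) + "/")
```

The spelling stays constant across all three cases, which is what makes it morphophonemic rather than purely phonemic.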

The syllabary systems of Japanese (hiragana and katakana) are examples of almost perfectly shallow orthographies—the kana correspond with almost perfect consistency to the spoken syllables, although with a few exceptions where symbols reflect historical or morphophonemic features: notably the use of ぢ ji and づ zu (rather than じ ji and ず zu, their pronunciation in standard Tokyo dialect) when the character is a voicing of an underlying ち or つ (see rendaku), and the use of は, を, and へ to represent the sounds わ, お, and え, as relics of historical kana usage.

Korean hangul and the Tibetan script were also originally extremely shallow orthographies, but as representations of the modern languages they frequently also reflect morphophonemic features.

For full discussion of degrees of correspondence between spelling and pronunciation in alphabetic orthographies, including reasons why such correspondence may break down, see Phonemic orthography.

Defective orthographies
An orthography based on a correspondence to phonemes may sometimes lack characters to represent all the phonemic distinctions in the language. This is called a defective orthography. An example in English is the lack of any indication of stress. Another is the digraph ⟨th⟩, which represents two different phonemes (as in then and thin) and replaced the old letters ⟨ð⟩ and ⟨þ⟩. A more systematic example is that of abjads like the Arabic and Hebrew alphabets, in which the short vowels are normally left unwritten and must be inferred by the reader.
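One way to picture a defective orthography is as a phoneme-to-spelling mapping in which two phonemes share a spelling, so the distinction cannot be recovered from the written form alone. The sketch below uses a tiny hand-picked table (`PHONEME_TO_SPELLING`, a hypothetical name covering only a few English phonemes) and reports the spellings that stand for more than one phoneme.

```python
from collections import defaultdict

# Toy phoneme-to-spelling table: a small hand-picked subset, for illustration.
PHONEME_TO_SPELLING = {
    "θ": "th",  # as in "thin"
    "ð": "th",  # as in "then"
    "ʃ": "sh",
    "b": "b",
}


def ambiguous_spellings(table):
    """Return the spellings that represent more than one phoneme."""
    by_spelling = defaultdict(list)
    for phoneme, spelling in table.items():
        by_spelling[spelling].append(phoneme)
    return {s: sorted(p) for s, p in by_spelling.items() if len(p) > 1}


result = ambiguous_spellings(PHONEME_TO_SPELLING)
print(result)
```

With this table, only the spelling "th" is reported, since it covers both /θ/ and /ð/. A real analysis would need a full phoneme inventory and would also have to account for context-dependent spellings.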

When an alphabet is borrowed from its original language for use with a new language (as has been done with the Latin alphabet for many languages, or Japanese katakana for non-Japanese words), it often proves defective in representing the new language's phonemes. Sometimes this problem is addressed by the use of such devices as digraphs (such as ⟨sh⟩ and ⟨ch⟩ in English, where pairs of letters represent single sounds), diacritics (like the caron on the letters ⟨š⟩ and ⟨č⟩, which represent those same sounds in Czech), or the addition of completely new symbols (as some languages have introduced the letter ⟨w⟩ to the Latin alphabet) or of symbols from another alphabet, such as the rune ⟨þ⟩ in Icelandic.
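The difference between a letter modified by a diacritic and a wholly separate symbol is visible in how Unicode encodes them. As a rough illustration, the sketch below applies canonical decomposition (NFD) to the Czech letters č and š and to the Icelandic letter þ; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> list:
    """List the Unicode names of the code points in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


letters = ["\u010d", "\u0161", "\u00fe"]  # č, š, þ
for letter in letters:
    print(letter, describe(letter))
```

The two Czech letters decompose into a base Latin letter followed by COMBINING CARON, while þ has no decomposition and remains a single LATIN SMALL LETTER THORN. Unicode's encoding choices are a technical convention, so this mirrors the distinction drawn above rather than proving anything about the orthographies themselves.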

After the classical period, Greek developed a lowercase letter system with diacritics to enable foreigners to learn pronunciation and grammatical features. However, as the pronunciation of letters changed over time, the diacritics were reduced to representing the stressed syllable. In Modern Greek typesetting, this system has been simplified to a single accent that indicates which syllable is stressed.