Linguistic typology

Linguistic typology (or language typology) is a field of linguistics that studies and classifies languages according to their structural features to allow their comparison. Its aim is to describe and explain the structural diversity and the common properties of the world's languages. Its subdisciplines include, but are not limited to, phonological typology, which deals with sound features; syntactic typology, which deals with word order and form; lexical typology, which deals with language vocabulary; and theoretical typology, which aims to explain the universal tendencies.

Linguistic typology is contrasted with genealogical linguistics on the grounds that typology groups languages or their grammatical features based on formal similarities rather than historical descent. The issue of genealogical relation is, however, relevant to typology because modern data sets aim to be representative and unbiased. Samples are collected evenly from different language families, emphasizing the importance of lesser-known languages in gaining insight into human language.

History
Speculation about the existence of a (logical) general or universal grammar underlying all languages was published in the Middle Ages, especially by the Modistae school. At the time, Latin was the model language of linguistics, although transcribing Irish and Icelandic into the Latin alphabet proved problematic. The cross-linguistic dimension of linguistics was established in the Renaissance period. For example, Grammaticae quadrilinguis partitiones (1544) by Johannes Drosaeus compared French and the three ‘holy languages’, Hebrew, Greek, and Latin. The approach was expanded by the Port-Royal Grammar (1660) of Antoine Arnauld and Claude Lancelot, who added Spanish, Italian, German and Arabic. Nicolas Beauzée's 1767 book includes examples of English, Swedish, Lappish, Irish, Welsh, Basque, Quechua, and Chinese.

The conquest and conversion of the world by Europeans gave rise to 'missionary linguistics' producing first-hand word lists and grammatical descriptions of exotic languages. Such work is accounted for in the ‘Catalogue of the Languages of the Populations We Know’, 1800, by the Spanish Jesuit Lorenzo Hervás. Johann Christoph Adelung collected the first large language sample with the Lord's prayer in almost five hundred languages (posthumous 1817).

More developed nineteenth-century comparative works include Franz Bopp's 'Conjugation System' (1816) and Wilhelm von Humboldt's ‘On the Difference in Human Linguistic Structure and Its Influence on the Intellectual Development of Mankind’ (posthumous 1836). In 1818, August Wilhelm Schlegel made a classification of the world's languages into three types: (i) languages lacking grammatical structure, e.g. Chinese; (ii) agglutinative languages, e.g. Turkish; and (iii) inflectional languages, which can be synthetic like Latin and Ancient Greek, or analytic like French. This idea was later developed by others including August Schleicher, Heymann Steinthal, Franz Misteli, Franz Nicolaus Finck, and Max Müller.

The word 'typology' was proposed by Georg von der Gabelentz in his Sprachwissenschaft (1891). Louis Hjelmslev proposed typology as a large-scale empirical-analytical endeavour of comparing grammatical features to uncover the essence of language. Such a project began with the 1961 conference on language universals at Dobbs Ferry. Speakers included Roman Jakobson, Charles F. Hockett, and Joseph Greenberg, who proposed forty-five different types of linguistic universals based on his data sets from thirty languages. Greenberg's findings were mostly already known to nineteenth-century grammarians, but his systematic presentation of them would serve as a model for modern typology. Winfred P. Lehmann introduced Greenbergian typological theory to Indo-European studies in the 1970s.

During the twentieth century, typology based on missionary linguistics became centered around SIL International, which today hosts its catalogue of living languages, Ethnologue, as an online database. The Greenbergian or universalist approach is represented by the World Atlas of Language Structures, among others. Typology is also done within the frameworks of functional grammar, including Functional Discourse Grammar, Role and Reference Grammar, and Systemic Functional Linguistics. During the early years of the twenty-first century, however, the existence of linguistic universals came to be questioned by linguists proposing evolutionary typology.

Method
Quantitative typology deals with the distribution and co-occurrence of structural patterns in the languages of the world. Major types of non-chance distribution include:


 * preferences (for instance, absolute and implicational universals, semantic maps, and hierarchies)
 * correlations (for instance, areal patterns, such as with a Sprachbund)

Linguistic universals are patterns that can be seen cross-linguistically. Universals can be either absolute, meaning that every documented language exhibits the characteristic, or statistical, meaning that the characteristic is seen in most languages or is probable in most languages. Universals, both absolute and statistical, can be unrestricted, meaning that they apply to most or all languages without any additional conditions. Conversely, both absolute and statistical universals can be restricted or implicational, meaning that a characteristic will be true on the condition of something else (if characteristic Y is true, then characteristic X is true). An example of an implicational hierarchy is that dual pronouns are found only in languages with plural pronouns, while singular pronouns (or pronouns unspecified for number) are found in all languages. The implicational hierarchy is thus singular < plural < dual (etc.).
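The pronoun hierarchy above can be illustrated with a minimal Python sketch. The language names and data are hypothetical, and the cut-off used to call a pattern "statistical" (more than half of the sample) is an assumption made for illustration only, not an established criterion.

```python
# Implicational hierarchy from the text: singular < plural < dual.
HIERARCHY = ["singular", "plural", "dual"]

def respects_hierarchy(categories, hierarchy=HIERARCHY):
    """Return True if every category present implies all categories to its left."""
    seen_absent = False
    for category in hierarchy:
        present = category in categories
        if present and seen_absent:
            # A later category occurs although an earlier one is missing.
            return False
        if not present:
            seen_absent = True
    return True

def classify_universal(sample):
    """Label the hierarchy as absolute, statistical, or unsupported in a sample.

    `sample` maps (hypothetical) language names to sets of pronoun number
    categories. The 0.5 threshold is an illustrative assumption.
    """
    conforming = sum(respects_hierarchy(cats) for cats in sample.values())
    share = conforming / len(sample)
    if share == 1.0:
        return "absolute", share
    if share > 0.5:
        return "statistical", share
    return "unsupported", share

# Hypothetical toy sample; Language D violates the hierarchy.
toy_sample = {
    "Language A": {"singular", "plural", "dual"},
    "Language B": {"singular", "plural"},
    "Language C": {"singular"},
    "Language D": {"singular", "dual"},
}
label, share = classify_universal(toy_sample)
```

In this toy sample three of the four invented languages conform, so the hierarchy would count as a statistical rather than an absolute universal under the stated assumption.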

Qualitative typology develops cross-linguistically viable notions or types that provide a framework for the description and comparison of languages.

Subfields
The main subfields of linguistic typology include the empirical fields of syntactic, phonological and lexical typology. Additionally, theoretical typology aims to explain the empirical findings, especially statistical tendencies or implicational hierarchies.

Syntactic typology
Syntactic typology studies a vast array of grammatical phenomena from the languages of the world. Two well-known issues include dominant order and left-right symmetry.

Dominant order
One set of types reflects the basic order of subject, verb, and direct object in sentences:


 * Object–subject–verb (OSV)
 * Object–verb–subject (OVS)
 * Subject–verb–object (SVO)
 * Subject–object–verb (SOV)
 * Verb–subject–object (VSO)
 * Verb–object–subject (VOS)

These labels usually appear abbreviated as "SVO" and so forth, and may be called "typologies" of the languages to which they apply. The most commonly attested word orders are SOV and SVO, while the least common are the object-initial orders; OVS is the least common, with only four attested instances.

In the 1980s, linguists began to question the relevance of the geographical distribution of different values for various features of linguistic structure. They may have wanted to discover whether a particular grammatical structure found in one language is likewise found in another language in the same geographic location. Some languages split verbs into an auxiliary and an infinitive or participle and put the subject and/or object between them. Examples include German (Ich habe einen Fuchs im Wald gesehen - *"I have a fox in-the woods seen"), Dutch (Hans vermoedde dat Jan Marie zag leren zwemmen - *"Hans suspected that Jan Marie saw to learn to swim") and Welsh (Mae'r gwirio sillafu wedi'i gwblhau - *"Is the checking spelling after its to complete"). In such cases, linguists base the typology on the non-analytic tenses (i.e. those sentences in which the verb is not split) or on the position of the auxiliary. German is thus SVO in main clauses and Welsh is VSO (and prepositional phrases would go after the infinitive).

Many typologists classify both German and Dutch as V2 languages, as the verb invariably occurs as the second element of a full clause.

Some languages allow varying degrees of freedom in their constituent order, posing a problem for their classification within the subject–verb–object schema. Languages with bound case markings for nouns, for example, tend to have more flexible word orders than languages where case is defined by position within a sentence or presence of a preposition. Such languages may permit both a subject–verb–object order (comparable to 'The cat ate the mouse') and an object–subject–verb order (comparable to 'The mouse the cat ate'). To define a basic constituent order type in this case, one generally looks at the frequency of different types in declarative affirmative main clauses in pragmatically neutral contexts, preferably with only old referents. Thus, for instance, Russian is widely considered an SVO language, as this is the most frequent constituent order under such conditions, although all sorts of variations are possible and occur in texts. In many inflected languages, such as Russian, Latin, and Greek, departures from the default word orders are permissible but usually imply a shift in focus, an emphasis on the final element, or some special context. In the poetry of these languages, the word order may also shift freely to meet metrical demands. Additionally, freedom of word order may vary within the same language: for example, formal, literary, or archaizing varieties may have different, stricter, or more lenient constituent-order structures than an informal spoken variety of the same language.

On the other hand, when there is no clear preference under the described conditions, the language is considered to have "flexible constituent order" (a type unto itself).
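The frequency-based procedure described above can be sketched in Python as follows. The function name, the "flexible" label, and in particular the requirement that the most frequent order be at least twice as frequent as the runner-up are assumptions chosen for illustration; the text itself does not specify how clear a preference must be.

```python
from collections import Counter

def dominant_order(clause_orders, ratio=2.0):
    """Pick a basic constituent order from a list of observed clause orders.

    `clause_orders` is assumed to contain one label (e.g. "SVO") per
    declarative affirmative main clause in a pragmatically neutral context.
    `ratio` is an illustrative threshold, not a figure from the text.
    """
    counts = Counter(clause_orders)
    ranked = counts.most_common(2)
    if not ranked:
        return None  # no data, no classification
    if len(ranked) == 1:
        return ranked[0][0]  # only one order attested
    (top, n_top), (_, n_second) = ranked
    # Without a clear preference, treat the language as flexible.
    return top if n_top >= ratio * n_second else "flexible"

# Hypothetical clause counts for two invented corpora.
corpus_a = ["SVO"] * 8 + ["OVS"] * 2
corpus_b = ["SVO"] * 5 + ["SOV"] * 4
```

Under these assumptions the first invented corpus would be classified as SVO and the second as having flexible constituent order.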

An additional problem is that in languages without living speech communities, such as Latin, Ancient Greek, and Old Church Slavonic, linguists have only written evidence, perhaps written in a poetic, formalizing, or archaic style that mischaracterizes the actual daily use of the language. The daily spoken language of Sophocles or Cicero might have exhibited a different or much more regular syntax than their written legacy indicates.

Theoretical issues
The table below indicates the distribution of the dominant word order pattern in over 5,000 individual languages and 366 language families. SOV is the most common type in both, although its lead is much clearer in the data for language families (including isolates). 'NODOM' represents languages without a single dominant order.

Though the reason for dominance is sometimes considered an unsolved or unsolvable typological problem, several explanations for the distribution pattern have been proposed. Evolutionary explanations include those by Thomas Givon (1979), who suggests that all languages stem from an SOV language but are evolving into different kinds; and by Derek Bickerton (1981), who argues that the original language was SVO, which supports simpler grammar employing word order instead of case markers to differentiate between clausal roles.

Universalist explanations include a model by Russell Tomlin (1986) based on three functional principles: (i) animate before inanimate; (ii) theme before comment; and (iii) verb-object bonding. The three-way model roughly predicts the real hierarchy (see table above) assuming no statistical difference between SOV and SVO, and, also, no statistical difference between VOS and OVS. By contrast, the processing efficiency theory of John A. Hawkins (1994) suggests that constituents are ordered from shortest to longest in VO languages, and from longest to shortest in OV languages, giving rise to the attested distribution. This approach relies on the notion that OV languages have heavy subjects, and VO languages have heavy objects, which is disputed.

Left-right symmetry
A second major way of syntactic categorization is by excluding the subject from consideration. It is a well-documented typological feature that languages with a dominant OV order (object before verb), Japanese for example, tend to have postpositions. In contrast, VO languages (verb before object) like English tend to have prepositions as their main adpositional type. Several OV/VO correlations have been uncovered.
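As a rough illustration of this correlation, the following Python sketch predicts the main adposition type from the OV/VO order and flags languages that depart from the tendency. The three data points are taken from this article (Finnish is discussed further below as a VO language with postpositions); the function names are hypothetical, and the rule encodes only a tendency.

```python
def expected_adposition(verb_object_order):
    """Return the adposition type that tends to accompany an OV or VO order."""
    if verb_object_order == "OV":
        return "postpositions"
    if verb_object_order == "VO":
        return "prepositions"
    raise ValueError("order must be 'OV' or 'VO'")

def departs_from_tendency(verb_object_order, adposition_type):
    """True if a language's adposition type differs from the expected one."""
    return expected_adposition(verb_object_order) != adposition_type

# (order, main adposition type) as described in this article.
languages = {
    "Japanese": ("OV", "postpositions"),
    "English": ("VO", "prepositions"),
    "Finnish": ("VO", "postpositions"),
}
exceptions = [name for name, (order, adp) in languages.items()
              if departs_from_tendency(order, adp)]
```

With these three entries, only Finnish is flagged as departing from the tendency.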

Theoretical issues
Several processing explanations were proposed in the 1980s and 1990s for the above correlations. They suggest that the brain finds it easier to parse syntactic patterns that are either right or left branching, but not mixed. The most widely held such explanation is John A. Hawkins's parsing efficiency theory, which argues that language is a non-innate adaptation to innate cognitive mechanisms. Typological tendencies are considered to be based on language users' preference for grammars that are organized efficiently, and on their avoidance of word orderings that cause processing difficulty. Hawkins's processing theory predicts the above table but also makes predictions for non-correlation pairs, including the order of adjective, demonstrative and numeral with respect to the noun. This theory was based on corpus research and lacks support in psycholinguistic studies.

Some languages exhibit regular "inefficient" patterning. These include the VO languages Chinese, with the adpositional phrase before the verb, and Finnish, which has postpositions. There are, however, few other profoundly exceptional languages. More recently, it has been suggested that the left-right orientation is limited to role-marking connectives (adpositions and subordinators), stemming directly from the semantic mapping of the sentence. Since the true correlation pairs in the above table either involve such a connective or, arguably, follow from the canonical order, orientation predicts them without making problematic claims.

Morphosyntactic alignment
Another common classification distinguishes nominative–accusative alignment patterns and ergative–absolutive ones. In a language with cases, the classification depends on whether the subject (S) of an intransitive verb has the same case as the agent (A) or the patient (P) of a transitive verb. If a language has no cases, but the word order is AVP or PVA, then a classification may reflect whether the subject of an intransitive verb appears on the same side as the agent or the patient of the transitive verb. Bickel (2011) has argued that alignment should be seen as a construction-specific property rather than a language-specific property.
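The case-based criterion can be expressed as a short Python sketch. The function name and the case labels in the usage lines are hypothetical, and any pattern that fits neither of the two types named above is simply returned as "other".

```python
def classify_alignment(s_case, a_case, p_case):
    """Classify alignment from the cases of S, A and P.

    s_case: case of the subject of an intransitive verb
    a_case: case of the agent of a transitive verb
    p_case: case of the patient of a transitive verb
    """
    if s_case == a_case and s_case != p_case:
        # S patterns with A, and P is marked differently.
        return "nominative-accusative"
    if s_case == p_case and s_case != a_case:
        # S patterns with P, and A is marked differently.
        return "ergative-absolutive"
    return "other"

# Hypothetical case labels for two invented constructions.
first = classify_alignment("nominative", "nominative", "accusative")
second = classify_alignment("absolutive", "ergative", "absolutive")
```

Because the function takes the cases of a single construction rather than a whole language, it is also compatible with the view that alignment is a construction-specific property.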

Many languages show mixed accusative and ergative behaviour (for example, ergative morphology marking the verb arguments on top of an accusative syntax). Other languages (called "active languages") have two types of intransitive verbs: some of them ("active verbs") take a subject in the same case as the agent of a transitive verb, and the rest ("stative verbs") take a subject in the same case as the patient. Yet other languages behave ergatively only in some contexts (this "split ergativity" is often based on the grammatical person of the arguments or on the tense/aspect of the verb). For example, only some verbs in Georgian behave this way, and, as a rule, only while using the perfective (aorist).

Phonological typology
Linguistic typology also seeks to identify patterns in the structure and distribution of sound systems among the world's languages. This is accomplished by surveying and analyzing the relative frequencies of different phonological properties. Example relative frequencies are given below for certain speech sounds formed by obstructing airflow (obstruents). These relative frequencies show that contrastive voicing commonly occurs with plosives, as in English neat and need, but occurs much more rarely among fricatives, as in English niece and knees. According to a worldwide sample of 637 languages, 62% have the voicing contrast in stops but only 35% have it in fricatives. In the vast majority of those cases, the voicing contrast is absent because the language lacks voiced fricatives. Moreover, all languages have some form of plosive (occlusive), but there are languages with no fricatives. Below is a chart showing the breakdown of voicing properties among languages in the aforementioned sample.

Languages worldwide also vary in the number of sounds they use. Phonemic inventories range from very small (Rotokas, with six consonants and five vowels) to very large (!Xóõ, with 128 consonants and 28 vowels). An interesting observation drawn from these data is that the larger a language's consonant inventory, the more likely it is to contain a sound from a defined set of complex consonants (clicks, glottalized consonants, doubly articulated labial-velar stops, lateral fricatives and affricates, uvular and pharyngeal consonants, and dental or alveolar non-sibilant fricatives). In a survey of over 600 languages, only about 26% of those with small inventories (fewer than 19 consonants) contain a member of this set, while 51% of those with average inventories (19–25 consonants) contain at least one member, as do 69% of those with large inventories (more than 25 consonants). The likelihood of complex consonants thus increases with the size of the inventory.
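The size classes and percentages quoted above can be collected in a small Python sketch. The thresholds and shares are those given in the text; the function and variable names are hypothetical.

```python
# Share of surveyed languages in each size class that contain at least one
# member of the complex-consonant set (figures as quoted in the text).
COMPLEX_CONSONANT_SHARE = {"small": 0.26, "average": 0.51, "large": 0.69}

def inventory_class(n_consonants):
    """Bin a consonant inventory by size: <19 small, 19-25 average, >25 large."""
    if n_consonants < 19:
        return "small"
    if n_consonants <= 25:
        return "average"
    return "large"

def complex_consonant_share(n_consonants):
    """Look up the surveyed share for the size class of a given inventory."""
    return COMPLEX_CONSONANT_SHARE[inventory_class(n_consonants)]

rotokas = inventory_class(6)    # six consonants
xoo = inventory_class(128)      # 128 consonants
```

The shares describe how often complex consonants occur within each size class in the survey; they are not predictions for any individual language.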

Vowel inventories are more modest in size, with the average being 5–6 vowels, which 51% of the languages in the survey have. About a third of the languages have larger-than-average vowel inventories. Most interesting, though, is the lack of a relationship between consonant inventory size and vowel inventory size. Below is a chart showing this lack of predictability between consonant and vowel inventory sizes in relation to each other.