Centum and satem languages



Languages of the Indo-European family are classified as either centum languages or satem languages according to how the dorsal consonants (sounds of "K", "G" and "Y" type) of the reconstructed Proto-Indo-European language (PIE) developed. An example of the different developments is provided by the words for "hundred" found in the early attested Indo-European languages (which is where the two branches get their names). In centum languages, they typically began with a /k/ sound (Latin centum was pronounced with initial /k/), but in satem languages, they often began with /s/ (the example satem comes from the Avestan language of Zoroastrian scripture).

The table below shows the traditional reconstruction of the PIE dorsal consonants, with three series, but according to some more recent theories there may actually have been only two series or three series with different pronunciations from those traditionally ascribed. In centum languages, the palatovelars, which included the initial consonant of the "hundred" root, merged with the plain velars. In satem languages, they remained distinct, and the labiovelars merged with the plain velars.
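The two merger patterns described above can be sketched as a small lookup. This is only an illustration of the traditional three-series account; the names `MERGERS` and `outcome_classes` and the class labels are assumptions made for this example, not established terminology.

```python
# Illustrative sketch of the traditional account (all names here are hypothetical).
# Each PIE dorsal series is mapped to the class it ends up in for each group.
MERGERS = {
    "centum": {
        "palatovelar": "velar",       # palatovelars merge with the plain velars
        "plain velar": "velar",
        "labiovelar": "labiovelar",   # labiovelars stay distinct
    },
    "satem": {
        "palatovelar": "palatovelar", # palatovelars stay distinct
        "plain velar": "velar",
        "labiovelar": "velar",        # labiovelars merge with the plain velars
    },
}


def outcome_classes(group):
    """Map each resulting class to the sorted list of PIE series that feed it."""
    result = {}
    for series, target in MERGERS[group].items():
        result.setdefault(target, []).append(series)
    return {target: sorted(sources) for target, sources in result.items()}


if __name__ == "__main__":
    for group in ("centum", "satem"):
        print(group, outcome_classes(group))
```

Either way, three inherited series are reduced to two; the groups differ only in which pair falls together.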

The centum–satem division forms an isogloss in synchronic descriptions of Indo-European languages. It is no longer thought that the Proto-Indo-European language split first into centum and satem branches from which all the centum and all the satem languages, respectively, would have derived. Such a division is made particularly unlikely by the discovery that while the satem group lies generally to the east and the centum group to the west, the most eastward of the known IE language branches, Tocharian, is centum.

Centum languages
The centum languages of the Indo-European family are the "western" branches: Hellenic, Celtic, Italic and Germanic. They merged Proto-Indo-European palatovelars and plain velars, yielding plain velars (k, g, gh) only ("centumisation"), but retained the labiovelars as a distinct set.

The Anatolian branch probably falls outside the centum–satem division; for instance, the Luwian language indicates that all three dorsal consonant rows survived separately in Proto-Anatolian. The centumisation observed in Hittite is therefore assumed to have occurred only after the breakup of Proto-Anatolian into separate languages. However, Craig Melchert proposes that Proto-Anatolian was indeed a centum language.

While Tocharian is generally regarded as a centum language, it is a special case, as it has merged all three of the PIE dorsal series (originally nine separate consonants) into a single phoneme, *k. According to some scholars, that complicates the classification of Tocharian within the centum–satem model. However, as Tocharian has replaced some Proto-Indo-European labiovelars with the labiovelar-like, non-original sequence *ku, it has been proposed that labiovelars remained distinct in Proto-Tocharian, which places Tocharian in the centum group (assuming that Proto-Tocharian lost palatovelars while labiovelars were still phonemically distinct).

In the centum languages, PIE roots reconstructed with palatovelars developed into forms with plain velars. For example, in the PIE numeral *ḱm̥tóm 'hundred', the initial palatovelar *ḱ became a plain velar /k/, as in Latin centum (originally pronounced with /k/, although most modern descendants of Latin have a sibilant there), Greek (he)katon, Welsh cant, Tocharian B kante. In the Germanic languages, the /k/ developed regularly by Grimm's law to become /h/, as in Old English hund(red).

Centum languages also retained the distinction between the PIE labiovelar row (*kʷ, *gʷ, *gʷʰ) and the plain velars. Historically, it was unclear whether the labiovelar row represented an innovation by a process of labialisation, or whether it was inherited from the parent language (but lost in the satem branches); current mainstream opinion favours the latter possibility. Labiovelars as single phonemes (for example, /kʷ/) as opposed to biphonemes (for example, /kw/) are attested in Greek (the Linear B q- series), Italic (Latin ⟨qu⟩), Germanic (Gothic hwair ⟨ƕ⟩ and qairþra ⟨q⟩) and Celtic (Ogham ceirt ⟨Q⟩). In the so-called P-Celtic languages, /kʷ/ developed into /p/; a similar development took place in the Osco-Umbrian branch of Italic and sometimes in Greek and Germanic. The boukólos rule, however, states that a labiovelar reduces to a plain velar when it occurs next to *u or *w.

The centum–satem division refers to the development of the dorsal series of sounds only at the time of the earliest separation of Proto-Indo-European into the proto-languages of its individual daughter branches; it does not apply to any later analogous developments within any branch. For example, the palatalization of Latin /k/ to /t͡ʃ/ or /t͡s/ (often later /s/) in some Romance languages (which means that modern French and Spanish cent and cien are pronounced with initial /s/ and /θ/ respectively) is satem-like, as is the merger of /kʷ/ with /k/ in the Gaelic languages; such later changes do not affect the classification of the languages as centum.

Linguist Wolfgang P. Schmid argued that some proto-languages like Proto-Baltic were initially centum, but gradually became satem due to their exposure to the latter.

Satem languages
The satem languages belong to the Eastern sub-families, especially Indo-Iranian and Balto-Slavic (but not Tocharian), with Indo-Iranian being the major Asian branch and Balto-Slavic the major Eurasian branch of the satem group. They lost the labial element of the Proto-Indo-European labiovelars and merged them with the plain velars, but the palatovelars remained distinct and typically came to be realised as sibilants. That set of developments, particularly the assibilation of palatovelars, is referred to as satemisation.

In the satem languages, the reflexes of the presumed PIE palatovelars are typically fricative or affricate consonants, articulated further forward in the mouth. For example, in the PIE root *ḱm̥tóm, "hundred", the initial palatovelar normally became a sibilant [s] or [ʃ], as in Avestan satem, Persian sad, Sanskrit śatam, sto in all modern Slavic languages, Old Church Slavonic sъto, Latvian simts, Lithuanian šimtas (Lithuanian is between centum and satem languages). Another example is the Slavic prefix sъ(n)- ("with"), which appears in Latin, a centum language, as co(n)-; conjoin is cognate with Russian soyuz ("union"). An [s] is found for PIE *ḱ in such languages as Latvian, Avestan, Russian and Armenian, but Lithuanian and Sanskrit have [ʃ] (š in Lithuanian, ś in Sanskrit transcriptions). For more reflexes, see the phonetic correspondences section below; note also the effect of the ruki sound law.

"Incomplete satemisation" may also be evidenced by remnants of labial elements from labiovelars in Balto-Slavic, including Lithuanian ungurys "eel" < * and dygus "pointy" < *. A few examples are also claimed in Indo-Iranian, such as Sanskrit guru "heavy" < *, kulam "herd" < *, but they may instead be secondary developments, as in the case of kuru "make" < * in which it is clear that the ku- group arose in post-Rigvedic language. It is also asserted that in Sanskrit and Balto-Slavic, in some environments, resonant consonants (denoted by /R/) become /iR/ after plain velars but /uR/ after labiovelars.

Some linguists argue that the Albanian and Armenian branches are also to be classified as satem, whereas other linguists argue that they show evidence of separate treatment of all three dorsal consonant rows and so may not have merged the labiovelars with the plain velars, unlike the canonical satem branches.

Assibilation of velars in certain phonetic environments is a common phenomenon in language development. Consequently, it is sometimes hard to establish firmly which languages were part of the original satem diffusion and which were affected by secondary assibilation later. While extensive documentation of Latin and Old Swedish, for example, shows that the assibilation found in French and Swedish was a later development, there are not enough records of the extinct Dacian and Thracian languages to settle conclusively when their satem-like features originated.

In Armenian, some assert that /kʷ/ is distinguishable from /k/ before front vowels. Martin Macak (2018) asserts that the merger of *kʷ and *k occurred "within the history of Proto-Armenian itself".

In Albanian, the three original dorsal rows have remained distinguishable before historic front vowels. Labiovelars are for the most part differentiated from all other Indo-European velar series before front vowels (where they developed into s and z ultimately), but they merge with the "pure" (back) velars elsewhere. The palatal velar series, consisting of Proto-Indo-European *ḱ and the merged *ǵ and *ǵʰ, usually developed into th and dh, but were depalatalized to merge with the back velars when in contact with sonorants. Because the original Proto-Indo-European tripartite distinction between dorsals is preserved in such reflexes, Demiraj argues that Albanian is to be considered, like Luwian, neither centum nor satem, although it has a "satem-like" realization of the palatal dorsals in most cases. Thus PIE *ḱ, *kʷ and *k become th (Alb. thom "I say" < PIE *ḱeHsmi), s (Alb. si "how" < PIE *kʷih₁, cf. Latin quī), and q (/c/: pleq "elderly" < *plak-i < PIE *plh₂-ko-), respectively.

Schleicher's single guttural row
August Schleicher, an early Indo-Europeanist, in Part I, "Phonology", of his major work, the 1871 Compendium of Comparative Grammar of the Indogermanic Language, published a table of original momentane Laute, or "stops", which has only a single velar row, *k, *g, *gʰ, under the name of Gutturalen. He identifies four palatals (*ḱ, *ǵ, *ḱʰ, *ǵʰ) but hypothesises that they came from the gutturals along with the nasal *ń and the spirant *ç.

Brugmann's labialized and unlabialized language groups
Karl Brugmann, in his 1886 work Outline of Comparative Grammar of the Indogermanic Language (Grundriss...), promotes the palatals to the original language, recognising two rows of Explosivae, or "stops", the palatal (*ḱ, *ǵ, *ḱʰ, *ǵʰ) and the velar (*k, *g, *kʰ, *gʰ), each of which was simplified to three articulations even in the same work. In the same work, Brugmann notices among die velaren Verschlusslaute, "the velar stops", a major contrast between reflexes of the same words in different daughter languages. In some, the velar is marked with a u-Sprache, "u-articulation", which he terms a Labialisierung, "labialization", in accordance with the prevailing theory that the labiovelars were velars labialised by combination with a u at some later time and were not among the original consonants. He thus divides languages into die Sprachgruppe mit Labialisierung and die Sprachgruppe ohne Labialisierung, "the language group with (or without) labialization", which basically correspond to what would later be termed the centum and satem groups:

"For words and groups of words, which do not appear in any language with labialized velar-sound [the 'pure velars'], it must for the present be left undecided whether they ever had the u-afterclap."

The doubt introduced in that passage suggests he already suspected the "afterclap" u was not that but was part of an original sound.

Von Bradke's centum and satem groups
In 1890, Peter von Bradke published Concerning Method and Conclusions of Aryan (Indogermanic) Studies, in which he identified the same division (Trennung) as did Brugmann, but he defined it in a different way. He said that the original Indo-Europeans had two kinds of gutturaler Laute, "guttural sounds", the gutturale oder velare, und die palatale Reihe, "guttural or velar, and palatal rows", each of which was aspirated and unaspirated. The velars were to be viewed as gutturals in an engerer Sinn, "narrow sense". They were a reiner K-Laut, "pure K-sound". Palatals were häufig mit nachfolgender Labialisierung, "frequently with subsequent labialization". The latter distinction led him to divide the palatale Reihe into a Gruppe als Spirant and a reiner K-Laut, typified by the words satem and centum respectively. Later in the book he speaks of an original centum-Gruppe, from which on the north of the Black and Caspian Seas the satem-Stämme, "satem tribes", dissimilated among the Nomadenvölker or Steppenvölker, distinguished by further palatalization of the palatal gutturals.

Brugmann's identification of labialized and centum
By the 1897 edition of Grundriss, Brugmann (and Delbrück) had adopted Von Bradke's view: "The Proto-Indo-European palatals... appear in Greek, Italic, Celtic and Germanic as a rule as K-sounds, as opposed to in Aryan, Armenian, Albanian, Balto-Slavic, Phrygian and Thracian... for the most part sibilants."

There was no more mention of labialized and non-labialized language groups after Brugmann changed his mind regarding the labialized velars. The labio-velars now appeared under that name as one of the five rows of Verschlusslaute (Explosivae) (plosives/stops), comprising die labialen V., die dentalen V., die palatalen V., die reinvelaren V. and die labiovelaren V. It was Brugmann who pointed out that labiovelars had merged into the velars in the satem group, accounting for the coincidence of the discarded non-labialized group with the satem group.

Discovery of Anatolian and Tocharian
When von Bradke first published his definition of the centum and satem sound changes, he viewed his classification as "the oldest perceivable division" in Indo-European, which he elucidated as "a division between eastern and western cultural provinces (Kulturkreise)". The proposed split was undermined by the decipherment of Hittite and Tocharian in the early 20th century. Both languages show no satem-like assibilation in spite of being located in the satem area.

The proposed phylogenetic division of Indo-European into satem and centum "sub-families" was further weakened by the identification of other Indo-European isoglosses running across the centum–satem boundary, some of which seemed of equal or greater importance in the development of daughter languages. Consequently, since the early 20th century at least, the centum–satem isogloss has been considered an early areal phenomenon rather than a true phylogenetic division of daughter languages.

Different realizations
The actual pronunciation of the velar series in PIE is not certain. One current idea is that the "palatovelars" were in fact simple velars, and the "plain velars" were pronounced farther back, perhaps as uvular consonants: *[q], *[ɢ], *[ɢʱ]. If labiovelars were just labialized forms of the "plain velars", they would have been pronounced *[qʷ], *[ɢʷ], *[ɢʷʱ], but the pronunciation of the labiovelars as *[kʷ], *[ɡʷ], *[ɡʷʱ] would still be possible in uvular theory, if the satem languages first shifted the "palatovelars" then later merged the "plain velars" and "labiovelars". The uvular theory is supported by the following evidence.


 * The "palatovelar" series was the most common, and the "plain velar" was by far the least common and never occurred in any affixes. In known languages with multiple velar series, the normal velar series is usually the most common, which would imply that what have been interpreted as "palatovelars" were more probably simply velars. The labiovelars, being the second most common series, were most likely still labialised velars.
 * There is no evidence of any palatalisation in the early history of the velars in the centum branches, but see above for the case of Anatolian. If the "palatovelars" were in fact palatalised in PIE, there would have had to be a single, very early, uniform depalatalisation in all (and only) the centum branches. Depalatalisation is cross-linguistically far less common than is palatalisation and so is unlikely to have occurred separately in each centum branch. In any case it would almost certainly have left evidence of prior palatalization in some of the branches. (As noted above, it is not thought that the centum branches had a separate common ancestor in which the depalatalization could have occurred just once and then have been inherited.)
 * Most instances of the rare to non-existent /a/ phoneme without the /h₂/ laryngeal appear before or after *k, which could be the result of that phoneme being a-coloring. That is particularly likely if it was uvular /q/, similar to the /h₂/ laryngeal, which may have been uvular /χ/. Uvulars coloring and lowering vowels is common cross-linguistically: in languages such as Quechuan or Greenlandic, /i/ and /u/ lower to [e] and [o] next to uvulars, so a lowering of /e/ and /o/ to [a] or [ɑ] would be plausible. Similar lowering also occurs in Arabic.

On the above interpretation, the split between the centum and satem groups would not have been a straightforward loss of an articulatory feature (palatalization or labialization). Instead, the uvulars (the "plain velars" of the traditional reconstruction) would have been fronted to velars across all branches. In the satem languages, this caused a chain shift, and the existing velars (traditionally "palatovelars") were shifted further forward to avoid a merger, becoming palatal: /k/ > /c/; /q/ > /k/. In the centum languages, no chain shift occurred, and the uvulars merged into the velars. The delabialisation in the satem languages would have occurred later, in a separate stage.
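The staged account above can be sketched as a toy model. It uses only the voiceless member of each series and assumes, for illustration, that the labiovelar was a labialised velar; the names `UVULAR_THEORY`, `centum_shift` and `satem_shift` are hypothetical, and the symbols are rough labels rather than a claim about the exact phonetic values.

```python
# Toy model of the uvular-theory account (hypothetical names; voiceless stops only).
# Assumed starting values: "palatovelar" = velar k, "plain velar" = uvular q,
# "labiovelar" = labialised velar kʷ (an assumption; qʷ is also conceivable).
UVULAR_THEORY = {"palatovelar": "k", "plain velar": "q", "labiovelar": "kʷ"}


def centum_shift(system):
    """Uvulars front to velars and merge with the existing velars."""
    return {name: ("k" if sound == "q" else sound) for name, sound in system.items()}


def satem_shift(system):
    """Chain shift followed by a later, separate delabialisation."""
    # Stage 1: existing velars move forward to palatal, avoiding a merger.
    stage1 = {n: ("c" if s == "k" else s) for n, s in system.items()}
    # Stage 2: uvulars front to velars, filling the vacated velar slot.
    stage2 = {n: ("k" if s == "q" else s) for n, s in stage1.items()}
    # Stage 3 (later): labiovelars lose their labial element.
    return {n: ("k" if s == "kʷ" else s) for n, s in stage2.items()}


if __name__ == "__main__":
    print("centum:", centum_shift(UVULAR_THEORY))
    print("satem: ", satem_shift(UVULAR_THEORY))
```

The order of the stages matters in this sketch: if the uvulars were fronted before the velars moved, the two series would merge, which is the centum outcome rather than the satem one.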

Related to the uvular theory is the glottalic theory. Both these theories have some support if Proto-Indo-European was spoken near the Caucasus, where both uvular and glottal consonants are common and many languages have a paucity of distinctive vowels.

Only two velar series
The presence of three dorsal rows in the proto-language has been the mainstream hypothesis since at least the mid-20th century. There remain, however, several alternative proposals with just two rows in the parent language, which describe either "satemisation" or "centumisation" as the emergence of a new phonematic category rather than the disappearance of an inherited one.

Antoine Meillet (1937) proposed that the original rows were the labiovelars and palatovelars, with the plain velars being allophones of the palatovelars in some cases, such as depalatalisation before a resonant. The etymologies establishing the presence of velars in the parent language are explained as artefacts of either borrowing between daughter languages or of false etymologies. Having only labiovelars and palatovelars would also parallel languages such as Russian or Irish, where consonants can be either broad and unpalatalized, or slender and palatalized, and is also seen in some Northwest Caucasian languages.

Other scholars who assume two dorsal rows in Proto-Indo-European include Kuryłowicz (1935) and Lehmann (1952), as well as Frederik Kortlandt and others. The argument is that PIE had only two series, a simple velar and a labiovelar. The satem languages palatalized the plain velar series in most positions, but the plain velars remained in some environments: typically reconstructed as before or after /u/, after /s/, and before /r/ or /a/ and also before /m/ and /n/ in some Baltic dialects. The original allophonic distinction was disturbed when the labiovelars were merged with the plain velars. That produced a new phonemic distinction between palatal and plain velars, with an unpredictable alternation between palatal and plain in related forms of some roots (those from original plain velars) but not others (those from original labiovelars). Subsequent analogical processes generalised either the plain or palatal consonant in all forms of a particular root. The roots in which the plain consonant was generalized are those traditionally reconstructed as having "plain velars" in the parent language in contrast to "palatovelars".

Oswald Szemerényi (1990) considers the palatovelars as an innovation, proposing that the "preconsonantal palatals probably owe their origin, at least in part, to a lost palatal vowel" and a velar was palatalised by a following vowel subsequently lost. The palatal row would therefore postdate the original velar and labiovelar rows, but Szemerényi is not clear whether that would have happened before or after the breakup of the parent-language (in a table showing the system of stops "shortly before the break-up", he includes palatovelars with a question mark after them).

Woodhouse (1998; 2005) introduced a "bitectal" notation, labelling the two rows of dorsals as k1, g1, g1h and k2, g2, g2h. The first row represents "prevelars", which developed into either palatovelars or plain velars in the satem group but just into plain velars in the centum group; the second row represents "backvelars", which developed into either labiovelars or plain velars in the centum group but just into plain velars in the satem group.
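The bitectal correspondences just described can be written out as a lookup table. The sketch below is only a restatement of that paragraph, using the voiceless member of each row; the names `BITECTAL_OUTCOMES` and `possible_sources` are hypothetical and not part of Woodhouse's notation.

```python
# Restatement of the bitectal correspondences as data (hypothetical names).
# k1 = "prevelar" row, k2 = "backvelar" row; values are the possible reflexes.
BITECTAL_OUTCOMES = {
    "k1": {
        "satem": {"palatovelar", "plain velar"},
        "centum": {"plain velar"},
    },
    "k2": {
        "satem": {"plain velar"},
        "centum": {"labiovelar", "plain velar"},
    },
}


def possible_sources(group, reflex):
    """Return the bitectal rows that can yield the given reflex in the given group."""
    return sorted(
        row
        for row, outcomes in BITECTAL_OUTCOMES.items()
        if reflex in outcomes[group]
    )


if __name__ == "__main__":
    print(possible_sources("satem", "plain velar"))   # both rows are possible sources
    print(possible_sources("centum", "labiovelar"))   # only the backvelar row
```

On this view a plain velar in either group is ambiguous between the two rows, whereas a palatovelar or a labiovelar identifies its source row uniquely.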

The following are arguments that have been listed in support of a two-series hypothesis:
 * The plain velar series is statistically rarer than the other two, is almost entirely absent from affixes and appears most often in certain phonological environments (described in the next point).
 * The reconstructed velars and palatovelars occur mostly in complementary distribution (velars before *a, *r and after *s, *u; palatovelars before *e, *i, *j, liquid/nasal/*w+*e/*i and before *o in o-grade forms by generalization from e-grade).
 * It is unusual in general for palatovelars to move backwards rather than the reverse (but that problem might simply be addressed by assuming three series with different realizations from the traditional ones, as described above).
 * In most languages in which the "palatovelars" produced fricatives, other palatalisation also occurred, implying that it was part of a general trend.
 * The centum languages are not contiguous, and there is no evidence of differences between dialects in the implementation of centumization (but there are differences in the process of satemisation: there can be pairs of satemized and non-satemized velars within the same language, there is evidence of a former labiovelar series in some satem languages and different branches have different numbers and timings of satemization stages). This makes a "centumisation" process less likely, implying that the position found in the centum languages was the original one.
 * Alternations between plain velars and palatals are common in a number of roots across different satem languages, but the same root appears with a palatal in some languages but a plain velar in others (most commonly Baltic or Slavic, occasionally Armenian but rarely or never the Indo-Iranian languages). That is consistent with the analogical generalisation of one or another consonant in an originally-alternating paradigm but difficult to explain otherwise.
 * The claim that in late PIE times, the satem languages (unlike the centum languages) were in close contact with each other is confirmed by independent evidence: the geographical closeness of current satem languages and certain other shared innovations (the ruki sound law and early palatalization of velars before front vowels).

Arguments in support of three series:
 * Many instances of plain velars occur in roots that have no evidence of any of the putative environments that trigger plain velars and no obvious mechanism for the plain velar to have come in contact with any such environment; as a result, the comparative method requires three series to be reconstructed.
 * Albanian and Armenian are said to show evidence of different reflexes for the three different series. Evidence from the Anatolian language Luwian attests a three-way velar distinction: *ḱ > z (probably [ts]); *k > k; *kʷ > ku (probably [kʷ]). There is no evidence of any connection between Luwian and any satem language (labiovelars are still preserved, the ruki sound law is absent) and the Anatolian branch split off very early from PIE. The three-way distinction must therefore be reconstructed for the parent language. (That is a strong argument in favor of the traditional three-way system; in response, proponents of the two-way system have attacked the underlying evidence by claiming that it "hinges upon especially difficult or vague or otherwise dubious etymologies" (such as Sihler 1995).) Melchert originally claimed that the change *ḱ > z was unconditional and subsequently revised the assertion to a conditional change occurring only before front vowels, /j/, or /w/; however, that does not fundamentally alter the situation, as plain-velar *k apparently remains as such in the same context. Melchert also asserts, contrary to Sihler, that the etymological distinction between *ḱ and *k in the relevant positions is well-established.
 * According to Ringe (2006), there are root constraints that prevent the occurrence of a "palatovelar" and labiovelar or two "plain velars", in the same root, but they do not apply to roots containing, for example, a palatovelar and a plain velar.
 * The centum change could have occurred independently in multiple centum subgroups (at the very least, Tocharian, Anatolian and Western IE), as it was a phonologically natural change, given the possible interpretation of the "palatovelar" series as plain-velar and the "plain velar" series as back-velar or uvular (see above). Given the minimal functional load of the plain-velar/palatovelar distinction, if there was never any palatalisation in the IE dialects leading to the centum languages, there is no reason to expect any palatal residues. Furthermore, it is phonologically entirely natural for a former plain-velar vs. back-velar/uvular distinction to have left no distinctive residues on adjacent segments.

Phonetic correspondences in daughter languages
The following table summarizes the outcomes of the reconstructed PIE palatals and labiovelars in the various daughter branches, both centum and satem. (The outcomes of the "plain velars" can be assumed to be the same as those of the palatals in the centum branches and those of the labiovelars in the satem branches.)

