Proto-Austroasiatic language

Proto-Austroasiatic is the reconstructed ancestor of the Austroasiatic languages. Proto-Mon–Khmer (i.e., all Austroasiatic branches except for Munda) has been reconstructed in Harry L. Shorto's Mon–Khmer Comparative Dictionary, while a new Proto-Austroasiatic reconstruction is currently being undertaken by Paul Sidwell.

Scholars generally date the ancestral language to c. 3000 BCE with a homeland in southern China or the Mekong River valley. Sidwell (2022) proposes that the locus of Proto-Austroasiatic was in the Red River Delta area c. 2500 BCE.

Paul Sidwell published 500 Proto-Austroasiatic etyma in 2024.

Shorto (2006)
The Proto-Mon–Khmer language is the reconstructed ancestor of the Mon–Khmer languages, a purported primary branch of the Austroasiatic language family. However, Mon–Khmer as a taxon has been abandoned in recent classifications, making Proto-Mon–Khmer synonymous with Proto-Austroasiatic. The Munda languages, which are not well documented and have been restructured through external language contact, have not been included in the reconstructions.

Proto-Mon–Khmer as reconstructed by Harry L. Shorto (2006) has a total of 21 consonants, 7 distinct vowels, which can be lengthened and glottalized, and 3 diphthongs.

Proto-Mon–Khmer is rich in vowels. The vowels are:
 * *a, *aa
 * *e, *ee
 * *ə, *əə
 * *i, *-iʔ, *ii, *-iiʔ
 * *o, *oo
 * *ɔ, *ɔɔ
 * *u, *uu, *-uuʔ

The diphthongs are:
 * *iə, *uə, *ai

Sidwell & Rau (2015)
Paul Sidwell and Felix Rau (2015) propose the following syllable structure for Proto-Austroasiatic.
 * *Ci(Cm)VCf

Also possible are more complex forms with prefixes and infixes, as well as presyllable "coda-copying" from main syllables.
 * *(Cp(n/r/l))CiVCf

The Proto-Austroasiatic word template was later revised as follows by Sidwell (2023).
 * *(Cv(C)).ˈCVC(ˀ)

Sidwell & Rau (2015) reconstruct 21-22 Proto-Austroasiatic consonants (the reconstruction of *ʄ is uncertain). Sidwell (2024) adds *ɕ.

All of the Proto-Austroasiatic consonants except for implosives and voiced stops can occur as syllable finals (Cf).

All of the Proto-Austroasiatic voiceless and voiced stops, as well as *m-, *N-, *r-, *l-, and *s-, can occur as presyllables or sesquisyllables (Cp).

Medial consonants (Cm) are *-w-, *-r-, *-l-, *-j-, and *-h-.

Sidwell & Rau (2015) reconstruct 8 Proto-Austroasiatic vowels, each with a length contrast. Long vowels are written with a triangular colon (ː) after the vowel rather than by doubling the vowel letter.
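The difference between the doubled-vowel notation (*aa) and the length-mark notation (*aː) is purely mechanical. As a minimal sketch, the following Python snippet rewrites doubled vowels as a single vowel plus the length mark. The function name is hypothetical, and the vowel set is an assumption taken from the seven vowel qualities listed above for Shorto (2006).

```python
import re

# Assumption: the seven vowel qualities listed for Shorto (2006).
SHORTO_VOWELS = "aeəioɔu"

# A vowel immediately followed by an identical vowel, e.g. "aa" or "əə".
_DOUBLED = re.compile(rf"([{SHORTO_VOWELS}])\1")


def doubled_to_length_mark(form: str) -> str:
    """Rewrite doubled vowels (*aa) as vowel plus length mark (*aː)."""
    return _DOUBLED.sub(r"\1ː", form)


print(doubled_to_length_mark("*aa"))    # long vowel
print(doubled_to_length_mark("*-iiʔ"))  # long vowel before a glottal final
print(doubled_to_length_mark("*iə"))    # diphthong: no identical pair, unchanged
```

Because the pattern only matches two identical vowel letters in a row, diphthongs such as *iə and *uə pass through untouched.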

Proto-Austroasiatic diphthongs are *iə and *uə, and possibly *ie and *uo.

Sidwell (2023) proposes a nine-vowel system for Proto-Austroasiatic, with short and long vowels as follows.

Word structure
Common word structures in Proto-Austroasiatic include *CV(C) and *CCV(C) roots. *CVC roots can also be affixed either via prefixes or infixes, as in *C-CVC or *C⟨C⟩VC (Shorto 2006). Sidwell (2008) gives the following phonological shapes for two types of stems.


 * Monosyllabic: C(R)V(V)C (Note: R is one of the optional medial consonants /r, l, j, w, h/.)
 * Sesquisyllabic: CCV(V)C
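The two stem shapes can be told apart by looking at the onset. The Python sketch below is an illustration under stated assumptions, not Sidwell's procedure: the vowel set, the helper name `classify_stem`, and the rule that any /r, l, j, w, h/ in second position counts as a medial are all simplifications, and forms with optional segments in parentheses are not handled.

```python
import re

# Assumptions: a rough vowel set and the medial consonants named in the text.
VOWELS = "aeəioɔuɨɛ"
MEDIALS = set("rljwh")

# Onset: everything before the first vowel.
ONSET = re.compile(rf"[^{VOWELS}]+")
# Rhyme V(V)C: a vowel, an optional second vowel or length mark,
# one final consonant, and an optional glottalization mark.
RHYME = re.compile(rf"[{VOWELS}][{VOWELS}ː]?[^{VOWELS}ːˀ]ˀ?")


def classify_stem(form: str) -> str:
    """Classify a form as 'monosyllabic', 'sesquisyllabic', or 'other'."""
    stem = form.lstrip("*")
    match = ONSET.match(stem)
    if match is None:
        return "other"
    onset = match.group()
    if not RHYME.fullmatch(stem[len(onset):]):
        return "other"
    if len(onset) == 1:
        return "monosyllabic"            # CV(V)C
    if len(onset) == 2:
        # Heuristic: a medial-type second consonant is read as C(R).
        return "monosyllabic" if onset[1] in MEDIALS else "sesquisyllabic"
    return "other"


for form in ["*kaʔ", "*klaːŋ", "*ktaːm", "*bsaɲ", "*trkɔːtˀ"]:
    print(form, classify_stem(form))
```

The heuristic is deliberately crude: a cluster such as *sr- is read as C plus medial even where a sesquisyllabic analysis might be preferred, and longer onsets are simply reported as "other".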

Hiroz (2024) proposes disyllabic forms for some Proto-Austroasiatic etyma. A few tentative reconstructions are:


 * #suláː 'leaf'
 * #kuláːʔ 'tiger'
 * #kujáːl 'wind, air'
 * #kil- 'bird, flying animal' (nominal class for birds and flying animals; cf. Old Mon , )
 * #kil-ʔáːk 'crow'
 * #kil-láːŋ 'bird of prey'

Morphology
Sidwell (2008) considers the two most morphologically conservative Mon–Khmer branches to be Khmuic and Aslian. On the other hand, Vietnamese morphology is far more similar to that of Chinese and the Tai languages and has lost many morphological features found in Proto-Mon–Khmer.

The following Proto-Mon–Khmer affixes, which are still tentative, have been reconstructed by Paul Sidwell (Sidwell 2008:257-263).


 * Nominalizing *-n- (instrumental in Kammu, resultative in Khmu)
 * Nominalizing agentive *-m-
 * Nominalizing iterative (expressive of repetitiveness/numerousness) *-l-/*-r-
 * Nominalizing instrumental *-p-
 * Causative *p(V)- (allomorphs: p-, pn-, -m-)
 * Reciprocal *tr-/*t(N)-
 * Stative *h-/*hN- (?)
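The affixed shapes *C-CVC and *C⟨C⟩VC mentioned under word structure can be illustrated schematically. The sketch below is an assumption-laden illustration only: it operates on the abstract skeleton "CVC" rather than on real etyma, it assumes that an infix sits directly after the root-initial consonant (as the *C⟨C⟩VC notation suggests), and the helper names `prefix` and `infix` are hypothetical.

```python
def prefix(root: str, affix: str) -> str:
    """Attach a prefix with a hyphen: p + CVC -> p-CVC."""
    return f"{affix}-{root}"


def infix(root: str, affix: str) -> str:
    """Insert an infix after the root-initial consonant: n + CVC -> C⟨n⟩VC."""
    if len(root) < 2:
        raise ValueError("root needs an initial consonant and a following segment")
    return f"{root[0]}⟨{affix}⟩{root[1:]}"


# Schematic skeletons only; these are not reconstructed forms.
print(prefix("CVC", "p"))  # causative *p(V)- as a prefix
print(infix("CVC", "n"))   # nominalizing *-n- as an infix
print(infix("CVC", "m"))   # agentive *-m- as an infix
```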

Roger Blench (2012) notes that Austroasiatic and Sino-Tibetan share many similarities regarding word structure, particularly nominal affixes (otherwise known as sesquisyllables or minor syllable prefixes). Blench (2012) does not draw any definitive conclusions about how these similarities could have arisen, but suggests that this typological diffusion might have come about as a result of intensive contact in an area between northern Vietnam, Laos, and northeast Myanmar.

Syntax
Like the Tai languages, Proto-Mon–Khmer has an SVO, or verb-medial, order. Proto-Mon–Khmer also makes use of noun classifiers and serial verb constructions (Shorto 2006).

However, Paul Sidwell (2018) suggests that Proto-Austroasiatic may have in fact been verb-initial, with SVO order occurring in Indochina due to convergence in the Mainland Southeast Asia linguistic area. Various modern-day Austroasiatic languages display verb-initial word order, including Pnar and Wa (Jenny 2015). Nicobarese also displays verb-initial word order.

Lexicon
Below are some Proto-Austroasiatic words relating to animals, plants, agriculture, and material culture from Sidwell (2024).

 * Invertebrates
 * *ciʔ 'head louse'
 * *ʔmrəɲˀ 'body louse'
 * *tŋke(ː)ʔ 'tick'
 * *ktaːm 'crab'
 * *klɔʔ 'shellfish, snail'
 * *kʔiːpˀ 'centipede'
 * *suːcˀ 'ant'
 * *ksuːˀ 'red (ant)'
 * *ʔŋruːɲ 'termite'
 * *suːtˀ 'bee'
 * *ʔɔːŋ 'wasp, hornet'
 * *roaj 'fly (n.)'
 * *mɔːs; *(s/c)macˀ 'mosquito'
 * *Criːtˀ 'cricket'
 * *plɨːm 'leech (land type)'
 * *tɟuːˀ, *tɟoːˀ 'worm'

 * Vertebrates
 * *kaʔ 'fish (n.)'
 * *ʔnduŋ 'eel'
 * *bsaɲ 'snake'
 * *tlan 'python'
 * *trkɔːtˀ 'monitor lizard'
 * *Ckuəj 'Calotes lizard'
 * *kapˀ 'turtle'
 * *ʔrɔkˀ 'frog, toad'
 * *ciːm 'bird'
 * *ʔiər 'chicken'
 * *klaːŋ 'eagle, hawk'
 * *kʔaːkˀ 'crow'
 * *racˀ, *rəcˀ 'sparrow, swift'
 * *mraːkˀ 'peacock'
 * *kla(ː)ʔ 'tiger'
 * *rwaːjˀ 'tiger spirit'
 * *cɔʔ 'dog'
 * *kneːˀ 'rat, mouse'
 * *tkan 'bamboo rat'
 * *prɔːkˀ 'squirrel'
 * *kiəɕ 'serow, mountain goat'
 * *bɕeʔ 'otter'
 * *poːɕ, *puəɕ 'barking deer'
 * *ɟkəːɕ, *ɟkɨːɕ 'porcupine'
 * *car; *cɔŋ, *cəŋ 'civet'

 * Plants
 * *ɟɕɨːˀ, *ɟɕeːˀ 'tree, wood'
 * *Clɔːŋ 'tree (trunk)'
 * *tɓaŋ 'bamboo shoots'
 * *(k)ɗiŋ 'bamboo tube/joint'
 * *(kn)ɓatˀ 'grass'
 * *plaŋ 'thatching grass'
 * *spuːˀ, *spɔːˀ 'thatch grass'
 * *lwa(ː)ʔ 'fig tree'
 * *ɟriːˀ 'banyan, Ficus'
 * *kiəl 'cucumber'
 * *ptiːɕ 'mushroom'
 * *ʔrmiːtˀ 'turmeric'
 * *ʦuːɲ 'fern'
 * *ɟmɨːˀ 'creeper, vine'

 * Agriculture
 * *srɔ(ː)ʔ 'paddy rice'
 * *ɓa(ː)ʔ 'paddy rice'
 * *rŋkoːˀ 'husked rice'
 * *skaːmˀ 'bran, husk'
 * *skɔːj 'millet'
 * *sroʔ 'taro'
 * *ɓaːj 'bean'
 * *ɟeːˀ 'pestle'
 * *rŋa(ː)ʔ 'coals'
 * *kʦaɕ, *kcah 'charcoal'
 * *tɨːl 'plant, sow (v.)'
 * *ɟɔːlˀ 'plant seed (v.)'
 * *raːm, *rɨːm 'clearing (for swidden) (n., v.)'
 * *kacˀ 'pluck, harvest (v.)'
 * *kɗoːŋ 'winnowing basket'
 * *guːmˀ; *ʔuːm 'winnow (v.)'
 * *ɟiər 'blow, winnow (v.)'
 * *pɨkˀ 'winnow, fan (v.)'

 * Material culture
 * *ɲaːˀ 'house'
 * *ɗuːŋ 'home, clan territory'
 * *ɟraŋ, *ɟrɔŋ 'post, pillar'
 * *puːŋ, *poaŋ 'window'
 * *Cdaŋ 'walling/fencing material'
 * *taːɲ 'weave (v.)'
 * *ksɛːʔ 'string, cord'
 * *kam 'arrow'
 * *ʔaːkˀ 'bow (n.)'

Numerals are as follows:
 * 1. *muəjˀ, *moːjˀ
 * 2. *ɓaːr
 * 3. *peːˀ
 * 4. *puənˀ
 * 6. *truʔ, *pruʔ
 * 7. *pɔh, *pəɕ
 * 8. *tNɕaːm

Function words
Proto-Austroasiatic personal pronouns, determiners, and particles are as follows, with reconstructions from Sidwell & Rau (2015) and Shorto (2006).

Sidwell (2024) revises the personal pronouns as follows.

Branch reconstructions
Austroasiatic branch-level reconstructions include:
 * Proto-Munda: Sidwell & Rau (2015)
 * Proto-Khasic: Sidwell (2012)
 * Proto-Palaungic: Sidwell (2010, 2015)
 * Proto-Khmuic: Sidwell (2013)
 * Proto-Pakanic: Hsiu (2016)
 * Proto-Vietic: Ferlus (2007)
 * Proto-Katuic: Sidwell (2005)
 * Proto-Bahnaric: Sidwell (2011)
 * Proto-Khmeric: Sidwell & Rau (2015), based on Ferlus (1992)
 * Proto-Pearic: Sidwell & Rau (2015); Headley (1985)
 * Proto-Monic: Diffloth (1984)
 * Proto-Aslian: Phillips (2012)
 * Proto-Nicobarese: Sidwell (2018)

Origin and dispersal
Paul Sidwell (2009) suggested that the likely homeland of Austroasiatic is in the Mekong River region, and that the family is not as old as frequently assumed, dating to perhaps 2,000 BCE.

However, Ilia Peiros (2011) heavily criticized Sidwell's 2009 riverine dispersal hypothesis, claiming that it contains many contradictions. On the basis of his own analysis, he argued that the homeland of Austroasiatic lies somewhere near the Yangtze, and he suggests the Sichuan Basin as the likely homeland of Proto-Austroasiatic speakers before they migrated to other parts of central and southern China and then into Southeast Asia. He further suggests that the family must be as old as Proto-Austronesian and Proto-Sino-Tibetan, or even older.

George van Driem (2011) proposed that the homeland of Austroasiatic is somewhere in southern China, suggesting the region around the Pearl River as the likely homeland of the Austroasiatic languages and their speakers. He further suggested, based on genetic studies, that the migration of Kra–Dai speakers from Taiwan replaced the original Austroasiatic languages there but had only a minor effect on the population: local Austroasiatic speakers adopted Kra–Dai languages and, in part, Kra–Dai culture.

Laurent Sagart (2011) and Peter Bellwood (2013) supported the theory of an origin of Austroasiatic along the Yangtze river in southern China.

Genetic and linguistic research in 2015 on ancient populations in East Asia suggests that Austroasiatic originated in what is today southern China, or even further north.

Integrating computational phylogenetic linguistics with recent archaeological findings, Paul Sidwell (2015) further expanded his Mekong riverine hypothesis by proposing that Austroasiatic had ultimately expanded into Indochina from the Lingnan area of southern China, with the subsequent Mekong riverine dispersal taking place after the initial arrival of Neolithic farmers from southern China. He tentatively suggests that Austroasiatic may have begun to split up 5,000 years B.P. during the Neolithic transition era of mainland Southeast Asia, with all the major branches of Austroasiatic formed by 4,000 B.P. Austroasiatic would have had two possible dispersal routes from the western periphery of the Pearl River watershed of Lingnan: a coastal route down the coast of Vietnam, or a route downstream along the Mekong River via Yunnan.

Both the reconstructed lexicon of Proto-Austroasiatic and the archaeological record clearly show that early Austroasiatic speakers around 4,000 B.P. cultivated rice and millet, kept livestock such as dogs, pigs, and chickens, and thrived mostly in estuarine rather than coastal environments. At 4,500 B.P., this "Neolithic package" suddenly arrived in Indochina from the Lingnan area without cereal grains and displaced the earlier pre-Neolithic hunter-gatherer cultures, with grain husks found in northern Indochina by 4,100 B.P. and in southern Indochina by 3,800 B.P. However, Sidwell found that iron is not reconstructable in Proto-Austroasiatic, since each Austroasiatic branch has different terms for iron that were borrowed relatively recently from Tai, Chinese, Tibetan, Malay, and other languages. During the Iron Age about 2,500 B.P., relatively young Austroasiatic branches in Indochina such as Vietic, Katuic, Pearic, and Khmer were formed, while the more internally diverse Bahnaric branch (dating to about 3,000 B.P.) underwent more extensive internal diversification. By the Iron Age, all of the Austroasiatic branches were more or less in their present-day locations, with most of the diversification within Austroasiatic taking place during the Iron Age.

Paul Sidwell (2018) considers the Austroasiatic language family to have rapidly diversified around 4,000 years B.P. during the arrival of rice agriculture in Indochina, but notes that the origin of Proto-Austroasiatic itself is older than that date. The lexicon of Proto-Austroasiatic can be divided into an early and late stratum. The early stratum consists of basic lexicon including body parts, animal names, natural features, and pronouns, while the names of cultural items (agriculture terms and words for cultural artifacts, which are reconstructable in Proto-Austroasiatic) form part of the later stratum.

Roger Blench (2018) suggests that vocabulary related to aquatic subsistence strategies (such as boats, waterways, river fauna, and fish capture techniques) can be reconstructed for Proto-Austroasiatic. Blench (2018) finds widespread Austroasiatic roots for 'river, valley', 'boat', 'fish', 'catfish sp.', 'eel', 'prawn', 'shrimp' (Central Austroasiatic), 'crab', 'tortoise', 'turtle', 'otter', 'crocodile', 'heron, fishing bird', and 'fish trap'. Archaeological evidence for the presence of agriculture in northern Indochina (northern Vietnam, Laos, and other nearby areas) dates back to only about 4,000 years B.P. (2,000 BCE), with agriculture ultimately being introduced from further north in the Yangtze valley, where it has been dated to 6,000 B.P. Hence, this points to a relatively late riverine dispersal of Austroasiatic as compared to Sino-Tibetan, whose speakers had a distinct non-riverine culture. In addition to living an aquatic-based lifestyle, early Austroasiatic speakers would have also had access to livestock, crops, and newer types of watercraft. As early Austroasiatic speakers dispersed rapidly via waterways, they would have encountered speakers of older language families who were already settled in the area, such as Sino-Tibetan.
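The dates in this section mix "years B.P." with calendar years before the common era, and the paragraph above equates about 4,000 years B.P. with roughly 2,000 years before the common era. As a minimal sketch of that arithmetic, the hypothetical helper below subtracts a round 2,000-year offset. The offset is an assumption chosen to match the rounded figures used here. Radiocarbon convention counts "present" from 1950 CE, which can be approximated by passing `present=1950` (ignoring the absence of a year zero).

```python
def bp_to_bce(years_bp: int, present: int = 2000) -> int:
    """Convert a rounded 'years B.P.' figure to a rounded BCE year.

    `present` is the assumed reference year (CE) for "before present".
    """
    bce = years_bp - present
    if bce <= 0:
        raise ValueError("date falls in the common era; not expressible as BCE")
    return bce


# Rounded figures mentioned in this section.
for bp in (6000, 4500, 4000):
    print(f"{bp} B.P. is roughly {bp_to_bce(bp)} BCE")
```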

Sidwell (2021) proposes that the locus of Proto-Austroasiatic was in the Red River Delta area about 4,000-4,500 years before present. Austroasiatic dispersed along coastal maritime routes and also upstream through river valleys. Khmuic, Palaungic, and Khasic resulted from a westward dispersal that ultimately came from the Red River valley. Based on their current distributions, about half of all Austroasiatic branches (including Nicobaric and Munda) can be traced to coastal maritime dispersals.