Phonological history of Hindustani

The inherited, native lexicon of the Hindustani language exhibits a large number of extensive sound changes from its Middle Indo-Aryan and Old Indo-Aryan ancestors. Many of these sound changes are shared with other Indo-Aryan languages such as Marathi, Punjabi, and Bengali.

Indo-Aryan etymologizing
The history of the Hindustani language is marked by a large number of borrowings at all stages. Native grammarians have devised a set of etymological classes for modern Indo-Aryan vocabulary:


 * Tadbhava (तद्भव, "arising from that") refers to terms that are inherited from vernacular Apabhraṃśa (अपभ्रंश, "corrupted"), from the dramatic Prakrits, and further from Sanskrit. An example is Hindustani jībh "tongue", inherited through Prakrit jibbhā, from Sanskrit jihvā. Such words are the focus of this article.
 * Tatsama (तत्सम, "same as that") refers to words that are borrowed into Hindi or Old Hindi directly from Sanskrit with minor phonological modification (e.g. lack of pronunciation of the final schwa). The Hindi register of Hindustani is associated with a large number of tatsama words through Sanskritisation. An example is Hindustani jihvāmūlīy "guttural", directly from Sanskrit jihvāmūlīya.
 * Ardhatatsama (अर्धतत्सम, "half-same as that") refers to words that are semi-learned borrowings from Sanskrit. That is, words that underwent some tadbhava sound changes, but were adapted on the basis of a Sanskrit word. An example is Hindustani sūraj "sun", which is from Prakrit sujja, from Sanskrit sūrya. We would expect Hindustani *sūj from Prakrit, but the -r- was added later on after the Sanskrit word. Such adaptation to Sanskrit occurred continuously and as early as the Middle Indo-Aryan stage. Adapted words were crucial to determining the date and chronology of sound changes.
 * Deśaj (देशज, "indigenous") refers to words that may or may not be derived from Prakrit, but cannot be shown to have a clear Sanskrit etymon. This is sometimes complicated by Sanskrit re-borrowing of Prakrit words. Such words sometimes derive from Non-Indo-Aryan languages—primarily Austroasiatic (Munda) languages, as well as Dravidian and Tibeto-Burman languages. An example is Hindustani ōṛhnā "to cover up, veil", from Prakrit ǒḍḍhaṇa "covering, cloak", from Dravidian, whence Tamil உடு (uṭu, "to wear").

In the context of Hindustani, other etymological classes of relevance are:


 * Perso-Arabic loanwords, which came to Old Hindi from Classical Persian. The pronunciation is closer to Classical Persian, rather than modern Iranian Persian. The Urdu register of Hindustani is associated with a large number of Perso-Arabic loanwords. An example is Hindustani zubān "tongue, language", from Classical Persian zubān (whence Persian zobân).
 * Borrowings from Northwestern Indo-Aryan. Modern Hindustani, while based primarily on the language of the Khariboli region, comes from a dialectal mixture. Many of the Western Hindi dialects are transitional to Punjabi and the Northwestern Indo-Aryan languages, and have donated words to Hindustani that underwent Northwestern sound changes. We often encounter doublets like Hindustani makkhan "butter", borrowed from Northwestern dialects (compare Punjabi makkhaṇ), and Hindustani mākhan, the native tadbhava term which is now archaic/obsolete outside of fossilized phrases.

As in many other languages, many phenomena in the historical evolution of Hindustani are better explained by the wave model than by the tree model. In particular, the oldest changes like the retroflexion of dental stops and loss of ṛ have been subject to a great deal of dialectal variation and borrowing. In the face of doublets like Hindustani baṛhnā "to increase" and badhnā "to increase" where one has undergone retroflexion and the other has not, it is difficult to know exactly under what conditions the sound change operated. One often encounters sound changes described as "spontaneous" or "sporadic" in the literature (such as "spontaneous nasalization"). This means that the sound change's context and/or isogloss (i.e. dialects in which the sound change operated) have been sufficiently obscured by inter-dialect borrowing, semi-learned adaptations to Classical Sanskrit or Prakrits, or analogical leveling.

From Vedic Sanskrit (ca. 600 BCE)
The sound changes are sometimes shared with the attested Pali language and Ashokan Prakrit inscriptions, referred to here as "Early Prakrit". The order of the changes below is not a hard rule.

Conservative features lost in Vedic Sanskrit
Pali, Prakrit, Hindustani, and many other Indo-Aryan languages preserve some conservative features lost in Vedic Sanskrit. For example, the Sanskrit kṣ cluster can arise from a large number of Proto-Indo-European (PIE) sequences (such as *ks, *tḱ, or *ǵʰs), which have merged in Vedic Sanskrit, but in Middle Indo-Aryan we find that such sequences are partially distinguished (kh- vs ch- vs jh-). PIE /r/ and /l/ are merged to /r/ in Vedic Sanskrit, but the distinction appears to have been somewhat preserved in parallel dialects.

Orthographic changes

 * Before a consonant, ṅ, ñ, ṇ, n, m, and the anusvara ṃ are in complementary distribution. In Sanskrit, each different nasal consonant is typically written out. In later languages, all pre-consonant nasals are written as the anusvara ṃ.
 * The Sanskrit long vowels ē and ō are sometimes romanized as e and o (without the macron) since Sanskrit didn't have short versions of these vowels so there is no ambiguity. Prakrit does have short versions of these vowels (at least allophonically) so this article uses the convention of ē, ō for long vowels and ĕ, ŏ for short vowels.

Early retroflexion
A dental occasionally cerebralizes to a retroflex stop in the environment of a rhotic. This rule originated in the east and later spread to the north and northwest; it was less common in the west. Some scholars like Wackernagel argue that the original cases (or borrowings from eastern dialects) with a retroflex stop in the environment of a rhotic, like prati- > paḍi- and mēḍra (already retroflex in Proto-Indo-Aryan *Hmáyẓḍʰram), influenced later analogical formation.

It is difficult to pinpoint exactly the conditions in which this change occurs due to a high degree of dialectal borrowing and semi-learned adaptation to Sanskrit. Compare Hindi ādhā "half" (from Sanskrit ardha) and sāṛhe "and a half" (from Sanskrit sārdha) or baḍhānā and badhānā (both meaning "to increase", from Sanskrit vardhāpaya-). Many cerebralized words were old enough to be borrowed back into Sanskrit, like paṭh- "to read" (from older pṛth- "to spread") with a specialized meaning.

Loss of ṛ
Initially, the syllabic liquid -ṛ- becomes ri-. Elsewhere, it becomes -i- (most frequently), but also can become -a- or -u- (especially around labials). In cases where ṛ > a, u, this change is usually seen in most or all modern Indian languages and may reflect some dialectal leveling. The outcome is subject to analogical leveling and umlaut.

Merging of sibilants
s, ṣ, ś > s


 * Sanskrit dēśa > Pali and Prakrit dēsa > Hindustani des "country"

Monophthongizations
aya > ē and ava > ō


 * Sanskrit avaśyā > Pali and Prakrit ŏssā > Hindustani os "dew"
 * Sanskrit avara "lower" > Pali and Prakrit ōra, ōraṃ "to this side" > Hindustani or "side"

Middle Indo-Aryan assimilations
Several changes below will yield a very distinct phonotactic structure in MIA that almost resembles that of Dravidian languages. With the relevant sound change laws in parentheses, the situation in MIA is:
 * (Initial cluster simplification) Word-initial consonant clusters cannot occur. Only a single consonant may occur.
 * (Medial cluster total assimilation) Word-medially, between two vowels we can only find:
     * a single consonant
     * a geminate unaspirated stop
     * an unaspirated stop + the corresponding aspirated stop
     * a nasal + a homorganic stop or non-stop consonant
 * (Loss of word-final consonants) No word-final consonants are tolerated
 * (Two-mora rule) Syllables can be at most two morae long

Regarding the assimilations of Old Indo-Aryan consonant conjuncts, the Jayadhavalā (ca. ninth century AD) writes:

"dīsaṁti doṇṇi vaṇṇā saṁjuttā aha va tiṇṇi cattāri tāṇaṁ duvvala-lōvaṁ kāūṇa kamō pajuttavvō" ('When two, or three or four, consonants appear in combination, elide the weakest one, and continue the process')

Here, "weakest" refers to sounds of higher sonority, and "elide" refers to either true elision/loss or total assimilation of the weaker sound to the stronger sound (the relevant sound changes being alluded to are discussed below). Specifically, the sonority scale of Prakrit is (weakest) h < y < r < v < l < sibilants < nasals < stops (strongest). It will be helpful to keep this notion of "stronger" and "weaker" sounds in mind through the following sound changes.
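The strength scale above can be sketched as a small lookup table. The following Python snippet is only an illustration: the function names (`strength`, `elide_weakest`, `strongest`) are hypothetical, the membership of each class is an assumption, and ties between equally strong sounds are resolved by dropping the first one, which is an assumption made to agree with cases like dugdha > duddha.

```python
import unicodedata

def nfc(text):
    # Normalize so that precomposed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", text)

# Weakest to strongest, following the scale quoted above.
# Which sounds belong to each class is an illustrative assumption.
SCALE = [
    ["h"], ["y"], ["r"], ["v"], ["l"],
    ["s", "ś", "ṣ"],
    ["ṅ", "ñ", "ṇ", "n", "m"],
    ["k", "kh", "g", "gh", "c", "ch", "j", "jh", "ṭ", "ṭh", "ḍ", "ḍh",
     "t", "th", "d", "dh", "p", "ph", "b", "bh"],
]
STRENGTH = {nfc(sound): rank
            for rank, group in enumerate(SCALE)
            for sound in group}

def strength(sound):
    """Return the rank of a sound on the scale (higher means stronger)."""
    return STRENGTH[nfc(sound)]

def elide_weakest(cluster):
    """Drop the weakest sound of a cluster (the first one on a tie)."""
    weakest = min(range(len(cluster)), key=lambda i: strength(cluster[i]))
    return [c for i, c in enumerate(cluster) if i != weakest]

def strongest(cluster):
    """Return the sound that survives if elision is repeated to the end."""
    return max(cluster, key=strength)

# Sanskrit astra: the cluster s-t-r first loses r, giving *asta.
print(elide_weakest(["s", "t", "r"]))
# Sanskrit grāma: only g survives of the initial cluster.
print(strongest(["g", "r"]))
```

The sketch only ranks sounds; it does not model the aspiration left behind by s and h or the gemination of medial clusters, which are separate changes.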

Palatalization from y and s
t, th, d, dh > c, ch, j, jh / _y and also t, p > c / _s. The weaker y and s sounds are totally assimilated to the stronger stops or lost in Prakrit by later sound changes.


 * Sanskrit sandhyā > Prakrit saṃjhā > Hindustani sā̃jh "evening"
 * Sanskrit tsaru > Prakrit charu "sword handle"
 * Sanskrit apsarā > Prakrit accharā "Apsara"

Optional assimilation to v or m
Occasionally, t, th, d, dh > p, ph, b, bh / _v or _m
 * Sanskrit dvādaśa > Early Prakrit bādasa (> Hindustani bārah "twelve")
 * Sanskrit ātmā > Early Prakrit appā (> Hindustani āp "you")
 * But Sanskrit itvara "strut" > Prakrit ittara (> Hindustani itrānā "to strut")

Excrescent -b-
mr, ml > ṃbr, ṃbl (which becomes Prakrit ṃb through later sound changes)


 * Sanskrit āmra > Prakrit aṃba > Hindustani ām "mango"

Initial cluster simplification
Only the strongest sound in a word-initial cluster is retained. When s or h is in a cluster with a stronger sound, it is lost but causes the stronger sound to become aspirated, if it is not already. In the case of unaspirated stops becoming aspirated, the result is highly regular.
 * Sanskrit grāma > Pali and Prakrit gāma (> Hindustani gā̃v "village")


 * Sanskrit stara > Pali and Prakrit thara > Dialectal Hindustani thar "layer"

When s is in a cluster with a nasal, the result is an aspirated nasal. When h is in a cluster with nasals, liquids, or glides, the result is aspirated nasals, liquids, and glides. Unlike aspirated stops, aspirated sonorants are not generally considered phonemes in Middle Indo-Aryan. A number of solutions arose to repair this situation:


 * Early anaptyxis between s and the sonorant (sm- > sum-/sam- and sn- > sin-/san-)
 * Toleration of the aspirated sonorant (at least in earlier Prakrit writing)
 * De-aspiration
 * Late anaptyxis between the sonorant and /ɦ/ (regular in Pali)

At least in Hindustani, aspirated sonorants cannot occur initially, so if aspiration is tolerated in Prakrit then one of the latter two solutions must have occurred before the development of Old Hindi. Which of these changes applies in a given word is unpredictable, owing to analogy, re-borrowing, and dialectal influence.

Anaptyxis is also seen in other cases, most often between a consonant and /l/.


 * Sanskrit klēśa > Prakrit kilēsa (> Hindustani kiles, kales "grief")

Loss of word-final consonants
Final nasals n and m become the anusvara ṃ. The final sequence -aḥ, which already has the variant -ō in sandhi, becomes -ō. Elsewhere, the final consonant is lost without a trace.

Two-mora rule
All over-long (>2 morae) syllables are simplified to two-mora syllables. For purposes of syllabification, syllable onsets can be only one consonant (and not an aspirated nasal, which always counts as two separate consonants word-medially). In syllable codas consisting of more than one consonant, the weakest consonants are lost:
 * Sanskrit astra > *asta > Prakrit attha "weapon"

A long vowel (two morae) cannot occur with a consonant coda, so it is always shortened. In the case of long ē and ō, the result is short ĕ and ŏ. Since these short sounds cannot be represented in Middle Indo-Aryan orthographies, as Masica writes, "MIA writers' discomfort with this situation is often indicated by their using sometimes E, sometimes I, and sometimes O, sometimes U."

The Sanskrit vowels ai and au are always trimoraic/over-long (phonetically, they are better represented as āi and āu), so they are monophthongized to ē̆ and ō̆, respectively.
 * Sanskrit vyāghra > *vāghra > *vaghra > Pali and Prakrit vaggha > Hindustani bāgh "tiger"
 * Sanskrit nētra > *nĕtra > Pali and Prakrit nĕtta
 * Sanskrit lōptra > Pali and Prakrit lŏtta, lutta

In the rarer Sanskrit clusters -CsN-, either the consonant C or the nasal N is deleted, rather than the sibilant.


 * Sanskrit tīkṣṇa > *tiksa or *tisṇa > Prakrit tikkha, tiṇha (> Hindustani tīkhā "sharp")
 * Sanskrit jyotsnā > Prakrit jŏṇhā "moon"
 * Sanskrit pārṣṇi > Prakrit paṇhi "heel"

Medial cluster total assimilation
In medial clusters, the weaker sound assimilates to the stronger sound. To avoid -ChCh- clusters, the first consonant in the cluster is always de-aspirated.
 * Sanskrit dugdha > Prakrit duddha (> Hindustani dūdh "milk")
 * Sanskrit candra > Prakrit caṃda (> Hindustani cā̃d "moon")
 * Sanskrit kartavya > Prakrit kattavva
 * Sanskrit pārśva "the side, to the side of" (> *pārsva > *pasva) > Prakrit passa (> Hindustani pās "near")

Again, s and h are assimilated next to stronger sounds, but leave their trace by triggering aspiration of the resulting geminate:


 * Sanskrit hasta > Prakrit hattha (> Hindustani hāth "hand")
 * Sanskrit rāṣṭra > *rasṭa > Prakrit raṭṭha "land, country" (compare Hindustani marāṭhā "Maratha (caste)")
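As a rough sketch of the assimilation described in this section, the following Python function maps a two-consonant medial cluster to its Prakrit outcome. It is a simplification under stated assumptions: the name `assimilate` and the numeric ranks are hypothetical, sibilants are assumed to have already merged to s, ties between two stops are resolved in favor of the second stop, and clusters involving nasals or h are not handled.

```python
import unicodedata

def nfc(text):
    # Normalize so that precomposed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", text)

# Unaspirated stops; aspirates are written as stop + "h".
STOPS = set(nfc("kgcjṭḍtdpb"))
# Ranks for the non-stop sounds handled here (higher means stronger).
RANK = {"y": 1, "r": 2, "v": 3, "l": 4, "s": 5}
STOP_RANK = 7

def base(consonant):
    """Strip the aspiration from an aspirated stop such as 'dh'."""
    if len(consonant) == 2 and consonant.endswith("h"):
        return consonant[0]
    return consonant

def is_stop(consonant):
    return base(consonant) in STOPS

def rank(consonant):
    return STOP_RANK if is_stop(consonant) else RANK[consonant]

def assimilate(c1, c2):
    """Return the outcome of the medial cluster c1 + c2 as a string."""
    c1, c2 = nfc(c1), nfc(c2)
    # On a tie (two stops), the first consonant counts as the weaker one.
    weaker, stronger = (c1, c2) if rank(c1) <= rank(c2) else (c2, c1)
    if is_stop(stronger):
        # s leaves a trace as aspiration; only the second half of the
        # geminate is written aspirated, the first is de-aspirated.
        aspirated = stronger.endswith("h") or weaker == "s"
        plain = base(stronger)
        return plain + plain + ("h" if aspirated else "")
    return stronger + stronger

print(assimilate("g", "dh"))  # dugdha > duddha
print(assimilate("s", "t"))   # hasta > hattha
print(assimilate("r", "t"))   # kartavya > kattavva
```

Longer clusters such as the one in rāṣṭra would first need to be reduced to two consonants (here *rasṭa) before this function applies.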

In the case of nasals, -mh- and -ṇh-/-nh- are seen (rather than *-mmh-, etc.).


 * Sanskrit grīṣma > Prakrit gimha

Occasionally, -sm- and -sn- can assimilate to -ss- — Sanskrit vismara- > Prakrit vissara- (> Hindustani bisarnā "to forget"), but also note the Prakrit variants vimhara-, visumara-, etc.

The sequence -sr- can sometimes yield -ṃs-, rather than -ss- — Sanskrit aśru > Prakrit assu, aṃsu (> Hindustani ā̃sū "tear").

In the case of glides, the resulting aspirated geminate sequence was fortified so -hy- > -jjh- and -hv- > -bbh-.
 * Sanskrit jihvā > Prakrit jibbhā (> Hindustani jībh "tongue")

In rare cases, anaptyxis is applied to break the heterogeneous cluster: Sanskrit ratna > Pali ratana

Up to Ashokan Prakrit (ca. 230 BCE)
The following sound changes characterize Early/Ashokan Prakrit, but not Pali:

Fortition of /j/
/j/ > /dʒ/ initially and /jː/ > /dːʒ/ everywhere
 * Sanskrit yaḥ, yo "what" > Pali yo, Prakrit jo (> Hindustani jo "that, what")

Cases of -ēy- and -ī̆y- are sometimes re-analyzed as having a geminate glide and undergo this rule as well.


 * Sanskrit kālēya > Prakrit kālēya, kālijja, *kālĕjja (> Hindustani kalejā "liver")

Other differences

 * In Pali, Sanskrit -jñ- becomes -ññ- (geminate of the palatal nasal). In Prakrit, the result is usually -jj-, but is sometimes -ṇṇ-/-nn- (probably semi-learned)
 * In Pali, geminate -vv- > -bb-, but this never occurred in Prakrit

Up to Dramatic Prakrits (ca. 200 AD)
These changes occur after Pali and Early Prakrit, but before the development of the dramatic regional Prakrits like Maharashtri Prakrit and Shauraseni Prakrit:

Merging of nasals ṇ, n > ṇ
Whether the actual place of articulation of this sound was truly retroflex or was dental (and just orthographically represented as a retroflex nasal) is debated. Regardless, this sound regularly becomes Hindustani dental n later on (but intervocalically, the sound becomes ṇ in other languages like Marathi, Gujarati, and Punjabi).

Lenition of intervocalic consonants
First, single intervocalic unvoiced stops become voiced (k, kh, c, ṭ, ṭh, t, th, p, ph > g, gh, j, ḍ, ḍh, d, dh, b, bh / V_V).
 * Sanskrit śōka > sōga "sorrow"
 * Sanskrit kapha > kabha "phlegm"

Then, for single intervocalic voiced stops:
 * If non-retroflex, they spirantize. This can be notated as g, gh, j, d, dh, b, bh > [ɣ], [ɣʱ], [ʑ], [ð], [ðʱ], [β], [βʱ] / V_V, but the exact values of these spirants are unknown.
 * Retroflex stops are allowed intervocalically orthographically, but it is likely that they now represent flaps in this environment.

According to Chatterji, this stage is represented by vacillation between writing a voiced stop, semivowel, or nothing.

Then, the aspirated spirants are debuccalized ([ɣʱ], [ðʱ], [βʱ] > h). Remaining [ɣ], [ʑ], [ð] > y and [β] > v. Finally, intervocalic -y- is lost, effectively meaning that /j/ is not a phoneme in Prakrit. This produces a large number of vowels in hiatus (mainly from the previous stages of this lenition process). The following table lists some terms in Sanskrit, Pali, Dramatic Prakrit, and the Hindustani inherited reflex if it exists:

Between two ā̆ vowels, hiatus is usually resolved by what Hemachandra, in his grammar of Prakrit, calls a “lightly pronounced y-sound” (laghuprayatnatarayakāraśrutiḥ). As far as orthography/romanization is concerned, this results in the optional inclusion of epenthetic -y- or, less likely, -v- between the ā̆ vowels. For the word derived from Sanskrit nayanam "eye", some Prakrit writers would use ṇayaṇaṃ and others ṇaaṇaṃ. This light epenthetic sound should not be confused with the older, genuine /j/ phoneme.
 * Sanskrit tapaka [t̪ɐ.pɐ.kɐ] > [t̪ɐ.bɐ.gɐ] > [t̪ɐ.βɐ.ɣɐ] > Prakrit tavaya
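The lenition chain described in this section can be sketched in Python by collapsing the intermediate voicing and spirant stages into their final outcome. This is a minimal illustration rather than a full model: the names `tokenize` and `lenite` and the `y_glide` switch are hypothetical, the input is assumed to be a romanized form whose nasals and sibilants have already merged, and the optional y-glide is inserted only between a/ā vowels.

```python
import unicodedata

def nfc(text):
    # Normalize so that precomposed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", text)

VOWELS = set(nfc("aāiīuūeēoō"))
A_VOWELS = set(nfc("aā"))
ASPIRABLE = nfc("kgcjṭḍtdpb")

# Final outcome of a single intervocalic consonant, skipping the
# intermediate voiced and spirant stages. ch and jh are not affected.
OUTCOME = {nfc(k): v for k, v in {
    "k": "", "g": "", "c": "", "j": "", "t": "", "d": "", "y": "",
    "p": "v", "b": "v",
    "kh": "h", "gh": "h", "th": "h", "dh": "h", "ph": "h", "bh": "h",
    "ṭ": "ḍ", "ṭh": "ḍh",
}.items()}

def tokenize(word):
    """Split a romanized word into segments, keeping aspirates together."""
    word = nfc(word)
    tokens, i = [], 0
    while i < len(word):
        if word[i] in ASPIRABLE and word[i + 1:i + 2] == "h":
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens

def lenite(word, y_glide=True):
    """Apply intervocalic lenition, then optionally the light y-glide."""
    src = tokenize(word)
    out = []
    for i, seg in enumerate(src):
        between_vowels = (0 < i < len(src) - 1
                          and src[i - 1] in VOWELS
                          and src[i + 1] in VOWELS)
        if between_vowels and seg in OUTCOME:
            seg = OUTCOME[seg]
        if seg:
            out.append(seg)
    if y_glide:
        glided = []
        for seg in out:
            # Resolve hiatus between two a/ā vowels with a written -y-.
            if glided and glided[-1] in A_VOWELS and seg in A_VOWELS:
                glided.append("y")
            glided.append(seg)
        out = glided
    return nfc("".join(out))

print(lenite("tapaka"))                  # tavaya
print(lenite("ṇayaṇa", y_glide=False))   # ṇaaṇa
```

The `y_glide` switch mirrors the scribal variation between spellings such as ṇayaṇaṃ and ṇaaṇaṃ mentioned above.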

In all other cases, hiatus is always allowed. Notably, a + i and a + u are romanized as aï and aü to avoid confusion with the Sanskrit overlong vowels, as in Sanskrit pratijña > Prakrit païjja (> Hindustani paij "vow"). Occasionally, the sequences aï and aü can contract in Prakrit to ē̆ and ō̆. This is a separate, much earlier change than the later coalescence of vowels in hiatus.

Intervocalic -v- is generally retained, but is sometimes also lost in a similar way.
 * Sanskrit sthavira > Earlier Prakrit ṭhavira > *ṭhaïra > Later Prakrit ṭhēra "old"

Pleonastic suffixes
Another change worth noting here that will become more prevalent by late MIA and early NIA is the extension of Old Indo-Aryan nominals and roots with pleonastic suffixes. The consensus, implied by the name, is that these innovative suffixes have no semantic purpose and mainly serve to distinguish homophones (created by the sweeping sound changes of Early Prakrit). They are applied after nominal and verb stems, before inflecting suffixes. Some are recognizable as the reflexes of Old Indo-Aryan diminutive suffixes. The most common suffixes are:


 * Feminine -i(y)ā (< earlier -iga, -ikā < Sanskrit -ikā) and masculine -(y)a (< earlier -ga, -ka < Sanskrit -ka). The equivalent Sanskrit endings were already common in Old Indo-Aryan as diminutives, but become more general and common at this stage. These become the "marked" declension of nouns in Hindustani and other Indo-Aryan languages.
     * (Sanskrit karpaṭa >) Prakrit kappaḍa + -a > kappaḍa(y)a > Hindustani kapṛā "clothing"
     * (Sanskrit kaṭa "twist of straw" >) Prakrit kaḍa + -iā > kaḍi(y)ā > Hindustani kaṛī "chain link"
     * Many Sanskrit words were already extended with this suffix. For example, Sanskrit prahēlikā > Prakrit pahēli(y)ā > Hindustani pahelī "riddle, puzzle"
 * -kka
     * Prakrit jhala- "flash" + -kka > Late Prakrit jhalakka- "to burn" > Hindi jhalaknā "to sparkle"
     * (Sanskrit ḍhōla >) Prakrit *ḍhōla + -kka- > *ḍholakka > Hindustani ḍholak "dholak"
 * -ḍa
     * (Sanskrit drava- >) Prakrit dava- + -ḍa- > *davaḍa- > Hindustani dauṛnā "to run"
 * -illa, -la, -lla, or -ulla (in other Indo-Aryan languages, like Marathi and Bengali, these ultimately become tied to the past tense), for which compare the Sanskrit -ila and -ula diminutives.
     * (Sanskrit maṣa- >) Prakrit masa- + -lla > *masalla- > Hindustani masalnā "to crush"
 * -ra, for which compare the Sanskrit -ira diminutives.
     * (Sanskrit pada >) Prakrit paya + -ra > *payara ~ *paara > Hindustani pair "foot"
 * -āve- (< rare Sanskrit -āpaya-) becomes a productive Prakrit causative suffix. The equivalent -āpē- suffix is already common in Pali.
     * (Sanskrit ucca >) Prakrit ucca "high" + -āve- > uccāve- "to raise" > Hindustani ucānā "to raise"

These suffixes are very often combined with each other:


 * (Sanskrit stoka "a drop" >) Prakrit thova ~ thoa ~ thoga + -ḍa + -a > Hindustani thoṛā "a little"
 * (Sanskrit yata "restrained" >) Prakrit jaa + -kka + -ḍa > Hindustani jakaṛnā "to tighten"
 * (Sanskrit matsya >) Prakrit maccha + -l(l)a + -iā > Hindustani machlī "fish"

Up to Late Prakrit (Apabhraṃśa) (ca. 900 AD)
These changes occur after the dramatic regional Prakrits, and characterize the Apabhraṃśa stage. Some of these changes start to differentiate Hindi dialects from other Indo-Aryan languages.

Intervocalic -m- > -w̃-
Intervocalic -m- > -w̃- (a nasalized glide), where the unstable nasal is typically transferred to the preceding vowel.
 * Sanskrit grāma > Pali/Prakrit gāma > Apabhraṃśa gā̃wa > Hindustani gā̃v "village"

Final long vowels are shortened
ā ē ō > a i u
 * Sanskrit sandhyā > Prakrit saṃjhā > Apabhraṃśa saṃjha > Hindustani sā̃jh "evening".

Long ū is shortened to u before another vowel
Later, long ī is sometimes also shortened in this environment
 * Sanskrit bhūta > Prakrit bhūa, bhūaa > Apabhraṃśa huā > Hindi huā "became"
 * Sanskrit dīpaka > Prakrit dīvaa > Hindustani dī̆yā "lamp"

Development of a Latin-like stress system
Abandonment of Vedic lexical stress in favor of a Latin-like positional stress system. Stress falls on the penultimate syllable if it is heavy, failing which it falls on the antepenultimate syllable if it is heavy, failing which it falls on the fourth syllable from the end.
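The positional rule can be written out as a short Python function. This is a sketch only: the name `stressed_syllable` is hypothetical, syllable weights are supplied by the caller as booleans, and the fallback for words shorter than four syllables (stress on the earliest available syllable) is an assumption not stated above.

```python
def stressed_syllable(heavy):
    """Return the index of the stressed syllable.

    heavy: list of booleans, one per syllable from first to last,
    True where the syllable is heavy.
    """
    n = len(heavy)
    if n == 0:
        raise ValueError("a word needs at least one syllable")
    if n >= 2 and heavy[-2]:
        return n - 2          # heavy penultimate
    if n >= 3 and heavy[-3]:
        return n - 3          # heavy antepenultimate
    # Otherwise the fourth syllable from the end; shorter words fall
    # back to their first syllable (an assumption of this sketch).
    return max(n - 4, 0)

# khē-la-nā: light penult, heavy antepenult, so the first syllable.
print(stressed_syllable([True, False, True]))   # 0
# mi-ṭhā-ī: heavy penult.
print(stressed_syllable([False, True, True]))   # 1
```

The weight of the final syllable never enters the decision, which matches the rule as stated.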

This system retroactively came to characterize Classical Sanskrit, but it can be considered a MIA development that was only fully completed around the Apabhraṃśa stage. It is not seen in Pali, and happened late enough that some modern languages like Marathi, which have vestiges/reflexes of Vedic stress, do not appear to be included in this development.

Prakrit suffixes which are heavy as per the new system were weakened by irregular changes or altogether analogically replaced in order to force stress back onto the root. The changes relevant to Hindustani are:


 * De-gemination of the singular genitive ending from Prakrit -assa to Ap. -asa.
 * Loss of the distinction between Prakrit -ai verbs (< Sanskrit -ati) and -ēi verbs (< Sanskrit -ayati) in favor of the former.

Suffix-weakening
In a few common suffixes and words, s, v, ṇ > h / V_V#. This is different from the broader s > h rules of the northwest region, and is part of the Apabhraṃśa tendency to force positional stress onto the root.
 * Sanskrit genitive -asya > Prakrit -assa > -asa (loss of gemination to put stress on the root) > -aha > Old Hindi -aha, -a (irregular haplology) > Hindustani -ø (genitive oblique)
 * Replacement of the plural genitive ending from Prakrit -āṇaṃ to Ap. -ahuṃ, probably through blending with the pronominal locative ending -amhi. This becomes the Hindustani oblique -õ ending.
 * Replacement of the first-person ending from Prakrit -āmi and -āmo to Ap. -a(h)uṃ, and of the third-person plural ending from Prakrit -aṃti to -a(h)iṃ. The exact development of these endings is uncertain, but they are continued by the Hindi verbal suffixes -ū̃ and -ẽ.
 * Replacement of the neuter plural ending from Prakrit -āṇi to Ap. -aiṃ.
 * Sanskrit caturdaśa > Prakrit caüddasa > Ap. caüddaha > Hindustani caudah "14"
 * Sanskrit eṣo, eṣā "this" > Prakrit eso, esā > Ap. ehu, ehā > Hindustani yah "this"
 * By late Central Apabhraṃśa, the neuter and masculine genders had merged (but not in the west, as modern Marathi and Gujarati retain 3 genders). There was a tendency to apply the old neuter plural suffix -aiṃ to the feminine declension—the Hindustani feminine plural -ẽ thus probably descends from this neuter plural suffix.

Up to Old Hindi (ca. 1300 AD)
Old Hindi marks the end of the MIA period and the start of the New Indo-Aryan (NIA) era. Many of these changes start to distinguish Hindi from nearby languages like Marathi, Gujarati, and Punjabi.

Up to this point, it was convenient to use the nominal/verbal stem as the lemma in describing sound changes (e.g. ending in -a for the nominative masculine a-stem). In Hindustani, the nominal's nominative case (from Apabhraṃśa -u, Prakrit -ō) and the verb's infinitive in -nā are the lemma forms. Hence, from now on those forms will be used unless otherwise specified.

Diphthongs
The sequences aü and aï become diphthongs au and ai.


 * MIA païjja > Old Hindi paija "vow"
 * MIA caükka > Old Hindi cauka "plaza"

Also to be included are the final suffixes ahu(ṃ), ahi(ṃ) > au(ṃ), ai(ṃ): Sanskrit hastānām "of the hands" > Prakrit hatthāṇaṃ > Ap. hatthahuṃ > Old Hindi hātha͠u > Hindustani hāthõ "hands (oblique)".

The glides i and u in hiatus after ā give rise to new overlong diphthongs. The sequences āya and āva also weakened to these overlong diphthongs, and both are written as āya and āva.
 * (Sanskrit rājakula >) Prakrit rāaula, rāula, rāōla > Old Hindi rāvala "prince, royal palace"

Vowel coalescence with glides
This process was already underway in Late Apabhraṃśa. Glides in hiatus of like quality coalesce:
 * MIA duuṇaü > Pre-Hindi, Old Braj dūnau > Old Hindi dūnā "twice"

In the case of unlike vowels in succession:
 * If the first is unstressed i or u and the second vowel is stressed, the first vowel becomes a new glide.
     * MIA pi(v)āsa > Old Hindi pyāsa "thirst"
 * If the first is ī̆, ū̆, e, or o and the second vowel is short and unstressed, the unstressed vowel is lost and the stressed vowel becomes long if it was short.
     * MIA thōa + -ḍa + -u > Old Hindi thōṛā "a little"
     * MIA sīalu > Old Hindi sīla "cold"
     * MIA pāṇia > Old Hindi pānī "water"
 * Elsewhere, the short vowel glide is lost.

Vowel coalescence with a
Generally, a + ā, ā + a, and ā + ā (where final short a is not part of a diphthong) all coalesce into ā.
 * Prakrit cittaāra > Old Hindi citāra "painter"
 * Prakrit khāaṇaü > Old Hindi khānā "food, to eat"

Ahead of some suffixes like -ra and -la with short vowels, there is more pressure to separate the suffix from the root, and so the -y- appears to intervene.


 * Prakrit kāyara > Old Hindi kāyara "coward"
 * Prakrit sā(y)ara > Old Hindi sāyara "sea"

Vowel coalescence of a(y)a
The sequence a(y)a (where final short a is not part of a diphthong) generally becomes ai first, and can also contract even further to ē.
 * MIA ma(y)aṇaü > Pre-Hindi, Old Braj mainau > Old Hindi mainā, mēnā "myna"
 * MIA ka(y)alaü > *kailau, -ā > Old Hindi kēlā "banana"

Similarly, ava (where final short a is not part of a diphthong) contracted to either au or further to o. Other sequences of vowels in hiatus require medial -y-.
 * MIA khavaṇaü > *khaunau, -ā > Old Hindi khonā "to lose"
 * MIA avara > Old Hindi aura "and"
 * MIA ṇava "9" > Old Hindi nau "9"


 * MIA ga(y)aü > Pre-Hindi, Old Braj/Awadhi gayau > Old Hindi gayā. a + au cannot contract so intervening -y- appears.

Turner explains the occasional further contraction of ai > e and au > o (at least for Gujarati) in terms of inherited words versus semi-learned words: in the former the process has had time to go further. A similar explanation of the cases where -y- was more robustly retained could appeal to word frequency, dialectal borrowing, and semi-learned borrowings.

Weakening and strengthening of v
Intervocalic -v- is lost around -ī̆-. This explains why we have Hindustani tavā "tawa" (< Prakrit tavaa) but taī "griddle" (< Prakrit taviā), both from the same root. Compare Marathi, Punjabi, and Gujarati tavī "griddle". In some cases, like Hindustani dī̆yā < Prakrit dīvau, the variant in -v- (dīvā) is found in Modern Hindustani as a regional variant.

In Hindustani, this process went much further than in other regions, and analogical leveling sometimes caused the -v- to be lost altogether.
 * Prakrit ṇavaa, ṇaviā > Old Hindi navā ~ naī or nayā ~ naī > Hindustani nayā ~ naī "new" (with dialectal/archaic navā). For this, we have Marathi, Gujarati, and Punjabi navā ~ navī.
 * For the causative from Prakrit -āvaṇaa, Old Hindi *-āvanā is expected but instead we find -ānā, leveled from the forms where -v- is lost. Compare Marathi -āvṇe, Nepali -āunu, and Punjabi -āuṇā. The -v- resurfaces in the causative-of-causative -vānā suffix.

Contrary to this weakening, initial v- and medial geminate -vv- strengthen to /b/: Prakrit vāla > Old Hindi bāla "hair", but Gujarati vāḷ.

Compensatory lengthening rules
This is one of the core sound changes of the NIA period, and is almost pan-Indo-Aryan. The change, VCː > VːC, states that MIA geminates are de-geminated, and the preceding short vowel undergoes compensatory lengthening.

A similar process occurs for clusters with a homorganic nasal which results in long nasalized vowels: VṃC > ṼːC.
 * (Sanskrit sandhyā >) Prakrit saṃjhā > Old Hindi sā̃jha "evening"
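Both lengthening rules (VCː > VːC and VṃC > ṼːC) can be sketched in Python over romanized forms. This is an illustration under assumptions: the names `tokenize` and `lengthen` are hypothetical, only the short vowels a, i, u are handled, a geminate is recognized as either a doubled consonant or an unaspirated stop followed by its aspirate, and neither spontaneous nasalization nor the counter-examples discussed in this article are modeled.

```python
import unicodedata

def nfc(text):
    # Normalize so that precomposed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", text)

VOWELS = set(nfc("aāiīuūeēoō"))
LONG = {"a": nfc("ā"), "i": nfc("ī"), "u": nfc("ū")}
ASPIRABLE = nfc("kgcjṭḍtdpb")
ANUSVARA = nfc("ṃ")
TILDE = "\u0303"  # combining tilde, marks a nasalized vowel

def tokenize(word):
    """Split a romanized word into segments, keeping aspirates together."""
    word = nfc(word)
    tokens, i = [], 0
    while i < len(word):
        if word[i] in ASPIRABLE and word[i + 1:i + 2] == "h":
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens

def lengthen(word):
    """De-geminate and lengthen (or nasalize and lengthen) the vowel."""
    seg = tokenize(word)
    out, i = [], 0
    while i < len(seg):
        vowel, following = seg[i], seg[i + 1:i + 3]
        if vowel in LONG and len(following) == 2:
            c1, c2 = following
            if c1 == ANUSVARA and c2 not in VOWELS:
                # VṃC: drop the nasal, leave a long nasalized vowel.
                out.append(LONG[vowel] + TILDE)
                i += 2
                continue
            if c1 not in VOWELS and c2 in (c1, c1 + "h"):
                # VCC: drop the first half of the geminate.
                out.append(LONG[vowel])
                i += 2
                continue
        out.append(vowel)
        i += 1
    return nfc("".join(out))

print(lengthen("duddha"))   # dūdha
print(lengthen("hattha"))   # hātha
```

The outputs keep the final -a of the stem, as in the Old Hindi forms cited in this section.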

Compensatory lengthening from older geminates was sometimes accompanied by spontaneous (and regionally random) nasalization of the vowel. In some cases this goes back to Prakrit or is otherwise reflected in many NIA languages.


 * (Sanskrit akṣi >) Prakrit akkhi > Old Hindi ā̃kha "eye"
 * (Sanskrit mudga >) Prakrit mugga > Old Hindi mū̃ga "mung bean"

Counter-examples and early cluster simplification
In some counter-examples, gemination appears to have been lost even earlier and thus evaded compensatory lengthening. These are confined to:


 * The MIA participle suffix -aṃta(u), which loses nasalization and becomes Old Hindi -atā̆
 * Geminate consonants in suffixes (e.g. the pleonastic suffixes -akka-, -illa-, and -ulla- and the derivational suffix -appaṇa) all appear to have de-geminated without compensatory lengthening.
 * Geminates after prefixes (e.g. from Sanskrit ud-, nis-, and vi-), unless the prefix syllable carried the positional stress of the word.

This probably represents the last stage of the Apabhraṃśa trend to weaken heavy syllables in order to force/regularize stress onto the preceding "root" syllable; the distinction at this stage being that the result is an intervocalic consonant.

Pre-tonic vowel shortening
This is a very important set of Central Indo-Aryan shifts that are not seen in languages like Marathi and Bengali. In a pre-tonic position, heavy/long vowels are shortened. In many cases, this change is fed by the vowel lengthening rules above. It results in Hindustani's distinctive ablauting system, since adding heavy/stressed suffixes to a root with a long vowel forces the root's vowel to shorten.


 * ā > a: Old Hindi mī́ṭhā "sweet" but miṭhā́ī (not *mīṭhā́ī)
 * au, ō, ū > u: Old Hindi chōṛanā "to leave" but chuṛānā "to cause to leave, release" (not *chōṛānā, compare -ō- in Old Marathi sōḍavaṇē̃)
 * ai, ē, ī > i: Old Hindi khḗlanā "to play" but khilā́nā "to cause to play" (not *khēlānā)

Pre-tonic nasalized vowels typically become short nasal vowels, though they can also lose nasalization.


 * Old Hindi sā̃pa "snake" + -érā > sãpérā, sapérā "snake-charmer"
 * MIA paṃcāsa > Old Hindi pacāsa "fifty"
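The pre-tonic shortening correspondences can be sketched as a mapping applied to every syllable before the stressed one. The snippet below is a minimal Python illustration with assumptions: the name `shorten_pretonic` is hypothetical, the word is supplied already divided into syllables without stress accents, the position of the stress is given by the caller, and nasalization is kept on the shortened vowel (the variant that loses it is not modeled).

```python
import unicodedata

def nfc(text):
    # Normalize so that precomposed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", text)

# Long vowel or diphthong -> short vowel, diphthongs checked first.
SHORTEN = [(nfc(long_v), short_v) for long_v, short_v in [
    ("au", "u"), ("ai", "i"),
    ("ā", "a"),
    ("ō", "u"), ("ū", "u"),
    ("ē", "i"), ("ī", "i"),
]]

def shorten_pretonic(syllables, stressed):
    """Shorten the vowel of every syllable before index `stressed`."""
    result = []
    for index, syllable in enumerate(syllables):
        syllable = nfc(syllable)
        if index < stressed:
            for long_v, short_v in SHORTEN:
                if long_v in syllable:
                    syllable = syllable.replace(long_v, short_v)
                    break
        result.append(nfc(syllable))
    return result

# *chō-ṛā-nā with stress on the second syllable gives chu-ṛā-nā.
print("".join(shorten_pretonic(["chō", "ṛā", "nā"], 1)))
# *khē-lā-nā gives khi-lā-nā.
print("".join(shorten_pretonic(["khē", "lā", "nā"], 1)))
```

Syllables at or after the stress are left untouched, so the long vowels of heavy suffixes survive.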

Word rhythm shortening
The second part of this change occurs where the long vowels ā, ī, and ū are shortened (even if nasalized) before a sequence of a consonant, short vowel, consonant, and a heavy vowel (i.e. long vowel or diphthong). This explains several alternations present in modern Hindustani:


 * Old Hindi nīcā̆ "low" but nicalā "lower" (not *nīcalā)
 * Old Hindi pūta "son" but putalā "mannequin" (not *pūtalā)
 * Old Hindi mācha "fish" with machalī "fish (dimin.)" (not *māchalī)
 * Old Hindi pãkhaṛī "petal" from Prakrit *paṃkha-ḍ-iā̆ (not *pā̃khaṛī)

In the case of verbs with a long vowel in the root, there is competition throughout the paradigm. Based on the participle in -atā and infinitive in -anā, the root's vowel should be shortened; elsewhere, it should stay lengthened. The result of this is usually a short vowel which has been analogically leveled throughout the paradigm (but see the section below on counter-examples).

A note on counter-examples
The above two rules and their caveats still do not sufficiently explain all cases of vowel length and gemination encountered in Hindustani, but they are closest to the ordering of the rules that Turner proposes in his analyses of Gujarati, Marathi, and Hindi. More complex phenomena must be invoked to explain the counter-examples.

The first set of examples are from semi-learned adaptation to Sanskrit. For instance, from Prakrit aṃdhaa we predict Hindustani *ā̃dhā but find andhā "blind", under influence of the Sanskrit etymon andha. From Prakrit suddhi we predict Old Hindi *sūdha (> Hindustani *sūdh) but find sudha "memory, sense" (> Hindustani sudh), under influence of the Sanskrit etymon śuddhi.

The second set of examples are from analogy and morphological processes. In verbs, there is a tendency to associate short root vowels with intransitive verbs and long vowels with transitive verbs, which is inherited from the Sanskrit tendency (compare Sanskrit tapyatē "is heated" and tāpayati "causes to heat up"). Hence, based on Prakrit tappaï "is heated", we find both Hindustani tapnā "is heated" and tāpnā "heats (sthg.) up", where the long-vowel form has been analogically created. Other verbs with a long vowel in the root have either been re-lengthened or evaded rhythmic shortening based analogically on the de-verbal nominal form. For instance, we have nācanā "to dance" (with nāca "dancing") and bā̃dhanā "to bind" (with bā̃dha "bond").

The third set of examples are borrowings from the northwest (whence Punjabi and Sindhi). The vowel lengthening rules did not take place in the northwestern region (words with this sound change in Punjabi and Sindhi are themselves borrowings from other Indo-Aryan languages, like Hindustani). These borrowings, likely from a Western Hindi dialect transitional to Punjabi, result in a large number of doublets in Hindustani, where in many cases the native word has been or is being eclipsed by the borrowed word.

The final set of examples occurs in unstressed small words (e.g. postpositions) that were reduced without lengthening. This is probably due to rhythmic vowel shortening across a larger phrase. Compare reductions of English the, a, etc. in unstressed environments. Such words include Old Hindi saba "all" (< Prakrit savva), tujha "you (oblique)" (< Prakrit tujjha), and is "this (oblique)" (< Ap. ĕssa < Prakrit ēassa).

Final nominative -au > -ā
This forms the Hindustani marked direct case in -ā, which ultimately goes back to Prakrit -a(k)ō, from Sanskrit -akaḥ. -au is retained in the second-person plural suffix (from where it later becomes Hindustani -o).

Attenuation of post-tonic and final short vowels to /ǝ/
A number of words are saved from this lenition by semi-learned lengthening of the final vowel. For instance, from Sanskrit guru > Prakrit guru > Old Hindi gura, but also the semi-learned variant gurū "teacher, guide".

Derivations from MIA
The New Indo-Aryan sound changes were sweeping and fed each other in complex ways. It is helpful to see some derivations:

Suffix weakening
During the Old Hindi stage, final unstressed -ai and -au monophthongized to -e and -o, respectively. Hence, the general third-person singular ending underwent Sanskrit -ati > Prakrit -adi > Apabhraṃśa -aï > Old Hindi -ai > Hindustani -e, but when it was stressed in the monosyllabic Old Hindi hai, it remains unsimplified in Hindustani hai "is".
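The monophthongization just described, together with its stressed exception, can be sketched as follows. The function name and the way stress is decided are assumptions for the example: a word whose only vowel nucleus is the final diphthong (such as hai) is treated as stressed, and every other final -ai or -au is treated as unstressed. The inputs karai and karau are illustrative forms, not taken from the text above.

```python
import re

# Vowel letters of the romanization, used only to detect another nucleus.
VOWELS = "aiueoāīūēō"


def weaken_final_diphthong(word):
    """Apply final -ai > -e and -au > -o unless the diphthong is stressed.

    Assumption: a monosyllable whose only nucleus is the diphthong keeps it.
    """
    match = re.fullmatch(r"(.*)(ai|au)", word)
    if match is None:
        return word                      # no final -ai or -au
    stem, diphthong = match.groups()
    if not any(ch in VOWELS for ch in stem):
        return word                      # monosyllable: treated as stressed
    return stem + {"ai": "e", "au": "o"}[diphthong]


for form in ["karai", "karau", "hai"]:
    print(form, ">", weaken_final_diphthong(form))
```

Under these assumptions the polysyllabic forms lose the diphthong (kare, karo), while hai is returned unchanged.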

Schwa deletion
At some point in the Old Hindi stage, unstressed a was reduced to schwa /ə/, which was then deleted between VC and CV (ə → ∅ / VC_CV). Schwa was also lost at the end of most words, producing word-final consonants, aspirated consonants, and many word-internal clusters. This change is not indicated in the Devanagari script for Hindustani.
 * Old Hindi rāta "night" > Hindustani rāt "night"

Unstressed (short) vowels are also lost in other positions, particularly initial vowels in words of 3 or more syllables or intertonic short vowels.
 * Old Hindi aḍhā́ī > Hindustani ḍhāī "two and a half"
 * Old Hindi sámujhā > Hindustani samjhā "understood"
 * Old Hindi gadahā > Hindustani gadhā "donkey"

This is the source of schwa ablaut in Hindi. For example, the infinitive utarnā "to descend" has the past participle utrā "descended", where the intertonic vowel in Old Hindi utarā has been lost.

A schwa is regularly lost between a consonant and a verbal suffix like -nā.
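The core of schwa deletion (final loss plus ə → ∅ / VC_CV) can be sketched on romanized Old Hindi forms, where short a stands for the schwa. The code below is a simplified illustration under several assumptions that are not stated in the text: stress is ignored, the medial rule is applied scanning from right to left, aspirated stops are treated as single consonants, and nasalized vowels and the loss of other short vowels (as in sámujhā or aḍhā́ī) are not modeled. The name `delete_schwas` is hypothetical.

```python
import re
import unicodedata

VOWELS = unicodedata.normalize("NFC", "aiueoāīūēō")
STOPS = unicodedata.normalize("NFC", "kgcjṭḍtdpb")

# Tokens: diphthongs, single vowels, aspirated stops (kh, ch, ṭh, ...), or any
# other single character, which is treated as a plain consonant.
TOKEN = re.compile("ai|au|[" + VOWELS + "]|[" + STOPS + "]h|.")


def is_vowel(token):
    return token[0] in VOWELS


def delete_schwas(word):
    tokens = TOKEN.findall(unicodedata.normalize("NFC", word))
    # Word-final schwa is dropped if the word has another vowel (rāta > rāt).
    if tokens and tokens[-1] == "a" and any(is_vowel(t) for t in tokens[:-1]):
        tokens.pop()
    # Medial schwa is dropped in the frame V C _ C V, scanning right to left
    # so that an earlier deletion can block a later one.
    i = len(tokens) - 3
    while i >= 2:
        if (tokens[i] == "a"
                and is_vowel(tokens[i - 2]) and not is_vowel(tokens[i - 1])
                and not is_vowel(tokens[i + 1]) and is_vowel(tokens[i + 2])):
            del tokens[i]
        i -= 1
    return "".join(tokens)


for form in ["rāta", "gadahā", "utarā", "utaranā"]:
    print(form, ">", delete_schwas(form))
```

The right-to-left scan is what separates the two forms in the ablaut example: utaranā loses the schwa before -nā first, which leaves the earlier schwa in front of a cluster and so protects it (utarnā), whereas utarā loses its only medial schwa (utrā).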

Lenition of Ṽbh > Vmh and Ṽb > Vm
This change was a dialectal feature, and in regional Hindi variants the archaic form persists.
 * Old Hindi tā̃ba > Hindustani tām "copper (in compounds)", with regional variant tā̃b
 * Old Hindi kũbhāra > Hindustani kumhār "potter", with regional variant kũbhār
 * Old Hindi ā̃ba > Hindustani ām "mango", with regional variant ā̃b (compare Marathi āmbā, where this sound change never occurred)

In some cases, the regional variant which did not undergo this change ended up supplanting the main-dialect form, at least in writing.


 * Extended Old Hindi tā̃bā > Hindustani tā̃bā, with the pronunciation-spelling variant tāmā.
 * Old Hindi sãbhālanā > Hindustani sãbhālnā, with the pronunciation-spelling variant samhālnā
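The lenition itself is a small rewrite: the nasalization is dropped and the following b becomes m, with any aspiration left in place. A minimal sketch, assuming nasalization is written with a combining tilde that is the last mark on the vowel (the function name is hypothetical):

```python
import re
import unicodedata

TILDE = "\u0303"  # combining tilde: nasalization in the romanization


def lenite_nasal_b(word):
    """Ṽb > Vm and Ṽbh > Vmh: drop the nasalization and turn b into m."""
    decomposed = unicodedata.normalize("NFD", word)
    # After NFD the tilde is a separate mark; it is assumed to stand
    # directly before the b, so "tilde + b" can be replaced by "m".
    lenited = re.sub(TILDE + "b", "m", decomposed)
    return unicodedata.normalize("NFC", lenited)


for form in ["kũbhāra", "sãbhālanā"]:
    print(form, ">", lenite_nasal_b(form))
```

The output still carries the Old Hindi schwas (kumhāra, samhālanā); the modern forms kumhār and samhālnā follow once schwa deletion has applied.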

The common root samajh- "to understand", from Prakrit saṃbujjh-, should be treated as an irregular case. The ṃbh > mh > m shift and the shift of stress to the first syllable (hence the confusion of post-tonic u > a) occurred in Proto-NIA, so the form is present in Old Hindi and in languages like Marathi, which do not usually have this lenition rule.

Loss of nasal aspiration if not pre-vowel
This rule is fed by schwa-deletion and lenitions of Ṽb(h). It explains why Hindustani has mh in tumhārā "your" but no h in tum "you" (< *tumh < older tumha). There are no cases with nh > n, since these were resolved by suffixation (Old Hindi kānha, -ā > Hindustani kānhā "Krishna" and Old Hindi jonha, junhāī > Hindustani junhāī "moonlight").
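As a rough illustration of the conditioning, the rule can be written as a replacement of mh by m whenever no vowel follows. The function name is hypothetical, and the sketch only covers mh, in line with the observation above that no nh > n cases occur.

```python
import re
import unicodedata


def drop_nasal_aspiration(word):
    """mh > m unless a vowel follows (tumh > tum, but tumhārā is kept)."""
    decomposed = unicodedata.normalize("NFD", word)
    # In NFD every vowel starts with a plain base letter a, e, i, o or u.
    simplified = re.sub(r"mh(?![aeiou])", "m", decomposed)
    return unicodedata.normalize("NFC", simplified)


for form in ["tumh", "tumhārā", "kumhār"]:
    print(form, ">", drop_nasal_aspiration(form))
```

Only the first form changes, since in tumhārā and kumhār the mh still stands before a vowel.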

Sounds from loanwords
The sounds /f, z, ʒ, q, x, ɣ/ are loaned into Hindi-Urdu from Persian, English, and Portuguese. Sanskrit ṛ is borrowed into Hindustani as /rɪ/, but is pronounced more like /ru/ in languages like Marathi.
 * In Hindi, /f/ and /z/ are the most well-established, but can be realized as /pʰ/ and /dʒ/, respectively, in rustic speech. /q, x, ɣ/ are variably (by dialect) assimilated into /k, kʰ, g/, respectively, and /ʒ/ is almost never pronounced, being substituted by /ʃ/ or /dʒʰ/.
 * /pʰ/ is starting to merge into /f/ in a number of Hindustani dialects.

Monophthongization
In many non-Eastern dialects, ai is monophthongized to /ɛː ~ æː/ and au to /ɔː/. A separate /æː/ arguably exists in Hindustani through English loanwords.

Shifts before /ɦ/
Before h followed by a short vowel or a deleted schwa, short a shifts allophonically to short [ɛ], or to short [ɔ] if the following short vowel is u. This change is part of the prestige dialect of Delhi, but may not occur to the full degree for every speaker. Often, the process is taken further by assimilation of the short vowel after /ɦ/ to [ɛ] or [ɔ], and then by loss of /ɦ/ and coalescence of the vowels into long /ɛː/ or /ɔː/.

In some cases, different inflections of the same word have differing outcomes:
 * Hindustani bahut /bǝ.ɦʊt̪/ > [bɔ.ɦʊt̪] > [bɔ.ɦɔt̪] > [bɔːt̪] "a lot, many"
 * Hindustani pahlā /pǝɦ.läː/ > [pɛɦ.läː] > [pɛː.läː] "first"
 * Hindustani bahan /bǝ.ɦǝn/ > [bɛ.ɦǝn] > [bɛ.ɦɛn] > [bɛːn] "sister"
 * Hindustani kahnā /kǝɦ.näː/ > [kɛɦ.näː] > [kɛː.näː] "to say", but kahegā "he will say" is still pronounced [kǝ.ɦeː.gäː]
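The three stages in the examples above (colouring of the schwa, assimilation of the vowel after /ɦ/, then loss of /ɦ/ with lengthening) can be traced mechanically on syllabified IPA strings. The sketch below is a simplification that covers only the patterns seen in these examples: it assumes syllable boundaries are marked with a full stop, that a deleted schwa shows up as /ɦ/ in coda position, and that ʊ and ə are the only short vowels involved. The function name is hypothetical.

```python
import re

SCHWA = "\u0259"   # ə
LENGTH = "\u02d0"  # ː


def shifts_before_h(ipa):
    """Return the successive stages of the shifts before /ɦ/ for one word."""
    # The transcriptions above use the look-alike letter U+01DD for schwa.
    form = ipa.replace("\u01dd", SCHWA)
    stages = [form]

    def step(rules):
        nonlocal form
        for pattern, replacement in rules:
            form = re.sub(pattern, replacement, form)
        if form != stages[-1]:          # record a stage only if it changed
            stages.append(form)

    # 1. Schwa is coloured before ɦ + short vowel, or before coda ɦ.
    step([
        (SCHWA + r"(?=\.ɦʊ)", "ɔ"),
        (SCHWA + r"(?=\.ɦ" + SCHWA + "(?!" + LENGTH + r")|ɦ\.)", "ɛ"),
    ])
    # 2. The short vowel after ɦ assimilates to the coloured vowel.
    step([(r"ɛ\.ɦ" + SCHWA, "ɛ.ɦɛ"), (r"ɔ\.ɦʊ", "ɔ.ɦɔ")])
    # 3. ɦ is lost and the vowels coalesce into one long vowel.
    step([
        (r"ɛ\.ɦɛ", "ɛ" + LENGTH),
        (r"ɔ\.ɦɔ", "ɔ" + LENGTH),
        (r"ɛɦ\.", "ɛ" + LENGTH + "."),
    ])
    return stages


for word in ["bə.ɦʊt̪", "pəɦ.läː", "bə.ɦən", "kə.ɦeː.gäː"]:
    print(" > ".join(shifts_before_h(word)))
```

A form such as kahegā yields a single stage, because the vowel after /ɦ/ is long and none of the rules apply.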

Examples of sound changes
The following table shows a possible sequence of changes for some basic vocabulary items, leading from Sanskrit to Modern Hindustani. All entries are romanized. An empty cell means no change at the given stage for the given item. Only sound changes that had an effect on one or more of the vocabulary items are shown. Words may not be attested at each stage.