Hindustani etymology

Hindustani, also known as Hindi-Urdu, is the vernacular form of two standardized registers used as official languages in India and Pakistan, namely Hindi and Urdu. It comprises several closely related dialects in the northern, central and northwestern parts of the Indian subcontinent but is mainly based on Khariboli of the Delhi region. As an Indo-Aryan language, Hindustani has a core vocabulary that traces back to Sanskrit, but as a widely spoken lingua franca it also has a large lexicon of loanwords, acquired through centuries of foreign rule and ethnic diversity.

Standard Hindi derives much of its formal and technical vocabulary from Sanskrit while standard Urdu derives much of its formal and technical vocabulary from Persian and Arabic. Standard Hindi and Urdu are used primarily in public addresses and radio or TV news, while the everyday spoken language is one of the several varieties of Hindustani, whose vocabulary contains words drawn from Persian, Arabic, and Sanskrit. In addition, spoken Hindustani includes words from English and the Dravidian languages, as well as several others.

Hindustani developed over several centuries throughout much of the northern subcontinent, including the areas that comprise modern-day India, Pakistan, and Nepal. The core vocabulary of English evolved from Old English (Anglo-Saxon) while assimilating many words borrowed from French and other languages, whose pronunciations often changed naturally to become easier for speakers of English. In the same way, what may be called Hindustani can be said to have evolved from Sanskrit while borrowing many Persian and Arabic words over the years, changing their pronunciations to make them easier for Hindustani speakers, and often their meanings as well. Many Persian words entered the Hindustani lexicon through the influence of the Mughal rulers of north India, who followed a heavily Persianised culture and also spoke Persian. Many Arabic words entered Hindustani via Persian, having previously been assimilated into Persian through the influence of Arabs in the area. The dialect of Persian spoken by the Mughal ruling elite was known as 'Dari', which is the dialect of Persian spoken in modern-day Afghanistan. Hindustani is thus the naturally developed common language of north India. This article deals with the separate categories of Hindustani words and some of the common words found in the language.

Categorization
Hindustani words, apart from loans, basically derive from two linguistic categories:
 * Indo-Aryan (words classified by grammarians as tadbhava, or "inherited"): Sauraseni Prakrit and its apabhraṃśa, or "corrupted", vernaculars
 * Non-Indo-Aryan (words classified by grammarians as deśaja, or "indigenous"): Austroasiatic (Munda) languages, as well as Dravidian and Tibeto-Burman languages

According to the traditional categorization in Hindi (also found in other Indo-Aryan languages except Urdu) the loanwords are classed as tatsam (तत्सम "as it is, same as therein") for Sanskrit loans and vides͟hī (विदेशी "foreign, non-native") for non-Sanskrit loans, such as those from Persian or English, respectively contrasting with tadbhava and deśaja words.

The most common words in Hindustani are tadbhavas.
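The four traditional labels above can be read as a two-way split: whether a word is a loan, and where it comes from. The following is a minimal sketch of that reading, not an established tool. The names `Category` and `categorize` and the source labels passed as strings are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Category(Enum):
    """Traditional labels as described in the text (names are illustrative)."""
    TADBHAVA = "tadbhava"  # inherited Indo-Aryan word
    DESHAJA = "deśaja"     # indigenous, non-Indo-Aryan word
    TATSAM = "tatsam"      # loan from Sanskrit
    VIDESHI = "videshī"    # loan from a non-Sanskrit language


def categorize(is_loan: bool, source: str) -> Category:
    """Map a word's origin to its traditional label.

    `source` is an assumed free-text label: "Sanskrit" for Sanskrit loans,
    "Indo-Aryan" for inherited words, anything else otherwise.
    """
    if is_loan:
        return Category.TATSAM if source == "Sanskrit" else Category.VIDESHI
    return Category.TADBHAVA if source == "Indo-Aryan" else Category.DESHAJA
```

For instance, `categorize(True, "Persian")` gives `Category.VIDESHI`, while an inherited word passed as `categorize(False, "Indo-Aryan")` gives `Category.TADBHAVA`.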

Second person pronouns
In Hindustani, the pronoun āp (आप ) "[one]self", originally used as a third person honorific plural, denotes respect or formality (politeness) and originates from Prakrit 𑀅𑀧𑁆𑀧𑀸 appā, which derived from Sanskrit ātman, which refers to the higher self or level of consciousness.

The pronoun tū (तू ) and its grammatically plural form tum (तुम ) (also the second person honorific plural) denote informality, familiarity or intimacy. They originate respectively from tuhuṃ and tumma, from Prakrit 𑀢𑀼𑀁 tuṃ and its variant 𑀢𑀼𑀫𑀁 tumaṃ, which derived from Sanskrit tvam, nominative singular of युष्मद् yuṣmad (the base of the second person plural pronoun). In modern usage, tū is widely used to display a range of attitudes depending on the context, from extreme informality (impoliteness) to extreme intimacy to outright disrespect and even extreme reverence. Usage of tū in most contexts is considered highly offensive in the formal register except when addressing God as a display of spiritual intimacy. This closely parallels the use of "thou" in archaic English and of the familiar pronoun in many other Indo-European languages that show a T–V distinction.

Present "be" verb
The copula hai (है ), one of the most common words in Hindustani, and its plural form haiṉ (हैं ) serve as present forms of honā (होना, meaning "to be" and originating from Prakrit 𑀪𑁄𑀤𑀺 bhodi, derived from Sanskrit bhavati "to happen"). They do not descend from that verb, however, but originate from the following developments:


 * Sanskrit asti ("to be"; root as) evolves into Prakrit 𑀅𑀢𑁆𑀣𑀺 atthi, which further develops into ahi
 * Ahi evolves into Old Hindi ahai (अहै ; pronounced /əɦəɪ/, not /əɦɛː/ as in Hindustani)

The shortening of ahai to hai occurred in Hindustani, probably to match the other grammatical forms of honā. Ahai can be found in some older works of Hindustani literature, and traces of it can also be seen in other closely related Indo-Aryan languages such as Marathi (आहे āhe) or Sindhi (آهي āhe).

Perfective "go" verb
The verb jānā (जाना, "to go") originates from Prakrit 𑀚𑀸𑀤𑀺 jādi, derived from Sanskrit yāti ("to move"; root yā). Its perfective form, however, as in gayā (गया , "went, gone"), originates from another Prakrit word, 𑀕𑀬 gaya, derived from Sanskrit gata, past participle of gacchati ("to go"; root gam or gacch).

Some other words
The word ājā (आजा ) has also been used in Northern India and Pakistan for "grandfather". It is derived from arya, here meaning "sir". Jain nuns are addressed either as Aryika or Ajji.

The word dādā (दादा ) also has a similar meaning which varies by region. It is used in some regions for "father", in other regions for "older brother", or even for "grandfather" in other regions. This word is an amalgam of two sources:


 * Sanskrit tāta, used to address intimate persons, which means either "sir" or "dear".
 * Tau meaning "father's older brother" is also derived from tāta.

The word baṛā (बड़ा "older, bigger, greater") is derived from the Sanskrit vṛddha through Prakrit vaḍḍha.

Desi words
The term Desi words is used to describe the component of the lexicon in Indo-Aryan languages which is non-Indo-Aryan in origin, but native to other language families of the Indian subcontinent. Examples of Desi words in Hindustani include: loṭā (लोटा ) "lota (water vessel)", kapās (कपास ) "cotton", kauṛī (कौड़ी ) "cowrie (shell money)", ṭhes (ठेस ) "wound, injury", jhaṉḍā (झंडा ) "flag", mukkā (मुक्का ) "fist, punch", lakṛī (लकड़ी ) "wood", ṭharrā (ठर्रा ) "tharra (liquor)", čūhā (चूहा ) "mouse, rat", čūlhā (चूल्हा ) "stove, oven", pagṛī (पगड़ी ) "turban", luṉgī (लुंगी ) "lungi (sarong)", ghoṭālā (घोटाला ) "scam", dāṉḍī (दांडी ) "salt", jholā (झोला ) "bag, satchel", ṭakkar (टक्कर ) "crash, collision, confrontation", kākā (काका ) "paternal uncle", uṭpaṭāṉg/ūṭpaṭāṉg (उटपटांग/ऊट-पटांग ) "ludicrous", ḍabbā/ḍibbā (डब्बा/डिब्बा ) "box, container" and jhuggī (झुग्गी ) "hut"

Onomatopoeic words
Nouns: gaṛbaṛ (गड़बड़ ) "disorder, disturbance", dhaṛām (धड़ाम ) "thud", bakbak (बक-बक ) "chatter/chitter-chatter", khusur pusar (खुसुर-पुसर ) "whisper", jhilmil (झिलमिल ) "shimmer", ṭhakṭhak (ठक-ठक ) "knock knock", khaṭpaṭ (खटपट ) "quarrel, disagreement"

Verbs: khaṭkhaṭānā (खटखटाना ) "to knock", gaḍgaḍānā (गडगडाना ) "to rumble, to fuss", jagmagānā (जगमगाना ) "to shine/glitter", hinhinānā (हिनहिनाना ) "to neigh", phusphusānā (फुसफुसाना ) "to whisper"

Adjectives and Adverbs: čaṭpaṭ (चट-पट ) "in a jiffy", tharthar (थर-थर ) "with jerky motion (characteristic of shaking or trembling)", čaṭpaṭā (चटपटा ) "dextrous, spicy", čipčipā (चिपचिपा ) "sticky, slimy", čiṛčiṛā (चिड़चिड़ा ) "irritable", gaṛbaṛiyā (गड़बड़िया ) "chaotic, messy"

Loanwords
Due to the language's status as a lingua franca, Hindustani's vocabulary has a large inventory of loanwords, the largest number of which are adopted from Punjabi. Punjabi borrowings often bear sound changes from the parent Prakrit and Sanskrit vocabulary which did not occur in Hindustani, particularly the preservation of short vowels in initial syllables and the gemination of the following consonant. A certain amount of vocabulary from other South Asian languages, Persian, Arabic, and English has been loaned indirectly into Hindustani through Punjabi. Other Indic languages which exist in a state of diglossia with Hindustani and are prone to mutual borrowing include Rajasthani, the Western Pahari languages, Haryanvi, Bhojpuri, Marathi, Nepali, and Gujarati. Besides these, common sources of loanwords include those deliberately adopted from Classical Sanskrit, Classical Persian, Arabic, Chagatai Turkic, Portuguese and English, as well as Mandarin Chinese and French to a lesser extent.

Phonetic alterations
Many Classical Sanskrit words which were not learned borrowings underwent phonetic alterations. In the vernacular form, these include the merger of Sanskrit श (śa) and ष (ṣa), ण (ṇa) and न (na) as well as ऋ (r̥) and रि (ri). Other common alterations were s͟h [/ʃ/] (श ) becoming s [/s/] (स ), v/w [/ʋ/, /w/] (व ) becoming b [/b/] (ब ) and y [/j/] (य ) becoming j [/dʒ/] (ज ). Short vowels were also sometimes introduced to break up consonant clusters. Such words in Hindi (and other Indo-Aryan languages except Urdu) are called ardhatatsam (अर्धतत्सम "semi-tatsam").
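The three consonant replacements named above (श to स, व to ब, य to ज) can be sketched as a character substitution table. This is only an illustration under stated assumptions: the function name `vernacularize` and the sample words are not from the source, the mergers and vowel insertion are left out, and the changes were common rather than universal, so applying the table blindly to arbitrary words would overgenerate.

```python
# Illustrative table covering only the three replacements stated in the text.
# Each key is a single Devanagari consonant (one code point).
SOUND_CHANGES = str.maketrans({
    "श": "स",  # s͟h /ʃ/ becomes s /s/
    "व": "ब",  # v/w /ʋ/ becomes b /b/
    "य": "ज",  # y /j/ becomes j /dʒ/
})


def vernacularize(word: str) -> str:
    """Apply the consonant replacements to a Devanagari word.

    Hypothetical helper: real ardhatatsam forms arose word by word,
    not by mechanical substitution.
    """
    return word.translate(SOUND_CHANGES)


# Sample inputs chosen for illustration (not taken from the article).
for sanskrit_form in ["देश", "वन", "यमुना"]:
    print(sanskrit_form, "->", vernacularize(sanskrit_form))
```

A lookup table keeps each rule visible on its own line, which makes it easy to add or remove a change when testing it against attested forms.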

Classical Persian
Persian words which were not later artificially added were loaned from Classical Persian, the historical variety of the tenth, eleventh and twelfth centuries, which continued to be used as a literary language and lingua franca under the Persianate dynasties of the Late Middle Ages and Early Modern Era. It is not the same as Modern Persian (though the Dari Persian of Afghanistan is a direct descendant).

Borrowings
Persian loanwords in Hindustani are mainly borrowed nouns and adjectives as well as adverbs and conjunctions and some other parts of speech.

Borrowed forms derive from present and past stems, from present and past participles, from nouns formed by adding the noun suffix ـِش (-iš), and from composite words formed with Arabic.

Loaned Verbs
A substantial number of Hindustani verbs have been loaned from Punjabi; however, verb stems originating in less closely related languages are relatively rare. A few common verbs are formed directly out of Persian stems (or nouns in some cases).

Arabic
Some of the most commonly used words from Arabic, all entering the language through Persian, include vaqt (वक़्त ) "time", qalam (क़लम ) "pen", kitāb (किताब ) "book", qarīb (क़रीब ) "near", sahīh/sahī (सहीह/सही ) "correct, right", g͟harīb (ग़रीब ) "poor", amīr (अमीर ) "rich", duniyā (दुनिया ) "world", hisāb (हिसाब ) "calculation", qudrat (क़ुदरत ) "nature", nasīb (नसीब ) "fate, luck, fortune", ajīb (अजीब ) "strange, unusual", qānūn (क़ानून ) "law", filhāl (फ़िलहाल ) "currently", sirf (सिर्फ़ ) "only, mere", taqrīban (तक़रीबन ) "close to, about", k͟habar (ख़बर ) "news", ak͟hbār (अख़बार ) "newspaper", qilā (क़िला ) "fort", kursī (कुर्सी ) "chair, seat", s͟harbat (शर्बत ) "drink, beverage", muāf/māf (मुआफ़/माफ़ ) "forgiven, pardoned", zarūrī (ज़रूरी ) "necessary", etc.

Chagatai Turkic
There are a very small number of Turkic words in Hindustani, numbering as few as 24 according to some sources, all entering the language through Persian. Other words attributed to Turkish, the most widely spoken Turkic language, are actually words which are common to Hindustani and Turkish but are of non-Turkic origins, mostly Perso-Arabic. Both languages also share mutual loans from English. Most notably, some honorifics and surnames common in Hindustani are Turkic due to the influence of the ethnically Turkic Mughals. The honorifics include k͟hānam (ख़ानम ), bājī (बाजी ) "sister", and begam (बेगम ). Common surnames include k͟hān (ख़ान ), čug͟htāʾī (चुग़ताई ), pās͟hā (पाशा ), and arsalān (अर्सलान ). Common Turkic words used in everyday Hindustani are qaiṉčī/qainčī (क़ैंची ) "scissors", annā (अन्ना ) "governess", tamg͟hā (तमग़ा ) "stamp, medal", and čaqmaq (चक़मक़ ) "flint".

Mandarin Chinese
Despite geographical proximity, few Chinese words have been loaned into Hindustani.

Portuguese
A small number of Hindustani words were derived from Portuguese due to interaction with colonists and missionaries.

French
A few French loans exist in Hindustani resulting from French colonial settlements in India. Other French words such as s͟hemīz (शेमीज़ ) "chemise" and kūpan (कूपन ) "coupon" have entered the language through English.

English
Loanwords from English were borrowed through interaction with the British East India Company and later British rule. English-language education for the native administrative and richer classes during British rule accelerated the adoption of English vocabulary in Hindustani. Many technical and modern terms were and still are borrowed from English, such as ḍākṭar/ḍôkṭar (डाक्टर/डॉक्टर ) "doctor", ṭaiksī (टैक्सी ) "taxi", and kilomīṭar (किलोमीटर ) "kilometer".

Phono-semantic matching
Some loanwords from English undergo a significant phonetic transformation. This can be done intentionally, either to nativize words or to make them sound more or less English, or it can happen naturally. Words often undergo a phonetic change that makes them easier for native speakers to pronounce, while others change due to a lack of English education or incomplete knowledge of English phonetics, in which case an alternate pronunciation becomes the accepted norm and overtakes the original as the most used pronunciation.