Tamil phonology

Tamil phonology is characterised by the presence of "true-subapical" retroflex consonants and multiple rhotic consonants. Its script does not distinguish between voiced and unvoiced consonants; phonetically, voice is assigned depending on a consonant's position in a word, voiced intervocalically and after nasals except when geminated. Tamil phonology permits few consonant clusters, which can never be word initial.

Vowels
The vowels are called உயிரெழுத்து (uyireḻuttu, 'life letter'). They are classified into short and long vowels (five of each type), plus two diphthongs.

The long (neṭil) vowels are about twice as long as the short (kuṟil) vowels. The diphthongs are usually pronounced about 1.5 times as long as the short vowels, though most grammatical texts place them with the long vowels.

Tamil has two diphthongs, ஐ /ai̯/ and ஔ /au̯/, the latter of which is restricted to a few lexical items. Some scholars, such as Krishnamurti, analyse the diphthongs as sequences of /a/ + /j, ʋ/, since they pattern with other VC sequences. Spelling also varies: avvai, for example, is written அவ்வை (avvai), ஔவை (auvai) or அவ்வய் (avvay), the first being the most common. Word-final /u/ is pronounced [ɯ]. This vowel is called குற்றியலுகரம் (kuṟṟiyalukaram) "short u" in the Tolkāppiyam, as it has only half a sound unit, compared to 1, 1.5 or 2 for other vowels, and it is unrounded even in literary Tamil; in spoken Tamil it can also occur medially in some words, after the first syllable. Word-final [u] occurs in some names, chiefly male nicknames such as rāju for rājēndraṉ.
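The relative durations mentioned above can be tallied with a small lookup table. The following is only a minimal sketch: the category labels and the function name are hypothetical, and the numbers are the approximate sound-unit values given in the text rather than measured durations.

```python
# Relative vowel durations in sound units, as described in the text.
# The category labels are hypothetical names chosen for this sketch.
DURATION = {
    "short": 1.0,           # kuṟil vowels
    "long": 2.0,            # neṭil vowels, about twice as long as short vowels
    "diphthong": 1.5,       # ஐ and ஔ as usually pronounced
    "kurriyalukaram": 0.5,  # the shortened word-final u
}


def total_duration(vowel_classes):
    """Sum the relative durations of a sequence of vowel categories."""
    return sum(DURATION[c] for c in vowel_classes)


# āṟu "six": a long vowel followed by a shortened word-final u.
print(total_duration(["long", "kurriyalukaram"]))  # prints 2.5
```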

Colloquially, an initial /e/ or /eː/ may have a [ʲ] onglide; likewise, an initial /o/ or /oː/ may have a [ʷ] onglide, as in [ʷoɳɳɯ] for /onru/. This does not occur in Sri Lankan dialects.

Indian Colloquial Tamil also has nasalized vowels, formed from a word-final vowel + nasal sequence (except for /Vɳ/, where an epenthetic u is added after the nasal instead). A long vowel + nasal simply nasalizes the vowel; a short vowel + nasal may also change the vowel's quality. For example, /an/ is fronted to [ɛ̃], so அவன் becomes [aʋɛ̃] ([aʋæ̃] for some speakers), and /am/ is rounded to [õ], so மரம் becomes [maɾõ]. The remaining vowels are only nasalized: நீங்களும் becomes [n̪iːŋgaɭũ] and வந்தான் becomes [ʋan̪d̪ãː].
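The word-final nasalization rule can be sketched as a function over a list of IPA segments. This is an illustrative sketch only: the function name and the segment-list representation are assumptions, and it models just the final vowel + nasal sequence, not voicing or any other colloquial change.

```python
TILDE = "\u0303"  # combining tilde, marks nasalization in IPA
SHORT_VOWELS = {"a", "i", "u", "e", "o"}


def nasalize_final(segments):
    """Apply the colloquial word-final vowel + nasal rule to IPA segments."""
    segments = list(segments)
    if len(segments) < 2:
        return segments
    *stem, vowel, nasal = segments
    if nasal not in {"m", "n", "ɳ"} or vowel.rstrip("ː") not in SHORT_VOWELS:
        return segments
    if nasal == "ɳ":
        # /Vɳ/ is the stated exception: an epenthetic u follows the nasal.
        return segments + ["u"]
    if vowel == "a" and nasal == "n":
        return stem + ["ɛ" + TILDE]  # /an/ is fronted
    if vowel == "a" and nasal == "m":
        return stem + ["o" + TILDE]  # /am/ is rounded
    if vowel.endswith("ː"):
        # Long vowels keep their quality and length; only nasalization is added.
        return stem + [vowel[:-1] + TILDE + "ː"]
    return stem + [vowel + TILDE]    # remaining short vowels are only nasalized


# அவன் /aʋan/ comes out as the [aʋɛ̃] given in the text.
print("".join(nasalize_final(["a", "ʋ", "a", "n"])))
```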

In spoken Tamil, an epenthetic vowel u is sometimes added to words ending in consonants, e.g. nil > nillu, āḷ > āḷu, nāḷ > nāḷu (nā in some dialects), vayal > vayalu. The epenthetic vowel is deleted when another word is joined at the end.

Colloquially, the high short vowels /i/ and /u/ are lowered to [e] and [o] when followed by a short consonant and /a/ or /ai/. For example, இடம் iṭam and உடம்பு uṭampu are pronounced with initial [e] and [o] respectively. This is an instance of umlaut: the high vowel is lowered towards the low vowel that follows. It does not apply to pronouns and some other words; for example, இவன் ivaṉ and எவன் evaṉ remain different words. /ai/ also monophthongises to [e], but it still causes the lowering of a high vowel before it, e.g. ilai > ele.

Additionally, the front long vowels /iː/ and /eː/ are subject to retraction when they occur in the first syllable of a bisyllabic word and are followed by a retroflex consonant. As such, the vowel of the word for "house" is retracted, but that of its longer inflected form is not. Likewise, the vowel of "search!" is retracted, but that of "(he) searched" is not. The presence and degree of retraction for each vowel may be different; it varies between dialects and even individual speakers.

Almost all words end with vowels in spoken Tamil.
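The colloquial lowering of short high vowels can be sketched on romanized segments. This sketch assumes the environment is a following single consonant plus a or ai, as the example ilai > ele suggests; the function name, the segment representation and the `exceptions` parameter are hypothetical, and voicing of stops is not modelled.

```python
VOWELS = {"a", "ā", "i", "ī", "u", "ū", "e", "ē", "o", "ō", "ai", "au"}
LOWERED = {"i": "e", "u": "o"}


def lower_high_vowels(segments, exceptions=frozenset()):
    """Lower short i, u before a single consonant + a/ai, then turn ai into e."""
    segments = list(segments)
    if "".join(segments) in exceptions:
        # Pronouns and some other words are not affected.
        return segments
    out = list(segments)
    for i, seg in enumerate(segments):
        if seg in LOWERED and i + 2 < len(segments):
            consonant, vowel = segments[i + 1], segments[i + 2]
            # A geminate fails this check because segments[i + 2] is a consonant.
            if consonant not in VOWELS and vowel in ("a", "ai"):
                out[i] = LOWERED[seg]
    # ai monophthongises to e after it has triggered the lowering.
    return ["e" if seg == "ai" else seg for seg in out]


print("".join(lower_high_vowels(["i", "l", "ai"])))  # prints ele
```

Passing the joined form of a word in `exceptions` leaves it unchanged, which is one simple way to keep pairs such as ivaṉ and evaṉ distinct.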

For some speakers of spoken Tamil, the front vowels /i, e/ are rounded to the corresponding back vowels when they follow a labial consonant /m, p, ʋ/ and precede a retroflex consonant. Some of the resulting forms are quite acceptable, such as பெண் /peɳ/ > பொண்/பொண்ணு [poɳ~poɳːɯ], but others, such as வீடு /ʋiːʈu/ > வூடு [ʋuːɖɯ], are less accepted and may even be considered vulgar.

Another change in spoken Tamil is vowel harmony, where vowels change their height to be more similar to nearby vowels: e.g. literary Tamil /koʈu/ > spoken Tamil [kuɖɯ].

Consonants
The consonants are known as மெய்யெழுத்து (meyyeḻuttu, 'body letters'). They are classified into three categories of six each: valliṉam ('hard'), melliṉam ('soft' or nasal), and iṭaiyiṉam ('medium'). Tamil has very restricted consonant clusters; for example, there are no word-initial clusters. There are well-defined rules for the voicing of stops in the written form of Tamil, Centamiḻ (the period of Tamil history before Sanskrit words were borrowed). Stops are voiceless at the start of a word, in a consonant cluster with another stop, and when geminated. They are voiced otherwise.
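The voicing rule for stops can be sketched as follows. This is a simplified illustration, not a full account: the function name and the romanized segment list are assumptions, geminates are treated as a sequence of two identical stops, and the special medial realizations of c and ṟ are ignored.

```python
import unicodedata

# The six valliṉam consonants in romanization (normalized so that
# precomposed and decomposed spellings compare equal).
STOPS = {unicodedata.normalize("NFC", s) for s in ("k", "c", "ṭ", "t", "p", "ṟ")}


def stop_voicing(segments):
    """Return (index, is_voiced) for every stop in a romanized word."""
    segs = [unicodedata.normalize("NFC", s) for s in segments]
    result = []
    for i, seg in enumerate(segs):
        if seg not in STOPS:
            continue
        initial = i == 0
        after_stop = i > 0 and segs[i - 1] in STOPS
        before_stop = i + 1 < len(segs) and segs[i + 1] in STOPS
        # Voiceless word-initially and next to another stop (clusters and
        # geminates); voiced otherwise, i.e. between vowels and after nasals.
        result.append((i, not (initial or after_stop or before_stop)))
    return result


print(stop_voicing(["k", "ū", "ṭ", "ṭ", "a", "m"]))  # kūṭṭam: all voiceless
print(stop_voicing(["u", "ṭ", "a", "m", "p", "u"]))  # uṭampu: both voiced
```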

Tamil is characterized by its use of more than one type of coronal consonant: like many of the other languages of India, it has a series of retroflex consonants. Notably, the Tamil retroflex series includes the retroflex approximant (ழ), as in Tamiḻ; it is often transcribed 'zh'. Among the other Dravidian languages, the retroflex approximant also occurs in Malayalam (for example in കോഴിക്കോട് 'Kozhikode' /koːɻikkoːɖɨ̆/) and Badaga (for instance, in ಕ್ೞೇಗಿನೆ /kɻeːgɪne/). In Telugu, the consonant is found in inscriptions dated up to 1200 AD and has since been replaced by /ɭ/, /ɖ/ or /ɾ/, although the character is still written and exists in the Telugu Unicode block as U+0C34 (ఴ, as in నోఴంబ 'Nozhamba' /noːɻɐmbɐ/). It disappeared from spoken Kannada around 1000 AD, although the character is likewise still written and exists in Unicode (ೞ, as in ಕೊೞೆ 'Kozhe' /koɻe/). In most dialects of colloquial Tamil, this consonant is shifting to the retroflex lateral approximant /ɭ/ in the south and to the palatal approximant /j/ in the north.

The Proto-Dravidian alveolar stop *ṯ developed into an alveolar trill /r/ in the Southern and South Central Dravidian languages, while *ṯṯ and *ṉṯ remained (modern ṟṟ, ṉṟ).

[n] and [n̪] are in complementary distribution and are predictable, i.e. they are allophonic. Namely, [n̪] occurs word initially and before /t̪/, while [n] occurs everywhere else.
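Because the two nasals are predictable from context, the distribution can be written as a short function. This is a sketch under the assumption that a word is given as a list of phoneme strings in which "n" stands for the coronal nasal and "t̪" for the dental stop; the function name is hypothetical.

```python
DENTAL = "\u032a"  # combining bridge below, the IPA dental diacritic


def realize_n(phonemes):
    """Realize each /n/ as dental [n̪] or alveolar [n] from its position."""
    phonemes = list(phonemes)
    out = []
    for i, seg in enumerate(phonemes):
        if seg != "n":
            out.append(seg)
            continue
        initial = i == 0
        before_dental_stop = (
            i + 1 < len(phonemes) and phonemes[i + 1] == "t" + DENTAL
        )
        # Dental word-initially and before /t̪/; alveolar everywhere else.
        out.append("n" + DENTAL if initial or before_dental_stop else "n")
    return out


# nil: the initial nasal comes out dental.
print("".join(realize_n(["n", "i", "l"])))
```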

/ɲ/ is extremely rare word-initially and is mostly found only before /t͡ɕ/ word-medially; it occurs rarely as a geminate, as in aññāṉam or maññai, and as a singleton in one rare word, pūñai. Only around five words have doubled intervocalic [ŋ], all of them forms of the word aṅṅaṉam "that manner"; apart from these, [ŋ] occurs only before /k/.

A chart of the Tamil consonant phonemes in the International Phonetic Alphabet follows:


 * Several consonants are found only in loanwords and may be considered marginal phonemes, though they are traditionally not seen as fully phonemic.
 * 1) Intervocalic /k/ is pronounced as [ɣ~h] by Indian Tamils and [x] in Sri Lanka.
 * 2) For most speakers in spoken Tamil the distinction between the tap and trill is lost except in the southern Kanyakumari dialect.
 * 3) In most spoken dialects, /rr/ [tːr] merges with /t̪t̪/, while some others keep it as [tːr]. /nr/ merges with /ɳɳ/ if the preceding vowel is short, or with /ɳ/ if it is long, as in literary Tamil. In speech, /parri, onru, muːnru/ are realized as [pat̪t̪i, ʷoɳɳɯ, muːɳɯ].
 * 4) /t͡ʃ/ in spoken Tamil varies significantly. Some speakers pronounce it as [s] intervocalically and as an affricate initially, while others have [s] both initially and intervocalically. A final group of speakers has [t͡ʃ] before certain vowels and [s] before others, e.g. சின்ன [t͡ʃinːa] "small" but சாவி [saːʋi] "key". However, there are words where the pronunciation is fixed; for example, ceṉṉai and sēlam cannot be pronounced as [senːaɪ̯] and [t͡ʃeːlam]. /t͡ʃː ɲt͡ʃ/ are always [tːʃ n̠ʲd̠ʒ].
 * 5) In spoken Tamil /j/ might cause palatalization to the adjacent consonants and then get assimilated or deleted, e.g. literary Tamil aintu, spoken Tamil añju.
 * 6) In spoken Tamil, intervocalic /k, ʋ/ may sometimes be deleted, as in /poːkiraːj/ realized as [poːrɛ], and sometimes /ɻ/ is deleted with compensatory lengthening of the vowel, as in /poɻut̪u/ realized as [poːd̪ɯ]. Word-final glides, mainly /j/, are generally deleted, e.g. rūpāy > rūbā, unless the word is monosyllabic, in which case the glide is doubled, e.g. cey > seyyi. Word-final l and ḷ in polysyllabic words are deleted, especially in pronouns, but reappear when a suffix is added, e.g. nīṅkaḷ > nīṅga, similar to French liaison.
 * 7) In literary Tamil, l and ḷ are assimilated to ṟ, ṭ before plosives and to ṉ, ṇ before nasals, e.g. vil, kēḷ, nal, veḷ > viṟka, kēṭka, naṉmai, veṇmai; in spoken Tamil the lateral is deleted and the next consonant is doubled. Before coronal stops, the stop assimilates to the lateral's place of articulation and the lateral is deleted, as in the past tense forms of verbs ending in l and ḷ, e.g. kol-ntu, koḷ-ntu, vil-ttu > koṉṟu, koṇṭu, viṟṟu.
 * 8) As in Proto-Dravidian, literary Tamil words cannot begin with an alveolar or retroflex consonant, but in spoken Tamil some words begin with r and l because the initial vowel has been deleted, e.g. literary Tamil iraṇṭu, spoken Tamil raṇḍu~reṇḍu. In loanwords, a short i, u or a is added before them, e.g. Skt. loka, Tamil ulakam.
 * 9) In loanwords, the voiced and aspirated plosives are all loaned as the plain plosive. Of the fricatives, h is loaned as k, the sibilants as c, ts as cc and kṣ as ṭc; sometimes s and ṣ are loaned as t and ṭ as in mātam, varuṭam, ilaṭcam (Skt. māsa, varṣa, lakṣa) and kṣ as k(k) or c(c) as in kēttiram~cēttiram, piccai from Sanskrit kṣétra, bhikṣā.
 * 10) Words can only end with /m, n, ɳ, l, ɭ, ɾ, ɻ, ʋ, j/ in literary Tamil.
 * 11) Natively, the only allowed clusters are C:, PP, NP, RP, RP: and RNP (where P = plosive, N = nasal, R = liquid, : = gemination, C = any consonant); the most common are C: and NP. Any heterorganic cluster indicates a morpheme boundary. Other clusters occurring in loanwords are split by vowels or simplified, e.g. varṣa > varuṭam.
 * 12) In Sri Lankan Tamil, word-initial voiceless plosives may be aspirated; intervocalic /k/ may be [x~ɣ~h]; word-final nasals are always preserved; /t͡ʃ/ tends to be an affricate, /ʃ/ or /s/ initially, depending on the dialect and speaker; and <ṟṟ, ṉṟ> may be [tt~t̪t̪, nd~ɳɖ].
 * 13) Kongu Tamil has word final /ŋ/ as word final /nkV/ becomes /ŋ/, e.g. literary Tamil vāṅka, Kongu Tamil vāṅ.
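The cluster mergers described in note 3 can be sketched as a single pass over a phoneme list. This is a minimal illustration under stated assumptions: the function name and input format are hypothetical, only the /rr/ and /nr/ mergers are modelled, and the final-vowel unrounding and the onglide seen in the note's phonetic forms are left out.

```python
DENTAL_T = "t\u032a"  # dental stop, t with combining bridge below


def merge_clusters(phonemes):
    """Apply the spoken-Tamil mergers of /rr/ and /nr/ to a phoneme list."""
    phonemes = list(phonemes)
    out = []
    i = 0
    while i < len(phonemes):
        pair = phonemes[i:i + 2]
        if pair == ["r", "r"]:
            out += [DENTAL_T, DENTAL_T]  # /rr/ merges with the dental geminate
            i += 2
        elif pair == ["n", "r"]:
            after_long_vowel = bool(out) and out[-1].endswith("ː")
            # Single ɳ after a long vowel, geminate ɳɳ after a short one.
            out += ["ɳ"] if after_long_vowel else ["ɳ", "ɳ"]
            i += 2
        else:
            out.append(phonemes[i])
            i += 1
    return out


for word in (["p", "a", "r", "r", "i"], ["o", "n", "r", "u"], ["m", "uː", "n", "r", "u"]):
    print("".join(merge_clusters(word)))
```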

The voiceless consonants are voiced in different positions.

In modern Tamil, however, voiced plosives occur initially in loanwords. Geminate stops get simplified to singleton unvoiced stops after long vowels, suggesting the primary cue is now voicing (cf. kūṭṭam-kūṭam becoming kūṭam-kūḍam in modern speakers). Altogether, we see a shift in progress towards phonemic voicing, more advanced in some dialects than others.

Historically, [j] was a possible allophone of medial -c-; the words with [j] have since solidified. Compare Kannada, which had only [s] as the medial allophone: Tamil ñāyiṟu, Kannada nēsaru. In some cases both variants remained, as in ucir, uyir. There are also cases where the opposite happened through hypercorrection, e.g. Tamil kayiṟu, Madurai Tamil kacaru, kacuru, kaciru, even though the word did not originally have a -c-. There are also cases where it became t, as in mutalai/mutaḷai/mucali (Kannada mosaḷe), and cases where it disappeared after lengthening the preceding vowel, as in nilā (Kodava nelaci).

Āytam
Old Tamil had a phoneme called the āytam, which was written as ‘ஃ'. Tamil grammarians of the time classified it as a dependent phoneme (or restricted phoneme). The rules of pronunciation given in the Tolkāppiyam, a text on the grammar of Old Tamil, say that the āytam patterned with semivowels and occurred after a short vowel and before a stop; it either lengthened the previous vowel, geminated the stop, or was lost if the following segment was phonetically voiced in the environment. It is said to be the descendant of the Proto-Dravidian laryngeal *H. In modern Tamil, the āytam is used to transcribe foreign phones, as in ஃப் (ஃp) for [f], ஃஜ (ஃj) for [z], ஃஸ (ஃs) for [z, ʒ] and ஃக (ஃk) for [x], similar to a nuqta.

Overview
Unlike most Indic scripts, the Tamil script does not have distinct letters for aspirated consonants, which are found only as allophones of the plain stops. The script also lacks distinct letters for voiced and unvoiced stops, as their pronunciation depends on their position in a word. For example, the voiceless stop [p] occurs at the beginning of words, while the voiced stop [b] cannot. In the middle of words, voiceless stops commonly occur as a geminated pair such as -pp-, while voiced stops do not. Medially and after a corresponding nasal, only voiced stops can appear. Thus both voiced and voiceless stops can be represented by the same letter without ambiguity, the script denoting only the place and broad manner of articulation (stop, nasal, etc.). The Tolkāppiyam cites detailed rules as to when a letter is to be pronounced with voice and when it is to be pronounced unvoiced. The only exceptions to these rules are the letters ச and ற, which are pronounced medially as [s] and [r] respectively.

Some loan words are pronounced in Tamil as they were in the source language, even if this means that consonants which should be unvoiced according to the Tolkāppiyam are voiced.

Elision
Elision is the reduction in the duration of sound of a phoneme when preceded by or followed by certain other sounds. There are well-defined rules for elision in Tamil. They are categorised into different classes based on the phoneme which undergoes elision.

1. Kuṟṟiyal ukaram refers to the vowel /u/ turning into the close back unrounded vowel [ɯ] at the end of words (e.g. ‘ஆறு’, meaning ‘six’, is pronounced [aːrɯ]).

2. Kuṟṟiyal ikaram refers to the shortening of the vowel /i/ before the consonant /j/.

Consonants

 * 1) Deletion of initial y, e.g. PD. *yĀṯu, Ta. āṟu "river". Initial y was preserved in a few words in Old Tamil, and in even fewer in Modern Tamil.
 * 2) Deletion of c initially through c > s > h > ∅ e.g. PD. *cōṭam > Ta. ōṭam "boat", loaned into Sanskrit as hoḍa. It is an ongoing process in some Gondi dialects.
 * 3) Palatalization of k to c before non-back vowels, provided the following consonant is not retroflex, e.g. PD. *kewi, Ta. cevi "ear" but Ta. kēḷ "listen". A following alveolar consonant sometimes blocks it too: PD. kil-, kīṯ-, kila, Ta. kilukku, kīṟu, cila. Exception: Tamil ceṭi, Toda kïḍf, Kannada giḍa, giḍu.
 * 4) Loss of the laryngeal H, e.g. PD. ∗puH-, Ta. pū "flower". It survived into Old Tamil in a few words as a restricted phoneme called Āytam. According to the Tolkāppiyam, in Old Tamil it patterned with semivowels and occurred after a short vowel and before a stop; it either lengthened the previous vowel, geminated the stop, or was lost if the following segment was phonetically voiced in the environment.
 * 5) The singular ṯ became a trill ṟ, doubled ṯṯ became ṟṟ [t:r] and ṉṯ became ṉṟ [n(d)r], e.g. PD. *cāṯu, Ta. āṟu "six". However in Sri Lankan Tamil dialects ṯṯ and ṉṯ are preserved in their original forms. For example, SLTa. paṟṟi "about" is pronounced paṯṯi and SLTa. eṉṟa "mine" is pronounced eṉḏa (postnasal voicing of ṯ in ṉṯ).
 * 6) Many of the ñ- became n- e.g. PD. *ñaṇṭ- Ta. naṇṭu "crab".
 * 7) m and v alternate in some words, e.g. māṉam ~ vāṉam. The word "Dravidian", from Sanskrit drāviḍa, is said to derive from the word tamiḻ, with the m becoming v.
 * 8) k alternates with v in some words, e.g. makaṉ ~ mavaṉ.

Vowels

 * 1) Neutralization of ā̆ and ē̆ after y; this also happens, to a lesser extent, after ñ- and c-.
 * 2) In Proto-South-Dravidian, short i, u were lowered to e, o when followed by a short consonant and a short a, e.g. PD. *iṯaycci, PSD. *eṯaycci. This change was reverted in Proto-Tamil, where e, o were raised to i, u, e.g. PSD. *eṯaycci, Ta. iṟaicci. In colloquial Tamil it was reverted again, with i, u being lowered: LT. iṟaicci, ST. erecci (< eraicci).
 * 3) Some V₁wV₂, V₁kV₂ and V₁yV₂ alternate with V̄₁ e.g. *tokal > tukal ~ tōl, *mical/*miyal > mēl, *peyar > peyar ~ pēr.

Sample text
The following text is Article 1 of the Universal Declaration of Human Rights.

English
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Tamil
மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர்; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும்.

Romanisation (ISO 15919)
maṉitap piṟiviyiṉar cakalarum cutantiramākavē piṟakkiṉṟaṉar; avarkaḷ matippilum, urimaikaḷilum camamāṉavarkaḷ, avarkaḷ niyāyattaiyum maṉaccāṭciyaiyum iyaṟpaṇpākap peṟṟavarkaḷ. Avarkaḷ oruvaruṭaṉoruvar cakōtara uṇarvup pāṅkil naṭantukoḷḷal vēṇṭum.

IPA
/manit̪ap‿piriʋijinaɾ sakalaɾum sut̪ant̪iɾamaːkaʋeː pirakkinranaɾ ǀ aʋaɾkaɭ mat̪ippilum uɾimai̯kaɭilum samamaːnaʋaɾkaɭ aʋaɾkaɭ nijaːjat̪t̪ai̯jum manat͡ʃt͡ʃaːʈt͡ʃijai̯jum ijarpaɳpaːkap‿petːraʋaɾkaɭ ǁ aʋaɾkaɭ oɾuʋaɾuʈanoɾuʋaɾ sakoːt̪aɾa uɳaɾʋup‿paːnkil naʈant̪ukoɭɭal ʋeːɳʈum/