User:Pelagic/sandbox/Extended Arabic characters

Here I'm trying to make sense of the various derived Arabic characters. If there is a lean towards Indo-Aryan languages here, it's because they have a lot of extra consonants that aren't present in the Semitic languages or English.

Information is mostly taken from the relevant Wikipedia articles:
 * Arabic script, Arabic alphabet
 * Persian alphabet
 * Urdu alphabet
 * Sindhi language
 * Pashto

Horizontal list format
Arabic (28 letters plus hamza): ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي

Persian adds 4 (32 total): پ چ ژ گ (pe, che, zhe, gaf). But 10 of the original 28 are used only in Arabic loanwords: ث ح ذ ص ض ط ظ ع غ ق. Of these, se, he-ye jimi, zaal, saad, zaad, taa, zaa are homophonous with sin, he-ye do-cheshm, ze, sin, ze, te, ze respectively; the sounds for eyn, gheyn, qaaf are not common.
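To pin down exactly which characters are meant, the four Persian additions can be looked up in Python's standard `unicodedata` module. This is only a rough sketch: the helper name `describe` is my own, and the letters are written as escapes so the code points are unambiguous. Note that the Unicode names do not always match the Persian ones (zhe is listed as JEH).

```python
import unicodedata

# The four letters Persian adds to the Arabic set: pe, che, zhe, gaf.
# Written as escapes so the exact code points are visible.
PERSIAN_ADDITIONS = "\u067E\u0686\u0698\u06AF"  # پ چ ژ گ


def describe(letters):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in letters]


for code_point, name in describe(PERSIAN_ADDITIONS):
    print(code_point, name)
```

The same helper can be pointed at any of the other letter groups on this page.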

Urdu and Shahmukhi add 5: three retroflexes (ٹ ڈ ڑ), one letter for aspiration (ھ do chashmi he, treated as separate from ہ bari he), and a long y for additional vowel sounds at the end of words (ے bari ye, as opposed to ی choti ye). This brings the total repertoire to 37; if hamza (ء) and/or nuun ghunna are counted as separate letters, the total is 38 or 39.
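The running totals can be checked with a little arithmetic. The sketch below just restates the counts quoted above (they are not independently verified), and the function name `urdu_total` and its flags are made up for illustration.

```python
# Letter counts as quoted in the text above.
ARABIC = 28
PERSIAN = ARABIC + 4        # pe, che, zhe, gaf
URDU = PERSIAN + 3 + 1 + 1  # 3 retroflexes, do chashmi he, bari ye


def urdu_total(count_hamza=False, count_nun_ghunna=False):
    """Urdu/Shahmukhi total under the chosen counting convention."""
    return URDU + int(count_hamza) + int(count_nun_ghunna)


print(PERSIAN, urdu_total(), urdu_total(True), urdu_total(True, True))
```

Counting only one of the two optional letters gives 38 either way; counting both gives 39.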

Saraiki adds 4 implosives ٻ ڄ ݙ ڳ and a retroflex nasal ݨ. Sindhi uses the same two-dotted forms for implosive b, j, g, but its implosive d is not retroflex and is written with three dots above: ڏ. The Sindhi retroflex n is undotted (ڻ), and there are two additional nasals, ڃ ڱ.
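The overlap between the Saraiki and Sindhi implosives can be sketched as a comparison of two small tables. The code points are my reading of the glyphs above, and the keys (the plain consonant each implosive derives from) and the helper `shared_forms` are only illustrative.

```python
# Implosive letters per language, keyed by the plain consonant they derive from.
SARAIKI_IMPLOSIVES = {
    "b": "\u067B",  # ٻ
    "j": "\u0684",  # ڄ
    "d": "\u0759",  # ݙ
    "g": "\u06B3",  # ڳ
}
SINDHI_IMPLOSIVES = {
    "b": "\u067B",  # ٻ
    "j": "\u0684",  # ڄ
    "d": "\u068F",  # ڏ
    "g": "\u06B3",  # ڳ
}


def shared_forms(a, b):
    """Keys whose written form is identical in both tables."""
    return sorted(k for k in a if k in b and a[k] == b[k])


shared = shared_forms(SARAIKI_IMPLOSIVES, SINDHI_IMPLOSIVES)
differing = sorted(set(SARAIKI_IMPLOSIVES) - set(shared))
print("shared:", shared, "differing:", differing)
```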

Where Urdu and Shahmukhi have ten digraphs for aspirates, Sindhi has three digraphs and eight unique forms. There are four letters with four dots (ڀ ٿ ڇ ڦ for bʱ, tʰ, cʰ, pʰ), three with two dots (ٺ ڌ ڍ for ʈʰ, dʱ, ɖʱ), a special form of k (ڪ for k and ک for kʰ), and three digraphs (جھ ڙھ گھ for ɟʱ, ɽʱ, ɡʱ). Sindhi is also more variable in the forms of its retroflex consonants. Where Urdu consistently uses a small to'e (ٹ ڈ ڑ for ʈ, ɖ, ɽ), Sindhi has differing numbers of dots (ٽ ڊ ڙ for ʈ, ɖ, ɽ). Retroflex s (ष, ʂ) does not appear to be represented in the Arabic Sindhi script.
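The split between unique forms and digraphs can be checked mechanically by counting code points per aspirate. This is a rough sketch with some assumptions: the keys are informal romanizations of my own (capitals mark retroflexes), the code points are my reading of the glyphs above, and the aspiration mark in the digraphs is taken to be U+06BE (do chashmi he).

```python
# Sindhi aspirates as listed above: romanized label -> written form.
SINDHI_ASPIRATES = {
    "bh": "\u0680",        # ڀ
    "th": "\u067F",        # ٿ
    "ch": "\u0687",        # ڇ
    "ph": "\u06A6",        # ڦ
    "Th": "\u067A",        # ٺ
    "dh": "\u068C",        # ڌ
    "Dh": "\u068D",        # ڍ
    "kh": "\u06A9",        # ک
    "jh": "\u062C\u06BE",  # جھ
    "Rh": "\u0699\u06BE",  # ڙھ
    "gh": "\u06AF\u06BE",  # گھ
}

# A unique form is one code point; a digraph is base letter plus U+06BE.
unique_forms = [k for k, v in SINDHI_ASPIRATES.items() if len(v) == 1]
digraphs = [k for k, v in SINDHI_ASPIRATES.items() if len(v) == 2]

print(len(unique_forms), "unique forms:", unique_forms)
print(len(digraphs), "digraphs:", digraphs)
```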

Pashto has 8 extra consonants and 4 vowels compared to Persian, for a total of 44 letters. Like Saraiki it has 4 retroflexes (t d r n), but uses a small attached ring below (ټ ډ ړ ڼ) rather than the small to'e of Saraiki, Urdu, and Shahmukhi (ٹ ڈ ڑ). Persian's p, ch, zh have the same form in Pashto, but g is formed with a ring instead of a line: ګ. There are two additional affricates څ ځ (/t͡s/ /d͡z/) and two extra fricatives ږ ښ. The added vowels are variants of ي ye: ې ی ۍ ئ.
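The ring that marks the Pashto retroflexes and g also shows up in the Unicode character names, so it can be checked with Python's standard `unicodedata` module. This is a sketch only: the helper `has_ring` is my own, the code points are my reading of the glyphs above, and the total simply restates the counts quoted in the text.

```python
import unicodedata

PASHTO_RETROFLEXES = "\u067C\u0689\u0693\u06BC"  # ټ ډ ړ ڼ (t d r n)
PASHTO_GAF = "\u06AB"                            # ګ


def has_ring(ch):
    """True if the character's Unicode name describes a ring."""
    return "WITH RING" in unicodedata.name(ch)


ringed = [ch for ch in PASHTO_RETROFLEXES + PASHTO_GAF if has_ring(ch)]
for ch in ringed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Persian's 32 letters plus 8 consonants and 4 vowels, as quoted above.
PASHTO_TOTAL = 32 + 8 + 4
print(PASHTO_TOTAL)
```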

Horizontal table format
Here I'm grouping them by type, rather than having a row for each base letter.