Talk:ISO 11940

Transliteration
Good thing there is now some info on this standard. Since ISO charges high amounts for their standards, I never had the opportunity to see it. It seems to be a transliteration in the narrow sense, from which the original spelling can be reconstructed. Apart from that, it seems much stricter than, but quite compatible with, RTGS. Perhaps it would be useful to add some comments comparing the two.

I have never seen this standard used in practice anywhere. Do you know of any instances?

Small note: perhaps you missed a diacritic on one of the "y"s.

Do you have any information on the vowels? −Woodstone 14:41, 8 April 2007 (UTC)


 * I have no idea what I am doing here... As you say, the standard is not in wide use, and I had a hard time googling it. All that turns up are sites trying to sell you the standard for 56 dollars or so. I ask you! My only source for this is the xml file at unicode. It contains more information, and it conscientiously tells us when it is deviating from ISO. As it is, this is all unverified. From what I found elsewhere on the net, yes, this is definitely intended as a standard allowing the reconstruction of the original spelling from the romanization (unlike RTGS). dab (𒁳) 23:29, 9 April 2007 (UTC)

Thanks for finding this file. Very informative. I see you started on the vowel section. I guess that the vowels are written in the same sequence as in Thai? Relative to the consonants, vowels are not spoken in the order in which they are written in Thai script, and many vowel sounds are written with several symbols, located before, after, below and above the consonant (cluster) that precedes them in pronunciation. So deciphering any text from this transliteration will be a challenge. Finding the syllable boundaries will be a nightmare. −Woodstone 09:56, 10 April 2007 (UTC)
 * no, I am sure the vowels in transliteration will be in phonological order. The xml file has some special rules for this afaics. Anything else would be insane, and without precedent in ISO 15919. It's p̣hās̄ʹāthịy, not p̣hās̄ʹāịthy. dab (𒁳) 13:29, 10 April 2007 (UTC)
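To make the difference between the two orderings concrete, here is a minimal Python sketch. It relies on the fact that Unicode stores Thai preposed vowels (เ แ โ ใ ไ) before the consonant they follow in speech. The mapping table is a hypothetical subset containing only the letters of ภาษาไทย, with the values from the example above (which follow the Unicode XML file, so ษ gets a prime rather than whatever the standard itself prescribes). The reordering step is deliberately naive: it moves a preposed vowel past exactly one consonant.

```python
import unicodedata

# Hypothetical subset of the mapping, covering only the letters of ภาษาไทย.
# Values follow the example above (ICU/CLDR-style prime on ษ).
TRANSLIT = {
    "ภ": "p\u0323h",       # p̣h
    "า": "\u0101",         # ā
    "ษ": "s\u0304\u02b9",  # s̄ʹ
    "ไ": "i\u0323",        # ị
    "ท": "th",
    "ย": "y",
}

# Preposed vowels: stored (and written) before the consonant they follow in speech.
PREPOSED = set("\u0E40\u0E41\u0E42\u0E43\u0E44")  # เ แ โ ใ ไ


def transliterate(thai: str, phonological: bool = False) -> str:
    """Map character by character; optionally move preposed vowels after one consonant."""
    chars = list(thai)
    if phonological:
        reordered = []
        i = 0
        while i < len(chars):
            if chars[i] in PREPOSED and i + 1 < len(chars):
                # Naive assumption: the vowel belongs to a single following consonant,
                # so a cluster such as หม in ใหม่ would be split in the wrong place.
                reordered.extend([chars[i + 1], chars[i]])
                i += 2
            else:
                reordered.append(chars[i])
                i += 1
        chars = reordered
    return unicodedata.normalize("NFC", "".join(TRANSLIT[c] for c in chars))


print(transliterate("ภาษาไทย"))                     # spelling order: p̣hās̄ʹāịthy
print(transliterate("ภาษาไทย", phonological=True))  # phonological order: p̣hās̄ʹāthịy
```

The hard part of phonological ordering is deciding how many consonants the vowel should jump over, which this sketch sidesteps entirely.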

I hope you guys find my additions helpful. The 'xml file' (ICU or CLDR to me) has several divergences from the standard - that's what impelled me to buy a copy of the standard. (I'm now working on a bug report against it.) It's based on someone's documentation of the system, not on the standard itself. I added the example of Chiangmai, because that looks bizarre whichever option you choose for vowel reordering.

I'd've liked to have compared the transliterations of ชุมนุม spelt with nikkhahit, but Windows XP doesn't render the sequence . The two implementations give ch̊un̊u and chůnů respectively!

You may wonder why I mentioned dandas. The reason is that the Unicode Technical Committee, or at least its senior members, have agreed that using dandas in Romanised text is perfectly reasonable. The dandas to use are the ones named as Devanagari - the Unicode Character Database assigns them to the script 'Common', i.e. used in multiple scripts.

The table for 'other diacritic marks' looks a mess. Perhaps we need a bearer - น/n probably has fewest typographical problems. RichardW57 20:41, 16 May 2007 (UTC)


 * Good work. I've always wondered why ISO tries to keep their standards secret by charging high amounts for them. As I guessed above, the vowels are in spelling order (not phonetic), making the transliterations quite difficult to read. In the vowel section the อ is missing: does it re-use the symbol x given for the silent consonant? The other weird choice is v for ฤ, mentioned both as consonant and as vowel. Also quaint is keeping h as the class indicator. The only deviation from RTGS in consonants, apart from the diacritics, seems to be c for จ (RTGS has ch). I need more time to let it sink in. As a bearer for some vowels and diacritics, we have used "–" (en dash) in other Thai spelling articles. −Woodstone 22:54, 16 May 2007 (UTC)

I didn't follow the standard's partitioning of consonants and vowels. It classifies them into the exclusive categories of the 46 consonants (i.e. including ฤ and ฦ), the 9 free-standing vowels (including sara am and lakkhangyao), 3 'vowels' below the line (sara u, sara uu and phinthu), 7 vowels above the line (including maitaikhu and nikkhahit, but not yamakkan), the 5 marks that can appear above others (tone marks and thanthakhat/karan), the digits, and the 'special markers'. The latter are all punctuation except yamakkan, which I think should have been included in the vowels above the line. I thought it more useful to put ฤ and ฦ in both the consonant and the vowel lists, and less misleading to only show lakkhangyao in the combinations ฟๅ and ฦๅ. However, it wouldn't surprise me if some minority language had an orthography in which lakkhangyao is a proper vowel. There's one language which uses karan as a sort of tone mark!
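For anyone wanting to experiment, the partition described above can be approximated by code-point sets in the Unicode Thai block. This is a minimal Python sketch of that reading of the comment, not a copy of the standard's tables; in particular, the membership of the 'special marker' set is an assumption, since only yamakkan is named.

```python
# Sketch of the standard's character partition as described above, expressed
# as sets of Unicode Thai-block code points (an interpretation, not the standard's table).
CATEGORIES = {
    # ก (U+0E01) .. ฮ (U+0E2E): 46 letters, including ฤ and ฦ
    "consonant": {chr(cp) for cp in range(0x0E01, 0x0E2F)},
    # ะ า ำ เ แ โ ใ ไ ๅ: includes sara am and lakkhangyao
    "free-standing vowel": set("\u0E30\u0E32\u0E33\u0E40\u0E41\u0E42\u0E43\u0E44\u0E45"),
    # sara u, sara uu, phinthu
    "vowel below": set("\u0E38\u0E39\u0E3A"),
    # mai han-akat, sara i, ii, ue, uee, maitaikhu, nikkhahit
    "vowel above": set("\u0E31\u0E34\u0E35\u0E36\u0E37\u0E47\u0E4D"),
    # four tone marks plus thanthakhat/karan
    "mark above others": set("\u0E48\u0E49\u0E4A\u0E4B\u0E4C"),
    # Thai digits ๐ .. ๙
    "digit": {chr(cp) for cp in range(0x0E50, 0x0E5A)},
    # ASSUMED membership: ฯ ๆ yamakkan ๏ ๚ ๛
    "special marker": set("\u0E2F\u0E46\u0E4E\u0E4F\u0E5A\u0E5B"),
}


def classify(ch: str):
    """Return the single category containing ch, or None if it is in none of them."""
    for name, members in CATEGORIES.items():
        if ch in members:
            return name
    return None


print(classify("ฤ"), "/", classify("ๅ"))
```

A single-valued lookup like `classify` cannot express the choice made here of listing ฤ and ฦ under both consonants and vowels; that would need a mapping from each character to a set of categories.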

Mechanically identifying the function of อ as consonant, vowel or class indicator is not easy. Although it is a vowel in the word ผงอบ /phàʔ ŋɔ̀ːp/ 'extremely weak', it's not obvious why one can't translate 'baking powder' as ผงอบ /phǒŋ ʔòp/. I've always wondered about the old spelling 'Yuthia' of 'Ayutthaya'. Was อ being misinterpreted as a class indicator? The consonant combination อย used to be much commoner, and represented a pre-glottalised palatal glide (source: Fang-Kuei Li's 'A Handbook of Comparative Tai'). Similarly, there is no reason to doubt that the class indicator ห was once a full consonant, as in Old English hnutu 'nut' or hlud 'loud'. Again, the difference cannot always be detected - there are two words แหน, pronounced /nɛ̌ː/ and /hɛ̌ːn/ respectively!

(preceding unsigned edit is marked 2007-05-18T00:07:54 RichardW57)

I suspect the choice of v for ฤ is inspired by the Thai cursive (Roman) 'r'. The latter can look rather like 'seagull' (e.g. U+033C), possibly following the example of cursive kho khuat, which can look like a cross between small gamma and ram's horns (ɤ). I don't know how to verify this idea. RichardW57 18:39, 18 May 2007 (UTC)

For the transliterations of ศ and ษ, see extract for ว ศ and ษ.

How do you feel about displaying the consonants with the extended vargas lined up? I'm tempted to copy the sibilants to their rightful places, thus:

Having one varga (วรรค) per line would make the table too tall. RichardW57 09:35, 20 May 2007 (UTC)

Tabular form
I have found the following layout of Thai consonants useful to see patterns. I think it would show the principles of the ISO system quite well, but I have no time now to work them in. They could replace (or be added to) the RTGS columns.

−Woodstone 13:28, 20 May 2007 (UTC)

I think you're looking for something like this: Including a glottal order might actually be unhelpful. Note that this table doesn't include ฤ and ฦ. RichardW57 23:37, 20 May 2007 (UTC)

horn?
Hi. I'm rather confused by the horn diacritic used for ฅ ↔ k̛h, ฒ ↔ t̛h, and ษ ↔ s̛̄. This is rather strange, as to my knowledge the horn U+031B is used in Vietnamese (ơ and ư, and their accented forms), in ALA-LC romanization of Lao (ư ư̄) and Thai (ư ư̄). I would expect the letter prime ʹ U+02B9 as in the Unicode CLDR XML file, or perhaps the letter apostrophe ʼ U+02BC (that's what it looks like in the UNGEGN PDF file and book – the HTML page uses the apostrophe ’ U+2019, the punctuation "equivalent" of letter apostrophe ʼ U+02BC). Here's what they look like: --Moyogo/ (talk) 13:46, 19 December 2011 (UTC)
 * horn : ฅ ↔ k̛h, ฒ ↔ t̛h, and ษ ↔ s̛̄
 * letter prime : ฅ ↔ kʹh, ฒ ↔ tʹh, and ษ ↔ s̄ʹ
 * letter apostrophe : ฅ ↔ kʼh, ฒ ↔ tʼh, and ษ ↔ s̄ʼ
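The choice matters for encoding, not just appearance: the horn is a combining mark with its own canonical combining class, whereas the prime and the apostrophes are spacing characters. A minimal Python check with the standard library's `unicodedata` module prints the relevant properties; the candidate list is simply the characters named in this thread, including U+0315, which comes up further down.

```python
import unicodedata

# Candidate marks from this thread; U+0315 is the "raised comma" alternative
# mentioned later in the discussion.
CANDIDATES = ["\u031B", "\u02B9", "\u02BC", "\u2019", "\u0315"]


def describe(ch: str) -> tuple[str, str, int]:
    """Return (Unicode name, general category, canonical combining class)."""
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch)


for ch in CANDIDATES:
    name, category, ccc = describe(ch)
    print(f"U+{ord(ch):04X}  {name:<28} {category}  ccc={ccc}")
```

Only U+031B and U+0315 come out as combining marks (category Mn). Because U+02B9 has combining class 0, s̄ʹ keeps its typed order under normalization, while a horn (class 216) is reordered ahead of a macron (class 230).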

U+031B is indeed what the standard lays down - see the extract at ว ศ and ษ. I don't doubt that the occasional use of ơ and ư for Thai led to the use of horn as second preference diacritic.

The ICU transliteration (available at http://demo.icu-project.org/icu-bin/translit) indeed transliterates เชียงใหม่ to cheīyngh̄ım̀ and not to cheīyngh̄m̀ı. Clusters in Thai cannot be identified with 100% reliability. RichardW57 (talk) 23:18, 29 March 2012 (UTC)
 * The sequence s̛̄ (U+0073 U+0304 U+031B) from  is rather odd. Its normalized form is s̛̄ (U+0073 U+031B U+0304), which makes me wonder why a non-normalized character sequence would be in a standard. --Moyogo/ (talk) 07:12, 31 March 2012 (UTC)


 * The real answer is probably that the authors didn't fully understand normalisation. Also, the standard seems to have sat around for 5 years before being approved in 2003, implying that there were severe doubts as to its usefulness.  Besides, Clause 5.2 requires that the macron be 'typed' before the dot below or horn, and ISO 10646 does not require that equivalent sequences be treated identically.  Finally, phinthu is transliterated to U+0325 COMBINING RING BELOW, so ษฺ is transliterated and normalised to s̛̥̄ U+0073 U+031B U+0325 U+0304 - normalisation makes automated back-transliteration tricky! − RichardW57 (talk) 12:14, 1 April 2012 (UTC)
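The reordering follows from canonical combining classes: U+031B has class 216, U+0325 has 220 and U+0304 has 230, so normalization sorts the marks into that order regardless of typing order. Below is a minimal Python sketch; the one-entry `REVERSE` table and the `back` helper are hypothetical, included only to show why a back-transliterator has to normalize both its keys and its input. It also assumes that phinthu's ring below is simply appended after the consonant's marks.

```python
import unicodedata

SHO_RUSI = "s\u0304\u031B"              # ษ as typed per Clause 5.2: macron, then horn
SHO_RUSI_PHINTHU = "s\u0304\u031B\u0325"  # ษฺ: assumed to append U+0325 COMBINING RING BELOW


def code_points(text: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in text)


for seq in (SHO_RUSI, SHO_RUSI_PHINTHU):
    nfc = unicodedata.normalize("NFC", seq)
    print(code_points(seq), "->", code_points(nfc))

# Hypothetical one-entry reverse table. Normalizing the keys once, and every
# input before lookup, makes the typed order irrelevant.
REVERSE = {unicodedata.normalize("NFC", SHO_RUSI): "\u0E29"}  # -> ษ


def back(latin: str):
    """Return the Thai letter for a romanized sequence, or None if unknown."""
    return REVERSE.get(unicodedata.normalize("NFC", latin))
```

Without the normalization in `back`, an input that had passed through any normalizing software (s̛̄, horn first) would fail to match a key typed in the standard's macron-first order.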

It's clearly a combining horn - which bit of character code 031B don't you understand? And if it were a raised comma, surely it would be U+0315 COMBINING COMMA ABOVE RIGHT, not U+02B9 PRIME, which is what you've recorrupted the table to contain. RichardW57 (talk) 20:58, 24 March 2021 (UTC)

And if you read an early version of the standard, at https://unstats.un.org/unsd/geoinfo/UNGEGN/docs/6th-uncsgn-docs/e_conf_85_L80.pdf, you'll see that the authors think that the diacritic they are using is a combining horn - see in particular paragraph A7 on p14. RichardW57 (talk) 21:45, 24 March 2021 (UTC)

And there is a typographical tradition of approximating Vietnamese combining horns by apostrophes. RichardW57 (talk) 21:47, 24 March 2021 (UTC)


 * In my PDF copy of the standard part 2, it does not look at all like the horns you show, but like a clearly separated upper comma. −Woodstone (talk) 08:35, 26 March 2021 (UTC)


 * But Part 1 says that the character code 031B is involved in the transliteration! I'd like to use an axiom of competence for interpreting the standard, but I fear that may be inconsistent with the contents.  It seems I'm going to have to fork out for Part 2.  It may be that we will have to add a section on Part 2's misinterpretation of Part 1!  'TiT', I suppose. ISO 11940:1998 remains current (I think as of 2019), so I believe I don't need a new copy of it. RichardW57m (talk) 13:58, 26 March 2021 (UTC)