Wikipedia talk:Pronunciation (simple guide to markup, American)

Please leave all commentary and suggestions in the space below the introduction, in Further discussion.

From VfD
See Votes for deletion/Pronunciation (simple guide to markup, American).

Note: The pages were moved via a copy-and-paste move. I've merged them; however, to see the old talk page, see Wikipedia talk:Pronunciation (simple guide to markup, American)/oldtalk. - Ta bu shi da yu 15:11, 6 Nov 2004 (UTC)

Nohat's take
I think there are two questions here:
 * 1) Should Wikipedia adopt an official or even semi-official system for marking the pronunciation of English words as a supplement or alternative to IPA?
 * 2) If so, should that system be the one proposed here?

I think the answer to both questions is "no".

It is true that IPA is a somewhat abstruse system and in its goal to be suitable for the phonetic representation of any language, it has understandably become nonideal for the phonetic representation of any one language. Nevertheless, it is an international standard method for representing phonetics in a way that is not dependent upon any knowledge of any language described. As Graham has astutely noted, it is defined purely in terms of vocal tract configurations. This means the IPA can be used to describe the phonetics of any known (spoken) language. Not only that, it has a long and colorful history, dating back to the 19th century, and is used around the world by linguists as their primary tool for describing the phonetics of languages. IPA is the clear choice for representing pronunciations in any scholarly work, which is something Wikipedia strives to be.

However, there is a gap between the number of people who know about pronouncing English and the number of people who know how to use the IPA. Ideally, we would like people who know about pronouncing English to be able to contribute even if they don't already know IPA. So we do two things: encourage those people to learn enough IPA to be able to contribute using IPA, and allow them to contribute their knowledge using whatever other method they like.

The primary benefit of the wiki system is that anyone can edit. Editors shouldn't feel like they can't contribute if they don't know IPA. They can (and do) just enter pronunciation information however they want. Then, someone who knows IPA comes along and converts it to IPA. It's a great system and it works. For example, people add entries to pages containing pronunciation, either describing the pronunciation in prose or using the system in their dictionary. Sometimes they'll leave a note on a talk page that they don't know IPA and hope someone will fix it. When this happens, I or someone else will come along and change the pronunciation to IPA, sometimes adding SAMPA for those people whose browsers don't correctly display the Unicode IPA. Voilà! Someone who doesn't know IPA contributes to Wikipedia, nobody is forced to learn anything they don't want to, and we didn't have to come up with our own system for marking pronunciation. They might even learn a little IPA in the process.

If we devise a system that is as easy-to-use as possible, we might surmise that we'd have solved this problem. The system will be so easy to use that anyone can use it. The problem is that as you make a system easier to use, it becomes less useful on two fronts. First, as Graham has pointed out, a simple, intuitive system is only intuitive for some subset of users, and as you make it simpler and more intuitive, the number of people for whom it is simple and intuitive shrinks. For the rest, it's unintuitive and has to be learned. Second, as you eliminate the transcription system's ability to encode subtleties of the pronunciation, the system becomes useful in fewer and fewer instances, and some other system that can encode those subtleties has to be used.

If you compound this with the fact that any formalized system will have to be learned (it may be easier than IPA, but it still requires learning), you can come to no other conclusion than that such a system is of diminishing utility. Not only that, you make Wikipedia less accessible because readers will encounter two separate systems for marking pronunciations, one of which is entirely unique to Wikipedia.

So we should stick with IPA because we avoid all these problems. Further, there are already thousands of people around the world who know IPA and can contribute using IPA without learning anything new other than perhaps the method for entering IPA. When the English Wikipedia converts to UTF-8, we won't even have to enter IPA using numeric entities anymore; we can use the Unicode values, which can be entered using a tool for writing IPA, some of which are as simple as point and click. On the other hand, if we adopt some new system, nobody will already know it. Everyone will have to learn it if they want to contribute using that system. Why use a system that nobody knows when we could use a system that lots of people already know?
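To illustrate the numeric-entity point above, here is a minimal sketch in Python of what entering IPA without UTF-8 amounts to. The function names are hypothetical and this is not part of MediaWiki; it only shows that each non-ASCII IPA character becomes a decimal character reference, and that the step is reversible.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def from_numeric_entities(text: str) -> str:
    """Turn character references back into Unicode characters."""
    return html.unescape(text)


# U+0261 is the IPA "script g" and U+026A is the IPA small capital I.
ipa = "\u0261e\u026at"
encoded = to_numeric_entities(ipa)
print(encoded)
print(from_numeric_entities(encoded) == ipa)
```

Once pages are stored as UTF-8, the first step is unnecessary and the characters can be typed or pasted directly.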

As for the particular system suggested here, I think it suffers from a few problems besides those already mentioned:
 * 1) Every vowel sound is written using two letters. This makes transcriptions in this system unnecessarily long.
 * 2) Vowels with R are not as simple as they seem. Simply using an ordinary vowel symbol and adding R is often unintuitive because the presence of R dramatically changes the pronunciation of the preceding vowel.
 * 3) "uh" for schwa is confusing. So is conflating schwa and the vowel sound in the word cut, as they are two distinct sounds. (Merriam-Webster does this and I think the clarity of their pronunciations suffers because of it.)
 * 4) I think we need a separate symbol for the vowel sound of "her", which bears no phonetic relationship to either schwa or the vowel sound of cut. The sound of "fur" is not the sound of "fuh" plus R.
 * 5) t~h for eth seems extraordinarily arbitrary, and nothing in the other symbols seems to motivate it.
 * 6) There is no syllable separator symbol. Long strings of unstressed syllables are hard to visually parse, particularly when di- and trigraphs are used to represent single sounds.
 * 7) There is no way to mark secondary stress. Secondary stress is critical to the correct and unambiguous pronunciation of English words.

Overall I think this system suffers from not being informed by in-depth knowledge about how English phonology works. I could suggest particular ways to improve these flaws but I won't because I don't think we need a system like this and I don't think any number of improvements (save making it congruent with IPA) would make it satisfactory, and if it were congruent with IPA we might as well use IPA.

I think we should continue to use IPA and let people who don't know IPA either learn it, or cope with not knowing it by relying on those of us who do. The general approach on Wikipedia is to solve problems with a human solution first and create a technical solution only if that fails. I and others who know IPA will continue to be happy to help people put their pronunciations in IPA. Let's focus our efforts on making IPA as accessible as possible both to readers and to editors rather than divide our efforts into two competing systems.

Note about when to use pronunciations on Wikipedia: In general I have added information about pronunciation to two classes of articles. The first should be uncontroversial: articles that are about pronunciation in some important way, like list of words of disputed pronunciation and list of English homographs. The second is articles where I thought the pronunciation was non-obvious or otherwise interesting, like San Jose, Illinois and clitoris. I have been pretty tolerant of ad-hoc pronunciations on e.g. clitoris, and no one has complained about including IPA. In fact, I have written a patch for MediaWiki which can render IPA into multiple pronunciation schemes, including X-SAMPA, Kirshenbaum, and a modified version of the system used by Charles Harrington Elster in his Big Book of Beastly Mispronunciations. (Note that while I disagree with Elster on many of his conclusions (FLAK-sid for flaccid, indeed!), his system is one of the better ad-hoc systems I have encountered.) This system would require an editor to enter only IPA or X-SAMPA and the pronunciations would be rendered on the page using multiple pronunciation schemes that can be selected in user preferences. It will probably be a while before the patch gets into the code running on the Wikipedia proper.

I don't think every article needs a pronunciation; that's what Wiktionary is for. However, I do think that there will continue to be a need for perhaps large quantities of pronunciations on articles in the first class, and I think those pronunciations should match the already existing articles and use IPA. Nohat 10:32, 4 Nov 2004 (UTC)

Pnot's take
My criticisms:


 * Coverage: you seem to acknowledge that your scheme is incomplete, and that this is acceptable because you estimate that "999 words out of 1000 ordinary English words" or "99.99% of "standard" American English words" can be represented. I don't know how you arrived at these (somewhat different!) estimates, but even if one of them is correct:
 * The words most in need of pronunciation detail are those which are not standard or ordinary!
 * If we use your scheme just for "ordinary American" words, then we still need to use another scheme for ones that can't be represented. So users will have to learn a different encoding anyway. Presumably your initiation of an encoding for British English is supposed to address this... but then you'll end up with conflicting schemes for all kinds of English dialects, and that's before we start on foreign words and names! Even if any one of these schemes counts as "simple", they will form a huge and unwieldy aggregate. You won't be able to trust a pronunciation until you've looked up which "simple" encoding it uses.
 * Wikipedia is an encyclopaedia, not a dictionary. It contains many foreign words and names, which would not be representable in your system. Once again, we'll end up having to use the IPA for many things anyway, forcing people to learn multiple schemes.
 * IPA solves this neatly: one sound, one symbol, one system, no matter what the language.


 * Learning and intuitiveness:
 * At present, nobody but you knows your encoding. I estimate that tens of millions of people know the IPA, or at least those parts of it relevant to English. In my view that's a very strong argument against forcing them all to learn a non-standard scheme in addition.
 * You claim that it's intuitive, but I believe this is wishful thinking: every pronunciation encoding strives to be as intuitive as possible, but everyone's intuition differs. Without some kind of vast empirical study, we can't know what most people find intuitive, so it would be foolish to base any arguments on our subjective opinions.
 * Your encoding is defined in terms of words which the user must already know, so is of little use to non-native English speakers who don't know the defining words. There's no way for me to find out the pronunciation of "router" if I don't know the pronunciation of "flout", "noun" or "sound". In contrast, there are already IPA tables for virtually every language, allowing non-native speakers to look up a sound by comparison with their native tongue.


 * Ease of input:
 * True, keyboards do not contain many of the IPA symbols. But most computers are able to display them, and it will soon be possible to input them using some kind of character selector. This is slower than using a keyboard, but people aren't going to be typing whole articles in a phonetic alphabet: usually it will be used for one or two words in an article. (edit: as Nohat said, English Wikipedia isn't in UTF-8 yet so the numeric codes are required for the time being.)
 * For cases where the IPA really is impossible (an author using an ancient computer, perhaps) I would prefer the use of X-SAMPA, for the following reasons:
 * Like your scheme, X-SAMPA can be written using any keyboard and displayed on any computer.
 * X-SAMPA is designed to be as close as possible to IPA, making it easier for IPA users to learn. And there are a lot of IPA users.
 * There is a perfect mapping between X-SAMPA and IPA. So if someone writes a pronunciation in X-SAMPA, it can be easily and unambiguously turned into IPA by another editor, or by a program. (I would be happy to contribute my programming expertise to Wikipedia for the implementation of such a system -- should be fairly simple.) edit: looks like Nohat's way ahead of me here!
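The X-SAMPA-to-IPA conversion described above could be sketched roughly as follows in Python. This is only an illustration: the table is a small, partial subset of X-SAMPA chosen for English examples, the names `XSAMPA_TO_IPA` and `xsampa_to_ipa` are hypothetical, and it is not the MediaWiki patch mentioned on this page. Symbols are matched longest-first so that two-character symbols such as `r\` are not split.

```python
# Partial X-SAMPA -> IPA table (assumption: just enough for simple English words).
XSAMPA_TO_IPA = {
    "r\\": "\u0279",  # alveolar approximant
    "@": "\u0259",    # schwa
    "I": "\u026a",    # vowel of "kit"
    "E": "\u025b",    # vowel of "dress"
    "{": "\u00e6",    # vowel of "trap"
    "V": "\u028c",    # vowel of "cut"
    "U": "\u028a",    # vowel of "foot"
    "S": "\u0283",    # "sh"
    "Z": "\u0292",    # "zh"
    "T": "\u03b8",    # voiceless "th"
    "D": "\u00f0",    # voiced "th" (eth)
    "N": "\u014b",    # "ng"
    "g": "\u0261",    # IPA script g
    '"': "\u02c8",    # primary stress
    "%": "\u02cc",    # secondary stress
    ":": "\u02d0",    # length mark
}

# Try longer symbols first so multi-character symbols win over their prefixes.
_SYMBOLS = sorted(XSAMPA_TO_IPA, key=len, reverse=True)


def xsampa_to_ipa(text: str) -> str:
    """Convert an X-SAMPA string to IPA; unknown characters pass through unchanged."""
    out = []
    i = 0
    while i < len(text):
        for symbol in _SYMBOLS:
            if text.startswith(symbol, i):
                out.append(XSAMPA_TO_IPA[symbol])
                i += len(symbol)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(xsampa_to_ipa('f@"nEtIk'))
```

A full converter would need the complete X-SAMPA table, including diacritics, but the lookup itself stays this simple because every symbol has exactly one IPA counterpart.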


 * Stability:
 * Your scheme appears to be under development (the pre-1.0 version numbering and the content of the original article's talk page). Reasonable enough, since it's a new scheme and doubtless people will find ways to make improvements. But if we start using it before it's stable, any changes will have to be propagated through all the articles using it. But there's no way to test it without using it. We'd need some kind of sandbox and volunteer testing corps, I think.
 * IPA and X-SAMPA are already stable.


 * Current problems with IPA / X-SAMPA:
 * Currently, the IPA and X-SAMPA pages are large and unwieldy by comparison with yours. We need a trimmed-down reference specifically for English-speakers writing or reading English pronunciations.
 * It would also be useful to have readily accessible links to IPA / X-SAMPA pages on other-language Wikipediae, to make it easy for non-native English speakers to look up English pronunciations.
 * Again, I would be happy to help with such efforts.

Pnot 20:38, 4 Nov 2004 (UTC)
 * Apologies:
 * I apologise if this seems rather harsh. Please be assured that I appreciate the effort you have put into your system, but I genuinely believe it to be a bad idea for the reasons outlined above.
 * I also apologise for posting this screed in one chunk: I wrote it last night and didn't have time to post it before going home. Overnight, the page has grown greatly, but I'm afraid I don't have the time to try to integrate this with the rest of the discussion. Some of my points appear to have been made already by Graham and Nohat, but I believe there might still be some value left in my ravings.

Jallan's take
I mostly agree with both Nohat and Pnot. The system is actually quite a normal one, with most of the forms being ones often used to indicate ad hoc pronunciations. I've seen similar systems used to indicate pronunciations of Biblical names on the web and for simple pronunciation indications in non-scholarly dialect phrase books. The only real oddities, to my eyes, are aa for [æ] and t~h for [ð]. Why? There's no reason any more to limit oneself to pure ASCII for computer communications. Even if there were, why not ae and dh, the latter often used as a rendering for [ð], for example by J.R.R. Tolkien in names like Caradhras and Caras Galadhon, in some systems of transliteration from Semitic languages, and by W. H. Auden and some others in translations of eddic poems, where Odhin appears instead of the more common Odin as an English rendering of Norse Óðinn and so forth. I get the impression you are not very linguistically aware or you would have used the normal dh 26-letter Latin alphabet kludge.

I suspect you are unaware of how many such systems are out there like this, ad hoc systems, often used only in one book each, all slightly different from one another, but all touting their supposed simplicity. But they are all somewhat different and all very limited. That's fine, if all you are doing is providing the standard English pronunciations of Biblical names.

But such systems look like the ASCII kludges that they are. Supposed intuitiveness is overshadowed by their ugliness and lack of scope. IT IS LIKE LIMITING ONESELF TO UPPERCASE ONLY. One is quickly fatigued by an overabundance of h. I find IPA far more readable. Of course, I know IPA. But then IPA is a system that should be learned. And there would be nothing wrong with an IPA guide page giving the most common IPA symbols covering English phonemes used in normal varieties of English with examples. IPA is harder to learn than your system, but not very much harder, and far more worth the learning. People really use it in the real world.

IPA mixed with simple ad hoc respellings does the job for simple indications of pronunciations. IPA is not overkill when used intelligently without making distinctions not relevant to one's purpose. The heteronym chart does not require or benefit from full indications of pronunciations. The differences should appear something like:

Other methods like this are often used, for example ab-strAct or ab-STRACT or ab-stráct or ab-'stract. A note explaining the method used in a particular case may be useful, but such methods are intuitive enough as to really need no explanation. I know IPA quite well and often use it. In this chart the aggregate samples might reasonably be rendered as aggre-[ɡeɪt] and aggre-[ɡət]. And there would be nothing wrong with using an IPA stress mark with the other forms. But the rest of the IPA usage and American markup usage only obscures. One does not understand heteronyms any better for seeing the supposed exact or approximate pronunciation for the whole word in another spelling, especially when in many cases that spelling must contain irrelevant information about a particular accent. You seem about to duplicate the information in another table for a British accent. One could do the same for an Oxford English accent, or a Texas accent, or a Yorkshire accent, or an Australian accent, or a Scottish accent. That would provide no new information on heteronyms. But an ad hoc system using commonly understood methods, perhaps using some IPA and perhaps not, can cover all the information for all pronunciations of English in a single chart by focusing only on the differences between the two forms and nothing else. It is as though one insisted that the difference between resume and resumé required an entire phonetic writing of both words. No. Just marking the difference is enough.

Linguists use ad hoc methods constantly to focus on particular points of interest, as well as using full IPA, or a specially tailored IPA, or only a few IPA special letters, often for different purposes in different parts of the same book. They also often use minimized (broad) phonemic representations of English and other languages. They sometimes use a single symbol for more than one phoneme when it fits the requirements of their discussion. IPA is a scalable system and a customizable system.

American English markup is cumbersome because it is not scalable or flexible. It is also not markup. I don't see why that word is being used. Also Wikipedia policies do not permit publication of original research or advocacy of one's own original research within Wikipedia. If one cannot get people outside of Wikipedia to adopt a particular system of indicating pronunciation, then why should it be used within Wikipedia in preference to systems that have been widely adopted or sensible ad hoc explanations of pronunciation?

One should be able to point to at least one dictionary that uses the system, or one pronunciation phrase book of American English, or something of that kind. One should be able to find some indication that there are a large number of people who think this system is better than similar idiosyncratic systems in use in current American dictionaries or phrase books. Otherwise, Wikipedia is better off using systems that have already established their usefulness, in part just because they have established it already, and in part because Wikipedia would die quickly if it encouraged all the thousands of spelling reformers and creators of new systems of writing, of phonetic transcriptions, of pictorial writing systems and so forth to use Wikipedia as a springboard for innumerable competing creations of that kind.

Jallan 04:13, 6 Nov 2004 (UTC)

jguk's take
If this is rolled out generally it will only lead to more edit wars. Are we meant to include all American, British, Irish, South African, Canadian, Indian, Pakistani, Australian, New Zealand, etc. pronunciations (bearing in mind that there are different accents within all the forms of English mentioned there), or show a bias in favour of one form?

Is there a need? Generally no. Sometimes words have unusual pronunciations, but the way to explain these in non-IPA terms that are understandable by a majority of readers will change from article to article.

I also ask myself whether I, as a speaker of British English, think I can understand the mark-up you propose. I'm sorry, I cannot.

I appreciate that you have put a lot of time and thought into this proposal, but I'm going to have to oppose it. jguk 23:01, 6 Nov 2004 (UTC)

In conclusion
I think the discussion on this proposal seems to have come to an end. But as a footnote:


 * there has been a bit of an effort recently to put IPA pronunciations into articles, replacing both non-standard schemes and SAMPA. I think we can (almost) now say that IPA is the de facto standard (as well as the recommended standard) for pronunciation guides within Wikipedia.


 * several contributions above have mentioned the lack of a short and simple key to IPA for English spellings. This now exists at IPA chart for English. rossb 09:58, 22 Apr 2005 (UTC)