Wikipedia talk:Pronunciation (simple guide to markup, American)/Further discussion

Discussion from original page
I have taken the liberty of pasting in this discussion from the talk page of the original article within Wikipedia proper, which now seems destined for deletion. Most of it seems to have been incorporated into the FAQ above, but I think Nohat's list of words might make a useful ad-hoc touchstone. Pnot 03:57, 3 Nov 2004 (UTC)


 * Looks good to me, Pnot. Thanks. --NathanHawking 04:56, 2004 Nov 3 (UTC)

How do you transcribe the following words using this scheme?


 * her
 * err
 * air
 * marry
 * merry
 * Mary
 * butter
 * button
 * prism
 * sink
 * single
 * finger
 * forest
 * roses
 * Rosa's
 * for
 * poor
 * entrepreneur
 * lure

Also, how do you mark secondary stress? Nohat 09:35, 1 Nov 2004 (UTC)


 * Any SIMPLE phonetic markup system will be unable to map all the nuances of possible pronunciations of all words. Trade-offs, remember? That's mostly irrelevant, though, since pronunciations vary widely anyway, and even a dictionary with an extensive phonetic table can only approximate a selected "standard" pronunciation. Look up words on Merriam-Webster--their recorded pronunciation often fails to match their hypothetical written one.


 * A good example is their entry for record. Their phonetic indication is ri-'kord, the equivalent of Pronunciation (simple guide to markup, American)'s rihKOHRD. But play the recorded version of M-W's record pronunciation and it's much closer to reeKOHRD, which happens to be the way I pronounce the word.


 * I've made no provision for secondary stress, to keep things simple. A word like entrepreneur, however, could be adequately rendered AHNtruhpruhNUHR, close enough to M-W's "änn-tr&-p(r)&-'n&r". Once again, simplicity and ease of use are more important than scholarly accuracy. Besides, the word entrepreneur is pronounced in a dozen different ways across America. Close enough, for a List of heteronyms and the like, is good enough.


 * As for your list, I leave most of them as an exercise for you, but will do:
 * her HUHR or HR (M-W: 'h&r)
 * err EHR (M-W: 'er)
 * air EHR (M-W: 'er)
 * marry MAARee (M-W: 'mar-E)
 * merry MEHRee (M-W: 'mer-E)
 * As you know, most of these have two pronunciations, but I only selected one to illustrate. I estimate that 99.99% of "standard" American English words can be closely approximated using this table. As an exercise, I also tried it out tonight on a southern U.S. dialect and it worked well there too.--NathanHawking 10:52, 2004 Nov 1 (UTC)
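
The transcriptions quoted above (rihKOHRD, AHNtruhpruhNUHR, MAARee) appear to mark the stressed part of a word with capitals and leave the rest in lower case. A minimal sketch of reading that convention mechanically follows; the function name `stress_runs` is hypothetical, and the assumption that a run of capitals always means primary stress is inferred from those examples, not taken from the guide's table.

```python
import re

def stress_runs(markup):
    """Split a transcription into (run, is_stressed) pairs.

    Assumption: a run of capital letters marks stress and a run of
    lower-case letters marks the unstressed remainder.
    """
    return [(run, run.isupper()) for run in re.findall(r"[A-Z]+|[a-z]+", markup)]

print(stress_runs("rihKOHRD"))
print(stress_runs("AHNtruhpruhNUHR"))
```

Because the scheme has only two levels (capitals or not), a word such as entrepreneur comes out with two equally stressed runs, which is the lack of secondary stress Nohat asked about.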

Another proposal
The author of English_phonetic_spelling offered, a while back, a proposal which is not as simple as this one, but whose choice of symbols is worthy of study. --NathanHawking 02:54, 2004 Nov 3 (UTC)


 * After a little study of English_phonetic_spelling, I notice what may illustrate one of the problems one encounters when developing a system like this: ambiguities arising from the juxtaposition of symbols. In that proposal, for example:
 * a is the sound in cot, i the sound in fit, yet ai is the sound of size. To pronounce size, his/her proposal would use /saiz/. But is /saiz/:
 * s+ai+z with the ai symbol and a long i? Or
 * s+a+i+z with individual a and i symbols (sounding the cot and fit vowels mashed together, closer to the way a North Carolinian would say size)?
 * Such problems are impossible to avoid completely; at best, one can hope to minimize them. My own proposal could benefit from similar scrutiny, to see if that minimization of ambiguity is optimal.--NathanHawking 03:20, 2004 Nov 3 (UTC)
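
The /saiz/ ambiguity described above can be checked mechanically by listing every way a transcription splits into symbols. The sketch below assumes a tiny inventory containing only the symbols mentioned in this thread (s, a, i, ai, z); the function name `segmentations` is hypothetical and the inventory is not the full table of either proposal.

```python
def segmentations(text, symbols):
    """Return every way to split text into symbols from the inventory."""
    if not text:
        return [[]]
    results = []
    for sym in sorted(symbols):
        if text.startswith(sym):
            for rest in segmentations(text[len(sym):], symbols):
                results.append([sym] + rest)
    return results

# Illustrative inventory: only the symbols discussed above.
inventory = {"s", "a", "i", "ai", "z"}
for split in segmentations("saiz", inventory):
    print("+".join(split))
```

A transcription is unambiguous under a given inventory only when this returns exactly one split, so running it over a word list is one way to give a proposed symbol table the scrutiny suggested above.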

Disagreement
For what it's worth, I strongly disagree with this approach to pronunciation. I know this form of ad-hoc pronunciation is popular in American Encyclopedias, etc, but I believe this approach is misguided. There is an internationally agreed phonetic alphabet that works in all languages (within reason) and is value-free. This system is the opposite. I truly hope this will not cause a proliferation of these "pronunciations" throughout wikipedia - especially as I have removed them whenever I've discovered them as being essentially worthless. A British version is even more error-prone and loaded than an American one, as British accents and pronunciations vary so widely. You may think the same could be said of IPA, but IPA permits an absolute pronunciation to be given to a word in ANY ACCENT. This system could conceivably do that, but only if a "standard accent" is agreed upon in the first place. There is no such thing, and so this ad-hoc system is self-referential and ultimately conveys almost no useful information at all, if you don't already know how to pronounce English (and if you do, it's redundant). I think there needs to be considerably more discussion before anything like this is deployed - I for one will be seriously turned off WP if it gains currency. Graham 05:14, 3 Nov 2004 (UTC)


 * Thanks, Graham, for your opinion. Let's not confuse two issues. Yes, there is a standard. But it is very cumbersome, absurdly so for many applications.


 * IPA ALLOWS for pronunciation of any accent, but are you seriously suggesting that a reference work list all possible pronunciations for every entry? The serious flaw in your reasoning is that given your premises and implied conclusion, no dictionary would ever list a "standard" or suggested pronunciation. Clearly that would be an absurd outcome.


 * It would be no less absurd on Wikipedia. In compiling a List of heteronyms, for example, words which are spelled the same and pronounced differently, it would be absurd merely to SAY they're different but not to suggest HOW they're pronounced. If a pronunciation must be given to avoid such absurdity, which one would be selected?
 * All pronunciations in all dialects?
 * None?


 * Those are equally absurd.


 * Consider the simple word "past". Merriam-Webster suggests a single pronunciation, 'past, where the a is pronounced like the one in ash. There are numerous ways Americans say the word, however, and they include, using my suggested markup:
 * PAAST (The equivalent of M-W's version)
 * PAHST (New England)
 * PAHyuhst (Southern U.S.)


 * Dictionaries have to make choices, and I believe that Wikipedia must as well, for some articles, or be crippled as a serious reference. ... --NathanHawking 08:54, 2004 Nov 3 (UTC)


 * Actually I'm encouraged that there is already this discussion and a structure for debate in place. I wasn't aware of it, hence my adding the comment in the other place, which I just happened to notice by accident. I appreciate you moving my comment and response here where it can serve more usefully, I hope. I agree with your points - IPA is cumbersome, and almost certainly less intuitive for the average reader, though obviously with familiarity it's just like reading anything. Now I've had a chance to think about it a bit more, I'm going to change tack a little, or maybe clarify my position anyway. I don't think the argument is about IPA versus ad-hoc; rather, it's about any scheme versus none at all. Is there actually any strong evidence that this is something that users feel is lacking? My view is that the actual words are their own pronunciation - you just need to learn them.


 * Whenever I have encountered an ad-hoc pronunciation inserted into an article, a couple of things strike me:
 * 1. it has the tone of talking down to the reader, as if it is assumed they are too stupid to know how to pronounce it. For whatever reason I don't get the same feeling when I read an IPA pronunciation, but maybe that's just me. So that's a lowest common denominator sort of argument I guess.
 * 2. Very often, the ad-hoc pronunciations lead one to an American accent pronunciation.
 * 3. Unless you already are pretty familiar with the English language, the ad-hoc pronunciations are very often simply meaningless, though of course you would have to wonder why such a person would be reading the English WP ;-) (conversely, when encountering an ad-hoc pronunciation for a foreign word, the American/English "translation" is often embarrassingly wide of the true pronunciation, a drawback that IPA doesn't have).
 * 4. I agree with your point about how to deal with a variety of accents and versions of a pronunciation - in fact I was not suggesting that. Even with IPA one would have to establish a "standard" accent as a base. But I wonder how you're going to address this with this scheme also. In fact the problem is magnified by the fact that the English WP is also the American, Australian, and default WP - you're surely not suggesting encumbering articles with a huge list of alternatives in every opening paragraph? I'm also wondering how pervasive you intend this to be. In some US encyclopedias I've seen, they put a pronunciation on every article entry, even when the pronunciation is obvious. I wouldn't like to see WP "dumb down" to that degree, but where would you draw the line?
 * 5. Perhaps technology can come to the rescue here - if such pronunciations are necessary at all (I don't think they are, but that's only my opinion - your option of "none" sounds OK to me!) perhaps they could be hidden for those that don't want them - maybe a user preference? Alternatively some kind of additional "helper" page or pop-up - I'm just throwing a few ideas into the ring here. In fact that approach could be quite useful - it would allow multiple pronunciations to be listed, including a mix of accents, IPA too, without cluttering an article. For those who don't need them at all, they aren't even there. Is this even technically feasible? A good idea, bad idea? Sorry for the lengthy comment, but I do feel quite strongly about this! Graham 10:19, 3 Nov 2004 (UTC)


 * I took the liberty of placing some paragraph breaks in your post for clarity and adding to your numbers. Responding to:


 * Preface: As for need, I've offered my own experience and the examples. "...the actual words are their own pronunciation - you just need to learn them" is fine for ordinary language in everyday use. But there are times when we wish to indicate the predominant pronunciation--I gave several examples.
 * 1: I don't feel 'talked down to' when Merriam-Webster tells me the dominant pronunciation of a word. I feel informed. Sometimes I even alter the way I say a word.
 * 2: Wikipedia is pluralistic about spelling, and to a lesser degree, punctuation. There's no reason it can't be pluralistic about pronunciation as well. If the author of an article has a need to indicate pronunciation, he or she can use a British pronunciation as easily as the spelling humour or realise. Even there, which British or U.K. pronunciation will be chosen? Complaining about an "American accent" is a slippery slope, because there are many American accents, and if "equal time" is demanded, cannot disparate British or U.K. dialects voice the same complaint? Slippery slope. Common sense dictates author choice, pluralism and predominant dialects.
 * 3: I don't think pronunciations in dictionaries are "meaningless", nor do I think they are without value in some Wikipedia contexts.
 * 4: I'm advocating a Simple Guide to indicating pronunciation. I have no present position on how extensively it might be used, only that sometimes it should be used. It's a tool; people will sort out how and where it's used in an organic way.
 * 5: I'd rather not try to anticipate every way such a tool can be overused or misused. Any tool can be abused, but we still need tools. This one is no exception, in my view. Thanks. --NathanHawking 11:07, 2004 Nov 3 (UTC)


 * I also strongly disagree with usage of this. IPA is a standard, it's more flexible than this, and it doesn't assume knowledge of English like this does. Again, I appreciate that you're attempting to contribute, but feel that the encyclopedia would be better off without this, and don't expect to see it used anywhere. There are other sites where people can download soundbites of any word they want in the English language. Let people use that if they can't understand IPA. We shouldn't need to learn a different IPA-replacement for every encyclopedia in the world. --Improv 15:54, 3 Nov 2004 (UTC)


 * I find the "there are other sites where people can get ___________" line of thinking problematic. Exactly the same logic could have been used when Wikipedians wanted more tools for expressing mathematical formulae--let them go to the Wolfram site, or whatever. Faced with a use-IPA-or-nothing policy, some article writers will simply opt for nothing. That seems disabling, not enabling, disempowering, not empowering. The Simple Guide would be a tool, filling a gap, with a very short learning curve. --NathanHawking 20:49, 2004 Nov 3 (UTC)

Hiding pronunciations
Hi Nathan, thanks for reformatting my text - I was under pressure from the wife to finish it and do something "useful"! :) A couple of points - I am not saying pronunciations in dictionaries are "meaningless" at all - most dictionaries that are serious reference works use IPA, which is most definitely not meaningless. I haven't spent much time over at Wiktionary, but I think they use IPA there too.

By meaningless I was referring to ad-hoc pronunciations, since they are self-referential in my view. By this I mean that you cannot explain the way an ad-hoc pronunciation is itself pronounced without using the self-same pronunciation! You have to resort to IPA to get an "absolute value" for it. OK, you can show how certain syllables are pronounced by example, but is the example absolute? Very doubtful - you'd have to show the pronunciation of the example, and that would degenerate to another ad-hoc pronunciation and so on ad infinitum. The only way it can work is to tie it to agreed IPA sound values, (which are based solely on vocal tract configurations), so there is an absolute IPA "anchor" to the system as a whole. But if you need to do that, then why not just use IPA in the first place? Most serious dictionaries do not use these ad-hoc schemes, because they are insufficiently rigorous. I'm sorry if this casts Merriam-Webster in a poor light, but I prefer the OED any day (are you actually saying M-W uses ad-hoc, or are you using its examples as a basis? - I don't have M-W so I'm unclear about that). Perhaps some other dictionaries aimed at children or new readers might also use ad-hoc schemes - I don't think I've seen this in the UK though, it could be more of a US thing. I would tend to view the introduction of an ad-hoc scheme throughout WP as an "americanisation" of it (rightly or wrongly), and we really need more of that, right? :) However, WP is not a dictionary, as we all know, so maybe we're not comparing apples with apples.

I fully agree with your "slippery slope" argument - by the way, I realise the accent problem is just as much a problem with the US version as it is with a British one and others. However, even if you can settle on a small set of "standard" base accents for your pronunciation guides, there will always be those who will argue that they are wrong - in fact the opportunity for petty squabbles is enormous, lord knows it's bad enough! The same would apply to IPA too, which is why overall I'm in favour of not doing this at all. I would be much more accepting of a "hidden pronunciation" approach though (since I could personally ignore it!), if there is any technical basis for doing it. As mentioned, this could be in a variety of forms - it could even just be a separate link to an associated wiki page reserved solely for pronunciation, just as talk pages attach to each article ( i.e. a Pronunciation: for each page). I imagine that would be very feasible, though would require some (maybe significant) back-end changes that would have to mean a lot more people would need to get involved in this discussion. As another possibility, I saw a very neat javascript the other day that allowed you to create arbitrary pop-up text boxes for any link with further explanatory text of your choice in it - very standard nice code, supported by all the current browsers - something like that would be neat, but again there would probably be other issues that would come into it. I guess the main problem with any of these approaches is that it would need changes to the wiki engine. Making a scheme that fits the current wiki and pleases everyone will be a tough job. I would like to know your views on this. Graham 23:15, 3 Nov 2004 (UTC)


 * Please see my discussion above about why "knowledge of English is required" is only a pseudo-problem. Of course dictionaries use English words as examples of how to pronounce the sounds of unknown words. Despite this 'self-referentiality', millions of people look up words and learn how to pronounce them every day--all in terms of words they already know. Everything we learn is described in terms of what we already know.


 * Similarly, people program in high-level Basic or Pascal, etc., without knowing low-level Assembler. Insisting that what people really need from dictionaries is description of how to hold their mouth, lips, tongue, etc., is like insisting everyone know CPU opcodes or Assembler.


 * Hiding pronunciations, if they were truly to become as onerous for the Wikipedia community as you fear, should be no more technically burdensome than the Hide Table of Contents feature. But I see no need to solve a problem which doesn't exist yet, when adopting a Simple Guide to indicating pronunciation is a solution to a problem which presently exists.


 * Insisting that all conceivable problems be addressed before anything changes is a sure route to paralysis. --NathanHawking 23:46, 2004 Nov 3 (UTC)


 * I'm not insisting that people need to learn how to hold their tongue, etc - in reality most dictionaries explain IPA in terms of everyday simple words that most of us can agree on, just as you advocate for ad-hoc. The difference is that the ad-hoc system "floats around" in its own little world that cannot be traced back to absolute phonemic values. IPA is traceable.


 * If a technical solution is ultimately the way to go, it should be addressed sooner rather than later, since the work of removing all the ad-hoc pronunciations from pages to whatever the technical solution is could end up being a lot of work. It would be better to have a structure in place first, both technical and linguistic.


 * I don't think it implies paralysis, though personally I would actually prefer that as an outcome rather than see a half-baked scheme proliferate throughout WP. Graham 23:59, 3 Nov 2004 (UTC)
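
Graham's idea of giving the scheme an IPA "anchor" amounts to a lookup table from each markup symbol to an agreed IPA value. The sketch below is illustrative only: the table `ASSUMED_IPA` and the function `to_ipa` are hypothetical, the five vowel values are assumed General American readings of the example words named in this thread (Spam, father, fun, fit, float), and none of it is taken from the Simple Guide's actual table. Consonants are passed through unchanged and stress is dropped.

```python
# Hypothetical anchor table: markup symbol -> assumed General American IPA value.
ASSUMED_IPA = {
    "AA": "æ",   # as in Spam
    "AH": "ɑ",   # as in father
    "UH": "ʌ",   # as in fun
    "IH": "ɪ",   # as in fit
    "OH": "oʊ",  # as in float
}

def to_ipa(markup, table=ASSUMED_IPA):
    """Translate a transcription symbol by symbol, ignoring stress."""
    text = markup.upper()
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in table:
            out.append(table[pair])       # two-letter vowel symbol
            i += 2
        else:
            out.append(text[i].lower())   # anything else passes through
            i += 1
    return "".join(out)

print(to_ipa("KAHNflihkt"))
print(to_ipa("BAAS"))
```

Whether such a table makes the scheme "traceable", or simply shows that IPA could be used directly, is the point in dispute above; the sketch takes no side.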


 * I think we have a fundamental misunderstanding. You seem to be referring to my proposal as ad hoc, when it is in fact quite the opposite. Absence of a Simple Guide encourages people to indicate phonetics in off-the-cuff ways--that's ad hoc. Constructing a phonetic markup guide based upon the preponderant pronunciations of example words as listed in dictionaries is no more ad hoc than the dictionaries themselves.


 * As to the technical issue, your concerns might have more substance if carefully spelled out. I suggest:
 * Very carefully define what you think the potential problem is.
 * Present a BRIEF summary under its own heading.
 * Include some brief and specific proposals as to solutions.
 * We'll go from there. --NathanHawking 03:06, 2004 Nov 4 (UTC)


 * When I use the term ad-hoc, I'm referring to all schemes of this type. I believe these are referred to as ad-hoc by linguists regardless of how carefully they have been formulated, because they are all unofficial schemes that suffer from the deficiencies mentioned. So I'm not using it in any pejorative sense - I'm simply using the usual accepted term. However, if I've got that wrong, I'm sure some of the linguists here will put me straight.


 * With regards to the other points, I don't think I can add much more. There are others here far more qualified to discuss specific difficulties with the proposals - I'm sorry if this sounds like a cop-out, but as I said my interest in linguistics is purely casual - I don't feel qualified to push my argument further. If I'm on the right track then actual linguists will probably take up these points. If I'm wrong, then I'm wrong. My specific proposal as to a solution is, briefly, to drop it altogether. As others have said, this should not be taken as a lack of appreciation for your aims and effort - clearly we all have the general improvement of WP as a motivation, and that is very much appreciated. However, my view is that this particular proposal is misguided, even while I applaud the sincerity and motivation behind it. Graham 22:44, 4 Nov 2004 (UTC)

Assuming knowledge of English
It has been noted that a Simple Guide would "assume a knowledge of English."

True, but is that really a problem? The English-language Wikipedia is in English. A Simple Guide to indicating pronunciation would be based upon this premise as well--one must have a working knowledge of English to use the tool to acquire still more knowledge, in this case preponderant pronunciation. --NathanHawking 20:49, 2004 Nov 3 (UTC).

Do we need it?
I feel pronunciation guides better fit the Wiktionary project. It's more of a dictionary thing and therefore doesn't really fit an encyclopedia. Besides, there are any number of different American pronunciations of one word. Are you planning to transcribe every dialect? Mgm (talk) 22:13, Nov 3, 2004 (UTC)


 * Yes, we need it. Clearly there are times when Wikipedia needs to describe pronunciation. See the examples above. To relegate this solely to Wiktionary is to leave a conceptual hole in Wikipedia.


 * The "many dialects" argument has also been addressed several times above. How many reference works "transcribe every dialect"? Few in the extreme. Why would we expect or demand this from Wikipedia? Is "all or none" really sensible? --NathanHawking 23:14, 2004 Nov 3 (UTC)

An example of why it doesn't work
I hope, Nathan, you'll forgive this, but I actually only just got around to reading the rest of this page :$ - shoot first and ask questions later, mea culpa. You have already obviously anticipated a number of arguments for and against. However, I stick to my guns. I'm a Brit. My own accent is said by most to be fairly neutral, even though I'm a geordie by birth, an accent which is both extremely broad, and extremely fashionable - to my annoyance I no longer speak with this accent! Anyway, I digress. You have a table of examples above, I'll repaste it here. I'll try and show you how your examples are pronounced by me.

In general, I see AA as "ar", as in the word 'are'. So AABstraakt "sounds like" ARBstrarkt.

 * BAYS - the final 's' sounds like a 'z' to me, so this "sounds like" bays or baize, not base.
 * BAAS - barse, rhymes with parse.
 * BOH - bor... with a very short 'o' and silent aspirant on the end.
 * kuhnFLIHKT - coonflict.
 * KAHNflihkt - carnflict.
 * KAHNskrihpt - carnscript.

I realise these are partially because of UK vs. American accent differences, but some I feel are due to flaws in the system itself. But in any case, they mislead as much if not more than they enlighten. That's my point, in a nutshell. Graham 23:51, 3 Nov 2004 (UTC)


 * "Doesn't work" is greatly overstating the matter, I think, but I'm glad to hear your perception of the symbol selection. I'm open to change and refinement. A few notes and questions, in response to each comment:
 * "BAYS - the final 's' sounds like a 'z' to me, so this "sounds like" bays or baize, not base."
 * Any phonetic system can mislead on occasion, if the user doesn't become familiar with it. Reading BAYS as bays or base is not unlike the possibility of reading Merriam-Webster's 'bAs as either base or bass (the fish). With a linkified pronunciation which takes one to the table, people soon get onto the idea that the symbols may well be different than a literal reading, and words predisposed to literal misreadings would not be all that common anyway.
 * "In general, I see AA as 'ar', as in the word 'are'. So AABstraakt "sounds like" ARBstrarkt. ... BAAS - barse, rhymes with parse."
 * aa is a relatively rare combination in English. What common words containing aa would predispose that reading for your dialect?
 * What symbol, if not aa, could be used for the vowel in the American pronunciation of Spam and preclude the reading with the r?
 * "BOH - bor... with a very short 'o' and silent aspirant on the end."
 * What common words containing oh would predispose the reading with the r for your dialect?
 * What symbol, if not oh, could be used for the vowel in the American pronunciation of float and preclude that reading?
 * "KAHNflihkt - carnflict. KAHNskrihpt - carnscript."
 * (You do love those Rs, don't you?) What common words containing ah would predispose that reading for your dialect?
 * What symbol could be used for the first vowel in the American father which would NOT create the reading with the r?
 * "kuhnFLIHKT - coonflict."
 * (What? No R? Heh.) What common words would predispose that reading of uh for your dialect?
 * What symbol, if not uh, could be used for the vowel in the fun and preclude misreading?
 * I'll be most interested in your answers. At best, I might get some clues about how one might internationalize a Simple Guide. At worst, it might become clearer that regionalization is necessary. --NathanHawking 02:44, 2004 Nov 4 (UTC)


 * Oh yes, we're quite partial to our r's (though many claim we don't know them from our elbows ;). By the way, you're obviously much better at some of the finer formatting thingys than I am, so feel free to reformat this. I may not be the best person to take up these points, since my interest in linguistics is only casual, and I'm obviously biased against a scheme of this type. But I'll have a go - maybe someone else can pick up here too.
 *  aa is a relatively rare combination in English. What common words containing aa would predispose that reading for your dialect?
 * It's uncommon in English, but very common in Dutch, one of its sister languages. Dutch words are familiar enough that the Dutch 'aa' will tend to be read whenever this combination is seen, e.g. Aardvark, Transvaal.
 * What common words containing oh would predispose the reading with the r for your dialect?
 * Actually this one was a tough one to put across. The 'r' is hardly there. I read BOH more as in the British accented version of "boss", but without the s. A British reading of 'float' would produce something like FLAOWT - there's a definite diphthong in there. But I'm having difficulty here - the only way I am able to define precisely what pronunciation I'm talking about is to use IPA - /fləʊt/
 *  What symbol could be used for the first vowel in the American father which would NOT create the reading with the r?
 * The only SYMBOL I am able to come up with is the IPA /aː/ or possibly /ɑː/, depending on your accent. I cannot think of an ad-hoc sequence that would work unambiguously - in fact I'd go as far as to say such a symbol doesn't exist. (Which is why IPA was invented).
 * "kuhnFLIHKT - coonflict."
 * (What? No R? Heh.) What common words would predispose that reading of uh for your dialect?
 * This could be another bit of interference, this time from German. "Kuhn" would be pronounced "coon" in German. Again, there are sufficient close ties between English and German and familiarity that "kuhn" would be seen this way by many (though definitely not all) British English speakers. The only unambiguous way to represent this short o (as in cot, shot, hot) is to use the IPA /ɒ/ : /kɒt, ʃɒt, hɒt/. The American accent would be something like /kɑːt, ʃɑːt, hɑːt/.
 * I think I've made my point now - I'd like to see others taking part in this, on both sides. Graham 04:29, 4 Nov 2004 (UTC)
 * For whatever it's worth, my intuitions coincide largely with Graham's (I'm a UKian too, though my intuitions are also influenced by other languages I've learned.) I find it very hard to see AA as anything but "aardvark", but I've often seen it used for the sound in "paw" too (just do a google for "aaland"). I think this kind of anecdotal information is pretty useless, though: everyone's intuitions differ, any system needs to be learned, and a huge empirical study would be required to determine the most common intuitions. And it would still only be intuitive for a minority. Pnot 21:19, 4 Nov 2004 (UTC)