Wikipedia talk:Usage of diacritics/Archive 1

Is this a change or a clarification?
If this is a change to existing guidance, we should say so; if not, say so. I see people arguing both above. Please make up your collective minds. Septentrionalis PMAnderson 20:48, 19 June 2008 (UTC)
 * I think it's a change to guidance which actually moves towards what is currently practised.--Kotniski (talk) 07:09, 20 June 2008 (UTC)

Tokyo
Need a clarification. Does the proposal suggest "Tokyo" to be spelled "Tōkyō"? -- Taku (talk) 23:17, 19 June 2008 (UTC)
 * It would appear to, since ō falls in the table of diacritics in Latin letters; but I await correction. WP:UE would prefer Tokyo, of course, as common usage. Septentrionalis PMAnderson
 * WP:MOS-JA (see point 9) specifically states that it should be Tokyo as it has been established as an English word that way. ··· 日本穣 ? · Talk to Nihonjoe 02:13, 20 June 2008 (UTC)

Which is the whole point, it should be Tokyo in line with normal English. The proposal goes "When person or place has a name in the Latin alphabet ..." Sure it's a transliteration of "東京" but you can argue that it does have the name. This is a point that would have to be fixed. We surely would want to restrict the guideline to people and places the names of which are natively written in the Latin alphabet. J IM ptalk·cont 02:56, 20 June 2008 (UTC)
 * Aye, indeed. — Nightstallion 11:01, 20 June 2008 (UTC)

Handel
When this says it does not support Handel, would it move to George Frideric Händel? Why? Nobody uses, or would use, that hybrid form; Grove's uses George Frideric Handel, as he did; that's why we do. Septentrionalis PMAnderson 02:17, 20 June 2008 (UTC)
 * Pick your choice:
 * Georg Friedrich Händel (native name)
 * George Frederick Haendel (more modern anglicisation from German. ref: e.g., for instance also used by publishers of scores - note: "Frederick" not "Frideric" when combined with "Haendel")
 * George Frideric Handel (the name he apparently used after moving to London)
 * I'd avoid hybrid combinations like old English middle name combined with German version of the last name,... --Francis Schonken (talk) 11:33, 20 June 2008 (UTC)

Zuerich and Goering
I have edited this proposal so that it fits in with WP:UE: if there is a common English usage in reliable sources for a place or a person, we should use it. We do not want a proposal like this being used to suggest that Zuerich is correct, but equally we do not want a proposal that says we must not have Goering. Common English usage takes care of this. --Philip Baird Shearer (talk) 13:33, 19 June 2008 (UTC)


 * Your edit would change the meaning of the proposal quite radically. Nothing wrong with putting up an alternative proposal underneath or somewhere else, but so that we know what we're discussing, the main proposal ought to be basically stable (like if someone writes that they support the proposal, we don't want confusion as to what version they're looking at when they support it).--Kotniski (talk) 16:11, 19 June 2008 (UTC)


 * And I don't see how the present proposal would lead to Zuerich. To Zürich, possibly, which is what we have now anyway last time I looked.--Kotniski (talk) 16:13, 19 June 2008 (UTC)
 * That article begins, surely correctly, by saying: "Zürich (, Zürich German: Züri , Zurich , Zurigo ; in English generally Zurich ) is ...." - so we should just title it Zurich. Johnbod (talk) 11:59, 20 June 2008 (UTC)

I am confused: are you saying that we should not use reliable English language sources to decide the content of our articles? --Philip Baird Shearer (talk) 18:30, 19 June 2008 (UTC)


 * I'm not saying that, but I don't believe "using" them has to mean unthinking imitation of them. Reliable English sources collectively show us that certain styles of writing (treatment of diacritics in this case) are acceptable in English. Out of those styles, we should prefer to use those which are more suitable for an encyclopedia (which people come to in order to be informed). And we should also prefer to be consistent, particularly where failure to do so is likely to mislead. So we may (and indeed already do) adopt naming conventions and stylistic rules which, while consistent with good English usage, do not necessarily entail reflection of the majority of reliable sources in every single case. This is entirely distinct from questions of fact - when a source uses a particular spelling, it is not stating as a fact that "this is the only correct spelling in English"; it merely implies that "this is one acceptable way of writing it in English, the one most in accordance with our [i.e. their] adopted style and technical limitations". Our (i.e. WP's) style and limitations may well differ. --Kotniski (talk) 07:24, 20 June 2008 (UTC)


 * I would direct you to the introduction to WP:V "The threshold for inclusion in Wikipedia is verifiability, not truth—that is, whether readers are able to check that material added to Wikipedia has already been published by a reliable source, not whether we think it is true." This suggested guideline should start with a definitive statement that makes it clear that WP:V, WP:NC and WP:UE are being adhered to and that this is only for exceptions. --Philip Baird Shearer (talk) 07:47, 20 June 2008 (UTC)

Reliable sources indicate that diacritics are common in some words, uncommon in others, and unheard of (except as conscious adoptions of foreign spelling) in yet others. We should do the same: use Besançon (and probably Björn Borg) because reliable sources do; use facade and Handel because reliable sources generally do; use Stanislaw Ulam, like Rome, because reliable sources almost always do. Inflicting the cedilla of Besançon on facade is reinventing English spelling to provide a non-existent consistency; Wikipedia is not an institute for spelling reform.

Roma, like fontana, can be found in English prose; but both are conscious Italianisms. Except to assert a linguistic fact, they are generally held to be bad writing; Mark Twain kidded the bejesus out of that in Roughing It, and it should be deprecated severely. The same applies to all these proposals to use diacritics where our sources do not. Septentrionalis PMAnderson 15:31, 20 June 2008 (UTC)

Roger Joseph Boscovich
Almost invariably called Boscovich (or Boscovitch), but always a citizen of Ragusa. Septentrionalis PMAnderson 15:34, 20 June 2008 (UTC)
 * That's what I mean by English version of a name. When English orthography is applied as opposed to simply dropping the diacritics, as has been done with many Serbian and Croatian scientists. Boscovich or Boscovitch is where UE is actually applicable.  Balkan Fever  22:38, 20 June 2008 (UTC)
 * Aye, in that case no one here's calling for the original name. — Nightstallion 10:06, 21 June 2008 (UTC)
 * And a good thing too; but this proposal would, unless it gets another bell and whistle to exclude him; perhaps this ad hoc notion that ch can be English orthography, and c can't. Septentrionalis PMAnderson 17:41, 21 June 2008 (UTC)
 * My understanding is that the current proposal mandates Boscovich per point #2 ("Where a name which is clearly the best-established in English differs in spelling, other than merely in terms of diacritics or ligatures, from the native name, then the English name is used."). Established anglicized version, more or less the same as Rome vs Roma - no dispute here I suppose. GregorB (talk) 19:04, 21 June 2008 (UTC)

Ulam
Note that Stanislaw Ulam differs from the Polish form only in the l in Stanislaw. I don't see how this proposal supports it; but this is the usage of his autobiography and his coworkers. Septentrionalis PMAnderson 14:59, 20 June 2008 (UTC)
 * The part about naturalization is intended to cover cases like this.--Kotniski (talk) 07:26, 21 June 2008 (UTC)
 * I would support removing the roman text from Where a name which is clearly the best-established in English differs in spelling, other than merely in terms of diacritics or ligatures, from the native name, then the English name is used. (But this leaves us where we started.) What reason is there to distinguish between the two cases? Septentrionalis PMAnderson 15:01, 20 June 2008 (UTC)
 * Because when you see diacritics and ligatures, you know what the spelling is without them. See Zürich and you recognise Zurich. But see Göring and you don't necessarily recognise Goering. Maybe it isn't phrased very well (and nor is this explanation), but I think you get the point.--Kotniski (talk) 07:26, 21 June 2008 (UTC)
 * Not if you don't know the foreign language in question. Is ö to be represented by o or oe? It varies. But let me rephrase the question: why distinguish between diacritics and ligatures on one side, and everything else on the other? This proposal sensibly would leave Castile at Castile, not Castilla, but would move Aragon to Aragón. This makes no sense, and invites the mannered and illiterate phrase "Aragón and Castile", which nobody ever uses. (One time in ten, pedanticism will use "Aragón and Castilla", or sometimes "Aragon and Castilla".) Septentrionalis PMAnderson 17:38, 21 June 2008 (UTC)
 * On the first point, we minimize confusion overall by being consistent in our treatment, which is the aim of the proposal. The second point is a good one though; we probably need another exception for historically well-established spellings (that might permit umlautless Zurich as well, which would be fine by me).--Kotniski (talk) 20:48, 21 June 2008 (UTC)

Revision
I've revised the wording of the proposal (hopefully without changing its original intention) to deal with some of the cases arising above (Tokyo and Handel for example).--Kotniski (talk) 07:54, 20 June 2008 (UTC)
 * Please make your proposed guidance consistent. --Francis Schonken (talk) 20:54, 20 June 2008 (UTC)
 * What do you see as inconsistent at the moment?--Kotniski (talk) 07:22, 21 June 2008 (UTC)
 * Well, for starters, it doesn't explain what happens when rule #1 and rule #5 lead to a different result (e.g. Händel/Handel, Schönberg/Schoenberg, Jogaila/Jagiełło,...) --Francis Schonken (talk) 15:22, 21 June 2008 (UTC)
 * OK, I see your point. Rule 5 is intended as an exception to Rule 1; I'll try to rephrase it to make that clear.--Kotniski (talk) 15:54, 21 June 2008 (UTC)
 * Rule #5 is stated as if that is what we generally do; currently we don't always:
 * Diacritics-related:
 * We do Jogaila, not Jagiełło
 * Non-diacritics related:
 * We do Maria von Trapp, not Maria Trapp
 * We do Cat Stevens, not Yusuf Islam
 * Rule #5 also doesn't explain what happens if it leads to an ambiguous result,
 * e.g. a king of Polish(-Lithuanian) descent becomes ruler of Hungary and Bohemia: what should we do: use the Czech version of the name? The Hungarian? The Polish? Or?
 * --Francis Schonken (talk) 16:15, 21 June 2008 (UTC)
 * OK, these are questions I'm sure the proposal was never intended to cover. It's basically only addressing questions of "name with diacritics" vs. "same name without diacritics". I guess it needs slight rewording to make that clear. --Kotniski (talk) 20:14, 21 June 2008 (UTC)
 * I think what you need is called scope definition. Think e.g. Naming conventions (standard letters with diacritics) – not that that one saved that proposal. Probably also you could do with a simpler one. The field is too delicate though not to have a very sturdy one. --Francis Schonken (talk) 20:45, 21 June 2008 (UTC)

Most of these problems go away if the guidance in WP:NC and WP:UE is followed and we use verifiable reliable English language sources to decide these naming issues. --Philip Baird Shearer (talk) 01:22, 22 June 2008 (UTC)

There are no "national conventions"
There are no "national conventions" for transliteration! Transliteration is used only when the usage of diacritics is disabled!

Nor are there unique transliterations. "Đ" is not always transliterated as "DJ", so that argument cannot be accepted.

My proposal is "use personal names as they are officially, if there is no English equivalent". The only official names for persons are the ones that they use themselves. Even some Croats do not have Croatian names, but we use their own versions, not Croatized ones. So it is Ksaver Šandor Gjalski: Gjalski, not Đalski. --Anto (talk) 19:58, 19 June 2008 (UTC)


 * Just to say the same: There are no "national conventions" in relation to transcription "Đ" with "Dj". If only ASCII is available, in Serbia it will be usually transcribed with "Dj", but in Croatia it will be transcribed with "D" these days. --millosh (talk (meta:)) 18:01, 24 June 2008 (UTC)
 * However, there are no conventions and no standards, and using "dj" as a substitute for "đ" (when the latter is available) is considered a non-orthographic form. The only reason why "dj" was used is not because of any kind of transcription or transliteration rules, but simply because of the lack of the character on old typewriters. Books about orthography don't mention usage of "dj" as an option (actually, some of them, like Our alphabet and its norms by Mitar Pešikan, the main author of the Orthography of Serbian language, suggest usage of "dy" if it is not possible to use "đ" on a typewriter or a computer). --millosh (talk (meta:)) 18:01, 24 June 2008 (UTC)
 * Also, according to the English language writing tradition, it is usual to use the full set of the basic Latin characters with diacritics; and letter "đ" is a basic Latin character with a diacritic. --millosh (talk (meta:)) 18:01, 24 June 2008 (UTC)
 * So, please, remove this from the rules. --millosh (talk (meta:)) 18:01, 24 June 2008 (UTC)

Leave it to regional projects
I think this is something best left to individual regional WikiProjects and language manuals of style as they will have the best idea of how the words affected by this should be used. ··· 日本穣 ? · Talk to Nihonjoe 04:44, 21 June 2008 (UTC)


 * The problem is when members of Wikiprojects like the Tennis one omit all diacritics.  Balkan Fever  04:57, 21 June 2008 (UTC)


 * That's a patently false assumption. Whether a particular biography of a tennis player will include or not include diacritics will depend on what reliable English-language sources are doing for that player.  How many times and ways do we have to say this before it is clear?  Tennis expert (talk) 06:14, 21 June 2008 (UTC)


 * You can't use a source that doesn't use any diacritics to support your argument. How many times does that have to be repeated?  Balkan Fever  06:29, 21 June 2008 (UTC)


 * You still aren't listening. A number of older players with diacritics in their names do not have a biography on tennis websites.  If the name of the English-language Wikipedia article for an older player uses diacritics, then the article name will not be touched unless it can be demonstrated that reliable English-language sources (such as books and newspapers) do not use diacritics for that name.  Do you now understand the concept?  And by the way, no one has explained why a website that never uses diacritics is ipso facto unreliable.  Instead, people like you just keep saying it is, which makes me think that it's unreliable only because it never serves the agenda of the always-use-diacritics crowd.  Tennis expert (talk) 07:05, 21 June 2008 (UTC)


 * That's rich, coming from Mr. diacritics-are-scum. If a website does not use diacritics, it is probably due to technical restrictions or a stylistic issue. Unless you can prove it isn't, i.e. show that sometimes they use diacritics, then you cannot use it as a source for omitting the diacritics. Get it? And please tell me, what agenda do I have? I really would like to know.  Balkan Fever  07:31, 21 June 2008 (UTC)


 * (1) I've never said "diacritics-are-scum" or anything close to that. NEVER.  Got it?  In case you don't, let me rephrase it.  NEVER EVER EVER in this or any other discussion, on or off Wikipedia, have I ever said that "diacritics-are-scum" or anything remotely similar to that.  Got it now?  (2) If an English-language website does not use diacritics for WHATEVER reason, what makes the website unreliable as evidence of English-language usage concerning a particular name?  I see no logic whatsoever in your arguments.  You and others simply provide the bottom line of unreliability without saying WHY.  That's unacceptable.  (3) Your agenda appears to be "use diacritics always because it's the correct thing to do even if reliable English-language sources don't use them" and to discredit your opponents by putting words in their mouths and deliberately misrepresenting their actions when any reasonable person would know the truth of their words and actions by simply listening to and watching them.  Tennis expert (talk) 13:32, 21 June 2008 (UTC)


 * To the "expert": Yes, I get it. You resort to ranting when you don't get your way.  Balkan Fever  02:07, 22 June 2008 (UTC)


 * Don't get my way? Once again, I have no idea what you're talking about.  Civility doesn't appear to be your strong suit.  Maybe you should read or re-read WP:CIVIL before your next attempt at fictional writing concerning yours truly.  Tennis expert (talk) 07:40, 22 June 2008 (UTC)

(reply to TE, edit conflict; and try to remain civil, chaps) I think we get the concept by now, but your proposed way of proceeding is fraught with problems which would make the encyclopedia less good. They have been set out at length in previous discussions; I would summarise them as inconsistency, instability, potential misinformation and constant argument.--Kotniski (talk) 07:34, 21 June 2008 (UTC)


 * Less good because Wikipedia would then reflect standard English-language usage in the tennis world and would reject individual Wikipedia editors' conception of what's right, wrong, or respectful to nationalistic interests? What would make our procedures "unstable"?  The instability comes from editors like you who make losing proposal after losing proposal to change a perfectly fine existing policy.  The inconsistency and constant argument come from people who for the most part refuse to abide by Wikipedia policy and instead come up with their own rules, unilaterally implement them, and then fight every effort to enforce that policy through edit warring, interminable and repetitive discussions, canvassing, and, worst of all, the recruitment of administrators to threaten nonadministrators that they will be reported, blocked, or banned if they persist in trying to enforce that policy.  Tennis expert (talk) 13:32, 21 June 2008 (UTC)


 * You have argued on a number of occasions that nationalism, original research etc are the real opponents here and the correct names should not be used, ignoring completely the fact that it is their name, they were born with it, they have not legally changed their name, their name is written and pronounced correctly only in its original form, and they are known by a great many English speakers by their correct name. (A comparison is Russian, where their name has been reproduced for English speakers using a transliteration, so Ку́рникова has been rendered "Kournikova", with the unusual character substituted as "ou", rather than "u".) It does not take a nationalist to highlight that correctness should be our end goal, rather than compliance with some other website's style guide that we had no hand in developing - and when the sources are there to indicate what a name should be, there's no problem with using them. If the person has specifically themselves indicated a preference for a different name in the English language - which isn't terribly common, but is not rare either (Martina Navratilova is a good example) - then we go with that and source it. Easy. Orderinchaos 14:37, 21 June 2008 (UTC)


 * I've never said that "correct names should not be used." NEVER.  Thanks for being about the 10th person to misrepresent my position.  I challenge all of you to cite reliable English-language sources that use diacritics for the article names I've listed on the tennis moves project page.  Place those citations in the appropriate place, which is the discussion page of each tennis player article.  It's that simple.  Tennis expert (talk) 14:56, 21 June 2008 (UTC)


 * I'm sure we could find some if we tried, but just suppose we found such citations for some names only. You would allow the diacritics to be kept on those we happened to find, and not on the others? I think it's obvious how this inconsistency, which reflects no truth of any importance in the real world (we know well that either form is always acceptable in good English without actually having to see it), is going to annoy and mislead readers. And what I mean by instability is that if someone comes up with a few new sources, or some existing source changes its style, we'll keep getting proposals to change the name back and forth - again to no encyclopedic purpose. --Kotniski (talk) 15:51, 21 June 2008 (UTC)


 * Change is what Wikipedia is all about. Otherwise, failed proposal after failed proposal about diacritics wouldn't keep appearing.  And if you were truly concerned about annoying readers, you would cease with these repetitive proposals.  And, no, there wouldn't be any inconsistency caused by Wikipedia.  Wikipedia would simply be reflecting the inconsistencies caused by the real world, which is what a good encyclopedia does anyway, not try to fix them by encouraging editors to engage in original research or, more accurately, original opinionating.  Tennis expert (talk) 16:32, 21 June 2008 (UTC)
 * That is just your POV. The original name is not any kind of "original research"! Claiming that is total nonsense. What do you want to say, that you know somebody's name better than he himself does? --Anto (talk) 18:53, 21 June 2008 (UTC)
 * Whatever else you may think of the proposal, you can hardly claim it encourages original research or opinionating, when it's far more objective than your proposed way of doing things (and I wonder why your multiple tennis-player nominations don't count among these "repetitive proposals"?)--Kotniski (talk) 20:36, 21 June 2008 (UTC)
 * How a name is represented in written English is a fact, so policies about facts should apply. Not sure why that's so controversial, but it is.  Somedumbyankee (talk) 20:43, 21 June 2008 (UTC)
 * The original name is also a fact. With diacritics we are in the happy position of being able to give people both facts in one, since their general knowledge of English will tell them that the diacritics are often omitted. --Kotniski (talk) 20:58, 21 June 2008 (UTC)
 * I guess the problem is that WP:NC and WP:UE recommend "most common name" which in this case I'm reading as "most common representation of that name". Making Wikipedia have "more truthiness than the average reference" is a dangerous step.  Better written, sure.  More accessible, definitely.  Free of obvious errors, we'd like to think so.  Any implication that we are "more right" sets off massive alarm bells for me.  I don't really care about the diacritics, frankly, but I am an evil government peon and an enemy of WP:TRUTH in all forms.  Somedumbyankee (talk) 21:11, 21 June 2008 (UTC)


 * Let me restate, with emphasis: I think this is something best left to individual regional WikiProjects and language manuals of style as they will have the best idea of how the words affected by this should be used. Please note that my statement does not apply to the Tennis WikiProject. ··· 日本穣 ? · Talk to Nihonjoe 22:07, 21 June 2008 (UTC)


 * I would say it's only a runner if there are very, very precise and exhaustive rules for clashes where names come under the auspices of multiple regional and language WikiProjects. Adam Mickiewicz is almost certainly at the correct title, but when you're dealing with a Polish (diacriticless name) - Lithuanian (diacritised name) born on what is now Belarusian soil (Cyrillic alphabet, five transliteration systems documented here and counting), it's fortunate that the name most used in English is crystal clear.  Borderline cases will always be messy, but we don't want to create extra problems in picking between different WikiProjects' systems, especially when the remit of a given WikiProject is not really well-defined beyond one editor's drive-by templating - see Talk:Karlovy_Vary... Knepflerle (talk) 18:29, 23 June 2008 (UTC)


 * Mickiewicz and Mickevičius are two distinct names; the latter is not a diacritised version of the former. Therefore, UE is applied. If, however, it was simply between Mickevičius and Mickevicius, the point of this proposal is to use the former, not the latter.  Balkan Fever  03:32, 25 June 2008 (UTC)


 * In the case of most East Asian languages, there is only one country per language (Chinese being the big exception). The WP:MOS-JA is very clear on how to romanize Japanese words, though. ··· 日本穣 ? · Talk to Nihonjoe 03:34, 25 June 2008 (UTC)

Ngo Dinh Diem
I'm sorry GregorB does not see that "use what most other English speakers do" is a working guideline; but it is, and it is the only guidance either necessary or compatible with our fundamental principle for naming articles. This proposal, in fact, encourages Original Research; who calls Diem Ngô Đình Diệm? What source do you have that this mass of squiggles even consists of the right squiggles? Septentrionalis PMAnderson 20:11, 19 June 2008 (UTC)
 * The Britannica does not use or mention them; nor does Columbia; nor does Encarta. The Library of Congress does not either. Septentrionalis PMAnderson 20:19, 19 June 2008 (UTC)


 * Nor does National Geographic, for that matter. They are also quite blunt about it: "Although Vietnamese is written in the Latin alphabet, the number of accent marks can be distracting and may therefore be omitted." Personally, I don't mind the diacritics in Vietnamese, but I could concede NG has something of a point. That's why I said that I support the proposal in principle, not necessarily in implementation details. We might still do what NG is doing for Vietnamese; in this particular case I think English sources would agree fully. GregorB (talk) 20:31, 19 June 2008 (UTC)
 * Also I'd like to point out that "use what most other English speakers do" is not the absolute principle for naming articles, otherwise we wouldn't have titles such as Elizabeth II of the United Kingdom or the like. Newspapers don't call her that. GregorB (talk) 20:47, 19 June 2008 (UTC)
 * And that exception, which was resolved on long ago because most monarchs don't have unambiguous most common names (the common name is Henry IV, but which of the dozen claimants gets the article?), is more disputed than diacritics themselves; it only remains unaltered because there is no consensus what to change to. WP:NCNT does require that the monarch's name and the name of his country be common English usage. Septentrionalis PMAnderson 20:56, 19 June 2008 (UTC)


 * @PMAnderson: Ah, yes. Neither Playboy nor Cosmopolitan uses diacritics! You are definitely right!   --Anto (talk) 15:25, 25 June 2008 (UTC)

A joke :)
"Application of this rule would probably result in Meissen, but Göttingen; Tudjman and (?)Goering but Dvořák; Lech Wałęsa but Stanislaw Ulam and George Frideric Handel; Munich and Tokyo but Zürich."

This is a joke :) --millosh (talk (meta:)) 18:04, 24 June 2008 (UTC)

To be more precise: This is a very nice example of prescriptive madness and I'll use it in my linguistic works :) --millosh (talk (meta:)) 18:12, 24 June 2008 (UTC)


 * Erm, this is confusing English names with de-diacritised native names, e.g. "Munich" is the English name for München, not simply a de-diacritised version. - Francis Tyers · 18:20, 24 June 2008 (UTC)
 * It would perhaps be a good idea to make a side-by-side table: original spelling / current spelling in the Wikipedia article title / WP:UE spelling / proposed new guideline spelling, with a couple of examples such as these. Rationales for individual cases could also be added. Might make everything a bit clearer (if not easier...). GregorB (talk) 19:04, 24 June 2008 (UTC)

Anything that is not a simple case of diacritics vs. diacritic dropping is covered by WP:UE, not by this proposal. The point is that UE is not about omitting or keeping diacritics. As I have said before, and Francis Tyers reaffirmed, a name with diacritics omitted and an English name are two different things.  Balkan Fever  01:24, 25 June 2008 (UTC)
 * Yes, it is; please read WP:UE; more to the point, no sufficient reason has been given why it should treat diacritics any differently than any other difference between English and a foreign language. The English for Meißen is Meissen, according to descriptive linguistics; that's what English-speakers call the city.   Septentrionalis PMAnderson 02:36, 25 June 2008 (UTC)
 * Actually, substitution ß->ss is usual in modern German, too. --millosh (talk (meta:)) 16:10, 25 June 2008 (UTC)
 * Only in Switzerland -- everywhere else it's wrong, wrong, wrong. So wrong, in fact, that only recently the Unicode Consortium was convinced to add a capital ß to Unicode. — Nightstallion 18:40, 25 June 2008 (UTC)

Technological solution
There is a technological solution that, if implemented, could please all sides on the issue. Unicode normalization form NFD could be used to decompose characters with diacritics. Then, in conjunction with the UCD, diacritics (class Mn etc.) could be stripped. Enabling or disabling this setting could be added to the user preferences. Such a solution would allow those who prefer diacritics to get them, while those who dislike them can opt out at any time. Just a note: If implemented, I would suggest that it be disabled for edit screens. 124.102.8.155 (talk) 12:02, 23 June 2008 (UTC)
 * This has three major problems.
 * It won't work for article names, at least not for all purposes, including linking.
 * We do want some diacritics, at least in stating: "the Fooian form of the name is..." or "the Barland alphabet has thirty-five letters, including the variant forms..."
 * It has the potential for unintended side effects, like the long-established but still opposed date auto-formatting convention. Septentrionalis PMAnderson 13:28, 23 June 2008 (UTC)
 * In principle, I think this solution has a lot of merits, but the technical side may indeed have some problems; I don't know whether points 1 and 3 are really valid, but as far as point 2 is concerned, we could easily implement some kind of environment (like or something like that) within which all diacritics would be shown regardless of user preferences. — Nightstallion 13:27, 24 June 2008 (UTC)
 * I think the above analyses are correct. Comparison with date formatting is valid: diacritics are perhaps more of a presentation than a content issue. However, there isn't an easy solution that would work right, and the one that would work right would necessarily involve some kind of additional tagging. Let's say that  (nota bene, this template already exists!) would render the name verbatim (as it does now), but Ngô Đình Diệm by itself would automatically be displayed as Ngo Dinh Diem. Not too alluring, perhaps, but definitely possible. GregorB (talk) 14:35, 26 June 2008 (UTC)
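
For readers following the NFD suggestion above, here is a minimal sketch of what "decompose, then strip class Mn" could look like. It is only an illustration under stated assumptions: the function name strip_marks is hypothetical, only the general category Mn is removed (the "etc." in the suggestion is not spelled out), and nothing here reflects how MediaWiki itself would or does handle this.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop nonspacing marks (Mn)."""
    # NFD splits a precomposed letter such as "ü" into "u" + U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every code point whose general category is not Mn.
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    # Recompose whatever is left so the result is in NFC form.
    return unicodedata.normalize("NFC", "".join(kept))


for name in ["Zürich", "Dvořák", "Ngô Đình Diệm", "Wałęsa", "Meißen"]:
    print(name, "->", strip_marks(name))
```

Note that letters such as Đ, ł and ß have no canonical decomposition, so this approach leaves them untouched; covering them would need a separate mapping table, which is one more reason the preference would not be a complete answer for names like Ngô Đình Diệm.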

Further action?
There is a real question here, but this entire page has become a ridiculous assortment of WP:TRUTH and WP:IDHT. "I don't agree with that source, therefore it must be unreliable" seems to be a lot of the discussion I've been having. Neither side is going to cave, and the only time that there will be an agreement is if one side decides it's just not worth it and gives up (clearly a false consensus). I'm tired of arguing with a brick wall, and I think at this point this proposal needs professional help. A request for comment is usually the first step before seeking outside assistance, so let's try that. Somedumbyankee (talk) 21:16, 26 June 2008 (UTC)


 * Support. kwami (talk) 21:28, 26 June 2008 (UTC)


 * Is this RFCpolicy or RFCstyle? Somedumbyankee (talk) 21:56, 26 June 2008 (UTC)
 * This would revise the placement of a good many articles if enacted; that would seem to be policy. Septentrionalis PMAnderson 01:26, 27 June 2008 (UTC)

A call for federalist tolerance
Some parts of this debate seem, to me, to be open to consensus. Others do not. The areas where I think consensus is possible are listed below; they are ways of applying the parts of WP:UE that are generally accepted:
 * 1) The use of diacritics and extensions when they are commonly used by publications that have neither a blanket "no diacritics or extensions" policy nor a blanket "always diacritics and extensions" policy.  This would recommend "Piña colada" rather than "Pina colada".
 * 2) The non-use of diacritics and extensions when they are not commonly used by publications that have neither a blanket "no diacritics or extensions" policy nor a blanket "always diacritics and extensions" policy.  This would recommend "George Frideric Handel" rather than "George Frideric Händel".
 * 3) Sources that always use diacritics and extensions or never use diacritics and extensions are not helpful--this tells us about the source's convention rather than common educated usage.  Thus a source dealing with German subjects that left out umlauts entirely, replaced them wholesale with vowel + e, or always used the native spelling, when other sources write "Hermann Goering", "Gerhard Schröder", and "Rudolf Hess", would carry no weight as to what usage WP should adopt.
 * 4) Articles that do not use diacritics or extensions should indicate at the top of the article the native spelling, and articles that use the native spelling with characters that are unlikely to be understood, such as ß or Ə, should similarly provide an English-characters-only alternative at the top.

Problems that prevent complete consensus are:
 * 1) Sometimes, common usage, educated or not, does not exist, because the word is not commonly used in English.  This seems to be one of the two principal sticking points.  Very few English speakers have heard of the Polish town of Borek Strzeliński (conveniently, there's a German name for it, too: Großburg), so WP:UE is silent on it.
 * 2) Divided usage is the other biggest sticking point.  Z(u/ü)rich, for example.

The debate on these two items--no common usage and divided usage--has raged on and off for half of Wikipedia's lifetime now, and no consensus has emerged. People are getting angry and defensive and dismissive and sarcastic, and have wasted hundreds of person-hours on the matter. The debate has sort of become a prisoner's dilemma: if your opponent disengages from the debate, you can win by pursuing it; but if your opponent pursues the debate, you must also pursue it to avoid losing, and so the most advantageous move for you is always to pursue it. Of course, everyone is worse off when all parties pursue it rather than disengage. So let us disengage, or at least compartmentalize the debate:

If people active in articles on, say, Switzerland, or German-speaking places in general, agree to use umlauts and the eszett when there is no common usage as well as when there is divided usage, then they are to be left alone, and their convention shall hold, but only in their bailiwick. The Swiss or perhaps German-language consensus will carry no weight in arguments relating to, say, İlham Əliyev versus Ilham Aliyev. I believe consensus in a particular subrealm of Wikipedia is more likely to emerge if the debate is circumscribed, and not of global consequence. Also, fewer editors will be involved in each subdebate: Three to one is a consensus, but twelve to four is divided usage. People are less likely to be drawn into a debate if it only looks at, say, Azeri words, or Croatian words, and doesn't affect their preferred convention for Meissen versus Meißen. So there will be less debating, and more consensuses are likely to emerge. It's certainly no worse than the present situation.

Just as the Peace of Augsburg sacrificed uniformity for peace, so does this proposal. Cuius projectum, eius conventio.

This is perhaps a pessimistic view that uniformity will not in the near future be agreed upon, but it is a view that has been borne out by the evidence. As long as the first four points at the top of this post are generally agreed to, which by and large they seem to be, readers will not be inconvenienced or miseducated. Perhaps we can all move on and spend our time on more worthwhile and enjoyable pursuits. --Atemperman (talk) 09:07, 28 June 2008 (UTC)


 * (Philip, I've moved your responses out from inside my proposal and placed them here, along with, in italics, the text that they respond to. I hope you don't mind--it just seemed a little confusing when I came back and looked at the page.--Atemperman (talk) 18:34, 28 June 2008 (UTC))


 * Sometimes, common usage, educated or not, does not exist, because the word is not commonly used in English. This seems to be one of the two principal sticking points.  Very few English speakers have heard of the Polish town of Borek Strzeliński (conveniently, there's a German name for it, too: Großburg), so WP:UE is silent on it.


 * Not so: WP:UE is clear; see WP:UE. --Philip Baird Shearer (talk) 16:18, 28 June 2008 (UTC)


 * Divided usage is the other biggest sticking point. Z(u/ü)rich, for example.


 * Zurich may be split, but it is not an even split. Google Books gives 3250 hits for "Zürich -Zurich" and 11000 for "Zurich -Zürich", which is a ratio of roughly three to one in favour of Zurich, so this is not a good example, as it clearly should be Zurich under common usage (WP:NC). --Philip Baird Shearer (talk) 16:18, 28 June 2008 (UTC)


 * It does not matter what the policy on diacritics of a journal, book or newspaper is; what matters is whether they are reliable sources on the subject. At a practical level, how does one ascertain the policy on diacritics in third-party publications if they do not publish it?


 * WP:UE already incorporates your suggestion; see the section "Divided usage": "When there is evenly divided usage and other guidelines do not apply, leave the article name at the latest stable version. If it is unclear whether an article's name has been stable, defer to the name used by the first major contributor after the article ceased to be a stub." --Philip Baird Shearer (talk) 16:18, 28 June 2008 (UTC)
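As a rough check on how far from "evenly divided" the Zurich split is, the Google Books counts quoted above can be turned into a ratio and a share. This is only an arithmetic sketch: the two counts are the ones given in the comment above, the function names are made up for illustration, and raw hit counts say nothing about which sources are reliable.

```python
# Hit counts as quoted in the comment above (Google Books, June 2008).
hits = {"Zurich": 11000, "Zürich": 3250}


def usage_ratio(counts: dict, a: str, b: str) -> float:
    """Return how many times more often form a appears than form b."""
    return counts[a] / counts[b]


def usage_share(counts: dict) -> dict:
    """Return each form's fraction of all counted hits."""
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


if __name__ == "__main__":
    print(round(usage_ratio(hits, "Zurich", "Zürich"), 2))
    for form, share in usage_share(hits).items():
        print(form, f"{share:.0%}")
```

On these figures the ratio is a little over three to one, which is about 77% of hits for the form without the umlaut.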


 * In response to Philip. There's a subtle distinction between determining common English usage with regard to particular words and common English usage with regard to diacritics and extensions (D&E) generally.  Just as some editors want diacritics and extensions all the time, some want them never, and some want them some of the time depending on the individual word, some publications avoid them entirely or almost entirely, others make a point of using them whenever possible, and others have a more case-by-case approach.  The proportion of publications, weighted by their importance or salience or authority, that fall into these categories can tell us where English is as a language on how to deal with diacritics and extensions, but only the ones that make choices on a case-by-case basis can tell us about individual words.  The policy each publication has is easy enough to ascertain simply by seeing whether D&E appear all the time, not at all, or somewhere in between.


 * If we spend a lot of time (which clearly I don't think we should do) to survey publications' use or non-use of D&E, however, we are likely to find 1) inconsistency within a publication, 2) the problem of whether publications that use them exceedingly sparingly are never-users or sometimes-users, and 3) the problem of whether a source, regardless of its authority as regards content, is an authority on foreign orthography. Some editors want to distinguish between the two, others don't.  What do we do with pre-Unicode sources on the internet?  You may have a clear idea of what you think we should do in all of these cases, but given the debate we've had already, it's unlikely others will agree with you, or with each other.


 * Fair enough on the no-established-usage case. If it's in WP:UE that uncommonly used words are to have their native orthography, I guess that's simple enough.  It seems, though, that there is disagreement over whether native conventions that respell these words when the special characters are not available should be applied to WP.  Since these conventions say they only apply when the characters are not available, which is not the case in WP, it seems natural to use the preferred native orthography.  This will probably happen anyway, as people who write on Rudolf Höß are likely to have some knowledge of the German language, while people who write on Rudolf Hess may very well not.


 * It's hard to draw the line on how evenly divided the usage has to be for contemporary English usage not to deliver a verdict. Is 3:1, in the example of Z(u/ü)rich, strong enough?  I don't think it's possible to arrive at a consensus ratio, even if we all agreed on which sources count and how we go about determining the number of independent uses of one version versus another version.  From the way editors have been arguing over this, it seems people can agree only that we should use Goering since it's spelled that way in an overwhelming proportion of English publications, and that truly 50:50 splits are divided and thus we fall back into the "no-common-usage" territory.  And of course, how uncommon does a name have to be for it to be considered to have no common English usage?  I don't think it's possible to draw bright lines that can be mechanically implemented on these questions, which is why I think we're better off simply not having this debate.  --Atemperman (talk) 18:34, 28 June 2008 (UTC)
 * To answer the question of "how uncommon does a name have to be for it to be considered to have no common English usage", I bellyfeel that this means "if no reliable English sources talk about it, then it has no common English usage." The mayor of Katowice and the mayor of Nice are notable and the English Wikipedia has articles about them, but reliable English publications probably don't talk about them all that much, and I would definitely apply "no common English usage" there.  (Actually, the Mayor of Nice may have English usage because of his dashing former life in motorcycle racing, but whatever).  Somedumbyankee (talk) 06:21, 29 June 2008 (UTC)


 * I agree with Atemperman that this debate is going nowhere. I've said what I have to say about it, and I've gotten to the point where I'm going to put down the stick and walk away from the horse carcass.  Pretty much every argument listed on WP:TRUTH and How to win an argument has been used on this page already, and it's just kind of silly.  I'm just waiting for the appeal to Jimbo to drop, and then I'm going to go home.  Somedumbyankee (talk) 17:14, 28 June 2008 (UTC)


 * Atemperman's language may indeed be the consensus of Wikipedia as a whole, although I believe some editors in this discussion have disagreed with each one of his points.
 * Similarly, this proposal, while it should not itself be guidance, contains useful suggestions which may form a rule of thumb for those who are uncertain on how to spell an article. I see no reason they cannot be discussed on the relevant talk pages. Septentrionalis PMAnderson 18:57, 28 June 2008 (UTC)


 * If it is to be used as a rule of thumb then it ought to be rewritten to comply with policy and WP:UE such as this attempt that I made shortly after the proposal was suggested and which was promptly reverted. --Philip Baird Shearer (talk) 07:51, 29 June 2008 (UTC)


 * I think a lot of editors will be up in arms over blanket proposals to banish ß or þ from WP. Or at least, that's how they'd see it.  Why not let the editors of Germanophone- and Icelandophone-related articles decide for themselves?  That's what might happen anyway, even if you do get a momentary consensus on your proposal--there'll be some overzealous implementer of it trying to turn every eszett into a double ess, the changes will be reverted by longtime editors of the Germanophone-related pages, the anti-extensionist will cite this consensus, the reverting editors will rebel against the consensus, etc.  Or maybe they won't, but I don't think another attempt to achieve WP-wide consensus on this will end the acrimony.  I'd much prefer Pudeo's solution below. --Atemperman (talk) 15:23, 29 June 2008 (UTC)
 * No, a handful of German, or Icelandic and Scandinavian, nationalists will. But again, WP:UE does not require that þ be banished, merely that it be used only where English has failed to adopt th in its stead. We should use Althing and we do; we should use Thingvellir and do not; our articles on obscure Icelandic politicians should probably be mixed. Septentrionalis PMAnderson 20:19, 30 June 2008 (UTC)
 * Truly non-English characters fail to meet the "does not impair comprehension" argument for retaining diacritics. I'm familiar with ß from exposure to German, but I doubt a substantial number of English speakers would realize that it's not pronounced anything like B.  þ means about as much to me as Ж or ₪ or Њ or ℳ.  None of these should be retained unless there is no plausible alternative.  That is the ά and ω of making this intelligible as an English document.  Demanding that people learn a new alphabet to read about topics in Finland is akin to demanding that the article about Arabic be written from right to left.  Somedumbyankee (talk) 21:23, 30 June 2008 (UTC)


 * I don't think there's much support for retaining clearly non-English characters in article titles in any but the most obvious cases, such as El Niño. Interestingly, NASA's "for kids" website strips that quasi-diacritic as well, though the header image and the rest of the site use it.  Somedumbyankee (talk) 19:06, 29 June 2008 (UTC)

Perhaps the best solution is the one already used for British/American spellings. If the article is about a German city, the only one regulating its name is the city itself. Then most likely a German, or someone who has knowledge of German, has created the article and is most active in editing it. Let's see what criteria from WP:ENGVAR it would fulfil: a) Retaining the existing variety b) Strong national ties to a topic c) Consistency within articles (current situation: diacritics used everywhere). Listen to those who have spent great amounts of time on their national topics in several WikiProjects. See what titles they have used. --Pudeo⺮ 10:55, 29 June 2008 (UTC)


 * Value judgments aside, this seems like what actually happens with most of these articles. It raises interesting questions about WP:OWN, but they seem to be tolerable for the "national varieties of English" process.  It obviously causes problems for names that have changed hands multiple times during the history of the English language (q.v. Gdánzkig) when that ownership is contested.  Since we cannot find a consensus here, the default is to retain the status quo, and I think recognizing an "International English" for the purpose of presence or absence of diacritics as a separate "variety" is appropriate.  If the page was created at Zürich, it stays there unless there's a clear consensus for Zurich and vice versa.


 * In short, the policy could read "do not propose moving a page solely over the presence or absence of diacritics unless there is an overwhelming consensus to do so." I don't like it, but we're getting nowhere here, this appears to be what actually happens, and it really doesn't matter that much since 99% of English speakers will just ignore them either way.  Somedumbyankee (talk) 19:06, 29 June 2008 (UTC)


 * I agree with Atemperman’s synthesis, and especially with the proposal to renounce an absolute rule applicable to all D&E from all languages. Leaving room for a lack of standardization may hurt the systematic part of our minds, which we all expressed in hoping to find a global solution. I feel, however, that the consensus was closer than many thought.
 * I find Somedumbyankee’s policy proposal a good compromise.
 * I’d have a few (partly) technical comments to add, the first of which should be kept in mind for the application of the soft consensus and for interoperability purposes:
 * The Google statistical evaluation of names with diacritics is flawed when performed on google.com only. Google’s national homepages lead to localized indexes that take into account the different use of letters. See also Pudeo's contribution in the section ‘Diacritics infact necessary different letters in some languages’ above. So if one searches for a term containing a letter with a diacritic that is considered an individual letter in a specific language, the results will differ between google.com and google.[a country code where that letter with diacritic is considered an individual character]. At the time of my research on this issue, 2 years ago, I used ‘Łodz’ vs. ‘Lodz’ on .com and .pl and the differences were then statistically significant; today they no longer are, probably because of the higher number of hits. A test with a more rarely used name would be interesting. In addition, the example of ‘Ł’ is simple because it is unique to Polish. Other letters considered ‘with diacritic’ in English are more complicated because their status may differ from language to language. For instance ‘š’, a frequent letter in many East European languages, is considered a distinct letter in Czech, Estonian, Slovakian and Slovenian, but not in Latvian and Lithuanian. As long as REDIRECTs are created, there is no problem with either policy being chosen, but this issue should be remembered if WP wants to provide search indexes that are not linguistically restricted. Then the conformity of page titles to a standard on the use of the available character set may matter.
 * The Zurich/Zürich example is not really good, I’m afraid, because ‘Zurich’ without the umlaut could easily be considered an exonym; it is used by the city itself on its own website: go to http://www.zuerich.ch/, see the name in large characters, choose English on the top right, and compare! So, although I might have been seen as a supporter of a wider use of original diacritics, I would not object at all to moving the present ‘Zürich’ page to ‘Zurich’. Anyway, if the discussion goes on, I’d advise against using Zurich as an example.
 * About the ‘ß’ character, I’ve inserted a complement directly into the section in which it was commented, in reply to ‘Anticipation [...]’ and ‘Kotniski’ (section ‘We need diacritics [...]’). Clpda (talk) 13:08, 30 June 2008 (UTC)


 * Err, yeah, Zurich is a strange example, because they use a transliteration for websites as Zuerich, which isn't the native spelling or the standard English spelling. This entire discussion (most recently) started based on some proposals to move a bunch of tennis players (e.g. Tomáš Šmíd and I forget where the diacritics are in that name) from names with diacritics to undiacriticized (neologisms are cheap) versions, so they're a fair set of examples to use.  There's a clear (but not universal) preference in English publications to strip them, notably in the English websites for the major tournaments, and a move was proposed based on WP:UE.  Somedumbyankee (talk) 18:22, 30 June 2008 (UTC)

Who uses English and how?
If one looks at the user pages of the participants in this debate, there is clearly a gap between a majority of the native English speakers, who live their language and (probably and understandably) want WP to keep it as pure as possible (despite the fact that its purity is already seriously jeopardized by dialectal differences across all continents), and a majority of the non-native English speakers, who consider English a lingua franca and expect parts of their own culture to be absorbed into that globalized language. Diacritics and special Latin characters (called 'extensions' above) are quite symbolic in this respect. Although thousands of editors develop other language versions of WP (by writing original articles or translating existing ones), the English WP is likely to remain the largest and most comprehensive one for many years (also thanks to the fact that non-native English speakers contribute!) and I would not be surprised at all if the number of non-native English speaking users exceeds the number of native English speaking users (even by consultation volume, not just the number of people). Another posting on this page (about a 'Wikipedia in English' against an 'Anglophone Wikipedia', from --Anto) already raised this issue. English use has already escaped its standards on several occasions, in geographical variants leading to dialects and creoles. It is now facing its next evolution through globalization (WP has no responsibility in that!) where, besides new words, as usual, it is now borrowing more than just words. Diacritics and special Latin characters are part of this process. I'm afraid that native English speakers will have to admit that their language no longer belongs to them alone. This is the price to pay for world dominance (only from a linguistic point of view, of course). Inputs from all around the world (including diacritics and special Latin characters) will enrich it and should be seen as a positive thing.
85.3.21.150 (talk) 23:15, 27 June 2008 (UTC)
 * That was a thoughtful and rational essay, anon, and you may very well be right about where we are heading. But today, in the here and now, English is still extant as the language of a minority of nations.  And it is not the place (and is against the policies) of Wikipedia to lead the charge in the direction you see the world going.  Wikipedia follows.  If your predictions come true, and English becomes globalized and diacritics become de rigueur (yes, I recognize the irony in my use of a French phrase), then WP:UE will force these new usages upon en.wiki.  That's the beauty of WP:UE (for those who are not trying to use it to push an agenda).  It says that we use what is recognized as standard.  So, if in 20 years, the Washington Post and the Economist and most other English sources are applying these markings and using ßs, then so too will en.wikipedia.  And some may be surprised to find just how much more accepting we curmudgeons will be than is expected.  We're not all a bunch of xenophobes and bigots, you know.  We just want the rules applied as they are written, not twisted into something the opposite of what their intent was. Unschool (talk) 23:46, 27 June 2008 (UTC)

(ec; and I agree with Unschool's favorable impression) I would support a Pidgin Wikipedia for those who feel that they want an international standard of their own invention; but what are we anglophones going to use? Every other language on the planet has a Wikipedia which is intelligible to them; English should too. Septentrionalis PMAnderson 23:50, 27 June 2008 (UTC)
 * Indeed, and if you look at an article like Kimi Räikkönen and its interwiki links, every single Wikipedia uses diacritics in the title, no problem there. Why do you want English Wikipedia to differ? No bigotry here because of the English language's dominant global status, no? :) --Pudeo⺮ 23:55, 27 June 2008 (UTC)
 * The very first interwiki link goes to the Arabic wiki, which puts the name in the Arabic alphabet. Or how about Latvian (Kimi Raikonens) or Basque (Kimi Raikkonen) or Sudanese (Kimi Raikkonen). But hey, evidence just gets in the way of making your point, doesn't it.--Prosfilaes (talk) 01:42, 28 June 2008 (UTC)
 * The only reason they use that is that they don't know the correct spelling! The majority of Croatian people don't know the exact spelling of his name either. But we put the original form on hr.wiki. I guess some Finnish guy would correct it, but very few Finnish people speak Basque, Latvian or Sudanese, so they cannot argue there. Mentioning Arabic Wikipedia is meaningless because they don't use Latin script!--Áñtò  | Ãňţõ (talk) 11:05, 28 June 2008 (UTC)
 * When your logic isn't winning, pound on the evidence; when evidence isn't winning, pound on the logic; when both aren't winning, pound on the desk. Frankly, I find the assumption of incompetence to be rude.--Prosfilaes (talk) 12:34, 1 July 2008 (UTC)


 * English not using any (with about three or four minor exceptions) native diacritics is sort of unusual, so it's really only fair to compare other languages that don't use them for their own words. It's also possible that they copied the English version, which is a bit of a flagship for the wikipedia project.  Compare Napoleon I, which leaves off a diacritic in a language that we know uses them...  Somedumbyankee (talk) 01:54, 28 June 2008 (UTC)