User:Hans Adler/Diacritics and special letters in English

From WP:Reference desk/Archives/Language/2011 June 12

 * A lot of the things that children learn at school are only approximations to the full truth. This is one of them. Let me prove with some quotations that foreign (especially French) words do not necessarily drop their diacritics as they are incorporated into English as loanwords.
 * "[...] some French, Latin, and other words are now also English, though the fiction that they are not is still kept up by italics and (with French words) conscientious efforts at pronunciation. Such are tête-à-tête, ennui, status quo, raison d'être, eirenicon, négligé, and perhaps hundreds more." – The King's English (1906, when it was still common to write some loanwords with italics)
 * "The use of diacritics is minimal in English. Native speakers of English, accustomed to a largely diacritic-free script, sometimes object to diacritics on aesthetic grounds, complaining that they defile otherwise plain print with untidy clutter. There is, however, a range of diacritical usage in or related to English, including two everyday marks with diacritical properties: the dot [on lowercase i] and the apostrophe. These are so much part of the writing system that they are seldom thought of as diacritical. [...] The use of non-native diacritics is generally kept to a minimum in English. It is optional in such French loans as café and élite (with acute accents), and is provided in others where the writer and publisher consider the provision necessary for accuracy or flavour: for example, German Sprachgefühl (with umlaut)." – Oxford Companion to the English Language
 * "Exposé has been used in English since the early 19th century. [...] Exposé can be written either with or without an acute accent: [...] Both forms are widely used. The accented form is the more common of the two, possibly because it clearly indicates how the word is pronounced [...]." – Webster's Dictionary of English Usage
 * "Such usage [of née] demonstrates the extent to which née has lost its literal, French meaning in English." – Webster's Dictionary of English Usage
 * "You may also have noted that the unaccented cliche is sometimes used but the accented cliché is much more common." – Webster's Dictionary of English Usage
 * "In words now accepted as English, use accents only when they make a crucial difference to pronunciation: café cliché communiqué exposé façade soupçon. But: chateau decor elite feted naive. If you use one accent (except the tilde - strictly, a diacritical sign), use all: émigré mêlée protégé résumé." – Economist Style Guide
 * "Some foreign words that enter the English language keep their accent marks (protégé, résumé), others lose them (cafe, facade)." – New York Times Style Guide
 * And of course there are also plenty of English dictionaries. As far as I know, every single English dictionary contains a number of English words for which it gives the spelling with accents as the preferred spelling. (But they tend to disagree with each other in their choice.) Hans Adler 13:25, 12 June 2011 (UTC)

From WT:Naming conventions (use English)/Archive 8

 * I agree about the last point. The discussion is complicated enough as it is. Not even Unicode differentiates between ö in German (an umlaut, i.e. a letter different from o, historically derived from oe via an e written over the o, and therefore traditionally transliterated as oe where no ö is available), ö in Finnish, Hungarian or Turkish (also a letter different from o, but not an umlaut although borrowed from German; cannot be transliterated as oe) and ö in coördination. Some comments about the core of the matter:
 * To the limited extent that the average name of a person or entity can be said to be part of the English language at all, it is part of the English language both with and without the diacritics. Some sources such as international sports associations use ASCII characters exclusively and broadcast their versions widely. Other sources, such as virtually all academic sources, use the accented versions almost exclusively. E.g., if you search for "Gödel's theorem" on Google Books, you will find more than twice as many (English) publications as when you search for "Goedel's theorem". Closer inspection shows that those using the ö spelling are generally of a higher overall quality and more on-topic. They are generally written by people who have English-sounding names, and appeared with English-language academic publishers such as A K Peters, Routledge, Springer (a huge publisher that started in Germany but has had an international scope and been focused on English for a long time), Blackwell, Wiley etc. There are even more hits with the misspelling "Godel's theorem", but almost all of these are due to OCR errors where the original actually used the umlaut.
 * In our globalised age, people don't just read and write about foreign places and people, they also visit them and get exposed to the original versions of their names in the original linguistic context. As a result, the English language is moving away even from the most established English versions of such names and is gradually replacing them with the original versions. Examples include Lyon, which used to be spelled Lyons in English but is now more commonly found without the s, and Beijing and Kolkata, which used to be referred to by their traditional English names Peking and Calcutta. Presumably by the same mechanism, it is moving towards the original spellings including diacritics. You can see this at work with Google Books searches for Heinrich Brüning: For books until 1950 the ratio "chancellor Brüning":"chancellor Bruening" is 280:1110. For books since 1971 it is 641:278. The ratio "chancellor Schröder":"chancellor Schroeder" is 4090:1730. (The same tendencies can be observed in German, where people increasingly say and write Nijmegen not Nimwegen, Ústí nad Labem not Aussig, Tallinn not Reval, Győr not Raab, 's-Hertogenbosch not Herzogenbusch etc. No doubt other languages are going through the same evolution.)
 * WP:COMMONNAME does not speak about spellings. It speaks about fundamentally different names such as Nazi Party vs. Nationalsozialistische Deutsche Arbeiterpartei. Spelling with or without diacritics is not the kind of question that should routinely be answered by inspecting usage in English sources for each individual topic. For relatively obscure topics we might as well read tea leaves. This is the kind of thing that is addressed by style guides, and our relevant style guide for article titles is WP:DIACRITICS, a section of WP:ENGLISH. It says that by default (no consensus of sources), spellings with and without diacritics are both acceptable. But our policies and guidelines are supposed to be descriptive not prescriptive, and actual usage in Wikipedia is that where English sources use the original name with or without diacritics, we almost always use it with diacritics. This is firmly within the normal range of style guides for English publications, and we are following the trend.
 * It makes sense to standardise use or non-use of diacritics, because otherwise we are facing categories in which spellings with diacritics are mixed randomly with spellings without them. We will never be able to get rid of them completely, as there are examples in which diacritics are the most natural way of disambiguation, and since academic works and serious encyclopedias such as Britannica use them in titles (they didn't use to use them – another example that the language is in flux). So the most natural solution is to always use them, as we are already doing.
 * Names in Cyrillic or Chinese letters or any other non-Latin writing system are of course a different matter. Most readers of English texts cannot parse them at all, cannot even form conjectures as to the corresponding pronunciations, and would need a lot of effort to compare two such words just to see if it's the same word. That's why we are not using them but use transliterations or transcriptions into the Latin alphabet instead (where there is no English alternative). And guess what, some of the commonly used transliteration/transcription systems use diacritics and other modified Latin letters. This way English even acquired some words with diacritics such as the one for the (Tibetan) Bön religion. There are some fine distinctions to be made here. E.g. Pinyin, the standard system for transcription of Chinese, uses diacritics. But we don't, following a common practice in China and among academics publishing in English.
 * While it would be nice if this little exposé on háčeks and similar phenomena that I am throwing into the mêlée would give the coup de grâce to this attempted coup d'état, I am aware that, not being an Übermensch, nor a Führer who cannot be ignored, I may be evoking TLDR reactions in some. In that case, take it as a smörgåsbord from which you can pick a few canapés while drinking a Gewürztraminer. (I suggest that you help yourself to an apéritif first, and then start with the crudités. And do try the crêpes and the crème brûlée.) Sorry if there are too many cases of déjà vu on the menu. I hope you won't mistake my arguments for papier-mâché tigers.
 * So much for my 5 øre. Hans Adler né Scheuermann. 06:23, 8 June 2011 (UTC)
 * All those words are commonly spelled without the diacritics. We don't all write for the OED. ʘ alaney2k ʘ (talk) 14:46, 9 June 2011 (UTC)
 * As Google Books searches with these words show, usage is actually mixed. It is of course wrong to write exposé without the accent because that can only lead to confusion. (A Google Books search for "write an expose" has a lot of hits, but they are overwhelmingly OCR errors with exposé in the original. Incidentally, preventing confusion between words that are otherwise spelled identically is one of the reasons for accents in French.) I wouldn't know how to verify this, but I am pretty sure that the English word né cannot be spelled ne. In contrast to the (also somewhat odd) spelling nee for née, nobody would know what is meant. And I refuse to use nee with reference to myself because I don't want anyone to believe I was born female. At the other extreme there is smörgåsbord, which is almost always spelled without the accents. But in any case the spellings with accents are correct variant spellings of the respective English words in the same way that colour and color are both correct variant spellings. Hans Adler 15:53, 9 June 2011 (UTC)


 * It's not the majority of English sources that matters for us. Quality matters more than quantity. The lower segment of newspapers such as the Daily Mail seems to drop accents and replace any 'un-English' or 'un-American' letters consistently. Quality newspapers such as the New York Times or the Guardian do it sometimes but not consistently. As I have shown, the Chicago Manual of Style does not recommend doing it. (I don't have access to the book itself; maybe someone can look up whether it says something helpful about the matter.) And serious encyclopedias such as Britannica consistently use proper names from Latin-based languages in their original form and use romanizations involving diacritics, where appropriate, as in Brāhmī. (With some exceptions. Britannica replaces ß by ss, for example, as is done routinely even by German speakers in Switzerland, and it replaces þ by th. But it distinguishes correctly between the first names of Thorbjørn Egner and Thorbjörn Fälldin, for example.) Given that English dictionaries list words such as exposé with an accent, I simply won't buy that Britannica is in error when it uses diacritics in titles. Hans Adler 18:20, 11 June 2011 (UTC)
 * Hans, can you point me to where you've shown that CMoS supports diacritics? --Piotr Konieczny aka Prokonsul Piotrus | talk 18:58, 11 June 2011 (UTC)
 * It was in two links to "Chicago Style Q&A" in my longest post. To quote from them: "In any case, it is not true that English is without accents. I would guess that accents were often dropped in published material many years ago because of the extra difficulty of typesetting them—especially in the case of a word like façade (Webster’s prefers facade but allows façade; American Heritage prefers façade but allows facade). On that basis, I would guess that in the future, accents will become more rather than less common in English." "Assuming that the readers are to be primarily English-speaking, I’ll follow Webster’s 11th Collegiate Dictionary, which lists Iguaçú first (though Iguazú is listed also, as an equal variant; Chicago usually picks the first-listed term and sticks with it)." Hans Adler 19:11, 11 June 2011 (UTC)
 * Meanwhile I actually found a way to access the Chicago Manual of Style from home.
 * Some example sentences speak for themselves: "He is a member of the Société d'entraide des membres de l'ordre national de la Légion d'honneur."
 * But it gets more explicit elsewhere: "Any foreign words, phrases or titles that occur in an English-language work should be checked for special characters -- that is, letters with accents [...], diphthongs, ligatures, and other alphabetical forms that do not normally occur in English. Most accented letters used in European languages [...] can easily be reproduced in print from an author's software and need no coding. [...] If type is to be set from an author's hard copy, marginal clarifications may be needed for handwritten accents or special characters (e.g., 'oh with grave accent' or 'Polish crossed el'). If a file is being prepared for an automated typesetting system or for presentation in electronic form (or both), special characters must exist or be 'enabled' in the typesetting and conversion programs, and output must be carefully checked to ensure that the characters appear correctly."
 * The following on typesetting French is particularly interesting: "Although French publishers often omit accents on capital letters [...] they should appear where needed in English works, especially in works whose readers may not be familiar with French typographic usage." (My italics.)
 * And on romanization: "Nearly all systems of transliteration require diacritics [...]. Except in linguistic studies or other highly specialized works, a system using as few diacritics as are needed to aid pronunciation is easier to readers, publisher, and author. [e.g. Shiva not Śiva, Vishnu not Viṣṇu] Transliterated forms without diacritics that are listed in any of the Merriam-Webster dictionaries are acceptable in most contexts."
 * Unfortunately I am afraid we will still continue to read that using diacritics in English text is just plain wrong... Hans Adler 22:26, 11 June 2011 (UTC)

From WT:Naming conventions (use English)/Archive 8
WP:COMMONNAME is about titles, not about the spelling of titles. Otherwise almost all article titles that are currently at a British English spelling could be moved to the American English spelling. Try enforcing that, and you will see how wrong you are. The simple fact of the matter is that some sources are not reliable for the spelling of foreign names because they routinely drop all accents or otherwise butcher names. While an acceptable practice in some contexts, it is not acceptable in an encyclopedia of international stature. These sources must therefore be discarded, as they do not give information about this fine point of spelling. Hans Adler 08:14, 12 June 2011 (UTC)

There was not supposed to be a vote at this stage in the first place. The usage of the New York Times is inconsistent. The same person is sometimes spelled with diacritics and sometimes without. (The same holds for the Guardian.) The New York Times Manual of Style has an Amazon preview that goes as far as "accent marks" (page 6), where it says they "are used for French, Italian, Spanish, Portuguese and German words and names. [...] Do not use accents in words or names from other languages (Slavic and Scandinavian ones, for example), which are less familiar to most American writers, editors and readers; such marks would be prone to error, and type fonts often lack characters necessary for consistency. Some foreign words that enter the English language keep their accent marks (protégé, résumé), others lose them (cafe, facade). The dictionary governs spellings, except for those shown in this manual. In the name of a United States resident, use or omit accents as the bearer does; when in doubt, omit them. (Exception: Use accents in Spanish names of Puerto Rico residents.) [...] Some news wires replace the umlaut with an e after the affected vowel. Normally undo that spelling, but check before altering a personal name; some individual Germans use the e form."

So they sometimes drop accents (1) because of technical restrictions, (2) when they cannot guarantee to get them right (dropping them completely is more acceptable than getting them wrong), or (3) when the bearer drops them. We could think of adopting (3), but otherwise this is the same as our current practice, except we don't have the technical restrictions and we are usually able to distinguish between Julia Görges, who is consistently spelled Goerges by sports associations because they always butcher umlauts, and Angelika Roesch, who is occasionally spelled Rösch when a newspaper writer or editor tries to undo that butchering in a case where it didn't occur. Hans Adler 08:14, 12 June 2011 (UTC)
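
The asymmetry between Görges and Roesch can be made concrete with a small sketch. It assumes the conventional German fallback (ä to ae, ö to oe, ü to ue, ß to ss); the helper names `to_ascii_fallback` and `naive_undo` are hypothetical and exist only for this illustration. The forward mapping is mechanical, but blindly reversing it invents umlauts that were never there.

```python
# Illustrative sketch only: helper names are hypothetical, and the mapping is the
# conventional German ASCII fallback, not any particular publisher's rule.
UMLAUT_FALLBACK = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def to_ascii_fallback(name: str) -> str:
    """Replace German umlauts and ß with their ASCII digraphs."""
    return "".join(UMLAUT_FALLBACK.get(ch, ch) for ch in name)


def naive_undo(name: str) -> str:
    """Blindly turn every lowercase ae/oe/ue back into an umlaut (deliberately naive)."""
    for digraph, umlaut in (("ae", "ä"), ("oe", "ö"), ("ue", "ü")):
        name = name.replace(digraph, umlaut)
    return name


for original in ("Görges", "Roesch", "Goethe"):
    ascii_form = to_ascii_fallback(original)
    restored = naive_undo(ascii_form)
    status = "ok" if restored == original else "wrong"
    print(f"{original} -> {ascii_form} -> {restored} ({status})")
```

Only Görges survives the round trip; Roesch and Goethe come back as Rösch and Göthe, which is exactly the over-correction described above. Telling the cases apart requires knowing how the bearer spells the name, not a string rule.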

I do not endorse the NYT, Economist and Guardian MOS in all aspects. After all, they are the manuals of style of newspapers that are written almost exclusively by native English speakers. Newspapers are under severe time constraints that can make it impractical to check details of spelling in languages that are generally not well known among this demographic. (Actually, it's easier nowadays, but presumably all those MOSes date from the pre-internet era.) These languages are 'privileged' not because they are major world languages but because of practical concerns. When confronted with your name filtered through an ASCII-only medium (e.g. exchange of emails with someone who does not know how to enter accents on a US keyboard, or does not bother, or typical sports result tables) they would have a choice between guessing that "Vejvancicky" should really be spelled, e.g., "Vejvancićky" or playing it safe and omitting all accents.
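
Why guessing is risky can be sketched with Python's standard `unicodedata` module: stripping combining marks after NFD normalization maps several distinct accented spellings to the same ASCII string, so the ASCII form alone does not determine the original. The helper name `strip_diacritics` and the list of candidate spellings are assumptions made for this illustration, not verified spellings of anyone's name.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical candidate spellings; all of them collapse to the same ASCII string.
candidates = ["Vejvancićky", "Vejvančický", "Vejvancicky"]
ascii_forms = {strip_diacritics(name) for name in candidates}
print(ascii_forms)
```

Letters such as ø, ß or þ have no canonical decomposition, so this approach leaves them untouched; the sketch covers only accents that Unicode models as combining marks.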

We should follow the spirit behind these rules, update them to the internet age and adapt them to the international demographic of our editors, our lack of time constraints, and the higher precision requirements of an encyclopedia when compared to a newspaper. The inevitable result is something very much like the practice of Britannica and other English encyclopedias, or indeed our de facto practice. Hans Adler 08:11, 13 June 2011 (UTC)

You are not in a good position for this kind of attack after making ludicrous, obviously counter-factual claims.
 * Wikipedia linguist Ohms law: "The only use of diacritics in English are through loan words, and they are always dropped (very) quickly."
 * Concise Oxford Companion to the English Language: "There is a continuum in borrowing, from words that remain relatively alien and unassimilated in pronunciation and spelling (as with blasé and soirée from French), through those that become more or less acclimatized (as with elite rather than élite, while retaining a Frenchlike pronunciation, and garage with its various pronunciations) to [...]." (Other examples of pretty old loanwords that are still very commonly spelled with their French accents include bon appétit, café, canapé, château, cliché, communiqué, coup d'état, coup de grâce, crème brûlée, déclassé, décolleté, décor, déjà vu ... If you don't believe that these are all legitimate English spellings, consult a dictionary. They may be rarer among undereducated Americans than among the Brits, but that doesn't make them wrong.)
 * Wikipedia linguist Ohms law: "The 'numerous style guides' that a couple of people above have cited are simply wrong".
 * A large number of English style guides including the Chicago Manual of Style: Detailed information about how and when to use diacritics in English. All agree that foreigners with a Latin-based name are spelled in English precisely as they spell themselves. (Unless they have Napoleon-like fame and hence an English name.)
 * All major encyclopedias: Use diacritics for foreign proper names from all European Latin-based languages.
 * No style guides that anyone has found yet: Advise dropping diacritics in French or German. Hans Adler 16:09, 2 July 2011 (UTC)

From WT:Naming conventions (use English)/Archive 8
The Brits are probably more open to French and German accents, the Americans to Spanish accents, and, as you say, the South African English speakers to Afrikaans accents. This is all very normal. But there are other phenomena that are orthogonal to this. A linguistic research paper uses diacritics more precisely than a literary criticism paper or an encyclopedia, which again uses them more precisely and more often than a high-quality newspaper, which in turn uses them more precisely and more often than a tabloid or a sports association. On this scale we should use the variant of English that other encyclopedias use, which means extensive use of diacritics, with a small number of (relatively rare) simplifications such as rewriting Middle English words: þorn -> thorn, yoȝ -> yogh. And I doubt that encyclopedic British English and encyclopedic American English differ in how they treat diacritics. Hans Adler 22:38, 14 June 2011 (UTC)

By the same argument all articles must be titled in American English. WP:COMMONNAME is WP:COMMONNAME, not WP:COMMONSPELLING. Trying to make it say something about diacritics is an exercise in tea leaf reading. There is a tiny number of primarily non-English topics that do have independent English names, e.g. Lyons for Lyon or Munich for München. That's where inspecting usage in English-language sources makes sense. But if we were to rely on the same method to decide between the spellings Düsseldorf, Dusseldorf and Duesseldorf, we might as well throw dice. (The problem is that every single source either uses these forms randomly, or follows its own manual of style. Therefore which spelling is more common depends on which sources use the word, rather than saying anything about the most standard spelling.) This is not what the other encyclopedias do. They all use diacritics in such cases. Hans Adler 00:00, 15 June 2011 (UTC)

It is not the purpose of an encyclopedia to inform readers about the majority manual of style of the sources covering a topic. An encyclopedia has its own manual of style and follows it. I have given several examples of manuals of style above. They all prescribe writing a foreign name either with or without accents based on criteria that have nothing to do with inspecting other sources. When people move from one culture to another they sometimes change the spellings of their names. That's a relatively rare case that needs special treatment. The vast majority of the cases we are discussing here are politicians, sportspeople etc. who still reside in their country of birth and have not changed their names in any way. They, and the huge number of places with diacritics, should not get random spellings just to make it marginally easier to spell the name of a person who immigrated into the US without the diacritics. Hans Adler 00:50, 15 June 2011 (UTC)

I don't get your point. First, you seem to be arguing only about that very rare case of people with diacritics moving to an English-speaking country. Second, a woman who marries and adopts her husband's surname also remains the same person. That doesn't mean we get to randomly use either of her two names, depending on accidents such as whether most of the sources predate the marriage or not. Instead, we try to find out how she wants to be known in public after the marriage. For people in non-English countries who have diacritics the presumption by all manuals of style that I have seen is that they want to be known under their name with diacritics and therefore, absent technical obstacles against doing so, they are spelled with them. Hans Adler 01:17, 15 June 2011 (UTC)

I'm with Hans and Democratkid. I would preface my comment by saying that I refer to proper names of people and places which originate in languages with a Latin script having diacritics. It excludes loan words and names from non-Latin script-based languages. I'll try and be as concise as I can, but the complexity of the subject carries the risk of being too long-winded; here goes anyway... English is the über-colonial language: Variants abound; Czech characters, Polish characters, Russian, Greek; even Arabic, Japanese and Chinese characters are capable of being rendered into forms recognisable by people who know only the 26 characters they learned in school. The English language is capable of almost infinite assimilation; hundreds of new loanwords are added to the official English vocabulary every year. To some, "proper Anglicisation" implies the dropping of diacritics; resistance to that is futile. But in the globalised 21st century world, with the trend for information to flow outside of borders, the English alphabet is showing its limitations. The English alphabet, like all other alphabets, is only capable of capturing the pronunciations that are characteristic of that given language. What is more, English is known for its grammatical and pronunciation idiosyncrasies; it is woefully inadequate when trying to capture pronunciations of even many other languages with Romanised characters and standardised pronunciations, such as French and Czech, both of which I speak. As an encyclopaedia, I feel we should strive for a quality higher than the TV newscasters or the journals that still use typesetting (I jest) – both of these often get it terribly wrong, thereby doing a disservice to their target audience. WP is technologically capable of displaying a very wide range of diacritics; we also have armies of editors from various linguistic backgrounds happy to ensure all this is carried out properly. Both these are advantages that can and do give great service to our readers.
I am all in favour of keeping diacritics. The fact is that the letters 'ç' and 'é' are already loan-letters in our alphabet (viz their fairly pervasive use: café, façade, rôle). Use of other letters, such as the 'á' (long a), 'ř' ('r' with a háček), for which there are no equivalents, gives clues to a different pronunciation. The reader may not know exactly how such words are pronounced, but they may be at least made aware that it isn't to be pronounced as they might expect an English word to be; those curious will initiate their own enquiries. Expanding their use is to be encouraged and not fought. People may be a little bit puzzled the instant they reach the Václav Havel article, which they accessed by typing 'Vaclav Havel' (without the "long 'a'"); thankfully for a famous namesake, 'Dvorak' is now universally pronounced using a zh-sound even when the háček is absent. However, for poor Jiří Novák, English people seeing the bare 'Jiri Novak' would undoubtedly call him "Jerry Novak" instead of pronouncing his name as it should be – "Yirzhi Novaak". I would apply the same logic to the correct use of punctuation (the endash, mdash, comma, minus sign): material limitations are not, and should not be, an issue. We don't need to take many steps to ensure the reader has the 'best' information. On the other hand, removing diacritics from names that natively have them amounts to misrepresentation and loss of crucial linguistic information. -- Ohconfucius ¡digame! 04:53, 21 June 2011 (UTC)

From
The current discussion is primarily about how to handle names from foreign languages that are written in an alphabet based on the Latin alphabet. For names in other scripts we transliterate if there is no established English name. Many transliteration systems make heavy use of diacritics. We tend to prefer those which don't, but for some languages they are not available, or not in common use, and so we have to use transliterations with diacritics. In some cases, such as pinyin, it is defensible to simply drop these diacritics, and then we tend to do this. In others it is not, and then we don't. Hans Adler 22:04, 22 June 2011 (UTC)

From WP:Naming conventions (use English)/Diacritics RfC
Each language uses only a small percentage of all the letter-diacritic combinations that exist globally. Yet the universal practice in encyclopedias and other reference works, not just the English ones such as Wikipedia, Britannica and Webster, but also German and French reference works and probably those in most other languages as well, is to use foreign diacritics in the large majority of cases. There seems to be a peculiar fear of some native speakers of English that their language might suffer from foreign infiltration. I cannot otherwise explain this bizarre resistance to precision in the place where it is most appropriate: In articles about foreign people and entities who don't have English names. Hans Adler 20:16, 4 July 2011 (UTC)

I am not thrilled by the precise wording of this proposal, but something like this became necessary because of disputes about articles that are not covered in scholarly sources, have no chance of ever being covered in scholarly sources, and in fact are only barely mentioned in reliable English sources. This is mostly about semi-notable people such as tennis players and hockey players who are barely known outside their (diacritics-using) home countries. The only English sources that are writing about them are usually sources that routinely strip off all diacritics for simplicity. (Example: Björn Borg, who appears as "Bjorn Borg" in the American sports-centric press, is notable enough so that we can check how better sources deal with him. He is spelled correctly in Britannica etc., and mostly correctly in the quality press.) Hans Adler 09:00, 5 July 2011 (UTC)

Zurich seems to be a special case in French. It is certainly a special case in English: It is one of very few genuine English names for a foreign entity that just happen to look like the original spelling minus diacritics. Similarly, "Napoleon" isn't Napoléon minus the acute, but rather it's the English name. But "Gyor" is not an English name but a convenience spelling for Győr (here we have the interesting case that there is also an intermediate spelling "Györ" around, which of course is also not a separate English name), and "Bjorn Borg" is not an English name of Björn Borg.

The question is: When we know that English reference works would write a name with diacritics if they found it worthwhile to write about a person at all, should we follow the practice of the sports press and remove the diacritics, or should we just do what is clearly the right thing in our context? The latter happens to be what we have been doing for years, but as it's not codified anywhere, the practice has come under severe attack over recent weeks. We are talking about potentially removing diacritics from more than 5% of our article titles. Hans Adler 09:00, 5 July 2011 (UTC)

From Naming conventions (use English)/Archive 9
"Göthe" is a historical spelling that nobody would ever think of using in modern German except to make a point. In an eccentric way. Spelling Goethe as "Göthe" is like spelling Shakespeare as "Shakspere" in English. The guideline is clearly trying to make the point that English may pick out one of several correct foreign spellings as the only correct English spelling, but Goethe is simply not a reasonable example for that because "Göthe" has not been an acceptable spelling for the historical person for more than a century. I don't doubt that there are reasonable examples, but until one has been located, no example is better than a misleading one. To get out of the edit war, I have at least clarified now that "Göthe" is not an acceptable spelling in modern German. Hans Adler 06:27, 16 June 2011 (UTC)

Mathsci has since reverted my clarification, and I have replaced it with another one that is hopefully more acceptable, including a footnote to the online Duden article which proves that there is no alternative spelling for the historical Goethe in German. I have since found out that there was indeed a short period of uncertainty after the German spelling reform of 1901, when people were not sure whether surnames were also subject to the reform. They were not. While "Göthe" was probably a more common spelling of Goethe's surname during his lifetime and still appears in modern telephone books (although now rare, probably because people moved to the poet's spelling), he himself preferred the oe spelling, and it gradually became standard.

Saying that there are standard English spellings of German words where historically other spellings in German were possible is pretty pointless since the same applies to genuinely English names such as Shakespeare. I do not doubt that there are examples where English has picked one of several correct spellings in another language as the only correct English spelling, but this is not one, and presenting it as if it was only dilutes the real point. Hans Adler 09:10, 16 June 2011 (UTC)

From Wikipedia talk:Naming conventions (use English)/Archive 9
Our policies and guidelines are notoriously imprecise and only get fixed to cover corner cases correctly as we become aware of the problems. Interpreting guidelines that were written in response to questions such as "'Her Majesty Queen Elizabeth II of the United Kingdom' or 'Queen Liz'?" as if they were the last word on orthography is simply not reasonable. It's also worth noting that under the extremist interpretation of WP:COMMONNAME it contradicts WP:ENGVAR. Our articles equaliser (mathematics) and coequalizer coexist happily and have done for years (to my chagrin, as I value inter-article consistency), even though the spelling 'equalizer' is of course much more common. Hans Adler 17:08, 21 June 2011 (UTC)

I draw statistical conclusions from a sample size of fifty. And if you don't like the result you are free to try the same experiment and report what you come up with. If it's substantially different, others may want to do the same. And no, the proper spelling, or otherwise, of French, German, Spanish etc. proper names, i.e. in the case of place names the spelling in Merriam-Webster's Geographical Dictionary and similar authoritative sources, is not an "Eastern European issue". Even if there were a cabal of native speakers of accent-laden languages controlling those articles, that would merely reflect the fact that in this international encyclopedia multilingual readers with no accent phobia are presumably the majority of interested readers for most of the articles that we are discussing here. And creating an unenforceable guideline that tries to overrule them would only lead to disruption even if they were wrong. But they are right, because they are mostly not doing anything that isn't totally standard and in fact expected for English-language reference works. Hans Adler 18:42, 23 June 2011 (UTC)


 * Wrong. The New York Times and the Washington Post handle accents inconsistently. It's certainly not true that they never use them. The New York Times style guide says on p. 6 (see Amazon preview): "Accent marks are used for French, Italian, Spanish, Portuguese and German words and names. [...] Do not use accents in words and names from other languages [...], which are less familiar to most American writers, editors and readers; such marks would be prone to error, and typefonts often lack characters necessary for consistency. [...] In the name of a United States resident, use or omit accents as the bearer does; when in doubt, omit them. (Exception: Use accents in Spanish names of Puerto Rico residents.)" Actual usage by the New York Times is not as the style guide prescribes but is all over the place. The Washington Post presumably has a similar style guide, and it is similarly inconsistent in its implementation. E.g. François Mitterrand is sometimes written with a ç and sometimes with a c.
 * The list of languages for which accents are prescribed, and other details, differ between the various publications. Some further examples:
 * "Foreign terms that have not become anglicized should be set in italics on first use and given proper accents if from a Latin alphabet. [...] Place-names from foreign languages appear in roman; retain diacritical marks if original is from a Latin alphabet except in commonly anglicized names: Montreal, Quebec, Istanbul. [...] Languages with Latin alphabets: Retain the original diacritical marks (accents, apostrophes, dots, cedillas, glottals, etc.) in unanglicized words in the following languages: Czech, Danish, Dutch, Finnish, French, German, Hawaiian, Hungarian, Icelandic, Irish, Italian, Latvian, Norwegian, Polish, Portuguese, Slovak, Spanish, Swedish, and Turkish. Some anglicized terms from these languages also retain their accents (follow Webster’s)." National Geographic
 * "Put the accents and cedillas on French names and words, umlauts on German ones, accents and tildes on Spanish ones, and accents, cedillas and tildes on Portuguese ones: Françoise de Panafieu, Wolfgang Schäuble, Federico Peña. Leave the accents off other foreign names. Any foreign word in italics should, however, be given its proper accents." Economist
 * "Use on French, German, Portuguese, Spanish and Irish Gaelic words (but not anglicised French words such as cafe, apart from exposé, lamé, résumé, roué). People's names, in whatever language, should also be given appropriate accents where known. Thus: 'Arsène Wenger was on holiday in Bogotá with Rafa Benítez'" Guardian
 * There are two things that almost all style guides of the quality press agree about: (1) Diacritics are always used for the most familiar languages, i.e. at least French, Spanish and German. (2) If the bearer of a name uses them, diacritics are always used when the name appears in italics – which in a newspaper is the context that comes closest to our title context. Hans Adler 19:12, 23 June 2011 (UTC)

From Naming conventions (use English)/Archive 9
I am seeking a consensus on whether the policies WP:UCN and WP:EN continue to be working policies for naming biographical articles, or whether such policies have been replaced by a new status quo. This discussion is ongoing at Wikipedia talk:Naming conventions (use English). Dolovis (talk) 16:38, 18 May 2011 (UTC)
 * This was above the table of contents, so I have moved it here. Discussion has been going on above for a while. The real question is of course not whether these policies apply but how to interpret them. Do they cover fine points of spelling? Do we follow the manuals of style of our sources, resulting in automatic inconsistency, or do we follow our own, which is similar to that of English-language encyclopedias like Britannica? Hans Adler 17:13, 21 June 2011 (UTC)

From Naming conventions (use English)/Archive 9
Not putting the accents on capital letters in French seems to be an obsolete French practice that existed only for technical reasons. For similar reasons, Swiss keyboards (which are used for entering German, French and Italian) only have the small versions of umlauts and accented letters, and most of the Swiss German press writes Ae, Oe, Ue for Ä, Ö, Ü. But you can see the current French practice at fr:Édith Cresson, which is not a redirect. Hans Adler 08:06, 22 June 2011 (UTC)

See Keyboard layout for the explanation. You can't easily type Élysée on a French keyboard! The situation is explained in detail at fr:Usage des majuscules en français, especially paragraphs 4 (on classical typesetting problems) and 5 (on computer problems). Paragraph 6 then says that these difficulties are disappearing, i.e. that it's no longer a big problem to print words like Élysée correctly. Hans Adler 09:31, 22 June 2011 (UTC)

I also learned at school that capitals in French are always unaccented, but apparently that was wrong. Hans Adler 10:03, 22 June 2011 (UTC)


 * Most of the time they are not "creat[ing] English equivalents" of foreign names at all. They are merely putting foreign names through a simplistic filter that loses a lot of information. Some newspaper manuals of style give advice on how to undo the effect of such filters for those languages for which they have a policy of using diacritics. Because of cases such as Goethe and Goebbels (both spelled this way in German, i.e. not with ö), this usually requires additional research for them. For us this research has never been a problem.
 * Which names are butchered by these filters depends (1) on the location of the source on the scale from trashy tabloids to academic publications, and (2) on the percentage of readers who will be familiar with the accents. Regarding (1), encyclopedias are traditionally near the top end of the scale. Britannica uses diacritics consistently for the most important European languages, for example. Regarding (2), most of our topics with diacritics are so obscure that the majority of interested readers has strong ties to the language in question.
 * In a forum at the Chicago Manual of Style, an editor gave the advice to use the main entry in Merriam-Webster's Geographical Dictionary for those places covered by it. Starting to read this dictionary at the beginning on the search for names that have diacritics in the original, we find:
 * Aabenraa: See Åbenrå.
 * Aaiún, El: See Laayoune.
 * Aalborg: See Ålborg.
 * Aalesund: See Ålesund.
 * Aarhus: See Århus.
 * Ābādān.
 * Abéché.
 * Abela: See Ávila 2.
 * Åbenrå.
 * Åbo: See Turku.
 * Aboukir: See Abū Qīr.
 * Abu Dokhān, Gebel.
 * Abu Kurkas: See Abu Qurqās.
 * Abū Mūsá.
 * Abuná.
 * Abū Qīr.
 * Abu Qurqās.
 * Abū Zabī or Abū Zaby.
 * Abyla: (1) See Musa, Jebel. (2) See Ávila 2.
 * I think the pattern is clear at this point and I needn't continue with names starting with Ac. The original diacritics are used except where English uses a different word. Hans Adler 18:42, 22 June 2011 (UTC)


 * As soon as we have found out which spellings exist and to which language registers they belong, orthography becomes a matter of house style, not of verifiability. This dispute originated with cases such as Julia Görges, a German who lives in Germany. There is no indication whatsoever that she has adopted an English version of her name. But, as the first line of her article states, "The title of this article contains the character ö. Where it is unavailable or not desired, the name may be represented as Goerges." Now it so happens that a large number of the reliable sources that mention her at all are of the kind where fancy foreign characters are either not available or not desired. Sports organisations such as the Women's Tennis Association and the International Tennis Federation generally convert all names to ASCII characters, presumably due to a policy that dates back to a time when most computer systems were unable to handle anything else. Some news sources specialising in sports tend to follow them where they republish tables, e.g. tennismagazine.com and ESPN (United States). If you look at these sources, you will see that they are not really about the person, but about statistics.
 * Professional style guides generally contain the advice that the names of people should be written the way they want them to be written, and then presume that for people with diacritics that's with the diacritics. In the discussion at Talk:Julia Görges it was claimed vehemently and repeatedly that she wants her name written with oe, based on the following evidence: The domain of her homepage is julia-goerges.com, and her Twitter account is juliagoerges. Now it just so happens that umlauts became available in domain names only recently, and since there are still serious technical problems with them, almost nobody uses them. And when I tried to register the Twitter account juliagörges [as an experiment; I am not interested in tennis and have never heard of that woman outside Wikipedia], I got the message "Invalid username! Alphanumerics only." This is the filter and the butchering that I was talking about above. On the other hand, on the English version of her homepage her name is spelled consistently as "Görges", including in her profile, where it says: "Name: Julia Görges".
 * Of course her name is also spelled "Görges" in the German sources that were used for the article. And it's spelled that way in the Guardian. And in the New York Times.
 * Yet, just because the butchered version of her name is around in many sources that routinely butcher all names, some editors claim that the butchered version is her "English name". No, it is not. It is an OK spelling of her name. But encyclopedias don't use OK spellings in titles, they use the most correct spellings, and the spellings associated with the highest (most prestigious) language registers. Example: Björn Borg on Britannica. Hans Adler 21:58, 22 June 2011 (UTC)
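The "simplistic filter" described in this thread can be illustrated with a minimal Python sketch. It is only an illustration, not the procedure any of the named organisations is known to use: the function names and the small German transliteration table are assumptions made up for the example. It shows two things that the comments above rely on: stripping marks collapses distinct spellings into one, and the "oe" convention cannot be mechanically undone because names like Goethe contain a genuine "oe".

```python
import unicodedata

# Hypothetical table for the German "ae/oe/ue" convention (an assumption
# for this example, not taken from any style guide quoted above).
GERMAN_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def strip_diacritics(name: str) -> str:
    """Lossy filter: decompose each character, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate_german(name: str) -> str:
    """Replace umlauts and sharp s with their conventional ASCII digraphs."""
    return name.translate(GERMAN_ASCII)


def naive_undo(name: str) -> str:
    """Naive attempt to reverse the 'oe' convention; wrong for genuine 'oe'."""
    return name.replace("oe", "ö")


if __name__ == "__main__":
    for original in ("Julia Görges", "Győr", "Györ", "Björn Borg"):
        print(original, "->", strip_diacritics(original))
    print(transliterate_german("Julia Görges"))
    # The reverse direction needs research, not string replacement:
    print(naive_undo("Goerges"), naive_undo("Goethe"))
```

Both "Győr" and the intermediate spelling "Györ" come out of the first filter as the same string, so the output alone cannot tell a reader which original was meant.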


 * See, now this is different: you are essentially rejecting the argument that we should necessarily follow established usage in en RS, and instead saying just "do it right". I'm not saying that is wrong, but don't pretend you're "follow(ing) the general usage in English reliable sources" when you aren't. NYT is an RS. You're proposing a different criterion. ErikHaugen (talk | contribs) 22:55, 22 June 2011 (UTC)


 * No, I am just insisting on using sources intelligently. There is an unfortunate tendency in Wikipedia to practise tea leaf reading with sources: to use them for things that they never intended to say. A source that is written completely in ASCII cannot possibly contribute to our understanding whether high-quality publications that have the full Unicode range at their disposal write a name with non-ASCII characters or not. (Unless, of course, if it makes efforts to represent non-ASCII characters in ASCII. But reliable sources written in ASCII rarely do that.) Hans Adler 23:42, 22 June 2011 (UTC)


 * You could try googling for "Gerhard Schröder" (which finds essentially the same articles as "Gerhard Schroder") and "Gerhard Schroeder" on various newspaper websites. E.g. the Daily Mail consistently wrote o or oe, the New York Times should have spelled him with ö per its own style guide but sometimes had o instead. USA Today uses the wrong spelling slightly more often, possibly because the news wires strip off all diacritics and USA Today generally didn't undo this. The Britannica article uses ö, of course. Hans Adler 07:14, 24 June 2011 (UTC)

From Naming conventions (use English)/Archive 9
(1) It is well established that reliable sources are only reliable on what they mean to say. Most sports sources are not interested in the spelling of names and convert everything into ASCII. Such sources are only reliable for how a name appears in an ASCII-only environment. Wikipedia is not an ASCII-only environment. (2) Whether to use diacritics or not is a stylistic question like use of "", “” or «» for quotations, or whether to use bold or italics for emphasis. Reliable sources who copy material containing such names from each other generally add or remove the diacritics according to their own manuals of style. Using these reliable sources to decide which version to use would be like tealeaf reading. All recent, serious, general-purpose reference works in English – e.g. Britannica and Merriam-Webster's Geographical Dictionary – use diacritics for the major languages where applicable, with very few exceptions such as Zurich (which has an English name that happens to look like the German name stripped of its diacritic, but is pronounced differently). Hans Adler 09:24, 27 June 2011 (UTC)

Medical reference works are not good sources on wallpapering, books on 15th century Tibetan history are not good sources on higher mathematics, and sports tables are not good sources for the spelling of names. All of these sources are of the type: Can be used when there is no dispute and no better source, but can easily be shot down when someone argues the source got it wrong, or was sloppy, or didn't make some distinctions that must be made in a general-purpose reference work. Hans Adler 20:49, 27 June 2011 (UTC)

From Naming conventions (use English)/Archive 9
WP:DIACRITICS does not contradict WP:COMMONNAME, it only contradicts well-established practice in Wikipedia. De facto, Wikipedia follows the example of major English reference works and prefers using diacritics for all the most familiar languages, almost always. In practice, we do not wait for a subject to hit the quality press and other encyclopedias before spelling them the way encyclopedias spell them. In practice we spell them as encyclopedias do from the beginning. Of course we need reliable sources to establish the correct spelling (as the second quoted sentence says), but they need not be in English. WP:DIACRITICS and WP:COMMONNAME together are not ambiguous, but they do not follow the principle that guidelines should be descriptive of actual practice. If taken literally, they would allow us to quarrel about the titles of roughly 5% of our articles by leaving things more open than they are. Fortunately there is an established practice that prevents these disputes in most cases. Hans Adler 09:38, 27 June 2011 (UTC)

From Naming conventions (use English)/Archive 9
(1) This can be tested by hitting Special:Random many times. A bit less than 10% of our articles have diacritics. I have yet to find an article in this way that could theoretically have diacritics but does not. (2) Piotrus found another test: "Category:Redirects from titles without diacritics (309,123 articles) vs Category:Redirects from titles with diacritics (6,571), which yields us roughly a 1:50 ratio." (3) Take Category:European people, drill down to any of the subcategories for people from a country that uses diacritics. You will see plenty of them. Click on any name that looks as if it should have diacritics but doesn't. Almost always you will find that the person is a resident of the US. Hans Adler 09:47, 27 June 2011 (UTC)
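The arithmetic behind test (2) can be checked in a few lines of Python. The two counts are the ones quoted above; the variable names are made up for the example, and the interpretation in the comments (that a redirect from a title without diacritics points at an article whose title has them) is an assumption about how the categories are used.

```python
# Counts quoted from Piotrus's comparison of the two redirect categories.
redirects_without_diacritics = 309_123  # assumed: article title carries the diacritics
redirects_with_diacritics = 6_571       # assumed: article title omits the diacritics

ratio = redirects_without_diacritics / redirects_with_diacritics
share = redirects_with_diacritics / (
    redirects_without_diacritics + redirects_with_diacritics
)

print(f"about 1:{ratio:.0f}")
print(f"share of these redirects with diacritics in the redirect title: {share:.1%}")
```

The exact quotient is a little under the quoted 1:50, which is consistent with the word "roughly" in the original comment.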

This is obvious in the case of non-anglicized personal names written in the Latin alphabet. The practice differs in many other cases, which I've tried to document here. It should be noted that an earlier version of WP:DIACRITICS did not completely obfuscate the general opinion and practice on diacritical marks: "There is disagreement over what article title to use when a native name uses the Latin alphabet with diacritics (or "accent marks") but general English usage omits the diacritics. A survey that ran from April 2005 to October 2005 ended with a result of 62–46 (57.4%–42.6%) in favor of diacritics, which was a majority but was not considered to be a consensus." The whole section was rewritten in this series of edits. Prolog (talk) 15:56, 27 June 2011 (UTC)

From Naming conventions (use English)/Archive 9
These people aren't particularly notable, so Wikipedia is the first encyclopedia to cover them. And in general it is verifiable from sources in the original language how the name would be spelled by any other English-language encyclopedia, should the person become sufficiently notable. Hans Adler 10:03, 29 June 2011 (UTC)

From Naming conventions (use English)/Archive 9

 * Some people are very obviously trying to change our general practice by arguing out single cases until everyone is too tired to resist them. This is an unethical technique similar to a filibuster and should not be rewarded. It is long-standing practice here that Czech persons get their diacritics in the same way that they get them in professionally edited English encyclopedias. If this continues, it might become a matter for WP:ANI. Hans Adler 09:03, 2 July 2011 (UTC)

Talk:Crêpe has an interesting move discussion.