Wikipedia:Naming conventions (standard letters with diacritics)

Summary
This (proposed) guideline tries to distinguish when accents (also known as diacritics) should be used in an article title, and when they should be used only within the body of the article.

In standard English, there are 26 letters (for simplicity's sake, only lowercase will be used):

a b c d e f g h i j k l m n o p q r s t u v w x y z

However, English has borrowed many words from other languages without fully anglicizing them. For example, café is spelled with an accent in French, but in English it may appear with or without the accented "é" (cafe/café). Some surnames are also traditionally written with a diacritic, e.g. Brontë.

Guideline in a nutshell

 * Diacritics should be used in an article's title only if it can be shown that the word is routinely written with diacritics in common usage. This means in reliable English sources, such as encyclopedias, dictionaries, or articles in major English-language newspapers.
 * If the word is routinely listed in reliable English sources without diacritics, then the Wikipedia article should follow that practice for the article title, though the version with diacritics should be given in the initial paragraph of the article as suggested in Naming conventions (use English).
 * If it is not clear what "common usage" is, then the general Wikipedia guideline is to avoid use of diacritics in article titles.

For exceptions to the above rules (such as technical issues), and further explanation, please see the remainder of the guideline below.

Scope
This guideline is about standard letters (a-z/A-Z) with diacritics, as found in languages whose native script is based on the Latin alphabet:
 * Vowels with diacritics: é; à; ö; â; ę; ... (as used in languages like French, German, Spanish, and many others).
 * Consonants with diacritics: ł; ś; ń; š; č; ç; ñ; ... (as used in languages like Polish, Czech, Spanish, etc.).

As a Naming conventions guideline, this guideline is about article names, not about the use of alternate versions with or without diacritics in the body of the Wikipedia articles.

This guideline does not apply to redirect pages, which can (and should) use diacritics to ensure that all popular variations of a name's spelling still redirect to the proper article. Conversely, in cases like Charlotte Brontë, where English and Wikipedia use the diacritic, a redirect from the simple form (Charlotte Bronte) is a necessary service to the reader. For those cases where a word with a diacritic and its diacritic-less variant usually have a different meaning (such as canon/cañon), see Wikipedia:Naming conventions (precision).

Other types of diacritics, non-standard letters and ligatures
The majority of this guideline is about "standard" letters with diacritics, as described in the section above. Thus, this guideline does not apply to the following letters:
 * Ð/ð - The upper case variant of this letter is a D with a diacritic, but the lower case variant is an altogether different character, not the regular form of a d as in a-z;
 * ɨ - and other IPA-specific glyphs containing what looks like a diacritic;
 * ί, ἰ, ύ, and other diacritics used in languages for which the standard notation doesn't use the Latin alphabet;
 * Diacritics used in some romanization/transliteration systems (e.g. Pinyin);
 * Ligatures like œ and æ (see: Naming conventions (use English));
 * ß - found in the German alphabet;
 * þ - used, for example, in Icelandic;
 * Ə/ə - found in the Azeri alphabet;
 * Signs used to indicate metre schemes in poetry (see for example Dactylic hexameter).

Special-case characters such as þ, ð and ß are used only in limited geographic regions, so English speakers in other parts of the world (especially those for whom English is a second language) often find these symbols incomprehensible and unpronounceable. They are also difficult to alphabetize, since most English speakers, including native speakers, do not know where to place them in a standard alphabet. As a result, this guideline recommends that their use be avoided in article titles.

Guideline
If the usual English version of a proper name has a diacritic ("Thomas à Kempis", "Brontë"), the article title should also use the version with the diacritic. Similarly for names with heavy metal umlauts ("Motörhead"), if printable ("Die Ärzte" and not "Die A⃛rzte").

Also, if the major dictionaries (Webster's; OED; ...) concur that a noun is always written with a diacritic, the Wikipedia content page is placed at the version with the diacritic (e.g. "cliché").

In more ambiguous cases, a standard letter (a-z/A-Z) with a diacritic is acceptable in an article title if all of the following conditions are met:
 * 1) The article title does not have some other usual English version. For example, the article on the king known in German as Friedrich II. (Preußen) is, per Naming conventions (names and titles), titled Frederick II of Prussia;
 * 2) There are multiple reliable English-language publications which use the version with diacritics;
 * 3) There is no other naming convention (policy, guideline) that would have the page at a different name;
 * 4) It is a pre-combined printable character.

Other exceptions are supported, for example as a result of a Requested move vote, as far as such exceptions stay in line with the official naming conventions policy ("Generally, article naming should give priority to what the majority of English speakers would most easily recognize, with a reasonable minimum of ambiguity, while at the same time making linking to those articles easy and second nature").

Specifics according to language of origin
Preliminary remark: if the use (or not) of a diacritic in English is a "national varieties of English" issue (e.g. "faconne" according to US practice and "façonné" according to UK practice), the issue should be dealt with as described in Manual of Style.

Specific languages using the (extended) Latin alphabet

 * Irish : See: Manual of Style (Ireland-related articles) (geographical names; names of people).

Other

 * Languages using Arabic script : See: Naming conventions (Arabic)
 * Chinese : See: Naming conventions (Chinese)
 * Languages using Cyrillic alphabet : See: Naming conventions (Cyrillic) (Belarusian, Bosnian, Bulgarian, Macedonian, Russian, Serbian, Ukrainian)
 * Greek : See: Naming conventions (Greek)
 * Hebrew : See: Naming conventions (Hebrew)
 * Indo-Aryan languages : See: Naming conventions (Dharmic) and Naming conventions (Indic)
 * Japanese : See: Manual of Style (Japan-related articles)
 * Korean : See: Naming conventions (Korean)

Printability
In any case, in names of content pages, characters that are not printable on every machine are to be avoided.

For article text, the problem can be avoided by using the unicode template. In article titles, that workaround cannot be used.

See Naming conventions (Unicode) (draft) for more on this. For example, this "A" with a diacritical mark on top and another at the bottom, ᾉ, renders like this with the "unicode" template: ᾉ, and like this in standard text: ᾉ (if both show the same, you're probably not using MSIE...). Note that the "A" used in this example is the Greek upper case Α, which is the same glyph as the "A" from the Latin alphabet but, in Unicode, a different codepoint (in a different range), non-printable on several machines.
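The codepoint distinction described above can be checked directly. The following is a minimal sketch in Python using only the standard `unicodedata` module; the helper name `describe` is purely illustrative and is not part of any Wikipedia tooling.

```python
import unicodedata

def describe(char: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

latin_a = "A"            # Latin capital letter A
greek_alpha = "\u0391"   # Greek capital letter alpha, same glyph in most fonts

print(describe(latin_a))       # U+0041 LATIN CAPITAL LETTER A
print(describe(greek_alpha))   # U+0391 GREEK CAPITAL LETTER ALPHA
print(latin_a == greek_alpha)  # False: identical appearance, different codepoints
```

Two characters that look alike on screen can therefore still produce different page names, which is why the glyph alone is not a reliable guide.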

Sticking to the Latin alphabet, there is for example this non-printable character (an "A", with a diacritical mark under it): Ḁ (Unicode: U+1E00).

Likewise, combining diacritical marks (Unicode range U+0300 - U+036F) are to be avoided in Wikipedia page names, e.g., trying to write Doña with the combining diacritical mark U+0342: Don͂a (unicode: "&#x044;&#x06F;&#x06E;&#x342;&#x061;"; note that this is also rendered completely differently in MSIE and in most other browsers on non-MS Windows operating systems).
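As an illustration of the rule above, a proposed page name can be scanned for characters in the combining range. This is a rough sketch in Python, not an existing Wikipedia check; the function names `has_combining_marks` and `precompose` are hypothetical. It also shows that Unicode NFC normalization can replace a base letter plus combining mark with a pre-combined character when one exists (n + U+0303 becomes ñ), whereas n + U+0342 has no pre-combined form and stays as it is.

```python
import unicodedata

# Range of combining diacritical marks named in the guideline.
COMBINING_START, COMBINING_END = 0x0300, 0x036F

def has_combining_marks(title: str) -> bool:
    """True if the title contains any character in U+0300 - U+036F."""
    return any(COMBINING_START <= ord(ch) <= COMBINING_END for ch in title)

def precompose(title: str) -> str:
    """Compose base letter + combining mark pairs where Unicode allows it (NFC)."""
    return unicodedata.normalize("NFC", title)

bad_title = "Don\u0342a"         # the example from the guideline
typed_with_tilde = "Don\u0303a"  # n followed by a combining tilde

print(has_combining_marks(bad_title))                     # True
print(has_combining_marks(precompose(typed_with_tilde)))  # False: now uses ñ (U+00F1)
print(has_combining_marks(precompose(bad_title)))         # True: nothing to compose
```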

Category sort key

 * See Categorization

When a diacritic is used in a page name, categories are given a category sort key based on the variant without the diacritic, regardless of alphabetization rules in the originating language. Examples:
 * Étretat:
 * François Péron:
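The guideline does not say how such sort keys are produced, and editors may simply type them by hand. As a hedged sketch only, one way to derive the diacritic-free variant in Python is to decompose each character and drop the combining marks. The function name `sort_key` is hypothetical, and the sketch makes no attempt to reorder personal names.

```python
import unicodedata

def sort_key(title: str) -> str:
    """Return the title with diacritics removed, for use as a sort key."""
    # NFD splits e.g. "É" into "E" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", title)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(sort_key("Étretat"))         # Etretat
print(sort_key("François Péron"))  # Francois Peron
```

This approach only works for letters that Unicode decomposes into a base letter plus a mark. A letter such as ł has no such decomposition and passes through unchanged, so it would still need manual handling.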

Being consistent
After the choice has been made whether a name is written with or without diacritics in a page name, other Wikipedia content pages (including categories) referring to this topic adopt the same format in the page name. Examples:
 * George Frideric Handel (and not Georg Friedrich Händel nor George Frederick Haendel) leads to:
 * Categories, per Naming conventions (categories): Category:Compositions by George Frideric Handel
 * Bracketed disambiguation of compositions, per Naming conventions (pieces of music): Water Music (Handel)
 * Camille Saint-Saëns (and not Camille Saint-Saens):
 * Lists, in article namespace: List of compositions by Camille Saint-Saëns
 * Categories, Category:Compositions by Camille Saint-Saëns
 * Bracketed disambiguation, Violin Concerto No. 3 (Saint-Saëns)

Rationale
Several polls have been held regarding the use of diacritics in Wikipedia. Notably, a survey that ran from April 2005 to October 2005 ended with a result of 62–46 (57.4%–42.6%) in favor of diacritics. One of the reasons that such polls usually don't end in a clear consensus one way or another is that the issue is easily presented as a choice between "always diacritics" and "never diacritics". This is, however, not the way the English language works:
 * The English language is not completely diacritic free, examples: provençal, château, piñata;
 * Some words can be used interchangeably with or without accent, example: café or cafe;
 * Some words have a different meaning when used with or without accent, example: canon vs. cañon (=canyon);
 * Some loanwords obviously lost their accent in English, examples: siege, chassis;
 * There are some differences between national varieties of English, examples: debacle (Webster's) vs. débâcle (OED);
 * For loanwords, not all languages keep their accents as easily: most accents in English derive from French, some others from Spanish (for example Doña), and other languages spoken in Western Europe, for example böttger ware from German, auto-da-fé from Portuguese. This also limits the diacritics encountered in loanwords to those used in these languages (so, in English more or less limited to grave accent, acute accent, circumflex, diaeresis / umlaut, and for consonants: cedilla and ñ)

Other cited problems with diacritic letters in article titles include:


 * They are difficult to type for many people, and make linking difficult.
 * They can make pronunciation of the article title unclear for a reader who is not familiar with those symbols.
 * They interfere with proper sorting of articles in category listings.
 * They don't always show up properly on everyone's web browsers.
 * They sometimes cause problems with copy-pasting (for example, if Microsoft Notepad does not handle diacritics properly).
 * They make URLs unnecessarily complicated and difficult to read.

Only very exceptionally does English use an accent where other languages don't, for example: Thomas à Kempis (compare, Dutch: Thomas a Kempis; French: Thomas a Kempis; German: Thomas von Kempen)

For these reasons (diacritics are neither always applied as in the original language, nor always abandoned when anglicizing), a guideline on the use of diacritics in Wikipedia cannot be put in terms of either "always diacritics" or "never diacritics".