Wikipedia:Manual of Style/Persian

Definitions
Persian is a member of the Iranian branch of the Indo-European languages. There are three closely related varieties of Persian:


 * Persian proper, or Farsi (فارسی), is spoken in Iran.
 * Dari, or Afghan Persian (دری), is spoken in Afghanistan and Pakistan.
 * Tajik (Тоҷикӣ / Tojikī / تاجیکی) is spoken in Tajikistan and elsewhere in the former USSR.

The Persian language has been written with a number of different scripts, including Old Persian cuneiform, Pahlavi (Middle Persian) and Avestan. After the Islamic conquest of the Persian Sassanian Empire in 651 AD, Arabic replaced Middle Persian as the language of government, culture and especially religion for the next two centuries.

Written Persian reappeared during the 9th and 10th centuries. Since then it has been written in a modified version of the Arabic script with additional letters, such as ⟨پ چ ژ گ⟩ for p, ch, zh and g. The Persian of the 13th–15th centuries is known as Classical Persian.

In the Tajik Soviet Socialist Republic, the standard Tajik language was created on the basis of local dialects. From 1928 to 1939 it was briefly written in the Latin script, and since 1939 it has been written in a Tajik version of the Cyrillic alphabet.

Perso-Arabic
There are several romanization schemes for Persian, and none of them can be regarded as definitive or universal. Broadly, they follow three strategies:
 * Monographic scientific ("strict") romanization represents both Persian pronunciation and Persian orthography, including the redundant letters inherited from Arabic. It follows the principle "one letter (sign) for one letter (sign)", avoiding digraphs in favour of diacritical signs. Examples of such schemes are those of the German Oriental Society (Deutsche Morgenländische Gesellschaft, DMG) and of Encyclopædia Iranica (EI).
 * Digraphic practical ("semi-strict") romanization generally follows the same principles but uses both diacritical signs and digraphs. However, digraphs can cause confusion when letter combinations such as سه (s + h) or زه (z + h) occur, because sh and zh also stand for the single letters ش and ژ. Examples of such schemes are the ALA-LC and BGN/PCGN romanizations.
 * Simplified romanization uses only the letters of the basic English alphabet. It generally follows a digraphic scheme but drops all diacritical signs.
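Converting a semi-strict romanization to the simplified form mostly means deleting diacritical signs, which can be done mechanically. The Python sketch below assumes the input uses standard Unicode diacritics (ā, ḥ, s̱ and the like); it decomposes the text, drops the combining marks, and also drops the mute-v sign ʷ described under Mute v. The function name `simplify` is illustrative only and is not part of any Wikipedia tool.

```python
import unicodedata

# U+02B7 MODIFIER LETTER SMALL W (the mute v in Khʷārazm). It is a letter,
# not a combining mark, so it has to be removed explicitly.
EXTRA_SIGNS = {"\u02b7"}


def simplify(semi_strict: str) -> str:
    """Turn a semi-strict romanization into a simplified one (a sketch)."""
    # NFD splits "ā" into "a" + U+0304 COMBINING MACRON, "ḥ" into
    # "h" + U+0323 COMBINING DOT BELOW, and so on.
    decomposed = unicodedata.normalize("NFD", semi_strict)
    # Keep every character that is not a nonspacing combining mark (Mn).
    kept = (
        c for c in decomposed
        if unicodedata.category(c) != "Mn" and c not in EXTRA_SIGNS
    )
    # Recompose whatever is left (a no-op for plain ASCII).
    return unicodedata.normalize("NFC", "".join(kept))


print(simplify("Tehrān"))      # Tehran
print(simplify("Abu-l-Fatḥ"))  # Abu-l-Fath
print(simplify("Khʷārazm"))    # Kharazm
```

Signs that are not combining marks, such as the ‘ in ‘Omar, pass through unchanged; whether to keep them is left to the editor.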

Romanization table
This is a compromise version of romanization that combines the existing schemes.

Readers of Wikipedia are not expected to have a linguistic background, so simplified romanization is recommended for use in articles. The original Persian spelling in parentheses is enough for those who need it. The semi-strict romanization may, however, be given alongside (usually after) the Persian script as a guide to the native pronunciation of a name or word.

The scientific (strict) column is given mainly for reference. If you need a more precise transliteration, use the semi-strict one: it is precise enough, uses fewer diacritical signs, and is more intuitive.

Notes:

Redundant letters
Persian has seven redundant letters inherited from Arabic: ⟨ث ص⟩ for ⟨س⟩ s, ⟨ذ ظ ض⟩ for ⟨ز⟩ z, ⟨ط⟩ for ⟨ت⟩ t, and ⟨ح⟩ for ⟨ه⟩ h. Romanizations usually mark them with one diacritical sign or another. Unlike in Arabic, these diacritics do not indicate any difference in Persian pronunciation. Their purpose is reversibility: they make it possible to restore the original Persian spelling from the romanization. If the original spelling of a Persian word is already given, there is no reason to write these diacritical signs, so you need not use them.
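The reversibility argument can be made concrete. This Python sketch encodes only the letter groups listed above and inverts them: once the diacritics are dropped, one simplified Latin letter may stand for several Persian letters, so the original spelling can no longer be recovered from the romanization alone. The names `SIMPLIFIED`, `INVERSE` and `candidates` are illustrative.

```python
from collections import defaultdict

# Simplified value of each letter in the groups above: the redundant
# Arabic letters share a value with their ordinary Persian counterpart.
SIMPLIFIED = {
    "ث": "s", "ص": "s", "س": "s",
    "ذ": "z", "ظ": "z", "ض": "z", "ز": "z",
    "ط": "t", "ت": "t",
    "ح": "h", "ه": "h",
}

# Invert the mapping: simplified Latin letter -> possible Persian letters.
INVERSE = defaultdict(list)
for letter, latin in SIMPLIFIED.items():
    INVERSE[latin].append(letter)


def candidates(latin: str) -> list[str]:
    """Persian letters that a simplified Latin letter could stand for."""
    return INVERSE.get(latin, [])


for latin in "szth":
    print(latin, candidates(latin))
```

Each of s, z, t and h maps back to more than one letter, which is exactly the information the diacritics of a strict scheme preserve.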

Digraphs
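When the letter combinations گه gh, که kh, سه sh or زه zh occur, they can be mistaken for the digraphs that stand for single letters (غ gh, خ kh, ش sh, ژ zh). A middle dot ⟨·⟩ or an apostrophe ⟨'⟩ may be used to separate them: g·h, k·h, s·h, z·h.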
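The separator rule can be applied automatically if a romanization is assembled from per-letter pieces. This Python sketch assumes a hypothetical setup in which each segment is the romanization of one Persian letter or vowel (for example, produced from a lookup table); a separator is inserted whenever a segment ending in g, k, s or z is followed by one beginning with h. The function name `join_segments` and the segment lists are illustrative.

```python
# Romanized letters that would form a digraph (gh, kh, sh, zh) with a
# following "h", although those digraphs also stand for single letters
# (غ, خ, ش, ژ).
AMBIGUOUS_BEFORE_H = ("g", "k", "s", "z")


def join_segments(segments: list[str], sep: str = "·") -> str:
    """Join per-letter romanized segments, separating accidental digraphs.

    ["sh", "ā", "h"] is ش + ā + ه, so no separator is needed, whereas
    ["a", "s", "h", "ā", "l"] contains س followed by ه.
    """
    out = ""
    for seg in segments:
        # g/k/s/z + h would otherwise read as a single letter.
        if out and out[-1] in AMBIGUOUS_BEFORE_H and seg.startswith("h"):
            out += sep
        out += seg
    return out


print(join_segments(["sh", "ā", "h"]))                    # shāh
print(join_segments(["a", "s", "h", "ā", "l"]))           # as·hāl
print(join_segments(["a", "s", "h", "ā", "l"], sep="'"))  # as'hāl
```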

Vowels
Classical Persian had three short vowels, a, i, u, and five long ones, ā, ē, ī, ō, ū. The modern varieties distinguish three unstable (formerly short) vowels, a, e, o, from three stable (formerly long) ones, ā, i, u. A macron is sometimes written over the latter two (ī and ū), but since Modern Persian (Farsi and Dari, though not Tajik) has no short i or u, this notation is redundant. In simplified romanization the macron over the stable ā may also be dropped. For ē and ō, see Dari and Classical Persian below.

The ending -eh
The nominal ending that goes back to Middle Persian -ag is written with the letter ⟨ه⟩ and pronounced a in Classical Persian and Dari but e in Iranian Farsi. By tradition, this mute h is retained in romanization, so شاهنامه is Shahnameh or Shahnamah. Note that Encyclopædia Iranica prefers -a.

Mute h
A word-final mute ⟨ه⟩ may also mark final vowels other than the ending described above.

Mute v
The initial combination ⟨خو⟩, which represented /xʷ/ or /xw/ in Classical Persian, has been simplified to /x/ in Modern Persian. It is advisable not to transliterate this mute letter, but where needed it may be represented with ⟨ʷ⟩ (U+02B7 MODIFIER LETTER SMALL W), e.g. Khʷārazm or Khārazm.

Dari and Classical Persian
Dari, the variety used in Afghanistan, is more conservative and retains several traits of Classical Persian:
 * Dari preserves the two long vowels ē and ō, which in Iranian Persian have merged with ī and ū respectively. For example, the Persian words for "lion" and "milk" are both written شیر; they are pronounced differently in Dari and Classical Persian (shēr and shīr) but identically in Iran (shir). If you want to show this distinction, write the macron.
 * Dari preserves the diphthongs ay and aw, which in Iran have become ey and ow respectively.
 * Dari preserves a distinct pronunciation of the letter ⟨ق⟩ q, whereas in Iran it has merged in pronunciation with ⟨غ⟩ gh.
 * Dari retains the semivowel pronunciation w of the letter ⟨و⟩, where Iranian Persian has v.
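The vowel and diphthong correspondences above can be expressed as a small conversion from a Classical-style romanization to a modern one. This Python sketch is a simplification under stated assumptions: it rewrites only vowel letters, treats a + w/y as a diphthong when neither a vowel nor a doubled w/y follows, and handles lowercase diphthongs only. The function name `modernize` and its `variety` parameter are hypothetical.

```python
import re

CLASSICAL_VOWELS = "aiuāīūēō"

# Classical vowel -> modern reflex; unlisted letters are left unchanged.
IRANIAN = {"i": "e", "u": "o", "ī": "i", "ū": "u", "ē": "i", "ō": "u"}
DARI = {"i": "e", "u": "o", "ī": "i", "ū": "u"}  # Dari keeps ē and ō

# Assumption: a + w/y is a diphthong unless a vowel or a doubled w/y
# follows (so the geminate yy in Khayyām is left alone).
DIPHTHONG = re.compile(rf"a([wy])(?![{CLASSICAL_VOWELS}]|\1)")


def modernize(classical: str, variety: str = "iranian") -> str:
    """Rewrite Classical vowels as Iranian ("iranian") or Dari ("dari")."""
    text = classical
    if variety == "iranian":
        # Iranian Persian: ay -> ey, aw -> ow. Dari keeps ay and aw.
        text = DIPHTHONG.sub(
            lambda m: ("e" if m.group(1) == "y" else "o") + m.group(1), text
        )
    table = IRANIAN if variety == "iranian" else DARI
    out = []
    for ch in text:
        new = table.get(ch.lower(), ch.lower())
        out.append(new.upper() if ch.isupper() else new)
    return "".join(out)


print(modernize("Firdawsī"))                         # Ferdowsi
print(modernize("shēr"), modernize("shēr", "dari"))  # shir shēr
```

Keeping the two tables separate reflects the advice to stay consistent: a given article should use one variety's values throughout rather than mixing them.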

It is up to the writer to decide whether to represent these features in articles concerning Afghanistan. Whichever choice you make, be consistent and do not mix the two varieties. Articles concerning Classical (pre-modern) periods may follow the romanization of the sources cited.

Old and Middle Persian
For Old and Middle Persian, use the transliteration schemes established by the scholarly community and/or follow the sources. Some simplifications may be applied: Zaraϑuštra → Zarathushtra, Gāϑā → Gatha, etc.
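The two example simplifications follow a simple pattern: letters that stand for a digraph are expanded first, then the remaining diacritics are dropped. This Python sketch covers only ϑ and š, the two special letters in the examples; other letters in a source's scheme (such as č or ž) would need their own entries, otherwise they would be reduced to a bare c or z. The function name `simplify_old` is hypothetical.

```python
import unicodedata

# Only the letters from the examples above. Add entries for any other
# special letters your source uses; without one, a letter such as č
# would lose its caron and become a plain c.
DIGRAPHS = {"ϑ": "th", "š": "sh", "Š": "Sh"}


def simplify_old(translit: str) -> str:
    """Simplify a scholarly Old/Middle Persian transliteration (a sketch)."""
    # Compose first so that s + combining caron matches "š" below.
    text = unicodedata.normalize("NFC", translit)
    # Expand digraph letters before stripping: stripping š would leave "s".
    for letter, digraph in DIGRAPHS.items():
        text = text.replace(letter, digraph)
    # Then drop the remaining diacritics (macrons and the like).
    decomposed = unicodedata.normalize("NFD", text)
    plain = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", plain)


print(simplify_old("Zaraϑuštra"))  # Zarathushtra
print(simplify_old("Gāϑā"))        # Gatha
```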

Lead paragraphs
All Persian-related articles should have a lead paragraph that gives the article title in simplified romanization, followed by the original Persian script and the semi-strict romanization in parentheses; the latter gives readers a general idea of how native speakers pronounce the name or word. The Persian script may be enclosed in the lang-fa, lang-prs or lang template, and the romanization in the unicode or transl template.

Consider the following examples:





which gives:
 * Tehran (تهران, Tehrān) is the capital of Iran.
 * Kabul (کابل, Kābol) is the capital of Afghanistan.

Some cases may require variations on this format.

Consider the following:
 * Omar Khayyam (born Ghiyās̱-ad-Din Abu-l-Fatḥ ‘Omar ebn Ebrāhim al-Khayyām Nishāpuri, غیاث‌الدین ابوالفتح عمر بن ابراهیم خیام نیشابوری) was a Persian poet and polymath.


 * Ferdowsi, or Firdawsi (full name in, Ḥakim Abu-l-Qāsem Ferdowsi Tusi) was a Persian poet.

The articles that are missing this information are listed at Articles needing Persian script or text.

In accordance with the official Wikipedia policy at Naming conventions (use English), if a name has an accepted English form, use it everywhere: in the article title, in the lead paragraph and in the body of the article. For example, use Kabul, not Kabol; Isfahan, not Esfahan; and Kunduz, not Qondoz (except in the semi-strict romanization after the Persian script).

Redirects
All common transliterations should redirect to the article. There may often be many redirects, but this is intentional and does not represent a problem.

In text
Use simplified romanization for Persian names and words whenever possible. The first time you introduce a Persian name or word, provide the Persian script and the semi-strict transliteration in parentheses. Example:
 * An early epic poem of Persian classical literature is the Shahnameh (شاهنامه, Shāhnāmeh) by Ferdowsi. Ferdowsi wrote the Shahnameh between 977 and 1010 AD. (Not "Ferdowsī wrote the Šāhnāmeh...")

Tajik Cyrillic
Since Tajik is written with a more or less phonetic alphabet, its romanization causes few difficulties. In general it follows the Wikipedia guidance for Russian.

Note:
 * Tajik has four additional consonant letters, ⟨ғ, қ, ҳ, ҷ⟩, which correspond to the Perso-Arabic letters ⟨غ⟩, ⟨ق⟩, ⟨ه⟩ (or ⟨ح⟩) and ⟨ج⟩. They are transliterated gh, q, h, j.
 * Tajik has two historically "long" vowels, ⟨ӣ⟩ and ⟨ӯ⟩. Because Tajik pronunciation differs from that of Farsi and Dari, it is better to keep the macron to avoid confusion: ī and ū.
 * Unlike Russian, Tajik has no palatalized consonants. The letters ⟨ё, ю, я⟩ are always represented by digraphs: yo, yu, ya. The letter ⟨ё⟩ should never be confused with the letter ⟨е⟩.
 * The letter ⟨е⟩: e after consonants, ye in other cases (at the start of a word, or following a vowel).
 * The letters ⟨ц, щ, ы, ь⟩, taken from Russian and now obsolete in Tajik, may occur in some texts; they are transliterated as for Russian.
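Because the rules above work letter by letter, they lend themselves to a lookup-table sketch. This Python example takes the Tajik-specific values and the ⟨е⟩ rule from this section; the values for the letters shared with Russian, for the obsolete letters, and for ⟨ъ⟩ (rendered here as an apostrophe) are assumptions approximating the Wikipedia guidance for Russian and should be checked against it. The name `romanize_tajik` is hypothetical.

```python
# Tajik-specific letters, as listed above.
TAJIK = {"ғ": "gh", "қ": "q", "ҳ": "h", "ҷ": "j", "ӣ": "ī", "ӯ": "ū"}

# Assumed values for letters shared with Russian (check against the
# Russian guidance before relying on them).
SHARED = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "kh", "ч": "ch", "ш": "sh",
    "ъ": "'", "э": "e", "ю": "yu", "я": "ya",
    # Obsolete in Tajik; treated as for Russian (assumed values).
    "ц": "ts", "щ": "shch", "ы": "y", "ь": "",
}
TABLE = {**SHARED, **TAJIK}
VOWELS = set("аеёиоуэюяыӣӯ")
CONSONANTS = set(TABLE) - VOWELS - {"ъ", "ь"}


def romanize_tajik(text: str) -> str:
    """Romanize Tajik Cyrillic letter by letter (a sketch)."""
    out = []
    prev = ""  # previous character, lowercased
    for ch in text:
        low = ch.lower()
        if low == "е":
            # e after a consonant; ye at the start of a word or after a vowel.
            rom = "e" if prev in CONSONANTS else "ye"
        else:
            rom = TABLE.get(low, ch)  # non-Cyrillic characters pass through
        if ch.isupper() and rom:
            rom = rom[0].upper() + rom[1:]
        out.append(rom)
        prev = low
    return "".join(out)


print(romanize_tajik("Тоҷикӣ"))   # Tojikī
print(romanize_tajik("Душанбе"))  # Dushanbe
```

Only the first Latin letter of a digraph is capitalized (Х → Kh, Ё → Yo), which suits names written in title case.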