List of Unicode characters



As of the Unicode version this list is based on, 149,878 characters have code points, covering 161 modern and historical scripts as well as multiple symbol sets. This article includes the 1,062 characters in the Multilingual European Character Set 2 (MES-2) subset, and some additional related characters.

Character reference overview
HTML and XML provide ways to reference Unicode characters when the characters themselves either cannot or should not be used. A numeric character reference refers to a character by its Universal Character Set/Unicode code point, and a character entity reference refers to a character by a predefined name.

A numeric character reference uses the format


 * &#nnnn;

or
 * &#xhhhh;

where nnnn is the code point in decimal form, and hhhh is the code point in hexadecimal form. The x must be lowercase in XML documents. The nnnn or hhhh may be any number of digits and may include leading zeros. The hhhh may mix uppercase and lowercase, though uppercase is the usual style.
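As an illustration of the two numeric forms, the following minimal Python sketch builds the decimal and hexadecimal references for a character and decodes them again. The helper name numeric_references is hypothetical; html.unescape is the standard-library decoder and follows HTML rules rather than XML rules.

```python
import html


def numeric_references(char):
    """Return (decimal, hexadecimal) numeric character references for one character.

    Hypothetical helper for illustration only.
    """
    code_point = ord(char)
    # Lowercase "x" is used so the result is also valid in XML documents.
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = numeric_references("\u00e9")  # U+00E9, e with acute
print(decimal_ref, hex_ref)  # &#233; &#xE9;

# Leading zeros and mixed-case hex digits decode to the same character.
print(html.unescape("&#0233;") == html.unescape("&#x00e9;") == "\u00e9")  # True
```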

In contrast, a character entity reference refers to a character by the name of an entity which has the desired character as its replacement text. The entity must either be predefined (built into the markup language) or explicitly declared in a Document Type Definition (DTD). The format is the same as for any entity reference:


 * &name;

where name is the case-sensitive name of the entity. The semicolon is required.
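A short Python sketch can show how an entity name resolves to a character. It assumes the HTML entity set shipped in the standard library (html.entities.name2codepoint); the entity eacute is chosen only as an example.

```python
import html
from html.entities import name2codepoint

# Look up the code point behind a predefined HTML entity name.
lower = name2codepoint["eacute"]  # 233, U+00E9
upper = name2codepoint["Eacute"]  # 201, U+00C9

# Entity names are case-sensitive: the two names give different characters.
print(chr(lower), chr(upper))

# Decoding the reference yields the same character as the table lookup.
print(html.unescape("&eacute;") == chr(lower))  # True
```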

Because numbers are harder for humans to remember than names, character entity references are most often written by humans, while numeric character references are most often produced by computer programs.

Control codes
65 characters, including DEL. All belong to the common script. Footnotes:
 * 1 Control-C has typically been used as a "break" or "interrupt" key.
 * 2 Control-D has been used to signal "end of file" for text typed in at the terminal on Unix / Linux systems. Windows, DOS, and older minicomputers used Control-Z for this purpose.
 * 3 Control-G is an artifact of the days when teletypes were in use. Important messages could be signalled by striking the bell on the teletype. This was carried over on PCs by generating a buzz sound.
 * 4 Line feed is used for "end of line" in text files on Unix / Linux systems.
 * 5 Carriage Return (accompanied by line feed) is used as "end of line" character by Windows, DOS, and most minicomputers other than Unix- / Linux-based systems.
 * 6 Control-O has been the "discard output" key. Output is not sent to the terminal, but discarded, until another Control-O is typed.
 * 7 Control-Q has been used to tell a host computer to resume sending output after it was stopped by Control-S.
 * 8 Control-S has been used to tell a host computer to postpone sending output to the terminal. Output is suspended until restarted by the Control-Q key.
 * 9 Control-U was originally used by Digital Equipment Corporation computers to cancel the current line of typed-in text. Other manufacturers used Control-X for this purpose.
 * 10 Control-X was commonly used to cancel a line of input typed in at the terminal.
 * 11 Control-Z has commonly been used on minicomputers, Windows and DOS systems to indicate "end of file" either on a terminal or in a text file. Unix / Linux systems use Control-D to indicate end-of-file at a terminal.
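The Control-key names in these footnotes map to code points by the usual ASCII convention of keeping only the low five bits of the letter's code. The sketch below assumes that convention (the helper control_code is hypothetical) and also counts the characters that Python's unicodedata module places in the control category (Cc), which should match the 65 stated above.

```python
import sys
import unicodedata


def control_code(letter):
    """Code point sent by Control-<letter> under the ASCII convention (hypothetical helper)."""
    # Masking with 0x1F keeps the low five bits, e.g. "C" (0x43) -> 0x03.
    return ord(letter.upper()) & 0x1F


for letter in "CDGOQSUXZ":
    print(f"Control-{letter} -> U+{control_code(letter):04X}")

# Every code point with general category Cc: C0 controls, DEL, and C1 controls.
control_chars = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Cc"
]
print(len(control_chars))  # 65
```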

Latin script
The Unicode Standard classifies 1,481 characters as belonging to the Latin script.

Basic Latin
95 characters; the 52 alphabet characters belong to the Latin script. The remaining 43 belong to the common script.

The 33 characters classified as ASCII Punctuation & Symbols are also sometimes referred to as ASCII special characters. Often only these characters (and not other Unicode punctuation) are what is meant when an organization says a password "requires punctuation marks".
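The counts for this block can be checked with a few lines of Python. The sketch assumes that the 33 "punctuation and symbol" characters are the printable ASCII characters that are neither letters nor digits, which includes the space.

```python
import string

# Basic Latin graphic characters: U+0020 (space) through U+007E (tilde).
basic_latin = [chr(cp) for cp in range(0x20, 0x7F)]

letters = [c for c in basic_latin if c in string.ascii_letters]
non_letters = [c for c in basic_latin if c not in string.ascii_letters]

# Assumption: "ASCII special characters" = neither letter nor digit (space included).
special = [c for c in basic_latin if not c.isalnum()]

print(len(basic_latin), len(letters), len(non_letters), len(special))  # 95 52 43 33
```

Python's string.punctuation holds the same set without the space, so it has 32 characters rather than 33.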

Latin-1 Supplement
96 characters; the 62 letters and two ordinal indicators belong to the Latin script. The remaining 32 belong to the common script.

Latin Extended-A
128 characters; all belong to the Latin script.

Latin Extended-B
208 characters; all belong to the Latin script; 33 in the MES-2 subset.

Latin Extended Additional
256 characters; all belong to the Latin script; 23 in the MES-2 subset.

Additional Latin Extended

 * Latin Extended-C (Unicode block)
 * Latin Extended-D (Unicode block)
 * Latin Extended-E (Unicode block)
 * Latin Extended-F (Unicode block)
 * Latin Extended-G (Unicode block)

IPA Extensions
96 characters; all belong to the Latin script; three in the MES-2 subset.

Spacing modifier letters
80 characters; 15 in the MES-2 subset.

Phonetic Extensions

 * Phonetic Extensions (Unicode block)
 * Phonetic Extensions Supplement (Unicode block)

Greek and Coptic
144 code points; 135 assigned characters; 85 in the MES-2 subset.

Greek Extended
For polytonic orthography. 256 code points; 233 assigned characters, all in the MES-2 subset (#670 – 902).
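The difference between code points and assigned characters in the two Greek blocks can be illustrated by counting the code points whose general category is not Cn (unassigned). This is a sketch only: the block ranges U+0370 to U+03FF and U+1F00 to U+1FFF are stated here as assumptions, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def assigned_count(first, last):
    """Count code points in [first, last] that are assigned (category is not Cn)."""
    return sum(
        1 for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )


# Assumed block ranges; names follow the section headings.
BLOCKS = {
    "Greek and Coptic": (0x0370, 0x03FF),
    "Greek Extended": (0x1F00, 0x1FFF),
}

for name, (first, last) in BLOCKS.items():
    size = last - first + 1
    print(f"{name}: {size} code points, {assigned_count(first, last)} assigned")
```

With a reasonably recent interpreter, the output should agree with the figures given for these blocks (135 and 233 assigned characters).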

Cyrillic
256 characters; 191 in the MES-2 subset.

Cyrillic supplements

 * Cyrillic Supplement (Unicode block)
 * Cyrillic Extended-A (Unicode block)
 * Cyrillic Extended-B (Unicode block)
 * Cyrillic Extended-C (Unicode block)
 * Cyrillic Extended-D (Unicode block)

Mandaic

 * Mandaic (Unicode block)

Samaritan

 * Samaritan (Unicode block)

Brahmic (Indic) scripts
The range from U+0900 to U+0DFF includes Devanagari, Bengali script, Gurmukhi, Gujarati script, Odia alphabet, Tamil script, Telugu script, Kannada script, Malayalam script, and Sinhala script.
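Because the range is divided evenly, a code point can be matched to its script with simple arithmetic. The sketch below assumes that each of the ten scripts occupies a consecutive 128-code-point block in the order listed above; the function name indic_block is hypothetical.

```python
# Scripts in the order given in the text; assumed to be 128 code points each.
SCRIPTS = [
    "Devanagari", "Bengali", "Gurmukhi", "Gujarati", "Odia",
    "Tamil", "Telugu", "Kannada", "Malayalam", "Sinhala",
]

RANGE_START = 0x0900
RANGE_END = 0x0DFF
BLOCK_SIZE = 0x80  # 128 code points per block


def indic_block(char):
    """Return the script name for a character in U+0900..U+0DFF, else None."""
    cp = ord(char)
    if not RANGE_START <= cp <= RANGE_END:
        return None
    return SCRIPTS[(cp - RANGE_START) // BLOCK_SIZE]


print(indic_block("\u0905"))  # Devanagari
print(indic_block("\u0ba4"))  # Tamil
print(indic_block("\u0d9a"))  # Sinhala
print(indic_block("A"))       # None
```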

Other Brahmic scripts
Other Brahmic and Indic scripts in Unicode include:


 * Ahom (Unicode block)
 * Balinese (Unicode block)
 * Batak (Unicode block)
 * Bhaiksuki (Unicode block)
 * Buhid (Unicode block)
 * Buginese (Unicode block)
 * Chakma (Unicode block)
 * Cham (Unicode block)
 * Common Indic Number Forms (Unicode block)
 * Dives Akuru (Unicode block)
 * Dogra (Unicode block)
 * Grantha (Unicode block)
 * Hanunoo (Unicode block)
 * Javanese (Unicode block)
 * Kaithi (Unicode block)
 * Kawi (Unicode block)
 * Khmer (Unicode block)
 * Khmer Symbols (Unicode block)
 * Khojki (Unicode block)
 * Khudawadi (Unicode block)
 * Lao (Unicode block)
 * Lepcha (Unicode block)
 * Limbu (Unicode block)
 * Mahajani (Unicode block)
 * Makasar (Unicode block)
 * Marchen (Unicode block)
 * Meetei Mayek (Unicode block)
 * Meetei Mayek Extensions (Unicode block)
 * Modi (Unicode block)
 * Multani (Unicode block)
 * Myanmar (Unicode block)
 * Myanmar Extended-A (Unicode block)
 * Myanmar Extended-B (Unicode block)
 * New Tai Lue (Unicode block)
 * Newa (Unicode block)
 * Phags-pa (Unicode block)
 * Rejang (Unicode block)
 * Saurashtra (Unicode block)
 * Sharada (Unicode block)
 * Siddham (Unicode block)
 * Sundanese (Unicode block)
 * Sundanese Supplement (Unicode block)
 * Syloti Nagri (Unicode block)
 * Tagalog (Unicode block)
 * Tagbanwa (Unicode block)
 * Tai Le (Unicode block)
 * Tai Tham (Unicode block)
 * Tai Viet (Unicode block)
 * Takri (Unicode block)
 * Thai (Unicode block)
 * Tibetan (Unicode block)
 * Tirhuta (Unicode block)

Other South and Central Asian writing systems

 * Gunjala Gondi (Unicode block)
 * Masaram Gondi (Unicode block)
 * Mro (Unicode block)
 * Nag Mundari (Unicode block)
 * Ol Chiki (Unicode block)
 * Sora Sompeng (Unicode block)
 * Tangsa (Unicode block)
 * Toto (Unicode block)
 * Warang Citi (Unicode block)

Southeast Asian writing systems

 * Hanifi Rohingya (Unicode block)
 * Kayah Li (Unicode block)
 * Pahawh Hmong (Unicode block)
 * Pau Cin Hau (Unicode block)

Other African scripts

 * Adlam (Unicode block)
 * Bamum (Unicode block)
 * Bamum Supplement (Unicode block)
 * Bassa Vah (Unicode block)
 * Medefaidrin (Unicode block)
 * Mende Kikakui (Unicode block)
 * NKo (Unicode block)
 * Osmanya (Unicode block)
 * Ottoman Siyaq Numbers
 * Tifinagh (Unicode block)
 * Vai (Unicode block)

Other American scripts

 * Cherokee (Unicode block)
 * Cherokee Supplement (Unicode block)
 * Deseret (Unicode block)
 * Kaktovik Numerals (Unicode block)
 * Osage (Unicode block)

General Punctuation
112 code points; 111 assigned characters; 24 in the MES-2 subset.

Arrows

 * Miscellaneous Symbols and Arrows (Unicode block)
 * Supplemental Arrows-A (Unicode block)
 * Supplemental Arrows-B (Unicode block)
 * Supplemental Arrows-C (Unicode block)

Mathematical symbols

 * Supplemental Mathematical Operators (Unicode block)
 * Miscellaneous Mathematical Symbols-A (Unicode block)
 * Miscellaneous Mathematical Symbols-B (Unicode block)
 * Mathematical Alphanumeric Symbols (Unicode block)

Katakana

 * Kana Extended-A (Unicode block)
 * Kana Extended-B (Unicode block)
 * Kana Supplement (Unicode block)
 * Katakana Phonetic Extensions (Unicode block)
 * Small Kana Extension (Unicode block)

CJK Unified Ideographs

 * CJK Unified Ideographs

CJK Radicals

 * CJK Radicals Supplement (Unicode block)
 * CJK Strokes (Unicode block)
 * Kangxi Radicals (Unicode block)

Other East Asian writing systems

 * Counting Rod Numerals (Unicode block)
 * Halfwidth and Fullwidth Forms (Unicode block)
 * Ideographic Description Characters (Unicode block)
 * Khitan Small Script (Unicode block)
 * Lisu (Unicode block)
 * Lisu Supplement (Unicode block)
 * Miao (Unicode block)
 * Modifier Tone Letters (Unicode block)
 * Nushu (Unicode block)
 * Nyiakeng Puachue Hmong (Unicode block)
 * Small Form Variants (Unicode block)
 * Tai Xuan Jing Symbols (Unicode block)
 * Tangut (Unicode block)
 * Tangut Components (Unicode block)
 * Tangut Supplement (Unicode block)
 * Vertical Forms (Unicode block)
 * Wancho (Unicode block)
 * Yi Syllables (Unicode block)
 * Yi Radicals (Unicode block)
 * Yijing Hexagram Symbols (Unicode block)

Ancient and historic scripts

 * Aegean Numbers (Unicode block)
 * Anatolian Hieroglyphs (Unicode block)
 * Ancient Greek Numbers (Unicode block)
 * Ancient Symbols (Unicode block)
 * Avestan (Unicode block)
 * Brahmi (Unicode block)
 * Carian (Unicode block)
 * Caucasian Albanian (Unicode block)
 * Chorasmian (Unicode block)
 * Cuneiform (Unicode block)
 * Cuneiform Numbers and Punctuation (Unicode block)
 * Cypriot Syllabary (Unicode block)
 * Cypro-Minoan (Unicode block)
 * Early Dynastic Cuneiform (Unicode block)
 * Egyptian Hieroglyph Format Controls (Unicode block)
 * Egyptian Hieroglyphs (Unicode block)
 * Elbasan (Unicode block)
 * Elymaic (Unicode block)
 * Glagolitic (Unicode block)
 * Glagolitic Supplement (Unicode block)
 * Gothic (Unicode block)
 * Hatran (Unicode block)
 * Imperial Aramaic (Unicode block)
 * Indic Siyaq Numbers
 * Inscriptional Pahlavi (Unicode block)
 * Inscriptional Parthian (Unicode block)
 * Kharoshthi (Unicode block)
 * Linear A (Unicode block)
 * Linear B Ideograms (Unicode block)
 * Linear B Syllabary (Unicode block)
 * Lycian (Unicode block)
 * Lydian (Unicode block)
 * Manichaean (Unicode block)
 * Mayan Numerals (Unicode block)
 * Meroitic Cursive (Unicode block)
 * Meroitic Hieroglyphs (Unicode block)
 * Nabataean (Unicode block)
 * Nandinagari (Unicode block)
 * Ogham (Unicode block)
 * Old Hungarian (Unicode block)
 * Old Italic (Unicode block)
 * Old North Arabian (Unicode block)
 * Old Permic (Unicode block)
 * Old Persian (Unicode block)
 * Old Sogdian (Unicode block)
 * Old South Arabian (Unicode block)
 * Old Turkic (Unicode block)
 * Old Uyghur (Unicode block)
 * Palmyrene (Unicode block)
 * Phaistos Disc (Unicode block)
 * Phoenician (Unicode block)
 * Psalter Pahlavi (Unicode block)
 * Runic (Unicode block)
 * Sogdian (Unicode block)
 * Soyombo (Unicode block)
 * Ugaritic (Unicode block)
 * Vithkuqi (Unicode block)
 * Yezidi (Unicode block)
 * Zanabazar Square (Unicode block)

Braille

 * Braille Patterns (Unicode block)

Music

 * Western Musical Symbols (Unicode block)
 * Byzantine Musical Symbols (Unicode block)
 * Ancient Greek Musical Notation (Unicode block)
 * Znamenny Musical Notation (Unicode block)

Shorthand

 * Duployan (Unicode block)
 * Shorthand Format Controls (Unicode block)

Sutton SignWriting

 * Sutton SignWriting (Unicode block)

Emoji

 * Emoji in Unicode

Special areas and format characters

 * Private Use Areas
 * Private Use Area (Unicode block)
 * Supplementary Private Use Area-A (Unicode block)
 * Supplementary Private Use Area-B (Unicode block)
 * Specials (Unicode block)
 * Surrogates
 * Low Surrogates (Unicode block)
 * High Surrogates (Unicode block)
 * High Private Use Surrogates (Unicode block)
 * Tags (Unicode block)
 * Variation Selectors
 * Variation Selectors (Unicode block)
 * Variation Selectors Supplement (Unicode block)