Written Chinese

Written Chinese is a writing system that uses Chinese characters and other symbols to represent the Chinese languages. Chinese characters do not directly represent pronunciation, unlike letters in an alphabet or syllabograms in a syllabary. Rather, the writing system is morphosyllabic: characters are one spoken syllable in length, but generally correspond to morphemes in the language, which may either be independent words, or part of a polysyllabic word. Most characters are constructed from smaller components that may reflect the character's meaning or pronunciation. Literacy requires the memorization of thousands of characters; college-educated Chinese speakers know approximately 4,000. This has led in part to the adoption of complementary transliteration systems as a means of representing the pronunciation of Chinese.

Chinese writing is first attested during the late Shang dynasty (c. 1250 BCE), but the process of creating characters is thought to have begun centuries earlier during the Late Neolithic and early Bronze Age (c. 2500–2000 BCE). After a period of variation and evolution, Chinese characters were standardized under the Qin dynasty (221–206 BCE). Over the millennia, these characters have evolved into well-developed styles of Chinese calligraphy. As the varieties of Chinese diverged, a situation of diglossia developed, with speakers of mutually unintelligible varieties able to communicate through writing using Literary Chinese. In the early 20th century, Literary Chinese was replaced in large part with written vernacular Chinese, largely corresponding to Standard Chinese, a form based on the Beijing dialect of Mandarin. Although most other varieties of Chinese are not written, there are traditions of written Cantonese, written Shanghainese and written Hokkien, among others.

Structure
Written Chinese is not based on an alphabet or syllabary. Most characters can be analyzed as compounds of smaller components, which may be assembled according to several different principles. Characters and components may reflect aspects of meaning or pronunciation. The best known exposition of Chinese character composition is the Shuowen Jiezi, compiled by Xu Shen c. 100 CE. Xu did not have access to the earliest forms of Chinese characters, and his analysis is not considered to fully capture the nature of the writing system. Nevertheless, no later work has supplanted the Shuowen Jiezi in terms of breadth, and it is still relevant to etymological research today.

Derivation of characters
According to the Shuowen Jiezi, Chinese characters are developed on six basic principles. (These principles, though popularized by the Shuowen Jiezi, were developed earlier; the oldest known mention of them is in the Rites of Zhou, a text from c. 150 BCE.) The first two principles produce simple characters, known as 文 (wén):

Pictographs (象形, xiàngxíng): in which the character is a graphical depiction of the object it denotes.
 * Examples: 日 (rì, 'sun'), 月 (yuè, 'moon'), 木 (mù, 'tree').

Indicatives (指事, zhǐshì): in which the character represents an abstract notion.
 * Examples: 上 (shàng, 'up'), 下 (xià, 'down'), 三 (sān, 'three').



The remaining four principles produce complex characters historically called 字 (zì), though this term is now generally used to refer to all characters, whether simple or complex. Of these four, two construct characters from simpler parts:

Ideographic compounds (會意, huìyì): in which two or more parts are used for their meaning. This yields a composite meaning, which is then applied to the new character.
 * Example: 東 (dōng, 'east'), which represents a sun rising in the trees.

Phono-semantic compounds (形聲, xíngshēng): in which one part, often called the radical, indicates the general semantic category of the character, such as being related to 'water' or 'eyes', with the other part being another character used for its phonetic value.
 * Example: 晴 (qíng, 'clear weather'), which is composed of 日 (rì, 'sun') and 青 (qīng, 'blue-green'), the latter used for its pronunciation.



The last two principles do not produce new written forms; they instead transfer new meanings to existing forms:

Transference (轉注, zhuǎnzhù): in which a character, often with a simple, concrete meaning, takes on an extended, more abstract meaning.
 * Example: 网 (wǎng, 'net'), which was originally a pictograph depicting a fishing net. Over time, it has taken on an extended meaning, covering any kind of lattice: for instance, it is the word used to refer to computer networks.

Loangraphs (假借, jiǎjiè): in which a character is used, either intentionally or accidentally, for some entirely different purpose.
 * Example: 哥 (gē, 'elder brother') is not attested in formal writing prior to the Tang dynasty, and was created from the leftmost component of the more ancient character 歌 (gē, 'song'). The ancient character 兄 (xiōng), meaning 'elder brother', continues to be used in idioms and formal writing, whereas 哥 is used in daily conversation in most Chinese dialects. Some dialects, such as Minnan, which retain features of spoken Old Chinese, continue to use 兄 exclusively for 'elder brother' in daily conversation.



In contrast to the popular conception of written Chinese as ideographic, the vast majority of characters (about 95% of those in the Shuowen Jiezi) either reflect elements of pronunciation or are logical aggregates. In fact, some phonetic complexes were originally simple pictographs that were later augmented by the addition of a semantic root. An example is 炷 (zhù, 'candle', now archaic), which was originally written as a pictograph of a lamp stand, 主, a character that is now pronounced zhǔ and means 'host'; the component 火 (huǒ, 'fire') was added to indicate that the meaning is fire-related.

Chinese characters are written to fit into a square, even when composed of two simpler forms written side-by-side or top-to-bottom. In such cases, each form is compressed to fit the entire character into a square.

Strokes
Character components can be further subdivided into individual written strokes. The strokes of Chinese characters fall into eight main categories: "horizontal" ⟨一⟩, "vertical" ⟨丨⟩, "left-falling" ⟨丿⟩, "right-falling" ⟨㇏⟩, "rising" ⟨㇀⟩, "dot" ⟨丶⟩, "hook" ⟨亅⟩, and "turning" ⟨乛⟩, ⟨乚⟩, ⟨乙⟩.

There are eight basic rules of stroke order in writing a Chinese character, which apply only generally and are sometimes violated:


 * 1) Horizontal strokes are written before vertical ones.
 * 2) Left-falling strokes are written before right-falling ones.
 * 3) Characters are written from top to bottom.
 * 4) Characters are written from left to right.
 * 5) If a character is framed from above, the frame is written first.
 * 6) If a character is framed from below, the frame is written last.
 * 7) Frames are closed last.
 * 8) In a symmetrical character, the middle is drawn first, then the sides.

Layout


As characters are essentially rectilinear and are not joined with one another, written Chinese does not require a set orientation. Chinese texts were traditionally written in columns from top to bottom, which were laid out from right to left. Prior to the 20th century, Literary Chinese used little to no punctuation, with the breaks between sentences and phrases determined largely by context and the rhythms implied by patterns of syllables.

In the 20th century, the layout used in Western scripts—where text is written in rows from left to right, which are laid out from top to bottom—became predominant in mainland China, where it was mandated by the Chinese government in 1955. Vertical layouts are still used for aesthetic effect, or when space limitations require it, such as on signage or book spines. The government of Taiwan followed suit in 2004 for official documents, but vertical layouts have persisted in some books and newspapers.

Less frequently, Chinese is written in rows from right to left, usually on signage or banners, though a left-to-right orientation remains more common.
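The traditional layout, with columns read top to bottom and ordered right to left, can be illustrated with a short sketch. The function name `vertical_layout` and its `column_height` parameter are assumptions made for this illustration only; real typesetting also handles punctuation rotation and line-breaking rules, which are ignored here.

```python
def vertical_layout(text: str, column_height: int) -> str:
    """Arrange text in top-to-bottom columns that are ordered right to left."""
    # Cut the text into columns of at most `column_height` characters.
    columns = [text[i:i + column_height]
               for i in range(0, len(text), column_height)]
    # Pad a short final column with ideographic spaces (U+3000) to keep the grid.
    columns = [col.ljust(column_height, "\u3000") for col in columns]
    # The first column belongs at the far right, so rows are built from the
    # columns in reverse order.
    rows = ["".join(col[r] for col in reversed(columns))
            for r in range(column_height)]
    return "\n".join(rows)


if __name__ == "__main__":
    # Eight characters (the numerals one to eight) in columns of three.
    print(vertical_layout("一二三四五六七八", 3))
```

Reading the printed grid down the rightmost column and then moving left recovers the original character sequence.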

The use of punctuation has also become more common. In general, punctuation occupies the width of a full character, such that text remains visually well-aligned in a grid. Punctuation used in simplified Chinese shows clear influence from that used in Western scripts, though some marks are particular to Asian languages. For example, there are double and single quotation marks (『 』 and 「 」), and a hollow full stop (。), which is used to separate sentences in an identical manner to a Western full stop. A special mark called an enumeration comma (、) is used to separate items in a list, as opposed to the clauses in a sentence.
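The observation that Chinese punctuation fills a full character cell is reflected in Unicode's East Asian Width property, which can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only: the helper name `occupies_full_cell` is an assumption, and treating the "W" (wide) and "F" (fullwidth) classes as "one full cell" is a common convention rather than a rule stated in this article.

```python
import unicodedata


def occupies_full_cell(ch: str) -> bool:
    """Return True if Unicode classifies `ch` as wide or fullwidth."""
    # east_asian_width returns codes such as 'W' (wide), 'F' (fullwidth),
    # 'Na' (narrow) or 'N' (neutral) for a single character.
    return unicodedata.east_asian_width(ch) in ("W", "F")


if __name__ == "__main__":
    samples = {
        "ideographic full stop": "\u3002",   # 。
        "enumeration comma": "\u3001",       # 、
        "left corner bracket": "\u300c",     # 「
        "Western full stop": ".",
    }
    for name, ch in samples.items():
        print(f"{name}: {unicodedata.east_asian_width(ch)} "
              f"full cell: {occupies_full_cell(ch)}")
```

Actual rendered width still depends on the font and the layout engine; the property only records the conventional classification.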

History
Written Chinese is one of the oldest continuously used writing systems. The earliest examples universally accepted as Chinese writing are the oracle bone inscriptions made during the reign of the Shang king Wu Ding (c. 1250 BCE). These inscriptions were made primarily on ox scapulae and turtle shells in order to record the results of divinations conducted by the Shang royal family. Characters posing a question were first carved into the bones. The question's answer was then divined by heating the bones over a fire and interpreting the resulting cracks. The interpretation was then carved into the same oracle bone.

In 2003, 11 isolated symbols carved on tortoise shells were found at the Jiahu archaeological site in Henan, with some bearing a striking resemblance to certain modern characters. The Jiahu site dates from c. 6600 BCE, predating the earliest attested Chinese writing by more than 5,000 years. Garman Harbottle, who had headed a team of archaeologists at the University of Science and Technology of China in Anhui, has suggested that these symbols were precursors to Chinese writing. However, the palaeographer David Keightley argues instead that the time gap is too great to establish any connection.

From the Late Shang period (c. 1250 BCE), Chinese writing evolved into the form found in cast inscriptions on ritual bronzes made during the Western Zhou dynasty (c. 1046 – 771 BCE) and the Spring and Autumn period (771–476 BCE), a form of writing called bronze script. Bronze script characters are less angular than their oracle bone script counterparts. The script became increasingly regularized during the Warring States period (475–221 BCE), settling into a form that Xu Shen used as source material in the Shuowen Jiezi. These characters were later embellished and stylized to yield the seal script, which represents the oldest form of Chinese characters still in modern use. Seal script characters are used principally for signature seals, or chops, which are often used in place of a signature for Chinese documents and artwork. Li Si promulgated the seal script as the standard throughout China, which had been recently united under the imperial Qin dynasty (221–206 BCE).

The initial adaptation of seal script into clerical script can be attributed to scribes in the state of Qin working prior to the wars of unification. Clerical script forms generally have a "flat" appearance, being wider than their seal script equivalents. In the semi-cursive script that evolved from clerical script, character elements begin to run into each other, though the characters themselves generally remain discrete. This is contrasted with fully cursive script, where characters are often unrecognizable when compared with their canonical forms. Regular script is the most widely recognized script, and was considerably influenced by semi-cursive. In regular script, each stroke of each character is drawn clearly and separately from the others.

Regular script is considered the archetypal Chinese writing and forms the basis for most printed forms. In addition, regular script imposes a stroke order, which must be followed in order for the characters to be written correctly. Strictly speaking, this stroke order applies to the clerical, running (semi-cursive), and grass (cursive) scripts as well, but especially in the running and grass scripts, this order is occasionally deviated from. In regular script, for instance, the character 木 (mù, 'tree') must be written starting with the horizontal stroke, drawn from left to right; next, the vertical stroke, from top to bottom; next, the left diagonal stroke, from top to bottom; and lastly the right diagonal stroke, from top to bottom.

Simplification and standardization
Beginning in the mid-20th century, Chinese has primarily been written using either simplified or traditional character forms. Simplified characters, which merge some character forms and reduce the average stroke count per character, were developed by the Chinese government with the stated goal of increasing literacy among the population. During this time, literacy rates did increase rapidly, but some observers instead attribute this to other education reforms and a general increase in the standard of living. Little systematic research has been conducted to support the conclusion that the use of simplified characters has affected literacy rates; studies conducted in China have instead focused on arbitrary statistics, such as quantifying the number of strokes saved on average in a given text sample. Simplified characters are standard in mainland China, Singapore and Malaysia, while traditional characters are standard in Hong Kong, Macau, Taiwan and some overseas Chinese communities.

Simplified forms have also been characterized as being inconsistent. For instance, the traditional 讓 (ràng, 'allow') is simplified to 让, in which the phonetic on the right side is reduced from 17 strokes to 3, with the 言 ('speech') radical on the left also being simplified. However, the same phonetic component is not reduced in simplified characters such as 壤 (rǎng, 'soil') and 齉 (nàng, 'snuffle'); these characters are relatively uncommon, and would therefore represent a negligible stroke reduction. Other simplified forms derive from long-standing calligraphic abbreviations, as with 万 (wàn, 'ten thousand'), which has the traditional form 萬.

Function
Chinese characters have always been used to represent individual spoken syllables. While writing was being invented in the Yellow River valley, words in spoken Chinese were largely monosyllabic, and each written character corresponded to a monosyllabic word. Spoken Chinese varieties have since acquired much more polysyllabic vocabulary, usually compound words composed of morphemes corresponding to older monosyllabic words.

For over two thousand years, the predominant form of written Chinese was Literary Chinese, which had vocabulary and syntax rooted in the language of the Chinese classics, as spoken around the time of Confucius (c. 500 BCE). Over time, Literary Chinese acquired some elements of grammar and vocabulary from various varieties of vernacular Chinese that had since diverged. By the 20th century, Literary Chinese was distinctly different from any spoken vernacular, and had to be learned separately. Once learned, it was a common medium for communication between people speaking different dialects, many of which were mutually unintelligible by the end of the first millennium CE.

Varieties of Chinese vary in pronunciation, and to a lesser extent in vocabulary and grammar. Modern written Chinese, which replaced Classical Chinese as the written standard as an indirect result of the 1919 May Fourth Movement, is not technically bound to any single variety; however, it most nearly represents the vocabulary and syntax of Mandarin, by far the most widespread Chinese dialectal family in terms of both geographical area and number of speakers. This form is known as written vernacular Chinese. Although some written vernacular Chinese expressions are ungrammatical or unidiomatic outside of Mandarin, its use permits some communication between speakers of different dialects. This function may be considered analogous to that of linguae francae, such as Latin. For literate speakers, it serves as a common medium; however, the forms of individual characters generally provide little insight into their meaning if not already known. Ghil'ad Zuckermann's exploration of phono-semantic matching in Standard Chinese concludes that the Chinese writing system is multifunctional, conveying both semantic and phonetic content.

The variation in vocabulary among varieties has also led to informal use of "dialectal characters", which may include characters previously used in Literary Chinese that are considered archaic in written Standard Chinese. Cantonese is unique among non-Mandarin regional languages in having a written colloquial standard, used in Hong Kong and overseas, with a large number of unofficial characters for words particular to this language. Written Cantonese has become quite popular on the Internet, while Standard Chinese is still normally used in formal written communications. To a lesser degree, Hokkien is used similarly in Taiwan and elsewhere, though it lacks the level of standardization seen in Cantonese. However, Taiwan's Ministry of Education has promulgated a standard character set for Hokkien, which is taught in schools and encouraged for use by the general population.

Media
Over the history of written Chinese, a variety of media have been used for writing. They include:
 * Bamboo and wooden slips, from at least the 13th century BCE
 * Paper, invented no later than the 2nd century BCE
 * Silk, since at least the Han dynasty
 * Stone, metal, wood, bamboo, plastic and ivory on seals.

Since at least the Han dynasty, such media have been used to create hanging scrolls and handscrolls.

Literacy
Because the majority of modern Chinese words contain more than one character, there are at least two measuring sticks for Chinese literacy: the number of characters known, and the number of words known. John DeFrancis, in the introduction to his Advanced Chinese Reader, estimates that a typical Chinese college graduate recognizes 4,000 to 5,000 characters, and 40,000 to 60,000 words. Jerry Norman, in Chinese, places the number of characters somewhat lower, at 3,000 to 4,000. These counts are complicated by the tangled development of Chinese characters. In many cases, a single character came to have multiple variants. This development was restrained to an extent by the standardization of the seal script during the Qin dynasty, but soon started again. Although the Shuowen Jiezi lists 10,516 characters—9,353 of them unique (some of which may already have been out of use by the time it was compiled) plus 1,163 graphic variants—the Jiyun of the Northern Song dynasty, compiled less than a thousand years later in 1039, contains 53,525 characters, most of them graphic variants.

Dictionaries
Written Chinese is not based on an alphabet or syllabary, so Chinese dictionaries, as well as dictionaries that define Chinese characters in other languages, cannot easily be alphabetized or otherwise lexically ordered, as English dictionaries are. The need to arrange Chinese characters in order to permit efficient lookup has given rise to a considerable variety of ways to organize and index the characters.

A traditional mechanism is the method of radicals, which uses a set of character roots. These roots, or radicals, generally but imperfectly align with the parts from which logical aggregates and phonetic complexes are composed. A canonical set of 214 radicals was developed during the rule of the Kangxi Emperor (around the year 1700); these are sometimes called the Kangxi radicals. The radicals are ordered first by stroke count (that is, the number of strokes required to write the radical); within a given stroke count, the radicals also have a prescribed order.

Every Chinese character falls (sometimes arbitrarily or incorrectly) under the heading of exactly one of these 214 radicals. In many cases, the radicals are themselves characters, which naturally come first under their own heading. All other characters under a given radical are ordered by the stroke count of the character. Usually, however, there are still many characters with a given stroke count under a given radical. At this point, characters are not given in any recognizable order; the user must locate the character by going through all the characters with that stroke count, typically listed for convenience at the top of the page on which they occur.
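The ordering described above (radical first, then stroke count, with no further order among ties) can be sketched as a two-part sort key. This is a minimal illustration, not the scheme of any particular dictionary: the `Entry` type, the function names, and the tiny hand-entered character list are assumptions made for the example, using the Kangxi radical numbers 72 (日), 75 (木) and 85 (水).

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    char: str
    radical_no: int  # Kangxi radical number, 1 to 214
    strokes: int     # total stroke count of the character


# Hypothetical sample data entered by hand for illustration.
ENTRIES = [
    Entry("森", 75, 12), Entry("河", 85, 8), Entry("明", 72, 8),
    Entry("木", 75, 4), Entry("林", 75, 8), Entry("日", 72, 4),
]


def radical_order(entries: List[Entry]) -> List[Entry]:
    """Sort by radical number, then by stroke count under each radical."""
    # sorted() is stable, so ties keep their input order, mirroring the
    # absence of any prescribed order among characters that tie.
    return sorted(entries, key=lambda e: (e.radical_no, e.strokes))


def candidates(entries: List[Entry], radical_no: int, strokes: int) -> List[str]:
    """Return the characters a reader would scan for one radical and count."""
    return [e.char for e in entries
            if e.radical_no == radical_no and e.strokes == strokes]


if __name__ == "__main__":
    print("".join(e.char for e in radical_order(ENTRIES)))
    print(candidates(ENTRIES, 75, 8))
```

A radical that is itself a character (such as 木) sorts first under its own heading because it has the lowest stroke count in that group.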

Because the method of radicals is applied only to the written character, one need not know how to pronounce a character before looking it up; the entry, once located, usually gives the pronunciation. However, it is not always easy to identify which of the various roots of a character is the proper radical. Accordingly, dictionaries often include a list of hard-to-locate characters, indexed by total stroke count, near the beginning of the dictionary. Some dictionaries include almost one-seventh of all characters in this list. Alternatively, some dictionaries list "difficult" characters under more than one radical, with all but one of those entries redirecting the reader to the "canonical" location of the character according to Kangxi.

Other methods of organization exist, often in an attempt to address the shortcomings of the radical method, but are less common as the primary ordering. For instance, a dictionary ordered principally by the Kangxi radicals often has an auxiliary index by pronunciation, expressed typically in either pinyin or bopomofo. This index points to the page in the main dictionary where the desired character can be found. Other methods use only the structure of the characters, such as the four-corner method, in which characters are indexed according to the kinds of strokes located nearest the four corners (hence the name of the method), or the Cangjie method, in which characters are broken down into a set of 24 basic components. Neither the four-corner method nor the Cangjie method requires the user to identify the proper radical, although many strokes or components have alternate forms, which must be memorized in order to use these methods effectively.

The availability of computerized Chinese dictionaries now makes it possible to look characters up by any of the indexing schemes described, thereby shortening the search process.

Transliteration
Chinese characters do not reliably indicate their pronunciation. Therefore, many transliteration systems have been developed to write the sounds of different varieties of Chinese. While many use the Latin alphabet, systems using the Cyrillic and Perso-Arabic alphabets have also been designed. Among other purposes, these systems are used by students learning the corresponding varieties. The replacement of Chinese characters with a phonetic writing system was first prominently proposed during the May Fourth Movement, partly motivated by a desire to increase the country's literacy rate. The idea gained further support following the victory of the Communists in 1949, who immediately began two parallel programs regarding written Chinese. The first was the development of an alphabet to write the sounds of Mandarin, the variety spoken by around two-thirds of the Chinese population. The other program investigated the simplification of the standard character forms. Initially, character simplification was not competing with the idea of a phonetic script; rather, simplification was intended to make the transition to a fully phonetic writing system easier.

By 1958, official priorities had shifted towards character simplification. The Hanyu Pinyin (or simply 'pinyin') alphabet had been developed, but plans to replace Chinese characters with it were deferred, and the idea is no longer actively pursued. This change in priorities may have been due in part to pinyin's design being specific to Mandarin, to the exclusion of other dialects.

Pinyin uses the Latin alphabet with diacritics to represent the phonology of Standard Chinese. For the most part, pinyin uses phonetic values for letters that reflect their existing pronunciations in Romance languages and the International Phonetic Alphabet (IPA). However, pairs of letters such as b and p, which correspond to a voicing distinction in languages such as French, instead represent the aspiration distinction that is more abundant in Mandarin. Pinyin also uses several consonantal letters to represent markedly different sounds from their assignments in other languages. For example, pinyin q and x correspond to sounds similar to English ch and sh, respectively. While pinyin has become the predominant transliteration system for Mandarin, others include bopomofo, Wade–Giles, Yale, EFEO and Gwoyeu Romatzyh.
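The aspiration contrast can be made concrete with a small lookup table. The sketch below pairs the six pinyin stop letters with their usual broad IPA values in Standard Chinese. The names `PINYIN_STOPS`, `is_aspirated` and `contrast_pairs` are assumptions made for this illustration, and the table deliberately omits the affricates and fricatives, including q and x.

```python
# Pinyin stop letters and their broad IPA values in Standard Chinese.
# Both members of each pair are voiceless; they differ only in aspiration.
PINYIN_STOPS = {
    "b": "p", "p": "pʰ",
    "d": "t", "t": "tʰ",
    "g": "k", "k": "kʰ",
}


def is_aspirated(letter: str) -> bool:
    """Return True if the pinyin stop letter denotes an aspirated consonant."""
    return PINYIN_STOPS[letter].endswith("ʰ")


def contrast_pairs():
    """List (unaspirated, aspirated) letter pairs at the same place of articulation."""
    pairs = []
    for plain, ipa in PINYIN_STOPS.items():
        if is_aspirated(plain):
            continue
        # Find the letter whose IPA value is the same consonant plus aspiration.
        for other, other_ipa in PINYIN_STOPS.items():
            if other_ipa == ipa + "ʰ":
                pairs.append((plain, other))
    return pairs


if __name__ == "__main__":
    for plain, aspirated in contrast_pairs():
        print(f"{plain} [{PINYIN_STOPS[plain]}] vs "
              f"{aspirated} [{PINYIN_STOPS[aspirated]}]")
```

In this table, pinyin b stands for an unaspirated voiceless [p], whereas French b is voiced.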