Written Cantonese

Written Cantonese is the most complete written form of a Chinese language after those of Mandarin Chinese and Classical Chinese. Classical Chinese was the main literary language of China until the 19th century. Written vernacular Chinese first appeared in the 17th century, and a written form of Mandarin became standard throughout China in the early 20th century. Cantonese is a common language in places like Hong Kong and Macau. While the Mandarin form can to some extent be read and spoken word for word in other Chinese varieties, its intelligibility to non-Mandarin speakers ranges from poor to incomprehensible because of differences in idioms, grammar and usage. Modern Cantonese speakers have therefore developed new characters for words that do not exist in standard Chinese and have retained others that have been lost from it.

With the advent of the computer and the standardization of character sets specifically for Cantonese, many printed materials in predominantly Cantonese-speaking areas of the world are now written with these Cantonese characters to cater to the local population.



Early history
Before the 20th century, the standard written language of China was Classical Chinese, with a grammar and vocabulary based on the Old Chinese of the Spring and Autumn period (8th to 5th century BCE). While this written standard remained essentially static for over two thousand years, the actual spoken language diverged ever further. Yue Chinese formed among the Han population in the Pearl River Delta over many centuries. Its main linguistic influences were the Middle Chinese of the tenth century CE, corresponding to the end of the Tang dynasty, and that of the thirteenth century CE or late Song dynasty, as well as the Tai-Kadai substrate and some influence from pre-Tang Sinitic varieties.

The first Cantonese writings belong to a literary form specific to Canton, called mukjyusyu (木魚書, literally 'wooden fish book'), that supposedly has its roots in Buddhist chants accompanied by wooden fish. Mukjyu texts were popular light reading; their primary audience was women, as female (and overall) literacy was unusually high in that region. The mukjyu texts were intended to be sung, similar to other genres such as naamyam, although without musical instruments.

The earliest known mukjyusyu work with elements of written Cantonese, Faazin Gei (花箋記, literally "The Flowery Paper"), was composed by an unknown author during the late Ming dynasty; its oldest extant edition is dated to 1713. The Faazin Gei is an example of the "scholar and beauty" genre popular at the time, with its story set in Suzhou. Its text, while still close to Literary Chinese, contains much specifically Cantonese wording and even Cantonese vernacular characters, especially in the dialogue but also in the narrative text. Other renowned early works include Ji-Hofaa Si (二荷花史, "The Two Lotus Flowers") and Gamso-Jyunjoeng Saanwusin Gei (金鎖鴛鴦珊瑚扇記, "Coral Fan and Golden-lock Mandarin-ducks Pendant").

The naamyam (南音, literally "southern songs"), a genre of song that flourished from the late Ming dynasty and was frequently sung in Canton's brothels with accompanying string instruments, used language that was generally very literary, with only occasional colloquial Cantonese words. The purpose of such inclusions is debated; they were likely added purely for rhythmic purposes. An example of this practice is Haaktou Cauhan (客途秋恨, "The Traveler's Autumn Regrets"), written in the first decade of the 1800s, which is considered one of the most outstanding examples of the naamyam genre.

Written Cantonese vocabulary was used much more extensively in the lungzau (龍舟, "Dragon boat") songs, performed mainly by beggars on the streets. These songs were considered the least prestigious genre and were rarely published, and then only after careful editing to make them less vernacular in style.

An important landmark in the history of written Cantonese was the publication of Jyut-au (粵謳, literally "Cantonese love songs") by Zhao Ziyong (招子庸) in 1828, marking the beginning of an extremely popular genre. As an educated juren, Zhao Ziyong won some prestige and respect for the previously rejected "heavy" vernacular literature.

Modern times
In the early 20th century, Chinese reformers like Hu Shih saw the need for language reform and championed the development of a vernacular that would allow modern Chinese people to write the language the same way they speak it. The vernacular language movement took hold, and the written language was standardized as vernacular Chinese. Mandarin was chosen as the basis for the new standard.

The standardization and adoption of written Mandarin preempted the development and standardization of vernaculars based on other varieties of Chinese. No matter which dialect people spoke, they still wrote in standardized Mandarin for everyday writing. However, Cantonese is unique amongst the non-Mandarin varieties in having a widely used written form. Cantonese-speaking Hong Kong was a British colony isolated from mainland China before 1997, so most Hong Kong citizens do not speak Mandarin. Written Cantonese has developed as a means of informal communication. Still, Cantonese speakers must use standard written Chinese, or even literary Chinese, in most formal written communications, since written Cantonese may be unintelligible to speakers of other varieties of Chinese.



By the 1920s, with the rise of fully written libretti for Cantonese opera, a well-recognised system had arisen for the use of written Cantonese. The theatrical art form became popularised further through the 1950s with the post-war Hong Kong film industry, during which one third of all cinema production was devoted to Cantonese opera. With the consistent use of on-screen subtitles, the film-going audiences regularly encountered written Cantonese at the cinema, as well as on the backs of phonograph records and later audiocassette and CD cases.

Historically, written Cantonese has been used in Hong Kong for legal proceedings in order to write down the exact spoken testimony of a witness, instead of paraphrasing spoken Cantonese into standard written Chinese. However, its popularity and usage have been rising in the last two decades, the late Wong Jim being one of the pioneers of its use as an effective written language. Written Cantonese has become quite popular in certain tabloids, online chat rooms, instant messaging, and even social networking websites. This has become even more evident since the rise of localism in Hong Kong from the 2010s, as localist media publish their articles in written Cantonese. Although most foreign movies and TV shows are subtitled in Standard Chinese, some, such as The Simpsons, are subtitled using written Cantonese. Newspapers have the news section written in Standard Chinese, but they may have editorials or columns that contain Cantonese discourses, and Cantonese characters are increasing in popularity on advertisements and billboards.

It has been stated that the use of written Cantonese remains limited outside Hong Kong, including in other Cantonese-speaking areas in Guangdong Province. However, colloquial Cantonese advertisements are sometimes seen in Guangdong, suggesting that written Cantonese is widely understood and is regarded favourably, at least in some contexts. Attitudes toward written Cantonese in Guangzhou have been found to be generally positive, though this was limited to the informal and casual domains of life, where the social value of written Cantonese as a marker of cultural solidarity is highest.

Some sources use only colloquial Cantonese forms, resulting in text similar to natural speech. However, it is more common to use a mixture of colloquial forms and standard Chinese forms, some of which are alien to natural speech. The resulting "hybrid" text thus lies on a continuum between two norms: standard Chinese and colloquial Cantonese as spoken. It has been found that being female and having a middle-class income are demographic factors that promote a clear separation between standard written Chinese and written Cantonese. On the other hand, being male, being a blue-collar worker, or being college-educated with a high income are factors that tend towards convergence with standard written Chinese.

Early sources
Scripts for Cantonese opera are a good source of well-documented written Cantonese words. Readings in Cantonese colloquial: being selections from books in the Cantonese vernacular with free and literal translations of the Chinese character and romanized spelling (1894) by James Dyer Ball has a bibliography of printed works available in Cantonese characters in the last decade of the nineteenth century. A few libraries have collections of so-called "wooden fish books" written in Cantonese characters. Facsimiles and plot précis of a few of these have been published in Wolfram Eberhard's Cantonese Ballads. See also Cantonese love-songs, translated with introduction and notes by Cecil Clementi (1904), or a newer translation of these by Peter T. Morris in Cantonese love songs: an English translation of Jiu Ji-yung's Cantonese songs of the early 19th century (1992). Cantonese character versions of the Bible, Pilgrim's Progress, and Peep of Day, as well as simple catechisms, were published by mission presses. The special Cantonese characters used in all of these were not standardized and show wide variation.

Characters today
Written Cantonese contains many characters not used in standard written Chinese, both to transcribe words not present in the standard lexicon and to write some words from Old Chinese whose original forms have been forgotten. Despite attempts by the government of Hong Kong in the 1990s to standardize this character set, culminating in the release of the Hong Kong Supplementary Character Set (HKSCS) for use in electronic communication, there is still significant disagreement about which characters are correct in written Cantonese, as many Cantonese words are descendants of Old Chinese words but are being replaced by newly invented Cantonese words.

Vocabulary
General estimates of vocabulary differences between Cantonese and Mandarin range from 30 to 50 percent. Donald B. Snow, the author of Cantonese as Written Language: The Growth of a Written Chinese Vernacular, wrote that "It is difficult to quantify precisely how different" the two vocabularies are. Snow wrote that the different vocabulary systems are the main difference between written Mandarin and written Cantonese. Ouyang Shan made a corpus-based estimate concluding that one third of the lexical items used in regular Cantonese speech do not exist in Mandarin, but that between the formal registers the differences were smaller. He analyzed a radio news broadcast and concluded that of its lexical items, 10.6% were distinctly Cantonese. Here are examples of differing lexical items in a sentence:

The two Chinese sentences are grammatically identical, using an A-not-A question to ask "Is it theirs?" (referring to an aforementioned object). Though the characters correspond 1:1, the actual glyphs used are all different.

Cognates
Certain words share a common root with standard written Chinese words. However, because they have diverged in pronunciation, tone, and/or meaning, they are often written using a different character. One example is the doublet 來 loi4 (standard) and 嚟 lei4 (Cantonese), meaning "to come." Both share the same meaning and usage, but because the colloquial pronunciation differs from the literary pronunciation, they are represented using two different characters. Some people argue that representing the colloquial pronunciation with a different (and often extremely complex) character is superfluous, and encourage using the same character for both forms since they are cognates (see Derived characters below).

Native words
Some Cantonese words have no equivalents in Mandarin, though equivalents may exist in classical or other varieties of Chinese. Cantonese writers have from time to time reinvented or borrowed a new character if they are not aware of the original one. For example, some suggest that the common word 靚 leng3, meaning pretty in Cantonese but also looking into the mirror in Mandarin, is in fact the character 令 ling3.

Today those characters can mainly be found in ancient rime dictionaries such as Guangyun. Some scholars have made some "archaeological" efforts to find out what the "original characters" are. Often, however, these efforts are of little use to the modern Cantonese writer, since the characters so discovered are not available in the standard character sets provided to computer users, and many have fallen out of usage.

In Southeast Asia, Cantonese people may adopt local Malay words into their daily speech, such as using the term 鐳 leoi1 to mean money rather than 錢 cin2, which would be used in Hong Kong.

Particles
Cantonese particles may be added to the end of a sentence or suffixed to verbs to indicate aspect. There are many such particles; here are a few.


 * 咩 – "me1" is placed at the end of a sentence to indicate disbelief, e.g. 乜你花名叫八兩金咩？ Is your nickname really "Eight Taels of Gold"?
 * 呢 – "ne1" is placed at the end of a sentence to indicate a question, e.g. 你叫咩名呢？ What is your name?
 * 未 – "mei6" is placed at the end of a sentence to ask if an action is done yet, e.g. 你做完未？ Are you done yet?
 * 吓 – "haa5" is placed after a verb to indicate a little bit, e.g. 食吓 Eat a little bit; "haa2" is used on its own to show uncertainty or disbelief, e.g. 吓？乜係咁㗎? What? Is that so?
 * 緊 – "gan2" is placed after a verb to indicate a progressive action, e.g. 我食緊蘋果. I'm eating an apple.
 * 咗 – "zo2" is placed after a verb to indicate a completed action, e.g. 我食咗蘋果. I ate an apple.
 * 晒 – "saai3" is placed after a verb to indicate that the action applies to all of the targets, e.g. 我食晒啲蘋果. I ate all the apples.
 * 埋 – "maai4" is placed after a verb to indicate an expansion of the target of action, or that the action is an addition to the one(s) previously mentioned, e.g. 我食埋啲嘢就去. I'll go after I finish eating the rest. ("eating the rest" is an expansion of the target of action from the food eaten to the food not yet eaten); 你可以去先，我食埋嘢先去. You can go first. I'll eat before going. (The action "eating" is an addition to the action "going" which is previously mentioned or mutually known.)
 * 哇/嘩 – "waa1 / waa3" interjection of amazement, e.g. 嘩！好犀利呀！ Wow! That's amazing!
 * 㗎啦 – "gaa3 laa1" is used when the context seems to be commonplace, e.g., 個個都係咁㗎啦. Everyone is like that.
 * 啫嘛 – "ze1 maa3" translates as "just", e.g. 我做剩兩頁功課啫嘛. I just have two pages of homework left to do.

Loanwords
Some Cantonese loanwords are written in existing Chinese characters.

Cantonese character formation
Cantonese characters, as with regular Chinese characters, are formed in one of several ways:

Borrowings
Some characters already exist in standard Chinese, but are simply reborrowed into Cantonese with new meanings. Most of these tend to be archaic or rarely used characters. An example involves the character 子, which means "child". The Cantonese word for child is instead represented by 仔 (zai2), which has the original meaning of "young animal".

Compound formation
The majority of characters used in Standard Chinese are phono-semantic compounds: characters formed by combining two components, one hinting at the meaning and one hinting at the pronunciation. Written Cantonese continues this practice by placing the 'mouth' radical (口) next to a similarly pronounced character that indicates the pronunciation. As an example, the character 吓 combines the mouth radical with 下, which means 'down', but that meaning has no relation to the meaning of 吓. (An exception is 咩 mē, which is not pronounced like 羊 (yèuhng, sheep) but was chosen to represent the sound sheep make.) The characters which are commonly used in Cantonese writing include:

There is evidence that the mouth radical in such characters can, over time, be replaced by a different one. For instance, 冧 (lām, "bud"), written with the determinative 冖 ("cover"), is instead written in older dictionaries as 啉, with the mouth radical.
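The mouth-radical pattern described above can be made concrete with Unicode ideographic description sequences, in which U+2FF0 (⿰) marks a left-to-right composition. The following is only a minimal sketch: the table is a small illustrative sample built from characters mentioned in this article, and the names `PHONETIC_COMPONENT` and `describe` are hypothetical, not part of any standard library.

```python
# Ideographic Description Character "left to right" (U+2FF0).
LEFT_TO_RIGHT = "\u2ff0"
MOUTH = "口"

# Illustrative sample only: character -> component written beside the mouth radical.
PHONETIC_COMPONENT = {
    "吓": "下",
    "咩": "羊",  # the exception noted in the text: chosen for the sound sheep make
    "啉": "林",  # older dictionary form of 冧
    "㗎": "架",
    "啲": "的",
    "嗰": "個",
}


def describe(char: str) -> str:
    """Return a description sequence for a mouth-radical character in the table."""
    component = PHONETIC_COMPONENT[char]  # raises KeyError for characters not listed
    return f"{LEFT_TO_RIGHT}{MOUTH}{component}"


if __name__ == "__main__":
    for char in PHONETIC_COMPONENT:
        print(char, describe(char))
```

A description sequence records only the visual structure of a character; it says nothing about pronunciation or meaning, which is why the exception 咩 looks identical in form to the regular cases.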

Derived characters
Other common characters are unique to Cantonese or are different from their Mandarin usage, including: 乜, 冇, 仔, 佢, 佬, 俾, 靚 etc. The characters which are commonly used in Cantonese writing include:
 * 冇 mou5 (v. not have). Originally 無. Standard written Mandarin: 沒有
 * 係 hai6 (v. be). Standard written Mandarin: 是
 * 佢 keoi5 (pron. he/she/it). Originally 渠. Standard written Mandarin: 他, 她, 它, 牠, 祂
 * 乜 mat1 (pron. what) often followed by 嘢 to form 乜嘢. Originally 物也. Standard written Mandarin: 什麼
 * 仔 zai2 (n. son, child, small thing). Originally 子.
 * 佬 lou2 (n. guy, dude). Originally 獠.
 * 畀/俾 bei2 (v. give). Standard written Mandarin: 給
 * 靚 leng3 (adj. pretty, handsome). Standard written Mandarin: 漂亮
 * 晒/嗮/曬 saai3 (adv. completely; v. bask in sun)
 * 瞓 fan3 (v. sleep). Originally 困. Standard written Mandarin: 睏, 睡
 * 攞 lo2 (v. take, get). Standard written Mandarin: 拿
 * 拎 ling1 (v. take, get). Standard written Mandarin: 拿
 * 脷 lei6 (n. tongue). Standard written Mandarin: 舌
 * 癐/攰 gui6 (adj. tired). Standard written Mandarin: 累
 * 埞 deng6 (n. place) often followed by 方 to form 埞方. Standard written Mandarin: 地方

The words represented by these characters are sometimes cognates with pre-existing Chinese words. However, their colloquial Cantonese pronunciations have diverged from formal Cantonese pronunciations. For example, 無 ("without") is normally pronounced mou4 in literature. In spoken Cantonese, 冇 mou5 has the same usage, meaning, and pronunciation as 無, except for tone. 冇 represents the spoken Cantonese form of the word "without", while 無 represents the word used in Classical Chinese and Mandarin. However, 無 is still used in some instances in spoken Cantonese, such as 無論如何 ("no matter what happens"). Another example is the doublet 來/嚟, which means "come". 來 loi4 is used in literature; 嚟 lei4 is the spoken Cantonese form.
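The correspondences in the list above can be treated as a small lookup table. The sketch below is a rough, hypothetical illustration (the names `CANTONESE_TO_STANDARD` and `gloss` are invented here): it substitutes the listed Cantonese forms with one of their Standard written Mandarin counterparts. Where the list offers several counterparts, one is picked arbitrarily. Plain substitution produces a word-level gloss, not grammatical Mandarin.

```python
# Cantonese form -> one Standard written Mandarin counterpart, taken from the
# list above. Where several counterparts are listed, one is picked arbitrarily.
CANTONESE_TO_STANDARD = {
    "乜嘢": "什麼",
    "埞方": "地方",
    "冇": "沒有",
    "係": "是",
    "佢": "他",
    "畀": "給",
    "俾": "給",
    "靚": "漂亮",
    "瞓": "睡",
    "攞": "拿",
    "拎": "拿",
    "脷": "舌",
    "癐": "累",
    "攰": "累",
}


def gloss(text: str) -> str:
    """Replace listed Cantonese forms with their Standard written counterparts."""
    # Longer keys go first so that two-character entries such as 乜嘢 are
    # replaced as a unit. Anything not in the table is left unchanged.
    for key in sorted(CANTONESE_TO_STANDARD, key=len, reverse=True):
        text = text.replace(key, CANTONESE_TO_STANDARD[key])
    return text


if __name__ == "__main__":
    print(gloss("佢好靚"))
```

Characters outside the table pass through untouched, so a word such as 嘢 on its own stays as written.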

Workarounds
Though most Cantonese words can be found in the current encoding system, input workarounds are commonly used both by those unfamiliar with the characters and by those whose input methods do not allow for easy input (similar to how some Russian speakers might write in the Latin script if their computing device lacks the ability to input Cyrillic). Some Cantonese writers use simple romanization (e.g., D for 啲), symbols (a Latin letter "o" placed in front of another Chinese character; e.g., 㗎 is defined in Unicode but will not display if no supporting font is installed on the device in use, hence the proxy o架 is often used), homophones (e.g., 果 for 嗰), and Chinese characters which have different meanings in Mandarin (e.g., 乜, 係, 俾). For example,