Chinese character internal structures

The study of Chinese character forms examines the external structure of Chinese characters, i.e. strokes, components and whole characters, and their structural relations purely in terms of form or appearance. The study of the internal structure of Chinese characters (Pinyin: hànzì nèibù jiégòu; Traditional Chinese: 漢字的內部結構; Simplified Chinese: 汉字内部结构) examines the relationship between the forms, sounds and meanings of characters, thereby explaining the rationale for character formation.

In the analysis of internal structures, Chinese characters are decomposed into internal structural components according to their relations with the sound and meaning of the character.

Internal structural components
The character-building units obtained by analyzing the external structure of Chinese characters are external structural components. In internal structure analysis, characters are analyzed according to the rationale of character formation, and the basic units of character formation are internal structural components, or internal components for short, also called pianpang (偏旁) or characters (字符).

In most cases, the internal components of a Chinese character are the same as the first-level components of its external structure. For example, the character 江 is decomposed into 氵 and 工 in both analyses. However, they are not always the same. For example, the character "腾" is decomposed according to its internal structure as "semantically related to '马' and phonetically related to '朕'", in a semi-surrounding structure, while external analysis simply splits it according to its left-right structure. The external splitting method is used only when a character cannot be decomposed according to the rationale of character formation.

According to their sound-meaning relationship with the whole character, internal components can be classified into three categories: semantic component (義符, 义符, 意符, 意旁 or 形旁), phonetic component (音符, 音旁 or 聲旁) and pure (form) component (記號, 符號).


 * 1) Any component related to the meaning of the character is a semantic component. For example: component "扌" (hand) in characters "推"(push) and "拉" (pull), and "心" (heart) in "思" (think) and "想" (think).
 * 2) A component related to the pronunciation of the character is a phonetic component. For example, "包" (bāo) in "抱" (bào) and "苞" (bāo).
 * 3) A pure (form) component is related neither to the meaning nor to the pronunciation of the character. For example: "多" (duō) in "移" (yí, move), and "立" (lì, stand) in "拉" (lā, pull).

Traditional internal structural classification
In Shuowen Jiezi, Xu Shen proposed six categories (六書; liùshū; 'Six Writings') for the formation of Chinese characters:
 * Pictograms (象形; xiàngxíng; 'form imitation') – A pictographic character consists of one semantic component which is a drawing of the object it represents, such as: 日 (sun) and 月 (moon). When created, character 日 was a simplified picture of the sun and 月 was like the moon.
 * Simple ideograms (指事; zhǐshì; 'indication') express an abstract idea with an iconic form, such as: 一 (one), 二 (two), 三 (three), 上 (up) and 下 (down). The whole character is a semantic component.
 * Compound ideographs (會意; huìyì; 'joined meaning') combine two or more semantic components to suggest the meaning of the character, for example: 武 ('military', formed from 戈 (dagger-axe) and 止 (foot)) and 信 ('truthful', formed from 人 (person, later reduced to 亻) and 言 (speech)).
 * Phono-semantic compound characters (形声; 形聲; xíngshēng; 'form and sound') consist of a phonetic component and a semantic component, for example, 江 (river, semantic 氵, phonetic 工) and 河 (river, semantic 氵, phonetic 可).
 * Derivative cognates (轉注/转注; zhuǎnzhù; 'reciprocal meaning') form the smallest category and are also the least understood. In the postface to the Shuowen Jiezi, Xu Shen gave as an example the characters 考 (kǎo, verify) and 老 (lǎo, old), which had similar Old Chinese pronunciations and may have had the same etymological root, meaning "elderly person", but became lexicalized into two separate words.
 * Rebus (phonetic loan) characters (假借; jiǎjiè; 'borrowing, making use of') are characters that are "borrowed" to write another morpheme which is pronounced the same or nearly the same. Xu Shen's examples are the characters 令 (order) and 長 (long), which were borrowed to write official titles. The whole loan character is a phonetic component.

Modern internal structural classification
The traditional Six Writings classification presupposed that each component in a Chinese character represents either the sound or the meaning of the character. However, after the long evolution of the Chinese writing system, quite a few components can no longer play these roles effectively. For example, the component 又 in the characters 邓 and 鸡 represents neither sound nor meaning, and has become a pure form component.

From the internal structure point of view, modern Chinese characters are composed of semantic components, phonetic components and pure form components. These three types of components are used in combination to form the seven structures of modern Chinese characters: semantic component characters, phonetic component characters, pure form component characters, semantic-phonetic characters, semantic-form characters, phonetic-form characters, and semantic-phonetic-form characters.
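The seven structures are the non-empty combinations of the three component types. The following is a minimal Python sketch of that enumeration; the names COMPONENT_TYPES, structure_name and STRUCTURES are illustrative assumptions, not terminology from the literature.

```python
from itertools import combinations

# The three internal component types named in the text.
COMPONENT_TYPES = ("semantic", "phonetic", "form")


def structure_name(types):
    """Build a label for a combination of component types (hypothetical helper)."""
    if types == ("form",):
        return "pure form component character"
    if len(types) == 1:
        return f"{types[0]} component character"
    return "-".join(types) + " character"


# Every non-empty subset of the three types: 3 singles + 3 pairs + 1 triple.
STRUCTURES = [
    structure_name(combo)
    for size in range(1, len(COMPONENT_TYPES) + 1)
    for combo in combinations(COMPONENT_TYPES, size)
]

for name in STRUCTURES:
    print(name)
```

The list has seven entries, in the same order as the structures listed above.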

Semantic component characters
Semantic component characters, or simply semantic characters, are composed of semantic components.

Single semantic component characters
Single semantic component characters are composed of one semantic component, and most of them correspond to pictograms and simple ideograms in the traditional six writings. For example:
 * 田 (field), 井 (well), 門 (door), 网 (net) are ancient pictograms, and 门 (door), 伞(umbrella) are modern pictograms.
 * 一 (one), 二 (two), 三 (three), 刃 (blade) are ancient simple ideograms, and 丫(branch, fork), 凹 (concave), 凸 (convex), and 串 (string) are later simple ideograms.

Multi-semantic component characters
Multi-semantic component characters are composed of two or more semantic components. They include the compound ideographs of the traditional Six Writings.

Most multi-semantic component characters contain two semantic components, for example,
 * 信 (trust): semantic components 人 (people) and 言 (words), trust what people say.
 * 尖 (pointed, sharp, tip): 小 (small) at the top and 大 (large) at the bottom.
 * 拿 (take): 合 (close) your 手 (hands) together to take.

Some characters are composed of three semantic components, for example,


 * 掰 (break apart): Separate (分) something with both hands (手).
 * 晶 (brilliant, crystal): three 日 (suns) are very bright.

Some characters repeat the same semantic components, for example,


 * 从 (follow): Indicates that one 人 (person) follows another person.
 * 炎 (flame): Two 火 (fire) represents rising flames.
 * 森 (large forest): Three 木 (trees) means there are many trees.

Some are simplified characters:


 * 尘 (dust): 小 (small) 土 (soil) particles, representing dust.
 * 灭 (extinguish): Use 一 (like a cover) to suppress 火 (fire).
 * 泪 (tears): 氵(water) from 目 (eyes).

There are some special cases:

 * 叵 (cannot): 可 (can) turned to the opposite (right) side (Shuowen).
 * 乌/烏 (crow, a pure black bird): 鳥/鸟 (bird) written without the eye, because the eyes of a crow cannot be seen (note by Duan Yucai).
 * 冇 (none, not have): 有 (have) with "二" (the content) taken away.

Phonetic component characters
A phonetic component character, or phonetic character for short, is composed of one or more phonetic components.

A single phonetic component character may be used to express a phonetic-loan meaning while its original or basic meaning is still understood. For example, the character "花" meaning "to spend" has the same pronunciation as "花" in its original meaning of "flower", so the latter can be regarded as the phonetic component of the former. A single phonetic component character can also represent a syllable in a transliterated foreign word, for example, the characters in the words "打" (dá, dozen) and "馬達" (mǎdá, motor).

Multi-phonetic component characters were produced during the development of the writing system. For example:

"新" (xīn) was originally a semantic-phonetic character, but its modern meaning of "new" has nothing to do with the original semantic component "斤" (jīn, 0.5 kg), although the sounds are similar. "新" (xīn) therefore now has two phonetic components: "亲" (qīn) and "斤" (jīn).

"耻" (chǐ, shame) used to be written as "恥", which is a semantic-phonetic character. Since the semantic component 心 (heart) has become 止 (zhǐ, stop), "耻" (chǐ) now has two phonetic components, "耳" (ěr) and "止" (zhǐ).


 * 乒乓 (pīngpāng, ping pong): both the forms and the sounds of the two characters are derived from 兵 (bīng, soldier), which has a similar sound.

Pure form component characters
A pure form character is composed of one or more form components, which neither represent the sound nor the meaning of the characters.

Single-component characters
These characters are composed of a single pure form component. Many of them were originally ancient pictographic characters, but due to the evolution of the glyphs, they no longer look like the objects they represent. After tracing such characters to their origin, it is easy to associate them with the things they represent and obtain the correct meanings. For example:
 * 日: The 日 character in modern regular script is no longer of round shape.
 * 月: It has become a ladder shape.
 * 魚: Not quite like a fish now.

Some characters with single form components are borrowed characters from ancient times. For example:
 * 我: In oracle bone script it resembles a weapon with a saw-like blade; it was later borrowed as a first-person pronoun, and the original meaning is lost in modern Chinese.
 * 方: According to Shuowen Jiezi, the original meaning is a kind of boat; the character has been borrowed to express the shape "square".
 * 而: The ancient character resembled a beard; it has been borrowed as a conjunction in modern Chinese.

Some combined characters have been simplified and become single form component characters.
 * 广: The traditional Chinese character is "廣".
 * 农: Traditional character "農".
 * 书: Traditional character "書".
 * 专: Traditional character "專".
 * 门: Traditional character "門".

Multi-component characters
These characters consist of two or more pure form components. Some of these characters came from ancient pictographic characters, but later became non-pictographic. For example,
 * 角 (horn): In oracle bone script this character looks like an ox horn.
 * 鼎 (tripod): The oracle bone form has the shape of a tripod.
 * 鹿 (deer): The oracle bone form resembles a deer.

Some came from ancient semantic-phonetic characters, and the semantic and phonetic components of these characters have lost their functions. For example:
 * 騙 (piàn): It originally meant to jump onto a horse. It now means deception, and the semantic component 馬 (horse) and the phonetic component 扁 (biǎn) have become pure form components.
 * 特 (tè): With semantic 牛 (ox) and phonetic 寺 (sì), it originally referred to a bull, but now it means "special" and "unusual", and both components are pure form components.
 * 穌 (sū): semantic 禾 (crop) and phonetic 魚 (yú, fish). Duan Yucai's note in Shuowen says: "If the grain is scattered, pick it up with a loaf." The character now expresses the meaning of awakening, or is used in personal names.

Some are simplified characters. For example:
 * 头 (head): The traditional Chinese character is “頭”, semantic component 頁 and phonetic component 豆. The simplified character component 大 and the two dots are pure form components.

Some are from ancient ideographic characters. For example:
 * 射 (shoot): The character "射" in oracle bone and bronze inscriptions resembles a bow being drawn to shoot; now neither 身 nor 寸 expresses the sound or meaning of 射.
 * 至 (to): In oracle bone script it resembles an arrow hitting the ground. The original meaning can no longer be seen in the current glyph, let alone the modern meaning of the character.

Semantic-phonetic characters
Semantic-phonetic characters (also called "phono-semantic characters", 意音字, 形聲字) consist of semantic components and phonetic components. The semantic component indicates the category of word meaning, and the phonetic component indicates (or prompts) the pronunciation of the character.

The phonetic components of some semantic-phonetic characters are of exactly the same pronunciation as the whole character. For example,


 * 搬 (bān, move): 般 (bān).
 * 銅 (tóng, copper): 同 (tóng).
 * 辯 (biàn, debate): 辡 (biàn).

In other cases, the character and its phonetic component have the same sound except for the tone. For example,
 * 巍 (wēi, tall): 魏 (wèi).
 * 拥 (yōng, embrace): 用 (yòng).
 * 帳 (zhàng, account): 長 (zhǎng).

According to the experiment by Li (1993), among the 7,000 characters in the "Modern Chinese Common Character List", 5,631 have semantic-phonetic structures. Considering that there are 479 polyphonic characters, the number of semantic-phonetic structures increases accordingly to 6,110. Among them, 2,292 items have characters and phonetic components with the same pronunciation and tone, accounting for 37.51%, and 1,110 items have the same pronunciation but different tones, accounting for 18.17%.
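The percentages reported for Li (1993) can be checked with a short calculation. The sketch below assumes that the 479 polyphonic characters are counted as additional items on top of the 5,631 characters, which is what the total of 6,110 implies; the variable and function names are illustrative.

```python
# Figures reported in the text for Li (1993).
semantic_phonetic_chars = 5631  # semantic-phonetic characters among the 7,000
polyphonic_extra = 479          # assumed to be counted as additional items
same_sound_and_tone = 2292      # component matches in both sound and tone
same_sound_diff_tone = 1110     # component matches in sound but not in tone

total_items = semantic_phonetic_chars + polyphonic_extra


def share(count, total):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


print(total_items)                               # 6110
print(share(same_sound_and_tone, total_items))   # 37.51
print(share(same_sound_diff_tone, total_items))  # 18.17
```

Taken together, the two groups cover a little over half of the 6,110 items.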

The phonetic components of some characters are also semantic components. For example:
 * 娶 (qǔ, marry (a wife)): semantic 女 (female), phonetic 取 (qǔ), 取(take) also express the meaning.
 * 駟 (sì, four horses of a cart): semantic 馬 (horse), phonetic 四 (sì), 四 (four) also represents meaning.
 * 懈 (xiè, slack): 解 (xiè, scattered) represents sound and meaning.

Some phonetic or semantic components have some parts omitted. For example:
 * 珊 (shān, coral): 冊 is 删 (shān) with the right part omitted.
 * 氮 (dàn, nitrogen): 炎 is 淡 (dàn) with the left part omitted.
 * 夜 (yè, night): semantic 夕, the rest is 亦 (yì) with some strokes omitted.
 * 耆 (qí, a senior over sixty years old): semantic 老 with the lower part omitted, phonetic 旨.

There are six combinations of semantic components and phonetic components:


 * Left meaning (semantic) and right sound (phonetic), such as 肝 (gān, liver), 惊 (jīng, fear), 湖 (hú, lake);
 * Right meaning and left sound, such as 鵡 (wǔ, parrot), 剛 (gāng, firm), 甥 (shēng, nephew);
 * Upper meaning and lower sound: 霖 (lín, rain), 茅 (máo, grass) and 竿 (gān, pole);
 * Lower meaning and upper sound: 盂 (yú, bowl), 岱 (dài, Mount Tai), 鯊 (shā, shark);
 * Outer meaning and inner sound: 癢 (yǎng, itch), 園 (yuán, garden), 衷 (zhōng, heart), 座 (zuò, seat), 旗 (qí, flag);
 * Inner meaning and outer sound: 辮 (biàn, braid), 悶 (mèn, dull), 摹 (mó, imitation).

Modern character-making mainly inherits traditional character-making methods, but there are also innovations, such as combining the sounds and forms of two characters. For example,


 * 甭 (béng, no need): its meaning is 不用 (bùyòng, no need), and its sound is derived from "bùyòng".
 * 巰 (qiú, compound of hydrogen and sulfur), 氫 (qīng, hydrogen) + 硫 (liú, sulfur), with parts omitted.

Semantic-phonetic characters account for more than 90% of ancient Chinese characters. According to statistics, among the 7,000 modern common characters of the simplified Chinese writing system, semantic-phonetic characters account for only 56.7%. The proportion in the traditional Chinese character system is slightly higher.

Semantic-form characters
Semantic-form characters are composed of semantic components and pure form components. They are also called semi-semantic, semi-pure-form characters.

Many of these characters were originally semantic-phonetic characters. Due to subsequent changes in the pronunciation of the phonetic components or of the characters, the phonetic components could not effectively represent the pronunciation of the character and became pure form components. For example:


 * 布 (bù, cloth): used to have semantic component 巾 (scarf) and phonetic 父 (fù); the phonetic component is no longer written as 父.
 * 江 (jiāng, river): used to have semantic 水 (water) and phonetic 工; in modern Mandarin, 工 (gōng) no longer represents the pronunciation of 江 (jiāng).
 * 急 (jí, urgent): used to have semantic 心 (heart) and phonetic 及 (jí); the upper component no longer looks like 及.

Due to the simplification of Chinese characters, some phonetic components are no longer effective. For example:


 * 灿(càn, brilliant), not read as 山 (shān).
 * 鸡(jī, chicken), not read as 又 (yòu).
 * 环 (huán, ring), not read as 不 (bù).

Some are modified from ancient pictographic characters. For example:
 * 栗 (lì, chestnut): The upper part of the ancient Chinese character resembles the fruit on a chestnut tree. Now 覀 is a pure form component.
 * 泉 (quán, spring): The oracle bone character looks like water flowing out of a cave. Now it consists of the components 白 (bái, white) and 水 (shuǐ, water), and 白 is a pure form component.
 * 桑 (sāng, mulberry): In oracle bone script the upper part of 桑 resembles lush branches and leaves. The current 叒 is a pure form component.

Phonetic-form characters
Phonetic-form characters are composed of phonetic components and pure form components. This type of characters mainly comes from ancient semantic-phonetic characters, and the semantic components lost their semantic roles and became pure form components. For example,
 * 球 (qiú, ball): Originally referred to a kind of beautiful jade, with semantic component 王 (玉, jade). Later it was borrowed to represent a ball and then extended to any round three-dimensional object, so 王 (jade) became a pure form component, while 求 (qiú) remains a phonetic component.
 * 笨 (bèn, stupid): Originally referred to the inner white layer of bamboo, with semantic component 竹 (bamboo) and phonetic 本 (běn). Later the character was borrowed for its sound to mean stupid.
 * 华 (huá): This is a simplified character with phonetic 化 (huà); 十 is a pure form component.

Semantic-phonetic-form characters
A semantic-phonetic-form character consists of all three kinds of components: semantic, phonetic and pure form components. For example,
 * 岸 (àn, bank, shore): originally had semantic component ⿱山厂 and phonetic 干 (gàn). In modern Chinese, ⿱山厂 is not a character or radical with a sound or meaning, but 山 can still express meaning, while 厂 is a pure form component.
 * 聽 (tīng, listen): semantic 耳 (ear) and phonetic 壬 (tǐng). In modern Chinese characters, the right part has become a pure form component.

Semantic-phonetic-form characters are very rare, and the examples above are not very persuasive. Whether this can be justified as an internal structural category remains to be studied further. (If not, the classification above can also be called the "New Six Writings".)

Statistics on the internal structures of modern Chinese characters
According to Yang, among the 3,500 frequently used Chinese characters in his experiment, semantic component characters are the smallest group, accounting for about 5%; pure form component characters account for about 18%; and semantic-form and phonetic-form characters account for about 19%. The largest group is semantic-phonetic characters, accounting for about 58%.

The rationality of characters
Using writing to record a language means establishing a fixed connection between written symbols and the words of the language. If this connection is arbitrary, it is irrational; if there is a reason for it, it is rational.

English is written mainly with a phonetic script, and its rationality lies mainly in using combinations of letters to represent the pronunciation of the corresponding words. Chinese characters are a phonetic-semantic script, and their rationality is mainly reflected in the use of phonetic components to express sounds and semantic components to express meanings.

Generally speaking, characters with higher rationality are easier to learn and use, because the irrational parts often require rote memorization. There are thousands of modern Chinese characters, and it is unrealistic to require every character to have high rationality.

Phonetic and semantic components are related to the pronunciation and meaning of the character, so they are rational; pure form components are related to neither the pronunciation nor the meaning of the character, so they are irrational. Therefore, semantic component characters, phonetic component characters and semantic-phonetic characters are rational characters, pure form component characters are irrational characters, and the other characters are semi-rational characters.
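The component-level rule (semantic and phonetic components are rational, pure form components are not) extends to whole characters as follows. This is a minimal Python sketch under that reading; the function name rationality_class and the type labels are illustrative assumptions, and the sample analyses are taken from examples given earlier in this article.

```python
# Component types that carry sound or meaning.
RATIONAL_TYPES = {"semantic", "phonetic"}


def rationality_class(component_types):
    """Classify a character from the types of its internal components."""
    if not component_types:
        raise ValueError("a character needs at least one component")
    rational = [t in RATIONAL_TYPES for t in component_types]
    if all(rational):
        return "rational"
    if not any(rational):
        return "irrational"
    return "semi-rational"


# Component analyses as described earlier in the article.
examples = {
    "信": ["semantic", "semantic"],  # 人 + 言
    "娶": ["semantic", "phonetic"],  # 女 + 取
    "球": ["form", "phonetic"],      # 王 + 求
    "头": ["form", "form"],          # 大 + two dots
}

for char, types in examples.items():
    print(char, rationality_class(types))
```

Under this rule 信 and 娶 come out as rational, 球 as semi-rational and 头 as irrational.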

Su defined the rationality of a character set as the proportion of rational components in all the internal components.

The formula is: Rationality = (actual rationality value)/(maximum rationality value).

Professor Su's preliminary experimental results showed the rationality of modern Chinese characters to be about 50%, which is far lower than that of ancient Chinese characters.
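The rationality formula can be illustrated with a small calculation. The text does not spell out how the actual and maximum values are obtained, so the sketch below makes an assumption consistent with the definition as a proportion of rational components: each semantic or phonetic component scores 1, each pure form component scores 0, and the maximum is the total number of components. The function name set_rationality and the four-character sample are hypothetical.

```python
RATIONAL_TYPES = ("semantic", "phonetic")


def set_rationality(characters):
    """Rationality of a character set: rational components / all components.

    `characters` maps each character to the list of its component types.
    The scoring (1 per rational component) is an assumption of this sketch.
    """
    actual = sum(
        1
        for types in characters.values()
        for t in types
        if t in RATIONAL_TYPES
    )
    maximum = sum(len(types) for types in characters.values())
    if maximum == 0:
        raise ValueError("the character set has no components")
    return actual / maximum


# A toy set built from examples in this article, not real survey data.
sample = {
    "娶": ["semantic", "phonetic"],
    "信": ["semantic", "semantic"],
    "球": ["form", "phonetic"],
    "头": ["form", "form"],
}

print(set_rationality(sample))  # 5 rational components out of 8
```

A real measurement would need a component analysis of every character in the set, which this toy sample does not attempt.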

Rational characters are easier to learn and often arouse students' interest. In modern Chinese teaching, the traceability method is often used to enhance the rationality of Chinese characters. For example, "日 (sun), 月 (moon), 山 (mountain), 水 (water), 牛 (cow), 羊 (sheep), 网 (net), 木 (wood), 目 (eye), 門 (door) and 刀 (knife)" are all pictographic characters from an etymological point of view. If the teacher provides some etymological analysis together with the evolution of the glyphs, he or she may achieve twice the result with half the effort.

The traceability analysis mentioned here is only for the convenience of teaching and is not a comprehensive analysis of the origin and evolution of Chinese characters. There is no need to trace the origin of characters that can be explained from their current form; the method should be employed only for characters whose rationale cannot be seen from the current form and whose origin is easy to trace.