Chinese character strokes



Strokes are the smallest structural units making up written Chinese characters. In the act of writing, a stroke is defined as a movement of a writing instrument on a writing material surface, or the trace left on the surface from a discrete application of the writing implement. The modern sense of discretized strokes first came into being with the clerical script during the Han dynasty. In the regular script that emerged during the Tang dynasty—the most recent major style, highly studied for its aesthetics in East Asian calligraphy—individual strokes are discrete and highly regularized. By contrast, the ancient seal script has line terminals within characters that are often unclear, making them non-trivial to count.

Study and classification of strokes is useful for understanding Chinese character calligraphy, ensuring character legibility, identifying fundamental components of radicals, and implementing support for the writing system on computers.

Evolution
The terminals of the individual marks in ancient character forms are often unclear, and it is sometimes nontrivial to count them. The modern notion of discretized strokes did not fully emerge until clerical script.

Purpose
The study and classification of strokes is used for:
 * 1) understanding Chinese character calligraphy – the correct method of writing, shape formation and stroke order required for character legibility;
 * 2) understanding stroke changes according to the style that is in use;
 * 3) defining stroke naming and counting conventions;
 * 4) identifying fundamental components of Han radicals; and
 * 5) their use in computing.

Formation
When writing Han radicals, a single stroke includes all the motions necessary to produce a given part of a character before lifting the writing instrument from the writing surface; thus, a single stroke may have abrupt changes in direction within the line. For example:
 * Vertical (shù) is classified as a basic stroke because it is a single stroke that forms a line moving in one direction.
 * Vertical – Horizontal – Vertical (shù zhé zhé) is classified as a compound stroke because it is a single stroke that forms a line that includes one or more abrupt changes in direction. This example is a sequence of three basic strokes written without lifting the writing instrument, such as the ink brush, from the writing surface.

Direction
All strokes have direction. They are unidirectional and start from one entry point. As such, they are usually not written in the reverse direction by native users.

Types
CJK strokes are an attempt to identify and classify all single-stroke components that can be used to write Han radicals. There are some thirty distinct types of strokes recognized in Chinese characters, some of which are compound strokes made from basic strokes. The compound strokes comprise more than one movement of the writing instrument, and many of these have no agreed-upon name.

Basic strokes
A basic stroke is a single calligraphic mark moving in one direction across a writing surface. The following table lists a selection of basic strokes divided into two stroke groups: simple and combining. "Simple strokes" (such as Horizontal / Héng and Dot / Diǎn) can be written alone. "Combining strokes" (such as Bend / Zhé and Hook / Gōu) never occur alone, but must be paired with at least one other stroke forming a compound stroke. Thus, they are not in themselves individual strokes.

Note that the basic stroke Diǎn "Dot" is rarely a real dot. Instead it usually takes the shape of a very small line pointing in one of several directions, and may be long enough to be confused with other strokes.

Compound strokes
A compound stroke (also called a complex stroke) is produced when two or more basic strokes are combined in a single stroke written without lifting the writing instrument from the writing surface. The character 永 (pinyin: yǒng) "eternity", described in more detail in the section Eight Principles of Yong, demonstrates one of these compound strokes. The centre line is a compound stroke that combines three stroke shapes in a single stroke.

In most cases, concatenating basic strokes forms a compound stroke. For example, Vertical / Shù combined with Hook / Gōu produces Vertical–Hook / Shù Gōu. A stroke naming convention sums the names of the basic strokes in writing order.

An exception to this applies when a stroke makes a strictly right-angle turn in the Simplified Chinese names. Horizontal (Héng) and Vertical (Shù) strokes are identified only once, when they appear as the first stroke of a compound; any successive 90° turn down or to the right within a single stroke is indicated by a Bend 折 (pinyin: zhé). For example, an initial Shù followed by an abrupt turn right produces Shù Zhé. In the same way, an initial Shù followed by an abrupt turn right and then a second turn down produces Shù Zhé Zhé. Their inherited names, however, are "Vertical–Horizontal" and "Vertical–Horizontal–Vertical"; the inherited names do not need "Bend".

Nearly all complex strokes can be named using this simple scheme.
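
The naming scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PINYIN` table is a hand-picked subset of stroke names, the function name `compound_name` is hypothetical, and every Héng or Shù after the first component is assumed to be a right-angle turn that the Simplified Chinese names record as Zhé.

```python
# Pinyin names for a small, illustrative subset of basic strokes (assumption).
PINYIN = {
    "heng": "Héng", "shu": "Shù", "zhe": "Zhé",
    "gou": "Gōu", "ti": "Tí", "pie": "Piě", "wan": "Wān",
}


def compound_name(parts):
    """Name a compound stroke from its basic strokes, given in writing order.

    The first component keeps its own name. Any later "heng" or "shu" is
    treated as a right-angle turn and named Zhé (Bend).
    """
    names = []
    for index, part in enumerate(parts):
        if index > 0 and part in ("heng", "shu"):
            part = "zhe"
        names.append(PINYIN[part])
    return " ".join(names)


print(compound_name(["shu", "gou"]))          # Shù Gōu
print(compound_name(["shu", "heng"]))         # Shù Zhé
print(compound_name(["shu", "heng", "shu"]))  # Shù Zhé Zhé
```

Strokes whose bends are far from 90° would need extra handling that this sketch does not attempt.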

Nomenclature
Organization systems used to describe and differentiate strokes may include the use of roman letters, Chinese characters, numbers, or a combination of these devices. Two methods of organizing CJK strokes are by:


 * Classification schemes that describe strokes by a naming convention or by conformity to a taxonomy; and
 * Categorization schemes that differentiate strokes by numeric or topical grouping.

In classification schemes, stroke forms are described, assigned a representative character or letterform, and may be arranged in a hierarchy. In categorization schemes, stroke forms are differentiated, sorted and grouped into like categories; categories may be topical, or assigned by a numeric or alpha-numeric nominal number according to a designed numbering scheme.

Benefits
Organizing strokes into a hierarchy aids a user's understanding by bringing order to an opaque system of writing that has evolved organically over centuries. In addition, the process of recognizing and describing stroke patterns promotes consistency of stroke formation and usage. When organized by naming convention, classification allows a user to find a stroke quickly in a large stroke collection, makes it easier to detect duplication, and conveys meaning when comparing relationships between strokes. When organized by numbering scheme, categorization aids a user in understanding stroke differences, and makes it easier to make predictions, inferences and decisions about a stroke.

Limitations
Strokes are described and differentiated by their visual qualities, which can require subjective interpretation. CJK strokes cannot be placed into a single definitive classification scheme, because there is no universal consensus on the description and number of basic and compound forms. Nor can they be placed into a single definitive categorization scheme, because visual ambiguity between strokes prevents them from being segregated into mutually exclusive groups. Other factors inhibiting organization based on visual criteria are the variation of writing styles and the changes of appearance that a stroke undergoes within various characters.

Pinyin naming convention in Unicode standard
A naming convention is a classification scheme where a controlled vocabulary is used systematically to describe the characteristics of an item. The naming convention for a CJK stroke is derived from the path mark left by the writing instrument. In this instance, the first letters of the stroke components, transliterated with pinyin pronunciation, are concatenated to form a stroke name, with the sequence of letters indicating the basic strokes or stroke components used to create the CJK stroke. This system is used in the Unicode standard when encoding CJK stroke characters. In a basic stroke example, H represents the stroke named Héng; in a compound example, HZT represents Héng Zhé Tí.
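
As a rough illustration of this letter-concatenation convention, the following Python sketch builds a stroke name from pinyin component names. The `INITIALS` table covers only a subset of components and the function name `stroke_name` is hypothetical; the names used in the Unicode standard include further letters and some irregular cases that this sketch does not model.

```python
# Initial letters for a subset of stroke components (assumed, not exhaustive).
INITIALS = {
    "heng": "H", "ti": "T", "shu": "S", "pie": "P", "dian": "D",
    "na": "N", "zhe": "Z", "wan": "W", "gou": "G",
}


def stroke_name(components):
    """Concatenate the pinyin initials of the components, in writing order."""
    return "".join(INITIALS[component] for component in components)


print(stroke_name(["heng"]))               # H
print(stroke_name(["heng", "zhe", "ti"]))  # HZT
print(stroke_name(["shu", "gou"]))         # SG
```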

While no consensus exists, there are up to 12 distinct basic strokes that are identified by a unique radical.

There are many CJK compound strokes; however, there is no consensus on the letter-sequence naming of compound strokes using the basic strokes. The following table demonstrates one of the CJK stroke naming conventions:

Besides, some strokes have been unified or abandoned in Unicode:

Note that some names in the list do not follow the rules of controlled vocabulary. For example, stroke P (Piě) is not found in the compound stroke PN. The name "PN" comes from 平捺 (pinyin: Píng Nà), not 撇捺 (pinyin: Piě Nà). The meaning of 平 (pinyin: Píng) is "flat", and it should be called "BN" 扁捺 (pinyin: Biǎn Nà) if the rules are to be followed closely. The letter "Z" in stroke SWZ means 左 (pinyin: Zuǒ), not 折 (pinyin: Zhé). The meaning of 左 is "left", and it is not defined in the naming convention. Moreover, some 折 (pinyin: Zhé) strokes are far more than or far less than 90°, such as stroke HZZZG, stroke HZZP and stroke PZ.

Some strokes are not included in the Unicode standard.

In Simplified Chinese, stroke TN is usually written in a variant form (it was called "stroke DN", but Unicode has rejected it).

English abbreviated naming conventions
Another naming convention uses abbreviated forms of the English names for CJK strokes. The first letters of the English names are used in the naming system. The controlled vocabulary can be divided into two groups.

The first group is the abbreviated forms of the basic strokes.

The second group is the abbreviated forms of deformations used to form compound strokes.

“Zag” can be omitted in the naming system. The following table demonstrates the English abbreviation naming convention:

Numbering scheme
A numbering scheme is a categorisation method where similar strokes are grouped into categories labeled by nominal numbers. Category numbering may be an index of numbers of types, with sub-types indicated by a decimal point followed by another number or a letter.

The following table is a common numbering scheme that uses names similar to those of the Roman letter naming convention, but the stroke forms are grouped into major category types (1 to 5), with category 5 further broken down into 25 sub-types.

Some strokes are not included in the numbering scheme.

In addition, some strokes are grouped differently from the Unicode standard: a stroke that is merged into one stroke in the Unicode system may be merged into a different stroke in this numbering scheme.

Number of strokes
Stroke number or stroke count is the number of strokes making up a character. Stroke count plays an important role in Chinese character sorting, teaching and computer information processing. Stroke numbers vary dramatically from character to character: for example, characters such as 一 and 乙 have only one stroke, while the character 齉 has 36 strokes, and 龘 (a composition of 龍 in triplicate) has 48. The Chinese character with the most strokes in the entire Unicode character set is 𪚥 (the aforementioned 龍 in quadruplicate) with 64 strokes.

Counting strokes
There are effective methods to count the strokes of a Chinese character correctly. First of all, stroke counting is carried out on the standard regular script form of the character, according to its stroke order. If needed, a standard list of strokes or list of stroke orders issued by the relevant authoritative institution should be consulted.

If two strokes are connected at the endpoints, whether they are separated into two strokes or linked into one stroke can be judged by the following rules:


 * 1) If the two strokes are connected in the upper left corner of a character or component, count them as two separate strokes.
 * Example stroke orders: ㇐㇓, ㇑㇕㇐ and ㇑㇕㇐㇐.
 * 2) If they are connected in the upper right corner, count them as one stroke.
 * Example stroke orders: ㇑㇕㇐, ㇓㇆㇐㇐ and ㇓㇆㇑㇕㇐.
 * 3) If they are connected in the lower left corner of a fully enclosed structure, count them as two separate strokes.
 * Example stroke orders: ㇑㇕㇐, ㇑㇕㇑㇕㇐㇐ and ㇑㇕㇐㇑㇐.
 * If the structure is not fully enclosed, count them as one stroke.
 * Example stroke orders: ㇑㇗㇑, ㇐㇓㇔㇗ and ㇐㇑㇑㇑㇕㇐㇐㇓㇆㇓㇔㇗.
 * Exceptions: (Taiwan: 12511; Mainland: 1515)
 * 4) If they are connected in the lower right corner, count them as two separate strokes.
 * Example stroke orders: ㇑㇕㇐, ㇑㇕㇑㇕㇐㇐ and ㇑㇕㇐㇑㇐.

An important prerequisite for connecting two strokes into one stroke is: the tail of the first stroke is connected with the head of the second stroke.
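
The corner rules above can be condensed into a small decision function. The following Python sketch is one possible reading of those rules; the names `Corner` and `strokes_at_joint` are hypothetical, and the listed exceptions are deliberately not modelled.

```python
from enum import Enum


class Corner(Enum):
    UPPER_LEFT = "upper left"
    UPPER_RIGHT = "upper right"
    LOWER_LEFT = "lower left"
    LOWER_RIGHT = "lower right"


def strokes_at_joint(corner, fully_enclosed=False, tail_to_head=True):
    """Return 1 if two strokes meeting at `corner` count as one stroke, else 2.

    `tail_to_head` encodes the prerequisite that the tail of the first stroke
    meets the head of the second; without it the strokes are never merged.
    """
    if not tail_to_head:
        return 2
    if corner is Corner.UPPER_RIGHT:
        return 1
    if corner is Corner.LOWER_LEFT and not fully_enclosed:
        return 1
    # Upper left, lower right, and lower left of a fully enclosed structure.
    return 2


print(strokes_at_joint(Corner.UPPER_RIGHT))                      # 1
print(strokes_at_joint(Corner.LOWER_LEFT, fully_enclosed=True))  # 2
```

A real stroke counter would still consult a standard stroke-order list, since regional standards disagree on some characters.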

Distribution of characters
The Chart of Standard Forms of Common National Characters is a standard character set of 4,808 characters issued by Taiwan's Ministry of Education. The stroke numbers of characters range from 1 to 32 strokes. The 11-stroke group has the most characters, accounting for 9.297% of the character set. On average, there are 12.186 strokes per character.

The List of Frequently Used Characters in Modern Chinese is a standard character set of 3,500 characters issued by the Ministry of Education of the People's Republic of China. The stroke numbers of characters range from 1 to 24 strokes. The 9-stroke group has the most characters, accounting for 11.857% of the character set. On average, there are 9.7409 strokes per character.

The Unicode Basic CJK Unified Ideographs is an international standard character set issued by ISO and Unicode, and is the same character set as that of China's national standard GB13000.1. There are 20,902 Chinese characters, including simplified and traditional characters from China, Japan and Korea (CJK). The stroke numbers of characters range from 1 to 48 strokes. The 12-stroke group has the most characters, accounting for 9.358% of the character set. On average, there are 12.845 strokes per character.
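
Statistics of this kind (range, largest stroke-count group and its share, average strokes per character) can be derived from any table of per-character stroke counts. The Python sketch below shows the arithmetic on a tiny hand-entered sample; the function name `stroke_count_summary` and the sample dictionary are illustrative assumptions, not data from the character sets above.

```python
from collections import Counter


def stroke_count_summary(stroke_counts):
    """Summarize a mapping of character -> stroke count."""
    counts = list(stroke_counts.values())
    groups = Counter(counts)
    # The stroke count shared by the most characters, and that group's size.
    modal_count, modal_size = groups.most_common(1)[0]
    return {
        "min": min(counts),
        "max": max(counts),
        "modal_count": modal_count,
        "modal_share_percent": round(100 * modal_size / len(counts), 3),
        "mean": round(sum(counts) / len(counts), 3),
    }


# Tiny illustrative sample; a real survey would load a full character set.
sample = {"一": 1, "二": 2, "十": 2, "三": 3, "木": 4}
summary = stroke_count_summary(sample)
print(summary)
```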

Stroke form
Stroke forms are the shapes of strokes. Different classification schemes have different numbers of categories by which one may classify individual strokes.

Two categories
The strokes of modern Chinese characters can be divided into plane strokes and turning or bent strokes.


 * Plane or basic strokes move in only one direction, or only curve gently, usually by less than 90 degrees.
 * Examples: heng, ti, shu, pie, dian, na.


 * Bent strokes are composed of plane strokes and turning points with sharper bends. Bent strokes are also called derived strokes or compound strokes.

Five categories
When the six plane strokes are classified into four categories by putting ti into category heng and na into category dian, then together with the bent stroke category, a five-category system is formed:
 * 1) heng, ti.
 * 2) shu.
 * 3) pie.
 * 4) dian, na.
 * 5) zhe (bent strokes).

Current national standards of the PRC, such as Stroke Orders of Commonly-used Standard Chinese Characters, and many reference works published in China adopt the five categories of strokes and stipulate the heng–shu–pie–dian–zhe stroke-group order. This order is consistent with the stroke order of the character 札 (㇐㇑㇓㇔㇟), and as such is called the "札 order". In Hong Kong and Taiwan, among other places, people also use the group order dian–heng–shu–pie–zhe.

The five basic strokes heng, shu, pie, dian and zhe at the beginning of each group are called main stroke shapes, and the strokes that follow them are called subordinate stroke shapes, or secondary strokes. The name of a category is the name of its main stroke. For example, category heng includes the main stroke heng and the secondary stroke ti.

There are disputes over the classification of the vertical hook stroke among the five types of strokes. In the currently effective national standards, it belongs to category shu, but some language scholars argue that it should be put in the zhe ('bend') category.

Eight categories
In this classification, a new category gou ('hook'), which includes all the strokes with hooks, is split off from the original bend category; together with the six types of plane strokes, an eight-category system is formed:
 * 1) heng (㇐).
 * 2) ti.
 * 3) shu.
 * 4) pie.
 * 5) dian.
 * 6) na.
 * 7) zhe.
 * 8) gou.

Because the character 永 happens to contain strokes similar to each of these eight types, this classification is also called the Eight Principles of Yong.

CJK strokes
The stroke forms of a standard Chinese character set can be classified into a more detailed stroke table (or stroke list), for instance, the Unicode CJK strokes list has 36 types of stroke:

A stroke table is also called a stroke alphabet, whose function in the Chinese writing system is akin to the Latin alphabet for the English writing system.

YES strokes
Another stroke table is the YES Stroke Alphabet, which is used in YES stroke alphabetical order.

Stroke alphabet
This is a list of 30 strokes:

The stroke alphabet is built on the basis of Unicode CJK Strokes and the Standard of Chinese Character Bending Strokes of the GB13000.1 Character Set. There are 30 strokes in total, sorted by the standard plane stroke order of heng, tiao, ti, shu, pie, dian, na and the bending point order of zhe, wan and gou.

Names
The English name is formed by the initial Pinyin letters of each character in the Chinese name, similar to the naming of CJK strokes in Unicode (i.e., H: heng, T: ti/tiao, S: shu, P: pie, D: dian, N: na; z: zhe, w: wan and g: gou).

For more on stroke forms, stroke naming and stroke tables, please visit the previous sections.

Stroke order
The term stroke order can refer to one of two concepts:
 * The direction in which a stroke is written. For example, the heng stroke is made horizontally from left to right, while the shu stroke is written vertically from top to bottom.
 * The order in which strokes are written one by one to form a Chinese character.

Because the direction of strokes is relatively simple, people generally refer to the latter meaning when talking about stroke order.

Certain stroke order guidelines are recommended to ensure speed, accuracy, and legibility in composition, as most Chinese characters have many strokes. As such, teachers enforce exactly one stroke order for each character, marking every deviation as a mistake, so that everyone writes these characters the same way. The stroke order follows a few simple rules, though, which aids in memorizing them. To write CJK characters, one must know how to write CJK strokes, and thus must identify the basic strokes that make up a character.

The most basic rules of stroke order are:
 * 1) Heng, then shu.
 * 2) Pie, then na.
 * 3) Up, then down.
 * 4) Left, then right.
 * 5) Outside, then inside.


The order of strokes is a summary of people's experience in writing Chinese characters correctly and conveniently. It plays an important role in the teaching, sorting and computer information processing of Chinese characters. The stroke order of cursive script is quite flexible and changeable, so the standard of stroke order generally refers to the stroke order of regular script.

The current stroke order standards are
 * China's Stroke Orders of the Commonly-used Standard Chinese Characters, and
 * Taiwan's Handbook of the Stroke Orders of the Commonly-used National Chinese Characters.

Character sorting
Chinese characters can be sorted into different orders by their strokes. Stroke-based sorting methods include Stroke-count sorting, Stroke-order sorting, Stroke-count-stroke-order sorting, and YES sorting.

Stroke-count sorting
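Characters may be sorted by their number of strokes. For example, the distinct characters of a sample text with stroke counts of 5, 6, 8, 10, 12, 12 and 14 are arranged in that ascending order, with the two 12-stroke characters sharing the same position.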

Stroke-order sorting
The characters are first arranged by their first strokes according to an order of stroke groups, such as heng–shu–pie–dian–zhe or dian–heng–shu–pie–zhe; then the characters whose first strokes belong to the same group, if any, are sorted by their second strokes in a similar way, and so on. This method is usually employed to support stroke-count sorting, to deal with characters of the same stroke number. For instance, if one 12-stroke character starts with a stroke of the pie group and another 12-stroke character starts with a stroke of the zhe group, the first goes before the second, because pie comes before zhe in the group order.
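
A minimal Python sketch of stroke-order sorting follows. It assumes each character is represented by the sequence of five-category groups of its strokes; the hand-entered `SAMPLE` sequences and the names `HSPDZ`, `DHSPZ` and `sort_by_stroke_order` are illustrative assumptions rather than data from any standard.

```python
# Two group orders mentioned in the text.
HSPDZ = ["heng", "shu", "pie", "dian", "zhe"]
DHSPZ = ["dian", "heng", "shu", "pie", "zhe"]

# Hand-entered stroke-group sequences for a few simple characters (assumed).
SAMPLE = {
    "十": ["heng", "shu"],
    "大": ["heng", "pie", "dian"],
    "口": ["shu", "zhe", "heng"],
    "人": ["pie", "dian"],
    "文": ["dian", "heng", "pie", "dian"],
    "了": ["zhe", "shu"],
}


def sort_by_stroke_order(strokes_by_char, group_order):
    """Sort characters by first stroke group, then second, and so on."""
    rank = {group: index for index, group in enumerate(group_order)}

    def key(char):
        return [rank[group] for group in strokes_by_char[char]]

    return sorted(strokes_by_char, key=key)


print(sort_by_stroke_order(SAMPLE, HSPDZ))  # ['十', '大', '口', '人', '文', '了']
print(sort_by_stroke_order(SAMPLE, DHSPZ))  # ['文', '十', '大', '口', '人', '了']
```

Comparing rank lists element by element mirrors the "first stroke, then second stroke" procedure; a shorter sequence that is a prefix of a longer one sorts first.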

Stroke-count-stroke-order sorting
This is a combination of the previous two methods. In China, stroke-based sorting normally refers to stroke-count-stroke-order sorting, and the Chinese national standard for stroke-based sorting is in fact an enhanced stroke-count-stroke-order method. Characters are arranged by stroke count, followed by stroke order. For example, the distinct characters of a sample text with stroke counts of 5, 6, 8, 10, 12, 12 and 14 are sorted in that order, with the two 12-stroke characters ordered by their stroke order, so that each character is put at a unique position.
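
The combined method can be sketched in Python as a two-part sort key: stroke count first, then stroke-group ranks. The group order, the hand-entered `STROKES` table and the function name `sort_by_count_then_order` are assumptions made for illustration; the enhancements of the national standard are not modelled.

```python
GROUP_ORDER = ["heng", "shu", "pie", "dian", "zhe"]
RANK = {group: index for index, group in enumerate(GROUP_ORDER)}

# Hand-entered stroke-group sequences for a few simple characters (assumed).
STROKES = {
    "十": ["heng", "shu"],
    "大": ["heng", "pie", "dian"],
    "口": ["shu", "zhe", "heng"],
    "人": ["pie", "dian"],
    "文": ["dian", "heng", "pie", "dian"],
    "了": ["zhe", "shu"],
}


def sort_by_count_then_order(strokes_by_char):
    """Sort by number of strokes, breaking ties by stroke-group order."""

    def key(char):
        groups = strokes_by_char[char]
        return (len(groups), [RANK[group] for group in groups])

    return sorted(strokes_by_char, key=key)


print(sort_by_count_then_order(STROKES))  # ['十', '人', '了', '大', '口', '文']
```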

YES sorting
YES is a simplified stroke-based sorting method free of stroke counting and grouping, but without compromising accuracy. It has been used successfully to index the characters in the Xinhua Zidian and Xiandai Hanyu Cidian.

Stroke combination
There are three types of combinations between two strokes:
 * 1) Separation: the strokes are separated from each other.
 * 2) Connection: the strokes are connected. This type can be further divided into two categories:
 * 2a) The end point of one stroke is connected with the body of another stroke: either an end of the first stroke is connected to the following stroke's body, or the body of the first stroke is connected to an end of the following stroke.
 * 2b) Two strokes are connected end to end: head-to-head, tail-to-tail, or tail-to-head (such as ㇇㇚). In some characters, the first two strokes are connected head-to-head, the second and third tail-to-tail, and the last stroke is connected to the first stroke head-to-tail.
 * 3) Intersection: the strokes intersect.

In a Chinese character, multiple stroke combinations are usually used together.

The same strokes and stroke order may form different Chinese characters or character components due to different combinations.

Stroke combinations can function to distinguish Chinese characters.

Distribution
The following tables present some experimental results on the distribution of Chinese character strokes in several dictionaries and character sets. The strokes are summarized in the five categories of heng ('horizontal'), shu ('vertical'), pie ('left-falling'), dian ('dot') and zhe ('bent').

Frequency
In the frequency table, the field Characters gives the number of characters containing strokes of each type, and the field Appearances gives the number of appearances of strokes of each type. The data is from an experiment on the 16,339 traditional and simplified Chinese characters in the Cihai (1979 edition), sorted in descending order of frequency of appearance.

The data is from an experiment on the 20,902 traditional and simplified Chinese characters in the GB13000.1 character set—equivalent to the Unicode BMP CJK Chinese character set—sorted in descending order of frequencies of appearance.

The statistical results above, produced by different people on different character sets, are basically consistent: the most commonly used stroke is heng, followed by shu, and the least used is pie. The relative order of dian and zhe differs between the two, though their frequencies are close.

Initial and final strokes of characters
There are 2,322 characters that start with the heng stroke, or 29.827% of the dictionary. There are 2,288 characters that end with heng, or 29.390% of the dictionary.

The data of the table is from an experiment on the 7,784 characters in the Chinese Character Information Dictionary, sorted in descending order of numbers of characters started.

The data is from an experiment on the 20,902 traditional and simplified Chinese characters in the GB13000.1 character set—equivalent to the Unicode BMP CJK character set—sorted by the number of characters started in descending order.

The above statistical results on the first and last strokes of Chinese characters, produced by different people on different character sets, are consistent: the descending order of strokes by number of characters started is the same in both, as is the descending order by number of characters ended. Some rules can be drawn from this. For example, stroke pie generally does not appear as the last stroke of a character or component, but more often as the first stroke. Stroke dian, including na, appears more often at the end of characters or components.
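
Tallies of initial and final strokes like those described above can be computed directly from stroke sequences. The Python sketch below counts first and last stroke groups over a tiny hand-entered sample; the `STROKE_GROUPS` table and the function name `first_and_last_counts` are illustrative assumptions and do not reproduce the cited experiments.

```python
from collections import Counter

# Hand-entered five-category stroke sequences for a few characters (assumed).
STROKE_GROUPS = {
    "十": ["heng", "shu"],
    "大": ["heng", "pie", "dian"],
    "口": ["shu", "zhe", "heng"],
    "人": ["pie", "dian"],
    "文": ["dian", "heng", "pie", "dian"],
    "了": ["zhe", "shu"],
}


def first_and_last_counts(strokes_by_char):
    """Count how many characters start and end with each stroke group."""
    first = Counter(groups[0] for groups in strokes_by_char.values())
    last = Counter(groups[-1] for groups in strokes_by_char.values())
    return first, last


first, last = first_and_last_counts(STROKE_GROUPS)
print(first.most_common())  # heng leads the initial strokes in this sample
print(last.most_common())   # dian leads the final strokes in this sample
```

Dividing each tally by the number of characters gives percentage shares of the kind quoted for the dictionary data.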

Eight Principles of Yong
The Eight Principles of Yong explain how to write eight common strokes in regular script, all of which are found in the one character 永 (yǒng, "forever", "permanence"). It was traditionally believed that frequent practice of these principles as a beginning calligrapher could ensure beauty in one's writing.


 * Eight basic strokes
 * Diǎn 點 / 点: a dot, filled from the top to the bottom, traditionally made by "couching" the brush on the page.
 * Héng 横: horizontal, filled from left to right, the same way the Latin letters A, B, C, D are written.
 * Shù 豎 / 竖: vertical-falling. The brush begins with a dot on top, then falls downward.
 * Gōu 鈎(鉤) / 钩: ending another stroke, a sharp change of direction either down (after a Héng) or left (after a Shù).
 * Tí 提 / Tiāo 挑: a flick up and rightwards.
 * Wān 彎 / 弯: follows a concave path on the left or on the right.
 * Piě 撇: falling leftwards (with a slight curve).
 * Nà 捺: falling rightwards (with an emphasis at the end of the stroke).
 * Xié 斜: sometimes added to the strokes of 永. It is a concave Shù falling right, always ended by a Gōu.

Use in computing
The stroke count method, used to input characters on Chinese mobile phones, is based on the order of strokes.

As part of Chinese character encoding, there have been several proposals to encode the CJK strokes, most of the time with a total of around 35 to 40 entries. Most notable is the current Unicode block “CJK Strokes” (U+31C0..U+31EF), with 36 types of strokes.
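
The encoded strokes and their letter names can be listed with Python's standard `unicodedata` module. This sketch assumes that the 36 stroke types correspond to code points U+31C0 through U+31E3 and that their character names begin with "CJK STROKE"; later Unicode versions may assign further code points in the block, which this range deliberately leaves out. The function name `cjk_stroke_names` is hypothetical.

```python
import unicodedata

# Assumed range holding the 36 stroke types (a sub-range of U+31C0..U+31EF).
FIRST, LAST = 0x31C0, 0x31E3
PREFIX = "CJK STROKE "


def cjk_stroke_names():
    """Map each encoded stroke character to its abbreviated letter name."""
    table = {}
    for code_point in range(FIRST, LAST + 1):
        char = chr(code_point)
        name = unicodedata.name(char, "")
        if name.startswith(PREFIX):
            table[char] = name[len(PREFIX):]
    return table


strokes = cjk_stroke_names()
print(len(strokes))
for char, letters in strokes.items():
    print(f"U+{ord(char):04X} {char} {letters}")
```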