Draft:Chinese character strokes



Strokes are the most basic unit of organization within Chinese characters. When writing, a stroke consists of the trace left on the writing surface from a discrete application of the writing implement. Individual strokes are highly regularized in the so-called regular script, and highly studied for aesthetic properties throughout East Asian calligraphy. In the ancient seal script, the terminals of lines within characters are often not clear, making it difficult to count the number of lines. The familiar notion of strokes first came into being with the appearance of clerical script. Study and classification of strokes is useful for understanding Chinese character calligraphy, ensuring character legibility, identifying fundamental components of radicals, and implementing support for the writing system on computers.

Emergence
as illustrated by

Formation
When writing radicals, a single stroke includes all the motions necessary to produce a given part of a character before lifting the writing instrument from the writing surface; thus, a single stroke may have abrupt changes in direction within the line. For example:
 * ㇑ (Vertical / shù) is classified as a basic stroke because it is a single stroke that forms a line moving in one direction.
 * ㇞ (Vertical – Horizontal – Vertical / shù zhé zhé) is classified as a compound stroke because it is a single stroke whose line includes one or more abrupt changes in direction. This example is a sequence of three basic strokes written without lifting the writing instrument from the writing surface.

Direction
All strokes have direction. Each is written one way only, starting from a single entry point, so native users do not usually write a stroke in the reverse direction.

Number
Stroke number, or stroke count, is the number of strokes in a Chinese character. It plays an important role in Chinese character sorting, teaching and computer information processing. Stroke counts vary dramatically from character to character: the simplest characters have only one stroke, while others have 36, and 龘 (composed of 龍 'dragon' in triplicate) has 48. Among characters encoded in Unicode, the one with the highest number of strokes is 𪚥 (the aforementioned 'dragon', instead in quadruplicate), with a total of 64.

Counting
The strokes of a Chinese character can be counted reliably by following a few rules. First, counting is carried out on the standard regular-script form of the character, following its stroke order. If needed, a standard list of strokes or list of stroke orders issued by an authoritative institution should be consulted.

If two strokes are connected at the endpoints, whether they are separated into two strokes or linked into one stroke can be judged by the following rules:


 * If the two strokes are connected in the upper left corner of a character or component, count them as two separate strokes, as in 厂 (stroke order: ㇐㇓), 口 (㇑㇕㇐) and 日 (㇑㇕㇐㇐).
 * If they are connected in the upper right corner, count them as one stroke, as in 口 (㇑㇕㇐), 月 (㇓㇆㇐㇐) and 句 (㇓㇆㇑㇕㇐).
 * If they are connected in the lower left corner, count them as two separate strokes when the structure is fully enclosed, as in 口 (㇑㇕㇐), 回 (㇑㇕㇑㇕㇐㇐) and 田 (㇑㇕㇐㇑㇐); when it is not fully enclosed, count them as one stroke, as in 山 (㇑㇗㇑), 区 (㇐㇓㇔㇗) and 葛 (㇐㇑㇑㇑㇕㇐㇐㇓㇆㇓㇔㇗).
 * If they are connected in the lower right corner, count them as two separate strokes, as in 口 (㇑㇕㇐), 回 (㇑㇕㇑㇕㇐㇐) and 田 (㇑㇕㇐㇑㇐).

An important prerequisite for connecting two strokes into one stroke is: the tail of the first stroke is connected with the head of the second stroke.
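The corner rules and the tail-to-head prerequisite can be written as a small decision function. The following is a minimal sketch; the names `Corner` and `strokes_at_joint` are hypothetical and do not come from any standard.

```python
from enum import Enum


class Corner(Enum):
    """Corner of the character or component where two strokes meet."""
    UPPER_LEFT = "upper left"
    UPPER_RIGHT = "upper right"
    LOWER_LEFT = "lower left"
    LOWER_RIGHT = "lower right"


def strokes_at_joint(corner, fully_enclosed=False, tail_to_head=True):
    """Return 1 if the two connected strokes count as one stroke, else 2."""
    # Prerequisite: strokes can only merge when the tail of the first
    # stroke meets the head of the second.
    if not tail_to_head:
        return 2
    if corner is Corner.UPPER_RIGHT:
        return 1
    if corner is Corner.LOWER_LEFT:
        # A fully enclosed structure keeps the two strokes separate.
        return 2 if fully_enclosed else 1
    # Upper left and lower right corners always separate the strokes.
    return 2


print(strokes_at_joint(Corner.UPPER_RIGHT))
print(strokes_at_joint(Corner.LOWER_LEFT, fully_enclosed=True))
print(strokes_at_joint(Corner.LOWER_LEFT, fully_enclosed=False))
```

The function only encodes the four rules; deciding which corner a joint occupies, and whether the structure is enclosed, still requires looking at the character.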

Distribution
Chart of Standard Forms of Common National Characters is a standard character set of 4,808 characters issued by the Ministry of Education of Taiwan (ROC). Its characters have between 1 and 32 strokes. The 11-stroke group is the largest, accounting for 9.297% of the character set. On average, there are 12.186 strokes per character.

The List of Frequently Used Characters in Modern Chinese is a standard character set of 3,500 characters issued by the Ministry of Education of the People's Republic of China. Its characters have between 1 and 24 strokes. The 9-stroke group is the largest, accounting for 11.857% of the character set. On average, there are 9.7409 strokes per character.

The Unicode Basic CJK Unified Ideographs is an international standard character set issued by ISO and Unicode, identical to the character set of the Chinese national standard GB 13000.1. It contains 20,902 Chinese characters, including simplified and traditional characters from China, Japan and Korea (CJK). Its characters have between 1 and 48 strokes. The 12-stroke group is the largest, accounting for 9.358% of the character set. On average, there are 12.845 strokes per character.
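The figures quoted for each character set (stroke range, largest stroke-count group and its share, and mean strokes per character) can all be derived from a table that maps characters to stroke counts. The sketch below shows the arithmetic on a tiny illustrative sample; `stroke_statistics` and `SAMPLE` are hypothetical names, and the sample is not one of the standard sets.

```python
from collections import Counter

# Illustrative sample only: a handful of simple characters and their stroke counts.
SAMPLE = {"厂": 2, "口": 3, "山": 3, "日": 4, "月": 4, "区": 4, "句": 5, "田": 5, "回": 6}


def stroke_statistics(stroke_counts):
    """Summarize a mapping of character -> stroke count."""
    groups = Counter(stroke_counts.values())  # stroke count -> number of characters
    total = len(stroke_counts)
    largest_group, size = max(groups.items(), key=lambda item: item[1])
    return {
        "min": min(groups),
        "max": max(groups),
        "largest_group": largest_group,
        "largest_share_percent": round(100 * size / total, 3),
        "mean": round(sum(stroke_counts.values()) / total, 3),
    }


print(stroke_statistics(SAMPLE))
```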

A stroke table functions in the Chinese writing system somewhat like the Latin alphabet does in English.

Ordering
The term stroke order can refer to one of two concepts:
 * The direction in which a stroke is written: for example, the heng stroke is made horizontally from left to right, while the shu stroke is written vertically from top to bottom.
 * The order in which strokes are written one by one to form a Chinese character.

Because the direction of strokes is relatively simple, people generally refer to the latter meaning when talking about stroke order.

The most basic rules of stroke order are:
 * Heng, then shu.
 * Pie, then na.
 * Up, then down.
 * Left, then right.
 * Outside, then inside.

The stroke orders of and  are

The order of strokes is a summary of people's experience in writing Chinese characters correctly and conveniently. It plays an important role in the teaching, sorting and computer information processing of Chinese characters. The stroke order of cursive script is quite flexible and changeable, so the standard of stroke order generally refers to the stroke order of regular script.

The current stroke order standards are
 * China's Stroke Orders of the Commonly-used Standard Chinese Characters, and
 * Taiwan's Handbook of the Stroke Orders of the Commonly-used National Chinese Characters.

Sorting
Stroke order refers to the order in which the strokes of a Chinese character are written, a stroke being one movement of a writing instrument on a writing surface. Because most Chinese characters have many strokes, stroke order guidelines are recommended to ensure speed, accuracy, and legibility in composition. Teachers therefore enforce exactly one stroke order for each character and mark every deviation as a mistake, so that everyone writes a character the same way. The stroke order follows a few simple rules, which makes it easier to memorize. To write CJK characters, one must know how to write CJK strokes, and thus needs to identify the basic strokes that make up a character.

Chinese characters can be sorted into different orders by their strokes. The important stroke-based sorting methods include stroke-count sorting, stroke-count-stroke-order sorting, GB stroke-based sorting and YES sorting.

By stroke count
Characters may be sorted by total number of strokes. For example, the different characters in 、 are sorted into:
 * (5)
 * (6)
 * (8)
 * 、、(12)
 * (14)
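Stroke-count sorting amounts to sorting on a single integer key. The sketch below assumes a small hand-written stroke-count table (`STROKE_COUNTS`, a hypothetical name; real applications would take counts from an authoritative list) and a sample string chosen for illustration.

```python
from itertools import groupby

# Illustrative stroke counts; an authoritative stroke list should be used in practice.
STROKE_COUNTS = {"汉": 5, "字": 6, "画": 8, "笔": 10, "筆": 12, "畫": 12, "漢": 14}


def sort_by_stroke_count(text):
    """Return the distinct known characters of `text`, fewest strokes first."""
    unique = list(dict.fromkeys(ch for ch in text if ch in STROKE_COUNTS))
    return sorted(unique, key=STROKE_COUNTS.__getitem__)


def group_by_stroke_count(chars):
    """Group an already sorted list of characters by stroke count."""
    return {count: list(group)
            for count, group in groupby(chars, key=STROKE_COUNTS.__getitem__)}


ordered = sort_by_stroke_count("漢字筆畫、汉字笔画")
print("".join(ordered))
print(group_by_stroke_count(ordered))
```

Characters with the same count (here the two 12-stroke characters) end up in one group, and the count alone cannot order them; Python's stable sort simply leaves them in input order.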

By stroke order
Characters are first arranged by their first strokes according to a fixed order of stroke groups. Characters whose first strokes belong to the same group, if any, are then sorted by their second strokes in the same way, and so on. This method is usually employed to support stroke-count sorting when dealing with characters of the same stroke number. For instance, if one 12-stroke character starts with a stroke of the pie group and another starts with a stroke of the zhe group, the first character goes before the second, because pie comes before zhe in the group order.

Stroke–count–stroke–order sorting
This is a combination of the previous two methods: characters are arranged by stroke count, and characters with the same stroke count are then arranged by stroke order. In China, stroke-based sorting normally refers to stroke-count-stroke-order sorting, and the Chinese national standard stroke-based sorting is in fact an enhanced stroke-count-stroke-order method. For example, the different characters in 、 are sorted as follows, with each character placed at a unique position:
 * (5)
 * (6)
 * (8)
 * (10)
 * (12)
 * (12)
 * (14)
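In code, this method is a sort on a compound key: stroke count first, then the sequence of stroke groups. The sketch below is illustrative only. It assumes the group order heng, shu, pie, dian, zhe and files na under the dian group; both are assumptions of this example, as are the names `GROUP_ORDER`, `STROKES` and `sort_key` and the hand-written stroke sequences.

```python
# Assumed order of stroke groups; other orders are in use.
GROUP_ORDER = ["heng", "shu", "pie", "dian", "zhe"]
GROUP_RANK = {name: rank for rank, name in enumerate(GROUP_ORDER)}

# Illustrative stroke-group sequences for a few simple characters.
STROKES = {
    "十": ["heng", "shu"],
    "人": ["pie", "dian"],  # the na stroke is filed under dian here
    "刀": ["zhe", "pie"],
    "三": ["heng", "heng", "heng"],
    "大": ["heng", "pie", "dian"],
    "口": ["shu", "zhe", "heng"],
    "山": ["shu", "zhe", "shu"],
}


def sort_key(char):
    """Key: stroke count, then the group rank of each stroke in writing order."""
    strokes = STROKES[char]
    return (len(strokes), [GROUP_RANK[stroke] for stroke in strokes])


ordered = sorted("山口大三刀人十", key=sort_key)
print("".join(ordered))
```

Two characters tie under this key only if they share the whole stroke-group sequence, so far fewer characters collide than under stroke count alone.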

YES sorting
YES is a simplified stroke-based sorting method free of stroke counting and grouping, but without compromising accuracy. It has been used successfully to index the characters in the Xinhua Zidian and Xiandai Hanyu Cidian.

Combinations
There are three types of stroke combinations between two strokes:
 * 1) Separation: the strokes are separated from each other.
 * 2) Connection: the strokes are connected. This type can be further divided into two categories:
   * a) The end point of one stroke is connected with the body of another stroke:
     * An end of the first stroke is connected to the following stroke's body.
     * The body of the first stroke is connected to an end of the following stroke.
     * Both types of connection are used.
   * b) Two strokes are connected end to end, including head-head, tail-tail and tail-head.
 * 3) Intersection: the strokes intersect.

In a Chinese character, multiple stroke combinations are usually used together.

The same strokes and stroke order may form different Chinese characters or character components when they are combined differently. In other words, stroke combinations serve to distinguish Chinese characters.

Classification
CJK strokes are an attempt to identify and classify all single-stroke components that can be used to write Han radicals. There are some thirty distinct types of strokes recognized in Chinese characters, some of which are compound strokes made from basic strokes. The compound strokes comprise more than one movement of the writing instrument, and many of these have no agreed-upon name.

Basic strokes
A basic stroke is a single calligraphic mark moving in one direction across a writing surface. The following table lists a selection of basic strokes divided into two stroke groups: simple and combining. "Simple strokes" (such as Horizontal / Héng and Dot / Diǎn) can be written alone. "Combining strokes" (such as Bend / Zhé and Hook / Gōu) never occur alone, but must be paired with at least one other stroke forming a compound stroke. Thus, they are not in themselves individual strokes.

Compound strokes
A compound stroke (also called a complex stroke) is produced when two or more basic strokes are combined in a single stroke written without lifting the writing instrument from the writing surface. The character 永 (pinyin: yǒng) "eternity", described in more detail under Eight principles of Yong, demonstrates one of these compound strokes. The centre line is a compound stroke that combines three stroke shapes in a single stroke.

Basics for making compound strokes
In most cases, concatenating basic strokes forms a compound stroke. For example, Vertical / Shù combined with Hook / Gōu produces ㇚ (Vertical–Hook / Shù Gōu). The naming convention joins the names of the basic strokes in writing order.

An exception applies in the Simplified Chinese names when a stroke makes a strict right-angle turn. Horizontal (Héng) and Vertical (Shù) strokes are identified only once, when they appear as the first stroke of a compound; every successive 90° turn down or to the right within the same stroke is indicated by a Bend (pinyin: zhé). For example, an initial Shù followed by an abrupt turn right produces ㇗ (Shù Zhé). In the same way, an initial Shù followed by an abrupt turn right and then a second turn down produces ㇞ (Shù Zhé Zhé). Their inherited names, however, are "Vertical–Horizontal" and "Vertical–Horizontal–Vertical", which do not use "Bend".

Nearly all complex strokes can be named using this simple scheme.
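The naming scheme can be sketched as a function over the sequence of basic strokes that make up a compound stroke. The helper names below are hypothetical, and the sketch makes a simplifying assumption: every Héng or Shù after the first segment is treated as a right-angle turn and renamed Zhé.

```python
# English glosses for the basic strokes used in this example.
ENGLISH = {"Héng": "Horizontal", "Shù": "Vertical", "Gōu": "Hook"}
STRAIGHT = {"Héng", "Shù"}


def simplified_name(segments):
    """Name a compound stroke: later Héng/Shù segments become Zhé (Bend)."""
    names = []
    for position, segment in enumerate(segments):
        if position > 0 and segment in STRAIGHT:
            names.append("Zhé")  # a right-angle turn, not a new Héng or Shù
        else:
            names.append(segment)
    return " ".join(names)


def inherited_name(segments):
    """Name a compound stroke by joining the English names of its segments."""
    return "–".join(ENGLISH[segment] for segment in segments)


print(simplified_name(["Shù", "Gōu"]))
print(simplified_name(["Shù", "Héng", "Shù"]))
print(inherited_name(["Shù", "Héng", "Shù"]))
```

Turns that are not right angles, and strokes without an agreed name, fall outside this sketch.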

Nomenclature
Organization systems used to describe and differentiate strokes may include the use of roman letters, Chinese characters, numbers, or a combination of these devices. Two methods of organizing CJK strokes are by:


 * 1) Classification schemes that describe strokes by a naming convention or by conformity to a taxonomy; and
 * 2) Categorization schemes that differentiate strokes by numeric or topical grouping.

In classification schemes, stroke forms are described, assigned a representative character or letterform, and may be arranged in a hierarchy. In categorization schemes, stroke forms are differentiated, sorted and grouped into like categories; categories may be topical, or assigned by a numeric or alpha-numeric nominal number according to a designed numbering scheme.

Benefits
Organizing strokes into a hierarchy aids a user's understanding by bringing order to an opaque system of writing that has evolved organically over centuries. In addition, the process of recognizing and describing stroke patterns promotes consistency of stroke formation and usage. When organized by naming convention, classification allows a user to find a stroke quickly in a large stroke collection, makes it easier to detect duplication, and conveys meaning when comparing relationships between strokes. When organized by numbering scheme, categorization aids a user in understanding stroke differences, and makes it easier to make predictions, inferences and decisions about a stroke.

Limitations
Strokes are described and differentiated by their visual qualities. Because this can require subjective interpretation, CJK strokes cannot be placed into a single definitive classification scheme: there is no universal consensus on the description and number of basic and compound forms. Nor can they be placed into a single definitive categorization scheme, because visual ambiguity between strokes prevents them from being segregated into mutually exclusive groups. Other factors inhibiting organization based on visual criteria are the variation of writing styles, and the changes of appearance that a stroke undergoes within various characters.

Roman letter convention
A naming convention is a classification scheme where a controlled vocabulary is used systematically to describe the characteristics of an item. The naming convention for a CJK stroke is derived from the path mark left by the writing instrument. In this instance, roman letters are concatenated to form a stroke name: a sequence of one or more letters indicating the component strokes used to create the CJK stroke. Each basic stroke is represented by the first letter of the pinyin pronunciation of its Han radical. In a basic stroke example, H represents the stroke named Héng; in a compound example, HZT represents Héng Zhé Tí.
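The letter convention can be illustrated in a few lines. The sketch below builds a name from the initials of the pinyin syllables and expands a name back into basic strokes; the function names and the `BASIC_STROKES` table are hypothetical, and the table lists only the strokes needed for the examples.

```python
# Partial, illustrative table: letter -> pinyin name of the basic stroke.
BASIC_STROKES = {"H": "Héng", "S": "Shù", "Z": "Zhé", "T": "Tí", "G": "Gōu"}


def roman_letter_name(pinyin_name):
    """Concatenate the first letter of each pinyin syllable, in writing order."""
    return "".join(word[0].upper() for word in pinyin_name.split())


def expand(letter_name):
    """Expand a letter sequence into the pinyin names of its basic strokes."""
    return [BASIC_STROKES[letter] for letter in letter_name]


print(roman_letter_name("Héng"))
print(roman_letter_name("Héng Zhé Tí"))
print(expand("HZT"))
```

The expansion is naive: it assumes every letter maps to exactly one basic stroke, which does not hold for every name in use.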

While no consensus exists, there are up to 12 distinct basic strokes that are identified by a unique Han radical.

There are many compound strokes; however, there is no consensus on the letter-sequence names of compound strokes built from the basic strokes. The following table demonstrates the CJK stroke naming convention:

Some strokes have been unified or abandoned in Unicode:

Note that some names in the list do not follow the rules of the controlled vocabulary. For example, stroke P is not found in the compound stroke PN: the name "PN" comes from Píng Nà, not Piě Nà. Píng means "flat", and the stroke should be called "BN" (pinyin: Biǎn Nà) if the rules are to be followed closely. The letter "Z" in stroke SWZ means Zuǒ, not Zhé. Zuǒ means "left", and it is not defined in the naming convention. Moreover, some Zhé strokes turn far more or far less than 90°, such as stroke HZZZG, stroke HZZP and stroke PZ.

Some strokes are not included in the Unicode standard.

In Simplified Chinese, stroke TN is usually written as  (It was called "stroke DN", but Unicode has rejected it ).

Abbreviations
Naming conventions that use abbreviated forms of the CJK strokes also exist. After the names of CJK strokes are translated into English, first letters of the English names are used in the naming system. The controlled vocabulary can be divided into two groups.

The first group is the abbreviated forms of the basic strokes.

The second group is the abbreviated forms of deformations.

“Zig” can be omitted from the naming system. The following table demonstrates the CJK stroke naming convention:

Numbering
A numbering scheme is a categorisation method where similar strokes are grouped into categories labeled by nominal numbers. Category numbering may be an index of numbers of types, with sub-types indicated by a decimal point followed by another number or a letter.

The following table is a common numbering scheme that uses names similar to those of the Roman letter naming convention, but the stroke forms are grouped into major category types (1 to 5), with category 5 further broken down into 25 sub-types.

Stroke forms and stroke tables
Scholars' estimates of the number of distinct strokes, classified by shape, are not consistent. From the perspective of Chinese pedagogy, there are relatively few types of strokes, but from a calligraphic perspective, or that of a font designer or artist, more useful types emerge. For example, the stroke shu ('vertical') can be further divided into 'long shu', 'short shu' and 'hanging needle shu', and pie can be divided into 'horizontal pie', 'slanting pie' and 'vertical pie'. Some strokes are not included in the numbering scheme.

Besides, there are ways of grouping strokes that are different from the Unicode standard. For example, stroke is merged into stroke  in Unicode system, while it is merged into  in this numbering scheme.

Eight principles of Yong
The Eight Principles of Yong explain how to write eight common strokes in regular script, all of which are found in one character, 永 (yǒng, "forever", "permanence"). It was traditionally believed that frequent practice of these principles as a beginning calligrapher could ensure beauty in one's writing.


 * Diǎn: a dot, filled from the top to the bottom, traditionally made by "couching" the brush on the page.
 * Héng: a horizontal stroke, filled from left to right, the same way the Latin letters A, B, C, D are written.
 * Shù: a vertical, falling stroke. The brush begins with a dot on top, then falls downward.
 * Gōu: a hook ending another stroke, made with a sharp change of direction either down (after a Héng) or left (after a Shù).
 * Tí / Tiāo: a flick up and rightwards.
 * Wān: a curve that follows a concave path on the left or on the right.
 * Piě: a stroke falling leftwards (with a slight curve).
 * Nà: a stroke falling rightwards (with an emphasis at the end of the stroke).
 * Xié: sometimes added to 永's strokes. It is a concave Shù falling right, always ended by a Gōu.

In computing
The stroke count method uses the order of strokes to input characters on Chinese mobile phones.

As part of Chinese character encoding, there have been several proposals to encode the CJK strokes, most of the time with a total of around 35 to 40 entries. The Unicode block "CJK Strokes" (U+31C0..U+31EF) encodes 36 types of strokes:
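The encoded strokes can be listed programmatically. The sketch below walks the block with Python's standard `unicodedata` module and keeps every code point whose character name starts with "CJK STROKE"; the function name `cjk_strokes` is hypothetical. How many strokes are reported depends on the version of the Unicode database bundled with the Python interpreter, so the total may exceed the figure given above.

```python
import unicodedata

BLOCK_FIRST, BLOCK_LAST = 0x31C0, 0x31EF  # the "CJK Strokes" block
PREFIX = "CJK STROKE "


def cjk_strokes():
    """Return (character, letter name) pairs for assigned code points in the block."""
    strokes = []
    for code_point in range(BLOCK_FIRST, BLOCK_LAST + 1):
        char = chr(code_point)
        name = unicodedata.name(char, "")  # empty string if unassigned
        if name.startswith(PREFIX):
            strokes.append((char, name[len(PREFIX):]))
    return strokes


for char, letters in cjk_strokes():
    print(f"U+{ord(char):04X} {char} {letters}")
```

The character names in the block use the roman letter convention, so the second element of each pair is a letter name such as H or SZZ.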