Sinitic languages

The Sinitic languages, often synonymous with the Chinese languages, are a group of East Asian analytic languages that constitute a major branch of the Sino-Tibetan language family. It is frequently proposed that there is a primary split between the Sinitic languages and the rest of the family (the Tibeto-Burman languages). This view is rejected by some researchers but has found phylogenetic support among others. The Macro-Bai languages, whose classification is difficult, may be an offshoot of Old Chinese and thus Sinitic; otherwise, Sinitic is defined only by the many varieties of Chinese unified by a shared historical background, and usage of the term "Sinitic" may reflect the linguistic view that Chinese constitutes a family of distinct languages, rather than variants of a single language.

Population
Over 91% of the Chinese population speaks a Sinitic language. Approximately 1.52 billion people are speakers of the Chinese macrolanguage, of whom about three-quarters speak a Mandarin variety. Estimates of the number of global speakers of Sinitic branches as of 2018–2019, both native and non-native, are listed below:

Languages


Dialectologist Jerry Norman estimated that there are hundreds of mutually unintelligible Sinitic languages. They form a dialect continuum in which differences generally become more pronounced as distances increase, though there are also some sharp boundaries. The Sinitic languages can be divided into Macro-Bai languages and Chinese languages, and the following is one of many possible ways of subdividing them. Some varieties, such as Shaozhou Tuhua, are hard to classify and are thus not included in the following summaries.

Macro-Bai languages
This language family was first proposed by linguist Zhengzhang Shangfang and was expanded to include Longjia and Luren. It likely split off from the rest of Sinitic during the Old Chinese period. The languages included are all considered minority languages in China and are spoken in the Southwest. The languages are:
 * Bai
 * Cai-Long: Caijia, Longjia, Luren

All other Sinitic languages are henceforth considered Chinese.

Chinese
The Chinese branch of the family is classified into at least seven main families. These families are classified based on five main evolutionary criteria:
 * 1) The evolution of the historical fully muddy initials
 * 2) The distribution of rimes across the four tone qualities, as conditioned by voicing and aspiration of initials
 * 3) The evolution of the checked tone category
 * 4) The loss or retention of coda position plosives and nasals
 * 5) The palatalisation of the initial in front of high vowels

The varieties within one family may not be mutually intelligible with each other. For instance, Wenzhounese and Ningbonese are not highly mutually intelligible.

The Language Atlas of China identifies ten groups, of which Jin, Hui, Pinghua, and Tuhua are not part of the seven traditional groups:
 * Mandarin
 * Jin
 * Yue
 * Hakka
 * Min
 * Wu
 * Hui
 * Gan
 * Xiang
 * Pinghua and Tuhua

Mandarin
Varieties of Mandarin are used in the Western Regions, the Southwest, Huguang, Inner Mongolia, Central Plains and the Northeast, by around three-quarters of the Sinitic-speaking population. Historically, the prestige variety has always been Mandarin, which is still reflected today in Standard Chinese. Standard Chinese is now an official language of the Republic of China, the People's Republic of China, Singapore and the United Nations. Re-population efforts, such as that of the Qing dynasty in the Southwest, tended to involve Mandarin speakers. The classification of Mandarin lects has undergone several significant changes, though nowadays it is commonly divided as follows, based on the distribution of the historical checked tone:
 * Northeastern
 * Beijing (sometimes considered part of Northeastern)
 * Jiaoliao (sometimes "Peninsular")
 * Jilu (sometimes "Northern")
 * Central Plains (or "Zhongyuan")
 * Lanyin (sometimes "Northwestern" and considered part of Central Plains)
 * Jin (often considered a top-level group due to the Language Atlas of China)
 * Southwestern (sometimes "Upper Yangtze")
 * Jianghuai (or "Lower Yangtze", sometimes "Huai", "Southern" or "Southeastern")

Other lects, such as Mandarin Junhua varieties, do not fall neatly into these categories.

Varieties of Mandarin can be defined by their universally lost -m final, low number of tones, and smaller inventory of classifiers, among other features. Mandarin lects also often have rhotic erhua rimes, though how heavily they are used varies between lects. Loss of the checked tone is an often-cited criterion for Mandarin languages, though lects such as Yangzhounese and Taiyuannese retain it.

Northeastern and Beijing Mandarin
Northeastern Mandarin is spoken in Heilongjiang, Jilin, most of Liaoning and northeastern Inner Mongolia, whereas Beijing Mandarin is spoken in northern Hebei, most of Beijing, parts of Tianjin and Inner Mongolia. The two families' most notable features are the heavy use of rhotic erhua, a seemingly random distribution of the dark checked tone, and a general inventory of four tones with high flat, rising, dipping, and falling contours. Northeastern Mandarin, especially in Heilongjiang, contains many loanwords from Russian. Northeastern Mandarin lects can be divided into three main groups, namely Hafu (including Harbinnese and Changchunnese), Jishen (including Jilinnese and Shenyangnese), and Heisong. Notably, the extinct Taz language of Russia is also a Northeastern Mandarin language. Beijing Mandarin is sometimes included in Northeastern Mandarin due to its distribution of the historical dark checked tone, though others list it as its own group, often due to its more regular light checked tones.

Jilu Mandarin
Jilu Mandarin is spoken in southern Hebei and western Shandong, and is often represented by Jinannese. Notable cities that use Jilu Mandarin lects include Cangzhou, Shijiazhuang, Jinan and Baoding. Characteristic features of Jilu Mandarin include merging the dark checked tone into the dark level tone, merging the light checked tone into light level or departing based on the manner of articulation of the initial, and vowel breaking in tong rime series' checked-tone words, among other features.

Jilu Mandarin can be classified into Baotang, Shiji, Canghui and Zhangli. Zhangli is of note due to its preservation of a separate checked tone.

Jiaoliao Mandarin
Jiaoliao Mandarin is spoken in the Jiaodong and Liaodong Peninsulas, which include the cities of Dalian and Qingdao, as well as in several prefectures along the China-Korea border. Like Jilu Mandarin, its light checked tone is merged into light level or departing based on the manner of articulation of the initial, though its dark checked tone is merged into the rising tone. Its initial  terms are pronounced with a null initial (apart from open  rime series  finals), unlike the  of Northern and Beijing Mandarin.

Based on, for example, the pronunciation of the palatalized initial, Jiaoliao Mandarin can be divided into Qingzhou, Denglian and Gaihuan areas.

Central Plains and Lanyin Mandarin
Central Plains Mandarin is spoken in the Central Plains of Henan, southwestern Shanxi, southern Shandong and northern Jiangsu, as well as most of Shaanxi, southern Ningxia and Gansu and southern Xinjiang, in famous cities such as Kaifeng, Zhengzhou, Luoyang, Xuzhou, Xi'an, Xining and Lanzhou. Central Plains Mandarin lects merge the historical checked tones with a lesser muddy and clear  initial together with the rising tone, and those with a fully muddy  initial are merged with the light level tone.

Lanyin Mandarin, spoken in northern Ningxia, parts of Gansu, and northern Xinjiang, is sometimes grouped with Central Plains Mandarin due to its merged lesser light and dark checked tones, though it is realised as a departing tone.

The subdivision of Central Plains Mandarin is not fully agreed upon, though one possible scheme has 13 divisions, namely Xuhuai, Zhengkai, Luosong, Nanlu, Yanhe, Shangfu, Xinbeng, Luoxiang, Fenhe, Guanzhong, Qinlong, Longzhong and Nanjiang. Lanyin Mandarin, on the other hand, is divided into Jincheng, Yinwu, Hexi, and Beijiang. The Dungan language is a collection of Central Plains Mandarin varieties spoken in the former Soviet Union.

Jin
Jin is spoken in most of Shanxi, western Hebei, northern Shaanxi, northern Henan and central Inner Mongolia, and is often represented by Taiyuannese. Li Rong first proposed it as a group separate from the rest of Mandarin, defined as the lects in and around Shanxi that have a checked tone, though this stance is not without disagreement. Jin varieties also often have disyllabic words derived from syllable splitting (分音詞) through infixation.

As per the Language Atlas by Li, Jin is divided into Dabao, Zhanghu, Wutai, Lüliang, Bingzhou, Shangdang, Hanxin, and Zhiyan branches.

Southwestern Mandarin
Southwestern Mandarin is spoken in Yunnan, Guizhou, northern Guangxi, most of Sichuan, southern Gansu and Shaanxi, Chongqing, most of Hubei and bordering parts of Hunan, as well as Kokang of Myanmar and parts of northern Thailand. It covers the largest area and has the most speakers of all Mandarinic language groups, and would be the eighth most spoken language in the world if separated from the rest of Mandarin. Southwestern Mandarin tends not to have retroflex consonants and merges all checked tone categories together. Except in Minchi, which has a standalone checked category, the checked tone is merged with another category. Representative lects include Wuhannese and Sichuanese, and sometimes Kunmingnese.

Southwestern Mandarin tends to be split into Chuanqian, Xishu, Chuanxi, Yunnan, Huguang and Guiliu branches. Minchi is sometimes separated as a remnant of Old Shu.

Huai
Huai is spoken in central Anhui, northern Jiangxi, far western and eastern Hubei and most of Jiangsu. Due to its preservation of a checked tone, some linguists believe that Huai ought to be treated as a top-level group, like Jin. Representative lects tend to be Nanjingnese, Hefeinese and Yangzhounese. The Huai of Nanjing likely served as the national prestige variety during the Ming and Qing periods, though not all linguists support this viewpoint.

The Language Atlas divides Huai into Tongtai, Huangxiao, and Hongchao areas, with the latter further split into Ninglu and Huaiyang. Tongtai, being geographically located furthest east, has the most significant Wu influence, such as in its distribution of the historical voiced plosive series.

Yue
Yue Chinese is spoken by around 84 million people in western Guangdong, eastern Guangxi, Hong Kong, Macau and parts of Hainan, as well as in overseas communities such as those of Kuala Lumpur and Vancouver. Famous lects such as Cantonese and Taishanese belong to this family. Yue Chinese lects generally possess long-short distinctions in their vowels, which is reflected in their almost universally split dark-checked and often split light-checked tones. They also tend to preserve all three checked plosive finals and three nasal finals. The status of Pinghua is uncertain: some believe its two groups, Northern and Southern, should be listed under Yue, though others reject this standpoint. Yue is generally split into Cantonese (which itself contains Yuehai, Xiangshan, and Guanbao), Siyi, Gaoyang, Qinlian, Wuhua, Goulou (which includes Luoguang), Yongxun and the two Pinghua branches. Siyi is generally agreed to be the most divergent, and Goulou is believed to be the most closely related to Pinghua.

Hakka
Hakka Chinese is a direct result of several migration waves from Northern China to the South, and is spoken in eastern Guangdong, parts of Taiwan, western Fujian, Hong Kong, southern Jiangxi, as well as scattered points in the rest of Guangdong, Hunan, Guangxi and Hainan, along with overseas communities such as in West Kalimantan and Bangka Belitung Islands in Indonesia, by an estimated total of 44 million people. Some believe that Hakka is closely related to other groups, such as Gan, Yue, or Tongtai. Hakka varieties generally have no voiced plosive initials and preserve the historical initial  as an n-like sound.

Hakka can be divided into Yuetai, Hailu, Yuebei, Yuexi, Tingzhou, Ninglong, Yuxin and Tonggui. Meizhounese is often used as the representative variety of Hakka.

Min
Min Chinese is a direct descendant of Old Chinese, and is spoken in Chaoshan and Zhanjiang of Guangdong, Hainan, Taiwan, most of Fujian and parts of Jiangxi and Zhejiang, by around 76 million people. Due to significant amounts of migration, many people in Southeast Asia and Hong Kong are also able to speak Min varieties. Lects such as Teoswa, Hainanese, Hokkien (incl. Taiwanese) and Hokchiu are all Min varieties.

Since Min descended from Old Chinese rather than Middle Chinese, it has some features that would be out of place in other varieties. For instance, some words with the initial  are not affricates in Min. This, interestingly, has led to many languages, such as Occitan, Inuktitut, Latin, Māori and Telugu, loaning the Sinitic word for 'tea'  with a plosive. Min varieties also have a very large number of words with literary pronunciations.

Min can primarily be split into Coastal and Inland Min varieties. The former contains the Southern Min branches of Quanzhang (Hokkien), Chaoshan (Teoswa), Datian and Zhongshan, the Eastern Min branches of Houguan and Funing, Qionglei Min, as well as Puxian Min, whereas the latter includes Northern, Central and Shaojiang Min. Shaojiang Min acts as a transitional area between Min, Gan, and Hakka.

Wu
Wu Chinese is spoken in most of Zhejiang, Shanghai, southern Jiangsu, parts of southern Anhui and eastern Jiangxi by around 82 million people. Many large cities in the Yangtze Delta, such as Suzhou, Changzhou, Ningbo and Hangzhou, use a Wu variety. Wu varieties generally have a fricative initial in their negators, a three-way plosive distinction, as well as a checked coda preserved as a glottal stop, except for Oujiang lects, where it has become vowel length, and Xuanzhou. Shanghainese, Suzhounese and Wenzhounese are usually used as representatives of Wu. Wu Chinese varieties generally have a massive number of vowels, which rivals even North Germanic languages. The Dondac variety has been observed to have 20 phonemic monophthongal vowels, according to one analysis.

Qian Nairong divides Wu into Taihu (or Northern Wu), Taizhou, Oujiang, Chuqu and Wuzhou. Northern Wu is further divided into Piling, Suhujia, Tiaoxi, Linshao, Yongjiang, and Hangzhou, though Hangzhou's classification is unclear.

Hui
Huizhou Chinese is spoken in western Hangzhou, southern Anhui and parts of Jingdezhen, by around 5 million people. It is identified as a top-level group by the Language Atlas, though some linguists believe in other theories, such as it being a Gan-influenced Wu variety, due to an identifiable basis of Old Wu features. Hui varieties are phonologically diverse, and some features are shared with Wu, such as the simplification of diphthongs. Hui can be divided into Jishe, Xiuyi, Qiwu, Jingzhan and Yanzhou branches, with Tunxinese and Jixinese being representatives.

Gan
Gan Chinese, sometimes believed to be related to Hakka, is spoken in northern and central Jiangxi, parts of Hubei and Anhui and eastern Hunan, by 22 million people. Gan varieties tend to not palatalize terms with the initial  and have an f-like initial in closed  and  initial  terms, among other features.

Gan can also be divided into Northern and Southern groups. The Northern group was formed during the Tang dynasty, whereas the Southern group developed on the basis of Northern Gan. The Language Atlas divides Gan into Changdu, Yiliu, Jicha, Fuguang, Yingyi, Datong, Dongsui, Huaiyue, and Leizi branches. Nanchangnese is often chosen as the representative. Shaojiang Min is identified as being influenced by, or even closely related to, Fuguang Gan.

Xiang
Xiang Chinese is spoken in central and western Hunan and nearby parts of Guangxi and Guizhou by an estimated 37 million people. Due to migrations, Xiang can be split into New and Old Xiang groups, with Old Xiang having fewer Mandarin-influenced features. Xiang varieties have universally lost their checked codas, but the majority of them still have a unique preserved checked tone contour. Most also have a three-way plosive distinction, like Wu varieties.

One way of dividing Xiang varieties sees five distinct families, namely Changyi, Hengzhou, Louzhao, Chenxu, and Yongzhou. Changshanese and one of Shuangfengnese or Loudinese are usually taken as Xiang representatives.

Internal classification
The traditional, dialectological classification of Chinese languages is based on the evolution of the sound categories of Middle Chinese. Little comparative work, which is the usual way of reconstructing the relationships between languages, has been done, and little is known about mutual intelligibility. Even within the dialectological classification, details are disputed, such as the establishment in the 1980s of three new top-level groups: Huizhou, Jin and Pinghua, although Pinghua is itself a pair of languages and Huizhou maybe half a dozen.

Like Bai, the Min languages are commonly thought to have split off directly from Old Chinese. The evidence for this split is that all Sinitic languages apart from the Min group can fit into the structure of the Qieyun, a 7th-century rime dictionary. However, this view is not universally accepted.

Points of contention
Like many other language families, Sinitic languages have had problems with classification. The following are a few examples.

Southern China
Traditionally, the lect of urban Hangzhou and New Xiang of eastern Hunan are not considered Mandarin. However, linguists such as Richard VanNess Simmons and Zhou Zhenhe have observed that these two varieties possess more qualifying features of Mandarin languages. For instance, the vowels of the second division of the  initial are often raised and backed in Wu and Xiang, while they are not in Hangzhounese and New Xiang.

Nantongnese has heavy Wu influence, which has led to it also having raised and backed vowels.

Danzhounese and Maihua are both traditionally considered Yue lects. Recent research, however, has noted that both are more likely unclassified. Maihua, for example, may be a Yue-Hakka-Hainanese Min mixed language.

Dongjiang Bendihua is spoken in and around Huizhou and Heyuan. Its classification has always been unclear, though the most common standpoint is that it is Hakka.

Northern China
The variety spoken in the Ganyu District of Lianyungang is listed as a variety of Central Plains Mandarin in the Language Atlas of China, though its tonal distribution is more similar to Peninsular Mandarin varieties.

Relationships between groups
Jerry Norman classified the traditional seven dialect groups into three larger groups: Northern (Mandarin), Central (Wu, Gan, and Xiang), and Southern (Hakka, Yue, and Min). He argued that the Southern Group is derived from a standard used in the Yangtze valley during the Han dynasty (206 BC – 220 AD), which he called Old Southern Chinese, while the Central group was transitional between the Northern and Southern groups. Some dialect boundaries, such as between Wu and Min, are particularly abrupt, while others, such as between Mandarin and Xiang or between Min and Hakka, are much less clearly defined.
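
As a rough illustration only, Norman's three-way grouping can be written as a small lookup table. The Python sketch below uses just the group names given in the paragraph above; the names NORMAN_GROUPS and macro_group are hypothetical labels chosen for this example and do not come from any library or from Norman's work.

```python
# Norman's three larger groups, as listed in the paragraph above.
# NORMAN_GROUPS and macro_group are illustrative names for this sketch.
NORMAN_GROUPS = {
    "Northern": ["Mandarin"],
    "Central": ["Wu", "Gan", "Xiang"],
    "Southern": ["Hakka", "Yue", "Min"],
}


def macro_group(dialect_group: str) -> str:
    """Return Norman's larger group for one of the seven traditional groups."""
    for macro, members in NORMAN_GROUPS.items():
        if dialect_group in members:
            return macro
    raise KeyError(f"{dialect_group!r} is not one of the seven traditional groups")


print(macro_group("Gan"))  # Central
print(macro_group("Min"))  # Southern
```

Groups outside the traditional seven, such as Jin, Hui and Pinghua, are deliberately left out of the table, since the paragraph does not assign them to any of Norman's groups.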

Scholars account for the transitional nature of the central varieties in terms of wave models. Iwata argues that innovations have been transmitted from the north across the Huai River to the Lower Yangtze Mandarin area and from there southeast to the Wu area and westwards along the Yangtze River valley and thence to southwestern areas, leaving the hills of the southeast largely untouched.

A quantitative study
A 2007 study compared fifteen major urban dialects on the objective criteria of lexical similarity and regularity of sound correspondences, and subjective criteria of intelligibility and similarity. Most of these criteria show a top-level split with Northern, New Xiang, and Gan in one group and Min (samples at Fuzhou, Xiamen, Chaozhou), Hakka, and Yue in the other group. The exception was phonological regularity, where the one Gan dialect (Nanchang Gan) was in the Southern group and very close to Meixian Hakka, and the deepest phonological difference was between Wenzhounese (the southernmost Wu dialect) and all other dialects.

The study did not find clear splits within the Northern and Central areas:


 * Changsha (New Xiang) was always within the Mandarin group. No Old Xiang dialect was in the sample.
 * Taiyuan (Jin or Shanxi) and Hankou (Wuhan, Hubei) were subjectively perceived as relatively different from other Northern dialects but were very close in mutual intelligibility. Objectively, Taiyuan had substantial phonological divergence but little lexical divergence.
 * Chengdu (Sichuan) was somewhat divergent lexically but very little on the other measures.

The two Wu dialects (Wenzhou and Suzhou) occupied an intermediate position, closer to the Northern/New Xiang/Gan group in lexical similarity and strongly closer in subjective intelligibility but closer to Min/Hakka/Yue in phonological regularity and subjective similarity, except that Wenzhou was farthest from all other dialects in phonological regularity. The two Wu dialects were close to each other in lexical similarity and subjective similarity but not in mutual intelligibility, where Suzhou was closer to Northern/Xiang/Gan than to Wenzhou.

In the Southern subgroup, Hakka and Yue grouped closely together on the three lexical and subjective measures but not in phonological regularity. The Min dialects showed high divergence: Fuzhou (Eastern Min) grouped only weakly with the Southern Min dialects of Xiamen and Chaozhou on the two objective criteria and was slightly closer to Hakka and Yue on the subjective criteria.

Internal comparison
This section compares the Sinitic languages outside the Bai and Cai–Long groups. Though all stem from Old Chinese, they have diverged from one another.

Writing system
Typographically, the vast majority of Sinitic languages use Sinographs. However, some varieties, such as Dungan and Hokkien, have alternative scripts, namely Cyrillic and Latin alphabets. Even between varieties which use Sinographs, characters are repurposed or invented to cover for the difference in vocabulary. Examples include in Yue,  in Hakka,  in Hokkien,  in Wu,  in Xiang, and  in Mandarin. Note that both traditional and simplified characters can be used to write any lect.

Phonology
Phonologically speaking, though all Sinitic languages possess tones, their contours and the total number of tones vary widely, from Shanghainese, which can be analysed as having only two tones, to Bobainese, which has ten. Sinitic languages also vary widely in their phonological inventories and phonotactics. Take for instance  seen in Pingdingnese, or   of Xuanzhounese, which both show syllables that do not follow the (single) consonant-glide-vowel-consonant syllable structure of more well-known lects. Tone sandhi is also a feature which not all lects share. Cantonese, for instance, only has a very weak system, whereas Wu varieties not only have complex, intricate systems which affect almost all syllables, but also use them to mark grammatical part of speech. Take, for instance, this simplified analysis of Suzhounese tone sandhi:

Grammar
Disregarding phonology, grammar is the feature of Sinitic languages which differs the most. The majority of Sinitic languages do not possess tenses. Exceptions include Northern Wu lects such as Shanghainese and Suzhounese, though the tense system is largely breaking down in Shanghainese due to Mandarin influence. Sinitic languages generally also have no case marking, though lects such as Linxianese and Hengshannese do possess case particles, with the latter expressing case through tone change. Sinitic languages generally have SVO word order and possess classifiers.

Verb usage may be different between Sinitic languages. Notice the double verb marking seen in lects such as Beijingese, in these sentences meaning "today I go to Guangzhou":

Indirect object marking
Sinitic languages vary greatly in how they mark indirect objects. The main point of variation is the relative placement of the indirect and direct objects.

Mandarinic, Xiang, Hui, and Min languages often place the indirect object (IO) before the direct object (DO). Some lects, such as Nanchangese and Shanghainese, have switched to the IO-DO structure due to Mandarin influence, though Shanghainese also has the alternative word order.

On the other hand, Gan, Wu, Hakka, and Yue languages tend to place the DO in front of the IO.
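
The tendencies just described can be summarised in a small lookup, sketched here in Python. The names OBJECT_ORDER and order_objects are hypothetical and chosen only for this illustration, and English glosses stand in for actual words. The table records group-level tendencies from the text, not rules that hold for every lect.

```python
# Typical object order per group, as stated in the text above.
# These are tendencies, not rules: the text notes that some lects
# (for example Shanghainese) allow both orders.
# OBJECT_ORDER and order_objects are hypothetical names for this sketch.
OBJECT_ORDER = {
    "Mandarinic": "IO-DO",
    "Xiang": "IO-DO",
    "Hui": "IO-DO",
    "Min": "IO-DO",
    "Gan": "DO-IO",
    "Wu": "DO-IO",
    "Hakka": "DO-IO",
    "Yue": "DO-IO",
}


def order_objects(group: str, indirect: str, direct: str) -> tuple[str, str]:
    """Arrange two object glosses in the order typical of the given group."""
    if OBJECT_ORDER[group] == "IO-DO":
        return (indirect, direct)
    return (direct, indirect)


print(order_objects("Mandarinic", "him", "a book"))  # ('him', 'a book')
print(order_objects("Yue", "him", "a book"))         # ('a book', 'him')
```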

Classifiers
Like other East Asian languages such as Japanese and Korean, Sinitic languages have a system of classifiers. However, the use of classifiers varies greatly in features such as definiteness. In Cantonese, for instance, they can be used to mark possession, which is rare in Sinitic but common in Southeast Asia.

and are the most common generic classifiers cross-linguistically. As previously mentioned, Mandarinic languages tend to have fewer classifiers whereas the Southern non-Mandarinic varieties tend to have more.

Demonstratives
Sinitic languages can vary greatly in their system of demonstratives. Standard Mandarin and other Northeastern varieties have a two-way system: (proximal) and  (distal), but this is not the only system found in Sinitic languages.

Wuhannese has a neutral demonstrative, which can be used regardless of the distance to the deictic center. Similar systems are found in Northern Wu lects such as Suzhounese and Ningbonese.

In the above sentence, can be translated as both 'this' and 'that'. Though Wuhannese has this system of a one-term neutral system, it also has a two-way proximal-distal system. This is the same for most other lects with a one-term system.

Even within two-way systems, which are the most common, terms may have developed to mean the opposite distance from the deictic center. Cantonese (distal) and Shanghainese  (proximal) are both etymologically from, for instance.

Many Sinitic languages have three-way systems, but the three distances are not always the same ones. For instance, whereas Guangshan Mandarin has a person-oriented proximal, medial, and distal system, Xinyu Gan has a distance-oriented close, proximal, and distal system. Gan especially has many varieties with a three-way system, sometimes even marked with tone and vowel length rather than just changing the term used.

A small number of varieties possess even four- or five-term demonstrative systems. Take for instance the following:

These two lects use tone change and vowel length respectively to distinguish between the four demonstratives.