Khasi language

Khasi (Ka Ktien Khasi) is an Austroasiatic language with just over a million speakers in north-east India, primarily the Khasi people in the state of Meghalaya. It has associate official status in some districts of this state. The closest relatives of Khasi are the other languages in the Khasic group of the Shillong Plateau; these include Pnar, Lyngngam and War.

Khasi is written using the Latin script. In the first half of the 19th century, attempts to write Khasi in Bengali-Assamese script met with little success.

Geographic distribution and status
Khasi is natively spoken in India (as of 2011). It is the first language of one-third of the population of Meghalaya, and its speakers are mostly found in the Khasi Hills and Jaintia Hills regions. There are also small Khasi-speaking communities in neighbouring states of India, the largest of which is in Assam. There is also a very small number of speakers in Bangladesh.

Khasi has been an associate official language of some districts within Meghalaya since 2005 and, as of 2012, was no longer considered endangered by UNESCO. There are demands to include the language in the Eighth Schedule to the Constitution of India.

A sizeable number of books have been published in Khasi, including novels, poetry, religious works, school textbooks and non-fiction. The most famous Khasi poet is U Soso Tham (1873–1940), whose death is commemorated annually as a regional holiday in the state of Meghalaya. Khasi has a good presence on the internet, including blogs and several online newspapers.

Dialects
Khasi has significant dialectal variation, and this presents a challenge with regard to classifying the Khasic languages.

Some dialects of Khasi include:
 * Sohra Khasi
 * Mylliem Khasi
 * Mawlai Khasi
 * Nongkrem Khasi
 * War Khasi, not to be confused with the closely associated War language
 * Bhoi Khasi
 * Nonglung

In addition, Pnar, Maram (including Langrin) and Lyngngam have been listed as types of Khasi, although more recent studies seem to indicate that these are sister languages to Khasi, and that Khasi actually began as a marginal Pnar dialect.

Bhoi, from Nongpoh, and Nonglung from Umsning, in Ri Bhoi District, differ substantially from Standard Khasi in their word order. They are distinct enough from Standard Khasi to be sometimes considered separate languages, with Bhoi sometimes classified as intermediate between Khasi and Pnar, and Nonglung being part of Mnar, variously classified as a type of War or of Pnar. On the other hand, Sohra and War Khasi are lexically very similar.

The Sohra dialect is taken as Standard Khasi, as it was the first dialect to be written in the Latin and Bengali scripts by the British. While Standard Khasi is spoken by the majority in Shillong, it differs significantly from the other Shillong dialects (eight at most), which form a dialect continuum across the capital region.

Phonology
This section discusses mainly the phonology of Standard Khasi as spoken in and around the capital city, Shillong.

Khasi, mainly spoken in Meghalaya, is surrounded by unrelated languages: Assamese to the north and east, Sylheti to the south (both Indo-Aryan languages), Garo (a Tibeto-Burman language) to the west, and a plethora of other Tibeto-Burman languages including Manipuri, Mizo and Bodo.

Although the language has changed over time, Khasi retains some distinctive features:
 * Khasi remains a stress language, without tones, unlike many of its Tibeto-Burman neighbours.
 * Like its Mon-Khmer relatives, Khasi has a large inventory of phonemic vowels (see below).
 * The syllable structure of Khasi words resembles that of many Mon-Khmer languages, with many lexical items showing a CCVC shape, in which many combinations of consonants are possible in the onset (see examples below).

Script
Before British colonization, some of the Khasi Syiems (Royals) used to keep official records and communicate with one another on paper primarily using the Bengali script. William Carey wrote the language with the Bengali script between 1813 and 1838. A large number of Khasi books were written in the Bengali script, including the famous book Ka Niyom Jong Ki Khasi or The Religion of the Khasis, which is an important work on the Khasi religion.

The Welsh missionary, Thomas Jones, arrived in Sohra on June 22, 1841, and proceeded to write down the local language in the Latin script. As a result, the modified Latin alphabet of the language has a few similarities with the Welsh alphabet. The first journal in Khasi was U Nongkit Khubor (The Messenger) published at Mawphlang in 1889 by William Williams.

Khasi alphabet
Khasi in the Latin script follows its own conventions, distinct from those of English. It uses a 23-letter alphabet, formed by removing the letters c, f, q, v, x and z from the basic Latin alphabet and adding the diacritic letters ï and ñ and the digraph ng, which is treated as a letter in its own right. The digraph ng is also present in the Welsh alphabet.

Note
 * Vowel length is not usually marked in the orthography, although it can optionally be marked by an acute accent (sim "bird" vs. rí "country").
 * The peculiar placement of k is due to it replacing c. c and ch were originally used in place of k and kh. When c was removed from the alphabet, k was put in its place.
 * The inclusion of g is only due to its presence in the letter ng. It is not used independently in any word of native origin.
 * h represents both the fricative sound and, word-finally, the glottal stop (ʔ).
 * y is not pronounced as in year, but acts as a schwa (ə), and as a glottal stop between vowels. The sound in year is written with ï.
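
The construction of the alphabet described above can be sketched in a few lines of Python. This is only an illustration: the function name khasi_alphabet is hypothetical, and the positions chosen for ng, ï and ñ (directly after g, i and n) are assumptions, since the text states only that k occupies the slot formerly held by c.

```python
import string

# Letters dropped from the basic Latin alphabet, as listed in the text.
REMOVED = set("cfqvxz")

# Assumed placement of the added letters: each follows its base letter.
ADDED_AFTER = {"g": "ng", "i": "ï", "n": "ñ"}


def khasi_alphabet():
    """Derive the Khasi letter inventory from the basic Latin alphabet."""
    letters = []
    for ch in string.ascii_lowercase:
        if ch == "c":
            letters.append("k")  # k takes the slot vacated by c
        elif ch in REMOVED or ch == "k":
            continue  # dropped letters, and k's original position
        else:
            letters.append(ch)
            if ch in ADDED_AFTER:
                letters.append(ADDED_AFTER[ch])
    return letters


alphabet = khasi_alphabet()
print(len(alphabet))  # 23
print(" ".join(alphabet))
```

The count of 23 follows from the arithmetic in the text: 26 basic letters, minus the 6 removed, plus the 3 added.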

Lost Khasi Script
A local legend tells of how the Khasi people received their script from God, and that subsequently the Khasi people lost their script in a great flood. In 2017, it was reported that there is evidence of an undeciphered script, currently stored at the Kamarupa Anusandhan Samity Library in Guwahati, Assam, that is considered to be Khasi in origin.

Grammar
Khasi is an Austroasiatic language whose distinctive features include a large number of consonant conjuncts, together with prefixing and infixing.

Word order
The order of elements in a Khasi noun phrase is (Case marker)-(Demonstrative)-(Numeral)-(Classifier)-(Article)-Noun-(Adjective)-(Prepositional phrase)-(Relative clause), as can be seen from the following examples:

ar tylli ki sim

two CL PL bird

'two birds'

kato ka kynthei kaba wan mynnin

that:FEM FEM girl FEM-relative come yesterday

'that girl who came yesterday'

ka kmie jong phi

FEM mother of you

'your mother'
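
The slot order given above can be treated as a template in which every element except the noun is optional. The following Python sketch assembles a noun phrase from named slots; the function noun_phrase and the slot names are hypothetical labels chosen for illustration, and the sketch does no grammatical checking beyond ordering.

```python
# Slot order as stated in the text; only the noun is obligatory.
SLOTS = (
    "case", "demonstrative", "numeral", "classifier", "article",
    "noun", "adjective", "prepositional_phrase", "relative_clause",
)


def noun_phrase(noun, **parts):
    """Join the supplied elements in the fixed noun-phrase order."""
    unknown = set(parts) - set(SLOTS)
    if unknown:
        raise ValueError(f"unknown slot(s): {sorted(unknown)}")
    parts["noun"] = noun
    return " ".join(parts[slot] for slot in SLOTS if parts.get(slot))


print(noun_phrase("sim", numeral="ar", classifier="tylli", article="ki"))
# ar tylli ki sim
print(noun_phrase("kynthei", demonstrative="kato", article="ka",
                  relative_clause="kaba wan mynnin"))
# kato ka kynthei kaba wan mynnin
print(noun_phrase("kmie", article="ka", prepositional_phrase="jong phi"))
# ka kmie jong phi
```

The three calls reproduce the three examples above, whatever order the keyword arguments are given in.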

Gender
Khasi has a pervasive gender system. There are four genders in this language, marked by the prenominal articles u (masculine), ka (feminine), i (diminutive) and ki (plural).

Humans and domestic animals have their natural gender:


 * ka kmie "mother"
 * u kpa "father"
 * ka syiar "hen"
 * u syiar "rooster"

Rabel (1961) writes: "the structure of a noun gives no indication of its gender, nor does its meaning, but Khasi natives are of the impression that nice, small creatures and things are feminine while big, ugly creatures and things are masculine....This impression is not borne out by the facts. There are countless examples of desirable and lovely creatures with masculine gender as well as of unpleasant or ugly creatures with feminine gender."

Though there are several counterexamples, Rabel says that there is some semantic regularity in the assignment of gender for the following semantic classes:

The matrilineal aspect of the society can also be observed in the general gender assignment: central and primary resources associated with day-to-day activities are marked as feminine, whereas masculine marks the secondary, the dependent or the insignificant.

Note, however, that there are no universal rules for the gender assignment of nouns in Khasi. There are many exceptions; one is syntiew (flower), which is stereotypically considered feminine but takes the masculine gender marker "u", i.e. u syntiew. Gender assignment depends heavily on the usage that native speakers naturally agree upon, although it can sometimes vary, for instance according to mood or tone.

Classifiers
Khasi has a classifier system, apparently used only with numerals. Between the numeral and noun, the classifier tylli is used for non-humans, and the classifier ngut is used for humans, e.g.

Don ar tylli ki sim ha ruh.

there:are two CL PL bird in cage

'There are two birds in the cage.'

Don lai ngut ki Sordar ha shnong.

there:are three CL PL chief in village

'There are three chiefs in the village.'
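
The choice between the two classifiers can be expressed as a single conditional. In this minimal Python sketch, the function counted_phrase and its human flag are hypothetical; the caller is assumed to know whether the noun refers to humans, and the plural article ki is taken from the two examples above.

```python
def counted_phrase(numeral, noun, human, article="ki"):
    """Build numeral + classifier + article + noun.

    ngut is used for humans and tylli for non-humans, as stated in the text.
    """
    classifier = "ngut" if human else "tylli"
    return f"{numeral} {classifier} {article} {noun}"


print(counted_phrase("ar", "sim", human=False))      # ar tylli ki sim
print(counted_phrase("lai", "Sordar", human=True))   # lai ngut ki Sordar
```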

Adjectives
There is some controversy about whether Khasi has a class of adjectives. Roberts cites examples like the following:

u briew ba-bha

MASC man REL-good

'a good man'

In nearly all instances of attributive adjectives, the apparent adjective has the prefix /ba-/, which seems to be a relativiser. There are, however, a few adjectives without the /ba-/ prefix:

u 'riew sníew

MASC man bad

'a bad man'

When the adjective is the main predicate, it may appear without any verb 'be':

U ksew u lamwir.

MASC dog MASC restless

'The dog is restless.'

In this environment, the adjective is preceded by an agreement marker, like a verb. Thus it may be that Khasi does not have a separate part of speech for adjectives, but that they are a subtype of verb.

Prepositions and prepositional phrases
Khasi appears to have a well-developed group of prepositions, among them:
 * bad "with, and"
 * da "with (instrumental)"
 * na "from"
 * ha "in, at"
 * sha "in, at"
 * jong "of"

The following are examples of prepositional phrases:

ka kmie jong phi

FEM mother of you

'your mother'

u slap u ther na ka bneng

MASC rain MASC pour from FEM sky

'Rain poured from the sky.'

Agreement
Verbs agree with 3rd person subjects in gender, but there is no agreement for non-3rd persons (Roberts 1891).

The masculine and feminine markers /u/ and /ka/ are used even when there is a noun phrase subject (Roberts 1891:132):

Ka miaw ka pah.

FEM cat FEM meow

'The cat meows.'

Tense marking
Tense is shown through a set of particles that appear after the agreement markers but before the verb. The past is marked by the particle /la/ and the future by /yn/ (contracted to 'n after a vowel).

Negation
Negation is also shown through a particle, /ym/ (contracted to 'm after a vowel), which appears between the agreement marker and the tense particle. A special past negation particle, /shym/, replaces the ordinary past /la/ (Roberts 1891).
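
The particle ordering and contraction rules stated in this section and in the tense-marking section can be sketched as a small Python function. Everything here is a mechanical application of those rules, not a set of attested forms: the names attach and verbal_complex are hypothetical, the vowel set is an assumption, and the combination of negation with the future is deliberately left out because the text does not spell it out. The agreement marker u and the verb bam 'eat' are borrowed from examples elsewhere in this article.

```python
VOWELS = set("aeiouï")  # assumed set of vowel letters that trigger contraction


def attach(previous, particle):
    """Append a particle, contracting yn/ym to 'n/'m after a vowel."""
    if particle in ("yn", "ym") and previous[-1].lower() in VOWELS:
        return previous + "'" + particle[1:]
    return previous + " " + particle


def verbal_complex(agreement, verb, tense=None, negative=False):
    """Order: agreement marker, negation, tense particle, verb."""
    out = agreement
    if negative:
        out = attach(out, "ym")
    if tense == "past":
        # shym replaces la under negation
        out = attach(out, "shym" if negative else "la")
    elif tense == "future":
        if negative:
            raise ValueError("negated future is not described in the text")
        out = attach(out, "yn")
    return out + " " + verb


print(verbal_complex("u", "bam", tense="past"))                  # u la bam
print(verbal_complex("u", "bam", tense="future"))                # u'n bam
print(verbal_complex("u", "bam", negative=True))                 # u'm bam
print(verbal_complex("u", "bam", tense="past", negative=True))   # u'm shym bam
```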

Copulas
The copula is an ordinary verb in Khasi, as in the following sentence:

U Blei u long jingïeid.

MASC God MASC be love

'God is love.'

Causative verbs
Khasi has a morphological causative /pn-/ (Rabel 1961), spelled pyn in Roberts (1891).

Word order
Word order in simple sentences is subject–verb–object (SVO):

U ksew u bam doh.

MASC dog MASC eat flesh

'The dog eats meat.'

However, VSO order is also found, especially after certain initial particles, like hangta 'then' (Rabel 1961).

hangta la ong i khnai ïa ka Naam

then PAST say dimin mouse ACC FEM Naam

'Then said the (little) mouse to Naam ... '

Case marking
Sometimes the object is preceded by a particle ya (spelled ia in Roberts 1891). Roberts says "ia, 'to', 'for', 'against' implies direct and immediate relation. Hence its being the sign of the dative and of the accusative case as well."

U la ái ïa ka kitab ïa nga.

MASC PAST give ACC FEM book ACC me

'He gave the book to me.'

It appears from Roberts (1891) that Khasi has differential object marking, since only some objects are marked accusative. Roberts notes that nouns that are definite usually have the accusative and those that are indefinite often do not.

Rabel (1961) says "the use of ïa is optional in the case of one object. In the case of two objects one of them must have ïa preceding.... If one of the objects is expressed by a pronoun, it must be preceded by ïa."
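
Rabel's three conditions can be read as a simple well-formedness check. The Python sketch below encodes them under the assumption that the pronoun condition applies to clauses with two objects, as the quotation suggests; the class Obj and the function ia_marking_ok are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    form: str
    is_pronoun: bool = False
    has_ia: bool = False


def ia_marking_ok(objects):
    """Check the distribution of ïa against Rabel's (1961) description."""
    if len(objects) <= 1:
        return True  # with one object, ïa is optional
    if not any(o.has_ia for o in objects):
        return False  # with two objects, at least one must carry ïa
    # a pronoun object must be preceded by ïa
    return all(o.has_ia for o in objects if o.is_pronoun)


# 'U la ái ïa ka kitab ïa nga.' (both objects marked)
book = Obj("ka kitab", has_ia=True)
me = Obj("nga", is_pronoun=True, has_ia=True)
print(ia_marking_ok([book, me]))  # True

# Hypothetical variant with the pronoun left unmarked
print(ia_marking_ok([book, Obj("nga", is_pronoun=True)]))  # False
```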

Broadly speaking, Khasi marks for eight cases, with the nominative case remaining unmarked, for a total of nine cases.

All case markers can appear with or without the prenominal markers/articles u, ka, i and ki, and are placed before the prenominal markers.

Passive
Khasi has a passive, but it involves removing the agent of the sentence without putting the patient in subject position (a type called the 'non-ascensional passive'). Compare the following active-passive pair (Roberts 1891), where the patient continues to have accusative case and remains in the object position:

Ki dang tháw ïa ka ïing da ki dieng.

PL contin build ACC FEM house with PL wood

'They are building the house with wood.'

Dang tháw ïa ka ïing.

contin build ACC FEM house

'The house is being built.'

This type of passive is used even when the passive agent is present in a prepositional phrase:

La lah pyniap ïa ka masi da U Míet.

PAST PFV kill ACC FEM cow by U Miet

'The cow was killed by U Miet.'

Questions
Yes–no questions seem to be distinguished from statements only by intonation:

Phi kit khoh Til?

you are:carrying a:basket Til

'Will you take a basket, Til?' (also rendered Phin shim ka khoh, Til?)

Wh-questions don't involve moving the wh-element:

u leit shaei?

MASC go where

'Where is he going?'

Embedded clauses
Subordinate clauses follow the main verb that selects them (Roberts 1891:169):

Nga tip ba phi la leh ia kata.

I know that you PAST do ACC that

'I know that you have done that'

Relative clauses follow the nouns that they modify and agree in gender:

Ka samla kynthei ka-ba wan mynhynnin ka la iáp.

FEM girl FEM-relative come yesterday FEM PAST die

'The girl who came yesterday has died.'

Contractions
A variety of Khasi prepositions and other words are contracted or reduced in both the spoken and the written language. One of the most common forms of contraction occurs when a pronoun is followed by the particle "yn" or "ym" (e.g. u yn contracts to u'n). Another occurs when a preposition is followed by a vowel-like gender marker such as "u" or "i" (e.g. ha u contracts to h'u).

Reduced words
Reduced forms of words are common in Khasi. Most often, one or two letters are dropped at the beginning of a word (e.g. briew can become 'riew). There is no clear rule behind this process, but the words that undergo reduction usually begin with more than one consonant; the reduced word takes an initial apostrophe to mark the omission. The reduced form is still understood from its context of use, and because its final syllable and letters (i.e. the rhyme) are always preserved.

These reduced forms are mostly seen in compounds, where the reduced word is combined with other words to give new words with new meanings. In compounds, the apostrophe is no longer used, e.g. 'riew as in riewkhlaw, riewspah, riewhyndai etc.
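
The two spelling conventions just described (an initial apostrophe on a free-standing reduced word, and no apostrophe inside a compound) can be illustrated with a short Python sketch. The functions reduce_word and compound are hypothetical, and because the text notes that there is no clear rule for which letters are dropped, the number of dropped letters is supplied by the caller rather than predicted.

```python
def reduce_word(word, dropped=1):
    """Drop the first `dropped` letters and mark the omission with an apostrophe."""
    if not 0 < dropped < len(word):
        raise ValueError("must drop at least one letter and keep at least one")
    return "'" + word[dropped:]


def compound(reduced, second):
    """Join a reduced word to a second element; the apostrophe is not written."""
    return reduced.lstrip("'") + second


riew = reduce_word("briew")
print(riew)                        # 'riew
print(compound(riew, "khlaw"))     # riewkhlaw
print(compound(riew, "spah"))      # riewspah
print(compound(riew, "hyndai"))    # riewhyndai
```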

Article 1 of the Universal Declaration of Human Rights
Khasi Alphabet

Ïa ki bynriew baroh la kha laitluid bad ki ïaryngkat ha ka burom bad ki hok. Ha ki la bsiap da ka bor pyrkhat bad ka jingïatiplem bad ha ka mynsiem jingsngew shipara, ki dei ban ïatrei bynrap lang.

(Jinis 1 jong ka Jingpynbna-Ïar Satlak ïa ki Hok Longbriew-Manbriew)

Assamese script

যা কি বৃনৰ‌্যের বাৰহ লা খা লাচলোছ বাড কী যৰূঙ্কট হা কি বুৰম বাড ক হক. হাকি লা বৃস্যপ দা ক বৰ-পৃৰ্খট বাড ক চিংযাতিপলেম বাড হা ক মৃন্স্যেম চিংস্ঙেউ শীপাৰা, কী দেই বাণ যত্ৰেই বৃনৰাপ লাং.

(জিনিস বানৃঙ্গং জং ক চিংপৃনবৃনা-যাৰ সত্লাক যা কি হক লংব্ৰ্যের-মানব্র্যের.)

IPA

jaː ki bɨnreʊ baːrɔʔ laː kʰaː lacloc bat ki jaːrɨŋkat haː kaː burɔm bat ki hɔk. haː ki laː bsjap daː kaː bɔːr pɨrkʰat bat kaː dʒɪŋjaːtɪplɛm bat haː kaː mɨnseːm dʒɨŋsŋɛʊ ʃiparaː ki dɛɪ ban jaːtrɛɪ bɨnrap laŋ

(dʒinɪs banɨŋkɔŋ dʒɔŋ kaː dʒɨŋpɨnbnaː-jaːr satlak jaː ki hɔk lɔŋbreʊ manbreʊ)

Gloss

To the human all are born free and they equal in the dignity and the rights. In them are endowed with the power thought and the conscience and in the spirit feeling fraternity they should to work assist together.

(Article first of the Declaration Universal of the Rights Humanity)

Translation

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should work towards each other in a spirit of brotherhood.