Sotho tonology

Like most other Niger–Congo languages, Sesotho is a tonal language, spoken with two basic tones, high (H) and low (L). The Sesotho grammatical tone system (unlike the lexical tone system used in Mandarin, for example) is rather complex and uses a large number of "sandhi" rules.

However, the Sesotho system is by no means the most complicated, nor even one of the more complicated. For example, there are African grammatical tone languages with many more than two tonemes, and the breathy voiced consonants of the Nguni and other languages greatly complicate their tonology. (In Sesotho there is no interaction between the tonemes and the phones of the syllables.) There are also very few instances of "floating" tones, and relatively few grammatical constructs are indicated purely by a change in tone. (The most common instances of this are rule 1 of the plain copulative and the formation of many positive participial sub-mood clauses.) The rules are generally not very dramatic either, and there is a strong tendency to preserve underlying high tones. (In the Nguni languages, by contrast, the underlying high tone of verb stems, subjectival concords, the noun pre-prefix, and/or objectival concords often shifts several syllables to the right, to the antepenultimate or penultimate syllable.)

The tone of a syllable is carried by the vowel, or by the nasal if the nasal is syllabic. The tone carried by syllabic ⟨l⟩ (and, in Northern Sotho and Setswana, syllabic ⟨r⟩) is left over from the elided vowel.

Tone types
Underlyingly, each syllable of every morpheme may be described as having one of two tone types: high (H [ ¯ ]) and null (ø). On the surface, all remaining null tones default to low (the LTA rule below) and the language is therefore spoken with two contrasting tonemes (H and L).

A classic example of a nasal carrying a tone:


 * To form a locative from a noun, one of the possible procedures involves simply suffixing a low-toned -ng to the noun. To form the locative meaning "on the grass" one suffixes -ng to the word jwang‡ [ _ ¯ ], giving jwanng‡ [ _ ¯ _ ], with the last two syllabic nasals having contrasting tones.

Names, although they are nouns, frequently have a tonal pattern distinct from that of the common noun they are based on:


 * The Sesotho word for "mother" is mme‡ [ _ ¯ ], but a child would call their own mother mme‡ [ ¯ _ ], using it as a first name. Also, ntate‡ [ _ _ ¯ ] means "father", while ntate‡ [ _ ¯ ¯ ] might be used by a small child to address their father.

Allotones
In speech, the two surface tonemes may be pronounced as one of several allotones due to the influence of surrounding tones and the length of the syllable. These changes naturally occur due to the way the language is spoken, including the effect of the penultimate lengthening, but ultimately each syllable of every morpheme may be completely described as having only high and low tones.

In this and related articles, the tonemes of a word are delimited with square brackets and the specific (approximate) spoken allotones are between curly braces.


 * lepata ('euphemism'); tonemes: [ _ ¯ _ ] (L, H, L), allotones: { _ ＼ _ } (low, high-falling, low)

Thus in all there are, at least in our analysis, eight allotones { ¯ ¯ – _ _ ﹨ ﹨ ＼ }.

Most of these allotones only appear on the final word in the phrase in moderately slow or emphasised speech. When not phrase-final, the mid, high-falling, high-mid, low-falling, and extra-low allotones are normally not heard. Bear in mind that the falling tones only occur on lengthened syllables, and if a word has irregular stress then the falling tones will not appear on the penult (for example, the second form of the first demonstrative pronoun has tonemic pattern [ ¯ ¯ ] which is pronounced { ¯ ＼ } due to the stressed final syllable).

There are no rising tones. For example, [ _ ¯ ] (where the L is penultimate) is pronounced { ﹨ – } though one might have expected *{ ／ ¯ }. This is a general trend among almost all Bantu languages with (contrastive or stressed) lengthened vowels, though languages with depressor consonants do have audible upward "swoops" on depressing syllable onsets which may be interpreted as rising allotones.

There are several cases of seemingly tonemic instances of some of these allotones. As expected, some ideophones and radical interjectives have strange tones, but the relative concord also has an irregular extra-high tone (except when used to form demonstrative pronouns). The difference in relative pitch between the high tone and its extra-high allotone is less than that between the low and high tones.

Tone usage
The purpose of the tones can fall into at least one of the following categories:

Characteristic tone
Each complete Sesotho word has an inherent tone for each of its syllables. Reproducing it is not essential to being understood, but failing to do so will betray a foreign accent:


 * motho‡ [ _ _ ] ('human being')
 * ntja‡ [ _ ¯ ] ('dog')
 * Mosotho‡ [ _ ¯ _ ] ('singular of Basotho')
 * lerata‡ [ _ _ ¯ ] ('noise')

Various factors mean that the tones of a word may change, but the characteristic tone of a Sesotho word is found when the word is the last in a question sentence not employing the interrogative adverb na?. In this situation, downdrift is greatly attenuated, the penultimate syllable of the sentence is short (although the vowel of the last syllable may be completely cut), and the tone of the last word is largely preserved (though a final H tone may fall to L).


 * O batla ho eba setsebi‡ { _ ＼ _ } ('you want to be a scientist')
 * Na o batla ho eba setsebi?‡ { _ ＼ _ } ('do you want to be a scientist?')
 * O batla ho eba setsebi?‡ { _ ¯ _ } ('do you want to be a scientist?')

Distinguishing/semantic tone
The most important property of tonal languages which distinguishes them from languages that merely use pitch as part of intonation (such as English) is the existence of numerous tonal minimal pairs. Often, a few words may be composed of exactly the same syllables/phonemes, yet have different characteristic tones (the example H verbs have low final tone due to the Finality Restriction):


 * ho aka‡ [ _ ¯ _ ] ('to kiss')
 * ho aka‡ [ _ _ _ ] ('to tell lies')


 * jwang‡ [ _ ¯ ] ('grass')
 * jwang‡ [ ¯ _ ] ('how?')


 * ho tena‡ [ _ ¯ _ ] ('to wear')
 * ho tena‡ [ _ _ _ ] ('to disgust')

There are, however, several basic homophones pronounced with exactly the same tonal patterns. In these cases only the context may be used to distinguish between the different meanings.


 * -laola L verb (i) 'rule'; (ii) 'divine'
 * -rola H verb (i) 'to forge metal', 'to hammer'; (ii) 'to undress'
 * mohlwa [ _ ¯ ] (i) 'termite'; (ii) 'lawn grass (of the graminaceae family)'

There are instances of words being changed either through inflexion or derivation and as a result ending up sounding exactly like other words.


 * hlolo [ _ _ ] (i) 'hare', (ii) 'creation' (from the L verb -hlola)

Grammatical tone
It regularly occurs that two otherwise similar sounding phrases may have two very different meanings mainly due to a difference in tone of one or more words or concords.


 * Ke ngwana wa hao‡ [ _ _ ¯ ¯ ¯ _ ] ('I am your child')
 * Ke ngwana wa hao‡ [ ¯ _ ¯ ¯ ¯ _ ] ('he/she/it is your child')
 * O mobe‡ [ _ _ ¯ ] ('you are ugly')
 * O mobe‡ [ ¯ _ ¯ ] ('he/she is ugly')
 * Ke batlana le bona‡ [ _ _ _ _ ¯ _ _ ] ('I am looking for them' present indicative mood)
 * Ke batlana le bona‡ [ ¯ ¯ _ _ ¯ _ _ ] ('as I was looking for them' participial sub-mood; this is not a complete sentence but part of a longer sentence)

Note that when grammatical tone is used the tone of the significant word may influence the relative pitch of the rest of the phrase, although the tones of other words tend to remain intact.

Downdrift
Downdrift, where the absolute pitch (not tones) of the speaker's voice is gradually decreased as the sentence continues (often resulting in initial low tones being pronounced at a higher pitch than final high tones), is a feature during natural speech. Basically, a high tone immediately following a low tone is pronounced at a slightly lower frequency than a previous high tone.
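The stepwise lowering described above can be sketched numerically. The following Python fragment is an illustration only: the function name `downdrift`, the pitch values (in arbitrary units) and the size of the step are assumptions chosen for readability, not measurements of Sesotho speech.

```python
def downdrift(tones, high=10.0, low=7.0, step=1.0):
    """Return one pitch value per toneme in `tones` (a string of 'H' and 'L').

    Assumption: every H that immediately follows an L lowers the speaker's
    whole register by `step`. All numbers are arbitrary illustrative units.
    """
    pitches = []
    drop = 0.0
    previous = None
    for tone in tones:
        if tone == "H" and previous == "L":
            drop += step  # this H sits lower than the previous H
        pitches.append((high if tone == "H" else low) - drop)
        previous = tone
    return pitches


contour = downdrift("LHLHLHLH")
print(contour)
```

With these assumed values the final high tone ends up lower in absolute pitch than the initial low tone, which is the effect the paragraph describes for longer sentences.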

Additionally, a slightly more dramatic lowering of pitch (a downstep) may occur between certain syllables. In Sesotho, the downstep (indicated with a !) naturally occurs between words (being less noticeable if the first word has no low tones), though there is at least one instance (in rule 1 of the plain copulative) where the lack of downstep (as well as other tonal factors) changes the utterance's meaning. In the following example, a grave accent (à) indicates a low tone and an acute accent (á) indicates a high tone.

This downdrift is greatly attenuated when the sentence is a question not using the interrogative adverb na?.

Verb tone
Sesotho verb stems fall into two categories: H stems and L stems. The difference lies in the "underlying tone" of the stem's first syllable (or the stem's "basic tone") being either high or null. When used with an object in the indicative remote future tense (the simple -tla- tense) the verb's stem is monotonous (all syllables high toned or all low toned) with the underlying tone of the first syllable spread to all the following syllables.

Nouns derived from the verb stem are fossilised with the tones of the simple class 15 infinitive as appears in medial positions without a subject or object. The procedure for creating this tonal pattern is intricate and involves several tonal rules.

These factors may also apply in normal verbal conjugations. Adding a verbal suffix (through derivation, not inflexion) creates a new verb stem which falls in the same tone category as the original, and is subject to the same rules.


 * -paqama (L verb stem) ('lie face downwards') ⇒ ho paqama [ _ _ _ _ ] ('to lie') ⇒ ho paqamisa [ _ _ _ _ _ ] ('to cause to lie') ⇒ ho paqamisuwa [ _ _ _ _ _ _ ] ('to be caused to lie'), etc.
 * -ahlola (H verb stem) ('judge') ⇒ ho ahlola [ _ ¯ ¯ _ ] ('to judge') ⇒ kahlolo [ ¯ ¯ _ ] ('judgement'), moahlodi [ _ ¯ ¯ _ ] ('judge'), boahlodi [ _ ¯ ¯ _ ] ('state of being a judge')

The tones of the noun prefixes of nouns derived from verbs are independent of the tones of the stem.

Some nouns derived from verbs have idiomatic tonal patterns independent of the original verb stem's tones.


 * -loka (L verb stem) ('be sufficient') ⇒ -lokela ('be sufficient for') ⇒  tokelo ('human right'; irregular tone [ _ ¯ _ ] instead of the expected [ _ _ _ ])

Several "tonal melodies" may be assigned to certain verbal conjugations based on the desired tense, aspect, and mood (for example, with many verb conjugations the only difference between the indicative mood and the participial sub-mood is one of tone). These are applied before most other rules and may be indicated by a code including the symbols H (high tone), L (low tone), B (verb stem's basic tone), and * (iteratively applying the preceding tone).

For example, applying the (present) "Subjunctive Melody" (HL*H) to the H verb stem -bona ('see') and the L verb stem -sheba ('look at') results in both ke shebe tau ('so I may look at the lion') and ke bone tau ('so I may see the lion') being pronounced with exactly the same tone pattern [ ¯ ¯ ¯ _ ¯ ].

Another way to designate the melodies is to use a standard template of the tense in question and indicate the melody by assigning tones to specific syllables in the resultant word (for example, the final syllable, the subjectival concord, etc.). So for the above example the Subjunctive Melody (actually, present-future subjunctive) may be specified by putting H tones on the first syllable (the subjectival concord's basic tone is ignored), the second syllable, and the final syllable of the word, and putting an explicit L tone on the fourth syllable (unless the verb is disyllabic, in which case the fourth syllable is the final syllable and has an H tone), thus preventing HTD.

Tonal rules
Sesotho is a grammatical tone language; this means that words may be pronounced with varying tonal patterns depending on their particular function in a sentence. Another interpretation is that the tones of the language interact in their own intricate "tonal grammar."

In order to create certain grammatical constructs, certain tonal rules may be used to modify the underlying tones of the word to create their surface tones. The words are then spoken using the surface tones.

This system is naturally somewhat complex. Indeed, the development of autosegmental phonology was largely motivated by the need for a satisfactory theoretical framework to deal with the tonal grammars of Niger–Congo languages. This article attempts to explain certain aspects of Sesotho tonology in a rule-based autosegmental framework.

The rules presented below are almost exclusively used in constructing the verbal complex as this is the part of speech most radically affected by the tonal grammar.

About autosegmental phonology
Autosegmental phonology was motivated by the need to represent properties which seem to span several "segments" (in our case, syllables) and seem to be somewhat independent of them. Underlyingly (that is, in the speaker's lexicon), some, but not necessarily all, of the segments of morphemes are associated with one or more properties. The segments are on one "tier" and their properties are on another, and the relationships between the two are indicated by joining them with association lines as follows:

Each of the rules changes the associations in some way. For example, High Tone Doubling (HTD) causes the underlying H tone on the first syllable of the verb to also be linked to the syllable immediately to the right:

In this article, the application of several rules in succession will be indicated with the following abbreviation:

The fact that the line emanating from the second syllable is only linked on the HTD line means that this is the first time that syllable is associated with that property.

Typology
One popular classification of tonal Bantu languages broadly separates them into two groups: shifting languages and spreading languages. The Sotho–Tswana languages are bounded spreading languages as they have primitive rules which directly cause underlying high tones to be associated with (spread to) syllables to the right. The closely related Nguni languages, on the other hand, are unbounded shifting languages as they have primitive rules which directly cause underlying high tones to be moved (shifted to) syllables to the right. The following table presents an informal comparison between the tonal processes found in Sesotho and isiZulu:

In the table, a process is unbounded if there is no set limit on the number of syllables over which it may occur. Sesotho has basic bounded spread (High Tone Doubling) and isiZulu has basic unbounded shift. Bounded shift in Sesotho occurs as the cumulative effect of bounded right tone spread (High Tone Doubling) and Left Branch Delinking, while various forms of spreading may occur in isiZulu if the word is very short or has two or more underlying highs.
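The difference between spreading and shifting can be illustrated on a simple tone tier. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the tier holds a single underlying H, the unbounded shift always targets the antepenultimate syllable, and the contextual conditions on Left Branch Delinking are ignored.

```python
H, NULL = "H", "ø"


def bounded_spread(tier):
    """Link each underlying H to the syllable immediately to its right as well."""
    out = list(tier)
    for i, tone in enumerate(tier):
        if tone == H and i + 1 < len(tier):
            out[i + 1] = H
    return out


def bounded_shift(tier):
    """Spread one syllable to the right, then delink the original (left) branch."""
    out = bounded_spread(tier)
    for i, tone in enumerate(tier):
        if tone == H and i + 1 < len(tier):
            out[i] = NULL  # the H now survives only on the right-hand syllable
    return out


def unbounded_shift(tier, from_end=3):
    """Move a single underlying H rightwards to a position counted from the end.

    Assumption: from_end=3 targets the antepenultimate syllable; an H already
    at or past the target stays where it is.
    """
    out = [NULL] * len(tier)
    if H in tier:
        source = tier.index(H)
        out[max(source, len(tier) - from_end)] = H
    return out


tier = [H, NULL, NULL, NULL, NULL, NULL]
print("".join(bounded_spread(tier)))
print("".join(bounded_shift(tier)))
print("".join(unbounded_shift(tier)))
```

In the bounded processes the H never travels more than one syllable, whatever the length of the word, while in the unbounded shift the distance travelled grows with the word.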

Some tonal rules
In dealing with verbs, the following rules may be applied at various times:


 * High Tone Doubling (HTD) causes the H tone found on the first syllable of the verb stem, or on an H toned subjectival concord (whether it is used as part of a verb or a copulative), to be spread to (associated with) the syllable immediately to the right. For example, ("They see" with no direct object; the bullets • are used here to join the parts of single words which would have been written separately in the current disjunctive orthography):


 * Iterative Tone Spread (ITS) causes the H tone found on the first syllable of the verb stem to be spread repeatedly to the right until the end of the verb complex. This rule is only applied in certain situations (such as when forming the perfect). For example, ("I have bought for..." with two direct objects):


 * Right Branch Delinking (RBD) is an application of the obligatory contour principle which causes an H tone spread from a subjectival concord to a verbal auxiliary infix or objectival concord immediately to the left of the verb stem to be removed (delinked) if the verb stem is an H stem. For example, ("They see"):


 * Left Branch Delinking (LBD) is an application of the obligatory contour principle which causes the H tone on the first syllable of an H verb stem to be delinked if the stem immediately follows an H toned subjectival concord, resulting in tonal pattern (HøH). This rule is idiolectal and is not applied by all Sesotho speakers. For example, ("They see..." when used with a direct object):


 * The Finality Restriction (FR) causes any H tones spread to the final syllable of the verb complex to be removed. This rule is not applied under all circumstances, and is never applied if the verb's stem is monosyllabic (that is, it never delinks the H tone on the verb stem's first syllable). It is also never applied when the verb is immediately followed by a direct object (therefore it doesn't undo ITS, or the high tone copied to a disyllabic H verb's last syllable if it is immediately followed by an object). For example, ("I love" with no direct object):


 * Low Tone Assignment (LTA) is the very last rule applied and is always applied in all circumstances (not just when dealing with verbs). It simply assigns an L tone to all unlinked segments (that is, segments with null tone). For example, ("She is looking on behalf of" with two direct objects):

Some examples
To construct many verb forms, including many positive indicative tenses without direct objects as well as infinitives, the following rules are applied in order:

Note that the three main levels are always applied in this order, though the actual rules contained in the levels will change depending on the parts of speech, verb moods, etc. For the word o a bina ('she is singing') the application of the rules is as follows:

The word appears on the surface with tonal pattern [ ¯ _ ¯ _ ].

Furthermore, the second last syllable of the word is lengthened (or "stressed"), and the interaction of the tones as well as the penultimate lengthening results in the word being pronounced with pitch levels { ¯ _ ＼ _ }.

Extending the word by one syllable ( o a bintsha 'She is conducting'):

The word appears on the surface with tonal pattern [ ¯ _ ¯ ¯ _ ] (the high beneath the third syllable is associated with two syllables).

The second last syllable of the word is lengthened and the interaction of the tones as well as the penultimate lengthening results in the word being pronounced with pitch levels { ¯ _ ¯ ＼ _ }.
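The two derivations above can be reproduced with a short Python sketch. It is not a full model of the tonal grammar, and several points are assumptions: the function name `derive` is hypothetical, the rule order used here (HTD, then RBD, then FR, then LTA) is assumed, the first syllable is taken to be the subjectival concord, the underlying tones are inferred from the surface patterns quoted above, and bintsha is divided as bi-n-tsha because its pattern has five tonemes.

```python
H, L = "H", "L"


def derive(syllables, stem_start, has_object=False):
    """Return the surface tonemes of a verb complex as a list of 'H'/'L'.

    syllables:  list of (text, underlying_tone), with tone H or None (null).
    stem_start: index of the first syllable of the verb stem.
    Assumes syllable 0 is the subjectival concord.
    """
    tones = [tone for _, tone in syllables]
    spread = set()  # syllables that received their H by spreading

    # High Tone Doubling: an underlying H on the subjectival concord or on
    # the stem's first syllable is also linked to the next syllable.
    for source in (0, stem_start):
        target = source + 1
        if syllables[source][1] == H and target < len(tones) and tones[target] is None:
            tones[target] = H
            spread.add(target)

    # Right Branch Delinking: before an H stem, remove an H that was spread
    # onto the syllable immediately to the left of the stem.
    before_stem = stem_start - 1
    if syllables[stem_start][1] == H and before_stem in spread:
        tones[before_stem] = None
        spread.discard(before_stem)

    # Finality Restriction: remove an H spread onto the final syllable,
    # unless the stem is monosyllabic or a direct object follows.
    last = len(tones) - 1
    if last in spread and len(tones) - stem_start > 1 and not has_object:
        tones[last] = None

    # Low Tone Assignment: every remaining null tone surfaces as L.
    return [tone if tone == H else L for tone in tones]


o_a_bina = [("o", H), ("a", None), ("bi", H), ("na", None)]
o_a_bintsha = [("o", H), ("a", None), ("bi", H), ("n", None), ("tsha", None)]
print("".join(derive(o_a_bina, stem_start=2)))
print("".join(derive(o_a_bintsha, stem_start=2)))
```

Under these assumptions the sketch yields H L H L for o a bina and H L H H L for o a bintsha, matching the tonemic patterns [ ¯ _ ¯ _ ] and [ ¯ _ ¯ ¯ _ ] given above. The allotones produced by penultimate lengthening are not modelled.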