Prosodic bootstrapping

Prosodic bootstrapping (also known as phonological bootstrapping) in linguistics refers to the hypothesis that learners of a primary language (L1) use prosodic features such as pitch, tempo, rhythm, amplitude, and other auditory aspects from the speech signal as a cue to identify other properties of grammar, such as syntactic structure. Acoustically signaled prosodic units in the stream of speech may provide critical perceptual cues by which infants initially discover syntactic phrases in their language. Although these features by themselves are not enough to help infants learn the entire syntax of their native language, they provide various cues about different grammatical properties of the language, such as identifying the ordering of heads and complements in the language using stress prominence, indicating the location of phrase boundaries, and word boundaries. It is argued that prosody of a language plays an initial role in the acquisition of the first language helping children to uncover the syntax of the language, mainly due to the fact that children are sensitive to prosodic cues at a very young age.

Argument for
The argument for prosodic bootstrapping was first introduced by Gleitman and Wanner (1982), who observed that infants might use prosodic cues (particularly acoustic cues) to discover underlying grammatical information about their native language. These cues (e.g. intonation contour in a question phrase, lengthening a final segment) could aid infants in dividing the speech input into different lexical units, and furthermore aid in placing these units into syntactic phrases appropriate to the language.

Prosodic bootstrapping may also help explain how infants segment continuous input. Just like adult speakers, children are exposed to continuous speech. This poses a problem for children learning their native language because pauses in speech do not align with word boundaries. As a result, children have to construct word representations from the speech that they hear.

A study conducted by Christophe et al. (1994) showed that three-day-old infants are sensitive to acoustic properties of a language. The three-day-olds were able to discriminate bisyllabic stimuli with the same segments based on whether they were extracted from within a word or across a word boundary. The durations of the word-initial consonant and the word-final vowel are cues to the existence of a word boundary, which infants may use to learn about syntactic structure.

Further support for the prosodic bootstrapping hypothesis is that infants can use prosodic elements to segment speech at a very early age: as early as 3 days, infants have shown the ability to differentiate languages based on phonological characteristics alone, and the use of prosodic cues precedes the use of lexical or syntactic information. This has led to the hypothesis of "bootstrapping from the signal", or "prosodic bootstrapping", which has three main elements:
 * 1) The syntax of language is correlated with acoustic properties.
 * 2) Infants can detect and are sensitive to these acoustic properties.
 * 3) These acoustic properties can be used by infants when processing speech.

Phonological phrases
A phonological phrase boundary indicates how the continuous speech stream is broken up into smaller units, which infants use to pick out and more closely identify individual parts of the sentence. A phonological phrase can contain between four and seven syllables, and can be detected by infants, due to the fact that the edges of the phrases are either strengthened or lengthened. Various studies have been done to test if prosody helps with acquisition of syntax, morphology, and phonology.

Another acoustic cue that indicates a prosodic boundary is the duration of a pause. Pauses are usually longer at word boundaries that coincide with clause boundaries, so differences in pause duration help the listener distinguish the underlying syntactic structure. For example, the two sentences below, while superficially similar, have different prosodic structures, which correlate with their different syntactic structures ("..." = longer pause in speech):
 * 1) "The boy met the girl at the teach in" → [The boy]NP ... [met the girl]VP ... [at the teach in]PP
 * 2) "The boy met the girl and the teacher" → [The boy]NP ... [met the girl and the teacher]VP
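
The role of pause duration can be illustrated with a minimal Python sketch. The pause values, the 0.15-second threshold, and the function name `chunk_by_pause` are illustrative assumptions, not measurements or methods from any study; the sketch only shows how longer pauses could, in principle, be used to group words into prosodic units.

```python
from typing import List, Tuple


def chunk_by_pause(words: List[Tuple[str, float]],
                   threshold: float = 0.15) -> List[List[str]]:
    """Group words into units, closing a unit after any pause >= threshold.

    Each item is (word, pause_after_in_seconds). All values are hypothetical.
    """
    units: List[List[str]] = []
    current: List[str] = []
    for word, pause_after in words:
        current.append(word)
        if pause_after >= threshold:
            units.append(current)
            current = []
    if current:  # flush the last unit if the utterance did not end on a long pause
        units.append(current)
    return units


# Hypothetical pause durations for sentence 1 in the list above.
sentence_1 = [("The", 0.02), ("boy", 0.20), ("met", 0.03), ("the", 0.02),
              ("girl", 0.18), ("at", 0.02), ("the", 0.02), ("teach", 0.01),
              ("in", 0.0)]
print(chunk_by_pause(sentence_1))
```

With these invented durations, the long pauses after "boy" and "girl" yield three units that match the bracketing given for sentence 1. Real speech is far noisier, and pause duration is only one of several boundary cues.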

Acquiring lexicon
For infants who are learning their native language, it is difficult to extract words from the speech signal because spoken words are not separated by silence. There are several proposals for lexical acquisition. The first is that children hear some words in isolation: if a new piece then occurs between two known words, the new piece must be a new word. The second proposal is that cues in the speech signal the presence of a word boundary: duration, pitch, and energy.

The fact that speech is presented in a continuous stream without pauses makes the task of acquiring a language more difficult for infants. It has been proposed that prosodic features such as the strength of certain sounds, relative to their location in the word, can be used to break apart and identify fragments within the speech stream, in order to differentiate between potentially ambiguous sentences. In English, for example, the final [d] in the word "bold" tends to be "weak", in that it is not fully released. On the other hand, an initial [d] in a word such as "dime" is more clearly released than its word-final counterpart. This difference between strong and weak sounds may help to identify where the sound occurs in the word, whether at the beginning or the end.

Studies have shown that phonological boundaries can be interpreted as word boundaries, which further aids the child in the task of developing a lexicon. For example, Millotte et al. (2010) tested 16-month-olds, observing how children use phonological phrase boundaries to constrain lexical access. When infants heard a prosodic boundary, they were able to detect the existence of a word boundary. The authors used the conditioned head-turn procedure, which showed that infants trained to turn their heads for a bisyllabic word responded more often to sentences that contained this word than to sentences that contained both syllables of the word separated by a phonological phrase boundary.

Because prosodic boundaries never occur inside a word, infants can use them to rule out candidate words that would straddle a boundary. For example, children can differentiate between sequences such as "dice" and "red ice", even though both are phonologically similar. This is because a prosodic boundary will not appear in the middle of the word *(d][ice) but around the word instead ([dice]).

Children use phonological phrase boundaries to constrain lexical access. They infer the existence of a word boundary given a prosodic boundary. If two sequences differ in prosody while being made up of identical segments (pay per vs. paper), children treat them as different sequences. Studies that measured cues from prosody to phonological phrases have been done in a variety of languages that differ from each other, providing support that phonological phrases could possibly aid in acquiring lexicon universally.
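
The constraint described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: syllables are written orthographically, the `"|"` marker for a phonological phrase boundary and the function name `contains_word` are hypothetical, and the example sentences are invented rather than taken from any experiment.

```python
from typing import List

BOUNDARY = "|"  # hypothetical marker for a phonological phrase boundary


def contains_word(syllables: List[str], word: List[str]) -> bool:
    """Return True if `word` occurs as adjacent syllables with no boundary inside.

    Because the boundary marker is itself an element of the sequence, two
    syllables separated by a boundary are never adjacent and cannot match.
    """
    n = len(word)
    for i in range(len(syllables) - n + 1):
        if syllables[i:i + n] == word:
            return True
    return False


paper = ["pay", "per"]  # toy syllabification of "paper"

within_phrase = ["the", "pay", "per", BOUNDARY, "fell"]
across_boundary = ["we", "pay", BOUNDARY, "per", "hour"]

print(contains_word(within_phrase, paper))    # True
print(contains_word(across_boundary, paper))  # False
```

Both sequences contain the same two syllables in the same order, but only the first is accepted as an instance of the word, mirroring the "pay per" versus "paper" contrast.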

Acquiring syntax
In addition to helping to identify lexical items, a key element of prosodic bootstrapping involves using prosodic cues to identify syntactic knowledge about the language. Because prosodic phrase boundaries are correlated with syntactic boundaries, listeners can determine the syntactic category of a word using only prosodic boundary information. Christophe et al. (2008) demonstrated that adults could use prosodic phrases to determine the syntactic category of ambiguous words. Listeners were provided two sentences with an ambiguous word [mɔʀ], which could be either a verb ("mord", translated as "it bites") or an adjective ("mort", translated as "dead").

The table above depicts the two sentences heard by French-speaking adults in Christophe et al. (2008), where the emboldened word is the phonetically ambiguous word, and the brackets represent phonological phrase boundaries. Using the position of the prosodic boundaries, adults were able to determine which category the ambiguous word [mɔʀ] belonged to, since the word is assigned to a different phonological phrase, depending on its syntactic category and semantic meaning in the sentence.

An important tool for acquiring syntax is the use of function words (e.g. articles, verb morphemes, prepositions) to point out syntactic constituent boundaries. These function words occur frequently in language and generally appear at the borders of prosodic units. Because of their high frequency in the input, and the fact that they tend to have only one or two syllables, infants are able to pick out these function words when they occur at the edges of a prosodic unit. In turn, the function words can help learners determine the syntactic category of the neighboring words (e.g., learning that the word "the" [ðə] introduces a noun phrase, and that suffixes such as "-ed" require a verb to precede them). For example, in the sentence "The turtle is eating a pigeon", function words such as "the" and the auxiliary verb "is" give children a better sense of where prosodic boundaries fall, resulting in a division such as [The turtle][is eating][a pigeon], where brackets indicate a boundary. As a result, infants tend to look out for these words to better identify the beginnings and ends of the prosodic units. Articles such as "the" or "a" in English, for example, introduce a noun phrase, so a verb cannot directly follow them; one would never hear a sentence such as "The *destroy was widespread". Likewise, verb morphemes (e.g. past tense "-ed" [d]/[t], continuous "-ing" [iŋ], auxiliary "is" [ɪz]) indicate that the adjacent word is a verb and that no other category can fill that position (e.g. "I saw that he *happyed yesterday").

A study of preschool children by Carvalho et al. (2016) showed that by the age of 4, children use phrasal prosody in real time to determine the syntactic structure of a sentence. The children in the experiments interpreted the target word as a noun when it occurred in a sentence with a prosodic structure typical for a noun, and as a verb when it occurred in a sentence with a prosodic structure typical for a verb.

Stress
Rhythm is an important aspect of prosody in terms of syllable timing and emphasis, and varies from language to language. Languages are grouped into different categories based on their rhythm, primarily stress-timed, syllable-timed, and mora-timed categories. Infants around 6 months of age have been shown to be able to differentiate between languages solely on the basis of these particular stress differences. More specifically, infants by 2 months of age can form vague categories of rhythmic structures: those that belong to the native class, and those that are nonnative. Before reaching 2 months, infants can distinguish between languages of any class, but by the age of 2 months they can only put languages in the native or nonnative class. For example, English-learning infants will have a hard time differentiating between English and Dutch (since both are stress-timed languages), but will be able to distinguish Russian (a stress-timed language) and Japanese (a mora-timed language). By 2 months, however, an English-learning baby will group syllable-timed and mora-timed languages into one "nonnative" group, and thus will have a hard time differentiating languages such as French (syllable-timed) and Japanese (mora-timed). This variation in stress is also a useful tool for bilingual infants, and acts as a strong indicator when differentiating between the languages being learned.

Detecting head direction
The question of whether the head direction parameter can be detected using prosodic cues has been tested with French babies listening to Turkish sentences, in order to determine whether 6- to 12-week-old babies are sensitive to prosodic prominence in speech. Setting the head direction parameter allows infants to acquire a hierarchical branching structure for a particular language, which determines whether the language is left-headed (right-branching) or right-headed (left-branching). In this experiment (Christophe et al. 2003), 6- to 12-week-old babies listened to modified "nonsense" sentences (the modified French and modified Turkish sentences in the table below) that were neither French nor Turkish, and that differed only in that the Turkish-based sentences were head-final and the French-based sentences were head-initial. The reasoning behind this is that infants might be able to detect prominence within these phonological phrases, as prominence has been shown to follow a systematic pattern across languages: head-initial languages have prominence on the right (French), while head-final languages have prominence on the left (Turkish).

These nonsense sentences were created in order to eliminate any non-prosodic interference (e.g. phonological differences, different numbers of syllables, etc.), so that babies would only be able to differentiate between the two languages based on the prominence of prosodic cues in the sentences.

Jusczyk et al. (1992) tested 9-month-olds and showed that infants are sensitive to acoustic correlates of major phrasal units that are present in the prosody of English sentences. The prosodic markers in the input are a longer duration of the syllable that precedes a major phrasal boundary and a declination in fundamental frequency.

Computational modeling
Several language models have been used to show that in a computational simulation, prosody can help children acquire syntax.

In one study, Gutman et al. (2015) built a computational model that used prosodic structure and function words to jointly determine the syntactic categories of words. The model successfully assigned syntactic labels to prosodic phrases, using phrasal prosody to determine the boundaries of phrases and the function words at their edges for classification. The study thus presented a model of how early syntax acquisition is possible with the help of prosody: children access phrasal prosody and pay attention to words placed at the edges of prosodic boundaries. The idea behind the computational implementation is that prosodic boundaries signal syntactic boundaries and that function words are used to label the prosodic phrases. As an example, the sentence "She's eating a cherry" has a prosodic structure such as [She's eating] [a cherry], where the skeleton of the syntactic structure is [VN NP] (VN stands for verbal nucleus, a phrase that contains a verb and adjacent words such as auxiliaries and subject pronouns). Here, children may utilize their knowledge of function words and prosodic boundaries in order to create an approximation of syntactic structure.
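
The general idea of labeling prosodic phrases by the function words at their edges can be sketched as follows. This is a toy rule-based illustration, not the model published by Gutman et al.: the lookup table `EDGE_LABELS`, the function name `label_phrases`, and the fallback label `"UNK"` are all hypothetical, and the prosodic phrases are assumed to be given in advance.

```python
from typing import Dict, List

# Hypothetical function-word lexicon: maps a phrase-initial word to a phrase label.
EDGE_LABELS: Dict[str, str] = {
    "a": "NP", "the": "NP",                   # articles introduce noun phrases
    "she's": "VN", "he's": "VN", "is": "VN",  # pronoun+auxiliary or auxiliary
}


def label_phrases(phrases: List[List[str]]) -> List[str]:
    """Label each prosodic phrase by the function word at its left edge.

    Phrases that are empty or start with an unknown word get the label "UNK".
    """
    labels: List[str] = []
    for phrase in phrases:
        first = phrase[0].lower() if phrase else ""
        labels.append(EDGE_LABELS.get(first, "UNK"))
    return labels


# Prosodic phrasing of "She's eating a cherry", as in the example above.
prosodic_phrases = [["She's", "eating"], ["a", "cherry"]]
print(label_phrases(prosodic_phrases))  # ['VN', 'NP']
```

The sketch recovers the [VN NP] skeleton for this one sentence only because the relevant function words are listed by hand; a learner would have to infer such associations from the input.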

Pate et al. (2011) presented a computational language model showing that acoustic cues can be helpful for determining syntactic structure when they are used together with lexical information. Combining acoustic cues with lexical cues may provide children with initial information about the location of syntactic phrases, which supports the prosodic bootstrapping hypothesis.

Criticism
A key criticism of the bootstrapping theory in general is that these mechanisms (whether they be syntactic, semantic, or prosodic) serve mainly as a starting point for learning the language. That is, the bootstrapping mechanisms are only useful up to a certain point in linguistic development for infants, and thus there might be some other mechanism that might be used later on, since the bootstrapping mechanisms primarily use information that is not controlled for "cross-linguistic variation" (information that varies from language to language).

Regarding prosodic bootstrapping in particular, there is speculation on how accurately prosodic phrases map to syntactic structure. That is, phrases with identical syntactic structure can have different possible prosodic structures. In the sentence "The cat chased the rat that ate the cheese.", the prosodic structure would resemble:

[The cat] [chased the rat] [that ate the cheese]

However, the prosodic unit [chased the rat] in this case is not a syntactic constituent, demonstrating that not every prosodic unit is a syntactic unit. A language may therefore not always provide a one-to-one mapping from prosodic information to linguistic units, and prosody does not give children a direct and systematic mapping from prosodic structure to linguistic structure.

Jusczyk (1997) argued that most people who accept this theory assume that children are drawing on "a range of information available in the speech signal that extends beyond prosody", further explaining that relying on prosodic information alone is not enough to learn the structure of the language.