Bootstrapping (linguistics)

Bootstrapping is a term used in language acquisition in the field of linguistics. It refers to the idea that humans are born innately equipped with a mental faculty that forms the basis of language. It is this language faculty that allows children to effortlessly acquire language. As a process, bootstrapping can be divided into different domains, according to whether it involves semantic bootstrapping, syntactic bootstrapping, prosodic bootstrapping, or pragmatic bootstrapping.

Etymology
In literal terms, a bootstrap is the small strap on a boot that is used to help pull on the entire boot. Similarly, in computer science, booting refers to the startup of an operating system by means of first initiating a smaller program. Therefore, bootstrapping refers to the leveraging of a small action into a more powerful and significant operation.

Bootstrapping in linguistics was first introduced by Steven Pinker as a metaphor for the idea that children are innately equipped with mental processes that help initiate language acquisition. Bootstrapping attempts to identify the language learning processes that enable children to learn about the structure of the target language.

Connectionism
Bootstrapping has a strong link to connectionist theories, which model human cognition as a system of simple, interconnected networks. In this respect, connectionist approaches view human cognition as a computational algorithm. On this view, humans have statistical learning capabilities that allow them to solve problems. Proponents of statistical learning believe that it is the basis for higher-level learning, and that humans use statistical information to build a database which allows them to learn higher-order generalizations and concepts.

For a child acquiring language, the challenge is to parse out discrete segments from a continuous speech stream. Research demonstrates that, when exposed to streams of nonsense speech, children use statistical learning to determine word boundaries. In every human language, there are certain sounds that are more likely to occur with each other: for example, in English, the sequence [st] is attested word-initially (stop), but the sequence *[gb] occurs only across a syllable break.

It appears that children can detect the statistical probability of certain sounds occurring with one another, and use this to parse out word boundaries. Utilizing these statistical abilities, children appear to be able to form mental representations, or neural networks, of relevant pieces of information. Pieces of relevant information include word classes, which in connectionist theory, are seen as each having an internal representation and transitional links between concepts. Neighbouring words provide concepts and links for children to bootstrap new representations on the basis of their previous knowledge.
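The segmentation idea above can be sketched computationally. The following is a toy illustration, not a model from the literature: the syllable stream, the nonsense "words", and the threshold are all invented, and the procedure simplifies statistical-learning proposals to a single rule, positing a word boundary wherever the transitional probability between adjacent syllables dips.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for each adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

def segment(syllables, threshold=0.7):
    """Posit a word boundary wherever the transitional probability dips
    below the threshold -- a low TP suggests a word edge."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# A stream built from the invented nonsense words bidaku, golatu, padoti:
stream = ("bi da ku go la tu pa do ti bi da ku pa do ti "
          "go la tu bi da ku go la tu pa do ti").split()
print(segment(stream))
# -> ['bidaku', 'golatu', 'padoti', 'bidaku', 'padoti',
#     'golatu', 'bidaku', 'golatu', 'padoti']
```

Within-word syllable pairs here always co-occur (TP = 1.0), while word-final syllables are followed by several different word-initial syllables (TP ≤ 2/3), so every boundary the sketch recovers is a word edge.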

Innateness
The innateness hypothesis was originally proposed by Noam Chomsky to explain the universality of language acquisition. All typically-developing children with adequate exposure to a language will learn to speak and comprehend it fluently. It is also proposed that, despite the apparent variation among languages, they all fall into a very restricted subset of the potential grammars that could be infinitely conceived. Chomsky argued that since all grammars deviate very little from the same general structure and children seamlessly acquire language, humans must have some intrinsic language-learning capability. This intrinsic capability was hypothesized to be embedded in the brain, earning the title of language acquisition device (LAD). According to this hypothesis, the child is equipped with knowledge of grammatical and ungrammatical types, which they then apply to the stream of speech they hear in order to determine the grammar this stream is compatible with. The processes underlying this LAD relate to bootstrapping in that once a child has identified the subset of grammars they are learning, they can then apply their knowledge of grammatical types in order to learn the language-specific aspects of their language. This relates to the Principles and Parameters theory of linguistics, in which languages universally share basic, invariant principles and vary by specific parameters.

Semantic bootstrapping
Semantic bootstrapping is a linguistic theory of language acquisition which proposes that children can acquire the syntax of a language by first learning and recognizing semantic elements and building upon, or bootstrapping from, that knowledge.

According to Pinker, semantic bootstrapping requires two critical assumptions to hold true:
 * 1) A child must be able to perceive meaning from utterances. That is, the child must associate utterances with, for example, objects and actions in the real world.
 * 2) A child must also be able to realize that there are strong correspondences between semantic and syntactic categories. The child can then use the knowledge of these correspondences to create, test, and internalize grammar rules iteratively as the child gains more knowledge of their language.

Acquiring the state/event contrast
When discussing the acquisition of temporal contrasts, the child must first have a concept of time outside of semantics. In other words, the child must have some mental grasp of events, memory, and the general progression of time before attempting to conceive of it semantically. Semantics, especially with regard to events and memory, appears to be largely language-general, with meanings being universal concepts rather than the individual segments used to represent them. For this reason, acquiring semantics depends more on cognition than on external stimuli, and relies heavily on the child's innate capacity for abstraction; the child must first have a mental representation of a concept before attempting to link a word to that meaning. In order to learn time events, several processes must occur:
 * 1) The child must have a grasp on temporal concepts
 * 2) They must learn which concepts are represented in their own language
 * 3) They must learn how their experiences are representative of certain event types that are present in the language
 * 4) They must learn the different morphological and syntactic representations of these events

Using these basic stepping stones, the child is able to map their internal concept of the meaning of time onto explicit linguistic segments. This bootstrapping allows them to have hierarchical, segmental steps, in which they are able to build upon their previous knowledge to aid future learning.

Tomasello argues that in learning linguistic symbols, the child does not need to have explicit external linguistic contrasts, and rather, will learn about these concepts via social context and their surroundings. This can be demonstrated with semantic bootstrapping in that the child does not explicitly receive information on the semantic meaning of temporal events, but learns to apply their internal knowledge of time to the linguistic segments that they are being exposed to.

Acquiring the count/mass contrast
The mapping of semantic relationships for count nouns follows previous bootstrapping methods. Since the contexts in which children are presented with number quantities usually have visual aids to accompany them, the child has a relatively easy way to map these number concepts. Count nouns are nouns which are viewed as denoting discrete entities or individuals. For nouns which denote discrete entities, granted that the child already has the mental concepts for BOY and THREE in place, they will see the set of animate, young, human males (i.e. boys) and confirm that the set has a cardinality of three.

For mass nouns, which denote non-discrete substances, counting requires a unit word that demonstrates the relationship between the substance and its atoms. However, mass nouns can vary in the sharpness or narrowness with which they refer to an entity. For example, a grain of rice has a much narrower quantity definition than a bag of rice.

"Of" is a word that children are thought to learn the definition of as being something that transforms a substance into a set of atoms. For example, when one says:

The word of is used in (3) to mark the mass noun water is partitioned into gallons. The initial substance now denotes a set. The child again uses visual cues to grasp what this relationship is.
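The role of partitive of can be given a toy computational reading. This is an invented representation, not a proposal from the literature: a function that partitions a continuous quantity into unit-sized portions, yielding a set whose cardinality the child can count. The unit sizes and quantities below are arbitrary.

```python
def of(unit_size, substance_amount):
    """Partition a mass quantity into whole units of a given size,
    turning a non-discrete substance into a countable set."""
    return [unit_size] * int(substance_amount // unit_size)

water = 12.0                 # litres of water: a mass, not yet countable
gallons = of(3.785, water)   # "gallons of water": now a set of atoms
print(len(gallons))          # -> 3: the set has a cardinality
```

The point of the sketch is only that of maps a substance to something set-like, so that number words such as three can then apply to it.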

Syntactic bootstrapping
Syntactic bootstrapping is a theory about how children identify word meanings based on syntactic categories. In other words, knowledge of grammatical structure, including how syntactic categories (adjectives, nouns, verbs, etc.) combine into phrases and constituents to form sentences, "bootstraps" the acquisition of word meaning. The main challenge this theory tackles is the lack of specific information that extralinguistic context provides for mapping word meaning and making inferences. It accounts for this problem by suggesting that children do not need to rely solely on environmental context to understand meaning or have words explained to them. Instead, children infer word meanings from their observations about syntax, and use these observations to comprehend future utterances they hear.

Research on syntactic bootstrapping describes how children acquire lexical and functional categories, challenges to the theory, and cross-linguistic applications.

Prosodic bootstrapping
Even before infants can comprehend word meaning, prosodic details assist them in discovering syntactic boundaries. Prosodic bootstrapping or phonological bootstrapping investigates how prosodic information—which includes stress, rhythm, intonation, pitch, pausing, as well as dialectal features—can assist a child in discovering the grammatical structure of the language that they are acquiring.

In general, prosody introduces features that reflect either attributes of the speaker or the utterance type. Speaker attributes include emotional state, as well as the presence of irony or sarcasm. Utterance-level attributes are used to mark questions, statements and commands, and they can also be used to mark contrast.



Similarly, in sign language, prosody includes facial expression, mouthing, and the rhythm, length and tension of gestures and signs.

In language, words are not only organized into phrases, clauses, and sentences; they are also organized into prosodic envelopes. The idea of a prosodic envelope is that words that go together syntactically also share an intonation pattern. This helps explain how children discover syllable and word boundaries through prosodic cues. Overall, prosodic bootstrapping concerns determining grammatical groupings in a speech stream rather than learning word meaning.

One of the key components of the prosodic bootstrapping hypothesis is that prosodic cues may aid infants in identifying lexical and syntactical properties. From this, three key elements of prosodic bootstrapping can be proposed:
 * 1) The syntax of language is correlated with acoustic properties.
 * 2) Infants can detect and are sensitive to these acoustic properties.
 * 3) These acoustic properties can be used by infants when processing speech.

There is evidence that acquisition of language-specific prosodic qualities starts even before an infant is born. This is seen in neonate crying patterns, which have qualities similar to the prosody of the language the infant is acquiring, suggesting that the prosodic patterns of the target language are learned in utero. Further evidence of young infants using prosodic cues is their ability to discriminate the acoustic property of pitch change by 1–2 months of age.

Prosodic cues for syntactic structure
Infants and young children receive much of their language input in the form of infant-directed speech (IDS) and child-directed speech (CDS), which are characterized by exaggerated prosody and simplified words and grammatical structure. When interacting with infants and children, adults often raise and widen their pitch and reduce their speech rate. However, these cues vary across cultures and languages.

There are several ways in which infant- and child-directed speech can facilitate language acquisition. Recent studies show that IDS and CDS contain prosodic information that may help infants and children distinguish between paralinguistic vocalizations (e.g. gasps and laughs) and informative speech. In Western cultures, mothers speak to their children using exaggerated intonation and pauses, which offer insight into syntactic groupings such as noun phrases, verb phrases, and prepositional phrases. This means that the linguistic input infants and children receive includes some prosodic bracketing around syntactically relevant chunks.

(1) Look the boy is patting the dog with his hand.
(2) *Look the boy ... is ... patting the ... dog with his ... hand.
(3) Look ... [DP The boy] ... [VP is patting the dog] ... [PP with his hand].

A sentence like (1) will not typically be produced with the pauses indicated in (2), where the pauses "interrupt" syntactic constituents. For example, pausing between the and dog would interrupt the determiner phrase (DP) constituent, as would pausing between his and hand. Most often, pauses are placed so as to group the utterance into chunks that correspond to the beginnings and ends of constituents such as determiner phrases (DPs), verb phrases (VPs), and prepositional phrases (PPs). As a result, sentences like (3), where the pauses correspond to syntactic constituents, are much more natural.
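As a toy illustration of this bracketing idea (the pause marker and the chunking procedure are invented for this sketch, not taken from any cited study), one can treat transcribed pauses as prosodic boundaries and split an utterance into constituent-sized chunks:

```python
def bracket_by_pauses(utterance, pause="..."):
    """Split a transcribed utterance at pause markers, yielding the
    prosodic chunks an infant might bracket the speech stream into."""
    return [chunk.strip() for chunk in utterance.split(pause) if chunk.strip()]

natural = "Look ... the boy ... is patting the dog ... with his hand"
print(bracket_by_pauses(natural))
# -> ['Look', 'the boy', 'is patting the dog', 'with his hand']
```

In a naturally paused utterance like (3), the resulting chunks line up with the DP, VP, and PP constituents; in an unnatural pausing like (2), they would cut across those constituents.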

Moreover, within these phrases are distinct patterns of stress, which help to differentiate individual elements within the phrase, such as a noun from an article. Typically, articles and other free morphemes are unstressed and relatively short in duration in contrast to the pronunciation of nouns. Furthermore, in verb phrases, auxiliary verbs are less stressed than main verbs. This can be seen in (4).

4. They are RUNning.

Prosodic bootstrapping holds that these naturally occurring intonation packages help infants and children bracket linguistic input into syntactic groupings. Currently, there is not enough evidence to suggest that prosodic cues in IDS and CDS facilitate the acquisition of more complex syntax. However, IDS and CDS are richer linguistic inputs for infants and children.

Prosodic cues for clauses and phrases
There is continued research into whether infants use prosodic cues – in particular, pauses – when processing clauses and phrases. Clauses are the largest constituent structures in a sentence and are often produced in isolation in conversation; for example, "Did you walk the dog?". Phrases, in turn, are smaller components of clauses; for example, "the tall man" or "walks his dog". Peter Jusczyk argued that infants use prosody to parse speech into smaller units for analysis. He and colleagues reported that 4.5-month-old infants showed a preference for artificial pauses at clause boundaries in comparison to pauses at other places in a sentence; preferring pauses at clause boundaries illustrates infants' ability to discriminate clauses in a passage. This reveals that while infants do not yet understand word meaning, they are in the process of learning about their native language's grammatical structure. In a separate study, Jusczyk reported that 9-month-old infants preferred passages with pauses occurring between subject noun phrases and verb phrases. These results are further evidence of infant sensitivity to syntactic boundaries. In a follow-up study by LouAnn Gerken et al., researchers compared sentences such as (5) and (6), in which the prosodic boundaries are indicated by parentheses.

5. (Joe) (kissed the dog).
6. (He kissed) (the dog).

In (5), there is a pause before the verb kissed. This is also the location of the boundary between the subject and the verb phrase. Comparably, in (6), which contains a weak pronoun subject, speakers either do not produce a salient prosodic boundary or place the boundary after the verb kissed. When tested, 9-month-old infants showed a preference for pauses located before the verb, as in (5). However, when passages with pronoun subjects were used, as in (6), infants did not show a preference for where the pause occurred. While these results again illustrate that infants are sensitive to prosodic cues in speech, they also provide evidence that infants prefer prosodic boundaries that occur naturally in speech. Although the use of prosody in infant speech processing is generally viewed as assisting speech parsing, it has not yet been established how this speech segmentation enriches the acquisition of syntax.

Criticism
Critics of prosodic bootstrapping have argued that the reliability of prosodic cues has been overestimated and that prosodic boundaries do not always match up with syntactic boundaries. It is argued instead that while prosody does provide infants and children useful clues about a language, it does not explain how children learn to combine clauses, phrases, and sentences, nor word meaning. As a result, a comprehensive account of how children learn language must combine prosodic bootstrapping with other types of bootstrapping as well as more general learning mechanisms.

Pragmatic bootstrapping
Pragmatic bootstrapping refers to how pragmatic cues and their use in social context assist in language acquisition, and more specifically, word learning. Pragmatic cues can be conveyed both verbally and through nonlinguistic means. They include hand gestures, eye movement, a speaker's focus of attention, intentionality, and linguistic context. Similarly, the parsimonious model proposes that a child learns word meaning by relating language input to their immediate environment. An example of pragmatic bootstrapping would be a teacher saying the word "dog" while gesturing to a dog in the presence of a child.

Gaze following

Children are able to associate words with actions or objects by following the gaze of their communication partner. Often, this occurs when an adult labels an action or object while looking at it.


 * Baldwin carried out experiments where 18-month-olds were shown two novel objects, which were then concealed in separate containers. The experimenter would then peek into one of the containers and say, "There's a modi in here", and then remove both objects from the container and give them to the child. When asked for the "modi", the child would hold up the object that the experimenter had been looking at when labelling it. This illustrates how children use eye gaze and labelling to learn the names of novel objects.
 * Tomasello and Akhtar applied a “Show Me Widget” test where a novel and nameless action was performed with a novel and nameless object. The experimenter would perform the action with the object and then pass the object to the child and instruct the child to "widget". The experimenter's behavior before they passed the child the object was manipulated between two conditions:

Action Highlighted Condition: The experimenter would prepare an object that the child would use to perform a specific action by correctly orientating the object. The experimenter would then hold out the object and say, "Widget, Jason! Your turn!".

Object Highlighted Condition: The experimenter would not prepare the object for the child and would simply hold out the object to the child and say, "Widget, Jason! Your turn!".

The results from the experiment illustrated that children in the Action Highlighted Condition associated the novel word with the novel action, whereas the children in the Object Highlighted Condition assumed the novel word referred to the novel object. To understand that the novel word referred to the novel action, children had to learn from the experimenter's nonverbal behavior that they were requesting the action of the object. This illustrates how non-linguistic context influences novel word learning.

Observing adult behavior
Children also look at adults' faces when learning new words, which can lead to a better understanding of what words mean. In everyday speech, mistakes are often made, so why don't children end up learning the wrong words for the intended referents? This may be because children can see whether a word was right or wrong for the intended meaning by observing the adult's facial expressions and behaviour.


 * Tomasello and Barton performed multiple studies to see if infants could understand whether an action was intentional or accidental, and if they could learn and understand a new verb based on emotional cues.

Verb: Plunk ..."I'm going to plunk Big Bird!"

The adult said this sentence without previously explaining what the verb "plunk" would mean. Afterwards, the adult would do one of two things.

Action 1 She then performed the target action intentionally, saying "There!", followed immediately by another action on the same apparatus performed "accidentally", in an awkward fashion saying "Whoops!"

Action 2 Same as Action 1, but with the intentional and accidental actions reversed.

Afterwards, the children were asked to do the same to another apparatus, to see whether they would perform the targeted action.

Verb: Plunk "Can you go plunk Mickey Mouse?"

The results were that the children were able to understand the intended action for the new word they had just heard, and performed that action when asked. By watching the adult's behavior and facial expressions, they were able to understand what the verb "plunk" meant and to distinguish the targeted action from the accidental one.


 * Akhtar, Carpenter and Tomasello conducted a similar experiment that focused on adult behavior and word learning, but this time for nouns. In this experiment, two experimenters and a guardian of the child were inside a room, playing with 3 objects. Each object was played with for an equal amount of time, with equal excitement. Afterwards, one experimenter and the guardian would leave, and the remaining experimenter would present a new toy and play with it with the same excitement and for about the same length of time as the other objects. When the other experimenter and the guardian came back, the experiment proceeded in one of two conditions, labelled the "Language" and "No-Language" conditions: in the Language condition, the returning adults used a term for the new toy, while in the No-Language condition, no such term was used.

Language "Look, I see a gazzer! A gazzer!"

No-Language "Look, I see a toy! A toy!"

Afterwards, the adults would ask the child to bring the new object over. In the Language condition, the child would correctly bring the targeted object. In the No-Language condition, the child would bring an object at random.

This presents two discoveries:

 * 1) The child was aware of which object was new for the adults who had left the room.
 * 2) The child knew that the adult was excited because the object was new, and that this was why the adult used a new term the child had never heard before.

The child was able to understand this based on the emotional behaviors of the adult.