User:InnocuousPilcrow/sandbox

Statistical learning is the ability of humans and other animals to extract commonalities and transitional probabilities from the world around them in order to learn about their environment. Although statistical learning is now thought to be a generalized learning mechanism, the phenomenon was first identified in human infant language acquisition. The earliest evidence for these statistical learning abilities comes from a paper by Jenny Saffran and colleagues, in which 8-month-old infants were presented with nonsense streams of monotone speech. Each stream was composed of four three-syllable “words” that were repeated in random order. After exposure to the speech streams for two minutes, infants reacted differently to hearing “words” as opposed to “nonwords” from the speech stream, where nonwords were composed of the same syllables that the infants had been exposed to, but in a different order. This suggests that infants are able to extract transitional probabilities of language even with very limited exposure. This method of learning is thought to be one way that children learn which groups of syllables form individual words. Since the initial discovery of the role of statistical learning in lexical acquisition, the same mechanism has been proposed for elements of phonological acquisition and syntactical acquisition, as well as in non-linguistic domains.

Lexical Acquisition
The role of statistical learning in language acquisition has been particularly well documented in the area of lexical acquisition. Although other factors play a role as well, an important contribution to infants’ abilities to learn to segment individual words from a continuous stream of speech is their ability to pick up on statistical regularities of the speech they hear around them. This mechanism is powerful and can operate over a short time scale.

Original Findings


Unlike written language, spoken language does not have any clear boundaries between words; spoken language is a continuous stream of sound rather than individual words with blanks between them. This lack of segmentation between linguistic units presents a problem for young children learning language, who must be able to pick out individual units from the continuous speech streams that they hear. One proposed solution is that children are attentive to the statistical regularities of the world around them. For example, in the phrase "pretty baby," children are more likely to hear the sounds pre and ty together across the lexical input around them than they are to hear the sounds ty and ba together. In an artificial grammar learning study with adult participants, Saffran, Newport, and Aslin found that participants were able to locate word boundaries based only on transitional probabilities, suggesting that adults are capable of using statistical regularities in a language-learning task.

To determine if young children have these same abilities, Saffran and her colleagues exposed 8-month-old infants to an artificial grammar. The grammar consisted of four words, each composed of three nonsense syllables. During the experiment, infants heard a continuous speech stream of these words for two minutes. Importantly, the speech was presented in a monotone with no cues (such as pauses or intonation) to word boundaries other than the statistical probabilities. Within a word, the transitional probability between adjacent syllables was 1.0: in the word bidaku, for example, the probability of hearing the syllable da immediately after the syllable bi was 100%. Between words, however, the transitional probability of hearing a syllable pair was much lower: after any given word (e.g., bidaku) was presented, one of three words could follow (in this case, padoti, golabu, or tupiro), so the likelihood of hearing any given syllable after ku was only 33%.
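The transitional probabilities described above can be illustrated with a small simulation. This is only a minimal sketch: it assumes that words follow one another at random with no immediate repetition (consistent with the statement that one of three words could follow any given word), and the function names, stream length, and random seed are illustrative choices rather than details of the original study.

```python
import random
from collections import Counter

WORDS = ["bidaku", "padoti", "golabu", "tupiro"]  # the four words named in the text

def syllables(word):
    # Each nonsense word is made of three two-letter syllables.
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def build_stream(n_words, seed=0):
    # Assumption: words are chosen at random, never repeating immediately.
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(syllables(word))
        previous = word
    return stream

def transitional_probabilities(stream):
    # TP(y | x) = count(x followed by y) / count(x followed by anything)
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}

stream = build_stream(3000)
tp = transitional_probabilities(stream)
print(tp[("bi", "da")])  # within a word: 1.0
print(tp[("ku", "pa")])  # across a word boundary: close to one third
```

In this sketch the dips in transitional probability fall exactly at the word boundaries, which is the only segmentation cue available in the stream.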

To determine if infants were picking up on the statistical information, each infant was presented with multiple presentations of either a word from the artificial grammar or a nonword made up of the same syllables but presented in a random order. Infants who were presented with nonwords during the test phase listened significantly longer to them than infants who were presented with words from the artificial grammar, showing a novelty preference for these new nonwords. However, this result could also be due to infants learning serial-order information rather than transitional probabilities between syllables. That is, at test, infants heard strings such as dapiku and tilado that were never presented during learning; they could simply have learned that the syllable ku never followed the syllable pi.

To look more closely at this issue, Saffran and colleagues conducted another study in which infants underwent the same training with the artificial grammar but then were presented with either words or part-words rather than words or nonwords. The part-words were syllable sequences composed of the last syllable from one word and the first two syllables from another (such as kupado). Because the part-words had been heard during the time when children were listening to the artificial grammar, preferential listening to these part-words would indicate that children were learning not only serial-order information, but also the statistical likelihood of hearing particular syllable sequences. Again, infants showed greater listening times to the novel (part-) words, indicating that 8-month-old infants were able to extract these statistical regularities from a continuous speech stream.

Further Research
This result has been the impetus for much more research on the role of statistical learning in lexical acquisition and other areas. In a follow-up to the original report, Aslin, Saffran, and Newport found that even when words and part-words occurred equally often in the speech stream, but with different transitional probabilities between syllables of words and part-words, infants were still able to detect the statistical regularities and still preferred to listen to the novel part-words over the familiarized words. As in the original study, infants were trained on continuous streams of four tri-syllabic nonsense words. However, two of the words were repeated with high frequency (90 times over the course of the training) while two were repeated with low frequency (45 times over the course of the training). Because of the presence of high-frequency and low-frequency words, some part-words occurred more often in the training stimuli than others. At test, infants were exposed to the low-frequency words and the high-frequency part-words. Both the words and the part-words had occurred 45 times during training, but the transitional probabilities between the first two syllables of each of the test stimuli had been 100% for words and 50% for part-words. For example, during the training, infants were exposed to the words pabiku, tibudo, golatu, and daropi. Infants heard the word pabiku as often as they heard the part-word tudaro, but the syllable pa was always followed by the syllable bi, while the syllable tu was followed by the syllable da only 50% of the time.
Again, at test, infants listened longer to the part-words than to the words, demonstrating that, although the individual syllable sequences had been heard an equal number of times, infants were discriminating between the words and part-words based on the transitional probabilities. This finding provides stronger evidence that infants are able to pick up transitional probabilities from the speech they hear, rather than just being aware of frequencies of individual syllable sequences.
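The dissociation between frequency and transitional probability in this design can be checked with simple arithmetic. The sketch below uses the counts given above; which two words were the high-frequency ones is inferred here from the pabiku/tudaro example rather than stated in the text, and the helper name is purely illustrative.

```python
# Counts follow the description above; treating golatu and daropi as the
# high-frequency words is an inference, not something the text states.
word_counts = {"pabiku": 45, "tibudo": 45, "golatu": 90, "daropi": 90}
partword_counts = {"tudaro": 45}  # "tu" (end of golatu) + "daro" (start of daropi)

def first_transition_probability(sequence_count, first_syllable_count):
    # TP of the first syllable pair = count(sequence) / count(first syllable)
    return sequence_count / first_syllable_count

# "pa" occurs only in pabiku, so it is heard 45 times, always followed by "bi".
tp_word = first_transition_probability(word_counts["pabiku"], word_counts["pabiku"])
# "tu" occurs only in golatu (90 times) and is followed by "da" 45 times.
tp_partword = first_transition_probability(partword_counts["tudaro"], word_counts["golatu"])

print(word_counts["pabiku"] == partword_counts["tudaro"])  # equal frequency: True
print(tp_word, tp_partword)  # 1.0 0.5
```

Because the two test items are matched on raw frequency, only the difference in transitional probability remains as a basis for telling them apart.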

Another follow-up study examined the extent to which the statistical information learned during this type of artificial grammar learning feeds into knowledge that infants may already have about their native language. Saffran exposed infants living in an English-language environment to the same type of artificial grammar described above. However, rather than testing infants on words, nonwords, or part-words in isolation, either words or part-words were placed into either English-language frames or nonsense frames. For example, in the English frame condition, words or part-words were heard after the phrase “I like my…” whereas in the nonsense frame condition, words or part-words were heard after the phrase “Zy fike ny…” In the English frame condition, infants preferred to listen to words over part-words, whereas there was no significant difference in the nonsense frame condition. This finding suggests that even pre-linguistic infants are able to integrate the statistical cues they learn in a laboratory into their previously acquired knowledge of a language. In other words, once infants have acquired some linguistic knowledge, they incorporate newly acquired information into that earlier learning.

A related finding indicates that slightly older infants can acquire both lexical and grammatical regularities from a single set of input, suggesting that they are able to use outputs of one type of statistical learning (cues that lead to the discovery of word boundaries) as input to a second type (cues that lead to the discovery of syntactical regularities). In two studies, Saffran and Wilson exposed infants to artificial grammars composed of five-word sentences. Each word in the grammar was a two-syllable word; there was an option of two words for each word position in the sentence (e.g., the first word could be either dato or kuga, the second word could be either pidu or gobi, etc.). Additionally, each sentence could optionally begin with the word la; this prevented infants from relying on the absolute syllable position (e.g., learning that the first syllable of a sentence was either da or ku, that the third syllable was always pi or go, etc.). Ungrammatical sentences were created by switching permissible words in the second and fourth word positions in the sentence. At test, 12-month-olds preferred to listen to sentences that had the same grammatical structure as the artificial language they had been trained on rather than sentences that had a different (ungrammatical) structure. Because learning grammatical regularities requires infants to be able to determine boundaries between individual words, this indicates that infants who are still quite young are able to acquire multiple levels of language knowledge (both lexical and syntactical) simultaneously, and that statistical learning is a powerful mechanism at play in language learning.

Despite the large role that statistical learning appears to play in lexical acquisition, it is likely not the only mechanism by which infants learn to segment words. Statistical learning studies are generally conducted with artificial grammars that have no cues to word boundary information other than transitional probabilities between words. Real speech, though, has many different types of cues to word boundaries, including prosodic and phonotactic information. Mattys et al. found that infants use this type of information to determine word boundaries in conjunction with statistical information, indicating that infants are sensitive to many different types of cues in the world.

Phonological Acquisition
There is evidence that statistical learning is an important component of both discovering which phonemes are important for a given language and which contrasts within phonemes are important. Having this knowledge is important for aspects of both speech perception and speech production.

Distributional Learning
Since the discovery of infants’ statistical learning abilities in word learning, the same general mechanism has also been studied in other facets of language learning. For example, it is well-established that infants can discriminate between phonemes of many different languages but eventually become unable to discriminate between phonemes that do not appear in their native language; however, it was not clear how this decrease in discriminatory ability came about. Maye et al. suggested that the mechanism responsible might be a statistical learning mechanism in which infants track the distributional regularities of the sounds in their native language. To test this idea, Maye et al. exposed 6- and 8-month-old infants to a continuum of speech sounds that varied on the degree to which they were voiced. The distribution that the infants heard was either bimodal, with sounds from both ends of the voicing continuum heard most often, or unimodal, with sounds from the middle of the distribution heard most often. The results indicated that infants from both age groups were sensitive to the distribution of phonemes. At test, infants heard either non-alternating (repeated exemplars of tokens 3 or 6 from an 8-token continuum) or alternating (exemplars of tokens 1 and 8) exposures to specific phonemes on the continuum. Infants exposed to the bimodal distribution listened longer to the alternating trials than the non-alternating trials, while there was no difference in listening times for infants exposed to the unimodal distribution. This finding indicates that infants exposed to the bimodal distribution were better able to discriminate sounds from the two ends of the distribution than were infants in the unimodal condition, regardless of age. This type of statistical learning differs from that used in lexical acquisition, as it requires infants to track frequencies rather than transitional probabilities, and has been named “distributional learning.”
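The contrast between the two exposure conditions can be sketched as two frequency tables over the 8-token continuum. The presentation counts below are hypothetical values chosen only to give each distribution the shape described above, and the peak-counting function is an illustrative stand-in for whatever mechanism infants actually use, not a model proposed in the study.

```python
# Hypothetical presentation counts for tokens 1-8 of a voicing continuum.
# The exact numbers are illustrative and are not taken from the study.
bimodal = [2, 8, 4, 1, 1, 4, 8, 2]    # peaks near both ends
unimodal = [1, 2, 4, 8, 8, 4, 2, 1]   # single peak in the middle

def count_modes(counts):
    # Count peaks, treating a run of equal neighbouring values as one plateau.
    modes = 0
    i = 0
    n = len(counts)
    while i < n:
        j = i
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1
        left_lower = i == 0 or counts[i - 1] < counts[i]
        right_lower = j == n - 1 or counts[j + 1] < counts[i]
        if left_lower and right_lower:
            modes += 1
        i = j + 1
    return modes

print(sum(bimodal) == sum(unimodal))                # same total exposure: True
print(count_modes(bimodal), count_modes(unimodal))  # 2 1
```

A learner that tracks only how often each token occurs could, under these assumptions, infer two sound categories from the bimodal input and one from the unimodal input, even though total exposure is identical.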

Distributional learning has also been found to help infants contrast two phonemes that they initially have difficulty discriminating. Maye, Weiss, and Aslin found that infants who were exposed to a bimodal distribution of a non-native contrast that was initially difficult to discriminate were better able to discriminate the contrast than infants exposed to a unimodal distribution of the same contrast. After exposure to either the bimodal or unimodal distribution, infants were habituated on an exemplar from token 6 of an 8-token continuum and then exposed to an exemplar from token 3; the listening times for infants in the bimodal condition increased from the immediately preceding habituation trials upon presentation of this new exemplar, while the listening times for infants in the unimodal condition continued to decrease, indicating that infants in the bimodal condition were better able to discriminate the two exemplars than those in the unimodal condition. Maye et al. also found that infants were able to abstract a feature of a contrast (i.e., voice onset time) and generalize that feature to the same type of contrast at a different place of articulation, a result that has not been found in adults.

In a review of the role of distributional learning on phonological acquisition, Werker et al. note that distributional learning cannot be the only mechanism by which phonetic categories are acquired. However, it does seem clear that this type of statistical learning mechanism can play a role in this skill, although research is ongoing.

Perceptual Magnet Effect
A related finding regarding statistical cues to phonological acquisition is a phenomenon known as the perceptual magnet effect. In this effect, a prototypical phoneme of a person’s native language acts as a “magnet” for similar phonemes, which are perceived as belonging to the same category as the prototypical phoneme. In the original test of this effect, adult participants were asked to indicate if a given exemplar of a particular phoneme differed from a referent phoneme. If the referent phoneme is a non-prototypical phoneme for that language, both adults and 6-month-old infants show less generalization to other sounds than they do for prototypical phonemes, even if the subjective distance between the sounds is the same. That is, adults and infants are both more likely to notice that a particular phoneme differs from the referent phoneme if that referent phoneme is a non-prototypical exemplar than if it is a prototypical exemplar. The prototypes themselves are apparently discovered through a distributional learning process, in which infants are sensitive to the frequencies with which certain sounds occur and treat those that occur most often as the prototypical phonemes of their language.

Syntactical Acquisition
A statistical learning device has also been proposed as a component of syntactical acquisition for young children. Early evidence for this mechanism came largely from studies of computer modeling or analyses of natural language corpora. These early studies focused largely on distributional information specifically rather than statistical learning mechanisms generally. Specifically, in these early papers it was proposed that children created templates of possible sentence structures involving unnamed categories of word types (i.e., nouns or verbs, although children would not put these labels on their categories). Children were thought to learn which words belonged to the same categories by tracking the similar contexts in which words of the same category appeared. Later studies expanded these results by looking at the actual behavior of children or adults who had been exposed to artificial grammars. These later studies also considered the role of statistical learning more broadly than the earlier studies, placing their results in the context of the statistical learning mechanisms thought to be involved with other aspects of language learning, such as lexical acquisition.

Experimental Results
Evidence from a series of four experiments conducted by Gomez and Gerken suggests that children are able to generalize grammatical structures with less than two minutes of exposure to an artificial grammar. In the first experiment, 11- to 12-month-old infants were trained on an artificial grammar composed of nonsense words with a set grammatical structure. At test, infants heard both novel grammatical and ungrammatical sentences. Infants oriented longer towards the grammatical sentences, in line with previous research that suggests that infants generally orient for a longer amount of time to natural instances of language rather than altered instances of language. (This familiarity preference differs from the novelty preference generally found in word-learning studies, due to the differences between lexical acquisition and syntactical acquisition.) This finding indicates that young children are sensitive to the grammatical structure of language even after minimal exposure. Gomez and Gerken also found that this sensitivity is evident when ungrammatical transitions are located in the middle of the sentence (unlike in the first experiment, in which all the errors occurred at the beginning and end of the sentences), that the results could not be due to an innate preference for the grammatical sentences caused by something other than grammar, and that children are able to generalize the grammatical rules to new vocabulary.

Together these studies suggest that infants are able to extract a substantial amount of syntactic knowledge even from limited exposure to a language. Children apparently detected grammatical anomalies whether the grammatical violation in the test sentences occurred at the end or in the middle of the sentence. Additionally, even when the individual words of the grammar were changed, infants were still able to discriminate between grammatical and ungrammatical strings during the test phase. This generalization indicates that infants were not learning vocabulary-specific grammatical structures, but abstracting the general rules of that grammar and applying those rules to novel vocabulary. Furthermore, in all four experiments, the test of grammatical structures occurred five minutes after the initial exposure to the artificial grammar had ended, suggesting that the infants were able to maintain the grammatical abstractions they had learned even after a short delay.

In a similar study, Saffran found that adults and older children (first and second grade children) were also sensitive to syntactical information after exposure to an artificial language which had no cues to phrase structure other than the statistical regularities that were present. Both adults and children were able to pick out sentences that were ungrammatical at a rate greater than chance, even under an “incidental” exposure condition in which participants’ primary goal was to complete a different task while hearing the language.

Although the number of studies dealing with statistical learning of syntactical information is limited, the available evidence does indicate that the statistical learning mechanisms are likely a contributing factor to children’s ability to learn their language.

Statistical Learning in Bilingualism
Much of the early work using statistical learning paradigms focused on the ability of children or adults to learn a single language, consistent with the process of language acquisition for monolingual speakers or learners. However, it is estimated that approximately 60–75% of people in the world are bilingual. More recently, researchers have begun looking at the role of statistical learning for those who speak more than one language. Weiss, Gerfen, and Mitchel examined how hearing input from multiple artificial languages simultaneously can affect the ability to learn either or both languages. Over four experiments, Weiss et al. found that, after exposure to two artificial languages, adult learners are capable of determining word boundaries in both languages when each language is spoken by a different speaker. However, when the two languages were spoken by the same speaker, participants were able to learn both languages only when they were “congruent,” that is, when the word boundaries of one language matched the word boundaries of the other. When the languages were incongruent (a syllable that appeared in the middle of a word in one language appeared at the end of the word in the other language) and spoken by a single speaker, participants were able to learn, at best, one of the two languages. A final experiment showed that the inability to learn incongruent languages spoken in the same voice was not due to syllable overlap between the languages but due to differing word boundaries.

Similar work replicates the finding that learners are able to learn two sets of statistical representations when an additional cue is present (two different male voices in this case). In that paradigm, the two languages were presented consecutively, rather than interleaved as in Weiss et al.’s paradigm. Participants learned the first artificial language to which they had been exposed better than the second, although their performance was above chance for both languages.

Word-Referent Mapping
A statistical learning mechanism has also been proposed for learning the meaning of words. Specifically, Yu and Smith conducted a pair of studies in which adults were exposed to pictures of objects and heard nonsense words. Each nonsense word was paired with a particular object. There were 18 total word-referent pairs, and each participant was presented with either 2, 3, or 4 objects at a time, depending on the condition, and heard the nonsense word associated with one of those objects. Each word-referent pair was presented 6 times over the course of the training trials; after the completion of the training trials, participants completed a forced-alternative test in which they were asked to choose the correct referent that matched a nonsense word they were given. Participants were able to choose the correct item more often than would happen by chance, indicating, according to the authors, that they were using statistical learning mechanisms to track co-occurrence probabilities across training trials.

However, further research indicates that learners in this type of task may be using a “propose-but-verify” mechanism rather than a statistical learning mechanism. Medina et al. and Trueswell et al. argue that, because Yu and Smith only tracked knowledge at the end of the training, rather than tracking knowledge on a trial-by-trial basis, it is impossible to know if participants were truly updating statistical probabilities of co-occurrence (and therefore maintaining multiple hypotheses simultaneously), or if, instead, they were forming a single hypothesis and checking it on the next trial. For example, if a participant is presented with a picture of a dog and a picture of a shoe, and hears the nonsense word vash, she might hypothesize that vash refers to the dog. On a future trial, she may see a picture of a shoe and a picture of a door and again hear the word vash. If statistical learning is the mechanism by which word-referent mappings are learned, then the participant would be more likely to select the picture of the shoe than the door, as the shoe would have appeared in conjunction with the word vash 100% of the time. However, if participants are simply forming a single hypothesis, they may fail to remember the context of the previous presentation of vash (especially if, as in the experimental conditions, there are multiple trials with other words in between the two presentations of vash) and therefore be at chance in this second trial. According to this proposed mechanism of word learning, if the participant had correctly guessed that vash referred to the shoe in the first trial, her hypothesis would be confirmed in the subsequent trial.

To distinguish between these two possibilities, Trueswell et al. conducted a series of experiments similar to those conducted by Yu and Smith except that participants were asked to indicate their choice of the word-referent mapping on each trial. Participants would therefore have been at chance in their choices in their first trial. The results from the subsequent trials indicate that participants were not, as Yu and Smith had suggested, using a statistical learning mechanism, but instead were using a propose-but-verify mechanism, holding only one potential hypothesis in mind at a time. Specifically, if participants had chosen an incorrect word-referent mapping in an initial presentation of a nonsense word (from a display of five possible choices), their likelihood of choosing the correct word-referent mapping in the next trial of that word was still at chance, or 20%. If, though, the participant had chosen the correct word-referent mapping on an initial presentation of a nonsense word, the likelihood of choosing the correct word-referent mapping on the subsequent presentation of that word was approximately 50%. These results were also replicated in a condition where participants were choosing between only two alternatives. These results suggest that participants did not remember the surrounding context of individual presentations and were therefore not using statistical cues to determine the word-referent mappings. Instead, participants form a hypothesis regarding a word-referent mapping and, on the next presentation of that word, either confirm or reject the hypothesis accordingly.
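The different predictions of the two accounts can be made concrete with the dog, shoe, and door example used earlier. The sketch below is an idealized illustration only: both learner functions are hypothetical simplifications written for this article, they ignore memory failure and intervening trials, and they are not the models tested by Yu and Smith or by Trueswell et al.

```python
from collections import Counter

# Two exposures to the nonsense word "vash", as in the example above.
trials = [("vash", ["dog", "shoe"]), ("vash", ["shoe", "door"])]

def cross_situational_choice(trials):
    # Statistical learner: count every word-object co-occurrence seen before
    # the final trial, then keep the best-supported object(s) on that trial.
    *earlier, (word, final_objects) = trials
    counts = Counter()
    for seen_word, objects in earlier:
        if seen_word == word:
            counts.update(objects)
    best = max(counts[obj] for obj in final_objects)
    return [obj for obj in final_objects if counts[obj] == best]

def propose_but_verify_choice(trials, first_guess):
    # Single-hypothesis learner: keep one guess; if it is absent from the
    # final display, nothing else is remembered and every object is a candidate.
    _, final_objects = trials[-1]
    if first_guess in final_objects:
        return [first_guess]
    return list(final_objects)

print(cross_situational_choice(trials))           # ['shoe']
print(propose_but_verify_choice(trials, "dog"))   # ['shoe', 'door']
print(propose_but_verify_choice(trials, "shoe"))  # ['shoe']
```

Returning more than one candidate corresponds to chance performance, which is the pattern reported after an incorrect first guess; the sketch does not attempt to reproduce the roughly 50% accuracy reported after a correct first guess.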

Overall, these results, along with similar results from Medina et al., indicate that word meanings are likely not learned through a statistical learning mechanism. The authors of these papers do emphasize, though, that statistical learning is a likely mechanism for other aspects of language learning, including word boundaries, phonology, and syntax, as discussed above.

Need for Social Interaction
Additionally, statistical learning by itself cannot account even for those aspects of language acquisition for which it has been shown to play a large role. For example, Kuhl, Tsao, and Liu found that young English-learning infants who spent time in a laboratory session with a native Mandarin speaker were able to distinguish between phonemes that occur in Mandarin but not in English, unlike infants who were in a control condition. Infants in this control condition came to the lab as often as infants in the experimental condition, but were exposed only to English; when tested at a later date, they were unable to distinguish the Mandarin phonemes. In a second experiment, the authors presented infants with audio or audiovisual recordings of Mandarin speakers and tested the infants’ ability to distinguish between the Mandarin phonemes. In this condition, infants failed to distinguish the foreign language phonemes. This finding indicates that social interaction is a necessary component of language learning and that, even if infants are presented with the raw data of hearing a language, they are unable to take advantage of the statistical cues present in that data if they are not also experiencing the social interaction.

Domain Generality
Although the phenomenon of statistical learning was first discovered in the context of language acquisition, and there is much evidence of its role in that process, work since the original discovery has suggested that statistical learning may be a domain-general skill and is likely not unique to humans. For example, Saffran, Johnson, Aslin, and Newport found that both adults and infants were able to learn statistical probabilities of “words” created by playing different musical tones (e.g., participants heard the musical notes D, E, and F presented together during training and were able to recognize those notes as a unit at test as compared to three notes that had not been presented together). In non-auditory domains, there is evidence that humans are able to learn statistical visual information whether that information is presented across space or across time. Evidence of statistical learning has also been found in other primates, and some limited statistical learning abilities have been found even in non-primates like rats. Together these findings suggest that statistical learning may be a generalized learning mechanism that happens to be utilized in language acquisition, rather than a mechanism that is unique to the human infant’s ability to learn his or her language(s).