User:Jadewang/Sandbox

Categorical perception (CP) is the perception of different sensory phenomena as being qualitatively, or categorically, different. It is opposed to continuous perception, the perception of different sensory phenomena as being located on a smooth continuum. Moreover, in categorical perception, a physical difference of a given magnitude is much easier to discriminate when it straddles a category boundary than when it falls within a single category. Categorical perception enables combinatorial usage of elements in systems like language and music. Such “particulate” systems are especially efficient because they are combinatorial: they take a set of discrete elements with little inherent meaning (such as tones or phonemes) and combine them to form structures with a great diversity of meanings.

Categorical perception (CP) can be inborn or can be induced by learning. Formerly thought to be peculiar to speech and color perception, CP turns out to be far more general, and may be related to how the neural networks in our brains detect the features that allow us to sort the things in the world into their proper categories, "warping" perceived similarities and differences so as to compress some things into the same category and separate others into different categories.

Categorical Perception in Language
Phonemes are a basic unit in human language, the smallest structural unit that distinguishes meaning. For example, the /t/ sound in the words tip, stand, water, and cat is a phoneme. (In transcription, phonemes are placed between slashes, as here.) Although the pronunciation of the /t/ varies from word to word, the variants are conceived of as being the same sound. Like tones and intervals in music, consonants and vowels are learned sound categories in language. This allows acoustically variable tokens to be transformed into stable mental categories, which are developed through linguistic experience. Because people with different native languages have different linguistic experience, these mental categories are defined differently between them; it is these differences in experience that allow us to study the effects of categorical perception.

A particularly well-studied example of a consonant contrast is the perception of the English phonemes /r/ and /l/ by Japanese speakers, for whom the two sounds function as a single /r/-like phoneme. Multidimensional scaling of similarity judgments on a substantial set of /r/ and /l/ tokens (produced by systematic variations along the continuum, of which changes in the formants are relevant) shows that the perceptual warpings of acoustic space are dramatically different for native English and Japanese speakers. If the sounds do not resemble any categories in the native language, the ability to distinguish between them is preserved in adults, as demonstrated by naïve American infants and adults who could discriminate between Zulu clicks. However, the language-specific framework for linguistically familiar sounds solidifies considerably by the first year of life, before infants become competent speakers of their native language.

Moreover, the division of phonemes into categories takes place at a preconscious level; that is, the subject does not need to pay attention, or even be consciously aware that the stimuli are different, for that difference to show up in an EEG signal. Näätänen et al. developed a technique based on the mismatch negativity (MMN), which detects auditory changes in the event-related potential (ERP, an evoked EEG response) elicited by deviant stimuli in a background of standard stimuli (and is considered to reflect a form of echoic memory). The /ö/-/õ/-/o/ vowel continuum, which is divided into three phonemes in Estonian and two phonemes in Finnish, elicits an MMN response only when the standard and deviant straddle a category boundary in the participants’ native language.
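As an illustration of how an MMN is usually read off the data, the sketch below averages the epochs recorded for standard and deviant stimuli and subtracts the standard average from the deviant average. The function names, the sampling grid and all voltage values are made up for the example and do not come from any of the studies cited here.

```python
def average(epochs):
    """Point-by-point average of equal-length epochs (lists of voltages)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def mismatch_wave(standard_epochs, deviant_epochs):
    """Difference wave: mean deviant response minus mean standard response."""
    std = average(standard_epochs)
    dev = average(deviant_epochs)
    return [d - s for d, s in zip(dev, std)]


# Toy epochs in microvolts, one sample every 50 ms (hypothetical values).
times_ms = [0, 50, 100, 150, 200, 250, 300]
standards = [
    [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0],
]
deviants = [
    [0.0, 0.5, 0.5, -1.5, -1.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, -2.5, -1.0, 0.0, 0.0],
]

diff = mismatch_wave(standards, deviants)
# The MMN shows up as the most negative point of the difference wave.
peak_latency = times_ms[diff.index(min(diff))]
print(diff, peak_latency)
```

In a real recording the averages would be taken over many more trials, and a within-category deviant would be expected to leave the difference wave close to flat.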

The effect of linguistic experience on categorical perception of speech is not merely a matter of contrast and of the precision of frequency and temporal discrimination. The linguistic rules of an individual’s language play a crucial role in that person’s perception of an auditory stimulus. In Japanese, nonnasal coda consonants are not permitted (although they are permitted in French), and Japanese subjects report both /ebzo/ and /ebuzo/ as having three syllables, with a vowel perceived between the 'b' and the 'z' (and do not show MMN between the two stimuli), whereas French subjects both show MMN and perceive the difference between the two.

At this point, it is necessary to clarify that categorical perception is not synonymous with categorical interpretation of learned sound categories. The clearest cases of CP are consonants; only some vowels show CP, which is to say that the others are mapped onto learned categories without being perceived categorically. This is potentially due to contextual variation in their production, and suggests that this type of categorical mapping is a higher-order mental process. And although CP is instrumental for speech perception, it is highly unlikely to be a special mechanism evolved particularly for human speech, since CP for speech contrasts can be observed in animals.

Language-induced categorical perception
Both innate and learned CP are sensorimotor effects: The compression/separation biases are sensorimotor biases, and presumably had sensorimotor origins, whether during the sensorimotor life-history of the organism, in the case of learned CP, or the sensorimotor life-history of the species, in the case of innate CP. The neural net I/O models are also compatible with this fact: Their I/O biases derive from their I/O history. But when we look at our repertoire of categories in a dictionary, it is highly unlikely that many of them had a direct sensorimotor history during our lifetimes, and even less likely in our ancestors' lifetimes. How many of us have seen a unicorn in real life? We have seen pictures of them, but what had those who first drew those pictures seen? And what about categories we cannot draw or see (or taste or touch): What about the most abstract categories, such as goodness and truth?

Some of our categories must originate from another source than direct sensorimotor experience, and here we return to language and the Whorf Hypothesis: Can categories, and their accompanying CP, be acquired through language alone? Again, there are some neural net simulation results suggesting that once a set of category names has been "grounded" through direct sensorimotor experience, they can be combined into Boolean combinations (man = male & human) and into still higher-order combinations (bachelor = unmarried & man) which not only pick out the more abstract, higher-order categories much the way the direct sensorimotor detectors do, but also inherit their CP effects, as well as generating some of their own. Bachelor inherits the compression/separation of unmarried and man, and adds a layer of separation/compression of its own.
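The compositional step in this argument can be sketched in a few lines. The example below assumes three hypothetical "grounded" detectors (male, human, unmarried), each reduced to a simple yes/no test on a made-up feature record, and builds the higher-order categories from them by Boolean conjunction. It illustrates only how the definitions combine; it does not model the compression/separation effects themselves or the neural net simulations mentioned above.

```python
# Hypothetical grounded detectors: each maps a feature record to True/False.
def male(x):
    return x["sex"] == "male"


def human(x):
    return x["species"] == "human"


def unmarried(x):
    return not x["married"]


# Higher-order categories defined purely by combining category names.
def man(x):
    return male(x) and human(x)          # man = male & human


def bachelor(x):
    return unmarried(x) and man(x)       # bachelor = unmarried & man


# Made-up records used only to exercise the definitions.
examples = {
    "alex": {"sex": "male", "species": "human", "married": False},
    "sam": {"sex": "male", "species": "human", "married": True},
    "rex": {"sex": "male", "species": "dog", "married": False},
}

bachelors = [name for name, x in examples.items() if bachelor(x)]
print(bachelors)
```

Nothing about "bachelor" is learned from direct examples here: its extension follows entirely from the detectors it is composed of, which is the sense in which a category could be acquired through language alone.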

These language-induced CP-effects remain to be directly demonstrated in human subjects; so far only learned and innate sensorimotor CP have been demonstrated. The latter shows the Whorfian power of naming and categorization, in warping our perception of the world. That is enough to rehabilitate the Whorf Hypothesis from its apparent failure on color terms (and perhaps also from its apparent failure on Eskimo snow terms), but to show that it is a full-blown language effect, and not merely a vocabulary effect, it will have to be shown that our perception of the world can also be warped, not just by how things are named but by what we are told about them.

Categorical Perception in Music
Categorical perception in music has been shown for pitch intervals, melody contours, and rhythm (duration and duration ratio). Pitch intervals are learned sound categories in music, similar to the way vowels and consonants are learned sound categories in speech. In musicians, CP for musical intervals is as sharp as CP for speech, whereas the phenomenon is conspicuously absent in nonmusicians.

Although CP for musical intervals is absent in nonmusicians, both musicians and nonmusicians are able to detect changes in “melody,” that is, pitch contour violations in a sequence of five tones. The MMN responses that occur to contour violations are larger for culturally familiar intervals, indicating preconscious discrimination even in nonmusicians, and in the absence of contour violations, nonmusicians exhibit an MMN response to changes in interval pattern.

Durations in musical rhythms are also categorically perceived. For instance, there are broad categories of duration in Western musical sequences: short times (200-300 ms), which are perceived in terms of grouping patterns, and long times (450-900 ms), which are perceived as individual entities. Also, music students have been shown to exhibit CP in discrimination tasks for duration ratio.

Classical Haskins View
The early research on categorical perception at Haskins Laboratories took place soon after the first research-oriented speech synthesizer was built. In order to study CP, it first had to be defined. The Haskins View first tackles the "categorical" part of CP with a three-pronged definition.

The Literal: CP of categories by the individual and the environment; it involves the use of categories with or without language, and does not have to be relevant to speech.

The Phenomenal: subjective perception or experience of discontinuity across continua (e.g. phonemes). Provided the listener hears synthetic sounds as speech, they will perceive abrupt changes at the boundaries on the continuum. Since "ideal" categorical perception (that is, where category labels are the only predictors of subject behavior) is not observed, it is reasonable to describe CP as a matter of "degrees."

Categorical perception refers to a mode by which stimuli are responded to, and can only be responded to, in absolute terms. Successive stimuli drawn from a physical continuum are not perceived as forming a continuum, but as members of discrete categories. They are identified absolutely, that is, independently of the context in which they occur. Subjects asked to discriminate between pairs of such “categorical” stimuli are able to discriminate between stimuli drawn from different categories, but not between stimuli drawn from the same category. In other words, discrimination is limited by identification: subjects can only discriminate between stimuli that they identify differently. (Studdert-Kennedy et al., 1970)

Subjective experience alone is insufficient for science, so the empirical arm of the definition is needed.

The Empirical: in order to determine whether CP is observed, a CP experiment must include the following:

•	identification and labeling test: stimuli are presented repeatedly and randomly, and subjects classify them into one category or another.

•	discrimination test (ABX form): measures the accuracy of discrimination for equidistant stimulus pairs as a function of their location on the continuum.

•	phoneme boundary effect: peak of discrimination function is expected to coincide with category boundary.

These experiments then inform whether CP has been observed, using the following criteria for the results:

1. Labeling probabilities change abruptly at the category boundary, which is defined as the point of maximum slope (or the point where responses to the categories are at chance).

2. Discrimination functions are maximal at the boundary. This is expected because stimulus pairs are more easily distinguished when they straddle the boundary.

3. Within-category discrimination is at or close to chance level.

4. Labeling probabilities (1) perfectly predict discrimination functions (2) and (3); that is to say, the peaks in (2) and (3) are in the same place as the category boundary in (1).
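The requirement that labeling predict discrimination can be made concrete with a small calculation. The sketch below assumes a two-category continuum and uses the prediction rule commonly attributed to the early Haskins work: if listeners can tell two stimuli apart only when they label them differently, the expected proportion correct in ABX is 0.5 + 0.5 (p1 - p2)^2, where p1 and p2 are the probabilities of giving each stimulus the same label. The logistic identification function, the 7-step continuum and the parameter values are illustrative assumptions, not data from any study.

```python
import math


def p_label_a(step, boundary=4.0, slope=2.0):
    """Hypothetical identification function: P(stimulus is labeled 'A')."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def predicted_abx(p1, p2):
    """Predicted ABX proportion correct if discrimination relies on labels only."""
    return 0.5 + 0.5 * (p1 - p2) ** 2


steps = range(1, 8)                       # assumed 7-step continuum
pairs = [(a, a + 2) for a in steps if a + 2 <= 7]   # equidistant two-step pairs

predicted = {
    pair: predicted_abx(p_label_a(pair[0]), p_label_a(pair[1]))
    for pair in pairs
}

# The pair with the highest predicted accuracy should straddle the boundary.
peak_pair = max(predicted, key=predicted.get)

for pair in pairs:
    print(pair, round(predicted[pair], 3))
print("predicted discrimination peak:", peak_pair)
```

With these assumed parameters the predicted peak falls on the pair centered on the labeling boundary, and pairs lying within one category stay near the 0.5 chance level, which is the pattern the criteria describe. Observed discrimination typically exceeds this prediction somewhat, which is one reason CP is described in terms of degrees.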

The modern definition of categorical perception
This evolved into the contemporary definition of CP, which is no longer peculiar to speech or dependent on the motor theory: CP occurs whenever perceived within-category differences are compressed and/or between-category differences are separated, relative to some baseline of comparison. The baseline might be the actual size of the physical differences involved, or, in the case of learned CP, it might be the perceived similarity or discriminability within and between categories before the categories were learned, compared to after.

The typical learned CP experiment would be the following: A set of stimuli is tested (usually in pairs) for similarity or discriminability. In the case of similarity, multidimensional scaling might be used to scale the rated pairwise similarity of the set of stimuli. In the case of discriminability, same/different judgments and signal detection analysis might be used to estimate the pairwise discriminability of a set of stimuli. Then the same subjects or a different set are trained, using trial and error and corrective feedback, to sort the stimuli into two or more categories. After the categorization has been learned, similarity or discriminability is tested again and compared against the untrained data. If there is significant within-category compression and/or between-category separation, this is operationally defined as CP.
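The before/after comparison at the heart of this design can be sketched as follows. The example assumes pairwise dissimilarity ratings for four stimuli collected before and after category training, and reports the mean change for within-category and between-category pairs. The function name cp_effect, the stimulus labels and every rating are hypothetical, and the significance test that the operational definition calls for is left out.

```python
from itertools import combinations
from statistics import mean


def cp_effect(before, after, category):
    """Mean change in pairwise dissimilarity for within- and between-category pairs.

    before, after: dicts mapping frozenset({i, j}) to a dissimilarity rating.
    category: dict mapping each stimulus to the category it was trained into.
    """
    within, between = [], []
    for i, j in combinations(sorted(category), 2):
        pair = frozenset((i, j))
        change = after[pair] - before[pair]
        if category[i] == category[j]:
            within.append(change)
        else:
            between.append(change)
    return mean(within), mean(between)


# Four stimuli on a continuum; 1 and 2 are trained as "A", 3 and 4 as "B".
category = {1: "A", 2: "A", 3: "B", 4: "B"}

# Made-up ratings: before training, dissimilarity equals physical distance.
before = {frozenset((i, j)): float(abs(i - j)) for i, j in combinations(category, 2)}
after = {
    frozenset((1, 2)): 0.6, frozenset((3, 4)): 0.6,   # within-category pairs
    frozenset((2, 3)): 1.8, frozenset((1, 3)): 2.5,   # between-category pairs
    frozenset((2, 4)): 2.5, frozenset((1, 4)): 3.4,
}

within_change, between_change = cp_effect(before, after, category)
print(within_change, between_change)
```

A negative within-category change indicates compression and a positive between-category change indicates separation; in an actual experiment either would have to be tested for significance across subjects before being counted as CP.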

The Whorf Hypothesis
We can now return both to the "Whorf Hypothesis" and the "weaker" CP for vowels: According to the Sapir-Whorf Hypothesis (of which Lawrence's acquired similarity/distinctiveness effects would simply be a special case), colors are perceived categorically only because they happen to be named categorically: Our subdivisions of the spectrum are arbitrary, learned, and vary across cultures and languages. But Berlin & Kay (1969) showed that this was not so: Not only do most cultures and languages subdivide and name the color spectrum the same way, but even for those who don't, the regions of compression and separation are the same. We all see blues as more alike and greens as more alike, with a fuzzy boundary in between, whether or not we have named the difference. So there is no Whorfian learning effect with colors: Or is there?

Evolved CP
First, back to vowels. The signature of CP is within-category compression and/or between-category separation. The size of the CP effect is merely a scaling factor; it is this compression/separation "accordion effect" that is CP's distinctive feature. In this respect, the "weaker" CP effect for vowels, whose motor production is continuous rather than categorical, but whose perception is by this criterion categorical, is every bit as much of a CP effect as the ba/pa and ba/da effects. But, as with colors, it looks as if the effect is an innate one: Our sensory category detectors for both color and speech sounds are born already "biased" by evolution: Our perceived color and speech-sound spectrum is already "warped" with these compression/separations.

Learned CP
Is that all there is to it? Apparently not. There are still the Lane/Lawrence demonstrations, lately replicated and extended by Goldstone (1994), that CP can be induced by learning alone. And there are also the countless categories cataloged in our dictionaries that could not possibly be inborn (though nativist theorists such as Fodor [1983] have sometimes seemed to suggest that all of our categories are inborn). There are even recent demonstrations that although the primary color and speech categories are probably inborn, their boundaries can be modified or even lost as a result of learning, and weaker secondary boundaries can be generated by learning alone.

Perhaps CP performs some useful function in categorization? In the case of innate CP, our categorically biased sensory detectors pick out their prepared color and speech-sound categories far more readily and reliably than if our perception had been continuous. Could something similar be the case for our repertoire of learned categories too?

Biological Foundations
Although CP plays an important role in music and is instrumental for speech perception, it is highly unlikely to be a special mechanism evolved particularly for human speech, since CP for speech contrasts can be observed in animals. (Kuhl & Miller, 1975) Moreover, categories can be learned by animals; a primate can learn to categorize computer-generated 3D stimuli as a ‘cat’ or a ‘dog’. [14]

Development
Developmentally, phonemic discrimination matures quite early in any sensory modality. By 12 months of age, infants' capacity to discriminate non-native consonantal contrasts that fall within a single native category is reported to be significantly reduced. [1, 2] Visually, 4- to 6-month-old infants can distinguish English from French by viewing silently presented articulations, but at 8 months, only bilingual infants can. [17] Moreover, fluent speakers of a second language who acquired that language after the age of five still have trouble with vowel contrasts. [3]

Neurophysiological Evidence
Much of CP occurs at a preconscious level. [21] In auditory event-related potentials (ERPs), there is generally a direct relationship between the timing of a response and the anatomical level at which it occurs. Correlates of categorical perception in the auditory cortex have been observed as early as 100 ms after the stimulus onset. [4, 8] Furthermore, linguistic experience shapes perception of components like vowels and consonants at a preconscious level. As mentioned previously, the vowel continuum /ö/-/õ/-/o/ is divided differently by Estonian (3 vowels) and Finnish (2 vowels) subjects, and mismatch negativity (MMN) occurs only if the difference between stimuli is across-category for the subject. [4] Although this occurs at a preconscious level, training effects have been observed. [5] Native Finnish speakers as well as Hungarians fluent in Finnish exhibit MMN across the /æ/-/e/ boundary (which occurs in Finnish but not Hungarian), whereas monolingual Hungarian speakers do not. [5] Hindi speakers have a perceptual boundary between /da/ and /Da/ (both are mapped onto /da/ in English, with /Da/ perceived as being slightly more "breathy"). [18] In a /ba/-/da/-/Da/ continuum, MMN was observed in both Hindi and non-Hindi (French) speakers across the /ba/-/da/ boundary, but only in Hindi speakers was MMN observed across the /da/-/Da/ boundary. [6] Sharma et al. found an MMN effect along a /ba/-/pa/ continuum and along the /da/-/ta/ continuum. [7,8] Additionally, across-boundary mismatch is observed for the /dæ/-/tæ/ continuum using MEG as well. [19]

Continuous Models
In 1991, Kuhl proposed a “perceptual magnet effect” in humans to explain the decrease in perceptual sensitivity near category prototypes. [11] Indeed, the “perceptual magnet effect” has been observed for both vowels and consonants, and is believed to mature for one’s native language between 7 and 11 months of age. [25-28] However, no existing studies attempt to separate the effects of the perceptual magnet from those of category boundaries, if indeed the effects are produced through separate mechanisms. The expected differences can be predicted in the auditory perceptual space in neural network simulations of auditory training. [29, 30]