
= Speech Perception =

Speech perception can be defined as the "process of imposing a meaningful perceptual experience on an otherwise meaningless speech input". It is used to extract the fundamental and distinctive features of speech, as well as subtle differences in the timing of acoustic signals (e.g., the consonants and the vowels).

== Overview ==
The process of speech perception begins with the auditory processing of the acoustic signal. These acoustic signals are decoded into acoustic cues, or phonemes, and this acoustic information is used at higher levels of cognition when sounds are processed into meaningful utterances. To understand speech perception, speech can first be defined as complex, generative, rule-based variation of a limited number of discrete elements, such as phonemes. These elements, following a hierarchical organization, are structurally formulated into words and then into sentences. Although speech is received serially, it is processed in a parallel fashion. Speech is conceptually distinct from language, and it is believed to be unique to humans.

Acoustic cues underlie the perception of speech sounds and can be defined along the dimensions of frequency, intensity, and time. The perception of phonetic categories depends on acoustic cues such as the relative timing of noise and the voice onset time (VOT) of a word. Numerous acoustic features are associated with a single phoneme; for example, the production of the phoneme /d/ is associated with different acoustic features depending on the vowel that precedes or follows it. Speech can also be characterized by formants, patterns of high-energy peaks across frequencies, which provide information about speech-sound identity; for example, /da/ (higher frequency) can be distinguished from /ga/ (lower frequency). VOT is a temporal cue and is measured relative to the position of the release burst. The different acoustic cues that are relevant for each type of acoustic signal can be explained by the biological view of speech perception; for example, hemispheric asymmetries in auditory processing help explain how the listener can "invariantly" perceive speech.
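VOT, as a temporal cue, can be illustrated with a minimal Python sketch. The function name and the zero-referenced burst time are illustrative assumptions; in practice VOT is measured from the waveform of a recorded syllable.

```python
# Minimal sketch: voice onset time (VOT) as a temporal interval.
# The function and example timings are illustrative assumptions;
# real VOT is measured from the waveform of a recorded syllable.

def voice_onset_time(burst_ms, voicing_onset_ms):
    """VOT: time from the release burst to the onset of vocal-fold voicing."""
    return voicing_onset_ms - burst_ms

# A short lag after the burst cues a voiced stop such as /d/,
# while a long lag cues a voiceless stop such as /t/.
print(voice_onset_time(0.0, 17.0))  # 17.0
print(voice_onset_time(0.0, 91.0))  # 91.0
```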
The brain has developed in such a way that speech can be processed in a parallel and complementary fashion: the left hemisphere is dominant for the rapid processing of speech, whereas the right hemisphere is responsible for the comprehension of speech.

== Biological Perspective ==
The role of biology in speech perception has been extensively researched, and this work demonstrates that biological data play a critical role in shaping theories of speech. Acoustic signals are first processed by the auditory system; because of the complexity of the auditory pathway, the incoming acoustic signal has been highly processed and re-coded by the time it reaches the auditory cortex. The auditory cortices in the two hemispheres are relatively specialized: the left auditory cortical regions are mainly responsible for speech decoding and speech distinctions. The left auditory cortex integrates over a shorter temporal window and is sensitive to faster events (20–50 ms), whereas the right auditory cortex is sensitive to a slower range (150–250 ms). The human nervous system is organized to process simultaneous signals, which allows people to readily understand and perceive speech. Animal studies are often used because humans and animals share the same basic auditory system. Studies of macaque monkeys demonstrated neurons in the primary auditory cortex that are capable of representing the timing of phonetically important speech components.
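The two temporal windows can be caricatured in code. This is a toy signal-processing sketch, not a neural model; the window lengths, the assumed 1 kHz sample rate, and the synthetic signal are assumptions chosen only to show that a short analysis window resolves brief events while a long window averages over them.

```python
# Toy sketch: short vs. long temporal analysis windows.
# Assume a 1 kHz sample rate, so one sample = 1 ms (an assumption).

def windowed_energy(samples, window):
    """Mean energy in consecutive non-overlapping windows of `window` samples."""
    return [sum(x * x for x in samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]

# Synthetic signal: a 5 ms click, silence, then a sustained 200 ms hum.
signal = [1.0] * 5 + [0.0] * 95 + [0.3] * 200

short_windows = windowed_energy(signal, 25)   # ~25 ms: the brief click stands out
long_windows = windowed_energy(signal, 200)   # ~200 ms: the click is averaged away
```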

== Theories of Speech Perception ==
Speech perception can be represented by different kinds of theories and processes. These approaches can be divided into two general types: (1) acoustic-based theories and (2) articulatory-based theories.

=== Acoustic Approach ===
Acoustic-based theories argue for the dominant importance of acoustics in speech perception. This approach directly links the acoustic signal to its distinctive features and to the phonemes. Acoustic-based theories also suggest that speech perception requires specialized mental processing, implying a modular organization of the brain in which particular regions specialize in processing speech. The modularity of language can be supported from the following theoretical perspectives: categorical perception, duplex perception, and cognitive disorders. The approach focuses on determining the acoustic cues and on how the auditory system extracts these cues from the acoustic signal. A strong proponent of acoustic-based theories would argue that specialized speech mechanisms are used to perceive speech, whereas a weak proponent would suggest that listening to speech engages prior knowledge of language, a context-based and more experience-based approach. Further, these theorists tend to focus on how the perception of speech is influenced by working memory. Some of the general arguments provided by acoustic-based theorists are: "(1) many language phonologies chose sounds on acoustic, not articulatory, terms; (2) nonspeaking infants and animals can differentiate many speech sounds; and (3) humans can differentiate nonspeech sounds perhaps as well as speech sounds". The acoustic view of speech perception is limited by the segmentation problem and the variability problem, because there is no one-to-one mapping between phonemes and their acoustic instantiations. Although human speech lacks invariance, the listener can quickly process and understand these enormous variations; humans are able to auditorily encode acoustic signals into speech. The ease with which listeners perceive speech contradicts the complexity of speech perception.

Using the acoustic view of speech, theorists cannot sufficiently explain how humans can so readily perceive and understand speech. Thus, theorists have reconciled the acoustic view with a nonmodular, gestural approach to understanding the process of speech perception. More importantly, this new conceptualization advances an interdisciplinary approach to auditory perception within cognitive science.
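The lack of a one-to-one mapping between phonemes and acoustic signals can be sketched with a toy data structure. The token names, and the very idea of discrete "tokens", are hypothetical simplifications; real acoustic variation is continuous.

```python
# Toy sketch of the variability problem: many distinct acoustic
# realizations (tokens) map to one and the same phoneme, so the
# mapping cannot be inverted token-by-token.

tokens_to_phoneme = {
    "d_before_i": "/d/",      # /d/ before /i/: one transition pattern
    "d_before_u": "/d/",      # /d/ before /u/: a different pattern
    "d_rapid_speech": "/d/",  # fast speech: yet another realization
}

distinct_phonemes = set(tokens_to_phoneme.values())
print(len(tokens_to_phoneme), "tokens ->", len(distinct_phonemes), "phoneme")
# 3 tokens -> 1 phoneme
```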

=== Articulatory Approach ===
Articulatory-based theories claim that there is no direct relationship between the acoustic signal and the perceived phoneme; instead, higher-level neuromotor mediation is involved, in which the input pattern is compared with an internally generated pattern.

==== Motor Theory (developed by Liberman and colleagues) ====
The motor theory of speech perception claims that the objects of speech perception are articulatory events rather than acoustic events. These articulatory events are shaped by the neuromotor commands sent to the articulators; in essence, perceiving speech can be understood as perceiving gestures. The theory suggests that speech is perceived in terms of the place and manner of production of the acoustic signal, by mapping between phonemes and vocal tract shapes. For example, the intended gestures of the lips, tongue, and vocal folds can cue listeners in understanding speech. Unlike the modular view, which argues that there is a special speech-processing area in the brain, the motor theory proposes that there is "overlapping activity of several neural networks, those that supply control signals to the articulators, and those that process incoming neural patterns from the ear". Further, although the acoustic view claims that the acoustic signal itself contains phonemes that can be extracted from it, the motor theory argues that the acoustic signal contains not phonemes but acoustic cues. These features are extensively recoded to recover the phonemes through neuromotor mediation, leading to the articulatory gestures. Categorical perception is one of the main arguments of the motor theorists. The notion of categorical perception offers a partial solution to the variability in acoustic inputs, such as variability across dialects, noise, and emotion, also known as the lack of invariance. It occurs when a wide range of acoustic cues results in the perception of a limited number of sound categories; for example, in experiments on VOT, categorical perception is found when humans categorically perceive the stimuli /da/ (17 ms) and /ta/ (91 ms) as different speech sounds.
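The categorical-perception finding can be sketched as a classifier over a VOT continuum. The 30 ms category boundary below is an illustrative assumption (reported category boundaries vary); the 17 ms and 91 ms stimuli are those described above.

```python
# Sketch of categorical perception along a VOT continuum.
# BOUNDARY_MS is a hypothetical category boundary, not measured data.

BOUNDARY_MS = 30.0

def perceive(vot_ms):
    """Listeners report a discrete category, not the raw VOT value."""
    return "/da/" if vot_ms < BOUNDARY_MS else "/ta/"

# A whole continuum of stimuli collapses into just two percepts:
continuum = [0, 10, 17, 25, 40, 60, 91]
print([perceive(v) for v in continuum])
# ['/da/', '/da/', '/da/', '/da/', '/ta/', '/ta/', '/ta/']
```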
More importantly, the McGurk effect shows that speech perception is influenced by context, in this case the visual perception of articulatory behaviour. When there is an audio-visual mismatch, the perception of speech tends to weight the visual information heavily. This multimodal perception of speech appears to compensate for any lack of clarity resulting from coarticulation, and the link between coarticulation and perceptual compensation provides strong evidence for perceptual processes that are specific to speech. In sum, the general process of speech perception can be understood as a back-and-forth between speech production and speech perception, between the speaker and the receiver. The speaker goes through serial processes in producing speech: forming phonemes, followed by neuromotor commands, muscle contractions, vocal tract shapes, and ultimately the acoustic signal, which the listener subconsciously perceives as articulatory gestures. These gestures depend on a specialized, speech-specific decoder, and eventually the listener is able to comprehend the speech as meaningful words and sentences.

==== Direct Realist Theory ====
Like the motor theory, the direct realist theory of speech perception claims that the objects of speech perception are articulatory. However, whereas the motor theory proposes that specialized, speech-specific mechanisms play a critical role in speech perception, the direct realist theory argues that "speech perception can be broadly characterized in the same terms" as other forms of perception. The name of the theory can be unpacked term by term. "Direct" suggests that the acoustic signal provided by the speaker contains enough information for the listener to determine the articulatory gestures that structured it, allowing the listener to "simply detect relevant information". "Realist" means that listeners recover the physical properties of speech, for example the phonetic features that are represented by articulatory gestures. The realist view contrasts sharply with the mentalistic view: in the realist view, phonetic features are directly related to articulatory gestures, whereas in the mentalistic view they are internally generated by an acoustics-to-phoneme process of speech perception. No acoustic features invariantly specify speech; instead, there are invariant properties in the articulatory gestures that clarify speech and make it comprehensible to the listener. The articulatory gestures, according to Fowler, are processed in the premotor cortex, which mediates speech perception. Hence, the speaker's articulatory gestures, such as the closing or opening of the lips, structure the acoustic signal, which in turn allows the listener to recover and process these gestures.