Multistable auditory perception

Multistable auditory perception is a cognitive phenomenon in which certain auditory stimuli can be perceived in multiple ways. While multistable perception has been most commonly studied in the visual domain, it has also been observed in the auditory and olfactory modalities. In the olfactory domain, different scents are piped to the two nostrils, while in the auditory domain, researchers often examine the effects of binaural sequences of pure tones. Generally speaking, multistable perception has three main characteristics: exclusivity, meaning that the competing percepts cannot occur simultaneously; randomness, meaning that the durations of perceptual phases follow a random law; and inevitability, meaning that subjects are unable to block out one percept indefinitely.

History
While binocular rivalry has been studied since the 16th century, the study of multistable auditory perception is relatively new. Diana Deutsch was the first to discover multistability in human auditory perception, in the form of auditory illusions involving periodically oscillating tones.

Experimental findings
Different experimental paradigms have since been used to study multistable perception in the auditory modality. One is auditory stream segregation, in which two different frequencies are presented in a temporal pattern. Listeners experience alternating percepts: one percept is of a single stream fluctuating between frequencies, and the alternative percept is of two separate streams repeating single frequencies each.
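The streaming stimulus described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the two frequencies, the tone and gap durations, and the sample rate are assumptions chosen for the example, not values taken from any particular study.

```python
import math


def pure_tone(freq_hz, dur_s, sample_rate=8000):
    """Return a sine tone as a list of float samples in [-1, 1]."""
    n = int(round(dur_s * sample_rate))
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def alternating_sequence(freq_a, freq_b, tone_s=0.1, gap_s=0.02,
                         repeats=4, sample_rate=8000):
    """Concatenate A and B tones in strict alternation.

    Each tone is followed by a short silent gap. All default values are
    hypothetical and only serve to make the example concrete.
    """
    silence = [0.0] * int(round(gap_s * sample_rate))
    samples = []
    for _ in range(repeats):
        for freq in (freq_a, freq_b):
            samples.extend(pure_tone(freq, tone_s, sample_rate))
            samples.extend(silence)
    return samples


# Two frequencies a few semitones apart (assumed values).
sequence = alternating_sequence(440.0, 587.0)
```

In a sequence like this, the frequency separation and the presentation rate are the parameters one would typically vary to see how they affect whether listeners report one stream or two.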

Another paradigm is the verbal transformation effect, in which the input is a speech form repeated rapidly and continuously. The alternating percepts here are words: for example, continuous repetition of the word “life” results in bistability between “life” and “fly.” Prefrontal activation is implicated in such fluctuations in percept, and not in changes in the physical stimulus, and there is also a possible inverse relationship between left inferior frontal and cingulate activation during this percept alternation.

Principles of perceptual bistability
The temporal dynamics observed in auditory stream segregation are similar to those of bistable visual perception. This suggests that the mechanisms mediating multistable perception, namely the alternating dominance and suppression of multiple competing interpretations of ambiguous sensory input, might be shared across modalities. Pressnitzer and Hupé analyzed results of an auditory streaming experiment and demonstrated that the resulting perceptual experience exhibited all three properties of multistable perception found in the visual modality: exclusivity, randomness, and inevitability.

Exclusivity was satisfied, as there was “spontaneous alternation between mutually exclusive percepts,” and very little time was spent in an “indeterminate” experience. Randomness also characterized the phenomenon: the first phase of perception is longer in duration than subsequent phases, after which the “steady-state of the temporal dynamics of auditory streaming is purely stochastic with no long-term trend.” Lastly, the percept alternation was inevitable; even though volitional control did reduce suppression of the specified percept, it did not exclude perception of the alternative percept altogether. These similarities between perceptual bistability in the visual and auditory modalities raise the possibility of a common mechanism governing the phenomenon. In Pressnitzer and Hupé's subjects, the distributions of phase durations in the two modalities were not significantly different, and it has been speculated that the intraparietal sulcus, likely involved in crossmodal integration, could be responsible for bistability in both domains. However, the absence of subject-specific biases across the modalities is not what would be expected if a “single top-down selection mechanism were the sole determinant of the auditory and visual bistability.” This observation, along with evidence of neural correlates at different stages of processing, instead suggests that competition is distributed and “based on adaptation and mutual inhibition, at multiple neural processing stages.”
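To make the notion of perceptual phases concrete, the following sketch collapses a sampled percept report into phases and compares the first phase with the later ones. The report, the labels "one" and "two", the sampling interval, and the function name are all hypothetical; they are invented for illustration and are not data from the study discussed above.

```python
from itertools import groupby


def phase_durations(reports, dt):
    """Collapse a sampled percept report into (label, duration) phases.

    `reports` holds one percept label per sample; `dt` is the assumed
    sampling interval in seconds. Each sample carries exactly one label,
    so the phases are mutually exclusive by construction.
    """
    return [(label, sum(1 for _ in run) * dt) for label, run in groupby(reports)]


# Made-up report: "one" = single stream, "two" = two streams.
reports = ["one"] * 8 + ["two"] * 3 + ["one"] * 2 + ["two"] * 4
phases = phase_durations(reports, dt=0.5)

first_duration = phases[0][1]
later = [duration for _, duration in phases[1:]]
mean_later = sum(later) / len(later)
```

With real reports, the later phase durations would be the ones to examine for the stochastic, trend-free behavior described above, while the first phase would be treated separately.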

Place model
According to the place model, each tone in a two-stream tone test activates a specific population of neurons. Event-related potential (ERP) amplitude increases as the frequency difference between the two tones increases. The model hypothesizes that, as the frequency difference grows, the distance between the two activated populations of neurons also increases, so that the populations interact less with each other, allowing for easier tone segregation.
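The intuition behind the place model can be illustrated with a toy calculation. The sketch below assumes that each tone activates a Gaussian profile of neurons along a log-frequency axis and measures how much two such profiles overlap. The Gaussian shape, the bandwidth value, and the function name are assumptions made for this example; they are not part of the model as described in the literature.

```python
import math


def tonotopic_overlap(freq_a, freq_b, bandwidth_octaves=0.5):
    """Normalized overlap of two Gaussian activation profiles.

    Both profiles are assumed to have the same standard deviation
    (`bandwidth_octaves`) on a log2-frequency axis. For equal-width
    Gaussians whose centers are `distance` apart, the integral of their
    product, divided by its value at zero distance, is
    exp(-distance**2 / (4 * sigma**2)).
    """
    distance = abs(math.log2(freq_b / freq_a))
    return math.exp(-(distance ** 2) / (4 * bandwidth_octaves ** 2))


small_separation = tonotopic_overlap(440.0, 494.0)  # roughly two semitones
large_separation = tonotopic_overlap(440.0, 880.0)  # one octave
```

Under these assumptions the overlap falls monotonically as the frequency separation grows, which is the qualitative behavior the place model relies on to explain easier segregation at larger separations.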

fMRI results
Functional magnetic resonance imaging (fMRI) has been used to compare neural responses to alternating tones with responses to a single stream of tones. The posterior regions of the left auditory cortex were modulated by the alternating tones, indicating that there may be areas of the brain responsible for stream segregation.

Sequential grouping
A problem of considerable behavioral importance is how auditory stimuli are grouped. When a continuous stream of auditory information is received, numerous alternative interpretations are possible, but individuals are only consciously aware of one percept at a time. For this to occur, the auditory system must segregate and group incoming sounds, the goal being to “construct, modify, and maintain dynamic representations of putative objects within its environment”. It has been suggested that this process of binding sound events into groups is driven by different levels of similarity. One principle for binding is based on the perceptual similarity between individual events. Sounds that share many or all of their acoustic features are more likely to have been emitted by the same source, and thus are more likely to be linked to form a “proto-object”. The other principle for binding is based on the sequential predictability of sound events. If events reliably follow each other, it is also more likely that they have a common underlying cause.

Competition
A theory explaining the alternation of auditory percepts is that different interpretations are neurally represented simultaneously, but all but the dominant one at the time are suppressed. This idea of competition among parallel hypotheses might provide an explanation for the temporal dynamics observed in auditory stream segregation. The initial perceptual phase is held longer than the subsequent ones, “with the duration of the first phase being stimulus-parameter dependent and an order of magnitude longer in duration than parameter-independent subsequent phases”. At stimulus onset, the first percept might be that which is easiest to discover, based on featural proximity (and thus stimulus-parameter dependent), and it is held for relatively longer because time is required for other hypotheses to form. As more sensory information is received and processed, the “neural associations underlying the alternative sound organizations become strong and start to vie for dominance” and “the probabilities of perceiving different organizations tend to become more balanced with time”.
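Competition between two parallel interpretations is often illustrated with a small rate model in which two populations inhibit each other, slowly adapt, and receive noise, in the spirit of the adaptation and mutual inhibition account quoted earlier in this article. The sketch below is a generic toy model of that kind, not the model used by any cited study. Every parameter value, the sigmoid shape, and the simplified noise term (added directly to the input at each step) are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x, threshold=0.2, slope=0.1):
    """Map net input to a firing rate in (0, 1). Shape is an assumption."""
    return 1.0 / (1.0 + math.exp(-(x - threshold) / slope))


def simulate_competition(steps=5000, dt=1.0, tau=10.0, tau_adapt=500.0,
                         drive=0.6, inhibition=1.0, adaptation=0.5,
                         noise_sd=0.05, seed=0):
    """Euler simulation of two mutually inhibiting, adapting populations.

    Returns, for every time step, the index (0 or 1) of the population
    with the higher rate, read here as the dominant interpretation.
    """
    rng = random.Random(seed)
    rate = [0.6, 0.4]   # population 0 starts slightly ahead
    adapt = [0.0, 0.0]  # slow adaptation variables
    dominant = []
    for _ in range(steps):
        # Net input: shared drive, minus inhibition from the rival,
        # minus the population's own adaptation, plus noise.
        inputs = [drive - inhibition * rate[1 - i] - adaptation * adapt[i]
                  + noise_sd * rng.gauss(0.0, 1.0) for i in range(2)]
        for i in range(2):
            rate[i] += dt / tau * (-rate[i] + sigmoid(inputs[i]))
            adapt[i] += dt / tau_adapt * (-adapt[i] + rate[i])
        dominant.append(0 if rate[0] >= rate[1] else 1)
    return dominant


def count_switches(dominant):
    """Count how often the dominant population changes."""
    return sum(1 for a, b in zip(dominant, dominant[1:]) if a != b)


trace = simulate_competition()
switches = count_switches(trace)
```

In models of this type, the dominant population suppresses its rival until adaptation or noise weakens it enough for the other to take over. Whether and how often switches occur in this sketch depends on the assumed parameters, so the result should be read as qualitative only.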