
In perception, auditory scene analysis (ASA) is the process by which the auditory system constructs meaningful perceptual components from sound. The term was coined by psychologist Albert Bregman, whose 1990 book summarized contemporary research and proposed conceptual foundations for the field. The study of ASA has traditionally concerned the perception of multiple, distinct sound sources. More recently, the term has also been used to encompass perception related to other factors in sound generation, such as reverberation due to the environment. Computational auditory scene analysis (CASA) involves the implementation of ASA in computational systems, which has contributed both to the development of formal theories of ASA in human listeners and to building systems for machine perception. The interaction of auditory scene analysis with the classic psychological concept of attention is described by the cocktail party problem.

Background
The soundwave received by the ear is often composed of a mixture of sounds produced by different sources. For instance, different instruments in a musical ensemble each produce their own distinct vibrations, but those vibrations combine in the air before reaching the ear. Despite only observing this single soundwave, a listener typically experiences several streams of sound, which may each appear to arise from a separate source. Auditory scene analysis describes the process by which multiple meaningful entities (e.g., sources such as musical instruments) are perceived from the single soundwave received at the ear. Determining these entities from the observed soundwave alone is ill-posed: there are infinitely many combinations of arbitrary source soundwaves that could physically match the observed soundwave. Because source structure is therefore not inherent in the sound alone, the auditory system itself must embody principles by which it infers plausible sources from the mixture.

Hypothesized principles have typically been formulated as heuristics which govern whether vibrations of varying frequencies across time should be grouped (as belonging to a single source) or segregated, somewhat analogous to Gestalt principles of perceptual organization in vision. In CASA, such principles have also been formulated in terms of constraints on the relationship between distinct sounds comprising a mixture (e.g., statistical independence) as well as assumptions about the properties of isolated sources. In the latter case, ASA is seen as a process of Bayesian inference, in which probable sources are inferred given the observed soundwave and assumptions (i.e., prior beliefs) about single sources.
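The Bayesian view described above can be written schematically. The notation here (a mixture waveform $y$ and candidate source sounds $s_1, \ldots, s_n$) is illustrative rather than taken from any particular model:

```latex
% Schematic Bayesian formulation of ASA (notation illustrative).
% y is the observed mixture; s_1, ..., s_n are candidate source sounds.
% The prior factorizes under the assumption that sources are
% statistically independent.
p(s_1, \ldots, s_n \mid y) \;\propto\; p(y \mid s_1, \ldots, s_n) \prod_{i=1}^{n} p(s_i)
```

The likelihood term encodes the constraint that the candidate sources must combine to produce the observed mixture, while the priors encode assumptions about the properties of isolated sources.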

In line with thinking on natural scene statistics, Bregman proposed that organisms' auditory systems are adapted through evolution to regularities in their natural sonic environment, and that this adaptation is the basis for many ASA principles. That is, because sources do not produce arbitrary sounds, organisms are able to internalize this structure over evolutionary time. He also suggested that perceptual learning over an organism's lifetime could shape how it hears auditory scenes.

History
In his treatise On the Sensations of Tone (first published in 1863), Hermann von Helmholtz described how a note played by a musical instrument is composed of multiple, harmonically related pure tones (each consisting of a single frequency). He further described how one could manipulate whether the sound was perceived as a single note or as a combination of pure tones, which he termed synthetic versus analytic listening. In contrast to the work on visual perceptual grouping carried out in the early 20th century by the Gestalt school, seminal work on auditory perceptual organization began only in the 1950s. Examples include Colin Cherry's 1953 research on the cocktail party problem and Broadbent and Ladefoged's 1957 work on grouping different frequencies in vowel sounds.

Studies on auditory perceptual organization continued through the 1960s–1980s by researchers including Richard Warren, Chris Darwin, and Albert Bregman, with this body of work summarized in Bregman's 1990 book. A small amount of CASA research had begun prior to the publication of Bregman's book, mainly on speech separation systems.

Bregman's book further motivated computationally-inclined researchers to attempt to instantiate human auditory grouping principles in computational systems, particularly in the style set out by David Marr in his 1982 book Vision.

Research on ASA continues to the present, but according to a 2016 review, the field lacks a comprehensive account of human ASA. Furthermore, a 2014 review noted a lack of ecologically relevant research on human ASA, as most research has involved relatively simple laboratory stimuli rather than the sounds that people hear in their everyday environments. In CASA, machine learning approaches such as non-negative matrix factorization and deep learning are being applied to solve ASA in specific applications, such as speech separation.
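As an illustration of the non-negative matrix factorization approach mentioned above, the following sketch factors a toy magnitude "spectrogram" into two spectral templates and their time-varying activations. All signals and parameter values here are synthetic and illustrative, not from any published separation system:

```python
import numpy as np

rng = np.random.default_rng(0)

n_freq, n_time = 40, 100
# Two synthetic "sources", each with a fixed spectral profile (columns of
# W_true) and random activations over time (rows of H_true).
W_true = np.zeros((n_freq, 2))
W_true[5:10, 0] = 1.0   # source 1: energy in a low-frequency band
W_true[25:30, 1] = 1.0  # source 2: energy in a high-frequency band
H_true = rng.random((2, n_time))
V = W_true @ H_true     # mixture magnitude "spectrogram"

# NMF via multiplicative updates (Lee & Seung), minimizing squared error.
k = 2
W = rng.random((n_freq, k)) + 0.1
H = rng.random((k, n_time)) + 0.1
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# After fitting, W @ H approximates the mixture, and each column of W
# concentrates its energy in one source's frequency band.
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In a real CASA system, V would be the magnitude spectrogram of recorded audio, and each source's contribution would be reconstructed from its template and activations.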

Experimental methods
To study ASA, it is necessary to examine what components (sometimes called streams) are perceived in an auditory scene. Experiments therefore typically involve manipulating aspects of a sound mixture hypothesized to affect perceptual organization and measuring the resulting changes in what listeners perceive. Such experiments have used a variety of methods, including:


 * collecting listeners' direct subjective reports of perception; for instance, whether they hear one or two concurrent sounds, or how clearly they hear a melody within a mixture of tones.
 * measuring a listener's ability to recognize whether a familiar melody is present in a set of interleaved tones, which may be difficult if the notes of the melody are perceptually segregated into different streams.
 * measuring a listener's ability to make temporal judgments; for instance, about the order of sounds or the duration of silences between sounds. Listeners tend to be better at making judgments about the temporal relationship between two sounds when they subjectively report that the two sounds group into the same stream, and tend to be worse when they report that the two sounds are segregated into different streams.
 * measuring psychophysical thresholds, for instance, the amplitude at which a tone embedded in a noisy background is detectable.
 * having a listener adjust an isolated "comparison" stimulus until it sounds like some component in a mixture.
 * measuring a listener's classification of stimuli; for instance, whether a stimulus sounds like one vowel or another can depend on whether specific sound frequencies are grouped together.

Perceptual phenomena
Researchers have identified several phenomena in which listeners tend to perceive specific types of source structure in ambiguous auditory scenes, or in which the manipulation of specific acoustic parameters leads to the perception of distinct sources. Bregman broadly characterized these phenomena as involving grouping over time ("sequential grouping") or grouping in frequency ("simultaneous grouping"). He also distinguished "primitive segregation" from "schema-based" scene analysis, which depends on the recognition of learned patterns such as a particular melody or language.

A further class of ASA phenomena concerns not the grouping of distinct elements, but rather what is perceived when sounds overlap in time and frequency; these are referred to here as "filling-in" phenomena.

Sequential grouping
When sounds occur in succession, the auditory system must determine which sets of sounds were produced by the same source. Musicians can exploit the principles by which the auditory system achieves this in order to create the perception of a single melody (actually played by different musicians) or of several melodies (actually played by a single musician). Examples include interlocking xylophone duets, in which patterns alternating between two players are heard as unified melodic streams, and Baroque compositions such as Bach's works for solo instruments, in which rapid alternation between high and low registers creates the impression of two concurrent melodic lines.

Sequential grouping has most often been studied using sequences of tones. One commonly used sequence is the "ABA sequence", first introduced by van Noorden in 1975. The ABA sequence consists of two types of tones (A and B), which may differ in a number of acoustic attributes such as frequency or amplitude. The three-tone set is repeated with an intervening silence (ABA_ABA_ABA_). To test the effect of the varied acoustic parameters on perceptual grouping, listeners are typically asked whether they hear the sequence as an integrated "galloping" rhythm involving both tones (ABA_ABA_ABA_), or whether the sequence "splits" into two segregated isochronous rhythms, one fast (A_A_A_A_) and one slow (B___B___B___). Other measures of perceptual grouping, such as the ability to make temporal judgments between the A and B tones, may also be used. For some parameter settings, tone sequences are bistable, meaning that perception alternates spontaneously between the integrated and segregated organizations over the course of listening.
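The ABA stimulus described above can be synthesized in a few lines. The parameter values below (tone frequencies, durations, sampling rate) are illustrative defaults, not taken from any particular experiment:

```python
import numpy as np

def pure_tone(freq_hz, dur_s, sr=16000):
    """A pure tone with short linear ramps to avoid onset/offset clicks."""
    t = np.arange(int(dur_s * sr)) / sr
    tone = np.sin(2 * np.pi * freq_hz * t)
    n_ramp = int(0.005 * sr)
    env = np.ones_like(tone)
    env[:n_ramp] = np.linspace(0, 1, n_ramp)
    env[-n_ramp:] = np.linspace(1, 0, n_ramp)
    return tone * env

def aba_sequence(freq_a=500.0, freq_b=600.0, tone_dur=0.1,
                 n_repeats=5, sr=16000):
    """Repeat the triplet A, B, A followed by a silent gap (ABA_ABA_...)."""
    silence = np.zeros(int(tone_dur * sr))
    a = pure_tone(freq_a, tone_dur, sr)
    b = pure_tone(freq_b, tone_dur, sr)
    triplet = np.concatenate([a, b, a, silence])
    return np.tile(triplet, n_repeats)

seq = aba_sequence()
```

Increasing the frequency separation between `freq_a` and `freq_b`, or shortening `tone_dur` to speed up the sequence, would be the typical manipulations used to promote perceptual splitting.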

Parameters that affect the tendency of the ABA sequence to split perceptually include the frequency and timing of the tones. When the difference between the frequencies of the A and B tones is small, listeners tend to hear an integrated sequence. In contrast, when the time between tone onsets is shorter (i.e., the overall sequence is faster), the sequence tends to split. The musical examples cited above take advantage of these perceptual effects. By using more complex tone sequences, Tougas and Bregman showed that tones tend to group so as to create streams spanning relatively narrow frequency ranges, even when equally sized frequency differences between adjacent tones would result from alternative perceptual organizations. Furthermore, even when the absolute frequency difference between two tones is held constant, they can group or segregate depending on their surrounding context. Other parameters that affect grouping include spectral similarity, onset similarity, and overall sound amplitude. For instance, it can be easier to hear a quiet voice mixed with a loud voice than to hear apart two loud voices.

Repetition also affects sequential grouping. The more times the ABA segment is repeated, the more likely listeners are to report that the sequence splits into two streams. One actively pursued hypothesis is that the relative predictability of tone sequences influences their grouping (e.g., work by Bendixen and colleagues).

Repetition
A repeated tone pattern can "capture" a tone out of a stream it would otherwise join, changing how the remaining sounds are organized; similar capture effects may also operate in simultaneous grouping. Streaming also builds up cumulatively: the longer a repeating sequence is heard, the more likely it is to split into separate streams, and this build-up can reset after an abrupt change or a silent gap. In addition, a sound that repeats across several different mixtures can be segregated from those mixtures even if it never occurs in isolation.

Spatial hearing

Spatial separation between sounds promotes their segregation: sequences of tones that alternate between different spatial locations tend to split into separate streams, with each stream associated with one location.

Onset and offset
Sudden amplitude changes tend to signal a new sound, whereas gradual changes tend to be heard as a continuation of an ongoing sound. Common offsets appear to be a weaker grouping cue than common onsets. Related to onset cues is Bregman's "old-plus-new" heuristic: when a spectrum suddenly becomes more intense or complex, the auditory system tends to interpret the change as a new sound added to an ongoing one, rather than as a replacement of it.

Harmonic mistuning
Harmonicity is one of the most commonly studied grouping cues. Frequency components that are integer multiples of a common fundamental frequency tend to be grouped into a single perceived sound. When one component of a harmonic complex is mistuned, it begins to "pop out" and be heard as a separate sound once the mistuning exceeds a few percent.
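A mistuned-harmonic stimulus of the kind used in these studies can be sketched as follows. The fundamental frequency, number of harmonics, and mistuning percentage are illustrative values rather than those of any specific study:

```python
import numpy as np

def harmonic_complex(f0=200.0, n_harmonics=10, mistuned=4,
                     mistuning_pct=4.0, dur_s=0.4, sr=16000):
    """A harmonic complex in which one component is shifted off-harmonic."""
    t = np.arange(int(dur_s * sr)) / sr
    signal = np.zeros_like(t)
    for h in range(1, n_harmonics + 1):
        f = h * f0
        if h == mistuned:
            f *= 1 + mistuning_pct / 100.0  # shift this component upward
        signal += np.sin(2 * np.pi * f * t)
    return signal / n_harmonics  # keep peak amplitude bounded

stim = harmonic_complex()
```

Listeners would typically compare versions of this stimulus with different `mistuning_pct` values, reporting whether they hear a single complex tone or a complex tone plus a separate pure tone.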

Common frequency modulation
The frequency components of many natural sounds, such as voices, rise and fall in frequency together, and components that are frequency-modulated coherently tend to be heard as a single sound. However, incoherent modulation does not appear to act as an additional segregation cue: there is no increase in segregation when two sets of tones are frequency-modulated out of phase with each other.


Perceptual "filling-in"
When a sound is briefly interrupted by a louder sound, such as a noise burst, listeners often perceive the interrupted sound as continuing through the interruption; this is known as the continuity illusion. A related effect, spectral completion, occurs in the frequency domain: when part of a sound's spectrum is masked by noise, listeners perceptually infer the masked spectral region. Similar results have been obtained with speech: missing portions of words can be perceptually restored (phonemic restoration), with the restored content influenced by linguistic context.

Schema-based
Knowledge of familiar patterns can influence scene analysis. For example, listeners are better able to detect a familiar melody interleaved with distractor tones than an unfamiliar one. Lexical knowledge can also affect grouping: whether a sound sequence forms a word can influence how it is organized into streams.


Auditory scene analysis across species
Different species may possess different ASA mechanisms, specific to their ecology. For instance, starlings are able to detect the presence of a familiar birdsong when it is mixed with several previously unheard songs, but humans are incapable of this task even after training.

ASA-like abilities have also been studied in other species, including echolocating bats, marine mammals, owls, and insects such as crickets.

One notable difference between humans and many of these species is the use of echoes: echolocating animals perceive their environment by analyzing reflections of their own emitted sounds, an ability that some blind humans also develop.