Speech-to-song illusion

The speech-to-song illusion is an auditory illusion discovered by Diana Deutsch in 1995. A spoken phrase is repeated several times, without being altered in any way and without any added context. This repetition causes the phrase to transform perceptually from speech into song. Although the illusion is most noticeable in non-tone languages such as English and German, it can also occur in tone languages such as Thai and Mandarin.

Discovery and first experiment
The illusion was discovered by Deutsch in 1995 while she was preparing the spoken commentary on her CD ‘Musical Illusions and Paradoxes’. She had the phrase ‘sometimes behave so strangely’ on a loop, and noticed that after it had been repeated several times it appeared to be sung rather than spoken. She later included the illusion on her CD ‘Phantom Words and Other Curiosities’ and noted that once the phrase had perceptually morphed into song, it continued to be heard as song when played in the context of the full sentence in which it occurred.

Deutsch, Henthorn, and Lapidis examined the illusion in detail. They showed that when this phrase was heard only once, listeners perceived it as speech, but after several repetitions, they perceived it as song. This perceptual transformation required that the intervening repetitions be exact; it did not occur when they were transposed slightly, or presented with the syllables in jumbled orderings. In addition, when listeners were asked to repeat back the phrase after hearing it once, they repeated it back as speech. Yet when they were asked to repeat back the phrase after hearing it ten times, they repeated it back as song.
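
The three presentation conditions described above (exact repetition, slight transposition, and jumbled syllable order) can be illustrated with a small sketch. This is not the procedure or the stimuli used by Deutsch, Henthorn, and Lapidis. The function name build_repetitions, the representation of a phrase as (syllable, pitch) pairs, the pitch values, and the transposition amounts are all hypothetical and chosen only for illustration.

```python
import random


def build_repetitions(syllables, n_repeats, condition, seed=0):
    """Return n_repeats renditions of a phrase under one condition.

    syllables: list of (label, pitch_in_semitones) pairs. This is a
               hypothetical toy representation, not real audio.
    condition: "exact", "transposed", or "jumbled".
    """
    rng = random.Random(seed)
    renditions = []
    for _ in range(n_repeats):
        if condition == "exact":
            # Every repetition is identical to the original phrase.
            rendition = list(syllables)
        elif condition == "transposed":
            # Shift the whole rendition by a small amount.
            # The shift sizes are illustrative assumptions.
            shift = rng.choice([-1.0, -0.5, 0.5, 1.0])
            rendition = [(label, pitch + shift) for label, pitch in syllables]
        elif condition == "jumbled":
            # Keep the syllables but present them in a shuffled order.
            rendition = list(syllables)
            rng.shuffle(rendition)
        else:
            raise ValueError(f"unknown condition: {condition!r}")
        renditions.append(rendition)
    return renditions


# Hypothetical pitch values for the first four syllables of the phrase.
phrase = [("some", 0.0), ("times", 2.0), ("be", 1.0), ("have", 3.0)]
for condition in ("exact", "transposed", "jumbled"):
    print(condition, build_repetitions(phrase, 2, condition))
```

Only the "exact" condition leaves every repetition identical to the original, which is the condition under which the study reported the transformation from speech to song.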

Neurological substrates of the illusion
Theories of the neurological substrates of speech and song perception have largely been based on responses to speech and song stimuli, which differ in their acoustic features. For example, the pitch content within spoken syllables generally changes dynamically, while the pitches of musical notes tend to be stable and the notes tend to be of longer duration. Consequently, these theories have explained the brain substrates of speech and song perception in terms of the acoustic features involved. Yet in the speech-to-song illusion a phrase is repeated exactly, with no change in its features, and it can nonetheless be heard either as speech or as song. Several studies have therefore explored the brain regions that are involved in the illusion. Increased activation has been found in the frontal and temporal lobes of both hemispheres when the listener was perceiving a repeated spoken phrase as sung rather than spoken. The activated regions included several that other researchers had found to be activated while listening to song.

Speech material conducive to the illusion
Phrases that are marked by syllables with stable pitches and that favor a metrical interpretation tend to be conducive to the illusion. However, the illusion is not enhanced by regular repetitions of the entire phrase. Further, the illusion is stronger for phrases in languages that are more difficult to pronounce and when listeners are unable to understand the language of the utterance.

Listeners who experience the illusion
The speech-to-song illusion occurs in listeners both with and without musical training. It occurs in listeners who speak different languages, including the non-tone languages English, Irish, Catalan, German, Italian, Portuguese, French, Croatian, and Hindi, and the tone languages Thai and Mandarin; however, it is weaker in speakers of tone languages than in speakers of non-tone languages.

Related illusions
Margulis and Simchi-Gross have reported related illusions in which different types of sound are transformed into music by repetition. Random sequences of tones were heard as more musical when they were looped, and clips consisting of a mix of environmental sounds sounded more musical following repetition. These effects were weaker than that of the original speech-to-song illusion, perhaps because speech and song are particularly intertwined perceptually, and also because the characteristics of the speech producing the original illusion are particularly conducive to a strong effect.

Explanations of the illusion
Repetition is a particularly important characteristic of music, and so provides an important cue that a phrase should be considered as music rather than speech. More specifically, in song, the pitches of vowels are distinctly heard, but in speech they appear watered down. It has been suggested that in speech the neural circuitry underlying pitch perception is somewhat inhibited, enabling the listener to focus attention on consonants and vowels, which are important to verbal meaning. Exact repetition of spoken words may cause this circuitry to become disinhibited, so that pitches are heard more saliently, and so as sung. Indeed, the brain structures that are activated when the illusion occurs correspond largely to those that are activated in response to song.

In addition, several features of a spoken phrase that are likely to occur in song are conducive to the illusion. These include syllables with more stable pitches, and phrases with more regular distributions of accents. Other explanations invoke higher-level musical structure and memory. Listeners are better able to discriminate pitches in repeated rather than unrepeated phrases when the pitches violate the structure of Western tonal music. Long-term memory for melodies may also be involved: if the prosodic features of a spoken phrase are similar to those of a well-known melody, the brain circuitries underlying musical pitch patterns and rhythms can be invoked, so that the phrase is heard as song.

Relationship to musical composition
Many composers, including Gesualdo, Monteverdi, and Mussorgsky, have argued that expressivity in music can be derived from inflections in speech, and they have included features of speech in their music. Another relationship was invoked by Steve Reich in compositions such as Come Out and It's Gonna Rain. He presented spoken phrases in stereo and looped them, gradually offsetting the sounds from the two sources so as to create musical effects, and these were enhanced as the discrepancy widened. Further, in Reich’s composition Different Trains, brief excerpts of speech were embedded in instrumental music so as to bring out their musical quality. Today, much popular music, particularly rap music, consists of rhythmic speech chanted over musical accompaniment.
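
The looping-and-offsetting process described for Reich's pieces can be sketched in a simplified, discrete form. This is an assumption-laden toy model, not a reconstruction of how the pieces were actually made: the function name phase_loops and the parameter drift_per_repeat are hypothetical, the loop is a short list of sample values rather than recorded audio, and the offset grows in whole-sample steps once per repetition rather than continuously.

```python
def phase_loops(loop, n_repeats, drift_per_repeat):
    """Return (left, right) channels built from the same loop.

    The left channel repeats the loop unchanged. The right channel
    starts each repetition drift_per_repeat samples further into the
    loop than the previous one, so the two channels move gradually
    out of alignment.
    """
    n = len(loop)
    left, right = [], []
    for rep in range(n_repeats):
        # Offset grows with each repetition and wraps around the loop.
        offset = (rep * drift_per_repeat) % n
        left.extend(loop)
        right.extend(loop[offset:] + loop[:offset])
    return left, right


# Toy "samples" standing in for a recorded spoken phrase.
left, right = phase_loops([1, 2, 3, 4], n_repeats=3, drift_per_repeat=1)
print(left)   # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
print(right)  # [1, 2, 3, 4, 2, 3, 4, 1, 3, 4, 1, 2]
```

In the first repetition the two channels coincide; with each further repetition the right channel leads by one more sample, which is the widening discrepancy that the text describes.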