Speech shadowing

Speech shadowing is a psycholinguistic experimental technique in which subjects repeat speech at a short delay after they begin to hear it. The time between hearing the speech and responding indicates how long the brain takes to process and produce speech. The task instructs participants to shadow speech, which generates intent to reproduce the phrase while motor regions in the brain unconsciously process the syntax and semantics of the words spoken. Words repeated during the shadowing task also imitate the parlance of the shadowed speech.

The reaction time between perceiving speech and then producing speech has been recorded at 250 ms for a standardised test. However, for people with left-hemisphere dominant brains, the reaction time has been recorded at 150 ms. Functional imaging finds that the shadowing of speech occurs through the dorsal stream. This area links auditory and motor representations of speech through a pathway that starts in the superior temporal cortex, extends to the inferior parietal cortex and ends in the posterior and inferior frontal cortices, specifically in Broca's area.

The speech shadowing technique was created as a research technique by the Leningrad Group led by Ludmilla Chistovich and Valerij Kozhevnikov in the late 1950s. During the same decade, the motor theory of speech perception was also being developed by Alvin Liberman and Franklin S. Cooper. Speech shadowing has been used for research on stuttering and divided attention, with a focus on the distraction of conversational audio while driving. It also has applications in language learning, as an interpretation method and in singing.

History
The Leningrad Group was interested in the time difference between the articulation and perception of speech. The speech shadowing technique was formulated to measure this difference. To measure the initiation of speech, an artificial palate was placed in the speaker's mouth. When the tongue moved to begin pronunciation and touched the palate, the measurement of reaction time began. The experiment concluded that the reaction time for consonants was consistently shorter than that for any vowel. The reaction time to a vowel depended on the consonant that came before it. This supported the phoneme, rather than the syllable, as the most basic unit of speech registered by the brain. The phoneme is the smallest distinguishable unit of sound, but the smallest unit that has assigned meaning is a consonant-vowel syllable.

Ludmilla Chistovich and Valerij Kozhevnikov focused their research on the mental processes that underlie the perception and production of speech in communication. Linguistics had treated speech perception as a chronological process that analysed steadily paced, similar-sounding words, but Chistovich and Kozhevnikov found it to be a staggered integration of syllables, described as non-linear dynamics. This refers to the diversity of tones and syllables in speech, which is perceived without conscious detection of delay and is forgotten because of limited working memory capacity. This observation directed research towards the speech shadowing technique in psycholinguistics.

Shadowing was used to measure the reaction time taken to repeat consonant-vowel syllables. Alveolar consonants were measured when the tongue first touched an artificial palate and labial consonants were measured by the contact of metal pieces when the upper and lower lips pressed together. The participant would begin to mimic the consonant as the speaker finished the utterance of the consonant. This consistent rapid response shifted research focus towards close speech shadowing.

Close speech shadowing requires immediate repetition at the fastest pace a person is able to achieve. It does not allow people to hear the entire phrase beforehand or to understand the words vocalised until the end of a sentence. Close speech shadowing was found to occur at a shortest delay of 250 ms. It has also been found to occur with a minimum delay of 150 ms in left-hemisphere dominant brains. The left hemisphere is associated with enhanced linguistic skill and information processing. It engages with analytic patterns of thought and is associated with ease in the speech shadowing task.
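The latency measure described above can be illustrated with a minimal sketch. It assumes that the onset times of each stimulus and of each shadowed response are already available in milliseconds; the function names, the sample onsets and the use of the 250 ms figure as a cut-off are illustrative assumptions, not part of any published protocol.

```python
from statistics import mean

# Assumed cut-off, taken from the 250 ms figure quoted in the text.
CLOSE_SHADOWING_MS = 250


def shadowing_latencies(stimulus_onsets_ms, response_onsets_ms):
    """Return the per-item latency: response onset minus stimulus onset."""
    if len(stimulus_onsets_ms) != len(response_onsets_ms):
        raise ValueError("each stimulus needs exactly one response onset")
    return [r - s for s, r in zip(stimulus_onsets_ms, response_onsets_ms)]


def summarise(latencies_ms, threshold_ms=CLOSE_SHADOWING_MS):
    """Return the mean latency and the share of items at or under the threshold."""
    close = [lat for lat in latencies_ms if lat <= threshold_ms]
    return {
        "mean_ms": mean(latencies_ms),
        "close_fraction": len(close) / len(latencies_ms),
    }


# Hypothetical onsets (ms from the start of a recording), not real data.
stimuli = [0, 1000, 2000, 3000]
responses = [240, 1260, 2150, 3310]
latencies = shadowing_latencies(stimuli, responses)
print(latencies)
print(summarise(latencies))
```

In practice the onsets would come from instrumentation such as the artificial palate described earlier or from an acoustic onset detector; the sketch only shows the arithmetic applied once those times are known.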

The short delay of response occurs as the motor regions of the brain have recorded cues that are related to consonants. The brain would then estimate the adjacent vowel syllable before it is heard. When the vowel is registered through the auditory system, it would confirm the action to produce speech based on the estimate. If the vowel estimate is denied, a short delay in response occurs as the motor region configures an alternate vowel.

Biological functioning
Research has developed a biological model as to how the meaning of speech can be perceived instantaneously even though the sentence has never been heard before. An understanding of syntactic, lexical and phonemic characteristics is first required for this to occur. Speech perception also requires the physical components of the auditory system to recognise similarities in sounds. Within the basilar membrane, energy is transferred, and specific frequencies can be detected and activate auditory hairs. The auditory hairs can be stimulated to sharpened activity when a tonal emission is held for 100 ms. This length of time indicates that speech shadowing ability can be enhanced by a moderately paced phrase.

Shadowing involves more than the auditory system alone. A shadow response can reduce the delay by analysing the temporal difference between the pronunciation of phonemes within a syllable. During a shadowing task, perceiving speech and producing the subsequent response do not occur separately; they partially overlap. The auditory system shifts between a translation phase of perceiving phonemes and a choice phase of anticipating the following phonemes to create an immediate response. This period of overlap lasts 20–90 ms, depending on the combination of vowels with consonants.

The translation phase involves afferent codes, which use the auditory system and neural networks. The choice phase involves efferent codes, which use the muscle groups that contribute to a response. These coding systems are functionally different but interact to create a positive feedback loop in auditory functioning. This link between perception and response in a speech shadowing task can be enhanced by the instructions given to participants. Analysis of variations in shadowing task instructions concludes that, in each case, the motor systems are primed to respond optimally and reduce the delay in reaction time. These points of interaction between the systems that permit speech perception and production occur without consciousness. This feedback loop is experienced as a linear process in functional reality. When participants are instructed to shadow speech, functional reality consists only of intent to reproduce speech, active listening and production of speech.

Speech perception also has links to phonological processing skills. These include recognition of all phonemes in a language and of how they can combine to form common syllables. A low understanding of phonological norms can negatively affect performance in a speech shadowing task. This is measured through the inclusion of proper and nonsense words in the task. Participants with high phonological processing skills produced shorter reaction times, whereas those with low phonological processing skills experienced uncertainty and slower responses.

Motor theory of speech perception
The mechanisms of speech shadowing could also be accounted for by the motor theory of speech perception. It states that shadowed words are perceived by shifting attention towards the motions and gestures that are created during pronunciation of speech, instead of towards the rhythmic and tonal characteristics of sound. This behaviourist theory holds that the motor system has a primary function during both speech perception and production. Auditory and visual analysis has established that the vocal tract develops a coarticulation of consonants and vowels during shadowing. This provides evidence that human speech is a communication form of efficient coding rather than of complex semantics and syntax. The interaction between the coding of perception and production of speech in this motor theory has also gained more evidence through the discovery of mirror neurones.

Stuttering
The speech shadowing technique is among the research methods that examine the mechanics of stuttering and identify practical improvement strategies. A primary characteristic of stuttering is a repeated movement, characterised by the repetition of a syllable. In this activity, stutterers are made to shadow a repeated movement that is internally or externally sourced. It reduces the likelihood of stuttering as the linguistic mental block is overturned and conditioned to provide an opening for fluid speech. Mirror neurones of the frontal lobe are active during this exercise and act to link speech perception and production. This process, combined with cortical priming, is engaged to produce the visible response.

Another primary characteristic of stuttering is a fixed posture, involving the prolongation of sounds. Speech shadowing research involving fixed postures produces no benefit in improving speech flow. The elongation of words in this stuttering characteristic does not align with the auditory system, which functions efficiently with moderately paced speech.

Speech shadowing has also been used in research into pseudo-stuttering, a voluntary speech impediment. Pseudo-stuttering involves identifying primary stuttering characteristics and realistic shadowing. It is used as an activity when studying fluency disorders, for students to experience how psychological and social outcomes are impacted by stuttering with strangers. Participants of this activity reported feelings of anxiety, frustration and embarrassment, which aligned with the reported emotional states of natural stutterers. The participants also reported lowered expectations towards sufferers in public situations.

Dichotic listening test
The speech shadowing technique is used in dichotic listening tests, introduced by E. Colin Cherry in 1953. During dichotic listening tests, subjects are presented with two different messages, one in the right ear and one in the left ear. The participants are instructed to focus on one of the two messages and to shadow the attended message out loud. The perceptual ability of the participant is measured as subjects attend to the instructed message while the alternate message acts as a distraction. Various stimuli are then presented to the other ear, and subjects are afterwards queried on what they can recall from these messages despite the instruction to ignore them. Speech shadowing is used here as an experimental technique to study and test divided attention.

Driving
Research into the effect of audio stimuli resulting from mobile phone use while driving has used the speech shadowing technique in its methodology. Speech shadowing tasks that have combined a conversational stimulus with a visual stimulus while driving are reported by participants as a distraction that directs focus away from the road and visual periphery. The study concludes that the combination of audio and visual stimuli has little effect on a driver's ability to manoeuvre a vehicle but does impair spatial and temporal judgement, an impairment that is not detected by the driver. This includes a driver's judgement of their speed and of their distance from a parallel vehicle, as well as a delayed reaction to sudden braking by a driver ahead.

The speech shadowing technique has also been used to research whether it is the action of producing speech or concentration on the semantics of speech that distracts drivers. The task of simple speech shadowing had no effect on driving ability, but the combination of simple speech shadowing with a content-associated follow-up activity impaired reaction time. The high attentional demand of this additional task shifts concentration away from the primary task of driving. This impairment is problematic because a fast reaction time is required when driving to respond to general traffic signals and signage, as well as to unpredictable events, in order to maintain safety.

Speech shadowing has also been used to imitate the amount of concentration that is lost when people engage in mobile phone conversations while driving, depending on where the mobile phone is placed. Speech shadowing from a sound source located in front of a driver produces a shorter delay in reaction time and more accuracy in shadowed content than when the sound source is located beside the driver. This research concluded that concentration on a visual stimulus draws the attention of the auditory system in the same direction. Conversational audio emitted from a mobile phone placed in front of a driver therefore produces less distraction than audio from a mobile phone placed to the side, as it is closest to the forward-facing visual stimulus of the road that is a driver's primary focus.

Language learning
The most basic form of speech shadowing occurs without the need for cognition. This is evidenced by the phonetic imitation of mentally impaired individuals, who do not require prior knowledge to engage in a shadowing task but do not understand the semantics of the shadowed speech. The higher process of acquiring a language is also innate. It can be spontaneously developed through the technique of speech shadowing as sounds are repeated and also semantically related. Research to enhance the developing reading skills of children uses the speech shadowing technique and suggests that the pace at which children are verbally taught should be matched to their reading ability. Poor readers have slower reaction times in speech shadowing activities than good readers for content that is difficult relative to their age. They also experience slower shadowing responses when sentences are partially grammatically incorrect. Shadowing research has identified a low understanding of grammatical structure and a limited vocabulary as characteristics of a poor reader and as target areas for developmental aid.

When learning a foreign language, shadowing can be used as a technique to practise speech and to acquire knowledge. It follows an interactionist perspective of language development. The method of speech shadowing in a learning setting involves providing shadowing tasks of incremental semantic and pronunciation difficulty and rating the accuracy of the shadowed response. It was previously difficult to create a standardised scoring system, as uncertain learners would slur and skip words in order to keep up with the pace of the phrases to be shadowed. Automatic scoring based on alignment-based and clustering-based techniques was designed and is now implemented to improve the experience of learning a foreign language through speech shadowing.
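The idea behind alignment-based scoring can be sketched at the word level. The example below is a simplified illustration only: it assumes that both the reference phrase and the learner's shadowed response are already available as text transcripts, and it uses the sequence matching in Python's standard `difflib` module to approximate an alignment. The function name `shadowing_score` is hypothetical, and the scoring systems referred to above work on the speech signal itself rather than on transcripts.

```python
from difflib import SequenceMatcher


def shadowing_score(reference, shadowed):
    """Fraction of reference words matched, in order, by the shadowed transcript.

    Skipped or substituted words lower the score; extra words in the
    shadowed transcript are ignored.
    """
    ref_words = reference.lower().split()
    shadow_words = shadowed.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    # autojunk=False stops difflib from discarding frequent words in long texts.
    matcher = SequenceMatcher(a=ref_words, b=shadow_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Hypothetical learner response that skips one word of the reference phrase.
score = shadowing_score("the cat sat on the mat", "the cat on the mat")
print(round(score, 3))
```

A score of 1.0 means every reference word was reproduced in order. Because `difflib` uses a heuristic matcher, the result is an approximation of the best possible alignment rather than a guaranteed optimum.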

Remote learning of language can occur without the presence of a real-time speaker through text-to-speech applications that use the principle of speech shadowing. As part of the process of perceiving sound, the auditory system distinguishes formant frequencies. The first formant characteristic perceived in the cochlea is the most prominent cue, as there is an attentional shift towards this signal. The formant characteristics of synthetically produced speech currently differ from those of speech produced by the human vocal tract. The information received affects the pronunciation of speech produced in a shadowing activity. Applications for learning languages are focused on developing greater accuracy in pronunciation and pitch, since these features are also replicated when shadowing speech.

Interpretation
Interpreters also use the speech shadowing technique, with modifications to the delivery and expected result. The first difference is that the shadowing response is delivered in a different language from the initial vocalisation of the phrase. The phrase is also not translated verbatim. Languages may not carry words of parallel meaning, so the role of an interpreter is to place emphasis on semantics during translation. Close speech shadowing is the primary focus of an interpreter, as the role involves producing a semantically accurate response at a steady, conversation-like pace. The goal of interpretation is to generate the effect of an absent third person while producing brevity and clarity in the conversation. Although the interpreter is expected to keep up with the pace, the conversation cannot move too fast. Mental load allows only partial overlap between perceiving, comprehending, translating and producing speech, and it is also affected by diminishing returns. An interpreter commonly communicates in a non-dominant language. Shadowing speech during positron emission tomography shows greater stimulation of the temporal cortex and motor-function regions. This demonstrates that a greater conscious effort is required to engage with a non-dominant language.

Singing
Speech shadowing can be used in the alternate form of vocal shadowing. It also requires the process of perception and production but with inverted energy distributions of a low input and a large output. Vocal shadowing perceives pure tones and focuses on the manipulation of the vocal tract to produce a shadowed response. Singers in comparison to non-singers are able to produce a shadowed response phrase that includes more accuracy in achieving the target frequencies and rapid movement between the frequencies. Research associates this ability with greater control and awareness of the vocal-fold breadth. The glottal stop is a technique manipulated by singers during shadowing to enhance frequency change.