Wolfgang von Kempelen's speaking machine



Wolfgang von Kempelen's speaking machine is a manually operated speech synthesizer whose development was begun in 1769 by the Austro-Hungarian author and inventor Wolfgang von Kempelen. In the same year he completed his far more infamous contribution to history: the Turk, a chess-playing automaton later revealed to be an elaborate hoax, operated by a human chess player concealed inside it. But while the Turk was built in six months, the speaking machine occupied Kempelen for the next twenty years. After two conceptual "dead ends" over the first five years of research, his third direction ultimately led him to the design he felt comfortable deeming "final": a functional model of the human vocal tract.

First design
Kempelen's first experiment with speech synthesis involved only the most rudimentary elements of the vocal tract necessary to produce speech-like sounds. A kitchen bellows, of the kind used to stoke fires in wood-burning stoves, served as a set of lungs to supply the airflow. A reed taken from a common bagpipe served as the glottis, the source of the raw fundamental sound in the vocal tract. The bell of a clarinet made a sufficient mouth, despite its rigid form. This basic model could produce only simple vowel sounds, though some additional articulation was possible by positioning a hand at the bell opening to obstruct the airflow. It lacked the hardware needed to produce the nasals, plosives and fricatives that most consonants require. Kempelen, like many other early pioneers of phonetics, attributed the perceived "higher frequencies" of certain sounds to the glottis rather than to the formants (resonances) of the entire vocal tract, so he abandoned his single-reed design for a multiple-reed approach.

Second design
The second design involved a console, similar to that of a musical organ of the period, at which the operator worked a set of keys, one for each letter. The sounds were produced by a common bellows that fed air through various pipes, each with the shape and obstructions needed to produce its letter. Through experimentation, Kempelen found that the reed's resonant length was not crucial to the creation of the high-frequency components of certain vowels and fricatives, so he tuned all the reeds to the same pitch for the sake of consistency between letters. While not all letters were represented at this point, Kempelen had developed the technology required to produce most vowels and several consonants, including the plosive /p/ and the nasal /m/, and was thus in a position to begin forming syllables and short words. This immediately exposed the primary flaw of the second design: the parallel arrangement of the multiple reeds allowed more than one letter to sound at a time. In the process of building syllables and words, this sonic "overlap" (now referred to as co-articulation) produced sounds very uncharacteristic of human speech, undermining the purpose of the design altogether. Kempelen comments:

"In order to continue my experiments it was necessary, above all, that I should have a perfect knowledge of what I wanted to imitate. I had to make a formal study of speech and continually consult nature as I conducted my experiments. In this way my talking machine and my theory concerning speech made equal progress, the one serving as guide to the other."

"It was possible, following the methods I'd been using, to invent separate letters, but never to combine them to form syllables, and that it was absolutely necessary to follow nature which has only one glottis and one mouth, through which every sound emerges and which gives a unity to them."

Thus, Kempelen began work on his third and ultimately final design, which was in many ways an "as close as possible" representation of the physiology of the vocal tract.

Third design
The third approach returned to a design similar to the first, which was conceptually more faithful to the natural design of the human vocal tract than the second. As before, it consisted of a bellows, a reed and a simulated mouth (this time made of India rubber, so that vowel sounds could be better shaped by hand), but it also included a "throat" to which a "nasal cavity" was attached (complete with two "nostrils" for pronouncing nasal consonants), several levers and tubes dedicated to pronouncing /s/ and /ʃ/, a rod that interfered with the reed's vibration to articulate /r/, and a separate, smaller bellows that allowed air to pass the reed while the mouth was completely closed (a feature required for pronouncing /b/). At one point a special valve intended to simulate /f/ was included, but it was removed when it turned out that the same sound could be achieved simply by closing all of the machine's orifices and allowing air to leak from the cracks. Similarly, at one point in the design there was an alternate "mouth" assembly consisting of a wooden box with a pair of hinged shutters that acted as lips. Inside the box was a hinged, wooden, string-operated flap that acted as a tongue. This assembly was meant to mimic the mouth and tongue in forming plosives such as /b/ and /d/, but it was later removed when Kempelen recognized that, without a proper tongue, the machine would never be able to produce /t/, /d/, /k/ and /ɡ/. He worked around the problem by replacing /t/ and /k/ with /p/, and /d/ and /ɡ/ with /b/ (which itself differed from /p/ only in voicing). In the context of a familiar word, listeners often ignored the mispronunciation altogether (a phenomenon later explored by researchers in the field of cognitive science).
Kempelen believed that people were more forgiving of his machine's errors because the reed frequency and vocal tract resonant length he chose gave it a resonance much more like that of a young child than that of an adult. This third design, unlike those before it, was capable of speaking complete phrases in French, Italian and English (German was possible, but demanded greater skill from the operator because of the more frequent use of consonants in the German language). Its greatest limitation was the bellows, which, although it had six times the capacity of human lungs, ran out of air much faster than its human counterpart. Because the design was based on a single reed as the glottal sound source, it had none of the co-articulation problems inherent in the second design. But that single reed also meant that the Speaking Machine had a monotone voice. Kempelen spent some time trying to introduce prosodic pitch-variation mechanisms into the reed assembly, but to no avail, and he decided to leave the design to be improved upon by later experimenters. All of the important additions in the third design came out of Kempelen's two decades of intensive research into the vocal tract in relation to spoken languages, in which the behavior of each crucial physiological element of speech production was scrutinized and replicated acoustically, mechanically, or both.
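
The link between resonant length and a child-like voice can be illustrated with the simplest textbook model of the vocal tract: a uniform tube closed at the source end and open at the mouth, whose resonances fall at odd multiples of a quarter wavelength, F_n = (2n - 1)c / 4L. The sketch below is only an illustration of that formula. The speed of sound and both tube lengths are assumed round figures, not measurements of Kempelen's machine, and the function name is hypothetical.

```python
# Quarter-wavelength resonances of a uniform tube, closed at one end
# (the reed/glottis) and open at the other (the mouth).
# All numeric values here are illustrative assumptions.

SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed value for warm, moist air


def tube_resonances(length_cm, count=3):
    """Return the first `count` resonance frequencies (Hz) of the tube."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Assumed, typical-order-of-magnitude tract lengths (not from the source).
adult_like = tube_resonances(17.5)  # [500.0, 1500.0, 2500.0]
child_like = tube_resonances(10.0)  # [875.0, 2625.0, 4375.0]

print("adult-like tube:", adult_like)
print("child-like tube:", child_like)
```

Shortening the tube raises every resonance by the same factor, which is consistent with a shorter resonant length sounding more like a child than an adult. A real vocal tract, and the machine's rubber mouth, are not uniform tubes, so the figures indicate the trend only.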

A significant contribution
Von Kempelen died in 1804, though not before publishing an extremely comprehensive account of his twenty years of research in phonetics. The 456-page book, published in 1791 and titled Mechanismus der menschlichen Sprache nebst Beschreibung einer sprechenden Maschine (The Mechanism of Human Speech, with a Description of a Speaking Machine), covered every technical aspect of both his construction of the Speaking Machine (including the preliminary designs) and his studies of the human vocal tract.

In 1837, Sir Charles Wheatstone resurrected the work of Wolfgang von Kempelen, creating an improved replica of his Speaking Machine. Using new technology developed over the previous 50 years, Wheatstone was able to further analyze and synthesize components of acoustic speech, giving rise to the second wave of scientific interest in phonetics. After viewing Wheatstone's improved replica of the Speaking Machine at an exposition, a young Alexander Graham Bell set out to construct his own speaking machine with the help and encouragement of his father. Bell's experiments and research ultimately led to his invention of the telephone in 1876, which revolutionized global communication.

In 1968, Marcel van den Broecke (University of Amsterdam) built a replica as part of an MA thesis. He reported on it in "Wolfgang von Kempelen's Speaking Machine as a Performer", chapter 2 (pp. 9-19) of Sound Structures, edited by Marcel van den Broecke, Vincent van Heuven and Wim Zonneveld (Foris Publications, Dordrecht, Netherlands / Cinnaminson, USA, 1983). Acoustic predictions, made by applying N-tube approximations of the vocal tract to the replica's characteristics, confirmed what had already been established perceptually: the machine could produce only two vowel-like sounds, an /a/-like vowel and an /o/-like vowel. Of the consonants produced, the general-purpose plosive is very convincing. A general-purpose nasal can also easily be identified, but the sibilants and the rattling /r/ are as unpleasant as the eyewitness von Windisch reported two centuries earlier.