SIGSALY

SIGSALY (also known as the X System, Project X, Ciphony I, and the Green Hornet) was a secure speech system used in World War II for the highest-level Allied communications. It pioneered a number of digital communications concepts, including the first transmission of speech using pulse-code modulation.

The name SIGSALY was not an acronym, but a cover name that resembled an acronym—the SIG part was common in Army Signal Corps names (e.g., SIGABA). The prototype was called the "Green Hornet" after the radio show The Green Hornet, because it sounded like a buzzing hornet, resembling the show's theme tune, to anyone trying to eavesdrop on the conversation.

Development
At the time of its inception, long-distance telephone communications used the "A-3" voice scrambler developed by Western Electric. It worked on the voice inversion principle. The Germans had a listening station on the Dutch coast which could intercept and break A-3 traffic.

Although telephone scramblers were used by both sides in World War II, they were known not to be very secure in general, and both sides often cracked the scrambled conversations of the other. Inspection of the audio spectrum using a spectrum analyzer often provided significant clues to the scrambling technique. The insecurity of most telephone scrambler schemes led to the development of a more secure scrambler, based on the one-time pad principle.

A prototype was developed at Bell Telephone Laboratories, under the direction of A. B. Clark, assisted by British mathematician Alan Turing, and demonstrated to the US Army. The Army was impressed and awarded Bell Labs a contract for two systems in 1942. SIGSALY went into service in 1943 and remained in service until 1946.

Operation
SIGSALY used a random noise mask to encrypt voice conversations which had been encoded by a vocoder. The latter was used to minimize the amount of redundancy (which is high in voice traffic), in order to reduce the amount of information to be encrypted.

The voice encoding exploited the fact that speech varies fairly slowly as the components of the throat move. The system extracted the following information about the voice signal 50 times a second (every 20 milliseconds):
 * ten channels covering the telephone passband (250 Hz – 2,950 Hz), each rectified and filtered to extract how much energy it contained;
 * another signal indicating whether the sound was voiced or unvoiced;
 * if voiced, a signal indicating the pitch; this also varied at less than 25 Hz.
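
The "rectified and filtered" step for a single channel can be illustrated with a minimal digital sketch. It assumes the input has already been band-pass filtered into one channel, and it uses a per-frame mean as a crude stand-in for the low-pass filter; SIGSALY itself used analog circuitry, so the function name, the 8 kHz sample rate and the test tone are all hypothetical.

```python
import math

FRAME_SECONDS = 0.020  # one measurement every 20 ms, as in the text


def band_envelope(band_signal, sample_rate):
    """Full-wave rectify a band-limited signal and average it per 20 ms frame."""
    frame_len = int(round(sample_rate * FRAME_SECONDS))
    envelope = []
    for start in range(0, len(band_signal) - frame_len + 1, frame_len):
        frame = band_signal[start:start + frame_len]
        rectified = [abs(x) for x in frame]          # rectification
        envelope.append(sum(rectified) / frame_len)  # crude low-pass: frame mean
    return envelope


# Hypothetical test input: a 1 kHz tone, loud for 0.1 s and then quiet for 0.1 s.
RATE = 8000  # assumed sample rate, not a SIGSALY parameter
tone = [
    (1.0 if n < RATE // 10 else 0.1) * math.sin(2 * math.pi * 1000 * n / RATE)
    for n in range(RATE // 5)
]
levels = band_envelope(tone, RATE)  # ten values, one per 20 ms frame
```

Each value in `levels` tracks how much energy the channel carried during one frame, which is the slowly varying quantity the later stages quantize.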

Next, each signal was sampled for its amplitude once every 20 milliseconds. For the band amplitude signals, the amplitude was converted into one of six amplitude levels, with values from 0 through 5. The amplitude levels were on a nonlinear scale, with the steps between levels wide at high amplitudes and narrower at low amplitudes. This scheme, known as "companding" or "compressing-expanding", exploits the fact that the fidelity of voice signals is more sensitive to low amplitudes than to high amplitudes. The pitch signal, which required greater sensitivity, was encoded by a pair of six-level values (one coarse, and one fine), giving thirty-six levels in all.
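
A six-level companded quantizer of this kind might be sketched as follows. The thresholds are hypothetical (the actual SIGSALY decision levels are not given here) and were chosen only so that the steps are narrow at low amplitudes and wide at high ones. Splitting the pitch into base-6 digits is likewise just one plausible reading of the "coarse and fine" pair.

```python
import bisect

# Hypothetical decision thresholds on a normalized 0..1 amplitude scale.
# Each threshold is double the previous one, so the steps widen with amplitude.
THRESHOLDS = [1 / 32, 1 / 16, 1 / 8, 1 / 4, 1 / 2]


def quantize_amplitude(amplitude):
    """Map a normalized amplitude to one of six levels, 0 through 5."""
    return bisect.bisect_right(THRESHOLDS, amplitude)


def encode_pitch(pitch_level):
    """Split a pitch level in 0..35 into a (coarse, fine) pair of six-level values."""
    if not 0 <= pitch_level < 36:
        raise ValueError("pitch level must be in 0..35")
    return divmod(pitch_level, 6)


def decode_pitch(coarse, fine):
    """Recombine a (coarse, fine) pair into a pitch level in 0..35."""
    return coarse * 6 + fine
```

With these assumed thresholds, an amplitude of 0.04 falls in level 1 and an amplitude of 0.9 in level 5, while a pitch level of 23 becomes the pair (3, 5).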

A cryptographic key, consisting of a series of random values from the same set of six levels, was subtracted from each sampled voice amplitude value to encrypt it before transmission. The subtraction was performed using modular arithmetic, in a "wraparound" fashion: if the result was negative, six was added to it to give a positive result. For example, if the voice amplitude value was 3 and the random value was 5, then the subtraction would work as follows:


 * $$3 - 5 = -2 \equiv -2 + 6 = 4 \pmod 6$$

This gives a value of 4.

The sampled value was then transmitted, with each sample level transmitted on one of six corresponding frequencies in a frequency band, a scheme known as "frequency-shift keying (FSK)". The receiving SIGSALY read the frequency values, converted them into samples, and added the key values back to them to decrypt them. The addition was also performed in a "modulo" fashion, with six subtracted from any value over five. To match the example above, if the receiving SIGSALY got a sample value of 4 with a matching random value of 5, then the addition would be as follows:


 * $$4 + 5 = 9 \equiv 9 - 6 = 3 \pmod 6$$

This gives the correct value of 3.
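
The modulo-6 subtraction and addition described above can be written as a short sketch. The function names are illustrative. Drawing the key with the `secrets` module is a modern stand-in for the recorded noise, not a description of how SIGSALY generated its key.

```python
import secrets

LEVELS = 6  # each sample and each key value is one of six levels, 0 through 5


def encrypt(samples, key):
    """Subtract the key from each sample, wrapping around modulo 6."""
    if len(samples) != len(key):
        raise ValueError("key must supply one value per sample")
    return [(s - k) % LEVELS for s, k in zip(samples, key)]


def decrypt(ciphertext, key):
    """Add the key back to each received value, wrapping around modulo 6."""
    if len(ciphertext) != len(key):
        raise ValueError("key must supply one value per sample")
    return [(c + k) % LEVELS for c, k in zip(ciphertext, key)]


# The worked example from the text: sample 3, key 5.
assert encrypt([3], [5]) == [4]
assert decrypt([4], [5]) == [3]

# Round trip with a freshly drawn key (hypothetical sample values).
samples = [0, 5, 2, 3, 1, 4]
key = [secrets.randbelow(LEVELS) for _ in samples]
recovered = decrypt(encrypt(samples, key), key)
```

Python's `%` operator returns a non-negative result for a positive modulus, so the "add six if negative" and "subtract six if over five" rules are both covered by `% LEVELS`.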

To convert the samples back into a voice waveform, they were first turned back into the dozen low-frequency vocoded signals. An inversion of the vocoder process was employed, which included:
 * a white noise source (used for unvoiced sounds);
 * a signal generator (used for voiced sounds) generating a set of harmonics, with the base frequency controlled by the pitch signal;
 * a switch, controlled by the voiced/unvoiced signal, to select one of these two as a source;
 * a set of filters (one for each band), all taking as input the same source (the source selected by the switch), along with amplifiers whose gain was controlled by the band amplitude signals.
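
The four components listed above can be combined in a rough digital sketch of a single 20 ms synthesis frame. Everything numeric here is an assumption made for illustration: the 8 kHz sample rate, the ten equal-width bands, and the use of simple two-pole resonators in place of the original analog band filters. Filter state is also reset on every frame, which a real synthesizer would not do.

```python
import math
import random

SAMPLE_RATE = 8000                               # assumed sample rate
BAND_EDGES = [250 + 270 * i for i in range(11)]  # ten equal bands, 250..2950 Hz (assumed)


def excitation(voiced, pitch_hz, n_samples, rng):
    """The switch: a harmonic source for voiced frames, white noise otherwise."""
    if voiced:
        n_harmonics = int(BAND_EDGES[-1] // pitch_hz)  # harmonics up to the top band edge
        return [
            sum(math.sin(2 * math.pi * h * pitch_hz * n / SAMPLE_RATE)
                for h in range(1, n_harmonics + 1))
            for n in range(n_samples)
        ]
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz):
    """Two-pole resonator, used here as a stand-in band-pass filter."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2 * r * math.cos(2 * math.pi * centre_hz / SAMPLE_RATE)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (1 - r) * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_frame(voiced, pitch_hz, band_gains, n_samples, rng=None):
    """Feed one shared source through every band filter and sum the scaled outputs."""
    if rng is None:
        rng = random.Random()
    source = excitation(voiced, pitch_hz, n_samples, rng)
    output = [0.0] * n_samples
    for i, gain in enumerate(band_gains):
        low, high = BAND_EDGES[i], BAND_EDGES[i + 1]
        band = resonator(source, (low + high) / 2, high - low)
        for n in range(n_samples):
            output[n] += gain * band[n]  # amplifier gain set by the band amplitude
    return output


# One voiced frame at a hypothetical 100 Hz pitch with all ten gains equal.
frame = synthesize_frame(True, 100.0, [1.0] * 10, 160)
```

The key structural point is that every band filter receives the same source, and only the per-band gains differ from frame to frame.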

The noise values used for the encryption key were originally produced by large mercury-vapor rectifying vacuum tubes and stored on a phonograph record. The record was then duplicated, with the records being distributed to SIGSALY systems on both ends of a conversation. The records served as the SIGSALY one-time pad, and distribution was very strictly controlled (although if one had been seized, it would have been of little importance, since only one pair of each was ever produced). For testing and setup purposes, a pseudo-random number generating system made out of relays, known as the "threshing machine", was used.

The records were played on turntables, but since the timing – the clock synchronization – between the two SIGSALY terminals had to be precise, the turntables were by no means just ordinary record-players. The rotation rate of the turntables was carefully controlled, and the records were started at highly specific times, based on precision time-of-day clock standards. Since each record only provided 12 minutes of key, each SIGSALY had two turntables, with a second record "queued up" while the first was "playing".
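
The amount of key consumed can be estimated with a little arithmetic. The sketch below assumes twelve channels (the "dozen" vocoded signals), one six-level key value per channel sample, and 50 samples per second; the constant and function names are illustrative.

```python
import math

CHANNELS = 12            # assumed: the dozen low-frequency vocoded signals
SAMPLES_PER_SECOND = 50  # one sample every 20 ms
LEVELS = 6               # six-level key values
RECORD_MINUTES = 12      # key available on one record

values_per_second = CHANNELS * SAMPLES_PER_SECOND              # 600
values_per_record = values_per_second * RECORD_MINUTES * 60    # 432000
bits_per_second = values_per_second * math.log2(LEVELS)        # about 1551


def records_needed(call_minutes):
    """Number of 12-minute records a call of the given length would consume."""
    return math.ceil(call_minutes / RECORD_MINUTES)
```

Under these assumptions a 30-minute conference would need three records, which is why a second turntable had to be ready before the first record ran out.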

Usage


The SIGSALY terminal was massive, consisting of 40 racks of equipment. It weighed over 50 tons, and used about 30 kW of power, necessitating an air-conditioned room to hold it. Too big and cumbersome for general use, it was only used for the highest level of voice communications.

A dozen SIGSALY terminal installations were eventually set up all over the world. The first was installed in the Pentagon building rather than the White House, which was given an extension line instead, because US President Franklin Roosevelt knew that British Prime Minister Winston Churchill insisted on being able to call at any time of the day or night. The second was installed 60 m below street level in the basement of Selfridges department store on Oxford Street, London, close to the US Embassy on Grosvenor Square. The first conference took place on 15 July 1943, and the system was used by both General Dwight D. Eisenhower, as the commander of SHAEF, and Churchill, before extensions were installed to the Embassy, 10 Downing Street and the Cabinet War Rooms. One terminal was installed in a ship and followed General Douglas MacArthur during his South Pacific campaigns. In total, the system supported about 3,000 high-level telephone conferences during World War II.

The installation and maintenance of all SIGSALY machines were undertaken by members of the 805th Signal Service Company of the US Army Signal Corps, a unit specially formed and vetted for the task. The system was cumbersome, but it worked very effectively. When the Allies invaded Germany, an investigative team discovered that the Germans had recorded significant amounts of traffic from the system, but had erroneously concluded that it was a complex telegraphic encoding system.

Significance
SIGSALY has been credited with a number of "firsts"; this list is taken from (Bennett, 1983):
 * 1) The first realization of enciphered telephony
 * 2) The first quantized speech transmission
 * 3) The first transmission of speech by pulse-code modulation (PCM)
 * 4) The first use of companded PCM
 * 5) The first examples of multilevel frequency-shift keying (FSK)
 * 6) The first useful realization of speech bandwidth compression
 * 7) The first use of FSK-FDM (Frequency Shift Keying-Frequency Division Multiplex) as a viable transmission method over a fading medium
 * 8) The first use of a multilevel "eye pattern" to adjust the sampling intervals (a new, and important, instrumentation technique)