Contact analysis

In cryptanalysis, contact analysis is the study of the frequency with which certain symbols precede or follow other symbols. The method is used as an aid to breaking classical ciphers.

Contact analysis is based on the fact that, in any sample of any written language, certain symbols appear adjacent to other symbols with varying frequencies. Moreover, these frequencies are roughly the same for almost all samples of that language, even when the distribution of the individual symbols differs significantly from what is typical for the language. This is true regardless of whether the symbols being used are words or letters.

In some ciphers, these properties of the natural language plaintext are preserved in the ciphertext, and have the potential to be exploited in a ciphertext-only attack.
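As an illustration of how a cipher can preserve these properties, the following Python sketch enciphers a short text with a simple monoalphabetic substitution and checks that the adjacent-pair counts of the ciphertext are those of the plaintext with the letters relabelled. The key, the sample text, and the helper name `pair_counts` are assumptions made for this example only; spaces are ignored, so letters on either side of a word break count as adjacent.

```python
from collections import Counter
import random
import string

def pair_counts(text):
    """Count adjacent letter pairs, ignoring non-letters (an assumption of this sketch)."""
    letters = [ch for ch in text if ch.isalpha()]
    return Counter(zip(letters, letters[1:]))

# Hypothetical key: a seeded random permutation of the alphabet
rng = random.Random(0)
shuffled = list(string.ascii_uppercase)
rng.shuffle(shuffled)
key = dict(zip(string.ascii_uppercase, shuffled))

plaintext = "CONTACT ANALYSIS STUDIES ADJACENT SYMBOLS"
ciphertext = "".join(key.get(ch, ch) for ch in plaintext)

plain_pairs = pair_counts(plaintext)
cipher_pairs = pair_counts(ciphertext)

# Relabel each plaintext pair through the key; because the key is a
# one-to-one mapping, the result equals the ciphertext pair counts.
relabelled = Counter({(key[a], key[b]): n for (a, b), n in plain_pairs.items()})
print(relabelled == cipher_pairs)  # True
```

The substitution hides which letter is which, but the pattern of contacts survives intact, and that pattern is what a ciphertext-only attack can work from.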

Although in a sense contact analysis can be considered a type of frequency analysis, most discussions of frequency analysis concern themselves with the unconditional probabilities of the symbols in the text: $$P(X_i=a)$$ or $$P(X_i=a \cap X_{i+1}=b)$$.

Contact analysis is based on the conditional probability that certain letters will precede or follow other letters: $$P(X_i=b \mid X_{i-1}=a)$$, or $$P(X_i=c \mid X_{i-2}=a \cap X_{i-1}=b)$$, or even $$P(X_i \in S \mid X_{i-1} \in T \cap X_{i+1} \in T)$$, where $$S$$ and $$T$$ are subsets of the alphabet being used.
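The simplest of these conditional probabilities can be estimated from a text by dividing the number of times the pair $$ab$$ occurs by the number of times $$a$$ occurs with a successor. The Python sketch below does this for single letters; the function name `contact_probabilities` and the sample text are hypothetical, and the sketch assumes that spaces and punctuation are discarded before adjacency is counted.

```python
from collections import Counter, defaultdict

def contact_probabilities(text):
    """Estimate P(X_i = b | X_{i-1} = a) from adjacent letter pairs in text."""
    # Assumption of this sketch: keep letters only, folded to upper case
    letters = [ch for ch in text.upper() if ch.isalpha()]
    pair_counts = Counter(zip(letters, letters[1:]))
    # How often each letter appears with a successor
    lead_counts = Counter(letters[:-1])
    probs = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        probs[a][b] = n / lead_counts[a]
    return probs

sample = "THE QUICK QUEEN QUIETLY QUIT THE QUARREL"
probs = contact_probabilities(sample)
print(probs["Q"])  # {'U': 1.0}
```

In this sample every Q is followed by U, so the estimate for that contact is 1. A real analysis would need a much larger text for the estimates to be stable.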

Where frequency analysis is based on first-order statistics, contact analysis is based on second- or third-order statistics.