Biclique attack

A biclique attack is a variant of the meet-in-the-middle (MITM) method of cryptanalysis. It uses a biclique structure to extend the number of rounds that a MITM attack can cover. Since biclique cryptanalysis is based on MITM attacks, it is applicable to both block ciphers and (iterated) hash functions. Biclique attacks are known for having weakened both full AES and full IDEA, though only with a slight advantage over brute force. They have also been applied to the KASUMI cipher and to the preimage resistance of the Skein-512 and SHA-2 hash functions.

The biclique attack is still the best publicly known single-key attack on AES. The computational complexity of the attack is $$2^{126.1}$$, $$2^{189.7}$$ and $$2^{254.4}$$ for AES128, AES192 and AES256, respectively. It is the only publicly known single-key attack on AES that covers the full number of rounds. Previous attacks targeted round-reduced variants (typically variants reduced to 7 or 8 rounds).

With a computational complexity of $$2^{126.1}$$ for AES128, the attack is theoretical: the security of AES has not been broken in practice, and the use of AES remains relatively secure. The biclique attack is nevertheless an interesting attack, which suggests a new approach to performing cryptanalysis on block ciphers. The attack has also yielded more information about AES, as it has brought into question the safety margin in the number of rounds used therein.

History
The original MITM attack was first suggested by Diffie and Hellman in 1977, when they discussed the cryptanalytic properties of DES. They argued that the key size was too small, and that applying DES multiple times with different keys could be a solution. However, they advised against using double-DES and suggested triple-DES as a minimum, because of MITM attacks: a MITM attack can easily be applied to double-DES to reduce the security from $$2^{56 \cdot 2} = 2^{112}$$ to just $$2 \cdot 2^{56} = 2^{57}$$, since an attacker who has a plaintext and the matching ciphertext can brute-force the first and the second DES encryption independently.
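The double-encryption argument can be illustrated with a minimal sketch. The cipher below is a hypothetical 16-bit toy with an 8-bit key, invented here purely for illustration (it is not DES); the names `toy_encrypt`, `toy_decrypt` and `mitm_double` are likewise assumptions. The attack tabulates all forward encryptions of one plaintext, then looks up every backward decryption of the matching ciphertext, which costs about $$2 \cdot 2^{8}$$ cipher calls instead of $$2^{16}$$.

```python
from collections import defaultdict

MASK = 0xFFFF
MUL = 0x9E37                      # odd constant, hence invertible modulo 2**16
MUL_INV = pow(MUL, -1, 1 << 16)   # modular inverse (needs Python 3.8+)


def rotl(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK


def rotr(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK


def toy_encrypt(block, key):
    """Hypothetical 16-bit block cipher with an 8-bit key (illustration only)."""
    x = block
    for rnd in range(3):
        x ^= (key * 0x0101) ^ rnd   # key addition (key repeated in both bytes)
        x = (x * MUL) & MASK        # invertible mixing step
        x = rotl(x, 5)
    return x


def toy_decrypt(block, key):
    """Inverse of toy_encrypt: undo the three rounds in reverse order."""
    x = block
    for rnd in reversed(range(3)):
        x = rotr(x, 5)
        x = (x * MUL_INV) & MASK
        x ^= (key * 0x0101) ^ rnd
    return x


def mitm_double(pairs, key_bits=8):
    """Recover (k1, k2) candidates for c = E_k2(E_k1(p)) by meeting in the middle."""
    (p0, c0), rest = pairs[0], pairs[1:]
    forward = defaultdict(list)
    for k1 in range(1 << key_bits):              # 2^8 forward computations
        forward[toy_encrypt(p0, k1)].append(k1)
    candidates = []
    for k2 in range(1 << key_bits):              # 2^8 backward computations
        middle = toy_decrypt(c0, k2)
        for k1 in forward.get(middle, []):
            # Filter false positives with the remaining known pairs.
            if all(toy_encrypt(toy_encrypt(p, k1), k2) == c for p, c in rest):
                candidates.append((k1, k2))
    return candidates


secret = (0x3C, 0xA5)
pairs = [(p, toy_encrypt(toy_encrypt(p, secret[0]), secret[1]))
         for p in (0x1234, 0xBEEF, 0x0F0F)]
candidates = mitm_double(pairs)
```

The two key halves are searched independently here, which is exactly the property that becomes hard to obtain when a single key schedule feeds every round.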

Since Diffie and Hellman suggested MITM attacks, many variations have emerged that are useful in situations where the basic MITM attack is inapplicable. The biclique attack variant was first suggested by Dmitry Khovratovich, Rechberger and Savelieva for use with hash-function cryptanalysis. However, it was Bogdanov, Khovratovich and Rechberger who showed how to apply the concept of bicliques to the secret-key setting, including block-cipher cryptanalysis, when they published their attack on AES. Prior to this, MITM attacks on AES and many other block ciphers had received little attention, mostly because a MITM attack needs independent key bits between the two 'MITM subciphers', something that is hard to achieve with many modern key schedules, such as that of AES.

The biclique
For a general explanation of what a biclique structure is, see the article for bicliques.

In a MITM attack, the keybits $$K_1$$ and $$K_2$$, belonging to the first and second subcipher, need to be independent of each other; otherwise the matched intermediate values for the plaintext and ciphertext cannot be computed independently in the MITM attack. (There are variants of MITM attacks in which the blocks can have shared key bits; see the 3-subset MITM attack.) This property is often hard to exploit over a larger number of rounds, due to the diffusion of the attacked cipher.

Simply put: The more rounds you attack, the larger subciphers you will have. The larger subciphers you have, the fewer independent key-bits between the subciphers you will have to bruteforce independently. Of course, the actual number of independent key-bits in each subcipher depends on the diffusion properties of the key-schedule.

The biclique helps with this as follows. One can, for instance, attack 7 rounds of AES using a MITM attack and then use a biclique structure of length 3 (i.e. one that covers 3 rounds of the cipher) to map the intermediate state at the end of round 7 to the end of the last round, e.g. round 10 for AES128. The full number of rounds of the cipher is thereby attacked, even if a basic MITM attack could not cover that many rounds.

The purpose of the biclique is thus to provide an efficiently built structure that maps an intermediate value at the end of the MITM attack to the ciphertext at the end of the cipher. Which ciphertext the intermediate state is mapped to depends, of course, on the key used for the encryption. The key used to map the state to the ciphertext in the biclique is based on the keybits brute-forced in the first and second subcipher of the MITM attack.

The essence of biclique attacks is thus, besides the MITM attack, the ability to efficiently build a biclique structure that, depending on the keybits $$K_1$$ and $$K_2$$, maps a certain intermediate state to the corresponding ciphertext.

Bruteforce
Get $$2^d$$ intermediate states and $$2^d$$ ciphertexts, then compute the keys that map between them. This requires $$2^{2d}$$ key recoveries, since each intermediate state needs to be linked to all ciphertexts.

Independent related-key differentials
(This method was suggested by Bogdanov, Khovratovich and Rechberger in their paper "Biclique Cryptanalysis of the Full AES".)

Preliminary: Remember that the function of the biclique is to map the intermediate values, $$S$$, to the ciphertext-values, $$C$$, based on the key $$K[i,j]$$ such that: $$\forall i,j : S_j \xrightarrow[f]{K[i,j]}C_i $$

Procedure: Step one: An intermediate state ($$S_0$$), a ciphertext ($$C_0$$) and a key ($$K[0,0]$$) are chosen such that: $$S_0\xrightarrow[f]{K[0,0]}C_0$$, where $$f$$ is the function that maps an intermediate state to a ciphertext using a given key. This is denoted as the base computation.

Step two: Two sets of related keys, each of size $$2^d$$, are chosen. The keys are chosen such that:
 * The first set consists of keys that fulfill the following differential requirement over $$f$$ with respect to the base computation: $$0\xrightarrow[f]{\Delta^K_i}\Delta_i $$
 * The second set consists of keys that fulfill the following differential requirement over $$f$$ with respect to the base computation: $$\nabla_j \xrightarrow[f]{\nabla^K_j}0$$
 * The trails of the $$\Delta_i$$- and $$\nabla_j$$-differentials are independent, i.e. they do not share any active non-linear components.

In other words: an input difference of 0 should map to an output difference of $$\Delta_i$$ under a key difference of $$\Delta^K_i$$, and an input difference of $$\nabla_j$$ should map to an output difference of 0 under a key difference of $$\nabla^K_j$$. All differences are with respect to the base computation.

Step three: Since the trails do not share any non-linear components (such as S-boxes), they can be combined to get: $$0\xrightarrow[f]{\Delta^K_i}\Delta_i \oplus \nabla_j \xrightarrow[f]{\nabla^K_j}0 = \nabla_j \xrightarrow[f]{\Delta^K_i \oplus \nabla^K_j}\Delta_i$$, which conforms to the definitions of both differentials from step two. The tuple $$(S_0, C_0, K[0,0])$$ from the base computation also conforms to both differentials by definition, as the differentials are taken with respect to the base computation: substituting $$S_0$$, $$C_0$$ and $$K[0,0]$$ into either of the two definitions yields $$0\xrightarrow[f]{0}0$$, since $$\Delta_0 = 0$$, $$\nabla_0 = 0$$, $$\Delta^K_0 = 0$$ and $$\nabla^K_0 = 0$$. This means that the tuple of the base computation can also be XORed into the combined trails: $$S_0 \oplus \nabla_j \xrightarrow[f]{K[0,0] \oplus \Delta^K_i \oplus \nabla^K_j}C_0 \oplus \Delta_i$$

Step four: Define $$S_j = S_0 \oplus \nabla_j$$, $$K[i,j] = K[0,0] \oplus \Delta^K_i \oplus \nabla^K_j$$ and $$C_i = C_0 \oplus \Delta_i $$. Substituting these into the combined differential trails above gives: $$S_j \xrightarrow[f]{K[i,j]}C_i$$ This is the same as the definition of a biclique given in the preliminary: $$\forall i,j : S_j \xrightarrow[f]{K[i,j]}C_i $$

It is thus possible to create a biclique of size $$2^{2d}$$ ($$2^{2d}$$ since each of the $$2^d$$ keys of the first set can be combined with each of the $$2^d$$ keys of the second set). This means a biclique of size $$2^{2d}$$ can be created using only $$2 \cdot 2^d$$ computations of the differentials $$\Delta_i$$ and $$\nabla_j$$ over $$f$$, compared with $$2^{2d}$$ key recoveries for the bruteforce construction. If the key differences satisfy $$\Delta^K_i \neq \nabla^K_j$$ for $$i+j>0$$, then all of the keys $$K[i,j]$$ in the biclique will also be different.
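The four steps above can be sketched on a deliberately tiny example. Everything in the block is an assumption made for illustration: the map `f` is a hypothetical one-round toy on two 4-bit nibbles (not part of AES), the 4-bit S-box values are borrowed from the PRESENT cipher only as a convenient permutation, and $$d = 4$$. The $$\Delta$$-trail activates only the first S-box, while the $$\nabla$$-trail cancels against the key before the second S-box, so the two trails share no active S-box.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
D = 4  # biclique dimension (assumption for this toy)


def f(state, key):
    """Hypothetical toy map: key addition followed by one S-box per nibble."""
    return (SBOX[state[0] ^ key[0]], SBOX[state[1] ^ key[1]])


def xor(u, v):
    return (u[0] ^ v[0], u[1] ^ v[1])


def build_biclique(s0, k00):
    """Build S_j, C_i and K[i][j] from 2^d Delta- and 2^d Nabla-differentials."""
    c0 = f(s0, k00)                                   # base computation
    # Delta-differentials: key difference in nibble 0, input difference 0.
    delta_k = [(i, 0) for i in range(1 << D)]
    delta = [xor(f(s0, xor(k00, dk)), c0) for dk in delta_k]
    # Nabla-differentials: equal state and key difference in nibble 1,
    # which cancel before the S-box, so the output difference is 0.
    nabla_k = [(0, j) for j in range(1 << D)]
    nabla = [(0, j) for j in range(1 << D)]
    states = [xor(s0, n) for n in nabla]              # S_j = S_0 xor nabla_j
    ciphertexts = [xor(c0, d) for d in delta]         # C_i = C_0 xor Delta_i
    keys = [[xor(xor(k00, delta_k[i]), nabla_k[j])    # K[i,j]
             for j in range(1 << D)] for i in range(1 << D)]
    return states, ciphertexts, keys


states, ciphertexts, keys = build_biclique(s0=(0x3, 0x9), k00=(0x6, 0xB))
# Every one of the 2^(2d) keys maps S_j to C_i.
biclique_ok = all(f(states[j], keys[i][j]) == ciphertexts[i]
                  for i in range(1 << D) for j in range(1 << D))
```

In a real cipher the trails span several rounds, and checking that they share no active non-linear component is the hard part; the toy only shows how the pieces combine once that independence holds.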

This is how the biclique is constructed in the leading biclique attack on AES. There are some practical limitations in constructing bicliques with this technique. The longer the biclique is, the more rounds the differential trails have to cover. The diffusion properties of the cipher thus play a crucial role in the effectiveness of constructing the biclique.

Other ways of constructing the biclique
Bogdanov, Khovratovich and Rechberger also describe another way to construct the biclique, called 'Interleaving Related-Key Differential Trails', in the article "Biclique Cryptanalysis of the Full AES".

Biclique Cryptanalysis procedure
Step one: The attacker groups all possible keys into key subsets of size $$2^{2d}$$ for some $$d$$, where each key in a group is indexed as $$K[i,j]$$ in a matrix of size $$2^d \times 2^d$$. The attacker splits the cipher into two sub-ciphers, $$f$$ and $$g$$ (such that $$E = f \circ g$$), as in a normal MITM attack. The set of keys for each of the sub-ciphers is of cardinality $$2^d$$; the two sets are called $$K[i,0]$$ and $$K[0,j]$$. The combined key of the sub-ciphers is expressed with the aforementioned matrix $$K[i, j]$$.

Step two: The attacker builds a biclique for each group of $$2^{2d}$$ keys. The biclique is of dimension $$d$$, since it maps $$2^d$$ internal states, $$S_j$$, to $$2^d$$ ciphertexts, $$C_i$$, using $$2^{2d}$$ keys. The section "Independent related-key differentials" above describes one way to build the biclique. In that case the biclique is built using the differentials of the key sets $$K[i,0]$$ and $$K[0,j]$$ belonging to the sub-ciphers.

Step three: The attacker takes the $$2^d$$ possible ciphertexts, $$C_i$$, and asks a decryption-oracle to provide the matching plaintexts, $$P_i$$.

Step four: The attacker chooses an internal state, $$S_j$$ and the corresponding plaintext, $$P_i$$, and performs the usual MITM attack over $$f$$ and $$g$$ by attacking from the internal state and the plaintext.

Step five: Whenever a key candidate is found that matches $$S_j$$ with $$P_i$$, that key is tested on another plaintext/ciphertext pair. If the key validates on the other pair, it is highly likely that it is the correct key.
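The five steps can be traced end to end on a toy cipher. The sketch below is an illustration under loud assumptions: the cipher, its 16-bit key of four nibbles, the sub-ciphers `g` and `f`, and all function names are hypothetical and chosen only so that a dimension-4 biclique over `f` is easy to write down. The matching in step four is done by evaluating `g` directly, so the sketch shows the data flow of the procedure, not the computational saving that a real attack obtains from partial recomputation.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV = [SBOX.index(x) for x in range(16)]


def g(p, k):
    """Hypothetical first sub-cipher: plaintext -> internal state."""
    a, b = SBOX[p[0] ^ k[2]], SBOX[p[1] ^ k[3]]
    a = a ^ b                                   # linear mixing
    return (SBOX[a ^ k[0] ^ k[3]], SBOX[b ^ k[1] ^ k[2]])


def g_inv(s, k):
    a, b = INV[s[0]] ^ k[0] ^ k[3], INV[s[1]] ^ k[1] ^ k[2]
    a = a ^ b                                   # undo the mixing
    return (INV[a] ^ k[2], INV[b] ^ k[3])


def f(s, k):
    """Hypothetical second sub-cipher, covered by the biclique."""
    return (SBOX[s[0] ^ k[0]], SBOX[s[1] ^ k[1]])


def f_inv(c, k):
    return (INV[c[0]] ^ k[0], INV[c[1]] ^ k[1])


def encrypt(p, k):
    return f(g(p, k), k)


def decrypt(c, k):
    return g_inv(f_inv(c, k), k)


def biclique_attack(decryption_oracle, known_pairs):
    # Step two: with S_0 = (0, 0) and base keys K[0,0] = (0, 0, k2, k3), the
    # biclique over f is the same for every group, because f ignores k2, k3.
    s = [(0, j) for j in range(16)]               # S_j
    c = [(SBOX[i], SBOX[0]) for i in range(16)]   # C_i = f(S_j, K[i,j])
    # Step three: ask the oracle for the plaintexts of the 2^d ciphertexts.
    p = [decryption_oracle(ci) for ci in c]
    candidates = []
    for k2 in range(16):                          # step one: one group per
        for k3 in range(16):                      # base key (0, 0, k2, k3)
            for i in range(16):
                for j in range(16):
                    key = (i, j, k2, k3)          # K[i,j]
                    # Step four: does K[i,j] connect P_i to S_j?
                    if g(p[i], key) != s[j]:
                        continue
                    # Step five: validate on further known pairs.
                    if all(encrypt(pt, key) == ct for pt, ct in known_pairs):
                        candidates.append(key)
    return candidates


secret = (0x3, 0xA, 0x7, 0xE)
known = [(pt, encrypt(pt, secret)) for pt in [(1, 2), (3, 4), (5, 6), (7, 8)]]
found = biclique_attack(lambda ct: decrypt(ct, secret), known)
```

Because the ciphertexts $$C_i$$ are shared by all groups in this toy, only 16 oracle queries are needed for the whole key space, which mirrors why the data complexity of a biclique attack can be far below the number of keys tested.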

Example attack
The following example is based on the biclique attack on AES from the paper "Biclique Cryptanalysis of the Full AES". The descriptions in the example use the same terminology as the authors of the attack (i.e. for variable names, etc.). For simplicity, only the attack on the AES128 variant is covered below. The attack consists of a 7-round MITM attack with the biclique covering the last 3 rounds.

Key partitioning
The key space is partitioned into $$2^{112}$$ groups of keys, where each group consists of $$2^{16}$$ keys. For each of the $$2^{112}$$ groups, a unique base key $$K[0,0]$$ for the base computation is selected. The base key has two specific bytes set to zero, shown in the table below (which represents the key the same way AES does, in a 4x4 matrix for AES128):

$$\begin{bmatrix} - & - & - & 0 \\ 0 & - & - & - \\ - & - & - & - \\ - & - & - & - \end{bmatrix}$$

The remaining 14 bytes (112 bits) of the key are then enumerated. This yields $$2^{112}$$ unique base keys, one for each group of keys. The ordinary $$2^{16}$$ keys in each group are then chosen with respect to their base key. They are chosen such that they are nearly identical to the base key: they only vary in 2 bytes (either the $$i$$'s or the $$j$$'s) of the 4 bytes shown below:

$$\begin{bmatrix} - & - & i & i \\ j & - & j & - \\ - & - & - & - \\ - & - & - & - \end{bmatrix}$$

This gives $$2^8$$ keys $$K[i,0]$$ and $$2^8$$ keys $$K[0,j]$$, which combined give $$2^{16}$$ different keys, $$K[i,j]$$. These $$2^{16}$$ keys constitute the group for the respective base key.
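The partition can be checked with a short sketch. It indexes bytes by the (row, column) positions of the two tables above and assumes, as a reading of those tables rather than something stated in the text, that the same byte value $$i$$ is XORed into both $$i$$-marked positions and the same value $$j$$ into both $$j$$-marked positions. The helper names `make_key` and `split_key` are hypothetical. Under that assumption every 128-bit key belongs to exactly one group, since $$i$$ and $$j$$ can be read off the two bytes that are zero in the base key.

```python
import os

I_POS = [(0, 2), (0, 3)]   # bytes marked i in the table
J_POS = [(1, 0), (1, 2)]   # bytes marked j in the table


def make_key(base, i, j):
    """K[i,j]: XOR i and j into the marked bytes of a 4x4 base key."""
    key = [row[:] for row in base]
    for r, c in I_POS:
        key[r][c] ^= i
    for r, c in J_POS:
        key[r][c] ^= j
    return key


def split_key(key):
    """Map any key to its group: returns (base key, i, j)."""
    i = key[0][3]          # base key is 0 at (0, 3), so this byte equals i
    j = key[1][0]          # base key is 0 at (1, 0), so this byte equals j
    base = make_key(key, i, j)   # XOR is its own inverse
    return base, i, j


raw = os.urandom(16)
key = [list(raw[4 * r:4 * r + 4]) for r in range(4)]
base, i, j = split_key(key)
```

With 14 free bytes in the base key and one byte each for $$i$$ and $$j$$, this gives $$2^{112} \cdot 2^{16} = 2^{128}$$ keys in total.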

Biclique construction
$$2^{112}$$ bicliques are constructed using the "Independent related-key differentials" technique described in the section of that name above. The requirement for using that technique is that the forward and backward differential trails that need to be combined do not share any active non-linear elements. How is it known that this is the case? Due to the way the keys are chosen in relation to the base key during key partitioning, the differential trails $$\Delta_i$$ using the keys $$K[i,0]$$ never share any active S-boxes (the only non-linear component in AES) with the differential trails $$\nabla_j$$ using the keys $$K[0,j]$$. It is therefore possible to XOR the differential trails and create the biclique.

MITM attack
Once the bicliques are created, the MITM attack can almost begin. First, the $$2^d$$ intermediate values computed forward from the plaintexts, $$P_i \xrightarrow{K[i,0]} \overrightarrow{v_i}$$, and the $$2^d$$ intermediate values computed backward from the internal states, $$\overleftarrow{v_j} \xleftarrow{K[0,j]} S_j$$, are precomputed and stored together with the corresponding intermediate states and sub-keys $$K[i,0]$$ or $$K[0,j]$$.

Now the MITM attack can be carried out. In order to test a key $$K[i,j]$$, it is only necessary to recalculate the parts of the cipher that are known to differ between $$P_i \xrightarrow{K[i,0]} \overrightarrow{v_i}$$ and $$P_i \xrightarrow{K[i,j]} \overrightarrow{v_i}$$. For the backward computation from $$S_j$$ to $$\overleftarrow{v_j}$$, 4 S-boxes need to be recomputed. For the forward computation from $$P_i$$ to $$\overrightarrow{v_i}$$, it is just 3 (an in-depth explanation of the amount of recalculation needed can be found in the paper "Biclique Cryptanalysis of the Full AES", from which this example is taken).

When the intermediate values match, a key-candidate $$K[i,j]$$ between $$P_i$$ and $$S_j$$ is found. The key-candidate is then tested on another plain-/ciphertext pair.

Results
This attack lowers the computational complexity of key recovery for AES128 to $$2^{126.18}$$, which is 3 to 5 times faster than a bruteforce approach. The data complexity of the attack is $$2^{88}$$ and the memory complexity is $$2^8$$.
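The quoted figures can be sanity-checked with a few lines of arithmetic. The sketch assumes that the bruteforce baseline is a full search of $$2^{128}$$ keys; the variable names are illustrative only.

```python
# Exponents taken from the text above.
GROUPS_LOG2 = 112           # number of key groups
KEYS_PER_GROUP_LOG2 = 16    # keys per group (2^8 values of i times 2^8 of j)
ATTACK_LOG2 = 126.18        # computational complexity of the attack
BRUTEFORCE_LOG2 = 128       # assumed baseline: exhaustive search for AES128

# The groups cover the whole key space exactly once.
total_keys_log2 = GROUPS_LOG2 + KEYS_PER_GROUP_LOG2

# Speed-up factor over exhaustive search: 2^(128 - 126.18), roughly 3.5.
speedup = 2 ** (BRUTEFORCE_LOG2 - ATTACK_LOG2)
print(f"key space: 2^{total_keys_log2}, speed-up: {speedup:.2f}x")
```

The resulting factor falls inside the "3 to 5 times" range stated above.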