Neural cryptography

Neural cryptography is a branch of cryptography dedicated to analyzing the application of stochastic algorithms, especially artificial neural network algorithms, for use in encryption and cryptanalysis.

Definition
Artificial neural networks are well known for their ability to selectively explore the solution space of a given problem. This feature finds a natural niche of application in the field of cryptanalysis. At the same time, neural networks offer a new approach to attack ciphering algorithms based on the principle that any function could be reproduced by a neural network, which is a powerful proven computational tool that can be used to find the inverse-function of any cryptographic algorithm.

The ideas of mutual learning, self learning, and stochastic behavior of neural networks and similar algorithms can be used for different aspects of cryptography, like public-key cryptography, solving the key distribution problem using neural network mutual synchronization, hashing or generation of pseudo-random numbers.

Another idea is the ability of a neural network to separate space in non-linear pieces using "bias". It gives different probabilities of activating the neural network or not. This is very useful in the case of Cryptanalysis.

Two names are used to design the same domain of research: Neuro-Cryptography and Neural Cryptography.

The first work that it is known on this topic can be traced back to 1995 in an IT Master Thesis.

Applications
In 1995, Sebastien Dourlens applied neural networks to cryptanalyze DES by allowing the networks to learn how to invert the S-tables of the DES. The bias in DES studied through Differential Cryptanalysis by Adi Shamir is highlighted. The experiment shows about 50% of the key bits can be found, allowing the complete key to be found in a short time. Hardware application with multi micro-controllers have been proposed due to the easy implementation of multilayer neural networks in hardware.

One example of a public-key protocol is given by Khalil Shihab. He describes the decryption scheme and the public key creation that are based on a backpropagation neural network. The encryption scheme and the private key creation process are based on Boolean algebra. This technique has the advantage of small time and memory complexities. A disadvantage is the property of backpropagation algorithms: because of huge training sets, the learning phase of a neural network is very long. Therefore, the use of this protocol is only theoretical so far.

Neural key exchange protocol
The most used protocol for key exchange between two parties A and B in the practice is Diffie–Hellman key exchange protocol. Neural key exchange, which is based on the synchronization of two tree parity machines, should be a secure replacement for this method. Synchronizing these two machines is similar to synchronizing two chaotic oscillators in chaos communications.

Tree parity machine
The tree parity machine is a special type of multi-layer feedforward neural network.

It consists of one output neuron, K hidden neurons and K×N input neurons. Inputs to the network take three values:
 * $$x_{ij} \in \left\{ -1,0,+1 \right\}$$

The weights between input and hidden neurons take the values:
 * $$w_{ij} \in \left\{-L,...,0,...,+L \right\}$$

Output value of each hidden neuron is calculated as a sum of all multiplications of input neurons and these weights:
 * $$\sigma_i=\sgn(\sum_{j=1}^{N}w_{ij}x_{ij})$$

Signum is a simple function, which returns −1,0 or 1:
 * $$\sgn (x) = \begin{cases}

-1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x > 0. \end{cases}$$

If the scalar product is 0, the output of the hidden neuron is mapped to −1 in order to ensure a binary output value. The output of neural network is then computed as the multiplication of all values produced by hidden elements:
 * $$\tau=\prod_{i=1}^{K}\sigma_i$$

Output of the tree parity machine is binary.

Protocol
Each party (A and B) uses its own tree parity machine. Synchronization of the tree parity machines is achieved in these steps
 * 1) Initialize random weight values
 * 2) Execute these steps until the full synchronization is achieved
 * 3) Generate random input vector X
 * 4) Compute the values of the hidden neurons
 * 5) Compute the value of the output neuron
 * 6) Compare the values of both tree parity machines
 * 7) Outputs are the same: one of the suitable learning rules is applied to the weights
 * 8) Outputs are different: go to 2.1

After the full synchronization is achieved (the weights wij of both tree parity machines are same), A and B can use their weights as keys. This method is known as a bidirectional learning. One of the following learning rules can be used for the synchronization:
 * Hebbian learning rule:
 * $$w_i^+=g(w_i+\sigma_ix_i\Theta(\sigma_i\tau)\Theta(\tau^A\tau^B))$$


 * Anti-Hebbian learning rule:
 * $$w_i^+=g(w_i-\sigma_ix_i\Theta(\sigma_i\tau)\Theta(\tau^A\tau^B))$$


 * Random walk:
 * $$w_i^+=g(w_i+x_i\Theta(\sigma_i\tau)\Theta(\tau^A\tau^B))$$

Where:
 * $$\Theta(a,b)=0$$ if $$a \ne b$$ otherwise $$\Theta(a,b)=1$$

And:
 * $$g(x)$$ is a function that keeps the $$w_i$$ in the range $$\{-L, -L+1,...,0,...,L-1,L\}$$

Attacks and security of this protocol
In every attack it is considered, that the attacker E can eavesdrop messages between the parties A and B, but does not have an opportunity to change them.

Brute force
To provide a brute force attack, an attacker has to test all possible keys (all possible values of weights wij). By K hidden neurons, K×N input neurons and boundary of weights L, this gives (2L+1)KN possibilities. For example, the configuration K = 3, L = 3 and N = 100 gives us 3*10253 key possibilities, making the attack impossible with today's computer power.

Learning with own tree parity machine
One of the basic attacks can be provided by an attacker, who owns the same tree parity machine as the parties A and B. He wants to synchronize his tree parity machine with these two parties. In each step there are three situations possible: It has been proven, that the synchronization of two parties is faster than learning of an attacker. It can be improved by increasing of the synaptic depth L of the neural network. That gives this protocol enough security and an attacker can find out the key only with small probability.
 * 1) Output(A) ≠ Output(B): None of the parties updates its weights.
 * 2) Output(A) = Output(B) = Output(E): All the three parties update weights in their tree parity machines.
 * 3) Output(A) = Output(B) ≠ Output(E): Parties A and B update their tree parity machines, but the attacker can not do that. Because of this situation his learning is slower than the synchronization of parties A and B.

Other attacks
For conventional cryptographic systems, we can improve the security of the protocol by increasing of the key length. In the case of neural cryptography, we improve it by increasing of the synaptic depth L of the neural networks. Changing this parameter increases the cost of a successful attack exponentially, while the effort for the users grows polynomially. Therefore, breaking the security of neural key exchange belongs to the complexity class NP.

Alexander Klimov, Anton Mityaguine, and Adi Shamir say that the original neural synchronization scheme can be broken by at least three different attacks—geometric, probabilistic analysis, and using genetic algorithms. Even though this particular implementation is insecure, the ideas behind chaotic synchronization could potentially lead to a secure implementation.

Permutation parity machine
The permutation parity machine is a binary variant of the tree parity machine.

It consists of one input layer, one hidden layer and one output layer. The number of neurons in the output layer depends on the number of hidden units K. Each hidden neuron has N binary input neurons:
 * $$x_{ij} \in \left\{ 0,1 \right\}$$

The weights between input and hidden neurons are also binary:
 * $$w_{ij} \in \left\{0,1 \right\}$$

Output value of each hidden neuron is calculated as a sum of all exclusive disjunctions (exclusive or) of input neurons and these weights:


 * $$\sigma_i=\theta_N(\sum_{j=1}^{N}w_{ij}\oplus x_{ij})$$

(⊕ means XOR).

The function $$\theta_N(x)$$ is a threshold function, which returns 0 or 1:
 * $$\theta_N(x) = \begin{cases}

0 & \text{if } x \leq N/2, \\ 1 & \text{if } x > N/2. \end{cases}$$

The output of neural network with two or more hidden neurons can be computed as the exclusive or of the values produced by hidden elements:
 * $$\tau=\bigoplus_{i=1}^{K}\sigma_i$$

Other configurations of the output layer for K>2 are also possible.

This machine has proven to be robust enough against some attacks so it could be used as a cryptographic mean, but it has been shown to be vulnerable to a probabilistic attack.

Security against quantum computers
A quantum computer is a device that uses quantum mechanisms for computation. In this device the data are stored as qubits (quantum binary digits). That gives a quantum computer in comparison with a conventional computer the opportunity to solve complicated problems in a short time, e.g. discrete logarithm problem or factorization. Algorithms that are not based on any of these number theory problems are being searched because of this property.

Neural key exchange protocol is not based on any number theory. It is based on the difference between unidirectional and bidirectional synchronization of neural networks. Therefore, something like the neural key exchange protocol could give rise to potentially faster key exchange schemes.