Elliptic curve only hash

The elliptic curve only hash (ECOH) algorithm was submitted as a candidate for SHA-3 in the NIST hash function competition. However, it was rejected in the beginning of the competition since a second pre-image attack was found.

The ECOH is based on the MuHASH hash algorithm, that has not yet been successfully attacked. However, MuHASH is too inefficient for practical use and changes had to be made. The main difference is that where MuHASH applies a random oracle, ECOH applies a padding function. Assuming random oracles, finding a collision in MuHASH implies solving the discrete logarithm problem. MuHASH is thus a provably secure hash, i.e. we know that finding a collision is at least as hard as some hard known mathematical problem.

ECOH does not use random oracles and its security is not strictly directly related to the discrete logarithm problem, yet it is still based on mathematical functions. ECOH is related to the Semaev's problem of finding low degree solutions to the summation polynomial equations over binary field, called the Summation Polynomial Problem. An efficient algorithm to solve this problem has not been given so far. Although the problem was not proven to be NP-hard, it is assumed that such an algorithm does not exist. Under certain assumptions, finding a collision in ECOH may be also viewed as an instance of the subset sum problem. Besides solving the Summation Polynomial Problem, there exists another way how to find second pre-images and thus collisions, Wagner's generalized birthday attack.

ECOH is a good example of hash function that is based on mathematical functions (with the provable security approach) rather than on classical ad hoc mixing of bits to obtain the hash.

The algorithm
Given $$n$$, ECOH divides the message $$M$$ into $$n$$ blocks $$M_0,\ldots,M_{n-1}$$. If the last block is incomplete, it is padded with single 1 and then appropriate number of 0. Let furthermore $$P$$ be a function that maps a message block and an integer to an elliptic curve point. Then using the mapping $$P$$, each block is transformed to an elliptic curve point $$P_i$$, and these points are added together with two more points. One additional point $$X_1$$ contains the padding and depends only on the message length. The second additional point $$X_2$$ depends on the message length and the XOR of all message blocks. The result is truncated to get the hash $$H$$.

$$\begin{align} P_i &{}:= P(M_i,i)\\ X_1 &{}:= P'(n) \\ X_2 &{}:= P^*(M_i, n)\\ Q &{}:= \sum_{i=0}^{n-1} P_i + X_1 + X_2\\ R &{}:= f(Q) \end{align}$$

The two extra points are computed by $$P'$$ and $$P^*$$. $$Q$$ adds all the elliptic curve points and the two extra points together. Finally, the result is passed through an output transformation function f to get the hash result $$R$$. To read more about this algorithm, see "ECOH: the Elliptic Curve Only Hash".

Examples
Four ECOH algorithms were proposed, ECOH-224, ECOH-256, ECOH-384 and ECOH-512. The number represents the size of the message digest. They differ in the length of parameters, block size and in the used elliptic curve. The first two uses the elliptic curve B-283: $$ X^{283} + X^{12} + X^7 + X^5 + 1 $$, with parameters (128, 64, 64). ECOH-384 uses the curve B-409: $$ X^{409} + X^{87} + 1 $$, with parameters (192, 64, 64). ECOH-512 uses the curve B-571: $$ X^{571} + X^{10} + X^5 + X^2 + 1 $$, with parameters (256, 128, 128). It can hash messages of bit length up to $$ 2^{128} $$.

Properties

 * Incrementality: ECOH of a message can be updated quickly, given a small change in the message and an intermediate value in ECOH computation.
 * Parallelizability: This means the computation of the $$ P_i's $$ can be done on parallel systems.
 * Speed: The ECOH algorithm is about thousand times slower than SHA-1. However, given the developments in desktop hardware towards parallelization and carryless multiplication, ECOH may in a few years be as fast as SHA-1 for long messages. For short messages, ECOH is relatively slow, unless extensive tables are used.

Security of ECOH
The ECOH hash functions are based on concrete mathematical functions. They were designed such that the problem of finding collisions should be reducible to a known and hard mathematical problem (the subset sum problem). It means that if one could find collisions, one would be able to solve the underlying mathematical problem which is assumed to be hard and unsolvable in polynomial time. Functions with these properties are known provably secure and are quite unique among the rest of hash functions. Nevertheless, second pre-image (and thus a collision) was later found because the assumptions given in the proof were too strong.

Semaev Summation Polynomial
One way of finding collisions or second pre-images is solving Semaev Summation Polynomials. For a given elliptic curve E, there exists polynomials $$ f_n $$ that are symmetric in $$n$$ variables and that vanish exactly when evaluated at abscissae of points whose sum is 0 in $$E$$. So far, an efficient algorithm to solve this problem does not exist and it is assumed to be hard (but not proven to be NP-hard).

More formally: Let $$\mathbf{F}$$ be a finite field, $$E$$ be an elliptic curve with Weierstrass equation having coefficients in $$\mathbf{F}$$ and $$O$$ be the point of infinity. It is known that there exists a multivariable polynomial $$f_n(X_1,\ldots,X_N)$$ if and only if there exist < $$y_1,\ldots,y_n$$ such that $$(x_1,y_1)+\ldots+(x_n,y_n) = O$$. This polynomial has degree $$2^{n-2}$$ in each variable. The problem is to find this polynomial.

Provable security discussion
The problem of finding collisions in ECOH is similar to the subset sum problem. Solving a subset sum problem is almost as hard as the discrete logarithm problem. It is generally assumed that this is not doable in polynomial time. However a significantly loose heuristic must be assumed, more specifically, one of the involved parameters in the computation is not necessarily random but has a particular structure. If one adopts this loose heuristic, then finding an internal ECOH collision may be viewed as an instance of the subset sum problem.

A second pre-image attack exists in the form of generalized birthday attack.

Second pre-image attack
Description of the attack: This is a Wagner’s Generalized Birthday Attack. It requires 2143 time for ECOH-224 and ECOH-256, 2206 time for ECOH-384, and 2287 time for ECOH-512. The attack sets the checksum block to a fixed value and uses a collision search on the elliptic curve points. For this attack we have a message M and try to find a M' that hashes to the same message. We first split the message length into six blocks. So $$ M'= (M_1,M_2,M_3,M_4,M_5,M_6)$$. Let K be a natural number. We choose K different numbers for $$(M_0,M_1)$$ and define $$M_2$$ by $$ M_2 := M_0 + M_1 $$. We compute the K corresponding elliptic curve points $$P(M_0,0) + P(M_1,1) + P(M_2,2)$$ and store them in a list. We then choose K different random values for $$(M_3,M_4)$$, define $$ M_5 := M_3 + M_4 $$, we compute $$ Q - X_1 - X_2 - P(M_3,3) - P(M_4,4) - P(M_5, 5)$$, and store them in a second list. Note that the target Q is known. $$X_1$$ only depends on the length of the message which we have fixed. $$X_2$$ depends on the length and the XOR of all message blocks, but we choose the message blocks such that this is always zero. Thus, $$X_2$$ is fixed for all our tries.

If K is larger than the square root of the number of points on the elliptic curve then we expect one collision between the two lists. This gives us a message $$(M_1,M_2,M_3,M_4,M_5,M_6)$$ with: $$ Q = \sum_{i=0}^5 P(M_i,i) + X_1 + X_2 $$ This means that this message leads to the target value Q and thus to a second preimage, which was the question. The workload we have to do here is two times K partial hash computations. For more info, see "A Second Pre-image Attack Against Elliptic Curve Only Hash (ECOH)".

Actual parameters:
 * ECOH-224 and ECOH-256 use the elliptic curve B-283 with approximately $$2^{283}$$ points on the curve. We choose $$K = 2^{142}$$ and get an attack with complexity $$2^{143}$$.
 * ECOH-384 uses the elliptic curve B-409 with approximately $$2^{409}$$ points on the curve. Choosing $$K=2^{205}$$ gives an attack with complexity $$2^{206}.$$
 * ECOH-512 uses the elliptic curve B-571 with approximately $$2^{571}$$ points on the curve. Choosing $$K=2^{286}$$ gives an attack with complexity $$2^{287}.$$

ECOH2
The official comments on ECOH included a proposal called ECOH2 that doubles the elliptic curve size in an effort to stop the Halcrow-Ferguson second preimage attack with a prediction of improved or similar performance.