Draft:Fiat-Naor Algorithm

The Fiat-Naor algorithm, proposed by Amos Fiat and Moni Naor, solves black-box function inversion with preprocessing. In this problem, a computationally unbounded preprocessing algorithm is given a function $$f: [N] \to [N]$$ and outputs an advice string of $$S$$ bits. Given the advice string and black-box access to $$f$$, an online algorithm is asked to invert $$f$$ at a point $$y$$ in time $$T$$. The goal is to obtain the best tradeoff between $$S$$ and $$T$$. The Fiat-Naor algorithm works for every $$S$$ and $$T$$ satisfying $$S^3 T = \tilde{\Omega}(N^3)$$ (where the $$\tilde{\Omega}$$-notation hides a poly-logarithmic factor). This remains essentially the best known tradeoff for this problem.

Problem Description
The task of black-box function inversion with preprocessing is as follows. In the preprocessing stage, a computationally unbounded preprocessing algorithm $$\mathcal{P}$$ is given a function $$f: [N] \to [N]$$, where $$[N]$$ denotes the set $$\{1, 2, \dotsc, N\}$$, and outputs an advice string $$\sigma$$ of $$S$$ bits. In the online stage, an online algorithm $$\mathcal{A}$$ is given black-box access to $$f$$, the advice string $$\sigma$$, and a point $$y$$ in the image of $$f$$, and is asked to find an $$x$$ such that $$f(x) = y$$ in time $$T$$. Black-box access to $$f$$ means that $$\mathcal{A}$$ may query any $$z \in [N]$$ and obtain $$w = f(z)$$.

Naive Solutions
There are two naive solutions.

 * $$\mathcal{P}$$ does nothing. $$\mathcal{A}$$ makes distinct queries to $$f$$ until it finds an $$x$$ such that $$f(x) = y$$. This gives $$S = \Theta (1)$$ and $$T = \Theta (N)$$.
 * $$\mathcal{P}$$ outputs a look-up table that stores, for every $$y$$ in the image of $$f$$, one preimage of $$y$$. $$\mathcal{A}$$ simply searches the table. This gives $$S = \tilde{\Theta} (N)$$ and $$T = \tilde{\Theta} (1)$$.

Interpolating between the two solutions (storing preimages for some of the images and searching exhaustively for the rest) yields an algorithm for every $$S$$ and $$T$$ satisfying $$S + T = \tilde{\Theta} (N)$$.
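The two extremes can be sketched in Python (a toy model where $$f$$ is given as a list; all names are illustrative):

```python
# Toy model of the two naive solutions. The function f: [N] -> [N]
# is represented as a Python list.

def preprocess_table(f):
    """S = Theta~(N): store one preimage for every image of f."""
    table = {}
    for x, y in enumerate(f):
        table.setdefault(y, x)      # keep any one preimage
    return table

def invert_with_table(table, y):
    """T = Theta~(1): a single look-up."""
    return table.get(y)

def invert_brute_force(f, y):
    """S = Theta(1), T = Theta(N): query f until a preimage appears."""
    for x in range(len(f)):
        if f[x] == y:
            return x
    return None
```

The interpolation stores a preimage for a subset of the images and falls back to brute force for the rest.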

Inverting Random Functions
Martin Hellman was the first to study inverting functions with preprocessing. His algorithm, later made rigorous by Fiat and Naor, inverts a random function $$f$$ with high probability over the choice of $$f$$, for every $$S$$ and $$T$$ satisfying $$S^2 T = \tilde{\Omega}(N^2)$$. However, it does not work for arbitrary functions.

Lower Bound
Andrew Yao proved a lower bound of $$ST = \tilde{\Omega}(N)$$ for function inversion.

Algorithm Description
The Fiat-Naor algorithm inverts an arbitrary function $$f$$ at any image with probability $$1 - O(1/N)$$ over the randomness of the preprocessing algorithm $$\mathcal{P}$$. The algorithm works as follows.


 * Preprocessing algorithm $$\mathcal{P}(f)$$: Uniformly choose $$z_1, \dotsc, z_l$$ from $$[N]$$ and store $$(z_1, f(z_1)), \dotsc, (z_l, f(z_l))$$ in a look-up table $$L$$. Define $$D = \{x \in [N]: f(x) \notin \{f(z_1), \dotsc, f(z_l)\}\}$$. Let $$t = \lceil |D| \log N / l \rceil$$ and $$J = \lceil 2N\log N / |D| \rceil$$. Choose $$g_1, \dotsc, g_r: [N] \times [J] \to [N]$$ from a hash function family $$G$$. Run subroutine $$\mathcal{P}'$$ on $$(D, J, t, g_i)$$ for $$i = 1, \dotsc, r$$ to obtain tables $$C_1, \dotsc, C_r$$. The advice $$\sigma$$ contains $$L, |D|, g_1, C_1, \dotsc, g_r, C_r$$.
 * Subroutine $$\mathcal{P}'(D, J, t, g)$$: Define the function $$g^*: [N] \to D$$ as $$g^*(x) = g(x, j)$$, where $$j$$ is the smallest number in $$[J]$$ such that $$g(x, j) \in D$$. Let $$h = g^* \circ f$$. Uniformly choose $$w_1, \dotsc, w_m$$ from $$D$$. Store $$(w_1, h^t(w_1)), \dotsc, (w_m, h^t(w_m))$$ in a look-up table $$C$$. Here $$h^t(w)$$ denotes the result of iteratively evaluating $$h$$ $$t$$ times starting from $$w$$.


 * Online algorithm $$\mathcal{A}^f(\sigma, y)$$: Search the look-up table $$L$$ for an entry of the form $$(z, y)$$. If such an entry exists, output $$z$$. Otherwise, calculate $$t$$ and $$J$$ from $$|D|$$ as in $$\mathcal{P}$$, and run subroutine $$\mathcal{A}'$$ on $$(L, J, t, g_i, C_i, y)$$ for $$i = 1, \dotsc, r$$. If a subroutine finds a solution, output it.
 * Subroutine $$\mathcal{A}'^f(L, J, t, g, C, y)$$: Define $$g^*$$ and $$h$$ as in $$\mathcal{P}'$$. Note that evaluating $$g^*$$ requires checking whether $$g(x, j) \in D$$; $$\mathcal{A}'$$ can do so by checking whether $$f(g(x, j))$$ is stored as an image in $$L$$. Since any preimage $$x$$ of $$y$$ satisfies $$h(x) = g^*(f(x)) = g^*(y)$$, start from $$g^*(y)$$ and iteratively evaluate $$h$$ for up to $$t$$ steps. In each step, if the current value is some $$h^t(w_i)$$ stored in $$C$$, immediately jump to the corresponding $$w_i$$. Upon reaching $$g^*(y)$$ again, let $$v$$ be the immediate predecessor; if $$f(v) = y$$, output $$v$$.
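The two stages above can be sketched in Python under simplifying assumptions (a toy model: $$f$$ is a list, the hash family $$G$$ is replaced by Python's built-in tuple hashing, and false alarms are handled by direct verification; all function names are illustrative):

```python
import math
import random

def g_hash(seed, i, x, j, N):
    # Toy stand-in for the hash family G: a deterministic mix of the
    # inputs. A real implementation uses k-wise independent functions.
    return hash((seed, i, x, j)) % N

def make_gstar(f, L, N, J, seed, i):
    # g*_i(v): the first value g_i(v, j) that lands in D. Membership
    # in D is tested exactly as the online algorithm would: z is in D
    # iff f(z) is not an image stored in L.
    def gstar(v):
        for j in range(J):
            z = g_hash(seed, i, v, j, N)
            if f[z] not in L:
                return z
        return None  # all J trials missed D (happens rarely)
    return gstar

def preprocess(f, N, l, m, t, r, seed=0):
    rng = random.Random(seed)
    L = {}                        # image -> preimage, for heavy images
    for _ in range(l):
        z = rng.randrange(N)
        L[f[z]] = z
    D = [x for x in range(N) if f[x] not in L]
    J = max(1, math.ceil(2 * N * math.log(N) / max(1, len(D))))
    tables = []
    for i in range(r):            # r Hellman tables for h = g* o f
        gstar = make_gstar(f, L, N, J, seed, i)
        C = {}                    # chain endpoint -> chain start
        for _ in range(m):
            if not D:
                break
            w = rng.choice(D)
            v = w
            for _ in range(t):    # compute h^t(w), h(x) = g*(f(x))
                v = gstar(f[v])
                if v is None:
                    break
            if v is not None:
                C[v] = w
        tables.append(C)
    return L, len(D), tables      # the advice sigma

def invert(f, N, advice, y, t, seed=0):
    L, d_size, tables = advice
    if y in L:                    # heavy image: answer directly from L
        return L[y]
    J = max(1, math.ceil(2 * N * math.log(N) / max(1, d_size)))
    for i, C in enumerate(tables):
        gstar = make_gstar(f, L, N, J, seed, i)
        target = gstar(y)         # any preimage of y maps to g*(y)
        if target is None:
            continue
        cur = target
        for _ in range(2 * t + 1):
            if cur in C:          # stored endpoint: jump back to start
                cur = C[cur]
            nxt = gstar(f[cur])
            if nxt == target and f[cur] == y:
                return cur        # verified preimage of y
            if nxt is None:
                break
            cur = nxt
    return None
```

Any image stored in $$L$$ is inverted by a single look-up; other images succeed only when some chain in some table covers a preimage, which is what the $$r$$ repetitions amplify.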

Parameters
The algorithm takes a space parameter $$\tilde{S}$$ and a time parameter $$\tilde{T}$$, which differ from the actual space $$S$$ and time $$T$$ by poly-logarithmic factors in $$N$$. $$\tilde{S}$$ and $$\tilde{T}$$ determine the other parameters as follows, where the notation $$\approx$$ hides constant factors.


 * $$l \approx \tilde{S} \log N$$
 * $$m \approx N / \tilde{T}$$
 * $$r \approx \tilde{S} \tilde{T} \log N / N$$

$$G$$ is required to be a family of $$k$$-wise independent hash functions for some $$k \approx N / \tilde{S}$$, each with a $$\tilde{O}(1)$$-space description. Functions $$g_1, \dotsc, g_r$$ are chosen so that they are pairwise independent and evaluating them takes amortized $$\tilde{O}(1)$$ time. Fiat and Naor gave such a construction, in which the functions are not fully independent precisely so that the amortized $$\tilde{O}(1)$$ evaluation time is achievable.
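As a quick numerical sanity check of these settings, consider an illustrative point on the $$S^3 T = N^3$$ curve (exact powers of two, so the integer arithmetic below is exact):

```python
import math

N = 2 ** 20
S_hat = 2 ** 15                     # space parameter S~
T_hat = N ** 3 // S_hat ** 3        # on the S^3 T = N^3 curve: 2^15
logN = int(math.log2(N))            # 20

l = S_hat * logN                    # entries in the look-up table L
m = N // T_hat                      # chain starts per Hellman table
r = S_hat * T_hat * logN // N       # number of repetitions
t_max = N * logN // l               # t = |D| log N / l is at most this

# Space is dominated by L plus r tables of m pairs: both ~ S~ log N.
assert l == r * m == S_hat * logN
# Time is dominated by r chains of at most t steps: ~ T~ log N
# (evaluating h costs another factor of J, also poly-logarithmic).
assert r * t_max == T_hat * logN
```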

Subroutines: Hellman's Algorithm
The subroutines in the Fiat-Naor algorithm are built on Hellman's algorithm for inverting random functions.

The main idea of the preprocessing is to build $$m$$ chains of length $$t$$ of the form $$w, f(w), f^2(w), \dotsc, f^t(w)$$ and store the starting point $$w$$ and ending point $$f^t(w)$$ of each chain in a look-up table $$C$$ as the advice. On input $$y$$, the online algorithm iteratively evaluates $$f$$. In each step, if it reaches the endpoint $$f^t(w)$$ of some chain, it immediately jumps back to the starting point $$w$$. If $$y$$ is covered by some chain (not as a starting point), then the algorithm will reach $$y$$ again within $$t$$ steps and find a preimage, namely the immediate predecessor of $$y$$ in this chain.

Probability analysis shows that with $$mt^2 = O(N)$$, there are few enough collisions among the $$m$$ chains with randomly chosen starting points that the chains cover $$\Theta (mt)$$ distinct values. This allows inverting $$f$$ at a $$\Theta(mt/N)$$-fraction of the points.
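The chain structure can be sketched as follows (a toy single-table model in Python; $$f$$ is a list and the names are illustrative — the full algorithm applies this to a re-randomized function rather than to $$f$$ itself):

```python
import random

def build_chains(f, m, t, rng):
    # Build m chains w, f(w), ..., f^t(w) from distinct random starts;
    # only (endpoint -> start) is stored, the rest is recomputed online.
    C = {}
    for w in rng.sample(range(len(f)), m):
        v = w
        for _ in range(t):
            v = f[v]
        C[v] = w
    return C

def hellman_invert(f, C, t, y):
    # Walk forward from y; at a stored endpoint, jump back to that
    # chain's start. A candidate predecessor is verified directly,
    # which also guards against false alarms from colliding chains.
    cur = y
    for _ in range(2 * t + 1):
        if f[cur] == y:
            return cur
        cur = C[cur] if cur in C else f[cur]
    return None
```

With $$mt^2 = O(N)$$ (e.g. $$m = t = 8$$ for $$N = 512$$), collisions among chains are rare, and a table of only $$m$$ pairs makes roughly an $$mt/N$$ fraction of the points invertible.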

It remains to amplify the success probability. For this purpose, the above algorithm is not applied directly to $$f$$ but to $$h = g \circ f$$, re-randomized by a random function $$g$$. Repeating $$r \approx N \log N / (mt)$$ times with independently chosen $$g_1, \dotsc, g_r$$, every possible input $$y$$ is covered at least once with probability $$1 - O(1/N)$$. Hellman's algorithm thus uses space $$S = \tilde{O}(rm) = \tilde{O}(N/t)$$ and time $$T = \tilde{O}(rt) = \tilde{O}(N/m)$$ (where the $$\tilde{O}$$-notation hides poly-logarithmic factors in $$N$$), so $$S^2 T = \tilde{\Omega}(N^2)$$ is sufficient to ensure $$mt^2 = O(N)$$.

Since a random function $$g$$ is unlikely to have a succinct description, Hellman suggested heuristically using some explicit function family $$G$$. Fiat and Naor made this argument rigorous with their construction of $$G$$. They also showed that, in general, for every function $$f$$ in which every image has at most $$N/q$$ preimages, Hellman's algorithm works if $$mt^2 \approx q$$, and thus for every $$S$$ and $$T$$ satisfying $$S^2 T = \tilde{\Omega}(N^3/q)$$.

Handling Images with Many Preimages
Hellman's algorithm does not achieve a good tradeoff when some images have many preimages, which makes $$q$$ small. This case has to be handled to invert arbitrary functions.

The look-up table $$L$$ serves exactly this purpose. It contains random elements and their images, and these images can be inverted by searching $$L$$. It remains to handle images not included in $$L$$, that is, to invert $$f$$ over the remaining domain $$D = \{x \in [N]: f(x) \notin \{f(z_1), \dotsc, f(z_l)\}\}$$. Since $$l \approx \tilde{S} \log N$$, with probability $$1 - O(1/N)$$, every image of $$f$$ with more than $$N/\tilde{S}$$ preimages is contained in $$L$$. Therefore, the remaining images have relatively few preimages and can be inverted using Hellman's algorithm.

To apply Hellman's algorithm over the partial domain $$D$$, we use a function $$g^*: [N] \to D$$ to re-randomize $$f$$. The function $$h = g^* \circ f$$ then maps $$D$$ into itself. Note that hashing into an arbitrary subset $$D$$ is difficult. For this reason, $$g^*$$ is defined as $$g^*(x) = g(x, j)$$, where $$j$$ is the smallest number in $$[J]$$ such that $$g(x, j) \in D$$. The online algorithm has to query $$f(g(x, j))$$ to check whether $$g(x, j) \in D$$, so it makes at most $$J$$ additional queries per evaluation of $$h$$.
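A minimal sketch of this rejection-sampling construction (Python; `g` is any hash into $$[N]$$ and `in_D` is the membership test, both illustrative):

```python
def g_star(g, J, in_D, x):
    """Evaluate g*(x): return g(x, j) for the smallest j in [J] with
    g(x, j) in D, or None if all J trials miss D."""
    for j in range(J):
        z = g(x, j)
        if in_D(z):      # online: one query, z in D iff f(z) not in L
            return z
    return None
```

Since each trial lands in $$D$$ with probability $$|D|/N$$, all $$J = \lceil 2N \log N / |D| \rceil$$ trials miss with probability at most $$(1 - |D|/N)^J \le 1/N^2$$.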

Since every image of $$f$$ over $$D$$ has at most $$N/\tilde{S}$$ preimages, setting $$mt^2 = O(\tilde{S})$$ guarantees that $$\Theta(mt)$$ elements of $$D$$ are covered in each subroutine. Thus $$r \approx \tilde{S} \tilde{T} \log N / N \approx |D| \log N/(mt)$$ repetitions amplify the success probability for every image to $$1 - O(1/N)$$. The actual space and time are $$S = \tilde{O}(l + rm) = \tilde{O}(\tilde{S})$$ and $$T = \tilde{O}(rtJ) = \tilde{O}(\tilde{T})$$. Setting $$S^3 T = \tilde{\Omega}(N^3)$$ guarantees that $$mt^2 \approx N / \tilde{T} \cdot (|D| \log N / l)^2 \approx N/\tilde{T} \cdot (|D| / \tilde{S})^2 \le N^3/(\tilde{T} \tilde{S}^2) = \tilde{O}(\tilde{S}).$$

Applications
The line of research on function inversion has produced practical tools for cryptanalysis, such as rainbow tables. Because of the generality of function inversion, the Fiat-Naor algorithm can also be applied to other data structure problems such as string searching and 3SUM-Indexing. It has also been applied in computational complexity to beat some conjectured lower bounds.