Cyclotomic fast Fourier transform

The cyclotomic fast Fourier transform is a type of fast Fourier transform algorithm over finite fields. This algorithm first decomposes a DFT into several circular convolutions, and then derives the DFT results from the circular convolution results. When applied to a DFT over $$GF(2^m)$$, this algorithm has a very low multiplicative complexity. In practice, since there usually exist efficient algorithms for circular convolutions with specific lengths, this algorithm is very efficient.

Background
The discrete Fourier transform over finite fields finds widespread application in the decoding of error-correcting codes such as BCH codes and Reed–Solomon codes. Generalized from the complex field, a discrete Fourier transform of a sequence $$\{f_i\}_{0}^{N-1}$$ over a finite field GF(pm) is defined as


 * $$F_j=\sum_{i=0}^{N-1}f_i\alpha^{ij}, 0\le j\le N-1, $$

where $$\alpha$$ is the N-th primitive root of 1 in GF(pm). If we define the polynomial representation of $$\{f_i\}_{0}^{N-1}$$ as


 * $$f(x) = f_0+f_1x+f_2x^2+\cdots+f_{N-1}x^{N-1}=\sum_{0}^{N-1}f_ix^i,$$

it is easy to see that $$F_j$$ is simply $$f(\alpha^j)$$. That is, the discrete Fourier transform of a sequence converts it to a polynomial evaluation problem.

Written in matrix format,
 * $$\mathbf{F}=\left[\begin{matrix}F_0\\F_1\\ \vdots \\ F_{N-1}\end{matrix}\right]=

\left[\begin{matrix} \alpha^0&\alpha^0 &\cdots & \alpha^0\\ \alpha^0 & \alpha^1 &\cdots &\alpha^{N-1}\\ \vdots &\vdots & \ddots & \vdots \\ \alpha^{0} & \alpha^{N-1} &\cdots & \alpha^{(N-1)(N-1)} \end{matrix}\right] \left[\begin{matrix}f_0\\f_1\\\vdots\\f_{N-1}\end{matrix}\right]=\mathcal{F}\mathbf{f}. $$

Direct evaluation of DFT has an $$O(N^2)$$ complexity. Fast Fourier transforms are just efficient algorithms evaluating the above matrix-vector product.

Algorithm
First, we define a linearized polynomial over GF(pm) as


 * $$L(x) = \sum_{i} l_i x^{p^i}, l_i \in \mathrm{GF}(p^m).$$

$$L(x)$$ is called linearized because $$L(x_1+x_2) = L(x_1) + L(x_2)$$, which comes from the fact that for elements $$x_1,x_2 \in \mathrm{GF}(p^m),$$$$(x_1+x_2)^p=x_1^p+x_2^p.$$

Notice that $$p$$ is invertible modulo $$N$$ because $$N$$ must divide the order $$p^m-1$$ of the multiplicative group of the field $$\mathrm{GF}(p^m)$$. So, the elements $$\{0, 1, 2, \ldots, N-1\}$$ can be partitioned into $$l+1$$ cyclotomic cosets modulo $$N$$:
 * $$\{0\},$$
 * $$\{k_1, pk_1, p^2k_1, \ldots, p^{m_1-1}k_1\},$$
 * $$\ldots,$$
 * $$\{k_l, pk_l, p^2k_l, \ldots, p^{m_l-1}k_l\},$$

where $$k_i=p^{m_i}k_i \pmod{N}$$. Therefore, the input to the Fourier transform can be rewritten as


 * $$f(x)=\sum_{i=0}^l L_i(x^{k_i}),\quad L_i(y) = \sum_{t=0}^{m_i-1}y^{p^t}f_{p^tk_i\bmod{N}}.$$

In this way, the polynomial representation is decomposed into a sum of linear polynomials, and hence $$F_j$$ is given by
 * $$F_j=f(\alpha^j)=\sum_{i=0}^lL_i(\alpha^{jk_i})$$.

Expanding $$\alpha^{jk_i} \in \mathrm{GF}(p^{m_i})$$ with the proper basis $$\{\beta_{i,0}, \beta_{i,1}, \ldots, \beta_{i,m_i-1}\}$$, we have $$\alpha^{jk_i} = \sum_{s=0}^{m_i-1}a_{ijs}\beta_{i,s}$$ where $$a_{ijs} \in \mathrm{GF}(p)$$, and by the property of the linearized polynomial $$L_i(x)$$, we have


 * $$F_j=\sum_{i=0}^l\sum_{s=0}^{m_i-1}a_{ijs}\left(\sum_{t=0}^{m_i-1}\beta_{i,s}^{p^t}f_{p^{t}k_i\bmod{N}}\right)$$

This equation can be rewritten in matrix form as $$\mathbf{F}=\mathbf{AL\Pi f}$$, where $$\mathbf{A}$$ is an $$N\times N$$ matrix over GF(p) that contains the elements $$a_{ijs}$$, $$\mathbf{L}$$ is a block diagonal matrix, and $$\mathbf{\Pi}$$ is a permutation matrix regrouping the elements in $$\mathbf{f}$$ according to the cyclotomic coset index.

Note that if the normal basis $$\{\gamma_i^{p^0}, \gamma_i^{p^1}, \cdots, \gamma_i^{p^{m_i-1}}\} $$ is used to expand the field elements of $$\mathrm{GF}(p^{m_i})$$, the i-th block of $$\mathbf{L}$$ is given by:
 * $$\mathbf{L}_i=

\begin{bmatrix} \gamma_i^{p^0} &\gamma_i^{p^1}  &\cdots  &\gamma_i^{p^{m_i-1}}\\ \gamma_i^{p^1} &\gamma_i^{p^2}  &\cdots  &\gamma_i^{p^{0}}\\ \vdots & \vdots & \ddots & \vdots\\ \gamma_i^{p^{m_i-1}} &\gamma_i^{p^0}  &\cdots  &\gamma_i^{p^{m_i-2}}\\ \end{bmatrix} $$ which is a circulant matrix. It is well known that a circulant matrix-vector product can be efficiently computed by convolutions. Hence we successfully reduce the discrete Fourier transform into short convolutions.

Complexity
When applied to a characteristic-2 field GF(2m), the matrix $$\mathbf{A}$$ is just a binary matrix. Only addition is used when calculating the matrix-vector product of $$\mathrm{A}$$ and $$\mathrm{L\Pi f}$$. It has been shown that the multiplicative complexity of the cyclotomic algorithm is given by $$O(n(\log_2n)^{\log_2\frac{3}{2}})$$, and the additive complexity is given by $$O(n^2/(\log_2 n)^{\log_2\frac{8}{3}})$$.