User:A2831408/Winograd Fast Fourier Transform Algorithm

The Winograd fast Fourier transform algorithm (Winograd FFT) was proposed by Shmuel Winograd in 1978. It uses about the same number of additions as the Cooley–Tukey FFT algorithm, while using significantly fewer multiplications.

In matrix form, the discrete Fourier transform (DFT) is:

$$\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{bmatrix}= \begin{bmatrix} 1 &  1 & 1 & \cdots & 1 & 1 \\ 1 & w & w^2 & \cdots & w^{n-2} & w^{n-1}\\ \vdots & \vdots  & \vdots  & \cdots  & \vdots  & \vdots \\ 1 & w^{n-1} & w^{2(n-1)} & \cdots & w^{(n-2)(n-1)} & w^{(n-1)(n-1)}\end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1}  \end{bmatrix}, \quad w=e^{-j\begin{matrix} \frac{2\pi}{n} \end{matrix}}$$

If n is prime, we can remove the first row and the first column of the matrix and compute the remaining part with an algorithm for (n-1)-point cyclic convolution instead of an algorithm for the n-point DFT.

n is prime
For example, n=5:

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1  & 1 & 1 \\ 1 & w & w^2 & w^3 & w^4 \\ 1 & w^2  & w^4  & w  & w^3  \\ 1 & w^3 & w & w^4 & w^2 \\ 1 & w^4 & w^3 & w^2 & w\end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}, \qquad w=e^{\frac{-j2\pi}{5} }$$

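The first row gives $$y_0 = x_0 + x_1 + x_2 + x_3 + x_4$$ directly, and the first column contributes $$x_0$$ to every output, so the remaining outputs can be written as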

$$\begin{bmatrix} y_1-x_0 \\ y_2-x_0 \\ y_3-x_0 \\ y_4-x_0 \end{bmatrix} = \begin{bmatrix}w & w^2 & w^3 & w^4 \\ w^2 & w^4  & w  & w^3  \\ w^3 & w & w^4 & w^2 \\ w^4 & w^3 & w^2 & w\end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$$

Arrange the rows in the order 1, 2, 4, 3 (the successive powers of 2 modulo 5) and the columns in the order 1, 3, 4, 2 (the successive powers of 3, the inverse of 2 modulo 5).

$$\begin{bmatrix} y_1-x_0 \\ y_2-x_0 \\ y_4-x_0 \\ y_3-x_0 \end{bmatrix} = \begin{bmatrix} w & w^3 & w^4 & w^2 \\ w^2  & w  & w^3  & w^4  \\ w^4 & w^2 & w & w^3 \\ w^3 & w^4 & w^2 & w\end{bmatrix}\begin{bmatrix}  x_1 \\ x_3 \\ x_4 \\ x_2 \end{bmatrix}\quad$$

This is the standard form of cyclic convolution and we can use the algorithm for 4 point cyclic convolution to compute the transform.
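The reordering above can be checked numerically. The following is a minimal sketch, not a full Winograd implementation: it evaluates the cyclic convolution directly rather than with a multiplication-saving convolution algorithm. The function names and the choice of the primitive root g = 2 are assumptions made for this illustration.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(x[j] * w ** (j * k) for j in range(n)) for k in range(n)]

def prime_dft_by_cyclic_convolution(x, g):
    """DFT of prime length n via an (n-1)-point cyclic convolution.

    g is assumed to be a primitive root modulo n (g = 2 works for n = 5).
    """
    n = len(x)
    m = n - 1
    w = cmath.exp(-2j * cmath.pi / n)
    g_inv = pow(g, n - 2, n)                        # inverse of g modulo the prime n
    rows = [pow(g, a, n) for a in range(m)]         # 1, 2, 4, 3 for n = 5
    cols = [pow(g_inv, b, n) for b in range(m)]     # 1, 3, 4, 2 for n = 5
    # Entry (a, b) of the permuted matrix is w^(g^(a-b)): it depends only
    # on (a - b) mod m, which is what makes the matrix cyclic.
    kernel = [w ** pow(g, d, n) for d in range(m)]  # first column of that matrix
    y = [0j] * n
    y[0] = sum(x)
    for a in range(m):
        acc = sum(kernel[(a - b) % m] * x[cols[b]] for b in range(m))
        y[rows[a]] = x[0] + acc
    return y, rows, cols

x = [1, 2, 3, 4, 5]
y, rows, cols = prime_dft_by_cyclic_convolution(x, 2)
print(rows, cols)
print(max(abs(a - b) for a, b in zip(y, dft(x))))   # difference from the direct DFT
```

The row and column orders produced for n = 5 are the ones used in the example above, and the result agrees with the direct DFT up to rounding error.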

n is a power of a prime
In the case $$n=p^r$$, we can permute the rows and columns of the matrix so that the indices appear in the following order:


 * indices that can be divided by $$p^r$$
 * indices that can be divided by $$p^{r-1}$$ but not by $$p^r$$
 * $$\cdots$$
 * indices that can be divided by $$p^0$$ but not by $$p^1$$

Then we can find a block that is a cyclic matrix in the transform matrix so that we can apply cyclic convolution.

For example, n=9:

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\y_5 \\ y_6 \\ y_7 \\ y_8\end{bmatrix} = \begin{bmatrix} 1 & 1 & 1  & 1 & 1 &  1 & 1  & 1 & 1\\ 1 & w & w^2 & w^3 & w^4 & w^5 & w^6 & w^7 & w^8  \\ 1 & w^2 & w^4  & w^6  & w^8 & w & w^3 & w^5 & w^7  \\ 1 & w^3 & w^6 & 1 & w^3 & w^6 &1 & w^3 & w^6 \\ 1 & w^4 & w^8 & w^3 & w^7 & w^2 & w^6 & w & w^5\\ 1 & w^5 & w & w^6 & w^2 & w^7 & w^3 & w^8 & w^4\\ 1 & w^6 & w^3 & 1 & w^6 & w^3 &1 & w^6 & w^3 \\ 1 & w^7 & w^5 & w^3 & w & w^8 & w^6 & w^4 & w^2\\ 1 & w^8 & w^7 & w^6 & w^5 & w^4 & w^3 & w^2 &w\\ \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\x_5 \\ x_6 \\ x_7 \\ x_8 \end{bmatrix} , \qquad w=e^{\frac{-j2\pi}{9} }$$

Arrange the rows and the columns as follows:

$$\begin{bmatrix} y_0\\ y_3\\ y_6 \\ y_1 \\ y_2 \\ y_4 \\y_8  \\ y_7 \\ y_5\end{bmatrix} = \left[ \begin{array}{ccc|ccccccc} 1 & 1 & 1  & 1 & 1 &  1 & 1  & 1 & 1\\ 1 & 1 & 1 & w^3 & w^6 & w^3 & w^6 & w^3 & w^6  \\ 1 &1 & 1  & w^6  & w^3 & w^6 & w^3 & w^6 & w^3  \\ \hline 1 & w^3 & w^6 & w & w^5 & w^7 & w^8 & w^4 & w^2 \\ 1 & w^6 & w^3 & w^2 & w & w^5 & w^7 & w^8 & w^4\\ 1 & w^3 & w^6 & w^4 & w^2 & w & w^5 & w^7 & w^8\\ 1 & w^6 & w^3 & w^8 & w^4 & w^2 &w & w^5 & w^7 \\ 1 & w^3 & w^6 & w^7 & w^8 & w^4 & w^2 & w& w^5\\ 1 & w^6 & w^3 & w^5 & w^7 & w^8 & w^4 & w^2 &w\\ \end{array} \right] \begin{bmatrix} x_0\\ x_3\\ x_6 \\ x_1 \\ x_5 \\ x_7 \\x_8  \\ x_4 \\ x_2 \end{bmatrix}$$

We can apply cyclic convolution to the 6 by 6 cyclic matrix located in the lower right corner of the matrix.
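The cyclic structure of this block can be verified with integer arithmetic on the exponents of w. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and g = 2 is assumed to generate the multiplicative group modulo 9, which yields the row order 1, 2, 4, 8, 7, 5 and the column order 1, 5, 7, 8, 4, 2 used above.

```python
from math import gcd

def unit_block_exponents(n, g):
    """Exponents e (entry w^e) of the DFT-matrix block indexed by the units
    modulo n, with rows ordered by powers of g and columns by powers of g^-1.

    g is assumed to generate the multiplicative group modulo n.
    """
    if gcd(g, n) != 1:
        raise ValueError("g must be coprime to n")
    rows = [1]
    while (rows[-1] * g) % n != 1:
        rows.append((rows[-1] * g) % n)
    m = len(rows)
    cols = [rows[(-b) % m] for b in range(m)]   # g^-b equals g^(m-b)
    table = [[(r * c) % n for c in cols] for r in rows]
    return rows, cols, table

def is_cyclic(table):
    """True if each row is the previous row shifted right by one position."""
    m = len(table)
    return all(table[a][b] == table[(a + 1) % m][(b + 1) % m]
               for a in range(m) for b in range(m))

rows, cols, table = unit_block_exponents(9, 2)
print(rows)              # row order of the block
print(cols)              # column order of the block
print(is_cyclic(table))
```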

Other n
In this case we can write $$n=n_1\cdot n_2 \cdots n_r$$, where $$n_1, n_2, \cdots ,n_r$$ are pairwise coprime.

For a coefficient $$w^k$$ in the n-point DFT, we can define the order $$(i_1,\cdots,i_r)$$ of k such that


 * $$k \equiv i_1 \pmod{n_1}$$
 * $$\cdots$$
 * $$k \equiv i_r \pmod{n_r}$$

By the Chinese remainder theorem, for every tuple $$(i_1,\cdots,i_r)$$ with $$0 \le i_s < n_s$$ there is exactly one exponent k with $$0 \le k < n$$ whose order is $$(i_1,\cdots,i_r)$$.
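This one-to-one correspondence is easy to check by enumeration. The sketch below is only an illustration: the function name `orders` and the choice n = 15 = 3·5 are assumptions made here, not part of the algorithm description.

```python
from itertools import product

def orders(factors):
    """Map each k in 0..n-1 to its order (k mod n_1, ..., k mod n_r),
    where n is the product of the pairwise coprime factors."""
    n = 1
    for f in factors:
        n *= f
    return {k: tuple(k % f for f in factors) for k in range(n)}

table = orders((3, 5))
all_tuples = set(product(range(3), range(5)))

# Every possible tuple occurs, and no two exponents share a tuple.
print(set(table.values()) == all_tuples)
print(len(set(table.values())) == len(table))
```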

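Then we arrange the rows and columns of the matrix in lexicographic order of their orders, with $$i_1$$ varying fastest. The resulting matrix is composed of blocks that are scalar multiples of $$D_{n_1}$$ (possibly with its rows permuted), where $$D_{n_1}$$ is the $$n_1$$ by $$n_1$$ DFT matrix.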

For example, n = 6 = 2$$\cdot$$3:

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\y_5\end{bmatrix} = \begin{bmatrix} 1 & 1 & 1  & 1 & 1 &  1\\ 1 & w & w^2 & w^3 & w^4 & w^5\\ 1 & w^2 & w^4 & 1 & w^2 & w^4 \\ 1 & w^3 &1 & w^3 & 1 & w^3\\ 1 & w^4 & w^2 & 1 & w^4 & w^2\\ 1 & w^5 & w^4 & w^3 & w^2 & w^1\\ \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\x_5\end{bmatrix} , \qquad w=e^{\frac{-j\pi}{3} }$$

The orders $$(k \bmod 2, k \bmod 3)$$ are:


 * 0 - (0, 0)
 * 1 - (1, 1)
 * 2 - (0, 2)
 * 3 - (1, 0)
 * 4 - (0, 1)
 * 5 - (1, 2)

Arrange the rows and the columns by $$k \bmod 3$$ first and $$k \bmod 2$$ second, that is, in the order 0, 3, 4, 1, 2, 5:

$$\begin{bmatrix} y_0 \\ y_3 \\ y_4 \\ y_1 \\ y_2 \\y_5\end{bmatrix} = \left[ \begin{array}{cc|cc|cc} 1 & 1 & 1  & 1 & 1 &  1\\ 1 & w^3 & 1 & w^3 & 1 & w^3\\ \hline 1 & 1 & w^4 & w^4 & w^2 & w^2 \\ 1 & w^3 & w^4 & w & w^2 & w^5\\ \hline 1 & 1 & w^2 & w^2 & w^4 & w^4\\ 1 & w^3 & w^2 & w^5 & w^4 & w\\ \end{array} \right] \begin{bmatrix} x_0 \\ x_3 \\ x_4 \\ x_1 \\ x_2 \\x_5\end{bmatrix} = \begin{bmatrix} D_2 & D_2 & D_2 \\ D_2 & w^4D_2 & w^2D_2\\ D_2 & w^2D_2 & w^4D_2\end{bmatrix} \begin{bmatrix} x_0 \\ x_3 \\ x_4 \\ x_1 \\ x_2 \\x_5\end{bmatrix} = (\overline{D_3}\otimes D_2) \begin{bmatrix} x_0 \\ x_3 \\ x_4 \\ x_1 \\ x_2 \\x_5\end{bmatrix}$$

Here $$\overline{D_3}$$ is the complex conjugate of the 3 by 3 DFT matrix $$D_3$$, which is the same as $$D_3$$ with its last two rows exchanged.
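A factorization of this kind can be checked numerically. The sketch below is an illustration under its own assumptions: it orders the indices of the 6-point DFT by (k mod 3, k mod 2), which gives 0, 3, 4, 1, 2, 5, and uses the same order for rows and columns. With that choice the 3-point factor comes out as the complex conjugate of D_3, that is, D_3 with its last two rows exchanged. The helper names are hypothetical.

```python
import cmath

def dft_matrix(n):
    """n by n DFT matrix with w = exp(-2*pi*j/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** ((j * k) % n) for k in range(n)] for j in range(n)]

def permuted_dft_matrix(n, order):
    """DFT matrix with rows and columns both rearranged into `order`."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** ((r * c) % n) for c in order] for r in order]

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    nb = len(b)
    size = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(size)]
            for i in range(size)]

n1, n2 = 2, 3
n = n1 * n2
# Sort by k mod 3 first and k mod 2 second, so the mod-2 index varies fastest.
order = sorted(range(n), key=lambda k: (k % n2, k % n1))
m = permuted_dft_matrix(n, order)
d3_conj = [[v.conjugate() for v in row] for row in dft_matrix(n2)]
expected = kron(d3_conj, dft_matrix(n1))
max_err = max(abs(m[i][j] - expected[i][j]) for i in range(n) for j in range(n))
print(order)
print(max_err)   # rounding error only if the factorization holds
```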

Number of Operations
The numbers of additions and multiplications required by the n-point Winograd FFT and the n-point FFT algorithms for various values of n are as follows (the input is complex):

Disadvantage
Although the Winograd FFT reduces the number of multiplications compared with other FFT algorithms, it has some disadvantages.


 * The design of the Winograd FFT is much more complex, involving index rearrangement using remainders and multidimensional matrix multiplication.
 * The number of additions may increase significantly, and at current computing speeds, where a multiplication costs about as much as an addition, the benefit of fewer multiplications may not outweigh the cost of the additional additions.