Split-radix FFT algorithm

The split-radix FFT is a fast Fourier transform (FFT) algorithm for computing the discrete Fourier transform (DFT), and was first described in an initially little-appreciated paper by R. Yavne (1968) and subsequently rediscovered simultaneously by various authors in 1984. (The name "split radix" was coined by two of these reinventors, P. Duhamel and H. Hollmann.) In particular, split radix is a variant of the Cooley–Tukey FFT algorithm that uses a blend of radices 2 and 4: it recursively expresses a DFT of length N in terms of one smaller DFT of length N/2 and two smaller DFTs of length N/4.

The split-radix FFT, along with its variations, long had the distinction of achieving the lowest published arithmetic operation count (total exact number of required real additions and multiplications) to compute a DFT of power-of-two size N. The arithmetic count of the original split-radix algorithm was improved upon in 2004 (with the initial gains made in unpublished work by J. Van Buskirk via hand optimization for N=64), but it turns out that one can still achieve the new lowest count by a modification of split radix (Johnson and Frigo, 2007). Although the number of arithmetic operations is not the sole factor (or even necessarily the dominant factor) in determining the time required to compute a DFT on a computer, the question of the minimum possible count is of longstanding theoretical interest. (No tight lower bound on the operation count has yet been proven.)

The split-radix algorithm can only be applied when N is a multiple of 4, but since it breaks a DFT into smaller DFTs it can be combined with any other FFT algorithm as desired.

Split-radix decomposition
Recall that the DFT is defined by the formula:
 * $$ X_k = \sum_{n=0}^{N-1} x_n \omega_N^{nk} $$

where $$k$$ is an integer ranging from $$0$$ to $$N-1$$ and $$\omega_N$$ denotes a primitive $$N$$-th root of unity:
 * $$\omega_N = e^{-\frac{2\pi i}{N}},$$

and thus: $$\omega_N^N = 1$$.
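As a point of reference, the definition above can be evaluated directly in O(N²) operations. The following is a minimal sketch rather than part of the split-radix algorithm itself; the function name `dft` is chosen here for illustration only.

```python
import cmath

def dft(x):
    """Evaluate X_k = sum_n x_n * omega_N^(n k) directly from the definition."""
    N = len(x)
    X = []
    for k in range(N):
        total = 0j
        for n in range(N):
            # omega_N^(n k), with the exponent reduced mod N because omega_N^N = 1
            total += x[n] * cmath.exp(-2j * cmath.pi * ((n * k) % N) / N)
        X.append(total)
    return X
```

For example, `dft([1, 2, 3, 4])` yields, up to rounding, the values 10, -2+2i, -2, -2-2i.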

The split-radix algorithm works by expressing this summation in terms of three smaller summations. (Here, we give the "decimation in time" version of the split-radix FFT; the dual decimation in frequency version is essentially just the reverse of these steps.)

The first is a summation over the even indices $$x_{2n_2}$$. The second and third are summations over the odd indices, broken into two pieces, $$x_{4n_4+1}$$ and $$x_{4n_4+3}$$, according to whether the index is 1 or 3 modulo 4. Here, $$n_m$$ denotes an index that runs from 0 to $$N/m-1$$. The resulting summations look like:


 * $$ X_k = \sum_{n_2=0}^{N/2-1} x_{2n_2} \omega_{N/2}^{n_2 k} + \omega_N^k \sum_{n_4=0}^{N/4-1} x_{4n_4+1} \omega_{N/4}^{n_4 k} + \omega_N^{3k} \sum_{n_4=0}^{N/4-1} x_{4n_4+3} \omega_{N/4}^{n_4 k} $$

where we have used the fact that $$\omega_N^{m n k} = \omega_{N/m}^{n k}$$. These three sums correspond to portions of radix-2 (size N/2) and radix-4 (size N/4) Cooley–Tukey steps, respectively. (The underlying idea is that the even-index subtransform of radix-2 has no multiplicative factor in front of it, so it should be left as-is, while the odd-index subtransform of radix-2 benefits from being combined with a second recursive subdivision.)

These smaller summations are now exactly DFTs of length N/2 and N/4, which can be performed recursively and then recombined.

More specifically, let $$U_k$$ denote the result of the DFT of length N/2 (for $$k = 0,\ldots,N/2-1$$), and let $$Z_k$$ and $$Z'_k$$ denote the results of the DFTs of length N/4 (for $$k = 0,\ldots,N/4-1$$). Then the output $$X_k$$ is simply:
 * $$X_k = U_k + \omega_N^k Z_k + \omega_N^{3k} Z'_k.$$
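The recombination formula above can be sketched as a direct recursion. This is an illustrative, unoptimized version, assuming N is a power of two; the function name `split_radix_unshared` is hypothetical. To evaluate the formula for every k from 0 to N-1, the indices of the sub-DFT outputs are taken modulo their lengths, which is valid because a DFT is periodic in its output index. The recursion stops at lengths 1 and 2, which are computed directly.

```python
import cmath

def split_radix_unshared(x):
    """X_k = U_k + w^k Z_k + w^(3k) Z'_k, evaluated separately for every k.

    Assumes len(x) is a power of two. Illustrative only: no work is shared
    between different outputs k.
    """
    N = len(x)
    if N == 1:
        return list(x)
    if N == 2:
        return [x[0] + x[1], x[0] - x[1]]
    U = split_radix_unshared(x[0::2])    # even indices, length N/2
    Z = split_radix_unshared(x[1::4])    # indices 1 mod 4, length N/4
    Zp = split_radix_unshared(x[3::4])   # indices 3 mod 4, length N/4
    X = []
    for k in range(N):
        w1 = cmath.exp(-2j * cmath.pi * k / N)       # omega_N^k
        w3 = cmath.exp(-2j * cmath.pi * 3 * k / N)   # omega_N^(3k)
        # Sub-DFT outputs are periodic, so their indices wrap around.
        X.append(U[k % (N // 2)] + w1 * Z[k % (N // 4)] + w3 * Zp[k % (N // 4)])
    return X
```

Each output here costs two complex multiplications by twiddle factors, for N outputs per stage.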

Evaluating this formula separately for every $$k$$, however, performs unnecessary calculations, since the outputs for $$k \geq N/4$$ turn out to share many calculations with those for $$k < N/4$$. In particular, if we add N/4 to k, the size-N/4 DFTs are not changed (because they are periodic in N/4), while the size-N/2 DFT is unchanged if we add N/2 to k. So, the only things that change are the $$\omega_N^k$$ and $$\omega_N^{3k}$$ terms, known as twiddle factors. Here, we use the identities:
 * $$\omega_N^{k+N/4} = -i \omega_N^k$$
 * $$\omega_N^{3(k+N/4)} = i \omega_N^{3k}$$

to finally arrive at:
 * $$X_k = U_k + \left( \omega_N^k Z_k + \omega_N^{3k} Z'_k \right),$$
 * $$X_{k+N/2} = U_k - \left( \omega_N^k Z_k + \omega_N^{3k} Z'_k \right),$$
 * $$X_{k+N/4} = U_{k+N/4} - i \left( \omega_N^k Z_k - \omega_N^{3k} Z'_k \right),$$
 * $$X_{k+3N/4} = U_{k+N/4} + i \left( \omega_N^k Z_k - \omega_N^{3k} Z'_k \right),$$

which gives all of the outputs $$X_k$$ if we let $$k$$ range from $$0$$ to $$N/4-1$$ in the above four expressions.
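The four expressions above translate into a short recursive routine. The following is a minimal sketch, assuming N is a power of two and using complex arithmetic throughout; the name `split_radix_fft` is chosen here for illustration. It does not include the special-case handling of trivial twiddle factors needed for the minimal operation count. The recursion stops at lengths 1 and 2, which are computed directly.

```python
import cmath

def split_radix_fft(x):
    """Decimation-in-time split-radix FFT (sketch; len(x) must be a power of two)."""
    N = len(x)
    if N == 1:
        return list(x)
    if N == 2:
        return [x[0] + x[1], x[0] - x[1]]
    U = split_radix_fft(x[0::2])    # DFT of even-indexed inputs, length N/2
    Z = split_radix_fft(x[1::4])    # DFT of inputs with index 1 mod 4, length N/4
    Zp = split_radix_fft(x[3::4])   # DFT of inputs with index 3 mod 4, length N/4
    X = [0j] * N
    for k in range(N // 4):
        a = cmath.exp(-2j * cmath.pi * k / N) * Z[k]        # omega_N^k Z_k
        b = cmath.exp(-2j * cmath.pi * 3 * k / N) * Zp[k]   # omega_N^(3k) Z'_k
        s = a + b
        d = 1j * (a - b)   # multiplication by i only swaps and negates real parts
        X[k] = U[k] + s
        X[k + N // 2] = U[k] - s
        X[k + N // 4] = U[k + N // 4] - d
        X[k + 3 * N // 4] = U[k + N // 4] + d
    return X
```

In this form, each pass through the loop uses two twiddle-factor multiplications to produce four outputs. For the input `[1, 2, 3, 4]` the routine returns, up to rounding, 10, -2+2i, -2, -2-2i.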

Notice that these expressions are arranged so that we need to combine the various DFT outputs by pairs of additions and subtractions, which are known as butterflies. In order to obtain the minimal operation count for this algorithm, one needs to take into account special cases for $$k = 0$$ (where the twiddle factors are unity) and for $$k = N/8$$ (where the twiddle factors are $$(\pm 1 - i)/\sqrt{2}$$ and can be multiplied more quickly); see, e.g., Sorensen et al. (1986). Multiplications by $$\pm 1$$ and $$\pm i$$ are ordinarily counted as free (all negations can be absorbed by converting additions into subtractions or vice versa).

This decomposition is performed recursively when N is a power of two. The base cases of the recursion are N=1, where the DFT is just a copy $$X_0 = x_0$$, and N=2, where the DFT is an addition $$X_0 = x_0 + x_1$$ and a subtraction $$X_1 = x_0 - x_1$$.

These considerations result in a count of $$4 N \log_2 N - 6N + 8$$ real additions and multiplications, for N > 1 a power of two. This count assumes that, for odd powers of 2, the leftover factor of 2 (after all the split-radix steps, which divide N by 4) is handled directly by the DFT definition (4 real additions and multiplications), or equivalently by a radix-2 Cooley–Tukey FFT step.
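The closed-form count can be checked against a recurrence that follows the structure of the recursion. The sketch below rests on a cost accounting assumed here for illustration, not taken from the cited papers:

- A general complex multiplication costs 4 real multiplications and 2 real additions.
- Each of the N/4 butterflies uses 2 such multiplications and 6 complex additions, giving 6N real operations per stage.
- The k = 0 butterfly saves 12 operations.
- The k = N/8 butterfly, present only for N ≥ 8, saves 4 operations.

The function names are hypothetical.

```python
def split_radix_flops(N):
    """Real additions plus multiplications, under the accounting assumed above."""
    if N == 1:
        return 0          # X_0 = x_0 is just a copy
    if N == 2:
        return 4          # two complex additions = 4 real additions
    # Assumed combine cost: 6N in general, minus 12 for k = 0,
    # minus a further 4 for k = N/8 when that index exists (N >= 8).
    combine = 12 if N == 4 else 6 * N - 16
    return split_radix_flops(N // 2) + 2 * split_radix_flops(N // 4) + combine

def closed_form(N):
    """4 N log2(N) - 6 N + 8, for N > 1 a power of two."""
    log2_N = N.bit_length() - 1
    return 4 * N * log2_N - 6 * N + 8

if __name__ == "__main__":
    for p in range(1, 11):
        N = 2 ** p
        print(N, split_radix_flops(N), closed_form(N))
```

Under these assumptions the recurrence and the closed form agree for every power of two tried, for example 16 operations at N = 4, 56 at N = 8, and 168 at N = 16.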