Bailey's FFT algorithm

Bailey's FFT (also known as the 4-step FFT) is a high-performance algorithm for computing the fast Fourier transform (FFT). This variation of the Cooley–Tukey FFT algorithm was originally designed for systems with the hierarchical memory common in modern computers, and was the first FFT algorithm in the so-called "out-of-core" class. The algorithm treats the samples as a two-dimensional matrix (hence another name, the matrix FFT algorithm) and executes short FFT operations on the columns and rows of the matrix, with a correction multiplication by "twiddle factors" in between.

The algorithm is named after an article by David H. Bailey, "FFTs in external or hierarchical memory", published in 1989. In this article Bailey credits the algorithm to W. M. Gentleman and G. Sande, who published their paper, "Fast Fourier Transforms: for fun and profit", some twenty years earlier, in 1966. The algorithm can be considered a radix-$$\sqrt n$$ FFT decomposition.

Here is a brief overview of how the "4-step" version of the Bailey FFT algorithm works:


 * 1) The data (in natural order) is first arranged into a matrix.
 * 2) Each column of the matrix is then independently processed using a standard FFT algorithm.
 * 3) Each element of the matrix is multiplied by a correction coefficient (a twiddle factor).
 * 4) Each row of the matrix is then independently processed using a standard FFT algorithm.
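
One way to see where the correction coefficients come from is to write out the index arithmetic. The sketch below assumes a layout that the overview does not fix: $$N = RC$$ samples stored row by row in an $$R \times C$$ matrix, the forward sign convention $$\omega_M = e^{-2\pi i/M}$$, and the symbols $$r, c, k_1, k_2$$, which are introduced here only for illustration.

$$
\begin{aligned}
n &= Cr + c, \qquad k = k_1 + Rk_2, \qquad 0 \le r, k_1 < R, \quad 0 \le c, k_2 < C, \\
X_k &= \sum_{c=0}^{C-1} \sum_{r=0}^{R-1} x_{Cr+c}\, \omega_N^{(Cr+c)(k_1+Rk_2)} \\
    &= \sum_{c=0}^{C-1} \omega_C^{c k_2} \left[ \omega_N^{c k_1} \sum_{r=0}^{R-1} x_{Cr+c}\, \omega_R^{r k_1} \right].
\end{aligned}
$$

Under these assumptions the inner sum is the column FFT of step 2, the factor $$\omega_N^{c k_1}$$ is the correction coefficient of step 3, and the outer sum is the row FFT of step 4. The remaining term $$\omega_N^{CR\,r k_2}$$ drops out because $$\omega_N^{N} = 1$$.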

The result (in natural order) is read column-by-column. Since the operations are performed column-wise and row-wise, steps 2 and 4 (and the reading of the result) might include a matrix transpose to rearrange the elements in a way convenient for processing. The algorithm resembles a 2-dimensional FFT; its 3-dimensional (and higher) extensions are known as the 5-step FFT, 6-step FFT, etc.
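
The four steps can be illustrated with a minimal Python sketch. It is not Bailey's implementation: the function names `dft` and `four_step_fft`, the row-major layout, and the forward sign convention are assumptions made for the example, and a direct $$O(n^2)$$ DFT stands in for the "standard FFT algorithm" used on the short columns and rows. No transposes are performed, so the sketch shows the arithmetic only, not the memory-access pattern that motivates the algorithm.

```python
import cmath


def dft(seq):
    """Direct O(n^2) DFT; a stand-in for any standard short FFT routine."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, rows, cols):
    """Forward DFT of x (length rows * cols) via the four-step scheme."""
    n = rows * cols
    if len(x) != n:
        raise ValueError("len(x) must equal rows * cols")

    # Step 1: arrange the natural-order data into a rows x cols matrix
    # (row-major layout is an assumption of this sketch).
    a = [list(x[r * cols:(r + 1) * cols]) for r in range(rows)]

    # Step 2: transform each column independently (length `rows`).
    for c in range(cols):
        column = dft([a[r][c] for r in range(rows)])
        for r in range(rows):
            a[r][c] = column[r]

    # Step 3: multiply element (r, c) by the twiddle factor exp(-2*pi*i*r*c/n).
    for r in range(rows):
        for c in range(cols):
            a[r][c] *= cmath.exp(-2j * cmath.pi * r * c / n)

    # Step 4: transform each row independently (length `cols`).
    a = [dft(row) for row in a]

    # Read the result column by column to obtain natural order.
    return [a[r][c] for c in range(cols) for r in range(rows)]
```

For example, `four_step_fft(x, 3, 4)` on a 12-element list should agree with `dft(x)` up to floating-point rounding. In practice the row and column counts are usually chosen close to $$\sqrt n$$ so that each short transform fits in fast memory.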

The Bailey FFT is typically used for computing DFTs of large datasets, such as those used in scientific and engineering applications. It has been used to compute FFTs of datasets with billions of elements; when applied to the number-theoretic transform, datasets on the order of $$10^{12}$$ elements were processed in the mid-2000s.