Bit-reversal permutation

In applied mathematics, a bit-reversal permutation is a permutation of a sequence of $$n$$ items, where $$n=2^k$$ is a power of two. It is defined by indexing the elements of the sequence by the numbers from $$0$$ to $$n-1$$, representing each of these numbers by its binary representation (padded to have length exactly $$k$$), and mapping each item to the item whose representation has the same bits in the reversed order.

Applying the same permutation twice restores the original ordering of the items, so the bit-reversal permutation is an involution.

This permutation can be applied to any sequence in linear time while performing only simple index calculations. It has applications in the generation of low-discrepancy sequences and in the evaluation of fast Fourier transforms.

Example
Consider the sequence of eight letters abcdefgh. Their indexes are the binary numbers 000, 001, 010, 011, 100, 101, 110, and 111, which when reversed become 000, 100, 010, 110, 001, 101, 011, and 111. Thus, the letter a in position 000 is mapped to the same position (000), the letter b in position 001 is mapped to the fifth position (the one numbered 100), etc., giving the new sequence aecgbfdh. Repeating the same permutation on this new sequence returns to the starting sequence.
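The example above can be reproduced with a short sketch. The function names `bit_reverse` and `bit_reversal_permute` are illustrative choices rather than established names, and the bit-by-bit loop is only one straightforward way of reversing an index.

```python
def bit_reverse(i: int, k: int) -> int:
    """Return the number whose k-bit binary representation is that of i, reversed."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)  # append the lowest remaining bit of i to r
        i >>= 1
    return r


def bit_reversal_permute(items):
    """Return a new list in which the item at index i has moved to bit_reverse(i, k)."""
    n = len(items)
    k = n.bit_length() - 1
    if n == 0 or 1 << k != n:
        raise ValueError("length must be a power of two")
    out = [None] * n
    for i, item in enumerate(items):
        out[bit_reverse(i, k)] = item
    return out


shuffled = "".join(bit_reversal_permute("abcdefgh"))
print(shuffled)                                 # aecgbfdh
print("".join(bit_reversal_permute(shuffled)))  # abcdefgh
```

The second call illustrates the involution property: permuting the permuted sequence gives back the starting sequence.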

Writing the index numbers in decimal (but, as above, starting with position 0 rather than the more conventional start of 1 for a permutation), the bit-reversal permutations on $$n=2^k$$ items, for $$k=0,1,2, 3, \dots$$, are:

$$k=0$$: 0

$$k=1$$: 0 1

$$k=2$$: 0 2 1 3

$$k=3$$: 0 4 2 6 1 5 3 7

$$k=4$$: 0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15

Each permutation in this sequence can be generated by concatenating two sequences of numbers: the previous permutation, with its values doubled, and the same doubled sequence with each value increased by one. Thus, for example, doubling the length-4 permutation 0 2 1 3 gives 0 4 2 6, adding one gives 1 5 3 7, and concatenating these two sequences gives the length-8 permutation 0 4 2 6 1 5 3 7.
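The doubling and concatenation rule translates directly into code. The following is a minimal sketch; the name `bit_reversal_table` is an assumption made for illustration.

```python
def bit_reversal_table(k: int) -> list:
    """Build the bit-reversal permutation on 2**k items by doubling and concatenation."""
    perm = [0]  # the permutation for k = 0
    for _ in range(k):
        doubled = [2 * v for v in perm]
        # previous permutation doubled, followed by the same values plus one
        perm = doubled + [v + 1 for v in doubled]
    return perm


print(bit_reversal_table(2))  # [0, 2, 1, 3]
print(bit_reversal_table(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Each round doubles the length of the list, so the total work is proportional to the final length $$n=2^k$$.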

Generalizations
The generalization to radix $$b$$ representations, for $$b > 2$$, and to $$n=b^k$$, is a digit-reversal permutation, in which the base-$$b$$ digits of the index of each element are reversed to obtain the permuted index. The same idea can also be generalized to mixed radix number systems. In such cases, the digit-reversal permutation should simultaneously reverse the digits of each item and the bases of the number system, so that each reversed digit remains within the range defined by its base.
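Both generalizations can be sketched as follows. The function names and the convention that `bases` lists the radices starting from the least significant digit are assumptions made for this illustration.

```python
def digit_reverse(i: int, b: int, k: int) -> int:
    """Reverse the k base-b digits of i."""
    r = 0
    for _ in range(k):
        i, d = divmod(i, b)  # peel off the lowest digit of i
        r = r * b + d        # and append it to r
    return r


def mixed_radix_reverse(i: int, bases) -> int:
    """Reverse the digits of i together with the list of bases.

    bases[0] is assumed to be the radix of the least significant digit of i.
    In the result, that digit becomes the most significant one and keeps its
    own radix, so every digit stays within the range of its base.
    """
    r = 0
    for b in bases:
        i, d = divmod(i, b)
        r = r * b + d
    return r


print([digit_reverse(i, 3, 2) for i in range(9)])          # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print([mixed_radix_reverse(i, [2, 3]) for i in range(6)])  # [0, 3, 1, 4, 2, 5]
```

With unequal bases the result is not an involution in general: in this sketch the permutation is undone by applying the same function with the list of bases reversed.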

Permutations that generalize the bit-reversal permutation by reversing contiguous blocks of bits within the binary representations of their indices can be used to interleave two equal-length sequences of data in-place.
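One way to see how block reversals can interleave data is the following sketch, which assumes the two sequences are stored as the two halves of a list whose length $$n=2^k$$ is a power of two. It composes two involutions, a reversal of all $$k$$ index bits followed by a reversal of the upper $$k-1$$ bits, which together rotate each index left by one bit. The helper names are hypothetical, and this particular decomposition is offered as an illustration rather than as the specific method from the literature.

```python
def reverse_bits(i: int, width: int) -> int:
    """Reverse the lowest `width` bits of i (higher bits are assumed to be zero)."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def apply_involution_in_place(data, f):
    """Move data[i] to position f(i) by swaps; f must be its own inverse."""
    for i in range(len(data)):
        j = f(i)
        if i < j:  # swap each pair exactly once
            data[i], data[j] = data[j], data[i]


def interleave_in_place(data):
    """Turn [a0..a(m-1), b0..b(m-1)] into [a0, b0, a1, b1, ...] in place."""
    n = len(data)
    k = n.bit_length() - 1
    if n < 2 or 1 << k != n:
        raise ValueError("length must be a power of two, at least 2")
    # Step 1: reverse all k bits of every index.
    apply_involution_in_place(data, lambda i: reverse_bits(i, k))
    # Step 2: reverse the upper k-1 bits, leaving the lowest bit in place.
    apply_involution_in_place(
        data, lambda i: (reverse_bits(i >> 1, k - 1) << 1) | (i & 1)
    )


letters = list("abcd1234")
interleave_in_place(letters)
print("".join(letters))  # a1b2c3d4
```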

There are two extensions of the bit-reversal permutation to sequences of arbitrary length. These extensions coincide with bit-reversal for sequences whose length is a power of 2, and their purpose is to separate adjacent items in a sequence for the efficient operation of the Kaczmarz algorithm. The first of these extensions, called efficient ordering, operates on composite numbers, and it is based on decomposing the number into its prime components.

The second extension, called EBR (extended bit-reversal), is similar in spirit to bit-reversal. Given an array of size $$n$$, EBR fills the array with a permutation of the numbers in the range $$0\ldots n-1$$ in linear time. Successive numbers are separated in the permutation by at least $$\lfloor n/4\rfloor$$ positions.

Applications
Bit reversal is most important for radix-2 Cooley–Tukey FFT algorithms, where the recursive stages of the algorithm, operating in-place, imply a bit reversal of the inputs or outputs. Similarly, mixed-radix digit reversals arise in mixed-radix Cooley–Tukey FFTs.
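The role of bit reversal in a radix-2 transform can be illustrated with a textbook-style iterative decimation-in-time FFT. This is a minimal sketch, not a tuned implementation; the names `fft` and `bit_reverse` are chosen here for illustration, and the sign convention $$e^{-2\pi i/n}$$ for the forward transform is an assumption.

```python
import cmath


def bit_reverse(i: int, k: int) -> int:
    """Reverse the k-bit binary representation of i."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft(values):
    """Discrete Fourier transform of a sequence whose length is a power of two."""
    n = len(values)
    k = n.bit_length() - 1
    if n == 0 or 1 << k != n:
        raise ValueError("length must be a power of two")
    # Reorder the inputs by bit reversal so that the butterfly stages
    # below can overwrite the array while leaving outputs in natural order.
    a = [complex(values[bit_reverse(i, k)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # primitive size-th root of unity
        for start in range(0, n, size):
            w = 1 + 0j
            for j in range(half):
                u = a[start + j]
                v = a[start + j + half] * w
                a[start + j] = u + v         # butterfly: combine two half-size DFTs
                a[start + j + half] = u - v
                w *= w_step
        size *= 2
    return a


spectrum = fft([1, 1, 1, 1, 0, 0, 0, 0])
```

Without the initial reordering, the same butterfly stages would produce the correct values in bit-reversed positions.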

The bit-reversal permutation has also been used to devise lower bounds in distributed computation.

The Van der Corput sequence, a low-discrepancy sequence of numbers in the unit interval, is formed by reinterpreting the indexes of the bit-reversal permutation as the fixed-point binary representations of dyadic rational numbers.
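As a small illustration, the first $$2^k$$ terms of the sequence can be obtained by dividing each bit-reversed index by $$2^k$$. The sketch below uses exact fractions; the function names are illustrative, and the sequence is taken to start at index 0.

```python
from fractions import Fraction


def bit_reverse(i: int, k: int) -> int:
    """Reverse the k-bit binary representation of i."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def van_der_corput(i: int, k: int) -> Fraction:
    """Read the k reversed bits of i as the binary fraction 0.b0 b1 ... b(k-1)."""
    return Fraction(bit_reverse(i, k), 1 << k)


print([str(van_der_corput(i, 3)) for i in range(8)])
# ['0', '1/2', '1/4', '3/4', '1/8', '5/8', '3/8', '7/8']
```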

Bit-reversal permutations are often used in finding lower bounds on dynamic data structures. For example, subject to certain assumptions, the cost of looking up the integers between $$0$$ and $$n-1$$, inclusive, in any binary search tree holding those values, is $$\Omega(n \log n)$$ when those numbers are queried in bit-reversed order. This bound applies even to trees like splay trees that are allowed to rearrange their nodes between accesses.

Algorithms
Mainly because of the importance of fast Fourier transform algorithms, numerous efficient algorithms for applying a bit-reversal permutation to a sequence have been devised.

Because the bit-reversal permutation is an involution, it may be performed easily in place (without copying the data into another array) by swapping pairs of elements. In the random-access machine model commonly used in algorithm analysis, a simple algorithm that scans the indexes in input order and swaps whenever the scan encounters an index whose reversal is a larger number would perform a linear number of data moves. However, computing the reversal of each index may take a non-constant number of steps. Alternative algorithms can perform a bit-reversal permutation in linear time while using only simple index calculations.

Because bit-reversal permutations may be repeated multiple times as part of a calculation, it may be helpful to separate the steps of the algorithm that calculate index data used to represent the permutation (for instance, by using the doubling and concatenation method) from the steps that use the results of this calculation to permute the data (for instance, by scanning the data indexes in order and performing a swap whenever the swapped location is greater than the current index, or by using more sophisticated vector scatter–gather operations).
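The separation between computing the index data and permuting the data might look like the following sketch. Here the table is filled with a commonly used recurrence that derives the reversal of $$i$$ from the already computed reversal of $$\lfloor i/2\rfloor$$, using a constant number of operations per entry; this is an alternative to the doubling and concatenation method, and the function names are assumptions made for illustration.

```python
def reversal_table(k: int) -> list:
    """Bit-reversed value of every k-bit index, in constant work per entry."""
    n = 1 << k
    rev = [0] * n
    for i in range(1, n):
        # Reversal of i: shift the reversal of i // 2 right by one place,
        # then put the lowest bit of i into the top position.
        rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1))
    return rev


def permute_in_place(data, table):
    """Apply the permutation described by `table` to `data` using swaps."""
    if len(data) != len(table):
        raise ValueError("data and table must have the same length")
    for i, j in enumerate(table):
        if i < j:  # swap each pair once; fixed points are left alone
            data[i], data[j] = data[j], data[i]


table = reversal_table(3)  # computed once, reusable for every length-8 array
letters = list("abcdefgh")
permute_in_place(letters, table)
print("".join(letters))    # aecgbfdh
```

The table can be reused for every array of the same length, so the cost of building it is paid only once when the permutation is applied repeatedly.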

Another consideration that is even more important for the performance of these algorithms is the effect of the memory hierarchy on running time. Because of this effect, more sophisticated algorithms that consider the block structure of memory can be faster than this naive scan. An alternative to these techniques is special computer hardware that allows memory to be accessed both in normal and in bit-reversed order.

Improving the performance of bit reversals on both uniprocessors and multiprocessors has received serious attention in high-performance computing, because architecture-aware algorithm development can make the best use of hardware and system software resources, including caches, TLBs, and multiple cores, and so significantly accelerate the computation.