Bitonic sorter

Bitonic mergesort is a parallel algorithm for sorting. It is also used as a construction method for building a sorting network. The algorithm was devised by Ken Batcher. The resulting sorting networks consist of $$O(n\log^2(n))$$ comparators and have a delay of $$O(\log^2(n))$$, where $$n$$ is the number of items to be sorted. This makes it a popular choice for sorting large numbers of elements on an architecture which itself contains a large number of parallel execution units running in lockstep, such as a typical GPU.

A sorted sequence is a monotonically non-decreasing (or non-increasing) sequence. A bitonic sequence is a sequence with $$x_0 \leq \cdots \leq x_k \geq \cdots \geq x_{n-1}$$ for some $$k, 0 \leq k < n$$, or a circular shift of such a sequence.

Complexity
Let $$p = \lfloor \log_2 n \rfloor$$ and $$q = \lceil \log_2 n \rceil$$.

It is evident from the construction algorithm that the number of rounds of parallel comparisons is given by $$q(q+1)/2$$.

It follows that the number of comparators $$c$$ is bounded $$2 ^{p-1} \cdot p (p+1) /2 \leq c \leq \lfloor {n/2} \rfloor \cdot q (q+1) /2 $$ (which establishes an exact value for $$c$$ when $$n$$ is a power of 2).

Although the absolute number of comparisons is typically higher than Batcher's odd-even sort, many of the consecutive operations in a bitonic sort retain a locality of reference, making implementations more cache-friendly and typically more efficient in practice.

How the algorithm works
The following is a bitonic sorting network with 16 inputs:



The 16 numbers enter as the inputs at the left end, slide along each of the 16 horizontal wires, and exit at the outputs at the right end. The network is designed to sort the elements, with the largest number at the bottom.

The arrows are comparators. Whenever two numbers reach the two ends of an arrow, they are compared to ensure that the arrow points toward the larger number. If they are out of order, they are swapped. The colored boxes are just for illustration and have no effect on the algorithm.

Every red box has the same structure: each input in the top half is compared to the corresponding input in the bottom half, with all arrows pointing down (dark red) or all up (light red). If the inputs happen to form a bitonic sequence (a single nondecreasing sequence followed by a single nonincreasing one or vice versa), then the output will form two bitonic sequences. The top half of the output will be bitonic, and the bottom half will be bitonic, with every element of the top half less than or equal to every element of the bottom half (for dark red) or vice versa (for light red). This theorem is not obvious, but can be verified by carefully considering all the cases of how the various inputs might compare, using the zero-one principle, where a bitonic sequence is a sequence of 0s and 1s that contains no more than two "10" or "01" subsequences.

The red boxes combine to form blue and green boxes. Every such box has the same structure: a red box is applied to the entire input sequence, then to each half of the result, then to each half of each of those results, and so on. All arrows point down (blue) or all point up (green). This structure is known as a butterfly network. If the input to this box happens to be bitonic, then the output will be completely sorted in increasing order (blue) or decreasing order (green). If a number enters the blue or green box, then the first red box will sort it into the correct half of the list. It will then pass through a smaller red box that sorts it into the correct quarter of the list within that half. This continues until it is sorted into exactly the correct position. Therefore, the output of the green or blue box will be completely sorted.

The green and blue boxes combine to form the entire sorting network. For any arbitrary sequence of inputs, it will sort them correctly, with the largest at the bottom. The output of each green or blue box will be a sorted sequence, so the output of each pair of adjacent lists will be bitonic, because the top one is blue and the bottom one is green. Each column of blue and green boxes takes N sorted sequences and concatenates them in pairs to form N/2 bitonic sequences, which are then sorted by the boxes in that column to form N/2 sorted sequences. This process starts with each input considered to be a sorted list of one element, and continues through all the columns until the last merges them into a single, sorted list. Because the last stage was blue, this final list will have the largest element at the bottom.

Alternative representation
Each green box, in the diagram above, performs the same operation as a blue box, but with the sort in the opposite direction. So, each green box could be replaced by a blue box followed by a crossover where all the wires move to the opposite position. This would allow all the arrows to point the same direction, but would prevent the horizontal lines from being straight. However, a similar crossover could be placed to the right of the bottom half of the outputs from any red block, and the sort would still work correctly, because the reverse of a bitonic sequence is still bitonic. If a red box then has a crossover before and after it, it can be rearranged internally so the two crossovers cancel, so the wires become straight again. Therefore, the following diagram is equivalent to the one above, where each green box has become a blue plus a crossover, and each orange box is a red box that absorbed two such crossovers:



The arrowheads are not drawn, because every comparator sorts in the same direction. The blue and red blocks perform the same operations as before. The orange blocks are equivalent to red blocks where the sequence order is reversed for the bottom half of its inputs and the bottom half of its outputs. This is the most common representation of a bitonic sorting network. Unlike the previous interpretation, because the elements remain logically ordered, it's easy to extend this representation to a non-power-of-two case (where each compare-and-swap ignores any case where the larger index is out of range).

Example code
The following is a recursion-free implementation of the bitonic mergesort when the array length is a power of two: