Draft:Rao–Sandelius shuffle

The Rao–Sandelius shuffle is a divide-and-conquer algorithm for shuffling a finite sequence. The algorithm takes a list of all the elements of the sequence, and for each element draws a uniform random value from 0 to k-1. The list is broken up into k subsequences by the random value chosen, which are further shuffled.

For an array with $$n$$ entries, the algorithm consumes $$O(n \log n)$$ random digits and performs $$O(n \log n)$$ swaps. This appears worse than the Fisher-Yates Shuffle, which consumes $$O(n \log n)$$ random digits and performs $$O(n)$$ swaps. However, because of the random access pattern of the Fisher-Yates shuffle, the Rao-Sandelius Shuffle can be faster for large $$n$$ due to its cache friendliness.

Original method using random digit table
Rao described an algorithm for shuffling the numbers 1 through n using a table of random decimal digits. Sandelius described a procedure for shuffling a deck with n cards using random decimal digits. Sandelius includes the optimization that for a subdeck of length 2, you only need a single random digit. In the original, the order is kept for an odd random digit, and swapped for an even random digit. In either case the algorithm is as follows:


 * 1) For each element to be shuffled, select a random digit 0 through 9.
 * 2) Group elements that selected the same random digit into sub-lists.
 * 3) Place the sub-lists in numerical order based on the digit selected.
 * 4) Repeat on any sub-lists of size 2 or greater.

Length 2 optimization
Sandelius included the following optimization: If the sub-list is of length 2, draw a single random digit. Swap the order if the digit is even, and leave it unchanged if the digit is odd.

Implementation with random bits
It is straightforward to adapt a version of this algorithm to a computer using k=2.


 * 1) For each element to be shuffled, select a random bit.
 * 2) Place all the elements that selected 0 ahead of all elements that selected 1.
 * 3) Repeat on the 2 sub-lists. This step can be parallelized.

The Size-2 speedup can also be applied in this case. Preserving the order if 1 is chosen, and swapping if 0 is chosen.

Here is pseudocode for an in-place version of the algorithm (for a zero-based array):

function rs_shuffle(A, n): if n < 2 then return if n = 2 then: b ← uniform random bit if b = 0 then exchange A[0] and A[1] return rs_shuffle_large(A, n)

function rs_shuffle_large(A, n): front ← 0 back ← n - 1 outer: while true: b ← uniform random bit while b ≠ 1 front ← front + 1 if front > back then break outer b ← uniform random bit while b ≠ 0 back ← back - 1 if front ≥ back then break outer exchange A[front] and A[back] front ← front + 1 back ← back - 1 if front > back then break outer rs_shuffle(A, front) rs_shuffle(A + front, n - front)

Each algorithm pass is effectively an Inverse-GSR 2-shuffle.