Counter-based random number generator

A counter-based random number generator (CBRNG, also known as a counter-based pseudo-random number generator, or CBPRNG) is a kind of pseudorandom number generator that uses only an integer counter as its internal state. CBRNGs are generally used to generate pseudorandom numbers for large parallel computations.

Background
We can think of a pseudorandom number generator (PRNG) as a function that transforms a series of bits known as the state into a new state and a random number.

That is, given a PRNG function and an initial state $$\mathrm{state}_0$$, we can repeatedly use the PRNG to generate a sequence of states and random numbers.

$$ \begin{align} \mathrm{PRNG}(\mathrm{state}_0) &= \mathrm{state}_1,\ \mathrm{num}_1 \\ \mathrm{PRNG}(\mathrm{state}_1) &= \mathrm{state}_2,\ \mathrm{num}_2 \\ \mathrm{PRNG}(\mathrm{state}_2) &= \mathrm{state}_3,\ \mathrm{num}_3 \\ \mathrm{PRNG}(\mathrm{state}_3) &= \ldots \end{align} $$

In some PRNGs, such as the Mersenne Twister, the state is large, more than 2048 bytes. In other PRNGs, such as xorshift, $$\mathrm{state}_i$$ and $$\mathrm{num}_i$$ are one and the same (and so the state is small, just 4, 8, or 16 bytes, depending on the size of the numbers being generated). But in both cases, and indeed in most traditional PRNGs, the state evolves unpredictably, so if you want to calculate a particular $$\mathrm{state}_i$$ given an initial state $$\mathrm{state}_0$$, you have to calculate $$\mathrm{state}_1$$, $$\mathrm{state}_2$$, and so on, running the PRNG $$i$$ times.

Such algorithms are inherently sequential and not amenable to running on parallel machines like multi-core CPUs and GPUs.
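The sequential dependency described above can be sketched with a 32-bit xorshift generator. This is a minimal illustration, not a recommended generator; the shift amounts (13, 17, 5) are one commonly published xorshift triple, and the helper name nth_state is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def xorshift32(state):
    """One step of a 32-bit xorshift PRNG: returns (new_state, number)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    # For xorshift, the new state and the output number are the same value.
    return state, state

def nth_state(state0, i):
    """Hypothetical helper: reach state_i by stepping the generator i times."""
    state = state0
    for _ in range(i):
        state, _num = xorshift32(state)
    return state
```

Reaching $$\mathrm{state}_i$$ this way costs $$i$$ calls, and each call needs the result of the one before it, so the loop cannot simply be split across threads.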

In contrast, a counter-based random number generator (CBRNG) is a PRNG where the state "evolves" in a particularly simple manner: $$\mathrm{state}_i = i$$. This way you can generate each number independently, without knowing the result of the previous call to the PRNG.

This property makes it easy to run a CBRNG on multiple CPU threads or a GPU. For example, to generate $$n$$ random numbers on a GPU, you might spawn $$n$$ threads and have the $$i$$th thread calculate $$\mathrm{PRNG}(i)$$.
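As a rough sketch of this idea on a CPU, the following uses a thread pool in which each task computes $$\mathrm{PRNG}(i)$$ from the counter alone. The function cbrng is a hypothetical stand-in: it applies mixing steps borrowed from the SplitMix64 finalizer to an offset counter and is not one of the generators discussed in this article. The names generate and workers are also assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

MASK64 = 0xFFFFFFFFFFFFFFFF

def cbrng(i):
    """Hypothetical stand-in for PRNG(i): scramble a 64-bit counter."""
    z = (i + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)

def generate(n, workers=4):
    """Compute PRNG(0), ..., PRNG(n-1); no task depends on any other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in counter order, whichever task finishes first.
        return list(pool.map(cbrng, range(n)))
```

Python threads are used here only to show that the calls are independent; on a GPU the same pattern would have each thread evaluate the function at its own index.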

CBRNGs based on block ciphers
Some CBRNGs are based on reduced-strength versions of block ciphers. Below we explain how this works.

When using a cryptographic block cipher in counter mode, you generate a series of blocks of random bits. The $$i$$th block is calculated by encrypting the number $$i$$ using the encryption key $$k$$: $$\mathrm{Block}_i = E(i, k)$$.

This is similar to a CBRNG, where you calculate the $$i$$th random number as $$\mathrm{PRNG}(i)$$. Indeed, any block cipher can be used as a CBRNG: simply let $$\mathrm{PRNG}(i) = E(i, \mathrm{seed})$$.
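To make $$\mathrm{PRNG}(i) = E(i, \mathrm{seed})$$ concrete, here is a minimal sketch that uses the XTEA block cipher as $$E$$. XTEA is chosen only because it fits in a few lines; it is not one of the ciphers named in this article. The way the 128-bit seed and the 64-bit counter are packed into integers is an assumption of this sketch, so its output should not be expected to match published XTEA test vectors.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9

def xtea_encrypt(block, key, rounds=32):
    """Encrypt a 64-bit integer block under a 128-bit integer key with XTEA."""
    # Assumed packing: key word j is bits 32*j .. 32*j+31 of the key integer.
    k = [(key >> (32 * j)) & MASK32 for j in range(4)]
    v0, v1 = (block >> 32) & MASK32, block & MASK32
    total = 0
    for _ in range(rounds):
        mix = ((((v1 << 4) & MASK32) ^ (v1 >> 5)) + v1) & MASK32
        v0 = (v0 + (mix ^ ((total + k[total & 3]) & MASK32))) & MASK32
        total = (total + DELTA) & MASK32
        mix = ((((v0 << 4) & MASK32) ^ (v0 >> 5)) + v0) & MASK32
        v1 = (v1 + (mix ^ ((total + k[(total >> 11) & 3]) & MASK32))) & MASK32
    return (v0 << 32) | v1

def prng(i, seed):
    """PRNG(i) = E(i, seed): the i-th 64-bit number is the encrypted counter."""
    return xtea_encrypt(i, seed)
```

Because a block cipher is a permutation of its block space for a fixed key, distinct counters always yield distinct outputs under the same seed.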

This yields a strong, cryptographically secure source of randomness. But cryptographically secure pseudorandom number generators tend to be slow compared to insecure PRNGs, and in practice many uses of random numbers don't require this degree of security.

In 2011, Salmon et al. at D. E. Shaw Research introduced two CBRNGs based on reduced-strength versions of block ciphers.

 * Threefry uses a reduced-strength version of the Threefish block cipher. (Juvenile fish are known as "fry".)
 * ARS uses a reduced-strength version of the AES block cipher. ("ARS" is a pun on "AES"; "AES" stands for "Advanced Encryption Standard", and "ARS" stands for "Advanced Randomization System".)

ARS is used in recent versions of Intel's Math Kernel Library and gets good performance by using instructions from the AES-NI instruction set, which specifically accelerate AES encryption.

Code implementing Threefry, ARS, and Philox (see below) is available from the authors.

CBRNGs based on multiplication
In addition to Threefry and ARS, Salmon et al. described a third counter-based PRNG, Philox, based on wide multiplies; e.g. multiplying two 32-bit numbers and producing a 64-bit number, or multiplying two 64-bit numbers and producing a 128-bit number.
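The wide multiply mentioned above can be sketched as follows: two 32-bit inputs give a 64-bit product, which is split into high and low halves. The functions toy_round and toy_generator are hypothetical and only loosely follow the multiply-then-mix structure. The multiplier, the key increment, and the round count are assumptions chosen for illustration, and this sketch does not claim to reproduce Philox output.

```python
MASK32 = 0xFFFFFFFF

def mulhilo32(a, b):
    """Multiply two 32-bit numbers; return (high, low) halves of the 64-bit product."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32

def toy_round(left, right, key, multiplier=0xD256D193):
    """One illustrative round: wide multiply, then XOR in the key and the other half."""
    hi, lo = mulhilo32(multiplier, left)
    return hi ^ key ^ right, lo

def toy_generator(counter, key, rounds=10):
    """Hypothetical multiply-based CBRNG: scramble a 64-bit counter with a 32-bit key."""
    left, right = (counter >> 32) & MASK32, counter & MASK32
    for _ in range(rounds):
        left, right = toy_round(left, right, key)
        key = (key + 0x9E3779B9) & MASK32  # vary the key from round to round
    return (left << 32) | right
```

Because the multiplier is odd, each round is invertible, so distinct counters map to distinct outputs for a fixed key.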

As of 2020, Philox is popular on CPUs and GPUs. On GPUs, Nvidia's cuRAND library and TensorFlow provide implementations of Philox. On CPUs, Intel's MKL provides an implementation.

A more recent CBRNG based on multiplication is the Squares RNG. This generator passes stringent tests of randomness and is considerably faster than Philox.