# 2020-10-29 – nearly divisionless random numbers

I recently saw Colm MacCárthaigh’s exegesis of Daniel Lemire’s paper about nearly divisionless fast random integer generation. I found myself writing down my own notes on how it works, which take a slightly different angle.

``````
#include <stdint.h>

typedef unsigned __int128 uint128_t; /* GCC/Clang extension */

uint64_t random_u64(void); /* the underlying 64-bit RNG */

uint64_t random_lemire(uint64_t limit) {
    uint64_t raw = random_u64();
    uint128_t sample = (uint128_t)raw * (uint128_t)limit;
    uint64_t frac = (uint64_t)sample;
    if(frac < limit) {
        const uint64_t residue = -limit % limit;
        while(frac < residue) {
            raw = random_u64();
            sample = (uint128_t)raw * (uint128_t)limit;
            frac = (uint64_t)sample;
        }
    }
    return(sample >> 64);
}
``````

(In this explanation, `W == 1<<64` i.e. two to the power of the word size.)

We first get a `raw` random value, which we can think of as a fraction `0 <= raw < 1`, multiplied by `W` so it is represented as a fixed-point number. We calculate a `sample` as `raw * limit`, which is a random fixed-point fractional number `0 <= sample < limit`. Our return value `r` will be the integer part of fixed-point `sample`, which in machine arithmetic is `r == sample / W == sample >> 64`. We will also work with `frac`, the fractional part of `sample`, which in machine arithmetic we get by truncating `sample` to its least significant 64 bits, so that `sample == W*r + frac`.
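To make the fixed-point picture concrete, here is a scaled-down sketch using an 8-bit word, so `W == 256` (the helper names are mine, not from the original code):

``````
#include <stdint.h>

/* Miniature of the 64-bit arithmetic above: an 8-bit word, so W == 256. */

/* integer part of the fixed-point product raw * limit */
uint8_t sample_int8(uint8_t raw, uint8_t limit) {
    uint16_t sample = (uint16_t)raw * (uint16_t)limit;
    return (uint8_t)(sample >> 8);
}

/* fractional part: the least significant 8 bits */
uint8_t sample_frac8(uint8_t raw, uint8_t limit) {
    uint16_t sample = (uint16_t)raw * (uint16_t)limit;
    return (uint8_t)sample;
}
``````

For example, with `raw == 200` and `limit == 10` we get `sample == 2000`, so the integer part is `2000 >> 8 == 7`, the fractional part is `2000 % 256 == 208`, and indeed `256*7 + 208 == 2000`.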

There is a little bias in this basic fixed-point calculation, so we are going to do some rejection sampling.

We will reject and retry when we get one of `residue = W % limit` special `raw` values, so that our return values are sampled from `W - residue == n * limit` random values. Here `n` is the largest possible integer with `n * limit <= W`, which minimizes the probability of a retry.

To avoid overflow when calculating `residue` (because `W` does not fit in a word) we use some modular arithmetic tricks: `W % limit == (W - limit) % limit`, and `W - limit` is the same as `-limit` in unsigned two’s complement arithmetic, so `residue == W % limit == -limit % limit`. We know this division is safe because we know the `limit` is strictly greater than zero after the `if()` test.
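We can cross-check this trick directly, since `W % limit` does fit in 128-bit arithmetic. A sketch, assuming a compiler with the GCC/Clang `unsigned __int128` extension:

``````
#include <stdint.h>

/* residue == W % limit, computed without overflow as in the code above */
uint64_t residue64(uint64_t limit) {
    return -limit % limit; /* -limit wraps to W - limit in unsigned arithmetic */
}

/* the same value computed the obvious way in 128 bits, for verification */
uint64_t residue128(uint64_t limit) {
    unsigned __int128 W = (unsigned __int128)1 << 64;
    return (uint64_t)(W % limit);
}
``````

Both functions agree for any nonzero `limit`; for instance `residue64(10) == 6` because `2**64` ends in a 6.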

We don’t compare `residue` with `raw` and reject a contiguous range of possible `raw` random values, because `sample` uses the entire range of `raw` and there isn’t a straightforward overflow interval that we can reject. Instead we compare `residue` with `frac`, the fractional part of `sample = raw * limit`. The rough idea is to reject `residue/limit` worth of bias for each of the `limit` possible return values in the integer part of `sample`.

More precisely (working in machine arithmetic rather than fixed-point), the range `0 <= sample < W * limit` consists of several intervals, one for each return value `0 <= r < limit`. Interval `r` covers the sub-range

``````
    W*r <= sample < W*(r+1)
``````

In each interval we reject `sample` if it is in the range

``````
    W*r <= sample < W*r + residue
``````

and keep `sample` if it is in the range

``````
    W*r + residue <= sample < W*(r+1)
``````

The code uses truncated `frac` rather than `sample == W*r + frac`, which in effect subtracts out the `W*r` part of these ranges, so the reject test becomes `frac < residue`.
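We can count the samples in one keep interval in the 8-bit model (`W == 256`); `keep_count8` is an illustrative helper, not part of the original algorithm:

``````
#include <stdint.h>

/* count the sample values (multiples of limit) that land in the keep
 * interval for return value r, in the 8-bit model where W == 256 */
unsigned keep_count8(uint8_t limit, uint8_t r) {
    unsigned residue = (uint8_t)-limit % limit; /* W % limit */
    unsigned count = 0;
    for (unsigned raw = 0; raw < 256; raw++) {
        unsigned sample = raw * limit;
        if (256u * r + residue <= sample && sample < 256u * (r + 1))
            count++;
    }
    return count;
}
``````

With `limit == 10` the residue is `256 % 10 == 6`, and every keep interval contains `(256 - 6) / 10 == 25` samples, whichever `r` you ask about.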

Each keep interval spans

``````
    W*(r+1) - (W*r + residue) == W - residue == n * limit
``````

When we map from `raw` to `sample` with `sample = raw * limit`, the same number of `raw` values fall into each keep interval, because each keep interval is the same size and is a multiple of `limit`. As a result, the sampling will be uniform.
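Here is an exhaustive check of this claim in the 8-bit model, using the `frac < residue` form of the reject test (an illustrative sketch, not the original code):

``````
#include <stdint.h>

/* In the 8-bit model (W == 256): for every raw value, run the reject
 * test and tally the kept return values; report whether every return
 * value was kept exactly n == (W - residue) / limit times. */
int counts_are_uniform8(uint8_t limit) {
    unsigned residue = (uint8_t)-limit % limit; /* W % limit */
    unsigned n = (256 - residue) / limit;
    unsigned count[256] = {0};
    for (unsigned raw = 0; raw < 256; raw++) {
        uint16_t sample = (uint16_t)(raw * limit);
        uint8_t frac = (uint8_t)sample;
        if (frac >= residue) /* keep */
            count[sample >> 8]++;
    }
    for (unsigned r = 0; r < limit; r++)
        if (count[r] != n)
            return 0;
    return 1;
}
``````

In this miniature the check passes for every `limit` from 1 to 255.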

Not all reject intervals contain a sample value because `residue < limit`, or (in terms of our earlier intuition) because `residue/limit < 1`. (Sort-of relatedly, the phase alignment of samples in each keep interval is typically different.) Because of this per-return-value lumpiness, and unlike simpler versions of rejection sampling, this algorithm has a timing side-channel that can reveal a small amount of information about its result, so it should not be used for secrets.

The final tweak makes the algorithm nearly divisionless by usually avoiding the calculation of `residue`. We observe that `residue == W % limit < limit`, so we can do an initial rough comparison `frac < limit` to see if it’s worth preparing for the rejection sampling loop. The smaller the `limit` is relative to `W`, the more often this fast path skips the division. It also saves us from having to explicitly check if the `limit` is zero.
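We can see how rarely the slow path runs by counting it exhaustively in the 8-bit model (again an illustrative sketch; in this miniature the slow path turns out to run for exactly `limit` of the 256 possible `raw` values, i.e. with probability `limit / W`):

``````
#include <stdint.h>

/* In the 8-bit model (W == 256): count how many raw values take the
 * slow path, i.e. satisfy the rough test frac < limit. */
unsigned slow_path_count8(uint8_t limit) {
    unsigned count = 0;
    for (unsigned raw = 0; raw < 256; raw++) {
        uint8_t frac = (uint8_t)(raw * limit);
        if (frac < limit)
            count++;
    }
    return count;
}
``````

So with `limit == 10` the division to compute `residue` runs only 10 times out of 256; with a 64-bit word the odds are vanishingly small for typical limits.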

Our return values use the most significant bits of the underlying random number generator, and rejection sampling depends on the next most significant bits (because of the way carries in `raw * limit` cascade over multiples of `W`), so the algorithm still works OK with RNGs that have quality problems in their least significant bits.