Random floating point numbers

Here are a couple of algorithms for generating uniformly distributed floating point numbers 0.0 <= n < 1.0 using an unbiased random bit generator and IEEE 754 double precision arithmetic. Both of them depend on details of how floating point numbers work, so before getting into the algorithms I’ll review IEEE 754.

The first algorithm uses bit hacking and type punning. The second uses a hexadecimal floating point literal. They are both fun and elegant in their own ways, but these days the second one is better.

ieee 754
exponent
fraction
the problem
conversion
bithacking solution
hexadecimal solution
favourite solution

I wrote some follow-up notes on more random floating point numbers discussing another model of uniformity, and making better use of floating point’s dynamic range.

ieee 754

A floating point number has three parts:

a sign bit
an exponent
a significand aka mantissa aka fraction

bit layout of IEEE 754 double precision

It works like a binary version of decimal scientific notation, so its value is

	+/- 1.fff...fff * 2ᴱ

exponent

Instead of using two’s complement, the exponent is stored as a biased integer,

	E = eeeeeeeeeee - 1023

The maximum value of the exponent bits is 2047, and the bias is half that, so 2⁰ is represented as 1023 in the exponent bits.

(A consequence of using biased integers instead of two’s complement is that IEEE 754 floating point numbers sort correctly if you treat them as sign-magnitude integers.)

fraction

In decimal scientific notation, a number is normalized by adjusting its exponent so its leftmost non-zero digit (most significant digit) is immediately to the left of the decimal point.

Similarly, a floating point number has its leftmost non-zero bit immediately to the left of its binary point. Because a non-zero bit is always 1, there is no need to store it explicitly. So double-precision numbers actually have 53 significant bits, of which 52 appear in their representation.

the problem

To get a uniform distribution of random floating point numbers 0.0 <= n < 1.0:

half of the random results must be 0.5 <= n < 1.0, with E = -1;
a quarter of the results must be 0.25 <= n < 0.5, with E = -2;
an eighth must be 0.125 <= n < 0.25, with E = -3;
etc. usw.

To solve this problem we need to take control of integer to floating conversion.

conversion

The algorithm implemented by the FPU to convert a nonzero unsigned number to double precision floating point is:

Count leading zeroes, to get z;
The exponent is 63 - z and the biased exponent is 1086 - z;
Shift the number left by z - 11 so the top bit is in position 52 (very large numbers get shifted right);
Clear bit 52 (which at this point is always 1) and insert the biased exponent into bits 52…62 (inclusive).

bithacking solution

Observe that floating point numbers 1.0 <= n < 2.0 span the same range of values as we want, but all the numbers have the same exponent E = 0. This allows us to avoid all the exponent shenanigans I described in the previous sections.

So the idea is to use bitwise integer operations to put together a number with:

sign bit = 0;
eeeeeeeeeee = 1023 = 0x3FF so that E = 0;
fff...fff filled with unbiased random bits.

Then the bits of the integer are reinterpreted as the bits of a floating point number, subtract 1.0, and that’s the result.

However, C’s strict aliasing rules make it difficult to do this kind of reinterpretation. For example, it is often done using a union, but the C standard says that is undefined behaviour. But there is a loophole in the rules.

    double pcg64_bithack_double(pcg64_t *rng) {
        double d;
        uint64_t u = pcg64_random(rng);
        /* 52 bits of randomness in the little end */
        u = u >> 12;
        /* set exponent to 0 + bias = 1023 */
        u = u | (1023 << 52);
        /* reinterpret as floating point */
        memmove(&d, &u, sizeof(d));
        return(d - 1.0);
    }

Modern compilers will inline the memmove() so the function ends up being compiled correctly and efficiently.

In practice, using a union is OK because too much code would break if compilers were as strict as they could be, and a union will be more efficient with older compilers.

hexadecimal solution

C99 added support for hexadecimal floating point literals, which look like

	0xHHHH.HHHHp+dddd

Slightly weirdly, while the fraction is written in hex, the exponent (p for power of 2) is written in decimal.

Hex floating literals are useful when it’s important to get exact control over the bit pattern in the number, which is much harder if you have to go via decimal.

The idea is to get a random N-bit integer,

    /* 53 bits of randomness in the little end */
    uint64_t u = pcg64_random(rng) >> 11;

convert it to a floating point number, 0.0 <= d < 2^N,

    double d = (double)u;

then divide it by the maximum range of the integer, 2^N, so the result is between 0.0 and 1.0. Or get the same effect by multiplying by 2^-N.

    d *= 0x1.0p-53;

Note that this solution returns 53 bits of randomness because it is able to make use of the implicit most-significant bit, whereas the bithack solution only has 52 bits of randomness. The 53rd bit ends up being encoded in the exponent when the integer is converted to floating point.

As well as using a hex floating literal, this solution assumes that double precision multiplication is fast. Which is true, because it has been the subject of important benchmarks for decades.

favourite solution

Putting it together gives us this beautifully lucid one-liner.

    double pcg64_random_double(pcg64_t *rng) {
        return((double)(pcg64_random(rng) >> 11) * 0x1.0p-53);
    }

This is the code I’m using in pcg-dxsm.git.