Last week I was interested to read about the proposed `math/rand/v2`

for Golang’s standard library. It mentioned a new-ish flavour
of PCG random number generator which I had not previously encountered,
called PCG64 DXSM. This blog post collects what I have learned about
it. (I have not found a good summary elsewhere.)

At the end there is source code for PCG64 DXSM that you can freely copy and use.

## background

Here’s the bit where the author writes their life story, and all the readers skip several screens full of text to get to the recipe.

Occasionally I write randomized code that needs a pseudorandom number generator that is:

- Not gratuitously terrible;
- Seedable, so I can reproduce test cases;
- Does not tank multicore performance by using global mutable state;
- Runs everywhere without portability faff.

The various `libc`

random number generators can satisfy one or two of
my requirements, if I am lucky.

So I grab a copy of PCG and use that. It’s only like 10 lines of code, and PCG’s creator isn’t an arsehole.

## how pcg works

PCG random number generators are constructed from a collection of linear congruential generators, and a collection of output permutations.

A linear congruential random number generator looks like:

```
state = state * mul + inc;
```

The multiplier `mul`

is usually fixed; the increment `inc`

can be
fixed, but PCG implementations usually allow it to be chosen when the
RNG is seeded.

A bare LCG is a bad RNG. PCG turns an LCG into a good RNG:

- The LCG state is twice the size of the RNG output
- The LCG state is permuted to produce the RNG output

PCG’s output permutations have abbreviated names like XSH (xor-shift),
RR (random rotate), RXS (random xor-shift), XSL (xor shift right
[*sic*]), etc.

## using pcg

The reference implementation of PCG in C++ allows you to mix and match LCGs and output permutations at a variety of integer sizes. There is a bewildering number of options, and the PCG paper explains the desiderata at length. It is all very informative if you are a researcher interested in exploring the parameter space of a family of random number generators.

But it’s all a bit too much when all I want is a replacement for
`rand()`

and `srand()`

.

## pcg32

For 32-bit random numbers, PCG has a straightforward solution in the
form of `pcg_basic.c`

.

In C++ PCG this standard 32-bit variant is called
`pcg_engines::setseq_xsh_rr_64_32`

or simply `pcg32`

for short.

(There is a caveat, tho: `pcg32_boundedrand_r()`

would be faster if it
used Daniel Lemire’s nearly-divisionless unbiased rejection sampling
algorithm for bounded random numbers.)

## pcg64

For 64-bit random numbers it is not so simple.

There is no 64-bit equivalent of `pcg_basic.c`

. The reference
implementations have a blessed flavour called `pcg64`

, but it isn’t
trivial to unpick the source code’s experimental indirections to get a
10 line implementation.

And even if you do that, you won’t get the best 64-bit flavour, which is:

## pcg64 dxsm

PCG64 DXSM is used by NumPy. It is a relatively new flavour of PCG,
which addresses a minor shortcoming of the original `pcg64`

that arose
in the discussion when NumPy originally adopted PCG.

In the commit that introduced PCG64 DXSM, its creator Melissa O’Neill describes it as follows:

## DXSM – double xor shift multiply

This is a new, more powerful output permutation (added in 2019). It’s a more comprehensive scrambling than RXS M, but runs faster on 128-bit types. Although primarily intended for use at large sizes, also works at smaller sizes as well.

As well as the DXSM output permutation, `pcg64_dxsm()`

uses a “cheap
multiplier”, i.e. a 64-bit value half the width of the state, instead
of a 128-bit value the same width as the state. The same multiplier is
used for the LCG and the output permutation. The cheap multiplier
improves performance: `pcg64_dxsm()`

has fewer full-size 128 bit
calculations.

O’Neill wrote a longer description of the design of PCG64 DXSM, and the NumPy documentation discusses how PCG64DXSM improves on the old PCG64.

In C++ PCG PCG64 DXSM’s full name is
`pcg_engines::cm_setseq_dxsm_128_64`

. As far as I can tell it doesn’t
have a more friendly alias. (The C++ PCG `typedef`

`pcg64`

still refers
to the previously preferred `xsl_rr`

variant.)

In the Rust `rand_pcg`

crate PCG64 DXSM is called
`Lcg128CmDxsm64`

, i.e. a linear congruential generator with 128 bits
of state and a cheap multiplier, using the DXSM permutation with 64
bits of output.

That should be enough search keywords and links, I think.

# pcg64 dxsm implementation

OK, at last, here’s the recipe that you were looking for:

```
// SPDX-License-Identifier: 0BSD OR MIT-0
typedef struct pcg64 {
uint128_t state, inc;
} pcg64_t;
uint64_t pcg64_dxsm(pcg64_t *rng) {
/* cheap (half-width) multiplier */
const uint64_t mul = 15750249268501108917ULL;
/* linear congruential generator */
uint128_t state = rng->state;
rng->state = state * mul + rng->inc;
/* DXSM (double xor shift multiply) permuted output */
uint64_t hi = (uint64_t)(state >> 64);
uint64_t lo = (uint64_t)(state | 1);
hi ^= hi >> 32;
hi *= mul;
hi ^= hi >> 48;
hi *= lo;
return(hi);
}
```

## seeding

The algorithm for seeding PCG takes some raw seed values and conditions them to make a new RNG state that is “not ludicrous” (in the words of Simon Tatham). It is the same for all flavours of PCG:

```
pcg64_t pcg64_seed(pcg64_t rng) {
/* must ensure rng.inc is odd */
rng.inc = (rng.inc << 1) | 1;
rng.state += rng.inc;
pcg64_dxsm(&rng);
return(rng);
}
```

You can pass `pcg64_seed()`

a structure literal containing the raw
seed values, or use it like:

```
pcg_t pcg64_getentropy(void) {
pcg64_t rng;
if (getentropy(&rng, sizeof(rng)) < 0)
err(1, "getentropy");
return (pcg64_seed(rng));
}
```

## 128-bit arithmetic

The code above works on 64-bit systems with `clang`

and `gcc`

, which
have `__uint128_t`

built in.

If you need a C implementation that works on systems without a handy 128-bit integer type, then the PCG64 implementation in NumPy has its own support for 128-bit arithmetic.

Or if you are using C++ then the reference implementation of PCG has another 128-bit arithmetic class.