There are four variants of the algorithm for shuffling an array,
arising from two independent choices:
- whether to swap elements in the higher or lower parts of the array
- whether the boundary between the parts moves upwards or downwards
The variants are perfectly symmetrical, but they work in two
fundamentally different ways: sampling or permutation.
The most common variant is Richard Durstenfeld’s shuffle
algorithm, which moves the boundary downwards and swaps
elements in the lower part of the array. Knuth describes it in TAOCP
vol. 2 sect. 3.4.2; TAOCP doesn’t discuss the other variants.
(Obeying Stigler’s law, it is often called a “Fisher-Yates”
shuffle, but their pre-computer algorithm is arguably different from
the modern algorithm.)
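As a concrete reference point, here is a minimal sketch of Durstenfeld's variant in C. (This is my illustration, not a listing from TAOCP; `rand() % (i + 1)` stands in for a proper unbiased bounded generator and is slightly biased for non-power-of-two ranges.)

```c
#include <stdlib.h>

/* Durstenfeld's variant: the boundary i moves downwards, and each step
 * swaps a uniformly chosen element of the lower part a[0..i] into its
 * final place at a[i]. */
void shuffle(int a[], size_t n) {
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1); /* 0 <= j <= i */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

The other three variants come from swapping in the upper part `a[i..n-1]` instead, and/or moving the boundary upwards from 0.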
Concrete tetrapods are used to dissipate wave energy in
coastal defences.
There’s a bit of a craze for making tetrapod-shaped things: recently
I’ve seen people making a plush tetrapod and a tetrapod
lamp. So I thought it might be fun to model one.
I found a nice way to describe tetrapods that relies on very few
arbitrary aesthetic choices.
Recently, Chris “cks” Siebenmann has been working on
ratelimiting HTTP bots that are hammering his blog. His articles
prompted me to write some clarifications, plus a few practical
anecdotes about ratelimiting email.
Although it looks really good, I have not yet tried the Jujutsu (jj)
version control system, mainly
because it’s not yet clearly superior to Magit.
But I have been following jj discussions with great interest.
One of the things that jj has not yet tackled is how to do better than
git refs / branches / tags. As I understand it, jj currently has
something like Mercurial bookmarks, which are more like raw git ref
plumbing than a high-level porcelain feature. In particular, jj lacks
signed or annotated tags, and it doesn’t have branch names that
always automatically refer to the tip.
This is clearly a temporary state of affairs: jj is still incomplete
and under development, and these gaps are going to be filled. But the
discussions have led me to think about how git’s
branches are unsatisfactory, and what could be done to improve them.
What does it mean when someone writes that a programming language is
“strongly typed”?
I’ve known for many years that “strongly typed” is a poorly-defined
term. Recently I was prompted on Lobsters to
explain why it’s hard to understand what someone means when they use
the phrase.
Previously, I wrote some sketchy ideas for what I call a p-fast
trie, which is basically a wide fan-out variant of an x-fast
trie. It allows you to find the longest matching prefix or nearest
predecessor or successor of a query string in a set of names in
O(log k) cache misses, where k is the key length.
My initial sketch was more complicated and greedy for space than
necessary, so here’s a simplified revision.
Here’s a sketch of an idea that might or might not be a good idea.
Dunno if it’s similar to something already described in the literature
– if you know of something, please let me know via the links in the
footer!
The gist is to throw away the tree and interior pointers from a
qp-trie. Instead, the p-fast trie is stored using a hash map organized
into stratified levels, where each level corresponds to a prefix of
the key.
Exact-match lookups are normal O(1) hash map lookups. Predecessor /
successor searches use binary chop on the length of the key. Where a
qp-trie search is O(k), where k is the length of the key, a p-fast
trie search is O(log k).
This smaller O(log k) bound is why I call it a “p-fast trie” by
analogy with the x-fast trie, which has O(log log N) query time. (The
“p” is for popcount.) I’m not sure if this asymptotic improvement is
likely to be effective in practice; see my thoughts towards the end of
this note.
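The binary chop on key length can be sketched in C. This is my illustration, not the data structure itself: a tiny fixed table stands in for the per-level hash maps. The property that makes the chop valid is that the stored set is closed under taking prefixes (every ancestor prefix of a stored name is also stored), so the matching lengths form a contiguous range.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for the stratified per-level hash maps: a table of stored
 * prefixes, closed under taking prefixes. */
static const char *prefixes[] = { "c", "ca", "cat", "d", "do", "dog" };

static bool has_prefix(const char *key, size_t len) {
    for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
        if (strlen(prefixes[i]) == len && strncmp(prefixes[i], key, len) == 0)
            return true;
    return false;
}

/* Binary chop on prefix length: O(log k) membership tests, each of
 * which would be an O(1) hash map lookup in the real structure. */
static size_t longest_prefix(const char *key) {
    size_t lo = 0, hi = strlen(key); /* the empty prefix always matches */
    while (lo < hi) {
        size_t mid = (lo + hi + 1) / 2;
        if (has_prefix(key, mid))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

For example, `longest_prefix("cattle")` tests lengths 4, 2, 3 and returns 3, the length of the stored prefix "cat".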
At the time I had been responsible for Cambridge
University’s email anti-spam system for about 10 years, and in 2014 I
had been given responsibility for Cambridge University’s DNS. So I
knew how Let’s Encrypt should validate mail domains.
Let’s Encrypt was about one year old. Unusually, the code
that runs their operations, Boulder, is free software and open to
external contributors.
Boulder is written in Golang, and I had not previously written any
code in Golang. But it has a reputation for being easy to get to grips with.
So, in principle, the bug was straightforward for me to fix. How
difficult would it be as a Golang newbie? And what would Let’s
Encrypt’s contribution process be like?
I cloned the Boulder repository and had a look around the code.
As is pretty typical, there are a couple of stages to fixing a bug in
an unfamiliar codebase:
- work out where the problem is
- try to understand if the obvious fix could be better
In this case, I remember discovering a relatively substantial TODO
item that intersected with the bug. I can’t remember the details, but
I think there were wider issues with DNS lookups in Boulder. I decided
it made sense to fix the immediate problem without getting involved in
things that would require discussion with Let’s Encrypt staff.
I faffed around with the code and pushed something that looked like it
might work.
A fun thing about this hack is that I never got a working Boulder test
setup on my workstation (or even Golang, I think!) – I just relied on
the Let’s Encrypt cloud test setup. The feedback time was very slow,
but it was tolerable for a simple one-off change.
I thought Golang (at least as it was used in the Boulder codebase) was
as easy to get to grips with as promised. I did not touch it again
until several years later, because there was no need to, but it seemed
fine.
I was very impressed by the Let’s Encrypt continuous integration and
automated testing setup, and by their low-friction workflow for
external contributors. It was one of my fastest drive-by patches to
get into worldwide production.
My fix was always going to be temporary, and all trace of it was
overwritten years ago. It’s good when “temporary” turns out to be
true!
I was reminded of this story in the pub this evening, and I thought it
was worth writing down. It demonstrated to me that Let’s Encrypt
really were doing all the good stuff they said they were doing.
So thank you to Let’s Encrypt for providing an exemplary service and
for giving me a happy little anecdote.
A couple of years ago I wrote about random floating point
numbers. In that article I was mainly concerned about how
neat the code is, and I didn’t pay attention to its performance.
In hot weather I like to drink my coffee in an iced latte. To make it,
I have a very large Bialetti Moka Express. Recently when I
got it going again after a winter of disuse, it took me a couple of
attempts to get the technique right, so here are some notes as a
reminder to my future self next year.
About half a year ago I encountered a paper bombastically
titled “the ultimate conditional syntax”. It has the
attractive goal of unifying pattern match with boolean if tests,
and its solution is in some ways very nice. But it seems
over-complicated to me, especially for something that’s a basic
work-horse of programming.
I couldn’t immediately see how to cut it down to manageable
proportions, but after reading syntactic musings on match expressions
by Yoshua Wuyts a couple of weeks ago, I had an idea. I’ll
outline it under the “penultimate conditionals” heading below, after
reviewing the UCS and explaining my motivation.
Last year I wrote about inlining just the fast path of Lemire’s
algorithm for nearly-divisionless unbiased bounded random
numbers. The idea was to reduce code bloat by eliminating lots
of copies of the random number generator in the rarely-executed slow
paths. However a simple split prevented the compiler from being able
to optimize cases like pcg32_rand(1 << n), so a lot of the blog post
was toying around with ways to mitigate this problem.
On Monday while procrastinating a different blog post, I realised that
it’s possible to do better: there’s a more general optimization which
gives us the 1 << n special case for free.
1. use multiplication instead of division to reduce the output of a
   random number generator modulo some limit
2. eliminate the bias in (1) by (counterintuitively) looking at the
   lower digits
3. fun modular arithmetic to calculate the reject threshold for (2)
4. arrange the reject tests to avoid the slow division in (3) in most cases
The nearly-divisionless logic in (4) leads to two copies of the random
number generator, in the fast path and the slow path.
Generally speaking, compilers don’t try to deduplicate code that was
written by the programmer, so they can’t simplify the
nearly-divisionless algorithm very much when the limit is constant.
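A sketch of the merged shape (my reconstruction, not the post’s exact listing, with a minimal single-stream pcg32 included for self-containedness):

```c
#include <stdint.h>

/* Minimal single-stream PCG32, so the sketch stands alone. */
static uint64_t pcg_state = 0x853c49e6748fea9bULL;

static uint32_t pcg32_random(void) {
    uint64_t old = pcg_state;
    pcg_state = old * 6364136223846793005ULL + 1442695040888963407ULL;
    uint32_t xorshifted = (uint32_t)(((old >> 18) ^ old) >> 27);
    uint32_t rot = (uint32_t)(old >> 59);
    return (xorshifted >> rot) | (xorshifted << ((32 - rot) & 31));
}

/* One loop, one call to the generator. The reject threshold is
 * 2^32 mod limit; when limit is a compile-time power of two the
 * threshold folds to zero, so the compiler can delete the loop. */
uint32_t pcg32_rand(uint32_t limit) {
    uint32_t reject = -limit % limit;
    uint64_t num;
    do num = (uint64_t)pcg32_random() * (uint64_t)limit;
    while ((uint32_t)num < reject);
    return (uint32_t)(num >> 32);
}
```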
This has only one call to pcg32_random(), saving space as I wanted,
and the compiler is able to eliminate the loop automatically when the
limit is a power of two. The loop is smaller than a call to an
out-of-line slow path function, so it’s better all round than the code
I wrote last year.
As before it’s possible to automatically choose the
constantly-divisionless or nearly-divisionless algorithms depending on
whether the limit is a compile-time constant or run-time variable,
using arcane C tricks or GNU C
__builtin_constant_p().
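The GNU C trick can be seen in isolation with a tiny demo (my illustration, with made-up names; it needs GCC or Clang):

```c
/* Two stand-in implementations: in the real code these would be the
 * constantly-divisionless and nearly-divisionless algorithms. */
static int used_constant_path(unsigned limit) { (void)limit; return 1; }
static int used_variable_path(unsigned limit) { (void)limit; return 0; }

/* __builtin_constant_p(x) is 1 only when the compiler can see that x
 * is a compile-time constant, so the ternary folds away and each call
 * site gets exactly one of the two implementations. */
#define bounded_rand(limit) \
    (__builtin_constant_p(limit) ? used_constant_path(limit) \
                                 : used_variable_path(limit))
```

A literal argument like `bounded_rand(16)` selects the constant path; a runtime variable selects the other.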
I have been idly wondering how to do something similar in other languages.
Rust isn’t very keen on automatic specialization, but it has a
reasonable alternative. The thing to avoid is passing a runtime
variable to the constantly-divisionless algorithm, because then it
becomes never-divisionless. Rust has a much richer notion of
compile-time constants than C, so it’s possible to write a method
like the following, which can’t be misused:
(embiggening_mul is my stable replacement for the unstable widening_mul API.)
This is a nugatory optimization, but there are more interesting cases
where it makes sense to choose a different implementation for constant
or variable arguments – that is, the constant case isn’t simply a
constant-folded or partially-evaluated version of the variable case.
Regular expressions might be lex-style or pcre-style, for example.
It’s a curious question of language design whether it should be
possible to write a library that provides a uniform API that
automatically chooses constant or variable implementations, or whether
the user of the library must make the choice explicit.
Maybe I should learn some Zig to see how its comptime works.
However, PCG only has a limited amount of ILP, about 3 instructions.
Its overall speed is limited by the rate at which a CPU can run a
sequence where the output of one multiply-add feeds into the next
multiply-add.
… Or is it?
With some linear algebra and some AVX512, I can generate random
numbers from a single instance of pcg32 at 200 Gbit/s on a single
core. This is the same sequence of random numbers generated in the
same order as normal pcg32, but more than 4x faster.
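The AVX512 code itself isn’t shown here, but the enabling observation can be sketched in scalar C: the LCG state update is affine, so any fixed number of steps composes into a single affine update, which lets several lanes each run a strided copy of the same sequence in parallel. (My illustration, using the common pcg32 LCG constants.)

```c
#include <stdint.h>

/* One LCG step: s' = a*s + c (mod 2^64). */
#define LCG_A 6364136223846793005ULL
#define LCG_C 1442695040888963407ULL

static uint64_t lcg_step(uint64_t s) { return s * LCG_A + LCG_C; }

/* Compose k steps into one affine update s -> ak*s + ck. Each of k
 * parallel lanes, seeded i steps apart, can then advance by k per
 * iteration using (ak, ck) and still reproduce the original stream. */
static void lcg_jump(unsigned k, uint64_t *ak, uint64_t *ck) {
    uint64_t a = 1, c = 0;
    while (k--) {
        c = c * LCG_A + LCG_C; /* prepend one step to the composite */
        a = a * LCG_A;
    }
    *ak = a;
    *ck = c;
}
```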
The International Obfuscated C Code Contest has a newly
revamped web site, and the Judges have announced the 28th contest, to
coincide with its 40th anniversary. (Or 41st?)
The Judges have also updated the archive of past winners so
that as many of them as possible work on modern systems. Accordingly,
I took a look at my 1998 winner to see how much damage time
hath wrought.
D’oh, I lost track of a bug report that should have been fixed in
nsnotifyd-2.2. Thus, hot on the heels of the previous release,
here’s nsnotifyd-2.3. Sorry for causing extra work to
my uncountably many users!
The nsnotifyd daemon monitors a set of DNS zones and runs a command
when any of them change. It listens for DNS NOTIFY messages so it can
respond to changes promptly. It also uses each zone’s SOA refresh and
retry parameters to poll for updates if nsnotifyd does not receive
NOTIFY messages more frequently. It comes with a client program
nsnotify for sending notify messages.
When nsnotifyd received a SIGINT or SIGTERM while running the
command, it failed to handle it correctly. Now it exits promptly.
Many thanks to Athanasius for reporting the bug!
Miscellaneous minor code cleanup and compiler warning suppression.
Thanks also to Dan Langille who sent me a lovely appreciation:
Now that I think of it, nsnotifyd is in my favorite group of
software. That group is software I forget I’m running, because they
just run and do the work. For years. I haven’t touched, modified, or
configured nsnotifyd and it just keeps doing the job.
I have made
a new release of nsnotifyd,
a tiny DNS server that just listens for NOTIFY messages
and runs a script when one of your zones changes.