I have added syntax highlighting to my blog using tree-sitter. Here are some notes about what I learned, with some complaining.
Last year I wrote about inlining just the fast path of Lemire’s
algorithm for nearly-divisionless unbiased bounded random
numbers. The idea was to reduce code bloat by eliminating lots
of copies of the random number generator in the rarely-executed slow
paths. However, a simple split prevented the compiler from optimizing cases like pcg32_rand(1 << n), so a lot of the blog post was toying around with ways to mitigate this problem.
On Monday while procrastinating a different blog post, I realised that
it’s possible to do better: there’s a more general optimization which
gives us the 1 << n special case for free.
nearly divisionless
Lemire’s algorithm has about 4 neat tricks:
- use multiplication instead of division to reduce the output of a random number generator modulo some limit
- eliminate the bias in (1) by (counterintuitively) looking at the lower digits
- fun modular arithmetic to calculate the reject threshold for (2)
- arrange the reject tests to avoid the slow division in (3) in most cases
The nearly-divisionless logic in (4) leads to two copies of the random number generator, in the fast path and the slow path.
Generally speaking, compilers don’t try to deduplicate code that was written by the programmer, so they can’t simplify the nearly-divisionless algorithm very much when the limit is constant.
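For reference, the nearly-divisionless function looks roughly like this (a sketch along the lines of Lemire’s paper, not the exact code from last year’s post). Note the two calls to pcg32_random(), one in the fast path and one in the rejection loop:

static inline uint32_t
pcg32_rand(pcg32_t *rng, uint32_t limit) {
    uint64_t sample = (uint64_t)pcg32_random(rng) * (uint64_t)limit;
    uint32_t lo = (uint32_t)sample;
    if (lo < limit) {                     // rarely true: slow path
        uint32_t reject = -limit % limit; // the slow division (3)
        while (lo < reject) {             // rejection sampling (2)
            sample = (uint64_t)pcg32_random(rng) * (uint64_t)limit;
            lo = (uint32_t)sample;
        }
    }
    return ((uint32_t)(sample >> 32));    // the multiply trick (1)
}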
constantly divisionless
Two points occurred to me:
- when the limit is constant, the reject threshold (3) can be calculated at compile time
- when the division is free, there’s no need to avoid it using (4)
These observations suggested that when the limit is constant, the function for random numbers less than a limit should be written:
static inline uint32_t
pcg32_rand_const(pcg32_t *rng, uint32_t limit) {
    uint32_t reject = -limit % limit;
    uint64_t sample;
    do sample = (uint64_t)pcg32_random(rng) * (uint64_t)limit;
    while ((uint32_t)(sample) < reject);
    return ((uint32_t)(sample >> 32));
}
This has only one call to pcg32_random(), saving space as I wanted,
and the compiler is able to eliminate the loop automatically when the
limit is a power of two. The loop is smaller than a call to an
out-of-line slow path function, so it’s better all round than the code
I wrote last year.
algorithm selection
As before it’s possible to automatically choose the
constantly-divisionless or nearly-divisionless algorithms depending on
whether the limit is a compile-time constant or run-time variable,
using arcane C tricks or GNU C __builtin_constant_p().
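For example, the selection can be a macro like this (a sketch; pcg32_rand_var() is a hypothetical name for the nearly-divisionless function):

// __builtin_constant_p(limit) folds to 1 when the compiler can
// prove the limit is a compile-time constant, and 0 otherwise
#define pcg32_rand(rng, limit)              \
    (__builtin_constant_p(limit)            \
        ? pcg32_rand_const(rng, limit)      \
        : pcg32_rand_var(rng, limit))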
I have been idly wondering how to do something similar in other languages.
Rust isn’t very keen on automatic specialization, but it has a reasonable alternative. The thing to avoid is passing a runtime variable to the constantly-divisionless algorithm, because then it becomes never-divisionless. Rust has a much richer notion of compile-time constants than C, so it’s possible to write a method like the following, which can’t be misused:
pub fn upto<const LIMIT: u32>(&mut self) -> u32 {
    let reject = LIMIT.wrapping_neg().wrapping_rem(LIMIT);
    loop {
        let (lo, hi) = self.get_u32().embiggening_mul(LIMIT);
        if lo < reject {
            continue;
        } else {
            return hi;
        }
    }
}
assert!(rng.upto::<42>() < 42);
(embiggening_mul is my stable replacement for the unstable widening_mul API.)
This is a nugatory optimization, but there are more interesting cases
where it makes sense to choose a different implementation for constant
or variable arguments – that is, the constant case isn’t simply a
constant-folded or partially-evaluated version of the variable case.
Regular expressions might be lex-style or pcre-style, for example.
It’s a curious question of language design whether it should be
possible to write a library that provides a uniform API that
automatically chooses constant or variable implementations, or whether
the user of the library must make the choice explicit.
Maybe I should learn some Zig to see how its comptime works.
One of the neat things about the PCG random number generator by Melissa O’Neill is its use of instruction-level parallelism: the PCG state update can run in parallel with its output permutation.
However, PCG only has a limited amount of ILP, about 3 instructions. Its overall speed is limited by the rate at which a CPU can run a sequence where the output of one multiply-add feeds into the next multiply-add.
… Or is it?
With some linear algebra and some AVX512, I can generate random numbers from a single instance of pcg32 at 200 Gbit/s on a single core. This is the same sequence of random numbers generated in the same order as normal pcg32, but more than 4x faster.
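For reference, the standard pcg32 generator looks roughly like this (a sketch of O’Neill’s reference implementation, not my AVX512 code). The multiply-add state update feeds into the next call, which is the serial bottleneck; the output permutation depends only on the old state, so it can overlap with the update:

#include <stdint.h>

typedef struct { uint64_t state, inc; } pcg32_t;

static uint32_t pcg32_random(pcg32_t *rng) {
    uint64_t old = rng->state;
    // serial dependency: this multiply-add feeds the next call
    rng->state = old * 6364136223846793005ULL + rng->inc;
    // output permutation: independent of the update above
    uint32_t xorshifted = (uint32_t)(((old >> 18) ^ old) >> 27);
    uint32_t rot = (uint32_t)(old >> 59);
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
}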
The International Obfuscated C Code Contest has a newly revamped web site, and the Judges have announced the 28th contest, to coincide with its 40th anniversary. (Or 41st?)
The Judges have also updated the archive of past winners so that as many of them as possible work on modern systems. Accordingly, I took a look at my 1998 winner to see how much damage time hath wrought.
D’oh, I lost track of a bug report that should have been fixed in nsnotifyd-2.2. Thus, hot on the heels of the previous release, here’s nsnotifyd-2.3. Sorry for causing extra work to my uncountably many users!
The nsnotifyd daemon monitors a set of DNS zones and runs a command when any of them change. It listens for DNS NOTIFY messages so it can respond to changes promptly. It also uses each zone’s SOA refresh and retry parameters to poll for updates if nsnotifyd does not receive NOTIFY messages more frequently. It comes with a client program nsnotify for sending notify messages.
This nsnotifyd-2.3 release includes some bug fixes:
- When nsnotifyd received a SIGINT or SIGTERM while running the command, it failed to handle it correctly. Now it exits promptly. Many thanks to Athanasius for reporting the bug!
- Miscellaneous minor code cleanup and compiler warning suppression.
Thanks also to Dan Langille who sent me a lovely appreciation:
Now that I think of it, nsnotifyd is in my favorite group of software. That group is software I forget I’m running, because they just run and do the work. For years. I haven’t touched, modified, or configured nsnotifyd and it just keeps doing the job.
I have made a new release of nsnotifyd, a tiny DNS server that just listens for NOTIFY messages and runs a script when one of your zones changes.
This nsnotifyd-2.2 release includes a new feature: nsnotify can now send NOTIFY messages from a specific source address.
Thanks to Adam Augustine for the suggestion. I like receiving messages that say things like,
Thanks for making this useful tool available for free.
Recently the Spritely Institute published an introduction to Petnames, A humane approach to secure, decentralized naming.
I have long been a fan of petnames, and graph naming systems in general. I first learned about them in the context of Mark Miller’s E programming language which was a distributed object capability system on the JVM. I gather Spritely are working on something similar, but with a more webby lisp/scheme flavour – and in fact Mark Miller is the second author on this article.
An interesting thing that the article sort of hints at is that these kinds of systems have a fairly smooth range of operating points from decentralized petnames all the way to centralized globally unique names. Zooko’s triangle is actually more like a fan when the “human friendly” point is fixed: there’s an arc describing the tradeoffs, from personal through local to global.
As a first step away from petnames, I like to use the term “nickname” for nearby instances of what they call “edge names” in the article: a nickname is not as personal as a petname, it’s a name you share with others.
A bit further along the arc is the article’s example of the “bizdir” local business directory. It’s a trusted (and hopefully trustworthy) naming authority, but in a more local / federated fashion rather than centralized.
How can a petname system function at the global+centralized point, so it could replace the DNS? It needs to pass the “billboard test”: I type in a name I see on a billboard and I get to the right place. (It might be a multipart name like a postal address or DNS name, with an extra “edge name” or two to provide enough disambiguating context.)
I imagine that an operating system supplier might provide a few preconfigured petnames (it probably includes its own petname so the software can update itself securely), a lot like its preconfigured PKIX CA certificates. These petnames would refer to orgs like the “bizdir”, or Verisign, or Nominet, that act as nickname registries. Your collection of petnames is in effect your personal root zone, and the preconfigured petnames are in effect the default TLDs.
When I was thinking about how a decentralized graph naming system could be made user-friendly and able to pass the billboard test I realised that there are organizational structures that are hard to avoid. I expect there would inevitably be something like the CA/Browser forum to mediate between OS suppliers and nickname registries: a petname ICANN.
I wrote an older iteration of these ideas over a decade ago. Those notes suffer from too many DNS brainworms, but looking back, there’s some curious anti-DNS discussion. How might it be useful to be able to reach a name by multiple paths? Could you use that for attestation? What might it look like to have a shared context for names that is not global but is national or regional or local?
The other day I learned about the Rust crate lexopt which describes itself as,
A pathologically simple command line argument parser.
Most argument parsers are declarative: you tell them what to parse, and they do it. This one provides you with a stream of options and values and lets you figure out the rest.
For “pathologically simple” I still rather like getopt(3) despite its lack of support for long options. Aaron S Cohen wrote getopt in around 1979, and it was released into the public domain by AT&T in 1985. A very useful 50-ish lines of code! It still has almost everything required by POSIX nearly four decades later.
But the description of lexopt made me think getopt() could be simpler. The insight is that the string of options that you have to pass to getopt() is redundant with respect to the code that deals with the return values from getopt(). What if you just get rid of the options string?
I thought I would try it. Turns out, not much is lost in getting rid of the options string, and a few things are gained.
My new code is half the size of getopt(), and has more functionality. I’m going to show how this was done, because it’s short (ish), not because it is interesting. Then I’ll try to tease out a lesson or two.
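As a taste of the idea, here is a minimal sketch (hypothetical names, not the real code, and simplified: no option bundling or attached values). There is no options string: the caller’s switch statement is the only description of the options, and an option claims a value by calling optval().

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int opti = 1;  // index of the next argv word to examine

// return the next single-character option, or -1 when done;
// "--" ends the options explicitly
static int nextopt(int argc, char **argv) {
    char *w = opti < argc ? argv[opti] : NULL;
    if (w == NULL || w[0] != '-' || w[1] == '\0')
        return -1;                 // not an option: stop scanning
    opti++;
    if (strcmp(w, "--") == 0)
        return -1;                 // explicit end of options
    // anything longer than "-x" is an error in this sketch
    return w[2] == '\0' ? (unsigned char)w[1] : '?';
}

// claim the next word as the current option's value; only the
// caller knows which options take values, so there is no options
// string to keep in sync with the switch statement
static const char *optval(int argc, char **argv) {
    return opti < argc ? argv[opti++] : NULL;
}

int main(int argc, char **argv) {
    bool verbose = false;
    const char *outfile = NULL;
    for (int opt; (opt = nextopt(argc, argv)) != -1;) {
        switch (opt) {
        case 'v': verbose = true; break;
        case 'o': outfile = optval(argc, argv); break;
        default: fprintf(stderr, "usage: example [-v] [-o file]\n");
                 exit(1);
        }
    }
    // operands remain in argv[opti..argc-1]
    printf("verbose=%d outfile=%s\n", verbose, outfile ? outfile : "-");
    return 0;
}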
I commented on Lobsters that /tmp is usually a bad idea, which caused some surprise. I suppose /tmp security bugs were common in the 1990s when I was learning Unix, but they are pretty rare now so I can see why less grizzled hackers might not be familiar with the problems.
I guess that’s some kind of success, but sadly the fixes have left
behind a lot of scar tissue because they didn’t address the underlying
problem: /tmp should not exist.
It’s a bad idea because it’s shared global mutable state that crosses security boundaries. There’s a ton of complexity at all levels of unix (filesystems, kernel APIs, libc, shell, admin scripts) that only exists as a workaround for the dangers caused by making /tmp shared.
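For a flavour of the classic problem, here is a sketch of the kind of bug that used to be everywhere: a privileged program using a predictable name in shared /tmp.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // UNSAFE: a predictable name in shared /tmp; if an attacker has
    // pre-created it as a symlink to a file the program can write,
    // this open() follows the link and clobbers the target
    int bad = open("/tmp/myapp.tmp", O_CREAT | O_WRONLY | O_TRUNC, 0600);

    // the traditional mitigation: mkstemp() picks an unpredictable
    // name and creates it with O_CREAT|O_EXCL, which refuses to
    // follow an existing file or symlink
    char name[] = "/tmp/myapp.XXXXXX";
    int good = mkstemp(name);

    printf("bad=%d good=%d (%s)\n", bad, good, name);
    return 0;
}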
A couple of notable things have happened in recent months:
- There is a new edition of POSIX for 2024. There’s lots of good stuff in it, but today I am writing about getentropy() which is the first officially standardized POSIX API for getting cryptographically secure random numbers.
- On Linux the getentropy(3) function is based on the getrandom(2) system call. In Linux 6.11 there is a new vDSO call, vgetrandom(), that makes it possible to implement getentropy() entirely in userland, which should make it significantly faster.
UUID v4 and v7 are great examples of the need for high performance secure random numbers: you don’t want the performance of your database inserts to be limited by your random number generator! Another example is DNS source port and query ID randomization which help protect DNS resolvers against forged answers.
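The API itself is pleasingly small. A minimal example of how a call looks (my sketch):

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>   // getentropy() on glibc; <sys/random.h> elsewhere

int main(void) {
    uint8_t key[32];
    // fills the buffer with cryptographically secure random bytes;
    // it fails if the request is larger than 256 bytes
    if (getentropy(key, sizeof(key)) != 0) {
        perror("getentropy");
        return 1;
    }
    for (size_t i = 0; i < sizeof(key); i++)
        printf("%02x", key[i]);
    printf("\n");
    return 0;
}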
I was inspired to play with getentropy() by a blog post about getting a few secure random bytes in PostgreSQL without pgcrypto: it struck me that PostgreSQL doesn’t use getentropy(), and I thought it might be fun (and possibly even useful!) to add support for it.
I learned a few things along the way!
Following my previous post on rate limiting with GCRA, leaky buckets without the buckets, I reviewed my old notes on rate limiting for Exim. I thought I should do a new write-up of the ideas that I hope will be more broadly interesting.
Exponential rate limiting uses an exponentially-weighted moving average to measure the client’s rate. It is motivated by a shift of perspective:
- first measure the client’s rate,
- then compare it to the limit.
Algorithms like GCRA and leaky bucket don’t allow you to separate these two points because they don’t measure the client’s rate as a concrete number.
A moving average allows more flexible policy enforcement because the rate measurement is meaningful even when you don’t apply back-pressure. For example, it’s useful in a dry run mode, or when diverting messages to a quarantine.
An exponential rate limiter stores, for each client:
- last update time
- average rate
This is a similar amount of space as leaky bucket. GCRA uses less space because it only needs to store a time.
The main disadvantage is that an exponential rate limiter needs fairly complicated floating point arithmetic.
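To sketch the arithmetic (this is the shape of the update step, not the exact Exim code): on each event, decay the stored average according to the time since the last update, then blend in the instantaneous rate.

#include <math.h>

// per-client state: a similar amount of space as leaky bucket
struct client {
    double time;    // last update time, in seconds
    double rate;    // average rate, in events per second
};

// update the moving average on each event; "period" is the
// smoothing window in seconds
double measure(struct client *c, double now, double period) {
    double interval = now - c->time;
    if (interval <= 0) interval = 1e-9;      // simultaneous events
    double alpha = exp(-interval / period);  // decay: old data fades
    double inst = 1.0 / interval;            // this event's rate
    c->rate = alpha * c->rate + (1.0 - alpha) * inst;
    c->time = now;
    return c->rate;  // the policy compares this against the limit
}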
Yesterday I read an article describing the GCRA rate limiting algorithm. I thought it was really interesting, but I wasn’t entirely satisfied with Brandur’s explanation, and the Wikipedia articles on leaky buckets and GCRA are terrible, so here’s my version.
what is GCRA?
GCRA is the “generic cell rate algorithm”, a rate-limiting algorithm that came from ATM. GCRA does the same job as the better-known leaky bucket algorithm, but using half the storage and with much less code.
GCRA only stores one time stamp per sender, whereas leaky buckets need to store a time stamp and a quota.
GCRA needs less arithmetic, and makes it trivial to tell a client how long it should wait before the next request is permitted.
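Here’s a sketch of GCRA along those lines (my own, with hypothetical names): the single stored time stamp is the theoretical arrival time of the next request, assuming the client keeps to its rate.

#include <math.h>
#include <stdbool.h>

// per-sender state: just one time stamp, the theoretical arrival
// time (TAT) of the sender's next permitted request
static double tat;

// T is the emission interval (1/rate); tau is the burst tolerance
bool gcra_allow(double now, double T, double tau) {
    if (tat > now + tau)
        return false;         // too early; (tat - now - tau) is
                              // exactly how long the client must wait
    tat = fmax(now, tat) + T; // book the next slot
    return true;
}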
Yesterday there was some discussion on the Orange Site about whether or not C is Turing complete. The consensus in the StackOverflow question is,
- no, because the C abstract machine is a (large) finite state machine,
- or maybe yes, if you believe that unaddressable local variables can exist outside the finite address space, and you can have an unbounded number of them, i.e. no.
My answer is definitely yes, if you include the standard IO library.
And using IO is much closer to Turing’s original model of a finite state machine working on unbounded peripheral storage.
I’m pleased that so many people enjoyed my previous blog post on tolower() with AVX-512. Thanks for all the great comments and discussion!
One aspect that needed more work was examining the performance for small strings. The previous blog post had a graph for strings up to about 1000 bytes long, mainly because it illustrated some curious behaviour – but that distracted me from what I really care about, which is strings up to about 100 bytes long, 10x smaller.
Eventually I managed to get some numbers that I think are plausible.
A couple of years ago I wrote about tolower() in bulk at speed using SWAR tricks. A couple of days ago I was interested by Olivier Giniaux’s article about unsafe read beyond of death, an optimization for handling small strings with SIMD instructions, for a fast hash function written in Rust.
I’ve long been annoyed that SIMD instructions can easily eat short strings whole, but it’s irritatingly difficult to transfer short strings between memory and vector registers. Olivier’s post caught my eye because it seemed like a fun way to avoid the problem, at least for loads. (Stores remain awkward!)
Actually, to be frank, Olivier nerdsniped me.
Semaphores are one of the oldest concurrency primitives in computing, invented over 60 years ago. They are weird: usually the only numbers of concurrent processes we care about are zero, one, or many – but semaphores deal with those fussy finite numbers in between.
Yesterday I was writing code that needed to control the number of concurrent operating system processes that it spawned so that it didn’t overwhelm the computer. One of those rare situations when a semaphore is just the thing!
a Golang channel is a semaphore
A Golang channel has a buffer size – a number of free slots – which corresponds to the initial value of the semaphore. We don’t care about the values carried by the channel: any type will do.
var semaphore = make(chan any, MAXPROCS)
The acquire operation uses up a slot in the channel. It is traditionally called P(), and described as decrementing the value of the semaphore, i.e. decrementing the number of free slots in the channel. When the channel is full this will block, waiting for another goroutine to release the semaphore.
func acquire() {
    semaphore <- nil
}
The release operation, traditionally called V(), frees a slot in the channel, incrementing the value of the semaphore.
func release() {
    <-semaphore
}
That’s it!
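A hypothetical usage sketch: spawn the tasks, letting the semaphore cap the number that run at once.

// run all the tasks, at most MAXPROCS of them at a time
func runAll(tasks []func()) {
    for _, task := range tasks {
        acquire()           // blocks while MAXPROCS tasks are running
        go func(t func()) {
            defer release() // free the slot even if t panics
            t()
        }(task)
    }
}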
the GNU make jobserver protocol is a semaphore
The GNU make -j parallel builds feature uses a semaphore in the form of its jobserver protocol. Occasionally, other programs support the jobserver protocol too, such as Cargo. BSD make -j uses basically the same semaphore implementation, but is not compatible with the GNU make jobserver protocol.
The make jobserver semaphore works in a similar manner to a Golang channel semaphore, but:
- instead of a channel, make uses a unix pipe;
- because make can’t control the amount of free space in a pipe’s buffer, the value of the semaphore is represented as the amount of used space in the pipe’s buffer;
- the value of the semaphore can’t be more than PIPE_BUF, to ensure that release() will never block.
Here’s a C-flavoured sketch of how it works. To create a semaphore and initialize its value, create a pipe and write that many characters to it, which are buffered in the kernel:
int fd[2];
pipe(fd);
char slots[MAXPROCS] = {0};
write(fd[1], slots, sizeof(slots));
To acquire a slot, read a character from the pipe. When the pipe is empty this will block, waiting for another process to release the semaphore.
char slot;
read(fd[0], &slot, 1);
To release a slot, the worker must write the same character back to the pipe:
write(fd[1], &slot, 1);
Error handling is left as an exercise for the reader.
bonus: waiting for concurrent tasks to complete
If we need to wait for everything to finish, we don’t need any extra machinery. We don’t even need to know how many tasks are still running! It’s enough to acquire all possible slots, which will block until the tasks have finished, then release all the slots again.
func wait() {
    for range MAXPROCS {
        acquire()
    }
    for range MAXPROCS {
        release()
    }
}
That’s all for today! Happy hacking :-)
a blog post for international RNG day
Lemire’s nearly-divisionless algorithm for unbiased bounded random numbers has a fast path and a slow path. In the fast path it gets a random number, does a multiplication, and a comparison. In the rarely-taken slow path, it calculates a remainder (the division) and enters a rejection sampling loop.
When Lemire’s algorithm is coupled to a small random number generator such as PCG, the fast path is just a handful of instructions. When performance matters, it makes sense to inline it. It makes less sense to inline the slow path, because that just makes it harder for the compiler to work on the hot code.
Lemire’s algorithm is great when the limit is not a constant (such as during a Fisher-Yates shuffle) or when the limit is not a power of two. But when the limit is a constant power of two, it ought to be possible to eliminate the slow path entirely.
What are the options?
I have made a new release of nsnotifyd, a tiny DNS server that just listens for NOTIFY messages and runs a script when one of your zones changes.
This nsnotifyd-2.1 release includes a few bugfixes:
- more lenient handling of trailing . in domain names on the command line
- avoid dividing by zero when the refresh interval is less than 10 seconds
- do not error out when in TCP mode and debug mode and the refresh timer expires
- explain how to fix incomplete github forks
Many thanks to Lars-Johann Liman, JP Mens, and Jonathan Hewlett for the bug reports. I like receiving messages that say things like,
thanks for nsnotifyd, is a great little program, and a good example of a linux program, does one thing well.
(There’s more like that in the nsnotifyd-2.0 release announcement.)
I have also included a little dumpaxfr program, which I wrote when fiddling around with binary wire format DNS zone transfers. I used the nsnotifyd infrastructure as a short cut, though dumpaxfr doesn’t logically belong here. But it’s part of the family, so I wrote a dumpaxfr(1) man page and included it in this release.
I will be surprised if anyone else finds dumpaxfr useful!
Yesterday I received a bug report for regpg, my program that safely stores server secrets encrypted with gpg so they can be committed to a git repository.
The bug was that I used the classic shell pipeline find | xargs grep
with the classic Unix “who would want spaces in filenames?!” flaw.
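The usual fix, for the record, is NUL-terminated file names, something like:

# NUL bytes cannot appear in paths, so spaces and newlines
# in file names pass through the pipeline unharmed
find . -type f -print0 | xargs -0 grep -l pattern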
I have pushed a new release, regpg-1.12, containing the bug fix.
There’s also a gendnskey subcommand which I used when doing my algorithm rollovers a few years ago. (It’s been a long time since the last regpg release!) It’s somewhat obsolete now that I know how to use dnssec-policy.
A bunch of minor compatibility issues have crept in, which mostly required fixing the tests to deal with changes in Ansible, OpenSSL, and GnuPG.
My most distressing discovery was that Mac OS crypt(3) still supports only DES. Good grief.
There are a couple of version control commands that deserve wider appreciation: SCCS what and RCS ident. They allow you to find out what source a binary was built from, without having to run it – handy if it is a library! They basically scan a file looking for magic strings that contain version control metadata and print out what they discover.
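For example, the magic strings embedded in a C source file look roughly like this (a sketch from memory):

/* SCCS: what(1) scans the binary for the "@(#)" marker */
static const char sccsid[] = "@(#)example.c 1.2";

/* RCS: ident(1) scans for "$keyword: ... $" patterns, which the
   version control system expands from a plain "$Id$" keyword */
static const char rcsid[] = "$Id: example.c,v 1.2 2024/01/01 fanf Exp $";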