What does it mean when someone writes that a programming language is
“strongly typed”?
I’ve known for many years that “strongly typed” is a poorly-defined
term. Recently I was prompted on Lobsters to
explain why it’s hard to understand what someone means when they use
the phrase.
Previously, I wrote some sketchy ideas for what I call a p-fast
trie, which is basically a wide fan-out variant of an x-fast
trie. It allows you to find the longest matching prefix or nearest
predecessor or successor of a query string in a set of names in
O(log k) cache misses, where k is the key length.
My initial sketch was more complicated and greedy for space than
necessary, so here’s a simplified revision.
Here’s a sketch of an idea that might or might not be a good idea.
Dunno if it’s similar to something already described in the literature
– if you know of something, please let me know via the links in the
footer!
The gist is to throw away the tree and interior pointers from a
qp-trie. Instead, the p-fast trie is stored using a hash map organized
into stratified levels, where each level corresponds to a prefix of
the key.
Exact-match lookups are normal O(1) hash map lookups. Predecessor /
successor searches use binary chop on the length of the key. Whereas a
qp-trie search is O(k), where k is the length of the key, a p-fast
trie search is O(log k).
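To make the binary chop concrete, here is a minimal sketch in C, assuming a helper level_has() (invented for this note) that reports whether a given prefix of the query appears in the hash map at the level for that prefix length:

    #include <stdbool.h>
    #include <stddef.h>

    /* Assumed helper: is the first `len` bytes of `key` stored as a
     * prefix at the corresponding level? (Invented for this sketch; the
     * real structure chops keys into wider chunks rather than bytes.) */
    bool level_has(const unsigned char *key, size_t len);

    /* Binary chop on prefix length: O(log k) probes for a key of
     * length k. Relies on every stored key also populating the levels
     * for all of its shorter prefixes, so presence is monotonic in the
     * prefix length. */
    size_t longest_prefix_len(const unsigned char *key, size_t keylen) {
        size_t lo = 0, hi = keylen;     /* the empty prefix always matches */
        while (lo < hi) {
            size_t mid = lo + (hi - lo + 1) / 2;
            if (level_has(key, mid))
                lo = mid;               /* present: a longer prefix may match */
            else
                hi = mid - 1;           /* absent: only shorter prefixes can match */
        }
        return lo;
    }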
This smaller O(log k) bound is why I call it a “p-fast trie” by
analogy with the x-fast trie, which has O(log log N) query time. (The
“p” is for popcount.) I’m not sure if this asymptotic improvement is
likely to be effective in practice; see my thoughts towards the end of
this note.
At the time I had been responsible for Cambridge
University’s email anti-spam system for about 10 years, and in 2014 I
had been given responsibility for Cambridge University’s DNS. So I
knew how Let’s Encrypt should validate mail domains.
Let’s Encrypt was about one year old. Unusually, the code
that runs their operations, Boulder, is free software and open to
external contributors.
Boulder is written in Golang, and I had not previously written any
code in Golang. But it has a reputation for being easy to get to grips with.
So, in principle, the bug was straightforward for me to fix. How
difficult would it be as a Golang newbie? And what would Let’s
Encrypt’s contribution process be like?
I cloned the Boulder repository and had a look around the code.
As is pretty typical, there are a couple of stages to fixing a bug in
an unfamiliar codebase:
1. work out where the problem is
2. try to understand if the obvious fix could be better
In this case, I remember discovering a relatively substantial TODO
item that intersected with the bug. I can’t remember the details, but
I think there were wider issues with DNS lookups in Boulder. I decided
it made sense to fix the immediate problem without getting involved in
things that would require discussion with Let’s Encrypt staff.
I faffed around with the code and pushed something that looked like it
might work.
A fun thing about this hack is that I never got a working Boulder test
setup on my workstation (or even Golang, I think!) – I just relied on
the Let’s Encrypt cloud test setup. The feedback time was very slow,
but it was tolerable for a simple one-off change.
I thought Golang (at least as it was used in the Boulder codebase) was
as easy to get to grips with as promised. I did not touch it again
until several years later, because there was no need to, but it seemed
fine.
I was very impressed by the Let’s Encrypt continuous integration and
automated testing setup, and by their low-friction workflow for
external contributors. One of my fastest drive-by patches to get into
worldwide production.
My fix was always going to be temporary, and all trace of it was
overwritten years ago. It’s good when “temporary” turns out to be
true!
I was reminded of this story in the pub this evening, and I thought it
was worth writing down. It demonstrated to me that Let’s Encrypt
really were doing all the good stuff they said they were doing.
So thank you to Let’s Encrypt for providing an exemplary service and
for giving me a happy little anecdote.
A couple of years ago I wrote about random floating point
numbers. In that article I was mainly concerned about how
neat the code is, and I didn’t pay attention to its performance.
In hot weather I like to drink my coffee in an iced latte. To make it,
I have a very large Bialetti Moka Express. Recently when I
got it going again after a winter of disuse, it took me a couple of
attempts to get the technique right, so here are some notes as a
reminder to my future self next year.
About half a year ago I encountered a paper bombastically
titled “the ultimate conditional syntax”. It has the
attractive goal of unifying pattern matching with boolean if tests,
and its solution is in some ways very nice. But it seems
over-complicated to me, especially for something that’s a basic
work-horse of programming.
I couldn’t immediately see how to cut it down to manageable
proportions, but after reading syntactic musings on match expressions
by Yoshua Wuyts a couple of weeks ago, I had an idea. I’ll
outline it under the “penultimate conditionals” heading below, after
reviewing the UCS and explaining my motivation.
Last year I wrote about inlining just the fast path of Lemire’s
algorithm for nearly-divisionless unbiased bounded random
numbers. The idea was to reduce code bloat by eliminating lots
of copies of the random number generator in the rarely-executed slow
paths. However, a simple split prevented the compiler from being able
to optimize cases like pcg32_rand(1 << n), so a lot of the blog post
was toying around with ways to mitigate this problem.
On Monday while procrastinating a different blog post, I realised that
it’s possible to do better: there’s a more general optimization which
gives us the 1 << n special case for free.
To recap, Lemire’s nearly-divisionless algorithm works like this:

1. use multiplication instead of division to reduce the output of a random number generator modulo some limit
2. eliminate the bias in (1) by (counterintuitively) looking at the lower digits
3. use fun modular arithmetic to calculate the reject threshold for (2)
4. arrange the reject tests to avoid the slow division in (3) in most cases
The nearly-divisionless logic in (4) leads to two copies of the random
number generator, in the fast path and the slow path.
Generally speaking, compilers don’t try to deduplicate code that was
written by the programmer, so they can’t simplify the
nearly-divisionless algorithm very much when the limit is constant.
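The revised function is not reproduced here, but its shape is roughly the following sketch (a reconstruction rather than the exact code; pcg32_random() and the pcg32_t state type are assumed to come from the surrounding library):

    #include <stdint.h>

    /* One copy of the generator inside the loop. The cheap test
     * (low < limit) is rarely true, so the short-circuit && means the
     * expensive -limit % limit division is rarely executed. When limit
     * is a power of two the threshold is zero, so the compiler can see
     * the loop runs exactly once. */
    uint32_t pcg32_rand(pcg32_t *rng, uint32_t limit) {
        uint64_t num;
        uint32_t low;
        do {
            num = (uint64_t)pcg32_random(rng) * (uint64_t)limit;
            low = (uint32_t)num;
        } while (low < limit && low < -limit % limit);
        return (uint32_t)(num >> 32);
    }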
This has only one call to pcg32_random(), saving space as I wanted,
and the compiler is able to eliminate the loop automatically when the
limit is a power of two. The loop is smaller than a call to an
out-of-line slow path function, so it’s better all round than the code
I wrote last year.
As before it’s possible to automatically choose the
constantly-divisionless or nearly-divisionless algorithms depending on
whether the limit is a compile-time constant or run-time variable,
using arcane C tricks or GNU C
__builtin_constant_p().
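For instance, the dispatch can be a macro along these lines; pcg32_rand_const() and pcg32_rand_var() are hypothetical names for the constantly-divisionless and nearly-divisionless implementations:

    /* GNU C: __builtin_constant_p(limit) is true when the compiler can
     * see that limit is a compile-time constant, so the constantly-
     * divisionless version is chosen only when its division can be
     * folded away. */
    #define pcg32_rand(rng, limit)              \
        (__builtin_constant_p(limit)            \
            ? pcg32_rand_const(rng, limit)      \
            : pcg32_rand_var(rng, limit))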
I have been idly wondering how to do something similar in other languages.
Rust isn’t very keen on automatic specialization, but it has a
reasonable alternative. The thing to avoid is passing a runtime
variable to the constantly-divisionless algorithm, because then it
becomes never-divisionless. Rust has a much richer notion of
compile-time constants than C, so it’s possible to write a method
like the following, which can’t be misused:
(embiggening_mul is my stable replacement for the unstable widening_mul API.)
This is a nugatory optimization, but there are more interesting cases
where it makes sense to choose a different implementation for constant
or variable arguments – that is, the constant case isn’t simply a
constant-folded or partially-evaluated version of the variable case.
Regular expressions might be lex-style or pcre-style, for example.
It’s a curious question of language design whether it should be
possible to write a library that provides a uniform API that
automatically chooses constant or variable implementations, or whether
the user of the library must make the choice explicit.
Maybe I should learn some Zig to see how its comptime works.
However, PCG only has a limited amount of ILP, about 3 instructions.
Its overall speed is limited by the rate at which a CPU can run a
sequence where the output of one multiply-add feeds into the next
multiply-add.
… Or is it?
With some linear algebra and some AVX512, I can generate random
numbers from a single instance of pcg32 at 200 Gbit/s on a single
core. This is the same sequence of random numbers generated in the
same order as normal pcg32, but more than 4x faster.
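The vectorized details are not reproduced here, but the linear algebra presumably rests on the standard LCG jump-ahead trick: composing k steps of s' = a*s + c (mod 2^64) is itself an affine map, so with W lanes, lane i can start i steps ahead and every lane can then advance W steps per iteration, yielding the normal pcg32 sequence interleaved across lanes. A scalar sketch of computing the jump constants (not the AVX512 code from the post):

    #include <stdint.h>

    /* Compute (A, C) such that applying s -> a*s + c (mod 2^64) `delta`
     * times is the same as applying s -> A*s + C (mod 2^64) once.
     * Square-and-multiply over affine maps, the same shape as the PCG
     * library's advance function; the mod 2^64 comes from unsigned
     * wraparound. */
    static void lcg_jump(uint64_t a, uint64_t c, uint64_t delta,
                         uint64_t *A, uint64_t *C) {
        uint64_t acc_mult = 1, acc_plus = 0;
        while (delta > 0) {
            if (delta & 1) {
                acc_mult = acc_mult * a;
                acc_plus = acc_plus * a + c;
            }
            c = (a + 1) * c;   /* compose the doubled step's increment */
            a = a * a;         /* and its multiplier */
            delta >>= 1;
        }
        *A = acc_mult;
        *C = acc_plus;
    }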
The International Obfuscated C Code Contest has a newly
revamped web site, and the Judges have announced the 28th contest, to
coincide with its 40th anniversary. (Or 41st?)
The Judges have also updated the archive of past winners so
that as many of them as possible work on modern systems. Accordingly,
I took a look at my 1998 winner to see how much damage time
hath wrought.
D’oh, I lost track of a bug report that should have been fixed in
nsnotifyd-2.2. Thus, hot on the heels of the previous release,
here’s nsnotifyd-2.3. Sorry for causing extra work to
my uncountably many users!
The nsnotifyd daemon monitors a set of DNS zones and runs a command
when any of them change. It listens for DNS NOTIFY messages so it can
respond to changes promptly. It also uses each zone’s SOA refresh and
retry parameters to poll for updates if nsnotifyd does not receive
NOTIFY messages more frequently. It comes with a client program
nsnotify for sending notify messages.
Previously, when nsnotifyd received a SIGINT or SIGTERM while running the
command, it failed to handle the signal correctly. Now it exits promptly.
Many thanks to Athanasius for reporting the bug!
Miscellaneous minor code cleanup and compiler warning suppression.
Thanks also to Dan Langille who sent me a lovely appreciation:
Now that I think of it, nsnotifyd is in my favorite group of
software. That group is software I forget I’m running, because they
just run and do the work. For years. I haven’t touched, modified, or
configured nsnotifyd and it just keeps doing the job.
I have made
a new release of nsnotifyd,
a tiny DNS server that just listens for NOTIFY messages
and runs a script when one of your zones changes.
I have long been a fan of petnames, and graph naming systems in
general. I first learned about them in the context of Mark Miller’s
E programming language which was a distributed
object capability system on the JVM. I gather Spritely are working on
something similar, but with a more webby lisp/scheme flavour – and in
fact Mark Miller is the second author on this article.
An interesting thing that the article sort of hints at is that these
kinds of systems have a fairly smooth range of operating points from
decentralized petnames all the way to centralized globally unique
names. Zooko’s triangle is actually more like a fan when the “human
friendly” point is fixed: there’s an arc describing the tradeoffs,
from personal through local to global.
As a first step away from petnames, I like to use the term “nickname”
for nearby instances of what they call “edge names” in the article: a
nickname is not as personal as a petname, it’s a name you share with
others.
A bit further along the arc is the article’s example of the “bizdir”
local business directory. It’s a trusted (and hopefully trustworthy)
naming authority, but in a more local / federated fashion rather than
centralized.
How can a petname system function at the global+centralized point, so
it could replace the DNS? It needs to pass the “billboard test”: I
type in a name I see on a billboard and I get to the right place. (It
might be a multipart name like a postal address or DNS name, with an
extra “edge name” or two to provide enough disambiguating context.)
I imagine that an operating system supplier might provide a few
preconfigured petnames (it probably includes its own petname so the
software can update itself securely), a lot like its preconfigured
PKIX CA certificates. These petnames would refer to orgs like the
“bizdir”, or Verisign, or Nominet, that act as nickname registries.
Your collection of petnames is in effect your personal root zone, and
the preconfigured petnames are in effect the default TLDs.
When I was thinking about how a decentralized graph naming system
could be made user-friendly and able to pass the billboard test I
realised that there are organizational structures that are hard to
avoid. I expect there would inevitably be something like the
CA/Browser forum to mediate between OS suppliers and nickname
registries: a petname ICANN.
I wrote an older iteration of these ideas over a decade
ago. Those notes suffer from too many DNS brainworms, but looking
back, there’s some curious anti-DNS discussion. How might it be useful
to be able to reach a name by multiple paths? Could you use that for
attestation? What might it look like to have a shared context for
names that is not global but is national or regional or local?
The other day I learned about the Rust crate
lexopt which describes itself as,
A pathologically simple command line argument parser.
Most argument parsers are declarative: you tell them what to parse,
and they do it. This one provides you with a stream of options and
values and lets you figure out the rest.
But the description of lexopt made me think getopt() could be
simpler. The insight is that the string of options that you have to
pass to getopt() is redundant with respect to the code that deals
with the return values from getopt(). What if you just get rid of
the options string?
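As a purely hypothetical illustration of what the calling code could look like without an option string (opt_next() and opt_arg() are invented names, not the real API from this post, nor lexopt):

    #include <stdio.h>

    /* Invented interface, for illustration only: the parser returns the
     * next option character, and the caller asks for a value when, and
     * only when, the option takes one. The switch below carries all the
     * information that getopt()'s "f:v" string would duplicate. */
    struct opts { int argc; char **argv; /* parser state */ };
    int opt_next(struct opts *o);         /* next option char, or -1 when done */
    const char *opt_arg(struct opts *o);  /* value of the current option */

    int main(int argc, char **argv) {
        struct opts o = { argc, argv };
        const char *file = NULL;
        int verbose = 0;
        for (int c; (c = opt_next(&o)) != -1; ) {
            switch (c) {
            case 'f': file = opt_arg(&o); break; /* this case says -f takes a value */
            case 'v': verbose++; break;          /* and this one says -v does not */
            default:
                fprintf(stderr, "usage: example [-v] [-f file]\n");
                return 1;
            }
        }
        printf("file=%s verbose=%d\n", file ? file : "(none)", verbose);
        return 0;
    }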
I thought I would try it. Turns out, not much is lost in getting rid
of the options string, and a few things are gained.
My new code is half the size of getopt(), and has more
functionality. I’m going to show how this was done, because it’s
short (ish), not because it is interesting. Then I’ll try to tease out
a lesson or two.
I commented on Lobsters that /tmp is usually a bad idea,
which caused some surprise. I suppose /tmp security bugs were common
in the 1990s when I was learning Unix, but they are pretty rare now so
I can see why less grizzled hackers might not be familiar with the
problems.
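For anyone who has not met them, here is a small sketch of the classic hazard and the usual mitigation (the file names are invented):

    #include <fcntl.h>
    #include <stdlib.h>

    /* Risky: a predictable name in a shared, world-writable directory.
     * Another user can pre-create /tmp/myapp.out, perhaps as a symlink
     * to a file the victim can write, and so choose what this open()
     * really touches. */
    int risky(void) {
        return open("/tmp/myapp.out", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    }

    /* The usual workaround: mkstemp() creates the file with
     * O_CREAT|O_EXCL and an unpredictable name. It is one of the many
     * bits of machinery that only exist because /tmp is shared. */
    int safer(void) {
        char path[] = "/tmp/myapp.XXXXXX";
        return mkstemp(path);
    }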
I guess that’s some kind of success, but sadly the fixes have left
behind a lot of scar tissue because they didn’t address the underlying
problem: /tmp should not exist.
It’s a bad idea because it’s shared global mutable state that
crosses security boundaries. There’s a ton of complexity at all
levels of unix (filesystems, kernel APIs, libc, shell, admin
scripts) that only exists as a workaround for the dangers caused by
making /tmp shared.
A couple of notable things have happened in recent months:

1. There is a new edition of POSIX for 2024. There’s lots of good stuff in it, but today I am writing about getentropy() which is the first officially standardized POSIX API for getting cryptographically secure random numbers.

2. On Linux the getentropy(3) function is based on the getrandom(2) system call. In Linux 6.11 there is a new vDSO call, vgetrandom(), that makes it possible to implement getentropy() entirely in userland, which should make it significantly faster.
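For reference, here is a minimal example of the call; POSIX puts getentropy() in <unistd.h>, and requests larger than 256 bytes may fail:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        unsigned char buf[16];
        /* Fill buf with cryptographically secure random bytes. */
        if (getentropy(buf, sizeof(buf)) != 0) {
            perror("getentropy");
            return EXIT_FAILURE;
        }
        for (size_t i = 0; i < sizeof(buf); i++)
            printf("%02x", buf[i]);
        printf("\n");
        return EXIT_SUCCESS;
    }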
UUID v4 and v7 are great examples of the need for high
performance secure random numbers: you don’t want the performance of
your database inserts to be limited by your random number generator!
Another example is DNS source port and query ID randomization, which
helps protect DNS resolvers against forged answers.
I was inspired to play with getentropy() by a blog post about getting
a few secure random bytes in PostgreSQL without pgcrypto:
it struck me that PostgreSQL doesn’t use getentropy(), and I thought
it might be fun (and possibly even useful!) to add support for it.