D’oh, I lost track of a bug report that should have been fixed in
nsnotifyd-2.2. Thus, hot on the heels of the previous release,
here’s nsnotifyd-2.3. Sorry for causing extra work to
my uncountably many users!
The nsnotifyd daemon monitors a set of DNS zones and runs a command
when any of them change. It listens for DNS NOTIFY messages so it can
respond to changes promptly. It also uses each zone’s SOA refresh and
retry parameters to poll for updates if nsnotifyd does not receive
NOTIFY messages more frequently. It comes with a client program
nsnotify for sending notify messages.
When nsnotifyd receives a SIGINT or SIGTERM while running the
command, it failed to handle it correctly. Now it exits promptly.
Many thanks to Athanasius for reporting the bug!
Miscellaneous minor code cleanup and compiler warning suppression.
Thanks also to Dan Langille who sent me a lovely appreciation:
Now that I think of it, nsnotifyd is in my favorite group of
software. That group is software I forget I’m running, because they
just run and do the work. For years. I haven’t touched, modified, or
configured nsnotifyd and it just keeps doing the job.
I have made
a new release of nsnotifyd,
a tiny DNS server that just listens for NOTIFY messages
and runs a script when one of your zones changes.
I have long been a fan of petnames, and graph naming systems in
general. I first learned about them in the context of Mark Miller’s
E programming language which was a distributed
object capability system on the JVM. I gather Spritely are working on
something similar, but with a more webby lisp/scheme flavour – and in
fact Mark Miller is the second author on this article.
An interesting thing that the article sort of hints at is that these
kinds of systems have a fairly smooth range of operating points from
decentralized petnames all the way to centralized globally unique
names. Zooko’s triangle is actually more like a fan when the “human
friendly” point is fixed: there’s an arc describing the tradeoffs,
from personal through local to global.
As a first step away from petnames, I like to use the term “nickname”
for nearby instances of what they call “edge names” in the article: a
nickname is not as personal as a petname, it’s a name you share with
others.
A bit further along the arc is the article’s example of the “bizdir”
local business directory. It’s a trusted (and hopefully trustworthy)
naming authority, but in a more local / federated fashion rather than
centralized.
How can a petname system function at the global+centralized point, so
it could replace the DNS? It needs to pass the “billboard test”: I
type in a name I see on a billboard and I get to the right place. (It
might be a multipart name like a postal address or DNS name, with an
extra “edge name” or two to provide enough disambiguating context.)
I imagine that an operating system supplier might provide a few
preconfigured petnames (it probably includes its own petname so the
software can update itself securely), a lot like its preconfigured
PKIX CA certificates. These petnames would refer to orgs like the
“bizdir”, or Verisign, or Nominet, that act as nickname registries.
Your collection of petnames is in effect your personal root zone, and
the preconfigured petnames are in effect the default TLDs.
When I was thinking about how a decentralized graph naming system
could be made user-friendly and able to pass the billboard test I
realised that there are organizational structures that are hard to
avoid. I expect there would inevitably be something like the
CA/Browser forum to mediate between OS suppliers and nickname
registries: a petname ICANN.
I wrote an older iteration of these ideas over a decade
ago. Those notes suffer from too many DNS brainworms, but looking
back, there’s some curious anti-DNS discussion. How might it be useful
to be able to reach a name by multiple paths? Could you use that for
attestation? What might it look like to have a shared context for
names that is not global but is national or regional or local?
The other day I learned about the Rust crate
lexopt which describes itself as,
A pathologically simple command line argument parser.
Most argument parsers are declarative: you tell them what to parse,
and they do it. This one provides you with a stream of options and
values and lets you figure out the rest.
But the description of lexopt made me think getopt() could be
simpler. The insight is that the string of options that you have to
pass to getopt() is redundant with respect to the code that deals
with the return values from getopt(). What if you just get rid of
the options string?
I thought I would try it. Turns out, not much is lost in getting rid
of the options string, and a few things are gained.
My new code is half the size of getopt(), and has more
functionality. I’m going to show how how this was done, because it’s
short (ish), not because it is interesting. Then I’ll try to tease out
a lesson or two.
I commented on Lobsters that /tmp is usually a bad idea,
which caused some surprise. I suppose /tmp security bugs were common
in the 1990s when I was learning Unix, but they are pretty rare now so
I can see why less grizzled hackers might not be familiar with the
problems.
I guess that’s some kind of success, but sadly the fixes have left
behind a lot of scar tissue because they didn’t address the underlying
problem: /tmp should not exist.
It’s a bad idea because it’s shared global mutable state that
crosses security boundaries. There’s a ton of complexity at all
levels of unix (filesystems, kernel APIs, libc, shell, admin
scripts) that only exists as a workaround for the dangers caused by
making /tmp shared.
A couple of notable things have happened in recent months:
There is a new edition of POSIX for 2024. There’s lots of
good stuff in it, but today I am writing about getentropy()
which is the first officially standardized POSIX API for getting
cryptographically secure random numbers.
On Linux the getentropy(3) function is based on the
getrandom(2) system call. In Linux 6.11 there is
a new vDSO call, vgetrandom(), that makes it possible
to implement getentropy() entirely in userland, which should
make it significantly faster.
UUID v4 and v7 are great examples of the need for high
performance secure random numbers: you don’t want the performance of
your database inserts to be limited by your random number generator!
Another example is DNS source port and query ID randomization which
help protect DNS resolvers against forged answers.
I was inspired to play with getentropy() by a blog post about getting
a few secure random bytes in PostgreSQL without pgcrypto:
it struck me that PostgreSQL doesn’t use getentropy(), and I thought
it might be fun (and possibly even useful!) to add support for it.
Algorithms like GCRA and leaky bucket don’t allow you to separate
these two points because they don’t measure the client’s rate as a
concrete number.
A moving average allows more flexible policy enforcement because the
rate measurement is meaningful even when you don’t apply
back-pressure. For example, it’s useful in a dry run mode, or when
diverting messages to a quarantine.
An exponential rate limiter stores, for each client:
last update time
average rate
This is a similar amount of space as leaky bucket. GCRA uses less
space because it only needs to store a time.
The main disadvantage is that an exponential rate limiter needs fairly
complicated floating point arithmetic.
GCRA is the “generic cell rate algorithm”, a rate-limiting algorithm
that came from ATM. GCRA does the same job as the better-known
leaky bucket algorithm, but using half the storage and with much less
code.
GCRA only stores one time stamp per sender, whereas leaky buckets need
to store a time stamp and a quota.
GCRA needs less arithmetic, and makes it trivial to tell a client how
long it should wait before the next request is permitted.
no, because the C abstract machine is a (large) finite state
machine,
or maybe yes, if you believe that unaddressable local variables
can exist outside the finite address space, and you can have an
unbounded number of them, i.e. no.
My answer is definitely yes, if you include the standard IO library.
And using IO is much closer to Turing’s original model of a finite
state machine working on unbounded peripheral storage.
I’m pleased that so many people enjoyed my previous blog post on
tolower() with AVX-512. Thanks for all the great comments and
discussion!
One aspect that needed more work was examining the performance for
small strings. The previous blog post had a graph for strings up to
about 1000 bytes long, mainly because it illustrated some curious
behaviour – but that distracted me from what I really care about,
which is strings up to about 100 bytes long, 10x smaller.
Eventually I managed to get some numbers that I think are plausible.
A couple of years ago I wrote about tolower() in bulk at speed using
SWAR tricks. A couple of days ago I was interested by Olivier
Giniaux’s article about unsafe read beyond of death, an
optimization for handling small strings with SIMD instructions, for a
fast hash function written in Rust.
I’ve long been annoyed that SIMD instructions can easily eat short
strings whole, but it’s irritatingly difficult to transfer short
strings between memory and vector registers. Oliver’s post caught my
eye because it seemed like a fun way to avoid the problem, at least
for loads. (Stores remain awkward!)
Semaphores are one of the oldest concurrency primitives in
computing, invented over 60 years ago. They are weird: usually the
only numbers of concurrent processes we care about are zero, one, or
many – but semaphores deal with those fussy finite numbers in
between.
Yesterday I was writing code that needed to control the number of
concurrent operating system processes that it spawned so that it
didn’t overwhelm the computer. One of those rare situations when a
semaphore is just the thing!
a Golang channel is a semaphore
A Golang channel has a buffer size – a number of free slots – which
corresponds to the initial value of the semaphore. We don’t care about
the values carried by the channel: any type will do.
var semaphore := make(chan any, MAXPROCS)
The acquire operation uses up a slot in the channel. It is
traditionally called P(), and described as decrementing the value of
the semaphore, i.e. decrementing the number of free slots in the
channel. When the channel is full this will block, waiting for another
goroutine to release the semaphore.
func acquire() {
semaphore <- nil
}
The release operation, traditionally called V(), frees a slot in
the channel, incrementing the value of the semaphore.
func release() {
<-semaphore
}
That’s it!
the GNU make jobserver protocol is a semaphore
The GNU make -j parallel builds feature uses a semaphore in the
form of its jobserver protocol. Occasionally, other programs
support the jobserver protocol too, such as Cargo. BSD make -j
uses basically the same semaphore implementation, but is not
compatible with the GNU make jobserver protocol.
The make jobserver semaphore works in a similar manner to a Golang
channel semaphore, but:
instead of a channel, make uses a unix pipe;
because make can’t control the amount of free space in a
pipe’s buffer, the value of the semaphore is represented as the
amount of used space in the pipe’s buffer;
the value of the semaphore can’t be more than PIPE_BUF, to
ensure that release() will never block.
Here’s a C-flavoured sketch of how it works. To create a semaphore and
initialize its value, create a pipe and write that many characters to
it, which are buffered in the kernel:
int fd[2];
pipe(fd);
char slots[MAXPROCS] = {0};
write(fd[1], slots, sizeof(slots));
To acquire a slot, read a character from the pipe. When the pipe is
empty this will block, waiting for another process to release the
semaphore.
char slot;
read(fd[0], &slot, 1);
To release a slot, the worker must write the same character back to
the pipe:
write(fd[1], &slot, 1);
Error handling is left as an exercise for the reader.
bonus: waiting for concurrent tasks to complete
If we need to wait for everything to finish, we don’t need any extra
machinery. We don’t even need to know how many tasks are still
running! It’s enough to acquire all possible slots, which will block
until the tasks have finished, then release all the slots again.
func wait() {
for range MAXPROCS {
acquire()
}
for range MAXPROCS {
release()
}
}
Lemire’s nearly-divisionless algorithm unbiased bounded random
numbers has a fast path and a slow path. In the
fast path it gets a random number, does a multiplication, and a
comparison. In the rarely-taken slow path, it calculates a remainder
(the division) and enters a rejection sampling loop.
When Lemire’s algorithm is coupled to a small random number generator
such as PCG, the fast path is just a handful of instructions. When
performance matters, it makes sense to inline it. It makes less sense
to inline the slow path, because that just makes it harder for the
compiler to work on the hot code.
Lemire’s algorithm is great when the limit is not a constant (such as
during a Fisher-Yates shuffle) or when the limit is not a power of
two. But when the limit is a constant power of two, it ought to be
possible to eliminate the slow path entirely.
more lenient handling of trailing . in domain names on the command line
avoid dividing by zero when the refresh interval is less than 10 seconds
do not error out when in TCP mode and debug mode and the refresh timer expires
explain how to fix incomplete github forks
Many thanks to Lars-Johann Liman, JP Mens, and Jonathan Hewlett for
the bug reports. I like receiving messages that say things like,
thanks for nsnotifyd, is a great little program, and a good
example of a linux program, does one thing well.
(There’s more like that in the nsnotifyd-2.0 release annoucement.)
I have also included a little dumpaxfr program, which I wrote when
fiddling around with binary wire format DNS zone transfers. I used the
nsnotifyd infrastructure as a short cut, though dumpaxfr doesn’t
logically belong here. But it’s part of the family, so I wrote a
dumpaxfr(1) man page and included it in this release.
I will be surprised if anyone else finds dumpaxfr useful!
Yesterday I received a bug report for regpg, my program that
safely stores server secrets encrypted with gpg so they can be
commited to a git repository.
The bug was that I used the classic shell pipeline find | xargs grep
with the classic Unix “who would want spaces in filenames?!” flaw.
I have pushed a new release, regpg-1.12, containing the bug fix.
There’s also a gendnskey subcommand which I used when doing my
algorithm rollovers a few years ago. (It’s been a long time since the
last regpg release!) It’s somewhat obsolete, now I know how to use
dnssec-policy.
A bunch of minor compatibility issues have crept in, which mostly
required fixing the tests to deal with changes in Ansible, OpenSSL,
and GnuPG.
My most distressing discovery was that Mac OS crypt(3)
still supports only DES. Good grief.
There are a couple of version control commands that deserve wider
appreciation: SCCS what and RCS ident. They allow
you to find out what source a binary was built from, without having to
run it – handy if it is a library! They basically scan a file looking
for magic strings that contain version control metadata and print out
what they discover.
Here are some miscellaneous unsorted notes about BIND9’s
dnssec-policy that turned out not to be useful in my previous blog
posts, but which some readers might find informative. Some of them I
learned the hard way, so I hope I can make it easier for others!
Here are some notes on migrating a signed zone from BIND’s old
auto-dnssec to its new dnssec-policy.
I have been procrastinating this migration for years, and I avoided
learning anything much about dnssec-policy until this month. I’m
writing this from the perspective of a DNS operator rather than a BIND
hacker.
As is typical for static site generators, each page on this web site
is generated from a file containing markdown with YAML frontmatter.
Neither markdown nor YAML are good. Markdown is very much the
worse-is-better of markup languages; YAML, on the other hand, is more
like better-is-worse. YAML has too many ways of expressing the same
things, and the lack of redundancy in its syntax makes it difficult to
detect mistakes before it is too late. YAML’s specification is
incomprehensible.
But they are both very convenient and popular, so I went with the flow.
The top-level value in a YAML document does not have to be an array or
object: you can use its wild zoo of string syntax too,
so for example,
--- |
here is a preformatted
multiline string
frontmatter and markdown
Putting these two features together, the right way to do YAML
frontmatter for markdown files is clearly,
---
frontmatter: goes here
...
--- |
markdown goes here
The page processor can simply:
feed the contents of the file to the YAML parser
use the first document for metadata
feed the second document to the markdown processor
check that’s the end of the file
No need for any ad-hoc hacks to separate the two parts of the file:
the YAML acts as a lightweight wrapper for the markdown.
markdown inside YAML
The crucial thing that makes this work is that the markdown after the
--- | delimiter does not need to be indented.
Markdown is very sensitive to indentation, so all the tooling (most
importantly my editor) gets righteously confused if markdown is placed
in a container that introduces extra indentation.
YAML in Perl
The static site generator for www.dns.cam.ac.uk
uses --- | to mark the start of the markdown in its source files.
This worked really nicely.
The web site was written in Perl, because most of the existing DNS
infrastructure was Perl and I didn’t want to change programming
languages. YAML was designed by Perl hackers, and the Perl YAML
modules are where it all went wrong started.
I soon discovered that, unlike the original YAML implementations,
serde-yaml requires top-level strings following --- | to be
indented. This bug seems to be common in YAML implementations for
languages other than Perl.
start and end delimiters
So I changed the syntax for my frontmatter so it looks like,
---
frontmatter: goes here
...
markdown goes here
That is, the file starts with a complete YAML document delimited by
--- start and ... end markers, and the rest of the file is the
markdown.
The idea is that a page processor should be able to:
feed the contents of the file to the YAML parser
read one document containing the metadata
feed the rest of the file to the markdown processor
However, I could not work out how to get serde-yaml to read just the
prefix of a file successfully and return the remainder for further
processing.
I know, I’ll use regexps
(Might as well, I’m already way past two problems…)
As a result I had to add a bodge to the page processor:
split the file using a regex
feed the first part to the YAML parser
feed the second part to the markdown processor
mainstream frontmatter
My choice to mark the end of the frontmatter with the YAML ... end
delimiter is not entirely mainstream. As I understand it, the YAML +
markdown convention came from Jekyll, or at least Jekyll
popularized it. Jekyll uses the YAML --- start delimiter to mark the
end of the YAML, or maybe to mark the start of the markdown, but
either way it doesn’t make sense.
Fortunately my ... bodge is compatible with Pandoc YAML
metadata, and Emacs markdown mode
supports Pandoc-style YAML metadata, so the road to hell is at least
reasonably well paved.
grump
It works, but it doesn’t make me happy. I suppose I deserve the
consequences of choosing technology with known deficiencies. But it
requires minimal effort, and is by and large good enough.