# Tony Finch – blog

Following my previous post on rate limiting with GCRA (leaky buckets without the buckets), I reviewed my old notes on rate limiting for Exim. I thought I should do a new write-up of the ideas, which I hope will be more broadly interesting.

Exponential rate limiting uses an exponentially-weighted moving average to measure the client’s rate. It is motivated by a shift of perspective:

• first measure the client’s rate,
• then compare it to the limit.

Algorithms like GCRA and leaky bucket don’t let you separate these two steps, because they never measure the client’s rate as a concrete number.

A moving average allows more flexible policy enforcement because the rate measurement is meaningful even when you don’t apply back-pressure. For example, it’s useful in a dry run mode, or when diverting messages to a quarantine.

An exponential rate limiter stores, for each client:

• last update time
• average rate

This is about the same amount of space as a leaky bucket. GCRA uses less space because it only needs to store a time.

The main disadvantage is that an exponential rate limiter needs fairly complicated floating point arithmetic.
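To make this concrete, here’s a minimal sketch of the update step in Go. It assumes a per-client record and a smoothing period chosen by the operator; the names and structure are mine, not Exim’s:

```go
package ratelimit

import (
	"math"
	"time"
)

// Client is the per-client state: a last update time and an average rate.
type Client struct {
	LastUpdate time.Time
	Rate       float64 // smoothed events per period
}

// Update records one event and returns the client's new average rate,
// measured as an exponentially-weighted moving average over the period.
func (c *Client) Update(now time.Time, period time.Duration) float64 {
	// elapsed time since the last event, as a fraction of the period
	interval := now.Sub(c.LastUpdate).Seconds() / period.Seconds()
	if interval <= 0 {
		interval = 1e-9 // avoid dividing by zero for simultaneous events
	}
	c.LastUpdate = now
	inst := 1.0 / interval        // instantaneous rate implied by this event
	weight := math.Exp(-interval) // older measurements decay exponentially
	c.Rate = inst*(1.0-weight) + c.Rate*weight
	return c.Rate
}
```

The returned rate is just a measurement; comparing it to the limit is a separate policy decision, which is the whole point.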

Yesterday I read an article describing the GCRA rate limiting algorithm. I thought it was really interesting, but I wasn’t entirely satisfied with Brandur’s explanation, and the Wikipedia articles on leaky buckets and GCRA are terrible, so here’s my version.

## what is GCRA?

GCRA is the “generic cell rate algorithm”, a rate-limiting algorithm that came from ATM. GCRA does the same job as the better-known leaky bucket algorithm, but using half the storage and with much less code.

GCRA only stores one time stamp per sender, whereas leaky buckets need to store a time stamp and a quota.

GCRA needs less arithmetic, and makes it trivial to tell a client how long it should wait before the next request is permitted.
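Here’s a rough Go sketch of GCRA for comparison; the names are mine. The only state is the client’s theoretical arrival time:

```go
package ratelimit

import "time"

// GCRA stores a single time stamp per client: the earliest time at
// which the client's next request is expected to arrive.
type GCRA struct {
	tat time.Time // theoretical arrival time
}

// Allow permits on average one request per emission interval, with a
// tolerance of burst extra requests. When a request is denied, it
// reports how long the client should wait before trying again.
func (g *GCRA) Allow(now time.Time, emission time.Duration, burst int) (ok bool, wait time.Duration) {
	tolerance := time.Duration(burst) * emission
	if g.tat.After(now.Add(tolerance)) {
		return false, g.tat.Sub(now.Add(tolerance))
	}
	if g.tat.Before(now) {
		g.tat = now
	}
	g.tat = g.tat.Add(emission)
	return true, 0
}
```

Note how the wait time falls out of the arithmetic for free, but at no point does the client’s rate exist as a number you could log or act on.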

Yesterday there was some discussion on the Orange Site about whether or not C is Turing complete. The consensus in the StackOverflow question is,

• no, because the C abstract machine is a (large) finite state machine,

• or maybe yes, if you believe that unaddressable local variables can exist outside the finite address space, and you can have an unbounded number of them, i.e. no.

My answer is definitely yes, if you include the standard IO library.

And using IO is much closer to Turing’s original model of a finite state machine working on unbounded peripheral storage.

I’m pleased that so many people enjoyed my previous blog post on tolower() with AVX-512. Thanks for all the great comments and discussion!

One aspect that needed more work was examining the performance for small strings. The previous blog post had a graph for strings up to about 1000 bytes long, mainly because it illustrated some curious behaviour – but that distracted me from what I really care about, which is strings up to about 100 bytes long, 10x smaller.

Eventually I managed to get some numbers that I think are plausible.

A couple of years ago I wrote about tolower() in bulk at speed using SWAR tricks. A couple of days ago I was intrigued by Olivier Giniaux’s article about “unsafe read beyond of death”, an optimization for handling small strings with SIMD instructions, for a fast hash function written in Rust.

I’ve long been annoyed that SIMD instructions can easily eat short strings whole, but it’s irritatingly difficult to transfer short strings between memory and vector registers. Olivier’s post caught my eye because it seemed like a fun way to avoid the problem, at least for loads. (Stores remain awkward!)

Actually, to be frank, Olivier nerdsniped me.

Semaphores are one of the oldest concurrency primitives in computing, invented over 60 years ago. They are weird: usually the only numbers of concurrent processes we care about are zero, one, or many – but semaphores deal with those fussy finite numbers in between.

Yesterday I was writing code that needed to control the number of concurrent operating system processes that it spawned so that it didn’t overwhelm the computer. One of those rare situations when a semaphore is just the thing!

## a Golang channel is a semaphore

A Golang channel has a buffer size – a number of free slots – which corresponds to the initial value of the semaphore. We don’t care about the values carried by the channel: any type will do.

```go
var semaphore = make(chan any, MAXPROCS)
```

The `acquire` operation uses up a slot in the channel. It is traditionally called `P()`, and described as decrementing the value of the semaphore, i.e. decrementing the number of free slots in the channel. When the channel is full this will block, waiting for another goroutine to release the semaphore.

```go
func acquire() {
	semaphore <- nil
}
```

The `release` operation, traditionally called `V()`, frees a slot in the channel, incrementing the value of the semaphore.

```go
func release() {
	<-semaphore
}
```

That’s it!
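For example, here’s a minimal sketch of using the semaphore to cap concurrency; the `run()` function is a hypothetical stand-in for whatever spawns the worker process:

```go
tasks := []string{"a", "b", "c"}
for _, task := range tasks {
	acquire() // blocks while MAXPROCS workers are running
	go func(t string) {
		defer release()
		run(t) // hypothetical worker
	}(task)
}
```

After the loop, the `wait()` function in the bonus section below picks up the stragglers.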

## the GNU make jobserver protocol is a semaphore

The GNU `make -j` parallel builds feature uses a semaphore in the form of its jobserver protocol. Occasionally, other programs support the jobserver protocol too, such as Cargo. BSD `make -j` uses basically the same semaphore implementation, but is not compatible with the GNU make jobserver protocol.

The `make` jobserver semaphore works in a similar manner to a Golang channel semaphore, but:

• instead of a channel, `make` uses a unix pipe;

• because `make` can’t control the amount of free space in a pipe’s buffer, the value of the semaphore is represented as the amount of used space in the pipe’s buffer;

• the value of the semaphore can’t be more than PIPE_BUF, to ensure that `release()` will never block.

Here’s a C-flavoured sketch of how it works. To create a semaphore and initialize its value, create a pipe and write that many characters to it, which are buffered in the kernel:

```c
int fd[2];
pipe(fd);

char slots[MAXPROCS] = {0};
write(fd[1], slots, sizeof(slots));
```

To acquire a slot, read a character from the pipe. When the pipe is empty this will block, waiting for another process to release the semaphore.

```c
char slot;
read(fd[0], &slot, 1);
```

To release a slot, the worker must write the same character back to the pipe:

```c
write(fd[1], &slot, 1);
```

Error handling is left as an exercise for the reader.

## bonus: waiting for concurrent tasks to complete

If we need to wait for everything to finish, we don’t need any extra machinery. We don’t even need to know how many tasks are still running! It’s enough to acquire all possible slots, which will block until the tasks have finished, then release all the slots again.

```go
func wait() {
	for range MAXPROCS {
		acquire()
	}
	for range MAXPROCS {
		release()
	}
}
```

That’s all for today! Happy hacking :-)

a blog post for international RNG day

Lemire’s nearly-divisionless algorithm for unbiased bounded random numbers has a fast path and a slow path. In the fast path it gets a random number and does a multiplication and a comparison. In the rarely-taken slow path, it calculates a remainder (the division) and enters a rejection sampling loop.
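Here’s a brief Go sketch of the algorithm as I understand it, for a 32-bit generator; `rng` stands in for something like PCG:

```go
// boundedRand returns an unbiased uniform random number below limit,
// following Lemire's nearly-divisionless algorithm.
func boundedRand(rng func() uint32, limit uint32) uint32 {
	x := rng()
	m := uint64(x) * uint64(limit) // fast path: a multiplication...
	low := uint32(m)
	if low < limit { // ...and a comparison
		threshold := -limit % limit // the division: (2^32 - limit) % limit
		for low < threshold {       // rejection sampling loop
			x = rng()
			m = uint64(x) * uint64(limit)
			low = uint32(m)
		}
	}
	return uint32(m >> 32)
}
```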

When Lemire’s algorithm is coupled to a small random number generator such as PCG, the fast path is just a handful of instructions. When performance matters, it makes sense to inline it. It makes less sense to inline the slow path, because that just makes it harder for the compiler to work on the hot code.

Lemire’s algorithm is great when the limit is not a constant (such as during a Fisher-Yates shuffle) or when the limit is not a power of two. But when the limit is a constant power of two, it ought to be possible to eliminate the slow path entirely.

What are the options?

I have made a new release of `nsnotifyd`, a tiny DNS server that just listens for NOTIFY messages and runs a script when one of your zones changes.

This `nsnotifyd-2.1` release includes a few bugfixes:

• more lenient handling of trailing `.` in domain names on the command line
• avoid dividing by zero when the refresh interval is less than 10 seconds
• do not error out when in TCP mode and debug mode and the refresh timer expires
• explain how to fix incomplete github forks

Many thanks to Lars-Johann Liman, JP Mens, and Jonathan Hewlett for the bug reports. I like receiving messages that say things like,

> thanks for nsnotifyd, is a great little program, and a good example of a linux program, does one thing well.

(There’s more like that in the nsnotifyd-2.0 release announcement.)

I have also included a little `dumpaxfr` program, which I wrote when fiddling around with binary wire format DNS zone transfers. I used the `nsnotifyd` infrastructure as a short cut, though `dumpaxfr` doesn’t logically belong here. But it’s part of the family, so I wrote a dumpaxfr(1) man page and included it in this release.

I will be surprised if anyone else finds `dumpaxfr` useful!

Yesterday I received a bug report for regpg, my program that safely stores server secrets encrypted with gpg so they can be committed to a git repository.

The bug was that I used the classic shell pipeline `find | xargs grep` with the classic Unix “who would want spaces in filenames?!” flaw.

I have pushed a new release, regpg-1.12, containing the bug fix.

There’s also a `gendnskey` subcommand which I used when doing my algorithm rollovers a few years ago. (It’s been a long time since the last regpg release!) It’s somewhat obsolete, now I know how to use `dnssec-policy`.

A bunch of minor compatibility issues have crept in, which mostly required fixing the tests to deal with changes in Ansible, OpenSSL, and GnuPG.

My most distressing discovery was that Mac OS `crypt(3)` still supports only DES. Good grief.

There are a couple of version control commands that deserve wider appreciation: SCCS `what` and RCS `ident`. They allow you to find out what source a binary was built from, without having to run it – handy if it is a library! They basically scan a file looking for magic strings that contain version control metadata and print out what they discover.

Here are some miscellaneous unsorted notes about BIND9’s `dnssec-policy` that turned out not to be useful in my previous blog posts, but which some readers might find informative. Some of them I learned the hard way, so I hope I can make it easier for others!

Here are some notes on migrating a signed zone from BIND’s old `auto-dnssec` to its new `dnssec-policy`.

I have been procrastinating this migration for years, and I avoided learning anything much about `dnssec-policy` until this month. I’m writing this from the perspective of a DNS operator rather than a BIND hacker.

Here are some notes about using BIND’s new-ish `dnssec-policy` feature to sign a DNS zone that is currently unsigned.

I am in the process of migrating my DNS zones from BIND’s old `auto-dnssec` to its new `dnssec-policy`, and writing a blog post about it. These introductory sections grew big enough to be worth pulling out into a separate article.

As is typical for static site generators, each page on this web site is generated from a file containing markdown with YAML frontmatter.

Neither markdown nor YAML is good. Markdown is very much the worse-is-better of markup languages; YAML, on the other hand, is more like better-is-worse. YAML has too many ways of expressing the same things, and the lack of redundancy in its syntax makes it difficult to detect mistakes before it is too late. YAML’s specification is incomprehensible.

But they are both very convenient and popular, so I went with the flow.

## multiple documents

A YAML stream may contain several independent YAML documents delimited by `---` start and `...` end markers, for example:

```yaml
---
document: 1
...
---
document: 2
...
```

## string documents

The top-level value in a YAML document does not have to be an array or object: you can use its wild zoo of string syntax too, so for example,

```yaml
--- |
here is a preformatted
multiline string
```

## frontmatter and markdown

Putting these two features together, the right way to do YAML frontmatter for markdown files is clearly,

```yaml
---
frontmatter: goes here
...
--- |
markdown goes here
```

The page processor can simply:

• feed the contents of the file to the YAML parser
• use the first document for metadata
• feed the second document to the markdown processor
• check that’s the end of the file

No need for any ad-hoc hacks to separate the two parts of the file: the YAML acts as a lightweight wrapper for the markdown.
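Here’s a sketch of that processing, in Go with `gopkg.in/yaml.v3` purely for illustration (the metadata field is invented, and this glosses over the indentation wrinkle discussed below):

```go
package main

import (
	"fmt"
	"io"
	"os"

	"gopkg.in/yaml.v3"
)

// Frontmatter is a made-up example of page metadata.
type Frontmatter struct {
	Title string `yaml:"title"`
}

func main() {
	dec := yaml.NewDecoder(os.Stdin)

	var meta Frontmatter
	if err := dec.Decode(&meta); err != nil {
		panic(err) // first document: the metadata
	}

	var markdown string
	if err := dec.Decode(&markdown); err != nil {
		panic(err) // second document: the page body
	}

	if err := dec.Decode(new(any)); err != io.EOF {
		panic("expected end of file after two documents")
	}

	fmt.Printf("%s: %d bytes of markdown\n", meta.Title, len(markdown))
}
```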

## markdown inside YAML

The crucial thing that makes this work is that the markdown after the `--- |` delimiter does not need to be indented.

Markdown is very sensitive to indentation, so all the tooling (most importantly my editor) gets righteously confused if markdown is placed in a container that introduces extra indentation.

## YAML in Perl

The static site generator for www.dns.cam.ac.uk uses `--- |` to mark the start of the markdown in its source files. This worked really nicely.

The web site was written in Perl, because most of the existing DNS infrastructure was Perl and I didn’t want to change programming languages. YAML was designed by Perl hackers, and the Perl YAML modules are where it all started.

## YAML in other languages

The static site generator for https://dotat.at is written in Rust, using `serde-yaml`.

I soon discovered that, unlike the original YAML implementations, `serde-yaml` requires top-level strings following `--- |` to be indented. This bug seems to be common in YAML implementations for languages other than Perl.

## start and end delimiters

So I changed the syntax for my frontmatter so it looks like,

```
---
frontmatter: goes here
...
markdown goes here
```

That is, the file starts with a complete YAML document delimited by `---` start and `...` end markers, and the rest of the file is the markdown.

The idea is that a page processor should be able to:

• feed the contents of the file to the YAML parser
• feed the rest of the file to the markdown processor

However, I could not work out how to get `serde-yaml` to read just the prefix of a file successfully and return the remainder for further processing.

## I know, I’ll use regexps

(Might as well, I’m already way past two problems…)

As a result I had to add a bodge to the page processor:

• split the file using a regex
• feed the first part to the YAML parser
• feed the second part to the markdown processor
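Something like this sketch – though my real code is in Rust, this Go version with an illustrative (not my actual) regex shows the shape of the bodge:

```go
package frontmatter

import "regexp"

// split the file at the first "..." end-of-document marker:
// everything up to it is YAML, everything after it is markdown
var frontRE = regexp.MustCompile(`(?s)\A(---\n.*?\n\.\.\.\n)(.*)\z`)

// Split returns the YAML frontmatter and the markdown body,
// or ok == false if the file doesn't have the expected shape.
func Split(file string) (yamlDoc, markdown string, ok bool) {
	m := frontRE.FindStringSubmatch(file)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}
```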

## mainstream frontmatter

My choice to mark the end of the frontmatter with the YAML `...` end delimiter is not entirely mainstream. As I understand it, the YAML + markdown convention came from Jekyll, or at least Jekyll popularized it. Jekyll uses the YAML `---` start delimiter to mark the end of the YAML, or maybe to mark the start of the markdown, but either way it doesn’t make sense.

Fortunately my `...` bodge is compatible with Pandoc YAML metadata, and Emacs markdown mode supports Pandoc-style YAML metadata, so the road to hell is at least reasonably well paved.

## grump

It works, but it doesn’t make me happy. I suppose I deserve the consequences of choosing technology with known deficiencies. But it requires minimal effort, and is by and large good enough.

My opinion is not mainstream, but I think if you really examine the practices and security processes that use and recommend sudo, the reasons for using it are mostly bullshit.

Our net connection at home is not great: amongst its several misfeatures is a lack of IPv6. Yesterday I (at last!) got around to setting up a wireguard IPv6 VPN tunnel between my workstation and my Mythic Beasts virtual private server.

There were a few, um, learning opportunities.

After an extremely long hiatus, I have resurrected my link log.

As well as its web page, https://dotat.at/:/, my link log is shared via several feeds.

The Dreamwidth feed has not caught this afternoon’s newly added links, so I am not sure if it is actually working…

There is a lengthy backlog of links to be shared, which will be added to the public log a few times each day.

The backlog will be drained in a random order, but the link log’s web page and atom feed are sorted in date order, so the most-recently shared link will usually not be the top link on the web page.

I might have to revise how the atom feed is implemented to avoid confusing feed readers, but this will do for now.

The code has been rewritten from scratch in Rust, alongside the static site generator that runs my blog. It’s less of a disgusting hack than the ancient Perl link log that grew like some kind of fungus, but it still lacks a sensible database and the code is still substantially stringly typed. But, it works, which is the important thing.

I’ve changed the atom feed so that newly-added entries have both a “published” date (which is the date displayed in the HTML, reflecting when I saved the link) plus an “updated” date indicating when it was later added to the public log.

I think this should make it a little more friendly to feed readers.

Back in December, George Michaelson posted an item on the APNIC blog titled “That OSI model refuses to die”, in reaction to Robert Graham’s “OSI Deprogrammer” published in September. I had discussed the OSI Deprogrammer on Lobsters, and George’s blog post prompted me to write an email. He and I agreed that I should put it on my blog, but I did not get a round tuit until now…

The main reason that OSI remains relevant is that Cisco certifications require network engineers to learn it. This makes OSI part of the common technical vocabulary and basically unavoidable, even though (as Rob Graham correctly argues) it’s deeply unsuitable.

It would be a lot better if the OSI reference model were treated as a model of OSI in particular, not a model of networking in general, as Jesse Crawford argued in 2021. OSI ought to be taught as an example alongside similar reference models of protocol stacks that are actually in use.

One of OSI’s big problems is how it enshrines layering as the architectural pattern, but there are other patterns that are at least as important:

• The hourglass narrow waist pattern, where a protocol stack provides a simple abstraction and only really cares about how things work on one side of the waist.

For instance, IP is a narrow waist and the Internet protocol stack only really cares about the layers above it. And Ethernet’s addressing and framing are another narrow waist, where IEEE 802 only really cares about the layers below.

• Recursive layering of entire protocol stacks. This occurs when tunnelling, e.g. MPLS or IPSEC. It works in concert with narrow waists that allow protocol stacks to be plugged together.

Tunneling starkly highlights what nonsense OSI’s fixed layers are, leading to things like network engineers talking about “layer 2.5” when talking about tunneling protocols that present Ethernet’s narrow waist at their endpoints.

Speaking of Ethernet, it’s very poorly served by the OSI model. Ethernet actually has three layers.

Then there’s WiFi which looks like Ethernet from IP’s point of view, but is even more complicated. And almost everything non-ethernet has gone away or been adapted to look more like ethernet…

Whereas OSI has too few lower layers, it has too many upper layers: its session and presentation layers don’t correspond to anything in the Internet stack. I think Rob Graham said that they came from IBM SNA, and were related to terminal-related things like simplex or block-mode, and character set translation. Jack Haverty said something similar on the Internet History mailing list in 2019. The closest the ARPANET / Internet protocols get is Telnet’s feature negotiation; a lot of the problem solved by the OSI presentation layer is defined away by the Internet’s ASCII-only network virtual terminal. Graham also said that when people assign Internet functions to layers 5 and 6, they do so based on the names, not on how OSI describes what those layers do.

One of the things that struck me when reading Mike Padlipsky’s Elements of Networking Style is the amount of argumentation that was directed at terminal handling back then. I guess in that light it’s not entirely surprising that OSI would dedicate two entire layers to the problem.

Padlipsky also drew the ARPANET layering as a fan instead of a skyscraper, with intermediate layers shared by some but not all higher-level protocols, e.g. the NVT used by Telnet, FTP, SMTP. I expect if he were drawing the diagram later there might be layers for 822 headers, MIME, SASL – though they are more like design patterns than layers since they get used rather differently by SMTP, NNTP, HTTP. The notion of pick-and-mix protocol modules seems more useful than fixed layering.

Anyway, if I could magically fix the terminology, I would prefer network engineers to talk about specific protocols (e.g. ethernet, MPLS) instead of bogusly labelling them as layers (e.g. 2, 2.5). If they happen to be working with a more diverse environment than usual (hello DOCSIS) then it would be better to talk about sub-IP protocol stacks. But that’s awkwardly sesquipedalian so I can’t see it catching on.

In my previous entry I wrote about constructing a four-point egg, using circular arcs that join where their tangents are at 45°. I wondered if I could do something similar with ellipses.

As before, I made an interactive ellipse workbench to experiment with the problem. I got something working, but I have questions…