Earlier this year, I rewrote BIND’s DNS name compression algorithm. That work is in the November release, BIND 9.19.7.
Last week I rewrote the reverse algorithm, DNS name decompression, and made it about 2x faster. I enjoyed deleting this comment that dates back to January 1999:
```c
/*
 * Note: The following code is not optimized for speed, but
 * rather for correctness.  Speed will be addressed in the future.
 */
```
So here is a collection of folklore and tricks related to DNS name decompression.
How are DNS names represented in memory?
Almost all the different implementations that I have looked at use the (uncompressed) DNS wire format. It might seem like a trivial or simplistic answer, but DNS names are so small that it’s hard to do better. In particular, most DNS labels are smaller than a 64-bit pointer so adding a layer of indirection is likely to be bad for size as well as locality.
Actually, that’s an over-simplification.
Alongside the wire format name, there is usually a little vector holding the offsets in the name of each label. Each offset only needs 1 byte (because names are at most 255 bytes long) and there are at most 128 of them.
(Implementations vary in whether they store offsets in increasing or decreasing order.)
So one of the jobs of the DNS name decompression code is to set up this array of label offsets.
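For illustration, here is a minimal sketch of such a representation. The struct and function names (`dns_name_sketch`, `name_index_labels`) are my own inventions, not BIND's actual types, which differ in detail; this just shows the shape of the offsets vector and how it gets filled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout (not BIND's actual struct): the uncompressed
 * wire-format name plus a vector of label offsets. A name is at most
 * 255 bytes; with minimal 1-byte labels, 127 labels plus the root
 * label fit in 255 bytes, so 128 offsets are always enough.
 */
typedef struct {
    uint8_t wire[255];    /* uncompressed wire format */
    uint8_t length;       /* bytes used in wire[] */
    uint8_t offsets[128]; /* offset of each label within wire[] */
    uint8_t labels;       /* number of labels, including the root */
} dns_name_sketch;

/* Fill in the offsets vector by walking the wire-format labels. */
static bool
name_index_labels(dns_name_sketch *name)
{
    unsigned pos = 0;
    name->labels = 0;
    while (pos < name->length) {
        uint8_t len = name->wire[pos];
        if (len > 63)
            return false; /* compression pointer: not allowed here */
        name->offsets[name->labels++] = (uint8_t)pos;
        if (len == 0)
            return true; /* the root label ends the name */
        pos += len + 1;
    }
    return false; /* ran off the end without seeing the root */
}
```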
The DNS is case-insensitive for ASCII letters, and traditionally it makes some attempt to preserve case (though that is often futile).
However, DNSSEC requires canonicalizing to lower case in certain situations, so some DNS implementations smash case before doing anything else with a name.
BIND is in the case-preserving camp.
One of the most common traps that DNS implementations fall into is chasing compression pointers without limit. When they receive a packet where the pointers form a loop, they sit chewing CPU and doing nothing else.
One way to avoid the bug is to set a limit on the number of pointers allowed in a name. However, I am not sure there is any agreement on what that limit should be.
Close reading of RFC 1035 section 4.1.4 reveals that DNS name compression pointers must point earlier in the message. So another solution is to require that each pointer takes us strictly backwards: every chase then makes progress towards the start of the message, so loops are impossible.
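Here is a sketch of that rule in action. This is my own illustration, not BIND's code, and it is slightly stricter than the RFC's bare wording: each pointer must target an offset strictly before the start of the run of labels it was found in, which is the invariant that actually guarantees termination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch (not BIND's code): walk a compressed name inside a DNS
 * message, requiring every compression pointer to refer strictly
 * before the start of the current run of labels. Each hop therefore
 * strictly decreases the offset we read from, so pointer loops are
 * impossible and no hop counter is needed.
 */
static bool
name_walk(const uint8_t *msg, size_t msglen, size_t start)
{
    size_t pos = start;
    size_t seq_start = start; /* where the current run of labels began */
    while (pos < msglen) {
        uint8_t len = msg[pos];
        if (len == 0) {
            return true; /* root label: end of name */
        } else if (len < 64) {
            pos += len + 1; /* ordinary label */
        } else if (len >= 192) {
            if (pos + 1 >= msglen)
                return false; /* truncated pointer */
            size_t target = (size_t)(len & 63) << 8 | msg[pos + 1];
            if (target >= seq_start)
                return false; /* must point strictly backwards */
            pos = seq_start = target;
        } else {
            return false; /* 0x40 and 0x80 label types are reserved */
        }
    }
    return false; /* ran past the end of the message */
}
```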
BIND’s old decompression code was based on a loop that examined the name one byte at a time.
BIND’s new decompression code has a loop that examines the name one label at a time — and tries to do as much work as possible one pointer at a time. When a name is uncompressed, much of the work can be done once at the end of the name.
The straightforward way to smash a DNS name to lower case is to iterate over the name one label at a time and tolower() each byte inside each label.
Another example of doing less is to observe that label lengths are at most 63, and ASCII upper case letters are at least 65, so they cannot be confused. This means your code can run a fast tolower loop over the whole name, and make better use of memory bandwidth.
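A sketch of that whole-name loop (my own code, not BIND's): because length bytes are at most 63 and 'A' is 65, the loop can treat length bytes and label bytes identically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch: lower-case an entire uncompressed wire-format name in one
 * pass, without splitting it into labels first. Length bytes are at
 * most 63 and ASCII 'A'..'Z' is 65..90, so a length byte can never
 * be mistaken for an upper-case letter.
 */
static void
name_tolower(uint8_t *wire, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (wire[i] >= 'A' && wire[i] <= 'Z')
            wire[i] += 'a' - 'A';
    }
}
```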
Let’s use the same trick again!
Most of the DNS name decompression code I have seen copies each label one at a time from the DNS message to the internal name buffer.
But take another look at 2b: we have a handy marker at the start of the sequence of labels that we just scanned. So instead of copying one label at a time, we can do the copying one pointer at a time, using memmove() for everything between the marker and the position of the pointer.
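Here is a sketch of how per-pointer copying might look. This is my own illustration with an invented interface (`name_decompress`), not BIND's code: whenever a pointer (or the root label) is reached, everything scanned since the marker is flushed to the output buffer with a single memmove().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of copying per pointer instead of per label: "marker" is
 * where the current run of labels started in the message; when we
 * hit a pointer or the root, flush the whole run with one memmove.
 * Pointers must go strictly backwards, so loops are impossible.
 */
static bool
name_decompress(const uint8_t *msg, size_t msglen, size_t start,
                uint8_t *name, size_t *namelen)
{
    size_t pos = start;
    size_t marker = start; /* start of the current run of labels */
    size_t out = 0;

    while (pos < msglen) {
        uint8_t len = msg[pos];
        if (len < 64) {
            pos += len + 1; /* plain label, or root when len == 0 */
            if (out + (pos - marker) > 255)
                return false; /* name too long */
            if (len == 0) {
                memmove(name + out, msg + marker, pos - marker);
                *namelen = out + (pos - marker);
                return true;
            }
        } else if (len >= 192 && pos + 1 < msglen) {
            /* flush the labels scanned since the marker */
            memmove(name + out, msg + marker, pos - marker);
            out += pos - marker;
            size_t target = (size_t)(len & 63) << 8 | msg[pos + 1];
            if (target >= marker)
                return false; /* pointers must go strictly backwards */
            pos = marker = target;
        } else {
            return false; /* reserved label type or truncated pointer */
        }
    }
    return false; /* ran past the end of the message */
}
```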
And if your DNS server prefers to smash case, it can use trick 4 when copying multiple labels in one go.
I would be interested to hear which other DNS implementations decompress names by copying all labels between each pointer in one shot, instead of one label at a time.
It’s pleasing that the marker does two jobs for us: a two-for-one!
For each label, the new code records the label’s offset in the offsets array and checks the name against its maximum length.
There’s no need for a bounds check on the offsets array, because it is big enough to take all the labels that can fit in a maximum-length name. So the name bounds check also acts as the offsets bounds check.
If (unlike BIND) your DNS name decompressor does not need to collect label offsets, then you can move the name length check from once-per-label to once-per-pointer.
This is more like a one-for-two, because it eliminated a variable.
DNS names are self-delimiting, which means you have to parse a name to find out where the next thing in the message is. The spec for compressed names says (somewhat obliquely) that the end of a compressed name is immediately after the first pointer.
The old code had two variables, one for the end of the name (incremented one byte at a time) and one for its maximum extent (checked for every byte). The new code does less! It has a single variable recording how much of the message was consumed, and it is set once, when the first pointer is seen.
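The idea can be sketched like this (my own code with an invented interface, not BIND's; zero stands in for “not yet set”): the consumed count is assigned at most once, at the first pointer, or at the root label if the name turns out to be uncompressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: find how many message bytes a (possibly compressed) name
 * consumes. The end of a compressed name is immediately after its
 * first pointer, so "consumed" is set at most once, when the first
 * pointer is seen, instead of being tracked for every byte.
 */
static bool
name_consumed(const uint8_t *msg, size_t msglen, size_t start,
              size_t *consumed)
{
    size_t pos = start;
    size_t seq_start = start;
    *consumed = 0; /* zero means "not set yet" */

    while (pos < msglen) {
        uint8_t len = msg[pos];
        if (len == 0) {
            if (*consumed == 0)
                *consumed = pos + 1 - start; /* uncompressed name */
            return true;
        } else if (len < 64) {
            pos += len + 1; /* ordinary label */
        } else if (len >= 192 && pos + 1 < msglen) {
            if (*consumed == 0)
                *consumed = pos + 2 - start; /* ends after first pointer */
            size_t target = (size_t)(len & 63) << 8 | msg[pos + 1];
            if (target >= seq_start)
                return false; /* pointers must go strictly backwards */
            pos = seq_start = target;
        } else {
            return false; /* reserved label type or truncated pointer */
        }
    }
    return false; /* ran past the end of the message */
}
```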
I also wrote a long comment with evil ASCII art to explain why, when the message comes from a twisted miscreant, the end of a name is not necessarily its maximum extent in the message.
If you have been reading the code alongside these notes, you will have seen that it mostly uses pointers. (And for trick 7, I used a null pointer to indicate that consumed is still unset.)
It would perhaps be nicer to use indexes into the buffer instead of pointers, but the compiler produced better code from pointers: I think because a pointer requires fewer registers than a base plus an index, and there are a lot of variables to juggle in a relatively small loop.
The new code is typically 2x faster than the old code; for names with many small labels it is 1.4x faster, and for names with few long labels it can be 15x faster. And the new code is 2x smaller: I deleted 173 lines and added 72 lines of code and 90 lines of comments (including a couple of ASCII art diagrams and a reference to the Rocky Horror Show).
I’m pleased with the result, and grateful to my colleague Petr Špaček for helping to make it better.
I have said (publicly, several times, in several places) that the first thing I look at in new DNS code is the name decompression algorithm, to see if it properly protects against compression pointer loops.
So it would be deeply embarrassing for me to screw it up!
```c
/*
 * *** WARNING ***
 *
 * dns_name_fromwire() deals with raw network data. An error in this
 * routine could result in the failure or hijacking of the server.
 */
```
As well as passing the existing tests, I added a
fuzz test that ensures the new code produces identical results
to the old code (including side-effects on buffers), and ran it with a
good variety of sanitizers.
How about the deleted comment I quoted at the top of this post?
The replacement code now (nearly 25 years later!) has speed as well as correctness. And I think it’s simpler too — except, perhaps, for the way I pulled trick #5.