The key™ ingredient of my DNS-trie is a way of re-encoding DNS names so that each byte has less than 48 possible values (so that the fan-out of each node is not too big), and normal hostname characters are encoded in one byte.
But this idea is also useful for speeding up DNS name lookups in general-purpose radix trees.
- wherefore art thou, radix tree?
- what’s in a name?
- a plague on both your octets
- do you byte your thumb at us, sir?
- stretches from a bit narrow to a word broad
- squash what light from yonder window breaks
- O deadly sin! O rude unthankfulness!
- parting is such sweet sorrow
wherefore art thou, radix tree?
The reason for using a radix tree to store DNS names is to support DNSSEC proof of nonexistence. When someone asks about a name that doesn’t exist, we need to tell them, “Oh, B doesn’t exist, in fact here’s a cryptographically signed assertion that there are no names between A and C.” The asker can verify the assertion by observing that B sorts between A and C, and by checking the signature.
A radix tree is fairly efficient at the kind of ordered lookup that allows a DNS server to quickly find A and C when it is asked about B.
what's in a name?
However, radix tree ordering is the simple byte-by-byte lexicographic order of the keys, but the canonical order of DNS names is more complicated than that. So, we need to rewrite a DNS name into a form that we can use as a radix tree lookup key, such that the lexicographic order of our rewritten keys matches the canonical order of DNS names.
To prepare a DNS name as a radix tree key, we need to:
-
map ASCII upper-case to ASCII lower-case
(this is specifically byte values 65 - 90; DNS is not internationalized at this level)
-
reverse the order of the labels
(so, for example,
cam.ac.uk
is mapped touk.ac.cam
)
That’s not too bad, but the awkward part is that DNS names can contain arbitrary binary. What is worse is that in DNS packets, the delimiter between labels is not a dot: instead labels have a counted length.
We have to ensure that, for example, foo.bar
sorts before
foo\0.bar
because foo
is shorter than foo\0
. In effect, the
end-of-label marker needs to sort before a zero octet value.
Because of this, DNS servers that use radix trees have some kind of DNS name escaping mechanism so that binary names are handled correctly.
a plague on both your octets
The vast majority of DNS names conform to host name syntax, i.e. letters, digits, hyphens; plus underscore; plus label separators that sort before zero.
In the DNS-trie escaping scheme, any of these common DNS name octets map to single bytes in the lookup key. Weird punctuation or control characters are encoded as two bytes.
The escaped key needs to have the correct lexicographic order, so we don’t use a single escape character: in fact there are six possible escape bytes. When we encode a weird DNS name octet as two key bytes, the first (escape) byte depends on how the weird octet sorts relative to common octet values.
DNS name octet | 1st key byte | 2nd key byte |
---|---|---|
(end of name) | 0 | |
(end of label) | 1 | |
0 - 44 (ctrl, punct) | 2 | 1 - 45 |
45 (hyphen) | 3 | |
46 - 47 (punct) | 4 | 1 - 2 |
48 - 57 (digits) | 5 - 14 | |
58 - 64 (punct) | 15 | 1 - 7 |
65 - 90 (upper) | 18 - 43 (*) | |
91 - 94 (punct) | 15 | 8 - 11 |
95 (underscore) | 16 | |
96 (punct) | 17 | |
97 - 122 (lower) | 18 - 43 | |
123 - 167 (high) | 44 | 1 - 45 |
168 - 211 (high) | 45 | 1 - 44 |
212 - 255 (high) | 46 | 1 - 44 |
(*) The upper-case letters have the same encoding as the lower-case letters, so the two blocks for escape byte 15 are contiguous according to DNS sorting order.
This encoding uses a couple of small lookup tables: one for the first byte value, and one for the second byte value; if the second byte value is 0 it isn’t added to the key. When the second byte is non-zero, the first byte is one of the six escape bytes.
do you byte your thumb at us, sir?
In existing DNS servers, it’s more normal to escape DNS names as little as possible before using them as lookup keys - much less escaping than my DNS-trie scheme. So most lookup key byte values are dedicated to weird (and rare) DNS name octets.
The DNS-trie escaping scheme uses less than 48 different byte values, because its fan-out is limited by the number of bits in a word. But general-purpose radix trees support keys with arbitrary byte values. This gives us an opportunity!
stretches from a bit narrow to a word broad
Tree structures are faster the more shallow they are, because shallow trees require fewer pointers to be followed. To make a tree shallower, we need to make each node broader. So a 5-bit qp-trie (nodes up to 32-wide) is faster than a 4-bit qp-trie (nodes up to 16-wide), and a DNS-trie is faster still (nodes up to 48-wide).
We can make a radix tree shallower by making our keys shorter, and we can make it broader by increasing the information density of our keys. The DNS-trie escaping scheme leaves a lot of unused space in each byte, so if we squeeze that space out, the key will make more effective use of the radix tree.
In effect, this will be a compactionion scheme that makes use of our knowledge about which characters are common, and preserves the sorting order of DNS names.
squash what light from yonder window breaks
There are two plausible compaction schemes:
-
trieprep-64 encodes 4 name bytes in 3 key octets (0.75x length)
-
trieprep-48 encodes 7 name bytes in 5 key octets (0.71x length)
(assuming normal DNS names)
In trieprep-64, we treat escaped name “bytes” from the lookup table as 6 bit values, and pack them into octets in the lookup key in a similar manner to decoding base64.
The trieprep-48 scheme is based on the fact that 2^39 < 48^7 < 2^40, so 7 escaped name “bytes” with values less than 48 can fit snugly in 5 octets.
O deadly sin! O rude unthankfulness!
However, there is a caveat:
Although I have reserved the 0 “byte” for marking the end of the name, it’s possible that after compaction there can be zero octets in the middle of the lookup key. This is not a problem if the radix tree supports true binary lookup keys, but it’s troublesome if the radix tree is designed for C-style zero terminated strings (like my original qp-trie).
With the trieprep-64 scheme, this problem can be avoided by not using any “byte” values that start or end with 4 zero bits (i.e. 1, 2, 3, 16, 32, 48). This leaves 58 usable values which is more than enough.
It might be possible to pull a similar trick with trieprep-48. There is space to expand to trieprep-52, because 52 is still small enough that 2^39 < 52^7 < 2^40, which gives us more space to skip values that can lead to zero octets in our lookup key. But I’m not confident this can be made to work.
parting is such sweet sorrow
Knot DNS uses a 4-bit qp-trie. With a straightforward encoding of DNS names, trie nodes that correspond to the upper 4 bits of a byte are under-used: their fan-out never gets near the maximum of 16. This escaping and compaction scheme should make better use of the qp-trie.
NSD uses a more conventional radix tree, in which each node has a fan-out of up to 256, though usually much less than that in practice because DNS names have a much smaller character set. It will be interesting to see what effect this encoding scheme has in NSD.
(BIND uses a red-black tree so this encoding would not help it.)