A one-pass DNS-trie?

Last summer I wrote about my DNS-trie, a version of my qp-trie that is optimized for DNS names. It turned out to work rather well: I patched NSD to use my code and it was smaller and faster than NSD’s radix tree.

But there seemed to be a couple of places where there was room for improvement.

NXDOMAIN
Wasted space
Inline names
Space in leaves
Space in branches
Space in pointers
A false start
Stretch nodes
Compressing paths
Portable node layout
Unwarranted chumminess with the implementation
Guesstimate

NXDOMAIN

When you look up a name that doesn’t exist in a DNS-trie, the code goes through three stages:

walk down the tree, skipping parts of the name that don’t affect the shape of the tree;
compare the query name with the name we found at the leaf of the tree, making a note of where they differ;
walk the tree again to find the leaf that’s lexicographically before the query name, to provide a proof of nonexistence.

This can be nearly twice as slow as looking up a name that exists, which is kind of disappointing.

In NSD’s radix tree, it finds out where a nonexistent name diverges in the first pass, so it can go straight to the lexicographic predecessor; queries for known and unknown names are about the same speed.

Wasted space

The leaf of a DNS-trie contains two pointers, to the key (a domain name) and its value. The value in turn contains the name’s records - but in NSD it also contains a pointer to the name. This wastes 8 bytes per name on 64-bit systems.

This complaint isn’t really specific to NSD; it’s unlikely that any DNS server would be able to avoid wasting this space unless it was designed around the DNS-trie.

Inline names

In a traditional trie or radix tree, the keys are more or less implicit in the structure of the tree. There’s no skipping parts of the query name when walking the tree, so there’s no need for a key pointer in each leaf.

It’s a waste of time and space to have a tree node for every byte in the key; instead, radix trees typically have some kind of path compression. Where a DNS-trie skips over bytes, a radix tree stores a string fragment to compare against that part of the key.

Can we adapt a DNS-trie to work like this? We would need to find some space inside the tree structure, which is already fairly packed…

Space in leaves

Instead of the pointer to the key, store up to 8 bytes of key directly in the leaf. But we will need somewhere else to store any overflow.

Space in branches

A DNS-trie branch node contains a byte offset which is used for skipping bytes of the key. If we aren’t skipping bytes then we can keep count instead, so the explicit offset isn’t needed and we can use the space for something else.

But it’s only about one and a half bytes.

Space in pointers

The virtual address space on 64 bit systems is usually only 48 bits. But that “usually” hides a number of caveats.

On Linux systems with larger virtual address spaces, the kernel keeps userland addresses within 48 bits unless the application explicitly asks for petabyte addresses. Apple ARM systems can use the spare bits in a pointer for authentication codes, but that is currently only used for code pointers.

So in practice, we can steal 16 bits from the top end of a pointer, plus some bits from the bottom depending on the pointer’s alignment.

Any code will still have to work on systems with 32-bit or 64-bit pointers, so 48-bit mode will have to be chosen carefully only on systems where it is known to work…

A false start

At first I thought it might make sense to append the path compression string fragment to the twig array, but this was far too messy. The tricky mutation cases in a qp-trie or DNS-trie involve introducing or eliminating branches, and they get a lot more complicated when they also have to split or join path compression fragments.

So I gave up on that attempt.

Stretch nodes

My current thought (as yet untested) is to have a variant of a leaf node that is used inside the tree to hold a path compression string fragment, when there isn’t enough space inside the following branch or leaf. It will have the same basic layout as other nodes, that is, two words, one of which is a pointer and the other (like a leaf) containing up to 8 bytes of string fragment.

This wastes a bit more space for extra nodes, and time walking through them, but it should be simple enough, and not too bad if we can reduce the number of stretch nodes we need. So what sizes of string fragments can we squeeze into our branch and leaf nodes?

Compressing paths

Before querying a DNS-trie, the domain name is prepared so that each byte can be used directly to test the bitmap in a branch node. This means each byte contains a value up to about 50 (the size of the bitmap), so we are only using 6 bits of each byte.

We can make better use of space for our path compression string fragments by using 6 bits for each character in the key.

Checking the string fragment can be reasonably neat:

fetch N bytes from the key (where N is the string fragment length)
shift and OR them together into six bit fields inside a word K
mask the string fragment out of the node’s words to get S
compare the compressed key fragment and the node’s string fragment by comparing the words K and S

Portable node layout

So, how long can our string fragments actually be?

In portable code, a node consists of a 64 bit unsigned integer, and an opaque pointer (32 or 64 bits) which we don’t mess with.

In 64 bits, a branch node can fit:

1 bit copy-on-write flag
2 bit type code
2 x 6 bit string fragment
49 bit bitmap

Which is just about doable, though I need to tweak the name preparation scheme slightly to reduce the bitmap size.

The type encodes the length of the string fragment, 0, 1, or 2, leaving 3 to indicate a stretch node or leaf node.

In 64 bits, a leaf or stretch node can fit:

1 bit copy-on-write flag
2 bit type code == 3
1 bit leaf or stretch
4 bit fragment length
9 x 6 bit string fragment

Unwarranted chumminess with the implementation

If we assume our virtual address space is 48 bits, a branch node has space for:

1 bit copy-on-write flag
3 bit type code
5 x 6 bit string fragment
49 bit bitmap
45 bits of pointer (8 byte alignment)

A leaf or stretch node can fit:

1 bit copy-on-write flag
3 bit type code == 6 (stretch) or 7 (leaf)
4 bit fragment length
12 x 6 bit string fragment
48 bits of pointer (no alignment assumed)

Guesstimate

I think this feels promising. Even the portable layout might be reasonably competitive; I expect the 48-bit version will be smaller and faster because it will need fewer stretch nodes.