QP trie news roundup

Firstly, I have to say that it's totally awesome that I am writing this at all, and it's entirely due to the cool stuff done by people other than me. Yes! News about other people doing cool stuff with my half-baked ideas, how cool is that?

CZ.NIC Knot DNS

OK, DNS is approximately the ideal application for tries. It needs a data structure with key/value lookup and lexically ordered traversal.

When qp tries were new, I got some very positive feedback from Marek Vavrusa who I think was at CZ.NIC at the time. As well as being the Czech DNS registry, they also develop their own very competitive DNS server software. Clearly the potential for a win there, but I didn't have time to push a side project to production quality, nor any expectation that anyone else would do the work.

But, in November I got email from Vladimír Čunát telling me he had reimplemented qp tries to fix the portability bugs and missing features (such as prefix searches) in my qp trie code, and added it to Knot DNS. Knot was previously using a HAT trie.

Vladimír said qp tries could reduce total server RSS by more than 50% in a mass hosting test case. Although qp tries can be slower than HAT tries in some synthetic benchmarks (more indirections due to checking a nybble per node rather than a byte per node) this effect is too small to make a non-negligible difference to Knot.

So, qp tries were a pretty good improvement. Thanks, Vladimír, for making such effective use of my ideas!

(I've written some notes on more memory-efficient DNS name lookups in qp tries in case anyone wants to help make it even better...)

Rust

Shortly before Christmas I spotted that Frank Denis has a qp trie implementation in Rust!

Sadly I'm still only appreciating Rust from a distance, but when I find some time to try it out properly, this will be top of my list of things to hack around with!

I think qp tries are an interesting test case for Rust, because at the core of the data structure is a tightly packed two word union with type tags tucked into the low order bits of a pointer. It is dirty low-level C, but in principle it ought to work nicely as a Rust enum, provided Rust can be persuaded to make the same layout optimizations. In my head a qp trie is a parametric recursive algebraic data type, and I wish there were a programming language with which I could express that clearly.

So, thanks, Frank, for giving me an extra incentive to try out Rust! Also, Frank's Twitter feed is ace, you should totally follow him.

Time vs space

Today I had a conversation on Twitter with @tef who has some really interesting ideas about possible improvements to qp tries.

One of the weaknesses of qp-tries, at least in my proof-of-concept implementation, is the allocator is called for every insert or delete. C's allocator is relatively heavyweight (compared to languages with tightly-coupled GCs) so it's not great to call it so frequently.

(Bagwell's HAMT paper was a major inspiration for qp tries, and he goes into some detail describing his custom allocator. It makes me feel like I'm slacking!)

There's an important trade-off between small memory size and keeping some spare space to avoid realloc() calls. I have erred on the side of optimizing for simple allocator calls and small data structure size at the cost of greater allocator stress.

@tef suggested adding extra space to each node for use as a write buffer, in a similar way to "fractal tree" indexes.. As well as avoiding calls to realloc(), a write buffer could avoid malloc() calls for inserting new nodes. I was totally nerd sniped by his cool ideas!

After some intensive thinking I worked out a sketch of how write buffers might amortize allocation in qp tries. I don't think it quite matches what tef had in mind, but it's definitely intriguing. It's very tempting to steal some time to turn the sketch into code, but I fear I need to focus more on things that are directly helpful to my colleagues...

Anyway, thanks, tef, for the inspiring conversation! It also, tangentially, led me to write this item for my blog.