Firstly, I have to say that it's totally awesome that I am writing this at all, and it's entirely due to the cool stuff done by people other than me. Yes! News about other people doing cool stuff with my half-baked ideas, how cool is that?
CZ.NIC Knot DNS
OK, DNS is approximately the ideal application for tries. It needs a data structure with key/value lookup and lexically ordered traversal.
When qp tries were new, I got some very positive feedback from Marek Vavrusa who I think was at CZ.NIC at the time. As well as being the Czech DNS registry, they also develop their own very competitive DNS server software. Clearly the potential for a win there, but I didn't have time to push a side project to production quality, nor any expectation that anyone else would do the work.
But, in November I got email from Vladimír Čunát telling me he had reimplemented qp tries to fix the portability bugs and missing features (such as prefix searches) in my qp trie code, and added it to Knot DNS. Knot was previously using a HAT trie.
Vladimír said qp tries could reduce total server RSS by more than 50%
in a mass hosting test
case. The
disadvantage is that they are slightly slower than HAT tries, e.g. for
the .com
zone they do about twice as many memory indirections per
lookup due to checking a nybble per node rather than a byte per node.
On balance, qp tries were a pretty good improvement. Thanks, Vladimír, for making such effective use of my ideas!
(I've written some notes on more memory-efficient DNS name lookups in qp tries in case anyone wants to help close the speed gap...)
Rust
Shortly before Christmas I spotted that Frank Denis has a qp trie implementation in Rust!
Sadly I'm still only appreciating Rust from a distance, but when I find some time to try it out properly, this will be top of my list of things to hack around with!
I think qp tries are an interesting test case for Rust, because at the
core of the data structure is a tightly packed two word union
with
type tags tucked into the low order bits of a pointer. It is dirty
low-level C, but in principle it ought to work nicely as a Rust
enum
, provided Rust can be persuaded to make the same layout
optimizations. In my head a qp trie is a parametric recursive
algebraic data type, and I wish there were a programming language with
which I could express that clearly.
So, thanks, Frank, for giving me an extra incentive to try out Rust! Also, Frank's Twitter feed is ace, you should totally follow him.
Time vs space
Today I had a conversation on Twitter with @tef who has some really interesting ideas about possible improvements to qp tries.
One of the weaknesses of qp-tries, at least in my proof-of-concept implementation, is the allocator is called for every insert or delete. C's allocator is relatively heavyweight (compared to languages with tightly-coupled GCs) so it's not great to call it so frequently.
(Bagwell's HAMT paper was a major inspiration for qp tries, and he goes into some detail describing his custom allocator. It makes me feel like I'm slacking!)
There's an important trade-off between small memory size and keeping
some spare space to avoid realloc()
calls. I have erred on the side of
optimizing for simple allocator calls and small data structure size at
the cost of greater allocator stress.
@tef suggested adding extra space to each node for use as a write
buffer, in a similar way to "fractal tree"
indexes..
As well as avoiding calls to realloc()
, a write buffer could avoid
malloc() calls for inserting new nodes. I was totally nerd
sniped by his cool ideas!
After some intensive thinking I worked out a sketch of how write buffers might amortize allocation in qp tries. I don't think it quite matches what tef had in mind, but it's definitely intriguing. It's very tempting to steal some time to turn the sketch into code, but I fear I need to focus more on things that are directly helpful to my colleagues...
Anyway, thanks, tef, for the inspiring conversation! It also, tangentially, led me to write this item for my blog.