.@ Tony Finch – blog


Here’s a sketch of an idea that might or might not be a good idea. Dunno if it’s similar to something already described in the literature – if you know of something, please let me know via the links in the footer!

The gist is to throw away the tree and interior pointers from a qp-trie. Instead, the p-fast trie is stored using a hash map organized into stratified levels, where each level corresponds to a prefix of the key.

Exact-match lookups are normal O(1) hash map lookups. Predecessor / successor searches use binary chop on the length of the key. Where a qp-trie search is O(k), where k is the length of the key, a p-fast trie search is O(log k).

This smaller O(log k) bound is why I call it a “p-fast trie” by analogy with the x-fast trie, which has O(log log N) query time. (The “p” is for popcount.) I’m not sure if this asymptotic improvement is likely to be effective in practice; see my thoughts towards the end of this note.

layout

A p-fast trie consists of:

Prefixes are strictly shorter than names so that we can avoid having to represent non-values after the end of a name, and so that it’s OK if one name is a prefix of another.

The size of chunks and bitmaps might change; 6 is a guess that I expect will work OK. For restricted alphabets you can use something like my DNS trie name preparation trick to squash 8-bit chunks into sub-64-wide bitmaps.

In Rust where cross-references are problematic, there might have to be a hash map that owns the leaf objects, so that the p-fast trie can refer to them by name. Or use a pool allocator and refer to leaf objects by numerical index.

search

To search, start by splitting the query string at its end into prefix + final chunk of bits. Look up the prefix in the hash map and check the chunk’s bit in the bitmap. If it’s set, you can return the corresponding leaf object because it’s either an exact match or the nearest predecessor.

If it isn’t found, and you want the predecessor or successor, continue with a binary chop on the length of the query string.

Look up the chopped prefix in the hash map. The next chunk is the chunk of bits in the query string immediately after the prefix.

If the prefix is present and the next chunk’s bit is set, remember the chunk’s leaf pointer, make the prefix longer, and try again.

If the prefix is present and the next chunk’s bit is not set and there’s a lesser bit that is set, return the leaf pointer for the lesser bit. Otherwise make the prefix shorter and try again.

If the prefix isn’t present, make the prefix shorter and try again.

When the binary chop bottoms out, return the longest-matching leaf you remembered.

The leaf’s key and successor bracket the query string.

modify

When inserting a name, all its prefixes must be added to the hash map from longest to shortest. At the point where it finds that the prefix already exists, the insertion routine needs to walk down the (implicit) tree of successor keys, updating pointers that refer to the new leaf’s predecessor so they refer to the new leaf instead.

Similarly, when deleting a name, remove every prefix from longest to shortest from the hash map where they only refer to this leaf. At the point where the prefix has sibling nodes, walk down the (implicit) tree of successor keys, updating pointers that refer to the deleted leaf so they refer to its predecessor instead.

I can’t “just” use a concurrent hash map and expect these algorithms to be thread-safe, because they require multiple changes to the hashmaps. I wonder if the search routine can detect when the hash map is modified underneath it and retry.

thoughts

It isn’t obvious how a p-fast trie might compare to a qp-trie in practice.

A p-fast trie will use a lot more memory than a qp-trie because it requires far more interior nodes. They need to exist so that the random-access binary chop knows whether to shorten or lengthen the prefix.

To avoid wasting space the hash map keys should refer to names in leaf objects, instead of making lots of copies. This is probably tricky to get right.

In a qp-trie the costly part of the lookup is less than O(k) because non-branching interior nodes are omitted. How does that compare to a p-fast trie’s O(log k)?

Exact matches in a p-fast trie are just a hash map lookup. If they are worth optimizing then a qp-trie could also be augmented with a hash map.

Many steps of a qp-trie search are checking short prefixes of the key near the root of the tree, which should be well cached. By contrast, a p-fast trie search will typically skip short prefixes and instead bounce around longer prefixes, which suggests its cache behaviour won’t be so friendly.

A qp-trie predecessor/successor search requires two traversals, one to find the common prefix of the key and another to find the prefix’s predecessor/successor. A p-fast trie requires only one.