benchmarking wider-fanout versions of qp tries
QP tries are faster than crit-bit tries because they extract up to four bits of information out of the key per branch, whereas crit-bit tries only get one bit per branch. If branch nodes have more children, then the trie should be shallower on average, which should mean lookups are faster.
So if we extract even more bits of information from the key per branch, it should be even better, right? But, there are downsides to wider fanout.
I originally chose to index by nibbles for two reasons: they are easy to pull out of the key, and the bitmap easily fits in a word with room for a key index.
If we index keys by 5-bit chunks we pay a penalty for more complicated indexing.
If we index keys by 6-bit chunks, trie nodes have to be bigger to fit a bigger bitmap, so we pay a memory overhead penalty.
How do these costs compare to the benefits of wider branches?
I have implemented 5-bit and 6-bit versions of qp tries and benchmarked them. For those interested in the extremely nerdy details, the full version of "never mind the quadbits" is on the qp trie home page.
The tl;dr is, 5-bit qp tries are fairly unequivocally better than 4-bit qp tries. 6-bit qp tries are less clear - they are not faster than 5-bit on my laptop and desktop (both Core 2 Duo), but they are faster on a Xeon server that I tried despite using more memory.