.@ Tony Finch – blog


(Monday’s notes) (Wednesday’s notes) (Thursday’s notes) (Epilogue)

Today was a bit weird since I managed to perform some unusual productivity judo on myself.

I started the day feeling very unsure about what direction to take, and worried that the project would be a bust before it had even got properly started.

After I had some coffee and killed a bit of email, I found myself looking through Knot’s DNS update code, worrying about how to COWify it. Usually if I am feeling unsure about a project I will put it on the back burner to think about while I do something else. That isn’t going to work when I have given myself only one week to see what I can achieve.

Eventually I realised that my only chance of success would be to strictly limit the scope of the project to the qp trie code itself, aiming to make it possible to COWify the DNS code but not touching the DNS code until the trie is able to COW.

(COW = copy-on-write)

I still didn’t know how COW would work: the new invariants, the fence posts, the edge cases. But it seemed clear after last night’s thoughts that I would need to add some kind of COW flag to the leaf nodes.

Digression: misuse of C

I made two horrible mistakes in my original qp trie code.

First, I used a union to describe how branch and leaf nodes both occupy two words, but with different layouts. It is really hard to use a union in C and not lose a fight with the compiler over the strict aliasing rules.

Second, I used bitfields to describe how one of the branch words is split up into subfields. Bitfields have always been a portability nightmare. I was using them to describe the detailed specifics of a memory layout, which does not work for portable code.

I was aware at the time that these were really bad ideas, but I wanted a reasonably explicit and lightweight notation for accessing the parts of the data structure while I was getting it working. And I never got round to correcting this short-term hack.

Refactoring

So today I spent a few hours starting to replace this dubious C with something more explicit and less likely to confuse the compiler.

As a side-effect, it will make it possible to stash an extra COW flag into leaf nodes.

Where to bung a COW?

When I started refactoring I didn’t know how I would use the COW flag.

It takes me about 25 minutes to walk between home and the beer festival, and today that was really useful thinking time.

My thoughts oscillated between different possible semantics for the COW flag (most of them clearly broken) and I wasn’t sure I could make it work. (At worst I might finish the week with a bit of code cleanup…)

This evening I think I came up with something that will work, and that will justify the refactoring. The problem I have been struggling with is that the existing qp trie structure puts the metadata about a structure next to the pointer to the structure, but the COW flag is all about whether a structure is shared or not, so it really demands to be put in the structure itself.

When you have trie A and its COW-clone trie B, if you put the COW flags in the pointers you end up with different copies of the pointers and flags in A and B. Then you modify B some more, which means you need to update a flag, but the flag you need to update is in A and you don’t have a quick way to locate it. Gah!

Tomorrow

My aim is to hammer through the refactoring, and think about the details of how to use this COW flag, and what the API might look like for COWing the application data structure - mainly the DNS stuff in the case of Knot, but the hooks have to be general-purpose. (Knot uses qp tries for a lot more than just DNS zones.)