.@ Tony Finch – blog


There was an interesting and instructive cockup in the BIND 9.10.4 release. The tl;dr is that they tried to improve performance by changing the layout of a data structure to use less memory. As a side effect, two separate sets of flags ended up packed into the same word. Unfortunately, these different sets of flags were protected by different locks. This overlap meant that concurrent access to one set of flags could interfere with access to the other set of flags. This led to a violation of BIND's self-consistency checks, causing a crash.

The fix uses an obscure feature of C structure bitfield syntax. If you declare a bitfield without a name and with a zero width (in this case, unsigned int :0) any following bitfields must be placed in the next "storage unit". This is described in the C standard section 6.7.2.1p12.

You can see this in action in the BIND DNS red-black tree declaration.

A :0 solves the problem, right?

So a clever idea might be to use this :0 bitfield separator so that the different sets of bitfields will be allocated in different words, and the different words will be protected by different locks.

What are the assumptions behind this fix?

If you only use a :0 separator, you are assuming that a "word" (in a vague hand-wavey sense) corresponds to both a "storage unit", which the compiler uses for struct layouts, and also to the size of memory access the CPU uses for bitfields, which matters when we are using locks to keep some accesses separate from each other.

What is a storage unit?

The C standard specifies the representation of bitfields in section 6.2.6.1p4:

Values stored in bit-fields consist of m bits, where m is the size specified for the bit-field. The object representation is the set of m bits the bit-field comprises in the addressable storage unit holding it.

Annex J, which covers portability issues, says:

The following are unspecified: [... yada yada ...]

The alignment of the addressable storage unit allocated to hold a bit-field

What is a storage unit really?

A "storage unit" is defined by the platform's ABI, which for BIND usually means the System V Application Binary Interface. The amd64 version covers bit fields in section 3.1.2. It says,

In this particular case, the storage unit is 4 bytes for a 32 bit unsigned int. Note that this is less than the architecture's nominal 64 bit width.

How does a storage unit correspond to memory access widths?

Who knows?

What?

It is unspecified.

So...

The broken code declared a bunch of adjacent bit fields and forgot that they were accessed under different locks.

Now, you might hope that you could just add a :0 to split these bitfields into separate storage units, and the split would be enough to also separate the CPU's memory accesses to the different storage units.

But you would be assuming that the code that accesses the bitfields will be compiled to use unsigned int sized memory accesses to read and write the unsigned int sized storage units.

Um, do we have any guarantees about access size?

Yes! There is sig_atomic_t which has been around since C89, and a load of very recent atomic stuff. But none of it is used by this part of BIND's code.

(Also, worryingly, the AMD64 SVABI does not mention "atomic".)

So what is this "Deathstation 9000" thing then?

The DS9K is the canonical epithet for the most awkward possible implementation of C, which follows the standard to the letter, but also violates common assumptions about how C works in practice. It is invoked as a horrible nightmare, but in recent years it has become a disastrous reality as compilers have started exploiting undefined behaviour to support advanced optimizations.

(Sadly the Armed Response Technologies website has died, and Wikipedia's DS9K page has been deleted. About the only place you can find information about it is in old comp.lang.c posts...)

And why is the DS9K relevant?

Well, in this case we don't need to go deep into DS9K territory. There are vaguely reasonable ABIs for which a small :0 fix would not actually work.

For instance, there might be a CPU which can only do 64 bit memory accesses, but which has a 32 bit int storage unit. This type representation would probably mean the C compiler has really bad performance, so it is fairly unlikely. But it is allowed, and there are (or were) CPUs which can't do sub-word memory accesses, and which have very little RAM so they want to pack data tightly.

On a CPU like this, the storage unit doesn't match the memory access, so C's :0 syntax for skipping to the next storage unit will fail to achieve the desired effect of isolating memory accesses that have to be performed under different concurrent access locks.

DS9K defence technology

So the BIND 9.10.4 fix does two things:

The most important change is to move one of the sets of bitfields from the start of the struct to the end of it. This means there are several pointers in between the fields protected by one lock and the fields protected by the other lock, so even a DS9K can't reasonably muddle them up.

Secondly, they used magical :0 syntax and extensive comments to (hopefully) prevent the erroneous compaction from happening again. Even if the bitfields get moved back to the start of the struct (so the pointers no longer provide insulation) the :0 might help to prevent the bug from causing crashes.

(edited to add)

When I wrapped this article up originally, I forgot to return to sig_atomic_t. If, as a third fix, these bitfields were also changed from unsigned int to sig_atomic_t, it would further help to avoid problems on systems where the natural atomic access width is wider than an int, like the lesser cousin of the DS9K I outlined above. However, sig_atomic_t can be signed or unsigned so it adds a new portability wrinkle.

Conclusion

The clever :0 is not enough, and the BIND developers were right not to rely on it.