Last month I wrote some notes on endianness, arguing that it is fundamentally about bit order: byte order arises from the combination of bit order and byte-wide addressing. I also paid attention to the algebraic properties of endian-aware bit strings, but I didn’t pay attention to performance.
- towards practical endianness
- good associations
- deserialize abstractly
- deserialize completely
- deserialize beautifully
- essential endianness
- mismatched memory
towards practical endianness
Last time I was thinking in purely functional style, but for most practical purposes we encounter endianness when reading from and writing to a buffer imperatively.
Also, in functional style, I only worked with the head/start of a bitstring, as if it were a traditional linked list. But in practice when working in a buffer we typically read from the start and write to the end.
So I have been wondering about code that deals with bitwise endianness in a practical manner while still benefiting from the algebraic properties.
Somehow last time I failed to mention “monoid” at all. Bitstrings, being strings, can be concatenated, which is an associative binary operator with the empty string as its identity. This makes a monoid, and monoids are useful enough that it’s worth paying attention when you spot one.
Concatenation works on either end of a string, which matches buffering better than last time’s design based on fold and unfold. So I need to explain how to serialize and deserialize at either end of a bitstring with either endianness.
Let’s play a trick, and allow ourselves to read a bitstring from either end, regardless of its endianness, i.e. abstracting away from its transmission order or representation in memory.
We are looking at one end of a bistring, either the big end or the little end. We place a binary point just on this sice of the end closest to us, and extract a number:
Starting from the little end, we give each bit the value
i counts up from 0. Divide the sum by
2**N and take the remainder
as the value of our number; the quotient is what is left of the
Starting from the big end, we give each bit the value
counts down from -1. Multiply the sum by
2**N and take the integer
part as the value of our number; the fractional part is what is left
of the bitstring.
Given a bitstring N bits long, if we convert all of it to an N bit number, we get the same value regardless of whether we start from the big end or the little end.
This gives us two equivalent dual definitions of how to convert between a bitstring and a number. And this definition is decoupled from the bistring’s endianness, and more separate from splitting and concatenation than the way I described it before.
I originally chose the single-ended way of looking at deserialization because of the way the quotient·remainder or integer·fraction parts map nicely to the split between the extracted number and the rest of the bitstring, and they way it firmly places the binary point. But there’s also a pleasing symmetry to a double-ended bidirectional conversion between bitstrings and numbers.
Now that we have described the value of a bitstring independently of its transmission order, all that is left of endiannes is the choice of whether the big end is first, or the little end is first.
It gets messy when we need to represent bitstrings in memory, for two reasons:
memory is addressed in bytes, which makes endianness look superficially like a question of byte order rather than bit order
there is a conventional numbering of items in memory which might or might not match the bitstring’s endianness (for instance, 0-based addressing doesn’t fit big-endianness comfortably)
The remaining challenge is to come up with an implementation that’s efficient without being ugly.