What is the right way to understand endianness?
In my view endianness makes much more sense when viewed as a matter of bit order: these notes explore how byte order and byte swapping are a consequence of bitwise endianness combined with bytewise addressing and data transmission. But byte swapping isn’t the only flavour of endianness: bitwise endianness is equally comfortable with oddities like PDP-11 word swapping.
I also explain why big-endian is just as mathematically elegant as little-endian: they are perfectly dual to each other, so one can’t be more elegant than the other.
There’s a sketch of an abstract API for endian-aware bit sequences, designed with algebra in mind (associativity, anamorphisms and catamorphisms).
edited to add: this article has a sequel with improved symmetry
 Endianness is fundamentally bitwise
 Bit numbering
 Duality
 Mixed endian nonsense
 Practical mixed endianness
 Summary
 Elsewhere
 What else?
Endianness is fundamentally bitwise
Endianness is usually understood and described as a matter of byte order, though Danny Cohen’s essay On Holy Wars and a Plea For Peace (which introduced the terms “big-endian” and “little-endian”) described it in terms of transmission order, from bitstreams upwards. But, as Cohen observed, computers and network gear now deal in bytes, so the transmission order of bits is invisible to programmers, and it makes sense to think of endianness as byte order.
Cohen also judges implementations in terms of how consistently they follow a particular endianness, which makes it sound like a matter of aesthetic preference.
I think there’s a little more to it than that.
to and from bits
The fundamental operations are to serialize an unsigned integer value into an ordered sequence of bits, and the inverse, to deserialize a sequence of bits into a value.
Bit sequences can be concatenated to make longer sequences, or split into multiple shorter sequences.
big endian
For example, the value
0x1234 == 4660
serializes as (reading left-to-right)
0001 0010 0011 0100
If we break it in half and deserialize it we get (reading line by line downwards)
0001 0010 == 0x12 == 18
0011 0100 == 0x34 == 52
So big-endian byte order comes from serializing to bits (big end first) then deserializing to bytes (big end first). In practice we always implement the composition of these functions as a single fused operation, such as htonl(), which converts a 32-bit value of unspecified endianness to a sequence of big-endian bytes.
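A minimal sketch of such a fused operation in C (the name put_be32 is my own, not a standard API; htonl() itself only reorders within a word):

```c
#include <stdint.h>

// Store a 32-bit value as four big-endian bytes: serialize the value
// big end first, then deserialize the bit sequence into bytes, fused
// into a single step.
void put_be32(uint8_t out[4], uint32_t v) {
    out[0] = v >> 24;   // big end first
    out[1] = v >> 16;
    out[2] = v >> 8;
    out[3] = v;
}
```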
little endian
It’s harder to represent little-endian in text. Either you have to read right-to-left, or I have to ask you to do mental gymnastics to reverse numbers in your head. Like Cohen, I’ll write numbers with the little end on the right as usual, and justified to the right as a reminder of which end goes first.
Our example becomes
4660 == 0x1234
and serializes as (reading right-to-left)
0001 0010 0011 0100
This time when we break it in half and deserialize, we get (reading line by line downwards)
52 == 0x34 == 0011 0100
18 == 0x12 == 0001 0010
As before we are composing serialize (from 16 bits) and deserialize (to bytes), but this time with little-endian bit order, and the result is little-endian byte order.
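A sketch of the fused little-endian operation in C (put_le32 is likewise a hypothetical name):

```c
#include <stdint.h>

// Store a 32-bit value as four little-endian bytes: serialize the
// value little end first, then deserialize the bit sequence into
// bytes, fused into a single step.
void put_le32(uint8_t out[4], uint32_t v) {
    out[0] = v;         // little end first
    out[1] = v >> 8;
    out[2] = v >> 16;
    out[3] = v >> 24;
}
```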
byte order, word order
So, when a system has a consistent endianness, we can understand this as the result of choosing either the big-endian conversions to and from bits, or the little-endian conversions to and from bits. A consistent bit, byte, and word ordering comes from composing the two inverse conversion functions at various widths.
Bit numbering
It is usually said (in Cohen’s essay and elsewhere) that the little-endian order is more mathematically elegant, because when bits are numbered from zero, the deserialized value of bit i is 2**i (where ** is exponentiation).
On the other hand, the big-endian order corresponds better to the way we store strings, and to lexicographic sorting.
I learned binary, bits, bytes, and words on little-endian computers, and I was convinced (as only teenagers can be) that little-endian was the Right Thing and that big-endian ordering didn’t make sense. But there is a mathematically elegant way to view the big-endian order too.
fractions and strings
Let’s imagine we have a set of values, which we are making progressively longer.

If our values are little-endian numbers, when we add bits they get appended on the big end, on the left. This can drastically alter the values of our numbers, and how they sort relative to each other.

If our values are decimal fractions, when we add digits it only makes a small difference to their values, and a small difference to how they sort.

If our values are strings, adding characters can only make small refinements to their lexicographic order.

If our values are big-endian numbers, the set behaves like strings or fractions: making the values longer only makes small changes to the way they sort.
big-endian elegance
This analogy suggests that we should treat big-endian numbers like binary fractions, so the value of bit i is 2**-i.
We usually want to deserialize to an integer, not a fraction, so if we have N bits we will need to multiply by 2**N. (This seems like an extra thing that isn’t necessary for little-endian numbers, or is it…?)
But wait, there’s an off-by-one error! Either we need to count bits starting with i = 1, or we need to multiply by 2**(N-1).
There is a good reason, in this situation, to count from 1, because that gives us a binary fraction between 0 and 1. The effect is a virtual binary point just before the left end of the number, at the start of the bit string.
example
If we have a bit string
0001 0010 0011 0100
We treat it as a binary fraction
0b0.0001_0010_0011_0100
We multiply by 2**16 to get
0b0001_0010_0011_0100.0 == 0x1234 == 4660
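This arithmetic can be checked directly; here is a small C sketch (be16_via_fraction is my own name), which is exact because both steps scale by powers of two:

```c
#include <stdint.h>

// Treat a 16-bit big-endian bit string as a binary fraction between
// 0 and 1 (the binary point sits before the first bit), then multiply
// by 2**16 to recover the integer value.
uint32_t be16_via_fraction(uint16_t bits16) {
    double frac = bits16 / 65536.0;     // 0x1234 -> 0b0.0001_0010_0011_0100
    return (uint32_t)(frac * 65536.0);  // scale up by 2**16
}
```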
Duality
With the right framing, we can see that the big-endian order and the little-endian order are perfectly dual to each other.
                     big-endian          little-endian
first bit            most significant    least significant
transmission order   left-to-right       right-to-left
count bits from      1                   0
bit value            2**-i               2**+i
binary point         to the left         to the right
convert to integer   mul 2**N            mod 2**N
In the expressions, i is a bit number, and N is the word size.
Note that in both cases the binary point is just before the first bit in transmission order.
When converting to an integer, imagine we are deserializing an N-bit word from the start of a longer bit sequence.
In the big-endian case, we multiply by 2**N and take the integer part as the value of our word; the fraction part is the rest of the bit sequence.
In the little-endian case, we divide by 2**N and take the remainder as the value of our word; the quotient is the rest of the bit sequence.
In both the big-endian and little-endian cases, after multiplying or dividing, the rest of the bit sequence retains the same interpretation of bit numbers, bit values, and position of the binary point.
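Held as scaled integers, the two conversions in the table can be sketched like this in C (the function names are mine; a 24-bit sequence stands in for a longer bit stream):

```c
#include <stdint.h>

// Big-endian: the sequence is a fraction with the binary point on the
// left, scaled here by 2**24. Multiplying by 2**8 makes the first
// 8-bit word the integer part; the fraction part is the rest.
uint32_t be_take8(uint32_t seq24, uint32_t *rest) {
    *rest = seq24 & 0xFFFF;   // fraction part: the remaining 16 bits
    return seq24 >> 16;       // integer part: the first 8-bit word
}

// Little-endian: the sequence is an integer with the binary point on
// the right. Dividing by 2**8 makes the first word the remainder; the
// quotient is the rest.
uint32_t le_take8(uint32_t seq24, uint32_t *rest) {
    *rest = seq24 >> 8;       // quotient: the remaining 16 bits
    return seq24 & 0xFF;      // remainder: the first 8-bit word
}
```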
Mixed endian nonsense
What happens if we mix bit sequences of differing endianness? Nothing good! Here are some examples of the nonsense results and complete brokenness we get if we are careless.
bitwise
Let’s take two 12-bit numbers and serialize one big-endian and one little-endian.
0001 0010 0011 == 0x123 == 291
291 == 0x123 == 0001 0010 0011
When we concatenate them, keeping the bit sequences in their specified orders, we read the big-endian one left-to-right and the little-endian one right-to-left. (This is just an artefact of the way I am writing the numbers; if the sequences were represented as lists of bits, we would be traversing them both the same way.) The result is:
0001 0010 0011 1100 0100 1000
We can deserialize it big-endian-wise into a 24-bit number and we get:
0x123C48
Half the number has had its bits reversed, which is almost never going to be what anyone wanted.
bytewise
In practice, software isn’t going to represent a bit sequence using a list; we’ll use something more compact and efficient like a byte vector. We’ll have to pay careful attention when its length is not a multiple of 8, though!
In our example, the representation of our big-endian sequence will look like:
0x12
0x3?
The second byte has been filled from the left, big-endian style. The ? marks the 4 spare bits.
Before we concatenate a second 12-bit value, it needs to be shifted right by the length of the first sequence (mod 8):
0x?1
0x23
Now the used and unused bits mesh neatly so they can be combined into the expected sequence, 0x123123:
0x12
0x3?
++
0x?1
0x23
==
0x12
0x31
0x23
The representation of a little-endian sequence fills bits from the right, and concatenation requires a left shift:
0x23
0x?1
++
0x3?
0x12
==
0x23
0x31
0x12
If we try to concatenate sequences of different endianness, then the used and spare bits clash instead of meshing. It doesn’t work at all:
0x12 0x23
0x3? 0x?1
++ ++
0x3? 0x?1
0x12 0x23
== ==
???? ????
wordwise
When we use a word to represent short-enough bit sequences, it’s quite easy to concatenate them. For a big-endian sequence:
typedef struct bits { unsigned len, val; } bits;
bits bebits_cat(bits a, bits b) {
    return (bits){ .len = a.len + b.len,
                   .val = a.val | b.val >> a.len };
}
And for a little-endian sequence it is:
bits lebits_cat(bits a, bits b) {
    return (bits){ .len = a.len + b.len,
                   .val = a.val | b.val << a.len };
}
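To check the concatenation functions against the bytewise example earlier, here is a self-contained usage sketch. It assumes unsigned is 32 bits wide, big-endian sequences hold their bits left-justified in val, little-endian ones right-justified, and lengths stay below 32:

```c
typedef struct bits { unsigned len, val; } bits;

// Big-endian sequences: bits left-justified, new bits shifted in
// below the bits already present.
bits bebits_cat(bits a, bits b) {
    return (bits){ .len = a.len + b.len,
                   .val = a.val | b.val >> a.len };
}

// Little-endian sequences: bits right-justified, new bits shifted in
// above the bits already present.
bits lebits_cat(bits a, bits b) {
    return (bits){ .len = a.len + b.len,
                   .val = a.val | b.val << a.len };
}
```

Concatenating the 12-bit value 0x123 with itself gives the expected 0x123123 both ways: in the top 24 bits of val for big-endian, and in the bottom 24 for little-endian.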
If we use these concatenate functions with arguments of mismatched endianness, once again we get complete nonsense.
For instance, when we call bebits_cat() with a little-endian second argument, the right shift will throw away part of the sequence, because it is held at the least significant end of b.val.
muchfool
This is a long-winded way of saying that little-endian and big-endian bit sequences need to be treated as distinct types that cannot be mixed, at least not in a naive way.
Unfortunately, in the real world there are systems with inconsistent endianness. How can we support them if we can’t combine little-endian and big-endian bit sequences? It’s tempting to make the concatenation functions more complicated so that they work a bit better (see my comments about Erlang below for an example) but we can do better.
Practical mixed endianness
In the previous section we tried to combine big-endian and little-endian values without any explicit conversion, and it was a disaster. So what is a good way to convert between big-endian and little-endian?
Usually we talk about byte swapping, but my contention is that bytewise endianness isn’t fundamental; it arises from an underlying bitwise endianness. And there are real-world cases that are not handled by byte swapping: for example, Cohen talks about floating point numbers on the DEC PDP-11 and VAX, which are stored as a sequence of little-endian 16-bit words in big-endian order.
This gives us a clue that endianness conversion needs to depend on a bit length somehow: 8-bit bytes for the usual byte swapping, or 16-bit words for DEC floating point.
little to big
For example, let’s use the 32-bit number
0x12345678
We’ll serialize it to a little-endian bit sequence (which I have written with the first bit on the right, as before)
0001 0010 0011 0100 0101 0110 0111 1000
To convert this little-endian bit sequence to big-endian, we deserialize it (reading right-to-left) to make a sequence of byte values:
0111 1000 == 0x78
0101 0110 == 0x56
0011 0100 == 0x34
0001 0010 == 0x12
And reserialize it to a big-endian bit sequence, reading the bytes top-to-bottom, left-to-right:
0111 1000 0101 0110 0011 0100 0001 0010
If we then deserialize it back to a 32-bit number (reading big-endian bits left-to-right), we get:
0x78563412
We have byte-swapped it, without using any byte-specific operations. If we had changed the intermediate word size parameter from 8 to 16, we would have converted via a sequence of 16-bit words, resulting in a PDP-11 word-swapped number instead of a byte-swapped one.
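A sketch of this parameterized conversion in C (chunk_swap32 is a hypothetical name; it requires that k divide 32 and be less than 32):

```c
#include <stdint.h>

// Convert endianness by reversing the order of the k-bit chunks in a
// 32-bit word: k = 8 gives ordinary byte swapping, and k = 16 gives
// PDP-11-style word swapping.
uint32_t chunk_swap32(uint32_t v, unsigned k) {
    uint32_t out = 0;
    for (unsigned i = 0; i < 32; i += k)
        out = (out << k) | ((v >> i) & ((1u << k) - 1));
    return out;
}
```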
efficiency
It seems like a lot of faff to do byte swapping by composing four operations. But look a bit closer.
Say we are on a little-endian computer; the first operation (serializing a 32-bit number) is a no-op, and the second operation (deserializing to bytes) is also a no-op.
If our internal representation of bit sequences is bytewise, then the third operation (serializing a byte sequence) is also a no-op. The final operation, deserializing a big-endian bit sequence to a 32-bit value, is where the byte swapping happens.
Alternatively, if our internal representation is wordwise, the third operation (serializing a byte sequence) requires byte swapping, and the final operation (deserializing to a 32-bit value) is a no-op or maybe just a shift.
I have not tried implementing byte swapping by composing four functions like this, so I don’t know how feasible it is for a compiler to actually eliminate the no-ops…
Summary
Here’s a sketch of what this might look like as an API.
We need two types, bebits for big-endian bit sequences, and lebits for little-endian bit sequences. They both implement the same bits abstract data type, and behave the same way, except that the differing endianness becomes visible when you serialize and deserialize values of differing widths.
Each type has an algebraic identity value: there are constants for an empty bebits and an empty lebits.
serialize
An unsigned number can be appended to a bits type. (To convert a number by itself, append it to the relevant empty bits value.) Most languages don’t have arbitrary-width numbers (uint19_t or whatever) so there probably needs to be a separate parameter saying how many bits to extract from the number.
Signed numbers need to be converted to unsigned before being serialized, using two’s complement or whatever other format is required.
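A hedged C sketch of serialize for big-endian sequences, reusing the word representation from earlier (bebits_append is my own name; it assumes unsigned is 32 bits, 0 < n < 32, and the total length stays below 32):

```c
typedef struct bits { unsigned len, val; } bits;

// Append the low n bits of num to a big-endian sequence whose bits are
// held left-justified in val: mask off n bits, left-justify them, then
// shift them down past the bits already in the sequence.
bits bebits_append(bits s, unsigned num, unsigned n) {
    unsigned masked = num & ((1u << n) - 1);
    return (bits){ .len = s.len + n,
                   .val = s.val | (masked << (32 - n)) >> s.len };
}
```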
deserialize
An unsigned number of some size (specified with a parameter or implied by the return type) can be extracted from a bits type, yielding the number represented by the prefix of the bit sequence, and the rest of the bit sequence. The bit sequence must be at least as long as the size of the resulting number.
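Dually, a hedged sketch of deserialize for big-endian sequences (bebits_take is my own name; same representation as before, with 0 < n < 32 and n no larger than the sequence length):

```c
typedef struct bits { unsigned len, val; } bits;

// Take an n-bit number off the front of a big-endian sequence held
// left-justified in val: the prefix is the top n bits, and the rest
// of the sequence moves up to become the new prefix.
unsigned bebits_take(bits *s, unsigned n) {
    unsigned num = s->val >> (32 - n);  // the first n bits, big end first
    s->val <<= n;                       // the rest of the sequence
    s->len -= n;
    return num;
}
```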
concatenate and split
Two bits values of the same type can be concatenated. The left operand is always first and the right operand second (there is no endian order dependence), so you get the same results if you concatenate bits then deserialize to bytes, or if you deserialize to bytes then concatenate byte strings.
You can also split a bit sequence into two subsequences, so you can deserialize the parts separately.
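Split can be sketched in the same style (bebits_split is my own name; same left-justified representation, with 0 < n < 32 and n no larger than the sequence length):

```c
typedef struct bits { unsigned len, val; } bits;

// Split a big-endian sequence into its first n bits and the remainder:
// the head keeps the top n bits in place, and the tail shifts up so
// that it starts at the front.
void bebits_split(bits s, unsigned n, bits *head, bits *tail) {
    *head = (bits){ .len = n, .val = s.val & ~(~0u >> n) };
    *tail = (bits){ .len = s.len - n, .val = s.val << n };
}
```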
length
So you can find out the number of bits in a bit sequence: how much there is left to deserialize, etc.
duality
I have described serialize as appending to an existing bits value so that it is dual to deserialize.
Haskell fans will appreciate that serializing a list of numbers to bits is just folding serialize over the list; similarly, deserializing bits to a list of numbers is unfolding deserialize.
If your programming language has the right kind of polymorphism then you can use the same serialize, deserialize, concatenate, and length interfaces for both bebits and lebits, with the endianness determined by the type of the bits argument.
conversion
To convert between bebits and lebits, deserialize to a list of numbers and reserialize them again; 8-bit numbers gives you byte swapping, 16-bit numbers gives you PDP-11-style word swapping, etc.
This definition of conversion is an example of a hylomorphism. In principle this pattern should make it amenable to optimizations such as deforestation.
Elsewhere
In September 2016 I started writing these notes but I didn’t like how it was turning out, so I stuck it on the back burner. Four years later I’ve rewritten the notes so that I can clear the old draft out of my inbox. This version is better at treating bytes as nothing special, and I now have a better description of the duality between bigendian and littleendian.
Erlang
In yet another September, even more years ago, I wrote about endianness in Erlang bit strings, and complained about some crazy behaviour.
Erlang bit syntax is a good example of a wrong way to resolve mixed-endian nonsense that I hinted at earlier. Erlang bit strings are big-endian by default, and concatenation always follows big-endian behaviour.
If you append a little-endian value, it is byte-swapped, then treated as a big-endian value.
Erlang’s bit syntax is utterly brilliant; it’s a shame that its little-endian support is messed up.
Haskell
My abortive 2016 article was inspired by a nice paper about bitdata: High-level Views on Low-level Representations. It describes a type system (with type inference) for bitdata, and a neat syntax for parsing and serialization based on pattern matching algebraic data types. There are two main limitations with their work:

It only deals with word-sized data; they aren’t trying to tackle the representation of larger structures or network packets.

Because of that they can ignore questions of endianness.
Their follow-up paper, Strongly Typed Memory Areas: Programming Systems-Level Data Structures in a Functional Language, extends their ideas to cover what they didn’t (in the end) call “bytedata”. They build up larger structures from byte-aligned bitdata; members of a structure can be marked as LE or BE and they are byte-swapped as required.
What else?
I’m curious if there are other descriptions of endianness that tackle it in a similar way, starting with bit order and deriving byte order as a consequence.
Also I’m not enough of a mathematician to formalize this in a way that properly elucidates the duality between big-endian and little-endian. Endianness is such a commonplace concept (and usually so ugly that it’s desperately in need of cleaning up) that surely someone has done so already…