.@ Tony Finch – blog


This is in response to mathew's recent post. I'm posting it here (and emailing it) because I can't get his frustrating WordPress installation to log me in.

He asks, "It’s possible to write some C code to work out whether a machine’s architecture is little-endian or big-endian with respect to bytes. Is it possible, using only ANSI C, to work out whether the machine’s architecture is big-endian or little-endian with respect to bits?"

Machines do not (in general) have any bitwise endianness that's visible to programmers. In order to access bits you have to use shift and mask operations in registers, and these operations do not imply an absolute numbering of bits in the same way that the machine's addressing architecture does for bytes. (Instruction architecture documentation will often number the bits in a word, but a program running on the machine can't tell whether this documentation uses big-endian or little-endian conventions, or whether it numbers the bits from zero or one.)
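To make the contrast concrete, here is a minimal sketch. The function names `is_little_endian_bytes` and `get_bit` are my own, invented for illustration. The first function can see byte order because memory addresses give bytes an absolute position; the second identifies a bit only by its arithmetic weight, so it gives the same answer on every machine and reveals nothing about how the hardware numbers or stores its bits.

```c
#include <assert.h>
#include <limits.h>

/* Byte order is observable: addresses give the bytes of a word an
 * absolute order, so we can look at the lowest-addressed one. */
int is_little_endian_bytes(void)
{
        unsigned int probe = 1;
        return *(unsigned char *)&probe == 1;
}

/* Bit n here means "the bit with weight 2^n". That is defined by the
 * value of the word, not by any physical position, so the result is
 * the same on every machine. */
unsigned int get_bit(unsigned int word, unsigned int n)
{
        /* Shifting by the width of the type or more is undefined. */
        assert(n < sizeof word * CHAR_BIT);
        return (word >> n) & 1u;
}
```

For example, `get_bit(5u, 0)` and `get_bit(5u, 2)` are 1 and `get_bit(5u, 1)` is 0 whatever the hardware, whereas the result of `is_little_endian_bytes()` depends on the machine.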

Machines do have bitwise endianness at a lower level that is invisible to the programmer, when buses or IO serialise and deserialise bytes, e.g. to and from RAM or disk or network. The serialised form is not accessible to programs, so its endianness doesn't matter. There is an exception to this, which is output devices where the serialised form is visible to the user, e.g. displays with sub-byte pixels. But this is the device's endianness, not the whole machine's, and different devices attached to the same machine can legitimately disagree.
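As a rough illustration of the display case, here is a sketch of setting a pixel in a one-bit-per-pixel row. The `bit_order` enum and `set_pixel` function are hypothetical, not taken from any real driver, and the code assumes 8-bit bytes. The point is that the bit order is a parameter of the device, so two displays on the same machine could need different values.

```c
#include <stddef.h>

/* Hypothetical per-device property: which end of each byte holds the
 * leftmost pixel. */
enum bit_order { MSB_FIRST, LSB_FIRST };

/* Set pixel x in a 1-bit-per-pixel row, assuming 8-bit bytes. */
void set_pixel(unsigned char *row, size_t x, enum bit_order order)
{
        unsigned int shift = (unsigned int)(x % 8);

        if (order == MSB_FIRST)
                shift = 7 - shift;      /* leftmost pixel is the top bit */
        row[x / 8] |= (unsigned char)(1u << shift);
}
```

Setting pixel 0 in a zeroed row stores 0x80 in the first byte for an `MSB_FIRST` device and 0x01 for an `LSB_FIRST` one.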

The same arguments about machines' instruction architectures apply to most of the C programming model, i.e. bits are mostly only accessible by shifts and masks, so endianness isn't visible. The exception is bit fields in structs, which allow words to be broken down into units of arbitrary size. Bit fields must have an endianness convention for how they are packed into words, but this is chosen by the compiler and need not agree with the hardware's byte-wise endianness. However, the compiler usually does agree with the hardware, so that the following two structures have the same layout, though this is not required by the standard:

        struct a {
                unsigned char first;
                unsigned char second;
        };
        struct b {
                unsigned first : 8;
                unsigned second : 8;
        };
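
If you want to see what your own compiler does, something like the following sketch should work. It repeats the two structures so that it stands alone; `layouts_match` is a name I made up, and the code assumes 8-bit bytes. It only reports the convention of the compiler it was built with, which, as noted, the standard leaves open.

```c
#include <assert.h>
#include <string.h>

struct a {
        unsigned char first;
        unsigned char second;
};
struct b {
        unsigned first : 8;
        unsigned second : 8;
};

/* Returns 1 if the two lowest-addressed bytes of struct b hold the
 * same values, in the same order, as the members of struct a. */
int layouts_match(void)
{
        struct a sa;
        struct b sb;
        unsigned char bytes[2];

        assert(sizeof sb >= sizeof bytes);

        sa.first = 0x11;
        sa.second = 0x22;

        memset(&sb, 0, sizeof sb);      /* clear any unused bits */
        sb.first = 0x11;
        sb.second = 0x22;

        /* Copy out the leading bytes of the bit-field struct. */
        memcpy(bytes, &sb, sizeof bytes);
        return bytes[0] == sa.first && bytes[1] == sa.second;
}
```

A result of 1 means this compiler packs bit fields in the same direction as the hardware's byte order; a result of 0 would be unusual but still conforming.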