A history of S_IFMT

In Unix, S_IFMT is a mask identifying the bits of an inode's mode that indicate the file's type, i.e. whether it is a directory, a symbolic link, a socket, and so on. It is conventionally 0170000, which corresponds to the top 4 bits of a 16-bit mode.
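In practice, code rarely inspects these bits one at a time. It masks the mode with S_IFMT and compares the result against the individual type constants. Here's a minimal sketch in C. It assumes a system whose <sys/stat.h> provides the traditional constants; POSIX marks the S_IF* constants as an XSI extension, hence the feature-test macro. The function name ls_type_char is made up for this example. The octal values in the comments are the conventional ones, which POSIX itself doesn't mandate.

```c
#define _XOPEN_SOURCE 700 /* expose the XSI S_IF* constants on strict compilers */
#include <sys/stat.h>

/* Hypothetical helper: map a mode to the type letter ls -l conventionally prints. */
char ls_type_char(mode_t mode)
{
    switch (mode & S_IFMT) {   /* keep only the 4 type bits */
    case S_IFIFO:  return 'p'; /* 0010000: named pipe */
    case S_IFCHR:  return 'c'; /* 0020000: character special */
    case S_IFDIR:  return 'd'; /* 0040000: directory */
    case S_IFBLK:  return 'b'; /* 0060000: block special */
    case S_IFREG:  return '-'; /* 0100000: regular file */
    case S_IFLNK:  return 'l'; /* 0120000: symbolic link */
    case S_IFSOCK: return 's'; /* 0140000: socket */
    default:       return '?'; /* a type this sketch doesn't know about */
    }
}
```

For example, ls_type_char(S_IFDIR | 0755) gives 'd', the same letter ls -l shows for a directory. The permission bits never affect the result, because the mask removes them before the comparison.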

I saw someone asking the other day why 4 bits are used when POSIX only defines 7 types, which could be stored just as well in 3 bits. The straightforward answer is that it leaves room for expansion, and indeed many Unixes define several more types. Solaris, for example, has three additional types: doors, event ports, and ACL shadows (though the latter is not exposed in userspace).

But that's not the whole story. The question I'm going to answer in this post is not why 4 bits are used, but why they're used the way they are. If you have a look at the standard file types, their values seem pretty arbitrary, when you might expect a simple count upwards.

Symbol    Octal   Bits  Type
S_IFMT    170000  1111
S_IFIFO   010000  0001  Named pipe
S_IFCHR   020000  0010  Character special
S_IFDIR   040000  0100  Directory
S_IFBLK   060000  0110  Block special
S_IFREG   100000  1000  Regular file
S_IFLNK   120000  1010  Symbolic link
S_IFSOCK  140000  1100  Socket

I saw some patterns in there, but I couldn't work it out, so I had a look at some historical manuals and header files.

1st Edition UNIX

1st Edition UNIX (1971) had no type field as such. The top 4 bits of the mode had the following layout. A dot (.) means that the bit's value doesn't matter.

Octal   Bits  Meaning
100000  1...  Inode is allocated
040000  .1..  Directory
020000  ..1.  Has been modified
010000  ...1  Large file storage

We can see the origin of S_IFDIR here, but the other bits had completely different meanings. In fact, 1st Edition had a very different layout for the mode in general. For one thing, groups had yet to be introduced. The bottom 6 bits were used, from higher to lower, to mean: setuid, executable, owner-read, owner-write, other-read, and other-write. And so 1st Edition ls might write --xrwr- to mean something like -rwxr-xr-x today.

Bit 020000 was apparently always set to 1, and so was likely ignored in practice by the time of the 1st Edition. Bit 100000 was also always set for allocated inodes, but unlike the modified bit it served a purpose: it allowed the file system to distinguish between an unallocated inode and a regular file with no permissions (-------).
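To make the 1st Edition layout concrete, here's a sketch that decodes such a mode using the bit values described above. The V1_ names and both functions are invented for illustration, since 1st Edition had no symbolic names like these. The string follows the 7-character --xrwr- example, assuming its first character marks directories.

```c
#include <stdbool.h>

/* Hypothetical names for the 1st Edition mode bits described above. */
enum {
    V1_ALLOC = 0100000, /* inode is allocated */
    V1_DIR   = 0040000, /* directory */
    V1_MOD   = 0020000, /* has been modified (apparently always set) */
    V1_LARGE = 0010000, /* large file storage */
    V1_SUID  = 040,     /* set user ID on execution */
    V1_EXEC  = 020,     /* executable */
    V1_OREAD = 010,     /* owner read */
    V1_OWRIT = 04,      /* owner write */
    V1_XREAD = 02,      /* other (non-owner) read */
    V1_XWRIT = 01,      /* other (non-owner) write */
};

/* A free inode has an all-zero mode; V1_ALLOC tells it apart from an
   allocated file that merely has no permission bits set. */
bool v1_is_allocated(unsigned mode)
{
    return (mode & V1_ALLOC) != 0;
}

/* Write a 7-character mode string plus NUL into buf, e.g. "--xrwr-". */
void v1_mode_string(unsigned mode, char buf[8])
{
    buf[0] = (mode & V1_DIR)   ? 'd' : '-';
    buf[1] = (mode & V1_SUID)  ? 's' : '-';
    buf[2] = (mode & V1_EXEC)  ? 'x' : '-';
    buf[3] = (mode & V1_OREAD) ? 'r' : '-';
    buf[4] = (mode & V1_OWRIT) ? 'w' : '-';
    buf[5] = (mode & V1_XREAD) ? 'r' : '-';
    buf[6] = (mode & V1_XWRIT) ? 'w' : '-';
    buf[7] = '\0';
}
```

With mode 0120036 (allocated, modified, executable, owner read/write, other read), v1_mode_string produces --xrwr-.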

4th Edition UNIX

The mode layout changed in 4th Edition UNIX (1973), coinciding with the addition of groups and a switch to the modern -rwxrwxrwx layout for the file permissions. This was the first Unix to have a mask for these inode types, though it was only 2 bits wide, taking the place of the directory bit and modification bit.

Symbol  Octal   Bits  Type
IFMT    060000  0110
        000000  .00.  Regular file
IFCHR   020000  .01.  Character special
IFDIR   040000  .10.  Directory
IFBLK   060000  .11.  Block special

The allocation bit (IALLOC) and large file bit (ILARG) were still used as in the 1st Edition.
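Decoding a 4th Edition mode therefore means masking with the 2-bit IFMT and checking IALLOC separately, because a regular file's type code is zero. Here's a sketch using the values from the table, with IALLOC and ILARG at their 1st Edition positions. The function v4_type_name is hypothetical.

```c
/* 4th Edition mode bits, named as in the table and text above. */
enum {
    IALLOC = 0100000, /* inode is allocated (same position as 1st Edition) */
    IFMT   = 0060000, /* 2-bit type mask */
    IFCHR  = 0020000, /* character special */
    IFDIR  = 0040000, /* directory */
    IFBLK  = 0060000, /* block special */
    ILARG  = 0010000, /* large file storage (same position as 1st Edition) */
};

/* Hypothetical helper: describe the type of a 4th Edition inode. */
const char *v4_type_name(unsigned mode)
{
    if (!(mode & IALLOC))
        return "unallocated";   /* the type field alone can't tell us this */
    switch (mode & IFMT) {      /* ILARG and the permission bits are ignored */
    case 0:     return "regular file";
    case IFCHR: return "character special";
    case IFDIR: return "directory";
    default:    return "block special"; /* IFBLK: both type bits set */
    }
}
```

Note that v4_type_name(IALLOC) reports a regular file with no permissions, while v4_type_name(0) reports a free inode. That is exactly the distinction the allocation bit existed to make.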

7th Edition UNIX

The next change happened in 7th Edition UNIX (1979), when the mask was widened to the present 4 bits by adding a single bit at each end, displacing IALLOC and ILARG. Yet each existing type code kept its absolute position in the mode, which is why the earliest types go up in steps of 2 (0010, 0100, 0110) rather than counting from 1. In addition, regular files kept their highest bit set (as it would have been when IALLOC was in use), so as to distinguish between an unallocated inode (stored with a fully zeroed mode) and a regular file with no permissions (----------).
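Because the old type bits kept their absolute positions, a 4th Edition type code can be read directly under the wider mask. Only regular files and the displaced IALLOC and ILARG bits need special handling. The function below is purely a hypothetical illustration of that relationship, not a conversion tool that existed at the time, and the OLD_ and NEW_ names are invented. It also assumes the permission bits (07777) are laid out the same way in both editions.

```c
/* Hypothetical names for the bits involved (values from the tables above). */
enum {
    OLD_IALLOC = 0100000, /* 4th Edition: inode is allocated */
    OLD_IFMT   = 0060000, /* 4th Edition: 2-bit type mask (ILARG, 0010000, is dropped) */
    NEW_IFMT   = 0170000, /* 7th Edition: 4-bit type mask */
    NEW_IFREG  = 0100000, /* 7th Edition: regular file, at IALLOC's old position */
};

/* Map a 4th Edition mode to the equivalent 7th Edition mode. */
unsigned v4_to_v7_mode(unsigned old_mode)
{
    unsigned type  = old_mode & OLD_IFMT; /* IFCHR, IFDIR and IFBLK keep their values */
    unsigned perms = old_mode & 07777;    /* assumed: permission bits carry over as-is */

    if (!(old_mode & OLD_IALLOC))
        return 0;                         /* unallocated inode: fully zeroed mode */
    if (type == 0)
        return NEW_IFREG | perms;         /* regular file keeps its top bit set */
    return type | perms;                  /* otherwise IALLOC and ILARG are dropped */
}
```

Dropping IALLOC matters here. An old directory's mode still has it set, so naively masking 0140755 with 0170000 would give 0140000, the value 4.3BSD later assigned to sockets, rather than 040000.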

Also added were two types no longer in use: multiplexed character and block special files, which had the same codes as their uniplexed counterparts but with the lowest bit set. These types did not last long, however.

Symbol   Octal   Bits  Type
S_IFMT   170000  1111
S_IFCHR  020000  0010  Character special
S_IFMPC  030000  0011  Multiplexed character special
S_IFDIR  040000  0100  Directory
S_IFBLK  060000  0110  Block special
S_IFMPB  070000  0111  Multiplexed block special
S_IFREG  100000  1000  Regular file
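Since each multiplexed type is its uniplexed counterpart plus the lowest type bit (010000), clearing that bit recovers the underlying device class. Here's a small sketch using the 7th Edition values from the table. Modern headers no longer define S_IFMPC or S_IFMPB, so the constants are defined locally under invented V7_ names. V7_MPXBIT is likewise a made-up name for the distinguishing bit.

```c
#include <stdbool.h>

/* 7th Edition values from the table above, under hypothetical V7_ names. */
enum {
    V7_IFMT   = 0170000,
    V7_IFCHR  = 0020000,
    V7_IFMPC  = 0030000,
    V7_IFBLK  = 0060000,
    V7_IFMPB  = 0070000,
    V7_MPXBIT = 0010000, /* the bit separating multiplexed from uniplexed */
};

/* Clear the multiplexing bit, so S_IFMPC maps to S_IFCHR and S_IFMPB to S_IFBLK. */
unsigned v7_base_type(unsigned mode)
{
    return mode & V7_IFMT & ~(unsigned)V7_MPXBIT;
}

/* True if the mode's type is one of the two multiplexed types. */
bool v7_is_multiplexed(unsigned mode)
{
    unsigned t = mode & V7_IFMT;
    return t == V7_IFMPC || t == V7_IFMPB;
}
```

This only makes sense for the 7th Edition set of types. Once System III gave 010000 on its own to named pipes, clearing that bit would turn a named pipe's type into zero.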

System III

System III (1982) added named pipes, using the lowest nonzero value, 010000, which had only become available once 7th Edition extended the mask downwards.

Symbol   Octal   Bits  Type
S_IFIFO  010000  0001  Named pipe

4.3BSD

4.3BSD (1986) added symbolic links and sockets, also counting upwards, but only using the top 3 bits (mask 160000) and leaving the lowest bit clear, presumably so as not to step on AT&T's toes.

Symbol    Octal   Bits  Type
S_IFLNK   120000  1010  Symbolic link
S_IFSOCK  140000  1100  Socket

Enumerating S_IFMT

Something interesting (to me) about how this layout has come about is that, if you twiddle the bits a little, you can end up with a reasonably chronological numbering of the types. Specifically, in code:

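unsigned fmt_index(unsigned mode) {
    unsigned fmt = (mode >> 12) & 017;      // drop file permissions, leaving the 4-bit IFMT field
    if (fmt == 010) return 0;               // regular file: only the old IALLOC bit is set, so clear it
    return ((fmt >> 1) | (fmt << 2)) & 07;  // shift right, folding the rightmost bit onto the leftmost
}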

And this gives us:

#  Type
0  Regular file
1  Character special
2  Directory
3  Block special
4  Named pipe
5  Symbolic link
6  Socket
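To check the mapping, here's the same arithmetic wrapped as a function and run over every standard type in the table's order. It's a sketch that assumes a <sys/stat.h> defining all seven constants with their conventional values. The S_IF* constants are an XSI extension, so the feature-test macro asks for them explicitly. The names chrono_index and check_chrono_order are made up.

```c
#define _XOPEN_SOURCE 700 /* expose the XSI S_IF* constants on strict compilers */
#include <stdio.h>
#include <sys/stat.h>

/* Same bit-twiddling as the snippet above. */
unsigned chrono_index(mode_t mode)
{
    unsigned fmt = ((unsigned)mode >> 12) & 017; /* the 4-bit type field */
    if (fmt == 010)
        return 0;                                /* regular file */
    return ((fmt >> 1) | (fmt << 2)) & 07;       /* fold bit 0 onto bit 3, then shift */
}

/* Print each type's octal value next to its index; return 1 if the
   indices come out as 0..6 in table order, 0 otherwise. */
int check_chrono_order(void)
{
    static const struct { mode_t type; const char *name; } types[] = {
        { S_IFREG,  "Regular file" },
        { S_IFCHR,  "Character special" },
        { S_IFDIR,  "Directory" },
        { S_IFBLK,  "Block special" },
        { S_IFIFO,  "Named pipe" },
        { S_IFLNK,  "Symbolic link" },
        { S_IFSOCK, "Socket" },
    };
    int ok = 1;
    for (unsigned i = 0; i < sizeof types / sizeof types[0]; i++) {
        unsigned idx = chrono_index(types[i].type | 0644); /* permissions shouldn't matter */
        printf("%07o -> %u  %s\n", (unsigned)types[i].type, idx, types[i].name);
        if (idx != i)
            ok = 0;
    }
    return ok;
}
```

On a system using the conventional values, check_chrono_order() prints one line per type, with indices 0 through 6, and returns 1. Without the special case for regular files, 0100000 would fold to 4 and collide with named pipes, which is why the snippet clears the old IALLOC bit first.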

So anyway, those are the reasons for the unusual S_IFMT values.