no-longer-simple mail transport protocol

When SMTP was originally developed in 1981 it was indeed simple: messages were flat-ASCII only, with no support for interesting languages or media or structured messages. But that was OK, because people had been using email without those features for nearly a decade. By the start of the 1990s this was embarrassingly limited, so MIME was developed to support multimedia and multilingual email in a backwards-compatible way. This didn't require any enhancements to SMTP, but as a consequence it was somewhat inefficient. So a mechanism for extending SMTP was developed, which (amongst other things) allows MIME messages to be transmitted more efficiently. Fifteen years later another significant change is being worked on, to move from ASCII-only email addresses to internationalized UTF-8 addresses, while still preserving backwards compatibility where possible. The end result is that even getting the bits from A to B is amazingly complicated, as I shall now explain.

The basic email message format, dating right back to before RFC 561 in September 1973, divides the message data into a header and a body. The header is mostly "protocol" data (i.e. for consumption by computers) whereas the body is "text" (i.e. for consumption by people), according to the terminology of RFC 2277. This distinction is relevant to MIME because the extensions it introduces can change the interpretation of the "text" parts of messages, but they leave the existing protocol parts unchanged. (Of course MIME introduces new "protocol" headers for its own purposes.)

In RFC 2045 MIME defines three classes of data:

7bit data is what may be transmitted by un-extended SMTP without any encoding. The data consists of lines of up to 1000 bytes, including the terminating CRLF (i.e. a pair of bytes with values 13 and 10). Bytes may only have values between 1 and 127, and values 13 and 10 may only occur as part of a CRLF.
8bit data is the same as 7bit data, except that bytes may have values above 127. Many textual formats count as 8bit data, for example text in ISO 8859 charsets or in UTF-8.
Binary data has no restrictions. As well as multimedia data such as images and sounds, some textual formats are binary, for example UTF-16.
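To make the three classes concrete, here is a rough sketch of a classifier. The function name classify is my own invention, and I have taken the line limit to be 998 bytes before the CRLF (1000 including it); a trailing fragment without a CRLF is treated leniently, which a stricter checker might not do.

```python
def classify(data: bytes) -> str:
    """Return '7bit', '8bit' or 'binary' for a chunk of message data."""
    max_line = 998  # bytes allowed before the CRLF, i.e. 1000 including it

    # A zero byte is not allowed in 7bit or 8bit data.
    if b"\x00" in data:
        return "binary"

    for line in data.split(b"\r\n"):
        # After splitting on CRLF, any leftover CR or LF is a bare one.
        if b"\r" in line or b"\n" in line:
            return "binary"
        if len(line) > max_line:
            return "binary"

    # Line structure is fine, so only the high bit decides the class.
    if any(byte > 127 for byte in data):
        return "8bit"
    return "7bit"


print(classify(b"plain ASCII\r\n"))
print(classify("caf\u00e9\r\n".encode("utf-8")))
print(classify("hi\r\n".encode("utf-16")))
```

The UTF-16 case lands in the binary class because every ASCII character is padded with a zero byte.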

Since only 7bit data can be transmitted un-encoded, MIME also defines two so-called content-transfer-encodings which parallel the 8bit and binary data classes.

The quoted-printable encoding is designed for use with 8bit data that has a small proportion of bytes outside the 7bit gamut. Most bytes are un-encoded, and encoded bytes are expanded to a three-byte sequence consisting of an = sign followed by two hex digits. Although it is designed to be reasonably unobtrusive when used with textual data, such that it remains mostly readable even if not decoded, it can handle arbitrary binary data. However, the typical expansion factor for compressed data is 2.25, which is not very efficient.
The base64 encoding is designed for binary data. It encodes three data bytes in four bytes of mostly alphanumeric ASCII, for an expansion factor of 1.33.
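The expansion factors are easy to check with Python's standard quopri and base64 modules. This is only an illustration: the sample sentence is made up, and seeded pseudo-random bytes stand in for compressed data, which is an approximation. The quoted-printable figure for the random blob also includes the soft line breaks the encoder inserts, so it should come out a little above the per-byte estimate.

```python
import base64
import quopri
import random


def expansion(encoded: bytes, original: bytes) -> float:
    """Size of the encoded form relative to the original."""
    return len(encoded) / len(original)


# Mostly-ASCII text in ISO 8859-1: the case quoted-printable is designed for.
text = "Un caf\u00e9 cr\u00e8me, s'il vous pla\u00eet.".encode("iso-8859-1")

# Seeded pseudo-random bytes standing in for compressed data (an assumption).
rng = random.Random(0)
blob = bytes(rng.getrandbits(8) for _ in range(3000))

qp_text = quopri.encodestring(text)
qp_blob = quopri.encodestring(blob)
b64_blob = base64.b64encode(blob)  # no line breaks: 4 output bytes per 3 input

print(qp_text.decode("ascii"))
print(f"quoted-printable, text:   {expansion(qp_text, text):.2f}")
print(f"quoted-printable, random: {expansion(qp_blob, blob):.2f}")
print(f"base64, random:           {expansion(b64_blob, blob):.2f}")
```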

I said above that the message header is only "mostly" protocol data. There are some important parts which are human-readable text, including the Subject: and the parts of the From: / To: / CC: headers which give people's names. RFC 2047 defines a sort of mini-MIME for use in these headers. It's "mini" because information about the character set and encoding is bundled in with the encoded text in a compact form, rather than being placed in separate headers in longhand like the message body's MIME metadata. RFC 2047 has "q" and "b" encodings which are similar to quoted-printable and base64.
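As a sketch of what these encoded words look like, the two helpers below build a single "b" or "q" word by hand. The function names are hypothetical, the set of characters left un-encoded in the "q" form is deliberately conservative, and the code ignores the rule that long text must be split across several encoded words. The standard library's decode_header is used only to check the round trip.

```python
import base64
from email.header import decode_header

# Characters left as-is in the "q" form (a conservative choice).
Q_SAFE = set(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/"
)


def encode_word_b(text: str, charset: str = "utf-8") -> str:
    """Build one encoded word using the base64-like "b" encoding."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?b?{payload}?="


def encode_word_q(text: str, charset: str = "utf-8") -> str:
    """Build one encoded word using the quoted-printable-like "q" encoding."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")  # space is written as an underscore
        elif byte in Q_SAFE:
            out.append(chr(byte))
        else:
            out.append(f"={byte:02X}")
    return f"=?{charset}?q?{''.join(out)}?="


subject = "caf\u00e9 au lait"
for word in (encode_word_b(subject), encode_word_q(subject)):
    (raw, charset), = decode_header(word)
    print(word, "->", raw.decode(charset))
```

Note how the charset and the encoding letter travel inside the word itself, which is the "compact form" mentioned above.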

If that isn't complicated enough, there are bits of header which are "protocol" according to RFC 2277, but nevertheless can't always fit within the limits of ASCII. The principal example of this is attachment filenames. So there is yet another encoding defined in RFC 2231.
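For illustration, an RFC 2231 parameter carries the charset, an optional language tag, and the percent-encoded value after a star-suffixed parameter name. The helper below is a hypothetical sketch that handles only the single-parameter case (RFC 2231 also allows long values to be split into numbered continuations, which I ignore here).

```python
from urllib.parse import quote, unquote


def rfc2231_param(name: str, value: str,
                  charset: str = "utf-8", language: str = "") -> str:
    """Format one extended parameter such as an attachment filename."""
    # Percent-encode everything except letters, digits and "_.-~".
    encoded = quote(value, safe="", encoding=charset)
    return f"{name}*={charset}'{language}'{encoded}"


param = rfc2231_param("filename", "na\u00efve r\u00e9sum\u00e9.txt")
print(f"Content-Disposition: attachment; {param}")

# Reversing it by hand: split off charset and language, then unquote.
charset, language, encoded = param.split("=", 1)[1].split("'", 2)
print(unquote(encoded, encoding=charset))
```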

So that is a summary of the baroqueness that is MIME encodings. One of the things that ESMTP aims to do is (eventually) make all of this encoding unnecessary. There are three existing extensions for this purpose, and one further extension which is currently in the works. These extensions roughly parallel MIME's encoding features.

However, before listing the extensions I should note that with un-extended SMTP, some encoding is necessary to transmit even a plain 7bit ASCII message. Messages are transmitted as a sequence of lines terminated by a line containing only a dot. Lines in the message which start with a dot have an extra dot added so that the message is not terminated prematurely. This is called dot-stuffing.
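Dot-stuffing is simple enough to sketch in a few lines. The function names are mine, and the code assumes the message already uses CRLF line endings.

```python
def dot_stuff(message: bytes) -> bytes:
    """Frame a CRLF-delimited message for transmission after DATA."""
    lines = message.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty tail left by the final CRLF
    stuffed = (b"." + line if line.startswith(b".") else line
               for line in lines)
    body = b"".join(line + b"\r\n" for line in stuffed)
    return body + b".\r\n"  # the lone dot terminates the message


def dot_unstuff(wire: bytes) -> bytes:
    """Undo dot_stuff: stop at the lone dot and strip the extra dots."""
    out = []
    for line in wire.split(b"\r\n"):
        if line == b".":
            break
        out.append(line[1:] if line.startswith(b".") else line)
    return b"".join(line + b"\r\n" for line in out)


message = b"Hi\r\n.hidden\r\n.\r\nbye\r\n"
wire = dot_stuff(message)
print(wire)
print(dot_unstuff(wire) == message)
```

The sample message contains a line consisting of a single dot, which would end the transmission early if it were sent unmodified.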

The 8BITMIME extension, defined in RFC 1652, allows you to transmit 8bit attachments un-encoded. This is almost universally supported now.
The CHUNKING extension, defined in RFC 3030, allows you to use a different form of framing for messages. Instead of dot-stuffing, you can transmit the message in byte-counted chunks. As well as being marginally more efficient than the traditional framing, it makes it possible to use...
The BINARYMIME extension, also defined in RFC 3030, allows you to transmit binary attachments un-encoded.
The UTF8SMTP extension is still a draft. It's part of the effort to internationalize email addresses, which follows from the introduction of internationalized domain names. One of the side-effects of this extension is that it will be possible to use UTF-8 in message headers un-encoded. This should make RFC 2047 and RFC 2231 un-necessary. (One of the pre-requisites for this is a commonly accepted universal character set, which we now have. UTF-8 has only really achieved this status in the last few years.)
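To show how CHUNKING's byte-counted framing differs from dot-stuffing, here is a sketch that splits a message into BDAT commands, each followed by exactly the number of bytes it announces. The function name and the chunk size are arbitrary choices of mine, and a real client would also read the server's reply to each chunk, which is left out.

```python
def bdat_chunks(message: bytes, chunk_size: int = 4096):
    """Yield what a client would send for each BDAT chunk of a message."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    if not message:
        yield b"BDAT 0 LAST\r\n"
        return
    for start in range(0, len(message), chunk_size):
        chunk = message[start:start + chunk_size]
        last = start + chunk_size >= len(message)
        command = b"BDAT %d%s\r\n" % (len(chunk), b" LAST" if last else b"")
        # The payload follows the command directly and is never inspected,
        # so leading dots and bare line breaks need no special treatment.
        yield command + chunk


for piece in bdat_chunks(b".leading dot\r\nraw", chunk_size=10):
    print(piece)
```

Because the receiver counts bytes instead of scanning for a terminator, the payload can contain anything at all.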

These four extensions are not orthogonal, so they combine to produce eight possible forms of SMTP message transmission, one of which is very unlikely to occur in the real world. The more extensions that the server supports, the less you need to encode. The following list enumerates the possible combinations of extensions, and for each combination it states what data must be encoded. "Headers" means that non-ASCII data in the headers must be encoded according to RFC 2047 or RFC 2231 (otherwise it must be UTF-8); "8bit" means that 8bit data must be encoded with quoted-printable or base64; "binary" means that binary data must be encoded with base64; and "dot-stuffing" means dots at the start of lines must be doubled.

No extensions: headers, 8bit, binary, dot-stuffing.
8BITMIME: headers, binary, dot-stuffing.
8BITMIME + UTF8SMTP: binary, dot-stuffing.
CHUNKING: headers, 8bit, binary. (unlikely!)
CHUNKING + 8BITMIME: headers, binary.
CHUNKING + 8BITMIME + UTF8SMTP: binary.
CHUNKING + 8BITMIME + BINARYMIME: headers.
CHUNKING + 8BITMIME + BINARYMIME + UTF8SMTP: no encoding.
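The list can be reproduced mechanically. In this sketch the function name is hypothetical, and the dependency rules (UTF8SMTP needs 8BITMIME; BINARYMIME needs both CHUNKING and 8BITMIME) are read off the list above, not quoted from the specifications.

```python
from itertools import combinations

EXTENSIONS = ("CHUNKING", "8BITMIME", "BINARYMIME", "UTF8SMTP")


def required_encodings(extensions) -> list:
    """Return what must still be encoded for a given set of extensions."""
    ext = set(extensions)
    # Dependencies inferred from the list of combinations (an assumption).
    if "UTF8SMTP" in ext and "8BITMIME" not in ext:
        raise ValueError("UTF8SMTP without 8BITMIME")
    if "BINARYMIME" in ext and not {"CHUNKING", "8BITMIME"} <= ext:
        raise ValueError("BINARYMIME without CHUNKING and 8BITMIME")

    needed = []
    if "UTF8SMTP" not in ext:
        needed.append("headers")
    if "8BITMIME" not in ext:
        needed.append("8bit")
    if "BINARYMIME" not in ext:
        needed.append("binary")
    if "CHUNKING" not in ext:
        needed.append("dot-stuffing")
    return needed


# Walk all sixteen subsets and keep the ones that satisfy the dependencies.
valid = []
for size in range(len(EXTENSIONS) + 1):
    for combo in combinations(EXTENSIONS, size):
        try:
            valid.append((combo, required_encodings(combo)))
        except ValueError:
            pass

for combo, needed in valid:
    print(" + ".join(combo) or "No extensions", "->",
          ", ".join(needed) or "no encoding")
```

Each extension removes exactly one item from the list of things to encode, which is why the table looks so regular once the invalid combinations are excluded.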

The specifications for these extensions say that when an MTA that supports them is transmitting a message, it must translate un-encoded data into its encoded form if the receiving MTA doesn't support the necessary extensions. In the absence of these extensions, an MTA can pretty much ignore the message data. (The separation of the layers isn't entirely perfect because of things like Received: trace headers and the way delivery failures are reported.) An MTA that does support these extensions has to have a full MIME implementation. This is the kind of layering violation that makes a mockery of the ISO model :-)

Note that while the specifications require support for downgrading an un-encoded message into encoded form, upgrading is optional. Whether the extra effort is worthwhile or not probably depends on the relative cost of CPU and network.
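A downgrade decision for a single body part might look roughly like this. Everything here is a sketch under my own assumptions: the function name is hypothetical, the part's current Content-Transfer-Encoding is passed in as a string, and the choice between quoted-printable and base64 uses a simple break-even estimate (quoted-printable adds about two bytes per high byte, base64 adds about a third overall). A real MTA would also have to rewrite the part's Content-Transfer-Encoding header and walk nested multiparts.

```python
import base64
import quopri


def _base64_lines(body: bytes) -> bytes:
    """Base64 with 76-character lines and CRLF line endings."""
    return base64.encodebytes(body).replace(b"\n", b"\r\n")


def downgrade_part(body: bytes, cte: str, server_extensions: set) -> tuple:
    """Return (new_cte, new_body) acceptable to the receiving server."""
    if cte == "binary" and "BINARYMIME" not in server_extensions:
        return "base64", _base64_lines(body)
    if cte == "8bit" and "8BITMIME" not in server_extensions:
        high = sum(1 for byte in body if byte > 127)
        # Rough break-even: n + 2*high (QP) versus 4*n/3 (base64).
        if high * 6 < len(body):
            return "quoted-printable", quopri.encodestring(body)
        return "base64", _base64_lines(body)
    return cte, body  # nothing to do: the server can take it as-is


latin = "Un caf\u00e9, s'il vous pla\u00eet.\r\n".encode("utf-8")
print(downgrade_part(latin, "8bit", set()))
print(downgrade_part(latin, "8bit", {"8BITMIME"}))
print(downgrade_part(bytes(range(256)), "binary", {"8BITMIME"})[0])
```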

Obviously this complexity has implications for the design of MTAs, but I will write about that another time.