When SMTP was originally developed in 1981 it was indeed simple: messages were flat-ASCII only, with no support for interesting languages or media or structured messages. But that was OK, because people had been using email without those features for nearly a decade. By the start of the 1990s this was embarrassingly limited, so MIME was developed to support multimedia and multilingual email in a backwards-compatible way. This didn't require any enhancements to SMTP, but as a consequence it was somewhat inefficient. So a mechanism for extending SMTP was developed, which (amongst other things) allows MIME messages to be transmitted more efficiently. Fifteen years later another significant change is being worked on, to move from ASCII-only email addresses to internationalized UTF-8 addresses, while still preserving backwards compatibility where possible. The end result is that even getting the bits from A to B is amazingly complicated, as I shall now explain.
The basic email message format, dating right back to before RFC 561 in September 1973, divides the message data into a header and a body. The header is mostly "protocol" data (i.e. for consumption by computers) whereas the body is "text" (i.e. for consumption by people), according to the terminology of RFC 2277. This distinction is relevant to MIME because the extensions it introduces can change the interpretation of the "text" parts of messages, but they leave the existing protocol parts unchanged. (Of course MIME introduces new "protocol" headers for its own purposes.)
In RFC 2045 MIME defines three classes of data: 7bit, 8bit, and binary.
Since only 7bit data can be transmitted un-encoded, MIME also defines two so-called content-transfer-encodings, quoted-printable and base64, which parallel the 8bit and binary data classes.
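To make the two content-transfer-encodings concrete, here is a minimal Python sketch using the standard `quopri` and `base64` modules. The sample strings and the `is_7bit_safe` helper are my own illustrations, not anything defined by MIME; the helper only checks octet values and ignores the line-length limits that the real 7bit class also imposes.

```python
import base64
import quopri

# 8bit data: mostly readable text with a few non-ASCII octets.
text_8bit = "Grüße aus Köln\n".encode("utf-8")
# Quoted-printable leaves ASCII alone and escapes other octets as =XX.
qp = quopri.encodestring(text_8bit)

# Binary data: arbitrary octets, including NUL.
binary = bytes(range(256))
# Base64 maps every 3 octets to 4 ASCII characters, wrapped into short lines.
b64 = base64.encodebytes(binary)


def is_7bit_safe(data: bytes) -> bool:
    """Illustrative check: no octet above 127 and no NUL."""
    return all(b < 128 for b in data) and b"\0" not in data


print(qp)
print(is_7bit_safe(text_8bit), is_7bit_safe(qp), is_7bit_safe(b64))
```

Quoted-printable keeps mostly-ASCII text legible on the wire, whereas base64 costs a fixed one-third overhead regardless of content, which is why it is the natural fit for binary data.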
I said above that the message header is only "mostly" protocol data. There are some important parts which are human-readable text, including the Subject: and the parts of the From: / To: / CC: headers which give people's names. RFC 2047 defines a sort of mini-MIME for use in these headers. It's "mini" because information about the character set and encoding is bundled in with the encoded text in a compact form, rather than being placed in separate headers in longhand like the message body's MIME metadata. RFC 2047 has "q" and "b" encodings which are similar to quoted-printable and base64.
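As a rough illustration of what RFC 2047 encoded words look like, the following Python sketch uses the standard `email.header` and `email.charset` modules to produce both the "q" and "b" forms of the same text. The `encoded_word` helper and the sample name are made up for this example.

```python
from email.charset import BASE64, QP, Charset
from email.header import Header, decode_header, make_header


def encoded_word(text: str, header_encoding: int) -> str:
    """Encode text as an RFC 2047 encoded word, forcing "q" or "b"."""
    charset = Charset("utf-8")
    charset.header_encoding = header_encoding
    return Header(text, charset=charset).encode()


name = "Zoë Müller"
q_form = encoded_word(name, QP)      # =?charset?q?...?=
b_form = encoded_word(name, BASE64)  # =?charset?b?...?=

print(q_form)
print(b_form)

# Decoding recovers the original text from either form.
print(str(make_header(decode_header(q_form))))
print(str(make_header(decode_header(b_form))))
```

Note how the character set and the encoding letter travel inside the encoded word itself, which is the compact bundling described above.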
If that isn't complicated enough, there are bits of header which are "protocol" according to RFC 2277, but nevertheless can't always fit within the limits of ASCII. The principal example of this is attachment filenames. So there is yet another encoding defined in RFC 2231.
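For comparison, here is a small Python sketch of the RFC 2231 form for a non-ASCII attachment filename, using `email.utils.encode_rfc2231` and the standard message parser. The filename is an invented example.

```python
from email import message_from_string
from email.utils import encode_rfc2231

# RFC 2231 parameter value: charset'language'percent-encoded-octets
encoded = encode_rfc2231("résumé.pdf", charset="utf-8")
print(encoded)

# The trailing "*" on the parameter name marks the value as RFC 2231 encoded.
raw = f"Content-Disposition: attachment; filename*={encoded}\n\n"
msg = message_from_string(raw)

# The parser decodes the parameter back to the original filename.
filename = msg.get_filename()
print(filename)
```

Unlike an RFC 2047 encoded word, this encoding lives inside a structured header parameter, so a parser that only understands the ASCII "protocol" syntax can still split the header into its parts.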
So that is a summary of the baroqueness that is MIME encodings. One of the things that ESMTP aims to do is (eventually) make all of this encoding unnecessary. There are three existing extensions for this purpose, and one further extension which is currently in the works. These extensions roughly parallel MIME's encoding features.
However, before listing the extensions I should note that with un-extended SMTP, some encoding is necessary to transmit even a plain 7bit ASCII message. Messages are transmitted as a sequence of lines terminated by a line containing only a dot. Lines in the message which start with a dot have an extra dot added so that the message is not terminated prematurely. This is called dot-stuffing.
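Dot-stuffing is simple enough to sketch in a few lines of Python. The function names here are my own, and the sketch assumes the message has already been split into lines without their CRLF terminators.

```python
CRLF = b"\r\n"


def dot_stuff(lines):
    """Sender side: add an extra dot to any line that starts with a dot."""
    return [b"." + line if line.startswith(b".") else line for line in lines]


def dot_unstuff(lines):
    """Receiver side: remove the extra leading dot again."""
    return [line[1:] if line.startswith(b".") else line for line in lines]


def frame_for_data(lines):
    """Stuff the lines, join them with CRLF, and append the lone-dot terminator."""
    return b"".join(line + CRLF for line in dot_stuff(lines)) + b"." + CRLF


message = [b"Hello,", b".", b"..two dots", b"bye"]
wire = frame_for_data(message)
print(wire)
```

After stuffing, the only line on the wire consisting of a single dot is the terminator, so the receiver can find the end of the message unambiguously and then undo the stuffing.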
These four extensions are not orthogonal, so they combine to produce eight possible forms of SMTP message transmission, one of which is very unlikely to occur in the real world. The more extensions that the server supports, the less you need to encode. The following list enumerates the possible combinations of extensions, and for each combination it states what data must be encoded. "Headers" means that non-ASCII data in the headers must be encoded according to RFC 2047 or RFC 2231 (otherwise it must be UTF-8); "8bit" means that 8bit data must be encoded with quoted-printable or base64; "binary" means that binary data must be encoded with base64; and "dot-stuffing" means dots at the start of lines must be doubled.
The specifications for these extensions say that when an MTA that supports them is transmitting a message, it must translate un-encoded data into its encoded form if the receiving MTA doesn't support the necessary extensions. In the absence of these extensions, an MTA can pretty much ignore the message data. (The separation of the layers isn't entirely perfect because of things like Received: trace headers and the way delivery failures are reported.) An MTA that does support these extensions has to have a full MIME implementation. This is the kind of layering violation that makes a mockery of the ISO model :-)
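To give a feel for what downgrading involves, here is a deliberately minimal Python sketch built on the standard `email` package. It handles only the simplest case, a single-part text message marked as 8bit, and re-encodes the body as quoted-printable. The `downgrade_to_7bit` function is hypothetical; a real MTA would also have to walk multipart structures, cope with non-text parts, and deal with headers.

```python
from email import message_from_bytes, policy


def downgrade_to_7bit(raw: bytes) -> bytes:
    """Re-encode a single-part 8bit text message as quoted-printable.

    Anything else (multipart, non-text, already 7bit) is returned unchanged;
    this is an illustrative assumption, not what a full implementation does.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    cte = msg.get("Content-Transfer-Encoding", "7bit").strip().lower()
    if msg.get_content_maintype() != "text" or cte != "8bit":
        return raw
    msg.set_content(
        msg.get_content(),                      # decoded body text
        subtype=msg.get_content_subtype(),
        charset=msg.get_content_charset("utf-8"),
        cte="quoted-printable",
    )
    return msg.as_bytes()


original = (
    b"From: a@example.org\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)
downgraded = downgrade_to_7bit(original)
print(downgraded.decode("ascii"))
```

Even this toy version has to parse the MIME headers, decode the body according to its declared character set, and rewrite the metadata, which is exactly the work an MTA could skip when it treated the message data as opaque.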
Note that while the specifications require support for downgrading an un-encoded message into encoded form, upgrading is optional. Whether the extra effort is worthwhile or not probably depends on the relative cost of CPU and network.
Obviously this complexity has implications for the design of MTAs, but I will write about that another time.