Email message identification

SMTP email has at least six kinds of identifier, which label messages, parts of messages, or individual transmissions of a message.

Message-ID [RFC2822]

The message’s primary identifier appears in its Message-ID header field. It’s a key part of the way Usenet works, where Message-IDs must be globally unique and are used by servers to decide whether or not they have a copy of a message already. They are much less significant in email, but some servers do per-mailbox de-duplication when delivering a message.
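Per-mailbox de-duplication can be sketched roughly as follows. This is only an illustration of the idea, not how any particular server does it: the Mailbox class and new_message helper are hypothetical names, and a real implementation would persist the set of seen IDs rather than keep it in memory.

```python
from email.message import EmailMessage
from email.utils import make_msgid


class Mailbox:
    """Hypothetical mailbox that drops messages whose Message-ID it already holds."""

    def __init__(self):
        self.messages = []
        self._seen_ids = set()

    def deliver(self, msg):
        """Store msg unless its Message-ID was seen before; return True if stored."""
        msg_id = msg.get("Message-ID")
        if msg_id is not None:
            msg_id = msg_id.strip()
            if msg_id in self._seen_ids:
                return False  # duplicate delivery, silently dropped
            self._seen_ids.add(msg_id)
        # Messages without a Message-ID are always stored.
        self.messages.append(msg)
        return True


def new_message(subject):
    """Build a message with a fresh Message-ID (example.org is a placeholder domain)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["Message-ID"] = make_msgid(domain="example.org")
    return msg


box = Mailbox()
hello = new_message("hello")
box.deliver(hello)   # stored
box.deliver(hello)   # same Message-ID, so dropped
```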

Content-ID [RFC2045,RFC2046]

If a message has several MIME parts then they can cross-reference each other using their Content-IDs. This is most commonly used for embedding images in HTML email. Content-IDs are supposed to be globally unique identifiers for the content, so that you can tell (for example) that various message/external-body parts refer to the same thing.
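For the embedded-image case, here is a small sketch using Python's standard email package. The domain, the HTML, and the placeholder image bytes are all made up for illustration; the point is only that the Content-ID on the image part and the cid: URL in the HTML carry the same identifier.

```python
from email.message import EmailMessage
from email.utils import make_msgid

msg = EmailMessage()
msg["Subject"] = "Inline image"
msg.set_content("Plain-text fallback.")

# make_msgid() returns "<id@domain>"; the cid: URL uses the ID without the brackets.
image_cid = make_msgid(domain="example.org")
msg.add_alternative(
    '<html><body><img src="cid:%s"></body></html>' % image_cid[1:-1],
    subtype="html",
)

# Placeholder bytes standing in for real image data.
fake_png = b"\x89PNG\r\n\x1a\n"

# Attach the image alongside the HTML part; this turns that part into
# multipart/related and sets the image's Content-ID header.
html_part = msg.get_payload()[1]
html_part.add_related(fake_png, maintype="image", subtype="png", cid=image_cid)
```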

Resent-Message-ID [RFC2822]

When you re-send a message it gets an extra message ID alongside the extra from/to/date fields. This could be used to distinguish different re-sendings of the same message.
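As a rough illustration, re-sending with Python's email package might look like the sketch below. The resend function is a hypothetical helper, and it takes a shortcut: it appends each block of Resent- fields, whereas RFC 2822 asks for them to be prepended to the message.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def resend(msg, resent_from, resent_to):
    """Add one block of Resent- fields; the original header fields are left alone."""
    # Simplification: these are appended, not prepended as RFC 2822 recommends.
    msg["Resent-Date"] = formatdate(localtime=True)
    msg["Resent-From"] = resent_from
    msg["Resent-To"] = resent_to
    msg["Resent-Message-ID"] = make_msgid(domain="example.org")
    return msg


original = EmailMessage()
original["From"] = "alice@example.org"
original["Message-ID"] = make_msgid(domain="example.org")
original.set_content("Hello.")

# Two separate re-sendings leave two distinct Resent-Message-IDs on the message.
resend(original, "bob@example.net", "carol@example.com")
resend(original, "carol@example.com", "dave@example.org")
resent_ids = original.get_all("Resent-Message-ID")
```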

Some MUAs, e.g. Pine, call the re-send operation “bounce”, but this causes confusion with delivery failures. Re-sending is more similar to forwarding a message, but without wrapping the original message in a new multi-part message with a covering note. Mailing lists effectively re-send messages but they don’t use Resent- headers to explain what they did, which is a shame.

spool ID [RFC2821]

Received: headers include an ID field which is used to record the spool ID (sometimes called the queue ID). This is essentially the filename of the message within the MTA’s spool, so the message will get a new spool ID each time it is transmitted. When a message is accepted, the receiver-SMTP usually mentions its spool ID in its final 250 response, so that the sender-SMTP can log it. This makes it easy to correlate the logs for a particular message across multiple MTAs. (MICROS~1 Exchange instead mentions the Message-ID, which is no use at all since the sender already knows it.)
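To show what correlating logs by spool ID involves, here is a minimal sketch that pulls the ID out of a Received: field and out of a final 250 response. The sample strings are invented, the two reply patterns cover only the "id=" and "queued as" wordings, and the parsing is deliberately naive (it ignores nested comments, for instance).

```python
import re


def spool_id_from_received(value):
    """Return the token after the 'id' keyword in a Received: field value, or None."""
    clauses = value.rsplit(";", 1)[0]               # drop the trailing date-time
    clauses = re.sub(r"\([^()]*\)", " ", clauses)   # drop simple, non-nested comments
    match = re.search(r"(?:^|\s)id\s+(\S+)", clauses, flags=re.IGNORECASE)
    return match.group(1) if match else None


def spool_id_from_reply(reply):
    """Return the spool ID mentioned in a final 250 response, or None."""
    match = re.search(r"\bid=(\S+)|queued as (\S+)", reply)
    if not match:
        return None
    return match.group(1) or match.group(2)


# Invented examples, for illustration only.
received = (
    "from mail.example.net ([192.0.2.1]) by mx.example.org with esmtp "
    "(Exim 4.96) id 1rABCd-0001xY-2k for user@example.org; "
    "Mon, 01 Jan 2024 12:00:00 +0000"
)
from_header = spool_id_from_received(received)
from_reply = spool_id_from_reply("250 OK id=1rABCd-0001xY-2k")
```

If the sender-SMTP logs the ID from the reply next to its own spool ID, a postmaster can hop from one MTA's log to the next by searching for that string.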

TRANSID [RFC1845]

The transaction ID is part of the SMTP extension for checkpoint/restart, which allows a client to recover gracefully from a lost connection. The sender-SMTP generates a TRANSID for each transmission of a message, so in the normal case they correspond one-to-one with the receiver-SMTP’s spool IDs. (There may be more than one spool ID per TRANSID if the sender-SMTP tries to restart a failed transaction for which the receiver-SMTP has discarded the previous spool file.) Not much software implements this extension.

ENVID [RFC3461,RFC3885]

This is something like a cross between the Resent-Message-ID and the TRANSID. It is generated in a similar manner to a TRANSID and also appears in the message envelope, but once a message has an ENVID it is preserved until the message gets re-sent. The idea is that the ENVID is included in a delivery status notification so that mailing list managers (etc.) can correlate the DSN with the message that caused it. A Resent-Message-ID would serve almost as well.
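In the envelope, the ENVID travels as a parameter on the MAIL command, encoded as "xtext" (RFC 3461): printable ASCII passes through unchanged, except that "+" and "=" and anything outside the printable range become "+" followed by two uppercase hex digits. A small sketch follows; the function names are my own, and the envelope ID values are made up.

```python
def xtext_encode(value):
    """Encode an ASCII string as RFC 3461 xtext."""
    out = []
    for byte in value.encode("ascii"):
        # "!" (33) to "~" (126) are literal, apart from "+" (0x2B) and "=" (0x3D).
        if 33 <= byte <= 126 and byte not in (0x2B, 0x3D):
            out.append(chr(byte))
        else:
            out.append("+%02X" % byte)
    return "".join(out)


def mail_command(sender, envid):
    """Build a MAIL command carrying an ENVID parameter (hypothetical helper)."""
    return "MAIL FROM:<%s> ENVID=%s" % (sender, xtext_encode(envid))


command = mail_command("list-owner@example.org", "list+42=x y")
```

A list manager that puts a per-subscriber token in the ENVID can read it back out of the DSN to work out which delivery failed.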

A couple of weeks ago, Ian Christian asked on the Exim-users list “Wouldn’t a connection-ID be a useful thing to have?” and I had already come to the conclusion that, yes, it would.

At the moment, Exim generates its spool ID quite late, after it has received the message header. This means that rejected RCPT commands can’t always be correlated with messages, though this turns out not to be a problem in practice. The other thing you can’t do is tell which messages were sent down the same connection, unless you tell Exim to log more connection details, and even then it isn’t immediately obvious.

My design would be to create a connection ID for every incoming connection, which can be used for logging everything related to that connection. When a MAIL command is received, a spool ID is created which has the connection ID as a substring, and this is used for logging all commands related to that transaction. When the message is forwarded onwards, a TRANSID is created which has the spool ID as a substring. Thus the postmaster can grep broadly or selectively by using shorter or longer identifiers.
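Here is a toy version of that nesting, just to show how the grep would work. The ID formats, the IdFactory class, and the log lines are all invented for this sketch; they are not what Exim does.

```python
import itertools


class IdFactory:
    """Hypothetical allocator for nested connection / spool / transmission IDs."""

    def __init__(self, host_tag):
        self._host_tag = host_tag
        self._connections = itertools.count(1)

    def connection_id(self):
        # Fixed-width counters keep one ID from being a prefix of a sibling ID.
        return "%s-c%06d" % (self._host_tag, next(self._connections))

    @staticmethod
    def spool_id(conn_id, message_number):
        # The spool ID extends the connection ID it arrived on.
        return "%s-m%03d" % (conn_id, message_number)

    @staticmethod
    def transid(spool, attempt_number):
        # The TRANSID extends the spool ID of the message being forwarded.
        return "%s-t%02d" % (spool, attempt_number)


def grep(log_lines, identifier):
    """Return the log lines that mention the identifier."""
    return [line for line in log_lines if identifier in line]


ids = IdFactory("mx1")
conn = ids.connection_id()
first = ids.spool_id(conn, 1)
second = ids.spool_id(conn, 2)
onward = ids.transid(first, 1)

log = [
    conn + " connection from [192.0.2.1]",
    first + " MAIL FROM:<a@example.org>",
    onward + " delivered to b@example.net",
    second + " MAIL FROM:<c@example.org>",
    ids.connection_id() + " connection from [198.51.100.7]",
]

whole_connection = grep(log, conn)   # everything that happened on the connection
one_message = grep(log, first)       # one message, including its onward delivery
one_delivery = grep(log, onward)     # a single outgoing transmission
```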

It isn’t possible to correlate everything easily, though, because if you send multiple messages down the same outgoing connection, they will have unrelated TRANSIDs, so the MTA will still have to log details of outgoing connections separately.