More log-structured MTA queues.

At the end of http://fanf.livejournal.com/65911.html I mentioned that it might be beneficial to have multiple queues, in order to reduce the density of garbage in the older parts of the queue. There are at least a couple of other reasons why one might want multiple queues.

Even faster

In the absence of any other bottlenecks, the MTA is going to be limited by the rate at which it can fsync the main queue. You can raise this limit by keeping two parallel queues on different disks and spreading incoming messages between them. I got this idea from the Sendmail X design document, which describes a similar (but not quite so neat) queue structure to the one I have been describing.
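Here is a rough Python sketch of the idea, not taken from Sendmail X. It assumes each envelope is a single appended record and that incoming messages are committed in batches (group commit). The class and method names are hypothetical. Envelopes are spread round-robin over the queues and each queue's fsync runs in its own thread, so the disks work in parallel.

```python
import os
from concurrent.futures import ThreadPoolExecutor


class ParallelQueues:
    """Hypothetical sketch: append-only envelope queues, ideally one per disk."""

    def __init__(self, paths):
        # Each queue is a single append-only file opened in binary mode.
        self.files = [open(p, "ab") for p in paths]
        # One worker per queue so that the fsyncs can overlap.
        self.pool = ThreadPoolExecutor(max_workers=len(self.files))
        self.counter = 0

    def _choose(self):
        # Round-robin: the simplest way to spread the load.
        i = self.counter % len(self.files)
        self.counter += 1
        return i

    def commit(self, envelopes):
        """Distribute a batch of envelope records over the queues, then
        write and fsync every queue concurrently. Returns per-queue counts."""
        batches = [[] for _ in self.files]
        for env in envelopes:
            batches[self._choose()].append(env)

        def write_and_sync(i):
            if not batches[i]:
                return 0  # nothing to write, so skip the fsync
            f = self.files[i]
            for env in batches[i]:
                f.write(env + b"\n")
            f.flush()
            os.fsync(f.fileno())  # CPython releases the GIL during fsync
            return len(batches[i])

        return list(self.pool.map(write_and_sync, range(len(self.files))))

    def close(self):
        self.pool.shutdown()
        for f in self.files:
            f.close()
```

A smarter chooser might send each batch to whichever disk finished its last fsync first, but round-robin is enough to show the throughput argument.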

TURN

SMTP's TURN feature allows a client to ask the server to deliver any email queued on the server for a particular domain. This might be used by business dial-up customers who call in to their ISP every so often to collect email.

The current specifications have two variants of TURN, ETRN and ATRN, because the original form was insecure: the server had no way to check that the client was entitled to the mail it asked for. With ETRN the server delivers the queued messages as if from a normal queue run; the only security concern is that the server must have some throttling to prevent clients from starting unlimited numbers of queue runners. With ATRN the existing connection is used to deliver the messages from the server to the client, and the two ends switch roles after the ATRN command. The client must be authenticated so that the server knows the client is permitted to receive the email it is asking for. ATRN is used on the "on-demand mail relay" port, 366, instead of the usual SMTP port.
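The ETRN throttling requirement could be met with something as simple as the following Python sketch. The policy (a global cap on concurrent queue runners plus at most one run per domain) and the names are assumptions for illustration, not what any particular MTA does.

```python
import threading


class EtrnThrottle:
    """Hypothetical limiter: at most max_runners concurrent ETRN-triggered
    queue runs overall, and at most one run per domain at a time."""

    def __init__(self, max_runners: int):
        self.max_runners = max_runners
        self.active = set()  # domains with a queue run in progress
        self.lock = threading.Lock()

    def try_start(self, domain: str) -> bool:
        domain = domain.lower()  # domain names are case-insensitive
        with self.lock:
            if domain in self.active:
                return False  # a run for this domain is already going
            if len(self.active) >= self.max_runners:
                return False  # global cap reached
            self.active.add(domain)
            return True

    def finish(self, domain: str) -> None:
        # Called when the queue runner for this domain exits.
        with self.lock:
            self.active.discard(domain.lower())
```

A client that repeats ETRN while a run is in progress gets refused rather than spawning another runner.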

The basic implementation of ETRN (common to sendmail, Exim, and older Postfix) is for the server to fire off sendmail -qR, which scans the entire queue for messages with recipient addresses containing the domain given by the client. This is horribly inefficient if you have lots of messages on the queue, and the more clients you have using ETRN, the more your queues get clogged with undeliverable messages, which makes every scan slower still.
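To make the cost concrete, here is a rough Python model of what a -qR style run has to do. It is a simplification (a plain substring match over an in-memory list, with a hypothetical Envelope type), not sendmail's code. The point is that the number of envelopes examined equals the length of the whole queue, however few messages are actually for the client's domain.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    msgid: str
    recipients: list[str]


def qr_scan(queue, substring):
    """Rough model of `sendmail -qR<substring>`: look at every message on
    the queue and select those with a recipient containing the substring.
    Returns (selected message ids, number of envelopes examined)."""
    selected = []
    examined = 0
    for env in queue:
        examined += 1  # every queued envelope costs a read
        if any(substring in rcpt for rcpt in env.recipients):
            selected.append(env.msgid)
    return selected, examined
```

With a queue of three messages of which two are for y.example, the scan still examines all three; with a hundred thousand messages it examines a hundred thousand.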

The solution to this problem is to get messages for your ETRN domains off the queue; with Exim this is typically done by delivering them to a batch-SMTP file per domain, which can then be re-injected for delivery fairly efficiently when the client says ETRN. This kind of setup is a must for ATRN: whereas ETRN uses normal SMTP routing and delivery (which works if the clients have static IP addresses), ATRN does not, so there is generally no way to deliver the messages except by ATRN. ETRN is an optimisation that lets a client tell the server not to wait for the next retry timeout, whereas ATRN is purely on-demand, so it is actually wrong to leave the messages on the retry queue.
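A batch-SMTP file is just the sequence of SMTP commands a client would send. Exim can read one back with its -bS option. The Python sketch below shows the format for one message; it assumes LF line endings in the file and a "<domain>.bsmtp" naming scheme, both of which are illustrative choices rather than Exim's configuration.

```python
import os


def bsmtp_record(sender, recipients, message):
    """Format one message as batch SMTP: the commands an SMTP client would
    send, with the message dot-stuffed and ended by a line holding one dot.
    Assumes the message uses LF line endings."""
    lines = [f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{r}>" for r in recipients]
    lines.append("DATA")
    body = message[:-1] if message.endswith("\n") else message
    for line in body.split("\n"):
        # Dot-stuffing: a leading "." is doubled so it cannot end the data.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")
    return "\n".join(lines) + "\n"


def append_bsmtp(spool_dir, domain, record):
    """Append a record to the per-domain file (hypothetical naming scheme)."""
    path = os.path.join(spool_dir, domain.lower() + ".bsmtp")
    with open(path, "a", encoding="utf-8") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())  # the message is now safely off the main queue
    return path
```

When the client says ETRN, the whole file for its domain can be fed back in one go, without touching any other domain's mail.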

With my log-structured queues we can use the same idea more efficiently. When a message is addressed to an ATRN domain, or cannot be delivered to an ETRN domain, its envelope is written to that domain's queue instead of being appended to the main queue. One thing that makes this slightly more interesting is that the envelope may have to be split if it has recipients at multiple domains. This means the spool files need some kind of reference counting, so that a message's data is removed only when every envelope that refers to it has been dealt with. The per-domain queue files are not routinely scanned, and when a client requests a delivery the server can simply and efficiently work through that domain's queue.
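The splitting and reference counting might look something like this Python sketch. It assumes in-memory lists stand in for the queue files and a list of deleted IDs stands in for unlinking spool files; all names are hypothetical. Each envelope produced by the split holds one reference to the shared spool file.

```python
from collections import defaultdict


class Spool:
    """Hypothetical reference-counted spool: one data file per message,
    shared by every envelope split from it."""

    def __init__(self):
        self.refcount = {}
        self.deleted = []  # stands in for unlinking spool files

    def add(self, msgid, refs):
        self.refcount[msgid] = refs

    def release(self, msgid):
        # Called when one envelope has been completely dealt with.
        self.refcount[msgid] -= 1
        if self.refcount[msgid] == 0:
            del self.refcount[msgid]
            self.deleted.append(msgid)


def domain_of(address):
    return address.rsplit("@", 1)[1].lower()


def split_envelope(msgid, recipients, domain_queues, main_queue, spool):
    """Put recipients at held (ATRN, or deferred ETRN) domains on their own
    domain's queue; keep the rest together for the main queue.
    Assumes at least one recipient."""
    held = defaultdict(list)
    others = []
    for rcpt in recipients:
        domain = domain_of(rcpt)
        if domain in domain_queues:
            held[domain].append(rcpt)
        else:
            others.append(rcpt)
    # One reference per envelope that will point at the spool file.
    spool.add(msgid, len(held) + (1 if others else 0))
    for domain, rcpts in held.items():
        domain_queues[domain].append((msgid, rcpts))
    if others:
        main_queue.append((msgid, others))
```

A message for one ATRN recipient and one ordinary recipient becomes two envelopes sharing one spool file, which is deleted only after both have been delivered.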

Postfix leaves ETRN messages on its "deferred" queue, but optimises ETRN by keeping per-domain indexes of messages. This has the advantage of avoiding the need to split messages per domain, and means that ETRN domains are still retried even if the client doesn't ask. However, it means that the deferred queue can get large and normal queue runs can get expensive. With per-domain queues we should also periodically retry ETRN domains, but this won't happen unless they also have messages on the main queue. To deal with that we should periodically probe these domains, which does not need any disk activity if the domain remains unreachable.
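The probing could be driven by a loop like this Python sketch. It assumes the server keeps an in-memory record of which ETRN domains have non-empty queues and when each was last probed, and that probe is a hypothetical callable (for example, one that tries to connect to the domain's mail server). The per-domain queue file is only read when a probe succeeds.

```python
def probe_etrn_domains(queues, last_probe, now, interval, probe):
    """Return the domains whose per-domain queues should be flushed now.

    queues: domain -> list of queued envelopes (hypothetical in-memory index)
    last_probe: domain -> time of the last probe, updated in place
    probe: hypothetical callable(domain) -> bool, True if reachable
    """
    to_flush = []
    for domain, envelopes in queues.items():
        if not envelopes:
            continue  # nothing waiting, so no reason to probe
        if now - last_probe.get(domain, float("-inf")) < interval:
            continue  # probed recently; wait for the next interval
        last_probe[domain] = now  # bookkeeping stays in memory
        if probe(domain):
            to_flush.append(domain)  # only now do we touch the disk
    return to_flush
```

An unreachable domain costs one failed connection attempt per interval and no disk I/O.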