.@ Tony Finch – blog


Shortly before my wedding there was a discussion on the Exim-users mailing list about Exim’s handling of its hints databases, which cache information about the retry status of remote hosts and messages in the queue, callout results, ratelimit state, etc. At the moment Exim just uses a standard Unix DB library for this, e.g. GDBM, with whole-file locking to protect against concurrent access from multiple Exim processes.

There are two disadvantages with this. Firstly, the performance isn’t great because Exim processes tend to serialize on the locks, and the DB lock/open/fsync/close/unlock cycle takes a while, limiting the throughput of the whole system. Secondly, if you have a cluster of mail servers the information isn’t shared between them, so each machine has to populate its own database, which implies poor information (e.g. an incomplete view of clients’ sending rates) and duplicated effort (e.g. repeated callouts, wasted retries).
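To make the cost of that cycle concrete, here is a rough sketch in Python of what one access looks like. It is not Exim's code: the `locked_hints_db` helper, the `.lockfile` naming and the use of Python's `dbm.dumb` module are all stand-ins I have picked for illustration. The point is only that every reader and writer goes through the same exclusive lock.

```python
import contextlib
import dbm.dumb
import fcntl
import os


@contextlib.contextmanager
def locked_hints_db(db_dir, name):
    """Hypothetical helper: one lock/open/.../fsync/close/unlock cycle."""
    lock_path = os.path.join(db_dir, name + ".lockfile")
    db_path = os.path.join(db_dir, name)
    with open(lock_path, "a") as lock_file:
        # Whole-file exclusive lock: every other process blocks here.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            db = dbm.dumb.open(db_path, "c")
            try:
                yield db
            finally:
                db.close()
                # Force the data file to disk; this is the step a RAM disk avoids.
                fd = os.open(db_path + ".dat", os.O_RDWR)
                try:
                    os.fsync(fd)
                finally:
                    os.close(fd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def update_hint(db_dir, key, value):
    with locked_hints_db(db_dir, "retry") as db:
        db[key] = value


def read_hint(db_dir, key):
    with locked_hints_db(db_dir, "retry") as db:
        return db.get(key)
```

Even a lookup that changes nothing pays for the lock and the open/close, which is why concurrent processes end up queueing behind one another.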

The first problem is a bit silly, because the databases are just caches and can be safely deleted, so they don’t need to be on persistent storage. In fact some admins mount a RAM disk on Exim’s hints DB directory, which avoids the fsync cost and thereby ups the maximum throughput. If you go a step further, you can take the view that Exim is to some extent using the DB as an IPC mechanism.

The traditional solution to the second problem is to slap a standard SQL DB on the back-end, but if you do this the SQL DB becomes a single point of failure. This is bad given that a system like ppswitch, which is a cluster of identical machines that relay email, does not currently have a SPOF. It also compounds the excessive persistence silliness.

It occurs to me that what I want is something like Splash, a distributed masterless database that uses the Spread toolkit to reliably multicast messages around the cluster. Wonderful! The hard work has already been done for me, so all I need to do is overhaul Exim’s hints DB layer for the first time in 10 years - oh, and get a load of other stuff off the top of my to-do list first.

If it’s done properly it should greatly improve the ratelimit feature on clustered systems, and make it much easier to write a high-quality greylisting implementation. (A BALGE implementation is liable to cause too many operational problems to be acceptable to us.) It should also be good for Exim even in single-host setups, by avoiding the hints lock bottleneck. The overhaul of the hints DB layer will also allow Exim to make use of other more sophisticated databases as well as Splash, e.g. standard SQL databases or Unix-style libraries that support multiple-reader / single-writer locking.
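As a rough illustration of what a pluggable hints layer might look like, here is a sketch in Python. Everything in it is hypothetical: the `HintsBackend` interface, the `InMemoryBackend` class and the `over_rate_limit` function are names I have made up, and the fixed-window counter is a deliberately simple stand-in rather than the algorithm Exim's ratelimit feature actually uses. The idea is that the rate-limiting code only talks to the interface, so the same logic works whether the backend is a local file, an SQL database, or something shared across the cluster.

```python
from abc import ABC, abstractmethod


class HintsBackend(ABC):
    """Hypothetical interface that each hints store would implement."""

    @abstractmethod
    def get(self, key):
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def put(self, key, value):
        """Store a value under the key."""


class InMemoryBackend(HintsBackend):
    """Stand-in for a real backend; a shared store would replicate puts."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def over_rate_limit(backend, sender, limit, period, now):
    """Count one message from sender; True if it exceeds limit per period.

    Fixed-window counter, for illustration only. The get-then-put is not
    atomic, so a real backend would need its own concurrency control.
    """
    key = "ratelimit:" + sender
    start, count = backend.get(key) or (now, 0)
    if now - start >= period:
        start, count = now, 0  # the window has expired, so start a new one
    count += 1
    backend.put(key, (start, count))
    return count > limit
```

If two machines in a cluster were handed the same backend object (or a backend that replicates between them), they would see one combined count for each sender, rather than each seeing only its own share of the traffic.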