.@ Tony Finch – blog


This is one of the worst examples of willful incompetence I have seen recently, so I thought it deserved a bit of point and laugh.

In the last couple of weeks we received two complaints that email from ABN AMRO (a large Dutch bank) to Cambridge was unreliable. According to the bounce messages that were forwarded to us, their Postfix outbound servers were complaining “conversation with mx.cam.ac.uk timed out while sending MAIL FROM”. Strange: we don’t do anything particularly time-consuming at that point in the SMTP conversation. Perhaps it’s an MTU problem?

I try turning on some extra logging, and Exim says “SMTP connection from triton10.abnamro.nl lost while reading message data (header)”. This is inconsistent with an MTU problem, since the envelope commands (MAIL FROM, RCPT TO, DATA) are larger than the replies, and Exim has received the commands and sent the replies OK. It’s also inconsistent with Postfix’s error message, since Postfix obviously sent the MAIL FROM without a problem. It turns out there’s a minor bug in Postfix that causes it to use an incorrect error message when there’s a timeout waiting for a reply from the server.

OK, so ABN AMRO’s Postfix is timing out while waiting for our envelope replies, but our replies are sent reasonably promptly. I resort to running a very selective tcpdump on mx.cam.ac.uk to see if that provides a clue. There is indeed no sign of an MTU problem: what is actually happening is that their end closes the connection only 15 seconds after sending the envelope commands. Exim doesn’t check for a closed connection until it wants to read more data, which explains its error message.

So it looks like their end has an absurdly small 15-second timeout, which triggers if we take too long to emit the envelope replies - which can happen if recipient address verification takes a while. The standard requires a timeout of at least five minutes, and we’re careful to stay within that limit. They are just asking for trouble by reducing their timeout to such a short period, and they would have had to deliberately break Postfix, which ships with correct defaults.
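To make the failure mode concrete, here is a minimal Python sketch of a client that gives up on a reply before a slow server has produced it. This is an illustration only, not what either site actually runs: the names `slow_server`, `send_mail_from` and `demo` are made up, the addresses are placeholders, and the delays are scaled down from seconds to fractions of a second so that it runs quickly.

```python
import socket
import threading
import time


def slow_server(listener, reply_delay):
    """Accept one connection and answer MAIL FROM after reply_delay seconds."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"220 mx.example ESMTP\r\n")
        conn.recv(1024)          # the client's MAIL FROM command
        time.sleep(reply_delay)  # stand-in for slow address verification
        try:
            conn.sendall(b"250 OK\r\n")
        except OSError:
            pass                 # the client has already hung up


def send_mail_from(port, reply_timeout):
    """Send MAIL FROM and wait at most reply_timeout seconds for the reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.recv(1024)                  # greeting, read with a generous timeout
        sock.settimeout(reply_timeout)   # the timeout under test
        sock.sendall(b"MAIL FROM:<sender@example.org>\r\n")
        try:
            return sock.recv(1024).decode().strip()
        except socket.timeout:
            return "timed out"


def demo(reply_delay, reply_timeout):
    """Run one client/server exchange on a throwaway local port."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=slow_server, args=(listener, reply_delay))
    server.start()
    result = send_mail_from(port, reply_timeout)
    server.join()
    listener.close()
    return result
```

Calling `demo(reply_delay=0.5, reply_timeout=0.1)` plays the part of the impatient client and reports a timeout, while `demo(reply_delay=0.1, reply_timeout=2.0)` gets its `250 OK`. The server in the first case never notices anything wrong until it tries to use the connection again, which is roughly the mismatch between the two sets of error messages described above.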

So I tried getting in touch with their postmasters. I first tried postmaster@nl.abnamro.com since one of the problem reports came from an @nl.abnamro.com address. I received two bounces from their Lotus Notes system. I then tried postmaster@abnamro.com, and after a day without a reply I tried postmaster@abnamro.nl. I still haven’t received a reply.

I also asked the people who reported the problem to chase it up with their IT staff. Eventually I got the reply that they are aware of the problem but there is no “business justification” to fix their broken systems. I bet ABN AMRO’s management would do something if their post room was chucking letters in the bin and ignoring support queries, so why do they tolerate such crapness for email?