This morning my colleagues turned off their Cisco PIX SMTP fuxup mode and most of the failing email finally made it through. But not all. So I broke out tcpdump again.
BTW, a handy way to split a tcpdump output file called dump into separate TCP connections is something like the following. This assumes that one of the endpoints of each connection is always the same port on the same server IP address.
tcpdump -vvv -r dump | sed '/.* clientname[.]\([0-9]*\) .*/!d;s//\1/' | sort -u | while read p; do tcpdump -r dump -w dump.$p port $p; done
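To reuse the trick, here is a sketch of the same pipeline split into two shell functions, so the port-extraction step can be checked on its own. The function names client_ports and split_dump are made up for illustration. As in the one-liner, clientname stands for however tcpdump prints the client host in your capture. Only the tcpdump options already used above (-vvv, -r, -w and a port filter) appear here.

```shell
#!/bin/sh
# Sketch only: client_ports and split_dump are hypothetical names.
# Assumes the server end of every connection is the same IP address and port,
# so the client's port number identifies each TCP connection.

# client_ports: read tcpdump text output on stdin and print each distinct
# client port once. The sed deletes lines that do not mention
# "clientname.<port> ", then replaces the rest with just the port
# (an empty s// reuses the previous regex).
client_ports() {
    sed '/.* clientname[.]\([0-9]*\) .*/!d;s//\1/' | sort -u
}

# split_dump FILE: write each connection's packets to FILE.<port>.
split_dump() {
    tcpdump -vvv -r "$1" | client_ports | while read -r port; do
        tcpdump -r "$1" -w "$1.$port" port "$port"
    done
}

# Only do the real work when given a capture file, e.g. ./split.sh dump
if [ $# -gt 0 ]; then
    split_dump "$1"
fi
```

Packets travelling towards the client print as "clientname.<port>:" with a colon rather than a space, so they are skipped by the sed. That does not matter, because the packets in the other direction already yield the port.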
I looked at the connections which ended with a timeout during the data transmission phase. It turned out that just two messages were still having problems. And what strange beasts they were. The following is a paraphrase of the key features of the messages.
MAIL FROM:SIZE=12345
250 OK
RCPT TO:
250 Accepted
DATA
354 Enter message, ending with "." on a line by itself
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: application/ms-tnef; name="winmail.dat"
Content-Transfer-Encoding: binary
Subject: Service Unavailable
Date: Wed, 2 Sep 2009 12:34:56 +0100
Message-ID:
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: Service Unavailable
Thread-Index: GV7mWrV6mdBQLdgGFAfkSCsA
From: "/O=IT Support/OU=Department/cn=Configuration" "/cn=Servers/cn=MAILSERVER/cn=Microsoft Public MDB"
... loads of binary TNEF guff with lots of UTF-16 ...
The X.400 address is funny, but what actually caused the problem was the attempt to send a binary message. Our servers don't support the BINARYMIME extension, so this is verboten. It caused a timeout because the message, being binary rather than lines of text, didn't end with a CRLF newline sequence. So the dot-CRLF that terminates DATA did not appear at the start of a line, our server treated it as part of the message data, and it carried on waiting for more.
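To make the failure mode concrete, here is a toy model of the check a server makes during the DATA phase. It is a sketch, not how any real MTA is written: the server stops reading only when the bytes received so far end with CRLF, dot, CRLF. The helper names data_complete and report, and the two sample streams, are hypothetical.

```shell
#!/bin/sh
# Toy model only: data_complete and report are hypothetical helpers.
CR=$(printf '\r')
LF='
'
# The DATA terminator: a dot alone on a line, i.e. CRLF "." CRLF.
TERMINATOR="${CR}${LF}.${CR}${LF}"

# data_complete STREAM: succeed if STREAM ends with the terminator;
# otherwise a server would keep waiting for more input.
data_complete() {
    case $1 in
        *"$TERMINATOR") return 0 ;;
        *) return 1 ;;
    esac
}

# report NAME STREAM: say whether the server would end the DATA phase.
report() {
    if data_complete "$2"; then
        echo "$1: terminator seen, DATA ends"
    else
        echo "$1: no terminator, server keeps waiting"
    fi
}

# A text body ends with CRLF, so the final dot lands at the start of a line.
text_stream="Subject: test${CR}${LF}${CR}${LF}hello${CR}${LF}.${CR}${LF}"
# A binary body need not end with CRLF, so the same dot is just more data.
binary_stream="Subject: test${CR}${LF}${CR}${LF}binary-guff.${CR}${LF}"

report text "$text_stream"
report binary "$binary_stream"
```

Running it should report that the text stream is complete and the binary one is not, which matches the hang seen in the captures.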
Bizarre. Why is the message being sent to my servers at all, when it should have been delivered internally? (Its recipient is a departmental address.) Why isn't Exchange downgrading the binary content as required by the specification? And if it insists on sending binary, why is it using DATA, when binary content can only be sent using the BDAT command?
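By contrast, BDAT (from the CHUNKING extension in RFC 3030, which also defines BINARYMIME) announces the chunk's octet count up front. There is therefore no in-band terminator for a binary body to swallow. Here is a rough sketch of how a client might frame a body as a single final chunk. bdat_frame is a hypothetical helper, and a real client would first have to check that the server's EHLO response lists CHUNKING and BINARYMIME.

```shell
#!/bin/sh
# Sketch only: bdat_frame is a hypothetical helper, not part of any MTA.

# bdat_frame FILE: print "BDAT <size> LAST" CRLF, then exactly <size> octets
# of FILE. The receiver counts bytes instead of looking for CRLF "." CRLF,
# so the body may end with any byte at all.
bdat_frame() {
    size=$(wc -c < "$1" | tr -d ' ')
    printf 'BDAT %s LAST\r\n' "$size"
    cat "$1"
}

# Demo: an 11-octet "binary" body with no trailing CRLF.
tmp=$(mktemp)
printf 'binary\001guff' > "$tmp"
bdat_frame "$tmp" | od -c
rm -f "$tmp"
```

The tr strips the leading spaces that some wc implementations print before the count.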
Very strange.