More spam bot signatures

There's another spam bot heuristic that is the exact complement of the one described in my previous post. The idea is to keep track of how many different HELO domains an SMTP client uses.

  defer
    message   = Probable spam bot HELO varies between $sender_rate domains
    # whitelist checks go here
    ratelimit = 2.2 / 1w / per_conn / strict \
      / unique=$sender_helo_name / $sender_host_address

This is mostly the same as the code in my previous post, except that the lookup key is the client's IP address, and the unique= option means the measured rate only increases when the client uses a HELO domain it has not used before.
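To make the counting concrete, here is a rough Python model of the check. It is a simplification, not how Exim works internally: Exim's ratelimit computes a smoothed, exponentially weighted rate, whereas this sketch (the class `HeloVarianceTracker` is hypothetical) keeps an exact set of HELO names per client IP address and forgets names first seen more than a week ago. The 1w period and the 2.2 limit are taken from the ACL above.

```python
import time
from collections import defaultdict
from typing import Dict, Optional

WEEK = 7 * 24 * 3600  # the 1w period from the ACL, in seconds
LIMIT = 2.2           # the threshold from the ACL


class HeloVarianceTracker:
    """Count distinct HELO names per client IP address (simplified model)."""

    def __init__(self, window: float = WEEK, limit: float = LIMIT) -> None:
        self.window = window
        self.limit = limit
        # ip -> {helo name: time it was first seen in the current window}
        self.seen: Dict[str, Dict[str, float]] = defaultdict(dict)

    def check(self, ip: str, helo: str, now: Optional[float] = None) -> bool:
        """Record one connection and return True if it should be deferred."""
        now = time.time() if now is None else now
        helos = self.seen[ip]
        # Forget HELO names first seen more than one window ago.
        for name, first_seen in list(helos.items()):
            if now - first_seen > self.window:
                del helos[name]
        # Only a name not already recorded for this IP raises the count,
        # which is the job unique= does in the ACL.
        helos.setdefault(helo, now)
        # The name is recorded even when the client is already over the
        # limit, roughly what the strict option asks for.
        return len(helos) > self.limit
```

In this model the third distinct HELO name from one IP address within a week exceeds the 2.2 limit, while repeating the same name never raises the count. Exim's smoothed rate will not match it exactly, particularly when connections are spread out over the period.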

This check also works very well at detecting spam bots. While testing it I noticed that one particular bot likes to use a HELO domain consisting entirely of random upper-case letters, and that it only talks to the one of our servers that has the lowest IP address. So I added a specific (cheaper) check to deal with it. This causes the bot to go away, while legitimate senders will retry with a different server.

  defer
    message   = Probable spam bot HELO - please try another server
    condition = ${if and{{ eq{$primary_hostname}{ppsw-0.csi.cam.ac.uk} } \
                              { match{$sender_helo_name}{^[A-Z]+\$} }} }
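The condition above is easy to mirror in Python, which also helps show what the regular expression actually matches. The backslash in `^[A-Z]+\$` stops Exim's string expansion from treating `$` as the start of a variable, so the pattern the regex engine sees is `^[A-Z]+$`. In this sketch `re.fullmatch` plays the role of the anchored pattern, and the function name `defer_uppercase_helo` is hypothetical.

```python
import re

# Equivalent of ^[A-Z]+$ from the ACL: the whole HELO name must be one or
# more upper-case ASCII letters (no dots, digits or lower case).
RANDOM_UPPER = re.compile(r"[A-Z]+")


def defer_uppercase_helo(primary_hostname: str, helo: str) -> bool:
    """Hypothetical mirror of the ACL condition: both tests must hold."""
    return (primary_hostname == "ppsw-0.csi.cam.ac.uk"
            and RANDOM_UPPER.fullmatch(helo) is not None)


if __name__ == "__main__":
    print(defer_uppercase_helo("ppsw-0.csi.cam.ac.uk", "QWERTYZX"))          # True
    print(defer_uppercase_helo("ppsw-0.csi.cam.ac.uk", "MAIL.EXAMPLE.COM"))  # False
```

Because the match is case-sensitive and the character class excludes dots, an ordinary fully qualified HELO name, even one written in capitals, does not trigger it.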

There is a small risk of false positives with the variable HELO domain test. Some outgoing mail server clusters are behind a NAT, so we see HELO domains from multiple servers coming from the same IP address. I also found a number of false positives for the popular HELO domain check (most prominently rediffmail.com and easyjet.com). The way I'm dealing with them (which I hope will work in the long term) is as follows.

For the popular HELO domain check described in my previous post, I maintain a lookup table of legitimate HELO domains that trigger the check. The following replaces the hard-coded check for localhost.localdomain that appears in my previous entry.

    condition = ${if !match_domain{$sender_helo_name}{cdb;DB/helo_ok.cdb} }

As well as allowing through clients whose HELO domain can be verified, I now also check dnswl.org for well-known legitimate senders, and I maintain a table of legitimate sending hosts that neither of those exemptions covers.

  ! verify   = helo
  ! hosts    = +helo_ok
  ! dnslists = list.dnswl.org
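The `!` lines work because of how an Exim ACL statement is evaluated: its conditions are tested in order, the verb (here defer) applies only if all of them are true, and testing stops at the first one that is false. The Python sketch below models that, assuming the exemptions sit where the `# whitelist checks go here` comment is, before the ratelimit condition. The DNS-based results are stubbed as booleans, and the names `Client` and `should_defer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Client:
    ip: str
    helo: str
    helo_verified: bool  # stands in for "verify = helo"
    on_dnswl: bool       # stands in for "dnslists = list.dnswl.org"


def should_defer(client: Client, helo_ok_hosts: Set[str],
                 over_limit: Callable[[], bool]) -> bool:
    """Return True if every condition of the defer statement holds."""
    conditions = [
        lambda: not client.helo_verified,          # ! verify = helo
        lambda: client.ip not in helo_ok_hosts,    # ! hosts = +helo_ok
        lambda: not client.on_dnswl,               # ! dnslists = list.dnswl.org
        over_limit,                                # ratelimit = ...
    ]
    # all() over a generator stops at the first false condition, like Exim.
    return all(cond() for cond in conditions)
```

A side effect of this ordering is that an exempted client never reaches the ratelimit condition, so its measured rate is not updated at all.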

To avoid problems with false positives, my anti-spam-bot checks return a temporary error code ("defer" in Exim-speak instead of "deny"). I then have a nightly audit script which looks for hosts that appear to be repeatedly retrying a message and which should therefore be added to my whitelist tables. It might be possible to automate this table maintenance, again using the ratelimit feature, but I expect it will take a bit more experience to determine the right thresholds.
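The audit heuristic can be sketched roughly as follows. Genuine mail servers queue a deferred message and try it again later, while spam bots tend not to, so a host that keeps retrying the same sender and recipient over a reasonable span of time is a whitelist candidate. In this Python sketch the `Deferral` record format and the thresholds (three attempts spread over at least an hour) are assumptions, and extracting such records from Exim's logs is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Deferral(NamedTuple):
    """One deferred delivery attempt (hypothetical record format)."""
    when: float      # seconds since the epoch
    host: str        # client IP address
    sender: str
    recipient: str


def whitelist_candidates(deferrals: Iterable[Deferral],
                         min_attempts: int = 3,
                         min_span: float = 3600.0) -> List[str]:
    """Return hosts that look like real MTAs retrying a deferred message."""
    attempts: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for d in deferrals:
        # Treat the same host/sender/recipient triple as "the same message".
        attempts[(d.host, d.sender, d.recipient)].append(d.when)

    candidates = set()
    for (host, _sender, _recipient), times in attempts.items():
        # Require several attempts spread out in time; rapid-fire repeats
        # within a few seconds look more like a bot than a queue runner.
        if len(times) >= min_attempts and max(times) - min(times) >= min_span:
            candidates.add(host)
    return sorted(candidates)
```

The output is meant for a human to review before anything is added to the whitelist tables, in keeping with the manual nightly audit described above.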