.@ Tony Finch – blog


Thanks for all the interesting comments on my previous post.

The reason I'm investigating this is to work around false positives caused by SpamAssassin's obfuscation rules. These are intended to match deliberate misspellings of commonly spammed goods such as Viagra. The specific instance that caused the bug report was a Reading postcode being treated as an obfuscated Rolex.

Therefore I'm not particularly worried about missing obscure special cases like GIR 0AA and the overseas territories AAAA 1ZZ. However, it might be worth tightening up the outcode regex, using the list of UK postcode areas, to reduce the chance of matching a bogus postcode.
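A minimal sketch of what tightening the outcode against a list of areas could look like. The @areas list here is a small illustrative subset rather than the full list, the name is_known_outcode is made up for this example, and the district part is simplified to one or two digits.

```perl
use strict;
use warnings;

# Illustrative subset only (assumption): a real list would hold every
# UK postcode area, not just these eight.
my @areas = qw(B CB EH G L M OX RG);

# Build an alternation of the known areas.
my $area_alt = join '|', map { quotemeta } @areas;

# Known area followed by a district number 0-99 with no leading zero.
my $outcode = qr{\A(?:$area_alt)[1-9]?[0-9]\z};

# Hypothetical helper: returns 1 if the string is an outcode in a listed area.
sub is_known_outcode {
    my ($s) = @_;
    return $s =~ $outcode ? 1 : 0;
}

for my $candidate (qw(RG1 CB2 OX10 ZZ9)) {
    printf "%-5s %s\n", $candidate,
        is_known_outcode($candidate) ? 'known area' : 'rejected';
}
```

The trade-off is that the list has to be kept up to date, whereas the character-class version only needs changing when the format rules change.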

Also, the Post Office's postcode FAQ mentions that only London uses the ANA and AANA outcode formats. (In fact it's only the E, EC, SW, W and WC areas.) I managed to find a list of postcode districts which includes these outcodes (Wikipedia omits them), and it shows that the third position rule is wrong: the rule says M does not appear there, but there is a postcode district London W1M. Rule Three also allows A and E, which are not in fact used.

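qr{\b
  ([BGLMNS][1-9][0-9]?                      # one-letter areas: AN, ANN
  |[A-PR-UWYZ][A-HK-Y][1-9]?[0-9]           # two-letter areas: AAN, AANN
  |([EW]C?|NW?|S[EW])[1-9][0-9A-HJKMNPR-Y]  # London areas: ANA, AANA (also ANN, AANN)
  )[ ]{0,2}                                 # zero to two spaces before the incode
  ([0-9][ABD-HJLNP-UW-Z]{2})                # incode: NAA
\b}x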
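
A quick way to try the pattern out, as a sketch only. The helper name find_postcode and the sample strings are made up for illustration and are not taken from the bug report. Because of the nested group in the London branch, the outcode is in $1 and the incode is in $3.

```perl
use strict;
use warnings;

# Copy of the pattern above.
my $postcode = qr{\b
  ([BGLMNS][1-9][0-9]?
  |[A-PR-UWYZ][A-HK-Y][1-9]?[0-9]
  |([EW]C?|NW?|S[EW])[1-9][0-9A-HJKMNPR-Y]
  )[ ]{0,2}
  ([0-9][ABD-HJLNP-UW-Z]{2})
\b}x;

# Hypothetical helper: returns "OUTCODE INCODE" for the first
# postcode-like string in the text, or undef if there is none.
sub find_postcode {
    my ($text) = @_;
    return undef unless $text =~ $postcode;
    return "$1 $3";    # $2 is the London area group, unused here
}

# Illustrative sample strings (assumptions, not real mail).
for my $text ('RG1 8BT', 'W1M 1AA', 'SW1A 2AA', 'M11AE', 'ROLEX') {
    my $found = find_postcode($text);
    printf "%-10s => %s\n", $text, defined $found ? $found : 'no match';
}
```

Note that a string such as E1 6AN is not matched by the pattern as written: the London branch needs two characters after the area, and E and W are not in the one-letter branch.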