.@ Tony Finch – blog


AJ’s idea of scanning email for passwords has provoked a lot of strange suggestions, online and in person: elaborate password cracking clusters, phishing my own users, hardware acceleration, etc. (When AJ suggested the idea, my first thought was to use a Bloom filter to find passwords, since you can’t recover the contents of a Bloom filter without brute force. But it’s very hard to delete items from a Bloom filter, which in this scenario must be done whenever any user changes their password. So that is not going to work either.)

The whole idea is very borderline: is it worth spending significant effort when the phishing success rate is much less than 1% per incident, and the current fashion for phishing universities is probably short-lived? (This week we got about 2000 messages from the phishers and 5 users replied.) On the other hand it would be very interesting to find out what the detection rate would be. Would there be any false positives, i.e. unintentional password appearances? What is the legitimate positive rate, e.g. senior staff sending their passwords to their PAs? (The latter is against our AUP but it is common practice.) How much password sharing is there outside the anticipated scenarios?

It’s worth making the point that it isn’t hard for me to get my users’ plaintext passwords: I could just instrument our various SASL implementations. But we (my colleagues and I) don’t do that because sysadmins are safer not knowing their users’ secrets. This is why we don’t log message subjects, and why our accidental-deletion recovery tools don’t require seeing any message contents. We don’t look at the contents of a user’s account in any detail without permission from that user - and even then we’d very much prefer not to know that we’re recovering email that was deleted by their jilted lover who obtained their password from a shared browser history.

From my point of view, the interesting thing is that it is feasible to detect when a user is sending their own password in a message, using just a standard Unix encrypted password file and some simple code: crypt every word the user sends and compare with that user’s crypted password. This is just a few hundred lines of C, including hooks into our authentication database and email content scanner, and choosing the right words. My prototype code can check 2000 MD5 crypted words per second per CPU, and should be able to skip most words in a message since they are outside our password rules.
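To make the shape of that check concrete, here is a minimal sketch in Python rather than the C of the prototype. It is not the prototype's code. Everything named in it is an assumption for illustration: `hash_word` uses PBKDF2 from `hashlib` as a stand-in for crypt(3) with MD5, and the password rules in `plausible` (8 to 16 characters, at least one letter and one non-letter) are hypothetical, not our real rules. A real version would take the salt and hash from the user's entry in the password file.

```python
import hashlib
import hmac

# Hypothetical password rules, for illustration only.
MIN_LEN, MAX_LEN = 8, 16
PUNCT = ".,;:!?\"'()[]<>"


def hash_word(word: str, salt: bytes) -> bytes:
    """Stand-in for crypt(3): a salted, iterated hash of one word."""
    return hashlib.pbkdf2_hmac("sha256", word.encode("utf-8"), salt, 1000)


def plausible(word: str) -> bool:
    """Cheap pre-filter: skip words that the password rules exclude."""
    if not MIN_LEN <= len(word) <= MAX_LEN:
        return False
    has_letter = any(c.isalpha() for c in word)
    has_other = any(not c.isalpha() for c in word)
    return has_letter and has_other


def candidates(body: str):
    """Yield each distinct plausible word, with and without edge punctuation."""
    seen = set()
    for token in body.split():
        for word in (token, token.strip(PUNCT)):
            if word not in seen and plausible(word):
                seen.add(word)
                yield word


def contains_password(body: str, salt: bytes, stored: bytes) -> bool:
    """True if any candidate word hashes to the user's stored hash."""
    return any(
        hmac.compare_digest(hash_word(word, salt), stored)
        for word in candidates(body)
    )
```

The scanner never holds a plaintext password of its own: it only hashes words the user has already put in the message and compares them with what is in the password file. The pre-filter is where most of the speed comes from, since ordinary prose words fail the rules and are never hashed.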

There has been a lot of traffic on various mailing lists about these phishing attacks, especially notifications of new reply addresses. But we don’t want to be in the business of maintaining blacklists of addresses our users mustn’t send to. Password scanning seems like a simple way of avoiding that tar pit, which is, I think, the main attraction. So why do I think it’s absurd?