All the crypto code you’ve ever written is probably broken

tl;dr: use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption. use authenticated encryption.

Do you keep up on the latest proceedings of the IACR CRYPTO conference? No? Then chances are whenever you have tried to use a cryptographic library you made some sort of catastrophic mistake which would lead to a complete loss of confidentiality of the data you’re trying to keep secret.

The most important question is: are you using an authenticated encryption mode? If you don’t know what authenticated encryption is, then you’ve probably already made a mistake. Here’s a hint: authenticated encryption has nothing to do with authenticating users into a webapp. It has everything to do with ensuring the integrity of your data hasn’t been compromised, i.e. no one has tampered with the message.

Why is authenticated encryption so poorly known despite being so important? I don’t know. Perhaps it’s because the need for it wasn’t formally proven until the year 2000. And chances are you’ve never heard of authenticated encryption at all, because despite the best efforts of the cryptographic community it remains a relatively poorly-known concept.

Most of the cryptographic APIs you’ve ever encountered have probably made you run a gauntlet of choices for how you want to encrypt data. You might think AES-256 is the way to go, but by default your crypto API might select ECB mode, which is so bad and terribly insecure it isn’t even worth talking about. Perhaps you select CBC or CTR mode, but your crypto API doesn’t make you specify a random IV and will always encrypt everything with an IV of all zeroes, which will compromise the confidentiality of your data if you ever reuse the same key.
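To make the all-zero IV problem concrete, here is a minimal sketch. It uses a toy stream cipher built from SHA-256 as a stand-in for AES-CTR, purely so the example runs on the standard library alone. The names toy_keystream, toy_ctr_encrypt and xor_bytes are hypothetical, and nothing here is fit for real use. The point is only that two messages encrypted under the same key and the same IV share a keystream, so XORing the two ciphertexts cancels it out.

```python
import hashlib


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy counter-mode keystream from SHA-256. Illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_ctr_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream (encryption and decryption are the same)."""
    return xor_bytes(plaintext, toy_keystream(key, iv, len(plaintext)))


key = b"k" * 32          # the same key is reused for both messages
zero_iv = bytes(16)      # the "IV of all zeroes" an API might silently pick

p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
c1 = toy_ctr_encrypt(key, zero_iv, p1)
c2 = toy_ctr_encrypt(key, zero_iv, p2)

# The keystream cancels: an eavesdropper learns p1 XOR p2 without the key.
leak = xor_bytes(c1, c2)

# Anyone who knows (or guesses) one plaintext recovers the other outright.
recovered_p2 = xor_bytes(leak, p1)
```

With a fresh random IV per message the two keystreams differ and this cancellation no longer works.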

Let’s say you’ve gotten through all of that and are now using something like AES-CTR mode with a random IV per message. Great. Do you think you’re secure now? Probably not. A sophisticated attacker might attempt a man-in-the-middle attack, which gives him the ability to execute “chosen ciphertext” attacks (CCAs). To defend against these you must also ensure the integrity of your data, or otherwise confidentiality might be lost.
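One simple consequence of missing integrity is that ciphertext is malleable. The sketch below again uses a toy SHA-256 keystream as a stand-in for AES-CTR (an assumption made to keep it standard-library only; the function names and the message format are hypothetical, and this is not secure code). An attacker who can guess part of the message flips bits in transit and the receiver decrypts a different message without any error. This is much simpler than a full chosen ciphertext attack, but it shows why a random IV alone does not protect you.

```python
import hashlib
import os


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy counter-mode keystream from SHA-256. Illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_ctr_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts: both are an XOR with the keystream."""
    return xor_bytes(data, toy_keystream(key, iv, len(data)))


key = os.urandom(32)
iv = os.urandom(16)                      # random IV per message, as recommended

original = b"PAY ALICE $0100"
ciphertext = toy_ctr_encrypt(key, iv, original)

# The attacker never sees the key. They only guess the message format
# and XOR the difference between what was sent and what they want.
wanted = b"PAY ALICE $9100"
tampered = xor_bytes(ciphertext, xor_bytes(original, wanted))

# The receiver decrypts without noticing anything wrong.
received = toy_ctr_encrypt(key, iv, tampered)
```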

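You may have learned you need to use a MAC to do this (and if you didn’t you’re most likely insecure!). You may have selected HMAC for this purpose. But you’re still left with three options here! Do you compute the MAC of the plaintext or the ciphertext? If you compute the MAC of the plaintext, do you encrypt it along with the plaintext, or do you append it to the end of the ciphertext? Or to spell it out more precisely, which of the following do you do?

Encrypt-and-MAC: encrypt the plaintext, compute the MAC of the plaintext, and append that MAC to the ciphertext.

MAC-then-encrypt: compute the MAC of the plaintext, then encrypt the plaintext and the MAC together.

Encrypt-then-MAC: encrypt the plaintext, then compute the MAC of the ciphertext and append it to the ciphertext.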

Edit: (this is important enough I feel the need to edit it retroactively)

If you have answered any of the above questions incorrectly (the correct answer to the above question is “encrypt then MAC”) you’ve quite likely created an insecure cryptographic scheme. Unless you really know what you’re doing and can answer all these questions correctly (and even then!), you probably shouldn’t be trying to build your own cipher/MAC constructions and should defer to cryptographic experts who specialize in that sort of thing. These cipher/MAC constructions are called authenticated encryption modes.
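Purely to make the “encrypt then MAC” ordering concrete, and not as something to deploy, here is a small sketch using Python’s standard hmac module. It treats the IV and ciphertext as opaque bytes produced by whatever cipher you are using. The names seal and open_sealed, the 16-byte IV length, and the use of a MAC key separate from the encryption key are assumptions of this sketch. The two things to notice are that the MAC covers the IV and the ciphertext, and that the tag is checked (in constant time) before anything is handed on for decryption.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag is computed over the IV and the ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def open_sealed(mac_key: bytes, message: bytes, iv_len: int = 16):
    """Verify the tag first; only then return (iv, ciphertext) for decryption."""
    if len(message) < iv_len + TAG_LEN:
        raise ValueError("message too short")
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking how many tag bytes matched via timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed; refusing to decrypt")
    return body[:iv_len], body[iv_len:]


mac_key = b"m" * 32                    # hypothetical key, distinct from the cipher key
iv = bytes(range(16))                  # placeholder IV for the demonstration
ciphertext = b"opaque ciphertext bytes from your cipher"

sealed = seal(mac_key, iv, ciphertext)

# Flip one bit anywhere and the message is rejected before decryption.
tampered = bytearray(sealed)
tampered[20] ^= 0x01
try:
    open_sealed(mac_key, bytes(tampered))
    rejected = False
except ValueError:
    rejected = True
```

Even a sketch this small leaves out plenty that a real design has to settle (key derivation, associated data, nonce management), which is the argument for using a ready-made authenticated mode instead.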

If you find yourself reaching for any form of encryption that isn’t an authenticated encryption mode, you’re probably doing it wrong. You shouldn’t ever be choosing between CBC or CFB or CTR (or god forbid ECB). Unless you’re a cryptographer, these should be considered dangerous low-level primitives not for the consumption of mere mortals.

That said, what should you be using?

EAX is one of the recommended modes and is relatively easy to understand: it’s a combination of AES-CTR mode and CMAC (a.k.a. OMAC1), which is a MAC derived from a block cipher (in this case AES). While EAX mode is relatively simple to understand and you may be tempted to implement it yourself if it’s unavailable in your language environment, you probably shouldn’t. There are a number of potential pitfalls awaiting you, and unless you know what you’re doing (and even then!) you’re likely to get it wrong.

If I’ve scared you enough by now, you may be googling around to discover whether there’s an implementation of any of the above modes in your programming language environment, and sadly in many language environments you may turn up empty-handed. In these cases, there’s not much you can do except petition your language maintainers who specialize in cryptography to expose APIs to authenticated encryption modes.

Authenticated encryption is something you should use as a complete package, implemented as a single unit by a well-reputed open source cryptographic library and not assembled piecemeal by people who do not specialize in cryptography.
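As an illustration of what “a complete package” looks like in practice, here is a sketch using the AESGCM class from the third-party Python package cryptography. Two assumptions to flag: that package has to be installed separately, and AES-GCM is a different authenticated mode from the EAX mode discussed above, swapped in because it is what this particular library exposes. The helper names encrypt_message and decrypt_message are hypothetical.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

NONCE_LEN = 12  # 96-bit nonce, the size normally used with GCM


def encrypt_message(plaintext: bytes) -> bytes:
    """Encrypt and authenticate in one call; the nonce is prepended."""
    nonce = os.urandom(NONCE_LEN)  # must never repeat under the same key
    return nonce + aead.encrypt(nonce, plaintext, None)


def decrypt_message(message: bytes) -> bytes:
    """Raises InvalidTag if the message was modified in any way."""
    nonce, ciphertext = message[:NONCE_LEN], message[NONCE_LEN:]
    return aead.decrypt(nonce, ciphertext, None)


message = encrypt_message(b"attack at dawn")

# Flip a single bit: decryption fails instead of returning altered data.
tampered = bytearray(message)
tampered[-1] ^= 0x01
try:
    decrypt_message(bytes(tampered))
    tamper_detected = False
except InvalidTag:
    tamper_detected = True
```

There is no mode, padding, or MAC ordering to choose here. The library makes those decisions and the caller only supplies a key, a unique nonce, and the data.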

Bottom line: unless you’re using authenticated encryption, you are opening yourself up to all sorts of attacks you can’t even anticipate, and shouldn’t consider the data you’re storing confidential.

Edit: several people have asked about more information on everything I’ve described here, most notably why various MACing schemes are secure or insecure. If you are really interested in this topic, I strongly recommend you take the Stanford Crypto class on Coursera which is what inspired me to write this blog post to begin with.

 