The other day I set up a new OpenBSD instance with a nice RAID array, encrypted with Full Disk Encryption. And promptly proceeded to forget part of the passphrase.

We know things get interesting when I lose a password.

I made a weak attempt at finding a public bruteforce tool and found nothing. I say weak because somewhere in the back of my brain, I already wanted to take a peek at the OpenBSD FDE implementation.

Very little is documented, and while I do trust OpenBSD, I want to know how my data is encrypted. So this was the "perfect" occasion.

Hold on, because it's going to be a bumpy ride straight into the OpenBSD core sources, following the notes I took during the ~3-hour process.

Goals

We need to extract enough info from the encrypted disk and rebuild enough of the decryption algorithm to be able to rapidly try many passphrases.

What this usually means in FDE is finding the details of the Key Derivation Function, and whatever mechanism is used to detect if the passphrase is correct or not.

Starting points

A prompt. A damn prompt.

# bioctl -c C -l sd3a softraid0
Passphrase:
softraid0: incorrect key or passphrase

We start chasing by looking at the bioctl and softraid_crypto implementations, Cmd-F'ing "Passphrase:" and "incorrect key or passphrase".

https://github.com/openbsd/src/blob/master/sys/dev/softraid_crypto.c

https://github.com/openbsd/src/blob/master/sbin/bioctl/bioctl.c

The first hit is promising.

bio_kdf_derive(&kdfinfo, &kdfhint, "Passphrase: ", 0);
void
bio_kdf_derive(struct sr_crypto_kdfinfo *kdfinfo, struct sr_crypto_kdf_pbkdf2
    *kdfhint, char* prompt, int verify)
	// [...]
	derive_key_pkcs(kdfhint->rounds,
	    kdfinfo->maskkey, sizeof(kdfinfo->maskkey),
	    kdfhint->salt, sizeof(kdfhint->salt), prompt, verify);

derive_key_pkcs is a banal checking wrapper for pkcs5_pbkdf2, so we now know how the passphrase is derived into a key:

kdfinfo->maskkey = pbkdf2(password, kdfhint->salt, kdfhint->rounds)

Let's chase kdfhint.

Pass the salt

The salt is certainly stored on the encrypted disk. The object must be populated by the lines just above the bio_kdf_derive call, because before that its memory is zeroed:

		create.bc_opaque = &kdfhint;
		create.bc_opaque_size = sizeof(kdfhint);
		create.bc_opaque_flags = BIOC_SOOUT;

		/* try to get KDF hint */
		if (ioctl(devh, BIOCCREATERAID, &create))
			err(1, "ioctl");

I tried a few leads here, including following the BIOCCREATERAID ioctl, but what got me somewhere was a code search for "bc_opaque".

softraid_crypto.c L223-L225

		if (copyout(sd->mds.mdd_crypto.scr_meta->scm_kdfhint,
		    bc->bc_opaque, bc->bc_opaque_size))
			goto done;

It's copied from some deeper metadata object. This seems complex. Hmmm.

Let's try a new angle: what is the type of the kdfhint?

softraidvar.h L53-L62

/*
 * sr_crypto_genkdf is a generic hint for the KDF performed in userland and
 * is not interpreted by the kernel.
 */
struct sr_crypto_genkdf {
	u_int32_t	len;
	u_int32_t	type;
#define SR_CRYPTOKDFT_INVALID	0
#define SR_CRYPTOKDFT_PBKDF2	1
#define SR_CRYPTOKDFT_KEYDISK	2
};

/*
 * sr_crypto_genkdf_pbkdf2 is a hint for the PKCS#5 KDF performed in userland
 * and is not interpreted by the kernel.
 */
struct sr_crypto_kdf_pbkdf2 {
	u_int32_t	len;
	u_int32_t	type;
	u_int32_t	rounds;
	u_int8_t	salt[128];
};

Aha! If it's "not interpreted by the kernel", then it must be stored verbatim in the disk metadata. We need to look at an actual disk.

A simple example

To get a test case where we know whether we got it right, we make a small encrypted image with the passphrase "password".

# dd if=/dev/zero of=file.img bs=1 count=1M
# vnconfig vnd0 file.img
# disklabel -E /dev/rvnd0c
Label editor (enter '?' for help at any prompt)
> a a
offset: [0]
size: [2048]
FS type: [4.2BSD] RAID
> w
> q
No label changes.
# bioctl -c C -l /dev/vnd0a softraid0
New passphrase: password
Re-type passphrase: password
softraid0: CRYPTO volume attached as sd4

Here is the hexdump: https://gist.github.com/FiloSottile/8294e708396396d6b6d49c7c839b72ec

We are looking for an sr_crypto_kdf_pbkdf2 structure, which we can recognize because it starts with a u_int32_t length, followed by a u_int32_t type of value 1, followed by a u_int32_t number of rounds. There are many 01 00 00 00 (little endian!) around, but only one is surrounded by two other plausible u_int32_t values:

00002960  -- -- -- -- -- -- -- --  -- -- -- -- 8c 00 00 00  |..U...(zU.......|
00002970  01 00 00 00 00 20 00 00  50 1f db 08 97 6d 2c 40  |..... ..P....m,@|
00002980  63 fb ff 91 5e 6c 75 fc  b9 44 86 16 77 1f 6d 65  |c...^lu..D..w.me|
00002990  4d 64 f8 56 ab 11 83 c7  7b 01 ac a0 f2 69 51 83  |Md.V....{....iQ.|
000029a0  b3 41 df c4 83 21 7a ce  75 37 3d f8 80 4f 6d 36  |.A...!z.u7=..Om6|
000029b0  06 63 55 15 ff de 7d 7a  b1 ac dd 0c f8 41 63 bb  |.cU...}z.....Ac.|
000029c0  42 cc a6 85 4a b5 52 f4  50 ec 9f 05 3f 9d 8b 8d  |B...J.R.P...?...|
000029d0  64 fe 85 ba 8f ce 08 87  97 e2 8d 35 2c 9d 6a 2d  |d..........5,.j-|
000029e0  cb 8c e2 7e 72 65 7d 7e  56 76 87 89 e6 ba cc 49  |...~re}~Vv.....I|
000029f0  bd 84 43 ef e6 3e 07 d6  00 00 00 00 00 00 00 00  |..C..>..........|

Indeed, the length field is 0x8c = 140 = 4 + 4 + 4 + 128, and the round count 0x2000 (8192) is reasonable. We have our salt!

A checksum to check your key

While lurking, this comment caught my eye:

	/* Check that the key decrypted properly. */
	sr_crypto_calculate_check_hmac_sha1(sd->mds.mdd_crypto.scr_maskkey,
	    sizeof(sd->mds.mdd_crypto.scr_maskkey),
	    (u_int8_t *)sd->mds.mdd_crypto.scr_key,
	    sizeof(sd->mds.mdd_crypto.scr_key),
	    check_digest);
	if (memcmp(sd->mds.mdd_crypto.scr_meta->chk_hmac_sha1.sch_mac,
	    check_digest, sizeof(check_digest)) != 0) {
		...
	}

Apparently the correctness of the passphrase is checked by computing an HMAC of something and comparing it with an expected value.

Let's see what this chk_hmac_sha1 structure is.

/*
 * Check that HMAC-SHA1_k(decrypted scm_key) == sch_mac, where
 * k = SHA1(masking key)
 */
struct sr_crypto_chk_hmac_sha1 {
	u_int8_t	sch_mac[20];
} __packed;

Oh, thanks, that makes things much easier. What the comment calls "decrypted scm_key" is called scr_key in the snippet above.

We have our check algorithm:

HMAC-SHA1(k=SHA1(maskkey), scr_key) == sch_mac

Keys, keys that encrypt keys

Let's see how this scr_key is decrypted. Just above.

	if (sr_crypto_decrypt((u_char *)sd->mds.mdd_crypto.scr_meta->scm_key,
	    (u_char *)sd->mds.mdd_crypto.scr_key,
	    sd->mds.mdd_crypto.scr_maskkey, sizeof(sd->mds.mdd_crypto.scr_key),
	    sd->mds.mdd_crypto.scr_meta->scm_mask_alg) == -1)
		goto out;

sr_crypto_decrypt is just AES-ECB-256. So here's the last piece of the algorithm:

scr_key = AES-ECB-256_decrypt(k=maskkey, scm_key)

Hexdump spelunking

Now, it's a matter of finding scm_key and sch_mac in the disk image. Again, let's look at the data structures, starting with chk_hmac_sha1.

	u_int32_t		scm_check_alg;	/* key chksum algorithm */
#define SR_CRYPTOC_HMAC_SHA1		1
	u_int32_t		scm_pad2;
	union {
		struct sr_crypto_chk_hmac_sha1	chk_hmac_sha1;
		u_int8_t			chk_reserved2[64];
	}			_scm_chk;

Sweet. We are looking for 01 00 00 00 (scm_check_alg), followed by 00 00 00 00 (scm_pad2), followed by 20 random-looking bytes (the HMAC-SHA1 output). Sure enough, just after the salt, there's our check HMAC:

00002a60  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
00002a70  00 00 00 00 26 e8 25 6f  86 8f cd 33 88 1c d4 f1  |....&.%o...3....|
00002a80  1e 9d 2a 98 ca 21 2d 9c  00 00 00 00 00 00 00 00  |..*..!-.........|
00002a90  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Finally, we need to find the encrypted key, scm_key. This took me a while, until I realized the size of this encrypted blob:

#define SR_CRYPTO_MAXKEYS	32	/* max keys per volume */
#define SR_CRYPTO_KEYBITS	512	/* AES-XTS with 2 * 256 bit keys */
#define SR_CRYPTO_KEYBYTES	(SR_CRYPTO_KEYBITS >> 3)

	u_int8_t		scr_key[SR_CRYPTO_MAXKEYS][SR_CRYPTO_KEYBYTES];

	/* symmetric keys used for disk encryption */
	u_int8_t		scm_key[SR_CRYPTO_MAXKEYS][SR_CRYPTO_KEYBYTES];

32 * 512/8 = 2048 = 0x800 bytes of random-looking stuff. You can't really miss it in the hexdump. But where are the boundaries? Well, if we are lucky, the line where the big random blob starts (00002160) and the one where the KDF hint with the salt starts (00002960) will be approximately... Yes! Exactly 0x800 bytes apart :)

That random blob is all key material, followed by the PBKDF2 rounds and salt, and by the check HMAC.

Wrapping it up

So now we have found all the pieces to write some code and find out if our assumptions were correct:

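package main

import (
    "crypto/aes"
    "crypto/hmac"
    "crypto/sha1"
    "encoding/hex"
    "fmt"
    "log"

    "golang.org/x/crypto/pbkdf2"
)

// Paste these hex strings from the hexdump (the gist linked above).
var scmKeyHex, saltHex string // scm_key (0x800 bytes), salt (128 bytes)
const rounds = 0x2000         // rounds field of the kdfhint

func decode(s string) []byte {
    b, err := hex.DecodeString(s)
    if err != nil {
        log.Fatal(err)
    }
    return b
}

func main() {
    scmKey := decode(scmKeyHex)
    salt := decode(saltHex)

    maskkey := pbkdf2.Key([]byte("password"), salt, rounds, 32, sha1.New)

    // AES-ECB-256_decrypt(k=maskkey, scm_key) = scr_key, decrypted in place
    a, err := aes.NewCipher(maskkey)
    if err != nil {
        log.Fatal(err)
    }
    for i := 0; i < len(scmKey); i += a.BlockSize() {
        a.Decrypt(scmKey[i:i+a.BlockSize()], scmKey[i:i+a.BlockSize()])
    }

    // HMAC-SHA1(k=SHA1(maskkey), scr_key) == sch_mac
    h := sha1.Sum(maskkey)
    mac := hmac.New(sha1.New, h[:])
    mac.Write(scmKey) // scmKey now holds the decrypted scr_key
    fmt.Print(hex.Dump(mac.Sum(nil)))
}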

If we are right, this will output the same HMAC as in the last hexdump snippet. The first time, I forgot to hash the maskkey and almost tore my hair out. But then...

$ go build -i . && ./openbsd-fde-crack
00000000  26 e8 25 6f 86 8f cd 33  88 1c d4 f1 1e 9d 2a 98  |&.%o...3......*.|
00000010  ca 21 2d 9c                                       |.!-.|

Voilà!

Now that we know how to extract the data and how to try passphrases against it, it will be trivial to write a bruteforce tool to recover the part of the passphrase I forgot.

There's some code at https://github.com/FiloSottile/openbsd-fde-crack, but don't expect a fire-and-forget tool; this post gives you enough information to figure things out on your own.

To know what happens the next time I lose a password (sigh), follow me on Twitter.

UPDATE: I found it! After I fixed a bug or two in the brute force tool and had almost lost hope, it found the right combination of forgotten word and (Italian) misspelling.

UPDATE: I later found a nice article documenting the entire system. It also mentions that John the Ripper has a module for this. Well, this was more fun.