Attacking and defending DNS over TCP

A month ago I wrote about a denial of service attack that caused a TCP flood on one of our externally-facing authoritative DNS servers. In that case the mitigation was to discourage legitimate clients from using TCP; however those servers are sadly still vulnerable to TCP flooding attacks, and because they are authoritative servers they have to be open to the Internet. But we get some mitigation from having off-site anycast servers so it isn't completely hopeless.

This post is about our inward-facing recursive DNS servers.

Two weeks ago, Google and Red Hat announced CVE-2015-7547, a remote code execution vulnerability in glibc's getaddrinfo() function. There are no good mitigations for this vulnerability so we patched everything promptly (and I hope you did too).

There was some speculation about whether it is possible to exploit CVE-2015-7547 through a recursive DNS server. No-one could demonstrate an attack but the general opinion was that it is likely to be possible. Dan Kaminsky described the vulnerability as "a skeleton key of unkown strength", citing the general rule that "attacks only get better".

Yesterday, Jaime Cochran and Marek Vavruša of Cloudflare described a successful cache traversal exploit of CVE-2015-7547. Their attack relies on the behaviour of djbdns when it is flooded with TCP connections - it drops older connections.

BIND's behaviour is different; it retains existing connections and refuses new connections. This makes it trivially easy to DoS BIND's TCP listener, and given the wider discussion about proper support for DNS over TCP including persistent connections djbdns's behaviour appears to be superior.

So it's unfortunate that djbdns has better connection handling which makes it vulnerable to this attack, whereas BIND is protected by being worse!

Fundamentally, we need DNS server software to catch up with the best web servers in the quality of their TCP connection handling.

But my immediate reaction was to realise that my servers would have a problem if the Cloudflare attack (or something like it) became at all common. Our recursive DNS servers were open to TCP connections from the wider Internet, and we were relying on BIND to reject queries from foreign clients. This clearly was not sufficiently robust. So now, where my firewall rules used to have a comment

    # XXX or should this be CUDN only?

there are now some actual filtering rules to block incoming TCP connections from outside the University. (I am still sending DNS REFUSED replies over UDP since that is marginally more friendly than an ICMP rejection and not much more expensive.)

Although this change was inspired by CVE-2015-7547, it is really about general robustness; it probably doesn't have much effect on our ability to evade more sophisticated exploits.