Last weekend one of our authoritative name servers (authdns1.csx.cam.ac.uk) suffered a series of DoS attacks which made it rather unhappy. Over the last week I have developed a patch for BIND to make it handle these attacks better.
The attack traffic
On authdns1 we provide off-site secondary name service to a number of other universities and academic institutions; the attack targeted imperial.ac.uk.
For years we have had a number of defence mechanisms on our DNS servers. The main one is response rate limiting, which is designed to reduce the damage done by DNS reflection / amplification attacks.
However, our recent attacks were different. As in most reflection / amplification attacks, we were getting a lot of QTYPE=ANY queries, but unlike a typical attack of that kind the queries were not spoofed: they were coming to us from a lot of real recursive DNS servers. (A large part of the volume came from Google Public DNS; I suspect that is just because of their size and popularity.)
My guess is that it was a reflection / amplification attack, but we were not being used as the amplifier; instead, a lot of open resolvers were being used to amplify, and they in turn were making queries upstream to us. (Consumer routers are often open resolvers, but usually forward to their ISP's resolvers or to public resolvers such as Google's, and those query us in turn.)
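A rough way to see why rate limiting by source struggles with this traffic pattern: a limiter keyed on the client's network clamps a flood that appears to come from one spoofed victim address, but passes the same volume when it is spread thinly across many genuine resolvers. The sketch below is a toy fixed-window counter, not BIND's RRL algorithm; the class name, the limit of 5 responses per second, the /24 grouping, and the addresses are all illustrative assumptions.

```python
import ipaddress
from collections import defaultdict


class ToyRateLimiter:
    """Allow at most `limit` responses per client netblock per one-second window.

    Hypothetical illustration only; real RRL also considers the response
    content and has more refined accounting.
    """

    def __init__(self, limit, prefix_len=24):
        self.limit = limit
        self.prefix_len = prefix_len
        self.counts = defaultdict(int)  # (netblock, window) -> responses seen

    def allow(self, client_ip, now):
        # Group IPv4 clients by netblock so neighbouring addresses share a budget.
        net = ipaddress.ip_network(f"{client_ip}/{self.prefix_len}", strict=False)
        key = (net, int(now))
        self.counts[key] += 1
        return self.counts[key] <= self.limit


def count_allowed(limiter, clients, queries_per_client, now=0.0):
    """Count how many queries in one time window the limiter lets through."""
    return sum(
        limiter.allow(ip, now)
        for ip in clients
        for _ in range(queries_per_client)
    )


# 1000 queries that all appear to come from one (spoofed) victim address.
spoofed = count_allowed(ToyRateLimiter(limit=5), ["192.0.2.1"], 1000)

# The same 1000 queries spread over 200 resolvers in different netblocks.
resolvers = [f"10.{i}.0.1" for i in range(200)]
distributed = count_allowed(ToyRateLimiter(limit=5), resolvers, 5)

print(spoofed, distributed)
```

In this toy model the single-source flood is cut down to the per-second limit, while every query in the distributed case is answered because no individual netblock exceeds its budget.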
What made it worse
Because from our point of view the queries were coming from real resolvers, RRL was completely ineffective. But some other configuration settings made the attacks cause more damage than they might otherwise have done.
I have configured our authoritative servers to avoid sending large UDP packets which get fragmented at the IP layer. IP fragments often get dropped and this can cause problems with DNS resolution. So I have set
max-udp-size 1420;
minimal-responses yes;
The first setting limits the size of outgoing UDP responses to an MTU which is very likely to work. (The ethernet MTU minus some slop for tunnels.) The second setting reduces the amount of information that the server tries to put in the packet, so the response is less likely to be truncated by the small UDP size limit and clients are less likely to have to retry over TCP.
This works OK for normal queries; for instance a cam.ac.uk IN MX query gets a svelte 216 byte response from our authoritative servers but a chubby 2047 byte response from our recursive servers which do not have these settings.
But ANY queries blow straight past the UDP size limit: the attack queries for imperial.ac.uk IN ANY got obese 3930 byte responses.
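The arithmetic behind this can be sketched in a few lines. This is a simplified model that only compares the full response size against the server's max-udp-size; it ignores the client's advertised EDNS buffer size, and the function name is a hypothetical one chosen for illustration.

```python
MAX_UDP_SIZE = 1420  # the max-udp-size value quoted above


def transport_for(response_bytes, max_udp_size=MAX_UDP_SIZE):
    """Return 'udp' if the whole response fits under the limit.

    Otherwise return 'tcp': the server sends a truncated UDP reply and the
    client is expected to retry the query over TCP.
    """
    return "udp" if response_bytes <= max_udp_size else "tcp"


# Response sizes quoted in the text for our authoritative servers.
sizes = {
    "cam.ac.uk IN MX": 216,
    "imperial.ac.uk IN ANY": 3930,
}

outcome = {query: transport_for(size) for query, size in sizes.items()}
for query, transport in outcome.items():
    print(f"{query}: {sizes[query]} bytes -> {transport}")
```

Under this model the MX response is served over UDP, while every ANY response is pushed on to TCP.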
The effect was that the recursive clients retried their queries over TCP, and consumed the server's entire TCP connection quota. (Sadly BIND's TCP handling is not up to the standard of good web servers, so it's quite easy to nadger it in this way.)
draft-ietf-dnsop-refuse-any
We might have coped a lot better if we could have served all the attack traffic over UDP. Fortunately there was some pertinent discussion in the IETF DNSOP working group in March last year which resulted in draft-ietf-dnsop-refuse-any, "providing minimal-sized responses to DNS queries with QTYPE=ANY".
This document was instigated by Cloudflare, who have a DNS server architecture which makes it unusually difficult to produce traditional comprehensive responses to ANY queries. Their approach is instead to send just one synthetic record in response, like
cloudflare.net. HINFO ( "Please stop asking for ANY"
"See draft-jabley-dnsop-refuse-any" )
In the discussion, Evan Hunt (one of the BIND developers) suggested an alternative approach suitable for traditional name servers. They can reply to an ANY query by picking one arbitrary RRset to put in the answer, instead of all of the RRsets they have to hand.
The draft says you can use either of these approaches. They both allow an authoritative server to make the recursive server go away happy that it got an answer, and without breaking odd applications like qmail that foolishly rely on ANY queries.
I did a few small experiments at the time to demonstrate that these approaches really would work OK in the real world (unlike some of the earlier proposals), and they are both pretty neat solutions.
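The two approaches the draft allows can be sketched as follows. This is a minimal illustration over an in-memory dictionary, not how BIND or Cloudflare implement it; the zone contents, the function names, and the second HINFO string are made up for the example.

```python
def synthetic_hinfo(qname):
    """Answer an ANY query with a single synthetic HINFO record."""
    rdata = ("Please stop asking for ANY", "See the refuse-any draft")
    return [(qname, "HINFO", rdata)]


def pick_one_rrset(zone, qname):
    """Answer an ANY query with one arbitrary RRset that exists at qname."""
    rrsets = zone.get(qname, {})
    if not rrsets:
        return []
    # Any RRset will do; here we take whichever type was stored first.
    rrtype = next(iter(rrsets))
    return [(qname, rrtype, rdata) for rdata in rrsets[rrtype]]


# Hypothetical zone data: owner name -> record type -> list of rdata.
zone = {
    "example.org.": {
        "MX": ["10 mail.example.org."],
        "A": ["192.0.2.10"],
        "TXT": ["v=spf1 mx -all"],
    }
}

hinfo_answer = synthetic_hinfo("example.org.")
one_rrset_answer = pick_one_rrset(zone, "example.org.")
print(hinfo_answer)
print(one_rrset_answer)
```

Either way the answer section holds one small RRset instead of everything at the name, which keeps the response comfortably inside a UDP packet.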
Attack mitigation
So draft-ietf-dnsop-refuse-any is an excellent way to reduce the damage caused by the attacks, since it allows us to return small UDP responses which reduce the downstream amplification and avoid pushing the intermediate recursive servers on to TCP. But BIND did not have this feature.
I did a very quick hack on Tuesday to strip down ANY responses, and I deployed it to our authoritative DNS servers on Wednesday morning for swift mitigation. But it was immediately clear that I had put my patch in completely the wrong part of BIND, so it would need substantial re-working before it could be more widely useful.
I managed to get back to the patch on Thursday. The right place to put the logic was in the fearsome query_find() which is the top-level query handling function and nearly 2400 lines long! I finished the first draft of the revised patch that afternoon (using none of the code I wrote on Tuesday), and I spent Friday afternoon debugging and improving it.
The result is this patch which adds a minimal-qtype-any option. I'm currently running it on my toy nameserver, and I plan to deploy it to our production servers next week to replace the rough hack.
I have submitted the patch to the ISC; hopefully something like it will be included in a future version of BIND. And it prompted a couple of questions about draft-ietf-dnsop-refuse-any that I posted to the DNSOP working group mailing list.