Understanding the .io TLD's DNS configuration vulnerability

July 12, 2017

First there was Matthew Bryant's The .io Error - Taking Control of All .io Domains With a Targeted Registration, about a configuration error that allegedly allowed you to take over control of some .io nameservers, and then there was a response to it, Matt Pounsett's The .io Error: A Problem With Bad Optics, But Little Substance, which argued that this was much ado about nothing much. While I agree that the consequences are less severe than Bryant thought, I think that Pounsett's article understates the risks (and I believe it doesn't correctly explain what's going on in the DNS here). In any case, the whole thing confused me and other people, so I'm going to write up my understanding of things here.

Let's start with the basics of compromising a domain through dangling nameserver delegation. Suppose you find a domain barney.io that lists ns1.fred.ly as one of its two nameservers, and fred.ly is not registered (worse nameserver mistakes happen). To attack barney.io you register fred.ly and create an ns1.fred.ly A record that points to a nameserver that you're running. Some portion of the people looking up information in barney.io will wind up querying your nameserver, and at that point you can give them whatever answers you want. If they're asking their original question, you can directly lie to them (telling people that all MX entries in barney.io point to harvestmail.fred.ly, for example). If they're making NS queries to check for zone delegation, you can just give them NS records that point to you and start lying some more when they follow those NS records.

(You can then increase how many people will talk to ns1.fred.ly by DoSing the other barney.io DNS server off the Internet.)
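To make the mechanics concrete, here is a toy Python model of the barney.io example. Everything in it is hypothetical: the second nameserver's name, the 'last two labels' notion of a registrable domain (real code would need the Public Suffix List), and especially the assumption that resolvers pick uniformly among the nameservers that answer. Real resolvers usually favour whichever server responds fastest, so treat the numbers as an illustration of the shape of the risk, not a measurement.

```python
import random

def parent_domain(host: str) -> str:
    # Simplification: treat the last two labels as the registrable domain.
    # Real code would need the Public Suffix List.
    return ".".join(host.rstrip(".").lower().split(".")[-2:])

def dangling_nameservers(ns_names, registered_domains):
    """NS host names whose parent domain nobody has registered (yet)."""
    return [ns for ns in ns_names if parent_domain(ns) not in registered_domains]

def share_reaching_attacker(ns_names, attacker_ns, down=(), trials=10_000, seed=0):
    """Estimate how often a resolver asks an attacker-controlled server,
    assuming (hypothetically) it picks uniformly among servers that answer."""
    rng = random.Random(seed)
    alive = [ns for ns in ns_names if ns not in down]
    hits = sum(rng.choice(alive) in attacker_ns for _ in range(trials))
    return hits / trials

# Hypothetical delegation for barney.io; fred.ly is unregistered.
barney_ns = ["ns1.fred.ly", "ns2.barney-hosting.example"]
registered = {"barney-hosting.example"}

# The attacker registers fred.ly, which puts ns1.fred.ly under their control.
attacker = set(dangling_nameservers(barney_ns, registered))
print(attacker)                                      # {'ns1.fred.ly'}
print(share_reaching_attacker(barney_ns, attacker))  # roughly 0.5
# Knock the other server offline and every resolver lands on the attacker.
print(share_reaching_attacker(barney_ns, attacker, down={"ns2.barney-hosting.example"}))  # 1.0
```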

This is more or less what the setup was for .io. Among .io's nameservers were ns-a1.io through ns-a4.io, and all of those names could be registered as domains in .io and then given A records in your DNS data for your new domain(s) (and Matthew Bryant did just this with ns-a1.io). However, there was an important difference that made this less severe than my example, and that's that .io had active glue records in the root zone for those names that pointed people to the IP addresses of the real nameservers. With these glue records present, a client didn't talk to Matthew Bryant's DNS server just because it decided to use ns-a1.io as part of resolving a .io name; if it believed and used the glue records, it would wind up talking to the real nameserver. You only had your query diverted to Bryant's DNS server if you decided to send a query to ns-a1.io but not use the IP from the glue record and instead look it up directly.

Using data from glue records instead of looking things up yourself is common but not mandatory, and there are various reasons why a resolver would not do so. Some recursive DNS servers will deliberately try to check glue record information as a security measure; for example, Unbound has the harden-referral-path option (via Tony Finch). Since the original article reported seeing real .io DNS queries being directed to Bryant's DNS server, we know that a decent number of clients were not using the root zone glue records. Probably a lot more clients were still using the glue records, though.
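Here is a deliberately simplified sketch of why the root glue mattered. The addresses are from the RFC 5737 documentation ranges and merely stand in for the real .io server and Bryant's server; `fresh_lookup` is a hypothetical stand-in for a resolver re-resolving the NS name through .io. The resolver that re-checks addresses is loosely in the spirit of hardening options such as Unbound's harden-referral-path, not a model of what Unbound actually does.

```python
from dataclasses import dataclass

# Hypothetical documentation addresses (RFC 5737), not real .io data.
REAL_NS_ADDR = "192.0.2.1"      # where the root zone's glue for ns-a1.io points
ATTACKER_ADDR = "203.0.113.66"  # what the newly registered ns-a1.io resolves to

# The referral a root server hands out for .io: NS names plus glue addresses.
ROOT_REFERRAL = {"ns": ["ns-a1.io"], "glue": {"ns-a1.io": REAL_NS_ADDR}}

def fresh_lookup(name: str) -> str:
    # Stand-in for re-resolving the name through .io after it was registered
    # and delegated to the registrant's own DNS server.
    return {"ns-a1.io": ATTACKER_ADDR}[name]

@dataclass
class Resolver:
    name: str
    trusts_glue: bool

    def server_for(self, ns_name: str) -> str:
        """Pick the address this resolver will send .io queries to."""
        if self.trusts_glue and ns_name in ROOT_REFERRAL["glue"]:
            return ROOT_REFERRAL["glue"][ns_name]
        return fresh_lookup(ns_name)

for r in [Resolver("uses glue", True), Resolver("re-checks NS addresses", False)]:
    addr = r.server_for("ns-a1.io")
    print(r.name, "->", addr, "(attacker)" if addr == ATTACKER_ADDR else "(real)")
# uses glue -> 192.0.2.1 (real)
# re-checks NS addresses -> 203.0.113.66 (attacker)
```

Only the resolver that declines to use the glue ends up at Bryant's server, which matches the article's observation that some, but probably not most, real .io queries were diverted.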

(There are a bunch of uncertainties about just what DNS data was being returned by whom during the incident. The original article shows a reply from a root server and that probably didn't change, but we don't know what the official .io servers themselves started returning as glue records for .io during the time that ns-a1.io was active as a domain registration. I will decline to speculate on the likely result here.)

Given my history with glue record hell, it amuses me that this is a case where dangling glue records helped instead of hurt, making a problem less severe than it would otherwise have been. Had there been no glue records or incomplete glue records for the .io zone, there would have been more danger (or at least the danger would have been clearer).

(In this case the presence of the glue records was mandatory, since these were NS names inside the zone itself. Without glue records in the root zone, you would have a chicken and egg problem in getting the IP address of, say, a0.nic.io.)
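The chicken and egg problem can be sketched as a dependency walk: to find a nameserver's address without glue, you first have to reach a server for the zone that name lives in. This toy Python version uses a made-up `delegations` table and, as a stated simplification, treats any zone missing from that table as resolvable some other way.

```python
def in_zone(name: str, zone: str) -> bool:
    """True if name is zone itself or somewhere below it."""
    name, zone = name.rstrip(".").lower(), zone.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)

def needs_glue(zone: str, ns_name: str) -> bool:
    # An NS name inside the zone it serves can't be found without first
    # reaching that zone, so the parent must hand out its address as glue.
    return in_zone(ns_name, zone)

def enclosing_zone(name, delegations):
    """Most specific zone in the (toy) delegation table that contains name."""
    matches = [z for z in delegations if in_zone(name, z)]
    return max(matches, key=len) if matches else None

def reachable(zone, delegations, glue, seen=frozenset()):
    """Can a resolver locate some nameserver for zone at all?"""
    if zone in seen:
        return False  # we'd need this zone's servers to find this zone's servers
    for ns in delegations[zone]:
        if ns in glue:
            return True  # the parent's referral gave us an address directly
        home = enclosing_zone(ns, delegations)
        if home is None:
            return True  # toy assumption: zones outside the table resolve fine
        if reachable(home, delegations, glue, seen | {zone}):
            return True
    return False

io = {"io": ["a0.nic.io"]}
print(needs_glue("io", "a0.nic.io"))                          # True
print(reachable("io", io, glue={}))                           # False
print(reachable("io", io, glue={"a0.nic.io": "192.0.2.10"}))  # True
```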

PS: As far as I can see from Bryant's article, he didn't realize that the root zone glue records would cause many clients to not query his DNS servers, significantly reducing the severity of someone having control over the names of four of the seven .io DNS servers. As far as Pounsett's article goes, he seems to more or less spot the issue with root glue but doesn't explain it, and he appears to expect all clients to use the glue all of the time (which is demonstrably not the case). I think he may also be confusing the data in the .io zone with the root zone glue for .io. Note that it's not necessary to get your IP address for ns-a1.io included in the .io zone; to make some clients start talking to you, it's sufficient for NS records for ns-a1.io to show up and ideally to occlude the A and AAAA records.

(We know that Bryant's NS records showed up in the .io zone. We don't know if they occluded the A record for ns-a1.io that was there, but it seems likely that they did.)
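Occlusion is the key point here, so here's a toy model of how an authoritative server for .io treats a name once it has NS records. It assumes, as I do above, that the registrant's NS records sat at the same name as the existing A record; the address and the attacker's nameserver name are hypothetical. Real servers have more special cases (DS queries at a delegation point, for example) that this ignores.

```python
# Toy contents of the .io zone; the address is a documentation IP standing in
# for the real ns-a1.io server.
IO_ZONE = {
    "io": {"NS": ["a0.nic.io", "ns-a1.io"]},
    "ns-a1.io": {"A": ["192.0.2.1"]},
}

def lookup(zone, apex, qname, qtype):
    """Answer qname/qtype from zone data, honoring delegations (zone cuts)."""
    labels = qname.split(".")
    apex_len = len(apex.split("."))
    # Walk from just below the apex down to qname, looking for a zone cut.
    for n in range(apex_len + 1, len(labels) + 1):
        name = ".".join(labels[-n:])
        ns = zone.get(name, {}).get("NS")
        if ns:
            # Everything at or below a cut is occluded: we can only refer.
            return ("referral", name, ns)
    return ("answer", qname, zone.get(qname, {}).get(qtype, []))

print(lookup(IO_ZONE, "io", "ns-a1.io", "A"))
# ('answer', 'ns-a1.io', ['192.0.2.1'])

# Registering ns-a1.io adds NS records at the same name (hypothetical server).
registered = {**IO_ZONE, "ns-a1.io": {**IO_ZONE["ns-a1.io"], "NS": ["ns1.attacker.example"]}}
print(lookup(registered, "io", "ns-a1.io", "A"))
# ('referral', 'ns-a1.io', ['ns1.attacker.example'])
```

The original A record is still sitting in the zone data, but a client that asks the .io servers about ns-a1.io now gets pointed at the registrant's server instead.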

Sidebar: What I suspect went wrong in .io's procedures

It seems quite likely that ns-a1.io through ns-a4.io were intended to be purely host names of DNS servers, not domain names, much like my example of ns1.fred.ly. However, they were placed directly under the apex of a zone (.io) that allows people to register domains, and I suspect that the people running the IO zone forgot to tell the people running the IO registry that these names existed in the zone as host names and should be locked out from domain registration. That's been fixed now, obviously, and WHOIS tells me they're 'Reserved by Registry'.

(This is thus a different failure mode than having NS records for your domain or TLD that point to hosts in entirely unregistered domains. That's a pure failure, since the names don't exist at all except perhaps through lingering glue records. Here the names existed entirely properly, it's just that the IO registry was allowed to override them with new data.)

The problem doesn't come up for the other .io nameservers, which are all under nic.io, since nic.io is already a registered domain in .io.
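A registry could catch this class of mistake mechanically by checking the TLD's own nameserver host names against what can be registered. This sketch assumes flat registration directly under the TLD, and it uses only the names mentioned in this entry (a0.nic.io and ns-a1.io through ns-a4.io), so it's a partial list of the .io servers; the `registered` and `reserved` sets are hypothetical inputs, not real registry data.

```python
def names_registry_must_lock(tld, ns_hosts, registered, reserved):
    """Return the registrable names under tld that hold one of the TLD's own
    nameserver host names but aren't registered or reserved."""
    tld = tld.rstrip(".").lower()
    at_risk = []
    for host in ns_hosts:
        host = host.rstrip(".").lower()
        if not host.endswith("." + tld):
            continue  # out-of-zone nameserver; not this registry's names
        # Assume flat registration: the registrable domain is the label
        # immediately below the TLD.
        label = host[: -(len(tld) + 1)].split(".")[-1]
        domain = f"{label}.{tld}"
        if domain not in registered and domain not in reserved and domain not in at_risk:
            at_risk.append(domain)
    return at_risk

# Only the nameserver names mentioned in this entry; not the full .io list.
io_ns = ["a0.nic.io", "ns-a1.io", "ns-a2.io", "ns-a3.io", "ns-a4.io"]
print(names_registry_must_lock("io", io_ns, registered={"nic.io"}, reserved=set()))
# ['ns-a1.io', 'ns-a2.io', 'ns-a3.io', 'ns-a4.io']
print(names_registry_must_lock("io", io_ns, registered={"nic.io"},
                               reserved={"ns-a1.io", "ns-a2.io", "ns-a3.io", "ns-a4.io"}))
# []
```

a0.nic.io never shows up because nic.io is already registered, which is the same reason the nic.io nameservers were never at risk.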

Written on 12 July 2017.

Last modified: Wed Jul 12 23:49:57 2017