In yesterday’s notes about my IETF 101 activities on Monday I forgot to mention the Hackathon Happy Hour. A few of the teams were demoing their projects, so I had a chat with the BBC R&D folks. They had been at the table next to us over the weekend, working on IP multicast for TV. I found out at the happy hour that they were multicasting unidirectional QUIC (Google’s new encrypted transport layer, which is currently going through IETF standardization). Their player app normally uses unicast HTTP; by using QUIC they can re-use HTTP semantics for multicasting as well. Super cool.
This week I am commuting from Cambridge to Paddington. So far this has been working OK - I’m not suffering too much from burning the candle at both ends, though I’m not seeing very much of the family!
The first WG meeting session starts at 09:30, and I can get there in time if I get up around 07:00, and get out of the house without faffing. Once I get on the train I can plan my day, read blogs and mail, etc.
I realised yesterday that this routine is slightly suboptimal if there aren’t any WG meetings that I want to attend in the morning - if I don’t realise this until after I am up and out of the house, I miss the opportunity for a lie in!
Oh well, I spent the morning catching up on things in the code lounge.
There were some complaints on the ucam-itsupport list about a connectivity problem, so (yet again) I had to explain why it wasn’t my fault…
Remember that when there is an upstream connectivity problem, it tends to be most visible to end users as a DNS problem: when the uplink goes away, the DNS servers can’t get answers for users. The users never even get to the point of trying to talk off-site, so they don’t discover that the uplink has gone - they just see the DNS error.
The TTL for en.wikipedia.org is 5 minutes, so if the uplink problem lasts longer than that, you will get a DNS error for Wikipedia.
There’s a new feature in the recently released BIND 9.12 called “serve-stale”, which changes the cache time-to-live logic. When the DNS server tries to refresh an item in its cache, and discovers that it can no longer reach the authoritative DNS servers, it will continue to return the stale answer to users.
We have upgraded to BIND 9.12.0, but I have not yet enabled the serve-stale feature. I wanted to be sure that the servers continued to work OK with the existing configuration (in case I needed to roll back), and then industrial action intervened before I could make the serve-stale change.
I have a 9.12.1 upgrade in the works (to fix an interoperability regression: it needs to accommodate bad DNS zones that have a forbidden CNAME at the apex) after which I will enable serve-stale.
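For reference, enabling serve-stale in BIND 9.12 looks something like this (a sketch based on my reading of the 9.12 documentation; the timer values here are illustrative, not our production settings):

```
options {
    // answer from cache with expired records when the
    // authoritative servers cannot be reached
    stale-answer-enable yes;
    // TTL (seconds) attached to a stale answer when it is served
    stale-answer-ttl 1;
    // how long an expired record may linger in the cache
    // and remain eligible to be served stale
    max-stale-ttl 86400;
};
```

With something like this in place, an uplink outage longer than Wikipedia’s 5 minute TTL would no longer turn into a hard DNS error for users.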
Tangentially, there’s another 9.12 feature which I am looking forward to enabling: BIND can now use DNSSEC NSEC proof-of-nonexistence records to synthesize negative answers without having to re-ask the authoritative servers. This is particularly good at improving the performance of handling junk queries for invalid TLDs, and it will allow me to delete a lot of configuration verbiage that I added to suppress other junk queries.
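That feature is a single switch in the configuration (option name as I understand it from the 9.12 release notes; sketch only):

```
options {
    // synthesize NXDOMAIN / NODATA answers from cached,
    // DNSSEC-validated NSEC records (the RFC 8198 behaviour)
    // instead of re-querying the authoritative servers
    synth-from-dnssec yes;
};
```

Junk queries for nonexistent TLDs are a good fit because the root zone is signed with NSEC, so a handful of cached records can cover a huge range of bogus names.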
In the first session after lunch, I thought the most useful WG meeting would be netconf, following the conversations I had about it on Monday. This kind of choice is a bit risky, because if you don’t know much about a protocol, you lack the context to make much sense of the detailed business of a WG.
There was some discussion about the YANG keystore model (used for configuring things like ssh keys, I gather), YANG push (I guess for pub-sub style data collection), and a binary representation for netconf (which is natively XML).
The second afternoon session was one of the main reasons I am attending IETF 101.
There was a comment from Warren Kumari (the IESG area director responsible for the dnsop WG) that there are quite a lot of DNS drafts in flight at the moment. From my point of view, there are some that I am really keen on since they are directly helpful for the services I run; there are some which are interesting but not directly relevant to me; and there are some that I think are probably bad ideas. (Of course, other people think that my favourite drafts are bad ideas - DNS people generally get on well with each other but we don’t always agree!)
Joe Abley has resurrected the long-stalled draft-ietf-dnsop-refuse-any so that it can be pushed through the various “last call” stages towards publication as an RFC.
I’m pleased to see that it is making progress again. Two years ago I implemented this draft for BIND to improve our robustness against certain kinds of flooding attacks.
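In BIND the behaviour described by the draft is controlled by a single option (a sketch; this is how I recall the knob being exposed):

```
options {
    // answer ANY queries with one arbitrary RRset rather than
    // everything at the name, per draft-ietf-dnsop-refuse-any;
    // this shrinks responses that amplification attacks rely on
    minimal-any yes;
};
```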
Evan Hunt reported on the current state of draft-ietf-dnsop-aname. This spec will be a standardized version of the CNAME-at-apex workarounds that various DNS vendors have implemented in various ways. I’m looking forward to getting my hands on this since the restrictions on CNAMEs are a longstanding pain point for us.
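To illustrate the idea, a zone could say something like the following (a sketch of the draft’s approach; the names are made up and the record syntax may change as the draft evolves):

```
; at the zone apex a CNAME is forbidden, because it cannot
; coexist with the mandatory SOA and NS records
example.com.  IN  SOA    ns1.example.com. hostmaster.example.com. 1 3600 600 864000 300
example.com.  IN  NS     ns1.example.com.
; ANAME points at a target name but CAN coexist with other records;
; address records for old resolvers are expanded from the target
example.com.  IN  ANAME  cdn-target.example.net.
```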
I reviewed the draft in detail earlier this year (part 1, part 2) and Evan told me that my comments were very helpful, especially for clarifying how recursive servers could handle ANAME records.
The discussion in the meeting was about how to refactor the draft to clarify it.
This spec is about a more compact way of recording DNS traffic than pcap files. It is already being used for recording telemetry data on some root name servers. Sounds quite cool.
There are a couple of drafts in the DNSSEC key management area: security considerations for RFC 5011 (how to follow a key rollover safely), and kskroll-sentinel, which is another way of tracking whether validators are following RFC 5011 successfully.
There was also a discussion about how a DNSSEC validator can bootstrap its trust anchors. I have some ideas about this - a few years ago I wrote a draft about trust anchor witnesses (I should probably write down my ideas about how to simplify it).
I need to read Ben Laurie’s old draft on DNSSEC key distribution, and look at how getdns does zero-configuration DNSSEC.
Work on the revised DNS terminology explainer continues, and seems to be approaching readiness for the last call process.
The DNS session signalling draft describes a way to make DNS-over-TCP (and other persistent transports) timeouts and connection shutdown more explicit. I don’t really see the point of it - it isn’t clear to me that explicit negotiation will provide much benefit. The alternative is for the server to close idle TCP connections whenever it wants, and for clients to handle lost connections gracefully.
After the discussion of drafts in progress, there was a presentation by Bert Hubert (author of PowerDNS) about the increasing complexity of DNS.
It was a very witty and well-informed rant, and it sparked some good discussion about what we can do to tackle the problem. One suggestion (from Job Snijders wrt the routing area) was to have strict rules that drafts cannot progress to RFC without multiple independent interoperable implementations. Another suggestion was to see if there were old RFCs that could be deprecated.
Would it be worth writing a consolidated DNS spec? Probably far too much work for unclear benefit. Would it be worth writing a roadmap RFC, that tells readers how much attention to give to old documents?
The non-wg track included a pleasant lunch with folks from NLnet Labs and others, and in the evening (instead of going to the official social meet in the Science Museum) several of us went to a local pub. I can’t remember many of the fascinating topics we discussed :-) But (work-related) there was some agreement about session signalling being of dubious benefit.