Here’s a somewhat obscure network debugging tale…
Context: recursive DNS server networking
Our central server network spans four sites across Cambridge, so it has a decent amount of resilience against power and cooling failures, and although it is a single layer two network, it is using some pretty fancy Cisco Nexus switches to provide plenty of redundant connectivity.
We have four recursive DNS servers, one at each site, usually two live and two hot spare. They are bare metal machines, which are intended to be able to boot up and provide service even if everything else is broken, provided they have power and cooling and network in at least one site.
The server network has several VLANs, and our resolver service addresses are on two of them: 131.111.8.42 is on VLAN 808, and 131.111.12.20 is on VLAN 812. So that any of the servers can provide service on either address, their switch ports are configured to deliver VLAN 808 untagged (so the servers can be provisioned using PXE booting without any special config) and VLAN 812 tagged.
Context: complying with reverse path filtering
There is strict reverse path filtering on the server network routers, so I have to make sure my resolvers use the correct VLAN depending on the source address. The trick is to use policy routing to match source addresses, since the normal routing table only looks at destination addresses.
The servers run Ubuntu, so this is configured in /etc/network/interfaces by adding a couple of up and down commands. Here’s an example; there are four similar blocks in the config, for VLAN 808 and VLAN 812, and for IPv4 and IPv6.
iface em1.812 inet static
    address 131.111.12.{{ ifnum }}
    netmask 24
    up ip -4 rule add from 131.111.12.0/24 table 12
    down ip -4 rule del from 131.111.12.0/24 table 12
    up ip -4 route add default table 12 via 131.111.12.62
    down ip -4 route del default table 12 via 131.111.12.62
The bug: missing IPv6 policy routing
On Sunday we had some scheduled power work in one of our machine rooms. On Monday I found that the server in that room was not answering correctly over IPv6.
The machine had mostly booted OK, but it had partially failed to configure its network interfaces: everything was there except for the IPv6 policy routing, which meant that answers over IPv6 were being sent out of the wrong interfaces and dropped by the routers.
The logs were not completely clear, but it looked like the server had booted faster than the switch that it was connected to, so it had tried to configure its network interfaces when there was no network.
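A missing policy rule like this can be checked for mechanically. The following is a minimal sketch, not part of the original setup: has_policy_rule is a hypothetical helper, the IPv6 prefix in the usage comment is a documentation placeholder, and it assumes ip rule show prints rules in its usual "priority: from PREFIX lookup TABLE" form.

```shell
#!/bin/sh
# Succeeds if the `ip rule show` output on stdin contains a rule that sends
# traffic from source prefix $1 to routing table $2.
# Assumes lines shaped like: "32765:  from 131.111.12.0/24 lookup 12"
has_policy_rule() {
    awk -v prefix="$1" -v table="$2" '
        $2 == "from" && $3 == prefix && $4 == "lookup" && $5 == table { found = 1 }
        END { exit !found }
    '
}

# Intended use on a resolver (the IPv6 prefix is a placeholder, not the real one):
#   ip -4 rule show | has_policy_rule 131.111.12.0/24 12 || echo "IPv4 rule missing"
#   ip -6 rule show | has_policy_rule 2001:db8:12::/64 12 || echo "IPv6 rule missing"
```

Running a check like this for each of the four blocks after boot would flag a half-configured server before the routers start dropping its answers.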
Two possible fixes
One approach might have been to add a script in /etc/network/if-pre-up.d that waits for the network to come up. But this is likely to be unreliable in bad situations where it is extra important that the server boots predictably.
The other approach, suggested by David McBride, was to try disabling IPv6 duplicate address detection. He found the dad-attempts option in the interfaces(5) man page, which looked very promising.
Edited to add: Chris Share pointed out that there is a third option: DAD can be disabled using
sysctl net.ipv6.conf.default.accept_dad=0
which is probably simpler than individually nobbling each network interface.
Debugging
I went downstairs to the machine room in our office building to try booting a server with the ethernet cable unplugged. This nicely reproduced the problem.
I then tried adding the dad-attempts option, and booting again. The server booted successfully!
No need for a horrible pre-up script, yay!
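For illustration, an IPv6 stanza with DAD disabled might look roughly like the fragment below. It is a sketch rather than the real config: the 2001:db8:12::/64 prefix, the ::20 address, the ::1 gateway and the reuse of table 12 are all placeholder assumptions. Only the dad-attempts option itself comes from interfaces(5), where a value of 0 disables DAD.

```
# Hypothetical IPv6 counterpart of the VLAN 812 block.
# All 2001:db8:: values are documentation placeholders, not the real addresses.
iface em1.812 inet6 static
    address 2001:db8:12::20
    netmask 64
    dad-attempts 0
    up ip -6 rule add from 2001:db8:12::/64 table 12
    down ip -6 rule del from 2001:db8:12::/64 table 12
    up ip -6 route add default table 12 via 2001:db8:12::1
    down ip -6 route del default table 12 via 2001:db8:12::1
```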
Moans
The ifupdown man pages are not very good at explaining how the program works: they don’t explain the /etc/network/if-*.d hook scripts, nor how the dad-attempts option works.
I dug around in its source code, and I found that ifupdown’s DAD logic is implemented by the script /lib/ifupdown/settle-dad.sh, which polls the output of ip -6 address list. If it times out while the address is still marked “tentative” (because the network is down) the script declares failure, and ifupdown breaks.
The other key part is the nodad option to ip -6 addr add, which is undocumented.
This made it somewhat harder to find the fix and understand it. Bah.
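To make the failure mode concrete, here is a rough sketch of the kind of polling loop described above. It is not the actual settle-dad.sh: the function names, arguments and exact ip invocation are assumptions chosen for illustration.

```shell
#!/bin/sh
# Succeeds if any address line on stdin is still flagged "tentative".
has_tentative() {
    grep -qw tentative
}

# Poll until no IPv6 address on the interface is tentative, or give up.
# $1 = interface, $2 = number of attempts, $3 = delay between polls (seconds)
wait_for_dad() {
    iface=$1
    attempts=$2
    delay=$3
    while [ "$attempts" -gt 0 ]; do
        if ! ip -6 address list dev "$iface" | has_tentative; then
            return 0    # DAD has settled
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1            # still tentative: the caller treats this as failure
}

# Example (hypothetical values): wait_for_dad em1.812 60 0.1
```

With the link down the address never leaves the tentative state, so a loop like this always runs out of attempts and reports failure. Adding the address with nodad means it is never marked tentative in the first place, so there is nothing to wait for.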
Risks
I’ve now disabled duplicate address detection on my DNS servers, though I might have gone a bit far by disabling it on my VMs as well as the recursive servers. The point of DAD is to avoid accidentally breaking the network, so it’s a bit arrogant to turn it off. On the other hand, if I have misconfigured duplicate IPv6 addresses, I have almost certainly done the same for IPv4, so I have still accidentally broken the network…