I haven't got round to setting up proper performance monitoring for our DNS servers yet, so I have been making fairly ad-hoc queries against the BIND statistics channel and pulling out numbers with jq.
Last week I changed our new DNS setup so that we can make DNS updates more frequently. As part of this change I reduced the TTL on all our records from one day to one hour (from 86400 to 3600 seconds). The obvious question was: how would this affect the query rate on our servers?
So I wrote a simple monitoring script. The first version did this:
```shell
while sleep 1
do
	fetch-and-print-stats
done
```
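But the fetch-and-print-stats part took a significant fraction of a second, so each trip round the loop took noticeably longer than a second and the queries-per-second numbers were rather bogus. If fetching took, say, 0.3 seconds, each "per-second" count would really cover 1.3 seconds and overstate the rate by about 30%.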
A better way to do this is to run `sleep` in the background while you fetch-and-print-stats in the foreground, then wait for the sleep to finish and loop back to the start. Each trip round the loop then takes almost exactly a second (provided fetch-and-print-stats takes less than a second). This is a bit like setting an alarm() and then calling pause() in C. (Well, not really: the shell forks a child process to run `sleep` and then `wait`s for it to exit.)
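Here is a minimal sketch of the background-sleep pacing trick on its own. The `work` function is a hypothetical stand-in for fetch-and-print-stats (just a short busy loop), and the timing uses `date +%s`, which GNU, BSD and BusyBox `date` all support but which is not strictly POSIX. Because that clock only has one-second resolution, three ticks should report about 3 seconds, occasionally 4.

```shell
#!/bin/sh
# Pacing a loop with a background sleep. `work` is a hypothetical stand-in
# for fetch-and-print-stats: a short busy loop that takes a little time.
work() {
	i=0
	while [ "$i" -lt 10000 ]
	do
		i=$((i + 1))
	done
}

ticks=3
start=$(date +%s)
n=0
while [ "$n" -lt "$ticks" ]
do
	sleep 1 &	# start the one-second "alarm" in a child process
	work		# meanwhile, do the real work in the foreground
	wait		# block until the background sleep has finished
	n=$((n + 1))
done
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: $elapsed seconds for $ticks ticks"
```

With the naive `sleep 1; work` loop the same three ticks would take three seconds plus three runs of `work`, and the error grows with every tick.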
My dnsqps script also abuses `eval` a lot to get a shonky Bourne shell version of associative arrays for the per-server counters. Yummy.
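For anyone who has not seen the trick, this is a minimal sketch of eval-based associative arrays in POSIX sh. The helper names (`key_to_var`, `aset`, `aget`) and the `aa_` prefix are hypothetical, not taken from dnsqps. Keys are squashed into valid variable names, so distinct keys such as `ns0.example.com` and `ns0-example.com` would collide.

```shell
#!/bin/sh
# Faking an associative array in POSIX sh with eval. The helper names
# (key_to_var, aset, aget) and the aa_ prefix are hypothetical.

# key_to_var KEY: turn an arbitrary key such as a hostname into a valid
# shell variable name, e.g. ns0.example.com -> aa_ns0_example_com
key_to_var() {
	printf '%s\n' "$1" | sed 's/[^A-Za-z0-9_]/_/g; s/^/aa_/'
}

# aset KEY VALUE: store VALUE under KEY
aset() {
	eval "$(key_to_var "$1")=\$2"
}

# aget KEY: print the value stored under KEY, or 0 if nothing is stored
aget() {
	eval "val=\${$(key_to_var "$1"):-0}"
	printf '%s\n' "$val"
}

aset ns0.example.com 100
aset ns1.example.com 250
aget ns0.example.com	# prints 100
aget ns2.example.com	# prints 0
```

Only the sanitized variable name is spliced into the eval'd string; the value is referenced as `$2`, so eval never parses the value itself as shell code.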
So, now that I could get queries-per-second numbers from my servers, what was the effect of dropping the TTLs? Well, as far as I can tell from eyeballing the numbers, nothing. Zilch. No visible change in query rate. I expected at least some clear increase, but no.
The current version of my dnsqps script is:
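```shell
#!/bin/sh
# dnsqps: print each BIND server's queries per second, once a second
while :
do
	sleep 1 &	# set an alarm
	for s in "$@"
	do
		# cumulative count of QUERY opcodes since the server started
		total=$(curl --silent "http://$s:853/json/v1/server" |
			jq -r '.opcodes.QUERY')
		# hostnames such as ns0.example.com are not valid variable
		# names, so squash anything outside [A-Za-z0-9_] to _
		v=tot$(printf '%s\n' "$s" | sed 's/[^A-Za-z0-9_]/_/g')
		eval inc='$((' $total - $v '))'
		eval $v=$total
		printf ' %5d %s' $inc $s
	done
	printf '\n'
	wait	# for the alarm
done
```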
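One wrinkle: on the first trip round the loop the per-server total variable is unset, and most shells treat an unset variable as 0 in arithmetic, so the first line shows each server's cumulative query count rather than a rate. A minimal sketch of a fix is a silent priming pass. Here `fetch_total` and `sample` are hypothetical names, and `fetch_total` is a fake that bumps a counter by 100 per call instead of running `curl | jq`, so the sketch runs without a BIND server; server names are assumed to be valid shell identifiers such as `ns0`.

```shell
#!/bin/sh
# Sketch: dnsqps bookkeeping with a priming pass. fetch_total is a
# hypothetical stand-in for the curl | jq pipeline: it sets $total to a
# fake cumulative counter that grows by 100 per call, so this runs offline.
# Server names are assumed to be valid shell identifiers (e.g. ns0).
fetch_total() {
	eval "fake_$1=\$(( \${fake_$1:-0} + 100 ))"
	eval "total=\$fake_$1"
}

# sample [-q] SERVER...: one pass over the servers, printing deltas;
# with -q it only records the totals (the priming pass)
sample() {
	quiet=false
	if [ "$1" = -q ]; then quiet=true; shift; fi
	for s in "$@"
	do
		fetch_total "$s"
		eval "inc=\$(( total - tot$s ))"
		eval "tot$s=\$total"
		$quiet || printf ' %5d %s' "$inc" "$s"
	done
	$quiet || printf '\n'
}

sample -q ns0 ns1	# priming pass: prints nothing
sample ns0 ns1		# prints "   100 ns0   100 ns1"

# With a real fetch_total, the dnsqps loop might become (untested):
#	sleep 1 & sample -q "$@"
#	while :; do wait; sleep 1 & sample "$@"; done
```

In that loop the alarm is started before the priming pass, and each later sample happens just after the previous alarm expires, so every printed delta still covers about one second.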