I have been fiddling around with FreeBSD's whois
client. Since I
have become responsible for Cambridge's Internet registrations, it's
helpful to have a
whois client which isn't annoying.
whois is an unspeakably crappy protocol. In fact it's barely
even a protocol, more like a set of vague suggestions. Ugh.
... is to work out which server to send your
whois query to. There
are a number of techniques, most of which are necessary and none of
which are sufficient.
(0) Rely on a knowledgeable user to specify the server.
Happily we can do better than just this, but the feature has to be available for special queries.
(1) Have a built-in curated mapping from query patterns to servers.
This is the approach used by Debian's whois client. Sadly, in the era of vast numbers of new gTLDs, this requires software updates a couple of times a week.
(2) Send the query to TLD.whois-servers.net, which maps TLDs to whois servers using CNAMEs in the DNS.
This is a brilliant service, particularly good for the wild and wacky two-letter country-code TLDs. Unfortunately it has also failed to keep up with the new gTLDs, even though it only needs a small amount of extra automation to do so.
(3) Try whois.nic.TLD, which is the standard required for new gTLDs.
In practice a combination of (2) and (3) is extremely effective for domain name whois.
(4) Follow referrals from a server with broad but shallow data to one with narrower and deeper data.
Referrals are necessary for domain queries in "thin" registries, in which the TLD's registry does not contain all the details about registrants (domain owners), but instead refers queries to the registrar (i.e. reseller).
They are also necessary for IP address lookups, for which ARIN's database contains registrations in North America, plus referrals to the other regional Internet registries for IP address registrations in other parts of the world.
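To make the hostname-based techniques concrete, here is a minimal sketch in Python (not FreeBSD's C code). It covers only the user-specified server, the TLD.whois-servers.net lookup and the whois.nic.TLD convention. The function name `candidate_servers` and its `explicit_host` parameter are made up for illustration.

```python
def candidate_servers(query, explicit_host=None):
    """Return whois hostnames to try, in order, for a domain name query."""
    if explicit_host:
        # The user named a server, so trust them and try nothing else.
        return [explicit_host]
    # Take the last label of the name, ignoring case and any trailing dot.
    tld = query.rstrip(".").rsplit(".", 1)[-1].lower()
    return [
        f"{tld}.whois-servers.net",  # CNAME-based mapping in the DNS
        f"whois.nic.{tld}",          # convention required for new gTLDs
    ]
```

A caller would try each hostname in turn and use the first one that resolves and answers. Referrals still have to be followed after that.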
Back in May I added (3) to FreeBSD's
whois to fix its support for
new gTLDs, and I added a bit more (1).
One motivation for the latter was for looking up
ac.uk domains: (4)
doesn't work because Nominet's
whois server doesn't provide
referrals to JANET's
whois server; and (2) is a bit awkward, because
although there is an entry for
ac.uk.whois-servers.net you have to
have some idea of when it makes sense to try DNS queries for 2LDs.
whois-servers.net would be easier to use if it had a wildcard for
The other motivation for extending the curated server list was to
teach it about more NIC handle formats, such as
handles; and the same mechanism is useful for special-case domains.
Last week I added support for AS numbers, moving them from (0) to (1).
After doing that I continued to fiddle around, and soon realised that
it is possible to dispense with (3) and (2) and a large chunk of (1),
by relying more on (4). The IANA
whois server knows about most things
you might look up with
whois - domain names, IP addresses, AS
numbers - and can refer you to the right server.
This allowed me to throw away a lot of query syntax analysis and trial-and-error DNS lookups. Very satisfying.
(I'm not sure if this excellently comprehensive data is a new feature
of the IANA whois server, or if I just failed to notice it before...)
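The referral-first approach can be sketched roughly as follows, in Python rather than the C of the real client. It assumes the IANA server is reachable as `whois.iana.org` and marks its referrals with a `refer:` field. The function names, and the `fetch` parameter (there so the logic can be exercised without a network), are made up for illustration. Only one hop is followed here; a real client keeps going until no further referral turns up.

```python
import socket

IANA_HOST = "whois.iana.org"

def whois_query(host, query, port=43, timeout=10):
    """Send one query over TCP and return the raw response as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def iana_referral(response):
    """Return the host named on the first 'refer:' line, or None."""
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "refer":
            return value.strip() or None
    return None

def lookup(query, fetch=whois_query):
    """Ask IANA first, then ask the server it refers to, if any."""
    response = fetch(IANA_HOST, query)
    referral = iana_referral(response)
    if referral and referral.lower() != IANA_HOST:
        response = fetch(referral, query)
    return response
```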
... is that the output from
whois servers is only vaguely
For example, FreeBSD's
whois now knows about 4 different referral
formats, two of which occur with varying spacing and casing from
different servers. (I've removed support for one amazingly ugly and
happily obsolete referral format.)
My code just looks for a match against any referral format without trying to be knowledgeable about which servers use which syntax.
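Here is a sketch of that "match any format" idea in Python. The patterns are illustrative guesses at the kinds of referral lines servers emit, not the exact list in FreeBSD's whois. Each one tolerates varying case and spacing, and none is tied to a particular server.

```python
import re

# Illustrative patterns only (an assumption, not FreeBSD's actual table).
# [ \t]* is used instead of \s* so a match never spills onto the next line.
REFERRAL_PATTERNS = [
    re.compile(r"^[ \t]*refer:[ \t]*(\S+)", re.I | re.M),
    re.compile(r"^[ \t]*(?:registrar[ \t]+)?whois[ \t]+server:[ \t]*(\S+)", re.I | re.M),
    re.compile(r"^[ \t]*referralserver:[ \t]*whois://([^\s:/]+)", re.I | re.M),
    re.compile(r"^[ \t]*whois:[ \t]*(\S+)", re.I | re.M),
]

def find_referral(response):
    """Return the first referred-to hostname found by any pattern, or None."""
    for pattern in REFERRAL_PATTERNS:
        match = pattern.search(response)
        if match:
            return match.group(1).lower()
    return None
```

Patterns are tried in list order, not in order of appearance in the response, so the more specific formats should come first.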
The output from
whois is basically a set of
key: value pairs, but
often these will belong to multiple separate objects (such as a domain
name or a person or a net block); servers differ about whether blank
lines separate objects or are just for pretty-printing a single
object. I'm not sure if there's anything that can be done about this
without huge amounts of tedious work.
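To show where the ambiguity bites, here is a naive Python parser that takes one side of the argument: it assumes every blank line ends an object. That assumption is exactly what some servers violate, so treat this as a sketch of the problem rather than a solution. The name `parse_objects` is made up.

```python
def parse_objects(response):
    """Split a response into objects, each a list of (key, value) pairs."""
    objects, current = [], []
    for line in response.splitlines():
        if not line.strip():
            # Assumption: a blank line ends the current object. Servers that
            # use blank lines for pretty-printing will be split wrongly.
            if current:
                objects.append(current)
                current = []
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip():
            current.append((key.strip(), value.strip()))
        # Lines without a "key:" prefix are silently ignored.
    if current:
        objects.append(current)
    return objects
```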
And servers often emit a lot of rubric such as terms and conditions or hints and tips, which might or might not have comment markers. FreeBSD's whois has a small amount of rudimentary rubric-trimming code which works in a lot of the most annoying cases.
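A rudimentary trimmer might look like the Python sketch below. It is not FreeBSD's code, and it only handles the easy case where rubric lines carry a comment marker. The choice of `%` and `#` as markers is an assumption; unmarked rubric would need a curated list of phrases to recognise.

```python
COMMENT_MARKERS = ("%", "#")  # assumed markers; servers vary

def trim_rubric(response):
    """Drop comment-marked lines and squeeze the blank lines left behind."""
    kept = []
    for line in response.splitlines():
        if line.lstrip().startswith(COMMENT_MARKERS):
            continue
        # Skip leading blank lines and collapse runs of blank lines.
        if not line.strip() and (not kept or not kept[-1].strip()):
            continue
        kept.append(line)
    # Remove a trailing blank line, if one survived.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept)
```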
... is that the syntax of whois queries is enormously variable. What is worse, some servers require some non-standard complication to get useful output.
If you query Verisign for
microsoft.com the server does fuzzy
matching and returns a list of dozens of spammy name server names. To
get a useful answer you need to ask for
ARIN also returns an unhelpfully terse list if a query matches
multiple objects, e.g. a net block and its first subnet. To make it
return full details for all matches (like RIPE's whois server) you
need to prefix the query with a
.dk the verbosity option is
The best one is DENIC, which requires a different query syntax depending on whether the domain name is a non-ASCII internationalized domain name, or a plain ASCII domain (which might be a punycode-encoded internationalized domain name). Good grief, can't it just give a useful answer without hand-holding?
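One way to contain this kind of per-server quirk is a small table of query decorations, sketched here in Python. Every hostname, prefix and flag below is a placeholder invented for illustration; none of them is the real syntax of Verisign, ARIN, .dk or DENIC.

```python
# Hypothetical table: placeholder hosts and prefixes, not real server syntax.
QUERY_PREFIX = {
    "whois.fuzzy.example": "exact ",    # a server that fuzzy-matches by default
    "whois.terse.example": "verbose ",  # a server that is terse by default
}

def decorate_query(host, query):
    """Return the query string to send to this host, with any quirks applied."""
    if host == "whois.idn.example":
        # A server in the style described for DENIC: the syntax depends on
        # whether the name is plain ASCII. Both flags are made up.
        flag = "--ascii " if query.isascii() else "--unicode "
        return flag + query
    return QUERY_PREFIX.get(host, "") + query
```

Keeping the quirks in data rather than code means a new oddity usually costs one table entry, though a case like the ASCII test still needs its own branch.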
That's quite a lot of bullshit for a small program to cope with, and it's really only scratching the surface. Debian's whois implementation has attacked this mess with a lot more sustained diligence, but I still prefer FreeBSD's because of its better support for new gTLDs.