.@ Tony Finch – blog


Adapted from a twitter thread

The question

Julia Evans asked

is mail.google.com a subdomain of google.com?

I love knowing about different possible definitions for terms because then if I’m having a discussion with someone and they use an ambiguous word, I can ask “oh, do you mean definition A or B?” and then we can pick a definition to use in our conversation

I could not resist writing a somewhat rambling response…

Historical context

Several things were happening at once in the 1980s: the transition of the ARPANET to the Internet, the rise of ethernet and other local area networks, and the proliferation of other circuit-switched and packet-switched networks.

The ARPANET was a fairly small, flat network that grew massively and became interconnected with many other networks. By the time the DNS was designed, the flat hosts.txt namespace was already having trouble keeping up with the growth of the ARPANET.

A fixed naming structure would obviously be unable to cope with the diversification of network technologies and the rapidly growing complexity of network management.

At the same time, the term “domain” started to be used a lot in networking, though I don’t think it was ever very specific. It crops up in phrases like:

The ARPANET brought together, in the same technical community, academic institutions with their pathological balkanisation and the hierarchical military. (The IETF is a “task force” because of this military influence.) So the question was how to set up a manageable naming system that could accommodate such different animals.

Domain names

The DNS tries to tame this zoo in two ways:

  1. a hierarchical namespace

  2. classes of network

The first has been incredibly successful; the second was a hedge against the Internet not winning, which turned out to be unnecessarily pessimistic.

There are only two non-Internet classes in the DNS: one for MIT’s Chaosnet, which is now abused for DNS server metadata, and one for MIT’s Hesiod, which was the directory service for their Athena distributed computing system.

The abstract notion of a domain, however, is not hierarchical: for instance, inter-domain routing is not hierarchical. The hierarchy of domain names is an important simplification to make the DNS manageable.

I make a pedantic and kind of subtle distinction between domain names (or DNS names if I want to be super specific) as the syntactic things in hierarchical nested namespaces, and a domain as the loosey-goosey notion of a management scope / network / namespace. Usually the distinction doesn’t matter much, but that’s how I like to keep it straight in my head.

Syntax of domain names

The hierarchical design of the DNS is often not very apparent when everything you see looks like twitter.com or apple.com, but where I work names are multi-storey, like auth0.dns.cam.ac.uk.

I take the view that every domain name is a subdomain: subdomains and parent domains are purely syntactic properties of domain names, and don’t imply anything about what the name is for — whether it is a hostname or not, whether it is a namespace or not. I work in the guts of DNS software, where parent domains and subdomains are treated this way.

Function of domain names

We can classify domain names according to what they are used for:

Often the longest names - the leaves of the DNS tree structure - are hostnames, but (surprise!) hostnames can have subdomains:

Well, maybe not a surprise, once you understand that the hierarchical structure of the DNS namespace does not (much) constrain how a name is used: for instance, when you make a DNS query, the answer does not tell you whether or not the name has subdomains.

A domain name can also function as a namespace, where it names an abstract domain. For instance, my colleagues might refer to “the cam.ac.uk domain”, and they might mean the University data network, or they might mean the set of devices with names under cam.ac.uk - they often don’t mean the name itself!

Domain boundaries in the DNS

To a large extent, it is a matter of convention whether a domain name represents an abstract domain or not - that is, whether the domain name is used as a namespace. But there are three ways that boundaries between domains become explicit in the DNS:

DNS zones

Scalability, distribution, federation: in the DNS you delegate responsibility for some part of the namespace (a subtree of your domain) by introducing a zone cut.

At a zone cut, the subdomain below the cut has its own separate zone file, with its own SOA and NS records. You get your own zone when you register a domain name.

A zone cut has to occur at a . in a domain name but not every . is a zone cut.

For example dns.cam.ac.uk is in the same zone as cam.ac.uk - there is no zone cut between dns and cam - but maths.cam.ac.uk is in a different zone. The dns subdomain is not delegated but the maths subdomain is.
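To make the zone-cut idea concrete, here is a minimal Python sketch that finds which zone a name belongs to, given a set of zone apexes (the names that have their own SOA and NS records). The zone set is an illustrative assumption based on the example above, not a real view of the DNS: the enclosing zone is the longest suffix of the name, compared label by label, that appears in the set.

```python
def labels(name: str) -> list[str]:
    """Split a domain name into lower-cased labels, ignoring a trailing dot."""
    name = name.rstrip(".").lower()
    return name.split(".") if name else []


def enclosing_zone(name: str, zones: set[str]) -> str:
    """Return the longest zone apex in `zones` that is a suffix of `name`.

    The root zone (written here as the empty string) is assumed to be in
    `zones`, so every name ends up in some zone.
    """
    parts = labels(name)
    # Try the name itself first, then strip one label at a time from the left.
    for i in range(len(parts) + 1):
        candidate = ".".join(parts[i:])
        if candidate in zones:
            return candidate
    return ""


# Hypothetical set of zone apexes, following the example in the text.
ZONES = {"", "uk", "ac.uk", "cam.ac.uk", "maths.cam.ac.uk"}

print(enclosing_zone("dns.cam.ac.uk", ZONES))        # cam.ac.uk
print(enclosing_zone("www.maths.cam.ac.uk", ZONES))  # maths.cam.ac.uk
```

Note that dns.cam.ac.uk lands in the cam.ac.uk zone even though its name contains a dot: the dot is just syntax, and only a delegation creates a zone cut.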

Counterfactual thought experiment! If every . in a domain name were a zone cut, then www.cam.ac.uk (and every other hostname!) would have to be in a separate zone from cam.ac.uk.

Subdomains without zones

In Cambridge our DNS management system allows us to control who has permission to update hostnames under which subdomains, without necessarily using DNS delegations and zone cuts to mark the boundaries. Chris Siebenmann has written about how they use subdomains for namespace classification in his department in Toronto.

The resolver search path

Back to the 1980s and the ARPANET / Internet protocol upgrade!

ARPANET users were used to typing single-label hostnames; none of them wanted to have to rattle out .berkeley.edu (or whatever) after every hostname, on a serial terminal at 1200 baud.

So as a command-line user interface convenience, the DNS has a notion of your local domain. Remember the abstract notion of a domain, which models the scope of a network: its management, its physical and logical extent. You give a DNS stub resolver an idea of your local network by configuring its search path, or in simple cases just a domain name.

For example, I configure the search path on my workstation with dns.cam.ac.uk (where my servers live), csi.cam.ac.uk (devices in the office), uis.cam.ac.uk (lots of things such as my old git server). That lets me type ssh auth0 and log in to one of my authoritative dns servers, or ssh git to get to my git server, and the stub resolver will use its search path to automatically append the (hopefully) right parent domain onto the unqualified hostname that I typed.
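The stub resolver’s use of the search path can be sketched in Python as a function that lists the names it would try, in order. This is an approximation of the rules described in resolv.conf(5), assuming the default ndots:1: a name with a trailing dot is absolute and never searched; a name with at least ndots dots is tried as typed first and then with each search domain; otherwise the search domains are tried first and the bare name last.

```python
def candidate_names(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Return the fully-qualified names a stub resolver would try, in order.

    An approximation of the resolv.conf(5) rules; real resolvers differ in
    details such as error handling.
    """
    if name.endswith("."):
        # Trailing dot: already fully qualified, so no searching.
        return [name]
    searched = [f"{name}.{domain.rstrip('.')}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as typed first, then the search list.
        return [absolute] + searched
    # Too few dots: try the search list first, the bare name last.
    return searched + [absolute]


SEARCH = ["dns.cam.ac.uk", "csi.cam.ac.uk", "uis.cam.ac.uk"]

for fqdn in candidate_names("auth0", SEARCH):
    print(fqdn)
# auth0.dns.cam.ac.uk.
# auth0.csi.cam.ac.uk.
# auth0.uis.cam.ac.uk.
# auth0.
```

A resolver stops at the first candidate that exists, so the order of the search path matters: with dns.cam.ac.uk listed first, ssh auth0 reaches auth0.dns.cam.ac.uk.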

Usually it isn’t this complicated. For example, when I started work at Cambridge almost all the computers I worked with were under csi.cam.ac.uk so I could put domain csi.cam.ac.uk in /etc/resolv.conf and it would do the right thing, including walking up the hierarchy to find the names of services like hermes.cam.ac.uk and jackdaw.cam.ac.uk which are not in the csi subdomain.
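The “walking up the hierarchy” behaviour came from older resolvers that built an implicit search list from the domain setting. The sketch below assumes the rule used by some older BIND-derived resolvers: the local domain, followed by each of its parents that still has at least two labels. Modern resolvers such as glibc’s use the domain name on its own, partly because of the problems described in RFC 1535.

```python
def implicit_search_list(domain: str, min_labels: int = 2) -> list[str]:
    """Build an old-style implicit search list from a `domain` setting.

    Assumed rule: the domain itself, then each parent domain that still
    has at least `min_labels` labels.
    """
    parts = domain.rstrip(".").split(".")
    search = [".".join(parts)]
    # Strip one label at a time from the left, stopping before a parent
    # would have fewer than `min_labels` labels.
    for i in range(1, len(parts) - min_labels + 1):
        search.append(".".join(parts[i:]))
    return search


search = implicit_search_list("csi.cam.ac.uk")
print(search)                           # ['csi.cam.ac.uk', 'cam.ac.uk', 'ac.uk']
print([f"hermes.{d}" for d in search])  # ['hermes.csi.cam.ac.uk', 'hermes.cam.ac.uk', 'hermes.ac.uk']
```

The trailing ac.uk entry hints at why this was later considered risky: a name typed by a user could end up being resolved in a parent domain that the local site does not control.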

It’s worth looking at the stub resolver options (you may recoil in horror) and their unpleasant history.

Non-DNS domain names

I’m going on a bit of a tangent now, because it involves some fun facts, even though it isn’t really about the question of subdomains.

Andrew Sullivan pointed out that a stub resolver doesn’t just use the DNS to resolve names: it can use multicast DNS to resolve names under .local (mDNS is related to the DNS but quite different); or hosts files; or other name service protocols.
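As a rough illustration of a stub resolver consulting more than the DNS, here is a sketch of a lookup chain in the style of a name service switch. The backends are hypothetical stand-ins (plain dictionaries) rather than real hosts-file, mDNS, or DNS clients, and the rule that names under .local go only to mDNS is a simplification of how mDNS-aware resolvers commonly behave.

```python
from typing import Callable, Optional

# A backend maps a name to an address, or None if it has no answer.
Backend = Callable[[str], Optional[str]]


def normalise(name: str) -> str:
    """Lower-case a name and drop any trailing dot."""
    return name.rstrip(".").lower()


def make_backend(table: dict[str, str]) -> Backend:
    """Hypothetical stand-in for a real name service: a fixed lookup table."""
    return lambda name: table.get(normalise(name))


def resolve(name: str, files: Backend, mdns: Backend, dns: Backend) -> Optional[str]:
    """Try the hosts file, then mDNS for names under .local, else the DNS."""
    address = files(name)
    if address is not None:
        return address
    if normalise(name).endswith(".local"):
        # .local names are answered by multicast DNS and not sent to the DNS.
        return mdns(name)
    return dns(name)


# Invented data; apart from localhost, addresses come from the RFC 5737
# documentation ranges.
files = make_backend({"localhost": "127.0.0.1"})
mdns = make_backend({"printer.local": "192.0.2.10"})
dns = make_backend({"www.cam.ac.uk": "198.51.100.20"})

print(resolve("printer.local", files, mdns, dns))   # 192.0.2.10
print(resolve("www.cam.ac.uk.", files, mdns, dns))  # 198.51.100.20
```

The fixed order here (hosts file, then mDNS, then DNS) is just one common arrangement; on glibc systems the real order is configured in /etc/nsswitch.conf.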

Ron Echeverri prompted me to talk about one of these others: the JANET name registration scheme (NRS). The NRS had hierarchical names a bit like the DNS, but it was implemented like the ARPANET hosts.txt - one big centrally managed hosts file.

(The UK’s DNS TLD is .uk for NRS compatibility. Strictly speaking it should be .gb to match our ISO 3166 alpha-2 code.)

UK sysadmins older than I have terrible stories about coaxing systems to interoperate between the DNS and NRS, because whereas our DNS name is cam.ac.uk, our NRS name was uk.ac.cam, the other way round.

This raises the question: how did they choose which order to write domain names in?

In NRS order, lexicographic order matches the hierarchy.
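A quick way to see what that means is to sort some names both ways. This Python sketch converts DNS-order names to NRS order by reversing the labels (a simplification that ignores any other differences between the two schemes) and sorts label by label. Sorted NRS names keep each domain’s names together with the parent first; sorted DNS names do not.

```python
def to_nrs(name: str) -> str:
    """Reverse the labels of a DNS-order name, e.g. cam.ac.uk -> uk.ac.cam."""
    return ".".join(reversed(name.rstrip(".").split(".")))


def label_key(name: str) -> list[str]:
    """Sort key that compares names label by label, left to right."""
    return name.lower().split(".")


names = ["www.cam.ac.uk", "cam.ac.uk", "maths.cam.ac.uk", "ox.ac.uk", "dns.cam.ac.uk"]

dns_sorted = sorted(names, key=label_key)
nrs_sorted = sorted((to_nrs(n) for n in names), key=label_key)

print(dns_sorted)
# ['cam.ac.uk', 'dns.cam.ac.uk', 'maths.cam.ac.uk', 'ox.ac.uk', 'www.cam.ac.uk']
print(nrs_sorted)
# ['uk.ac.cam', 'uk.ac.cam.dns', 'uk.ac.cam.maths', 'uk.ac.cam.www', 'uk.ac.ox']
```

In DNS order, ox.ac.uk lands in the middle of the cam.ac.uk names; in NRS order, everything under uk.ac.cam sorts together.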

Paul Mockapetris explained that he chose DNS order for user interface reasons: the prefix of the name, what you type first, is the most locally relevant and specific, and the suffix can be automatically completed - hence the resolver search path.

Unqualified hostnames

The stub resolver search feature gives us the notion of an unqualified hostname (eg, hermes or auth0) as an abbreviation for a fully-qualified domain name, FQDN (eg, hermes.cam.ac.uk or auth0.dns.cam.ac.uk).

Unix-flavoured systems typically expect to be configured with their unqualified single-label hostname as their own name, with the rest of the name configured as the domain in /etc/resolv.conf.

Aside: Ever since I worked at Demon I have configured servers with their FQDN as their hostname, to avoid weird DNS fuckery, such as long timeouts while the server tries to work out its own FQDN because its DNS setup is broken. DNS is now my job and, yep, it is really important to configure my systems to avoid this kind of problem!

Subdomains in zone files

Unqualified hostnames and the notion of a current domain also turn up in the syntax of DNS zone files. Domain names in zone files are FQDNs if they end with a trailing dot; if a name doesn’t end with a dot then the current domain is added to it. This is the cause of many irritating mistakes!
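The qualification rule can be written as a small Python function. It is a sketch that handles only the rule described here plus the conventional @ shorthand for the current origin; it assumes plain ASCII names and ignores escaping such as a backslash-escaped dot inside a label.

```python
def qualify(name: str, origin: str) -> str:
    """Turn a zone-file name into a fully-qualified domain name.

    `origin` is the current domain and must itself end with a dot.
    """
    if not origin.endswith("."):
        raise ValueError(f"origin must be fully qualified: {origin!r}")
    if name == "@":
        # "@" stands for the current origin itself.
        return origin
    if name.endswith("."):
        # Trailing dot: already fully qualified, so use it as-is.
        return name
    # No trailing dot: the current domain is appended.
    if origin == ".":
        return name + "."
    return f"{name}.{origin}"


print(qualify("www", "cam.ac.uk."))             # www.cam.ac.uk.
print(qualify("www.cam.ac.uk.", "cam.ac.uk."))  # www.cam.ac.uk.
print(qualify("www.cam.ac.uk", "cam.ac.uk."))   # www.cam.ac.uk.cam.ac.uk.
```

The last line is the classic mistake: a full name written without its trailing dot gets the origin appended a second time.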

By default the current domain is the same as the name of the zone, but you can use the $ORIGIN directive to change it so that (for instance) different sections of the zone file have different subdomains as their default domain.
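To show how $ORIGIN changes the default domain part-way through a zone file, here is a toy Python reader that only looks at $ORIGIN lines and owner names at the start of a line. It assumes that a relative $ORIGIN is itself qualified against the current origin, which matches BIND’s behaviour; record data, other directives, continuation lines, and semicolons inside quoted strings are not handled, and the zone fragment is invented for illustration.

```python
def qualify(name: str, origin: str) -> str:
    """Append the current origin unless the name is '@' or ends with a dot."""
    if name == "@":
        return origin
    if name.endswith("."):
        return name
    return name + "." + origin if origin != "." else name + "."


def owner_names(zone_text: str, origin: str) -> list[str]:
    """Return the fully-qualified owner name of each record with an owner."""
    names = []
    for line in zone_text.splitlines():
        fields = line.split(";", 1)[0].split()  # drop comments, split on spaces
        if not fields:
            continue
        if fields[0].startswith("$"):
            if fields[0].upper() == "$ORIGIN":
                # A relative $ORIGIN is qualified against the current one.
                origin = qualify(fields[1], origin)
            continue  # other directives such as $TTL are ignored here
        if line[0].isspace():
            # Owner omitted: it repeats the previous owner (not tracked here).
            continue
        names.append(qualify(fields[0], origin))
    return names


# Invented zone fragment; addresses are from the RFC 5737 documentation range.
ZONE = """\
$TTL 3600
www      IN A  192.0.2.1
$ORIGIN dns
auth0    IN A  192.0.2.53
auth1    IN A  192.0.2.54
$ORIGIN cam.ac.uk.
mail     IN A  192.0.2.25
"""

for name in owner_names(ZONE, "cam.ac.uk."):
    print(name)
# www.cam.ac.uk.
# auth0.dns.cam.ac.uk.
# auth1.dns.cam.ac.uk.
# mail.cam.ac.uk.
```

The records under $ORIGIN dns form a dns subdomain by naming convention alone, with no zone cut involved.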

The $ORIGIN directive is probably the clearest place in the DNS where you can point at something concrete that isn’t a zone cut and say, here is a subdomain. It is a syntactic convenience (and sometimes inconvenience, when you get the dots wrong) that makes it easier to set up a subdomain by naming convention rather than with a separate zone file.

Domains and subdomains

In the majority of cases, less complicated than Cambridge, you end up with a situation where

And subdomains are not something you have to think about.

If you are used to normal setups with flat namespaces and no delegation, then subdomains can be intimidating: they threaten to open a huge can of worms of DNS complexity.

Simpler subdomains

The three key ideas to tame subdomains:

  1. Always work with fully-qualified domain names, to avoid relying on the resolver search path, and to avoid syntax footguns in zone files.

    You can configure a search path to make the interactive command line more comfortable, but source code and configuration files should always use FQDNs.

  2. Subdomains and parent domains are just syntactic features of domain names, where one name is a suffix of another (with a dot between). Dots in domain names don’t have to imply any particular meaning.

  3. If you want a hierarchical naming structure, you do not need any of the horrors of zone cuts or delegations or NS records: you can just put a dot in a name.
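The purely syntactic definition of a subdomain is easy to express in code. This Python sketch compares names label by label, case-insensitively, so mail.google.com is a subdomain of google.com but notgoogle.com is not. Following the common convention in DNS software, every name counts as a subdomain of itself and of the root. It ignores escaped dots inside labels.

```python
def labels(name: str) -> list[str]:
    """Lower-cased labels of a domain name; the root name has no labels."""
    name = name.rstrip(".").lower()
    return name.split(".") if name else []


def is_subdomain(child: str, parent: str) -> bool:
    """True if the labels of `parent` are a suffix of the labels of `child`."""
    c, p = labels(child), labels(parent)
    # Slice from len(c) - len(p) rather than -len(p), which breaks when p is empty.
    return len(c) >= len(p) and c[len(c) - len(p):] == p


print(is_subdomain("mail.google.com", "google.com"))  # True
print(is_subdomain("notgoogle.com", "google.com"))    # False
print(is_subdomain("google.com", "google.com"))       # True
print(is_subdomain("com", "."))                       # True
```

Nothing here looks at zones, delegations, or what the names are used for: the answer depends only on the spelling of the two names.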

Other people often use the term “subdomain” to mean a delegated namespace (with or without a zone cut), so it’s a domain (in the abstract sense) that’s subordinate. It’s a more restricted definition compared to my meaning, where a subdomain is a domain name (in the syntactic sense) that’s subordinate.