Adapted from a twitter thread
- The question
- Historical context
- Domain names
- Syntax of domain names
- Function of domain names
- Domain boundaries in the DNS
- DNS zones
- Subdomains without zones
- The resolver search path
- Non-DNS domain names
- Unqualified hostnames
- Subdomains in zone files
- Domains and subdomains
- Simpler subdomains
The question
Is mail.google.com a subdomain of google.com?
I love knowing about different possible definitions for terms, because then if I’m having a discussion with someone and they use an ambiguous word, I can ask “oh, do you mean definition A or B?” and then we can pick a definition to use in our conversation.
I could not resist writing a somewhat rambling response…
Historical context
Several things were happening concurrently in the 1980s: the transition of the ARPANET to the Internet, the rise of Ethernet and other local area networks, and the proliferation of other circuit-switched and packet-switched networks.
The ARPANET was a fairly small flat network that grew massively and became interconnected with many other networks. At the time the DNS was designed, the flat hosts.txt namespace was already having trouble keeping up with the growth of the ARPANET.
A fixed naming structure would obviously be unable to cope with the diversification of network technologies and the rapidly growing complexity of network management.
At the same time, the term “domain” started to be used a lot in networking, though I don’t think it was ever very specific. It crops up in phrases like:
- management domain
- broadcast domain
- inter-domain routing
The ARPANET had, working together in the same technical community, academic institutions with their pathological balkanisation, and the hierarchical military. (The IETF is a “task force” because of this military influence.) So the question was how to set up a manageable naming system that could accommodate such different animals.
Domain names
The DNS tries to tame this zoo in two ways:
- a hierarchical namespace
- classes of network
The first has been incredibly successful; the second was a hedge against the Internet not winning, which turned out to be unnecessarily pessimistic.
There are only two non-Internet classes in the DNS: one for MIT’s Chaosnet, which is now abused for DNS server metadata; and one for MIT’s Hesiod, which was the directory service for their Athena distributed computing system.
Now the abstract notion of a domain is not hierarchical: for instance, inter-domain routing is not hierarchical. The hierarchy of domain names is an important simplification to make the DNS manageable.
I make a pedantic and kind of subtle distinction between domain names (or DNS names if I want to be super specific) as the syntactic things in hierarchical nested namespaces; and a domain as the loosey-goosey notion of a management scope / network / namespace. Usually the distinction doesn’t matter much, but that’s how I like to keep it straight in my head.
Syntax of domain names
The hierarchical design of the DNS is often not very apparent when everything you see looks like twitter.com or apple.com, but where I work it’s multi-storey, like:
- uk: Britain / Nominet
- ac.uk: UKERNA / JANET
- cam.ac.uk: the University of Cambridge
- dns.cam.ac.uk: DNS services in Cambridge
- auth0.dns.cam.ac.uk: one of the DNS servers
I take the view that every domain name is a subdomain: subdomains and parent domains are purely syntactic properties of domain names, and don’t imply anything about what the name is for — whether it is a hostname or not, whether it is a namespace or not. I work in the guts of DNS software, where parent domains and subdomains are treated this way.
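To make the purely syntactic view concrete, here is a minimal Python sketch. The helper names (labels, is_subdomain) are made up for illustration, it ignores escaped dots inside labels, and it assumes the convention that a name counts as a subdomain of itself and of the root; it is not taken from any particular DNS implementation.

```python
def labels(name: str) -> list[str]:
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    name = name.rstrip(".").lower()
    return name.split(".") if name else []


def is_subdomain(name: str, parent: str) -> bool:
    """True if the labels of `parent` are a suffix of the labels of `name`.

    Assumption: a name is a subdomain of itself, and the root (the empty
    name) is a parent of everything.
    """
    n, p = labels(name), labels(parent)
    return len(p) <= len(n) and n[len(n) - len(p):] == p


print(is_subdomain("mail.google.com", "google.com"))     # True
print(is_subdomain("auth0.dns.cam.ac.uk", "cam.ac.uk"))  # True
print(is_subdomain("notgoogle.com", "google.com"))       # False
```

The comparison works on whole labels rather than on raw strings, which is why notgoogle.com is not a subdomain of google.com even though one string ends with the other. Nothing in the check looks at records, zones, or what the name is used for.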
Function of domain names
We can classify domain names according to what they are used for:
- a host name has A and/or AAAA address records
- a mail domain has MX records
- a service domain (when prefixed by protocol and transport labels) has SRV records
Often the longest names - the leaves of the DNS tree structure - are hostnames, but (surprise!) hostnames can have subdomains:
- bare cam.ac.uk is a hostname
- bare dns.cam.ac.uk is a hostname
Well, maybe not a surprise, when you understand that the hierarchical structure of the DNS namespace does not (much) constrain how a name is used: for instance, when you make a DNS query the answer does not tell you whether or not the name has subdomains.
A domain name can also function as a namespace, where it names an abstract domain. For instance, my colleagues might refer to “the cam.ac.uk domain”, and they might mean the University data network, or they might mean the set of devices with names under cam.ac.uk - they often don’t mean the name itself!
Domain boundaries in the DNS
To a large extent, it is a matter of convention whether a domain name represents an abstract domain or not - that is, whether the domain name is used as a namespace. But there are three ways that boundaries between domains become explicit in the DNS:
- zones
- the stub resolver search path
- $ORIGIN in zone files
DNS zones
Scalability, distribution, federation: in the DNS you delegate responsibility for some part of the namespace (a subtree of your domain) by introducing a zone cut.
In a zone cut the subdomain below the cut has its own separate zone file, its own SOA and NS records. You get your own zone when you register a domain name.
A zone cut has to occur at a dot (.) in a domain name, but not every dot is a zone cut.
For example, dns.cam.ac.uk is in the same zone as cam.ac.uk - there is no zone cut between dns and cam - but maths.cam.ac.uk is in a different zone. The dns subdomain is not delegated but the maths subdomain is.
Counterfactual thought experiment! If every dot in a domain name were a zone cut, then www.cam.ac.uk (and every other hostname!) would have to be in a separate zone from cam.ac.uk.
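The relationship between names and zones can be sketched in a few lines of Python: a name belongs to the zone whose apex is the longest matching suffix of its labels. The ZONES set below is a hypothetical list of zone apexes loosely based on the examples above (with the empty string standing for the root), and enclosing_zone is an illustrative name, not a real API.

```python
# Hypothetical zone apexes; "" stands for the root zone.
ZONES = {"", "uk", "ac.uk", "cam.ac.uk", "maths.cam.ac.uk"}


def enclosing_zone(name: str, zones: set[str] = ZONES) -> str:
    """Return the longest zone apex that is a label-wise suffix of `name`."""
    labels = name.rstrip(".").lower().split(".")
    # Strip labels from the left until what remains is a zone apex.
    for i in range(len(labels) + 1):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    raise LookupError(f"no enclosing zone for {name!r}")


print(enclosing_zone("dns.cam.ac.uk"))        # cam.ac.uk
print(enclosing_zone("www.maths.cam.ac.uk"))  # maths.cam.ac.uk
```

Both names have the same number of dots below cam.ac.uk, but only one of them crosses a zone cut. The dots alone do not tell you where the cuts are; you need the list of delegations.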
Subdomains without zones
In Cambridge our DNS management system allows us to control who has permission to update hostnames under which subdomains, without necessarily using DNS delegations and zone cuts to mark the boundaries. Chris Siebenmann has written about how they use subdomains for namespace classification in his department in Toronto.
The resolver search path
Back to the 1980s and the ARPANET / Internet protocol upgrade!
ARPANET users were used to typing single-label hostnames; none of them wanted to have to rattle out .berkeley.edu (or whatever) after every hostname, on a serial terminal at 1200 baud.
So as a command-line user interface convenience, the DNS has a notion of your local domain. Remember the abstract notion of a domain, which models the scope of a network - its management, its physical and logical extent. You give a DNS stub resolver an idea of your local network by configuring its search path, or in simple cases just a domain name.
For example, I configure the search path on my workstation with dns.cam.ac.uk (where my servers live), csi.cam.ac.uk (devices in the office), and uis.cam.ac.uk (lots of things such as my old git server). That lets me type ssh auth0 and log in to one of my authoritative DNS servers, or ssh git to get to my git server, and the stub resolver will use its search path to automatically append the (hopefully) right parent domain onto the unqualified hostname that I typed.
Usually it isn’t this complicated. For example, when I started work at Cambridge almost all the computers I worked with were under csi.cam.ac.uk, so I could put domain csi.cam.ac.uk in /etc/resolv.conf and it would do the right thing, including walking up the hierarchy to find the names of services like hermes.cam.ac.uk and jackdaw.cam.ac.uk which are not in the csi subdomain.
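Here is a rough Python sketch of how a search path turns a typed name into candidate FQDNs. It is a simplification under stated assumptions: the function name search_candidates is made up, the ndots threshold is modelled on the option of that name in common Unix resolvers, and it does not model the old walk-up-the-hierarchy behaviour of the domain setting. Real stub resolvers differ in the details.

```python
def search_candidates(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """List the fully-qualified names a stub resolver might try, in order."""
    if name.endswith("."):
        return [name]              # trailing dot: already fully qualified
    as_is = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        return [as_is] + expanded  # looks qualified: try it unchanged first
    return expanded + [as_is]      # looks unqualified: search path first


SEARCH = ["dns.cam.ac.uk", "csi.cam.ac.uk", "uis.cam.ac.uk"]

for candidate in search_candidates("auth0", SEARCH):
    print(candidate)
```

With the search path from my workstation example, auth0 is tried as auth0.dns.cam.ac.uk. first, and the resolver stops at the first candidate that exists. That is also why the order of the search path matters, and why the result is only "(hopefully)" the right name.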
It’s worth looking at the stub resolver options (you may recoil in horror) and their unpleasant history.
Non-DNS domain names
I’m going on a bit of a tangent now, because it involves some fun facts, even though it isn’t really about the question of subdomains.
Andrew Sullivan pointed out that a stub resolver doesn’t just use the DNS to resolve names: it can use multicast DNS to resolve names under .local (mDNS is related to the DNS but quite different); or hosts files; or other name service protocols.
Ron Echeverri prompted me to talk about one of these others: the JANET name registration scheme (NRS). The NRS had hierarchical names a bit like the DNS, but it was implemented like the ARPANET hosts.txt - one big centrally managed hosts file.
(The reason the UK DNS TLD is .uk is for NRS compatibility. Strictly speaking it should be .gb to match our ISO 3166 alpha-2 code.)
UK sysadmins older than I have terrible stories about coaxing systems to interoperate between the DNS and NRS, because whereas our DNS name is cam.ac.uk, our NRS name was uk.ac.cam, the other way round.
This raises the question: how did they choose which order to write a domain name in?
The advantage of NRS order is that lexicographic order matches the hierarchy.
Paul Mockapetris explained that he chose DNS order for user interface reasons: the prefix of the name, what you type first, is the most locally relevant and specific, and the suffix can be automatically completed - hence the resolver search path.
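The point about lexicographic order can be illustrated with a small Python sketch. The helper names (nrs_key, nrs_name) and the list of names are chosen for illustration only; sorting is done on tuples of labels rather than on raw strings, so that punctuation inside a label cannot disturb the order.

```python
def nrs_key(dns_name: str) -> tuple[str, ...]:
    """Labels of a DNS-order name, most significant (rightmost) first."""
    return tuple(reversed(dns_name.rstrip(".").lower().split(".")))


def nrs_name(dns_name: str) -> str:
    """Rewrite a DNS-order name in NRS order, e.g. cam.ac.uk -> uk.ac.cam."""
    return ".".join(nrs_key(dns_name))


# Illustrative names only.
NAMES = ["maths.cam.ac.uk", "ox.ac.uk", "ac.uk", "dns.cam.ac.uk", "cam.ac.uk"]

for name in sorted(NAMES, key=nrs_key):
    print(nrs_name(name))
```

Sorted this way, each parent is immediately followed by everything underneath it, which is what you want for a directory listing. DNS order gives that up in exchange for putting the most specific part where you type first.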
Unqualified hostnames
The stub resolver search feature gives us the notion of an unqualified hostname (e.g. hermes or auth0) as an abbreviation for a fully-qualified domain name, FQDN (e.g. hermes.cam.ac.uk or auth0.dns.cam.ac.uk).
Typically Unix-flavoured systems like to be configured to think of their own name as their unqualified single-label hostname, and you configure the rest of the name as the domain in /etc/resolv.conf.
Aside: Ever since I worked at Demon I have configured servers with their FQDN as their hostname, to avoid weird DNS fuckery, such as long timeouts waiting for the server to work out its own FQDN because its DNS setup is broken. DNS is now my job and, yep, it is really important to configure my systems to avoid this kind of problem!
Subdomains in zone files
Unqualified hostnames and the notion of a current domain also turn up in the syntax of DNS zone files. Domain names in zone files are FQDNs if they end with a trailing dot; if a name doesn’t end with a dot then the current domain is added to it. This is the cause of many irritating mistakes!
By default the current domain is the same as the name of the zone, but you can use the $ORIGIN directive to change it so that (for instance) different sections of the zone file have different subdomains as their default domain.
The $ORIGIN directive is probably the clearest place in the DNS where you can point at something concrete that isn’t a zone cut and say, here is a subdomain. It is a syntactic convenience (and sometimes inconvenience, when you get the dots wrong) that makes it easier to set up a subdomain by naming convention rather than by zone file.
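The trailing-dot rule is simple enough to sketch in Python. The function name qualify and the example.org names are hypothetical; the sketch handles only the rule described above plus the standard @ shorthand for the current origin, not the rest of zone file syntax.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a name from a zone file relative to the current origin."""
    if not origin.endswith("."):
        raise ValueError("origin must be fully qualified (end with a dot)")
    if name == "@":
        return origin                # @ means the current origin itself
    if name.endswith("."):
        return name                  # trailing dot: already an FQDN
    if origin == ".":
        return name + "."            # relative to the root
    return f"{name}.{origin}"        # otherwise append the origin


print(qualify("www", "example.org."))              # www.example.org.
print(qualify("www.example.org", "example.org."))  # www.example.org.example.org.
print(qualify("www", "sub.example.org."))          # www.sub.example.org.
```

The second call shows the classic mistake: a name that was meant to be fully qualified but is missing its trailing dot gets the origin appended a second time. The third shows what changing $ORIGIN to a subdomain does to the names that follow it.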
Domains and subdomains
In the majority of cases, less complicated than Cambridge, you end up with a situation where
- you register a domain name, which is your DNS zone
- you configure that as your stub resolver’s (parent, search) domain
- your hostnames are all one level under your registered domain
And subdomains are not something you have to think about.
If you are used to normal setups with flat namespaces and no delegation, then subdomains can be intimidating: they threaten to open a huge can of worms of DNS complexity.
Simpler subdomains
The three key ideas to tame subdomains:
- Always work with fully-qualified domain names, to avoid relying on the resolver search path, and to avoid syntax footguns in zone files. You can configure a search path to make the interactive command line more comfortable, but source code and configuration files should always use FQDNs.
- Subdomains and parent domains are just syntactic features of domain names, where one name is a suffix of another (with a dot between). Dots in domain names don’t have to imply any particular meaning.
- If you want a hierarchical naming structure, you do not need any of the horrors of zone cuts or delegations or NS records: you can just put a dot in a name.
Other people often use the term “subdomain” to mean a delegated namespace (with or without a zone cut), so it’s a domain (in the abstract sense) that’s subordinate. It’s a more restricted definition compared to my meaning, where a subdomain is a domain name (in the syntactic sense) that’s subordinate.