.@ Tony Finch – blog


I have a question about a subtle distinction in meaning between words. I want to know if this distinction is helpful, or if it is just academic obfuscation. But first let me set the scene...

Last week I attempted to explain DNSSEC to a few undergraduates. One of the things I struggled with was justifying its approach to privacy.

(HINT: it doesn't have one!)

Actually, DNSSEC has a bit more reasoning for its lack of privacy than that, so here's a digression:

The bullshit reason why DNSSEC doesn't give you privacy

Typically when you query the DNS you are asking for the address of a server; the next thing you do is connect to that server. The result of the DNS query is necessarily revealed, in cleartext, in the headers of the TCP connection attempt.

So anyone who can see your TCP connection has a good chance of inferring your DNS queries. (Even if your TCP connection went to Cloudflare or some other IP address hosting millions of websites, a spy can still guess based on the size of the web page and other clues.)

Why this reasoning is bullshit

DNSSEC turns the DNS into a general-purpose public key infrastructure, which means that it makes sense to use the DNS for a lot more interesting things than just finding IP addresses.

For example, people might use DNSSEC to publish their PGP or S/MIME public keys. So your DNS query might be a prelude to sending encrypted email rather than just connecting to a server.

In this case the result of the query is not revealed in the TCP connection traffic - you are always talking to your mail server! The DNS query for the PGP key reveals who you are sending mail to - information that would be much more obscure if the DNS exchange were encrypted!

That subtle distinction

What we have here is a distinction between

who you are talking to

and

what you are talking about

In the modern vernacular of political excuses for the police state, who you talk to is "metadata" and this is "valuable" information which "should" be collected. (For example, America uses metadata to decide who to kill.) Nobody wants to know what you are talking about, unless getting access to your data gives them a precedent in favour of more spying powers.

Hold on a sec! Let's drop the politics and get back to nerdery.

Actually it's more subtle than that

Frequently, "who you are talking to" might be obscure at a lower layer, but revealed at a higer layer.

For example, when you query for a correspondent's public key, that query might be encrypted from the point of view of a spy sniffing your network connection, so the spy can't tell whose key you asked for. But if the spy has cut a deal with the people who run the keyserver, they can find out which key you asked for and therefore who you are talking to.

Why DNS privacy is hard

There's a fundamental tradeoff here.

You can have privacy against an attacker who can observe your network traffic, by always talking to a few trusted servers who proxy your actions to the wider Internet. You get extra privacy bonus points if a diverse set of other people use the same trusted servers, and provide covering fire for your traffic.

Or you can have privacy against a government-regulated ISP who provides (and monitors) your default proxy servers, by talking directly to resources on the Internet and bypassing the monitored proxies. But that reveals a lot of information to network-level traffic analsis.

The question I was building up to

Having written all that preamble, I'm even less confident that this is a sensible question. But anyway,

What I want to get at is the distinction between metadata and content. Are there succinct words that capture the difference?

Do you think the following works? Does this make sense?

The weaker word is

"confidential"

If something is confidential, an adversary is likely to know who you are talking to, but they might not know what you talked about.

The stronger word is

"private"

If something is private, an adversary should not even know who you are dealing with.

For example, a salary is often "confidential": an adversary (a rival colleague or a prospective employer) will know who is paying it but not how big it is.

By contrast, an adulterous affair is "private": the adversary (your spouse) shouldn't even know you are spending a lot of time with someone else.

What I am trying to get at with this choice of words is the idea that even if your communication is "confidential" (i.e. encrypted so spies can't see the contents), it probably isn't "private" (i.e. it doesn't hide who you are talking to or suppress scurrilous implications).

SSL gives you confidentiality (it hides your credit card number) whereas TOR gives you privacy (it hides who you are buying drugs from).