[DNSOP] fragmentation itself (Re: FYI: draft-andrews-dnsop-defeat-frag-attack)

Paul Vixie <paul@redbarn.org> Wed, 10 July 2019 18:00 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/9JcLNnJQ8J2mrlbI4RLdnuPJSA0>

i like marka's proposed solution below, a lot. and muks' is also clever, 
though requiring wire protocol changes. however, fujiwara-san's proposal 
describes a broader array of fragmentation problems than just integrity, and 
we should be looking at that broader array when making our plans.

i think there's a broader question about fragmentation itself. at the outset, 
i knew, when creating EDNS0, that fragmentation was considered harmful:

https://www.hpl.hp.com/techreports/Compaq-DEC/WRL-87-3.pdf

noting, jeff mogul and chris kantarjiev (kent), authors of the above tech 
report, were two of my mentors and bosses at d|i|g|i|t|a|l from 1988 to 1993, 
so, i read all of their work, and i discussed my questions and objections with 
them. fragmentation was, in ipv4, harmful. there can be no argument at this 
late date. i sinned egregiously in EDNS0 by opening the door to fragmentation, 
and the results have been predictably painful and expensive for everybody.

however, IPv6 intended to, promised to, and claimed to, fix V4 fragmentation's 
many defects. i must have been thinking optimistic thoughts about the IPv6 
time line, and IPv6's promises and intentions, when i opened the fragmentation 
door in EDNS0. what actually then happened was that ICMPv6 as required for 
PMTUD6 was not secure and could not be implemented, which means any 
fragmentation done in IPv6 (which unlike IPv4, is an endpoint-only activity) 
will be uninformed about path MTU. thus we make pessimistic assumptions like 
1500 and 1220. and if that's the kind of fragmentation we can actually get, 
then it has negative value, and fragmentation in IPv6 is as bad, for different 
reasons, as in IPv4.
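
a minimal sketch of the arithmetic behind such pessimistic size choices, assuming a fixed 40-byte IPv6 header with no extension headers, a 20-byte IPv4 header with no options, and an 8-byte UDP header. the function names are hypothetical and only illustrate the calculation. with no trustworthy path MTU signal, a sender can only plug in a conservative MTU such as the IPv6 minimum link MTU of 1280.

```python
# header sizes are fixed by the protocols; the MTU is the (assumed) input.
IPV6_HEADER = 40  # fixed IPv6 header, no extension headers (assumption)
IPV4_HEADER = 20  # IPv4 header without options (assumption)
UDP_HEADER = 8


def max_udp_payload(mtu: int, ipv6: bool = True) -> int:
    """largest UDP payload that fits in a single unfragmented packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    payload = mtu - ip_header - UDP_HEADER
    if payload <= 0:
        raise ValueError("mtu too small to carry any UDP payload")
    return payload


def needs_fragmentation(dns_message_len: int, mtu: int, ipv6: bool = True) -> bool:
    """true if a DNS message of this size would not fit in one packet."""
    return dns_message_len > max_udp_payload(mtu, ipv6)


if __name__ == "__main__":
    for mtu in (1280, 1500):
        print(mtu, max_udp_payload(mtu))
```

under these assumptions a 1280-byte MTU leaves 1232 bytes of UDP payload, so a 1220-byte response fits without fragmenting while a 4096-byte response does not, even on a 1500-byte path.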

therefore, for the reasons set out by fujiwara-san in his recent draft posted 
here, and especially for the reasons spelled out by his extensive references, 
DNS should not use fragmentation. while some of kazunori's examples have to do 
with message integrity and attacks such as the shulman method, the case 
against fragmentation in DNS's use of UDP is immensely strong. solving for the 
integrity problems doesn't change our conclusion, and adds more complexity.

i have two final notes, which may help inform those who witnessed the sham 
consensus railroaded (soviet-style) through the recent DNS-OARC meeting in 
bangkok, and heard me speak against outlawing fragmentation as a 2020 Flag Day 
goal, and are now hearing me contradict myself.

---

first, we need fragmentation to work, which means we need path MTU discovery 
to work, which means we need ICMP to be secure, at least in IPv6. while use of 
fragmentation for DNS UDP has a high cost, the intentional investment of that 
cost would be a beneficial forcing function on fixing fragmentation itself. 
notably, TCP avoids fragmentation through its MSS signaling, which defaults to 
MIN(myMTU, herMTU) minus some fudge factor for protocol headers. which means 
lack of fragmentation does not hurt the web, and so nobody cares about it. but 
we should, all, care about it. i'll explain further in an upcoming article 
which i'll link here, but briefly, 1500 is the wrong LAN MTU for FastE, and is 
insanely small for 1GE, unthinkably wrong for 10GE, laughable for 40GE, and 
engineering malpractice for 100GE. for a test, do a bunch of NFS and SMB 
tests, over both UDP and TCP for each protocol, using jumbo frames (9K MTU) and 
then again using standard (1500 MTU) sized datagrams. watch for transfer 
speed, CPU utilization, and network utilization (as bits, not as packets).
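
the MSS rule and the MTU comparison above can be sketched as follows. this is an illustration only: it assumes the "fudge factor" is exactly a 40-byte IPv6 header plus a 20-byte TCP header with no options, and the function names are hypothetical. real stacks advertise MSS per direction and subtract option space as well.

```python
import math


def effective_mss(my_mtu: int, her_mtu: int,
                  ip_header: int = 40, tcp_header: int = 20) -> int:
    """MIN(myMTU, herMTU) minus header overhead (assumed fixed here)."""
    mss = min(my_mtu, her_mtu) - ip_header - tcp_header
    if mss <= 0:
        raise ValueError("mtu too small for any TCP payload")
    return mss


def segments_needed(nbytes: int, mss: int) -> int:
    """number of full-or-partial segments needed to carry nbytes."""
    return math.ceil(nbytes / mss)


if __name__ == "__main__":
    transfer = 1_000_000  # bytes, arbitrary example size
    for mtu in (1500, 9000):
        mss = effective_mss(mtu, mtu)
        print(mtu, mss, segments_needed(transfer, mss))
```

with these assumptions, a megabyte takes 695 segments at a 1500-byte MTU and 112 at a 9000-byte MTU, which is the per-packet cost difference the NFS and SMB test is meant to expose. note that one small-MTU endpoint drags the pair down: effective_mss(9000, 1500) is 1440.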

i will at some point teach FreeBSD TCP how to fragment its first TCP segment 
after synchronization, but only for IPv6. my goal is to force IPv4 fallback if 
IPv6 with all of its promised PMTUD and endpoint-only fragmentation does not 
work. let every network operator whose key performance indicators include IPv6 
deployment levels, begin to fear that without its PMTUD promises, IPv6 is not 
good enough to replace IPv4, and they will have to plan on investments in 
dual-stack, _forever_.

---

second, all mass is energy, and state in the network should be thought of as 
having mass. PMTUD has some scale problems regarding endpoint state 
requirements, and so, has to work well enough for fast LRU purges of state 
required for endpoint MTU information, which will lead to rapid rediscovery. 
but, TCP protocol control blocks are also state, and state has mass. a world 
in which every recursive iterating server has long-running TCP/853 (DoT) 
connections open to hundreds or thousands of authority servers is not going to 
be inexpensive, either for the initiators or the responders. the web works this 
way, but can require tens of gigabytes of kernel memory for the TCP state 
alone. that's not a good ratio of mass to value. importantly, fragmentation 
has another state mass cost, which is transmitting the fragments with enough 
inter-packet gap to avoid microbursts which overflow the switch port buffers, 
and receiving fragments which must be reassembled before they can be 
delivered. all of this is wrong.
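
a minimal sketch of the "fast LRU purge" idea for endpoint path MTU state, assuming a fixed-capacity cache keyed by destination address that falls back to a conservative default (1280 here) once an entry has been evicted, which is what forces rediscovery. the class and its method names are hypothetical, not any kernel's actual interface.

```python
from collections import OrderedDict


class PathMtuCache:
    """bounded per-destination MTU state with least-recently-used eviction."""

    def __init__(self, capacity: int, default_mtu: int = 1280):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.default_mtu = default_mtu  # assumed conservative fallback
        self._entries = OrderedDict()   # oldest entry first

    def lookup(self, dest: str) -> int:
        """return the learned MTU, or the default if state was purged."""
        if dest in self._entries:
            self._entries.move_to_end(dest)  # mark as recently used
            return self._entries[dest]
        return self.default_mtu

    def learn(self, dest: str, mtu: int) -> None:
        """record a discovered path MTU, evicting the oldest if full."""
        self._entries[dest] = mtu
        self._entries.move_to_end(dest)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)
```

the state cost is bounded by capacity rather than by the number of peers, at the price of relearning the MTU for any destination that falls out of the cache.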

william simpson, perry metzger, and paul vixie (me) worked together about ten 
years ago to create TCP enhancements which would have permitted an unlimited 
number of quiescent but open TCP connections, at a per-connection state cost 
precisely equal to the cost of resisting a SYN flood attack. so, highly 
compressed state, because state mass is a high cost at the network-wide level. 
we also supported payloads large enough for DNS or WWW queries in the 
synchronization phase, fixed the security problems around RST, expanded the 
option header space, and saved the window size during periods of connection 
quiescence, allowing back-to-back-to-back transmissions once cookies had been 
exchanged. the result was RFC 6013, which was entirely ignored by the people 
who brought us TCPFO, which has the same incompressible state as TCP, adds no 
security, and reduces only the problem of round trip costs. the other document 
besides RFC 6013 that may be of interest is here:

https://www.usenix.org/system/files/login/articles/126-metzger.pdf

metzger, simpson, and vixie (me) are all notoriously difficult to work with, 
and this stems from correctable personality defects and unforced human 
protocol errors for which we should each be periodically upbraided. however, 
ignoring our work because we're somewhat irritating runs the risk of taking 
the internet itself down a blind alley from which a later return won't earn us 
thanks from the grandchildren.

---

in summary, the network needs working fragmentation so that it can have a 
future that isn't constrained by the physics of thickwire ten megabit 
ethernets, and if the DNS community were willing to join the fight, it would 
be a shorter fight. however, DNS, and UDP itself, is better off without 
fragmentation, because of state mass and complexity costs, regardless of 
whether we can solve fragmentation's integrity and substitution weaknesses.

-- 
Paul