

The Internet architecture is beautifully simple and fantastically successful. But it is wrong.

The Internet architecture was born of an argument between circuit switching and packet switching. It correctly argues that packet switching is more fundamental and therefore more powerful: it is easier to implement data streams efficiently on top of a network optimized for packet switching than it is to implement datagrams efficiently on top of a network optimized for circuit switching. A crucial part of the Internet’s design is the end-to-end argument, which says that reliable communication can be implemented completely and correctly only at the endpoints of a communication; any reliability features of the network itself are optimizations, not fundamentals. Hence the idea of intelligent endpoints that actively participate in maintaining a connection, rather than relying on the network to do all the work. However, the Internet takes datagram fundamentalism too far, and at the same time it fails to take full advantage of intelligent endpoints.

In particular, Internet hosts - endpoints - are completely ignorant of the details of routing, hence architecture diagrams that depict the network as a cloud. Packets are shoved in and magically pop out wherever their addressing says they should. One consequence of this is that the network must know how to reach every host. The Internet makes this feasible by assuming that hosts with similar addresses have similar routing: although there are hundreds of millions of hosts on the Internet, core routers only have to deal with a couple of hundred thousand routes. However the fact remains that any site which has multiple connections to the Internet must run at least one router which keeps track of practically all the routes on the Internet. This severely limits the complexity of the Internet. (And it isn’t fixed by IPv6.)

This is not such a problem for an Internet of static devices or of intermittently connected devices, but if you want to support properly mobile devices which maintain communications seamlessly whilst changing their connectivity, you have a problem. The job of core routers now scales according to the number of devices, not the number of organizations, and our technique (“CIDR”) for aggregating routes based on topology no longer works: the topology changes too fast and is too fine-grained. So mobility on the Internet uses a new routing layer above the basic Internet infrastructure, to work around the scalability problem.

Even in the absence of mobility, core Internet routers have an extremely difficult job. Not only do they have to forward packets at tens of gigabits per second, but they must also maintain a dynamic routing table which affects every packet forwarding action, and they must communicate with other routers to keep this table up-to-date. Routers in circuit-switched networks are much simpler, and therefore cheaper and easier to manage. RFC 3439 has a good discussion of the complexity and cost trade-offs. It isn’t an Internet hagiography.

An important corollary of the end-to-end argument is that security must be implemented end-to-end - after all, security is to a large extent a reliability problem. But as a consequence, whereas the Internet relies too much on the network for routing, it relies too much on the host for security. (This is partly, but not entirely, a consequence of the core protocols being concerned with working at all rather than working securely, and of all the users being trusted during the Internet’s first two decades.) So IP provides us with no help with managing access to the network or auditing network usage. It has no place for a trusted third party or mediated connectivity.

That does not mean that these are impossible to implement on the Internet - but it does mean they break things. Firewalls and NATs simplify routing and management, but they have to implement work-arounds for higher-level protocols which assume end-to-end connectivity. And “higher-level” can be as low-level as TCP: firewalls often break path MTU discovery by blocking crucial ICMP messages, and NATs often break TCP connections that stay idle too long.

Which (sort of) brings us to the upper levels of the protocol stack, where end-to-end security is implemented. This is where I get to my point about the need for a session layer. The particular features I am concerned with are security and multiplexing. You can compose them either way around, and the Internet uses both orders.

In HTTP, multiplexing relies on raw TCP: you use multiple concurrent TCP connections to get multiple concurrent HTTP requests. Each connection is secured using TLS, and above that, application-level functionality is used to authenticate the user. Similar models are used for DNS(SEC) and SIP.
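To make that model concrete, here is a minimal sketch using Python’s standard library: each request gets its own TLS-secured TCP connection, and concurrency comes from running several connections at once. The host and paths are placeholders of my own, not anything from a real deployment.

    # Sketch of the HTTP model: one TLS-secured TCP connection per
    # concurrent request. Host and paths are illustrative placeholders.
    import concurrent.futures
    import http.client

    HOST = "example.org"
    PATHS = ["/", "/style.css", "/logo.png"]

    def fetch(path):
        # Each request opens, secures, and tears down its own connection.
        conn = http.client.HTTPSConnection(HOST, timeout=10)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            return path, resp.status, len(resp.read())
        finally:
            conn.close()

    with concurrent.futures.ThreadPoolExecutor(max_workers=len(PATHS)) as pool:
        for path, status, size in pool.map(fetch, PATHS):
            print(path, status, size)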

In SSH, the TCP connection is secured and authenticated first, and this foundation is used as the basis for application-level multiplexing of streams over the connection. Similar models are used for Jabber and BEEP.
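For contrast, a sketch of the SSH model using the third-party paramiko library (my choice of library, purely for illustration): one TCP connection is secured and authenticated first, and each command then runs in its own channel multiplexed over that single connection. The host and username are placeholders.

    # Sketch of the SSH model with paramiko (a third-party library):
    # authenticate one connection, then multiplex channels over it.
    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("example.org", username="alice")   # placeholder host/user

    transport = client.get_transport()         # the single secured connection
    for cmd in ("uptime", "hostname"):
        chan = transport.open_session()         # a new channel, same connection
        chan.exec_command(cmd)
        print(cmd, "->", chan.makefile().read().decode().strip())
        chan.close()

    client.close()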

The problem with HTTP is that re-securing and re-authenticating each connection is costly, so complexity is added to mitigate these costs: TLS session caches shorten connection start-up, HTTP/1.1 allows multiple requests per connection, and techniques like cookies and session keys in URLs avoid the need to re-authenticate each request.
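A persistent HTTP/1.1 connection amortizes those costs; a minimal sketch, again with placeholder host and paths:

    # Sketch of HTTP/1.1 persistent connections: several requests reuse a
    # single TLS-secured TCP connection instead of re-securing each time.
    import http.client

    conn = http.client.HTTPSConnection("example.org", timeout=10)
    for path in ("/", "/style.css", "/logo.png"):
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()            # drain the body before the next request
        print(path, resp.status, len(body))
    conn.close()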

The problem with SSH and BEEP is that multiplexing streams requires a windowing mechanism so that one busy stream doesn’t monopolize the link and starve quieter streams. However, TCP already has a windowing mechanism, and when connectivity is poor the two layers interfere with each other. TCP-over-TCP is a well-known bad idea, and similar arguments apply to other upper-layer windowing schemes.

What is missing is a proper session layer, which is used for authentication and to establish a security context, but which is agnostic about multiplexing - datagrams, streams, reliable or not, concurrent or not. Every Internet application protocol has had to re-invent a session layer: mapped to a TCP connection, as in SSH, or mapped to an authentication token, as in HTTP. This goes right back to the early days: in FTP, the session corresponds to the control connection, and multiplexing is handled by the data connections.
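Purely to make the idea concrete, here is a hypothetical sketch of what a multiplexing-agnostic session API might look like; none of these names correspond to any real protocol or library.

    # Hypothetical session-layer API: authenticate once, establish a
    # security context, then multiplex however the application likes.
    # Entirely illustrative; no real protocol stack offers this interface.
    from dataclasses import dataclass, field

    @dataclass
    class Session:
        peer: str                   # endpoint identifier, e.g. a domain name
        credentials: object         # whatever the session handshake produced
        channels: list = field(default_factory=list)

        def open_stream(self, reliable=True):
            # A sub-stream inside the session, reliable or not.
            chan = {"kind": "stream", "reliable": reliable}
            self.channels.append(chan)
            return chan

        def send_datagram(self, payload: bytes):
            # A one-off datagram carried within the same security context.
            self.channels.append({"kind": "datagram", "size": len(payload)})

    # One authentication, then whatever multiplexing the application needs.
    s = Session(peer="example.org", credentials="token-from-handshake")
    control = s.open_stream()
    bulk = s.open_stream(reliable=False)
    s.send_datagram(b"ping")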

As well as managing security and multiplexing, a session layer can manage performance too. At the moment, we rely on TCP’s informal congestion control features: the Internet works because practically everyone implements them. However in the mid-1990s, people were seriously worried that this wouldn’t be sufficient. The rise of HTTP meant that bulk data transfer was happening in shorter connections which didn’t give TCP enough time to measure the available bandwidth, so it would tend to over-shoot. HTTP/1.1 and the dot-com overspend helped, but the problem is still there and is once more rearing its head in the form of multimedia streaming protocols. A session can share its measurement of network properties across all its constituent traffic.

My assumption is that sessions will be relatively heavyweight to set up and relatively long-lived: more like SSH than HTTP. The shortest session one typically sees is downloading a page (plus its in-line images) from a web site, which is long enough to justify the setup costs - after all, it’s enough to justify HTTP/1.1 pipelining, which isn’t as good as the multiplexing I have in mind. But what about really short transactions? I do not believe they occur in isolation, so it’s reasonable to require them to be performed within a session.

But what about the DNS? In fact I see it as a vestigial bit of session layer. The endpoint identifiers we use in practice are domain names, but to talk to one we must first establish connectivity to it, which requires a DNS lookup. Admittedly this is a bit of a stretch, since the DNS lookup doesn’t involve an end-to-end handshake, but it can involve a fair amount of latency and infrastructure. The first, uncached lookup is especially heavyweight.
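For what it’s worth, the lookup-then-connect sequence looks like this in Python’s standard library (placeholder name; the timing merely shows where the latency lives):

    # The DNS lookup that precedes any connection to a named endpoint.
    import socket
    import time

    start = time.monotonic()
    addrs = socket.getaddrinfo("example.org", 443, proto=socket.IPPROTO_TCP)
    print("lookup took %.1f ms" % ((time.monotonic() - start) * 1000))

    # Only after the lookup can end-to-end connectivity be established.
    family, socktype, proto, _, sockaddr = addrs[0]
    with socket.create_connection(sockaddr[:2], timeout=10) as sock:
        print("connected to", sock.getpeername())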

And establishing connectivity brings me back to routing. Why not use a more distributed, on-demand model for routing? More like email than Usenet? More like the DNS than hosts.txt? Then your router could scale according to your own levels of traffic and complexity of connectivity, instead of according to the Internet as a whole.

When you set up a session with a remote host, you would establish not only a security context, but also a routing context. You can take on some responsibility for routing to make the network’s job easier. Perhaps it would be simpler if addresses were no longer end-to-end, but instead were more like paths. Routers could simply forward packets by examining a pre-established path rather than based on a dynamic routing lookup - and this would be secure because of the session’s security context. Separate infrastructure for session set-up would deal with the changing connectivity of the network, instead of routers. Because you participate in routing, you can co-operate actively as well: if you are mobile you can reconfigure your route as you move, without breaking the session.
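A toy illustration of the forwarding half of this idea (entirely hypothetical; the hop names are made up): session set-up leaves a pre-established path in each packet, and a router just follows it instead of consulting a dynamic routing table.

    # Toy path-based forwarding: follow the pre-established path carried in
    # the packet instead of doing a dynamic routing lookup at every hop.
    # Entirely hypothetical; hop names are invented for illustration.

    def forward(packet):
        while packet["path"]:
            next_hop = packet["path"].pop(0)   # no routing table consulted
            print("forwarding via", next_hop)
        print("delivered:", packet["payload"])

    # A path that session set-up would have established in advance.
    packet = {
        "path": ["site-router", "isp-edge", "isp-core", "peer-edge", "dest-host"],
        "payload": b"hello",
    }
    forward(packet)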

I quite like this idea, but I really don’t know how it could be implemented. See how long people have been working on better infrastructure at similar levels: IPv6, DNSSEC. Maybe it could be built on top of IPv4 instead of replacing it. SCTP, a replacement for TCP, has many of the multiplexing and multihoming features, but it doesn’t address routing. And speaking of that, I’m not sure how to manage the millions of sessions flowing through a backbone router without requiring it to know about them all. Sessions would have to be handled in aggregate (according to local topology) but you still have to allocate bandwidth between them fairly…

Anyway, a fun idea to discuss in the pub.