Session layers, again

There’s been a moderately interesting thread on the main IETF list recently; the part starting with a message from Karl Auerbach reminded me of a post I wrote last year. Particularly interesting is the link posted by Lars Eggert to a paper by Bryan Ford (of PEG parser fame). I need a less long-winded summary of my session layer idea, so this is it.

I’m starting from the observation that lots of application protocols have some idea of a session, but the Internet architecture provides no support for sessions so each application that needs one has re-invented the idea differently. The earliest example is the FTP control connection which manages multiple data connections. HTTP requests are mostly independent of each other, but it has had various feeble attempts at sessions built on top of it. TLS has a session cache to slightly reduce reconnect latency. BEEP and ssh use a single TCP connection per session and multiplex concurrent activities down the same pipe. Etc. There’s obviously a need to fulfill.

A session is essentially a user’s login to an application, so the two should have the same lifetime. Login implies that a session’s core service is a cryptographic association between the client and server, which has to provide the usual integrity, confidentiality, and authenticity features. (As such it should be able to replace IPsec, TLS, SASL, etc.)

On top of this is built a layer for multiplexing streams (and perhaps datagrams). Since it already has a security association, it can avoid many performance problems caused by multiplexing using concurrent TCP connections:

it can omit the TCP three-way handshake and the TCP+TLS 6-way handshake, to get T/TCP startup performance without its security problems;
it can avoid slow-start restart delays and related problems by doing congestion control at the session layer instead of per-stream (see also RFC 2140);
it can avoid the problems that ssh and BEEP have with multiple layers of windowing and head-of-line blocking on packet loss, by making each stream run fully independently on top of the datagram layer;
if it supports both reliable streams and best-effort datagrams in the same session then it’ll support media apps (SIP, Jingle) well even when there is asymmetrical connectivity (NATs or firewalls - assuming they allow a session to be established in the first place);
the crypto provides the endpoints with a secure identity for the session that’s independent of lower-level addressing or routing, so they can re-establish the session if either end moves and can re-locate the other, without interrupting the higher-level data streams - mobility for free.
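
To make the shape of this concrete, here is a rough sketch of the receive side of such a session. It is only an illustration under assumptions of my own: the names Session, Stream and on_datagram are hypothetical, the cryptographic check is left out entirely, and the congestion window is just a placeholder field to show where the shared state would live. The point is that each stream reassembles independently, so a loss on one stream does not hold up another, and that the session is keyed by its own identity rather than by the peer's address.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Reassembly state for one stream; sequence numbers are per-stream."""
    next_seq: int = 0                              # next sequence number to deliver
    pending: dict = field(default_factory=dict)    # out-of-order payloads by seq
    delivered: list = field(default_factory=list)  # payloads handed to the app

    def receive(self, seq: int, payload: bytes) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate, ignore it
        self.pending[seq] = payload
        # Deliver whatever contiguous run is now available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


@dataclass
class Session:
    """One session: shared congestion state, independent streams."""
    session_id: bytes   # stands in for the session's cryptographic identity
    peer_addr: tuple    # current lower-level address of the peer; may change
    cwnd: int = 10      # placeholder: one congestion window shared by all streams
    streams: dict = field(default_factory=dict)

    def on_datagram(self, src_addr: tuple, stream_id: int,
                    seq: int, payload: bytes) -> None:
        # ASSUMPTION: the datagram has already been authenticated against
        # the session's keys. That check is omitted from this sketch.
        self.peer_addr = src_addr  # follow the peer if it has moved
        self.streams.setdefault(stream_id, Stream()).receive(seq, payload)
```

If the first datagram of stream 1 is lost, stream 2 still delivers straight away; when the missing datagram eventually turns up, even from a new address, stream 1 catches up and the session simply notes the peer's new location.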

This is all very similar to Bryan Ford’s SST design, so I’m really pleased that the lazyweb has turned my dreaming into something real! However, SST’s channels are lower-level than my sessions: all apps communicating between the same pair of hosts use one channel. I’m not sure how this affects higher-level authentication (SASL and perhaps X.509). You would at least need some way to cryptographically bind app-level auth to the SST channel, but can other apps sharing the channel spoof this binding, and does it matter?
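
For what it's worth, the simplest form of such a binding might look something like the sketch below. This is not SST's mechanism, just a generic channel-binding idea under assumptions of my own: that the channel exposes some unique identifier (channel_id here, hypothetical) and that the app-level authentication has produced a secret shared by the two app endpoints (app_secret, also hypothetical). The app computes a MAC over the channel identifier and its own name, and the peer checks it.

```python
import hashlib
import hmac


def binding_tag(app_secret: bytes, channel_id: bytes, app_name: bytes) -> bytes:
    """MAC tying an app-level secret to one channel (hypothetical scheme)."""
    # Length-prefix each field so distinct (channel_id, app_name) pairs
    # can never produce the same MAC input.
    msg = b"".join(len(part).to_bytes(2, "big") + part
                   for part in (channel_id, app_name))
    return hmac.new(app_secret, msg, hashlib.sha256).digest()


def verify_binding(app_secret: bytes, channel_id: bytes,
                   app_name: bytes, tag: bytes) -> bool:
    """Constant-time check that the tag matches this channel and app."""
    expected = binding_tag(app_secret, channel_id, app_name)
    return hmac.compare_digest(expected, tag)
```

A tag made for one channel fails verification on any other, and an app that doesn't hold app_secret can't forge one. That only answers part of the question, though: it says nothing about what another app on the same channel could do with traffic it can already inject below the binding.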

Last year’s article also had some wild speculation about fixing the problems of NAT and global routing scalability, but that part is not relevant to the current discussion and too vague to be summarized more concisely, so I won’t try (even though I still quite like the idea).