TCP Fast Open: expediting web services

This article brought to you by LWN subscribers

Subscribers to LWN.net made this article — and everything that surrounds it — possible. If you appreciate our content, please buy a subscription and make the next set of articles possible.

By Michael Kerrisk
August 1, 2012

Much of today's Internet traffic takes the form of short TCP data flows that consist of just a few round trips exchanging data segments before the connection is terminated. The prototypical example of this kind of short TCP conversation is the transfer of web pages over the Hypertext Transfer Protocol (HTTP).

The speed of TCP data flows is dependent on two factors: transmission delay (the width of the data pipe) and propagation delay (the time that the data takes to travel from one end of the pipe to the other). Transmission delay is dependent on network bandwidth, which has increased steadily and substantially over the life of the Internet. On the other hand, propagation delay is a function of router latencies, which have not improved to the same extent as network bandwidth, and the speed of light, which has remained stubbornly constant. (At intercontinental distances, this physical limitation means that—leaving aside router latencies—transmission through the medium alone requires several milliseconds.) The relative change in the weighting of these two factors means that over time the propagation delay has become a steadily larger component in the overall latency of web services. (This is especially so for many web pages, where a browser often opens several connections to fetch multiple small objects that compose the page.)

Reducing the number of round trips required in a TCP conversation has thus become a subject of keen interest for companies that provide web services. It is therefore unsurprising that Google should be the originator of a series of patches to the Linux networking stack to implement the TCP Fast Open (TFO) feature, which allows the elimination of one round time trip (RTT) from certain kinds of TCP conversations. According to the implementers (in "TCP Fast Open", CoNEXT 2011 [PDF]), TFO could result in speed improvements of between 4% and 41% in the page load times on popular web sites.

We first wrote about TFO back in September 2011, when the idea was still in the development stage. Now that the TFO implementation is starting to make its way into the kernel, it's time to visit it in more detail.

The TCP three-way handshake

To understand the optimization performed by TFO, we first need to note that each TCP conversation begins with a round trip in the form of the so-called three-way handshake. The three-way handshake is initiated when a client makes a connection request to a server. At the application level, this corresponds to a client performing a connect() system call to establish a connection with a server that has previously bound a socket to a well-known address and then called accept() to receive incoming connections. Figure 1 shows the details of the three-way handshake in diagrammatic form.

Figure 1: TCP three-way handshake between a client and a server

During the three-way handshake, the two TCP end-points exchange SYN (synchronize) segments containing options that govern the subsequent TCP conversation—for example, the maximum segment size (MSS), which specifies the maximum number of data bytes that a TCP end-point can receive in a TCP segment. The SYN segments also contain the initial sequence numbers (ISNs) that each end-point selects for the conversation (labeled M and N in Figure 1).

The three-way handshake serves another purpose with respect to connection establishment: in the (unlikely) event that the initial SYN is duplicated (this may occur, for example, because underlying network protocols duplicate network packets), then the three-way handshake allows the duplication to be detected, so that only a single connection is created. If a connection was established before completion of the three-way handshake, then a duplicate SYN could cause a second connection to be created.

The problem with current TCP implementations is that data can only be exchanged on the connection after the initiator of the connection has received an ACK (acknowledge) segment from the peer TCP. In other words, data can be sent from the client to the server only in the third step of the three-way handshake (the ACK segment sent by the initiator). Thus, one full round trip time is lost before data is even exchanged between the peers. This lost RTT is a significant component of the latency of short web conversations.

Applications such as web browsers try to mitigate this problem using HTTP persistent connections, whereby the browser holds a connection open to the web server and reuses that connection for later HTTP requests. However, the effectiveness of this technique is decreased because idle connections may be closed before they are reused. For example, in order to limit resource usage, busy web servers often aggressively close idle HTTP connections. The result is that a high proportion of HTTP requests are cold, requiring a new TCP connection to be established to the web server.

Eliminating a round trip

Theoretically, the initial SYN segment could contain data sent by the initiator of the connection: RFC 793, the specification for TCP, does permit data to be included in a SYN segment. However, TCP is prohibited from delivering that data to the application until the three-way handshake completes. This is a necessary security measure to prevent various kinds of malicious attacks. For example, if a malicious client sent a SYN segment containing data and a spoofed source address, and the server TCP passed that segment to the server application before completion of the three-way handshake, then the segment would both cause resources to be consumed on the server and cause (possibly multiple) responses to be sent to the victim host whose address was spoofed.

The aim of TFO is to eliminate one round trip time from a TCP conversation by allowing data to be included as part of the SYN segment that initiates the connection. TFO is designed to do this in such a way that the security concerns described above are addressed. (T/TCP, a mechanism designed in the early 1990s, also tried to provide a way of short circuiting the three-way handshake, but fundamental security flaws in its design meant that it never gained wide use.)

On the other hand, the TFO mechanism does not detect duplicate SYN segments. (This was a deliberate choice made to simplify design of the protocol.) Consequently, servers employing TFO must be idempotent—they must tolerate the possibility of receiving duplicate initial SYN segments containing the same data and produce the same result regardless of whether one or multiple such SYN segments arrive. Many web services are idempotent, for example, web servers that serve static web pages in response to URL requests from browsers, or web services that manipulate internal state but have internal application logic to detect (and ignore) duplicate requests from the same client.

In order to prevent the aforementioned malicious attacks, TFO employs security cookies (TFO cookies). The TFO cookie is generated once by the server TCP and returned to the client TCP for later reuse. The cookie is constructed by encrypting the client IP address in a fashion that is reproducible (by the server TCP) but is difficult for an attacker to guess. Request, generation, and exchange of the TFO cookie happens entirely transparently to the application layer.

At the protocol layer, the client requests a TFO cookie by sending a SYN segment to the server that includes a special TCP option asking for a TFO cookie. The SYN segment is otherwise "normal"; that is, there is no data in the segment and establishment of the connection still requires the normal three-way handshake. In response, the server generates a TFO cookie that is returned in the SYN-ACK segment that the server sends to the client. The client caches the TFO cookie for later use. The steps in the generation and caching of the TFO cookie are shown in Figure 2.

Figure 2: Generating the TFO cookie

At this point, the client TCP now has a token that it can use to prove to the server TCP that an earlier three-way handshake to the client's IP address completed successfully.

For subsequent conversations with the server, the client can short circuit the three-way handshake as shown in Figure 3.

Figure 3: Employing the TFO cookie

The steps shown in Figure 3 are as follows:

The client TCP sends a SYN that contains both the TFO cookie (specified as a TCP option) and data from the client application.
The server TCP validates the TFO cookie by duplicating the encryption process based on the source IP address of the new SYN. If the cookie proves to be valid, then the server TCP can be confident that this SYN comes from the address it claims to come from. This means that the server TCP can immediately pass the application data to the server application.
From here on, the TCP conversation proceeds as normal: the server TCP sends a SYN-ACK segment to the client, which the client TCP then acknowledges, thus completing the three-way handshake. The server TCP can also send response data segments to the client TCP before it receives the client's ACK.

In the above steps, if the TFO cookie proves not to be valid, then the server TCP discards the data and sends a segment to the client TCP that acknowledges just the SYN. At this point, the TCP conversation falls back to the normal three-way handshake. If the client TCP is authentic (not malicious), then it will (transparently to the application) retransmit the data that it sent in the SYN segment.

Comparing Figure 1 and Figure 3, we can see that a complete RTT has been saved in the conversation between the client and server. (This assumes that the client's initial request is small enough to fit inside a single TCP segment. This is true for most requests, but not all. Whether it might be technically possible to handle larger requests—for example, by transmitting multiple segments from the client before receiving the server's ACK—remains an open question.)

There are various details of TFO cookie generation that we don't cover here. For example, the algorithm for generating a suitably secure TFO cookie is implementation-dependent, and should (and can) be designed to be computable with low processor effort, so as not to slow the processing of connection requests. Furthermore, the server should periodically change the encryption key used to generate the TFO cookies, so as to prevent attackers harvesting many cookies over time to use in a coordinated attack against the server.

There is one detail of the use of TFO cookies that we will revisit below. Because the TFO mechanism allows a client that submits a valid TFO cookie to trigger resource usage on the server before completion of the three-way handshake, the server can be the target of resource-exhaustion attacks. To prevent this possibility, the server imposes a limit on the number of pending TFO connections that have not yet completed the three-way handshake. When this limit is exceeded, the server ignores TFO cookies and falls back to the normal three-way handshake for subsequent client requests until the number of pending TFO connections falls below the limit; this allows the server to employ traditional measures against SYN-flood attacks.

The user-space API

As noted above, the generation and use of TFO cookies is transparent to the application level: the TFO cookie is automatically generated during the first TCP conversation between the client and server, and then automatically reused in subsequent conversations. Nevertheless, applications that wish to use TFO must notify the system using suitable API calls. Furthermore, certain system configuration knobs need to be turned in order to enable TFO.

The changes required to a server in order to support TFO are minimal, and are highlighted in the code template below.

    sfd = socket(AF_INET, SOCK_STREAM, 0);   // Create socket

    bind(sfd, ...);                          // Bind to well known address
    
    int qlen = 5;                            // Value to be chosen by application
    setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
    
    listen(sfd, ...);                        // Mark socket to receive connections

    cfd = accept(sfd, NULL, 0);              // Accept connection on new socket

    // read and write data on connected socket cfd

    close(cfd);

Setting the TCP_FASTOPEN socket option requests the kernel to use TFO for the server's socket. By implication, this is also a statement that the server can handle duplicated SYN segments in an idempotent fashion. The option value, qlen, specifies this server's limit on the size of the queue of TFO requests that have not yet completed the three-way handshake (see the remarks on prevention of resource-exhaustion attacks above).

The changes required to a client in order to support TFO are also minor, but a little more substantial than for a TFO server. A normal TCP client uses separate system calls to initiate a connection and transmit data: connect() to initiate the connection to a specified server address and (typically) write() or send() to transmit data. Since a TFO client combines connection initiation and data transmission in a single step, it needs to employ an API that allows both the server address and the data to be specified in a single operation. For this purpose, the client can use either of two repurposed system calls: sendto() and sendmsg().

The sendto() and sendmsg() system calls are normally used with datagram (e.g., UDP) sockets: since datagram sockets are connectionless, each outgoing datagram must include both the transmitted data and the destination address. Since this is the same information that is required to initiate a TFO connection, these system calls are recycled for the purpose, with the requirement that the new MSG_FASTOPEN flag must be specified in the flags argument of the system call. A TFO client thus has the following general form:

    sfd = socket(AF_INET, SOCK_STREAM, 0);
    
    sendto(sfd, data, data_len, MSG_FASTOPEN, 
                (struct sockaddr *) &server_addr, addr_len);
        // Replaces connect() + send()/write()
    
    // read and write further data on connected socket sfd

    close(sfd);

If this is the first TCP conversation between the client and server, then the above code will result in the scenario shown in Figure 2, with the result that a TFO cookie is returned to the client TCP, which then caches the cookie. If the client TCP has already obtained a TFO cookie from a previous TCP conversation, then the scenario is as shown in Figure 3, with client data being passed in the initial SYN segment and a round trip being saved.

In addition to the above APIs, there are various knobs—in the form of files in the /proc/sys/net/ipv4 directory—that control TFO on a system-wide basis:

The tcp_fastopen file can be used to view or set a value that enables the operation of different parts of the TFO functionality. Setting bit 0 (i.e., the value 1) in this value enables client TFO functionality, so that applications can request TFO cookies. Setting bit 1 (i.e., the value 2) enables server TFO functionality, so that server TCPs can generate TFO cookies in response to requests from clients. (Thus, the value 3 would enable both client and server TFO functionality on the host.)
The tcp_fastopen_cookies file can be used to view or set a system-wide limit on the number of pending TFO connections that have not yet completed the three-way handshake. While this limit is exceeded, all incoming TFO connection attempts fall back to the normal three-way handshake.

Current state of TCP fast open

Currently, TFO is an Internet Draft with the IETF. Linux is the first operating system that is adding support for TFO. However, as yet that support remains incomplete in the mainline kernel. The client-side support has been merged for Linux 3.6. However, the server-side TFO support has not so far been merged, and from conversations with the developers it appears that this support won't be added in the current merge window. Thus, an operational TFO implementation is likely to become available only in Linux 3.7.

Once operating system support is fully available, a few further steps need to be completed to achieve wider deployment of TFO on the Internet. Among these is assignment by IANA of a dedicated TCP Option Number for TFO. (The current implementation employs the TCP Experimental Option Number facility as a placeholder for a real TCP Option Number.)

Then, of course, suitable changes must be made to both clients and servers along the lines described above. Although each client-server pair requires modification to employ TFO, it's worth noting that changes to just a small subset of applications—most notably, web servers and browsers—will likely yield most of the benefit visible to end users. During the deployment process, TFO-enabled clients may attempt connections with servers that don't understand TFO. This case is handled gracefully by the protocol: transparently to the application, the client and server will fall back to a normal three-way handshake.

There are other deployment hurdles that may be encountered. In their CoNEXT 2011 paper, the TFO developers note that a minority of middle-boxes and hosts drop TCP SYN segments containing unknown (i.e., new) TCP options or data. Such problems are likely to diminish as TFO is more widely deployed, but in the meantime a client TCP can (transparently) handle such problems by falling back to the normal three-way handshake on individual connections, or generally falling back for all connections to specific server IP addresses that show repeated failures for TFO.

Conclusion

TFO is promising technology that has the potential to make significant reductions in the latency of billions of web service transactions that take place each day. Barring any unforeseen security flaws (and the developers seem to have considered the matter quite carefully), TFO is likely to see rapid deployment in web browsers and servers, as well as in a number of other commonly used web applications.

Index entries for this article
Kernel	Networking
Kernel	TCP

(Log in to post comments)

TCP Fast Open: expediting web services

Posted Aug 1, 2012 21:57 UTC (Wed) by pj (subscriber, #4506) [Link]

Why do clients have to change to using sendto() or sendmsg() ? Couldn't the charter of connect() be changed a little iff TCP_FASTOPEN is set to make it be a bit more 'lazy' so it doesn't actually attempt a connection until something is written to it?

TCP Fast Open: expediting web services

Posted Aug 1, 2012 22:21 UTC (Wed) by nix (subscriber, #2304) [Link]

I can't see a sensible way for connect(2) to determine if it should return ECONNREFUSED without that first SYN/ACK roundtrip. :)

(You'll note that sendmsg(), being intended for connectionless protocols, isn't documented as being able to return ECONNREFUSED either. I suspect this oversight will be rectified in due course.)

TCP Fast Open: expediting web services

Posted Aug 1, 2012 23:00 UTC (Wed) by josh (subscriber, #17465) [Link]

Just document that if you connect with the TCP_FASTOPEN flag you can potentially get ECONNREFUSED from whatever later call you use to write data to the socket.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 22:56 UTC (Thu) by nix (subscriber, #2304) [Link]

Quite. The set of errors in the manpages (and, indeed, in POSIX) is not a total set -- the kernel is allowed to return other errors, though it is perhaps unwise to expect callers to expect those errors.

TCP Fast Open: expediting web services

Posted Aug 1, 2012 23:11 UTC (Wed) by dskoll (subscriber, #1630) [Link]

Well, connect could return EINPROGRESS if the option is used as it does for a non-blocking socket. That's a bit of a lie, though. :)

TCP Fast Open: expediting web services

Posted Aug 8, 2012 14:45 UTC (Wed) by kevinm (guest, #69913) [Link]

...and you could request fastopen by setting TCP_CORK before connect(); write(); then uncorking.

TCP Fast Open: expediting web services

Posted Aug 1, 2012 22:34 UTC (Wed) by bojan (subscriber, #14302) [Link]

> and the speed of light, which has remained stubbornly constant.

Anyone has a patch for this? ;-)

TCP Fast Open: expediting web services

Posted Aug 2, 2012 0:39 UTC (Thu) by felixfix (subscriber, #242) [Link]

Numerous more-pessimal patches exist in the literature; some assembly may be required. More-optimal is TBD for the time being and left as an exercise for the student.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 23:28 UTC (Thu) by Lennie (subscriber, #49641) [Link]

For HTTP the patch is a SPDY-/HTTP/2-like protocol, the performance of HTTP is currently limited because it does not do multiplexing.

SPDY/2 is currently supported by Firefox, Chrome. An Apache module from Google, there is a beta patch from the nginx developers, node.js module, a Java server implementation and some beta C-client- and server-libraries and implementations.

TCP Fast Open: expediting web services

Posted Aug 9, 2012 6:11 UTC (Thu) by cvrebert (guest, #86030) [Link]

You're gonna have to wait until 2208 for that.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 7:45 UTC (Thu) by iq-0 (subscriber, #36655) [Link]

This probably will have an uphill battle being really accepted, since the client has to be sure that the request is idempotent on the server when connecting. That's effectively impossible without explicit annotations (at least for http, other protocols might not have that problem) :-/

TCP Fast Open: expediting web services

Posted Aug 2, 2012 9:44 UTC (Thu) by epa (subscriber, #39769) [Link]

The article makes it sound like the server's responsibility to make sure its behaviour is idempotent before enabling TFO. For http in particular, a GET request is defined to be idempotent while POST is not, and the client knows which it is sending. So it makes more sense for web servers to always allow TFO, but clients refrain from using it for POST or PUT requests. The web server could add a sanity check, once it receives the data and starts processing the request, that TFO was not used for these request types.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 12:16 UTC (Thu) by iq-0 (subscriber, #36655) [Link]

Many 'GET' operations are not necessarily idempotent. The RFC's language is 'SHOULD NOT', which is about the mildest way requirements are described.

The problem is that often different applications run in a big webserver (especially true for massive virtual hosting setups) and that "partially enabling TFO" is not really an option.

The other way round would be for the application to signal that the request is not allowed using TFO, but that would require a retry using non-TFO which is not specced and introduces more latency than is gained using TFO.

The only real solution that is safe would be to invent (yet another) HTTP header that signifies which methods for which paths under the current vhost may be done using TFO initialized connections.
For systems that support TFO for all requests (because they have higher-level guards against duplicate requests) one could simply hint '* /' or something. Only the first connection to such a site must in that case always be made using non-TFO requests.

TCP Fast Open: expediting web services

Posted Aug 9, 2012 13:48 UTC (Thu) by jthill (subscriber, #56558) [Link]

I think just allowing a server to detect whether the connection is still in fast-open state should be enough. If the server sees a non-idempotent request it can make sure the initial ACK has arrived before proceeding. Make it detectable by say making {aio_,}fsync not complete until it's arrived, or fabricating a redundant EPOLLOUT edge.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 12:21 UTC (Thu) by colo (guest, #45564) [Link]

Ah, well yes, in an ideal world... :)

I've seen enough GET-requests with at times far-reaching side effects in the wild that I'm not convinced this will (or should) see widespread adoption, at least not for "ordinary" HTTP servers that aren't serving up static content only or something like that.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 13:16 UTC (Thu) by epa (subscriber, #39769) [Link]

Long ago I worked at ArsDigita where the rule was to avoid form submit buttons as much as possible. So the user management page on a site would have 'delete user' not as a POST form submission, but simply a hyperlink. This was held to make the website look cleaner and feel faster. Then a customer using a 'web accelerator' which eagerly follows links ended up trashing their whole site. The programmer's fault, of course - GET requests should never be used for strongly state-changing operations like deleting data.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 14:04 UTC (Thu) by alankila (guest, #47141) [Link]

Although to be fair, multiple GETs that all attempt to delete the same user aren't a problem here. So maybe the latter request crashes if website developer wasn't careful enough, but the delete itself was idempotent.

I guess cautious people can't turn TFO on unless they validate the software they runs, but now there is going to be a good reason why you want to ensure that software is idempotent-GET safe. I imagine that the vast majority of software is, actually.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 15:19 UTC (Thu) by epa (subscriber, #39769) [Link]

Yes, I was confusing idempotent (makes no difference whether requested once, or many times) with stateless (makes no difference whether requested zero, one, or many times). Ideally GET requests would be not merely idempotent but stateless, which is a stronger property. But that is not needed for TFO.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 16:11 UTC (Thu) by man_ls (guest, #15091) [Link]

That is not required in the RFC, and I am not sure that restful, stateless web services would be even possible. After all, web services are supposed to change state in the server; otherwise what is the point?

TCP Fast Open: expediting web services

Posted Aug 2, 2012 16:32 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

Maybe I don't understand. Internally we have a great many RESTful web services which all work roughly like this:

The client asks if this particular combination of data (provided as GET query parameters) is found in the database overseen by that web service. The server looks in its database (e.g. maybe it's the collection of all voter registration records for a particular country) and if there is a matching record it replies saying what was found and where.

You can run that same request again and get the same exact answer and running it many times or not at all changes nothing‡ so that seems to meet your requirement entirely.

‡ In practice some of the services do accounting, they are incrementing a counter somewhere for every query run and then we use that counter to determine the payment to a third party for the use of their data. But this is no greater deviation from the concept than the usual practice of logging GET requests and anyway most of the services don't do that.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 20:42 UTC (Thu) by dlang (guest, #313) [Link]

you are making the assumption that the RESTful web service is read-only.

How can you do a RESTful web service where the client is sending information to the server without causing these sorts of problems?

TCP Fast Open: expediting web services

Posted Aug 3, 2012 2:02 UTC (Fri) by butlerm (subscriber, #13312) [Link]

The normal technique is to mark all non-idempotent requests with a unique id, and keep track of which ones have already been processed. That is practically the only way to get once-only execution semantics.

This could presumably be done (up to a point) at the transport layer with TCP Fast Open by having the initiating endpoint assign a unique identifier to a given user space connection request, attaching the identifier as a TCP option, caching the identifier for some reasonable period on the target endpoint, and throwing away SYN packets with connection identifiers that have already been satisfied. The more general way to do that of course is to do it at the application layer, in addition to anything the transport layer may or may not do.

TCP Fast Open: expediting web services

Posted Aug 3, 2012 15:35 UTC (Fri) by epa (subscriber, #39769) [Link]

I thought that part of making a RESTful web service was deciding which operations are read-only and which affect the state; and splitting them into GET and POST requests accordingly.

TCP Fast Open: expediting web services

Posted Aug 3, 2012 21:16 UTC (Fri) by dlang (guest, #313) [Link]

That depends on how you define REST

many groups make everything a GET request, especially for APIs that are not expected to be used from browsers, but rather called from other applications, especially in B2B type situations.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 17:33 UTC (Thu) by ycheng-lwn (guest, #86073) [Link]

Our draft creates this confusion that HTTP on TFO will break if the request is not idempotent because we use the word "idempotent transactions". But today the client may send a non-idempotent request twice already with standard TCP. For example, the link may fail after the server receive a non-idempotent request. the client will retry the request on another connection later since the original is not acknowledged.

TFO makes such a case possible in the SYN stage: the server reboots between when it receives request in SYN-data and when it sends the SYN-ACK. Being unaware of the reboot, the client will timeout and retransmit SYNs. If the server comes back and accepts the SYN, the client will repeat the request. But IMO the risk is minimal especially if the server defers enabling TFO until a reasonable connection timeout after reboot, e.g., 5 min.

Cheers,

-yuchung (tfo developer)

TCP Fast Open: expediting web services

Posted Aug 3, 2012 4:36 UTC (Fri) by ras (subscriber, #33059) [Link]

> Being unaware of the reboot, the client will timeout and retransmit SYNs.

For that to happen the server must accept the cookie. Surely you could get around that by including the servers boot time in the MAC key used to generate the cookie?

TCP Fast Open: expediting web services

Posted Aug 3, 2012 14:04 UTC (Fri) by cesarb (subscriber, #6266) [Link]

> Surely you could get around that by including the servers boot time in the MAC key used to generate the cookie?

A better option would be /proc/sys/kernel/random/boot_id (see http://0pointer.de/blog/projects/ids.html).

TCP Fast Open: expediting web services

Posted Aug 4, 2012 23:16 UTC (Sat) by drdabbles (guest, #48755) [Link]

How do you perceive this working with load balancing hardware in the future? Vendors will have to get behind TFO and patch firmwares, but hardware behind the LBs will also need to be patched. Do you expect a chicken-and-the-egg situation here, or do you know something that perhaps you aren't or can't share with us?

TCP Fast Open: expediting web services

Posted Aug 8, 2012 0:05 UTC (Wed) by butlerm (subscriber, #13312) [Link]

That depends on the design of the load balancer. A well designed one should have no problem converting a TCP Fast Open connection between the client and the LB and a standard connection between the LB and the servers behind it. Assuming the servers and the load balancer(s) are colocated, there should be very little penalty for doing so.

TCP Fast Open: expediting web services

Posted Aug 8, 2012 3:41 UTC (Wed) by raven667 (subscriber, #5198) [Link]

Doing so would presumably add latency and reduce the effectiveness of fast open. It would make more sense for the LB to just do NAT rather than proxying the connection. Is there any special handling in conntrack needed for this?

TCP Fast Open: expediting web services

Posted Aug 8, 2012 8:02 UTC (Wed) by johill (subscriber, #25196) [Link]

I don't think it would affect the effectiveness a lot -- presumably the backend server and LB are close by each other, so the latency between them matters less than the latency between the LB & client.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 9:42 UTC (Thu) by Los__D (guest, #15263) [Link]

"In the above steps, if the TFO cookie proves not to be valid, then the server TCP discards the data and sends a segment to the client TCP that acknowledges just the SYN. At this point, the TCP conversation falls back to the normal three-way handshake. If the client TCP is authentic (not malicious), then it will (transparently to the application) retransmit the data that it sent in the SYN segment."

Why does the server discard the data? Shouldn't it just return to the old ways, and defer the delivery until the three-way handshake has completed?

TCP Fast Open: expediting web services

Posted Aug 2, 2012 10:59 UTC (Thu) by dan_a (guest, #5325) [Link]

I would think that the problem is the resources in the OS which you could consume by doing this - especially since the handshake has already failed one trustworthiness test.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 11:28 UTC (Thu) by Los__D (guest, #15263) [Link]

But if it is already possible to send data with an normal SYN, and that data gets delivered to the application later, what do you gain from throwing it away when you use TCP Fast Open? If the feature can be used for SYN attacks, they would just do it without Fast Open.

I'm probably missing something, but it doesn't really make sense to me.

TCP Fast Open: expediting web services

Posted Aug 3, 2012 2:17 UTC (Fri) by butlerm (subscriber, #13312) [Link]

If I am not mistaken, most modern TCP stacks do not hold data sent with a SYN, and do not send it either. For most applications, there would be relatively little advantage if they did. Requests usually fit in an MTU (or MSS) worth of data, and in the absence of something like TCP Fast Open, the target endpoint has to wait for an acknowledgement that can carry the full sub-MSS sized request without a problem. Where on the other hand, holding the data simply makes it easier to conduct SYN attacks.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 15:57 UTC (Thu) by gdt (subscriber, #6284) [Link]

"Router latencies" isn't really an issue, as they are easily enough solved by increasing the bandwidth (which reduces the time to receive and transmit a packet). (And yeah, I'm avoiding the "bufferbloat" overprovisioning of buffers at the network edge here, because when that exists RTO is not much help -- saving one RTT when you have multiple RTT in the queue ahead of you isn't a huge win.)

The speed of light in optical fiber is the major contributor to latency. The speed of light in fiber is roughly 150Km per ms, this is much slower than the speed of light in a vacuum. The speed of light in a fiber can be improved, but at the cost of narrowing bandwidth. This tradeoff isn't available to users, but is determined during ITU standards-making. Moreover the tradeoff isn't huge, in the order of 5% of latency. But the tradeoff does have a major effect on the cost of Forward Error Correction ASICs.

Once you get out of the tail links, the lowest speed you'll encounter on a ISP backbone is 1Gbps. You've got to have well more than 1,000 router hops before you'll get 1ms of ingress playin, cell switching and egress playout. The other devices which can add significant latency are the middleboxes at the other end of the link: firewalls, load sharers and so on.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 23:44 UTC (Thu) by Lennie (subscriber, #49641) [Link]

So when will we see companies building vacuum tubes used to speed up the light when crossing large parts of land or maybe the atlantic ?

That is the thing I care about. ;-)

Is anyone doing research on that yet ?

TCP Fast Open: expediting web services

Posted Aug 4, 2012 12:20 UTC (Sat) by nix (subscriber, #2304) [Link]

Warning: work in this area can be dangerous. See e.g. the series, ahem I mean feasibility study, bookended by <http://www.amazon.com/The-Collapsium-Wil-McCarthy/dp/0553...>, <http://www.amazon.com/To-Crush-Moon-Wil-McCarthy/dp/05535...>.

(Though admittedly the Queendom did go to rather more extreme lengths to increase the speed of light than mere vacuum tubes, and displayed a cavalier degree of carelessness, indeed insouciance, regarding the fact that keeping trillions of black holes in your solar system in very large arrays moving at well below orbital velocity is insanely dangerous.)

TCP Fast Open: expediting web services

Posted Aug 4, 2012 13:34 UTC (Sat) by Jannes (subscriber, #80396) [Link]

actually TFO should be a huge improvement in a bufferbloated situation. If there is 1 second of bufferbloat, then 'Saving one RTT' means saving 1 second.

Not saying it's a solution to bufferbloat of course.

TCP Fast Open: expediting web services

Posted Aug 4, 2012 23:12 UTC (Sat) by drdabbles (guest, #48755) [Link]

This may be true in theory (I'm not sure), but in practice it's completely wrong. Bandwidth tells you only how much data can be passed through a link in a given time period. Saying a link is capable of 1Gbit/sec means if you consume every possible bit for every possible cycle for 1 second, you'll have transferred 1Gbit of data over the wire. Many links have a frame/second limit, so if your frames aren't completely full, you've wasted bandwidth and decreased the utilization of the link.

Router latency is caused by many factors. Some can be router CPU shortages, memory resource shortages, the time it takes to transfer a frame from "the wire" to the internal hardware and vice verse, how quickly a packet can be processed, whether packet inspection is happening, etc. This, relatively speaking, can be a very long time. Typically it's microseconds, but certainly not always. Either way, it represents a minimum time delay with only a practical ceiling (IP timeout / retransmission). So increasing bandwidth to an already slow router only makes the problem worse.

Also, if you have a link that passes through 1000 routers, it's bound to hit a dozen that are oversubscribed and performing horribly. This is especially true as your distance from the "core" increases and your distance to the "edge" decreases. This is why major datacenters are next to or house the major peering points of the Internet.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 18:46 UTC (Thu) by pr1268 (subscriber, #24648) [Link]

Perhaps I'm not totally understanding what's going on here (high-level): TFO seems more like a user-space fix (i.e. Apache HTTPD, Microsoft IIS, etc.). Are changes to the system calls socket(2), connect(2), etc. the reason this article is on the LWN Kernel Development page?

Also, do I assume correctly that both client and server have to support TFO to realize the speed-up mentioned in the article?

I certainly don't mean to criticize this article (or its placement here on LWN), just curious instead. Great article, thanks!

TCP Fast Open: expediting web services

Posted Aug 3, 2012 2:22 UTC (Fri) by butlerm (subscriber, #13312) [Link]

Yes, both client and server need support for TCP Fast Open. That support amounts to a change to both the TCP stack and the TCP socket API, both of which are implemented by the kernel. Without kernel support (or a user space TCP implementation and the privileges necessary to use it) neither endpoint can make use of TFO.

TCP Fast Open: expediting web services

Posted Aug 2, 2012 19:06 UTC (Thu) by jengelh (subscriber, #33263) [Link]

>Google [...] to implement TCP Fast Open

Great, now GTFO is going to get a new subentry in the Urban Dictionary.

Examples for the speed of light

Posted Aug 10, 2012 0:28 UTC (Fri) by paulproteus (guest, #69280) [Link]

The article says:

At intercontinental distances, this physical limitation means that—leaving aside router latencies—transmission through the medium alone requires several milliseconds

To be more concrete:

1 mile is 5 microseconds
6 milliseconds from New York to Florida (1152 miles)
15 milliseconds from New York to San Francisco (2917 miles)
36 milliseconds from New York to Tokyo (6735 miles)

Source for time conversion: GNU Units.

You have: 1 mile
You want: light second
* 5.3681938e-06
/ 186282.4

Examples for the speed of light

Posted Aug 10, 2012 1:27 UTC (Fri) by dlang (guest, #313) [Link]

it's actually a bit longer than these times as the speed of light you are listing is the speed of light through a vacuum, going through fiber is noticeably slower.

Examples for the speed of light

Posted Aug 10, 2012 1:30 UTC (Fri) by paulproteus (guest, #69280) [Link]

Interesting point!

http://blog.advaoptical.com/speed-light-fiber-first-build... suggests that one should expect approximately a 33% increase in these times for the fiber optics.

(Additionally, they should be doubled due to round-trip time, as per my follow-up comment.)

Examples for the speed of light (Correction)

Posted Aug 10, 2012 1:28 UTC (Fri) by paulproteus (guest, #69280) [Link]

One correction to the above note: One should *double* these numbers for *round*-trip time. These are one-way times.

Examples for the speed of light

Posted Oct 2, 2012 8:17 UTC (Tue) by ncm (guest, #165) [Link]

Do fibers follow great-circle routes already? I expected that to take longer to happen.

Examples for the speed of light

Posted Oct 12, 2012 12:04 UTC (Fri) by Lennie (subscriber, #49641) [Link]

You can see the routes here:

http://www.cablemap.info/