BBR congestion control
Congestion-control algorithms are unglamorous bits of code that allow network protocols (usually TCP) to maximize the throughput of any given connection while simultaneously sharing the available bandwidth equitably with other users. New algorithms tend not to generate a great deal of excitement; the addition of TCP New Vegas during the 4.8 merge window drew little fanfare, for example. The BBR (Bottleneck Bandwidth and RTT) algorithm just released by Google, though, is attracting rather more attention; it moves away from the mechanisms traditionally used by these algorithms in an attempt to get better results in a network characterized by wireless links, meddling middleboxes, and bufferbloat.
The problem that any congestion-control algorithm must solve is that the net has no mechanism for informing an endpoint of the bandwidth available for a given connection. So the algorithm must, somehow, come to its own conclusions regarding just how much data it can send at any given time. Since the available bandwidth will generally vary over time, that bandwidth estimate must be revised occasionally. In other words, a congestion control algorithm must maintain an ongoing estimate of how much data can be sent, derived from the information that is available to it.
That information is somewhat sparse. These algorithms typically work by using one metric that they are easily able to measure: the number of packets that do not make it to the other end of the connection and must be retransmitted. When the network is running smoothly, dropped packets should be a rare occurrence. Once a router's buffers begin to fill, though, it will have no choice but to drop the packets it has no room for. Packet drops are thus a fairly reliable signal that a connection is overrunning the bandwidth available to it and should slow down.
The problem with this approach, on the network we have now, is that the buffers between any pair of endpoints can be huge. Oversized buffers have been recognized as a problem for some years now, and progress has been made in mitigating the resulting bufferbloat issues. But the world is still full of bloated routers and some link-level technologies, such as WiFi, require a certain amount of buffering for optimal performance. By the time an endpoint has sent enough data to overflow a buffer somewhere along the connection, the amount of data buffered could be huge. The packet-loss signal, in other words, comes far too late; by the time it is received, an endpoint could have been overdriving the connection for a long time.
Loss-based algorithms can also run into problems when short-lived conditions cause a dropped packet. They may slow down unnecessarily and, as a result, fail to make use of the bandwidth that is available.
Bottleneck Bandwidth and RTT
The BBR algorithm differs from most of the others in that it pays relatively little attention to packet loss. Instead, its primary metric is the actual bandwidth of data delivered to the far end. Whenever an acknowledgment packet is received, BBR updates its measurement of the amount of data delivered. The sum of data delivered over a period of time is a reasonably good indicator of the bandwidth the connection is able to provide, since the connection has demonstrably provided that bandwidth recently.
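That per-ACK measurement can be illustrated with a minimal Python sketch (the function name and structure are invented for clarity; the kernel's C implementation is considerably more involved). Each ACK yields a rate sample: the amount of data delivered between a packet's transmission and its acknowledgment, divided by the elapsed time.

```python
def delivery_rate_sample(delivered_now, delivered_at_send,
                         send_time, ack_time):
    """One BBR-style rate sample (illustrative): bytes delivered
    between a packet's send and its ACK, over the elapsed time.
    delivered_* are cumulative delivered-byte counters; times are
    in seconds."""
    interval = ack_time - send_time
    if interval <= 0:
        return 0.0
    return (delivered_now - delivered_at_send) / interval  # bytes/sec
```

BBR keeps a maximum of such samples over a recent window as its bandwidth estimate, on the reasoning that the connection demonstrably delivered at that rate.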
When a connection starts up, BBR will be in the "startup" state; in this mode, it behaves like most traditional congestion-control algorithms in that it starts slowly, but quickly ramps up the transmission speed in an attempt to measure the available bandwidth. Most algorithms will continue to ramp up until they experience a dropped packet; BBR, instead, watches the bandwidth measurement described above. In particular, it looks at the actual delivered bandwidth for the last three round-trip times to see if it changes. Once the bandwidth stops rising, BBR concludes that it has found the effective bandwidth of the connection and can stop ramping up; this has a good chance of happening well before packet loss would begin.
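The exit condition can be sketched as follows (a hedged illustration: the threshold value and bookkeeping here are invented, while the real code tracks a windowed maximum and counts rounds without growth). Startup ends once the last three per-round bandwidth samples show no appreciable growth.

```python
def startup_done(bw_per_round, growth=1.25):
    """Illustrative BBR-style startup exit: return True once the
    three most recent per-round bandwidth samples have failed to
    grow appreciably over the sample from three rounds earlier."""
    if len(bw_per_round) < 4:
        return False
    baseline = bw_per_round[-4]
    return all(bw < baseline * growth for bw in bw_per_round[-3:])
```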
The measured bandwidth is then deemed to be the rate at which packets should be sent over the connection. But in measuring that rate, BBR probably transmitted packets at a higher rate for a while; some of them will be sitting in queues waiting to be delivered. In an attempt to drain those packets out of the buffers where they languish, BBR will go into a "drain" state, during which it will transmit below the measured bandwidth until it has made up for the excess packets sent before.
Once the drain phase is done, BBR goes into the steady-state mode where it transmits at more-or-less the calculated bandwidth. That is "more-or-less" because the characteristics of a network connection will change over time, so the actual delivered bandwidth must be continuously monitored. Also, an increase in effective bandwidth can only be detected by occasionally trying to transmit at a higher rate, so BBR will scale the rate up by 25% about 1/8 of the time. If the bandwidth has not increased (transmitting at a higher rate does not result in data being delivered at a higher rate, in other words), that probe will be followed by a drain period to even things out again.
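That probe-then-drain behavior amounts to cycling through a table of pacing gains, roughly like this sketch (the 1.25 probe gain and one-phase-in-eight cadence follow the description above; the 0.75 drain gain, the function name, and the phase handling are simplifications):

```python
# One probing phase in eight, followed by a compensating drain
# phase, then six phases at the estimated bottleneck rate.
GAIN_CYCLE = [1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

def pacing_rate(bottleneck_bw, phase):
    """Rate at which to pace outgoing data during a given phase
    of the steady-state gain cycle."""
    return bottleneck_bw * GAIN_CYCLE[phase % len(GAIN_CYCLE)]
```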
One interesting aspect of BBR is that, unlike most other algorithms, it does not use the congestion window as the primary means of controlling outgoing traffic. The congestion window limits the maximum amount of data that can be in flight at any given time; an increase in the window will generally result in a burst of packets consuming the newly available bandwidth. BBR, instead, uses the tc-fq packet scheduler to send out data at the proper rate. The congestion window is still set as a way of ensuring that there is never too much data in flight, but it is no longer the main regulatory mechanism.
There is one last complication: many network connections are subject to "policers", middleboxes that limit the maximum data rate any connection can reach. If such a box exists, there is little point in trying to exceed the rate it will allow. The BBR code looks for periods with a suspiciously constant bandwidth (within 4Kb/sec) and a high packet loss rate; should that happen, it concludes that there is a policer in the loop and limits the bandwidth to a level that will not cause that policer to start dropping packets.
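A heuristic in that spirit might look like the following sketch (the flat-bandwidth tolerance mirrors the figure above; the loss threshold, names, and structure are invented for illustration and do not come from the BBR code):

```python
def policer_suspected(bw_samples, loss_rate,
                      tolerance=4_000, min_loss=0.1):
    """Illustrative check: a suspiciously constant delivery rate
    (within `tolerance` bytes/sec) combined with a high loss rate
    suggests a token-bucket policer in the path."""
    if len(bw_samples) < 2 or loss_rate < min_loss:
        return False
    return max(bw_samples) - min(bw_samples) <= tolerance
```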
The BBR patch set was posted by Neal Cardwell; the code itself carries signoffs from a number of people, including Van Jacobson and Eric Dumazet. Google has, they say, been using BBR for some time, and is evidently happy with the results; BBR works fine when only one side of the connection is using it, so each deployment should, if it lives up to its promises, make the net that much better. We shouldn't have to wait long to find out; networking maintainer David Miller has applied the patches, meaning that BBR should be available in the 4.9 kernel.
BBR congestion control
Posted Sep 21, 2016 18:47 UTC (Wed) by post-factum (subscriber, #53836) [Link]
How many of them are already in kernel? Which one to use?
And, of course, this link should be here: https://xkcd.com/927/
BBR congestion control
Posted Sep 21, 2016 18:49 UTC (Wed) by post-factum (subscriber, #53836) [Link]
BBR congestion control
Posted Sep 21, 2016 23:11 UTC (Wed) by pr1268 (subscriber, #24648) [Link]
I'm unsure calling each of the congestion control algorithms a standard is appropriate (well, except for maybe Slow-Start). It's more like having a wide variety of choices, and the user can choose whichever one he/she thinks works best.
Not that I disagree with you, though. There are at least ten such algorithms in the Linux Kernel—maybe more; I'm not at my Linux computer as I'm typing this. So many choices.... :-\
BBR congestion control
Posted Sep 22, 2016 5:21 UTC (Thu) by josh (subscriber, #17465) [Link]
Other than additional processing power, does BBR do any *harm* in any scenario?
BBR congestion control
Posted Sep 22, 2016 6:04 UTC (Thu) by shemminger (subscriber, #5739) [Link]
BBR congestion control
Posted Sep 22, 2016 7:33 UTC (Thu) by josh (subscriber, #17465) [Link]
It makes sense that the various congestion control algorithms will have tradeoffs. But if one of them works consistently better in every way than not enabling one at all, why not enable one?
> Also it requires tc-fq which is not appropriate for use on routers.
Why not?
BBR congestion control
Posted Sep 22, 2016 23:18 UTC (Thu) by shemminger (subscriber, #5739) [Link]
fq_codel is stochastic, so it won't work very well on hosts with 1,000,000 flows or more...
fq_codel is aimed at routers, while sch_fq targets hosts, implementing pacing at a minimal cost (one high-resolution timer per qdisc).
BBR congestion control
Posted Sep 23, 2016 3:32 UTC (Fri) by gdt (subscriber, #6284) [Link]
at least ten such algorithms in the Linux Kernel
Having all the congestion control algorithms available on a platform allows simple reasoning about the real-world performance of the algorithms: the only difference is the algorithm; the operating system, hardware and network are constant.[1] Stephen and others' implementation of pluggable congestion control modules was by itself responsible for an appreciable advance in the state of the art, as it made in-the-field comparisons of algorithm performance easy to do: you didn't additionally have to reason about the comparative networking attributes of FreeBSD and Windows. This doesn't mean that all the algorithms provided by Linux are a good choice, far from it. However, it is still useful to ship these algorithms as points of reference to identify regressions in newer algorithms.
As for user complexity, the default algorithm is a reasonable choice. There's no reason for a user to know other choices exist. If you wish to make a different choice then having all the alternatives available is a good thing, as often whichever alternative is "best" depends upon your circumstances. The default algorithm has been altered in the past, and I expect will be again. However given the possibility of a fault causing congestion collapse, all maintainers of widely-used operating systems take pains to ensure that algorithms and their implementations have proven themselves before they are used as a new default.
--------
[1] "all", as far as patent law allows.
BBR congestion control
Posted Sep 22, 2016 10:38 UTC (Thu) by Sesse (subscriber, #53779) [Link]
Think of it this way: BBR is for your endpoints. CoDel is for every router along the way. (The waters get slightly muddied if your server doubles as a router, though.)
BBR congestion control
Posted Sep 22, 2016 20:43 UTC (Thu) by mtaht (guest, #11087) [Link]
Given BBR's characteristics, the "FQ" portion helps most with legacy CC mechanisms, but I did just run a long string of tests showing that BBR + fq_codel is very good, much better than either alone.
I'll try to write up those results in the next week or so, but the flent dataset is here: http://blog.cerowrt.org/flent/bbr-comprehensive.tgz if you want to browse and compare.
BBR congestion control
Posted Sep 22, 2016 23:46 UTC (Thu) by Sesse (subscriber, #53779) [Link]
In a sense, it's a bit icky that sch_fq uses the qdisc hooks to do TCP pacing, but given how many failed attempts there were at implementing paced TCP in Linux before Eric Dumazet came along and just did it, and how amazingly well it works (even without BBR, it's a great win), I'll survive that it takes up the “slot” from fq_codel. (Also, it's a bit confusing that it's called sch_fq; it should really be called sch_packet_pacing, because I can't imagine another reason to run it.)
BBR congestion control
Posted Sep 21, 2016 19:34 UTC (Wed) by flussence (subscriber, #85566) [Link]
I hope this is as easy to set up as the CDG & fq-codel I've been using (to great effect). Tweaking one sysctl and a few kconfig options is fine, but every time I try to get into the tc manpages my eyes glaze over.
BBR congestion control
Posted Sep 22, 2016 2:13 UTC (Thu) by mtaht (guest, #11087) [Link]
BBR congestion control
Posted Sep 24, 2016 21:23 UTC (Sat) by hechacker1 (guest, #82466) [Link]
It was a good post. I'm hoping to see it in Ubuntu Server soon, since I use a Squid proxy for my own network.
Where is one supposed to run this?
Posted Sep 22, 2016 12:39 UTC (Thu) by giggls (subscriber, #48434) [Link]
My (Linux based) Router or Desktop?
Sven
Where is one supposed to run this?
Posted Sep 22, 2016 15:17 UTC (Thu) by blitzkrieg3 (guest, #57873) [Link]
Where is one supposed to run this?
Posted Sep 28, 2016 0:22 UTC (Wed) by gdt (subscriber, #6284) [Link]
Recall that it is the transmitter that makes the decisions in TCP congestion control; the receiver simply ACKs. Where the handset or Chromebook is predominantly a consumer of data, its choice of TCP congestion control algorithm isn't going to make a difference.
Where is one supposed to run this?
Posted Oct 2, 2016 4:23 UTC (Sun) by marcH (subscriber, #57642) [Link]
Where is one supposed to run this?
Posted Sep 22, 2016 15:28 UTC (Thu) by farnz (subscriber, #17727) [Link]
Any TCP endpoints - desktops, servers etc. Routers don't need to if they're just routing, but they'll benefit for things like SSH.
BBR congestion control
Posted Sep 23, 2016 2:03 UTC (Fri) by jcm (subscriber, #18262) [Link]
Too often, I see problems in this world being solved by clever technical solutions because "utopia will never happen, so we'll just do this...". It might take a decade to provide a nicer solution that works at internet scale, which is all the more reason to get that moving now. And it's amazing how you can force people to use standards if you get big enough players pushing it through to help.
BBR congestion control
Posted Sep 23, 2016 9:22 UTC (Fri) by diegor (subscriber, #1967) [Link]
ATM was pushed by many big telecommunication companies, but in the end it failed. Without routers pre-allocating bandwidth, there is no real improvement in estimating "bandwidth information".
Note that in a resilient network, where every packet of a connection can be routed a different way, it's just delusional to think that you can track bandwidth allocation for every connection better on a router than you can on the endpoint.
Every complex problem has a nice solution, easy to understand, that doesn't work.
BBR congestion control
Posted Sep 23, 2016 9:37 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]
I worked for a telecom that tried to deploy it. It actually looked awesome when it worked - you could get a channel across a network with guaranteed throughput and no jitter. And because telecoms usually build networks in rings, it could even survive individual fiber cuts without losing a single packet.
The catch, of course, is the phrase "when it worked". RSVP-capable hardware cost a LOT more than regular hardware, and when it broke, finding the cause often involved restarting large parts of the network.
Eventually it was abandoned and they simply overprovisioned networks, segregating purely data flows into low-priority buckets.
BBR congestion control
Posted Sep 24, 2016 17:13 UTC (Sat) by shemminger (subscriber, #5739) [Link]
Only in restricted research networks and closed networks has it ever been possible. Even then the dynamic nature of available bandwidth means that any data given to end points would be out of date.
BBR congestion control
Posted Sep 25, 2016 1:59 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]
Something like ICMP "Packet Too Big" message, but for bandwidth. So if a router sees an "Advised Throughput Capacity" message with a value greater than it can sustain, it re-sends the message with a lower value. Kinda like extended ECN.
Of course, endpoints will be free to ignore this hint.
BBR congestion control
Posted Sep 29, 2016 5:10 UTC (Thu) by mtaht (guest, #11087) [Link]
BBR congestion control
Posted Sep 29, 2016 9:13 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
A semi-authenticated ICMP message that is treated as a hint might be a much better idea now.
BBR congestion control
Posted Sep 29, 2016 19:34 UTC (Thu) by mtaht (guest, #11087) [Link]
"Widespread deployment of ICMP filtering makes it impossible to rely on ICMP Source Quench messages for congestion control."
BBR congestion control
Posted Oct 1, 2016 0:27 UTC (Sat) by simoncion (guest, #41674) [Link]
Out-competed by more aggressive algorithms?
Posted Sep 23, 2016 10:43 UTC (Fri) by runekock (subscriber, #50229) [Link]
Out-competed by more aggressive algorithms?
Posted Sep 24, 2016 12:51 UTC (Sat) by mtaht (guest, #11087) [Link]
Out-competed by more aggressive algorithms?
Posted Sep 25, 2016 0:44 UTC (Sun) by giraffedata (guest, #1954) [Link]
No, BBR crushes cubic in its earlier phases, in my preliminary tests.
I'm not sure what you mean by this. Do you mean a node gets more bandwidth if it uses BBR than if it uses CUBIC, in a typical network? Or that in a network in which one node uses BBR and another uses CUBIC, the BBR node gets more bandwidth? Or something else?
The actual question is about a node using BBR in a network in which other nodes use an older, more aggressive policy. That may be a moot question. The older policies one finds in a network aren't more aggressive. CUBIC in particular is not terribly aggressive. Even older policies do things like cut their sending rate by half when they see the first dropped packet, which is not aggressive at all.
Out-competed by more aggressive algorithms?
Posted Sep 25, 2016 2:20 UTC (Sun) by flussence (subscriber, #85566) [Link]
Cubic is aggressive in the sense that it has permanent acceleration, and the only thing keeping its velocity in check is a simple negative feedback loop: always trying to stuff more data into the network, until *after* the network itself becomes congested and starts rejecting (or dropping) packets.
Due to that design it spends most of its time significantly under maximum capacity, ramping up until a congestion spike hits. And due to *that* effect being in the wild for so long, network middleboxes have evolved to buffer tons of data even when they can't forward it right away, because a higher initial rate spike for a short-lived connection (e.g. HTTP) would be perceived as "faster".
BBR uses a lot of cleverness to guess the network path's effective bandwidth based on ACK delays and then self-regulates slightly below that, updating its estimate over time. If a Cubic stream is sharing the same pipe, it'll still be oscillating between over- and under-loaded, but because it's mostly under, the BBR stream will see "spare" capacity and gradually eat the difference until it has significantly more than 50% of the total throughput.
Out-competed by more aggressive algorithms?
Posted Sep 25, 2016 4:29 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]
Out-competed by more aggressive algorithms?
Posted Sep 25, 2016 18:55 UTC (Sun) by mtaht (guest, #11087) [Link]
That said, BBR and cubic CAN coexist in a conventional drop-tail system - this plot - showing a BBR, cubic, BBR, and cubic flow, starting 3 seconds apart, in sequence, shows that.
http://blog.cerowrt.org/flent/bbr-comprehensive/latecomer...
(interpretation, I welcome corrections!)
The first BBR flow gets a reasonable RTT estimate, and when cubic hits the link, BBR falls off to what it thinks is a reasonably "fair share" of the link, while cubic grabs what it can, and then gets crushed by the second BBR flow entering the link (which does not get an accurate early RTT estimate) - BUT - eventually, it does, and all flows of both CCs eventually achieve something close to parity.
There's a lot of carnage going on in the background to get to this point - and this is a nasty test, intentionally so! And I did most of the data collection in the wee hours of the morning (and, in the interest of science, made all my data public).
I keep the blog and a goodly portion of the flent data (including, rarely, packet captures) on github. I track queue depth, emulate several RTTs, multiple kinds of devices (notably cable modems), and also put multiple AQM and fq technologies in the path as well.
https://github.com/dtaht/blog-cerowrt
I think - based on what I've just seen - I need to go add conventional policers to the test matrix. Sigh. And then there's, you know, making wifi fast, which is what I should be finishing up instead of going gaga over BBR.
data rate, not bandwidth
Posted Sep 25, 2016 0:57 UTC (Sun) by giraffedata (guest, #1954) [Link]
I had a hard time understanding this at first because of the incorrect use of the word "bandwidth." Bandwidth is capacity, so phrases such as, "the actual bandwidth of data delivered to the far end" don't make sense.
It's supposed to say, "the actual rate of data delivered to the far end," and BBR is simply about comparing the rate of data delivered to the far end to the rate sent from the near end, and when you've raised the latter to the point that it's greater than the former, you've discovered the bandwidth of the connection.
To be even more clear, I would say "the actual rate of data arriving at the far end."
data rate, not bandwidth
Posted Oct 2, 2016 4:26 UTC (Sun) by marcH (subscriber, #57642) [Link]
"Throughput" is the shorter word you're looking for.
data rate, not bandwidth
Posted Oct 5, 2016 18:03 UTC (Wed) by giraffedata (guest, #1954) [Link]
"Throughput" is the shorter word you're looking for.
In this case, I think "throughput" would have been almost as confusing as bandwidth, because we're dealing with the special case that the data rate into the pipe is greater than the data rate out of it (because of that meddlesome buffering by routers). I think of throughput as a steady state thing covering the whole pipe.
BBR congestion control
Posted Sep 26, 2016 12:20 UTC (Mon) by anton (subscriber, #25547) [Link]
Net2o also estimates the available bandwidth through the round-trip time, as described in these slides from 2012. It would be interesting to see a more detailed comparison between BBR and net2o flow control.
BBR congestion control
Posted Sep 29, 2016 18:45 UTC (Thu) by forthy (guest, #1525) [Link]
Note that net2o can do a lot of things TCP can't, because it is redesigned from scratch, and therefore the ack packets can contain more information than a TCP ack.
BBR congestion control
Posted Sep 27, 2016 9:45 UTC (Tue) by nnovoice (guest, #111400) [Link]
Also, the http://blog.cerowrt.org/post/bbrs_basic_beauty/ post says bandwidth is better and the sawtooth is dead, which is really good if it happens. Itching to try BBR, but I have to go from 4.1 to 4.9. Long way to go!
BBR congestion control
Posted Sep 30, 2016 5:51 UTC (Fri) by pjardetzky (guest, #111456) [Link]
BBR congestion control
Posted Dec 17, 2016 0:04 UTC (Sat) by dps (guest, #5725) [Link]
BBR seems to be using techniques similar to tools which measured response times for two back-to-back ICMP echo request packets, some of which claimed good results. Very asymmetric links and 100:1 contention have made the problem harder.
BBR congestion control
Posted Aug 17, 2017 13:18 UTC (Thu) by plasma-tiger (guest, #115599) [Link]
BBR congestion control
Posted Aug 19, 2017 16:30 UTC (Sat) by flussence (subscriber, #85566) [Link]
BBR congestion control
Posted Jan 6, 2019 21:26 UTC (Sun) by bircoph (guest, #117170) [Link]
Tests were made on two endpoints (desktop and laptop) connected via OpenVPN.
With Cubic on the sending side I have ~3 MB/s initially, which steadily rises to ~5 MB/s and holds there. 5 MB/s hits a CPU limit on the laptop due to the sophisticated encryption used.
With BBR I have an initial speed of ~5 MB/s which drops steadily to ~400 KB/s and holds there.
So maybe BBR is good for specific datacenter setups or lab environment, but it is a failure for real life end-user hardware on common wired or wireless networks. At least for now. Maybe some bug in the kernel?
BBR congestion control
Posted Jan 6, 2019 22:43 UTC (Sun) by zlynx (guest, #2285) [Link]
But today I needed to copy a disk image around anyway so I tried it from my laptop over WiFi to my NAS. Both are running Fedora 29, both with BBR enabled.
Something you should know about FQ though. The WiFi system has dropped that. It no longer uses qdiscs at all. It's internal to the WiFi stack and is an fq_codel variant designed for WiFi. It is supposed to work the same as FQ for BBR, however.
Doing my transfer using rsync with --progress it averaged right around 20 MB/s aka just over 200 Mbps. The speed went up and down but never dropped to the floor as in your example.
One thing I just thought of though. You are definitely not using OpenVPN in TCP mode, right? Using any VPN through a TCP tunnel is a horrible idea, and especially bad on WiFi. You get all of TCP's problems compounded as packet losses and delay cause both TCP sessions to react, overreact, and miserably fail.
BBR congestion control
Posted Jan 7, 2019 16:36 UTC (Mon) by bircoph (guest, #117170) [Link]
I tested OpenVPN in exactly the same environment with 4 different TCP congestion control algorithms on sender's side: RENO, BIC, CUBIC and BBR. (I also tested changing algo on receiver's side, but this changes almost nothing.) First three of them behave rather alike: RENO, BIC and CUBIC achieve 4.6-5.0 MB/s (with CUBIC being the best of three), but BBR provides a steady drop to ~400 KB/s. This is absolutely unacceptable. And this is unlikely to be an application problem, since it works well with other congestion control algorithms. So something is very wrong with BBR or its implementation.
> One thing I just thought of though. You are definitely not using OpenVPN in TCP mode, right?
Of course I'm using TCP. It is pointless to test TCP congestion control with a UDP application. Why am I using OpenVPN over TCP? Because that's how the server is configured and I can't change that: both endpoints are under my control, but the server is not. That's the reality I have to face and work with.
BBR congestion control
Posted Jan 7, 2019 16:54 UTC (Mon) by zlynx (guest, #2285) [Link]
Use TCP sessions inside of a UDP OpenVPN tunnel.
Proxy tunnels like SSH and SOCKS are different because they do not tunnel the actual TCP packets. Proxies unwrap the TCP sessions and build new ones.