GCRA: leaky buckets without the buckets

Yesterday I read an article describing the GCRA rate limiting algorithm. I thought it was really interesting, but I wasn’t entirely satisfied with Brandur’s explanation, and the Wikipedia articles on leaky buckets and GCRA are terrible, so here’s my version.

what is GCRA?

GCRA is the “generic cell rate algorithm”, a rate-limiting algorithm that came from ATM. GCRA does the same job as the better-known leaky bucket algorithm, but using half the storage and with much less code.

GCRA only stores one time stamp per sender, whereas leaky buckets need to store a time stamp and a quota.

GCRA needs less arithmetic, and makes it trivial to tell a client how long it should wait before the next request is permitted.

rate limit parameters
dodgy metaphors
leaky bucket algorithm
eliminate the bucket
GCRA algorithm
discussion
an old mystery solved

rate limit parameters

Both algorithms have the same configuration parameters:

a maximum rate
a burst size that allows a sender to briefly exceed the rate limit

Each operation has a cost, e.g. the size of a packet if this is a network rate limiter, in which case the cost is measured in bytes and the rate is bytes per second. In simple cases we might fix the cost at 1 unit of work per request, so the rate is measured in requests per second.

dodgy metaphors

The leaky bucket algorithm is named after a questionable analogy.

Imagine you have a bucket that leaks at some rate.

Each request is represented as a jug of water; the amount of water is the cost. A request is permitted if you can pour the jug into the bucket without overflowing.

In a steady state you can pour water into the bucket at the same rate that it leaks out.

The capacity of the bucket determines the burst size: how much water you can chuck in quickly after waiting for the bucket to drain.

On its own terms this metaphor doesn’t make much sense. If the bucket is leaking, why is it bad for it to overflow? But I suppose it’s fairly closely analogous to a write buffer attached to a fixed-rate transmission line.

A better metaphor might be a battery that recharges at a fixed rate; the battery’s capacity determines the burst size, and the cost of of an operation says how much it drains the battery.

leaky bucket algorithm

each client has a stored update time and a quota

when a request arrives, get the client’s details

time = client ? client.time : now
quota = client ? client.quota : burst

calculate the bucket leakage or battery recharge
```
quota += (now - time) * rate
```
limit to the permitted burst size
```
quota = min(quota, burst)
```
pay the cost
```
quota -= cost
```
reject if the client overspent their quota; tell them when it will recover
```
if quota < 0:
    return DENY(now - quota / rate)
```

update the client if the request is permitted

client.time = now
client.quota = quota
return ALLOW

eliminate the bucket

GCRA’s key trick is to change the units so that the client’s quota is measured in units of time. Calculating bucket leakage or battery recharge becomes trivial: it is just time passing.

Re-think the bucket capacity as a sliding window around the present time during which requests are allowed. The size of the window is burst / rate seconds.

A request nominally takes cost time, which is 1 / rate for fixed-cost requests.

GCRA stores the earliest permitted time of a request, which (with the change of units) is the present time minus the quota. Thus the two leaky bucket values get replaced with one.

A client is only allowed to make a request late (after the earliest permitted time) so the sliding window covers the recent past. We don’t let the window reach into the future to protect against clock resets.

When a client is bursting, its earliest permitted time increases faster than real time and will catch up with the present time.

When a client goes over quota, its earliest permitted time is in the future, so its requests get denied.

GCRA algorithm

when a request arrives, get the client’s earliest permitted time
```
time = client ? client.time : 0
```
ensure it is within the sliding window
```
time = clamp(now - window, time, now)
```
pay the cost
```
time += cost
```
tell the user when they can try again if they are too soon
```
if now < time:
    return DENY(time)
```
update the client’s time if the request is permitted
```
client.time = time
return ALLOW
```

discussion

Leaky bucket is a pretty simple algorithm, but GCRA is even simpler. It requires almost no calculation – a big advantage for something intended for implementation in hardware in the 1990s!

I have not tried to match my variable names to the standard GCRA terminology. To be honest, I find it too confusing and I would probably get it wrong. (Why do they have so many different Ts?!)

an old mystery solved

When I was working on rate limiting for Exim, I used an exponentially weighted moving average to calculate a client’s smoothed sending rate. That algorithm stores an update time and a measured rate for each client – the same amount of storage as traditional leaky bucket.

The nice thing about the moving average was that it allowed more flexible policy enforcement than leaky bucket, because the rate measurement was meaningful even when it didn’t apply back-pressure. For example, a dry run mode, or diverting messages to a quarantine.

But it was strange that moving averages packed more information into the same space as leaky buckets. Back then I didn’t really understand how the space was being wasted. Now I know!