GCRA vs leaky / token buckets

In this note I’ll show why the rate limit algorithms GCRA, leaky bucket, and token bucket behave the same.

The parameters of the algorithms are a time window and a maximum quota of usage (e.g. requests or bytes) per window. The quota limits the size of a fast burst of requests. The maximum sustained rate is,

    rate = quota / window

Leaky bucket and token bucket store the time of the previous request, and a bucket counting the available capacity. The rate determines how quickly capacity becomes available.

It’s easy to see that leaky bucket and token bucket are equivalent, because they simply count in opposite directions:

    leaky -= (now - previous) * rate
    leaky = clamp(0, leaky, quota)
    leaky += cost
    previous = now
    return leaky < quota ? ALLOW : DENY

    tokens += (now - previous) * rate
    tokens = clamp(0, tokens, quota)
    tokens -= cost
    previous = now
    return tokens > 0 ? ALLOW : DENY

GCRA tracks a “not-before” time, and allows requests that occur after that point in time. The not-before time is normally in the recent past, and requests increase it towards and possibly (when the client is over its limit) beyond the present time.

    time = clamp(now - window, time, now)
    time += cost / rate
    return time < now ? ALLOW : DENY

It’s not trivially obvious that GCRA is equivalent to the other two. But we can convert the GCRA code into the leaky bucket code with a few transformations, as follows.

We can change from absolute time to relative time by taking now away from the equations:

    bucket = clamp(-window, time, 0)
    bucket += cost / rate
    return bucket < 0 ? ALLOW : DENY

But that change is incomplete: the stored bucket is relative to the time of the previous request. To make it relative to the current time, we need to remember when the previous request occurred, and after retrieving the bucket we need to shift it to account for the passage of time:

    bucket -= now - previous
    bucket = clamp(-window, time, 0)
    bucket += cost / rate
    previous = now
    return bucket < 0 ? ALLOW : DENY

Now we will change units to count tokens instead of seconds. We can do that by multiplying bucket by the rate, which is measured in tokens per second. Note that quota = window * rate.

    bucket -= (now - previous) * rate
    bucket = clamp(-quota, time, 0)
    bucket += cost
    previous = now
    return bucket < 0 ? ALLOW : DENY

Finally, add quota to bucket because positive numbers are nicer, and we end up with the leaky bucket code.

    bucket -= (now - previous) * rate
    bucket = clamp(0, time, quota)
    bucket += cost
    previous = now
    return bucket < quota ? ALLOW : DENY

Before writing this down I had a handwavey belief that these algoritms are equivalent – “it’s obvious!” enthusiatic gesturing. The tricky part was to explain the shift from relative to absolute time – why does GCRA use less storage? how does the passage of time become implicit? – so it was worth taking the time to make it as clear as I could.

The standard ITU GCRA variable names are:

Λ, the maximum rate, aka quota / window
T, the emission interval, 1 / Λ, aka window / quota
t_a, the arrival time, aka now
τ, the jitter tolerance; τ + T == quota
TAT, the theoretical arrival time

ITU GCRA compares TAT - τ < t_a where my GCRA compares time < now

My “not-before” time is roughly TAT - τ, but ITU GCRA does the clamp and increment in a different order, so the actual not-before time is TAT - τ - T. This corresponds to the difference between τ and quota.