In this note I’ll show why the rate limit algorithms GCRA, leaky bucket, and token bucket behave the same.
The parameters of the algorithms are a time window and a maximum quota of usage (e.g. requests or bytes) per window. The quota limits the size of a fast burst of requests. The maximum sustained rate is,
rate = quota / window
Leaky bucket and token bucket store the time of the previous request, and a bucket counting the available capacity. The rate determines how quickly capacity becomes available.
It’s easy to see that leaky bucket and token bucket are equivalent, because they simply count in opposite directions:
leaky -= (now - previous) * rate
leaky = clamp(0, leaky, quota)
leaky += cost
previous = now
return leaky < quota ? ALLOW : DENY
tokens += (now - previous) * rate
tokens = clamp(0, tokens, quota)
tokens -= cost
previous = now
return tokens > 0 ? ALLOW : DENY
GCRA tracks a “not-before” time, and allows requests that occur after that point in time. The not-before time is normally in the recent past, and requests increase it towards and possibly (when the client is over its limit) beyond the present time.
time = clamp(now - window, time, now)
time += cost / rate
return time < now ? ALLOW : DENY
It’s not trivially obvious that GCRA is equivalent to the other two. But we can convert the GCRA code into the leaky bucket code with a few transformations, as follows.
We can change from absolute time to relative time by taking now away
from the equations:
bucket = clamp(-window, time, 0)
bucket += cost / rate
return bucket < 0 ? ALLOW : DENY
But that change is incomplete: the stored bucket is relative to the
time of the previous request. To make it relative to the current time,
we need to remember when the previous request occurred, and after
retrieving the bucket we need to shift it to account for the passage
of time:
bucket -= now - previous
bucket = clamp(-window, time, 0)
bucket += cost / rate
previous = now
return bucket < 0 ? ALLOW : DENY
Now we will change units to count tokens instead of seconds. We can do
that by multiplying bucket by the rate, which is measured in
tokens per second. Note that quota = window * rate.
bucket -= (now - previous) * rate
bucket = clamp(-quota, time, 0)
bucket += cost
previous = now
return bucket < 0 ? ALLOW : DENY
Finally, add quota to bucket because positive numbers are nicer,
and we end up with the leaky bucket code.
bucket -= (now - previous) * rate
bucket = clamp(0, time, quota)
bucket += cost
previous = now
return bucket < quota ? ALLOW : DENY
Before writing this down I had a handwavey belief that these algoritms are equivalent – “it’s obvious!” enthusiatic gesturing. The tricky part was to explain the shift from relative to absolute time – why does GCRA use less storage? how does the passage of time become implicit? – so it was worth taking the time to make it as clear as I could.
The standard ITU GCRA variable names are:
-
Λ, the maximum rate, aka quota / window
-
T, the emission interval, 1 / Λ, aka window / quota
-
ta, the arrival time, aka
now -
τ, the jitter tolerance; τ + T == quota
-
TAT, the theoretical arrival time
ITU GCRA compares TAT - τ < ta where my GCRA compares
time < now
My “not-before” time is roughly TAT - τ, but ITU GCRA does the clamp and increment in a different order, so the actual not-before time is TAT - τ - T. This corresponds to the difference between τ and quota.
Tony Finch – blog