Following my previous post on rate limiting with GCRA, leaky buckets without the buckets, I reviewed my old notes on rate limiting for Exim. I thought I should do a new write-up of the ideas that I hope will be more broadly interesting.
Exponential rate limiting uses an exponentially-weighted moving average to measure the client’s rate. It is motivated by a shift of perspective:
- first measure the client’s rate,
- then compare it to the limit.
Algorithms like GCRA and leaky bucket don’t allow you to separate these two steps, because they don’t measure the client’s rate as a concrete number.
A moving average allows more flexible policy enforcement because the rate measurement is meaningful even when you don’t apply back-pressure. For example, it’s useful in a dry run mode, or when diverting messages to a quarantine.
An exponential rate limiter stores, for each client:
- last update time
- average rate
This is about the same amount of space as a leaky bucket. GCRA uses less space because it only needs to store a time.
The main disadvantage is that an exponential rate limiter needs fairly complicated floating point arithmetic.
configuration parameters
A rate limiter can have three configuration parameters:
- a maximum rate
- a burst size that allows a sender to briefly exceed the rate limit
- an averaging period that determines how quickly past behaviour is forgotten
In a linear rate limiter like GCRA or leaky bucket, the period is fixed as burst / rate owing to the way the model works.
An exponential rate limiter has two parameters:
- a limit which is also the maximum burst size
- an averaging period
The maximum rate is limit / period.
For example, I might set limit = 600 requests with period = 1 hour. If I want to allow the same long-term average rate (10 requests per minute) but with a smaller burst size, I might set limit = 10 requests with period = 1 minute.
Deriving the max rate from the other two parameters makes the algorithm easy to configure, and it turns out to simplify the mathematical model very nicely.
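As a rough illustration of this configuration style, here is a minimal sketch in Python; the RateLimit class and its field names are my own invention, not part of the algorithm.

    from dataclasses import dataclass

    @dataclass
    class RateLimit:
        limit: float   # maximum burst size, in cost units
        period: float  # averaging period, in seconds

        @property
        def max_rate(self) -> float:
            # the long-term average rate is derived, not configured directly
            return self.limit / self.period

    # 600 requests per hour, and the same average rate with a smaller burst
    hourly = RateLimit(limit=600, period=3600)
    minutely = RateLimit(limit=10, period=60)
    assert hourly.max_rate == minutely.max_rate  # both 1/6 of a request per second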
algorithm
- each client has a stored update time and rate
- a request has a cost, which is typically 1 for fixed-cost requests, or (for example) the request size in bytes when limiting bandwidth
- when a request arrives, get the client’s details

        t_prev = client ? client.time : 0
        r_prev = client ? client.rate : 0

- calculate the interval since the previous request, relative to the averaging period

        interval = (t_now - t_prev) / period

- clamp the interval to avoid division by zero

        interval = max(interval, 1.0e-10)

- the exponential smoothing weight is explained below

        alpha = exp(-interval)

- the instantaneous rate, measured in cost per period

        r_inst = cost / interval

- the updated average rate is

        r_now = (1 - alpha) * r_inst + alpha * r_prev

- ensure rare requests are counted in full

        r_now = max(r_now, cost)
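Putting those steps together, here is a minimal sketch in Python. It only computes the new average; deciding whether to store it and whether to allow the request is left to the enforcement step below. The function name and signature are mine, not from any particular implementation.

    import math

    def measure_rate(t_prev: float, r_prev: float,
                     t_now: float, cost: float, period: float) -> float:
        """Return the client's updated average rate, in cost per period.

        t_prev and r_prev are the client's stored update time and rate,
        both 0 for a client we have not seen before.
        """
        # interval since the previous request, relative to the averaging period
        interval = (t_now - t_prev) / period
        # clamp the interval to avoid division by zero
        interval = max(interval, 1.0e-10)
        # exponential smoothing weight
        alpha = math.exp(-interval)
        # instantaneous rate, in cost per period
        r_inst = cost / interval
        # exponentially weighted moving average of the rate
        r_now = (1 - alpha) * r_inst + alpha * r_prev
        # ensure rare requests are counted in full
        return max(r_now, cost)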
behaviour
When a client starts making requests very fast, its average rate (r_prev and r_now) increases by close to the cost each time, so it will hit the limit after close to limit / cost requests.
When the client’s rate is more modest, or closer to its measured average, the average changes by a smaller amount for each request.
When the client slows down, its measured rate decays exponentially towards the new level.
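To make this concrete, here is a small simulation using the measure_rate sketch above; the numbers are my own and only illustrative.

    # limit = 100 requests per period = 1 hour (3600 seconds)
    period, cost = 3600.0, 1.0
    t, r = 0.0, 0.0

    # a fast burst: 100 requests one second apart
    for _ in range(100):
        t += 1.0
        r = measure_rate(t - 1.0, r, t, cost, period)
    print(r)  # close to 100: each request added roughly its full cost

    # after an hour of silence, the measured rate has decayed
    r = measure_rate(t, r, t + 3600.0, cost, period)
    print(r)  # roughly 37: the old rate multiplied by 1/e, plus this request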
enforcement
When a client exceeds its limit, how long must it wait before it can try again and its request will be allowed?
t_next = t_now + period * ln(r_now / limit)
The decision to allow or deny a request is separate from calculating the client’s average rate. It will typically look like,
if r_now > limit:
    return DENY(t_next)
client.time = t_now
client.rate = r_now
return ALLOW
This implements a “leaky” policy that measures the rate of requests that are allowed, without increasing the client’s rate for requests that are denied. This is usually the right policy when DENY causes backpressure and clients are expected to retry denied requests.
You can implement a “strict” policy by updating the client’s stored rate for both denied and allowed requests. A “strict” policy is often appropriate when there is no backpressure. I used it when quarantining email messages from end-users whose accounts might have been compromised to send spam, or who might have been sending a quarterly newsletter.
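Here is one way the two policies might be wired up around the measure_rate sketch from earlier. The Decision type, the clients dict, and the strict flag are my own illustrative choices, not part of the algorithm.

    import math
    from typing import NamedTuple, Optional

    class Decision(NamedTuple):
        allowed: bool
        retry_at: Optional[float] = None  # earliest time a denied client should retry

    # per-client state: client id -> (last update time, average rate)
    clients: dict[str, tuple[float, float]] = {}

    def check(client_id: str, t_now: float, cost: float,
              limit: float, period: float, strict: bool = False) -> Decision:
        t_prev, r_prev = clients.get(client_id, (0.0, 0.0))
        r_now = measure_rate(t_prev, r_prev, t_now, cost, period)
        if r_now > limit:
            # penalty time: how long the client must stay quiet
            t_next = t_now + period * math.log(r_now / limit)
            if strict:
                # "strict": denied requests also count towards the client's rate
                clients[client_id] = (t_now, r_now)
            return Decision(False, t_next)
        # "leaky": only allowed requests update the stored rate
        clients[client_id] = (t_now, r_now)
        return Decision(True)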
rationale
The next few subsections explain how the algorithm works in more detail. You don’t need to read them to successfully use exponential rate limiting.
very slow clients
When a client returns after a long gap, the interval is very large, which means alpha is small, and r_inst is small. As a result r_now becomes very small.
This is unhelpful in practice: it effectively means the client’s first request is not counted. A more useful way to handle an isolated request is to say its rate is the cost of the request per the period. That way it gets treated like the first request of a fast burst.
The algorithm implements this logic by ensuring that the average rate is at least as large as the cost of the current request.
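To see the size of the effect, here is a quick calculation with illustrative numbers of my own: period = 1 hour, cost = 1, and a client whose previous rate has decayed to nothing returning after 10 hours.

    import math

    interval = 10.0                 # ten periods since the previous request
    alpha = math.exp(-interval)     # about 4.5e-5, the past is almost forgotten
    r_inst = 1.0 / interval         # 0.1 cost per period
    r_raw = (1 - alpha) * r_inst + alpha * 0.0
    print(r_raw)                    # about 0.1: the request barely registers
    print(max(r_raw, 1.0))          # the final clamp counts it as one full request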
varying intervals
Where does the exponential smoothing weight exp(-interval) come from in the algorithm above?
We are using the usual formula for an exponentially weighted moving average,
r_now = (1 - alpha) * r_inst + alpha * r_prev
Moving averages are commonly calculated over fixed-size intervals, so typically alpha is also fixed. The subexpression alpha * r_prev says how much to forget past behaviour. Each time a fixed-size interval passes, the old rate gets multiplied by alpha again: that is, the forgetfulness scales exponentially with time.
In our scenario, we want to update our measurement of the rate of requests each time a request occurs, at irregular and unpredictable intervals. So our alpha must vary exponentially with the interval. We derive it using the time since the previous request as a power of some base.
t_delta = t_now - t_prev
alpha = pow(base, t_delta)
We set the base using the configured averaging period. I previously said somewhat vaguely that the period determines how quickly past behaviour is forgotten. In an exponential rate limiter it is the time for 63% forgetfulness, that is, 1 − 1/e.
pow(base, period) == exp(-1.0)
exp(period * ln(base)) == exp(-1.0)
ln(base) == -1.0 / period
pow(base, t_delta) == exp(t_delta * ln(base))
Therefore,
alpha = exp(-t_delta / period)
This matches the exp(-interval) weight used in the algorithm, because interval = t_delta / period.
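As a quick sanity check with an arbitrary period of my choosing: after exactly one period of silence the old rate is multiplied by 1/e, so about 63% of it has been forgotten.

    import math

    period = 3600.0
    t_delta = period                      # exactly one averaging period
    alpha = math.exp(-t_delta / period)   # exp(-1), about 0.368
    print(1 - alpha)                      # about 0.632, the 63% forgetfulness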
penalty time
When a client exceeds its limit, it must wait for some time doing nothing before its request will be allowed. The wait is the time, relative to the averaging period, for the client’s rate to decay down to the limit when it makes no more requests, so r_inst contributes nothing and alpha = exp(-wait). It is derived as follows:
limit == (1 - alpha) * 0 + alpha * r_now
limit / r_now == exp(-wait)
ln(limit / r_now) == -wait
wait = ln(r_now / limit)
This wait is relative to the averaging period, so it gets multiplied by the period to calculate the next permitted time.
t_next = t_now + period * wait
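For example, with illustrative numbers of my own: limit = 600, period = 1 hour, and a measured rate of 900.

    import math

    limit, period = 600.0, 3600.0
    r_now = 900.0
    wait = math.log(r_now / limit)   # about 0.405 averaging periods
    print(period * wait / 60.0)      # about 24 minutes until the next permitted request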
fast bursts
Basing the forgetfulness on e seems somewhat arbitrary: why not make the forgetfulness 50% (a half life) or 90% instead of 63%?
Another seemingly arbitrary choice is to measure rates in cost per period instead of per second.
It turns out that these choices fit together neatly: fast requests are counted at their full cost, so a client will hit its limit when expected.
The updated rate is calculated as
r_now = (1 - alpha) * r_inst + alpha * r_prev
When the interval is very small, alpha is very nearly 1.0, and as a result the calculation turns out to be counting approximately linearly towards the limit
r_now = cost + r_prev
The second subexpression is obvious but the first one is surprising!
Let’s unpack it.
(1 - alpha) * r_inst
== (1 - exp(-interval)) * cost / interval
Factor out the cost; the surprise is that
1 ≈ (1 - exp(-interval)) / interval
This property comes from the fact that the gradient of exp(x) is 1 when x is 0. To show why this is so, I need some basic calculus:
y ≡ f(x) ≡ exp(x)
δy ≡ f(x + δx) − f(x)
δx ≡ −interval
So, when x is zero and δx is small,
(1 − exp(δx)) / −δx
== (exp(δx) − 1) / δx
== (exp(0 + δx) − exp(0)) / δx
== ( f(0 + δx) − f(0) ) / δx
== δy / δx
≈ dy / dx = exp(x) = 1
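A quick numerical check of the approximation, with intervals of my own choosing:

    import math

    for interval in (0.001, 0.01, 0.1):
        print(interval, (1 - math.exp(-interval)) / interval)
    # prints values close to 1: roughly 0.9995, 0.995, and 0.95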
discussion
I originally developed this algorithm in 2005-2006 to throttle outgoing email spam floods. It turned out to be easy to use and produced results that made sense. (More so after I added the slow client tweak!)
I have gone into some detail to explain how the algorithm is derived and how it behaves. It was years before I understood some of the mathematics properly, because I accidentally landed on a sweet spot through some combination of luck and applied mathematical superstition – 1/e is more beautiful than 1/2 or 1/10, right?
I don’t know if I reinvented exponential rate limiting, or if there are other similar algorithms out there. When I was working on it I was not able to find much literature on exponentially weighted moving averages with varying intervals, so I laboriously worked it out for myself.
I would love to hear about any other uses of exponential rate limiting!