exponential rate limiting

Following my previous post on rate limiting with GCRA, leaky buckets without the buckets, I reviewed my old notes on rate limiting for Exim. I thought I should do a new write-up of the ideas that I hope will be more broadly interesting.

Exponential rate limiting uses an exponentially-weighted moving average to measure the client’s rate. It is motivated by a shift of perspective:

first measure the client’s rate,
then compare it to the limit.

Algorithms like GCRA and leaky bucket don’t allow you to separate these two points because they don’t measure the client’s rate as a concrete number.

A moving average allows more flexible policy enforcement because the rate measurement is meaningful even when you don’t apply back-pressure. For example, it’s useful in a dry run mode, or when diverting messages to a quarantine.

An exponential rate limiter stores, for each client:

last update time
average rate

This is a similar amount of space as leaky bucket. GCRA uses less space because it only needs to store a time.

The main disadvantage is that an exponential rate limiter needs fairly complicated floating point arithmetic.

configuration parameters
algorithm
behaviour
enforcement
rationale
discussion

configuration parameters

A rate limiter can have three configuration parameters:

a maximum rate
a burst size that allows a sender to briefly exceed the rate limit
an averaging period that determines how quickly past behaviour is forgotten

In a linear rate limiter like GCRA or leaky bucket, the period is fixed as burst / rate owing to the way the model works.

An exponential rate limiter has two parameters:

a limit which is also the maximum burst size
an averaging period

The maximum rate is limit / period.

For example, I might set limit = 600 requests per period = 1 hour. If I want to allow the same long-term average rate, but with a smaller burst size, I might set limit = 10 requests per period = 1 minute.

Deriving the max rate from the other two parameters makes the algorithm easy to configure, and it turns out to simplify the mathematical model very nicely.

algorithm

each client has a stored update time and rate
a request has a cost, which is typically 1 for fixed-cost requests, or (for example) the request size in bytes when limiting bandwidth

when a request arrives, get the client’s details

t_prev = client ? client.time : 0
r_prev = client ? client.rate : 0

calculate the interval since the previous request, relative to the averaging period
```
interval = (t_now - t_prev) / period
```
clamp the interval to avoid division by zero
```
interval = max(interval, 1.0e-10)
```
the exponential smoothing weight is explained below
```
alpha = exp(-interval)
```
the instantaneous rate, measured in cost per period
```
r_inst = cost / interval
```

the updated average rate is

r_now = (1 - alpha) * r_inst + alpha * r_prev

ensure rare requests are counted in full
```
r_now = max(r_now, cost)
```

behaviour

When a client starts making requests very fast, its average rate (r_prev and r_now) increases by close to the cost each time, so it will hit the limit after close to limit / cost requests.

When the client’s rate is more modest, or closer to its measured average, the average changes by a smaller amount for each request.

When the client slows down, its measured rate decays exponentially towards the new level.

enforcement

When a client exceeds its limit, how long must it wait before it can try again and its request will be allowed?

    t_next = t_now + period * ln(r_now / limit)

The decision to allow or deny a request is separate from calculating the client’s average rate. It will typically look like,

    if r_now > limit:
        return DENY(t_next)

    client.time = t_now
    client.rate = r_now
    return ALLOW

This implements a “leaky” policy that measures the rate of requests that are allowed, without increasing the client’s rate for requests that are denied. This is usually the right policy when DENY causes backpressure and clients are expected to retry denied requests.

You can implement a “strict” policy by updating the client’s stored rate for both denied and allowed requests. A “strict” policy is often appropriate when there is no backpressure. I used it when quarantining email messages from end-users whose accounts might have been compromised to send spam, or who might have been sending a quarterly newsletter.

rationale

The next few subsections explain how the algorithm works in more detail. You don’t need to read them to successfully use exponential rate limiting.

very slow clients

When a client returns after a long gap, the interval is very large, which means alpha is small, and r_inst is small. As a result r_now becomes very small.

This is unhelpful in practice: it effectively means the client’s first request is not counted. A more useful way to handle an isolated request is to say its rate is the cost of the request per the period. That way it gets treated like the first request of a fast burst.

The algorithm implements this logic by ensuring that the average rate is at least as large as the cost of the current request.

varying intervals

Where does the exponential smoothing weight exp(-interval) come from in the algorithm above?

We are using the usual formula for an exponentially weighted moving average,

    r_now = (1 - alpha) * r_inst + alpha * r_prev

Moving averages are commonly calculated over fixed-size intervals, so typically alpha is also fixed. The subexpression alpha * r_prev says how much to forget past behaviour. Each time a fixed-size interval passes, the old rate gets multiplied by alpha again: that is, the forgetfulness scales exponentially with time.

In our scenario, we want to update our measurement of the rate of requests each time a request occurs, at irregular and unpredictable intervals. So our alpha must vary exponentially with the interval. We derive it using the time since the previous request as a power of some base.

    t_delta = t_now - t_prev

    alpha = pow(base, t_delta)

We set the base using the configured averaging period. I previously said somewhat vaguely that the period determines how quickly past behaviour is forgotten. In an exponential rate limiter it is the time for 63% forgetfulness.

    pow(base, period) == exp(-1.0)

    exp(period * ln(base)) == exp(-1.0)

    ln(base) == -1.0 / period

    pow(base, t_delta) == exp(t_delta * ln(base))

Therefore,

    alpha = exp(-t_delta / period)

penalty time

When a client exceeds its limit, it must wait for some time doing nothing before its request will be allowed. The wait is derived as follows:

    limit == (1 - alpha) * 0 + alpha * r_now

    limit / r_now == exp(-wait)

    ln(limit / r_now) == -wait

    wait = ln(r_now / limit)

This wait is relative to the averaging period, so it gets multiplied by the period to calculate the next permitted time.

    t_next = t_now + period * wait

fast bursts

Basing the forgetfulness on e seems somewhat arbitrary: why not make the forgetfulness 50% (a half life) or 90% instead of 63%?

Another seemingly arbitrary choice is to measure rates in cost per period instead of per second.

It turns out that these choices fit together neatly so that fast requests are counted at their full cost, so a client will hit its limit when expected.

The updated rate is calculated as

    r_now = (1 - alpha) * r_inst + alpha * r_prev

When the interval is very small, alpha is very nearly 1.0, and as a result the calculation turns out to be counting approximately linearly towards the limit

    r_now = cost + r_prev

The second subexpression is obvious but the first one is surprising!

Let’s unpack it.

        (1 - alpha) * r_inst
     == (1 - exp(-interval)) * cost / interval

Factor out the cost; the surprise is that

   1 ≈≈ (1 - exp(-interval)) / interval

This property comes from the fact that the gradient of e^x is 1 when x is 0. To show why this is so, I need some basic calculus:

y ≡ f(x) ≡ e^x
δy ≡ f(x + δx) − f(x)
δx ≡ −interval

So, when x is zero and δx is small,

(1 − e^δx) / −δx
= (e^δx − 1) / δx
= (e^{0 + δx} − e⁰) / δx
= ( f(0 + δx) − f(0) ) / δx
= δy / δx
≈ dy / dx
= e^x
= 1

discussion

I originally developed this algorithm in 2005-2006 to throttle outgoing email spam floods. It turned out to be easy to use and produced results that made sense. (More so after I added the slow client tweak!)

I have gone into some detail to explain how the algorithm is derived and how it behaves. It was years before I understood some of the mathematics properly, because I accidentally landed on a sweet spot through some combination of luck and applied mathematical supersition – 1/e is more beautiful than 1/2 or 1/10, right?

I don’t know if I reinvented exponential rate limiting, or if there are other similar algorithms out there. When I was working on it I was not able to find much literature on exponentially weighted moving averages with varying intervals, so I laboriously worked it out for myself.

I would love to hear about any other uses of exponential rate limiting!

	(1 − e^δx) / −δx
=	(e^δx − 1) / δx
=	(e^{0 + δx} − e⁰) / δx
=	( f(0 + δx) − f(0) ) / δx
=	δy / δx
≈	dy / dx
=	e^x
=	1