Tony Finch – blog


Last year I wrote a pair of articles about ratelimiting: one about GCRA and the leaky bucket, and one about exponential ratelimiting.

Recently, Chris “cks” Siebenmann has been working on ratelimiting HTTP bots that are hammering his blog. His articles prompted me to write some clarifications, plus a few practical anecdotes about ratelimiting email.

mea culpa

The main reason I wrote the GCRA article was to explain GCRA better without the standard obfuscatory terminology, and to compare GCRA with a non-stupid version of the leaky bucket algorithm. It wasn’t written with my old exponential ratelimiting in mind, so I didn’t match up the vocabulary.

In the exponential ratelimiting article I tried to explain how the different terms correspond to the same ideas, but I botched it by trying to be too abstract. So let’s try again.

parameters

It’s simplest to configure these ratelimiters (leaky bucket, GCRA, exponential) with two parameters: a limit, which is a number of requests, and a period, which is a length of time.

The maximum permitted average rate is calculated from these parameters by dividing them:

    rate = limit / period

The period is the time over which client behaviour is averaged, which is also how long it takes for the ratelimiter to forget past behaviour. In my GCRA article I called it the window. Linear ratelimiters (leaky bucket and GCRA) are 100% forgetful after one period; the exponential ratelimiter is about 63% forgetful, since a fraction 1/e of its memory remains.

The limit does double duty: as well as setting the maximum average rate (measured in requests per period) it sets the maximum size (measured in requests) of a fast burst of requests following a sufficiently long quiet gap.

how bursty

You can increase or decrease the burst limit – while keeping the average rate limit the same – by scaling both the limit and the period up or down by the same factor.

For example, I might set limit = 600 requests per period = 1 hour. If I want to allow the same average rate, but with a smaller burst size, I might set limit = 10 requests per period = 1 minute.
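To see the trade-off concretely, here is a minimal sketch of a GCRA-style limiter, using the earliest-permitted-time update described under “variable cost” below. (The dict-based state is illustrative, not production code.)

```python
def make_limiter(limit, period):
    """GCRA-style limiter allowing `limit` requests per `period` seconds."""
    state = {"time": float("-inf")}  # client's earliest permitted time

    def allow(now):
        # forget behaviour more than one period old
        t = max(state["time"], now - period)
        if t >= now:
            return False  # over limit
        state["time"] = t + period / limit  # spend part of the window
        return True

    return allow

# Same average rate (10 requests per minute), different burst sizes:
hourly = make_limiter(600, 3600)  # bursts of up to 600
minutely = make_limiter(10, 60)   # bursts of up to 10
```

After a long quiet gap the first configuration accepts a burst of 600 requests and the second only 10, but both settle down to one request every 6 seconds.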

anecdote

When I was looking after email servers, I set ratelimits for departmental mail servers to catch outgoing spam in case of compromised mail accounts or web servers. I sized these limits to a small multiple of the normal traffic so that legitimate mail was not delayed but spam could be stopped fairly quickly.

A typical setting was 200/hour, which is enough for a medium-sized department. (As a rule of thumb, expect people to send about 10 messages per day.)

An hourly limit is effective at catching problems quickly during typical working hours, but it can let out a lot of spam over a weekend.

So I would also set a second backstop limit like 1000/day, based on average daily traffic instead of peak hourly traffic. It’s a lower average rate that doesn’t forget bad behaviour so quickly, both of which help with weekend spam.
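A sketch of that layered setup, assuming GCRA-style limiters. The check-then-spend split is my own framing here, so that a rejection by one tier doesn’t consume the other tier’s allowance:

```python
class Gcra:
    """GCRA-style limiter: `limit` requests per `period` seconds."""
    def __init__(self, limit, period):
        self.limit, self.period = limit, period
        self.time = float("-inf")  # earliest permitted time

    def check(self, now):
        return max(self.time, now - self.period) < now

    def spend(self, now):
        self.time = max(self.time, now - self.period) + self.period / self.limit

hourly = Gcra(200, 3600)   # catches problems quickly in working hours
daily = Gcra(1000, 86400)  # backstop: lower average rate, longer memory

def allow(now):
    # accept only if both tiers have room, then charge both
    if hourly.check(now) and daily.check(now):
        hourly.spend(now)
        daily.spend(now)
        return True
    return False
```

During normal weeks the hourly limit does the work; over a quiet-then-spammy weekend the daily limit binds first, because it forgets bad behaviour much more slowly.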

variable cost

Requests are not always fixed-cost. For example, you might want to count the request size in bytes when ratelimiting bandwidth.

The exponential algorithm calculates the instantaneous rate as

    r_inst = cost / interval

where cost is the size of the request and interval is the time since the previous request.
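For illustration, here is a sketch of an exponentially-smoothed rate measurement with variable cost. The smoothing formula shown is the standard exponentially weighted moving average for irregular intervals, which is my assumption about the details; see the exponential ratelimiting article for the real thing.

```python
import math

def make_exponential(limit, period):
    """Exponentially-smoothed rate measurement: over limit when the
    measured rate exceeds limit/period. A sketch, not exim's code."""
    state = {"rate": 0.0, "last": None}

    def over_limit(now, cost=1.0):
        if state["last"] is None:
            state["last"] = now  # first sighting: nothing to average yet
            return False
        interval = max(now - state["last"], 1e-9)  # avoid division by zero
        r_inst = cost / interval                   # instantaneous rate
        a = math.exp(-interval / period)           # smoothing factor
        state["rate"] = (1 - a) * r_inst + a * state["rate"]
        state["last"] = now
        return state["rate"] > limit / period

    return over_limit
```

A client sending faster than the limit drifts over it within roughly one period; a client sending below the limit never trips it.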

I’ve edited my GCRA algorithm to make the cost of requests clearer. In GCRA a request uses up some part of the client’s window, a nominal time measured in seconds. To convert a request’s size into time spent:

    spend = cost / rate

So the client’s earliest permitted time should be updated like:

    time += cost * period / limit

(In constrained implementations the period / limit factor can be precomputed.)
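As a concrete sketch, here is a bandwidth limiter where the limit is measured in bytes per period, applying the update rule above. The accept-then-overshoot behaviour – a large request is accepted while any allowance remains and paid for afterwards – is one possible policy, not the only one:

```python
def make_bandwidth_limiter(limit, period):
    """Allow up to `limit` bytes per `period` seconds, GCRA-style."""
    state = {"time": float("-inf")}  # earliest permitted time

    def allow(now, cost):
        t = max(state["time"], now - period)  # forget old behaviour
        if t >= now:
            return False
        # a request is accepted whenever any allowance remains, even if
        # its cost overshoots the window; the debt is paid off over time
        state["time"] = t + cost * period / limit
        return True

    return allow
```

For example, `make_bandwidth_limiter(1_000_000, 60)` permits about 1 MB per minute, with bursts of up to 1 MB after a quiet minute.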

how lenient

When a client has used up its burst limit and is persistently making requests faster than its rate limit, the way the ratelimiter accepts or rejects requests is affected by the way it updates its memory of the client.

For exim’s ratelimit feature I provided two modes called “strict” and “leaky”. There is a third possibility: an intermediate mode which I will call “forgiving”.

I only realised yesterday, from the discussion with cks, how a “forgiving” mode can be useful for the exponential ratelimiter, and how it corresponds to the less-lenient mode of linear leaky bucket and GCRA ratelimiters. (I didn’t describe the less-lenient mode in my GCRA article.)
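To make the modes concrete, here is one way to sketch all three in GCRA terms. The “leaky” and “strict” behaviours follow exim’s ratelimit documentation; the exact “forgiving” rule shown – rejected requests count against the client, but the debt is capped at one period ahead – is an illustrative choice:

```python
def make_mode_limiter(limit, period, mode="leaky"):
    """GCRA-style limiter; `mode` controls how rejections update memory."""
    state = {"time": float("-inf")}  # earliest permitted time

    def allow(now, cost=1):
        spend = cost * period / limit
        t = max(state["time"], now - period)
        if t < now:
            state["time"] = t + spend
            return True
        # over limit: what happens next depends on the mode
        if mode == "strict":
            # rejected requests still count, so the debt grows without
            # bound and the client must go quiet for a long time
            state["time"] = t + spend
        elif mode == "forgiving":
            # rejected requests count, but the debt is capped at one
            # period ahead, like a leaky bucket that is simply full
            state["time"] = min(t + spend, now + period)
        # "leaky": rejected requests are free; retrying is harmless
        return False

    return allow
```

A client that keeps hammering a “forgiving” limiter recovers after about one quiet period; against a “strict” limiter its wait keeps growing.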

anecdote

One of the hardest things about ratelimiting email was coming up with a policy that didn’t cause undue strife and unnecessary work.

When other mail servers (like the departmental server in the anecdote above) were sending mail through my relays, it made sense to use “leaky” ratelimiter mode with SMTP 450 temporary rejects. When there was a flood of mail, messages would be delayed and retried automatically.

When their queue size alerts went off, the department admin could take a look and respond as appropriate.

That policy worked fairly well.

However, when the sender was an end user submitting from an MUA via my message submission servers, they usually were not running software that could gracefully handle an SMTP 450 temporary rejection.

The most difficult cases were the various college and department alumni offices. Many of them would send out annual newsletters, using some dire combination of Microsoft Excel / Word / Outlook mailmerge, operated by someone with limited ability to repair a software failure.

In that situation, SMTP 450 errors broke their mailshots, causing enormous problems for the alumni office and their local IT support. (Not nice to realise I caused such trouble!)

The solution was to configure the ratelimiter in “strict” mode and “freeze” or quarantine over-limit bulk mail from MUAs. The “strict” mode ensured that everything after the initial burst of a spam run was frozen.

When the alert was triggered I inspected a sample of the frozen messages. If they were legitimate newsletters, I could thaw them for delivery and reset the user’s ratelimit. In almost all cases the user would not be disturbed.

If it turned out the user’s account was compromised and used to send spam, then I could get their IT support to help sort it out, and delete the frozen junk from the quarantine.

That policy worked OK: I was the only one who had to deal with my own false positives, and they were tolerably infrequent.