NNRPD Exponential Backoff

This was my first solution to the problem of "resource abuse". Resource abuse is when one or more users decide to post an order of magnitude or more articles than the average poster. Classic examples of resource abusers are "spammers".

You may find a tarball with the patches at EarthLink Network.

Initial Design

The design goal was to patch NNRPD such that no single IP address could post more than a certain number of articles per hour. The first step was to find out what that "certain number" should be. Since I am a prolific poster, I figured that if I measured myself I would get a good first guess as to what that number should be.

On a good day (i.e. much stupidity in NANA) I can post about 1 message every 2 minutes (30 per hour), but this rate falls off when I get tired. If you look at this over a 24 hour period, you get basically 1 or 2 postings per hour. Clearly an exponential backoff algorithm would be sufficient to allow me to post at my prolific rate, and still slow down a spammer's posting robot.

Thus, I came up with the following algorithm.

  * To compute the new sleep time, the previous sleep time is, for most
  * cases multiplied by a factor (K). K is computed based on the
  * difference between this time and the last posting time in seconds.
  *
  * If this difference is less than POST_FAST then K is K_INC. 
  *
  * If the difference is between POST_FAST and POST_SLOW then K is 1 and 
  *    K_NOM is added to the time each time.
  *
  * If the difference is greater than POST_SLOW then K is 1/K_DEC.
  *
  * Before sleeptimes are passed to sleep(), they are divided by K_DIV.
  * This allows for scaling the initial exponential growth of sleep time
  * so that one can post a certain number of messages before invoking a 
  * penalty.  Set K_DIV = 2**N where N = Number of posts before backoff 
  * begins.

  #define K_INC  2
  #define K_NOM  5
  #define K_DEC  4
  #define K_DIV  1024
  #define POST_FAST 150
  #define POST_SLOW 3600
Note the three regions of behavior, characterized by the difference between this posting time and the last one (dPOST). The only time the sleeptime decreases is when you haven't posted anything in an hour, otherwise the sleeptime continues to increase.

This algorithm suffered from the problem of undesired feedback, since the sleeptime contributes to dPOST.

Algorithm Redesign

About this time I was doing work as a consultant for EarthLink Network, a major ISP. Given their situation with spam, it seemed a natural solution for them to embrace.

However, Earthlink did not like the algorithm I initially picked. Their major concern was the dynamic IP space that they had to deal with; this implied that users would be unfairly backed off.

Here was the new algorithm we came up with:

 * To compute the new sleep time, the previous sleep time is, for most
 * cases multiplied by a factor (backoff_k). backoff_k is computed based on the
 * difference between this time and the last posting time in seconds.
 *
 * If this difference is less than POST_FAST then backoff_k is K_INC. 
Due to this dynamic IP space they had to deal with, Earthlink required posters to be backed off faster than was evident by my own metrics. However, they also required a faster return to the 0 backoff time state.

Drawbacks to this system

Clearly those who do offline news reading suffer from this algorithm. Offline news reading requires that you upload your responses in one felt swoop, usually using a posting robot similar to the ones the spammers use. Since we cannot tell the difference, the offline news readers get backed off as well.

Some places have multiple users coming from the same IP number. These algorithms make no attempt to distinguish on a per user basis, thus several posters who happen to land on the same IP (as in dynamic IP assignment) may get backed off unfairly.

Further work (completed!)

Since, NNRPD has hooks in it for username/password authentication, it makes more sense to use this rather than IP numbers for posting backoff. This feature was recently added to the code and submitted to the INN 1.8 development team for inclusion.

Backoff appears to work

Since the installation of posting backoff at EarthLink Networks, EarthLink has dramatically fallen off the externally composed index of sites with high BI postings. This conclusion was achieved with data taken from Ed Falk's EarthLink spam history web page. There is a graph of this data here. Exponential backoff was installed on EarthLink news servers sometime in July of 1997. Note the dramatic decrease in his spam article counts at about that time.

Granted, Mr. Falk's data is not without it's inaccuracy. Several random scans of headers turned of a significant amount of headers claimed to be from earthlink which were actually determinable as forgerys. This possibility is indicated on Mr. Falk's web pages.

Nevertheless, I feel that the consistency of this metric coupled with the dramatic drop in Usenet volume abuse indicate that exponential backoff is quite successful at preventing the abuse of resources that occurs when ISP customers post too much.

Acknowledgements

The author would like to thank EarthLink Network for their support and adoption of this work in progress. The following people there also deserve special kudos for their help and support:


This site is maintained by Dave Hayes