Kwaku Boateng

Software Engineer and Content Creator

Due to a spike in requests on our public API (application programming interface) at Regulon, I had to implement a rate limiter to improve the availability and user experience of our APIs.

There are different ways to rate-limit an API. I'll outline three here and implement one.

  1. Fixed window: This is the most straightforward approach. You limit every user to a fixed number of requests within a fixed time window. This style of rate limiting handles sudden bursts of traffic poorly: if a user reaches the limit within the first second of the window, they have to wait for the counter to reset. For example, with a limit of 100 requests per minute, a user who spends all 100 requests in the first few seconds must wait for the next minute to make another request.
  2. Sliding window: This is similar to the fixed window, except the window “slides” forward with the current time, so the quota is always evaluated relative to now. Example: if the limit is 100 requests per hour and a request arrives at 2:30 PM, the system counts requests made between 1:30 PM and 2:30 PM; at 4:00 PM, it counts requests made between 3:00 PM and 4:00 PM.
  3. Token bucket: This is the industry standard, used by Nginx, Envoy, and AWS API Gateway. Every user has a bucket that holds a fixed number of tokens. Each request removes one token from the bucket; if the bucket is empty, the request is rejected. Tokens are added back at a steady refill rate, up to the bucket's maximum capacity. This approach is ideal for absorbing short bursts of traffic while enforcing a smooth long-term rate. For example, with a bucket of 100 tokens and a refill rate of 10 tokens per second, a user can make 100 requests in one second, then up to 10 requests per second after that.
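To make the first strategy concrete, here is a minimal in-memory sketch of a fixed-window counter. The class name and the plain-dict storage are my own simplifications for illustration; a production version would keep counters in a shared store like Redis.

```python
import time

class FixedWindowLimiter:
    """Fixed-window sketch: every user gets `limit` requests per window."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = {}  # user_id -> (window_start, request_count)

    def allow_request(self, user_id, now=None):
        now = time.time() if now is None else now
        # Bucket the request into the window that contains `now`
        window_start = int(now // self.window_seconds) * self.window_seconds
        start, count = self.counters.get(user_id, (window_start, 0))
        if start != window_start:
            # A new window has begun, so the counter resets
            start, count = window_start, 0
        if count >= self.limit:
            return False
        self.counters[user_id] = (start, count + 1)
        return True
```

Note the drawback described above: once a user exhausts the limit early in a window, every request is rejected until the next window begins.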
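The sliding window can be sketched the same way. This variant (sometimes called a sliding window log) keeps each user's request timestamps and counts only those inside the window ending at the current time; again, the in-memory storage is a simplification for illustration.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window-log sketch: count requests in the last `window_seconds`."""

    def __init__(self, limit=100, window_seconds=3600):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = {}  # user_id -> deque of request timestamps

    def allow_request(self, user_id, now=None):
        now = time.time() if now is None else now
        timestamps = self.log.setdefault(user_id, deque())
        # Drop timestamps that have slid out of the window
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

Unlike the fixed window, the quota here is always measured relative to the current moment, so there is no hard reset boundary.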

I chose the token bucket approach because it lets our users burst briefly while still enforcing a smooth long-term rate.

Implementing the Token Bucket Algorithm:

In Regulon's use case I chose Redis, so make sure Redis is installed. Redis offers fast reads, and the rate-limit check needs no round trip to other services and then the database to decide whether a user is allowed into the system. The rate limiter should be the first thing an authenticated user's request hits.

Now let's look at the code that handles the implementation:

Ruby:

Install Redis

gem install redis

or, in your Rails app:

# Gemfile
gem 'redis'

rate_limiter.rb

require 'redis'
require 'time'

class RateLimiter
  def initialize(rate_per_minute: 100, bucket_size: 100)
    @redis = Redis.new
    # this is the refill rate
    @rate_per_second = rate_per_minute.to_f / 60.0
    @bucket_size = bucket_size
  end

  def allow_request(user_id)
    # create keys in Redis for the user
    tokens_key = "tb:#{user_id}:tokens"
    ts_key     = "tb:#{user_id}:ts"

    # read from Redis; if there are no values, initialize with the defaults
    tokens = @redis.get(tokens_key)&.to_f || @bucket_size
    last_ts = @redis.get(ts_key)&.to_f || Time.now.to_f

    # calculate the elapsed time since the last request
    now = Time.now.to_f
    elapsed = now - last_ts

    # Calculate how many tokens to refill based on elapsed time.
    # e.g. if 0.5 seconds passed at a refill rate of 10 tokens/second, add 5 tokens.
    tokens_to_add = elapsed * @rate_per_second
    tokens = [@bucket_size, tokens + tokens_to_add].min

    # Check if enough tokens are available
    if tokens < 1
      retry_after = (1 - tokens) / @rate_per_second
      @redis.set(tokens_key, tokens)
      @redis.set(ts_key, now)
      return [false, retry_after]
    end

    tokens -= 1

    @redis.pipelined do |pipeline|
      pipeline.set(tokens_key, tokens)
      pipeline.set(ts_key, now)
    end

    [true, nil]
  end
end

Usage:

    limiter = RateLimiter.new(rate_per_minute: 100)
    allowed, retry_after = limiter.allow_request(user_id)

    # reject the request when the limiter says no
    unless allowed
      render json: {
        error: "Too Many Requests",
        retry_after: retry_after
      }, status: 429
    end

Python:

rate_limiter.py

import time
import redis

class RateLimiter:
    def __init__(self, redis_client, rate_per_minute=100, bucket_size=100):
        self.redis = redis_client
        self.rate_per_second = rate_per_minute / 60.0
        self.bucket_size = bucket_size

    def allow_request(self, user_id):
        key_tokens = f"tb:{user_id}:tokens"
        key_timestamp = f"tb:{user_id}:ts"

        # fetch current values
        tokens = self.redis.get(key_tokens)
        last_ts = self.redis.get(key_timestamp)

        tokens = float(tokens) if tokens else self.bucket_size
        last_ts = float(last_ts) if last_ts else time.time()

        now = time.time()
        elapsed = now - last_ts

        # refill based on elapsed time
        tokens = min(self.bucket_size, tokens + elapsed * self.rate_per_second)

        if tokens < 1:
            # not enough tokens - reject
            self.redis.set(key_tokens, tokens)
            self.redis.set(key_timestamp, now)
            retry_after = (1 - tokens) / self.rate_per_second
            return False, retry_after

        # consume 1 token
        tokens -= 1

        # save state
        pipeline = self.redis.pipeline()
        pipeline.set(key_tokens, tokens)
        pipeline.set(key_timestamp, now)
        pipeline.execute()

        return True, None

Usage:

from flask import Flask, request, jsonify
import redis
from rate_limiter import RateLimiter

app = Flask(__name__)
r = redis.Redis(host="localhost", port=6379)  # 6379 is Redis's default port
limiter = RateLimiter(r)

@app.route("/api/test")
def test():
    user_id = request.headers.get("X-User-ID", "anonymous")
    allowed, retry_after = limiter.allow_request(user_id)

    if not allowed:
        return jsonify({
            "error": "Too Many Requests",
            "retry_after": retry_after
        }), 429

    return jsonify({"message": "success"})
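One caveat worth knowing about: the GET/SET sequence in allow_request is not atomic, so two concurrent requests can read the same token count and both be admitted. A common fix is to move the whole check-and-decrement into a Lua script, which Redis executes atomically. The sketch below is my own illustration of that idea, not Regulon's implementation; the script and class names are hypothetical, and it returns only an allowed/denied flag for brevity.

```python
import time

# Hypothetical atomic variant: the entire refill-check-decrement runs
# inside Redis as one Lua script, so concurrent requests cannot race.
TOKEN_BUCKET_LUA = """
local tokens_key = KEYS[1]
local ts_key     = KEYS[2]
local rate       = tonumber(ARGV[1])  -- tokens refilled per second
local capacity   = tonumber(ARGV[2])
local now        = tonumber(ARGV[3])

local tokens  = tonumber(redis.call('GET', tokens_key)) or capacity
local last_ts = tonumber(redis.call('GET', ts_key)) or now

-- refill based on elapsed time, capped at the bucket capacity
tokens = math.min(capacity, tokens + (now - last_ts) * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('SET', tokens_key, tokens)
redis.call('SET', ts_key, now)
return allowed
"""

class AtomicRateLimiter:
    def __init__(self, redis_client, rate_per_minute=100, bucket_size=100):
        self.rate_per_second = rate_per_minute / 60.0
        self.bucket_size = bucket_size
        # register_script returns a callable that EVALs the script on the server
        self.script = redis_client.register_script(TOKEN_BUCKET_LUA)

    def allow_request(self, user_id):
        allowed = self.script(
            keys=[f"tb:{user_id}:tokens", f"tb:{user_id}:ts"],
            args=[self.rate_per_second, self.bucket_size, time.time()],
        )
        return allowed == 1
```

Usage mirrors the class above: pass a redis.Redis client to the constructor and call allow_request(user_id). The trade-off is that computing retry_after now has to happen in Lua too if you want to report it.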

In conclusion, the token bucket algorithm is an effective way to rate-limit APIs. Try it out, and happy coding.