Rate Limiting – Token Bucket Algorithm
Due to a spike in requests on our public API (application programming interface) at Regulon, I had to implement a rate limiter to improve the availability and user experience of our APIs.
There are several ways to rate-limit an API. I'll outline three here and implement one.
- Fixed window: This is the most straightforward approach: every user is limited to a fixed number of requests within a fixed time window. This approach handles sudden bursts of traffic poorly: if a user reaches the limit within the first second of the window, they have to wait for the counter to reset. For example, with a limit of 100 requests per minute, a user who spends all 100 requests in the first few seconds must wait until the next minute to make another request.
- Sliding window: This is similar to the fixed window, except that the window "slides" forward with the current time, so the quota is always evaluated relative to now. Example: with a limit of 100 requests per hour, a request made at 2:30 PM is checked against the requests made between 1:30 PM and 2:30 PM; a request at 4:00 PM is checked against those made between 3:00 PM and 4:00 PM.
- Token bucket: This is the industry standard, used by Nginx, Envoy, and AWS API Gateway. Every user gets a bucket with a fixed capacity of tokens. Every request removes a token from the bucket; if the bucket is empty, the request is rejected. Tokens are added back at a steady refill rate, up to the bucket's maximum capacity. This approach allows short bursts of traffic while enforcing a smooth long-term rate. For example, with a 100-token bucket and a refill rate of 10 tokens per second, a user can make 100 requests in one second, and in the next second make up to 10 more.
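The bucket arithmetic behind that last example can be sketched without any storage backend. This is a minimal in-memory version, not the production implementation described later; the injectable `clock` parameter is only there to make the refill behavior easy to demonstrate:

```python
import time


class TokenBucket:
    """Minimal in-memory token bucket: a burst of up to bucket_size
    requests, refilled at rate_per_second tokens per second."""

    def __init__(self, bucket_size=100, rate_per_second=10.0, clock=time.monotonic):
        self.bucket_size = bucket_size
        self.rate = rate_per_second
        self.clock = clock
        self.tokens = float(bucket_size)  # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # refill proportionally to elapsed time, capped at the bucket size
        self.tokens = min(self.bucket_size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            return False  # bucket empty: reject
        self.tokens -= 1  # consume one token for this request
        return True
```

With a 100-token bucket refilling at 10 tokens per second, all 100 requests of an initial burst pass, the 101st is rejected, and one second later roughly 10 more are allowed — matching the example above.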
I chose the token bucket approach since it lets our users send short spikes of traffic without being throttled.
Implementing the Token Bucket Algorithm:
For Regulon's use case, I chose Redis as the backing store, so make sure Redis is installed. Redis offers fast reads and writes, and there's no need to make a round trip to other services and then the database to verify whether a user is allowed to access the system. The rate limiter should be the first thing an authenticated user interacts with.
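That "limiter first" ordering can be sketched as a decorator that runs the limiter before the handler ever executes. The names here are illustrative, not Regulon's actual code, and `CountingLimiter` is a deliberately simplified stand-in for the real token bucket:

```python
def rate_limited(limiter):
    """Run the rate limiter before the wrapped handler; short-circuit with a 429."""
    def wrap(handler):
        def inner(user_id, *args, **kwargs):
            allowed, retry_after = limiter.allow_request(user_id)
            if not allowed:
                # over quota: the handler never runs
                return {"status": 429, "retry_after": retry_after}
            return handler(user_id, *args, **kwargs)
        return inner
    return wrap


class CountingLimiter:
    """Illustrative stand-in that allows only the first `quota` requests per user."""

    def __init__(self, quota=2):
        self.quota = quota
        self.counts = {}

    def allow_request(self, user_id):
        n = self.counts.get(user_id, 0)
        if n >= self.quota:
            return False, 1.0  # pretend a token is one second away
        self.counts[user_id] = n + 1
        return True, None
```

A handler wrapped with `@rate_limited(limiter)` never runs for a user who is over quota, which is exactly the ordering described above.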
Now let's look at the code that handles the implementation:
Ruby:
Install Redis:
gem install redis
or add it to your Rails app:
# Gemfile
gem 'redis'
rate_limiter.rb
require 'redis'

class RateLimiter
  def initialize(rate_per_minute: 100, bucket_size: 100)
    @redis = Redis.new
    # refill rate in tokens per second
    @rate_per_second = rate_per_minute.to_f / 60.0
    @bucket_size = bucket_size
  end

  def allow_request(user_id)
    # per-user keys in Redis
    tokens_key = "tb:#{user_id}:tokens"
    ts_key = "tb:#{user_id}:ts"

    # read the stored state; fall back to a full bucket on first request
    tokens = @redis.get(tokens_key)&.to_f || @bucket_size
    last_ts = @redis.get(ts_key)&.to_f || Time.now.to_f

    # elapsed time since the last request
    now = Time.now.to_f
    elapsed = now - last_ts

    # refill proportionally to elapsed time, capped at the bucket size
    # (e.g. 0.5 seconds at 10 tokens/second adds 5 tokens)
    tokens = [@bucket_size, tokens + elapsed * @rate_per_second].min

    # reject if there isn't a whole token available
    if tokens < 1
      retry_after = (1 - tokens) / @rate_per_second
      @redis.set(tokens_key, tokens)
      @redis.set(ts_key, now)
      return [false, retry_after]
    end

    # consume one token and save the new state atomically-ish via a pipeline
    tokens -= 1
    @redis.pipelined do |pipeline|
      pipeline.set(tokens_key, tokens)
      pipeline.set(ts_key, now)
    end
    [true, nil]
  end
end
Usage:
limiter = RateLimiter.new(rate_per_minute: 100)
allowed, retry_after = limiter.allow_request(user_id)

# reject the request if the user is over their limit
unless allowed
  render json: {
    error: "Too Many Requests",
    retry_after: retry_after
  }, status: 429
end
For Python:
rate_limiter.py
import time


class RateLimiter:
    def __init__(self, redis_client, rate_per_minute=100, bucket_size=100):
        self.redis = redis_client
        # refill rate in tokens per second
        self.rate_per_second = rate_per_minute / 60.0
        self.bucket_size = bucket_size

    def allow_request(self, user_id):
        # per-user keys in Redis
        key_tokens = f"tb:{user_id}:tokens"
        key_timestamp = f"tb:{user_id}:ts"

        # fetch the stored state; fall back to a full bucket on first request
        tokens = self.redis.get(key_tokens)
        last_ts = self.redis.get(key_timestamp)
        tokens = float(tokens) if tokens else self.bucket_size
        last_ts = float(last_ts) if last_ts else time.time()

        now = time.time()
        elapsed = now - last_ts

        # refill proportionally to elapsed time, capped at the bucket size
        tokens = min(self.bucket_size, tokens + elapsed * self.rate_per_second)

        if tokens < 1:
            # not enough tokens: reject and report when to retry
            self.redis.set(key_tokens, tokens)
            self.redis.set(key_timestamp, now)
            retry_after = (1 - tokens) / self.rate_per_second
            return False, retry_after

        # consume 1 token and save the new state in one round trip
        tokens -= 1
        pipeline = self.redis.pipeline()
        pipeline.set(key_tokens, tokens)
        pipeline.set(key_timestamp, now)
        pipeline.execute()
        return True, None
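The `retry_after` value returned on rejection is the time until the bucket refills back to one whole token. A quick check of the formula used above:

```python
def retry_after(tokens, rate_per_second):
    """Seconds until a partially-filled bucket holds one whole token."""
    return (1 - tokens) / rate_per_second

# e.g. with 0.4 tokens left and a refill rate of 10 tokens/second,
# a full token is available after (1 - 0.4) / 10 = 0.06 seconds
```

This is what lets the client know exactly how long to back off instead of retrying blindly.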
Usage:
from flask import Flask, request, jsonify
import redis

from rate_limiter import RateLimiter

app = Flask(__name__)
r = redis.Redis(host="localhost", port=6379)  # default Redis port
limiter = RateLimiter(r)

@app.route("/api/test")
def test():
    user_id = request.headers.get("X-User-ID", "anonymous")
    allowed, retry_after = limiter.allow_request(user_id)
    if not allowed:
        return jsonify({
            "error": "Too Many Requests",
            "retry_after": retry_after
        }), 429
    return jsonify({"message": "success"})
In conclusion, the token bucket algorithm is an effective way to rate-limit APIs. Test it out, and happy coding.