Token Bucket Rate Limiting: Engineering Defense Against Credential Stuffing at Scale

Credential stuffing remains one of the most persistent threats to web applications. Akamai reported more than 26 billion credential stuffing attempts globally every month in 2024. Attackers test stolen username/password combinations from data breaches against login endpoints, hoping users reused credentials across services.

Rate limiting serves as a frontline defense, but naive implementations either block legitimate users who mistype passwords or fail to stop distributed attacks. This technical guide explores token bucket rate limiting and advanced patterns that balance security with user experience.

Understanding the Token Bucket Algorithm

The token bucket algorithm models rate limiting as a bucket that fills with tokens at a steady rate. Each request consumes a token. When the bucket empties, requests are rejected until tokens regenerate.

Key parameters control behavior:

Bucket Size (Capacity): Maximum tokens the bucket can hold, determining burst capacity
Refill Rate: Tokens added per time unit, setting sustained throughput limit
Tokens per Request: Usually 1, but expensive operations can consume more

Example configuration: A bucket with capacity 10 and refill rate of 1 token per second allows bursts of 10 requests, then sustains 1 request per second. A user rapidly entering several wrong passwords won't be locked out, but a bot attempting thousands of combinations will be throttled.
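
The behavior described above can be sketched in a few lines. This is a minimal single-process illustration, not a production limiter; the class name `TokenBucket` and its method names are assumptions made for the example, and time is passed in explicitly (in seconds) so the arithmetic is easy to follow.

```javascript
// Minimal in-memory token bucket (illustrative; single process only).
class TokenBucket {
  constructor(capacity, refillRate, now = Date.now() / 1000) {
    this.capacity = capacity;     // maximum tokens, i.e. the burst size
    this.refillRate = refillRate; // tokens added per second
    this.tokens = capacity;       // a new bucket starts full
    this.lastUpdate = now;        // time of the last refill, in seconds
  }

  // Try to consume `cost` tokens at time `now`; returns true if allowed.
  tryConsume(cost = 1, now = Date.now() / 1000) {
    const elapsed = Math.max(0, now - this.lastUpdate);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastUpdate = Math.max(this.lastUpdate, now);
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}

// Capacity 10, refill 1 token per second, as in the example configuration.
const bucket = new TokenBucket(10, 1, 0);
let allowedInBurst = 0;
for (let i = 0; i < 15; i++) {
  if (bucket.tryConsume(1, 0)) allowedInBurst++;
}
console.log(allowedInBurst);            // 10
console.log(bucket.tryConsume(1, 0.5)); // false (only half a token has refilled)
console.log(bucket.tryConsume(1, 1));   // true  (one full token is available)
```

The first 10 of 15 simultaneous requests pass, and the bucket then admits one request per second.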

Token Bucket vs. Leaky Bucket

The related leaky bucket algorithm processes requests at a fixed rate, queuing excess requests. Token bucket allows bursts up to capacity; leaky bucket smooths traffic to constant output.

For credential stuffing defense, token bucket's burst tolerance better matches human behavior—users sometimes make multiple rapid attempts when unsure of credentials—while still limiting sustained attack rates.
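
To make the difference concrete, the sketch below computes when each of five simultaneous requests would be served under both algorithms. The function names, the queue-size parameter, and the assumption that arrival times are sorted are all illustrative choices rather than part of either algorithm's definition.

```javascript
// Token bucket: each arrival is served immediately or rejected (null).
function tokenBucketSchedule(arrivals, capacity, ratePerSec) {
  let tokens = capacity;
  let last = 0;
  return arrivals.map((t) => {
    tokens = Math.min(capacity, tokens + (t - last) * ratePerSec);
    last = t;
    if (tokens >= 1) {
      tokens -= 1;
      return t;
    }
    return null;
  });
}

// Leaky bucket as a queue: arrivals are released at a fixed spacing and
// rejected (null) only when more than `queueSize` requests are waiting.
function leakyBucketSchedule(arrivals, queueSize, ratePerSec) {
  const interval = 1 / ratePerSec;
  let nextFree = 0; // earliest time the next request may be released
  return arrivals.map((t) => {
    const serveAt = Math.max(t, nextFree);
    const ahead = Math.ceil((serveAt - t) / interval); // requests queued ahead
    if (ahead > queueSize) return null;
    nextFree = serveAt + interval;
    return serveAt;
  });
}

const arrivals = [0, 0, 0, 0, 0]; // five attempts in the same instant
console.log(tokenBucketSchedule(arrivals, 5, 1)); // [ 0, 0, 0, 0, 0 ]
console.log(leakyBucketSchedule(arrivals, 5, 1)); // [ 0, 1, 2, 3, 4 ]
```

The token bucket serves the whole burst at once, while the leaky bucket makes the fifth attempt wait four seconds.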

Implementation Architecture

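Production deployments that run more than one application server need shared state so that every server sees the same bucket. Redis is a common choice because it offers low-latency in-memory storage, key expiry, and atomic server-side Lua scripting: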

-- Redis Lua script for atomic token bucket
-- KEYS[1]: bucket key
-- ARGV: capacity, refill rate (tokens/second), current time (seconds), tokens requested
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

-- Load existing state; a missing bucket starts full
local bucket = redis.call('HMGET', key, 'tokens', 'last_update')
local tokens = tonumber(bucket[1]) or capacity
local last_update = tonumber(bucket[2]) or now

-- Calculate token refill (clamped at zero in case the caller's clock ran backwards)
local elapsed = math.max(0, now - last_update)
tokens = math.min(capacity, tokens + elapsed * refill_rate)

-- Check if request allowed
local allowed = 0
if tokens >= requested then
    tokens = tokens - requested
    allowed = 1
end

-- Persist state and refresh the idle-bucket TTL on every call
-- (multi-field HSET needs Redis 4.0+; it replaces the deprecated HMSET)
redis.call('HSET', key, 'tokens', tokens, 'last_update', now)
redis.call('EXPIRE', key, 3600)
return allowed  -- 1 = allowed, 0 = denied

Lua scripts execute atomically in Redis, preventing race conditions when multiple requests arrive simultaneously. The EXPIRE command ensures buckets for inactive users are eventually cleaned up.
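
Calling the script from application code mostly comes down to passing the key and arguments in the order the script reads them. The sketch below assumes a client object whose `eval(script, numKeys, ...keysAndArgs)` method follows the ioredis calling convention; other Redis clients use different signatures. The helper name `checkRateLimit` and its parameters are hypothetical.

```javascript
// Sketch: invoke the token bucket Lua script through a Redis client.
// Assumption: client.eval(script, numKeys, ...keysAndArgs) as in ioredis.
async function checkRateLimit(client, script, key, limits, nowSeconds = Date.now() / 1000) {
  const { capacity, refillRate, cost = 1 } = limits;
  // Argument order must match the script: KEYS[1], then ARGV[1] to ARGV[4].
  const result = await client.eval(script, 1, key, capacity, refillRate, nowSeconds, cost);
  return result === 1; // the script returns 1 for allowed, 0 for denied
}

// Hypothetical usage (the script source and key name are placeholders):
//   const allowed = await checkRateLimit(redis, luaSource, 'rl:user:alice',
//     { capacity: 10, refillRate: 1 });
```

Because the timestamp is supplied by the caller, every application server should take it from a synchronized clock.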

Tiered Response Strategy

Cloudflare recommends using multiple rate limiting rules with increasing penalties. This pattern provides graduated responses matching threat severity:

Tier 1 - Soft Limit: After 4 requests per minute, present a challenge (CAPTCHA or device verification). Legitimate users occasionally make multiple attempts; a simple challenge confirms humanity without blocking access.

Tier 2 - Medium Limit: After 10 requests over 10 minutes, require additional verification. This catches persistent but low-volume attacks while allowing legitimate users who might be troubleshooting login issues.

Tier 3 - Hard Limit: After 20 requests in an hour, block the client for an extended period (hours or days). At this point, the traffic pattern strongly indicates automation.

// Tiered rate limit configuration
// refillRate is in tokens per second; duration is in seconds
const tierConfig = {
  tier1: {
    capacity: 4,
    refillRate: 4/60,     // 4 per minute
    action: 'challenge',   // Present CAPTCHA
    duration: 60
  },
  tier2: {
    capacity: 10,
    refillRate: 10/600,   // 10 per 10 minutes
    action: 'verify',      // Require email/SMS verification
    duration: 600
  },
  tier3: {
    capacity: 20,
    refillRate: 20/3600,  // 20 per hour
    action: 'block',       // Block client
    duration: 86400        // 24 hour block
  }
};
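
One way to apply a configuration like this is to keep one bucket per tier for each client and report the most severe tier whose bucket is empty. The sketch below is an in-memory illustration under that assumption; the `evaluateTiers` function and the `state` map are hypothetical, and it does not model the block duration.

```javascript
// Tiers ordered from least to most severe (values from the configuration above).
const tiers = [
  { name: 'tier1', capacity: 4, refillRate: 4 / 60, action: 'challenge' },
  { name: 'tier2', capacity: 10, refillRate: 10 / 600, action: 'verify' },
  { name: 'tier3', capacity: 20, refillRate: 20 / 3600, action: 'block' },
];

// clientKey -> array of { tokens, last }, one bucket per tier.
const state = new Map();

// Charge one request against every tier and return the resulting action.
function evaluateTiers(clientKey, now) {
  let buckets = state.get(clientKey);
  if (!buckets) {
    buckets = tiers.map((t) => ({ tokens: t.capacity, last: now }));
    state.set(clientKey, buckets);
  }
  let action = 'allow';
  tiers.forEach((tier, i) => {
    const b = buckets[i];
    const elapsed = Math.max(0, now - b.last);
    b.tokens = Math.min(tier.capacity, b.tokens + elapsed * tier.refillRate);
    b.last = now;
    if (b.tokens >= 1) {
      b.tokens -= 1;
    } else {
      action = tier.action; // later tiers are more severe and overwrite earlier ones
    }
  });
  return action;
}

// 25 attempts in the same instant from one client.
const tally = { allow: 0, challenge: 0, verify: 0, block: 0 };
for (let i = 0; i < 25; i++) tally[evaluateTiers('demo-client', 0)]++;
console.log(tally); // { allow: 4, challenge: 6, verify: 10, block: 5 }
```

The escalation follows the tier thresholds: the fifth attempt is challenged, the eleventh requires verification, and the twenty-first is blocked.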

Keying Strategies

Rate limits must be applied to appropriate identifiers. Each keying strategy has trade-offs:

IP Address: Simple but problematic. Legitimate users behind NAT or corporate proxies share IPs, causing false positives. Attackers using residential proxies easily rotate IPs.

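Username: Better for protecting specific accounts. Because the key does not depend on the source address, an attacker targeting one account faces limits regardless of IP rotation. (A combined IP + username key does not have this property, since every new IP starts a fresh bucket.) However, attackers who try a single credential against each of many accounts bypass per-username limits.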

Device Fingerprint: More resistant to IP rotation but requires client-side fingerprinting. Privacy concerns and anti-fingerprinting browsers complicate this approach.

Behavioral Session: Combine multiple signals—IP, fingerprint, mouse patterns, timing—into a session identifier. More robust but complex to implement.

Effective implementations layer multiple keys. Rate limit by IP for global protection, by username for account protection, and by fingerprint/session for sophisticated attack resistance.
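
Layering can be as simple as deriving several bucket keys from each login attempt and allowing the request only if every bucket allows it. The sketch below shows the key derivation; the `rl:` key prefixes and the field names are hypothetical.

```javascript
// Derive one rate-limit key per available identifier for a login attempt.
function buildRateLimitKeys({ ip, username, fingerprint }) {
  const keys = [`rl:ip:${ip}`];
  // Normalize the username so "Alice" and "alice " share one bucket.
  if (username) keys.push(`rl:user:${username.trim().toLowerCase()}`);
  if (fingerprint) keys.push(`rl:fp:${fingerprint}`);
  return keys;
}

console.log(buildRateLimitKeys({
  ip: '203.0.113.7',
  username: ' Alice@Example.com ',
  fingerprint: 'abc123',
}));
// [ 'rl:ip:203.0.113.7', 'rl:user:alice@example.com', 'rl:fp:abc123' ]
```

Each key can carry its own capacity and refill rate, for example a generous per-IP limit to tolerate shared NAT addresses and a strict per-username limit.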

Adaptive Rate Limiting

Static thresholds cannot anticipate all attack patterns. Adaptive systems adjust limits based on observed behavior:

Baseline Learning: Establish normal traffic patterns for each endpoint during non-attack periods
Anomaly Detection: Automatically tighten limits when traffic exceeds learned baselines
Geographic Adjustment: Apply stricter limits to regions with elevated attack traffic
Time-Based Variation: Recognize legitimate traffic patterns (business hours, marketing campaigns) and adjust accordingly

// Adaptive threshold calculation
function calculateThreshold(baseThreshold, context) {
  let multiplier = 1.0;

  // Increase limit for known good signals
  if (context.hasValidSession) multiplier *= 1.5;
  if (context.hasPassedCaptcha) multiplier *= 2.0;
  if (context.accountAge > 30) multiplier *= 1.25;  // accountAge assumed to be in days

  // Decrease limit for suspicious signals
  if (context.isKnownBadASN) multiplier *= 0.5;
  if (context.failureRate > 0.8) multiplier *= 0.25;  // more than 80% of attempts failed
  if (context.isProxyIP) multiplier *= 0.75;

  // Never round down to zero, which would reject every request
  return Math.max(1, Math.round(baseThreshold * multiplier));
}
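
Baseline learning and anomaly detection can be approximated with an exponentially weighted moving average of per-interval request counts. The sketch below is one possible approach, not a prescribed method; the class name, the smoothing factor `alpha`, the `tolerance` multiple, and the `floor` are all assumed values that would need tuning.

```javascript
// Learns a per-interval request baseline and tightens limits above it.
class BaselineTracker {
  constructor(alpha = 0.1, tolerance = 3) {
    this.alpha = alpha;         // smoothing factor for the moving average
    this.tolerance = tolerance; // multiples of the baseline treated as normal
    this.baseline = null;       // learned requests per interval
  }

  // Feed one interval's request count observed during a non-attack period.
  learn(count) {
    this.baseline = this.baseline === null
      ? count
      : this.alpha * count + (1 - this.alpha) * this.baseline;
  }

  // Multiplier to apply to the static limit for the current interval.
  limitMultiplier(currentCount, floor = 0.1) {
    if (this.baseline === null) return 1;
    const ceiling = this.baseline * this.tolerance;
    if (currentCount <= ceiling) return 1;
    return Math.max(floor, ceiling / currentCount);
  }
}

const tracker = new BaselineTracker(0.5, 3);
[100, 100, 100].forEach((count) => tracker.learn(count));
console.log(tracker.limitMultiplier(200)); // 1   (within 3x the baseline)
console.log(tracker.limitMultiplier(600)); // 0.5 (twice the tolerated volume)
```

Feeding `learn` only with non-attack intervals keeps an attacker from gradually raising the baseline.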

Integration with Behavioral Detection

Rate limiting alone cannot distinguish sophisticated bots from legitimate users with poor memory. Integration with behavioral analysis provides context that improves both systems:

Pre-authentication signals: Analyze mouse movements, typing patterns, and page engagement before login submission. Suspicious behavior triggers immediate stricter rate limits.

Failed attempt analysis: Legitimate users tend to fail with passwords close to the correct one (typos, caps lock) and retry the same account. Bots fail with unrelated passwords taken from credential lists, often spread across many accounts. Analyze failure patterns to adjust limits.

Session continuity: Legitimate users navigate to login from other pages. Bots often hit login endpoints directly. Consider navigation patterns in rate limit decisions.
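
Two of these signals can be turned into a rough score without inspecting passwords at all: how widely a client's failures are spread across usernames, and whether the client reached the login form through normal site navigation. The sketch below is a hypothetical heuristic; the function name, the minimum of three failures, and the 0.25 penalty are illustrative assumptions rather than recommended values.

```javascript
// Returns a suspicion score in [0, 1] for one client.
// failures: array of { username } for that client's recent failed logins.
function suspicionScore(failures, arrivedViaSiteNavigation) {
  if (failures.length === 0) return 0;
  const distinctUsers = new Set(failures.map((f) => f.username)).size;
  // 1.0 means every failure targeted a different username (stuffing-like);
  // values near 0 mean repeated retries of one account (human-like).
  const spread = distinctUsers / failures.length;
  let score = failures.length >= 3 ? spread : 0;
  // Hitting the login endpoint directly adds a small penalty.
  if (!arrivedViaSiteNavigation) score += 0.25;
  return Math.min(1, score);
}

const human = Array.from({ length: 5 }, () => ({ username: 'alice' }));
const bot = ['a', 'b', 'c', 'd', 'e'].map((username) => ({ username }));
console.log(suspicionScore(human, true)); // 0.2
console.log(suspicionScore(bot, false));  // 1
```

A score like this could feed the context used for adaptive thresholds, for example by tightening limits once it crosses a chosen cutoff.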

Response Headers and Client Communication

Rate-limited responses should communicate limits to clients, enabling legitimate applications to back off appropriately:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1702234567

{
  "error": "rate_limited",
  "message": "Too many login attempts. Please try again in 60 seconds.",
  "retry_after": 60
}

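The Retry-After header tells the client how long to wait before retrying, here 60 seconds. Well-behaved clients respect this header, reducing unnecessary retry traffic. The X-RateLimit-* headers are a widely used convention rather than a standard, so their exact semantics should be documented for API consumers.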
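
The header values can be derived directly from bucket state: the wait time is the token shortfall divided by the refill rate. The sketch below assumes a hypothetical `buildRateLimitResponse` helper and interprets X-RateLimit-Reset as the Unix time at which the next request would be accepted, which is only one of several conventions in use.

```javascript
// Build a 429 response description from the state of an exhausted bucket.
function buildRateLimitResponse({ capacity, tokens, refillRate, cost = 1 }, nowSeconds) {
  // Seconds until enough tokens have refilled to cover one request.
  const retryAfter = Math.max(1, Math.ceil((cost - tokens) / refillRate));
  return {
    status: 429,
    headers: {
      'Content-Type': 'application/json',
      'Retry-After': String(retryAfter),
      'X-RateLimit-Limit': String(capacity),
      'X-RateLimit-Remaining': String(Math.max(0, Math.floor(tokens))),
      'X-RateLimit-Reset': String(Math.ceil(nowSeconds + retryAfter)),
    },
    body: JSON.stringify({
      error: 'rate_limited',
      message: `Too many login attempts. Please try again in ${retryAfter} seconds.`,
      retry_after: retryAfter,
    }),
  };
}

// Empty bucket that refills one token every 4 seconds.
const response = buildRateLimitResponse(
  { capacity: 10, tokens: 0, refillRate: 0.25 },
  1702234563
);
console.log(response.headers['Retry-After']);       // 4
console.log(response.headers['X-RateLimit-Reset']); // 1702234567
```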

Monitoring and Alerting

Rate limiting generates valuable security telemetry. Key metrics to track:

Rate limit hits by tier: Sudden increases indicate potential attacks
Unique IPs hitting limits: Distributed attacks involve many IPs; account-targeted attacks involve few
Accounts frequently rate-limited: May indicate targeted account takeover attempts
Geographic distribution: Attacks often originate from specific regions or hosting providers
Time patterns: Attacks may follow attacker time zones or automation schedules

Alert thresholds should trigger investigation when rate limiting activity significantly exceeds baselines, indicating either an ongoing attack or misconfigured limits affecting legitimate users.
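
A few of these metrics can be computed from a stream of rate-limit events with very little code. The sketch below assumes each event is an object with hypothetical `tier`, `ip`, and `username` fields, and uses an arbitrary factor of 3 over baseline as the alert condition.

```javascript
// Aggregate rate-limit events from one time window into summary metrics.
function summarizeRateLimitEvents(events) {
  const hitsByTier = {};
  const hitsByAccount = {};
  const ips = new Set();
  for (const e of events) {
    hitsByTier[e.tier] = (hitsByTier[e.tier] || 0) + 1;
    ips.add(e.ip);
    if (e.username) hitsByAccount[e.username] = (hitsByAccount[e.username] || 0) + 1;
  }
  return { hitsByTier, hitsByAccount, uniqueIps: ips.size };
}

// Alert when the current window exceeds the baseline by the given factor.
function shouldAlert(currentHits, baselineHits, factor = 3) {
  return currentHits > baselineHits * factor;
}

const summary = summarizeRateLimitEvents([
  { tier: 'tier1', ip: '203.0.113.7', username: 'alice' },
  { tier: 'tier1', ip: '203.0.113.8', username: 'alice' },
  { tier: 'tier3', ip: '203.0.113.7', username: 'bob' },
]);
console.log(summary.hitsByTier); // { tier1: 2, tier3: 1 }
console.log(summary.uniqueIps);  // 2
console.log(shouldAlert(50, 10)); // true
```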

Edge Cases and Gotchas

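Clock Skew: In distributed systems, application servers with unsynchronized clocks may calculate token refills inconsistently, because the script trusts the timestamp each caller supplies. Use a single time source (for example, the Redis server's own clock via the TIME command), or keep servers synchronized with NTP and accept minor inconsistencies.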

Fail Open vs. Fail Closed: When Redis is unreachable, should requests be allowed (fail open) or denied (fail closed)? Fail open maintains availability during outages but leaves the login endpoint unprotected for their duration. Fail closed maintains security but blocks all logins until Redis recovers.

IPv6 Considerations: IPv6's vast address space enables attackers to use a unique IP per request. Rate limit by /64 prefix, or by a larger block such as a /56 or /48, rather than by individual address.

Account Enumeration: Different responses for valid vs. invalid usernames leak information. Apply consistent rate limits regardless of whether the account exists.
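
For the IPv6 case, the rate-limit key can be derived from the first 64 bits of the address. The sketch below is a simplified illustration: the function name is hypothetical, and it deliberately rejects IPv4-mapped and malformed addresses instead of trying to handle every textual form.

```javascript
// Reduce an IPv6 address to its /64 prefix for use as a rate-limit key.
function ipv6Prefix64(address) {
  // Drop a zone index such as "%eth0" and normalize case.
  const addr = address.split('%')[0].toLowerCase();
  if (addr.includes('.')) throw new Error(`IPv4-mapped form not handled: ${address}`);
  let groups;
  if (addr.includes('::')) {
    // Expand the "::" shorthand into the zero groups it stands for.
    const [head, tail] = addr.split('::');
    const headGroups = head ? head.split(':') : [];
    const tailGroups = tail ? tail.split(':') : [];
    const missing = 8 - headGroups.length - tailGroups.length;
    if (missing < 1) throw new Error(`unsupported IPv6 address: ${address}`);
    groups = [...headGroups, ...Array(missing).fill('0'), ...tailGroups];
  } else {
    groups = addr.split(':');
  }
  if (groups.length !== 8) throw new Error(`unsupported IPv6 address: ${address}`);
  // Keep the first four 16-bit groups, zero-padded for a canonical key.
  return groups.slice(0, 4).map((g) => g.padStart(4, '0')).join(':') + '::/64';
}

console.log(ipv6Prefix64('2001:db8::1'));           // 2001:0db8:0000:0000::/64
console.log(ipv6Prefix64('2001:db8:1:2:3:4:5:6'));  // 2001:0db8:0001:0002::/64
```

Two addresses from the same /64 then map to the same bucket, so rotating through the low 64 bits no longer resets the limit.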

Conclusion

Token bucket rate limiting provides essential protection against credential stuffing when properly implemented. Key principles for effective deployment:

Use tiered responses that match the action to the threat severity
Layer multiple keying strategies for comprehensive coverage
Integrate with behavioral analysis for context-aware limits
Implement adaptive thresholds that respond to attack patterns
Monitor rate limiting metrics as security telemetry

Rate limiting is necessary but not sufficient. Combine with behavioral analysis, device fingerprinting, and threat intelligence for defense-in-depth against credential stuffing attacks.