Week 09: Rate Limiter

"A system without a Rate Limiter is like a stadium without ticket gates — everyone gets in, and the result is a stampede."

Tags: system-design rate-limiter security devops alex-xu Prerequisites: Tuan-02-Back-of-the-envelope · Tuan-05-Load-Balancer · Tuan-06-Cache-Strategy Related: Tuan-14-AuthN-AuthZ-Security · Tuan-15-Data-Security-Encryption · Tuan-13-Monitoring-Observability · Tuan-11-Microservices-Pattern


1. Context & Why

An everyday analogy

Hieu, imagine going to a concert at a 50,000-seat stadium. At the ticket gate, staff let only so many people through per minute — say 200 people/minute. If 10,000 people surge in at once with no control:

  • The gates get overwhelmed (server overload)
  • People trample each other (cascading failure)
  • Even VIP ticket holders get stuck (legitimate users affected)
  • Gate-crashers slip through easily (malicious traffic gets through)

The Rate Limiter is that ticket gate. It caps the number of requests a client can send to the server within a given time window. Exceed the limit → the server returns HTTP 429 "Too Many Requests" and asks the client to wait.

Why do we need a Rate Limiter?

| Problem | Without a Rate Limiter | With a Rate Limiter |
|---|---|---|
| DDoS attack | Server goes down | Only the attacker is blocked; normal users keep access |
| Brute-force login | Attacker tries millions of passwords | Blocked after 5-10 failed attempts |
| API abuse | One client sends 1M req/s and starves everyone else | Every client gets a fair quota |
| Accidental loop | A client bug sends requests forever | Gets rate limited; the server stays safe |
| Cost control | Cloud bill explodes from runaway requests | Spend stays under control |

Why does Alex Xu put it in Chapter 4?

Because the Rate Limiter is a basic component that every production system needs. Before designing any system (URL Shortener, Chat, News Feed), you must know how to protect it from abuse. It sits between the Load Balancer (Week 05) and the application logic — the first line of defense.


2. Deep Dive — Rate Limiting Algorithms

2.1 Token Bucket Algorithm

Principle: Picture a bucket holding tokens. Each request needs 1 token to be processed. Tokens are poured into the bucket at a steady pace (the refill rate). When the bucket runs out of tokens → the request is rejected.

Parameters:

  • Bucket size: the maximum number of tokens — this is what allows bursts
  • Refill rate: how many tokens are added per second

Example: Bucket size = 10, refill rate = 2 tokens/s

  • T=0: The bucket holds 10 tokens. The client fires 10 requests back to back → all are processed, bucket = 0
  • T=1: The bucket is refilled with 2 tokens. The client sends 3 requests → 2 are processed, 1 is rejected
  • T=5: The bucket has refilled to 10 tokens (capped at the bucket size). The client can burst again

Pros:

  • Allows burst traffic (useful for real-world usage patterns)
  • Memory efficient: only last_refill_time and tokens per user
  • Amazon and Stripe use Token Bucket

Cons:

  • Two parameters to tune (bucket size + refill rate) — not intuitive
  • Bursts can overload the backend if you are not careful
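The refill-and-spend mechanics above fit in a few lines of Python. This is a minimal single-process sketch (class and parameter names are illustrative, not from the book):

```python
import time


class TokenBucket:
    """Minimal token-bucket sketch: capacity caps the burst,
    refill_rate adds tokens at a fixed pace."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)       # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time, capped at the bucket size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Replaying the example above: capacity=10, refill=2 tokens/s
bucket = TokenBucket(capacity=10, refill_rate=2)
t0 = time.monotonic()
burst = [bucket.allow(t0) for _ in range(10)]     # all 10 pass, bucket empties
at_t1 = [bucket.allow(t0 + 1) for _ in range(3)]  # 2 refilled -> 2 pass, 1 rejected
```

Passing `now` explicitly makes the behaviour deterministic for tests; in production you would let the clock run.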

2.2 Leaky Bucket Algorithm

Principle: Picture a bucket with a hole in the bottom. Requests pour in from the top and flow out (get processed) at a fixed rate through the hole. If the bucket is full → new requests spill over (rejected).

Parameters:

  • Bucket size: the maximum number of requests in the queue
  • Outflow rate: how many requests are processed per second

Example: Bucket size = 10, outflow rate = 2 req/s

  • The client sends 10 requests at once → all enter the queue
  • The system processes a steady 2 req/s, taking 5s to drain them all
  • An 11th request is rejected because the bucket is full

Pros:

  • Fixed, stable output rate — great for downstream services
  • Shopify uses Leaky Bucket for API rate limiting

Cons:

  • No bursts — every request has to wait in the queue
  • Old requests can hog the queue and get new requests rejected
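The queue-and-drain behaviour can be sketched the same way (a minimal in-process version; names are illustrative):

```python
from collections import deque


class LeakyBucket:
    """Minimal leaky-bucket sketch: requests queue up and leak out
    at a fixed outflow_rate; a full queue rejects new requests."""

    def __init__(self, capacity: int, outflow_rate: float):
        self.capacity = capacity
        self.outflow_rate = outflow_rate    # requests processed per second
        self.queue = deque()
        self.last_leak = 0.0

    def _leak(self, now: float) -> None:
        # Drain the whole requests that have leaked out since the last check
        leaked = int((now - self.last_leak) * self.outflow_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def offer(self, request, now: float) -> bool:
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False                    # bucket full -> spill over
        self.queue.append(request)
        return True


# Replaying the example: capacity=10, outflow=2 req/s
bucket = LeakyBucket(capacity=10, outflow_rate=2)
accepted = [bucket.offer(i, now=0.0) for i in range(11)]  # 10 queued, 11th rejected
after_5s = bucket.offer("late", now=5.0)                  # queue fully drained by t=5
```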

2.3 Fixed Window Counter

Principle: Divide time into fixed windows (e.g., one per minute). Count the requests in each window. Over the limit → reject.

Example: Limit = 100 req/min

  • 14:00:00 - 14:00:59: count requests. Once the count hits 100 → reject the rest
  • 14:01:00: the counter resets to 0 and counting starts over

Pros:

  • The simplest to implement
  • Tiny memory footprint: just 1 counter per window per user

Cons (IMPORTANT):

  • Boundary burst problem: if a client sends 100 requests in second 59 of minute 1 and 100 more in second 0 of minute 2 → 200 requests within 2 seconds, double the limit!
  • This is why Fixed Window is not used for security-critical systems
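The boundary burst is easy to reproduce with a minimal counter (illustrative sketch):

```python
class FixedWindowCounter:
    """Minimal fixed-window sketch; shows the boundary-burst weakness."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}   # window index -> request count

    def allow(self, now: float) -> bool:
        w = int(now // self.window)
        if self.counts.get(w, 0) >= self.limit:
            return False
        self.counts[w] = self.counts.get(w, 0) + 1
        return True


limiter = FixedWindowCounter(limit=100, window_seconds=60)
# 100 requests at second 59 of minute 1, 100 more at second 0 of minute 2:
first = sum(limiter.allow(59.0) for _ in range(100))
second = sum(limiter.allow(60.0) for _ in range(100))
# 200 requests accepted within ~1 second -- double the configured limit.
```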

2.4 Sliding Window Log

Principle: Store the timestamp of every request in a sorted set. When a new request arrives, evict the timestamps older than the window size and count what remains. Over the limit → reject.

Example: Limit = 100 req/min

  • A new request arrives at 14:01:30
  • Evict all timestamps before 14:00:30
  • Count the timestamps remaining in [14:00:30, 14:01:30]
  • If >= 100 → reject

Pros:

  • Perfectly accurate — no boundary burst problem
  • The rate limit is enforced exactly at every instant

Cons:

  • Memory hungry: every timestamp is stored. With 1M users at 1,000 req/window each → 1B timestamps
  • The per-user Redis ZSET can grow very large
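A minimal in-process version of the log, with a sorted Python list standing in for the Redis ZSET (illustrative sketch):

```python
from bisect import bisect_left, insort


class SlidingWindowLog:
    """Minimal sliding-window-log sketch: keep every timestamp, drop the
    ones older than the window, count what remains. Exact but memory-hungry."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = []   # sorted request timestamps (a Redis ZSET in production)

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the window
        cutoff = bisect_left(self.log, now - self.window)
        del self.log[:cutoff]
        if len(self.log) >= self.limit:
            return False
        insort(self.log, now)
        return True


limiter = SlidingWindowLog(limit=100, window_seconds=60.0)
in_window = sum(limiter.allow(t * 0.5) for t in range(100))  # 100 reqs in 50s
rejected = limiter.allow(55.0)       # the 101st request inside the window
allowed_later = limiter.allow(90.5)  # the earliest timestamps have expired by now
```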

2.5 Sliding Window Counter

Principle: A hybrid of Fixed Window Counter and Sliding Window Log. Use the counters of the current and previous windows and compute a weighted estimate based on how far into the current window we are.

Formula:

weighted_count = previous_window_count × (1 − elapsed/window) + current_window_count

Example: Limit = 100 req/min, window = 1 minute

  • Previous window (14:00 - 14:01): 84 requests
  • Current window (14:01 - 14:02): 36 requests
  • Current time: 14:01:15 (25% into the current window)

weighted_count = 84 × (1 − 0.25) + 36 = 63 + 36 = 99. The next request brings it to 100 — exactly the limit — and the one after that is rejected.
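The weighted count for this example can be checked directly:

```python
def sliding_window_count(prev_count, curr_count, elapsed_fraction):
    """Weighted estimate of requests over the sliding window."""
    return prev_count * (1 - elapsed_fraction) + curr_count


# Previous window: 84 requests; current window: 36; 25% into the window:
estimate = sliding_window_count(84, 36, 0.25)   # 84 * 0.75 + 36 = 99.0
```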

Pros:

  • As memory efficient as Fixed Window (only 2 counters per user)
  • Near-perfect accuracy — smooths out the boundary problem
  • Cloudflare uses Sliding Window Counter

Cons:

  • It is an approximation, not 100% exact
  • In practice the error is < 1% — acceptable

2.6 Algorithm comparison

| Algorithm | Memory | Accuracy | Burst support | Complexity | Use when |
|---|---|---|---|---|---|
| Token Bucket | Low (2 vars/user) | High | Yes (configurable) | Low | General-purpose APIs; AWS, Stripe |
| Leaky Bucket | Low (queue + pointer) | High | No | Low | Stable output needed; Shopify |
| Fixed Window Counter | Very low (1 counter/user) | Low (boundary burst) | Yes (at boundaries) | Very low | Internal, non-critical services |
| Sliding Window Log | Very high (all timestamps) | Exact | No | High | Security-critical, brute-force prevention |
| Sliding Window Counter | Low (2 counters/user) | Near-exact (~99%) | Partial | Medium | Production APIs; Cloudflare |

Practical takeaway: most production systems use Token Bucket or Sliding Window Counter. Fixed Window is only for internal/non-critical use. Sliding Window Log is reserved for cases that need exact accuracy (e.g., financial transactions).

2.7 Distributed Rate Limiting (Redis-based)

In a distributed system with multiple API server instances, the rate limiter needs shared state. Redis is the most popular choice because of:

  • Atomic operations: INCR, EXPIRE, Lua scripting
  • In-memory storage: latency < 1ms
  • Single-threaded execution: no race conditions within a single command

Architecture:

Client → Load Balancer → API Server 1 ─┐
                       → API Server 2 ─┤→ Redis Cluster (shared counters)
                       → API Server 3 ─┘

Problems with distributed rate limiting:

  1. Race condition: two servers both read counter = 99 (limit = 100), both increment to 100 and allow the request → 101 requests in reality. Fix: use a Lua script for an atomic read-check-increment.
  2. Redis failure: if Redis goes down, rate limiting stops working. Fix: fail open (allow everything) or fail closed (reject everything), depending on policy. Use Redis Sentinel/Cluster for HA.
  3. Latency overhead: every request must query Redis. Fix: a local cache with periodic sync, or Redis co-located with the API servers.

2.8 Rate limiting at different layers

| Layer | Location | Tooling | Characteristics |
|---|---|---|---|
| Client-side | Browser/mobile app | Throttle libraries, debounce | Easily bypassed; UX protection only |
| CDN/Edge | Cloudflare, AWS CloudFront | Built-in rate limiting | Stops DDoS before it reaches the origin |
| API Gateway | Kong, AWS API Gateway, Nginx | Rate limiting plugins/modules | Centralized, easy to manage |
| Application | Code in the service | Redis-based custom logic | Flexible, business-logic aware |
| Database | Connection pool, query limiter | pgbouncer, MySQL proxy | Protects the DB from overload |

Best practice: rate limit at multiple layers (defense in depth). The CDN stops DDoS, the API Gateway stops per-user abuse, and the application stops business-logic violations.

2.9 HTTP 429 + Retry-After Header

When the rate limit is triggered, the server responds with:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1678886400
 
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "retry_after": 30
  }
}

The important headers:

  • Retry-After: how many seconds the client should wait before retrying
  • X-RateLimit-Limit: total requests allowed in the window
  • X-RateLimit-Remaining: requests remaining
  • X-RateLimit-Reset: Unix timestamp when the window resets

Aha moment: a well-behaved client reads Retry-After and implements exponential backoff. A badly behaved client retries immediately → those clients need a harsher rate limiter (progressive penalties).
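On the client side, honouring Retry-After plus jittered exponential backoff might look like this (function name and defaults are illustrative, not a standard API):

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).
    Honours the server's Retry-After when present; otherwise uses
    exponential backoff with full jitter."""
    if retry_after is not None:
        return float(retry_after)   # the server told us exactly how long
    return random.uniform(0, min(cap, base * 2 ** attempt))


# With Retry-After: 30 the client waits exactly 30s;
# without it, attempt 5 waits somewhere in [0, 32) seconds.
wait_exact = backoff_delay(0, retry_after=30)
wait_jittered = backoff_delay(5)
```

Full jitter spreads retries out so that rate-limited clients do not all come back at the same instant.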

2.10 Rate limiting by IP / User / API Key

| Method | When to use | Limitations |
|---|---|---|
| By IP | Anonymous traffic, DDoS mitigation | Shared IPs (NAT, VPN, corporate proxies) → many users affected |
| By User ID | Authenticated APIs | Requires authn before rate limiting; cannot stop pre-auth attacks |
| By API Key | Public APIs, 3rd-party integrations | Keys can be shared or leaked |
| Compound | IP + user + endpoint | Most precise, but more complex |

In practice: a compound key works best. Example key: {user_id}:{endpoint}:{minute}. That way user A calling /api/search 100 times/minute does not eat into user A's quota for /api/profile.
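A compound key along those lines might be built like this (the exact key format is illustrative):

```python
def rate_limit_key(user_id, endpoint, now_epoch, window_seconds=60):
    """Build a compound rate-limit key scoped per user, per endpoint,
    per time window."""
    window = int(now_epoch // window_seconds)
    return f"rl:{user_id}:{endpoint}:{window}"


k1 = rate_limit_key("userA", "/api/search", now_epoch=1678886400)
k2 = rate_limit_key("userA", "/api/profile", now_epoch=1678886400)
# Different endpoints -> different counters: the /api/search quota
# does not eat into the /api/profile quota.
```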

2.11 Tiered rate limiting (Free vs Premium)

| Tier | Rate limit | Burst | Features |
|---|---|---|---|
| Free | 60 req/min | 10 req burst | Basic endpoints only |
| Basic ($29/mo) | 600 req/min | 50 req burst | All endpoints |
| Pro ($99/mo) | 3,000 req/min | 200 req burst | All endpoints + priority queue |
| Enterprise (custom) | 30,000 req/min | 1,000 req burst | Dedicated pool, custom limits |

Implementation: each API key carries a tier field. The rate limiter looks up the tier → applies the matching config. Using a Redis hash:

HSET ratelimit:config:free max_requests 60 window_seconds 60 burst 10
HSET ratelimit:config:pro  max_requests 3000 window_seconds 60 burst 200

2.12 API Gateway Rate Limiting

Kong Gateway

# kong.yml - Rate Limiting Plugin
plugins:
  - name: rate-limiting
    config:
      second: 10        # 10 req/s
      minute: 100       # 100 req/min
      hour: 5000        # 5000 req/h
      policy: redis      # use Redis for distributed rate limiting
      redis_host: redis-cluster.internal
      redis_port: 6379
      redis_timeout: 2000
      fault_tolerant: true  # fail open if Redis is down
      hide_client_headers: false  # return X-RateLimit-* headers
      limit_by: consumer  # rate limit per consumer (user)

AWS API Gateway

{
  "usagePlan": {
    "name": "ProPlan",
    "description": "Rate limit for Pro tier",
    "throttle": {
      "rateLimit": 50,
      "burstLimit": 200
    },
    "quota": {
      "limit": 100000,
      "period": "MONTH"
    }
  }
}

Note: API gateway rate limiting is very convenient but less flexible than a custom implementation. When you need rate limits tied to business logic (e.g., orders per day), you still need an application-level rate limiter.


3. Estimation — Sizing the Rate Limiter

Reference: Tuan-02-Back-of-the-envelope, sdi.anhvy.dev — Rate Limiter

3.1 Deriving rate limit thresholds from QPS estimation

Assumptions: the API has 10M DAU, each user averaging 100 requests/day.

Per-user rate limit: 100 requests/day averages out to well under 1 req/min, but usage is bursty, so apply a safety factor; a limit of about 15 req/min per user is consistent with the worst-case figure below.

Explanation: safety_factor = 10 because a user may concentrate their usage into a few minutes (it is not spread evenly across the day). burst_allowance = 20 permits a burst on page load.

Sanity check: if all 10M users hit their rate limit at the same time, that is 10M × 15 req/min ÷ 60 ≈ 2.5M QPS.

The system must handle 2.5M QPS in the worst case → horizontal scaling + CDN protection required.
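The back-of-the-envelope numbers above can be checked in a few lines (the 15 req/min per-user limit is an assumed figure chosen to be consistent with the 2.5M QPS worst case):

```python
# Back-of-the-envelope check of the estimation above.
dau = 10_000_000
requests_per_user_per_day = 100

avg_qps = dau * requests_per_user_per_day / 86_400   # average load
per_user_limit_per_min = 15                          # assumed per-user limit
worst_case_qps = dau * per_user_limit_per_min / 60   # everyone at their limit
```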

3.2 Redis memory for Sliding Window Counter

Assumptions: 10M users, each needing 2 counters (current + previous window) plus metadata — call it ~100 bytes per user including key overhead.

Estimate: 10M users × ~100 B ≈ 1 GB.

Note: only ~1 GB for 10M users! A single 16 GB Redis node handles this easily. With Sliding Window Log (one entry per timestamp) the footprint is roughly 3× larger, on the order of 3 GB at this load.

Still acceptable, but 3× the cost of Sliding Window Counter.

3.3 Rate limiter latency overhead

Redis round-trip time (same AZ): ~0.5 ms

With an average API latency of 50 ms: 0.5 / 50 = 1% overhead.

Acceptable. However, if Redis sits in a different AZ (cross-AZ latency ~1-2 ms), the overhead climbs to roughly 2-4%.

Mitigation: co-locate Redis with the API servers in the same AZ, or use a local cache with periodic sync (more throughput, less accuracy).

With a Lua script (atomic operation):

The Lua script runs server-side on Redis, so it adds no extra network round trips, though its execution time can exceed that of a single command.


4. Security — Protecting the system with a Rate Limiter

4.1 DDoS mitigation

DDoS attack vectors and the rate limiting response:

| Attack type | Characteristics | Rate limiting strategy |
|---|---|---|
| Volumetric (UDP flood, DNS amplification) | Millions of req/s from a botnet | Layer 3/4: CDN + ISP-level filtering (Cloudflare, AWS Shield) |
| Protocol (SYN flood, Ping of Death) | Exploits the network protocol | Layer 4: connection rate limiting, SYN cookies |
| Application (HTTP flood, Slowloris) | Legitimate-looking requests | Layer 7: per-IP + per-endpoint rate limiting |

Multi-layer DDoS defense:

  1. CDN/Edge (Cloudflare): challenge suspicious IPs, block known bad actors
  2. Load Balancer: connection rate limiting, geographic blocking
  3. API Gateway: per-IP rate limiting (100 req/min/IP)
  4. Application: per-user rate limiting + anomaly detection

Important: a rate limiter cannot fight DDoS on its own. It must be combined with a CDN, a WAF (Web Application Firewall), and ISP-level mitigation. The rate limiter is the last line of defense, not the first.

4.2 Brute force prevention

Example: protecting the login endpoint from password brute force.

A tiered strategy:

| Failed attempts | Response |
|---|---|
| 1-3 | Allow normally, return "Invalid credentials" |
| 4-5 | Add a CAPTCHA |
| 6-10 | Rate limit: 1 attempt/30s + CAPTCHA |
| 11-20 | Rate limit: 1 attempt/5min |
| >20 | Lock the account for 30 minutes, notify the user by email |

Rate limit key: login_attempt:{ip}:{username} — combining IP and username guards against:

  • An attacker using 1 IP to brute-force many accounts
  • An attacker using many IPs to brute-force 1 account
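The escalation table above can be sketched as a simple lookup (thresholds come from the table; the return shape is illustrative):

```python
def login_penalty(failed_attempts: int) -> dict:
    """Progressive response to failed logins, mirroring the tiered table."""
    if failed_attempts <= 3:
        return {"action": "allow"}
    if failed_attempts <= 5:
        return {"action": "allow", "captcha": True}
    if failed_attempts <= 10:
        return {"action": "rate_limit", "min_interval_s": 30, "captcha": True}
    if failed_attempts <= 20:
        return {"action": "rate_limit", "min_interval_s": 300}
    return {"action": "lock_account", "lock_minutes": 30, "notify_user": True}
```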

4.3 Credential stuffing protection

Credential stuffing = an attacker replays username/password lists leaked from other websites to attempt logins en masse.

Telltale signs:

  • Login attempts from many different IPs but with identical patterns
  • An abnormal failure rate (>95% failed logins)
  • Identical user agents/fingerprints

Rate limiting for credential stuffing:

If normal login traffic is 500 req/s, set a global limit of 750 req/s. When it is exceeded → trigger enhanced security:

  • Require CAPTCHA for all logins
  • Increase the delay between login attempts
  • Alert the security team

4.4 API abuse detection

Common abuse patterns:

| Pattern | Description | Detection |
|---|---|---|
| Scraping | Nonstop requests to listing/search endpoints | High request rate on specific endpoints |
| Enumeration | Probing sequential IDs (/users/1, /users/2, …) | Sequential access pattern |
| Data exfiltration | Downloading large volumes of data | Abnormal per-user bandwidth |
| Price manipulation | Sending thousands of order requests | Order rate >> normal |

Mitigation: rate limiting combined with anomaly detection. Do not just count requests — analyze the pattern:

  • Ratio between endpoints (a normal user: 80% read, 20% write; an attacker: 95% reads of one specific endpoint)
  • Request timing (humans have jitter; bots fire at a steady cadence)
  • Response consumption (a normal user reads the response; a bot ignores it)

4.5 Rate limiting bypass techniques and prevention

| Bypass technique | Description | Prevention |
|---|---|---|
| IP rotation | Proxy pools / VPNs to change IP constantly | Rate limit by fingerprint (TLS, headers) + behavior analysis |
| Distributed attack | Botnet spread across many IPs | Global rate limiting + CAPTCHA + anomaly detection |
| Slowloris | Extremely slow requests to hold connections open | Connection timeouts + concurrent connection limits |
| Header spoofing | Forging X-Forwarded-For to change the apparent IP | Trust only internal proxies; take X-Real-IP from a trusted proxy |
| API key sharing | Many attackers sharing one API key | Monitor per-key usage patterns; revoke if suspicious |
| Account rotation | Creating many accounts to split the rate limit | Rate limit per IP and per account; limit account creation |

Aha moment: no rate limiter is perfect. Attackers will always look for bypasses. The goal is to raise the cost of an attack until it is no longer economical — not to block 100%.


5. DevOps — Deploying and operating the Rate Limiter

5.1 Redis-based rate limiter deployment

# docker-compose.yml - Rate Limiter Infrastructure
version: '3.8'
 
services:
  redis-ratelimit:
    image: redis:7-alpine
    command: >
      redis-server
      --maxmemory 2gb
      --maxmemory-policy allkeys-lru
      --appendonly no
      --save ""
      --tcp-backlog 511
      --timeout 0
      --tcp-keepalive 300
    ports:
      - "6380:6379"
    deploy:
      resources:
        limits:
          memory: 2.5g
          cpus: '2'
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - ratelimit-net
 
  redis-sentinel-1:
    image: redis:7-alpine
    command: redis-sentinel /etc/redis/sentinel.conf
    volumes:
      - ./sentinel.conf:/etc/redis/sentinel.conf
    depends_on:
      - redis-ratelimit
    networks:
      - ratelimit-net
 
  # API server with rate limiting
  api-server:
    build: .
    environment:
      - REDIS_RATELIMIT_HOST=redis-ratelimit
      - REDIS_RATELIMIT_PORT=6379
      - RATE_LIMIT_DEFAULT=100/min
      - RATE_LIMIT_AUTH=10/min
    depends_on:
      redis-ratelimit:
        condition: service_healthy
    networks:
      - ratelimit-net
 
networks:
  ratelimit-net:
    driver: bridge
# sentinel.conf - Redis Sentinel for HA
sentinel monitor ratelimit-master redis-ratelimit 6379 2
sentinel down-after-milliseconds ratelimit-master 5000
sentinel failover-timeout ratelimit-master 10000
sentinel parallel-syncs ratelimit-master 1

5.2 Monitoring Rate Limit Hits — Prometheus + Grafana

# prometheus-alerts.yml
groups:
  - name: rate_limiting
    rules:
      # Alert when the rate-limit hit rate spikes
      - alert: RateLimitHitSpike
        expr: >
          rate(rate_limit_hits_total[5m]) >
          rate(rate_limit_hits_total[1h] offset 1d) * 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Rate limit hits are 3x higher than the same hour yesterday"
          description: "Current: {{ $value }}/s. Possible DDoS or API abuse."
 
      # Alert when too many users are being rate limited
      - alert: TooManyUsersRateLimited
        expr: >
          rate(rate_limit_hits_total{result="rejected"}[5m]) /
          rate(http_requests_total[5m]) > 0.1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: ">10% of requests are rate limited — the limit may be too strict"
 
      # Alert when rate limiter Redis latency is high
      - alert: RateLimiterLatencyHigh
        expr: >
          histogram_quantile(0.99,
            rate(rate_limiter_duration_seconds_bucket[5m])
          ) > 0.005
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Rate limiter P99 latency > 5ms"
 
      # Alert when the rate limiter Redis instance is down
      - alert: RateLimiterRedisDown
        expr: redis_up{instance=~".*ratelimit.*"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Rate limiter Redis instance down!"

Grafana dashboard panels:

| Panel | PromQL | Purpose |
|---|---|---|
| Rate Limit Hits/s | rate(rate_limit_hits_total[1m]) | Rate-limiting trend |
| Rejection Rate (%) | rate(rate_limit_hits_total{result="rejected"}[5m]) / rate(http_requests_total[5m]) * 100 | % of requests rejected |
| Top Rate Limited Users | topk(10, sum by (user_id)(rate(rate_limit_hits_total{result="rejected"}[1h]))) | Find abusive users |
| Top Rate Limited Endpoints | topk(10, sum by (endpoint)(rate(rate_limit_hits_total{result="rejected"}[1h]))) | Find endpoints under attack |
| Redis Latency P99 | histogram_quantile(0.99, rate(rate_limiter_duration_seconds_bucket[5m])) | Rate limiter performance |
| Redis Memory Usage | redis_memory_used_bytes{instance=~".*ratelimit.*"} | Capacity planning |

5.3 Alerting on spike patterns

Spike detection strategies:

  1. Absolute threshold: rate > 10,000/s → alert
  2. Relative to baseline: rate > 3x the average of the same hour yesterday
  3. Rate of change: deriv(rate_limit_hits_total[5m]) > 100 (abnormally fast growth)
  4. Anomaly detection: Grafana ML or an external system (Datadog Anomaly Monitors)

# Combining the strategies
- alert: SuspiciousTrafficPattern
  expr: >
    (
      rate(http_requests_total{path="/api/login"}[5m]) > 50
      and
      rate(http_requests_total{path="/api/login", status="401"}[5m]) /
      rate(http_requests_total{path="/api/login"}[5m]) > 0.9
    )
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Possible brute force attack: >90% login failures at >50 req/s"

5.4 Nginx Rate Limiting Module

# /etc/nginx/conf.d/rate-limiting.conf
 
# --- Zone definitions ---
# Shared memory zone for rate limiting
# 10m = 10MB shared memory ≈ 160,000 IP addresses
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;
limit_req_zone $binary_remote_addr zone=api:10m rate=30r/s;
 
# Rate limit by API key (extracted from header)
map $http_x_api_key $api_key_zone {
    default "anonymous";
    ~^.+$   $http_x_api_key;
}
limit_req_zone $api_key_zone zone=api_key:10m rate=100r/min;
 
# --- Connection limiting ---
limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;
 
server {
    listen 80;
    server_name api.example.com;
 
    # Global connection limit
    limit_conn conn_per_ip 20;
 
    # Custom error page for 429
    error_page 429 = @rate_limited;
    location @rate_limited {
        default_type application/json;
        return 429 '{"error":"rate_limited","retry_after":30}';
    }
 
    # --- General API ---
    location /api/ {
        limit_req zone=general burst=20 nodelay;
        limit_req zone=api_key burst=50 nodelay;
        limit_req_status 429;
        limit_req_log_level warn;
 
        # Pass rate limit headers back to the client
        add_header X-RateLimit-Limit 10;
        add_header X-RateLimit-Burst 20;
 
        proxy_pass http://backend;
    }
 
    # --- Login endpoint (strict) ---
    location /api/auth/login {
        limit_req zone=login burst=3 nodelay;
        limit_req_status 429;
        limit_req_log_level error;
 
        proxy_pass http://backend;
    }
 
    # --- Search endpoint (moderate) ---
    location /api/search {
        limit_req zone=api burst=10 delay=5;
        # delay=5: the first 5 requests are processed immediately,
        # the next 5 are delayed (queued),
        # anything beyond that is rejected
        limit_req_status 429;
 
        proxy_pass http://backend;
    }
}

How burst and nodelay work:

  • burst=20: allow 20 requests over the rate before rejecting
  • nodelay: process burst requests immediately (no queueing)
  • delay=5: the first 5 requests are processed immediately; the rest are queued

6. Code — Production-grade Rate Limiter

6.1 Python: Sliding Window Rate Limiter with Redis

"""
Production-grade Sliding Window Counter rate limiter.
Uses Redis + a Lua script for atomic operations.
"""
 
import time
import hashlib
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Optional
 
import redis
 
logger = logging.getLogger(__name__)
 
 
class RateLimitTier(Enum):
    FREE = "free"
    BASIC = "basic"
    PRO = "pro"
    ENTERPRISE = "enterprise"
 
 
@dataclass(frozen=True)
class RateLimitConfig:
    max_requests: int       # Maximum requests per window
    window_seconds: int     # Window size in seconds
    burst_size: int         # Allowed burst (Token Bucket component)
 
    @classmethod
    def for_tier(cls, tier: RateLimitTier) -> "RateLimitConfig":
        configs = {
            RateLimitTier.FREE:       cls(max_requests=60,    window_seconds=60, burst_size=10),
            RateLimitTier.BASIC:      cls(max_requests=600,   window_seconds=60, burst_size=50),
            RateLimitTier.PRO:        cls(max_requests=3000,  window_seconds=60, burst_size=200),
            RateLimitTier.ENTERPRISE: cls(max_requests=30000, window_seconds=60, burst_size=1000),
        }
        return configs[tier]
 
 
@dataclass
class RateLimitResult:
    allowed: bool
    limit: int
    remaining: int
    reset_at: float          # Unix timestamp when the window resets
    retry_after: Optional[int]  # Seconds to wait (None if allowed)
 
    @property
    def headers(self) -> dict[str, str]:
        """Return HTTP headers per the RFC draft."""
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, self.remaining)),
            "X-RateLimit-Reset": str(int(self.reset_at)),
        }
        if not self.allowed and self.retry_after:
            headers["Retry-After"] = str(self.retry_after)
        return headers
 
 
# Lua script: atomic sliding window counter.
# Runs entirely on the Redis server — no race conditions.
SLIDING_WINDOW_LUA = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
 
local current_window = math.floor(now / window)
local previous_window = current_window - 1
local elapsed = now - (current_window * window)
 
local current_key = key .. ":" .. current_window
local previous_key = key .. ":" .. previous_window
 
local previous_count = tonumber(redis.call("GET", previous_key) or "0")
local current_count = tonumber(redis.call("GET", current_key) or "0")
 
-- Sliding window counter formula
local weighted_count = previous_count * (1 - elapsed / window) + current_count
 
if weighted_count >= limit then
    -- Rate limited
    local ttl = window - elapsed
    return {0, limit, math.floor(limit - weighted_count), math.ceil(now + ttl), math.ceil(ttl)}
end
 
-- Allowed: increment current window counter
redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window * 2)  -- TTL = 2 windows, to keep the previous one
redis.call("EXPIRE", previous_key, window * 2)
 
local new_count = weighted_count + 1
local remaining = math.floor(limit - new_count)
local reset_at = (current_window + 1) * window
 
return {1, limit, remaining, math.ceil(reset_at), 0}
"""
 
 
class SlidingWindowRateLimiter:
    """
    Production-grade sliding window counter rate limiter.
 
    Features:
    - Atomic Redis operations via Lua script (no race conditions)
    - Tiered rate limiting (Free/Basic/Pro/Enterprise)
    - Compound key support (IP + user + endpoint)
    - Graceful degradation when Redis is down (fail-open)
    - Metrics emission for Prometheus
    """
 
    def __init__(
        self,
        redis_client: redis.Redis,
        fail_open: bool = True,
        metrics_callback=None,
    ):
        self._redis = redis_client
        self._fail_open = fail_open
        self._metrics = metrics_callback
        self._lua_sha: Optional[str] = None
 
    def _ensure_script(self) -> str:
        """Load the Lua script into Redis (cached)."""
        if self._lua_sha is None:
            self._lua_sha = self._redis.script_load(SLIDING_WINDOW_LUA)
        return self._lua_sha
 
    def _build_key(
        self,
        identifier: str,
        endpoint: Optional[str] = None,
    ) -> str:
        """Build the Redis key from identifier + endpoint."""
        parts = ["rl", identifier]
        if endpoint:
            # Hash the endpoint so the key stays short
            ep_hash = hashlib.md5(endpoint.encode()).hexdigest()[:8]
            parts.append(ep_hash)
        return ":".join(parts)
 
    def check(
        self,
        identifier: str,
        config: RateLimitConfig,
        endpoint: Optional[str] = None,
    ) -> RateLimitResult:
        """
        Check and record one request.
 
        Args:
            identifier: User ID, API key, hoac IP address
            config: Rate limit configuration
            endpoint: Optional endpoint path cho per-endpoint limiting
 
        Returns:
            RateLimitResult with allowed status and headers
        """
        key = self._build_key(identifier, endpoint)
 
        try:
            sha = self._ensure_script()
            now = time.time()
 
            result = self._redis.evalsha(
                sha,
                1,  # number of keys
                key,
                now,
                config.window_seconds,
                config.max_requests,
            )
 
            allowed, limit, remaining, reset_at, retry_after = result
 
            rate_result = RateLimitResult(
                allowed=bool(allowed),
                limit=limit,
                remaining=remaining,
                reset_at=reset_at,
                retry_after=retry_after if retry_after > 0 else None,
            )
 
            # Emit metrics
            if self._metrics:
                self._metrics(
                    identifier=identifier,
                    endpoint=endpoint or "global",
                    allowed=rate_result.allowed,
                )
 
            return rate_result
 
        except redis.ConnectionError:
            logger.error("Redis connection failed for rate limiting")
            if self._fail_open:
                # Fail open: allow the request when Redis is down
                return RateLimitResult(
                    allowed=True,
                    limit=config.max_requests,
                    remaining=config.max_requests,
                    reset_at=time.time() + config.window_seconds,
                    retry_after=None,
                )
            else:
                # Fail closed: reject everything when Redis is down
                return RateLimitResult(
                    allowed=False,
                    limit=config.max_requests,
                    remaining=0,
                    reset_at=time.time() + 60,
                    retry_after=60,
                )
 
        except redis.RedisError as e:
            logger.error(f"Redis error in rate limiter: {e}")
            if self._fail_open:
                return RateLimitResult(
                    allowed=True,
                    limit=config.max_requests,
                    remaining=config.max_requests,
                    reset_at=time.time() + config.window_seconds,
                    retry_after=None,
                )
            raise
 
 
# === Usage ===
if __name__ == "__main__":
    r = redis.Redis(host="localhost", port=6379, decode_responses=False)
    limiter = SlidingWindowRateLimiter(redis_client=r, fail_open=True)
 
    config = RateLimitConfig.for_tier(RateLimitTier.FREE)
 
    for i in range(65):
        result = limiter.check(
            identifier="user:12345",
            config=config,
            endpoint="/api/search",
        )
        if not result.allowed:
            print(f"Request {i+1}: REJECTED | retry_after={result.retry_after}s")
            print(f"  Headers: {result.headers}")
            break
        else:
            print(f"Request {i+1}: OK | remaining={result.remaining}")

6.2 Node.js: Express Rate Limiting Middleware

// middleware/rate-limiter.js
// Production-grade Express middleware with a Redis backend
 
const Redis = require("ioredis");
const crypto = require("crypto");
 
// === Tier Configs ===
const TIER_CONFIGS = {
  free:       { maxRequests: 60,    windowSeconds: 60, burstSize: 10 },
  basic:      { maxRequests: 600,   windowSeconds: 60, burstSize: 50 },
  pro:        { maxRequests: 3000,  windowSeconds: 60, burstSize: 200 },
  enterprise: { maxRequests: 30000, windowSeconds: 60, burstSize: 1000 },
};
 
// === Lua Script (same logic as Python version) ===
const SLIDING_WINDOW_LUA = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
 
local current_window = math.floor(now / window)
local previous_window = current_window - 1
local elapsed = now - (current_window * window)
 
local current_key = key .. ":" .. current_window
local previous_key = key .. ":" .. previous_window
 
local previous_count = tonumber(redis.call("GET", previous_key) or "0")
local current_count = tonumber(redis.call("GET", current_key) or "0")
 
local weighted_count = previous_count * (1 - elapsed / window) + current_count
 
if weighted_count >= limit then
    local ttl = window - elapsed
    return {0, limit, math.max(0, math.floor(limit - weighted_count)), math.ceil(now + ttl), math.ceil(ttl)}
end
 
redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window * 2)
redis.call("EXPIRE", previous_key, window * 2)
 
local new_count = weighted_count + 1
local remaining = math.floor(limit - new_count)
local reset_at = (current_window + 1) * window
 
return {1, limit, remaining, math.ceil(reset_at), 0}
`;
 
class RateLimiter {
  constructor({ redisClient, failOpen = true, prefix = "rl" }) {
    this.redis = redisClient;
    this.failOpen = failOpen;
    this.prefix = prefix;
    this.scriptSha = null;
  }
 
  async ensureScript() {
    if (!this.scriptSha) {
      this.scriptSha = await this.redis.script("LOAD", SLIDING_WINDOW_LUA);
    }
    return this.scriptSha;
  }
 
  buildKey(identifier, endpoint) {
    const parts = [this.prefix, identifier];
    if (endpoint) {
      const hash = crypto
        .createHash("md5")
        .update(endpoint)
        .digest("hex")
        .slice(0, 8);
      parts.push(hash);
    }
    return parts.join(":");
  }
 
  async check(identifier, config, endpoint = null) {
    const key = this.buildKey(identifier, endpoint);
 
    try {
      const sha = await this.ensureScript();
      const now = Date.now() / 1000;
 
      const [allowed, limit, remaining, resetAt, retryAfter] =
        await this.redis.evalsha(
          sha,
          1,
          key,
          now,
          config.windowSeconds,
          config.maxRequests
        );
 
      return {
        allowed: Boolean(allowed),
        limit,
        remaining: Math.max(0, remaining),
        resetAt,
        retryAfter: retryAfter > 0 ? retryAfter : null,
      };
    } catch (err) {
      console.error("[RateLimiter] Redis error:", err.message);
      if (this.failOpen) {
        return {
          allowed: true,
          limit: config.maxRequests,
          remaining: config.maxRequests,
          resetAt: Date.now() / 1000 + config.windowSeconds,
          retryAfter: null,
        };
      }
      throw err;
    }
  }
}
 
// === Express Middleware Factory ===
function rateLimitMiddleware(options = {}) {
  const {
    redisUrl = "redis://localhost:6379",
    failOpen = true,
    keyExtractor = defaultKeyExtractor,
    tierExtractor = defaultTierExtractor,
    onRejected = defaultOnRejected,
  } = options;
 
  const redisClient = new Redis(redisUrl);
  const limiter = new RateLimiter({ redisClient, failOpen });
 
  return async function rateLimit(req, res, next) {
    try {
      const identifier = keyExtractor(req);
      const tier = tierExtractor(req);
      const config = TIER_CONFIGS[tier] || TIER_CONFIGS.free;
      const endpoint = `${req.method}:${req.route?.path || req.path}`;
 
      const result = await limiter.check(identifier, config, endpoint);
 
      // Set rate limit headers
      res.set("X-RateLimit-Limit", String(result.limit));
      res.set("X-RateLimit-Remaining", String(result.remaining));
      res.set("X-RateLimit-Reset", String(result.resetAt));
 
      if (!result.allowed) {
        res.set("Retry-After", String(result.retryAfter));
        return onRejected(req, res, result);
      }
 
      next();
    } catch (err) {
      console.error("[RateLimiter] Middleware error:", err.message);
      // Fail open on middleware errors: let the request through rather than
      // block all traffic on a rate-limiter bug (Redis failures are already
      // handled inside limiter.check via failOpen)
      next();
    }
  };
}
 
// === Default Helpers ===
function defaultKeyExtractor(req) {
  // Priority: user ID > API key > IP
  if (req.user?.id) return `user:${req.user.id}`;
  if (req.headers["x-api-key"]) return `key:${req.headers["x-api-key"]}`;
  // x-forwarded-for can hold a comma-separated proxy chain; take the first hop
  const ip =
    req.ip || (req.headers["x-forwarded-for"] || "").split(",")[0].trim() || "unknown";
  return `ip:${ip}`;
}
 
function defaultTierExtractor(req) {
  return req.user?.tier || "free";
}
 
function defaultOnRejected(req, res, result) {
  return res.status(429).json({
    error: {
      code: "RATE_LIMITED",
      message: "Rate limit exceeded. Please retry later.",
      retry_after: result.retryAfter,
    },
  });
}
 
// === Usage with Express ===
// const express = require('express');
// const app = express();
//
// app.use(rateLimitMiddleware({
//   redisUrl: process.env.REDIS_URL || 'redis://localhost:6379',
//   failOpen: true,
// }));
//
// app.get('/api/data', (req, res) => {
//   res.json({ data: 'hello' });
// });
 
module.exports = { RateLimiter, rateLimitMiddleware, TIER_CONFIGS };

6.3 Nginx Rate Limiting Config (Production)

# /etc/nginx/conf.d/rate-limiting.conf
# Production-ready Nginx rate limiting configuration
 
# === Shared memory zones ===
# $binary_remote_addr = 4 bytes (IPv4) or 16 bytes (IPv6)
# 10m zone ≈ 160,000 IPv4 addresses
 
# General API: 10 requests/second per IP
limit_req_zone $binary_remote_addr zone=api_per_ip:10m rate=10r/s;
 
# Auth endpoints: 1 request/second per IP (strict)
limit_req_zone $binary_remote_addr zone=auth_per_ip:10m rate=1r/s;
 
# Per API key: 100 requests/minute
map $http_x_api_key $limit_key {
    default         $binary_remote_addr;
    "~^.+$"         $http_x_api_key;
}
limit_req_zone $limit_key zone=per_api_key:10m rate=100r/m;
 
# Connection limiting per IP
limit_conn_zone $binary_remote_addr zone=conn_per_ip:5m;
 
# === Logging ===
log_format rate_limit '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      'limit_req_status=$limit_req_status';
 
server {
    listen 443 ssl http2;
    server_name api.example.com;
 
    # Connection limits
    limit_conn conn_per_ip 30;
    limit_conn_log_level warn;
 
    # Custom 429 response
    limit_req_status 429;
 
    access_log /var/log/nginx/rate_limit.log rate_limit;
 
    # --- Standard API endpoints ---
    location /api/ {
        limit_req zone=api_per_ip burst=20 nodelay;
        limit_req zone=per_api_key burst=50 nodelay;
 
        proxy_pass http://api_upstream;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Rate-Limited $limit_req_status;
    }
 
    # --- Auth endpoints (strict rate limiting) ---
    location /api/auth/ {
        limit_req zone=auth_per_ip burst=3 nodelay;
 
        proxy_pass http://api_upstream;
        proxy_set_header X-Real-IP $remote_addr;
    }
 
    # --- Health check (no rate limiting: limit_req only applies where a
    #     zone is referenced, and none is referenced in this location) ---
    location /health {
        default_type application/json;
        return 200 '{"status":"ok"}';
    }
 
    # --- Error handling ---
    error_page 429 @rate_limited;
    location @rate_limited {
        default_type application/json;
        add_header Retry-After 30 always;
        add_header X-RateLimit-Limit 10 always;
        return 429 '{"error":{"code":"RATE_LIMITED","message":"Too many requests","retry_after":30}}';
    }
}

6.4 Lua Script for Atomic Redis Rate Limiting

-- rate_limiter.lua
-- Lua script run on Redis implementing the token bucket algorithm
-- Usage: EVALSHA <sha> 1 <key> <now> <rate> <capacity> <requested>
-- Returns: {allowed (0/1), tokens_remaining, retry_after_ms}
 
local key = KEYS[1]
local now = tonumber(ARGV[1])           -- Current timestamp (ms)
local rate = tonumber(ARGV[2])          -- Tokens per second
local capacity = tonumber(ARGV[3])      -- Max tokens (bucket size)
local requested = tonumber(ARGV[4])     -- Tokens requested (usually 1)
 
-- Fetch current state from Redis
local bucket = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

-- First request: initialize a full bucket
if tokens == nil then
    tokens = capacity
    last_refill = now
end

-- Compute how many tokens have been refilled since the last request
local elapsed_ms = math.max(0, now - last_refill)
local new_tokens = elapsed_ms * rate / 1000  -- rate is tokens/second
tokens = math.min(capacity, tokens + new_tokens)

local allowed = 0
local retry_after_ms = 0

if tokens >= requested then
    -- Enough tokens: allow the request
    allowed = 1
    tokens = tokens - requested
else
    -- Not enough tokens: compute how long the client should wait
    local deficit = requested - tokens
    retry_after_ms = math.ceil(deficit / rate * 1000)
end

-- Persist new state (HSET with field-value pairs; HMSET is deprecated)
redis.call("HSET", key, "tokens", tostring(tokens), "last_refill", tostring(now))
-- TTL = 2x the time needed to refill a full bucket (so idle keys expire)
local ttl_seconds = math.ceil(capacity / rate * 2)
redis.call("EXPIRE", key, ttl_seconds)
 
return {allowed, math.floor(tokens), retry_after_ms}

# === Using the Lua script from Python ===
import time
import redis

r = redis.Redis(host="localhost", port=6379)

# Load the script once at startup; reuse the SHA for every call
with open("rate_limiter.lua", "r") as f:
    TOKEN_BUCKET_SHA = r.script_load(f.read())
 
def token_bucket_check(
    user_id: str,
    rate: float = 10.0,        # 10 tokens/s
    capacity: int = 50,        # Max 50 tokens
    requested: int = 1,
) -> dict:
    """Token bucket rate limit check via Lua script."""
    key = f"tb:{user_id}"
    now_ms = int(time.time() * 1000)
 
    allowed, remaining, retry_after_ms = r.evalsha(
        TOKEN_BUCKET_SHA,
        1,          # number of keys
        key,        # KEYS[1]
        now_ms,     # ARGV[1]
        rate,       # ARGV[2]
        capacity,   # ARGV[3]
        requested,  # ARGV[4]
    )
 
    return {
        "allowed": bool(allowed),
        "remaining": remaining,
        "retry_after_ms": retry_after_ms,
    }
 
# Test
for i in range(55):
    result = token_bucket_check("user:42", rate=10, capacity=50)
    status = "OK" if result["allowed"] else f"REJECTED (retry in {result['retry_after_ms']}ms)"
    print(f"Request {i+1}: {status} | remaining={result['remaining']}")

7. System Design Diagrams

7.1 Rate Limiter Architecture in the API Gateway

flowchart TD
    Client([Client]) -->|Request| CDN[CDN / Edge<br/>Cloudflare / AWS CloudFront]

    CDN -->|Layer 3-4 filtering<br/>DDoS mitigation| LB[Load Balancer<br/>Nginx / ALB]

    LB --> GW[API Gateway<br/>Kong / AWS API GW]

    subgraph "Rate Limiting Pipeline"
        GW --> EXTRACT[Extract Identity<br/>IP / API Key / User ID / JWT]
        EXTRACT --> LOOKUP[Lookup Tier Config<br/>Free / Basic / Pro / Enterprise]
        LOOKUP --> CHECK{Rate Limit Check<br/>via Redis}
        CHECK -->|Allowed| AUTH[Authentication<br/>& Authorization]
        CHECK -->|Rejected| REJECT[HTTP 429<br/>+ Retry-After header]
    end

    subgraph "Redis Rate Limit Store"
        REDIS[(Redis Cluster)]
        CHECK <-->|Lua Script<br/>Atomic check + increment| REDIS
        REDIS --- R1[Sliding Window Counters]
        REDIS --- R2[Token Bucket State]
        REDIS --- R3[Blocked IPs Set]
    end

    AUTH --> APP[Application Server]
    APP --> DB[(Database)]

    REJECT -->|429 + headers| Client

    subgraph "Monitoring"
        REDIS -->|Metrics| PROM[Prometheus]
        GW -->|rate_limit_hits_total| PROM
        PROM --> GRAF[Grafana Dashboard]
        GRAF -->|Alert| OPS[Ops Team / PagerDuty]
    end

    style CHECK fill:#ff9800,stroke:#333,stroke-width:2px
    style REJECT fill:#f44336,stroke:#333,stroke-width:2px,color:#fff
    style AUTH fill:#4caf50,stroke:#333,stroke-width:2px
    style REDIS fill:#d32f2f,stroke:#333,stroke-width:2px,color:#fff

7.2 Token Bucket Visualization

sequenceDiagram
    participant C as Client
    participant RL as Rate Limiter
    participant B as Token Bucket<br/>(capacity=5, rate=2/s)
    participant S as Server

    Note over B: Bucket: [*][*][*][*][*]<br/>tokens = 5 (full)

    C->>RL: Request 1
    RL->>B: consume(1)
    B-->>RL: OK (tokens=4)
    RL->>S: Forward request
    S-->>C: 200 OK

    C->>RL: Request 2
    RL->>B: consume(1)
    B-->>RL: OK (tokens=3)
    RL->>S: Forward request
    S-->>C: 200 OK

    C->>RL: Request 3, 4, 5 (burst)
    RL->>B: consume(3)
    B-->>RL: OK (tokens=0)
    RL->>S: Forward 3 requests
    S-->>C: 200 OK x3

    Note over B: Bucket: [ ][ ][ ][ ][ ]<br/>tokens = 0 (empty!)

    C->>RL: Request 6
    RL->>B: consume(1)
    B-->>RL: REJECTED (tokens=0)
    RL-->>C: 429 Too Many Requests<br/>Retry-After: 1

    Note over B: +1 second passes...<br/>Refill: 2 tokens/s

    Note over B: Bucket: [*][*][ ][ ][ ]<br/>tokens = 2

    C->>RL: Request 7 (after 1s)
    RL->>B: consume(1)
    B-->>RL: OK (tokens=1)
    RL->>S: Forward request
    S-->>C: 200 OK

7.3 Distributed Rate Limiting: Race Condition and Solution

sequenceDiagram
    participant S1 as API Server 1
    participant S2 as API Server 2
    participant R as Redis

    Note over S1,R: === WITHOUT Lua Script (Race Condition) ===

    S1->>R: GET counter (= 99)
    S2->>R: GET counter (= 99)
    Note over S1: 99 < 100, allow!
    Note over S2: 99 < 100, allow!
    S1->>R: INCR counter (= 100)
    S2->>R: INCR counter (= 101)
    Note over R: Counter = 101<br/>LIMIT VIOLATED!

    Note over S1,R: === WITH Lua Script (Atomic) ===

    S1->>R: EVALSHA lua_script
    Note over R: Lua: GET=99, 99<100<br/>INCR → 100, return ALLOW
    R-->>S1: ALLOWED (remaining=0)

    S2->>R: EVALSHA lua_script
    Note over R: Lua: GET=100, 100>=100<br/>return REJECT
    R-->>S2: REJECTED (retry_after=45s)

    Note over R: Counter = 100<br/>LIMIT RESPECTED!

8. Aha Moments & Pitfalls

Aha Moment #1: Race conditions in distributed rate limiting

When 10 API servers query the same Redis, a read-then-write pattern creates a race condition: two servers both read counter = 99, both allow the request, and the counter ends at 101. The fix is to make check-and-increment atomic: a Lua script, a WATCH/MULTI/EXEC transaction, or simply INCR first and compare the returned value. Plain MULTI/EXEC without WATCH does not help, because you cannot branch on a value read inside the transaction. Never issue GET and then INCR as separate commands.
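
The race and the atomic fix can be sketched in pure Python; FakeRedis below is a hypothetical in-memory stand-in for a Redis client:

```python
class FakeRedis:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key, 0)

    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]

LIMIT = 100

def atomic_check(r, key):
    # Atomic INCR first, then compare the returned value --
    # the same effect a server-side Lua script gives you.
    return r.incr(key) <= LIMIT

r = FakeRedis()
r.store["c"] = 99

# Racy interleaving: both servers read BEFORE either writes.
read_a = r.get("c")           # server 1 sees 99 -> would allow
read_b = r.get("c")           # server 2 sees 99 -> would allow
allow_a = read_a < LIMIT
allow_b = read_b < LIMIT
r.incr("c")                   # server 1 writes
r.incr("c")                   # server 2 writes
print(allow_a, allow_b, r.store["c"])   # True True 101 -> limit violated

r2 = FakeRedis()
r2.store["c"] = 99
print(atomic_check(r2, "c"))  # True  (counter becomes 100)
print(atomic_check(r2, "c"))  # False (counter becomes 101, over the limit)
```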

Aha Moment #2: Clock skew can break the rate limiter

In a distributed system, server clocks can drift apart by a few milliseconds up to several seconds. If the rate limiter uses timestamps from the application servers instead of a single time source, two servers can sit in two different windows at the same real instant. Fix: take the time on the Redis side, e.g. redis.call("TIME") inside the Lua script (works out of the box on Redis 5+ thanks to effect-based script replication; older versions need redis.replicate_commands()), and never trust client timestamps.
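
A tiny pure-Python sketch of the skew problem: two servers observe the same real instant near a window boundary but land in different fixed windows (the 50 ms skew is a made-up number):

```python
import math

WINDOW = 60  # window size in seconds

def window_id(now: float) -> int:
    # Which fixed window a timestamp falls into
    return math.floor(now / WINDOW)

# Two servers observe "the same" instant near a window boundary,
# but server B's clock runs 50 ms fast (hypothetical skew).
t_server_a = 119.98
t_server_b = 119.98 + 0.05

print(window_id(t_server_a))  # 1
print(window_id(t_server_b))  # 2 -> the two counts land in different buckets
```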

Aha Moment #3: Rate limiting internal services needs care

Hieu, this is an extremely common mistake: putting rate limits on everything, including internal service-to-service calls. The result: during a traffic spike, Service A gets rate limited by Service B, creating a cascading failure worse than having no rate limit at all. Rule: rate limit at the edge (API Gateway) for external traffic; between internal services, use a circuit breaker (see Tuan-11-Microservices-Pattern) instead of a rate limiter.

Aha Moment #4: Over-aggressive limits hurt UX

A startup set a 10 req/min limit on its free tier. Result: loading the home page (5 API calls) + clicking one link (3 calls) + scrolling (3 calls) = 11 calls, so users were rate limited after barely two seconds of normal use; they left and never came back. Rule: always test limits by using the app like a normal user and counting the requests; the limit should be at least 3x the normal usage pattern.
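
The arithmetic from the story, as a sanity check you can run against your own endpoints (the call counts are the hypothetical ones above):

```python
# Hypothetical per-minute budget of one "normal" free-tier user.
page_load  = 5   # API calls to render the home page
link_click = 3   # calls triggered by clicking one link
scroll     = 3   # calls triggered by scrolling

normal_per_minute = page_load + link_click + scroll

# Rule of thumb from the text: limit >= 3x the normal usage pattern.
recommended_limit = 3 * normal_per_minute
print(normal_per_minute, recommended_limit)  # 11 33
```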

Pitfall #1: Fixed window boundary burst

With a Fixed Window Counter and a limit of 100/min, an attacker sends 100 requests at second 59 and 100 more at second 60: 200 requests land within about two seconds. This is why Fixed Window should never be used for security-critical endpoints; use a Sliding Window Counter or Token Bucket instead.
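
The boundary burst is easy to reproduce with a small fixed-window simulation (pure Python, no Redis needed):

```python
import math
from collections import defaultdict

LIMIT, WINDOW = 100, 60          # 100 requests per 60-second fixed window
counters = defaultdict(int)      # window id -> request count

def fixed_window_allow(now: float) -> bool:
    w = math.floor(now / WINDOW)
    if counters[w] >= LIMIT:
        return False
    counters[w] += 1
    return True

# 100 requests just before the boundary, 100 just after: all pass,
# because they fall into two different windows.
burst = [59.0 + i * 0.001 for i in range(100)] + \
        [60.0 + i * 0.001 for i in range(100)]
allowed = sum(fixed_window_allow(t) for t in burst)
print(allowed)  # 200 -- double the intended limit in ~1 second of wall time
```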

Pitfall #2: Fail-open vs fail-closed

Fail-open (allow everything when Redis is down) is safe for UX, but DDoS traffic slips through. Fail-closed (reject everything when Redis is down) is safe for security, but then a Redis outage takes the whole system down. Practical answer: fail-open + Redis HA (Sentinel/Cluster) + a fallback local rate limiter (in-memory, less accurate, but better than nothing).
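
A minimal sketch of that fallback: a per-process, in-memory token bucket used only when Redis is unreachable (LocalTokenBucket and redis_check are hypothetical names):

```python
import time

class LocalTokenBucket:
    """Per-process fallback: less accurate than a shared Redis bucket
    (every instance has its own quota) but better than nothing."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Sketch of the fallback path (redis_check is a hypothetical function):
#   try:
#       return redis_check(key)
#   except redis.RedisError:
#       return local_bucket.allow()   # fail-open, with a local safety net

bucket = LocalTokenBucket(rate=10, capacity=5)
print([bucket.allow() for _ in range(7)])  # five True, then False False
```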

Pitfall #3: Forgetting to rate limit webhooks and background jobs

Hieu rate limited only the HTTP API, forgetting that webhooks (Stripe, GitHub) and background workers generate load too. A webhook retry storm can look exactly like a DDoS. Fix: rate limit every entry point into the system, not just the user-facing API.

Pitfall #4: No rate limit on account sign-up

An attacker creates 100,000 accounts; each account then gets its own quota, bypassing the limiter entirely. Fix: rate limit account creation by IP (e.g. 5 accounts/IP/day) plus CAPTCHA and email verification.
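
A sketch of the per-IP sign-up cap (the 5/day threshold follows the text; allow_signup is a hypothetical helper):

```python
from collections import defaultdict

MAX_SIGNUPS_PER_IP_PER_DAY = 5
signups = defaultdict(int)   # (ip, day) -> sign-up count

def allow_signup(ip: str, day: str) -> bool:
    key = (ip, day)
    if signups[key] >= MAX_SIGNUPS_PER_IP_PER_DAY:
        return False   # also the point to trigger CAPTCHA / manual review
    signups[key] += 1
    return True

results = [allow_signup("203.0.113.7", "2024-01-01") for _ in range(6)]
print(results)  # five True, then False
```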



Tham khao

  • Alex Xu, System Design Interview — Chapter 4: Design a Rate Limiter
  • sdi.anhvy.dev — Rate Limiter patterns & algorithms
  • Cloudflare Blog: How we built rate limiting capable of scaling to millions of domains
  • Stripe Engineering: Rate limiters and load shedders
  • Kong Documentation: Rate Limiting Plugin

Previous week: Tuan-08-Message-Queue (message queues as a buffer for traffic spikes)
Next week: Tuan-10-Consistent-Hashing (distributing data evenly across nodes)