Week 09: Rate Limiter

"A system without a Rate Limiter is like a stadium without ticket gates — everyone gets in, and the result is a stampede."

Tags: system-design rate-limiter security devops alex-xu Prerequisites: Tuan-02-Back-of-the-envelope · Tuan-05-Load-Balancer · Tuan-06-Cache-Strategy Related: Tuan-14-AuthN-AuthZ-Security · Tuan-15-Data-Security-Encryption · Tuan-13-Monitoring-Observability · Tuan-11-Microservices-Pattern


1. Context & Why

An everyday analogy

Hieu, imagine going to a concert at a 50,000-seat stadium. At the ticket gate, staff let only so many people through per minute — say 200 people/minute. If 10,000 people surge in at once with no control:

  • The gates get overwhelmed (server overload)
  • People trample each other (cascading failure)
  • Even VIP ticket holders get stuck (legitimate users affected)
  • Gate-crashers slip through easily (malicious traffic gets through)

The Rate Limiter is that ticket gate. It caps the number of requests a client can send to the server within a given time window. Exceed the limit → the server returns HTTP 429 "Too Many Requests" and asks the client to wait.

Why do we need a Rate Limiter?

| Problem | Without a Rate Limiter | With a Rate Limiter |
|---|---|---|
| DDoS attack | Server goes down | Only the attacker is blocked; normal users keep access |
| Brute-force login | Attacker tries millions of passwords | Blocked after 5-10 failed attempts |
| API abuse | One client sends 1M req/s and starves everyone else | Every client gets a fair quota |
| Accidental loop | A client bug sends requests forever | Gets rate limited; the server stays safe |
| Cost control | Cloud bill explodes from runaway requests | Spend stays under control |

Why does Alex Xu put it in Chapter 4?

Because the Rate Limiter is a basic component that every production system needs. Before designing any system (URL Shortener, Chat, News Feed), you must know how to protect it from abuse. It sits between the Load Balancer (Week 05) and the application logic — the first line of defense.


2. Deep Dive — Rate Limiting Algorithms

2.1 Token Bucket Algorithm

Principle: Picture a bucket holding tokens. Each request needs 1 token to be processed. Tokens are poured into the bucket at a steady pace (the refill rate). When the bucket runs out of tokens → the request is rejected.

Parameters:

  • Bucket size: the maximum number of tokens — this is what allows bursts
  • Refill rate: how many tokens are added per second

Example: Bucket size = 10, refill rate = 2 tokens/s

  • T=0: The bucket holds 10 tokens. The client fires 10 requests back to back → all are processed, bucket = 0
  • T=1: The bucket is refilled with 2 tokens. The client sends 3 requests → 2 are processed, 1 is rejected
  • T=5: The bucket has refilled to 10 tokens (capped at the bucket size). The client can burst again

Pros:

  • Allows burst traffic (useful for real-world usage patterns)
  • Memory efficient: only last_refill_time and tokens per user
  • Amazon and Stripe use Token Bucket

Cons:

  • Two parameters to tune (bucket size + refill rate) — not intuitive
  • Bursts can overload the backend if you are not careful
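The refill-and-spend mechanics above fit in a few lines of Python. This is a minimal single-process sketch (class and parameter names are illustrative, not from the book):

```python
import time


class TokenBucket:
    """Minimal token-bucket sketch: capacity caps the burst,
    refill_rate adds tokens at a fixed pace."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)       # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time, capped at the bucket size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Replaying the example above: capacity=10, refill=2 tokens/s
bucket = TokenBucket(capacity=10, refill_rate=2)
t0 = time.monotonic()
burst = [bucket.allow(t0) for _ in range(10)]     # all 10 pass, bucket empties
at_t1 = [bucket.allow(t0 + 1) for _ in range(3)]  # 2 refilled -> 2 pass, 1 rejected
```

Passing `now` explicitly makes the behaviour deterministic for tests; in production you would let the clock run.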

2.2 Leaky Bucket Algorithm

Principle: Picture a bucket with a hole in the bottom. Requests pour in from the top and flow out (get processed) at a fixed rate through the hole. If the bucket is full → new requests spill over (rejected).

Parameters:

  • Bucket size: the maximum number of requests in the queue
  • Outflow rate: how many requests are processed per second

Example: Bucket size = 10, outflow rate = 2 req/s

  • The client sends 10 requests at once → all enter the queue
  • The system processes a steady 2 req/s, taking 5s to drain them all
  • An 11th request is rejected because the bucket is full

Pros:

  • Fixed, stable output rate — great for downstream services
  • Shopify uses Leaky Bucket for API rate limiting

Cons:

  • No bursts — every request has to wait in the queue
  • Old requests can hog the queue and get new requests rejected
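The queue-and-drain behaviour can be sketched the same way (a minimal in-process version; names are illustrative):

```python
from collections import deque


class LeakyBucket:
    """Minimal leaky-bucket sketch: requests queue up and leak out
    at a fixed outflow_rate; a full queue rejects new requests."""

    def __init__(self, capacity: int, outflow_rate: float):
        self.capacity = capacity
        self.outflow_rate = outflow_rate    # requests processed per second
        self.queue = deque()
        self.last_leak = 0.0

    def _leak(self, now: float) -> None:
        # Drain the whole requests that have leaked out since the last check
        leaked = int((now - self.last_leak) * self.outflow_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def offer(self, request, now: float) -> bool:
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False                    # bucket full -> spill over
        self.queue.append(request)
        return True


# Replaying the example: capacity=10, outflow=2 req/s
bucket = LeakyBucket(capacity=10, outflow_rate=2)
accepted = [bucket.offer(i, now=0.0) for i in range(11)]  # 10 queued, 11th rejected
after_5s = bucket.offer("late", now=5.0)                  # queue fully drained by t=5
```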

2.3 Fixed Window Counter

Principle: Divide time into fixed windows (e.g., one per minute). Count the requests in each window. Over the limit → reject.

Example: Limit = 100 req/min

  • 14:00:00 - 14:00:59: count requests. Once the count hits 100 → reject the rest
  • 14:01:00: the counter resets to 0 and counting starts over

Pros:

  • The simplest to implement
  • Tiny memory footprint: just 1 counter per window per user

Cons (IMPORTANT):

  • Boundary burst problem: if a client sends 100 requests in second 59 of minute 1 and 100 more in second 0 of minute 2 → 200 requests within 2 seconds, double the limit!
  • This is why Fixed Window is not used for security-critical systems
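The boundary burst is easy to reproduce with a minimal counter (illustrative sketch):

```python
class FixedWindowCounter:
    """Minimal fixed-window sketch; shows the boundary-burst weakness."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}   # window index -> request count

    def allow(self, now: float) -> bool:
        w = int(now // self.window)
        if self.counts.get(w, 0) >= self.limit:
            return False
        self.counts[w] = self.counts.get(w, 0) + 1
        return True


limiter = FixedWindowCounter(limit=100, window_seconds=60)
# 100 requests at second 59 of minute 1, 100 more at second 0 of minute 2:
first = sum(limiter.allow(59.0) for _ in range(100))
second = sum(limiter.allow(60.0) for _ in range(100))
# 200 requests accepted within ~1 second -- double the configured limit.
```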

2.4 Sliding Window Log

Principle: Store the timestamp of every request in a sorted set. When a new request arrives, evict the timestamps older than the window size and count what remains. Over the limit → reject.

Example: Limit = 100 req/min

  • A new request arrives at 14:01:30
  • Evict all timestamps before 14:00:30
  • Count the timestamps remaining in [14:00:30, 14:01:30]
  • If >= 100 → reject

Pros:

  • Perfectly accurate — no boundary burst problem
  • The rate limit is enforced exactly at every instant

Cons:

  • Memory hungry: every timestamp is stored. With 1M users at 1,000 req/window each → 1B timestamps
  • The per-user Redis ZSET can grow very large
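A minimal in-process version of the log, with a sorted Python list standing in for the Redis ZSET (illustrative sketch):

```python
from bisect import bisect_left, insort


class SlidingWindowLog:
    """Minimal sliding-window-log sketch: keep every timestamp, drop the
    ones older than the window, count what remains. Exact but memory-hungry."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = []   # sorted request timestamps (a Redis ZSET in production)

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the window
        cutoff = bisect_left(self.log, now - self.window)
        del self.log[:cutoff]
        if len(self.log) >= self.limit:
            return False
        insort(self.log, now)
        return True


limiter = SlidingWindowLog(limit=100, window_seconds=60.0)
in_window = sum(limiter.allow(t * 0.5) for t in range(100))  # 100 reqs in 50s
rejected = limiter.allow(55.0)       # the 101st request inside the window
allowed_later = limiter.allow(90.5)  # the earliest timestamps have expired by now
```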

2.5 Sliding Window Counter

Principle: A hybrid of Fixed Window Counter and Sliding Window Log. Use the counters of the current and previous windows and compute a weighted estimate based on how far into the current window we are.

Formula:

weighted_count = previous_window_count × (1 − elapsed/window) + current_window_count

Example: Limit = 100 req/min, window = 1 minute

  • Previous window (14:00 - 14:01): 84 requests
  • Current window (14:01 - 14:02): 36 requests
  • Current time: 14:01:15 (25% into the current window)

weighted_count = 84 × (1 − 0.25) + 36 = 63 + 36 = 99. The next request brings it to 100 — exactly the limit — and the one after that is rejected.
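The weighted count for this example can be checked directly:

```python
def sliding_window_count(prev_count, curr_count, elapsed_fraction):
    """Weighted estimate of requests over the sliding window."""
    return prev_count * (1 - elapsed_fraction) + curr_count


# Previous window: 84 requests; current window: 36; 25% into the window:
estimate = sliding_window_count(84, 36, 0.25)   # 84 * 0.75 + 36 = 99.0
```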

Pros:

  • As memory efficient as Fixed Window (only 2 counters per user)
  • Near-perfect accuracy — smooths out the boundary problem
  • Cloudflare uses Sliding Window Counter

Cons:

  • It is an approximation, not 100% exact
  • In practice the error is < 1% — acceptable

2.6 Algorithm comparison

| Algorithm | Memory | Accuracy | Burst support | Complexity | Use when |
|---|---|---|---|---|---|
| Token Bucket | Low (2 vars/user) | High | Yes (configurable) | Low | General-purpose APIs; AWS, Stripe |
| Leaky Bucket | Low (queue + pointer) | High | No | Low | Stable output needed; Shopify |
| Fixed Window Counter | Very low (1 counter/user) | Low (boundary burst) | Yes (at boundaries) | Very low | Internal, non-critical services |
| Sliding Window Log | Very high (all timestamps) | Exact | No | High | Security-critical, brute-force prevention |
| Sliding Window Counter | Low (2 counters/user) | Near-exact (~99%) | Partial | Medium | Production APIs; Cloudflare |

Practical takeaway: most production systems use Token Bucket or Sliding Window Counter. Fixed Window is only for internal/non-critical use. Sliding Window Log is reserved for cases that need exact accuracy (e.g., financial transactions).

2.7 Distributed Rate Limiting (Redis-based)

In a distributed system with multiple API server instances, the rate limiter needs shared state. Redis is the most popular choice because of:

  • Atomic operations: INCR, EXPIRE, Lua scripting
  • In-memory storage: latency < 1ms
  • Single-threaded execution: no race conditions within a single command

Architecture:

Client → Load Balancer → API Server 1 ─┐
                       → API Server 2 ─┤→ Redis Cluster (shared counters)
                       → API Server 3 ─┘

Problems with distributed rate limiting:

  1. Race condition: two servers both read counter = 99 (limit = 100), both increment to 100 and allow the request → 101 requests in reality. Fix: use a Lua script for an atomic read-check-increment.
  2. Redis failure: if Redis goes down, rate limiting stops working. Fix: fail open (allow everything) or fail closed (reject everything), depending on policy. Use Redis Sentinel/Cluster for HA.
  3. Latency overhead: every request must query Redis. Fix: a local cache with periodic sync, or Redis co-located with the API servers.

2.8 Rate limiting at different layers

| Layer | Location | Tooling | Characteristics |
|---|---|---|---|
| Client-side | Browser/mobile app | Throttle libraries, debounce | Easily bypassed; UX protection only |
| CDN/Edge | Cloudflare, AWS CloudFront | Built-in rate limiting | Stops DDoS before it reaches the origin |
| API Gateway | Kong, AWS API Gateway, Nginx | Rate limiting plugins/modules | Centralized, easy to manage |
| Application | Code in the service | Redis-based custom logic | Flexible, business-logic aware |
| Database | Connection pool, query limiter | pgbouncer, MySQL proxy | Protects the DB from overload |

Best practice: rate limit at multiple layers (defense in depth). The CDN stops DDoS, the API Gateway stops per-user abuse, and the application stops business-logic violations.

2.9 HTTP 429 + Retry-After Header

When the rate limit is triggered, the server responds with:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1678886400
 
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "retry_after": 30
  }
}

The important headers:

  • Retry-After: how many seconds the client should wait before retrying
  • X-RateLimit-Limit: total requests allowed in the window
  • X-RateLimit-Remaining: requests remaining
  • X-RateLimit-Reset: Unix timestamp when the window resets

Aha moment: a well-behaved client reads Retry-After and implements exponential backoff. A badly behaved client retries immediately → those clients need a harsher rate limiter (progressive penalties).
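On the client side, honouring Retry-After plus jittered exponential backoff might look like this (function name and defaults are illustrative, not a standard API):

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).
    Honours the server's Retry-After when present; otherwise uses
    exponential backoff with full jitter."""
    if retry_after is not None:
        return float(retry_after)   # the server told us exactly how long
    return random.uniform(0, min(cap, base * 2 ** attempt))


# With Retry-After: 30 the client waits exactly 30s;
# without it, attempt 5 waits somewhere in [0, 32) seconds.
wait_exact = backoff_delay(0, retry_after=30)
wait_jittered = backoff_delay(5)
```

Full jitter spreads retries out so that rate-limited clients do not all come back at the same instant.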

2.10 Rate limiting by IP / User / API Key

| Method | When to use | Limitations |
|---|---|---|
| By IP | Anonymous traffic, DDoS mitigation | Shared IPs (NAT, VPN, corporate proxies) → many users affected |
| By User ID | Authenticated APIs | Requires authn before rate limiting; cannot stop pre-auth attacks |
| By API Key | Public APIs, 3rd-party integrations | Keys can be shared or leaked |
| Compound | IP + user + endpoint | Most precise, but more complex |

In practice: a compound key works best. Example key: {user_id}:{endpoint}:{minute}. That way user A calling /api/search 100 times/minute does not eat into user A's quota for /api/profile.
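A compound key along those lines might be built like this (the exact key format is illustrative):

```python
def rate_limit_key(user_id, endpoint, now_epoch, window_seconds=60):
    """Build a compound rate-limit key scoped per user, per endpoint,
    per time window."""
    window = int(now_epoch // window_seconds)
    return f"rl:{user_id}:{endpoint}:{window}"


k1 = rate_limit_key("userA", "/api/search", now_epoch=1678886400)
k2 = rate_limit_key("userA", "/api/profile", now_epoch=1678886400)
# Different endpoints -> different counters: the /api/search quota
# does not eat into the /api/profile quota.
```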

2.11 Tiered rate limiting (Free vs Premium)

| Tier | Rate limit | Burst | Features |
|---|---|---|---|
| Free | 60 req/min | 10 req burst | Basic endpoints only |
| Basic ($29/mo) | 600 req/min | 50 req burst | All endpoints |
| Pro ($99/mo) | 3,000 req/min | 200 req burst | All endpoints + priority queue |
| Enterprise (custom) | 30,000 req/min | 1,000 req burst | Dedicated pool, custom limits |

Implementation: each API key carries a tier field. The rate limiter looks up the tier → applies the matching config. Using a Redis hash:

HSET ratelimit:config:free max_requests 60 window_seconds 60 burst 10
HSET ratelimit:config:pro  max_requests 3000 window_seconds 60 burst 200

2.12 API Gateway Rate Limiting

Kong Gateway

# kong.yml - Rate Limiting Plugin
plugins:
  - name: rate-limiting
    config:
      second: 10        # 10 req/s
      minute: 100       # 100 req/min
      hour: 5000        # 5000 req/h
      policy: redis      # use Redis for distributed rate limiting
      redis_host: redis-cluster.internal
      redis_port: 6379
      redis_timeout: 2000
      fault_tolerant: true  # fail open if Redis is down
      hide_client_headers: false  # return X-RateLimit-* headers
      limit_by: consumer  # rate limit per consumer (user)

AWS API Gateway

{
  "usagePlan": {
    "name": "ProPlan",
    "description": "Rate limit for Pro tier",
    "throttle": {
      "rateLimit": 50,
      "burstLimit": 200
    },
    "quota": {
      "limit": 100000,
      "period": "MONTH"
    }
  }
}

Note: API gateway rate limiting is very convenient but less flexible than a custom implementation. When you need rate limits tied to business logic (e.g., orders per day), you still need an application-level rate limiter.


3. Estimation — Sizing the Rate Limiter

Reference: Tuan-02-Back-of-the-envelope, sdi.anhvy.dev — Rate Limiter

3.1 Deriving rate limit thresholds from QPS estimation

Assumptions: the API has 10M DAU, each user averaging 100 requests/day.

Per-user rate limit: 100 requests/day averages out to well under 1 req/min, but usage is bursty, so apply a safety factor; a limit of about 15 req/min per user is consistent with the worst-case figure below.

Explanation: safety_factor = 10 because a user may concentrate their usage into a few minutes (it is not spread evenly across the day). burst_allowance = 20 permits a burst on page load.

Sanity check: if all 10M users hit their rate limit at the same time, that is 10M × 15 req/min ÷ 60 ≈ 2.5M QPS.

The system must handle 2.5M QPS in the worst case → horizontal scaling + CDN protection required.
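The back-of-the-envelope numbers above can be checked in a few lines (the 15 req/min per-user limit is an assumed figure chosen to be consistent with the 2.5M QPS worst case):

```python
# Back-of-the-envelope check of the estimation above.
dau = 10_000_000
requests_per_user_per_day = 100

avg_qps = dau * requests_per_user_per_day / 86_400   # average load
per_user_limit_per_min = 15                          # assumed per-user limit
worst_case_qps = dau * per_user_limit_per_min / 60   # everyone at their limit
```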

3.2 Redis memory for Sliding Window Counter

Assumptions: 10M users, each needing 2 counters (current + previous window) plus metadata — call it ~100 bytes per user including key overhead.

Estimate: 10M users × ~100 B ≈ 1 GB.

Note: only ~1 GB for 10M users! A single 16 GB Redis node handles this easily. With Sliding Window Log (one entry per timestamp) the footprint is roughly 3× larger, on the order of 3 GB at this load.

Still acceptable, but 3× the cost of Sliding Window Counter.

3.3 Rate limiter latency overhead

Redis round-trip time (same AZ): ~0.5 ms

With an average API latency of 50 ms: 0.5 / 50 = 1% overhead.

Acceptable. However, if Redis sits in a different AZ (cross-AZ latency ~1-2 ms), the overhead climbs to roughly 2-4%.

Mitigation: co-locate Redis with the API servers in the same AZ, or use a local cache with periodic sync (more throughput, less accuracy).

With a Lua script (atomic operation):

The Lua script runs server-side on Redis, so it adds no extra network round trips, though its execution time can exceed that of a single command.


4. Security — Protecting the system with a Rate Limiter

4.1 DDoS mitigation

DDoS attack vectors and the rate limiting response:

| Attack type | Characteristics | Rate limiting strategy |
|---|---|---|
| Volumetric (UDP flood, DNS amplification) | Millions of req/s from a botnet | Layer 3/4: CDN + ISP-level filtering (Cloudflare, AWS Shield) |
| Protocol (SYN flood, Ping of Death) | Exploits the network protocol | Layer 4: connection rate limiting, SYN cookies |
| Application (HTTP flood, Slowloris) | Legitimate-looking requests | Layer 7: per-IP + per-endpoint rate limiting |

Multi-layer DDoS defense:

  1. CDN/Edge (Cloudflare): challenge suspicious IPs, block known bad actors
  2. Load Balancer: connection rate limiting, geographic blocking
  3. API Gateway: per-IP rate limiting (100 req/min/IP)
  4. Application: per-user rate limiting + anomaly detection

Important: a rate limiter cannot fight DDoS on its own. It must be combined with a CDN, a WAF (Web Application Firewall), and ISP-level mitigation. The rate limiter is the last line of defense, not the first.

4.2 Brute force prevention

Example: protecting the login endpoint from password brute force.

A tiered strategy:

| Failed attempts | Response |
|---|---|
| 1-3 | Allow normally, return "Invalid credentials" |
| 4-5 | Add a CAPTCHA |
| 6-10 | Rate limit: 1 attempt/30s + CAPTCHA |
| 11-20 | Rate limit: 1 attempt/5min |
| >20 | Lock the account for 30 minutes, notify the user by email |

Rate limit key: login_attempt:{ip}:{username} — combining IP and username guards against:

  • An attacker using 1 IP to brute-force many accounts
  • An attacker using many IPs to brute-force 1 account
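The escalation table above can be sketched as a simple lookup (thresholds come from the table; the return shape is illustrative):

```python
def login_penalty(failed_attempts: int) -> dict:
    """Progressive response to failed logins, mirroring the tiered table."""
    if failed_attempts <= 3:
        return {"action": "allow"}
    if failed_attempts <= 5:
        return {"action": "allow", "captcha": True}
    if failed_attempts <= 10:
        return {"action": "rate_limit", "min_interval_s": 30, "captcha": True}
    if failed_attempts <= 20:
        return {"action": "rate_limit", "min_interval_s": 300}
    return {"action": "lock_account", "lock_minutes": 30, "notify_user": True}
```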

4.3 Credential stuffing protection

Credential stuffing = an attacker replays username/password lists leaked from other websites to attempt logins en masse.

Telltale signs:

  • Login attempts from many different IPs but with identical patterns
  • An abnormal failure rate (>95% failed logins)
  • Identical user agents/fingerprints

Rate limiting for credential stuffing:

If normal login traffic is 500 req/s, set a global limit of 750 req/s. When it is exceeded → trigger enhanced security:

  • Require CAPTCHA for all logins
  • Increase the delay between login attempts
  • Alert the security team

4.4 API abuse detection

Common abuse patterns:

| Pattern | Description | Detection |
|---|---|---|
| Scraping | Nonstop requests to listing/search endpoints | High request rate on specific endpoints |
| Enumeration | Probing sequential IDs (/users/1, /users/2, …) | Sequential access pattern |
| Data exfiltration | Downloading large volumes of data | Abnormal per-user bandwidth |
| Price manipulation | Sending thousands of order requests | Order rate >> normal |

Mitigation: rate limiting combined with anomaly detection. Do not just count requests — analyze the pattern:

  • Ratio between endpoints (a normal user: 80% read, 20% write; an attacker: 95% reads of one specific endpoint)
  • Request timing (humans have jitter; bots fire at a steady cadence)
  • Response consumption (a normal user reads the response; a bot ignores it)

4.5 Rate limiting bypass techniques and prevention

| Bypass technique | Description | Prevention |
|---|---|---|
| IP rotation | Proxy pools / VPNs to change IP constantly | Rate limit by fingerprint (TLS, headers) + behavior analysis |
| Distributed attack | Botnet spread across many IPs | Global rate limiting + CAPTCHA + anomaly detection |
| Slowloris | Extremely slow requests to hold connections open | Connection timeouts + concurrent connection limits |
| Header spoofing | Forging X-Forwarded-For to change the apparent IP | Trust only internal proxies; take X-Real-IP from a trusted proxy |
| API key sharing | Many attackers sharing one API key | Monitor per-key usage patterns; revoke if suspicious |
| Account rotation | Creating many accounts to split the rate limit | Rate limit per IP and per account; limit account creation |

Aha moment: no rate limiter is perfect. Attackers will always look for bypasses. The goal is to raise the cost of an attack until it is no longer economical — not to block 100%.


5. DevOps — Deploying and operating the Rate Limiter

5.1 Redis-based rate limiter deployment

# docker-compose.yml - Rate Limiter Infrastructure
version: '3.8'
 
services:
  redis-ratelimit:
    image: redis:7-alpine
    command: >
      redis-server
      --maxmemory 2gb
      --maxmemory-policy allkeys-lru
      --appendonly no
      --save ""
      --tcp-backlog 511
      --timeout 0
      --tcp-keepalive 300
    ports:
      - "6380:6379"
    deploy:
      resources:
        limits:
          memory: 2.5g
          cpus: '2'
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - ratelimit-net
 
  redis-sentinel-1:
    image: redis:7-alpine
    command: redis-sentinel /etc/redis/sentinel.conf
    volumes:
      - ./sentinel.conf:/etc/redis/sentinel.conf
    depends_on:
      - redis-ratelimit
    networks:
      - ratelimit-net
 
  # API server with rate limiting
  api-server:
    build: .
    environment:
      - REDIS_RATELIMIT_HOST=redis-ratelimit
      - REDIS_RATELIMIT_PORT=6379
      - RATE_LIMIT_DEFAULT=100/min
      - RATE_LIMIT_AUTH=10/min
    depends_on:
      redis-ratelimit:
        condition: service_healthy
    networks:
      - ratelimit-net
 
networks:
  ratelimit-net:
    driver: bridge
# sentinel.conf - Redis Sentinel for HA
sentinel monitor ratelimit-master redis-ratelimit 6379 2
sentinel down-after-milliseconds ratelimit-master 5000
sentinel failover-timeout ratelimit-master 10000
sentinel parallel-syncs ratelimit-master 1

5.2 Monitoring Rate Limit Hits — Prometheus + Grafana

# prometheus-alerts.yml
groups:
  - name: rate_limiting
    rules:
      # Alert when the rate-limit hit rate spikes
      - alert: RateLimitHitSpike
        expr: >
          rate(rate_limit_hits_total[5m]) >
          rate(rate_limit_hits_total[1h] offset 1d) * 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Rate limit hits are 3x higher than the same hour yesterday"
          description: "Current: {{ $value }}/s. Possible DDoS or API abuse."
 
      # Alert when too many users are being rate limited
      - alert: TooManyUsersRateLimited
        expr: >
          rate(rate_limit_hits_total{result="rejected"}[5m]) /
          rate(http_requests_total[5m]) > 0.1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: ">10% of requests are rate limited — the limit may be too strict"
 
      # Alert when rate limiter Redis latency is high
      - alert: RateLimiterLatencyHigh
        expr: >
          histogram_quantile(0.99,
            rate(rate_limiter_duration_seconds_bucket[5m])
          ) > 0.005
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Rate limiter P99 latency > 5ms"
 
      # Alert when the rate limiter Redis instance is down
      - alert: RateLimiterRedisDown
        expr: redis_up{instance=~".*ratelimit.*"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Rate limiter Redis instance down!"

Grafana dashboard panels:

| Panel | PromQL | Purpose |
|---|---|---|
| Rate Limit Hits/s | rate(rate_limit_hits_total[1m]) | Rate-limiting trend |
| Rejection Rate (%) | rate(rate_limit_hits_total{result="rejected"}[5m]) / rate(http_requests_total[5m]) * 100 | % of requests rejected |
| Top Rate Limited Users | topk(10, sum by (user_id)(rate(rate_limit_hits_total{result="rejected"}[1h]))) | Find abusive users |
| Top Rate Limited Endpoints | topk(10, sum by (endpoint)(rate(rate_limit_hits_total{result="rejected"}[1h]))) | Find endpoints under attack |
| Redis Latency P99 | histogram_quantile(0.99, rate(rate_limiter_duration_seconds_bucket[5m])) | Rate limiter performance |
| Redis Memory Usage | redis_memory_used_bytes{instance=~".*ratelimit.*"} | Capacity planning |

5.3 Alerting on spike patterns

Spike detection strategies:

  1. Absolute threshold: rate > 10,000/s → alert
  2. Relative to baseline: rate > 3x the average of the same hour yesterday
  3. Rate of change: deriv(rate_limit_hits_total[5m]) > 100 (abnormally fast growth)
  4. Anomaly detection: Grafana ML or an external system (Datadog Anomaly Monitors)

# Combining the strategies
- alert: SuspiciousTrafficPattern
  expr: >
    (
      rate(http_requests_total{path="/api/login"}[5m]) > 50
      and
      rate(http_requests_total{path="/api/login", status="401"}[5m]) /
      rate(http_requests_total{path="/api/login"}[5m]) > 0.9
    )
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Possible brute force attack: >90% login failures at >50 req/s"

5.4 Nginx Rate Limiting Module

# /etc/nginx/conf.d/rate-limiting.conf
 
# --- Zone definitions ---
# Shared memory zone for rate limiting
# 10m = 10MB shared memory ≈ 160,000 IP addresses
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;
limit_req_zone $binary_remote_addr zone=api:10m rate=30r/s;
 
# Rate limit by API key (extracted from header)
map $http_x_api_key $api_key_zone {
    default "anonymous";
    ~^.+$   $http_x_api_key;
}
limit_req_zone $api_key_zone zone=api_key:10m rate=100r/min;
 
# --- Connection limiting ---
limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;
 
server {
    listen 80;
    server_name api.example.com;
 
    # Global connection limit
    limit_conn conn_per_ip 20;
 
    # Custom error page for 429
    error_page 429 = @rate_limited;
    location @rate_limited {
        default_type application/json;
        return 429 '{"error":"rate_limited","retry_after":30}';
    }
 
    # --- General API ---
    location /api/ {
        limit_req zone=general burst=20 nodelay;
        limit_req zone=api_key burst=50 nodelay;
        limit_req_status 429;
        limit_req_log_level warn;
 
        # Pass rate limit headers back to the client
        add_header X-RateLimit-Limit 10;
        add_header X-RateLimit-Burst 20;
 
        proxy_pass http://backend;
    }
 
    # --- Login endpoint (strict) ---
    location /api/auth/login {
        limit_req zone=login burst=3 nodelay;
        limit_req_status 429;
        limit_req_log_level error;
 
        proxy_pass http://backend;
    }
 
    # --- Search endpoint (moderate) ---
    location /api/search {
        limit_req zone=api burst=10 delay=5;
        # delay=5: the first 5 requests are processed immediately,
        # the next 5 are delayed (queued),
        # anything beyond that is rejected
        limit_req_status 429;
 
        proxy_pass http://backend;
    }
}

How burst and nodelay work:

  • burst=20: allow 20 requests over the rate before rejecting
  • nodelay: process burst requests immediately (no queueing)
  • delay=5: the first 5 requests are processed immediately; the rest are queued

6. Code — Production-grade Rate Limiter

6.1 Python: Sliding Window Rate Limiter with Redis

"""
Production-grade Sliding Window Counter rate limiter.
Uses Redis + a Lua script for atomic operations.
"""
 
import time
import hashlib
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Optional
 
import redis
 
logger = logging.getLogger(__name__)
 
 
class RateLimitTier(Enum):
    FREE = "free"
    BASIC = "basic"
    PRO = "pro"
    ENTERPRISE = "enterprise"
 
 
@dataclass(frozen=True)
class RateLimitConfig:
    max_requests: int       # Maximum requests per window
    window_seconds: int     # Window size in seconds
    burst_size: int         # Allowed burst (Token Bucket component)
 
    @classmethod
    def for_tier(cls, tier: RateLimitTier) -> "RateLimitConfig":
        configs = {
            RateLimitTier.FREE:       cls(max_requests=60,    window_seconds=60, burst_size=10),
            RateLimitTier.BASIC:      cls(max_requests=600,   window_seconds=60, burst_size=50),
            RateLimitTier.PRO:        cls(max_requests=3000,  window_seconds=60, burst_size=200),
            RateLimitTier.ENTERPRISE: cls(max_requests=30000, window_seconds=60, burst_size=1000),
        }
        return configs[tier]
 
 
@dataclass
class RateLimitResult:
    allowed: bool
    limit: int
    remaining: int
    reset_at: float          # Unix timestamp when the window resets
    retry_after: Optional[int]  # Seconds to wait (None if allowed)
 
    @property
    def headers(self) -> dict[str, str]:
        """Return HTTP headers per the RFC draft."""
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, self.remaining)),
            "X-RateLimit-Reset": str(int(self.reset_at)),
        }
        if not self.allowed and self.retry_after:
            headers["Retry-After"] = str(self.retry_after)
        return headers
 
 
# Lua script: atomic sliding window counter.
# Runs entirely on the Redis server — no race conditions.
SLIDING_WINDOW_LUA = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
 
local current_window = math.floor(now / window)
local previous_window = current_window - 1
local elapsed = now - (current_window * window)
 
local current_key = key .. ":" .. current_window
local previous_key = key .. ":" .. previous_window
 
local previous_count = tonumber(redis.call("GET", previous_key) or "0")
local current_count = tonumber(redis.call("GET", current_key) or "0")
 
-- Sliding window counter formula
local weighted_count = previous_count * (1 - elapsed / window) + current_count
 
if weighted_count >= limit then
    -- Rate limited
    local ttl = window - elapsed
    return {0, limit, math.floor(limit - weighted_count), math.ceil(now + ttl), math.ceil(ttl)}
end
 
-- Allowed: increment current window counter
redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window * 2)  -- TTL = 2 windows, to keep the previous one
redis.call("EXPIRE", previous_key, window * 2)
 
local new_count = weighted_count + 1
local remaining = math.floor(limit - new_count)
local reset_at = (current_window + 1) * window
 
return {1, limit, remaining, math.ceil(reset_at), 0}
"""
 
 
class SlidingWindowRateLimiter:
    """
    Production-grade sliding window counter rate limiter.
 
    Features:
    - Atomic Redis operations via Lua script (no race conditions)
    - Tiered rate limiting (Free/Basic/Pro/Enterprise)
    - Compound key support (IP + user + endpoint)
    - Graceful degradation when Redis is down (fail-open)
    - Metrics emission for Prometheus
    """
 
    def __init__(
        self,
        redis_client: redis.Redis,
        fail_open: bool = True,
        metrics_callback=None,
    ):
        self._redis = redis_client
        self._fail_open = fail_open
        self._metrics = metrics_callback
        self._lua_sha: Optional[str] = None
 
    def _ensure_script(self) -> str:
        """Load the Lua script into Redis (cached)."""
        if self._lua_sha is None:
            self._lua_sha = self._redis.script_load(SLIDING_WINDOW_LUA)
        return self._lua_sha
 
    def _build_key(
        self,
        identifier: str,
        endpoint: Optional[str] = None,
    ) -> str:
        """Build the Redis key from identifier + endpoint."""
        parts = ["rl", identifier]
        if endpoint:
            # Hash the endpoint so the key stays short
            ep_hash = hashlib.md5(endpoint.encode()).hexdigest()[:8]
            parts.append(ep_hash)
        return ":".join(parts)
 
    def check(
        self,
        identifier: str,
        config: RateLimitConfig,
        endpoint: Optional[str] = None,
    ) -> RateLimitResult:
        """
        Check and record one request.
 
        Args:
            identifier: User ID, API key, hoac IP address
            config: Rate limit configuration
            endpoint: Optional endpoint path cho per-endpoint limiting
 
        Returns:
            RateLimitResult with allowed status and headers
        """
        key = self._build_key(identifier, endpoint)
 
        try:
            sha = self._ensure_script()
            now = time.time()
 
            result = self._redis.evalsha(
                sha,
                1,  # number of keys
                key,
                now,
                config.window_seconds,
                config.max_requests,
            )
 
            allowed, limit, remaining, reset_at, retry_after = result
 
            rate_result = RateLimitResult(
                allowed=bool(allowed),
                limit=limit,
                remaining=remaining,
                reset_at=reset_at,
                retry_after=retry_after if retry_after > 0 else None,
            )
 
            # Emit metrics
            if self._metrics:
                self._metrics(
                    identifier=identifier,
                    endpoint=endpoint or "global",
                    allowed=rate_result.allowed,
                )
 
            return rate_result
 
        except redis.ConnectionError:
            logger.error("Redis connection failed for rate limiting")
            if self._fail_open:
                # Fail open: allow the request when Redis is down
                return RateLimitResult(
                    allowed=True,
                    limit=config.max_requests,
                    remaining=config.max_requests,
                    reset_at=time.time() + config.window_seconds,
                    retry_after=None,
                )
            else:
                # Fail closed: reject everything when Redis is down
                return RateLimitResult(
                    allowed=False,
                    limit=config.max_requests,
                    remaining=0,
                    reset_at=time.time() + 60,
                    retry_after=60,
                )
 
        except redis.RedisError as e:
            logger.error(f"Redis error in rate limiter: {e}")
            if self._fail_open:
                return RateLimitResult(
                    allowed=True,
                    limit=config.max_requests,
                    remaining=config.max_requests,
                    reset_at=time.time() + config.window_seconds,
                    retry_after=None,
                )
            raise
 
 
# === Usage ===
if __name__ == "__main__":
    r = redis.Redis(host="localhost", port=6379, decode_responses=False)
    limiter = SlidingWindowRateLimiter(redis_client=r, fail_open=True)
 
    config = RateLimitConfig.for_tier(RateLimitTier.FREE)
 
    for i in range(65):
        result = limiter.check(
            identifier="user:12345",
            config=config,
            endpoint="/api/search",
        )
        if not result.allowed:
            print(f"Request {i+1}: REJECTED | retry_after={result.retry_after}s")
            print(f"  Headers: {result.headers}")
            break
        else:
            print(f"Request {i+1}: OK | remaining={result.remaining}")

6.2 Node.js: Express Rate Limiting Middleware

// middleware/rate-limiter.js
// Production-grade Express middleware with a Redis backend
 
const Redis = require("ioredis");
const crypto = require("crypto");
 
// === Tier Configs ===
const TIER_CONFIGS = {
  free:       { maxRequests: 60,    windowSeconds: 60, burstSize: 10 },
  basic:      { maxRequests: 600,   windowSeconds: 60, burstSize: 50 },
  pro:        { maxRequests: 3000,  windowSeconds: 60, burstSize: 200 },
  enterprise: { maxRequests: 30000, windowSeconds: 60, burstSize: 1000 },
};
 
// === Lua Script (same logic as Python version) ===
const SLIDING_WINDOW_LUA = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
 
local current_window = math.floor(now / window)
local previous_window = current_window - 1
local elapsed = now - (current_window * window)
 
local current_key = key .. ":" .. current_window
local previous_key = key .. ":" .. previous_window
 
local previous_count = tonumber(redis.call("GET", previous_key) or "0")
local current_count = tonumber(redis.call("GET", current_key) or "0")
 
local weighted_count = previous_count * (1 - elapsed / window) + current_count
 
if weighted_count >= limit then
    local ttl = window - elapsed
    return {0, limit, math.max(0, math.floor(limit - weighted_count)), math.ceil(now + ttl), math.ceil(ttl)}
end
 
redis.call("INCR", current_key)
redis.call("EXPIRE", current_key, window * 2)
redis.call("EXPIRE", previous_key, window * 2)
 
local new_count = weighted_count + 1
local remaining = math.floor(limit - new_count)
local reset_at = (current_window + 1) * window
 
return {1, limit, remaining, math.ceil(reset_at), 0}
`;
 
class RateLimiter {
  constructor({ redisClient, failOpen = true, prefix = "rl" }) {
    this.redis = redisClient;
    this.failOpen = failOpen;
    this.prefix = prefix;
    this.scriptSha = null;
  }
 
  async ensureScript() {
    if (!this.scriptSha) {
      this.scriptSha = await this.redis.script("LOAD", SLIDING_WINDOW_LUA);
    }
    return this.scriptSha;
  }
 
  buildKey(identifier, endpoint) {
    const parts = [this.prefix, identifier];
    if (endpoint) {
      const hash = crypto
        .createHash("md5")
        .update(endpoint)
        .digest("hex")
        .slice(0, 8);
      parts.push(hash);
    }
    return parts.join(":");
  }
 
  async check(identifier, config, endpoint = null) {
    const key = this.buildKey(identifier, endpoint);
 
    try {
      const sha = await this.ensureScript();
      const now = Date.now() / 1000;
 
      const [allowed, limit, remaining, resetAt, retryAfter] =
        await this.redis.evalsha(
          sha,
          1,
          key,
          now,
          config.windowSeconds,
          config.maxRequests
        );
 
      return {
        allowed: Boolean(allowed),
        limit,
        remaining: Math.max(0, remaining),
        resetAt,
        retryAfter: retryAfter > 0 ? retryAfter : null,
      };
    } catch (err) {
      console.error("[RateLimiter] Redis error:", err.message);
      if (this.failOpen) {
        return {
          allowed: true,
          limit: config.maxRequests,
          remaining: config.maxRequests,
          resetAt: Date.now() / 1000 + config.windowSeconds,
          retryAfter: null,
        };
      }
      throw err;
    }
  }
}
 
// === Express Middleware Factory ===
function rateLimitMiddleware(options = {}) {
  const {
    redisUrl = "redis://localhost:6379",
    failOpen = true,
    keyExtractor = defaultKeyExtractor,
    tierExtractor = defaultTierExtractor,
    onRejected = defaultOnRejected,
  } = options;
 
  const redisClient = new Redis(redisUrl);
  const limiter = new RateLimiter({ redisClient, failOpen });
 
  return async function rateLimit(req, res, next) {
    try {
      const identifier = keyExtractor(req);
      const tier = tierExtractor(req);
      const config = TIER_CONFIGS[tier] || TIER_CONFIGS.free;
      const endpoint = `${req.method}:${req.route?.path || req.path}`;
 
      const result = await limiter.check(identifier, config, endpoint);
 
      // Set rate limit headers
      res.set("X-RateLimit-Limit", String(result.limit));
      res.set("X-RateLimit-Remaining", String(result.remaining));
      res.set("X-RateLimit-Reset", String(result.resetAt));
 
      if (!result.allowed) {
        res.set("Retry-After", String(result.retryAfter));
        return onRejected(req, res, result);
      }
 
      next();
    } catch (err) {
      console.error("[RateLimiter] Middleware error:", err.message);
      // Fail open on middleware errors: let the request through rather than
      // block all traffic on a rate-limiter bug (Redis failures are already
      // handled inside limiter.check via failOpen)
      next();
    }
  };
}
 
// === Default Helpers ===
function defaultKeyExtractor(req) {
  // Priority: user ID > API key > IP
  if (req.user?.id) return `user:${req.user.id}`;
  if (req.headers["x-api-key"]) return `key:${req.headers["x-api-key"]}`;
  // x-forwarded-for can hold a comma-separated proxy chain; take the first hop
  const ip =
    req.ip || (req.headers["x-forwarded-for"] || "").split(",")[0].trim() || "unknown";
  return `ip:${ip}`;
}
 
function defaultTierExtractor(req) {
  return req.user?.tier || "free";
}
 
function defaultOnRejected(req, res, result) {
  return res.status(429).json({
    error: {
      code: "RATE_LIMITED",
      message: "Rate limit exceeded. Please retry later.",
      retry_after: result.retryAfter,
    },
  });
}
 
// === Usage with Express ===
// const express = require('express');
// const app = express();
//
// app.use(rateLimitMiddleware({
//   redisUrl: process.env.REDIS_URL || 'redis://localhost:6379',
//   failOpen: true,
// }));
//
// app.get('/api/data', (req, res) => {
//   res.json({ data: 'hello' });
// });
 
module.exports = { RateLimiter, rateLimitMiddleware, TIER_CONFIGS };

6.3 Nginx Rate Limiting Config (Production)

# /etc/nginx/conf.d/rate-limiting.conf
# Production-ready Nginx rate limiting configuration
 
# === Shared memory zones ===
# $binary_remote_addr = 4 bytes (IPv4) or 16 bytes (IPv6)
# 10m zone ≈ 160,000 IPv4 addresses
 
# General API: 10 requests/second per IP
limit_req_zone $binary_remote_addr zone=api_per_ip:10m rate=10r/s;
 
# Auth endpoints: 1 request/second per IP (strict)
limit_req_zone $binary_remote_addr zone=auth_per_ip:10m rate=1r/s;
 
# Per API key: 100 requests/minute
map $http_x_api_key $limit_key {
    default         $binary_remote_addr;
    "~^.+$"         $http_x_api_key;
}
limit_req_zone $limit_key zone=per_api_key:10m rate=100r/m;
 
# Connection limiting per IP
limit_conn_zone $binary_remote_addr zone=conn_per_ip:5m;
 
# === Logging ===
log_format rate_limit '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      'limit_req_status=$limit_req_status';
 
server {
    listen 443 ssl http2;
    server_name api.example.com;
 
    # Connection limits
    limit_conn conn_per_ip 30;
    limit_conn_log_level warn;
 
    # Custom 429 response
    limit_req_status 429;
 
    access_log /var/log/nginx/rate_limit.log rate_limit;
 
    # --- Standard API endpoints ---
    location /api/ {
        limit_req zone=api_per_ip burst=20 nodelay;
        limit_req zone=per_api_key burst=50 nodelay;
 
        proxy_pass http://api_upstream;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Rate-Limited $limit_req_status;
    }
 
    # --- Auth endpoints (strict rate limiting) ---
    location /api/auth/ {
        limit_req zone=auth_per_ip burst=3 nodelay;
 
        proxy_pass http://api_upstream;
        proxy_set_header X-Real-IP $remote_addr;
    }
 
    # --- Health check (no rate limiting: limit_req only applies where a
    #     zone is referenced, and none is referenced in this location) ---
    location /health {
        default_type application/json;
        return 200 '{"status":"ok"}';
    }
 
    # --- Error handling ---
    error_page 429 @rate_limited;
    location @rate_limited {
        default_type application/json;
        add_header Retry-After 30 always;
        add_header X-RateLimit-Limit 10 always;
        return 429 '{"error":{"code":"RATE_LIMITED","message":"Too many requests","retry_after":30}}';
    }
}

6.4 Lua Script for Atomic Redis Rate Limiting

-- rate_limiter.lua
-- Lua script run on Redis implementing the token bucket algorithm
-- Usage: EVALSHA <sha> 1 <key> <now> <rate> <capacity> <requested>
-- Returns: {allowed (0/1), tokens_remaining, retry_after_ms}
 
local key = KEYS[1]
local now = tonumber(ARGV[1])           -- Current timestamp (ms)
local rate = tonumber(ARGV[2])          -- Tokens per second
local capacity = tonumber(ARGV[3])      -- Max tokens (bucket size)
local requested = tonumber(ARGV[4])     -- Tokens requested (usually 1)
 
-- Fetch current state from Redis
local bucket = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

-- First request: initialize a full bucket
if tokens == nil then
    tokens = capacity
    last_refill = now
end

-- Compute how many tokens have been refilled since the last request
local elapsed_ms = math.max(0, now - last_refill)
local new_tokens = elapsed_ms * rate / 1000  -- rate is tokens/second
tokens = math.min(capacity, tokens + new_tokens)

local allowed = 0
local retry_after_ms = 0

if tokens >= requested then
    -- Enough tokens: allow the request
    allowed = 1
    tokens = tokens - requested
else
    -- Not enough tokens: compute how long the client should wait
    local deficit = requested - tokens
    retry_after_ms = math.ceil(deficit / rate * 1000)
end

-- Persist new state (HSET with field-value pairs; HMSET is deprecated)
redis.call("HSET", key, "tokens", tostring(tokens), "last_refill", tostring(now))
-- TTL = 2x the time needed to refill a full bucket (so idle keys expire)
local ttl_seconds = math.ceil(capacity / rate * 2)
redis.call("EXPIRE", key, ttl_seconds)
 
return {allowed, math.floor(tokens), retry_after_ms}

# === Using the Lua script from Python ===
import time
import redis

r = redis.Redis(host="localhost", port=6379)

# Load the script once at startup; reuse the SHA for every call
with open("rate_limiter.lua", "r") as f:
    TOKEN_BUCKET_SHA = r.script_load(f.read())
 
def token_bucket_check(
    user_id: str,
    rate: float = 10.0,        # 10 tokens/s
    capacity: int = 50,        # Max 50 tokens
    requested: int = 1,
) -> dict:
    """Token bucket rate limit check via Lua script."""
    key = f"tb:{user_id}"
    now_ms = int(time.time() * 1000)
 
    allowed, remaining, retry_after_ms = r.evalsha(
        TOKEN_BUCKET_SHA,
        1,          # number of keys
        key,        # KEYS[1]
        now_ms,     # ARGV[1]
        rate,       # ARGV[2]
        capacity,   # ARGV[3]
        requested,  # ARGV[4]
    )
 
    return {
        "allowed": bool(allowed),
        "remaining": remaining,
        "retry_after_ms": retry_after_ms,
    }
 
# Test
for i in range(55):
    result = token_bucket_check("user:42", rate=10, capacity=50)
    status = "OK" if result["allowed"] else f"REJECTED (retry in {result['retry_after_ms']}ms)"
    print(f"Request {i+1}: {status} | remaining={result['remaining']}")

7. System Design Diagrams

7.1 Rate Limiter Architecture in the API Gateway

flowchart TD
    Client([Client]) -->|Request| CDN[CDN / Edge<br/>Cloudflare / AWS CloudFront]

    CDN -->|Layer 3-4 filtering<br/>DDoS mitigation| LB[Load Balancer<br/>Nginx / ALB]

    LB --> GW[API Gateway<br/>Kong / AWS API GW]

    subgraph "Rate Limiting Pipeline"
        GW --> EXTRACT[Extract Identity<br/>IP / API Key / User ID / JWT]
        EXTRACT --> LOOKUP[Lookup Tier Config<br/>Free / Basic / Pro / Enterprise]
        LOOKUP --> CHECK{Rate Limit Check<br/>via Redis}
        CHECK -->|Allowed| AUTH[Authentication<br/>& Authorization]
        CHECK -->|Rejected| REJECT[HTTP 429<br/>+ Retry-After header]
    end

    subgraph "Redis Rate Limit Store"
        REDIS[(Redis Cluster)]
        CHECK <-->|Lua Script<br/>Atomic check + increment| REDIS
        REDIS --- R1[Sliding Window Counters]
        REDIS --- R2[Token Bucket State]
        REDIS --- R3[Blocked IPs Set]
    end

    AUTH --> APP[Application Server]
    APP --> DB[(Database)]

    REJECT -->|429 + headers| Client

    subgraph "Monitoring"
        REDIS -->|Metrics| PROM[Prometheus]
        GW -->|rate_limit_hits_total| PROM
        PROM --> GRAF[Grafana Dashboard]
        GRAF -->|Alert| OPS[Ops Team / PagerDuty]
    end

    style CHECK fill:#ff9800,stroke:#333,stroke-width:2px
    style REJECT fill:#f44336,stroke:#333,stroke-width:2px,color:#fff
    style AUTH fill:#4caf50,stroke:#333,stroke-width:2px
    style REDIS fill:#d32f2f,stroke:#333,stroke-width:2px,color:#fff

7.2 Token Bucket Visualization

sequenceDiagram
    participant C as Client
    participant RL as Rate Limiter
    participant B as Token Bucket<br/>(capacity=5, rate=2/s)
    participant S as Server

    Note over B: Bucket: [*][*][*][*][*]<br/>tokens = 5 (full)

    C->>RL: Request 1
    RL->>B: consume(1)
    B-->>RL: OK (tokens=4)
    RL->>S: Forward request
    S-->>C: 200 OK

    C->>RL: Request 2
    RL->>B: consume(1)
    B-->>RL: OK (tokens=3)
    RL->>S: Forward request
    S-->>C: 200 OK

    C->>RL: Request 3, 4, 5 (burst)
    RL->>B: consume(3)
    B-->>RL: OK (tokens=0)
    RL->>S: Forward 3 requests
    S-->>C: 200 OK x3

    Note over B: Bucket: [ ][ ][ ][ ][ ]<br/>tokens = 0 (empty!)

    C->>RL: Request 6
    RL->>B: consume(1)
    B-->>RL: REJECTED (tokens=0)
    RL-->>C: 429 Too Many Requests<br/>Retry-After: 1

    Note over B: +1 second passes...<br/>Refill: 2 tokens/s

    Note over B: Bucket: [*][*][ ][ ][ ]<br/>tokens = 2

    C->>RL: Request 7 (after 1s)
    RL->>B: consume(1)
    B-->>RL: OK (tokens=1)
    RL->>S: Forward request
    S-->>C: 200 OK

7.3 Distributed Rate Limiting: Race Condition and Solution

sequenceDiagram
    participant S1 as API Server 1
    participant S2 as API Server 2
    participant R as Redis

    Note over S1,R: === WITHOUT Lua Script (Race Condition) ===

    S1->>R: GET counter (= 99)
    S2->>R: GET counter (= 99)
    Note over S1: 99 < 100, allow!
    Note over S2: 99 < 100, allow!
    S1->>R: INCR counter (= 100)
    S2->>R: INCR counter (= 101)
    Note over R: Counter = 101<br/>LIMIT VIOLATED!

    Note over S1,R: === WITH Lua Script (Atomic) ===

    S1->>R: EVALSHA lua_script
    Note over R: Lua: GET=99, 99<100<br/>INCR → 100, return ALLOW
    R-->>S1: ALLOWED (remaining=0)

    S2->>R: EVALSHA lua_script
    Note over R: Lua: GET=100, 100>=100<br/>return REJECT
    R-->>S2: REJECTED (retry_after=45s)

    Note over R: Counter = 100<br/>LIMIT RESPECTED!

8. Aha Moments & Pitfalls

Aha Moment #1: Race conditions in distributed rate limiting

When 10 API servers query the same Redis, a read-then-write pattern creates a race condition: two servers both read counter = 99, both allow the request, and the counter ends at 101. The fix is to make check-and-increment atomic: a Lua script, a WATCH/MULTI/EXEC transaction, or simply INCR first and compare the returned value. Plain MULTI/EXEC without WATCH does not help, because you cannot branch on a value read inside the transaction. Never issue GET and then INCR as separate commands.
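
The race and the atomic fix can be sketched in pure Python; FakeRedis below is a hypothetical in-memory stand-in for a Redis client:

```python
class FakeRedis:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key, 0)

    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]

LIMIT = 100

def atomic_check(r, key):
    # Atomic INCR first, then compare the returned value --
    # the same effect a server-side Lua script gives you.
    return r.incr(key) <= LIMIT

r = FakeRedis()
r.store["c"] = 99

# Racy interleaving: both servers read BEFORE either writes.
read_a = r.get("c")           # server 1 sees 99 -> would allow
read_b = r.get("c")           # server 2 sees 99 -> would allow
allow_a = read_a < LIMIT
allow_b = read_b < LIMIT
r.incr("c")                   # server 1 writes
r.incr("c")                   # server 2 writes
print(allow_a, allow_b, r.store["c"])   # True True 101 -> limit violated

r2 = FakeRedis()
r2.store["c"] = 99
print(atomic_check(r2, "c"))  # True  (counter becomes 100)
print(atomic_check(r2, "c"))  # False (counter becomes 101, over the limit)
```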

Aha Moment #2: Clock skew can break the rate limiter

In a distributed system, server clocks can drift apart by a few milliseconds up to several seconds. If the rate limiter uses timestamps from the application servers instead of a single time source, two servers can sit in two different windows at the same real instant. Fix: take the time on the Redis side, e.g. redis.call("TIME") inside the Lua script (works out of the box on Redis 5+ thanks to effect-based script replication; older versions need redis.replicate_commands()), and never trust client timestamps.
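
A tiny pure-Python sketch of the skew problem: two servers observe the same real instant near a window boundary but land in different fixed windows (the 50 ms skew is a made-up number):

```python
import math

WINDOW = 60  # window size in seconds

def window_id(now: float) -> int:
    # Which fixed window a timestamp falls into
    return math.floor(now / WINDOW)

# Two servers observe "the same" instant near a window boundary,
# but server B's clock runs 50 ms fast (hypothetical skew).
t_server_a = 119.98
t_server_b = 119.98 + 0.05

print(window_id(t_server_a))  # 1
print(window_id(t_server_b))  # 2 -> the two counts land in different buckets
```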

Aha Moment #3: Rate limiting internal services needs care

Hieu, this is an extremely common mistake: putting rate limits on everything, including internal service-to-service calls. The result: during a traffic spike, Service A gets rate limited by Service B, creating a cascading failure worse than having no rate limit at all. Rule: rate limit at the edge (API Gateway) for external traffic; between internal services, use a circuit breaker (see Tuan-11-Microservices-Pattern) instead of a rate limiter.

Aha Moment #4: Over-aggressive limits hurt UX

A startup set a 10 req/min limit on its free tier. Result: loading the home page (5 API calls) + clicking one link (3 calls) + scrolling (3 calls) = 11 calls, so users were rate limited after barely two seconds of normal use; they left and never came back. Rule: always test limits by using the app like a normal user and counting the requests; the limit should be at least 3x the normal usage pattern.
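
The arithmetic from the story, as a sanity check you can run against your own endpoints (the call counts are the hypothetical ones above):

```python
# Hypothetical per-minute budget of one "normal" free-tier user.
page_load  = 5   # API calls to render the home page
link_click = 3   # calls triggered by clicking one link
scroll     = 3   # calls triggered by scrolling

normal_per_minute = page_load + link_click + scroll

# Rule of thumb from the text: limit >= 3x the normal usage pattern.
recommended_limit = 3 * normal_per_minute
print(normal_per_minute, recommended_limit)  # 11 33
```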

Pitfall #1: Fixed window boundary burst

With a Fixed Window Counter and a limit of 100/min, an attacker sends 100 requests at second 59 and 100 more at second 60: 200 requests land within about two seconds. This is why Fixed Window should never be used for security-critical endpoints; use a Sliding Window Counter or Token Bucket instead.
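
The boundary burst is easy to reproduce with a small fixed-window simulation (pure Python, no Redis needed):

```python
import math
from collections import defaultdict

LIMIT, WINDOW = 100, 60          # 100 requests per 60-second fixed window
counters = defaultdict(int)      # window id -> request count

def fixed_window_allow(now: float) -> bool:
    w = math.floor(now / WINDOW)
    if counters[w] >= LIMIT:
        return False
    counters[w] += 1
    return True

# 100 requests just before the boundary, 100 just after: all pass,
# because they fall into two different windows.
burst = [59.0 + i * 0.001 for i in range(100)] + \
        [60.0 + i * 0.001 for i in range(100)]
allowed = sum(fixed_window_allow(t) for t in burst)
print(allowed)  # 200 -- double the intended limit in ~1 second of wall time
```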

Pitfall #2: Fail-open vs fail-closed

Fail-open (allow everything when Redis is down) is safe for UX, but DDoS traffic slips through. Fail-closed (reject everything when Redis is down) is safe for security, but then a Redis outage takes the whole system down. Practical answer: fail-open + Redis HA (Sentinel/Cluster) + a fallback local rate limiter (in-memory, less accurate, but better than nothing).
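
A minimal sketch of that fallback: a per-process, in-memory token bucket used only when Redis is unreachable (LocalTokenBucket and redis_check are hypothetical names):

```python
import time

class LocalTokenBucket:
    """Per-process fallback: less accurate than a shared Redis bucket
    (every instance has its own quota) but better than nothing."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Sketch of the fallback path (redis_check is a hypothetical function):
#   try:
#       return redis_check(key)
#   except redis.RedisError:
#       return local_bucket.allow()   # fail-open, with a local safety net

bucket = LocalTokenBucket(rate=10, capacity=5)
print([bucket.allow() for _ in range(7)])  # five True, then False False
```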

Pitfall #3: Forgetting to rate limit webhooks and background jobs

Hieu rate limited only the HTTP API, forgetting that webhooks (Stripe, GitHub) and background workers generate load too. A webhook retry storm can look exactly like a DDoS. Fix: rate limit every entry point into the system, not just the user-facing API.

Pitfall #4: No rate limit on account sign-up

An attacker creates 100,000 accounts; each account then gets its own quota, bypassing the limiter entirely. Fix: rate limit account creation by IP (e.g. 5 accounts/IP/day) plus CAPTCHA and email verification.
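
A sketch of the per-IP sign-up cap (the 5/day threshold follows the text; allow_signup is a hypothetical helper):

```python
from collections import defaultdict

MAX_SIGNUPS_PER_IP_PER_DAY = 5
signups = defaultdict(int)   # (ip, day) -> sign-up count

def allow_signup(ip: str, day: str) -> bool:
    key = (ip, day)
    if signups[key] >= MAX_SIGNUPS_PER_IP_PER_DAY:
        return False   # also the point to trigger CAPTCHA / manual review
    signups[key] += 1
    return True

results = [allow_signup("203.0.113.7", "2024-01-01") for _ in range(6)]
print(results)  # five True, then False
```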



Tham khao

  • Alex Xu, System Design Interview — Chapter 4: Design a Rate Limiter
  • sdi.anhvy.dev — Rate Limiter patterns & algorithms
  • Cloudflare Blog: How we built rate limiting capable of scaling to millions of domains
  • Stripe Engineering: Rate limiters and load shedders
  • Kong Documentation: Rate Limiting Plugin

Previous week: Tuan-08-Message-Queue (message queues as a buffer for traffic spikes)
Next week: Tuan-10-Consistent-Hashing (distributing data evenly across nodes)