Week 05: Load Balancer, the Traffic-Routing Brain of the System

"A system without a load balancer is like a supermarket that opens a single checkout counter at rush hour. The servers are not weak; nobody is directing the traffic."

Tags: system-design load-balancer devops security alex-xu
Student: Hieu (Backend Dev → System Architect)
Prerequisite: Tuan-02-Back-of-the-envelope · Tuan-03-Networking-DNS-CDN
Related: Tuan-01-Scale-From-Zero-To-Millions · Tuan-06-Cache-Strategy · Tuan-09-Rate-Limiter · Tuan-10-Consistent-Hashing · Tuan-13-Monitoring-Observability

1. Context & Why

Everyday analogy: supermarket checkout counters

Hieu, picture yourself at the supermarket on a Sunday afternoon. There are 20 checkout counters but only 3 are open, so the queues grow and customers walk out. Now imagine a floor manager standing at the head of the checkout area who:

Sees which counter has the fewest customers and sends new customers there (Least Connections)
Sends loyal customers to the VIP counter reserved for them (Sticky Sessions)
Stops sending customers to a counter whose cashier has suddenly left (Health Check)
Separates the express counter (< 5 items) from the regular counters, classifying requests (L7 routing)
Opens more counters at rush hour (Auto-scaling)

That floor manager is the Load Balancer (LB).

Why is the Load Balancer the first component to understand?

In Alex Xu's Chapter 1, when scaling from a single server to multiple servers, the first thing added is a load balancer. The reasons:

Availability: If one server dies, the LB automatically shifts traffic to the others. Users notice nothing.
Scalability: Adding a server? Just register it with the LB, with zero downtime.
Performance: Load is spread evenly, so no server is overloaded while another sits idle.
Security: The LB is the first line of defense: it absorbs DDoS traffic, terminates SSL, and filters requests.

Core Principle: A load balancer turns N servers into a single endpoint for the client. The client does not need to know (and should not know) how many servers sit behind it.

2. Deep Dive: Core Knowledge

2.1 L4 vs L7 Load Balancing

This is the most important distinction when talking about load balancers. "L4" and "L7" come from the OSI model (the 7-layer network model).

Layer 4 Load Balancing (Transport Layer)

The LB operates at the TCP/UDP layer. It only sees:

Source IP / destination IP
Source port / destination port
TCP flags (SYN, ACK, FIN…)

It does not see: URLs, HTTP headers, cookies, or the request body.

How it works: When a TCP connection arrives, the L4 LB decides which backend server to forward it to based on IP + port. After that, every packet in the connection goes to the same server (connection affinity).

Advantages:

Extremely fast: No HTTP parsing, so the added latency is on the order of microseconds
Protocol-agnostic: Works with any protocol running over TCP/UDP (MySQL, gRPC, SMTP, game servers…)
Low resource usage: No need to buffer the whole request

Disadvantages:

Cannot route on content (URL path, cookie, header)
Cannot inspect or modify HTTP requests
No SSL termination (in its pure L4 form)

When to use: Database load balancing, TCP-based protocols, ultra-low latency requirements, non-HTTP traffic.

Real-world examples: AWS NLB (Network Load Balancer), HAProxy in TCP mode, LVS (Linux Virtual Server).

Layer 7 Load Balancing (Application Layer)

The LB operates at the HTTP/HTTPS layer. It sees everything:

URL path (/api/users vs /api/orders)
HTTP method (GET, POST, PUT…)
Headers (Host, Authorization, Cookie…)
Request body (if needed)
TLS SNI (Server Name Indication)

How it works: The LB receives the full HTTP request, parses it, and then decides which backend to forward it to. It can rewrite URLs, add or remove headers, redirect, and cache responses.

Advantages:

Content-based routing: Route /api/* to API servers, /static/* to CDN/static servers
SSL termination: Decrypt HTTPS at the LB and forward HTTP to the backend, which takes the encryption load off the backend
Request manipulation: Add X-Forwarded-For, X-Request-ID, modify cookies
Advanced health checks: Check HTTP status code, response body, response time
WAF integration: Inspect the request body to block SQL injection and XSS

Disadvantages:

Slower than L4: HTTP must be parsed, so the added latency is on the order of milliseconds
More resource-hungry: Request/response buffering, SSL processing
Only works with HTTP/HTTPS (and a few other L7 protocols such as gRPC)

When to use: Web applications, API gateways, microservices routing, anywhere SSL termination is needed.

Real-world examples: AWS ALB (Application Load Balancer), Nginx, HAProxy in HTTP mode, Envoy.
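
A minimal sketch of content-based routing may help make the L7 idea concrete. It assumes a hypothetical route table (`ROUTES`, with made-up pool names) and picks the backend pool by longest matching path prefix; real L7 balancers offer far richer matching (hosts, headers, regexes).

```python
from typing import Optional

# Hypothetical route table: URL path prefix -> backend pool name.
ROUTES = {
    "/api/": "api-servers",
    "/api/orders/": "orders-servers",
    "/static/": "static-servers",
}


def pick_pool(path: str, routes: dict = ROUTES) -> Optional[str]:
    """Return the pool whose prefix is the longest match for `path`, or None."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None
    return routes[max(matches, key=len)]
```

An L4 balancer cannot make this decision at all, because the path is only visible after the HTTP request has been parsed.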

L4 vs L7 comparison table

Criterion	L4 (Transport)	L7 (Application)
OSI Layer	TCP/UDP	HTTP/HTTPS
Sees	IP, Port, TCP flags	URL, Headers, Cookies, Body
Speed	Extremely fast (~μs overhead)	Fast (~ms overhead)
Resources	Low	Higher
SSL termination	No (pass-through)	Yes
Content routing	No	Yes
Protocol	Any TCP/UDP	HTTP/HTTPS/gRPC
Sticky session	Based on IP	Based on cookie
Use case	DB, gaming, non-HTTP	Web apps, APIs, microservices
AWS equivalent	NLB	ALB

Aha Moment: In real production setups both are often used together: an L4 LB at the edge (handles raw TCP, blocks network-layer DDoS), with an L7 LB behind it (routes HTTP, terminates SSL). This is the two-tier load balancing pattern.

2.2 Load Balancing Algorithms

2.2.1 Round Robin

How it works: Requests are handed to each server in turn, in circular order. Server 1 → Server 2 → Server 3 → Server 1 → …

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  (wraps around)
Request 5 → Server B
...

Advantages: The simplest algorithm, easy to implement, and it spreads load evenly when all servers have equal capacity.

Disadvantages: It ignores how busy each server is. If Request 1 is a heavy query (5s) and Request 2 is a health check (10ms), Server A ends up overloaded while Server B is idle.

When to use: Stateless services, homogeneous servers, requests with roughly equal cost.

2.2.2 Weighted Round Robin

How it works: Same as Round Robin, but each server has a weight. Stronger servers receive more requests.

Server A (weight=5): 8 CPU, 32GB RAM
Server B (weight=3): 4 CPU, 16GB RAM
Server C (weight=2): 2 CPU, 8GB RAM

Sequence: A, A, A, A, A, B, B, B, C, C, A, A, A, A, A, B, B, B, C, C, ...

When to use: When servers have different hardware (for example while migrating from old servers to new ones), or for canary deployments (new server weight=1, old server weight=9).
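
The sequence above can be sketched in a few lines. This is a naive illustration, not any particular balancer's implementation: it expands each server by its weight and cycles through the result (plain Round Robin is the special case where every weight is 1). The function name is made up for this note.

```python
from itertools import cycle, islice


def weighted_round_robin(weights):
    """Yield server names forever, each repeated `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


picker = weighted_round_robin({"A": 5, "B": 3, "C": 2})
first_cycle = list(islice(picker, 10))  # A x5, then B x3, then C x2
```

Sending five requests in a row to A is bursty; Nginx, for instance, uses a smooth weighted round robin that interleaves the picks while keeping the same 5:3:2 ratio.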

2.2.3 Least Connections

How it works: A new request is sent to the server that currently has the fewest active connections.

Server A: 15 active connections
Server B: 8 active connections   ← new request goes here
Server C: 12 active connections

Advantages: Balances automatically when requests have different processing times. A server that finishes quickly frees its connections sooner and therefore receives new requests sooner.

Disadvantages: The LB must track connection counts in real time, which adds overhead. With a very large number of servers, the tracking becomes complex.

When to use: Long-lived connections (WebSocket, database connections), or requests whose processing times vary widely (for example an API with both fast endpoints and a heavy report-generation endpoint).

Variant: Weighted Least Connections combines weight and least connections:

$$Score_i = \frac{ActiveConnections_i}{Weight_i}$$

Pick the server with the lowest $Score$.
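
As a small sketch of the score formula, the function below takes a hypothetical mapping of server name to (active connections, weight) and returns the server with the lowest ratio. The weights in the example are illustrative, reusing the connection counts from the Least Connections example.

```python
def pick_weighted_least_conn(servers):
    """servers: name -> (active_connections, weight). Returns the lowest-score name."""
    return min(servers, key=lambda name: servers[name][0] / servers[name][1])


pool = {"A": (15, 5), "B": (8, 3), "C": (12, 2)}
choice = pick_weighted_least_conn(pool)  # scores: A = 3.0, B ≈ 2.67, C = 6.0
```

With all weights equal to 1 the score is just the connection count, so this reduces to plain Least Connections.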

2.2.4 IP Hash

How it works: Hash the client's IP address and map it to a fixed server.

$$server\_index = hash(client\_IP) \bmod N$$

where $N$ is the number of servers.

hash("192.168.1.100") % 3 = 0 → Server A
hash("192.168.1.101") % 3 = 2 → Server C
hash("192.168.1.102") % 3 = 1 → Server B
hash("192.168.1.100") % 3 = 0 → Server A  (same client → same server)

Advantages: The same client always reaches the same server, which makes better use of server-side caches. A kind of "poor man's sticky session".

Serious disadvantage: When a server is added or removed ($N$ changes), almost every mapping changes, causing mass cache invalidation, often called a "rehashing storm".

When $N$ changes by one, only about $\frac{1}{N}$ of clients keep their server, so roughly $\frac{N - 1}{N}$ of clients are remapped.

Example: With 10 servers, adding 1 server moves roughly 90% of clients to a different server, so caches miss en masse.

→ Solution: Consistent Hashing, in the next section.
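
To see the rehashing problem in numbers, the sketch below hashes 10,000 made-up client IPs with `hash mod N` and counts how many land on a different server when the pool grows from 10 to 11. The `bucket` helper and the choice of MD5 as a stable hash are assumptions for this illustration.

```python
import hashlib


def bucket(key: str, n: int) -> int:
    """Stable hash of `key`, reduced modulo `n` servers."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


clients = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
moved = sum(bucket(ip, 10) != bucket(ip, 11) for ip in clients)
moved_fraction = moved / len(clients)  # expected to land near 0.9
```

Python's built-in `hash()` is avoided here because string hashes are randomized per process, which would make the mapping unstable across restarts.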

2.2.5 Consistent Hashing

Full details: Tuan-10-Consistent-Hashing

The problem with IP Hash: with $hash \bmod N$, almost the entire mapping changes when $N$ changes.

Solution: Place both servers and clients on a hash ring (typically $0$ to $2^{32} - 1$).

How it works:

Hash each server → a position on the ring
Hash each request (IP or key) → a position on the ring
The request travels clockwise to the nearest server

When a server is added or removed: Only requests in the "neighborhood" are affected, so only about $\frac{1}{N}$ of requests are remapped.

$$\text{Keys remapped} \approx \frac{K}{N}$$

where $K$ is the total number of keys and $N$ is the total number of servers.

Virtual nodes: Each physical server is placed on the ring as many "virtual nodes" (typically 100-200) to spread keys more evenly.

When to use: Cache servers (Memcached, Redis cluster), distributed databases, and any situation where you need to distribute load while minimizing disruption when the topology changes.

Who uses it: Amazon DynamoDB, Apache Cassandra, Akamai CDN, Discord.
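
The ring can be sketched with a sorted list and binary search. This is a minimal illustration under stated assumptions: the `HashRing` class, the `node#i` naming of virtual nodes, and MD5 as the hash are all choices made for this note, and the ring here is 64-bit rather than $2^{32}$ purely to make point collisions negligible.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # 64-bit position on the ring (a 2^32 ring works the same way).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node appears `vnodes` times on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key; wrap around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(10_000)]
before = HashRing([f"server-{i}" for i in range(10)])
after = HashRing([f"server-{i}" for i in range(11)])
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
```

Two properties are worth checking: only a small share of keys move when `server-10` joins, and every key that moves goes to the new server rather than being shuffled among the old ones.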

2.2.6 Other algorithms

Algorithm	Description	Use case
Random	Pick a server at random	Simple, stateless, surprisingly effective
Least Response Time	Pick the server with the lowest response time and the fewest connections	Latency-sensitive apps
Resource-based	Pick a server based on CPU/RAM usage (requires an agent to report it)	Heterogeneous workloads
Maglev Hashing	Google's consistent hashing variant, built on a lookup table	Ultra-high throughput (Google)
Power of Two Choices	Pick 2 servers at random, use the less loaded one	Envoy's least-request policy

Aha Moment: The Power of Two Choices algorithm is surprisingly simple: pick 2 servers at random and choose the one with less load. Research shows it reduces the maximum load from $O(\log N / \log \log N)$ to $O(\log \log N)$, an exponential improvement. Envoy uses it in its least-request policy (Envoy's default policy is round robin).
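
A quick simulation makes the effect visible. The sketch below assigns 5,000 requests to 5,000 servers twice, once with a single random choice and once with two choices, and compares the busiest server in each run. The function name, the sizes, and the seed are arbitrary choices for this illustration.

```python
import random


def max_load(n_servers: int, n_requests: int, choices: int, rng: random.Random) -> int:
    """Assign requests one by one; each samples `choices` servers and takes the least loaded."""
    load = [0] * n_servers
    for _ in range(n_requests):
        candidates = [rng.randrange(n_servers) for _ in range(choices)]
        target = min(candidates, key=lambda s: load[s])
        load[target] += 1
    return max(load)


rng = random.Random(42)  # fixed seed so runs are repeatable
random_max = max_load(5_000, 5_000, choices=1, rng=rng)
p2c_max = max_load(5_000, 5_000, choices=2, rng=rng)
```

The busiest server under two choices should carry noticeably fewer requests than under pure random assignment, at the cost of one extra load lookup per request.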

2.3 Health Checks

A health check is how the LB knows which servers are "alive" and ready to receive traffic. Without health checks, the LB would send requests to dead servers and users would get 502/503 errors.

Active Health Check

The LB proactively sends a request to each backend server at a fixed interval:

LB → GET /health → Server A → 200 OK ✓
LB → GET /health → Server B → 200 OK ✓
LB → GET /health → Server C → timeout ✗ (marked unhealthy)

Typical configuration:

Interval: 5-30 seconds (how often to check)
Timeout: 2-5 seconds (how long to wait for a response)
Unhealthy threshold: 2-3 consecutive failures → mark unhealthy
Healthy threshold: 2-3 consecutive passes → mark healthy again
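
The two thresholds form a small state machine. The sketch below is one way to express it, assuming a hypothetical `HealthTracker` class that is fed one probe result at a time; real balancers differ in details such as whether a single success resets the failure count.

```python
class HealthTracker:
    """Flip a backend between healthy and unhealthy after N consecutive results."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive probes that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.healthy:
            self._streak = 0  # result agrees with the current state; reset
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if probe_ok else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
probes = [True, False, False, True, False, False, False, True, True]
states = [tracker.record(ok) for ok in probes]
```

Two isolated failures do not take the server out of rotation; three in a row do, and it then needs two consecutive passes to come back.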

What should the health check endpoint verify?

import shutil

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()
# `db` and `redis` are assumed to be already-initialized clients for your
# database and Redis; they are not defined in this snippet.

# ❌ Wrong: always returns 200, so you cannot tell whether the app really works
@app.get("/health-shallow")
def health_shallow():
    return {"status": "ok"}

# ✅ Right: deep health check that verifies every dependency
@app.get("/health")
def health():
    checks = {}

    # Check database connection
    try:
        db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception as e:
        checks["database"] = f"error: {e}"

    # Check Redis connection
    try:
        redis.ping()
        checks["redis"] = "ok"
    except Exception as e:
        checks["redis"] = f"error: {e}"

    # Check disk space (informational only; does not affect the status)
    disk_usage = shutil.disk_usage("/")
    disk_pct = disk_usage.used / disk_usage.total
    checks["disk"] = f"{disk_pct:.0%} used"

    all_ok = all(v == "ok" for k, v in checks.items() if k != "disk")
    status_code = 200 if all_ok else 503

    return JSONResponse(
        status_code=status_code,
        content={"status": "healthy" if all_ok else "unhealthy", "checks": checks}
    )

Caution: A deep health check can cause a cascading failure. If the DB is slow, the health check times out, the LB marks EVERY server as unhealthy, and all traffic is dropped. Solution: split it into a liveness check (is the app alive?) and a readiness check (is the app ready to receive traffic?). Kubernetes uses this pattern.
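
A framework-free sketch of the split might look like this. The function names and the shape of `dependency_checks` are assumptions for illustration; in a real service each function would sit behind its own HTTP endpoint.

```python
def liveness():
    """The process is up and can answer; no dependency checks."""
    return 200, {"status": "alive"}


def readiness(dependency_checks):
    """dependency_checks: name -> zero-argument callable that raises on failure."""
    results = {}
    for name, check in dependency_checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"error: {exc}"
    ready = all(value == "ok" for value in results.values())
    status = 200 if ready else 503
    return status, {"status": "ready" if ready else "not ready", "checks": results}
```

The liveness result decides whether the process should be restarted, while the readiness result only decides whether it should receive traffic, so a slow database no longer looks like a dead application.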

Passive Health Check

The LB does not send its own probes; it observes real traffic instead:

Client → Request → Server A → 502 Bad Gateway  (LB records 1 failure)
Client → Request → Server A → 502 Bad Gateway  (LB records 2 failures)
Client → Request → Server A → 502 Bad Gateway  (3 failures → marked unhealthy)

Advantages: No extra traffic, and faster failure detection (based on real traffic).
Disadvantages: With no traffic, a dead server goes undetected. Some real users are affected before the server is marked unhealthy.

Best Practice: Use Active + Passive together. Active checks detect dead servers when there is no traffic. Passive checks react quickly to errors in real time.

Active vs Passive Health Check comparison

Criterion	Active	Passive
How it works	LB sends probe requests	LB observes real traffic
Detects dead servers with no traffic	Yes	No
Extra bandwidth	Yes	No
Detection speed	Depends on interval (5-30s)	Real-time
Users affected before detection	Only during the detection window	Yes (1-3 requests)
Nginx configuration	`health_check` directive (NGINX Plus only)	`max_fails`, `fail_timeout`

2.4 Sticky Sessions

Problem: A user logs in on Server A (the session is stored in Server A's memory). The next request is sent by the LB to Server B, which has no session, so the user is kicked out.

Solution 1, Sticky Sessions: The LB makes sure the same user always reaches the same server.

How to implement:

Cookie-based: The LB sets a cookie SERVERID=server-a; later requests carry this cookie and the LB routes them to Server A
IP-based: Use IP Hash (drawback: many users share one IP behind NAT or a proxy)

Problems with Sticky Sessions:

Uneven load: If "hot users" (heavy users) pile up on one server, that server is overloaded
Sessions lost when a server dies: Users are logged out abruptly
Hard to scale: You cannot freely add or remove servers because sessions are "stuck"
Gets in the way of blue-green deployment: You cannot shift traffic to new servers while users are "stuck" on the old ones

Solution 2 (Recommended), Externalized Session:

Store sessions in Redis/Memcached (a shared session store)
Every server can read the session from Redis
The LB is free to route a request to any server
A server dies? No problem, the session is still in Redis

Client → LB → Server A → Redis.get(session_id) → session data
Client → LB → Server B → Redis.get(session_id) → same session data!

Rule of thumb for Hieu: Avoid sticky sessions when you can. Use stateless servers plus an external session store (Redis). This is the pattern that large systems rely on.
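
The externalized-session flow can be sketched without a real Redis. In the example below, `SessionStore` is an in-memory stand-in for the shared store and `AppServer` is a hypothetical application server; both names are assumptions for this note.

```python
import uuid


class SessionStore:
    """In-memory stand-in for a shared store such as Redis (assumption for this sketch)."""

    def __init__(self):
        self._data = {}

    def set(self, session_id, value):
        self._data[session_id] = value

    def get(self, session_id):
        return self._data.get(session_id)


class AppServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # every server points at the same store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.set(session_id, {"user": user})
        return session_id

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a, server_b = AppServer("A", store), AppServer("B", store)
sid = server_a.login("hieu")
```

Because the servers hold no session state themselves, a login handled by Server A is immediately visible to Server B, and the LB can use any algorithm it likes.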

2.5 SSL/TLS Termination

Concept: The client connects to the LB over HTTPS. The LB decrypts the request (terminates SSL) and forwards plain HTTP to the backend servers.

Client ──HTTPS──→ [Load Balancer] ──HTTP──→ Backend Server
         (encrypted)    (decrypt)     (plaintext, internal network)

Why do it this way?

Less CPU load on the backend: The SSL/TLS handshake and encryption are CPU-intensive. A dedicated LB (possibly with a hardware accelerator) handles them more efficiently
Centralized certificate management: Install the SSL cert on the LB only, not on 100 backend servers
Easy certificate rotation: Update the cert in 1 place instead of 100
L7 inspection: The LB must decrypt traffic before it can read HTTP headers for routing

Security notes (very important; see also Section 4):

Traffic from the LB to the backend is plaintext, so if the internal network is compromised the data can be read
In high-compliance environments (PCI-DSS), re-encryption may be required (SSL termination + SSL origination): the LB decrypts, inspects, then encrypts again before sending to the backend
Alternatively use SSL passthrough: the LB forwards encrypted traffic as-is without decrypting it (but loses L7 routing)

2.6 Connection Draining (Graceful Shutdown)

Problem: A server needs to leave the pool (new code deploy, maintenance). If it is removed immediately, in-flight requests are cut off and users get 502 errors.

Solution, Connection Draining:

The LB marks the server as "draining" and stops sending it NEW requests
In-flight requests are allowed to finish (within a timeout, typically 30-300 seconds)
Once there are no in-flight requests left, or the timeout expires, the server is actually removed

Before draining:
  LB → Server A (100 active connections, accepting new requests)

Drain starts:
  LB → Server A (100 active connections, NOT accepting new requests)

After 30 seconds:
  LB → Server A (2 active connections, waiting for them to finish)

After 60 seconds:
  LB → Server A (0 connections → safe to remove)

Configuration:

AWS ALB: Deregistration Delay (default 300s)
Nginx: mark the server `down` in the upstream and reload; old workers finish their in-flight requests (`worker_shutdown_timeout` caps the wait). NGINX Plus adds a `drain` parameter on the upstream `server`
HAProxy: put the server in drain state (`set server <backend>/<server> state drain` over the runtime API), optionally combined with option redispatch

Best Practice: Always enable connection draining. Set the timeout to max expected request duration × 2. If an API request takes at most 30s, use a draining timeout of 60s.
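
The drain lifecycle and the timeout rule can be sketched as a tiny state holder. The `Backend` class and `drain_timeout` helper below are hypothetical names for this note; time is passed in explicitly so the behavior is easy to follow.

```python
import time


def drain_timeout(max_request_s, factor=2):
    """Rule of thumb from the text: max expected request duration x 2."""
    return max_request_s * factor


class Backend:
    def __init__(self, name):
        self.name = name
        self.in_flight = 0
        self.drain_deadline = None  # None means the backend accepts new requests

    def start_drain(self, timeout_s, now=None):
        now = time.monotonic() if now is None else now
        self.drain_deadline = now + timeout_s

    def accepts_new_requests(self):
        return self.drain_deadline is None

    def safe_to_remove(self, now=None):
        if self.drain_deadline is None:
            return False  # not draining, so still part of the pool
        now = time.monotonic() if now is None else now
        return self.in_flight == 0 or now >= self.drain_deadline
```

A backend becomes removable as soon as its in-flight count reaches zero, and the deadline only matters for requests that refuse to finish.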

2.7 Global Server Load Balancing (GSLB)

GSLB distributes traffic across data centers/regions, not across servers within a single DC.

How it works: Usually through DNS-based routing:

A user in Vietnam queries api.example.com
The authoritative DNS (with GSLB logic) returns the IP of the Singapore data center (the nearest one)
A user in the US queries api.example.com
The DNS returns the IP of the US-East data center

Routing strategies:

Geographic (Geo-DNS): Route to the geographically nearest DC
Latency-based: Route to the DC with the lowest measured latency
Failover: If the primary DC goes down, route to the secondary DC
Weighted: 80% of traffic to DC1, 20% to DC2

Real-world examples:

AWS Route 53: Latency-based routing, geo routing, failover
Cloudflare: Anycast DNS + global load balancing
Akamai GTM (Global Traffic Manager)

User (Vietnam) → DNS → Singapore DC IP → Singapore LB → Backend
User (USA)     → DNS → US-East DC IP   → US-East LB  → Backend
User (Europe)  → DNS → Frankfurt DC IP → Frankfurt LB → Backend

Aha Moment: GSLB is usually combined with a local LB. GSLB picks the data center (DNS level), and the local LB picks a server inside that data center (L4/L7 level). This is multi-tier load balancing.
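
Latency-based routing with failover boils down to "lowest latency among the healthy data centers". The sketch below assumes a hypothetical per-user latency table; the millisecond values are invented for illustration.

```python
def pick_datacenter(latency_ms, healthy):
    """latency_ms: DC name -> latency for this user. healthy: set of DC names that are up."""
    candidates = {dc: ms for dc, ms in latency_ms.items() if dc in healthy}
    if not candidates:
        raise RuntimeError("no healthy data center")
    return min(candidates, key=candidates.get)


# Hypothetical measurements for a user in Vietnam.
vietnam_user = {"singapore": 35, "us-east": 230, "frankfurt": 190}
```

When Singapore is marked unhealthy, the same function falls back to the next-best data center, which is the failover strategy from the list above.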

2.8 Hardware vs Software Load Balancer

Hardware LB (physical appliance)

Representatives: F5 BIG-IP, Citrix ADC (NetScaler), A10 Networks.

Advantages: Very high throughput (millions of concurrent connections), hardware SSL accelerator, FPGA/ASIC-based and therefore deterministic latency
Disadvantages: Expensive ($50K to $500K+), vendor lock-in, hard to scale horizontally, needs specialist operators
When to use: Enterprise on-premise, financial trading (μs latency required), telecom

Software LB

Software	Characteristics	Protocol	Performance (rough figures)
HAProxy	Purpose-built LB, written in C, extremely stable	L4 + L7	2M+ req/s
Nginx	Web server + reverse proxy + LB	L7 (L4 with stream)	1M+ req/s
Envoy	Modern proxy, xDS API-driven, born at Lyft	L4 + L7	500K+ req/s
Traefik	Cloud-native, auto-discovery	L7	300K+ req/s
Caddy	Automatic HTTPS, written in Go	L7	200K+ req/s

Nginx vs HAProxy vs Envoy in detail:

Nginx:

Web server that doubles as reverse proxy/LB
Config file-based (nginx.conf)
The most widely used; most developers already know it
Open source + Nginx Plus (commercial)
Limitation: dynamic reconfiguration requires a reload (Nginx Plus has an API)

HAProxy:

Dedicated to load balancing; not a web server
Advanced health checks, connection queuing, rate limiting built in
Built-in stats dashboard (/haproxy?stats)
Better multi-threading support than Nginx OSS
Config file-based, but with a runtime API

Envoy:

Cloud-native proxy, driven by the xDS APIs (configured through an API rather than a file)
Built-in observability: distributed tracing (Jaeger/Zipkin), metrics (Prometheus), logging
Service mesh sidecar: Istio uses Envoy as its data plane
First-class gRPC support
Hot restart (zero-downtime config update)
Disadvantages: more complex, larger resource footprint

Recommendation for Hieu: Start with Nginx (familiar, plenty of documentation). When you need advanced features, move to HAProxy. For microservices/Kubernetes, use Envoy (usually via the Istio service mesh).

2.9 Service Mesh Load Balancing

In a microservices architecture, a service mesh pushes load balancing down into a sidecar proxy running next to each service instance.

Traditional model:

Client → Centralized LB → Service A
                        → Service B
                        → Service C

Service Mesh model:

Service A + Sidecar Proxy ←→ Service B + Sidecar Proxy
                          ←→ Service C + Sidecar Proxy
(Each sidecar does its own LB, retry, circuit breaking, TLS)

Advantages:

Client-side LB: No centralized LB, hence no single point of failure
mTLS (mutual TLS) between all services → zero-trust network
Observability: Distributed tracing and metrics out of the box
Traffic control: Canary deployments, A/B testing, fault injection

Representatives: Istio (Envoy-based), Linkerd, Consul Connect.

Disadvantages: High complexity, resource overhead (one extra sidecar container per pod), harder debugging.

3. Estimation: How many backend servers are needed

Problem

Given:

QPS peak = 50,000 requests/second
Per-server capacity = 5,000 requests/second (Node.js + Express, simple API)
Target availability = 99.99%

Step 1: Compute the minimum number of servers

$$N_{min} = \left\lceil \frac{QPS_{peak}}{Capacity_{per\_server}} \right\rceil = \left\lceil \frac{50{,}000}{5{,}000} \right\rceil = 10 \text{ servers}$$

Step 2: Add headroom for spikes

Never run at 100% capacity. Rule of thumb: keep utilization at 70-80% at most.

$$N_{with\_headroom} = \left\lceil \frac{N_{min}}{Target\_utilization} \right\rceil = \left\lceil \frac{10}{0.7} \right\rceil = 15 \text{ servers}$$

Step 3: Add redundancy for availability

With a 99.99% target, the system must survive at least 1-2 server failures without impact. Use N+2 redundancy:

$$N_{final} = N_{with\_headroom} + 2 = 15 + 2 = 17 \text{ servers}$$

Or compute it from the failure tolerance:

$$N_{final} = \left\lceil \frac{QPS_{peak}}{Capacity_{per\_server} \times Target\_utilization} \right\rceil + F$$

where $F$ is the number of servers allowed to fail at the same time.

Step 4: Compute the capacity needed for the Load Balancer

The LB must handle the full 50,000 QPS plus overhead:

$$LB\_capacity = QPS_{peak} \times (1 + overhead_{SSL} + overhead_{health\_check})$$

$$= 50{,}000 \times (1 + 0.1 + 0.01) = 55{,}500 \text{ req/s}$$

A single Nginx instance can handle on the order of 100K req/s (a rough figure that depends on hardware and configuration), so 1 Nginx instance is enough. But the LB also needs HA, so plan for 2 LB instances (active-passive or active-active).

Step 5: LB bandwidth estimation

$$BW_{total} = QPS_{peak} \times avg\_response\_size = 50{,}000 \times 5 \text{ KB} = 250 \text{ MB/s} = 2 \text{ Gbps}$$

→ The LB needs a 10 Gbps network interface (headroom).

Estimation summary

Metric	Value	Notes
Backend servers	17	10 min + 5 headroom + 2 redundancy
Load Balancers	2	HA pair (active-passive)
LB capacity needed	55,500 req/s	A single Nginx instance is enough
Network bandwidth (LB)	~2 Gbps	Needs a 10 Gbps NIC
Target utilization	70%	Stay at or below 70% at peak

General estimation formula

$$N_{servers} = \left\lceil \frac{QPS_{peak}}{C_{server} \times U_{target}} \right\rceil + F$$

where:

$QPS_{peak}$ : Peak queries per second
$C_{server}$ : Capacity per server (req/s)
$U_{target}$ : Target utilization (0.7 to 0.8)
$F$ : Failure tolerance (usually 1 to 3)
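
The whole estimation fits in three small functions. This is a sketch using the worked example's numbers as defaults; the function and parameter names are made up for this note, and bandwidth uses decimal units (1 MB = 1000 KB).

```python
import math


def backend_servers(qps_peak, capacity_per_server, target_utilization=0.7, failure_tolerance=2):
    """ceil(QPS_peak / (C_server * U_target)) + F"""
    return math.ceil(qps_peak / (capacity_per_server * target_utilization)) + failure_tolerance


def lb_capacity(qps_peak, ssl_overhead=0.10, health_check_overhead=0.01):
    """Requests per second the LB must sustain, including overhead."""
    return qps_peak * (1 + ssl_overhead + health_check_overhead)


def lb_bandwidth_gbps(qps_peak, avg_response_kb):
    megabytes_per_s = qps_peak * avg_response_kb / 1000  # decimal units
    return megabytes_per_s * 8 / 1000                    # 8 bits per byte, 1000 Mb per Gb


servers = backend_servers(50_000, 5_000)       # 17
capacity = lb_capacity(50_000)                 # about 55,500 req/s
bandwidth = lb_bandwidth_gbps(50_000, 5)       # 2.0 Gbps
```

Changing `target_utilization` to 0.8 or `failure_tolerance` to 1 shows how sensitive the server count is to those two assumptions.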

4. Security at the Load Balancer Layer

The LB is the first line of defense: all traffic from the Internet passes through it before reaching the backend.

4.1 DDoS Protection at the LB Layer

Layer 4 DDoS (SYN Flood, UDP Flood):

SYN Cookies: The LB does not allocate memory for half-open connections. Instead, it encodes the state into the TCP sequence number.
Rate limiting per IP: Cap connections/second from each IP
Connection limits: Set a maximum number of concurrent connections per source IP
Blackhole routing: When an attack is detected, send the traffic to a null route

Layer 7 DDoS (HTTP Flood, Slowloris):

Request rate limiting: Max requests/second per IP or per session
Slowloris protection: Set a timeout for header reads (Nginx: client_header_timeout)
Request body size limit: Prevent large POST attacks (client_max_body_size)
Geographic blocking: Block traffic from countries you do not serve

Nginx DDoS protection config:

# Rate limiting zone: 10 requests/second per IP
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
 
# Connection limiting: max 100 concurrent connections per IP
limit_conn_zone $binary_remote_addr zone=conn_limit:10m;
 
server {
    # Slowloris protection
    client_header_timeout 10s;
    client_body_timeout 10s;
    send_timeout 10s;
 
    # Request body limit
    client_max_body_size 10m;
 
    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;
        limit_conn conn_limit 100;
 
        # Return 429 Too Many Requests (instead of the default 503)
        limit_req_status 429;
        limit_conn_status 429;
 
        proxy_pass http://backend;
    }
}

4.2 WAF Integration (Web Application Firewall)

A WAF operates at L7 and inspects HTTP requests to block:

SQL Injection: ' OR 1=1 --
XSS: <script>alert('xss')</script>
Path Traversal: ../../etc/passwd
Command Injection: ; rm -rf /

Position in the architecture:

Client → L4 LB (DDoS filter) → WAF → L7 LB (routing) → Backend

Nginx + ModSecurity (open source WAF):

# Load ModSecurity module
load_module modules/ngx_http_modsecurity_module.so;
 
server {
    modsecurity on;
    modsecurity_rules_file /etc/nginx/modsecurity/main.conf;
 
    location /api/ {
        proxy_pass http://backend;
    }
}

AWS Architecture: CloudFront → AWS WAF → ALB → EC2/ECS.

4.3 SSL Offloading: Security Implications

Risks when LB → backend traffic is plaintext:

Network sniffing: If an attacker compromises the internal network, every request/response can be read
Man-in-the-middle: If LB → backend traffic crosses a shared (non-isolated) network, it can be intercepted
Compliance risk: PCI-DSS requires cardholder data to be encrypted in transit. The explicit mandate covers open, public networks, so confirm with your assessor whether internal hops are in scope

Mitigation strategies:

Strategy	Security	Performance	Complexity
SSL Termination (plaintext internal)	Low	High	Low
SSL Termination + Re-encryption	High	Medium	Medium
SSL Passthrough	High	High (but loses L7 features)	Low
mTLS (mutual TLS)	Very high	Medium	High

Recommendation:

Non-sensitive apps: SSL termination + a private VLAN/VPC is safe enough
Financial/Healthcare: SSL termination + re-encryption (or mTLS)
Zero-trust environments: mTLS everywhere (service mesh pattern)

4.4 Header Injection & X-Forwarded-For Trust

Problem: When the LB forwards a request, it adds an X-Forwarded-For header containing the client IP. But an attacker can forge this header!

# Attacker sends a request with a forged XFF:
GET /api/data HTTP/1.1
X-Forwarded-For: 10.0.0.1  ← Forged internal IP to bypass the whitelist!

# The LB appends the real IP at the end:
X-Forwarded-For: 10.0.0.1, 203.0.113.50  ← 203.0.113.50 is the real IP

If the backend app reads XFF from the left (10.0.0.1), its security check is bypassed!

Solutions:

Always read the IP from the right end (more precisely: the right-most address that does not belong to a trusted proxy)
Configure trusted proxies in Nginx:

# Only trust proxies on the internal network
set_real_ip_from 10.0.0.0/8;
set_real_ip_from 172.16.0.0/12;
set_real_ip_from 192.168.0.0/16;
real_ip_header X-Forwarded-For;
real_ip_recursive on;  # Skip trusted IPs and take the last non-trusted IP

Strip the XFF header at the outermost LB by overwriting it completely:

proxy_set_header X-Forwarded-For $remote_addr;  # Use only the real client IP

Do not rely on X-Forwarded-For for security decisions unless you control the entire proxy chain.
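
The "read from the right, skip trusted proxies" rule can be sketched with the standard `ipaddress` module. The trusted ranges mirror the Nginx example above; the function names and the fallback for an all-trusted chain are assumptions for this illustration, and malformed addresses will raise `ValueError`.

```python
import ipaddress

TRUSTED_PROXIES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_trusted(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, xff_header: str) -> str:
    """Walk the chain right to left; return the first address that is not a trusted proxy."""
    chain = [part.strip() for part in xff_header.split(",") if part.strip()]
    chain.append(remote_addr)  # the direct peer is the right-most hop
    for ip in reversed(chain):
        if not is_trusted(ip):
            return ip
    return chain[0]  # everything was trusted; fall back to the left-most entry
```

With the forged header from the example, the spoofed `10.0.0.1` on the left is never reached, because the walk stops at `203.0.113.50`.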

Important headers an LB commonly sets:

Header	Purpose
`X-Forwarded-For`	Client IP (the proxy chain)
`X-Forwarded-Proto`	Original protocol (http/https)
`X-Forwarded-Host`	Original Host header
`X-Real-IP`	Client IP (single value, Nginx specific)
`X-Request-ID`	Unique request ID for tracing
`Strict-Transport-Security`	HSTS header (force HTTPS)

4.5 Security Checklist for the Load Balancer

5. DevOps: Hands-on Configuration

5.1 Nginx Load Balancer Configuration

# /etc/nginx/nginx.conf
 
user nginx;
worker_processes auto;  # Auto-detect the number of CPU cores
worker_rlimit_nofile 65535;  # Max file descriptors
 
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
 
events {
    worker_connections 10240;  # Max connections per worker
    multi_accept on;
    use epoll;  # Linux high-performance event model
}
 
http {
    # === Basic Settings ===
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;  # Hide the Nginx version (security)
 
    # === Logging ===
    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    'rt=$request_time urt=$upstream_response_time '
                    'uct=$upstream_connect_time uht=$upstream_header_time '
                    'us=$upstream_status ua=$upstream_addr';
 
    access_log /var/log/nginx/access.log main buffer=16k flush=5s;
 
    # === Gzip Compression ===
    gzip on;
    gzip_vary on;
    gzip_min_length 1024;
    gzip_types text/plain text/css application/json application/javascript
               text/xml application/xml application/xml+rss text/javascript;
 
    # === Rate Limiting ===
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=login_limit:10m rate=1r/s;
    limit_conn_zone $binary_remote_addr zone=conn_limit:10m;
 
    # === Upstream — Backend Pool ===
    upstream backend_api {
        # Algorithm: Least Connections
        least_conn;
 
        # Backend servers with health check parameters
        server 10.0.1.10:3000 weight=5 max_fails=3 fail_timeout=30s;
        server 10.0.1.11:3000 weight=5 max_fails=3 fail_timeout=30s;
        server 10.0.1.12:3000 weight=5 max_fails=3 fail_timeout=30s;
        server 10.0.1.13:3000 weight=3 max_fails=3 fail_timeout=30s;  # Weaker server
        server 10.0.1.14:3000 backup;  # Used only when the other servers are down
 
        # Keepalive connections to backend (connection pooling)
        keepalive 32;
        keepalive_requests 1000;
        keepalive_timeout 60s;
    }
 
    upstream backend_websocket {
        # IP Hash for WebSocket (keeps a client on the same backend across reconnects)
        ip_hash;
 
        server 10.0.2.10:8080 max_fails=2 fail_timeout=10s;
        server 10.0.2.11:8080 max_fails=2 fail_timeout=10s;
        server 10.0.2.12:8080 max_fails=2 fail_timeout=10s;
    }
 
    # === HTTPS Server ===
    server {
        listen 443 ssl http2;
        server_name api.example.com;
 
        # --- SSL Configuration ---
        ssl_certificate /etc/nginx/ssl/fullchain.pem;
        ssl_certificate_key /etc/nginx/ssl/privkey.pem;
        ssl_session_timeout 1d;
        ssl_session_cache shared:SSL:50m;
        ssl_session_tickets off;
 
        # Modern TLS only
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
        ssl_prefer_server_ciphers off;
 
        # HSTS (HTTP Strict Transport Security)
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
 
        # --- Security Headers ---
        add_header X-Content-Type-Options nosniff always;
        add_header X-Frame-Options DENY always;
        add_header X-XSS-Protection "1; mode=block" always;
 
        # --- DDoS Protection ---
        client_header_timeout 10s;
        client_body_timeout 10s;
        client_max_body_size 10m;
 
        # --- API Routes ---
        location /api/ {
            limit_req zone=api_limit burst=20 nodelay;
            limit_conn conn_limit 100;
            limit_req_status 429;
 
            proxy_pass http://backend_api;
            proxy_http_version 1.1;
            proxy_set_header Connection "";  # Enable keepalive to upstream
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Request-ID $request_id;
 
            # Timeouts
            proxy_connect_timeout 5s;
            proxy_send_timeout 30s;
            proxy_read_timeout 30s;
 
            # Retry on failure (connection errors only, NOT on 5xx)
            proxy_next_upstream error timeout;
            proxy_next_upstream_tries 2;
            proxy_next_upstream_timeout 10s;
        }
 
        # --- Login Route (stricter rate limit) ---
        location /api/auth/login {
            limit_req zone=login_limit burst=5 nodelay;
            limit_req_status 429;
 
            proxy_pass http://backend_api;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
 
        # --- WebSocket ---
        location /ws/ {
            proxy_pass http://backend_websocket;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
 
            proxy_read_timeout 86400s;  # 24h timeout for WebSocket
            proxy_send_timeout 86400s;
        }
 
        # --- Health Check Endpoint (for external monitoring) ---
        location /nginx-health {
            access_log off;
            default_type text/plain;  # add_header Content-Type would emit a second Content-Type header
            return 200 "OK\n";
        }
 
        # --- Stub Status (for Prometheus exporter) ---
        location /nginx_status {
            stub_status on;
            allow 10.0.0.0/8;      # Only internal network
            allow 127.0.0.1;
            deny all;
        }
    }
 
    # === HTTP → HTTPS Redirect ===
    server {
        listen 80;
        server_name api.example.com;
        return 301 https://$server_name$request_uri;
    }
}

5.2 HAProxy Configuration

# /etc/haproxy/haproxy.cfg
 
global
    maxconn 50000
    log stdout format raw local0
    stats socket /var/run/haproxy.sock mode 660 level admin
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2
 
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    timeout http-request 10s      # Slowloris protection
    timeout http-keep-alive 10s
    timeout queue 30s
    retries 3
    errorfile 429 /etc/haproxy/errors/429.http
 
# === Stats Dashboard ===
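listen stats
    bind *:8404
    # Serve Prometheus metrics on /metrics (needs an HAProxy build with the Prometheus exporter enabled)
    http-request use-service prometheus-exporter if { path /metrics }
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST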
 
# === Frontend (SSL Termination) ===
frontend https_front
    bind *:443 ssl crt /etc/haproxy/ssl/example.com.pem alpn h2,http/1.1
    bind *:80
 
    # Redirect HTTP → HTTPS
    http-request redirect scheme https unless { ssl_fc }
 
    # Security headers
    http-response set-header Strict-Transport-Security "max-age=63072000; includeSubDomains"
    http-response set-header X-Content-Type-Options nosniff
    http-response set-header X-Frame-Options DENY
 
    # Rate limiting (stick table)
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
 
    # Request ID
    unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
    unique-id-header X-Request-ID
 
    # Content-based routing
    acl is_api path_beg /api/
    acl is_websocket hdr(Upgrade) -i websocket
    acl is_static path_beg /static/ /assets/ /images/
 
    use_backend backend_websocket if is_websocket
    use_backend backend_static if is_static
    default_backend backend_api
 
# === Backend — API Servers ===
backend backend_api
    balance leastconn
    option httpchk GET /health
    http-check expect status 200
 
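    # Redispatch: if the chosen server cannot be reached, retry on another server.
    # This is not connection draining; to drain, use the stats socket:
    #   set server backend_api/api1 state drain
    option redispatch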
 
    # Servers
    server api1 10.0.1.10:3000 check inter 5s fall 3 rise 2 weight 5 maxconn 1000
    server api2 10.0.1.11:3000 check inter 5s fall 3 rise 2 weight 5 maxconn 1000
    server api3 10.0.1.12:3000 check inter 5s fall 3 rise 2 weight 5 maxconn 1000
    server api4 10.0.1.13:3000 check inter 5s fall 3 rise 2 weight 3 maxconn 800
    server api5 10.0.1.14:3000 check inter 5s fall 3 rise 2 backup
 
# === Backend — WebSocket Servers ===
backend backend_websocket
    balance source  # IP-based affinity for WebSocket
    option httpchk GET /health
    timeout tunnel 3600s  # 1 hour WebSocket timeout
 
    server ws1 10.0.2.10:8080 check inter 5s fall 2 rise 2
    server ws2 10.0.2.11:8080 check inter 5s fall 2 rise 2
    server ws3 10.0.2.12:8080 check inter 5s fall 2 rise 2
 
# === Backend — Static Files ===
backend backend_static
    balance roundrobin
    option httpchk HEAD /health
 
    server static1 10.0.3.10:80 check inter 10s
    server static2 10.0.3.11:80 check inter 10s

5.3 AWS ALB/NLB Setup (Terraform)

# terraform/load-balancer.tf
 
# === Application Load Balancer (L7) ===
resource "aws_lb" "app_alb" {
  name               = "app-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = var.public_subnet_ids
 
  enable_deletion_protection = true
  enable_http2               = true
  idle_timeout               = 60
 
  access_logs {
    bucket  = aws_s3_bucket.alb_logs.id
    prefix  = "alb-logs"
    enabled = true
  }
 
  tags = {
    Environment = "production"
    Component   = "load-balancer"
  }
}
 
# HTTPS Listener
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app_alb.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn
 
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }
}
 
# HTTP → HTTPS Redirect
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.app_alb.arn
  port              = 80
  protocol          = "HTTP"
 
  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
 
# Path-based routing rules
resource "aws_lb_listener_rule" "api_rule" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100
 
  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }
 
  condition {
    path_pattern {
      values = ["/api/*"]
    }
  }
}
 
# Target Group with health check
resource "aws_lb_target_group" "api" {
  name                 = "api-targets"
  port                 = 3000
  protocol             = "HTTP"
  vpc_id               = var.vpc_id
  target_type          = "instance"
  deregistration_delay = 60  # Connection draining: 60 seconds
 
  health_check {
    enabled             = true
    interval            = 15
    path                = "/health"
    port                = "traffic-port"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    matcher             = "200"
  }
 
  stickiness {
    type            = "lb_cookie"
    cookie_duration = 86400  # 1 day
    enabled         = false  # Disabled — stateless servers + Redis session
  }
}
 
# === Network Load Balancer (L4) — for gRPC/TCP services ===
resource "aws_lb" "grpc_nlb" {
  name               = "grpc-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
 
  enable_cross_zone_load_balancing = true
 
  tags = {
    Environment = "production"
    Component   = "grpc-lb"
  }
}
 
resource "aws_lb_target_group" "grpc" {
  name        = "grpc-targets"
  port        = 50051
  protocol    = "TCP"
  vpc_id      = var.vpc_id
  target_type = "instance"
 
  health_check {
    enabled             = true
    interval            = 10
    port                = 50051
    protocol            = "TCP"
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}
 
# === Security Group for ALB ===
resource "aws_security_group" "alb_sg" {
  name_prefix = "alb-sg-"
  vpc_id      = var.vpc_id
 
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS from internet"
  }
 
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTP (redirect to HTTPS)"
  }
 
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "All outbound"
  }
}

5.4 Prometheus Metrics for the Load Balancer

# prometheus/prometheus.yml
 
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Load the alert rules below (path is relative to this config file)
rule_files:
  - 'alerts/*.yml'
 
scrape_configs:
  # Nginx metrics (via nginx-prometheus-exporter)
  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-exporter:9113']
    metrics_path: /metrics
 
  # HAProxy metrics (built-in Prometheus endpoint)
  - job_name: 'haproxy'
    static_configs:
      - targets: ['haproxy:8404']
    metrics_path: /metrics
 
  # Backend app metrics
  - job_name: 'backend-api'
    static_configs:
      - targets:
          - '10.0.1.10:9090'
          - '10.0.1.11:9090'
          - '10.0.1.12:9090'
          - '10.0.1.13:9090'

# prometheus/alerts/load-balancer-alerts.yml
 
groups:
  - name: load_balancer_alerts
    rules:
      # Alert: High error rate on backend
      - alert: HighBackendErrorRate
        expr: |
          sum(rate(haproxy_server_http_responses_total{code="5xx"}[5m]))
          /
          sum(rate(haproxy_server_http_responses_total[5m]))
          > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Backend error rate > 1% (current: {{ $value | humanizePercentage }})"
 
      # Alert: Backend server down
      - alert: BackendServerDown
        expr: haproxy_server_status == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Backend server {{ $labels.server }} is DOWN"
 
      # Alert: High connection count on LB
      - alert: HighLBConnections
        expr: haproxy_frontend_current_sessions > 8000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LB connections high: {{ $value }} (threshold: 8000)"
 
      # Alert: High response time
      # HAProxy exports a rolling average (a gauge), not a histogram, so
      # histogram_quantile() cannot produce a P99 here; alert on the average instead.
      - alert: HighResponseTime
        expr: haproxy_server_response_time_average_seconds > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average response time > 2s on {{ $labels.server }}"
 
      # Alert: SSL certificate expiring soon
      - alert: SSLCertExpiringSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expires in {{ $value | humanize }} days"
 
      # Alert: Unhealthy host count
      - alert: TooManyUnhealthyHosts
        expr: |
          count(haproxy_server_status == 0) by (proxy)
          /
          count(haproxy_server_status) by (proxy)
          > 0.3
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "> 30% of backend servers unhealthy in pool {{ $labels.proxy }}"

5.5 Blue-Green Deployment with Load Balancer

Blue-Green Deployment: maintain two identical production environments (Blue = current, Green = new version). Once Green is deployed and passes its health checks, switch the LB traffic from Blue to Green.

Nginx implementation:

# /etc/nginx/conf.d/upstream.conf
 
# Blue environment (current production)
upstream backend_blue {
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
    server 10.0.1.12:3000;
}
 
# Green environment (new version)
upstream backend_green {
    server 10.0.2.10:3000;
    server 10.0.2.11:3000;
    server 10.0.2.12:3000;
}
 
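# Active environment: edit this file to switch
# File: /etc/nginx/conf.d/active_backend.conf
# Contents: set $active_backend "backend_blue";
# Change to: set $active_backend "backend_green"; then run nginx -s reload
# Note: `set` is only valid in server/location context, so include this file
# inside the server block and route with: proxy_pass http://$active_backend;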

Deployment script:

#!/bin/bash
# deploy-blue-green.sh
 
set -euo pipefail
 
ACTIVE_CONFIG="/etc/nginx/conf.d/active_backend.conf"
CURRENT=$(grep -oP 'backend_\w+' "$ACTIVE_CONFIG")
 
if [ "$CURRENT" == "backend_blue" ]; then
    NEW="backend_green"
else
    NEW="backend_blue"
fi
 
echo "Current: $CURRENT → Switching to: $NEW"
 
# Step 1: Deploy new version to inactive environment
echo "Deploying to $NEW environment..."
# ... deploy commands (docker pull, restart containers, etc.) ...
 
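# Step 2: Wait for health checks to pass
echo "Waiting for health checks..."
HEALTHY=false
for i in {1..30}; do
    if curl -sf "http://${NEW}-endpoint/health" > /dev/null; then
        echo "Health check passed!"
        HEALTHY=true
        break
    fi
    echo "Attempt $i/30: waiting..."
    sleep 2
done

# Abort instead of switching traffic to an environment that never became healthy
if [ "$HEALTHY" != "true" ]; then
    echo "Health check failed after 30 attempts. Keeping traffic on $CURRENT." >&2
    exit 1
fi

# Step 3: Switch traffic
echo "Switching traffic to $NEW..."
echo "set \$active_backend \"$NEW\";" > "$ACTIVE_CONFIG"
# Validate first; on failure restore the previous config so a later reload stays safe
if ! nginx -t; then
    echo "set \$active_backend \"$CURRENT\";" > "$ACTIVE_CONFIG"
    echo "nginx config test failed. Reverted to $CURRENT." >&2
    exit 1
fi
nginx -s reload

echo "Traffic switched to $NEW!"
echo "Monitor for 5 minutes. Rollback: switch back to $CURRENT"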
 
# Step 4: Optional canary (10% of clients to new, 90% to old)
# Use the split_clients module, keyed on the client address so that each
# client consistently lands in the same pool:
# split_clients "${remote_addr}" $backend_pool {
#     10% backend_green;
#     *   backend_blue;
# }

AWS ALB Blue-Green (using weighted target groups):

# Canary: 10% traffic to green
resource "aws_lb_listener_rule" "weighted" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 50
 
  action {
    type = "forward"
    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 90
      }
      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 10
      }
      stickiness {
        enabled  = true
        duration = 600
      }
    }
  }
 
  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}
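
To make the 90/10 weighting concrete, here is a minimal JavaScript sketch of how a weighted, sticky split can be computed. It does not describe how ALB works internally; `pickTargetGroup`, the group names, and the hashing scheme are assumptions chosen for illustration. A client key is hashed into a stable bucket, so the same client keeps landing on the same side, similar in spirit to the `stickiness` block above.

```javascript
const crypto = require('crypto');

// Hypothetical weights mirroring the Terraform rule above (blue 90, green 10).
const TARGET_GROUPS = [
  { name: 'blue', weight: 90 },
  { name: 'green', weight: 10 },
];

// Walk the cumulative weights until the client's bucket falls inside a group's range.
function pickTargetGroup(clientKey, groups = TARGET_GROUPS) {
  const total = groups.reduce((sum, g) => sum + g.weight, 0);
  // Hash the key (e.g. an IP or session ID) to a stable bucket in [0, total).
  const digest = crypto.createHash('sha256').update(clientKey).digest();
  const bucket = digest.readUInt32BE(0) % total;
  let cumulative = 0;
  for (const group of groups) {
    cumulative += group.weight;
    if (bucket < cumulative) return group.name;
  }
  return groups[groups.length - 1].name; // not reached when all weights are positive
}

// Rough check of the split over 10,000 synthetic client keys.
const counts = { blue: 0, green: 0 };
for (let i = 0; i < 10000; i++) counts[pickTargetGroup(`client-${i}`)] += 1;
console.log(counts);
```

Raising the green weight step by step (10, 25, 50, 100) turns the canary into a full blue-green switch without changing the selection logic.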

6. Code — Nginx + Docker Compose Full Stack

6.1 Project Structure

load-balancer-demo/
├── docker-compose.yml
├── nginx/
│   ├── nginx.conf
│   ├── ssl/
│   │   ├── self-signed.crt    # Dev only; production uses Let's Encrypt
│   │   └── self-signed.key
│   └── conf.d/
│       └── default.conf
├── app/
│   ├── Dockerfile
│   ├── package.json
│   └── server.js
└── prometheus/
    ├── prometheus.yml
    └── alerts/
        └── lb-alerts.yml

6.2 Docker Compose — Multiple App Instances Behind Nginx

# docker-compose.yml
 
version: "3.9"
 
services:
  # === Load Balancer ===
  nginx:
    image: nginx:1.25-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      app1:
        condition: service_healthy
      app2:
        condition: service_healthy
      app3:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - frontend
      - backend
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 256M
 
  # === App Instances ===
  app1:
    build: ./app
    environment:
      - INSTANCE_ID=app1
      - PORT=3000
      - NODE_ENV=production
      - REDIS_URL=redis://redis:6379
    expose:
      - "3000"
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s
    restart: unless-stopped
    networks:
      - backend
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
 
  app2:
    build: ./app
    environment:
      - INSTANCE_ID=app2
      - PORT=3000
      - NODE_ENV=production
      - REDIS_URL=redis://redis:6379
    expose:
      - "3000"
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s
    restart: unless-stopped
    networks:
      - backend
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
 
  app3:
    build: ./app
    environment:
      - INSTANCE_ID=app3
      - PORT=3000
      - NODE_ENV=production
      - REDIS_URL=redis://redis:6379
    expose:
      - "3000"
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s
    restart: unless-stopped
    networks:
      - backend
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
 
  # === Shared Session Store ===
  redis:
    image: redis:7-alpine
    expose:
      - "6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    restart: unless-stopped
    networks:
      - backend
 
  # === Monitoring ===
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter:1.0
    command:
      # Exporter 1.x parses flags with a double dash
      - '--nginx.scrape-uri=http://nginx:8080/nginx_status'
    expose:
      - "9113"
    depends_on:
      - nginx
    networks:
      - backend
 
  prometheus:
    image: prom/prometheus:v2.48.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts:/etc/prometheus/alerts:ro
      - prometheus_data:/prometheus
    networks:
      - frontend  # Published port 9090 is not reachable through an internal-only network
      - backend
 
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # Backend network not accessible from host
 
volumes:
  redis_data:
  prometheus_data:

6.3 Nginx Config for Docker Compose

# nginx/nginx.conf
 
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
 
events {
    worker_connections 2048;
    multi_accept on;
}
 
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    server_tokens off;
 
    log_format json_combined escape=json
        '{'
        '"time":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"request":"$request",'
        '"status":$status,'
        '"body_bytes_sent":$body_bytes_sent,'
        '"request_time":$request_time,'
        '"upstream_addr":"$upstream_addr",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status":"$upstream_status"'
        '}';
 
    access_log /var/log/nginx/access.log json_combined;
 
    # Rate limiting
    limit_req_zone $binary_remote_addr zone=general:10m rate=20r/s;
 
    # Upstream — 3 app instances
    upstream app_backend {
        least_conn;
 
        server app1:3000 max_fails=3 fail_timeout=30s;
        server app2:3000 max_fails=3 fail_timeout=30s;
        server app3:3000 max_fails=3 fail_timeout=30s;
 
        keepalive 16;
    }
 
    # HTTPS server
    server {
        listen 443 ssl;
        http2 on;  # The "http2" parameter of "listen" is deprecated since nginx 1.25.1
        server_name localhost;
 
        ssl_certificate /etc/nginx/ssl/self-signed.crt;
        ssl_certificate_key /etc/nginx/ssl/self-signed.key;
        ssl_protocols TLSv1.2 TLSv1.3;
 
        # Security headers
        add_header X-Content-Type-Options nosniff always;
        add_header X-Frame-Options DENY always;
        add_header Strict-Transport-Security "max-age=31536000" always;
 
        # DDoS protection
        client_header_timeout 10s;
        client_body_timeout 10s;
        client_max_body_size 5m;
 
        location / {
            limit_req zone=general burst=30 nodelay;
 
            proxy_pass http://app_backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Request-ID $request_id;
 
            proxy_connect_timeout 5s;
            proxy_send_timeout 30s;
            proxy_read_timeout 30s;
 
            proxy_next_upstream error timeout http_502 http_503;
            proxy_next_upstream_tries 2;
        }
 
        location /health {
            access_log off;
            proxy_pass http://app_backend/health;
        }
    }
 
    # HTTP → HTTPS redirect
    server {
        listen 80;
        server_name localhost;
        return 301 https://$host$request_uri;
    }
 
    # Stub status for Prometheus exporter
    server {
        listen 8080;
        location /nginx_status {
            stub_status on;
            allow 172.16.0.0/12;  # Docker network
            deny all;
        }
    }
}

6.4 App Server (Node.js)

// app/server.js
 
const express = require('express');
const Redis = require('ioredis');
const crypto = require('crypto');
const os = require('os');
 
const app = express();
const PORT = process.env.PORT || 3000;
const INSTANCE_ID = process.env.INSTANCE_ID || 'unknown';
 
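// Redis connection (shared session store, so no sticky sessions are needed)
const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379', {
  // Reconnect with a capped backoff (retryDelayOnFailover only applies to Redis Cluster)
  retryStrategy: (times) => Math.min(times * 100, 2000),
  maxRetriesPerRequest: 3,
  lazyConnect: true,
});

// Track readiness: 'ready' fires once the connection can accept commands
let redisConnected = false;
redis.on('ready', () => { redisConnected = true; });
redis.on('close', () => { redisConnected = false; });
redis.on('error', () => { redisConnected = false; });
redis.connect().catch(() => {});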
 
app.use(express.json());
 
// Middleware: Add instance info to response headers (useful for debugging LB)
app.use((req, res, next) => {
  res.setHeader('X-Served-By', INSTANCE_ID);
  res.setHeader('X-Request-ID', req.headers['x-request-id'] || crypto.randomUUID());
  next();
});
 
// Health check endpoint — deep check
app.get('/health', async (req, res) => {
  const checks = {
    instance: INSTANCE_ID,
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    redis: 'unknown',
  };
 
  try {
    if (redisConnected) {
      await redis.ping();
      checks.redis = 'ok';
    } else {
      checks.redis = 'disconnected';
    }
  } catch (err) {
    checks.redis = `error: ${err.message}`;
  }
 
  const isHealthy = checks.redis === 'ok';
  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'healthy' : 'degraded',
    ...checks,
  });
});
 
// Demo endpoint: shows which instance handles the request
app.get('/api/info', (req, res) => {
  res.json({
    instance: INSTANCE_ID,
    hostname: os.hostname(),
    timestamp: new Date().toISOString(),
    headers: {
      'x-real-ip': req.headers['x-real-ip'],
      'x-forwarded-for': req.headers['x-forwarded-for'],
      'x-forwarded-proto': req.headers['x-forwarded-proto'],
      'x-request-id': req.headers['x-request-id'],
    },
  });
});
 
// Demo endpoint: session via Redis (no sticky session needed)
app.post('/api/session', async (req, res) => {
  const sessionId = crypto.randomUUID();
  const sessionData = {
    userId: (req.body && req.body.userId) || 'anonymous',
    createdAt: new Date().toISOString(),
    createdBy: INSTANCE_ID,  // Track which instance created it
  };

  try {
    await redis.setex(`session:${sessionId}`, 3600, JSON.stringify(sessionData));
  } catch (err) {
    // Express 4 does not catch rejections from async handlers, so handle them here
    return res.status(503).json({ error: 'Session store unavailable' });
  }

  res.json({ sessionId, ...sessionData });
});

app.get('/api/session/:id', async (req, res) => {
  let data;
  try {
    data = await redis.get(`session:${req.params.id}`);
  } catch (err) {
    return res.status(503).json({ error: 'Session store unavailable' });
  }
  if (!data) {
    return res.status(404).json({ error: 'Session not found' });
  }

  const session = JSON.parse(data);
  res.json({
    ...session,
    servedBy: INSTANCE_ID,  // This can be different from createdBy!
    message: session.createdBy !== INSTANCE_ID
      ? `Created by ${session.createdBy}, served by ${INSTANCE_ID}: LB is working!`
      : `Created and served by same instance ${INSTANCE_ID}`,
  });
});
 
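// Simulate slow endpoint (for testing least_conn algorithm)
app.get('/api/slow', async (req, res) => {
  // Clamp the requested delay to 0..10000 ms and report the value actually used
  const requested = parseInt(req.query.delay, 10);
  const delay = Math.min(Math.max(Number.isNaN(requested) ? 2000 : requested, 0), 10000);
  await new Promise(resolve => setTimeout(resolve, delay));
  res.json({ instance: INSTANCE_ID, delayed: delay });
});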
 
// Graceful shutdown (connection draining support)
process.on('SIGTERM', () => {
  console.log(`[${INSTANCE_ID}] SIGTERM received. Starting graceful shutdown...`);
  // Stop accepting new connections
  server.close(() => {
    console.log(`[${INSTANCE_ID}] All connections closed. Exiting.`);
    redis.disconnect();
    process.exit(0);
  });
  // Force exit after 30s
  setTimeout(() => {
    console.error(`[${INSTANCE_ID}] Forced shutdown after timeout.`);
    process.exit(1);
  }, 30000);
});
 
const server = app.listen(PORT, () => {
  console.log(`[${INSTANCE_ID}] Server running on port ${PORT}`);
});

6.5 Dockerfile

# app/Dockerfile
 
FROM node:20-alpine
 
WORKDIR /app
 
# npm ci requires package-lock.json, so the lock file must be committed
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
 
COPY server.js .
 
# Non-root user (security)
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser
USER appuser
 
EXPOSE 3000
 
# Graceful shutdown: use exec form to receive SIGTERM
CMD ["node", "server.js"]

6.6 Test Load Balancing

# Test: send 10 requests and print which instance served each one (also visible in the X-Served-By response header)
for i in {1..10}; do
  echo "Request $i: $(curl -sk https://localhost/api/info | jq -r .instance)"
done
 
# Typical output (with equal connection counts, least_conn falls back to round robin):
# Request 1: app1
# Request 2: app2
# Request 3: app3
# Request 4: app1
# Request 5: app2
# ...
 
# Test: create a session on one instance, read it from another (proves stateless servers + Redis session)
SESSION_ID=$(curl -sk -X POST https://localhost/api/session \
  -H "Content-Type: application/json" \
  -d '{"userId": "hieu"}' | jq -r .sessionId)
 
echo "Session ID: $SESSION_ID"
 
# Read it several times; the requests will be served by different instances
for i in {1..5}; do
  curl -sk "https://localhost/api/session/$SESSION_ID" | jq '{servedBy: .servedBy, message: .message}'
done
 
# Test: slow endpoint. least_conn avoids sending new requests to servers that are busy
curl -sk "https://localhost/api/slow?delay=5000" &
curl -sk "https://localhost/api/slow?delay=5000" &
sleep 1  # Give the two slow requests time to reach their servers
curl -sk "https://localhost/api/info" | jq .instance  # Should land on the idle server

7. Mermaid Diagrams

7.1 L4 vs L7 Load Balancing Comparison

flowchart TB
    subgraph "Layer 4 Load Balancing"
        C1[Client] -->|TCP SYN| L4[L4 Load Balancer<br/>Sees only: IP + Port]
        L4 -->|Forward TCP packets| S1A[Server A<br/>10.0.1.1:3000]
        L4 -->|Forward TCP packets| S1B[Server B<br/>10.0.1.2:3000]
        L4 -->|Forward TCP packets| S1C[Server C<br/>10.0.1.3:3000]

        Note1["✅ Ultra-fast (μs overhead)<br/>✅ Protocol-agnostic<br/>❌ No content routing<br/>❌ No SSL termination<br/>📦 AWS NLB, LVS, HAProxy TCP"]
    end

    subgraph "Layer 7 Load Balancing"
        C2[Client] -->|HTTPS Request<br/>GET /api/users| L7[L7 Load Balancer<br/>Sees: URL, Headers, Cookies, Body]
        L7 -->|"/api/*"| S2A[API Server A]
        L7 -->|"/api/*"| S2B[API Server B]
        L7 -->|"/static/*"| S2C[Static Server]
        L7 -->|"/ws/*"| S2D[WebSocket Server]

        Note2["✅ Content-based routing<br/>✅ SSL termination<br/>✅ Request manipulation<br/>❌ Slower (ms overhead)<br/>📦 AWS ALB, Nginx, Envoy"]
    end

    style L4 fill:#4fc3f7,stroke:#0277bd,stroke-width:2px,color:#000
    style L7 fill:#81c784,stroke:#2e7d32,stroke-width:2px,color:#000
    style Note1 fill:#e1f5fe,stroke:#0277bd,color:#000
    style Note2 fill:#e8f5e9,stroke:#2e7d32,color:#000

7.2 Full Load Balancer Architecture

flowchart TD
    Users[/"👤 Users (Global)"\]

    subgraph "Tier 1: Global Load Balancing (DNS)"
        GSLB["GSLB / DNS<br/>(Route 53 / Cloudflare)"]
        Users --> GSLB
        GSLB -->|"Users in Asia"| DC_ASIA["🌏 Asia DC"]
        GSLB -->|"Users in the US"| DC_US["🌎 US DC"]
    end

    subgraph "Tier 2: Edge Protection"
        DC_ASIA --> CDN["CDN + WAF<br/>(CloudFront + AWS WAF)"]
        CDN -->|"Static content"| CACHE_EDGE["Edge Cache<br/>Cache Hit → Return"]
        CDN -->|"Dynamic content"| L4LB
    end

    subgraph "Tier 3: L4 Load Balancer"
        L4LB["L4 LB (NLB)<br/>TCP/UDP routing<br/>DDoS protection"]
        L4LB --> L7LB1["L7 LB #1<br/>(Nginx/ALB)"]
        L4LB --> L7LB2["L7 LB #2<br/>(Nginx/ALB)"]
    end

    subgraph "Tier 4: L7 Load Balancer"
        L7LB1 & L7LB2 -->|"/api/*"| API_POOL
        L7LB1 & L7LB2 -->|"/ws/*"| WS_POOL
        L7LB1 & L7LB2 -->|"/admin/*"| ADMIN_POOL
    end

    subgraph "Tier 5: Backend Server Pools"
        API_POOL["API Pool<br/>(least_conn)"]
        API_POOL --> API1["API Server 1"]
        API_POOL --> API2["API Server 2"]
        API_POOL --> API3["API Server 3"]
        API_POOL --> APIN["API Server N"]

        WS_POOL["WebSocket Pool<br/>(ip_hash)"]
        WS_POOL --> WS1["WS Server 1"]
        WS_POOL --> WS2["WS Server 2"]

        ADMIN_POOL["Admin Pool<br/>(round_robin)"]
        ADMIN_POOL --> ADM1["Admin Server 1"]
    end

    subgraph "Shared State"
        API1 & API2 & API3 & APIN --> REDIS[("Redis Cluster<br/>Sessions + Cache")]
        API1 & API2 & API3 & APIN --> DB[("Database<br/>Primary + Replicas")]
    end

    subgraph "Monitoring"
        L7LB1 & L7LB2 -.->|metrics| PROM["Prometheus"]
        API1 & API2 & API3 -.->|metrics| PROM
        PROM --> GRAFANA["Grafana Dashboard"]
        PROM --> ALERT["AlertManager<br/>→ Slack/PagerDuty"]
    end

    style GSLB fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#000
    style L4LB fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000
    style L7LB1 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
    style L7LB2 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
    style REDIS fill:#ffcdd2,stroke:#c62828,stroke-width:2px,color:#000
    style DB fill:#ffcdd2,stroke:#c62828,stroke-width:2px,color:#000

7.3 Health Check Flow

sequenceDiagram
    participant LB as Load Balancer
    participant S1 as Server A (healthy)
    participant S2 as Server B (failing)
    participant S3 as Server C (healthy)

    Note over LB: Active Health Check (every 5s)

    loop Every 5 seconds
        LB->>S1: GET /health
        S1-->>LB: 200 OK ✅
        LB->>S2: GET /health
        S2-->>LB: 503 Error ❌ (fail 1/3)
        LB->>S3: GET /health
        S3-->>LB: 200 OK ✅
    end

    Note over LB,S2: Server B keeps failing (threshold: 3 consecutive failures)

    LB->>S2: GET /health
    S2-->>LB: 503 Error ❌ (fail 2/3)
    LB->>S2: GET /health
    S2-->>LB: timeout ❌ (fail 3/3)

    Note over LB: Server B marked UNHEALTHY<br/>Removed from pool

    rect rgb(255, 230, 230)
        Note over LB: Traffic only to A and C
        LB->>S1: Client Request
        S1-->>LB: Response
        LB->>S3: Client Request
        S3-->>LB: Response
    end

    Note over S2: Server B recovers...

    LB->>S2: GET /health
    S2-->>LB: 200 OK ✅ (pass 1/2)
    LB->>S2: GET /health
    S2-->>LB: 200 OK ✅ (pass 2/2)

    Note over LB: Server B marked HEALTHY<br/>Added back to pool

    rect rgb(230, 255, 230)
        Note over LB: Traffic to A, B, and C
        LB->>S1: Client Request
        LB->>S2: Client Request
        LB->>S3: Client Request
    end
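
The fall/rise counting in the diagram above can be sketched as a small state machine. This is a minimal illustration, not the implementation of any particular load balancer; the `HealthTracker` class and its field names are hypothetical, and the thresholds (fall 3, rise 2) are taken from the diagram.

```javascript
// fall = consecutive failures before marking a server DOWN
// rise = consecutive passes before marking it UP again
class HealthTracker {
  constructor({ fall = 3, rise = 2 } = {}) {
    this.fall = fall;
    this.rise = rise;
    this.healthy = true; // servers start in the pool
    this.failures = 0;   // consecutive failed checks while healthy
    this.passes = 0;     // consecutive passed checks while unhealthy
  }

  // Record one health check result and return the resulting state.
  record(ok) {
    if (this.healthy) {
      this.failures = ok ? 0 : this.failures + 1;
      if (this.failures >= this.fall) {
        this.healthy = false;
        this.passes = 0;
      }
    } else {
      this.passes = ok ? this.passes + 1 : 0;
      if (this.passes >= this.rise) {
        this.healthy = true;
        this.failures = 0;
      }
    }
    return this.healthy;
  }
}

// Server B from the diagram: three failures, then two successful checks.
const serverB = new HealthTracker({ fall: 3, rise: 2 });
const results = [false, false, false, true, true].map((ok) => serverB.record(ok));
console.log(results); // [ true, true, false, false, true ]
```

Requiring consecutive results in both directions keeps a single dropped probe from ejecting a server, and keeps a flapping server from re-entering the pool too early.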

7.4 Connection Draining Flow

sequenceDiagram
    participant Ops as DevOps Engineer
    participant LB as Load Balancer
    participant S as Server (draining)
    participant C1 as Existing Client
    participant C2 as New Client

    Ops->>LB: Mark Server as "draining"

    Note over LB,S: Phase 1: Stop new requests

    C2->>LB: New Request
    LB-->>C2: Route to other server (NOT this one)

    Note over LB,S: Phase 2: Complete in-flight requests

    C1->>S: Continue existing request...
    S-->>C1: Response (completed)

    C1->>S: Another in-flight request...
    S-->>C1: Response (completed)

    Note over LB,S: Phase 3: All connections closed

    LB->>S: Remove from pool
    Ops->>S: Safe to shutdown/update

    Note over S: Deploy new version, restart

    S->>LB: Register back (healthy)
    LB->>S: Health check → 200 OK
    Note over LB,S: Server back in pool!
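
The three draining phases can be sketched with a small in-memory pool. This is an illustration under simplifying assumptions: the `ServerPool` class and its method names are hypothetical, and a real load balancer would also enforce a drain timeout rather than wait indefinitely.

```javascript
class ServerPool {
  constructor(names) {
    // name -> { draining, inFlight }
    this.servers = new Map(names.map((n) => [n, { draining: false, inFlight: 0 }]));
  }

  // Phase 1: new requests skip draining servers (fewest in-flight wins).
  acquire() {
    let best = null;
    for (const [name, s] of this.servers) {
      if (s.draining) continue;
      if (best === null || s.inFlight < this.servers.get(best).inFlight) best = name;
    }
    if (best === null) throw new Error('no server available');
    this.servers.get(best).inFlight += 1;
    return best;
  }

  // Phase 2: in-flight requests on a draining server are allowed to finish.
  release(name) {
    const s = this.servers.get(name);
    if (s.inFlight > 0) s.inFlight -= 1;
  }

  drain(name) {
    this.servers.get(name).draining = true;
  }

  // Phase 3: the server is safe to remove once nothing is in flight.
  isDrained(name) {
    const s = this.servers.get(name);
    return s.draining && s.inFlight === 0;
  }
}

const pool = new ServerPool(['a', 'b']);
const first = pool.acquire();                 // lands on 'a'
pool.drain(first);                            // operator marks 'a' as draining
const second = pool.acquire();                // new request goes to 'b'
const drainedBefore = pool.isDrained(first);  // false: one request still in flight
pool.release(first);                          // the in-flight request completes
const drainedAfter = pool.isDrained(first);   // true: safe to shut down
console.log({ first, second, drainedBefore, drainedAfter });
```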

8. Aha Moments & Pitfalls

Aha Moments

#1 - LB as Single Point of Failure: “What if the Load Balancer itself dies?” Expect nearly every interviewer to ask this. Answer: run an HA pair (2 LB instances). When the active LB dies, the standby takes over within a few seconds via Keepalived and a Virtual IP (VRRP protocol). Alternatively, use a cloud LB (AWS ALB/NLB), where AWS manages HA for you.

#2 - Load balancing is not a single tier: Large production systems usually have 3-4 LB tiers: DNS (GSLB) → L4 LB (DDoS protection) → L7 LB (routing/SSL) → Service Mesh sidecar (inter-service). Each tier solves a different problem.

#3 - Least Connections beats Round Robin: In practice, request processing times vary enormously (1ms vs 5s). Round Robin evens out the number of requests per server, not the load. Least Connections does better because it balances on in-flight connections, a proxy for actual load, rather than on request count.
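
A small deterministic simulation makes the difference visible. This is a toy model under stated assumptions (one request per tick, two servers, alternating 10-tick and 1-tick requests); `makeRoundRobin`, `pickLeastConnections`, and `simulate` are illustrative names, not part of any load balancer API.

```javascript
// Round robin: rotate through the servers regardless of their current load.
function makeRoundRobin() {
  let next = 0;
  return (servers) => servers[next++ % servers.length];
}

// Least connections: pick the server with the fewest in-flight requests
// (ties go to the earliest server in the list).
function pickLeastConnections(servers) {
  return servers.reduce((best, s) => (s.active < best.active ? s : best));
}

// One request arrives per tick and holds its connection for `duration` ticks.
// Returns the highest number of concurrent requests seen on any single server.
function simulate(pick, durations, serverCount) {
  const servers = Array.from({ length: serverCount }, (_, id) => ({ id, active: 0, ends: [] }));
  let peak = 0;
  durations.forEach((duration, now) => {
    for (const s of servers) {
      s.ends = s.ends.filter((end) => end > now); // drop finished requests
      s.active = s.ends.length;
    }
    const target = pick(servers);
    target.ends.push(now + duration);
    target.active += 1;
    peak = Math.max(peak, target.active);
  });
  return peak;
}

// Alternating slow (10 ticks) and fast (1 tick) requests across 2 servers.
const durations = Array.from({ length: 20 }, (_, i) => (i % 2 === 0 ? 10 : 1));
const rrPeak = simulate(makeRoundRobin(), durations, 2);
const lcPeak = simulate(pickLeastConnections, durations, 2);
console.log({ rrPeak, lcPeak });
```

With this arrival pattern Round Robin sends every slow request to the same server, so its peak concurrency is higher than under Least Connections even though both servers receive the same number of requests.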

#4: Health check interval is a trade-off. A short interval (1s) detects failures quickly but costs bandwidth and adds load on the backends. A long interval (30s) is cheaper, but users can be affected for 30 seconds or more before the server is removed, since removal typically waits for several consecutive failed checks. In practice, 5-10 seconds is the sweet spot.

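#5: SSL termination saves a surprising amount. A full SSL/TLS handshake costs ~2-5 ms of CPU time. At 50,000 QPS, if every backend server has to handle SSL itself, each server spends ~100-250 ms of CPU per second on SSL alone (that figure corresponds to about 50 new handshakes per second per server at 2-5 ms each; requests on reused connections skip the handshake). Terminating at the LB cuts backend CPU usage by roughly 15-30%.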
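
To make Aha #3 concrete, here is a minimal simulation sketch. The workload is hypothetical (slow 5000 ms requests alternating with fast 1 ms ones, one arrival every 10 ms, two servers), and `simulate`, `roundRobin`, and `leastConnections` are illustrative names rather than any LB's real implementation.

```javascript
// Assign each request to a server and return the total work (ms) per server.
// Each request is { arrival, duration } in milliseconds.
function simulate(requests, serverCount, pick) {
  const servers = Array.from({ length: serverCount }, () => ({
    finishTimes: [],
    work: 0,
  }));
  requests.forEach((req, i) => {
    // Active connections = requests on a server that have not finished yet.
    const active = servers.map(
      (s) => s.finishTimes.filter((t) => t > req.arrival).length
    );
    const idx = pick(i, active);
    servers[idx].finishTimes.push(req.arrival + req.duration);
    servers[idx].work += req.duration;
  });
  return servers.map((s) => s.work);
}

// Round Robin ignores load and just cycles through the servers.
const roundRobin = (i, active) => i % active.length;

// Least Connections picks the server with the fewest active connections
// (ties go to the lowest index).
const leastConnections = (i, active) => active.indexOf(Math.min(...active));

// Hypothetical workload: slow and fast requests alternate.
const requests = Array.from({ length: 8 }, (_, i) => ({
  arrival: i * 10,
  duration: i % 2 === 0 ? 5000 : 1,
}));

const rrWork = simulate(requests, 2, roundRobin);
const lcWork = simulate(requests, 2, leastConnections);
console.log("round robin work per server (ms):", rrWork);
console.log("least connections work per server (ms):", lcWork);
```

With this particular workload, Round Robin sends every slow request to the same server, while Least Connections splits the slow requests evenly. Real traffic is less adversarial, so the gap is usually smaller than in this sketch.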

Pitfalls (Common traps)

Pitfall 1: Load Balancer as a Single Point of Failure

Wrong: Deploy 1 Nginx instance as the LB with no backup. Right: Always deploy an HA pair (active-passive or active-active). Use Keepalived on-premise, or a managed LB (AWS ALB) in the cloud. If the LB dies, the whole system dies, so it must be the component with the highest availability.
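
The active-passive behavior can be sketched as a priority election. This is a simplified model, not the Keepalived or VRRP implementation: `electVipOwner`, the node names, and the priorities are hypothetical, and the model assumes preemption (a recovered higher-priority node takes the virtual IP back).

```javascript
// Simplified model of active-passive failover; not the Keepalived API.
// The highest-priority healthy node holds the virtual IP (VIP).
function electVipOwner(nodes) {
  const healthy = nodes.filter((n) => n.healthy);
  if (healthy.length === 0) return null; // total outage: nobody holds the VIP
  return healthy.reduce((best, n) => (n.priority > best.priority ? n : best))
    .name;
}

const lbNodes = [
  { name: "lb-1", priority: 150, healthy: true }, // intended active node
  { name: "lb-2", priority: 100, healthy: true }, // standby
];

const ownerBefore = electVipOwner(lbNodes);
lbNodes[0].healthy = false; // simulate the active LB dying
const ownerAfterFailure = electVipOwner(lbNodes);
lbNodes[0].healthy = true;  // lb-1 recovers and, in this model, preempts
const ownerAfterRecovery = electVipOwner(lbNodes);
console.log(ownerBefore, ownerAfterFailure, ownerAfterRecovery);
```

Clients keep talking to the same virtual IP throughout; only the node answering for it changes.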

Pitfall 2: Sticky Sessions cause uneven load

Wrong: Enable sticky sessions "to be safe" → one server ends up with 10,000 sessions while another has 100. Right: Use an external session store (Redis). If you must use sticky sessions (legacy app), monitor sessions per server and set a max session limit per server.

Pitfall 3: Health Check False Positives

Wrong: The health check only pings the port → the server looks "alive" even though the app has crashed or its DB connections are exhausted. Right: Implement a deep health check that verifies DB, Redis, and disk space. But be careful: if the DB is slow, DO NOT let the health check fail on every server at once, or you get a cascading failure. Separate liveness (is the process alive?) from readiness (is it ready to serve?).
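
A minimal sketch of the liveness/readiness split follows. The handlers return plain `{ status, body }` objects instead of using a real HTTP framework, and `liveness`, `readiness`, and the dependency check functions are hypothetical names supplied for illustration. The checks are synchronous here; a real service would use async checks with timeouts.

```javascript
// Liveness: answers only "is the process running?".
// It deliberately does not touch any dependency.
function liveness() {
  return { status: 200, body: { alive: true } };
}

// Readiness: answers "can this instance serve traffic right now?".
// `checks` maps a dependency name to a function returning true or false
// (assumed to be provided by the application).
function readiness(checks) {
  const results = {};
  for (const [name, check] of Object.entries(checks)) {
    try {
      results[name] = check() === true;
    } catch (err) {
      results[name] = false; // a throwing check counts as failed
    }
  }
  const ready = Object.values(results).every(Boolean);
  return { status: ready ? 200 : 503, body: results };
}

const healthyDeps = { db: () => true, redis: () => true, disk: () => true };
const exhaustedDb = {
  db: () => {
    throw new Error("connection pool exhausted");
  },
  redis: () => true,
  disk: () => true,
};

console.log(liveness().status);             // process is up
console.log(readiness(healthyDeps).status); // all dependencies OK
console.log(readiness(exhaustedDb).status); // DB check fails
```

Pointing the LB at the readiness endpoint catches the "port open but app broken" case, while the liveness endpoint stays cheap enough for a supervisor to decide whether to restart the process.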

Pitfall 4: Connection Draining not enabled

Wrong: Remove a server from the pool immediately → 500 users in the middle of a transaction get a 502 error. Right: Always enable connection draining with a reasonable timeout (30-300s). The AWS ALB default is 300s, which is often too long; consider tuning it down to 30-60s.

Pitfall 5: Forgetting to test failover

Wrong: Configure the LB, deploy, and never test what happens when a server actually dies. Right: Chaos testing. Deliberately kill a server and verify: (1) How long does the LB take to detect it? (2) Is traffic affected? (3) Does server recovery happen automatically? Netflix's Chaos Monkey is the best-known tool for this kind of test.
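
For question (1), it helps to know the expected detection time before running the test. The sketch below is a rough model under stated assumptions: the server dies right after a successful probe, probes fire at a fixed interval, and each failed probe takes the full timeout. The function names and the sample configuration are hypothetical, and real load balancers may schedule probes differently.

```javascript
// Worst-case seconds until an active health check marks a dead server unhealthy.
// Model: the LB needs `unhealthyThreshold` consecutive failed probes, and the
// last one is only counted as failed once its timeout expires.
function worstCaseDetectionSeconds({ intervalSec, timeoutSec, unhealthyThreshold }) {
  return intervalSec * unhealthyThreshold + timeoutSec;
}

// Compare a detection time measured during a chaos test against that budget.
function evaluateFailoverTest(measuredSec, config) {
  const budgetSec = worstCaseDetectionSeconds(config);
  return { budgetSec, withinBudget: measuredSec <= budgetSec };
}

// Hypothetical health check settings and a hypothetical measurement.
const hcConfig = { intervalSec: 5, timeoutSec: 2, unhealthyThreshold: 3 };
const failoverResult = evaluateFailoverTest(12, hcConfig);
console.log(failoverResult);
```

If the measured detection time is far above the budget, the health check configuration on the LB is probably not what you think it is, which is exactly what a failover test is meant to reveal.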

Pitfall 6: Over-relying on LB for security

Wrong: "We have an LB, so we don't need a firewall/WAF." Right: The LB is one layer in defense-in-depth. You still need: WAF (application-level filtering), Security Groups/Firewall (network-level), Rate Limiter (Tuan-09-Rate-Limiter), and Authentication/Authorization at the backend.

Pitfall 7: Ignoring LB as bottleneck

Wrong: Add 50 backend servers while still running 1 LB → the LB becomes the bottleneck. Right: Monitor LB metrics (connections, bandwidth, CPU). When the LB reaches 70% capacity, scale it (add instances, upgrade hardware, or move to a cloud managed LB).

9. Internal Links

Topic	Link	How it relates
Scale from Zero to Millions	Tuan-01-Scale-From-Zero-To-Millions	The LB is the first step when scaling beyond a single server
Back-of-the-envelope	Tuan-02-Back-of-the-envelope	QPS estimation for sizing the LB and backends
Networking, DNS, CDN	Tuan-03-Networking-DNS-CDN	DNS-based GSLB, CDN in front of the LB
Cache Strategy	Tuan-06-Cache-Strategy	Caching reduces load → less demand on the LB
Database Sharding	Tuan-07-Database-Sharding-Replication	LB for DB connections (ProxySQL, PgBouncer)
Message Queue	Tuan-08-Message-Queue	Queue absorbs spikes → protects the backends behind the LB
Rate Limiter	Tuan-09-Rate-Limiter	Rate limiting at the LB layer
Consistent Hashing	Tuan-10-Consistent-Hashing	LB algorithm for cache/DB routing
Monitoring & Observability	Tuan-13-Monitoring-Observability	Monitor LB metrics, alert thresholds
Data Security & Encryption	Tuan-15-Data-Security-Encryption	SSL/TLS termination, mTLS

References

Alex Xu, System Design Interview — Chapter 1: Scale From Zero To Millions (Load Balancer section)
Nginx Documentation: nginx.org/en/docs
HAProxy Documentation: docs.haproxy.org
Envoy Proxy: envoyproxy.io
AWS Elastic Load Balancing: docs.aws.amazon.com/elasticloadbalancing
Maglev Paper (Google): Maglev: A Fast and Reliable Software Network Load Balancer (NSDI 2016)
The Power of Two Choices: The Power of Two Choices in Randomized Load Balancing (Mitzenmacher, 2001)
Tuan-04-API-Design — API Gateway vs Load Balancer
Tuan-10-Consistent-Hashing: the consistent hashing algorithm in detail

Next week: Tuan-06-Cache-Strategy, or why "adding a cache" is not as simple as you think
