Week 01: Scale from Zero to Millions of Users

“Nobody builds a 100-storey building from day one. They start with a single room and expand as more people move in. Software systems are the same.”

Tags: system-design scaling architecture alex-xu Prerequisite: None (this is the first week) Related: Tuan-02-Back-of-the-envelope · Tuan-03-Networking-DNS-CDN · Tuan-05-Load-Balancer · Tuan-06-Cache-Strategy · Tuan-07-Database-Sharding-Replication · Tuan-08-Message-Queue

1. Context & Why

Everyday analogy

Hieu, imagine you open a lunch rice shop in Saigon.

Stage 1: one seller, one table. You cook, serve, and take payment yourself. With few customers (10 per day) that works. This is a single server: web server, app server, and database all on one machine.

Stage 2: more customers, so you buy a bigger kitchen. Instead of hiring more people, you invest in an industrial stove and more tables and chairs. This is vertical scaling: upgrading the hardware (CPU, RAM, SSD) of the same server.

Stage 3: overloaded, so you open more branches. One shop cannot serve 500 customers a day, so you open 2 more branches, each serving its own area. This is horizontal scaling: adding more servers.

Stage 4: a call center takes the orders. Customers call the call center, which dispatches each order to whichever branch is free. This is a load balancer.

Stage 5: a central kitchen plus copied recipes. One main kitchen (master) decides the menu and updates the recipes. The satellite kitchens (slaves/replicas) only cook from recipes they already have. This is database replication.

Stage 6: a ready-made counter. Popular dishes (pork chop rice, chicken rice) are prepared in advance so they can be served immediately instead of cooked from scratch every time. This is the cache layer.

Stage 7: food carts all over the district. Instead of making customers travel to the main shop, you station carts in each neighborhood selling ready-made items. This is a CDN (Content Delivery Network).

Stage 8: a nationwide franchise. Each region runs its own independent system and does not share an ingredient warehouse with the others. This is database sharding.

The essence of scaling: start as simply as possible and add complexity only when you have a concrete, data-driven reason. That is also the philosophy running through Alex Xu's Chapter 1.

Why does Alex Xu put this in Chapter 1?

Because it is the big-picture map for the whole book. Every component introduced in this chapter gets its own chapter later that goes deeper:

Load Balancer → Tuan-05-Load-Balancer
Cache → Tuan-06-Cache-Strategy
Database Sharding → Tuan-07-Database-Sharding-Replication
Message Queue → Tuan-08-Message-Queue

Without this big picture, the later chapters feel disconnected and lack context.

2. Deep Dive: Core Concepts

2.1 Single Server Setup: The Simplest Start

Every system starts here:

User → DNS → Web Server (hosting both App + DB)

Detailed request flow:

User types example.com into the browser
Browser sends a DNS query and gets back an IP (for example 93.184.216.34)
Browser sends an HTTP request to that IP
Web server receives the request, runs the logic, queries the database (on the same machine), and returns a response

When to use a single server?

MVP / prototype
Internal tools for < 100 users
Side projects, personal blogs
QPS < 100, data < 10 GB

Limits of a single server:

Single Point of Failure (SPOF): if the server dies, the whole system dies
Resource ceiling: however powerful, one machine has finite CPU, RAM, and disk I/O
No maintenance without downtime: updating the OS or applying a security patch means taking the service down

In practice: many successful startups began on a single server. StackOverflow famously served millions of requests per day from a handful of physical servers. Don't over-engineer too early.

2.2 Moving the Database to Its Own Server: The First Step

When traffic grows, the first thing to do is separate the database from the application server:

User → Web Server (App logic) → Database Server

Why separate them:

Resource isolation: the app server needs strong CPUs (compute-heavy) while the DB server needs RAM + SSD (I/O-heavy). Sharing one machine makes them compete for resources
Independent scaling: you can upgrade the DB server without touching the app, and vice versa
Security: the DB server does not need to be exposed to the internet, so it can sit in a private subnet

Which database?

Criterion	Relational (SQL)	Non-Relational (NoSQL)
Data structure	Structured, well-defined schema	Flexible, schema-less
Relationships	Complex joins across tables	Denormalized, nested documents
ACID	Full guarantees	Varies by system (often eventual consistency)
Scale	Mostly vertical; horizontal is hard	Horizontal is easier
Examples	PostgreSQL, MySQL	MongoDB, Cassandra, DynamoDB
Use when	E-commerce, banking, CRM	Logging, real-time analytics, IoT

Rule of thumb: without a specific reason, choose SQL (PostgreSQL). Relational databases have been battle-tested for decades and fit the large majority of use cases.

2.3 Vertical Scaling vs Horizontal Scaling

Vertical Scaling (Scale Up)

Upgrade the hardware of one server: more CPU cores, more RAM, a move to NVMe SSD.

Pros	Cons
Simple, no code changes	Hard physical ceiling (even the largest machines top out at a few hundred cores and tens of TB of RAM)
No distributed-system complexity	Does not remove the SPOF
Low latency (everything on one machine)	High-end machines are expensive
Easier to debug	Downtime during upgrades

Approximate AWS EC2 prices (us-east-1, 2024):

Instance	vCPU	RAM	Price/month (on-demand)
t3.medium	2	4 GB	~$30
m5.xlarge	4	16 GB	~$140
m5.4xlarge	16	64 GB	~$560
m5.16xlarge	64	256 GB	~$2,240
m5.24xlarge	96	384 GB	~$3,360
x2idn.32xlarge	128	2,048 GB	~$13,340

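Observation: going from 4 GB RAM (~$30) to 2 TB RAM (~$13,340) multiplies the price by about 444x while RAM grows 512x, so on-demand cloud pricing is roughly linear per GB. The real limits are the hard ceiling and, more importantly, the fact that it is still a SPOF.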
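To check the per-GB claim numerically, here is a small Python sketch that reuses the approximate monthly prices from the table above (copied by hand, not live AWS pricing) and computes price per GB of RAM plus the overall multipliers. Note that x2idn is a memory-optimized family, so comparing it with m5/t3 is only a rough guide.

```python
# Approximate on-demand prices copied from the EC2 table above:
# name -> (vCPU, RAM in GB, USD per month). Not live AWS pricing.
INSTANCES = {
    "t3.medium": (2, 4, 30),
    "m5.xlarge": (4, 16, 140),
    "m5.4xlarge": (16, 64, 560),
    "m5.16xlarge": (64, 256, 2_240),
    "m5.24xlarge": (96, 384, 3_360),
    "x2idn.32xlarge": (128, 2_048, 13_340),
}


def dollars_per_gb(name: str) -> float:
    """Monthly price divided by RAM in GB."""
    _vcpu, ram_gb, price = INSTANCES[name]
    return price / ram_gb


def growth(small: str, large: str) -> tuple:
    """(price multiplier, RAM multiplier) when moving from `small` to `large`."""
    _, ram_small, price_small = INSTANCES[small]
    _, ram_large, price_large = INSTANCES[large]
    return price_large / price_small, ram_large / ram_small


if __name__ == "__main__":
    for name in INSTANCES:
        print(f"{name:>15}: ${dollars_per_gb(name):.2f} per GB-month")
    price_x, ram_x = growth("t3.medium", "x2idn.32xlarge")
    print(f"price x{price_x:.0f}, RAM x{ram_x:.0f}")
```

With these figures every instance lands between roughly $6.50 and $8.75 per GB-month, which is why the argument against vertical scaling rests on the ceiling and the SPOF rather than on per-GB cost.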

Horizontal Scaling (Scale Out)

Add more servers, each handling part of the traffic.

Pros	Cons
Practically unlimited scale	More complex (distributed system)
Built-in redundancy (if one server dies, the rest keep running)	Requires a load balancer
Roughly linear cost	Data consistency is harder
No downtime to add servers	Debugging is harder

Aha Moment: vertical scaling is a short-term fix, horizontal scaling is the long-term one. Every large-scale system scales horizontally. Google, Facebook, Netflix: none of them run on a single supercomputer.

2.4 Load Balancer: The Traffic Coordinator

A load balancer sits between users and the server pool and distributes traffic evenly across the servers.

Users → Load Balancer (public IP) → Server 1 (private IP)
                                   → Server 2 (private IP)
                                   → Server N (private IP)

Common distribution algorithms:

Algorithm	How it works	When to use
Round Robin	Each server in turn	Servers are identical and requests are similar
Weighted Round Robin	Stronger servers receive more traffic	Servers have unequal capacity
Least Connections	Send to the server with the fewest open connections	Long-lived connections (WebSocket)
IP Hash	Hash the client IP to a fixed server	Session affinity is needed (but best avoided)
Random	Pick a random server	Deterministic routing is not needed
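The algorithms in the table can be sketched as small in-process selectors. This is a toy Python illustration under simplifying assumptions (server names are placeholders, no real networking); `WeightedRoundRobin` uses the naive "repeat each server w times" expansion, whereas nginx, for example, interleaves weighted servers more smoothly.

```python
import hashlib
import itertools
import random


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class WeightedRoundRobin:
    """Naive weighting: a server with weight w appears w times per cycle."""

    def __init__(self, weights):  # weights: {server_name: positive int}
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)  # ties -> first listed
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when the request/connection on `server` finishes."""
        self.active[server] -= 1


class IPHash:
    """Same client IP -> same server, as long as the pool does not change."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]


class RandomChoice:
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip=None):
        return random.choice(self.servers)


if __name__ == "__main__":
    rr = RoundRobin(["app1", "app2", "app3"])
    print([rr.pick() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
```

Note that `IPHash` remaps most clients when a server is added or removed, which is one reason session affinity is best avoided.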

Most important benefit: removing the SPOF at the application tier

Server 1 dies → the load balancer automatically routes traffic to Server 2, 3, …
Add Server 4 → the load balancer picks it up (if service discovery is in place)
Health checks: the load balancer probes each server periodically (every 5-10 s). A server that stops responding is marked "unhealthy" and stops receiving traffic
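A minimal sketch of active health checking, assuming a caller-supplied `probe(name) -> bool` function (in production this would typically be an HTTP GET to a health endpoint with a short timeout) and an assumed rule of "3 failed probes in a row marks a backend unhealthy". Healthy backends are served round robin; a recovered backend rejoins after one successful probe.

```python
import time
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True
    consecutive_failures: int = 0


class HealthCheckedPool:
    """Round-robin over backends, skipping those marked unhealthy."""

    def __init__(self, names, probe, fail_threshold=3):
        self.backends = [Backend(n) for n in names]
        self.probe = probe
        self.fail_threshold = fail_threshold  # assumed: 3 failed probes in a row
        self._counter = 0

    def run_checks(self):
        for b in self.backends:
            if self.probe(b.name):
                b.consecutive_failures = 0
                b.healthy = True  # recovered servers rejoin the pool
            else:
                b.consecutive_failures += 1
                if b.consecutive_failures >= self.fail_threshold:
                    b.healthy = False

    def pick(self):
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends")
        chosen = healthy[self._counter % len(healthy)]
        self._counter += 1
        return chosen.name

    def check_forever(self, interval_s=5.0):
        """Blocking loop; run it in a background thread."""
        while True:
            self.run_checks()
            time.sleep(interval_s)
```

Requiring several consecutive failures avoids pulling a server out of rotation because of a single slow response.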

Important: the load balancer exposes a public IP, while the servers behind it only have private IPs. Nobody on the internet can reach the servers directly, which significantly improves security.

Details: Tuan-05-Load-Balancer

2.5 Database Replication: Master-Slave

When a single database server becomes the bottleneck (often once read QPS exceeds roughly 5,000):

App Servers → Write → Master DB
App Servers → Read  → Slave DB 1
                    → Slave DB 2
                    → Slave DB N

How it works:

Master: handles only write operations (INSERT, UPDATE, DELETE)
Slave (Replica): receives a copy of the master's data and serves only read operations (SELECT)
Replication: the master records every change in its log (the WAL, Write-Ahead Log, in PostgreSQL; the binlog in MySQL) and each slave streams and replays that log
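Applications usually implement this split with a small routing layer. The sketch below is a naive illustration, assuming connections are represented by plain labels and statements are classified only by their first keyword; real routers must also send `SELECT ... FOR UPDATE`, reads inside write transactions, and CTEs that modify data to the master.

```python
import itertools

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}


class ReadWriteRouter:
    """Send writes to the master and spread reads over replicas (round robin).

    `master` and `replicas` are placeholders; in a real app they would be
    connection pools.
    """

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str):
        verb = sql.lstrip().split(None, 1)[0].upper()  # first keyword only
        if verb in WRITE_VERBS or self._replicas is None:
            return self.master  # writes, or no replicas configured
        return next(self._replicas)


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["replica1", "replica2"])
    for q in ["INSERT INTO posts VALUES (1)", "SELECT * FROM posts", "SELECT 1"]:
        print(q, "->", router.route(q))
```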

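Why does it work?

Most applications have a read:write ratio of 10:1 or higher:

Facebook: news feeds are read far more often than posts are created
Wikipedia: articles are read far more than they are edited
E-commerce: products are viewed far more than they are bought

If 90% of traffic is reads, moving the reads onto 3 slaves leaves each slave with about 30% of the original load (plus the cost of replaying writes) and the master with about 10%, i.e. roughly a 70% reduction per DB server.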
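A quick way to sanity-check that kind of estimate is a one-function model. The assumptions are deliberately simple and are labeled in the code: reads go only to replicas and are spread evenly, writes go only to the master, and each replica also replays every write at a configurable relative cost.

```python
def per_server_load(read_fraction: float, replicas: int, replay_cost: float = 0.0) -> dict:
    """Share of the original single-DB load that lands on each server.

    Simplifying assumptions: reads go only to replicas and are spread evenly,
    writes go only to the master, and every replica replays every write at
    `replay_cost` times the cost of executing it (0 = ignore replay).
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    write_fraction = 1.0 - read_fraction
    return {
        "master": write_fraction,
        "each_replica": read_fraction / replicas + write_fraction * replay_cost,
    }


if __name__ == "__main__":
    for cost in (0.0, 1.0):
        load = per_server_load(0.9, replicas=3, replay_cost=cost)
        print(cost, {k: round(v, 2) for k, v in load.items()})
```

Ignoring replay, each replica carries about 30% of the original load; if replaying a write costs as much as executing it, that rises to about 40%, so the real saving depends on the write mix.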

Failover scenarios:

Scenario	Handling
Slave dies	Route read traffic to the other slaves. Bring up a new slave replicating from the master
Master dies	Promote one slave to be the new master; the remaining slaves replicate from it. This is the most complex case and needs automated failover

Replication lag: an unavoidable problem

Asynchronous replication has lag (typically 10 ms to 1 s, worst case several seconds). That means:

A user writes data to the master
The user reads immediately; the request goes to a slave that does not have the new data yet, so the user sees stale data

Solutions:

Read-after-write consistency: after a write, send that user's reads to the master for 1-2 seconds
Synchronous replication: a slave must confirm it has received the data before the master reports success (at the cost of write latency)
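Read-after-write consistency can be sketched as "pin a user's reads to the master for a short window after that user writes". This Python sketch assumes a 2-second window (meant to exceed typical lag) and keeps the last-write timestamps in a local dict for brevity; in a stateless deployment that map would live in a shared store such as Redis or in the user's session.

```python
import time


class ReadYourWritesRouter:
    """After a user writes, send that user's reads to the master for `window_s` seconds.

    `clock` is injectable so the behaviour can be tested without sleeping.
    """

    def __init__(self, window_s: float = 2.0, clock=time.monotonic):
        self.window_s = window_s  # assumed to exceed typical replication lag
        self.clock = clock
        self._last_write = {}  # user_id -> time of that user's last write (local only)

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.window_s:
            return "master"  # replica may not have this user's write yet
        return "replica"


if __name__ == "__main__":
    router = ReadYourWritesRouter()
    router.record_write("u1")
    print(router.target_for_read("u1"), router.target_for_read("u2"))
```

Only the user who wrote pays the extra master load; everyone else keeps reading from replicas.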

Details: Tuan-07-Database-Sharding-Replication

2.6 Cache Layer: Speed Is King

A cache is a temporary in-memory store (memory access is roughly 100-1000x faster than disk) that holds frequently accessed data.

App Server → Check Cache (Redis/Memcached)
           → Cache HIT → Return immediately (< 1ms)
           → Cache MISS → Query DB → Store in Cache → Return

Most common cache strategy: Cache-Aside (Lazy Loading):

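import json
import redis  # redis-py client

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
USER_TTL_SECONDS = 3600  # TTL = 1 hour

def get_user(user_id, db):
    key = f"user:{user_id}"

    # 1. Check the cache first
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # Cache HIT

    # 2. Cache MISS → query the DB (db stands in for your data-access layer; params passed as a tuple)
    user = db.query("SELECT * FROM users WHERE id = %s", (user_id,))
    if user is None:
        return None  # nothing to cache

    # 3. Store in the cache with a TTL
    cache.setex(key, USER_TTL_SECONDS, json.dumps(user))
    return user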

When to use a cache?

Data that is read frequently and modified infrequently
Briefly stale data (within the TTL) is acceptable
Database queries are expensive (complex joins, aggregations)

When NOT to use a cache?

Data that changes constantly and must be real-time (for example, a bank account balance in the middle of a transaction)
Large data that is rarely re-read, which wastes RAM

Redis vs Memcached:

Feature	Redis	Memcached
Data structures	String, Hash, List, Set, Sorted Set, Stream	String only
Persistence	RDB + AOF (recoverable)	No persistence
Replication	Yes (master-replica)	No
Cluster mode	Yes	Via client-side sharding
Memory efficiency	Good	Slightly better for simple strings
Pub/Sub	Yes	No
Lua scripting	Yes	No

Recommendation: choose Redis unless you have a specific reason not to. It is more popular, more feature-rich, and has a larger community.

Real-world numbers:

Redis single node: 100,000+ ops/s, latency < 1ms
PostgreSQL simple query: 5,000-20,000 QPS, latency 1-10ms
Target cache hit rate: > 80% (below that the cache is not pulling its weight and the strategy needs review)
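Why the hit rate matters so much can be shown with the average latency of a cache-aside read: every read pays one cache lookup, and only misses also pay the DB query. A small sketch with assumed latencies picked from the ranges above (0.5 ms for Redis, 5 ms for PostgreSQL); the cost of writing back to the cache is ignored.

```python
def effective_read_latency_ms(hit_rate: float, cache_ms: float, db_ms: float) -> float:
    """Average cache-aside read latency.

    Every read pays one cache lookup; only misses (1 - hit_rate) also pay the
    DB query. The cost of writing the result back into the cache is ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return cache_ms + (1.0 - hit_rate) * db_ms


if __name__ == "__main__":
    # Assumed figures within the ranges quoted above: 0.5 ms Redis, 5 ms PostgreSQL.
    for h in (0.5, 0.8, 0.95):
        print(f"hit rate {h:.0%}: {effective_read_latency_ms(h, 0.5, 5.0):.2f} ms")
```

Under these assumptions, moving from a 50% to an 80% hit rate halves the average read latency (3.0 ms to 1.5 ms).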

Details: Tuan-06-Cache-Strategy

2.7 CDN (Content Delivery Network): Bringing Content Closer to Users

A CDN is a globally distributed network of servers that caches static assets (images, CSS, JS, videos) at the edge locations closest to users.

User in Vietnam → CDN Edge (Singapore, 15ms) → Origin Server (US, 200ms)
                  Cache HIT → Return immediately (15ms)
                  Cache MISS → Fetch from Origin → Cache it → Return

Pull CDN vs Push CDN:

Type	How it works	When to use
Pull	The CDN fetches from the origin on the first request	Most cases: heavy traffic, varied content
Push	Developers upload directly to the CDN	Content that rarely changes and needs tight control

CDN Invalidation: the "biggest headache"

“There are only two hard things in Computer Science: cache invalidation and naming things.” (Phil Karlton)

How to handle it:

TTL (Time-to-Live): short TTLs for content that changes often (CSS: 1 day), long TTLs for content that rarely changes (images: 1 year)
Versioning: style.css?v=2.1 or style.a1b2c3d4.css; a new URL means a new object on the CDN
Purge API: call the CDN's API to evict specific objects (when an update must go live immediately)
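Content-hash versioning (the `style.a1b2c3d4.css` form) is usually done by a build step. A minimal sketch, assuming an 8-character SHA-256 prefix and a hypothetical manifest that templates would use to look up the hashed name:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_asset_name(path: str, content: bytes, length: int = 8) -> str:
    """'static/style.css' -> 'static/style.<first `length` hex chars of SHA-256>.css'.

    The name changes exactly when the content changes, so a new deploy yields a
    new CDN object instead of requiring an invalidation.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def build_manifest(files: dict) -> dict:
    """Map logical names to hashed names (hypothetical manifest for templates)."""
    return {path: hashed_asset_name(path, data) for path, data in files.items()}


if __name__ == "__main__":
    manifest = build_manifest({"static/style.css": b"body { color: red; }"})
    print(manifest)
```

Because a hashed URL never changes meaning, it can be served with a very long TTL (for example `Cache-Control: public, max-age=31536000, immutable`), while the HTML that references it keeps a short TTL.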

Real-world CDN costs:

Provider	Price / GB (typical)	Notes
CloudFront	$0.085/GB	AWS ecosystem, tight integration
Cloudflare	$0 (generous free tier)	Free bandwidth; you pay for features
Fastly	$0.12/GB	Real-time purge, used by GitHub

Tip: don't cache dynamic content on the CDN (unless you use edge computing). A CDN is for static assets.

Details: Tuan-03-Networking-DNS-CDN

2.8 Stateless vs Stateful Architecture: A Pivotal Decision

Stateful Server

The server keeps session data (who is logged in, shopping cart) on itself.

User A → Server 1 (stores session A)
User A → (next time) MUST go to Server 1 → if Server 1 dies → session lost → log in again

Problems:

Sticky sessions: the load balancer must always route User A to Server 1, so load cannot be spread evenly
Hard to scale: a newly added Server 3 has none of the existing sessions, so users routed there must log in again
SPOF: when a server dies, every session on it is lost

Stateless Server

The server stores NO state. All state lives in an external store (Redis, database).

User A → Any server → Read session from Redis → Process → Return

Benefits:

Easy horizontal scaling: new servers need no session migration
No sticky sessions: the load balancer can round-robin freely
Auto-scaling: servers can be added or removed automatically based on traffic
Zero-downtime deployment: rolling updates server by server without affecting users

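import json

import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Two versions side by side for comparison; a real app keeps only the stateless one.
# Stateful (BAD) - session kept in this server's memory
session_store = {}  # In-process dict: lost on restart, invisible to other servers

@app.route('/api/cart-stateful')
def get_cart_stateful():
    session_id = request.cookies.get('session_id')
    cart = session_store.get(session_id, {})  # Lost if this server restarts
    return jsonify(cart)

# Stateless (GOOD) - session kept in Redis
@app.route('/api/cart')
def get_cart():
    session_id = request.cookies.get('session_id')
    if not session_id:
        return jsonify({})
    cart = cache.get(f"cart:{session_id}")  # Lives independently of any app server
    return jsonify(json.loads(cart) if cart else {})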

Rule of thumb: ALWAYS design stateless servers. It is a prerequisite for horizontal scaling. If the current codebase is stateful, prioritize refactoring it to stateless before scaling out.

2.9 Database Sharding: Divide and Conquer

When a database is too big for one server (often beyond 1-5 TB, or above ~20K QPS), the data must be split across multiple database servers (shards).

Horizontal Sharding (most common):

Rows are split across shards based on a shard key:

Shard Key: user_id
Shard function: user_id % 4

user_id = 1  → Shard 1 (1 % 4 = 1) → DB Server 1
user_id = 5  → Shard 1 (5 % 4 = 1) → DB Server 1
user_id = 10 → Shard 2 (10 % 4 = 2) → DB Server 2
user_id = 15 → Shard 3 (15 % 4 = 3) → DB Server 3
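The modulo shard function above, plus a measurement of what happens when the shard count changes, fits in a few lines of Python. The second function counts how many keys land on a different shard after going from 4 to 5 shards; this is the resharding cost discussed below.

```python
def shard_for(user_id: int, num_shards: int) -> int:
    """Modulo sharding: shard index = user_id % num_shards."""
    return user_id % num_shards


def fraction_moved(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Fraction of keys 0..num_keys-1 whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in range(num_keys)
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / num_keys


if __name__ == "__main__":
    for uid in (1, 5, 10, 15):
        print(uid, "->", shard_for(uid, 4))
    print(f"4 -> 5 shards: {fraction_moved(100_000, 4, 5):.0%} of keys move")
```

Adding a single shard relocates 80% of the keys with plain modulo, which is why consistent hashing is the usual answer to resharding.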

Choosing the shard key: the most important decision

Shard Key	Pros	Cons
user_id	Even distribution, fast per-user queries	Cross-user queries (leaderboards) need scatter-gather
geo_region	Data-residency compliance, low latency for users in the same region	Uneven distribution (e.g. fewer users in VN than in the US)
timestamp	Efficient range queries	Hot shard (all writes go to the current shard)
hash(user_id)	Most even distribution	Loses range-query capability

Hard problems with sharding:

Resharding: adding shards means migrating data, which is complex. Solution: Consistent Hashing (see Tuan-04-Consistent-Hashing)
Cross-shard queries: a JOIN across 2 shards means querying both DBs and merging in the application, which is slow
Celebrity problem (hotspot): an account with hundreds of millions of followers (e.g. Justin Bieber) overloads the shard that holds it
Distributed transactions: 2-phase commit across shards is slow and complex

Principle: sharding is the last resort, not the first. Before sharding, try vertical scaling of the DB, read replicas, caching, query optimization, and partitioning (within a single DB).

Details: Tuan-07-Database-Sharding-Replication

2.10 Message Queue: Unlocking Asynchronous Processing

A message queue decouples components: a producer puts messages on the queue and a consumer processes them when it is ready.

User uploads image → API Server → Push to Queue → Return "Processing..."
                                                  ↓
                               Image Worker 1 → Resize, compress, upload S3
                               Image Worker 2 → (scale independently)

When do you need a message queue?

Time-consuming tasks (> 500ms): image processing, sending email, video encoding
Buffering traffic spikes: the queue holds messages and workers drain them at their own pace
Decoupling services: the Payment service does not need to know the Notification service exists
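The upload flow from the diagram can be sketched in-process with Python's standard `queue` and `threading` modules. This is only a stand-in for RabbitMQ/Kafka/SQS (no durability, no cross-machine delivery), and `resize_image` is a hypothetical placeholder for the slow work.

```python
import queue
import threading


def resize_image(image_id: str) -> str:
    """Hypothetical stand-in for slow work (resize, compress, upload)."""
    return f"{image_id}:resized"


def worker(tasks: "queue.Queue", results: list, lock: threading.Lock) -> None:
    while True:
        image_id = tasks.get()
        if image_id is None:  # sentinel: shut this worker down
            tasks.task_done()
            return
        out = resize_image(image_id)
        with lock:
            results.append(out)
        tasks.task_done()


def handle_upload(tasks: "queue.Queue", image_id: str) -> dict:
    """API handler: enqueue and return immediately instead of doing the work inline."""
    tasks.put(image_id)
    return {"status": "processing", "image_id": image_id}


def run_demo(num_images: int = 5, num_workers: int = 2) -> list:
    tasks = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for i in range(num_images):
        handle_upload(tasks, f"img{i}")  # producer side
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after all real tasks
    tasks.join()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_demo())
```

The producer never waits for a worker, and the number of workers can change without touching `handle_upload`, which is the decoupling the list describes.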

Popular message queues:

Queue	Throughput	Ordering	Use case
RabbitMQ	20-50K msg/s	Per-queue FIFO	Task distribution, RPC
Apache Kafka	100K-1M msg/s	Per-partition ordering	Event streaming, log aggregation
Amazon SQS	Auto-scale	Best-effort (Standard queues), strict FIFO (FIFO queues)	Cloud-native, serverless
Redis Streams	100K+ msg/s	Per-stream	Lightweight, when Redis is already in the stack

Aha Moment: a message queue is not just "a way to do things async". It turns a tightly coupled system into a loosely coupled one. When service A pushes a message, it does not need to know who consumes it, and new consumers can be added without changing service A.

Details: Tuan-08-Message-Queue

2.11 Summary: Scaling Layers in Order

This is the scaling path Alex Xu walks through (and one that companies commonly follow in practice); the user ranges are rough, illustrative figures:

Stage	Users	Action	Complexity
1	0 – 100	Single server (app + DB on one machine)	Low
2	100 – 1K	Move the DB to its own server	Low
3	1K – 10K	Vertical scaling (upgrade the server)	Low
4	10K – 100K	Load Balancer + multiple app servers	Medium
5	100K – 500K	DB read replicas + Cache (Redis)	Medium
6	500K – 1M	CDN for static assets	Medium
7	1M – 10M	Stateless architecture + session store	High
8	10M – 50M	Message Queue + async processing	High
9	50M – 100M	Database sharding	Very high
10	100M+	Multi-region, microservices	Very high

Important: this is a guideline, not a hard rule. Depending on the workload, the order can differ. A media-heavy system may need a CDN earlier; a write-heavy system may need a message queue before sharding.

3. Back-of-the-envelope: When Do You Need to Scale?

3.1 Estimating single-server capacity

A typical server (4 vCPU, 16 GB RAM, SSD) running Node.js/Express:

$$\text{Capacity}_{\text{single}} \approx 5{,}000 \text{ to } 10{,}000 \text{ requests/s (simple API)}$$

$$\text{Capacity}_{\text{single}} \approx 500 \text{ to } 2{,}000 \text{ requests/s (with DB queries)}$$

3.2 From DAU to the number of servers

Step 1: compute QPS

$$QPS_{avg} = \frac{DAU \times \text{requests/user/day}}{86{,}400}$$

$$QPS_{peak} = QPS_{avg} \times \text{peak\_factor} \quad (\text{typically } 2\times \text{ to } 5\times)$$

Step 2: compute the number of servers

$$N_{servers} = \left\lceil \frac{QPS_{peak}}{\text{Capacity}_{\text{per\_server}}} \right\rceil \times \text{safety\_factor}$$

The safety factor is usually 1.5-2.0 (headroom for server failures and maintenance)

3.3 Worked Example: An E-commerce App

Assumptions:

Parameter	Value
DAU	1M (1 million)
Requests/user/day	20 (browse, search, add to cart)
Peak multiplier	5x (flash sale)
Server capacity	1,000 req/s (with DB queries)
Avg request size	5 KB
Avg response size	20 KB

Computing QPS:

$$QPS_{avg} = \frac{1{,}000{,}000 \times 20}{86{,}400} \approx 231 \text{ req/s}$$

$$QPS_{peak} = 231 \times 5 = 1{,}155 \text{ req/s}$$

Computing servers:

$$N_{servers} = \left\lceil \frac{1{,}155}{1{,}000} \right\rceil \times 2 = 2 \times 2 = 4 \text{ servers}$$

Observation: 1M DAU needs only ~4 app servers, which is exactly why you should not over-engineer early. Still, run at least 2 servers for redundancy even when a single server could handle the average load.

Computing bandwidth:

$$\text{Bandwidth}_{in} = 1{,}155 \times 5 \text{ KB} = 5.78 \text{ MB/s} \approx 46 \text{ Mbps}$$

$$\text{Bandwidth}_{out} = 1{,}155 \times 20 \text{ KB} = 23.1 \text{ MB/s} \approx 185 \text{ Mbps}$$

When to scale further (from 1 server to several):

$$\text{Trigger}_{scale}: \; QPS_{actual} > 0.7 \times \text{Capacity}_{server}$$

$$0.7 \times 1{,}000 = 700 \text{ req/s}$$

That is, once QPS exceeds 700 req/s you need another server. The corresponding DAU (the 700 req/s applies to peak traffic, so the 5x peak factor appears in the denominator):

$$DAU_{trigger} = \frac{700 \times 86{,}400}{20 \times 5} = 604{,}800 \approx 600\text{K DAU}$$

Insight: at around 600K DAU (with 20 req/user/day and a 5x peak), a single server starts to struggle and horizontal scaling is needed.
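The whole estimate from 3.2 and 3.3 can be bundled into a small calculator so the assumptions can be varied. This is a sketch of the formulas above in Python; function names are illustrative, and decimal units (1 MB = 1000 KB) are assumed for bandwidth.

```python
import math

SECONDS_PER_DAY = 86_400


def qps_avg(dau: int, req_per_user_day: float) -> float:
    return dau * req_per_user_day / SECONDS_PER_DAY


def qps_peak(dau: int, req_per_user_day: float, peak_factor: float) -> float:
    return qps_avg(dau, req_per_user_day) * peak_factor


def servers_needed(peak: float, capacity_per_server: float, safety_factor: float) -> int:
    # Same shape as the 3.2 formula: ceil(peak / capacity) * safety factor, rounded up.
    return math.ceil(math.ceil(peak / capacity_per_server) * safety_factor)


def bandwidth_mbps(qps: float, size_kb: float) -> float:
    # KB/s -> MB/s (decimal units, /1000) -> megabits/s (x8)
    return qps * size_kb / 1000 * 8


def dau_at_trigger(capacity: float, utilisation: float,
                   req_per_user_day: float, peak_factor: float) -> float:
    """DAU at which *peak* QPS reaches `utilisation` x one server's capacity."""
    return utilisation * capacity * SECONDS_PER_DAY / (req_per_user_day * peak_factor)


if __name__ == "__main__":
    peak = qps_peak(1_000_000, 20, 5)
    print(f"avg {qps_avg(1_000_000, 20):.0f} req/s, peak {peak:.0f} req/s")
    print("servers:", servers_needed(peak, 1_000, 2))
    print(f"in {bandwidth_mbps(peak, 5):.0f} Mbps, out {bandwidth_mbps(peak, 20):.0f} Mbps")
    print(f"scale trigger at ~{dau_at_trigger(1_000, 0.7, 20, 5):,.0f} DAU")
```

Without the intermediate rounding to 231 req/s, the peak comes out at about 1,157 req/s instead of 1,155, which does not change any of the conclusions.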

3.4 Database Scaling Estimation

When is a single DB not enough?

$$DB_{max\_connections} = 100 \text{ (PostgreSQL default; commonly raised to 200-500)}$$

$$QPS_{DB} = QPS_{app} \times \text{DB\_queries\_per\_request}$$

(each pooled connection then serves roughly $QPS_{DB} / \text{connection\_pool\_size}$ queries per second)

If each API request needs 3 DB queries on average:

$$DB_{load} = 1{,}155 \times 3 = 3{,}465 \text{ queries/s}$$

PostgreSQL handles ~10K simple QPS, so this is still fine. But if DAU grows 10x:

$$DB_{load,\,10M\,DAU} = 11{,}550 \times 3 = 34{,}650 \text{ queries/s}$$

→ You need read replicas or a cache layer to reduce the DB load.

With an 80% cache hit rate:

$$DB_{load,\,with\_cache} = 34{,}650 \times (1 - 0.8) = 6{,}930 \text{ queries/s}$$

→ Back within the capacity of a single PostgreSQL node. An 80% hit rate cuts DB load 5x (a 90% hit rate would cut it 10x).
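The same arithmetic as code, with one extra check that is easy to forget when adding app servers: every server's connection pool counts against the database's connection limit. The sketch assumes every query is cacheable with a uniform hit rate, and uses PostgreSQL's defaults of `max_connections = 100` and `superuser_reserved_connections = 3`.

```python
def db_load_qps(app_qps: float, queries_per_request: float, cache_hit_rate: float = 0.0) -> float:
    """Queries/s that actually reach the database.

    Assumes every query is cacheable and the hit rate applies uniformly,
    so only the (1 - hit_rate) share of misses hits the DB.
    """
    return app_qps * queries_per_request * (1.0 - cache_hit_rate)


def fits_connection_limit(app_servers: int, pool_size: int,
                          max_connections: int = 100, reserved: int = 3) -> bool:
    """Each app server may open up to `pool_size` connections; the total must fit in
    max_connections minus the slots reserved for superusers."""
    return app_servers * pool_size <= max_connections - reserved


if __name__ == "__main__":
    for label, qps, hit in [("1M DAU", 1_155, 0.0), ("10M DAU", 11_550, 0.0),
                            ("10M DAU + cache", 11_550, 0.8)]:
        print(f"{label}: {db_load_qps(qps, 3, hit):,.0f} queries/s")
    print("4 servers x 20 conns fit:", fits_connection_limit(4, 20))
```

With pools of 20, a fifth app server already exceeds the default limit, which is one reason connection poolers such as PgBouncer appear at this stage.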

For the detailed estimation framework, see: Tuan-02-Back-of-the-envelope

4. Security First: Security at Every Scaling Stage

4.1 Single Server: The Security Foundation

Even with just one server, you MUST have:

HTTPS everywhere:

# /etc/nginx/sites-available/app
server {
    listen 80;
    server_name example.com;
    return 301 https://$server_name$request_uri;  # Force redirect HTTP → HTTPS
}
 
server {
    listen 443 ssl http2;
    server_name example.com;
 
    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
 
    # Security headers ("always" also adds them to error responses)
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Content-Type-Options nosniff always;
    add_header X-Frame-Options DENY always;
    # X-XSS-Protection is deprecated and ignored by modern browsers, so it is omitted
 
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Basic firewall (UFW):

# Open only the ports you need
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp    # SSH (if you move SSH to another port such as 2222, allow that port instead before enabling)
sudo ufw allow 80/tcp    # HTTP (redirect to HTTPS)
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable

Secrets management: NEVER hardcode:

# BAD: hardcoded in code
DATABASE_URL="postgresql://admin:SuperSecret123@localhost/mydb"

# GOOD: environment variables
DATABASE_URL="${DB_URL}"  # Set in .env (NEVER commit it to git)

# BETTER: use a secrets manager
# AWS Secrets Manager, HashiCorp Vault, or simply Docker secrets

4.2 Multiple Servers + Load Balancer: Network Segmentation

Internet → [Firewall/WAF] → [Load Balancer] → [Public Subnet]
                                                    ↓
                                              [Private Subnet]
                                              App Servers (no public IP)
                                                    ↓
                                              [Data Subnet]
                                              DB + Redis (no public IP,
                                              accepts connections only from the App subnet)

The "Least Privilege Network" principle:

Public Subnet: only the Load Balancer has a public IP
Private Subnet: app servers, which accept traffic only from the LB
Data Subnet: DB, Redis, Queue, which accept traffic only from the app servers
Bastion Host: for SSH into private servers when debugging (never SSH into them directly)

4.3 Database Replication: Encryption in Transit

Master → Slave: ALL replication traffic MUST be encrypted (TLS)
App → DB: ALL connections MUST use TLS

# PostgreSQL ssl config
# postgresql.conf
ssl = on
ssl_cert_file = '/path/to/server.crt'
ssl_key_file = '/path/to/server.key'
# ssl = on only enables TLS; to require it, use "hostssl" (not "host") entries in pg_hba.conf

4.4 Cache Layer: Protecting Redis

By default Redis has no password, and older versions (or setups that disable protected mode) listen on all interfaces (0.0.0.0). That is EXTREMELY DANGEROUS if the instance is reachable from the internet.

# redis.conf: MUST configure
bind 127.0.0.1 10.0.1.5            # Loopback + this host's private IP (10.0.1.5 is an example address)
requirepass "StrongPasswordHere"   # Require auth (Redis 6+ also supports per-user ACLs)
rename-command FLUSHALL ""         # Disable dangerous commands
rename-command CONFIG ""
rename-command DEBUG ""
protected-mode yes

Scary reality: in 2020, more than 75,000 Redis instances were found exposed on the internet without a password, leading to wiped data and servers infected with malware. Don't be the next victim.

4.5 CDN: Security Headers & Origin Protection

Origin protection: the origin accepts requests only from the CDN's edge nodes, never directly from the internet
Signed URLs: for private content (paid videos), the CDN serves a file only if the URL carries a valid signature
WAF integration: CloudFront + AWS WAF or Cloudflare WAF blocks SQL injection and XSS at the edge
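The idea behind signed URLs can be shown with a generic HMAC scheme: the origin signs the path plus an expiry time with a shared secret, and the edge recomputes the signature before serving. This is an illustrative sketch, not CloudFront's or any other CDN's actual format (each CDN defines its own); `SECRET` and the `expires`/`sig` parameter names are assumptions, and only the path and expiry are signed.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"change-me"  # hypothetical key shared by the origin (signer) and the edge (verifier)


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    msg = f"{path}:{expires_at}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def sign_url(url: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append `expires` (Unix time) and `sig` = HMAC-SHA256(path + expiry)."""
    parts = urlsplit(url)
    sep = "&" if parts.query else "?"
    params = urlencode({"expires": expires_at, "sig": _signature(parts.path, expires_at, secret)})
    return f"{url}{sep}{params}"


def verify_url(signed_url: str, now: Optional[float] = None, secret: bytes = SECRET) -> bool:
    parts = urlsplit(signed_url)
    params = parse_qs(parts.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed signature parameters
    current = time.time() if now is None else now
    if current > expires_at:
        return False  # link has expired
    expected = _signature(parts.path, expires_at, secret)
    return hmac.compare_digest(expected, sig)  # constant-time comparison


if __name__ == "__main__":
    link = sign_url("https://cdn.example.com/videos/ep1.mp4", expires_at=int(time.time()) + 300)
    print(link, verify_url(link))
```

Changing the path or the expiry invalidates the signature, so a leaked link stops working after its expiry and cannot be edited to unlock other files.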

4.6 Security Scaling Summary

Stage	Security checklist
Single Server	HTTPS, Firewall (UFW), env secrets, security headers
Multi-Server	Network segmentation (public/private/data subnets), bastion host
DB Replication	TLS for replication traffic, encrypted connections
Cache Layer	Redis AUTH, bind private IP, disable dangerous commands
CDN	Signed URLs, origin protection, WAF
Sharding	Encryption at rest per shard, separate credentials per shard
Message Queue	TLS + SASL authentication, message encryption

Details: Tuan-15-Data-Security-Encryption

5. DevOps/Ops-Light: From Local Dev to Production

5.1 Docker Basics: Containerizing the Application

Why Docker?

"Works on my machine" → runs the same everywhere
Isolation: each container is like a mini-server
Reproducible: the Dockerfile is the "recipe" for rebuilding at any time

Basic Dockerfile for Node.js:

# Dockerfile
FROM node:20-alpine AS base
 
# Security: don't run as root
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
 
WORKDIR /app
 
# Install deps first (cache layer optimization)
COPY package*.json ./
RUN npm ci --omit=dev  # --only=production is deprecated since npm 7
 
# Copy source code
COPY src/ ./src/
 
# Switch to non-root user
USER appuser
 
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
 
EXPOSE 3000
CMD ["node", "src/server.js"]

Dockerfile for Python (Flask/FastAPI):

# Dockerfile
FROM python:3.12-slim
 
RUN adduser --disabled-password --gecos '' appuser
 
WORKDIR /app
 
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
 
COPY app/ ./app/
 
USER appuser
 
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
 
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

5.2 Docker Compose: Simulating a Scaled System Locally

# docker-compose.yml: simulate a scaled architecture locally
version: "3.9"  # Obsolete in Compose v2 (ignored with a warning); kept for older docker-compose v1
 
services:
  # === LOAD BALANCER (Nginx) ===
  nginx:
    image: nginx:1.25-alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      app1:
        condition: service_healthy
      app2:
        condition: service_healthy
    networks:
      - frontend
    restart: unless-stopped
 
  # === APP SERVER 1 ===
  app1:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - NODE_ENV=production
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379
      - DATABASE_URL=postgresql://${DB_USER}:${DB_PASSWORD}@db-master:5432/${DB_NAME}
      - DATABASE_READ_URL=postgresql://${DB_USER}:${DB_PASSWORD}@db-slave:5432/${DB_NAME}
      - SERVER_ID=app1
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
    networks:
      - frontend
      - backend
    restart: unless-stopped
 
  # === APP SERVER 2 (clone of app1) ===
  app2:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - NODE_ENV=production
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379
      - DATABASE_URL=postgresql://${DB_USER}:${DB_PASSWORD}@db-master:5432/${DB_NAME}
      - DATABASE_READ_URL=postgresql://${DB_USER}:${DB_PASSWORD}@db-slave:5432/${DB_NAME}
      - SERVER_ID=app2
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
    networks:
      - frontend
      - backend
    restart: unless-stopped
 
  # === CACHE (Redis) ===
  redis:
    image: redis:7-alpine
    command: >
      redis-server
      --requirepass ${REDIS_PASSWORD}
      --maxmemory 256mb
      --maxmemory-policy allkeys-lru
      --appendonly yes
    volumes:
      - redis_data:/data
    networks:
      - backend
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
 
  # === DATABASE MASTER (PostgreSQL) ===
  db-master:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: ${DB_NAME}
    volumes:
      - db_master_data:/var/lib/postgresql/data
      - ./db/init-master.sh:/docker-entrypoint-initdb.d/init-master.sh
    networks:
      - backend
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
      interval: 10s
      timeout: 3s
      retries: 3
 
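  # === DATABASE SLAVE (Read Replica) ===
  # NOTE: the official postgres image does not configure replication by itself.
  # Until a standby is set up (e.g. pg_basebackup + standby.signal + primary_conninfo),
  # this container is an independent, empty database, not a replica.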
  db-slave:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: ${DB_NAME}
      PGMASTER_HOST: db-master  # Not read by the official image; only a hint for your own replication script
    depends_on:
      db-master:
        condition: service_healthy
    volumes:
      - db_slave_data:/var/lib/postgresql/data
    networks:
      - backend
    restart: unless-stopped
 
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # No access from outside: security!
 
volumes:
  redis_data:
  db_master_data:
  db_slave_data:

The .env file (NEVER commit it to git):

# .env
DB_USER=appuser
DB_PASSWORD=ChangeMe_In_Production!
DB_NAME=myapp
REDIS_PASSWORD=RedisSecret_Change_This!

5.3 Nginx Load Balancer Config

# nginx/nginx.conf
worker_processes auto;
 
events {
    worker_connections 1024;
}
 
http {
    # Upstream: the list of app servers
    upstream app_servers {
        # Algorithm: least connections
        least_conn;
 
        server app1:3000 max_fails=3 fail_timeout=30s;
        server app2:3000 max_fails=3 fail_timeout=30s;
    }
 
    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
 
    # Logging
    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    'upstream=$upstream_addr response_time=$upstream_response_time';
    access_log /var/log/nginx/access.log main;
 
    server {
        listen 80;
        server_name localhost;
 
        # Security headers
        add_header X-Content-Type-Options nosniff always;
        add_header X-Frame-Options DENY always;
 
        # Health check endpoint for an external load balancer
        location /nginx-health {
            access_log off;
            default_type text/plain;  # avoids a duplicate Content-Type header
            return 200 "OK\n";
        }
 
        # API routes → proxy to app servers
        location / {
            limit_req zone=api_limit burst=20 nodelay;
 
            proxy_pass http://app_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
 
            # Timeouts
            proxy_connect_timeout 5s;
            proxy_read_timeout 30s;
            proxy_send_timeout 30s;
 
            # Retry on another upstream server when one fails
            proxy_next_upstream error timeout http_502 http_503;
            proxy_next_upstream_tries 2;
        }
    }
}

5.4 Health Check: Without Health Checks You Are Flying Blind

Every service MUST expose a health check endpoint:

// Node.js health check
app.get('/health', async (req, res) => {
    const checks = {
        status: 'ok',
        timestamp: new Date().toISOString(),
        uptime: process.uptime(),
        checks: {}
    };
 
    // Check Redis
    try {
        await redis.ping();
        checks.checks.redis = { status: 'ok' };
    } catch (err) {
        checks.checks.redis = { status: 'error', message: err.message };
        checks.status = 'degraded';
    }
 
    // Check Database
    try {
        await db.query('SELECT 1');
        checks.checks.database = { status: 'ok' };
    } catch (err) {
        checks.checks.database = { status: 'error', message: err.message };
        checks.status = 'error';
    }
 
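    // "degraded" (cache down, DB up) still returns 200 so the load balancer keeps
    // routing here; only a DB failure takes the instance out of rotation
    const statusCode = checks.status === 'error' ? 503 : 200;
    res.status(statusCode).json(checks);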
});
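The status rule above can be written as a small, order-independent function. In the Node version, the final status depends on the order of the checks. If the database were checked before Redis, a Redis failure would overwrite `error` with `degraded` and the endpoint would return 200.

A minimal Python sketch, assuming each dependency is tagged critical (database) or non-critical (Redis). `Check`, `run_health`, `up` and `down` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Check:
    name: str
    probe: Callable[[], None]  # raises on failure, e.g. a ping or SELECT 1
    critical: bool             # True: failure makes this instance unservable


def run_health(checks: List[Check]) -> Tuple[int, Dict]:
    body: Dict = {"status": "ok", "checks": {}}
    for check in checks:
        try:
            check.probe()
            body["checks"][check.name] = {"status": "ok"}
        except Exception as exc:
            body["checks"][check.name] = {"status": "error", "message": str(exc)}
            if check.critical:
                body["status"] = "error"        # always wins, whatever the order
            elif body["status"] == "ok":
                body["status"] = "degraded"     # never overrides "error"
    # Only "error" maps to 503; "degraded" stays 200 so the LB keeps routing here
    return (503 if body["status"] == "error" else 200), body


def up() -> None:
    """Hypothetical probe that always succeeds."""


def down() -> None:
    """Hypothetical probe that always fails."""
    raise ConnectionError("connection refused")


print(run_health([Check("redis", down, critical=False),
                  Check("database", up, critical=True)]))
```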

6. Code Example: A Complete Stateless App

6.1 Node.js: Stateless Express App

// src/server.js
const express = require('express');
const Redis = require('ioredis');
const { Pool } = require('pg');
 
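const app = express();
app.use(express.json());
// Note: Express 4 does not pass rejected promises from async handlers to the error
// middleware; in production wrap each handler in try/catch or use Express 5.
 
// === External State Stores (STATELESS: nothing is kept on the app server itself) ===
 
// Redis for sessions & cache
const redis = new Redis(process.env.REDIS_URL);
 
// PostgreSQL: master (primary) for writes
const dbMaster = new Pool({
    connectionString: process.env.DATABASE_URL,
    max: 20,  // max connections in the pool
});
 
// PostgreSQL: slave (read replica) for reads; falls back to the master if unset
const dbSlave = new Pool({
    connectionString: process.env.DATABASE_READ_URL || process.env.DATABASE_URL,
    max: 20,
});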
 
const SERVER_ID = process.env.SERVER_ID || 'unknown';
 
// === Middleware: load the session from Redis (NOT from process memory) ===
async function sessionMiddleware(req, res, next) {
    const sessionId = req.headers['x-session-id'];
    req.session = {};
    if (sessionId) {
        try {
            const sessionData = await redis.get(`session:${sessionId}`);
            if (sessionData) req.session = JSON.parse(sessionData);
            req.sessionId = sessionId;
        } catch (err) {
            // Express 4 does not catch rejected promises: forward the error explicitly
            return next(err);
        }
    }
    next();
}
 
app.use(sessionMiddleware);
 
// === Health Check ===
app.get('/health', async (req, res) => {
    try {
        await redis.ping();
        await dbMaster.query('SELECT 1');
        res.json({ status: 'ok', server: SERVER_ID, uptime: process.uptime() });
    } catch (err) {
        res.status(503).json({ status: 'error', server: SERVER_ID, error: err.message });
    }
});
 
// === API: Get User (READ → slave + cache) ===
app.get('/api/users/:id', async (req, res) => {
    const { id } = req.params;
    const cacheKey = `user:${id}`;
 
    // 1. Check cache
    const cached = await redis.get(cacheKey);
    if (cached) {
        return res.json({
            ...JSON.parse(cached),
            _meta: { source: 'cache', server: SERVER_ID }
        });
    }
 
    // 2. Cache miss → query slave DB
    const result = await dbSlave.query('SELECT * FROM users WHERE id = $1', [id]);
    if (result.rows.length === 0) {
        return res.status(404).json({ error: 'User not found' });
    }
 
    const user = result.rows[0];
 
    // 3. Store in cache (TTL = 5 minutes)
    await redis.setex(cacheKey, 300, JSON.stringify(user));
 
    res.json({
        ...user,
        _meta: { source: 'database', server: SERVER_ID }
    });
});
 
// === API: Create User (WRITE → master + invalidate cache) ===
app.post('/api/users', async (req, res) => {
    const { name, email } = req.body;
 
    const result = await dbMaster.query(
        'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *',
        [name, email]
    );
 
    const newUser = result.rows[0];
 
    // Cache the new user right away (write-through), so a read right after this write
    // is served from cache instead of a replica that may still be lagging
    await redis.setex(`user:${newUser.id}`, 300, JSON.stringify(newUser));
 
    res.status(201).json({
        ...newUser,
        _meta: { server: SERVER_ID }
    });
});
 
// === API: Stateless demo: which server handles each request? ===
app.get('/api/whoami', (req, res) => {
    res.json({
        server: SERVER_ID,
        message: `Handled by ${SERVER_ID}. Refresh a few times to see different servers!`,
        session: req.session,
        timestamp: new Date().toISOString(),
    });
});
 
// === Start Server ===
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
    console.log(`[${SERVER_ID}] Server running on port ${PORT}`);
});

6.2 Python: Stateless FastAPI App (equivalent)

# app/main.py
import os
import json
import time
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import redis.asyncio as aioredis
import asyncpg
 
app = FastAPI()
 
SERVER_ID = os.getenv("SERVER_ID", "unknown")
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")
DB_MASTER_URL = os.getenv("DATABASE_URL", "postgresql://user:pass@localhost/mydb")
DB_SLAVE_URL = os.getenv("DATABASE_READ_URL", DB_MASTER_URL)
START_TIME = time.monotonic()  # reference point for uptime (the monotonic clock's epoch is arbitrary)
 
# Connection pools (initialized on startup)
redis_pool = None
db_master_pool = None
db_slave_pool = None
 
# on_event still works, but newer FastAPI versions recommend a lifespan handler instead
@app.on_event("startup")
async def startup():
    global redis_pool, db_master_pool, db_slave_pool
    redis_pool = aioredis.from_url(REDIS_URL, decode_responses=True)
    db_master_pool = await asyncpg.create_pool(DB_MASTER_URL, min_size=5, max_size=20)
    db_slave_pool = await asyncpg.create_pool(DB_SLAVE_URL, min_size=5, max_size=20)
 
@app.on_event("shutdown")
async def shutdown():
    await redis_pool.close()
    await db_master_pool.close()
    await db_slave_pool.close()
 
# === Health Check ===
@app.get("/health")
async def health():
    try:
        await redis_pool.ping()
        async with db_master_pool.acquire() as conn:
            await conn.fetchval("SELECT 1")
        # uptime = seconds since this process started, not the raw monotonic clock value
        return {"status": "ok", "server": SERVER_ID,
                "uptime": round(time.monotonic() - START_TIME, 1)}
    except Exception as e:
        raise HTTPException(status_code=503, detail=str(e))
 
# === Read: Cache → Slave DB ===
@app.get("/api/users/{user_id}")
async def get_user(user_id: int):
    cache_key = f"user:{user_id}"
 
    # 1. Check cache
    cached = await redis_pool.get(cache_key)
    if cached:
        user = json.loads(cached)
        return {**user, "_meta": {"source": "cache", "server": SERVER_ID}}
 
    # 2. Cache miss → slave DB
    async with db_slave_pool.acquire() as conn:
        row = await conn.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
 
    if not row:
        raise HTTPException(status_code=404, detail="User not found")
 
    user = dict(row)
    await redis_pool.setex(cache_key, 300, json.dumps(user, default=str))
 
    return {**user, "_meta": {"source": "database", "server": SERVER_ID}}
 
# === Write: Master DB + Cache ===
class CreateUser(BaseModel):
    name: str
    email: str
 
@app.post("/api/users", status_code=201)
async def create_user(body: CreateUser):
    async with db_master_pool.acquire() as conn:
        row = await conn.fetchrow(
            "INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *",
            body.name, body.email
        )
 
    user = dict(row)
    await redis_pool.setex(f"user:{user['id']}", 300, json.dumps(user, default=str))
 
    return {**user, "_meta": {"server": SERVER_ID}}
 
@app.get("/api/whoami")
async def whoami():
    return {
        "server": SERVER_ID,
        "message": f"Handled by {SERVER_ID}. Refresh to see different servers!",
    }

7. System Design Diagram: Evolution from Single Server to Scaled System

7.1 Stage 1: Single Server

flowchart TD
    subgraph "Stage 1: Single Server"
        U1[Users] -->|DNS Lookup| DNS1[DNS Server]
        DNS1 -->|IP Address| U1
        U1 -->|HTTP/HTTPS| S1["Single Server<br/>Web + App + DB<br/>SPOF"]
    end

    style S1 fill:#ff6b6b,stroke:#333,stroke-width:2px

7.2 Stage 2: Separate DB + Load Balancer

flowchart TD
    subgraph "Stage 2: Separated Concerns"
        U2[Users] --> LB["Load Balancer<br/>(Nginx / ALB)"]
        LB --> APP1["App Server 1<br/>(Stateless)"]
        LB --> APP2["App Server 2<br/>(Stateless)"]
        APP1 --> DB2["Database Server<br/>(PostgreSQL)"]
        APP2 --> DB2
    end

    style LB fill:#4ecdc4,stroke:#333,stroke-width:2px
    style DB2 fill:#f9a825,stroke:#333,stroke-width:2px

7.3 Stage 3: Full Architecture (Target)

flowchart TD
    subgraph "Internet"
        USERS[Users Worldwide]
    end

    subgraph "Edge Layer"
        CDN["CDN<br/>(CloudFront / Cloudflare)<br/>Static Assets"]
        DNS["DNS<br/>(Route 53)"]
    end

    subgraph "Public Subnet"
        LB["Load Balancer<br/>HTTPS Termination<br/>Rate Limiting"]
    end

    subgraph "Private Subnet — Application"
        APP1["App Server 1<br/>(Stateless, Docker)"]
        APP2["App Server 2<br/>(Stateless, Docker)"]
        APP3["App Server N<br/>(Auto-scale)"]
    end

    subgraph "Private Subnet — Cache & Queue"
        REDIS["Redis Cluster<br/>Session + Cache<br/>100K+ ops/s"]
        MQ["Message Queue<br/>(RabbitMQ / Kafka)<br/>Async Processing"]
    end

    subgraph "Private Subnet — Data"
        MASTER["DB Master<br/>(PostgreSQL)<br/>Writes Only"]
        SLAVE1["DB Slave 1<br/>Reads"]
        SLAVE2["DB Slave 2<br/>Reads"]
    end

    subgraph "Workers"
        W1["Worker 1<br/>Image Processing"]
        W2["Worker 2<br/>Email / Notifications"]
    end

    USERS --> DNS
    USERS --> CDN
    DNS --> LB
    CDN -->|Cache MISS| LB

    LB --> APP1
    LB --> APP2
    LB --> APP3

    APP1 & APP2 & APP3 --> REDIS
    APP1 & APP2 & APP3 --> MASTER
    APP1 & APP2 & APP3 --> SLAVE1
    APP1 & APP2 & APP3 --> SLAVE2
    APP1 & APP2 & APP3 --> MQ

    MQ --> W1
    MQ --> W2

    MASTER -->|Replication| SLAVE1
    MASTER -->|Replication| SLAVE2

    style LB fill:#4ecdc4,stroke:#333,stroke-width:2px
    style REDIS fill:#e74c3c,stroke:#fff,stroke-width:2px
    style MASTER fill:#f9a825,stroke:#333,stroke-width:2px
    style SLAVE1 fill:#f9e784,stroke:#333,stroke-width:1px
    style SLAVE2 fill:#f9e784,stroke:#333,stroke-width:1px
    style MQ fill:#a29bfe,stroke:#333,stroke-width:2px
    style CDN fill:#00b894,stroke:#333,stroke-width:2px

7.4 Scaling Decision Tree

flowchart TD
    START["Does the system need to scale?"] --> Q1{"QPS > server capacity?"}

    Q1 -->|Yes| A1["Add Load Balancer<br/>+ More App Servers"]
    Q1 -->|No| Q2{"Slow DB queries?<br/>(latency > 100ms)"}

    Q2 -->|Yes| A2{"Read-heavy?<br/>(Read:Write > 5:1)"}
    Q2 -->|No| Q3{"Slow static assets?"}

    A2 -->|Yes| A3["Add Read Replicas<br/>+ Redis Cache"]
    A2 -->|No| A4["Optimize queries<br/>Add indexes<br/>→ If still slow: Sharding"]

    Q3 -->|Yes| A5["Add CDN"]
    Q3 -->|No| Q4{"Long-running tasks<br/>block API?"}

    Q4 -->|Yes| A6["Add Message Queue<br/>+ Background Workers"]
    Q4 -->|No| Q5{"Session issues<br/>when scaling?"}

    Q5 -->|Yes| A7["Move to Stateless<br/>Session → Redis"]
    Q5 -->|No| A8["Profile & Optimize Code<br/>before adding infra"]

    A1 --> DONE["Monitor & Repeat"]
    A3 --> DONE
    A4 --> DONE
    A5 --> DONE
    A6 --> DONE
    A7 --> DONE
    A8 --> DONE

    style START fill:#e74c3c,stroke:#fff,stroke-width:2px,color:#fff
    style DONE fill:#00b894,stroke:#333,stroke-width:2px
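The decision tree can also be read as an ordered list of checks. A minimal Python sketch, assuming the metrics are already collected by monitoring. The `Metrics` field names are hypothetical. The thresholds (QPS vs. server capacity, 100 ms DB latency, 5:1 read:write) are the ones in the diagram.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    qps: float
    server_capacity_qps: float
    db_latency_ms: float
    read_write_ratio: float       # reads per write
    static_assets_slow: bool
    long_tasks_block_api: bool
    session_issues: bool


def next_step(m: Metrics) -> str:
    """Walk the tree top to bottom; the first matching branch wins."""
    if m.qps > m.server_capacity_qps:
        return "Add load balancer + more app servers"
    if m.db_latency_ms > 100:
        if m.read_write_ratio > 5:
            return "Add read replicas + Redis cache"
        return "Optimize queries, add indexes; shard if still slow"
    if m.static_assets_slow:
        return "Add CDN"
    if m.long_tasks_block_api:
        return "Add message queue + background workers"
    if m.session_issues:
        return "Move to stateless: sessions in Redis"
    return "Profile & optimize code before adding infra"


print(next_step(Metrics(qps=800, server_capacity_qps=1_000, db_latency_ms=240,
                        read_write_ratio=20, static_assets_slow=False,
                        long_tasks_block_api=False, session_issues=False)))
```

Every branch ends at "Monitor & Repeat". In practice that means calling the function again with fresh metrics after each change.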

8. Aha Moments & Common Pitfalls

Aha Moments

#1: Scaling is not about adding servers. Scaling is a loop: identify the bottleneck → choose the right fix → measure the result → repeat. Adding servers when the bottleneck is the database accomplishes nothing.

#2: Statelessness is the foundation for everything else. If a server holds state (sessions in memory, file cache on local disk), you cannot cleanly scale horizontally, auto-scale, or deploy with zero downtime. Stateless first, always.
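A toy Python sketch of why in-memory sessions break horizontal scaling. Two hypothetical `AppServer` instances either keep sessions privately or share one store. A plain dict stands in for Redis.

```python
from typing import Dict, Optional


class AppServer:
    """Toy app server; `sessions` is either private to it or shared with its peers."""

    def __init__(self, name: str, sessions: Dict[str, dict]) -> None:
        self.name = name
        self.sessions = sessions

    def login(self, session_id: str, user: str) -> None:
        self.sessions[session_id] = {"user": user}

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.sessions.get(session_id)
        return session["user"] if session else None


# Stateful: each server keeps sessions in its own memory
app1, app2 = AppServer("app1", {}), AppServer("app2", {})
app1.login("s1", "hieu")
print(app2.whoami("s1"))   # None: app2 never saw the login

# Stateless: both servers share one external store (a dict standing in for Redis)
shared: Dict[str, dict] = {}
app1, app2 = AppServer("app1", shared), AppServer("app2", shared)
app1.login("s1", "hieu")
print(app2.whoami("s1"))   # hieu
```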

#3: Cache hit rate decides almost everything. At an 80% hit rate, only 20% of reads reach the database; at 95%, only 5% do. The hit rates differ by just 15 percentage points, yet database read load falls by a further 75% (from 20% to 5% of reads, a 4× reduction). This is why cache strategy matters so much → Tuan-06-Cache-Strategy.
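The arithmetic behind #3, as a small Python sketch. The 10,000 reads/s figure is a hypothetical load. Only read traffic is modelled, since writes still go to the database regardless of the cache.

```python
def db_read_share(hit_rate: float) -> float:
    """Fraction of reads that miss the cache and reach the database."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 - hit_rate


def extra_reduction(old_hit: float, new_hit: float) -> float:
    """Relative drop in DB read load when the hit rate goes from old_hit to new_hit."""
    old_load, new_load = db_read_share(old_hit), db_read_share(new_hit)
    return (old_load - new_load) / old_load


reads_per_s = 10_000  # hypothetical read QPS
for hit in (0.80, 0.95, 0.99):
    print(f"hit rate {hit:.0%}: {reads_per_s * db_read_share(hit):,.0f} DB reads/s")
print(f"80% -> 95% cuts DB reads by a further {extra_reduction(0.80, 0.95):.0%}")
```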

#4: Read replicas resolve most database bottlenecks, because most systems are read-heavy. Only when read replicas plus caching are no longer enough should you start thinking about sharding.

#5: A CDN is not only about performance; it also helps security. A CDN can hide the origin IP, absorb DDoS traffic, and enforce SSL/TLS at the edge. Cloudflare has a free plan → there is little reason not to use one.

#6: The load balancer creates the “illusion of a single server”. Users cannot tell whether there are 2 or 200 servers behind it; they see a single endpoint. This is one of the most elegant abstractions in system design.

Common Pitfalls

Pitfall 1: Premature Optimization

Wrong: On day one, set up a 10-node Kubernetes cluster, Kafka, and a Redis Cluster for an app with 100 users.
Right: Start with 1 server. Scale when CPU is consistently above 70% or response time exceeds 500ms. Measure first, optimize second.
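One way to make "consistently" concrete is to scale only when most samples in a monitoring window breach the limit, so a single spike does not trigger it. A minimal Python sketch: the 80% sustained fraction, the 10-sample window and the function names are assumptions, while the 70% CPU and 500 ms limits come from the text.

```python
from typing import Sequence


def sustained_breach(samples: Sequence[float], limit: float,
                     min_fraction: float = 0.8) -> bool:
    """True if at least `min_fraction` of the samples in the window exceed `limit`."""
    if not samples:
        return False
    over = sum(1 for s in samples if s > limit)
    return over / len(samples) >= min_fraction


def should_scale(cpu_percent: Sequence[float], response_ms: Sequence[float],
                 cpu_limit: float = 70.0, latency_limit_ms: float = 500.0) -> bool:
    """Scale only on a sustained breach of either limit, not on a single spike."""
    return (sustained_breach(cpu_percent, cpu_limit)
            or sustained_breach(response_ms, latency_limit_ms))


# Hypothetical 1-minute samples over a 10-minute window
print(should_scale([95, 30, 35, 40, 30, 32, 31, 29, 33, 30], [120] * 10))  # one spike
print(should_scale([78, 81, 75, 90, 72, 85, 77, 74, 80, 88], [180] * 10))  # sustained
```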

Pitfall 2: Scaling without fixing the root cause

Wrong: The API is slow → add 5 more servers.
Right: The API is slow → profile first. A single DB index (5 minutes of work) may be enough, instead of 5 more servers ($500/month). N+1 queries are one of the most common causes.
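A minimal sqlite3 sketch of the N+1 pattern versus a single JOIN, using hypothetical `users` and `posts` tables. The index on `posts.user_id` stands in for the "add one index" fix. Query counts are tracked by hand.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_db(n_users: int = 100) -> sqlite3.Connection:
    """In-memory DB with one post per user; table names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
        CREATE INDEX idx_posts_user_id ON posts (user_id);
    """)
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, n_users + 1)])
    conn.executemany("INSERT INTO posts (user_id, title) VALUES (?, ?)",
                     [(i, f"post by user{i}") for i in range(1, n_users + 1)])
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> Tuple[Dict[str, List[str]], int]:
    users = conn.execute("SELECT id, name FROM users").fetchall()  # 1 query
    result: Dict[str, List[str]] = {}
    queries = 1
    for user_id, name in users:                                     # + N queries
        rows = conn.execute("SELECT title FROM posts WHERE user_id = ?", (user_id,))
        result[name] = [title for (title,) in rows]
        queries += 1
    return result, queries


def titles_joined(conn: sqlite3.Connection) -> Tuple[Dict[str, List[str]], int]:
    rows = conn.execute(
        "SELECT u.name, p.title FROM users u LEFT JOIN posts p ON p.user_id = u.id"
    )
    result: Dict[str, List[str]] = {}
    for name, title in rows:                                        # 1 query total
        result.setdefault(name, [])
        if title is not None:                                       # user with no posts
            result[name].append(title)
    return result, 1


conn = make_db()
lazy, n_lazy = titles_n_plus_one(conn)
joined, n_joined = titles_joined(conn)
assert lazy == joined
print(f"N+1: {n_lazy} queries, JOIN: {n_joined} query")
```

Against a remote database each of those extra queries is a network round trip, which is why N+1 hurts far more in production than in an in-memory demo.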

Pitfall 3: Forgetting about replication lag

Wrong: A user creates a post → is redirected to their profile → the query hits a replica → the post is not there yet → the user panics.
Right: After a write, read that user's data from the master for a few seconds (read-after-write consistency), or use synchronous replication for critical paths.
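A minimal Python sketch of read-after-write routing: after a user writes, their reads go to the master for a short window, then back to a replica. The 5-second window and the `ReadRouter` name are assumptions.

The last-write timestamps sit in process memory only for brevity. In a stateless deployment they would live in Redis or a cookie; otherwise another app server would not know about the write.

```python
import time
from typing import Callable, Dict


class ReadRouter:
    """Send a user's reads to the master for `window_s` seconds after their last write."""

    def __init__(self, window_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.clock = clock                       # injectable for testing
        self._last_write: Dict[str, float] = {}  # user_id -> time of last write

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id: str) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.window_s:
            return "master"   # the replica may not have this write yet
        return "replica"      # old enough: assume replication has caught up


router = ReadRouter()
router.record_write("user-1")
print(router.target_for_read("user-1"))  # master: just wrote
print(router.target_for_read("user-2"))  # replica: no recent write
```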

Pitfall 4: Caching without an invalidation strategy

Wrong: Cache the user profile for 24 hours. The user changes their avatar → still sees the old one for up to 24h.
Right: Invalidate the cache entry as soon as the data is written, or use a short TTL (5-15 minutes) plus event-driven invalidation.
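A minimal Python sketch of cache-aside with invalidate-on-write. A tiny in-process `TTLCache` serves as a hypothetical stand-in for Redis `SETEX`/`GET`/`DEL`, and a dict serves as the database.

The write updates the database first and then deletes the cache key. A concurrent reader can still re-cache a stale value in that gap, which is one reason to keep a short TTL as a backstop.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for Redis SETEX/GET/DEL."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:       # expired: behave like a miss
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl_s, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class ProfileService:
    """Cache-aside reads + invalidate-on-write; `db` is a dict standing in for a database."""

    def __init__(self, db: Dict[str, Dict], cache: TTLCache) -> None:
        self.db, self.cache = db, cache

    def get_profile(self, user_id: str) -> Dict:
        cached = self.cache.get(f"user:{user_id}")
        if cached is not None:
            return cached
        profile = dict(self.db[user_id])                 # cache miss: read from the DB
        self.cache.set(f"user:{user_id}", profile)
        return profile

    def update_avatar(self, user_id: str, avatar: str) -> None:
        self.db[user_id] = {**self.db[user_id], "avatar": avatar}  # write the DB first
        self.cache.delete(f"user:{user_id}")             # then drop the stale entry


svc = ProfileService({"42": {"avatar": "old.png"}}, TTLCache(ttl_s=600))
svc.get_profile("42")                    # miss, now cached
svc.update_avatar("42", "new.png")
print(svc.get_profile("42")["avatar"])   # new.png, not the cached old.png
```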

Pitfall 5: State that lives on a single server but must be shared

Wrong: Store uploaded files in /tmp on the server. Server 1 receives the file, Server 2 processes it → the file is not there.
Right: Store files in object storage (S3, MinIO), which every server can access.

Pitfall 6: The database is the bottleneck, but you keep scaling app servers

Wrong: A DB query takes 2 seconds. Add 10 app servers → still 2 seconds, because the DB is the bottleneck.
Right: Identify the bottleneck with monitoring. If it is the DB → add caching, read replicas, optimize queries. If it is app-server CPU → add app servers.

Pitfall 7: Forgetting health checks

Wrong: A server crashes but the load balancer keeps sending it traffic → 50% of requests fail (with two servers).
Right: A health check endpoint plus a load balancer health check interval, so an unhealthy server is removed from the pool automatically in under 30s.
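How fast a dead server leaves the pool is roughly the probe interval times the number of consecutive failures required. A minimal Python sketch of that bookkeeping. The 5 s interval and the 3-failure / 2-success thresholds are hypothetical defaults, not those of any particular load balancer.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HealthTracker:
    interval_s: float = 5.0          # seconds between probes
    unhealthy_threshold: int = 3     # consecutive failures before removal
    healthy_threshold: int = 2       # consecutive successes before re-adding
    in_pool: Dict[str, bool] = field(default_factory=dict)
    _fails: Dict[str, int] = field(default_factory=dict)
    _oks: Dict[str, int] = field(default_factory=dict)

    def approx_removal_s(self) -> float:
        """Roughly how long a dead server keeps receiving traffic (ignores probe timeouts)."""
        return self.interval_s * self.unhealthy_threshold

    def report(self, server: str, healthy: bool) -> bool:
        """Record one probe result; return whether the server is in the pool afterwards."""
        self.in_pool.setdefault(server, True)    # new servers start in the pool
        if healthy:
            self._fails[server] = 0
            self._oks[server] = self._oks.get(server, 0) + 1
            if self._oks[server] >= self.healthy_threshold:
                self.in_pool[server] = True
        else:
            self._oks[server] = 0
            self._fails[server] = self._fails.get(server, 0) + 1
            if self._fails[server] >= self.unhealthy_threshold:
                self.in_pool[server] = False
        return self.in_pool[server]


lb = HealthTracker()
print([lb.report("app1", ok) for ok in [True, False, False, False, True, True]])
print(f"dead server removed after ~{lb.approx_removal_s():.0f}s")
```

Requiring a few consecutive successes before re-adding a server avoids flapping when it recovers only intermittently.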

9. Practice Exercises

Exercise 1: Redraw the Architecture Evolution

Sketch by hand (or in draw.io) the architecture of a blog application as it evolves through 5 stages:

Exercise 2: Hands-on Docker Compose

Using the docker-compose.yml above, build a simple app and test it:

Run docker-compose up
Call GET /api/whoami 10 times → check that the load balancer routes to both servers
Stop one container (docker stop app1) → check that traffic automatically moves to app2
Restart app1 → check that traffic is balanced again
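A small Python helper for the first steps, using only the standard library. The base URL and the server IDs returned by /api/whoami are assumptions about the docker-compose.yml above. `probe_whoami` and `tally_servers` are hypothetical names.

```python
import json
import urllib.request
from collections import Counter
from typing import Iterable


def tally_servers(responses: Iterable[dict]) -> Counter:
    """Count how many /api/whoami responses each server produced."""
    return Counter(r.get("server", "unknown") for r in responses)


def probe_whoami(base_url: str, n: int = 10) -> Counter:
    """Call GET {base_url}/api/whoami n times and tally the answering servers."""
    responses = []
    for _ in range(n):
        with urllib.request.urlopen(f"{base_url}/api/whoami", timeout=5) as resp:
            responses.append(json.load(resp))
    return tally_servers(responses)
```

With the stack running, `probe_whoami("http://localhost")` should report both servers. After `docker stop app1`, only app2 should appear.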

Exercise 3: Estimates for a real application

Pick an application you are building or plan to build, and estimate:

DAU, QPS (avg & peak)
How many app servers are needed?
How much memory does the cache need?
When do you need to add a read replica?

→ Apply the framework from Tuan-02-Back-of-the-envelope
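A back-of-the-envelope sketch in Python for the first three questions. Every default is a placeholder to replace with your own measurements. The assumed values are a 2× peak factor, 200 QPS per app server, and caching 20% of daily reads (the common 80/20 rule of thumb).

```python
import math

SECONDS_PER_DAY = 86_400


def estimate(dau: int, requests_per_user: float, peak_factor: float = 2.0,
             qps_per_server: float = 200.0) -> dict:
    """Rough capacity numbers; every default here is a placeholder."""
    avg_qps = dau * requests_per_user / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    # At least 2 app servers so a single failure is not an outage (no SPOF)
    servers = max(2, math.ceil(peak_qps / qps_per_server))
    return {"avg_qps": avg_qps, "peak_qps": peak_qps, "app_servers": servers}


def cache_gb(daily_reads: int, bytes_per_item: int, hot_fraction: float = 0.2) -> float:
    """Upper-bound cache size if the hottest `hot_fraction` of daily reads is cached."""
    return daily_reads * hot_fraction * bytes_per_item / 1e9


plan = estimate(dau=1_000_000, requests_per_user=20)
print({k: round(v, 1) for k, v in plan.items()})
print(f"cache ≈ {cache_gb(daily_reads=20_000_000, bytes_per_item=500):.1f} GB")
```

The cache figure is an upper bound because repeated reads of the same item are counted more than once.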

10. Internal Links: Connections to Other Weeks

Week	Topic	Relevance
Tuan-02-Back-of-the-envelope	Estimation	How to calculate when you need to scale
Tuan-03-Networking-DNS-CDN	DNS, CDN	CDN and networking in detail
Tuan-04-Consistent-Hashing	Consistent Hashing	Solves the resharding problem
Tuan-05-Load-Balancer	Load Balancer deep dive	Algorithms, L4 vs L7, health checks
Tuan-06-Cache-Strategy	Cache patterns	Cache-aside, write-through, eviction
Tuan-07-Database-Sharding-Replication	DB scaling	Replication, sharding strategies
Tuan-08-Message-Queue	Async processing	Kafka, RabbitMQ patterns
Tuan-09-Rate-Limiter	Rate limiting	Protecting the system from abuse
Tuan-13-Monitoring-Observability	Monitoring	Real-world measurement vs estimation
Tuan-15-Data-Security-Encryption	Security	Encryption at rest & in transit

References

Alex Xu, System Design Interview, Chapter 1: Scale from Zero to Millions of Users
Martin Kleppmann, Designing Data-Intensive Applications, Chapter 1: Reliable, Scalable, and Maintainable Applications
sdi.anhvy.dev: Vietnamese System Design Reference
highscalability.com: Real-world architecture case studies
Nick Craver, Stack Overflow: The Architecture (2016 Edition): nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/
Tuan-02-Back-of-the-envelope: next week, Back-of-the-envelope Estimation

Next week: Tuan-02-Back-of-the-envelope, quick estimation skills for making architecture decisions

lthieu's notes
