Case Study: Design a Unique ID Generator in Distributed Systems

“Mỗi công dân có một số CMND/CCCD duy nhất. Mỗi sự kiện trong hệ thống phân tán cũng cần một ID duy nhất --- không trùng lặp, không xung đột, dù hàng triệu máy chủ cùng tạo ra cùng lúc. Đó chính là bài toán Unique ID Generator.”

Tags: system-design case-study unique-id snowflake distributed-systems alex-xu interview Student: Hieu Prerequisite: Tuan-01-Scale-From-Zero-To-Millions · Tuan-02-Back-of-the-envelope · Tuan-07-Database-Sharding-Replication Lien quan: Tuan-10-Consistent-Hashing · Tuan-16-Design-URL-Shortener · Tuan-20-Design-Key-Value-Store

1. Context & Why --- Tai sao can Unique ID Generator?

1.1 Analogy: So CMND/CCCD cho he thong

Hieu, em hay nghi the nay: moi cong dan Viet Nam co mot so CMND/CCCD duy nhat. So nay giup nha nuoc phan biet 100 trieu nguoi --- khong ai trung nhau. Trong he thong phan tan, moi event, moi record, moi transaction cung can mot “so CMND” --- mot globally unique identifier.

The gioi thuc	He thong phan tan
So CMND/CCCD	Unique ID
Co quan cong an cap	ID Generator service
Khong trung giua cac tinh	Khong trung giua cac datacenter
Co the tra cuu theo thoi gian	Sortable by time
Co dinh dang (12 so)	Co dinh dang (64-bit integer)

1.2 Tai sao khong dung `AUTO_INCREMENT` trong DB?

Khi he thong chi co mot database duy nhat, AUTO_INCREMENT la du. Nhung khi em scale ra nhieu DB server, nhieu datacenter:

Van de	Giai thich
Single point of failure	Mot DB di xuong = khong tao duoc ID
Bottleneck	Moi request phai goi DB de lay ID --- latency cao
Khong scale horizontally	10K+ IDs/sec se lam DB qua tai
Khong work across datacenter	2 DB o 2 DC co the tao trung ID
Khong sortable by time	`AUTO_INCREMENT` chi tang dan, khong chua thong tin thoi gian

Aha Moment: AUTO_INCREMENT la anti-pattern trong distributed systems. Em can mot co che tao ID khong can coordination giua cac node.

2. Deep Dive --- Alex Xu 4-Step Framework

Overview --- Bon buoc cua Alex Xu

Buoc	Ten goi	Thoi gian (45 phut)
1	Understand the Problem & Establish Design Scope	~5 phut
2	Propose High-Level Design	~15 phut
3	Design Deep Dive	~20 phut
4	Wrap Up	~5 phut

Step 1 --- Understand the Problem & Establish Design Scope

2.1 Functional Requirements (Yeu cau chuc nang)

Cau hoi cho Interviewer	Tra loi / Gia dinh
ID phai unique globally?	Co, khong bao gio trung lap
ID chi chua so hay co the chua chu cai?	Chi chua so (numerical values only)
ID co can sortable theo thoi gian?	Co, ID tao sau phai lon hon ID tao truoc (roughly)
Do dai ID?	64-bit integer (fit vao `BIGINT` cua SQL)
He thong can tao bao nhieu ID/giay?	10,000+ IDs/sec
He thong co nhieu datacenter?	Co, nhieu datacenter, nhieu machine

Tom tat Functional Requirements:

Globally unique --- Khong bao gio trung lap, ke ca across datacenter
64-bit integer --- Fit vao BIGINT, dung lam primary key
Sortable by time --- ID tao sau > ID tao truoc (roughly ordered)
High throughput --- 10,000+ IDs/sec per system

2.2 Non-Functional Requirements (Yeu cau phi chuc nang)

Yeu cau	Muc tieu	Giai thich
High Availability	99.99% uptime	ID generation khong bao gio duoc down --- moi service deu phu thuoc
Low Latency	< 1ms per ID	ID generation nam tren critical path cua moi write operation
Scalability	Linear scale voi so machine	Them machine = them throughput, khong can coordination
Ordering	Time-sortable	IDs tao trong cung millisecond tren cung machine phai co thu tu
No coordination	Khong can central authority	Moi node tu tao ID doc lap

Step 2 --- Propose High-Level Design: 4 Approaches

Hieu, co 4 cach tiep can chinh de tao Unique ID trong distributed systems. Moi cach co trade-off rieng.

Approach 1: Multi-Master Replication (DB Auto-Increment voi buoc nhay)

Y tuong: Dung AUTO_INCREMENT cua DB, nhung moi server tang voi buoc nhay khac nhau.

Server 1: 1, 3, 5, 7, 9, ...    (start=1, increment=2)
Server 2: 2, 4, 6, 8, 10, ...   (start=2, increment=2)

Tong quat voi N servers:
Server k: k, k+N, k+2N, k+3N, ...

Uu diem	Nhuoc diem
Don gian, dung san feature cua DB	Kho scale: them/bot server phai thay doi `increment` cua TAT CA server
Khong can them service nao	ID khong sortable by time (Server 1 co the tao ID 7 SAU khi Server 2 tao ID 8)
Tuong doi reliable	Khong work across datacenter (latency cao)
	Throughput bi gioi han boi DB write speed

Khi nao dung: He thong nho, it server, khong can time-sorted IDs.

Approach 2: UUID (Universally Unique Identifier)

Y tuong: Moi machine tu tao UUID 128-bit doc lap, khong can coordination.

UUID v4 Example: 550e8400-e29b-41d4-a716-446655440000
                 |--------|    |  |    |  |-----------|
                  random    version  variant   random

Uu diem	Nhuoc diem
Don gian nhat, moi ngon ngu deu ho tro	128-bit (khong fit vao yeu cau 64-bit)
Khong can coordination giua cac node	Khong sortable by time
Khong co single point of failure	Hieu suat DB kem: random UUID lam B+ tree index fragmentation
Throughput vo han (moi node tu tao)	Dai (36 chars string) --- ton storage

Collision probability cua UUID v4:

P (collision) \approx 1 - e^{- \frac{n ^{2}}{2 \times 2 ^{122}}}

Voi $n = 1 0^{18}$ UUIDs (1 quintillion):

P (collision) \approx \frac{( 1 0 ^{18} ) ^{2}}{2 \times 2 ^{122}} = \frac{1 0 ^{36}}{2 ^{123}} \approx 9.4 \times 1 0^{- 2} \approx 0.094

Thuc te collision probability cuc thap, nhung khong phai zero va khong dam bao unique bang Snowflake.

Khi nao dung: Khi khong can sortable, khong gioi han 64-bit, can trien khai nhanh (vd: session ID, correlation ID, idempotency key).

Approach 3: Ticket Server (Flickr Approach)

Y tuong: Dung mot (hoac vai) server trung tam voi AUTO_INCREMENT lam “may ban ve”. Moi service goi den Ticket Server de lay ID tiep theo.

          +------------------+
          |  Ticket Server   |
          |  AUTO_INCREMENT  |
          |  current_id: 1000|
          +--------+---------+
                   |
      +-----+-----+-----+
      |     |     |     |
   Svc A  Svc B  Svc C  Svc D
   id=1001 id=1002 id=1003 id=1004

Uu diem	Nhuoc diem
Don gian, de implement	Single Point of Failure (SPOF)
ID la so, 64-bit, tang dan	Bottleneck: moi request phai goi Ticket Server
Sortable by time (roughly)	Latency tang khi cross-datacenter
Flickr dung thanh cong nhieu nam	Can replicate Ticket Server de HA --- them complexity

Flickr’s solution: Dung 2 Ticket Servers voi increment = 2, giong Multi-Master nhung chi danh rieng cho ID generation.

Khi nao dung: He thong vua, chap nhan single point of coordination, can ID tang dan don gian.

Approach 4: Snowflake (Twitter Approach) --- Recommended

Y tuong: Moi ID la mot 64-bit integer duoc compose tu nhieu thanh phan: timestamp, datacenter ID, machine ID, va sequence number. Moi machine tu tao ID doc lap.

+---+-------------------------------------------+---------+---------+----------------+
| 0 |          41-bit Timestamp                  | 5-bit   | 5-bit   | 12-bit         |
|   |          (milliseconds since epoch)        |Datacenter| Machine | Sequence       |
+---+-------------------------------------------+---------+---------+----------------+
 1 bit                41 bits                     5 bits    5 bits     12 bits = 64 bits

Uu diem	Nhuoc diem
64-bit integer --- fit yeu cau	Phu thuoc vao dong ho may (clock)
Sortable by time (timestamp trong ID)	Clock skew giua cac machine gay ra out-of-order IDs
Khong can coordination	Can quan ly datacenter ID va machine ID
Throughput cuc cao: 4,096 IDs/ms/machine	Time-sorted chi trong pham vi “roughly” --- khong tuyet doi
De extract timestamp tu ID	Epoch overflow sau 69 nam

Aha Moment: Snowflake la cach tiep can duoc su dung nhieu nhat trong thuc te. Twitter, Discord, Instagram, Baidu deu dung bien the cua Snowflake.

Bang so sanh toan bo Approaches

Tieu chi	Multi-Master	UUID	Ticket Server	Snowflake	ULID	KSUID	MongoDB ObjectId
Bit size	64	128	64	64	128	160	96
Globally unique	Co (voi cau hinh dung)	Co (cuc cao)	Co	Co	Co	Co	Co
Sortable by time	Khong	Khong (v4)	Co	Co	Co	Co	Co
No coordination	Khong	Co	Khong	Co	Co	Co	Co
Throughput	Thap (DB bound)	Vo han	Thap (bottleneck)	4,096/ms/machine	Vo han	Vo han	16M/sec/process
Complexity	Thap	Cuc thap	Trung binh	Trung binh	Thap	Thap	Thap
DB index friendly	Co	Khong (random)	Co	Co	Co	Co	Co
Dung trong thuc te	MySQL replication	Session ID, correlation ID	Flickr	Twitter, Discord, Instagram	Shopify, CockroachDB	Segment	MongoDB
Fit 64-bit?	Co	Khong	Co	Co	Khong	Khong	Khong

Cac bien the khac (Ngoai Alex Xu)

ULID (Universally Unique Lexicographically Sortable Identifier):

128-bit: 48-bit timestamp (millisecond) + 80-bit randomness
Encoded thanh 26-char Crockford Base32
Sortable by time, khong can coordination
Su dung: Shopify, CockroachDB

KSUID (K-Sortable Unique Identifier):

160-bit: 32-bit timestamp (second) + 128-bit random payload
Encoded thanh 27-char Base62
Epoch: 2014-05-13 --- overflow sau ~136 nam
Su dung: Segment

MongoDB ObjectId:

96-bit (12 bytes): 32-bit timestamp (second) + 40-bit random + 24-bit counter
Sortable by time (second resolution)
Counter cho phep 16,777,216 unique IDs per second per process

Step 3 --- Design Deep Dive: Snowflake ID Generator

3.1 Bit Layout Chi Tiet

 0 | 00000000 00000000 00000000 00000000 00000000 0 | 00000 | 00000 | 000000000000
 ^ |                    41 bits                      | 5 bits| 5 bits|   12 bits
 |                       |                              |       |         |
Sign bit             Timestamp                     Datacenter Machine  Sequence
(luon = 0)     (ms since custom epoch)               ID       ID      Number

Thanh phan	So bit	Range	Giai thich
Sign bit	1	Luon = 0	Dam bao ID luon duong (positive)
Timestamp	41	0 --- $2^{41} - 1$ ms	Millisecond ke tu custom epoch
Datacenter ID	5	0 --- 31	Toi da 32 datacenter
Machine ID	5	0 --- 31	Toi da 32 machine per datacenter
Sequence	12	0 --- 4095	Counter reset moi millisecond

3.2 Cach tao ID (Thuat toan)

Moi khi mot service request ID:

1. Lay current_timestamp (ms since epoch)
2. Neu current_timestamp == last_timestamp:
     sequence = (sequence + 1) AND 0xFFF   // 12-bit mask
     Neu sequence == 0:  // overflow! da tao 4096 IDs trong 1ms
       Wait cho next millisecond
3. Neu current_timestamp > last_timestamp:
     sequence = 0   // reset counter
4. Neu current_timestamp < last_timestamp:
     CLOCK SKEW DETECTED! -> Handle error
5. last_timestamp = current_timestamp
6. Compose ID:
     id = (timestamp << 22) | (datacenter_id << 17) | (machine_id << 12) | sequence
7. Return id

3.3 Epoch Selection (Chon epoch)

Snowflake KHONG dung Unix epoch (1970-01-01). Tai sao?

Max timestamp = 2^{41} - 1 = 2, 199, 023, 255, 551 ms \approx 69.73 nam

Neu dung Unix epoch (1970-01-01):

Overflow date = 1970 + 69.73 \approx 2039-09-07

Da qua gan! Nen chon custom epoch = ngay he thong go-live:

Custom Epoch	Overflow Date	Con lai (tu 2026)
2010-01-01 (Twitter)	~2079	~53 nam
2015-01-01 (Discord)	~2084	~58 nam
2020-01-01	~2089	~63 nam
2025-01-01 (recommended cho he thong moi)	~2094	~68 nam
Unix epoch 1970-01-01	~2039	~13 nam

Aha Moment: Chon custom epoch gan voi ngay deploy de toi uu hoa thoi gian song cua he thong. Discord chon 2015-01-01 --- ngay dau tien cua Discord.

3.4 Clock Sync va NTP (Network Time Protocol)

Van de cot loi: Snowflake phu thuoc vao timestamp --- nhung dong ho cua cac machine KHONG BAO GIO dong bo hoan hao.

NTP (Network Time Protocol) dong bo dong ho giua cac may:

Stratum 0: Atomic clock, GPS receiver (do chinh xac ~nanosecond)
Stratum 1: Server ket noi truc tiep voi Stratum 0 (~microsecond)
Stratum 2: Server sync voi Stratum 1 (~millisecond)
Stratum 3+: Moi tang them ~1-10ms drift

  GPS Satellite / Atomic Clock
         |
    [Stratum 0]
         |
    [Stratum 1]  <-- Google, Amazon NTP servers
         |
    [Stratum 2]  <-- Datacenter NTP servers
         |
    [Stratum 3]  <-- Application servers (cac machine cua em)

Clock drift thuc te:

Server thuong: drift ~100ms/ngay neu khong co NTP
Voi NTP sync moi 30 giay: drift < 1ms
Google TrueTime (Spanner): uncertainty < 7ms (dung GPS + atomic clock)

3.5 Clock Skew Handling (Xu ly lech dong ho)

Scenario: Machine A co dong ho cham hon Machine B 5ms.

Machine A tao ID voi timestamp = 1000
Machine B tao ID voi timestamp = 1005
Thuc te hai ID duoc tao cung luc --- nhung ID cua B “co ve” tao sau

Cac cach xu ly:

Strategy	Mo ta	Trade-off
Wait	Neu `current_time < last_time`, doi cho den khi `current_time >= last_time`	Don gian nhung co the block neu clock nhay lui nhieu
Reject	Neu clock skew > threshold (vd: 5ms), reject request va raise alarm	An toan nhung co the mat ID trong thoi gian ngan
Logical clock	Dung Lamport timestamp hoac Hybrid Logical Clock (HLC) de dam bao causality	Phuc tap hon nhung dam bao ordering
Tolerance window	Chap nhan ID out-of-order trong mot window nho (vd: 10ms)	Thuc te nhat, hau het cac he thong dung cach nay

Cach Twitter Snowflake xu ly: Neu current_timestamp < last_timestamp, throw exception va log warning. Operations team se dieu tra NTP issue.

Aha Moment: Clock skew la unavoidable trong distributed systems. Snowflake khong dam bao strict global ordering --- chi “roughly sorted”. Neu can strict ordering, phai dung single-machine approach (Ticket Server) hoac Hybrid Logical Clock.

3.6 Sequence Overflow Scenarios

Khi mot machine tao > 4,096 IDs trong 1 millisecond:

Millisecond T=1000:
  ID #0:    ...1000...0000 00000000
  ID #1:    ...1000...0000 00000001
  ...
  ID #4095: ...1000...1111 11111111  <- MAX sequence
  ID #4096: OVERFLOW! sequence quay ve 0
            -> Phai WAIT cho T=1001

Handling strategies:

Strategy	Mo ta	Khi nao dung
Spin-wait	Busy-loop cho den next millisecond	Default trong Snowflake, OK cho burst ngan
Sleep	`Thread.sleep(1)`	Giam CPU nhung co the mat chinh xac
Pre-generate pool	Tao truoc mot pool IDs, tra ve tu pool	High-throughput services, giam latency spike
Borrow from future	Tao ID voi timestamp = T+1 va sequence = 0	Nguy hiem --- co the gay duplicate neu clock correction xay ra

3.7 Ordering Guarantees

Pham vi	Dam bao?	Giai thich
Cung machine, cung millisecond	Co, strict ordering	Sequence number tang dan
Cung machine, khac millisecond	Co, strict ordering	Timestamp lon hon = ID lon hon
Khac machine, cung datacenter	Khong dam bao	Clock skew co the lam out-of-order
Khac datacenter	Khong dam bao	Clock skew + network latency

ID_{A} > ID_{B} ⟺ ⎩ ⎨ ⎧ timestamp_{A} > timestamp_{B} OR timestamp_{A} = timestamp_{B} \land datacenter_{A} > datacenter_{B} OR timestamp_{A} = timestamp_{B} \land datacenter_{A} = datacenter_{B} \land machine_{A} > machine_{B} OR all equal \land sequence_{A} > sequence_{B}

Chu y: Dieu nay co nghia la 2 ID tu 2 machine khac nhau co the co timestamp giong nhau nhung thu tu so sanh phu thuoc vao datacenter_id va machine_id, khong phai thoi gian thuc.

3. Back-of-the-Envelope Estimation

3.1 IDs per Millisecond per Machine

IDs/ms/machine = 2^{12} = 4, 096

3.2 IDs per Second per Machine

IDs/sec/machine = 4, 096 \times 1, 000 = 4, 096, 000 \approx 4.1 M IDs/sec

3.3 Total Capacity voi N Machines

Total capacity = N_{datacenter} \times N_{machine} \times 4, 096, 000 IDs/sec

Voi max configuration (32 datacenter x 32 machine = 1,024 machines):

Max total = 1, 024 \times 4, 096, 000 = 4, 194, 304, 000 \approx 4.2 B IDs/sec

3.4 Yeu cau 10K IDs/sec can bao nhieu machine?

N_{machine} = ⌈ \frac{10 , 000}{4 , 096 , 000} ⌉ = 1

Chi can 1 machine la thua suc cho 10K IDs/sec. Thuc te, ta deploy nhieu machine de High Availability, khong phai vi throughput.

3.5 Timestamp Overflow

Max duration = \frac{2 ^{41} - 1}{1 , 000 \times 60 \times 60 \times 24 \times 365.25} \approx 69.73 nam

Voi custom epoch = 2025-01-01:

Overflow date \approx 2025 + 69.73 = 2094

3.6 Tong so ID co the tao trong 69 nam

Total IDs (1 machine) = 2^{41} \times 2^{12} = 2^{53} \approx 9.007 \times 1 0^{15} \approx 9 quadrillion

Total IDs (all machines) = 2^{41} \times 2^{5} \times 2^{5} \times 2^{12} = 2^{63} \approx 9.2 \times 1 0^{18}

3.7 Storage Estimation

Moi ID la 8 bytes (64-bit):

Storage (1B IDs) = 1 0^{9} \times 8 bytes = 8 GB

So sanh voi UUID (16 bytes) hoac UUID string (36 bytes):

Format	Size per ID	1 Billion IDs	DB Index Impact
Snowflake (64-bit)	8 bytes	8 GB	Sequential --- B+ tree insert O(1) amortized
UUID binary (128-bit)	16 bytes	16 GB	Random --- B+ tree split O(log n)
UUID string (36 char)	36 bytes	36 GB	Random + larger index pages

4. Security First --- Bao mat cho ID Generator

4.1 ID Predictability Attacks (Tan cong doan truoc ID)

Threat: Neu attacker biet ID structure, ho co the:

Attack	Mo ta	Impact
Enumeration attack	Doan ID cua record khac bang cach tang/giam ID	Truy cap du lieu trai phep
Timing attack	Extract timestamp tu ID de biet thoi gian tao record	Information leakage
Volume estimation	So sanh 2 IDs de uoc luong so record giua chung	Business intelligence leak
Infrastructure mapping	Extract datacenter_id va machine_id	Biet so luong server, datacenter topology

Vi du Enumeration:

Attacker thay order_id = 7100424390656 00001 00001 000000000001
                                        ^DC    ^Machine
Ho biet: datacenter 1, machine 1
Ho tang sequence: 000000000002, 000000000003, ...
-> Co the duyet qua toan bo orders cua machine do

4.2 Information Leakage tu Snowflake ID

Thanh phan	Thong tin bi lo	Muc do nghiem trong
Timestamp (41 bit)	Thoi diem chinh xac (ms) tao record	Trung binh --- attacker biet khi nao user dang ky, dat hang
Datacenter ID (5 bit)	Topology cua he thong	Thap --- nhung huu ich cho targeted attack
Machine ID (5 bit)	So luong va vi tri server	Thap --- co the dung cho DDoS targeting
Sequence (12 bit)	Traffic volume tai thoi diem do	Trung binh --- business intelligence

4.3 Mitigation Strategies (Chien luoc giam thieu)

Strategy	Mo ta	Trade-off
Khong expose internal ID	Dung public-facing ID khac (UUID hoac encrypted ID) cho API, giu Snowflake ID internal	Them mot lop mapping, tang storage
Encrypt ID truoc khi expose	Dung Format-Preserving Encryption (FPE) de encrypt 64-bit ID thanh 64-bit ciphertext	Van giu duoc size, nhung mat sortability
Shuffle bits	Xao tron thu tu cac bit de timestamp khong con o vi tri co dinh	Mat sortability, can lookup table
Add random bits	Thay datacenter+machine (10 bit) bang random bits	Mat kha nang trace ve source, nhung an toan hon
Hash-based public ID	`public_id = HMAC-SHA256(secret_key, snowflake_id)[:8]`	Mot chieu, khong reverse duoc, nhung can DB lookup
Rate limit API	Gioi han so request per user/IP de chan enumeration	Khong giai quyet root cause nhung giam impact

Recommended approach cho hau het he thong:

Internal: Snowflake ID (dung cho DB, inter-service communication)
External: Encrypted Snowflake ID hoac UUID mapping (dung cho public API)

Database:
  +------------------+------------------+
  | snowflake_id (PK)| public_id (UNIQUE)|
  | 71004243906560001 | a8f3k2m9x1       |
  +------------------+------------------+

4.4 OWASP Considerations

OWASP Risk	Ap dung cho ID Generator	Giai phap
A01 Broken Access Control	Attacker dung predicted ID de truy cap resource cua nguoi khac	Authorization check TREN MOI REQUEST, khong chi dua vao ID
A04 Insecure Design	Expose Snowflake ID tren public API	Dung public ID mapping hoac FPE encryption
A09 Security Logging	Khong log ai request ID nao	Log moi ID generation request voi caller identity

Aha Moment: ID Generator ban than khong co loi bao mat --- loi la o cach em expose ID ra ngoai. Luon co mot lop abstraction giua internal ID va public-facing ID.

5. DevOps --- Van hanh ID Generator

5.1 Monitoring Metrics

Metric	Mo ta	Alert Threshold
`id_generation_rate`	So ID tao per second per machine	Spike bat thuong > 2x baseline
`id_generation_latency_p99`	Latency de tao 1 ID	> 1ms (thuong < 0.01ms)
`sequence_overflow_count`	So lan sequence overflow (4096/ms)	> 0 per minute = can them machine hoac review traffic
`clock_skew_detected`	So lan `current_time < last_time`	> 0 = NTP issue, can dieu tra ngay
`ntp_offset_ms`	Do lech giua local clock va NTP server	> 10ms = critical alert
`ntp_sync_failures`	So lan NTP sync that bai	> 3 consecutive failures = critical
`machine_id_conflicts`	2 machine co cung datacenter_id + machine_id	> 0 = IMMEDIATE action (duplicate IDs!)

5.2 Clock Drift Detection

# prometheus-alerts.yml
groups:
  - name: id_generator_alerts
    rules:
      # Clock skew detected
      - alert: ClockSkewDetected
        expr: increase(clock_skew_detected_total[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Clock skew detected on {{ $labels.instance }}"
          description: "Machine {{ $labels.machine_id }} detected backward clock movement. IDs may be out of order."
 
      # NTP offset too high
      - alert: NTPOffsetHigh
        expr: abs(ntp_offset_milliseconds) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "NTP offset > 10ms on {{ $labels.instance }}: {{ $value }}ms"
 
      # NTP sync failure
      - alert: NTPSyncFailure
        expr: increase(ntp_sync_failures_total[15m]) > 3
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "NTP sync failing on {{ $labels.instance }}"
          description: "NTP has failed to sync 3+ times in 15 minutes. Clock drift will increase."
 
      # Sequence overflow
      - alert: SequenceOverflow
        expr: rate(sequence_overflow_total[1m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Sequence overflow on machine {{ $labels.machine_id }}"
          description: "Machine is generating > 4096 IDs/ms. Consider adding more machines."
 
      # ID generation latency spike
      - alert: IDGenerationLatencyHigh
        expr: histogram_quantile(0.99, rate(id_generation_duration_seconds_bucket[5m])) > 0.001
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 ID generation latency > 1ms on {{ $labels.instance }}"

5.3 NTP Monitoring Best Practices

Practice	Mo ta
Dung nhieu NTP source	Cau hinh it nhat 3 NTP server (vd: `time.google.com`, `time.aws.com`, `pool.ntp.org`)
Monitor NTP offset lien tuc	Export `ntp_offset_ms` metric qua node_exporter hoac custom exporter
Set NTP sync interval	Default 64-1024 giay, co the giam xuong 16-32 giay cho ID generator machine
Dung `chrony` thay `ntpd`	`chrony` handle drift tot hon, khoi dong nhanh hon, chinh xac hon
Avoid NTP step (nhay dong ho)	Cau hinh NTP chi slew (dieu chinh tu tu) thay vi step (nhay dot ngot)
Hardware timestamping	Dung NIC co ho tro hardware timestamping de giam jitter

5.4 Machine ID Management

Strategy	Mo ta	Khi nao dung
Static config	Cau hinh `datacenter_id` va `machine_id` trong config file hoac env var	He thong nho, it thay doi
ZooKeeper/etcd	Moi machine dang ky voi ZooKeeper de nhan machine_id unique	He thong lon, can auto-assignment
Cloud metadata	Dung cloud instance metadata (AWS instance ID, GCP zone) de tinh machine_id	Cloud-native systems
MAC address hash	Hash MAC address thanh 10-bit (datacenter + machine)	Don gian nhung co the conflict

Pitfall: Khi mot machine restart hoac bi thay the, machine_id PHAI duoc bao toan hoac de-register truoc khi assign cho machine moi. Neu 2 machine co cung ID chay dong thoi --- duplicate IDs se xay ra.

5.5 Grafana Dashboard Panels

Panel	Query	Visualization
ID Generation Rate	`sum(rate(ids_generated_total[1m])) by (machine_id)`	Time series, stacked
Latency P50/P95/P99	`histogram_quantile(0.99, rate(id_gen_duration_bucket[5m]))`	Time series, multi-line
Sequence Overflow Events	`increase(sequence_overflow_total[1h])`	Stat panel (should be 0)
NTP Offset per Machine	`ntp_offset_milliseconds`	Time series, threshold line at 10ms
Active Machines	`count(up{job="id-generator"} == 1)`	Gauge
Clock Skew Events (24h)	`increase(clock_skew_detected_total[24h])`	Stat panel (should be 0)

6. System Design Diagrams

6.1 Snowflake Bit Layout

block-beta
    columns 4
    block:header:4
        A["Snowflake ID: 64-bit Layout"]
    end
    block:bits:4
        B["0"]
        C["Timestamp (41 bits)"]
        D["DC(5) + Machine(5)"]
        E["Sequence (12 bits)"]
    end
    block:detail:4
        F["Sign<br/>(always 0)"]
        G["Milliseconds since<br/>custom epoch<br/>(~69.7 years)"]
        H["Datacenter: 0-31<br/>Machine: 0-31<br/>(1,024 nodes max)"]
        I["0-4095 per ms<br/>(4.096M IDs/sec<br/>per machine)"]
    end

    style B fill:#6c757d,color:#fff
    style C fill:#0d6efd,color:#fff
    style D fill:#198754,color:#fff
    style E fill:#dc3545,color:#fff
    style F fill:#6c757d,color:#fff
    style G fill:#0d6efd,color:#fff
    style H fill:#198754,color:#fff
    style I fill:#dc3545,color:#fff

6.2 Multi-Datacenter ID Generation Architecture

graph TB
    subgraph "Client Services"
        SvcA["Order Service"]
        SvcB["User Service"]
        SvcC["Payment Service"]
        SvcD["Inventory Service"]
    end

    subgraph "Load Balancer Layer"
        LB["Load Balancer<br/>(Route to nearest DC)"]
    end

    subgraph "Datacenter 1 (US-East) — dc_id=00001"
        direction TB
        subgraph "ID Generator Cluster DC1"
            M1_1["Machine 1<br/>machine_id=00001<br/>Snowflake Generator"]
            M1_2["Machine 2<br/>machine_id=00010<br/>Snowflake Generator"]
            M1_3["Machine 3<br/>machine_id=00011<br/>Snowflake Generator"]
        end
        NTP1["NTP Server DC1<br/>(Stratum 2)<br/>Sync: time.google.com"]
        ZK1["ZooKeeper Node 1<br/>(Machine ID Registry)"]
    end

    subgraph "Datacenter 2 (EU-West) — dc_id=00010"
        direction TB
        subgraph "ID Generator Cluster DC2"
            M2_1["Machine 1<br/>machine_id=00001<br/>Snowflake Generator"]
            M2_2["Machine 2<br/>machine_id=00010<br/>Snowflake Generator"]
        end
        NTP2["NTP Server DC2<br/>(Stratum 2)<br/>Sync: time.aws.com"]
        ZK2["ZooKeeper Node 2<br/>(Machine ID Registry)"]
    end

    subgraph "Datacenter 3 (AP-Southeast) — dc_id=00011"
        direction TB
        subgraph "ID Generator Cluster DC3"
            M3_1["Machine 1<br/>machine_id=00001<br/>Snowflake Generator"]
            M3_2["Machine 2<br/>machine_id=00010<br/>Snowflake Generator"]
        end
        NTP3["NTP Server DC3<br/>(Stratum 2)<br/>Sync: pool.ntp.org"]
        ZK3["ZooKeeper Node 3<br/>(Machine ID Registry)"]
    end

    subgraph "Monitoring"
        Prometheus["Prometheus<br/>(Metrics Collection)"]
        Grafana["Grafana<br/>(Dashboard)"]
        AlertManager["AlertManager<br/>→ Slack / PagerDuty"]
    end

    SvcA & SvcB & SvcC & SvcD --> LB
    LB --> M1_1 & M1_2 & M1_3
    LB --> M2_1 & M2_2
    LB --> M3_1 & M3_2

    NTP1 -.->|sync| M1_1 & M1_2 & M1_3
    NTP2 -.->|sync| M2_1 & M2_2
    NTP3 -.->|sync| M3_1 & M3_2

    ZK1 -.->|assign machine_id| M1_1 & M1_2 & M1_3
    ZK2 -.->|assign machine_id| M2_1 & M2_2
    ZK3 -.->|assign machine_id| M3_1 & M3_2

    ZK1 <-->|replicate| ZK2 <-->|replicate| ZK3

    M1_1 & M1_2 & M1_3 & M2_1 & M2_2 & M3_1 & M3_2 --> Prometheus
    Prometheus --> Grafana
    Prometheus --> AlertManager

    style M1_1 fill:#0d6efd,color:#fff
    style M1_2 fill:#0d6efd,color:#fff
    style M1_3 fill:#0d6efd,color:#fff
    style M2_1 fill:#198754,color:#fff
    style M2_2 fill:#198754,color:#fff
    style M3_1 fill:#dc3545,color:#fff
    style M3_2 fill:#dc3545,color:#fff
    style NTP1 fill:#f9a825,stroke:#333
    style NTP2 fill:#f9a825,stroke:#333
    style NTP3 fill:#f9a825,stroke:#333
    style ZK1 fill:#6f42c1,color:#fff
    style ZK2 fill:#6f42c1,color:#fff
    style ZK3 fill:#6f42c1,color:#fff
    style Prometheus fill:#e65100,color:#fff
    style Grafana fill:#388e3c,color:#fff

6.3 Clock Sync Flow

sequenceDiagram
    participant AC as Atomic Clock<br/>(Stratum 0)
    participant S1 as NTP Stratum 1<br/>(time.google.com)
    participant S2 as NTP Stratum 2<br/>(DC NTP Server)
    participant IDGen as ID Generator<br/>(Machine)
    participant Mon as Monitoring<br/>(Prometheus)

    Note over AC,S1: Stratum 0 → 1: ~microsecond accuracy
    AC->>S1: Time reference signal

    Note over S1,S2: Stratum 1 → 2: ~millisecond accuracy
    S2->>S1: NTP Request (T1)
    S1-->>S2: NTP Response (T2, T3)
    S2->>S2: Calculate offset<br/>offset = ((T2-T1) + (T3-T4)) / 2

    Note over S2,IDGen: Stratum 2 → 3: ~1-10ms accuracy
    loop Every 16-32 seconds
        IDGen->>S2: NTP Request
        S2-->>IDGen: NTP Response
        IDGen->>IDGen: Adjust clock (slew, not step)
    end

    IDGen->>Mon: Export ntp_offset_ms metric

    alt NTP offset > 10ms
        Mon->>Mon: Fire NTPOffsetHigh alert
        Mon-->>IDGen: Alert → Ops team investigates
    end

    alt Clock goes backward
        IDGen->>IDGen: DETECT: current_time < last_time
        IDGen->>Mon: Increment clock_skew_detected counter
        IDGen->>IDGen: WAIT until current_time >= last_time
        Mon->>Mon: Fire ClockSkewDetected alert
    end

6.4 ID Generation Request Flow

sequenceDiagram
    participant Svc as Client Service
    participant LB as Load Balancer
    participant IDGen as ID Generator<br/>(Machine #5, DC #2)
    participant Clock as System Clock

    Svc->>LB: Request new ID
    LB->>IDGen: Forward to nearest generator

    IDGen->>Clock: Get current_timestamp_ms
    Clock-->>IDGen: timestamp = 1742300000123

    alt timestamp == last_timestamp
        IDGen->>IDGen: sequence++
        alt sequence > 4095 (overflow)
            IDGen->>IDGen: WAIT for next millisecond
            IDGen->>Clock: Get current_timestamp_ms
            Clock-->>IDGen: timestamp = 1742300000124
            IDGen->>IDGen: sequence = 0
        end
    else timestamp > last_timestamp
        IDGen->>IDGen: sequence = 0
    else timestamp < last_timestamp (CLOCK SKEW!)
        IDGen->>IDGen: Log warning + wait
    end

    IDGen->>IDGen: Compose ID:<br/>= (timestamp << 22)<br/>| (dc_id << 17)<br/>| (machine_id << 12)<br/>| sequence

    IDGen-->>LB: Return ID = 7300811265438109697
    LB-->>Svc: ID = 7300811265438109697

    Note over Svc: Extract info from ID:<br/>timestamp: 2025-03-18T12:00:00.123Z<br/>datacenter: 2<br/>machine: 5<br/>sequence: 1

7. Aha Moments & Common Pitfalls

Aha Moments --- Duc ket

#1 --- Clock skew la dieu khong the tranh khoi trong distributed systems: Khong co cach nao dong bo hoan hao dong ho giua cac machine. Snowflake chap nhan “roughly sorted” thay vi “strictly sorted”. Neu can strict global ordering --- dung Lamport Clock, Hybrid Logical Clock, hoac Google TrueTime.

#2 --- UUID vs Snowflake khong phai chon 1 bo 1: UUID phu hop cho idempotency key, session ID, correlation ID (khong can sort, khong can compact). Snowflake phu hop cho primary key, event ID, message ID (can sort, can fit 64-bit). Nhieu he thong dung ca hai cho cac muc dich khac nhau.

#3 --- DB index performance voi random vs sequential IDs: Day la ly do lon nhat de khong dung UUID lam primary key trong relational DB. B+ tree index yeu sequential insert --- moi record moi append vao cuoi leaf node. Random UUID gay page split lien tuc, lam giam write throughput 2-5x va tang disk I/O.

#4 --- 1 machine Snowflake tao du 4.1M IDs/sec: Throughput khong phai ly do deploy nhieu machine. Ly do chinh la High Availability --- neu 1 machine chet, cac machine khac van tao ID binh thuong.

#5 --- Custom epoch la quyet dinh mot lan, khong the thay doi: Chon epoch sai (qua xa trong qua khu) = lang phi bit timestamp. Chon epoch trong tuong lai = ID am. Chon epoch = ngay go-live la tot nhat.

#6 --- Snowflake ID co the bi reverse-engineer: Bat ky ai co ID cung co the extract timestamp, datacenter, machine, va sequence. Dung bao gio expose Snowflake ID truc tiep tren public API neu em quan tam security.

Common Pitfalls --- Sai lam thuong gap

Pitfall 1: Khong xu ly clock skew

Sai: “Dong ho may luon chinh xac, khong can lo.” Dung: Clock skew LUON xay ra. NTP step, leap second, VM migration, hardware issue --- tat ca deu co the lam dong ho nhay lui. Phai co logic detect va handle.

Pitfall 2: Dung UUID lam primary key roi thac mac sao DB cham

Sai: “UUID unique, dung lam PK luon cho tien.” Dung: Random UUID gay B+ tree fragmentation. Neu phai dung UUID, dung UUID v7 (time-ordered) hoac ULID. Hoac dung Snowflake ID lam PK va UUID lam public-facing ID.

Pitfall 3: Hard-code machine_id

Sai: machine_id = 1 trong config file, deploy len 5 server. Dung: 5 server cung machine_id = 1 = duplicate IDs. Phai co co che assign unique machine_id --- ZooKeeper, etcd, hoac cloud metadata.

Pitfall 4: Khong monitor sequence overflow

Sai: “4096 IDs/ms la du roi, khong can lo.” Dung: Traffic spike, retry storm, hoac bug co the gay burst > 4096/ms. Sequence overflow = ID generator phai wait = latency spike. Phai monitor va alert.

Pitfall 5: Quen tinh epoch overflow date

Sai: Dung Unix epoch 1970 cho he thong deploy 2025. Dung: $1970 + 69.73 = 2039$ . Chi con 13 nam! Chon custom epoch = 2025 de co 69 nam runway.

Pitfall 6: Nghi Snowflake dam bao global ordering

Sai: “ID lon hon = tao sau, luon luon.” Dung: Chi dung tren cung 1 machine. Giua 2 machine khac nhau, clock skew co the lam ID nho hon duoc tao sau ID lon hon. Snowflake chi dam bao roughly time-sorted, khong phai strictly time-sorted.

Pitfall 7: Khong co ke hoach cho sau khi epoch overflow

Sai: “69 nam nua moi het, khong phai viec cua minh.” Dung: Document ro overflow date. Dat calendar reminder. Len ke hoach migration: tang timestamp len 42+ bit (giam machine/sequence bits), hoac chon epoch moi va migrate.

8. Wrap Up --- Tom tat cho Interviewer

Buoc	Noi dung chinh
Step 1: Scope	64-bit, globally unique, time-sortable, 10K+ IDs/sec, multi-datacenter
Step 2: High-Level	4 approaches: Multi-Master, UUID, Ticket Server, Snowflake (chon)
Step 3: Deep Dive	Bit layout (1+41+5+5+12), clock sync (NTP), clock skew handling, sequence overflow
Step 4: Wrap Up	Monitoring (clock drift, overflow), Security (ID enumeration), Alternatives (ULID, KSUID)

Interview tip: Khi interviewer hoi “Design a Unique ID Generator”, ho muon nghe:

Em hoi cau hoi de lam ro requirements (khong nhay vao Snowflake ngay)
Em trinh bay nhieu options roi chon co ly do
Em hieu trade-offs (clock skew, ordering guarantees, security)
Em biet van hanh he thong (monitoring, alerting, NTP)

9. Internal Links --- Lien ket noi bo

Topic	Link	Lien quan
Scaling	Tuan-01-Scale-From-Zero-To-Millions	Tai sao can distributed ID khi scale
Capacity Estimation	Tuan-02-Back-of-the-envelope	Cong thuc IDs/sec, overflow calculation
Database Sharding	Tuan-07-Database-Sharding-Replication	ID lam shard key, sequential vs random impact
Consistent Hashing	Tuan-10-Consistent-Hashing	Phan phoi ID generation across nodes
Monitoring	Tuan-13-Monitoring-Observability	Clock drift monitoring, alerting
Security	Tuan-14-AuthN-AuthZ-Security · Tuan-15-Data-Security-Encryption	ID enumeration, information leakage
URL Shortener	Tuan-16-Design-URL-Shortener	Snowflake ID dung lam base cho short code
Key-Value Store	Tuan-20-Design-Key-Value-Store	ID lam key trong distributed KV store

Tham khao

Alex Xu, System Design Interview --- Chapter 7: Design a Unique ID Generator in Distributed Systems
sdi.anhvy.dev --- Vietnamese System Design Reference
Twitter Snowflake (GitHub) --- Original Snowflake implementation
Announcing Snowflake (Twitter Engineering Blog, 2010)
Discord Snowflake IDs --- Discord’s Snowflake variant
Instagram Sharding & IDs --- Instagram’s ID generation
ULID Spec --- Universally Unique Lexicographically Sortable Identifier
KSUID (Segment) --- K-Sortable Unique Identifier
Baidu uid-generator --- Baidu’s Snowflake variant
Sony Sonyflake --- Sony’s Snowflake variant (focus on machine ID from IP)
NTP RFC 5905 --- Network Time Protocol v4
Google TrueTime (Spanner Paper) --- Globally-consistent distributed database

Case Study lien quan: Tuan-16-Design-URL-Shortener (Snowflake ID dung de tao short code) · Tuan-20-Design-Key-Value-Store (ID lam key trong distributed store)

lthieu's notes

Explorer

Case-Design-Unique-ID-Generator