Case Study: Design a Proximity Service (Nearby Businesses)

“Mỗi ngày hàng trăm triệu người mở Google Maps hỏi: ‘Quán phở nào gần đây nhất?’ — Câu hỏi tưởng đơn giản, nhưng phía sau là một bài toán geospatial indexing phức tạp, nơi toán học gặp engineering ở quy mô hành tinh.”

Tags: system-design proximity-service geospatial geohash quadtree alex-xu case-study vol2 Student: Hieu Prerequisite: Tuan-02-Back-of-the-envelope · Tuan-06-Cache-Strategy · Tuan-07-Database-Sharding-Replication Lien quan: Tuan-05-Load-Balancer · Tuan-09-Rate-Limiter · Tuan-10-Consistent-Hashing · Tuan-13-Monitoring-Observability Reference: Alex Xu, System Design Interview — An Insider’s Guide, Volume 2 — Chapter 1: Proximity Service

1. Context & Why — Tai sao Proximity Service quan trong?

1.1 Analogy — Bai toan doi thuong

Hieu, hay tuong tuong 2 tinh huong hang ngay:

Tinh huong 1 — Google Maps: Em dang o quan 1 TP.HCM, mo Google Maps va go “pho”. Trong vong 0.5 giay, ban do hien ra 20 quan pho gan nhat, sap xep theo khoang cach. Phia sau cai “0.5 giay” do la gi? Lam sao tu toa do GPS cua em (10.7769, 106.7009), he thong tim duoc 20 quan pho trong hang tram trieu dia diem tren the gioi?

Tinh huong 2 — Grab tim tai xe: Em dat Grab, app can tim tat ca tai xe trong ban kinh 3km quanh vi tri cua em. Nhung tai xe di chuyen lien tuc — vi tri cua ho thay doi moi 3-5 giay. He thong phai tra loi cau hoi “ai gan em nhat?” cho hang trieu user dong thoi.

Ca hai tinh huong nay deu la bai toan Proximity Service — cho mot toa do (latitude, longitude) va ban kinh (radius), tim tat ca cac diem (businesses, tai xe, ban be) trong vung do.

1.2 Tai sao bai toan nay kho?

Van de	Giai thich	Quy mo
Du lieu khong lo	200 trieu businesses tren toan cau	~200M records
Two-dimensional search	Khong the dung B-tree index binh thuong — latitude va longitude la 2 chieu doc lap	O(N) naive search
Read-heavy	Hang tram trieu user search dong thoi, nhung businesses it khi thay doi	Read:Write ~ 1000:1
Low latency	User ky vong ket qua trong < 500ms	p99 < 200ms
Global scale	User va businesses o khap the gioi, moi region co density khac nhau	Tokyo dense, Sahara sparse

1.3 Real-world Systems su dung Proximity Service

Company	Use Case	Scale
Google Maps / Places	Tim nha hang, gas station, ATM	200M+ places
Uber / Grab	Tim tai xe gan nhat	Millions of active drivers
Yelp	Tim businesses theo category + location	~200M businesses
Facebook Nearby Friends	Tim ban be dang o gan	2B+ users
Tinder	Tim nguoi match trong ban kinh	75M+ users
DoorDash	Tim nha hang co delivery	390K+ merchants

Step 1 — Understand the Problem & Establish Design Scope

1.1 Clarifying Questions (Cau hoi lam ro)

Trong interview, luon hoi truoc khi thiet ke. Day la cac cau hoi quan trong:

Cau hoi	Tra loi	Ghi chu
User co the chi dinh ban kinh search?	Co — 0.5km, 1km, 2km, 5km, 20km	Default 5km
Radius toi da la bao nhieu?	20km	Gioi han de tranh query qua rong
Businesses co the update thong tin?	Co — nhung thay doi phan anh vao ngay hom sau la chap nhan duoc	Near real-time khong bat buoc
User co the di chuyen va search dong thoi?	Co, nhung khong can auto-refresh lien tuc — chi khi user bam search lai	Khac voi Nearby Friends
Can sap xep ket qua theo gi?	Khoang cach, hoac ranking/rating	Distance la primary
Business info gom nhung gi?	Ten, dia chi, toa do, category, gio mo cua, rating, anh	Rich metadata

1.2 Functional Requirements (Yeu cau chuc nang)

FR1: Cho toa do (lat, lng) va ban kinh (radius), tra ve danh sach businesses trong vung do
FR2: Business owners co the them/sua/xoa thong tin business (CRUD)
FR3: Users co the xem chi tiet business (ten, dia chi, rating, gio mo cua, anh)
FR4: Ket qua co the loc theo category (nha hang, cafe, gas station…)

1.3 Non-functional Requirements (Yeu cau phi chuc nang)

Yeu cau	Muc tieu	Giai thich
Low Latency	p99 < 200ms cho search	User ky vong ket qua nhanh
High Availability	99.99% uptime	LBS la core feature cua Maps
High Scalability	200M DAU, 100M businesses	Global scale
Eventual Consistency	Business updates co the delay vai phut	Khong can real-time cho business info
Read-heavy	Read QPS >> Write QPS	Search nhieu hon update

1.4 Capacity Estimation (Uoc luong nang luc)

Assumptions:

Thong so	Gia tri	Giai thich
DAU	200M	Quy mo Google Maps
Businesses	100M	Toan cau
Searches/user/day	5	Trung binh
Business updates/day	100K	Them/sua/xoa

QPS Calculation — Search:

T o t a l se a rc h es / d a y = 200 M \times 5 = 1 B se a rc h es / d a y

S e a rc h QP S_{a vg} = \frac{1 \times 1 0 ^{9}}{86 , 400} \approx 11, 574 se a rc h es / s \approx 12 K QPS

S e a rc h QP S_{p e ak} = 12 K \times 5 = 60 K QPS

Nhan xet: 60K peak QPS cho read — day la read-heavy system. Write QPS chi khoang 1-2 QPS cho business updates. Ty le Read:Write ~ 60,000:1.

Write QPS — Business Updates:

W r i t e QP S_{a vg} = \frac{100 , 000}{86 , 400} \approx 1.16 w r i t es / s

Write QPS cuc thap — hoan toan co the handle boi 1 primary database server.

Storage — Business Data:

B u s in ess recor d s i ze \approx 1 K B (m e t a d a t a) + 5 K B (ima g es re f s) = 6 K B

T o t a l b u s in ess s t or a g e = 100 M \times 6 K B = 600 GB

Storage — Geospatial Index:

G eos p a t ia l in d e x e n t ry \approx 8 b y t es (b u s in ess_i d) + 8 b y t es (l a t) + 8 b y t es (l n g) + 12 b y t es (g eo ha s h) = 36 b y t es

T o t a l g eos p a t ia l in d e x = 100 M \times 36 b y t es = 3.6 GB

Quan trong: 3.6 GB geospatial index vua du de load toan bo vao memory. Day la insight then chot — ta co the build in-memory index (Quadtree) tren 1 server.

Cache Sizing:

Ho t b u s in esses (t o p 20%) = 100 M \times 20% = 20 M b u s in esses

C a c h e s i ze = 20 M \times 6 K B = 120 GB

Su dung Redis cluster voi vai node la du. Tham chieu Tuan-06-Cache-Strategy.

Quadtree Memory Estimation (se deep dive o section 3):

I n t er na l n o d e s i ze \approx 32 b y t es (4 p o in t ers + m e t a d a t a)

L e a f n o d e s i ze \approx 8 b y t es (b u s in ess_i d p o in t er) \times a vg 100 b u s in esses = 800 b y t es + 32 b y t es = 832 b y t es

T o t a l l e a f n o d es \approx \frac{100 M}{100} = 1 M l e a v es

T o t a l in t er na l n o d es \approx \frac{1 M}{3} \approx 333 K n o d es

Q u a d t ree m e m ory \approx (333 K \times 32) + (1 M \times 832) \approx 10.6 MB + 832 MB \approx 843 MB

Aha Moment: Quadtree cho 100M businesses chi chiem khoang 843 MB RAM — thoai mai fit trong 1 server co 8 GB RAM. Day la ly do nhieu he thong chon Quadtree lam in-memory geospatial index.

Tom tat Estimation

Metric	Value
Search QPS (peak)	~60K/s
Write QPS (peak)	~5/s
Business data storage	~600 GB
Geospatial index size	~3.6 GB
Cache size (hot data)	~120 GB
Quadtree memory	~843 MB
Read:Write ratio	~60,000:1

Step 2 — High-Level Design

2.1 Kien truc tong quan — 2 Service chinh

Proximity Service bao gom 2 loai service co dac tinh hoan toan khac nhau:

Service	Chuc nang	Dac tinh	Scale strategy
LBS (Location-Based Service)	Nhan toa do + radius, tra ve businesses gan nhat	Read-heavy, stateless, no write	Horizontal scaling, read replicas
Business Service	CRUD operations cho businesses	Write operations, low QPS	Single primary + replicas

Tai sao tach 2 service? Vi read va write co pattern hoan toan khac nhau:

LBS: 60K QPS, stateless, can low latency → scale horizontally, dat sau Load Balancer
Business Service: 5 QPS, stateful, can consistency → single primary database

Principle: Tach read path va write path khi chung co scale requirements khac nhau. Day la CQRS (Command Query Responsibility Segregation) pattern — mot dang cua Tuan-11-Microservices-Pattern.

2.2 High-Level Architecture Diagram

flowchart TB
    subgraph Clients
        Mobile["Mobile App"]
        Web["Web App"]
    end

    LB["Load Balancer<br/>→ [[Tuan-05-Load-Balancer]]"]

    subgraph "Read Path (LBS)"
        LBS1["LBS Instance 1"]
        LBS2["LBS Instance 2"]
        LBSN["LBS Instance N"]
    end

    subgraph "Write Path"
        BS["Business Service"]
    end

    subgraph "Data Layer"
        Primary["Primary DB<br/>PostgreSQL"]
        R1["Read Replica 1"]
        R2["Read Replica 2"]
        RN["Read Replica N"]
    end

    subgraph "Geospatial Index"
        GI["Geohash Index<br/>(in DB read replicas)"]
        QT["Quadtree<br/>(in-memory, per LBS instance)"]
    end

    subgraph "Cache Layer"
        RC["Redis Cluster<br/>Business data cache<br/>Geohash → business_ids cache"]
    end

    Mobile --> LB
    Web --> LB
    LB -->|"GET /nearby"| LBS1 & LBS2 & LBSN
    LB -->|"POST/PUT/DELETE /businesses"| BS

    LBS1 & LBS2 & LBSN --> RC
    LBS1 & LBS2 & LBSN --> R1 & R2 & RN
    LBS1 & LBS2 & LBSN -.-> QT

    BS --> Primary
    Primary -->|"Replication"| R1 & R2 & RN

    R1 & R2 & RN --> GI

    style LB fill:#42a5f5,color:#fff
    style RC fill:#ef5350,color:#fff
    style Primary fill:#66bb6a,color:#fff
    style QT fill:#ffa726,color:#000

2.3 API Design

Search Nearby Businesses:

Field	Type	Description
`GET /v1/search/nearby`
`latitude`	double	Vi do cua user (-90 den 90)
`longitude`	double	Kinh do cua user (-180 den 180)
`radius`	int	Ban kinh search (meters). Default: 5000. Max: 20000
`category`	string	(Optional) Filter theo loai business
`page_token`	string	(Optional) Pagination token

Response:

Field	Type	Description
`businesses`	array	Danh sach businesses
`businesses[].id`	string	Business ID
`businesses[].name`	string	Ten business
`businesses[].lat`	double	Vi do
`businesses[].lng`	double	Kinh do
`businesses[].distance`	int	Khoang cach (meters)
`businesses[].rating`	float	Rating trung binh
`next_page_token`	string	Token cho trang tiep theo

Business CRUD:

Endpoint	Method	Description
`/v1/businesses`	POST	Tao business moi
`/v1/businesses/:id`	GET	Xem chi tiet business
`/v1/businesses/:id`	PUT	Cap nhat business
`/v1/businesses/:id`	DELETE	Xoa business

2.4 Data Flow — Read vs Write

Read Flow (Search):

1. User gui request GET /v1/search/nearby?lat=10.77&lng=106.70&radius=5000
2. Load Balancer route den 1 trong N LBS instances
3. LBS tinh geohash tu (lat, lng) va radius → xac dinh cac geohash cells can query
4. LBS check Redis cache: geohash_cells → list of business_ids
5. Neu cache hit → lay business details tu Business Data Cache
6. Neu cache miss → query Read Replica (geospatial index table)
7. Tinh khoang cach chinh xac, loc businesses ngoai radius
8. Sap xep theo distance, tra ve ket qua

Write Flow (Update Business):

1. Business owner gui PUT /v1/businesses/123
2. Load Balancer route den Business Service
3. Business Service update Primary DB
4. Primary DB replicate sang Read Replicas (async)
5. Cache invalidation: xoa cache entries lien quan
6. Neu dung Quadtree: rebuild scheduled (overnight hoac moi vai gio)

Trade-off: Business updates khong phan anh ngay lap tuc trong search results. Delay co the la vai phut (replication lag) den vai gio (quadtree rebuild). Day la chap nhan duoc vi businesses hiem khi thay doi vi tri.

Step 3 — Design Deep Dive

3.1 Bai toan cot loi: Geospatial Indexing

Hieu, day la phan kho nhat va quan trong nhat cua bai nay. Bai toan co ban la:

Cho diem P(lat, lng) va ban kinh r, tim tat ca businesses B sao cho distance(P, B) ⇐ r.

Cach naive — Two-dimensional search:

Cach don gian nhat: query truc tiep trong database.

SELECT * FROM businesses
WHERE latitude BETWEEN 10.77 - delta AND 10.77 + delta
  AND longitude BETWEEN 106.70 - delta AND 106.70 + delta

Van de: Ngay ca khi co index tren latitude VA index tren longitude, database phai:

Query index tren latitude → tap A (co the hang trieu records)
Query index tren longitude → tap B (co the hang trieu records)
Lay giao (intersection) A ∩ B → rat cham!

T im e co m pl e x i t y = O (N_{l a t}) + O (N_{l n g}) + O (in t ersec t)

Ly do: B-tree index chi hieu qua cho 1 chieu. Khi co 2 chieu doc lap (lat, lng), khong co B-tree nao co the index ca 2 dong thoi mot cach hieu qua.

Day la ly do ta can geospatial indexing — cac ky thuat chuyen biet de bien bai toan 2 chieu thanh bai toan 1 chieu hoac chia khong gian thanh cac o (cells) de thu hep pham vi search.

3.2 Approach 1 — Evenly Divided Grid

Y tuong: Chia toan bo ban do thanh cac o (cells) co kich thuoc bang nhau. Moi business thuoc ve dung 1 cell. Khi search, chi can tim cac businesses trong cell hien tai va cac cells lan can.

Van de nghiem trong:

Van de	Giai thich
Distribution khong deu	Tokyo co the co 100,000 businesses trong 1 cell, nhung Sahara co 0
Fixed granularity	Khong the adaptive — vung dong duc can grid nho hon, vung thua can grid lon hon
Qua nhieu empty cells	70% Trai Dat la nuoc → 70% cells rong
Edge cases	Business o sat ranh gioi cell → search can check nhieu cells

Ket luan: Evenly divided grid khong phu hop cho du lieu phan bo khong deu. Ta can cach tiep can thong minh hon.

3.3 Approach 2 — Geohash (Core Approach)

3.3.1 Geohash la gi?

Geohash la ky thuat bien toa do 2 chieu (lat, lng) thanh 1 chuoi string duy nhat bang cach chia doi xen ke (interleaving) latitude va longitude.

Quy trinh encoding (simplified):

Buoc 1 — Encode Longitude:

Bat dau voi khoang [-180, 180]
Chia doi: neu lng nam ben trai ([-180, 0]) → bit 0, ben phai ([0, 180]) → bit 1
Lap lai, chia doi tiep khoang duoc chon
Vi du: lng = 106.70 → [0, 180] → 1, [90, 180] → 1, [90, 135] → 1, [101.25, 112.5] → 1, …

Buoc 2 — Encode Latitude:

Tuong tu, bat dau voi [-90, 90]
Vi du: lat = 10.77 → [0, 90] → 1, [0, 45] → 0, [0, 22.5] → 0, …

Buoc 3 — Interleave Bits:

Xen ke: bit lon[0], bit lat[0], bit lon[1], bit lat[1], …
Ket qua la 1 chuoi bits

Buoc 4 — Encode thanh Base32:

Nhom 5 bits → 1 ky tu Base32 (0-9, b-z loai tru a, i, l, o)
Vi du: toa do TP.HCM (10.7769, 106.7009) → geohash: w3gvk1

3.3.2 Precision Levels — Do chinh xac

Geohash Length	Cell Width	Cell Height	Su dung
1	5,009 km	4,992 km	Toan bo luc dia
2	1,252 km	624 km	Quoc gia nho
3	156 km	156 km	Tinh / state
4	39.1 km	19.5 km	Thanh pho lon
5	4.89 km	4.89 km	Quan / district
6	1.22 km	0.61 km	Khu vuc nho — phu hop cho proximity search
7	153 m	153 m	Duong pho
8	38.2 m	19.1 m	Toa nha
9	4.77 m	4.77 m	Phong trong toa nha
12	3.7 cm	1.8 cm	Do chinh xac GPS max

Rule of thumb cho proximity search: Radius 5km → dung geohash length 5 (cell ~4.89 km). Radius 1km → dung geohash length 6 (cell ~1.22 km).

3.3.3 Mapping Radius to Geohash Precision

Radius	Geohash Precision	Giai thich
0.5 km	6	Cell 1.22km, can check cell hien tai + 8 neighbors
1 km	5	Cell 4.89km du cover
2 km	5	Cell 4.89km du cover
5 km	4	Cell 39.1km — du rong, co the tra ve qua nhieu ket qua
20 km	4	Cell 39.1km cover duoc

Chien luoc thuc te: Chon precision level sao cho cell size >= radius va query cell hien tai + 8 cells lan can (tong cong 9 cells). Dieu nay dam bao cover toan bo vung tron search.

3.3.4 Tinh chat quan trong cua Geohash

Tinh chat 1 — Shared prefix = Nearby (thuong la dung):

Hai diem co geohash chia se prefix cang dai → cang gan nhau.

Diem A	Diem B	Shared prefix	Khoang cach xap xi
`w3gvk1`	`w3gvk9`	`w3gvk` (5 chars)	< 5 km
`w3gvk1`	`w3gvm2`	`w3gv` (4 chars)	< 40 km
`w3gvk1`	`w3h001`	`w3` (2 chars)	< 1,252 km

Tinh chat 2 — Neighbor lookup:

Moi geohash cell co chinh xac 8 neighbors (tren, duoi, trai, phai, 4 goc). Co the tinh neighbors bang algorithm hieu qua trong O(1).

3.3.5 Edge Cases — Nhung truong hop dac biet

Edge Case 1 — Khac prefix nhung rat gan (Boundary Problem):

┌──────────┬──────────┐
│          │          │
│  w3gvk   │  w3gvm   │
│          │          │
│        A ● ● B     │
│          │          │
└──────────┴──────────┘

Diem A (w3gvk...) va B (w3gvm...) o 2 cells khac nhau — prefix khac hoan toan tu ky tu thu 5. Nhung thuc te chung chi cach nhau 50 met!

Giai phap: Luon query cell hien tai + 8 cells lan can. Dieu nay dam bao khong bo sot cac businesses o ranh gioi cell.

Edge Case 2 — Cung prefix nhung xa (Rare case):

Do cach Hilbert/Z-curve map 2D → 1D, co nhung truong hop 2 cells co cung prefix nhung khong thuc su lan can ve mat dia ly. Vi du: 2 cells o 2 ben cua duong ranh gioi encoding.

Giai phap: Sau khi lay danh sach businesses tu geohash cells, luon tinh khoang cach thuc te (Haversine formula) va loc bo nhung businesses ngoai radius.

d = 2 r arcsin (sin^{2} (\frac{ϕ _{2} - ϕ _{1}}{2}) + cos (ϕ_{1}) cos (ϕ_{2}) sin^{2} (\frac{λ _{2} - λ _{1}}{2}))

Trong do $r$ = ban kinh Trai Dat (6,371 km), $ϕ$ = latitude (radians), $λ$ = longitude (radians).

Edge Case 3 — Khong du businesses trong radius:

Khi search radius = 1km nhung chi co 2 businesses, user co the muon xem them. Chien luoc mo rong radius:

1. Search voi geohash precision tuong ung radius ban dau
2. Neu ket qua < min_results (vi du: 10)
3. Giam geohash precision 1 bac (mo rong vung search ~4x)
4. Search lai
5. Lap lai cho den khi du ket qua hoac dat max_radius

3.4 Approach 3 — Quadtree (Adaptive Grid)

3.4.1 Quadtree la gi?

Quadtree la cau truc du lieu hinh cay, trong do moi node co the co 0 hoac 4 children. No tu dong chia nho vung co nhieu businesses va giu nguyen vung it businesses.

Nguyen tac: Chia khong gian 2D thanh 4 phan. Neu 1 phan co nhieu hon K businesses (threshold, vi du K=100), tiep tuc chia phan do thanh 4. Lap lai cho den khi moi o co ⇐ K businesses.

3.4.2 Quy trinh Build Quadtree

Bat dau: Toan bo ban do la 1 node
  - Node chua 5,000,000 businesses (> threshold 100) → SPLIT

Node chia thanh 4 children:
  NW (Northwest): 1,200,000 businesses → SPLIT
  NE (Northeast): 2,500,000 businesses → SPLIT
  SW (Southwest): 800,000 businesses → SPLIT
  SE (Southeast): 500,000 businesses → SPLIT

Tiep tuc chia cho den khi moi leaf node co <= 100 businesses

Dac diem quan trong:

Dac diem	Giai thich
Adaptive	Vung Tokyo (dong duc) co cell nho ~100m. Vung Sahara co cell lon ~1000km
In-memory	Toan bo tree nam trong RAM — query cuc nhanh
Build time	Build 1 lan khi server khoi dong, hoac rebuild dinh ky
Not easily shareable	Moi server giu 1 ban copy cua tree — khong chia se qua network

3.4.3 Quadtree Structure — Chi tiet Node

Internal Node (node co children):

Field	Size	Giai thich
Top-left coordinates	16 bytes	(lat, lng) goc tren trai
Bottom-right coordinates	16 bytes	(lat, lng) goc duoi phai
Pointer to NW child	8 bytes
Pointer to NE child	8 bytes
Pointer to SW child	8 bytes
Pointer to SE child	8 bytes
Total	64 bytes

Leaf Node (node chua businesses):

Field	Size	Giai thich
Top-left coordinates	16 bytes	Ranh gioi vung
Bottom-right coordinates	16 bytes	Ranh gioi vung
List of business_ids	8 bytes x N	N ⇐ threshold (100)
Total	32 + 8N bytes	Voi N=100: ~832 bytes

3.4.4 Memory Estimation (Chi tiet)

T h res h o l d K = 100 b u s in esses p er l e a f

T o t a l l e a f n o d es = \frac{100 M}{100} = 1 M l e a v es

Quadtree la full 4-ary tree. Tong so internal nodes:

I n t er na l n o d es = \frac{l e a f n o d es - 1}{3} = \frac{1 M - 1}{3} \approx 333, 333 n o d es

M e m or y_{in t er na l} = 333, 333 \times 64 b y t es = 21.3 MB

M e m or y_{l e a v es} = 1 M \times (32 + 8 \times 100) b y t es = 1 M \times 832 b y t es = 832 MB

T o t a l Q u a d t ree m e m ory = 21.3 MB + 832 MB \approx 853 MB

Key insight: 853 MB — mot server voi 8 GB RAM co the chua thoai mai. Khong can distributed index! Day la dieu nhieu ung vien interview khong nhan ra.

3.4.5 Build Time Estimation

B u i l d t im e \approx O (N \times lo g_{4} N) \approx O (100 M \times lo g_{4} 100 M)

lo g_{4} (100 M) \approx lo g_{4} (1 0^{8}) \approx \frac{8}{lo g _{10} 4} \approx \frac{8}{0.602} \approx 13.3 l e v e l s

B u i l d t im e \approx 100 M \times 13.3 \approx 1.33 B o p er a t i o n s

Voi may tinh hien dai xu ly ~1 billion operations/second, build time khoang vai phut. Co the chap nhan cho startup time.

3.4.6 Quadtree Search Algorithm

Tim businesses trong radius R tu diem P(lat, lng):

1. Bat dau tu root node
2. Tim leaf node chua diem P (traverse tree theo toa do)
3. Tinh khoang cach tu P den moi business trong leaf node do
4. Neu vong tron (P, R) overlap voi cac neighboring cells:
   - Traverse len parent, check cac sibling nodes
   - Lap lai cho den khi vong tron duoc cover hoan toan
5. Loc businesses co distance > R
6. Sap xep theo distance, tra ve top N

S e a rc h t im e = O (lo g_{4} N + k)

Trong do $k$ = so businesses trong vung search. Voi $N = 100 M$ :

S e a rc h t im e = O (13.3 + k) \approx O (k) c h o k l o n

3.4.7 Update Strategy cho Quadtree

Van de: Businesses thay doi (them moi, dong cua, doi dia chi) — lam sao update Quadtree?

Option 1 — Rebuild dinh ky (Recommended):

Aspect	Detail
Frequency	Moi vai gio hoac overnight
Process	Build quadtree moi tu database, swap voi tree cu (atomic swap)
Downtime	Zero — tree cu van serve requests trong khi tree moi dang build
Phù hop khi	Business data thay doi cham (hang ngay), khong can real-time

Option 2 — Incremental update:

Aspect	Detail
Process	Them/xoa business truc tiep trong tree
Complexity	Cao — can handle split/merge khi node vuot threshold
Phù hop khi	Data thay doi thuong xuyen (vi du: driver locations trong Uber)

Cho bai toan Proximity Service (businesses), Option 1 la du vi businesses hiem khi thay doi vi tri.

3.5 Approach 4 — Google S2 Geometry Library

3.5.1 S2 la gi?

Google S2 la thu vien chuyen dung do Google phat trien, su dung Hilbert curve de map be mat hinh cau (Trai Dat) thanh cac S2 cells.

Diem khac biet voi Geohash:

Feature	Geohash	S2
Hinh dang cell	Hinh chu nhat tren ban do	Hinh thoi (rhombus) tren hinh cau
Mo hinh Trai Dat	Phang (flat projection)	Hinh cau (sphere)
Curve type	Z-order curve	Hilbert curve
Cell levels	1-12 (base32 chars)	0-30 (64-bit integer)
Do chinh xac	Tot, co boundary issues	Tot hon — Hilbert curve bao toan locality tot hon
Region covering	Manual (9 cells)	Tu dong — S2RegionCoverer tim toi uu cells

3.5.2 Hilbert Curve — Tai sao tot hon Z-curve?

Z-curve (Geohash) co van de “jumps” — doi khi 2 cells ke nhau tren ban do lai xa nhau tren Z-curve. Dieu nay gay ra cac boundary issues.

Hilbert curve dam bao: 2 diem gan nhau tren curve hau nhu luon gan nhau trong khong gian 2D. Tinh chat nay goi la locality-preserving — tot hon Z-curve dang ke.

3.5.3 S2 trong thuc te

Company	Su dung S2 cho
Google Maps	Core geospatial indexing
Uber H3	Inspired by S2, dung hexagonal cells
Foursquare	Place detection
MongoDB	2dsphere index su dung S2 internally

3.6 So sanh Geohash vs Quadtree vs S2

Tieu chi	Geohash	Quadtree	S2
Do phuc tap implement	Thap — chi la string operation	Trung binh — can build tree	Cao — can S2 library
Storage	Trong database (column)	In-memory tren moi server	Trong database (column)
Update	Update database record	Rebuild tree	Update database record
Adaptive density	Khong — fixed grid size	Co — tu dong chia nho vung dong duc	Co — S2RegionCoverer
Precision control	12 levels (base32)	Vo han (chia cho den khi du)	31 levels (0-30)
Boundary issues	Co — can query 8 neighbors	It — tree structure handle tu nhien	It nhat — Hilbert curve
Distributed	De — geohash la string, index binh thuong	Kho — tree phai o 1 server	De — S2 cell ID la integer
Database support	Redis, PostgreSQL, MySQL, DynamoDB	Phai tu implement	MongoDB (native), PostgreSQL (extension)
Phu hop cho	Hau het use cases, don gian	Dataset lon, can adaptive, in-memory ok	Google-scale, hinh cau chinh xac

Recommendation cho interview: Chon Geohash lam primary approach (de giai thich, database support tot). Nhac den Quadtree nhu alternative khi interviewer hoi ve adaptive density. Nhac den S2 de show depth of knowledge.

3.7 Database Schema Design

3.7.1 Business Table

Column	Type	Description	Index
`business_id`	UUID (PK)	ID duy nhat	Primary Key
`name`	VARCHAR(255)	Ten business	Full-text index
`address`	TEXT	Dia chi
`city`	VARCHAR(100)	Thanh pho
`country`	VARCHAR(2)	Ma quoc gia (ISO 3166)
`latitude`	DOUBLE	Vi do
`longitude`	DOUBLE	Kinh do
`category`	VARCHAR(50)	Loai business	Index
`rating`	DECIMAL(2,1)	Rating trung binh
`phone`	VARCHAR(20)	So dien thoai
`hours`	JSONB	Gio mo cua
`photos`	JSONB	URLs anh
`created_at`	TIMESTAMP
`updated_at`	TIMESTAMP		Index

3.7.2 Geospatial Index Table (Geohash approach)

Column	Type	Description	Index
`geohash`	VARCHAR(12)	Geohash cua business	Primary prefix index
`business_id`	UUID	FK to business table
`latitude`	DOUBLE	De tinh khoang cach chinh xac
`longitude`	DOUBLE	De tinh khoang cach chinh xac

Index Strategy: Tao compound index tren (geohash, business_id). Khi query, dung prefix search:

-- Tim businesses trong geohash cell "w3gvk" va cac neighbors
WHERE geohash LIKE 'w3gvk%'
   OR geohash LIKE 'w3gvm%'
   OR geohash LIKE 'w3gvj%'
   ... (tong 9 cells)

Optimization: Thay vi 9 queries rieng biet, dung IN clause hoac UNION ALL de batch thanh 1 query.

3.7.3 Database Architecture — Read Replicas

flowchart LR
    subgraph "Write Path"
        BS["Business Service"]
        Primary["Primary DB<br/>PostgreSQL"]
    end

    subgraph "Read Path"
        LBS["LBS Instances"]
        R1["Read Replica 1<br/>US-West"]
        R2["Read Replica 2<br/>US-East"]
        R3["Read Replica 3<br/>EU-West"]
        R4["Read Replica 4<br/>AP-Southeast"]
    end

    BS --> Primary
    Primary -->|"Async Replication"| R1 & R2 & R3 & R4
    LBS --> R1 & R2 & R3 & R4

    style Primary fill:#66bb6a,color:#fff
    style R1 fill:#42a5f5,color:#fff
    style R2 fill:#42a5f5,color:#fff
    style R3 fill:#42a5f5,color:#fff
    style R4 fill:#42a5f5,color:#fff

Tai sao Read Replicas theo region? Vi LBS la read-heavy (60K QPS peak). Moi region co read replicas rieng → giam latency (user o Singapore query replica o Singapore, khong phai di qua US). Tham chieu Tuan-07-Database-Sharding-Replication.

3.8 Caching Strategy

3.8.1 Cache Architecture

Su dung 2 lop cache:

Cache Layer 1 — Geohash to Business IDs:

Key	Value	TTL	Giai thich
`geo:w3gvk:restaurant`	`[id1, id2, id3, ...]`	1 hour	Danh sach business IDs trong geohash cell + category
`geo:w3gvk:*`	`[id1, id2, ...]`	1 hour	Tat ca businesses trong cell (khong filter category)

Cache Layer 2 — Business Details:

Key	Value	TTL	Giai thich
`biz:uuid-123`	`{name, lat, lng, rating, ...}`	24 hours	Chi tiet business

3.8.2 Cache Flow

Search request: lat=10.77, lng=106.70, radius=5000, category=restaurant

1. Tinh geohash: w3gvk (precision 5)
2. Tinh 8 neighbors: w3gvm, w3gvj, w3gvh, w3gvs, w3gvt, w3gve, w3gv7, w3gv6
3. Voi moi geohash cell:
   a. Check Redis: GET geo:w3gvk:restaurant
   b. Cache HIT → nhan list business_ids
   c. Cache MISS → query Read Replica → set cache
4. Merge tat ca business_ids tu 9 cells
5. Voi moi business_id:
   a. Check Redis: GET biz:uuid-123
   b. Cache HIT → nhan business details
   c. Cache MISS → query Read Replica → set cache
6. Tinh distance, filter, sort, return

3.8.3 Cache Invalidation

Event	Action	Ly do
Business update (name, hours, rating)	Invalidate `biz:{id}`	Chi tiet thay doi
Business move (lat/lng thay doi)	Invalidate `biz:{id}` + `geo:{old_hash}:` + `geo:{new_hash}:`	Vi tri thay doi → geohash cell thay doi
Business deleted	Invalidate `biz:{id}` + `geo:{hash}:*`	Xoa khoi tat ca cache
TTL expiry	Tu dong	Safety net

Tham chieu Tuan-06-Cache-Strategy cho cac pattern cache invalidation chi tiet.

3.9 Scaling Strategy

3.9.1 LBS — Stateless Horizontal Scaling

LBS instances la stateless (khong giu state nao). Moi request co the duoc xu ly boi bat ky instance nao.

L BS in s t an ces n ee d e d = \frac{QP S _{p e ak}}{QP S _{p er in s t an ce}}

= \frac{60 , 000}{2 , 000} = 30 in s t an ces (v o i m o i in s t an ce han d l e 2 K QPS)

Dat sau Load Balancer (Tuan-05-Load-Balancer), su dung round-robin hoac least-connections.

Neu dung Quadtree: Moi LBS instance giu 1 ban copy cua Quadtree trong memory. Khi scale them instance → instance moi build Quadtree tu database khi khoi dong (vai phut). Sau do serve requests binh thuong.

3.9.2 Database — Read Replicas per Region

Region	Read Replicas	Ly do
US-West	3	California, tech hub
US-East	3	New York, Washington DC
EU-West	2	London, Paris
AP-Southeast	3	Singapore, TP.HCM, Jakarta — high density
AP-Northeast	2	Tokyo, Seoul

T o t a l re a d re pl i c a s = 13

QP S_{p er re pl i c a} = \frac{60 K}{13} \approx 4, 615 QPS

PostgreSQL co the handle 10K+ simple reads/s → 4.6K QPS per replica la thoai mai.

3.9.3 Cache — Redis Cluster

C a c h e n o d es = \frac{120 GB}{16 GB p er n o d e} = 8 R e d i s n o d es

Su dung Redis Cluster voi sharding theo geohash prefix. Tat ca keys voi geohash bat dau “w3” (Vietnam) se o cung shard → locality tot cho cache.

Query Flow — End to End

sequenceDiagram
    participant U as User (Mobile)
    participant LB as Load Balancer
    participant LBS as LBS Instance
    participant Cache as Redis Cache
    participant DB as Read Replica

    U->>LB: GET /v1/search/nearby<br/>lat=10.77, lng=106.70, radius=5000

    LB->>LBS: Route to LBS Instance

    Note over LBS: 1. Tinh geohash: w3gvk (precision 5)
    Note over LBS: 2. Tinh 8 neighbors

    loop Cho moi 9 geohash cells
        LBS->>Cache: GET geo:w3gvk:*
        alt Cache HIT
            Cache-->>LBS: [business_id_1, business_id_2, ...]
        else Cache MISS
            LBS->>DB: SELECT business_id FROM geo_index<br/>WHERE geohash LIKE 'w3gvk%'
            DB-->>LBS: [business_id_1, business_id_2, ...]
            LBS->>Cache: SET geo:w3gvk:* (TTL 1h)
        end
    end

    Note over LBS: 3. Merge business_ids tu 9 cells

    loop Cho moi business_id
        LBS->>Cache: GET biz:{id}
        alt Cache HIT
            Cache-->>LBS: {name, lat, lng, rating, ...}
        else Cache MISS
            LBS->>DB: SELECT * FROM businesses<br/>WHERE id = ?
            DB-->>LBS: {name, lat, lng, rating, ...}
            LBS->>Cache: SET biz:{id} (TTL 24h)
        end
    end

    Note over LBS: 4. Tinh Haversine distance
    Note over LBS: 5. Filter distance > radius
    Note over LBS: 6. Sort by distance

    LBS-->>LB: {businesses: [...], next_page_token: ...}
    LB-->>U: 200 OK

Geohash Grid Visualization

graph TB
    subgraph "Geohash Grid — TP.HCM Area (Precision 5)"
        subgraph row1[""]
            C1["w3gv6<br/>0 biz"]
            C2["w3gv7<br/>12 biz"]
            C3["w3gve<br/>5 biz"]
        end
        subgraph row2[""]
            C4["w3gvj<br/>45 biz"]
            C5["w3gvk<br/>●USER<br/>78 biz"]
            C6["w3gvm<br/>23 biz"]
        end
        subgraph row3[""]
            C7["w3gvh<br/>8 biz"]
            C8["w3gvs<br/>156 biz"]
            C9["w3gvt<br/>34 biz"]
        end
    end

    style C5 fill:#ff9800,color:#000,stroke:#e65100,stroke-width:3px
    style C4 fill:#c8e6c9
    style C6 fill:#c8e6c9
    style C8 fill:#c8e6c9
    style C2 fill:#c8e6c9
    style C1 fill:#e8eaf6
    style C3 fill:#e8eaf6
    style C7 fill:#e8eaf6
    style C9 fill:#e8eaf6

User o cell w3gvk. He thong query cell nay + 8 neighbors (tong 9 cells, 361 businesses). Sau do tinh khoang cach thuc va loc.

Quadtree Structure Visualization

graph TD
    Root["ROOT<br/>Toan bo ban do<br/>100M businesses"]

    Root --> NW["NW<br/>North America<br/>25M biz"]
    Root --> NE["NE<br/>Europe + Africa<br/>20M biz"]
    Root --> SW["SW<br/>South America<br/>5M biz"]
    Root --> SE["SE<br/>Asia + Oceania<br/>50M biz"]

    SE --> SE_NW["SE-NW<br/>East Asia<br/>20M biz"]
    SE --> SE_NE["SE-NE<br/>Pacific<br/>2M biz"]
    SE --> SE_SW["SE-SW<br/>Southeast Asia<br/>18M biz"]
    SE --> SE_SE["SE-SE<br/>Oceania<br/>10M biz"]

    SE_SW --> VN1["VN-North<br/>Ha Noi area<br/>500K biz"]
    SE_SW --> VN2["VN-South<br/>TP.HCM area<br/>800K biz"]
    SE_SW --> TH["Thailand<br/>1.2M biz"]
    SE_SW --> ID["Indonesia<br/>2M biz"]

    VN2 --> D1["Quan 1<br/>5K biz"]
    VN2 --> D2["Quan 7<br/>3K biz"]
    VN2 --> D3["Thu Duc<br/>2K biz"]
    VN2 --> D4["Binh Thanh<br/>4K biz"]

    D1 --> L1["Block A<br/>95 biz ✓"]
    D1 --> L2["Block B<br/>78 biz ✓"]
    D1 --> L3["Block C<br/>88 biz ✓"]
    D1 --> L4["Block D<br/>62 biz ✓"]

    style Root fill:#e53935,color:#fff
    style SE fill:#ff9800,color:#000
    style SE_SW fill:#ffc107,color:#000
    style VN2 fill:#66bb6a,color:#fff
    style D1 fill:#42a5f5,color:#fff
    style L1 fill:#a5d6a7,stroke:#388e3c
    style L2 fill:#a5d6a7,stroke:#388e3c
    style L3 fill:#a5d6a7,stroke:#388e3c
    style L4 fill:#a5d6a7,stroke:#388e3c

Leaf nodes (mau xanh la) chua ⇐ 100 businesses. Vung dong duc (Quan 1 TP.HCM) co cells nho. Vung thua (Pacific Ocean) co cells lon. Day la suc manh cua adaptive grid.

4. Security

4.1 Location Privacy — Bao ve vi tri nguoi dung

Moi de	Giai phap	Chi tiet
Location fuzzing	Lam mo toa do GPS truoc khi log	Them noise ±100-500m vao toa do truoc khi luu vao analytics/logs. User thuc nhan ket qua chinh xac, nhung server logs chi luu vi tri xap xi
Aggregation	Chi luu aggregated data	Thay vi log “User A o toa do X luc 14:00”, log “50 users search khu vuc geohash w3gvk luc 14:00-15:00”
Retention policy	Xoa location data sau thoi gian nhat dinh	Raw GPS coordinates xoa sau 30 ngay. Chi giu aggregated stats
User consent	Hoi quyen truoc khi truy cap GPS	Mobile OS (iOS/Android) bat buoc hoi permission. Backend phai ton trong “While Using” vs “Always”
Encryption in transit	TLS 1.3 cho moi request	Toa do truyen qua HTTPS, khong bao gio HTTP

4.2 Rate Limiting on Location Queries

Tier	Limit	Ly do
Per user	30 requests/minute	Nguoi binh thuong search ~5 lan/phut max
Per IP	100 requests/minute	Cho shared networks (co-working spaces)
Per API key (business)	1,000 requests/minute	Cho third-party integrations
Burst allowance	2x limit trong 10 giay	Cho phep burst ngan khi user scroll ban do nhanh

Su dung Token Bucket algorithm. Tham chieu Tuan-09-Rate-Limiter.

4.3 Chong Data Scraping tu Doi thu

Moi de: Doi thu co the viet bot de scrape toan bo danh sach businesses va toa do tu API.

Bien phap	Chi tiet
API key required	Moi client can API key. Revoke key neu phat hien scraping pattern
Pagination limit	Toi da 20 results per page, toi da 100 pages per session
Honeypot businesses	Chen fake businesses (chi visible qua API, khong hien tren UI). Neu doi thu hien thi honeypot → co bang chung scraping
Request pattern analysis	Phat hien: systematic grid scanning (search moi 1km tren toan thanh pho), unusual QPS, sequential page requests
CAPTCHA	Trigger CAPTCHA sau 50 searches lien tiep khong co user interaction
Response watermarking	Chen metadata an vao response (user ID, timestamp) de trace nguon leak

Yeu cau GDPR	Implementation
Right to be forgotten	User yeu cau xoa → xoa tat ca location history, search history, cached data
Data portability	Export location history dang JSON/CSV khi user yeu cau
Purpose limitation	Location data chi dung cho search nearby — khong ban cho third-party advertising
Data minimization	Chi luu toa do khi can thiet. Khong luu location history cho tinh nang search (chi can real-time)
Consent	Explicit opt-in cho location tracking. Granular permissions: “Chi khi dang dung app”
Data Processing Agreement	Voi cloud provider (AWS/GCP) — dam bao data o region phu hop (EU data o EU region)

5. DevOps & Monitoring

5.1 Key Metrics to Monitor

5.1.1 Search Latency by Region

Metric	Alert Threshold	Dashboard
`search_latency_p50` by region	> 50ms	Time series, group by region
`search_latency_p99` by region	> 200ms	Time series, group by region
`search_latency_p99` global	> 500ms	Single number, red/green
`search_error_rate`	> 0.1%	Percentage gauge

Tai sao by region? Vi latency phu thuoc vao: khoang cach den read replica, business density (vung dong duc tra ve nhieu ket qua hon → cham hon), va cache hit ratio.

5.1.2 Cache Performance

Metric	Alert Threshold	Y nghia
`cache_hit_ratio_geo` (geohash → business_ids)	< 90%	Cache khong hieu qua → check TTL, check neu co geohash precision thay doi
`cache_hit_ratio_biz` (business details)	< 95%	Business data it thay doi, cache hit phai cao
`cache_eviction_rate`	> 1000/s	Redis het memory → can them node
`cache_latency_p99`	> 5ms	Redis cham → check network, check memory fragmentation

Cache Hit Ratio by Geohash Precision:

Precision	Expected Hit Ratio	Giai thich
4 (39km cells)	98%+	Cell lon → nhieu user search cung cell → cache warm
5 (4.9km cells)	95%+	Pho bien nhat — balance giua granularity va cache efficiency
6 (1.2km cells)	85-90%	Cell nho → nhieu cells → cache cold cho cells it search
7 (153m cells)	70-80%	Qua granular cho proximity search → khong khuyen nghi

5.1.3 Database Replication Lag

Metric	Alert Threshold	Impact
`replication_lag_seconds` per replica	> 30s	Business updates cham hien thi
`replication_lag_seconds` max	> 300s (5 min)	Can investigate — replica co the bi stuck
`replica_connections`	< expected	Replica bi disconnect → data stale

Tham chieu Tuan-07-Database-Sharding-Replication cho chi tiet ve replication monitoring.

5.1.4 Quadtree Health (neu su dung)

Metric	Alert Threshold	Y nghia
`quadtree_build_time_seconds`	> 600 (10 min)	Build cham → data tang hoac server cham
`quadtree_memory_bytes`	> 2 GB	Tree lon bat thuong → check data quality
`quadtree_last_build_timestamp`	> 6 hours ago	Quadtree outdated → trigger rebuild
`quadtree_max_depth`	> 20	Co the co du lieu bat thuong (nhieu businesses cung toa do)

5.2 Deployment Strategy

Aspect	Strategy	Ly do
LBS instances	Rolling update	Stateless → co the update tung instance
Quadtree rebuild	Blue-green	Build tree moi, swap khi xong. Zero downtime
Database schema migration	Online DDL (pt-online-schema-change)	Khong lock table
Cache warming	Pre-warm after deploy	Script query top geohash cells de warm cache
Canary deployment	5% traffic truoc	Phat hien van de som truoc khi rollout 100%

5.3 Alerting Rules

Alert	Condition	Severity	Action
High search latency	p99 > 500ms for 5 min	P1	On-call investigate, check DB/cache
Cache cluster down	> 1 node unreachable for 2 min	P1	Auto-failover, page on-call
Replication lag high	> 5 min for any replica	P2	Check replica health, network
Quadtree build failed	Build process crashed	P2	Serve voi tree cu, investigate
Error rate spike	> 1% for 3 min	P1	Check recent deploy, rollback if needed
Scraping detected	> 500 sequential searches from 1 IP	P3	Auto rate limit, review

Tham chieu Tuan-13-Monitoring-Observability cho alerting best practices.

6. Aha Moments & Pitfalls

6.1 Aha Moments — Nhung dieu bat ngo

Aha 1: Geohash Boundary Problem

Van de: Hai quan pho cach nhau 50m nhung o 2 geohash cells khac nhau. Neu chi query 1 cell → bo sot ket qua.

Giai phap: Luon query 9 cells (cell hien tai + 8 neighbors). Chi phi: 9x queries thay vi 1x. Nhung voi cache, chi phi nay giam dang ke.

Tui interview: Neu interviewer hoi “geohash co van de gi?”, PHAI nhac den boundary problem va giai phap 9-cell query. Day la diem phan biet ung vien trung binh va ung vien tot.

Aha 2: Quadtree fit trong 1 server

100 trieu businesses, quadtree chi chiem ~853 MB RAM. Mot server co 16 GB RAM du suc giu ca tree. Khong can distributed index.

Y nghia: Khi interviewer goi y “lam sao distribute quadtree across servers?”, cau tra loi dung la: khong can distribute. Moi LBS server giu 1 ban copy. Scale bang cach them server (moi server co full copy).

So sanh voi B-tree: Neu dung B-tree trong database, index cho 100M records co the chiem 10-20 GB (vi B-tree co overhead lon hon). Quadtree tiet kiem memory hon dang ke cho geospatial data.

Aha 3: Radius khong map 1-1 voi Geohash Precision

Khong co cong thuc chinh xac: radius = X → geohash precision = Y. Phai chon precision sao cho cell size >= radius, roi query 9 cells.

Vi du: Radius 2km. Cell precision 6 = 1.22km (nho hon radius) → khong du. Cell precision 5 = 4.89km (lon hon radius) → OK, nhung tra ve nhieu ket qua thua. Giai phap: dung precision 5, query 9 cells, roi filter bang Haversine distance.

Aha 4: Write path va Read path hoan toan tach biet

Proximity Service la mot trong nhung case study ro rang nhat cua CQRS pattern. Write QPS = 1-5, Read QPS = 60,000. Scale chung se lang phi tai nguyen.

Aha 5: Geohash la string → dung duoc voi moi database

Geohash chi la 1 chuoi string (vi du: “w3gvk1”). Co the luu trong bat ky database nao co string index: PostgreSQL, MySQL, DynamoDB, Redis, Cassandra. Khong can database dac biet.

6.2 Pitfalls — Nhung sai lam pho bien

Pitfall 1: Dung two-dimensional search truc tiep

“SELECT WHERE lat BETWEEN … AND lng BETWEEN …” — cham O(N) vi database khong the dung ca 2 index cung luc hieu qua.

Fix: Dung geospatial index (geohash, quadtree, hoac PostGIS).

Pitfall 2: Chi query 1 geohash cell

Bo qua boundary problem → mat ket qua o ranh gioi cell. Day la loi nghiem trong nhat khi implement geohash search.

Fix: Luon query cell hien tai + 8 neighbors.

Pitfall 3: Khong tinh Haversine distance sau khi query

Geohash query tra ve hinh vuong (hoac hinh chu nhat). Nhung user search theo vong tron (radius). Cac businesses o goc cua hinh vuong co the nam ngoai vong tron nhung van duoc tra ve.

Fix: Sau khi lay ket qua tu geohash cells, PHAI tinh Haversine distance va loc bo businesses ngoai radius.

Pitfall 4: Overcomplicate distributed Quadtree

Nhieu ung vien co gang distribute quadtree across servers. Khong can thiet — 853 MB fit trong 1 server. Complexity cua distributed tree khong dang.

Fix: Moi server giu full copy. Scale bang cach them servers.

Pitfall 5: Quen cache invalidation khi business di chuyen

Khi business thay doi toa do (vi du: food truck), geohash thay doi. Neu khong invalidate cache cu → user o vi tri cu van thay business do, user o vi tri moi khong thay.

Fix: Invalidate cache o CA HAI geohash cu va moi khi business update toa do.

Pitfall 6: Dung geohash precision qua cao

Precision 7 (153m cells) nghe co ve “chinh xac hon”. Nhung cell nho → can query nhieu cells hon de cover radius → nhieu queries hon → cham hon. Va cache hit ratio giam vi qua nhieu distinct cells.

Fix: Chon precision phu hop voi radius. Thuong la precision 4-6 cho hau het use cases.

7. Summary — Tom tat Decision Framework

7.1 Khi nao dung Geohash?

Tinh huong	Recommendation
Database co san (PostgreSQL, MySQL)	Geohash — chi can them 1 column
Can distributed index	Geohash — string column, dung duoc voi sharding
Team nho, can ship nhanh	Geohash — don gian nhat
Data thay doi thuong xuyen	Geohash — update record la xong

7.2 Khi nao dung Quadtree?

Tinh huong	Recommendation
Can adaptive density	Quadtree — tu dong chia nho vung dong duc
Query in-memory la chap nhan	Quadtree — cuc nhanh, O(log N)
Data it thay doi (businesses)	Quadtree — rebuild dinh ky la du
Single region deployment	Quadtree — khong can distribute

7.3 Khi nao dung S2?

Tinh huong	Recommendation
Can do chinh xac hinh cau	S2 — tot nhat cho bai toan toan cau
Google-scale	S2 — proven at Google
MongoDB ecosystem	S2 — native support (2dsphere)
Complex region queries	S2 — S2RegionCoverer toi uu

8. Further Reading & Internal Links

Kien thuc nen tang lien quan

Topic	Link	Ap dung trong Proximity Service
Cache Strategy	Tuan-06-Cache-Strategy	Geohash cache, business data cache, invalidation patterns
Database Sharding & Replication	Tuan-07-Database-Sharding-Replication	Read replicas per region, replication lag monitoring
Load Balancer	Tuan-05-Load-Balancer	Distribute traffic across stateless LBS instances
Rate Limiter	Tuan-09-Rate-Limiter	Chong scraping, protect LBS tu traffic spikes
Monitoring	Tuan-13-Monitoring-Observability	Search latency, cache hit ratio, replication lag alerts
Back-of-envelope	Tuan-02-Back-of-the-envelope	QPS, storage, memory estimation
Microservices Pattern	Tuan-11-Microservices-Pattern	CQRS — tach Read (LBS) va Write (Business Service)

So sanh voi cac Case Study khac

Case Study	Diem tuong dong	Diem khac biet
Tuan-16-Design-URL-Shortener	Read-heavy, caching quan trong	URL shortener khong co geospatial component
Tuan-18-Design-News-Feed	Fan-out pattern, caching	News Feed la social graph, khong phai location
Tuan-20-Design-Key-Value-Store	In-memory data structures, partitioning	KV Store la general purpose, Proximity la domain-specific

9. Interview Cheat Sheet

Buoc 1 — Hoi Clarifying Questions (2-3 phut)

Scale: bao nhieu businesses? bao nhieu DAU?
Radius: user co chon radius khong? Max radius?
Real-time: business updates can hien thi ngay khong?
Features: search only? hay ca CRUD businesses?

Buoc 2 — High-level Design (5-7 phut)

Ve 2 services: LBS (read) + Business Service (write)
Ve database: Primary + Read Replicas
Nhac CQRS pattern
Ve cache layer (Redis)

Buoc 3 — Deep Dive (15-20 phut)

Giai thich geohash: encoding, precision levels, 9-cell query
Nhac boundary problem va giai phap
Quadtree nhu alternative: adaptive, in-memory, 853 MB
Database schema: business table + geospatial index table
Caching: 2-layer (geohash → IDs, ID → details)

Buoc 4 — Wrap Up (3-5 phut)

Scaling: LBS stateless → horizontal, DB read replicas per region
Monitoring: search latency, cache hit ratio, replication lag
Security: location privacy, rate limiting, anti-scraping

Cau hoi interviewer thuong hoi

Cau hoi	Cach tra loi
”Geohash vs Quadtree?”	Geohash: distributed, database-friendly. Quadtree: adaptive, in-memory, faster query. Chon theo use case
”Geohash co van de gi?”	Boundary problem — 2 diem gan nhau nhung khac cell. Fix: query 9 cells
”Quadtree co scale khong?“	100M businesses chi 853 MB — moi server giu full copy. Scale horizontal
”Lam sao handle business updates?”	CQRS: write vao Primary, async replicate. Quadtree rebuild dinh ky. Cache invalidation
”Lam sao giam latency?”	Cache (2-layer), read replicas per region, geohash precision tuning

“Proximity Service la bai toan tuyet voi de hieu geospatial indexing — nen tang cua moi ung dung location-based. Khi em hieu cach bien bai toan 2 chieu thanh 1 chieu (geohash) hoac chia khong gian thong minh (quadtree), em co the ap dung cho bat ky bai toan location nao: tim tai xe, tim ban be, tim nha hang, hay tim bat ky thu gi tren ban do.”

lthieu's notes

Explorer

Case-Design-Proximity-Service