Case Study: Design a Proximity Service (Nearby Businesses)
“Every day, hundreds of millions of people open Google Maps and ask: ‘Where is the nearest pho restaurant?’ — A question that sounds simple, but behind it lies a complex geospatial indexing problem, where mathematics meets engineering at planetary scale.”
Tags: system-design proximity-service geospatial geohash quadtree alex-xu case-study vol2 Student: Hieu Prerequisite: Tuan-02-Back-of-the-envelope · Tuan-06-Cache-Strategy · Tuan-07-Database-Sharding-Replication Related: Tuan-05-Load-Balancer · Tuan-09-Rate-Limiter · Tuan-10-Consistent-Hashing · Tuan-13-Monitoring-Observability Reference: Alex Xu, System Design Interview — An Insider’s Guide, Volume 2 — Chapter 1: Proximity Service
1. Context & Why — Why does the Proximity Service matter?
1.1 Analogy — An everyday problem
Hieu, imagine two everyday situations:
Situation 1 — Google Maps: You are in District 1, Ho Chi Minh City, open Google Maps, and type “pho”. Within 0.5 seconds, the map shows the 20 nearest pho restaurants, sorted by distance. What is behind that “0.5 seconds”? How does the system take your GPS coordinates (10.7769, 106.7009) and find 20 pho restaurants among hundreds of millions of places worldwide?
Situation 2 — Grab finding a driver: You book a Grab ride; the app must find every driver within a 3 km radius of your location. But drivers move constantly — their positions change every 3-5 seconds. The system has to answer “who is closest to me?” for millions of concurrent users.
Both situations are instances of the Proximity Service problem — given a coordinate (latitude, longitude) and a radius, find all points (businesses, drivers, friends) inside that area.
1.2 Why is this problem hard?
| Problem | Explanation | Scale |
|---|---|---|
| Huge dataset | 200 million businesses worldwide | ~200M records |
| Two-dimensional search | A regular B-tree index does not work — latitude and longitude are two independent dimensions | O(N) naive search |
| Read-heavy | Hundreds of millions of users search concurrently, but businesses rarely change | Read:Write ~ 1000:1 |
| Low latency | Users expect results in < 500ms | p99 < 200ms |
| Global scale | Users and businesses are spread worldwide; every region has a different density | Tokyo dense, Sahara sparse |
1.3 Real-world Systems that use a Proximity Service
| Company | Use Case | Scale |
|---|---|---|
| Google Maps / Places | Finding restaurants, gas stations, ATMs | 200M+ places |
| Uber / Grab | Finding the nearest driver | Millions of active drivers |
| Yelp | Finding businesses by category + location | ~200M businesses |
| Facebook Nearby Friends | Finding friends who are nearby | 2B+ users |
| Tinder | Finding matches within a radius | 75M+ users |
| DoorDash | Finding restaurants that deliver | 390K+ merchants |
Step 1 — Understand the Problem & Establish Design Scope
1.1 Clarifying Questions
In an interview, always ask before designing. The key questions:
| Question | Answer | Note |
|---|---|---|
| Can the user specify the search radius? | Yes — 0.5km, 1km, 2km, 5km, 20km | Default 5km |
| What is the maximum radius? | 20km | Capped to avoid overly broad queries |
| Can businesses update their information? | Yes — but changes reflected by the next day are acceptable | Near real-time not required |
| Can users move and search at the same time? | Yes, but no continuous auto-refresh — only when the user taps search again | Different from Nearby Friends |
| How should results be sorted? | By distance, or by ranking/rating | Distance is primary |
| What does business info include? | Name, address, coordinates, category, opening hours, rating, photos | Rich metadata |
1.2 Functional Requirements
- FR1: Given a coordinate (lat, lng) and a radius, return the list of businesses inside that area
- FR2: Business owners can add/update/delete business information (CRUD)
- FR3: Users can view business details (name, address, rating, opening hours, photos)
- FR4: Results can be filtered by category (restaurant, cafe, gas station…)
1.3 Non-functional Requirements
| Requirement | Target | Explanation |
|---|---|---|
| Low Latency | p99 < 200ms for search | Users expect fast results |
| High Availability | 99.99% uptime | LBS is a core feature of Maps |
| High Scalability | 200M DAU, 100M businesses | Global scale |
| Eventual Consistency | Business updates may be delayed by a few minutes | Business info does not need real-time |
| Read-heavy | Read QPS >> Write QPS | Far more searches than updates |
1.4 Capacity Estimation
Assumptions:
| Parameter | Value | Explanation |
|---|---|---|
| DAU | 200M | Google Maps scale |
| Businesses | 100M | Worldwide |
| Searches/user/day | 5 | Average |
| Business updates/day | 100K | Create/update/delete |
QPS Calculation — Search:
200M DAU × 5 searches/day ÷ 86,400 seconds/day ≈ 11.6K average QPS. Assuming peak = 5× average → ~58K, rounded to ~60K peak QPS.
Observation: 60K peak QPS on the read path — this is a read-heavy system. Write QPS is only about 1-2 QPS for business updates. Read:Write ratio ~ 60,000:1.
Write QPS — Business Updates:
100K updates/day ÷ 86,400 seconds/day ≈ 1.2 average QPS, ~5 QPS at peak. Write QPS is extremely low — a single primary database server can easily handle it.
Storage — Business Data:
100M businesses × ~6 KB per record (name, address, hours, photo metadata) ≈ 600 GB.
Storage — Geospatial Index:
100M rows × ~36 bytes per row (geohash + business_id + lat/lng) ≈ 3.6 GB.
Important: 3.6 GB of geospatial index fits entirely in memory. This is the key insight — we can build an in-memory index (Quadtree) on a single server.
Cache Sizing:
Caching ~20% of the business data as the hot set: 600 GB × 20% ≈ 120 GB. A Redis cluster with a few nodes is enough. See Tuan-06-Cache-Strategy.
Quadtree Memory Estimation (deep dive in section 3):
Aha Moment: a Quadtree over 100M businesses takes only about 853 MB of RAM — it comfortably fits on a single server with 8 GB of RAM. This is why many systems choose a Quadtree as their in-memory geospatial index.
Estimation Summary
| Metric | Value |
|---|---|
| Search QPS (peak) | ~60K/s |
| Write QPS (peak) | ~5/s |
| Business data storage | ~600 GB |
| Geospatial index size | ~3.6 GB |
| Cache size (hot data) | ~120 GB |
| Quadtree memory | ~853 MB |
| Read:Write ratio | ~60,000:1 |
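The summary numbers can be reproduced with a short back-of-the-envelope script (the ~6 KB record size and ~36-byte index row are the assumptions stated above):

```python
# Back-of-the-envelope sketch reproducing the estimation table above.
DAU = 200_000_000
SEARCHES_PER_USER = 5
SECONDS_PER_DAY = 86_400

avg_search_qps = DAU * SEARCHES_PER_USER / SECONDS_PER_DAY   # ~11.6K
peak_search_qps = avg_search_qps * 5                          # ~58K, rounded to ~60K

writes_per_day = 100_000
avg_write_qps = writes_per_day / SECONDS_PER_DAY              # ~1.2 QPS

BUSINESSES = 100_000_000
business_record_bytes = 6 * 1024     # assumed ~6 KB rich record
index_record_bytes = 36              # geohash + business_id + lat/lng (assumed)

business_storage_gb = BUSINESSES * business_record_bytes / 1e9   # ~614 GB ≈ 600 GB
index_storage_gb = BUSINESSES * index_record_bytes / 1e9         # ~3.6 GB
```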
Step 2 — High-Level Design
2.1 Overall Architecture — Two Main Services
The Proximity Service consists of two services with completely different characteristics:
| Service | Function | Characteristics | Scale strategy |
|---|---|---|---|
| LBS (Location-Based Service) | Takes a coordinate + radius, returns the nearest businesses | Read-heavy, stateless, no writes | Horizontal scaling, read replicas |
| Business Service | CRUD operations for businesses | Write operations, low QPS | Single primary + replicas |
Why split into two services? Because reads and writes have completely different patterns:
- LBS: 60K QPS, stateless, needs low latency → scale horizontally, behind a Load Balancer
- Business Service: 5 QPS, owns the write path, needs consistency → single primary database
Principle: separate the read path from the write path when they have different scaling requirements. This is the CQRS (Command Query Responsibility Segregation) pattern — a form of Tuan-11-Microservices-Pattern.
2.2 High-Level Architecture Diagram
```mermaid
flowchart TB
    subgraph Clients
        Mobile["Mobile App"]
        Web["Web App"]
    end
    LB["Load Balancer<br/>→ [[Tuan-05-Load-Balancer]]"]
    subgraph "Read Path (LBS)"
        LBS1["LBS Instance 1"]
        LBS2["LBS Instance 2"]
        LBSN["LBS Instance N"]
    end
    subgraph "Write Path"
        BS["Business Service"]
    end
    subgraph "Data Layer"
        Primary["Primary DB<br/>PostgreSQL"]
        R1["Read Replica 1"]
        R2["Read Replica 2"]
        RN["Read Replica N"]
    end
    subgraph "Geospatial Index"
        GI["Geohash Index<br/>(in DB read replicas)"]
        QT["Quadtree<br/>(in-memory, per LBS instance)"]
    end
    subgraph "Cache Layer"
        RC["Redis Cluster<br/>Business data cache<br/>Geohash → business_ids cache"]
    end
    Mobile --> LB
    Web --> LB
    LB -->|"GET /nearby"| LBS1 & LBS2 & LBSN
    LB -->|"POST/PUT/DELETE /businesses"| BS
    LBS1 & LBS2 & LBSN --> RC
    LBS1 & LBS2 & LBSN --> R1 & R2 & RN
    LBS1 & LBS2 & LBSN -.-> QT
    BS --> Primary
    Primary -->|"Replication"| R1 & R2 & RN
    R1 & R2 & RN --> GI
    style LB fill:#42a5f5,color:#fff
    style RC fill:#ef5350,color:#fff
    style Primary fill:#66bb6a,color:#fff
    style QT fill:#ffa726,color:#000
```
2.3 API Design
Search Nearby Businesses:
GET /v1/search/nearby
| Field | Type | Description |
|---|---|---|
| latitude | double | User's latitude (-90 to 90) |
| longitude | double | User's longitude (-180 to 180) |
| radius | int | Search radius (meters). Default: 5000. Max: 20000 |
| category | string | (Optional) Filter by business type |
| page_token | string | (Optional) Pagination token |
Response:
| Field | Type | Description |
|---|---|---|
| businesses | array | List of businesses |
| businesses[].id | string | Business ID |
| businesses[].name | string | Business name |
| businesses[].lat | double | Latitude |
| businesses[].lng | double | Longitude |
| businesses[].distance | int | Distance (meters) |
| businesses[].rating | float | Average rating |
| next_page_token | string | Token for the next page |
Business CRUD:
| Endpoint | Method | Description |
|---|---|---|
| /v1/businesses | POST | Create a new business |
| /v1/businesses/:id | GET | View business details |
| /v1/businesses/:id | PUT | Update a business |
| /v1/businesses/:id | DELETE | Delete a business |
2.4 Data Flow — Read vs Write
Read Flow (Search):
1. The user sends GET /v1/search/nearby?lat=10.77&lng=106.70&radius=5000
2. The Load Balancer routes it to one of N LBS instances
3. The LBS computes the geohash from (lat, lng) and the radius → determines which geohash cells to query
4. The LBS checks the Redis cache: geohash_cells → list of business_ids
5. On a cache hit → fetch business details from the Business Data Cache
6. On a cache miss → query a Read Replica (geospatial index table)
7. Compute exact distances, filter out businesses outside the radius
8. Sort by distance, return the results
Write Flow (Update Business):
1. A business owner sends PUT /v1/businesses/123
2. The Load Balancer routes it to the Business Service
3. The Business Service updates the Primary DB
4. The Primary DB replicates to the Read Replicas (async)
5. Cache invalidation: delete the related cache entries
6. If a Quadtree is used: scheduled rebuild (overnight or every few hours)
Trade-off: business updates are not reflected immediately in search results. The delay ranges from a few minutes (replication lag) to a few hours (quadtree rebuild). This is acceptable because businesses rarely change location.
Step 3 — Design Deep Dive
3.1 The Core Problem: Geospatial Indexing
Hieu, this is the hardest and most important part of this chapter. The basic problem is:
Given a point P(lat, lng) and a radius r, find all businesses B such that distance(P, B) ≤ r.
The naive approach — two-dimensional search:
The simplest approach: query the database directly.
SELECT * FROM businesses
WHERE latitude BETWEEN 10.77 - delta AND 10.77 + delta
AND longitude BETWEEN 106.70 - delta AND 106.70 + delta
Problem: even with an index on latitude AND an index on longitude, the database must:
- Query the latitude index → set A (possibly millions of records)
- Query the longitude index → set B (possibly millions of records)
- Intersect A ∩ B → very slow!
Reason: a B-tree index is only efficient in one dimension. With two independent dimensions (lat, lng), no single B-tree can index both efficiently at the same time.
This is why we need geospatial indexing — specialized techniques that either turn the 2D problem into a 1D one, or partition space into cells to narrow the search.
3.2 Approach 1 — Evenly Divided Grid
Idea: divide the whole map into equally sized cells. Each business belongs to exactly one cell. When searching, only look at the businesses in the current cell and its neighboring cells.
Serious problems:
| Problem | Explanation |
|---|---|
| Uneven distribution | Tokyo may have 100,000 businesses in one cell, while the Sahara has 0 |
| Fixed granularity | Not adaptive — dense areas need smaller cells, sparse areas need larger ones |
| Too many empty cells | 70% of the Earth is water → 70% of the cells are empty |
| Edge cases | A business right at a cell boundary → the search must check multiple cells |
Conclusion: an evenly divided grid is a poor fit for unevenly distributed data. We need a smarter approach.
3.3 Approach 2 — Geohash (Core Approach)
3.3.1 What is Geohash?
Geohash is a technique that turns a 2D coordinate (lat, lng) into a single string by alternately bisecting (interleaving) longitude and latitude.
Encoding process (simplified):
Step 1 — Encode Longitude:
- Start with the interval [-180, 180]
- Bisect: if lng falls in the left half ([-180, 0]) → bit 0, in the right half ([0, 180]) → bit 1
- Repeat, bisecting the chosen interval
- Example: lng = 106.70 → [0, 180] → 1, [90, 180] → 1, [90, 135] → 0, [90, 112.5] → 0, [101.25, 112.5] → 1, …
Step 2 — Encode Latitude:
- Same, starting with [-90, 90]
- Example: lat = 10.77 → [0, 90] → 1, [0, 45] → 0, [0, 22.5] → 0, …
Step 3 — Interleave Bits:
- Alternate: lng[0], lat[0], lng[1], lat[1], …
- The result is a bit string
Step 4 — Base32 Encode:
- Group 5 bits → 1 Base32 character (0-9 and b-z, excluding a, i, l, o)
- Example: the Ho Chi Minh City coordinate (10.7769, 106.7009) → geohash:
w3gvk1
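The four steps above fit in a short encoder — a minimal sketch, not a production library (real implementations also handle decoding and neighbor lookup):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet: no a, i, l, o

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode (lat, lng) by interleaving longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits = []
    use_lng = True                      # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits.append(1); lng_lo = mid   # right half → bit 1
            else:
                bits.append(0); lng_hi = mid   # left half → bit 0
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        use_lng = not use_lng
    # group every 5 bits into one Base32 character
    chars = []
    for i in range(0, len(bits), 5):
        value = int("".join(map(str, bits[i:i + 5])), 2)
        chars.append(BASE32[value])
    return "".join(chars)
```

Running it on the Ho Chi Minh City coordinate reproduces the example: `geohash_encode(10.7769, 106.7009, 6)` → `"w3gvk1"`.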
3.3.2 Precision Levels
| Geohash Length | Cell Width | Cell Height | Usage |
|---|---|---|---|
| 1 | 5,009 km | 4,992 km | Entire continents |
| 2 | 1,252 km | 624 km | Small countries |
| 3 | 156 km | 156 km | Province / state |
| 4 | 39.1 km | 19.5 km | Large city |
| 5 | 4.89 km | 4.89 km | District |
| 6 | 1.22 km | 0.61 km | Small neighborhood — good fit for proximity search |
| 7 | 153 m | 153 m | Street |
| 8 | 38.2 m | 19.1 m | Building |
| 9 | 4.77 m | 4.77 m | Room within a building |
| 12 | 3.7 cm | 1.8 cm | Max GPS precision |
Rule of thumb for proximity search: radius 0.5 km → geohash length 6 (cell ~1.22 km × 0.61 km). Radius 1-2 km → length 5 (cell ~4.89 km).
3.3.3 Mapping Radius to Geohash Precision
| Radius | Geohash Precision | Explanation |
|---|---|---|
| 0.5 km | 6 | 1.22km cell; check the current cell + 8 neighbors |
| 1 km | 5 | A 4.89km cell is enough to cover it |
| 2 km | 5 | A 4.89km cell is enough to cover it |
| 5 km | 4 | 39.1km cell — wide enough, may return too many results |
| 20 km | 4 | A 39.1km cell covers it |
Practical strategy: choose the precision level so that the cell size >= radius, and query the current cell + its 8 neighboring cells (9 cells in total). This guarantees the whole search circle is covered.
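The mapping above can be expressed as a small lookup, a sketch using the smaller cell dimension from the precision table (the `CELL_MIN_DIM_KM` name and supported range are illustrative):

```python
# Smaller cell dimension (km) per geohash length, from the precision table above.
CELL_MIN_DIM_KM = {4: 19.5, 5: 4.89, 6: 0.61, 7: 0.153}

def precision_for_radius(radius_km: float) -> int:
    """Pick the longest geohash (smallest cell) whose cell still covers the radius,
    so that the center cell plus its 8 neighbors covers the search circle."""
    for length in sorted(CELL_MIN_DIM_KM, reverse=True):   # try 7, 6, 5, 4
        if CELL_MIN_DIM_KM[length] >= radius_km:
            return length
    return 4  # fall back to the coarsest level used here (20 km max radius)
```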
3.3.4 Important Properties of Geohash
Property 1 — Shared prefix = nearby (usually true):
The longer the shared geohash prefix of two points, the closer they are.
| Point A | Point B | Shared prefix | Approximate distance |
|---|---|---|---|
| w3gvk1 | w3gvk9 | w3gvk (5 chars) | < 5 km |
| w3gvk1 | w3gvm2 | w3gv (4 chars) | < 40 km |
| w3gvk1 | w3h001 | w3 (2 chars) | < 1,252 km |
Property 2 — Neighbor lookup:
Every geohash cell has exactly 8 neighbors (top, bottom, left, right, and 4 corners). The neighbors can be computed with an efficient O(1) algorithm.
3.3.5 Edge Cases
Edge Case 1 — Different prefixes but very close (Boundary Problem):
┌──────────┬──────────┐
│ │ │
│ w3gvk │ w3gvm │
│ │ │
│ A ● ● B │
│ │ │
└──────────┴──────────┘
Point A (w3gvk...) and point B (w3gvm...) sit in two different cells — their prefixes diverge from the 5th character. Yet in reality they are only 50 meters apart!
Solution: always query the current cell + its 8 neighboring cells. This guarantees no businesses at cell boundaries are missed.
Edge Case 2 — Same prefix but far apart (rare case):
Because of how the Z-order curve maps 2D → 1D, there are cases where two cells share a prefix but are not true geographic neighbors. Example: two cells on either side of an encoding boundary.
Solution: after fetching the candidate businesses from the geohash cells, always compute the true distance (Haversine formula) and filter out businesses outside the radius.
d = 2R · arcsin( √( sin²(Δφ/2) + cos φ₁ · cos φ₂ · sin²(Δλ/2) ) )
where R = the Earth's radius (6,371 km), φ = latitude (radians), λ = longitude (radians).
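The distance filter can be sketched as a small helper, assuming a spherical Earth with R = 6,371 km (the function name is illustrative):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical-Earth assumption

def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two (lat, lng) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of longitude at the equator is ~111.2 km, which is a quick sanity check for the formula.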
Edge Case 3 — Not enough businesses within the radius:
When the search radius is 1km but there are only 2 businesses, the user may want to see more. Radius expansion strategy:
1. Search with the geohash precision matching the initial radius
2. If the result count < min_results (e.g. 10)
3. Decrease the geohash precision by 1 level (expanding the search area ~4x)
4. Search again
5. Repeat until there are enough results or max_radius is reached
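The expansion loop above can be sketched as follows; `search_fn` is a hypothetical stand-in for the geohash-cell query, and doubling the radius approximates dropping one precision level (~4x the area):

```python
def expand_search(search_fn, lat, lng, radius_km, min_results=10, max_radius_km=20):
    """Widen the search area until enough results are found or max radius is hit.
    `search_fn(lat, lng, radius_km)` stands in for the real geohash-cell query."""
    results = search_fn(lat, lng, radius_km)
    while len(results) < min_results and radius_km < max_radius_km:
        # doubling the radius ~ one geohash precision level coarser (~4x area)
        radius_km = min(radius_km * 2, max_radius_km)
        results = search_fn(lat, lng, radius_km)
    return results, radius_km
```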
3.4 Approach 3 — Quadtree (Adaptive Grid)
3.4.1 What is a Quadtree?
A Quadtree is a tree data structure in which every node has either 0 or 4 children. It automatically subdivides regions with many businesses while leaving sparse regions alone.
Principle: split the 2D space into 4 quadrants. If a quadrant holds more than K businesses (a threshold, e.g. K=100), keep splitting it into 4. Repeat until every cell holds ≤ K businesses.
3.4.2 Building the Quadtree
Start: the whole map is one node
- The node holds 5,000,000 businesses (> threshold 100) → SPLIT
The node splits into 4 children:
NW (Northwest): 1,200,000 businesses → SPLIT
NE (Northeast): 2,500,000 businesses → SPLIT
SW (Southwest): 800,000 businesses → SPLIT
SE (Southeast): 500,000 businesses → SPLIT
Keep splitting until every leaf node holds <= 100 businesses
Key characteristics:
| Characteristic | Explanation |
|---|---|
| Adaptive | Dense areas like Tokyo get small cells (~100m); sparse areas like the Sahara get huge cells (~1000km) |
| In-memory | The whole tree lives in RAM — queries are extremely fast |
| Build time | Built once at server startup, or rebuilt periodically |
| Not easily shareable | Each server keeps its own copy of the tree — it is not shared over the network |
3.4.3 Quadtree Node Structure in Detail
Internal Node (a node with children):
| Field | Size | Explanation |
|---|---|---|
| Top-left coordinates | 16 bytes | (lat, lng) of the top-left corner |
| Bottom-right coordinates | 16 bytes | (lat, lng) of the bottom-right corner |
| Pointer to NW child | 8 bytes | |
| Pointer to NE child | 8 bytes | |
| Pointer to SW child | 8 bytes | |
| Pointer to SE child | 8 bytes | |
| Total | 64 bytes | |
Leaf Node (a node holding businesses):
| Field | Size | Explanation |
|---|---|---|
| Top-left coordinates | 16 bytes | Region boundary |
| Bottom-right coordinates | 16 bytes | Region boundary |
| List of business_ids | 8 bytes × N | N ≤ threshold (100) |
| Total | 32 + 8N bytes | With N=100: ~832 bytes |
3.4.4 Memory Estimation (Detailed)
The Quadtree is a full 4-ary tree. With 100M businesses and a leaf capacity of 100, there are about 100M / 100 = 1M leaf nodes, and a full 4-ary tree with L leaves has about L / 3 internal nodes ≈ 333K. Memory ≈ 1M × 832 bytes + 333K × 64 bytes ≈ 853 MB.
Key insight: 853 MB — a server with 8 GB of RAM holds it comfortably. No distributed index needed! Many interview candidates miss this.
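The memory arithmetic can be checked with a few lines, using the node sizes from the tables above:

```python
# Quadtree memory estimate using the node sizes from the tables above.
BUSINESSES = 100_000_000
LEAF_CAPACITY = 100

leaves = BUSINESSES // LEAF_CAPACITY       # 1,000,000 leaf nodes
internal = leaves // 3                     # full 4-ary tree: ~(leaves - 1) / 3

leaf_bytes = 32 + 8 * LEAF_CAPACITY        # 832 bytes per full leaf
internal_bytes = 64                        # bounding box + 4 child pointers

total_mb = (leaves * leaf_bytes + internal * internal_bytes) / 1e6
# leaves: 832 MB, internal nodes: ~21 MB → ~853 MB total
```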
3.4.5 Build Time Estimation
Inserting 100M businesses, each traversing roughly 10 tree levels, is on the order of 10^9 operations. With a modern machine handling ~1 billion operations/second, the in-memory build takes seconds; including reading the data from the database, a few minutes. Acceptable as startup time.
3.4.6 Quadtree Search Algorithm
Find the businesses within radius R of point P(lat, lng):
1. Start at the root node
2. Find the leaf node containing point P (traverse the tree by coordinates)
3. Compute the distance from P to each business in that leaf node
4. If the circle (P, R) overlaps neighboring cells:
- Move up to the parent and check the sibling nodes
- Repeat until the circle is fully covered
5. Filter out businesses with distance > R
6. Sort by distance, return the top N
Search cost is roughly O(p + M), where p ≈ log₄(N/K) is the tree depth and M is the number of businesses in the search region. With N = 100M and K = 100, p ≈ 10 — the traversal touches only ~10 nodes.
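The build and search steps can be sketched as a minimal quadtree. This is an illustrative toy (tiny leaf capacity, a square search region instead of a circle, and names like `Node` and `query` are my own), not the production design:

```python
from dataclasses import dataclass, field

CAPACITY = 4  # tiny threshold for the demo; the design above uses ~100

@dataclass
class Node:
    box: tuple                                    # (min_lat, min_lng, max_lat, max_lng)
    points: list = field(default_factory=list)    # (lat, lng, business_id) in leaves
    children: list = field(default_factory=list)  # 0 or 4 children

def insert(node, pt):
    if node.children:
        insert(_child_for(node, pt), pt)
        return
    node.points.append(pt)
    if len(node.points) > CAPACITY:               # leaf over threshold → SPLIT
        _split(node)

def _split(node):
    lo_lat, lo_lng, hi_lat, hi_lng = node.box
    mid_lat, mid_lng = (lo_lat + hi_lat) / 2, (lo_lng + hi_lng) / 2
    node.children = [
        Node((lo_lat, lo_lng, mid_lat, mid_lng)),  # SW quadrant
        Node((lo_lat, mid_lng, mid_lat, hi_lng)),  # SE quadrant
        Node((mid_lat, lo_lng, hi_lat, mid_lng)),  # NW quadrant
        Node((mid_lat, mid_lng, hi_lat, hi_lng)),  # NE quadrant
    ]
    pts, node.points = node.points, []
    for p in pts:                                  # redistribute to children
        insert(_child_for(node, p), p)

def _child_for(node, pt):
    for c in node.children:
        lo_lat, lo_lng, hi_lat, hi_lng = c.box
        if lo_lat <= pt[0] <= hi_lat and lo_lng <= pt[1] <= hi_lng:
            return c
    return node.children[-1]

def query(node, lat, lng, half_side):
    """Collect points inside the square around (lat, lng), pruning subtrees
    whose bounding box does not overlap the search region."""
    lo_lat, lo_lng, hi_lat, hi_lng = node.box
    if (lo_lat > lat + half_side or hi_lat < lat - half_side
            or lo_lng > lng + half_side or hi_lng < lng - half_side):
        return []                                  # no overlap → prune
    if not node.children:
        return [p for p in node.points
                if abs(p[0] - lat) <= half_side and abs(p[1] - lng) <= half_side]
    return [p for c in node.children for p in query(c, lat, lng, half_side)]
```

A real implementation would replace the square test with the Haversine distance and return the sorted top N.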
3.4.7 Quadtree Update Strategy
Problem: businesses change (new openings, closures, address changes) — how do we update the Quadtree?
Option 1 — Periodic rebuild (Recommended):
| Aspect | Detail |
|---|---|
| Frequency | Every few hours, or overnight |
| Process | Build a new quadtree from the database, then swap it with the old one (atomic swap) |
| Downtime | Zero — the old tree keeps serving requests while the new one is being built |
| Suited for | Slow-changing business data (daily), no real-time requirement |
Option 2 — Incremental update:
| Aspect | Detail |
|---|---|
| Process | Add/remove businesses directly in the tree |
| Complexity | High — must handle splits/merges when a node crosses the threshold |
| Suited for | Frequently changing data (e.g. driver locations in Uber) |
For the Proximity Service (businesses), Option 1 is sufficient, since businesses rarely change location.
3.5 Approach 4 — Google S2 Geometry Library
3.5.1 What is S2?
Google S2 is a purpose-built library developed by Google that uses a Hilbert curve to map the surface of a sphere (the Earth) into S2 cells.
Differences from Geohash:
| Feature | Geohash | S2 |
|---|---|---|
| Cell shape | Rectangles on a flat map | Rhombus-like cells on the sphere |
| Earth model | Flat projection | Sphere |
| Curve type | Z-order curve | Hilbert curve |
| Cell levels | 1-12 (base32 chars) | 0-30 (64-bit integer) |
| Accuracy | Good, has boundary issues | Better — the Hilbert curve preserves locality better |
| Region covering | Manual (9 cells) | Automatic — S2RegionCoverer finds an optimal cell set |
3.5.2 Hilbert Curve — Why Better than the Z-curve?
The Z-curve (Geohash) suffers from “jumps” — sometimes two cells adjacent on the map are far apart on the Z-curve. This causes the boundary issues discussed earlier.
The Hilbert curve guarantees that two points close on the curve are almost always close in 2D space. This property is called locality-preserving — significantly better than the Z-curve.
3.5.3 S2 in Practice
| Company | Uses S2 for |
|---|---|
| Google Maps | Core geospatial indexing |
| Uber H3 | Inspired by S2, uses hexagonal cells |
| Foursquare | Place detection |
| MongoDB | The 2dsphere index uses S2 internally |
3.6 Geohash vs Quadtree vs S2
| Criterion | Geohash | Quadtree | S2 |
|---|---|---|---|
| Implementation complexity | Low — just string operations | Medium — must build a tree | High — needs the S2 library |
| Storage | In the database (a column) | In-memory on each server | In the database (a column) |
| Update | Update a database record | Rebuild the tree | Update a database record |
| Adaptive density | No — fixed grid size | Yes — automatically subdivides dense areas | Yes — S2RegionCoverer |
| Precision control | 12 levels (base32) | Unlimited (split until small enough) | 31 levels (0-30) |
| Boundary issues | Yes — must query 8 neighbors | Few — the tree structure handles them naturally | Fewest — Hilbert curve |
| Distributed | Easy — a geohash is a string with a normal index | Hard — the tree must live on one server | Easy — an S2 cell ID is an integer |
| Database support | Redis, PostgreSQL, MySQL, DynamoDB | Must implement yourself | MongoDB (native), PostgreSQL (extension) |
| Best for | Most use cases, simplicity | Large datasets, adaptive density, in-memory OK | Google-scale, accurate spherical model |
Interview recommendation: choose Geohash as the primary approach (easy to explain, good database support). Mention Quadtree as the alternative when the interviewer asks about adaptive density. Mention S2 to show depth of knowledge.
3.7 Database Schema Design
3.7.1 Business Table
| Column | Type | Description | Index |
|---|---|---|---|
| business_id | UUID (PK) | Unique ID | Primary Key |
| name | VARCHAR(255) | Business name | Full-text index |
| address | TEXT | Address | |
| city | VARCHAR(100) | City | |
| country | VARCHAR(2) | Country code (ISO 3166) | |
| latitude | DOUBLE | Latitude | |
| longitude | DOUBLE | Longitude | |
| category | VARCHAR(50) | Business type | Index |
| rating | DECIMAL(2,1) | Average rating | |
| phone | VARCHAR(20) | Phone number | |
| hours | JSONB | Opening hours | |
| photos | JSONB | Photo URLs | |
| created_at | TIMESTAMP | | |
| updated_at | TIMESTAMP | | Index |
3.7.2 Geospatial Index Table (Geohash approach)
| Column | Type | Description | Index |
|---|---|---|---|
| geohash | VARCHAR(12) | Geohash of the business | Prefix index |
| business_id | UUID | FK to the business table | |
| latitude | DOUBLE | For exact distance computation | |
| longitude | DOUBLE | For exact distance computation | |
Index Strategy: create a compound index on (geohash, business_id). Queries then use a prefix search:
-- Find businesses in geohash cell "w3gvk" and its neighbors
SELECT business_id FROM geo_index
WHERE geohash LIKE 'w3gvk%'
   OR geohash LIKE 'w3gvm%'
   OR geohash LIKE 'w3gvj%'
   ... (9 cells in total)
Optimization: instead of 9 separate queries, batch them into a single query with an IN clause or UNION ALL.
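Why a `LIKE 'w3gvk%'` prefix query is fast on a B-tree index: it is equivalent to a range scan over the sorted keys. A sketch with a sorted in-memory list standing in for the index (sample data and names are illustrative):

```python
import bisect

# (geohash, business_id) pairs, kept sorted — a stand-in for a B-tree index.
index = sorted([
    ("w3gvk1", "biz-1"), ("w3gvk9", "biz-2"), ("w3gvm2", "biz-3"),
    ("w3gv7x", "biz-4"), ("w3h001", "biz-5"),
])

def prefix_scan(index, prefix):
    """Range-scan equivalent of `WHERE geohash LIKE '<prefix>%'`:
    all keys in [prefix, prefix + '\xff') — geohash chars sort below '\xff'."""
    lo = bisect.bisect_left(index, (prefix, ""))
    hi = bisect.bisect_left(index, (prefix + "\xff", ""))
    return [biz for _, biz in index[lo:hi]]
```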
3.7.3 Database Architecture — Read Replicas
```mermaid
flowchart LR
    subgraph "Write Path"
        BS["Business Service"]
        Primary["Primary DB<br/>PostgreSQL"]
    end
    subgraph "Read Path"
        LBS["LBS Instances"]
        R1["Read Replica 1<br/>US-West"]
        R2["Read Replica 2<br/>US-East"]
        R3["Read Replica 3<br/>EU-West"]
        R4["Read Replica 4<br/>AP-Southeast"]
    end
    BS --> Primary
    Primary -->|"Async Replication"| R1 & R2 & R3 & R4
    LBS --> R1 & R2 & R3 & R4
    style Primary fill:#66bb6a,color:#fff
    style R1 fill:#42a5f5,color:#fff
    style R2 fill:#42a5f5,color:#fff
    style R3 fill:#42a5f5,color:#fff
    style R4 fill:#42a5f5,color:#fff
```
Why read replicas per region? Because the LBS is read-heavy (60K peak QPS). Each region has its own read replicas → lower latency (a user in Singapore queries the Singapore replica instead of crossing to the US). See Tuan-07-Database-Sharding-Replication.
3.8 Caching Strategy
3.8.1 Cache Architecture
Two cache layers are used:
Cache Layer 1 — Geohash to Business IDs:
| Key | Value | TTL | Explanation |
|---|---|---|---|
| geo:w3gvk:restaurant | [id1, id2, id3, ...] | 1 hour | Business IDs in the geohash cell, per category |
| geo:w3gvk:* | [id1, id2, ...] | 1 hour | All businesses in the cell (no category filter) |
Cache Layer 2 — Business Details:
| Key | Value | TTL | Explanation |
|---|---|---|---|
| biz:uuid-123 | {name, lat, lng, rating, ...} | 24 hours | Business details |
3.8.2 Cache Flow
Search request: lat=10.77, lng=106.70, radius=5000, category=restaurant
1. Compute the geohash: w3gvk (precision 5)
2. Compute the 8 neighbors: w3gvm, w3gvj, w3gvh, w3gvs, w3gvt, w3gve, w3gv7, w3gv6
3. For each geohash cell:
a. Check Redis: GET geo:w3gvk:restaurant
b. Cache HIT → get the list of business_ids
c. Cache MISS → query a Read Replica → set the cache
4. Merge the business_ids from all 9 cells
5. For each business_id:
a. Check Redis: GET biz:uuid-123
b. Cache HIT → get the business details
c. Cache MISS → query a Read Replica → set the cache
6. Compute distances, filter, sort, return
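Steps 3a-3c are the classic cache-aside pattern. A minimal sketch in which a plain dict stands in for Redis and `query_replica` is a hypothetical stand-in for the read-replica call:

```python
# A plain dict stands in for Redis; real code would use SET with a 1h TTL.
cache = {}

def get_business_ids(cell, category, query_replica):
    """Cache-aside lookup for one geohash cell (layer-1 cache in the flow above)."""
    key = f"geo:{cell}:{category}"
    if key in cache:
        return cache[key]                   # cache HIT
    ids = query_replica(cell, category)     # cache MISS → read replica
    cache[key] = ids                        # populate the cache
    return ids
```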
3.8.3 Cache Invalidation
| Event | Action | Reason |
|---|---|---|
| Business update (name, hours, rating) | Invalidate biz:{id} | Details changed |
| Business move (lat/lng changed) | Invalidate biz:{id} + geo:{old_hash}:* + geo:{new_hash}:* | Location changed → geohash cell changed |
| Business deleted | Invalidate biz:{id} + geo:{hash}:* | Remove from all caches |
| TTL expiry | Automatic | Safety net |
See Tuan-06-Cache-Strategy for the detailed cache invalidation patterns.
3.9 Scaling Strategy
3.9.1 LBS — Stateless Horizontal Scaling
LBS instances are stateless (they hold no state). Any request can be handled by any instance.
Place them behind a Load Balancer (Tuan-05-Load-Balancer), using round-robin or least-connections.
If a Quadtree is used: each LBS instance keeps its own copy of the Quadtree in memory. When scaling out, a new instance builds its Quadtree from the database at startup (a few minutes), then serves requests normally.
3.9.2 Database — Read Replicas per Region
| Region | Read Replicas | Reason |
|---|---|---|
| US-West | 3 | California, tech hub |
| US-East | 3 | New York, Washington DC |
| EU-West | 2 | London, Paris |
| AP-Southeast | 3 | Singapore, Ho Chi Minh City, Jakarta — high density |
| AP-Northeast | 2 | Tokyo, Seoul |
With 13 replicas sharing ~60K peak QPS, each sees ~4.6K QPS. PostgreSQL can handle 10K+ simple reads/s, so 4.6K QPS per replica is comfortable.
3.9.3 Cache — Redis Cluster
Use a Redis Cluster sharded by geohash prefix. All keys whose geohash starts with “w3” (Vietnam) land on the same shard → good cache locality.
Query Flow — End to End
```mermaid
sequenceDiagram
    participant U as User (Mobile)
    participant LB as Load Balancer
    participant LBS as LBS Instance
    participant Cache as Redis Cache
    participant DB as Read Replica
    U->>LB: GET /v1/search/nearby<br/>lat=10.77, lng=106.70, radius=5000
    LB->>LBS: Route to LBS Instance
    Note over LBS: 1. Compute geohash: w3gvk (precision 5)
    Note over LBS: 2. Compute 8 neighbors
    loop For each of the 9 geohash cells
        LBS->>Cache: GET geo:w3gvk:*
        alt Cache HIT
            Cache-->>LBS: [business_id_1, business_id_2, ...]
        else Cache MISS
            LBS->>DB: SELECT business_id FROM geo_index<br/>WHERE geohash LIKE 'w3gvk%'
            DB-->>LBS: [business_id_1, business_id_2, ...]
            LBS->>Cache: SET geo:w3gvk:* (TTL 1h)
        end
    end
    Note over LBS: 3. Merge business_ids from 9 cells
    loop For each business_id
        LBS->>Cache: GET biz:{id}
        alt Cache HIT
            Cache-->>LBS: {name, lat, lng, rating, ...}
        else Cache MISS
            LBS->>DB: SELECT * FROM businesses<br/>WHERE id = ?
            DB-->>LBS: {name, lat, lng, rating, ...}
            LBS->>Cache: SET biz:{id} (TTL 24h)
        end
    end
    Note over LBS: 4. Compute Haversine distance
    Note over LBS: 5. Filter distance > radius
    Note over LBS: 6. Sort by distance
    LBS-->>LB: {businesses: [...], next_page_token: ...}
    LB-->>U: 200 OK
```
Geohash Grid Visualization
```mermaid
graph TB
    subgraph "Geohash Grid — TP.HCM Area (Precision 5)"
        subgraph row1[""]
            C1["w3gv6<br/>0 biz"]
            C2["w3gv7<br/>12 biz"]
            C3["w3gve<br/>5 biz"]
        end
        subgraph row2[""]
            C4["w3gvj<br/>45 biz"]
            C5["w3gvk<br/>●USER<br/>78 biz"]
            C6["w3gvm<br/>23 biz"]
        end
        subgraph row3[""]
            C7["w3gvh<br/>8 biz"]
            C8["w3gvs<br/>156 biz"]
            C9["w3gvt<br/>34 biz"]
        end
    end
    style C5 fill:#ff9800,color:#000,stroke:#e65100,stroke-width:3px
    style C4 fill:#c8e6c9
    style C6 fill:#c8e6c9
    style C8 fill:#c8e6c9
    style C2 fill:#c8e6c9
    style C1 fill:#e8eaf6
    style C3 fill:#e8eaf6
    style C7 fill:#e8eaf6
    style C9 fill:#e8eaf6
```
The user is in cell w3gvk. The system queries this cell + its 8 neighbors (9 cells, 361 businesses in total), then computes the true distances and filters.
Quadtree Structure Visualization
```mermaid
graph TD
    Root["ROOT<br/>Entire map<br/>100M businesses"]
    Root --> NW["NW<br/>North America<br/>25M biz"]
    Root --> NE["NE<br/>Europe + Africa<br/>20M biz"]
    Root --> SW["SW<br/>South America<br/>5M biz"]
    Root --> SE["SE<br/>Asia + Oceania<br/>50M biz"]
    SE --> SE_NW["SE-NW<br/>East Asia<br/>20M biz"]
    SE --> SE_NE["SE-NE<br/>Pacific<br/>2M biz"]
    SE --> SE_SW["SE-SW<br/>Southeast Asia<br/>18M biz"]
    SE --> SE_SE["SE-SE<br/>Oceania<br/>10M biz"]
    SE_SW --> VN1["VN-North<br/>Ha Noi area<br/>500K biz"]
    SE_SW --> VN2["VN-South<br/>TP.HCM area<br/>800K biz"]
    SE_SW --> TH["Thailand<br/>1.2M biz"]
    SE_SW --> ID["Indonesia<br/>2M biz"]
    VN2 --> D1["District 1<br/>5K biz"]
    VN2 --> D2["District 7<br/>3K biz"]
    VN2 --> D3["Thu Duc<br/>2K biz"]
    VN2 --> D4["Binh Thanh<br/>4K biz"]
    D1 --> L1["Block A<br/>95 biz ✓"]
    D1 --> L2["Block B<br/>78 biz ✓"]
    D1 --> L3["Block C<br/>88 biz ✓"]
    D1 --> L4["Block D<br/>62 biz ✓"]
    style Root fill:#e53935,color:#fff
    style SE fill:#ff9800,color:#000
    style SE_SW fill:#ffc107,color:#000
    style VN2 fill:#66bb6a,color:#fff
    style D1 fill:#42a5f5,color:#fff
    style L1 fill:#a5d6a7,stroke:#388e3c
    style L2 fill:#a5d6a7,stroke:#388e3c
    style L3 fill:#a5d6a7,stroke:#388e3c
    style L4 fill:#a5d6a7,stroke:#388e3c
```
Leaf nodes (green) hold ≤ 100 businesses. Dense areas (District 1, Ho Chi Minh City) get small cells. Sparse areas (the Pacific Ocean) get large cells. That is the power of an adaptive grid.
4. Security
4.1 Location Privacy — Protecting User Location
| Technique | Approach | Details |
|---|---|---|
| Location fuzzing | Blur GPS coordinates before logging | Add ±100-500m of noise to coordinates before storing them in analytics/logs. The user still gets exact results, but server logs only keep approximate locations |
| Aggregation | Store only aggregated data | Instead of logging “User A was at coordinate X at 14:00”, log “50 users searched geohash area w3gvk between 14:00-15:00” |
| Retention policy | Delete location data after a fixed period | Raw GPS coordinates deleted after 30 days; keep only aggregated stats |
| User consent | Ask permission before accessing GPS | Mobile OSes (iOS/Android) require the permission prompt. The backend must respect “While Using” vs “Always” |
| Encryption in transit | TLS 1.3 for every request | Coordinates travel over HTTPS, never HTTP |
4.2 Rate Limiting on Location Queries
| Tier | Limit | Reason |
|---|---|---|
| Per user | 30 requests/minute | A typical person searches at most ~5 times/minute |
| Per IP | 100 requests/minute | For shared networks (co-working spaces) |
| Per API key (business) | 1,000 requests/minute | For third-party integrations |
| Burst allowance | 2x the limit for 10 seconds | Allows short bursts when the user pans the map quickly |
Use the Token Bucket algorithm. See Tuan-09-Rate-Limiter.
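A minimal token-bucket sketch; time is passed in explicitly for testability, and the class name is illustrative (a production limiter would live in Redis so all LBS instances share state):

```python
class TokenBucket:
    """Token bucket limiter: capacity allows bursts, refill_rate sets steady QPS."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_rate = refill_rate      # tokens added per second
        self.last = 0.0                     # timestamp of the last call

    def allow(self, now: float) -> bool:
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1                # consume one token for this request
            return True
        return False                        # bucket empty → reject (HTTP 429)
```

With capacity 2 and refill rate 0.5/s, a client can burst 2 requests, then is held to one request every 2 seconds.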
4.3 Preventing Data Scraping by Competitors
Threat: a competitor can write a bot to scrape the entire list of businesses and coordinates from the API.
| Measure | Details |
|---|---|
| API key required | Every client needs an API key. Revoke the key if a scraping pattern is detected |
| Pagination limit | At most 20 results per page, at most 100 pages per session |
| Honeypot businesses | Insert fake businesses (visible only via the API, never in the UI). If a competitor displays a honeypot → proof of scraping |
| Request pattern analysis | Detect systematic grid scanning (searching every 1km across a whole city), unusual QPS, sequential page requests |
| CAPTCHA | Trigger a CAPTCHA after 50 consecutive searches with no user interaction |
| Response watermarking | Embed hidden metadata in responses (user ID, timestamp) to trace the source of a leak |
4.4 GDPR Compliance for Location Data
| GDPR Requirement | Implementation |
|---|---|
| Right to be forgotten | On a user's request → delete all location history, search history, and cached data |
| Data portability | Export location history as JSON/CSV on user request |
| Purpose limitation | Location data is used only for nearby search — never sold for third-party advertising |
| Data minimization | Store coordinates only when necessary. Do not keep location history for the search feature (only the real-time position is needed) |
| Consent | Explicit opt-in for location tracking. Granular permissions: "Only while using the app" |
| Data Processing Agreement | With the cloud provider (AWS/GCP) — ensuring data stays in the appropriate region (EU data in EU regions) |
5. DevOps & Monitoring
5.1 Key Metrics to Monitor
5.1.1 Search Latency by Region
| Metric | Alert Threshold | Dashboard |
|---|---|---|
| search_latency_p50 by region | > 50ms | Time series, group by region |
| search_latency_p99 by region | > 200ms | Time series, group by region |
| search_latency_p99 global | > 500ms | Single number, red/green |
| search_error_rate | > 0.1% | Percentage gauge |
Why by region? Because latency depends on: distance to the nearest read replica, business density (dense areas return more results → slower), and the cache hit ratio.
5.1.2 Cache Performance
| Metric | Alert Threshold | Meaning |
|---|---|---|
| cache_hit_ratio_geo (geohash → business_ids) | < 90% | Cache is ineffective → check TTL, check whether the geohash precision changed |
| cache_hit_ratio_biz (business details) | < 95% | Business data rarely changes, so this hit ratio should be high |
| cache_eviction_rate | > 1000/s | Redis is out of memory → add nodes |
| cache_latency_p99 | > 5ms | Redis is slow → check the network, check memory fragmentation |
Cache Hit Ratio by Geohash Precision:
| Precision | Expected Hit Ratio | Explanation |
|---|---|---|
| 4 (39km cells) | 98%+ | Large cells → many users search the same cell → warm cache |
| 5 (4.9km cells) | 95%+ | Most common — balances granularity and cache efficiency |
| 6 (1.2km cells) | 85-90% | Small cells → many cells → cold cache for rarely searched cells |
| 7 (153m cells) | 70-80% | Too granular for proximity search → not recommended |
5.1.3 Database Replication Lag
| Metric | Alert Threshold | Impact |
|---|---|---|
| replication_lag_seconds per replica | > 30s | Business updates are slow to appear |
| replication_lag_seconds max | > 300s (5 min) | Needs investigation — the replica may be stuck |
| replica_connections | < expected | A replica is disconnected → stale data |
See Tuan-07-Database-Sharding-Replication for details on replication monitoring.
5.1.4 Quadtree Health (if used)
| Metric | Alert Threshold | Meaning |
|---|---|---|
| quadtree_build_time_seconds | > 600 (10 min) | Slow build → data growth or a slow server |
| quadtree_memory_bytes | > 2 GB | Abnormally large tree → check data quality |
| quadtree_last_build_timestamp | > 6 hours ago | Quadtree is outdated → trigger a rebuild |
| quadtree_max_depth | > 20 | Possible data anomaly (many businesses at identical coordinates) |
5.2 Deployment Strategy
| Aspect | Strategy | Rationale |
|---|---|---|
| LBS instances | Rolling update | Stateless → instances can be updated one at a time |
| Quadtree rebuild | Blue-green | Build the new tree, swap when done. Zero downtime |
| Database schema migration | Online DDL (pt-online-schema-change) | No table locks |
| Cache warming | Pre-warm after deploy | A script queries the top geohash cells to warm the cache |
| Canary deployment | 5% of traffic first | Catch problems early before the 100% rollout |
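The blue-green quadtree rebuild row amounts to two trees and an atomic reference swap. A minimal sketch — `QuadtreeServer` and `builder` are illustrative names, and the lock stands in for whatever atomic-swap primitive the real server uses:

```python
import threading

class QuadtreeServer:
    """Serve reads from the current tree while a replacement builds in the background."""
    def __init__(self, builder):
        self._builder = builder              # callable that builds a fresh index
        self._lock = threading.Lock()
        self._tree = builder()               # "blue" tree, serving traffic

    def rebuild(self):
        new_tree = self._builder()           # slow "green" build; old tree keeps serving
        with self._lock:
            self._tree = new_tree            # instant swap → zero downtime

    def search(self, *args, **kwargs):
        with self._lock:                     # grab a stable reference, then query it
            tree = self._tree
        return tree.search(*args, **kwargs)
```

During the build, memory briefly holds both trees (~2x ~853 MB), which is why the 16 GB server sizing in section 6 leaves comfortable headroom.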
5.3 Alerting Rules
| Alert | Condition | Severity | Action |
|---|---|---|---|
| High search latency | p99 > 500ms for 5 min | P1 | On-call investigates, checks DB/cache |
| Cache cluster down | > 1 node unreachable for 2 min | P1 | Auto-failover, page on-call |
| Replication lag high | > 5 min for any replica | P2 | Check replica health, network |
| Quadtree build failed | Build process crashed | P2 | Serve with the old tree, investigate |
| Error rate spike | > 1% for 3 min | P1 | Check recent deploy, rollback if needed |
| Scraping detected | > 500 sequential searches from 1 IP | P3 | Auto rate limit, review |
See Tuan-13-Monitoring-Observability for alerting best practices.
6. Aha Moments & Pitfalls
6.1 Aha Moments — The Unexpected Findings
Aha 1: Geohash Boundary Problem
Problem: two pho restaurants 50m apart can land in 2 different geohash cells. Query only 1 cell → you miss results.
Solution: always query 9 cells (the current cell + its 8 neighbors). Cost: 9x queries instead of 1x. With caching, this cost drops substantially.
Interview tip: if the interviewer asks "what's wrong with geohash?", you MUST mention the boundary problem and the 9-cell query fix. This is what separates an average candidate from a good one.
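The 9-cell fix can be sketched end to end: encode the center point, then encode eight points offset by exactly one cell width in each direction. This uses the standard base-32 geohash encoding; pole and antimeridian wrap-around are ignored for brevity:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lng, precision=5):
    """Interleave longitude/latitude bits, 5 bits per base-32 character."""
    lat_rng, lng_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, value, even, out = 0, 0, True, []
    while len(out) < precision:
        rng = lng_rng if even else lat_rng     # longitude bit first, then alternate
        coord = lng if even else lat
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid
        else:
            value = value * 2
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:
            out.append(BASE32[value])
            bits, value = 0, 0
    return "".join(out)

def nine_cells(lat, lng, precision=5):
    """The cell containing (lat, lng) plus its 8 neighbours."""
    lng_bits = (5 * precision + 1) // 2        # longitude gets the extra bit
    lat_bits = 5 * precision - lng_bits
    dlat = 180.0 / (1 << lat_bits)             # cell height in degrees
    dlng = 360.0 / (1 << lng_bits)             # cell width in degrees
    return {geohash_encode(lat + i * dlat, lng + j * dlng, precision)
            for i in (-1, 0, 1) for j in (-1, 0, 1)}
```

`nine_cells(10.7769, 106.7009)` yields the 9 precision-5 cells to query; union their cached business lists, then distance-filter the result.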
Aha 2: The Quadtree Fits on One Server
100 million businesses, yet the quadtree takes only ~853 MB of RAM. A single server with 16 GB of RAM can hold the whole tree. No distributed index needed.
Implication: when the interviewer hints "how would you distribute the quadtree across servers?", the right answer is: don't. Each LBS server keeps a full copy. Scale by adding servers (each holding a full copy).
Comparison with B-trees: a database B-tree index over 100M records can take 10-20 GB (B-trees carry more overhead). The quadtree is substantially more memory-efficient for geospatial data.
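One way to reproduce the ~853 MB figure is a back-of-envelope calculation, assuming 100 businesses per leaf, 8-byte business IDs and child pointers, and 32 bytes of bounding-box coordinates per node. These byte sizes are assumptions chosen to be consistent with the note's estimate, not numbers stated in the chapter:

```python
businesses = 100_000_000
per_leaf = 100                                  # split a grid once it exceeds 100 businesses

leaves = businesses // per_leaf                 # 1,000,000 leaf nodes
internal = (leaves - 1) // 3                    # a quadtree with L leaves has ~(L-1)/3 internal nodes

leaf_bytes = leaves * (32 + 8 * per_leaf)       # bounding box + 100 business IDs each
internal_bytes = internal * (32 + 4 * 8)        # bounding box + 4 child pointers each

total_mb = (leaf_bytes + internal_bytes) / 1e6
print(round(total_mb))                          # prints 853
```

Note that the leaves dominate: the internal nodes add only ~21 MB on top of the ~832 MB of business IDs, which is why "just store IDs in the leaves" is the standard design.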
Aha 3: Radius Does Not Map 1-to-1 to Geohash Precision
There is no exact formula radius = X → geohash precision = Y. Instead, choose a precision whose cell size is >= the radius, then query 9 cells.
Example: radius 2km. Precision 6 cells = 1.22km (smaller than the radius) → not enough. Precision 5 cells = 4.89km (larger than the radius) → OK, but returns extra results. Solution: use precision 5, query 9 cells, then filter by Haversine distance.
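The selection rule above (the finest cell that is still at least as wide as the radius, then the 9-cell query) can be written down directly. `precision_for_radius` is an illustrative helper built on the approximate cell widths from section 5.1.2:

```python
# Approximate geohash cell width in km (shorter side) per precision level
CELL_KM = {4: 39.1, 5: 4.89, 6: 1.22, 7: 0.153}

def precision_for_radius(radius_km):
    """Finest precision whose cell is still at least as wide as the search radius."""
    for p in sorted(CELL_KM, reverse=True):     # try 7, then 6, 5, 4
        if CELL_KM[p] >= radius_km:
            return p
    return min(CELL_KM)                         # radius wider than every listed cell

print(precision_for_radius(2.0))   # prints 5, matching the 2 km example above
```

With the cell no smaller than the radius, the 3x3 block of cells is guaranteed to cover the whole circle; the Haversine filter then trims the square-shaped excess.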
Aha 4: The Write Path and Read Path Are Completely Separate
Proximity Service is one of the clearest case studies of the CQRS pattern. Write QPS = 1-5, read QPS = 60,000. Scaling them together wastes resources.
Aha 5: A Geohash Is a String → It Works with Any Database
A geohash is just a string (e.g. "w3gvk1"). It can be stored in any database with a string index: PostgreSQL, MySQL, DynamoDB, Redis, Cassandra. No special database required.
6.2 Pitfalls — Common Mistakes
Pitfall 1: Running a Two-Dimensional Search Directly
"SELECT WHERE lat BETWEEN … AND lng BETWEEN …" is slow — O(N) — because the database cannot use both indexes at once effectively.
Fix: use a geospatial index (geohash, quadtree, or PostGIS).
Pitfall 2: Querying Only 1 Geohash Cell
Ignoring the boundary problem → results at cell borders are lost. This is the most serious mistake when implementing geohash search.
Fix: always query the current cell + its 8 neighbors.
Pitfall 3: Skipping the Haversine Distance Check After the Query
A geohash query returns a square (or rectangle), but the user searches within a circle (radius). Businesses near the corners of the square can lie outside the circle yet still be returned.
Fix: after fetching results from the geohash cells, you MUST compute the Haversine distance and drop businesses outside the radius.
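The mandatory post-filter looks like this. The Haversine formula itself is standard; `filter_by_radius` and the dict shape of a candidate are illustrative assumptions:

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points."""
    r = 6371.0                                   # mean Earth radius in km
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def filter_by_radius(center, candidates, radius_km):
    """Keep only candidates from the square geohash cells that fall inside the circle."""
    lat, lng = center
    return [c for c in candidates
            if haversine_km(lat, lng, c["lat"], c["lng"]) <= radius_km]
```

The same distance value can be reused to sort the final list nearest-first, so the filter and the ranking step cost one computation, not two.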
Pitfall 4: Overcomplicating with a Distributed Quadtree
Many candidates try to distribute the quadtree across servers. Unnecessary — 853 MB fits on one server, and the complexity of a distributed tree is not worth it.
Fix: each server keeps a full copy. Scale by adding servers.
Pitfall 5: Forgetting Cache Invalidation When a Business Moves
When a business changes coordinates (e.g. a food truck), its geohash changes. Without invalidating the old cache → users near the old location still see the business, while users near the new location don't.
Fix: invalidate the cache for BOTH the old and the new geohash when a business updates its coordinates.
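A minimal sketch of the fix, using a plain dict as the cache and a simple fixed grid in place of a real geohash encoder — all names here (`cell_of`, `move_business`, the `geo:`/`biz:` key scheme) are illustrative:

```python
def cell_of(lat, lng, size_deg=0.044):
    """Stand-in for geohash encoding: bucket coordinates into a fixed-size grid."""
    return f"{int(lat // size_deg)}:{int(lng // size_deg)}"

def move_business(cache, db, biz_id, new_lat, new_lng):
    old = db[biz_id]
    old_cell = cell_of(old["lat"], old["lng"])
    new_cell = cell_of(new_lat, new_lng)
    db[biz_id] = {"lat": new_lat, "lng": new_lng}
    cache.pop(f"geo:{old_cell}", None)          # old cell: stop serving the stale listing
    if new_cell != old_cell:
        cache.pop(f"geo:{new_cell}", None)      # new cell: force a refresh so the business appears
    cache.pop(f"biz:{biz_id}", None)            # layer-2 entry (business details)
```

Deleting the cell entries (rather than patching them in place) lets the next read repopulate both cells from the primary, which keeps the invalidation path trivially correct.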
Pitfall 6: Using Too High a Geohash Precision
Precision 7 (153m cells) sounds "more accurate". But smaller cells → more cells needed to cover the radius → more queries → slower. And the cache hit ratio drops because there are far more distinct cells.
Fix: choose a precision that matches the radius. Precision 4-6 covers most use cases.
7. Summary — Decision Framework
7.1 When to Use Geohash?
| Situation | Recommendation |
|---|---|
| Existing database (PostgreSQL, MySQL) | Geohash — just add 1 column |
| Need a distributed index | Geohash — a string column works with sharding |
| Small team, need to ship fast | Geohash — the simplest option |
| Data changes frequently | Geohash — updating the record is enough |
7.2 When to Use a Quadtree?
| Situation | Recommendation |
|---|---|
| Need adaptive density | Quadtree — automatically subdivides dense areas |
| In-memory queries are acceptable | Quadtree — extremely fast, O(log N) |
| Data rarely changes (businesses) | Quadtree — periodic rebuilds are enough |
| Single-region deployment | Quadtree — no need to distribute |
7.3 When to Use S2?
| Situation | Recommendation |
|---|---|
| Need spherical accuracy | S2 — best for global-scale problems |
| Google-scale | S2 — proven at Google |
| MongoDB ecosystem | S2 — native support (2dsphere) |
| Complex region queries | S2 — S2RegionCoverer is well optimized |
8. Further Reading & Internal Links
Related Foundational Topics
| Topic | Link | Use in Proximity Service |
|---|---|---|
| Cache Strategy | Tuan-06-Cache-Strategy | Geohash cache, business data cache, invalidation patterns |
| Database Sharding & Replication | Tuan-07-Database-Sharding-Replication | Read replicas per region, replication lag monitoring |
| Load Balancer | Tuan-05-Load-Balancer | Distribute traffic across stateless LBS instances |
| Rate Limiter | Tuan-09-Rate-Limiter | Anti-scraping, protecting the LBS from traffic spikes |
| Monitoring | Tuan-13-Monitoring-Observability | Search latency, cache hit ratio, replication lag alerts |
| Back-of-envelope | Tuan-02-Back-of-the-envelope | QPS, storage, memory estimation |
| Microservices Pattern | Tuan-11-Microservices-Pattern | CQRS — separating Read (LBS) and Write (Business Service) |
Comparison with Other Case Studies
| Case Study | Similarities | Differences |
|---|---|---|
| Tuan-16-Design-URL-Shortener | Read-heavy, caching is critical | URL shortener has no geospatial component |
| Tuan-18-Design-News-Feed | Fan-out pattern, caching | News Feed is a social graph, not location |
| Tuan-20-Design-Key-Value-Store | In-memory data structures, partitioning | KV Store is general-purpose, Proximity is domain-specific |
9. Interview Cheat Sheet
Step 1 — Ask Clarifying Questions (2-3 minutes)
- Scale: how many businesses? How many DAU?
- Radius: can the user choose the radius? What's the max radius?
- Real-time: must business updates appear immediately?
- Features: search only, or full CRUD on businesses?
Step 2 — High-level Design (5-7 minutes)
- Draw 2 services: LBS (read) + Business Service (write)
- Draw the database: Primary + Read Replicas
- Mention the CQRS pattern
- Draw the cache layer (Redis)
Step 3 — Deep Dive (15-20 minutes)
- Explain geohash: encoding, precision levels, 9-cell query
- Mention the boundary problem and its fix
- Quadtree as an alternative: adaptive, in-memory, 853 MB
- Database schema: business table + geospatial index table
- Caching: 2 layers (geohash → IDs, ID → details)
Step 4 — Wrap Up (3-5 minutes)
- Scaling: LBS is stateless → horizontal, DB read replicas per region
- Monitoring: search latency, cache hit ratio, replication lag
- Security: location privacy, rate limiting, anti-scraping
Questions Interviewers Often Ask
| Question | How to Answer |
|---|---|
| "Geohash vs Quadtree?" | Geohash: distributed, database-friendly. Quadtree: adaptive, in-memory, faster queries. Choose by use case |
| "What's wrong with geohash?" | The boundary problem — 2 nearby points in different cells. Fix: query 9 cells |
| "Does the quadtree scale?" | 100M businesses take only 853 MB — each server keeps a full copy. Scale horizontally |
| "How do you handle business updates?" | CQRS: write to the Primary, replicate asynchronously. Rebuild the quadtree periodically. Invalidate caches |
| "How do you reduce latency?" | Caching (2 layers), read replicas per region, geohash precision tuning |
"Proximity Service is a great problem for understanding geospatial indexing — the foundation of every location-based application. Once you understand how to turn a 2-dimensional problem into a 1-dimensional one (geohash), or how to partition space intelligently (quadtree), you can apply it to any location problem: finding drivers, finding friends, finding restaurants, or finding anything else on a map."