Case Study: Design Google Maps
“Ngày xua, muon di tu Sai Gon ra Ha Noi, ong ba phai trai ban do giay ra san, dung ngon tay do theo quoc lo 1A. Hom nay, mot cai cham vao man hinh — Google Maps tinh cho em duong nhanh nhat trong 0.5 giay. Dang sau su ‘don gian’ do la mot trong nhung he thong phuc tap nhat the gioi.”
Tags: system-design case-study google-maps navigation geo alex-xu-vol2 Student: Hieu Prerequisite: Tuan-02-Back-of-the-envelope · Tuan-03-Networking-DNS-CDN · Tuan-08-Message-Queue Lien quan: Case-Design-Proximity-Service · Tuan-07-Database-Sharding-Replication · Tuan-06-Cache-Strategy · Tuan-13-Monitoring-Observability · Tuan-14-AuthN-AuthZ-Security · Tuan-15-Data-Security-Encryption Reference: Alex Xu, System Design Interview Volume 2 — Chapter 2: Nearby Friends / Google Maps
Context & Why — Tu ban do giay den ban do so thong minh
Analogy: Me cung thanh pho
Hieu, hay tuong tuong em dung giua mot me cung khong lo — me cung do chinh la mang luoi duong pho cua mot thanh pho. Ban do giay ngay xua giong nhu em cam mot tam anh chup tu tren cao, roi tu minh do duong bang mat. Van de la:
- Ban do giay khong biet traffic — em khong biet duong nao dang ket xe
- Ban do giay khong cap nhat — con duong moi xay xong thang truoc chua co tren ban do
- Ban do giay khong tinh duoc ETA — em chi uoc luong bang cam giac
- Ban do giay khong zoom duoc — muon xem chi tiet phai mua ban do ty le khac
Google Maps giai quyet tat ca nhung van de nay bang cach so hoa toan bo ban do the gioi, chia nho thanh hang ty tile nho, va su dung algorithm de tim duong nhanh nhat trong thoi gian thuc.
Tai sao bai nay quan trong?
| Khia canh | Ly do |
|---|---|
| Geo-spatial data | Day la nền tang cho moi ung dung location-based (Grab, Uber, delivery apps) |
| Scale khong lo | 1 billion DAU, hang chuc trieu navigation requests/ngay |
| Real-time processing | Traffic data thay doi moi giay, ETA phai cap nhat lien tuc |
| Pre-computation | Khong the tinh routing on-the-fly cho moi request — can pre-compute |
| CDN & Caching | Map tiles la bai toan CDN kinh dien — Tuan-03-Networking-DNS-CDN |
| Stream processing | Location updates tu hang ty device — Tuan-08-Message-Queue |
Step 1 — Understand the Problem & Establish Design Scope
1.1 Clarifying Questions (Cau hoi lam ro)
| Cau hoi | Tra loi | Ghi chu |
|---|---|---|
| He thong can ho tro nhung feature chinh nao? | Map rendering, navigation (routing + ETA), location update | 3 pillars chinh |
| Scale bao lon? | ~1 Billion DAU | Quy mo Google-level |
| Can ho tro real-time traffic? | Co — traffic overlay tren map va ETA tinh toan | Stream processing |
| Map data tu dau? | Satellite imagery, street-level photography, user contributions, third-party data | Khong thiet ke phan thu thap, chi dung data co san |
| Ho tro bao nhieu loai phuong tien? | Xe hoi, xe may, di bo, xe dap, transit (bus/tau) | Moi loai co routing algorithm rieng |
| Offline support? | Co — download map tiles cho mot region | Mobile-first concern |
| Turn-by-turn navigation? | Co — real-time voice guidance | Streaming direction updates |
| Search (geocoding)? | Co — tim dia diem bang ten, dia chi | Geocoding + reverse geocoding |
1.2 Functional Requirements (Yeu cau chuc nang)
- FR1: Map Rendering — hien thi ban do tai bat ky vi tri nao tren the gioi, ho tro zoom in/out (21 zoom levels)
- FR2: Navigation/Routing — tim duong di ngan nhat hoac nhanh nhat giua 2 diem (hoac nhieu diem), tinh ETA
- FR3: Location Update — nhan va luu vi tri hien tai cua user (GPS coordinates) de phuc vu traffic va personalization
- FR4: Geocoding — chuyen doi giua dia chi text va toa do (lat/lng) va nguoc lai
- FR5: Traffic Overlay — hien thi tinh trang giao thong real-time tren ban do (xanh/vang/do)
- FR6: Alternative Routes — de xuat 2-3 tuyen duong thay the voi ETA khac nhau
- FR7: Turn-by-turn Navigation — huong dan chi tiet tung buoc trong qua trinh di chuyen
1.3 Non-Functional Requirements (Yeu cau phi chuc nang)
| Yeu cau | Muc tieu | Giai thich |
|---|---|---|
| Availability | 99.99% | Navigation la safety-critical — user dang lai xe tren cao toc |
| Latency (Map) | Tile load < 200ms | User ky vong ban do hien thi gan nhu tuc thi khi scroll/zoom |
| Latency (Routing) | Route calculation < 3s | Bao gom tinh toan tren server + network round trip |
| Scalability | 1B DAU, 33M location updates/s peak | Xem phan Estimation ben duoi |
| Accuracy (ETA) | Sai lech < 10% so voi thuc te | ETA sai = user mat tin tuong |
| Freshness (Traffic) | Cap nhat moi 30-60 giay | Traffic cu 5 phut la vo nghia |
| Offline Support | Download map tiles cho offline use | Mobile users o vung yeu signal |
| Data Durability | 99.999999999% (eleven 9s) | Map data la tai san core — mat la mat tat ca |
Step 2 — High-Level Design
2.1 Ba service chinh
Google Maps co the chia thanh 3 service chinh (theo Alex Xu):
| Service | Nhiem vu | Input | Output |
|---|---|---|---|
| Location Service | Thu thap va luu vi tri cua user | GPS coordinates tu mobile client | Location history, real-time traffic data |
| Navigation Service | Tim duong va tinh ETA | Origin + Destination + preferences | Route (list of coordinates), ETA, turn-by-turn instructions |
| Map Rendering Service | Phuc vu map tiles cho client hien thi | Tile coordinates (zoom level, x, y) | Image tiles hoac vector tiles |
Ngoai ra con co cac supporting services:
| Service | Nhiem vu |
|---|---|
| Geocoding Service | Chuyen doi dia chi ←> toa do |
| Traffic Service | Xu ly va aggregation traffic data real-time |
| ETA Service | Tinh toan ETA dua tren historical + real-time + ML |
| Search/POI Service | Tim kiem dia diem (Points of Interest) |
2.2 High-Level Architecture Diagram
graph TB subgraph Clients MobileApp["Mobile App<br/>(iOS/Android)"] WebApp["Web Browser"] end subgraph "API Gateway & Load Balancer" APIGW["API Gateway<br/>Rate Limiting, Auth, Routing"] end subgraph "Core Services" LocSvc["Location Service<br/>(Write-heavy)"] NavSvc["Navigation Service<br/>(Compute-heavy)"] MapSvc["Map Rendering Service<br/>(Read-heavy)"] GeoSvc["Geocoding Service"] TrafficSvc["Traffic Service"] ETASvc["ETA Service"] SearchSvc["Search/POI Service"] end subgraph "Data Layer" Kafka["Apache Kafka<br/>Location Stream"] LocDB["Location History DB<br/>(Cassandra/TimescaleDB)"] RoadGraph["Road Graph DB<br/>(Adjacency list, pre-computed)"] TileStorage["Tile Storage<br/>(Object Storage — S3)"] GeoDB["Geocoding DB<br/>(Elasticsearch)"] TrafficDB["Traffic DB<br/>(Time-series)"] end subgraph "CDN Layer" CDN["Tile CDN<br/>(CloudFront/Akamai)<br/>Global Edge PoPs"] end subgraph "ML Pipeline" MLPipeline["ML Model<br/>ETA Prediction<br/>Traffic Prediction"] end MobileApp --> APIGW WebApp --> APIGW MobileApp -->|"Tile requests"| CDN APIGW --> LocSvc APIGW --> NavSvc APIGW --> MapSvc APIGW --> GeoSvc APIGW --> SearchSvc LocSvc -->|"Publish location"| Kafka Kafka --> TrafficSvc Kafka --> LocDB TrafficSvc --> TrafficDB TrafficSvc --> MLPipeline NavSvc --> RoadGraph NavSvc --> ETASvc ETASvc --> TrafficDB ETASvc --> MLPipeline MapSvc --> TileStorage TileStorage --> CDN GeoSvc --> GeoDB SearchSvc --> GeoDB
2.3 Request Flow — 3 luong chinh
Luong 1: Map Rendering (User mo app, xem ban do)
- Client tinh toan can tiles nao dua tren viewport (vi tri + zoom level)
- Kiem tra local cache (mobile) hoac browser cache (web)
- Neu cache miss → request toi CDN
- Neu CDN cache miss → request toi Map Rendering Service → lay tu Tile Storage (S3)
- CDN cache tile → tra ve client
- Client render tiles ghep lai thanh ban do hoan chinh
Luong 2: Navigation (User bam “Chi duong”)
- Client gui origin (vi tri hien tai) + destination + preferences (xe hoi/di bo/…)
- API Gateway route toi Navigation Service
- Navigation Service goi Geocoding Service (neu user nhap dia chi text)
- Navigation Service query Road Graph DB de tim route
- Navigation Service goi ETA Service de tinh thoi gian cho moi route
- ETA Service lay traffic data real-time + historical patterns + ML prediction
- Tra ve 2-3 routes voi ETA, khoang cach, turn-by-turn instructions
Luong 3: Location Update (Background, lien tuc)
- Mobile client thu thap GPS coordinates (moi 1-3 giay khi navigating, moi 15-30 giay khi idle)
- Client batch nhieu data points lai → gui 1 request moi 15-30 giay (tiet kiem battery + bandwidth)
- Location Service nhan batch → publish len Kafka
- Kafka consumers: (a) Luu vao Location History DB, (b) Feed vao Traffic Service de tinh real-time traffic
Step 3 — Deep Dive
3.1 Map Tiles — Chia ban do thanh mien ghep hinh
Khai niem Tile Pyramid
Hieu, hay tuong tuong em co mot buc anh ve tinh chup toan bo Trai Dat. Buc anh nay co do phan giai cuc ky lon — neu in ra se to bang mot san bong da. Hien thi nguyen buc anh nay tren dien thoai la bat kha thi (qua nang, qua lon). Giai phap: cat nho buc anh thanh nhieu manh vuong (tiles), va chi load nhung manh ma user dang nhin thay.
21 Zoom Levels (tu 0 den 20):
| Zoom Level | So tiles | Moi tile bao phu | Su dung |
|---|---|---|---|
| 0 | 1 tile (1x1) | Toan bo the gioi | Overview toan cau |
| 1 | 4 tiles (2x2) | 1/4 the gioi | Nhin cac luc dia |
| 2 | 16 tiles (4x4) | 1/16 the gioi | Nhin cac quoc gia lon |
| 3 | 64 tiles (8x8) | … | Nhin nhom quoc gia |
| … | … | … | … |
| 10 | ~1 trieu tiles | ~150m x 150m | Nhin thanh pho |
| 15 | ~1 ty tiles | ~5m x 5m | Nhin khu pho, duong |
| 20 | ~1 ngan ty tiles | ~0.15m x 0.15m | Nhin chi tiet nha, via he |
Cong thuc tinh so tiles tai moi zoom level:
Ví du:
Tile Pyramid Diagram
graph TD subgraph "Tile Pyramid — Zoom Levels" Z0["Zoom 0<br/>1 tile<br/>Toan bo the gioi"] Z1["Zoom 1<br/>4 tiles (2x2)<br/>Cac luc dia"] Z2["Zoom 2<br/>16 tiles (4x4)<br/>Cac quoc gia"] Z10["Zoom 10<br/>~1M tiles<br/>Thanh pho"] Z15["Zoom 15<br/>~1B tiles<br/>Duong pho"] Z20["Zoom 20<br/>~1T tiles<br/>Chi tiet nha"] end Z0 -->|"Moi tile chia thanh 4"| Z1 Z1 -->|"Moi tile chia thanh 4"| Z2 Z2 -->|"..."| Z10 Z10 -->|"..."| Z15 Z15 -->|"..."| Z20 style Z0 fill:#e1f5fe style Z1 fill:#b3e5fc style Z2 fill:#81d4fa style Z10 fill:#4fc3f7 style Z15 fill:#29b6f6 style Z20 fill:#0288d1
Vector Tiles vs Raster Tiles
| Dac diem | Raster Tiles | Vector Tiles |
|---|---|---|
| Format | PNG/JPEG image | Protocol Buffers (binary data) |
| Kich thuoc | 20-100 KB/tile | 5-30 KB/tile (nho hon 3-5x) |
| Rendering | Server render san → client chi hien thi | Client render tu data (GPU-accelerated) |
| Zoom | Mo khi zoom giua 2 level (pixelated) | Muot ma o moi muc zoom (smooth) |
| Customization | Khong — anh da render co dinh | Co — client tu style (dark mode, highlight roads, etc.) |
| Offline size | Lon | Nho hon nhieu |
| CPU client | Thap (chi hien anh) | Cao hon (phai render) |
| Bandwidth | Lon hon | Tiet kiem hon |
Xu huong hien tai: Google Maps, Apple Maps, Mapbox deu chuyen sang vector tiles. Ly do chinh: bandwidth tiet kiem + smooth zoom + dynamic styling (dark mode khi lai xe ban dem).
Tai sao vector tiles thang? Hieu, hay nghi the nay: raster tile giong nhu em chup anh ban do roi gui qua. Vector tile giong nhu em gui data (duong nay o dau, nha nay hinh gi, ten duong la gi) roi de dien thoai tu ve. Gui data nhe hon gui anh, va dien thoai ngay nay du manh de ve rat nhanh.
Tile Addressing — Cach xac dinh tile nao can load
Moi tile duoc xac dinh boi 3 tham so: (zoom, x, y)
- zoom: zoom level (0-20)
- x: column index (0 den )
- y: row index (0 den )
URL pattern:
https://tiles.maps.example.com/{zoom}/{x}/{y}.pbf
Ví du: https://tiles.maps.example.com/15/26177/15123.pbf — tile o zoom 15, cot 26177, hang 15123.
Client tinh toan tiles can hien thi dua tren:
- Viewport (man hinh dang nhin thay pham vi nao)
- Zoom level hien tai
- Pre-fetch tiles xung quanh viewport (anticipatory loading)
3.2 Map Rendering — Tu tile den ban do tren man hinh
Tile CDN Architecture
Map tiles la static content (it thay doi — chi cap nhat khi co road moi hoac building moi). Day la bai toan CDN kinh dien — tham khao Tuan-03-Networking-DNS-CDN.
Multi-layer caching strategy:
| Layer | Noi luu | Cache Duration | Hit Ratio (uoc tinh) |
|---|---|---|---|
| L1: Client Cache | Mobile app local storage / Browser cache | 30 ngay (tiles it thay doi) | ~70-80% |
| L2: CDN Edge | CloudFront/Akamai edge PoPs | 7-30 ngay | ~95-99% cua requests con lai |
| L3: CDN Origin Shield | Regional origin cache | 30 ngay | ~99% cua requests con lai |
| L4: Origin (S3) | Object storage | Permanent | 100% (source of truth) |
Key insight: Voi 4 layers nay, chi khoang 0.01-0.1% tong request thuc su cham toi origin storage. Day la ly do tai sao map rendering co the phuc vu hang ty request/ngay ma chi can vai chuc origin servers.
Tinh toan cache hit ratio tong the:
Chi 7.5 trong 100,000 requests thuc su phai lay tile tu origin. Day la suc manh cua multi-layer caching.
Progressive Loading
Khi user zoom tu level 10 xuong level 15, ban do khong doi tuc thi — no progressive load:
- Hien thi tiles cu (level 10) dang co trong cache — upscale (phong to, hoi mo)
- Request tiles moi (level 15) tu CDN
- Khi tiles moi ve → fade-in thay the tiles cu → ban do sac net dan
Ky thuat nay goi la “show stale, fetch fresh” — user luon nhin thay gi do thay vi man hinh trang, tao cam giac instant response.
Tile Pre-rendering Pipeline
Map tiles duoc pre-render offline (khong render on-the-fly khi user request):
graph LR RawData["Raw Map Data<br/>(Roads, Buildings,<br/>POIs, Terrain)"] Pipeline["Tile Rendering Pipeline<br/>(Batch processing)"] TileStore["Tile Storage<br/>(S3 / GCS)"] CDN["CDN<br/>Global Distribution"] Client["Client<br/>Mobile / Web"] RawData -->|"Nightly batch +<br/>incremental updates"| Pipeline Pipeline -->|"Pre-rendered tiles<br/>21 zoom levels"| TileStore TileStore -->|"Origin"| CDN CDN -->|"Edge cache"| Client
Khi nao re-render tiles?
- Scheduled: Moi tuan hoac moi thang — cap nhat roads moi, buildings moi
- Incremental: Chi re-render tiles bi anh huong boi thay doi (khong render lai toan bo)
- Priority: Khu vuc dong dan (thanh pho lon) re-render thuong xuyen hon khu vuc hoang vu
3.3 Location Service — Thu thap vi tri tu hang ty device
Bai toan
Moi giay, hang trieu device gui vi tri cua minh (latitude, longitude, timestamp, speed, heading) ve server. Day la write-heavy workload voi volume cuc lon.
Batching Strategy
Gui GPS coordinates moi giay la lang phi (ton battery, ton bandwidth, server khong can do chi tiet). Thay vao do:
| Trang thai user | Tan suat gui | Ly do |
|---|---|---|
| Dang navigate (lai xe) | Batch moi 15 giay (gom 15 data points) | Can do chinh xac cao de cap nhat traffic |
| App mo nhung khong navigate | Batch moi 30 giay (gom 5-10 data points) | Tiet kiem battery, van cap nhat traffic |
| App chay background | Batch moi 5 phut hoac khong gui | Toi uu battery |
| App dong | Khong gui | Khong can |
Payload cua mot batch:
| Field | Type | Size | Mo ta |
|---|---|---|---|
user_id | string | 16 bytes | Anonymized ID |
timestamps[] | int64 array | 8 bytes x N | Unix timestamp moi data point |
latitudes[] | double array | 8 bytes x N | Vi do |
longitudes[] | double array | 8 bytes x N | Kinh do |
speeds[] | float array | 4 bytes x N | Toc do (m/s) |
headings[] | float array | 4 bytes x N | Huong di chuyen (0-360 do) |
accuracy | float | 4 bytes | Do chinh xac GPS (met) |
transport_mode | enum | 1 byte | Car, bike, walk, etc. |
Voi N = 15 data points, payload khoang 700 bytes — rat nhe.
Ingestion Pipeline
graph LR subgraph "Millions of Devices" D1["Device 1"] D2["Device 2"] DN["Device N"] end subgraph "Ingestion Layer" LB["Load Balancer"] LS1["Location Service<br/>Instance 1"] LS2["Location Service<br/>Instance 2"] LSN["Location Service<br/>Instance N"] end subgraph "Message Queue" K["Apache Kafka<br/>Partitioned by<br/>geo-region"] end subgraph "Consumers" C1["Traffic Aggregator<br/>(Real-time)"] C2["Location History<br/>Writer (Batch)"] C3["ML Feature<br/>Pipeline"] end subgraph "Storage" TSDB["Location History<br/>Cassandra / TimescaleDB"] TrafficStore["Traffic Store<br/>Redis + Time-series DB"] end D1 --> LB D2 --> LB DN --> LB LB --> LS1 LB --> LS2 LB --> LSN LS1 --> K LS2 --> K LSN --> K K --> C1 K --> C2 K --> C3 C1 --> TrafficStore C2 --> TSDB
Tai sao dung Kafka? (Tham khao Tuan-08-Message-Queue)
| Ly do | Giai thich |
|---|---|
| Decouple | Location Service chi can ghi vao Kafka — khong can biet ai se doc |
| Buffer | Neu Traffic Aggregator bi cham, Kafka giu data — khong mat |
| Multiple consumers | Cung mot location event duoc doc boi Traffic Aggregator, History Writer, ML Pipeline — fan-out |
| Partitioning | Partition theo geo-region → cac consumer xu ly song song |
| Throughput | Kafka xu ly duoc hang trieu events/giay tren mot cluster — phu hop voi 33M updates/s |
| Durability | Data duoc replicate, khong mat khi node chet |
Location History Storage
Lua chon database: Cassandra hoac TimescaleDB
| Tieu chi | Cassandra | TimescaleDB |
|---|---|---|
| Data model | Wide-column | Relational (PostgreSQL extension) |
| Write throughput | Cuc cao (hang trieu writes/s) | Cao (tram nghin writes/s) |
| Time-series query | Can thiet ke schema can than | Built-in time-series functions |
| Compression | Tot (50-70%) | Rat tot (90%+ voi columnar compression) |
| Scale | Horizontal (them node) | Horizontal (distributed hypertables) |
Schema (Cassandra):
| Column | Type | Mo ta |
|---|---|---|
user_id | UUID | Partition key — phan tan data theo user |
timestamp | TIMESTAMP | Clustering key — sap xep theo thoi gian |
lat | DOUBLE | Vi do |
lng | DOUBLE | Kinh do |
speed | FLOAT | Toc do |
heading | FLOAT | Huong |
accuracy | FLOAT | Do chinh xac GPS |
Partition key = user_id → tat ca location history cua 1 user nam tren cung partition → query “lich su di chuyen cua user X trong 7 ngay qua” chi can doc 1 partition.
TTL (Time-to-Live): Location history chi giu 30-90 ngay → tu dong xoa data cu → tiet kiem storage.
3.4 Navigation/Routing — Tim duong trong me cung
Day la trai tim cua Google Maps — va cung la phan phuc tap nhat.
Road Network la mot Weighted Graph
Hieu, toan bo mang luoi duong giao thong co the bieu dien nhu mot do thi co trong so (weighted graph):
| Khai niem do thi | Tuong ung trong ban do |
|---|---|
| Node (dinh) | Giao lo (intersection), diem uon cua duong |
| Edge (canh) | Doan duong noi 2 giao lo |
| Weight (trong so) | Thoi gian di chuyen (tinh bang giay/phut), KHONG phai khoang cach |
Tai sao weight la thoi gian, khong phai khoang cach? Vi user muon den nhanh nhat, khong phai ngan nhat. Duong cao toc 50km co the nhanh hon duong tat 10km neu duong tat ket xe.
Quy mo cua road graph:
| Thong so | Gia tri uoc tinh |
|---|---|
| So node (intersections) toan cau | ~500 trieu |
| So edge (road segments) toan cau | ~1 ty |
| Average edges per node | ~4 (moi giao lo thuong co 3-5 nhanh) |
| Data size (adjacency list) | ~100 GB (compressed) |
Tien hoa cua Routing Algorithm
Giai doan 1: Dijkstra’s Algorithm
- Tim duong ngan nhat tu 1 diem den tat ca cac diem khac
- Do phuc tap: voi priority queue
- Van de: Voi 500M nodes, Dijkstra duyet qua hang trieu nodes truoc khi tim duoc duong — qua cham cho real-time (co the mat vai giay)
Giai doan 2: A Algorithm*
- Cai tien Dijkstra bang heuristic — uu tien duyet nodes theo huong ve dich
- Heuristic: Khoang cach duong chim bay (haversine distance) tu node hien tai den dich
- Nhanh hon Dijkstra ~2-5x vi loai bot nhieu nodes khong can thiet
- Van de: Van cham khi route dai (Sai Gon → Ha Noi) vi graph qua lon
Giai doan 3: Contraction Hierarchies (CH) — Pre-computation la chia khoa
Day la breakthrough giup Google Maps tim duong trong milliseconds thay vi seconds.
Y tuong cot loi: Pre-compute “shortcuts” trong graph.
| Buoc | Mo ta | Analogy |
|---|---|---|
| 1. Xep hang nodes | Moi node duoc gan mot “importance level” — giao lo lon quan trong hon giao lo nho | Quoc lo quan trong hon hem |
| 2. Contract nodes | Bat dau tu node it quan trong nhat, “loai bo” no va them shortcut edges noi cac node lan can | Loai bo cac hem nho, them shortcut “tu giao lo A den giao lo C, mat 5 phut” |
| 3. Xay hierarchy | Sau khi contract het, ta co mot hierarchy graph — level thap la duong nho, level cao la duong lon | Giong nhu ban do chi co quoc lo + cao toc |
| 4. Query | Tim duong = search tu origin len hierarchy + search tu destination len hierarchy → gap nhau o giua | ”Lat xe ra duong lon gan nhat → di duong lon → re vao duong nho gan dich” |
Hieu qua:
Aha Moment: Pre-computation la ly do Google Maps tra loi route gan nhu tuc thi. Viec contract graph mat vai gio de tinh (chay offline), nhung moi query chi mat < 1 millisecond. Trade-off kinh dien: offline compute time ↔ online query speed.
Nhung pre-computation co van de: Khi traffic thay doi (ket xe), edge weights thay doi → pre-computed shortcuts khong con chinh xac. Giai phap: Routing Tiles (xem ben duoi).
Routing Tiles — Partition Road Graph
Thay vi luu toan bo road graph trong 1 cuc, Google chia graph thanh nhieu routing tiles (tuong tu map tiles nhung cho road data):
| Level | Tile bao phu | Chua gi | Su dung |
|---|---|---|---|
| Level 0 (Local) | ~1km x 1km | Duong nho, hem, ngo | Routing trong pham vi khu pho |
| Level 1 (Regional) | ~10km x 10km | Duong chinh, tinh lo | Routing trong thanh pho |
| Level 2 (Country) | ~100km x 100km | Quoc lo, cao toc | Routing lien tinh |
| Level 3 (Global) | ~1000km x 1000km | Cao toc quoc te | Routing quoc te |
Routing algorithm voi tiles:
- Origin nam trong tile A (local level) → load tile A
- Destination nam trong tile B (local level) → load tile B
- Tim duong trong tile A tu origin den boundary cua tile
- Tim duong giua boundaries bang higher-level tiles (regional/country)
- Tim duong trong tile B tu boundary den destination
- Ghep 3 doan lai → route hoan chinh
Loi ich:
- Chi can load vai routing tiles thay vi toan bo graph → tiet kiem memory
- Co the cache routing tiles tai CDN edge
- Co the cap nhat traffic cho tung tile doc lap (khong can re-compute toan bo)
Routing Algorithm Flow
graph TD Start["User request:<br/>Origin → Destination"] Geocode["Geocode dia chi<br/>(neu can)"] FindTiles["Xac dinh routing tiles<br/>chua Origin va Destination"] LoadTiles["Load routing tiles<br/>(local + regional + country)"] LocalO["Tim duong trong<br/>local tile cua Origin"] Regional["Tim duong qua<br/>regional/country tiles"] LocalD["Tim duong trong<br/>local tile cua Destination"] Merge["Ghep 3 doan<br/>thanh route hoan chinh"] Traffic["Overlay real-time<br/>traffic data"] ETA["Tinh ETA cho route"] AltRoutes["Tinh 2-3 alternative routes"] Response["Tra ve routes + ETA<br/>+ turn-by-turn instructions"] Start --> Geocode Geocode --> FindTiles FindTiles --> LoadTiles LoadTiles --> LocalO LoadTiles --> Regional LoadTiles --> LocalD LocalO --> Merge Regional --> Merge LocalD --> Merge Merge --> Traffic Traffic --> ETA ETA --> AltRoutes AltRoutes --> Response
Real-time Traffic Overlay
Pre-computed routes dua tren static weights (toc do gioi han cua duong). Nhung thuc te, traffic thay doi moi phut. Giai phap:
- Pre-compute routes voi static weights (Contraction Hierarchies)
- Overlay real-time traffic data len route:
- Moi road segment co mot traffic factor (1.0 = thong thoang, 2.0 = cham gap doi, 5.0 = ket nang)
- Nhan static travel time voi traffic factor → adjusted travel time
- Neu traffic lam route hien tai cham hon alternative → re-route (tinh lai route)
Alternative Routes
Google Maps thuong hien thi 2-3 routes. Cach tao alternative routes:
| Phuong phap | Mo ta |
|---|---|
| Penalty method | Sau khi tim route tot nhat, tang weight cua cac edges trong route do, roi tim lai → duoc route khac |
| Plateau method | Tim cac doan duong “plateau” (doan ma route tot nhat va alternative deu di qua) → buoc route alternative di duong khac o phan con lai |
| k-shortest paths | Tim k duong ngan nhat trong graph (thuong k = 3) |
Moi alternative route duoc loc de dam bao:
- Khac biet it nhat 30% so voi route chinh (khong muon 2 routes gan giong nhau)
- ETA khong qua 50% lon hon route nhanh nhat (khong de xuat duong qua cham)
- Khong di qua nhung doan nguy hiem hoac bi cam
3.5 Geocoding & Reverse Geocoding
Geocoding: Dia chi → Toa do
| Input | Output | Vi du |
|---|---|---|
| Dia chi text | Latitude, Longitude | ”227 Nguyen Van Cu, Q5, HCM” → (10.7594, 106.6815) |
Cach hoat dong:
- Parsing: Tach dia chi thanh cac thanh phan (so nha, ten duong, quan, thanh pho)
- Standardization: Chuan hoa viet tat (“Q5” → “Quan 5”, “HCM” → “Ho Chi Minh”)
- Lookup: Tim trong geocoding database (Elasticsearch voi geo-index)
- Interpolation: Neu dia chi chinh xac khong co trong DB → noi suy vi tri giua 2 dia chi da biet
Address Interpolation (Noi suy dia chi):
Neu DB co:
- So 221 Nguyen Van Cu → toa do A
- So 231 Nguyen Van Cu → toa do B
Thi so 227 ≈ vi tri (227 - 221) / (231 - 221) = 60% tu A den B
Reverse Geocoding: Toa do → Dia chi
| Input | Output | Vi du |
|---|---|---|
| Latitude, Longitude | Dia chi text | (10.7594, 106.6815) → “227 Nguyen Van Cu, Quan 5, TP.HCM” |
Cach hoat dong:
- Tim road segment gan nhat voi toa do (nearest-neighbor search bang spatial index — R-tree hoac geohash)
- Project diem len road segment → tim vi tri chinh xac tren duong
- Interpolation nguoc → uoc tinh so nha
- Tra ve dia chi day du
Database: Elasticsearch voi geo_point va geo_shape types, hoac PostgreSQL + PostGIS extension.
3.6 Traffic Data — Mach mau cua ban do
Nguon du lieu traffic
| Nguon | Mo ta | Do tin cay | Latency |
|---|---|---|---|
| User GPS data | Vi tri + toc do cua hang trieu user dang di tren duong | Cao (so luong lon → trung binh chinh xac) | Real-time (30-60s delay) |
| Historical patterns | Du lieu giao thong lich su theo gio/ngay/mua | Cao cho patterns lap lai | N/A (offline) |
| Government sensors | Camera giao thong, vong tu (induction loops) | Rat cao nhung coverage thap | Near real-time |
| Third-party data | Data tu taxi companies, ride-hailing, logistics | Trung binh | Near real-time |
| Events | Tai nan, sua duong, su kien lon, thoi tiet | Bien dong | Reported manually hoac ML-detected |
Real-time Traffic Processing Pipeline
- Ingest: Nhan location updates tu Kafka (33M updates/s peak)
- Map-match: Snap GPS coordinates len road segments (GPS co sai so 5-15m → can xac dinh user dang tren duong nao)
- Speed calculation: Tinh toc do trung binh tren moi road segment trong window 2-5 phut
- Aggregation: Gom nhom theo road segment → tinh toc do trung binh, density
- Traffic level: So sanh toc do thuc te voi toc do gioi han → xac dinh traffic level
| Traffic Factor | Mau | Nghia |
|---|---|---|
| 1.0 - 1.3 | Xanh la | Thong thoang |
| 1.3 - 2.0 | Vang/Cam | Cham |
| 2.0 - 5.0 | Do | Ket xe |
| > 5.0 | Do dam | Ket nang |
Map-Matching — Bai toan khong he don gian
GPS coordinates co sai so (5-15m trong thanh pho, lon hon trong khu vuc nhieu toa nha cao tang). Bai toan: xac dinh user dang tren duong nao?
| Tinh huong kho | Van de |
|---|---|
| 2 duong song song cach 10m | GPS khong du chinh xac de phan biet |
| Duong tren cau va duong ben duoi | Cung toa do lat/lng nhung khac do cao |
| User o giao lo | Dang tren duong nao trong 4 duong? |
| GPS nhay (urban canyon) | Toa nha cao phan xa GPS signal |
Giai phap: Hidden Markov Model (HMM) — xem xet chuoi cac GPS points lien tiep, khong chi 1 point doc lap. Neu 10 points truoc deu tren duong A, thi point hien tai co xac suat cao cung tren duong A (du GPS nhay sang gan duong B).
3.7 Adaptive ETA — Tinh thoi gian den chinh xac
ETA (Estimated Time of Arrival) la metric quan trong nhat cua navigation. User danh gia Google Maps dua tren ETA co chinh xac khong.
3 nguon du lieu cho ETA
| Nguon | Mo ta | Uu diem | Nhuoc diem |
|---|---|---|---|
| Historical data | Trung binh thoi gian di qua road segment X vao thu 2, 8:00 AM | On dinh, phu hop patterns lap lai | Khong biet tai nan bat ngo |
| Real-time traffic | Toc do hien tai tren road segment X | Phan anh tinh hinh ngay luc nay | Chi biet hien tai, khong du doan tuong lai |
| ML Prediction | Model du doan traffic 15-30-60 phut toi | Du doan truoc — ETA tinh cho duong em se di qua trong tuong lai | Can data + compute lon |
ML Model cho ETA Prediction
| Feature | Mo ta |
|---|---|
| Time features | Gio, thu, thang, ngay le, ngay thuong |
| Current traffic | Toc do hien tai tren route va cac road segments xung quanh |
| Historical traffic | Trung binh toc do cua cung gio/thu/mua trong 12 thang qua |
| Weather | Mua, suong mu, bao → anh huong toc do |
| Events | Concert, tran bong da, ho tro → anh huong traffic |
| Road attributes | So lan, toc do gioi han, co traffic light khong |
| Segment sequence | Cac road segments noi tiep nhau tren route → model hoc dependencies |
Output: ETA prediction cho moi road segment tren route → cong lai = tong ETA.
Google bao cao ETA accuracy: sai lech trung binh khoang 5-8% so voi thuc te — cuc ky chinh xac cho mot he thong phuc vu hang ty requests.
ETA Update trong qua trinh navigate
Khi user dang lai xe, ETA duoc cap nhat lien tuc (moi 30-60 giay):
- He thong biet vi tri hien tai cua user (tu location updates)
- Tinh lai ETA cho phan duong con lai (khong phai toan bo route)
- Neu traffic thay doi dang ke → co the re-route (de xuat duong khac)
Capacity Estimation (Uoc luong dung luong)
Assumptions (Gia thiet)
| Thong so | Gia tri | Ghi chu |
|---|---|---|
| DAU | 1 Billion (1B) | Google Maps scale |
| Users dong thoi gui location updates | 10% DAU = 100M | Khong phai ai cung mo app cung luc |
| Location update interval | 30 giay/batch | Trung binh (navigate: 15s, idle: 30s) |
| Navigation requests/user/day | 2 | Trung binh (di lam + ve nha) |
| Map tile requests/user/session | 50 tiles | Scroll + zoom trung binh |
| Sessions/user/day | 3 | Mo app 3 lan/ngay |
| Peak multiplier | 3x | Gio cao diem sang + chieu |
Location Update QPS
Luu y: Con so 33M/s (1B users / 30s) la theoretical maximum neu tat ca user gui cung luc. Thuc te, chi 10% DAU active tai mot thoi diem → 10M peak.
Bandwidth cho location updates:
Con so 56 Gbps la rat lon — can nhieu server va Kafka cluster de xu ly.
Navigation QPS
Map Tile QPS
Nhung nho multi-layer caching (99.99% cache hit), chi ~500/s thuc su cham origin. CDN xu ly phan lon.
Storage Estimation
Map Tile Storage
Nhung khong phai tat ca tiles deu co data (70% Trai Dat la bien, sa mac, rung khong can tiles chi tiet):
Location History Storage
Road Graph Storage
Road graph chi 80 GB — du nho de load vao RAM cua mot cluster nho. Day la ly do routing co the rat nhanh.
Tong hop Storage
| Data | Size | Ghi chu |
|---|---|---|
| Map tiles (all zoom levels) | ~2.3 PB | Lon nhat, luu tren object storage + CDN |
| Location history (30 ngay) | ~430 TB | Time-series, TTL 30 ngay |
| Road graph + CH | ~80 GB | Nho, co the in-memory |
| Geocoding index | ~50 GB | Elasticsearch cluster |
| Traffic data (7 ngay) | ~10 TB | Time-series, TTL 7 ngay |
| Tong | ~2.8 PB | Map tiles chiem ~80% |
Security — Bao ve du lieu vi tri
4.1 Location Data Privacy — Van de nghiem trong nhat
Vi tri cua user la du lieu nhay cam bac nhat — biet vi tri la biet nha, cho lam viec, thoi quen di lai, moi quan he ca nhan.
| Moi de doa | Mo ta | Giai phap |
|---|---|---|
| Data breach | Hacker lay duoc location history cua user | Encryption at rest (AES-256), encryption in transit (TLS 1.3) |
| Internal abuse | Nhan vien truy cap data user | Access control nghiem ngat, audit log, least-privilege, data masking |
| Government request | Chinh phu yeu cau data vi tri | Legal team review, chi cung cap khi co lenh toa, minimization (chi cung cap data can thiet) |
| Third-party sharing | Ban data vi tri cho advertiser | Anonymization, aggregation (chi chia se aggregate traffic, khong chia se individual location) |
4.2 Anonymization of Traffic Data
Khi su dung GPS data de tinh traffic, can anonymize de khong the truy nguon ve individual user:
| Ky thuat | Mo ta |
|---|---|
| K-anonymity | Chi hien thi traffic data khi co it nhat K users tren road segment (thuong K >= 5) — khong the identify individual |
| Differential privacy | Them noise vao aggregate data — ket qua tong the van chinh xac nhung khong the suy ra individual |
| Data aggregation | Chi luu aggregate speed/density per road segment, khong luu “user X di qua diem Y luc Z” |
| Ephemeral IDs | Thay user_id bang random session ID, rotate moi session → khong lien ket duoc sessions |
| Geo-fencing sensitive areas | Khong thu thap chi tiet location gan benh vien, tru so ton giao, co so quan su |
4.3 Fake GPS Injection Prevention
Mot so user dung app gia lap GPS (mock location) de gian lan (vi du: tai xe Grab gia vi tri). Cach ngan chan:
| Ky thuat | Mo ta |
|---|---|
| Sensor fusion | So sanh GPS voi accelerometer, gyroscope, cell tower triangulation — neu GPS noi user dang chay 60 km/h nhung accelerometer khong rung → fake |
| Movement pattern analysis | GPS teleport (vi tri nhay dot ngot 10km) → flag suspicious |
| Speed consistency | Toc do thay doi qua nhanh (0 → 200 km/h trong 1 giay) → khong hop ly |
| Cell tower cross-reference | Vi tri GPS phai gan cell tower dang ket noi |
| Device attestation | Android SafetyNet / iOS DeviceCheck — xac minh device khong bi root/jailbreak |
4.4 Rate Limiting
| Endpoint | Rate Limit | Ly do |
|---|---|---|
| Location update | 10 requests/phut/user | Ngan spam location (1 batch moi 15-30s la du) |
| Navigation request | 30 requests/phut/user | Ngan automated scraping routes |
| Tile request | 1000 requests/phut/user | Ngan bot download toan bo map data |
| Geocoding | 60 requests/phut/user | Ngan batch geocoding (dich vu tra phi rieng) |
Tham khao Tuan-09-Rate-Limiter cho chi tiet implementation.
4.5 Data Retention Policies
| Du lieu | Retention | Ly do |
|---|---|---|
| Raw GPS location | 30 ngay | Chi can cho traffic analysis, sau do aggregate va xoa raw |
| Aggregated traffic | 2 nam | Historical patterns cho ML model |
| User search history | 90 ngay (hoac user xoa) | Personalization, GDPR right to delete |
| Navigation history | 90 ngay (hoac user xoa) | Trip history feature |
| Map tiles | Vinh vien (cap nhat theo version) | Core asset |
| Anonymized analytics | 5 nam | Business intelligence |
4.6 Compliance
| Regulation | Yeu cau | Anh huong |
|---|---|---|
| GDPR (EU) | Right to access, right to delete, data minimization | User co the download hoac xoa tat ca location data |
| CCPA (California) | Tuong tu GDPR cho California residents | Opt-out of data sale |
| LGPD (Brazil) | Tuong tu GDPR cho Brazil | Can consent ro rang |
| PDPA (Vietnam/Thailand) | Luat bao ve du lieu ca nhan | Can consent, data localization co the yeu cau |
DevOps — Monitoring & Observability
Tham khao Tuan-13-Monitoring-Observability cho nen tang. Duoi day la cac metrics dac thu cho Google Maps.
5.1 Location Ingestion Pipeline Monitoring
| Metric | Muc tieu | Alert khi |
|---|---|---|
| Kafka consumer lag | < 10 giay | > 30 giay (data cu qua, traffic khong real-time) |
| Location update throughput | 3-10M events/s | < 1M events/s (co van de o ingestion) hoac > 15M events/s (bat thuong, co the la attack) |
| Failed location updates | < 0.1% | > 1% |
| Map-matching accuracy | > 95% GPS points matched thanh cong | < 90% |
| Kafka partition skew | Even distribution | Mot partition co > 2x trung binh |
5.2 Routing Performance Monitoring
| Metric | Muc tieu | Alert khi |
|---|---|---|
| Route calculation latency (p50) | < 200ms | > 500ms |
| Route calculation latency (p99) | < 2s | > 5s |
| Routing success rate | > 99.9% | < 99.5% (khong tim duoc route) |
| Contraction Hierarchies freshness | < 24 gio tu lan rebuild cuoi | > 48 gio |
| Routing tile cache hit ratio | > 90% | < 80% |
| Alternative routes generated | >= 2 routes cho 95% requests | < 2 routes cho > 10% requests |
5.3 Tile CDN Monitoring
| Metric | Muc tieu | Alert khi |
|---|---|---|
| CDN cache hit ratio | > 99% | < 95% (cache invalidation sai?) |
| Tile latency (CDN edge, p50) | < 50ms | > 100ms |
| Tile latency (origin, p50) | < 200ms | > 500ms |
| 404 rate (missing tiles) | < 0.01% | > 0.1% (tile generation pipeline bi loi) |
| Tile freshness | < 7 ngay tu lan render cuoi | > 30 ngay cho khu vuc thanh pho |
| Bandwidth cost | Budget-based | > 120% budget |
5.4 ETA Accuracy Monitoring
| Metric | Muc tieu | Alert khi |
|---|---|---|
| ETA error (median) | < 8% | > 15% |
| ETA error (p95) | < 25% | > 40% |
| ETA under-estimate rate | < 30% | > 50% (user den tre → trai nghiem xau) |
| ML model staleness | < 24 gio tu lan retrain cuoi | > 48 gio |
5.5 Overall System Health Dashboard
| Category | Key Metrics |
|---|---|
| Availability | Service uptime per component, error rate (5xx), circuit breaker trips |
| Latency | p50/p95/p99 cho moi service (location, routing, tiles, geocoding) |
| Throughput | QPS cho moi service, Kafka throughput, CDN bandwidth |
| Saturation | CPU/Memory/Disk cho moi cluster, Kafka partition fill rate |
| Business metrics | DAU, navigation starts/completions, search queries, user ratings |
5.6 Alerting Hierarchy
| Level | Mo ta | Vi du | Response time |
|---|---|---|---|
| P0 — Critical | Core feature bi down | Navigation khong hoat dong, Kafka cluster down | < 5 phut |
| P1 — High | Performance degradation nghiem trong | Routing latency p99 > 10s, ETA sai > 30% | < 15 phut |
| P2 — Medium | Degradation nhe | CDN cache hit < 95%, location lag > 1 phut | < 1 gio |
| P3 — Low | Cosmetic hoac non-critical | Tile render pipeline cham, analytics delay | < 24 gio |
Mermaid Diagrams — Tong hop
Overall System Architecture (Chi tiet)
graph TB subgraph "Client Layer" Mobile["Mobile Client<br/>iOS / Android"] Web["Web Client<br/>Browser"] end subgraph "Edge Layer" CDN["Tile CDN<br/>Global Edge PoPs<br/>99%+ cache hit"] APIGW["API Gateway<br/>Auth · Rate Limit · Route"] end subgraph "Application Layer" direction TB LocSvc["Location Service<br/>Write: 10M/s peak"] NavSvc["Navigation Service<br/>Read: 70K/s peak"] MapSvc["Map Rendering<br/>Tile metadata"] GeoSvc["Geocoding Service<br/>Address ↔ LatLng"] TrafficSvc["Traffic Aggregator<br/>Stream processing"] ETASvc["ETA Service<br/>ML-powered"] end subgraph "Message Queue" Kafka["Apache Kafka<br/>Location Events<br/>Partitioned by geo-region"] end subgraph "Storage Layer" S3["Object Storage (S3)<br/>Map Tiles — 2.3 PB"] Cassandra["Cassandra<br/>Location History — 430 TB"] TSDB["Time-series DB<br/>Traffic Data — 10 TB"] Graph["Road Graph (In-memory)<br/>+ Contraction Hierarchies<br/>80 GB"] ES["Elasticsearch<br/>Geocoding Index<br/>50 GB"] Redis["Redis Cluster<br/>Real-time traffic cache"] end subgraph "ML Platform" ETAModel["ETA Prediction Model"] TrafficModel["Traffic Prediction Model"] FeatureStore["Feature Store"] end Mobile -->|"Tile requests"| CDN Web -->|"Tile requests"| CDN Mobile --> APIGW Web --> APIGW CDN -->|"Cache miss"| S3 APIGW --> LocSvc APIGW --> NavSvc APIGW --> GeoSvc LocSvc --> Kafka Kafka --> TrafficSvc Kafka --> Cassandra TrafficSvc --> TSDB TrafficSvc --> Redis TrafficSvc --> FeatureStore NavSvc --> Graph NavSvc --> ETASvc NavSvc --> GeoSvc ETASvc --> Redis ETASvc --> TSDB ETASvc --> ETAModel ETAModel --> FeatureStore TrafficModel --> FeatureStore GeoSvc --> ES MapSvc --> S3
Map Tile Pyramid (Visual)
graph TD subgraph "Tile Pyramid" L0["Level 0: 1 tile<br/>🌍 Toan cau<br/>~100 KB"] L5["Level 5: 1,024 tiles<br/>Cac quoc gia<br/>~10 MB total"] L10["Level 10: ~1M tiles<br/>Thanh pho<br/>~15 GB total"] L15["Level 15: ~1B tiles<br/>Duong pho<br/>~15 TB total"] L20["Level 20: ~1T tiles<br/>Chi tiet nha<br/>~2.3 PB total"] end L0 -->|"x4 moi level"| L5 L5 -->|"x1024"| L10 L10 -->|"x1024"| L15 L15 -->|"x1024"| L20 subgraph "Caching Strategy" Client["Client Cache<br/>~70-80% hit"] Edge["CDN Edge<br/>~97% hit"] Origin["Origin S3<br/>Source of truth"] end L20 -.->|"Served via"| Origin Origin -->|"Cached at"| Edge Edge -->|"Cached at"| Client
Location Update Pipeline (Chi tiet)
sequenceDiagram participant Device as Mobile Device participant LB as Load Balancer participant LS as Location Service participant K as Kafka participant TA as Traffic Aggregator participant HW as History Writer participant ML as ML Pipeline participant Redis as Redis (Traffic Cache) participant Cass as Cassandra (History) Note over Device: Thu thap GPS moi 1-3s Note over Device: Batch 15 data points Device->>LB: POST /location/batch (700 bytes) LB->>LS: Forward request LS->>LS: Validate + enrich data LS->>K: Publish to "location-events" topic LS-->>Device: 202 Accepted par Consumer Group 1 K->>TA: Consume location events TA->>TA: Map-match GPS → road segments TA->>TA: Calculate speed per segment TA->>Redis: Update real-time traffic and Consumer Group 2 K->>HW: Consume location events HW->>Cass: Write location history (TTL 30d) and Consumer Group 3 K->>ML: Consume for feature engineering ML->>ML: Update feature store end
Navigation Request Flow (Chi tiet)
sequenceDiagram participant Client as Mobile Client participant APIGW as API Gateway participant Nav as Navigation Service participant Geo as Geocoding Service participant Graph as Road Graph (Memory) participant ETA as ETA Service participant Traffic as Traffic Cache (Redis) participant MLModel as ML Model Client->>APIGW: GET /route?origin=A&dest=B&mode=driving APIGW->>APIGW: Auth + Rate Limit APIGW->>Nav: Forward request Nav->>Geo: Geocode "A" → (lat1, lng1) Nav->>Geo: Geocode "B" → (lat2, lng2) Geo-->>Nav: Coordinates Nav->>Graph: Find route via Contraction Hierarchies Graph-->>Nav: Primary route (list of road segments) Nav->>Graph: Find 2 alternative routes Graph-->>Nav: Alt routes loop For each route Nav->>ETA: Calculate ETA for route ETA->>Traffic: Get real-time traffic per segment Traffic-->>ETA: Traffic factors ETA->>MLModel: Predict segment travel times MLModel-->>ETA: Predicted times ETA->>ETA: Combine: historical + real-time + ML ETA-->>Nav: ETA + traffic-adjusted times end Nav->>Nav: Rank routes by ETA Nav->>Nav: Generate turn-by-turn instructions Nav-->>APIGW: 3 routes + ETAs + instructions APIGW-->>Client: Response (< 3s total)
Aha Moments & Pitfalls
Aha Moments — Nhung insight quan trong
| # | Insight | Giai thich |
|---|---|---|
| 1 | Pre-computation la chia khoa cua routing | Contraction Hierarchies mat vai gio de build offline, nhung moi query chi mat < 1ms. Day la trade-off kinh dien: offline compute ↔ online speed. Khong co pre-computation, Google Maps khong the phuc vu 70K routing requests/s. |
| 2 | Tile caching tiet kiem 99%+ compute | Nho multi-layer caching (client → CDN edge → origin shield → S3), chi 0.01% requests cham origin. Map rendering service thuc ra chi phuc vu pre-rendering pipeline, khong phai user requests truc tiep. |
| 3 | Location data volume la massive | 10M updates/s, 14 TB/ngay, 430 TB/thang — chi rieng location data. Can Kafka de buffer, Cassandra/TimescaleDB de luu, va TTL de tu dong xoa data cu. Khong the dung traditional RDBMS. |
| 4 | Real-time traffic doi hoi stream processing | Traffic data phai duoc xu ly trong giay, khong phai phut. Can Kafka + stream processing framework (Flink/Spark Streaming). Batch processing (chay moi 5 phut) la qua cham. |
| 5 | ETA accuracy can ML | Simple formula (khoang cach / toc do gioi han) cho ETA sai 30-50%. Can combine historical patterns + real-time traffic + ML prediction de dat sai so < 10%. |
| 6 | Road graph nho bat ngo | Toan bo road graph toan cau chi ~80 GB — du nho de load vao RAM. Day la ly do routing cuc nhanh: khong can disk I/O. |
| 7 | Map-matching khong don gian | GPS co sai so 5-15m. O thanh pho (nhieu toa nha), sai so con lon hon. Can HMM (Hidden Markov Model) de xac dinh user dang tren duong nao — chi GPS point don le la khong du. |
| 8 | Vector tiles la tuong lai | Nho hon 3-5x so voi raster, smooth zoom, dynamic styling (dark mode) — tat ca cac map providers lon deu chuyen sang vector tiles. Client can GPU nhung smartphones hien dai du manh. |
Common Pitfalls — Cac loi thuong gap khi thiet ke
| # | Pitfall | Tai sao sai | Cach dung |
|---|---|---|---|
| 1 | Tinh route on-the-fly bang Dijkstra | O(V log V) voi 500M nodes = vai giay/request. Khong scale voi 70K QPS | Dung Contraction Hierarchies (pre-computed), query < 1ms |
| 2 | Render tiles on-the-fly | 5M tile requests/s peak — khong server farm nao render kip | Pre-render offline + CDN caching |
| 3 | Gui GPS moi giay tu client | Ton battery, ton bandwidth, server bi overwhelm | Batch moi 15-30 giay, 15 data points/batch |
| 4 | Dung RDBMS cho location history | 14 TB/ngay writes, 10M writes/s — PostgreSQL khong chiu noi | Dung Cassandra hoac TimescaleDB (write-optimized) |
| 5 | Bo qua map-matching | GPS “227 Nguyen Van Cu” nhung thuc te user dang tren duong ben canh → traffic data sai | Dung HMM-based map-matching truoc khi xu ly traffic |
| 6 | Khong partition road graph | Load 80 GB graph moi lan routing request → cham, ton memory | Dung routing tiles — chi load tiles lien quan |
| 7 | ETA chi dung toc do hien tai | Traffic se thay doi trong 30 phut toi — route dai can du doan tuong lai | Combine historical + real-time + ML prediction |
| 8 | Luu raw GPS data vinh vien | 430 TB/thang → 5 PB/nam → khong the luu mai | TTL 30-90 ngay, aggregate roi xoa raw |
| 9 | Khong anonymize traffic data | Vi pham privacy — co the truy nguon user tu traffic patterns | K-anonymity, differential privacy, ephemeral IDs |
| 10 | Khong co offline support | User mat signal trong ham, o nong thon → ban do trang | Cho phep download tiles + routing data cho offline use |
So sanh voi cac bai khac da hoc
| Khai niem | O Google Maps | O bai da hoc |
|---|---|---|
| CDN | Tile CDN — 99%+ cache hit | Tuan-03-Networking-DNS-CDN — ly thuyet CDN |
| Message Queue | Kafka cho location ingestion — 10M events/s, fan-out 3 consumers | Tuan-08-Message-Queue — ly thuyet message queue |
| Stream Processing | Traffic aggregation real-time tu location events | Chua hoc chi tiet — nhung la extension cua Tuan-08-Message-Queue |
| Geo-spatial | Road graph, routing tiles, map tiles, geocoding | Case-Design-Proximity-Service — geohash, spatial index |
| Pre-computation | Contraction Hierarchies, tile pre-rendering | Khai niem moi — rat quan trong cho interview |
| Time-series data | Location history, traffic data | Tuong tu log data trong Tuan-13-Monitoring-Observability |
| Privacy | Location anonymization, GDPR compliance | Tuan-15-Data-Security-Encryption — encryption, data protection |
Cau hoi tu kiem tra
Level 1 — Recall (Nho lai)
- Google Maps co may zoom level? Moi level tang bao nhieu tiles so voi level truoc?
- Ba service chinh cua Google Maps la gi?
- Tai sao weight cua road graph la thoi gian, khong phai khoang cach?
- Contraction Hierarchies pre-compute gi?
- Tan suat gui location update khi user dang navigate la bao nhieu?
Level 2 — Understanding (Hieu)
- Giai thich tai sao vector tiles tot hon raster tiles cho mobile.
- Tai sao can multi-layer caching (client → CDN → origin) cho map tiles?
- Map-matching la gi va tai sao can dung HMM thay vi nearest-road?
- Tai sao ETA can ca 3 nguon (historical + real-time + ML) thay vi chi 1?
- Tai sao location history dung Cassandra thay vi PostgreSQL?
Level 3 — Application (Van dung)
- Neu CDN cache hit giam tu 99% xuong 90%, anh huong gi den origin? Tinh QPS moi o origin.
- Neu GPS accuracy giam (do urban canyon), he thong nao bi anh huong va cach xu ly?
- Thiet ke data retention policy cho mot ung dung ride-hailing (Grab) su dung he thong nay.
- Neu can ho tro “avoid toll roads” va “avoid highways”, thay doi gi trong routing?
- Tinh storage can thiet neu tang retention tu 30 ngay len 90 ngay cho location history.
Tong ket — What to Remember for Interview
1-minute Pitch (Neu chi co 1 phut)
“Google Maps gom 3 services chinh: Location Service (thu thap GPS tu users qua Kafka, 10M events/s), Map Rendering (pre-rendered tiles phuc vu qua CDN, 99%+ cache hit), va Navigation (road graph voi Contraction Hierarchies cho < 1ms query, overlay real-time traffic, ML-powered ETA). Key insight: pre-computation o ca map tiles lan routing la ly do he thong co the phuc vu 1B DAU.”
Checklist khi ve diagram
- Client → CDN (tiles) va Client → API Gateway (navigation, location)
- Location Service → Kafka → 3 consumers (traffic, history, ML)
- Navigation Service → Road Graph (in-memory) + ETA Service
- ETA Service → Traffic Cache (Redis) + ML Model
- Multi-layer caching cho tiles (Client → CDN → Origin Shield → S3)
- Geocoding Service (Elasticsearch) cho address ↔ coordinates
3 dieu interviewer muon nghe
- Pre-computation: “Contraction Hierarchies giup routing query < 1ms thay vi vai giay. Tiles duoc pre-render offline va cache o CDN.”
- Scale numbers: “10M location updates/s, 70K routing requests/s, 5M tile requests/s (nhung 99%+ served by CDN).”
- Real-time traffic: “User GPS data → Kafka → stream processing → real-time traffic overlay → adaptive ETA voi ML.”
Hieu, bai nay dai nhung moi phan deu quan trong. Google Maps la mot trong nhung he thong phuc tap nhat — no ket hop CDN, stream processing, graph algorithms, ML, va geo-spatial data. Hieu bai nay, em se co nen tang vung cho bat ky bai toan location-based nao trong interview.
Next: Case-Design-Proximity-Service — Proximity Service (geohash, quadtree) la nen tang cho location-based search ma Google Maps su dung.