Case Study: Design Google Maps

“Ngày xua, muon di tu Sai Gon ra Ha Noi, ong ba phai trai ban do giay ra san, dung ngon tay do theo quoc lo 1A. Hom nay, mot cai cham vao man hinh — Google Maps tinh cho em duong nhanh nhat trong 0.5 giay. Dang sau su ‘don gian’ do la mot trong nhung he thong phuc tap nhat the gioi.”

Tags: system-design case-study google-maps navigation geo alex-xu-vol2 Student: Hieu Prerequisite: Tuan-02-Back-of-the-envelope · Tuan-03-Networking-DNS-CDN · Tuan-08-Message-Queue Lien quan: Case-Design-Proximity-Service · Tuan-07-Database-Sharding-Replication · Tuan-06-Cache-Strategy · Tuan-13-Monitoring-Observability · Tuan-14-AuthN-AuthZ-Security · Tuan-15-Data-Security-Encryption Reference: Alex Xu, System Design Interview Volume 2 — Chapter 2: Nearby Friends / Google Maps


Context & Why — Tu ban do giay den ban do so thong minh

Analogy: Me cung thanh pho

Hieu, hay tuong tuong em dung giua mot me cung khong lo — me cung do chinh la mang luoi duong pho cua mot thanh pho. Ban do giay ngay xua giong nhu em cam mot tam anh chup tu tren cao, roi tu minh do duong bang mat. Van de la:

  • Ban do giay khong biet traffic — em khong biet duong nao dang ket xe
  • Ban do giay khong cap nhat — con duong moi xay xong thang truoc chua co tren ban do
  • Ban do giay khong tinh duoc ETA — em chi uoc luong bang cam giac
  • Ban do giay khong zoom duoc — muon xem chi tiet phai mua ban do ty le khac

Google Maps giai quyet tat ca nhung van de nay bang cach so hoa toan bo ban do the gioi, chia nho thanh hang ty tile nho, va su dung algorithm de tim duong nhanh nhat trong thoi gian thuc.

Tai sao bai nay quan trong?

Khia canhLy do
Geo-spatial dataDay la nền tang cho moi ung dung location-based (Grab, Uber, delivery apps)
Scale khong lo1 billion DAU, hang chuc trieu navigation requests/ngay
Real-time processingTraffic data thay doi moi giay, ETA phai cap nhat lien tuc
Pre-computationKhong the tinh routing on-the-fly cho moi request — can pre-compute
CDN & CachingMap tiles la bai toan CDN kinh dien — Tuan-03-Networking-DNS-CDN
Stream processingLocation updates tu hang ty device — Tuan-08-Message-Queue

Step 1 — Understand the Problem & Establish Design Scope

1.1 Clarifying Questions (Cau hoi lam ro)

Cau hoiTra loiGhi chu
He thong can ho tro nhung feature chinh nao?Map rendering, navigation (routing + ETA), location update3 pillars chinh
Scale bao lon?~1 Billion DAUQuy mo Google-level
Can ho tro real-time traffic?Co — traffic overlay tren map va ETA tinh toanStream processing
Map data tu dau?Satellite imagery, street-level photography, user contributions, third-party dataKhong thiet ke phan thu thap, chi dung data co san
Ho tro bao nhieu loai phuong tien?Xe hoi, xe may, di bo, xe dap, transit (bus/tau)Moi loai co routing algorithm rieng
Offline support?Co — download map tiles cho mot regionMobile-first concern
Turn-by-turn navigation?Co — real-time voice guidanceStreaming direction updates
Search (geocoding)?Co — tim dia diem bang ten, dia chiGeocoding + reverse geocoding

1.2 Functional Requirements (Yeu cau chuc nang)

  • FR1: Map Rendering — hien thi ban do tai bat ky vi tri nao tren the gioi, ho tro zoom in/out (21 zoom levels)
  • FR2: Navigation/Routing — tim duong di ngan nhat hoac nhanh nhat giua 2 diem (hoac nhieu diem), tinh ETA
  • FR3: Location Update — nhan va luu vi tri hien tai cua user (GPS coordinates) de phuc vu traffic va personalization
  • FR4: Geocoding — chuyen doi giua dia chi text va toa do (lat/lng) va nguoc lai
  • FR5: Traffic Overlay — hien thi tinh trang giao thong real-time tren ban do (xanh/vang/do)
  • FR6: Alternative Routes — de xuat 2-3 tuyen duong thay the voi ETA khac nhau
  • FR7: Turn-by-turn Navigation — huong dan chi tiet tung buoc trong qua trinh di chuyen

1.3 Non-Functional Requirements (Yeu cau phi chuc nang)

Yeu cauMuc tieuGiai thich
Availability99.99%Navigation la safety-critical — user dang lai xe tren cao toc
Latency (Map)Tile load < 200msUser ky vong ban do hien thi gan nhu tuc thi khi scroll/zoom
Latency (Routing)Route calculation < 3sBao gom tinh toan tren server + network round trip
Scalability1B DAU, 33M location updates/s peakXem phan Estimation ben duoi
Accuracy (ETA)Sai lech < 10% so voi thuc teETA sai = user mat tin tuong
Freshness (Traffic)Cap nhat moi 30-60 giayTraffic cu 5 phut la vo nghia
Offline SupportDownload map tiles cho offline useMobile users o vung yeu signal
Data Durability99.999999999% (eleven 9s)Map data la tai san core — mat la mat tat ca

Step 2 — High-Level Design

2.1 Ba service chinh

Google Maps co the chia thanh 3 service chinh (theo Alex Xu):

ServiceNhiem vuInputOutput
Location ServiceThu thap va luu vi tri cua userGPS coordinates tu mobile clientLocation history, real-time traffic data
Navigation ServiceTim duong va tinh ETAOrigin + Destination + preferencesRoute (list of coordinates), ETA, turn-by-turn instructions
Map Rendering ServicePhuc vu map tiles cho client hien thiTile coordinates (zoom level, x, y)Image tiles hoac vector tiles

Ngoai ra con co cac supporting services:

ServiceNhiem vu
Geocoding ServiceChuyen doi dia chi > toa do
Traffic ServiceXu ly va aggregation traffic data real-time
ETA ServiceTinh toan ETA dua tren historical + real-time + ML
Search/POI ServiceTim kiem dia diem (Points of Interest)

2.2 High-Level Architecture Diagram

graph TB
    subgraph Clients
        MobileApp["Mobile App<br/>(iOS/Android)"]
        WebApp["Web Browser"]
    end

    subgraph "API Gateway & Load Balancer"
        APIGW["API Gateway<br/>Rate Limiting, Auth, Routing"]
    end

    subgraph "Core Services"
        LocSvc["Location Service<br/>(Write-heavy)"]
        NavSvc["Navigation Service<br/>(Compute-heavy)"]
        MapSvc["Map Rendering Service<br/>(Read-heavy)"]
        GeoSvc["Geocoding Service"]
        TrafficSvc["Traffic Service"]
        ETASvc["ETA Service"]
        SearchSvc["Search/POI Service"]
    end

    subgraph "Data Layer"
        Kafka["Apache Kafka<br/>Location Stream"]
        LocDB["Location History DB<br/>(Cassandra/TimescaleDB)"]
        RoadGraph["Road Graph DB<br/>(Adjacency list, pre-computed)"]
        TileStorage["Tile Storage<br/>(Object Storage — S3)"]
        GeoDB["Geocoding DB<br/>(Elasticsearch)"]
        TrafficDB["Traffic DB<br/>(Time-series)"]
    end

    subgraph "CDN Layer"
        CDN["Tile CDN<br/>(CloudFront/Akamai)<br/>Global Edge PoPs"]
    end

    subgraph "ML Pipeline"
        MLPipeline["ML Model<br/>ETA Prediction<br/>Traffic Prediction"]
    end

    MobileApp --> APIGW
    WebApp --> APIGW
    MobileApp -->|"Tile requests"| CDN

    APIGW --> LocSvc
    APIGW --> NavSvc
    APIGW --> MapSvc
    APIGW --> GeoSvc
    APIGW --> SearchSvc

    LocSvc -->|"Publish location"| Kafka
    Kafka --> TrafficSvc
    Kafka --> LocDB
    TrafficSvc --> TrafficDB
    TrafficSvc --> MLPipeline

    NavSvc --> RoadGraph
    NavSvc --> ETASvc
    ETASvc --> TrafficDB
    ETASvc --> MLPipeline

    MapSvc --> TileStorage
    TileStorage --> CDN

    GeoSvc --> GeoDB
    SearchSvc --> GeoDB

2.3 Request Flow — 3 luong chinh

Luong 1: Map Rendering (User mo app, xem ban do)

  1. Client tinh toan can tiles nao dua tren viewport (vi tri + zoom level)
  2. Kiem tra local cache (mobile) hoac browser cache (web)
  3. Neu cache miss → request toi CDN
  4. Neu CDN cache miss → request toi Map Rendering Service → lay tu Tile Storage (S3)
  5. CDN cache tile → tra ve client
  6. Client render tiles ghep lai thanh ban do hoan chinh

Luong 2: Navigation (User bam “Chi duong”)

  1. Client gui origin (vi tri hien tai) + destination + preferences (xe hoi/di bo/…)
  2. API Gateway route toi Navigation Service
  3. Navigation Service goi Geocoding Service (neu user nhap dia chi text)
  4. Navigation Service query Road Graph DB de tim route
  5. Navigation Service goi ETA Service de tinh thoi gian cho moi route
  6. ETA Service lay traffic data real-time + historical patterns + ML prediction
  7. Tra ve 2-3 routes voi ETA, khoang cach, turn-by-turn instructions

Luong 3: Location Update (Background, lien tuc)

  1. Mobile client thu thap GPS coordinates (moi 1-3 giay khi navigating, moi 15-30 giay khi idle)
  2. Client batch nhieu data points lai → gui 1 request moi 15-30 giay (tiet kiem battery + bandwidth)
  3. Location Service nhan batch → publish len Kafka
  4. Kafka consumers: (a) Luu vao Location History DB, (b) Feed vao Traffic Service de tinh real-time traffic

Step 3 — Deep Dive

3.1 Map Tiles — Chia ban do thanh mien ghep hinh

Khai niem Tile Pyramid

Hieu, hay tuong tuong em co mot buc anh ve tinh chup toan bo Trai Dat. Buc anh nay co do phan giai cuc ky lon — neu in ra se to bang mot san bong da. Hien thi nguyen buc anh nay tren dien thoai la bat kha thi (qua nang, qua lon). Giai phap: cat nho buc anh thanh nhieu manh vuong (tiles), va chi load nhung manh ma user dang nhin thay.

21 Zoom Levels (tu 0 den 20):

Zoom LevelSo tilesMoi tile bao phuSu dung
01 tile (1x1)Toan bo the gioiOverview toan cau
14 tiles (2x2)1/4 the gioiNhin cac luc dia
216 tiles (4x4)1/16 the gioiNhin cac quoc gia lon
364 tiles (8x8)Nhin nhom quoc gia
10~1 trieu tiles~150m x 150mNhin thanh pho
15~1 ty tiles~5m x 5mNhin khu pho, duong
20~1 ngan ty tiles~0.15m x 0.15mNhin chi tiet nha, via he

Cong thuc tinh so tiles tai moi zoom level:

Ví du:

Tile Pyramid Diagram

graph TD
    subgraph "Tile Pyramid — Zoom Levels"
        Z0["Zoom 0<br/>1 tile<br/>Toan bo the gioi"]
        Z1["Zoom 1<br/>4 tiles (2x2)<br/>Cac luc dia"]
        Z2["Zoom 2<br/>16 tiles (4x4)<br/>Cac quoc gia"]
        Z10["Zoom 10<br/>~1M tiles<br/>Thanh pho"]
        Z15["Zoom 15<br/>~1B tiles<br/>Duong pho"]
        Z20["Zoom 20<br/>~1T tiles<br/>Chi tiet nha"]
    end

    Z0 -->|"Moi tile chia thanh 4"| Z1
    Z1 -->|"Moi tile chia thanh 4"| Z2
    Z2 -->|"..."| Z10
    Z10 -->|"..."| Z15
    Z15 -->|"..."| Z20

    style Z0 fill:#e1f5fe
    style Z1 fill:#b3e5fc
    style Z2 fill:#81d4fa
    style Z10 fill:#4fc3f7
    style Z15 fill:#29b6f6
    style Z20 fill:#0288d1

Vector Tiles vs Raster Tiles

Dac diemRaster TilesVector Tiles
FormatPNG/JPEG imageProtocol Buffers (binary data)
Kich thuoc20-100 KB/tile5-30 KB/tile (nho hon 3-5x)
RenderingServer render san → client chi hien thiClient render tu data (GPU-accelerated)
ZoomMo khi zoom giua 2 level (pixelated)Muot ma o moi muc zoom (smooth)
CustomizationKhong — anh da render co dinhCo — client tu style (dark mode, highlight roads, etc.)
Offline sizeLonNho hon nhieu
CPU clientThap (chi hien anh)Cao hon (phai render)
BandwidthLon honTiet kiem hon

Xu huong hien tai: Google Maps, Apple Maps, Mapbox deu chuyen sang vector tiles. Ly do chinh: bandwidth tiet kiem + smooth zoom + dynamic styling (dark mode khi lai xe ban dem).

Tai sao vector tiles thang? Hieu, hay nghi the nay: raster tile giong nhu em chup anh ban do roi gui qua. Vector tile giong nhu em gui data (duong nay o dau, nha nay hinh gi, ten duong la gi) roi de dien thoai tu ve. Gui data nhe hon gui anh, va dien thoai ngay nay du manh de ve rat nhanh.

Tile Addressing — Cach xac dinh tile nao can load

Moi tile duoc xac dinh boi 3 tham so: (zoom, x, y)

  • zoom: zoom level (0-20)
  • x: column index (0 den )
  • y: row index (0 den )

URL pattern:

https://tiles.maps.example.com/{zoom}/{x}/{y}.pbf

Ví du: https://tiles.maps.example.com/15/26177/15123.pbf — tile o zoom 15, cot 26177, hang 15123.

Client tinh toan tiles can hien thi dua tren:

  1. Viewport (man hinh dang nhin thay pham vi nao)
  2. Zoom level hien tai
  3. Pre-fetch tiles xung quanh viewport (anticipatory loading)

3.2 Map Rendering — Tu tile den ban do tren man hinh

Tile CDN Architecture

Map tiles la static content (it thay doi — chi cap nhat khi co road moi hoac building moi). Day la bai toan CDN kinh dien — tham khao Tuan-03-Networking-DNS-CDN.

Multi-layer caching strategy:

LayerNoi luuCache DurationHit Ratio (uoc tinh)
L1: Client CacheMobile app local storage / Browser cache30 ngay (tiles it thay doi)~70-80%
L2: CDN EdgeCloudFront/Akamai edge PoPs7-30 ngay~95-99% cua requests con lai
L3: CDN Origin ShieldRegional origin cache30 ngay~99% cua requests con lai
L4: Origin (S3)Object storagePermanent100% (source of truth)

Key insight: Voi 4 layers nay, chi khoang 0.01-0.1% tong request thuc su cham toi origin storage. Day la ly do tai sao map rendering co the phuc vu hang ty request/ngay ma chi can vai chuc origin servers.

Tinh toan cache hit ratio tong the:

Chi 7.5 trong 100,000 requests thuc su phai lay tile tu origin. Day la suc manh cua multi-layer caching.

Progressive Loading

Khi user zoom tu level 10 xuong level 15, ban do khong doi tuc thi — no progressive load:

  1. Hien thi tiles cu (level 10) dang co trong cache — upscale (phong to, hoi mo)
  2. Request tiles moi (level 15) tu CDN
  3. Khi tiles moi ve → fade-in thay the tiles cu → ban do sac net dan

Ky thuat nay goi la “show stale, fetch fresh” — user luon nhin thay gi do thay vi man hinh trang, tao cam giac instant response.

Tile Pre-rendering Pipeline

Map tiles duoc pre-render offline (khong render on-the-fly khi user request):

graph LR
    RawData["Raw Map Data<br/>(Roads, Buildings,<br/>POIs, Terrain)"]
    Pipeline["Tile Rendering Pipeline<br/>(Batch processing)"]
    TileStore["Tile Storage<br/>(S3 / GCS)"]
    CDN["CDN<br/>Global Distribution"]
    Client["Client<br/>Mobile / Web"]

    RawData -->|"Nightly batch +<br/>incremental updates"| Pipeline
    Pipeline -->|"Pre-rendered tiles<br/>21 zoom levels"| TileStore
    TileStore -->|"Origin"| CDN
    CDN -->|"Edge cache"| Client

Khi nao re-render tiles?

  • Scheduled: Moi tuan hoac moi thang — cap nhat roads moi, buildings moi
  • Incremental: Chi re-render tiles bi anh huong boi thay doi (khong render lai toan bo)
  • Priority: Khu vuc dong dan (thanh pho lon) re-render thuong xuyen hon khu vuc hoang vu

3.3 Location Service — Thu thap vi tri tu hang ty device

Bai toan

Moi giay, hang trieu device gui vi tri cua minh (latitude, longitude, timestamp, speed, heading) ve server. Day la write-heavy workload voi volume cuc lon.

Batching Strategy

Gui GPS coordinates moi giay la lang phi (ton battery, ton bandwidth, server khong can do chi tiet). Thay vao do:

Trang thai userTan suat guiLy do
Dang navigate (lai xe)Batch moi 15 giay (gom 15 data points)Can do chinh xac cao de cap nhat traffic
App mo nhung khong navigateBatch moi 30 giay (gom 5-10 data points)Tiet kiem battery, van cap nhat traffic
App chay backgroundBatch moi 5 phut hoac khong guiToi uu battery
App dongKhong guiKhong can

Payload cua mot batch:

FieldTypeSizeMo ta
user_idstring16 bytesAnonymized ID
timestamps[]int64 array8 bytes x NUnix timestamp moi data point
latitudes[]double array8 bytes x NVi do
longitudes[]double array8 bytes x NKinh do
speeds[]float array4 bytes x NToc do (m/s)
headings[]float array4 bytes x NHuong di chuyen (0-360 do)
accuracyfloat4 bytesDo chinh xac GPS (met)
transport_modeenum1 byteCar, bike, walk, etc.

Voi N = 15 data points, payload khoang 700 bytes — rat nhe.

Ingestion Pipeline

graph LR
    subgraph "Millions of Devices"
        D1["Device 1"]
        D2["Device 2"]
        DN["Device N"]
    end

    subgraph "Ingestion Layer"
        LB["Load Balancer"]
        LS1["Location Service<br/>Instance 1"]
        LS2["Location Service<br/>Instance 2"]
        LSN["Location Service<br/>Instance N"]
    end

    subgraph "Message Queue"
        K["Apache Kafka<br/>Partitioned by<br/>geo-region"]
    end

    subgraph "Consumers"
        C1["Traffic Aggregator<br/>(Real-time)"]
        C2["Location History<br/>Writer (Batch)"]
        C3["ML Feature<br/>Pipeline"]
    end

    subgraph "Storage"
        TSDB["Location History<br/>Cassandra / TimescaleDB"]
        TrafficStore["Traffic Store<br/>Redis + Time-series DB"]
    end

    D1 --> LB
    D2 --> LB
    DN --> LB
    LB --> LS1
    LB --> LS2
    LB --> LSN
    LS1 --> K
    LS2 --> K
    LSN --> K
    K --> C1
    K --> C2
    K --> C3
    C1 --> TrafficStore
    C2 --> TSDB

Tai sao dung Kafka? (Tham khao Tuan-08-Message-Queue)

Ly doGiai thich
DecoupleLocation Service chi can ghi vao Kafka — khong can biet ai se doc
BufferNeu Traffic Aggregator bi cham, Kafka giu data — khong mat
Multiple consumersCung mot location event duoc doc boi Traffic Aggregator, History Writer, ML Pipeline — fan-out
PartitioningPartition theo geo-region → cac consumer xu ly song song
ThroughputKafka xu ly duoc hang trieu events/giay tren mot cluster — phu hop voi 33M updates/s
DurabilityData duoc replicate, khong mat khi node chet

Location History Storage

Lua chon database: Cassandra hoac TimescaleDB

Tieu chiCassandraTimescaleDB
Data modelWide-columnRelational (PostgreSQL extension)
Write throughputCuc cao (hang trieu writes/s)Cao (tram nghin writes/s)
Time-series queryCan thiet ke schema can thanBuilt-in time-series functions
CompressionTot (50-70%)Rat tot (90%+ voi columnar compression)
ScaleHorizontal (them node)Horizontal (distributed hypertables)

Schema (Cassandra):

ColumnTypeMo ta
user_idUUIDPartition key — phan tan data theo user
timestampTIMESTAMPClustering key — sap xep theo thoi gian
latDOUBLEVi do
lngDOUBLEKinh do
speedFLOATToc do
headingFLOATHuong
accuracyFLOATDo chinh xac GPS

Partition key = user_id → tat ca location history cua 1 user nam tren cung partition → query “lich su di chuyen cua user X trong 7 ngay qua” chi can doc 1 partition.

TTL (Time-to-Live): Location history chi giu 30-90 ngay → tu dong xoa data cu → tiet kiem storage.


3.4 Navigation/Routing — Tim duong trong me cung

Day la trai tim cua Google Maps — va cung la phan phuc tap nhat.

Road Network la mot Weighted Graph

Hieu, toan bo mang luoi duong giao thong co the bieu dien nhu mot do thi co trong so (weighted graph):

Khai niem do thiTuong ung trong ban do
Node (dinh)Giao lo (intersection), diem uon cua duong
Edge (canh)Doan duong noi 2 giao lo
Weight (trong so)Thoi gian di chuyen (tinh bang giay/phut), KHONG phai khoang cach

Tai sao weight la thoi gian, khong phai khoang cach? Vi user muon den nhanh nhat, khong phai ngan nhat. Duong cao toc 50km co the nhanh hon duong tat 10km neu duong tat ket xe.

Quy mo cua road graph:

Thong soGia tri uoc tinh
So node (intersections) toan cau~500 trieu
So edge (road segments) toan cau~1 ty
Average edges per node~4 (moi giao lo thuong co 3-5 nhanh)
Data size (adjacency list)~100 GB (compressed)

Tien hoa cua Routing Algorithm

Giai doan 1: Dijkstra’s Algorithm

  • Tim duong ngan nhat tu 1 diem den tat ca cac diem khac
  • Do phuc tap: voi priority queue
  • Van de: Voi 500M nodes, Dijkstra duyet qua hang trieu nodes truoc khi tim duoc duong — qua cham cho real-time (co the mat vai giay)

Giai doan 2: A Algorithm*

  • Cai tien Dijkstra bang heuristic — uu tien duyet nodes theo huong ve dich
  • Heuristic: Khoang cach duong chim bay (haversine distance) tu node hien tai den dich
  • Nhanh hon Dijkstra ~2-5x vi loai bot nhieu nodes khong can thiet
  • Van de: Van cham khi route dai (Sai Gon → Ha Noi) vi graph qua lon

Giai doan 3: Contraction Hierarchies (CH) — Pre-computation la chia khoa

Day la breakthrough giup Google Maps tim duong trong milliseconds thay vi seconds.

Y tuong cot loi: Pre-compute “shortcuts” trong graph.

BuocMo taAnalogy
1. Xep hang nodesMoi node duoc gan mot “importance level” — giao lo lon quan trong hon giao lo nhoQuoc lo quan trong hon hem
2. Contract nodesBat dau tu node it quan trong nhat, “loai bo” no va them shortcut edges noi cac node lan canLoai bo cac hem nho, them shortcut “tu giao lo A den giao lo C, mat 5 phut”
3. Xay hierarchySau khi contract het, ta co mot hierarchy graph — level thap la duong nho, level cao la duong lonGiong nhu ban do chi co quoc lo + cao toc
4. QueryTim duong = search tu origin len hierarchy + search tu destination len hierarchy → gap nhau o giua”Lat xe ra duong lon gan nhat → di duong lon → re vao duong nho gan dich”

Hieu qua:

Aha Moment: Pre-computation la ly do Google Maps tra loi route gan nhu tuc thi. Viec contract graph mat vai gio de tinh (chay offline), nhung moi query chi mat < 1 millisecond. Trade-off kinh dien: offline compute time ↔ online query speed.

Nhung pre-computation co van de: Khi traffic thay doi (ket xe), edge weights thay doi → pre-computed shortcuts khong con chinh xac. Giai phap: Routing Tiles (xem ben duoi).

Routing Tiles — Partition Road Graph

Thay vi luu toan bo road graph trong 1 cuc, Google chia graph thanh nhieu routing tiles (tuong tu map tiles nhung cho road data):

LevelTile bao phuChua giSu dung
Level 0 (Local)~1km x 1kmDuong nho, hem, ngoRouting trong pham vi khu pho
Level 1 (Regional)~10km x 10kmDuong chinh, tinh loRouting trong thanh pho
Level 2 (Country)~100km x 100kmQuoc lo, cao tocRouting lien tinh
Level 3 (Global)~1000km x 1000kmCao toc quoc teRouting quoc te

Routing algorithm voi tiles:

  1. Origin nam trong tile A (local level) → load tile A
  2. Destination nam trong tile B (local level) → load tile B
  3. Tim duong trong tile A tu origin den boundary cua tile
  4. Tim duong giua boundaries bang higher-level tiles (regional/country)
  5. Tim duong trong tile B tu boundary den destination
  6. Ghep 3 doan lai → route hoan chinh

Loi ich:

  • Chi can load vai routing tiles thay vi toan bo graph → tiet kiem memory
  • Co the cache routing tiles tai CDN edge
  • Co the cap nhat traffic cho tung tile doc lap (khong can re-compute toan bo)

Routing Algorithm Flow

graph TD
    Start["User request:<br/>Origin → Destination"]
    Geocode["Geocode dia chi<br/>(neu can)"]
    FindTiles["Xac dinh routing tiles<br/>chua Origin va Destination"]
    LoadTiles["Load routing tiles<br/>(local + regional + country)"]
    LocalO["Tim duong trong<br/>local tile cua Origin"]
    Regional["Tim duong qua<br/>regional/country tiles"]
    LocalD["Tim duong trong<br/>local tile cua Destination"]
    Merge["Ghep 3 doan<br/>thanh route hoan chinh"]
    Traffic["Overlay real-time<br/>traffic data"]
    ETA["Tinh ETA cho route"]
    AltRoutes["Tinh 2-3 alternative routes"]
    Response["Tra ve routes + ETA<br/>+ turn-by-turn instructions"]

    Start --> Geocode
    Geocode --> FindTiles
    FindTiles --> LoadTiles
    LoadTiles --> LocalO
    LoadTiles --> Regional
    LoadTiles --> LocalD
    LocalO --> Merge
    Regional --> Merge
    LocalD --> Merge
    Merge --> Traffic
    Traffic --> ETA
    ETA --> AltRoutes
    AltRoutes --> Response

Real-time Traffic Overlay

Pre-computed routes dua tren static weights (toc do gioi han cua duong). Nhung thuc te, traffic thay doi moi phut. Giai phap:

  1. Pre-compute routes voi static weights (Contraction Hierarchies)
  2. Overlay real-time traffic data len route:
    • Moi road segment co mot traffic factor (1.0 = thong thoang, 2.0 = cham gap doi, 5.0 = ket nang)
    • Nhan static travel time voi traffic factor → adjusted travel time
  1. Neu traffic lam route hien tai cham hon alternative → re-route (tinh lai route)

Alternative Routes

Google Maps thuong hien thi 2-3 routes. Cach tao alternative routes:

Phuong phapMo ta
Penalty methodSau khi tim route tot nhat, tang weight cua cac edges trong route do, roi tim lai → duoc route khac
Plateau methodTim cac doan duong “plateau” (doan ma route tot nhat va alternative deu di qua) → buoc route alternative di duong khac o phan con lai
k-shortest pathsTim k duong ngan nhat trong graph (thuong k = 3)

Moi alternative route duoc loc de dam bao:

  • Khac biet it nhat 30% so voi route chinh (khong muon 2 routes gan giong nhau)
  • ETA khong qua 50% lon hon route nhanh nhat (khong de xuat duong qua cham)
  • Khong di qua nhung doan nguy hiem hoac bi cam

3.5 Geocoding & Reverse Geocoding

Geocoding: Dia chi → Toa do

InputOutputVi du
Dia chi textLatitude, Longitude”227 Nguyen Van Cu, Q5, HCM” → (10.7594, 106.6815)

Cach hoat dong:

  1. Parsing: Tach dia chi thanh cac thanh phan (so nha, ten duong, quan, thanh pho)
  2. Standardization: Chuan hoa viet tat (“Q5” → “Quan 5”, “HCM” → “Ho Chi Minh”)
  3. Lookup: Tim trong geocoding database (Elasticsearch voi geo-index)
  4. Interpolation: Neu dia chi chinh xac khong co trong DB → noi suy vi tri giua 2 dia chi da biet

Address Interpolation (Noi suy dia chi):

Neu DB co:

  • So 221 Nguyen Van Cu → toa do A
  • So 231 Nguyen Van Cu → toa do B

Thi so 227 ≈ vi tri (227 - 221) / (231 - 221) = 60% tu A den B

Reverse Geocoding: Toa do → Dia chi

InputOutputVi du
Latitude, LongitudeDia chi text(10.7594, 106.6815) → “227 Nguyen Van Cu, Quan 5, TP.HCM”

Cach hoat dong:

  1. Tim road segment gan nhat voi toa do (nearest-neighbor search bang spatial index — R-tree hoac geohash)
  2. Project diem len road segment → tim vi tri chinh xac tren duong
  3. Interpolation nguoc → uoc tinh so nha
  4. Tra ve dia chi day du

Database: Elasticsearch voi geo_point va geo_shape types, hoac PostgreSQL + PostGIS extension.


3.6 Traffic Data — Mach mau cua ban do

Nguon du lieu traffic

NguonMo taDo tin cayLatency
User GPS dataVi tri + toc do cua hang trieu user dang di tren duongCao (so luong lon → trung binh chinh xac)Real-time (30-60s delay)
Historical patternsDu lieu giao thong lich su theo gio/ngay/muaCao cho patterns lap laiN/A (offline)
Government sensorsCamera giao thong, vong tu (induction loops)Rat cao nhung coverage thapNear real-time
Third-party dataData tu taxi companies, ride-hailing, logisticsTrung binhNear real-time
EventsTai nan, sua duong, su kien lon, thoi tietBien dongReported manually hoac ML-detected

Real-time Traffic Processing Pipeline

  1. Ingest: Nhan location updates tu Kafka (33M updates/s peak)
  2. Map-match: Snap GPS coordinates len road segments (GPS co sai so 5-15m → can xac dinh user dang tren duong nao)
  3. Speed calculation: Tinh toc do trung binh tren moi road segment trong window 2-5 phut
  4. Aggregation: Gom nhom theo road segment → tinh toc do trung binh, density
  5. Traffic level: So sanh toc do thuc te voi toc do gioi han → xac dinh traffic level
Traffic FactorMauNghia
1.0 - 1.3Xanh laThong thoang
1.3 - 2.0Vang/CamCham
2.0 - 5.0DoKet xe
> 5.0Do damKet nang

Map-Matching — Bai toan khong he don gian

GPS coordinates co sai so (5-15m trong thanh pho, lon hon trong khu vuc nhieu toa nha cao tang). Bai toan: xac dinh user dang tren duong nao?

Tinh huong khoVan de
2 duong song song cach 10mGPS khong du chinh xac de phan biet
Duong tren cau va duong ben duoiCung toa do lat/lng nhung khac do cao
User o giao loDang tren duong nao trong 4 duong?
GPS nhay (urban canyon)Toa nha cao phan xa GPS signal

Giai phap: Hidden Markov Model (HMM) — xem xet chuoi cac GPS points lien tiep, khong chi 1 point doc lap. Neu 10 points truoc deu tren duong A, thi point hien tai co xac suat cao cung tren duong A (du GPS nhay sang gan duong B).


3.7 Adaptive ETA — Tinh thoi gian den chinh xac

ETA (Estimated Time of Arrival) la metric quan trong nhat cua navigation. User danh gia Google Maps dua tren ETA co chinh xac khong.

3 nguon du lieu cho ETA

NguonMo taUu diemNhuoc diem
Historical dataTrung binh thoi gian di qua road segment X vao thu 2, 8:00 AMOn dinh, phu hop patterns lap laiKhong biet tai nan bat ngo
Real-time trafficToc do hien tai tren road segment XPhan anh tinh hinh ngay luc nayChi biet hien tai, khong du doan tuong lai
ML PredictionModel du doan traffic 15-30-60 phut toiDu doan truoc — ETA tinh cho duong em se di qua trong tuong laiCan data + compute lon

ML Model cho ETA Prediction

FeatureMo ta
Time featuresGio, thu, thang, ngay le, ngay thuong
Current trafficToc do hien tai tren route va cac road segments xung quanh
Historical trafficTrung binh toc do cua cung gio/thu/mua trong 12 thang qua
WeatherMua, suong mu, bao → anh huong toc do
EventsConcert, tran bong da, ho tro → anh huong traffic
Road attributesSo lan, toc do gioi han, co traffic light khong
Segment sequenceCac road segments noi tiep nhau tren route → model hoc dependencies

Output: ETA prediction cho moi road segment tren route → cong lai = tong ETA.

Google bao cao ETA accuracy: sai lech trung binh khoang 5-8% so voi thuc te — cuc ky chinh xac cho mot he thong phuc vu hang ty requests.

ETA Update trong qua trinh navigate

Khi user dang lai xe, ETA duoc cap nhat lien tuc (moi 30-60 giay):

  1. He thong biet vi tri hien tai cua user (tu location updates)
  2. Tinh lai ETA cho phan duong con lai (khong phai toan bo route)
  3. Neu traffic thay doi dang ke → co the re-route (de xuat duong khac)

Capacity Estimation (Uoc luong dung luong)

Assumptions (Gia thiet)

Thong soGia triGhi chu
DAU1 Billion (1B)Google Maps scale
Users dong thoi gui location updates10% DAU = 100MKhong phai ai cung mo app cung luc
Location update interval30 giay/batchTrung binh (navigate: 15s, idle: 30s)
Navigation requests/user/day2Trung binh (di lam + ve nha)
Map tile requests/user/session50 tilesScroll + zoom trung binh
Sessions/user/day3Mo app 3 lan/ngay
Peak multiplier3xGio cao diem sang + chieu

Location Update QPS

Luu y: Con so 33M/s (1B users / 30s) la theoretical maximum neu tat ca user gui cung luc. Thuc te, chi 10% DAU active tai mot thoi diem → 10M peak.

Bandwidth cho location updates:

Con so 56 Gbps la rat lon — can nhieu server va Kafka cluster de xu ly.

Map Tile QPS

Nhung nho multi-layer caching (99.99% cache hit), chi ~500/s thuc su cham origin. CDN xu ly phan lon.

Storage Estimation

Map Tile Storage

Nhung khong phai tat ca tiles deu co data (70% Trai Dat la bien, sa mac, rung khong can tiles chi tiet):

Location History Storage

Road Graph Storage

Road graph chi 80 GB — du nho de load vao RAM cua mot cluster nho. Day la ly do routing co the rat nhanh.

Tong hop Storage

DataSizeGhi chu
Map tiles (all zoom levels)~2.3 PBLon nhat, luu tren object storage + CDN
Location history (30 ngay)~430 TBTime-series, TTL 30 ngay
Road graph + CH~80 GBNho, co the in-memory
Geocoding index~50 GBElasticsearch cluster
Traffic data (7 ngay)~10 TBTime-series, TTL 7 ngay
Tong~2.8 PBMap tiles chiem ~80%

Security — Bao ve du lieu vi tri

4.1 Location Data Privacy — Van de nghiem trong nhat

Vi tri cua user la du lieu nhay cam bac nhat — biet vi tri la biet nha, cho lam viec, thoi quen di lai, moi quan he ca nhan.

Moi de doaMo taGiai phap
Data breachHacker lay duoc location history cua userEncryption at rest (AES-256), encryption in transit (TLS 1.3)
Internal abuseNhan vien truy cap data userAccess control nghiem ngat, audit log, least-privilege, data masking
Government requestChinh phu yeu cau data vi triLegal team review, chi cung cap khi co lenh toa, minimization (chi cung cap data can thiet)
Third-party sharingBan data vi tri cho advertiserAnonymization, aggregation (chi chia se aggregate traffic, khong chia se individual location)

4.2 Anonymization of Traffic Data

Khi su dung GPS data de tinh traffic, can anonymize de khong the truy nguon ve individual user:

Ky thuatMo ta
K-anonymityChi hien thi traffic data khi co it nhat K users tren road segment (thuong K >= 5) — khong the identify individual
Differential privacyThem noise vao aggregate data — ket qua tong the van chinh xac nhung khong the suy ra individual
Data aggregationChi luu aggregate speed/density per road segment, khong luu “user X di qua diem Y luc Z”
Ephemeral IDsThay user_id bang random session ID, rotate moi session → khong lien ket duoc sessions
Geo-fencing sensitive areasKhong thu thap chi tiet location gan benh vien, tru so ton giao, co so quan su

4.3 Fake GPS Injection Prevention

Mot so user dung app gia lap GPS (mock location) de gian lan (vi du: tai xe Grab gia vi tri). Cach ngan chan:

Ky thuatMo ta
Sensor fusionSo sanh GPS voi accelerometer, gyroscope, cell tower triangulation — neu GPS noi user dang chay 60 km/h nhung accelerometer khong rung → fake
Movement pattern analysisGPS teleport (vi tri nhay dot ngot 10km) → flag suspicious
Speed consistencyToc do thay doi qua nhanh (0 → 200 km/h trong 1 giay) → khong hop ly
Cell tower cross-referenceVi tri GPS phai gan cell tower dang ket noi
Device attestationAndroid SafetyNet / iOS DeviceCheck — xac minh device khong bi root/jailbreak

4.4 Rate Limiting

EndpointRate LimitLy do
Location update10 requests/phut/userNgan spam location (1 batch moi 15-30s la du)
Navigation request30 requests/phut/userNgan automated scraping routes
Tile request1000 requests/phut/userNgan bot download toan bo map data
Geocoding60 requests/phut/userNgan batch geocoding (dich vu tra phi rieng)

Tham khao Tuan-09-Rate-Limiter cho chi tiet implementation.

4.5 Data Retention Policies

Du lieuRetentionLy do
Raw GPS location30 ngayChi can cho traffic analysis, sau do aggregate va xoa raw
Aggregated traffic2 namHistorical patterns cho ML model
User search history90 ngay (hoac user xoa)Personalization, GDPR right to delete
Navigation history90 ngay (hoac user xoa)Trip history feature
Map tilesVinh vien (cap nhat theo version)Core asset
Anonymized analytics5 namBusiness intelligence

4.6 Compliance

RegulationYeu cauAnh huong
GDPR (EU)Right to access, right to delete, data minimizationUser co the download hoac xoa tat ca location data
CCPA (California)Tuong tu GDPR cho California residentsOpt-out of data sale
LGPD (Brazil)Tuong tu GDPR cho BrazilCan consent ro rang
PDPA (Vietnam/Thailand)Luat bao ve du lieu ca nhanCan consent, data localization co the yeu cau

DevOps — Monitoring & Observability

Tham khao Tuan-13-Monitoring-Observability cho nen tang. Duoi day la cac metrics dac thu cho Google Maps.

5.1 Location Ingestion Pipeline Monitoring

MetricMuc tieuAlert khi
Kafka consumer lag< 10 giay> 30 giay (data cu qua, traffic khong real-time)
Location update throughput3-10M events/s< 1M events/s (co van de o ingestion) hoac > 15M events/s (bat thuong, co the la attack)
Failed location updates< 0.1%> 1%
Map-matching accuracy> 95% GPS points matched thanh cong< 90%
Kafka partition skewEven distributionMot partition co > 2x trung binh

5.2 Routing Performance Monitoring

MetricMuc tieuAlert khi
Route calculation latency (p50)< 200ms> 500ms
Route calculation latency (p99)< 2s> 5s
Routing success rate> 99.9%< 99.5% (khong tim duoc route)
Contraction Hierarchies freshness< 24 gio tu lan rebuild cuoi> 48 gio
Routing tile cache hit ratio> 90%< 80%
Alternative routes generated>= 2 routes cho 95% requests< 2 routes cho > 10% requests

5.3 Tile CDN Monitoring

MetricMuc tieuAlert khi
CDN cache hit ratio> 99%< 95% (cache invalidation sai?)
Tile latency (CDN edge, p50)< 50ms> 100ms
Tile latency (origin, p50)< 200ms> 500ms
404 rate (missing tiles)< 0.01%> 0.1% (tile generation pipeline bi loi)
Tile freshness< 7 ngay tu lan render cuoi> 30 ngay cho khu vuc thanh pho
Bandwidth costBudget-based> 120% budget

5.4 ETA Accuracy Monitoring

MetricMuc tieuAlert khi
ETA error (median)< 8%> 15%
ETA error (p95)< 25%> 40%
ETA under-estimate rate< 30%> 50% (user den tre → trai nghiem xau)
ML model staleness< 24 gio tu lan retrain cuoi> 48 gio

5.5 Overall System Health Dashboard

CategoryKey Metrics
AvailabilityService uptime per component, error rate (5xx), circuit breaker trips
Latencyp50/p95/p99 cho moi service (location, routing, tiles, geocoding)
ThroughputQPS cho moi service, Kafka throughput, CDN bandwidth
SaturationCPU/Memory/Disk cho moi cluster, Kafka partition fill rate
Business metricsDAU, navigation starts/completions, search queries, user ratings

5.6 Alerting Hierarchy

LevelMo taVi duResponse time
P0 — CriticalCore feature bi downNavigation khong hoat dong, Kafka cluster down< 5 phut
P1 — HighPerformance degradation nghiem trongRouting latency p99 > 10s, ETA sai > 30%< 15 phut
P2 — MediumDegradation nheCDN cache hit < 95%, location lag > 1 phut< 1 gio
P3 — LowCosmetic hoac non-criticalTile render pipeline cham, analytics delay< 24 gio

Mermaid Diagrams — Tong hop

Overall System Architecture (Chi tiet)

graph TB
    subgraph "Client Layer"
        Mobile["Mobile Client<br/>iOS / Android"]
        Web["Web Client<br/>Browser"]
    end

    subgraph "Edge Layer"
        CDN["Tile CDN<br/>Global Edge PoPs<br/>99%+ cache hit"]
        APIGW["API Gateway<br/>Auth · Rate Limit · Route"]
    end

    subgraph "Application Layer"
        direction TB
        LocSvc["Location Service<br/>Write: 10M/s peak"]
        NavSvc["Navigation Service<br/>Read: 70K/s peak"]
        MapSvc["Map Rendering<br/>Tile metadata"]
        GeoSvc["Geocoding Service<br/>Address ↔ LatLng"]
        TrafficSvc["Traffic Aggregator<br/>Stream processing"]
        ETASvc["ETA Service<br/>ML-powered"]
    end

    subgraph "Message Queue"
        Kafka["Apache Kafka<br/>Location Events<br/>Partitioned by geo-region"]
    end

    subgraph "Storage Layer"
        S3["Object Storage (S3)<br/>Map Tiles — 2.3 PB"]
        Cassandra["Cassandra<br/>Location History — 430 TB"]
        TSDB["Time-series DB<br/>Traffic Data — 10 TB"]
        Graph["Road Graph (In-memory)<br/>+ Contraction Hierarchies<br/>80 GB"]
        ES["Elasticsearch<br/>Geocoding Index<br/>50 GB"]
        Redis["Redis Cluster<br/>Real-time traffic cache"]
    end

    subgraph "ML Platform"
        ETAModel["ETA Prediction Model"]
        TrafficModel["Traffic Prediction Model"]
        FeatureStore["Feature Store"]
    end

    Mobile -->|"Tile requests"| CDN
    Web -->|"Tile requests"| CDN
    Mobile --> APIGW
    Web --> APIGW
    CDN -->|"Cache miss"| S3

    APIGW --> LocSvc
    APIGW --> NavSvc
    APIGW --> GeoSvc

    LocSvc --> Kafka
    Kafka --> TrafficSvc
    Kafka --> Cassandra

    TrafficSvc --> TSDB
    TrafficSvc --> Redis
    TrafficSvc --> FeatureStore

    NavSvc --> Graph
    NavSvc --> ETASvc
    NavSvc --> GeoSvc

    ETASvc --> Redis
    ETASvc --> TSDB
    ETASvc --> ETAModel

    ETAModel --> FeatureStore
    TrafficModel --> FeatureStore

    GeoSvc --> ES
    MapSvc --> S3

Map Tile Pyramid (Visual)

graph TD
    subgraph "Tile Pyramid"
        L0["Level 0: 1 tile<br/>🌍 Toan cau<br/>~100 KB"]
        L5["Level 5: 1,024 tiles<br/>Cac quoc gia<br/>~10 MB total"]
        L10["Level 10: ~1M tiles<br/>Thanh pho<br/>~15 GB total"]
        L15["Level 15: ~1B tiles<br/>Duong pho<br/>~15 TB total"]
        L20["Level 20: ~1T tiles<br/>Chi tiet nha<br/>~2.3 PB total"]
    end

    L0 -->|"x4 moi level"| L5
    L5 -->|"x1024"| L10
    L10 -->|"x1024"| L15
    L15 -->|"x1024"| L20

    subgraph "Caching Strategy"
        Client["Client Cache<br/>~70-80% hit"]
        Edge["CDN Edge<br/>~97% hit"]
        Origin["Origin S3<br/>Source of truth"]
    end

    L20 -.->|"Served via"| Origin
    Origin -->|"Cached at"| Edge
    Edge -->|"Cached at"| Client

Location Update Pipeline (Chi tiet)

sequenceDiagram
    participant Device as Mobile Device
    participant LB as Load Balancer
    participant LS as Location Service
    participant K as Kafka
    participant TA as Traffic Aggregator
    participant HW as History Writer
    participant ML as ML Pipeline
    participant Redis as Redis (Traffic Cache)
    participant Cass as Cassandra (History)

    Note over Device: Thu thap GPS moi 1-3s
    Note over Device: Batch 15 data points

    Device->>LB: POST /location/batch (700 bytes)
    LB->>LS: Forward request
    LS->>LS: Validate + enrich data
    LS->>K: Publish to "location-events" topic
    LS-->>Device: 202 Accepted

    par Consumer Group 1
        K->>TA: Consume location events
        TA->>TA: Map-match GPS → road segments
        TA->>TA: Calculate speed per segment
        TA->>Redis: Update real-time traffic
    and Consumer Group 2
        K->>HW: Consume location events
        HW->>Cass: Write location history (TTL 30d)
    and Consumer Group 3
        K->>ML: Consume for feature engineering
        ML->>ML: Update feature store
    end
sequenceDiagram
    participant Client as Mobile Client
    participant APIGW as API Gateway
    participant Nav as Navigation Service
    participant Geo as Geocoding Service
    participant Graph as Road Graph (Memory)
    participant ETA as ETA Service
    participant Traffic as Traffic Cache (Redis)
    participant MLModel as ML Model

    Client->>APIGW: GET /route?origin=A&dest=B&mode=driving
    APIGW->>APIGW: Auth + Rate Limit
    APIGW->>Nav: Forward request

    Nav->>Geo: Geocode "A" → (lat1, lng1)
    Nav->>Geo: Geocode "B" → (lat2, lng2)
    Geo-->>Nav: Coordinates

    Nav->>Graph: Find route via Contraction Hierarchies
    Graph-->>Nav: Primary route (list of road segments)

    Nav->>Graph: Find 2 alternative routes
    Graph-->>Nav: Alt routes

    loop For each route
        Nav->>ETA: Calculate ETA for route
        ETA->>Traffic: Get real-time traffic per segment
        Traffic-->>ETA: Traffic factors
        ETA->>MLModel: Predict segment travel times
        MLModel-->>ETA: Predicted times
        ETA->>ETA: Combine: historical + real-time + ML
        ETA-->>Nav: ETA + traffic-adjusted times
    end

    Nav->>Nav: Rank routes by ETA
    Nav->>Nav: Generate turn-by-turn instructions
    Nav-->>APIGW: 3 routes + ETAs + instructions
    APIGW-->>Client: Response (< 3s total)

Aha Moments & Pitfalls

Aha Moments — Nhung insight quan trong

#InsightGiai thich
1Pre-computation la chia khoa cua routingContraction Hierarchies mat vai gio de build offline, nhung moi query chi mat < 1ms. Day la trade-off kinh dien: offline compute ↔ online speed. Khong co pre-computation, Google Maps khong the phuc vu 70K routing requests/s.
2Tile caching tiet kiem 99%+ computeNho multi-layer caching (client → CDN edge → origin shield → S3), chi 0.01% requests cham origin. Map rendering service thuc ra chi phuc vu pre-rendering pipeline, khong phai user requests truc tiep.
3Location data volume la massive10M updates/s, 14 TB/ngay, 430 TB/thang — chi rieng location data. Can Kafka de buffer, Cassandra/TimescaleDB de luu, va TTL de tu dong xoa data cu. Khong the dung traditional RDBMS.
4Real-time traffic doi hoi stream processingTraffic data phai duoc xu ly trong giay, khong phai phut. Can Kafka + stream processing framework (Flink/Spark Streaming). Batch processing (chay moi 5 phut) la qua cham.
5ETA accuracy can MLSimple formula (khoang cach / toc do gioi han) cho ETA sai 30-50%. Can combine historical patterns + real-time traffic + ML prediction de dat sai so < 10%.
6Road graph nho bat ngoToan bo road graph toan cau chi ~80 GB — du nho de load vao RAM. Day la ly do routing cuc nhanh: khong can disk I/O.
7Map-matching khong don gianGPS co sai so 5-15m. O thanh pho (nhieu toa nha), sai so con lon hon. Can HMM (Hidden Markov Model) de xac dinh user dang tren duong nao — chi GPS point don le la khong du.
8Vector tiles la tuong laiNho hon 3-5x so voi raster, smooth zoom, dynamic styling (dark mode) — tat ca cac map providers lon deu chuyen sang vector tiles. Client can GPU nhung smartphones hien dai du manh.

Common Pitfalls — Cac loi thuong gap khi thiet ke

#PitfallTai sao saiCach dung
1Tinh route on-the-fly bang DijkstraO(V log V) voi 500M nodes = vai giay/request. Khong scale voi 70K QPSDung Contraction Hierarchies (pre-computed), query < 1ms
2Render tiles on-the-fly5M tile requests/s peak — khong server farm nao render kipPre-render offline + CDN caching
3Gui GPS moi giay tu clientTon battery, ton bandwidth, server bi overwhelmBatch moi 15-30 giay, 15 data points/batch
4Dung RDBMS cho location history14 TB/ngay writes, 10M writes/s — PostgreSQL khong chiu noiDung Cassandra hoac TimescaleDB (write-optimized)
5Bo qua map-matchingGPS “227 Nguyen Van Cu” nhung thuc te user dang tren duong ben canh → traffic data saiDung HMM-based map-matching truoc khi xu ly traffic
6Khong partition road graphLoad 80 GB graph moi lan routing request → cham, ton memoryDung routing tiles — chi load tiles lien quan
7ETA chi dung toc do hien taiTraffic se thay doi trong 30 phut toi — route dai can du doan tuong laiCombine historical + real-time + ML prediction
8Luu raw GPS data vinh vien430 TB/thang → 5 PB/nam → khong the luu maiTTL 30-90 ngay, aggregate roi xoa raw
9Khong anonymize traffic dataVi pham privacy — co the truy nguon user tu traffic patternsK-anonymity, differential privacy, ephemeral IDs
10Khong co offline supportUser mat signal trong ham, o nong thon → ban do trangCho phep download tiles + routing data cho offline use

So sanh voi cac bai khac da hoc

Khai niemO Google MapsO bai da hoc
CDNTile CDN — 99%+ cache hitTuan-03-Networking-DNS-CDN — ly thuyet CDN
Message QueueKafka cho location ingestion — 10M events/s, fan-out 3 consumersTuan-08-Message-Queue — ly thuyet message queue
Stream ProcessingTraffic aggregation real-time tu location eventsChua hoc chi tiet — nhung la extension cua Tuan-08-Message-Queue
Geo-spatialRoad graph, routing tiles, map tiles, geocodingCase-Design-Proximity-Service — geohash, spatial index
Pre-computationContraction Hierarchies, tile pre-renderingKhai niem moi — rat quan trong cho interview
Time-series dataLocation history, traffic dataTuong tu log data trong Tuan-13-Monitoring-Observability
PrivacyLocation anonymization, GDPR complianceTuan-15-Data-Security-Encryption — encryption, data protection

Cau hoi tu kiem tra

Level 1 — Recall (Nho lai)

  1. Google Maps co may zoom level? Moi level tang bao nhieu tiles so voi level truoc?
  2. Ba service chinh cua Google Maps la gi?
  3. Tai sao weight cua road graph la thoi gian, khong phai khoang cach?
  4. Contraction Hierarchies pre-compute gi?
  5. Tan suat gui location update khi user dang navigate la bao nhieu?

Level 2 — Understanding (Hieu)

  1. Giai thich tai sao vector tiles tot hon raster tiles cho mobile.
  2. Tai sao can multi-layer caching (client → CDN → origin) cho map tiles?
  3. Map-matching la gi va tai sao can dung HMM thay vi nearest-road?
  4. Tai sao ETA can ca 3 nguon (historical + real-time + ML) thay vi chi 1?
  5. Tai sao location history dung Cassandra thay vi PostgreSQL?

Level 3 — Application (Van dung)

  1. Neu CDN cache hit giam tu 99% xuong 90%, anh huong gi den origin? Tinh QPS moi o origin.
  2. Neu GPS accuracy giam (do urban canyon), he thong nao bi anh huong va cach xu ly?
  3. Thiet ke data retention policy cho mot ung dung ride-hailing (Grab) su dung he thong nay.
  4. Neu can ho tro “avoid toll roads” va “avoid highways”, thay doi gi trong routing?
  5. Tinh storage can thiet neu tang retention tu 30 ngay len 90 ngay cho location history.

Tong ket — What to Remember for Interview

1-minute Pitch (Neu chi co 1 phut)

“Google Maps gom 3 services chinh: Location Service (thu thap GPS tu users qua Kafka, 10M events/s), Map Rendering (pre-rendered tiles phuc vu qua CDN, 99%+ cache hit), va Navigation (road graph voi Contraction Hierarchies cho < 1ms query, overlay real-time traffic, ML-powered ETA). Key insight: pre-computation o ca map tiles lan routing la ly do he thong co the phuc vu 1B DAU.”

Checklist khi ve diagram

  • Client → CDN (tiles) va Client → API Gateway (navigation, location)
  • Location Service → Kafka → 3 consumers (traffic, history, ML)
  • Navigation Service → Road Graph (in-memory) + ETA Service
  • ETA Service → Traffic Cache (Redis) + ML Model
  • Multi-layer caching cho tiles (Client → CDN → Origin Shield → S3)
  • Geocoding Service (Elasticsearch) cho address ↔ coordinates

3 dieu interviewer muon nghe

  1. Pre-computation: “Contraction Hierarchies giup routing query < 1ms thay vi vai giay. Tiles duoc pre-render offline va cache o CDN.”
  2. Scale numbers: “10M location updates/s, 70K routing requests/s, 5M tile requests/s (nhung 99%+ served by CDN).”
  3. Real-time traffic: “User GPS data → Kafka → stream processing → real-time traffic overlay → adaptive ETA voi ML.”

Hieu, bai nay dai nhung moi phan deu quan trong. Google Maps la mot trong nhung he thong phuc tap nhat — no ket hop CDN, stream processing, graph algorithms, ML, va geo-spatial data. Hieu bai nay, em se co nen tang vung cho bat ky bai toan location-based nao trong interview.

Next: Case-Design-Proximity-Service — Proximity Service (geohash, quadtree) la nen tang cho location-based search ma Google Maps su dung.