Case Study 01 — Data Layer cho E-commerce
“Design data layer cho e-commerce: 1M products, 10M users, 1K orders/sec đỉnh, inventory concurrency, recommendations, search. Apply mọi thứ đã học.”
Tags: case-study ecommerce database-design Prerequisites: Tất cả 16 weekly chapters Liên quan: Tuan-02-Schema-Design-Normalization · Tuan-05-Transactions-Locking · Tuan-15-Vector-DB-AI
1. Requirements
Functional
- User signup, login, profile
- Product catalog: 1M SKUs, attributes by category
- Search: keyword + filter + sort
- Cart, checkout, order
- Payment integration (Stripe)
- Inventory: real-time stock
- Reviews & ratings
- Recommendations
- Admin (product mgmt, order mgmt)
- Analytics dashboard
Non-functional
- 10M users, 1M products, 100M orders/year
- Peak 1K orders/sec (Black Friday)
- Search latency P99 < 200ms
- Product page P99 < 100ms
- Order placement P99 < 500ms
- Inventory accurate (no oversell)
- Multi-region (US, EU)
- Compliance: GDPR, PCI-DSS
2. Data Stores Choice
graph TB Client[Web/Mobile] --> LB[Load Balancer] LB --> App[App Servers] App --> PG[(Postgres<br/>Source of truth)] App --> Redis[(Redis<br/>cache, session, rate limit)] App --> ES[(Elasticsearch<br/>search)] App --> Vector[(pgvector<br/>recommendations)] App --> S3[(S3<br/>product images)] PG -.CDC.-> Kafka[(Kafka)] Kafka --> ES Kafka --> Redis Kafka --> CH[(ClickHouse<br/>analytics)] App --> Stripe[Stripe API<br/>payments] style PG fill:#c8e6c9 style Redis fill:#fff9c4 style ES fill:#fff9c4 style CH fill:#fff9c4
Justifications:
- Postgres: ACID for orders, payments. Single source of truth. Use partitioning + read replica for scale.
- Redis: Cache, session, rate limit, cart (in-memory ~30 min TTL)
- Elasticsearch: Full-text search + filter aggregations (1M products)
- pgvector: Product recommendation via embedding (since data already in Postgres)
- ClickHouse: Analytics (revenue, top products, funnel)
- Kafka + Debezium: Sync from Postgres to other stores
3. Postgres Schema
3.1 Core entities
-- Users
CREATE TABLE users (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email citext NOT NULL,
name text NOT NULL,
created_at timestamptz DEFAULT now(),
deleted_at timestamptz
);
CREATE UNIQUE INDEX idx_users_email_active ON users(email) WHERE deleted_at IS NULL;
-- Addresses, payment methods (similar pattern)
-- Products
CREATE TABLE products (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
sku text UNIQUE NOT NULL,
name text NOT NULL,
description text,
category_id bigint REFERENCES categories(id),
price_cents bigint NOT NULL CHECK (price_cents >= 0),
currency char(3) NOT NULL DEFAULT 'USD',
status text CHECK (status IN ('draft', 'active', 'archived')),
attributes jsonb DEFAULT '{}',
tags text[],
embedding vector(1536), -- for similarity
created_at timestamptz DEFAULT now(),
updated_at timestamptz DEFAULT now()
);
CREATE INDEX idx_products_category ON products(category_id) WHERE status = 'active';
CREATE INDEX idx_products_tags ON products USING gin(tags);
CREATE INDEX idx_products_embedding ON products USING hnsw(embedding vector_cosine_ops);
-- Inventory (separate for concurrency)
CREATE TABLE inventory (
product_id bigint PRIMARY KEY REFERENCES products(id),
quantity int NOT NULL CHECK (quantity >= 0),
reserved int NOT NULL DEFAULT 0 CHECK (reserved >= 0),
updated_at timestamptz DEFAULT now()
);
-- Orders (partitioned by month)
CREATE TABLE orders (
id bigint GENERATED ALWAYS AS IDENTITY,
user_id bigint NOT NULL REFERENCES users(id),
status text NOT NULL CHECK (status IN ('pending', 'paid', 'shipping', 'delivered', 'cancelled', 'refunded')),
currency char(3) NOT NULL,
subtotal_cents bigint NOT NULL,
tax_cents bigint NOT NULL,
shipping_cents bigint NOT NULL,
total_cents bigint GENERATED ALWAYS AS (subtotal_cents + tax_cents + shipping_cents) STORED,
payment_intent_id text,
placed_at timestamptz NOT NULL DEFAULT now(),
paid_at timestamptz,
PRIMARY KEY (placed_at, id) -- composite for partitioning
) PARTITION BY RANGE (placed_at);
CREATE TABLE orders_2026_q2 PARTITION OF orders FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
-- create future partitions via pg_partman
CREATE INDEX idx_orders_user_placed ON orders (user_id, placed_at DESC);
CREATE INDEX idx_orders_status_placed ON orders (status, placed_at);
-- Order items with snapshot
CREATE TABLE order_items (
id bigint GENERATED ALWAYS AS IDENTITY,
order_placed_at timestamptz NOT NULL,
order_id bigint NOT NULL,
product_id bigint NOT NULL REFERENCES products(id),
qty int NOT NULL CHECK (qty > 0),
product_name_snapshot text NOT NULL,
product_sku_snapshot text NOT NULL,
unit_price_cents bigint NOT NULL,
line_total_cents bigint GENERATED ALWAYS AS (unit_price_cents * qty) STORED,
PRIMARY KEY (order_placed_at, id),
FOREIGN KEY (order_placed_at, order_id) REFERENCES orders(placed_at, id) ON DELETE CASCADE
) PARTITION BY RANGE (order_placed_at);
-- Reviews
CREATE TABLE reviews (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
product_id bigint REFERENCES products(id),
user_id bigint REFERENCES users(id),
rating smallint CHECK (rating BETWEEN 1 AND 5),
title text,
body text,
created_at timestamptz DEFAULT now(),
UNIQUE (product_id, user_id)
);
CREATE INDEX idx_reviews_product ON reviews (product_id, created_at DESC);
-- Outbox for CDC
CREATE TABLE outbox (
id bigserial PRIMARY KEY,
aggregate_type text,
aggregate_id text,
event_type text,
payload jsonb,
created_at timestamptz DEFAULT now()
);3.2 Key decisions
| Decision | Justification |
|---|---|
bigint identity PK | OLTP single-region writes, no UUID overhead |
citext email | Case-insensitive uniqueness |
price_cents bigint | Exact arithmetic |
Generated total_cents | Auto-compute |
| Orders partitioned by month | 100M rows/year manageable |
| Snapshot price in order_items | Immutable order history |
embedding in products | Recommendation lookups |
outbox table | Atomic event publishing |
4. Inventory Concurrency
4.1 The challenge
100 users try to buy last 1 unit simultaneously. Need exactly 1 succeeds.
4.2 Approach: Atomic decrement + check
-- Reserve at cart checkout
UPDATE inventory
SET reserved = reserved + 1
WHERE product_id = $1 AND (quantity - reserved) >= 1
RETURNING quantity - reserved AS remaining;
-- If 0 rows → out of stockUPDATE ... WHERE atomic. No explicit lock needed.
4.3 Reservation timeout
Reservation expires if cart abandoned:
CREATE TABLE inventory_reservations (
id bigserial PRIMARY KEY,
product_id bigint REFERENCES products(id),
user_id bigint,
qty int,
expires_at timestamptz NOT NULL
);
-- Cron job: release expired
DELETE FROM inventory_reservations WHERE expires_at < now() RETURNING product_id, qty;
-- Update inventory.reserved accordingly4.4 On checkout payment success
BEGIN;
-- Convert reservation to actual decrement
UPDATE inventory SET quantity = quantity - 1, reserved = reserved - 1
WHERE product_id = $1;
DELETE FROM inventory_reservations WHERE id = $2;
-- Insert order, order_items
-- Insert outbox event
COMMIT;4.5 Hot product (1M views/sec for trending item)
- Read inventory from Redis cache, refresh every 1s
- Write goes to Postgres
- Eventually consistent for “remaining: 5” display
- Strong consistency for actual checkout (Postgres atomic UPDATE)
5. Cart Management
5.1 Where to store cart?
Options:
- A. Redis hash:
user:42:cart - B. Postgres
cartstable - C. Postgres for logged-in, Redis for anonymous
Recommended: Redis for active carts (30-day TTL), Postgres for “saved for later”.
HSET cart:user:42 product:101 2 product:102 1
EXPIRE cart:user:42 2592000 # 30 daysPros:
- Fast (5ms)
- Auto-expire (no cleanup)
- Cart updates atomic
Cons:
- Lost if Redis goes down (acceptable trade)
- Migration to checkout = read Redis, write Postgres order
5.2 Cart → Order
async def checkout(user_id):
cart = redis.hgetall(f"cart:user:{user_id}")
if not cart: raise EmptyCart
# Reserve inventory for each item
reservations = []
for product_id, qty in cart.items():
reserved = await reserve_inventory(product_id, int(qty))
if not reserved:
# Rollback prior reservations
for r in reservations: await release(r)
raise OutOfStock(product_id)
reservations.append((product_id, qty))
# Create payment intent at Stripe
intent = stripe.PaymentIntent.create(...)
# Save pending order
async with db.transaction():
order_id = await create_order(user_id, intent.id, items)
await create_outbox_event("OrderPlaced", order_id, ...)
return order_id, intent.client_secretAfter Stripe webhook confirms payment:
async def stripe_webhook(event):
if event.type == 'payment_intent.succeeded':
async with db.transaction():
await update_order_status(event.data.order_id, 'paid')
await commit_inventory(reservations)
await create_outbox_event("OrderPaid", order_id)
# Clear cart
redis.delete(f"cart:user:{user_id}")6. Search Architecture
6.1 Elasticsearch sync
graph LR PG[(Postgres products)] -.Debezium CDC.-> Kafka[(Kafka)] Kafka --> Sync[ES sync service] Sync --> ES[(Elasticsearch)]
ES index document:
{
"id": 42,
"sku": "...",
"name": "iPhone 15 Pro",
"description": "...",
"category_path": ["electronics", "phones", "iphone"],
"price_cents": 99900,
"currency": "USD",
"tags": ["new", "premium"],
"attributes": {"color": "blue", "storage": "256GB"},
"rating_avg": 4.7,
"rating_count": 1234,
"stock_status": "in_stock",
"search_vector": "iphone pro premium phone"
}6.2 Search query
POST /products/_search
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "iphone case",
"fields": ["name^3", "description", "tags^2"],
"type": "best_fields"
}
},
"filter": [
{"term": {"category_path": "phones"}},
{"range": {"price_cents": {"gte": 1000, "lte": 100000}}},
{"term": {"stock_status": "in_stock"}}
]
}
},
"aggs": {
"categories": {"terms": {"field": "category_path", "size": 10}},
"price_ranges": {"range": {"field": "price_cents", "ranges": [{"to": 5000}, {"from": 5000, "to": 50000}, {"from": 50000}]}}
},
"sort": [{"_score": "desc"}, {"rating_avg": "desc"}]
}6.3 Search-as-you-type
ES completion suggester for autocomplete. Sub-50ms response.
7. Product Recommendations
7.1 “Similar products” via pgvector
Generate embedding for each product (name + description + category) using OpenAI or BGE.
-- On product create/update, compute embedding and store
UPDATE products SET embedding = $1 WHERE id = $2;
-- Find similar
SELECT id, name, price_cents
FROM products
WHERE status = 'active' AND id != $1
ORDER BY embedding <=> (SELECT embedding FROM products WHERE id = $1)
LIMIT 10;7.2 Collaborative filtering
For “users who bought X also bought Y”:
-- Daily batch job, store in materialized view
CREATE MATERIALIZED VIEW product_co_purchases AS
SELECT
oi1.product_id AS product_a,
oi2.product_id AS product_b,
count(*) AS co_purchase_count
FROM order_items oi1
JOIN order_items oi2 USING (order_placed_at, order_id)
WHERE oi1.product_id < oi2.product_id
GROUP BY oi1.product_id, oi2.product_id
HAVING count(*) > 5;
-- Query (cached in Redis with 1-day TTL)
SELECT product_b FROM product_co_purchases WHERE product_a = $1 ORDER BY co_purchase_count DESC LIMIT 10;7.3 Personalized recommendations
User embedding from history → vector search → personalized products.
async def personalized_recs(user_id):
user_history = get_recent_purchases(user_id) # last 20
user_embedding = np.mean([p.embedding for p in user_history], axis=0)
return vector_search(user_embedding, exclude=user_history.product_ids, top_k=20)Cache in Redis 30 min TTL.
8. Order Lifecycle
stateDiagram-v2 [*] --> Pending: place order Pending --> Paid: Stripe success Pending --> Cancelled: timeout or user cancel Paid --> Shipping: fulfillment Shipping --> Delivered: shipping done Delivered --> [*] Paid --> Refunded: refund request
Each state transition:
- Updates
orders.status - Writes outbox event
- CDC propagates to ES (for “my orders” search), to analytics
9. Analytics Pipeline
graph LR PG[(Postgres orders)] -.Debezium.-> Kafka App[(App events)] -.kafka producer.-> Kafka Kafka --> CH[(ClickHouse)] CH -.materialized.-> CHAgg[(Hourly/Daily aggs)] CHAgg --> BI[Grafana/Metabase]
ClickHouse tables:
CREATE TABLE events (
ts DateTime,
event_type LowCardinality(String),
user_id UInt64,
product_id UInt64,
order_id UInt64,
revenue_cents UInt64,
payload String
) ENGINE = MergeTree() PARTITION BY toYYYYMM(ts) ORDER BY (ts, event_type);
CREATE MATERIALIZED VIEW daily_revenue
ENGINE = SummingMergeTree() ORDER BY (date, category) AS
SELECT toDate(ts) AS date, payload['category'] AS category, sum(revenue_cents) AS revenue
FROM events
WHERE event_type = 'order_paid'
GROUP BY date, category;Dashboards: revenue by day/category, top products, funnel (view → cart → purchase), AOV (average order value).
10. Caching Strategy
10.1 Cache layers
graph TB Client[Client] --> CDN[CDN<br/>static images, product pages SSG] CDN --> App[App server] App --> AppCache[App-local cache<br/>5min TTL] App --> Redis[(Redis<br/>distributed cache)] Redis --> PG[(Postgres)]
10.2 What to cache where
| Data | Cache layer | TTL |
|---|---|---|
| Product page (rendered) | CDN | 5 min |
| Product details | Redis | 5 min |
| User profile | Redis | 30 min |
| Cart | Redis | 30 days (sticky) |
| Inventory display | Redis | 1 sec |
| Search results | Redis | 30 sec |
| Recommendations | Redis | 30 min |
| Aggregated counts (reviews) | Postgres (denormalized) | n/a |
10.3 Cache invalidation
Via CDC events:
- Product updated → invalidate
product:42cache - Review added → recompute product.rating_avg → invalidate product cache
# CDC consumer
def on_product_change(event):
redis.delete(f"product:{event.id}")
# ES sync handled separately11. Multi-region Strategy
11.1 Read replicas in EU
Primary in US-East. Read replica in EU-West.
EU users:
- Reads → EU replica (latency ~10ms vs 150ms)
- Writes → US primary (write latency 150ms accepted at checkout)
- Inventory check → must be primary (consistency)
11.2 Future: Multi-active
Move to Aurora DSQL or CockroachDB for true multi-region writes if growth justifies.
11.3 Data residency
EU users data physically in EU:
- Separate tenant per region
- Or Postgres FDW for cross-region read
- Or accept “stored in US, replica in EU for reads”
GDPR requires care. Talk to legal.
12. Capacity Planning
12.1 Data sizing
- Users: 10M × 5KB = 50GB
- Products: 1M × 10KB (with description) = 10GB
- Orders: 100M × 5KB = 500GB
- Order items: 500M × 1KB = 500GB
- Total Postgres: ~1.5TB working set, 2-3TB with indexes
12.2 Throughput
- Read: 100K req/s peak (product page, search, cart)
- Write: 1K orders/sec peak
Postgres r6i.4xlarge (16 vCPU, 128GB RAM): handles. Plus read replicas for scale-out reads.
12.3 Redis sizing
Active sessions + carts + cache: ~50GB RAM. Use ElastiCache cluster (3-node primary + 3 replicas).
12.4 ES sizing
1M products × ~5KB indexed = 5GB. Comfortable on 2-3 nodes for HA.
12.5 ClickHouse sizing
5 years history: 500M orders × 1KB = 500GB compressed. 1-2 nodes sufficient.
13. Failure Modes
13.1 Postgres primary down
Promote read replica via Patroni. RTO 30s.
13.2 Redis down
Cache miss → DB → app slower but works. Set sensible timeouts.
13.3 ES down
Search degraded. Fallback to basic Postgres search? Or display error.
13.4 Stripe down
Webhook delayed. Reconciliation job handles eventually.
13.5 Kafka down
Debezium pauses. WAL accumulates (monitor!). Recover when Kafka up.
14. Observability
- pg_stat_statements: top queries
- pgBouncer: pool health
- Redis: hit rate, evictions
- ES: query latency, indexing rate
- ClickHouse: ingestion lag
- Application: APM (Datadog/Honeycomb)
- Business: dashboards on order rate, revenue, errors
Alerts:
- Order rate drop >50% → page on-call
- Stripe webhook lag >5 min
- Inventory negative (impossible should alert)
- Replication lag >10s
- pg connection count >80% max
15. Implementation Phases
gantt title E-commerce Data Layer Build section MVP Postgres + basic schema : m1, 0, 2w Redis cache + cart : m2, after m1, 1w Stripe integration : m3, after m2, 2w ES search : m4, after m3, 2w section Growth pgBouncer : g1, after m4, 1w CDC to ES/CH : g2, after g1, 2w Vector recommendations : g3, after g2, 2w Read replica : g4, after g3, 1w section Scale Partitioning orders : s1, after g4, 2w Multi-region : s2, after s1, 4w ClickHouse analytics : s3, after s1, 3w
16. Tiếp theo
Case-Design-Data-SaaS-Multi-tenant
Cập nhật: 2026-05-16