Tuần 11 — DynamoDB & Cassandra
“Postgres dev thinking ‘how to model my data’. DynamoDB dev thinking ‘what queries do I run’. Đây không phải khác biệt nhỏ. Đây là đảo ngược thứ tự suy nghĩ. Sai thứ tự = redesign từ đầu.”
Tags: database dynamodb cassandra wide-column nosql access-patterns Thời lượng: 7 ngày (5-7h/ngày) Prerequisites: Tuan-02-Schema-Design-Normalization (để hiểu khác biệt thinking) Liên quan: Tuan-Bonus-Distributed-SQL-Engineering · Case-Design-Data-Realtime-Analytics
1. Context & Why
1.1 Access-pattern-first design
graph LR subgraph "Relational thinking" A1[Entity model] --> A2[Schema normalization] A2 --> A3[Indexes for queries] A3 --> A4[JOIN at query time] end subgraph "NoSQL (DynamoDB/Cassandra) thinking" B1[List ALL access patterns] --> B2[Design table for those patterns] B2 --> B3[Denormalize aggressively] B3 --> B4[No JOIN, query is single trip] end style A4 fill:#fff9c4 style B4 fill:#c8e6c9
If you don’t know all access patterns upfront → DynamoDB/Cassandra is wrong choice. Use Postgres + denormalize when needed.
1.2 DynamoDB vs Cassandra in 2024-2026
| DynamoDB | Cassandra | |
|---|---|---|
| Type | Managed (AWS only) | OSS, self-host or managed (DataStax, ScyllaDB) |
| Model | Key-Value + Document | Wide-column |
| Consistency | Strong/Eventual (per request) | Tunable (per request) |
| Indexes | LSI, GSI | Secondary indexes, MVs |
| Transactions | Yes (since 2018) | Lightweight only (Paxos-based) |
| Streams/CDC | Yes (DynamoDB Streams, Kinesis) | Yes (CDC log) |
| Cost model | Pay per request (RCU/WCU) or capacity | Self-host: server cost; DataStax: monthly |
| Best for | Serverless, AWS-native | Hybrid cloud, on-prem, write-heavy |
1.3 Mục tiêu tuần
- DynamoDB single-table design — pattern + practice
- Partition key strategy + hot partition mitigation
- LSI vs GSI — pick đúng
- Cassandra data model — partition key + clustering columns
- Tunable consistency — pick R, W, N
- Tombstones — silent killer
- CQL antipatterns
- Migration: Postgres → DynamoDB use case
- Pricing math — RCU/WCU calculation
1.4 Tham chiếu
- DynamoDB Book — Alex DeBrie — https://www.dynamodbbook.com/
- DynamoDB Best Practices — https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html
- Cassandra: The Definitive Guide — 3rd ed
- ScyllaDB University — https://university.scylladb.com/ (free)
- DataStax Academy — free courses
- The Last Pickle blog — Cassandra deep dives
2. DynamoDB Core
2.1 Item, Table, Key
Table: Users
├── Item:
│ ├── PK (Partition Key): "USER#42"
│ ├── SK (Sort Key): "PROFILE"
│ ├── name: "Alice"
│ ├── email: "[email protected]"
│ └── ...
Mandatory:
- Partition Key (PK) — hash → physical partition
- Sort Key (SK) (optional) — order within partition
Item = row. Up to 400KB.
2.2 Primary key designs
Simple PK (single):
PK: user_id
Composite PK (PK + SK):
PK: user_id
SK: order_timestamp
→ Query “user X’s orders sorted by time”: single partition lookup, walk SK.
2.3 Partition mechanism
DynamoDB internally partitions data by PK hash. Per-partition burst guidance (AWS docs):
- ~10 GB storage soft target
- ~3000 RCU / 1000 WCU burst throughput
⚠️ 2024 reality: Adaptive Capacity (default since 2019) tự re-balance heat → một partition có thể serve hơn nếu hot key spread đúng. Số trên chỉ là burst hint, không phải hard cap. Đừng design dựa cứng vào nó — đo thực tế qua CloudWatch.
graph TB subgraph "Table" P1[Partition 1<br/>users 0-99M] P2[Partition 2<br/>users 100M-200M] P3[Partition 3<br/>users 200M-300M] P4[Partition N] end Client[Request user 142M] -->|hash → P2| P2
Auto-split as data grows. No manual intervention.
2.4 Hot partition problem
If all writes target one PK → that partition saturates → throttling errors.
Example: PK = "GLOBAL_STATS" for counters.
graph LR A[High write workload] --> B[Hash PK = GLOBAL_STATS<br/>always same partition] B --> C[Partition limit 1000 WCU exceeded] C --> D[ProvisionedThroughputExceededException<br/>503 throttle]
Mitigations:
- Sharded write: PK = “GLOBAL_STATS#0”, “#1”, …, “#15” (random) → spread
- Write sharded, read aggregate: query all shards, sum
- Time-bucket PK:
PK = "STATS#2026-05-16"instead of single global - Pre-aggregate in app
3. Single-Table Design
3.1 Concept
Multiple entity types in one table with composite key encoding.
Table: AppData
├── PK: USER#42 SK: PROFILE attrs: name, email
├── PK: USER#42 SK: ORDER#1700000000-001 attrs: total, status
├── PK: USER#42 SK: ORDER#1700000050-002 attrs: total, status
├── PK: USER#42 SK: ADDRESS#1 attrs: line1, city
├── PK: USER#42 SK: ADDRESS#2 attrs: line1, city
├── PK: ORDER#1 SK: METADATA attrs: ...
├── PK: ORDER#1 SK: ITEM#1 attrs: product, qty
Query: “Get user 42 with all orders”:
# Single request, single partition
response = dynamodb.query(
TableName='AppData',
KeyConditionExpression='PK = :pk',
ExpressionAttributeValues={':pk': {'S': 'USER#42'}}
)
# Returns: profile + all orders + addressesvs multi-table: 3 separate requests + JOIN in app.
3.2 Why single-table
- One trip per access pattern
- Cost efficient — RCU per request, not per table
- Predictable latency
3.3 Tooling
NoSQL Workbench (AWS free tool) — design and visualize.
3.4 Access pattern table
Document upfront:
| Access Pattern | Operation | Key |
|---|---|---|
| Get user profile | GetItem | PK=USER#id, SK=PROFILE |
| Get user with orders | Query | PK=USER#id |
| Get specific order | GetItem | PK=ORDER#id, SK=METADATA |
| List order items | Query | PK=ORDER#id, SK begins_with ITEM# |
| Orders by date range | Query GSI1 | GSI1PK=ORDER, GSI1SK between dates |
| User by email | Query GSI2 | GSI2PK=EMAIL#x |
→ Then design table satisfying all.
4. Secondary Indexes
4.1 LSI vs GSI
graph TB subgraph "LSI - Local Secondary Index" L1[Same PK as base table] L2[Different SK] L3[Created at table creation only] L4[Strongly consistent reads available] L5[Max 5 per table] L6[Counted in 10GB partition limit] end subgraph "GSI - Global Secondary Index" G1[Different PK + SK] G2[Created/dropped anytime] G3[Eventually consistent] G4[Max 50 per table since May 2024] G5[Separate throughput] G6[Reverse indexing pattern] end
4.2 LSI example
Table: PK=user_id, SK=created_at. Want query “user’s orders by total”?
LSI1: PK=user_id, SK=total
Query: Query LSI1 WHERE PK=42 ORDER BY total DESC.
4.3 GSI example
Pivot table to enable inverted queries.
Base: PK=USER#42, SK=ORDER#timestamp
GSI1: PK=GSI1PK, SK=GSI1SK
Populate GSI1 attrs:
GSI1PK = "ORDERS#TODAY"
GSI1SK = timestamp
Query “all orders today” via GSI1.
4.4 Sparse GSI
Pattern: only items with attribute appear in GSI.
Item with status=pending: GSI1PK=PENDING, GSI1SK=created_at → appears in GSI
Item with status=delivered: no GSI1PK → not in GSI
→ Efficient “all pending orders” without scanning everything.
4.5 Adjacency list pattern
Many-to-many in single table:
PK: PRODUCT#1 SK: METADATA
PK: PRODUCT#1 SK: USER#42#FAVORITE (user 42 favorited)
PK: USER#42 SK: PRODUCT#1#FAVORITE (inverse)
Both directions queryable.
5. Capacity & Cost
5.1 RCU / WCU
Capacity unit:
- 1 RCU: 1 strongly consistent read of item up to 4KB, OR 2 eventually consistent
- 1 WCU: 1 write of item up to 1KB
For 5KB item:
- Read strong: 2 RCU (ceil 5/4)
- Read eventual: 1 RCU
- Write: 5 WCU
5.2 Provisioned vs On-demand
| Provisioned | On-Demand | |
|---|---|---|
| Cost | Cheaper if predictable | Pay-per-request |
| Throttle | If exceed | Auto-scale |
| Setup | Pick RCU/WCU + auto-scaling | Just use |
| Best for | Steady workload | Spiky/unpredictable |
2024 default: on-demand for new tables (simpler, slightly more expensive). Migrate to provisioned if sustained predictable load.
5.3 Cost example
Workload: 1B reads/month, 100M writes/month, eventually consistent, 1KB items.
On-demand:
- Reads: 1B × 250
- Writes: 100M × 125
- Storage 100GB × 25
- Total: ~$400/month
Equivalent Postgres RDS db.r6i.large: $260/month + IOPS + backup. Roughly comparable but DynamoDB scales hands-off.
5.4 Reserved capacity
Up to 50% discount with 1-year or 3-year commit. Use when production load stable.
6. Querying
6.1 GetItem — primary key lookup
response = client.get_item(
TableName='AppData',
Key={'PK': {'S': 'USER#42'}, 'SK': {'S': 'PROFILE'}}
)
# Returns single item (or None)6.2 Query — partition + SK conditions
response = client.query(
TableName='AppData',
KeyConditionExpression='PK = :pk AND begins_with(SK, :sk_prefix)',
ExpressionAttributeValues={
':pk': {'S': 'USER#42'},
':sk_prefix': {'S': 'ORDER#'}
},
Limit=10,
ScanIndexForward=False # descending
)SK conditions:
=equality<,<=,>,>=BETWEEN low AND highbegins_with(value)
NO OR, IN, contains in KeyCondition (those are FilterExpression — applied after fetch, wasteful).
6.3 Scan — table scan (avoid)
client.scan(TableName='AppData') # full table scanUse for:
- One-off admin queries
- Reporting batch jobs (with parallel scan)
Don’t use in app code. Always Query with proper key design.
6.4 Pagination
response = client.query(...)
items = response['Items']
last_key = response.get('LastEvaluatedKey')
if last_key:
next_response = client.query(..., ExclusiveStartKey=last_key)Cursor-based, scales naturally. Don’t try offset-style.
6.5 BatchGetItem, BatchWriteItem
client.batch_get_item(
RequestItems={
'AppData': {
'Keys': [{'PK': 'USER#42'}, {'PK': 'USER#43'}, ...]
}
}
)
# Up to 100 items / 16MB per callclient.batch_write_item(
RequestItems={
'AppData': [
{'PutRequest': {'Item': {...}}},
{'DeleteRequest': {'Key': {...}}},
... # up to 25 items
]
}
)Save RCU/WCU on overhead. Each item still counts individually.
7. Transactions
7.1 TransactWriteItems
client.transact_write_items(
TransactItems=[
{'Put': {'TableName': 'AppData', 'Item': {...}}},
{'Update': {'TableName': 'AppData', 'Key': {...}, 'UpdateExpression': '...'}},
{'Delete': {'TableName': 'AppData', 'Key': {...}}},
{'ConditionCheck': {'TableName': 'AppData', 'Key': {...}, 'ConditionExpression': '...'}}
]
)
# Up to 100 items per transaction
# Costs 2x WCU per itemAtomic across multiple items (different PKs even).
7.2 TransactGetItems
Atomic read snapshot across items. 2x RCU cost.
7.3 Optimistic concurrency
Conditional update:
client.update_item(
TableName='AppData',
Key={'PK': 'USER#42'},
UpdateExpression='SET balance = balance - :amt',
ConditionExpression='balance >= :amt',
ExpressionAttributeValues={':amt': {'N': '100'}}
)
# Fails if balance < 1008. Streams & TTL
8.1 DynamoDB Streams
CDC for DynamoDB. Configure on table:
- New image
- Old image
- Both
- Keys only
# Lambda triggered on every change
def lambda_handler(event, context):
for record in event['Records']:
if record['eventName'] == 'INSERT':
# process new item
elif record['eventName'] == 'MODIFY':
# process updateUse cases:
- Sync to Elasticsearch
- Materialize aggregates
- Audit log
- Cross-region replication
8.2 TTL
client.update_item(
Key={'PK': 'SESSION#abc'},
UpdateExpression='SET expires_at = :ts',
ExpressionAttributeValues={':ts': {'N': str(int(time.time()) + 3600)}}
)DynamoDB auto-deletes within ~48 hours of TTL pass (not exact). Free (no WCU charged).
8.3 Global Tables (multi-region)
Replicate table across regions. Active-active. Last-writer-wins conflict resolution.
aws dynamodb update-table --table-name AppData \
--replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]'9. Cassandra Core
9.1 Data Model
CQL table:
CREATE TABLE orders (
user_id uuid,
order_id timeuuid,
total decimal,
status text,
PRIMARY KEY (user_id, order_id)
);user_id= partition key (hash)order_id= clustering key (sorted within partition)
Query:
SELECT * FROM orders WHERE user_id = ? ORDER BY order_id DESC LIMIT 10;Same insight as DynamoDB: one-trip access to user’s orders sorted.
9.2 Composite partition key
PRIMARY KEY ((country, user_id), order_id)Inner parens = partition key composite. Hash combined.
SELECT * FROM orders WHERE country = 'VN' AND user_id = 42;Must provide ALL partition key components in WHERE.
9.3 Token-based partitioning
Each node owns a token range. Key → hash (Murmur3) → token → node.
graph TB subgraph "Cassandra Ring" N1[Node 1<br/>tokens 0-25%] N2[Node 2<br/>tokens 25-50%] N3[Node 3<br/>tokens 50-75%] N4[Node 4<br/>tokens 75-100%] end Client[Key user:42] -->|murmur3 hash → token 17%| N1
Virtual nodes (vnodes): each physical node owns multiple non-contiguous token ranges → better rebalance.
9.4 Tunable Consistency
N = replication factor (typical 3)
W = write quorum
R = read quorum
W + R > N = strong consistency
CQL consistency levels:
ANY— write to any node, then hinted handoff (weakest)ONE— 1 replica respondsTWO,THREEQUORUM— majority (N/2+1)LOCAL_QUORUM— quorum in local DCEACH_QUORUM— quorum each DCALL— all replicas (strongest, slowest)
Pattern:
- Strong consistency: W=QUORUM + R=QUORUM (N=3 → W=2, R=2, W+R=4 > 3)
- Write-fast eventual: W=ONE
- Read-fast eventual: R=ONE
9.5 Replication strategy
CREATE KEYSPACE myapp WITH replication = {
'class': 'NetworkTopologyStrategy',
'us-east': 3,
'eu-west': 3
};SimpleStrategy— single DC (dev/test)NetworkTopologyStrategy— multi-DC (production)
9.6 Read repair & hinted handoff
- Hinted handoff: if replica down on write, coordinator stores hint, replays when back
- Read repair: on read, if replicas have inconsistent data, fix during read
- Anti-entropy: periodic Merkle tree comparison + sync
These maintain eventual consistency.
10. Cassandra Tombstones — Silent Killer
10.1 What
DELETE doesn’t remove data; writes tombstone marker. After gc_grace_seconds (default 10 days), compaction removes for real.
10.2 Why this matters
Query for partition with 10,000 tombstones + 100 live rows:
- Reads ALL 10,000 + 100 = 10,100 cells
- Filters out tombstones in memory
- Returns 100
Slow! Worse: TombstoneOverwhelmingException at 100K tombstones default threshold.
10.3 Causes
- Frequent deletes — TTL on records
- Update with null — also creates tombstone
- Range delete — many tombstones in one op
10.4 Avoid
- Don’t model “delete heavy” workload in Cassandra
- For TTL queue: use partition per time bucket (
PK = day) - Avoid updates that set values to null
-- BAD: setting null = tombstone
UPDATE users SET phone = null WHERE id = 42;
-- BETTER: omit
UPDATE users SET email = '[email protected]' WHERE id = 42;10.5 Monitor
SELECT tombstone_score FROM system.size_estimates WHERE keyspace_name = ?;cqlsh trace shows tombstones per query.
11. CQL Anti-patterns
| Anti-pattern | Why bad | Fix |
|---|---|---|
ALLOW FILTERING | Full scan, slow | Design partition key |
| Secondary index on high-cardinality column | Coordinator hits all nodes | Materialized view |
| Wide partition (>1M rows) | Slow read, GC pressure | Bucket by time |
| Heavy delete workload | Tombstone hell | Different DB or partition strategy |
| Single-partition counter | Hot partition | Sharded counter |
| Range scan across partitions | Anti-pattern | Single partition by design |
| Use Cassandra as primary DB without backups | Bit-rot risk | Snapshot + offsite |
| Mixed read/write CL | Inconsistent | Pick R+W > N or accept eventual |
12. Migration Patterns
12.1 Postgres → DynamoDB
Use case: scale-out, serverless, AWS-native.
Steps:
- Enumerate access patterns
- Design DynamoDB table(s)
- Build dual-write in app (write both Postgres + DynamoDB)
- Backfill from Postgres
- Switch reads to DynamoDB
- Stop dual-write, decommission Postgres
Tools: AWS DMS (Database Migration Service) for initial bulk load.
12.2 DynamoDB → Postgres (rare)
Sometimes scale-back when access patterns get too complex.
Use logical export (DynamoDB Export to S3) → load to Postgres.
12.3 ScyllaDB — Cassandra-compatible faster
Drop-in Cassandra replacement written in C++ (vs Java). 10x throughput claims.
Use case: existing Cassandra under pressure → switch to ScyllaDB without code change.
13. Choosing DynamoDB vs Cassandra vs Postgres
flowchart TD A[Need DB] --> B{AWS-only stack?} B -->|Yes| C{Predictable access patterns?<br/>Variable workload?} C -->|Yes| D[DynamoDB] B -->|No / multi-cloud| E{Write-heavy<br/>>100K writes/sec?} E -->|Yes| F[Cassandra / ScyllaDB] E -->|No| G{Need OLTP joins,<br/>flexible queries?} G -->|Yes| H[Postgres] G -->|No - simple KV| I[Cassandra or DynamoDB] style D fill:#fff9c4 style F fill:#fff9c4 style H fill:#c8e6c9
14. Lab
14.1 Day 1: DynamoDB Local
docker run -p 8000:8000 amazon/dynamodb-local
aws dynamodb create-table --endpoint-url http://localhost:8000 \
--table-name AppData \
--attribute-definitions \
AttributeName=PK,AttributeType=S \
AttributeName=SK,AttributeType=S \
--key-schema \
AttributeName=PK,KeyType=HASH \
AttributeName=SK,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST14.2 Day 2: Single-table design
Take e-commerce schema (Tuần 02). Redesign as single table:
- USER, ORDER, ITEM, REVIEW entities
- List 10 access patterns
- Design PK/SK + GSIs
14.3 Day 3: Hot partition simulation
Create table with low-cardinality PK. Run write load. Watch ThrottlingException. Fix with sharded PK.
14.4 Day 4: Cassandra setup
docker compose up cassandra
cqlsh
CREATE KEYSPACE myapp WITH replication = {'class':'SimpleStrategy','replication_factor':1};
USE myapp;
CREATE TABLE users (id uuid PRIMARY KEY, name text);
INSERT INTO users VALUES (uuid(), 'Alice');
SELECT * FROM users;14.5 Day 5: Tunable consistency
Setup 3-node Cassandra cluster (docker-compose). Test:
- W=ONE + R=ONE — eventual visible after delay
- W=QUORUM + R=QUORUM — immediate visible
14.6 Day 6: Tombstone
Insert + delete heavily, query. Observe slow query.
14.7 Day 7: Migration
Plan: migrate users table from Postgres → DynamoDB. List access patterns, design, dual-write code.
15. Self-check
- Access-pattern-first design — vì sao mandatory cho DynamoDB?
- PK + SK trong DynamoDB — vai trò mỗi cái?
- LSI vs GSI — 4 khác biệt?
- Hot partition — cơ chế, mitigation?
- Single-table design — 1 ưu điểm chính?
- RCU vs WCU — definition, cost example?
- Cassandra W+R>N — ý nghĩa, ví dụ N=3?
- Tombstone — vì sao silent killer?
- ALLOW FILTERING — vì sao tệ?
- DynamoDB vs Cassandra vs Postgres — pick mỗi cái khi nào?
16. Tiếp theo
Bài tiếp: Tuan-12-MongoDB-Document-DB — document model.
Tuần 11 hoàn thành. Query first, model second. Cập nhật: 2026-05-16