Tuần 11 — DynamoDB & Cassandra

“Postgres dev thinking ‘how to model my data’. DynamoDB dev thinking ‘what queries do I run’. Đây không phải khác biệt nhỏ. Đây là đảo ngược thứ tự suy nghĩ. Sai thứ tự = redesign từ đầu.”

Tags: database dynamodb cassandra wide-column nosql access-patterns Thời lượng: 7 ngày (5-7h/ngày) Prerequisites: Tuan-02-Schema-Design-Normalization (để hiểu khác biệt thinking) Liên quan: Tuan-Bonus-Distributed-SQL-Engineering · Case-Design-Data-Realtime-Analytics

1. Context & Why

1.1 Access-pattern-first design

graph LR
    subgraph "Relational thinking"
        A1[Entity model] --> A2[Schema normalization]
        A2 --> A3[Indexes for queries]
        A3 --> A4[JOIN at query time]
    end

    subgraph "NoSQL (DynamoDB/Cassandra) thinking"
        B1[List ALL access patterns] --> B2[Design table for those patterns]
        B2 --> B3[Denormalize aggressively]
        B3 --> B4[No JOIN, query is single trip]
    end

    style A4 fill:#fff9c4
    style B4 fill:#c8e6c9

If you don’t know all access patterns upfront → DynamoDB/Cassandra is wrong choice. Use Postgres + denormalize when needed.

1.2 DynamoDB vs Cassandra in 2024-2026

	DynamoDB	Cassandra
Type	Managed (AWS only)	OSS, self-host or managed (DataStax, ScyllaDB)
Model	Key-Value + Document	Wide-column
Consistency	Strong/Eventual (per request)	Tunable (per request)
Indexes	LSI, GSI	Secondary indexes, MVs
Transactions	Yes (since 2018)	Lightweight only (Paxos-based)
Streams/CDC	Yes (DynamoDB Streams, Kinesis)	Yes (CDC log)
Cost model	Pay per request (RCU/WCU) or capacity	Self-host: server cost; DataStax: monthly
Best for	Serverless, AWS-native	Hybrid cloud, on-prem, write-heavy

1.3 Mục tiêu tuần

DynamoDB single-table design — pattern + practice
Partition key strategy + hot partition mitigation
LSI vs GSI — pick đúng
Cassandra data model — partition key + clustering columns
Tunable consistency — pick R, W, N
Tombstones — silent killer
CQL antipatterns
Migration: Postgres → DynamoDB use case
Pricing math — RCU/WCU calculation

1.4 Tham chiếu

DynamoDB Book — Alex DeBrie — https://www.dynamodbbook.com/
DynamoDB Best Practices — https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html
Cassandra: The Definitive Guide — 3rd ed
ScyllaDB University — https://university.scylladb.com/ (free)
DataStax Academy — free courses
The Last Pickle blog — Cassandra deep dives

2. DynamoDB Core

2.1 Item, Table, Key

Table: Users
├── Item:
│   ├── PK (Partition Key): "USER#42"
│   ├── SK (Sort Key): "PROFILE"
│   ├── name: "Alice"
│   ├── email: "[email protected]"
│   └── ...

Mandatory:

Partition Key (PK) — hash → physical partition
Sort Key (SK) (optional) — order within partition

Item = row. Up to 400KB.

2.2 Primary key designs

Simple PK (single):

PK: user_id

Composite PK (PK + SK):

PK: user_id
SK: order_timestamp

→ Query “user X’s orders sorted by time”: single partition lookup, walk SK.

2.3 Partition mechanism

DynamoDB internally partitions data by PK hash. Per-partition burst guidance (AWS docs):

~10 GB storage soft target
~3000 RCU / 1000 WCU burst throughput

⚠️ 2024 reality: Adaptive Capacity (default since 2019) tự re-balance heat → một partition có thể serve hơn nếu hot key spread đúng. Số trên chỉ là burst hint, không phải hard cap. Đừng design dựa cứng vào nó — đo thực tế qua CloudWatch.

graph TB
    subgraph "Table"
        P1[Partition 1<br/>users 0-99M]
        P2[Partition 2<br/>users 100M-200M]
        P3[Partition 3<br/>users 200M-300M]
        P4[Partition N]
    end

    Client[Request user 142M] -->|hash → P2| P2

Auto-split as data grows. No manual intervention.

2.4 Hot partition problem

If all writes target one PK → that partition saturates → throttling errors.

Example: PK = "GLOBAL_STATS" for counters.

graph LR
    A[High write workload] --> B[Hash PK = GLOBAL_STATS<br/>always same partition]
    B --> C[Partition limit 1000 WCU exceeded]
    C --> D[ProvisionedThroughputExceededException<br/>503 throttle]

Mitigations:

Sharded write: PK = “GLOBAL_STATS#0”, “#1”, …, “#15” (random) → spread
Write sharded, read aggregate: query all shards, sum
Time-bucket PK: PK = "STATS#2026-05-16" instead of single global
Pre-aggregate in app

3. Single-Table Design

3.1 Concept

Multiple entity types in one table with composite key encoding.

Table: AppData
├── PK: USER#42       SK: PROFILE                attrs: name, email
├── PK: USER#42       SK: ORDER#1700000000-001   attrs: total, status
├── PK: USER#42       SK: ORDER#1700000050-002   attrs: total, status
├── PK: USER#42       SK: ADDRESS#1               attrs: line1, city
├── PK: USER#42       SK: ADDRESS#2               attrs: line1, city
├── PK: ORDER#1       SK: METADATA                attrs: ...
├── PK: ORDER#1       SK: ITEM#1                  attrs: product, qty

Query: “Get user 42 with all orders”:

# Single request, single partition
response = dynamodb.query(
    TableName='AppData',
    KeyConditionExpression='PK = :pk',
    ExpressionAttributeValues={':pk': {'S': 'USER#42'}}
)
# Returns: profile + all orders + addresses

vs multi-table: 3 separate requests + JOIN in app.

3.2 Why single-table

One trip per access pattern
Cost efficient — RCU per request, not per table
Predictable latency

3.3 Tooling

NoSQL Workbench (AWS free tool) — design and visualize.

3.4 Access pattern table

Document upfront:

Access Pattern	Operation	Key
Get user profile	GetItem	PK=USER#id, SK=PROFILE
Get user with orders	Query	PK=USER#id
Get specific order	GetItem	PK=ORDER#id, SK=METADATA
List order items	Query	PK=ORDER#id, SK begins_with ITEM#
Orders by date range	Query GSI1	GSI1PK=ORDER, GSI1SK between dates
User by email	Query GSI2	GSI2PK=EMAIL#x

→ Then design table satisfying all.

4. Secondary Indexes

4.1 LSI vs GSI

graph TB
    subgraph "LSI - Local Secondary Index"
        L1[Same PK as base table]
        L2[Different SK]
        L3[Created at table creation only]
        L4[Strongly consistent reads available]
        L5[Max 5 per table]
        L6[Counted in 10GB partition limit]
    end

    subgraph "GSI - Global Secondary Index"
        G1[Different PK + SK]
        G2[Created/dropped anytime]
        G3[Eventually consistent]
        G4[Max 50 per table since May 2024]
        G5[Separate throughput]
        G6[Reverse indexing pattern]
    end

4.2 LSI example

Table: PK=user_id, SK=created_at. Want query “user’s orders by total”?

LSI1: PK=user_id, SK=total

Query: Query LSI1 WHERE PK=42 ORDER BY total DESC.

4.3 GSI example

Pivot table to enable inverted queries.

Base: PK=USER#42, SK=ORDER#timestamp
GSI1: PK=GSI1PK, SK=GSI1SK

Populate GSI1 attrs:
  GSI1PK = "ORDERS#TODAY"
  GSI1SK = timestamp

Query “all orders today” via GSI1.

4.4 Sparse GSI

Pattern: only items with attribute appear in GSI.

Item with status=pending: GSI1PK=PENDING, GSI1SK=created_at → appears in GSI
Item with status=delivered: no GSI1PK → not in GSI

→ Efficient “all pending orders” without scanning everything.

4.5 Adjacency list pattern

Many-to-many in single table:

PK: PRODUCT#1   SK: METADATA
PK: PRODUCT#1   SK: USER#42#FAVORITE   (user 42 favorited)
PK: USER#42     SK: PRODUCT#1#FAVORITE  (inverse)

Both directions queryable.

5. Capacity & Cost

5.1 RCU / WCU

Capacity unit:

1 RCU: 1 strongly consistent read of item up to 4KB, OR 2 eventually consistent
1 WCU: 1 write of item up to 1KB

For 5KB item:

Read strong: 2 RCU (ceil 5/4)
Read eventual: 1 RCU
Write: 5 WCU

5.2 Provisioned vs On-demand

	Provisioned	On-Demand
Cost	Cheaper if predictable	Pay-per-request
Throttle	If exceed	Auto-scale
Setup	Pick RCU/WCU + auto-scaling	Just use
Best for	Steady workload	Spiky/unpredictable

2024 default: on-demand for new tables (simpler, slightly more expensive). Migrate to provisioned if sustained predictable load.

5.3 Cost example

Workload: 1B reads/month, 100M writes/month, eventually consistent, 1KB items.

On-demand:

Reads: 1B × $0.25/ M =$ 250
Writes: 100M × $1.25/ M =$ 125
Storage 100GB × $0.25/ GB =$ 25
Total: ~$400/month

Equivalent Postgres RDS db.r6i.large: $260/month + IOPS + backup. Roughly comparable but DynamoDB scales hands-off.

5.4 Reserved capacity

Up to 50% discount with 1-year or 3-year commit. Use when production load stable.

6. Querying

6.1 GetItem — primary key lookup

response = client.get_item(
    TableName='AppData',
    Key={'PK': {'S': 'USER#42'}, 'SK': {'S': 'PROFILE'}}
)
# Returns single item (or None)

6.2 Query — partition + SK conditions

response = client.query(
    TableName='AppData',
    KeyConditionExpression='PK = :pk AND begins_with(SK, :sk_prefix)',
    ExpressionAttributeValues={
        ':pk': {'S': 'USER#42'},
        ':sk_prefix': {'S': 'ORDER#'}
    },
    Limit=10,
    ScanIndexForward=False  # descending
)

SK conditions:

= equality
<, <=, >, >=
BETWEEN low AND high
begins_with(value)

NO OR, IN, contains in KeyCondition (those are FilterExpression — applied after fetch, wasteful).

6.3 Scan — table scan (avoid)

client.scan(TableName='AppData')  # full table scan

Use for:

One-off admin queries
Reporting batch jobs (with parallel scan)

Don’t use in app code. Always Query with proper key design.

6.4 Pagination

response = client.query(...)
items = response['Items']
last_key = response.get('LastEvaluatedKey')
 
if last_key:
    next_response = client.query(..., ExclusiveStartKey=last_key)

Cursor-based, scales naturally. Don’t try offset-style.

6.5 BatchGetItem, BatchWriteItem

client.batch_get_item(
    RequestItems={
        'AppData': {
            'Keys': [{'PK': 'USER#42'}, {'PK': 'USER#43'}, ...]
        }
    }
)
# Up to 100 items / 16MB per call

client.batch_write_item(
    RequestItems={
        'AppData': [
            {'PutRequest': {'Item': {...}}},
            {'DeleteRequest': {'Key': {...}}},
            ...  # up to 25 items
        ]
    }
)

Save RCU/WCU on overhead. Each item still counts individually.

7. Transactions

7.1 TransactWriteItems

client.transact_write_items(
    TransactItems=[
        {'Put': {'TableName': 'AppData', 'Item': {...}}},
        {'Update': {'TableName': 'AppData', 'Key': {...}, 'UpdateExpression': '...'}},
        {'Delete': {'TableName': 'AppData', 'Key': {...}}},
        {'ConditionCheck': {'TableName': 'AppData', 'Key': {...}, 'ConditionExpression': '...'}}
    ]
)
# Up to 100 items per transaction
# Costs 2x WCU per item

Atomic across multiple items (different PKs even).

7.2 TransactGetItems

Atomic read snapshot across items. 2x RCU cost.

7.3 Optimistic concurrency

Conditional update:

client.update_item(
    TableName='AppData',
    Key={'PK': 'USER#42'},
    UpdateExpression='SET balance = balance - :amt',
    ConditionExpression='balance >= :amt',
    ExpressionAttributeValues={':amt': {'N': '100'}}
)
# Fails if balance < 100

8. Streams & TTL

8.1 DynamoDB Streams

CDC for DynamoDB. Configure on table:

New image
Old image
Both
Keys only

# Lambda triggered on every change
def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            # process new item
        elif record['eventName'] == 'MODIFY':
            # process update

Use cases:

Sync to Elasticsearch
Materialize aggregates
Audit log
Cross-region replication

8.2 TTL

client.update_item(
    Key={'PK': 'SESSION#abc'},
    UpdateExpression='SET expires_at = :ts',
    ExpressionAttributeValues={':ts': {'N': str(int(time.time()) + 3600)}}
)

DynamoDB auto-deletes within ~48 hours of TTL pass (not exact). Free (no WCU charged).

8.3 Global Tables (multi-region)

Replicate table across regions. Active-active. Last-writer-wins conflict resolution.

aws dynamodb update-table --table-name AppData \
    --replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]'

9. Cassandra Core

9.1 Data Model

CQL table:

CREATE TABLE orders (
    user_id uuid,
    order_id timeuuid,
    total decimal,
    status text,
    PRIMARY KEY (user_id, order_id)
);

user_id = partition key (hash)
order_id = clustering key (sorted within partition)

Query:

SELECT * FROM orders WHERE user_id = ? ORDER BY order_id DESC LIMIT 10;

Same insight as DynamoDB: one-trip access to user’s orders sorted.

9.2 Composite partition key

PRIMARY KEY ((country, user_id), order_id)

Inner parens = partition key composite. Hash combined.

SELECT * FROM orders WHERE country = 'VN' AND user_id = 42;

Must provide ALL partition key components in WHERE.

9.3 Token-based partitioning

Each node owns a token range. Key → hash (Murmur3) → token → node.

graph TB
    subgraph "Cassandra Ring"
        N1[Node 1<br/>tokens 0-25%]
        N2[Node 2<br/>tokens 25-50%]
        N3[Node 3<br/>tokens 50-75%]
        N4[Node 4<br/>tokens 75-100%]
    end

    Client[Key user:42] -->|murmur3 hash → token 17%| N1

Virtual nodes (vnodes): each physical node owns multiple non-contiguous token ranges → better rebalance.

9.4 Tunable Consistency

N = replication factor (typical 3)
W = write quorum
R = read quorum
W + R > N = strong consistency

CQL consistency levels:

ANY — write to any node, then hinted handoff (weakest)
ONE — 1 replica responds
TWO, THREE
QUORUM — majority (N/2+1)
LOCAL_QUORUM — quorum in local DC
EACH_QUORUM — quorum each DC
ALL — all replicas (strongest, slowest)

Pattern:

Strong consistency: W=QUORUM + R=QUORUM (N=3 → W=2, R=2, W+R=4 > 3)
Write-fast eventual: W=ONE
Read-fast eventual: R=ONE

9.5 Replication strategy

CREATE KEYSPACE myapp WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,
    'eu-west': 3
};

SimpleStrategy — single DC (dev/test)
NetworkTopologyStrategy — multi-DC (production)

9.6 Read repair & hinted handoff

Hinted handoff: if replica down on write, coordinator stores hint, replays when back
Read repair: on read, if replicas have inconsistent data, fix during read
Anti-entropy: periodic Merkle tree comparison + sync

These maintain eventual consistency.

10. Cassandra Tombstones — Silent Killer

10.1 What

DELETE doesn’t remove data; writes tombstone marker. After gc_grace_seconds (default 10 days), compaction removes for real.

10.2 Why this matters

Query for partition with 10,000 tombstones + 100 live rows:

Reads ALL 10,000 + 100 = 10,100 cells
Filters out tombstones in memory
Returns 100

Slow! Worse: TombstoneOverwhelmingException at 100K tombstones default threshold.

10.3 Causes

Frequent deletes — TTL on records
Update with null — also creates tombstone
Range delete — many tombstones in one op

10.4 Avoid

Don’t model “delete heavy” workload in Cassandra
For TTL queue: use partition per time bucket (PK = day)
Avoid updates that set values to null

-- BAD: setting null = tombstone
UPDATE users SET phone = null WHERE id = 42;
 
-- BETTER: omit
UPDATE users SET email = '[email protected]' WHERE id = 42;

10.5 Monitor

SELECT tombstone_score FROM system.size_estimates WHERE keyspace_name = ?;

cqlsh trace shows tombstones per query.

11. CQL Anti-patterns

Anti-pattern	Why bad	Fix
`ALLOW FILTERING`	Full scan, slow	Design partition key
Secondary index on high-cardinality column	Coordinator hits all nodes	Materialized view
Wide partition (>1M rows)	Slow read, GC pressure	Bucket by time
Heavy delete workload	Tombstone hell	Different DB or partition strategy
Single-partition counter	Hot partition	Sharded counter
Range scan across partitions	Anti-pattern	Single partition by design
Use Cassandra as primary DB without backups	Bit-rot risk	Snapshot + offsite
Mixed read/write CL	Inconsistent	Pick R+W > N or accept eventual

12. Migration Patterns

12.1 Postgres → DynamoDB

Use case: scale-out, serverless, AWS-native.

Steps:

Enumerate access patterns
Design DynamoDB table(s)
Build dual-write in app (write both Postgres + DynamoDB)
Backfill from Postgres
Switch reads to DynamoDB
Stop dual-write, decommission Postgres

Tools: AWS DMS (Database Migration Service) for initial bulk load.

12.2 DynamoDB → Postgres (rare)

Sometimes scale-back when access patterns get too complex.

Use logical export (DynamoDB Export to S3) → load to Postgres.

12.3 ScyllaDB — Cassandra-compatible faster

Drop-in Cassandra replacement written in C++ (vs Java). 10x throughput claims.

Use case: existing Cassandra under pressure → switch to ScyllaDB without code change.

13. Choosing DynamoDB vs Cassandra vs Postgres

flowchart TD
    A[Need DB] --> B{AWS-only stack?}
    B -->|Yes| C{Predictable access patterns?<br/>Variable workload?}
    C -->|Yes| D[DynamoDB]

    B -->|No / multi-cloud| E{Write-heavy<br/>>100K writes/sec?}
    E -->|Yes| F[Cassandra / ScyllaDB]
    E -->|No| G{Need OLTP joins,<br/>flexible queries?}
    G -->|Yes| H[Postgres]
    G -->|No - simple KV| I[Cassandra or DynamoDB]

    style D fill:#fff9c4
    style F fill:#fff9c4
    style H fill:#c8e6c9

14. Lab

14.1 Day 1: DynamoDB Local

docker run -p 8000:8000 amazon/dynamodb-local
aws dynamodb create-table --endpoint-url http://localhost:8000 \
    --table-name AppData \
    --attribute-definitions \
        AttributeName=PK,AttributeType=S \
        AttributeName=SK,AttributeType=S \
    --key-schema \
        AttributeName=PK,KeyType=HASH \
        AttributeName=SK,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST

14.2 Day 2: Single-table design

Take e-commerce schema (Tuần 02). Redesign as single table:

USER, ORDER, ITEM, REVIEW entities
List 10 access patterns
Design PK/SK + GSIs

14.3 Day 3: Hot partition simulation

Create table with low-cardinality PK. Run write load. Watch ThrottlingException. Fix with sharded PK.

14.4 Day 4: Cassandra setup

docker compose up cassandra
cqlsh
 
CREATE KEYSPACE myapp WITH replication = {'class':'SimpleStrategy','replication_factor':1};
USE myapp;
CREATE TABLE users (id uuid PRIMARY KEY, name text);
INSERT INTO users VALUES (uuid(), 'Alice');
SELECT * FROM users;

14.5 Day 5: Tunable consistency

Setup 3-node Cassandra cluster (docker-compose). Test:

W=ONE + R=ONE — eventual visible after delay
W=QUORUM + R=QUORUM — immediate visible

14.6 Day 6: Tombstone

Insert + delete heavily, query. Observe slow query.

14.7 Day 7: Migration

Plan: migrate users table from Postgres → DynamoDB. List access patterns, design, dual-write code.

15. Self-check

Access-pattern-first design — vì sao mandatory cho DynamoDB?
PK + SK trong DynamoDB — vai trò mỗi cái?
LSI vs GSI — 4 khác biệt?
Hot partition — cơ chế, mitigation?
Single-table design — 1 ưu điểm chính?
RCU vs WCU — definition, cost example?
Cassandra W+R>N — ý nghĩa, ví dụ N=3?
Tombstone — vì sao silent killer?
ALLOW FILTERING — vì sao tệ?
DynamoDB vs Cassandra vs Postgres — pick mỗi cái khi nào?

16. Tiếp theo

Bài tiếp: Tuan-12-MongoDB-Document-DB — document model.

Tuần 11 hoàn thành. Query first, model second. Cập nhật: 2026-05-16

lthieu's notes

Explorer

Tuan-11-DynamoDB-Cassandra