Bonus Week: Outbox Pattern, CDC & Saga Orchestration
“Junior dev: ‘After saving the order, send a message to Kafka.’ Senior dev: ‘What if the DB commits but the Kafka send fails?’ Architect: ‘That is the dual-write problem — and it is why the Outbox Pattern exists.’”
Tags: system-design outbox cdc debezium saga microservices event-driven bonus Student: Hieu (Backend Dev → Architect) Prerequisite: Tuan-08-Message-Queue · Tuan-11-Microservices-Pattern · Tuan-Bonus-Consistency-Models-Isolation Related: Case-Design-Payment-System · Case-Design-Digital-Wallet · Case-Design-Distributed-Message-Queue · Case-Design-Hotel-Reservation-System
1. Context & Why
An everyday analogy — the post office and the ledger
Hieu, imagine you run an online store. Every time a new order comes in, you have to:
- Write it into the accounting ledger (database): “Order #123, 500K”
- Email the warehouse (message queue): “Prepare to pack #123”
The problem: these two actions are not in the same transaction. There are four scenarios:
| Scenario | Ledger (DB) | Email (MQ) | Consequence |
|---|---|---|---|
| ✅ Happy | OK | OK | Everything fine |
| ❌ A | OK | Fail | Order exists but the warehouse never hears about it → lost customer |
| ❌ B | Fail | OK | Warehouse packs a shipment with no record behind it → lost money |
| ❌ C | OK | Sent twice | Warehouse packs two shipments → over-shipping |
How often does this happen in production? Roughly 0.01-1% of writes without the right pattern. At 1M orders/day, that is 100-10,000 broken orders per day. This is not an edge case; it is the dual-write problem, and it happens every time you “save to the DB, then send a message”.
The Outbox Pattern solves it with one simple rule:
Write to only one place (the database). A separate process reads from there and pushes to the message queue.
Why does a backend dev need to understand the Outbox?
| Reason | Example |
|---|---|
| Every event-driven microservice hits it | Order service → Kafka → Inventory, Notification, Analytics |
| Transactional consistency | You cannot 2PC across Postgres + Kafka |
| At-least-once guarantee | Messages are never lost |
| Audit trail | The outbox table is a complete event log |
| Decoupling | The producer does not depend on Kafka’s availability |
| Industry-standard CDC | Debezium and Kafka Connect build on this pattern |
Why doesn’t Alex Xu go deep into the Outbox?
Alex Xu vols 1+2 cover Saga and microservices at a conceptual level. But the detailed implementation — how to guarantee:
- The message is always published if the DB commits
- Nothing is published if the DB rolls back
- Idempotent consumer
- Order preservation
— is exactly what 80% of production systems get wrong at first, leading to inconsistency, lost events, and duplicate processing. The Outbox Pattern is the canonical answer that every senior microservice engineer must know.
Key references
- Microservices.io patterns — Chris Richardson — https://microservices.io/patterns/data/transactional-outbox.html
- Debezium docs — https://debezium.io/documentation/reference/stable/transformations/outbox-event-router.html
- Confluent blog: Outbox Pattern — https://www.confluent.io/blog/microservices-outbox-pattern-for-data-consistency/
- DDIA Chapter 11 (Stream Processing) — Kleppmann
- Saga paper (1987) — Hector Garcia-Molina, Kenneth Salem — http://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf
2. Deep Dive — Core Concepts
2.1 The Dual-Write Problem
Naive approach:
def create_order(order_data):
    # Write 1: Database
    order = db.execute("INSERT INTO orders ... RETURNING id")
    # Write 2: Message queue
    kafka.send("orders", {"id": order.id, ...})
    return order
Failure modes:
Time →
Mode A: DB commit, Kafka fail
T0: BEGIN
T1: INSERT order ✓
T2: COMMIT ✓ (data persisted)
T3: kafka.send() → NETWORK ERROR
Result: Order in DB, no event sent → downstream systems out of sync
Mode B: Kafka succeed, DB rollback
T0: BEGIN
T1: INSERT order ✓
T2: kafka.send() ✓ (event sent!)
T3: ... some error → ROLLBACK
Result: Event sent for non-existent order → ghost data
Mode C: Process crash between
T0: COMMIT ✓
T1: kafka.send() called
T2: ⚡ Process crashes ⚡
Result: Sent or not? Depends on TCP buffer state. Cannot retry safely.
Mode D: Send retry creates duplicate
T0: kafka.send() → timeout (but actually succeeded)
T1: Retry → 2 events for same order
Result: Downstream processes order twice
Key insight: 2PC between Postgres and Kafka is not feasible in practice (Kafka is not XA-compliant, and distributed locking is too expensive). A different pattern is needed.
2.2 Outbox Pattern — The Principle
The rule: commit within a single atomic boundary (the database transaction). A separate process reads from the DB and publishes to the MQ.
┌──────────────────────────────────────────────────────────┐
│ Application │
│ │
│ BEGIN TRANSACTION │
│ ├─ INSERT INTO orders (...) VALUES (...) │
│ ├─ INSERT INTO outbox (event_type, payload, created_at) │
│ COMMIT ← one atomic boundary │
└──────────────────────────────────────────────────────────┘
│
│ poll/CDC
▼
┌──────────────────────────────────────────────────────────┐
│ Outbox Relay │
│ │
│ Read unpublished events from outbox table │
│ → Publish to Kafka │
│ → Mark as published (or delete) │
└──────────────────────────────────────────────────────────┘
│
│ Kafka
▼
┌────────────────────┐
│ Downstream Consumers│
│ - Inventory │
│ - Notification │
│ - Analytics │
└────────────────────┘
Guaranteed properties:
- At-least-once: if the DB commits, the event is eventually published (the relay retries indefinitely)
- No ghost events: if the DB rolls back, the outbox row never existed, so nothing is published
- Ordering: per-aggregate order can be guaranteed (via the partition key)
- Idempotency: consumers must be idempotent (duplicates are possible)
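The “no ghost events” property is nothing more than transaction atomicity at work: the business row and the outbox row live or die together. A minimal sketch, with SQLite standing in for Postgres and a deliberately simplified schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT);
""")

def create_order(order_id, total, fail=False):
    try:
        with conn:  # one atomic transaction: both rows commit, or neither does
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
            if fail:
                raise RuntimeError("business logic error")
            conn.execute(
                "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                ("OrderCreated", order_id),
            )
    except RuntimeError:
        pass  # rolled back: no order row, and no ghost outbox row either

create_order("ord-1", 500)              # commits order + outbox together
create_order("ord-2", 900, fail=True)   # rolls back both

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
print(conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0])  # 1
```

The relay only ever sees events whose business write actually committed; that is the entire trick.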
2.3 Outbox Table Schema
CREATE TABLE outbox (
id BIGSERIAL PRIMARY KEY,
aggregate_type TEXT NOT NULL, -- 'Order', 'Payment', 'User'
aggregate_id TEXT NOT NULL, -- entity ID
event_type TEXT NOT NULL, -- 'OrderCreated', 'PaymentCompleted'
payload JSONB NOT NULL, -- Event body
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- For polling-based relay
published_at TIMESTAMPTZ, -- NULL = not yet published
-- For ordering
sequence_number BIGINT GENERATED ALWAYS AS IDENTITY,
-- For tracing
trace_id TEXT,
correlation_id TEXT
);
-- Key index: look up unpublished rows
CREATE INDEX idx_outbox_unpublished
ON outbox (created_at)
WHERE published_at IS NULL;
-- Per-aggregate ordering
CREATE INDEX idx_outbox_aggregate
ON outbox (aggregate_type, aggregate_id, sequence_number);
Insert pattern:
BEGIN;
-- Business logic
INSERT INTO orders (id, customer_id, total)
VALUES ('ord-123', 'cust-1', 500000);
-- Outbox event
INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload)
VALUES (
'Order',
'ord-123',
'OrderCreated',
'{"id": "ord-123", "customer_id": "cust-1", "total": 500000}'::jsonb
);
COMMIT;
2.4 Two implementation styles
2.4.1 Style 1: Polling (simple, higher latency)
Mechanism: a worker polls the outbox table every N seconds, publishes the unpublished events, and marks them as published.
async def outbox_worker():
    while True:
        async with db.transaction():
            # Lock rows so two workers don't process the same batch
            rows = await db.fetch("""
                SELECT id, aggregate_type, aggregate_id, event_type, payload
                FROM outbox
                WHERE published_at IS NULL
                ORDER BY id
                LIMIT 100
                FOR UPDATE SKIP LOCKED
            """)
            for row in rows:
                try:
                    await kafka.send(
                        topic=topic_for(row['aggregate_type']),
                        key=row['aggregate_id'],
                        value=row['payload'],
                    )
                    await db.execute(
                        "UPDATE outbox SET published_at = NOW() WHERE id = $1",
                        row['id']
                    )
                except KafkaError:
                    # Don't mark as published; will retry next iteration
                    break
        await asyncio.sleep(1.0)  # poll interval
Pros:
- Simple, no extra infra
- Easy to debug (just SELECT outbox)
- Works with any DB
Cons:
- Latency = poll interval (e.g., 1s)
- DB load: continuous polling
- Doesn’t scale well past ~10K events/s
2.4.2 Style 2: CDC (Change Data Capture)
Mechanism: Debezium tails the DB's WAL (Write-Ahead Log) and streams changes as Kafka events in real time.
┌─────────────┐ WAL ┌──────────────┐ Kafka ┌──────────┐
│ PostgreSQL │ ────────►│ Debezium │ ─────────►│ Kafka │
│ │ changes │ (Kafka Conn) │ events │ │
└─────────────┘ └──────────────┘ └──────────┘
│
▼
┌──────────────┐
│ Consumers │
└──────────────┘
Pros:
- Latency < 100ms (real-time)
- No DB polling load
- Scales to 100K+ events/s
- Captures ALL changes (not just outbox)
Cons:
- Extra infrastructure (Kafka Connect cluster)
- Complex setup (logical replication slots)
- Schema evolution challenges
- Operational overhead
2.4.3 Hybrid: Outbox table + CDC
Best of both worlds (Debezium's recommended approach):
- The app writes to the `outbox` table (controlled schema)
- Debezium streams only the outbox table, through the Outbox Event Router SMT
- Kafka topic structure: `<routedBy>.<aggregateType>` (e.g., `events.Order`, `events.Payment`)
# debezium-connector.json
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "debezium",
    "database.dbname": "ordersdb",
    "table.include.list": "public.outbox",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "events.${routedByValue}",
    "transforms.outbox.table.expand.json.payload": "true"
  }
}
Reference: https://debezium.io/documentation/reference/stable/transformations/outbox-event-router.html
2.5 Outbox Cleanup
The outbox table grows without bound unless you clean it up. Strategies:
| Strategy | Pros | Cons |
|---|---|---|
| Delete after publish | Small table | Loses audit trail; race-condition risk |
| TTL (delete rows older than 7 days) | Balanced | Needs a periodic job |
| Archive to cold storage | Full audit trail | More complex |
| Partitioning by month | Old partitions are easy to drop | Needs PostgreSQL 11+ declarative partitioning |
Recommended: a 7-30 day TTL plus monthly archiving.
-- Cleanup job (cron daily)
DELETE FROM outbox
WHERE published_at < NOW() - INTERVAL '7 days';
2.6 Idempotent Consumer — Mandatory
Because the outbox guarantees at-least-once delivery (duplicates are possible), consumers must be idempotent.
Pattern 1: Unique key check
def handle_order_created(event):
    try:
        db.execute(
            "INSERT INTO inventory_reservations (event_id, ...) VALUES (...)",
            event['event_id'], ...
        )
    except UniqueViolation:
        # Already processed, skip
        log.info(f"Event {event['event_id']} already processed")
Pattern 2: Outbox-style on consumer side
CREATE TABLE processed_events (
event_id TEXT PRIMARY KEY,
processed_at TIMESTAMPTZ DEFAULT NOW()
);
-- In handler
BEGIN;
INSERT INTO processed_events (event_id) VALUES ('evt-123');
-- ... business logic ...
COMMIT;
Pattern 3: Versioning
def update_inventory(event):
    db.execute("""
        UPDATE inventory
        SET quantity = %s, version = %s
        WHERE product_id = %s AND version < %s
    """, event['new_qty'], event['version'], event['product_id'], event['version'])
    # Stale version → no-op
2.7 Saga Pattern — Distributed Transactions
The problem: a business transaction spans multiple services. 2PC is not an option. The steps need to be coordinated.
Saga: split the transaction into a chain of local transactions, where each step has a compensating action in case a later step fails.
Saga "Place Order":
Step 1: Order Service — create order (compensate: cancel order)
Step 2: Inventory Service — reserve items (compensate: release items)
Step 3: Payment Service — charge customer (compensate: refund)
Step 4: Shipping Service — schedule delivery (compensate: cancel shipment)
If Step 4 fails:
→ Compensate Step 3 (refund)
→ Compensate Step 2 (release items)
→ Compensate Step 1 (cancel order)
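A common way to implement that reverse-order compensation is an undo stack: every completed step pushes its compensating action, and a failure pops them in reverse. A minimal sketch with stubbed service calls (all names here are illustrative):

```python
class Saga:
    """Runs steps in order; on failure, runs recorded compensations in reverse."""
    def __init__(self):
        self.compensations = []

    def step(self, action, compensate):
        result = action()
        # Remember how to undo this step, bound to its result
        self.compensations.append(lambda: compensate(result))
        return result

    def rollback(self):
        for undo in reversed(self.compensations):
            undo()

# Stubbed "services" that just record what happened
log = []
saga = Saga()
try:
    saga.step(lambda: log.append("create order") or "ord-1",
              lambda r: log.append(f"cancel {r}"))
    saga.step(lambda: log.append("reserve items") or "res-1",
              lambda r: log.append(f"release {r}"))
    raise RuntimeError("payment failed")  # Step 3 fails
except RuntimeError:
    saga.rollback()

print(log)
# ['create order', 'reserve items', 'release res-1', 'cancel ord-1']
```

Note that compensations run newest-first, mirroring the "compensate Step 3, then 2, then 1" order above.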
2.8 Saga Orchestration vs Choreography
2.8.1 Choreography — event-driven, no “boss”
Each service publishes an event when it finishes; other services subscribe and decide on their own actions.
Order Service ──[OrderCreated]──► Kafka
│
┌──────────────────────┼──────────────────────┐
▼ ▼ ▼
Inventory Service Payment Service Notification Service
──[ItemsReserved]──► ──[PaymentCharged]──► (no further events)
│ │
└──────────┬───────────┘
▼
Order Service
──[OrderFulfilled]──►
Pros:
- Loose coupling
- Easy to add new services
- No SPOF
- Naturally scales
Cons:
- Hard to debug (“where is the order?”)
- Hard to track end-to-end status
- Compensation logic spread across services
- Cyclic dependencies risk
Latency: Sum of each step’s processing time + Kafka delivery (~50-100ms per step).
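A choreographed step is nothing more than an event handler that does its local work and emits the next event. A sketch using a tiny in-process bus in place of Kafka (all service and event names are illustrative):

```python
# Minimal in-process event bus standing in for Kafka (illustration only)
handlers = {}

def subscribe(event_type):
    def register(fn):
        handlers.setdefault(event_type, []).append(fn)
        return fn
    return register

def publish(event_type, payload):
    for fn in handlers.get(event_type, []):
        fn(payload)

trace = []

@subscribe("OrderCreated")
def reserve_items(order):          # Inventory Service
    trace.append(f"reserved:{order['id']}")
    publish("ItemsReserved", order)

@subscribe("ItemsReserved")
def charge(order):                 # Payment Service
    trace.append(f"charged:{order['id']}")
    publish("PaymentCharged", order)

@subscribe("PaymentCharged")
def fulfill(order):                # Order Service closes the loop
    trace.append(f"fulfilled:{order['id']}")

publish("OrderCreated", {"id": "ord-1"})
print(trace)  # ['reserved:ord-1', 'charged:ord-1', 'fulfilled:ord-1']
```

No coordinator anywhere: the flow exists only implicitly, in which handler subscribes to which event — which is precisely why end-to-end tracking is hard.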
2.8.2 Orchestration — a “boss” coordinator
A Saga Orchestrator coordinates the flow, calling each service via RPC in sequence.
┌─────────────────────┐
│ Saga Orchestrator │
│ (state machine) │
└──────────┬──────────┘
│
┌────────────────────┼────────────────────┐
▼ ▼ ▼
Order Service Inventory Service Payment Service
(RPC: create) (RPC: reserve) (RPC: charge)
Pros:
- Easy to track status (all in orchestrator)
- Compensation logic centralized
- Easy to debug (one place to look)
- Can implement complex flows (parallel, conditional)
Cons:
- Orchestrator = SPOF (needs HA)
- Tighter coupling
- The orchestrator can become a “god class”
Implementation tools:
- Temporal (https://temporal.io/) — a fork of Uber's Cadence by its original authors; production-grade
- Camunda (BPMN-based)
- AWS Step Functions (serverless)
- Netflix Conductor
Latency: Round-trip cho mỗi step (~10-50ms RPC) + state persistence.
2.8.3 Detailed comparison
| Criterion | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose | Tight (services know the orchestrator) |
| Latency per step | 50-100ms (Kafka) | 10-50ms (RPC) |
| Total latency (4 steps) | 200-400ms | 80-200ms |
| Debugging | Hard (distributed trace required) | Easy (look at orchestrator state) |
| Add new service | Easy (subscribe to events) | Modify orchestrator |
| Testing | Hard (need full event chain) | Easy (mock services) |
| Compensation | Distributed across services | Centralized |
| Use case | Open ecosystem, many teams | Enterprise workflows, regulatory |
| Tool | Kafka + custom | Temporal, AWS Step Functions |
Recommendations:
- Choreography for ≤3 services and simple flows
- Orchestration for ≥4 services or complex compensation
- Hybrids work too: the orchestrator calls services, and services publish events downstream
2.9 Saga Compensation Patterns
2.9.1 Backward Recovery (rollback)
Compensate by undoing previous steps in reverse order.
saga = OrderSaga()
try:
    saga.create_order()        # Step 1
    saga.reserve_inventory()   # Step 2
    saga.charge_payment()      # Step 3 — FAIL
except PaymentFailed:
    saga.release_inventory()   # Compensate 2
    saga.cancel_order()        # Compensate 1
2.9.2 Forward Recovery (retry + alternative)
Some operations can’t be undone (e.g., email sent). Try alternative path.
try:
    saga.send_confirmation_email()
except EmailServiceDown:
    saga.send_sms_instead()  # Forward path
2.9.3 Pivot Transactions
Some saga steps are a “point of no return” — after them, only forward recovery is possible.
Step 1: Reserve items ← compensatable
Step 2: Charge payment ← compensatable (refund)
Step 3: Pivot — Ship ← NOT compensatable (already shipped)
Step 4: Send tracking ← retryable forward
After the pivot there is no rollback. The saga must be driven forward to completion.
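One way to encode the pivot is to mark which steps have a compensation: failures before the pivot trigger backward recovery, failures after it trigger forward retries. A sketch with stubbed steps (the retry count and all names are assumptions for illustration):

```python
trace = []

def run_saga(steps):
    """Each step is (name, fn, compensate). compensate=None marks the step as
    non-compensatable; from that pivot onward, failures are retried forward
    instead of rolled back."""
    done = []              # (name, compensate) of compensatable steps so far
    past_pivot = False
    for name, fn, compensate in steps:
        if compensate is None:
            past_pivot = True
        attempts = 3 if past_pivot else 1
        for i in range(attempts):
            try:
                fn()
                break
            except Exception:
                if i == attempts - 1:
                    if past_pivot:
                        return "stuck: manual forward recovery needed"
                    for _, undo in reversed(done):   # backward recovery
                        undo()
                    return "rolled back"
        if compensate is not None:
            done.append((name, compensate))
    return "completed"

flaky = {"calls": 0}
def track():
    # Fails once, then succeeds: exercises forward retries after the pivot
    flaky["calls"] += 1
    if flaky["calls"] < 2:
        raise RuntimeError("tracking service hiccup")
    trace.append("track")

result = run_saga([
    ("reserve", lambda: trace.append("reserve"), lambda: trace.append("release")),
    ("charge",  lambda: trace.append("charge"),  lambda: trace.append("refund")),
    ("ship",    lambda: trace.append("ship"),    None),   # pivot: cannot undo
    ("track",   track,                           None),
])
print(result, trace)  # completed ['reserve', 'charge', 'ship', 'track']
```

Note the asymmetry: the transient `track` failure is absorbed by retries, whereas the same failure before the pivot would have refunded and released everything.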
2.10 Inbox Pattern — Idempotency on Receiver
The counterpart of the Outbox: the Inbox guarantees idempotency on the consumer side.
CREATE TABLE inbox (
event_id TEXT PRIMARY KEY,
received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
processed BOOLEAN NOT NULL DEFAULT false,
payload JSONB
);
def consume(event):
    with db.transaction():
        try:
            db.execute(
                "INSERT INTO inbox (event_id, payload) VALUES (%s, %s)",
                event['id'], event['payload']
            )
        except UniqueViolation:
            return  # Already received
        # Process event
        process_business_logic(event)
        db.execute(
            "UPDATE inbox SET processed = true WHERE event_id = %s",
            event['id']
        )
2.11 Real-world implementations
| System | Approach |
|---|---|
| Stripe | Idempotency keys on API; outbox internally for webhooks |
| Shopify | Outbox + Kafka for inventory events |
| Uber | Cadence (now Temporal) for orchestration |
| Netflix | Conductor for orchestration; Choreography for parts |
| Amazon | Step Functions for many internal workflows |
| GitHub | Outbox for webhook delivery |
| Confluent | Standard recommendation: Outbox + Debezium CDC |
3. Estimation — Latency & Throughput
3.1 Outbox latency budget
Polling style (poll interval 1s):
- P50: 500ms (random in poll window)
- P99: 1100ms (poll + Kafka send + ack)
CDC style (Debezium):
- P50: 50ms
- P99: 200ms
Comparison:
| Component | Polling | CDC |
|---|---|---|
| Detect change | 0-1000ms | <10ms (WAL tail) |
| Read | 5ms (DB query) | 1ms (in-memory) |
| Publish to Kafka | 5ms | 5ms |
| Total P50 | ~510ms | ~16ms |
| Total P99 | ~1100ms | ~200ms |
3.2 Throughput Capacity
Polling:
- Single worker: ~5K events/s (limited by DB query overhead)
- Multiple workers + `FOR UPDATE SKIP LOCKED`: ~20-50K events/s
- Beyond that: DB IOPS becomes the bottleneck
CDC:
- Debezium single connector: ~10-30K events/s
- Multiple connectors (sharded): 100K+ events/s
- Limited by Kafka producer + DB WAL throughput
3.3 Saga latency
Example: 4-step saga
| Pattern | Step latency | Total (sequential) |
|---|---|---|
| Choreography (Kafka) | ~80ms (publish + consume + process) | ~320ms |
| Orchestration (RPC) | ~30ms (RPC + processing) | ~120ms |
| Choreography (parallel) | n/a (sequential by nature) | ~320ms |
| Orchestration (parallel steps 2-3) | step1 + max(step2, step3) + step4 | ~90ms |
Aha: Orchestration is faster but tighter coupled. Choreography is slower but more decoupled.
3.4 Outbox table size estimation
Scenario: 1000 orders/s, 5 events per order, 1KB per event, 7-day retention.
Events/day = 1000 × 5 × 86400 = 432M events/day
Storage/day = 432M × 1KB = 432 GB/day
Retention = 432 × 7 = 3.024 TB
→ Significant. You need:
- Partitioning (monthly)
- Compression (TOAST)
- Cleanup job
- Or: keep only unpublished rows and archive the published ones
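The arithmetic above, as a quick sanity-check script (the numbers are the scenario's):

```python
orders_per_s = 1_000
events_per_order = 5
event_kb = 1
retention_days = 7

events_per_day = orders_per_s * events_per_order * 86_400
gb_per_day = events_per_day * event_kb / 1_000_000   # KB → GB (decimal)
retention_tb = gb_per_day * retention_days / 1_000

print(f"{events_per_day / 1e6:.0f}M events/day")   # 432M events/day
print(f"{gb_per_day:.0f} GB/day")                  # 432 GB/day
print(f"{retention_tb:.3f} TB retained")           # 3.024 TB retained
```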
3.5 Saga state storage (Orchestration)
Temporal example:
- 1M concurrent sagas, ~10KB of state per saga → 10GB
- Add history: ~100 events/saga × 500 bytes = 50KB → 50GB total
- Cassandra cluster 5 nodes, ~10GB/node → fits comfortably
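The same sanity check for the saga-state numbers:

```python
sagas = 1_000_000
state_kb = 10          # live state per saga
history_events = 100   # history entries per saga
event_bytes = 500

state_gb = sagas * state_kb / 1_000_000                   # KB → GB
history_gb = sagas * history_events * event_bytes / 1e9   # bytes → GB

print(state_gb, history_gb)  # 10.0 50.0
```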
4. Security First — Outbox Vulnerabilities
4.1 Outbox Table = Sensitive Data Store
The outbox holds the full event payload, which may include:
- PII (customer info)
- Financial data (amount, account number)
- Auth tokens (don't do that!)
Mitigation:
-- Encrypt payload at rest
CREATE EXTENSION pgcrypto;
INSERT INTO outbox (..., payload_encrypted)
VALUES (..., pgp_sym_encrypt(payload::text, current_setting('app.encryption_key')));
-- Reader decrypts
SELECT pgp_sym_decrypt(payload_encrypted, ...)::jsonb FROM outbox;
Or better: store a reference instead of the full data:
// BAD
{"event": "PaymentCompleted", "card_number": "4111111111111111"}
// GOOD
{"event": "PaymentCompleted", "payment_id": "pay-123"}
// Consumer fetches full data via authenticated API
4.2 Event Tampering — Signing
If consumers trust event payloads blindly, an attacker can inject malicious events.
Mitigation: HMAC signing
import hmac, hashlib, json

def sign_event(payload, secret_key):
    payload_bytes = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(secret_key, payload_bytes, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}

def verify_event(event, secret_key):
    payload_bytes = json.dumps(event['payload'], sort_keys=True).encode()
    expected = hmac.new(secret_key, payload_bytes, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, event['signature']):
        raise SecurityError("Invalid signature")
4.3 Saga Compensation = Attack Surface
Compensation actions usually run with higher privilege (e.g., refund, cancel order). An attacker who can trigger a fake “failure” can make the compensation fire: a free refund.
Mitigation:
- Authenticate inter-service calls (mTLS)
- Validate compensation conditions (don’t refund if amount=0)
- Audit log every compensation
- Rate limit compensations per user/account
4.4 Replay Attack
An attacker captures an event and replays it many times, triggering duplicate side effects.
Mitigation:
- Mandatory idempotency keys on the consumer
- Event TTL: reject events older than 24h
- Monotonic sequence number check
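The TTL and monotonic-sequence mitigations can be combined into a small pre-processing guard on the consumer. A sketch (the 24h threshold comes from above; the event field names are assumptions):

```python
import time

MAX_AGE_S = 24 * 3600
last_seq = {}  # aggregate_id -> highest sequence number seen

def accept(event, now=None):
    """Reject stale or replayed events before any side effect runs."""
    now = now if now is not None else time.time()
    if now - event["ts"] > MAX_AGE_S:
        return False  # too old: likely a replay
    prev = last_seq.get(event["aggregate_id"], -1)
    if event["seq"] <= prev:
        return False  # sequence not monotonic: duplicate or replay
    last_seq[event["aggregate_id"]] = event["seq"]
    return True

now = 1_000_000.0
e1 = {"aggregate_id": "ord-1", "seq": 1, "ts": now - 10}
print(accept(e1, now))   # True  — fresh, first time seen
print(accept(e1, now))   # False — replayed
e_old = {"aggregate_id": "ord-2", "seq": 1, "ts": now - 2 * 86_400}
print(accept(e_old, now))  # False — older than 24h
```

In production the `last_seq` map would live in the consumer's database (alongside the inbox table), not in process memory.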
4.5 Audit Trail
The outbox table is a natural audit trail. But you also need:
- Append-only (REVOKE UPDATE/DELETE on outbox FROM application_user)
- Separate read-only role for auditors
- Forward to immutable log (S3 with Object Lock, blockchain)
5. DevOps — Operating the Outbox & Sagas
5.1 Docker Compose: Postgres + Debezium + Kafka
# docker-compose.yml
version: "3.8"
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: ordersdb
    command:
      - "postgres"
      - "-c"
      - "wal_level=logical"
      - "-c"
      - "max_replication_slots=10"
      - "-c"
      - "max_wal_senders=10"
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    ports:
      - "9093:9093"
  debezium:
    image: debezium/connect:2.5
    depends_on: [kafka, postgres]
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
    ports:
      - "8083:8083"
volumes:
  pgdata:
Setup steps:
# 1. Start
docker compose up -d
# 2. Create outbox table
docker exec -it postgres psql -U app -d ordersdb -c "
CREATE TABLE outbox (
id BIGSERIAL PRIMARY KEY,
aggregate_type TEXT NOT NULL,
aggregate_id TEXT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
"
# 3. Register Debezium connector
curl -X POST http://localhost:8083/connectors -H 'Content-Type: application/json' -d '{
"name": "outbox-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres",
"database.user": "app",
"database.password": "secret",
"database.dbname": "ordersdb",
"topic.prefix": "ordersdb",
"table.include.list": "public.outbox",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.route.by.field": "aggregate_type",
"transforms.outbox.route.topic.replacement": "events.${routedByValue}",
"plugin.name": "pgoutput"
}
}'
# 4. Test: insert and verify
docker exec -it postgres psql -U app -d ordersdb -c "
INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload)
VALUES ('Order', 'ord-1', 'OrderCreated', '{\"id\":\"ord-1\"}'::jsonb);
"
# Check Kafka
docker exec -it kafka kafka-console-consumer \
--bootstrap-server kafka:9092 \
--topic events.Order \
--from-beginning
5.2 Prometheus Metrics & Alerts
groups:
  - name: outbox_alerts
    rules:
      # Outbox lag growing → relay can't keep up
      - alert: OutboxLagHigh
        expr: outbox_unpublished_count > 10000
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Outbox has {{ $value }} unpublished events"
          description: "Relay lagging — check Debezium/worker health"
      # Old unpublished events
      - alert: OutboxOldestUnpublishedHigh
        expr: outbox_oldest_unpublished_seconds > 300
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "Oldest unpublished event is {{ $value }}s old"
      # Debezium connector failed
      - alert: DebeziumConnectorDown
        expr: kafka_connect_connector_status{state!="RUNNING"} > 0
        for: 1m
        labels: { severity: critical }
      # WAL replication slot lag
      - alert: PostgresReplicationSlotLag
        expr: pg_replication_slot_lag_bytes > 1073741824  # 1 GB
        for: 10m
        labels: { severity: warning }
        annotations:
          description: "Slot {{ $labels.slot_name }} has {{ $value | humanize1024 }}B lag"
      # Saga compensation rate spike
      - alert: SagaCompensationRateHigh
        expr: rate(saga_compensation_total[5m]) > 10
        for: 5m
        labels: { severity: warning }
        annotations:
          description: "{{ $value }}/s sagas compensating — investigate downstream failures"
5.3 Grafana Dashboard
| Panel | Query | Purpose |
|---|---|---|
| Outbox lag | outbox_unpublished_count | Detect a lagging relay |
| Events published/s | rate(outbox_events_published_total[5m]) | Throughput |
| P99 publish latency | histogram_quantile(0.99, rate(outbox_publish_duration_bucket[5m])) | SLO |
| Debezium connector status | kafka_connect_connector_status | Health |
| WAL replication lag | pg_replication_slot_lag_bytes | DB health |
| Saga in-flight | saga_in_flight_count | Capacity |
| Saga success rate | rate(saga_completed_total[5m]) / rate(saga_started_total[5m]) | Quality |
5.4 Disaster Scenarios
Scenario A: Debezium connector down
Symptom: Outbox events accumulate, no Kafka events. Recovery:
- Check Connect cluster health
- Restart the connector: `curl -X POST http://localhost:8083/connectors/outbox-connector/restart`
- If the WAL slot is bloated → consider dropping and recreating the slot (unpublished changes in the slot are lost, but they are retried from the outbox table)
Scenario B: Kafka cluster down
Symptom: Debezium retries indefinitely. Recovery:
- Outbox table accumulates → check disk space
- Once Kafka recovers, Debezium auto-resumes from last offset
Scenario C: Outbox table full disk
Symptom: INSERT INTO outbox fails → application errors. Recovery:
- Run cleanup: `DELETE FROM outbox WHERE published_at < NOW() - INTERVAL '1 day'`
- If there are no published events → check the relay first
Scenario D: Saga orchestrator crash
Symptom: In-flight sagas frozen. Recovery:
- With Temporal: workers reconnect, sagas resume from last state
- Without Temporal: depends on state storage. If using DB, manual recovery.
5.5 Testing Saga End-to-End
import time

import pytest
import requests
from testcontainers.compose import DockerCompose

@pytest.fixture(scope="module")
def saga_env():
    with DockerCompose(".", compose_file_name="docker-compose.test.yml") as env:
        env.wait_for("http://localhost:8083/connectors")
        yield env

def test_order_saga_happy_path(saga_env):
    response = requests.post("http://order-service/orders", json={...})
    order_id = response.json()['id']
    # Wait for saga completion
    deadline = time.time() + 30
    while time.time() < deadline:
        order = requests.get(f"http://order-service/orders/{order_id}").json()
        if order['status'] == 'completed':
            return
        time.sleep(0.5)
    pytest.fail("Saga did not complete in 30s")

def test_order_saga_payment_failure_compensates(saga_env):
    # Simulate payment failure
    requests.post("http://payment-service/_test/fail-next", json={"count": 1})
    response = requests.post("http://order-service/orders", json={...})
    order_id = response.json()['id']
    time.sleep(5)
    # Verify compensation
    order = requests.get(f"http://order-service/orders/{order_id}").json()
    assert order['status'] == 'cancelled'
    inventory = requests.get(f"http://inventory-service/items/{...}").json()
    assert inventory['reserved'] == 0  # Released
6. Code Implementation
6.1 Python Outbox Pattern
"""
Outbox Pattern — Python implementation with Postgres + Kafka
"""
import asyncio
import json
import logging
from contextlib import asynccontextmanager
from datetime import datetime
from uuid import uuid4
import asyncpg
from aiokafka import AIOKafkaProducer
log = logging.getLogger(__name__)
class OutboxPublisher:
"""Single transaction boundary: business write + outbox event."""
def __init__(self, pool: asyncpg.Pool):
self.pool = pool
@asynccontextmanager
async def transaction(self):
async with self.pool.acquire() as conn:
async with conn.transaction():
yield OutboxTransaction(conn)
class OutboxTransaction:
def __init__(self, conn):
self.conn = conn
async def execute(self, query, *args):
return await self.conn.execute(query, *args)
async def fetch(self, query, *args):
return await self.conn.fetch(query, *args)
async def emit(self, aggregate_type: str, aggregate_id: str,
event_type: str, payload: dict, trace_id: str = None):
"""Emit an event to outbox. Will be published after commit."""
event_id = str(uuid4())
await self.conn.execute("""
INSERT INTO outbox
(id, aggregate_type, aggregate_id, event_type, payload, trace_id)
VALUES ($1, $2, $3, $4, $5, $6)
""", event_id, aggregate_type, aggregate_id, event_type,
json.dumps(payload), trace_id)
return event_id
# === Business Logic ===
class OrderService:
def __init__(self, publisher: OutboxPublisher):
self.publisher = publisher
async def create_order(self, customer_id: str, items: list, total: int):
order_id = str(uuid4())
async with self.publisher.transaction() as txn:
# Business write
await txn.execute("""
INSERT INTO orders (id, customer_id, total, status)
VALUES ($1, $2, $3, 'pending')
""", order_id, customer_id, total)
await txn.execute("""
INSERT INTO order_items (order_id, product_id, quantity)
SELECT $1, product_id, quantity FROM jsonb_to_recordset($2::jsonb)
AS x(product_id TEXT, quantity INT)
""", order_id, json.dumps(items))
# Outbox event in same transaction
await txn.emit(
aggregate_type='Order',
aggregate_id=order_id,
event_type='OrderCreated',
payload={
'id': order_id,
'customer_id': customer_id,
'items': items,
'total': total,
'created_at': datetime.utcnow().isoformat(),
}
)
return order_id
# === Polling Relay ===
class OutboxRelay:
"""Polls outbox table and publishes to Kafka."""
def __init__(self, pool: asyncpg.Pool, kafka_servers: str,
batch_size: int = 100, poll_interval: float = 1.0):
self.pool = pool
self.kafka_servers = kafka_servers
self.batch_size = batch_size
self.poll_interval = poll_interval
self.producer: AIOKafkaProducer = None
async def start(self):
self.producer = AIOKafkaProducer(
bootstrap_servers=self.kafka_servers,
enable_idempotence=True,
acks='all',
)
await self.producer.start()
try:
await self._loop()
finally:
await self.producer.stop()
async def _loop(self):
while True:
try:
published = await self._publish_batch()
if published == 0:
await asyncio.sleep(self.poll_interval)
except Exception:
log.exception("Relay error")
await asyncio.sleep(self.poll_interval)
async def _publish_batch(self) -> int:
async with self.pool.acquire() as conn:
async with conn.transaction():
# SKIP LOCKED: multiple workers can run concurrently
rows = await conn.fetch("""
SELECT id, aggregate_type, aggregate_id, event_type, payload, trace_id
FROM outbox
WHERE published_at IS NULL
ORDER BY id
LIMIT $1
FOR UPDATE SKIP LOCKED
""", self.batch_size)
if not rows:
return 0
published_ids = []
for row in rows:
topic = f"events.{row['aggregate_type']}"
headers = [
('event_type', row['event_type'].encode()),
('trace_id', (row['trace_id'] or '').encode()),
]
try:
await self.producer.send_and_wait(
topic=topic,
key=row['aggregate_id'].encode(),
value=row['payload'].encode() if isinstance(row['payload'], str)
else json.dumps(row['payload']).encode(),
headers=headers,
)
published_ids.append(row['id'])
except Exception:
log.exception(f"Failed to publish event {row['id']}")
break
if published_ids:
await conn.execute("""
UPDATE outbox SET published_at = NOW()
WHERE id = ANY($1::bigint[])
""", published_ids)
return len(published_ids)
# === Idempotent Consumer ===
class IdempotentConsumer:
"""Consumer with inbox pattern."""
def __init__(self, pool: asyncpg.Pool):
self.pool = pool
async def handle(self, event: dict, processor):
event_id = event['id']
async with self.pool.acquire() as conn:
async with conn.transaction():
# Try to mark as received
try:
await conn.execute("""
INSERT INTO inbox (event_id) VALUES ($1)
""", event_id)
except asyncpg.UniqueViolationError:
log.info(f"Event {event_id} already processed")
return
# Process
await processor(conn, event)
# === Main ===
async def main():
pool = await asyncpg.create_pool(
"postgresql://app:secret@localhost/ordersdb",
min_size=2, max_size=10,
)
publisher = OutboxPublisher(pool)
order_service = OrderService(publisher)
# Create order
order_id = await order_service.create_order(
customer_id='cust-1',
items=[{'product_id': 'p1', 'quantity': 2}],
total=500000,
)
print(f"Created order {order_id}")
# Start relay (in production: separate process)
relay = OutboxRelay(pool, "localhost:9093")
await relay.start()
if __name__ == "__main__":
asyncio.run(main())
6.2 Saga Orchestration with Temporal (skeleton)
"""
Saga Orchestration with Temporal
Reference: https://temporal.io/
"""
from datetime import timedelta
from temporalio import workflow, activity
from temporalio.client import Client
from temporalio.worker import Worker
# === Activities (idempotent steps) ===
@activity.defn
async def create_order(customer_id: str, items: list, total: int) -> str:
# Call OrderService API
order_id = await order_api.create(customer_id, items, total)
return order_id
@activity.defn
async def cancel_order(order_id: str):
await order_api.cancel(order_id)
@activity.defn
async def reserve_inventory(order_id: str, items: list) -> str:
reservation_id = await inventory_api.reserve(order_id, items)
return reservation_id
@activity.defn
async def release_inventory(reservation_id: str):
await inventory_api.release(reservation_id)
@activity.defn
async def charge_payment(order_id: str, amount: int) -> str:
payment_id = await payment_api.charge(order_id, amount)
return payment_id
@activity.defn
async def refund_payment(payment_id: str):
await payment_api.refund(payment_id)
@activity.defn
async def schedule_shipping(order_id: str):
await shipping_api.schedule(order_id)
# === Workflow (Orchestrator) ===
@workflow.defn
class PlaceOrderSaga:
@workflow.run
async def run(self, customer_id: str, items: list, total: int):
order_id = None
reservation_id = None
payment_id = None
try:
# Step 1: Create order
order_id = await workflow.execute_activity(
create_order, args=[customer_id, items, total],
start_to_close_timeout=timedelta(seconds=30),
                retry_policy=RetryPolicy(maximum_attempts=3),
)
# Step 2: Reserve inventory
reservation_id = await workflow.execute_activity(
reserve_inventory, args=[order_id, items],
start_to_close_timeout=timedelta(seconds=30),
                retry_policy=RetryPolicy(maximum_attempts=5),
)
# Step 3: Charge payment
payment_id = await workflow.execute_activity(
charge_payment, args=[order_id, total],
start_to_close_timeout=timedelta(seconds=30),
                retry_policy=RetryPolicy(maximum_attempts=3),
)
# Step 4 (pivot): Schedule shipping — no compensation after this
await workflow.execute_activity(
schedule_shipping, args=[order_id],
start_to_close_timeout=timedelta(seconds=60),
                retry_policy=RetryPolicy(maximum_attempts=10),
)
return {"order_id": order_id, "status": "completed"}
except Exception as e:
# Compensate in reverse order
workflow.logger.error(f"Saga failed: {e}, compensating")
if payment_id:
await workflow.execute_activity(
refund_payment, args=[payment_id],
start_to_close_timeout=timedelta(seconds=30),
)
if reservation_id:
await workflow.execute_activity(
release_inventory, args=[reservation_id],
start_to_close_timeout=timedelta(seconds=30),
)
if order_id:
await workflow.execute_activity(
cancel_order, args=[order_id],
start_to_close_timeout=timedelta(seconds=30),
)
return {"order_id": order_id, "status": "cancelled", "reason": str(e)}
# === Worker setup ===
async def run_worker():
client = await Client.connect("localhost:7233")
worker = Worker(
client,
task_queue="order-saga",
workflows=[PlaceOrderSaga],
activities=[create_order, cancel_order, reserve_inventory,
release_inventory, charge_payment, refund_payment,
schedule_shipping],
)
await worker.run()
# === Trigger saga ===
async def start_saga(customer_id, items, total):
client = await Client.connect("localhost:7233")
handle = await client.start_workflow(
PlaceOrderSaga.run,
args=[customer_id, items, total],
id=f"order-saga-{customer_id}-{int(time.time())}",
task_queue="order-saga",
)
    return await handle.result()

7. System Design Diagrams
7.1 Outbox Pattern — Big Picture
flowchart TB
    Client[Client] -->|POST /orders| App[Application]
    subgraph Atomic["Single DB Transaction"]
        OrderTbl[(orders table)]
        OutboxTbl[(outbox table)]
    end
    App --> OrderTbl
    App --> OutboxTbl
    subgraph Relay["Outbox Relay"]
        Poller[Polling Worker<br/>or Debezium CDC]
    end
    OutboxTbl -.read.-> Poller
    Poller -->|publish| Kafka[(Kafka)]
    Kafka --> Inv[Inventory Service]
    Kafka --> Notif[Notification Service]
    Kafka --> Analytics[Analytics Service]
    style Atomic fill:#e1f5fe,stroke:#01579b
    style Relay fill:#fff9c4,stroke:#f57f17
7.2 Outbox vs Naive Dual-Write
flowchart LR
    subgraph Bad["NAIVE (broken)"]
        direction TB
        B1[BEGIN tx] --> B2[INSERT order]
        B2 --> B3[COMMIT]
        B3 --> B4[kafka.send<br/>⚡ may fail ⚡]
        B4 --> B5[Inconsistent!]
    end
    subgraph Good["OUTBOX (correct)"]
        direction TB
        G1[BEGIN tx] --> G2[INSERT order]
        G2 --> G3[INSERT outbox]
        G3 --> G4[COMMIT atomic]
        G4 --> G5[Relay reads outbox<br/>publishes to Kafka<br/>retries until success]
        G5 --> G6[Consistent ✓]
    end
    style Bad fill:#ffcdd2,color:#000
    style Good fill:#c8e6c9,color:#000
7.3 CDC with Debezium
sequenceDiagram
    participant App
    participant PG as PostgreSQL
    participant WAL as PG WAL
    participant Deb as Debezium
    participant K as Kafka
    participant C as Consumer
    App->>PG: BEGIN
    App->>PG: INSERT orders
    App->>PG: INSERT outbox
    App->>PG: COMMIT
    PG->>WAL: write WAL records
    Deb->>WAL: tail (logical replication)
    WAL-->>Deb: change event
    Deb->>Deb: filter outbox table<br/>apply EventRouter SMT
    Deb->>K: publish to events.Order
    K-->>C: deliver event
    C->>C: process (idempotent)
7.4 Saga Choreography
flowchart LR
    O[Order Service] -->|OrderCreated| K1[(Kafka: events.Order)]
    K1 --> I[Inventory Service]
    K1 --> P[Payment Service]
    I -->|ItemsReserved| K2[(Kafka: events.Inventory)]
    P -->|PaymentCharged| K3[(Kafka: events.Payment)]
    K2 --> S[Shipping Service]
    K3 --> S
    S -->|OrderShipped| K4[(Kafka: events.Shipping)]
    K4 --> O
    style O fill:#bbdefb
    style I fill:#c8e6c9
    style P fill:#fff9c4
    style S fill:#ffe0b2
7.5 Saga Orchestration
sequenceDiagram
    participant C as Client
    participant Orch as Saga Orchestrator
    participant O as Order Service
    participant I as Inventory Service
    participant P as Payment Service
    participant S as Shipping Service
    C->>Orch: PlaceOrder(customer, items, total)
    Orch->>O: createOrder()
    O-->>Orch: order_id
    Orch->>I: reserveItems(order_id, items)
    I-->>Orch: reservation_id
    Orch->>P: chargePayment(order_id, total)
    alt payment success
        P-->>Orch: payment_id
        Orch->>S: scheduleShipping(order_id)
        S-->>Orch: tracking_id
        Orch-->>C: completed
    else payment fail
        P-->>Orch: error
        Note over Orch: Compensate in reverse
        Orch->>I: releaseItems(reservation_id)
        Orch->>O: cancelOrder(order_id)
        Orch-->>C: cancelled
    end
7.6 Inbox Pattern (Consumer Idempotency)
flowchart TB
    K[(Kafka)] --> C[Consumer]
    subgraph TX["Single DB Transaction"]
        Inbox[(inbox table)]
        Business[(business tables)]
    end
    C -->|"INSERT inbox<br/>(event_id)"| Inbox
    Inbox -->|"if duplicate<br/>UNIQUE error"| Skip[Skip — already processed]
    Inbox -->|"if new"| Business
    Business --> Commit[COMMIT]
    style TX fill:#e1f5fe
    style Skip fill:#ffe0b2
    style Commit fill:#c8e6c9
8. Aha Moments & Pitfalls
Aha Moments
#1: The dual-write problem cannot be patched over without a pattern. 2PC between Postgres and Kafka is not viable in production. Outbox turns 2 writes into 1 atomic write + 1 async publish.
#2: Outbox is not hard to implement: it is just one extra INSERT in the same transaction. The hard part is operationalizing it: relay HA, lag monitoring, cleanup, idempotent consumers.
#3: Debezium = Outbox in CDC mode. You do not need to hand-roll a polling worker: Debezium reads the WAL in real time and routes events by the aggregate_type field.
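Aha #3 in config form: a hedged sketch of the Kafka Connect registration for Debezium's outbox mode. The connector class and SMT option names follow the Debezium Outbox Event Router docs linked in section 9; the connector name, hostnames, and table list are placeholders for your setup.

```python
# Hypothetical connector registration payload (POST to Kafka Connect's REST API).
# route.by.field points at the column this chapter's outbox schema calls
# aggregate_type; Debezium's own examples default to "aggregatetype".
connector_config = {
    "name": "orders-outbox-connector",  # placeholder name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "localhost",          # placeholder
        "database.dbname": "ordersdb",             # placeholder
        "table.include.list": "public.outbox",     # capture only the outbox table
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        "transforms.outbox.route.by.field": "aggregate_type",
    },
}
```

Registering it is one HTTP call (e.g. `requests.post("http://connect:8083/connectors", json=connector_config)`); from then on every committed outbox row becomes a Kafka event with no polling worker to run.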
#4: An idempotent consumer is a MUST. Outbox gives at-least-once delivery, so consumers will receive duplicates and must handle them. The common pattern: an inbox table with a UNIQUE event_id.
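The inbox trick in Aha #4 fits in a few lines. A self-contained demo using sqlite3 in place of Postgres so it runs anywhere; the real system would use asyncpg with a PRIMARY KEY/UNIQUE on event_id, as in the IdempotentConsumer class above.

```python
# Dedup via a PRIMARY KEY on event_id: the second insert of the same id raises,
# and we treat that as "already processed".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (event_id TEXT PRIMARY KEY)")

def handle_once(event_id: str) -> bool:
    """Return True if the event was processed, False if it was a duplicate."""
    try:
        with conn:  # one transaction: inbox insert + business writes commit together
            conn.execute("INSERT INTO inbox (event_id) VALUES (?)", (event_id,))
            # ... business writes would go here, in the same transaction ...
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: skip

assert handle_once("evt-1") is True
assert handle_once("evt-1") is False  # redelivery is a no-op
```

The key property: the inbox insert and the business writes share one transaction, so either both happen or neither does.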
#5: Saga ≠ Distributed Transaction. A saga does not provide ACID. Intermediate states can be inconsistent (e.g., payment already charged while the order is not yet completed). The UI must be designed with this in mind.
#6: Choreography vs orchestration is not a binary choice. A hybrid works well: an orchestrator for the main flow, choreography for side effects (notifications, analytics).
#7: Compensations must be idempotent. Refunding twice loses money; cancelling an order twice is harmless. Design every compensation operation to be idempotent (UPDATE ... WHERE status = ...).
#8: The pivot transaction is the point of no return. Once shipped, there is no undo. The pivot must be identified explicitly in the saga design.
Pitfalls — Common Mistakes
Pitfall 1: Sending the Kafka message, then committing the DB
# BAD
kafka.send(...) # Sent before commit
db.commit() # Commit may fail
# GOOD: outbox
db.execute("INSERT INTO outbox ...")
db.commit()
# Relay publishes

Pitfall 2: Outbox table without an index
The relay queries WHERE published_at IS NULL; without a partial index this is a full table scan → slow relay → growing backlog.
-- Fix
CREATE INDEX idx_outbox_unpublished ON outbox (created_at) WHERE published_at IS NULL;

Pitfall 3: Hoping for 2PC with Kafka
Kafka does not support XA. Do not look for a 2PC workaround. Use the outbox.
Pitfall 4: Forgetting cleanup
The outbox grows without bound → 100M rows → slow queries → the relay falls behind.
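A scheduled DELETE is the fix; in production it is usually batched so one statement does not remove millions of rows while holding locks. A hedged sketch, assuming an asyncpg-style connection whose execute() returns a command tag like "DELETE 10000" (the stub below stands in for a real connection):

```python
import asyncio

# Delete in bounded batches; Postgres accepts a parameter for LIMIT.
CLEANUP_SQL = """
DELETE FROM outbox
WHERE id IN (
    SELECT id FROM outbox
    WHERE published_at < NOW() - INTERVAL '7 days'
    LIMIT $1
)
"""

async def cleanup_outbox(conn, batch_size: int = 10_000) -> int:
    """Repeat until a short batch signals there is nothing left to delete."""
    deleted = 0
    while True:
        tag = await conn.execute(CLEANUP_SQL, batch_size)
        n = int(tag.split()[-1])  # asyncpg command tag, e.g. "DELETE 10000"
        deleted += n
        if n < batch_size:
            return deleted

# Stub connection, just to demonstrate the loop without a database.
class _StubConn:
    def __init__(self, batches): self._batches = list(batches)
    async def execute(self, sql, *args): return f"DELETE {self._batches.pop(0)}"

total = asyncio.run(cleanup_outbox(_StubConn([10_000, 10_000, 37]), batch_size=10_000))
```

Run it from cron or pg_cron; each batch is its own short transaction, so the relay and writers are never blocked for long.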
-- Fix: daily cleanup job
DELETE FROM outbox WHERE published_at < NOW() - INTERVAL '7 days';

Pitfall 5: Non-idempotent consumer
A network glitch → the relay republishes → duplicates. The consumer processes the event twice → the customer is double-charged.
# Fix: inbox pattern
try:
db.execute("INSERT INTO inbox (event_id) VALUES (%s)", event['id'])
except UniqueViolation:
return # Skip
process(event)

Pitfall 6: Non-idempotent saga compensation
# BAD
def refund(payment_id):
payment = get(payment_id)
refund_amount(payment.amount) # Called twice → 2x refund!
# GOOD
def refund(payment_id):
    cur = db.execute("""
        UPDATE payments SET status='refunded', refunded_at=NOW()
        WHERE id = %s AND status = 'charged'
        RETURNING amount
    """, (payment_id,))
    row = cur.fetchone()
    if row is None:
        return  # Already refunded, skip
    refund_amount(row[0])

Pitfall 7: Using choreography for complex flows

6 services, each publishing events and listening to 2-3 others → event spaghetti. Nobody can tell what state an order is in.
Fix: switch to orchestration with Temporal/Step Functions for complex flows. Choreography fits ≤ 3 services or pure side effects.
Pitfall 8: Outbox publishing loses ordering
Default polling/Debezium gives no per-aggregate ordering guarantee if you publish in parallel. An Inventory event arriving before its Order event → confused consumers.
Fix: use aggregate_id as the Kafka partition key → all events of the same aggregate go to the same partition → ordered.
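The fix works because a keyed message maps deterministically to one partition. A small illustration of that property (stand-in hash here; the real Java client uses murmur2, but the guarantee is the same):

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministic key -> partition mapping, like Kafka's default partitioner."""
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % num_partitions

# Every event of aggregate order-123, keyed by its aggregate_id, lands on
# the same partition, so Kafka preserves their relative order.
events = ["OrderCreated", "OrderPaid", "OrderShipped"]
partitions = {partition_for(b"order-123", 12) for _ in events}
assert len(partitions) == 1
```

This is exactly why the relay code earlier in this chapter sets `key=row['aggregate_id'].encode()` when producing.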
Pitfall 9: WAL slot bloat
If the Debezium connector is down for long → the replication slot lags → Postgres retains WAL files → the disk fills up → DB crash.
Fix: monitor pg_replication_slot_lag_bytes. Alert when it exceeds 1GB. You can drop the slot if losing events is acceptable (they can be resynced from the outbox).
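The query behind such an alert can be built from Postgres's own catalog. pg_replication_slots and pg_wal_lsn_diff are standard Postgres; the threshold and the wiring around them are illustrative.

```python
# Lag per logical slot, in bytes; run with any Postgres client (e.g. asyncpg).
SLOT_LAG_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""

ALERT_THRESHOLD = 1 * 1024 ** 3  # 1 GiB, matching the alert level above

def slots_over_threshold(rows, threshold=ALERT_THRESHOLD):
    """rows: (slot_name, lag_bytes) pairs fetched with SLOT_LAG_SQL."""
    return [name for name, lag in rows if lag is not None and lag > threshold]
```

In practice you would export lag_bytes as a gauge (postgres_exporter does something similar) and alert from your monitoring stack rather than polling ad hoc.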
Pitfall 10: Outbox event payload changes
Schema evolution: add a field to an event → old consumers do not understand it → breakage.
Fix: use a schema registry (Confluent Schema Registry, Apicurio). Keep schemas backward and forward compatible. Use Avro or Protobuf.
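Why "only add fields that carry a default" is the safe evolution: shown here with plain dicts instead of Avro, purely as an illustration of the rule a schema registry enforces.

```python
# Hypothetical v2 of an order event adds a "currency" field WITH a default.
# Old consumers simply ignore the extra field; new consumers reading old (v1)
# records fill the gap from the default — that is backward compatibility.
V2_DEFAULTS = {"currency": "VND"}

def read_with_v2(record: dict) -> dict:
    """New-schema consumer reading any record: defaults fill missing fields."""
    return {**V2_DEFAULTS, **record}

old_record = {"order_id": "o-1", "total": 500000}                 # written with v1
new_record = {"order_id": "o-2", "total": 1, "currency": "USD"}   # written with v2
```

Adding a field without a default, or removing/renaming one, breaks this contract; a registry in BACKWARD or FULL compatibility mode rejects such a schema at publish time instead of at 3 a.m. in production.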
9. Internal Links — Connecting the Knowledge
Outbox & Saga across the weeks
| Week | Connection |
|---|---|
| Tuan-08-Message-Queue | Kafka transactional producer = an alternative to the outbox; ISR replication |
| Tuan-11-Microservices-Pattern | Saga is a core microservice pattern; Outbox enables event-driven |
| Tuan-Bonus-Consensus-Raft-Paxos | Temporal uses Cassandra/Postgres + leader election (not Raft) |
| Tuan-Bonus-Consistency-Models-Isolation | Outbox = atomic single-DB transaction; consumers need idempotency |
| Case-Design-Payment-System | Payments use outbox + saga for the refund flow |
| Case-Design-Digital-Wallet | Wallet updates + event sourcing (similar to outbox) |
| Case-Design-Hotel-Reservation-System | Booking saga: reserve room → charge → confirm |
| Case-Design-Distributed-Message-Queue | Kafka transactional API |
| Tuan-13-Monitoring-Observability | Monitor outbox lag, in-flight sagas, compensation rate |
Required reading
Books:
- Chris Richardson, Microservices Patterns — Ch.4 (Saga), Ch.6 (Domain Events) — https://microservices.io/book
- Martin Kleppmann, DDIA — Ch.11 (Stream Processing) — https://dataintensive.net/
Patterns:
- Microservices.io — Outbox: https://microservices.io/patterns/data/transactional-outbox.html
- Microservices.io — Saga: https://microservices.io/patterns/data/saga.html
- Microservices.io — Inbox: https://microservices.io/patterns/data/idempotent-consumer.html
Engineering blogs:
- Confluent, Outbox Pattern for Microservices — https://www.confluent.io/blog/microservices-outbox-pattern-for-data-consistency/
- Debezium, Outbox Event Router — https://debezium.io/documentation/reference/stable/transformations/outbox-event-router.html
- Stripe, Idempotency keys — https://stripe.com/blog/idempotency
- Stripe, Designing for failure — https://stripe.com/blog/designing-for-failure
- Uber Engineering, Cadence/Temporal origin — https://www.uber.com/blog/cadence-microservice-architecture/
Papers:
- Garcia-Molina & Salem, Sagas (1987) — http://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf
- Hellerstein, Looking Inside the Box: Programming with Outboxes (2020) — recent academic treatment
Tools:
- Debezium — https://debezium.io/
- Temporal — https://temporal.io/
- AWS Step Functions — https://aws.amazon.com/step-functions/
- Netflix Conductor — https://conductor.netflix.com/
This completes the Batch A bonus chapters: Tuan-Bonus-Consensus-Raft-Paxos · Tuan-Bonus-Consistency-Models-Isolation · Tuan-Bonus-Outbox-Pattern
Back to: Tuan-11-Microservices-Pattern to apply these patterns to your microservice design.