Tuần 02 — Schema Design & Normalization

“Schema xấu không kill bạn ngay. Nó kill bạn 2 năm sau khi data đã 500GB, business logic đã chằng chịt, và mọi migration đều rủi ro. Schema tốt rẻ hơn 100x nếu thiết kế đúng từ đầu.”

Tags: database schema-design normalization postgresql data-modeling Thời lượng: 7 ngày (4-6h/ngày) Prerequisites: Tuan-01-DB-Internals-Refresh — hiểu tuple, page, TOAST giúp đánh giá schema decisions Liên quan: Tuan-03-Indexing-Mastery · Tuan-08-Zero-Downtime-Migration · Tuan-16-ORM-CQRS-Multi-Tenancy

1. Context & Why

1.1 Schema là quyết định “khó hoàn tác nhất”

Code refactor = 1 tuần. Schema migration trên bảng 100M rows = vài tuần planning + tuần thực hiện + rủi ro production. Vì vậy schema design là quyết định có blast radius lớn nhất trong vòng đời một sản phẩm.

graph LR
    A[Code change<br/>1 hour] --> B[Deploy<br/>15 min]
    C[API change<br/>1 day] --> D[Version + deprecate<br/>weeks]
    E[Schema change<br/>1 week design] --> F[Migration on prod<br/>weeks-months]
    G[Data model wrong<br/>1 quarter to realize] --> H[Painful rewrite<br/>6+ months]

    style A fill:#c8e6c9
    style C fill:#fff9c4
    style E fill:#ffccbc
    style G fill:#ef9a9a

1.2 Tuần này dạy gì

Normalization 1NF → BCNF thực dụng (không hàn lâm)
Khi nào denormalize — quan trọng hơn normalization
Key strategy: surrogate vs natural vs composite, UUID v7, ULID, Snowflake (2024-2026 patterns)
Data types Postgres — chọn đúng tiết kiệm 30-50% storage
Constraints: CHECK, EXCLUDE, foreign key — pháo đài chống bad data
Soft delete, audit log, temporal tables — pattern chuẩn 2024
Anti-patterns: EAV, polymorphic FK, JSON-everywhere
Multi-tenancy schema design (preview, đào sâu Tuần 16)

1.3 Tham chiếu chính

Database Design for Mere Mortals — Hernandez (5th ed 2024)
SQL Antipatterns: Avoiding the Pitfalls of Database Programming — Bill Karwin (2nd ed 2022)
PostgreSQL Data Types — https://www.postgresql.org/docs/current/datatype.html
The Art of PostgreSQL — Dimitri Fontaine
DDIA (Kleppmann) Ch.2 (Data Models)
Postgres Weekly — https://postgresweekly.com/ (cập nhật pattern mới)
Patterns for Managing Source Code Branches — Martin Fowler (tangential, cho versioning thinking)

2. Normalization — Lý thuyết tối thiểu, thực dụng tối đa

2.1 Tại sao normalize?

Trước normalize:

order_id	customer_name	customer_email	customer_phone	product_name	product_price	qty
1	Alice	[email protected]	0900	iPhone	999	1
2	Alice	[email protected]	0900	iPad	599	2
3	Bob	[email protected]	0901	iPhone	999	1

Vấn đề:

Update anomaly: Alice đổi email → update 1000 rows
Insert anomaly: muốn lưu customer mới mà chưa có order → không thể
Delete anomaly: xóa order cuối của Bob → mất luôn info Bob
Storage waste: customer info lặp lại
Inconsistency risk: gõ sai 1 chỗ → có Alice với 2 email khác nhau

2.2 1NF — First Normal Form

Rule: Mỗi cell chứa đúng 1 atomic value, không có nhóm lặp lại.

❌ Vi phạm:

CREATE TABLE users (
    id int,
    name text,
    phones text  -- '0900,0901,0902'  (multiple values in one cell)
);

✅ 1NF:

CREATE TABLE users (id int, name text);
CREATE TABLE user_phones (user_id int, phone text);

Ngoại lệ 2024: Array types Postgres và JSONB có thể vi phạm 1NF nhưng được dùng hợp lý trong nhiều case:

-- Array Postgres - vẫn 1NF kỹ thuật vì là single value với operator dedicated
CREATE TABLE articles (
    id int,
    title text,
    tags text[]   -- ['rust', 'database']
);
 
-- Indexable by GIN, queryable: WHERE 'rust' = ANY(tags)

Khi nào dùng array/JSONB thay vì child table:

Tags, labels — số lượng ít, không cần JOIN
Settings/preferences — không cần query individual value với index
Embedded data luôn truy cập cùng parent

Khi nào KHÔNG dùng:

Cần JOIN, FK, constraint trên element
Element có thuộc tính riêng (vd: tag có description, color)
Số lượng element có thể vô hạn

2.3 2NF — partial dependency

Rule: Mỗi non-key column phụ thuộc vào toàn bộ primary key, không phải một phần.

Chỉ áp dụng khi có composite key. Với surrogate single-column PK (id), 2NF tự động pass.

❌ Vi phạm:

CREATE TABLE order_items (
    order_id int,
    product_id int,
    product_name text,   -- phụ thuộc product_id, không cần order_id
    qty int,
    PRIMARY KEY (order_id, product_id)
);

✅ 2NF:

CREATE TABLE products (id int PRIMARY KEY, name text);
CREATE TABLE order_items (
    order_id int,
    product_id int REFERENCES products(id),
    qty int,
    PRIMARY KEY (order_id, product_id)
);

2.4 3NF — transitive dependency

Rule: Non-key column không phụ thuộc vào non-key column khác.

❌ Vi phạm:

CREATE TABLE orders (
    id int PRIMARY KEY,
    customer_id int,
    customer_name text,    -- phụ thuộc customer_id, không phải order id
    customer_email text,   -- transitive
    total numeric
);

✅ 3NF:

CREATE TABLE customers (id int PRIMARY KEY, name text, email text);
CREATE TABLE orders (
    id int PRIMARY KEY,
    customer_id int REFERENCES customers(id),
    total numeric
);

2.5 BCNF — Boyce-Codd

Stricter version 3NF. Trong 99% case thực tế, 3NF = BCNF. Không cần lo nếu schema bạn đến 3NF.

2.6 4NF, 5NF, 6NF

Hiếm khi cần ở app code. Khi nào quan tâm:

6NF dùng cho temporal database (Tuần 08 + Bonus chapters). Ví dụ: anchor modeling, data warehouse với SCD Type 6.

2.7 Recipe thực dụng

flowchart TD
    A[Start: list mọi attribute] --> B{Có atomic không?}
    B -->|No| B1[Split column hoặc dùng JSONB có chủ đích] --> C
    B -->|Yes| C{Composite PK?}
    C -->|Yes| D{Có partial dep?}
    C -->|No| E
    D -->|Yes| D1[Tách bảng] --> E
    D -->|No| E{Có transitive dep?}
    E -->|Yes| E1[Tách bảng] --> F
    E -->|No| F[3NF OK]
    F --> G{Performance issue?}
    G -->|Yes| G1[Denormalize có chủ đích<br/>+ trigger/CDC để sync]
    G -->|No| H[Done]

    style F fill:#c8e6c9
    style G1 fill:#fff9c4

3. Denormalization — Phá quy tắc một cách có chủ đích

3.1 Khi nào denormalize

Normalize default. Denormalize khi:

Read >> write ratio (>100:1) và join expensive
Aggregation hot path — SUM(order_total) per user được gọi mọi request
Cross-shard join — join sau khi shard rất tốn (Tuần 08)
Historical/immutable data — order_items lưu price_at_purchase thay vì JOIN products (data có thể đổi)
Materialized view không đủ — cần update realtime

3.2 Pattern 1 — Computed column / generated column

Postgres 12+ có GENERATED ... STORED:

CREATE TABLE orders (
    id bigserial PRIMARY KEY,
    subtotal numeric,
    tax_rate numeric,
    total numeric GENERATED ALWAYS AS (subtotal * (1 + tax_rate)) STORED
);

Stored on disk → index được
Auto-update khi base columns change
Không thể write trực tiếp

Pattern 2024: created_year generated cho partition:

CREATE TABLE events (
    id bigserial,
    created_at timestamptz NOT NULL,
    created_year int GENERATED ALWAYS AS (EXTRACT(year FROM created_at)::int) STORED,
    payload jsonb
) PARTITION BY RANGE (created_year);

3.3 Pattern 2 — Materialized view

Một query phức tạp được “đóng băng” thành table:

CREATE MATERIALIZED VIEW user_order_summary AS
SELECT
    u.id AS user_id,
    u.email,
    count(o.id) AS order_count,
    sum(o.total) AS lifetime_value,
    max(o.created_at) AS last_order_at
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.email;
 
CREATE UNIQUE INDEX ON user_order_summary(user_id);
 
-- Refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_summary;

Trade-off: Refresh tốn time. CONCURRENTLY không lock query nhưng chậm hơn. Trên bảng 100M rows, refresh có thể mất giờ.

Pattern hiện đại: Incremental materialized view với pg_ivm extension hoặc dùng dbt + scheduled refresh.

3.4 Pattern 3 — Denormalized counter

Counter cập nhật bằng trigger:

CREATE TABLE posts (id bigserial PRIMARY KEY, title text, comment_count int DEFAULT 0);
CREATE TABLE comments (id bigserial PRIMARY KEY, post_id bigint REFERENCES posts(id));
 
CREATE OR REPLACE FUNCTION update_comment_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE posts SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
    END IF;
    RETURN NULL;
END $$ LANGUAGE plpgsql;
 
CREATE TRIGGER trg_comment_count
AFTER INSERT OR DELETE ON comments
FOR EACH ROW EXECUTE FUNCTION update_comment_count();

Trade-off:

Pros: instant comment_count, không cần count(*)
Cons: lock contention trên posts.comment_count khi high write
Hot row problem: post viral có 10k comments/sec → mọi insert phải update cùng row → block lẫn nhau

Fix hot row 2024: Sharded counter

CREATE TABLE post_comment_counters (
    post_id bigint,
    shard smallint,
    count int DEFAULT 0,
    PRIMARY KEY (post_id, shard)
);
-- Insert chọn random shard
INSERT INTO post_comment_counters (post_id, shard, count) VALUES ($1, random_shard(), 1)
ON CONFLICT (post_id, shard) DO UPDATE SET count = post_comment_counters.count + 1;
-- Read sum
SELECT sum(count) FROM post_comment_counters WHERE post_id = $1;

3.5 Pattern 4 — Embedded snapshot (immutable history)

Order phải lưu giá lúc mua, không phải giá hiện tại của product:

CREATE TABLE order_items (
    id bigserial PRIMARY KEY,
    order_id bigint REFERENCES orders(id),
    product_id bigint REFERENCES products(id),
 
    -- Snapshot data tại thời điểm purchase
    product_name_snapshot text NOT NULL,
    product_price_snapshot numeric NOT NULL,
    product_sku_snapshot text NOT NULL,
 
    qty int NOT NULL
);

Tại sao: Product có thể đổi tên, đổi giá, bị xóa. Order phải reflect đúng những gì user mua.

Đây không phải “denormalize vì performance”. Đây là business requirement — schema phản ánh “snapshot semantics”.

4. Key Strategy — 2024-2026

4.1 Surrogate vs Natural Key

Loại	Ví dụ	Khi nào dùng
Natural	`email`, `ssn`, `isbn`	Khi key thật sự stable + unique + có meaning
Surrogate	`id bigserial`, `uuid`	Default. Stable, không tiết lộ semantics

Rule: Default surrogate. Add unique constraint cho natural key.

CREATE TABLE users (
    id bigserial PRIMARY KEY,           -- surrogate
    email text UNIQUE NOT NULL,         -- natural, có thể đổi
    username text UNIQUE NOT NULL
);

Tại sao không dùng email làm PK:

User đổi email → CASCADE update FK của 100 bảng khác → nightmare
Email có max length lớn → index lớn
Leak PII trong URL nếu dùng làm public key

4.2 Integer vs UUID vs ULID — bảng quyết định 2024

flowchart TD
    A[Cần ID mới] --> B{Distributed system?<br/>Cần gen ID offline?}
    B -->|No| C[bigserial / bigint identity]
    B -->|Yes| D{Sortable theo thời gian quan trọng?}
    D -->|Yes| E{Public-facing?}
    D -->|No| F[UUID v4]
    E -->|Yes| G[UUID v7 / ULID]
    E -->|No| H[Snowflake / KSUID]

    style C fill:#c8e6c9
    style F fill:#fff9c4
    style G fill:#c8e6c9
    style H fill:#fff9c4

Postgres 17 update: uuid_generate_v7() chưa native nhưng có extension pg_uuidv7 widely adopted 2024. UUID v7 đã được standardize trong RFC 9562 (May 2024).

4.3 Vì sao UUID v4 BAD cho PK

UUID v4 = random 128-bit. PK index là B-tree theo key. Insert UUID v4:

Mỗi insert đi vào random leaf của B-tree
Cold page bị fetch → I/O random
Index bloat nhanh (page split)
WAL volume to (full page image cho random pages)

Benchmark direction (illustrative, order of magnitude — actual numbers vary by hardware, RAM, fillfactor; see Brandur’s “Identity Crisis” + Buchanan’s UUIDv7 benchmark for canonical data):

bigserial: baseline (fastest, smallest index)
uuid v4: typically 2-3x slower insert, index ~50% larger
uuid v7: gần baseline (sequential prefix → page locality)

Đo trên hardware của bạn trước khi pick — pattern matters hơn số tuyệt đối.

4.4 UUID v7 — winner 2024-2026

Format:

unix_ts_ms (48 bits) | ver (4) | rand_a (12) | var (2) | rand_b (62)

48-bit Unix timestamp ms → sortable, B-tree friendly
Random tail → unique không cần coordination
RFC 9562 chuẩn hóa

Setup Postgres:

CREATE EXTENSION pg_uuidv7;
 
CREATE TABLE orders (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v7(),
    created_at timestamptz DEFAULT now()
);
 
INSERT INTO orders (created_at) VALUES (now());
-- id sẽ sortable theo time

4.5 ULID — alternative

Format Crockford Base32: 01HVNQK4XK0M2PHB...

26 chars, URL-safe, không có 0/O/I/L confusion
Sortable theo time
Phổ biến trong Node.js/Python ecosystem

-- ULID lưu dạng text 26 chars hoặc uuid (vì cùng 128-bit)
CREATE TABLE events (
    id text PRIMARY KEY,  -- '01HVNQ...'
    -- HOẶC
    id uuid PRIMARY KEY   -- convert từ ULID
);

UUID v7 vs ULID: gần như tương đương. UUID v7 là standard (RFC 9562) → khuyến nghị cho new project.

4.6 Snowflake (Twitter/Discord style)

64-bit:

1 bit (sign) | 41 bits (timestamp ms) | 10 bits (worker id) | 12 bits (sequence)

Fit trong bigint (8 byte vs UUID 16 byte) → index nhỏ hơn
Phải config worker_id riêng cho mỗi service instance
Sortable theo time

Implement: extension pg_snowflake hoặc generate ở application layer.

Discord 2024: dùng Snowflake cho mọi entity. Lý do: bigint nhỏ hơn UUID nên index/cache hiệu quả hơn.

4.7 Composite PK — khi nào hợp lý

CREATE TABLE order_items (
    order_id bigint REFERENCES orders(id),
    product_id bigint REFERENCES products(id),
    qty int,
    PRIMARY KEY (order_id, product_id)  -- composite
);

OK khi:

Junction table (many-to-many)
Composition tự nhiên 1-1 unique trong domain
Không cần FK trỏ vào bảng này

Không OK khi:

Bảng có thể được reference từ bảng khác → FK 2 column awkward
Composite key có thể grow (vd: thêm tenant_id)

Pattern 2024 cho multi-tenancy: Composite (tenant_id, id) — every table có tenant_id ở đầu:

CREATE TABLE orders (
    tenant_id bigint NOT NULL,
    id bigint NOT NULL,
    PRIMARY KEY (tenant_id, id),
    ...
);
-- Index theo (tenant_id, ...) tự nhiên cho RLS

5. Data Types — chọn đúng tiết kiệm 30-50% storage

5.1 Integer types

Type	Bytes	Range	Khi dùng
`smallint`	2	-32K → 32K	Status enum, age, quantity nhỏ
`integer`	4	-2B → 2B	Default cho ID, count
`bigint`	8	-9 quintillion	ID khi expect >2B rows, financial cents
`serial`/`bigserial`	4/8	auto-increment	Pre-PG10. PG10+ dùng `GENERATED ... AS IDENTITY`

Pattern 2024:

-- Modern syntax (PG10+)
CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ...
);

GENERATED ALWAYS (vs BY DEFAULT): ngăn insert giá trị bằng tay. Tốt cho serial PK.

5.2 Numeric — money handling

❌ Đừng dùng float/real/double cho money. Floating point có round error.

SELECT 0.1::float + 0.2::float;  -- 0.30000000000000004

✅ Dùng numeric(precision, scale) hoặc bigint (cents):

-- Option A: numeric với scale = currency decimals
price numeric(12, 2)   -- max 9,999,999,999.99
 
-- Option B: integer cents (Stripe, PayPal pattern)
price_cents bigint NOT NULL  -- $99.99 = 9999

Option B faster, nhỏ hơn, không có precision issue. Đa số fintech 2024 dùng option B.

5.3 Text — `text` vs `varchar` vs `char`

Trong Postgres, 3 cái này identical về performance. varchar(n) chỉ thêm check length. char(n) pad spaces (legacy).

Khuyến nghị: Dùng text, thêm CHECK (length(col) <= 255) nếu cần limit.

CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL CHECK (length(name) BETWEEN 1 AND 255),
    bio text  -- unlimited
);

5.4 Time types — `timestamp` vs `timestamptz`

❌ timestamp (without time zone) — KHÔNG dùng cho event time. Postgres không lưu timezone → ambiguous, race conditions khi serve global users.

✅ timestamptz (with time zone) — default cho mọi event time.

CREATE TABLE events (
    occurred_at timestamptz NOT NULL DEFAULT now(),  -- ALWAYS timestamptz
    ...
);

Insight: Postgres timestamptz không thực sự lưu timezone. Nó lưu UTC + convert dựa trên session timezone khi display. Đủ cho 99% case.

Khi nào dùng timestamp without time zone:

“Local time” thuần (vd: birthday, opening hours store) — không có ý niệm timezone
Hiếm

5.5 Boolean

is_active boolean NOT NULL DEFAULT true

Không dùng 0/1 int hay 'Y/N' char. Postgres boolean 1 byte, tự dispose.

5.6 Enum types

CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'cancelled');
 
CREATE TABLE orders (
    status order_status NOT NULL DEFAULT 'pending'
);

Pros:

Type-safe
Storage 4 bytes (vs text variable)

Cons:

Không thể remove value — Postgres (đến PG17/18) vẫn KHÔNG hỗ trợ ALTER TYPE ... DROP VALUE. Đây là 1 trong những feature thiếu kinh điển. Workaround: tạo type mới, migrate column, drop type cũ.
Add value cần migration (ALTER TYPE ... ADD VALUE)
Khó query metadata
Rename value: PG10+ có ALTER TYPE ... RENAME VALUE

Alternative: lookup table với FK

CREATE TABLE order_statuses (code text PRIMARY KEY, description text);
INSERT INTO order_statuses VALUES ('pending', '...'), ...;
CREATE TABLE orders (status text REFERENCES order_statuses(code));

Rule 2024: Dùng enum khi value SET ổn định (vd: HTTP method). Dùng lookup table khi value có thể đổi/thêm thường xuyên.

5.7 JSONB — không phải vũ khí vạn năng

JSONB lưu JSON dạng binary (decoded once, fast access), indexable bằng GIN.

✅ Use case tốt:

Schema flexible (product attributes khác nhau giữa category)
Event payload (audit log, webhook)
Settings/config per user
3rd party integration data (Stripe metadata)

❌ Use case xấu:

Field truy vấn thường xuyên → tách thành column riêng
Cần JOIN với JSONB field → chậm
Cần CHECK constraint phức tạp trong JSONB

5.7.1 Operators (must-know)

-- Access path
data -> 'key'                    -- returns jsonb
data ->> 'key'                   -- returns text (cast)
data #> '{path,to,nested}'       -- jsonb at path
data #>> '{path,to,nested}'      -- text at path
data -> 0                         -- array element by index
 
-- Containment / existence
data @> '{"key": "value"}'        -- left contains right (most-used)
data <@ '{"key": "value"}'        -- left contained in right
data ? 'key'                      -- key exists
data ?| array['k1','k2']          -- any key exists
data ?& array['k1','k2']          -- all keys exist

Examples:

-- Match by nested field
SELECT * FROM users WHERE preferences @> '{"theme": "dark"}';
 
-- Extract field
SELECT data->>'email' FROM events;
 
-- Filter by extracted field
SELECT * FROM users WHERE preferences ->> 'lang' = 'vi';
 
-- Path access
SELECT data #>> '{user,profile,country}' FROM events;
 
-- Key check
SELECT * FROM users WHERE preferences ? 'beta_features';

5.7.2 Modification

-- Set value
UPDATE users SET preferences = jsonb_set(preferences, '{theme}', '"light"');
UPDATE users SET preferences = jsonb_set(preferences, '{nested,key}', '"value"', true);  -- create if missing
 
-- Remove key
UPDATE users SET preferences = preferences - 'old_key';
 
-- Concat (merge top-level)
UPDATE users SET preferences = preferences || '{"new_key": "value"}'::jsonb;
 
-- Numeric increment (PG14+ has jsonb_path_query for SQL/JSON)
UPDATE counters SET data = jsonb_set(data, '{count}', (((data->>'count')::int + 1))::text::jsonb);

5.7.3 GIN index variants

-- Default: support @>, ?, ?|, ?& - most general
CREATE INDEX idx_meta ON t USING gin(metadata);
 
-- jsonb_path_ops: smaller index, faster, but ONLY supports @>
CREATE INDEX idx_meta_path ON t USING gin(metadata jsonb_path_ops);

jsonb_path_ops ~50% smaller, ~2x faster query for @>. Use when only containment queries needed.

5.7.4 Path index (specific field)

When always querying same field:

-- Index just one path - small, fast
CREATE INDEX idx_users_country ON users((preferences->>'country'));
SELECT * FROM users WHERE preferences->>'country' = 'VN';

Better than GIN if only 1-3 fields queried. GIN better for arbitrary field access.

5.7.5 jsonb_path_query (SQL/JSON paths, PG12+)

SELECT jsonb_path_query(data, '$.users[*] ? (@.age > 18).name')
FROM events;
 
SELECT * FROM events WHERE data @? '$.users[*] ? (@.age > 18)';
SELECT * FROM events WHERE data @@ '$.users[*].age > 18';

SQL/JSON path language (similar to JSONPath). Useful for complex JSON queries.

5.7.6 JSON vs JSONB

json — raw text, preserve whitespace, duplicate keys allowed
jsonb — binary parsed, no whitespace, no duplicates, indexable

→ Default JSONB. JSON only if exact preserve needed (rare).

5.7.7 Performance tips

JSONB column < 1KB inline. > 2KB → TOAST’d → slower access
@> containment fastest with GIN
->> extract field is fast but no index unless expression index
Don’t store huge JSONB (>1MB) — split into separate table

5.7.8 Example: structured + extensible

-- GOOD: structured + extensible
CREATE TABLE products (
    id bigint PRIMARY KEY,
    name text NOT NULL,         -- always queried
    price_cents bigint NOT NULL,
    category text NOT NULL,
    attributes jsonb            -- varies by category
);
 
CREATE INDEX idx_products_attributes ON products USING gin(attributes jsonb_path_ops);
SELECT * FROM products WHERE attributes @> '{"color": "red"}';
 
-- BAD: JSONB-everywhere
CREATE TABLE everything (
    id bigint,
    data jsonb   -- {'name': '...', 'price': ..., 'category': ...}
);
-- Mất type safety, mất constraint, query phức tạp, no FK

5.8 Array types — Postgres unique strength

CREATE TABLE articles (
    id bigint PRIMARY KEY,
    title text,
    tags text[]
);
 
CREATE INDEX idx_articles_tags ON articles USING gin(tags);
 
SELECT * FROM articles WHERE 'rust' = ANY(tags);
SELECT * FROM articles WHERE tags && ARRAY['rust', 'database'];  -- overlap
SELECT * FROM articles WHERE tags @> ARRAY['rust'];               -- contains

Khi nào array thắng child table:

Element không có thuộc tính riêng
Số lượng nhỏ (<100)
Truy cập luôn cùng parent

5.9 Range types — underused gem

CREATE TABLE bookings (
    id bigint PRIMARY KEY,
    room_id bigint,
    during tstzrange NOT NULL,
 
    -- EXCLUDE constraint: ngăn 2 booking overlap cùng room
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);
 
INSERT INTO bookings (room_id, during) VALUES
  (1, '[2026-05-20 10:00, 2026-05-20 12:00)'::tstzrange);
 
-- This fails (overlap):
INSERT INTO bookings (room_id, during) VALUES
  (1, '[2026-05-20 11:00, 2026-05-20 13:00)'::tstzrange);

Range type là siêu vũ khí cho:

Booking, reservation
Pricing (price range theo date)
Versioned data
Validity period

5.10 Type checklist

flowchart TD
    A[Cần lưu cái gì?] --> B{Số nguyên?}
    B -->|Yes| B1{Range?}
    B1 -->|<32K| B2[smallint]
    B1 -->|<2B| B3[integer]
    B1 -->|>=2B| B4[bigint]

    A --> C{Số thập phân?}
    C -->|Tiền| C1[bigint cents HOẶC numeric]
    C -->|Khoa học OK round| C2[double precision]

    A --> D{Text?}
    D -->|Yes| D1[text + CHECK length nếu cần]

    A --> E{Thời gian?}
    E -->|Event time| E1[timestamptz]
    E -->|Date only| E2[date]
    E -->|Duration| E3[interval]
    E -->|Local time concept| E4[timestamp]

    A --> F{Set fixed value?}
    F -->|Stable| F1[enum]
    F -->|Có thể đổi| F2[lookup table + FK]

    A --> G{Cấu trúc động?}
    G -->|Yes| G1[jsonb + GIN index]

    A --> H{Multiple of same?}
    H -->|Element đơn giản| H1[array]
    H -->|Element có thuộc tính| H2[child table]

6. Constraints — Pháo đài chống bad data

6.1 NOT NULL — always default

Default mọi column phải NOT NULL trừ khi có lý do rõ ràng cho NULL.

-- BAD: nullable by default
CREATE TABLE users (
    id bigint PRIMARY KEY,
    email text,  -- có thể NULL? Tại sao?
    created_at timestamptz
);
 
-- GOOD
CREATE TABLE users (
    id bigint PRIMARY KEY,
    email text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

NULL là gốc của 80% NPE-like bugs trong SQL. Mọi WHERE x = NULL đều fail, x IN (1, NULL) behave kỳ lạ, JOIN drop rows…

6.2 CHECK — domain constraints

CREATE TABLE products (
    id bigint PRIMARY KEY,
    price_cents bigint NOT NULL CHECK (price_cents >= 0),
    name text NOT NULL CHECK (length(name) BETWEEN 1 AND 255),
    discount_pct numeric CHECK (discount_pct BETWEEN 0 AND 100),
    status text NOT NULL CHECK (status IN ('draft', 'active', 'archived'))
);

Pattern 2024: CHECK constraint với expression phức tạp:

CREATE TABLE event_dates (
    starts_at timestamptz NOT NULL,
    ends_at timestamptz NOT NULL,
    CHECK (ends_at > starts_at),
    CHECK (ends_at - starts_at < interval '7 days')
);

6.3 UNIQUE — partial unique mạnh hơn full unique

-- Email unique trong active users (soft-deleted email có thể reuse)
CREATE UNIQUE INDEX idx_users_email_active
ON users(email)
WHERE deleted_at IS NULL;
 
-- Một user chỉ có 1 default address
CREATE UNIQUE INDEX idx_addresses_user_default
ON addresses(user_id)
WHERE is_default = true;

Partial unique index = constraint chỉ enforce trên subset → linh hoạt mạnh.

6.4 FOREIGN KEY — bảo vệ referential integrity

CREATE TABLE orders (
    id bigint PRIMARY KEY,
    user_id bigint NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    created_at timestamptz NOT NULL
);

ON DELETE options:

RESTRICT (default) — không cho delete parent nếu có child
CASCADE — xóa child khi parent xóa
SET NULL — child.parent_id = NULL
SET DEFAULT — child.parent_id = default
NO ACTION — như RESTRICT nhưng deferred

Khi nào KHÔNG dùng FK:

Cross-shard (FK không enforceable giữa shards)
High-write throughput table (FK check tốn time)
Eventually-consistent system (event log)

Nhưng default vẫn nên có FK. Performance overhead < 5% trong hầu hết case.

6.5 EXCLUDE — constraint cấp cao

EXCLUDE = generalized UNIQUE. UNIQUE chỉ check equality, EXCLUDE check với bất kỳ operator nào.

-- Room booking không overlap (đã thấy ở section 5.9)
CREATE TABLE bookings (
    room_id bigint,
    during tstzrange,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);
 
-- Mã chương trình khuyến mãi không trùng giai đoạn
CREATE TABLE promotions (
    code text,
    valid_period tstzrange,
    EXCLUDE USING gist (code WITH =, valid_period WITH &&)
);

6.6 Deferrable constraints

Default FK check ở mỗi statement. Đôi khi cần defer đến commit:

CREATE TABLE employees (
    id bigint PRIMARY KEY,
    manager_id bigint REFERENCES employees(id) DEFERRABLE INITIALLY DEFERRED
);
 
-- Trong transaction, insert circular OK đến khi commit
BEGIN;
INSERT INTO employees (id, manager_id) VALUES (1, 2);
INSERT INTO employees (id, manager_id) VALUES (2, 1);
COMMIT;  -- check ở đây

7. Patterns chuẩn 2024

7.1 Soft delete

CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL,
    deleted_at timestamptz,
    -- Email unique chỉ trong active users
    CONSTRAINT users_email_unique_active EXCLUDE (email WITH =) WHERE (deleted_at IS NULL)
);
 
-- Mọi query phải filter
CREATE VIEW active_users AS SELECT * FROM users WHERE deleted_at IS NULL;
 
-- Hoặc dùng RLS
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY filter_deleted ON users USING (deleted_at IS NULL);

Trade-off soft delete:

✅ Audit, undo, compliance
✅ Foreign key integrity giữ nguyên
❌ Mọi query phải remember filter
❌ Table bloat theo deleted rows

Alternative 2024: Archive table — move deleted rows sang users_archive table riêng. Pros: bảng chính sạch. Cons: code phức tạp hơn.

7.2 Audit log — append-only

CREATE TABLE users_audit (
    id bigserial PRIMARY KEY,
    user_id bigint NOT NULL,
    action text NOT NULL CHECK (action IN ('INSERT', 'UPDATE', 'DELETE')),
    actor_id bigint,  -- ai thực hiện
    diff jsonb,       -- {'email': {'old': '...', 'new': '...'}}
    occurred_at timestamptz NOT NULL DEFAULT now()
);
 
CREATE OR REPLACE FUNCTION audit_users() RETURNS trigger AS $$
DECLARE
    actor bigint;
    changes jsonb;
BEGIN
    actor := current_setting('app.current_user_id', true)::bigint;
 
    IF TG_OP = 'INSERT' THEN
        INSERT INTO users_audit(user_id, action, actor_id, diff)
        VALUES (NEW.id, 'INSERT', actor, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        SELECT jsonb_object_agg(key, jsonb_build_object('old', o.value, 'new', n.value))
        INTO changes
        FROM jsonb_each(to_jsonb(OLD)) o
        JOIN jsonb_each(to_jsonb(NEW)) n USING (key)
        WHERE o.value IS DISTINCT FROM n.value;
        IF changes IS NOT NULL THEN
            INSERT INTO users_audit(user_id, action, actor_id, diff)
            VALUES (NEW.id, 'UPDATE', actor, changes);
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO users_audit(user_id, action, actor_id, diff)
        VALUES (OLD.id, 'DELETE', actor, to_jsonb(OLD));
    END IF;
    RETURN NULL;
END $$ LANGUAGE plpgsql;
 
CREATE TRIGGER trg_users_audit
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_users();

Set actor before query:

SET LOCAL app.current_user_id = '42';
UPDATE users SET email = '[email protected]' WHERE id = 1;

Alternative 2024: Logical replication + Debezium → audit table ở downstream system. Pros: không impact OLTP. Cons: setup phức tạp. Sẽ học Tuan-Bonus-CDC-Debezium.

7.3 Temporal tables — SQL:2011 standard

Postgres native chưa hỗ trợ SQL:2011 system-versioning đầy đủ. Pattern: valid_from, valid_to:

CREATE TABLE products_history (
    id bigint NOT NULL,
    name text NOT NULL,
    price_cents bigint NOT NULL,
    valid_from timestamptz NOT NULL,
    valid_to timestamptz,  -- NULL = current
 
    PRIMARY KEY (id, valid_from),
    EXCLUDE USING gist (id WITH =, tstzrange(valid_from, valid_to, '[)') WITH &&)
);
 
-- Get product as-of specific time
SELECT * FROM products_history
WHERE id = $1 AND valid_from <= $2 AND (valid_to > $2 OR valid_to IS NULL);

Extension hỗ trợ: temporal_tables (https://github.com/arkhipov/temporal_tables), periods extension.

7.4 Tagging / Many-to-many

CREATE TABLE tags (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text UNIQUE NOT NULL
);
 
CREATE TABLE article_tags (
    article_id bigint REFERENCES articles(id) ON DELETE CASCADE,
    tag_id bigint REFERENCES tags(id) ON DELETE CASCADE,
    tagged_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (article_id, tag_id)
);
 
CREATE INDEX idx_article_tags_tag ON article_tags(tag_id);
-- Để query "articles có tag X" nhanh

Alternative: Array column tags text[] (đã thảo luận section 5.8).

Quyết định:

Junction table khi tag có metadata (description, created_by), khi cần ranking, khi tag list lớn
Array khi tag đơn giản, immutable, list nhỏ

7.5 Hierarchical data — 4 patterns

A. Adjacency list (đơn giản nhất)

CREATE TABLE categories (
    id bigint PRIMARY KEY,
    parent_id bigint REFERENCES categories(id),
    name text
);

Pros: simple
Cons: query “tất cả descendants” phải recursive CTE

WITH RECURSIVE tree AS (
    SELECT * FROM categories WHERE id = 1
    UNION ALL
    SELECT c.* FROM categories c JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree;

B. Path enumeration

CREATE TABLE categories (
    id bigint PRIMARY KEY,
    path text NOT NULL,  -- '/electronics/phones/'
    name text
);
 
CREATE INDEX idx_categories_path ON categories(path text_pattern_ops);
 
-- Tất cả descendants
SELECT * FROM categories WHERE path LIKE '/electronics/phones/%';

C. Nested set (Celko)

Phức tạp, update đắt
Read tuyệt vời
Hiếm dùng 2024

D. `ltree` extension (Postgres special)

CREATE EXTENSION ltree;
CREATE TABLE categories (
    id bigint PRIMARY KEY,
    path ltree NOT NULL  -- 'electronics.phones'
);
CREATE INDEX idx_categories_path ON categories USING gist(path);
 
SELECT * FROM categories WHERE path <@ 'electronics';  -- descendants
SELECT * FROM categories WHERE path @> 'electronics.phones.iphone';  -- ancestors

Khuyến nghị 2024: Adjacency list cho 99% case (recursive CTE đủ fast với index). ltree khi cần query path-based phức tạp.

7.6 Multi-tenancy schema patterns

graph TB
    subgraph "A. Shared table + tenant_id"
        A1[tenants table]
        A2[orders: tenant_id, ...]
        A3[users: tenant_id, ...]
        A2 -.RLS by tenant_id.-> A1
        A3 -.RLS by tenant_id.-> A1
    end

    subgraph "B. Schema per tenant"
        B1[tenant_1.orders]
        B2[tenant_1.users]
        B3[tenant_2.orders]
        B4[tenant_2.users]
    end

    subgraph "C. Database per tenant"
        C1[(tenant_1_db)]
        C2[(tenant_2_db)]
        C3[(tenant_3_db)]
    end

Pattern	Pros	Cons	Khi nào
A. Shared table	Cheap, easy scale	Risk leak, noisy neighbor	<1000 tenants, small data each
B. Schema per tenant	Isolation, per-tenant migration	Schema bloat, migration nightmare	10-100 tenants, regulated industry
C. DB per tenant	Maximum isolation	Operational cost cao	<50 tenants, enterprise

Pattern 2024 phổ biến: Hybrid — pool of databases, mỗi DB host nhiều tenants với RLS.

Sẽ đào sâu Tuan-16-ORM-CQRS-Multi-Tenancy và Case-Design-Data-SaaS-Multi-tenant.

8. Anti-patterns — Cẩn thận với những thứ sau

8.1 EAV (Entity-Attribute-Value) — kinh điển

-- ANTI-PATTERN
CREATE TABLE attributes (
    entity_id bigint,
    key text,
    value text   -- string of everything
);

Lý do tệ:

Mất type safety (price = “abc” không error)
Query phức tạp: PIVOT bằng tay
Không có constraint riêng cho từng attribute
Index không hiệu quả

Thay bằng:

Column riêng cho structured attributes
JSONB cho semi-structured
Document DB (MongoDB) nếu thật sự cần unbounded schema

8.2 Polymorphic FK

-- ANTI-PATTERN
CREATE TABLE comments (
    id bigint PRIMARY KEY,
    commentable_type text,  -- 'Post' hoặc 'Photo'
    commentable_id bigint,  -- FK không enforceable!
    body text
);

Lý do tệ:

Không thể FK constraint
JOIN cần CASE WHEN
ORM hack che giấu vấn đề

Thay bằng:

Multiple FK: post_id, photo_id, mỗi cái nullable, CHECK exactly one not null
Junction table: comment_targets(comment_id, target_type, target_id) với constraint
Common parent: commentables(id, type) master, post + photo đều FK đến commentables

8.3 JSON-everywhere

-- ANTI-PATTERN
CREATE TABLE orders (
    id bigint,
    data jsonb  -- {'customer': ..., 'items': [...], 'total': ...}
);

Lý do tệ:

Mất type safety
Query: data->>'total' chậm, khó index
Schema evolution hỗn loạn
Khó JOIN

Thay bằng: structured columns + JSONB chỉ cho phần thực sự dynamic.

8.4 Boolean flag explosion

-- ANTI-PATTERN
CREATE TABLE users (
    is_active boolean,
    is_verified boolean,
    is_premium boolean,
    is_banned boolean,
    is_admin boolean,
    is_locked boolean,
    ...  -- 20 flag boolean
);

Thay bằng:

State machine với enum
Role-based: user_roles(user_id, role) junction
Feature flags: user_features(user_id, feature)

8.5 Floating point cho money

Đã thảo luận section 5.2. Re-emphasize: never float for money.

8.6 `varchar(255)` mọi nơi

Legacy từ MySQL (varchar < 256 = 1 byte length prefix). Postgres không có lý do đó.

-- ANTI-PATTERN (legacy)
name varchar(255)
 
-- GOOD
name text NOT NULL CHECK (length(name) BETWEEN 1 AND 100)

8.7 Index cho mọi column

CREATE INDEX ON users(every_column). Sai vì:

Index = storage + write overhead
Postgres không dùng nhiều index cùng lúc cho 1 query (trừ bitmap)
Maintenance cost cao

→ Tuần 03 đào sâu indexing strategy.

8.8 Không có created_at / updated_at

-- BAD
CREATE TABLE orders (id bigint, total numeric);
 
-- GOOD - default mọi table
CREATE TABLE orders (
    id bigint PRIMARY KEY,
    total numeric NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
 
-- Trigger update updated_at
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN NEW.updated_at = now(); RETURN NEW; END $$ LANGUAGE plpgsql;
 
CREATE TRIGGER trg_orders_updated_at BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION set_updated_at();

Audit, debug, CDC streaming — đều cần timestamp.

8.9 String trong khi nên là enum/lookup

-- BAD: typo dễ xảy ra, không validate
status text  -- 'pending', 'Pending', 'PENDIN' (typo) → 3 status khác nhau
 
-- GOOD
status text NOT NULL CHECK (status IN ('pending', 'paid', 'shipped'))
-- HOẶC
status order_status NOT NULL  -- enum type

9. Lab — Thiết kế schema cho e-commerce

Bạn được brief: thiết kế DB cho e-commerce startup. Yêu cầu:

Users với multiple addresses, payment methods
Products với attributes khác nhau theo category
Orders với status tracking, items snapshot price
Reviews, ratings
Tags, search
Audit log
Multi-region: timezone-aware

9.1 Draft schema (làm trước khi xem solution)

Tự draft trên giấy hoặc dbdiagram.io.

9.2 Sample solution

-- ============== USERS ==============
CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email citext NOT NULL UNIQUE,         -- citext = case-insensitive
    name text NOT NULL CHECK (length(name) BETWEEN 1 AND 100),
    locale text NOT NULL DEFAULT 'en-US',
    timezone text NOT NULL DEFAULT 'UTC',
    created_at timestamptz NOT NULL DEFAULT now(),
    deleted_at timestamptz
);
 
CREATE UNIQUE INDEX idx_users_email_active ON users(email) WHERE deleted_at IS NULL;
 
-- ============== ADDRESSES ==============
CREATE TABLE addresses (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id bigint NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    label text CHECK (length(label) <= 50),
    line1 text NOT NULL,
    line2 text,
    city text NOT NULL,
    region text,
    country_code char(2) NOT NULL,  -- ISO 3166-1
    postal_code text NOT NULL,
    is_default boolean NOT NULL DEFAULT false,
    created_at timestamptz NOT NULL DEFAULT now()
);
 
CREATE UNIQUE INDEX idx_addresses_user_default ON addresses(user_id) WHERE is_default = true;
 
-- ============== PRODUCTS ==============
CREATE TYPE product_status AS ENUM ('draft', 'active', 'archived');
 
CREATE TABLE categories (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    parent_id bigint REFERENCES categories(id),
    name text NOT NULL,
    slug text UNIQUE NOT NULL
);
 
CREATE TABLE products (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    category_id bigint NOT NULL REFERENCES categories(id),
    sku text UNIQUE NOT NULL CHECK (length(sku) BETWEEN 1 AND 64),
    name text NOT NULL CHECK (length(name) BETWEEN 1 AND 255),
    description text,
    price_cents bigint NOT NULL CHECK (price_cents >= 0),
    currency char(3) NOT NULL DEFAULT 'USD',
    status product_status NOT NULL DEFAULT 'draft',
    attributes jsonb NOT NULL DEFAULT '{}'::jsonb,
    tags text[] NOT NULL DEFAULT '{}',
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);
 
CREATE INDEX idx_products_category ON products(category_id) WHERE status = 'active';
CREATE INDEX idx_products_tags ON products USING gin(tags);
CREATE INDEX idx_products_attrs ON products USING gin(attributes);
CREATE INDEX idx_products_search ON products USING gin(to_tsvector('english', name || ' ' || coalesce(description, '')));
 
-- ============== ORDERS ==============
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipping', 'delivered', 'cancelled', 'refunded');
 
CREATE TABLE orders (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id bigint NOT NULL REFERENCES users(id),
    status order_status NOT NULL DEFAULT 'pending',
    currency char(3) NOT NULL DEFAULT 'USD',
    subtotal_cents bigint NOT NULL CHECK (subtotal_cents >= 0),
    tax_cents bigint NOT NULL CHECK (tax_cents >= 0),
    shipping_cents bigint NOT NULL CHECK (shipping_cents >= 0),
    total_cents bigint NOT NULL GENERATED ALWAYS AS (subtotal_cents + tax_cents + shipping_cents) STORED,
 
    shipping_address_id bigint REFERENCES addresses(id),
    placed_at timestamptz NOT NULL DEFAULT now(),
    paid_at timestamptz,
    delivered_at timestamptz,
    cancelled_at timestamptz,
 
    notes text,
    metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
 
    CHECK (paid_at IS NULL OR paid_at >= placed_at),
    CHECK (delivered_at IS NULL OR delivered_at >= paid_at)
);
 
CREATE INDEX idx_orders_user_created ON orders(user_id, placed_at DESC);
CREATE INDEX idx_orders_status_created ON orders(status, placed_at DESC);
 
CREATE TABLE order_items (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_id bigint NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id bigint NOT NULL REFERENCES products(id),
    qty int NOT NULL CHECK (qty > 0),
 
    -- Snapshot tại thời điểm purchase
    product_name_snapshot text NOT NULL,
    product_sku_snapshot text NOT NULL,
    unit_price_cents bigint NOT NULL CHECK (unit_price_cents >= 0),
    line_total_cents bigint NOT NULL GENERATED ALWAYS AS (unit_price_cents * qty) STORED
);
 
CREATE INDEX idx_order_items_order ON order_items(order_id);
 
-- ============== REVIEWS ==============
CREATE TABLE reviews (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_id bigint NOT NULL REFERENCES products(id) ON DELETE CASCADE,
    user_id bigint NOT NULL REFERENCES users(id),
    rating smallint NOT NULL CHECK (rating BETWEEN 1 AND 5),
    title text CHECK (length(title) <= 255),
    body text,
    created_at timestamptz NOT NULL DEFAULT now(),
 
    UNIQUE (product_id, user_id)  -- mỗi user 1 review/product
);
 
CREATE INDEX idx_reviews_product_rating ON reviews(product_id, rating);
 
-- ============== AUDIT (truncated, dùng pattern section 7.2) ==============

9.3 Decisions trong solution

Decision	Tại sao
`bigint IDENTITY` không phải UUID	OLTP đơn local DB, không distributed gen, performance > public-safe ID
`citext` cho email	Case-insensitive uniqueness automatic
`country_code char(2)`	ISO standard, fixed length
`price_cents bigint`	No float, exact arithmetic
`attributes jsonb` + `tags text[]`	Flexibility cho product variations
Snapshot trong order_items	Immutable order history
Generated `total_cents`	Auto-compute, indexable
Partial unique cho default address	One default per user
`placed_at`, `paid_at`, `delivered_at` thay vì 1 status + history	Truy vấn theo time dễ, audit không cần subquery
CHECK temporal ordering	Prevent illogical state
FK ON DELETE CASCADE cho items, REFERENCES không cascade cho user→orders	Order audit, không xóa data tài chính

9.4 Bài tập

Mở rộng schema để support discount codes: code, percentage, valid_period, max_uses, per_user_limit. Dùng tstzrange + EXCLUDE để ngăn 2 active codes trùng tên cùng giai đoạn.
Thêm wishlist: user save products. Thiết kế để query “users đã wishlist product X” nhanh, “products user A đã wishlist” nhanh.
Cart vs Order: thiết kế cart riêng hay reuse orders.status='pending'? Trade-off?
Thêm inventory: track stock per product, có concurrency (10k users mua cùng 1 product). Bạn cần gì? (Hint: Tuần 05 sẽ giúp).
Stretch: Multi-currency. Lưu price theo currency, exchange rate, hay base currency + convert? Trade-off.

10. Schema migration mindset (preview Tuần 08)

Tuần 08 đào sâu zero-downtime migration. Tại đây hiểu mindset:

flowchart LR
    A[Code v1<br/>reads/writes old schema] --> B[Migrate: add new column nullable]
    B --> C[Code v2<br/>writes to both old + new]
    C --> D[Backfill new column from old]
    D --> E[Code v3<br/>reads from new, writes both]
    E --> F[Code v4<br/>only new column]
    F --> G[Migrate: drop old column]

    style B fill:#fff9c4
    style D fill:#fff9c4
    style G fill:#ffccbc

Rule vàng: Schema migration phải support N và N+1 version cùng chạy. Không bao giờ “stop world, migrate, restart”.

11. Self-check

Schema bạn vừa thiết kế cho e-commerce có chỗ nào vi phạm 3NF không? Nếu có thì là cố ý hay sai?
Khi nào dùng text[] thay vì junction table? Cho ví dụ cụ thể.
UUID v4, v7, ULID, Snowflake — bạn pick cái nào cho orders.id? Tại sao?
Vì sao không nên varchar(255) mọi nơi trong Postgres?
Vẽ flow soft-delete với unique constraint trên email — partial unique index work thế nào?
CHECK constraint vs trigger — khi nào dùng cái nào?
Sharded counter giải quyết vấn đề gì? Trade-off?
EXCLUDE constraint khác UNIQUE thế nào? Cho ví dụ booking system.
Generated column STORED vs VIRTUAL? Postgres có cả 2 không?
Multi-tenancy 3 pattern (shared, schema-per, db-per) — pick pattern nào cho SaaS 500 tenants, mỗi tenant <10GB data?

12. Tiếp theo & cross-links

Bài tiếp theo: Tuan-03-Indexing-Mastery — schema có rồi, giờ tăng tốc query.

Đọc kèm:

Bill Karwin — SQL Antipatterns — bibliotheca anti-pattern
Postgres docs Chapter 5: Data Definition

Cross-link SD course:

Tuan-Bonus-Multi-Tenancy-SaaS-Patterns — architecture-level multi-tenancy

Tuần 02 hoàn thành. Schema first, optimize second. Cập nhật: 2026-05-16

lthieu's notes

Explorer

Tuan-02-Schema-Design-Normalization

Tuần 02 — Schema Design & Normalization

1. Context & Why

1.1 Schema là quyết định “khó hoàn tác nhất”

1.2 Tuần này dạy gì

1.3 Tham chiếu chính

2. Normalization — Lý thuyết tối thiểu, thực dụng tối đa

2.1 Tại sao normalize?

2.2 1NF — First Normal Form

2.3 2NF — partial dependency

2.4 3NF — transitive dependency

2.5 BCNF — Boyce-Codd

2.6 4NF, 5NF, 6NF

2.7 Recipe thực dụng

3. Denormalization — Phá quy tắc một cách có chủ đích

3.1 Khi nào denormalize

3.2 Pattern 1 — Computed column / generated column

3.3 Pattern 2 — Materialized view

3.4 Pattern 3 — Denormalized counter

3.5 Pattern 4 — Embedded snapshot (immutable history)

4. Key Strategy — 2024-2026

4.1 Surrogate vs Natural Key

4.2 Integer vs UUID vs ULID — bảng quyết định 2024

4.3 Vì sao UUID v4 BAD cho PK

4.4 UUID v7 — winner 2024-2026

4.5 ULID — alternative

4.6 Snowflake (Twitter/Discord style)

4.7 Composite PK — khi nào hợp lý

5. Data Types — chọn đúng tiết kiệm 30-50% storage

5.1 Integer types

5.2 Numeric — money handling

5.3 Text — text vs varchar vs char

5.4 Time types — timestamp vs timestamptz

5.5 Boolean

5.6 Enum types

5.7 JSONB — không phải vũ khí vạn năng

5.7.1 Operators (must-know)

5.7.2 Modification

5.7.3 GIN index variants

5.7.4 Path index (specific field)

5.7.5 jsonb_path_query (SQL/JSON paths, PG12+)

5.7.6 JSON vs JSONB

5.7.7 Performance tips

5.7.8 Example: structured + extensible

5.8 Array types — Postgres unique strength

5.9 Range types — underused gem

5.10 Type checklist

6. Constraints — Pháo đài chống bad data

6.1 NOT NULL — always default

6.2 CHECK — domain constraints

6.3 UNIQUE — partial unique mạnh hơn full unique

6.4 FOREIGN KEY — bảo vệ referential integrity

6.5 EXCLUDE — constraint cấp cao

6.6 Deferrable constraints

7. Patterns chuẩn 2024

7.1 Soft delete

7.2 Audit log — append-only

7.3 Temporal tables — SQL:2011 standard

7.4 Tagging / Many-to-many

7.5 Hierarchical data — 4 patterns

A. Adjacency list (đơn giản nhất)

B. Path enumeration

C. Nested set (Celko)

D. ltree extension (Postgres special)

7.6 Multi-tenancy schema patterns

8. Anti-patterns — Cẩn thận với những thứ sau

8.1 EAV (Entity-Attribute-Value) — kinh điển

8.2 Polymorphic FK

8.3 JSON-everywhere

8.4 Boolean flag explosion

8.5 Floating point cho money

8.6 varchar(255) mọi nơi

8.7 Index cho mọi column

8.8 Không có created_at / updated_at

8.9 String trong khi nên là enum/lookup

9. Lab — Thiết kế schema cho e-commerce

9.1 Draft schema (làm trước khi xem solution)

9.2 Sample solution

5.3 Text — `text` vs `varchar` vs `char`

5.4 Time types — `timestamp` vs `timestamptz`

D. `ltree` extension (Postgres special)

8.6 `varchar(255)` mọi nơi