Index Decision Tree

Quick reference: bài toán nào dùng index nào.

Tags: attachment cheatsheet indexing Liên quan: Tuan-03-Indexing-Mastery

1. Master Decision Tree

flowchart TD
    Q[Bạn có query slow?] --> A{Bottleneck<br/>seq scan?}
    A -->|No| Skip[Không cần index mới]
    A -->|Yes| B{Query type?}

    B -->|Equality + range scalar| BTREE[B-tree<br/>default]
    B -->|Array containment| GIN[GIN]
    B -->|JSONB containment| GIN
    B -->|Full-text search| GIN_FTS[GIN on tsvector]
    B -->|Fuzzy text similarity| TRGM[GIN/GiST + pg_trgm]
    B -->|Geometric / spatial| GIST[GiST + PostGIS]
    B -->|Range overlap| GIST
    B -->|Append-only time-series huge| BRIN[BRIN]
    B -->|Vector similarity| HNSW[HNSW / IVFFlat pgvector]
    B -->|IP prefix tree| SPGIST[SP-GiST]
    B -->|Multi-column equality bag| BLOOM[Bloom filter ext]

    style BTREE fill:#c8e6c9
    style GIN fill:#c8e6c9
    style HNSW fill:#fff9c4

2. B-tree — When + What Column Order

When:

WHERE col = X equality
WHERE col > X range
WHERE col BETWEEN X AND Y
WHERE col LIKE 'prefix%'
WHERE col IS NULL / IS NOT NULL
ORDER BY col
JOIN on col

Multi-column order — ESR rule:

Equality columns first
Sort columns next
Range columns last

-- Query
SELECT * FROM orders WHERE user_id = $1 AND status = 'pending' ORDER BY created_at DESC;
 
-- Optimal: (user_id, status, created_at DESC)
CREATE INDEX ON orders (user_id, status, created_at DESC);

3. Specialized Index Cheatsheet

Need	Index	Example
Tags / labels array	GIN	`CREATE INDEX ON t USING gin(tags);` `WHERE tags @> ARRAY['rust']`
JSONB partial match	GIN	`CREATE INDEX ON t USING gin(meta);` `WHERE meta @> '{"k":"v"}'`
JSONB specific path	GIN with path_ops	`CREATE INDEX ON t USING gin((meta->'k'))`
Full-text English	GIN tsvector	`CREATE INDEX ON t USING gin(to_tsvector('english', body))`
ILIKE / fuzzy	pg_trgm GIN	`CREATE INDEX ON t USING gin(name gin_trgm_ops)`
Geo within distance	PostGIS GiST	`CREATE INDEX ON places USING gist(location)`
K-nearest geo	PostGIS GiST	Same, used with `<->`
Range overlap	GiST	`CREATE INDEX ON bookings USING gist(during)`
Time-series append	BRIN	`CREATE INDEX ON events USING brin(created_at)`
Vector embedding	HNSW (pgvector)	`CREATE INDEX ON docs USING hnsw(embedding vector_cosine_ops)`

4. Covering / Partial / Expression

Covering (INCLUDE)

CREATE INDEX ON orders (user_id) INCLUDE (total, created_at);
-- Enable index-only scan if VM clean

When: hot query selecting few columns alongside indexed key.

Partial

CREATE INDEX ON users (email) WHERE deleted_at IS NULL;
CREATE INDEX ON orders (created_at) WHERE status = 'pending';

When: filter always present + small subset of table.

Expression

CREATE INDEX ON users (lower(email));
CREATE INDEX ON events ((data->>'country'));

When: query uses function — must match exact expression.

5. Anti-patterns Quick Check

❌ Index on low-cardinality alone (status 3 values) — seq scan faster
❌ Index every FK column blindly — write cost
❌ LIKE '%suffix' expecting B-tree — won’t help, use trigram
❌ Function in WHERE not matching expression index
❌ Many indexes on hot-write table — kills throughput
❌ INCLUDE huge TEXT column — index bloats

6. Workflow When Query Slow

flowchart TD
    A[Slow query] --> B[EXPLAIN ANALYZE]
    B --> C{Bottleneck node?}
    C -->|Seq Scan| D[Add index matching WHERE]
    C -->|Index Scan + Filter big| E[Add INCLUDE / composite]
    C -->|Sort spill| F[Match index ORDER]
    C -->|Hash spill| G[Increase work_mem]
    C -->|Heap fetches in IO scan| H[VACUUM table]

    D --> Test[Test new plan]
    E --> Test
    F --> Test
    G --> Test
    H --> Test

    Test --> I{Better?}
    I -->|Yes| J[Keep, monitor]
    I -->|No| K[Try alternative]

7. Index Maintenance

-- Bloat check
SELECT * FROM pgstattuple('idx_orders_user');  -- avg_leaf_density >50% OK
 
-- Rebuild (online)
REINDEX INDEX CONCURRENTLY idx_orders_user;
 
-- Unused (idx_scan = 0 over time)
SELECT indexrelname FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%_pkey';
 
-- Build new (online)
CREATE INDEX CONCURRENTLY idx_new ON orders(...);

8. Multi-DB Index Quick Compare

	Postgres	MySQL InnoDB	MongoDB	DynamoDB
Default	B-tree	B+ clustered (PK)	B-tree	Hash + sort
Multi-column	Yes (ESR)	Yes (leftmost)	Yes	LSI/GSI
Partial	Yes	Functional only	Yes	Sparse GSI
Expression/Function	Yes	Yes (8.0+)	Yes	No
Full-text	GIN tsvector	InnoDB FTS	Text index	No native
Geo	GiST + PostGIS	Spatial	2dsphere	No native
Vector	HNSW pgvector	No native	Atlas Vector	No
Concurrent build	CONCURRENTLY	Online DDL	Background	N/A

Cập nhật: 2026-05-16

lthieu's notes

Explorer

index-decision-tree