Index Decision Tree

Quick reference: bài toán nào dùng index nào.

Tags: attachment cheatsheet indexing Liên quan: Tuan-03-Indexing-Mastery


1. Master Decision Tree

flowchart TD
    Q[Bạn có query slow?] --> A{Bottleneck<br/>seq scan?}
    A -->|No| Skip[Không cần index mới]
    A -->|Yes| B{Query type?}

    B -->|Equality + range scalar| BTREE[B-tree<br/>default]
    B -->|Array containment| GIN[GIN]
    B -->|JSONB containment| GIN
    B -->|Full-text search| GIN_FTS[GIN on tsvector]
    B -->|Fuzzy text similarity| TRGM[GIN/GiST + pg_trgm]
    B -->|Geometric / spatial| GIST[GiST + PostGIS]
    B -->|Range overlap| GIST
    B -->|Append-only time-series huge| BRIN[BRIN]
    B -->|Vector similarity| HNSW[HNSW / IVFFlat pgvector]
    B -->|IP prefix tree| SPGIST[SP-GiST]
    B -->|Multi-column equality bag| BLOOM[Bloom filter ext]

    style BTREE fill:#c8e6c9
    style GIN fill:#c8e6c9
    style HNSW fill:#fff9c4

2. B-tree — When + What Column Order

When:

  • WHERE col = X equality
  • WHERE col > X range
  • WHERE col BETWEEN X AND Y
  • WHERE col LIKE 'prefix%'
  • WHERE col IS NULL / IS NOT NULL
  • ORDER BY col
  • JOIN on col

Multi-column order — ESR rule:

  1. Equality columns first
  2. Sort columns next
  3. Range columns last
-- Query
SELECT * FROM orders WHERE user_id = $1 AND status = 'pending' ORDER BY created_at DESC;
 
-- Optimal: (user_id, status, created_at DESC)
CREATE INDEX ON orders (user_id, status, created_at DESC);

3. Specialized Index Cheatsheet

NeedIndexExample
Tags / labels arrayGINCREATE INDEX ON t USING gin(tags); WHERE tags @> ARRAY['rust']
JSONB partial matchGINCREATE INDEX ON t USING gin(meta); WHERE meta @> '{"k":"v"}'
JSONB specific pathGIN with path_opsCREATE INDEX ON t USING gin((meta->'k'))
Full-text EnglishGIN tsvectorCREATE INDEX ON t USING gin(to_tsvector('english', body))
ILIKE / fuzzypg_trgm GINCREATE INDEX ON t USING gin(name gin_trgm_ops)
Geo within distancePostGIS GiSTCREATE INDEX ON places USING gist(location)
K-nearest geoPostGIS GiSTSame, used with <->
Range overlapGiSTCREATE INDEX ON bookings USING gist(during)
Time-series appendBRINCREATE INDEX ON events USING brin(created_at)
Vector embeddingHNSW (pgvector)CREATE INDEX ON docs USING hnsw(embedding vector_cosine_ops)

4. Covering / Partial / Expression

Covering (INCLUDE)

CREATE INDEX ON orders (user_id) INCLUDE (total, created_at);
-- Enable index-only scan if VM clean

When: hot query selecting few columns alongside indexed key.

Partial

CREATE INDEX ON users (email) WHERE deleted_at IS NULL;
CREATE INDEX ON orders (created_at) WHERE status = 'pending';

When: filter always present + small subset of table.

Expression

CREATE INDEX ON users (lower(email));
CREATE INDEX ON events ((data->>'country'));

When: query uses function — must match exact expression.

5. Anti-patterns Quick Check

  • ❌ Index on low-cardinality alone (status 3 values) — seq scan faster
  • ❌ Index every FK column blindly — write cost
  • LIKE '%suffix' expecting B-tree — won’t help, use trigram
  • ❌ Function in WHERE not matching expression index
  • ❌ Many indexes on hot-write table — kills throughput
  • ❌ INCLUDE huge TEXT column — index bloats

6. Workflow When Query Slow

flowchart TD
    A[Slow query] --> B[EXPLAIN ANALYZE]
    B --> C{Bottleneck node?}
    C -->|Seq Scan| D[Add index matching WHERE]
    C -->|Index Scan + Filter big| E[Add INCLUDE / composite]
    C -->|Sort spill| F[Match index ORDER]
    C -->|Hash spill| G[Increase work_mem]
    C -->|Heap fetches in IO scan| H[VACUUM table]

    D --> Test[Test new plan]
    E --> Test
    F --> Test
    G --> Test
    H --> Test

    Test --> I{Better?}
    I -->|Yes| J[Keep, monitor]
    I -->|No| K[Try alternative]

7. Index Maintenance

-- Bloat check
SELECT * FROM pgstattuple('idx_orders_user');  -- avg_leaf_density >50% OK
 
-- Rebuild (online)
REINDEX INDEX CONCURRENTLY idx_orders_user;
 
-- Unused (idx_scan = 0 over time)
SELECT indexrelname FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%_pkey';
 
-- Build new (online)
CREATE INDEX CONCURRENTLY idx_new ON orders(...);

8. Multi-DB Index Quick Compare

PostgresMySQL InnoDBMongoDBDynamoDB
DefaultB-treeB+ clustered (PK)B-treeHash + sort
Multi-columnYes (ESR)Yes (leftmost)YesLSI/GSI
PartialYesFunctional onlyYesSparse GSI
Expression/FunctionYesYes (8.0+)YesNo
Full-textGIN tsvectorInnoDB FTSText indexNo native
GeoGiST + PostGISSpatial2dsphereNo native
VectorHNSW pgvectorNo nativeAtlas VectorNo
Concurrent buildCONCURRENTLYOnline DDLBackgroundN/A

Cập nhật: 2026-05-16