Bonus 6 — MySQL Internals & High Performance

“Khóa này Postgres-first vì lý do tốt. Nhưng MySQL/MariaDB chạy nửa internet — đặc biệt Shopify, Booking, FB, Wikipedia, GitHub. Hiểu khác biệt với Postgres khi cần move qua lại.”

Tags: database mysql mariadb innodb percona vitess Thời lượng: 5-6 ngày Prerequisites: Tuan-01-DB-Internals-Refresh (Postgres internals for comparison) Liên quan: Tuan-05-Transactions-Locking · Tuan-08-Zero-Downtime-Migration


1. MySQL Ecosystem 2024-2026

1.1 Forks + flavors

  • MySQL Community (Oracle) — original, GPLv2 + commercial dual
  • MariaDB — fork by original author (Monty), OSS purist, GPLv2
  • Percona Server — Oracle MySQL + perf patches, drop-in
  • Vitess — sharding layer on MySQL (YouTube, Slack, Square, GitHub)
  • PlanetScale — managed Vitess (announced shutdown of free tier 2024, still operating)
  • Aurora MySQL — AWS reimplementation, MySQL-compatible
  • Aiven, RDS, Cloud SQL — managed MySQL

License divergence:

  • MySQL (Oracle) GPLv2 + commercial
  • MariaDB GPLv2 only
  • Percona GPLv2

1.2 Major version timeline

  • MySQL 5.6 (2013) — InnoDB online DDL
  • MySQL 5.7 (2015) — JSON, generated columns
  • MySQL 8.0 (2018) — window functions, CTEs, atomic DDL, transactional data dictionary
  • MySQL 8.0.x (continuous releases 2018-2024)
  • MySQL 8.4 LTS (2024) — first true LTS release model
  • MySQL 9.0 Innovation (2024) — vector type, JS functions
  • MariaDB 10.6 LTS (2021) — System-versioned tables, JSON
  • MariaDB 11.x (2023+) — Optimizer trace, OPTIMIZE ENGINE_TABLE

1.3 MySQL 9.0 — vector support (2024)

CREATE TABLE docs (
    id INT,
    embedding VECTOR(1536)
);
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
SELECT * FROM docs ORDER BY VECTOR_DISTANCE(embedding, '[...]', 'COSINE') LIMIT 10;

Behind pgvector in maturity but emerging.


2. InnoDB Storage Engine

2.1 Architecture

graph TB
    Buffer[InnoDB Buffer Pool<br/>LRU + adaptive] --> Pages[16KB pages]
    Pages --> Tablespace[Tablespace files<br/>.ibd per table or shared]

    Buffer --> RedoLog[Redo log<br/>circular, fixed size<br/>physical changes]
    Buffer --> UndoLog[Undo log<br/>versions for MVCC + rollback]
    Buffer --> Binlog[Binary log<br/>logical changes for replication]

    Buffer --> DoubleWrite[Doublewrite buffer<br/>torn-page protection]

2.2 Clustered Index — fundamental vs Postgres

Key difference: InnoDB table data is physically stored inside primary key B+tree. Table IS the PK index.

Postgres: heap table (unordered) + separate B-tree PK index → PK lookup = index lookup + heap fetch (2 random I/O typical)

InnoDB: PK is the B+tree leaf containing row data → PK lookup = single B+tree walk

Implications:

  • PK lookup: InnoDB ~1-2x faster
  • Secondary index lookup: hits secondary → finds PK → looks up PK tree (2 walks) vs Postgres (1 walk + 1 heap fetch)
  • Choose PK carefully — affects EVERY secondary index size
  • PK should be: small, monotonic for insert performance
  • Updating PK = physically move row (avoid!)

2.3 Why monotonic PK matters in InnoDB

UUID v4 PK in InnoDB:

  • Random insert position in B+tree
  • Page splits all over
  • Slower writes (lot of fragmentation)
  • Worse than Postgres for same workload

→ Use auto-increment BIGINT, UUID v7, or app-generated sortable ID.

2.4 Page format

Similar concept to Postgres but 16KB default (vs Postgres 8KB).

  • More space per page, less overhead per row
  • Higher fan-out in B+tree (~3 levels for billion rows)

2.5 Three logs

Different from Postgres’ single WAL:

Redo log:

  • Physical changes (“page X bytes Y..Z changed to …“)
  • Circular, fixed size
  • Recovery from crash
  • MySQL 8.0.30+: innodb_redo_log_capacity replaces old innodb_log_file_size × innodb_log_files_in_group
# Modern (8.0.30+)
innodb_redo_log_capacity = 4G
 
# Legacy (deprecated)
innodb_log_file_size = 2G
innodb_log_files_in_group = 2

Undo log:

  • Pre-modification values
  • Used for: rollback + MVCC reads (old versions)
  • Separate from redo, stored in undo tablespace
  • Auto-truncated when transactions complete

Binary log (binlog):

  • Logical changes (statements or row events)
  • For replication
  • For PITR
  • Append only

3 logs vs Postgres 1 (WAL). More to manage, more nuance, more places things can go wrong.

2.6 Doublewrite buffer

Combat torn page writes. Before writing pages to .ibd, write to doublewrite buffer first. If crash mid-write, recover from doublewrite.

In Postgres = full_page_writes in WAL. Different mechanism, same goal.

2.7 Atomic DDL (MySQL 8.0+)

Pre-8.0: DDL was multi-step, could leave inconsistent state on crash. 8.0+: data dictionary in InnoDB, DDL atomic.

ALTER TABLE huge ADD COLUMN x INT;  -- atomic, recoverable

2.8 Instant DDL (8.0.12+, expanded 8.0.29+)

Many DDL operations now instant (metadata only):

  • ADD COLUMN at end (8.0.12+)
  • ADD COLUMN anywhere (8.0.29+)
  • DROP COLUMN (8.0.29+)
  • RENAME COLUMN (8.0.x)
-- Instant in 8.0.29+
ALTER TABLE huge ADD COLUMN new_col INT, ALGORITHM=INSTANT;
ALTER TABLE huge DROP COLUMN old_col, ALGORITHM=INSTANT;

Limitations: not all columns; row format must support.

2.9 Histogram statistics (8.0.3+)

Optimizer uses histograms for non-indexed columns.

ANALYZE TABLE orders UPDATE HISTOGRAM ON status WITH 100 BUCKETS;

Helps planner pick join order without index on every column.


3. MVCC in InnoDB

3.1 Read view per transaction

  • Each transaction allocates read view at start (RR) or per-statement (RC)
  • Tracks active transactions at snapshot
  • Reads use undo log to reconstruct prior versions

3.2 Differences from Postgres

Postgres MVCCInnoDB MVCC
Old versions locationIn heap (same page)Separate undo log
VACUUM needed?Yes (clean dead tuples)No (auto-truncate undo)
Tuple has xmin/xmaxYesHidden DB_TRX_ID column
MVCC bloatYes (table grows)No (rollback segment)
ReplicationWAL streamingBinlog (logical)

3.3 Hidden columns

InnoDB tables internally have 3 extra columns:

  • DB_TRX_ID — last transaction that modified the row
  • DB_ROLL_PTR — pointer to undo log entry
  • DB_ROW_ID — used when no PK defined (avoid this!)

4. Locking — Gap & Next-Key Locks

4.1 Row locks

SELECT ... FOR UPDATE, FOR SHARE similar to Postgres.

4.2 Gap locks — InnoDB unique

In REPEATABLE READ (default), InnoDB locks not just rows but gaps between rows to prevent phantom reads.

-- Session A
BEGIN;
SELECT * FROM users WHERE age BETWEEN 20 AND 30 FOR UPDATE;
-- Locks rows 20-30, AND gaps before 20, after 30, between rows
 
-- Session B
INSERT INTO users (name, age) VALUES ('X', 25);
-- BLOCKS due to gap lock

vs Postgres SERIALIZABLE = SSI with predicate locks. Different mechanism, same goal.

4.3 Next-key locks

Combination of row lock + gap before it.

Index: (10, 20, 30, 40)
Next-key lock on 30:
  - Locks row 30
  - Locks gap (20, 30] (exclusive 20, inclusive 30)

4.4 Deadlock pattern

Deadlocks more common in InnoDB due to gap locks. Tune:

innodb_lock_wait_timeout = 50    # default 50s
innodb_print_all_deadlocks = ON  # log all

Workaround for high deadlock rate: drop to READ COMMITTED isolation (no gap locks). App handles phantom reads.

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

Pattern at scale (Shopify, GitHub): default RC, not RR. Postgres opposite (default RC, opt-in RR/Serializable).

4.5 Auto-increment locks (8.0+)

  • innodb_autoinc_lock_mode = 2 — “interleaved” — no lock for INSERT
  • Default in MySQL 8.0+
  • Pre-8.0 default 1 — could lock for INSERT…SELECT
  • Use 2 for high-concurrency INSERT

5. Replication

5.1 Binlog-based async (traditional)

Default. Primary writes binlog. Replicas read + apply.

# my.cnf primary
log-bin = mysql-bin
server-id = 1
binlog_format = ROW    # ROW > STATEMENT for safety
gtid_mode = ON
enforce_gtid_consistency = ON
 
# replica
server-id = 2
replicate-do-db = appdb

5.2 GTID — Global Transaction ID

Modern replication uses GTID (server_uuid:txn_number). Auto-positioning replicas:

-- Replica setup auto
CHANGE MASTER TO MASTER_HOST='primary', MASTER_USER='repl', MASTER_AUTO_POSITION=1;
START SLAVE;  -- or START REPLICA in 8.0.22+

No need to track binlog file + position manually.

5.3 Semi-sync

Primary waits for AT LEAST 1 replica to ack before commit.

rpl_semi_sync_master_enabled = ON
rpl_semi_sync_master_timeout = 1000  # ms (fallback to async)

Compromise: tighter durability without full sync overhead.

5.4 Parallel replication (8.0+)

Replica apply parallelized:

slave_parallel_workers = 8
slave_parallel_type = LOGICAL_CLOCK   # default 8.0

LOGICAL_CLOCK based on commit timestamps → parallel apply for non-conflicting transactions.

This is a bigger advantage MySQL has over Postgres which still has single-threaded WAL apply.

5.5 Group Replication (5.7.17+)

Built-in cluster, multi-primary or single-primary. Paxos-based.

plugin_load_add = 'group_replication.so'
group_replication_group_seeds = "node1:33061,node2:33061,node3:33061"
group_replication_start_on_boot = ON

vs Postgres: Patroni, repmgr handle similar role at orchestration level.

5.6 InnoDB Cluster

Group Replication + MySQL Router + MySQL Shell admin → “InnoDB Cluster”. Easier setup than raw Group Replication.

5.7 InnoDB ClusterSet (8.0.27+)

Disaster recovery across data centers. Cross-region MySQL clusters.


6. Performance Patterns

6.1 Buffer pool sizing

innodb_buffer_pool_size = 12G  # ~70-80% RAM (different rule from Postgres 25-35%)
innodb_buffer_pool_instances = 8  # default 8 for buffer_pool > 1GB

Why higher than Postgres? InnoDB uses buffer pool more aggressively, doesn’t rely heavily on OS page cache.

Postgres lets OS handle disk cache; MySQL takes more responsibility.

6.2 Redo log sizing

# MySQL 8.0.30+ modern
innodb_redo_log_capacity = 4G
 
# Pre-8.0.30 (still works, deprecated)
innodb_log_file_size = 2G
innodb_log_files_in_group = 2

Larger redo log → fewer checkpoints, better write throughput. Trade-off: longer recovery time.

6.3 Flush settings

innodb_flush_log_at_trx_commit = 1   # 1=durable, 2=lose 1s on crash, 0=lose more
sync_binlog = 1                       # fsync binlog after each commit

1, 1 is ACID-compliant. 2, 0 faster but data loss on crash. Trade-off knob.

# Production safe
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
innodb_flush_method = O_DIRECT       # bypass OS cache for InnoDB

6.4 Performance Schema

Like Postgres pg_stat_statements but more granular.

SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC LIMIT 10;
 
-- Top queries by time
SELECT digest_text, count_star, avg_timer_wait/1e6 AS avg_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC LIMIT 10;
 
-- Wait events
SELECT * FROM performance_schema.events_waits_summary_global_by_event_name
ORDER BY sum_timer_wait DESC LIMIT 20;

6.5 sys schema

Wrapper views on performance_schema for easier human reading.

SELECT * FROM sys.statements_with_runtimes_in_95th_percentile;
SELECT * FROM sys.io_global_by_file_by_bytes;
SELECT * FROM sys.schema_unused_indexes;
SELECT * FROM sys.schema_redundant_indexes;

6.6 Query cache (deprecated)

MySQL 8.0 removed query cache entirely.

Pre-8.0 query cache often hurt more than helped (cache invalidation overhead). Use Redis or app-level instead.

6.7 Optimizer hints (modern syntax)

SELECT /*+ INDEX(orders idx_user) */ * FROM orders WHERE user_id = 42;
SELECT /*+ MAX_EXECUTION_TIME(1000) */ ...;
SELECT /*+ NO_INDEX_MERGE(orders) */ ...;

More versatile than legacy USE INDEX.

6.8 ProxySQL — query routing

MySQL middleware:

  • Read/write split (proxy to replicas for reads)
  • Connection pooling
  • Query firewall (reject patterns)
  • Query rewriting

Setup:

INSERT INTO mysql_servers VALUES (1, 'primary', 3306, 1000, 'ONLINE');
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 1);  -- writes to primary

vs Postgres pgBouncer + app-level routing.


7. Sharding with Vitess

7.1 Architecture

graph TB
    Client --> Gate[VTGate<br/>SQL parser, router<br/>stateless]
    Gate --> Tablet1[VTTablet 1<br/>shard 1]
    Gate --> Tablet2[VTTablet 2<br/>shard 2]
    Gate --> Tablet3[VTTablet 3<br/>shard 3]

    Tablet1 --> MySQL1[(MySQL)]
    Tablet2 --> MySQL2[(MySQL)]
    Tablet3 --> MySQL3[(MySQL)]

    Topo[Topology server<br/>etcd/zk/consul]
    Tablet1 -.-> Topo
    Tablet2 -.-> Topo
    Tablet3 -.-> Topo
    Gate -.-> Topo

VTGate makes shards look like single MySQL to client.

7.2 VSchema

keyspaces:
  app:
    sharded: true
    vindexes:
      hash:
        type: hash
    tables:
      users:
        column_vindexes:
          - column: id
            name: hash
      orders:
        column_vindexes:
          - column: user_id
            name: hash

Routes by hashing column. VTGate parses query, routes to correct shard(s).

7.3 Operations

# Reshard
vtctldclient Reshard --workflow=v1 \
    --source-shards=- --target-shards=-80,80-
 
# MoveTables
vtctldclient MoveTables --workflow=move --target-keyspace=appdb \
    --source-keyspace=legacy --tables=users,orders

Online resharding — split shards as data grows. Big advantage over manual.

7.4 When Vitess

  • Outgrew single MySQL (>10TB or >50K TPS sustained)
  • Need to keep MySQL ecosystem
  • Want online resharding
  • Team comfortable with infra complexity

Big users: YouTube, Slack, GitHub (some), Square, Shopify (with Mysql sharding).

7.5 PlanetScale

Managed Vitess. Branching workflow (like git for schema), zero-downtime migrations via gh-ost integration.

Notable: removed free tier April 2024. Still operates as paid service.


8. Migration Tools Specific to MySQL

8.1 gh-ost (covered Tuần 08)

Online schema change. Better than pt-osc for most cases.

gh-ost \
    --alter="ADD COLUMN test_col INT NOT NULL DEFAULT 0" \
    --database=test --table=users \
    --execute

How: ghost table + binlog replay → atomic swap.

8.2 pt-online-schema-change (Percona Toolkit)

Older, trigger-based. Still works.

8.3 Vitess MoveTables, Reshard

Move data between keyspaces without downtime.

8.4 Percona Toolkit

Mature ecosystem:

  • pt-archiver — archive old rows
  • pt-table-checksum — verify replica integrity
  • pt-deadlock-logger — log deadlocks
  • pt-query-digest — analyze slow log
  • pt-summary — system info

8.5 mydumper / myloader

Multi-threaded mysqldump replacement. Much faster for big DBs.


9. Postgres vs MySQL Decision

flowchart TD
    A[Choose] --> B{Existing ecosystem?}
    B -->|Postgres apps team| C[Postgres]
    B -->|MySQL apps team| D[MySQL]
    A --> E{Heavy schema evolution?}
    E -->|Yes complex DDL| F[Postgres - better online DDL legacy]
    E -->|MySQL 8.0.29+ instant DDL OK| OK[Either]
    A --> G{Need rich types like JSONB, arrays, custom?}
    G -->|Yes| F
    A --> H{Massive horizontal scale needed?}
    H -->|Yes >10TB sharded| I[Vitess/MySQL or distributed Postgres]
    A --> J{Replication complex multi-region?}
    J -->|Yes| K[MySQL Group Replication or Aurora]
    A --> L{Parallel replication apply critical?}
    L -->|Yes high write rate| K2[MySQL 8 - parallel apply ahead of PG]

    style C fill:#c8e6c9
    style F fill:#c8e6c9

Personal opinion 2024-2026:

  • New project, no constraint → Postgres for richer features
  • Need parallel replica apply → MySQL 8
  • Heavy sharding from day 1 → Vitess (MySQL) or CockroachDB (Postgres-compat)
  • Existing team strong in one → stay with it; both are excellent

10. Anti-patterns

PatternWhy badFix
UUID v4 PK in InnoDBPage splits, slow insertsAuto-inc or UUID v7
No PK on InnoDB tableHidden DB_ROW_ID slow + 6 bytes/rowAlways explicit PK
innodb_buffer_pool_size = 25% RAM (Postgres rule)InnoDB needs more70-80% RAM
binlog_format = STATEMENTUnsafe with non-deterministic functionsROW
innodb_flush_log_at_trx_commit = 0 for OLTPData loss on crash1 (default)
Enabling query cache pre-8.0Mostly hurts perfAlready removed in 8.0
Long-running transactionsUndo log growsShort tx; monitor innodb_history_list_length
Frequent gap lock deadlocksDefault RRRC isolation level
Forgot to set innodb_redo_log_capacity (or old log_file_size)Frequent checkpointsSizing right
Auto-increment lock mode 0Serialize insertsMode 2 (default 8.0+)

11. Lab

Day 1: MySQL setup + performance schema

docker run -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=lab mysql:8.0

Enable performance schema, run mysqlslap workload, observe.

Day 2: Clustered index demonstration

  • Create table with auto-increment PK
  • Insert 1M rows, time it
  • Repeat with UUID v4 PK, observe slower
  • Compare with Postgres equivalent

Day 3: Gap locks reproduce

Two sessions, REPEATABLE READ, reproduce gap lock blocking. Switch to READ COMMITTED, observe no block.

Day 4: GTID replication

2 MySQL instances, GTID-based replication. Test failover.

docker run -d --name primary mysql:8.0 mysqld --log-bin --server-id=1 --gtid-mode=ON --enforce-gtid-consistency
docker run -d --name replica mysql:8.0 mysqld --server-id=2 --gtid-mode=ON --enforce-gtid-consistency --read-only

Day 5: Parallel replication

Generate write workload. Watch replica lag with parallel workers 1 vs 8. Compare.

Day 6: gh-ost migration

Run online schema change on 10M-row table while load running.

Day 7: ProxySQL routing

Setup: app → ProxySQL → primary + replicas. Configure read/write split. Test.


12. Self-check

  1. Clustered index InnoDB vs heap Postgres — pros/cons mỗi cái?
  2. UUID v4 PK trong InnoDB — vì sao tệ hơn Postgres?
  3. Redo / Undo / Binlog — 3 logs InnoDB, vai trò?
  4. Gap lock — vì sao deadlock common hơn Postgres?
  5. innodb_buffer_pool_size 70-80% RAM — vì sao khác Postgres?
  6. GTID replication — auto-positioning là gì?
  7. Parallel replication 8.0+ — advantage so Postgres?
  8. Instant DDL 8.0.29+ — operations support?
  9. Vitess sharding cơ chế?
  10. Khi nào pick MySQL vs Postgres 2024-2026?

13. Tiếp theo

Tuan-Bonus-Lakehouse-Engineering

Cập nhật: 2026-05-16