Bonus 3 — Embedded DBs: SQLite, DuckDB, RocksDB, LanceDB

“Server DB không phải answer cho mọi case. SQLite chạy trên 4 tỷ device. DuckDB là ‘SQLite for analytics’. RocksDB là engine behind nhiều big-data products. Hiểu khi nào embedded thắng server.”

Tags: database sqlite duckdb rocksdb embedded lancedb Thời lượng: 5-6 ngày Prerequisites: Tuan-01-DB-Internals-Refresh Liên quan: Tuan-14-OLAP-Columnar-ClickHouse · Tuan-15-Vector-DB-AI

1. Embedded DB Definition

Runs in-process with app (linked library)
No network roundtrip
Single-writer typical (some support multi-writer)
File-based persistence
App owns lifecycle (start/stop/backup)

graph LR
    subgraph "Server DB"
        App1[App] -.network.-> Server[(DB Server<br/>separate process)]
        Server --> Disk1[Disk]
    end

    subgraph "Embedded DB"
        App2[App + DB library] --> File[(DB file)]
    end

    style App2 fill:#c8e6c9

2. SQLite — Most Deployed DB on Earth

2.1 Where SQLite runs

Every iPhone, Android device
All major browsers (Chrome, Firefox, Safari)
Boeing/Airbus airplane systems
Most apps with local data (Notion local cache, Slack offline, IntelliJ, VSCode)
Test DBs, CI pipelines
Embedded devices, IoT

Estimated 4+ billion deployments. More than every other DB combined.

2.2 Why SQLite is great

Public domain (most permissive license possible)
Single file = single DB (easy backup, transport)
ACID (with WAL mode), full SQL
Battle-tested 30 years
~150KB binary
Quality: 100% MC/DC test coverage (NASA-level), TH3 test harness has hundreds of millions of test cases

2.3 Limitations

Single writer at a time (multi-reader OK)
No network (in-process only — kindof, via socket forwarders)
Theoretical max ~281 TB (4KB pages × 2^48), practical less due to single-writer
Not for high-concurrency server workload
No row-level locking (database lock for writers)
No procedures / triggers as rich as Postgres

2.4 SQLite at scale 2024-2026

Litestream

Continuous WAL replication SQLite → S3 (or compatible).

# Background daemon replicates WAL changes
litestream replicate /path/to/db.sqlite s3://bucket/db

Recovery on container start:

litestream restore -o /path/to/db.sqlite s3://bucket/db

Pattern: SQLite + Litestream = read-heavy web app on cheap container.

LiteFS

Fly.io’s distributed SQLite layer with leader-follower replication.

Status update 2024: Fly.io pivoted away from LiteFS as primary product. Project still maintained but less active. Use cautiously for new projects — Litestream more reliable.

Cloudflare D1

Managed SQLite at edge. Multi-region replication. Pay-per-query model.

Good for: low-latency global apps where data fits SQLite scale.

Turso

Managed SQLite (LibSQL fork) with replication + edge. Postgres-style billing.

2.5 SQLite mode tuning

PRAGMA journal_mode = WAL;        -- allow concurrent reads while writing
PRAGMA synchronous = NORMAL;       -- faster, slightly less safe
PRAGMA cache_size = -200000;       -- 200MB cache (negative = KB)
PRAGMA foreign_keys = ON;          -- NOT default!
PRAGMA temp_store = MEMORY;
PRAGMA mmap_size = 30000000000;    -- 30GB memory-mapped I/O
PRAGMA busy_timeout = 5000;        -- wait 5s for lock

WAL mode crucial for any concurrent app.

2.6 SQLite extensions

JSON1 — JSON functions (built-in modern)
FTS5 — full-text search
R*Tree — spatial
SpatiaLite — full GIS
sqlite-vec — vector search (2024)
sqlite-vss — vector via FAISS
rqlite — distributed SQLite over Raft

2.7 SQLite vector search (2024)

sqlite> .load sqlite-vec0
sqlite> CREATE VIRTUAL TABLE vec USING vec0(embedding float[1536]);
sqlite> INSERT INTO vec VALUES (1, '[0.1, 0.2, ...]');
sqlite> SELECT rowid, distance FROM vec WHERE embedding MATCH '[0.1, 0.2, ...]' ORDER BY distance LIMIT 10;

For small (< 1M vectors), in-process: fast + simple + no server.

2.8 When SQLite

Mobile, desktop apps (offline-first)
Single-server web app low traffic
Read-heavy with Litestream
CI test DB (faster than Postgres)
Local dev replace heavy Postgres
Edge deployment (Cloudflare D1)
AI agent state (LangChain, AutoGen)

2.9 When NOT SQLite

Multi-process write (multi-server web)
High concurrent write (>100/s sustained)
Need rich features (RLS, advanced types)
Need network access

3. DuckDB — SQLite for Analytics

3.1 Concept

In-process columnar OLAP. C++ library, embeddable from Python, R, Node, Rust, Java, Go.

import duckdb
con = duckdb.connect("analytics.db")  # or :memory:
con.execute("CREATE TABLE events AS SELECT * FROM 'events.parquet'")
df = con.execute("SELECT date, count(*) FROM events GROUP BY date").df()

Vs ClickHouse: server vs embedded. Vs SQLite: row vs columnar; analytics vs transactional.

3.2 Use cases

Notebook analytics (Jupyter, R)
ETL transformations (instead of Spark for medium data)
Local file analytics on Parquet/CSV/JSON
Embedded analytics in apps
BI tool backends (Metabase, Lightdash)
LLM agents doing data tasks
Test fixture for analytical queries

3.3 Strengths

Single binary
No server needed
Read Parquet/Arrow/CSV/JSON directly
Full SQL: joins, window functions, CTEs
Vectorized execution
Often faster than Postgres for analytics on local files
Plays well with Pandas/Polars dataframes

3.4 Limitations

Single-process (mostly)
Memory pressure on huge data (streaming exec but some ops need RAM)
Not designed for OLTP
Single writer

3.5 DuckDB Features 2024-2026

Extensions ecosystem 2024-2026:

httpfs — read S3/HTTPS directly
postgres — query Postgres tables as DuckDB tables
mysql — query MySQL
sqlite — query SQLite (interop)
json — JSON support
parquet — Parquet read/write
iceberg — Apache Iceberg (2024)
delta — Delta Lake (2024)
spatial — GIS
fts — full-text search
vss — vector similarity (2024)
excel — read xlsx

INSTALL httpfs;
LOAD httpfs;
SELECT * FROM 's3://bucket/events.parquet' LIMIT 10;

3.6 MotherDuck

Cloud DuckDB service. Hybrid local + cloud:

Local: instant queries on local data
Cloud: scale to TB on managed compute
Federated: query both with single SQL

Pattern: dev local, prod cloud. Same code.

3.7 DuckDB vs ClickHouse

	DuckDB	ClickHouse
Deployment	Embedded library	Server cluster
Concurrency	Single-process	Multi-user, many connections
Data size	Single machine (TB)	Cluster (PB)
Best for	Analyst notebooks, embedded analytics, ETL	Production analytics service
Setup	`pip install duckdb`	Cluster setup
Cost	Free, no infra	Server cost

3.8 Reading from cloud

-- Direct from S3
SELECT count(*) FROM 's3://bucket/events/*.parquet';
 
-- From Postgres
INSTALL postgres;
ATTACH 'postgres://...' AS pg;
SELECT count(*) FROM pg.users;

3.9 Performance tricks

-- Set threads
SET threads = 8;
 
-- Memory limit
SET memory_limit = '4GB';
 
-- Out-of-core execution (spill to disk)
SET temp_directory = '/tmp/duckdb';
 
-- Profile
EXPLAIN ANALYZE SELECT ...;

4. RocksDB — KV Storage Primitive

4.1 What RocksDB is

LSM-tree key-value storage engine. Embeddable C++ library. Forked from LevelDB (Google) by Facebook.

Not a database per se — a storage engine that other databases build on.

4.2 Who uses RocksDB

CockroachDB — uses Pebble (RocksDB inspired, Go rewrite)
TiDB / TiKV — uses RocksDB underneath
MyRocks — MySQL storage engine alternative to InnoDB (Facebook)
YugabyteDB — DocDB built on RocksDB
Cassandra (some setups via sidecar)
Apache Kafka — state stores
Apache Flink — RocksDB state backend
Apache Spark structured streaming
Solana validator — account state
Many distributed systems

4.3 LSM-tree mechanics

graph TB
    Write[Write] --> WAL[Write-Ahead Log]
    Write --> MemTable[MemTable in RAM<br/>sorted skip list]
    MemTable -->|flush when full| L0[Level 0 SST files]
    L0 -->|compaction| L1[Level 1]
    L1 -->|compaction| L2[Level 2]
    L2 -->|compaction| LN[Level N]

    Read[Read] --> MemTable
    Read --> L0
    Read --> L1
    Read --> L2
    Read --> LN

Write: append to WAL (durability) + MemTable (sorted in-memory)
MemTable full: flush to L0 SST file on disk
Background compaction: merges files level by level, deduplicates

4.4 Trade-offs vs B-tree

	B-tree (InnoDB, Postgres)	LSM (RocksDB)
Write	In-place (random I/O)	Append (sequential I/O)
Read	1-2 disk I/O (log N)	Possibly N levels
Write throughput	Lower	Higher
Read throughput	Higher	Lower
Write amp	Lower	Higher (compaction rewrites)
Space amp	Lower	Higher (multiple versions)
Best for	OLTP balanced	Write-heavy, append-mostly

4.5 RocksDB tuning knobs

write_buffer_size — MemTable size
max_write_buffer_number — how many MemTables
level0_file_num_compaction_trigger — when to compact L0
target_file_size_base — SST file size
bloom_filter_bits_per_key — bloom filter density (reduce reads)
compression — Snappy / LZ4 / Zstd

Each knob trades read vs write vs space.

4.6 When use RocksDB directly

Rare for app code. Common for:

Building storage layer of distributed system
Embedded high-write KV (queue, state)
Workloads needing tunable LSM
KV store inside app for state (Flink, Kafka Streams)

App devs: probably never directly. Important to know it exists when reading “X DB uses RocksDB”.

5. LanceDB — Embedded Vector DB

5.1 Concept

Lance file format + LanceDB query engine. Apache Arrow-based. Embedded mode.

import lancedb
db = lancedb.connect("./data")
table = db.create_table("docs", data=[
    {"id": 1, "embedding": [0.1, 0.2, ...], "text": "..."}
])
results = table.search([0.1, 0.2, ...]).limit(10).to_pandas()

5.2 Strengths

Single library, no server
Versioning + time travel (like Iceberg)
Vector + scalar filter
Fast on local SSD
S3 support (sync to cloud)

5.3 Use case

AI agents with local memory
Local RAG (over your notes, docs)
ML feature store (embedded mode)
Prototyping vector workflows

5.4 vs pgvector / Qdrant

	LanceDB	pgvector	Qdrant
Deploy	Embedded	Postgres ext	Server
Versioning	Built-in	No	No
Setup	`pip install`	Postgres	Docker
Scale	Local-first	Med	Server-side
Production	OK for read-mostly	Good general	Best for filtered search

6. Other Embedded Options

6.1 LMDB

Lightning Memory-Mapped Database. B-tree, memory-mapped.

Use cases:

State stores (etcd v2 used it)
Caches in-process
Bitcoin Core
OpenLDAP backend

Strengths: ACID, very fast reads, single-writer/multi-reader.

6.2 BoltDB / bbolt

Go-native B+tree KV store. Simple, single-writer.

Used by: etcd v3, Consul, InfluxDB v1.

Strengths: pure Go (no cgo), simple API.

6.3 BadgerDB

Go-native LSM. RocksDB equivalent for Go ecosystem.

Used by: Dgraph, IPFS.

Strengths: faster pure-Go alternative to RocksDB binding.

6.4 LevelDB

Google’s original LSM library. RocksDB descendant.

Used by: IPFS, blockchain projects, Chrome IndexedDB internals.

6.5 sled

Modern Rust embedded DB. Lock-free, B-epsilon tree. Promising but less battle-tested.

6.6 Redb

Pure Rust embedded KV store. Active development 2024.

7. Embedded vs Server Decision

flowchart TD
    A[Need DB] --> B{Multi-process / multi-host access?}
    B -->|Yes| C[Server DB]
    B -->|No - single app| D{Concurrency level?}
    D -->|Low to medium <100 writes/s| E[Embedded OK]
    D -->|High concurrent writes| C
    A --> F{Network latency matters?}
    F -->|Yes - sub-ms required| E
    F -->|No| C
    A --> G{Operational simplicity?}
    G -->|Maximum simple| E
    G -->|OK ops team| C

    style E fill:#c8e6c9
    style C fill:#fff9c4

8. Production Patterns

8.1 SQLite + Litestream for low-traffic SaaS

graph LR
    App[App container] --> SQLite[(SQLite file)]
    SQLite -.replicate WAL.-> Litestream[Litestream]
    Litestream -.upload.-> S3[(S3)]

    S3 -.restore on container start.-> SQLite

Patterns:

Containerized app with SQLite local
Litestream replicates WAL to S3 continuously (latency ~1s)
Crash/restart → restore latest from S3
Cost: $5/ m o n t h v s$ 50/month managed Postgres
Latency: 0 (in-process)

Limits: single-writer. OK for: personal SaaS, marketing site, dashboards, prototype.

8.2 DuckDB for embedded analytics

App has local analytics needs (admin dashboard, in-app reports):

# In Python app
import duckdb
con = duckdb.connect(":memory:")
con.execute("CREATE VIEW orders AS SELECT * FROM read_postgres('...', 'orders')")
report = con.execute("SELECT date, count(*) FROM orders GROUP BY date").df()

No separate analytical DB. Fast.

8.3 SQLite as test fixture

# Production: Postgres
# Test: SQLite (faster setup, parallel test isolated)
DB_URL = "sqlite:///:memory:" if TESTING else "postgresql://..."

Caveat: SQL dialect differences. SQLAlchemy abstracts. Or use real Postgres in tests (testcontainers) for fidelity.

8.4 LanceDB for local LLM agent

# AI agent with local memory + RAG
import lancedb
db = lancedb.connect("./agent_memory")
mem = db.open_table("memory")
 
# Retrieve relevant memories
relevant = mem.search(query_embedding).limit(5).to_pandas()
 
# Use as LLM context

No server, no API key, all local. Perfect for dev.

8.5 DuckDB in data pipeline

# Replace Spark for medium data
import duckdb
duckdb.sql("""
    COPY (
        SELECT date_trunc('day', ts) as day, count(*) as events
        FROM read_json_auto('logs/*.json')
        GROUP BY day
    ) TO 'output.parquet' (FORMAT PARQUET)
""")

Single laptop processes 100GB/hour. No cluster.

9. Tooling

DBeaver — SQLite, DuckDB GUI
sqlite-web — SQLite web UI
DuckDB CLI — interactive shell
DBBrowser for SQLite — friendly GUI
DataGrip — paid commercial
MotherDuck — cloud DuckDB UI

10. Anti-patterns

Pattern	Why bad
SQLite for high-concurrent write web app	Single writer locks app
SQLite without WAL mode	Concurrent reads blocked
SQLite + foreign_keys = OFF (default!)	Referential integrity ignored
DuckDB for OLTP (frequent small writes)	LSM-style not designed
DuckDB as long-running service	Not designed multi-user concurrent
Direct RocksDB usage in app code	Reinventing wheel
LMDB without sizing right	Map file grows unbounded if not sized
Treating embedded as drop-in for Postgres	Feature gap
Backup by file copy while running	WAL not consistent; use VACUUM INTO or hot backup tool

11. Lab

Day 1: SQLite WAL benchmarking

Run pgbench-equivalent workload on SQLite WAL vs default journal. Compare TPS.

# SQLite + bench
sqlite3 test.db
PRAGMA journal_mode = WAL;
-- ... run workload

Day 2: Litestream backup/restore

Setup app writing to SQLite + Litestream to local S3 (minio). Kill, restore.

litestream replicate ./db.sqlite file://./backup
litestream restore -o ./db.sqlite file://./backup

Day 3: DuckDB pipeline

Load 100M-row dataset (NYC Taxi). Build aggregation. Compare with Postgres on same data + same query.

Day 4: DuckDB + S3

Read Parquet directly from S3:

INSTALL httpfs; LOAD httpfs;
SELECT count(*) FROM 's3://bucket/data.parquet';

Day 5: RocksDB intro

Use Python/Go RocksDB binding for KV store. Tune compaction. Watch SSTs build.

Day 6: LanceDB RAG

Build local RAG with LanceDB + sentence-transformers + local LLM. Query offline.

Day 7: Migration exercise

Move small Postgres DB to SQLite + Litestream. Estimate cost savings. Compare query latency.

12. Self-check

SQLite + WAL mode — vì sao mandatory cho concurrent?
Litestream giải quyết vấn đề gì với SQLite?
DuckDB vs ClickHouse — pick mỗi cái khi nào?
RocksDB là DB hay storage engine? Ai dùng nó?
LSM vs B-tree — write/read trade-off?
SQLite max DB size theoretical?
LanceDB vs pgvector — pros/cons embedded approach?
SQLite at edge (Cloudflare D1) — use case?
DuckDB extensions — name 5?
Khi nào embedded thắng server?

13. Tiếp theo

Tuan-Bonus-CDC-Debezium

Cập nhật: 2026-05-16

lthieu's notes

Explorer

Tuan-Bonus-Embedded-DBs