Tuần 06 — Connection Pooling & PgBouncer
“Bạn config Postgres
max_connections = 1000. App spawn 800 connection. Mọi thứ chạy đẹp đẽ — đến khi traffic 2x lên, app spawn 1500 → reject. Bạn tăng max_connections → RAM OOM, kernel kill process Postgres. Tại sao Postgres không ‘just scale’? Vì model process-per-connection. Hôm nay học cách bypass nó.”
Tags: database postgresql pgbouncer connection-pooling operations Thời lượng: 7 ngày (4-5h/ngày) Prerequisites: Tuan-01-DB-Internals-Refresh (process model) Liên quan: Tuan-09-DB-Observability-Tuning
1. Context & Why
1.1 Postgres process-per-connection model
graph LR Postmaster[postmaster<br/>main process] --> B1[Backend 1<br/>~10MB RAM] Postmaster --> B2[Backend 2<br/>~10MB RAM] Postmaster --> B3[Backend N<br/>~10MB RAM] Postmaster --> Bgw[bgwriter, walwriter,<br/>autovacuum workers] App1[App Client 1] -.TCP.-> B1 App2[App Client 2] -.TCP.-> B2 App3[App Client N] -.TCP.-> B3
Mỗi connection = 1 OS process. Cost:
- ~10MB base RAM
- ~1-3 ms setup time (fork + auth)
- File descriptors, kernel scheduling
- Backend allocates work_mem when needed (per operation)
Soft limit thực tế: 100-300 active connections. Trên đó:
- RAM cạn (process overhead × N)
- CPU context switching dominate
- Postgres internal contention (lightweight locks)
1.2 Vấn đề: App muốn hàng nghìn connection
Web app pattern: mỗi request → 1 connection. 10K req/s đỉnh → 10K concurrent connection. Postgres không scale.
graph TB subgraph "Không có pool" C1[App 1] --persistent-- DB1[(Postgres)] C2[App 2] --persistent-- DB1 C3[App 3] --persistent-- DB1 C4[App N=1000] --persistent-- DB1 DB1 -.crash.-> X[OOM 💥] end subgraph "Có pool" D1[App 1] --> P[PgBouncer] D2[App 2] --> P D3[App N=10000] --> P P --20 conns--> DB2[(Postgres)] end style X fill:#ef9a9a style DB2 fill:#c8e6c9
1.3 Mục tiêu tuần
- Hiểu rõ Postgres process model + ngân sách connection
- Compute pool size đúng (Little’s Law)
- Setup PgBouncer 3 modes — pick đúng cho từng workload
- Diagnose connection storm sau restart
- Hiểu prepared statement trap với transaction pooling
- Compare PgBouncer vs RDS Proxy vs PgCat vs Supavisor
- Monitor pool health: waiting clients, idle connections, churn
1.4 Tham chiếu
- PgBouncer docs — https://www.pgbouncer.org/usage.html
- Brandur Leach — PgBouncer transaction pooling — https://brandur.org/postgres-connections
- Citus blog — Connection pooling — https://www.citusdata.com/blog/
- Supabase Supavisor — https://github.com/supabase/supavisor (Elixir-based, multi-tenant)
- PgCat — https://github.com/postgresml/pgcat (Rust, load balancing + sharding)
- RDS Proxy docs — AWS
2. Pool Sizing — Toán học
2.1 Little’s Law
L (concurrent requests) = λ (req/sec) × W (sec/req)
Ví dụ:
- 1000 req/s
- 50 ms per query (avg)
- Concurrent queries: 1000 × 0.05 = 50
→ Cần ~50 connection. Không phải 1000.
2.2 Formula thực dụng
pool_size = (CPU_cores × 2) + effective_spindle_count
- HikariCP recommendation
- “Spindle count” = effective disk parallelism (SSD = high)
- Modern cloud: 16 CPU, NVMe → ~32-50 connections
Postgres specific: With max_connections = 100:
- Reserve ~10 cho superuser, autovacuum, replication
- App có 90 usable
- Distributing: 3 app instances × 30 each, hoặc 1 pgbouncer pool
2.3 Pool size quá cao
Hiện tượng:
- Throughput giảm khi tăng pool size
- CPU 100% mà queries chậm hơn
- Context switching wasted
Tại sao? Mỗi connection compete CPU. Quá nhiều thread thrashing > work.
2.4 Pool size quá thấp
Hiện tượng:
- App queue chờ pool
- P99 latency tăng
- Throughput plateau dưới capacity
Đo: monitor pool waiting clients của PgBouncer.
2.5 Optimal — Find by experiment
graph LR A[Start: pool_size = CPUcoresx2] --> B[Run load test] B --> C[Measure: TPS, P99 latency] C --> D{TPS plateau?} D -->|No| E[Increase pool size] D -->|Yes - throughput max| F{Latency acceptable?} F -->|Yes| G[Done] F -->|No| H[Reduce pool size<br/>trade latency for throughput] E --> B H --> B
3. PgBouncer — Lightweight Pooler
3.1 Architecture
graph LR subgraph "Clients" C1[App 1<br/>500 conn] C2[App 2<br/>500 conn] C3[App 3<br/>500 conn] end subgraph "PgBouncer" FB[Front: 1500 client conns<br/>single Linux process<br/>libevent-based] BB[Back pool: 20-50 conns] end subgraph "Postgres" PG[(Postgres 16<br/>max_connections=100)] end C1 --> FB C2 --> FB C3 --> FB FB --> BB --> PG
Key: PgBouncer single-process, async I/O (libevent). Một instance handle ~10K client connections trên 1 CPU core.
3.2 3 Pool modes
graph TB subgraph "Session pooling" S1[Client → server tied for whole session<br/>Most compatible<br/>Limited gain<br/>~1:1 mapping like no pool] end subgraph "Transaction pooling" T1[Client → server only for one transaction<br/>Server returned to pool on COMMIT/ROLLBACK<br/>Max gain<br/>Some features broken] end subgraph "Statement pooling" ST1[Server returned after EACH statement<br/>Auto-rollback uncommitted<br/>Almost never used] end style T1 fill:#c8e6c9 style ST1 fill:#ffccbc
3.3 Session pooling
- Default
- Client gets dedicated server connection until disconnect
- Multiplexing: many short clients reuse same server
- Limitations: nothing significant — full compatibility
Khi dùng:
- App có long-lived connections (đa số ORM)
- Khi không thể đảm bảo transaction-clean state
Pool size = max concurrent client connections, không phải request rate.
3.4 Transaction pooling — game changer
Server returned to pool sau mỗi COMMIT/ROLLBACK.
Use case: 10K clients, 99% time idle giữa transactions → handful of servers đủ.
Critical limitation: features không work:
- Prepared statements (pre PgBouncer 1.21 — đã fix 2024)
- Session-level state:
SET,LISTEN/NOTIFY, temporary tables, advisory session locks - Cursor
WITH HOLD SET ROLEpersistent
Pre-PgBouncer 1.21 trap: app dùng PREPARE name AS ... → prepared statement bind on server X. Next transaction lands server Y → “prepared statement does not exist”. Symptoms: random errors under load.
3.5 PgBouncer 1.21+ (Sept 2023) — Prepared statement support
Background: Simple vs Extended protocol
graph TB subgraph "Simple protocol" S1[Client → Server: Query 'SELECT * WHERE id=42'] S2[Server: parse + execute + return] end subgraph "Extended protocol" E1[PARSE name + 'SELECT * WHERE id=$1'] E2[BIND name + 42] E3[EXECUTE name] E4[Result] end subgraph "Explicit PREPARE" P1[PREPARE q AS 'SELECT * WHERE id=$1'] P2[EXECUTE q 42] end
Simple protocol — 1 message, full SQL inline. No prepare. Works fine with transaction pooling because no server state.
Extended protocol (PARSE/BIND/EXECUTE) — Most modern clients (asyncpg, psycopg3, pgx, pg-promise) use this even when you write “simple-looking” code. Library auto-prepares for SQL injection safety.
Explicit PREPARE/EXECUTE — User-level PREPARE q AS ... statement. Less common in app code, more in functions/scripts.
The historical pain (pre-1.21)
Extended protocol with transaction pooling:
- Client sends
PARSE→ PgBouncer routes to server X, X prepares - Connection returned to pool after COMMIT
- Next transaction → maybe server Y, no prepared statement →
prepared statement does not exist
Symptoms: random failures under load, especially after scaling out. Hard to debug.
Workarounds pre-1.21:
- Force session pooling (defeats benefit)
- Force simple protocol (most drivers don’t expose option cleanly)
- App:
DISCARD ALLafter each transaction (kills prepared cache, slow) - App: regenerate prepared statement on error (retry logic)
PgBouncer 1.21+ solution
PgBouncer tracks prepared statements client-side, replays them when server switch happens transparently.
[pgbouncer]
max_prepared_statements = 100 ; per-client maxHow it works:
- Client sends PARSE → PgBouncer caches name+sql client-side
- Forwards to server, server prepares
- Connection returned to pool
- Next transaction lands different server
- Client sends EXECUTE → PgBouncer detects missing prepared on this server, re-runs PARSE silently, then EXECUTE
Transparent to app. Massive win.
Compatibility 2024-2026
| Library | Extended protocol | PgBouncer 1.21+ tx pooling | Notes |
|---|---|---|---|
| psycopg3 (Python) | Yes | ✓ | Set prepare_threshold or 0 to disable |
| asyncpg (Python) | Yes, named auto | ✓ | Was main problem child pre-1.21 |
| node-postgres (pg) | Yes (extended after prepare) | ✓ | Mostly fine |
| pgx (Go) | Yes | ✓ | SimpleProtocol=true was workaround |
| pg-promise | Yes | ✓ | Inherits node-postgres |
| psycopg2 (Python) | Mostly simple | ✓ | Less affected, mostly fine |
| Hibernate (Java) | Yes, server-prepared | ✓ | useServerPrepStmts=true |
Test with your stack before declaring “fine”.
Versions worth knowing
- 1.21 (Sept 2023) — initial prepared statement support
- 1.22 — improved error handling, more reliable cancel
- 1.23 (2024) — better metrics, SSL improvements
Settings
max_prepared_statements = 100 ; if 0 = feature disabled
server_reset_query = ; can clear in 1.21+ (was 'DISCARD ALL')Don’t run DISCARD ALL as server_reset_query with 1.21+ — undoes prepared statement caching.
Monitoring prepared
SHOW POOLS;
-- New columns in 1.21+: see prepared stmt count per poolLogs show prepared replays:
INFO sending prepared statement '%s' to server
Edge cases still
- Cursor
WITH HOLD— still tied to session, transaction pool breaks - Temporary tables — session state, broken
- Advisory session locks — broken; use
pg_advisory_xact_lockinstead SET(notSET LOCAL) — session state, broken- LISTEN/NOTIFY — needs persistent session
For these, use session pooling on a dedicated pool, or direct connection.
3.6 Setup PgBouncer
; pgbouncer.ini
[databases]
appdb = host=localhost port=5432 dbname=appdb
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600 ; close idle server after 10min
server_lifetime = 3600 ; recreate server connection after 1h (avoid leaks)
server_reset_query = DISCARD ALL
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
max_prepared_statements = 100 ; PgBouncer 1.21+userlist.txt:
"app_user" "SCRAM-SHA-256$4096:..."
3.7 Monitor PgBouncer
-- Connect to PgBouncer admin db
psql -p 6432 -U pgbouncer pgbouncer
-- Active pools
SHOW POOLS;
-- database, user, cl_active, cl_waiting, sv_active, sv_idle, sv_used, sv_tested, sv_login
-- Stats
SHOW STATS;
-- total_xact_count, total_query_count, total_received, total_sent, total_xact_time, total_query_time
-- Lists
SHOW LISTS; -- numbers
SHOW CLIENTS; -- per-client info
SHOW SERVERS; -- per-server info
SHOW DATABASES; -- configured DBsKey metrics:
cl_waiting> 0 (sustained) — pool too small, clients queueingsv_idleconsistently high — pool too big, wastetotal_query_time / total_query_count— avg query latency
3.8 Reload config without restart
RELOAD; -- reload pgbouncer.ini
RECONNECT; -- reset all server connections
SUSPEND; -- pause all activity (e.g., for failover)
RESUME;4. Transaction Pool Compatibility
4.1 Cái gì BREAK
Khi dùng transaction pooling, không đảm bảo subsequent requests dùng same backend. Nên session state mất.
Features ảnh hưởng:
graph TB subgraph "BROKEN với transaction pooling" A1[LISTEN/NOTIFY] A2[Temporary tables] A3[Advisory session locks<br/>pg_advisory_lock] A4[SET role/search_path persistent] A5[Cursor WITH HOLD] A6[Prepared statement pre-1.21] end subgraph "WORK" B1[Standard SQL] B2[Transactions] B3[Advisory xact locks<br/>pg_advisory_xact_lock] B4[SET LOCAL] B5[Server-side cursor in txn] B6[Prepared statement 1.21+] end style A1 fill:#ef9a9a style A2 fill:#ef9a9a style A3 fill:#ef9a9a style A4 fill:#ffccbc style B1 fill:#c8e6c9 style B3 fill:#c8e6c9 style B4 fill:#c8e6c9
4.2 SET LOCAL pattern
BEGIN;
SET LOCAL search_path TO tenant_42;
SET LOCAL statement_timeout = '5s';
SELECT * FROM customers;
COMMIT;
-- SET LOCAL chỉ có effect trong transaction → works with transaction poolingvs:
SET search_path TO tenant_42; -- session-level, broken4.3 Migration framework cẩn thận
Flyway, Liquibase, sqlx migrate có thể dùng session features (LOCK, advisory). Khi run qua PgBouncer transaction mode → broken.
→ Pattern: migrations bypass PgBouncer, connect direct port 5432:
App → PgBouncer (6432) → Postgres (5432)
Migration tool → Postgres (5432) directly
4.4 LISTEN/NOTIFY — không work
# Won't work via transaction pooling
conn.execute("LISTEN order_created")
# Next call may land different server, LISTEN lost→ Use direct connection cho pub/sub. Hoặc dedicated pool with session mode.
5. Connection Storms
5.1 Vấn đề
sequenceDiagram participant Apps as App fleet (200 instances) participant DB as Postgres Note over DB: Restart for upgrade DB-->>Apps: All connections drop Note over Apps: 200 instances retry... Apps->>DB: 200 × 50 = 10K reconnect attempts Note over DB: fork 10K backends...<br/>OOM 💥
Mỗi connection trên Postgres = fork + auth + memory. 10K simultaneous → kernel struggles.
5.2 Mitigation
1. PgBouncer in front — connection storm hit PgBouncer, không phải Postgres:
Apps → PgBouncer (handle storm) → Postgres (max 20 conn always)
2. App-level connection pool with stagger:
# psycopg pool with retries + jitter
from psycopg_pool import ConnectionPool
pool = ConnectionPool(
conninfo,
min_size=2,
max_size=10,
max_lifetime=3600,
reconnect_timeout=60,
reconnect_failed=lambda pool: time.sleep(random.uniform(0.5, 2.0)) # jitter
)3. Slow startup: Don’t restart all app instances simultaneously. Rolling restart with delay.
4. Health check tuning: K8s readiness probe should fail fast nếu DB unreachable, prevent thundering herd.
6. RDS Proxy (AWS)
6.1 Managed PgBouncer-like
AWS RDS Proxy. Tương đương PgBouncer transaction mode + managed.
Pros:
- Fully managed (HA, scaling, monitoring CloudWatch)
- IAM authentication
- Failover-aware (smooth across DB failover)
Cons:
- Cost: ~$0.015/hour per IO unit
- Latency add ~1-3ms
- Some features còn limited
Pattern 2024: nếu trên AWS RDS, default dùng RDS Proxy trừ khi có lý do tránh.
6.2 Compared with PgBouncer
| PgBouncer | RDS Proxy | |
|---|---|---|
| Setup | Manual (EC2 / k8s) | Click ON |
| HA | DIY | Built-in |
| Pool modes | session/transaction/statement | session/transaction only |
| Custom config | Full control | Limited |
| Cost | EC2 cost | RDS Proxy IO units |
| Latency | ~0.3ms | ~1-3ms |
| Features | Some incompat | More compat (Aurora Serverless) |
7. Alternatives — Modern Poolers 2024-2026
7.1 PgCat (Rust)
https://github.com/postgresml/pgcat
Features beyond PgBouncer:
- Load balancing across replicas
- Read/write split via SQL parser
- Sharding (route by hash)
- Auth proxying
Use case: starting to outgrow PgBouncer (need read replicas routing).
7.2 Supavisor (Elixir)
Supabase’s pooler. Built on Erlang/OTP → great for many tenants.
Features:
- Multi-tenant pool isolation
- Cluster mode (replicate state)
- Connection migration
Use case: SaaS with thousands of tenants needing isolation.
7.3 Odyssey (Yandex)
Multi-threaded PgBouncer. Better CPU utilization on multi-core.
Pros:
- Multi-threaded (PgBouncer is single-threaded)
- Storage-aware routing
- Yandex production-proven
Cons:
- Less community than PgBouncer
- Config complexity
7.4 Built-in pool (PostgreSQL 17+ planned)
Long-running proposal: Postgres native built-in pooler. PG17 not yet, PG18+ possibly. Watch the space.
7.5 Decision matrix
flowchart TD A[Need pooler?] --> B{Workload?} B -->|Small <1000 conns, no special features| C[PgBouncer session mode] B -->|High concurrency, need transaction pooling| D[PgBouncer transaction mode] B -->|Need read/write split| E[PgCat] B -->|Multi-tenant SaaS| F[Supavisor] B -->|AWS RDS / Aurora| G[RDS Proxy] B -->|Multi-core box, 100K+ clients| H[Odyssey] style C fill:#c8e6c9 style D fill:#c8e6c9
8. App-side: ORM Pool Tuning
8.1 Common pool implementations
| ORM/Lib | Pool implementation |
|---|---|
| psycopg3 (Python) | ConnectionPool |
| asyncpg (Python async) | asyncpg.create_pool() |
| SQLAlchemy | QueuePool (default) |
| Node.js node-postgres | pg.Pool |
| Go pgx | pgxpool |
| Java | HikariCP |
| Ruby ActiveRecord | ConnectionPool |
8.2 Common settings
# Example: psycopg3
ConnectionPool(
conninfo="postgresql://user:pass@host/db",
min_size=5, # warm pool
max_size=20, # cap
max_lifetime=3600, # recycle after 1h
max_idle=600, # close if idle 10min
timeout=30, # wait for available conn
)// node-postgres
const pool = new Pool({
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
allowExitOnIdle: false,
});8.3 Pool stacking
graph LR App[App with pool=20] --> PgBouncer[PgBouncer pool=50] PgBouncer --> PG[(Postgres max=100)]
If app pool maxes out → app queue. If PgBouncer pool maxes → backend queue (waiting on PgBouncer).
Sizing:
- App pool < PgBouncer pool < Postgres max_connections
- E.g., 10 app × 20 pool = 200 client conns → PgBouncer 50 conns → Postgres 100 max
8.4 Connection lifetime
Long-lived connection:
- ✅ Avoid setup overhead
- ❌ Server-side memory accumulates
- ❌ Stale connection sau DB restart
→ max_lifetime recycle. Default 1h-2h typical.
8.5 Health check / validation
Idle connection in pool có thể dead (network drop, DB restart). Validate trước trả về app:
ConnectionPool(
...,
check=lambda conn: conn.execute("SELECT 1"), # health check
)Trade-off: latency overhead per acquire. Disable cho hot path.
9. Monitoring & SLOs
9.1 Key metrics
Postgres side:
pg_stat_database.numbackends— active connectionspg_stat_activitystate distributionpg_stat_database.xact_commit, xact_rollback— TPSpg_stat_bgwriter— checkpoint, buffer activity
PgBouncer side:
cl_active, cl_waiting— client queueingsv_idle, sv_active— server utilizationrequest_per_second, query_per_second— load
App side:
- Pool acquire wait time
- Pool exhaustion errors
- Query duration P50/P95/P99
9.2 Dashboard pattern
graph LR PG[Postgres exporter] --> Prom[Prometheus] PGB[PgBouncer exporter] --> Prom App[App metrics] --> Prom Prom --> Graf[Grafana dashboard] Graf -.alerts.-> Slack[Slack/PagerDuty]
Tool: prometheus-pgbouncer-exporter, postgres_exporter.
9.3 SLOs to track
| Metric | Target |
|---|---|
| Connection acquire P99 | <100ms |
| Pool waiting clients | <1% time waiting |
| Postgres connections | <80% max_connections |
| PgBouncer cl_waiting sustained | <5 |
| App connection error rate | <0.01% |
9.4 Alerts
# Pool exhaustion
- alert: PgBouncerWaiting
expr: pgbouncer_pools_client_waiting_count > 10
for: 5m
# Postgres connections high
- alert: PostgresConnectionsHigh
expr: pg_stat_database_numbackends / pg_settings_max_connections > 0.8
for: 5m
# Storm
- alert: ConnectionsSpike
expr: rate(pg_stat_database_numbackends[1m]) > 50
for: 1m10. Production Patterns
10.1 Multiple pools for different workloads
; pgbouncer.ini
[databases]
appdb_oltp = host=primary dbname=appdb pool_size=50 pool_mode=transaction
appdb_reports = host=primary dbname=appdb pool_size=5 pool_mode=session ; long sessions for reports
appdb_replica = host=replica dbname=appdb pool_size=30 pool_mode=transactionApp route by workload:
- OLTP web requests →
appdb_oltp - BI queries →
appdb_reports - Read-only API →
appdb_replica
10.2 Multi-region
graph TB subgraph "Region US" AppUS[App US] PgbUS[PgBouncer US] PgUS[(Postgres Primary US)] end subgraph "Region EU" AppEU[App EU] PgbEU[PgBouncer EU] PgEUR[(Postgres Replica EU)] end AppUS --> PgbUS --> PgUS AppEU --> PgbEU --> PgEUR PgUS -.replication.-> PgEUR
PgBouncer local to app → low latency to pool, then traverse region for writes.
10.3 PgBouncer HA
PgBouncer SPOF nếu chỉ 1 instance. Options:
- 2+ instances behind LB (HAProxy, AWS NLB)
- Sticky connections OK với transaction mode
- Failover scripts (Patroni, repmgr)
K8s pattern:
apiVersion: apps/v1
kind: Deployment
spec:
replicas: 3
template:
spec:
containers:
- name: pgbouncer
image: pgbouncer:1.21
resources: { requests: { cpu: 500m, memory: 256Mi } }LB service ClusterIP for app to connect.
10.4 Authentication
PgBouncer needs to authenticate client → server. 3 modes:
auth_type = trust— no auth (dev only)auth_type = md5— legacyauth_type = scram-sha-256— modern (PgBouncer 1.16+)
Modern setup: pgbouncer + scram, user list synced from Postgres.
10.5 Statement timeout
[databases]
appdb = host=... query_wait_timeout=120 idle_transaction_timeout=30PgBouncer kill server connection if query waits too long. Backstop for runaway queries.
11. Anti-patterns
| Pattern | Tại sao tệ | Fix |
|---|---|---|
max_connections = 1000 trên Postgres | OOM, context switch hell | Pool, không tăng max_connections |
| No pooler, app pool 20 × 50 instances = 1000 conn | Connection storm waiting | PgBouncer in front |
| Pool size = ∞ in app | Effective DoS to DB | Always cap |
| Transaction pool mode + LISTEN/NOTIFY in code | Silent broken | Session mode hoặc separate dedicated pool |
| Idle connection forever | Stale, leak memory | max_lifetime |
| No health check trên acquire | Random query fail | Validate (optional, perf-sensitive) |
| PgBouncer 1 instance for critical workload | SPOF | HA pair |
| Run schema migration through PgBouncer transaction mode | Locks/advisory may fail | Direct to Postgres |
| Tăng pool_size to “fix latency” | Throughput tệ hơn | Find optimal point (~CPU×2) |
| Mọi service share 1 pool | One slow service blocks all | Per-service pools |
12. Lab — 7 days
12.1 Day 1: Postgres process model
# Run Postgres with low max_connections
docker run -p 5432:5432 -e POSTGRES_PASSWORD=lab -e POSTGRES_MAX_CONNECTIONS=20 postgres:16
# Open 20 connections - psql 20 windows or script
for i in {1..20}; do psql -h localhost -U postgres -c "SELECT pg_sleep(60)" & done
# Try 21st - rejected
psql -h localhost -U postgres -c "SELECT 1"
# FATAL: sorry, too many clients alreadyMeasure RAM per connection:
docker exec -it postgres ps -auxf | grep postgres12.2 Day 2: Setup PgBouncer
# docker-compose.yml
services:
postgres:
image: postgres:16
environment: { POSTGRES_PASSWORD: lab }
command: postgres -c max_connections=50
pgbouncer:
image: edoburu/pgbouncer:latest
environment:
DATABASE_URL: "postgres://postgres:lab@postgres:5432/postgres"
POOL_MODE: transaction
MAX_CLIENT_CONN: 1000
DEFAULT_POOL_SIZE: 20
ports: ["6432:6432"]Test:
psql -h localhost -p 6432 -U postgres
# Through PgBouncer
# pgbench through pool
pgbench -h localhost -p 6432 -U postgres -c 100 -j 4 -T 60 postgres
# 100 client connections through 20 pool12.3 Day 3: Transaction pool break
# transaction_pool_break.py
import psycopg
conn = psycopg.connect("host=localhost port=6432 user=postgres")
# This works
conn.execute("LISTEN test_channel")
conn.execute("NOTIFY test_channel, 'hello'")
# After: open new client connection (simulates next request → different backend)
conn2 = psycopg.connect("host=localhost port=6432 user=postgres")
conn2.execute("NOTIFY test_channel, 'world'")
# conn never receives - different backend12.4 Day 4: Monitor pool
Setup Grafana dashboard:
docker run -d --name prometheus prom/prometheus
docker run -d --name grafana grafana/grafana
# Add pgbouncer_exporterLoad test với pgbench, observe:
- cl_active vs cl_waiting
- Pool saturation curve
12.5 Day 5: Find optimal pool size
for size in 5 10 20 30 50 100; do
# update pgbouncer config DEFAULT_POOL_SIZE
pgbench -h localhost -p 6432 -c 200 -j 8 -T 30 postgres
donePlot TPS vs pool size. Find sweet spot.
12.6 Day 6: Reproduce connection storm
# Storm
import threading, psycopg
def connect():
psycopg.connect("host=localhost port=5432 user=postgres password=lab")
time.sleep(60)
for _ in range(500):
threading.Thread(target=connect).start()Watch Postgres logs/errors. Then route through PgBouncer — storm absorbed.
12.7 Day 7: Migration through pool
# Migration tool e.g. golang-migrate
# Through PgBouncer transaction mode - watch fails
migrate -path migrations -database "postgres://...@host:6432/db" up
# Through direct - works
migrate -path migrations -database "postgres://...@host:5432/db" up13. Self-check
- Vì sao Postgres không scale tốt với 1000+ connections trực tiếp?
- Little’s Law: 5000 req/s, 20ms/query — cần bao nhiêu connection?
- 3 PgBouncer pool mode — khác nhau? Khi nào pick mỗi cái?
- Transaction pool mode break feature nào? Workaround?
- Prepared statement + transaction pool: trap gì? Postgres 1.21 fix thế nào?
- Connection storm — cơ chế, mitigation?
- RDS Proxy vs PgBouncer — pick khi nào?
- App pool 20, PgBouncer 50, Postgres max 100 — flow của a request?
- Long-running connection — pros/cons? Settings nào tune?
- Monitor pool health: 3 metrics quan trọng nhất?
14. Tiếp theo
Bài tiếp: Tuan-07-Backup-PITR-DR — sau khi handle connection layer, đảm bảo data survive.
Tuần 06 hoàn thành. Pool early, pool right. Cập nhật: 2026-05-16