Tuần 06 — Connection Pooling & PgBouncer

“Bạn config Postgres max_connections = 1000. App spawn 800 connection. Mọi thứ chạy đẹp đẽ — đến khi traffic 2x lên, app spawn 1500 → reject. Bạn tăng max_connections → RAM OOM, kernel kill process Postgres. Tại sao Postgres không ‘just scale’? Vì model process-per-connection. Hôm nay học cách bypass nó.”

Tags: database postgresql pgbouncer connection-pooling operations Thời lượng: 7 ngày (4-5h/ngày) Prerequisites: Tuan-01-DB-Internals-Refresh (process model) Liên quan: Tuan-09-DB-Observability-Tuning


1. Context & Why

1.1 Postgres process-per-connection model

graph LR
    Postmaster[postmaster<br/>main process] --> B1[Backend 1<br/>~10MB RAM]
    Postmaster --> B2[Backend 2<br/>~10MB RAM]
    Postmaster --> B3[Backend N<br/>~10MB RAM]
    Postmaster --> Bgw[bgwriter, walwriter,<br/>autovacuum workers]

    App1[App Client 1] -.TCP.-> B1
    App2[App Client 2] -.TCP.-> B2
    App3[App Client N] -.TCP.-> B3

Mỗi connection = 1 OS process. Cost:

  • ~10MB base RAM
  • ~1-3 ms setup time (fork + auth)
  • File descriptors, kernel scheduling
  • Backend allocates work_mem when needed (per operation)

Soft limit thực tế: 100-300 active connections. Trên đó:

  • RAM cạn (process overhead × N)
  • CPU context switching dominate
  • Postgres internal contention (lightweight locks)

1.2 Vấn đề: App muốn hàng nghìn connection

Web app pattern: mỗi request → 1 connection. 10K req/s đỉnh → 10K concurrent connection. Postgres không scale.

graph TB
    subgraph "Không có pool"
        C1[App 1] --persistent-- DB1[(Postgres)]
        C2[App 2] --persistent-- DB1
        C3[App 3] --persistent-- DB1
        C4[App N=1000] --persistent-- DB1
        DB1 -.crash.-> X[OOM 💥]
    end

    subgraph "Có pool"
        D1[App 1] --> P[PgBouncer]
        D2[App 2] --> P
        D3[App N=10000] --> P
        P --20 conns--> DB2[(Postgres)]
    end

    style X fill:#ef9a9a
    style DB2 fill:#c8e6c9

1.3 Mục tiêu tuần

  • Hiểu rõ Postgres process model + ngân sách connection
  • Compute pool size đúng (Little’s Law)
  • Setup PgBouncer 3 modes — pick đúng cho từng workload
  • Diagnose connection storm sau restart
  • Hiểu prepared statement trap với transaction pooling
  • Compare PgBouncer vs RDS Proxy vs PgCat vs Supavisor
  • Monitor pool health: waiting clients, idle connections, churn

1.4 Tham chiếu


2. Pool Sizing — Toán học

2.1 Little’s Law

L (concurrent requests) = λ (req/sec) × W (sec/req)

Ví dụ:

  • 1000 req/s
  • 50 ms per query (avg)
  • Concurrent queries: 1000 × 0.05 = 50

→ Cần ~50 connection. Không phải 1000.

2.2 Formula thực dụng

pool_size = (CPU_cores × 2) + effective_spindle_count
  • HikariCP recommendation
  • “Spindle count” = effective disk parallelism (SSD = high)
  • Modern cloud: 16 CPU, NVMe → ~32-50 connections

Postgres specific: With max_connections = 100:

  • Reserve ~10 cho superuser, autovacuum, replication
  • App có 90 usable
  • Distributing: 3 app instances × 30 each, hoặc 1 pgbouncer pool

2.3 Pool size quá cao

Hiện tượng:

  • Throughput giảm khi tăng pool size
  • CPU 100% mà queries chậm hơn
  • Context switching wasted

Tại sao? Mỗi connection compete CPU. Quá nhiều thread thrashing > work.

2.4 Pool size quá thấp

Hiện tượng:

  • App queue chờ pool
  • P99 latency tăng
  • Throughput plateau dưới capacity

Đo: monitor pool waiting clients của PgBouncer.

2.5 Optimal — Find by experiment

graph LR
    A[Start: pool_size = CPUcoresx2] --> B[Run load test]
    B --> C[Measure: TPS, P99 latency]
    C --> D{TPS plateau?}
    D -->|No| E[Increase pool size]
    D -->|Yes - throughput max| F{Latency acceptable?}
    F -->|Yes| G[Done]
    F -->|No| H[Reduce pool size<br/>trade latency for throughput]
    E --> B
    H --> B

3. PgBouncer — Lightweight Pooler

3.1 Architecture

graph LR
    subgraph "Clients"
        C1[App 1<br/>500 conn]
        C2[App 2<br/>500 conn]
        C3[App 3<br/>500 conn]
    end

    subgraph "PgBouncer"
        FB[Front: 1500 client conns<br/>single Linux process<br/>libevent-based]
        BB[Back pool: 20-50 conns]
    end

    subgraph "Postgres"
        PG[(Postgres 16<br/>max_connections=100)]
    end

    C1 --> FB
    C2 --> FB
    C3 --> FB
    FB --> BB --> PG

Key: PgBouncer single-process, async I/O (libevent). Một instance handle ~10K client connections trên 1 CPU core.

3.2 3 Pool modes

graph TB
    subgraph "Session pooling"
        S1[Client → server tied for whole session<br/>Most compatible<br/>Limited gain<br/>~1:1 mapping like no pool]
    end

    subgraph "Transaction pooling"
        T1[Client → server only for one transaction<br/>Server returned to pool on COMMIT/ROLLBACK<br/>Max gain<br/>Some features broken]
    end

    subgraph "Statement pooling"
        ST1[Server returned after EACH statement<br/>Auto-rollback uncommitted<br/>Almost never used]
    end

    style T1 fill:#c8e6c9
    style ST1 fill:#ffccbc

3.3 Session pooling

  • Default
  • Client gets dedicated server connection until disconnect
  • Multiplexing: many short clients reuse same server
  • Limitations: nothing significant — full compatibility

Khi dùng:

  • App có long-lived connections (đa số ORM)
  • Khi không thể đảm bảo transaction-clean state

Pool size = max concurrent client connections, không phải request rate.

3.4 Transaction pooling — game changer

Server returned to pool sau mỗi COMMIT/ROLLBACK.

Use case: 10K clients, 99% time idle giữa transactions → handful of servers đủ.

Critical limitation: features không work:

  • Prepared statements (pre PgBouncer 1.21 — đã fix 2024)
  • Session-level state: SET, LISTEN/NOTIFY, temporary tables, advisory session locks
  • Cursor WITH HOLD
  • SET ROLE persistent

Pre-PgBouncer 1.21 trap: app dùng PREPARE name AS ... → prepared statement bind on server X. Next transaction lands server Y → “prepared statement does not exist”. Symptoms: random errors under load.

3.5 PgBouncer 1.21+ (Sept 2023) — Prepared statement support

Background: Simple vs Extended protocol

graph TB
    subgraph "Simple protocol"
        S1[Client → Server: Query 'SELECT * WHERE id=42']
        S2[Server: parse + execute + return]
    end

    subgraph "Extended protocol"
        E1[PARSE name + 'SELECT * WHERE id=$1']
        E2[BIND name + 42]
        E3[EXECUTE name]
        E4[Result]
    end

    subgraph "Explicit PREPARE"
        P1[PREPARE q AS 'SELECT * WHERE id=$1']
        P2[EXECUTE q 42]
    end

Simple protocol — 1 message, full SQL inline. No prepare. Works fine with transaction pooling because no server state.

Extended protocol (PARSE/BIND/EXECUTE) — Most modern clients (asyncpg, psycopg3, pgx, pg-promise) use this even when you write “simple-looking” code. Library auto-prepares for SQL injection safety.

Explicit PREPARE/EXECUTE — User-level PREPARE q AS ... statement. Less common in app code, more in functions/scripts.

The historical pain (pre-1.21)

Extended protocol with transaction pooling:

  1. Client sends PARSE → PgBouncer routes to server X, X prepares
  2. Connection returned to pool after COMMIT
  3. Next transaction → maybe server Y, no prepared statement → prepared statement does not exist

Symptoms: random failures under load, especially after scaling out. Hard to debug.

Workarounds pre-1.21:

  • Force session pooling (defeats benefit)
  • Force simple protocol (most drivers don’t expose option cleanly)
  • App: DISCARD ALL after each transaction (kills prepared cache, slow)
  • App: regenerate prepared statement on error (retry logic)

PgBouncer 1.21+ solution

PgBouncer tracks prepared statements client-side, replays them when server switch happens transparently.

[pgbouncer]
max_prepared_statements = 100   ; per-client max

How it works:

  1. Client sends PARSE → PgBouncer caches name+sql client-side
  2. Forwards to server, server prepares
  3. Connection returned to pool
  4. Next transaction lands different server
  5. Client sends EXECUTE → PgBouncer detects missing prepared on this server, re-runs PARSE silently, then EXECUTE

Transparent to app. Massive win.

Compatibility 2024-2026

LibraryExtended protocolPgBouncer 1.21+ tx poolingNotes
psycopg3 (Python)YesSet prepare_threshold or 0 to disable
asyncpg (Python)Yes, named autoWas main problem child pre-1.21
node-postgres (pg)Yes (extended after prepare)Mostly fine
pgx (Go)YesSimpleProtocol=true was workaround
pg-promiseYesInherits node-postgres
psycopg2 (Python)Mostly simpleLess affected, mostly fine
Hibernate (Java)Yes, server-prepareduseServerPrepStmts=true

Test with your stack before declaring “fine”.

Versions worth knowing

  • 1.21 (Sept 2023) — initial prepared statement support
  • 1.22 — improved error handling, more reliable cancel
  • 1.23 (2024) — better metrics, SSL improvements

Settings

max_prepared_statements = 100    ; if 0 = feature disabled
server_reset_query =              ; can clear in 1.21+ (was 'DISCARD ALL')

Don’t run DISCARD ALL as server_reset_query with 1.21+ — undoes prepared statement caching.

Monitoring prepared

SHOW POOLS;
-- New columns in 1.21+: see prepared stmt count per pool

Logs show prepared replays:

INFO sending prepared statement '%s' to server

Edge cases still

  • Cursor WITH HOLD — still tied to session, transaction pool breaks
  • Temporary tables — session state, broken
  • Advisory session locks — broken; use pg_advisory_xact_lock instead
  • SET (not SET LOCAL) — session state, broken
  • LISTEN/NOTIFY — needs persistent session

For these, use session pooling on a dedicated pool, or direct connection.

3.6 Setup PgBouncer

; pgbouncer.ini
 
[databases]
appdb = host=localhost port=5432 dbname=appdb
 
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
 
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 3
 
server_idle_timeout = 600       ; close idle server after 10min
server_lifetime = 3600          ; recreate server connection after 1h (avoid leaks)
server_reset_query = DISCARD ALL
 
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
 
max_prepared_statements = 100   ; PgBouncer 1.21+

userlist.txt:

"app_user" "SCRAM-SHA-256$4096:..."

3.7 Monitor PgBouncer

-- Connect to PgBouncer admin db
psql -p 6432 -U pgbouncer pgbouncer
 
-- Active pools
SHOW POOLS;
-- database, user, cl_active, cl_waiting, sv_active, sv_idle, sv_used, sv_tested, sv_login
 
-- Stats
SHOW STATS;
-- total_xact_count, total_query_count, total_received, total_sent, total_xact_time, total_query_time
 
-- Lists
SHOW LISTS;        -- numbers
SHOW CLIENTS;      -- per-client info
SHOW SERVERS;      -- per-server info
SHOW DATABASES;    -- configured DBs

Key metrics:

  • cl_waiting > 0 (sustained) — pool too small, clients queueing
  • sv_idle consistently high — pool too big, waste
  • total_query_time / total_query_count — avg query latency

3.8 Reload config without restart

RELOAD;       -- reload pgbouncer.ini
RECONNECT;    -- reset all server connections
SUSPEND;      -- pause all activity (e.g., for failover)
RESUME;

4. Transaction Pool Compatibility

4.1 Cái gì BREAK

Khi dùng transaction pooling, không đảm bảo subsequent requests dùng same backend. Nên session state mất.

Features ảnh hưởng:

graph TB
    subgraph "BROKEN với transaction pooling"
        A1[LISTEN/NOTIFY]
        A2[Temporary tables]
        A3[Advisory session locks<br/>pg_advisory_lock]
        A4[SET role/search_path persistent]
        A5[Cursor WITH HOLD]
        A6[Prepared statement pre-1.21]
    end

    subgraph "WORK"
        B1[Standard SQL]
        B2[Transactions]
        B3[Advisory xact locks<br/>pg_advisory_xact_lock]
        B4[SET LOCAL]
        B5[Server-side cursor in txn]
        B6[Prepared statement 1.21+]
    end

    style A1 fill:#ef9a9a
    style A2 fill:#ef9a9a
    style A3 fill:#ef9a9a
    style A4 fill:#ffccbc
    style B1 fill:#c8e6c9
    style B3 fill:#c8e6c9
    style B4 fill:#c8e6c9

4.2 SET LOCAL pattern

BEGIN;
SET LOCAL search_path TO tenant_42;
SET LOCAL statement_timeout = '5s';
SELECT * FROM customers;
COMMIT;
-- SET LOCAL chỉ có effect trong transaction → works with transaction pooling

vs:

SET search_path TO tenant_42;  -- session-level, broken

4.3 Migration framework cẩn thận

Flyway, Liquibase, sqlx migrate có thể dùng session features (LOCK, advisory). Khi run qua PgBouncer transaction mode → broken.

→ Pattern: migrations bypass PgBouncer, connect direct port 5432:

App → PgBouncer (6432) → Postgres (5432)
Migration tool → Postgres (5432) directly

4.4 LISTEN/NOTIFY — không work

# Won't work via transaction pooling
conn.execute("LISTEN order_created")
# Next call may land different server, LISTEN lost

→ Use direct connection cho pub/sub. Hoặc dedicated pool with session mode.


5. Connection Storms

5.1 Vấn đề

sequenceDiagram
    participant Apps as App fleet (200 instances)
    participant DB as Postgres

    Note over DB: Restart for upgrade
    DB-->>Apps: All connections drop
    Note over Apps: 200 instances retry...
    Apps->>DB: 200 × 50 = 10K reconnect attempts
    Note over DB: fork 10K backends...<br/>OOM 💥

Mỗi connection trên Postgres = fork + auth + memory. 10K simultaneous → kernel struggles.

5.2 Mitigation

1. PgBouncer in front — connection storm hit PgBouncer, không phải Postgres:

Apps → PgBouncer (handle storm) → Postgres (max 20 conn always)

2. App-level connection pool with stagger:

# psycopg pool with retries + jitter
from psycopg_pool import ConnectionPool
pool = ConnectionPool(
    conninfo,
    min_size=2,
    max_size=10,
    max_lifetime=3600,
    reconnect_timeout=60,
    reconnect_failed=lambda pool: time.sleep(random.uniform(0.5, 2.0))  # jitter
)

3. Slow startup: Don’t restart all app instances simultaneously. Rolling restart with delay.

4. Health check tuning: K8s readiness probe should fail fast nếu DB unreachable, prevent thundering herd.


6. RDS Proxy (AWS)

6.1 Managed PgBouncer-like

AWS RDS Proxy. Tương đương PgBouncer transaction mode + managed.

Pros:

  • Fully managed (HA, scaling, monitoring CloudWatch)
  • IAM authentication
  • Failover-aware (smooth across DB failover)

Cons:

  • Cost: ~$0.015/hour per IO unit
  • Latency add ~1-3ms
  • Some features còn limited

Pattern 2024: nếu trên AWS RDS, default dùng RDS Proxy trừ khi có lý do tránh.

6.2 Compared with PgBouncer

PgBouncerRDS Proxy
SetupManual (EC2 / k8s)Click ON
HADIYBuilt-in
Pool modessession/transaction/statementsession/transaction only
Custom configFull controlLimited
CostEC2 costRDS Proxy IO units
Latency~0.3ms~1-3ms
FeaturesSome incompatMore compat (Aurora Serverless)

7. Alternatives — Modern Poolers 2024-2026

7.1 PgCat (Rust)

https://github.com/postgresml/pgcat

Features beyond PgBouncer:

  • Load balancing across replicas
  • Read/write split via SQL parser
  • Sharding (route by hash)
  • Auth proxying

Use case: starting to outgrow PgBouncer (need read replicas routing).

7.2 Supavisor (Elixir)

Supabase’s pooler. Built on Erlang/OTP → great for many tenants.

Features:

  • Multi-tenant pool isolation
  • Cluster mode (replicate state)
  • Connection migration

Use case: SaaS with thousands of tenants needing isolation.

7.3 Odyssey (Yandex)

Multi-threaded PgBouncer. Better CPU utilization on multi-core.

Pros:

  • Multi-threaded (PgBouncer is single-threaded)
  • Storage-aware routing
  • Yandex production-proven

Cons:

  • Less community than PgBouncer
  • Config complexity

7.4 Built-in pool (PostgreSQL 17+ planned)

Long-running proposal: Postgres native built-in pooler. PG17 not yet, PG18+ possibly. Watch the space.

7.5 Decision matrix

flowchart TD
    A[Need pooler?] --> B{Workload?}
    B -->|Small <1000 conns, no special features| C[PgBouncer session mode]
    B -->|High concurrency, need transaction pooling| D[PgBouncer transaction mode]
    B -->|Need read/write split| E[PgCat]
    B -->|Multi-tenant SaaS| F[Supavisor]
    B -->|AWS RDS / Aurora| G[RDS Proxy]
    B -->|Multi-core box, 100K+ clients| H[Odyssey]

    style C fill:#c8e6c9
    style D fill:#c8e6c9

8. App-side: ORM Pool Tuning

8.1 Common pool implementations

ORM/LibPool implementation
psycopg3 (Python)ConnectionPool
asyncpg (Python async)asyncpg.create_pool()
SQLAlchemyQueuePool (default)
Node.js node-postgrespg.Pool
Go pgxpgxpool
JavaHikariCP
Ruby ActiveRecordConnectionPool

8.2 Common settings

# Example: psycopg3
ConnectionPool(
    conninfo="postgresql://user:pass@host/db",
    min_size=5,           # warm pool
    max_size=20,          # cap
    max_lifetime=3600,    # recycle after 1h
    max_idle=600,         # close if idle 10min
    timeout=30,           # wait for available conn
)
// node-postgres
const pool = new Pool({
    max: 20,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 5000,
    allowExitOnIdle: false,
});

8.3 Pool stacking

graph LR
    App[App with pool=20] --> PgBouncer[PgBouncer pool=50]
    PgBouncer --> PG[(Postgres max=100)]

If app pool maxes out → app queue. If PgBouncer pool maxes → backend queue (waiting on PgBouncer).

Sizing:

  • App pool < PgBouncer pool < Postgres max_connections
  • E.g., 10 app × 20 pool = 200 client conns → PgBouncer 50 conns → Postgres 100 max

8.4 Connection lifetime

Long-lived connection:

  • ✅ Avoid setup overhead
  • ❌ Server-side memory accumulates
  • ❌ Stale connection sau DB restart

max_lifetime recycle. Default 1h-2h typical.

8.5 Health check / validation

Idle connection in pool có thể dead (network drop, DB restart). Validate trước trả về app:

ConnectionPool(
    ...,
    check=lambda conn: conn.execute("SELECT 1"),  # health check
)

Trade-off: latency overhead per acquire. Disable cho hot path.


9. Monitoring & SLOs

9.1 Key metrics

Postgres side:

  • pg_stat_database.numbackends — active connections
  • pg_stat_activity state distribution
  • pg_stat_database.xact_commit, xact_rollback — TPS
  • pg_stat_bgwriter — checkpoint, buffer activity

PgBouncer side:

  • cl_active, cl_waiting — client queueing
  • sv_idle, sv_active — server utilization
  • request_per_second, query_per_second — load

App side:

  • Pool acquire wait time
  • Pool exhaustion errors
  • Query duration P50/P95/P99

9.2 Dashboard pattern

graph LR
    PG[Postgres exporter] --> Prom[Prometheus]
    PGB[PgBouncer exporter] --> Prom
    App[App metrics] --> Prom
    Prom --> Graf[Grafana dashboard]

    Graf -.alerts.-> Slack[Slack/PagerDuty]

Tool: prometheus-pgbouncer-exporter, postgres_exporter.

9.3 SLOs to track

MetricTarget
Connection acquire P99<100ms
Pool waiting clients<1% time waiting
Postgres connections<80% max_connections
PgBouncer cl_waiting sustained<5
App connection error rate<0.01%

9.4 Alerts

# Pool exhaustion
- alert: PgBouncerWaiting
  expr: pgbouncer_pools_client_waiting_count > 10
  for: 5m
 
# Postgres connections high
- alert: PostgresConnectionsHigh
  expr: pg_stat_database_numbackends / pg_settings_max_connections > 0.8
  for: 5m
 
# Storm
- alert: ConnectionsSpike
  expr: rate(pg_stat_database_numbackends[1m]) > 50
  for: 1m

10. Production Patterns

10.1 Multiple pools for different workloads

; pgbouncer.ini
[databases]
appdb_oltp = host=primary dbname=appdb pool_size=50 pool_mode=transaction
appdb_reports = host=primary dbname=appdb pool_size=5 pool_mode=session  ; long sessions for reports
appdb_replica = host=replica dbname=appdb pool_size=30 pool_mode=transaction

App route by workload:

  • OLTP web requests → appdb_oltp
  • BI queries → appdb_reports
  • Read-only API → appdb_replica

10.2 Multi-region

graph TB
    subgraph "Region US"
        AppUS[App US]
        PgbUS[PgBouncer US]
        PgUS[(Postgres Primary US)]
    end

    subgraph "Region EU"
        AppEU[App EU]
        PgbEU[PgBouncer EU]
        PgEUR[(Postgres Replica EU)]
    end

    AppUS --> PgbUS --> PgUS
    AppEU --> PgbEU --> PgEUR
    PgUS -.replication.-> PgEUR

PgBouncer local to app → low latency to pool, then traverse region for writes.

10.3 PgBouncer HA

PgBouncer SPOF nếu chỉ 1 instance. Options:

  • 2+ instances behind LB (HAProxy, AWS NLB)
  • Sticky connections OK với transaction mode
  • Failover scripts (Patroni, repmgr)

K8s pattern:

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: pgbouncer
        image: pgbouncer:1.21
        resources: { requests: { cpu: 500m, memory: 256Mi } }

LB service ClusterIP for app to connect.

10.4 Authentication

PgBouncer needs to authenticate client → server. 3 modes:

  • auth_type = trust — no auth (dev only)
  • auth_type = md5 — legacy
  • auth_type = scram-sha-256 — modern (PgBouncer 1.16+)

Modern setup: pgbouncer + scram, user list synced from Postgres.

10.5 Statement timeout

[databases]
appdb = host=... query_wait_timeout=120 idle_transaction_timeout=30

PgBouncer kill server connection if query waits too long. Backstop for runaway queries.


11. Anti-patterns

PatternTại sao tệFix
max_connections = 1000 trên PostgresOOM, context switch hellPool, không tăng max_connections
No pooler, app pool 20 × 50 instances = 1000 connConnection storm waitingPgBouncer in front
Pool size = ∞ in appEffective DoS to DBAlways cap
Transaction pool mode + LISTEN/NOTIFY in codeSilent brokenSession mode hoặc separate dedicated pool
Idle connection foreverStale, leak memorymax_lifetime
No health check trên acquireRandom query failValidate (optional, perf-sensitive)
PgBouncer 1 instance for critical workloadSPOFHA pair
Run schema migration through PgBouncer transaction modeLocks/advisory may failDirect to Postgres
Tăng pool_size to “fix latency”Throughput tệ hơnFind optimal point (~CPU×2)
Mọi service share 1 poolOne slow service blocks allPer-service pools

12. Lab — 7 days

12.1 Day 1: Postgres process model

# Run Postgres with low max_connections
docker run -p 5432:5432 -e POSTGRES_PASSWORD=lab -e POSTGRES_MAX_CONNECTIONS=20 postgres:16
 
# Open 20 connections - psql 20 windows or script
for i in {1..20}; do psql -h localhost -U postgres -c "SELECT pg_sleep(60)" & done
 
# Try 21st - rejected
psql -h localhost -U postgres -c "SELECT 1"
# FATAL: sorry, too many clients already

Measure RAM per connection:

docker exec -it postgres ps -auxf | grep postgres

12.2 Day 2: Setup PgBouncer

# docker-compose.yml
services:
  postgres:
    image: postgres:16
    environment: { POSTGRES_PASSWORD: lab }
    command: postgres -c max_connections=50
 
  pgbouncer:
    image: edoburu/pgbouncer:latest
    environment:
      DATABASE_URL: "postgres://postgres:lab@postgres:5432/postgres"
      POOL_MODE: transaction
      MAX_CLIENT_CONN: 1000
      DEFAULT_POOL_SIZE: 20
    ports: ["6432:6432"]

Test:

psql -h localhost -p 6432 -U postgres
# Through PgBouncer
 
# pgbench through pool
pgbench -h localhost -p 6432 -U postgres -c 100 -j 4 -T 60 postgres
# 100 client connections through 20 pool

12.3 Day 3: Transaction pool break

# transaction_pool_break.py
import psycopg
 
conn = psycopg.connect("host=localhost port=6432 user=postgres")
 
# This works
conn.execute("LISTEN test_channel")
conn.execute("NOTIFY test_channel, 'hello'")
 
# After: open new client connection (simulates next request → different backend)
conn2 = psycopg.connect("host=localhost port=6432 user=postgres")
conn2.execute("NOTIFY test_channel, 'world'")
# conn never receives - different backend

12.4 Day 4: Monitor pool

Setup Grafana dashboard:

docker run -d --name prometheus prom/prometheus
docker run -d --name grafana grafana/grafana
# Add pgbouncer_exporter

Load test với pgbench, observe:

  • cl_active vs cl_waiting
  • Pool saturation curve

12.5 Day 5: Find optimal pool size

for size in 5 10 20 30 50 100; do
    # update pgbouncer config DEFAULT_POOL_SIZE
    pgbench -h localhost -p 6432 -c 200 -j 8 -T 30 postgres
done

Plot TPS vs pool size. Find sweet spot.

12.6 Day 6: Reproduce connection storm

# Storm
import threading, psycopg
def connect():
    psycopg.connect("host=localhost port=5432 user=postgres password=lab")
    time.sleep(60)
 
for _ in range(500):
    threading.Thread(target=connect).start()

Watch Postgres logs/errors. Then route through PgBouncer — storm absorbed.

12.7 Day 7: Migration through pool

# Migration tool e.g. golang-migrate
# Through PgBouncer transaction mode - watch fails
migrate -path migrations -database "postgres://...@host:6432/db" up
 
# Through direct - works
migrate -path migrations -database "postgres://...@host:5432/db" up

13. Self-check

  1. Vì sao Postgres không scale tốt với 1000+ connections trực tiếp?
  2. Little’s Law: 5000 req/s, 20ms/query — cần bao nhiêu connection?
  3. 3 PgBouncer pool mode — khác nhau? Khi nào pick mỗi cái?
  4. Transaction pool mode break feature nào? Workaround?
  5. Prepared statement + transaction pool: trap gì? Postgres 1.21 fix thế nào?
  6. Connection storm — cơ chế, mitigation?
  7. RDS Proxy vs PgBouncer — pick khi nào?
  8. App pool 20, PgBouncer 50, Postgres max 100 — flow của a request?
  9. Long-running connection — pros/cons? Settings nào tune?
  10. Monitor pool health: 3 metrics quan trọng nhất?

14. Tiếp theo

Bài tiếp: Tuan-07-Backup-PITR-DR — sau khi handle connection layer, đảm bảo data survive.


Tuần 06 hoàn thành. Pool early, pool right. Cập nhật: 2026-05-16