Foundations: Compilers, Virtual Machines & Memory Models
“Cloudflare Workers V8 isolate cold start <1ms? That's V8 (the JS engine) creating an isolate in microseconds thanks to JIT-on-demand compilation. Wasm Component Model? That's bytecode plus a capability sandbox. Java GC pauses killing production? That's a generational stop-the-world GC. Rust's borrow checker? That's a static type system catching data races at compile time. An architect who doesn't understand PL/compiler concepts doesn't understand the runtime characteristics of the systems they build.”
Tags: cs-foundations compilers virtual-machines garbage-collection fundamentals Student: Hieu (Backend Dev → Architect) Related: Tuan-Bonus-Edge-Wasm-Architecture · Tuan-Foundations-OS-Essentials · Tuan-Foundations-Computer-Architecture
1. Context & Why
Why do architects need to understand Compilers & VMs?
| Architecture decision | PL/Compiler concept |
|---|---|
| Wasm Component Model | Bytecode + capability sandbox |
| V8 isolate cold start | JIT compilation, snapshot |
| Java/Go GC pauses | Generational GC, stop-the-world |
| Rust no GC, fast | Static analysis, ownership |
| Python GIL bottleneck | Interpreter design |
| Erlang/BEAM hot reload | VM-level deployment |
| WebAssembly polyglot | Common bytecode target |
| Kubernetes operators in Rust | Memory safety + perf |
Key insight: The choice of language is a trade-off between performance, safety, and productivity. Understanding compiler internals lets you make the right decision for each service.
Primary references
- Crafting Interpreters (Bob Nystrom, free) — https://craftinginterpreters.com/
- Compilers: Principles, Techniques, and Tools (Dragon Book — Aho et al.)
- Programming Language Pragmatics (Scott)
- The Garbage Collection Handbook (Jones, Hosking, Moss)
- Engineering a Compiler (Cooper & Torczon)
2. Deep Dive — Compilation Pipeline
2.1 Phases of Compilation
Source code
│
▼
┌───────────────────────┐
│ Lexer (Tokenizer) │ Source → Tokens
└──────────┬────────────┘
▼
┌───────────────────────┐
│ Parser │ Tokens → AST
└──────────┬────────────┘
▼
┌───────────────────────┐
│ Semantic Analysis │ Type checking, scope
└──────────┬────────────┘
▼
┌───────────────────────┐
│ IR (Intermediate │ Architecture-independent
│ Representation) │
└──────────┬────────────┘
▼
┌───────────────────────┐
│ Optimization │ Constant folding, dead code,
│ │ inlining, vectorization
└──────────┬────────────┘
▼
┌───────────────────────┐
│ Code Generation │ IR → target machine code
└──────────┬────────────┘
▼
Machine code
(or Bytecode)
2.1.1 Lexer
Convert characters to tokens:
Input: "let x = 42;"
Tokens: [LET, IDENT(x), EQUALS, NUMBER(42), SEMI]
2.1.2 Parser
Tokens to AST:
let x = 42 + y * 2;
Assign
/ \
x Add
/ \
42 Mul
/ \
y 2
Top-down (recursive descent) vs bottom-up (LR, LALR).
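Python's built-in `ast` module exposes this stage directly — a quick way to watch a real parser turn source into a tree (a standard-library illustration of the AST shape sketched above):

```python
import ast

# Parse the assignment from above into an AST
tree = ast.parse("x = 42 + y * 2")
stmt = tree.body[0]                # the Assign node
print(ast.dump(stmt.value))        # BinOp(Add) with a nested BinOp(Mult)

# Operator precedence is encoded in the tree shape: Mult nests under Add
assert isinstance(stmt.value, ast.BinOp)
assert isinstance(stmt.value.op, ast.Add)
assert isinstance(stmt.value.right, ast.BinOp)
assert isinstance(stmt.value.right.op, ast.Mult)
```

Note how `y * 2` ends up as the right child of the `Add` node — exactly the tree drawn above.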
2.1.3 Type checking
let x: int = "hello"; // Error: string not assignable to int
Static type system: Catch errors at compile time. Type inference: Infer types from usage (Rust, Haskell, TypeScript).
2.2 Compilation Strategies
2.2.1 Ahead-of-Time (AOT)
Compile entire program before run.
gcc main.c -o main # AOT compile
./main            # Execute machine code directly
Examples: C, C++, Rust, Go (default), Swift.
Pros:
- Fast startup (no compile at runtime)
- Predictable performance
- Whole-program optimization
Cons:
- Slow build
- Less runtime adaptability
2.2.2 Just-In-Time (JIT)
Compile during execution (typically in VM).
JS source
↓
V8 parses to AST
↓
Ignition (interpreter): runs bytecode
↓
TurboFan (JIT): hot functions → optimized machine code
↓
Deoptimize if assumptions wrong → back to interpreter
Examples: V8 (JS), JVM (HotSpot), .NET CLR, PyPy.
Pros:
- Adaptive optimization (profile-guided)
- Fast startup (interpret first)
- Reoptimize based on runtime info
Cons:
- Warmup time before peak perf
- More memory (compiler in process)
- Complexity
2.2.3 Interpretation
Execute AST or bytecode directly, no machine code.
# CPython:
source.py → bytecode (.pyc) → interpreter executes
Examples: CPython, Ruby (MRI), Bash, Lua.
Pros: Simplest, portable. Cons: 10-100x slower than compiled.
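You can inspect the bytecode CPython actually interprets with the standard `dis` module — a small sketch:

```python
import dis

def add(a, b):
    return a + b

# Show the bytecode the CPython VM interprets for this function
dis.dis(add)

# Opcode names vary by Python version (BINARY_ADD vs BINARY_OP),
# but an addition instruction is always present.
ops = [ins.opname for ins in dis.get_instructions(add)]
assert any(op.startswith("BINARY") for op in ops)
```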
2.2.4 AOT + JIT (hybrid)
- Java: JVM bytecode AOT, JIT to native at runtime
- C# / .NET: IL AOT, JIT or AOT-compiled
- Wasm: AOT to bytecode, AOT-compile or JIT in browser/runtime
2.3 Type Systems
2.3.1 Static vs Dynamic
| | Static (Java, Rust, Go) | Dynamic (Python, JS, Ruby) |
|---|---|---|
| Type check | Compile time | Runtime |
| Errors caught | Early | Late |
| Refactoring safety | High | Low |
| Productivity initial | Lower | Higher |
| Productivity at scale | Higher | Lower |
| Performance | Higher | Lower |
2.3.2 Strong vs Weak
Strong (Python, Java, Rust): no implicit conversions, rejects "3" + 4.
Weak (JS, PHP): coerces, "3" + 4 = "34" (JS).
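Python illustrates strong-but-dynamic typing: the coercion JS performs implicitly is rejected, but only at runtime. A minimal demonstration:

```python
# Strong typing: "3" + 4 is rejected (at runtime, since Python is dynamic)
caught = False
try:
    "3" + 4
except TypeError as e:
    caught = True
    print("rejected:", e)
assert caught

# Explicit conversion is required instead
assert "3" + str(4) == "34"   # what weakly-typed JS does implicitly
assert int("3") + 4 == 7
```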
2.3.3 Nominal vs Structural
Nominal (Java, C#): Same name = same type.
class Point { int x, y; }
class Coord { int x, y; } // Different from Point despite same shape
Structural (TypeScript, Go interfaces): Same shape = same type.
interface Point { x: number; y: number; }
interface Coord { x: number; y: number; }
// Compatible, can assign one to the other
2.3.4 Type inference
let x = 42; // Inferred: i32
let v = vec![1, 2]; // Inferred: Vec<i32>
fn add(a: i32, b: i32) -> i32 {
    a + b
}
let result = add(1, 2); // Inferred: i32
Hindley-Milner: Foundational inference algorithm, used in Haskell, ML, OCaml.
2.4 Memory Management
3 main approaches: manual, garbage collected, ownership-based.
2.4.1 Manual
void* p = malloc(100);
// ... use p
free(p); // Programmer responsibility
Pros: Predictable, no runtime overhead. Cons: Memory leaks, use-after-free, double-free, buffer overflows.
2.4.2 Garbage Collection
VM tracks references, frees unreachable objects.
List<String> list = new ArrayList<>(); // Allocated on heap
list = null; // Not freed yet; GC will collect it
3 main GC algorithm families:
Mark-Sweep
- Mark: Walk from roots (stack, globals), mark reachable objects
- Sweep: Free unmarked
Pros: Simple. Cons: Stop-the-world pauses, heap fragmentation.
Mark-Compact
Mark-sweep + compact reachable objects to one end.
Pros: No fragmentation. Cons: Even slower than mark-sweep.
Generational
Insight: Most objects die young.
Young Generation (Eden + Survivor)
↓ (objects survive several GCs)
Old Generation (long-lived)
- Minor GC: Young gen only, frequent, fast
- Major GC (Full GC): All gens, infrequent, slow
Pros: Fast for typical workload. Cons: Tuning complexity.
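CPython's cycle collector is itself generational (three generations), observable via the standard `gc` module — a small sketch of the "most objects die young" heuristic in practice:

```python
import gc

# Three generations; collecting gen N also collects all younger gens
thresholds = gc.get_threshold()
print("thresholds:", thresholds)   # e.g. (700, 10, 10) by default
assert len(thresholds) == 3

# Allocation counts per generation since the last collection
print("counts:", gc.get_count())

# Force a full (oldest-generation) collection — analogous to a "major GC"
unreachable = gc.collect(2)
print("unreachable objects collected:", unreachable)
assert unreachable >= 0
```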
2.4.3 Modern GCs
G1 GC (Java 9+ default)
- Region-based heap
- Concurrent marking
- Pauses: 10-200ms typical
ZGC (Java 11+, low-pause)
- Concurrent everything
- Pauses < 10ms even for huge heaps (TBs)
- Trade-off: higher CPU/memory overhead
Shenandoah (Red Hat)
Similar to ZGC, low-pause.
Go GC (concurrent, mark-sweep)
- Designed for low-latency
- Concurrent marking
- Sub-millisecond pauses (modern Go)
Python (reference counting + cycle detection)
- Each object has refcount
- When refcount → 0, free immediately
- Periodic cycle detection for cycles
- GIL prevents concurrent modifications
2.4.4 Ownership (Rust)
Compile-time memory management without GC.
fn main() {
    let s = String::from("hello"); // s owns the String
    take_ownership(s);
    // println!("{}", s); // Error: s was moved
}
fn take_ownership(s: String) {
    println!("{}", s);
    // s freed at end of scope
}
Borrow checker: Ensures no use-after-free and no data races, all at compile time.
Trade-off: Steeper learning curve, but no runtime overhead.
2.5 Bytecode & Virtual Machines
2.5.1 What is a VM?
Software-emulated machine. Executes bytecode instead of native code.
Source → Compile to bytecode → VM executes bytecode (interpret/JIT)
Examples:
- JVM (Java, Kotlin, Scala, Clojure)
- CLR (C#, F#, VB.NET)
- V8 (JavaScript)
- Wasm runtime (Wasmtime, V8, etc.)
- CPython VM (Python)
- Erlang BEAM (Erlang, Elixir)
2.5.2 Why VMs?
- Portability: Same bytecode → run on any platform with VM
- Safety: VM enforces type/memory checks
- Optimization: Profile-guided JIT
- Hot reload: Replace bytecode at runtime
- Sandbox: Limit what code can do
2.5.3 JVM bytecode example
public int add(int a, int b) {
    return a + b;
}
Compiled bytecode:
iload_1 ; push a
iload_2 ; push b
iadd ; add ints, push result
ireturn ; return
Stack-based VM: Operations pop operands from stack.
2.5.4 Wasm bytecode
(module
  (func $add (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.add)
  (export "add" (func $add)))
Wasm = stack-based bytecode + structured control flow + linear memory + types.
Designed for: Fast load, fast verify, fast execute.
2.5.5 V8 architecture (browser + Node.js + Cloudflare Workers)
JS source
↓
Parser → AST
↓
Ignition (interpreter): bytecode
↓ profile hot functions
TurboFan (optimizing JIT): optimized machine code
↓ deoptimize on bad assumption
Back to Ignition
Isolates: Lightweight V8 instances. Cloudflare Workers spawn isolate per request, share heap snapshot.
Cold start: V8 isolate creation ~5ms vs 100-1000ms for container.
2.6 JIT Optimizations
JIT compilers do remarkable things at runtime:
2.6.1 Inlining
function square(x) { return x * x; }
for (let i = 0; i < 1000; i++) {
    sum += square(i);
}
// JIT inlines:
for (let i = 0; i < 1000; i++) {
    sum += i * i;
}
2.6.2 Type specialization
}2.6.2 Type specialization
function add(a, b) { return a + b; }
// JIT sees always called with int
// → Generate int-specialized version (fast)
// If suddenly called with string → deoptimize
2.6.3 Escape analysis
Allocate on stack instead of heap if object doesn’t escape function.
public void foo() {
    Point p = new Point(1, 2); // JIT may allocate on stack
    System.out.println(p.x + p.y);
    // p doesn't escape
}
2.6.4 Loop optimizations
- Loop unrolling
- Loop invariant code motion
- Vectorization (SIMD)
2.7 Memory Model & Concurrency
A memory model defines the rules for how one thread sees memory writes made by other threads.
2.7.1 The problem
// Thread 1
x = 1;
y = 2;
// Thread 2
print(y); // Could see 2 but x still 0?
print(x);
Without a memory model: the compiler/CPU can reorder writes for performance.
2.7.2 Memory ordering
Java (since 1.5), C++11, Rust define memory models:
| Order | Guarantee |
|---|---|
| Relaxed | No ordering — fastest |
| Acquire/Release | Pairwise sync |
| Sequential consistency | Total global order — slowest |
use std::sync::atomic::{AtomicI32, Ordering};
let x = AtomicI32::new(0);
x.store(1, Ordering::Release); // Pairs with Acquire load elsewhere
let v = x.load(Ordering::Acquire);
2.7.3 Atomic operations
let counter = AtomicU64::new(0);
counter.fetch_add(1, Ordering::Relaxed); // Hardware atomic
Hardware support: x86 LOCK prefix, ARM LDXR/STXR, etc.
2.7.4 Lock-free data structures
// Treiber stack push (simplified; pop and safe memory reclamation omitted)
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> { value: T, next: *mut Node<T> }

fn push<T>(head: &AtomicPtr<Node<T>>, value: T) {
    let new = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
    loop {
        let old = head.load(Ordering::Acquire);
        unsafe { (*new).next = old; }  // link new node to current head
        if head.compare_exchange(old, new, Ordering::Release, Ordering::Relaxed).is_ok() {
            return;  // CAS succeeded: new node is now the head
        }
        // CAS failed: another thread pushed first → retry
    }
}
CAS (Compare-And-Swap) is the foundation; hardware provides atomic CAS instructions.
2.7.5 Common pitfalls
- Word tearing: 64-bit value updated non-atomically on 32-bit platform
- Volatile (Java) ≠ atomic: guarantees visibility and ordering, but doesn't make compound operations (like count++) atomic
- Java synchronized keyword: Implicit memory barrier + mutex
2.8 Concurrency Models
2.8.1 Threads + locks (traditional)
synchronized void deposit(int amount) {
    balance += amount;
}
Pros: Direct, familiar. Cons: Deadlocks, race conditions, hard to reason about.
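The same pattern in Python, using a `threading.Lock` to guard a shared counter (without the lock, `+=` on a shared variable is not guaranteed atomic across threads):

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(amount):
    global balance
    with lock:                 # mutual exclusion, like Java's synchronized
        balance += amount

threads = [threading.Thread(target=deposit, args=(10,)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(balance)  # 1000
assert balance == 1000
```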
2.8.2 Actor model (Erlang, Akka)
% Each actor = process with mailbox
loop(State) ->
    receive
        {deposit, Amount} ->
            loop(State + Amount);
        {balance, From} ->
            From ! State,
            loop(State)
    end.
Pros: No shared state, isolated failures. Cons: Different mental model.
2.8.3 CSP — Communicating Sequential Processes (Go)
ch := make(chan int)
go func() {
    ch <- 42
}()
result := <-ch
Channels for communication, goroutines for concurrency.
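Python has no built-in channels, but `queue.Queue` approximates the CSP pattern — a hedged sketch of the Go snippet above:

```python
import threading, queue

ch = queue.Queue()                      # stand-in for a Go channel

def worker():
    ch.put(42)                          # like: ch <- 42

threading.Thread(target=worker).start()
result = ch.get()                       # like: result := <-ch (blocks)
print(result)  # 42
assert result == 42
```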
2.8.4 Async/await (modern)
async def fetch_user(id):
    response = await http_get(f"/users/{id}")
    return response.json()
Cooperative scheduling: the function explicitly yields control at each await.
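A runnable version of the idea with `asyncio`, using `asyncio.sleep` as a stand-in for the hypothetical `http_get` call:

```python
import asyncio

async def fetch_user(user_id):
    await asyncio.sleep(0.01)           # yields control, like awaiting real I/O
    return {"id": user_id}              # stand-in for response.json()

async def main():
    # Both coroutines make progress on one thread via cooperative yields
    return await asyncio.gather(fetch_user(1), fetch_user(2))

users = asyncio.run(main())
print(users)  # [{'id': 1}, {'id': 2}]
assert [u["id"] for u in users] == [1, 2]
```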
2.8.5 STM — Software Transactional Memory (Clojure, Haskell)
atomically $ do
    bal <- readTVar balance
    writeTVar balance (bal + 100)
Optimistic concurrency: transactions retry on conflict.
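A toy version of optimistic retry in Python — not real STM, just a single versioned cell showing the retry-on-conflict idea (the names `TVar` and `atomically` are borrowed from Haskell purely for illustration):

```python
import threading

class TVar:
    """A versioned cell; a commit fails if the version moved underneath us."""
    def __init__(self, value):
        self.value, self.version = value, 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, seen_version, new_value):
        with self._lock:
            if self.version != seen_version:
                return False              # conflict: someone committed first
            self.value, self.version = new_value, self.version + 1
            return True

def atomically(tvar, fn):
    while True:                           # optimistic: retry on conflict
        value, version = tvar.read()
        if tvar.try_commit(version, fn(value)):
            return

balance = TVar(0)
threads = [threading.Thread(target=atomically, args=(balance, lambda b: b + 100))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(balance.read()[0])  # 1000
assert balance.read()[0] == 1000
```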
2.9 Sandboxing — Security via Compiler/VM
2.9.1 Levels of sandboxing
| Level | Example | Cost |
|---|---|---|
| OS process | Container | Heavy (~10MB) |
| VM (gVisor) | Cloud Functions | Medium |
| Bytecode VM (Wasm) | Workers | Light (~1ms cold) |
| Language VM (V8 isolate) | Workers | Light |
| Native sandbox (seccomp) | Container | Very light |
2.9.2 Wasm sandbox
- No direct syscalls
- Limited memory (linear memory)
- No network/file by default
- Capability-based: import what you need
2.9.3 Capability security
Code can only do what’s explicitly granted:
Component imports:
- wasi:filesystem (file operations)
- wasi:http (HTTP calls)
Code cannot:
- Spawn process
- Access network outside http
- Read files outside provided dirs
Different from “trusted code” — by construction, not by trust.
2.10 Compiler Optimizations
2.10.1 Constant folding
int x = 1 + 2 + 3;
// Compile time: x = 6
2.10.2 Dead code elimination
if (false) {
    expensive_call(); // Removed
}
2.10.3 Common subexpression elimination
y = a + b + c;
z = a + b + d;
// → tmp = a + b; y = tmp + c; z = tmp + d;
2.10.4 Loop unrolling
for (int i = 0; i < 100; i++) sum += arr[i];
// →
for (int i = 0; i < 100; i += 4) {
    sum += arr[i] + arr[i+1] + arr[i+2] + arr[i+3];
}
2.10.5 Inlining
Replace function call with body.
2.10.6 Auto-vectorization
Compiler emits SIMD instructions automatically.
for (int i = 0; i < N; i++) c[i] = a[i] + b[i];
// Compiler emits: SIMD vectorized add (4-8 elements at once)
3. Practical Implications
3.1 Choosing language for backend
| Need | Recommended |
|---|---|
| Maximum performance, no GC | Rust, C++ |
| High throughput, simple | Go |
| Mature ecosystem, JVM | Java, Kotlin |
| Productivity, OK perf | Python, Ruby (small services) |
| Browser/JS reuse | TypeScript / Node |
| Real-time, fault-tolerant | Erlang, Elixir |
| ML/data science | Python |
| Data engineering | Scala, Java (Spark) |
3.2 GC tuning matters
Java GC tuning can change throughput 2-10x:
# Throughput (G1 GC, default)
java -XX:+UseG1GC -Xmx4g app.jar
# Low latency (ZGC)
java -XX:+UseZGC -Xmx16g app.jar
# Heap sizing
-Xms2g -Xmx2g # Same min/max → no resizing
# Logging
-Xlog:gc*:file=gc.log
Common settings:
- -XX:MaxGCPauseMillis=200 (G1 pause target)
- -XX:+ParallelRefProcEnabled
- -XX:+AlwaysPreTouch (touch/zero heap pages upfront)
3.3 V8 isolate cold start
Why Workers fast:
- V8 process pre-started
- Per request: spawn new isolate (~1ms)
- Run code (~ms)
- Destroy isolate
Compare:
- AWS Lambda: 100-1000ms cold start (container init)
- Lambda@Edge: 100-300ms
- Workers: 5ms
3.4 Rust adoption in infrastructure
Why Rust everywhere 2020+:
- Memory safety without GC
- Performance ≈ C++
- Zero-cost abstractions
- No data race compile-time
- Modern type system
Used by:
- TiKV (distributed KV)
- Cloudflare Pingora (HTTP proxy replacing Nginx)
- Linkerd2-proxy (service-mesh data plane)
- Scylla (planned Rust port)
- Many sidecar/proxies
4. Performance & Profiling
4.1 JVM profiling
# Java Flight Recorder (free since Java 11)
java -XX:StartFlightRecording=duration=60s,filename=profile.jfr ...
# Async profiler (lower overhead)
async-profiler -d 30 -e cpu -f flame.html <pid>
4.2 Go profiling
import _ "net/http/pprof"
go func() {
    log.Println(http.ListenAndServe(":6060", nil))
}()
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
4.3 V8 profiling
node --prof app.js
# Generates isolate-*.log
node --prof-process isolate-*.log > processed.txt
4.4 Common metrics
- CPU samples: Where is time spent?
- Allocation rate: Bytes/sec allocated
- GC pauses: Distribution of pauses
- Heap fragmentation: Used vs reserved
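The first metric — where CPU time goes — can be answered for Python with the standard `cProfile` module; a minimal sketch:

```python
import cProfile, pstats, io

def hot_loop():
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
hot_loop()
profiler.disable()

# Print the top functions by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
assert "hot_loop" in report
```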
5. Practical Code Patterns
5.1 Avoid GC pressure
Allocate less, reuse more:
// BAD: allocate per call
void process(Request req) {
    List<String> tags = new ArrayList<>();
    // ...
}
// GOOD: reuse via thread-local
ThreadLocal<List<String>> tagsTL = ThreadLocal.withInitial(ArrayList::new);
void process(Request req) {
    List<String> tags = tagsTL.get();
    tags.clear();
    // ...
}
5.2 Lock-free patterns
// Bad: blocking
private int count = 0;
synchronized void increment() { count++; }
// Good: atomic
private AtomicInteger count = new AtomicInteger();
void increment() { count.incrementAndGet(); }
5.3 Avoid boxing
// BAD: boxes on every call
Map<Integer, Integer> map = new HashMap<>();
map.put(1, 2); // Integer boxing
// GOOD: primitive collections (Eclipse Collections, Trove)
IntIntMap map = new IntIntHashMap();
map.put(1, 2); // No boxing
6. Code Examples
6.1 Simple lexer in Python
import re

TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('IDENT',  r'[a-zA-Z_]\w*'),
    ('PLUS',   r'\+'),
    ('MINUS',  r'-'),
    ('MUL',    r'\*'),
    ('LPAREN', r'\('),
    ('RPAREN', r'\)'),
    ('SKIP',   r'\s+'),
]

def tokenize(text):
    pattern = '|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC)
    for match in re.finditer(pattern, text):
        kind = match.lastgroup
        value = match.group()
        if kind != 'SKIP':
            yield (kind, value)

tokens = list(tokenize("3 + (4 * 2)"))
print(tokens)
# [('NUMBER', '3'), ('PLUS', '+'), ('LPAREN', '('),
#  ('NUMBER', '4'), ('MUL', '*'), ('NUMBER', '2'), ('RPAREN', ')')]
6.2 Recursive descent parser
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def consume(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_expr(self):
        left = self.parse_term()
        while self.peek() and self.peek()[0] in ('PLUS', 'MINUS'):
            op = self.consume()
            right = self.parse_term()
            left = ('binop', op[0], left, right)
        return left

    def parse_term(self):
        # Simplified: no operator precedence
        return self.parse_atom()

    def parse_atom(self):
        tok = self.consume()
        if tok[0] == 'NUMBER':
            return ('num', int(tok[1]))
        elif tok[0] == 'LPAREN':
            expr = self.parse_expr()
            self.consume()  # RPAREN
            return expr

tokens = tokenize("3 + 4 + 5")
ast = Parser(tokens).parse_expr()
print(ast)
# ('binop', 'PLUS', ('binop', 'PLUS', ('num', 3), ('num', 4)), ('num', 5))
6.3 Stack-based VM
class VM:
    def __init__(self):
        self.stack = []

    def execute(self, bytecode):
        for op in bytecode:
            if op[0] == 'PUSH':
                self.stack.append(op[1])
            elif op[0] == 'ADD':
                b = self.stack.pop()
                a = self.stack.pop()
                self.stack.append(a + b)
            elif op[0] == 'PRINT':
                print(self.stack.pop())

# Bytecode for: print(3 + 4)
program = [
    ('PUSH', 3),
    ('PUSH', 4),
    ('ADD',),
    ('PRINT',),
]
VM().execute(program)  # 7
6.4 Reference counting GC (simplified)
class RefCounted:
    def __init__(self):
        self.refcount = 1

    def acquire(self):
        self.refcount += 1
        return self

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self._destroy()

    def _destroy(self):
        # Free resources
        pass
7. System Design Diagrams
7.1 Compilation Pipeline
flowchart TB
  Source[Source code]
  Source --> Lex[Lexer<br/>tokens]
  Lex --> Parse[Parser<br/>AST]
  Parse --> TypeCheck[Type Check<br/>Semantic Analysis]
  TypeCheck --> IR[Intermediate<br/>Representation]
  IR --> Opt[Optimizer<br/>const fold, inline,<br/>vectorize]
  Opt --> CodeGen[Code Generator]
  CodeGen --> Native[Native Machine Code]
  CodeGen --> Bytecode[Bytecode<br/>JVM, Wasm]
  Bytecode --> VM[Virtual Machine<br/>Interpret or JIT]
  style Source fill:#bbdefb
  style Native fill:#c8e6c9
  style Bytecode fill:#fff9c4
7.2 V8 JIT Pipeline
flowchart LR
  JS[JS Source] --> Parse[Parser]
  Parse --> AST[AST]
  AST --> Igni[Ignition<br/>Interpreter]
  Igni --> Bytecode[Bytecode]
  Bytecode --> Hot{Hot function?}
  Hot -->|Yes| Turbo[TurboFan<br/>Optimizing JIT]
  Hot -->|No| Continue[Continue interpreting]
  Turbo --> Optimized[Optimized<br/>Native Code]
  Optimized --> Deopt{Assumption broken?}
  Deopt -->|Yes| Igni
  Deopt -->|No| Continue
7.3 GC Generations
flowchart LR
  subgraph Heap["JVM Heap"]
    Eden["Eden<br/>(Young)"]
    S0[Survivor 0]
    S1[Survivor 1]
    Old["Old Gen<br/>(Tenured)"]
    Eden -->|Minor GC, survives| S0
    S0 -->|Minor GC| S1
    S1 -->|Tenured after N| Old
  end
  GC{GC Trigger}
  GC -->|Eden full<br/>Minor GC| Eden
  GC -->|Old full<br/>Major GC| Old
7.4 Memory Models
flowchart LR
  subgraph Manual["Manual (C/C++)"]
    M1[malloc/free<br/>direct control]
    M2[Risk: leaks, UAF]
  end
  subgraph GC["GC (Java/Go/Python)"]
    G1[VM tracks refs<br/>auto-frees]
    G2[Pause times<br/>throughput cost]
  end
  subgraph Ownership["Ownership (Rust)"]
    O1[Compile-time<br/>borrow checker]
    O2[No GC<br/>no UAF<br/>steeper learning]
  end
  style Manual fill:#ffcdd2
  style GC fill:#fff9c4
  style Ownership fill:#c8e6c9
8. Aha Moments & Pitfalls
Aha Moments
#1: Compilers are translators with optimizers. Source → AST → IR → optimized → target. Each step opens optimization opportunities.
#2: JIT vs AOT trade-off. JIT adapts to runtime patterns. AOT predictable startup. Java’s HotSpot famously beats AOT in long-running workloads.
#3: GC vs ownership = different trade-offs. GC easier, runtime cost. Rust harder upfront, no runtime cost.
#4: Bytecode is portable + safe. Wasm runs in browser, server, edge — same binary. JVM bytecode is verified before exec.
#5: V8 isolate ≠ container. Isolate = lightweight V8 sandbox with millisecond cold starts. Container = whole OS namespace, ~100-1000ms init.
#6: Memory models matter for concurrent code. Without proper ordering, you’ll see “impossible” bugs. Java/C++/Rust formalize.
#7: Lock-free is hard but rewarding. CAS-based stacks/queues can deliver 10-100x the throughput of mutex-based ones under heavy contention, but require expert care.
#8: Capability security via compiler. Wasm Component Model enforces capabilities at type level. Stronger than runtime checks.
Pitfalls
Pitfall 1: Trusting type for runtime
Java type checks at compile time, but runtime can still throw ClassCastException via reflection or unchecked casts. Fix: Treat the type system as the primary defense and runtime checks as a safety net.
Pitfall 2: GC pause shock
4GB heap, default G1, sudden 5-second pause → outage. Fix: Tune GC, monitor pauses, use ZGC for low-latency.
Pitfall 3: Memory leak in GC language
“GC frees everything” is wrong: holding references in static maps or listener callbacks leaks memory. Fix: Profile, and use weak references where appropriate.
Pitfall 4: Boxing causing GC pressure
Map<Integer, Integer> allocates Integer objects per entry → GC churn. Fix: Primitive collections (Eclipse Collections, Trove for Java).
Pitfall 5: synchronized everywhere
Coarse locks → contention. 4-core machine performance ≈ 1-core. Fix: Fine-grained locks, lock-free DS, immutable data.
Pitfall 6: JIT warmup problem
Production deployment: the first requests are slow because the JIT hasn't optimized the hot paths yet. Fix: Warm up with synthetic load before serving real traffic.
Pitfall 7: Stop-the-world surprise
JVM full GC freezes app for seconds. K8s health probe fails → restart. Fix: Increase health probe timeout, tune GC.
Pitfall 8: Native call boundary
JNI/FFI calls 100x slower than internal. Crossing repeatedly → bottleneck. Fix: Batch calls, use direct buffers.
Pitfall 9: Reflection abuse
Class.forName(), dynamic proxies → JIT can’t optimize. Fix: Code generation at startup (e.g., MapStruct, Dagger).
Pitfall 10: Premature lock-free
Implement lock-free queue → 100 lines of subtle code → bugs. Fix: Use proven libs (java.util.concurrent, crossbeam in Rust).
9. Internal Links
| Topic | Connects to |
|---|---|
| Tuan-Bonus-Edge-Wasm-Architecture | Wasm Component Model, V8 isolates |
| Tuan-Foundations-OS-Essentials | Process, thread, namespaces underlie VMs |
| Tuan-Foundations-Computer-Architecture | JIT optimization targets memory hierarchy |
| Tuan-Bonus-Consistency-Models-Isolation | Memory model, atomicity |
References
Books:
- Crafting Interpreters (Bob Nystrom, free) — https://craftinginterpreters.com/
- Compilers: Principles, Techniques, and Tools (Dragon Book — Aho et al.)
- Engineering a Compiler (Cooper & Torczon)
- Programming Language Pragmatics (Scott)
- The Garbage Collection Handbook (Jones, Hosking, Moss)
- Modern Compiler Implementation in Java (Appel)
Online:
- LLVM tutorial — https://llvm.org/docs/tutorial/
- V8 deep dive — https://v8.dev/blog/
Papers:
- Cliff Click's lock-free hash table (NonBlockingHashMap)
- A History of Modern 64-bit Computing — context
Specific projects to study:
- LLVM (modular compiler infrastructure)
- V8 source code
- Wasmtime (Wasm runtime)
- HotSpot JVM
- Go compiler
Next: Tuan-Foundations-Math-for-Architects — Linear algebra, probability, discrete math, info theory.