Foundations: Compilers, Virtual Machines & Memory Models
“Cloudflare Workers V8 isolate cold start <1ms? That's V8 (the JS engine) creating an isolate in microseconds thanks to JIT-on-demand compilation. Wasm Component Model? That's bytecode plus a capability sandbox. Java GC pauses killing production? That's a generational stop-the-world GC. Rust's borrow checker? That's a static type system catching data races at compile time. An architect who doesn't understand PL/compiler concepts doesn't understand the runtime characteristics of the systems they build.”
Tags: cs-foundations compilers virtual-machines garbage-collection fundamentals Student: Hieu (Backend Dev → Architect) Related: Tuan-Bonus-Edge-Wasm-Architecture · Tuan-Foundations-OS-Essentials · Tuan-Foundations-Computer-Architecture
1. Context & Why
Why do architects need to understand Compilers & VMs?
| Architecture decision | PL/Compiler concept |
|---|---|
| Wasm Component Model | Bytecode + capability sandbox |
| V8 isolate cold start | JIT compilation, snapshot |
| Java/Go GC pauses | Generational GC, stop-the-world |
| Rust no GC, fast | Static analysis, ownership |
| Python GIL bottleneck | Interpreter design |
| Erlang/BEAM hot reload | VM-level deployment |
| WebAssembly polyglot | Common bytecode target |
| Kubernetes operators in Rust | Memory safety + perf |
Key insight: The choice of language is a trade-off between performance, safety, and productivity. Understanding compiler internals lets you make the right decision for each service.
Primary references
- Crafting Interpreters (Bob Nystrom, free) — https://craftinginterpreters.com/
- Compilers: Principles, Techniques, and Tools (Dragon Book — Aho et al.)
- Programming Language Pragmatics (Scott)
- The Garbage Collection Handbook (Jones, Hosking, Moss)
- Engineering a Compiler (Cooper & Torczon)
2. Deep Dive — Compilation Pipeline
2.1 Phases of Compilation
Source code
│
▼
┌───────────────────────┐
│ Lexer (Tokenizer) │ Source → Tokens
└──────────┬────────────┘
▼
┌───────────────────────┐
│ Parser │ Tokens → AST
└──────────┬────────────┘
▼
┌───────────────────────┐
│ Semantic Analysis │ Type checking, scope
└──────────┬────────────┘
▼
┌───────────────────────┐
│ IR (Intermediate │ Architecture-independent
│ Representation) │
└──────────┬────────────┘
▼
┌───────────────────────┐
│ Optimization │ Constant folding, dead code,
│ │ inlining, vectorization
└──────────┬────────────┘
▼
┌───────────────────────┐
│ Code Generation │ IR → target machine code
└──────────┬────────────┘
▼
Machine code
(or Bytecode)
2.1.1 Lexer
Convert characters to tokens:
Input: "let x = 42;"
Tokens: [LET, IDENT(x), EQUALS, NUMBER(42), SEMI]
2.1.2 Parser
Tokens to AST:
let x = 42 + y * 2;
Assign
/ \
x Add
/ \
42 Mul
/ \
y 2
Top-down (recursive descent) vs bottom-up (LR, LALR).
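Python's built-in `ast` module exposes this stage directly — a quick way to watch a real parser turn source into a tree (a standard-library illustration of the AST shape sketched above):

```python
import ast

# Parse the assignment from above into an AST
tree = ast.parse("x = 42 + y * 2")
stmt = tree.body[0]                # the Assign node
print(ast.dump(stmt.value))        # BinOp(Add) with a nested BinOp(Mult)

# Operator precedence is encoded in the tree shape: Mult nests under Add
assert isinstance(stmt.value, ast.BinOp)
assert isinstance(stmt.value.op, ast.Add)
assert isinstance(stmt.value.right, ast.BinOp)
assert isinstance(stmt.value.right.op, ast.Mult)
```

Note how `y * 2` ends up as the right child of the `Add` node — exactly the tree drawn above.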
2.1.3 Type checking
let x: int = "hello"; // Error: string not assignable to int
Static type system: Catch errors at compile time. Type inference: Infer types from usage (Rust, Haskell, TypeScript).
2.2 Compilation Strategies
2.2.1 Ahead-of-Time (AOT)
Compile entire program before run.
gcc main.c -o main # AOT compile
./main            # Execute machine code directly
Examples: C, C++, Rust, Go (default), Swift.
Pros:
- Fast startup (no compile at runtime)
- Predictable performance
- Whole-program optimization
Cons:
- Slow build
- Less runtime adaptability
2.2.2 Just-In-Time (JIT)
Compile during execution (typically in VM).
JS source
↓
V8 parses to AST
↓
Ignition (interpreter): runs bytecode
↓
TurboFan (JIT): hot functions → optimized machine code
↓
Deoptimize if assumptions wrong → back to interpreter
Examples: V8 (JS), JVM (HotSpot), .NET CLR, PyPy.
Pros:
- Adaptive optimization (profile-guided)
- Fast startup (interpret first)
- Reoptimize based on runtime info
Cons:
- Warmup time before peak perf
- More memory (compiler in process)
- Complexity
2.2.3 Interpretation
Execute AST or bytecode directly, no machine code.
# CPython:
source.py → bytecode (.pyc) → interpreter executes
Examples: CPython, Ruby (MRI), Bash, Lua.
Pros: Simplest, portable. Cons: 10-100x slower than compiled.
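You can inspect the bytecode CPython actually interprets with the standard `dis` module — a small sketch:

```python
import dis

def add(a, b):
    return a + b

# Show the bytecode the CPython VM interprets for this function
dis.dis(add)

# Opcode names vary by Python version (BINARY_ADD vs BINARY_OP),
# but an addition instruction is always present.
ops = [ins.opname for ins in dis.get_instructions(add)]
assert any(op.startswith("BINARY") for op in ops)
```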
2.2.4 AOT + JIT (hybrid)
- Java: JVM bytecode AOT, JIT to native at runtime
- C# / .NET: IL AOT, JIT or AOT-compiled
- Wasm: AOT to bytecode, AOT-compile or JIT in browser/runtime
2.3 Type Systems
2.3.1 Static vs Dynamic
| | Static (Java, Rust, Go) | Dynamic (Python, JS, Ruby) |
|---|---|---|
| Type check | Compile time | Runtime |
| Errors caught | Early | Late |
| Refactoring safety | High | Low |
| Productivity initial | Lower | Higher |
| Productivity at scale | Higher | Lower |
| Performance | Higher | Lower |
2.3.2 Strong vs Weak
Strong (Python, Java, Rust): no implicit conversions, rejects "3" + 4.
Weak (JS, PHP): coerces, "3" + 4 = "34" (JS).
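Python illustrates strong-but-dynamic typing: the coercion JS performs implicitly is rejected, but only at runtime. A minimal demonstration:

```python
# Strong typing: "3" + 4 is rejected (at runtime, since Python is dynamic)
caught = False
try:
    "3" + 4
except TypeError as e:
    caught = True
    print("rejected:", e)
assert caught

# Explicit conversion is required instead
assert "3" + str(4) == "34"   # what weakly-typed JS does implicitly
assert int("3") + 4 == 7
```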
2.3.3 Nominal vs Structural
Nominal (Java, C#): Same name = same type.
class Point { int x, y; }
class Coord { int x, y; } // Different from Point despite same shape
Structural (TypeScript, Go interfaces): Same shape = same type.
interface Point { x: number; y: number; }
interface Coord { x: number; y: number; }
// Compatible, can assign one to the other
2.3.4 Type inference
let x = 42; // Inferred: i32
let v = vec![1, 2]; // Inferred: Vec<i32>
fn add(a: i32, b: i32) -> i32 {
    a + b
}
let result = add(1, 2); // Inferred: i32
Hindley-Milner: Foundational inference algorithm, used in Haskell, ML, OCaml.
2.4 Memory Management
3 main approaches: manual, garbage collected, ownership-based.
2.4.1 Manual
void* p = malloc(100);
// ... use p
free(p); // Programmer responsibility
Pros: Predictable, no runtime overhead. Cons: Memory leaks, use-after-free, double-free, buffer overflows.
2.4.2 Garbage Collection
VM tracks references, frees unreachable objects.
List<String> list = new ArrayList<>(); // Allocated on heap
list = null; // Not freed yet; GC will collect it
3 main GC algorithm families:
Mark-Sweep
- Mark: Walk from roots (stack, globals), mark reachable objects
- Sweep: Free unmarked
Pros: Simple. Cons: Stop-the-world pauses, heap fragmentation.
Mark-Compact
Mark-sweep + compact reachable objects to one end.
Pros: No fragmentation. Cons: Even slower than mark-sweep.
Generational
Insight: Most objects die young.
Young Generation (Eden + Survivor)
↓ (objects survive several GCs)
Old Generation (long-lived)
- Minor GC: Young gen only, frequent, fast
- Major GC (Full GC): All gens, infrequent, slow
Pros: Fast for typical workload. Cons: Tuning complexity.
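CPython's cycle collector is itself generational (three generations), observable via the standard `gc` module — a small sketch of the "most objects die young" heuristic in practice:

```python
import gc

# Three generations; collecting gen N also collects all younger gens
thresholds = gc.get_threshold()
print("thresholds:", thresholds)   # e.g. (700, 10, 10) by default
assert len(thresholds) == 3

# Allocation counts per generation since the last collection
print("counts:", gc.get_count())

# Force a full (oldest-generation) collection — analogous to a "major GC"
unreachable = gc.collect(2)
print("unreachable objects collected:", unreachable)
assert unreachable >= 0
```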
2.4.3 Modern GCs
G1 GC (Java 9+ default)
- Region-based heap
- Concurrent marking
- Pauses: 10-200ms typical
ZGC (Java 11+, low-pause)
- Concurrent everything
- Pauses < 10ms even for huge heaps (TBs)
- Trade-off: higher CPU/memory overhead
Shenandoah (Red Hat)
Similar to ZGC, low-pause.
Go GC (concurrent, mark-sweep)
- Designed for low-latency
- Concurrent marking
- Sub-millisecond pauses (modern Go)
Python (reference counting + cycle detection)
- Each object has refcount
- When refcount → 0, free immediately
- Periodic cycle detection for cycles
- GIL prevents concurrent modifications
2.4.4 Ownership (Rust)
Compile-time memory management without GC.
fn main() {
    let s = String::from("hello"); // s owns the String
    take_ownership(s);
    // println!("{}", s); // Error: s was moved
}
fn take_ownership(s: String) {
    println!("{}", s);
    // s freed at end of scope
}
Borrow checker: Ensures no use-after-free and no data races, all at compile time.
Trade-off: Steeper learning curve, but no runtime overhead.
2.5 Bytecode & Virtual Machines
2.5.1 What is a VM?
Software-emulated machine. Executes bytecode instead of native code.
Source → Compile to bytecode → VM executes bytecode (interpret/JIT)
Examples:
- JVM (Java, Kotlin, Scala, Clojure)
- CLR (C#, F#, VB.NET)
- V8 (JavaScript)
- Wasm runtime (Wasmtime, V8, etc.)
- CPython VM (Python)
- Erlang BEAM (Erlang, Elixir)
2.5.2 Why VMs?
- Portability: Same bytecode → run on any platform with VM
- Safety: VM enforces type/memory checks
- Optimization: Profile-guided JIT
- Hot reload: Replace bytecode at runtime
- Sandbox: Limit what code can do
2.5.3 JVM bytecode example
public int add(int a, int b) {
    return a + b;
}
Compiled bytecode:
iload_1 ; push a
iload_2 ; push b
iadd ; add ints, push result
ireturn ; return
Stack-based VM: Operations pop operands from stack.
2.5.4 Wasm bytecode
(module
  (func $add (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.add)
  (export "add" (func $add)))
Wasm = stack-based bytecode + structured control flow + linear memory + types.
Designed for: Fast load, fast verify, fast execute.
2.5.5 V8 architecture (browser + Node.js + Cloudflare Workers)
JS source
↓
Parser → AST
↓
Ignition (interpreter): bytecode
↓ profile hot functions
TurboFan (optimizing JIT): optimized machine code
↓ deoptimize on bad assumption
Back to Ignition
Isolates: Lightweight V8 instances. Cloudflare Workers spawn isolate per request, share heap snapshot.
Cold start: V8 isolate creation ~5ms vs 100-1000ms for container.
2.6 JIT Optimizations
JIT compilers do remarkable things at runtime:
2.6.1 Inlining
function square(x) { return x * x; }
for (let i = 0; i < 1000; i++) {
    sum += square(i);
}
// JIT inlines:
for (let i = 0; i < 1000; i++) {
    sum += i * i;
}
2.6.2 Type specialization
}2.6.2 Type specialization
function add(a, b) { return a + b; }
// JIT sees always called with int
// → Generate int-specialized version (fast)
// If suddenly called with string → deoptimize
2.6.3 Escape analysis
Allocate on stack instead of heap if object doesn’t escape function.
public void foo() {
    Point p = new Point(1, 2); // JIT may allocate on stack
    System.out.println(p.x + p.y);
    // p doesn't escape
}
2.6.4 Loop optimizations
- Loop unrolling
- Loop invariant code motion
- Vectorization (SIMD)
2.7 Memory Model & Concurrency
A memory model defines the rules for how one thread sees memory writes made by other threads.
2.7.1 The problem
// Thread 1
x = 1;
y = 2;
// Thread 2
print(y); // Could see 2 but x still 0?
print(x);
Without a memory model: the compiler/CPU can reorder writes for performance.
2.7.2 Memory ordering
Java (since 1.5), C++11, Rust define memory models:
| Order | Guarantee |
|---|---|
| Relaxed | No ordering — fastest |
| Acquire/Release | Pairwise sync |
| Sequential consistency | Total global order — slowest |
use std::sync::atomic::{AtomicI32, Ordering};
let x = AtomicI32::new(0);
x.store(1, Ordering::Release); // Pairs with Acquire load elsewhere
let v = x.load(Ordering::Acquire);
2.7.3 Atomic operations
let counter = AtomicU64::new(0);
counter.fetch_add(1, Ordering::Relaxed); // Hardware atomic
Hardware support: x86 LOCK prefix, ARM LDXR/STXR, etc.
2.7.4 Lock-free data structures
// Treiber stack push (simplified; pop and safe memory reclamation omitted)
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> { value: T, next: *mut Node<T> }

fn push<T>(head: &AtomicPtr<Node<T>>, value: T) {
    let new = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
    loop {
        let old = head.load(Ordering::Acquire);
        unsafe { (*new).next = old; }  // link new node to current head
        if head.compare_exchange(old, new, Ordering::Release, Ordering::Relaxed).is_ok() {
            return;  // CAS succeeded: new node is now the head
        }
        // CAS failed: another thread pushed first → retry
    }
}
CAS (Compare-And-Swap) is the foundation; hardware provides atomic CAS instructions.
2.7.5 Common pitfalls
- Word tearing: 64-bit value updated non-atomically on 32-bit platform
- Volatile (Java) ≠ atomic: guarantees visibility and ordering, but doesn't make compound operations (like count++) atomic
- Java synchronized keyword: Implicit memory barrier + mutex
2.8 Concurrency Models
2.8.1 Threads + locks (traditional)
synchronized void deposit(int amount) {
    balance += amount;
}
Pros: Direct, familiar. Cons: Deadlocks, race conditions, hard to reason about.
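The same pattern in Python, using a `threading.Lock` to guard a shared counter (without the lock, `+=` on a shared variable is not guaranteed atomic across threads):

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(amount):
    global balance
    with lock:                 # mutual exclusion, like Java's synchronized
        balance += amount

threads = [threading.Thread(target=deposit, args=(10,)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(balance)  # 1000
assert balance == 1000
```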
2.8.2 Actor model (Erlang, Akka)
% Each actor = process with mailbox
loop(State) ->
    receive
        {deposit, Amount} ->
            loop(State + Amount);
        {balance, From} ->
            From ! State,
            loop(State)
    end.
Pros: No shared state, isolated failures. Cons: Different mental model.
2.8.3 CSP — Communicating Sequential Processes (Go)
ch := make(chan int)
go func() {
    ch <- 42
}()
result := <-ch
Channels for communication, goroutines for concurrency.
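Python has no built-in channels, but `queue.Queue` approximates the CSP pattern — a hedged sketch of the Go snippet above:

```python
import threading, queue

ch = queue.Queue()                      # stand-in for a Go channel

def worker():
    ch.put(42)                          # like: ch <- 42

threading.Thread(target=worker).start()
result = ch.get()                       # like: result := <-ch (blocks)
print(result)  # 42
assert result == 42
```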
2.8.4 Async/await (modern)
async def fetch_user(id):
    response = await http_get(f"/users/{id}")
    return response.json()
Cooperative scheduling: the function explicitly yields control at each await.
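A runnable version of the idea with `asyncio`, using `asyncio.sleep` as a stand-in for the hypothetical `http_get` call:

```python
import asyncio

async def fetch_user(user_id):
    await asyncio.sleep(0.01)           # yields control, like awaiting real I/O
    return {"id": user_id}              # stand-in for response.json()

async def main():
    # Both coroutines make progress on one thread via cooperative yields
    return await asyncio.gather(fetch_user(1), fetch_user(2))

users = asyncio.run(main())
print(users)  # [{'id': 1}, {'id': 2}]
assert [u["id"] for u in users] == [1, 2]
```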
2.8.5 STM — Software Transactional Memory (Clojure, Haskell)
atomically $ do
    bal <- readTVar balance
    writeTVar balance (bal + 100)
Optimistic concurrency: transactions retry on conflict.
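A toy version of optimistic retry in Python — not real STM, just a single versioned cell showing the retry-on-conflict idea (the names `TVar` and `atomically` are borrowed from Haskell purely for illustration):

```python
import threading

class TVar:
    """A versioned cell; a commit fails if the version moved underneath us."""
    def __init__(self, value):
        self.value, self.version = value, 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, seen_version, new_value):
        with self._lock:
            if self.version != seen_version:
                return False              # conflict: someone committed first
            self.value, self.version = new_value, self.version + 1
            return True

def atomically(tvar, fn):
    while True:                           # optimistic: retry on conflict
        value, version = tvar.read()
        if tvar.try_commit(version, fn(value)):
            return

balance = TVar(0)
threads = [threading.Thread(target=atomically, args=(balance, lambda b: b + 100))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(balance.read()[0])  # 1000
assert balance.read()[0] == 1000
```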
2.9 Sandboxing — Security via Compiler/VM
2.9.1 Levels of sandboxing
| Level | Example | Cost |
|---|---|---|
| OS process | Container | Heavy (~10MB) |
| VM (gVisor) | Cloud Functions | Medium |
| Bytecode VM (Wasm) | Workers | Light (~1ms cold) |
| Language VM (V8 isolate) | Workers | Light |
| Native sandbox (seccomp) | Container | Very light |
2.9.2 Wasm sandbox
- No direct syscalls
- Limited memory (linear memory)
- No network/file by default
- Capability-based: import what you need
2.9.3 Capability security
Code can only do what’s explicitly granted:
Component imports:
- wasi:filesystem (file operations)
- wasi:http (HTTP calls)
Code cannot:
- Spawn process
- Access network outside http
- Read files outside provided dirs
Different from “trusted code” — by construction, not by trust.
2.10 Compiler Optimizations
2.10.1 Constant folding
int x = 1 + 2 + 3;
// Compile time: x = 6
2.10.2 Dead code elimination
if (false) {
    expensive_call(); // Removed
}
2.10.3 Common subexpression elimination
y = a + b + c;
z = a + b + d;
// → tmp = a + b; y = tmp + c; z = tmp + d;
2.10.4 Loop unrolling
for (int i = 0; i < 100; i++) sum += arr[i];
// →
for (int i = 0; i < 100; i += 4) {
    sum += arr[i] + arr[i+1] + arr[i+2] + arr[i+3];
}
2.10.5 Inlining
Replace function call with body.
2.10.6 Auto-vectorization
Compiler emits SIMD instructions automatically.
for (int i = 0; i < N; i++) c[i] = a[i] + b[i];
// Compiler emits: SIMD vectorized add (4-8 elements at once)
3. Practical Implications
3.1 Choosing language for backend
| Need | Recommended |
|---|---|
| Maximum performance, no GC | Rust, C++ |
| High throughput, simple | Go |
| Mature ecosystem, JVM | Java, Kotlin |
| Productivity, OK perf | Python, Ruby (small services) |
| Browser/JS reuse | TypeScript / Node |
| Real-time, fault-tolerant | Erlang, Elixir |
| ML/data science | Python |
| Data engineering | Scala, Java (Spark) |
3.2 GC tuning matters
Java GC tuning can change throughput 2-10x:
# Throughput (G1 GC, default)
java -XX:+UseG1GC -Xmx4g app.jar
# Low latency (ZGC)
java -XX:+UseZGC -Xmx16g app.jar
# Heap sizing
-Xms2g -Xmx2g # Same min/max → no resizing
# Logging
-Xlog:gc*:file=gc.log
Common settings:
- -XX:MaxGCPauseMillis=200 (G1 pause target)
- -XX:+ParallelRefProcEnabled
- -XX:+AlwaysPreTouch (touch/zero heap pages upfront)
3.3 V8 isolate cold start
Why Workers fast:
- V8 process pre-started
- Per request: spawn new isolate (~1ms)
- Run code (~ms)
- Destroy isolate
Compare:
- AWS Lambda: 100-1000ms cold start (container init)
- Lambda@Edge: 100-300ms
- Workers: 5ms
3.4 Rust adoption in infrastructure
Why Rust everywhere 2020+:
- Memory safety without GC
- Performance ≈ C++
- Zero-cost abstractions
- No data race compile-time
- Modern type system
Used by:
- TiKV (distributed KV)
- Cloudflare Pingora (HTTP proxy replacing Nginx)
- Linkerd2-proxy (service-mesh data plane)
- Scylla (planned Rust port)
- Many sidecar/proxies
4. Performance & Profiling
4.1 JVM profiling
# Java Flight Recorder (free since Java 11)
java -XX:StartFlightRecording=duration=60s,filename=profile.jfr ...
# Async profiler (lower overhead)
async-profiler -d 30 -e cpu -f flame.html <pid>
4.2 Go profiling
import _ "net/http/pprof"
go func() {
    log.Println(http.ListenAndServe(":6060", nil))
}()
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
4.3 V8 profiling
node --prof app.js
# Generates isolate-*.log
node --prof-process isolate-*.log > processed.txt
4.4 Common metrics
- CPU samples: Where is time spent?
- Allocation rate: Bytes/sec allocated
- GC pauses: Distribution of pauses
- Heap fragmentation: Used vs reserved
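The first metric — where CPU time goes — can be answered for Python with the standard `cProfile` module; a minimal sketch:

```python
import cProfile, pstats, io

def hot_loop():
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
hot_loop()
profiler.disable()

# Print the top functions by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
assert "hot_loop" in report
```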
5. Practical Code Patterns
5.1 Avoid GC pressure
Allocate less, reuse more:
// BAD: allocate per call
void process(Request req) {
    List<String> tags = new ArrayList<>();
    // ...
}
// GOOD: reuse via thread-local
ThreadLocal<List<String>> tagsTL = ThreadLocal.withInitial(ArrayList::new);
void process(Request req) {
    List<String> tags = tagsTL.get();
    tags.clear();
    // ...
}
5.2 Lock-free patterns
// Bad: blocking
private int count = 0;
synchronized void increment() { count++; }
// Good: atomic
private AtomicInteger count = new AtomicInteger();
void increment() { count.incrementAndGet(); }
5.3 Avoid boxing
// BAD: boxes on every call
Map<Integer, Integer> map = new HashMap<>();
map.put(1, 2); // Integer boxing
// GOOD: primitive collections (Eclipse Collections, Trove)
IntIntMap map = new IntIntHashMap();
map.put(1, 2); // No boxing
6. Code Examples
6.1 Simple lexer in Python
import re

TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('IDENT',  r'[a-zA-Z_]\w*'),
    ('PLUS',   r'\+'),
    ('MINUS',  r'-'),
    ('MUL',    r'\*'),
    ('LPAREN', r'\('),
    ('RPAREN', r'\)'),
    ('SKIP',   r'\s+'),
]

def tokenize(text):
    pattern = '|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC)
    for match in re.finditer(pattern, text):
        kind = match.lastgroup
        value = match.group()
        if kind != 'SKIP':
            yield (kind, value)

tokens = list(tokenize("3 + (4 * 2)"))
print(tokens)
# [('NUMBER', '3'), ('PLUS', '+'), ('LPAREN', '('),
#  ('NUMBER', '4'), ('MUL', '*'), ('NUMBER', '2'), ('RPAREN', ')')]
6.2 Recursive descent parser
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def consume(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_expr(self):
        left = self.parse_term()
        while self.peek() and self.peek()[0] in ('PLUS', 'MINUS'):
            op = self.consume()
            right = self.parse_term()
            left = ('binop', op[0], left, right)
        return left

    def parse_term(self):
        # Simplified: no operator precedence
        return self.parse_atom()

    def parse_atom(self):
        tok = self.consume()
        if tok[0] == 'NUMBER':
            return ('num', int(tok[1]))
        elif tok[0] == 'LPAREN':
            expr = self.parse_expr()
            self.consume()  # RPAREN
            return expr

tokens = tokenize("3 + 4 + 5")
ast = Parser(tokens).parse_expr()
print(ast)
# ('binop', 'PLUS', ('binop', 'PLUS', ('num', 3), ('num', 4)), ('num', 5))
6.3 Stack-based VM
class VM:
    def __init__(self):
        self.stack = []

    def execute(self, bytecode):
        for op in bytecode:
            if op[0] == 'PUSH':
                self.stack.append(op[1])
            elif op[0] == 'ADD':
                b = self.stack.pop()
                a = self.stack.pop()
                self.stack.append(a + b)
            elif op[0] == 'PRINT':
                print(self.stack.pop())

# Bytecode for: print(3 + 4)
program = [
    ('PUSH', 3),
    ('PUSH', 4),
    ('ADD',),
    ('PRINT',),
]
VM().execute(program)  # 7
6.4 Reference counting GC (simplified)
class RefCounted:
    def __init__(self):
        self.refcount = 1

    def acquire(self):
        self.refcount += 1
        return self

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self._destroy()

    def _destroy(self):
        # Free resources
        pass
7. System Design Diagrams
7.1 Compilation Pipeline
flowchart TB
  Source[Source code]
  Source --> Lex[Lexer<br/>tokens]
  Lex --> Parse[Parser<br/>AST]
  Parse --> TypeCheck[Type Check<br/>Semantic Analysis]
  TypeCheck --> IR[Intermediate<br/>Representation]
  IR --> Opt[Optimizer<br/>const fold, inline,<br/>vectorize]
  Opt --> CodeGen[Code Generator]
  CodeGen --> Native[Native Machine Code]
  CodeGen --> Bytecode[Bytecode<br/>JVM, Wasm]
  Bytecode --> VM[Virtual Machine<br/>Interpret or JIT]
  style Source fill:#bbdefb
  style Native fill:#c8e6c9
  style Bytecode fill:#fff9c4
7.2 V8 JIT Pipeline
flowchart LR
  JS[JS Source] --> Parse[Parser]
  Parse --> AST[AST]
  AST --> Igni[Ignition<br/>Interpreter]
  Igni --> Bytecode[Bytecode]
  Bytecode --> Hot{Hot function?}
  Hot -->|Yes| Turbo[TurboFan<br/>Optimizing JIT]
  Hot -->|No| Continue[Continue interpreting]
  Turbo --> Optimized[Optimized<br/>Native Code]
  Optimized --> Deopt{Assumption broken?}
  Deopt -->|Yes| Igni
  Deopt -->|No| Continue
7.3 GC Generations
flowchart LR
  subgraph Heap["JVM Heap"]
    Eden["Eden<br/>(Young)"]
    S0[Survivor 0]
    S1[Survivor 1]
    Old["Old Gen<br/>(Tenured)"]
    Eden -->|Minor GC, survives| S0
    S0 -->|Minor GC| S1
    S1 -->|Tenured after N| Old
  end
  GC{GC Trigger}
  GC -->|Eden full<br/>Minor GC| Eden
  GC -->|Old full<br/>Major GC| Old
7.4 Memory Models
flowchart LR
  subgraph Manual["Manual (C/C++)"]
    M1[malloc/free<br/>direct control]
    M2[Risk: leaks, UAF]
  end
  subgraph GC["GC (Java/Go/Python)"]
    G1[VM tracks refs<br/>auto-frees]
    G2[Pause times<br/>throughput cost]
  end
  subgraph Ownership["Ownership (Rust)"]
    O1[Compile-time<br/>borrow checker]
    O2[No GC<br/>no UAF<br/>steeper learning]
  end
  style Manual fill:#ffcdd2
  style GC fill:#fff9c4
  style Ownership fill:#c8e6c9
8. Aha Moments & Pitfalls
Aha Moments
#1: Compilers are translators with optimizers. Source → AST → IR → optimized → target. Each step opens optimization opportunities.
#2: JIT vs AOT trade-off. JIT adapts to runtime patterns. AOT predictable startup. Java’s HotSpot famously beats AOT in long-running workloads.
#3: GC vs ownership = different trade-offs. GC easier, runtime cost. Rust harder upfront, no runtime cost.
#4: Bytecode is portable + safe. Wasm runs in browser, server, edge — same binary. JVM bytecode is verified before exec.
#5: V8 isolate ≠ container. Isolate = lightweight V8 sandbox with millisecond cold starts. Container = whole OS namespace, ~100-1000ms init.
#6: Memory models matter for concurrent code. Without proper ordering, you’ll see “impossible” bugs. Java/C++/Rust formalize.
#7: Lock-free is hard but rewarding. CAS-based stacks/queues can deliver 10-100x the throughput of mutex-based ones under heavy contention, but require expert care.
#8: Capability security via compiler. Wasm Component Model enforces capabilities at type level. Stronger than runtime checks.
Pitfalls
Pitfall 1: Trusting type for runtime
Java type checks at compile time, but runtime can still throw ClassCastException via reflection or unchecked casts. Fix: Treat the type system as the primary defense and runtime checks as a safety net.
Pitfall 2: GC pause shock
4GB heap, default G1, sudden 5-second pause → outage. Fix: Tune GC, monitor pauses, use ZGC for low-latency.
Pitfall 3: Memory leak in GC language
“GC frees everything” is wrong: holding references in static maps or listener callbacks leaks memory. Fix: Profile, and use weak references where appropriate.
Pitfall 4: Boxing causing GC pressure
Map<Integer, Integer> allocates Integer objects per entry → GC churn. Fix: Primitive collections (Eclipse Collections, Trove for Java).
Pitfall 5: synchronized everywhere
Coarse locks → contention. 4-core machine performance ≈ 1-core. Fix: Fine-grained locks, lock-free DS, immutable data.
Pitfall 6: JIT warmup problem
Production deployment: the first requests are slow because the JIT hasn't optimized the hot paths yet. Fix: Warm up with synthetic load before serving real traffic.
Pitfall 7: Stop-the-world surprise
JVM full GC freezes app for seconds. K8s health probe fails → restart. Fix: Increase health probe timeout, tune GC.
Pitfall 8: Native call boundary
JNI/FFI calls 100x slower than internal. Crossing repeatedly → bottleneck. Fix: Batch calls, use direct buffers.
Pitfall 9: Reflection abuse
Class.forName(), dynamic proxies → JIT can’t optimize. Fix: Code generation at startup (e.g., MapStruct, Dagger).
Pitfall 10: Premature lock-free
Implement lock-free queue → 100 lines of subtle code → bugs. Fix: Use proven libs (java.util.concurrent, crossbeam in Rust).
9. Internal Links
| Topic | Connects to |
|---|---|
| Tuan-Bonus-Edge-Wasm-Architecture | Wasm Component Model, V8 isolates |
| Tuan-Foundations-OS-Essentials | Process, thread, namespaces underlie VMs |
| Tuan-Foundations-Computer-Architecture | JIT optimization targets memory hierarchy |
| Tuan-Bonus-Consistency-Models-Isolation | Memory model, atomicity |
References
Books:
- Crafting Interpreters (Bob Nystrom, free) — https://craftinginterpreters.com/
- Compilers: Principles, Techniques, and Tools (Dragon Book — Aho et al.)
- Engineering a Compiler (Cooper & Torczon)
- Programming Language Pragmatics (Scott)
- The Garbage Collection Handbook (Jones, Hosking, Moss)
- Modern Compiler Implementation in Java (Appel)
Online:
- LLVM tutorial — https://llvm.org/docs/tutorial/
- V8 deep dive — https://v8.dev/blog/
Papers:
- Cliff Click's lock-free hash table (NonBlockingHashMap)
- A History of Modern 64-bit Computing — context
Specific projects to study:
- LLVM (modular compiler infrastructure)
- V8 source code
- Wasmtime (Wasm runtime)
- HotSpot JVM
- Go compiler
Next: Tuan-Foundations-Math-for-Architects — Linear algebra, probability, discrete math, info theory.