Bonus Week: Model Context Protocol (MCP) Architecture
“Before MCP: every AI app had to hand-code its own integrations with GitHub, Slack, Postgres, Stripe — an N×M problem for N apps × M services. After MCP: one standard protocol and 2,000+ servers shared across clients. MCP is the ‘USB-C of AI tools’ — introduced by Anthropic in late 2024, adopted by OpenAI, Google, and Sourcegraph in 2025.”
Tags: system-design mcp ai-infrastructure protocol anthropic bonus Student: Hieu (Backend Dev → Architect) Prerequisites: Tuan-04-API-Design-REST-gRPC · Tuan-14-AuthN-AuthZ-Security Related: Tuan-Bonus-LLM-Serving-Infrastructure · Tuan-Bonus-Outbox-Pattern
1. Context & Why
Everyday analogy — the USB-C standard
Hieu, imagine it is 2010 and every phone brand ships its own charging port:
- iPhone: 30-pin dock connector (Lightning from 2012)
- Samsung: Micro-USB
- Nokia: 3.5mm
- BlackBerry: Mini-USB B
- Sony: Magnetic
Customers have to buy N different cables. A hotel has to stock M kinds of cable for its guests. Total integrations = N × M.
USB-C (2015): one standard that works with every device. One cable per customer, one cable type per hotel.
Model Context Protocol (MCP) is the USB-C of AI tools:
- Before: Claude Desktop, Cursor, ChatGPT, and Cline each hand-rolled its own integrations with GitHub, Slack, Postgres, Linear…
- After: one JSON-RPC standard that anyone can implement, with servers shared across clients
Why does a backend dev need to understand MCP?
| Reason | Consequence |
|---|---|
| AI agents are taking off | Every product from 2025 onward ships AI integration |
| Tool standardization | Build one MCP server and it works with every MCP-capable LLM app |
| Industry standard | Anthropic, OpenAI, Google DeepMind, Sourcegraph adopt |
| Security perimeter | MCP servers expose data → need auth, rate limiting, auditing |
| Backend skill | Building an MCP server = building a Python/TS server with a rich API |
Timeline & Adoption
- Nov 2024: Anthropic introduces MCP with Claude Desktop
- Dec 2024: First wave servers (filesystem, GitHub, Slack)
- Mar 2025: Streamable HTTP transport — production ready
- 2025 Q2-Q3: ~2000 community servers
- Aug 2025: OpenAI announces support
- Q4 2025: Google DeepMind, Sourcegraph integrate
- Nov 2025: Spec version 2025-11-25
Key references
- MCP Spec — https://modelcontextprotocol.io/specification/2025-11-25
- MCP TypeScript SDK — https://github.com/modelcontextprotocol/typescript-sdk
- MCP Python SDK — https://github.com/modelcontextprotocol/python-sdk
- MCP Server Registry — https://github.com/modelcontextprotocol/servers
- Cloudflare Remote MCP — https://blog.cloudflare.com/remote-model-context-protocol-servers-mcp/
2. Deep Dive — Core Concepts
2.1 The N×M Problem
Before MCP:
LLM Apps (N = 5): Claude Desktop, Cursor, Cline, ChatGPT, custom apps
Services (M = 7): GitHub, Slack, Postgres, Linear, Notion, Stripe, Files
Total integrations needed: N × M = 5 × 7 = 35
Each app must build and maintain an integration with each service.
After MCP:
LLM apps embed an MCP client:
- Claude Desktop → MCP client
- Cursor → MCP client
Services expose MCP server:
- github-mcp-server
- slack-mcp-server
- postgres-mcp-server
Total integrations: N + M = 5 + 7 = 12
Each app implements MCP client once.
Each service implements MCP server once.
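A quick sanity check of those counts in Python (the lists are illustrative):
apps = ["Claude Desktop", "Cursor", "Cline", "ChatGPT", "Custom app"]
services = ["GitHub", "Slack", "Postgres", "Linear", "Notion", "Stripe", "Files"]
print(len(apps) * len(services))  # before MCP: 5 * 7 = 35 point-to-point integrations
print(len(apps) + len(services))  # after MCP:  5 + 7 = 12 protocol implementations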
2.2 MCP Architecture
┌──────────────────┐ ┌──────────────────┐
│ AI Application │ │ MCP Server │
│ (Host) │ │ (Tool/Service) │
│ │ │ │
│ ┌────────────┐ │ JSON-RPC 2.0 │ ┌──────────────┐ │
│ │ MCP Client ├─┼──────────────────┼─┤ Service │ │
│ └────────────┘ │ (transport) │ │ logic │ │
│ │ │ └──────────────┘ │
│ ┌────────────┐ │ │ │
│ │ LLM │ │ │ Exposes: │
│ │ (Claude, │ │ │ - Tools │
│ │ GPT-4...│ │ │ - Resources │
│ └────────────┘ │ │ - Prompts │
└──────────────────┘ └──────────────────┘
Three actors:
- Host: User-facing AI app (Claude Desktop, Cursor)
- Client: Embedded in host, manages MCP connections
- Server: Exposes capabilities (tools, resources, prompts)
2.3 Three Capabilities
2.3.1 Tools (callable functions)
LLM can invoke functions on server.
// Server exposes tool
{
"name": "search_github_issues",
"description": "Search GitHub issues by query",
"inputSchema": {
"type": "object",
"properties": {
"query": { "type": "string" },
"repo": { "type": "string" },
"limit": { "type": "integer", "default": 10 }
},
"required": ["query"]
}
}
// LLM calls
{
"method": "tools/call",
"params": {
"name": "search_github_issues",
"arguments": {
"query": "memory leak",
"repo": "anthropic/sdk",
"limit": 5
}
}
}
// Response
{
"content": [
{
"type": "text",
"text": "Found 3 issues: ..."
}
]
}
2.3.2 Resources (read-only data)
LLM can read files, DB rows, web pages.
// List resources
{
"method": "resources/list"
}
// Response
{
"resources": [
{
"uri": "file:///path/to/doc.md",
"name": "Documentation",
"mimeType": "text/markdown"
}
]
}
// Read resource
{
"method": "resources/read",
"params": { "uri": "file:///path/to/doc.md" }
}
// Response
{
"contents": [
{ "uri": "...", "mimeType": "text/markdown", "text": "# Title\n..." }
]
}
2.3.3 Prompts (reusable templates)
Server provides prompt templates user can invoke.
{
"method": "prompts/get",
"params": {
"name": "code_review",
"arguments": { "file": "src/app.py" }
}
}
// Response: structured prompt with file content embedded
2.4 Transport Layers
2.4.1 stdio (local processes)
The most common transport for local tools (filesystem, shell access).
Host process spawns server as subprocess
Communication via stdin/stdout (JSON-RPC over newline-delimited JSON)
Pros: simple, secure (no network exposure), low latency. Cons: local only, single client.
Example: Claude Desktop launches npx @modelcontextprotocol/server-filesystem as subprocess.
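To make the wire format concrete, here is a minimal sketch that spawns a hypothetical local server (python server.py) and speaks newline-delimited JSON-RPC by hand — in practice the SDK's stdio_client does this for you:
import json
import subprocess

# Spawn the server as a subprocess and talk JSON-RPC over its stdin/stdout
proc = subprocess.Popen(
    ["python", "server.py"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
)

def send(msg: dict) -> None:
    proc.stdin.write(json.dumps(msg) + "\n")  # one JSON-RPC message per line
    proc.stdin.flush()

# Handshake: initialize -> (response) -> initialized notification
send({"jsonrpc": "2.0", "id": 1, "method": "initialize",
      "params": {"protocolVersion": "2025-11-25", "capabilities": {},
                 "clientInfo": {"name": "demo", "version": "0.1"}}})
print(proc.stdout.readline())  # server's initialize result
send({"jsonrpc": "2.0", "method": "notifications/initialized"})

# Discover tools
send({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
print(proc.stdout.readline())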
2.4.2 Streamable HTTP (production)
Introduced 2025: HTTP transport for remote MCP servers.
POST /mcp HTTP/1.1
Host: server.example.com
Content-Type: application/json
Authorization: Bearer <token>
{"jsonrpc":"2.0","method":"tools/list","id":1}
HTTP/1.1 200 OK
Content-Type: text/event-stream
data: {"jsonrpc":"2.0","result":{...},"id":1}
data: {"jsonrpc":"2.0","method":"notifications/...","params":{...}}
Features:
- Server-Sent Events (SSE) for streaming responses
- HTTP-friendly (auth, proxy, CDN)
- Stateless or stateful (session)
Pros: production-ready, standard HTTP infrastructure. Cons: requires an auth design.
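A minimal sketch of one call against such an endpoint with httpx, parsing the SSE frames by hand; the URL and token are placeholders, and in practice the SDK's streamablehttp_client handles this (servers may also answer single requests with plain application/json):
import json
import httpx

msg = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

with httpx.Client() as client:
    with client.stream("POST", "https://server.example.com/mcp",
                       headers=headers, json=msg) as resp:
        for line in resp.iter_lines():
            if line.startswith("data: "):  # each SSE frame carries one JSON-RPC message
                print(json.loads(line[len("data: "):]))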
2.4.3 HTTP + SSE (deprecated in favor of Streamable HTTP)
The original remote transport (a separate SSE endpoint plus a POST endpoint); superseded by Streamable HTTP in the 2025-03-26 spec revision.
2.5 JSON-RPC 2.0 Foundation
MCP messages are JSON-RPC 2.0:
Request:
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": { "name": "...", "arguments": {...} }
}
Response (success):
{
"jsonrpc": "2.0",
"id": 1,
"result": { "content": [...] }
}
Response (error):
{
"jsonrpc": "2.0",
"id": 1,
"error": { "code": -32602, "message": "Invalid params" }
}
Notification (no response expected):
{
"jsonrpc": "2.0",
"method": "notifications/cancelled",
"params": { "requestId": 1 }
}
2.6 Lifecycle & Capability Negotiation
Client Server
│ │
├──── initialize ──────────────────►│
│ {protocolVersion, capabilities,
│ clientInfo}
│ │
│◄──── initialized ─────────────────┤
│ {protocolVersion, capabilities,
│ serverInfo}
│ │
├──── initialized notification ────►│
│ │
├──── tools/list ──────────────────►│
│◄──── result ──────────────────────┤
│ │
├──── tools/call ──────────────────►│
│◄──── result ──────────────────────┤
│ │
├──── shutdown ────────────────────►│
│◄──── result ──────────────────────┤
Capability negotiation: Client and server announce what they support (resources, tools, prompts, sampling, logging).
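As a concrete illustration, the initialize result carries the negotiated capabilities; a representative shape written as a Python dict (the exact sub-flags vary by server and spec revision — treat this as illustrative, not the normative schema):
initialize_result = {
    "protocolVersion": "2025-11-25",
    "capabilities": {
        "tools": {"listChanged": True},  # server can notify when its tool list changes
        "resources": {"subscribe": True, "listChanged": True},
        "prompts": {"listChanged": True},
        "logging": {},
    },
    "serverInfo": {"name": "github-mcp", "version": "1.0.0"},
}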
2.7 Security Model
MCP servers expose powerful capabilities (file system, DB, APIs), so security is paramount.
2.7.1 Local stdio: process isolation
- Server runs as subprocess of host
- Inherits user permissions
- No network attack surface
- Risk: Malicious server can read user files. Mitigation: trusted server registry.
2.7.2 Remote HTTP: OAuth 2.1 + DPoP
The MCP spec (2025-11-25) bases authorization for the HTTP transport on OAuth 2.1:
1. Client redirects user to /oauth/authorize
2. User consents to scopes (e.g., "tools:execute", "resources:read")
3. Authorization code → token exchange with PKCE
4. Token sender-constrained to the client via DPoP (RFC 9449) where supported — see Tuan-14
5. Each MCP request includes:
Authorization: DPoP <access_token>
DPoP: <signed JWT>
Reference: Tuan-14-AuthN-AuthZ-Security, sections 2.16 (DPoP) and 2.17 (FAPI 2.0).
2.7.3 Server-side sandboxing
# Example: filesystem server with path restriction
import os

ALLOWED_PATHS = ["/tmp/mcp-workspace", "/home/user/projects"]

def read_file(path: str) -> str:
    abs_path = os.path.realpath(path)  # Resolve symlinks and ../ sequences
    if not any(abs_path == p or abs_path.startswith(p + os.sep) for p in ALLOWED_PATHS):
        raise PermissionError(f"Path {path} not allowed")
    with open(abs_path) as f:
        return f.read()
Best practices:
- Validate all paths (prevent path traversal)
- Sandbox file system access (chroot, containers)
- Rate limit per client
- Audit log all tool invocations
2.8 MCP vs OpenAPI vs gRPC
| Feature | MCP | OpenAPI/REST | gRPC |
|---|---|---|---|
| Designed for | LLM tool use | Human-facing APIs | RPC between services |
| Schema | JSON Schema (per tool) | OpenAPI YAML | Protobuf |
| Discoverability | Built-in (tools/list) | OpenAPI doc | Reflection (limited) |
| Streaming | Yes (SSE) | Limited | First-class |
| AI semantics | Tools, prompts, resources | Generic CRUD | Generic methods |
| Auth | OAuth 2.1 + DPoP | Bearer, OAuth | Pluggable |
| Best for | AI agents calling tools | Web APIs | Service-to-service |
MCP key differentiator: Designed for AI consumption — schema includes natural language descriptions, prompt templates, resource semantics.
2.9 Building MCP Server — Pattern
Reference architecture:
┌────────────────────────────────────────┐
│ MCP Server │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Transport Layer │ │
│ │ - stdio (subprocess) │ │
│ │ - Streamable HTTP │ │
│ └────────────┬─────────────────────┘ │
│ │ │
│ ┌────────────▼──────────────────────┐ │
│ │ Protocol Handler │ │
│ │ - JSON-RPC parsing │ │
│ │ - Capability negotiation │ │
│ └────────────┬──────────────────────┘ │
│ │ │
│ ┌────────────▼──────────────────────┐ │
│ │ Auth & Authorization │ │
│ │ - OAuth token validation │ │
│ │ - Scope enforcement │ │
│ │ - DPoP verification │ │
│ └────────────┬──────────────────────┘ │
│ │ │
│ ┌────────────▼──────────────────────┐ │
│ │ Tool/Resource Registry │ │
│ │ - Tool schemas │ │
│ │ - Resource URIs │ │
│ │ - Prompt templates │ │
│ └────────────┬──────────────────────┘ │
│ │ │
│ ┌────────────▼──────────────────────┐ │
│ │ Business Logic │ │
│ │ - Service integration │ │
│ │ - Rate limiting │ │
│ │ - Audit log │ │
│ └───────────────────────────────────┘ │
└────────────────────────────────────────┘
2.10 Production Deployment Patterns
2.10.1 Local stdio (developer tools)
User runs claude-desktop → spawns local MCP servers as subprocesses.
// claude_desktop_config.json
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]
},
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."
}
}
}
}
2.10.2 Remote MCP (enterprise)
Centralized MCP server, accessible via HTTPS.
Cloudflare Workers → MCP Server (TypeScript)
↓
Postgres / GitHub API / etc.
Client (Claude Desktop, web app) → HTTPS → MCP Server
Cloudflare offers Remote MCP hosting (Apr 2025): Workers-based, OAuth built-in.
2.10.3 Multi-tenant MCP
┌──────────────────┐
│ MCP Gateway │
│ - Auth │
│ - Tenant routing│
└────────┬─────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Server A│ │ Server B│ │ Server C│
│ Tenant 1│ │ Tenant 2│ │ Tenant 3│
└─────────┘ └─────────┘ └─────────┘
Per-tenant MCP servers with isolated data and rate limits.
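A minimal sketch of the routing core of such a gateway, assuming FastAPI and httpx; the tenant map and header-based resolution are illustrative (a real gateway would derive the tenant from the validated OAuth token and also proxy SSE streams):
import httpx
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

# Each tenant gets its own isolated upstream MCP server (illustrative URLs)
TENANT_SERVERS = {
    "tenant-1": "https://mcp-tenant1.internal/mcp",
    "tenant-2": "https://mcp-tenant2.internal/mcp",
}

def resolve_tenant(request: Request) -> str:
    # Simplified: in production, derive the tenant from the validated OAuth token
    tenant = request.headers.get("x-tenant-id", "")
    if tenant not in TENANT_SERVERS:
        raise HTTPException(status_code=403, detail="unknown tenant")
    return tenant

@app.post("/mcp")
async def proxy(request: Request):
    upstream = TENANT_SERVERS[resolve_tenant(request)]
    body = await request.body()  # forward the JSON-RPC payload unchanged
    async with httpx.AsyncClient() as client:
        resp = await client.post(upstream, content=body,
                                 headers={"Content-Type": "application/json"})
    return resp.json()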
3. Estimation
3.1 Tool latency budget
Typical MCP tool call:
- LLM decision to call: ~500ms
- Network to MCP server: ~50ms
- Tool execution: variable (10ms - 5s)
- Response back: ~50ms
- LLM processes result: ~500ms
Total: 1.1s + tool execution time.
Implication: Tools should be fast (<100ms) for good UX. Long-running ops → progress events.
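For the long-running case, a sketch with the Python SDK's FastMCP, assuming its Context.report_progress helper (available in recent SDK versions) to emit progress notifications while the tool runs:
import asyncio
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("long-ops")

@mcp.tool()
async def compile_project(path: str, ctx: Context) -> str:
    """Long-running tool that reports progress instead of going silent."""
    steps = 4
    for i in range(steps):
        await asyncio.sleep(1)  # stand-in for real work
        await ctx.report_progress(i + 1, steps)  # emits a progress notification
    return f"Compiled {path}"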
3.2 Throughput
Single MCP server:
- stdio: Limited by host (1 client)
- HTTP: Standard web server throughput (~1K-10K req/s per instance)
Scaling: Stateless HTTP MCP servers scale horizontally with load balancer.
3.3 Cost
An MCP server on Cloudflare Workers:
- roughly $20-100/month plus downstream API costs
vs a custom REST integration per app:
- $50-200/month for servers plus the per-integration engineering cost
- MCP is cost-effective mainly through standardization: one server serves every MCP client
4. Security First
4.1 Threat model
| Threat | Mitigation |
|---|---|
| Malicious MCP server steals data | Trusted registry, code signing, sandbox |
| Prompt injection via tool results | Validate tool output, sanitize before returning to LLM |
| Token leak | OAuth + DPoP, short-lived tokens, key rotation |
| Path traversal in filesystem server | Realpath validation, allowed-paths whitelist |
| SQL injection in DB MCP | Parameterized queries, read-only access |
| Resource exhaustion | Rate limits, timeouts, query complexity bounds |
4.2 OAuth 2.1 flow (MCP spec)
1. Client → /authorize?
response_type=code&
client_id=...&
redirect_uri=...&
scope=tools:execute resources:read&
code_challenge=...& (PKCE)
code_challenge_method=S256
2. User consents
3. Server redirects → redirect_uri?code=...
4. Client → /token (POST)
grant_type=authorization_code&
code=...&
code_verifier=...
5. Server returns access_token (short-lived) + refresh_token
6. Client uses the access token in MCP calls:
Authorization: DPoP <access_token>
DPoP: <signed JWT>
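A sketch of the client-side PKCE pieces of this flow (steps 1 and 4), with placeholder endpoints and client_id:
import base64
import hashlib
import secrets
import httpx

def make_pkce_pair() -> tuple[str, str]:
    # code_verifier: random secret kept by the client; code_challenge: its S256 hash
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge  # challenge goes in step 1, verifier in step 4

def exchange_code(code: str, verifier: str) -> dict:
    # Step 4: swap the authorization code (plus verifier) for tokens
    resp = httpx.post("https://auth.example.com/token", data={
        "grant_type": "authorization_code",
        "code": code,
        "code_verifier": verifier,
        "redirect_uri": "https://app.example.com/callback",
        "client_id": "my-mcp-client",
    })
    resp.raise_for_status()
    return resp.json()  # access_token (+ refresh_token)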
4.3 Permission scopes
Recommended scope structure:
tools:list # See tool catalog
tools:execute # Call any tool
tools:execute:read_only # Only side-effect-free tools
tools:execute:filesystem:read
tools:execute:database:write
resources:list
resources:read
resources:read:public
prompts:list
prompts:use
Principle: Users grant minimal scopes. Apps request granular permissions.
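A small sketch of enforcing this hierarchy server-side: a grant is sufficient if it equals the required scope or is a broader prefix of it (the helper name is illustrative):
def is_allowed(granted_scopes: set[str], required: str) -> bool:
    """required, e.g. 'tools:execute:filesystem:read'."""
    parts = required.split(":")
    # Every broader prefix ('tools:execute', 'tools:execute:filesystem', ...) also satisfies it
    candidates = {":".join(parts[:i]) for i in range(2, len(parts) + 1)}
    return bool(granted_scopes & candidates)

assert is_allowed({"tools:execute"}, "tools:execute:filesystem:read")
assert not is_allowed({"resources:read"}, "tools:execute:filesystem:read")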
4.4 Audit logging
CREATE TABLE mcp_audit_log (
id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
user_id UUID,
client_id UUID,
server_id TEXT,
method TEXT NOT NULL, -- e.g., 'tools/call'
tool_name TEXT, -- if tools/call
arguments JSONB,
result_status TEXT, -- 'success', 'error', 'denied'
duration_ms INT,
error_message TEXT
);
-- Index for compliance queries
CREATE INDEX idx_audit_user_time ON mcp_audit_log (user_id, timestamp DESC);
4.5 Sandboxing untrusted servers
Risk: User installs random MCP server from internet → reads all files.
Mitigations:
- Container sandbox: Run server in restricted container (no network, limited fs)
- Syscall filtering (seccomp/eBPF): block dangerous syscalls
- Code signing: Verified publisher (like browser extensions)
- Permissions UI: Host shows what server can access
5. DevOps — MCP Operations
5.1 MCP server in TypeScript
// server.ts — minimal MCP server
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
CallToolRequestSchema,
ListToolsRequestSchema
} from "@modelcontextprotocol/sdk/types.js";
const server = new Server(
{ name: "my-mcp-server", version: "1.0.0" },
{ capabilities: { tools: {} } }
);
// Define tools
server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: [
{
name: "search_users",
description: "Search users by query",
inputSchema: {
type: "object",
properties: {
query: { type: "string" },
limit: { type: "integer", default: 10 }
},
required: ["query"]
}
}
]
}));
// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const { name, arguments: args } = request.params;
if (name === "search_users") {
const users = await searchUsers(args.query, args.limit);
return {
content: [
{ type: "text", text: JSON.stringify(users, null, 2) }
]
};
}
throw new Error(`Unknown tool: ${name}`);
});
// Start
const transport = new StdioServerTransport();
await server.connect(transport);
5.2 MCP server in Python (with HTTP transport)
# server.py
from mcp.server.fastmcp import FastMCP
import httpx

mcp = FastMCP("my-server")

@mcp.tool()
async def search_users(query: str, limit: int = 10) -> list[dict]:
    """Search users by query."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.example.com/users",
            params={"q": query, "limit": limit}
        )
        return resp.json()

@mcp.resource("user://{user_id}")
async def get_user(user_id: str) -> str:
    """Get user profile by ID."""
    # ... fetch user and render as markdown
    return user_profile_markdown

@mcp.prompt()
def code_review(file: str) -> str:
    """Generate code review prompt."""
    return f"Please review the code in {file} and suggest improvements."

if __name__ == "__main__":
    mcp.run()  # stdio transport (default)
    # Or, depending on SDK version, an HTTP transport:
    # mcp.run(transport="streamable-http")
5.3 Cloudflare Workers MCP
// wrangler.toml
// name = "mcp-worker"
// main = "src/index.ts"
// compatibility_date = "2025-11-01"
// src/index.ts
import { MCPServer } from "@cloudflare/workers-mcp";
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const server = new MCPServer({
name: "github-mcp",
version: "1.0.0",
tools: [
{
name: "search_issues",
description: "Search GitHub issues",
handler: async (args) => {
// Use env.GITHUB_TOKEN
return await searchIssues(args.query, env.GITHUB_TOKEN);
}
}
]
});
return server.handleRequest(request);
}
};
5.4 Monitoring
groups:
  - name: mcp_alerts
    rules:
      - alert: MCPHighErrorRate
        expr: |
          sum(rate(mcp_requests_total{status="error"}[5m])) /
          sum(rate(mcp_requests_total[5m])) > 0.05
        for: 5m
        annotations:
          summary: "MCP error rate > 5%"
      - alert: MCPHighLatency
        expr: |
          histogram_quantile(0.99,
            rate(mcp_tool_duration_seconds_bucket[5m])
          ) > 5
        for: 5m
        annotations:
          summary: "P99 MCP tool latency > 5s"
      - alert: MCPAuthFailures
        expr: rate(mcp_auth_failures_total[5m]) > 1
        for: 5m
        annotations:
summary: "Suspicious auth failure rate"5.5 Testing MCP servers
5.5 Testing MCP servers
# pytest test (async — requires the anyio pytest plugin)
import pytest
from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client

@pytest.mark.anyio
async def test_mcp_server():
    params = StdioServerParameters(
        command="python", args=["server.py"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Test tools/list
            tools = await session.list_tools()
            assert any(t.name == "search_users" for t in tools.tools)
            # Test tools/call
            result = await session.call_tool(
                "search_users",
                arguments={"query": "alice", "limit": 5}
            )
            assert len(result.content) > 0
6. Code Implementation
6.1 Production MCP server (Python)
"""
Production-grade MCP server với:
- OAuth 2.1 auth
- Rate limiting per client
- Audit logging
- Permission scopes
"""
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
import asyncio
import time
class ProductionMCPServer:
def __init__(self):
self.server = Server("production-server")
self.rate_limits: dict[str, list[float]] = {}
self._setup_handlers()
def _setup_handlers(self):
@self.server.list_tools()
async def list_tools() -> list[Tool]:
return [
Tool(
name="query_database",
description="Run SQL query (read-only)",
inputSchema={
"type": "object",
"properties": {
"sql": {"type": "string"},
"params": {"type": "array"}
},
"required": ["sql"]
}
),
]
@self.server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
client_id = self._get_client_id()
# Rate limiting
if not self._check_rate_limit(client_id):
return [TextContent(
type="text",
text="Rate limit exceeded. Try again later."
)]
# Permission check
if not self._has_permission(client_id, name):
return [TextContent(
type="text",
text=f"Permission denied for {name}"
)]
# Audit log
start = time.time()
try:
if name == "query_database":
result = await self._safe_query(arguments)
return [TextContent(type="text", text=str(result))]
else:
raise ValueError(f"Unknown tool: {name}")
except Exception as e:
await self._audit_log(
client_id, name, arguments,
status="error", duration=time.time() - start,
error=str(e)
)
raise
finally:
await self._audit_log(
client_id, name, arguments,
status="success", duration=time.time() - start
)
def _check_rate_limit(self, client_id: str) -> bool:
now = time.time()
history = self.rate_limits.setdefault(client_id, [])
# Keep only last 60s
self.rate_limits[client_id] = [t for t in history if t > now - 60]
if len(self.rate_limits[client_id]) >= 100: # 100 req/min
return False
self.rate_limits[client_id].append(now)
return True
async def _safe_query(self, arguments: dict):
"""Read-only query with timeout."""
sql = arguments["sql"]
if not sql.strip().lower().startswith("select"):
raise ValueError("Only SELECT allowed")
params = arguments.get("params", [])
# Use parameterized query, timeout 5s
async with asyncio.timeout(5):
return await db.fetch(sql, *params)
async def main():
server = ProductionMCPServer()
async with stdio_server() as (read, write):
await server.server.run(read, write)
if __name__ == "__main__":
asyncio.run(main())6.2 MCP client in app
"""
Embed MCP client in custom AI app.
"""
from mcp.client.session import ClientSession
from mcp.client.streamable_http import streamablehttp_client
class MCPToolClient:
def __init__(self, server_url: str, token: str):
self.server_url = server_url
self.token = token
self.session: ClientSession | None = None
async def connect(self):
self.transport_cm = streamablehttp_client(
self.server_url,
headers={"Authorization": f"Bearer {self.token}"}
)
read, write = await self.transport_cm.__aenter__()
self.session = ClientSession(read, write)
await self.session.initialize()
async def list_tools(self):
return (await self.session.list_tools()).tools
async def call_tool(self, name: str, arguments: dict):
result = await self.session.call_tool(name, arguments=arguments)
return result.content
async def close(self):
if self.session:
await self.session.close()
await self.transport_cm.__aexit__(None, None, None)
# In LLM agent loop
async def agent_loop(user_query: str):
mcp = MCPToolClient(
"https://mcp.myapp.com/mcp",
token=get_user_token()
)
await mcp.connect()
tools = await mcp.list_tools()
tool_specs = [
{
"name": t.name,
"description": t.description,
"input_schema": t.inputSchema
}
for t in tools
]
# Pass to LLM
response = await claude.messages.create(
model="claude-3-5-sonnet",
tools=tool_specs,
messages=[{"role": "user", "content": user_query}]
)
if response.stop_reason == "tool_use":
for tool_call in response.content:
if tool_call.type == "tool_use":
result = await mcp.call_tool(
tool_call.name,
arguments=tool_call.input
)
# Feed back to Claude...
await mcp.close()7. System Design Diagrams
7.1 N×M Problem → MCP Pattern
flowchart LR
  subgraph Before["Before MCP (N×M)"]
    AppA[App A] --> SvcA1[GitHub]
    AppA --> SvcB1[Slack]
    AppA --> SvcC1[Postgres]
    AppB[App B] --> SvcA2[GitHub]
    AppB --> SvcB2[Slack]
    AppB --> SvcC2[Postgres]
  end
  subgraph After["After MCP (N+M)"]
    AppA2[App A] --> MCP1[MCP Client]
    AppB2[App B] --> MCP2[MCP Client]
    MCP1 --> SrvA[GitHub MCP Server]
    MCP1 --> SrvB[Slack MCP Server]
    MCP1 --> SrvC[Postgres MCP Server]
    MCP2 --> SrvA
    MCP2 --> SrvB
    MCP2 --> SrvC
  end
  style Before fill:#ffcdd2
  style After fill:#c8e6c9
7.2 MCP Lifecycle
sequenceDiagram
  participant H as Host App
  participant C as MCP Client
  participant S as MCP Server
  participant L as LLM
  H->>C: Initialize
  C->>S: initialize(version, capabilities)
  S-->>C: serverInfo + capabilities
  C->>S: notifications/initialized
  C->>S: tools/list
  S-->>C: [tool schemas]
  H->>L: User query + tool schemas
  L-->>H: Tool call: search_users("alice")
  H->>C: callTool("search_users", {...})
  C->>S: tools/call
  S->>S: Auth check
  S->>S: Rate limit
  S->>S: Execute
  S-->>C: {content: [...]}
  C-->>H: result
  H->>L: Tool result
  L-->>H: Final response
  H-->>User: Answer
7.3 Streamable HTTP Transport
sequenceDiagram
  participant C as Client
  participant Auth as OAuth Server
  participant S as MCP Server
  C->>Auth: Authorization Code Flow + PKCE
  Auth-->>C: access_token
  C->>S: POST /mcp<br/>Authorization: DPoP ...<br/>DPoP: ...<br/>{tools/call}
  S->>S: Verify token + DPoP
  S->>S: Execute tool
  Note over S: Long-running tool
  S-->>C: 200 OK<br/>Content-Type: text/event-stream
  S-->>C: data: {progress 25%}
  S-->>C: data: {progress 50%}
  S-->>C: data: {progress 75%}
  S-->>C: data: {result}
7.4 Multi-tenant MCP Gateway
flowchart TB
  Clients[AI Clients] --> Gateway[MCP Gateway<br/>OAuth + Tenant Routing]
  Gateway --> Tenant1[Tenant 1 Servers]
  Gateway --> Tenant2[Tenant 2 Servers]
  Gateway --> Tenant3[Tenant 3 Servers]
  subgraph Tenant1["Tenant 1"]
    T1GH[GitHub MCP]
    T1DB[Postgres MCP]
    T1FS[Filesystem MCP]
  end
  subgraph Tenant2["Tenant 2"]
    T2GH[GitHub MCP]
    T2Slack[Slack MCP]
  end
  Audit[Audit Log] -.- Gateway
8. Aha Moments & Pitfalls
Aha Moments
#1: MCP solves the N×M integration problem for AI. Same pattern as USB-C, ODBC, and web standards: one protocol that anyone can implement.
#2: The schema is AI-readable. JSON Schema with natural-language descriptions means LLMs can “read” tool definitions — unlike OpenAPI, which targets human developers.
#3: The three capabilities are orthogonal: Tools (actions), Resources (data), Prompts (templates). A server can expose just one of them, not necessarily all.
#4: stdio for local, HTTP for remote. stdio is the simplest and most secure and covers ~90% of use cases; HTTP is for enterprise/cloud deployments.
#5: OAuth 2.1 + DPoP is the standard pattern. A stolen token is not reusable on its own. See Tuan-14 for the deep dive.
#6: Streamable HTTP superseded the older HTTP+SSE transport. SSE over a single HTTP endpoint is simpler, HTTP-friendly, and works through proxies/CDNs.
#7: An MCP server is a backend service. Same skills as a REST/gRPC server: auth, rate limiting, observability, audit — just a different protocol.
#8: Trust matters. Installing an MCP server means giving it access to your data. Code signing, registries, and sandboxing are critical.
Pitfalls
Pitfall 1: No auth on remote MCP
Wrong: a public HTTP endpoint with no auth → anyone can call your tools. Right: OAuth 2.1 is mandatory; add DPoP for high-value deployments.
Pitfall 2: Path traversal in filesystem server
Wrong: read_file(path) accepts ../../../etc/passwd. Right: resolve with realpath and validate against an allow-list of path prefixes.
Pitfall 3: SQL injection in DB MCP
Wrong: f"SELECT * FROM users WHERE id = {user_input}". Right: always use parameterized queries — see the sketch below.
Pitfall 4: No rate limit
Wrong: an LLM stuck in a loop calls a tool 1000 times/sec → DoS on the downstream service. Right: per-client and per-tool rate limits.
Pitfall 5: Long-running tools without progress
Wrong: compile_project runs for 60s with no feedback → the client times out. Right: send progress notifications (e.g., via SSE) while the tool runs.
Pitfall 6: Tool output too large
Wrong: a tool returns a 10MB result → LLM context overflow. Right: paginate, summarize, or return a resource URI to fetch separately.
Pitfall 7: Trust untrusted servers
Wrong: a user runs a random GitHub MCP server → the server reads all their files. Right: verified registry, code signing, sandboxing.
Pitfall 8: Verbose tool descriptions
Wrong: a 5KB description per tool → LLM context bloat. Right: concise, action-oriented descriptions, with examples where they help.
Pitfall 9: No audit log
Wrong: no record of which tools were called, by whom, or when. Right: audit-log every tool call; required for compliance.
Pitfall 10: Tool errors not handled
Wrong: a tool throws an exception → MCP returns a generic error → the LLM gets stuck. Right: return a structured error response with retry hints (sketch below).
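A sketch of the structured-error shape: report the failure in-band via isError in the tools/call result, with a hint the LLM can act on (the helper is illustrative):
def tool_error(message: str, retryable: bool) -> dict:
    """Build a tools/call result that signals failure without a raw exception."""
    hint = "Retry with corrected arguments." if retryable else "Do not retry."
    return {
        "isError": True,
        "content": [{"type": "text", "text": f"{message} {hint}"}],
    }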
9. Internal Links
| Topic | Relation |
|---|---|
| Tuan-04-API-Design-REST-gRPC | Foundation; MCP is built on JSON-RPC |
| Tuan-14-AuthN-AuthZ-Security | OAuth 2.1, DPoP, scopes |
| Tuan-09-Rate-Limiter | Per-client rate limiting for MCP |
| Tuan-13-Monitoring-Observability | Monitor MCP servers |
| Tuan-Bonus-LLM-Serving-Infrastructure | LLM consumes MCP tools |
| Tuan-Bonus-Multi-Tenancy-SaaS-Patterns | Multi-tenant MCP gateway |
References
Spec & Docs:
- MCP Specification — https://modelcontextprotocol.io/specification/2025-11-25
- MCP Roadmap 2026 — https://modelcontextprotocol.io/development/roadmap
- TypeScript SDK — https://github.com/modelcontextprotocol/typescript-sdk
- Python SDK — https://github.com/modelcontextprotocol/python-sdk
Servers:
- Official server registry — https://github.com/modelcontextprotocol/servers
- mcp.so (community registry) — https://mcp.so/
Engineering blogs:
- Anthropic, Introducing Model Context Protocol — https://www.anthropic.com/news/model-context-protocol
- Cloudflare, Remote MCP Servers — https://blog.cloudflare.com/remote-model-context-protocol-servers-mcp/
- Sourcegraph, Cody + MCP integration — https://sourcegraph.com/blog/
Tools:
- MCP Inspector — https://github.com/modelcontextprotocol/inspector
- Cloudflare Workers MCP — https://developers.cloudflare.com/agents/
Phase F complete. Next up: Phase G — Platform Engineering, FinOps, Progressive Delivery, Edge+Wasm.