Bonus Week: FinOps & Cloud Unit Economics

“CTO: This month's cloud bill is $200K, up 40% from last month. Why? Engineer A spun up a dev instance and forgot to turn it off. Engineer B ran ML training on the most expensive instance type. The database team scaled up 5x because they were afraid of being blamed for an outage. Nobody is accountable, because nobody owns cost. FinOps is the governance framework that solves this problem.”

Tags: system-design finops cost-optimization cloud-economics bonus Student: Hieu (Backend Dev → Architect) Prerequisite: Tuan-11-Microservices-Pattern · Tuan-Bonus-Platform-Engineering-IDP Related: Tuan-Bonus-Multi-Tenancy-SaaS-Patterns · Tuan-Bonus-Progressive-Delivery


1. Context & Why

Everyday analogy — the apartment building's electricity bill

Hieu, imagine an apartment building with 100 units and only 1 shared electricity meter. At the end of the month:

  • The bill is $50K
  • Split evenly across 100 units = $500/unit
  • Problem: the unit with a pool uses 10× the electricity, yet pays the same as a studio
  • Savers are penalized, wasters are subsidized
  • Nobody has an incentive to save

The solution:

  1. Submeter each unit (visibility)
  2. Bill by usage (allocation)
  3. Encourage saving (optimization)
  4. Forecast before going over budget (planning)

This is exactly FinOps: a 3-phase lifecycle for cloud spend — Inform → Optimize → Operate.

Why does a Backend Dev need to understand FinOps?

| Reason | Consequence if ignored |
|---|---|
| Cloud bills grow exponentially | A startup can hit $200K/month without controls |
| Cost = competitive advantage | Lower cost-per-request → win on pricing |
| AI workloads break old FinOps | A single LLM call's cost can vary by 1000x |
| C-level metric | "Cost per ARR dollar" shows up in board decks |
| Engineer ownership | "You build it, you run it, you pay for it" |
| Benchmark: $21B in savings in 2025 (Deloitte) | Not adopting = leaving money on the table |

Why doesn't Alex Xu cover this?

Alex Xu Vol 1+2 covers sizing and capacity but does not mention the FinOps framework, cost allocation, or unit economics. FinOps is a discipline that emerged from 2020 onward in cloud-native organizations.


2. Deep Dive — Core Concepts

2.1 The 3 FinOps Phases

The FinOps Foundation defines the lifecycle:

┌──────────────────────────────────────────────────┐
│                                                  │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐   │
│  │  INFORM   │──►│ OPTIMIZE  │──►│  OPERATE  │   │
│  │           │   │           │   │           │   │
│  │ Visibility│   │  Action   │   │ Continuous│   │
│  │ Allocation│   │  Savings  │   │ Iteration │   │
│  │ Forecast  │   │           │   │           │   │
│  └───────────┘   └───────────┘   └───────────┘   │
│        ▲                               │         │
│        └───────────────────────────────┘         │
│                                                  │
└──────────────────────────────────────────────────┘

2.1.1 Inform Phase

Goal: Everyone knows where money goes.

  • Tagging: Every resource tagged (team, env, service)
  • Allocation: Map cost to teams/services
  • Reporting: Dashboards by dimension
  • Forecasting: Predict next month’s bill
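The forecasting bullet above can start as a simple run-rate projection; a minimal sketch with hypothetical daily figures (real forecasts should also handle seasonality and committed-use discounts):

```python
# Naive run-rate forecast: extrapolate month-to-date daily spend to month end.

def forecast_month_end(daily_costs: list[float], days_in_month: int = 30) -> float:
    """Project the month-end bill from the average daily run rate so far."""
    if not daily_costs:
        return 0.0
    run_rate = sum(daily_costs) / len(daily_costs)
    return run_rate * days_in_month

# 10 days into the month, averaging $1,200/day:
mtd = [1100, 1250, 1180, 1300, 1150, 1220, 1190, 1280, 1160, 1170]
print(f"Forecast: ${forecast_month_end(mtd):,.0f}")  # Forecast: $36,000
```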

2.1.2 Optimize Phase

Goal: Reduce waste, improve efficiency.

  • Rightsize: Match resources to actual needs
  • Reservations: Commit-based discounts (RIs, Savings Plans)
  • Spot/preemptible: Use cheap capacity
  • Architecture: Refactor for cost (e.g., serverless, S3 tiers)

2.1.3 Operate Phase

Goal: Continuous discipline.

  • Anomaly detection: Alert on unusual spend
  • Showback/Chargeback: Bill teams for their usage
  • Budgets: Enforce limits
  • Culture: FinOps champions in every team

2.2 Cost Allocation Strategies

2.2.1 Tagging-based (most common)

# AWS resource tags
aws ec2 run-instances \
  --tag-specifications "ResourceType=instance,Tags=[
    {Key=Team,Value=payments},
    {Key=Service,Value=payment-api},
    {Key=Environment,Value=production},
    {Key=CostCenter,Value=eng-001}
  ]"

AWS Cost Allocation Tags (or GCP Labels, Azure Tags):

  • Mark resources with team/service
  • AWS Cost Explorer slices by tag
  • Untagged resources = “shared” or “wasted”

Challenge: Enforce tagging.

  • AWS Service Control Policy: Reject create without tags
  • Kyverno/OPA: K8s admission control
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-tags
spec:
  validationFailureAction: enforce
  rules:
    - name: check-cost-tags
      match:
        resources:
          kinds: [Deployment, StatefulSet, Job]
      validate:
        message: "Must have team, service, env labels"
        pattern:
          metadata:
            labels:
              team: "?*"
              service: "?*"
              env: "?*"

2.2.2 Account/Subscription-based

Pattern: 1 AWS account per team or environment.

my-org/
├── shared-services-prod/    # Shared infra (Route53, IAM)
├── team-payments-prod/      # Payments team prod
├── team-payments-staging/
├── team-checkout-prod/
└── team-fraud-prod/

Pros: Hard isolation, easy attribution
Cons: Operational overhead (1000 accounts)

2.2.3 Kubernetes-based (Kubecost / OpenCost)

Kubecost / OpenCost (CNCF): Allocate cost to K8s namespaces, deployments, pods.

Namespace tenant-a uses:
  - 2 CPU avg, 4 GB memory avg
  - 100 GB persistent storage
  - 50 GB network egress

Cost (last 30 days):
  Compute: $50
  Storage: $10
  Network: $5
  Total: $65/month

Integration: Kubecost reads cloud billing API + K8s metrics → daily allocation.

2.3 Cost Per Request — Unit Economics

Concept: Cost per business unit (request, user, transaction).

Formula:

  Unit cost = Total cloud cost / Number of business units (requests, users, GB, ...)
Examples:

| Business | Unit | Target |
|---|---|---|
| API SaaS | Cost / 1M API requests | < $5 |
| ML inference | Cost / 1K inferences | $0.10-1 |
| Storage SaaS | Cost / GB stored / month | < $0.10 |
| E-commerce | Cost / order processed | < $0.50 |
| Analytics | Cost / event ingested | < $0.001 |

Why important: Cost grows linearly with usage → you must drive unit cost down.

Year 1: $1M cost / 100M requests = $0.010/request
Year 2: $5M cost / 1B requests = $0.005/request (50% improvement!)
Year 3: $20M cost / 10B requests = $0.002/request

Compounding effect: 30% unit cost reduction × 10x growth = 7x cost growth (vs 10x).
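The compounding claim is quick to verify with the Year 1 figures above (a sketch, not real billing data):

```python
# 30% unit-cost reduction combined with 10x volume growth => 7x cost growth, not 10x.
unit_cost_per_req = 0.010    # Year 1: $1M / 100M requests
requests = 100_000_000       # Year 1 volume

cost_y1 = unit_cost_per_req * requests
cost_y2 = (unit_cost_per_req * 0.70) * (requests * 10)

print(f"Year 1: ${cost_y1:,.0f}")
print(f"Year 2: ${cost_y2:,.0f} ({cost_y2 / cost_y1:.0f}x cost for 10x traffic)")
```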

2.4 Compute Optimization

2.4.1 Rightsize EC2/VMs

Pattern: Many instances over-provisioned 50-70%.

Tools:

  • AWS Compute Optimizer (free)
  • GCP Recommender
  • Azure Advisor

Process:

  1. Review past 14 days CPU/memory utilization
  2. Identify instances with < 40% peak usage
  3. Downsize to next tier
  4. Monitor 1 week
  5. Repeat

Savings: Typically 20-40% on over-provisioned compute.
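Step 2 of the process above is easy to automate; a sketch over hypothetical utilization data (in practice the numbers come from CloudWatch or AWS Compute Optimizer):

```python
def rightsize_candidates(instances: list[dict], peak_cpu_threshold: float = 40.0) -> list[dict]:
    """Flag instances whose 14-day peak CPU never reaches the threshold."""
    return [i for i in instances if i["peak_cpu_14d"] < peak_cpu_threshold]

fleet = [
    {"id": "i-aaa", "type": "m6i.2xlarge", "peak_cpu_14d": 22.0},
    {"id": "i-bbb", "type": "c6i.xlarge",  "peak_cpu_14d": 85.0},
    {"id": "i-ccc", "type": "m6i.4xlarge", "peak_cpu_14d": 31.0},
]

for inst in rightsize_candidates(fleet):
    print(f"{inst['id']} ({inst['type']}): peak CPU {inst['peak_cpu_14d']}% -> downsize one tier")
```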

2.4.2 Spot Instances / Preemptible VMs

Spot: Up to 90% discount, but can be reclaimed in 2 minutes.

Best for:

  • Batch jobs (ML training, data processing)
  • Stateless workers (with retry)
  • Cost-tolerant workloads (analytics)

Not for:

  • Stateful services (databases)
  • Latency-sensitive APIs
  • Single-instance critical services

Pattern: Mix spot + on-demand:

# K8s with Karpenter
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-nodepool
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot]
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule
 
# Workloads tolerate spot
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-trainer
spec:
  template:
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"

2.4.3 Reserved Instances / Savings Plans

| Type | Commit | Discount | Flexibility |
|---|---|---|---|
| EC2 RI, 3-year | 3 years, specific instance | 60-72% | Low |
| EC2 RI, 1-year | 1 year, specific instance | 30-50% | Low |
| Compute Savings Plan, 1-year | $/h commit | 30-50% | High (any region/family) |
| Compute Savings Plan, 3-year | $/h commit, 3 yr | 50-65% | High |
| EC2 Instance SP | $/h commit, specific family | 50-60% | Medium |

Strategy:

  • Cover steady-state with Savings Plans (e.g., 70% baseline)
  • Burst capacity on-demand
  • 3-year if confident, 1-year if growing rapidly

Tools: ProsperOps, Cloudability, Spot.io auto-manage commits.
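A simple way to size the commitment is to cover a fraction of the steady baseline of hourly spend and leave bursts on-demand; a heuristic sketch with hypothetical numbers, not a full commitment optimizer:

```python
def baseline_commit(hourly_spend: list[float], coverage: float = 0.70) -> float:
    """Suggest a $/hour commit: a fraction of the minimum (steady-state) hourly spend."""
    return min(hourly_spend) * coverage

# Hourly on-demand spend with burst spikes:
hours = [100, 110, 95, 250, 105, 98, 300, 102]
print(f"Suggested Savings Plan commit: ${baseline_commit(hours):.2f}/hour")
```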

2.4.4 Karpenter (Kubernetes)

Karpenter (AWS, OSS): Smart node provisioning. Replaces Cluster Autoscaler.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: [m6i, m6a, c6i, c6a]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: [large, xlarge, 2xlarge]
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot, on-demand]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

Magic:

  • Fast scaling (60s to add capacity)
  • Bin-packs pods → fewer nodes
  • Auto-consolidates underutilized nodes
  • Mixes spot + on-demand transparently

2.5 Storage Optimization

2.5.1 S3 Storage Classes

| Class | Cost/GB/month | Use case |
|---|---|---|
| Standard | $0.023 | Active access |
| Intelligent-Tiering | Standard rates + $0.0025 monitoring | Auto-optimize |
| Standard-IA | $0.0125 | Infrequent access |
| One Zone-IA | $0.01 | Recreatable, infrequent |
| Glacier Instant | $0.004 | Archive, instant retrieve |
| Glacier Flexible | $0.0036 | Archive, retrieval in hours |
| Glacier Deep Archive | $0.00099 | 7-10 year retention |

Lifecycle policy (auto-migration):

{
  "Rules": [{
    "Status": "Enabled",
    "Filter": { "Prefix": "logs/" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 }  // 7 years
  }]
}

2.5.2 EBS Optimization

  • gp3 over gp2: Same perf, 20% cheaper
  • Snapshot lifecycle: Auto-delete old snapshots
  • Detached volumes: Find and delete unattached EBS
# Find unattached EBS
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
  --output table

2.5.3 Database Cost

  • Right-size: Many DBs over-provisioned
  • Reserved: RDS RI 1-3 year
  • Serverless: Aurora Serverless v2 for variable workloads
  • Read replicas: Only when needed
  • Snapshot frequency: Match RPO

2.6 Network Optimization

Network costs are the hidden killer of cloud bills.

2.6.1 Cross-AZ traffic

AWS: $0.01/GB cross-AZ. Sounds small but adds up.

Microservices:
  Service A in us-east-1a → Service B in us-east-1b
  100 GB/day cross-AZ = $30/month per pair
  
With 50 pairs: $1,500/month JUST for cross-AZ

Mitigations:

  • Topology-aware routing: K8s tries to route to same AZ pod
  • Local zones: Pin services to AZ
  • Cache layer: Reduce cross-AZ DB calls
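The $1,500/month example generalizes to a quick estimator (pair counts and volumes are hypothetical; the $0.01/GB rate comes from the example above):

```python
def cross_az_monthly_cost(gb_per_day: float, service_pairs: int,
                          price_per_gb: float = 0.01, days: int = 30) -> float:
    """Estimated monthly cross-AZ transfer cost across chatty service pairs."""
    return gb_per_day * price_per_gb * days * service_pairs

# 50 service pairs, each moving 100 GB/day across AZs:
print(f"${cross_az_monthly_cost(100, 50):,.0f}/month just for cross-AZ traffic")
```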

2.6.2 Internet egress

Most expensive: $0.05-0.09/GB out to internet.

1 TB egress/day = $50-90/day = $1,500-2,700/month

Mitigations:

  • CloudFront/CDN: CDN rates (~$0.085/GB, dropping with volume) plus cache hits that avoid origin egress → often cheaper overall
  • Compression: Reduce payload size
  • Egress-free providers: Cloudflare R2, Backblaze B2 (zero egress fee)

2.6.3 NAT Gateway

Hidden cost: $0.045/GB processed.

1 NAT Gateway running 24/7: $32/month idle
+ traffic: $45/TB processed

Mitigations:

  • VPC Endpoints: For AWS services (S3, ECR, etc.)
  • NAT Instance: Cheaper for low traffic (but less HA)

2.7 AI/ML Cost Specifics

LLM workloads break the old FinOps playbook:

  • Token-based pricing: billed per 1M input/output tokens
  • Caching has dramatic impact: up to 90% cost reduction with prompt caching
  • Provider switch: Sonnet vs Haiku is a ~10x cost difference
  • Self-host trade-off: GPU at $4-8/hour, fixed

Key metrics:

  • Cost per chat session: $0.10-1.00
  • Cost per RAG query: $0.05-0.50
  • Cost per training run: $1K-1M
  • GPU utilization: Should be > 70%

Optimization patterns:

  1. Model routing: Cheap model for simple, expensive for complex
  2. Prompt caching (Anthropic): Re-use prompt prefix
  3. Batching: Multiple requests in 1 call
  4. Quantization: INT8/INT4 for self-host
  5. Fine-tune small model: Specialized > general LLM
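Pattern 2's impact is easy to model. A cost sketch for a shared prompt prefix (the ~10x discount on cache reads reflects Anthropic's published pricing at the time of writing; treat all numbers as illustrative):

```python
def monthly_input_cost(requests: int, prefix_tokens: int, unique_tokens: int,
                       price_per_mtok: float, cache_hit_rate: float = 0.0,
                       cache_read_discount: float = 0.10) -> float:
    """Input-token cost per month with an optionally cached shared prefix."""
    cached = requests * cache_hit_rate * prefix_tokens * cache_read_discount
    uncached = requests * (1 - cache_hit_rate) * prefix_tokens
    unique = requests * unique_tokens
    return (cached + uncached + unique) * price_per_mtok / 1_000_000

# 1M requests/month, 9K-token shared system prompt, 1K unique tokens, $3/MTok input:
no_cache = monthly_input_cost(1_000_000, 9_000, 1_000, 3.0)
with_cache = monthly_input_cost(1_000_000, 9_000, 1_000, 3.0, cache_hit_rate=0.95)
print(f"No cache: ${no_cache:,.0f}/month, with cache: ${with_cache:,.0f}/month")
```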

Reference: Tuan-Bonus-LLM-Serving-Infrastructure for self-host economics.

2.8 Anomaly Detection & Alerting

Cost spike = silent killer. Detect before bill shock.

# AWS Cost Anomaly Detection
{
  "AnomalySubscription": {
    "SubscriptionName": "team-alerts",
    "Threshold": 100,
    "Frequency": "DAILY",
    "MonitorArnList": ["arn:aws:..."],
    "Subscribers": [{
      "Type": "EMAIL",
      "Address": "[email protected]"
    }]
  }
}

Custom alerts (Prometheus):

- alert: DailyCostSpike
  expr: |
    (
      sum by (team) (kubecost_daily_cost)
      - sum by (team) (avg_over_time(kubecost_daily_cost[7d]))
    ) / sum by (team) (avg_over_time(kubecost_daily_cost[7d])) > 0.5
  for: 1h
  annotations:
    summary: "Team {{ $labels.team }} cost +50% vs 7-day avg"

2.9 Showback vs Chargeback

Showback: Show teams their cost (informational)
Chargeback: Actually bill teams (P&L impact)

| | Showback | Chargeback |
|---|---|---|
| Visibility | ✓ | ✓ |
| Accountability | Low | High |
| Effort | Medium | High (needs accounting) |
| Adoption | Easy | Resistance |
| Best for | Pre-FinOps maturity | Mature FinOps culture |

Pattern: Start with showback (months 1-12), evolve to chargeback (year 2+).

2.10 The Pillars (FinOps Foundation)

┌────────────────────────────────────────────┐
│              FinOps Pillars                │
├────────────────────────────────────────────┤
│  1. Visibility & Allocation                │
│  2. Optimization                           │
│  3. Forecasting & Budgeting                │
│  4. Anomaly Management                     │
│  5. Rate Optimization (commits)            │
│  6. Workload Optimization (rightsize)      │
│  7. FinOps Automation                      │
│  8. FinOps Education & Culture             │
└────────────────────────────────────────────┘

3. Estimation

3.1 Typical cost breakdown

Average AWS spend distribution (Vantage 2024):

  • Compute (EC2): 40-50%
  • Database (RDS): 15-20%
  • Storage (S3, EBS): 10-15%
  • Network (egress, NAT): 10-20% (often under-counted!)
  • Other (DDB, Lambda, etc.): 5-10%

3.2 Optimization potential

| Optimization | Typical savings |
|---|---|
| Rightsize EC2 | 20-40% on compute |
| Reserved + Savings Plans | 30-50% on covered spend |
| Spot for batch | 60-90% on batch |
| S3 Intelligent-Tiering | 20-40% on storage |
| Karpenter consolidation | 15-30% on K8s compute |
| CDN for egress | 50-80% on internet egress |
| AI prompt caching | up to 90% on LLM API spend |

Combined: Mature FinOps program saves 30-50% of total cloud bill in year 1.

3.3 ROI of FinOps

Investment:

  • 1-2 FinOps engineers: $300-600K/year
  • Tools (Kubecost, ProsperOps, etc.): $50-200K/year
  • Total: $400-800K/year

Return (org spending $10M/year cloud):

  • 30% saving = $3M/year
  • ROI: 4-7x

Break-even: ~$3M cloud spend justifies dedicated FinOps team.


4. Security First — Cost Security

4.1 Crypto-mining abuse

Bitcoin mining: Compromised credentials → spin up GPU instances.

2024 incident: Startup credentials leaked → attacker spawned 200x p4d.24xlarge → $200K/day

Mitigations:

  • AWS GuardDuty (detect anomalies)
  • Service Control Policies (limit instance types)
  • Spending alerts ($1K, $10K thresholds)
  • MFA for billing access

4.2 Token-based attacks (LLM)

Token explosion: Compromised LLM API key → attacker drains budget.

1M tokens @ $0.10/1K = $100
1B tokens @ $0.10/1K = $100,000

Attacker can drain budget in hours.

Mitigations:

  • API key per service (limited blast radius)
  • Per-key rate limits + spending caps
  • Anomaly detection on token usage
  • Rotate keys regularly
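A minimal enforcement sketch for the caps above (names and limits are hypothetical; production would back the counters with a shared store such as Redis):

```python
from collections import defaultdict

class TokenBudgetGuard:
    """Reject LLM calls once an API key exceeds its daily dollar budget."""

    def __init__(self, daily_limit_usd: float, price_per_mtok: float):
        self.daily_limit = daily_limit_usd
        self.price = price_per_mtok
        self.spent = defaultdict(float)  # api_key -> dollars spent today

    def authorize(self, api_key: str, tokens: int) -> bool:
        cost = tokens / 1_000_000 * self.price
        if self.spent[api_key] + cost > self.daily_limit:
            return False  # block; fire an anomaly alert in a real system
        self.spent[api_key] += cost
        return True

guard = TokenBudgetGuard(daily_limit_usd=50.0, price_per_mtok=3.0)
assert guard.authorize("svc-chatbot", 1_000_000)           # $3, allowed
assert not guard.authorize("svc-chatbot", 20_000_000_000)  # $60K, blocked
```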

4.3 Data egress as exfiltration

Attacker downloads sensitive data → high egress bill + breach.

Mitigations:

  • Egress monitoring (CloudWatch)
  • VPC flow logs
  • DLP tools

5. DevOps — FinOps in Practice

5.1 Tagging governance

# Terraform module enforces tagging
locals {
  required_tags = {
    Team        = var.team
    Service     = var.service
    Environment = var.environment
    CostCenter  = var.cost_center
    ManagedBy   = "terraform"
    Repository  = var.repo_url
  }
}
 
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = var.instance_type
  tags          = local.required_tags
}

SCP (Service Control Policy):

{
  "Effect": "Deny",
  "Action": "ec2:RunInstances",
  "Resource": "arn:aws:ec2:*:*:instance/*",
  "Condition": {
    "Null": {
      "aws:RequestTag/Team": "true"
    }
  }
}

5.2 Kubecost setup

# helm install
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="..." \
  --set prometheus.server.persistentVolume.size=128Gi

Access: kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

Key dashboards:

  • Cost by namespace (team)
  • Cost by deployment
  • Cost over time
  • Optimization recommendations

5.3 Cost dashboard in Backstage

# catalog-info.yaml — add cost annotation
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  annotations:
    kubecost.io/namespace: payments
    aws.amazon.com/account-id: "123456789012"
    aws.amazon.com/cost-center: eng-payments
// Backstage plugin pulls cost from Kubecost API
const cost = await fetch(
  `${kubecostUrl}/model/aggregatedCostModel?aggregate=namespace&filter=namespace:payments`
).then(r => r.json());

5.4 Slack daily reports

# Daily cost report bot
import os
from datetime import datetime, timedelta

import boto3
import slack_sdk
 
def daily_cost_report():
    ce = boto3.client("ce")
    end = datetime.now().strftime("%Y-%m-%d")
    start = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
 
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": "Team"}]
    )
 
    report = "📊 *Daily Cost Report*\n"
    for group in response["ResultsByTime"][0]["Groups"]:
        team = group["Keys"][0].replace("Team$", "")
        cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
        report += f"  {team}: ${cost:.2f}\n"
 
    slack = slack_sdk.WebClient(token=os.environ["SLACK_TOKEN"])
    slack.chat_postMessage(channel="#finops", text=report)

5.5 Budget enforcement

# AWS Budget with action
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-team-payments",
    "BudgetLimit": { "Amount": "10000", "Unit": "USD" },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
    "CostFilters": { "TagKeyValue": ["user:Team$payments"] }
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "ComparisonOperator": "GREATER_THAN",
        "NotificationType": "ACTUAL",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [{ "SubscriptionType": "SNS", "Address": "arn:aws:sns:..." }]
    }
  ]'

6. Code Implementation

6.1 Cost allocation script

"""
Allocate AWS costs to teams based on tags.
"""
 
import boto3
from datetime import datetime, timedelta
 
 
def allocate_monthly_cost():
    ce = boto3.client("ce")
 
    # Last 30 days
    end = datetime.now().strftime("%Y-%m-%d")
    start = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")
 
    # Group by team tag
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[
            {"Type": "TAG", "Key": "Team"},
            {"Type": "DIMENSION", "Key": "SERVICE"},
        ]
    )
 
    allocations = {}
    for time_period in response["ResultsByTime"]:
        for group in time_period["Groups"]:
            team = group["Keys"][0].replace("Team$", "") or "untagged"
            service = group["Keys"][1]
            cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
 
            allocations.setdefault(team, {})[service] = cost
 
    # Identify untagged costs
    untagged = allocations.get("untagged", {})
    total_untagged = sum(untagged.values())
    if total_untagged > 1000:
        print(f"⚠️  ${total_untagged:.2f} in untagged resources!")
 
    return allocations
 
 
def calculate_unit_cost(allocations: dict, business_units: dict):
    """Calculate cost per business unit per team."""
    unit_costs = {}
    for team, services in allocations.items():
        team_cost = sum(services.values())
        team_units = business_units.get(team, 1)
        unit_costs[team] = {
            "total_cost": team_cost,
            "business_units": team_units,
            "unit_cost": team_cost / team_units,
        }
    return unit_costs

6.2 Karpenter config for cost optimization

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: [c, m, r]
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["2", "4", "8", "16"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]  # only newer generations
        - key: kubernetes.io/arch
          operator: In
          values: [amd64, arm64]  # Graviton 20% cheaper
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot, on-demand]
      nodeClassRef:
        name: default
 
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    expireAfter: 720h  # rotate every 30 days
 
  limits:
    cpu: "1000"
    memory: 1000Gi

6.3 LLM cost router

"""
Route LLM requests to cheapest model that satisfies quality.
"""
 
class CostAwareLLMRouter:
    def __init__(self):
        self.models = [
            {"name": "claude-haiku", "cost_per_mtok": 0.25, "quality": 0.85},
            {"name": "gpt-4o-mini", "cost_per_mtok": 0.15, "quality": 0.82},
            {"name": "claude-sonnet", "cost_per_mtok": 3.00, "quality": 0.95},
            {"name": "gpt-4o", "cost_per_mtok": 5.00, "quality": 0.94},
        ]
 
    def route(self, prompt: str, min_quality: float = 0.85) -> str:
        """Pick cheapest model meeting quality threshold."""
        complexity = self._assess_complexity(prompt)
 
        # Adjust min quality based on complexity
        required_quality = max(min_quality, complexity * 0.95)
 
        eligible = [m for m in self.models if m["quality"] >= required_quality]
        cheapest = min(eligible, key=lambda m: m["cost_per_mtok"])
 
        return cheapest["name"]
 
    def _assess_complexity(self, prompt: str) -> float:
        """0.0 (simple) to 1.0 (complex)."""
        # Simple heuristics; could use small classifier
        if len(prompt) < 100:
            return 0.3
        if "code" in prompt.lower() or "analyze" in prompt.lower():
            return 0.9
        if "?" in prompt and len(prompt) < 500:
            return 0.5
        return 0.7

7. System Design Diagrams

7.1 FinOps Lifecycle

flowchart LR
    Inform[Inform<br/>Visibility,<br/>Allocation,<br/>Forecast]
    Optimize[Optimize<br/>Rightsize,<br/>Reservations,<br/>Architecture]
    Operate[Operate<br/>Anomaly det,<br/>Showback,<br/>Culture]

    Inform --> Optimize --> Operate
    Operate --> Inform

    style Inform fill:#bbdefb
    style Optimize fill:#c8e6c9
    style Operate fill:#fff9c4

7.2 Cost Allocation Architecture

flowchart TB
    Cloud[AWS / GCP / Azure<br/>Billing API]
    K8s[Kubernetes Cluster<br/>Prometheus metrics]

    Cloud --> Allocator[Cost Allocator<br/>Kubecost / OpenCost]
    K8s --> Allocator

    Tags[(Resource Tags<br/>Team, Service, Env)] --> Allocator

    Allocator --> Dashboard[Dashboard<br/>Grafana / Backstage]
    Allocator --> Slack[Slack Bot<br/>Daily reports]
    Allocator --> Alerts[Anomaly Alerts]
    Allocator --> DB[(Cost DB<br/>BigQuery / Postgres)]

    DB --> Analysis[Analysts<br/>Quarterly reviews]
    DB --> ML[ML Forecasting]

7.3 Spot + On-Demand Strategy

flowchart LR
    Workload[Workload]

    Workload --> Sched{Workload Type}
    Sched -->|Stateful, latency-sensitive| OD[On-Demand<br/>Higher cost,<br/>guaranteed]
    Sched -->|Batch, retry-able| Spot[Spot Instances<br/>60-90% off]
    Sched -->|Steady state| RI[Reserved /<br/>Savings Plan<br/>30-65% off]

    OD --> Pool[Compute Pool]
    Spot --> Pool
    RI --> Pool

    Note[Mix:<br/>30% on-demand baseline<br/>50% reserved/SP<br/>20% spot for burst]

    style Note fill:#fff9c4

8. Aha Moments & Pitfalls

Aha Moments

#1: Cloud cost is not an IT cost, it is an engineering cost. Engineers control 80% of the bill (instance choice, query patterns, architecture). FinOps = engineer ownership.

#2: Tag everything from Day 1. Untagged resources = “shared” = no accountability. Enforcement via SCP / Kyverno mandatory.

#3: Network costs are a hidden killer. Cross-AZ, NAT Gateway, and internet egress are often 20% of the bill. Monitor carefully.

#4: Spot saves 60-90% but needs architecture. Stateless, retry-able, decoupled. Worth the design effort for batch workloads.

#5: Unit economics > absolute cost. The absolute bill can grow while cost per request falls. Track $/unit and drive it down over time.

#6: AI workloads are a different beast. Token-based pricing, 1000x variance, 90% reduction from prompt caching. A new playbook is needed.

#7: Showback before chargeback. Mature culture first, then bill teams. Premature chargeback = political war.

#8: Compounding effect. 30% reduction × 3 years = 65% cumulative. Small wins compound.

Pitfalls

Pitfall 1: Cost optimization without visibility

Try to optimize before understanding spend → flying blind. Fix: Inform first (tagging, allocation), optimize second.

Pitfall 2: Over-commit on Reserved/Savings Plans

Buy 3-year RI, then growth slows → waste. Fix: Cover 60-70% baseline, leave 30% flexibility.

Pitfall 3: Forget about idle resources

Dev environments running 24/7, untagged. Fix: auto-shutdown nightly via a scheduled Lambda.
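The nightly-shutdown fix can be a small scheduled Lambda; a sketch assuming dev instances carry an Environment=dev tag (the EventBridge schedule and IAM permissions are not shown):

```python
def running_instance_ids(reservations: list[dict]) -> list[str]:
    """Pull running instance IDs out of a describe_instances response."""
    return [
        inst["InstanceId"]
        for res in reservations
        for inst in res["Instances"]
        if inst["State"]["Name"] == "running"
    ]

def handler(event, context):
    """EventBridge-scheduled Lambda: stop every running dev-tagged instance."""
    import boto3  # available in the Lambda runtime
    ec2 = boto3.client("ec2")
    resp = ec2.describe_instances(
        Filters=[{"Name": "tag:Environment", "Values": ["dev"]}]
    )
    ids = running_instance_ids(resp["Reservations"])
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {"stopped": ids}
```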

Pitfall 4: No anomaly detection

Cost spike Friday, discover Monday → bill shock. Fix: Daily alerts, $1K threshold for new spend.

Pitfall 5: Optimize the wrong thing

Burning engineer time to shave small line items costs more than it saves. Fix: Pareto principle — focus on the top 20% of costs.

Pitfall 6: FinOps as accounting

FinOps team = bookkeepers, no engineering input. Fix: Cross-functional. FinOps engineer + finance + platform.

Pitfall 7: Tools without process

Buy Kubecost license, never check. Fix: Weekly FinOps review meeting, action items.

Pitfall 8: Ignore network egress

Optimize compute, ignore $50K/month egress. Fix: Audit egress carefully. CDN can save 50%+.

Pitfall 9: One-time exercise

“We did FinOps last year”. Cost creeps back. Fix: Continuous discipline. Quarterly reviews.

Pitfall 10: AI cost without controls

Engineers free to use any LLM. $50K/month surprise. Fix: Per-team token budgets, model routing rules.


| Topic | Relation |
|---|---|
| Tuan-11-Microservices-Pattern | Cost per microservice; tagging |
| Tuan-Bonus-Multi-Tenancy-SaaS-Patterns | Per-tenant cost allocation |
| Tuan-Bonus-Platform-Engineering-IDP | Cost dashboards in the IDP |
| Tuan-Bonus-LLM-Serving-Infrastructure | LLM cost specifics |
| Tuan-Bonus-Multi-Region-Active-Active-DSQL | Multi-region cost trade-offs |
| Tuan-13-Monitoring-Observability | Cost as an observability metric |

References


Books:

  • Cloud FinOps (J.R. Storment, Mike Fuller, 2nd ed 2024)
  • Cloud Native Patterns — chapter on cost

Next up: Tuan-Bonus-Progressive-Delivery — deployment strategies with canaries, feature flags, and automated rollback.