Week 15: Data Security & Encryption
“Encryption without key management is like buying a safe and taping the combination to the door.”
Tags: system-design security encryption devops compliance
Student: Hieu
Prerequisite: Tuan-14-AuthN-AuthZ-Security
Related: Tuan-02-Back-of-the-envelope · Tuan-07-Database-Sharding-Replication · Tuan-12-CICD-Pipeline · Tuan-13-Monitoring-Observability
1. Context & Why
Everyday analogy: a bank vault with many layers of protection
Hieu, imagine you are depositing gold at a bank. The bank does not rely on a single lock:
- The main entrance is guarded 24/7 ⇒ Network security (TLS/firewall)
- The vault room requires a staff key card plus a PIN ⇒ Authentication & Authorization (covered in Tuan-14-AuthN-AuthZ-Security)
- Each safe-deposit box has its own lock, and only the owner holds the key ⇒ Encryption at rest (data stored on disk)
- Gold is moved in an armored truck with GPS tracking ⇒ Encryption in transit (data crossing the network)
- The keys to the boxes are not kept in the vault but in a separate secure facility ⇒ Key Management Service (KMS)
- A ledger records in detail who opened which box, when, and what they took ⇒ Audit logging (for compliance)
- The combinations are changed periodically ⇒ Key rotation
- Gold, cash, and documents are classified and protected differently ⇒ Data classification
Core lesson: data security is not a single layer but defense in depth, with many layers stacked on top of each other. If one layer fails, the others still protect the data.
Why does Alex Xu emphasize Data Security?
In every System Design Interview, when the interviewer asks “how do you handle sensitive data?”, they want to hear:
“User PII is encrypted at rest with AES-256-GCM using envelope encryption. Keys are managed by AWS KMS with automatic rotation every 365 days. Data in transit goes over TLS 1.3. PII fields in the database use column-level encryption. The audit log records every access to sensitive data and is retained for 7 years for compliance. GDPR right to erasure is implemented with crypto-shredding: delete the encryption key instead of deleting each record.”
That is the difference between an ordinary engineer and a Security-aware Architect.
2. Deep Dive: Core Concepts
2.1 Encryption at Rest vs Encryption in Transit
| Aspect | At Rest (stored data) | In Transit (data on the wire) |
|---|---|---|
| Purpose | Protect data on disk/storage | Protect data as it crosses the network |
| Main techniques | AES-256, TDE, column-level encryption | TLS 1.2/1.3, mTLS |
| Defends against | Physical theft, unauthorized disk access | Man-in-the-middle, eavesdropping |
| Examples | Database files, S3 objects, backups | API calls, database connections, inter-service communication |
| Who manages the keys | KMS (AWS KMS, Vault) | Certificate Authority (CA), cert-manager |
| Performance impact | Low (hardware-accelerated AES-NI) | Low with TLS 1.3 (1-RTT handshake) |
Golden rule: encrypt both. Never encrypt only one side. Unencrypted data at rest means a single breach exposes everything. Unencrypted data in transit means every request can be sniffed.
2.2 Symmetric vs Asymmetric Encryption
Symmetric Encryption
A single key is used for both encryption and decryption.
| Algorithm | Key Size | Block Size | Speed | Use Case |
|---|---|---|---|---|
| AES-128 | 128 bit | 128 bit | Very fast | General purpose |
| AES-256 | 256 bit | 128 bit | Fast | Sensitive data, government, compliance |
| ChaCha20 | 256 bit | Stream | Fast (no AES-NI needed) | Mobile, IoT (no hardware AES) |
Important AES modes:
| Mode | Properties | Use when |
|---|---|---|
| ECB | Insecure: identical plaintext → identical ciphertext | NEVER use |
| CBC | Needs an IV, sequential processing | Legacy systems |
| CTR | Parallelizable, needs a unique nonce | High-throughput encryption |
| GCM | Authenticated encryption (confidentiality + integrity) | Recommended default: TLS, API payloads |
Aha Moment: AES-GCM is the “gold standard” because it both encrypts and guarantees integrity (via an authentication tag). If an attacker modifies the ciphertext, GCM detects it at decryption time. CBC has no such feature and needs a separate HMAC.
Asymmetric Encryption
Two keys: a public key (encrypt / verify) and a private key (decrypt / sign).
| Algorithm | Key size (approx. symmetric-equivalent security) | Speed | Use Case |
|---|---|---|---|
| RSA-2048 | 2048 bit (~112-bit security) | Slow (on the order of 100-1000x slower than AES) | Key exchange, digital signature |
| RSA-4096 | 4096 bit (above 128-bit security) | Very slow | High-security scenarios |
| ECDSA (P-256) | 256 bit (~128-bit security) | Much faster than RSA | TLS certificates, JWT signing |
| Ed25519 | 256 bit (~128-bit security) | Very fast | SSH keys, modern signatures |
Why not use asymmetric encryption for everything? Because it is 100-1000x slower than symmetric encryption. In practice, asymmetric cryptography is used only to agree on a symmetric key (key exchange), and that symmetric key then handles bulk encryption. This is exactly how TLS works.
2.3 TLS In Detail
TLS (Transport Layer Security) is the foundation of all secure communication on the internet.
TLS 1.3 Handshake (1-RTT)
Client                                          Server
|                                                    |
|--- ClientHello (supported ciphers, key share) ---->|
|                                                    |
|<-- ServerHello (chosen cipher, key share,          |
|    EncryptedExtensions, Certificate,               |
|    CertificateVerify, Finished) -------------------|
|                                                    |
|--- Finished (encrypted) -------------------------->|
|                                                    |
|<========== Application Data (encrypted) ==========>|
TLS 1.2 vs TLS 1.3 comparison:
| Aspect | TLS 1.2 | TLS 1.3 |
|---|---|---|
| Handshake | 2-RTT | 1-RTT (0-RTT for resumption) |
| Cipher suites | Many (including weak ones) | Only 5 cipher suites (all strong) |
| Forward secrecy | Optional | Mandatory |
| RSA key exchange | Available | Removed ((EC)DHE only) |
| Speed | Slower | Handshake about 40% faster (one fewer round trip) |
Forward Secrecy: even if the server's long-term private key leaks in the future, all previously recorded traffic stays safe, because every session uses its own ephemeral key (ECDHE).
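To make the TLS 1.3 row of the table concrete, here is a minimal sketch of a client that refuses anything older than TLS 1.3, using only Python's standard `ssl` module. The function name is an illustrative assumption, not something from the lesson.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Client context that verifies the server and refuses anything below TLS 1.3."""
    # create_default_context() loads the system CA store and turns on
    # certificate verification and hostname checking.
    ctx = ssl.create_default_context()
    # Reject TLS 1.2 and older, so every session uses an ephemeral (EC)DHE exchange.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = make_tls13_client_context()
```

The context would then be passed to `ctx.wrap_socket(sock, server_hostname=...)` or to an HTTP client that accepts an `SSLContext`. Requiring TLS 1.3 is a policy choice: clients that must still reach TLS 1.2-only servers would set `TLSv1_2` as the floor instead.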
mTLS (Mutual TLS)
In microservices, the client verifies the server and the server also verifies the client:
Service A                                    Service B
|                                                    |
|--- ClientHello ----------------------------------->|
|<-- ServerHello + Server Certificate                |
|    + CertificateRequest ---------------------------|
|--- Client Certificate + CertificateVerify -------->|
|    (each side verifies the other's certificate)    |
|<============ Encrypted communication =============>|
Use case: service mesh (Istio, Linkerd), zero-trust architecture. See Tuan-11-Microservices-Pattern.
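The server half of mTLS can be sketched with the standard `ssl` module: the only difference from ordinary TLS is that the server demands and verifies a client certificate. The function name and the file-path parameters are hypothetical; in a service mesh the sidecar proxy normally does this for you.

```python
import ssl
from typing import Optional


def make_mtls_server_context(cert_file: Optional[str] = None,
                             key_file: Optional[str] = None,
                             client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server context that rejects clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own identity (hypothetical paths supplied by the caller).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The private CA that signs client (service) certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


server_ctx = make_mtls_server_context()
```

Without real certificate files the context cannot complete a handshake, so treat this as a configuration sketch rather than a working server.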
2.4 Envelope Encryption
Problem: you encrypt 1 TB of data with a single AES-256 key. When that key has to be rotated, do you decrypt and re-encrypt the entire 1 TB? Not feasible!
Solution: Envelope Encryption:
- Data Encryption Key (DEK): the key that directly encrypts the data. Each object/record has its own DEK.
- Key Encryption Key (KEK): the key used to encrypt the DEK. Managed by the KMS.
┌─────────────────┐
│    KMS (KEK)    │
│   Master Key    │
└────────┬────────┘
         │ encrypt/decrypt DEK
┌────────▼────────┐
│  Encrypted DEK  │
│  (stored with   │
│      data)      │
└────────┬────────┘
         │ DEK decrypts data
┌────────▼────────┐
│ Encrypted Data  │
│  (on disk/S3)   │
└─────────────────┘
Benefits:
- Fast key rotation: only the DEK (a few bytes) is re-encrypted with the new KEK; the data itself is not re-encrypted
- Performance: the DEK is cached in memory for fast encrypt/decrypt
- Isolation: each object/tenant has its own DEK, so a compromised DEK exposes only part of the data
- Scale: the KMS only handles DEK encrypt/decrypt (small, fast), never the bulk data
This is how AWS S3 SSE-KMS, Google Cloud KMS, and Azure Key Vault work. Each object in S3 has its own DEK, encrypted by a KEK held in KMS.
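The "DEK is cached in memory" benefit can be sketched as a small TTL cache. Everything here is an illustrative assumption: the class name, the injected `unwrap` callable (standing in for a KMS decrypt call), and the 300-second default TTL.

```python
import time
from typing import Callable, Dict, Tuple


class DekCache:
    """Keeps plaintext DEKs in memory for a limited time to avoid a KMS call per request."""

    def __init__(self, unwrap: Callable[[bytes], bytes],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._unwrap = unwrap      # hypothetical: asks the KMS to decrypt a wrapped DEK
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the expiry logic can be tested
        self._entries: Dict[bytes, Tuple[float, bytes]] = {}
        self.kms_calls = 0         # counts cache misses, i.e. billable KMS requests

    def get(self, encrypted_dek: bytes) -> bytes:
        now = self._clock()
        hit = self._entries.get(encrypted_dek)
        if hit is not None and hit[0] > now:
            return hit[1]          # still fresh: no KMS round trip
        self.kms_calls += 1
        dek = self._unwrap(encrypted_dek)
        self._entries[encrypted_dek] = (now + self._ttl, dek)
        return dek
```

The TTL is a trade-off: a longer TTL means fewer KMS calls, but a plaintext DEK stays in process memory longer and a revoked key keeps working until the entry expires.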
2.5 Key Management (KMS)
AWS KMS
| Feature | Details |
|---|---|
| Key types | Symmetric (AES-256), Asymmetric (RSA, ECC) |
| Key storage | FIPS 140-2 Level 3 HSM |
| Key rotation | Automatic every 365 days (configurable) |
| Access control | IAM policies + Key policies + Grants |
| Audit | Every API call is logged in CloudTrail |
| Pricing | $1/key/month + $0.03 per 10,000 requests |
| Multi-region | Multi-Region Keys (replicate key across regions) |
HashiCorp Vault
| Feature | Details |
|---|---|
| Secret engines | KV, PKI, Transit, Database, AWS, SSH, … |
| Auth methods | Token, AppRole, Kubernetes, LDAP, OIDC |
| Encryption as a Service | Transit engine: the app sends plaintext, Vault returns ciphertext |
| Dynamic secrets | Generates database credentials on demand, auto-revoke |
| Seal/Unseal | Master key is split into Shamir shares (m-of-n) |
| Audit | Every operation is logged (tamper-evident) |
When should you use AWS KMS vs Vault?
| Criterion | AWS KMS | HashiCorp Vault |
|---|---|---|
| Cloud-native AWS | Best | Good |
| Multi-cloud / Hybrid | No | Best |
| Dynamic secrets | Not available | Available |
| PKI / Certificate management | Limited | Very strong |
| Encryption as a Service | Yes (but data must be sent to AWS) | Yes (self-hosted, data never leaves the datacenter) |
| Operational complexity | Low (managed) | High (you run the cluster) |
| Cost at scale | Can be expensive (per-request) | Vault Enterprise license or self-hosting |
2.6 Key Rotation
Why rotate keys?
- Reduce the blast radius if a key is compromised
- Compliance requirement (PCI-DSS requires keys to be changed at the end of their defined cryptoperiod, which many organizations set to 12 months)
- Limit the amount of data encrypted under any one key
Key rotation with envelope encryption:
Before rotation:
KEK-v1 encrypt --> DEK-001 (encrypted)
DEK-001 encrypt --> Data Object A
After rotation:
KEK-v2 encrypt --> DEK-001 (re-encrypted by KEK-v2)
DEK-001 unchanged --> Data Object A does NOT need re-encryption!
Rotation cost = number of DEKs x time to re-encrypt one DEK (microseconds). With 1 million DEKs, rotation takes a few seconds, not a few days.
Key versioning: always keep old keys so old data can still be decrypted. AWS KMS automatically retains all previous versions.
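The rotation-cost formula above is easy to check with a few lines of arithmetic. The per-DEK timings are assumptions chosen for illustration: about 5 µs for a local re-wrap, versus about 10 ms if each re-wrap is a network call to a remote KMS.

```python
def rotation_seconds(num_deks: int, rewrap_us_per_dek: float, workers: int = 1) -> float:
    """Wall-clock time to re-wrap every DEK, split evenly across parallel workers."""
    return num_deks * rewrap_us_per_dek / 1_000_000 / workers


# 1 million DEKs re-wrapped locally at an assumed 5 microseconds each.
local = rotation_seconds(1_000_000, rewrap_us_per_dek=5.0)

# Same DEKs, but each re-wrap is an assumed 10 ms KMS round trip, run by 100 workers.
remote = rotation_seconds(1_000_000, rewrap_us_per_dek=10_000.0, workers=100)

print(f"local re-wrap: {local:.0f} s, remote re-wrap with 100 workers: {remote:.0f} s")
```

The "few seconds" figure therefore holds when re-wrapping is a local operation. When every DEK needs a network round trip, parallelism decides whether rotation takes minutes or hours, but either way the bulk data is never touched.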
2.7 Data Classification
| Level | Name | Examples | Required protection |
|---|---|---|---|
| Public | Public | Marketing content, public API docs | Integrity check, no encryption needed |
| Internal | Internal | Internal wiki, employee directory | Encrypt in transit, access control |
| Confidential | Confidential | Financial reports, customer emails, business plans | Encrypt at rest + in transit, audit logging, need-to-know access |
| Restricted | Top secret | PII, PHI, payment card data, encryption keys | Encrypt at rest + in transit, field-level encryption, strict access control, audit every access, key management, compliance (GDPR/PCI-DSS/HIPAA) |
Rule: classify first, encrypt second. Without classification you do not know how much protection is needed, so you either under-encrypt (dangerous) or over-encrypt (costly in money and performance).
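One way to make "classify first, encrypt second" enforceable is to encode the table as data that code can consult. This is a minimal sketch: the field names and the `Controls` attributes are hypothetical, and the flags mirror the table only approximately.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass(frozen=True)
class Controls:
    encrypt_in_transit: bool
    encrypt_at_rest: bool
    field_level_encryption: bool
    audit_every_access: bool


# Required controls per classification level, following the table above.
POLICY = {
    Level.PUBLIC:       Controls(False, False, False, False),
    Level.INTERNAL:     Controls(True,  False, False, False),
    Level.CONFIDENTIAL: Controls(True,  True,  False, True),
    Level.RESTRICTED:   Controls(True,  True,  True,  True),
}

# Hypothetical field tagging; a real system would keep this in a data catalog.
FIELD_LEVELS = {
    "public_api_docs": Level.PUBLIC,
    "employee_directory": Level.INTERNAL,
    "financial_report": Level.CONFIDENTIAL,
    "credit_card_number": Level.RESTRICTED,
}


def controls_for(field: str) -> Controls:
    """Look up required controls; unclassified fields fail closed to RESTRICTED."""
    return POLICY[FIELD_LEVELS.get(field, Level.RESTRICTED)]
```

Defaulting unknown fields to the strictest level is a deliberate choice: it costs some performance, but forgetting to classify a field can never silently leave it unprotected.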
2.8 PII Handling
PII (Personally Identifiable Information) includes:
| Type | Examples | Sensitivity |
|---|---|---|
| Direct identifiers | Full name, national ID (CMND/CCCD), email, phone number, address | High |
| Indirect identifiers | Date of birth, gender, zip code, IP address | Medium (combining 3 or more can identify an individual) |
| Sensitive PII | Credit card number, medical records, criminal history, biometric data | Very high |
Techniques for protecting PII:
- Encryption (field-level): encrypt only the PII fields in the database
- Tokenization: replace PII with a random token and keep the mapping in a separate vault
- Data masking: show ***@email.com instead of the full email address
- Pseudonymization: replace PII with a pseudonym that can still be reversed with a separate key
- Anonymization: remove the ability to identify the person entirely; it cannot be reversed (GDPR safe)
- Minimization: collect only the PII that is genuinely needed
2.9 Data Masking & Tokenization
Data Masking
Original: Nguyen Van Hieu | 0912345678 | [email protected]
Static Mask: Nguyen V** H*** | 091****678 | h***@company.com
Dynamic Mask (role=admin): Nguyen Van Hieu | 0912345678 | [email protected]
Dynamic Mask (role=support): Nguyen V** H*** | 091****678 | h***@company.com
Types of masking:
- Static masking: data is masked permanently (used for test/dev environments)
- Dynamic masking: data is masked at query time based on the user's role (production)
- On-the-fly masking: data is masked as it is exported out of the system
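The masked values in the example above follow simple rules that can be sketched in a few functions. The function names, the role set, and the sample email address used in the comments are illustrative assumptions.

```python
def mask_name(name: str) -> str:
    """Keep the first word; reduce every later word to its initial plus stars."""
    parts = name.split()
    if not parts:
        return ""
    masked = [p[0] + "*" * (len(p) - 1) for p in parts[1:]]
    return " ".join([parts[0]] + masked)


def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 3 digits and star out the middle."""
    if len(phone) <= 6:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 6) + phone[-3:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"


UNMASKED_ROLES = {"admin"}  # assumption: mirrors the role=admin row of the example


def render(record: dict, role: str) -> dict:
    """Dynamic masking: decide at read time, based on the caller's role."""
    if role in UNMASKED_ROLES:
        return dict(record)
    return {
        "name": mask_name(record["name"]),
        "phone": mask_phone(record["phone"]),
        "email": mask_email(record["email"]),
    }
```

Static masking would apply the same functions once, when copying data into a test or dev environment, and discard the originals.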
Tokenization
Credit Card: 4532-1234-5678-9012
|
v (tokenize)
Token: tok_8f3a2b1c9d4e
|
v (stored in the Token Vault: isolated, tightly secured)
Mapping: tok_8f3a2b1c9d4e --> 4532-1234-5678-9012
Encryption vs Tokenization:
| Aspect | Encryption | Tokenization |
|---|---|---|
| Format preservation | No (ciphertext is longer than plaintext) | Yes (the token can keep the same format) |
| Mathematical relationship | Ciphertext is mathematically derived from the plaintext | None: the token is completely random |
| PCI-DSS scope | The system stays in scope | Scope can shrink (only the Token Vault is in scope) |
| Performance | Fast (AES-NI) | Lookup table (needs a database call) |
PCI-DSS tip: tokenization is preferred for credit card data because it reduces PCI scope. Only the Token Vault has to be PCI compliant; every other service sees only tokens.
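The tokenize / detokenize flow can be sketched as an in-memory vault. This is a toy under stated assumptions: the class and method names are hypothetical, the `tok_` + 12 hex characters shape simply copies the example token above, and a real Token Vault would be a separate, hardened, persistent service.

```python
import secrets
from typing import Dict


class TokenVault:
    """In-memory stand-in for a token vault (illustration only)."""

    def __init__(self) -> None:
        self._token_to_value: Dict[str, str] = {}
        self._value_to_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for the value, reusing the token if one exists."""
        existing = self._value_to_token.get(value)
        if existing is not None:
            return existing
        token = "tok_" + secrets.token_hex(6)
        while token in self._token_to_value:   # retry on the rare collision
            token = "tok_" + secrets.token_hex(6)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Only callable inside the vault's trust boundary; raises KeyError if unknown."""
        return self._token_to_value[token]
```

Because the token comes from a random generator rather than from the card number, nothing outside the vault can derive one from the other. That property is what lets the other services drop out of PCI scope.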
2.10 GDPR Basics
GDPR (General Data Protection Regulation): the EU's data protection law, which applies to any company that processes the data of people in the EU.
| Right / Principle | Short name | Technical implication |
|---|---|---|
| Right to Access | Access | Users can request an export of all their data |
| Right to Erasure (Right to be Forgotten) | Erasure | When a user asks for deletion, the data must be removed everywhere (including backups!) |
| Right to Portability | Portability | Export user data in a standard format (JSON, CSV) |
| Right to Rectification | Rectification | Users can request correction of wrong data |
| Consent | Consent | You must hold evidence that the user agreed before collecting data |
| Data Minimization | Minimization | Collect only data that is genuinely needed |
| Breach Notification | Breach notification | Must notify within 72 hours of discovering a breach |
Crypto-shredding: a solution for the Right to Erasure
Problem: a user asks for deletion, but their data sits in backups, replicas, analytics pipelines, Kafka topics… Deleting every copy is practically infeasible!
Solution: Crypto-shredding:
- Each user has their own DEK (per-user encryption key)
- All of the user's PII is encrypted with this DEK
- When the user requests erasure → delete the user's DEK
- The data still exists but cannot be decrypted = effectively deleted
User A's data:
DEK-A --> encrypt --> [encrypted PII in DB, backups, logs...]
User A requests erasure:
DELETE DEK-A from KMS
--> All User A's data = unreadable garbage
--> GDPR compliant!
Aha Moment: crypto-shredding is why per-entity encryption keys matter so much. If every user shares one key, you cannot erase one user's data without affecting everyone else.
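The key-lifecycle side of crypto-shredding can be sketched without any cipher at all: what matters is that each user has their own DEK and that deleting it is final. The class and exception names are hypothetical, and a production system would keep DEKs wrapped by a KMS rather than as plaintext in a dict. The encryption itself is left to an AES-GCM routine and is not shown here.

```python
import secrets
from typing import Dict, Set


class KeyShreddedError(KeyError):
    """Raised when a user's DEK has been destroyed on purpose."""


class UserKeyRegistry:
    """Per-user DEK store where shredding a key makes old ciphertext unrecoverable."""

    def __init__(self) -> None:
        self._deks: Dict[str, bytes] = {}
        self._shredded: Set[str] = set()

    def create(self, user_id: str) -> bytes:
        """Return the user's DEK, generating a fresh 256-bit key on first use."""
        if user_id not in self._deks:
            self._deks[user_id] = secrets.token_bytes(32)
            self._shredded.discard(user_id)
        return self._deks[user_id]

    def get(self, user_id: str) -> bytes:
        if user_id in self._shredded:
            raise KeyShreddedError(user_id)
        return self._deks[user_id]

    def shred(self, user_id: str) -> None:
        """Erasure request: destroy the DEK and leave the ciphertext where it is."""
        self._deks.pop(user_id, None)
        self._shredded.add(user_id)
```

After `shred("user-a")`, every copy of that user's ciphertext in databases, backups, and logs is unreadable, while other users' keys are untouched. This only works if no plaintext copy of the DEK survives elsewhere, for example in an application-side cache.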
2.11 PCI-DSS Overview
PCI-DSS (Payment Card Industry Data Security Standard): mandatory for any system that handles payment cards.
| Requirement | Description | Technique |
|---|---|---|
| Req 3 | Protect stored cardholder data | Encryption at rest (AES-256), tokenization, masking |
| Req 4 | Encrypt transmission of cardholder data | TLS 1.2+ for every transmission |
| Req 3.5 | Protect encryption keys | KMS, split knowledge, dual control |
| Req 3.6 | Key management procedures | Documented key rotation, generation, destruction |
| Req 10 | Track and monitor all access | Audit logging, SIEM, log retention |
| Req 3.1 | Minimize data storage | Keep only the data you need, with a retention policy |
PCI-DSS scope reduction strategies:
- Tokenization: replace the credit card number with a token ⇒ fewer systems in scope
- Network segmentation: isolate a dedicated payment zone, firewalled off from the other zones
- Third-party processing: use Stripe/Braintree ⇒ they carry the PCI scope, you only hold the token
2.12 Backup Encryption
| Aspect | Recommendation |
|---|---|
| Encryption | AES-256-GCM for all backups |
| Key management | A dedicated backup encryption key, stored in the KMS |
| Never store the key with the backup | Absolute rule: losing both together means losing everything |
| Test restore | Regularly test restoring from an encrypted backup (at least once a quarter) |
| Offsite backup | Encrypt before moving the backup offsite |
| Retention | The backup key must live at least as long as the backup's retention period |
Classic pitfall: the team rotates production keys but forgets to rotate the backup encryption key or, worse, deletes the old key while old backups still exist. Result: the backups exist but cannot be restored.
2.13 Secure Deletion
| Method | Description | Effectiveness |
|---|---|---|
| rm / DELETE | Removes the pointer; the data stays on disk | Not secure |
| Overwrite (1-pass zero) | Overwrite with zeros | Sufficient for modern HDDs |
| DoD 5220.22-M | 3 passes (0, 1, random) | Legacy standard |
| Crypto-shredding | Delete the encryption key | Most effective for cloud/SSD |
| Physical destruction | Shredding, incineration, degaussing | For hardware decommission |
Note on SSDs: SSDs use wear leveling and spare blocks, so overwriting does not guarantee that everything is erased. Short of physical destruction, crypto-shredding is the most dependable option on SSD and cloud storage.
2.14 Database-level Encryption
TDE (Transparent Data Encryption)
TDE encrypts all database files on disk. The application code does not need to change.
| Database | TDE Support | Details |
|---|---|---|
| PostgreSQL | Not native (pg_tde extension; pgcrypto covers column-level only) | Needs a third-party extension or file-system encryption |
| MySQL | Yes (InnoDB tablespace encryption) | AES-256, key in the keyring plugin |
| SQL Server | Yes (Enterprise edition) | AES-256, certificate-based key management |
| Oracle | Yes (Advanced Security Option) | AES-256, wallet-based key management |
| MongoDB | Yes (Enterprise) | AES-256-CBC, KMIP integration |
TDE limitation: it only protects data on disk. Once data is loaded into memory (query results, buffer pool) it is plaintext, and a DBA with access can still read it.
Column-level Encryption
Encrypt only the columns that hold sensitive data:
-- PostgreSQL with pgcrypto
-- NOTE: 'my-secret-key' is a placeholder; in practice the key comes from the app/KMS, never a literal in SQL
-- Encrypt on INSERT
INSERT INTO users (name, email_encrypted, phone_encrypted)
VALUES (
    'Nguyen Van Hieu',
    pgp_sym_encrypt('[email protected]', 'my-secret-key'),
    pgp_sym_encrypt('0912345678', 'my-secret-key')
);
-- Decrypt on SELECT (only for roles that are allowed to)
SELECT name,
       pgp_sym_decrypt(email_encrypted::bytea, 'my-secret-key') AS email,
       pgp_sym_decrypt(phone_encrypted::bytea, 'my-secret-key') AS phone
FROM users
WHERE id = 1;
Limitation: you cannot index or search on encrypted columns (the encrypted value differs every time because of the random IV).
Field-level Encryption (Client-side)
The application encrypts before sending data to the database. The database only ever sees ciphertext.
Advantages over TDE/column-level:
- The DBA cannot read the data (the key lives on the app/KMS side)
- Data stays encrypted for its whole lifecycle (at rest + in transit between app and DB)
- Different tenants can use different keys (multi-tenant isolation)
Disadvantages:
- More complex application code
- Encrypted fields cannot be queried (a separate search index has to be stored before encrypting)
- More complex schema migrations
MongoDB Client-Side Field Level Encryption (CSFLE): MongoDB has supported native CSFLE since version 4.2, with automatic encryption/decryption. Very powerful for multi-tenant SaaS.
3. Estimation — Encryption Overhead
3.1 Encryption CPU Overhead
AES-256-GCM with hardware AES-NI (most modern CPUs):
For comparison: an average API request takes 10-100 ms. Encryption takes about 0.5 µs. The encryption overhead is < 0.005% of latency for small payloads.
3.2 Storage Overhead
AES-GCM adds the following to every encrypted object:
With envelope encryption, add the encrypted DEK:
Example: encrypting 100 million records of 1 KB each:
For large data (images, videos) the overhead is close to 0%, since 284 bytes / 2 MB = 0.014%.
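The 284-byte figure can be broken down with a short calculation. The 12-byte nonce and 16-byte tag are the standard AES-GCM sizes. The remaining 256 bytes for the encrypted DEK plus key metadata is an assumption inferred from the 284-byte total, not a number stated in the lesson.

```python
GCM_NONCE_BYTES = 12     # 96-bit nonce stored next to the ciphertext
GCM_TAG_BYTES = 16       # 128-bit authentication tag
ENVELOPE_BYTES = 256     # ASSUMPTION: encrypted DEK + key ID metadata per object

PER_OBJECT_OVERHEAD = GCM_NONCE_BYTES + GCM_TAG_BYTES + ENVELOPE_BYTES


def overhead_percent(payload_bytes: int) -> float:
    """Overhead as a percentage of the plaintext size."""
    return PER_OBJECT_OVERHEAD / payload_bytes * 100


records = 100_000_000
extra_gb = records * PER_OBJECT_OVERHEAD / 1e9   # total extra storage in GB

print(f"per object: {PER_OBJECT_OVERHEAD} bytes")
print(f"1 KB records: {overhead_percent(1024):.1f}% overhead, {extra_gb:.1f} GB extra")
print(f"2 MB objects: {overhead_percent(2_000_000):.3f}% overhead")
```

Under these assumptions, small records pay a noticeable storage cost, which is one reason to share a DEK across a batch of small records instead of giving each record its own envelope.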
3.3 KMS Request Cost
AWS KMS pricing (us-east-1, estimated for 2026):
Example: a system with 10 CMKs and 50M encrypt/decrypt requests per month:
Optimization: cache DEKs in memory (with a TTL). Instead of calling KMS on every request, call it only on a DEK cache miss or expiry. This cuts 50M requests down to ~500K requests/month, which is about $1.50 in request charges instead of $150 (at $0.03 per 10,000 requests).
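The cost example works out as follows. The prices are assumptions based on the commonly quoted AWS KMS list prices ($1 per key per month, $0.03 per 10,000 requests). The free tier is ignored, and current prices should be checked on the AWS pricing page.

```python
KEY_MONTHLY_USD = 1.00        # ASSUMPTION: price per KMS key per month
REQUEST_USD_PER_10K = 0.03    # ASSUMPTION: price per 10,000 API requests


def monthly_cost(num_keys: int, requests: int) -> float:
    """Key storage charge plus request charge for one month."""
    return num_keys * KEY_MONTHLY_USD + requests / 10_000 * REQUEST_USD_PER_10K


uncached = monthly_cost(num_keys=10, requests=50_000_000)   # KMS call on every request
cached = monthly_cost(num_keys=10, requests=500_000)        # KMS call only on cache miss

print(f"uncached: ${uncached:.2f}/month, with DEK caching: ${cached:.2f}/month")
```

With caching, the fixed per-key charge dominates the bill, so the next lever is the number of keys rather than the number of requests.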
3.4 Audit Log Storage for Compliance
Every access to sensitive data must be logged:
Assume the system sees 10K sensitive-data accesses/s:
Storage cost (S3 Standard → Glacier for older data):
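| Tier | Data | Cost/month |
|---|---|---|
| S3 Standard (0-1 month) | ~13 TB | ~$300 |
| S3 IA (1-12 months) | ~143 TB | ~$1,780 |
| S3 Glacier (1-7 years) | ~946 TB | ~$3,780 |
| Total | ~1,100 TB | ~$5,900/month |
These figures assume about 432 GB of logs per day (10K accesses/s at roughly 500 bytes per entry, an inferred entry size) and per-GB monthly prices of about $0.023, $0.0125, and $0.004 for the three tiers. With compression (typically 10:1 for text logs) the bill drops to about $590/month. That is still a meaningful cost.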
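As a sketch of what one per-access audit record might contain, and how a log can be made tamper-evident, the snippet below chains each entry to the hash of the previous one. The field names and the genesis value are illustrative assumptions, not a format prescribed by any compliance standard.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64   # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, str]) -> str:
    """Hash a canonical JSON encoding so the result is stable across runs."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: List[Dict[str, str]], ts: str, actor: str,
                 action: str, resource: str) -> Dict[str, str]:
    """Append an audit record that commits to the hash of the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"ts": ts, "actor": actor, "action": action,
            "resource": resource, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: List[Dict[str, str]]) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A chain like this detects edits only if the latest hash is also anchored somewhere the attacker cannot rewrite, such as a separate account or write-once storage.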
3.5 TLS Handshake Overhead
With connection pooling and keep-alive, the handshake happens once for many requests:
Conclusion: TLS overhead is negligible with connection reuse. There is no reason not to use TLS.
4. Security Deep Dive
4.1 Key Escrow Risks
Key escrow: handing your encryption key to a third party (e.g. a government or a cloud provider) to hold on your behalf.
| Risk | Description |
|---|---|
| Single point of compromise | The third party gets hacked → all data is exposed |
| Insider threat | The third party's staff can access the key |
| Legal compulsion | A government can force the third party to hand over the key |
| Trust boundary | You are forced to trust the third party, which violates zero-trust |
Mitigations:
- Manage the keys yourself (self-managed KMS) for extremely sensitive data
- BYOK (Bring Your Own Key): use your own key with the cloud KMS
- Hold Your Own Key (HYOK): the key never leaves your own infrastructure
- Split knowledge: no single person holds enough of the key to decrypt (Shamir’s Secret Sharing)
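Shamir's Secret Sharing, named in the last bullet, can be sketched in a few lines to show why any m of n shares recover the secret while fewer reveal nothing. This is a teaching sketch over a 127-bit prime field (an arbitrary choice, so the secret must be smaller than that prime). Production systems should rely on an audited implementation such as the one built into Vault.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1   # a Mersenne prime used as the field modulus (illustrative choice)


def split_secret(secret: int, threshold: int, shares: int) -> List[Tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):          # Horner's rule, highest degree first
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `split_secret(key, threshold=3, shares=5)` hands one point to each of five key custodians, and any three of them can call `recover_secret` together. The modular inverse via `pow(den, -1, PRIME)` needs Python 3.8 or newer.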
4.2 HSM vs Software KMS
| Aspect | HSM (Hardware Security Module) | Software KMS |
|---|---|---|
| Key storage | Tamper-resistant hardware | Encrypted in software |
| FIPS 140-2 | Level 3 (physical tamper protection) | Level 1-2 |
| Key extraction | Keys cannot be extracted from the HSM | Keys in memory can be dumped |
| Performance | 1,000-10,000 operations/s | 100,000+ operations/s |
| Cost | On the order of $50,000/unit (or CloudHSM ~$1.50/hr) | Low (software license) |
| Use case | Root CA, master keys, PCI-DSS, government | Application-level encryption, dev/staging |
Rule of thumb: use an HSM for the root of trust (master key, CA signing key). Use a software KMS for bulk operations (encrypting/decrypting data). Combine the two: the HSM holds the KEK, the software KMS handles the DEKs.
4.3 Side-Channel Attacks Awareness
Side-channel attack: an attack that targets the implementation of an algorithm rather than the algorithm itself.
| Type | How the attack works | Mitigation |
|---|---|---|
| Timing attack | Measure encrypt/decrypt time to infer the key | Constant-time comparison, padding |
| Cache attack | Observe CPU cache access patterns | Cache partitioning, disable hyperthreading |
| Power analysis | Measure power consumption during encryption | HSM (with shielding) |
| Spectre/Meltdown | Exploit speculative execution | OS/CPU patches, process isolation |
| Padding oracle | Exploit error messages when padding is wrong | Use authenticated encryption (GCM), do not leak error details |
In practice for developers: use audited crypto libraries (libsodium, OpenSSL, AWS Encryption SDK). NEVER implement a crypto algorithm yourself. Even Google and Apple use standard libraries.
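The "constant-time comparison" mitigation from the timing-attack row is available in Python's standard library. This sketch verifies an HMAC tag with `hmac.compare_digest` instead of `==`; the function names are illustrative.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag without leaking, via timing, how many leading bytes matched."""
    expected = sign(key, message)
    # compare_digest takes time independent of where the first mismatch occurs;
    # a plain `expected == tag` may return early on the first differing byte.
    return hmac.compare_digest(expected, tag)
```

The same rule applies to any secret-dependent comparison, such as API keys, session tokens, or password-reset codes.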
4.4 Data Breach Response Plan
Mandatory timeline (GDPR requires notification within 72 hours):
| Hours | Action |
|---|---|
| 0-1h | Detect and confirm the breach (monitoring/SIEM alert) |
| 1-4h | Containment: isolate affected systems, revoke compromised credentials, rotate keys |
| 4-12h | Assessment: determine the scope (how many records, what kind of data, who is affected) |
| 12-24h | Evidence preservation: take forensic copies before remediating |
| 24-48h | Remediation: patch the vulnerability, restore from a clean backup |
| 48-72h | Notification: inform the DPA (Data Protection Authority) and the affected users |
| 72h+ | Post-incident: root cause analysis, update the runbook, re-test |
How encryption limits breach damage:
- Data encrypted + key not compromised = affected individuals may not need to be notified (GDPR Article 34(3)(a)); the supervisory authority may still have to be informed
- Leaked tokenized data = worthless without the token vault
Aha Moment: encrypting data properly can turn a catastrophic breach into a security incident that does not require notifying users. That is the real ROI of encryption.
5. DevOps: Hands-on Deployment
5.1 HashiCorp Vault Setup and Usage
Vault Architecture for Production
# vault-values.yaml (Helm chart for Kubernetes)
server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
          }
          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
          }
          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
          }
        }
        # Auto-unseal with AWS KMS (no manual unseal needed)
        seal "awskms" {
          region     = "ap-southeast-1"
          kms_key_id = "alias/vault-unseal-key"
        }
  auditStorage:
    enabled: true
    size: 50Gi
ui:
  enabled: true
injector:
  enabled: true # Vault Agent Injector for Kubernetes pods
Vault Policies (Principle of Least Privilege)
# policy-app-payment.hcl
# Only lets the payment service read its own secrets
path "secret/data/payment/*" {
  capabilities = ["read", "list"]
}
path "transit/encrypt/payment-key" {
  capabilities = ["update"]
}
path "transit/decrypt/payment-key" {
  capabilities = ["update"]
}
# Not allowed:
# - Reading other services' secrets
# - Creating/deleting keys
# - Accessing root/admin paths
5.2 AWS KMS Integration
Terraform IaC for KMS
# kms.tf
# Assumes data.aws_caller_identity.current and aws_sns_topic.security_alerts are declared elsewhere.
resource "aws_kms_key" "data_encryption" {
  description              = "Customer data encryption key"
  deletion_window_in_days  = 30   # Safety: 30-day waiting period before the key is really deleted
  enable_key_rotation      = true # Automatic rotation every 365 days
  multi_region             = false
  key_usage                = "ENCRYPT_DECRYPT"
  customer_master_key_spec = "SYMMETRIC_DEFAULT" # AES-256

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowKeyAdmin"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/SecurityAdmin"
        }
        Action   = ["kms:*"]
        Resource = "*"
      },
      {
        Sid    = "AllowAppEncryptDecrypt"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/AppServiceRole"
        }
        Action = [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:GenerateDataKey",
          "kms:GenerateDataKeyWithoutPlaintext",
          "kms:DescribeKey"
        ]
        Resource = "*"
      }
    ]
  })

  tags = {
    Environment        = "production"
    DataClassification = "restricted"
    ManagedBy          = "terraform"
  }
}

resource "aws_kms_alias" "data_encryption" {
  name          = "alias/customer-data-key"
  target_key_id = aws_kms_key.data_encryption.key_id
}

# CloudWatch alarm for abnormal KMS request volume (possibly an attack).
# Verify the metric name and namespace against the KMS metrics CloudWatch actually publishes.
resource "aws_cloudwatch_metric_alarm" "kms_anomaly" {
  alarm_name          = "kms-request-anomaly"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "NumberOfAPIRequests"
  namespace           = "AWS/KMS"
  period              = 300
  statistic           = "Sum"
  threshold           = 10000 # 10K requests within 5 minutes is abnormal
  dimensions = {
    KeyId = aws_kms_key.data_encryption.key_id
  }
  alarm_actions = [aws_sns_topic.security_alerts.arn]
}
5.3 cert-manager for Kubernetes
# cert-manager-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: [email protected]
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
---
# Automatically issue and renew the TLS certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  commonName: api.company.com
  dnsNames:
    - api.company.com
    # "*.api.company.com" would need a dns01 solver; http01 cannot validate wildcard names
  duration: 2160h    # 90 days
  renewBefore: 720h  # Renew 30 days before expiry
---
# Internal mTLS with a private CA
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  vault:
    path: pki/sign/internal-services
    server: https://vault.vault.svc.cluster.local:8200
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        secretRef:
          name: vault-token
          key: token
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: service-a-mtls
  namespace: production
spec:
  secretName: service-a-mtls-secret
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
  commonName: service-a.production.svc.cluster.local
  usages:
    - server auth
    - client auth    # mTLS: both server and client auth
  duration: 720h     # 30 days (internal certs rotate faster)
  renewBefore: 240h  # Renew 10 days before expiry
5.4 Automated Key Rotation
# CronJob that rotates application-level encryption keys
apiVersion: batch/v1
kind: CronJob
metadata:
  name: key-rotation
  namespace: security
spec:
  schedule: "0 2 1 */3 *"  # Every 3 months, 2 AM on the 1st
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: key-rotation-sa
          containers:
            - name: key-rotation
              image: company/key-rotation:latest
              env:
                - name: VAULT_ADDR
                  value: "https://vault.vault.svc.cluster.local:8200"
                - name: KMS_KEY_ALIAS
                  value: "alias/customer-data-key"
                - name: SLACK_WEBHOOK
                  valueFrom:
                    secretKeyRef:
                      name: slack-webhook
                      key: url
              command:
                - /bin/sh
                - -c
                - |
                  set -e  # stop at the first failing step
                  # 1. Rotate the key in the Vault Transit engine
                  vault write -f transit/keys/payment-key/rotate
                  # 2. Re-encrypt the DEKs with the new key version
                  python3 /scripts/reencrypt-deks.py
                  # 3. Raise min_decryption_version (keep the 3 previous versions, never below 1)
                  LATEST=$(vault read -field=latest_version transit/keys/payment-key)
                  MIN=$((LATEST - 3))
                  if [ "$MIN" -lt 1 ]; then MIN=1; fi
                  vault write transit/keys/payment-key/config min_decryption_version="$MIN"
                  # 4. Report the result
                  curl -X POST "$SLACK_WEBHOOK" \
                    -H 'Content-Type: application/json' \
                    -d '{"text":"Key rotation completed for payment-key"}'
          restartPolicy: OnFailure
5.5 Data Classification Tooling
# Prometheus rules that detect unencrypted access to sensitive data
groups:
  - name: data_classification_alerts
    rules:
      - alert: UnencryptedPIIAccess
        expr: |
          rate(db_query_total{table=~"users|payments|medical_records",
                              encrypted="false"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
          compliance: "gdpr,pci-dss"
        annotations:
          summary: "Unencrypted access to sensitive table {{ $labels.table }}"
          runbook: "https://wiki.internal/runbooks/data-classification"
      - alert: SensitiveDataInLogs
        expr: |
          rate(log_pii_detected_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "PII detected in application logs"
          action: "Scrub logs immediately, check log sanitization filters"
6. Code Examples
6.1 Python: AES-256-GCM Encryption/Decryption
"""
AES-256-GCM encryption/decryption with the envelope encryption pattern.
Use in production only with proper key management (KMS).
"""
import os
import json
import base64
from dataclasses import dataclass
from typing import Optional

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


@dataclass
class EncryptedPayload:
    """Holds the ciphertext plus the metadata needed to decrypt it."""
    ciphertext: bytes
    nonce: bytes           # 12 bytes, unique per encryption
    key_id: str            # ID of the KEK that encrypted this DEK
    encrypted_dek: bytes   # DEK encrypted by the KEK

    def to_json(self) -> str:
        return json.dumps({
            "ciphertext": base64.b64encode(self.ciphertext).decode(),
            "nonce": base64.b64encode(self.nonce).decode(),
            "key_id": self.key_id,
            "encrypted_dek": base64.b64encode(self.encrypted_dek).decode(),
        })

    @classmethod
    def from_json(cls, data: str) -> "EncryptedPayload":
        d = json.loads(data)
        return cls(
            ciphertext=base64.b64decode(d["ciphertext"]),
            nonce=base64.b64decode(d["nonce"]),
            key_id=d["key_id"],
            encrypted_dek=base64.b64decode(d["encrypted_dek"]),
        )


class EnvelopeEncryptor:
    """
    Envelope encryption:
    1. Generate a random DEK
    2. Encrypt the data with the DEK (AES-256-GCM)
    3. Encrypt the DEK with the KEK (from the KMS)
    4. Store the encrypted DEK alongside the ciphertext
    """

    def __init__(self, kms_client):
        """
        kms_client: an object with encrypt_key() and decrypt_key() methods.
        It can be AWS KMS, Vault Transit, or a mock for testing.
        """
        self.kms = kms_client

    def encrypt(self, plaintext: bytes, key_id: str,
                associated_data: Optional[bytes] = None) -> EncryptedPayload:
        # 1. Generate a random DEK (256-bit)
        dek = AESGCM.generate_key(bit_length=256)
        # 2. Generate a random nonce (96-bit, NIST recommended for GCM)
        nonce = os.urandom(12)
        # 3. Encrypt the data with the DEK
        aesgcm = AESGCM(dek)
        ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
        # 4. Encrypt the DEK with the KEK (through the KMS)
        encrypted_dek = self.kms.encrypt_key(dek, key_id)
        # 5. Drop the DEK reference as soon as it is no longer needed
        # (Python does not guarantee secure erase, but this minimizes the window)
        del dek
        return EncryptedPayload(
            ciphertext=ciphertext,
            nonce=nonce,
            key_id=key_id,
            encrypted_dek=encrypted_dek,
        )

    def decrypt(self, payload: EncryptedPayload,
                associated_data: Optional[bytes] = None) -> bytes:
        # 1. Decrypt the DEK through the KMS
        dek = self.kms.decrypt_key(payload.encrypted_dek, payload.key_id)
        # 2. Decrypt the data with the DEK
        aesgcm = AESGCM(dek)
        plaintext = aesgcm.decrypt(payload.nonce, payload.ciphertext,
                                   associated_data)
        del dek
        return plaintext


# === Usage example ===
class MockKMS:
    """Mock KMS for the demo. Production uses AWS KMS or Vault."""

    def __init__(self):
        # KEK -- in production this key lives inside the HSM/KMS
        self._keys = {
            "key-001": AESGCM.generate_key(bit_length=256),
        }

    def encrypt_key(self, dek: bytes, key_id: str) -> bytes:
        kek = self._keys[key_id]
        nonce = os.urandom(12)
        aesgcm = AESGCM(kek)
        # Prefix the nonce so decrypt_key() can split it off again
        return nonce + aesgcm.encrypt(nonce, dek, None)

    def decrypt_key(self, encrypted_dek: bytes, key_id: str) -> bytes:
        kek = self._keys[key_id]
        nonce = encrypted_dek[:12]
        ciphertext = encrypted_dek[12:]
        aesgcm = AESGCM(kek)
        return aesgcm.decrypt(nonce, ciphertext, None)


if __name__ == "__main__":
    kms = MockKMS()
    encryptor = EnvelopeEncryptor(kms)
    # Encrypt PII
    user_data = json.dumps({
        "name": "Nguyen Van Hieu",
        "email": "[email protected]",
        "phone": "0912345678",
        "cccd": "001234567890"
    }).encode()
    # associated_data = context that is not encrypted but is bound to the ciphertext.
    # If associated_data changes, decryption fails --> protects against tampering
    context = b"user_id=12345"
    encrypted = encryptor.encrypt(user_data, "key-001",
                                  associated_data=context)
    print(f"Encrypted payload: {encrypted.to_json()[:100]}...")
    # Decrypt
    decrypted = encryptor.decrypt(encrypted, associated_data=context)
    print(f"Decrypted: {json.loads(decrypted)}")
    # Try a different context --> decryption fails (integrity check)
    try:
        encryptor.decrypt(encrypted, associated_data=b"user_id=99999")
    except InvalidTag:
        print("Tamper detected! Authentication tag did not verify.")
6.2 Vault Integration (Read/Write Secrets)
"""
HashiCorp Vault integration for application secrets and encryption.
Uses the Vault Transit engine for Encryption as a Service.
"""
import base64
import os

import hvac


class VaultClient:
    """Thin Vault client built on hvac.
    Retry and caching are not implemented in this sketch."""

    def __init__(self, vault_addr: str = None, role: str = "app"):
        self.vault_addr = vault_addr or os.getenv("VAULT_ADDR",
                                                  "https://vault:8200")
        self.client = hvac.Client(url=self.vault_addr)
        self._authenticate(role)

    def _authenticate(self, role: str):
        """Authenticate with a Kubernetes ServiceAccount (production)
        or a token (development)."""
        token = os.getenv("VAULT_TOKEN")
        if token:
            self.client.token = token
            return
        # Kubernetes auth
        jwt_path = "/var/run/secrets/kubernetes.io/serviceaccount/token"
        if os.path.exists(jwt_path):
            with open(jwt_path) as f:
                jwt = f.read()
            self.client.auth.kubernetes.login(role=role, jwt=jwt)
        else:
            raise RuntimeError("No Vault authentication method available")

    # --- KV Secrets Engine (static secrets) ---
    def read_secret(self, path: str) -> dict:
        """Read a secret from the KV v2 engine."""
        result = self.client.secrets.kv.v2.read_secret_version(path=path)
        return result["data"]["data"]

    def write_secret(self, path: str, data: dict):
        """Write a secret to the KV v2 engine (versioned)."""
        self.client.secrets.kv.v2.create_or_update_secret(
            path=path, secret=data
        )

    # --- Transit Engine (Encryption as a Service) ---
    # Note: Vault uses `context` for key derivation, so the transit key must
    # be created with derived=true for the context binding to take effect.
    def encrypt(self, key_name: str, plaintext: bytes,
                context: bytes = None) -> str:
        """Encrypt data through the Vault Transit engine.
        The key NEVER leaves Vault; only the ciphertext is returned."""
        b64_plaintext = base64.b64encode(plaintext).decode()
        params = {"plaintext": b64_plaintext}
        if context:
            params["context"] = base64.b64encode(context).decode()
        result = self.client.secrets.transit.encrypt_data(
            name=key_name, **params
        )
        return result["data"]["ciphertext"]  # "vault:v1:base64..."

    def decrypt(self, key_name: str, ciphertext: str,
                context: bytes = None) -> bytes:
        """Decrypt data through the Vault Transit engine."""
        params = {"ciphertext": ciphertext}
        if context:
            params["context"] = base64.b64encode(context).decode()
        result = self.client.secrets.transit.decrypt_data(
            name=key_name, **params
        )
        return base64.b64decode(result["data"]["plaintext"])

    def rewrap(self, key_name: str, ciphertext: str,
               context: bytes = None) -> str:
        """Re-encrypt with the latest key version (key rotation).
        Vault decrypts with the old key and re-encrypts with the new one.
        The plaintext NEVER leaves Vault."""
        params = {"ciphertext": ciphertext}
        if context:
            params["context"] = base64.b64encode(context).decode()
        result = self.client.secrets.transit.rewrap_data(
            name=key_name, **params
        )
        return result["data"]["ciphertext"]

    # --- Dynamic Database Credentials ---
    def get_db_credentials(self, role: str = "app-readonly") -> dict:
        """Fetch dynamic database credentials (revoked automatically after the TTL)."""
        result = self.client.secrets.database.generate_credentials(
            name=role
        )
        return {
            "username": result["data"]["username"],
            "password": result["data"]["password"],
            "ttl": result["lease_duration"],
            "lease_id": result["lease_id"],
        }


# === Usage example ===
if __name__ == "__main__":
    vault = VaultClient()

    # 1. Read the database password from Vault (static secret)
    db_config = vault.read_secret("database/production")
    print(f"DB Host: {db_config['host']}")

    # 2. Encrypt PII through the Transit engine
    pii = b'{"email": "[email protected]", "phone": "0912345678"}'
    ciphertext = vault.encrypt("pii-key", pii,
                               context=b"user_id=12345")
    print(f"Ciphertext: {ciphertext}")

    # 3. Decrypt
    plaintext = vault.decrypt("pii-key", ciphertext,
                              context=b"user_id=12345")
    print(f"Plaintext: {plaintext.decode()}")

    # 4. Key rotation: rewrap the existing ciphertext with the new key
    new_ciphertext = vault.rewrap("pii-key", ciphertext,
                                  context=b"user_id=12345")
    print(f"Rewrapped: {new_ciphertext}")

    # 5. Dynamic DB credentials (expire automatically)
    creds = vault.get_db_credentials("app-readonly")
    print(f"Dynamic DB user: {creds['username']}, TTL: {creds['ttl']}s")

6.3 TLS Certificate Generation Script
#!/bin/bash
# generate-internal-certs.sh
# Creates an internal CA and service certificates for mTLS.
# For development/staging only. Production uses cert-manager + Vault PKI.
set -euo pipefail

CERTS_DIR="./certs"
CA_DAYS=3650            # CA valid for 10 years
CERT_DAYS=365           # Service certs valid for 1 year
KEY_SIZE=4096           # RSA key size for the CA
EC_CURVE="prime256v1"   # ECDSA for service certs (faster than RSA)

mkdir -p "${CERTS_DIR}"

echo "=== 1. Create Root CA ==="
openssl genrsa -out "${CERTS_DIR}/ca-key.pem" "${KEY_SIZE}"
openssl req -new -x509 \
    -key "${CERTS_DIR}/ca-key.pem" \
    -out "${CERTS_DIR}/ca-cert.pem" \
    -days "${CA_DAYS}" \
    -subj "/C=VN/ST=HCM/O=Company/OU=Security/CN=Internal Root CA" \
    -addext "basicConstraints=critical,CA:TRUE" \
    -addext "keyUsage=critical,keyCertSign,cRLSign"

echo "=== 2. Create service certificates (ECDSA) ==="
generate_service_cert() {
    local SERVICE_NAME=$1
    local SANS=$2   # Subject Alternative Names
    echo "--- Generating cert for ${SERVICE_NAME} ---"

    # Generate an ECDSA private key (faster than RSA, with smaller keys)
    openssl ecparam -genkey -name "${EC_CURVE}" \
        -out "${CERTS_DIR}/${SERVICE_NAME}-key.pem"

    # Create the CSR
    openssl req -new \
        -key "${CERTS_DIR}/${SERVICE_NAME}-key.pem" \
        -out "${CERTS_DIR}/${SERVICE_NAME}.csr" \
        -subj "/C=VN/ST=HCM/O=Company/OU=${SERVICE_NAME}/CN=${SERVICE_NAME}"

    # Sign with the CA and add the SANs.
    # keyEncipherment applies to RSA key transport, so an ECDSA certificate
    # only needs digitalSignature.
    openssl x509 -req \
        -in "${CERTS_DIR}/${SERVICE_NAME}.csr" \
        -CA "${CERTS_DIR}/ca-cert.pem" \
        -CAkey "${CERTS_DIR}/ca-key.pem" \
        -CAcreateserial \
        -out "${CERTS_DIR}/${SERVICE_NAME}-cert.pem" \
        -days "${CERT_DAYS}" \
        -extfile <(cat <<EOF
subjectAltName=${SANS}
keyUsage=critical,digitalSignature
extendedKeyUsage=serverAuth,clientAuth
EOF
)

    # Remove the CSR (no need to keep it)
    rm -f "${CERTS_DIR}/${SERVICE_NAME}.csr"

    # Verify the certificate against the CA
    openssl verify -CAfile "${CERTS_DIR}/ca-cert.pem" \
        "${CERTS_DIR}/${SERVICE_NAME}-cert.pem"
    echo "--- ${SERVICE_NAME} cert OK ---"
}

# Create certs for the services
generate_service_cert "api-gateway" \
    "DNS:api-gateway,DNS:api-gateway.production.svc.cluster.local,DNS:localhost,IP:127.0.0.1"
generate_service_cert "payment-service" \
    "DNS:payment-service,DNS:payment-service.production.svc.cluster.local"
generate_service_cert "user-service" \
    "DNS:user-service,DNS:user-service.production.svc.cluster.local"

echo "=== 3. Create Kubernetes Secrets ==="
echo "Run these commands to create K8s secrets:"
for service in api-gateway payment-service user-service; do
    echo "kubectl create secret tls ${service}-tls \\"
    echo "  --cert=${CERTS_DIR}/${service}-cert.pem \\"
    echo "  --key=${CERTS_DIR}/${service}-key.pem \\"
    echo "  -n production"
    echo ""
done

echo "=== 4. Certificate Info ==="
for cert in "${CERTS_DIR}"/*-cert.pem; do
    echo "--- $(basename "${cert}") ---"
    openssl x509 -in "${cert}" -noout -subject -dates -ext subjectAltName
    echo ""
done

echo "=== Done! ==="
echo "IMPORTANT: In production, use cert-manager + Vault PKI instead of this script."
echo "The CA private key (${CERTS_DIR}/ca-key.pem) must be protected very carefully!"

6.4 Field-level Encryption in PostgreSQL
"""
Field-level encryption for PostgreSQL.
PII fields are encrypted before they are written to the database.
The database only ever sees ciphertext, so a DBA cannot read those fields.
"""
import hashlib
from typing import Optional

import asyncpg

from vault_client import VaultClient  # from section 6.2


class SecureUserRepository:
    """Repository pattern with field-level encryption for PII.

    VaultClient is synchronous, so every encrypt/decrypt call below blocks
    the event loop for the duration of the Vault round trip.
    """

    # Fields that must be encrypted (Restricted classification)
    ENCRYPTED_FIELDS = {"email", "phone", "cccd", "address"}
    # Fields that must stay searchable (stored with a blind index)
    SEARCHABLE_ENCRYPTED_FIELDS = {"email", "phone"}

    def __init__(self, db_pool: asyncpg.Pool, vault: VaultClient,
                 transit_key: str = "user-pii-key"):
        self.db = db_pool
        self.vault = vault
        self.transit_key = transit_key
        # MAC key for the blind index (enables search on encrypted fields)
        self._hmac_key = vault.read_secret(
            "secrets/blind-index-key"
        )["key"].encode()

    def _blind_index(self, field_name: str, value: str) -> str:
        """Build a blind index so an encrypted field supports exact-match search.
        Keyed BLAKE2b over "field_name:value" gives a deterministic digest
        that cannot be reversed without the key."""
        return hashlib.blake2b(
            f"{field_name}:{value.lower().strip()}".encode(),
            key=self._hmac_key,
            digest_size=32
        ).hexdigest()

    def _encrypt_field(self, value: str, user_id: str) -> str:
        """Encrypt one field with context = user_id (guards against swap attacks)."""
        return self.vault.encrypt(
            self.transit_key,
            value.encode(),
            context=f"user:{user_id}".encode()
        )

    def _decrypt_field(self, ciphertext: str, user_id: str) -> str:
        """Decrypt one field."""
        return self.vault.decrypt(
            self.transit_key,
            ciphertext,
            context=f"user:{user_id}".encode()
        ).decode()

    async def create_user(self, user_id: str, data: dict) -> None:
        """Create a user with encrypted PII."""
        encrypted_data = {}
        blind_indexes = {}
        for key, value in data.items():
            if key in self.ENCRYPTED_FIELDS and value:
                encrypted_data[key] = self._encrypt_field(value, user_id)
                if key in self.SEARCHABLE_ENCRYPTED_FIELDS:
                    blind_indexes[f"{key}_idx"] = self._blind_index(key, value)
            else:
                encrypted_data[key] = value
        await self.db.execute("""
            INSERT INTO users (
                id, name, email_encrypted, phone_encrypted,
                cccd_encrypted, address_encrypted,
                email_idx, phone_idx,
                created_at
            ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, NOW())
        """,
            user_id,
            encrypted_data.get("name"),  # name = Internal, not encrypted
            encrypted_data.get("email"),
            encrypted_data.get("phone"),
            encrypted_data.get("cccd"),
            encrypted_data.get("address"),
            blind_indexes.get("email_idx"),
            blind_indexes.get("phone_idx"),
        )

    async def get_user(self, user_id: str) -> Optional[dict]:
        """Read a user and decrypt the PII."""
        row = await self.db.fetchrow(
            "SELECT * FROM users WHERE id = $1", user_id
        )
        if not row:
            return None

        def decrypt_or_none(column: str) -> Optional[str]:
            # phone, cccd and address are nullable in the schema,
            # so skip decryption when the column is NULL
            value = row[column]
            if value is None:
                return None
            return self._decrypt_field(value, user_id)

        return {
            "id": row["id"],
            "name": row["name"],
            "email": decrypt_or_none("email_encrypted"),
            "phone": decrypt_or_none("phone_encrypted"),
            "cccd": decrypt_or_none("cccd_encrypted"),
            "address": decrypt_or_none("address_encrypted"),
            "created_at": str(row["created_at"]),
        }

    async def find_by_email(self, email: str) -> Optional[dict]:
        """Find a user by email via the blind index, without decrypting every row."""
        email_idx = self._blind_index("email", email)
        row = await self.db.fetchrow(
            "SELECT id FROM users WHERE email_idx = $1", email_idx
        )
        if not row:
            return None
        return await self.get_user(row["id"])

    async def gdpr_erase_user(self, user_id: str) -> None:
        """GDPR right to erasure.

        Option 1 (simple): delete the record outright.
        Option 2 (crypto-shredding): delete the user's DEK so the data becomes garbage.
        Use Option 1 for database records.
        Use Option 2 for data in backups/logs, which cannot be deleted directly.
        """
        # Delete and log in one transaction so the audit trail cannot
        # drift from what actually happened.
        async with self.db.acquire() as conn:
            async with conn.transaction():
                # Option 1: delete from the database
                await conn.execute("DELETE FROM users WHERE id = $1", user_id)
                # Option 2 needs per-user keys instead of the shared transit key.
                # delete_key is a hypothetical helper, not defined in section 6.2:
                # self.vault.delete_key(f"user-{user_id}-key")
                # Log the erasure for the compliance audit. Only the method
                # that actually ran is recorded.
                await conn.execute("""
                    INSERT INTO gdpr_erasure_log (user_id, erased_at, method)
                    VALUES ($1, NOW(), 'direct_delete')
                """, user_id)


# SQL schema
CREATE_TABLE_SQL = """
CREATE TABLE users (
    id VARCHAR(36) PRIMARY KEY,
    name VARCHAR(255),               -- Internal: not encrypted
    email_encrypted TEXT NOT NULL,   -- Restricted: encrypted
    phone_encrypted TEXT,            -- Restricted: encrypted
    cccd_encrypted TEXT,             -- Restricted: encrypted
    address_encrypted TEXT,          -- Restricted: encrypted
    email_idx VARCHAR(64),           -- Blind index for search
    phone_idx VARCHAR(64),           -- Blind index for search
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ
);
CREATE INDEX idx_users_email ON users(email_idx);
CREATE INDEX idx_users_phone ON users(phone_idx);

-- Audit table for GDPR compliance
CREATE TABLE gdpr_erasure_log (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(36) NOT NULL,
    erased_at TIMESTAMPTZ NOT NULL,
    method VARCHAR(100) NOT NULL
);
"""

6.5 Field-level Encryption in MongoDB (CSFLE)
"""
MongoDB Client-Side Field Level Encryption (CSFLE).
The MongoDB driver encrypts and decrypts automatically, transparently to
application code. Automatic encryption requires MongoDB Enterprise or Atlas.
"""
import os

from bson.binary import STANDARD
from bson.codec_options import CodecOptions
from pymongo import MongoClient
from pymongo.encryption import Algorithm, ClientEncryption
from pymongo.encryption_options import AutoEncryptionOpts


def setup_mongodb_csfle():
    """Set up MongoDB CSFLE with AWS KMS."""
    # KMS provider config
    kms_providers = {
        "aws": {
            "accessKeyId": os.getenv("AWS_ACCESS_KEY_ID"),
            "secretAccessKey": os.getenv("AWS_SECRET_ACCESS_KEY"),
        }
    }
    # Master key config (CMK in AWS KMS)
    master_key = {
        "region": "ap-southeast-1",
        "key": os.getenv("AWS_KMS_KEY_ARN"),  # ARN of the CMK
    }

    # Create the Data Encryption Key (DEK). Do this only once, for example in
    # a provisioning step: every call to create_data_key() makes a new key.
    key_vault_namespace = "encryption.__keyVault"
    key_vault_client = MongoClient(os.getenv("MONGODB_URI"))
    client_encryption = ClientEncryption(
        kms_providers=kms_providers,
        key_vault_namespace=key_vault_namespace,
        key_vault_client=key_vault_client,
        codec_options=CodecOptions(uuid_representation=STANDARD),
    )
    # The DEK is stored in the key vault collection, encrypted by the AWS KMS CMK
    data_key_id = client_encryption.create_data_key(
        "aws", master_key=master_key, key_alt_names=["user-pii-key"]
    )

    # Schema map: defines which fields are encrypted and with which algorithm
    json_schema = {
        "bsonType": "object",
        "encryptMetadata": {"keyId": [data_key_id]},
        "properties": {
            "name": {"bsonType": "string"},  # Not encrypted
            "email": {
                "encrypt": {
                    "bsonType": "string",
                    # Deterministic: allows exact-match queries
                    "algorithm": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
                }
            },
            "phone": {
                "encrypt": {
                    "bsonType": "string",
                    # Random: more secure, but not queryable
                    "algorithm": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
                }
            },
            "cccd": {
                "encrypt": {
                    "bsonType": "string",
                    "algorithm": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
                }
            },
            "medical_history": {
                "encrypt": {
                    "bsonType": "object",
                    "algorithm": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
                }
            },
        },
    }
    schema_map = {"mydb.users": json_schema}

    # Create the auto-encrypting client
    auto_encryption_opts = AutoEncryptionOpts(
        kms_providers=kms_providers,
        key_vault_namespace=key_vault_namespace,
        schema_map=schema_map,
        # mongocryptd or the crypt_shared library
        crypt_shared_lib_path="/usr/lib/mongo_crypt_v1.so",
    )
    # This client AUTOMATICALLY encrypts on write and decrypts on read
    secure_client = MongoClient(
        os.getenv("MONGODB_URI"),
        auto_encryption_opts=auto_encryption_opts,
    )
    return secure_client


if __name__ == "__main__":
    client = setup_mongodb_csfle()
    db = client["mydb"]
    users = db["users"]

    # Insert: email, phone and cccd are AUTOMATICALLY encrypted before
    # they are sent to MongoDB
    users.insert_one({
        "name": "Nguyen Van Hieu",       # Plaintext (not encrypted)
        "email": "[email protected]",      # Auto-encrypted (deterministic)
        "phone": "0912345678",           # Auto-encrypted (random)
        "cccd": "001234567890",          # Auto-encrypted (random)
    })

    # Find by email (deterministic encryption allows exact match)
    user = users.find_one({"email": "[email protected]"})
    print(f"Found: {user['name']}, {user['email']}")
    # Output: Found: Nguyen Van Hieu, [email protected]
    # (decrypted automatically!)

    # Connecting with a client that does NOT have auto-encryption:
    plain_client = MongoClient(os.getenv("MONGODB_URI"))
    raw = plain_client["mydb"]["users"].find_one({"name": "Nguyen Van Hieu"})
    print(f"Raw email: {raw['email']}")
    # Output: Raw email: Binary(b'\x06\x...', 6)  <-- ciphertext, unreadable!

7. Mermaid Diagrams
7.1 Envelope Encryption Flow
sequenceDiagram
    participant App as Application
    participant KMS as KMS (AWS/Vault)
    participant Store as Database/S3
    Note over App,Store: === ENCRYPTION FLOW ===
    App->>KMS: GenerateDataKey(KeyId="master-key")
    KMS-->>App: {PlaintextDEK, EncryptedDEK}
    Note over App: Encrypt data with PlaintextDEK (AES-256-GCM)
    App->>App: ciphertext = AES-GCM(PlaintextDEK, plaintext)
    App->>App: Erase PlaintextDEK from memory
    App->>Store: Store {ciphertext + EncryptedDEK + nonce}
    Note over App,Store: === DECRYPTION FLOW ===
    App->>Store: Read {ciphertext + EncryptedDEK + nonce}
    Store-->>App: {ciphertext, EncryptedDEK, nonce}
    App->>KMS: Decrypt(EncryptedDEK)
    KMS-->>App: PlaintextDEK
    App->>App: plaintext = AES-GCM-Decrypt(PlaintextDEK, ciphertext)
    App->>App: Erase PlaintextDEK from memory
    Note over App,Store: === KEY ROTATION ===
    App->>Store: Read EncryptedDEK (encrypted by KEK-v1)
    App->>KMS: ReEncrypt(EncryptedDEK, NewKeyId="KEK-v2")
    KMS-->>App: NewEncryptedDEK (encrypted by KEK-v2)
    App->>Store: Update EncryptedDEK (ciphertext does NOT change!)
    Note over App: The data does NOT need re-encrypting!<br/>Only the DEK (a few bytes) is re-encrypted
7.2 KMS Architecture
flowchart TB
    subgraph "Applications"
        A1[Payment Service]
        A2[User Service]
        A3[Analytics Service]
    end
    subgraph "Key Management Layer"
        direction TB
        V[HashiCorp Vault Cluster]
        V --> VT[Transit Engine<br/>Encryption as a Service]
        V --> VK[KV Engine<br/>Static Secrets]
        V --> VP[PKI Engine<br/>Certificate Authority]
        V --> VD[Database Engine<br/>Dynamic Credentials]
    end
    subgraph "Cloud KMS"
        KMS[AWS KMS]
        KMS --> CMK1["CMK: vault-unseal<br/>(Auto-unseal Vault)"]
        KMS --> CMK2["CMK: s3-encryption<br/>(S3 SSE-KMS)"]
        KMS --> CMK3["CMK: rds-encryption<br/>(RDS storage encryption)"]
    end
    subgraph "HSM Layer"
        HSM[CloudHSM Cluster]
        HSM --> HSMK1["Root CA Key<br/>(never extracted)"]
        HSM --> HSMK2["Master Signing Key"]
    end
    subgraph "Storage"
        S3[S3 Buckets<br/>SSE-KMS encrypted]
        RDS[RDS PostgreSQL<br/>Encrypted at rest]
        Mongo[MongoDB Atlas<br/>CSFLE enabled]
    end
    A1 --> VT
    A1 --> VD
    A2 --> VT
    A2 --> VK
    A3 --> KMS
    V --> KMS
    KMS --> HSM
    VP --> HSM
    CMK2 --> S3
    CMK3 --> RDS
    A2 --> Mongo
    style HSM fill:#ff6b6b,stroke:#333,stroke-width:2px,color:#fff
    style V fill:#7950f2,stroke:#333,stroke-width:2px,color:#fff
    style KMS fill:#f9a825,stroke:#333,stroke-width:2px
7.3 Data Classification Decision Tree
flowchart TD
    Start["What is this data?"] --> Q1{"Contains personally<br/>identifiable information<br/>(PII/PHI)?"}
    Q1 -->|Yes| Q2{"Which kind of PII?"}
    Q1 -->|No| Q3{"Internal or<br/>public data?"}
    Q2 -->|"Credit card,<br/>medical, biometric"| R["RESTRICTED<br/>🔴"]
    Q2 -->|"Email, phone,<br/>name, address"| C["CONFIDENTIAL<br/>🟠"]
    Q3 -->|Public| PUB["PUBLIC<br/>🟢"]
    Q3 -->|Internal| INT["INTERNAL<br/>🟡"]
    R --> R_ACT["Actions:<br/>- Field-level encryption<br/>- Audit every access<br/>- Key per tenant<br/>- Tokenization<br/>- PCI-DSS/HIPAA compliance<br/>- 7-year audit retention"]
    C --> C_ACT["Actions:<br/>- Encrypt at rest + transit<br/>- Column-level encryption<br/>- Access control (RBAC)<br/>- Audit log<br/>- GDPR compliance<br/>- Data masking for non-prod"]
    INT --> INT_ACT["Actions:<br/>- Encrypt in transit (TLS)<br/>- Basic access control<br/>- Standard logging"]
    PUB --> PUB_ACT["Actions:<br/>- Integrity check (signing)<br/>- CDN caching OK<br/>- No encryption needed"]
    style R fill:#ff6b6b,stroke:#333,color:#fff
    style C fill:#ff922b,stroke:#333,color:#fff
    style INT fill:#ffd43b,stroke:#333
    style PUB fill:#51cf66,stroke:#333
8. Aha Moments & Pitfalls
Aha Moments
#1 - Encrypting Everything vs Encrypting Smart: Encrypting the whole database with TDE is simple, but it does not stop a DBA from reading the data. Field-level encryption takes more work but protects better. Classify the data first, then pick the encryption level that fits.
#2 - Key management is harder than encryption: AES-256 is a “solved problem”: every library ships it. But who holds the key? Where is it stored? How is it rotated? How is it revoked? How is it backed up? That is 90% of the difficulty of encryption. Encryption without key management is no encryption at all.
#3 - GDPR Right to Erasure with Encrypted Data: There is no need to delete every record from every backup, log, and Kafka topic. Deleting the encryption key (crypto-shredding) is enough, provided that user's data is encrypted under its own key. The data still exists but can never be read again. This is the most elegant answer to the “right to be forgotten” in a complex system.
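Crypto-shredding can be sketched in a few lines. The example below is a minimal illustration, not a production design: `PerUserKeyStore` and the two helper functions are hypothetical names, the key store is an in-memory dict standing in for a KMS or Vault, and the `cryptography` package is assumed to be installed as in section 6.1.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class PerUserKeyStore:
    """Hypothetical in-memory key store holding one DEK per user."""

    def __init__(self):
        self._deks = {}

    def get_or_create(self, user_id: str) -> bytes:
        if user_id not in self._deks:
            self._deks[user_id] = AESGCM.generate_key(bit_length=256)
        return self._deks[user_id]

    def get(self, user_id: str) -> bytes:
        return self._deks[user_id]  # raises KeyError once the key is shredded

    def shred(self, user_id: str) -> None:
        self._deks.pop(user_id, None)


def encrypt_for_user(store: PerUserKeyStore, user_id: str, plaintext: bytes) -> bytes:
    nonce = os.urandom(12)
    key = store.get_or_create(user_id)
    # user_id is bound as associated data so blobs cannot be swapped between users
    return nonce + AESGCM(key).encrypt(nonce, plaintext, user_id.encode())


def decrypt_for_user(store: PerUserKeyStore, user_id: str, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(store.get(user_id)).decrypt(nonce, ciphertext, user_id.encode())


store = PerUserKeyStore()
record = encrypt_for_user(store, "user-12345", b'{"phone": "0912345678"}')
backup_copy = bytes(record)  # the same ciphertext sitting in a backup
roundtrip = decrypt_for_user(store, "user-12345", backup_copy)

store.shred("user-12345")  # the erasure request: delete only the key
try:
    decrypt_for_user(store, "user-12345", backup_copy)
    readable_after_shred = True
except KeyError:
    readable_after_shred = False

print(roundtrip, readable_after_shred)
```

The backup copy is never touched, yet it becomes unreadable once the key is gone. In a real system the key itself must not survive in key backups, otherwise the shredding can be undone.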
#4 - Audit logs outgrow the data: In compliance-heavy systems the audit log is often 5-10 times larger than the primary data. Include it in storage estimates and plan a tiered storage strategy (hot → warm → cold → archive).
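A quick back-of-the-envelope calculation makes the point concrete. Only the 5-10x multiplier comes from the note above; the 2 TB of primary data and the share of the log kept in each tier are made-up assumptions for illustration.

```python
# Assumption: 2 TB of primary data (hypothetical figure)
PRIMARY_DATA_GB = 2_000
# From the note above: audit logs are often 5-10x the primary data
AUDIT_MULTIPLIER_RANGE = (5, 10)
# Assumption: share of the audit log kept in each tier, in percent
TIER_SHARE_PERCENT = {"hot": 5, "warm": 15, "cold": 30, "archive": 50}


def audit_storage_plan(primary_gb: int, multiplier: int,
                       tier_share_percent: dict) -> dict:
    """Split the estimated audit log volume (in GB) across storage tiers."""
    total_gb = primary_gb * multiplier
    return {tier: total_gb * share // 100
            for tier, share in tier_share_percent.items()}


low = audit_storage_plan(PRIMARY_DATA_GB, AUDIT_MULTIPLIER_RANGE[0], TIER_SHARE_PERCENT)
high = audit_storage_plan(PRIMARY_DATA_GB, AUDIT_MULTIPLIER_RANGE[1], TIER_SHARE_PERCENT)

for tier in TIER_SHARE_PERCENT:
    print(f"{tier:>8}: {low[tier]:>6,} to {high[tier]:>6,} GB")
print(f"   total: {sum(low.values()):>6,} to {sum(high.values()):>6,} GB")
```

Under these assumptions the audit log needs 10 to 20 TB against 2 TB of primary data, with most of it sitting in the cheaper cold and archive tiers.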
#5 - Tokenization shrinks PCI scope: Encrypted credit card numbers are still in PCI scope. Tokenization takes the card data out of scope entirely for every system that only handles tokens. Only the Token Vault has to be PCI compliant. This cuts audit and compliance cost significantly.
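The idea can be sketched as follows. `TokenVault` here is a hypothetical in-memory class; in a real deployment it would be a separate, hardened service, and it would be the only component that ever stores the card number (PAN). The card number used is the widely known Visa test number.

```python
import secrets


class TokenVault:
    """Hypothetical token vault: the only place where the real PAN is stored."""

    def __init__(self):
        self._token_to_pan = {}
        self._pan_to_token = {}

    def tokenize(self, pan: str) -> str:
        # Return the existing token so one card always maps to one token
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        # The token is random: it has no mathematical relation to the PAN
        token = "tok_" + secrets.token_urlsafe(16)
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_pan[token]


token_vault = TokenVault()
pan = "4111111111111111"  # well-known test card number
token = token_vault.tokenize(pan)

# What the order service stores: a token and the last four digits only
order = {"order_id": "ord-1", "card_token": token, "card_last4": pan[-4:]}
print(order)

# Only the payment path, talking to the vault, can recover the PAN
recovered = token_vault.detokenize(order["card_token"])
```

Unlike ciphertext, a token cannot be reversed with any key, which is why systems that hold only tokens can be argued out of PCI scope.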
Pitfalls
Pitfall #1 - Encrypting everything with a single key: One key for the whole database. If the key leaks, all the data leaks. Use envelope encryption with a DEK per record or per tenant.
Pitfall #2 - Storing the encryption key next to the encrypted data: “I keep the key in a config file on the same server as the database.” An attacker who takes the server gets both the data and the key. The key MUST live separately (KMS/Vault/HSM).
Pitfall #3 - Forgetting to encrypt backups: Production is encrypted properly, but the backup files on S3 are not. An attacker only needs access to the backups to get all the data in plaintext. Encrypt every backup, and use a backup encryption key that differs from the production key.
Pitfall #4 - Key rotation that deletes the old key: The key is rotated and the old key is deleted immediately. Everything encrypted with the old key can no longer be decrypted. Always keep old keys for at least the retention period of the data. AWS KMS automatically keeps all key versions.
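A minimal sketch of rotation that keeps old versions is shown below. `VersionedKeyring` is a hypothetical in-memory class, not the AWS KMS or Vault API; it tags every ciphertext with the key version that produced it, which is the same idea behind Vault's `vault:v1:` prefix. It assumes the `cryptography` package from section 6.1.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class VersionedKeyring:
    """Hypothetical keyring: rotation adds a version, it never replaces one."""

    def __init__(self):
        self._versions = {1: AESGCM.generate_key(bit_length=256)}
        self.current = 1

    def rotate(self) -> None:
        self.current += 1
        self._versions[self.current] = AESGCM.generate_key(bit_length=256)

    def retire(self, version: int) -> None:
        # Only safe once no stored ciphertext uses this version any more
        del self._versions[version]

    def encrypt(self, plaintext: bytes) -> bytes:
        version = bytes([self.current])  # one byte is enough for this sketch
        nonce = os.urandom(12)
        # The version byte is authenticated as associated data
        ct = AESGCM(self._versions[self.current]).encrypt(nonce, plaintext, version)
        return version + nonce + ct

    def decrypt(self, blob: bytes) -> bytes:
        version, nonce, ct = blob[0], blob[1:13], blob[13:]
        return AESGCM(self._versions[version]).decrypt(nonce, ct, bytes([version]))


keyring = VersionedKeyring()
old_blob = keyring.encrypt(b"written before rotation")
keyring.rotate()
new_blob = keyring.encrypt(b"written after rotation")

# Old data stays readable because version 1 was kept
old_plaintext = keyring.decrypt(old_blob)

# The pitfall: retiring version 1 while old_blob still exists
keyring.retire(1)
try:
    keyring.decrypt(old_blob)
    old_data_lost = False
except KeyError:
    old_data_lost = True

print(old_plaintext, old_data_lost)
```

New writes always use the current version, while reads look up whichever version the ciphertext names. A version should only be retired after everything encrypted under it has been re-encrypted or has passed its retention period.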
Pitfall #5 - Using ECB mode: AES-ECB uses no IV, so identical plaintext blocks produce identical ciphertext blocks. Patterns in the data remain visible (the famous example is the “ECB penguin”). Always use GCM or CTR+HMAC.
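The leak is easy to reproduce. The sketch below, which assumes the `cryptography` package from section 6.1, encrypts two identical 16-byte blocks under ECB and under GCM and compares the first two ciphertext blocks. The sample plaintext is arbitrary.

```python
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = os.urandom(32)
block = b"ATTACK AT DAWN!!"  # exactly 16 bytes = one AES block
plaintext = block * 2        # two identical plaintext blocks

# ECB: every block is encrypted independently with the same key
encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ecb_ciphertext = encryptor.update(plaintext) + encryptor.finalize()
ecb_blocks_match = ecb_ciphertext[:16] == ecb_ciphertext[16:32]

# GCM: a fresh nonce and a per-block counter hide the repetition
gcm_ciphertext = AESGCM(key).encrypt(os.urandom(12), plaintext, None)
gcm_blocks_match = gcm_ciphertext[:16] == gcm_ciphertext[16:32]

print(f"ECB repeats blocks: {ecb_blocks_match}")
print(f"GCM repeats blocks: {gcm_blocks_match}")
```

Under ECB the repetition in the plaintext is visible in the ciphertext, which is exactly what makes the penguin recognizable. GCM also appends a 16-byte authentication tag, so tampering is detected as well.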
Pitfall #6 - Never testing a restore from an encrypted backup: The team configures backup encryption but never tests a restore. When a real restore is needed, they discover that the key was rotated and the old version deleted. Test restores regularly, at least once a quarter.
Pitfall #7 - Logs containing PII: The application log writes
INFO: User [email protected] logged in from 192.168.1.1
The email is PII, and so is the IP address. If the log is accessed, that is a data breach. Sanitize PII in logs, or encrypt log entries that contain PII.
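One way to sanitize logs is a `logging.Filter` that redacts known PII patterns before a record is written. The sketch below uses only the standard library; the two regular expressions are simplified assumptions that cover plain email addresses and IPv4 addresses, not every PII format, and `hieu@example.com` is a placeholder address.

```python
import logging
import re

# Simplified patterns (assumption): plain emails and dotted IPv4 addresses
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_pii(text: str) -> str:
    text = EMAIL_RE.sub("[EMAIL]", text)
    return IPV4_RE.sub("[IP]", text)


class PiiRedactingFilter(logging.Filter):
    """Rewrites the final log message so PII never reaches the handler output."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, then redact, so PII passed via args is covered too
        record.msg = redact_pii(record.getMessage())
        record.args = None
        return True


class ListHandler(logging.Handler):
    """Collects formatted lines in memory so the demo output can be inspected."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


logger = logging.getLogger("pii_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
handler.addFilter(PiiRedactingFilter())
logger.addHandler(handler)

logger.info("User %s logged in from %s", "hieu@example.com", "192.168.1.1")
print(handler.lines[0])
```

Attaching the filter to the handler means every record that handler writes is redacted, regardless of which module logged it. Pattern-based redaction is a safety net; not logging PII in the first place remains the stronger control.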
Pitfall #8 - GDPR right to erasure for backups: A user asks for their data to be deleted. The team deletes it from the database but forgets that the data still sits in 30 backups. With crypto-shredding (a per-user DEK), deleting the key is enough. Without per-user keys, every backup has to be restored and rewritten.
9. Internal Links & References
Prerequisite
- Tuan-14-AuthN-AuthZ-Security - Authentication & Authorization (AuthN/AuthZ must be understood before encryption)
Directly related
- Tuan-02-Back-of-the-envelope - Estimation framework (used for encryption overhead calculations)
- Tuan-07-Database-Sharding-Replication - Database encryption considerations when sharding
- Tuan-11-Microservices-Pattern - mTLS between services, secret management
- Tuan-12-CICD-Pipeline - Secret injection in CI/CD, cert rotation automation
- Tuan-13-Monitoring-Observability - Audit logging, security monitoring, compliance dashboards
Will build on this material
- Tuan-16-Design-URL-Shortener - Data encryption for URL metadata
- Tuan-17-Design-Chat-System - End-to-end encryption, message encryption at rest
- Tuan-19-Design-Notification-System - PII handling in notifications
- Tuan-20-Design-Key-Value-Store - Encryption at rest for distributed storage
References
- Alex Xu, System Design Interview - Chapter 9: Design a Web Crawler (HTTPS/TLS), Chapter 12: Design a Chat System (E2E encryption)
- NIST SP 800-57: Recommendation for Key Management
- OWASP Top 10 (2021) A02: Cryptographic Failures
- AWS Well-Architected Framework — Security Pillar: Data Protection
- GDPR Articles 5, 17, 20, 25, 32, 33, 34
- PCI-DSS v4.0 Requirements 3, 4, 10
- HashiCorp Vault Documentation: Transit Secrets Engine
- MongoDB Client-Side Field Level Encryption Documentation
Previous week: Tuan-14-AuthN-AuthZ-Security - Authentication & Authorization
Next week: Tuan-16-Design-URL-Shortener - Applying all of this material to a real-world problem