主题
容器编排与Kubernetes实战
课程目标
通过本课程的学习,你将能够:
- 将Docker应用平滑迁移到Kubernetes
- 构建完整的微服务应用系统
- 实现蓝绿发布和金丝雀发布
- 构建可观测性体系(监控+日志)
- 搭建K8s CI/CD流水线
- 排查生产环境常见故障
前置要求:已完成K8s基础篇和进阶篇所有课程
课程概述
本课程包含6个实战案例,每个案例都有明确的目标和成果:
| 案例 | 主题 | 目标 | 预计时间 |
|---|---|---|---|
| 案例1 | 从Docker到K8s迁移 | 掌握K8s基础部署技能 | 2小时 |
| 案例2 | 构建微服务应用 | 掌握多服务协同部署 | 3小时 |
| 案例3 | 蓝绿/金丝雀发布 | 掌握零停机发布策略 | 2小时 |
| 案例4 | 构建可观测性体系 | 掌握监控和日志方案 | 3小时 |
| 案例5 | K8s CI/CD流水线 | 掌握自动化部署 | 2小时 |
| 案例6 | 生产故障排查 | 掌握问题定位技能 | 2小时 |
实战案例1:从Docker到K8s的平滑迁移
场景介绍
你有一个基于Docker部署的Web应用(Nginx + PHP + MySQL),现在需要迁移到Kubernetes集群。通过本案例,你将理解K8s相比Docker的优势,并掌握迁移的基本流程。
学习目标
完成本案例后,你将能够:
- 将Docker Compose应用转换为K8s资源
- 使用ConfigMap和Secret管理配置
- 实现服务发现和负载均衡
- 验证应用在K8s上的正常运行
前置准备
- 可用的K8s集群(minikube/K3s/云服务器)
- kubectl已配置
- 原Docker应用代码
原Docker应用
docker-compose.yml:
```yaml
version: '3'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./html:/usr/share/nginx/html
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - php
  php:
    image: php:7.4-fpm
    volumes:
      - ./html:/var/www/html
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: secret123
      MYSQL_DATABASE: myapp
    volumes:
      - db_data:/var/lib/mysql
volumes:
  db_data:
```
迁移步骤
步骤1:创建Namespace
```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: docker-to-k8s
```

```bash
kubectl apply -f namespace.yaml
kubectl config set-context --current --namespace=docker-to-k8s
```
步骤2:创建ConfigMap和Secret
```yaml
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  nginx.conf: |
    server {
        listen 80;
        server_name localhost;
        root /usr/share/nginx/html;
        index index.php index.html;
        location ~ \.php$ {
            fastcgi_pass php:9000;
            fastcgi_index index.php;
            # 注意:$document_root 取自 nginx 的 root 指令;
            # PHP 容器内代码位于 /var/www/html,实际部署时两边路径需保持一致
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include fastcgi_params;
        }
    }
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: php-config
data:
  php.ini: |
    memory_limit = 256M
    upload_max_filesize = 64M
```

```yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
stringData:
  root-password: secret123
  database: myapp
```
步骤3:部署MySQL(StatefulSet)
```yaml
# mysql.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: database
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
  clusterIP: None
```
步骤4:部署PHP-FPM
```yaml
# php.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php
spec:
  replicas: 2
  selector:
    matchLabels:
      app: php
  template:
    metadata:
      labels:
        app: php
    spec:
      containers:
      - name: php
        image: php:7.4-fpm
        volumeMounts:
        - name: php-config
          mountPath: /usr/local/etc/php/php.ini
          subPath: php.ini
      volumes:
      - name: php-config
        configMap:
          name: php-config
---
apiVersion: v1
kind: Service
metadata:
  name: php
spec:
  selector:
    app: php
  ports:
  - port: 9000
```
步骤5:部署Nginx
```yaml
# nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: nginx.conf
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
  type: NodePort
```
注意:原 compose 中 ./html 站点代码的 bind mount 没有对应迁移,实际迁移时应将站点代码构建进镜像,或通过 PV/PVC 同时挂载到 nginx 与 php 两个 Deployment。
步骤6:验证部署
```bash
# 部署所有资源
kubectl apply -f namespace.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f mysql.yaml
kubectl apply -f php.yaml
kubectl apply -f nginx.yaml

# 查看部署状态
kubectl get all

# 测试访问
kubectl get svc nginx
# 访问 http://<node-ip>:<node-port>
```
案例总结
迁移对比:
| 特性 | Docker Compose | Kubernetes |
|---|---|---|
| 服务发现 | 容器名解析 | DNS + Service |
| 负载均衡 | 无 | 内置 |
| 配置管理 | 环境变量/挂载 | ConfigMap/Secret |
| 存储 | Docker Volume | PVC/PV |
| 高可用 | 手动 | 自动故障转移 |
| 扩缩容 | 手动 | HPA自动扩缩容 |
迁移收益:
- ✅ 服务自动发现和负载均衡
- ✅ 配置与代码分离
- ✅ 自动故障恢复
- ✅ 水平自动扩缩容
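上表和收益列表都提到了 HPA 自动扩缩容,但正文清单中并未给出示例。下面是一个假设性的草图(前提假设:nginx Deployment 已配置 resources.requests.cpu,且集群已安装 metrics-server;名称与阈值均为示例):

```yaml
# hpa.yaml(示例草图,非本案例清单的一部分)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # CPU 平均利用率超过 70% 时扩容
```

应用后可通过 `kubectl get hpa` 观察当前指标与副本数变化。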
实战案例2:构建完整的微服务应用
场景介绍
构建一个电商微服务系统,包含:前端(Vue.js)、后端API(Node.js)、订单服务(Go)、用户服务(Python)、MySQL数据库、Redis缓存。通过本案例,你将掌握多服务协同部署的技能。
系统架构
┌─────────────────────────────────────────────────────────────┐
│ 用户请求 │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Ingress Controller │
│ (路由: / -> 前端, /api -> 后端) │
└──────────────────────┬──────────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ 前端服务 │ │ 后端API │ │ 订单服务 │
│ (Vue.js) │ │ (Node.js) │ │ (Go) │
│ 3 replicas │ │ 3 replicas │ │ 2 replicas │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
│ └───────┬───────┘
│ │
│ ┌───────┴───────┐
│ ▼ ▼
│ ┌─────────────┐ ┌─────────────┐
│ │ 用户服务 │ │ MySQL │
│ │ (Python) │ │ (Stateful) │
│ │ 2 replicas │ │ 1 replica │
│ └──────┬──────┘ └─────────────┘
│ │
│ ▼
│ ┌─────────────┐
└───────►│ Redis │
│ (缓存) │
└─────────────┘
学习目标
完成本案例后,你将能够:
- 设计微服务的K8s部署架构
- 使用不同的K8s资源类型(Deployment/StatefulSet)
- 实现服务间的网络通信
- 配置数据持久化和缓存
部署步骤
步骤1:部署基础设施(MySQL + Redis)
```yaml
# infrastructure.yaml
# MySQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: rootpass
        - name: MYSQL_DATABASE
          value: ecommerce
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
  clusterIP: None
---
# Redis Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  ports:
  - port: 6379
```
步骤2:部署后端服务
```yaml
# backend-services.yaml
# 用户服务 (Python)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: ecommerce/user-service:v1
        ports:
        - containerPort: 5000
        env:
        - name: DB_HOST
          value: mysql
        - name: DB_NAME
          value: ecommerce
        - name: REDIS_HOST
          value: redis
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
  - port: 5000
---
# 订单服务 (Go)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: ecommerce/order-service:v1
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: mysql
        - name: REDIS_HOST
          value: redis
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  selector:
    app: order-service
  ports:
  - port: 8080
```
步骤3:部署前端和API网关
```yaml
# frontend.yaml
# 后端API (Node.js)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: ecommerce/api-gateway:v1
        ports:
        - containerPort: 3000
        env:
        - name: USER_SERVICE_URL
          value: http://user-service:5000
        - name: ORDER_SERVICE_URL
          value: http://order-service:8080
---
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
spec:
  selector:
    app: api-gateway
  ports:
  - port: 3000
---
# 前端 (Vue.js)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: ecommerce/frontend:v1
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
  - port: 80
```
步骤4:配置Ingress路由
```yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecommerce-ingress
  annotations:
    # 注意:rewrite-target: / 会把所有匹配路径统一重写为 /;
    # 如后端需要保留 /api/... 原始路径,应去掉该注解或改用捕获组写法
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: ecommerce.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-gateway
            port:
              number: 3000
      - path: /api/users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 5000
      - path: /api/orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 8080
```
验证测试
```bash
# 部署所有服务
kubectl apply -f infrastructure.yaml
kubectl apply -f backend-services.yaml
kubectl apply -f frontend.yaml
kubectl apply -f ingress.yaml

# 查看所有资源
kubectl get all

# 测试服务访问
kubectl get ingress
# 配置本地hosts: <node-ip> ecommerce.local
# 访问 http://ecommerce.local
```
案例总结
微服务部署要点:
- 有状态服务使用StatefulSet(MySQL)
- 无状态服务使用Deployment(其他服务)
- 服务发现通过DNS名称访问
- 负载均衡自动实现
- 配置管理使用环境变量
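要点中提到配置通过环境变量注入;当各服务共享的变量增多时,可以像下面的草图那样集中到一个 ConfigMap 再用 envFrom 整体注入(ConfigMap 名称与键均为示例假设,非正文清单的一部分):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config        # 示例名称
data:
  DB_HOST: mysql
  DB_NAME: ecommerce
  REDIS_HOST: redis
---
# 在各服务的 Deployment 中,用 envFrom 替代逐条 env:
# spec.template.spec.containers[]:
#   envFrom:
#   - configMapRef:
#       name: app-config
```

这样新增配置项时只需修改 ConfigMap,各 Deployment 无需逐一更新 env 列表。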
实战案例3:蓝绿发布与金丝雀发布
场景介绍
在生产环境中,应用更新需要保证零停机。本案例将演示两种发布策略:蓝绿发布(快速切换)和金丝雀发布(灰度发布)。
学习目标
完成本案例后,你将能够:
- 实现蓝绿发布策略
- 实现金丝雀发布策略
- 使用Ingress进行流量切分
- 快速回滚到稳定版本
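学习目标中提到"使用Ingress进行流量切分"。本案例正文的金丝雀示例是通过副本比例实现的;若集群使用 NGINX Ingress Controller,也可以用 canary 注解按权重精确切流,与副本数解耦。以下为假设 host 与 Service 名称的草图:

```yaml
# canary-ingress.yaml(示例草图;app-canary-service 为假设的、指向 v2.0 的 Service)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% 流量进入金丝雀版本
spec:
  rules:
  - host: myapp.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-canary-service
            port:
              number: 80
```

逐步调大 canary-weight 即可完成灰度,无需按比例调整副本数。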
蓝绿发布
原理
初始状态:
┌─────────────────────────────────────┐
│ Ingress │
│ (100% 流量) │
└─────────────┬───────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 蓝色环境 (v1.0) │
│ Deployment: app-blue │
│ Replicas: 3 │
└─────────────────────────────────────┘
发布过程:
1. 部署绿色环境 (v2.0)
┌─────────────────────────────────────┐
│ 蓝色环境 (v1.0) │
│ 3 replicas │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ 绿色环境 (v2.0) │
│ 3 replicas │
└─────────────────────────────────────┘
2. 切换流量
┌─────────────────────────────────────┐
│ Ingress │
│ (100% 流量) │
└─────────────┬───────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ 绿色环境 (v2.0) │
└─────────────────────────────────────┘
实施步骤
```yaml
# blue-green-deployment.yaml
# Service(不变,只切换selector)
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp
    version: blue  # 切换为 green 进行发布
  ports:
  - port: 80
    targetPort: 8080
---
# 蓝色环境 (v1.0)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1.0
        ports:
        - containerPort: 8080
---
# 绿色环境 (v2.0)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:v2.0
        ports:
        - containerPort: 8080
```
```bash
# 蓝绿发布流程
# 1. 初始状态:蓝色环境运行v1.0
kubectl apply -f blue-green-deployment.yaml

# 2. 部署绿色环境(不切换流量)
# 创建绿色环境 Deployment YAML 文件
cat > app-green-deployment.yaml << 'EOF'
# 绿色环境 Deployment(蓝绿发布 - 新版本)
# 用途:部署新版本应用(v2.0)
# 特点:
#   - 与蓝色环境并行运行
#   - 不直接接收流量(Service 仍指向 blue)
#   - 验证通过后再切换流量
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
  labels:
    app: myapp
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:v2.0  # 新版本镜像
        ports:
        - containerPort: 8080
EOF

# 应用绿色环境
kubectl apply -f app-green-deployment.yaml

# 3. 验证绿色环境
kubectl get pods -l version=green
kubectl logs -l version=green

# 4. 切换流量到绿色环境
kubectl patch service app-service -p '{"spec":{"selector":{"version":"green"}}}'

# 5. 如果出问题,快速回滚
kubectl patch service app-service -p '{"spec":{"selector":{"version":"blue"}}}'

# 6. 确认稳定后,删除蓝色环境
kubectl delete deployment app-blue
```
金丝雀发布
原理
初始状态:
┌─────────────────────────────────────┐
│ Ingress │
│ (100% -> v1.0) │
└─────────────────────────────────────┘
金丝雀发布:
┌─────────────────────────────────────┐
│ Ingress │
│ (90% -> v1.0, 10% -> v2.0) │
└─────────────┬─────────────┬─────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ v1.0 (90%) │ │ v2.0 (10%) │
│ 9 replicas │ │ 1 replica │
└─────────────────┘ └─────────────────┘
逐步增加:
┌─────────────────────────────────────┐
│ Ingress │
│ (50% -> v1.0, 50% -> v2.0) │
└─────────────┬─────────────┬─────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ v1.0 (50%) │ │ v2.0 (50%) │
│ 5 replicas │ │ 5 replicas │
└─────────────────┘ └─────────────────┘
最终状态:
┌─────────────────────────────────────┐
│ Ingress │
│ (100% -> v2.0) │
└─────────────────────────────────────┘
实施步骤
```yaml
# canary-deployment.yaml
# v1.0 稳定版本
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: app
        image: myapp:v1.0
        ports:
        - containerPort: 8080
---
# v2.0 金丝雀版本
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1  # 初始1个副本,10%流量
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
      - name: app
        image: myapp:v2.0
        ports:
        - containerPort: 8080
---
# Service(同时包含两个版本)
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp  # 同时匹配 stable 和 canary
  ports:
  - port: 80
    targetPort: 8080
```
```bash
# 金丝雀发布流程
# 1. 初始状态:10%流量到新版本
kubectl apply -f canary-deployment.yaml

# 2. 监控新版本指标(错误率、延迟等)
kubectl logs -l track=canary -f

# 3. 逐步增加新版本流量(30%)
kubectl scale deployment app-stable --replicas=7
kubectl scale deployment app-canary --replicas=3

# 4. 继续增加(50%)
kubectl scale deployment app-stable --replicas=5
kubectl scale deployment app-canary --replicas=5

# 5. 大部分流量(70%)
kubectl scale deployment app-stable --replicas=3
kubectl scale deployment app-canary --replicas=7

# 6. 全部切换(100%)
kubectl scale deployment app-stable --replicas=0
kubectl scale deployment app-canary --replicas=10

# 7. 更新stable标签,清理
cat > app-stable.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: app
        image: myapp:v2.0  # 更新为v2.0
EOF
kubectl apply -f app-stable.yaml
kubectl delete deployment app-canary
```
案例总结
| 发布策略 | 优点 | 缺点 | 适用场景 |
|---|---|---|---|
| 蓝绿发布 | 快速切换、即时回滚 | 资源占用双倍 | 关键业务、需要快速回滚 |
| 金丝雀发布 | 风险低、渐进式 | 发布时间长 | 大规模用户、需要验证 |
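上面金丝雀流程中各阶段的流量占比,在两个 Deployment 共享同一个 Service、请求大致均摊到各 Pod 的前提下,由副本数比例决定。下面的小脚本演示这一换算(纯 Bash,不依赖集群):

```shell
#!/usr/bin/env bash
# 根据 stable/canary 副本数估算金丝雀流量占比(整数百分比)
canary_percent() {
  local stable=$1 canary=$2
  echo $(( canary * 100 / (stable + canary) ))
}

canary_percent 9 1    # 9 stable + 1 canary  -> 10
canary_percent 5 5    # 5 stable + 5 canary  -> 50
canary_percent 3 7    # 3 stable + 7 canary  -> 70
```

由此也能看出副本切流的粒度受限于总副本数(10 个副本时最小步长为 10%);需要更细粒度时可改用 Ingress 权重切流。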
实战案例4:构建可观测性体系
场景介绍
生产环境需要全面的可观测性,包括指标监控(Metrics)、日志收集(Logs)、链路追踪(Tracing)。本案例将部署Prometheus + Grafana + ELK Stack。
学习目标
完成本案例后,你将能够:
- 部署Prometheus进行指标收集
- 部署Grafana进行可视化展示
- 部署ELK Stack进行日志收集
- 配置应用暴露监控指标
系统架构
┌─────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Prometheus │ │ Grafana │ │ Applications │ │
│ │ (指标收集) │ │ (可视化) │ │ (暴露/metrics) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────────┬──────────┘ │
│ │ │ │ │
│ └────────────────┴─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ ELK Stack │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Filebeat │ │ Logstash │ │ Elasticsearch│ │ │
│ │ │ (日志收集) │ │ (日志处理) │ │ (日志存储) │ │ │
│ │ └─────────────┘ └─────────────┘ └──────┬──────┘ │ │
│ │ │ │ │
│ │ ┌──────┴──────┐ │ │
│ │ │ Kibana │ │ │
│ │ │ (日志查询) │ │ │
│ │ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
部署Prometheus
```yaml
# prometheus.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      # 注意:需预先创建名为 prometheus 的 ServiceAccount,
      # 并授予读取 nodes/pods/endpoints 等资源的 RBAC 权限(清单从略)
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.45.0
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
        - name: prometheus-data
          mountPath: /prometheus
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: prometheus-data
        emptyDir: {}  # 演示用;生产环境应改为 PVC 持久化
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090
  type: NodePort
```
部署Grafana
```yaml
# grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:10.0.0
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin123
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
  type: NodePort
```
部署ELK Stack
```yaml
# elk-stack.yaml
# Elasticsearch
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.8.0
        ports:
        - containerPort: 9200
        env:
        - name: discovery.type
          value: single-node
        - name: xpack.security.enabled
          value: "false"
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  selector:
    app: elasticsearch
  ports:
  - port: 9200
---
# Kibana
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:8.8.0
        ports:
        - containerPort: 5601
        env:
        - name: ELASTICSEARCH_HOSTS
          value: '["http://elasticsearch:9200"]'
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: monitoring
spec:
  selector:
    app: kibana
  ports:
  - port: 5601
    targetPort: 5601
  type: NodePort
```
应用集成监控
```yaml
# app-with-metrics.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        # Prometheus 的 kubernetes-pods job 根据 Pod 模板上的注解决定是否抓取
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      containers:
      - name: app
        image: myapp:v1.0
        ports:
        - containerPort: 8080
```
案例总结
可观测性三大支柱:
| 支柱 | 工具 | 用途 |
|---|---|---|
| Metrics | Prometheus | 指标收集和告警 |
| Logs | ELK Stack | 日志收集和分析 |
| Tracing | Jaeger/Zipkin | 分布式链路追踪 |
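架构图中提到的 Filebeat 采集端在正文中没有给出清单。下面是一个最小化的 DaemonSet 草图,用于在每个节点上收集 /var/log 下的日志并直接写入 Elasticsearch(为简化省略了 Logstash 环节和 RBAC,镜像版本与输出配置均为示例假设):

```yaml
# filebeat.yaml(示例草图,非正文清单的一部分)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:8.8.0
        # -E 用命令行覆盖输出地址,指向集群内的 elasticsearch Service
        args: ["-e", "-E", "output.elasticsearch.hosts=[\"http://elasticsearch:9200\"]"]
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```

生产环境通常还需挂载 filebeat.yml 配置(通过 ConfigMap)并启用 Kubernetes autodiscover,为日志附加 Pod 元数据。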
实战案例5:K8s CI/CD流水线
场景介绍
实现从代码提交到K8s部署的全自动化流程。使用GitLab CI/Jenkins + K8s构建DevOps流水线。
学习目标
完成本案例后,你将能够:
- 配置GitLab CI/CD流水线
- 实现自动化构建和测试
- 实现自动化部署到K8s
- 配置环境隔离(dev/staging/prod)
GitLab CI配置
```yaml
# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  KUBECONFIG: /etc/deploy/config

# 构建阶段
build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE
  only:
    - main
    - develop

# 测试阶段
test:
  stage: test
  image: node:18
  script:
    - npm ci
    - npm run test
    - npm run lint
  only:
    - main
    - develop

# 部署到开发环境
deploy_dev:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config use-context dev
    - kubectl set image deployment/myapp myapp=$DOCKER_IMAGE -n dev
    - kubectl rollout status deployment/myapp -n dev
  environment:
    name: development
    url: http://dev.myapp.com
  only:
    - develop

# 部署到生产环境
deploy_prod:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config use-context prod
    - kubectl set image deployment/myapp myapp=$DOCKER_IMAGE -n prod
    - kubectl rollout status deployment/myapp -n prod
  environment:
    name: production
    url: http://myapp.com
  only:
    - main
  when: manual  # 需要手动触发
```

Jenkins Pipeline
```groovy
// Jenkinsfile
pipeline {
    agent any
    environment {
        DOCKER_IMAGE = "myapp:${BUILD_NUMBER}"
        KUBECONFIG = credentials('kubeconfig')
    }
    stages {
        stage('Build') {
            steps {
                script {
                    docker.build(DOCKER_IMAGE)
                }
            }
        }
        stage('Test') {
            steps {
                sh 'npm test'
            }
        }
        stage('Push') {
            steps {
                script {
                    docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') {
                        docker.image(DOCKER_IMAGE).push()
                    }
                }
            }
        }
        stage('Deploy to Dev') {
            when {
                branch 'develop'
            }
            steps {
                sh """
                    kubectl config use-context dev
                    kubectl set image deployment/myapp myapp=${DOCKER_IMAGE} -n dev
                    kubectl rollout status deployment/myapp -n dev
                """
            }
        }
        stage('Deploy to Prod') {
            when {
                branch 'main'
            }
            steps {
                input message: 'Deploy to Production?', ok: 'Deploy'
                sh """
                    kubectl config use-context prod
                    kubectl set image deployment/myapp myapp=${DOCKER_IMAGE} -n prod
                    kubectl rollout status deployment/myapp -n prod
                """
            }
        }
    }
}
```
案例总结
CI/CD流程:
代码提交 -> 自动构建 -> 自动测试 -> 自动部署
│ │ │ │
▼ ▼ ▼ ▼
  GitLab        Docker      Unit Test    Kubernetes
实战案例6:生产环境故障排查
场景介绍
模拟生产环境常见故障,学习排查思路和解决方法。
学习目标
完成本案例后,你将能够:
- 诊断Pod启动失败问题
- 排查网络连接问题
- 解决资源不足问题
- 使用工具链快速定位故障
常见故障场景
场景1:Pod处于Pending状态
```bash
# 症状
kubectl get pods
# NAME     READY   STATUS    RESTARTS   AGE
# myapp    0/1     Pending   0          5m

# 排查步骤
kubectl describe pod myapp
# 查看Events部分,常见原因:
#   - 0/3 nodes are available: 3 Insufficient cpu
#   - 0/3 nodes are available: 3 Insufficient memory

# 解决方案
# 1. 增加节点
# 2. 减少资源请求
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"containers":[{"name":"myapp","resources":{"requests":{"cpu":"100m","memory":"128Mi"}}}]}}}}'
```
场景2:ImagePullBackOff
```bash
# 症状
kubectl get pods
# NAME     READY   STATUS             RESTARTS   AGE
# myapp    0/1     ImagePullBackOff   0          2m

# 排查步骤
kubectl describe pod myapp
# 查看Events:
#   Failed to pull image "myapp:v2.0": rpc error: image not found

# 解决方案
# 1. 检查镜像名称和标签
kubectl get deployment myapp -o yaml | grep image
# 2. 确认镜像仓库访问权限
# 3. 重新部署正确镜像
kubectl set image deployment/myapp myapp=myapp:v1.0
```
场景3:CrashLoopBackOff
```bash
# 症状
kubectl get pods
# NAME     READY   STATUS             RESTARTS   AGE
# myapp    0/1     CrashLoopBackOff   5          10m

# 排查步骤
# 1. 查看日志(--previous 查看上一次崩溃前的日志)
kubectl logs myapp --previous
# 2. 查看事件
kubectl describe pod myapp

# 常见原因和解决方案
# 原因1:应用启动失败
#   解决:修复应用代码或配置
# 原因2:健康检查失败
#   解决:调整探针配置
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"containers":[{"name":"myapp","livenessProbe":{"initialDelaySeconds":60}}]}}}}'
# 原因3:资源限制
#   解决:增加资源限制
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"containers":[{"name":"myapp","resources":{"limits":{"memory":"512Mi"}}}]}}}}'
```
场景4:Service无法访问
```bash
# 症状:Service ClusterIP无法访问

# 排查步骤
# 1. 检查Service是否存在
kubectl get svc myapp
# 2. 检查Endpoints
kubectl get endpoints myapp
#    如果没有endpoints,说明selector不匹配
# 3. 检查Pod标签
kubectl get pods --show-labels
#    确认Pod标签与Service selector匹配
# 4. 检查Pod状态
kubectl get pods -l app=myapp
# 5. 测试访问(在临时调试Pod内执行wget)
kubectl run -it --rm debug --image=busybox --restart=Never -- sh
wget -O- http://myapp:80
```
场景5:节点NotReady
```bash
# 症状
kubectl get nodes
# NAME    STATUS     ROLES    AGE   VERSION
# node1   NotReady   <none>   10d   v1.28.0

# 排查步骤
kubectl describe node node1
# 查看Conditions部分

# 常见原因(以下命令在节点上执行)
# 1. kubelet问题
ssh node1
sudo systemctl status kubelet
sudo systemctl restart kubelet
# 2. 磁盘压力
df -h
# 清理磁盘空间
# 3. 内存压力
free -h
# 4. PID压力
ps aux | wc -l
```
排查工具箱
```bash
#!/bin/bash
# debug-pod.sh:快速诊断脚本,用法:./debug-pod.sh <pod-name>
POD=$1
echo "=== Pod Status ==="
kubectl get pod $POD
echo ""
echo "=== Pod Events ==="
kubectl describe pod $POD | grep -A 20 Events
echo ""
echo "=== Pod Logs ==="
kubectl logs $POD --tail=50
echo ""
echo "=== Previous Logs ==="
kubectl logs $POD --previous --tail=50 2>/dev/null || echo "No previous logs"
```
案例总结
排查思路:
发现问题
│
▼
查看状态 (kubectl get)
│
▼
查看详情 (kubectl describe)
│
▼
查看日志 (kubectl logs)
│
▼
进入容器调试 (kubectl exec)
│
▼
解决问题
课程总结
通过6个实战案例,你已经掌握了:
- Docker到K8s迁移 - 理解K8s的优势和迁移方法
- 微服务部署 - 掌握多服务协同部署的技能
- 发布策略 - 实现零停机的蓝绿/金丝雀发布
- 可观测性 - 构建完整的监控和日志体系
- CI/CD流水线 - 实现自动化部署
- 故障排查 - 快速定位和解决生产问题
下一步学习建议:
- 深入学习K8s网络(CNI、Service Mesh)
- 学习K8s安全(RBAC、NetworkPolicy、PodSecurity)
- 掌握K8s多集群管理
- 了解GitOps实践(ArgoCD、Flux)