
Container Orchestration and Kubernetes in Practice

Course Objectives

After completing this course, you will be able to:

  • Migrate Docker applications smoothly to Kubernetes
  • Build a complete microservice application system
  • Implement blue-green and canary releases
  • Build an observability stack (monitoring + logging)
  • Set up a K8s CI/CD pipeline
  • Troubleshoot common production failures

Prerequisite: completion of all lessons in the K8s Fundamentals and Advanced tracks


Course Overview

This course contains six hands-on cases, each with a clear goal and deliverable:

| Case | Topic | Goal | Estimated time |
|------|-------|------|----------------|
| Case 1 | Migrating from Docker to K8s | Master basic K8s deployment | 2 hours |
| Case 2 | Building a microservice application | Master multi-service deployment | 3 hours |
| Case 3 | Blue-green / canary releases | Master zero-downtime release strategies | 2 hours |
| Case 4 | Building an observability stack | Master monitoring and logging | 3 hours |
| Case 5 | K8s CI/CD pipeline | Master automated deployment | 2 hours |
| Case 6 | Production troubleshooting | Master fault diagnosis | 2 hours |

Hands-on Case 1: Smooth Migration from Docker to K8s

Scenario

You have a web application (Nginx + PHP + MySQL) deployed with Docker, and it now needs to move to a Kubernetes cluster. Through this case you will see what K8s offers over plain Docker and learn the basic migration workflow.

Learning Objectives

After completing this case, you will be able to:

  • Convert a Docker Compose application into K8s resources
  • Manage configuration with ConfigMap and Secret
  • Implement service discovery and load balancing
  • Verify the application runs correctly on K8s

Prerequisites

  • A working K8s cluster (minikube / K3s / cloud server)
  • kubectl configured
  • The original Docker application code

The Original Docker Application

docker-compose.yml

yaml
version: '3'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./html:/usr/share/nginx/html
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - php
  
  php:
    image: php:7.4-fpm
    volumes:
      - ./html:/var/www/html
  
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: secret123
      MYSQL_DATABASE: myapp
    volumes:
      - db_data:/var/lib/mysql

volumes:
  db_data:

Migration Steps

Step 1: Create a Namespace

yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: docker-to-k8s
bash
kubectl apply -f namespace.yaml
kubectl config set-context --current --namespace=docker-to-k8s

Step 2: Create ConfigMaps and Secrets

yaml
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  nginx.conf: |
    server {
        listen 80;
        server_name localhost;
        root /usr/share/nginx/html;
        index index.php index.html;
        
        location ~ \.php$ {
            fastcgi_pass php:9000;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include fastcgi_params;
        }
    }
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: php-config
data:
  php.ini: |
    memory_limit = 256M
    upload_max_filesize = 64M
yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
stringData:
  root-password: secret123
  database: myapp
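`stringData` is a write-only convenience field: on admission the API server base64-encodes it into `data`, which is what you will see when reading the Secret back with `kubectl get secret mysql-secret -o yaml`. The equivalent pre-encoded form looks like this (a sketch; the values are the base64 encodings of the plaintext above):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
data:
  root-password: c2VjcmV0MTIz   # base64("secret123")
  database: bXlhcHA=            # base64("myapp")
```

Note that base64 is an encoding, not encryption; anyone with read access to the Secret can decode it.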

Step 3: Deploy MySQL (StatefulSet)

yaml
# mysql.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: database
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
  clusterIP: None
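For production use you would typically also give the MySQL container resource requests and a readiness probe, so the headless Service only resolves to it once it actually accepts connections. A minimal sketch of the extra container fields (merge into the `mysql` container above; the probe command assumes the stock `mysql` image, which ships `mysqladmin`):

```yaml
# Additional fields for the mysql container (sketch)
resources:
  requests:
    cpu: 250m
    memory: 512Mi
readinessProbe:
  exec:
    command: ["sh", "-c", "mysqladmin ping -uroot -p\"$MYSQL_ROOT_PASSWORD\""]
  initialDelaySeconds: 15
  periodSeconds: 10
```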

Step 4: Deploy PHP-FPM

yaml
# php.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php
spec:
  replicas: 2
  selector:
    matchLabels:
      app: php
  template:
    metadata:
      labels:
        app: php
    spec:
      containers:
      - name: php
        image: php:7.4-fpm
        volumeMounts:
        - name: php-config
          mountPath: /usr/local/etc/php/php.ini
          subPath: php.ini
      volumes:
      - name: php-config
        configMap:
          name: php-config
---
apiVersion: v1
kind: Service
metadata:
  name: php
spec:
  selector:
    app: php
  ports:
  - port: 9000

Step 5: Deploy Nginx

yaml
# nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: nginx.conf
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
  type: NodePort
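One gap compared with docker-compose.yml: the `./html` bind mount has no equivalent here, so neither the nginx nor the php Pods ship any site content. In a real migration you would bake the code into an image; for a quick smoke test you can serve a page from a ConfigMap mounted into both Pods (a sketch; `site-content` is a name chosen for illustration):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: site-content
data:
  index.php: |
    <?php phpinfo();
---
# Then add to both the nginx and php Pod templates:
#   volumeMounts:
#   - name: site-content
#     mountPath: /usr/share/nginx/html   # use /var/www/html in the php container
#   volumes:
#   - name: site-content
#     configMap:
#       name: site-content
```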

Step 6: Verify the Deployment

bash
# Deploy all resources
kubectl apply -f namespace.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f mysql.yaml
kubectl apply -f php.yaml
kubectl apply -f nginx.yaml

# Check deployment status
kubectl get all

# Test access
kubectl get svc nginx
# Browse to http://<node-ip>:<node-port>

Case Summary

Migration comparison

| Aspect | Docker Compose | Kubernetes |
|--------|----------------|------------|
| Service discovery | Container-name resolution | DNS + Service |
| Load balancing | Built-in | Service (kube-proxy) |
| Configuration | Env vars / bind mounts | ConfigMap / Secret |
| Storage | Docker volumes | PVC / PV |
| High availability | Manual | Automatic failover |
| Scaling | Manual | HPA autoscaling |

Migration benefits

  • ✅ Automatic service discovery and load balancing
  • ✅ Configuration separated from code
  • ✅ Automatic failure recovery
  • ✅ Horizontal autoscaling
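The last benefit, horizontal autoscaling, can be made concrete with a HorizontalPodAutoscaler. A sketch using the `autoscaling/v2` API (assumes metrics-server is installed in the cluster and the nginx containers declare `resources.requests.cpu`, which the manifests above do not yet):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU use exceeds 70% of requests
```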

Hands-on Case 2: Building a Complete Microservice Application

Scenario

Build an e-commerce microservice system consisting of a frontend (Vue.js), a backend API (Node.js), an order service (Go), a user service (Python), a MySQL database, and a Redis cache. Through this case you will learn to deploy multiple cooperating services.

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                        User requests                        │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                    Ingress Controller                       │
│          (routes: / -> frontend, /api -> backend)           │
└──────────────────────┬──────────────────────────────────────┘
                       │
         ┌─────────────┼─────────────┐
         ▼             ▼             ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Frontend   │ │ Backend API │ │Order Service│
│  (Vue.js)   │ │  (Node.js)  │ │    (Go)     │
│  3 replicas │ │  3 replicas │ │  2 replicas │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       │               │               │
       │               └───────┬───────┘
       │                       │
       │               ┌───────┴───────┐
       │               ▼               ▼
       │        ┌─────────────┐ ┌─────────────┐
       │        │User Service │ │   MySQL     │
       │        │  (Python)   │ │ (Stateful)  │
       │        │  2 replicas │ │  1 replica  │
       │        └──────┬──────┘ └─────────────┘
       │               │
       │               ▼
       │        ┌─────────────┐
       └───────►│    Redis    │
                │   (cache)   │
                └─────────────┘

Learning Objectives

After completing this case, you will be able to:

  • Design a K8s deployment architecture for microservices
  • Use the appropriate K8s resource types (Deployment / StatefulSet)
  • Implement service-to-service networking
  • Configure data persistence and caching

Deployment Steps

Step 1: Deploy the infrastructure (MySQL + Redis)

yaml
# infrastructure.yaml
# MySQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: rootpass
        - name: MYSQL_DATABASE
          value: ecommerce
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
  clusterIP: None
---
# Redis Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    app: redis
  ports:
  - port: 6379

Step 2: Deploy the backend services

yaml
# backend-services.yaml
# User service (Python)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: ecommerce/user-service:v1
        ports:
        - containerPort: 5000
        env:
        - name: DB_HOST
          value: mysql
        - name: DB_NAME
          value: ecommerce
        - name: REDIS_HOST
          value: redis
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
  - port: 5000
---
# Order service (Go)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: ecommerce/order-service:v1
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: mysql
        - name: REDIS_HOST
          value: redis
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  selector:
    app: order-service
  ports:
  - port: 8080

Step 3: Deploy the frontend and API gateway

yaml
# frontend.yaml
# Backend API (Node.js)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: ecommerce/api-gateway:v1
        ports:
        - containerPort: 3000
        env:
        - name: USER_SERVICE_URL
          value: http://user-service:5000
        - name: ORDER_SERVICE_URL
          value: http://order-service:8080
---
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
spec:
  selector:
    app: api-gateway
  ports:
  - port: 3000
---
# Frontend (Vue.js)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: ecommerce/frontend:v1
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
  - port: 80

Step 4: Configure Ingress Routing

yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecommerce-ingress
spec:
  rules:
  - host: ecommerce.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-gateway
            port:
              number: 3000
      - path: /api/users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 5000
      - path: /api/orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 8080

Verification

bash
# Deploy all services
kubectl apply -f infrastructure.yaml
kubectl apply -f backend-services.yaml
kubectl apply -f frontend.yaml
kubectl apply -f ingress.yaml

# List all resources
kubectl get all

# Test service access
kubectl get ingress
# Add to your local hosts file: <node-ip> ecommerce.local
# Browse to http://ecommerce.local

Case Summary

Key points for microservice deployment

  1. Stateful services use StatefulSet (MySQL)
  2. Stateless services use Deployment (everything else)
  3. Service discovery works through DNS names
  4. Load balancing comes for free via Services
  5. Configuration is injected through environment variables
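The last point (configuration through environment variables) can go a step further: instead of repeating `env:` entries in every Deployment, shared settings can live in one ConfigMap and be injected wholesale with `envFrom` (a sketch; `ecommerce-config` is a name chosen for illustration):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ecommerce-config
data:
  DB_HOST: mysql
  DB_NAME: ecommerce
  REDIS_HOST: redis
---
# In each backend container spec:
#   envFrom:
#   - configMapRef:
#       name: ecommerce-config
```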

Hands-on Case 3: Blue-Green and Canary Releases

Scenario

In production, application updates must not cause downtime. This case demonstrates two release strategies: blue-green (instant switchover) and canary (gradual rollout).

Learning Objectives

After completing this case, you will be able to:

  • Implement a blue-green release
  • Implement a canary release
  • Split traffic with Ingress
  • Roll back quickly to the stable version

Blue-Green Release

How it works

Initial state:
┌─────────────────────────────────────┐
│           Ingress                   │
│        (100% of traffic)            │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│     Blue environment (v1.0)         │
│     Deployment: app-blue            │
│     Replicas: 3                     │
└─────────────────────────────────────┘

Release process:
1. Deploy the green environment (v2.0)
┌─────────────────────────────────────┐
│     Blue environment (v1.0)         │
│     3 replicas                      │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│     Green environment (v2.0)        │
│     3 replicas                      │
└─────────────────────────────────────┘

2. Switch traffic
┌─────────────────────────────────────┐
│           Ingress                   │
│        (100% of traffic)            │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│     Green environment (v2.0)        │
└─────────────────────────────────────┘

Implementation

yaml
# blue-green-deployment.yaml
# Service (unchanged across releases; only its selector flips)
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp
    version: blue  # switch to green to cut over
  ports:
  - port: 80
    targetPort: 8080
---
# Blue environment (v1.0); the green environment is created later, in step 2
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1.0
        ports:
        - containerPort: 8080
bash
# Blue-green release workflow

# 1. Initial state: the blue environment runs v1.0
kubectl apply -f blue-green-deployment.yaml

# 2. Deploy the green environment (no traffic switch yet)
# Create the green Deployment manifest
cat > app-green-deployment.yaml << 'EOF'
# Green environment Deployment (blue-green release - new version)
# Purpose: deploy the new application version (v2.0)
# Characteristics:
#   - Runs alongside the blue environment
#   - Receives no traffic yet (the Service still selects blue)
#   - Traffic is switched only after verification
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
  labels:
    app: myapp
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:v2.0    # the new image
        ports:
        - containerPort: 8080
EOF

# Apply the green environment
kubectl apply -f app-green-deployment.yaml

# 3. Verify the green environment
kubectl get pods -l version=green
kubectl logs -l version=green

# 4. Switch traffic to the green environment
kubectl patch service app-service -p '{"spec":{"selector":{"version":"green"}}}'

# 5. If anything goes wrong, roll back instantly
kubectl patch service app-service -p '{"spec":{"selector":{"version":"blue"}}}'

# 6. Once the release is confirmed stable, delete the blue environment
kubectl delete deployment app-blue
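Step 4 switches all traffic at once, so the green Pods should prove they are healthy before the cutover; otherwise `kubectl get pods` in step 3 only tells you the containers started. A readiness probe makes Ready mean "actually serving" (a sketch to merge into the green container spec; `/healthz` is an illustrative endpoint, not part of the example app):

```yaml
# Add to the app container in app-green-deployment.yaml (sketch)
readinessProbe:
  httpGet:
    path: /healthz   # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```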

Canary Release

How it works

Initial state:
┌─────────────────────────────────────┐
│           Ingress                   │
│      (100% -> v1.0)                 │
└─────────────────────────────────────┘

Canary rollout:
┌─────────────────────────────────────┐
│           Ingress                   │
│      (90% -> v1.0, 10% -> v2.0)     │
└─────────────┬─────────────┬─────────┘
              │             │
              ▼             ▼
┌─────────────────┐ ┌─────────────────┐
│   v1.0 (90%)    │ │   v2.0 (10%)    │
│   9 replicas    │ │   1 replica     │
└─────────────────┘ └─────────────────┘

Gradually increase:
┌─────────────────────────────────────┐
│           Ingress                   │
│      (50% -> v1.0, 50% -> v2.0)     │
└─────────────┬─────────────┬─────────┘
              │             │
              ▼             ▼
┌─────────────────┐ ┌─────────────────┐
│   v1.0 (50%)    │ │   v2.0 (50%)    │
│   5 replicas    │ │   5 replicas    │
└─────────────────┘ └─────────────────┘

Final state:
┌─────────────────────────────────────┐
│           Ingress                   │
│      (100% -> v2.0)                 │
└─────────────────────────────────────┘

Implementation

yaml
# canary-deployment.yaml
# v1.0, the stable version
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: app
        image: myapp:v1.0
        ports:
        - containerPort: 8080
---
# v2.0, the canary version
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1  # start with 1 replica, ~10% of traffic
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
      - name: app
        image: myapp:v2.0
        ports:
        - containerPort: 8080
---
# Service (spans both versions)
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp  # matches both stable and canary
  ports:
  - port: 80
    targetPort: 8080
bash
# Canary release workflow

# 1. Initial state: ~10% of traffic goes to the new version
kubectl apply -f canary-deployment.yaml

# 2. Monitor the new version's metrics (error rate, latency, etc.)
kubectl logs -l track=canary -f

# 3. Gradually shift traffic to the new version (~30%)
kubectl scale deployment app-stable --replicas=7
kubectl scale deployment app-canary --replicas=3

# 4. Keep going (~50%)
kubectl scale deployment app-stable --replicas=5
kubectl scale deployment app-canary --replicas=5

# 5. Most of the traffic (~70%)
kubectl scale deployment app-stable --replicas=3
kubectl scale deployment app-canary --replicas=7

# 6. Switch over completely (100%)
kubectl scale deployment app-stable --replicas=0
kubectl scale deployment app-canary --replicas=10

# 7. Move the stable Deployment to v2.0, then clean up
cat > app-stable.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: app
        image: myapp:v2.0  # updated to v2.0
EOF
kubectl apply -f app-stable.yaml
kubectl delete deployment app-canary
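Scaling replicas only approximates traffic ratios and couples rollout progress to capacity. If you run the NGINX Ingress Controller, its canary annotations split traffic by weight independently of replica counts. A sketch (assumes a second Service, here called `app-canary-service`, selecting `track: canary`, paired with a regular non-canary Ingress for the same host that points at the stable Service):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # send ~10% of traffic here
spec:
  rules:
  - host: myapp.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-canary-service   # hypothetical Service selecting track=canary
            port:
              number: 80
```

Raising the rollout percentage then becomes a one-field change to `canary-weight` instead of two `kubectl scale` calls.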

Case Summary

| Strategy | Pros | Cons | Best for |
|----------|------|------|----------|
| Blue-green | Instant switchover and rollback | Double the resource usage | Critical services that need fast rollback |
| Canary | Low risk, gradual exposure | Longer release window | Large user bases that need validation |

Hands-on Case 4: Building an Observability Stack

Scenario

Production environments need full observability: metrics, logs, and traces. This case deploys Prometheus + Grafana and the ELK Stack.

Learning Objectives

After completing this case, you will be able to:

  • Deploy Prometheus for metrics collection
  • Deploy Grafana for dashboards
  • Deploy the ELK Stack for log collection
  • Configure applications to expose metrics

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Kubernetes Cluster                     │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Prometheus  │  │   Grafana   │  │   Applications      │  │
│  │  (metrics)  │  │(dashboards) │  │  (expose /metrics)  │  │
│  └──────┬──────┘  └──────┬──────┘  └──────────┬──────────┘  │
│         │                │                    │             │
│         └────────────────┴────────────────────┘             │
│                          │                                  │
│                          ▼                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                    ELK Stack                          │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────────┐   │  │
│  │  │  Filebeat   │  │  Logstash   │  │Elasticsearch │   │  │
│  │  │ (shipping)  │  │(processing) │  │  (storage)   │   │  │
│  │  └─────────────┘  └─────────────┘  └──────┬───────┘   │  │
│  │                                           │           │  │
│  │                                    ┌──────┴──────┐    │  │
│  │                                    │   Kibana    │    │  │
│  │                                    │  (search)   │    │  │
│  │                                    └─────────────┘    │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Deploy Prometheus

yaml
# prometheus.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.45.0
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
        - name: prometheus-data
          mountPath: /prometheus
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: prometheus-data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090
  type: NodePort
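The Deployment above references `serviceAccountName: prometheus`, which the manifest never creates; without the ServiceAccount (and cluster-read permissions for service discovery) the Pod will not schedule and the `kubernetes_sd_configs` jobs will fail. A minimal RBAC sketch to deploy alongside it:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
```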

Deploy Grafana

yaml
# grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:10.0.0
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin123
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
  type: NodePort

Deploy the ELK Stack

yaml
# elk-stack.yaml
# Elasticsearch
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.8.0
        ports:
        - containerPort: 9200
        env:
        - name: discovery.type
          value: single-node
        - name: xpack.security.enabled
          value: "false"
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  selector:
    app: elasticsearch
  ports:
  - port: 9200
---
# Kibana
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:8.8.0
        ports:
        - containerPort: 5601
        env:
        - name: ELASTICSEARCH_HOSTS
          value: '["http://elasticsearch:9200"]'
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: monitoring
spec:
  selector:
    app: kibana
  ports:
  - port: 5601
    targetPort: 5601
  type: NodePort
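The architecture diagram includes Filebeat, which ships container logs from every node, but the manifests above stop at Elasticsearch + Kibana. A minimal Filebeat DaemonSet sketch (sends straight to Elasticsearch, skipping Logstash; real deployments also need a ServiceAccount with RBAC for the Kubernetes metadata processors and runtime-specific log paths, both omitted here):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: monitoring
data:
  filebeat.yml: |
    filebeat.inputs:
    - type: container
      paths: ["/var/log/containers/*.log"]
    output.elasticsearch:
      hosts: ["http://elasticsearch:9200"]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:8.8.0
        args: ["-c", "/etc/filebeat.yml", "-e"]
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          subPath: filebeat.yml
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: filebeat-config
      - name: varlog
        hostPath:
          path: /var/log
```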

Instrumenting an Application

yaml
# app-with-metrics.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:   # Prometheus discovers targets through Pod annotations
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: app
        image: myapp:v1.0
        ports:
        - containerPort: 8080

Case Summary

The three pillars of observability

| Pillar | Tool | Purpose |
|--------|------|---------|
| Metrics | Prometheus | Metric collection and alerting |
| Logs | ELK Stack | Log collection and analysis |
| Tracing | Jaeger / Zipkin | Distributed request tracing |

Hands-on Case 5: A K8s CI/CD Pipeline

Scenario

Automate the whole path from code commit to K8s deployment, building a DevOps pipeline with GitLab CI or Jenkins plus K8s.

Learning Objectives

After completing this case, you will be able to:

  • Configure a GitLab CI/CD pipeline
  • Automate builds and tests
  • Automate deployment to K8s
  • Separate environments (dev / staging / prod)

GitLab CI Configuration

yaml
# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  KUBECONFIG: /etc/deploy/config

# Build stage
build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE
  only:
    - main
    - develop

# Test stage
test:
  stage: test
  image: node:18
  script:
    - npm ci
    - npm run test
    - npm run lint
  only:
    - main
    - develop

# Deploy to the dev environment
deploy_dev:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config use-context dev
    - kubectl set image deployment/myapp myapp=$DOCKER_IMAGE -n dev
    - kubectl rollout status deployment/myapp -n dev
  environment:
    name: development
    url: http://dev.myapp.com
  only:
    - develop

# Deploy to production
deploy_prod:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config use-context prod
    - kubectl set image deployment/myapp myapp=$DOCKER_IMAGE -n prod
    - kubectl rollout status deployment/myapp -n prod
  environment:
    name: production
    url: http://myapp.com
  only:
    - main
  when: manual  # must be triggered manually

Jenkins Pipeline

groovy
// Jenkinsfile
pipeline {
    agent any
    
    environment {
        DOCKER_IMAGE = "myapp:${BUILD_NUMBER}"
        KUBECONFIG = credentials('kubeconfig')
    }
    
    stages {
        stage('Build') {
            steps {
                script {
                    docker.build(DOCKER_IMAGE)
                }
            }
        }
        
        stage('Test') {
            steps {
                sh 'npm test'
            }
        }
        
        stage('Push') {
            steps {
                script {
                    docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') {
                        docker.image(DOCKER_IMAGE).push()
                    }
                }
            }
        }
        
        stage('Deploy to Dev') {
            when {
                branch 'develop'
            }
            steps {
                sh """
                    kubectl config use-context dev
                    kubectl set image deployment/myapp myapp=${DOCKER_IMAGE} -n dev
                    kubectl rollout status deployment/myapp -n dev
                """
            }
        }
        
        stage('Deploy to Prod') {
            when {
                branch 'main'
            }
            steps {
                input message: 'Deploy to Production?', ok: 'Deploy'
                sh """
                    kubectl config use-context prod
                    kubectl set image deployment/myapp myapp=${DOCKER_IMAGE} -n prod
                    kubectl rollout status deployment/myapp -n prod
                """
            }
        }
    }
}

Case Summary

CI/CD flow

Code commit -> Auto build -> Auto test -> Auto deploy
     │             │             │             │
     ▼             ▼             ▼             ▼
  GitLab        Docker      Unit tests    Kubernetes

Hands-on Case 6: Production Troubleshooting

Scenario

Simulate common production failures and practice a systematic approach to diagnosing and fixing them.

Learning Objectives

After completing this case, you will be able to:

  • Diagnose Pods that fail to start
  • Troubleshoot network connectivity problems
  • Resolve resource shortages
  • Use the tooling to locate faults quickly

Common Failure Scenarios

Scenario 1: Pod stuck in Pending

bash
# Symptom
kubectl get pods
# NAME    READY   STATUS    RESTARTS   AGE
# myapp   0/1     Pending   0          5m

# Diagnosis
kubectl describe pod myapp
# Check the Events section; common causes:
# - 0/3 nodes are available: 3 Insufficient cpu
# - 0/3 nodes are available: 3 Insufficient memory

# Fixes
# 1. Add nodes
# 2. Reduce the resource requests
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"containers":[{"name":"myapp","resources":{"requests":{"cpu":"100m","memory":"128Mi"}}}]}}}}'

Scenario 2: ImagePullBackOff

bash
# Symptom
kubectl get pods
# NAME    READY   STATUS             RESTARTS   AGE
# myapp   0/1     ImagePullBackOff   0          2m

# Diagnosis
kubectl describe pod myapp
# Check Events:
# Failed to pull image "myapp:v2.0": rpc error: image not found

# Fixes
# 1. Check the image name and tag
kubectl get deployment myapp -o yaml | grep image
# 2. Confirm access to the image registry (imagePullSecrets)
# 3. Redeploy with a correct image
kubectl set image deployment/myapp myapp=myapp:v1.0

Scenario 3: CrashLoopBackOff

bash
# Symptom
kubectl get pods
# NAME    READY   STATUS             RESTARTS   AGE
# myapp   0/1     CrashLoopBackOff   5          10m

# Diagnosis
# 1. Check the logs of the previous run
kubectl logs myapp --previous
# 2. Check events
kubectl describe pod myapp

# Common causes and fixes
# Cause 1: the application fails on startup
# Fix: repair the application code or configuration

# Cause 2: failing health checks
# Fix: tune the probe settings
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"containers":[{"name":"myapp","livenessProbe":{"initialDelaySeconds":60}}]}}}}'

# Cause 3: hitting resource limits (e.g. OOMKilled)
# Fix: raise the limits
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"containers":[{"name":"myapp","resources":{"limits":{"memory":"512Mi"}}}]}}}}'
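The JSON patches above are handy for live firefighting; the durable fix is to put the same settings in the manifest. The equivalent container-spec fields look like this (a sketch; the endpoint path is illustrative):

```yaml
# Probe and limit settings in the container spec (sketch)
livenessProbe:
  httpGet:
    path: /healthz          # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 60   # give slow-starting apps time before the first check
  periodSeconds: 10
  failureThreshold: 3
resources:
  limits:
    memory: 512Mi           # headroom against OOMKilled restarts
```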

Scenario 4: Service unreachable

bash
# Symptom: the Service's ClusterIP cannot be reached

# Diagnosis
# 1. Check that the Service exists
kubectl get svc myapp

# 2. Check the Endpoints
kubectl get endpoints myapp
# No endpoints means the selector does not match any Pods

# 3. Check the Pod labels
kubectl get pods --show-labels
# Confirm the Pod labels match the Service selector

# 4. Check the Pod status
kubectl get pods -l app=myapp

# 5. Test from inside the cluster
kubectl run -it --rm debug --image=busybox --restart=Never -- sh
# then, inside the container:
wget -O- http://myapp:80

Scenario 5: Node NotReady

bash
# Symptom
kubectl get nodes
# NAME     STATUS     ROLES    AGE   VERSION
# node1    NotReady   <none>   10d   v1.28.0

# Diagnosis
kubectl describe node node1
# Check the Conditions section

# Common causes
# 1. kubelet problems
ssh node1
sudo systemctl status kubelet
sudo systemctl restart kubelet

# 2. Disk pressure
df -h
# free up disk space

# 3. Memory pressure
free -h

# 4. PID pressure
ps aux | wc -l

Troubleshooting Toolbox

bash
#!/bin/bash
# debug-pod.sh -- quick Pod diagnosis script
# Usage: ./debug-pod.sh <pod-name>
POD=$1
if [ -z "$POD" ]; then
  echo "Usage: $0 <pod-name>" >&2
  exit 1
fi
echo "=== Pod Status ==="
kubectl get pod $POD
echo ""
echo "=== Pod Events ==="
kubectl describe pod $POD | grep -A 20 Events
echo ""
echo "=== Pod Logs ==="
kubectl logs $POD --tail=50
echo ""
echo "=== Previous Logs ==="
kubectl logs $POD --previous --tail=50 2>/dev/null || echo "No previous logs"

Case Summary

Diagnosis workflow

Notice the problem
        │
        ▼
Check status (kubectl get)
        │
        ▼
Inspect details (kubectl describe)
        │
        ▼
Read the logs (kubectl logs)
        │
        ▼
Debug inside the container (kubectl exec)
        │
        ▼
Fix the problem

Course Wrap-up

Across six hands-on cases you have learned to:

  1. Migrate from Docker to K8s - understand what K8s adds and how to move workloads
  2. Deploy microservices - coordinate multiple cooperating services
  3. Apply release strategies - ship zero-downtime blue-green and canary releases
  4. Build observability - stand up a full monitoring and logging stack
  5. Run a CI/CD pipeline - automate deployment end to end
  6. Troubleshoot - locate and fix production problems quickly

Where to go next

  • Go deeper into K8s networking (CNI, Service Mesh)
  • Study K8s security (RBAC, NetworkPolicy, Pod Security)
  • Learn multi-cluster management
  • Explore GitOps practice (ArgoCD, Flux)
