Preface:
- Skills covered:
1. The full Kubernetes technology stack
2. Istio for microservices
3. The Prometheus + Alertmanager + Grafana monitoring and visualization stack
4. The EFK/ELK log collection platform
5. Building an enterprise DevOps platform with Jenkins + GitLab + Harbor and k8s
6. Getting started with k3s, the lightweight Kubernetes for edge computing
7. Managing k8s clusters and deploying applications with Rancher
- k8s capabilities:
Batch processing, autoscaling, self-healing, storage orchestration, secret and configuration management, automated deployment and rollback, service discovery and load balancing
(Side note: HDFS consists of three components: Datanode, Namenode, and Journalnode)
- Certification exams:
CKA (operators), CKAD (developers: Go), CKS
- Brief overview:
master node (control-plane components):
- apiserver (unified API entry point)
- scheduler (scheduler)
- controller-manager (controller manager / control center)
- etcd (core datastore for cluster and network state; runs highly available)
- calico
- containerd
worker node (worker-node components):
- kubelet (reports node status / runs Pods)
- kube-proxy (load balancing and network proxy)
- calico (network plugin)
- coredns (DNS service / name resolution)
- containerd
- pod (the smallest schedulable unit; wraps containers)
- Core command:
kubectl
- Core resources: every resource can be created from a YAML manifest (a short example follows this list)
label: labels, used for grouping and selection
Deployment: a controller that manages ReplicaSets and Pods; handles scale-out, scale-in, rolling updates, rollbacks, and keeping the desired number of Pods
service: a layer-4 proxy; a network object that selects all Pods carrying the matching labels, so changing Pod IPs do not break the service
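To make labels, Deployments, and Services concrete, here is a minimal sketch that can be applied once the cluster is up (the name "demo", the nginx image, and the label values are only examples, not part of the setup below):
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 2                  # desired Pod count; the Deployment maintains this number
  selector:
    matchLabels:
      app: demo                # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: demo              # the label used for grouping and selection
    spec:
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  selector:
    app: demo                  # the Service finds Pods by this label, so changing Pod IPs do not break it
  ports:
  - port: 80
    targetPort: 80
EOF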
1. Node Deployment
On PVE, provision two master (control-plane) nodes, two worker nodes, two load-balancer nodes, and one registry host for images.

| Node type | Count | CPU | Memory | Disk | IP | Key services/components |
|---|---|---|---|---|---|---|
| master | 2 | 4 cores | 4G | 100G | 10.31.3.211-212/16 | HAProxy, kube-apiserver, etcd, kube-scheduler |
| worker | 2 | 4 cores | 4G | 100G | 10.31.3.221-222/16 | kubelet, kube-proxy, containerd |
| load balancer | 2 | 2 cores | 2G | 40G | 10.31.3.231-232/16; 10.31.3.240/16 (VIP) | HAProxy, Keepalived, Nginx |
| Harbor registry | 1 | 4 cores | 4G | 200G | 10.31.3.250/16 | Docker, Harbor |
2. Environment Preparation
(Run on all nodes. Harbor has its own separate setup and is documented elsewhere.)
2.1 Install and update base tools
# 1. Switch to the Aliyun Ubuntu mirror
sudo sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list
sudo sed -i 's/security.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list
# 2. Install base tools
sudo apt update && sudo apt upgrade -y
sudo apt install -y nano gnupg2 apt-transport-https ca-certificates software-properties-common lsb-release chrony containerd
2.2 Hostnames and name resolution
# 1. Set the hostname: every machine must have a unique hostname that follows DNS naming rules (lowercase letters, digits, hyphens)
sudo hostnamectl set-hostname master1  # on the first master
sudo hostnamectl set-hostname master2  # on the second master
sudo hostnamectl set-hostname worker1  # on the first worker
sudo hostnamectl set-hostname worker2  # on the second worker
sudo hostnamectl set-hostname slb1     # on the first load-balancer node
sudo hostnamectl set-hostname slb2     # on the second load-balancer node
# 2. Add host entries: identical on all nodes; check IPs with "ip addr" and replace with your actual IPs and hostnames
cat <<EOF | sudo tee -a /etc/hosts
10.31.3.211 master1
10.31.3.212 master2
10.31.3.221 worker1
10.31.3.222 worker2
10.31.3.231 slb1
10.31.3.232 slb2
10.31.3.240 vip
10.31.3.250 harbor.mysen.pro
EOF
# 3. Test connectivity: ping master2 (it must resolve to the correct IP)
2.3 Static IP address
# Edit the netplan config to set a static IP
sudo nano /etc/netplan/50-cloud-init.yaml
network:
  version: 2
  ethernets:
    enp6s18:
      dhcp4: no
      addresses: [10.31.3.211/16]         # replace with the node's actual IP and netmask
      routes:
        - to: default
          via: 10.31.0.1                  # replace with the actual default gateway
      nameservers:
        addresses: [10.31.0.1, 223.5.5.5] # DNS servers
# Apply the network changes
sudo netplan apply
2.4 Base environment configuration
# 1. Disable swap (required): swap degrades kubelet performance
sudo swapoff -a
sudo sed -i 's/^.*swap.*$/#&/' /etc/fstab
# swapoff -a disables swap only for the current boot; check with "free -h"; the sed line above disables it permanently
# 2. Time synchronization (critical): clock skew causes certificate validation failures (skew must be < 1 second)
sudo timedatectl set-timezone Asia/Shanghai
sudo systemctl enable chrony && sudo systemctl start chrony
# Check sync status with: sudo chronyc sources
# 3. Load kernel modules
sudo tee /etc/modules-load.d/k8s.conf <<EOF
br_netfilter
overlay
EOF
sudo modprobe br_netfilter overlay
# Verify: "lsmod | grep br_netfilter" should produce output
# 4. Network parameter tuning; net.ipv4.ip_forward lets Pods communicate across nodes
sudo tee /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
kernel.panic = 10
kernel.panic_on_oops = 1
EOF
sudo sysctl --system
2.5 Disable the ufw firewall
sudo ufw status   # check the firewall state: inactive = already off, active = enabled
sudo ufw disable  # prints "Firewall stopped and disabled on system startup", i.e. stopped and will not start on boot
sudo systemctl disable ufw  # keep the firewall from starting at boot
2.6 Install the container runtime
# 1. Generate the default configuration
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# 2. Key changes: use the systemd cgroup driver and a domestic mirror; SystemdCgroup must be set to true, otherwise kubelet will not start
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo sed -i 's|registry.k8s.io/pause:3.8|registry.aliyuncs.com/google_containers/pause:3.10|g' /etc/containerd/config.toml
# 3. Configure the private registry below [plugins."io.containerd.grpc.v1.cri".registry.mirrors]; mind the 2-space indentation
sudo nano /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.mysen.pro"]
  endpoint = ["http://harbor.mysen.pro"]
[plugins."io.containerd.grpc.v1.cri".registry.configs."harbor.mysen.pro".tls]
  insecure_skip_verify = true
# 4. Restart the service
sudo systemctl restart containerd && sudo systemctl enable containerd
# Verify: sudo ctr version
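As a quick sanity check that containerd works after the config change, the sandbox image can be pulled directly with ctr (a sketch; adjust the tag to whatever your config.toml references):
sudo ctr -n k8s.io images pull registry.aliyuncs.com/google_containers/pause:3.10
sudo ctr -n k8s.io images ls | grep pause   # the image should now be listed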
2.7 Install the Kubernetes components
# 1. Download the signing key and add the repository
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.33/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/k8s.gpg
echo 'deb [signed-by=/etc/apt/keyrings/k8s.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /' | sudo tee /etc/apt/sources.list.d/k8s.list
# 2. Refresh the package lists; list available versions with: apt-cache madison kubeadm
sudo apt update && sudo apt upgrade -y
# Install a specific version; verify the install with: kubeadm version
sudo apt install -y kubelet=1.33.3-1.1 kubeadm=1.33.3-1.1 kubectl=1.33.3-1.1
# Hold the packages to prevent accidental upgrades
sudo apt-mark hold kubelet kubeadm kubectl
# 3. Make sure kubelet uses systemd as its cgroup driver
cat <<EOF | sudo tee /etc/default/kubelet
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.10
EOF
# 4. Enable at boot; "systemctl status kubelet" will show activating (auto-restart), which is normal until the cluster is fully deployed
sudo systemctl daemon-reload && sudo systemctl enable kubelet && sudo systemctl restart kubelet
kubeadm: the official Kubernetes installation tool; common commands: kubeadm init, kubeadm join
kubelet: the service that starts, runs, and deletes Pods on each node
kubectl: operates on Kubernetes resources (create, delete, modify); it needs a kubeconfig file to be loaded
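Before running kubeadm init it can help to pre-pull the control-plane images, so the init step does not stall on downloads; a sketch using the same version and mirror as the init command in the next section:
sudo kubeadm config images list --kubernetes-version v1.33.3 --image-repository registry.aliyuncs.com/google_containers
sudo kubeadm config images pull --kubernetes-version v1.33.3 --image-repository registry.aliyuncs.com/google_containers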
3. Cluster Deployment
3.1 Initialize the first Master node
# 1. Run on master1
sudo kubeadm init \
--control-plane-endpoint="10.31.3.240:6443" \
--image-repository=registry.aliyuncs.com/google_containers \
--kubernetes-version=v1.33.3 \
--pod-network-cidr=10.244.0.0/16 \
--upload-certs \
--certificate-key=$(kubeadm certs certificate-key)
# If something goes wrong, run "sudo kubeadm reset", manually clean up /root/.kube, /etc/cni/net.d and /etc/kubernetes, then restart kubelet and containerd
# Save the join command printed at the end (you will need it later)
# It looks like the command below; save the one actually printed:
# kubeadm join 10.31.3.240:6443 --token xxxx --discovery-token-ca-cert-hash sha256:xxxx --control-plane --certificate-key xxxx
# 2. After a successful init, configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
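If the join command gets lost or the token expires (tokens are valid for 24 hours by default), it can be regenerated later on master1; a sketch:
kubeadm token create --print-join-command             # prints a fresh worker join command
sudo kubeadm init phase upload-certs --upload-certs   # re-uploads the control-plane certs and prints a new certificate key for master joins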
3.2 Join the remaining Master nodes
# 1. Run on master2 (using the certificate key generated on the first Master)
sudo kubeadm join 10.31.3.240:6443 --token xxxx --discovery-token-ca-cert-hash sha256:xxxx --control-plane --certificate-key xxxx
# 2. After joining successfully, configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# 3. Verify the Master node status
kubectl get nodes
kubectl get pods -n kube-system
3.3 Join the Worker nodes
# 1. Run the saved join command manually on each Worker node (substitute the real token and hash; do not include the --control-plane flag)
sudo kubeadm join 10.31.3.240:6443 --token xxxx --discovery-token-ca-cert-hash sha256:xxxx
# 2. Check after joining
kubectl get nodes -o wide
kubectl get pods -n kube-system -o wide
# Expected status is NotReady, because the Calico network plugin has not been deployed yet
NAME      STATUS     ROLES           AGE   VERSION
master1   NotReady   control-plane   10m   v1.33.3
master2   NotReady   control-plane   2m    v1.33.3
worker1   NotReady   <none>          30s   v1.33.3
worker2   NotReady   <none>          25s   v1.33.3
3.4 Load balancer configuration
- On both load-balancer nodes
# 1. Install HAProxy and Keepalived on both load-balancer nodes
sudo apt install -y haproxy keepalived
# 2. Configure HAProxy on both nodes (/etc/haproxy/haproxy.cfg); maxconn caps concurrent connections to prevent overload, balance roundrobin distributes requests in round-robin order
cat <<EOF | sudo tee /etc/haproxy/haproxy.cfg
global
log /dev/log local0
maxconn 4096
log /dev/log local1 notice
tune.ssl.default-dh-param 2048
stats socket /var/run/haproxy/admin.sock mode 660 level admin
daemon
defaults
log global
mode tcp
timeout connect 5000
timeout client 50000
timeout server 50000
frontend k8s-api
bind *:6443
default_backend k8s-api
backend k8s-api
option tcp-check
balance roundrobin
server master1 10.31.3.211:6443 check inter 2s fastinter 1s downinter 2s rise 3 fall 2
server master2 10.31.3.212:6443 check inter 2s fastinter 1s downinter 2s rise 3 fall 2
EOF
# 3. Configure Keepalived on slb1 (/etc/keepalived/keepalived.conf); change enp6s18 (3 occurrences) to the actual interface name (check with ip addr)
cat <<EOF | sudo tee /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
script "killall -0 haproxy"
interval 2
weight 2
}
vrrp_instance VI_1 {
interface enp6s18
state MASTER
virtual_router_id 51
priority 150
advert_int 1
authentication {
auth_type PASS
auth_pass 42
}
virtual_ipaddress {
10.31.3.240/16 dev enp6s18 label enp6s18:1
}
track_script {
chk_haproxy
}
}
EOF
# 4. Configure Keepalived on slb2; change enp6s18 (3 occurrences) to the actual interface name (ip addr); only state and priority differ from slb1; virtual_ipaddress is the virtual IP
cat <<EOF | sudo tee /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
script "killall -0 haproxy"
interval 2
weight 2
}
vrrp_instance VI_1 {
interface enp6s18
state BACKUP
virtual_router_id 51
priority 120
advert_int 1
authentication {
auth_type PASS
auth_pass 42
}
virtual_ipaddress {
10.31.3.240/16 dev enp6s18 label enp6s18:1 # the virtual IP address
}
track_script {
chk_haproxy
}
}
EOF
# 5. Restart and enable HAProxy and Keepalived
sudo systemctl restart haproxy keepalived && sudo systemctl enable haproxy keepalived
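A quick way to confirm the VIP and the HAProxy frontend are working (a sketch; run the first two on the load-balancer nodes, the last one from any node once the kube-apiserver is reachable, and only if netcat is installed):
ip addr show enp6s18 | grep 10.31.3.240     # the current Keepalived MASTER should hold the VIP
sudo systemctl status haproxy keepalived    # both services should be active
nc -vz 10.31.3.240 6443                     # port 6443 should be reachable through the VIP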
- On all Master nodes
# 1. Install HAProxy
sudo apt install -y haproxy
# 2. Create a dedicated config file; maxconn is lower than on the load-balancer nodes, bind 127.0.0.1:16443 listens only locally, and balance leastconn suits a fallback path better
sudo cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy-local.cfg
cat <<EOF | sudo tee /etc/haproxy/haproxy-local.cfg
global
log /dev/log local0
maxconn 3072
defaults
mode tcp
timeout connect 5000
timeout client 50000
timeout server 50000
frontend k8s-api-local
bind 127.0.0.1:16443
default_backend k8s-api
backend k8s-api
balance leastconn
server master1 10.31.3.211:6443 check inter 3s fall 3 rise 2
server master2 10.31.3.212:6443 check inter 3s fall 3 rise 2
EOF
# 3. Create a dedicated systemd service
cat <<EOF | sudo tee /etc/systemd/system/haproxy-local.service
[Unit]
Description=Local HAProxy for Kubernetes API
After=network.target
[Service]
ExecStart=/usr/sbin/haproxy -f /etc/haproxy/haproxy-local.cfg -p /var/run/haproxy-local.pid
ExecReload=/bin/kill -USR2 \$MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
# 4. Start and enable the service
sudo systemctl daemon-reload && sudo systemctl enable --now haproxy-local
- On all worker nodes
# Add the following to /etc/default/kubelet on all Worker nodes
KUBELET_EXTRA_ARGS="--kubeconfig=/etc/kubernetes/kubelet.conf \
--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
--config=/var/lib/kubelet/config.yaml \
--network-plugin=cni \
--pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.10 \
--api-servers=https://10.31.3.240:6443,https://10.31.3.211:16443,https://10.31.3.212:16443 \
--api-server-timeout=30s \
--max-connection-per-host=10"
# --api-servers: lists several API Server endpoints, the VIP first and the local fallback paths after it
# --api-server-timeout: 30-second timeout so the kubelet does not wait long on an unreachable endpoint
# --max-connection-per-host: limits the number of connections to each API Server
# Note: recent kubelet releases (including 1.33) no longer recognize --api-servers, --network-plugin,
# --api-server-timeout, or --max-connection-per-host; in practice the kubelet reaches the API server through
# the "server:" field of /etc/kubernetes/kubelet.conf, which should point at the VIP https://10.31.3.240:6443.
Ports that must be open:
- Masters: 6443 (API), 2379-2380 (etcd), 10250 (kubelet)
- Workers: 10250 (kubelet), 30000-32767 (NodePort)
3.5 Install etcd
# Install from the official binary release
ETCD_VER=v3.6.4
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xvf etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
# Create the service unit file manually (on master2, change --name to etcd2 and the 10.31.3.211 addresses to 10.31.3.212)
sudo nano /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=notify
ExecStart=/usr/local/bin/etcd \
--name=etcd1 \
--data-dir=/var/lib/etcd \
--listen-client-urls=https://10.31.3.211:2379 \
--advertise-client-urls=https://10.31.3.211:2379 \
--listen-peer-urls=https://10.31.3.211:2380 \
--client-cert-auth=true \
--initial-advertise-peer-urls=https://10.31.3.211:2380 \
--initial-cluster="etcd1=https://10.31.3.211:2380,etcd2=https://10.31.3.212:2380" \
--initial-cluster-token=etcd-cluster-1 \
--initial-cluster-state=new \
--cert-file=/etc/kubernetes/pki/etcd/server.crt \
--key-file=/etc/kubernetes/pki/etcd/server.key \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--auto-compaction-retention=1 \
--heartbeat-interval=100 \
--election-timeout=500 \
--snapshot-count=10000 \
--quota-backend-bytes=8589934592
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
# Create the peer certificate (run in the directory holding the etcd CA, i.e. /etc/kubernetes/pki/etcd)
openssl genrsa -out peer.key 2048
openssl req -new -key peer.key -out peer.csr -subj "/CN=etcd-peer" -config <(
cat <<-EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names
[alt_names]
IP.1 = 10.31.3.211
IP.2 = 10.31.3.212
IP.3 = 10.31.3.240
DNS.1 = etcd1
DNS.2 = etcd2
DNS.3 = localhost
EOF
)
# Sign the peer certificate with the CA
openssl x509 -req -in peer.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out peer.crt -days 3650 -extensions v3_req -extfile <(
cat <<-EOF
[v3_req]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names
[alt_names]
IP.1 = 10.31.3.211
IP.2 = 10.31.3.212
IP.3 = 10.31.3.240
DNS.1 = etcd1
DNS.2 = etcd2
DNS.3 = localhost
EOF
)
# Make sure the data directory exists with correct ownership and permissions
sudo mkdir -p /var/lib/etcd
# If the etcd system user does not exist yet, create it first: sudo useradd -r -s /usr/sbin/nologin etcd
sudo chown -R etcd:etcd /var/lib/etcd
sudo chmod 700 /var/lib/etcd
sudo systemctl daemon-reload && sudo systemctl enable etcd && sudo systemctl restart etcd
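Once etcd is running on both masters, a quick health check sketch (certificate paths follow the locations used above):
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.31.3.211:2379,https://10.31.3.212:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
# "endpoint status -w table" with the same flags also shows which member is the leader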
3.6 Install kube-apiserver
# 1. Download the Kubernetes 1.33.3 server components (master nodes only)
wget https://dl.k8s.io/v1.33.3/kubernetes-server-linux-amd64.tar.gz
tar -xzvf kubernetes-server-linux-amd64.tar.gz
cd kubernetes/server/bin
# 2. Move kube-apiserver into the system path
sudo mv kube-apiserver /usr/local/bin/
sudo chmod +x /usr/local/bin/kube-apiserver
# 3. Create the certificate directory, cd into it, and generate the CA certificate; do this on the first master node first
sudo mkdir -p /etc/kubernetes/pki/etcd && cd /etc/kubernetes/pki
openssl genrsa -out ca.key 2048
openssl req -new -x509 -key ca.key -out ca.crt -days 3650 -subj "/CN=kubernetes-ca"
# Generate the API Server private key
openssl genrsa -out apiserver.key 2048
# Create the CSR config file
cat > apiserver-csr.conf <<EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster.local
IP.1 = 10.96.0.1
IP.2 = 10.31.3.240
IP.3 = 10.31.3.211
IP.4 = 10.31.3.212
IP.5 = 10.31.3.250
IP.6 = 10.31.3.221
IP.7 = 10.31.3.222
EOF
# Generate the CSR
openssl req -new -key apiserver.key -out apiserver.csr -subj "/CN=kube-apiserver" -config apiserver-csr.conf
# Sign it with the CA
openssl x509 -req -in apiserver.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out apiserver.crt -days 3650 -extensions v3_req -extfile apiserver-csr.conf
# Generate the client certificate the API Server uses to talk to etcd
openssl genrsa -out apiserver-etcd-client.key 2048
openssl req -new -key apiserver-etcd-client.key -out apiserver-etcd-client.csr -subj "/CN=kube-apiserver-etcd-client"
openssl x509 -req -in apiserver-etcd-client.csr -CA etcd/ca.crt -CAkey etcd/ca.key -CAcreateserial -out apiserver-etcd-client.crt -days 3650
# Generate the front-proxy CA certificate
openssl genrsa -out front-proxy-ca.key 2048
openssl req -x509 -new -nodes -key front-proxy-ca.key -days 3650 -out front-proxy-ca.crt -subj "/CN=front-proxy-ca"
# Generate the front-proxy client certificate
openssl genrsa -out front-proxy-client.key 2048
openssl req -new -key front-proxy-client.key -out front-proxy-client.csr -subj "/CN=front-proxy-client"
openssl x509 -req -in front-proxy-client.csr -CA front-proxy-ca.crt -CAkey front-proxy-ca.key -CAcreateserial -out front-proxy-client.crt -days 3650
# Generate the Service Account key pair, used to sign service account tokens
openssl genrsa -out sa.key 2048
openssl rsa -in sa.key -pubout -out sa.pub
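The kube-apiserver unit file further below also references /etc/kubernetes/pki/apiserver-kubelet-client.crt, which is not generated by the steps above; a sketch for creating it with the same cluster CA (the O=system:masters group is an assumption borrowed from kubeadm's defaults):
openssl genrsa -out apiserver-kubelet-client.key 2048
openssl req -new -key apiserver-kubelet-client.key -out apiserver-kubelet-client.csr \
  -subj "/CN=kube-apiserver-kubelet-client/O=system:masters"
openssl x509 -req -in apiserver-kubelet-client.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out apiserver-kubelet-client.crt -days 3650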
# 4. Generate the etcd CA certificate; run this inside the etcd directory
cd etcd/
openssl genrsa -out ca.key 2048
# Create the config file
sudo nano /etc/kubernetes/pki/etcd/openssl.cnf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
default_bits = 2048
default_md = sha256
[req_distinguished_name]
countryName = Country Name (2 letter code)
countryName_default = CN
stateOrProvinceName = State or Province Name (full name)
stateOrProvinceName_default = Sichuan
localityName = Locality Name (eg, city)
localityName_default = Chengdu
organizationName = Organization Name (eg, company)
organizationName_default = Kubernetes
commonName = Common Name (e.g. server FQDN or YOUR name)
commonName_default = etcd.cluster.local
[v3_req]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
extendedKeyUsage = serverAuth, clientAuth
[alt_names]
IP.1 = 10.31.3.211
IP.2 = 10.31.3.212
IP.3 = 127.0.0.1
IP.4 = ::1
IP.5 = 10.31.3.240
DNS.1 = etcd.cluster.local
DNS.2 = localhost
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr -config openssl.cnf
openssl req -x509 -new -nodes -key ca.key -days 3650 -out ca.crt -subj "/CN=etcd-ca"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 365 -extensions v3_req -extfile openssl.cnf
# Update the kubeconfig file
kubectl config set-cluster kubernetes --certificate-authority=/etc/kubernetes/pki/ca.crt --server=https://10.31.3.240:6443
# Sync the certificates to all master nodes
rsync -avz /etc/kubernetes/pki/ root@master2:/etc/kubernetes/pki/
sudo cp /etc/kubernetes/pki/ca.crt /usr/local/share/ca-certificates/kubernetes-ca.crt
sudo update-ca-certificates
# On all etcd nodes, restart the etcd service
sudo systemctl restart etcd
# 5. Create the systemd unit file on all master nodes; only --advertise-address differs per node (use the node's own IP: 10.31.3.211 on master1, 10.31.3.212 on master2)
cat <<EOF | sudo tee /etc/systemd/system/kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://kubernetes.io
After=network.target
[Service]
ExecStart=/usr/local/bin/kube-apiserver \\
--advertise-address=10.31.3.211 \\
--allow-privileged=true \\
--api-audiences=kubernetes \\
--audit-log-path=/var/log/audit.log \\
--authorization-mode=Node,RBAC \\
--client-ca-file=/etc/kubernetes/pki/ca.crt \\
--enable-admission-plugins=NodeRestriction \\
--enable-bootstrap-token-auth=true \\
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \\
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \\
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key \\
--etcd-servers=https://10.31.3.211:2379,https://10.31.3.212:2379 \\
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt \\
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key \\
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \\
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt \\
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key \\
--requestheader-allowed-names=front-proxy-client \\
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \\
--requestheader-extra-headers-prefix=X-Remote-Extra- \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-username-headers=X-Remote-User \\
--secure-port=6443 \\
--service-account-issuer=https://kubernetes.default.svc.cluster.local \\
--service-account-key-file=/etc/kubernetes/pki/sa.pub \\
--service-account-signing-key-file=/etc/kubernetes/pki/sa.key \\
--service-cluster-ip-range=10.96.0.0/12 \\
--service-node-port-range=30000-32767 \\
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt \\
--enable-aggregator-routing=true \\
--v=2 \\
--endpoint-reconciler-type=lease \\
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
Restart=on-failure
RestartSec=10s
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
EOF
# 6. Start and enable the service
sudo systemctl daemon-reload && sudo systemctl enable kube-apiserver && sudo systemctl restart kube-apiserver
# Check the status and failure logs
systemctl status kube-apiserver
journalctl -u kube-apiserver -n 50 --no-pager
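A quick check that the API server answers through the VIP (a sketch; /healthz and /readyz are normally readable by unauthenticated clients via the default system:public-info-viewer binding, so the -k curl should return "ok"):
curl -k https://10.31.3.240:6443/healthz
kubectl get --raw='/readyz?verbose'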
3.7 Deploy the network plugin (Calico)
# 1. Download the manifest; if the download fails, fetch it on a machine that can reach the URL and copy it over
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/calico.yaml
# 2. Open the file (nano calico.yaml), uncomment the following two lines and keep their indentation
# - name: CALICO_IPV4POOL_CIDR
#   value: "192.168.0.0/16"
# 3. Point the images at the private registry and change the Pod address pool
sed -i 's|docker.io/calico/|harbor.mysen.pro/library/calico-|g' calico.yaml
sed -i 's|192.168.0.0/16|10.244.0.0/16|g' calico.yaml
# Optional additions to calico.yaml (choose one of the two CALICO_IPV4POOL_IPIP values, not both)
# Enable IPIP encapsulation for better connectivity across subnets
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
# Or disable IPIP and rely on plain BGP routing (large clusters on a single L2 network)
- name: CALICO_IPV4POOL_IPIP
  value: "Never"
- name: IP_AUTODETECTION_METHOD
  value: "interface=enp.*"
# Example network policy (a Calico projectcalico.org/v3 NetworkPolicy, applied with calicoctl)
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  selector: all()
  types:
    - Ingress
    - Egress
# 4. Apply the manifest; this only needs to be done on one master node
kubectl apply -f calico.yaml
# Network verification
kubectl run test --image=nginx -- sleep 3600 # test Pod
3.8 Final verification
- Cluster health check
kubectl get nodes      # should show 2 Masters and 2 Workers, all Ready
# Common fixes for a node stuck in NotReady:
#   systemctl restart kubelet
#   kubeadm init phase kubelet-finalize all
kubectl cluster-info   # basic cluster information
kubectl get events -A  # cluster events
kubectl get pods -A    # all Pods should be Running
kubectl get pods -n kube-system
kubectl describe node | grep -i taint  # check master taints
- Deploy a test application
# First get the nginx image into the Harbor private registry: docker pull, docker tag, docker login, then docker push
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get svc nginx
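To confirm the Service actually answers, a small sketch that looks up the assigned NodePort and curls it on one of the worker IPs from the table above:
NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
curl -I http://10.31.3.221:${NODE_PORT}   # expect an HTTP 200 response from nginx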
- High availability test
# 1. Simulate a Master node failure
# Shut down one Master node, then verify the cluster still operates:
kubectl get nodes                                                 # the failed node should show NotReady
kubectl create deployment test-ha --image=busybox -- sleep 3600   # should still be created normally
# 2. Verify automatic recovery
# Power the failed node back on and check that it rejoins the cluster automatically
4. Additional Configuration
4.1 Production security configuration
# 1. Key security settings
# 1.1 Enable RBAC (enabled by default)
kubectl get clusterrolebinding
# 1.2 Pod security: the PodSecurityPolicy below is the legacy approach; the policy/v1beta1 API was removed in Kubernetes 1.25, so it will not apply on this 1.33 cluster. Use Pod Security Admission instead (see the sketch after this manifest). Baseline policy shown for reference:
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: baseline
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      - min: 1
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      - min: 1
        max: 65535
  readOnlyRootFilesystem: false
EOF
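On 1.25 and later (including the 1.33 cluster built here), the same intent is expressed with Pod Security Admission labels on a namespace; a minimal sketch (the namespace name "demo" is only an example):
kubectl create namespace demo
kubectl label namespace demo \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted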
# 2. Resource quotas and limits; a namespace ResourceQuota (it applies to the namespace it is created in; add metadata.namespace to target a specific one)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 100Gi
    limits.cpu: "40"
    limits.memory: 200Gi
    pods: "100"
EOF
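A ResourceQuota caps a namespace's total consumption; per-container defaults are usually added with a LimitRange, a sketch (the values are only examples):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      default:             # applied when a container declares no limits
        cpu: 500m
        memory: 512Mi
      defaultRequest:      # applied when a container declares no requests
        cpu: 100m
        memory: 128Mi
EOF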
4.2 Performance tuning
# 1. Kernel parameter tuning: add the following in /etc/sysctl.d/10-k8s-optimize.conf
cat <<EOF | sudo tee /etc/sysctl.d/10-k8s-optimize.conf
net.core.somaxconn = 32768
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_max_syn_backlog = 8096
net.ipv4.tcp_slow_start_after_idle = 0
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
sudo sysctl --system
# 2. kubelet tuning: add the following flags in /etc/default/kubelet (on one line, or with backslash continuations as below)
KUBELET_EXTRA_ARGS="--max-pods=150 \
--kube-api-burst=100 \
--kube-api-qps=50 \
--serialize-image-pulls=false \
--registry-burst=20 \
--registry-qps=10"
sudo systemctl daemon-reload
sudo systemctl restart kubelet
4.3 Scaling and upgrade strategy
# 1. Cluster scaling guide
# 1.1 Add a new Worker node
# Run the same worker-join.sh script (i.e. the saved kubeadm join command) on the new node
# 1.2 Add a new Master node
# Use the same join command as the initial masters, making sure the same certificate-key is used
# 1.3 Verify the node joined
kubectl get nodes
kubectl get pods -n kube-system -o wide
# 2. Cluster upgrade strategy
# 2.1 Upgrade kubeadm (all nodes)
sudo apt update
sudo apt-mark unhold kubeadm
sudo apt install -y kubeadm=1.33.4-1.1
sudo apt-mark hold kubeadm
# 2.2 Upgrade the Master nodes
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.33.4
# 2.3 Upgrade the Worker nodes
sudo kubeadm upgrade node
# 2.4 Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl
sudo apt install -y kubelet=1.33.4-1.1 kubectl=1.33.4-1.1
sudo systemctl restart kubelet
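When upgrading nodes one at a time, draining keeps workloads available; a sketch for a worker node (run from a machine with kubectl access):
kubectl drain worker1 --ignore-daemonsets --delete-emptydir-data   # evict Pods before upgrading the node
# ...upgrade kubeadm/kubelet on worker1 as above, then:
kubectl uncordon worker1                                            # allow scheduling on it again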
4.4 Monitoring and logging
# 1. Deploy Prometheus monitoring
# 1.1 Install kube-prometheus using domestic mirror sources
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
# 1.2 Replace the images with domestic mirrors (note: the second sed rewrites every occurrence of the string "grafana", not just image references; review the changes before applying)
find . -type f -name "*.yaml" -exec sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' {} +
find . -type f -name "*.yaml" -exec sed -i 's/grafana/grafana.mirrors.ustc.edu.cn/g' {} +
# 1.3 Deploy the monitoring stack
kubectl apply --server-side -f manifests/setup
kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring
kubectl apply -f manifests/
# 2. Log collection: deploy the EFK stack (Elasticsearch + Fluentd + Kibana)
# 2.1 Apply the addon manifests (the image is switched to a domestic mirror in the next step)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/es-service.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/es-statefulset.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/kibana-service.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/kibana-deployment.yaml
# 2.2 Replace the image with a domestic mirror
kubectl set image daemonset/fluentd-elasticsearch -n kube-system fluentd-elasticsearch=registry.cn-hangzhou.aliyuncs.com/google_containers/fluentd-elasticsearch:v2.8.0
4.5 Backup and disaster recovery
# 1. Regular etcd backups
# 1.1 Create the backup script /etc/kubernetes/etcd-backup.sh
cat <<'EOF' | sudo tee /etc/kubernetes/etcd-backup.sh
#!/bin/bash
DATE=$(date +%Y%m%d-%H%M%S)
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /var/lib/etcd/etcd-snapshot-${DATE}.db
find /var/lib/etcd/etcd-snapshot-*.db -mtime +7 -exec rm {} \;
EOF
# 1.2 Schedule it with cron (runs daily at 02:00)
(crontab -l 2>/dev/null; echo "0 2 * * * /bin/bash /etc/kubernetes/etcd-backup.sh") | crontab -
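To make sure a snapshot is actually usable, it can be inspected with etcdutl (shipped in the same etcd release tarball installed earlier); a sketch with a hypothetical snapshot filename:
etcdutl snapshot status /var/lib/etcd/etcd-snapshot-20250101-020000.db
# prints the hash, revision, total keys, and size of the snapshot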
# 2. Cluster restore procedure
# 2.1 Stop kube-apiserver on all Master nodes
sudo systemctl stop kube-apiserver
# 2.2 Restore the etcd data (run on every Master node, each with its own --name and peer URL)
ETCDCTL_API=3 etcdctl snapshot restore /path/to/backup.db \
  --data-dir /var/lib/etcd/new \
  --initial-cluster "etcd1=https://10.31.3.211:2380,etcd2=https://10.31.3.212:2380" \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls https://<this node's IP>:2380 \
  --name <this node's etcd name: etcd1 or etcd2>
# 2.3 Swap in the restored data directory
sudo mv /var/lib/etcd/ /var/lib/etcd.old/
sudo mv /var/lib/etcd.old/new/ /var/lib/etcd/
sudo chown -R etcd:etcd /var/lib/etcd/
# 2.4 Restart etcd and kube-apiserver
sudo systemctl restart etcd
sudo systemctl restart kube-apiserver
5. Common Troubleshooting
- Common problems
Problem 1: kubelet fails to start
Check: journalctl -u kubelet | tail -50
Common cause: cgroup driver mismatch (containerd must have SystemdCgroup = true)
Problem 2: Pod stuck in Pending
Check: kubectl describe pod <name>
Fix: confirm the network plugin is installed and the node has no blocking taints
- Troubleshooting a NotReady node
# 1. Check the kubelet logs
sudo journalctl -u kubelet -n 100 --no-pager
# 2. Verify the container runtime
sudo systemctl status containerd
sudo ctr images ls
# 3. Check the network plugin
kubectl get pods -n kube-system | grep calico
kubectl logs <calico-pod-name> -n kube-system
# 4. Verify the node has enough resources
free -h
df -h
- Troubleshooting Pod creation failures
# 1. Describe the Pod
kubectl describe pod <pod-name>
# 2. Check the event log
kubectl get events --sort-by='.metadata.creationTimestamp'
# 3. Verify the image pull
kubectl describe pod <pod-name> | grep -i image
sudo ctr images pull <image-name>
# 4. Check resource quotas
kubectl describe quota --all-namespaces
Certificate permission issues:
sudo chmod 644 /etc/kubernetes/pki/etcd/server.crt
sudo chmod 600 /etc/kubernetes/pki/etcd/server.key
Enabling IPVS
kube-proxy has supported IPVS since k8s 1.8 (stable since 1.11); it uses hash tables instead of iptables chains (a setup sketch follows the list below)
Advantages (compared with iptables):
1. Better scalability and performance for large clusters
2. Supports more sophisticated load-balancing algorithms than iptables (least load, least connections, weighted, and so on)
3. Supports backend health checks, connection retries, and similar features
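A sketch of switching kube-proxy to IPVS mode on this cluster (assumes kube-proxy was deployed by kubeadm as a DaemonSet; review before applying):
# On every node: load the IPVS kernel modules and install the userspace tools
sudo apt install -y ipset ipvsadm
cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
sudo modprobe ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack
# Switch kube-proxy to IPVS: set mode: "ipvs" in its ConfigMap, then restart the DaemonSet
kubectl -n kube-system edit configmap kube-proxy
kubectl -n kube-system rollout restart daemonset kube-proxy
# Verify
sudo ipvsadm -Ln | head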