K8S Deployment --- Troubleshooting
Problem 1: When deploying a K8S cluster with kubeadm, a node hangs and then fails while joining the cluster?
[root@k8s-node01 ~]# kubeadm join 192.168.1.201:6443 --token 1qo7ms.7atall1jcecf10qz --discovery-token-ca-cert-hash sha256:d1d102ceb6241a3617777f6156cd4e86dc9f9edd9e1d6d73266d6ca7f6280890
[preflight] Running pre-flight checks
Cause analysis: after the master node was initialized, some components were found to be abnormal;
[root@k8s-master01 ~]# kubectl get pod -n kube-system && kubectl get svc
NAME READY STATUS RESTARTS AGE
coredns-54d67798b7-28w5q 0/1 Pending 0 3m39s
coredns-54d67798b7-sxqpm 0/1 Pending 0 3m39s
etcd-k8s-master01 1/1 Running 0 3m53s
kube-apiserver-k8s-master01 1/1 Running 0 3m53s
kube-controller-manager-k8s-master01 1/1 Running 0 3m53s
kube-proxy-rvj6w 0/1 CrashLoopBackOff 5 3m40s
kube-scheduler-k8s-master01 1/1 Running 0 3m53s
Solution: modify kubeadm-config.yaml and re-initialize the master node (a config sketch follows the commands below).
kubeadm reset -f;ipvsadm --clear;rm -rf ./.kube
kubeadm init --config=new-kubeadm-config.yaml  --upload-certs |tee kubeadm-init.log
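For reference, a minimal sketch of what new-kubeadm-config.yaml might contain; the Kubernetes version, endpoint, and subnets below are illustrative assumptions, not values from the original cluster, and must be adapted to your environment:
cat > new-kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.20.0
controlPlaneEndpoint: "192.168.1.201:6443"
networking:
  podSubnet: "10.244.0.0/16"      # must match the CNI plugin's network (flannel default assumed)
  serviceSubnet: "10.96.0.0/12"
EOF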
Problem 2: When deploying a K8S cluster with kubeadm, a node fails to join the cluster?
[root@k8s-node01 ~]# kubeadm join 192.168.1.201:6443 --token 2g9k0a.tsm6xe31rdb7jbo8 --discovery-token-ca-cert-hash sha256:d1d102ceb6241a3617777f6156cd4e86dc9f9edd9e1d6d73266d6ca7f6280890
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized
To see the stack trace of this error execute with --v=5 or higher
Cause analysis: the token has expired;
Solution: generate a new token that does not expire;
[root@k8s-master01 ~]# kubeadm token create --ttl 0 --print-join-command
W0819 12:00:27.541838 :202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join 192.168.1.201:6443 --token 6xyv8a.cueltqmpe9qa8nxu --discovery-token-ca-cert-hash sha256:bd78dfd370e47dfca742b5f6934c21014792168fa4dc19c9fa63bfdd87270097
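Before regenerating, you can confirm that the old token really has expired; a quick check on the master (not part of the original log):
kubeadm token list    # the EXPIRES column shows when each token becomes invalid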
Problem 3: When deploying a K8S cluster with kubeadm, deploying the flannel component fails?
kube-flannel-ds-amd64-8cqqz            0/1    CrashLoopBackOff    3          84s  192.168.66.10  k8s-master01  <none>          <none>
Cause analysis: the logs show that registering the network failed, because the yaml file used when initializing the master node was faulty.
kubectl logs kube-flannel-ds-amd64-8cqqz -n kube-system
I0602 01:53:54.021093 :514] Determining IP address of default interface
I0602 01:53:54.022514 :527] Using interface with name ens33 and address 192.168.66.10
I0602 01:53:54.022619 :544] Defaulting external address to interface address (192.168.66.10)
I0602 01:53:54.030311 :126] Waiting 10m0s for node controller to sync
I0602 01:53:54.030555 :309] Starting kube subnet manager
I0602 01:53:55.118656 :133] Node controller sync successful
I0602 01:53:55.118754 :244] Created subnet manager: Kubernetes Subnet Manager - k8s-master01
I0602 01:53:55.118765 :247] Installing signal handlers
I0602 01:53:55.119057 :386] Found network config - Backend type: vxlan
I0602 01:53:55.119146 :120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E0602 01:53:55.119470 :289] Error registering network: failed to acquire lease: node "k8s-master01" pod cidr not assigned
I0602 01:53:55.119506 :366]
Solution: modify kubeadm-config.yaml and then re-initialize the master node; see the check sketched below.
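The "pod cidr not assigned" error usually means the cluster was initialized without a podSubnet, so the controller-manager never allocates per-node pod CIDRs. A quick way to verify (the 10.244.0.0/16 subnet mentioned in the comment is flannel's default and an assumption here):
kubectl get node k8s-master01 -o jsonpath='{.spec.podCIDR}'   # empty output means no pod CIDR was assigned
# Make sure kubeadm-config.yaml sets networking.podSubnet (e.g. "10.244.0.0/16" for flannel),
# then reset and re-initialize the master node as shown in Problem 1.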
Problem 4: Initializing the K8S master node fails?
[init] Using Kubernetes version: v1.15.1
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.6. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10251]: Port 10251 is in use
[ERROR Port-10252]: Port 10252 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Cause analysis: the K8S cluster's master node has already been initialized.
Solution: reset K8S and then initialize again.
kubeadm reset
kubeadm init --config=kubeadm-config.yaml --upload-certs |tee kubeadm-init.log
Problem 5: After K8S is reset successfully, do the related files need to be deleted?
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0602 10:20:53.656954 :79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
Cause analysis: none.
Solution: delete the related files as the output suggests (a consolidated sketch follows), to avoid other problems after the master node is initialized again.
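Putting the hints together, a cleanup sketch that can be run after kubeadm reset; the commands are the ones suggested in the output above:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X   # flush iptables rules
ipvsadm --clear                   # reset IPVS tables, if the cluster used IPVS
rm -rf $HOME/.kube/config         # remove the stale kubeconfig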
Problem 6: After the master node is initialized successfully, querying node information fails?
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Cause analysis: cached files were not deleted when kubeadm was reset.
Solution: delete the cached files and then initialize the master node again.
rm -rf $HOME/.kube/
kubeadm reset
>kubeadm-init.log   # truncate the old init log before re-initializing
Problem 7: A worker node hangs while joining the master, showing only a Docker version warning?
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.6. Latest validated version: 18.09
error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s
Cause analysis: the master node's token has expired, and the Docker version is too new.
Solution: using Docker 18.06 removes the warning; regenerate the token on the master node, then run the join command on the worker node with the new token, as sketched below.
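A sketch of that two-step fix; the placeholders stand for the values printed on your master:
# On the master node, print a fresh join command:
kubeadm token create --print-join-command
# On the worker node, run the command it prints, e.g.:
kubeadm join 192.168.1.201:6443 --token <new-token> --discovery-token-ca-cert-hash sha256:<hash>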
Problem 8: A master node fails to join the K8S cluster?
[root@k8s-master02 ~]# kubeadm join 192.168.1.201:6443 --token 6xyv8a.cueltqmpe9qa8nxu --discovery-token-ca-cert-hash sha256:bd78dfd370e47dfca742b5f6934c21014792168fa4dc19c9fa63bfdd87270097 \
> --control-plane --certificate-key b464a8d23d3313c4c0bb5b65648b039cb9b1177dddefbf46e2e296899d0e4516
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance a cluster that doesn't have a stable controlPlaneEndpoint address
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
To see the stack trace of this error execute with --v=5 or higher
Cause analysis: the certificates were not shared.
Solution: share the certificates:
>>Run the following on the other master nodes>>>>>>###
mkdir -p /etc/kubernetes/pki/etcd/
>>Run the following on the master01 node>>>>>>###
cd /etc/kubernetes/pki/
scp ca.* front-proxy-ca.* sa.* 192.168.1.202:/etc/kubernetes/pki/
scp ca.* front-proxy-ca.* sa.* 192.168.1.203:/etc/kubernetes/pki/
>>Run the following on the other master nodes>>>>>>###
kubeadm join 192.168.1.201:6443 --token 6xyv8a.cueltqmpe9qa8nxu --discovery-token-ca-cert-hash sha256:bd78dfd370e47dfca742b5f6934c21014792168fa4dc19c9fa63bfdd87270097 --control-plane --certificate-key b464a8d23d3313c4c0bb5b65648b039cb9b1177dddefbf46e2e296899d0e4516
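Note that the certificate key passed to --certificate-key expires after about two hours. On recent kubeadm versions a fresh key can be generated instead of copying the certificates by hand, as an alternative to the scp approach above:
kubeadm init phase upload-certs --upload-certs   # re-uploads the control-plane certificates and prints a new certificate key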
Problem 9: Deploying Prometheus fails?
unable to recognize "0prometheus-operator-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "alertmanager-alertmanager.yaml": no matches for kind "Alertmanager" in version "monitoring.coreos.com/v1"
unable to recognize "alertmanager-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "grafana-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-state-metrics-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "node-exporter-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "prometheus-prometheus.yaml": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"
unable to recognize "prometheus-rules.yaml": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "prometheus-serviceMonitor.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "prometheus-serviceMonitorApiserver.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "prometheus-serviceMonitorCoreDNS.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "prometheus-serviceMonitorKubeControllerManager.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "prometheus-serviceMonitorKubeScheduler.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "prometheus-serviceMonitorKubelet.yaml": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
Cause analysis: unclear; most likely the Prometheus Operator CRDs (ServiceMonitor, Prometheus, Alertmanager, PrometheusRule) had not been registered yet when these manifests were applied.
Solution: re-run the deployment command.
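If this is the kube-prometheus manifest set, the usual sequence is to apply the CRD/operator manifests first, wait for the CRDs to be served, and only then apply the rest; a sketch assuming the standard kube-prometheus directory layout:
kubectl apply -f manifests/setup/      # registers the Prometheus Operator CRDs
until kubectl get servicemonitors --all-namespaces ; do sleep 1; done    # wait for the ServiceMonitor CRD
kubectl apply -f manifests/            # ServiceMonitor, Prometheus, etc. are now recognized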
Problem 10: Unable to clone the K8S HA installation package with git, although the package can be downloaded directly in a browser?
Cloning into 'k8s-ha-install'...
error: RPC failed; result=35, HTTP code = 0
fatal: The remote end hung up unexpectedly
Cause analysis: the git buffer is too small.
Solution: git config --global http.postBuffer 100M   # increase the git buffer
Problem 11: Unable to clone the K8S HA installation package with git, although the package can be downloaded directly in a browser?
Cause analysis: unclear;
Solution: download the git package manually and upload it to the server;
Problem 12: Listing branches of the K8S installation repository fails?
[root@k8s-master01 k8s-ha-install-master]# git branch -a
fatal: Not a git repository (or any of the parent directories): .git
Cause analysis: the local .git repository is missing.
Solution: initialize git.
[root@k8s-master01 k8s-ha-install-master]# git init
Initialized empty Git repository in /root/install-k8s-v1.17/k8s-ha-install-master/.git/
[root@k8s-master01 k8s-ha-install-master]# git branch -a
Problem 13: Switching branches of the K8S installation repository fails?
[root@k8s-master01 k8s-ha-install-master]# git checkout manual-installation-v1.20.x
error: pathspec 'manual-installation-v1.20.x' did not match any file(s) known to git.
Cause analysis: the branch was not found.
Solution: the repository must be obtained with git; a zip downloaded from a browser cannot be used, and neither can an archive that was packed from a git clone and extracted again.
[root@k8s-master01 k8s-ha-install]# git checkout manual-installation-v1.20.x
Branch manual-installation-v1.20.x set up to track remote branch manual-installation-v1.20.x from origin.
Switched to a new branch 'manual-installation-v1.20.x'
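A sketch of the working sequence; the repository URL is an assumption (a course repo commonly used with this kind of guide) and should be replaced with your actual source:
git clone https://github.com/dotbalo/k8s-ha-install.git   # must be a real git clone, not a browser zip
cd k8s-ha-install
git checkout manual-installation-v1.20.x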
Problem 14: Generating the apiserver aggregation certificate reports an error?
Cause analysis: the command omits the hosts parameter, so the resulting certificate is unsuitable for serving a website; this does not affect communication between the apiserver and the other components.
Solution: no action is needed.
