[k8s-5] Errors during kubeadm init

Date: 2023-03-31 10:33:55

After running kubeadm init, the process appears to hang at the kubelet health check. The log looks like this.

[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I1031 14:44:25.770815   10034 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I1031 14:44:25.770828   10034 waitcontrolplane.go:89] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
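
The check that kubeadm keeps retrying can be reproduced by hand. The following is a minimal sketch, assuming curl is installed; the function name kubelet_healthz is hypothetical, and the default URL is the one printed in the log above.

```shell
# Probe the kubelet health endpoint, mirroring the curl call quoted by [kubelet-check].
# Prints "healthy" when the endpoint answers successfully, "unhealthy" otherwise
# (connection refused, timeout, HTTP error, or curl missing).
kubelet_healthz() {
  url="${1:-http://localhost:10248/healthz}"
  if curl -fsSL --max-time 2 "$url" >/dev/null 2>&1; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}
```

Calling kubelet_healthz with no argument probes the default endpoint. While the kubelet is crash-looping, as it is here, the call should print "unhealthy".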

Next, following the hints in the output, check whether the kubelet ran into a problem while starting.

	Unfortunately, an error has occurred:
		timed out waiting for the condition

	This error is likely caused by:
		- The kubelet is not running
		- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

	If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
		- 'systemctl status kubelet'
		- 'journalctl -xeu kubelet'

	Additionally, a control plane component may have crashed or exited when started by the container runtime.
	To troubleshoot, list all containers using your preferred container runtimes CLI.

	Here is one example how you may list all Kubernetes containers running in docker:
		- 'docker ps -a | grep kube | grep -v pause'
		Once you have found the failing container, you can inspect its logs with:
		- 'docker logs CONTAINERID'

Take a closer look at the kubelet's startup status.

[root@VM-23-145-centos ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Sun 2021-10-31 18:53:46 CST; 6s ago
     Docs: https://kubernetes.io/docs/
  Process: 15696 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
 Main PID: 15696 (code=exited, status=1/FAILURE)

Oct 31 18:53:46 VM-23-145-centos systemd[1]: Unit kubelet.service entered failed state.
Oct 31 18:53:46 VM-23-145-centos systemd[1]: kubelet.service failed.

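In the end, the kubelet log revealed the cause of the startup failure: the kubelet is configured with the systemd cgroup driver, while Docker is using cgroupfs.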

Oct 31 18:54:17 VM-23-145-centos kubelet[16036]: I1031 18:54:17.310044   16036 docker_service.go:264] "Docker Info" dockerInfo=&{ID:AU55:ZTZU:4WX2:CKY5:KCPE:OPOQ:EUEZ:AXZY:GP7R:7CZV:6LBL:V6WA Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:7 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:25 OomKillDisable:true NGoroutines:34 SystemTime:2021-10-31T18:54:17.30101368+08:00 LoggingDriver:json-file CgroupDriver:cgroupfs CgroupVersion:1 NEventsListener:0 KernelVersion:3.10.0-1160.31.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSVersion:7 OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000b7c310 NCPU:8 MemTotal:16655941632 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:VM-23-145-centos Labels:[] ExperimentalBuild:false ServerVersion:20.10.10 ClusterStore: ClusterAdvertise: Runtimes:map[io.containerd.runc.v2:{Path:runc Args:[] Shim:<nil>} io.containerd.runtime.v1.linux:{Path:runc Args:[] Shim:<nil>} runc:{Path:runc Args:[] Shim:<nil>}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:5b46e404f6b9f661a205e28d59c982d3634148f8 Expected:5b46e404f6b9f661a205e28d59c982d3634148f8} RuncCommit:{ID:v1.0.2-0-g52b36a2 Expected:v1.0.2-0-g52b36a2} InitCommit:{ID:de40ad0 Expected:de40ad0} SecurityOptions:[name=seccomp,profile=default] ProductLicense: DefaultAddressPools:[] Warnings:[]}
Oct 31 18:54:17 VM-23-145-centos kubelet[16036]: E1031 18:54:17.310092   16036 server.go:294] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs""
Oct 31 18:54:17 VM-23-145-centos systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Oct 31 18:54:17 VM-23-145-centos systemd[1]: Unit kubelet.service entered failed state.
Oct 31 18:54:17 VM-23-145-centos systemd[1]: kubelet.service failed.
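
The decisive line is easy to miss next to the very long "Docker Info" dump. The following is a small sketch that pulls the two driver names out of such a line; the function name cgroup_mismatch is hypothetical, and the pattern assumes the message wording shown in the log above.

```shell
# Read kubelet log lines on stdin and print the two cgroup drivers named in a
# "misconfiguration" error, e.g. "kubelet=systemd docker=cgroupfs".
# Backslashes are stripped first so that escaped quotes (\") are handled as well.
cgroup_mismatch() {
  tr -d '\\' |
    sed -n 's/.*kubelet cgroup driver: "\([a-z]*\)" is different from docker cgroup driver: "\([a-z]*\)".*/kubelet=\1 docker=\2/p'
}
```

One way to use it is journalctl -u kubelet --no-pager | cgroup_mismatch. The driver Docker is currently using can also be read directly with docker info --format '{{.CgroupDriver}}'.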

This is easy to fix: change Docker's configuration as shown below so that it uses the systemd cgroup driver, then restart Docker.

[root@VM-23-145-centos ~]# cat /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
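
The change can be scripted roughly as follows. This is a sketch, not the author's exact procedure: the function name write_docker_cgroup_config is hypothetical, and it overwrites the target file, so an existing daemon.json with other settings should be merged by hand instead.

```shell
# Write a daemon.json that sets Docker's cgroup driver to systemd.
# The target path is a parameter so the function can be tried on a scratch file first.
# Caution: the target file is overwritten.
write_docker_cgroup_config() {
  target="${1:-/etc/docker/daemon.json}"
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
}
```

On the node itself this would be followed by systemctl restart docker. Afterwards, docker info --format '{{.CgroupDriver}}' should report systemd.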

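Then run kubeadm init again, and the kubelet starts normally. (If the preflight checks complain about files or ports left over from the failed attempt, run kubeadm reset first.)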

[root@VM-23-145-centos ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2021-10-31 18:58:15 CST; 30s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 19917 (kubelet)
    Tasks: 18
   Memory: 37.0M
   CGroup: /system.slice/kubelet.service
           └─19917 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config...

Oct 31 18:58:26 VM-23-145-centos kubelet[19917]: I1031 18:58:26.059691   19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:26 VM-23-145-centos kubelet[19917]: E1031 18:58:26.209071   19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Oct 31 18:58:31 VM-23-145-centos kubelet[19917]: I1031 18:58:31.060399   19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:31 VM-23-145-centos kubelet[19917]: E1031 18:58:31.217841   19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Oct 31 18:58:36 VM-23-145-centos kubelet[19917]: I1031 18:58:36.060914   19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:36 VM-23-145-centos kubelet[19917]: E1031 18:58:36.225707   19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Oct 31 18:58:41 VM-23-145-centos kubelet[19917]: I1031 18:58:41.062031   19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:41 VM-23-145-centos kubelet[19917]: E1031 18:58:41.232789   19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Oct 31 18:58:46 VM-23-145-centos kubelet[19917]: I1031 18:58:46.063112   19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:46 VM-23-145-centos kubelet[19917]: E1031 18:58:46.240827   19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Hint: Some lines were ellipsized, use -l to show in full.

Finally, if you are the root user, run the following commands and kubectl works normally.

[root@VM-23-145-centos ~]# export KUBECONFIG=/etc/kubernetes/admin.conf
[root@VM-23-145-centos ~]# kubectl get pods
No resources found in default namespace.
[root@VM-23-145-centos ~]# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
coredns-78fcd69978-4rs67                   0/1     Pending   0          15m
coredns-78fcd69978-tjq2q                   0/1     Pending   0          15m
etcd-vm-23-145-centos                      1/1     Running   0          15m
kube-apiserver-vm-23-145-centos            1/1     Running   0          15m
kube-controller-manager-vm-23-145-centos   1/1     Running   0          15m
kube-proxy-j9pdn                           1/1     Running   0          15m
kube-scheduler-vm-23-145-centos            1/1     Running   0          15m
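
In the listing above the control plane Pods are Running, while the two coredns Pods are still Pending. That is typical until a Pod network add-on is installed, and it is consistent with the "Container runtime network not ready" messages in the kubelet log. A small sketch for spotting such Pods quickly follows; the function name not_running_pods is hypothetical.

```shell
# Print the names of Pods whose STATUS column is anything other than "Running".
# Reads the table printed by `kubectl get pods` on stdin and skips the header row.
# Note: Pods in a final state such as Completed are listed as well.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}
```

Example use: kubectl get pods -n kube-system | not_running_pods.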