Installing Metrics Server and Resolving Errors

Time: 2023-04-18 17:00:46

While using the top command to check Pod CPU and memory usage in a Kubernetes test environment, I ran into the following problem:

$ kubectl top po
W0818 03:22:46.090578   26207 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
error: Metrics API not available

As the error message "Metrics API not available" shows, the cause is that the metrics-server component is not installed in this Kubernetes environment.
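Before installing anything, it can help to confirm that the component really is missing. The following is a minimal sketch, assuming the default names used by the upstream manifest (the APIService v1beta1.metrics.k8s.io and a Deployment called metrics-server in kube-system). The function name check_metrics_api is hypothetical.

```shell
# Sketch: report whether the Metrics API and its backing Deployment exist.
# check_metrics_api is a hypothetical helper name, not part of kubectl.
check_metrics_api() {
  # APIService that "kubectl top" relies on.
  kubectl get apiservice v1beta1.metrics.k8s.io ||
    echo "Metrics API is not registered"
  # The upstream manifest deploys metrics-server into kube-system.
  kubectl get deployment metrics-server -n kube-system ||
    echo "metrics-server Deployment not found"
}

# Usage: check_metrics_api
```

If both lookups fail, the component has most likely never been installed in the cluster.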

To install metrics-server, follow the installation instructions on GitHub at https://github.com/kubernetes-sigs/metrics-server, as shown below:

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

If the network can reach k8s.gcr.io without restriction, the image "k8s.gcr.io/metrics-server/metrics-server:v0.5.0" is pulled and the installation completes.

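If that registry is not reachable from your network, see the article "The Most Complete Tutorial: Building Overseas Docker Images with the Free Alibaba Cloud Image Registry" for a workaround.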

After the apply above, check the Pod that runs metrics-server:

$ kubectl get pods --all-namespaces | grep metrics
NAMESPACE     NAME                                READY   STATUS      RESTARTS   AGE
kube-system   metrics-server-6dfddc5fb8-f54vr     0/1     Running     0          44s

The metrics-server Pod is running but not yet Ready (0/1). Use describe to view its details:

……
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  79s               default-scheduler  Successfully assigned kube-system/metrics-server-6dfddc5fb8-f54vr to loki
  Normal   Pulled     78s               kubelet            Container image "k8s.gcr.io/metrics-server/metrics-server:v0.5.0" already present on machine
  Normal   Created    78s               kubelet            Created container metrics-server
  Normal   Started    78s               kubelet            Started container metrics-server
  Warning  Unhealthy  9s (x5 over 49s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500

The event list contains the warning "Readiness probe failed: HTTP probe failed with statuscode: 500".
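The generated Pod name changes with every rollout, so it can be convenient to select the Pod by label instead. This is a sketch that assumes the Pod carries the label k8s-app=metrics-server, as in the upstream manifest. The function name describe_metrics_server is hypothetical.

```shell
# Sketch: list and describe metrics-server Pods without knowing their names.
# describe_metrics_server is a hypothetical helper name.
describe_metrics_server() {
  # Show readiness at a glance (READY column).
  kubectl get pods -n kube-system -l k8s-app=metrics-server
  # Print details, including the Events section with probe failures.
  kubectl describe pods -n kube-system -l k8s-app=metrics-server
}

# Usage: describe_metrics_server
```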

Next, check the Pod's logs:

$ kubectl logs -n kube-system metrics-server-6dfddc5fb8-f54vr
……
I0816 01:07:06.734107       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I0816 01:07:16.736864       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
E0816 01:07:18.834625       1 scraper.go:139] "Failed to scrape node" err="Get "https://192.168.130.100:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.130.100 because it doesn't contain any IP SANs" node="loki"

The logs show why the readiness probe fails: Metrics Server has no metrics to serve because it cannot scrape the kubelet. The underlying error is "cannot validate certificate for 192.168.130.100 because it doesn't contain any IP SANs".
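To see which names the kubelet certificate actually covers, one option is to fetch it with openssl and print its Subject Alternative Names. This is only a sketch: it assumes openssl is installed on the machine and that the kubelet port 10250 from the log above is reachable. The function name show_kubelet_sans is hypothetical.

```shell
# Sketch: print the Subject Alternative Names of a kubelet serving certificate.
# show_kubelet_sans is a hypothetical helper name.
show_kubelet_sans() {
  node_ip=$1
  # Fetch the certificate presented on the kubelet port, decode it,
  # and keep only the SAN section (header line plus the line after it).
  openssl s_client -connect "${node_ip}:10250" </dev/null 2>/dev/null |
    openssl x509 -noout -text |
    grep -A1 'Subject Alternative Name'
}

# Usage: show_kubelet_sans 192.168.130.100
```

Empty output, or a list that contains only DNS entries, would be consistent with the x509 error in the log.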

Check the readiness probe definition in the Metrics Server Deployment manifest:

……
readinessProbe:
    failureThreshold: 3
    httpGet:
        path: /readyz
        port: https
        scheme: HTTPS
    initialDelaySeconds: 20
    periodSeconds: 10
……

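As shown above, the readiness probe decides whether the container is available (Ready) through an HTTPGetAction: the kubelet periodically sends an HTTPS GET request to the /readyz path on the container's https port.

  1. "initialDelaySeconds: 20": wait 20s after the container starts before the first check;
  2. "periodSeconds: 10": probe every 10s.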
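The two settings above give a rough probe schedule. The sketch below is a simplified model that ignores the small random offset the kubelet adds before it starts probing. The function name probe_time is hypothetical.

```shell
# Sketch: approximate time (seconds after container start) of the n-th probe.
# probe_time is a hypothetical helper; arguments: initialDelaySeconds periodSeconds n
probe_time() {
  initial_delay=$1
  period=$2
  n=$3
  echo $(( initial_delay + (n - 1) * period ))
}

probe_time 20 10 1   # prints 20: first probe
probe_time 20 10 3   # prints 40: third probe
```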

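For the error "cannot validate certificate for 192.168.130.100 because it doesn't contain any IP SANs", read the installation notes on GitHub carefully: https://github.com/kubernetes-sigs/metrics-server. They mention two relevant points:

1) Requirements: the kubelet certificate needs to be signed by the cluster Certificate Authority (or certificate validation can be disabled by passing --kubelet-insecure-tls to Metrics Server, which is insecure).

2) Configuration: with "--kubelet-insecure-tls" set, Metrics Server does not verify the CA of the serving certificates presented by kubelets. This is for testing purposes only.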

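This Kubernetes environment is used only for local testing, so the components.yaml applied earlier can be modified to add the "--kubelet-insecure-tls" argument, as shown below (the surrounding content is omitted):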

……
template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
        imagePullPolicy: IfNotPresent
……
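The manual edit above can also be scripted. The following is a minimal sketch, assuming the manifest has been saved locally as components.yaml and contains the --metric-resolution argument shown above. The function name add_insecure_tls_flag is hypothetical.

```shell
# Sketch: add --kubelet-insecure-tls to a locally saved manifest (test clusters only).
# add_insecure_tls_flag is a hypothetical helper name.
add_insecure_tls_flag() {
  manifest=$1
  # Do nothing if the flag is already present.
  if grep -q -- '--kubelet-insecure-tls' "$manifest"; then
    return 0
  fi
  # Insert the flag right after the --metric-resolution argument,
  # reusing that line's indentation so the YAML list stays aligned.
  awk '
    { print }
    /^[ \t]*- --metric-resolution=/ {
      indent = $0
      sub(/-.*/, "", indent)
      print indent "- --kubelet-insecure-tls"
    }
  ' "$manifest" > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"
}

# Usage:
#   curl -L -o components.yaml https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
#   add_insecure_tls_flag components.yaml
```

Running the helper twice leaves the file unchanged the second time, because the flag is detected first.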

Then re-apply components.yaml and check the Pod and its details:

$ kubectl apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

$ kubectl get pods --all-namespaces | grep metrics
NAMESPACE    NAME                              READY   STATUS      RESTARTS    AGE
kube-system  metrics-server-6dfddc5fb8-f54vr   1/1     Running      0          44s

$ kubectl describe pod/metrics-server-5cd859f5c-nr7h5 -n kube-system
……
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  41s   default-scheduler  Successfully assigned kube-system/metrics-server-5cd859f5c-nr7h5 to loki
  Normal  Pulled     40s   kubelet            Container image "k8s.gcr.io/metrics-server/metrics-server:v0.5.0" already present on machine
  Normal  Created    40s   kubelet            Created container metrics-server
  Normal  Started    40s   kubelet            Started container metrics-server

Everything shown above looks normal.
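Instead of polling the Pod list by hand, the rollout and the API registration can be awaited. This is a sketch that assumes the default resource names from the upstream manifest; the timeouts are arbitrary choices and the function name wait_for_metrics_server is hypothetical.

```shell
# Sketch: block until metrics-server is rolled out and its APIService is Available.
# wait_for_metrics_server is a hypothetical helper name.
wait_for_metrics_server() {
  # Wait for the Deployment to finish rolling out.
  kubectl rollout status deployment/metrics-server -n kube-system --timeout=120s &&
    # Then wait for the aggregated Metrics API to report Available.
    kubectl wait --for=condition=Available apiservice/v1beta1.metrics.k8s.io --timeout=60s
}

# Usage: wait_for_metrics_server
```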

Verify the top command:

$ kubectl top po
W0815 21:13:43.801129   47845 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME                         CPU(cores)   MEMORY(bytes)
cpu-loader-5c8d96447-49rzf       0m           5Mi
cpu-loader-5c8d96447-4ptbt       0m           3Mi
cpu-loader-5c8d96447-f4nsc       0m           3Mi

Verification passed.
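As an additional check, node metrics can be queried as well, either through kubectl top or directly from the Metrics API. This is a sketch only; the function name query_node_metrics is hypothetical.

```shell
# Sketch: query node metrics through kubectl and through the raw Metrics API.
# query_node_metrics is a hypothetical helper name.
query_node_metrics() {
  # Human-readable CPU and memory usage per node.
  kubectl top node
  # Same data as JSON, straight from the aggregated API endpoint.
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
}

# Usage: query_node_metrics
```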