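Fixing "The swarm does not have a leader"
1 Problem:
A Swarm cluster in our test environment recently went down. The cluster has two manager nodes, and running docker node ls on either of them reported: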
The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online
Yet both manager nodes were clearly online.
2 Analysis:
The docker info command showed an error message:
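The "more than half of the managers" rule in this message comes from the Raft consensus that Swarm managers use to elect a leader. The minimal sketch below computes quorum and fault tolerance. It shows why a two-manager cluster is fragile: quorum is 2, so a single manager that cannot talk to its peer (for example, because of a certificate problem) blocks leader election even while both daemons are running. The function names are illustrative only.

```python
def raft_quorum(managers: int) -> int:
    """Number of managers that must agree before a leader can be elected."""
    if managers < 1:
        raise ValueError("need at least one manager")
    return managers // 2 + 1


def fault_tolerance(managers: int) -> int:
    """How many managers can be unreachable while a leader can still be elected."""
    return managers - raft_quorum(managers)


for n in range(1, 6):
    print(f"{n} managers: quorum={raft_quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Two managers tolerate zero failures, the same as a single manager. Docker's documentation recommends an odd number of managers, such as three or five, for this reason.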
Error: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
Going through the logs of the two nodes one by one, I found errors that were printed periodically.
First manager node:
Mar 4 09:30:05 manager1 dockerd: time="2020-03-04T09:30:05.663865244+08:00" level=error
msg="error sending message to peer" error="rpc error: code = Internal desc = connection error: desc = \"transport: x509: certificate has expired or is not yet valid\""
Second manager node:
Mar 4 09:08:01 manager2 dockerd: time="2020-03-04T09:08:01.446858105+08:00" level=warning
msg="error renewing TLS certificate: rpc error: code = Internal desc = connection error: desc = \"transport: remote error: tls: bad certificate\""
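The most useful clue in logs like these is the error that keeps repeating. dockerd writes these lines in logrus's key=value text format. The minimal sketch below assumes that format, where quoted values use backslash-escaped quotes. It extracts the level, msg and error fields and counts repeats so that periodic errors stand out. The helper names are hypothetical.

```python
import re
from collections import Counter

# key=value pairs; a value is either "quoted, with \" escapes" or a bare token
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logrus_fields(line: str) -> dict:
    """Extract key=value fields from a logrus text-format log line."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if raw.startswith('"'):
            # drop the surrounding quotes and undo backslash escapes
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        fields[key] = raw
    return fields


def count_errors(lines) -> Counter:
    """Count (level, error text) pairs so that periodically repeated errors stand out."""
    counts = Counter()
    for line in lines:
        fields = parse_logrus_fields(line)
        if fields.get("level") in ("error", "warning"):
            # prefer the error field; fall back to msg (the second node's line has no error field)
            text = fields.get("error", fields.get("msg", ""))
            counts[(fields["level"], text)] += 1
    return counts
```

You can feed it lines from whatever system log dockerd writes to on the host.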
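My initial conclusion was that the second manager node's certificate was faulty, most likely because it had expired.
Taking the message at face value, this looks like a bug: renewing the local certificate requires a request to a remote node, and the remote node rejects that request because the certificate is bad, so the node is stuck in a catch-22.
I checked the system time on both machines, and both were correct.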
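The x509 message "certificate has expired or is not yet valid" covers two cases. The current time is either after the certificate's notAfter date or before its notBefore date. A clock that runs behind can trigger the error just as an expired certificate can, so checking the clocks on both hosts is worthwhile. The minimal sketch below classifies a timestamp against a validity window. It assumes that dates use the "Mar  4 01:30:00 2020 GMT" form that OpenSSL prints and that Python's ssl.cert_time_to_seconds parses. The function name is hypothetical.

```python
import ssl
import time
from typing import Optional


def cert_validity_status(not_before: str, not_after: str, now: Optional[float] = None) -> str:
    """Classify `now` (epoch seconds, default: current time) against a validity window.

    Dates are in the "%b %d %H:%M:%S %Y GMT" form, e.g. "Mar  4 01:30:00 2020 GMT".
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    if now < start:
        return "not yet valid"   # e.g. this host's clock is behind
    if now > end:
        return "expired"
    return "valid"
```

Both non-valid results produce the same TLS error. Once the clocks are confirmed correct, a genuinely expired certificate is the remaining explanation.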
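3 Verification:
I used the command
docker swarm ca | openssl x509 -noout -text
to inspect the second manager node's certificate, but the command failed and displayed no certificate information. (Note that docker swarm ca prints the swarm's root CA certificate rather than the node's own certificate. It also queries the manager API, so it is likely to fail while the swarm has no leader.)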
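When only the validity period matters, openssl x509 -noout -dates prints just the notBefore= and notAfter= lines. The minimal sketch below assumes that output format. It converts the two dates into UTC datetimes and reports how many days remain. On a default installation, a node's own certificate is typically stored at /var/lib/docker/swarm/certificates/swarm-node.crt. Treat that path as an assumption and confirm it on your host. The helper names are hypothetical.

```python
import ssl
from datetime import datetime, timezone


def parse_openssl_dates(output: str) -> dict:
    """Parse `openssl x509 -noout -dates` output into timezone-aware UTC datetimes."""
    dates = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in ("notBefore", "notAfter"):
            seconds = ssl.cert_time_to_seconds(value.strip())
            dates[key] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dates


def days_left(dates: dict, now: datetime) -> float:
    """Days until notAfter; negative once the certificate has expired."""
    return (dates["notAfter"] - now).total_seconds() / 86400
```

You might feed it the output of something like openssl x509 -in /var/lib/docker/swarm/certificates/swarm-node.crt -noout -dates.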
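I then opened port 2377 on both nodes directly in Google Chrome: https://x.x.x.x:2377
Clicking the certificate and viewing it showed that the current time fell outside its validity period, so I moved on to renewing the certificate.
This raised the next questions: where is the certificate stored, and how do I renew it? I referred to the content at the following address:
4 Final fix:
Because manager node 2's certificate was no longer valid, I forced it to leave the cluster: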
docker swarm leave --force
Manager node 1 was still not working properly, so I ran this command on manager node 1:
docker swarm init --force-new-cluster --advertise-addr x.x.x.x
(x.x.x.x is your server's IP address)
The command did not complete normally, so I restarted the Docker daemon:
systemctl restart docker
The restart took quite a while. Afterwards I ran the command again:
docker swarm init --force-new-cluster --advertise-addr x.x.x.x
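Because the daemon can take a long time to come back after systemctl restart docker, it helps to poll until it answers before you retry the init command. The minimal sketch below does this. It assumes that docker info exits with a non-zero status while the daemon is unreachable. It uses the Go-template field {{.Swarm.LocalNodeState}} only as a lightweight query. wait_until and docker_is_up are hypothetical helpers, and the check is injectable so that the loop can be tested without Docker.

```python
import subprocess
import time
from typing import Callable


def docker_is_up() -> bool:
    """True once the daemon answers `docker info` (assumes non-zero exit while it is down)."""
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{.Swarm.LocalNodeState}}"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # docker CLI missing, or the call hung
    return result.returncode == 0


def wait_until(check: Callable[[], bool], timeout: float = 600, interval: float = 5) -> bool:
    """Call `check` every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_until(docker_is_up) returns True as soon as the daemon responds, and returns False if the daemon does not respond within ten minutes.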
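The cluster recovered, and the previous deployments and configuration were still there. This is expected, because --force-new-cluster rebuilds the swarm from this manager's existing state. That solved the problem.
(End)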
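The recovery above can be written down as a per-node plan. The minimal sketch below lists only the commands used in this article as argument vectors. It prints them in dry-run mode by default, so you can review the plan before running anything. The role names and functions are hypothetical. The advertise address is validated with Python's ipaddress module, so a placeholder such as x.x.x.x is rejected.

```python
import ipaddress
import subprocess


def recovery_plan(role: str, advertise_addr: str = "") -> list:
    """Commands used in this article for each node (role names are hypothetical)."""
    if role == "broken-manager":
        # the manager whose certificate expired leaves the swarm
        return [["docker", "swarm", "leave", "--force"]]
    if role == "surviving-manager":
        ipaddress.ip_address(advertise_addr)  # raises ValueError for a malformed address
        init = ["docker", "swarm", "init", "--force-new-cluster",
                "--advertise-addr", advertise_addr]
        # here the first init attempt hung, so restart dockerd and then run init
        return [["systemctl", "restart", "docker"], init]
    raise ValueError(f"unknown role: {role}")


def run_plan(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

After --force-new-cluster, the swarm has a single manager. Adding managers back until there are three, using the token from docker swarm join-token manager, restores the ability to survive one manager failure.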