IBM FlashSystem

Common Errors found in Rancher setup on RHEL9 machines

By Chebrolu Harika posted Sat July 20, 2024 12:21 AM

Rancher says it's unable to connect to Docker
Error seen:
Failed to set up SSH tunneling for host [10.70.45.35]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed)
Possible Resolution:
Start and enable the Docker service on the affected node.
Run `systemctl enable docker` and `systemctl start docker`
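The two commands can also be combined as `systemctl enable --now docker`. The following is a minimal sketch, assuming a systemd host and root privileges; the function names `run` and `ensure_docker` and the `DRY_RUN` variable are hypothetical helpers introduced here for illustration, not part of Docker or Rancher.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: enable and start Docker, then confirm the daemon answers.
ensure_docker() {
    # --now enables the unit at boot and starts it immediately.
    run systemctl enable --now docker || return 1
    # docker info talks to /var/run/docker.sock, the socket named in the error.
    run docker info --format '{{.ServerVersion}}' || return 1
}
```

Calling `DRY_RUN=1 ensure_docker` only prints the two commands; calling `ensure_docker` as root runs them and fails if the daemon still does not respond.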
Rancher says it failed to set up SSH tunneling
Error seen:
Failed to set up SSH tunneling for host [10.70.45.35]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [10.70.45.35:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain.
Possible Resolution:
Enable passwordless (key-based) SSH authentication from the machine running `rke` to each node using `ssh-copy-id`.
Reference: https://www.ssh.com/academy/ssh/copy-id
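As a rough sketch of that step, the loop below copies a public key to each node and then checks that a non-interactive login works. The function name `copy_key_to_nodes`, the `KEY` and `DRY_RUN` variables, and the user name in the usage note are all hypothetical; the default key path assumes the key pair is `~/.ssh/id_rsa`, which may differ from the key configured in your `cluster.yml`.

```shell
#!/bin/sh
# Hypothetical helper: copy_key_to_nodes USER NODE [NODE...]
# KEY overrides the public key path; DRY_RUN=1 prints the commands only.
copy_key_to_nodes() {
    user=$1
    shift
    key=${KEY:-$HOME/.ssh/id_rsa.pub}
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i $key $user@$node"
        else
            ssh-copy-id -i "$key" "$user@$node" || return 1
            # BatchMode=yes makes ssh fail instead of prompting for a password,
            # so this only succeeds when key-based login really works.
            ssh -o BatchMode=yes "$user@$node" true || return 1
        fi
    done
}
```

For example, `copy_key_to_nodes root 10.70.45.35` would target the node from the error above; substitute the SSH user that your cluster configuration actually uses.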
Rancher says a container name is already in use
Error seen:
FATA[0623] [etcd] Failed to bring up Etcd Plane: Failed to create [etcd-fix-perm] container on host [10.70.47.156]: Failed to create Docker container [etcd-fix-perm] on host [10.70.47.156]: Error response from daemon: Conflict. The container name "/etcd-fix-perm" is already in use by container "53fe7ab43aa7b4e323d7408827953db2a03a46b15c03a1ce2067be1249200642". You have to remove (or rename) that container to be able to reuse that name.
Possible Resolution:
Just rerun `rke up`.
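If the conflict persists after a rerun, the error message itself suggests removing the leftover container. The sketch below is an assumption about how that could be done on the affected host, not a step from the original procedure; `remove_stale_container` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: remove a leftover container by name if it exists.
# Defaults to the container named in the error message.
remove_stale_container() {
    name=${1:-etcd-fix-perm}
    # docker container inspect exits non-zero when no such container exists.
    if docker container inspect "$name" >/dev/null 2>&1; then
        docker rm -f "$name"
    else
        echo "no container named $name"
    fi
}
```

Run `remove_stale_container` on the host named in the error, then run `rke up` again from the client node.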
Rancher says it's unable to fetch certs
Error seen:
FATA[0173] Failed to fetch cluster certs from nodes, aborting upgrade: Certificate /etc/kubernetes/.tmp/kube-proxy.pem is not found
Possible Resolution:
Clean up the certs in the /etc/kubernetes folder on the k8s worker nodes.
Run `cp -R /etc/kubernetes/ssl /etc/kubernetes/.tmp/` on all k8s nodes.
Run `rm -rf cluster.rkestate` and then `rke up` on the client node.
Rancher says that etcd containers are unhealthy
Error seen:
WARN[1481] [etcd] host [10.70.45.35] failed to check etcd health: failed to get /health for host [10.70.45.35]: Get "https://10.70.45.35:2379/health": net/http: TLS handshake timeout
FATA[1481] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [10.70.47.156,10.70.44.47,10.70.45.35] failed to report healthy. Check etcd container logs on each host for more information
Possible Resolution:
Replace the service account token key pair with a copy of the kube-apiserver key pair:
[root@localhost ~]# rm -f /etc/kubernetes/ssl/kube-service-account-token-key.pem
[root@localhost ~]# rm -f /etc/kubernetes/ssl/kube-service-account-token.pem
[root@localhost ~]# cp /etc/kubernetes/ssl/kube-apiserver-key.pem /etc/kubernetes/ssl/kube-service-account-token-key.pem
[root@localhost ~]# cp /etc/kubernetes/ssl/kube-apiserver.pem /etc/kubernetes/ssl/kube-service-account-token.pem
If the above steps don't work, clean up all k8s worker nodes using the script below. Note that it deletes every container and Docker volume on the node.

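#!/bin/sh
# Remove all containers (running or stopped) and all Docker volumes on this node.
docker rm -f $(docker ps -qa)
docker volume rm $(docker volume ls -q)
# Directories left behind by etcd, Kubernetes, the CNI plugins, and RKE.
cleanupdirs="/var/lib/etcd /etc/kubernetes /etc/cni /opt/cni /var/lib/cni /var/run/calico /opt/rke"
for dir in $cleanupdirs; do
    echo "Removing $dir"
    rm -rf "$dir"
done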
Rancher says that the nodes probably have their firewalls enabled or that there are network issues
Error seen:
[network] Host [192.168.x.y] is not able to connect to the following ports: [192.168.x.y:2379]. Please check network policies and firewall rules
Possible Resolution:
Run `rke up --config ./cluster.yml --disable-port-check` to skip the port check, or open those ports in the firewall on each node using the commands below:

`firewall-cmd --add-port=22/tcp --add-port=80/tcp --add-port=2376/tcp --add-port=2379/tcp --add-port=2380/tcp --add-port=8472/udp --add-port=9099/tcp --add-port=10250/tcp --add-port=6443/tcp --add-port=10254/tcp --add-port=30000-32767/tcp --add-port=30000-32767/udp --permanent`

`firewall-cmd --reload`
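When only a few ports are blocked, it can be easier to build the command from a short list than to edit the long one-liner. The sketch below is one hypothetical way to do that; `build_firewall_cmd` is an illustrative name, and it only prints the command so that it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: print a firewall-cmd line that permanently opens
# each PORT/PROTOCOL argument (for example 2379/tcp).
build_firewall_cmd() {
    cmd="firewall-cmd --permanent"
    for p in "$@"; do
        cmd="$cmd --add-port=$p"
    done
    echo "$cmd"
}
```

For the etcd ports, `build_firewall_cmd 2379/tcp 2380/tcp` prints `firewall-cmd --permanent --add-port=2379/tcp --add-port=2380/tcp`. After running the printed command and `firewall-cmd --reload`, `firewall-cmd --list-ports` shows which ports are open.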
kubectl says it's unable to connect to the API server to get details
Error seen:
While running `kubectl get pods` - E0719 13:59:02.414177 17925 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
Possible Resolution:
Export `KUBECONFIG` on the master node so that it points to the cluster's kubeconfig file.
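Without a kubeconfig, kubectl falls back to `http://localhost:8080`, which is why the connection is refused. A minimal sketch of the export follows, assuming the kubeconfig is the `kube_config_cluster.yml` file that RKE writes next to `cluster.yml` by default; `use_rke_kubeconfig` is a hypothetical helper name, and the directory argument is a placeholder for wherever that file was copied.

```shell
#!/bin/sh
# Hypothetical helper: point KUBECONFIG at the RKE-generated kubeconfig.
# Usage: use_rke_kubeconfig [DIR]   (DIR defaults to the current directory)
use_rke_kubeconfig() {
    dir=${1:-$PWD}
    cfg="$dir/kube_config_cluster.yml"
    if [ ! -f "$cfg" ]; then
        echo "kubeconfig not found: $cfg" >&2
        return 1
    fi
    export KUBECONFIG="$cfg"
}
```

The function must be defined and called in the current shell (for example by sourcing the file) so that the exported variable is visible to subsequent `kubectl get pods` calls.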
