Common Errors found in Rancher setup on RHEL9 machines

By Chebrolu Harika posted Sat July 20, 2024 12:21 AM

  
  1. Rancher says it's unable to connect to Docker
    Error seen:

    Failed to set up SSH tunneling for host [10.70.45.35]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed) 
    Possible Resolution:
    Start and Enable Docker services.
    Run `systemctl enable docker` and `systemctl start docker`
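
Before rerunning Rancher, it can help to confirm that the daemon actually answers. A minimal sketch, assuming a systemd host with the `docker` CLI on the PATH; the function name `ensure_docker` is hypothetical and not part of Rancher or RKE:

```shell
#!/bin/sh
# ensure_docker: enable and start Docker, then confirm the daemon responds.
# Hypothetical helper; run as root on the node that Rancher cannot reach.
ensure_docker() {
  # --now starts the unit in addition to enabling it at boot.
  systemctl enable --now docker || return 1
  # docker info fails when the daemon socket is not reachable.
  if docker info >/dev/null 2>&1; then
    echo "docker is running"
  else
    echo "docker is not responding" >&2
    return 1
  fi
}
```

Calling `ensure_docker` on each node and checking for a zero exit status gives a quick pass/fail before the next `rke up`.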

  2. Rancher says it failed to set up SSH tunneling
    Error seen:

    Failed to set up SSH tunneling for host [10.70.45.35]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [10.70.45.35:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain.
    Possible resolution:
    Enable passwordless authentication using ssh-copy-id
    Reference: https://www.ssh.com/academy/ssh/copy-id
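
Once the key is copied, key-based login can be verified without risking a password prompt. A minimal sketch, assuming OpenSSH on the client node; the function name `check_passwordless_ssh` and the user name `rke` are placeholders, not values from the original setup:

```shell
#!/bin/sh
# check_passwordless_ssh USER HOST: succeed only if key-based login works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless_ssh() {
  user=$1
  host=$2
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true 2>/dev/null; then
    echo "key login OK for $user@$host"
  else
    echo "key login FAILED for $user@$host" >&2
    return 1
  fi
}

# Typical sequence (the user name "rke" is a placeholder):
#   ssh-keygen -t ed25519           # only if no key pair exists yet
#   ssh-copy-id rke@10.70.45.35     # install the public key on the node
#   check_passwordless_ssh rke 10.70.45.35
```

The user passed here should be the same SSH user that is configured for the node in cluster.yml.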
  3. Rancher says the etcd-fix-perm container name is already in use
    Error seen:
    FATA[0623] [etcd] Failed to bring up Etcd Plane: Failed to create [etcd-fix-perm] container on host [10.70.47.156]: Failed to create Docker container [etcd-fix-perm] on host [10.70.47.156]: Error response from daemon: Conflict. The container name "/etcd-fix-perm" is already in use by container "53fe7ab43aa7b4e323d7408827953db2a03a46b15c03a1ce2067be1249200642". You have to remove (or rename) that container to be able to reuse that name. 
    Possible Resolution:
    Rerun `rke up`.
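
If a rerun reports the same conflict again, the leftover container can be removed first, as the error message itself suggests. A minimal sketch, assuming the `docker` CLI on the affected host; `remove_stale_container` is a hypothetical helper name:

```shell
#!/bin/sh
# remove_stale_container NAME: force-remove a leftover container by name.
# Prints a message and does nothing else if no such container exists.
remove_stale_container() {
  name=$1
  # docker container inspect exits non-zero when the container is absent.
  if docker container inspect "$name" >/dev/null 2>&1; then
    docker rm -f "$name" >/dev/null && echo "removed $name"
  else
    echo "no container named $name"
  fi
}

# On the host named in the error: remove_stale_container etcd-fix-perm
```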

  4. Rancher says it's unable to fetch certs
    Error seen:

    FATA[0173] Failed to fetch cluster certs from nodes, aborting upgrade: Certificate /etc/kubernetes/.tmp/kube-proxy.pem is not found
    Possible Resolution:
    Clean up the certs in the /etc/kubernetes folder on the k8s worker nodes.
    Run `cp -R /etc/kubernetes/ssl /etc/kubernetes/.tmp/` on all k8s nodes.
    Run `rm -rf cluster.rkestate` and then `rke up` on the client node.
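
One detail of the copy step is worth checking: if /etc/kubernetes/.tmp already exists, `cp -R ssl .tmp/` places the files under .tmp/ssl rather than directly in .tmp, while the error message looks for /etc/kubernetes/.tmp/kube-proxy.pem. A minimal sketch that copies the directory contents instead, assuming that is the intended layout; `restore_tmp_certs` is a hypothetical helper name:

```shell
#!/bin/sh
# restore_tmp_certs SSL_DIR TMP_DIR: copy every file from SSL_DIR directly
# into TMP_DIR, so that TMP_DIR/kube-proxy.pem exists afterwards.
restore_tmp_certs() {
  ssl_dir=$1
  tmp_dir=$2
  [ -d "$ssl_dir" ] || { echo "missing $ssl_dir" >&2; return 1; }
  mkdir -p "$tmp_dir" || return 1
  # "src/." copies the directory contents rather than the directory itself.
  cp -R "$ssl_dir/." "$tmp_dir/"
}

# On a node: restore_tmp_certs /etc/kubernetes/ssl /etc/kubernetes/.tmp
```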

  5. Rancher says that the etcd containers are unhealthy
    Error seen:

    WARN[1481] [etcd] host [10.70.45.35] failed to check etcd health: failed to get /health for host [10.70.45.35]: Get "https://10.70.45.35:2379/health": net/http: TLS handshake timeout . FATA[1481] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [10.70.47.156,10.70.44.47,10.70.45.35] failed to report healthy. Check etcd container logs on each host for more information 
    Possible Resolution:
    Replace the service account token key pair with copies of the kube-apiserver key pair:
    [root@localhost ~]# rm -f /etc/kubernetes/ssl/kube-service-account-token-key.pem
    [root@localhost ~]# rm -f /etc/kubernetes/ssl/kube-service-account-token.pem
    [root@localhost ~]# cp /etc/kubernetes/ssl/kube-apiserver-key.pem /etc/kubernetes/ssl/kube-service-account-token-key.pem
    [root@localhost ~]# cp /etc/kubernetes/ssl/kube-apiserver.pem /etc/kubernetes/ssl/kube-service-account-token.pem
    If the above steps don't work, clean up all k8s worker nodes using the script below. It force-removes every container and Docker volume on the node, so run it only on nodes dedicated to the cluster.

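    #!/bin/sh
    # WARNING: removes ALL containers and volumes on this node, plus etcd/Kubernetes state.
    containers=$(docker ps -qa)
    # Left unquoted on purpose so each ID becomes a separate argument.
    [ -n "$containers" ] && docker rm -f $containers
    volumes=$(docker volume ls -q)
    [ -n "$volumes" ] && docker volume rm $volumes
    cleanupdirs="/var/lib/etcd /etc/kubernetes /etc/cni /opt/cni /var/lib/cni /var/run/calico /opt/rke"
    for dir in $cleanupdirs; do
      echo "Removing $dir"
      rm -rf "$dir"
    done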
  6. Rancher says that the nodes probably have their firewalls enabled or that there are network issues.
    Error seen:
    [network] Host [192.168.x.y] is not able to connect to the following ports: [192.168.x.y:2379]. Please check network policies and firewall rules
    Possible Resolution:
    Run `rke up --config ./cluster.yml --disable-port-check`, or open the required ports in the firewall using the commands below.

    `firewall-cmd --add-port=22/tcp --add-port=80/tcp --add-port=2376/tcp --add-port=2379/tcp --add-port=2380/tcp --add-port=8472/udp --add-port=9099/tcp --add-port=10250/tcp --add-port=6443/tcp --add-port=10254/tcp --add-port=30000-32767/tcp --add-port=30000-32767/udp --permanent`

    `firewall-cmd --reload`
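
After the reload, it can be useful to confirm that the ports really are open before rerunning `rke up`. A minimal sketch that compares a required list against the output of `firewall-cmd --list-ports`; the function name `missing_ports` and the short port list in the example are illustrative assumptions:

```shell
#!/bin/sh
# missing_ports "REQUIRED" "OPEN": print each required port/protocol entry
# that does not appear in the open list. Both arguments are space-separated.
missing_ports() {
  required=$1
  open=$2
  for p in $required; do
    case " $open " in
      *" $p "*) ;;            # already open
      *) echo "$p" ;;         # not open yet
    esac
  done
}

# On a node, compare against the active firewalld configuration:
#   missing_ports "2379/tcp 2380/tcp 6443/tcp 10250/tcp" "$(firewall-cmd --list-ports)"
```

Empty output means every listed entry is present. The comparison is a literal string match, so a range such as 30000-32767/tcp has to be written exactly as it was added.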

  7. kubectl says it's unable to connect to the API server to get details.
    Error seen:
    While running `kubectl get pods` - E0719 13:59:02.414177   17925 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
    Possible Resolution:
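    Export KUBECONFIG on the master node so that it points to the cluster's kubeconfig file. Without it, kubectl falls back to localhost:8080, which is the address shown in the error.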
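
A minimal sketch of the export step, assuming RKE wrote its default `kube_config_cluster.yml` into the directory where `rke up` was run; the function name `use_rke_kubeconfig` is hypothetical:

```shell
#!/bin/sh
# use_rke_kubeconfig DIR: point kubectl at DIR/kube_config_cluster.yml.
# Must run in the current shell (not a child process) for the export to persist.
use_rke_kubeconfig() {
  cfg="$1/kube_config_cluster.yml"
  if [ -f "$cfg" ]; then
    export KUBECONFIG="$cfg"
    echo "KUBECONFIG=$KUBECONFIG"
  else
    echo "no kubeconfig at $cfg" >&2
    return 1
  fi
}

# Example: use_rke_kubeconfig "$PWD" && kubectl get pods
```

To keep the setting across logins, the same `export` line can be added to the shell profile of the user who runs kubectl.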
