IBM FlashSystem


Common Errors found in Rancher setup on RHEL9 machines

By Chebrolu Harika posted Sat July 20, 2024 12:21 AM

  
  1. Rancher says it's unable to connect to Docker
    Error seen:

    Failed to set up SSH tunneling for host [10.70.45.35]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access the service on /var/run/docker.sock. The service might be still starting up. Error: ssh: rejected: connect failed (open failed) 
    Possible Resolution:
    Start and enable the Docker service on the affected node.
    Run `systemctl enable docker` and `systemctl start docker`.
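As a rough sketch, the two commands can be combined and followed by a check that the daemon actually answers. The names `docker_socket_ready` and `start_docker` are hypothetical helpers, not part of Docker or Rancher; the socket path is the one shown in the error message.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if a Unix socket exists at the given path.
docker_socket_ready() {
    [ -S "${1:-/var/run/docker.sock}" ]
}

# Enable Docker at boot and start it now, then confirm the daemon responds.
# Meant to be run as root on the node named in the error.
start_docker() {
    systemctl enable --now docker &&
        docker_socket_ready /var/run/docker.sock &&
        docker info >/dev/null
}
```

Calling `start_docker` returns a non-zero status if the service did not come up, which makes it easy to spot a node that still needs attention before rerunning `rke up`.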

  2. Rancher says it failed to set up SSH tunneling
    Error seen:

    Failed to set up SSH tunneling for host [10.70.45.35]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [10.70.45.35:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain.
    Possible Resolution:
    Enable passwordless (key-based) SSH authentication to the node using `ssh-copy-id`.
    Reference: https://www.ssh.com/academy/ssh/copy-id
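A minimal sketch of the key setup follows. The user name `rke` and the key path `~/.ssh/id_ed25519` are assumptions chosen for illustration, and `ssh_target` and `setup_key_login` are hypothetical helper names; substitute the user and key that your cluster.yml refers to.

```shell
#!/bin/sh
# Assumed values for illustration; replace with your own user and key.
SSH_USER="${SSH_USER:-rke}"
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_ed25519}"

# Print the login target for a node, e.g. rke@10.70.45.35.
ssh_target() {
    printf '%s@%s\n' "$SSH_USER" "$1"
}

# Create a key if none exists, copy the public key to the node,
# then confirm key-only login works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
setup_key_login() {
    [ -f "$SSH_KEY" ] || ssh-keygen -t ed25519 -N '' -f "$SSH_KEY"
    ssh-copy-id -i "$SSH_KEY.pub" "$(ssh_target "$1")" &&
        ssh -i "$SSH_KEY" -o BatchMode=yes "$(ssh_target "$1")" true
}
```

For example, `setup_key_login 10.70.45.35` run from the client node. The same user and private key then need to match the `user` and `ssh_key_path` entries for that node in cluster.yml.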
  3. Rancher says a container name is already in use
    Error seen:

    FATA[0623] [etcd] Failed to bring up Etcd Plane: Failed to create [etcd-fix-perm] container on host [10.70.47.156]: Failed to create Docker container [etcd-fix-perm] on host [10.70.47.156]: Error response from daemon: Conflict. The container name "/etcd-fix-perm" is already in use by container "53fe7ab43aa7b4e323d7408827953db2a03a46b15c03a1ce2067be1249200642". You have to remove (or rename) that container to be able to reuse that name.
    Possible Resolution:
    Rerun `rke up`.
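If the rerun keeps hitting the same conflict, the error message itself suggests removing the leftover container. A minimal sketch, assuming the container is named `etcd-fix-perm` as in the log above; `remove_stale_container` is a hypothetical helper name, and the `DOCKER` override exists only so the command can be previewed without running it.

```shell
#!/bin/sh
# Hypothetical helper: force-remove a container by name.
# Set DOCKER=echo to print the command instead of executing it.
remove_stale_container() {
    name="$1"
    ${DOCKER:-docker} rm -f "$name"
}
```

Run `remove_stale_container etcd-fix-perm` on the host named in the error, then run `rke up` again from the client node.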

  4. Rancher says it's unable to fetch certs
    Error seen:

    FATA[0173] Failed to fetch cluster certs from nodes, aborting upgrade: Certificate /etc/kubernetes/.tmp/kube-proxy.pem is not found
    Possible Resolution:
    Clean up the certs in the /etc/kubernetes folder on the k8s worker nodes.
    Run `cp -R /etc/kubernetes/ssl /etc/kubernetes/.tmp/` on all k8s nodes.
    Run `rm -rf cluster.rkestate` and `rke up` on the client node.

  5. Rancher says that the etcd containers are unhealthy
    Error seen:

    WARN[1481] [etcd] host [10.70.45.35] failed to check etcd health: failed to get /health for host [10.70.45.35]: Get "https://10.70.45.35:2379/health": net/http: TLS handshake timeout
    FATA[1481] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [10.70.47.156,10.70.44.47,10.70.45.35] failed to report healthy. Check etcd container logs on each host for more information
    Possible Resolution:
    Replace the service account token key pair with a copy of the kube-apiserver key pair (run as root):
    rm -f /etc/kubernetes/ssl/kube-service-account-token-key.pem
    rm -f /etc/kubernetes/ssl/kube-service-account-token.pem
    cp /etc/kubernetes/ssl/kube-apiserver-key.pem /etc/kubernetes/ssl/kube-service-account-token-key.pem
    cp /etc/kubernetes/ssl/kube-apiserver.pem /etc/kubernetes/ssl/kube-service-account-token.pem
    If the above steps don't work, clean up all k8s worker nodes using the script below. Note that it removes every Docker container and volume on the node.

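    #!/bin/sh
    # WARNING: removes ALL Docker containers and volumes on this node.
    containers=$(docker ps -qa)
    [ -n "$containers" ] && docker rm -f $containers
    volumes=$(docker volume ls -q)
    [ -n "$volumes" ] && docker volume rm $volumes
    # Directories holding Kubernetes, etcd, CNI, and RKE state
    cleanupdirs="/var/lib/etcd /etc/kubernetes /etc/cni /opt/cni /var/lib/cni /var/run/calico /opt/rke"
    for dir in $cleanupdirs; do
      echo "Removing $dir"
      rm -rf "$dir"
    done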
  6. Rancher says that the nodes probably have their firewalls enabled or that there are network issues
    Error seen:
    [network] Host [192.168.x.y] is not able to connect to the following ports: [192.168.x.y:2379]. Please check network policies and firewall rules
    Possible Resolution:
    Run `rke up --config ./cluster.yml --disable-port-check`, or open the required ports in the firewall on each node using the commands below:

    `firewall-cmd --add-port=22/tcp --add-port=80/tcp --add-port=2376/tcp --add-port=2379/tcp --add-port=2380/tcp --add-port=8472/udp --add-port=9099/tcp --add-port=10250/tcp --add-port=6443/tcp --add-port=10254/tcp --add-port=30000-32767/tcp --add-port=30000-32767/udp --permanent`

    `firewall-cmd --reload`
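After the reload, it can help to confirm that the ports Rancher complained about are really open. The sketch below compares a list of required ports against the output of `firewall-cmd --list-ports`, which prints the open ports separated by spaces. `missing_ports` is a hypothetical helper name, and the ports in the usage comment are only examples.

```shell
#!/bin/sh
# Hypothetical helper: print each required port that is absent from a
# space-separated list of open ports.
#   $1      open ports, e.g. the output of `firewall-cmd --list-ports`
#   $2...   required ports in port/protocol form
missing_ports() {
    open=" $1 "
    shift
    for p in "$@"; do
        case "$open" in
            *" $p "*) ;;                  # already open
            *) printf '%s\n' "$p" ;;      # not open yet
        esac
    done
}

# Example usage on a node (as root):
#   missing_ports "$(firewall-cmd --list-ports)" 2379/tcp 2380/tcp 6443/tcp
```

An empty result means every port passed in is already open in the active zone.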

  7. kubectl says it's unable to connect to the API server
    Error seen:
    While running `kubectl get pods` - E0719 13:59:02.414177   17925 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
    Possible Resolution:
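    Export `KUBECONFIG` on the master node so that kubectl uses the cluster's kubeconfig file instead of falling back to `localhost:8080`.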
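A minimal sketch of the export, assuming the cluster was created by `rke up` from cluster.yml, in which case RKE writes the kubeconfig as `kube_config_cluster.yml` next to it. The path in the usage comment is an assumption, and `use_kubeconfig` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: point kubectl at a kubeconfig file, but only if the
# file exists, so a typo in the path is caught immediately.
use_kubeconfig() {
    if [ -f "$1" ]; then
        KUBECONFIG="$1"
        export KUBECONFIG
    else
        echo "kubeconfig not found: $1" >&2
        return 1
    fi
}

# Example usage (assumed path):
#   use_kubeconfig "$PWD/kube_config_cluster.yml" && kubectl get pods
```

The export only lasts for the current shell session; adding it to the shell profile makes it persistent.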
