Cloud Pak for Data

 View Only
  • 1.  ICPD installation gets stuck with GlusterFS

    Posted Fri May 31, 2019 09:04 AM
    Hello everybody,

    I'm trying to set up a 3-node ICPD cluster on CentOS systems and get stuck at the basic ICP setup step. I tried to solve this with IBM Partner Support, but unfortunately, ICP for Data is not supported through PartnerWorld. (Btw., @IBM: when will that change?)

    This is the error I get:

    TASK [image-registry-check : Creating registry storage directory] **************
    ok: [10.x.x.1]
    ok: [10.x.x.2]
    ok: [10.x.x.3]

    TASK [image-registry-check : Creating registry storage check file] *************
    changed: [10.x.x.1]

    TASK [image-registry-check : Checking if set the shared storage for registry or not] ***
    ok: [10.x.x.1]
    fatal: [10.x.x.2]: FAILED! => changed=false
    elapsed: 120
    msg: Please set a shared storage for image registry /var/lib/registry and continue installation
    fatal: [10.x.x.3]: FAILED! => changed=false
    elapsed: 120
    msg: Please set a shared storage for image registry /var/lib/registry and continue installation

    NO MORE HOSTS LEFT *************************************************************
    ...
    Failed running task 'InstallICP : Install ICP' on the node with ip 10.x.x.1

    In the corresponding GlusterFS log it says:

    [2019-05-...] I [socket.c:348:ssl_setup_connection] 0-glusterfs: peer CN = server
    [2019-05-...] I [socket.c:351:ssl_setup_connection] 0-glusterfs: SSL verification succeeded (client: )
    [2019-05-...] E [glusterfsd-mgmt.c:1779:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
    [2019-05-...] E [glusterfsd-mgmt.c:1879:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/image-manager)
    [2019-05-...] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90) [0x7fc070d21880] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x536) [0x55a3d4b49786] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55a3d4b42e4b] ) 0-: received signum (0), shutting down
    [2019-05-...] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting '/var/lib/registry'.

    GlusterFS was installed by the ICPD setup. Has anybody seen this error before? Or does anybody have a hint for solving it?

    Thanks in advance





    ------------------------------
    Bernd Künnen

    ------------------------------

    #CloudPakforDataGroup


  • 2.  RE: ICPD installation gets stuck with GlusterFS

    Posted Fri May 31, 2019 09:42 AM
    Solved the problem, but maybe not the cause. Digging into the GlusterFS documentation, I found that (for reasons unknown) the ICPD setup failed to create the shared volume on GlusterFS. So this is the workaround if the error occurs:

    1. Make sure GlusterFS is in sync:
      gluster volume sync host0[1,2,3]
    2. On one Gluster member, create the volume and start it:
      # create the volume
      gluster volume create registry replica 3 transport tcp host01:/data/registry host02:/data/registry host03:/data/registry
      # start the volume
      gluster volume start registry
      # check
      gluster volume status all
    3. On all Gluster members, mount the volume:
      mount -t glusterfs host01:/registry /var/lib/registry
    Worked for me; on the next attempt the setup passed the GlusterFS step without a problem. If anybody has an idea what the cause was, or what needs to be fixed in the installation script, please let us know.
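
    One more note on step 3: a plain mount command does not survive a reboot. A possible /etc/fstab entry to make the mount persistent is sketched below (the _netdev option delays mounting until the network is up; the hostname matches the example above and may differ in your cluster):

    ```
    # /etc/fstab (sketch)
    host01:/registry  /var/lib/registry  glusterfs  defaults,_netdev  0 0
    ```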

    ------------------------------
    Best regards
    Bernd Künnen
    ------------------------------



  • 3.  RE: ICPD installation gets stuck with GlusterFS

    Posted Mon June 03, 2019 04:30 AM
    Came across another error after fixing GlusterFS:

    TASK [master : Waiting for Kubernetes to start] ********************************
    ok: [10.x.x.x]
    fatal: [10.x.x.y]: FAILED! => changed=false
      elapsed: 600
      msg: Seems Kubernetes apiserver did not start properly, please check it on master nodes.
    fatal: [10.x.x.z]: FAILED! => changed=false
      elapsed: 600
      msg: Seems Kubernetes apiserver did not start properly, please check it on master nodes.

    For reasons unknown, the setup script didn't configure the Docker systemd unit correctly on the remote hosts.
    File: /usr/lib/systemd/system/docker.service
    Diff:
    - ExecStart=/usr/bin/dockerd
    + ExecStart=/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd

    Then reload systemd and restart Docker:
    systemctl daemon-reload
    service docker restart
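
    For reference, the unit-file change above can also be scripted. This is only a sketch, run against a throwaway copy of the file rather than /usr/lib/systemd/system/docker.service itself, and it assumes the stock ExecStart line shown in the diff:

    ```shell
    # Sketch: idempotently add the cgroup-driver flag to a copy of the unit file.
    # Working on a temp copy here; point "unit" at the real file at your own risk.
    unit=$(mktemp)
    printf '%s\n' '[Service]' 'ExecStart=/usr/bin/dockerd' > "$unit"

    # Add the flag only if it is not already present, then show the result.
    grep -q 'native.cgroupdriver=systemd' "$unit" || \
      sed -i 's|^ExecStart=/usr/bin/dockerd$|ExecStart=/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd|' "$unit"
    grep '^ExecStart=' "$unit"
    # prints: ExecStart=/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd
    ```

    Running the script a second time is a no-op, so it is safe to include in a provisioning step. Don't forget systemctl daemon-reload and a Docker restart afterwards.
    
    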

    Just in case this may be helpful for somebody.

    ------------------------------
    Regards
    Bernd Künnen
    ------------------------------