Cloud Pak for Data

Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

  • 1.  Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Wed May 13, 2020 03:09 PM
    We tried installing CP4D 2.1.0.2 (x86_64.520) into our existing ICP 3.2.1 cluster and the installation failed.  The zen-metastoredb pods are in CrashLoopBackOff state:

    I200512 16:02:52.741778 1 cli/start.go:923 CockroachDB CCL v2.0.6 (x86_64-unknown-linux-gnu, built 2018/10/01 13:59:40, go1.10)
    I200512 16:02:53.002743 1 server/config.go:430 system total memory: 1.0 GiB
    I200512 16:02:53.003014 1 server/config.go:432 server configuration:
    max offset 500000000
    cache size 256 MiB
    SQL memory pool size 256 MiB
    scan interval 10m0s
    scan max idle time 200ms
    event log enabled true
    I200512 16:02:53.003092 1 cli/start.go:789 using local environment variables: COCKROACH_CHANNEL=kubernetes-helm
    I200512 16:02:53.003133 1 cli/start.go:796 process identity: uid 0 euid 0 gid 0 egid 0
    I200512 16:02:53.003162 1 cli/start.go:461 starting cockroach node
    E200512 16:02:53.017597 1 cli/error.go:112 failed to start server: unable to resolve RPC address "zen-metastoredb-0.zen-metastoredb.icp-data.svc.cluster.local:26257": lookup zen-metastoredb-0.zen-metastoredb.icp-data.svc.cluster.local: no such host
    Error: failed to start server: unable to resolve RPC address "zen-metastoredb-0.zen-metastoredb.icp-data.svc.cluster.local:26257": lookup zen-metastoredb-0.zen-metastoredb.icp-data.svc.cluster.local: no such host
    Failed running "start"

    The ports are not in use:
    # ss -tunlp|grep 26257|wc -l
    0
    # ss -tunlp|grep 8080|wc -l
    0

    I notice that the zen-metastoredb-public service has a cluster IP but zen-metastoredb does not:
    cloudant-svc            ClusterIP  10.4.29.78   <none>  80/TCP,443/TCP      20h
    dsx-influxdb            ClusterIP  10.4.23.27   <none>  8086/TCP            20h
    redis-svc               ClusterIP  10.4.25.60   <none>  26379/TCP           20h
    usermgmt-svc            ClusterIP  10.4.22.166  <none>  8080/TCP            20h
    utils-api-svc           ClusterIP  10.4.26.171  <none>  8080/TCP            20h
    zen-metastoredb         ClusterIP  None         <none>  26257/TCP,8080/TCP  20h
    zen-metastoredb-public  ClusterIP  10.4.23.216  <none>  26257/TCP,8080/TCP  20h

    Please let me know if there is any information or steps that can be shared to resolve this issue.
    Thanks,
    Karen



    ------------------------------
    Karen Paciocco
    ------------------------------

    #CloudPakforDataGroup


  • 2.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Wed May 13, 2020 04:45 PM

    Hi,

    The missing ClusterIP is OK: zen-metastoredb is a headless service (ClusterIP None), so we can ignore this.
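
    For anyone reading along: a ClusterIP of None marks a headless service, which gets no virtual IP of its own. The helper below is only a sketch for spotting such services in a listing like the one above. The function name is made up for illustration, and it assumes the default kubectl column order (NAME, TYPE, CLUSTER-IP, ...).

```shell
# Print the names of headless services (CLUSTER-IP column is "None").
# Reads `kubectl get svc` style output on stdin and assumes the default
# column order: NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE.
# headless_services is a hypothetical helper name, not part of kubectl.
headless_services() {
  awk '$2 == "ClusterIP" && $3 == "None" { print $1 }'
}

# Example (namespace icp-data is taken from the DNS name in the error log):
#   kubectl get svc -n icp-data | headless_services
```

    Run against the listing above, this should print only zen-metastoredb.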

    Regarding the pods, can you check if the zen-metastoredb-init job exists and was successful?

    kubectl get job | grep meta

    kubectl describe job zen-metastoredb-init
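
    If it helps, the same check can be scripted. This is a rough sketch, not an official procedure: job_succeeded is a hypothetical helper name, and the icp-data default namespace is an assumption based on the DNS name in the error log.

```shell
# Return 0 if the named Job reports at least one successful completion.
# Usage: job_succeeded JOB_NAME [NAMESPACE]
# Assumption: the namespace defaults to icp-data (from the DNS name in the log).
job_succeeded() {
  job_name="$1"
  job_ns="${2:-icp-data}"
  # .status.succeeded is empty until at least one pod of the Job has completed.
  job_count=$(kubectl get job "$job_name" -n "$job_ns" \
    -o jsonpath='{.status.succeeded}' 2>/dev/null)
  [ "${job_count:-0}" -ge 1 ]
}

# Example:
#   job_succeeded zen-metastoredb-init && echo "init job completed"
```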

    Thanks



    ------------------------------
    TOMASZ HANUSIAK
    ------------------------------



  • 3.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Thu May 14, 2020 10:23 AM
    We were able to get past this issue, but the installation now fails at:

    2020-05-13 20:53:16 UTC - Running command: //var/lib/icp/icp-data/InstallPackage/components/dpctl --config //var/lib/icp/icp-data/InstallPackage/components/install.yaml helm waitChartReady -r icp-data-ibm-iisee-zen100 -t 60
    time="2020-05-13T16:53:19-04:00" level=info msg="No daemonsets under this release. Please continue with the rest of the process."
    time="2020-05-14T02:23:28-04:00" level=fatal msg="Failed to get deployments due to: Unauthorized"
    2020-05-14 06:23:28 UTC - Installation failed for //var/lib/icp/icp-data/InstallPackage/components/../modules/ibm-iisee-zen:1.0.0


    Looking at the crashed icp-data-ibm-iisee-zen100-ibm-iisee-zen-gov... pods, the cause appears to be a failed Db2 connection.  Here is a sample log:
    [Thu May 14 13:15:36 UTC 2020] Starting database migration tool...
    Flyway Community Edition 5.2.4 by Boxfuse
    ERROR:
    Unable to obtain connection from database (jdbc:db2://is-xmetadocker:50000/xmeta) for user 'xmeta': [jcc][t4][2057][11264][4.23.42] The application server rejected establishment of the connection.
    An attempt was made to access a database, xmeta, which was either not found or does not support transactions. ERRORCODE=-4499, SQLSTATE=08004
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SQL State : 08004
    Error Code : -4499
    Message : [jcc][t4][2057][11264][4.23.42] The application server rejected establishment of the connection.
    An attempt was made to access a database, xmeta, which was either not found or does not support transactions. ERRORCODE=-4499, SQLSTATE=08004

    Are you aware of this issue?  Thank you for all your assistance!

    ------------------------------
    Karen Paciocco
    ------------------------------



  • 4.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Thu May 14, 2020 10:37 AM

    Hi,

    There used to be some issues with the Xmeta pod. Did you check the xmeta pod itself?

    You should be able to exec into it, then run `sudo su - db2inst1`, do `db2 list db directory`, and finally try to connect to one of the databases.
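
    To make the directory check easier to read, the aliases can be pulled out of the listing. This is a sketch only: db2_aliases is a hypothetical helper name, and it assumes the usual "Database alias = NAME" lines in the `db2 list db directory` output.

```shell
# Print the database aliases found in `db2 list db directory` output (stdin).
# Assumes lines of the form " Database alias   = NAME" in that output.
# db2_aliases is a hypothetical helper name, not a Db2 command.
db2_aliases() {
  awk -F'=' '/Database alias/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Example, run inside the xmeta pod as db2inst1:
#   db2 list db directory | db2_aliases
```

    For this installation the expected aliases would be XMETA, IADB and DSODB; no output means nothing is cataloged.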

    Can you open a support case for it? It may need some deeper troubleshooting/webex.

    Thanks



    ------------------------------
    TOMASZ HANUSIAK
    ------------------------------



  • 5.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Fri May 15, 2020 09:19 AM
    Thank you for your help!
    I do have an open support case and am waiting for instructions.  I checked the Xmeta pod: the db directory was empty, but I was able to create a sample db.  Below are my pods after the failed installation.  I can attempt the installation again, but I am waiting to hear from Support.  If you would like to see any logs, please let me know.  Thank you again.


    ------------------------------
    Karen Paciocco
    ------------------------------



  • 6.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Fri May 15, 2020 09:43 AM

    Hi,

    What I would try is to restart the Xmeta pod. It has init scripts inside that re-create the databases.

    Thanks



    ------------------------------
    TOMASZ HANUSIAK
    ------------------------------



  • 7.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Fri May 15, 2020 12:04 PM
    Here are the logs from the restart. The Repos do exist, but on start they are not found.

    + SIGNALS_TRAPPED=(INT HUP QUIT TERM STOP)
    + declare -a SIGNALS_TRAPPED
    + BKP_FILE=xmeta-services.tar.gz
    + BKP_STORE=/opt/IBM/InformationServer/xmeta-bkp
    + datadir=/home/db2inst1/Repos
    + installdir=/opt/IBM/InformationServer
    + '[' -d /home/db2inst1/Repos/xmeta ']'
    + echo 'Extracting repos dedicated volume data'
    Extracting repos dedicated volume data
    + /opt/IBM/InformationServer/initScripts/initReposVolumeData.sh
    + BKP_FILE=xmeta-services.tar.gz
    + BKP_STORE=/opt/IBM/InformationServer/xmeta-bkp
    + DATA_DIRS=Repos
    + DATA_DIR_LIST=($DATA_DIRS)
    + declare -a DATA_DIR_LIST
    + installdir=/home/db2inst1
    + DEDICATED_VOL_MOUNT=/mnt/dedicated_vol/Repository/is-xmetadocker
    + datamountdir=/mnt/dedicated_vol/Repository/is-xmetadocker
    + sudo chown -R db2inst1:db2iadm1 /mnt/dedicated_vol/Repository/is-xmetadocker/Repos
    + [[ -d /mnt/dedicated_vol/Repository/is-xmetadocker/Repos/xmeta ]]
    /mnt/dedicated_vol/Repository/is-xmetadocker already exists
    Symbolic link to Repos does not exist
    + echo '/mnt/dedicated_vol/Repository/is-xmetadocker already exists'
    + newdb=false
    + '[' '!' -L /home/db2inst1/Repos ']'
    + echo 'Symbolic link to Repos does not exist'
    + sudo /bin/rmdir /home/db2inst1/Repos
    + sudo /bin/rm /home/db2inst1/Repos
    /bin/rm: cannot remove '/home/db2inst1/Repos': No such file or directory
    + ln -s /mnt/dedicated_vol/Repository/is-xmetadocker/Repos /home/db2inst1/Repos
    + sudo chown -R db2inst1:db2iadm1 /mnt/dedicated_vol/Repository/is-xmetadocker/Repos
    + sudo chmod -R 755 /mnt/dedicated_vol/Repository/is-xmetadocker/Repos
    + sudo su - db2inst1 -c '/bin/rm -f /home/db2inst1/sqllib/.ftok; /home/db2inst1/sqllib/bin/db2ftok'
    + sudo su - db2inst1 -c '. sqllib/db2profile; db2start'
    05/15/2020 15:44:04 0 0 SQL1063N DB2START processing was successful.
    SQL1063N DB2START processing was successful.
    + sudo su - db2inst1 -c '. sqllib/db2profile; db2 catalog database XMETA on /home/db2inst1/Repos/xmeta; db2 catalog database IADB on /home/db2inst1/Repos/iadb; db2 catalog database DSODB on /home/db2inst1/Repos/dsodb'
    SQL6028N Catalog database failed because database "XMETA" was not found in
    the local database directory.
    SQL6028N Catalog database failed because database "IADB" was not found in the
    local database directory.
    SQL6028N Catalog database failed because database "DSODB" was not found in
    the local database directory.
    + sudo su - db2inst1 -c '. sqllib/db2profile; db2 activate database xmeta; db2 activate db iadb; db2 activate db dsodb'
    SQL1013N The database alias name or database name "XMETA" could not be found.
    SQLSTATE=42705
    SQL1013N The database alias name or database name "IADB" could not be found.
    SQLSTATE=42705
    SQL1013N The database alias name or database name "DSODB" could not be found.
    SQLSTATE=42705
    + sudo su - db2inst1 -c '. /home/db2inst1/sqllib/db2profile; db2 connect to xmeta; db2 -tvf /opt/IBM/InformationServer/initScripts/updateNodeCert.sql; db2 commit; db2 connect reset'
    SQL1013N The database alias name or database name "XMETA" could not be found.
    SQLSTATE=42705
    UPDATE XMETA.REGISTRATION_CONFIGPROPERTY SET VALUE_XMETA='{isnode2048:isf}MIIDIzCCAgugAwIBAgIEILF2WjANBgkqhkiG9w0BAQ0FADBCMQswCQYDVQQGEwJVUzEMMAoGA1UEChMDSUJNMRcwFQYDVQQLEw5Tb2Z0d2FyZSBHcm91cDEMMAoGA1UEAxMDSUlTMB4XDTE5MDgyMzIzMDYzMFoXDTQ3MDEwNzIzMDYzMFowQjELMAkGA1UEBhMCVVMxDDAKBgNVBAoTA0lCTTEXMBUGA1UECxMOU29mdHdhcmUgR3JvdXAxDDAKBgNVBAMTA0lJUzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAIIeUolpzIK5cq2HPEnumMbe5DlZVN24aeJi34sFB7w1FaaIhlR1uKuwaU/j+VQONRAQzWUjno2xgB28FIdwzFfFr40q4OyVx/KZZ8WxXegR5VB4tS0/ZhMthIHoJrn/j54xnsFLfLZOONXeEgoP2uhroUAsokyOZdA8A6RzMveTBoJrKczbBca5zx7PzFnLulmMgJ8C7wzfVBeNnAS/el/jxrIJTJuALVjP7e3rDmxh/pUydbzcbVlZqI5GPfbaEgE++5rJLBWZDT7Cty5QH3u9LJ9fuGLWmygi6W5mNPYmgiMjx2Zr1DLfX5yp9YpU/wMSmifs6tDT7sy0utQG7DMCAwEAAaMhMB8wHQYDVR0OBBYEFBdOnUF1fu5UCYDHpot17GXqU2TxMA0GCSqGSIb3DQEBDQUAA4IBAQAf6AD7dPhqT8DA9UIdemDP2eah9KJac1Fa/y3EpEG9APNbr0lnbiezDcbJSx9jYGAPKcko/7ErfMhqwf6snCg4DWKoh3ypGh5/fQtsWPEVdo5lmc+pI+TvdTRYJH82EocpS4+cnq6CLTz2vTf9LkYFWrqJMSvyN2Qsv+7Crs0OZzNeHsXJ7sS6vBHgIGQV6wYem2h5uq36O0OywVgXb/654XsW7cRaqh1zog529KTAZ0DKwxcOiHaVKfsXIxgngmw7Ti+2LB5UAlFOT0icldgPlZP5Wm0jhCXt0D4BawqjnyPyxrVR0VSS18b7J8AqIw5p8hqmDwwz3wO8x061jSgQ' WHERE NAME_XMETA='NODE_CERTIFICATE'
    DB21034E The command was processed as an SQL statement because it was not a
    valid Command Line Processor command. During SQL processing it returned:
    SQL1024N A database connection does not exist. SQLSTATE=08003

    SQL1024N A database connection does not exist. SQLSTATE=08003
    SQL1024N A database connection does not exist. SQLSTATE=08003
    + wait 1569
    + tail -f /dev/null

    ------------------------------
    Karen Paciocco
    ------------------------------



  • 8.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Mon May 18, 2020 08:00 AM
    Hi,

    Please describe the xmeta pod, and check if there is an init container called load-data.
    I believe it can be re-run by removing the /home/db2inst1/Repos/xmeta folder. Please check whether it is currently empty.
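
    A small sketch for the emptiness check, to be run inside the pod. The helper name dir_is_empty is made up for illustration. The startup trace earlier in the thread shows the script testing for the /home/db2inst1/Repos/xmeta directory, which is why its presence and contents matter here.

```shell
# Return 0 if DIR exists and contains no entries (dotfiles included).
# dir_is_empty is a hypothetical helper name.
dir_is_empty() {
  [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Example, inside the xmeta pod:
#   dir_is_empty /home/db2inst1/Repos/xmeta && echo "xmeta folder is empty"
```

    Renaming the folder instead of deleting it keeps the old contents around in case they are needed later.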

    Thanks

    ------------------------------
    TOMASZ HANUSIAK
    ------------------------------



  • 9.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Tue May 19, 2020 03:31 PM
    The xmeta folder contained /db2inst1/NODE0000.  After xmeta was renamed and the pod restarted, the iisee-zen100 pods still do not start.

    ------------------------------
    Karen Paciocco
    ------------------------------



  • 10.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Tue May 19, 2020 06:53 PM
    Hi Tomasz,

    I've rerun the installation and now everything has loaded except the shop4info pods:
    NAME                         READY  STATUS                 RESTARTS  AGE
    shop4info-event-consumer-0   0/1    Init:0/1               0         54m
    shop4info-mappers-service-0  0/1    Init:0/1               0         54m
    shop4info-rest-0             0/1    Init:0/1               0         54m
    shop4info-scheduler-0        0/1    Init:0/1               0         54m
    shop4info-type-registry-0    0/1    Init:CrashLoopBackOff  6         13m

    ------------------------------
    Karen Paciocco
    ------------------------------



  • 11.  RE: Existing ICP 3.2.1 Cluster CP4D 2.1.0.2 (x86_64.520) E200512 Unable to resolve RPC Address

    Posted Wed May 20, 2020 04:14 AM
    Hi,

    Please describe the shop4info-type-registry-0 pod.
    You should see its init container. Can you see what it is doing (check the Command field)?
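
    As a shortcut, the init container names and commands can also be printed directly. This is a sketch under assumptions: init_container_commands is a hypothetical helper name, and the icp-data default namespace is a guess that may not match where the shop4info pods run.

```shell
# Print the name and command of each init container in a pod.
# Usage: init_container_commands POD [NAMESPACE]
# Assumption: the namespace defaults to icp-data; pass another one if needed.
init_container_commands() {
  kubectl get pod "$1" -n "${2:-icp-data}" \
    -o jsonpath='{range .spec.initContainers[*]}{.name}{": "}{.command}{"\n"}{end}'
}

# Example:
#   init_container_commands shop4info-type-registry-0
# The logs of a given init container can then be read with:
#   kubectl logs shop4info-type-registry-0 -c INIT_CONTAINER_NAME -n icp-data
```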

    Currently I don't have any 2.1 system, as we are moving from 2.5 to 3.0 soon.

    Thanks

    ------------------------------
    TOMASZ HANUSIAK
    ------------------------------