watsonx.data


Watsonx.data Upgrade Issue Resolution


    Posted Thu April 25, 2024 09:35 AM

    The Problem: Watsonx.data Upgrade Issue on Cloud Pak for Data

    • Upgraded CP4D cluster from version 4.7.1 to 4.8.1.

    • Most services running smoothly post-upgrade, except for Watsonx.data.

    Background

    • The issue appears in the Infrastructure Manager, which displays the Presto engine with "3 dots blinking."

    • Upon clicking the engine tile, it indicates "Restart scheduled."

    Root Cause

    • Customer upgraded to WXD version 1.1.1 (CPD 4.8.1).

    • The customer then executed a workaround step intended only for WXD 1.1.0 (CPD 4.8.0) upgrades.

    • The workaround command unintentionally changed custom resource versions from 1.1.1 to 1.1.0.

    • The WXD operator, lacking rollback support, disregarded the custom resources, including pendingRestart requests.

    • The operator generated log messages indicating version mismatch issues, e.g., "version 1.1.0 is not supported by operator of version 1.1.1. exiting..."

    First Approach

    Run the following to get the CR name: oc get wxdengine

    Next, run the following: oc get wxdengine lakehouse-presto-01 -o yaml

    Check the value of pendingRestart in the output. Because the status of your engine is "Restart scheduled," we expect it to be set to true, but we want to confirm:

    pendingRestart: true
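    The pendingRestart check can also be scripted rather than read by eye. The following is a minimal sketch, not an official tool: pending_restart is a hypothetical helper name, and it assumes the flag appears in the YAML as a line of the form "pendingRestart: <value>", as in the output quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a wxdengine custom resource as YAML on stdin
# and prints the value of pendingRestart, or "unset" if the key is absent.
# Assumption: the key appears as "pendingRestart: <value>" on its own line.
pending_restart() {
  local value
  value=$(sed -n 's/^[[:space:]]*pendingRestart:[[:space:]]*//p' | head -n 1)
  echo "${value:-unset}"
}

# Usage against a live cluster (engine name taken from this post):
#   oc get wxdengine lakehouse-presto-01 -o yaml | pending_restart
```

    The helper only reads the resource; it does not change anything on the cluster.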

    To confirm that everything looks okay after the upgrade, it also helps to see what is set in:

    oc get wxdaddon,wxd,wxdengine -n zen
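    Since the root cause here was custom resources carrying version 1.1.0 under a 1.1.1 operator, the VERSION column of this output is worth checking explicitly. Below is a rough sketch of one way to do that. It assumes VERSION is the second column of the table, as in the output shown in this post; version_mismatches is a hypothetical helper name, and <namespace> is a placeholder for your own project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get wxdaddon,wxd,wxdengine` table output on
# stdin and prints every resource whose VERSION differs from the expected one.
# Assumption: VERSION is the second whitespace-separated column.
version_mismatches() {
  local expected="$1"
  awk -v want="$expected" '
    $1 == "NAME" || NF == 0 { next }   # skip header rows and blank lines
    $2 != want { print $1 " has version " $2 ", expected " want }
  '
}

# Usage against a live cluster (namespace is a placeholder):
#   oc get wxdaddon,wxd,wxdengine -n <namespace> | version_mismatches 1.1.1
```

    No output means every listed resource reports the expected version.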

    Next Approach

    The customer ran the patch command as suggested, after which the Presto engine stopped and no longer appeared in Infrastructure Manager. The output below also shows that the data lakehouse was shut down.

    [steve@ucsbastion ~]$ oc get wxdaddon,wxd,wxdengine -n cpd-1
    NAME                  VERSION  SIZE  RECONCILE  AGE
    wxdaddon.watsonxdata.ibm.com/wxdaddon  1.1.1   small  Shutdown  158d

    NAME                VERSION  SIZE  HIVE RESTART  RECONCILE  AGE
    wxd.watsonxdata.ibm.com/lakehouse  1.1.1      Shutdown    Shutdown  158d

    NAME                        VERSION  TYPE   DISPLAY NAME  SIZE  RECONCILE  STATUS  AGE
    wxdengine.watsonxdata.ibm.com/lakehouse-presto-01  1.1.1   presto  presto-01   small  shutdown  PAUSED  158d

    Time For a New Solution

    1. Suggested that the customer follow the steps in the documentation to restart the services under Watsonx.data.

    2. Asked the customer to verify whether the engine table in Postgres includes the engine entry:

      1. oc exec -it ibm-lh-postgres-edb-1 -n cpd-instance -- bash

      2. psql -U postgres -d ibm_lh_repo -c "SELECT * FROM engine"

    3. It looks like the engine table is present.
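    The two Postgres steps can be combined into a single non-interactive call. This is only a sketch: the pod, database, and table names are the ones used in this post, engine_row_count is a hypothetical helper name, and the namespace argument is whichever project hosts your Watsonx.data instance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts the rows in the engine table by running psql
# inside the Postgres pod. -A (unaligned) and -t (tuples only) make psql
# print just the number.
# Assumption: pod ibm-lh-postgres-edb-1 and database ibm_lh_repo exist in
# the given namespace, as in the commands above.
engine_row_count() {
  local namespace="$1"
  oc exec ibm-lh-postgres-edb-1 -n "$namespace" -- \
    psql -U postgres -d ibm_lh_repo -At -c "SELECT count(*) FROM engine"
}

# Usage:
#   engine_row_count cpd-instance
```

    A count of 0 would suggest that the engine entry is missing from the table.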

    Purpose of Verifying Postgres Table Engine Entry

    • Objective: Determine the significance of verifying the presence of the engine entry in the Postgres table.

    • Assumption: Initially, we speculated that a missing engine entry might be linked to the underlying issue. However, if the engine entry had been missing, the customer would not have been able to delete the engine from the UI while provisioning their new engine.

    • Rationale: If the customer had the ability to delete the engine during the provisioning of their new Presto-02 engine, it suggests that the current problem may not be identical to the previous one.

    • Resolution: The customer successfully resolved the issue by deleting the engine and creating a new one. Unfortunately, we couldn't assess the state of Postgres before this resolution.

    Another Troubleshooting Technique

    Asked the customer to provide us with the response to the GET /presto_engines API call, in order to verify that the engine type is Presto.

    1. Go to the infrastructure manager.

    2. Right-click and select "Inspect."

    3. Navigate to the "Network" tab.

    4. Click presto_engines in the "Name" column.

    5. Check the "Headers" tab to confirm the request method is "GET."

    6. Check the "Response" tab to confirm that the engine type is "Presto":

      1. "engine_display_name": "presto-01",

      2. "engine_id": "presto-01",

      3. "type": "Presto"
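    If the customer saves the response body to a file, the type can be checked from the command line as well. The sketch below assumes the response is JSON containing "type": "<value>" pairs like the fields listed above. engine_types is a hypothetical helper name and response.json is a hypothetical file name; a JSON-aware tool such as jq would be more robust than this text match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extracts every "type" value from a saved
# presto_engines response read on stdin, one value per line.
# Assumption: the key is exactly "type" (keys such as "engine_type" are
# not matched) and values are plain quoted strings.
engine_types() {
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage (response.json saved from the browser's "Response" tab):
#   engine_types < response.json
```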

    Final Confirmation

    To ensure that the customer now has the correct updated version (v1.1.1) of their Watsonx.data instance on Cloud Pak for Data, I asked them to verify that the instance details in the navigation bar show the console build version as v1.1.1.



    #watsonx.data

    ------------------------------
    Eric Zakharian
    IBM Watsonx.data Technical Support
    ------------------------------