The Problem: Watsonx.data Upgrade Issue on Cloud Pak for Data
- Upgraded CP4D cluster from version 4.7.1 to 4.8.1.
- Most services running smoothly post-upgrade, except for Watsonx.data.
Background
- The issue appears in the Infrastructure Manager, which displays the Presto engine with "3 dots blinking."
- Upon clicking the engine tile, it indicates "Restart scheduled."
![](https://lh7-us.googleusercontent.com/5JFf94vrR99T7KGfQvcyg2JmreyQpn7l0tvSVs9ptwdBGROFtF2NvAUlhNN603Etgf1s6UNudqYRp2rsoyjLsLLqtcK1QIZBkbJ0ylIs76NNVS5c3NI4wXhy7ldQCe0LvMynBL2vBlOjmhLCjk_gHw7Q5Q=s2048)
![](https://lh7-us.googleusercontent.com/w-so_ipG5jxWL0jqbzrHr75Xa7vsFScORLzRpiT1YGEvK-Hly9KY2l-4YjULD1im21yY3e8r5BDFrkp2bjntZcC6AC9s7O7ih_Jn19O95L2kYRkuC3F5dByhWwh6dkEWULzjaFF62UaEJb_eVd1N0matZw=s2048)
Root Cause
- Customer upgraded to WXD version 1.1.1 (CPD 4.8.1).
- Executed a workaround step intended for wxd 1.1.0 (CPD 4.8.0) upgrades.
- The workaround command unintentionally changed custom resource versions from 1.1.1 to 1.1.0.
- The WXD operator, lacking rollback support, disregarded the custom resources, including pendingRestart requests.
- The operator generated log messages indicating version mismatch issues, e.g., "version 1.1.0 is not supported by operator of version 1.1.1. exiting..."
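To spot this condition quickly, the operator log can be filtered for the mismatch message. The following is a minimal sketch that assumes the log wording matches the line quoted above; the function name is ours, and the operator deployment and namespace in the usage comment are placeholders, not names confirmed by this case.

```shell
# Sketch: pull version-mismatch lines out of operator log text read from stdin.
# The pattern is based only on the message quoted in this post; the real
# operator may word or format it differently.
find_version_mismatch() {
  grep -E 'version [0-9.]+ is not supported by operator of version [0-9]' |
    sed -E 's/.*version ([0-9][0-9.]*) is not supported by operator of version ([0-9]+(\.[0-9]+)*).*/cr_version=\1 operator_version=\2/' |
    sort -u
}

# Hypothetical usage; <operator-deployment> and <operator-namespace> are placeholders:
#   oc logs deployment/<operator-deployment> -n <operator-namespace> | find_version_mismatch
```

Any output line means the operator is rejecting a custom resource whose version is different from its own.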
First Approach
Run the following and get the CR name: oc get wxdengine
![](https://lh7-us.googleusercontent.com/43XgbWIWCN2IMdI1D5FBaF3_nPnH9IR-wwnA75GT0s5WN5V9fXayxgRAOn-u3JYPxrFPeBOWzeMHQ7xyxvom8wx4t355uqVmo_jdHqyti0GSooGvhYQXEKqAulVsVai40t3RQlq4306vE2XRIaRSNrIwkQ=s2048)
Next, run the following: oc get wxdengine lakehouse-presto-01 -o yaml
pendingRestart: true
Check the value of pendingRestart in the output. It should be set to true because the status of your engine is "Restart scheduled," but it is worth confirming.
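If the YAML output is long, the field can be filtered out directly. This is a small sketch that matches on the key name only, so it does not assume where in the custom resource pendingRestart sits; the function name is our own.

```shell
# Sketch: print the pendingRestart value(s) found in CR YAML read from stdin.
# Matching on the key name avoids assuming the field's location in the CR.
pending_restart_value() {
  sed -n -E 's/^[[:space:]]*pendingRestart:[[:space:]]*"?([A-Za-z]+)"?[[:space:]]*$/\1/p'
}

# Usage against the engine from the steps above:
#   oc get wxdengine lakehouse-presto-01 -o yaml | pending_restart_value
```

An empty result means the field is not present in the output at all.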
To confirm everything looks okay after the upgrade, it would be good to see what is set in:
oc get wxdaddon,wxd,wxdengine -n zen
![](https://lh7-us.googleusercontent.com/WYvI4b_OSrToefK3ASsFsnbmh8dYl0-CXzJFBnT8t5BDLKMhXhDvr281eynLSR6iDze-ZNxNF9xoEwjtASD8uGWo3ibnKdgZMS2vt_zD4K6nF8MJzRYoVV-iyKdGCMXqHqDq7OY4MIX0KIM3mittIrxgJw=s2048)
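Since the root cause here was a custom resource version that no longer matched the operator, the output of that command can also be checked mechanically. The sketch below assumes VERSION is the second whitespace-separated column of the table output; the function name is ours, and the expected version is passed in by the caller.

```shell
# Sketch: read `oc get wxdaddon,wxd,wxdengine` table output on stdin and print
# every resource whose VERSION column differs from the expected version.
# Assumes VERSION is the second whitespace-separated column.
check_cr_versions() {
  awk -v expected="$1" '
    NF == 0 || $1 == "NAME" { next }   # skip blank lines and header rows
    $2 != expected {
      print $1 " is at " $2 " (expected " expected ")"
      bad = 1
    }
    END { exit bad + 0 }               # non-zero exit when any mismatch was found
  '
}

# Usage (use whichever namespace hosts the instance, e.g. zen above):
#   oc get wxdaddon,wxd,wxdengine -n zen | check_cr_versions 1.1.1
```

No output and a zero exit status mean every listed resource reports the expected version.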
Next Approach
The customer ran the patch command as suggested. Afterward, the Presto engine stopped and no longer appeared in the Infrastructure Manager. The output below also shows that the data lakehouse is shut down.
[steve@ucsbastion ~]$ oc get wxdaddon,wxd,wxdengine -n cpd-1
NAME VERSION SIZE RECONCILE AGE
wxdaddon.watsonxdata.ibm.com/wxdaddon 1.1.1 small Shutdown 158d
NAME VERSION SIZE HIVE RESTART RECONCILE AGE
wxd.watsonxdata.ibm.com/lakehouse 1.1.1 Shutdown Shutdown 158d
NAME VERSION TYPE DISPLAY NAME SIZE RECONCILE STATUS AGE
wxdengine.watsonxdata.ibm.com/lakehouse-presto-01 1.1.1 presto presto-01 small shutdown PAUSED 158d
Time For a New Solution
- Suggested that the customer follow the steps in the documentation to restart the services under Watsonx.data.
- Asked the customer to verify whether the engine table in Postgres includes the engine entry:
- oc exec -it ibm-lh-postgres-edb-1 -n cpd-instance -- bash
- psql -U postgres -d ibm_lh_repo -c "SELECT * FROM engine"
- The output showed that the engine table exists.
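The two commands above can also be combined into one non-interactive call, which is easier for a customer to copy and send back. This is a sketch only: it assumes psql can be invoked directly through oc exec in that pod, and it reuses the pod, namespace, user, and database names from the steps above, which may differ on other clusters. The function name and its optional arguments are our own.

```shell
# Sketch: run the engine-table query in one step instead of opening a shell
# in the pod first. Defaults are the names used in this post; pass a pod name
# and namespace to override them.
list_engine_rows() (
  pod="${1:-ibm-lh-postgres-edb-1}"
  namespace="${2:-cpd-instance}"
  oc exec "$pod" -n "$namespace" -- \
    psql -U postgres -d ibm_lh_repo -c "SELECT * FROM engine"
)

# Usage:
#   list_engine_rows                        # defaults from this post
#   list_engine_rows <pod-name> <namespace> # placeholders for other clusters
```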
Purpose of Verifying Postgres Table Engine Entry
- Objective: Determine whether the presence of the engine entry in the Postgres engine table is significant to this issue.
- Assumption: Initially, we speculated that a missing engine entry might be linked to the underlying issue. However, if the entry had been missing, the customer would not have been able to delete the engine from the UI while provisioning their new engine.
- Rationale: Because the customer was able to delete the engine while provisioning their new Presto-02 engine, the current problem may not be identical to the previous one.
- Resolution: The customer resolved the issue by deleting the engine and creating a new one. Unfortunately, we could not assess the state of Postgres before this resolution.
Another Troubleshooting Technique
Asked the customer to provide the response of the GET /presto_engines API call so that we could verify that the engine type is Presto.
- Go to the Infrastructure Manager.
- Right-click and select "Inspect."
- Navigate to the "Network" tab.
- Click on presto_engines under Name.
- Check the "Headers" tab to confirm the request method is "GET."
- Check the "Response" tab to confirm that the engine type is "Presto":
- "engine_display_name": "presto-01",
- "engine_id": "presto-01",
- "type": "Presto"
![](https://lh7-us.googleusercontent.com/D1XXsoMHZiMliJLjOLIgP10RbfkAaURhYzVYHITrUM4j93x6jKqS3a_mm02qRrJ9xnun9EPWOhqwyLesB9Qew9cSzynP5XyPuSTJbSzFbjNTkShDcyZREciA0XGZSSoMqwh0ARmqZfQ-g__XEGyiIHPqTQ=s2048)
![](https://lh7-us.googleusercontent.com/yYkfo97CU0wr8JJdql_mZf3RW0bOndtwK7Rq-liKjOZf-MpT3cK3zIPiHM_JOxCeSr0D6Kgbc565NOLZEi2XoWrRvKelpSLR5TL1Hew3ofWDPl5nPj6soq7AA4XnzOQUZSArGfRb9wqIjZVq6rCgw-ceHw=s2048)
![](https://lh7-us.googleusercontent.com/O1iHaYmYhvvm3EW6eh5tO01rOc18F_qE5rcyD7j1BuCPfIZkqCbQNwjAvafdmKwT0UEGopVV9yKnKoRnF-OxJM3pyilj-uoEI02mOlgAwDXX8M7vWMjExuALo2CxmscVU9elJe_7ewTGO2_Cs8I6TiGqPQ=s2048)
![](https://lh7-us.googleusercontent.com/k1FTz6G-fnn1FgUOO4UZaqzylw6wk_ulFcrSv-jmCo4Kt91ctvPuiMqZ0j_VqTe8ZduIja7Jr2AhTlvxuaKD8KxPM0eiDP-MQtGFE2BP_jkL1uu2U2xhbpNx5CUv6AUNaADtH4KXBvCqlKjGCjwshaSG9A=s2048)
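If the customer saves the response body to a file, the engine type can be extracted without reading the whole payload. The sketch below uses plain text matching on the "type" key seen in the response above; it is not a JSON parser, the function name is ours, and the file name in the usage comment is hypothetical.

```shell
# Sketch: print the value of every "type" key in a JSON response read from
# stdin. This is plain text matching, not a JSON parser; a tool such as jq
# would be more robust where it is available.
engine_types() {
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed -E 's/.*:[[:space:]]*"([^"]*)"$/\1/'
}

# Usage: save the response body from the browser's Network tab to a file
# (presto_engines.json is a hypothetical name), then:
#   engine_types < presto_engines.json
```

Each engine in the response yields one output line, so a healthy Presto engine should print Presto.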
Final Confirmation
To confirm that the customer's Watsonx.data instance on Cloud Pak for Data is now at the correct version (v1.1.1), I asked them to check that the instance details in the navigation bar show the console build version as v1.1.1.
![](https://lh7-us.googleusercontent.com/Keq-dTIKIpDajU4L5zkVEQO0cCfzPjbvZFXfIFF9Cxnjes4fcGuYEPJR28ndoW4_5j_ZtqTo9M3khNKYwjB2rEvbWcNJKNCa94Zq0XPnQ6nP8pNim7wS_eXgp6Azi0MNx7JWO-r2HGieM8-eDGxuWOz12Q=s2048)
#watsonx.data
------------------------------
Eric Zakharian
IBM Watsonx.data Technical Support
------------------------------