The problem: wxdaddon gets stuck on upgrade
- Upgrading Cloud Pak for Data from 4.8.2 to 4.8.5.
- When running the apply-cr step, cpd-cli failed while updating wxdaddon to the latest version.
Background:
The wxdaddon upgrade in Cloud Pak for Data was stuck due to issues with the Presto engine. Investigation traced the root cause to the pod failures described below.
Check the status of the pod:
1. Run the following command:
oc describe pod ibm-lh-lakehouse-presto-01-single-blue-0
2. Check the output:
Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Normal   Killing    42m                   kubelet  Container ibm-lh-lakehouse-presto failed liveness probe, will be restarted
  Normal   Pulled     42m                   kubelet  Container image "cp.icr.io/cp/watsonx-data/ibm-lh-presto@sha256:7b31c176a1ba13eeec4fb2c0f577f163d87964a77faca7e64279d7ff6c609995" already present on machine
  Normal   Created    42m                   kubelet  Created container ibm-lh-lakehouse-presto
  Normal   Started    42m                   kubelet  Started container ibm-lh-lakehouse-presto
  Warning  BackOff    10m (x15 over 13m)    kubelet  Back-off restarting failed container ibm-lh-lakehouse-presto in pod ibm-lh-lakehouse-presto-01-single-blue-0_cpd-instance(bfd3b36a-4249-4bc4-bc0d-955848255146)
  Warning  Unhealthy  5m42s (x26 over 44m)  kubelet  Liveness probe failed
  Warning  Unhealthy  42s (x108 over 45m)   kubelet  Readiness probe failed: dial tcp 10.129.2.63:8443: connect: connection refused
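In a long event list, the Warning events are the ones that matter here. The following is a minimal sketch of filtering them out; it uses a sample Events table standing in for the live command, so on a cluster you would pipe `oc describe pod <pod-name>` into the same awk/grep instead.

```shell
# Sketch: keep only Warning events from `oc describe pod` output.
# The sample below stands in for live output; the awk range prints
# everything from the "Events:" line to the end of the output.
sample_events='Events:
  Type     Reason     Age    From     Message
  Normal   Started    42m    kubelet  Started container ibm-lh-lakehouse-presto
  Warning  Unhealthy  5m42s  kubelet  Liveness probe failed
  Warning  Unhealthy  42s    kubelet  Readiness probe failed: connection refused'

printf '%s\n' "$sample_events" | awk '/^Events:/,0' | grep 'Warning'
```

This prints only the two Unhealthy warnings, which is usually enough to see whether the liveness or readiness probe is the one failing.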
Output Analysis:
- Liveness probe failures: The Presto container repeatedly failed its liveness probe, causing Kubernetes to restart it continuously.
- Readiness probe failures: The container also failed its readiness probe, indicating it was not ready to accept traffic due to connection issues.
- BackOff events: Kubernetes applied a back-off mechanism due to the repeated failures, delaying further restart attempts.
These failures prevented the Presto engine from stabilizing, leading to the wxdaddon upgrade being stuck.
Another approach to check the pod's health:
oc get po | grep presto
ibm-lh-lakehouse-presto-01-single-blue-0 1/1 Running 0 39h
If the READY column shows 1/1, all containers in the pod are ready and the pod is healthy.
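The READY check can also be scripted. This is a minimal sketch that parses the READY column (ready count vs. container count); the sample line stands in for live output, so on a cluster you would pipe `oc get po | grep presto` into the same awk instead:

```shell
# Sketch: a pod is healthy only when the ready count equals the
# container count in the READY column (e.g. 1/1, not 0/1).
line='ibm-lh-lakehouse-presto-01-single-blue-0   1/1   Running   0   39h'

echo "$line" | awk '{ split($2, r, "/"); print $1, (r[1] == r[2] ? "healthy" : "NOT healthy") }'
```

The same one-liner flags a `0/1` pod in CrashLoopBackOff as NOT healthy.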
Root Cause of the Issue:
A misconfigured AWS Glue database prevented wxdaddon from updating.
Solution steps:
- Provisioned a new Presto instance without connecting it to any catalog in the infrastructure manager.
- Attached the new Presto engine to an existing catalog/bucket pair.
- Removed the old Presto engine (ID 1) via the infrastructure manager.
- The new Presto engine (ID 747) replaced the old engine (ID 1) after its deletion.
- Identified that the ibm-lh-lakehouse-presto-01-single-blue-0 pod was failing, as described earlier.
- Operator pod intervention: a new operator pod completed the necessary work, and the wxdaddon update finished successfully.
#watsonx.data #PrestoEngine
------------------------------
Eric Zakharian
IBM watsonx.data Technical Support
------------------------------