CP4AIOPS MustGather - How to perform a healthcheck on an AIOPS installation?

By Daniel Yeap posted Thu March 02, 2023 01:52 AM

  

Have you ever wondered if your AIOPS installation was successful?

Have you ever wondered if your AIOPS is running properly?

If you have, you are not alone!

The healthcheck scan of the CP4AIOPS MustGather tool answers those questions.

https://www.ibm.com/support/pages/node/6458081

Available since version 1.8.2, the healthcheck scan can be run against an AIOPS installation with a single command:

waiops-mustgather.sh -O healthcheck

The healthcheck includes:

(1) listing of all nodes in the cluster with their respective CPU and memory utilization
(2) listing of CSV for all product namespaces (AI Manager, Event Manager and ibm-common-services)
(3) listing of all important product CRDs and their respective phase/status
(4) deployment/statefulset/pod/job analysis (will show resources that are not running properly)
(5) PVC utilization report
(6) missing resources scan
(7) basic compliance check (for now, only checks for the OCP version and a known problem)

Now, let's examine some sample output for the items listed above:

(1) CPU and MEMORY utilization

This report shows:

(a) total CPU and MEMORY available in the cluster

(b) CPU and MEMORY request and limit of individual nodes

(c) WARNINGs when CPU and/or MEMORY utilization is over the 95% mark
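
The 95% warning rule in (c) can be illustrated with a small shell sketch. This is not how the MustGather tool implements the check; `flag_hot_nodes` is a hypothetical helper, and the input format (node name, CPU percent, memory percent per line) is an assumption made for the example.

```shell
# Hypothetical helper (not part of the MustGather tool).
# Reads lines of the form: <node-name> <cpu-percent> <memory-percent>
# and prints a WARNING for every value strictly above the threshold.
flag_hot_nodes() {
  threshold="${1:-95}"   # default mirrors the 95% mark described above
  awk -v t="$threshold" '
    {
      cpu = $2; mem = $3
      gsub(/%/, "", cpu)   # accept values with or without a trailing %
      gsub(/%/, "", mem)
      if (cpu + 0 > t + 0) printf "WARNING: %s CPU at %s%%\n", $1, cpu
      if (mem + 0 > t + 0) printf "WARNING: %s MEMORY at %s%%\n", $1, mem
    }'
}

# Example with made-up sample data:
printf '%s\n' 'worker-0 97% 60%' 'worker-1 40% 96%' 'worker-2 50% 50%' \
  | flag_hot_nodes 95
```

A value of exactly 95 is not flagged, since the report warns only when utilization is over the mark.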

(2) CSV for all product namespaces

This report shows:
(a) CSV version
(b) CSV phase
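
As a rough illustration of reading this report, the sketch below filters a CSV listing for entries whose phase is not "Succeeded" (the phase the Operator Lifecycle Manager reports for a healthy CSV). The helper name `report_unhealthy_csvs` and the three-column input format are assumptions for the example, not the tool's own output format.

```shell
# Hypothetical helper (not part of the MustGather tool).
# Reads lines of the form: <csv-name> <version> <phase>
# and prints every CSV whose phase is not "Succeeded".
report_unhealthy_csvs() {
  awk '$3 != "Succeeded" {
    printf "NOT READY: %s (version %s, phase %s)\n", $1, $2, $3
  }'
}

# One possible way to produce such input on a live cluster
# (NAMESPACE is a placeholder you would set yourself):
# oc get csv -n "$NAMESPACE" --no-headers \
#   -o custom-columns=NAME:.metadata.name,VERSION:.spec.version,PHASE:.status.phase \
#   | report_unhealthy_csvs
```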

(3) Product CRDs and their respective phase/status

This report shows:
(a) CRD version
(b) CRD phase/status
(c) CRD error (if any)

(4) Resource analysis

This report shows a simple analysis of deployments, statefulsets, pods, jobs and PVCs:
(a) Total resources that are running properly and those that are not.
(b) Events for problematic pods.
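
To give a feel for what "not running properly" can mean for pods, here is a minimal sketch that scans a pod listing. The rule used (skip Completed pods, flag anything that is not Running or not fully ready) is an assumption for illustration; the MustGather tool may apply different criteria. `flag_problem_pods` is a hypothetical name.

```shell
# Hypothetical helper (not part of the MustGather tool).
# Reads lines in the layout of `oc get pods --no-headers`:
#   <name> <ready>/<total> <status> ...
# Completed pods (finished jobs) are skipped; any other pod that is not
# Running, or is Running with fewer ready containers than expected, is flagged.
flag_problem_pods() {
  awk '
    {
      if ($3 == "Completed") next
      split($2, r, "/")
      if ($3 != "Running" || r[1] != r[2])
        printf "PROBLEM: %s (ready %s, status %s)\n", $1, $2, $3
    }'
}

# Possible live usage (NAMESPACE is a placeholder):
# oc get pods -n "$NAMESPACE" --no-headers | flag_problem_pods
```

For any pod flagged this way, `oc describe pod <name> -n <namespace>` lists its recent events.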

(5) PVC utilization report

This report shows:
(a) All the PVCs and their respective 'df -h' output
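
When reading the 'df -h' output, the Use% column is usually the one to watch. The sketch below picks out volumes at or above a chosen usage level; the helper name `flag_full_volumes` and the default of 80% are assumptions for the example and do not come from the tool.

```shell
# Hypothetical helper (not part of the MustGather tool).
# Reads standard `df -h` output (Filesystem Size Used Avail Use% Mounted on)
# and prints every filesystem whose Use% is at or above the threshold.
flag_full_volumes() {
  threshold="${1:-80}"   # assumed default, adjust as needed
  awk -v t="$threshold" '
    $5 ~ /^[0-9]+%$/ {     # skips header lines, even if they repeat
      use = $5
      sub(/%$/, "", use)
      if (use + 0 >= t + 0)
        printf "%s is %s%% full (mounted on %s)\n", $1, use, $6
    }'
}

# Example with made-up sample data:
printf '%s\n' \
  'Filesystem Size Used Avail Use% Mounted on' \
  '/dev/rbd0 50G 46G 4.0G 92% /data' \
  '/dev/rbd1 20G 5.0G 15G 25% /logs' \
  | flag_full_volumes 80
```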

(6) Missing resource scan

By default, the missing resource scan covers resources (deployments, statefulsets, and pods) that should exist after a successful installation.
This report shows:
(a) Missing resources (and their expected counts)
(b) Resources with a mismatched count (e.g. expected 2 instances but found only 1)
(c) Resources that exist, but are not configured as needed (e.g. resources that are created when a certain feature of the product is turned on)
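
The idea behind (a) and (b) can be sketched as a comparison between an expected list and what was actually found. This is only an illustration: `compare_counts`, the two-file interface and the "<resource-name> <count>" line format are all assumptions, and the tool's real configuration format is not shown here.

```shell
# Hypothetical helper (not part of the MustGather tool).
# Usage: compare_counts EXPECTED_FILE ACTUAL_FILE
# Both files contain lines of the form: <resource-name> <count>
# Prints MISSING for expected resources that were not found and
# MISMATCH for resources whose count differs from the expected one.
compare_counts() {
  awk '
    FILENAME == ARGV[1] { actual[$1] = $2; next }   # first file: what exists
    {
      if (!($1 in actual))
        printf "MISSING: %s (expected %s)\n", $1, $2
      else if (actual[$1] + 0 != $2 + 0)
        printf "MISMATCH: %s (expected %s, found %s)\n", $1, $2, actual[$1]
    }' "$2" "$1"
}
```

Results are printed in the order of the expected file, so the output lines up with whatever list you maintain.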

(7) Compliance check

This report shows:
(a) Compliance result of all configured compliance rules (and their respective messages)
***************************************************************************************************************
Together, the reports above (shown on screen and saved to file) provide a fairly comprehensive check on the health of an AIOPS installation.
If needed, the "missing resource scan" and the "compliance check" can be configured to suit your requirements!