Monitoring Volume Usage to Prevent Application Failures on Cloud Pak for Data Clusters

By BHARATH DEVARAJU posted Wed February 14, 2024 10:05 PM

  

The performance of your Cloud Pak for Data cluster is closely tied to the performance of its storage cluster. By monitoring volume usage for the various services and performing periodic maintenance, you can avoid service failures.

Cloud Pak for Data supports several storage providers, listed at https://www.ibm.com/docs/en/cloud-paks/cp-data/4.8.x?topic=requirements-storage. One of the most widely used persistent storage options on OpenShift clusters is OpenShift Data Foundation (ODF). In this article we discuss the built-in monitoring capabilities of an ODF storage cluster and how they can be used for preventive maintenance of Cloud Pak for Data clusters.

OpenShift Data Foundation

Red Hat OpenShift Data Foundation is a highly integrated collection of cloud storage and data services for Red Hat OpenShift Container Platform. It is available as a package in the Red Hat OpenShift Platform Operator Hub, which facilitates simple deployment and management.

As part of the ODF storage cluster installation, several Prometheus rules are configured in the openshift-storage namespace. These rules define alerts that monitor different aspects of the storage cluster. For example, noobaa-prometheus-rules monitors the Object Bucket Claims defined within the cluster.

Configuring Alert Notifications

An important aspect of preventive maintenance is choosing and configuring suitable alert notification channels, because timely notifications are key to avoiding application failures. OpenShift Alertmanager supports the following receiver types: Slack, PagerDuty, email, and webhook. In this article we integrate OpenShift Alertmanager with Slack. Slack makes it easy to configure multiple notification channels, which lets us classify alerts by severity, namespace, or object type. For example, critical alerts can be published to a dedicated critical channel, and separate channels can be created for warning, info, and storage alerts.

The Slack integration with OpenShift monitoring can be configured by following this article: https://www.redhat.com/en/blog/how-to-integrate-openshift-namespace-monitoring-and-slack. For our purposes, however, we use the Alertmanager configuration described below instead of the example provided in that article.

We will configure two Slack receivers, one for each alert type: PersistentVolumeUsageCritical and PersistentVolumeUsageNearFull.
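To make the channel classification idea concrete, here is a minimal Python sketch of first-match routing on alert labels. It is a simplified model of how a routing table might pick a channel, not Alertmanager's implementation; the channel names and the rule order are hypothetical.

```python
# Simplified first-match routing model. Channel names are hypothetical.
ROUTES = [
    # More specific rules go first, because the first match wins.
    ({"namespace": "openshift-storage"}, "#cpd-storage"),
    ({"severity": "critical"}, "#cpd-critical"),
    ({"severity": "warning"}, "#cpd-warning"),
    ({"severity": "info"}, "#cpd-info"),
]
DEFAULT_CHANNEL = "#cpd-alerts"  # hypothetical catch-all channel


def pick_channel(labels, routes=ROUTES, default=DEFAULT_CHANNEL):
    """Return the channel of the first route whose labels all match."""
    for required, channel in routes:
        if all(labels.get(key) == value for key, value in required.items()):
            return channel
    return default


if __name__ == "__main__":
    alert = {
        "alertname": "PersistentVolumeUsageCritical",
        "severity": "critical",
        "namespace": "cpd-instance",  # hypothetical namespace
    }
    print(pick_channel(alert))
```

Ordering the rules from most to least specific keeps storage alerts in their own channel even when they also carry a severity label.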

Figure 1. PersistentVolumeUsageCritical alert receiver

Figure 1 shows the receiver configuration for the PersistentVolumeUsageCritical alert. This alert is raised whenever persistent volume usage exceeds 85% of the total volume space. We have configured an additional namespace selector to filter out alerts from other namespaces.

Figure 2. PersistentVolumeUsageNearFull alert receiver

Figure 2 shows the receiver configuration for the PersistentVolumeUsageNearFull alert. This alert is raised whenever persistent volume usage exceeds 75% of the total volume space. We have configured an additional namespace selector to filter out alerts from other namespaces.
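As a rough text equivalent of the two receivers described above, the following Python sketch assembles an Alertmanager-style configuration with one route and one Slack receiver per alert type. It is an illustration under assumptions, not the exact configuration from the figures: the namespace, channel names, receiver names, and webhook URL are placeholders, and the `matchers` syntax assumes a reasonably recent Alertmanager release. The result is printed as JSON for readability and would need to be translated to YAML before use.

```python
import json

NAMESPACE = "cpd-instance"  # hypothetical Cloud Pak for Data namespace


def slack_receiver(name, channel):
    """Build a receiver entry that posts to one Slack channel."""
    return {
        "name": name,
        "slack_configs": [{"channel": channel, "send_resolved": True}],
    }


def pv_route(alertname, receiver, namespace=NAMESPACE):
    """Build a route that matches one alert name within one namespace."""
    return {
        "receiver": receiver,
        "matchers": [
            f'alertname = "{alertname}"',
            f'namespace = "{namespace}"',
        ],
    }


def build_config():
    """Assemble the full configuration as a plain dictionary."""
    return {
        # Placeholder webhook URL; replace with your own Slack webhook.
        "global": {"slack_api_url": "https://hooks.slack.com/services/REPLACE/ME"},
        "route": {
            "receiver": "default",
            "routes": [
                pv_route("PersistentVolumeUsageCritical", "slack-pv-critical"),
                pv_route("PersistentVolumeUsageNearFull", "slack-pv-nearfull"),
            ],
        },
        "receivers": [
            {"name": "default"},
            slack_receiver("slack-pv-critical", "#cpd-critical"),  # hypothetical channel
            slack_receiver("slack-pv-nearfull", "#cpd-warning"),   # hypothetical channel
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_config(), indent=2))
```

Keeping the namespace matcher on each route means alerts with the same name from other namespaces fall through to the default receiver instead of reaching the Slack channels.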

Finally, we simulate the alerts by increasing the disk usage of a specific persistent volume. Once usage exceeds 75% and then 85% of the total volume size, warning and critical alert notifications are sent to our Slack receivers.
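When planning such a simulation, it helps to know how much data must be written to cross each threshold. The Python sketch below works that out with integer arithmetic, using the 75% and 85% thresholds from this article. The function names and the example volume sizes are illustrative assumptions, and the mapping of the two alerts to "warning" and "critical" follows the notifications described above.

```python
NEAR_FULL_PCT = 75  # PersistentVolumeUsageNearFull threshold (percent)
CRITICAL_PCT = 85   # PersistentVolumeUsageCritical threshold (percent)


def classify(used_bytes, capacity_bytes):
    """Return 'critical', 'warning', or 'ok' for a volume's usage."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    if used_bytes * 100 > capacity_bytes * CRITICAL_PCT:
        return "critical"
    if used_bytes * 100 > capacity_bytes * NEAR_FULL_PCT:
        return "warning"
    return "ok"


def bytes_to_exceed(used_bytes, capacity_bytes, threshold_pct):
    """Smallest number of extra bytes that pushes usage strictly above the threshold."""
    target = capacity_bytes * threshold_pct // 100 + 1
    return max(0, target - used_bytes)


if __name__ == "__main__":
    GIB = 2**30
    capacity = 10 * GIB  # hypothetical 10 GiB volume
    used = 7 * GIB       # hypothetical current usage
    for pct in (NEAR_FULL_PCT, CRITICAL_PCT):
        extra = bytes_to_exceed(used, capacity, pct)
        print(f"write {extra / GIB:.2f} GiB more to exceed {pct}%")
```

Usage exactly at a threshold is treated as not yet alerting, matching the "exceeds" wording of the alert descriptions.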

Conclusion

We have enabled alert notifications for monitoring CPD volumes. Creating receivers for specific alert types lets administrators focus on important issues and act proactively to avoid service failures.

