Authors: Rishika Kedia (rishika.kedia@in.ibm.com), Shivang Goswami (shivang.goswami1@ibm.com), Niha Tahoor (Niha.Tahoor.Mohammed@ibm.com), Bhavneet Kaur (Bhavneet.Kaur2@ibm.com), Jitendra Singh (Jitendra.Singh4@ibm.com)
Introduction
The cost management data validation proof of concept validates the correctness of the data generated by the Red Hat® OpenShift® Cost Management Operator. It applies to Red Hat OpenShift running on IBM Z® and IBM® LinuxONE.
Motivation: Cost management data validation is performed to confirm that the data in Prometheus and the data reported by the Cost Management Operator on the Red Hat Console are the same, which verifies the accuracy of the Cost Management Operator.
Proposed Solution: The cost management data validation fetches data from the Red Hat Console and compares it with the Red Hat OpenShift data presented in Prometheus at that point in time. The validation code is available in the Cost-Management-Operator-Validation GitHub repository.
Pre-requisites
- Clone the Cost-Management-Operator-Validation GitHub repository.
- Install Go, as the code is written in Go. To download Go, follow the instructions at https://go.dev/dl/
- Access the Red Hat service account. For more information, see Red Hat Service Accounts.
- Install the “oc” command-line tool (CLI) for Red Hat OpenShift. For more information, refer to the OC CLI documentation.
- Make sure that the Cost Management Operator is successfully installed in the cluster and that the corresponding source is added to the Red Hat account.
Execution
- Get the cluster-specific information by following the section "Red Hat API (relevant to data from console in program output)". This information is required to run the code and maps to the following variables in the main.go file.
// populate the clusterID of the OpenShift cluster
clusterID = ""
// populate the project name for which metrics need to be fetched
project = ""
// populate the service account ID
id = ""
// populate the service account secret
secret = ""
To obtain the cluster ID, see clusterID
- Add the information regarding cluster in main.go file.
clusterID = "openshift_cluster_id"
project = "sample_project"
id = "service_account_id"
secret = "respective-secret"
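All four values must be filled in before the program is run. A small guard like the following can catch missing values early; this is an illustrative sketch, not part of the repository (the function name and map layout are assumptions):

```go
package main

import "fmt"

// checkConfig reports which of the required inputs are still empty,
// so the program can fail fast with a clear message.
func checkConfig(values map[string]string) []string {
	var missing []string
	for _, name := range []string{"clusterID", "project", "id", "secret"} {
		if values[name] == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	cfg := map[string]string{
		"clusterID": "openshift_cluster_id",
		"project":   "sample_project",
		"id":        "",
		"secret":    "",
	}
	// the empty id and secret are reported as missing
	fmt.Println(checkConfig(cfg))
}
```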
- Log in to the cluster by using the oc login command.
oc login -u <username> -p <password> <console-url>

Sample output of oc login
- The code assumes that Prometheus is running locally. The following command establishes a port-forwarding connection from your local machine to the Prometheus pod running within your Red Hat OpenShift cluster.
oc port-forward -n openshift-monitoring pod/prometheus-k8s-0 9090:9090

Sample output of port forwarding
- Go to the folder of the cloned GitHub repository and run the validation program.
cd Cost-Management-Operator-Validation
go run main.go

The output shows the console and Prometheus data about the usage of the previous 10 days.


Output of running main.go
As highlighted above,
Data from Console – 14.712971370277778
For more information, refer to the section ‘Red Hat API’.
Data from Prometheus – 14.168869717939594
For more information, refer to the section ‘Prometheus’.
We get data for 10 days; the screenshots are concatenated here for simplicity.
For the current date, “Data from Console” might differ from “Data from Prometheus” because the former shows the data only up to a certain cut-off point, whereas the latter shows the total consumption up to now.
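Because of that cut-off effect, an exact equality check is too strict; a relative-difference comparison is one simple way to judge agreement programmatically. This is a minimal sketch (the function name and the 5% tolerance are illustrative choices, not part of the repository):

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether the console and Prometheus readings
// agree within the given relative tolerance (e.g. 0.05 for 5%).
func withinTolerance(console, prometheus, tol float64) bool {
	if console == 0 {
		return prometheus == 0
	}
	return math.Abs(console-prometheus)/math.Abs(console) <= tol
}

func main() {
	console := 14.712971370277778
	prometheus := 14.168869717939594
	// the two readings above differ by roughly 3.7%, so they agree at 5%
	fmt.Println(withinTolerance(console, prometheus, 0.05))
}
```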
Stress-CPU
Although any user-deployed application can be leveraged, for this blog we use a sample application, “stress-cpu”, which occupies CPU cycles on the compute nodes based on the configuration provided in the deployment YAML file. We create a namespace called stress-cpu and deploy the application to use 75% of the vCPUs of all the compute nodes.
In our scenario,
Each compute node has 4 vCPUs.
Total number of vCPUs = 4 × number of compute nodes = 4 × 2 = 8
According to the deployment used, we keep 3 cores per compute node occupied.
Total number of occupied vCPUs = 3 × 2 = 6
Load = (Total number of occupied vCPUs / Total number of vCPUs) × 100 = (6 / 8) × 100 = 75%
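The arithmetic above can be sketched in Go, using the node counts of this scenario (the function name is illustrative):

```go
package main

import "fmt"

// computeLoad returns the expected CPU load percentage when
// occupiedPerNode cores are kept busy on each of `nodes` compute
// nodes, each providing vcpusPerNode vCPUs.
func computeLoad(nodes, vcpusPerNode, occupiedPerNode int) float64 {
	totalVCPUs := nodes * vcpusPerNode  // 2 * 4 = 8
	occupied := nodes * occupiedPerNode // 2 * 3 = 6
	return float64(occupied) / float64(totalVCPUs) * 100
}

func main() {
	fmt.Printf("Load = %.0f%%\n", computeLoad(2, 4, 3)) // Load = 75%
}
```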
To create a new namespace
oc create namespace <new-namespace-name>
Create and apply deployment.yaml in the newly created namespace
oc apply -f <deployment_name.yaml> -n <namespace>

Sample Deployment
- Run the cost management validation program again.

Output of running main.go
As highlighted above,
Data from Prometheus – 168.18328834736266
Data from Console – 168.7378954113889
Data from Prometheus – 168.27757914491525
Data from Console – 168.91556043138888
Red Hat API (relevant to data from console in program output)
The Red Hat API allows developers and administrators to manage clusters, deploy applications, and manage workloads. For more information about the Red Hat API, see https://console.redhat.com/docs/api/cost-management
Input parameters:
- filter[resolution]=daily – Sets the report resolution to daily.
- filter[cluster]=<clusterID> – Filters the report by the cluster ID, which is provided as input.
- group_by[project]=<project> – Groups the report data by the project, which is also taken as input.
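These parameters can be assembled into a request URL with Go's net/url package. The following is a sketch only: the parameter names follow the list above, but the exact endpoint path and function name are assumptions, not taken from the repository.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildReportURL assembles a cost-management report request URL from
// the cluster ID and project that the program takes as input.
// The endpoint path shown here is an assumed example.
func buildReportURL(clusterID, project string) string {
	base := "https://console.redhat.com/api/cost-management/v1/reports/openshift/compute/"
	q := url.Values{}
	q.Set("filter[resolution]", "daily")
	q.Set("filter[cluster]", clusterID)
	q.Set("group_by[project]", project)
	// Encode percent-escapes the brackets and sorts the keys
	return base + "?" + q.Encode()
}

func main() {
	fmt.Println(buildReportURL("openshift_cluster_id", "sample_project"))
}
```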
Output:
The following is the truncated output, showing only the usage data.

For the detailed output format, see https://console.redhat.com/docs/api/cost-management
Prometheus (relevant to data from Prometheus in program output)
The program uses the following Prometheus query to get the data:
sum by (pod, namespace, node) (
  rate(container_cpu_usage_seconds_total{
    container!="",
    container!="POD",
    pod!="",
    namespace="` + project + `",
    node!=""
  }[5m])
)
The query retrieves CPU usage metrics from the containers of the specified project, aggregated by pod, namespace, and node. It calculates the per-second rate of CPU usage over a 5-minute window.
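In Go, the query string is built around the project name and can then be sent to the port-forwarded Prometheus instance through its HTTP API (`/api/v1/query` on localhost:9090). This is a minimal sketch of that assembly; the function names are illustrative, not taken from the repository:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildQuery interpolates the project name into the PromQL query
// shown above, mirroring the string concatenation in the Go source.
func buildQuery(project string) string {
	return `sum by (pod, namespace, node) (rate(container_cpu_usage_seconds_total{` +
		`container!="",container!="POD",pod!="",namespace="` + project + `",node!=""}[5m]))`
}

// promQueryURL wraps the query into a request URL for the Prometheus
// HTTP API exposed by the port-forward on localhost:9090.
func promQueryURL(project string) string {
	v := url.Values{}
	v.Set("query", buildQuery(project))
	return "http://localhost:9090/api/v1/query?" + v.Encode()
}

func main() {
	fmt.Println(promQueryURL("sample_project"))
}
```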

Output of the Prometheus query
Summary
We explored a method to validate the Cost Management Operator data against the Red Hat OpenShift data presented in Prometheus. This confirms the correctness of the metrics provided by the Red Hat Cloud console. The code described in this document is hosted in the Cost-Management-Operator-Validation repository in the IBM organization on GitHub.
To validate the data across various scenarios, we also created the “stress-cpu” namespace and deployed the sample “stress-cpu” application, designed to consume approximately 75% of the vCPUs across all the compute nodes in the Red Hat OpenShift cluster, so that the data can also be compared under heavy load. The sample stress-cpu code can be reused to occupy CPU cycles on the compute nodes. The described method can be used for Red Hat OpenShift running on IBM Z and IBM® LinuxONE.