Red Hat OpenShift


Cost management data validation of Red Hat OpenShift data

By Gerald Hosch posted Wed November 06, 2024 04:22 AM

  

Authors: Rishika Kedia (rishika.kedia@in.ibm.com), Shivang Goswami (shivang.goswami1@ibm.com), Niha Tahoor (Niha.Tahoor.Mohammed@ibm.com), Bhavneet Kaur (Bhavneet.Kaur2@ibm.com), Jitendra Singh (Jitendra.Singh4@ibm.com)

Introduction

The cost management data validation proof of concept validates the correctness of the data generated by the Red Hat® OpenShift® Cost Management Operator. This is possible for Red Hat OpenShift running on IBM Z® and IBM® LinuxONE.

Motivation: Cost management data validation is performed to ensure that the data in Prometheus and the data shown by the Cost Management Operator on the Red Hat console are the same, which confirms the accuracy of the Cost Management Operator.

Proposed Solution: The cost management data validation fetches the data from the Red Hat console and compares it with the Red Hat OpenShift data presented in Prometheus at that point in time. The code for validation can be accessed in the Cost-Management-Operator-Validation GitHub repository.
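At its core, the comparison checks that two floating-point totals agree closely. A minimal sketch in Go, assuming a simple relative-tolerance check (the helper name and the 5% threshold are illustrative, not taken from the repository):

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether the console value and the Prometheus
// value agree within a relative tolerance. The helper name and the
// tolerance are illustrative; the repository defines its own checks.
func withinTolerance(console, prometheus, relTol float64) bool {
	if console == 0 && prometheus == 0 {
		return true
	}
	diff := math.Abs(console - prometheus)
	scale := math.Max(math.Abs(console), math.Abs(prometheus))
	return diff/scale <= relTol
}

func main() {
	// Hypothetical daily totals for one project.
	fmt.Println(withinTolerance(14.71, 14.17, 0.05)) // true: about 3.7% apart
	fmt.Println(withinTolerance(14.71, 10.00, 0.05)) // false: about 32% apart
}
```

A relative (rather than absolute) tolerance keeps the check meaningful whether the daily totals are small or large.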

Pre-requisites
  • Clone the Cost-Management-Operator-Validation GitHub repository.
  • Install Go, as the code is written in Go. To download Go, follow the instructions given at https://go.dev/dl/
  • Access the Red Hat service account. For more information, see Red Hat Service Accounts.
  • Install the “oc” command-line tool (CLI) for Red Hat OpenShift. For more information, refer to the oc CLI documentation.
  • Make sure that the Cost Management Operator is successfully installed in the cluster and that the corresponding source is added to the Red Hat account.
Execution
  • Get the cluster-specific information as described in the section “Red Hat API (relevant to data from console in program output)”. The following values in the main.go file are required to run the code.

// populate the clusterID of the OpenShift cluster
clusterID = ""

// populate the project name for which metrics need to be fetched
project = ""

// populate the service account ID
id = ""

// populate the service account secret
secret = ""

To obtain the cluster ID, you can run, for example:

oc get clusterversion version -o jsonpath='{.spec.clusterID}'

  • Add the information regarding the cluster in the main.go file.

clusterID = "openshift_cluster_id"
project = "sample_project"
id = "service_account_id"
secret = "respective-secret"

  • Log in to the cluster by using the oc login command.

oc login -u <username> -p <password> <console-url>

Sample output of oc login

  • The code assumes that Prometheus is reachable locally. The following command establishes a port-forwarding connection from your local machine to the Prometheus pod running within your Red Hat OpenShift cluster.

oc port-forward -n openshift-monitoring pod/prometheus-k8s-0 9090:9090

Sample output of port forwarding

  • Go to the folder of the cloned GitHub repository.

cd Cost-Management-Operator-Validation

  • Run the main file.

go run main.go

The output shows the Prometheus data about the usage of the previous 10 days.

Output of running main.go

As highlighted above, 

  • For date 2024-09-17,

Data from Console – 14.712971370277778

For more information, refer to the section ‘Red Hat API’.

Data from Prometheus – 14.168869717939594

For more information, refer to the section ‘Prometheus’.

We get data for 10 days; here the screenshots are concatenated for simplicity.

For the current date, “Data from Console” might differ from the summary of “Data from Prometheus”, because the former shows the data only up to a certain cut-off point, whereas the latter shows the total consumption up to now.

Stress-CPU

Although any user-deployed application can be leveraged, for this blog we use a sample application, “stress-cpu”, which occupies CPU cycles on the compute nodes based on the configuration provided in the deployment YAML file. We create a namespace called stress-cpu and deploy the application to use about 75% of the vCPUs of all compute nodes.

In our scenario, each compute node has 4 vCPUs.

Total number of vCPUs = 4 × number of compute nodes = 4 × 2 = 8

According to the deployment used, we keep 3 cores per compute node occupied.

Total number of occupied vCPUs = 3 × 2 = 6

Load = (Total number of occupied vCPUs / Total number of vCPUs) × 100 = (6 / 8) × 100 = 75%
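The arithmetic above (6 of 8 vCPUs occupied gives a 75% load) can be reproduced in a few lines of Go; the node and core counts are the ones from this scenario:

```go
package main

import "fmt"

// loadPercent returns the CPU load as a percentage of the total vCPUs.
func loadPercent(occupiedVCPUs, totalVCPUs int) float64 {
	return float64(occupiedVCPUs) / float64(totalVCPUs) * 100
}

func main() {
	const vcpusPerNode = 4    // vCPUs on each compute node
	const occupiedPerNode = 3 // cores kept busy by stress-cpu per node
	const computeNodes = 2

	total := vcpusPerNode * computeNodes       // 8
	occupied := occupiedPerNode * computeNodes // 6

	fmt.Printf("Load = %.0f%%\n", loadPercent(occupied, total)) // Load = 75%
}
```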

To create a new namespace

oc create namespace <new-namespace-name>

Create deployment.yaml and apply it to the newly created namespace.

oc apply -f <deployment_name.yaml> -n <namespace>

Sample Deployment
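A deployment of this kind might look roughly like the following sketch. The image reference, labels, and resource figures are placeholders rather than the repository's actual manifest; stress-ng's --cpu flag sets how many worker threads keep cores busy.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-cpu
  namespace: stress-cpu
spec:
  replicas: 2                # intended as one pod per compute node
  selector:
    matchLabels:
      app: stress-cpu
  template:
    metadata:
      labels:
        app: stress-cpu
    spec:
      containers:
      - name: stress-cpu
        image: quay.io/example/stress-ng:latest  # placeholder image
        args: ["--cpu", "3"]  # keep 3 cores busy in each pod
        resources:
          limits:
            cpu: "3"          # cap the pod at 3 cores
```

Note that a replica count alone does not guarantee spreading across nodes; a pod anti-affinity rule or a DaemonSet can enforce one pod per compute node.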

  • Run the cost management validation program again.

Output of running main.go

As highlighted above, 

  • For date 2024-09-17,

Data from Prometheus – 168.18328834736266

Data from Console – 168.7378954113889

  • For date 2024-09-16,

Data from Prometheus – 168.27757914491525

Data from Console – 168.91556043138888

Red Hat API (relevant to data from console in program output)

The Red Hat API allows developers and administrators to manage clusters, deploy applications, and manage workloads. For more information about the cost management API, see: https://console.redhat.com/docs/api/cost-management

Input parameters:

  • filter[resolution] = "daily" – Sets the report resolution to daily.
  • filter[cluster] = clusterID – Filters the report by the cluster ID, which is provided as input.
  • group_by[project] = project – Groups the report data by the project, which is also taken as input.
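In Go, these parameters can be assembled with the standard net/url package. The report endpoint path below is illustrative; check the cost management API documentation for the exact URL:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildReportQuery encodes the three report parameters used by the
// validation program. The function name is illustrative.
func buildReportQuery(clusterID, project string) string {
	params := url.Values{}
	params.Set("filter[resolution]", "daily")
	params.Set("filter[cluster]", clusterID)
	params.Set("group_by[project]", project)
	return params.Encode() // keys are sorted and percent-encoded
}

func main() {
	// Placeholder values; the real ones come from main.go.
	q := buildReportQuery("openshift_cluster_id", "sample_project")
	// The base path is illustrative; see the API docs for the exact endpoint.
	fmt.Println("https://console.redhat.com/api/cost-management/v1/?" + q)
}
```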

Output:

The following is the truncated output, showing only the usage data. For the detailed output format, see https://console.redhat.com/docs/api/cost-management

Prometheus (relevant to data from Prometheus in program output)

Prometheus uses the following query to get the data

sum by (pod, namespace, node) (
  rate(container_cpu_usage_seconds_total{
    container!="",
    container!="POD",
    pod!="",
    namespace="` + project + `",
    node!=""
  }[5m]))

The query dynamically retrieves CPU usage metrics from the containers of the specified project, aggregated by pod, namespace, and node. It calculates the per-second rate of CPU usage over a 5-minute window.
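In the Go code, the project name is spliced into the query string, which is then URL-encoded for Prometheus' HTTP query API. A minimal sketch (the helper name is illustrative):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildCPUQuery interpolates the project (namespace) into the PromQL
// query for per-pod CPU usage rates. The helper name is illustrative.
func buildCPUQuery(project string) string {
	return fmt.Sprintf(`sum by (pod, namespace, node) (rate(container_cpu_usage_seconds_total{container!="",container!="POD",pod!="",namespace=%q,node!=""}[5m]))`, project)
}

func main() {
	q := buildCPUQuery("stress-cpu")
	fmt.Println(q)
	// With the port-forward in place, the query would be sent
	// URL-encoded to Prometheus' HTTP API:
	fmt.Println("http://localhost:9090/api/v1/query?query=" + url.QueryEscape(q))
}
```

Using fmt.Sprintf with %q quotes the namespace value, which avoids the manual backtick concatenation shown above.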

Output of the Prometheus query

Summary

We explored a method to validate the Cost Management Operator data against the Red Hat OpenShift data presented in Prometheus. This can be used to confirm the correctness of the metrics provided by the Red Hat cloud console. The code mentioned in this document is hosted in the Cost-Management-Operator-Validation repository in the IBM GitHub organization.

To validate the data across various scenarios, we also created the “stress-cpu” namespace and deployed “stress-cpu-ng”, designed to consume approximately 75% of the vCPUs across all compute nodes in the Red Hat OpenShift cluster, so that the data can also be compared under extreme load. The sample stress-cpu application code is available and can be used to occupy CPU cycles on the compute nodes. The described method can be used for Red Hat OpenShift running on IBM Z® and IBM® LinuxONE.
