Unlocking Scalability and Improving Resource Efficiency with Horizontal Pod Auto Scaling (HPA) for IBM Software Hub

By Yongli An

  

Introduction  

As more customers integrate their Cloud Pak for Data (CP4D) and Watsonx applications running on IBM Software Hub (SWH) into their critical business processes, performance, scalability, reliability, and elasticity become increasingly important. Meanwhile, in today's dynamic cloud-native environments, efficiently managing computational resources while maintaining optimal performance is crucial for enterprise data and AI platforms. 

IBM SWH is a modern, cloud-native platform designed to streamline the installation, management, and monitoring of IBM's containerized software on Red Hat OpenShift. All the services and runtimes in SWH are built on Kubernetes, which provides Horizontal Pod Autoscaling (HPA) capabilities. Since CP4D 5.0 and Watsonx 1.0, the platform has laid the foundation for an enterprise-grade data and AI platform, including initial support for HPA. SWH 5.3 further enhances HPA support to help customers explore its benefits. 

This blog post explores how HPA can transform your SWH deployment by automatically scaling pods based on your dynamic workloads on demand, reducing costs and improving overall resource utilization. It provides guidance on enabling HPA in SWH, implementation details of HPA support in SWH, and key considerations when using HPA, along with examples to showcase its benefits. 

Why use Horizontal Pod Autoscaling (HPA) 

Here are key reasons to consider using Horizontal Pod Autoscaling in IBM SWH on OpenShift: 

  1. Dynamic resource management and resource efficiency
    HPA automatically adjusts the number of pod replicas based on real-time metrics such as CPU and memory usage. For example, the user management service might require 6 pods during peak user login periods but only 2 pods during off-peak hours, with HPA seamlessly scaling according to demand. 

  2. Cost optimization
    By scaling down during periods of low demand, HPA reduces infrastructure costs. For instance, if each pod requests 0.5 cores and 1Gi of memory, reducing from 8 to 2 pods during off-peak hours frees 3 cores and 6Gi of memory. This saves money in an OCP cluster with fixed capacity hosting many services whose loads peak at different times. 
  3. Improved performance, reliability, and availability
    Auto-scaling maintains responsiveness during workload spikes by preventing bottlenecks, ensuring service quality during peak loads. 
  4. Enhanced operational simplicity
    HPA reduces the need for manual tuning, allowing teams to focus on innovation rather than capacity and infrastructure management. 

SWH hosts various services running a wide range of workloads that naturally experience fluctuating resource demands. HPA ensures each service scales appropriately to meet demand without over-provisioning resources upfront. 

For more details on how HPA works, see the Red Hat OpenShift documentation. 

Automatic scaling in SWH  

As explained in the Automatically scaling services documentation, IBM SWH services support autoscaling by using the Red Hat OpenShift Horizontal Pod Autoscaler (HPA). HPA dynamically adjusts the deployment scale of services by increasing or decreasing the number of pods in response to CPU or memory consumption. 

For SWH services that support HPA, CPU utilization is the primary metric used for autoscaling decisions. 

 

Below is an example of how HPA is implemented: 

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-service-hpa
  namespace: cpd-instance
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-service-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 30
      selectPolicy: Max
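
SWH creates and manages these HPA policies for you when you enable HPA, so you normally do not apply them by hand. Purely as an illustrative sketch (using the sample names above, not a real SWH deployment), a manifest like this could be applied and inspected with: 

oc apply -f sample-service-hpa.yaml -n cpd-instance

oc get hpa sample-service-hpa -n cpd-instance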

 

Key configuration settings 

  • minReplicas: 2 
    • Ensures high availability with a minimum of 2 pods 
  • maxReplicas: 10 
    • Caps the maximum scale at 10 pods to prevent resource exhaustion 
  • averageUtilization: 70  
    • Triggers scaling when average CPU usage exceeds 70% of the pod request setting
    • In SWH's default implementation, this setting is calculated by targeting 70% of the pod limit
    • For example, if the pod CPU request is 1 and the CPU limit is 2, targeting 70% of the limit means setting the utilization target to (2 × 70%) / 1 = 140% 
  • stabilizationWindowSeconds: 300 
    • Prevents flapping (rapid scale-up/down) by waiting for the specified period before scaling actions 
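
To see how these settings behave at runtime, you can inspect a policy and its recent scaling events (sample-service-hpa is the hypothetical name from the example above): 

oc describe hpa sample-service-hpa -n cpd-instance

The Events section at the bottom of the output records when the autoscaler added or removed replicas, and why. 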

 

How to enable or disable HPA  

There are two options to enable or disable HPA in SWH as of the 5.3 release: 

  • Command line method using cpd-cli manage  

  • User interface method in the SWH Administration Console (new in the 5.3 release)  

When the command line is preferred, you can enable or disable HPA in IBM SWH by using the following command, as explained on the manage apply-hpa-config page: 

cpd-cli manage apply-hpa-config 

For example: 

  • To enable HPA for the SWH control plane (zenservice): 

cpd-cli manage apply-hpa-config \
  --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
  --components=zen \
  --enable_hpa=true

 

  • To disable HPA for the SWH control plane (zenservice): 

cpd-cli manage apply-hpa-config \
  --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
  --components=zen \
  --enable_hpa=false

You can refer to this documentation page for all the details regarding which services support HPA and the exact command to enable or disable HPA for each service. 
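
Assuming --components accepts a comma-separated list for apply-hpa-config, as it does for other cpd-cli manage commands (verify against the documentation page above), HPA could be enabled for several services in one invocation: 

cpd-cli manage apply-hpa-config \
  --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
  --components=zen,ccs,wml \
  --enable_hpa=true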

When the user interface is preferred, you can navigate to the right page in the SWH Administration Console using the following steps:

  • Log in to the IBM SWH Administration Console  
  • In the left-side navigation menu, click Monitoring to get to the Monitoring page 
  • On the Monitoring page, under the Status summary section in the left-side pane, click Services to see a list of the services that are running on your cluster  
  • Now you are on the Status and use page, where you can click the 3-dot icon beside the service for which you plan to enable HPA, and then click Configure HPA. You will see a screen like the following example:

[Screenshot: the Configure HPA panel for the selected service]

 

  • Select “Enable HPA” and then click “Save”. You will see a progress bar at the top, as shown below:

[Screenshot: progress bar indicating the HPA configuration is being applied]

After some time, the status message above will change to “successfully completed”, indicating that HPA is enabled and ready for autoscaling.  

How to check HPA status  

To check whether HPA is enabled for a service, you have two options: 

  1. Command-line method, which provides the full HPA configuration details. 

  2. SWH Administration Console (UI), which currently shows only whether HPA is enabled. 

Command line option 

The first option is to use the manage get-hpa-config command. 

For example, to check whether HPA is currently enabled for the common core services, use this command: 

cpd-cli manage get-hpa-config \
  --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
  --components=ccs

To get the detailed HPA settings, you can run the following command: 

oc get hpa -n ${PROJECT_CPD_INST_OPERANDS}

The output will be something like the following: 
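
The exact services and values depend on your deployment; the sample below is purely illustrative (hypothetical names and numbers, not captured from a real cluster): 

NAME           REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
usermgmt-hpa   Deployment/usermgmt   cpu: 45%/140%   2         7         2          3d
zen-core-hpa   Deployment/zen-core   cpu: 30%/140%   2         7         2          3d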

At a high level, it’s good to understand the following: 

  • NAME: the HPA policy name. In SWH, the name is normally the deployment name with '-hpa' appended, making it easy to identify which deployment or microservice the policy applies to 

  • REFERENCE: the deployment name in SWH  

  • TARGETS: there are two values here.   

    • The first value is the current resource usage percentage, calculated against the deployment pod's resource request setting (either CPU or memory request). Most of the HPA policies are defined using CPU as the resource metric. 

    • The second value is the target threshold (defined as averageUtilization). When usage exceeds this threshold, HPA scales up by adding replicas; when usage falls below it, HPA scales down by removing replicas. This is a simplification, as other settings control how gracefully such automatic scaling happens and prevent scaling up and down too quickly. For details, refer to the Red Hat OpenShift Container Platform (OCP) documentation on this topic.  

    • Typically, the usage percentage or target threshold would not exceed 100%. However, based on SWH's scaling best practices and HPA implementation, you will often see these values greater than 100%. See the explanation in the next section for more details. 

  • MINPODS: corresponds to the minReplicas setting in the HPA definition
  • MAXPODS: corresponds to the maxReplicas setting in the HPA definition   

  • REPLICAS: the number of currently running pods 

Refer to the OCP documentation for additional information regarding the values and fields shown in the sample above. 

Use the SWH Administration Console 

The second option is to check the HPA status on the service-specific status page in the SWH Administration Console by clicking the service of interest to view its status. The "HPA enabled" field will display "True" if HPA is enabled, or "False" if it is disabled. 

[Screenshot: the service status page with the “HPA enabled” field]

Note that the command-line option gives you more HPA setting details, while the UI-based option above currently shows only whether HPA is enabled.  

Implementation details behind the scenes 

Let’s walk through some additional details on how SWH services support HPA. 

How the maxReplicas and minReplicas are set  

The HPA settings are always associated with a t-shirt size, which is the active configuration when enabling HPA. Each t-shirt size (small, medium, and large) corresponds to a scaling configuration level (level_1 through level_5), and each has its own HPA settings if HPA is supported at that scaling level. 

To control the maximum number of resources a service can use, the maxReplicas setting is set to (2*x)+1 by default, where x is the fixed replica count for that scaling level (i.e., t-shirt size). For example, if a microservice has a fixed replica count of 3 in its medium t-shirt size, the maxReplicas setting will be set to 7 (2*3+1=7) by the HPA policy. 

This implementation allows each scaling level to support workloads within an appropriate range while preventing excessive overlap with the next scale level, which may offer additional benefits beyond replica count (such as larger pod CPU and memory allocations). 

To fully maximize footprint reduction and infrastructure cost savings, the minReplicas setting is configured as follows: 

  • Set to 1 for small t-shirt sizes or scaling levels that do not promise pod-level high availability 

  • Set to 2 for any scaling level that promises pod-level high availability (e.g., medium and large t-shirt sizes) 

How the averageUtilization is set 

The averageUtilization is the target utilization threshold at which HPA starts additional replicas. In SWH's HPA implementation, the target utilization is set to 70% of the pod limit setting (in most cases based on CPU utilization, as of release 5.3). Since the averageUtilization setting in the HPA policy is calculated against the pod request setting, a mathematical conversion is performed to translate the 70%-of-limit threshold into the equivalent percentage of the pod request. 

For example, if a pod has a CPU request of 1 and a limit of 2, targeting 70% of the pod CPU limit of 2 means: (70% × 2) ÷ 1 = 140%. This is why many HPA policies have averageUtilization settings much higher than 100%—the percentage depends on the ratio between the pod's CPU limit and request settings. 

This point is important to remember: if you enable HPA on a pod that has custom CPU request or limit settings (configured while HPA was disabled), or if the CPU request and limit settings are tuned after HPA is enabled, the averageUtilization setting must be adjusted accordingly, based on the math explained above. 
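
As a minimal sketch of that adjustment (hypothetical request and limit values, in millicores), the target can be recomputed as limit ÷ request × 70: 

# request=500m, limit=2000m (hypothetical custom tuning)
# adjusted averageUtilization = 2000 / 500 * 70 = 280
echo $(( 2000 * 70 / 500 ))

A pod with a 500m request and a 2000m limit would therefore need averageUtilization set to 280 to keep targeting 70% of the limit. 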

HPA in action examples  

In this section, we will use a couple of examples to showcase the footprint reduction and cost-saving benefits with numbers.

A simple sample workload 

Suppose you run a data ingestion pipeline with the following setup: 

  • Normal load: 2 pods  

  • Peak load: up to 8 pods during workload spikes  

  • Per-pod resources:  
    • CPU: 500m (0.5 cores) 
    • Memory: 1Gi 

Case 1: Without HPA

Total resources must be allocated for the spikes and remain the same during both normal and peak load periods:  

  • CPU: 8 × 0.5 cores = 4 cores 

  • Memory: 8 Gi  

Case 2: With HPA enabled (MINPODS=2, MAXPODS=8, target CPU utilization=70%)

During normal load periods, only 2 pods are active:  

  • CPU: 1 core (instead of 4, saving 75%) 

  • Memory: 2Gi (instead of 8Gi, saving 75%)  

During peak load periods, pods scale up as needed. 

Over a typical day, if the workload fluctuates and you spend 80% of the time at 2 pods and 20% at 8 pods, the average is 0.8 × 2 + 0.2 × 8 = 3.2 pods versus a fixed allocation of 8 pods, a resource saving of 1 − 3.2/8 = 60% overall. 

A real-life example   

This example simulates a large-scale use case required by one of our customers, where we need to ensure good performance for thousands of active users in SWH performing basic UI navigation and operations such as listing and viewing projects after completing the login step. 

In this example, the following services are enabled with HPA, which includes 20+ HPA policies at the microservices level: 

  • Notebook 

  • Watson Studio 

  • Control plane / Zen 

  • Common Core Services (CCS) 

  • IBM Knowledge Catalog (IKC) 

  • Watson Machine Learning (WML) 

 

Once HPA is enabled for the above services, when the system is idle or the load is not high enough to trigger additional replicas beyond the minReplicas setting, the total CPU request and memory request amounts are reduced by 33%-35% based on real-time monitoring. This frees up approximately 22.5 vCPUs and 30Gi of memory for other workloads as needed. 

Based on our monitoring data, workload response times with HPA enabled are not negatively impacted; in fact, they are slightly better than with HPA disabled. 

The graph below uses the user management microservice as an example to showcase HPA in action. 

[Graph: user management pod replica count over time, before and after enabling HPA]

Here is a subset of the related HPA policies: 

[Screenshot: the HPA policies related to the user management workload]

A few key observations from this graph: 

  • To support thousands of active users performing basic UI navigation in SWH, user management needs 10 pods when HPA is disabled (9:29:00pm to 9:29:45pm) 

  • HPA was turned on around 9:30:00pm, and the number of replicas dropped to 2 very quickly as expected (minReplicas is set to 2) 

  • When there is no load or low load on the system (9:30:00pm to 9:30:45pm), the replica count remains at 2 

  • When load scales up to the targeted level, the replica count quickly increases to 10 as required to support the load 

This demonstrates that HPA works efficiently, responsively, and smoothly as expected. 

Other Guidance and Best Practices 

While planning to use HPA, keep the following in mind: 

  • HPA is based on a given t-shirt size (corresponding to a scaling level) 
    • Certain HPA settings such as maxReplicas are specific to that t-shirt size  
    • The minReplicas setting remains at 1 for small t-shirt sizes 
    • The minReplicas setting is set to 2 by default for medium t-shirt sizes and above to provide the needed pod-level high availability 

  • When HPA is disabled, the pod settings revert to the defaults of the original t-shirt size

  • Pod resource requests must be set for HPA to work (see the verification sketch after this list) 

  • Removing the pod limit setting may also lead to unknown risks when enabling HPA 

  • For any service pods with custom tuning, as mentioned in the section 'How the averageUtilization is set' above, the out-of-the-box HPA averageUtilization may no longer work as intended and needs to be custom-tuned accordingly 
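
As a quick sanity check before enabling HPA, you can confirm that resource requests are set on the target deployment (usermgmt is a hypothetical deployment name): 

oc get deployment usermgmt -n ${PROJECT_CPD_INST_OPERANDS} \
  -o jsonpath='{.spec.template.spec.containers[*].resources}'

If the output contains no CPU requests entry, HPA has no baseline against which to compute utilization. 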

Conclusion 

By leveraging the Horizontal Pod Autoscaler (HPA) for services running on SWH in OpenShift, organizations can adapt their resource consumption dynamically, ensuring high performance, reliability, and significant cost savings. As our examples demonstrate, HPA delivers substantial infrastructure benefits by automatically adjusting resources based on actual workload demand, maintaining excellent performance while significantly reducing resource consumption during normal operations. 

 
