Introduction
As more customers integrate their Watsonx and Cloud Pak for Data (CP4D) applications running on IBM Software Hub (SWH) into critical business processes, performance, scalability, reliability, and elasticity become increasingly important. Meanwhile, in today's dynamic cloud-native environments, efficiently managing computational resources while maintaining optimal performance is crucial for enterprise data and AI platforms.
IBM SWH is a modern, cloud-native platform designed to streamline the installation, management, and monitoring of IBM's containerized software on Red Hat OpenShift. All the services and runtimes in SWH are built on Kubernetes, which provides Horizontal Pod Autoscaling (HPA) capabilities. Since CP4D 5.0 and Watsonx 1.0, the foundation has been laid for an enterprise-grade data and AI platform, including initial support for HPA. SWH 5.3 further enhances HPA support to help customers explore its benefits.
This blog post explores how HPA can transform your SWH deployment by automatically scaling pods in response to dynamic workload demand, reducing costs and improving overall resource utilization. It provides guidance on enabling HPA in SWH, describes the implementation details behind SWH's HPA support, and highlights key considerations when using HPA, along with examples that showcase its benefits.
Why use Horizontal Pod Autoscaling (HPA)
Here are key reasons to consider using Horizontal Pod Autoscaling in IBM SWH on OpenShift:
- Dynamic resource management and resource efficiency
HPA automatically adjusts the number of pod replicas based on real-time metrics such as CPU and memory usage. For example, the user management service might require 6 pods during peak user login periods but only 2 pods during off-peak hours, with HPA seamlessly scaling according to demand.
- Cost optimization
By scaling down during periods of low demand, HPA reduces infrastructure costs. For instance, if each pod requests 0.5 cores and 1Gi of memory, reducing from 8 to 2 pods during off-peak hours can significantly cut resource usage. This saves money in an OCP cluster with fixed capacity hosting many services whose loads peak at different times.
- Improved performance, reliability and availability
Auto-scaling maintains responsiveness during workload spikes by preventing bottlenecks, ensuring service quality during peak loads.
- Enhanced operational simplicity
Reduces the need for manual tuning, allowing teams to focus on innovation rather than capacity and infrastructure management.
SWH hosts various services running a wide range of workloads that naturally experience fluctuating resource demands. HPA ensures each service scales appropriately to meet demand without over-provisioning resources upfront.
Automatic scaling in SWH
As explained in the Automatically scaling services documentation, IBM SWH services support autoscaling by using the Red Hat OpenShift Horizontal Pod Autoscaler (HPA). HPA dynamically adjusts the deployment scale of a service by increasing or decreasing the number of pods in response to CPU or memory consumption.
Key configuration settings
- minReplicas and maxReplicas
The lower and upper bounds on the number of pod replicas. See the implementation details section below for how SWH derives these values from a service's scaling level.
- Target CPU utilization (averageUtilization)
Triggers scaling when average CPU usage exceeds the target, which is expressed as a percentage of the pod request setting. In SWH's default implementation, this target is calculated as 70% of the pod limit. For example, if the pod CPU request is 1 and the CPU limit is 2, targeting 70% of the limit means setting the utilization target to (2 × 70%) / 1 = 140%.
- Stabilization window
Prevents flapping (rapid scale-up/down) by waiting for the specified period before taking further scaling actions.
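Putting these settings together, an HPA policy is a standard Kubernetes autoscaling/v2 object. Below is a minimal illustrative sketch; the service name, replica bounds, and stabilization period are hypothetical examples rather than SWH defaults for any particular service:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: usermgmt                      # hypothetical service name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: usermgmt                    # the deployment being scaled
  minReplicas: 2
  maxReplicas: 7
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 140     # (limit of 2 cores × 70%) / request of 1 core
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # wait 5 minutes before scaling down further
```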
How to enable or disable HPA
There are two options to enable or disable HPA in SWH as of the 5.3 release:
When the command line is preferred, you can enable or disable HPA in IBM SWH by using the cpd-cli manage apply-hpa-config command, as explained on the apply-hpa-config documentation page. For example:

```sh
cpd-cli manage apply-hpa-config \
--cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
...
```
You can refer to this documentation page for all the details regarding what services support HPA and the exact command to enable or disable HPA for each service.
When the user interface is preferred, you can navigate to the right page in the SWH Administration Console by using the following steps:
- From the Status and use page, click the 3-dot icon beside the service for which you plan to enable HPA, and then click Configure HPA. You will see a screen like the following example:

- Select “Enable HPA” and then click “Save”. You will see a progress bar at the top, as shown below:
After some time, the status message above changes to “successfully completed”, indicating that HPA is enabled and ready for autoscaling.
How to check HPA status
To check whether HPA is enabled for a service, you have two options:
- Command-line method – provides detailed HPA settings.
- SWH Administration Console (UI) – shows only whether HPA is enabled.
The command-line option gives you full configuration details, while the UI option is limited to displaying the HPA status.
Command line option
The first option is to use the cpd-cli manage get-hpa-config command.
For example, to check whether HPA is currently enabled for the common core services, use this command:
```sh
cpd-cli manage get-hpa-config \
--cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
...
```
To get the detailed HPA settings, you can query the HPA policies directly in the instance namespace.
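Because SWH HPA policies are standard Kubernetes objects, the usual oc get hpa command lists them along with their current and target utilization and replica bounds:

```sh
# List all HPA policies in the SWH instance namespace
oc get hpa -n ${PROJECT_CPD_INST_OPERANDS}
```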
The output will be something like the following:
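(The sample below is an illustrative sketch with hypothetical values; actual names and numbers depend on the service and its t-shirt size.)

```
NAME       REFERENCE             TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
usermgmt   Deployment/usermgmt   45%/140%   2         7         2          3d
```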

At a high level, it's good to understand the following:
- TARGETS: shows two values, such as 45%/140% above. The first value is the current average usage across the pods, measured against the pod request setting. The second value is the target threshold (defined as averageUtilization). When usage exceeds this threshold, HPA triggers scaling up to add replicas; when usage falls below it, HPA triggers scaling down to reduce replicas. This is oversimplified, as other settings control how gracefully such automatic scaling happens by avoiding scaling up and down too quickly; for details, refer to the OpenShift Container Platform (OCP) documentation on this topic.
- Typically, the usage percentage or target threshold would not exceed 100%. However, based on SWH's scaling best practices and HPA implementation, you will often see these values greater than 100%. See the explanation in the next section for more details.
- MINPODS: corresponds to the minReplicas setting seen in the HPA definition.
- MAXPODS: corresponds to the maxReplicas setting seen in the HPA definition.
Refer to the OCP documentation for additional information regarding the values and fields shown in the sample above.
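If you want the complete definition of a specific policy rather than the tabular summary, you can fetch the object as YAML; the policy name below is a hypothetical example:

```sh
# Show the full HPA definition, including metrics and scaling behavior
# (replace usermgmt with an actual policy name from the list above)
oc get hpa usermgmt -n ${PROJECT_CPD_INST_OPERANDS} -o yaml
```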
Use the SWH Administration Console
The second option is to check the HPA status on the service-specific status page in the SWH Administration Console by clicking the service of interest to view its status. The "HPA enabled" field will display "True" if HPA is enabled, or "False" if it is disabled.
Note that the command-line option gives you more HPA setting details, while the UI-based option above currently shows only whether HPA is enabled.
Implementation details behind the scenes
Let’s walk through some additional details on how SWH services support HPA.
How the maxReplicas and minReplicas are set
The HPA settings are always associated with a t-shirt size, namely the configuration that is active when HPA is enabled. Each t-shirt size (small, medium, and large) maps to a scaling configuration level (level_1 through level_5), and each level has its own HPA settings if HPA is supported at that scaling level.
To control the maximum number of resources a service can use, the maxReplicas setting is set to (2*x)+1 by default, where x is the fixed replica count for that scaling level (i.e., t-shirt size). For example, if a microservice has a fixed replica count of 3 in its medium t-shirt size, the maxReplicas setting will be set to 7 (2*3+1=7) by the HPA policy.
This implementation allows each scaling level to support workloads within an appropriate range while preventing excessive overlap with the next scale level, which may offer additional benefits beyond replica count (such as larger pod CPU and memory allocations).
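As a quick sanity check on the maxReplicas formula, with hypothetical fixed replica counts:

```sh
# maxReplicas = 2 × (fixed replica count) + 1
for x in 1 2 3; do
  echo "fixed replicas=$x -> maxReplicas=$((2 * x + 1))"
done
# fixed replicas=1 -> maxReplicas=3
# fixed replicas=2 -> maxReplicas=5
# fixed replicas=3 -> maxReplicas=7
```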
To fully maximize footprint reduction and infrastructure cost savings, the minReplicas setting is configured as follows:
How the averageUtilization is set
The averageUtilization is the target utilization threshold that HPA uses before starting additional replicas. In SWH's HPA implementation, the target utilization is set to 70% of the pod limit setting (in most cases, based on CPU utilization as of release 5.3). Since the averageUtilization setting in the HPA policy is calculated against the pod request setting, mathematical conversion is performed to translate the 70% pod limit threshold into the equivalent pod request percentage.
For example, if a pod has a CPU request of 1 and a limit of 2, targeting 70% of the pod CPU limit of 2 means: (70% × 2) ÷ 1 = 140%. This is why many HPA policies have averageUtilization settings much higher than 100%—the percentage depends on the ratio between the pod's CPU limit and request settings.
This point is important to remember: if you enable HPA on a pod that has custom CPU request or limit settings (configured while HPA was disabled), or if the CPU request and limit settings are tuned after HPA is enabled, the averageUtilization setting must be adjusted accordingly, based on the math explained above.
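As a sketch of that adjustment, assuming hypothetical custom values of a 2-core request and a 3-core limit, the equivalent target can be recomputed as follows:

```sh
# averageUtilization = (pod CPU limit × 70%) / pod CPU request
REQUEST=2    # hypothetical custom CPU request (cores)
LIMIT=3      # hypothetical custom CPU limit (cores)
echo "$LIMIT $REQUEST" | awk '{ printf "averageUtilization: %d\n", ($1 * 70) / $2 }'
# -> averageUtilization: 105
```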
HPA in action examples
In this section, we will use a couple of examples to showcase the footprint reduction and cost-saving benefits with numbers.
A simple sample workload
Suppose you run a data ingestion pipeline with the following setup:
Case 1: Without HPA
Total resources must be allocated for the peak spikes and remain the same during both normal-load and peak-load periods.
Case 2: With HPA enabled (MINPODS=2, MAXPODS=8, target CPU utilization=70%)
During normal load periods, only 2 pods are active:
During peak load periods, pods scale up as needed.
Over a typical day, if the workload fluctuates such that you spend 80% of the time at 2 pods and the remaining 20% at the full 8 pods, the average footprint is 0.8 × 2 + 0.2 × 8 = 3.2 pods instead of a fixed 8 pods, a resource saving of roughly 60% overall.
A real-life example
This example simulates a large-scale use case required by one of our customers, where we need to ensure good performance for thousands of active users in SWH performing basic UI navigation and operations such as listing and viewing projects after completing the login step.
In this example, the following services are enabled with HPA, which includes 20+ HPA policies at the microservices level:
Once HPA is enabled for the above services, when the system is idle or the load is not high enough to trigger additional replicas beyond the minReplicas setting, the total CPU and memory request amounts are reduced by 33%-35% based on real-time monitoring. This frees up approximately 22.5 vCPUs and 30Gi of memory for other workloads as needed.
The workload response times with HPA enabled are not negatively impacted and are slightly better than when HPA is disabled based on our monitoring data.
The graph below uses the user management microservice as an example to showcase HPA in action.
Here is a subset of the related HPA policies:
A few key observations from this graph:
This demonstrates that HPA works efficiently, responsively, and smoothly as expected.
Other Guidance and Best Practices
While planning to use HPA, the following should be kept in mind:
- For any service pods with custom tuning, as mentioned in the section 'How the averageUtilization is set' above, the out-of-the-box HPA averageUtilization may no longer work as intended. This setting needs to be custom tuned accordingly.
Conclusion
By leveraging the Horizontal Pod Autoscaler (HPA) for services running on SWH in OpenShift, organizations can adapt their resource consumption dynamically, ensuring high performance, reliability, and significant cost savings. As our examples demonstrate, HPA delivers substantial infrastructure benefits by automatically adjusting resources based on actual workload demand, maintaining excellent performance while significantly reducing resource consumption during normal operations.