Set Max Node Utilization: Custom Scaling for Standard & GPU Kubernetes Nodes
Kubernetes environments are dynamic, but your scaling strategy shouldn’t be one-size-fits-all. Turbonomic 8.17.1 introduces Set Max Node Utilization, a capability that gives you full control over when node provision actions are generated and how your Kubernetes nodes scale. Whether you’re optimizing for performance, driving cost efficiency, or managing AI workloads with GPU resources, this feature delivers the precision and flexibility your teams demand.
What is Set Max Node Utilization?
Set Max Node Utilization allows DevOps engineers and platform teams to override Turbonomic’s default scaling constraint thresholds and set custom utilization targets that align with specific operational requirements. By default, Turbonomic uses fixed thresholds of 70% for limits and 90% for requests. Now, you can set your own values for:
Standard Kubernetes Nodes
- Node Utilization vCPU Limit
- Node Utilization vMem Limit
- Node Utilization vCPU Request
- Node Utilization vMem Request

GPU Kubernetes Nodes
- GPU Node GPU Utilization
- GPU Node Memory Utilization

These thresholds are configured when creating a new automation or default virtual machine policy under operational constraint settings and can be applied to specific node pools, clusters, machine sets, or custom VM groups. You can use all settings together for comprehensive control, or apply only the ones that matter most to your workloads.
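To make the effect of a threshold concrete, here is a minimal Python sketch (illustrative only, not Turbonomic configuration syntax or product code) of how a max-utilization percentage translates into usable capacity on a node. The 70% and 90% defaults come from the description above; the 60% override is a hypothetical example.

```python
# Illustrative sketch only; not Turbonomic configuration syntax or product code.
# Default thresholds described above: 70% for limits, 90% for requests.
DEFAULT_THRESHOLDS = {
    "node_vcpu_limit": 0.70,
    "node_vmem_limit": 0.70,
    "node_vcpu_request": 0.90,
    "node_vmem_request": 0.90,
}

def usable_capacity(node_capacity: float, max_utilization: float) -> float:
    """Capacity a node can absorb before its max-utilization threshold is crossed."""
    return node_capacity * max_utilization

# Hypothetical override: a 100-core node with a custom 60% vCPU limit threshold
# offers 60 cores of headroom before a provision action would be considered.
print(usable_capacity(100, 0.60))  # 60.0
```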
Why Custom Thresholds Transform Your Operations

Performance-First Scaling
For mission-critical applications where latency is non-negotiable, default thresholds may trigger scaling too late. By setting more conservative targets—such as 60% vCPU utilization instead of 70%—you ensure adequate performance headroom before congestion occurs. This proactive approach prevents slowdowns in production environments where every millisecond matters.
Cost-Optimized Resource Management
When efficiency is the priority, you can maximize node utilization before scaling occurs. Setting higher thresholds—like 90% vCPU utilization—ensures you extract maximum value from existing infrastructure before provisioning additional nodes. This approach is particularly valuable for expensive resources like GPU nodes, where optimal utilization directly impacts your bottom line.
Tailored Risk Management
Different workloads have different risk profiles. Set Max Node Utilization allows you to customize scaling behavior based on your risk tolerance. Production clusters might use conservative 50% thresholds for guaranteed performance, while development environments could operate at 85% utilization for cost savings.
AI Workload Optimization
GPU-intensive AI workloads, such as inference tasks, require specialized resource management. Custom GPU and GPU memory utilization thresholds ensure your AI applications receive the computational resources they need exactly when they need them, preventing bottlenecks that could impact model performance or training times.
Example Scenario Walkthrough
Consider how Set Max Node Utilization transforms resource management in practice:
vCPU Constraint Scenario: A Kubernetes node running a latency-sensitive application has a capacity of 100 vCPU cores, with current usage at 85 cores. The team sets a conservative 30% target utilization to ensure optimal performance.
The vCPU Calculation:
- Total usage to distribute: 85 cores
- Target usage per node: 30 cores maximum (30% of 100-core capacity)
- Required nodes: 85 ÷ 30 = 2.83, rounded up to 3 nodes
Scaling action:
- Current nodes: 1
- Needed nodes: 3
- Nodes to provision: 2 additional nodes

This proactive scaling ensures the application maintains peak performance while preventing CPU resource contention.
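The same arithmetic can be expressed as a simple ceiling division. The sketch below uses a hypothetical helper, not product code, to reproduce the vCPU scenario above:

```python
import math

def nodes_needed(total_usage: float, node_capacity: float, target_utilization: float) -> int:
    """Minimum node count so that per-node usage stays at or below the target."""
    per_node_cap = node_capacity * target_utilization
    return math.ceil(total_usage / per_node_cap)

# vCPU scenario: 85 cores of usage, 100-core nodes, 30% target utilization.
needed = nodes_needed(total_usage=85, node_capacity=100, target_utilization=0.30)
print(needed)      # 3 nodes needed
print(needed - 1)  # 2 additional nodes to provision beyond the current single node
```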
vMem Constraint Scenario: A memory-intensive analytics workload is running on a node with 256GB memory capacity and current usage at 230GB (90% memory utilization). The team sets a 60% target utilization to prevent memory pressure and maintain consistent performance.
vMem Scaling Calculation:
- Total usage to distribute: 230GB
- Target usage per node: 153.6GB maximum (60% of 256GB capacity)
- Required nodes: 230 ÷ 153.6 ≈ 1.5, rounded up to 2 nodes
Scaling action:
- Current nodes: 1
- Needed nodes: 2
- Nodes to provision: 1 additional node

This prevents memory pressure and ensures the analytics workload has sufficient memory headroom for optimal processing.
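The vMem figures drop into the same hypothetical nodes_needed helper from the previous sketch:

```python
# vMem scenario: 230GB of usage, 256GB nodes, 60% target utilization.
needed = nodes_needed(total_usage=230, node_capacity=256, target_utilization=0.60)
print(needed)  # 2 nodes needed, so 1 additional node beyond the current one
```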
GPU Constraint Scenario: An AI inference workload is running on a GPU node with a capacity of 8 GPU cores and 32GB of GPU memory. Current usage shows 7.2 GPU cores (90% GPU utilization) and 25.6GB of GPU memory (80% GPU memory utilization). The team sets conservative targets of 50% for both GPU utilization and GPU memory utilization to ensure consistent inference performance.
GPU Scaling Calculation:
- Current GPU usage: 7.2 cores; target per node: 4 cores (50% of 8-core capacity)
- Current GPU memory usage: 25.6GB; target per node: 16GB (50% of 32GB capacity)
- Required nodes for GPU cores: 7.2 ÷ 4 = 1.8, rounded up to 2 nodes
- Required nodes for GPU memory: 25.6 ÷ 16 = 1.6, rounded up to 2 nodes
Scaling action:
- Current nodes: 1
- Needed nodes: 2
- Nodes to provision: 1 additional GPU node

This ensures AI workloads maintain optimal performance with sufficient GPU resources for consistent inference times.
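When both GPU constraints are configured, the scenario above suggests the node count is driven by whichever constraint demands more nodes (both resolve to 2 here). Reusing the hypothetical nodes_needed helper:

```python
# GPU scenario: 50% targets for both GPU cores and GPU memory.
by_gpu_cores = nodes_needed(total_usage=7.2, node_capacity=8, target_utilization=0.50)    # 2
by_gpu_memory = nodes_needed(total_usage=25.6, node_capacity=32, target_utilization=0.50)  # 2
needed = max(by_gpu_cores, by_gpu_memory)
print(needed)  # 2 nodes needed, so 1 additional GPU node beyond the current one
```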
Precision Where It Matters Most: GPUs
For organizations running AI workloads, GPU resource availability directly impacts response time and throughput. With Set Max Node Utilization, you can set GPU core and memory utilization thresholds to:
- Prevent performance bottlenecks.
- Avoid over-provisioning expensive GPU infrastructure.
- Keep inference and training workloads running smoothly.

Get Started Today 🚀
1. Upgrade to Turbonomic 8.17.1 to enable Set Max Node Utilization.
2. Review Documentation:
   - Container Node Policies
   - Container Node Provisioning
3. Define Your Targets: Choose vCPU, vMem, and GPU values based on performance, cost, and risk tolerance.
4. Try It Free: Sign up for a 30-day trial via Sign-Up for a Free Turbonomic Trial 🔑
5. Share Feedback: Help shape future scaling automation by submitting your ideas. Submit Idea