Set Max Node Utilization: Custom Scaling for Standard & GPU Kubernetes Nodes
Kubernetes environments are dynamic, but your scaling strategy shouldn’t be one-size-fits-all. Turbonomic 8.17.1 introduces Set Max Node Utilization, a capability that gives you full control over when node provision actions are generated and how your Kubernetes nodes scale. Whether you’re optimizing for performance, driving cost efficiency, or managing AI workloads with GPU resources, this feature delivers the precision and flexibility your teams demand.
What is Set Max Node Utilization?
Set Max Node Utilization allows DevOps engineers and platform teams to override Turbonomic’s default scaling constraint thresholds and set custom utilization targets that align with specific operational requirements. By default, Turbonomic uses fixed thresholds of 70% for limits and 90% for requests. Now, you can set your own values for:
Standard Kubernetes Nodes
- Node Utilization vCPU Limit
- Node Utilization vMem Limit
- Node Utilization vCPU Request
- Node Utilization vMem Request

GPU Kubernetes Nodes
- GPU Node GPU Utilization
- GPU Node Memory Utilization

These thresholds are configured when creating a new automation or default virtual machine policy under operational constraint settings and can be applied to specific node pools, clusters, machine sets, or custom VM groups. You can use all settings together for comprehensive control, or apply only the ones that matter most to your workloads.
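To make the effect of a threshold concrete, here is a minimal Python sketch (illustrative only, not Turbonomic configuration syntax or product code) of how a max-utilization percentage translates into usable capacity on a node. The 70% and 90% defaults come from the description above; the 60% override is a hypothetical example.

```python
# Illustrative sketch only; not Turbonomic configuration syntax or product code.
# Default thresholds described above: 70% for limits, 90% for requests.
DEFAULT_THRESHOLDS = {
    "node_vcpu_limit": 0.70,
    "node_vmem_limit": 0.70,
    "node_vcpu_request": 0.90,
    "node_vmem_request": 0.90,
}

def usable_capacity(node_capacity: float, max_utilization: float) -> float:
    """Capacity a node can absorb before its max-utilization threshold is crossed."""
    return node_capacity * max_utilization

# Hypothetical override: a 100-core node with a custom 60% vCPU limit threshold
# offers 60 cores of headroom before a provision action would be considered.
print(usable_capacity(100, 0.60))  # 60.0
```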
Why Custom Thresholds Transform Your Operations

Performance-First Scaling
For mission-critical applications where latency is non-negotiable, default thresholds may trigger scaling too late. By setting more conservative targets—such as 60% vCPU utilization instead of 70%—you ensure adequate performance headroom before congestion occurs. This proactive approach prevents slowdowns in production environments where every millisecond matters.
Cost-Optimized Resource Management
When efficiency is the priority, you can maximize node utilization before scaling occurs. Setting higher thresholds—like 90% vCPU utilization—ensures you extract maximum value from existing infrastructure before provisioning additional nodes. This approach is particularly valuable for expensive resources like GPU nodes, where optimal utilization directly impacts your bottom line.
Tailored Risk Management
Different workloads have different risk profiles. Set Max Node Utilization allows you to customize scaling behavior based on your risk tolerance. Production clusters might use conservative 50% thresholds for guaranteed performance, while development environments could operate at 85% utilization for cost savings.
AI Workload Optimization
GPU-intensive AI workloads, such as inference tasks, require specialized resource management. Custom GPU and GPU memory utilization thresholds ensure your AI applications receive the computational resources they need exactly when they need them, preventing bottlenecks that could impact model performance or training times.
Example Scenario Walkthrough
Consider how Set Max Node Utilization transforms resource management in practice:
vCPU Constraint Scenario: A Kubernetes node running a latency-sensitive application has a capacity of 100 vCPU cores, with current usage at 85 cores. The team sets a conservative 30% target utilization to ensure optimal performance.
The vCPU Calculation:
- Total usage to distribute: 85 cores
- Target usage per node: 30 cores maximum (30% of 100-core capacity)
- Required nodes: 85 ÷ 30 = 2.83, rounded up to 3 nodes
Scaling action:
- Current nodes: 1
- Needed nodes: 3
- Nodes to provision: 2 additional nodes

This proactive scaling ensures the application maintains peak performance while preventing CPU resource contention.
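The same arithmetic can be expressed as a simple ceiling division. The sketch below uses a hypothetical helper, not product code, to reproduce the vCPU scenario above:

```python
import math

def nodes_needed(total_usage: float, node_capacity: float, target_utilization: float) -> int:
    """Minimum node count so that per-node usage stays at or below the target."""
    per_node_cap = node_capacity * target_utilization
    return math.ceil(total_usage / per_node_cap)

# vCPU scenario: 85 cores of usage, 100-core nodes, 30% target utilization.
needed = nodes_needed(total_usage=85, node_capacity=100, target_utilization=0.30)
print(needed)      # 3 nodes needed
print(needed - 1)  # 2 additional nodes to provision beyond the current single node
```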
vMem Constraint Scenario: A memory-intensive analytics workload is running on a node with 256GB memory capacity and current usage at 230GB (90% memory utilization). The team sets a 60% target utilization to prevent memory pressure and maintain consistent performance.
vMem Scaling Calculation:
- Total usage to distribute: 230GB
- Target usage per node: 153.6GB maximum (60% of 256GB capacity)
- Required nodes: 230 ÷ 153.6 ≈ 1.5, rounded up to 2 nodes
Scaling action:
- Current nodes: 1
- Needed nodes: 2
- Nodes to provision: 1 additional node

This prevents memory pressure and ensures the analytics workload has sufficient memory headroom for optimal processing.
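The vMem figures drop into the same hypothetical nodes_needed helper from the previous sketch:

```python
# vMem scenario: 230GB of usage, 256GB nodes, 60% target utilization.
needed = nodes_needed(total_usage=230, node_capacity=256, target_utilization=0.60)
print(needed)  # 2 nodes needed, so 1 additional node beyond the current one
```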
GPU Constraint Scenario: An AI inference workload is running on a GPU node with a capacity of 8 GPU cores and 32GB of GPU memory. Current usage shows 7.2 GPU cores (90% GPU utilization) and 25.6GB of GPU memory (80% GPU memory utilization). The team sets conservative targets of 50% for both GPU utilization and GPU memory utilization to ensure consistent inference performance.
GPU Scaling Calculation:
- Current GPU usage: 7.2 cores; target per node: 4 cores (50% of 8-core capacity)
- Current GPU memory usage: 25.6GB; target per node: 16GB (50% of 32GB capacity)
- Required nodes for GPU cores: 7.2 ÷ 4 = 1.8, rounded up to 2 nodes
- Required nodes for GPU memory: 25.6 ÷ 16 = 1.6, rounded up to 2 nodes
Scaling action:
- Current nodes: 1
- Needed nodes: 2
- Nodes to provision: 1 additional GPU node

This ensures AI workloads maintain optimal performance with sufficient GPU resources for consistent inference times.
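When both GPU constraints are configured, the scenario above suggests the node count is driven by whichever constraint demands more nodes (both resolve to 2 here). Reusing the hypothetical nodes_needed helper:

```python
# GPU scenario: 50% targets for both GPU cores and GPU memory.
by_gpu_cores = nodes_needed(total_usage=7.2, node_capacity=8, target_utilization=0.50)    # 2
by_gpu_memory = nodes_needed(total_usage=25.6, node_capacity=32, target_utilization=0.50)  # 2
needed = max(by_gpu_cores, by_gpu_memory)
print(needed)  # 2 nodes needed, so 1 additional GPU node beyond the current one
```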
Precision Where It Matters Most: GPUs
For organizations running AI workloads, GPU resource availability directly impacts response time and throughput. With Set Max Node Utilization, you can set GPU core and memory utilization thresholds to:
- Prevent performance bottlenecks.
- Avoid over-provisioning expensive GPU infrastructure.
- Keep inference and training workloads running smoothly.

Get Started Today 🚀
1. Upgrade to Turbonomic 8.17.1 to enable Set Max Node Utilization.
2. Review Documentation:
   - Container Node Policies
   - Container Node Provisioning
3. Define Your Targets: Choose vCPU, vMem, and GPU values based on performance, cost, and risk tolerance.
4. Try It Free: Sign up for a 30-day trial via Sign-Up for a Free Turbonomic Trial 🔑
5. Share Feedback: Help shape future scaling automation by submitting your ideas. Submit Idea