Turbonomic support for GPU optimization on containers reaches general availability

By Paul Carley posted 5 days ago

For GenAI large language model (LLM) inference workloads that use GPU resources and are deployed in a Kubernetes cluster, Turbonomic can now generate workload controller scale actions to maintain Service Level Objectives (SLOs).

With the release of IBM Turbonomic version 8.12.6, customers can leverage GPU-specific metrics to scale inference workloads out and in, meeting application demand while maximizing GPU utilization. This is important for customers building their AI platforms on containers, because Generative AI (gen AI) and LLM workloads require immense GPU processing power to operate at efficient levels of performance. Turbonomic is engineered to optimize GPU resources so that gen AI workloads meet performance standards while addressing efficiency in resource optimization and cost.

For customers running containers on GPUs, this new capability helps avoid performance issues caused by high GPU utilization, and it saves time and reduces waste by maximizing application-to-infrastructure density.
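To make the idea concrete, here is a minimal sketch of the kind of GPU-driven scale-out/scale-in decision described above. It is illustrative only, not Turbonomic's implementation: the Prometheus endpoint, namespace, deployment name, and 80% utilization target are all hypothetical, and the metric query assumes the NVIDIA DCGM exporter is publishing GPU utilization to Prometheus.

```python
# Illustrative sketch of a GPU-metric-driven horizontal scale decision for an
# LLM inference Deployment. Turbonomic automates this kind of action; this is
# NOT its actual mechanism. All endpoints and names below are hypothetical.
import requests
from kubernetes import client, config

PROM_URL = "http://prometheus.example:9090"  # hypothetical Prometheus endpoint
NAMESPACE = "inference"                      # hypothetical namespace
DEPLOYMENT = "llm-inference"                 # hypothetical workload controller
GPU_UTIL_TARGET = 0.80                       # assumed SLO-driven utilization target

def average_gpu_utilization() -> float:
    """Return average GPU utilization (0.0-1.0) for the namespace.
    DCGM_FI_DEV_GPU_UTIL is the DCGM exporter's utilization gauge (0-100)."""
    query = f'avg(DCGM_FI_DEV_GPU_UTIL{{namespace="{NAMESPACE}"}})'
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query})
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) / 100.0 if result else 0.0

def scale_decision() -> None:
    """Scale the inference Deployment out or in based on GPU utilization."""
    config.load_kube_config()
    apps = client.AppsV1Api()
    dep = apps.read_namespaced_deployment(DEPLOYMENT, NAMESPACE)
    replicas = dep.spec.replicas or 1
    util = average_gpu_utilization()
    if util > GPU_UTIL_TARGET:
        replicas += 1  # scale out: demand is pushing GPUs past the target
    elif util < GPU_UTIL_TARGET / 2 and replicas > 1:
        replicas -= 1  # scale in: reclaim underutilized GPU capacity
    apps.patch_namespaced_deployment_scale(
        DEPLOYMENT, NAMESPACE, {"spec": {"replicas": replicas}}
    )

if __name__ == "__main__":
    scale_decision()
```

In practice, Turbonomic generates these workload controller scale actions from its own analysis rather than a fixed threshold loop like the one sketched here.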


For more details, see the Scale Actions for GenAI LLM Inference Workloads documentation, or read the latest blog on how IBM has leveraged this capability directly for watsonx. To see it in action, check with your IBM representative or visit IBM.com.
