For GenAI large language model (LLM) inference workloads that run on GPU resources in a Kubernetes cluster, Turbonomic can now generate workload controller horizontal scale actions that maintain Service Level Objectives (SLOs) simultaneously across five critical performance indicators: Concurrent Queries, Queueing Time, Service Time, Response Time, and Transactions.
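To make this concrete, here is a minimal sketch, in Python, of the kind of evaluation an SLO-driven horizontal scaler performs across these five indicators. The target values and the scale-by-one heuristic are illustrative assumptions only, not Turbonomic's actual algorithm; Turbonomic analyzes all five indicators together and generates the scale actions for you.

```python
# Illustrative only: the thresholds and the scale-by-one heuristic are
# assumptions for this sketch, not Turbonomic's actual decision logic.
from dataclasses import dataclass

@dataclass
class SLOTargets:
    concurrent_queries: float   # max in-flight queries per replica
    queueing_time_ms: float     # max time a request may wait in queue
    service_time_ms: float      # max model execution time per request
    response_time_ms: float     # max end-to-end latency
    transactions_per_s: float   # max sustainable throughput per replica

def desired_replicas(observed: dict, targets: SLOTargets, current: int) -> int:
    """Scale out if any indicator breaches its SLO; scale in when all have headroom."""
    ratios = [
        observed["concurrent_queries"] / targets.concurrent_queries,
        observed["queueing_time_ms"] / targets.queueing_time_ms,
        observed["service_time_ms"] / targets.service_time_ms,
        observed["response_time_ms"] / targets.response_time_ms,
        observed["transactions_per_s"] / targets.transactions_per_s,
    ]
    worst = max(ratios)              # the most pressured indicator drives the action
    if worst > 1.0:                  # at least one SLO is breached: scale out
        return current + 1
    if worst < 0.5 and current > 1:  # generous headroom everywhere: scale in
        return current - 1
    return current
```

In practice you never call anything like this yourself: Turbonomic observes the metrics continuously and emits the workload controller scale actions on its own.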
With the release of IBM Turbonomic version 8.12.6, customers can use key performance metrics from Text Generation Inference (TGI) to scale inference replicas out and in as application demand changes, maximizing throughput and improving response time. This matters for teams building Generative AI (gen AI) and LLM workloads on container platforms, where immense GPU processing power must be kept at efficient levels of utilization. Turbonomic is engineered to keep gen AI workloads within performance standards while driving GPU utilization toward the right balance of efficiency and cost.
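Under the hood, these indicators come from the Prometheus-format metrics that TGI serves on its /metrics endpoint. The sketch below shows one way to read two of them; the service address and the specific metric names (tgi_queue_size, tgi_request_queue_duration) are assumptions to verify against your TGI version, and with Turbonomic in place this collection happens automatically.

```python
# Sketch of scraping a TGI /metrics endpoint. The URL and metric names are
# assumptions for illustration; confirm them against your TGI deployment.
import urllib.request

TGI_METRICS_URL = "http://localhost:8080/metrics"  # assumed TGI service address

def scrape_tgi_metrics(url: str = TGI_METRICS_URL) -> dict[str, float]:
    """Parse Prometheus exposition text into a {metric_name: value} dict."""
    samples: dict[str, float] = {}
    with urllib.request.urlopen(url) as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith("#") or not line.strip():
                continue  # skip HELP/TYPE comments and blank lines
            name, _, value = line.rpartition(" ")
            samples[name] = float(value)
    return samples

if __name__ == "__main__":
    metrics = scrape_tgi_metrics()
    queue_depth = metrics.get("tgi_queue_size", 0.0)
    # A histogram's _sum / _count gives the mean queueing time per request.
    total = metrics.get("tgi_request_queue_duration_sum", 0.0)
    count = metrics.get("tgi_request_queue_duration_count", 0.0)
    mean_queue_s = total / count if count else 0.0
    print(f"queue depth={queue_depth:.0f}, mean queue time={mean_queue_s:.3f}s")
```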
For additional details, see the Scale Actions for GenAI LLM Inference Workloads documentation, or read the latest blog on how IBM has applied this capability to watsonx. To see it in action, contact your IBM representative or visit IBM.com.