IBM TechXchange Community Group Library

#Other
#TechXchangePresenter


4324 - Learn how LLM inference goes distributed with llm-d 


llm-d is a Kubernetes-native, high-performance distributed LLM inference framework: a well-lit path for anyone to serve at scale, with fast time-to-value and competitive performance per dollar for most models across most hardware accelerators.

 

With llm-d, users can operationalize generative AI deployments through a modular, high-performance, end-to-end serving solution that leverages the latest distributed inference optimizations, such as KV-cache-aware routing and disaggregated serving, co-designed and integrated with the Kubernetes operational tooling in the Inference Gateway (IGW).
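
At a high level, KV-cache-aware routing means the scheduler prefers the replica that already holds the longest cached prefix of an incoming prompt, so prefill work can be reused instead of recomputed. The sketch below is a minimal conceptual illustration of that scoring idea in Python; the block size, field names, and tie-breaking on queue depth are assumptions for illustration, not llm-d's or IGW's actual implementation.

```python
# Conceptual sketch only: score replicas by cached prompt-prefix overlap.
# Block size, field names, and tie-breaking are illustrative assumptions,
# not llm-d's actual scheduler.
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    queue_depth: int
    # Prompt blocks this replica is assumed to hold in its KV cache.
    cached_blocks: set[str] = field(default_factory=set)


def prefix_blocks(prompt: str, block_size: int = 16) -> list[str]:
    """Split a prompt into fixed-size blocks used as cache keys."""
    return [prompt[i:i + block_size] for i in range(0, len(prompt), block_size)]


def pick_replica(prompt: str, replicas: list[Replica]) -> Replica:
    """Prefer the longest contiguous cached prefix, then the shortest queue."""
    def score(replica: Replica) -> tuple[int, int]:
        hits = 0
        for block in prefix_blocks(prompt):
            if block not in replica.cached_blocks:
                break  # only a contiguous prefix of blocks can be reused
            hits += 1
        return (hits, -replica.queue_depth)

    return max(replicas, key=score)


if __name__ == "__main__":
    system_prompt = "You are a helpful assistant."
    pool = [
        Replica("pod-a", queue_depth=3,
                cached_blocks=set(prefix_blocks(system_prompt))),
        Replica("pod-b", queue_depth=1),
    ]
    # pod-a wins: it already caches the shared system-prompt prefix.
    print(pick_replica(system_prompt + " Summarize this document.", pool).name)
```

Breaking ties on queue depth reflects the usual trade-off between cache locality and load balance; a production scheduler would weigh these and other signals rather than apply a strict lexicographic preference.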

 

Join us and learn more.

Session Topic: Open Source
Industry: Cross Industry
Speaker(s): Carlos Costa

Attachment(s)
4324 - Learn how LLM inference goes distributed with llm-d.pdf (PDF, 1.74 MB), uploaded Fri, October 24, 2025