IBM TechXchange Community Group Library

#Other
#TechXchangePresenter


4324 - Learn how LLM inference goes distributed with llm-d 


llm-d is a Kubernetes-native, high-performance distributed LLM inference framework: a well-lit path for anyone to serve at scale, with fast time-to-value and competitive performance per dollar for most models across most hardware accelerators.

 

With llm-d, users can operationalize generative AI deployments through a modular, high-performance, end-to-end serving solution that leverages the latest distributed inference optimizations, such as KV-cache-aware routing and disaggregated serving, co-designed and integrated with the Kubernetes operational tooling in the Inference Gateway (IGW).
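
At a high level, KV-cache-aware routing means the scheduler prefers the replica that already holds the longest cached prefix of an incoming prompt, so prefill work can be reused instead of recomputed. The sketch below is a minimal conceptual illustration of that scoring idea in Python; the block size, field names, and tie-breaking on queue depth are assumptions for illustration, not llm-d's or IGW's actual implementation.

```python
# Conceptual sketch only: score replicas by cached prompt-prefix overlap.
# Block size, field names, and tie-breaking are illustrative assumptions,
# not llm-d's actual scheduler.
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    queue_depth: int
    # Prompt blocks this replica is assumed to hold in its KV cache.
    cached_blocks: set[str] = field(default_factory=set)


def prefix_blocks(prompt: str, block_size: int = 16) -> list[str]:
    """Split a prompt into fixed-size blocks used as cache keys."""
    return [prompt[i:i + block_size] for i in range(0, len(prompt), block_size)]


def pick_replica(prompt: str, replicas: list[Replica]) -> Replica:
    """Prefer the longest contiguous cached prefix, then the shortest queue."""
    def score(replica: Replica) -> tuple[int, int]:
        hits = 0
        for block in prefix_blocks(prompt):
            if block not in replica.cached_blocks:
                break  # only a contiguous prefix of blocks can be reused
            hits += 1
        return (hits, -replica.queue_depth)

    return max(replicas, key=score)


if __name__ == "__main__":
    system_prompt = "You are a helpful assistant."
    pool = [
        Replica("pod-a", queue_depth=3,
                cached_blocks=set(prefix_blocks(system_prompt))),
        Replica("pod-b", queue_depth=1),
    ]
    # pod-a wins: it already caches the shared system-prompt prefix.
    print(pick_replica(system_prompt + " Summarize this document.", pool).name)
```

Breaking ties on queue depth reflects the usual trade-off between cache locality and load balance; a production scheduler would weigh these and other signals rather than apply a strict lexicographic preference.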

 

Join us and learn more.

Session Topic: Open Source
Industry: Cross Industry
Speaker(s): Carlos Costa

Attachment(s)
4324 - Learn how LLM inference goes distributed with llm-d.pdf (PDF, 1.74 MB), uploaded Fri, October 24, 2025