llm-d is a Kubernetes-native, high-performance distributed LLM inference framework: a well-lit path for anyone to serve LLMs at scale, with fast time-to-value and competitive performance per dollar for most models across most hardware accelerators.
With llm-d, users can operationalize generative AI deployments through a modular, high-performance, end-to-end serving solution. It leverages the latest distributed inference optimizations, such as KV-cache-aware routing and disaggregated serving, and is co-designed and integrated with the Kubernetes operational tooling in the Inference Gateway (IGW).
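For a flavor of what KV-cache-aware routing means, here is a minimal, illustrative Python sketch, not llm-d's actual implementation: it routes a request to the replica whose prefix cache overlaps most with the prompt, with a simple queue-depth penalty. The replica names, block size, and scoring weight are hypothetical.

```python
# Illustrative sketch only, not llm-d's implementation: a toy KV-cache-aware
# router that prefers the replica already holding most of a prompt's prefix
# cache. All names and weights below are hypothetical.
import hashlib
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    queue_depth: int                                   # pending requests (load signal)
    cached_prefixes: set = field(default_factory=set)  # hashes of cached prompt blocks

def prefix_hashes(prompt: str, block: int = 16) -> set:
    # Hash successively longer block-aligned prefixes, mimicking block-level
    # prefix caching in the inference engine.
    return {
        hashlib.sha1(prompt[: i + block].encode()).hexdigest()
        for i in range(0, len(prompt), block)
    }

def pick_replica(prompt: str, replicas: list) -> Replica:
    # Score = cache overlap (warm KV blocks) minus a hypothetical load penalty.
    hashes = prefix_hashes(prompt)
    def score(r: Replica) -> float:
        return len(hashes & r.cached_prefixes) - 0.5 * r.queue_depth
    return max(replicas, key=score)

# Two requests sharing a system prompt land on the replica that served the first.
shared = "You are a helpful assistant. "
pod_a = Replica("pod-a", queue_depth=1, cached_prefixes=prefix_hashes(shared))
pod_b = Replica("pod-b", queue_depth=0)
print(pick_replica(shared + "Summarize this document.", [pod_a, pod_b]).name)  # -> pod-a
```

In a real deployment this decision is made by the gateway layer rather than in application code; the sketch only shows why routing on cache state, not load alone, saves prefill work.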
Join us and learn more.
Session Topic: Open Source
Industry: Cross Industry
Speaker(s): Carlos Costa