As generative AI continues to transform the AI landscape, the demand for scalable, efficient, and interoperable model serving infrastructure is growing rapidly. This session explores the evolution from early, custom-built deployment patterns to today’s Kubernetes-native model serving solutions. We'll unpack the unique challenges of serving large language models (LLMs), including inference efficiency, distributed execution, KV-cache management, and cost optimization. We’re also excited to announce the release of KServe v0.17, a significant milestone that brings native support for generative AI workloads, including a purpose-built LLMInferenceService CRD designed for advanced LLM-serving patterns such as disaggregated serving, enhanced model and KV caching, and seamless integration with Envoy AI Gateway.
Session Topic: Open Source
Industry: Cross Industry
Speaker(s): Yuan Tang