Db2 for z/OS and its ecosystem


Unlocking enterprise-scale gen AI inferencing: Announcing GA of IBM AI Optimizer for Z 2.1

By Animol Nair posted 2 days ago


Accelerate gen AI on IBM Z and optimize inference for maximum performance, efficiency and security.

IBM AI Optimizer for Z 2.1 is now generally available, delivering high-performance, low-latency gen AI inferencing on IBM Z powered by the IBM Spyre™ Accelerator. This release represents the first step in a continuous delivery roadmap, with additional capabilities and optimizations planned for rollout over the next several quarters.

Optimizing gen AI workloads for greater performance

AI Optimizer for Z harnesses the power of the IBM Spyre™ Accelerator (Spyre) to deliver high-performance, low-latency inferencing for AI models. By combining Spyre with advanced capabilities like KV caching and real-time monitoring, it enables enterprises to optimize gen AI workloads across infrastructures with unmatched efficiency, scalability and security.

Key capabilities of AI Optimizer for Z 2.1 include:

1. Real-time monitoring and observability:

AI Optimizer for Z provides advanced real-time monitoring for gen AI workloads using Prometheus for metric collection and Grafana for intuitive visualization. It tracks key metrics such as token throughput, latency per request, cache hit ratio, time-to-first-token and memory utilization, with hardware usage metrics, such as GPU/accelerator utilization, planned for a future release.

AI Optimizer can integrate with the OpenTelemetry (OTel) collector when it is configured with Prometheus receivers. This enables seamless telemetry ingestion and interoperability for unified observability across hybrid environments. These insights empower organizations to make informed decisions on capacity planning, workload routing, performance monitoring and infrastructure optimization—helping avoid over-provisioning, reduce costs and improve overall performance.
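To make the metrics above concrete, here is a minimal, illustrative Python sketch of the kind of per-request bookkeeping such monitoring implies (token throughput, time-to-first-token, cache hit ratio). This is not the product's implementation; all class and method names are assumptions, and a real deployment would export these values through a Prometheus client library for scraping and Grafana dashboards.

```python
import time


class InferenceMetrics:
    """Illustrative tracker for the metrics named above: token throughput,
    time-to-first-token (TTFT) and KV-cache hit ratio. A real deployment
    would expose these as Prometheus counters and histograms."""

    def __init__(self):
        self.tokens_generated = 0
        self.cache_hits = 0
        self.cache_lookups = 0
        self.ttft_samples = []   # seconds from request start to first token
        self._start = time.monotonic()

    def record_first_token(self, request_start: float) -> None:
        """Call when the first token of a response is emitted."""
        self.ttft_samples.append(time.monotonic() - request_start)

    def record_tokens(self, n: int) -> None:
        self.tokens_generated += n

    def record_cache_lookup(self, hit: bool) -> None:
        self.cache_lookups += 1
        if hit:
            self.cache_hits += 1

    @property
    def cache_hit_ratio(self) -> float:
        return self.cache_hits / self.cache_lookups if self.cache_lookups else 0.0

    @property
    def tokens_per_second(self) -> float:
        elapsed = time.monotonic() - self._start
        return self.tokens_generated / elapsed if elapsed > 0 else 0.0
```

Exporting counters in this shape is what lets an OTel collector with a Prometheus receiver ingest them alongside telemetry from the rest of a hybrid estate.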

2. Multi-level caching:

In a staged delivery plan, AI Optimizer for Z will introduce multi-level caching to accelerate gen AI inferencing.

At the first level, KV caching reuses previously computed token sequences within a single Large Language Model (LLM) deployment, reducing time-to-first-token and improving throughput.

At the second level, extended caching shares these computations across multiple LLM deployments, enabling even greater efficiency for large-scale workloads. This capability translates into significant business value by lowering infrastructure costs, improving response times for customer-facing applications and enabling enterprises to scale AI services without over-provisioning resources.
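The two levels described above can be sketched as a prefix-keyed cache with a private tier per deployment and a shared tier across deployments. This is a conceptual model only, not the product's design: the class, the dict-based storage and the promotion-on-hit behavior are all illustrative assumptions.

```python
class TwoLevelKVCache:
    """Illustrative two-level cache for computed token-prefix state.
    Level 1 (local) is private to one LLM deployment; level 2 (shared)
    is visible across deployments, mirroring the staged plan above."""

    def __init__(self, shared: dict):
        self.local = {}        # level 1: per-deployment KV cache
        self.shared = shared   # level 2: cache shared across deployments

    def lookup(self, prefix: tuple):
        """Return cached state for a token prefix, checking local then shared."""
        if prefix in self.local:
            return self.local[prefix]
        if prefix in self.shared:
            # Promote a shared hit into the local tier for faster reuse.
            self.local[prefix] = self.shared[prefix]
            return self.local[prefix]
        return None

    def store(self, prefix: tuple, kv_state) -> None:
        self.local[prefix] = kv_state
        self.shared[prefix] = kv_state


# Two deployments sharing one level-2 cache:
shared = {}
dep_a = TwoLevelKVCache(shared)
dep_b = TwoLevelKVCache(shared)
dep_a.store(("The", "quick"), kv_state="<computed attention state>")
# dep_b never computed this prefix locally but finds it in the shared tier:
hit = dep_b.lookup(("The", "quick"))
```

The point of the second tier is exactly what the usage shows: work done once by one deployment becomes a cache hit for every other deployment serving the same prompts.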

3. Inferencing optimization:

AI Optimizer for Z enables flexible tagging of LLMs, allowing users to group models by application, business use case, or performance requirements. These tags can be applied to inferencing requests, ensuring intelligent routing and optimized resource utilization across multiple deployments.

Additionally, the solution supports registration of external LLMs running outside IBM Z or LinuxONE, integrating them into the same tagging and routing framework for unified optimization. This capability provides enterprises with greater control and agility, enabling consistent performance and cost efficiency across hybrid AI environments.
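A tag-and-route scheme of this kind can be sketched as follows. The sketch is an assumption, not the product's API: the `TagRouter` class, the round-robin policy and the endpoint names are all hypothetical, chosen only to show how tagged deployments (including externally registered LLMs) can share one routing framework.

```python
import itertools
from collections import defaultdict


class TagRouter:
    """Illustrative tag-based router: deployments register under tags
    (application, use case, performance tier), and requests carrying a
    tag are round-robined across the matching deployments."""

    def __init__(self):
        self.deployments = defaultdict(list)   # tag -> list of endpoints
        self._cycles = {}                      # tag -> round-robin iterator

    def register(self, endpoint: str, tags: list) -> None:
        for tag in tags:
            self.deployments[tag].append(endpoint)
            # Rebuild the round-robin cycle to include the new endpoint.
            self._cycles[tag] = itertools.cycle(self.deployments[tag])

    def route(self, tag: str) -> str:
        """Pick the next deployment for a tagged inferencing request."""
        if tag not in self._cycles:
            raise KeyError(f"no deployment registered for tag {tag!r}")
        return next(self._cycles[tag])


router = TagRouter()
# An on-platform model and an external LLM share the same framework:
router.register("granite-on-z:8080", tags=["fraud-detection", "low-latency"])
router.register("external-llm:9090", tags=["fraud-detection"])
endpoint = router.route("fraud-detection")
```

Routing by tag rather than by model name is what lets operators rebalance work between on-platform and external deployments without changing the applications issuing the requests.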

Why this matters: Enterprise AI without compromise

Organizations operating on IBM Z face unique constraints such as data residency, privacy mandates, low-latency requirements, and mission-critical reliability. AI Optimizer for Z 2.1 enables gen AI adoption without requiring workload movement or architectural risk, delivering immediate value across industries like banking, insurance, manufacturing, and the public sector. With a continuous delivery approach, enterprises can expect ongoing enhancements that further strengthen performance, scalability and security.

A planned, strategic path forward

The GA of AI Optimizer for Z 2.1 marks the beginning of a broader roadmap. Core inferencing and observability capabilities are available now, and additional optimizations will be delivered through incremental releases over the next several quarters, ensuring customers can continuously benefit from innovation without disruption.

Learn more: https://www.ibm.com/products/ai-optimizer-for-z

Register here for the deep-dive webinar.
