AI on IBM Z & IBM LinuxONE

Leverage AI on IBM Z & LinuxONE to enable real-time AI decisions at scale, accelerating your time-to-value while ensuring trust and compliance

Inside IBM z17®: Built for AI at Scale with Telum II®

By Joy Deng

It is an exciting time for AI on IBM Z as IBM z17® becomes generally available on June 18, 2025. Digital data sits at the core of every business, and this next-generation platform brings together highly customized hardware, software, and services offerings to quickly enable enterprise-level AI.

The new IBM z17 brings together innovation across the stack, building on IBM’s deep experience with enterprise workloads on z/OS and Linux on IBM Z. From a hardware perspective, innovations include the IBM Telum II® processor, which is designed to enable clients to leverage multiple AI models for real-time inferencing within a transaction. IBM z17 can process up to 450 billion inference operations per day with 1 ms response time.¹

Telum II unlocks broad support for encoder Large Language Models (LLMs), enabling both structured and unstructured data to be used in AI analysis. Combining encoder LLMs with predictive AI can improve AI predictive scoring accuracy, with benefits such as lowering risk, saving costs, growing the business, optimizing worker productivity, and increasing customer satisfaction and retention.
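To make this pattern concrete, here is a minimal sketch of the encoder-plus-predictive-AI strategy: an off-the-shelf encoder turns unstructured text into embeddings, which then feed a classical predictive model. The model name, features, and labels are illustrative placeholders, not part of any IBM offering; the sketch assumes Hugging Face transformers and scikit-learn are installed.

```python
# Minimal sketch: enrich a predictive model with encoder-LLM embeddings.
# The model name and data below are illustrative placeholders, not
# IBM-specific artifacts.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Mean-pool the encoder's last hidden state into one vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)          # ignore padding
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Unstructured text (e.g., payment memos) with fraud labels -- toy data.
texts = ["wire transfer to new offshore account", "monthly utility bill",
         "urgent gift card purchase x20", "grocery store purchase"]
labels = [1, 0, 1, 0]

clf = LogisticRegression().fit(embed(texts), labels)      # predictive model
print(clf.predict_proba(embed(["large overseas wire, first-time payee"])))
```

In production, the same shape of pipeline would use a domain-tuned encoder and your real transaction features rather than this toy data.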

From a software perspective, many offerings including the ones highlighted below have been optimized to take advantage of the Telum II chip. Let's explore this in further detail.

Telum II hardware features enable even greater throughput and AI acceleration

  • Developed using Samsung 5nm technology, Telum II includes eight high-performance cores running at 5.5GHz, boosting speed while efficiently managing large workloads.
  • Features a built-in low-latency data processing unit (DPU) for accelerated I/O.
  • Includes a 40% increase in on-chip cache capacity, with the virtual L3 and virtual L4 growing to 360MB and 2.88GB respectively, enabling fast transactions and real-time analytics.
  • Intelligent routing enhancements enable each AI accelerator to accept work from any core in the same drawer to improve the load balancing across all eight of those AI accelerators.
  • Support for INT8 as a data type has been added to increase AI compute capacity and efficiency for AI inference, enabling the deployment of newer, more efficient models (see the quantization sketch after this list).
  • Learn more about Telum II here
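To give a feel for what INT8 deployment looks like at the framework level, here is a minimal sketch using standard PyTorch dynamic quantization. This is generic PyTorch, not an IBM-specific API; whether a given model exercises the Telum II INT8 path depends on the optimized software stack described below.

```python
# Minimal sketch: converting a model's Linear layers to INT8 with standard
# PyTorch dynamic quantization. Generic PyTorch, not an IBM-specific API.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2)).eval()

# Weights are stored as INT8; activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
print(quantized(x))  # same interface, smaller and typically faster model
```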

Software Optimized for Telum II

To take advantage of the on-chip AI inferencing acceleration of Telum II in z17, IBM has enhanced its software to optimize for Telum II. These enhancements are included in Machine Learning for z/OS® 3.2 and in the frameworks of the AI Toolkit for IBM Z® and LinuxONE (TensorFlow, PyTorch, TensorFlow Serving, Triton Inference Server, Z Deep Learning Compiler, and Snap ML).

Machine Learning for IBM z/OS®

Machine Learning for IBM z/OS® (MLz) is a full-featured transactional AI platform tailor-made for infusing AI into applications running in z/OS environments. Whether your applications run in CICS, IMS, or batch, MLz offers highly optimized application-native APIs (REST APIs are also available) that serve in-memory inference requests with high throughput and sub-millisecond latency. MLz also lets clients easily import and deploy AI models trained on any platform or framework through a rich ecosystem of supported formats, including Spark, PMML, Snap ML, and ONNX.
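As a purely hypothetical illustration of the REST path, the sketch below posts a feature vector to a deployed model. The host, endpoint path, header, and payload shape are placeholders; the actual scoring API is defined by your MLz deployment and its documentation.

```python
# Hypothetical sketch of scoring against an MLz REST endpoint.
# The URL, path, token header, and payload layout below are placeholders;
# the actual API shape is defined by your MLz deployment and its docs.
import requests

MLZ_SCORING_URL = "https://mlz.example.com:443/scoring/models/fraud-lstm"  # placeholder

payload = {"inputs": [[0.12, 0.98, 3.40, 1.07]]}  # one feature vector (illustrative)
resp = requests.post(
    MLZ_SCORING_URL,
    json=payload,
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    timeout=2,  # online scoring budgets are tight; fail fast
)
resp.raise_for_status()
print(resp.json())
```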

The latest update to Machine Learning for IBM z/OS enables clients to take advantage of all the Telum II on-chip AI accelerator enhancements described above. MLz leverages the embedded IBM Z Deep Learning Compiler together with the IBM Z Deep Neural Network Library to execute inference requests on Telum II.
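For readers curious about the compiled path, the IBM Z Deep Learning Compiler is built on the open-source onnx-mlir project, which compiles an ONNX model into a native shared library with a small Python runtime. The sketch below shows that mechanism in isolation; MLz drives it for you internally, and the file names here are placeholders.

```python
# Sketch of invoking a model compiled by the IBM Z Deep Learning Compiler
# (zDLC, built on the open-source onnx-mlir project). MLz drives this path
# internally; shown here only to illustrate the mechanism. File names are
# placeholders, and the PyRuntime module ships with onnx-mlir / zDLC.
import numpy as np
from PyRuntime import OMExecutionSession

# 'model.so' is the native library produced by compiling model.onnx.
session = OMExecutionSession("model.so")

x = np.random.rand(1, 64).astype(np.float32)  # shape must match the model
outputs = session.run([x])                    # list of arrays in, list out
print(outputs[0])
```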

Other enhancements include dual control and a serving ID, which provide more fine-grained control over the deployment workflow, along with continuous security updates. The PTF for MLz v3.2 is expected to be available on July 18, 2025.

AI Toolkit for IBM Z® and LinuxONE

All components of the AI Toolkit run on Linux on Z and on z/OS via IBM z/OS Container Extensions (zCX), and are designed to leverage IBM Z’s Integrated Accelerator for AI, the on-chip AI inferencing accelerator. With this release now available, we have specifically updated the following components to take advantage of Telum II:

  • IBM Z Accelerated for PyTorch v1.2
  • IBM Z Accelerated for TensorFlow v1.4
  • IBM Z Accelerated Serving for TensorFlow v1.
  • IBM Z Accelerated for Snap ML v1.4
  • IBM Z Accelerated for NVIDIA Triton Inference Server v1.4
  • IBM Z Deep Learning Compiler v5.0

These updates enable significantly improved AI inference throughput, support multiple AI model strategies, and offer a consistent experience across the AI lifecycle—from model training to real-time inferencing. (Read more on the latest updates here)
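A useful property of these accelerated containers is that model code does not need to change. For example, a standard PyTorch inference loop like the minimal sketch below (the model is an illustrative placeholder) is the same code you would run anywhere; inside the IBM Z Accelerated for PyTorch container, the stack routes eligible operations to the Integrated Accelerator for AI.

```python
# Standard PyTorch inference code -- unchanged whether it runs on a laptop
# or inside the IBM Z Accelerated for PyTorch container, where the stack
# offloads eligible ops to the Integrated Accelerator for AI. The model
# below is an illustrative placeholder.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)).eval()

batch = torch.randn(160, 32)   # e.g., a batch of transaction feature vectors
with torch.inference_mode():   # disable autograd for inference
    scores = model(batch)
print(scores.shape)            # torch.Size([160, 1])
```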

Db2 13 with SQL Data Insights

Db2 SQL Data Insights, an AI-powered feature of Db2 for z/OS version 13, continues to exploit the on-chip AI accelerator on Telum II. SQL Data Insights runs AI queries that discover, match, and cluster semantic similarities and dissimilarities in your Db2 data. With IBM z17 enhancements that utilize remote AI inferencing acceleration within a drawer, the SQL Data Insights AI_Analogy query sees additional throughput improvement.
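As an illustration, the hedged sketch below issues a SQL Data Insights similarity query from Python using the ibm_db driver. The connection string, table, and column names are placeholders, and the queried column is assumed to have been AI-enabled through SQL Data Insights; consult the Db2 13 documentation for the exact AI function syntax in your environment.

```python
# Sketch: issuing a SQL Data Insights similarity query from Python with the
# ibm_db driver. Table, column, and connection details are placeholders, and
# the CUSTOMER_ID column is assumed to be AI-enabled in SQL Data Insights.
import ibm_db

conn = ibm_db.connect(
    "DATABASE=DSNV13P;HOSTNAME=db2.example.com;PORT=446;"
    "PROTOCOL=TCPIP;UID=<user>;PWD=<password>;",  # placeholder DSN
    "", "",
)

# Rank customers semantically similar to a known-fraud customer ID.
sql = """
SELECT C.CUSTOMER_ID,
       AI_SIMILARITY(C.CUSTOMER_ID, '0280-XJGEX') AS SCORE
FROM   CUSTOMER C
ORDER BY SCORE DESC
FETCH FIRST 10 ROWS ONLY
"""
stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    print(row["CUSTOMER_ID"], row["SCORE"])
    row = ibm_db.fetch_assoc(stmt)
```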

IBM Z Platform for Apache Spark v1.1.0

IBM Z Platform for Apache Spark brings the open-source distributed computing power of Apache Spark directly to the IBM Z environment, allowing your organization to run analytics in place, where your mission-critical data resides. This allows enterprises to act on their most valuable data in real time from local data sources. By infusing analytics into transactional environments and leveraging an enterprise-grade open-source compute platform, organizations can run general-purpose applications natively on z/OS to unlock new value in their core infrastructure.

In our latest delivery of IBM Z Platform for Apache Spark v1.1.0, we are enabling WebUI Authentication, a security mechanism that controls access to Spark’s web-based user interfaces while running on z/OS. These updates will be available on June 30.
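For orientation, Spark web UIs are conventionally protected with a user-supplied servlet filter and UI ACLs set through standard Spark properties, as in the generic sketch below. The filter class is a placeholder; the exact WebUI Authentication configuration for the z/OS delivery should be taken from the product documentation.

```python
# Generic sketch of securing the Spark web UI with standard Spark
# properties: a user-supplied servlet filter plus UI ACLs. The filter class
# is a placeholder; consult the IBM Z Platform for Apache Spark docs for
# the exact WebUI Authentication settings in the z/OS delivery.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("secured-ui-example")
    # Servlet filter that performs the authentication (placeholder class).
    .config("spark.ui.filters", "com.example.BasicAuthFilter")
    # Filter params follow spark.<filter class>.param.<name>=<value>.
    .config("spark.com.example.BasicAuthFilter.param.realm", "spark-ui")
    # Restrict who may view the application through the UI.
    .config("spark.acls.enable", "true")
    .config("spark.ui.view.acls", "analyst1,analyst2")
    .getOrCreate()
)
```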

Synthetic Data Sets

Adopting AI can sometimes be delayed by data privacy regulations that prevent data access for AI model training. To help alleviate these early challenges, IBM now offers IBM Synthetic Data Sets, a family of artificially generated data sets. These data sets are simulated rather than masked, so there is no real PII at risk of misuse. In this way, synthetic data can be used to accelerate and enhance AI projects while optimizing models and deployment options. In addition, the data sets include broader and richer data than might be available in reality, such as fraud and money-laundering labels on all transactions.
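To make the idea concrete, the toy sketch below generates fully simulated, pre-labeled transactions in the same spirit: because the generator controls the ground truth, every record carries a label and no real PII exists anywhere. This is only a conceptual illustration; IBM Synthetic Data Sets are a curated product, not the output of this script.

```python
# Toy illustration of the synthetic-data idea: fully simulated transactions
# with fraud labels on every record and no real PII. Conceptual only --
# IBM Synthetic Data Sets are a curated product, not generated by this script.
import random

random.seed(7)  # reproducible toy data

def synthetic_transaction(txn_id: int) -> dict:
    amount = round(random.lognormvariate(3.5, 1.2), 2)   # skewed like real spend
    merchant = random.choice(["grocery", "travel", "electronics", "gift_cards"])
    # The generator controls the ground truth, so every record can be labeled.
    is_fraud = merchant == "gift_cards" and amount > 200 and random.random() < 0.6
    return {"txn_id": txn_id, "amount": amount,
            "merchant": merchant, "label_fraud": int(is_fraud)}

dataset = [synthetic_transaction(i) for i in range(5)]
for row in dataset:
    print(row)
```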

IBM Synthetic Data Sets recently won “Best AI Solution - Data Insights & Knowledge Management” at the FinTech Futures Banking Tech Awards USA 2025 in New York City. We are proud that this recognition validates the value we deliver to clients in balancing the need for robust AI training with the critical importance of data privacy.

Next Steps

This is indeed an exciting time for AI on IBM Z. We look forward to engaging with you on the possibilities of AI for your mission-critical data on IBM Z.

Please join the community and subscribe to get notifications regarding new developments happening on the platform.



¹ Disclaimer: Performance result is extrapolated from IBM® internal tests running on IBM Systems Hardware of machine type 9175. The benchmark was executed with 64 threads performing local inference operations using a synthetic credit card fraud detection (CCFD) model based on an LSTM (https://github.com/IBM/ai-on-z-fraud-detection) and a TabFormer (https://github.com/IBM/TabFormer) model. The benchmark exploited the Integrated Accelerator for AI using IBM Z Deep Learning Compiler (zDLC) and IBM Z Accelerated for PyTorch. The setup consists of 64 threads pinned in groups of 8 to each chip (1 for zDLC, 7 for PyTorch). The TabFormer (tabular transformer) model evaluated 0.035% of the inference requests. A batch size of 160 was used for the LSTM based model. IBM Systems Hardware configuration: 1 LPAR running Ubuntu 24.04 with 45 cores (SMT), 128 GB memory. Results may vary.
