Innovation at IBM is focused on the needs of our clients, and our design-for-purpose approach on the mainframe is a prime example. Designed from the silicon up to run mission-critical transaction workloads, these systems are built for scale, availability, and security. The scale of transaction processing required is massive: thousands of supported devices, hundreds of pluggable cards, and the isolation needed to run thousands of OS instances on a single system. And now, with the preview of the IBM Spyre™ Accelerator for Z, the need for I/O acceleration and data processing within the box only increases.
To continue meeting the ever-increasing demands of today's workloads, IBM announced the new Telum® II processor, which integrates a brand-new Data Processing Unit (DPU) for I/O acceleration directly onto the processor chip. One benefit of this integration is the move from a 2-port FICON card to a 4-port card, along with consolidation of the OSA Express and RoCE Express offerings at the system level. This change, available beginning with the next-generation IBM Z in the first half of 2025, will allow clients to maintain the same I/O configuration in a smaller footprint, reducing data center floor space as they upgrade and modernize their infrastructure.
Each DPU includes four processing clusters, each with eight programmable micro-controllers where the firmware is loaded, and an I/O accelerator that manages the four processing clusters and the I/O subsystem for two I/O drawer domains. The DPU also features a separate L1 cache and a request manager that tracks outstanding requests, maintaining state across the system and determining how to distribute work so that load stays balanced and cache affinity is encouraged.
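To make the balance-versus-affinity trade-off concrete, here is a minimal Python sketch of an affinity-aware dispatcher. It is purely illustrative, not IBM firmware: the cluster and micro-controller counts follow the description above, while the device-ID hashing, the imbalance limit, and the round-robin engine pick are assumptions invented for the example.

```python
# Toy sketch only, not IBM firmware. Cluster/micro-controller counts follow the
# text (4 clusters x 8 micro-controllers); the policy details are assumptions.
NUM_CLUSTERS = 4
ENGINES_PER_CLUSTER = 8

class ToyRequestManager:
    def __init__(self, imbalance_limit=4):
        self.outstanding = [0] * NUM_CLUSTERS   # outstanding requests per cluster
        self.imbalance_limit = imbalance_limit  # how far affinity may override balance

    def dispatch(self, device_id):
        # Prefer the cluster that usually handles this device (cache affinity).
        preferred = hash(device_id) % NUM_CLUSTERS
        least_loaded = min(range(NUM_CLUSTERS), key=lambda c: self.outstanding[c])
        target = preferred
        # Break affinity only when the preferred cluster is badly over-loaded.
        if self.outstanding[preferred] - self.outstanding[least_loaded] > self.imbalance_limit:
            target = least_loaded
        engine = self.outstanding[target] % ENGINES_PER_CLUSTER  # naive micro-controller pick
        self.outstanding[target] += 1
        return target, engine

    def complete(self, cluster):
        self.outstanding[cluster] -= 1

rm = ToyRequestManager()
for dev in ["ficon-0", "osa-1", "ficon-0", "roce-2"]:
    print(dev, "-> cluster/engine", rm.dispatch(dev))
```

The point of the sketch is the trade-off itself: keeping a device's requests on one cluster keeps its state warm in that cluster's cache, but only up to the point where the resulting imbalance would hurt overall throughput.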
The clusters and the accelerator connect externally via both the PCIe interface and the cache fabric. Connecting the DPU to the cache fabric keeps bulk data transfers from flooding the L2 cache and routes data directly to its destination. The clusters and the accelerator also connect to each other via a coherency fabric, and the DPU ties into the cache fabric through its private 36MB L2 cache.
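As a rough illustration of that routing decision, the toy model below stages small control traffic through a cache while sending bulk payloads straight to their destination. The 36MB figure comes from the text; the 64KB "bulk" cutoff and the routing function itself are assumptions for the sketch, not the DPU's actual policy.

```python
# Toy model of cache bypass for bulk transfers; thresholds are assumptions.
L2_CAPACITY = 36 * 1024 * 1024     # DPU private L2 size, per the text
BULK_THRESHOLD = 64 * 1024         # assumed cutoff for treating a payload as bulk

def route(payload_len, destination):
    """Stage small control traffic in the L2; send bulk payloads straight through."""
    if payload_len <= BULK_THRESHOLD:
        share = payload_len / L2_CAPACITY
        return f"{payload_len} B staged in L2 ({share:.3%} of it), then sent to {destination}"
    return f"{payload_len} B routed directly to {destination}; L2 untouched"

print(route(4 * 1024, "host memory"))          # small control block
print(route(16 * 1024 * 1024, "host memory"))  # bulk payload that would displace much of the L2
```

The design choice is the one the paragraph describes: bulk data that will not be re-read gains nothing from sitting in the cache, and letting it through would evict state the micro-controllers actually need.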
The DPU sits between the main processor fabric and the PCIe fabric. The PCIe interface connection is protected by a translation mechanism to ensure memory accesses are secure. The main processors and the DPU communicate through the low-latency processor fabric, minimizing latency between the processors and the I/O subsystem. Directly attaching the DPU to the fabric reduces overhead for data transfers while improving throughput and power efficiency.
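The text does not describe the translation mechanism in detail, but the general idea resembles the I/O address translation many platforms use to police device DMA. The sketch below is a generic, hypothetical translation table that rejects unmapped or read-only accesses before converting a device address into a system address; its names, page size, and layout are assumptions, not IBM documentation.

```python
# Generic, hypothetical I/O address-translation check; not IBM's mechanism.
PAGE_SIZE = 4096   # assumed page size for the example

class ToyIOTranslationTable:
    def __init__(self):
        self.mappings = {}   # device page number -> (system page number, writable)

    def map(self, io_page, sys_page, writable=False):
        self.mappings[io_page] = (sys_page, writable)

    def translate(self, io_addr, write):
        page, offset = divmod(io_addr, PAGE_SIZE)
        entry = self.mappings.get(page)
        if entry is None:
            raise PermissionError(f"access to unmapped I/O address {io_addr:#x} blocked")
        sys_page, writable = entry
        if write and not writable:
            raise PermissionError(f"write to read-only page {page:#x} blocked")
        return sys_page * PAGE_SIZE + offset

tbl = ToyIOTranslationTable()
tbl.map(io_page=0x10, sys_page=0x8F2, writable=True)
print(hex(tbl.translate(0x10 * PAGE_SIZE + 0x40, write=True)))  # allowed, prints 0x8f2040
# tbl.translate(0x20 * PAGE_SIZE, write=False)  # would raise: unmapped page
```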
In a maximum configuration, future IBM Z systems can be equipped with up to 32 Telum II processors and 12 I/O cages. Each cage can accommodate up to 16 PCIe slots, allowing the system to support up to 192 PCIe cards. Additionally, custom I/O protocols enhance availability, error checking, and virtualization to meet massive bandwidth requirements, and provide redundancy and multi-pathing to protect against simultaneous multi-failure scenarios.
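For readers who want to check the slot math, the arithmetic below multiplies out the figures quoted above; the all-FICON fill at the end is hypothetical and included only to show the effect of the 4-port cards.

```python
# Arithmetic check of the maximum configuration quoted above.
io_cages = 12
slots_per_cage = 16
max_cards = io_cages * slots_per_cage
print(max_cards)                              # 192 PCIe cards

old_ficon_ports, new_ficon_ports = 2, 4       # 2-port card -> 4-port card
# Hypothetical all-FICON fill, just to show the port-density effect:
print(max_cards * old_ficon_ports, "->", max_cards * new_ficon_ports)   # 384 -> 768 ports
```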
The introduction of the DPU for I/O Acceleration on the Telum II processor represents a transformative leap in mainframe technology, enabling clients to handle increasingly complex workloads. By doubling I/O capacity while reducing physical footprint and power usage, IBM is not just meeting current demands but is also working to make the mainframe platform future-ready to support emerging AI-driven applications and massive data transfers. This innovation ensures that each new generation of IBM Z will continue to set industry benchmarks for performance, scalability, and sustainability, empowering clients to maintain their competitive edge.
Statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.