Machine Learning for IBM z/OS (MLz) is a full-featured transactional AI platform tailor-made for infusing AI into applications running in z/OS environments. Whether your applications run in CICS, IMS, or batch, MLz offers highly optimized, application-native APIs (REST APIs are also available) that serve in-memory inference requests, allowing for high throughput and sub-millisecond latency. MLz also lets clients easily import and deploy AI models trained on any platform or framework through a rich ecosystem of supported formats, including Spark, PMML, Snap ML, and ONNX.
The latest enhancement to MLz 3.2, available with APAR PH66196, adds support for dual control and serving ID, and leverages the embedded IBM Z Deep Learning Compiler (zDLC) along with the IBM Z Deep Neural Network Library (zDNN) to execute inference requests on the Telum II on-chip AI accelerator available with IBM z17.
With dual control, online deployment-related activity in MLz requires an additional user to approve the changes before the content goes live. This feature allows for tighter control and traceability over any change that can impact the production environment, such as modifying the model content used for scoring by the various COBOL workloads. Dual control is disabled by default but can be enabled at any time, and it also keeps an audit trail of what has been approved. With dual control, approval is required whenever a deployment is created, updated, or deleted. The approver must be at least an MLz System Administrator and must use a different user ID than the one that originated the change.
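To make the flow concrete, here is a minimal sketch of what a dual-control approval could look like over REST. The base URL, endpoint paths, payloads, and response fields below are illustrative assumptions, not the documented MLz API; see the v3.2.0 REST reference for the actual routes.

```python
import requests

MLZ_BASE = "https://mlz.example.com:9443/v1"  # hypothetical MLz REST base URL

# User A updates an online deployment. With dual control enabled, the change
# is staged as pending rather than taking effect immediately.
resp = requests.put(
    f"{MLZ_BASE}/deployments/fraud-deploy-01",            # hypothetical endpoint
    headers={"Authorization": "Bearer <token-of-user-a>"},
    json={"model": "fraud_model_v7"},                      # illustrative payload
    verify="/path/to/ca.pem",
)
pending = resp.json()  # assumed to carry an ID for the pending change

# User B, an MLz System Administrator with a *different* user ID, approves the
# pending change; only then does it go live, and the approval is audited.
requests.post(
    f"{MLZ_BASE}/approvals/{pending['id']}/approve",       # hypothetical endpoint
    headers={"Authorization": "Bearer <token-of-user-b>"},
    verify="/path/to/ca.pem",
)
```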

The latest edition also enhances online deployments with support for a serving ID. The serving ID is a customer-provided value that can be reused across multiple deployments. The only requirement for multiple deployments sharing the same serving ID is that the deployed model or models maintain an identical input and output schema. The specified serving ID replaces the auto-generated deployment ID in the context of inferencing requests, regardless of whether they are driven via REST, WOLA, or the new native CICS LINK API. One or more models can now be deployed to various scoring servers (cluster or standalone) across numerous LPARs, and each deployment can specify the same serving ID. This eliminates the need to look up the auto-generated deployment ID and to alter COBOL applications as they progress from test to production.
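As a rough illustration, a REST client can address a deployment by its stable serving ID instead of a per-environment deployment ID. The scoring path and payload below are assumptions for illustration only; the serving ID onnxFraud matches the deployment shown in Figure 5 further down.

```python
import requests

MLZ_BASE = "https://mlz.example.com:9443/v1"  # hypothetical MLz REST base URL

# The serving ID "onnxFraud" resolves to whichever deployment carries it in the
# current environment (test or production), so this client code never changes.
resp = requests.post(
    f"{MLZ_BASE}/scoring/onnxFraud",  # hypothetical route keyed by serving ID
    headers={"Authorization": "Bearer <token>"},
    json={"inputs": [{"amount": 182.50, "merchant_code": 5812, "hour": 23}]},
    verify="/path/to/ca.pem",
)
print(resp.json())  # e.g. a fraud probability returned by the deployed model
```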
CICS COBOL applications call ALNSRVSC to take advantage of the serving ID feature, while applications that use WOLA interfaces reference a new WOLA handler, com.ibm.ml.scoring.online.service.WOLASrvHandler. In addition, with serving ID, MLz automatically generates the helper classes, so there is no need to execute gen_helper_class.sh ever again (I can sense the “finally!” that just crossed your mind), and the required COBOL input data structure is also much simpler: there is no need to provide the input and output class names.
Figure 5: Sample COBOL application that takes advantage of a deployment with its serving ID set to onnxFraud
And finally (drumroll, please): this latest MLz delivery includes support that enables ONNX models to leverage the latest Telum II on-chip AI accelerator. This allows additional AI models, including encoder-based large language models (e.g., BERT, RoBERTa), to be exploited on the z/OS platform. When an ONNX model is imported into an MLz instance that runs on IBM z17 and has the latest zDNN APAR applied (one of the few things MLz doesn’t bundle, so please don’t forget to install it!), the model details now show that it can run on the CPU, the Telum I, or the Telum II on-chip AI accelerator.
Figure 6: Model view details
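For context, an encoder model such as BERT typically reaches MLz as an ONNX file exported off-platform. Here is a minimal sketch of such an export using the standard PyTorch and Hugging Face transformers APIs; the model name, output file name, and opset version are example choices, and the resulting .onnx file would then be imported into MLz as usual.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # example encoder; a RoBERTa variant works too
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# A dummy input fixes the graph's input names while shapes stay dynamic.
sample = tokenizer("example transaction description", return_tensors="pt")

torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "bert_classifier.onnx",                      # example file for MLz import
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    opset_version=17,
)
```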
Additional details regarding these new features, and much more, including the REST APIs for all of the features introduced, can be found in the Machine Learning for IBM z/OS Enterprise Edition v3.2.0 documentation.