
Leveraging ONNX Models on IBM Z and LinuxONE

Bringing AI projects from a pilot stage to production can be a significant challenge.

Practical implementation in production introduces a number of considerations, including data access, performance, model governance, monitoring, and more. Some of these challenges stem from differences between the platform used to develop and train AI assets and the platform on which those assets are deployed. AI projects often start in a sandbox environment that is free of the constraints and requirements of a production environment.

As we’ve discussed in the blog, “Leveraging AI on IBM Z and LinuxONE for accelerated insights”, by Elpida Tzortzatos, a key part of the IBM Z AI strategy is to enable our clients to build and train models anywhere. This allows you to leverage existing investments in AI platforms, or to train on Z if desired. Once the model is ready for deployment, our goal is to enable simple portability to IBM Z with seamless optimization.

One of the key technologies that IBM is leveraging to meet this requirement on the IBM Z platform is the Open Neural Network Exchange, or ONNX.  In a recent post we discussed how ONNX fits into the IBM Z and LinuxONE AI strategy. In this blog, we will dive into additional details on working with ONNX models and bringing them to production on IBM Z - from converting models to the ONNX format through deploying the models for use.

ONNX Ecosystem Overview

As a standard, ONNX defines a set of common operators. These operators correspond to common machine learning and deep learning primitives such as matrix multiplication or convolution, with well over 150 operators defined. Additionally, ONNX defines a common model file format for representing models. Together, the operator set and file format enable a broad ecosystem of tools, runtimes, and compilers.
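
For a concrete sense of what this standardization looks like in practice, here is a minimal sketch using the open-source onnx Python package to load a model file and list the standard operators its graph uses; the file name model.onnx is a placeholder.

```python
# Minimal sketch: inspect an ONNX model file with the open-source onnx package.
# "model.onnx" is a placeholder for any model exported to the ONNX format.
import onnx

model = onnx.load("model.onnx")     # parse the protobuf-based model file
onnx.checker.check_model(model)     # validate it against the ONNX spec

print("IR version:", model.ir_version)
print("Producer:  ", model.producer_name)

# Each node in the graph uses one of the standard ONNX operators.
for node in model.graph.node:
    print(node.op_type, "| inputs:", list(node.input), "| outputs:", list(node.output))
```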

ONNX establishes a streamlined path to take a project from playground to production. With ONNX, you can start a data science project using the frameworks and libraries of your choosing, including popular frameworks such as PyTorch and TensorFlow. The model can be developed and trained with these frameworks on the training platform of your choice. Once the model is trained and ready to begin the deployment journey, you export or convert it to the ONNX format. Tools such as Netron allow inspection and exploration of an ONNX model. When it comes to running the model, there are various backends that can be used to test and serve ONNX models, including model compilers such as ONNX-MLIR and runtimes like ONNX Runtime.
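
As a small illustration of one such backend, the sketch below loads an ONNX model with the open-source ONNX Runtime and serves a single prediction. The file name, input shape, and dtype are illustrative assumptions, not values from any particular model.

```python
# Minimal sketch: serve a prediction from an ONNX model with ONNX Runtime.
# The file name and the (1, 10) float32 input are illustrative assumptions;
# use the shapes and types of your own model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name     # discover the model's input name

x = np.random.rand(1, 10).astype(np.float32)
outputs = session.run(None, {input_name: x})  # None = return all outputs
print(outputs[0])
```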

Converting a Model to the ONNX format

The use of ONNX on IBM Z and LinuxONE mirrors the journey described above. This is a critical point, as it allows clients to leverage many of the freely available open-source projects built to work on ONNX models.

A typical scenario for IBM Z and LinuxONE may include the model being developed and trained on your platform of choice. While these steps can certainly be done on Z, many data scientists have a preferred platform or environment, whether a personal work device or a specialized commodity platform. In either case, we recommend exporting or converting the model to ONNX on the platform type where the training occurred. For example, if the model was developed and trained on x86, then convert the model on x86 before deploying to Z.

In the example Jupyter notebooks provided, simple models are created to demonstrate converting a model to the ONNX format. Some frameworks (like PyTorch) allow the export of models directly to the ONNX format; the PyTorch example provided demonstrates this. TensorFlow models require the use of an open-source model converter. In the TensorFlow example, we use tensorflow-onnx to convert the model from the TensorFlow SavedModel format to the ONNX format. The tensorflow-onnx converter provides both command-line and Python interfaces; our example demonstrates the use of the command line.
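
As a hedged illustration of the direct-export path, the sketch below builds a trivial linear model in PyTorch and exports it with torch.onnx.export; the model, shapes, and names are arbitrary placeholders. For the TensorFlow path, the tensorflow-onnx command line takes roughly the form python -m tf2onnx.convert --saved-model <dir> --output model.onnx.

```python
# Minimal sketch: export a trivial PyTorch model directly to the ONNX format.
# The model, shapes, and names are illustrative; substitute your trained model.
import torch

model = torch.nn.Linear(10, 1)     # toy model: one matrix multiply plus bias
model.eval()

dummy_input = torch.randn(1, 10)   # an example input fixes the graph's shapes
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,              # pin the ONNX operator-set version
)
```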

This .onnx file can be loaded in Netron to inspect the model characteristics. For our simple toy example, the model has a single MatMul (matrix multiplication) operation, which appears in Netron as a single node.


Netron allows you to inspect each node in the model, including inputs, outputs, weights, and other characteristics. For complex models, this is a very helpful tool and provides a basis for comparison between pre- and post-conversion models.
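
Alongside the structural comparison in Netron, a quick numeric spot check can confirm that a converted model still produces the same predictions as the original. The sketch below is one way to do this for the toy PyTorch model above, comparing the framework's output against the ONNX Runtime output; the tolerances are illustrative assumptions.

```python
# Minimal sketch: numerically compare a model with its ONNX export.
import numpy as np
import onnxruntime as ort
import torch

# Re-create and export the toy model in one session so that the weights in
# model.onnx match the in-memory model being compared against.
model = torch.nn.Linear(10, 1).eval()
torch.onnx.export(model, torch.randn(1, 10), "model.onnx",
                  input_names=["input"], output_names=["output"])

x = torch.randn(1, 10)
with torch.no_grad():
    expected = model(x).numpy()    # the original framework's prediction

actual = ort.InferenceSession("model.onnx").run(None, {"input": x.numpy()})[0]

# Small numeric drift between backends is normal; bit-exact equality is not.
np.testing.assert_allclose(expected, actual, rtol=1e-4, atol=1e-5)
print("Pre- and post-conversion outputs match within tolerance.")
```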

Deploying a model to IBM Z and LinuxONE

Once the model has been converted, it is ready to be deployed for testing and production use on IBM Z or LinuxONE. At this deployment stage, we must be able to leverage technology that enables us to create an inference solution that is fit for purpose for IBM Z and LinuxONE workloads. These critical workloads frequently have low-latency requirements and rely on the tight colocation and interaction of various products, whether a CICS and Db2 solution, IMS, or others. AI inference solutions must be able to coexist and thrive in this environment: delivering predictions within SLA windows while executing alongside these critical business workloads.

To meet this requirement for ONNX models, IBM Z and LinuxONE use an ONNX model compiler that builds on the ONNX-MLIR open-source project. Developed by IBM Research, this compiler uses MLIR (Multi-Level Intermediate Representation) to transform an ONNX model from a .onnx file into a highly optimized shared object library. During compilation, it optimizes the graph for inference and generates a library that can target the latest IBM Z and LinuxONE hardware optimizations. The resulting library is an optimized, lightweight, minimal program that can be invoked for inferencing based on the input ONNX model.
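
As a rough sketch of what this looks like in practice: a command along the lines of onnx-mlir --EmitLib model.onnx produces a model.so shared library, which can then be invoked from Python through the PyRuntime wrapper that ships with the onnx-mlir project. The module and class names below have varied across onnx-mlir releases, so treat them as assumptions to verify against your installed version.

```python
# Minimal sketch: run inference against a shared library produced by the
# ONNX-MLIR compiler (e.g., via `onnx-mlir --EmitLib model.onnx`).
# The PyRuntime wrapper ships with onnx-mlir; its class name has varied across
# releases (OMExecutionSession in recent ones, ExecutionSession in older ones).
import numpy as np
from PyRuntime import OMExecutionSession

session = OMExecutionSession("model.so")      # load the compiled model library

x = np.random.rand(1, 10).astype(np.float32)  # shape must match the model input
outputs = session.run([x])                    # inputs/outputs are lists of arrays
print(outputs[0])
```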

This compiler is available for use today, and our design intent is for the ONNX model compiler to leverage the IBM Telum capabilities once available.

Earlier this year, Watson Machine Learning for z/OS released the “IBM Watson Machine Learning for z/OS Online Scoring Community Edition”. This free trial features the ONNX model compiler capability and builds a great deal of functionality around it. The WMLz OSCE features an easy-to-use graphical interface that allows you to upload and deploy ONNX models for serving through a REST endpoint. Since it is a trial, it is limited to deploying 10 models at a time, but this should be more than enough to validate the capability provided and get started with a proof of concept. Note that the WMLz OSCE runs on z/OS Container Extensions and is therefore zIIP eligible.
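
Once a model is deployed behind the REST endpoint, scoring it from an application is an ordinary HTTP call. The sketch below is a generic illustration only: the host, path, and JSON payload shape are hypothetical placeholders rather than the documented WMLz OSCE API, so consult the product documentation for the actual request format.

```python
# Hypothetical sketch of scoring a deployed ONNX model over REST.
# The URL, path, and payload layout are illustrative placeholders only;
# the actual WMLz OSCE request format is defined in the product documentation.
import requests

SCORING_URL = "https://wmlz-host:9443/deployments/my-model/score"  # placeholder

payload = {"inputs": [[0.1] * 10]}  # placeholder input layout
response = requests.post(SCORING_URL, json=payload, timeout=10)
response.raise_for_status()
print(response.json())              # the endpoint's prediction
```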

Watson Machine Learning for z/OS V2.3 (WMLz) features the ability to score ONNX models, plus features for high availability and scalable model serving, while providing solutions for the entire AI lifecycle, including AI model management and more. In addition to the ONNX model support, WMLz supports several traditional machine learning frameworks and libraries.

These Watson Machine Learning for z/OS offerings are focused on the z/OS client. For Linux on Z and LinuxONE clients, we are currently running a closed beta program. If you are interested in this, contact aionz@us.ibm.com for details.

ONNX is clearly an exciting technology for AI on Z, and it fits well into the broader ecosystem-based strategy. Explore the links in this blog to learn more about ONNX and try out the examples. If you are interested in hearing more about this technology or discussing a use case, let us know at aionz@us.ibm.com!

Resources: