
Best Practices for AIOps for IBM Z - Predictive workload automation

  

Introduction

Reliably scheduling, monitoring, and managing batch jobs on the mainframe is business critical to any IBM Z based enterprise. The scheduler processes critical jobs such as creating payrolls, transferring money between accounts, creating financial statements, and automatically processing orders in an online shop. These jobs must complete within the time frames defined in Service Level Agreements (SLAs).

IBM Z Workload Scheduler is the proven, reliable, and highly scalable product to manage and automate this kind of workload. As workloads grow more complex, environments become more hybrid, batch windows shrink, and organizations face skills shortages, companies are increasingly turning to AI for IT Operations (AIOps) to harness the potential of AI-driven intelligence and automation.

A recent publication by Sreekanth Ramakrishnan serves as a comprehensive guide to the best practices for taking a hybrid approach to AIOps for IBM Z. In this blog, I will delve deeper into the “predictive workload automation” best practice, a capability in the Act category of our IBM Z AIOps framework that is provided by IBM Z Workload Scheduler.

Client Challenges

IT operations teams face the following main challenges in today’s environments:

  • Growing complexity of the workloads to be managed, with dependencies between jobs running on different platforms (hybrid cloud).
  • Islands of automation that are not integrated.
  • Shortening batch windows that require continuous optimization of batch execution to avoid violating SLA constraints.
  • Skills issues: the next generation of schedulers is not familiar with traditional mainframe interfaces and lacks the experience to assess whether critical jobs are running as expected.

IBM’s Solution

IBM Z Workload Scheduler provides end-to-end workload automation with embedded predictive scheduling for SLA management across the enterprise.

In addition to automating jobs on the mainframe, it provides a single point of control for centrally managing heterogeneous workloads across the enterprise, driving them according to business policies to support business goals and service levels. Its intuitive interface, the Dynamic Workload Console, helps users model, manage, and monitor their workloads with enhanced graphical views, embedded analytics, and customizable dashboards.

Let’s take a look at some of the advanced features of IBM Z Workload Scheduler:

Embedded predictive analytics to identify risk of SLA violations

The product has embedded analytics capabilities to predict whether critical jobs are at risk of missing a milestone defined in a service level agreement. This feature, called “Workload Service Assurance”, constantly monitors the critical job path and estimates when it will finish. Comparing the estimated end time with the expected end time of the critical job determines the risk level of meeting or missing the defined milestones. If delays impact the critical workload, users can be notified, and the product can automatically change schedules and execution priorities.
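To make the comparison between estimated and expected end time concrete, here is a minimal Python sketch, assuming a simple three-level risk scale and a 15-minute safety margin. The names `Risk`, `CriticalJob`, and `classify`, as well as the margin, are hypothetical; they illustrate the idea and are not the logic Workload Service Assurance actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Risk(Enum):
    NO_RISK = "no risk"
    POTENTIAL = "potential risk"
    HIGH = "high risk"


@dataclass
class CriticalJob:
    name: str
    estimated_end: datetime  # forecast end time, e.g. derived from the critical path
    deadline: datetime       # SLA-defined milestone for the job


def classify(job: CriticalJob, margin: timedelta = timedelta(minutes=15)) -> Risk:
    """Hypothetical rule: HIGH if the forecast misses the deadline,
    POTENTIAL if it finishes within `margin` of it, otherwise NO_RISK."""
    if job.estimated_end > job.deadline:
        return Risk.HIGH
    if job.deadline - job.estimated_end <= margin:
        return Risk.POTENTIAL
    return Risk.NO_RISK


base = datetime(2024, 1, 31, 6, 0)
payroll = CriticalJob("PAYROLL", estimated_end=base + timedelta(minutes=10),
                      deadline=base + timedelta(minutes=20))
print(payroll.name, classify(payroll).value)  # PAYROLL potential risk
```

A margin lets operators react before the deadline is actually breached, which is the point of predicting risk rather than reporting a missed SLA after the fact.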

Leveraging a built-in integration with the z/OS Workload Manager (WLM), the product can automatically change the WLM service class of a job that is at risk of violating an SLA milestone.

On top of that, IBM Z Workload Scheduler provides “What-if” simulation dashboards in the Dynamic Workload Console that allow users to find out the impact of changes to critical jobs. For example, the What-if dashboard can visualize the impact on the critical path if an internal payroll job arrives later than expected, or if dependent jobs are added or removed.
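The idea behind a what-if analysis on the critical path can be sketched in Python, assuming each job has a fixed estimated duration and starts as soon as all its predecessors have finished and the job itself has arrived. The job names, durations, and the `estimated_end` function are hypothetical; the What-if dashboards rely on the product's own forecasting, not on this code.

```python
from datetime import datetime, timedelta

# Hypothetical job network: job -> (estimated duration in minutes, predecessors)
NETWORK = {
    "EXTRACT": (30, []),
    "PAYCALC": (45, ["EXTRACT"]),
    "TAXCALC": (20, ["EXTRACT"]),
    "PAYROLL": (15, ["PAYCALC", "TAXCALC"]),  # the critical job
}


def estimated_end(network, target, start, arrivals=None):
    """Earliest finish of `target`: each job starts once all predecessors have
    finished and its own arrival time (minutes after `start`) has passed."""
    arrivals = arrivals or {}
    finished = {}  # memoized finish time per job

    def finish(job):
        if job not in finished:
            duration, preds = network[job]
            ready = max((finish(p) for p in preds), default=start)
            ready = max(ready, start + timedelta(minutes=arrivals.get(job, 0)))
            finished[job] = ready + timedelta(minutes=duration)
        return finished[job]

    return finish(target)


start = datetime(2024, 1, 31, 2, 0)
print(estimated_end(NETWORK, "PAYROLL", start).strftime("%H:%M"))                   # 03:30
print(estimated_end(NETWORK, "PAYROLL", start, {"EXTRACT": 40}).strftime("%H:%M"))  # 04:10
print(estimated_end(NETWORK, "PAYROLL", start, {"TAXCALC": 40}).strftime("%H:%M"))  # 03:30
```

A late EXTRACT shifts the payroll job by the full 40 minutes because it sits on the critical path, while TAXCALC arriving at the same point in time has no effect, since its slack absorbs the delay.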

AI powered anomaly detection

A new AI Data Advisor (AIDA) component has been added to the product. It detects anomalies in workload execution early by analyzing historical job execution data. Periodically, the AI component reads data, predicts trends, and detects anomalies. If it detects an anomaly, it sends an alert to the Dynamic Workload Console (DWC) user or, alternatively, through other notification channels such as email.
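The models AIDA uses are not described here, so as a stand-in, the following Python sketch shows one common, simple way to flag an anomalous run: a z-score test of the latest elapsed time against the job's history. The function `is_anomalous`, the threshold of three standard deviations, and the sample durations are assumptions for illustration; AIDA's actual trend prediction and anomaly detection may work quite differently.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` deviates from the mean of `history`
    by more than `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # too little history to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # any change from a perfectly stable history
    return abs(latest - mu) / sigma > threshold


# Hypothetical elapsed times (minutes) of previous runs of one job
history = [42, 40, 45, 43, 41, 44, 42]
for latest in (46, 75):
    if is_anomalous(history, latest):
        print(f"ALERT: run took {latest} min, far outside the usual range")
```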

From the alert, the user can open a historical data analysis user interface (UI), which provides the data needed to analyze the situation and identify the root cause of the issue.

With this information, users can take action before the issue has a critical impact.

Automation Hub: Wide variety of advanced job types available to integrate cloud and container environments

The world is going hybrid, and the mainframe is at the center of it. A scheduler must be able to include workload running off-platform, for example in cloud environments. IBM Z Workload Scheduler can control this new workload from a controller running on z/OS by using z-centric agents and plug-ins from the automation hub.

The automation hub is a web portal that provides access to a variety of job types available out of the box at no additional cost. More than 100 plug-ins and integrations that you can include in your job streams are available on the automation hub, spanning file-transfer tools, databases, Robotic Process Automation (RPA), cloud services, Enterprise Resource Planning (ERP) Process Orchestration, IT Service Management, data center automation, DevOps, and more.

Any IBM Workload Scheduler user can navigate to the web portal, select the plug-in of interest, download it, and start using it. For example, clients can use plug-ins to orchestrate container deployments or run jobs on a Kubernetes cluster, or they can submit and track Ansible playbooks as part of a job stream controlled by IBM Z Workload Scheduler.

Integration into analytics and observability platforms

Workload automation data is exposed as metrics based on standards such as OpenMetrics, for easy integration into observability platforms such as IBM Instana.
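For readers less familiar with OpenMetrics, this minimal Python sketch shows what its text exposition format looks like: each metric family carries `# HELP` and `# TYPE` metadata, samples carry labels, and the payload ends with `# EOF`. The metric names (`zws_jobs`, `zws_critical_jobs_at_risk`) and the helper functions are hypothetical and do not reflect the metrics IBM Z Workload Scheduler actually exposes.

```python
def _escape(value):
    """Escape a label value (backslash, double quote, newline) for the text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric family; `samples` is a list of (labels, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return lines


def exposition(*families):
    """Join metric families and terminate the payload with the '# EOF' line."""
    lines = [line for family in families for line in family]
    lines.append("# EOF")
    return "\n".join(lines) + "\n"


# Hypothetical metric names; the product's real metrics may differ.
print(exposition(
    render_gauge("zws_jobs", "Jobs in the current plan by status",
                 [({"status": "error"}, 3), ({"status": "late"}, 1)]),
    render_gauge("zws_critical_jobs_at_risk", "Critical jobs at risk of missing their deadline",
                 [({}, 2)]),
), end="")
```

Because the format is plain text with a well-defined grammar, any OpenMetrics-aware collector can scrape it without a product-specific adapter.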

IBM Z Workload Scheduler also integrates with Z Common Data Provider, which, like IBM Z Workload Scheduler, is part of the IBM Z Service Automation Suite. Through this integration, you can easily stream scheduling logs and messages to the analytics platforms that Z Common Data Provider supports, for example Splunk® or Elasticsearch®.

Z ChatOps and Service Management Unite integration

IBM Service Management Unite (SMU) is a web-based and highly customizable dashboard user interface that brings mainframe management information and tasks from disparate sources into a single environment. SMU visualizes and consolidates information from products such as IBM Z System Automation, IBM Z NetView, OMEGAMON, and IBM Z Workload Scheduler, enabling next-generation Z operators to quickly identify, isolate, and resolve problems.

IBM Z ChatOps provides a chatbot that gives users access to information and tasks from these Z AIOps tools within popular collaboration platforms like Slack, Microsoft Teams, and Mattermost.

Both SMU and IBM Z ChatOps are also included in the IBM Z Service Automation Suite, and IBM Z Workload Scheduler has out-of-the-box integration with both:

The Workload Scheduler dashboard available in SMU provides an “at a glance” view of scheduling objects and metrics and provides seamless navigation to the Dynamic Workload Console.

With IBM Z ChatOps, users can be notified in an enterprise chat platform about alerts from IBM Z Workload Scheduler, for example when a critical job is late. In addition, chat users can use the chatbot to run IBM Z Workload Scheduler queries, such as viewing workstations and their status, listing jobs in error, or listing the jobs of a particular job stream. They can even take actions through the chatbot, such as restarting a failed job, without leaving the team’s chat channel.

This ChatOps integration fosters collaboration during incident management and helps to isolate and resolve problems quickly.

To learn more about this collaborative incident remediation, also read this blog.

Zowe CLI and REST API

Zowe is an open-source project created to host technologies that bring value to the IBM Z platform; it provides modern interfaces for interacting with z/OS.

Two core capabilities provided as part of the Zowe framework are the API mediation layer and the command line interface.

The API Mediation Layer provides a gateway and catalog for REST APIs that access z/OS services. Base Zowe provides core services for working with MVS data sets and JES, as well as with z/OSMF REST APIs. IBM Z Workload Scheduler provides a full set of REST APIs to interact with the product, and these APIs can also be included in the Zowe API catalog and accessed through the API Mediation Layer.

Zowe CLI provides a command-line interface that lets you interact with the mainframe remotely and use common tools such as Integrated Development Environments (IDEs), shell commands, bash scripts, and build tools for mainframe development. The CLI provides a core set of commands for working with data sets, UNIX System Services (USS), and JES, as well as for issuing TSO and console commands. In addition, many Zowe CLI plug-ins exist for different z/OS products. One of them is the IBM Z Workload Scheduler Zowe CLI plug-in, which enables users to interact with IBM Z Workload Scheduler through the Zowe command-line interface.

Self-service Catalog with mobile interface

IBM Z Workload Scheduler provides a mobile-enabled Self-Service Catalog, a solution that automates routine business tasks (services) and lets business users run them from mobile devices or a web interface without needing any Workload Scheduler knowledge.

The IBM Workload Scheduler administrator or application designer creates annotations in the Dynamic Workload Console and marks jobs as services, so that they are available for managing from the Self-Service Catalog interface.

Each service is associated with an IBM Z Workload Scheduler job stream and, optionally, with a variable table, a collection of parameters used by the job stream. If a variable table is used, business users are prompted to fill in the parameters when they submit the service.
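The prompt-and-submit step can be modeled with a small Python sketch, assuming a variable table is simply a mapping of parameter names to optional default values. The `Service` class, the `submission_parameters` function, the job stream name, and the parameter names are all hypothetical; in the product, the Self-Service Catalog UI handles prompting and submission itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Service:
    name: str
    job_stream: str
    # Hypothetical variable table: parameter -> default value (None = user must fill it in)
    variables: Dict[str, Optional[str]] = field(default_factory=dict)


def submission_parameters(service, user_input):
    """Merge the user's answers into the variable table defaults and
    reject unknown or still-missing parameters before submission."""
    unknown = sorted(set(user_input) - set(service.variables))
    if unknown:
        raise ValueError(f"unknown parameters: {', '.join(unknown)}")
    params = {**service.variables, **user_input}
    missing = sorted(k for k, v in params.items() if v in (None, ""))
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    return params


report = Service("Monthly sales report", "SALESRPT", {"REGION": None, "FORMAT": "PDF"})
print(submission_parameters(report, {"REGION": "EMEA"}))  # {'REGION': 'EMEA', 'FORMAT': 'PDF'}
```

Validating before submission keeps business users from launching a job stream with incomplete input, which matches the intent of prompting during the submission phase.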

By adding Jira or ServiceNow jobs that open a ticket to a job stream, you can even integrate human approvals into a service workflow. The Self-Service Catalog UI shows the service as “waiting for approval” until the ticket is closed.

Where to learn more

Regardless of where you are on your journey to adopting AIOps within your IBM Z environment, many resources are available to help you learn more. If you have not yet read the new AIOps for IBM Z handbook, it is a great place to start to understand the concepts and technologies around AIOps.

To learn more about IBM Z Workload Scheduler, you can explore the product page.

To learn more about IBM Z Service Automation Suite, which bundles IBM Z Workload Scheduler, IBM Z System Automation, IBM Z NetView, Service Management Unite, Z ChatOps, and Z Common Data Provider, check out its product page.

Finally, if you have any questions on this topic, please reach out to me via email or post a question in the AIOps on IBM Z Community.