Parallel Sysplex on IBM Z provides a highly reliable, redundant and robust environment by clustering together multiple z/OS systems with one or more coupling facilities to achieve near-continuous availability. However, simply running your workload applications in a Parallel Sysplex doesn’t ensure continuously available workloads. Parallel Sysplex is an enabling technology. For workloads to take full advantage of the highly available properties of a Parallel Sysplex, their applications need to be able to run instances in parallel on any z/OS system in the cluster and access data in a data sharing environment.
Applications With Affinities
Applications with affinities may not be eligible to leverage the availability features of a Parallel Sysplex. A typical example of an affinity is a workload application with a dependency on non-shared data that’s only available locally on one z/OS system. Another case of an application affinity to a single z/OS system is application access to non-shared resources such as an external feed from another company.
Solutions to these types of affinities may require changes to the infrastructure or the workload applications themselves. Unless these changes are made, the workload applications aren’t continuously available. That means the workloads a company depends on could be subject to an outage if the application, database management system (DBMS) or the z/OS system itself fails. Once the outage is finally detected, the cause of the failure must be determined, and a decision as to whether to restart the workload in place or on an alternate z/OS system must be made.
From Multiple Hours to One Minute
The time from when an outage occurs to when the workload is available again could span multiple hours. Although an outage is sometimes unavoidable, the time to recover from one can be reduced. If you have a Parallel Sysplex but haven’t enabled, or can’t enable, data sharing for your workload applications, one solution is to use IBM Multi-site Workload Lifeline with an appropriate software replication product, such as IBM InfoSphere Data Replication for Db2, and possibly an external load balancer, depending on how the workload is accessed.
InfoSphere Data Replication for Db2
IBM provides software replication products for three data sources that run on z/OS: Db2, IMS and VSAM. These software replication products provide a transactionally consistent copy of the data source in an alternate location. Typically, this data source copy is used as a backup in the event of a failure of the original data source, or used as a read-only copy of the data source for performing data analytics.
Multi-site Workload Lifeline
Multi-site Workload Lifeline is a product that provides workload monitoring and routing. It can monitor workloads with data sharing applications running on two Parallel Sysplexes in different data centers, as well as non-data sharing applications running on two z/OS systems that are either each in their own monoplex or both within the same Parallel Sysplex. In the event of a workload failure for data sharing applications, Multi-site Workload Lifeline facilitates the routing of new workload connections or MQ messages to data sharing applications in an alternate Parallel Sysplex. For a workload failure with non-data sharing applications, Multi-site Workload Lifeline orchestrates the routing of new workload connections or MQ messages to the non-data sharing application on the alternate z/OS system. A workload failure can occur if the workload applications are no longer healthy or active, if the z/OS systems where the workload applications run have failed, or if the Parallel Sysplex where the workload is active suffers an outage.
Multi-site Workload Lifeline supports a variety of workload types that run on z/OS systems, including:
- TCP applications, such as transaction management systems like CICS or IMS. Multi-site Workload Lifeline monitors these workloads, and provides routing recommendations to external load balancers on how to distribute workload connections to these applications.
- SNA workloads. SNA applications are monitored for health and availability. Multi-site Workload Lifeline directs external load balancers to connect to a subset of gateways, such as TN3270, in order to create sessions to specific SNA applications.
- MQ workloads. For workloads that use messaging services provided by an MQ cluster, the MQ queue managers and cluster queues are monitored. Multi-site Workload Lifeline controls how MQ messages are delivered to eligible MQ queue managers in the MQ cluster.
For these workload types, Multi-site Workload Lifeline provides system administrators with a centralized view for determining workload application status and a method for controlling how the workload connections or MQ messages are routed.
A Use Case Scenario
This section describes a use case in which Multi-site Workload Lifeline and IBM InfoSphere Data Replication for Db2 provide near-continuous availability where a Parallel Sysplex alone cannot. A customer has a z/OS workload deployed in a Parallel Sysplex that consists of an application running in CICS that updates and queries Db2. Access to the CICS application is through a web browser. Because of design restrictions, Db2 data sharing can’t be used, and all workload connections are processed by a single instance of the CICS application and Db2.
To provide near-continuous availability for this workload within the Parallel Sysplex without requiring application changes, the following steps can be implemented.
- Using IBM InfoSphere Data Replication for Db2, create a second copy of the database on a second z/OS system in the Parallel Sysplex that is continuously replicated from the original Db2 data source. A second Db2 DBMS is used to manage this copy of the database. Neither Db2 database is enabled for data sharing.
- Ensure that a second CICS application instance is running on the z/OS system where the second Db2 is running.
- Configure the workload in Multi-site Workload Lifeline so that it can monitor both CICS application instances and both z/OS systems.
- Ensure an external load balancer is configured to communicate with Multi-site Workload Lifeline. We recommend F5 Networks BIG-IP Local Traffic Manager.
Because only one CICS application instance can process workload connections from web browsers at a time, the workload can only be processed on one z/OS system. Web browsers connect to the external load balancer instead of directly to the CICS application. With Multi-site Workload Lifeline, one CICS application instance is selected as the active instance, and the external load balancer is directed to route all workload connections to it. Multi-site Workload Lifeline also detects a failure of the active CICS application or of the z/OS system where it is running. In the case of a failure, Multi-site Workload Lifeline will either:
- Automatically switch the workload by directing the external load balancer to route new workload connections to the alternate CICS application and its copy of Db2
- Prompt the system administrator to perform a controlled workload switch
The web browsers continue to connect to the external load balancer and are unaware that a different CICS application is used. This reduces the elapsed time of a workload outage to around one minute.
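The monitoring and switching behavior described above can be sketched as a small decision routine. This is an illustration only, not the product’s implementation or API: the `Site` and `Router` classes, the `SwitchPolicy` values, the returned action strings and the system names `SYSA` and `SYSB` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class SwitchPolicy(Enum):
    """How a detected failure is handled (hypothetical names)."""
    AUTOMATIC = "automatic"  # switch without operator involvement
    PROMPT = "prompt"        # ask the administrator to perform the switch


@dataclass
class Site:
    """One z/OS system with its CICS application instance and Db2 copy."""
    name: str
    system_up: bool = True
    cics_healthy: bool = True

    def available(self) -> bool:
        # The workload is usable only if both the system and the app are up.
        return self.system_up and self.cics_healthy


class Router:
    """Tracks where the external load balancer should send new connections."""

    def __init__(self, active: Site, alternate: Site, policy: SwitchPolicy):
        self.active = active
        self.alternate = alternate
        self.policy = policy

    def check(self) -> str:
        """Run one monitoring pass and return the action taken."""
        if self.active.available():
            return "no-action"
        if not self.alternate.available():
            return "no-target"        # nothing healthy to switch to
        if self.policy is SwitchPolicy.PROMPT:
            return "prompt-operator"  # controlled switch left to the administrator
        # Automatic switch: new connections now go to the alternate site.
        self.active, self.alternate = self.alternate, self.active
        return "switched"

    def target(self) -> str:
        """Name of the site that should receive new workload connections."""
        return self.active.name


if __name__ == "__main__":
    primary, secondary = Site("SYSA"), Site("SYSB")
    router = Router(primary, secondary, SwitchPolicy.AUTOMATIC)
    primary.cics_healthy = False           # simulate a CICS application failure
    print(router.check(), router.target())  # prints: switched SYSB
```

Only new connections are redirected in this sketch, which mirrors the article’s description: clients keep addressing the load balancer and never need to know which instance is active.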
Robust Workload Monitoring and Management
Running a workload in a Parallel Sysplex environment doesn’t guarantee continuous availability. Depending on how the workload’s applications are designed, there may be affinities present in the application that prohibit it from participating in a data sharing environment. For this type of workload, you can provide near-continuous availability by using IBM Multi-site Workload Lifeline with the appropriate software replication product for your workload. Multi-site Workload Lifeline monitors the health and availability of your workload application and z/OS systems and coordinates the switching of your workload to an alternate workload application and data source. This type of configuration also lays the groundwork for the adoption of the GDPS Continuous Availability solution, which provides 99.999 percent availability and even more robust workload monitoring along with improved workload and systems management.
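To put these figures in perspective, the short calculation below converts between an availability percentage and the downtime it allows per year. It is a back-of-the-envelope sketch under stated assumptions: availability is measured over an average calendar year of 365.25 days, and the two-hour and one-minute figures stand for a single outage in that year. The article does not say how availability is measured, and the function names are illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime, in minutes, implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_minutes: float) -> float:
    """Availability percentage implied by total yearly downtime in minutes."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # "Five nines" leaves a little over five minutes of downtime per year.
    print(f"{downtime_minutes_per_year(99.999):.2f}")  # prints: 5.26
    # Assumption: exactly one outage per year of the given length.
    print(f"{availability_percent(120):.4f}")          # two-hour outage: 99.9772
    print(f"{availability_percent(1):.4f}")            # one-minute outage: 99.9998
```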