
Improve your IT resilience with IBM Multi-site Workload Lifeline

  

Multi-site Workload Lifeline (Lifeline) is a workload monitoring and routing solution designed to intelligently load balance workloads across two sites (or sysplexes). The product is available both as an independent offering and as part of the GDPS® Continuous Availability solution. Lifeline extends IBM Z resiliency by supporting continuous availability during both planned and unplanned outages. Lifeline supports the following workload types:

  • TCP/IP based workloads

  • Linux on z Systems workloads

  • SNA workloads

  • MQ cluster workloads

  • Db2 sysplex routed workloads

Components

Lifeline consists of two key components: the Lifeline Agents and the Lifeline Advisor.

  • Lifeline Agent(s): A monitoring program that collects health information about a site (or sysplex), along with the health of the specified applications in a workload

  • Lifeline Advisor: A coordinating program that collects the health reports from the agents and provides load balancing recommendations to an external load balancer (such as F5)

The Lifeline Agents send server availability and health information to the primary Lifeline Advisor. The primary Lifeline Advisor provides systems administrators with a central location for determining server and state status, and controlling the routing of each configured workload.

The primary Lifeline Advisor communicates with external load balancers, if applicable, one or more Lifeline Agents, and possibly a secondary Lifeline Advisor. Lifeline Advisor uses the Server/Application State Protocol (SASP) when communicating with external load balancers.
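The flow from agent reports to load balancer recommendations can be illustrated with a minimal sketch. This is not Lifeline's actual algorithm and not the SASP wire format: the `HealthReport` fields, the 0 to 100 health scale, and the `recommend_weights` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    """One agent's report for a server application (hypothetical fields)."""
    site: str
    server: str
    available: bool
    health: int  # assumed scale: 0 (unhealthy) to 100 (fully healthy)


def recommend_weights(reports):
    """Turn agent health reports into relative routing weights per server.

    Unavailable servers get weight 0 so a load balancer would skip them.
    """
    weights = {}
    for report in reports:
        key = (report.site, report.server)
        weights[key] = report.health if report.available else 0
    return weights


# Example: three servers across two sites, one of them unavailable.
reports = [
    HealthReport("site1", "sysA", True, 90),
    HealthReport("site1", "sysB", False, 80),
    HealthReport("site2", "sysC", True, 60),
]
weights = recommend_weights(reports)
```

In this sketch the advisor role is reduced to a single function. A real advisor would also track report freshness and per-workload configuration.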

Configurations

Each workload that is configured to Multi-site Workload Lifeline is classified as an active-standby or active-query workload.

  • An active-standby workload is active in one site. Lifeline directs load balancers to route incoming connections to the active site. When database updates are made, database software replication transmits those changes asynchronously from the active instance of the workload to the standby instance of the workload. At the standby site, the standby instance of the workload is active and ready to receive work. The updated data from the active site is applied to the database subsystem running in the standby site in near real time.

  • An active-query workload can be active in one or both sites. Lifeline provides routing recommendations to the load balancers to intelligently balance connections across both sites. When database updates are made by the associated active-standby workload, Lifeline monitors database replication latency to ensure that connections are not routed to a site whose replicated database is too far out of date relative to the database on the active site.
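The difference between the two workload classes can be sketched as a site eligibility check. This is an illustrative sketch only, not Lifeline's implementation: the function name `eligible_sites`, its parameters, and the single latency threshold in seconds are assumptions made for the example.

```python
def eligible_sites(workload_type, active_site, sites,
                   latency_seconds, max_latency_seconds):
    """Return the sites that may receive new connections (hypothetical model).

    latency_seconds maps a site name to its observed replication latency.
    A site with no latency reading is treated as too far out of date.
    """
    if workload_type == "active-standby":
        # All new connections go to the single active site.
        return [active_site]
    if workload_type == "active-query":
        # The active site always has current data; another site qualifies
        # only while its replicated data is within the allowed latency.
        return [
            site for site in sites
            if site == active_site
            or latency_seconds.get(site, float("inf")) <= max_latency_seconds
        ]
    raise ValueError(f"unknown workload type: {workload_type}")


sites = ["site1", "site2"]
standby_result = eligible_sites("active-standby", "site1", sites, {}, 5)
query_fresh = eligible_sites("active-query", "site1", sites, {"site2": 3}, 5)
query_stale = eligible_sites("active-query", "site1", sites, {"site2": 10}, 5)
```

With a 5-second threshold, the query workload is balanced across both sites while the replica lags by 3 seconds, and falls back to the active site alone when the lag reaches 10 seconds.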


Lifeline’s role in GDPS Continuous Availability

Lifeline is one of the products required for the GDPS Continuous Availability (previously called GDPS/Active-Active) solution. This solution aims to significantly reduce the time needed to recover systems in a disaster recovery situation. Lifeline is used to switch workloads between sites in the event of a planned or unplanned outage.

Active-Active refers to a multi-site workload configuration in which z/OS systems are actively running with active subsystems in more than one site at the same time. With data sharing, dynamic workload balancing, Parallel Sysplex, and resource duplexing/replication, applications can move freely from one site to another.

Lifeline plays an integral role in the GDPS Continuous Availability solution and provides the following benefits:

  • Improved performance: New workload connections are routed to the applications, servers, and systems most capable of processing them, so transaction response time is reduced and system resources are used more efficiently.

  • Improved availability: New workload connections can be routed to available applications and systems when some of them are down. Outages for maintenance updates or other planned events can be minimized.

  • Reduced recovery time: The Recovery Time Objective can be reduced from hours to minutes. Traditional DR solutions based on disk replication recover on the standby site by restarting systems or applications. That normally takes hours, and IT services are unavailable for the whole period. With Lifeline working within the GDPS Continuous Availability solution, a workload can be switched to the standby site in minutes.

Resources: