Exploring High Availability options for WebSphere MQ in a virtual environment

By Connor Smith posted Tue May 10, 2016 05:06 AM

  

High availability is an area that companies are constantly exploring as they strive to minimize unplanned downtime and service disruption for their applications, and the question is increasingly being asked in relation to virtual environments. In this post, I will give an insight into how high availability can be realised in your virtual environment, covering the VMware vSphere suite of tools and how these can work in harmony with a business-critical application like IBM WebSphere MQ.

vSphere: A Brief Overview

The VMware vSphere suite of tools provides a uniform virtualization platform to manage your infrastructure, allowing you to get the best performance and availability for your applications. It comprises the following component layers:

  • Infrastructure Services – A set of categorized services and technologies designed to distribute, aggregate and allocate compute, storage and network resources simply and efficiently. These are aptly named vCompute, vStorage and vNetwork.
  • Application Services – Services such as vSphere High Availability (HA) and vSphere Fault Tolerance, designed to ensure scalability, security and availability for applications.
  • vCenter Server – A centralized management application providing a single point of access across your virtual enterprise environment for configuration and performance analysis. It is accessed through the vSphere client.
  • Clients – Clients such as the vSphere client allow you to access your virtual environment, with vSphere web access allowing entry through a web browser.

Image 1.png

VMware ESXi is a further crucial component of the vSphere infrastructure: a type 1 hypervisor at the virtualization layer that runs on physical servers, enabling you to control and distribute resources across multiple virtual machines.

Understanding this suite of tools is essential to using it for a given purpose, which in this case is high availability. Read on to find out how to exploit certain components of vSphere, in particular vSphere HA and Fault Tolerance, to achieve it.

vSphere HA: Applications and considerations

vSphere HA (High Availability) is a component of vSphere present at the Application Services layer, designed to provide uniform and automated protection for all applications, without modifications to the guest operating system or the applications themselves. When configuring vSphere HA, you must still plan for downtime, however small it may be. A virtual machine restart could cause significant disruption to business-critical applications such as IBM WebSphere MQ, and there are system, network and application considerations to take into account. These include questions such as: “Will my Queue Manager be given the same IP address?”, “Will clients need to reconnect?” and “Will Channels or Listeners need to be restarted?”.

Using shared storage, and deployed at the cluster level, vSphere HA can detect and protect against the following three failure scenarios, each with its own configuration and outcome:

ESXi Host Failure:

image 2.png

In this configuration, all ESXi hosts within a cluster periodically exchange a heartbeat at a configurable interval (default of 15 seconds). When a host fails (power outage, server crash etc.), vSphere HA restarts the affected virtual machines on other hosts. For a cluster to be highly available, all hosts must be managed by the same vCenter Server and be able to see the same shared storage, as this is where the virtual machine files reside, ready at the time of failover.
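
The detection and restart logic described above can be sketched in a few lines of Python. This is only an illustration of heartbeat-based failure detection, not how vSphere HA is implemented: `HeartbeatMonitor`, `plan_restarts` and the host and VM names are hypothetical, and the 15-second threshold is simply the figure quoted in this post.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each host (illustrative only)."""
    timeout_seconds: float = 15.0  # assumption: the figure quoted in the post
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, host: str, now: float) -> None:
        self.last_seen[host] = now

    def failed_hosts(self, now: float) -> list:
        # A host counts as failed once its heartbeat is older than the timeout.
        return sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


def plan_restarts(vms_by_host: dict, failed: list) -> dict:
    """Assign each VM from a failed host to a surviving host, round-robin."""
    survivors = sorted(h for h in vms_by_host if h not in failed)
    if not survivors:
        raise RuntimeError("no surviving hosts to restart VMs on")
    plan = {}
    index = 0
    for host in failed:
        for vm in vms_by_host[host]:
            plan[vm] = survivors[index % len(survivors)]
            index += 1
    return plan


monitor = HeartbeatMonitor()
for host in ("esx1", "esx2", "esx3"):
    monitor.record_heartbeat(host, now=0.0)
# esx1 goes silent; the other two keep sending heartbeats.
monitor.record_heartbeat("esx2", now=10.0)
monitor.record_heartbeat("esx3", now=10.0)

failed = monitor.failed_hosts(now=20.0)
plan = plan_restarts({"esx1": ["qm-vm-a", "qm-vm-b"], "esx2": [], "esx3": []}, failed)
print(failed, plan)
```

The restart plan only works because every surviving host can reach the same shared storage, which is why that requirement matters for failover.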

Guest OS Failure:

image 3.png

In the event of guest OS failure, vSphere HA can detect and restart the affected virtual machines. What sets this scenario apart is the position of the heartbeat: it is now between the vCenter Server and VMware Tools (which must be installed on the VM, as its heartbeat mechanism is used). If the operating system hangs, VMware Tools will also hang, and once the heartbeat is lost the virtual machine will be restarted.

Application Failure:

Image 4.png

This is similar to the previous scenario, except that the heartbeat is now sent between the vCenter Server and an application. If the application hangs, the heartbeat is lost and vSphere HA proceeds to restart the virtual machine. This HA scenario also requires VMware Tools to be installed on the virtual machine, as VMware Tools application monitoring is used to configure the heartbeat between the vCenter Server and the application.

How Does This Relate to WebSphere MQ?

All three of the above failure scenarios end in a virtual machine restart, whether on a different ESXi host or on the same one. This will almost certainly result in clients having to reconnect, possible manual intervention to restart channels or listeners and, if the Queue Manager was in the middle of a transaction, the loss of uncommitted messages. Be aware of these considerations and the impact they may have.

So, what if there was an option with no need for MQ client reconnects or Queue Manager restarts? Read on to explore a further high availability option: vSphere Fault Tolerance.
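
To make the client-side impact concrete, here is a minimal sketch of the kind of reconnect loop an application might need after a restart. It is generic Python, not MQ client code: `connect_with_retry` and the `connect` callable passed to it are hypothetical stand-ins for whatever call your client library uses to open a connection to the Queue Manager.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `connect` until it succeeds, doubling the wait after each failure.

    `connect` is a hypothetical zero-argument callable that returns a
    connection or raises ConnectionError while the Queue Manager is down.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            sleep(delay)
            delay *= 2
```

A caller would wrap its own connection routine, for example `connect_with_retry(open_queue_manager_connection)`, where `open_queue_manager_connection` is again an assumed name. The backoff avoids hammering a virtual machine that is still booting.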

vSphere Fault Tolerance: Overview and Considerations

image 5.png

vSphere Fault Tolerance is a component of vSphere, working within a vSphere HA cluster, that provides continuous availability for applications in the event of server failures. Fault Tolerance is different from vSphere High Availability as it creates a live, secondary ‘standby’ copy of a virtual machine on a separate ESXi host, with a ‘heartbeat’ sent between a primary and a secondary. Instructions and events executed on the primary VM are recorded, and then replayed on the secondary VM through a process called vLockstep.

vLockstep technology guarantees that the primary and secondary VMs execute the same instructions in an identical sequence, with the secondary VM receiving events across a VMware Fault Tolerance logging network. Outputs of the primary VM (disk writes, network packets etc.) take effect, while the secondary VM’s output is suppressed by the ESXi host. This ensures that the secondary VM’s state is identical to the primary’s, and that it can take over instantaneously if that all-important ‘heartbeat’ is lost. A third VM is then created to compensate for the one that was lost, and is synchronised as the new secondary to re-establish fault tolerance. Additionally, as both machines are identical, so are their network configurations, eliminating the need for client reconnections in failure scenarios.
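
The record-and-replay idea can be illustrated with a toy deterministic state machine. This sketch is an assumption-laden simplification, not VMware’s implementation: `ReplicaVM` and `run_in_lockstep` are hypothetical names, integer events stand in for recorded instructions, and a plain list stands in for the Fault Tolerance logging network.

```python
class ReplicaVM:
    """A toy deterministic state machine standing in for a virtual machine."""

    def __init__(self, emit_output: bool):
        self.state = 0
        self.emit_output = emit_output  # False means output is suppressed
        self.outputs = []

    def execute(self, event: int) -> None:
        # Deterministic transition: same events in the same order, same state.
        self.state = self.state * 31 + event
        if self.emit_output:
            self.outputs.append(self.state)


def run_in_lockstep(events):
    primary = ReplicaVM(emit_output=True)
    secondary = ReplicaVM(emit_output=False)
    log = []  # stands in for the logging network between the two hosts
    for event in events:
        primary.execute(event)           # executed and recorded on the primary
        log.append(event)
        secondary.execute(log[-1])       # replayed on the secondary
    return primary, secondary


primary, secondary = run_in_lockstep([3, 5, 7])
print(primary.state == secondary.state, secondary.outputs)

# Failover: the secondary is promoted and its output is no longer suppressed.
secondary.emit_output = True
secondary.execute(11)
```

Because the secondary has replayed every event, it holds the same state as the primary at the moment of promotion, which is what allows the takeover without a restart.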

Applications with MQ

vSphere Fault Tolerance eliminates the need for client reconnections and Queue Manager restarts, as the failover is almost instantaneous. However, take note that vSphere Fault Tolerance is designed for physical server failure only; it cannot protect you against OS or application failure!

Considerations
  • Network Bandwidth – A NIC of at least 1 GbE is recommended between Fault Tolerant VMs, along with ‘sub-millisecond’ latency; the logging traffic can put significant strain on the network!
  • Storage – By implementing Fault Tolerance, with a secondary, standby VM producing identical outputs, your storage needs will double. Worth taking note!
  • Fault Tolerance doesn’t provide protection against application failures – Because Fault Tolerance creates an ‘identical’ secondary VM, any software, application or OS failure will occur on both machines. It is important to understand what Fault Tolerance is designed for, which is protection against physical server failure.
  • vCPU requirements – Until vSphere version 6, Fault Tolerance only supported virtual machines with a single vCPU. This changed with version 6, and up to 4 vCPUs are now supported. Check your version!
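
As a rough planning aid, the vCPU and storage points above can be turned into a quick pre-check. This is a sketch based only on the figures in this post (one vCPU before version 6, up to four from version 6, storage doubling); `ft_precheck` is a hypothetical helper, not a VMware tool, so check the limits for your own release.

```python
def ft_precheck(vcpus: int, disk_gb: float, vsphere_major_version: int) -> dict:
    """Summarise whether a VM fits the Fault Tolerance limits quoted above."""
    # Assumption from the post: 1 vCPU before version 6, up to 4 from version 6.
    max_vcpus = 4 if vsphere_major_version >= 6 else 1
    return {
        "max_vcpus": max_vcpus,
        "vcpus_supported": vcpus <= max_vcpus,
        # The secondary VM needs its own copy, so storage roughly doubles.
        "total_storage_gb": disk_gb * 2,
    }


print(ft_precheck(vcpus=2, disk_gb=100, vsphere_major_version=5))
print(ft_precheck(vcpus=2, disk_gb=100, vsphere_major_version=6))
```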

 

What next?

So, with vSphere High Availability providing uniform and automated protection against host, application and operating system failure (with some downtime), and vSphere Fault Tolerance providing continuous availability against server failures (with network and storage considerations to take into account), what’s next? Can these components be leveraged to protect business-critical applications with minimal to zero downtime? In the next two sections, I will focus on a third-party monitoring agent, Symantec ApplicationHA for WebSphere MQ.

Symantec ApplicationHA: An Overview

VMware have developed an API, known as the ‘third-party monitoring API’, that gives third parties the ability to create agents that monitor the health of an application within a virtual machine and take the necessary steps to restart the application or, if all else fails, inform vSphere HA to restart the VM. Symantec are a prime example of a third party that have done just that with Symantec ApplicationHA: an application monitoring agent integrated with vCenter Server (via a plugin). It is worth noting that other application monitoring agents are available; I have focused on Symantec ApplicationHA because of how closely it works with WebSphere MQ.

The Application Monitoring Agent is installed within a VM, and monitors the health of an application and associated services via a ‘heartbeat’ sent between the application and the agent. A secondary heartbeat is then sent via a ‘heartbeat service group’ between the Symantec agent and vSphere HA. A heartbeat service group is a tool incorporated within Symantec ApplicationHA that intercepts the heartbeat mechanism already in place between vSphere HA and the virtual machine, so that a disrupted application heartbeat does not trigger a restart of the entire VM. This allows Symantec ApplicationHA to restart the affected application with minimal disturbance. If Symantec ApplicationHA fails to restart the application after a configurable number of attempts, the heartbeat within the heartbeat service group is disrupted in turn, allowing vSphere HA to initiate a VM restart.
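
The escalation behaviour described above (restart the application first, fall back to a VM restart only after repeated failures) can be sketched as follows. This is a generic illustration, not Symantec’s agent or API: `monitor_application`, `is_healthy` and `restart_app` are hypothetical names, and the returned string stands in for the agent’s decision to keep or stop its heartbeat to vSphere HA.

```python
def monitor_application(is_healthy, restart_app, max_restart_attempts=3):
    """Return 'healthy', 'recovered' or 'escalate' for one monitoring cycle.

    `is_healthy` and `restart_app` are hypothetical callables supplied by
    the caller; 'escalate' models stopping the heartbeat to vSphere HA so
    that it restarts the whole virtual machine.
    """
    if is_healthy():
        return "healthy"
    for _ in range(max_restart_attempts):
        restart_app()
        if is_healthy():
            return "recovered"  # application is back, VM left untouched
    return "escalate"


# Example: the application comes back after the second restart attempt.
health_checks = iter([False, False, True])
restarts = []
result = monitor_application(lambda: next(health_checks), lambda: restarts.append("restart"))
print(result, len(restarts))
```

Keeping the attempt limit configurable mirrors the text: a low limit hands control to vSphere HA quickly, while a higher one favours in-place recovery of the application.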

Symantec ApplicationHA for WebSphere MQ: High Availability for a Business-Critical Application

The Symantec ApplicationHA monitoring agent for WebSphere MQ provides high availability for all MQ Queue Managers within a virtual machine, providing the ability to start, restart and monitor Queue Managers, detect failures, and start and stop Queue Manager listeners.

Once installed, Symantec ApplicationHA for WebSphere MQ provides an easy-to-configure plugin, present in the vSphere client, which allows you to select the WebSphere MQ instances you would like to monitor, and whether to monitor Queue Managers, listeners or both.

Bear in mind that Symantec ApplicationHA will restart your Queue Manager, and potentially the virtual machine it is running on, in the event of failure. As with vSphere High Availability, this means clients will need to reconnect.

Tying it all Together

When implementing WebSphere MQ within a virtual environment, VMware vSphere offers a powerful set of tools: uniform, automated protection with vSphere High Availability (effective against host, application and operating system failure), and continuous availability with vSphere Fault Tolerance, aimed at providing zero downtime in the event of server failure. Third-party monitoring agents, such as Symantec ApplicationHA for WebSphere MQ, work in conjunction with vCenter Server to provide fine-grained protection for Queue Managers and listeners, without causing disruption to the virtual machine they are running on.

When exploring the high availability options at your disposal, important considerations must be taken into account. Implementing traditional WebSphere MQ high availability options, such as Multi-Instance Queue Managers, within your virtualized environment will give you the failover speed you may be lacking with the options described in this post, but will still require a client reconnection.
