WebSphere Application Server & Liberty

Is your production environment resilient? Part 6 - Exploring resiliency levels (1)

By Samir Nasser posted Sun March 24, 2019 11:56 AM

  

In the last three posts (Part 3, Part 4, and Part 5), I described various ways to discover a solution's resources: architecture diagrams, Java thread dumps (for Java environments), execution traces, various types of configuration, and certain monitoring tools.

In this post, I will list the resiliency levels that can be relevant to a given software solution and then go into the details of each one. I will start on this topic in this post and continue the discussion in the next post. For software to be resilient, especially distributed software, the following resiliency levels (or types) are relevant:

  1. Resource high-availability: The basic idea here is that if a resource required by a solution becomes unavailable, a copy of the resource remains available to continue processing requests.
  2. Resource disaster recovery: The basic idea here is that if one or more resources of a software solution become unavailable, the decision is made to fail the whole solution over from one data center to an alternate data center to continue processing requests.
  3. Resource resiliency: The basic idea here is that the parameters of each resource of a software solution should be configured in a way that increases the resiliency of the overall solution. This matters because resource high-availability alone is not enough to make a solution resilient. I have been involved in the resiliency tuning of many solutions that suffered from resiliency issues even though they employed resource high-availability.
  4. Transaction high-availability: The basic idea here is that if a resource processing a transaction fails, another copy of this resource can continue processing this transaction instead of failing the transaction.
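To make the difference between the first and fourth levels concrete, here is a minimal Java sketch of transaction high-availability. It assumes that a transaction is a list of steps and that progress is recorded in a log every replica can read (the hypothetical `SharedTxLog`); the class names `Replica` and `TransactionFailoverDemo` are illustrative only and are not a WebSphere API. When the first replica fails partway through, the second one resumes from the last recorded step instead of failing the transaction. With resource high-availability alone, the second replica would be available for new requests, but the in-flight transaction would still fail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of transaction high-availability (not a WebSphere API).
final class TransactionFailoverDemo {
    /** Offers the transaction to each replica in turn; each resumes from the shared log. */
    static List<String> run(List<String> steps, List<Replica> replicas) {
        SharedTxLog log = new SharedTxLog();
        List<String> trace = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.process(steps, log, trace)) {
                return trace;                     // transaction completed
            }
            // this replica failed mid-transaction; the next one picks it up
        }
        throw new IllegalStateException("all replicas failed");
    }

    public static void main(String[] args) {
        List<String> steps = List.of("debit", "credit", "commit");
        // jvm1 fails after one step; jvm2 never fails
        List<Replica> replicas = List.of(new Replica("jvm1", 1), new Replica("jvm2", -1));
        System.out.println(run(steps, replicas));
    }
}

// Assumed stand-in for a transaction log that every replica can read.
final class SharedTxLog {
    private int completedSteps = 0;               // steps completed so far
    int completedSteps() { return completedSteps; }
    void recordStep() { completedSteps++; }
}

final class Replica {
    private final String name;
    private final int failAfterSteps;             // simulated crash point; -1 means never fail

    Replica(String name, int failAfterSteps) {
        this.name = name;
        this.failAfterSteps = failAfterSteps;
    }

    /** Runs the remaining steps; returns false if this replica "crashes" first. */
    boolean process(List<String> steps, SharedTxLog log, List<String> trace) {
        int doneHere = 0;
        for (int i = log.completedSteps(); i < steps.size(); i++) {
            if (doneHere == failAfterSteps) {
                return false;                     // simulated failure mid-transaction
            }
            trace.add(name + ":" + steps.get(i));
            log.recordStep();                     // progress is visible to other replicas
            doneHere++;
        }
        return true;
    }
}
```

Running `main` prints `[jvm1:debit, jvm2:credit, jvm2:commit]`: jvm1 completes `debit` and fails, and jvm2 finishes the same transaction. A real implementation would also need the log to be durable and the steps to be safe to resume, which this sketch ignores.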

A. Resource High-Availability

Every software resource in a software solution should be made highly available if the software solution needs to be highly available. This means that if a resource in a solution fails, a replica of that resource is automatically available to pick up the load. A resource may be made highly available in an active/active or an active/passive configuration. In the active/active configuration, two copies of the same resource run and process requests independently of each other. In the active/passive configuration, two copies of the same resource are also set up, but only the active copy processes these requests, while the passive copy may or may not be processing other types of requests. As soon as the active copy fails, the passive copy becomes active.
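The two configurations differ mainly in how requests are routed. The minimal Java sketch below assumes that each copy is represented by a hypothetical `Node` whose `up` flag is maintained by some health check. `ActiveActive` spreads requests round-robin over all healthy copies, while `ActivePassive` sends everything to the active copy and promotes the passive copy only when the active one is down. These names are illustrative and are not part of any WebSphere API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of active/active vs. active/passive routing.
final class HaDemo {
    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        ActiveActive pair = new ActiveActive(List.of(a, b));
        System.out.println(pair.route().name + pair.route().name); // both copies serve requests

        Node p1 = new Node("P1");
        Node p2 = new Node("P2");
        ActivePassive ap = new ActivePassive(p1, p2);
        p1.up = false;                                  // the active copy fails
        System.out.println(ap.route().name);            // the passive copy is promoted
    }
}

// One copy of a resource; `up` is assumed to be kept current by a health check.
final class Node {
    final String name;
    volatile boolean up = true;
    Node(String name) { this.name = name; }
}

// Active/active: every healthy copy takes requests, round-robin, skipping copies that are down.
final class ActiveActive {
    private final List<Node> nodes;
    private final AtomicInteger next = new AtomicInteger();

    ActiveActive(List<Node> nodes) { this.nodes = nodes; }

    Node route() {
        for (int tries = 0; tries < nodes.size(); tries++) {
            Node n = nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
            if (n.up) {
                return n;
            }
        }
        throw new IllegalStateException("no healthy copy");
    }
}

// Active/passive: only the active copy takes requests; roles swap when it is down.
final class ActivePassive {
    private Node active;
    private Node passive;

    ActivePassive(Node active, Node passive) {
        this.active = active;
        this.passive = passive;
    }

    synchronized Node route() {
        if (!active.up && passive.up) {                 // failover: swap roles
            Node failed = active;
            active = passive;
            passive = failed;
        }
        if (!active.up) {
            throw new IllegalStateException("no healthy copy");
        }
        return active;
    }
}
```

In this sketch, `main` prints `AB` and then `P2`: both active/active copies share the load, and after `P1` is marked down the passive copy takes over. The sketch does not fail back when `P1` recovers; whether to fail back is a design choice that depends on the resource.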

We mentioned in an earlier post that resources may be of different granularity levels. Some coarse-grained resources contain other resources. For example, a WebSphere Application Server JVM is a resource that contains many other resources, such as a Java heap, thread pools, connection pools, and the JIT compiler code cache. To make all these fine-grained resources highly available, two or more WebSphere Application Server JVMs are set up to run and process requests at the same time. This way, all resources contained within one JVM are also made highly available, because they exist within every JVM in this highly available JVM configuration, or cluster. Although most fine-grained resources run and process requests in every JVM at the same time, some may run in active/standby mode. For example, the embedded JMS messaging engine, when configured to run in a WebSphere Application Server cluster, runs in active/standby mode, also known as 1 of N mode. So, if the cluster contains 10 JVMs, the messaging engine runs and processes requests in only one of them. If the JVM where the messaging engine is running fails, the high-availability manager, a resource running in each JVM, decides in which other JVM of the cluster to activate the messaging engine.
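The 1 of N behavior can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the high-availability manager's actual API or algorithm: the hypothetical `OneOfNPolicy` class keeps the live cluster members in preference order, runs a singleton (standing in for the messaging engine) on exactly one of them, and moves it to the next live member only when the member hosting it fails.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of 1 of N placement (not the WebSphere HA manager API).
final class OneOfNDemo {
    public static void main(String[] args) {
        List<String> cluster = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            cluster.add("jvm" + i);                       // a 10-JVM cluster
        }
        OneOfNPolicy engine = new OneOfNPolicy(cluster);
        System.out.println(engine.activeMember());       // the engine starts on jvm1
        engine.memberFailed("jvm1");
        System.out.println(engine.activeMember());       // re-activated on jvm2
        engine.memberFailed("jvm5");
        System.out.println(engine.activeMember());       // a standby failure changes nothing
    }
}

final class OneOfNPolicy {
    private final Set<String> liveMembers = new LinkedHashSet<>(); // preference order
    private String activeMember;                                   // where the singleton runs

    OneOfNPolicy(List<String> members) {
        liveMembers.addAll(members);
        activeMember = members.isEmpty() ? null : members.get(0);
    }

    String activeMember() { return activeMember; }

    /** Called when a member is detected as failed; re-homes the singleton if it ran there. */
    void memberFailed(String member) {
        liveMembers.remove(member);
        if (member.equals(activeMember)) {
            activeMember = liveMembers.isEmpty() ? null : liveMembers.iterator().next();
        }
    }

    /** A restarted member rejoins as a standby; the singleton is not moved back to it. */
    void memberJoined(String member) {
        liveMembers.add(member);
        if (activeMember == null) {
            activeMember = member;
        }
    }
}
```

With the 10-JVM cluster from the example above, the engine starts on `jvm1`, moves to `jvm2` when `jvm1` fails, and stays on `jvm2` when a standby member such as `jvm5` fails. The sketch does not move the engine back when `jvm1` rejoins; real policies may offer options such as preferred servers or failback, which are not modeled here.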
