Minimizing Application Downtime During Deployment Using Application Editions in WebSphere Network Deployment

By Ming Yu posted Wed August 31, 2022 03:39 PM

  

Overview:


The Application Edition feature of Intelligent Management minimizes downtime for managed applications. Additional capabilities within the feature allow an application edition to be validated and rolled back, giving full control over the application deployment life cycle.

Prior to the Application Edition feature, an update to an application would involve:

  1. Stopping application servers
  2. Uninstalling the old version of the application
  3. Installing the new version of the application
  4. Replicating application changes to all nodes
  5. Starting application servers

During this process, the application is unavailable from step 1 through step 5.

If the application has to be reverted to the previous version, the same steps must be repeated, resulting in another long outage.

With the Application Edition feature, these steps are simplified to the following:

  1. Install new edition of the application
  2. Replicate application changes to all nodes
  3. Roll out new edition of the application

The application may be partially unavailable during step 3 on the servers that are being upgraded. That time can be significantly reduced by setting the drainage interval to exceed the application session timeout interval, serializing the sessions, and enabling session replication.

In addition, reverting to the old version of the application is much more efficient: simply roll back to the old edition.

The Application Edition feature of Intelligent Management provides:

  • Interruption-free rollouts of application updates
  • Ability to roll back to a previous application edition
  • Validation mode to verify function using a subset of users
  • Concurrent activation to have two editions available simultaneously

The scope of rollout can be a dynamic cluster, static cluster, or a single server.

An edition rollout operation includes the following steps:

  • Fencing the server(s) from receiving additional requests
  • Quiescing requests for the application running on the server(s)
  • Stopping the currently active edition of the application on the server(s)
  • Starting the new edition on the server(s)
  • Resuming the flow of requests to the server(s)

 Edition rollout can be:

  • Atomic – two editions are not active at the same time
  • Group – two editions may be active at the same time

For either rollout strategy (atomic or group), the user can choose the restart granularity:

  • Soft reset (default) – restart the application
  • Hard reset – restart the server
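The strategy and reset choices above map naturally onto a scripted rollout. The following is a minimal sketch, not a definitive implementation: it only builds the argument string that could be passed to the `AdminTask.rolloutEdition` command in a wsadmin Jython session. The helper name `build_rollout_args` is hypothetical, and the parameter keys (`rolloutStrategy`, `resetStrategy`, `groupSize`, `drainageInterval`) and their values are assumptions based on the IBM documentation for the edition management administrative tasks, so verify them against your product version before use.

```python
def build_rollout_args(app_name, edition, strategy="grouped", reset="soft",
                       group_size=1, drainage_interval=30):
    """Build an argument string for a rolloutEdition-style wsadmin task.

    The parameter keys used here are assumptions to verify against the
    IBM documentation; this function only formats a string.
    """
    if strategy not in ("grouped", "atomic"):
        raise ValueError("strategy must be 'grouped' or 'atomic'")
    if reset not in ("soft", "hard"):
        raise ValueError("reset must be 'soft' or 'hard'")
    if group_size < 1 or drainage_interval < 0:
        raise ValueError("group_size must be >= 1 and drainage_interval >= 0")

    params = "{rolloutStrategy %s}{resetStrategy %s}" % (strategy, reset)
    if strategy == "grouped":
        # Group size only applies when servers are upgraded group by group.
        params += "{groupSize %d}" % group_size
    params += "{drainageInterval %d}" % drainage_interval
    return '[-appName %s -edition %s -params "%s"]' % (app_name, edition, params)


# "MyApp" and "2.0" are placeholder names for illustration only.
print(build_rollout_args("MyApp", "2.0"))
# Inside wsadmin (-lang jython) the string might be used as:
#   AdminTask.rolloutEdition(build_rollout_args("MyApp", "2.0"))
```

The sketch uses `%` formatting rather than f-strings so that it stays compatible with the older Jython levels that wsadmin ships with.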

 

Before rolling out the new edition of the application, Edition Management offers the ability to validate the edition. Validation lets users install and activate the new edition in a cloned cluster and optionally route requests to that cluster, to ensure the new edition is stable before it replaces the current edition in the production environment. Once the edition is validated, the rollout action moves it from validation mode to production mode.

 

Validating an edition:

Validating an edition is the process of determining if a new edition is ready to move into production and replace the current edition.

Selecting the Validate action:

  • Creates a clone of the cluster (note: the cloned cluster has no more than two application server instances)
  • Deploys the new edition to the cloned cluster and activates it
  • Users can then create a routing policy to control edition visibility. This is a manual step that lets users decide which requests go to the new edition, and so restrict access to it.
  • The Rollout action can be used to move an edition from validation mode to production mode. Choosing this action causes the following:
    • The edition is deactivated on the clone environment
    • The edition is rolled out on the original deployment targets, using either the Group or the Atomic rollout strategy
    • The clone cluster is removed

Users who decide not to roll out the new edition can choose the Cancel Validation action to remove it. Selecting this action deactivates the edition on the clone cluster and removes the clone cluster.

Additional information regarding validating an edition can be found at:

https://www.ibm.com/docs/was-nd/9.0.5?topic=management-validating-edition

 

When rolling out a new application edition, the recommended best practice is to use the “group rollout with soft reset” path.

A group rollout with soft reset results in the least application downtime during an upgrade, because servers other than the ones being upgraded remain available to service requests.

With a group rollout strategy, the user can choose the group size (the number of servers to upgrade concurrently). Use a larger group size during low-load periods for a quicker rollout, and a smaller group size during heavier load for a slower rollout that minimizes the potential for performance issues. Once the new edition has been rolled out successfully to a group of servers, it is activated and can begin serving incoming requests while the next group of servers is being upgraded. A soft reset only stops the application during the configuration update, so it is less disruptive than a hard reset, which restarts the server and reloads the native code. Because a server restart can take a significant amount of time, depending on the environment, a soft reset results in the least downtime for the application.
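To make the group-size trade-off concrete, here is a small sketch that splits a cluster into upgrade groups and reports how much capacity stays online while one group is being upgraded. The function names and server names are hypothetical, and the consecutive grouping is an assumption for illustration; Intelligent Management decides the actual grouping and ordering.

```python
def plan_group_rollout(servers, group_size):
    """Split servers into consecutive upgrade groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [servers[i:i + group_size] for i in range(0, len(servers), group_size)]


def min_capacity_fraction(server_count, group_size):
    """Fraction of servers still serving requests while one full group is offline."""
    if server_count < 1 or group_size < 1:
        raise ValueError("server_count and group_size must be at least 1")
    offline = min(group_size, server_count)
    return (server_count - offline) / float(server_count)


# Placeholder server names for illustration only.
cluster = ["server1", "server2", "server3", "server4", "server5"]
for size in (1, 2):
    groups = plan_group_rollout(cluster, size)
    capacity = min_capacity_fraction(len(cluster), size)
    print("group size %d: %d groups, at least %d%% capacity online"
          % (size, len(groups), round(capacity * 100)))
```

A larger group size means fewer rounds but less capacity online during each round, which is why larger groups suit low-load periods.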

Additional information on performing a rollout on an application edition can be found at the following location in the IBM Documentation:

https://www.ibm.com/docs/was-nd/9.0.5?topic=management-performing-rollout-edition 

One common cause of failure when rolling out a new edition of an application is a timeout. This occurs more often when hard reset is selected for the rollout, but users can mitigate it by understanding their environment and setting the various timeouts based on that knowledge rather than relying on the default values.

Keep the following in mind:

  • The default timeout for the entire rollout to complete is 16 minutes
  • Tune the SOAP connector request timeout to be greater than the time required to complete the rollout
  • If using the admin console, set the session expiration to be greater than the time required to complete the rollout
  • Drainage interval (default 30 seconds) – Before stopping an application (soft reset) or the application server (hard reset) during a rollout of a new edition, Intelligent Management (IM) quiesces the application server for a maximum time equal to the drainage interval. Only requests with HTTP server affinity are routed to the application server during this drainage/quiesce period. IM checks every 15 seconds (tunable) to determine whether all sessions have expired. If all sessions expire before the configured drainage interval elapses, the application is stopped at that point rather than at the end of the interval. To determine an appropriate drainage interval, calculate the time it takes for a large percentage (e.g. 90+%) of sessions to expire and use that value.
  • Some capacity will be offline at some point in the rollout process, so plan the rollout to avoid peak periods or heavy loads
  • No application placement will occur during the rollout process
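The timeout guidance above can be turned into a quick sizing check. The sketch below is one possible interpretation, under stated assumptions: it picks the drainage interval as a nearest-rank percentile of observed session lifetimes, and it estimates a worst-case rollout time by assuming each group waits the full drainage interval and then takes a fixed restart time. The function names, the sample session lifetimes, and the restart time are all hypothetical; only the 16-minute default comes from this article.

```python
DEFAULT_ROLLOUT_TIMEOUT_SECONDS = 16 * 60  # default rollout timeout stated above


def drainage_interval_seconds(session_lifetimes, percentile=90):
    """Nearest-rank percentile (integer percent) of session lifetimes in seconds."""
    if not session_lifetimes:
        raise ValueError("need at least one observed session lifetime")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in 1..100")
    ordered = sorted(session_lifetimes)
    # Integer ceiling of percentile/100 * n avoids floating-point rounding.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def estimated_rollout_seconds(server_count, group_size, drainage_interval,
                              restart_seconds):
    """Worst case: every group drains for the full interval, then restarts."""
    groups = (server_count + group_size - 1) // group_size
    return groups * (drainage_interval + restart_seconds)


# Made-up sample data: observed session lifetimes in seconds.
lifetimes = [30, 45, 60, 90, 120, 150, 200, 240, 300, 900]
drain = drainage_interval_seconds(lifetimes)
estimate = estimated_rollout_seconds(6, 2, drain, restart_seconds=120)
print("drainage interval: %d s, estimated rollout: %d s" % (drain, estimate))
if estimate > DEFAULT_ROLLOUT_TIMEOUT_SECONDS:
    print("estimate exceeds the 16-minute default; raise the timeouts accordingly")
```

If the estimate exceeds the default, the rollout timeout, the SOAP connector request timeout, and the admin console session expiration would all need to be sized above it.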

For more information about deploying and managing application editions with Intelligent Management, visit https://www.ibm.com/docs/was-nd/9.0.5?topic=applications-deploying-managing-application-editions-intelligent-management

 

Troubleshooting:

If you encounter an error during the edition rollout process that you cannot resolve from the error messages returned, enable the following must-gather trace specification for application editions on the deployment manager before contacting IBM Support:

com.ibm.ws.xd.appeditionmgr.*=all:com.ibm.ws.management.application.*=all:com.ibm.websphere.management.application.*=all

http://www-01.ibm.com/support/docview.wss?uid=swg21668088

After recreating the failure with the above traces enabled, provide the trace*.log, SystemOut.log, and ffdc log files to IBM Support.
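The trace specification above is a colon-separated list of `component=level` entries, and a copy-and-paste error in it is easy to miss. As a small sanity check before pasting the string into the deployment manager configuration, a sketch like the following splits the string and verifies each entry. The helper name `parse_trace_spec` is hypothetical, and the check covers only the simple `component=level` form used here, not the full trace specification grammar.

```python
def parse_trace_spec(spec):
    """Split 'component=level:component=level' into (component, level) pairs."""
    pairs = []
    for entry in spec.split(":"):
        component, sep, level = entry.partition("=")
        if not sep or not component or not level:
            raise ValueError("malformed trace entry: %r" % entry)
        pairs.append((component, level))
    return pairs


# Trace specification copied from the troubleshooting steps above.
SPEC = ("com.ibm.ws.xd.appeditionmgr.*=all:"
        "com.ibm.ws.management.application.*=all:"
        "com.ibm.websphere.management.application.*=all")

for component, level in parse_trace_spec(SPEC):
    print("%s -> %s" % (component, level))
```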
