

IBM Spectrum Protect High Availability with DB2 HADR and ProtecTIER native IP Replication

By Archive User posted Wed November 18, 2015 05:19 AM

  

Originally posted by: JOWALTER


by Jörg Walter, Andre Gaschler and Erik Franz

 

Preface

IBM Spectrum Protect (formerly Tivoli Storage Manager, TSM) is designed to protect business-critical data and applications that require continuous availability and disaster protection. Highly available Spectrum Protect infrastructures, combined with ProtecTIER’s data deduplication and replication features, minimize recovery time and recovery point objectives (RTO/RPO) while maximizing space efficiency.

IBM’s ProtecTIER solution offers data deduplication and replication features that allow backup data to be replicated efficiently to offsite locations without moving physical tapes. ProtecTIER also offers the following benefits:

 

  • Deduplication performance of up to 2,500 MB/s for backup, and even higher performance for restore operations
  • Single-system capacity of up to 1 PB physical repository size
  • LAN-free backup and restore capabilities
  • Near-sync replication for improved RPO
  • Support for multi-site replication requirements

 

In Spectrum Protect and ProtecTIER backup environments, ProtecTIER deduplicates and replicates the backup data, while Spectrum Protect manages and replicates the backup catalog (metadata), which is stored in an integrated IBM DB2 database.
By combining the IP-based replication features of ProtecTIER and Spectrum Protect, you can design a flexible data protection environment with multi-site redundancy.

This article describes the setup of a multi-site redundant backup environment using Spectrum Protect together with ProtecTIER. It is based on experience we gained during a customer implementation and various tests in the ESCC Mainz Storage Systems Lab.

We give a short introduction to the DB2 HADR feature and to the ProtecTIER solution, and then explain how a multi-site redundant backup environment based on Spectrum Protect and ProtecTIER is designed.

 

What is DB2 HADR?

High Availability Disaster Recovery (HADR) is a data replication feature that provides a high availability solution for DB2 databases.
HADR protects against data loss by replicating changes from a source database (Primary) to a target database (Standby).


The following list explains why you should consider using DB2 HADR in your Spectrum Protect environment:

 

  • HADR is a standard feature of DB2, which is included with TSM beginning with version 6.x, so it is ready to use.
  • As long as HADR is used only for the DB2 database bundled with TSM, no additional licenses are required.
  • HADR communication is managed by the database over standard TCP/IP networks, so there are no special requirements regarding disk subsystems or other hardware or software.
  • HADR is easy to set up and manage. Only a few commands are required to configure HADR on an existing TSM instance.
  • HADR lets you implement cluster functions at the application layer, with no need for operating system cluster support.
  • HADR supports both HA and DR scenarios:
    • HADR “sync” peers provide warm standby for HA
    • HADR “async” peers provide warm standby for DR
    • Both variants can be combined in one environment
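To illustrate the “few commands” point above, here is a minimal Python sketch that only builds the DB2 CLP commands setting the basic HADR configuration parameters on each side of a primary/standby pair; it does not execute anything. It assumes the Spectrum Protect database is named TSMDB1 (as in the comments below); the host names, port and instance name are hypothetical. HADR_LOCAL_HOST, HADR_LOCAL_SVC, HADR_REMOTE_HOST, HADR_REMOTE_SVC, HADR_REMOTE_INST and HADR_SYNCMODE are standard DB2 configuration parameters, but a complete setup needs further steps (for example, initializing the standby database from a backup of the primary) that are not shown here.

```python
from dataclasses import dataclass

# Sketch only: builds command strings, does not run them.
DB_NAME = "TSMDB1"  # Spectrum Protect database name
SYNC_MODES = {"SYNC", "NEARSYNC", "ASYNC", "SUPERASYNC"}


@dataclass(frozen=True)
class HadrNode:
    host: str      # hypothetical host name
    port: int      # hypothetical HADR service port
    instance: str  # hypothetical DB2 instance owner


def hadr_cfg_commands(local: HadrNode, remote: HadrNode,
                      syncmode: str = "NEARSYNC") -> list[str]:
    """Build the DB2 CLP commands that make `local` an HADR partner of `remote`."""
    if syncmode not in SYNC_MODES:
        raise ValueError(f"unknown HADR_SYNCMODE: {syncmode}")
    settings = {
        "HADR_LOCAL_HOST": local.host,
        "HADR_LOCAL_SVC": local.port,
        "HADR_REMOTE_HOST": remote.host,
        "HADR_REMOTE_SVC": remote.port,
        "HADR_REMOTE_INST": remote.instance,
        "HADR_SYNCMODE": syncmode,
    }
    return [f"db2 update db cfg for {DB_NAME} using {key} {value}"
            for key, value in settings.items()]


if __name__ == "__main__":
    dc1 = HadrNode("tsm-dc1", 60100, "tsminst1")  # hypothetical primary
    dc2 = HadrNode("tsm-dc2", 60100, "tsminst1")  # hypothetical standby
    # HADR is started on the standby first, then on the primary.
    for cmd in hadr_cfg_commands(dc2, dc1) + [f"db2 start hadr on db {DB_NAME} as standby"]:
        print(cmd)
    for cmd in hadr_cfg_commands(dc1, dc2) + [f"db2 start hadr on db {DB_NAME} as primary"]:
        print(cmd)
```

Each side gets the same six parameters with “local” and “remote” swapped, which is why a single function covers both servers.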

 

Starting with DB2 v10.1, up to three HADR standby databases can be set up for a primary database. This feature is available with Spectrum Protect (TSM) v7.1, which contains DB2 v10.5.
One standby must be designated as the “principal standby”, while additional standbys can be added as “auxiliary standbys”.
All HADR synchronization modes are supported on the principal standby, but auxiliary standbys always run in SUPERASYNC mode.
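The principal/auxiliary distinction can be made concrete with a small Python sketch. It builds the HADR_TARGET_LIST value for the primary (DB2 treats the first entry in the list as the principal standby) and derives the synchronization mode each standby actually runs in, assuming the principal standby uses the mode set in HADR_SYNCMODE on the primary. The host names and ports are hypothetical, and the function only returns command text.

```python
MAX_STANDBYS = 3  # DB2 10.1 and later: up to three standbys per primary
SYNC_MODES = {"SYNC", "NEARSYNC", "ASYNC", "SUPERASYNC"}


def target_list_command(db: str, standbys: list[tuple[str, int]]) -> str:
    """Build the HADR_TARGET_LIST update; the first entry becomes the principal standby."""
    if not 1 <= len(standbys) <= MAX_STANDBYS:
        raise ValueError(f"expected 1 to {MAX_STANDBYS} standbys, got {len(standbys)}")
    value = "|".join(f"{host}:{port}" for host, port in standbys)
    # Quote the whole statement so that the shell does not treat "|" as a pipe.
    return f'db2 "update db cfg for {db} using HADR_TARGET_LIST {value}"'


def effective_sync_modes(standbys: list[tuple[str, int]], hadr_syncmode: str) -> dict[str, str]:
    """Map each standby host to the synchronization mode it actually runs in."""
    if hadr_syncmode not in SYNC_MODES:
        raise ValueError(f"unknown HADR_SYNCMODE: {hadr_syncmode}")
    return {
        # principal (index 0) follows HADR_SYNCMODE; auxiliaries are always SUPERASYNC
        host: hadr_syncmode if index == 0 else "SUPERASYNC"
        for index, (host, _port) in enumerate(standbys)
    }


# Hypothetical three-site layout: principal standby in DC2, auxiliary at the DR site
standbys = [("tsm-dc2", 60100), ("tsm-dr", 60100)]
print(target_list_command("TSMDB1", standbys))
# db2 "update db cfg for TSMDB1 using HADR_TARGET_LIST tsm-dc2:60100|tsm-dr:60100"
print(effective_sync_modes(standbys, "SYNC"))
# {'tsm-dc2': 'SYNC', 'tsm-dr': 'SUPERASYNC'}
```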

 

IBM ProtecTIER at a glance

IBM ProtecTIER with HyperFactor is software running on Linux that provides inline data deduplication for backup data (e.g. from Spectrum Protect, NetBackup, etc.).
Various configurations are available, e.g.:

 

  • Small TS7620 appliance or TS7650G gateway (single node or cluster)
  • Fibre Channel-attached Virtual Tape Library (VTL) emulation
  • Ethernet-attached CIFS, NFS or OST interfaces


 

IBM ProtecTIER with native IP Replication

IBM ProtecTIER native IP replication replicates virtual cartridges (for VTLs) or files (for systems using the File System Interface, FSI) from one ProtecTIER system to another over standard TCP/IP networks. Thanks to a high degree of parallelism, it can move large amounts of backup data to an offsite location over long distances. Only the deduplicated data is replicated. The following diagram gives an overview of an IP replication scenario:

[Diagram: overview of an IP replication scenario]
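Because only the deduplicated data crosses the WAN, the link requirements can be estimated with simple arithmetic. The following Python sketch is a rough, hypothetical estimate: it assumes the replicated volume is approximately the nominal backup volume divided by a factoring (deduplication) ratio, and that the link sustains a fixed fraction of its nominal bandwidth. The ratio, volume and bandwidth in the example are invented for illustration, not measured ProtecTIER values.

```python
def replication_estimate(nominal_tb: float, factoring_ratio: float,
                         link_mbit_s: float, link_efficiency: float = 0.8) -> tuple[float, float]:
    """Estimate (TB sent over the WAN, hours needed) for one replication cycle.

    Assumption: only nominal_tb / factoring_ratio crosses the link, and the link
    runs at `link_efficiency` of its nominal bandwidth (decimal units: 1 TB = 1e12 bytes).
    """
    if factoring_ratio < 1 or not 0 < link_efficiency <= 1:
        raise ValueError("need factoring_ratio >= 1 and 0 < link_efficiency <= 1")
    replicated_tb = nominal_tb / factoring_ratio
    bytes_per_second = link_mbit_s * 1e6 / 8 * link_efficiency
    hours = replicated_tb * 1e12 / bytes_per_second / 3600
    return replicated_tb, hours


# Hypothetical example: 20 TB nominal backup data per day, 10:1 factoring, 1 Gbit/s link
tb, h = replication_estimate(20, 10, 1000)
print(f"{tb:.1f} TB to replicate, about {h:.1f} h")  # 2.0 TB to replicate, about 5.6 h
```

Without deduplication, the same 20 TB would need ten times as long on this link, which is the practical effect of replicating only the deduplicated data.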

 

Designing a multi-site redundant backup environment

The following diagram shows a three-site replicated backup environment:

[Diagram: three-site replicated backup environment]

  • A Spectrum Protect server (HADR primary) has two HADR standby servers. The principal standby is in a second data center at the main location and acts as a failover system, e.g. for hardware maintenance. The auxiliary standby acts as a failover system for disaster recovery.
  • Each server has a ProtecTIER TS7650G (VTL) attached.
  • Virtual cartridges are replicated from PT_A to PT_B and PT_C.

 

The following steps are required to fail over TSM and ProtecTIER from DC1 to DC2:

 

  1. Fail over the Spectrum Protect (TSM) application:
  • Check out all library volumes from the VTL library (remove=no, checklabel=no)
  • Halt TSM in DC1
  • Restart the DB2 instance in DC1 without starting TSM, so that HADR can reconnect; the takeover then switches DC1 to the standby role
  • Perform the DB2 HADR takeover from DC1 to DC2, monitor the HADR state, and finally start TSM in DC2

 

  2. Fail over the ProtecTIER-based storage pool(s):
  • Update the VTL library definition to “serial=autodetect”
  • Delete all drives and all paths to the VTL library in TSM (e.g. by using “perform libaction”)
  • Redefine the library path and all drives using the proper device names on the failover host
  • Enable “DR mode” on the ProtecTIER in DC2 to stop incoming replication traffic from DC1
  • Use the PT GUI to move the replicated cartridges to the prepared VTL partition in DC2
  • Check in the library volumes to the redefined library in TSM (first scratch volumes, then private volumes)

 

  3. Optional: Prepare to continue operations in DC2:
  • All replicated tape cartridges are read-only, so they can be used for restores
  • To run new backups at the failover site, create new read-write virtual tape cartridges

 

Failback from DC2 to DC1 follows the same procedure in the opposite direction.
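The failover steps above, and the failback in the opposite direction, can be sketched as an ordered runbook. The following Python sketch only generates the steps as text; actions that need the ProtecTIER GUI or the site’s own server start-up procedure are listed as MANUAL. It assumes a single VTL library, the hypothetical server, library, drive, volume and device names shown, and that the cartridges end up in storage slots so that search=yes finds them (cartridges in the entry/exit ports would need search=bulk instead). The HADR role switch follows the graceful sequence described in the comments below, and drives and paths are deleted one by one instead of with PERFORM LIBACTION. Failback is the same call with the two sites swapped.

```python
from dataclasses import dataclass, field

DB_NAME = "TSMDB1"


@dataclass
class Site:
    name: str                 # e.g. "DC1"
    library_device: str       # hypothetical changer device of the VTL on this host
    drive_devices: dict[str, str] = field(default_factory=dict)  # drive name -> device


def failover_runbook(src: Site, dst: Site, server: str, library: str,
                     volumes: list[str]) -> list[tuple[str, str]]:
    """Return ordered (where, action) pairs for a failover from src to dst (text only)."""
    s, d = src.name, dst.name
    steps: list[tuple[str, str]] = []
    # 1. Fail over the Spectrum Protect application
    steps += [(f"TSM@{s}", f"checkout libvolume {library} {vol} remove=no checklabel=no")
              for vol in volumes]
    steps.append((f"TSM@{s}", "halt"))  # also stops the DB2 instance
    steps.append((f"DB2@{s}", "db2start"))
    steps.append((f"DB2@{s}", f"db2 start hadr on db {DB_NAME} as primary"))
    steps.append((f"DB2@{d}", f"db2 takeover hadr on db {DB_NAME}"))
    steps.append((f"DB2@{d}", f"db2pd -db {DB_NAME} -hadr"))  # check HADR role and state
    steps.append((f"MANUAL@{d}", "start the Spectrum Protect server"))
    # 2. Fail over the ProtecTIER-based storage pool(s)
    steps.append((f"TSM@{d}", f"update library {library} serial=autodetect"))
    for drive in src.drive_devices:  # definitions still pointing at the old host
        steps.append((f"TSM@{d}", f"delete path {server} {drive} srctype=server "
                                  f"desttype=drive library={library}"))
        steps.append((f"TSM@{d}", f"delete drive {library} {drive}"))
    steps.append((f"TSM@{d}", f"delete path {server} {library} srctype=server desttype=library"))
    steps.append((f"TSM@{d}", f"define path {server} {library} srctype=server "
                              f"desttype=library device={dst.library_device}"))
    for drive, device in dst.drive_devices.items():
        steps.append((f"TSM@{d}", f"define drive {library} {drive}"))
        steps.append((f"TSM@{d}", f"define path {server} {drive} srctype=server "
                                  f"desttype=drive library={library} device={device}"))
    steps.append((f"MANUAL@{d}", f"enable DR mode on the ProtecTIER (stop replication from {s})"))
    steps.append((f"MANUAL@{d}", "move the replicated cartridges to the VTL partition (PT GUI)"))
    for status in ("scratch", "private"):  # scratch first, then private
        steps.append((f"TSM@{d}", f"checkin libvolume {library} search=yes "
                                  f"status={status} checklabel=barcode"))
    return steps


# Hypothetical sites, devices and volumes
dc1 = Site("DC1", "/dev/IBMchanger0", {"DRV01": "/dev/IBMtape0"})
dc2 = Site("DC2", "/dev/IBMchanger1", {"DRV01": "/dev/IBMtape4"})
for where, action in failover_runbook(dc1, dc2, "TSMSRV", "PTLIB", ["A00001L3"]):
    print(f"{where:12} {action}")
# Failback: failover_runbook(dc2, dc1, "TSMSRV", "PTLIB", [...])
```

Keeping the runbook as data rather than executing it directly makes it easy to review the order of steps before a maintenance window.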

 

Summary

DB2 HADR is an effective way to replicate a Spectrum Protect server database (the metadata) to one or more standby sites. Combined with the native IP replication feature of the IBM ProtecTIER VTL system, it lets you build easy-to-use, efficient, highly available, high-capacity and high-performance backup solutions that also provide strong disaster protection.

 

 











Comments

Tue April 18, 2017 04:48 AM

Originally posted by: JOWALTER


Hello Konstantin, unfortunately there is no way to uncouple the TSM application from the DB2 instance. A graceful failover (role switch) works as follows:
1) Stop TSM on the primary (this stops the DB2 instance, too).
2) Start the DB2 instance on the primary server again, without starting the TSM application:
    db2start
    db2 start hadr on db tsmdb1 as primary
3) Issue the takeover command on the standby server:
    db2 takeover hadr on db tsmdb1
If the primary server is unavailable (e.g. due to an outage or disaster), you can skip 1) and 2) and perform a failover (forced takeover):
    db2 takeover hadr on db tsmdb1 by force
Note that this can cause data loss if the HADR_STATE was not PEER beforehand.

Mon April 10, 2017 02:14 PM

Originally posted by: Konstantin_Konson


We are about to implement TSM with HADR-based HA using TSA. We face an issue: when TSM shuts down, it implicitly stops the DB2 primary instance (which is not good for normal operations). After that, the HADR PEER/CONNECTED state is no longer established, and we cannot perform a takeover hadr. We have to start the DB2 primary again and then perform the takeover. In other words, it is not an elegant solution. Question: is it possible to uncouple DB2 from TSM, so that TSM does not implicitly start/stop DB2? Is that a supported/recommended setup, especially for HADR? Many thanks.
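The decision described in the reply above (graceful role switch versus forced takeover) can be sketched in Python as follows. The function only returns the steps as text. The database name tsmdb1 comes from the reply; the last known HADR_STATE is assumed to have been recorded beforehand (for example from db2pd -db tsmdb1 -hadr output), and the sketch assumes a graceful takeover is possible whenever the primary is available.

```python
from typing import Optional


def takeover_plan(db: str, primary_available: bool,
                  last_hadr_state: str) -> tuple[list[tuple[str, str]], Optional[str]]:
    """Return (server, action) steps for an HADR takeover and an optional warning."""
    if primary_available:
        # Graceful role switch: steps 1) to 3) from the reply
        steps = [
            ("primary", "stop Spectrum Protect (this also stops the DB2 instance)"),
            ("primary", "db2start"),
            ("primary", f"db2 start hadr on db {db} as primary"),
            ("standby", f"db2 takeover hadr on db {db}"),
        ]
        return steps, None
    # Primary is unavailable: forced takeover on the standby
    warning = None
    if last_hadr_state.upper() != "PEER":
        warning = (f"HADR_STATE was {last_hadr_state}, not PEER: "
                   "the forced takeover may lose data")
    return [("standby", f"db2 takeover hadr on db {db} by force")], warning


steps, warning = takeover_plan("tsmdb1", primary_available=False,
                               last_hadr_state="REMOTE_CATCHUP")
for server, action in steps:
    print(server, action)
if warning:
    print("WARNING:", warning)
```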