IBM Systems Technical University - Day 2 morning

By Tony Pearson posted Fri May 26, 2017 07:43 AM

  




This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Here is my recap of the sessions on the morning of Day 2.

Business Continuity -- The Seven Business Tiers of Business Continuity and Disaster Recovery
Slides are available on IBM Expert Network on [Slideshare.net]

I have been involved with Business Continuity and Disaster Recovery throughout my career at IBM System Storage. However, with new workloads like Hadoop analytics and new Hybrid Cloud deployments, I thought it would be good to provide a refresher.

The need for Business Continuity and Disaster Recovery has increased recently due to (a) climate change caused by human activity, (b) ransomware and other cyber attacks, and (c) disgruntled employees.

Back in 1983, a task force of IBM clients at a GUIDE conference developed "Seven Business Continuity Tiers for Disaster Recovery", which I refer to as "BC Tiers". I divided the presentation into three sections:

  • Backup and Restore: BC tiers 1 through 3 are based on backup and restore methodologies. I explained how to back up Hadoop analytics data, the various options for IBM Spectrum Protect software, and how to encrypt the tape data that gets sent off premises.
     
  • Rapid Data Recovery: BC tiers 4 and 5 reduce the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) with snapshots, database journal shadowing, and IBM Cloud Object Storage. (A rough RPO arithmetic sketch follows this list.)
     
  • Continuous Operations: BC tiers 6 and 7 provide data replication mirroring across locations. I covered 2-site, 3-site and 4-site configurations.
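
To put rough numbers on the RPO differences between these tiers, here is a quick back-of-the-envelope sketch of my own in Python. The copy intervals are made-up examples for illustration, not official definitions of the BC tiers:

    # Back-of-the-envelope worst-case RPO for different copy methods.
    # The intervals below are made-up examples, not official tier definitions.
    from datetime import timedelta

    copy_methods = {
        "Nightly tape backup (roughly BC tiers 1-3)":   timedelta(hours=24),
        "Hourly snapshots (roughly BC tiers 4-5)":      timedelta(hours=1),
        "Synchronous mirroring (roughly BC tiers 6-7)": timedelta(seconds=0),
    }

    for method, interval in copy_methods.items():
        # Worst case: the disaster strikes just before the next copy is taken,
        # so everything written since the previous copy is lost.
        print(f"{method}: worst-case RPO of about {interval}")

The point of the exercise is simply that the less often you take a copy, the more data you stand to lose; RTO improves in a similar way as you move up the tiers.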

 

IBM Spectrum Virtualize - How it works - Deep dive

Barry Whyte, IBM Master Inventor and ATS for Spectrum Virtualize, covered a variety of internal topics "under the hood" of Spectrum Virtualize. This covers the SAN Volume Controller (SVC), FlashSystem V9000, Storwize V7000 and V5000 products, as well as Spectrum Virtualize sold as software.

In version 7.7, IBM raised the limits. You can now have 10,000 virtual disks per cluster, rather than 2,048 per node-pair. Also, you can now have up to 512 compressed volumes per node-pair. With the new 5U-high 92-drive expansion drawers, Storwize V7000 can now support up to 3,040 drives, and Storwize V5030 can support up to 1,520 drives.

While each Spectrum Virtualize node has redundant components, the architecture is designed to handle the failure of an entire node. The term "I/O Group" was created to refer to the node-pair of Spectrum Virtualize engines and the set of virtual disks it manages. This made sense when virtual disks were dedicated to a single node-pair. Now, virtual disks can be assigned to multiple node-pairs, with node-pairs dynamically added or removed as needed for each virtual disk.

However, even if a virtual disk is assigned to multiple node-pairs, only one node-pair manages its cache, so all other node-pairs must coordinate I/O through the cache-owning node-pair. These other node-pairs are called "access I/O groups".
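
Here is a toy model of that relationship, purely my own illustration of the concept. The class and attribute names are invented for this sketch, not IBM code:

    # Toy model: one caching I/O group per virtual disk, plus optional
    # "access" I/O groups that must coordinate with the cache owner.
    class IOGroup:
        def __init__(self, name):
            self.name = name

    class VirtualDisk:
        def __init__(self, name, caching_iogrp, access_iogrps=()):
            self.name = name
            self.caching_iogrp = caching_iogrp        # owns this vdisk's cache
            self.access_iogrps = set(access_iogrps)   # extra host-access points

        def submit_io(self, via_iogrp):
            if via_iogrp is self.caching_iogrp:
                return f"{via_iogrp.name}: handled in local cache"
            if via_iogrp in self.access_iogrps:
                return f"{via_iogrp.name}: forwarded to cache owner {self.caching_iogrp.name}"
            raise ValueError(f"{self.name} is not mapped to {via_iogrp.name}")

    io0, io1 = IOGroup("io_grp0"), IOGroup("io_grp1")
    vdisk = VirtualDisk("vdisk0", caching_iogrp=io0, access_iogrps=[io1])
    print(vdisk.submit_io(io1))   # goes through io_grp0, the cache owner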

The architecture allows for linear scalability: double the number of nodes, and you double your performance. Some competitors use n-way caching across four or more nodes, and the pros and cons of each approach are a semi-religious argument. Barry feels the 2-way caching implemented by Spectrum Virtualize is the most effective and efficient for performance.

All of the nodes are connected over an IP network, but one is designated as the "config node", and one (often the same node) as the "boss node".

A cluster can have up to three physical quorum disks (either drives or mDisks) and optionally up to five IP-based quorums. An IP-based quorum is just a Java program that runs on any server or cloud, provided it can respond within 80 msec.

Either an IP-based or a physical quorum can be used for "tie-breaking" in split-brain situations. In the event there is no "active" quorum, the administrator can now serve as the tie-breaker manually. Barry recommends that Storwize clusters, where physical quorum disks are attached to a single node-pair, have at least one IP-based quorum for tie-breaking.
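
To summarize those quorum rules, here is a small sanity-check sketch of my own; the function simply restates the limits and the recommendation above, and is not an actual IBM tool:

    # Restating the quorum guidance above as a simple configuration check:
    # up to 3 physical quorum devices, up to 5 IP quorum applications, and
    # (for Storwize-style clusters) at least one IP quorum for tie-breaking.
    def check_quorum_config(physical_quorums, ip_quorums, storwize=True):
        warnings = []
        if physical_quorums > 3:
            warnings.append("more than 3 physical quorum devices")
        if ip_quorums > 5:
            warnings.append("more than 5 IP quorum applications")
        if storwize and ip_quorums < 1:
            warnings.append("no IP quorum available for tie-breaking")
        return warnings or ["configuration looks reasonable"]

    print(check_quorum_config(physical_quorums=3, ip_quorums=0))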

However, only a physical quorum can be used for T3 Recovery, which happens after a power outage. All of the nodes update the quorum disk with critical information about the virtual mappings of blocks to volumes, and this is used when bringing the nodes back up.

To protect against one pool consuming all of the cache, Spectrum Virtualize partitions the cache, preventing any one pool from consuming more than a certain percentage of the total. The percentage depends on the number of pools:

Number of pools     Maximum cache share for any one pool
2                   66 percent
3                   40 percent
4                   33 percent
5 or more           25 percent
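
The table translates into a simple lookup. Here is a Python sketch of my own that just encodes the numbers above; it is not an actual product interface:

    # Cache-partition limit per pool, encoding the table above
    # (valid for clusters with 2 or more pools).
    def max_cache_share(num_pools):
        limits = {2: 66, 3: 40, 4: 33}
        return limits.get(num_pools, 25)   # 5 or more pools: 25 percent each

    for pools in (2, 3, 4, 5, 8):
        print(f"{pools} pools: any one pool may use up to {max_cache_share(pools)}% of cache")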

Barry explained how failover works in the event of a node failure. There is voting involved, and the majority remains in the cluster. In the case of an even split, called a "split brain" situation, the quorum decides. An orphaned node whose partner in the node-pair has failed goes into write-through mode, since its cache is no longer mirrored.
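
As a simplified illustration of that voting logic, here is a sketch of my own; it is not the actual Spectrum Virtualize implementation:

    # Simplified membership vote: the majority side stays in the cluster;
    # on an even split, the active quorum breaks the tie.
    def surviving_side(side_a, side_b, quorum_winner="A"):
        if len(side_a) != len(side_b):
            return side_a if len(side_a) > len(side_b) else side_b
        # Even split ("split brain"): the side that wins the race to the
        # active quorum continues; the other side stops serving I/O.
        return side_a if quorum_winner == "A" else side_b

    print(surviving_side(["n1", "n2", "n3"], ["n4"]))        # majority survives
    print(surviving_side(["n1", "n2"], ["n3", "n4"], "B"))   # quorum breaks the tie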

The I/O forwarding layer has been split into upper and lower layers. The upper layer handles access I/O groups, while the lower layer handles asymmetric access to drives, mDisks and arrays.

N_Port ID Virtualization (NPIV) drastically improves multi-pathing. Perhaps one of the coolest improvements in a while, NPIV allows a "virtual" WWPN to be assigned to other ports. Without it, when an I/O sent to a port fails, the host retries one or more times, waits 30 seconds, and then invokes multi-pathing to find a completely different path to the data. With NPIV, when a port fails, its WWPN is re-assigned to a different port, so the retries are likely to succeed before the host has to wait 30 seconds!
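
To see why this matters for recovery time, here is a rough comparison of my own. The retry counts and timings are illustrative assumptions, not product specifications:

    # Rough host-side recovery time with and without NPIV, based on the
    # retry-then-wait behaviour described above. Numbers are assumptions.
    def recovery_seconds(npiv_enabled, retry_secs=1, retries=3, path_timeout=30):
        if npiv_enabled:
            # The failed port's virtual WWPN has moved to a surviving port,
            # so an early retry on the same path succeeds.
            return retry_secs
        # Without NPIV, retries against the dead port fail, the host waits
        # out the timeout, then multi-pathing fails over to another path.
        return retries * retry_secs + path_timeout

    print("without NPIV: about", recovery_seconds(False), "seconds")
    print("with NPIV:    about", recovery_seconds(True), "seconds")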

Lastly, Barry covered the delicate art of software upgrades. Software is rolled forward one node at a time, and the "cluster state" is maintained throughout the process.
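
Conceptually, the procedure looks something like this sketch of mine, which ignores all of the careful checks the product actually performs:

    # Conceptual rolling upgrade: one node at a time leaves the cluster,
    # is upgraded, and rejoins, so the cluster stays online throughout.
    def rolling_upgrade(nodes, new_version):
        for node in nodes:
            print(f"{node}: taken out of service, upgrading to {new_version}")
            # ... actual upgrade and health checks would happen here ...
            print(f"{node}: rejoined the cluster at {new_version}")
        print("all nodes upgraded; cluster state preserved throughout")

    rolling_upgrade(["node1", "node2", "node3", "node4"], "7.7")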

Different presentations this week are at different technical levels. My session was meant to be an overview of the concepts of Business Continuity, independent of any particular operating system platform, using specific IBM products to illustrate the concepts. Barry's was a deep dive into a single product family.

