Last year IBM super-charged its FlashSystem A9000 & A9000R all-flash enterprise block storage offering with HyperSwap, allowing customers to go beyond the built-in five-nines availability, all the way to an IBM guarantee of zero interruption to data availability. With HyperSwap 2-copy active-active data access technology, customers can stretch any individual volume or consistency group across any two A9000/R systems, to protect against the failure of one of the systems. FlashSystem A9000/R HyperSwap and replication capabilities provide a range of data protection options that allow enterprise organizations to pick the right level of protection, down to every individual volume or consistency group.
This year, we’re enhancing FlashSystem A9000 & A9000R data protection so that customers can protect applications with HyperSwap high availability, and strengthen that protection with a 3rd copy of the data for disaster recovery. This enhancement applies to the whole family: A9000 and A9000R offerings, models 415, 425 and U25. It also supports any mix of these in one solution.
The following graphic depicts a simplified* topology: 3 systems - each one could be A9000 or A9000R - with 3 relations: HyperSwap, asynchronous replication, and a Standby asynchronous replication.
*To keep the description simple, the graphic does not include the HyperSwap Quorum Witness, a lightweight application that HyperSwap uses to avoid split-brain situations and to coordinate a transparent failover. The Quorum Witness is IP-connected to both systems that participate in the HyperSwap relation. As a best practice, the Quorum Witness should be located separately from the Primary and Secondary systems. The graphic also does not depict the connected hosts. For more detailed information about HyperSwap topology, check this post.
The two systems in the front provide HyperSwap protection, so when one of them fails, data continues to be served by the other system non-disruptively. The asynchronous replication keeps the copy on the 3rd system updated within a specified RPO (Recovery Point Objective). If the system on the left fails, asynchronous replication continues non-disruptively between the system on the right and the 3rd system, because the Standby replication is activated. In other words, the asynchronous replication itself is protected.
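The RPO promise can be read as a simple invariant: the most recent consistent point on the 3rd system must never be older than the configured RPO. The Python sketch below expresses that check; `rpo_compliant` and its parameters are hypothetical names for illustration, not part of any A9000/R interface.

```python
from datetime import datetime, timedelta


def rpo_compliant(last_consistent_point: datetime, now: datetime, rpo: timedelta) -> bool:
    """Hypothetical check: is the 3rd copy lagging the source by no more than the RPO?"""
    lag = now - last_consistent_point  # age of the newest consistent data on the 3rd system
    return lag <= rpo


# Usage: with a 1-minute RPO, a 30-second-old copy is compliant, a 2-minute-old one is not.
now = datetime(2018, 1, 1, 12, 0, 0)
print(rpo_compliant(now - timedelta(seconds=30), now, timedelta(minutes=1)))  # True
print(rpo_compliant(now - timedelta(minutes=2), now, timedelta(minutes=1)))   # False
```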
Let’s check out the key data protection sequences, first for the high-availability protection tier, and then for disaster recovery.
High Availability
In the first picture below, the two systems at the top maintain a HyperSwap relation between them. On the left system, volume (or consistency group) A is designated as the Primary one. On the right system, volume (or consistency group) B is designated as the Secondary one. Together, A and B form a stretch-volume: an active-active pair of volumes that have the same ID and contain the same data. With the new multi-site high-availability & disaster recovery capability, volume A is also replicated asynchronously to volume C, which is naturally called the Tertiary copy. Between volumes B and C, the Standby asynchronous replication is set up and idle.
Note: The designation of Primary, Secondary and Tertiary is attributed to individual volumes or consistency groups (CGs) that participate in a single multi-site relation. Every system can hold multiple volumes (CGs) with a different role for each, and can have different multi-site relations with up to 10 other systems. To keep the description readable, when the following text refers to a “Primary (or Secondary) system”, it actually means “the system that currently holds the Primary (or Secondary) volume or consistency group that is the subject of this example”. Similarly, when the text refers to “System A”, it actually means “the system that holds volume (or CG) A”.
Let’s assume that the Primary system becomes unavailable due to a sudden data center power failure. As with HyperSwap, I/O continues to be served to connected hosts non-disruptively, since the Secondary system is available. The next picture shows that when A becomes unavailable, data is no longer replicated between A and B, or between A and C.
Within seconds, the Secondary system determines that there is a real failure in the Primary system, and changes the role of volume B to Primary. System A, if it is still alive, will not serve I/O, in order to avoid a split-brain. This is HyperSwap as we know it. In the multi-site high-availability & disaster recovery solution, however, HyperSwap logic does one more thing at the same time: System B instructs System C to block any data replicated from A, and the Standby replication from B to C is activated. Hence data replication to C resumes swiftly, without disruption. Later, during the recovery process, System A assumes the Secondary role. Eventually, it will all look as in the next picture.
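The failover sequence above can be summarized as a small state change. The following Python model is a hypothetical simplification using this post's A/B/C naming: the class and method names are invented for illustration, Quorum Witness arbitration and the "within seconds" detection are not modeled, and it does not reflect how the A9000/R software is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    role: str               # "Primary", "Secondary" or "Tertiary"
    alive: bool = True
    serving_io: bool = True


@dataclass
class MultiSiteRelation:
    a: Volume
    b: Volume
    c: Volume
    async_source: str = "A"                            # which volume currently feeds C
    blocked_sources: set = field(default_factory=set)  # sources whose updates C ignores

    def handle_primary_failure(self) -> None:
        """Model of the steps taken once the Secondary confirms a real Primary failure."""
        if self.a.alive:
            raise ValueError("Primary is alive; no failover is needed")
        # Regular HyperSwap: B becomes Primary, A must not serve I/O (split-brain avoidance).
        self.b.role = "Primary"
        self.a.serving_io = False
        # Multi-site addition: C blocks anything still arriving from A...
        self.blocked_sources.add(self.a.name)
        # ...and the Standby B->C replication becomes the active one.
        self.async_source = self.b.name

    def handle_old_primary_recovery(self) -> None:
        """When A comes back it takes the Secondary role (re-synchronization not modeled)."""
        self.a.alive = True
        self.a.role = "Secondary"


rel = MultiSiteRelation(Volume("A", "Primary"), Volume("B", "Secondary"), Volume("C", "Tertiary"))
rel.a.alive = False                 # sudden power failure at A's site
rel.handle_primary_failure()
print(rel.b.role, rel.async_source, rel.blocked_sources)  # Primary B {'A'}
```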
Once A is revived, it can be re-synchronized with B to restore high-availability protection. Throughout the process so far, the hosts continued to operate non-disruptively, and replication from B to C kept the 3rd copy current and ready for any improbable disaster. The original designation of Primary and Secondary roles can be restored at any time, non-disruptively, with a single switch role command. Back to all-green!
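Restoring the original roles is essentially a swap of the two HyperSwap roles. A minimal Python sketch, assuming (as the all-green end state suggests) that the active asynchronous replication to C follows whichever volume is Primary, and assuming the swap is only allowed once the pair is back in sync; `switch_roles` is a hypothetical helper, not the actual switch role command.

```python
def switch_roles(roles: dict, in_sync: bool) -> tuple:
    """Swap Primary and Secondary of the HyperSwap pair (hypothetical model).

    Returns the new role map and the volume that should now feed the Tertiary copy,
    assuming the active asynchronous replication follows the Primary.
    """
    if not in_sync:
        raise RuntimeError("re-synchronize the HyperSwap pair before switching roles")
    swap = {"Primary": "Secondary", "Secondary": "Primary"}
    new_roles = {vol: swap.get(role, role) for vol, role in roles.items()}  # Tertiary unchanged
    new_async_source = next(vol for vol, role in new_roles.items() if role == "Primary")
    return new_roles, new_async_source


# Usage: after the failover, B is Primary and the re-synchronized A is Secondary.
roles, source = switch_roles({"A": "Secondary", "B": "Primary", "C": "Tertiary"}, in_sync=True)
print(roles, source)  # {'A': 'Primary', 'B': 'Secondary', 'C': 'Tertiary'} A
```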
To summarize the high-availability protection scenario, no single-system failure can disrupt the protected applications. Moreover, as long as the Tertiary system is accessible, the solution ensures that the 3rd copy is always current in accordance with its RPO.
So far so good, but what if both A and B fail, or become inaccessible, at once?
Disaster Recovery
The likelihood of such a scenario is very, very low. High-availability best practice requires that systems A and B be placed in two separate failure domains, so that a single power failure or a single network failure cannot impact both systems. With that level of protection, and if it depended on the A9000/R systems only, each system's unavailability would be $1-0.99999 = 10^{-5}$ and, assuming the two systems fail independently, the probability that both A and B are down at the same time would be $(1-0.99999)^2 = 10^{-10}$. Do the math: that’s ten-nines availability. Yet other data center infrastructure, such as network or power, may have lower resiliency. In addition, there are always applications that require the utmost protection, for example to adhere to the financial industry's strict compliance rules. For these cases, it is imperative to also have a 3rd copy that is ready for the worst.
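The arithmetic behind the ten-nines figure, as a short Python check. It uses exact fractions to avoid floating-point rounding and, like the text, assumes the two systems fail independently, which is precisely what separate failure domains are meant to approximate; it covers the storage systems only, not network or power.

```python
from fractions import Fraction


def combined_availability(single: Fraction, copies: int) -> Fraction:
    """Availability of `copies` independent, redundant systems: service is lost only if all are down."""
    return 1 - (1 - single) ** copies


def count_nines(availability: Fraction) -> int:
    """Count the leading nines, e.g. 99.999% -> 5."""
    unavailability = 1 - availability
    if unavailability <= 0:
        raise ValueError("perfect availability has no finite number of nines")
    nines = 0
    while unavailability <= Fraction(1, 10 ** (nines + 1)):
        nines += 1
    return nines


five_nines = Fraction(99999, 100000)
pair = combined_availability(five_nines, 2)
print(1 - pair)           # 1/10000000000
print(count_nines(pair))  # 10
```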
The next picture shows the result of such an extremely unlikely situation: Both A and B are down or inaccessible, there’s no HyperSwap activity between A and B, and no replication to C. Not by luck, however, C contains the most recent data available, up to date within its RPO. That’s what the 3rd copy is for.
As soon as possible, and while A and B go through recovery operations, volume C is mounted and mapped to standby servers and applications, most likely started at the same site as C, to recover the operation. As new data is generated, it is written to volume C. When A is revived, asynchronous replication can be established from C to A. Most likely, A preserved its data through the failure, so with A9000/R offline initialization, only the new data that was added to C after the disaster has to flow back to A.
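Why only the new data flows back: offline initialization lets the systems skip data the target already holds. The Python sketch below is a simplified model of that idea, assuming a chunk-by-chunk hash comparison; the chunk size, the choice of SHA-256 and the function names are illustrative assumptions, not a description of the A9000/R internals.

```python
import hashlib


def chunk_digests(data: bytes, chunk_size: int) -> list:
    """SHA-256 digest of each fixed-size chunk of a volume image."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def chunks_to_transfer(source: bytes, target: bytes, chunk_size: int = 4) -> list:
    """Indexes of source chunks that the target is missing or holds with different content."""
    src = chunk_digests(source, chunk_size)
    dst = chunk_digests(target, chunk_size)
    return [i for i, digest in enumerate(src) if i >= len(dst) or dst[i] != digest]


# Usage: A kept its pre-disaster data; C received one overwrite and one new chunk since then.
on_a = b"AAAABBBBCCCC"
on_c = b"AAAAXXXXCCCCDDDD"
print(chunks_to_transfer(on_c, on_a))  # [1, 3]
```

Over a real link, only the digests would need to cross the network before the differing chunks are sent; here both images are local for brevity.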
When the data on A is fully updated and RPO-compliant, the asynchronous replication is reversed (data now flows from A to C), and the original servers at the main site can be mapped to A again, to restore the operation of the application at the main site. The disaster has been recovered from, and readiness for a future disaster has been restored. However, high-availability protection still needs to be re-established.
To restore high availability, a multi-site configuration is non-disruptively assembled from the existing asynchronous A-to-C replication. Initially, volume B will be out of sync, but thanks again to offline initialization, only the new data will be replicated from A to B. When synchronization of volume B is complete, it’s all-green again.
To summarize the solution for the worst-case scenarios: even in the incredibly unlikely case of a double-system failure, applications may continue non-disruptively if access to one of the HyperSwap systems survived, or can be recovered at the DR site using the 3rd copy if only the Tertiary system survived.
Same great performance as with (2-copy) HyperSwap
FlashSystem A9000/R offerings carry on the DNA of the Spectrum Accelerate family. A major part of that DNA is the grid architecture, which delivers consistent, high performance. For a non-replicated volume on a FlashSystem A9000/R system, assuming a mainstream example of a 50/50 read/write mix of 32K blocks and an 80% cache hit ratio, latency should be expected in the range of ~0.5-0.9 milliseconds, depending on system utilization. When regular (2-copy) HyperSwap protection is applied to the volume, latency increases by (~0.15 millisecond HyperSwap overhead) + (the latency of the network connecting the peer systems). The network latency is added because HyperSwap uses synchronous replication to maintain data consistency, so every write I/O must be acknowledged by the remote system. Again, for more information about HyperSwap, please check this blog post.
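The rule of thumb above, expressed as a small Python helper. This is a back-of-the-envelope calculation, not a performance model: the defaults come from the figures quoted in this post, the 0.2 ms inter-site link in the usage lines is a hypothetical value, and real results depend on workload and utilization.

```python
def hyperswap_latency_ms(base_ms: float, network_ms: float, overhead_ms: float = 0.15) -> float:
    """Rule-of-thumb latency for a 2-copy HyperSwap volume, per the figures quoted in this post.

    base_ms:     latency of the same volume without replication (~0.5-0.9 ms in the example)
    network_ms:  latency of the network connecting the two HyperSwap peers
    overhead_ms: HyperSwap processing overhead (~0.15 ms)
    """
    return base_ms + overhead_ms + network_ms


# Usage: a hypothetical 0.2 ms inter-site link over the ~0.5-0.9 ms base range.
for base in (0.5, 0.9):
    print(f"{hyperswap_latency_ms(base, 0.2):.2f} ms")  # 0.85 ms, then 1.25 ms
```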
With FlashSystem A9000/R multi-site high-availability & disaster recovery, hosts connected to the HyperSwap peer systems experience the same I/O performance as in a regular 2-copy HyperSwap deployment. In other words, elevating the protection from HyperSwap to multi-site high-availability & disaster recovery has no performance impact. The reason is that the 3rd copy is updated asynchronously, so the HyperSwap-connected hosts receive their response regardless of the 3rd system's acknowledgement. That also means that if access to the 3rd system fails for any reason, the 3rd copy's RPO will be compromised, but there will be no performance impact on the hosts.
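A toy Python model of why the 3rd copy adds no host latency: the acknowledgement depends only on the synchronous HyperSwap leg. The class is hypothetical and the queue is a simplification; it does not describe how A9000/R asynchronous replication actually ships data to the Tertiary system.

```python
from collections import deque


class MultiSiteWritePath:
    """Toy model: a host write is acknowledged once both HyperSwap peers hold it;
    the 3rd copy is fed later, from a queue, independently of the acknowledgement."""

    def __init__(self) -> None:
        self.primary: dict = {}
        self.secondary: dict = {}
        self.tertiary: dict = {}
        self.pending: deque = deque()

    def host_write(self, block: int, data: bytes) -> str:
        self.primary[block] = data
        self.secondary[block] = data        # synchronous leg: part of the write latency
        self.pending.append((block, data))  # asynchronous leg: only queued here
        return "ack"                        # the host gets its response now

    def replicate_to_tertiary(self, reachable: bool = True) -> int:
        """Ship queued updates to the 3rd copy; returns how many were applied."""
        if not reachable:
            return 0                        # RPO grows, host writes are unaffected
        applied = 0
        while self.pending:
            block, data = self.pending.popleft()
            self.tertiary[block] = data
            applied += 1
        return applied


path = MultiSiteWritePath()
print(path.host_write(7, b"payload"))               # ack
print(path.replicate_to_tertiary(reachable=False))  # 0
print(path.replicate_to_tertiary())                 # 1
```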
Simplicity
Simplicity is also part of the Spectrum Accelerate family DNA. Whenever a new capability is added to A9000/R, it is based on the simplest possible design, and further simplified via the native management software, the Hyper-Scale Manager. The capability discussed in this post is no different: It is based on the existing HyperSwap and replication capabilities, so no new skills are required. It is further simplified with unified control that treats the new triple-system topology as one managed multi-site relation covering 3 systems, 3 relations, and 3 volumes or consistency groups, while showing the health of these underlying entities.
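To illustrate the idea of one managed relation with an aggregated health view, here is a hypothetical roll-up in Python. The status names and the worst-of-the-three rule are assumptions made for illustration; the post does not document how Hyper-Scale Manager actually computes the overall status.

```python
from enum import IntEnum


class Health(IntEnum):
    OK = 0
    DEGRADED = 1
    FAILED = 2


def multisite_health(sub_relations: dict) -> Health:
    """Roll up the state of the 3 sub-relations into one status (assumed worst-of rule)."""
    return max(sub_relations.values())


status = multisite_health({
    "HyperSwap": Health.OK,
    "Asynchronous": Health.DEGRADED,  # e.g. the Tertiary system is unreachable
    "Standby asynchronous": Health.OK,
})
print(status.name)  # DEGRADED
```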
The following screenshot from the new Hyper-Scale Manager shows one multi-site relation that groups 3 sub-relations, with their details, on the left. On the right side, the health status of the complete multi-site relation is shown on one tab, with additional tabs for the three underlying relations: HyperSwap, asynchronous replication, and Standby asynchronous replication.
In the next screenshot, a volume view is provided. The volumes of the multi-site relation are grouped together with their details, and when one of them is selected, its status and role are visible on the right.
Next up: Availability, deployment, more simplicity, no special licensing, and the Grid Starter
At the time of writing, new FlashSystem A9000 and A9000R systems are shipped pre-loaded with software version 12.3, which means they have multi-site high-availability & disaster recovery already built in. Existing systems can be upgraded to version 12.3 to become part of a multi-site relation. Like any capability of this family, it’s already included in the license, so no special ordering or extra cost is involved. Simple. As mentioned above, multi-site high-availability & disaster recovery also applies to the former 415 models, and you can mix 415 and 425 systems, as well as A9000 and A9000R offerings, in one such solution, thus protecting your existing data and your existing A9000/R investment.
Protection is typically applied in tiers: different applications get different protection. It makes sense that fewer applications require HyperSwap than simple replication. Likewise, fewer applications require multi-site high-availability & disaster recovery than HyperSwap alone. It therefore follows that adding multi-site high-availability & disaster recovery, or just HyperSwap, to your operation requires one (or a few, depending on your protection level and operation) smaller-capacity pod (A9000) or rack (A9000R) systems. That's an excellent match for the new lowest entry-level configuration for A9000R, aka the Grid Starter: a scalable, single flash-enclosure configuration with half the capacity of the former entry-level A9000R configuration. Definitely more choices to reconcile your protection needs with shrinking budgets.
After checking all the boxes above, there are actual running applications to handle.
Elevating the protection level of existing applications to multi-site high-availability and disaster recovery is a non-disruptive process. In addition, if the data is already available at the target site, full protection is achieved fast:
- If an application is protected by synchronous replication, it can be upgraded to (as opposed to replaced by) HyperSwap.
- If the application is already protected by HyperSwap, or by asynchronous replication, it can be elevated to multi-site high-availability and disaster recovery.
- In either case, if a copy of the data is available on the Secondary and/or Tertiary site, A9000/R offline initialization ensures that only newer data has to be replicated.
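The upgrade paths in the list above can be captured as a small lookup table. A hypothetical Python sketch: the level labels and helper names are informal assumptions, and real elevation is performed through the A9000/R management tools, not through code like this.

```python
from typing import Optional

# Non-disruptive upgrade paths listed above (labels are informal, not product keywords).
NEXT_LEVEL = {
    "synchronous replication": "HyperSwap",
    "HyperSwap": "multi-site HA/DR",
    "asynchronous replication": "multi-site HA/DR",
}


def elevate(current: str) -> Optional[str]:
    """Next protection level reachable from `current`, or None if there is no listed path."""
    return NEXT_LEVEL.get(current)


def initial_sync_scope(copy_exists_at_target: bool) -> str:
    """With offline initialization, an existing copy means only newer data is replicated."""
    return "newer data only" if copy_exists_at_target else "full copy"


print(elevate("asynchronous replication"), "/", initial_sync_scope(True))
# multi-site HA/DR / newer data only
```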
More information
In a near-future post, I will consolidate all the information available from IBM on FlashSystem A9000 & A9000R multi-site high-availability & disaster recovery. In the meantime, check the announcement: Don’t miss the additional v12.3 enhancements, and the statement of general direction we’ve disclosed there. If you have immediate questions, contact your IBM sales representative or partner.