Crash Consistent, Application Consistent and IBM Copy Services Manager (CSM)
When Spectrum Virtualize v188.8.131.52 released support for Safeguarded Copy, CSM 184.108.40.206 provided support to automatically discover Volume Groups with a Safeguarded policy. Once discovered, the CSM server automatically creates a session and a scheduled task for the volume group, and schedules the backups to occur at the interval defined by the policy.
However, CSM is strictly a storage replication management solution. CSM has no knowledge of the applications and as such does not coordinate a quiesce of the application prior to creating a Safeguarded Copy backup. This means that CSM creates what is typically called a "crash consistent" copy.
Now crash consistent copies are still consistent and valuable copies. However, because the backup is crash consistent, an application may need to do more work when starting up from that image than it would if the application had been quiesced before the backup was taken. For example, a database application such as DB2 is very reliant on the order of transactions and on which transactions have been committed. So starting up from a crash consistent copy requires the product to look at the logs and rebuild each database with only the committed transactions to ensure that the database is consistent.
The other type of copy is called an "application consistent" copy. Products that create application consistent copies know about the application and thus can quiesce the application for write I/O prior to taking the backup. By quiescing the application for write I/O (reads can continue), the application can cleanly shut down and thus applications, such as database applications, can complete all transactions prior to the backup being taken. When the application starts up from an application consistent copy, it doesn't have to resolve any issues and thus can start up much faster than it would be able to from a crash consistent copy.
You can now create Application Consistent Copies in CSM!!!
Starting with the CSM 6.3.2 release, it is now possible to create Application Consistent copies when creating Safeguarded Copy backups. Safeguarded Copy backups are scheduled via what CSM calls a Scheduled Task. A CSM Scheduled Task is very dynamic and allows you to issue multiple commands or actions (such as waiting for a certain state) across one or more CSM sessions.
A new type of Action can now be created when you create the Scheduled Task. By choosing the Action Type "Run External Script", a user can input information that will allow them to invoke an external command or script as PART OF the Scheduled Task!!!
By specifying the hostname, userid and password for a server, the action will SSH to that server and issue the command specified in the Command field. This can be any command that can be issued through an SSH connection and that the specified user is authorized to issue.
Optionally you can specify a string in the Success String field, which the action will compare to the stdout after issuing the command. If the string appears in the stdout the action will be deemed a success. If a Success string is not specified, then the action will be considered a success as long as nothing is returned in stderr.
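The success rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not CSM's actual code: the function name `action_succeeded` and its parameters are hypothetical, and the sketch assumes a case-sensitive substring match and that stderr is ignored once a Success String is supplied.

```python
from typing import Optional


def action_succeeded(stdout: str, stderr: str,
                     success_string: Optional[str] = None) -> bool:
    """Apply the success rule described in the post to a finished command.

    Hypothetical helper: assumes a case-sensitive substring match.
    """
    if success_string:
        # A Success String was given: it must appear somewhere in stdout.
        return success_string in stdout
    # No Success String: succeed only if the command wrote nothing to stderr.
    return stderr == ""
```

One practical consequence, if the rule works as read here: a script that writes warnings to stderr even when it succeeds would be treated as failed unless a Success String is defined, so having the script print an explicit marker on success is the safer choice.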
Here's what the Modify Action for a Run External Script type action looks like.
So now, using the above action, you can create a Scheduled Task which has the following steps:
- Run External Script -> SSH'ing to a server that has a script like the following which will quiesce a database for write i/o.
- Run Command Action -> Issues the Backup command to the session that maps to the Volume Group on the Spectrum Virtualize based system.
- Run External Script -> SSH'ing to a server that has a script like the following which will restart the database.
Putting all this together in the Scheduled Task looks something like the following. So whenever this task is run, the database is quiesced, the backup is taken and then the database is restarted, thus creating an application consistent copy.
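To make the ordering of the three steps concrete, here is a minimal Python sketch of the quiesce, backup, resume sequence. It does not call CSM or any database. The function `run_application_consistent_backup` and the three callables are hypothetical placeholders for the two Run External Script actions and the Backup command. How a real CSM Scheduled Task reacts to a failed step is not covered in this post, so the failure handling here (skip the backup if the quiesce fails, always resume once quiesced) is an assumption about what you would want, not a description of CSM behavior.

```python
from typing import Callable, List, Tuple


def run_application_consistent_backup(
    quiesce: Callable[[], bool],
    backup: Callable[[], bool],
    resume: Callable[[], bool],
) -> List[Tuple[str, bool]]:
    """Run quiesce -> backup -> resume and report each step's outcome.

    Each callable is a hypothetical stand-in for one action in the
    Scheduled Task and returns True on success.
    """
    results: List[Tuple[str, bool]] = []

    quiesced = quiesce()
    results.append(("quiesce", quiesced))
    if not quiesced:
        # Writes were never suspended, so a backup taken now would only be
        # crash consistent. Stop here instead of taking it.
        return results

    results.append(("backup", backup()))
    # Resume regardless of the backup result so the application is never
    # left quiesced.
    results.append(("resume", resume()))
    return results
```

Calling it with three functions that each return True yields `[("quiesce", True), ("backup", True), ("resume", True)]`. The point of the sketch is that the window between quiesce and resume is the application impact, so the backup step in the middle should be as short as possible.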
Considerations when using Crash Consistent vs. Application Consistent copies for Safeguarded Copy
Now that you can create application consistent Safeguarded Copy backups using CSM, the question then becomes "should you"?
There is no definite answer to this question. The answer really depends on a number of factors that are dependent on the application you're backing up, your business requirements, and ultimately just how you plan to use that backup.
But let me throw out some key points that you should consider when choosing how to take backups.
- Crash Consistent copies are designed to minimize application impact but will have longer recovery times. Because "crash consistent" copies do not need to know about the application, they are designed to provide a consistent backup as quickly as possible in order to minimize any impact to an application. Ideally, the application wouldn't even know the backup occurred. However, as discussed before, it may take longer for an application to restart using a crash consistent copy.
- Application Consistent copies will have application impact but faster recovery times. While crash consistent copies are designed to create the backup as quickly as possible, "application consistent" copies are designed to minimize the time it takes to bring up the application from that backup. In order to minimize the time to bring up the application from the backup, the application has to be quiesced, which means application consistent copies are known to have much higher application impact.
- A Safeguarded Copy solution is typically intended to provide frequent backups in order to protect against a Cyber Attack. The primary use case of a Safeguarded Copy backup is to protect the data from an internal or external malicious attack. We hope your business never has to react to a Cyber Attack, but there's a possibility you might not even notice an attack occurred right away. The further apart your backups are, the more data you stand to lose. Obviously this depends on the applications being protected, but by this definition, most Safeguarded Copy implementations are geared towards taking backups as frequently as possible to minimize data loss, with the hope that you never have to restore from a backup.
- Periodic and Forensic testing of your backups. While you may hope to never have to restore from a backup, it is often highly recommended that you do periodic testing on the backups. Periodically recovering a backup and testing it might be a key factor in determining that a cyber attack actually occurred. Can the application you're backing up start up from a crash consistent copy fast enough to test it, either for your business needs or before the next scheduled test?
- How do you intend to use the backups and how much application impact can your business handle? These may be the ultimate questions in making a decision. Can your business handle quiescing the application for as long as it takes to do the backup, and as frequently as your backup schedule requires? Or on the other side, can your application recover from a crash consistent copy fast enough for your needs in determining whether an attack has occurred?
Why stop there....what else can you do with the new CSM Scheduled Task Run External Script Action?!!!!
As you can imagine, this new Run External Script action is quite powerful. It can greatly enhance the automation that a CSM Scheduled Task provides today. The solution isn't limited to Safeguarded Copy. You could create Application consistent copies on FlashCopy relationships or remote copy relationships as well.
So what are the possibilities for this feature? Well...basically anything you can imagine. Here are just a few!
- Run post recovery system automation tasks -> CSM primarily manages storage replication. After performing a site switch to a remote site, tasks have to be run to bring your systems up at the remote site, such as attaching the hosts to the secondary volumes and IPL'ing the systems. You can now set up a Scheduled Task that issues a Recover to the CSM session and then remotely invokes a script or Ansible playbook that attaches the volumes to the hosts.
- Invoke batch jobs, log collection, etc. -> Invoke any number of tasks, such as starting a set of batch jobs, before or after session commands.