IBM Destination Z

Building a Better Backup Strategy

By Destination Z posted Mon December 23, 2019 03:42 PM


Because operating without reliable backup risks corporate health and can be a profoundly career-limiting move, the most fundamental resolution for mainframe professionals is “backup, backup, backup.” Beyond that, some may ask: where to start, and what to do? The challenges and opportunities in better preserving critical software and data resources divide, though not precisely, into technology issues and human issues.

Let’s address backup—and its indispensable partner, restore—which are separate from more complex issues of business continuity (BC), formerly called disaster planning/recovery. While critical for BC, backup/restore are hardly a complete solution for it. Consider these tips and best practices:

Technology/Logistics Tasks

1. Remember why you're doing this. Let business reasons for backup govern your decisions: disaster recovery, recovery from user errors, and audit, disclosure and preservation requirements.

2. Back up everything that matters. Do you know where your data is? It's no longer just nicely boxed in server rooms. Besides servers, desktop and laptop computers, tablets and smartphones can hold essential data that exists nowhere else. If you'd miss it, back it up. Remember Hardware Management Console (HMC) data too, and back it up regularly to a USB drive or DVD, via FTP, or by similar means.

3. Integrate backup processing and data as much as possible. Whatever the reason for a restore, needing many different tools to recover data in varying formats and locations makes it messy and risky.

4. Ensure backups are complete. Some utilities won't include expired files in full-volume backups, or won't write them to tape. After creating backup procedures, verify that file inventories are complete.

5. Plan ahead for restoring data in a recovery center. Require vendors to provide emergency keys, codes or passwords for using their products away from your home data center.

6. Automate. As much as possible, avoid manual steps both in backing up data and in documenting "what's where" for each backup and how to restore it.

7. Create duplicate, redundant and separate backups. Because individual backup volumes have huge capacity, losing or damaging one can be catastrophic. The duplex option of the Data Facility Storage Management Subsystem (DFSMS) simplifies this. Don't let one bad tape volume spoil a disaster-recovery drill, or a real disaster recovery.

8. Be secure. Maintain strict control of backup media so that a massive data breach doesn't end up in the other kind of media: the news.

9. Use offsite storage. You won't win an award for stellar backup if all data copies are destroyed at once by fire, earthquake, hurricane, flood or tornado. Use enterprise-worthy shipping, which may rule out local delivery services, and don't send duplicates together!

10. Encrypt whatever leaves your local facility. No matter how it's shipped or where it's sent, don't let "out of sight" mean "out of control."

11. Remember stored backup media when changing technology. Especially if you're subject to long-term retention (and retrieval) requirements, don't let older backup generations become unreadable. Include backup migration in equipment-upgrade planning.

12. Automate failure notification. Don't rely on manual detection and alerting; it's too easy for processing oddities to become routine without the appropriate people knowing.
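Tip 4 calls for verifying that file inventories are complete. Below is a minimal Python sketch of that check. It assumes both inventories already exist as plain lists of names, for instance one exported from a catalog listing and one taken from the backup utility's report. Every function and data set name in it is hypothetical.

```python
"""Sketch: compare what should be backed up with what a backup actually holds.

Assumption: both inventories are plain lists of data set or file names;
the names and sources used here are hypothetical.
"""


def missing_from_backup(expected, backed_up):
    """Return names in the expected inventory that the backup lacks, sorted."""
    return sorted(set(expected) - set(backed_up))


def unexpected_in_backup(expected, backed_up):
    """Return names in the backup that the expected inventory doesn't list, sorted."""
    return sorted(set(backed_up) - set(expected))


if __name__ == "__main__":
    # PROD.EXPIRED.DATA stands in for a file a utility silently skipped
    # because it had passed its expiration date.
    expected = ["PROD.PAYROLL.MASTER", "PROD.GL.LEDGER", "PROD.EXPIRED.DATA"]
    backed_up = ["PROD.PAYROLL.MASTER", "PROD.GL.LEDGER"]
    gaps = missing_from_backup(expected, backed_up)
    if gaps:
        print("Backup incomplete; missing:", ", ".join(gaps))
```

A utility that skips expired files can still end normally. A set comparison like this is what exposes the gap, so it is worth rerunning whenever backup procedures change.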
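Tip 9 warns against sending duplicates together. A pre-shipment check for that can be sketched as follows. The data model is a hypothetical assumption, not a DFSMS interface: volumes are identified by ID, each duplexed pair is recorded as an (original, alternate) tuple, and each shipment is a set of volume IDs.

```python
"""Sketch: flag shipments that would carry both copies of a duplexed backup.

Assumption (hypothetical data model): duplex_pairs lists (original, alternate)
volume IDs; shipments maps a shipment ID to the set of volume IDs it carries.
"""


def shipments_with_both_copies(shipments, duplex_pairs):
    """Return (shipment_id, original, alternate) for every pair shipped together."""
    problems = []
    for shipment_id, volumes in shipments.items():
        for original, alternate in duplex_pairs:
            if original in volumes and alternate in volumes:
                problems.append((shipment_id, original, alternate))
    return problems


if __name__ == "__main__":
    pairs = [("B00001", "A00001"), ("B00002", "A00002")]
    plan = {
        "TRUCK-MON": {"B00001", "A00001"},  # both copies together: should be flagged
        "TRUCK-TUE": {"B00002"},
        "TRUCK-WED": {"A00002"},
    }
    for shipment, orig, alt in shipments_with_both_copies(plan, pairs):
        print(f"{shipment}: {orig} and its duplicate {alt} travel together")
```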
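Tip 12 can be approached with a small checker that turns job results into alerts. This Python sketch makes several assumptions:
- Each backup job reports a name, a return code and the number of bytes written.
- The thresholds follow the common z/OS convention of 0 for normal, 4 for a warning, and 8 or higher for an error. Your utilities may not use it.
- `notify()` is a hypothetical placeholder for e-mail, paging or console messages.

```python
"""Sketch: turn backup job results into alerts instead of relying on someone noticing.

Assumptions: return codes follow the common z/OS convention (0 = normal,
4 = warning, 8 or higher = error); notify() is a hypothetical placeholder.
"""
from dataclasses import dataclass


@dataclass
class JobResult:
    name: str
    return_code: int
    bytes_written: int


def find_alerts(expected_jobs, results, history=None, min_ratio=0.5):
    """Return alert strings for missing, failed, warning, or suspiciously small jobs.

    history maps a job name to the bytes written by its previous good run;
    a run writing less than min_ratio of that is reported as an oddity.
    """
    history = history or {}
    alerts = []
    reported = {r.name for r in results}
    # A job that never reported is a failure too, and the easiest one to miss.
    for name in sorted(set(expected_jobs) - reported):
        alerts.append(f"MISSING: {name} did not report")
    for r in results:
        if r.return_code >= 8:
            alerts.append(f"ERROR: {r.name} ended RC={r.return_code}")
        elif r.return_code >= 4:
            alerts.append(f"WARNING: {r.name} ended RC={r.return_code}")
        previous = history.get(r.name)
        if previous and r.bytes_written < previous * min_ratio:
            alerts.append(
                f"ODDITY: {r.name} wrote {r.bytes_written} bytes (previous run: {previous})"
            )
    return alerts


def notify(alerts):
    """Hypothetical stand-in for a real alerting channel."""
    for alert in alerts:
        print(alert)


if __name__ == "__main__":
    results = [JobResult("BKPPAY", 0, 1_000), JobResult("BKPGL", 8, 0)]
    notify(find_alerts(["BKPPAY", "BKPGL", "BKPHR"], results, history={"BKPPAY": 50_000}))
```

The checker flags jobs that never reported and runs that wrote far less than usual. Those are exactly the quiet failures that otherwise become routine.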

Human/Management Challenges

1. Ensure BC. Meaningful disaster planning, drills and recovery require using standard, live backup files to recreate enough of production operations to stay in business. To avoid unpleasant surprises, restore "everything that matters" and verify that it works properly.

2. Understand varying backups. Full, incremental and differential backups have different purposes, strengths and weaknesses, as do tape, DASD, virtual tape and FlashCopy technologies. Apply them appropriately to data with special requirements, such as DB2 databases. These benefit from DS6800 FlashCopy consistency groups, which create consistent point-in-time copies across multiple volumes.

3. Back up critical files especially carefully and often. What would you do without VM’s system directory, TSO’s user attributes data set (UADS), or a Resource Access Control Facility (RACF) database? Most directory management tools allow backing up directory files; it's useful and comforting to have a few copies, just in case. Always know which copy is authoritative and protect these files as critical, high-exposure data.

4. Plan backup cycles to match business needs. No backup plan or technology fits all situations. Environments with high data volatility or transaction rates, processing mission-critical or customer-sensitive data, might need real-time offsite mirroring. Ensure the mirror site is far enough away that a single incident can't affect both data centers. More leisurely environments handling fewer or more easily reconstructed transactions might only require daily backups.

5. Test backup/restore periodically. Appearances can be deceiving: backups that seem to run normally might not be doing anything useful. Periodically, and on a reliable schedule, test every aspect of backup by restoring and verifying data. This also ensures that restore processes aren't used for the first time in a crisis. Even if backups have worked flawlessly, a crisis is not the time to learn how to restore data.

6. Document everything. This includes automatic and manual processes, tools used, file formats, data placements, error recovery, etc. Keep information current; don't let "small" changes spread only by word of mouth. Keep duplicate copies of documentation in several places:
- onsite;
- at the BC site;
- perhaps at the homes of operations staff or system programmers;
- on keychain USB drives.
Write processes as simple, non-technical checklists that someone seeing them for the first time can follow cold.

7. Train operations and other staff on backup technologies and processes. Ensure that everyone understands not just backup's critical nature but also how data is being protected, so they're not robotically following mysterious procedures.

8. Train operators to notice and report oddities as well as failure and warning alarms. It's too easy for minor glitches to be ignored and grow into major problems.

9. Educate users and management about what's done and what's possible. Help them be realistic in their expectations and demands. Ensure they have a voice in designing and planning backup protections. Backup, restore and BC are not purely technical issues; they're fundamental corporate and line-of-business decisions.

10. Provide user-initiated restore. Within reasonable and announced constraints, allow users to restore files themselves, without technical support. Of course, ensure that only the original data owners can do this.

11. Backup is not archive. Be clear that backups are not forever and that arbitrarily old data cannot be restored. If desired, provide file archiving—user-driven or automated—separate from backup.

12. Consider risks of human error or malicious behavior. With online-only backup, the original data and all its copies might be destroyed at the same time. Combining online, offline and offsite backups therefore adds reliability. So does separation of duties that requires multiple people to perform sensitive tasks.
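Tip 2 distinguishes full, incremental and differential backups. This Python sketch shows which files each kind would select. It assumes selection by last-modified time, plus a record of when the last full backup and the last backup of any kind ran. Real products often track changes with flags or catalogs instead.

```python
"""Sketch: which files full, incremental, and differential backups select.

Assumption: selection by last-modified time; timestamps can be any
comparable values (day numbers are used in the example).
"""


def select_files(files, kind, last_full, last_any):
    """Return sorted names of files a backup of the given kind would copy.

    files: dict of name -> last-modified time
    last_full: time of the most recent full backup
    last_any: time of the most recent backup of any kind
    """
    if kind == "full":
        return sorted(files)  # everything, changed or not
    if kind == "incremental":
        cutoff = last_any  # changed since the previous backup of any kind
    elif kind == "differential":
        cutoff = last_full  # changed since the last full backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, mtime in files.items() if mtime > cutoff)


if __name__ == "__main__":
    # Hypothetical timeline: full backup on day 1, incremental on day 3.
    files = {"a.dat": 0, "b.dat": 2, "c.dat": 4}
    for kind in ("full", "incremental", "differential"):
        print(kind, select_files(files, kind, last_full=1, last_any=3))
```

In this timeline the incremental copies only c.dat, while the differential copies b.dat and c.dat. That is the trade-off in miniature:
- Incrementals are smaller, but a restore needs the full backup plus every incremental taken since.
- A differential restore needs only the full backup plus the latest differential.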
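Tip 5 advises testing by actually restoring. That can be sketched as a round trip: back up, restore to a scratch location, and compare checksums. This Python version uses a tar archive and SHA-256 digests purely as stand-ins. A mainframe shop would substitute its own dump and restore utilities, but the shape of the test is the same.

```python
"""Sketch: prove a backup is restorable by restoring it and comparing checksums.

Assumption: the data is an ordinary directory tree and a tar archive stands
in for the real backup utility.
"""
import hashlib
import tarfile
import tempfile
from pathlib import Path


def tree_digests(root):
    """Map each file's path, relative to root, to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def _extract(tar, dest):
    """Extract, using the safer 'data' filter where this Python version has it."""
    if hasattr(tarfile, "data_filter"):
        tar.extractall(dest, filter="data")
    else:
        tar.extractall(dest)


def backup_restore_roundtrip(source_dir):
    """Back up source_dir, restore it to a scratch directory, and return the
    sorted relative paths that are missing or differ after the restore."""
    with tempfile.TemporaryDirectory() as work:
        archive = Path(work) / "backup.tar"
        restore_root = Path(work) / "restored"
        with tarfile.open(str(archive), "w") as tar:
            tar.add(str(source_dir), arcname="data")
        with tarfile.open(str(archive), "r") as tar:
            _extract(tar, str(restore_root))
        before = tree_digests(source_dir)
        after = tree_digests(restore_root / "data")
        return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))
```

Call `backup_restore_roundtrip(path)`. An empty list means every file came back with identical contents. Any path returned is one that was lost or altered along the way.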
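Tip 10 requires that only original owners restore their own data, within announced limits. That might look like the check below. The catalog record, the 30-day window and the function name are hypothetical. In practice the authorization decision belongs to the installation's security product, such as RACF, rather than to a string comparison.

```python
"""Sketch: permit self-service restore only for the owner, within an announced window.

Assumptions: a hypothetical catalog entry records owner and creation date;
the 30-day default window is illustrative only.
"""
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BackupEntry:
    dataset: str
    owner: str
    created: date


def can_self_restore(entry, requester, today, max_age_days=30):
    """True only if the requester owns the data and the backup is recent enough."""
    within_window = today - entry.created <= timedelta(days=max_age_days)
    return requester == entry.owner and within_window


if __name__ == "__main__":
    entry = BackupEntry("USER1.REPORTS", "USER1", date(2019, 12, 1))
    print(can_self_restore(entry, "USER1", date(2019, 12, 23)))
```

Requests that fail the check still have a path: they go to technical support rather than being silently refused.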
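Tip 11 separates backup from archive, and that distinction is easier to communicate when the retention rule is explicit. This sketch assumes a hypothetical policy of keeping a fixed number of backup generations per data set, similar in spirit to a generation data group limit. Anything older is expired and cannot be restored, so longer-term needs go to a separate archive process.

```python
"""Sketch: generation-based backup retention, kept distinct from archiving.

Assumption: backups are (timestamp, volume) tuples with sortable timestamps
(ISO date strings here); the generation count is a hypothetical policy value.
"""


def split_generations(backups, keep):
    """Split backups into (retained, expired), each ordered newest first."""
    if keep < 1:
        raise ValueError("keep must be at least 1; zero generations is no backup at all")
    ordered = sorted(backups, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]


def oldest_restorable(backups, keep):
    """Timestamp of the oldest backup still restorable under the policy, or None."""
    retained, _ = split_generations(backups, keep)
    return retained[-1][0] if retained else None


if __name__ == "__main__":
    backups = [("2019-12-20", "VOL001"), ("2019-12-21", "VOL002"),
               ("2019-12-22", "VOL003"), ("2019-12-23", "VOL004")]
    retained, expired = split_generations(backups, keep=3)
    print("restorable back to", oldest_restorable(backups, keep=3))
    print("expired:", [vol for _, vol in expired])
```

`oldest_restorable()` gives users a concrete answer to "how far back can you go?" That makes it clear that older data belongs in an archive, not in backup.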

Managing backup may seem mundane, and no "Backup Professional" certification is available, yet it's a foundation of data center survival. It's best when never needed but potentially catastrophic when missing. Once established and verified, backup processing needn't be burdensome, as long as it's remembered and integrated into change management.

Gabe Goldberg has developed, worked with and written about technology for decades. He can be contacted at