In practice, a hybrid approach often delivers the best results—combining weekly full-database backups, daily incremental or tablespace-level backups, and periodic schema exports. This strategy balances simplicity, precision, and performance.
2.5 Infrastructure Capacity
The success of any backup and recovery strategy is closely tied to the performance of the underlying infrastructure, particularly storage and network throughput. Backup and restore operations read from and write to storage systems external to the database itself, so the size of the database, together with storage bandwidth, network speed, and I/O performance, directly determines whether defined RPO and RTO targets can be met. IBM's Power PCR leverages the IBM Storage Scale System 6000 (SSS6000), a high-throughput, low-latency storage platform that enables continuous online backups without impacting active workloads. To meet disaster recovery SLAs, backup images should be replicated off-site, and archive logs must be written directly to an external SSS6000 by configuring LOGARCHMETH1 appropriately. Without this level of optimization, even well-designed backup strategies can degrade into slow, full-database restores, undermining the benefits of granular, point-of-failure recovery.
3. Data Solution Professionals Best Practices on Power PCR
Building on the foundational concepts of recovery objectives, physical design, and ingest methods, this section outlines recommended backup and restore practices for Data Solution Professionals (DSPs) working with Db2 Warehouse on IBM’s Power PCR. These recommendations are aligned with a Point-of-Failure Restore Strategy, which emphasizes targeted, efficient recovery over broad, full-database restores. While final decisions will be made by DSPs in collaboration with IBM, the following guidance assumes the use of database-level backups for consistency. If tablespace-level backups are preferred, the same principles apply with appropriate substitutions. To ensure robust protection against local database and data corruption on the primary Db2 database, it is recommended to implement the following best practices:
1. Local Backup Storage
Store backup image files locally within the same data center on the Storage Scale System 6000 (SSS6000) using the ESS file system. This ensures fast access and recovery in the event of local failures.
In addition, for each MLN (Member Logical Node), configure the database transaction logs to be archived to a path on the External Storage by setting the database configuration parameter LOGARCHMETH1 accordingly.
Current setting example: on IIAS, LOGARCHMETH1 is currently set to:
DISK:/mnt/db2archive/archive_log/
This should be updated to a valid path on the External Storage.
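A minimal sketch of the update, assuming the External Storage is mounted at /mnt/external (the path is a placeholder; substitute the actual mount point):
# Point log archiving at the External Storage (placeholder path):
db2 update db cfg for BLUDB using LOGARCHMETH1 DISK:/mnt/external/archive_log/
# Confirm the new setting:
db2 get db cfg for BLUDB | grep LOGARCHMETH1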
2. External Backup Storage
To ensure resilience and recoverability, backup image files should be stored externally. This approach not only safeguards data against local failures but also supports enterprise-grade backup and restore strategies. Two common methods for external storage are:
(1) Using File Management Software
IBM Power PCR and all Db2 Warehouse deployments on Kubernetes are designed to integrate seamlessly with leading enterprise backup solutions. While not mandatory, it is strongly recommended to use a file management solution to streamline the handling of backup image files. Supported options include:
- IBM Storage Protect (formerly Spectrum Protect and TSM)
- Veritas NetBackup
- EMC NetWorker
These tools manage both the backup image files and the underlying storage infrastructure, offering robust scheduling, retention, and recovery capabilities. Integration with these platforms ensures that backups are consistent, secure, and aligned with enterprise data governance policies.
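As an illustrative example, a backup that streams the image directly to IBM Storage Protect instead of a file system path uses the USE TSM clause (session counts and vendor-specific options vary by environment):
# Illustrative: stream the backup to IBM Storage Protect via the TSM API.
db2 "BACKUP DATABASE BLUDB ON ALL DBPARTITIONNUMS ONLINE USE TSM"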
(2) Attaching an External Cluster File System
For organizations preferring direct external storage integration, the Power PCR supports two primary methods:
a. NFS-Attached External Storage
Network File System (NFS) is the simplest and most commonly used method. Since the Power PCR is already connected to the customer’s corporate network, NFS storage can be made available with minimal configuration.
Steps to configure NFS storage:
§ Provision NFS Storage on the corporate network.
§ Create an OpenShift StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-external1-rwx-sc
parameters:
  archiveOnDelete: "false"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
reclaimPolicy: Delete
volumeBindingMode: Immediate
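This StorageClass assumes the NFS subdir external provisioner is already deployed in the cluster. If it is not, a typical installation looks like the following sketch (the NFS server address and export path are placeholders):
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=<nfs-server> \
  --set nfs.path=/exported/backup/path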
§ Add the StorageClass to the Db2u Custom Resource:
- name: external1
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: <storage size>Gi
    storageClassName: nfs-external1-rwx-sc
  type: create
§ Scale Down and Scale Up Db2uInstance to apply changes:
a. Set replicas to 0 and wait for the pods to terminate.
b. Restore replicas to the original count (e.g., 2) and wait for the pods to reach the 1/1 ready state.
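One illustrative way to cycle the pods (the StatefulSet name c-db2u-cr-db2u is an assumption inferred from the pod names shown later; verify it with oc get sts in your namespace):
# Scale the Db2 engine pods down, wait, then scale back up:
oc scale statefulset c-db2u-cr-db2u --replicas=0 -n <namespace>
oc get pods -n <namespace> -w   # wait until the db2u pods have terminated
oc scale statefulset c-db2u-cr-db2u --replicas=2 -n <namespace>
oc get pods -n <namespace> -w   # wait for the pods to reach 1/1 ready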
§ Verify Mount Point:
oc rsh <db2u-pod> bash -l
df -h
Look for /mnt/external/path1.
§ Run Backup Command:
db2 "BACKUP DATABASE BLUDB TO /mnt/external/path1"
b. External SAN-Attached Storage
For performance-intensive environments, SAN-attached storage offers significantly higher throughput—often 5 to 10 times faster than NFS. This method requires customer-supplied infrastructure:
- Dual SAN switches for high availability
- Storage controller and media
- Cabling from controller to switches and switches to each worker node
Once the SAN is powered, configured, and connected, the setup process mirrors the NFS method starting from the creation of the OpenShift StorageClass.
3. Backup Frequency
To ensure data integrity and minimize recovery time, implement a structured backup schedule combining weekly full backups with daily incremental backups:
Weekly Full Online Database Backup
Perform a full online backup of each Db2 database once per week. Db2 will back up tablespace by tablespace, with the ability to parallelize the process across multiple tablespaces and within tablespaces themselves.
In environments like Power PCR BRL+ERL, which contain 120 MLNs (Db2 engines), each MLN will be backed up in parallel, generating at least one backup image file per MLN.
Daily Incremental Online Database Backup
Between weekly full backups, schedule daily incremental backups. These backups capture only the changes since the last full or incremental backup, reducing backup time and storage usage.
Like full backups, incremental backups are performed tablespace by tablespace and support parallel processing across MLNs.
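A minimal sketch of the commands involved (the target path is a placeholder). Note that incremental backups require change tracking, and Db2 distinguishes cumulative (INCREMENTAL) from delta (INCREMENTAL DELTA) images:
# Enable change tracking once; it takes effect after the next full backup.
db2 update db cfg for BLUDB using TRACKMOD ON
# Cumulative incremental: changes since the last full backup.
db2 "BACKUP DATABASE BLUDB ON ALL DBPARTITIONNUMS ONLINE INCREMENTAL TO /mnt/external/path1"
# Delta incremental: changes since the most recent backup of any type.
db2 "BACKUP DATABASE BLUDB ON ALL DBPARTITIONNUMS ONLINE INCREMENTAL DELTA TO /mnt/external/path1"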
4. Disaster Recovery Replication
Replicate these backup image files to a secondary SSS6000 located in the Disaster Recovery (DR) data center.
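Replication is typically handled at the storage layer. As a simple file-level fallback, the images can also be shipped with standard tooling (sketch; the DR host name and paths are placeholders):
# Illustrative file-level replication of backup images to the DR site.
rsync -avh /mnt/backup/DBBackup/ dr-site:/mnt/backup/DBBackup/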
5. Local Read/Write Operations
Configure the Db2 database to always read from and write to the local SSS6000 for backup and recovery operations. This minimizes latency and dependency on remote systems during routine operations.
6. Fallback Option
If no file management software is used, direct read/write access to the attached SSS6000 (referred to as External Storage) is still supported.
4. Real-World Example Walk-Through
To evaluate the performance of backup and restore operations on IBM's P10 PCR system, we deployed a 100 TB internal data warehouse workload called Big Data Insights (BDI) on a Base Rack Large Cloud Rack environment. The 100 TB refers to the uncompressed flat-file data used to populate the tables; the database occupied 30.9 TB on disk. This setup provided a realistic scenario to observe how the system handles large-scale, parallelized Db2 backups and restores across multiple MLNs.
4.1 Schema Level Backup
The schema-level backup is performed via the following stored procedure call. It was tested with both a 10 TB and a 100 TB BDI schema to determine whether backup time and resulting backup image size scale linearly. The backup image is stored on the local disk.
db2 -v "call sysproc.logical_backup('-type full -schema ${schema} -path /mnt/backup/DBBackup/${schema}')"
Results:
The following graph shows that schema-level backup time and image size scale linearly: from 9 min 23 s for the 10 TB setup to 1 h 30 min and a 30.6 TB image for the 100 TB setup.
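For completeness, schema-level images are restored with the counterpart stored procedure, sysproc.logical_restore. The sketch below assumes the same option style as the backup call; check the options supported by your Db2 Warehouse level (for example, a timestamp selector to pick a specific image):
db2 -v "call sysproc.logical_restore('-type full -schema ${schema} -path /mnt/backup/DBBackup/${schema}')"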
4.2 Full Online Database Backup
In a similar fashion to the schema-level backup, we performed a full online database backup with only the 100 TB BDI schema still in place on the database, resulting in the following overall database size:
BLUDB = 30905369 MB [30.9 TB]
SUM(SCHEMA) = 30903301 MB
OTHER METADATA/CACHE/CONFIG FILES = 2068 MB
Backup command:
db2 "BACKUP DATABASE BLUDB ON ALL DBPARTITIONNUMS ONLINE TO ${BackupDir} INCLUDE LOGS WITHOUT PROMPTING"
The backup progress can be monitored using:
db2 list utilities show detail
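Db2 also exposes utility progress through SQL; a sketch using the MON_GET_UTILITY table function, where -2 means all members (available in recent Db2 11.5 levels):
db2 "SELECT * FROM TABLE(MON_GET_UTILITY(-2)) AS T"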
Results:
Backup Time: 1 h 02 min 13 s (32% faster than the equivalent schema-level backup)
Backup Location: /mnt/backup/DBBackup/FullOnlineDBBackup100TB
Backup Size: 30TB
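The TAKEN AT timestamp used by the restore commands in the next section is encoded in the backup image file name and recorded in the recovery history file; it can be looked up with:
# List recent backups with their timestamps (the TAKEN AT value):
db2 list history backup all for bludb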
4.3 Full Restore Procedure
Step 1: Prepare the Database for the Restore Process
On each node, we disabled high availability, terminated active connections, and restarted the database in restricted mode so that only authorized administrative operations, such as restore, could be performed, preventing access by other users or applications during the process.
# Disable the stayalive probe. This probe checks every 15 minutes whether Wolverine and/or Db2 is running; if either is down, it triggers a pod restart after 15 minutes (something we want to avoid during a restore).
rah 'touch /db2u/tmp/.pause_probe'
sudo wvcli system disable -m 'Disable HA before Db2 maintenance'
wait_interval                           # site-specific wait helper
wvcli system ds                         # verify that HA is disabled
db2 terminate                           # end the CLP back-end process
db2 force application all               # disconnect all active applications
db2 deactivate database bludb
db2stop
ipclean -a                              # clean up leftover interprocess resources
db2set -null DB2COMM                    # disable external communication protocols
db2start admin mode restricted access   # restart for administrative access only
Duration: < 2 minutes
Step 2: Restore the Database on the Catalog Node (MLN 0)
db2_all '<<+0<db2 RESTORE DATABASE BLUDB \
FROM /mnt/backup/DBBackup/FullOnlineDBBackup100TB \
TAKEN AT 20250725184659 \
INTO BLUDB LOGTARGET /mnt/backup/logs \
REPLACE EXISTING WITHOUT PROMPTING'
The restore process can be monitored via:
[db2inst1@c-db2u-cr-db2u-0]$ db2 list utilities show detail
ID                    = 1
Type                  = RESTORE
Database Name         = BLUDB
Member Number         = 0
Description           = db
Start Time            = 07/28/2025 18:30:21.354600
State                 = Executing
Invocation Type       = User
Progress Monitoring:
   Completed Work     = 3686903808 bytes
   Start Time         = 07/28/2025 18:30:21.354606
Duration (catalog): ~ 30 minutes
Step 3: Restore the Database on All Other Data MLNs (1..47) in Parallel
db2_all '<<-0<||db2 RESTORE DATABASE BLUDB \
FROM /mnt/backup/DBBackup/FullOnlineDBBackup100TB \
TAKEN AT 20250725184659 \
INTO BLUDB LOGTARGET /mnt/backup/logs \
REPLACE EXISTING WITHOUT PROMPTING'
Duration: ~1 hour 18 minutes
Step 4: Roll Forward to End of Backup
db2 -v "ROLLFORWARD DATABASE BLUDB TO END OF BACKUP AND COMPLETE"
Duration: 10 seconds
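Before resuming operations, a quick illustrative check confirms that no member remains in rollforward-pending state:
db2 "ROLLFORWARD DATABASE BLUDB QUERY STATUS"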
Step 5: Restart the Database and Resume Normal Operations
db2stop force
ipclean -a                              # clean up leftover interprocess resources
db2set DB2COMM=TCPIP,SSL                # re-enable communication protocols
db2start
db2 activate database <DBNAME>
# Re-enable HA
sudo wvcli system enable -m "Enable HA after Db2 maintenance"
# Re-enable the stayalive probe
rah 'rm /db2u/tmp/.pause_probe'
# Connect to the database
db2 connect to bludb
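As a final illustrative sanity check that the database is accepting connections and answering queries:
db2 "SELECT CURRENT SERVER, CURRENT TIMESTAMP FROM SYSIBM.SYSDUMMY1"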
4.4 System Utilization during Backup & Restore
The following NMON visualization graphs show the system utilization on the Power PCR BRL system during the full online database backup and the subsequent restore operation. The system remains healthy, with disk utilization averaging around 50-60% and CPU utilization below 5%.
5. Conclusion
Effective backup and restore strategies are essential for maintaining data resilience in modern analytics environments. On IBM’s Power Private Cloud Rack (PCR), these strategies must be designed not only for performance but also for precision—enabling fast, targeted recovery with minimal disruption to ongoing operations. By aligning backup planning with recovery objectives, leveraging local and replicated storage, and implementing structured backup schedules, Data Solution Professionals can ensure that critical data remains protected and recoverable. The real-world example presented in this post demonstrates how these best practices translate into measurable performance and operational confidence. As data volumes and complexity continue to grow, adopting a recovery-first mindset will be key to sustaining availability, integrity, and trust in enterprise data platforms.
About the Authors
John Bell is a Distinguished Engineer and Data Warehouse Architect at IBM, with over 25 years of experience in data warehousing and analytics. He has played a pivotal role in developing IBM's data warehouse solutions, including IBM’s Power10 Private Cloud Rack reference architecture. He can be reached at john.bell@ibm.com.
Jana Wong is the principal performance focal for Data Warehouse on-premise solutions at the IBM Silicon Valley Lab, with over 15 years of experience in Databases, SQL, QA, and Project Management. She holds a Master’s in Computer Science from the University of Rostock. Recently, she led the development and automation of a benchmark kit for validating IBM's Power10 Private Cloud Rack and played a key role in evaluating the performance of reference architectures such as IIAS/Sailfish and P10 PCR. Jana can be reached at jfitzge@us.ibm.com.
Peter Kokosielis is the manager of Db2 Performance Quality Assurance, Db2 Warehouse on Power Private Cloud Rack QA and Big Data and Data Virtualization QA. He has extensive experience at IBM in Db2 LUW database performance both in OLTP and Data Warehouse settings along with deep experience in platform exploitation on Power and Intel based processing architectures, hardware accelerators, virtualization and operating systems.