QRadar Console-Only Data Sync App – Failover and Failback Explained
1. Introduction:
Disaster recovery is not just about having a backup; it’s about ensuring your system can switch seamlessly when it matters most.
In this blog, you will learn how to confidently perform failover and failback in a QRadar Console-Only Data Synchronization setup, following the correct sequence and best practices.
By the end of this guide, you will be able to:
- Understand how Data Sync works between DC and DR
- Perform a controlled failover (DC → DR)
- Safely execute failback (DR → DC)
- Avoid common mistakes that can break synchronization
Important: Perform every step in this procedure as the admin user itself, not as another user that has admin privileges.
This guide is based on IBM’s official documentation but explained in a simple, practical, and step-by-step manner.
Understanding the Architecture
Before jumping into action, it’s important to understand how the setup works.
- Main Site (DC) → Active production system
- Destination Site (DR) → Standby system, waiting for activation
- Data Sync App → Handles backup transfer + restore
At any time:
- One site is ACTIVE
- The other is STANDBY
Workflow (Failover and Failback)
Installation (Understanding the Foundation):
Before performing any failover or failback activity, it is important to ensure that the environment is correctly set up. In a Console-Only Data Synchronization setup, both the Main site (DC) and the Destination site (DR) must have the Data Synchronization application installed and configured properly.
The installation must be performed on both consoles, and it is critical that both systems are running the same QRadar version and the same application version. Any mismatch here can lead to synchronization failures later.
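A quick way to catch a mismatch early is to compare the version strings from both Consoles side by side. The sketch below assumes you have already collected the two strings yourself, however you normally look them up. versions_match is a hypothetical helper name, not part of QRadar or the Data Synchronization app.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with QRadar): compares two version
# strings that you collected from the Main and Destination Consoles.
versions_match() {
    local main_version="$1" dest_version="$2"
    if [ -n "$main_version" ] && [ "$main_version" = "$dest_version" ]; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Run it once for the QRadar version and once for the app version, for example versions_match "<main_version>" "<destination_version>", and proceed only when both checks report a match.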
Once the application is installed, the first thing to verify is the status of the application on both sites. Both should show an “OK” status, indicating that the app is healthy and ready for operations.
At a deeper level, the configuration file (/opt/qradar/conf/dr.conf) defines the role of each system. The Main site should be marked as PRIMARY and ACTIVE, while the Destination site should be DR and STANDBY. This distinction is important because it determines how data synchronization and restoration behave during failover.
A sample dr.conf file is as below:
Verify the dr.conf file in your environment against the sample file provided above. If it does not match, it indicates that the configuration is not in the expected state. In such cases, the recommended approach is to perform a factory reset of the Data Sync App on both sites to restore a clean and consistent baseline.
Reference Link: https://www.ibm.com/docs/en/qradar-common?topic=app-implementing-factory-reset
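As a rough first pass before a line-by-line comparison, you can check that the role keywords described above appear in the file at all. This is only a sketch: conf_has_keywords is a hypothetical helper, and because the exact key names and layout of dr.conf are not reproduced here, it only looks for the keywords as whole words and does not parse the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if every given keyword appears
# as a whole word somewhere in the file. It does not parse key names,
# so treat a pass as a hint, not as proof of a correct configuration.
conf_has_keywords() {
    local conf_file="$1"; shift
    local word
    for word in "$@"; do
        if ! grep -qw -- "$word" "$conf_file"; then
            echo "missing: $word"
            return 1
        fi
    done
    echo "ok"
}
# Main site:        conf_has_keywords /opt/qradar/conf/dr.conf PRIMARY ACTIVE
# Destination site: conf_has_keywords /opt/qradar/conf/dr.conf DR STANDBY
```

A "missing" result is a signal to inspect the file manually before deciding on a factory reset.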
2. Failover (Moving from DC to DR)
Failover is the process of shifting operations from the Main site to the Destination site, typically during a disaster scenario or for testing DR readiness.
2.1 Prerequisites:
1. Break HA on the Console on both sites (if configured)
2. Take an on-demand backup on:
   - Main site
   - Destination site
3. Ensure the backup is successfully transferred between sites
4. Migrate apps from the App Host to the Console (if an App Host is used)
Before initiating failover, several critical preparation steps must be completed. One of the most important is breaking High Availability (HA) if it is configured. Failover cannot proceed correctly if HA is still active, as it interferes with role transitions.
Next, you must ensure that backup synchronization is working properly. This is done by taking an on-demand backup on both the Main and Destination sites and verifying that the backup is successfully transferred between them. This step is crucial because the failover process relies on the latest backup to restore the system state.
If your deployment includes an App Host, additional care is required. All applications must be migrated from the App Host to the Console, and their data must be backed up. This ensures that application data is not lost during the transition.
Transfer the app volume backup from the Main Site (DC) Console to the Destination Site (DR) Console by running the following command on the Main Site Console:
systemctl start app_sync
Verify the backup transfer on the Destination Site Console under the directory:
/store/app_sync/backups
If the transfer is unsuccessful or encounters issues, manually copy the app volume backup from the Main Site Console:
/store/apps/backup
to the Destination Site Console:
/store/app_sync/backups
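To confirm that something actually arrived, you can count the files in the target directory that were modified recently. The sketch below is an illustration only: recent_backup_count is a hypothetical helper, the 60-minute default window is an arbitrary assumption, and the commented scp line assumes you have root SSH access between the Consoles.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many regular files under a directory
# were modified within the last N minutes (default window: 60 minutes).
recent_backup_count() {
    local dir="${1:-/store/app_sync/backups}" minutes="${2:-60}"
    [ -d "$dir" ] || { echo "directory not found: $dir" >&2; return 1; }
    find "$dir" -type f -mmin "-$minutes" | wc -l | tr -d ' '
}
# On the Destination Site Console, after starting app_sync on the Main site:
#   recent_backup_count /store/app_sync/backups 60
# Manual fallback from the Main Site Console (assumes root SSH access):
#   scp /store/apps/backup/* root@<destination_site_ip>:/store/app_sync/backups/
```

A count of 0 shortly after the transfer suggests the backup did not arrive and the manual copy is needed.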
Prechecks:
Before proceeding further, running the prechecks is highly recommended. These checks validate whether the environment is ready for failover and help prevent failures during the process.
Reference Link (Performing Pre-checks):
https://www.ibm.com/support/pages/node/7244328
2.2 Step-by-Step Guide: Failover Process
Once everything is validated, the failover is initiated from the Destination site by selecting “Activate Destination Site” in the Data Synchronization App.
1. On the destination QRadar Console, click Admin > Data Synchronization app.
2. Open the app menu and select Activate destination site.
3. Click Activate, and then confirm the activation.
4. Click OK.
At this point, QRadar begins a controlled restoration process. The restoration process uses the latest synchronized backups between both sites to ensure consistency before activating the Destination site. The restoration starts on the main site first. Once that is complete, the destination site restoration begins using the latest backup from the main site. This two-step restoration ensures that both systems are aligned and consistent.
You can monitor this process in real time through the logs (/var/log/qradar.log), where restoration activity will be clearly visible.
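For a live view, tail -f /var/log/qradar.log | grep -i restore works well. If you prefer a quick snapshot instead, the sketch below prints the most recent restore-related lines. last_restore_lines is a hypothetical helper, and matching on the word "restore" is a simple filter rather than an official log format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the last N log lines that mention
# "restore" (case-insensitive). Defaults: /var/log/qradar.log, 20 lines.
last_restore_lines() {
    local log_file="${1:-/var/log/qradar.log}" count="${2:-20}"
    grep -i 'restore' "$log_file" | tail -n "$count"
}
# Example: last_restore_lines /var/log/qradar.log 10
```

Running it on both Consoles makes it easier to see that the Main site restoration finishes before the Destination site restoration begins.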
After the restoration is complete, a full deployment must be performed to apply all configuration changes across the system.
2.3 Re-establish Pairing Between Sites
Another important step after failover is to re-establish trust between the systems by running the pairing script on both sites. This ensures that backup transfers continue to work correctly in the new state.
On the main site:
/opt/ibm/si/dr/bin/dr_create_ssh.sh -i <destination_site_ip>
On the destination site:
/opt/ibm/si/dr/bin/dr_create_ssh.sh -i <main_site_ip>
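Because each site must be given the other site's IP address, it is easy to mix the two up. The sketch below simply prints the exact command to run on each Console from the two addresses you supply. pairing_commands is a hypothetical helper; it does not run the pairing script itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the pairing command for each site.
# It only echoes text; nothing is executed against either Console.
pairing_commands() {
    local main_ip="$1" dest_ip="$2"
    if [ -z "$main_ip" ] || [ -z "$dest_ip" ]; then
        echo "usage: pairing_commands <main_site_ip> <destination_site_ip>" >&2
        return 2
    fi
    # The main site pairs against the destination IP, and vice versa.
    echo "main site: /opt/ibm/si/dr/bin/dr_create_ssh.sh -i $dest_ip"
    echo "destination site: /opt/ibm/si/dr/bin/dr_create_ssh.sh -i $main_ip"
}
```

Copy each printed command to the matching Console and run it there.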
2.4 Restoring the Application
Before restoring application data on the destination console, ensure that the required applications are already installed on the destination site. If they are not installed, you will need to install them from the Destination Console by navigating to IBM QRadar Hub (formerly known as IBM QRadar Assistant) → Applications → Installed Extensions.
If applications were backed up earlier on the main site console, they can now be restored carefully on the Destination site console.
However, it is important not to restore the Data Synchronization App itself on the destination site console, as it maintains its own internal state.
Note: If the application is migrated to the App Host after the failover process is completed, there is no need to restore the application data, as it will already be present on the app host.
Before restoring anything, back up the existing app volume data on the Destination Console so that you can roll back without data loss if a restore goes wrong.
The application volume backups transferred from the Main site must be available on the Destination site. If required, copy the backup data from:
/store/app_sync/backups
to:
/store/apps/backup
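The copy itself can be sketched as a small helper. This is a minimal illustration, assuming the backups are regular files directly under the source directory; stage_app_backups is a hypothetical name, and the defaults are the two paths mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copies backup files from the sync directory to
# the app backup directory and prints how many files were copied.
# Assumes backups are regular files directly under the source directory.
stage_app_backups() {
    local src="${1:-/store/app_sync/backups}" dst="${2:-/store/apps/backup}"
    [ -d "$src" ] || { echo "source not found: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    local copied=0 f
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        # -p keeps timestamps so the newest backup stays identifiable
        cp -p "$f" "$dst"/ && copied=$((copied + 1))
    done
    echo "$copied"
}
# Example: stage_app_backups /store/app_sync/backups /store/apps/backup
```

Check free space on the target before copying large backups.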
After ensuring the backups are in place, restore only the required applications. It is recommended to start with smaller applications and avoid restoring everything at once. For environments with multiple or large applications, consider migrating them to an App Host before performing the restore to maintain system performance and stability.
Reference Link (Backing up and restoring app data):
https://www.ibm.com/docs/en/qsip/7.5.0?topic=applications-backing-up-restoring-app-data
During restoration, always follow the standard approach of using UUID-based restore methods as per IBM best practices. One critical point to remember is that the Data Synchronization application must not be restored on the Destination site. This application maintains its own internal state and is required for successful failback to the Main site.
If any application appears in an error state after restoration or after the failover/failback process, you can restart the applications using the following command:
/opt/qradar/support/qappmanager
https://www.ibm.com/support/pages/qradar-about-qappmanager-support-utility
2.5 License Allocation
Finally, license allocation must be reviewed. In a Console-Only setup, only license information is restored, so you must manually reassign license pools to ensure proper data processing.
Console Admin → System and License Management → Licenses → License Pool Management
At this stage, the Destination site is fully active, applications are restored correctly (excluding the Data Synchronization app), and license allocation is properly configured. This completes the failover process successfully.
3. Failback (Returning from DR to DC)
Failback is the process of restoring operations back to the Main site after the DR site has been active.
3.1 Prerequisites:
Before clicking anything, ensure:
- HA is broken on the Console on both sites (if configured)
- Backup transfer is working properly
- On-demand backups are taken on both sites
- Apps are migrated from App Host to Console (if applicable)
- Prechecks are completed
Similar to failover, failback also requires careful preparation. If HA is configured, it must again be broken before starting the process. If applications are running on an App Host, they should be migrated to the DR Console, and their data should be backed up.
You must again take on-demand backups on both sites and verify that the backup transfer is functioning correctly. This ensures that the Main site will receive the latest state during restoration.
An important requirement before failback is that the Main site should be in a clean state, typically having only the Console and no additional hosts configured.
Prechecks:
Prechecks should be executed again to validate readiness.
Reference Link (Performing Pre-checks):
https://www.ibm.com/support/pages/node/7244328
3.2 Failback to the main site
Failback is initiated from the Destination site by selecting “Failback to Main Site” in the Data Synchronization App. The system then performs a series of internal validations, such as verifying backup integrity, updating configurations, and preparing data synchronization.
Once initiated, QRadar transfers (synchronizes) the latest data from the Ariel database back to the Main site. The restoration process again follows a sequence: first restoring on the Main site, followed by restoration on the Destination site.
After restoration is complete, all data sources must be redirected back to the Main site, ensuring that log flow resumes correctly.
To finalize synchronization, the seal file must be cleared on the Main site. This allows proper resynchronization of data between the systems.
The final step is to reactivate the Main site from the Data Synchronization App. Once this is done and changes are deployed, the environment returns to its original state, completing the failback process.
Step-by-Step Guide: Failback Process
1. In the QRadar Console on the destination site, go to Admin > Data Synchronization App.
2. Open the app menu and select Failback to Main Site.
3. Click Perform Failback, and then confirm.
4. Click Continue.
After clicking Continue, the QRadar Data Synchronization process performs several validation steps to ensure the failback can proceed successfully. These include:
- Validating the transferred backup
- Re-validating the paired transferred backup
- Performing High Availability (HA) actions
- Saving the failback timestamp
- Verifying iptables rules on the main site
- Updating the QRadar Data Synchronization configuration
- Updating Ariel copy profiles and host mappings
Once all validations are completed successfully, click Next to proceed.
Once the process is complete, click Finish.
After clicking Finish, the Destination site begins sending data back to the Main site. You will receive a notification once the Ariel copy is completed.
Next, point all data sources that were directed to the destination site back to the main site.
After the Ariel copy is completed, the configuration restoration will begin, first on the main site, followed by the destination site.
You can verify that the restoration has started by checking the logs on the main site using the following command:
tail -f /var/log/qradar.log | grep -i restore
Once the restoration process is completed on both the main and destination consoles (first on the main site, followed by the destination site):
- Perform a full deployment on the main site
- Then perform a full deployment on the destination site
Note:
The deployment sequence must be maintained to ensure that managed hosts are properly rehomed to the main site.
3.3 Clear Seal Files (Data Resynchronization)
After the failback is completed, it is important to resynchronize the data between both sites. To achieve this, you must clear the seal files on the Main site Console. This ensures that previously copied data is synchronized properly and avoids inconsistencies.
Run the following script on the Main site Console within the same hour in which the failback was completed:
/opt/ibm/si/dr/bin/dr_clear_seal_files.sh
This step is mandatory to allow proper data synchronization between the Main and Destination sites.
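Since the timing matters, a small guard can help you avoid running the script too late. The sketch below compares two epoch timestamps and succeeds only when they fall in the same clock hour. same_hour is a hypothetical helper, it assumes GNU date, and it assumes you noted the failback completion time yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when two epoch timestamps fall in the
# same clock hour (local time), 1 when they do not, 2 on a bad input.
# Assumes GNU date (the "-d @<epoch>" input form).
same_hour() {
    local a b
    a=$(date -d "@$1" +%Y%m%d%H) || return 2
    b=$(date -d "@$2" +%Y%m%d%H) || return 2
    [ "$a" = "$b" ]
}
# Example, where failback_epoch is the completion time you recorded:
#   same_hour "$failback_epoch" "$(date +%s)" \
#       && /opt/ibm/si/dr/bin/dr_clear_seal_files.sh
```

If the check fails, review the IBM documentation for the seal file procedure before running the script outside that window.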
3.4 Reactivating the Main Site
After clearing the seal files, the Main site must be reactivated from the QRadar GUI.
Navigate to:
Admin → Data Synchronization App
From the application menu, select Reactivate Main Site, then click Reactivate and proceed by clicking Next.
Once the reactivation process is initiated, go to the Admin tab on both the Main site and Destination site Consoles and click Deploy Changes. After the deployment completes successfully, the Main site becomes active again, indicating that the failback process is complete.
3.5 Re-establish Pairing Between Sites
After reactivation, the pairing connection between the Main and Destination sites is removed automatically. To restore synchronization capability for future operations, you must manually re-establish the pairing.
Run the following command on the Main site Console:
/opt/ibm/si/dr/bin/dr_create_ssh.sh -i <destination_site_ip>
Then run the following command on the Destination site Console:
/opt/ibm/si/dr/bin/dr_create_ssh.sh -i <main_site_ip>
This step ensures secure communication is re-established between both sites.
3.6 Application Restore on Main Site
Applications installed on an App Host are not automatically restored or migrated during failover or failback. If applications from the Destination site are required on the Main site, they must be reinstalled manually.
You can reinstall applications from:
Main Console → IBM QRadar Hub (QRadar Assistant) → Applications → Installed Extensions
Ensure that the application versions are identical on both sites to avoid compatibility issues.
Note: If the application is migrated to the App Host, there is no need to restore the application data, as it will already be present on the host.
Before restoring applications, take a backup of the existing app volume data on the Main site Console. Then verify that the required backup files are available. If needed, copy the application backup data from:
/store/app_sync/backups
to:
/store/apps/backup
Restore only the necessary applications, preferably starting with smaller ones. For larger environments, consider migrating applications to an App Host before restoration to maintain system performance.
During restoration, always follow the standard UUID-based restore method. Do not restore the Data Synchronization application, as it is required to maintain system state and support future failover/failback operations.
If any application enters an error state after restoration, restart it using:
/opt/qradar/support/qappmanager
3.7 License Reconfiguration
In a Console-Only setup, only the license key information is restored during failback. The managed hosts retain their existing event and flow limits defined within the license.
Because of this, you must manually reconfigure the license pool allocation to ensure proper event and flow processing.
Navigate to:
Console Admin → System and License Management → Licenses → License Pool Management
Update the license allocation as per your environment requirements.
At this stage, the Main site is fully active again, applications are restored as required, and data synchronization between sites is functioning correctly. The environment is now back to its original state, completing the failback process successfully.
4. References:
Reference for the blog: https://www.ibm.com/docs/en/qradar-common?topic=app-qradar-console-only-dr
Troubleshooting Guide Data Synchronization App: https://www.ibm.com/support/pages/node/7246280
Prechecks: https://www.ibm.com/support/pages/node/7244328
Data Synchronization App FAQs: https://www.ibm.com/support/pages/qradar-data-synchronization-app-faq
If you have any questions about the topics discussed or would like to explore these capabilities further, please feel free to reach out to us for a detailed discussion.
Author - Saket Nimdeokar (saket.nimdeokar1@ibm.com)
- Priya Thonge (priya.thonge@ibm.com)
Reviewer - Nitin Sarode (nitin.sarode@ibm.com)
- Vishal Tangadkar (vishal.tangadkar1@ibm.com)