Guy
As far as I knew until I read your post, TSAMP was the way to integrate CDC with DB2 HADR; I will have to enquire internally about Pacemaker.
Having shareable disk is always a good start for CDC HA/DR, as it means the standby system takes control of the authoritative CDC installation (binaries, metadata, etc.) in the event of a failover. The other essential in all scenarios is to have the CDC server accessible by a virtual IP address which is owned by whichever server is acting as the active CDC host. IP address and hostname changes can be accommodated at failover, but the procedure is more complex (and in a scripted failover would require chcclp scripting to change the datastore hostname property and propagate it into the metadata).
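For illustration, the re-pointing step in a scripted failover might look something like this (a sketch only: the datastore name, hostnames, ports and credentials are placeholders, and I am quoting the Access Manager commands from memory, so verify them against your chcclp version):

    # connect to Access Server as an admin user
    connect server hostname accesshost port 10101 username admin password *****;
    # re-register the datastore under the standby host's address
    delete datastore name CDC_DB2;
    add datastore name CDC_DB2 hostname standby.example.com port 10901;
    disconnect server;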
The other essential, of course, is the bookmark. For a target database the bookmark is held in the database itself, so a failover is normally not a problem: if the target database is up to date on the standby, a restart after the failover will resume from where replication left off. If the replication between the primary and standby database instances is asynchronous, or a hard failure has resulted in some data loss, the bookmark is still in line with the state of the database it sits in, even if it points to an earlier position on the source. As long as the source transaction logs for that position are still available (and perhaps the source staging store is cleared), replication will restart.
Obviously you will need a backup strategy as well to underpin whatever HA is in place. A regular backup of the CDC metadata held in the PointBase database under the instance is very useful, using the
dmbackupmd
command (to be run while the instance is active). This creates a copy of the metadata database (the mdb and wal files) in a conf/backup bxx directory, where xx is an incrementing number, and this latest copy can be backed up to your regular archive or save media. Just be aware that if you use the copy to restore (simply overwrite the existing files under instance/<instance name>/conf) and you have had mapping changes around that time, you may end up with inconsistencies between source and target metadata unless you also have a backup of the source metadata from the same point relative to the mapping change. But this will enable you to recover from a situation where a hard failure (such as running out of disk on the CDC file system) has corrupted the PointBase database. Note that the other PointBase databases in the instance directory (events, statistics, etc.) are sub-critical; if corrupted they can simply be deleted to allow the instance to start and create empty databases.
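For example, a nightly job could run something along these lines (a minimal sketch: the installation path, instance name and archive location are assumptions for illustration, and the backup directory naming may differ slightly in your build):

    # illustrative paths, adjust to your environment
    CDC_HOME=/opt/ibm/cdc
    INSTANCE=cdcprod

    # back up the instance metadata (the instance must be active)
    "$CDC_HOME/bin/dmbackupmd" -I "$INSTANCE"

    # archive the newest backup directory created under conf
    LATEST=$(ls -1dt "$CDC_HOME/instance/$INSTANCE/conf/backup"* | head -1)
    tar -czf /archive/cdc-md-$(date +%Y%m%d).tar.gz "$LATEST"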
Of course you will also want to back up the full installation directory, both as a regular (say weekly) occurrence and specifically before (and after) applying a new build of CDC. In theory the instance should be inactive; in practice I don't think it matters too much.
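A single archive command is enough for that (again a sketch, assuming the installation lives under /opt/ibm/cdc):

    tar -czf /archive/cdc-install-$(date +%Y%m%d).tar.gz /opt/ibm/cdc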
Finally, I would recommend a regular export of subscriptions. In a production environment there should not be any ad hoc DDL changes, and if there are, you will probably find that replication stops anyway. You can also generate a chcclp script if you want to minimise the script's sensitivity to column details, as I mentioned in my response to your other current post.
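For reference, such a script comes out looking roughly like this (from memory, so treat it as a sketch: the Access Server details, datastore names, subscription and table mapping are placeholders, and the Management Console script generator will give you the exact syntax for your version):

    chcclp session set to cdc;
    set verbose;

    connect server hostname accesshost port 10101 username admin password *****;
    connect datastore name SRC_DS context source;
    connect datastore name TGT_DS context target;

    add subscription name SUB1;
    add table mapping sourceSchema APPSCH sourceTable CUSTOMER
      targetSchema APPSCH targetTable CUSTOMER type standard;

    disconnect server;
    exit;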
So with all these restore options available, what is the procedure in case of a devastating crash? I suppose something like the following:
Capture the target bookmarks from the target database. As I noted in my earlier response, you can query the target database for this (the sketch after these steps shows the commands involved).
On the source, you may want to ensure that the earliest open position in the current target bookmark is still available from the archived transaction logs. The dmdecodebookmark command will convert the bookmark into a series of LSNs, SCNs, etc.; database-dependent tools can then be used to relate the log position to archived log files.
Restore the source or target CDC environment as necessary, either by a re-install or by a restore/copy-back of your installation directory (and of the instance directory if it is not under the installation directory). If necessary, supplement this by restoring the metadata database or by recreating the subscriptions from an XML import or a chcclp script.
Perform Mark Table Capture Point for all tables in all subscriptions to ensure that none are in refresh status (unless you do need to refresh, because the required logs are not available on the source, for example).
Run the dmsetbookmark command on the source, using the bookmarks obtained from the target, for all subscriptions (unless performing a full refresh or an external unload/load).
Start replicating.
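Pulling those steps together, the commands involved look something like this (a sketch only: the instance, datastore and subscription names and the bookmark value are placeholders, and exact flags can vary by version, so check the command references):

    # on the target instance: retrieve the current bookmark for a subscription
    dmshowbookmark -I tgtinst -s SUB1

    # on the source instance: decode it into LSNs/SCNs to check log availability
    dmdecodebookmark -I srcinst -b <bookmark value from above>

    # on the source instance: set the bookmark before restarting
    dmsetbookmark -I srcinst -s SUB1 -b <bookmark value from above>

and in chcclp, per subscription, mark the capture points and restart:

    chcclp session set to cdc;
    connect server hostname accesshost port 10101 username admin password *****;
    connect datastore name SRC_DS context source;
    connect datastore name TGT_DS context target;
    select subscription name SUB1;
    select table mapping sourceSchema APPSCH sourceTable CUSTOMER;
    mark capture point;
    start mirroring;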
You have a complex environment, so this sort of recovery process, done as a last resort across the entire CDC implementation, would probably take a day or two to work through (plus any additional time for refreshes or external unload/loads). In your situation I would plan out how I would tackle it, document it, and get business buy-in, stressing that this is only preparing for the very worst and most extreme scenario. Then you can be sure that, as long as your extreme recovery plan is up to date, things will go as smoothly and as predictably as possible.
Of course, we hope that the automated HA/DR procedures you have in place will be sufficient, in which case the CDC failover should take only a few minutes.
Hope this helps
Robert
------------------------------
Robert Philo
------------------------------
Original Message:
Sent: Tue February 22, 2022 10:49 AM
From: Guy Przytula
Subject: CDC and disaster recovery
Just wanted to know if anyone has good ideas for disaster recovery with CDC.
We have Linux servers with Db2 LUW in HADR/TSA mode.
On both servers we mounted an NFS file system on which we installed CDC.
We integrated CDC into TSA with custom scripts (like Db2) so that it is dependent on the database; if the database takes over, CDC takes over also.
It is working as it should, but:
we noticed that the future of TSA is limited and it will be replaced by Pacemaker.
I have looked at and installed Pacemaker, but judging by the discussions, integrating CDC will be almost impossible.
Is anybody using this, or any other solution that would suit an HADR implementation?
How do you prepare for DR?
We export all subscriptions, Access Server users/datastores, datastore configs, and a backup of the CDC metadata.
But what more needs to be saved? For example, the bookmark table:
as this table is always in use, is it a good idea to export these tables?
What if a crash occurred and you needed to re-install CDC, create instances, and reload the metadata and subscriptions: would you need a full refresh for all tables?
We have about 350 subscriptions with a total of 2000 tables replicated from a central database to 5 distributed databases, in both directions.
Any ideas/links for DR would be welcome.
------------------------------
Thanks for all answers
Best Regards,
Guy Przytula
------------------------------
#DataIntegration
#DataReplication