Db2 HADR is a disaster recovery solution which uses "log shipping" to keep the remote (standby) instance in sync. Slow HADR can cause problems upstream and stall or slow down applications to the point where the system becomes unusable. Since this solution has a lot of moving parts, it can be difficult to diagnose and troubleshoot issues. In this blog entry, I'll try to address that.
In order to understand a slow HADR system, knowledge of the HADR architecture and the current HADR sync mode is an absolute must. Let me try and explain both of these as I understand them :-)
First, let me start with Sync modes:
The sync modes are set in the database configuration ( HADR_SYNCMODE ) and they are ( a quick example of checking and changing the mode follows the list ):
1) SYNC -- Both Primary AND Standby are completely in sync, i.e. a transaction is deemed committed ONLY if it has made it to the disks of both P and S. Under the hood, the commit is written to disk on P, the log buffer is sent to S, S writes it to disk and sends an ACK back to P.
2) NEARSYNC ( Default ) -- A transaction is deemed committed when P gets an ACK from S that it has received the log buffer ( into memory, not necessarily onto disk ). Under the hood, the log buffer is handed to the db2hadrp thread to ship to the S server when a commit is about to happen on P. If the network is fast enough, the overhead of getting the ACK is minimal, and committing on P and sending the log buffer to S are almost simultaneous.
3) ASYNC -- A transaction is committed on P and, at the same time, the log buffer is sent across to S. P does not have to wait for an ACK from S to commit.
4) SUPERASYNC -- Transaction log writing and shipping of logs are totally independent. The two servers P and S will never be in PEER state; the S server will always be in REMOTE_CATCHUP state.
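As a quick illustration, here is a minimal sketch of checking and changing the sync mode through the database configuration. The database name SAMPLE is just an assumption, and a sync mode change typically only takes effect after HADR is stopped and restarted:

# Show the current HADR settings, including HADR_SYNCMODE
db2 get db cfg for SAMPLE | grep -i HADR

# Change the sync mode ( requires an HADR stop/start to take effect )
db2 update db cfg for SAMPLE using HADR_SYNCMODE NEARSYNC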
In order to understand the sync modes in detail please read this excellent blog.
The next thing to know is which thread ( EDU ) does what within the engine ( a quick way to list these EDUs is shown after the Standby list ):
On the Primary side:
- db2hadrp -- Main HADR thread. It is responsible for log shipping and receiving ACKs from standby
- db2loggw -- This is the normal Log Writer thread which writes to the logs.
- db2lfr -- This is the thread which reads the Log Files ( required in SuperAsync mode )
On the Standby side:
- db2hadrs -- The main HADR thread which receives the logs from the primary and sends ACKs back to the primary. This thread is also responsible for writing the log records to the Standby's logs. There is one such thread per database
- db2shred -- This is the shredder thread which receives the log buffer from db2hadrs and shreds them into individual log records to be applied to the tablespaces. There is one such thread per database
- db2redom -- This is the Redo Master thread which receives the log records from db2shred and manages the log records. There is one such thread per database
- db2redow -- These are the Redo Worker threads which are assigned log records by the db2redom thread and replay them into the tablespaces. There are multiple such threads per database.
- db2lfr -- This is the log file reader which is sometimes required to read old log files and feed the records to the db2shred thread.
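To see these EDUs on a live system, db2pd can list them. A minimal sketch ( the grep pattern is just an illustration ):

# List all EDUs for the instance and pick out the HADR / log / redo related ones
db2pd -edus | grep -iE "hadr|loggw|lfr|shred|redo"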
With the above knowledge in hand, you can attempt to figure out what is happening to slow down HADR.
For example, in NEARSYNC mode, if you are seeing a slowdown on the Primary server ( the symptom is usually a higher number of applications in CommitActive state ), you know that an ACK is required from the Standby to commit a transaction on the Primary. So either the Standby is slow in sending the ACK for whatever reason, or there is something on the wire which is slowing down the ACK packet. You know the db2hadrs and db2hadrp threads are involved in that exchange, so you can take a quick db2trc of those EDUs ( db2trc on -l 512m -t -p <db2sysc pid>.<db2hadrp|db2hadrs EDU id for AIX or Thread id for Linux> ) and then see the flow of how P sends the packet to S, how quickly S sends the ACK and when P receives it. You don't really need source code knowledge to follow this flow ( the function names are intuitive ). You can then figure out where the slowdown is, i.e. is the Standby slow in sending the ACK, or is P receiving it much later than when S sent it, meaning there is something on the wire which requires tuning.
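Here is a minimal sketch of that trace capture. The PID and EDU id are placeholders you substitute from db2pd -edus, and the 60-second window is just an assumption:

# Find the db2sysc PID and the EDU id of db2hadrp ( on P ) or db2hadrs ( on S )
db2pd -edus | grep -i hadr

# Trace only that EDU for a short window while the slowdown is happening
db2trc on -l 512m -t -p <db2sysc pid>.<EDU id>
sleep 60
db2trc dmp hadr.trc.dmp
db2trc off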
Ok, now that you have kind of an idea of what can be done, let us get down to the nitty gritty.
What are the tools at hand to debug or set up HADR
To test network bandwidth, ideal LOGBUFSZ, TCP send and receive buffer sizes, and HADR buffer size:
- simhadr ( Read this blog for more information )
( I feel using this tool beforehand will ensure that HADR is tuned optimally right from the beginning rather than scrambling when an issue is hit. In my opinion, anyone who is planning to set up HADR should use this tool first and then go about setting up HADR. )
OS
- vmstat
- iostat
- netstat
- tprof ( AIX ), perf or oprofile ( Linux ) -- when CPU usage spikes up on the server ( a sample collection sketch follows this list )
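A minimal sketch of collecting these in the background on both servers while the slowdown is in progress ( the 5-second interval and 60 samples are assumptions, and the iostat flags shown are the Linux form; AIX flags differ ):

# CPU, memory and run queue
vmstat 5 60 > vmstat.out 2>&1 &
# Disk I/O
iostat -x 5 60 > iostat.out 2>&1 &
# TCP statistics ( e.g. retransmissions ), taken before and after the window
netstat -s > netstat.before.out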
Db2
- db2pd ( -hadr, -dpsprcb, -dpsdbcb -- the last 2 are internal options, typically run on the standby, and generate binary output )
- MON_GET_HADR table function
- db2trc
- db2fodc -hadr ( for data collection on Primary and Standby at the same time ).
- Typical HADR tuning variables
- db2set
- DB2_HADR_BUF_SIZE
- This controls the standby's log receive buffer size. The needed size depends on the anticipated log generation rate and the duration of congestion. The default is 2x the Primary's log buffer size ( LOGBUFSZ ).
- DB2_HADR_SOSNDBUF / DB2_HADR_SORCVBUF
- These control the TCP socket send and receive buffer sizes. When the send buffer is bigger than or equal to the size of a log flush, only one send() call is needed to ship the log buffer. You can read the simhadr blog, where you will find a tool named "db2flushsize" to calculate the flush size for your system ( a db2set sketch follows this list ).
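A minimal sketch of setting these registry variables; the values shown are purely illustrative and should come out of your own simhadr / db2flushsize runs, and registry changes generally require an instance restart to take effect:

# Standby log receive buffer ( value is an assumption )
db2set DB2_HADR_BUF_SIZE=16384
# TCP socket send / receive buffers, set on both servers ( values are assumptions )
db2set DB2_HADR_SOSNDBUF=1048576
db2set DB2_HADR_SORCVBUF=1048576
# Verify what is set
db2set -all | grep DB2_HADR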
HADR monitoring
From Db2 version 10.1 onwards, you can use either db2pd -hadr or db2 "select * from table(mon_get_hadr(null))" to see what is happening within HADR ( a sample query pulling just the key fields follows below ). The main things to look at are:
- HADR_STATE -- PEER is the best state. For SUPERASYNC, however, it will always be REMOTE_CATCHUP
- HADR_CONNECT_STATUS -- CONNECTED / DISCONNECTED
- HADR_CONNECT_STATUS_TIME -- If the status is CONNECTED, this shows the time when the connection was established; if the status is DISCONNECTED, it shows when HADR got disconnected
- PRIMARY_LOG_TIME -- The timestamp of the last log record on Primary
- STANDBY_LOG_TIME -- The timestamp of the last log record on the Standby. From this you can figure out how far behind the Standby is.
- HADR_LOG_GAP -- This shows, in bytes, how far the Standby is behind the Primary. If this number stays high constantly in a non-SUPERASYNC mode, it means the Standby is not able to keep up with the Primary or has slowed down for whatever reason. The type of log records being applied also matters, as some of them block at higher levels, i.e. tablespace, database, etc. A tablespace-blocking record ( such as adding an extent to an object, or merging index leaves as done by a reorg ) cannot be parallelized, so all tables within that tablespace have to wait before proceeding. The worst is a database-blocking record ( such as log migration ), where everything has to stop until that record is processed and only then can the Standby proceed.
- STANDBY_RECV_BUF_PERCENT -- Reports the percentage of the Standby's receive buffer in use. If it reaches 100%, the Primary will block on sending further log buffers till there is some space in the buffer. However, if Standby spooling is set up, this is not a problem as the Standby will start spooling the log buffers for replay.
- LOG_HADR_WAIT_CUR -- How long the primary logger has been blocked waiting on log replication. This is important as it tells you the Standby is not able to receive log buffers from the Primary. If this number keeps growing, you have a problem on your hands.
There is an excellent blog on HADR monitoring which goes into this in much more detail.
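As promised above, here is a minimal monitoring sketch that pulls just the fields discussed. The database name SAMPLE is an assumption; note that MON_GET_HADR can only be queried on the Primary or on a reads-enabled Standby, otherwise use db2pd -hadr there:

db2 connect to SAMPLE
db2 "SELECT HADR_STATE, HADR_CONNECT_STATUS, HADR_LOG_GAP,
            STANDBY_RECV_BUF_PERCENT, LOG_HADR_WAIT_CUR,
            PRIMARY_LOG_TIME, STANDBY_LOG_TIME
      FROM TABLE(MON_GET_HADR(NULL))"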
What to do when HADR is slow
1) First off, start by monitoring either db2pd -hadr or MON_GET_HADR and check the fields of interest specified above.
2) If you see that the Standby is slowing down ( the log gap is increasing, the standby receive buffer is close to 100% ) and you know for certain that HADR is tuned properly as far as DB2_HADR_BUF_SIZE and DB2_HADR_SOSNDBUF/SORCVBUF are concerned -- in other words, you have run simhadr and db2flushsize to set the parameters correctly -- initiate data collection:
- Option#1: db2fodc -db <dbname> -hadr -host all -- this will invoke the script db2cos_hadr on both the Primary and the Standby
- Option#2: If the event is short lived and you are sure it was an HADR Congestion issue, then you can use the following:
- db2fodc -db <dbname> -hadr -detect -- It will keep an eye on HADR congestion and, the moment it happens, it will trigger data collection on both Primary and Standby ( more info at this link )
- Option#3: Run the script collectPrimary.ksh <dbname> [ <outputdir> ] on Primary, collectStandby.ksh <dbname> [ <outputdir> ] on Standby
- Option#4: If the issue is hit at odd times, you can automate the data collection using socket scripts which can talk to each other across servers
- HADRTriggerPrimary.pl
- HADRTriggerStandby.pl
- Since these scripts follow a client/server architecture, you have to start HADRTriggerStandby.pl first on the Standby. Syntax is:
- HADRTriggerStandby.pl -serverport <port#> -scriptname <scriptname> [ -scriptargs <args> ] -- where you choose a free port on the server to which the other side can connect. Example syntax: HADRTriggerStandby.pl -serverport 90010 -scriptname /tmp/collectStandby.ksh -scriptargs sample
- Now start HADRTriggerPrimary.pl on Primary. Syntax is:
- HADRTriggerPrimary.pl -remhost <host> -remport <port> -dbname <dbname> -scriptName <name> [ -scriptargs <args> -num <num app> -trigger <time> -sleep <time> -appstatus <status> ] -- where you provide the standby host name, and port# you started HADRTriggerStandby.pl with. Example syntax:
- HADRTriggerPrimary.pl -remhost remote.server.com -remport 90010 -dbname sample -scriptname /tmp/collectPrimary.ksh -scriptargs sample -num 10 ( with default values for trigger, sleep and appstatus, this means it will wake up every 1.5 seconds to check on the number of applications in CommitActive state; if they go beyond 10, it'll trigger data collection on both Primary and Standby ).
- The main difference between Option#2 and Option#4 is that Option#2 looks only for congestion, whereas Option#4 is a bit more flexible and the user can provide a specific application status which they observe as a sign of an HADR slowdown.
For the adventurous, you can attempt to format the db2trc output and look at the stacks to see if any bottleneck can be spotted ( use the db2trc perfrep option if on 10.1, 10.5 or 11.1 ). However, if you have been able to collect the above data, you are already in much better shape to provide relevant data for the PMR.
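If you do go down that path, here is a minimal formatting sketch, assuming the dump file name hadr.trc.dmp from the capture example earlier:

# Chronological flow of function entry / exit per EDU, with timestamps
db2trc flw hadr.trc.dmp hadr.trc.flw -t
# Fully formatted trace records
db2trc fmt hadr.trc.dmp hadr.trc.fmt
# Per-function performance report ( perfrep option on 10.1, 10.5 and 11.1 )
db2trc perfrep hadr.trc.dmp hadr.trc.perfrep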
I hope this blog post has helped you in some way to find your way through HADR slowdowns.
DISCLAIMER: The scripts provided in this blog are as-is without any support.