Db2

Db2

Connect with Db2, open source, and other data experts to gain value from your data, share insights, and solve problems.

 View Only

db2pd: The hidden gem! 

Wed January 29, 2020 03:05 PM

Rajib Sarkar (rssarkar)
STSM Db2 Support

July 18, 2016.

Those who are familiar with the Informix database must have used a utility named "onstat" to monitor the database, do some cool tricks by looking at the innards of the Informix engine. The db2pd tool is very similar to that and there are a lot of cool things you can do with this tool some of which are externally documented and some of them are undocumented ( the reason being if misused or wrongly used can be catastrophic for the DB2 engine ). Since, my blog is more "performance" biased, I'll discuss a few cool things you can do with db2pd tool to figure out a performance bottleneck. Let me say one thing at the outset, if you are looking for some kind of nice kind of report from this tool, then this might not be the tool of choice for you. This tool is more like providing a quick insight into the shared memory of the DB2 engine and the outputs can be a bit cryptic and it takes a bit of time to get used to it. But, I can tell you, once you get used to it, there's no peer to this tool .. :-). There is one more great thing about this tool ( perhaps the greatest ) i.e. when your engine is hung and no connections are going through, this is the ONLY tool which can work and get you the state of the engine at the time of hang. Now, that the preamble is out of the way .. let me get down to the nitty-gritty.

 

The following scenarios are based on my experience working in support. These are pretty much the most common "perf" scenarios customers typically face and I'll show you how you can get to the root cause just by using db2pd tool.

 

1) High CPU situation -- A typical situation is CPU spikes or high CPU after migrating to new release or upgrading hardware etc.

My suggestion will be to get a vmstat profile first to see what kind of CPU ticks are being accumulated more i.e. is it %usr or %sys. If you are used to vmstat output and would like to get a similar output on Windows systems running DB2, you can use the db2pd -vmstat option.

  1. a) If %sys CPU is high

This means that for whatever reason, more system calls are being made and system is not doing useful work i.e. executing user/application code but spending more time running system calls. Typically, I deem it as a problem when %sys is 1.5 - 2x or more higher than %usr. So, the trick is to find out what function call is consuming those ticks. If DB2 is driving it, it can be 2 things i.e.

  1. i) I/O calls or Network calls OR
  2. ii) Latch calls

Since, the I/O system calls are outside the "application", OS tools such as  curt ( AIX ), for Linux, oprofile or perf ( good examples in: http://www.brendangregg.com/perf.html and https://perf.wiki.kernel.org/index.php/Tutorial ) or if you are an advanced user and want to be adventurous try systemtap ( it is like dTrace or probevue ) and can write a program to profile the system for system calls ). I did not recommend strace on Linux for the reason that tracing a multi-threaded process like DB2's db2sysc can cause serious impact on the performance of it. The other tools are much less impact choices.

 

For latch calls, you can use the command:

db2pd -latches -rep 1 50 -file latches.log > /dev/null 2>&1 and it will run the command 50 times, 1 second apart. A typical output will be like:

Database Member 0 -- Active -- Up 3 days 17:08:40 -- Date 2016-07-18-15.12.55.033746

Latches:
Address            Holder     Waiter     Filename             LOC        LatchType            HoldCount
0x0780000001340478 1029       0          ../include/sqle_workload_disp.h 1391       SQLO_LT_sqeWLDispatcher__m_tunerLatch 1
 

When there is a problem i.e. latch calls causing high %sys, you should see values in "Waiter" column. The numbers in the "Holder" and "Waiter" columns are the EDU ids which you can co-relate to Application handles using

  1. a) db2pd -db <dbname> -apinfo|egrep "EDU|AppHan"|paste - - |grep <eduid> OR
  2. b) db2pd -agents|awk -v "edu=13880" '/^0x/ { if( $4 == edu ) print "Apphandle: "$2; }' 

 

To ease your work, there are 2 scripts attached:

1) latchToQueryIdentifier.pl -- find out which query is driving the latch contention.

2) parseDb2pdLatch.pl        -- Parse the db2pd -latches output to find out the latch contention.

 

Please note, these scripts are provided as-is without any support.

 

  1. b) If %usr CPU is high

This means that user code is driving the CPU high. After you have done away with identifying the top CPU consuming process ( top output or ps aux or topas ) and narrowed it down to db2sysc process, you can run the following db2pd command to figure out which EDU / agent is consuming the most CPU and then find out which query is driving the high cpu usage.

 

1) To identify the top CPU consuming EDUs ( DB2 v9.7 FP5 and higher ):

db2pd -db <dbname> -edus interval=5 top=10 -file topEdus.txt > /dev/null &

2) Collect the application information to find out the query driving the problem.

db2pd -db <dbname> -apinfo -rep 2 5 -file apinfo.txt > /dev/null 2>/dev/null &

 

Notice the '&' sign at the end. It is to run both the commands at the same time, as it would be required to co-relate the outputs to find out the query or queries. You may require to run the above multiple times to figure out the queries if there are multiple queries which are driving the high CPU. Then of course, the next step being running it through Query Tuner if you are using DSM or get an explain plan and see what can be tuned to make the high CPU consumption go away. Typically, large hash joins are CPU intensive, but there can be other reasons too.

 

2) High number of concurrent queries slowing down the database performance

Sometimes it may so happen, some query which has a bad plan is running at very high numbers concurrently on the database which is making it run slow. If the query is run in isolation, it runs very quick but when run in large numbers simultaneously, it causes problems. You can find out the offending query using:

 

db2pd -db <dbname> -active -rep 2 5 -file active.txt > /dev/null

 

A sample output looks like:

Database Member 0 -- Database SAMPLE -- Active -- Up 3 days 18:42:10 -- Date 2016-07-18-16.46.34.408112

Active Statement List:
Address            AppHandl [nod-index] UOW-ID     StmtID     AnchID StmtUID    EffISO      EffLockTOut EffDegree   EntryTime           StartTime           LastRefTime
0x0700000049604C80 6881     [000-06881] 1          1          377    1          1           -1          0           Mon Jul 18 16:46:28 Mon Jul 18 16:46:28 Mon Jul 18 16:46:34

 

Key things to look at are:

  1. i) Apphandle
  2. ii) Anch ID

iii) StmtUID

  1. iv) Start Time
  2. v) Last Ref Time

 

Note: To identify the query running with highest concurrency, you can run:

db2pd -db <dbname> -active | awk '/^0x/ { print $6","$7; }' | sort | uniq -c | sort -nr

 

The Anch ID and StmtUID combination uniquely identifies the query in the cache. You can then run the following to get the statement.

 

db2pd -db sample -dyn|awk -v "anchid=377" -v "stmtid=1" '/Dynamic SQL Statements:/,/Dynamic SQL Environments:/ { if( match($0,"^0x") && ( $2 == anchid ) && ( $3 == stmtid )  ) print $0; }'

 

You can also use the db2pd -active output to find out long running queries ( The watchThreshold script I provided in one of my earlier blogs uses the same mechanism ) by doing a diff between the "Start Time" and the time shown in the header line of the db2pd output.

 

3) Find out lock waits in the database

The command to figure out the lock wait situation is:

db2pd -db <dbname> -wlocks [ detail ]

 

The <detail> suboption is not available prior to 10.5 releases. It provides the table name as well for the lock. If that option is not available, there are other ways to figure out the table name.

 

A sample output is:

Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:02:28 -- Date 2016-07-18-17.16.41.521224

Locks being waited on :
AppHandl [nod-index] TranHdl    Lockname                   Type       Mode Conv Sts CoorEDU    AppName  AuthID   AppID
30       [000-00030] 420        0002000F000000000000000452 RowLock    ..X       G   3086       db2bp    RSARKAR  *LOCAL.rsarkar.160718211437
32       [000-00032] 397        0002000F000000000000000452 RowLock    ..U       W   15166      db2bp    RSARKAR  *LOCAL.rsarkar.160718211532

 

Apphandle 32 is waiting on Apphandle 30. It is trying to get a Row lock in "U<pdate>" mode but apphandle is holding an 'X' lock on that row which is why it is waiting. If I use the <detail> suboption I can find out on which table the issue is happening.

$ db2pd -db sample -wlocks detail

Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:03:46 -- Date 2016-07-18-17.17.59.830245

Locks being waited on :
AppHandl [nod-index] TranHdl    Lockname                   Type       Mode Conv Sts CoorEDU    AppName  AuthID   AppID                           TableNm            SchemaNm AppNode
30       [000-00030] 420        0002000F000000000000000452 RowLock    ..X       G   3086       db2bp    RSARKAR  *LOCAL.rsarkar.160718211437      STAFF              RSARKAR  db2debug
32       [000-00032] 397        0002000F000000000000000452 RowLock    ..U       W   15166      db2bp    RSARKAR  *LOCAL.rsarkar.160718211532      STAFF              RSARKAR  db2debug
 

So, it is on table STAFF. In order to find out the queries run by the app handle, I can use the following command:

db2pd -db sample -apinfo 30

db2pd -db sample -apinfo 32

 

db2pd -db sample -apinfo 30 results in:

Application :
  Address :                0x0780000001DE0080
  AppHandl [nod-index] :   30       [000-00030]
  TranHdl :                420
  Application PID :        56755468
  Application Node Name :  db2debug
  IP Address:              n/a
  Connection Start Time :  (1468876459)Mon Jul 18 17:14:19 2016
  Client User ID :         rsarkar
  System Auth ID :         RSARKAR
  Coordinator EDU ID :     3086
  Coordinator Member :     0
  Registered Agents :      1
  Active Agents :          1
  Locks timeout value :    NotSet
  Locks Escalation :       No
  Workload ID :            1
  Workload Occurrence ID : 1
  Trusted Context :        n/a
  Connection Trust Type :  non trusted
  Role Inherited :         n/a
  Application Status :     UOW-Waiting
  Application Name :       db2bp
  Application ID :         *LOCAL.rsarkar.160718211437
  ClientUserID :           n/a
  ClientWrkstnName :       n/a
  ClientApplName :         n/a
  ClientAccntng :          n/a
  CollectActData:          N
  CollectActPartition:     C
  SectionActuals:
  UOW start time :         07/18/2016 17:15:15.911638
  UOW stop time :
 

The application handle is in UOW Waiting status, however, as can be seen, it has a open UOW which started @17:15:15.

Application :
  Address :                0x0780000002FA0080
  AppHandl [nod-index] :   32       [000-00032]
  TranHdl :                397
  Application PID :        23074544
  Application Node Name :  db2debug
  IP Address:              n/a
  Connection Start Time :  (1468876532)Mon Jul 18 17:15:32 2016
  Client User ID :         rsarkar
  System Auth ID :         RSARKAR
  Coordinator EDU ID :     15166
  Coordinator Member :     0
  Registered Agents :      1
  Active Agents :          1
  Locks timeout value :    NotSet
  Locks Escalation :       No
  Workload ID :            1
  Workload Occurrence ID : 2
  Trusted Context :        n/a
  Connection Trust Type :  non trusted
  Role Inherited :         n/a
  Application Status :     Lock-wait
  Application Name :       db2bp
  Application ID :         *LOCAL.rsarkar.160718211532
  ClientUserID :           n/a
  ClientWrkstnName :       n/a
  ClientApplName :         n/a
  ClientAccntng :          n/a
  CollectActData:          N
  CollectActPartition:     C
  SectionActuals:
  UOW start time :         07/18/2016 17:16:31.350722
  UOW stop time :

  List of current activities :
    Activity ID :                                 1
    UOW-ID :                                      1
    Package schema :                              NULLID
    Package name :                                SQLC2K26
    Package Version :
    Consistency Token :                           AAAAAfAd
    Section number :                              203
    Statement number :                            1
    Isolation :                                   CS
    Effective degree :                            0
    Actual degree    :                            1
    Sourece ID :                                  0
    Cursor ID :                                   0
    Nesting level :                               0
    Invocation ID :                               0
    Package cache ID :                            0x0000000102910000
    Anchor ID :                                   657
    Statement UID :                               1
    SQL Type :                                    Dynamic
    Statement Type :                              DML, Insert/Update/Delete
    Statement :                                   update staff set years = 10 where id = '200'
    Entry time :                                  07/18/2016 17:16:31.351389
    Local start time :                            07/18/2016 17:16:31.351391
    Last reference time :                         07/18/2016 17:16:31.351389

 

Application handle was running an update statement on staff for a specific ID and is in Lock-wait status. So, an educated guess can be made here that application handle 30 must have run some DML statement to hold an X lock on the row but has not yet committed which is why the lock wait is happening. So, if that user has walked away from the workstation, then the best alternative is to do a "force application" on that application handle 30 to roll back the transaction and let application handle 32 proceed.

 

Note: if <detail> suboption is not available. You can use the MON_FORMAT_LOCK_NAME function where available as:

db2 "SELECT SUBSTR(NAME,1,20) AS NAME,SUBSTR(VALUE,1,50) AS VALUE FROM TABLE( MON_FORMAT_LOCK_NAME('0002000F000000000000000452')) as LOCK" OR

The tablespace id and table id is embedded in the lock name. The first 2 bytes store the tablespace id and next 2 bytes store the table id. You can then issue the command:

db2pd -db <dbname> -tcbstat <tablespace id> <table id> e.g. in this case db2pd -db sample -tcbstat 2 15

$ db2pd -db sample -tcbstat 2 15

Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:14:51 -- Date 2016-07-18-17.29.04.517807

TCB Table Information:
Address            TbspaceID TableID PartID MasterTbs MasterTab TableName          SchemaNm ObjClass DataSize   LfSize     LobSize    XMLSize
0x0700000048F553D8 2         15      n/a    2         15        STAFF              RSARKAR  Perm     1          0          0          0

TCB Table Stats:
Address            TableName          SchemaNm Scans      UDI        RTSUDI               PgReorgs   NoChgUpdts Reads      FscrUpdates Inserts    Updates    Deletes    OvFlReads  OvFlCrtes  PgDictsCrt CCLogReads StoreBytes BytesSaved
0x0700000048F553D8 STAFF              RSARKAR  3          35         35                   0          0          70         0           0          35         0          0          0          0          0          -          -
 

4) Miscellaneous things you can do with db2pd

 

  1. i) Find out how much log space is being used by each transaction -- db2pd -db <dbname> -transactions ( look at the SpaceReserved column ).
  2. ii) Find out the transaction throughput -- db2pd -db <dbname> -transactions | egrep "commit|rollback" -- the numbers are cumulative, so you will have to do the math to find

                                                                                                          per second values.

iii) Find out query throughput -- db2pd -db <dbname> -workload | egrep "WorkloadName|ActCompleted"     -- Numbers are cumulative, so math has to be done to find out per sec

                                                                                                          values.

**Note: With the availability of db2 "call monreport.dbsummary(60)" the throughput values ( query and transactions ) are much more easier to find rather than scripting to find it out of db2pd values.

 

  1. iv) db2pd -everything  -- It collects ALL information about the instance and is useful for support when they are trying to figure out what was going on in the instance during a problem. The db2fodc -perf/hang tools do collect this automatically.

 

  1. v) db2pd -db <dbname> -dirty summary -cleaners -- To find out how the page cleaners are working to keep the buffer pools clean. If you see its filling up too rapidly and page cleaners are not able to keep up, maybe it is time to increase the number of page cleaners.

 

  1. vi) db2pd -db <dbname> -bufferpool -- The bufferpool performance ( hit ratio, I/O speeds and throughput ).

 

vii) db2pd -stack all -- To take stacks of the ALL the agents in the engine. If you want to take a specific EDU stack, just append the EDU id to the command. By default, it dumps the stack file ( *stack.txt ) file in DIAGPATH but if you want to change it, you can use:

db2pd -stack all dumpdir=<dir you want the stacks to dump>

 

Stacks are VERY VERY important in hang like scenarios, as that tells the support engineers in which code path the agent is stuck. It is useful in slow perf issues too but it needs to be used judiciously as it does have an impact.

 

There are many more commands ( db2pd -h shows you a help screen ) which the support engineers use to debug any kind of issue in the engine. It'll need a whole book to put all of it in detail .. :-)

 

In conclusion, I would say, this tool is not for someone who wants GUI based outputs or wants to resolve issues with a few clicks of mouse ( there are tools such as DSM, Optim which do this very well ) but it is for someone who is more of a awk/sed/grep person and not afraid to do quick one line scripts to parse the data. I'm personally biased towards this tool as I am an ex-Informixer :-) and am used to a typical character based output and doing greps/awk/sed etc.

 

I hope the not so brief primer on db2pd was useful for you. Please feel free to comment.

 


#Db2

Statistics
0 Favorited
24 Views
0 Files
0 Shares
0 Downloads

Comments

Tue July 25, 2023 11:30 AM

The links to the scripts is outdated and broken.
Can you please give an updated link for them.