Finding Problem SQL in Db2's Package Cache

By Ember Crooks posted Tue June 11, 2019 10:41 PM

  
Without any additional tooling, there's a fair amount of performance information available in Db2. One of the most neglected areas I see is finding and addressing problem SQL. 

The Package Cache

The package cache is an area of memory that Db2 uses to store the queries it generates access plans for (compiles). This allows for Db2 to reuse the access plans for frequently-executed SQL rather than re-compiling them every time. In addition to the information that Db2 needs internally, the package cache stores performance data about SQL statements across all executions of that statement. This can be useful for finding not just the 1000-lb gorilla, but also the swarm of locusts in the database.

The disadvantage is that Db2 decides when to evict or retain a statement, so some statements in the package cache may have been there for only seconds and an execution or two, while others might have been there for the last 285 days, since the last time Db2 was restarted. This disadvantage can be mitigated with a package cache eviction event monitor, or with the more intensive historical monitoring available through tools like DSM or DBI.
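A package cache event monitor captures statement metrics at eviction time, so data is not lost when Db2 ages a statement out. A minimal sketch of creating one follows; the monitor and table names here are illustrative, not from the original post:

```sql
-- Hypothetical example: capture statement metrics as entries are
-- evicted from the package cache, writing to an unformatted event
-- (UE) table. Monitor and table names are placeholders.
CREATE EVENT MONITOR PKGCACHE_MON
    FOR PACKAGE CACHE
    WRITE TO UNFORMATTED EVENT TABLE (TABLE PKGCACHE_EVENTS);

-- Activate the monitor so it begins collecting.
SET EVENT MONITOR PKGCACHE_MON STATE 1;
```

The UE table can later be unpacked with the EVMON_FORMAT_UE_TO_TABLES procedure for analysis alongside live package cache data.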

SQL to Find Problem SQL

I've been using this SQL, or a version of it, for a number of years to find problem SQL running in the Db2 databases I support:
WITH SUM_TAB (SUM_RR, SUM_CPU, SUM_EXEC, SUM_SORT, SUM_NUM_EXEC) AS (
        SELECT  FLOAT(SUM(ROWS_READ)),
                FLOAT(SUM(TOTAL_CPU_TIME)),
                FLOAT(SUM(STMT_EXEC_TIME)),
                FLOAT(SUM(TOTAL_SECTION_SORT_TIME)),
                FLOAT(SUM(NUM_EXECUTIONS))
            FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T
        )
SELECT
        SUBSTR(STMT_TEXT,1,10) as STATEMENT,
        ROWS_READ,
        DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) AS PCT_TOT_RR,
        TOTAL_CPU_TIME,
        DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) AS PCT_TOT_CPU,
        STMT_EXEC_TIME,
        DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) AS PCT_TOT_EXEC,
        TOTAL_SECTION_SORT_TIME,
        DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) AS PCT_TOT_SRT,
        NUM_EXECUTIONS,
        DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) AS PCT_TOT_NUM_EXEC,
        DECIMAL(FLOAT(STMT_EXEC_TIME)/NULLIF(FLOAT(NUM_EXECUTIONS),0),10,2) AS AVG_EXEC_TIME
    FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T, SUM_TAB
    WHERE DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) > 10
        OR DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) >10
        OR DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) >10
        OR DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) >10
        OR DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) >10
    ORDER BY ROWS_READ DESC FETCH FIRST 20 ROWS ONLY WITH UR;
While it's a bit complicated, this SQL statement looks at each statement in the context of the other statements in the package cache, and returns rows only for those statements using a critical portion (more than 10%) of the resources consumed by all statements in the package cache. One critical piece missing from the query above is the full statement text: it returns only the first 10 characters so the data displays in a more tabular format. I often run this second statement at the same time to get the full text of the statements the first query surfaces:
WITH SUM_TAB (SUM_RR, SUM_CPU, SUM_EXEC, SUM_SORT, SUM_NUM_EXEC) AS (
        SELECT  FLOAT(SUM(ROWS_READ)),
                FLOAT(SUM(TOTAL_CPU_TIME)),
                FLOAT(SUM(STMT_EXEC_TIME)),
                FLOAT(SUM(TOTAL_SECTION_SORT_TIME)),
                FLOAT(SUM(NUM_EXECUTIONS))
            FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T
        )
SELECT
        STMT_TEXT
    FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T, SUM_TAB
    WHERE DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) > 10
        OR DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) >10
        OR DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) >10
        OR DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) >10
        OR DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) >10
    ORDER BY ROWS_READ DESC FETCH FIRST 20 ROWS ONLY WITH UR;


It is also important to consider which categories a given statement shows up in. I usually spend the most time on statements with high ROWS_READ and high CPU time. Depending on whether this is an OLTP environment or an analytics environment, the number of executions of a statement can influence what is important. In an OLTP environment, time spent tuning problem statements that are executed only once may not be useful; it may be better to figure out where they come from and consider preventing them. In an analytics environment, tuning the one-time statements can be useful if they're representative.
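One way to set the one-off statements aside in an OLTP environment (my own variation, not part of the original queries) is to add a predicate on NUM_EXECUTIONS, for example:

```sql
-- Hypothetical variation: focus on statements executed more than once.
SELECT SUBSTR(STMT_TEXT, 1, 40) AS STATEMENT,
       NUM_EXECUTIONS,
       ROWS_READ
    FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) AS T
    WHERE NUM_EXECUTIONS > 1   -- skip one-off statements
    ORDER BY ROWS_READ DESC
    FETCH FIRST 20 ROWS ONLY WITH UR;
```

The same predicate could be added to the WHERE clause of the larger queries above.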


What to do With Problem SQL

Identifying the problem queries is really just the first step; from there, each statement needs to be analyzed and tuned.

Index Read Efficiency is a metric that is useful for tracking progress toward slaying bad SQL over time.
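Index Read Efficiency is simply rows read divided by rows returned. A minimal sketch of computing it at the database level is below; the use of MON_GET_WORKLOAD here is my own choice of monitoring function, and a value near 1 is the ideal for OLTP, while large values suggest scans that indexes could eliminate:

```sql
-- Hypothetical sketch: database-level Index Read Efficiency,
-- i.e. rows read per row returned, summed across workloads.
SELECT DECIMAL(FLOAT(SUM(ROWS_READ)) /
               NULLIF(FLOAT(SUM(ROWS_RETURNED)), 0), 10, 2)
        AS INDEX_READ_EFFICIENCY
    FROM TABLE(MON_GET_WORKLOAD(NULL, -2)) AS T
    WITH UR;
```

Tracking this number over time shows whether the tuning work is actually reducing the read burden on the database.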