Made from concentrate, but 100% statement 

Mon March 09, 2020 05:53 PM

Posted by: Serge Rielau

Motivation

Some years ago I visited a business partner to help them overcome some performance problems on DB2.
They had a rather simple quality-assurance workload which ran on a simple Windows PC.
On another DBMS that workload executed without a problem and with acceptable throughput.
On DB2 however that same workload completely overwhelmed the machine and throughput was, frankly, abysmal.
Needless to say the partner was not too amused.
We quickly discovered that the CPU was pegged at 100%.
We further discovered that DB2 saturated the CPU through compilation of SQL statements.
A dump of the package cache showed that it was flooded with simple UPDATE statements, each identical to the next except for the input values.
That in itself could not explain why the CPU usage was so high; after all, the workload was identical to the one run against the competitor's DBMS.
It turned out that there was a recursive update trigger using inline SQL PL.
So what looked like a simple statement was actually quite complex.
The other DBMS did not have the concept of inline SQL PL, so it did not have to pay for trigger compilation on every statement.
Once the problem was known the fix was simple:
With parameter markers there was no need to keep compiling SQL statements.
The inline SQL PL then provided an additional boost, propelling DB2's performance past the other DBMS.
While the outcome was good, the partner needed to do work to achieve acceptable performance.

Lesson learned: DB2 must run really bad SQL really fast.

 
In DB2 9.7 two features were added which are meant to fix the issue we encountered:
  1. Compiled SQL PL triggers reduce the compilation complexity of UPDATE, INSERT and DELETE statements to that of the statement itself.
    I discussed inline SQL PL vs. compiled SQL PL in two earlier blog entries.
  2. A statement concentrator detects patterns of similar statements to reduce compilations.
    This is the topic I will discuss today.

A Problem Scenario

It is a best practice when working with DB2, or most any other DBMS, to use parameter markers or host variables when submitting dynamic SQL.
DB2 maintains a cache of recently executed SQL statements.
Every time a new SQL statement is submitted DB2 will match it to the existing cache entries.
If a match is found then there is no need to parse and compile.
DB2 can execute the SQL statement right away.
Unfortunately, many application developers do not follow that practice.
Even more unfortunate is that several abstraction APIs which generate SQL under the covers also do not use parameter markers.
Let's take a look at what happens by setting up a simple experiment.
VARIABLE elapsed INTEGER;
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);  
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  WHILE i < 50000 DO
    SET txt = 'SELECT c1 + 1 FROM (VALUES(' || i || ')) AS T(c1)';
    PREPARE stmt FROM txt;
    OPEN cur;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/
PRINT elapsed
42

COLUMN STMT_TEXT FORMAT A80
SET WRAP OFF
SET LINESIZE 200
SELECT VARCHAR(SUBSTR(STMT_TEXT, 1, 80)) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL)) 
  WHERE STMT_TEXT LIKE 'SELECT c1%' FETCH FIRST 10 ROWS ONLY;
STMT_TEXT                                                                       
--------------------------------------------------------------------------------
SELECT c1 + 1 FROM (VALUES(49993)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49998)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49894)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49940)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49963)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49925)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49906)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49968)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49994)) AS T(c1)                                     
SELECT c1 + 1 FROM (VALUES(49898)) AS T(c1)                         

10 rows were retrieved.
So 50,000 executions of this simple SQL statement take about 42 seconds.
And each statement was compiled and inserted into the package cache as a separate entry.
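If you want to see how badly the cache was flooded, a quick count against the same MON_GET_PKG_CACHE_STMT function used above will do (a minimal sketch; depending on the size of your package cache, some of the 50,000 entries may already have been evicted, so the count is a lower bound):
SELECT COUNT(*) AS NUM_ENTRIES
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL))
  WHERE STMT_TEXT LIKE 'SELECT c1%';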

The best fix

The above scenario represents bad SQL programming.
The correct way to write SQL is to use parameter markers and bind the values when the query is being executed.
What happens when we use a parameter marker?
FLUSH PACKAGE CACHE DYNAMIC;

BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);  
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  WHILE i < 50000 DO
    SET txt = 'SELECT c1 + 1 FROM (VALUES(CAST(? AS INTEGER))) AS T(c1)';
    PREPARE stmt FROM txt;
    OPEN cur USING i;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/
PRINT elapsed
2

SELECT VARCHAR(SUBSTR(STMT_TEXT, 1, 80)) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL)) 
  WHERE STMT_TEXT LIKE 'SELECT c1%' FETCH FIRST 10 ROWS ONLY;
STMT_TEXT                                                                       
--------------------------------------------------------------------------------
SELECT c1 + 1 FROM (VALUES(CAST(? AS INTEGER))) AS T(c1)
That's a 20-fold improvement when using parameter markers!
Here DB2 matches the content of txt against the previously compiled plan in the cache and simply assigns it to the stmt handle.
We can do even better by doing that work ourselves: we pull the PREPARE outside the loop:
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);  
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  SET txt = 'SELECT c1 + 1 FROM (VALUES(CAST(? AS INTEGER))) AS T(c1)';
  PREPARE stmt FROM txt;
  WHILE i < 50000 DO
    OPEN cur USING i;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/
PRINT elapsed;
1
A finer-grained measurement would be better here.
We could use a technique discussed earlier in this blog to subtract the timestamps exactly, as sketched below.
But this last part isn't really the point of the investigation.
What we want to achieve is to speed up this test without changing the application!
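As an aside, here is one possible sketch of such an exact subtraction, reporting the elapsed time in milliseconds (the variable name elapsed_ms and the placement of the workload are illustrative only):
VARIABLE elapsed_ms INTEGER;
BEGIN
  DECLARE start_ts TIMESTAMP;
  DECLARE end_ts   TIMESTAMP;
  SET start_ts = CURRENT TIMESTAMP;
  -- ... the loop under test would go here ...
  SET end_ts = CURRENT TIMESTAMP;
  -- Break the timestamps into days, seconds and microseconds and subtract each part
  SET :elapsed_ms = (DAYS(end_ts)             - DAYS(start_ts))             * 86400000
                  + (MIDNIGHT_SECONDS(end_ts) - MIDNIGHT_SECONDS(start_ts)) * 1000
                  + (MICROSECOND(end_ts)      - MICROSECOND(start_ts)) / 1000;
END;
/
PRINT elapsed_ms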

Using the Statement Concentrator

The statement concentrator is a feature that was introduced in DB2 9.7.
When the concentrator is turned on, every dynamic SQL statement text which cannot be matched to an entry in the cache undergoes some very basic parsing.
During this parsing DB2 finds constant literals, such as 'Hello' or 5, and replaces them with typed parameter markers.
So:
VALUES (1) 
gets replaced with the equivalent:
VALUES CAST(? AS INTEGER)
DB2 will then attempt to match the modified SQL statement text once more.
If a match is found then DB2 will execute the cached statement and supply the extracted constant literals.
If no match is found DB2 will compile the statement, place it in the cache and then execute it.

The easiest way to turn on the statement concentrator is through the DB configuration. 
This setting is online and will affect any subsequent compilations from any session:
UPDATE DB CFG USING STMT_CONC LITERALS;
FLUSH PACKAGE CACHE DYNAMIC;
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(1000);  
  DECLARE res INTEGER;
  DECLARE start TIMESTAMP;
  DECLARE stmt STATEMENT;
  DECLARE cur CURSOR FOR stmt;
  SET start = CURRENT TIMESTAMP;
  WHILE i < 50000 DO
    SET txt = 'SELECT c1 + 1 FROM (VALUES(' || i || ')) AS T(c1)';
    PREPARE stmt FROM txt;
    OPEN cur;
    FETCH cur INTO res;
    CLOSE cur;
    SET i = i + 1;
  END WHILE;
  SET :elapsed = CURRENT TIMESTAMP - start;
END;
/
PRINT elapsed
4

SELECT VARCHAR(SUBSTR(STMT_TEXT, 1, 80)) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL)) 
  WHERE STMT_TEXT LIKE 'SELECT c1%' FETCH FIRST 10 ROWS ONLY;
STMT_TEXT                                                                       
--------------------------------------------------------------------------------
SELECT c1 + :L0 FROM (VALUES(:L1 )) AS T(c1)     
As you can see the workload now executes many times faster: 4 seconds instead of 42.
But the statement concentrator is no complete substitute for using parameter markers; the shallow parsing carries an overhead, which is why this run is still slower than the 2 seconds we measured with true parameter markers.
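To see what happened underneath, we can reuse MON_GET_PKG_CACHE_STMT once more; with the concentrator enabled, all 50,000 executions should fold into that single cache entry, which its NUM_EXECUTIONS column lets us verify (a small sketch):
SELECT NUM_EXECUTIONS,
       VARCHAR(SUBSTR(STMT_TEXT, 1, 60)) AS STMT_TEXT
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, NULL))
  WHERE STMT_TEXT LIKE 'SELECT c1%';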

Rules

There are several rules to consider when dealing with the statement concentrator:
  • Since the statement concentrator replaces literals with parameter markers, the optimizer loses the ability to derive plans based on the actual values.
    For this reason the statement concentrator is generally not recommended for complex queries such as those used in data warehousing.
  • If a parameter marker is already present within the original query, no further substitutions will be made by the concentrator (see the sketch after this list).
    The assumption is that the remaining literals are there for a reason.
    This includes output parameter markers, as in SET ? = (SELECT ...) constructs.
  • The database configuration parameter STMT_CONC with the value LITERALS is only used if there is no explicit client-side setting for statement concentration.
    Some such settings are:
    • the SQL_ATTR_STMT_CONCENTRATOR attribute in CLI/ODBC,
    • the setDBStatementConcentrator method in JDBC,
    • or the statementConcentrator connection property.
  • It is generally safe to enable statement concentration in OLTP workloads that do not use parameter markers.
    The reason is that query plans for OLTP queries typically do not depend on the actual literal values provided.
  • Detrimental effects of literal replacement can, to some degree, be mitigated through the use of REOPT(ONCE).
    This bind option uses the constant literals of the first statement inserted into the cache as representative values for the optimizer.
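To illustrate the rule about pre-existing parameter markers, here is a small sketch (the STAFF table is from the SAMPLE database and the values are made up; the :Ln names mirror the format we saw in the package cache above):
-- No parameter markers: both literals are candidates for concentration,
-- so the statement can match a cached entry of the form shown underneath it
SELECT name FROM staff WHERE dept = 20 AND job = 'Clerk'
SELECT name FROM staff WHERE dept = :L0 AND job = :L1

-- A parameter marker is already present: the remaining literal is assumed
-- to be intentional and is left untouched by the concentrator
SELECT name FROM staff WHERE dept = ? AND job = 'Clerk'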

#Db2
