Db2

Db2

Connect with Db2, open source, and other data experts to gain value from your data, share insights, and solve problems.

 View Only

Making SQL PL and PL/SQL sensitive to changes in statistics, and more 

Mon March 09, 2020 05:31 PM

Posted by: Serge Rielau

Motivation

Recently one of our major business partners submitted the following feature request:
 
"Please make all the SQL in my SQL Routines dynamic!"

Why, I asked, would you want to do that? If you make the SQL dynamic at lot of bad things would happen to your SQL:
  • No dependencies are recorded anymore.
    So you do not know anymore with which objects the routine interacts
  • SQL executes under "invoker's rights"
    That is the invoker of the routine or trigger would need the authorization to execute the SQL within.
  • There will be no more "conservative binding"
    Every time an SQL statement gets executed it would do so being compiled based on the CURRENT PATH, CURRENT SCHEMA, and CURRENT TIMESTAMP in effect at the time.
    That means there is no assurance the SQL would operate on the same table or invoke the same routines when executed multiple times.
  • The execution plan may change between two invocations if statistics were updated on a table and/or the configuration of the memory is changed

There are probably more side-effects when switching from static SQL to dynamic. But these are the ones I can come up with immediately.

The answer I got was interesting:

"I'm OK with all conservative binding semantics, but I want to be sensitive to statistics changes.
Since the database is empty when the routines are created.
The execution plans will nearly guaranteed be sub-optimal.
"
 
So what this vendor wants is for SQL within routines and triggers to be recompiled whenever there is a change to any of the used objects.  
This may be an update to the statistics or perhaps the addition of an index.
In case of dynamic SQL such operations cause so called "soft invalidation".
That is the cached SQL plans for the statements which depend on the changed objects cannot be reused for new invocations.
So the first new invocation of the the statement will cause a recompilation placing an updated version of the plan in the cache.
 
Well, if that's all you want fro static SQL, then DB2 can actually do that today.
Here I describe how.

Scenario

As always we'll work of a running example. The following schema defines two tables "EMP" and "DEPT".
We then create a procedure which fetches an employee based on the emp id.
The *cough* "complexity" lies in the fact that there is a join.
SET SCHEMA = SAMPLE;
SET PATH = SAMPLE, CURRENT PATH;
CREATE TABLE emp (id INTEGER NOT NULL, name VARCHAR(15), deptid INTEGER NOT NULL);
CREATE TABLE dept(id INTEGER NOT NULL, name VARCHAR(10));
CREATE OR REPLACE PROCEDURE get_emp(IN  id   ANCHOR TO emp.id,
                                    OUT name ANCHOR TO emp.name,
                                    OUT dept ANCHOR TO dept.name)
SPECIFIC GET_EMP
  SELECT emp.name, dept.name INTO name, dept
    FROM emp, dept
   WHERE emp.id = get_emp.id
     AND dept.id = emp.deptid;
Insert some data.
INSERT INTO dept VALUES(1, 'Marketing');
INSERT INTO emp VALUES(1, 'Jones', 1);
Test the procedure.
CALL get_emp(1, ?, ?);
Value of output parameters
--------------------------------
NAME = Jones
DEPT = Marketing

Exposing the problem

Using my SQL PL profiler we can easily display the executed plan:
CALL PROFILE.ENABLE_ACTIVITY_MONITOR();
CALL get_emp(1, ?, ?);
CALL PROFILE.DISABLE_ACTIVITY_MONITOR();
CALL PROFILE.GET_ACTIVITY_ACTUALS(?);

              Rows
           Rows Actual
             RETURN
             (   1)
              Cost
               I/O
               |
                1
                1
             HSJOIN
             (   2)
             13.5704
               NA
         /-----+------\
        1                1
        1                1
     TBSCAN           TBSCAN
     (   3)           (   4)
     6.78499          6.78516
       NA               NA
       |                |
        1                1
       NA               NA
 TABLE: SAMPLE    TABLE: SAMPLE 
      DEPT              EMP
       Q1               Q2
A small problem. I had intended to start with a nested-loop join and argue that this would be a bad join choice as the number of rows increases.
DB2 outwitted me (again).
But there are less subtle ways to "encourage" a plan changes:
ALTER TABLE emp ADD PRIMARY KEY (id);
ALTER TABLE dept ADD PRIMARY KEY (id);
ALTER TABLE emp ADD FOREIGN KEY (deptid) REFERENCES dept(id);

CALL PROFILE.ENABLE_ACTIVITY_MONITOR(); CALL get_emp(1, ?, ?); CALL PROFILE.DISABLE_ACTIVITY_MONITOR(); CALL PROFILE.GET_ACTIVITY_ACTUALS(?);
              Rows
           Rows Actual
             RETURN
             (   1)
              Cost
               I/O
               |
                1
                1
             HSJOIN
             (   2)
             13.5704
               NA
         /-----+------\
        1                1
        1                1
     TBSCAN           TBSCAN
     (   3)           (   4)
     6.78499          6.78516
       NA               NA
       |                |
        1                1
       NA               NA
 TABLE: SAMPLE    TABLE: SAMPLE 
      DEPT              EMP
       Q1               Q2
Nothing has changed! DB2 did not pick the new indices.
None of the changes we did to the table forced DB2 to invalidate the plan, so it didn't.
That's the premise of static SQL.
In order to get DB2 to re-consider the plan we must tell it to do so explicitly:
CALL PROFILE.REBIND_ROUTINE('SAMPLE', NULL, 'GET_EMP'); 
CALL PROFILE.ENABLE_ACTIVITY_MONITOR();
CALL get_emp(1, ?, ?);
CALL PROFILE.DISABLE_ACTIVITY_MONITOR();
CALL PROFILE.GET_ACTIVITY_ACTUALS(?);

                             Rows
                          Rows Actual
                            RETURN
                            (   1)
                             Cost
                              I/O
                              |
                               1
                               1
                            ^NLJOIN
                            (   2)
                            13.5709
                              NA
                  /-----------+-----------\
                 1                           1
                 1                           1
              FETCH                       FETCH
              (   3)                      (   5)
              6.78546                     6.78546
                NA                          NA
           /----+----\                 /----+----\
          1             1             1             1
          1            NA             1            NA
       IXSCAN    TABLE: SAMPLE     IXSCAN    TABLE: SAMPLE 
       (   4)          EMP         (   6)         DEPT
      0.0149777        Q2         0.0149777        Q1
         NA                          NA
         |                           |
          1                           1
         NA                          NA
   INDEX: SYSIBM               INDEX: SYSIBM 
 SQL120820083059340          SQL120820083059420
         Q2                          Q1
So in DB2's preferred mode of operation it is your responsibility to rebind static SQL after updating statistics or adding indexes.
But what if I want DB2 to take care of itself? I have already turned on the self tuning memory manager and automatic runstats collection.
I'm happy how that works for dynamic SQL and I want DB2 to take care of all my SQL PL or PL/SQL as well.

Can VALIDATE RUN do the job?

My original comeback to the vendor was to try VALIDATE RUN.
This BIND option will delay compilation of a static SQL statement until first execution - I thought.
So that will solve the problem.
Unfortunately during development of this scenario I discovered that I had missed a crucial property of VALIDATE RUN.
VALIDATE RUN is a fail-over mechanism only.
That is, only SQL that would otherwise raise a bind time error gets pushed out.
The purpose of VALIDATE RUN is to cross your fingers and hope by the time the SQL statement executes a missing object such as a user temporary table is available.
If your SQL has no such errors, which can be assumed, the SQL will be just as static as without VALIDATE RUN.

What VALIDATE RUN can't REOPT ONCE can.

The idea of using a BIND option however does hold water. We just have to pick a different option.
REOPT ONCE and REOPT ALWAYS are options which delay optimization of an SQL statement until the input values are available.
The idea is that given a specific set of input values a better plan can be generated.
  • REOPT ONCE will do this optimization only once - as the name implied.
    This option is often used in OLTP environments
  • REOPT ALWAYS will recompile the statement for every invocation.
    This option is only useful for reporting queries where the execution time greatly eclipses the compilation time.
When defining SQL Routines, triggers or even anonymous blocks there is no direct way to set the BIND options within the syntax.
Instead you set bind options either at an instance level using the DB2_SQLROUTINE_PREPOPTS variable or on a session basis using the SET_ROUTINE_OPTS()  procedure.
Let's redo our scenario using REOPT ONCE.
DROP TABLE SAMPLE.emp;
DROP TABLE SAMPLE.dept;
SET SCHEMA = SAMPLE;
SET PATH = SAMPLE, CURRENT PATH;
CREATE TABLE emp (id INTEGER NOT NULL,
name VARCHAR(15),
deptid INTEGER NOT NULL);
CREATE TABLE dept(id INTEGER NOT NULL,
name VARCHAR(10));
Enable REOPT ONCE for the following routine definitions.
Note that, despite the name, compiled triggers and anonymous blocks will also be affected.
CALL SET_ROUTINE_OPTS('REOPT ONCE');
CREATE OR REPLACE PROCEDURE get_emp(IN  id   ANCHOR TO emp.id,
                                    OUT name ANCHOR TO emp.name,
                                    OUT dept ANCHOR TO dept.name)
SPECIFIC GET_EMP
  SELECT emp.name, dept.name INTO name, dept
    FROM emp, dept
   WHERE emp.id = get_emp.id
     AND dept.id = emp.deptid;
It's good practice to clean up after ourselves.
CALL SET_ROUTINE_OPTS('');

INSERT INTO dept VALUES(1, 'Marketing');
INSERT INTO emp VALUES(1, 'Jones', 1);

CALL get_emp(1, ?, ?);
Value of output parameters
--------------------------------
NAME = Jones
DEPT = Marketing
We don't expect any changes yet. The plan should be identical to the original scenario:
CALL PROFILE.ENABLE_ACTIVITY_MONITOR();
CALL get_emp(1, ?, ?);
CALL PROFILE.DISABLE_ACTIVITY_MONITOR();
CALL PROFILE.GET_ACTIVITY_ACTUALS(?);

              Rows
           Rows Actual
             RETURN
             (   1)
              Cost
               I/O
               |
                1
                1
             HSJOIN
             (   2)
             13.5704
               NA
         /-----+------\
        1                1
        1                1
     TBSCAN           TBSCAN
     (   3)           (   4)
     6.78499          6.78516
       NA               NA
       |                |
        1                1
       NA               NA
 TABLE: SAMPLE    TABLE: SAMPLE 
      DEPT              EMP
       Q1               Q2
Now, this is where it gets interesting:
ALTER TABLE emp ADD PRIMARY KEY (id);
ALTER TABLE dept ADD PRIMARY KEY (id);
ALTER TABLE emp ADD FOREIGN KEY (deptid) REFERENCES dept(id);
CALL PROFILE.ENABLE_ACTIVITY_MONITOR();
CALL get_emp(1, ?, ?);
CALL PROFILE.DISABLE_ACTIVITY_MONITOR();
CALL PROFILE.GET_ACTIVITY_ACTUALS(?);

                             Rows
                          Rows Actual
                            RETURN
                            (   1)
                             Cost
                              I/O
                              |
                               1
                               1
                            ^NLJOIN
                            (   2)
                            13.5709
                              NA
                  /-----------+-----------\
                 1                           1
                 1                           1
              FETCH                       FETCH
              (   3)                      (   5)
              6.78546                     6.78546
                NA                          NA
           /----+----\                 /----+----\
          1             1             1             1
          1            NA             1            NA
       IXSCAN    TABLE: SAMPLE     IXSCAN    TABLE: SAMPLE 
       (   4)          EMP         (   6)         DEPT
      0.0149777        Q2         0.0149777        Q1
         NA                          NA
         |                           |
          1                           1
         NA                          NA
   INDEX: SYSIBM               INDEX: SYSIBM 
 SQL120820081204820          SQL120820081204900
         Q2                          Q1
Success! The plan changed without the need to rebind.

For the record

REOPT ONCE does not affect the semantics of static SQL:
  • The SQL executes under the authorization id of the routine definer
  • Any functions created after the creation of the routine cannot be seen in the routine's SQL statements
  • Any columns added to tables after the fact will not be visible to the SQL statement
But what about dependency recording?
One of the nice things about static SQL is that you can follow the who's using whom chains:
SELECT BNAME FROM SYSCAT.ROUTINEDEP 
WHERE ROUTINENAME = 'GET_EMP' AND BTYPE ='T'; BNAME -------------------------------------------------- EMP DEPT
There is one more upside to using REOPT ONCE with SQL PL.
While the granularity of invalidation on a schema change is a package in static SQL, using REOPT ONCE will cause only directly affected statements to be invalidated.
This will greatly reduce the impact of schema evolution.
 
One last word of caution:
REOPT ONCE does assume that the plan gets better when given the first set of input values.
While this is true in most cases, it cannot be guaranteed!

#Db2

Statistics
0 Favorited
1 Views
0 Files
0 Shares
0 Downloads