Db2

Db2

Connect with Db2, Informix, Netezza, open source, and other data experts to gain value from your data, share insights, and solve problems.

 View Only

DB2 10: Paging through result sets by value using row value comparisons 

Mon March 09, 2020 05:45 PM

Posted by: Serge Rielau

Prelude

I have have been engaged in discussions on how to use DB2 for LUW for as long as I have been employed with IBM.
Until not too long ago, before I started blogging,  the primary platform for such discussion was usenet.
And one of the very oldest debates I recall was with Bernard D.
For at least a decade he has been tirelessly trying to convince us to extend DB2 for LUW to support  comparisons of "row value constructors" (rvc). 
In fact the oldest record I can find on Google on this debate dates back to  October 21, 2000.
Well, Bernard, this post is for you.
Now we need to find something else to talk about I guess.

The problem

All of these methods  have one thing in common: They are based on the absolute position of rows within a result set.
For example OFFSET 100 LIMIT 20 will retrieve 20 rows starting with the 100th row.
So, in order for DB2 to find the 100th row DB2 needs to count of the first 99 rows and discard them.
The farther down a result set you page the bigger the waste in CPU and I/O.
Let's have a look:
db2set DB2_COMPATIBILITY_VECTOR = 1
db2stop
db2start

CREATE TABLE emp(name VARCHAR(20), firstname VARCHAR(20), salary INTEGER);

INSERT INTO emp VALUES
  ('Smith'   , 'Jason'  , 20000),
  ('Newman'  , 'Alex'   , 25000),
  ('Zhao'    , 'Min'    , 29000),
  ('Bauer'   , 'Jan'    , 27000),
  ('Giovanni', 'Alfredo', 26000),
  ('Kruft'   , 'Hans'   , 22000),
  ('Jones'   , 'Jim'    , 21000),
  ('Hayes'   , 'Tom'    , 23000),
  ('O''Neill', 'Tim'    , 24000),
  ('McCullen', 'Sarah'  , 30000);

CREATE UNIQUE INDEX empname ON emp(name, firstname);

RUNSTATS ON TABLE emp;

VARIABLE start INTEGER;
BEGIN
  SET :start = 5;
END;
/

SELECT * FROM emp 
WHERE ROWNUM BETWEEN :start AND :start + 4 ORDER BY name, firstname;

NAME FIRSTNAME SALARY -------------------- -------------------- ----------- Kruft Hans 22000 McCullen Sarah 30000 Newman Alex 25000 O'Neill Tim 24000 Smith Jason 20000 5 rows were retrieved.
A look at the optimizer plan shows that there is little DB2 can do to minimize the rows read.
                 Rows 
                RETURN
                (   1)
                 Cost 
                  I/O 
                  |
                   1 
                FILTER
                (   2)
                6.80181 
                   1 
                  |
                  10 
                FETCH 
                (   3)
                6.79467 
                   1 
            /-----+------\
          10               10 
        IXSCAN    TABLE: ADMINISTRATOR
        (   4)             EMP
       0.0211523           Q1
           0 
          |
          10 
 INDEX: ADMINISTRATOR
        EMPNAME
          Q1

Value based paging

A better way to do paging is to use the index not only to derive the order, but also the start key.
VARIABLE name VARCHAR(20);
BEGIN
  SET :name = ' ';
END;
/

SELECT * FROM emp 
WHERE name > :name
ORDER BY name, firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY -------------------- -------------------- ----------- Bauer Jan 27000 Giovanni Alfredo 26000 Hayes Tom 23000 Jones Jim 21000 Kruft Hans 22000
We can use the last retrieved value as a starting point for the next page:
BEGIN
  SET :name = 'Kruft';
END;
/

SELECT * FROM emp 
WHERE name > :name
ORDER BY name, firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY -------------------- -------------------- ----------- McCullen Sarah 30000 Newman Alex 25000 O'Neill Tim 24000 Smith Jason 20000 Zhao Min 29000
The optimizer plan is much better now.
It  allows us to find the first row efficiently:
                  Rows 
                RETURN
                (   1)
                 Cost 
                  I/O 
                  |
                3.33333 
                FETCH 
                (   2)
                6.78901 
                   1 
            /-----+------\
        3.33333            10 
        IXSCAN    TABLE: ADMINISTRATOR
        (   3)             EMP
       0.0177397           Q1
           0 
          |
          10 
 INDEX: ADMINISTRATOR
        EMPNAME
          Q1
...
	3) IXSCAN: (Index Scan)
...
		Predicates:
		----------
		2) Start Key Predicate, 
			Comparison Operator: 		Less Than (<)
			Subquery Input Required: 	No
			Filter Factor: 			0.333333

			Predicate Text:
			--------------
			(? < Q1.NAME)
Now, I have been cheating here since the index is on two columns "NAME" and "FIRSTNAME".
If there were two or more employees named 'Kruft' we could have skipped one.

Traditional row wise paging

Here is how one can page using  a multiple column start key in DB2 no matter what version:
VARIABLE firstname VARCHAR(20);
BEGIN
  SET (:name, :firstname) = (' ', ' ');
END;
/

SELECT * FROM emp
WHERE name > :name OR (name = :name AND firstname > :firstname) ORDER BY name, firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY -------------------- -------------------- ----------- Bauer Jan 27000 Giovanni Alfredo 26000 Hayes Tom 23000 Jones Jim 21000 Kruft Hans 22000
How would the predicate look like if we had three key parts? Lenghty ...
SELECT * FROM emp 
WHERE name > :name OR (name = :name AND firstname > :firstname) OR (name = :name AND firstname = :firstname AND middlename > :middlename ORDER BY name, firstname, middlename
FETCH FIRST 5 ROWS ONLY;
The  problem is that in older versions of DB2 the optimizer cannot figure out that this predicate maps to an index start key.
Therefore DB2 needed to fall back to a scan from the beginning of the index.
Since DB2 9.7.3 the optimizer is smart enough:
         Rows 
        RETURN
        (   1)
         Cost 
          I/O 
          |
        3.55556 
        TBSCAN
        (   2)
        6.79745 
           1 
          |
        3.55556 
        SORT  
        (   3)
        6.79697 
           1 
          |
        3.55556 
        TBSCAN
        (   4)
        6.79574 
           1 
          |
          10 
 TABLE: ADMINISTRATOR
          EMP
          Q1
This is not what we expected because I use dynamic SQL here!
In dynamic SQL DB2 does not see that the two references to :name are in fact the same values.
The following trick allows DB2 to see what we are up to.
It is generally useful whenever you must use the same parameter marker multiple times in a query.
SELECT emp.* 
FROM emp,
(VALUES(:name, :firstname)) AS start(name, firstname) WHERE emp.name > start.name OR (emp.name = start.name AND emp.firstname > start.firstname) ORDER BY emp.name, emp.firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY -------------------- -------------------- ----------- Bauer Jan 27000 Giovanni Alfredo 26000 Hayes Tom 23000 Jones Jim 21000 Kruft Hans 22000 Rows RETURN ( 1) Cost I/O | 3.66667 FETCH ( 2) 6.79117 1 /-----+------\ 3.66667 10 IXSCAN TABLE: ADMINISTRATOR ( 3) EMP 0.0197898 Q1 0 | 10 INDEX: ADMINISTRATOR EMPNAME Q1 ... 3) IXSCAN: (Index Scan) ... Predicates: ---------- 2) Start Key Predicate, Comparison Operator: Less Than (<) Subquery Input Required: No Filter Factor: 1 Predicate Text: -------------- (firstname < Q1.FIRSTNAME) 3) Start Key Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.366667 Predicate Text: -------------- (name <= Q1.NAME)
This is better! However the language we need to use to model the start key is quite ugly.

Row value comparison

Reaching back to high school math you may recall that the predicates we have used above to impose order are commonly used to compose order on tuples.
A tuple A is an ordered list of elements of the form (a1, a2, ... an).
A tuple A (a1, a2, ..an) is equal to a tuple B (b1, b2, bn) if and only if all the elements ai in A a re equal to the matching bi in B.
A is considered greater than B if a1 > b1 or (a1 = b1 and a2 > b2) or ... and so forth.
It is not surprising that this is also the ordering used in a multi column index or that imposed by an ORDER BY clause.
In math you can directly use comparison operators against tuples with no need to decompose them into the individual components.
You simply write: (a1, a2) > (b1, b2).

In SQL terms a set of elements compose not tuples but rows of course.
Thus the syntax composing a tuple is called a row value constructor (rvc).
You use it every time you insert a row into a table using the values clause.
Just at the beginning of this post we used a values clause with 10 row value constructors.

DB2 10 , finally, introduces the syntax to compare rows value constructors:
SELECT * FROM emp 
WHERE (name, firstname) > (:name, :firstname) ORDER BY name, firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY -------------------- -------------------- ----------- Bauer Jan 27000 Giovanni Alfredo 26000 Hayes Tom 23000 Jones Jim 21000 Kruft Hans 22000 Rows RETURN ( 1) Cost I/O | 3.66667 FETCH ( 2) 6.79117 1 /-----+------\ 3.66667 10 IXSCAN TABLE: ADMINISTRATOR ( 3) EMP 0.0197898 Q1 0 | 10 INDEX: ADMINISTRATOR EMPNAME Q1 ... 3) IXSCAN: (Index Scan) ... Predicates: ---------- 2) Start Key Predicate, Comparison Operator: Less Than (<) Subquery Input Required: No Filter Factor: 1 Predicate Text: -------------- (firstname < Q1.FIRSTNAME) 3) Start Key Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.366667 Predicate Text: -------------- (name <= Q1.NAME)
So far we have always used a fixed number of rows to page forward.
But you can also use row value comparisons operators to set an upper bound or even both upper and lower.
VARIABLE startname  VARCHAR(20);
VARIABLE endname    VARCHAR(20);
VARIABLE startfirst VARCHAR(20);
VARIABLE endfirst   VARCHAR(20);
BEGIN
  SET (:startname, :startfirst) = ('M', 'A');
  SET (:endname  , :endfirst  ) = ('T', 'Z');
END;
/

SELECT * FROM emp 
WHERE (name, firstname) BETWEEN (:startname, :startfirst)
AND (:endname, :endfirst) ORDER BY name, firstname;

NAME FIRSTNAME SALARY -------------------- -------------------- ----------- McCullen Sarah 30000 Newman Alex 25000 O'Neill Tim 24000 Smith Jason 20000 Rows RETURN ( 1) Cost I/O | 1 FETCH ( 2) 6.78787 1 /-----+------\ 1 10 IXSCAN TABLE: ADMINISTRATOR ( 3) EMP 0.0173858 Q1 0 | 10 INDEX: ADMINISTRATOR EMPNAME Q1 ... 3) IXSCAN: (Index Scan) ... Predicates: ---------- 2) Start Key Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 1 Predicate Text: -------------- (startfirst <= Q1.FIRSTNAME) 3) Start Key Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.366667 Predicate Text: -------------- (startname <= Q1.NAME) 4) Stop Key Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 1 Predicate Text: -------------- (Q1.FIRSTNAME <= endfirst) 5) Stop Key Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.366667 Predicate Text: -------------- (Q1.NAME <= endname)
We are explicitly driving the index start/stop key here.

You can also use this syntax to write denser key lookups:
SELECT * FROM emp 
WHERE (name, firstname) = ('Zhao', 'Min');
NAME FIRSTNAME SALARY -------------------- -------------------- ----------- Zhao Min 29000
There is no limit to the usage of the syntax.
It can even be used in JOIN On clauses

A cursor loop without a cursor

As I have been writing this article I have been wondering about an extreme case.
That is if I page through a result set one row at a time, then I should be able to write cursor that really isn't a cursor.
I'm not sure if it's good for anything, but here goes anyway:
VARIABLE sumsalary INTEGER;

BEGIN DECLARE SQLCODE INTEGER; DECLARE vemp ANCHOR TO ROW OF emp; DECLARE vsumsalary INTEGER DEFAULT 0; SET vemp.name = ' '; SET vemp.firstname = ' '; loop: LOOP SELECT * INTO vemp FROM emp WHERE (name, firstname) > (vemp.name, vemp.firstname)
ORDER BY name, firstname FETCH FIRST ROW ONLY; IF SQLCODE = 100 THEN LEAVE loop; END IF; SET vsumsalary = vsumsalary + vemp.salary; END LOOP; SET :sumsalary = vsumsalary; END; / PRINT sumsalary; 247000 SELECT SUM(salary) FROM emp; 1 ----------- 247000
So this anonymous block pages through the entire table one row at a time always getting the "next row" all without actually using a cursor.
  

#Db2

Statistics
0 Favorited
4 Views
0 Files
0 Shares
0 Downloads