Posted by: Serge Rielau
Prelude
I have have been engaged in discussions on how to use DB2 for LUW for as long as I have been employed with IBM.
Until not too long ago, before I started blogging, the primary platform for such discussion was usenet.
And one of the very oldest debates I recall was with Bernard D.
For at least a decade he has been tirelessly trying to convince us to extend DB2 for LUW to support comparisons of "row value constructors" (rvc).
In fact the oldest record I can find on Google on this debate dates back to October 21, 2000.
Well, Bernard, this post is for you.
Now we need to find something else to talk about I guess.
The problem
All of these methods have one thing in common: They are based on the absolute position of rows within a result set.
For example OFFSET 100 LIMIT 20 will retrieve 20 rows starting with the 100th row.
So, in order for DB2 to find the 100th row DB2 needs to count of the first 99 rows and discard them.
The farther down a result set you page the bigger the waste in CPU and I/O.
Let's have a look:
db2set DB2_COMPATIBILITY_VECTOR = 1
db2stop
db2start
CREATE TABLE emp(name VARCHAR(20), firstname VARCHAR(20), salary INTEGER);
INSERT INTO emp VALUES
('Smith' , 'Jason' , 20000),
('Newman' , 'Alex' , 25000),
('Zhao' , 'Min' , 29000),
('Bauer' , 'Jan' , 27000),
('Giovanni', 'Alfredo', 26000),
('Kruft' , 'Hans' , 22000),
('Jones' , 'Jim' , 21000),
('Hayes' , 'Tom' , 23000),
('O''Neill', 'Tim' , 24000),
('McCullen', 'Sarah' , 30000);
CREATE UNIQUE INDEX empname ON emp(name, firstname);
RUNSTATS ON TABLE emp;
VARIABLE start INTEGER;
BEGIN
SET :start = 5;
END;
/
SELECT * FROM emp
WHERE ROWNUM BETWEEN :start AND :start + 4
ORDER BY name, firstname;
NAME FIRSTNAME SALARY
-------------------- -------------------- -----------
Kruft Hans 22000
McCullen Sarah 30000
Newman Alex 25000
O'Neill Tim 24000
Smith Jason 20000
5 rows were retrieved.
A look at the optimizer plan shows that there is little DB2 can do to minimize the rows read.
Rows
RETURN
( 1)
Cost
I/O
|
1
FILTER
( 2)
6.80181
1
|
10
FETCH
( 3)
6.79467
1
/-----+------\
10 10
IXSCAN TABLE: ADMINISTRATOR
( 4) EMP
0.0211523 Q1
0
|
10
INDEX: ADMINISTRATOR
EMPNAME
Q1
Value based paging
A better way to do paging is to use the index not only to derive the order, but also the start key.
VARIABLE name VARCHAR(20);
BEGIN
SET :name = ' ';
END;
/
SELECT * FROM emp
WHERE name > :name
ORDER BY name, firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY
-------------------- -------------------- -----------
Bauer Jan 27000
Giovanni Alfredo 26000
Hayes Tom 23000
Jones Jim 21000
Kruft Hans 22000
We can use the last retrieved value as a starting point for the next page:
BEGIN
SET :name = 'Kruft';
END;
/
SELECT * FROM emp
WHERE name > :name
ORDER BY name, firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY
-------------------- -------------------- -----------
McCullen Sarah 30000
Newman Alex 25000
O'Neill Tim 24000
Smith Jason 20000
Zhao Min 29000
The optimizer plan is much better now.
It allows us to find the first row efficiently:
Rows
RETURN
( 1)
Cost
I/O
|
3.33333
FETCH
( 2)
6.78901
1
/-----+------\
3.33333 10
IXSCAN TABLE: ADMINISTRATOR
( 3) EMP
0.0177397 Q1
0
|
10
INDEX: ADMINISTRATOR
EMPNAME
Q1
...
3) IXSCAN: (Index Scan)
...
Predicates:
----------
2) Start Key Predicate,
Comparison Operator: Less Than (<)
Subquery Input Required: No
Filter Factor: 0.333333
Predicate Text:
--------------
(? < Q1.NAME)
Now, I have been cheating here since the index is on two columns "NAME" and "FIRSTNAME".
If there were two or more employees named 'Kruft' we could have skipped one.
Traditional row wise paging
Here is how one can page using a multiple column start key in DB2 no matter what version:
VARIABLE firstname VARCHAR(20);
BEGIN
SET (:name, :firstname) = (' ', ' ');
END;
/
SELECT * FROM emp
WHERE name > :name OR (name = :name AND firstname > :firstname)
ORDER BY name, firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY
-------------------- -------------------- -----------
Bauer Jan 27000
Giovanni Alfredo 26000
Hayes Tom 23000
Jones Jim 21000
Kruft Hans 22000
How would the predicate look like if we had three key parts? Lenghty ...
SELECT * FROM emp
WHERE name > :name OR (name = :name AND firstname > :firstname)
OR (name = :name AND firstname = :firstname AND middlename > :middlename
ORDER BY name, firstname, middlename
FETCH FIRST 5 ROWS ONLY;
The problem is that in older versions of DB2 the optimizer cannot figure out that this predicate maps to an index start key.
Therefore DB2 needed to fall back to a scan from the beginning of the index.
Since DB2 9.7.3 the optimizer is smart enough:
Rows
RETURN
( 1)
Cost
I/O
|
3.55556
TBSCAN
( 2)
6.79745
1
|
3.55556
SORT
( 3)
6.79697
1
|
3.55556
TBSCAN
( 4)
6.79574
1
|
10
TABLE: ADMINISTRATOR
EMP
Q1
This is not what we expected because I use dynamic SQL here!
In dynamic SQL DB2 does not see that the two references to :name are in fact the same values.
The following trick allows DB2 to see what we are up to.
It is generally useful whenever you must use the same parameter marker multiple times in a query.
SELECT emp.*
FROM emp,
(VALUES(:name, :firstname)) AS start(name, firstname)
WHERE emp.name > start.name OR (emp.name = start.name AND emp.firstname > start.firstname)
ORDER BY emp.name, emp.firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY
-------------------- -------------------- -----------
Bauer Jan 27000
Giovanni Alfredo 26000
Hayes Tom 23000
Jones Jim 21000
Kruft Hans 22000
Rows
RETURN
( 1)
Cost
I/O
|
3.66667
FETCH
( 2)
6.79117
1
/-----+------\
3.66667 10
IXSCAN TABLE: ADMINISTRATOR
( 3) EMP
0.0197898 Q1
0
|
10
INDEX: ADMINISTRATOR
EMPNAME
Q1
...
3) IXSCAN: (Index Scan)
...
Predicates:
----------
2) Start Key Predicate,
Comparison Operator: Less Than (<)
Subquery Input Required: No
Filter Factor: 1
Predicate Text:
--------------
(firstname < Q1.FIRSTNAME)
3) Start Key Predicate,
Comparison Operator: Less Than or Equal (<=)
Subquery Input Required: No
Filter Factor: 0.366667
Predicate Text:
--------------
(name <= Q1.NAME)
This is better! However the language we need to use to model the start key is quite ugly.
Row value comparison
Reaching back to high school math you may recall that the predicates we have used above to impose order are commonly used to compose order on tuples.
A tuple A is an ordered list of elements of the form (a1, a2, ... an).
A tuple A (a1, a2, ..an) is equal to a tuple B (b1, b2, bn) if and only if all the elements ai in A a re equal to the matching bi in B.
A is considered greater than B if a1 > b1 or (a1 = b1 and a2 > b2) or ... and so forth.
It is not surprising that this is also the ordering used in a multi column index or that imposed by an ORDER BY clause.
In math you can directly use comparison operators against tuples with no need to decompose them into the individual components.
You simply write: (a1, a2) > (b1, b2).
In SQL terms a set of elements compose not tuples but rows of course.
Thus the syntax composing a tuple is called a row value constructor (rvc).
You use it every time you insert a row into a table using the values clause.
Just at the beginning of this post we used a values clause with 10 row value constructors.
DB2 10 , finally, introduces the syntax to compare rows value constructors:
SELECT * FROM emp
WHERE (name, firstname) > (:name, :firstname)
ORDER BY name, firstname
FETCH FIRST 5 ROWS ONLY;
NAME FIRSTNAME SALARY
-------------------- -------------------- -----------
Bauer Jan 27000
Giovanni Alfredo 26000
Hayes Tom 23000
Jones Jim 21000
Kruft Hans 22000
Rows
RETURN
( 1)
Cost
I/O
|
3.66667
FETCH
( 2)
6.79117
1
/-----+------\
3.66667 10
IXSCAN TABLE: ADMINISTRATOR
( 3) EMP
0.0197898 Q1
0
|
10
INDEX: ADMINISTRATOR
EMPNAME
Q1
...
3) IXSCAN: (Index Scan)
...
Predicates:
----------
2) Start Key Predicate,
Comparison Operator: Less Than (<)
Subquery Input Required: No
Filter Factor: 1
Predicate Text:
--------------
(firstname < Q1.FIRSTNAME)
3) Start Key Predicate,
Comparison Operator: Less Than or Equal (<=)
Subquery Input Required: No
Filter Factor: 0.366667
Predicate Text:
--------------
(name <= Q1.NAME)
So far we have always used a fixed number of rows to page forward.
But you can also use row value comparisons operators to set an upper bound or even both upper and lower.
VARIABLE startname VARCHAR(20);
VARIABLE endname VARCHAR(20);
VARIABLE startfirst VARCHAR(20);
VARIABLE endfirst VARCHAR(20);
BEGIN
SET (:startname, :startfirst) = ('M', 'A');
SET (:endname , :endfirst ) = ('T', 'Z');
END;
/
SELECT * FROM emp
WHERE (name, firstname) BETWEEN (:startname, :startfirst)
AND (:endname, :endfirst)
ORDER BY name, firstname;
NAME FIRSTNAME SALARY
-------------------- -------------------- -----------
McCullen Sarah 30000
Newman Alex 25000
O'Neill Tim 24000
Smith Jason 20000
Rows
RETURN
( 1)
Cost
I/O
|
1
FETCH
( 2)
6.78787
1
/-----+------\
1 10
IXSCAN TABLE: ADMINISTRATOR
( 3) EMP
0.0173858 Q1
0
|
10
INDEX: ADMINISTRATOR
EMPNAME
Q1
...
3) IXSCAN: (Index Scan)
...
Predicates:
----------
2) Start Key Predicate,
Comparison Operator: Less Than or Equal (<=)
Subquery Input Required: No
Filter Factor: 1
Predicate Text:
--------------
(startfirst <= Q1.FIRSTNAME)
3) Start Key Predicate,
Comparison Operator: Less Than or Equal (<=)
Subquery Input Required: No
Filter Factor: 0.366667
Predicate Text:
--------------
(startname <= Q1.NAME)
4) Stop Key Predicate,
Comparison Operator: Less Than or Equal (<=)
Subquery Input Required: No
Filter Factor: 1
Predicate Text:
--------------
(Q1.FIRSTNAME <= endfirst)
5) Stop Key Predicate,
Comparison Operator: Less Than or Equal (<=)
Subquery Input Required: No
Filter Factor: 0.366667
Predicate Text:
--------------
(Q1.NAME <= endname)
We are explicitly driving the index start/stop key here.
You can also use this syntax to write denser key lookups:
SELECT * FROM emp
WHERE (name, firstname) = ('Zhao', 'Min');
NAME FIRSTNAME SALARY
-------------------- -------------------- -----------
Zhao Min 29000
There is no limit to the usage of the syntax.
It can even be used in JOIN On clauses
A cursor loop without a cursor
As I have been writing this article I have been wondering about an extreme case.
That is if I page through a result set one row at a time, then I should be able to write cursor that really isn't a cursor.
I'm not sure if it's good for anything, but here goes anyway:
VARIABLE sumsalary INTEGER;
BEGIN
DECLARE SQLCODE INTEGER;
DECLARE vemp ANCHOR TO ROW OF emp;
DECLARE vsumsalary INTEGER DEFAULT 0;
SET vemp.name = ' ';
SET vemp.firstname = ' ';
loop: LOOP
SELECT * INTO vemp FROM emp
WHERE (name, firstname) > (vemp.name, vemp.firstname)
ORDER BY name, firstname
FETCH FIRST ROW ONLY;
IF SQLCODE = 100
THEN LEAVE loop;
END IF;
SET vsumsalary = vsumsalary + vemp.salary;
END LOOP;
SET :sumsalary = vsumsalary;
END;
/
PRINT sumsalary;
247000
SELECT SUM(salary) FROM emp;
1
-----------
247000
So this anonymous block pages through the entire table one row at a time always getting the "next row" all without actually using a cursor.
#Db2