Db2

Connect with Db2, Informix, Netezza, open source, and other data experts to gain value from your data, share insights, and solve problems.

View Only

Back to Library

DB2 10: Paging through result sets by value using row value comparisons

Mon March 09, 2020 05:45 PM

Michelle Betbadal

Posted by: Serge Rielau

Prelude

I have have been engaged in discussions on how to use DB2 for LUW for as long as I have been employed with IBM.

Until not too long ago, before I started blogging, the primary platform for such discussion was usenet.
And one of the very oldest debates I recall was with Bernard D.
For at least a decade he has been tirelessly trying to convince us to extend DB2 for LUW to support comparisons of "row value constructors" (rvc).

In fact the oldest record I can find on Google on this debate dates back to October 21, 2000.

Well, Bernard, this post is for you.
Now we need to find something else to talk about I guess.

The problem

In the past I have described various ways how to page through result sets using constructs such as LIMIT and OFFSET.

All of these methods have one thing in common: They are based on the absolute position of rows within a result set.

For example OFFSET 100 LIMIT 20 will retrieve 20 rows starting with the 100th row.
So, in order for DB2 to find the 100th row DB2 needs to count of the first 99 rows and discard them.
The farther down a result set you page the bigger the waste in CPU and I/O.

Let's have a look:

db2set DB2_COMPATIBILITY_VECTOR = 1
db2stop
db2start

CREATE TABLE emp(name VARCHAR(20), firstname VARCHAR(20), salary INTEGER);

INSERT INTO emp VALUES
  ('Smith'   , 'Jason'  , 20000),
  ('Newman'  , 'Alex'   , 25000),
  ('Zhao'    , 'Min'    , 29000),
  ('Bauer'   , 'Jan'    , 27000),
  ('Giovanni', 'Alfredo', 26000),
  ('Kruft'   , 'Hans'   , 22000),
  ('Jones'   , 'Jim'    , 21000),
  ('Hayes'   , 'Tom'    , 23000),
  ('O''Neill', 'Tim'    , 24000),
  ('McCullen', 'Sarah'  , 30000);

CREATE UNIQUE INDEX empname ON emp(name, firstname);

RUNSTATS ON TABLE emp;

VARIABLE start INTEGER;
BEGIN
  SET :start = 5;
END;
/

SELECT * FROM emp 
  WHERE ROWNUM BETWEEN :start AND :start + 4 
  ORDER BY name, firstname;

NAME                 FIRSTNAME                 SALARY
-------------------- -------------------- -----------
Kruft                Hans                       22000
McCullen             Sarah                      30000
Newman               Alex                       25000
O'Neill              Tim                        24000
Smith                Jason                      20000

5 rows were retrieved.

A look at the optimizer plan shows that there is little DB2 can do to minimize the rows read.

                 Rows 
                RETURN
                (   1)
                 Cost 
                  I/O 
                  |
                   1 
                FILTER
                (   2)
                6.80181 
                   1 
                  |
                  10 
                FETCH 
                (   3)
                6.79467 
                   1 
            /-----+------\
          10               10 
        IXSCAN    TABLE: ADMINISTRATOR
        (   4)             EMP
       0.0211523           Q1
           0 
          |
          10 
 INDEX: ADMINISTRATOR
        EMPNAME
          Q1

Value based paging

A better way to do paging is to use the index not only to derive the order, but also the start key.

VARIABLE name VARCHAR(20);
BEGIN
  SET :name = ' ';
END;
/

SELECT * FROM emp 
  WHERE name > :name 
  ORDER BY name, firstname 
  FETCH FIRST 5 ROWS ONLY;

NAME                 FIRSTNAME                 SALARY
-------------------- -------------------- -----------
Bauer                Jan                        27000
Giovanni             Alfredo                    26000
Hayes                Tom                        23000
Jones                Jim                        21000
Kruft                Hans                       22000

We can use the last retrieved value as a starting point for the next page:

BEGIN
  SET :name = 'Kruft';
END;
/

SELECT * FROM emp 
  WHERE name > :name 
  ORDER BY name, firstname 
  FETCH FIRST 5 ROWS ONLY;

NAME                 FIRSTNAME                 SALARY
-------------------- -------------------- -----------
McCullen             Sarah                      30000
Newman               Alex                       25000
O'Neill              Tim                        24000
Smith                Jason                      20000
Zhao                 Min                        29000

The optimizer plan is much better now.
It allows us to find the first row efficiently:

                  Rows 
                RETURN
                (   1)
                 Cost 
                  I/O 
                  |
                3.33333 
                FETCH 
                (   2)
                6.78901 
                   1 
            /-----+------\
        3.33333            10 
        IXSCAN    TABLE: ADMINISTRATOR
        (   3)             EMP
       0.0177397           Q1
           0 
          |
          10 
 INDEX: ADMINISTRATOR
        EMPNAME
          Q1
...
	3) IXSCAN: (Index Scan)
...
		Predicates:
		----------
		2) Start Key Predicate, 
			Comparison Operator: 		Less Than (<)
			Subquery Input Required: 	No
			Filter Factor: 			0.333333

			Predicate Text:
			--------------
			(? < Q1.NAME)

Now, I have been cheating here since the index is on two columns "NAME" and "FIRSTNAME".
If there were two or more employees named 'Kruft' we could have skipped one.

Traditional row wise paging

Here is how one can page using a multiple column start key in DB2 no matter what version:

VARIABLE firstname VARCHAR(20);
BEGIN
  SET (:name, :firstname) = (' ', ' ');
END;
/

SELECT * FROM emp 
  WHERE name > :name OR (name = :name AND firstname > :firstname)
  ORDER BY name, firstname 
  FETCH FIRST 5 ROWS ONLY;

NAME                 FIRSTNAME                 SALARY
-------------------- -------------------- -----------
Bauer                Jan                        27000
Giovanni             Alfredo                    26000
Hayes                Tom                        23000
Jones                Jim                        21000
Kruft                Hans                       22000

How would the predicate look like if we had three key parts? Lenghty ...

SELECT * FROM emp 
  WHERE name > :name OR (name = :name AND firstname > :firstname)
                     OR (name = :name AND firstname = :firstname AND middlename > :middlename
  ORDER BY name, firstname, middlename 
FETCH FIRST 5 ROWS ONLY;

The problem is that in older versions of DB2 the optimizer cannot figure out that this predicate maps to an index start key.
Therefore DB2 needed to fall back to a scan from the beginning of the index.
Since DB2 9.7.3 the optimizer is smart enough:

         Rows 
        RETURN
        (   1)
         Cost 
          I/O 
          |
        3.55556 
        TBSCAN
        (   2)
        6.79745 
           1 
          |
        3.55556 
        SORT  
        (   3)
        6.79697 
           1 
          |
        3.55556 
        TBSCAN
        (   4)
        6.79574 
           1 
          |
          10 
 TABLE: ADMINISTRATOR
          EMP
          Q1

This is not what we expected because I use dynamic SQL here!
In dynamic SQL DB2 does not see that the two references to :name are in fact the same values.
The following trick allows DB2 to see what we are up to.
It is generally useful whenever you must use the same parameter marker multiple times in a query.

SELECT emp.* 
  FROM emp, 
       (VALUES(:name, :firstname)) AS start(name, firstname)
  WHERE emp.name > start.name OR (emp.name = start.name AND emp.firstname > start.firstname)
  ORDER BY emp.name, emp.firstname 
  FETCH FIRST 5 ROWS ONLY;

NAME                 FIRSTNAME                 SALARY
-------------------- -------------------- -----------
Bauer                Jan                        27000
Giovanni             Alfredo                    26000
Hayes                Tom                        23000
Jones                Jim                        21000
Kruft                Hans                       22000

                 Rows 
                RETURN
                (   1)
                 Cost 
                  I/O 
                  |
                3.66667 
                FETCH 
                (   2)
                6.79117 
                   1 
            /-----+------\
        3.66667            10 
        IXSCAN    TABLE: ADMINISTRATOR
        (   3)             EMP
       0.0197898           Q1
           0 
          |
          10 
 INDEX: ADMINISTRATOR
        EMPNAME
          Q1
...
	3) IXSCAN: (Index Scan)
...

		Predicates:
		----------
		2) Start Key Predicate, 
			Comparison Operator: 		Less Than (<)
			Subquery Input Required: 	No
			Filter Factor: 			1

			Predicate Text:
			--------------
			(firstname < Q1.FIRSTNAME)


		3) Start Key Predicate, 
			Comparison Operator: 		Less Than or Equal (<=)
			Subquery Input Required: 	No
			Filter Factor: 			0.366667

			Predicate Text:
			--------------
			(name <= Q1.NAME)

This is better! However the language we need to use to model the start key is quite ugly.

Row value comparison

Reaching back to high school math you may recall that the predicates we have used above to impose order are commonly used to compose order on tuples.
A tuple A is an ordered list of elements of the form (a1, a2, ... an).
A tuple A (a1, a2, ..an) is equal to a tuple B (b1, b2, bn) if and only if all the elements ai in A a re equal to the matching bi in B.
A is considered greater than B if a1 > b1 or (a1 = b1 and a2 > b2) or ... and so forth.
It is not surprising that this is also the ordering used in a multi column index or that imposed by an ORDER BY clause.
In math you can directly use comparison operators against tuples with no need to decompose them into the individual components.
You simply write: (a1, a2) > (b1, b2).

In SQL terms a set of elements compose not tuples but rows of course.
Thus the syntax composing a tuple is called a row value constructor (rvc).
You use it every time you insert a row into a table using the values clause.
Just at the beginning of this post we used a values clause with 10 row value constructors.

DB2 10 , finally, introduces the syntax to compare rows value constructors:

SELECT * FROM emp 
  WHERE (name, firstname) > (:name, :firstname)
  ORDER BY name, firstname 
  FETCH FIRST 5 ROWS ONLY;

NAME                 FIRSTNAME                 SALARY
-------------------- -------------------- -----------
Bauer                Jan                        27000
Giovanni             Alfredo                    26000
Hayes                Tom                        23000
Jones                Jim                        21000
Kruft                Hans                       22000

                 Rows 
                RETURN
                (   1)
                 Cost 
                  I/O 
                  |
                3.66667 
                FETCH 
                (   2)
                6.79117 
                   1 
            /-----+------\
        3.66667            10 
        IXSCAN    TABLE: ADMINISTRATOR
        (   3)             EMP
       0.0197898           Q1
           0 
          |
          10 
 INDEX: ADMINISTRATOR
        EMPNAME
          Q1
...
	3) IXSCAN: (Index Scan)
...
		Predicates:
		----------
		2) Start Key Predicate, 
			Comparison Operator: 		Less Than (<)
			Subquery Input Required: 	No
			Filter Factor: 			1

			Predicate Text:
			--------------
			(firstname < Q1.FIRSTNAME)


		3) Start Key Predicate, 
			Comparison Operator: 		Less Than or Equal (<=)
			Subquery Input Required: 	No
			Filter Factor: 			0.366667

			Predicate Text:
			--------------
			(name <= Q1.NAME)

So far we have always used a fixed number of rows to page forward.
But you can also use row value comparisons operators to set an upper bound or even both upper and lower.

VARIABLE startname  VARCHAR(20);
VARIABLE endname    VARCHAR(20);
VARIABLE startfirst VARCHAR(20);
VARIABLE endfirst   VARCHAR(20);
BEGIN
  SET (:startname, :startfirst) = ('M', 'A');
  SET (:endname  , :endfirst  ) = ('T', 'Z');
END;
/

SELECT * FROM emp 
  WHERE (name, firstname) BETWEEN (:startname, :startfirst) 
                              AND (:endname, :endfirst)
  ORDER BY name, firstname;

NAME                 FIRSTNAME                 SALARY
-------------------- -------------------- -----------
McCullen             Sarah                      30000
Newman               Alex                       25000
O'Neill              Tim                        24000
Smith                Jason                      20000

                 Rows 
                RETURN
                (   1)
                 Cost 
                  I/O 
                  |
                   1 
                FETCH 
                (   2)
                6.78787 
                   1 
            /-----+------\
           1               10 
        IXSCAN    TABLE: ADMINISTRATOR
        (   3)             EMP
       0.0173858           Q1
           0 
          |
          10 
 INDEX: ADMINISTRATOR
        EMPNAME
          Q1
...
	3) IXSCAN: (Index Scan)
...
		Predicates:
		----------
		2) Start Key Predicate, 
			Comparison Operator: 		Less Than or Equal (<=)
			Subquery Input Required: 	No
			Filter Factor: 			1

			Predicate Text:
			--------------
			(startfirst <= Q1.FIRSTNAME)


		3) Start Key Predicate, 
			Comparison Operator: 		Less Than or Equal (<=)
			Subquery Input Required: 	No
			Filter Factor: 			0.366667

			Predicate Text:
			--------------
			(startname <= Q1.NAME)


		4) Stop Key Predicate, 
			Comparison Operator: 		Less Than or Equal (<=)
			Subquery Input Required: 	No
			Filter Factor: 			1

			Predicate Text:
			--------------
			(Q1.FIRSTNAME <= endfirst)


		5) Stop Key Predicate, 
			Comparison Operator: 		Less Than or Equal (<=)
			Subquery Input Required: 	No
			Filter Factor: 			0.366667

			Predicate Text:
			--------------
			(Q1.NAME <= endname)

We are explicitly driving the index start/stop key here.

You can also use this syntax to write denser key lookups:

SELECT * FROM emp 
  WHERE (name, firstname) = ('Zhao', 'Min');

NAME                 FIRSTNAME                 SALARY
-------------------- -------------------- -----------
Zhao                 Min                        29000

There is no limit to the usage of the syntax.
It can even be used in JOIN On clauses

A cursor loop without a cursor

As I have been writing this article I have been wondering about an extreme case.
That is if I page through a result set one row at a time, then I should be able to write cursor that really isn't a cursor.
I'm not sure if it's good for anything, but here goes anyway:

VARIABLE sumsalary INTEGER;

BEGIN
  DECLARE SQLCODE    INTEGER;
  DECLARE vemp       ANCHOR TO ROW OF emp;
  DECLARE vsumsalary INTEGER               DEFAULT 0;
  SET vemp.name = ' ';
  SET vemp.firstname = ' ';
loop: LOOP
    SELECT * INTO vemp FROM emp 
	  WHERE (name, firstname) > (vemp.name, vemp.firstname)
          ORDER BY name, firstname
	  FETCH FIRST ROW ONLY;
	IF SQLCODE = 100
	 THEN LEAVE loop;
	END IF;
    SET vsumsalary = vsumsalary + vemp.salary;
  END LOOP;
  SET :sumsalary = vsumsalary;
END;
/
PRINT sumsalary;
247000

SELECT SUM(salary) FROM emp;
          1
-----------
     247000

So this anonymous block pages through the entire table one row at a time always getting the "next row" all without actually using a cursor.

#Db2

Statistics

0 Favorited

4 Views

0 Files

0 Shares

0 Downloads

Db2

Db2

DB2 10: Paging through result sets by value using row value comparisons

Posted by: Serge Rielau

Prelude

The problem

Value based paging

Traditional row wise paging

Row value comparison

A cursor loop without a cursor

Additional
Resources

Office

Quick Links

Db2

Db2

DB2 10: Paging through result sets by value using row value comparisons

Posted by: Serge RielauPrelude

The problem

Value based paging

Traditional row wise paging

Row value comparison

A cursor loop without a cursor

Additional Resources

Office

Quick Links

Posted by: Serge Rielau

Prelude

Additional
Resources