Returning inserted rows (and updated and deleted) 

Mon March 09, 2020 05:40 PM

Posted by: Serge Rielau

Motivation

Remember the days when DB2 re-entered the TPC-C fight after a long hiatus?
The time was DB2 8.1 FP4. I honestly do not remember years; time passes in releases.
While my colleagues were tuning code paths, bufferpools, and disks, my team was looking at something more fundamental:
What could we do to make SQL as efficient as possible for TPC-C?
And ideally, how could we make SQL more efficient for any class of OLTP workload?
In other words, how could we pour the most bang into the least SQL for a typical OLTP transaction?
The result was what we called "the new SQL".
If I recall correctly it consisted of the following enhancements: 
  • Allow updates, deletes, and inserts with queries as targets.
    More importantly, allow those queries to contain order by clauses.
     
  • Allow inline SQL table functions to insert, update, or delete rows
     
  • Allow insert, update, and delete in the from clause
I think there was more, but these were the highlights at any rate.   
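The first bullet, for instance, enables statements like the following sketch (it assumes a table like the `order` table used later in this post, with an `agent_id` column and a `stamp` timestamp):

```sql
-- Sketch: assign the ten oldest unassigned orders to agent 1.
-- The target of the UPDATE is a query, complete with ORDER BY
-- and FETCH FIRST, not a base table.
UPDATE (SELECT agent_id
          FROM order
         WHERE agent_id IS NULL
         ORDER BY stamp
         FETCH FIRST 10 ROWS ONLY)
   SET agent_id = 1;
```

Without an ordered query as the update target, picking "the ten oldest" rows would require a self-join or a separate cursor.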
 
In this post I want to dive into the last bullet.
DB2 8.1 FP4 was a long time ago, but there are still developers who are not yet familiar with this powerful feature.
So it warrants some attention.

The Problem

One very common task in an OLTP system is that of order processing.
You can distinguish between three phases of order processing:
  1. Take an order
    In this step all the information for the order is collected and inserted into the database.
    The order is associated with an order-id which is passed back to the customer for reference.

  2. Dispatch an order
    Here the order is passed on to a process or human being to be fulfilled.
    It is important to prevent more than one process from working the same order.
    Typically this happens in a "first in first out" (FIFO) fashion.
    The dispatch may be a poll by an agent or an active dispatch by some other entity.

  3. Fulfill the order
    The order is processed, sent on its way, and committed as fulfilled.
Let's look at each of these steps in order.

Taking an order

Taking an order consists of:
  1. Generating an order-id
  2. Collecting and storing the individual order items (your shopping basket).
  3. Storing the order itself with various information, such as customer name, order-id, price, etc.
For the sake of brevity I will forgo order items here.
This is where the inline table function came into play in TPC-C.
Our schema is simple:
CREATE TABLE order(order_id INTEGER      NOT NULL PRIMARY KEY,
                   stamp    TIMESTAMP    DEFAULT CURRENT TIMESTAMP, 
                   name     VARCHAR(20),
                   price    INTEGER,
                   agent_id INTEGER);
To submit an order an order-id must be generated.
Different ways to do so exist in various products:
  • Use a global unique identifier which is supplied by the client
    This has the advantage that the client knows the value up front.
    There is no need to pass it back.
    The downside is that no customer will want to write down an ugly hex number as order-id.
     
  • Use a sequence number to generate the order-id.
    The problem is that you need to submit a separate SQL statement to retrieve the generated value.
    Also the "CURRVAL" pseudo column which is used to retrieve a generated value is of no help if you want to submit more than one order in one SQL Statement.
     
  • Use an identity or serial property with the order-id.
    This is very similar to a sequence, except that the value generated is associated with the order-id column directly.
    The problem is the same as for sequences.
    Somehow the generated value needs to flow back to the client.
    Some vendors pick up on simple INSERT statements with identity columns and flow the value back in the SQLCA.
    But this approach is not generic. It is also not able to handle more than one order-id.
     
  • Use a trigger to generate a number through some means
    Given that "some means" can be just about anything, it is impossible to have a function to return the values.
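To illustrate the two-statement problem, the identity route in Db2 typically looks like this sketch (`IDENTITY_VAL_LOCAL()` returns the identity value most recently assigned by a single-row INSERT in the current session):

```sql
-- Sketch: identity variant of the order table
CREATE TABLE order(order_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                   name     VARCHAR(20),
                   price    INTEGER);

INSERT INTO order(name, price) VALUES('John', 1000);

-- A second statement is needed to retrieve the generated value
SELECT IDENTITY_VAL_LOCAL() FROM SYSIBM.SYSDUMMY1;
```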
Each of these techniques is unsatisfactory:
Either two SQL statements need to be executed, or the statement needs to comply with very specific properties, e.g. a single-row insert with an identity column.
In response to this problem some vendors have invented extensions to INSERT, UPDATE, and DELETE, such as a RETURNING clause.
The purpose of the clause is to pick up on new, changed, or deleted rows and pass them back to the client or insert them into a table or variable. 
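For comparison, the Oracle-style extension looks roughly like this (a sketch, not Db2 syntax; `:new_order_id` is a host variable on the client side):

```sql
-- Oracle-style RETURNING clause (sketch; for comparison only)
INSERT INTO order(order_id, name, price)
  VALUES(order_seq.NEXTVAL, 'John', 1000)
  RETURNING order_id INTO :new_order_id;
```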
 
The approach we took in DB2 has been radically different.
CREATE OR REPLACE SEQUENCE order_seq;

SELECT order_id, stamp 
  FROM NEW TABLE(INSERT INTO order(order_id, name, price) 
                   VALUES(NEXT VALUE FOR order_seq, 'John', 1000));
   ORDER_ID STAMP                     
----------- --------------------------
          1 2012-05-09 22:05:29.156000
What we did in DB2 was to simply expose the trigger transition table NEW TABLE in the from-clause.
So, when you put an INSERT statement into the from clause, that INSERT statement is executed and, in the process, a transition table is produced.
That transition table, which includes all the modifications made by any before triggers, can then be queried.
Note in the example above that we not only picked up the generated "ORDER_ID", but also the value generated for the "STAMP" column.
How is that different from an INSERT with RETURNING as available in Oracle? 
RETURNING requires an INTO clause. That is, the returned values must leave SQL and be returned into a variable.
Only once they are returned can you re-scan them for further processing. 
Pushing INSERT into the from clause allows for the full power of SQL to be applied for further processing.
Any client language that knows how to process a cursor can immediately use the feature.
DB2 can handle multiple order inserts:
SELECT order_id, stamp
  FROM NEW TABLE(INSERT INTO order(order_id, name, price) 
                   VALUES(NEXT VALUE FOR order_seq, 'John', 1000),
                         (NEXT VALUE FOR order_seq, 'Beth',  500),
                         (NEXT VALUE FOR order_seq, 'John', 1200));
   ORDER_ID STAMP                     
----------- --------------------------
          2 2012-05-09 22:06:36.515000
          3 2012-05-09 22:06:36.515000
          4 2012-05-09 22:06:36.515000
Better yet, DB2 can process the result. For example, it can compute running totals on the orders:
SELECT order_id, stamp, name, sum(price) over(order by order_id) as total_price
  FROM NEW TABLE(INSERT INTO order(order_id, name, price) 
                   VALUES(NEXT VALUE FOR order_seq, 'John', 1000),
                         (NEXT VALUE FOR order_seq, 'Beth',  500),
                         (NEXT VALUE FOR order_seq, 'John', 1200),
                         (NEXT VALUE FOR order_seq, 'Beth',  700),
                         (NEXT VALUE FOR order_seq, 'Jack', 1100));
   ORDER_ID STAMP                      NAME                 TOTAL_PRICE
----------- -------------------------- -------------------- -----------
          5 2012-05-09 22:07:26.062000 John                        1000
          6 2012-05-09 22:07:26.062000 Beth                        1500
          7 2012-05-09 22:07:26.062000 John                        2700
          8 2012-05-09 22:07:26.062000 Beth                        3400
          9 2012-05-09 22:07:26.062000 Jack                        4500
The questions you should immediately ask are:
What is the overhead of NEW TABLE?
How is this different from inserting the rows into an array or temp table and then rescanning that array?
An explain of the statement above will show:
                Rows 
               RETURN
               (   1)
                Cost 
                 I/O 
                 |
                  5 
               TBSCAN
               (   2)
               6.7822 
                  1 
                 |
                  5 
               SORT  
               (   3)
               6.78117 
                  1 
                 |
                  5 
               TBSCAN
               (   4)
               6.77634 
                  1 
                 |
                  5 
               SORT  
               (   5)
               6.7753 
                  1 
                 |
                  5 
               INSERT
               (   6)
               6.77286 
                  1 
          /------+------\
         5                 0 
      TBSCAN     TABLE: ADMINISTRATOR
      (   7)             ORDER
   8.88007e-005           Q5
         0 
        |
         5 
 TABFNC: SYSIBM  
      GENROW
        Q1
Note the absence of any temp here. The INSERT (6) serves immediately as the input for the OLAP function. 
SORT (5) handles the "ORDER BY order_id" of the SUM. 
SORT (3) however is interesting:
	3) SORT  : (Sort)
...
		SORTKEY : (Sort Key column)
			NONE
This SORT doesn't actually sort. All it does is force the cursor to be INSENSITIVE.
This ensures that the INSERT is complete when the cursor is OPENed.
A common technique is to open such cursors as WITH HOLD and then COMMIT before first fetch.
That way no locks are being held while the cursor is being fetched.
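In embedded SQL that technique looks roughly like this (a sketch; the cursor name and host variables are made up):

```sql
-- Declare the cursor WITH HOLD so it survives the COMMIT
DECLARE take_order CURSOR WITH HOLD FOR
  SELECT order_id, stamp
    FROM NEW TABLE(INSERT INTO order(order_id, name, price)
                     VALUES(NEXT VALUE FOR order_seq, 'John', 1000));

OPEN take_order;                          -- the INSERT is fully executed here
COMMIT;                                   -- releases the insert locks; the cursor stays open
FETCH take_order INTO :order_id, :stamp;  -- fetch the generated values lock-free
CLOSE take_order;
```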

Dispatching an order

After the order has been accepted it needs to be processed.
This can either happen in a single database transaction, or the workflow can consist of multiple transactions.
In the second case the order needs to be marked on the queue as being processed.
Only once the processing is completed in a later transaction can it be deleted.
Let's assume there are multiple agents who are fulfilling orders.
Here is an efficient way to do that, using the same technology as described for INSERT, this time with an UPDATE statement.
CREATE OR REPLACE VARIABLE my_agent_id INTEGER;
SET my_agent_id = 1;

SELECT order_id, stamp, name, price
  FROM OLD TABLE(UPDATE (SELECT * FROM order 
                          WHERE agent_id IS NULL 
                          ORDER BY order_id FETCH FIRST ROW ONLY)
                   SET agent_id = my_agent_id);

   ORDER_ID STAMP                      NAME                       PRICE
----------- -------------------------- -------------------- -----------
          1 2012-05-09 22:05:29.156000 John                        1000
We have selected the oldest order which is unassigned (agent_id IS NULL).
That order has been updated with the local agent_id, and the row has then been returned by the query.
Note the usage of OLD TABLE here. You can choose either the NEW or OLD transition table for UPDATE operations.
Generally the optimizer plan is better when you use OLD TABLE, especially when the target of the UPDATE is not a base table.
The plan is very tight:
                        Rows 
                       RETURN
                       (   1)
                        Cost 
                         I/O 
                         |
                          1 
                       TBSCAN
                       (   2)
                       13.5567 
                          2 
                         |
                          1 
                       SORT  
                       (   3)
                       13.5561 
                          2 
                         |
                          1 
                       UPDATE
                       (   4)
                       13.5549 
                          2 
                   /-----+-----\
                  1               9 
               FETCH    TABLE: ADMINISTRATOR
               (   5)           ORDER
               6.79517           Q1
                  1 
           /-----+------\
          9                9 
       IXSCAN    TABLE: ADMINISTRATOR
       (   6)            ORDER
      0.0205156           Q2
          0 
         |
          9 
   INDEX: SYSIBM  
 SQL120509220520090
         Q2
After the cursor is opened the transaction can immediately be committed.
As a result, update locks are held for a minimal amount of time, with no chance of a deadlock.

Deleting an order

Once the order has been fulfilled it can easily be deleted from the queue by order_id.
Alternatively the order can be marked as processed in an extra column.
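That marking variant might look like this (a sketch; the `done` column is hypothetical and not part of the schema above):

```sql
-- Sketch: mark the order as processed instead of deleting it,
-- returning the pre-update row for confirmation
SELECT order_id, stamp
  FROM OLD TABLE(UPDATE order
                    SET done = 'Y'
                  WHERE order_id = 1);
```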
For the sake of this exercise however, let's assume we want to delete the order from the order table and archive it in a separate table.
WITH del AS (SELECT * FROM OLD TABLE(DELETE FROM order WHERE order_id = 1))
SELECT fulfilled 
  FROM NEW TABLE(INSERT INTO order_archive SELECT del.*, CURRENT TIMESTAMP FROM del);

FULFILLED                 
--------------------------
2012-05-09 22:32:28.078000
What we have built here is a pipeline of actions.
First we DELETE the order we have finished processing.
But we interrogate the OLD transition table to pass that information on to the order_archive table.
Finally we capture the timestamp when the order was fulfilled from the NEW transition table of the order_archive table.
Theoretically this could all be written in one nested query.
But these statements can become very complex.
What if the query contained joins where the same table is updated and selected from in the same query?
What if the same table is modified multiple times?
Therefore DB2 does not allow NEW TABLE and OLD TABLE to appear in nested queries or in the presence of joins.
Placing them into common table expressions, however, is allowed.
Common table expressions provide a natural order.
So if there are conflicts DB2 will execute the queries in the WITH clause one at a time, in order of specification.
If however there are no conflicts DB2 can provide a very efficient access path:
                                Rows 
                               RETURN
                               (   1)
                                Cost 
                                 I/O 
                                 |
                                  1 
                               TBSCAN
                               (   2)
                               20.3274 
                                  3 
                                 |
                                  1 
                               SORT  
                               (   3)
                               20.3268 
                                  3 
                                 |
                                  1 
                               INSERT
                               (   4)
                               20.3256 
                                  3 
                           /-----+-----\
                          1               1 
                       DELETE   TABLE: ADMINISTRATOR
                       (   5)       ORDER_ARCHIVE
                       13.5558           Q6
                          2 
                   /-----+-----\
                  1               9 
               FETCH    TABLE: ADMINISTRATOR
               (   6)           ORDER
               6.78603           Q1
                  1 
           /-----+-----\
          1               9 
       IXSCAN   TABLE: ADMINISTRATOR
       (   7)           ORDER
      0.015546           Q2
          0 
         |
          9 
   INDEX: SYSIBM  
 SQL120509220520090
         Q2
Note how the DELETE and INSERT are stacked!
You can also use the same technique to stack INSERTs.
For example, to split a staging table across multiple target tables.
I'll save that one for another day though, since it will introduce another fancy clause.


#Db2
