Posted by: Serge Rielau
Motivation
The former naturally produces timestamps, while the later produces a binary string.
But what if what you need is a number?
Of course both timestamps and binary strings can be represented as numbers.
But these will be very long.For example a naive representation of a timestamp could be the number 20120303070659765345.
This is too long even for a bigint. Certainly you would not want to have an employee ID of 26 digits or even an order reference number.
You can, of course, generate numbers from a single row, single column table which holds the "next orderid".
But such solutions provide a serialization point which hinders scalability.
So there is a need for a scalable number generator. In DB2 and some other DBMS this generator is called a SEQUENCE.
Quick Start
Let's start with a simple example:
CREATE OR REPLACE SEQUENCE orders.orderid;
This creates a sequence "ORDERID" in the schema "ORDERS".
The sequence will issue numbers of the type INTEGER starting with 1 all the way up to 2,147,483,647 (max integer) in increments of 1.
How are those numbers issued? Just use the NEXT VALUE FOR expression.
VALUES NEXT VALUE FOR orders.orderid;
1
-----------
1
You can also consume the generated value in place.
CREATE TABLE order(id INTEGER, customername VARCHAR(20), amount INTEGER);
INSERT INTO order VALUES (NEXT VALUE FOR orders.orderid, 'Serge', 23),
(NEXT VALUE FOR orders.orderid, 'John', 200);
SELECT * FROM order ORDER BY id;
ID CUSTOMERNAME AMOUNT
----------- -------------------- -----------
2 Serge 23
3 John 200
This method to produce and consume numbers in one statement is efficient, but it does pose a problem.
How do you know what was produced? You probably want want to return the orderid to the customer.
Typically this is done using the PREVIOUS VALUE FOR expression.
This expression returns the last value generated for this session in a previous SQL statement.
VALUES PREVIOUS VALUE FOR orders.orderid;
1
-----------
3
If you perform single row operation this may be sufficient. But in the case above two rows were inserted and two numbers generated!
I still do not know for sure which orderid is mine.
You cannot even compute other previously generated numbers from the value of PREVIOUS VALUE FOR because other sessions concurrently consume values as well.
There is no guarantee you got consecutive numbers.
The following SQL will preserve the association:
SELECT id
FROM NEW TABLE(INSERT INTO order VALUES (NEXT VALUE FOR orders.orderid, 'Jesse', 312),
(NEXT VALUE FOR orders.orderid, 'Jason', 230))
ORDER BY INPUT SEQUENCE;
ID
-----------
4
5
Now we can associate 4 reliably with 'Jesse' and 5 with 'Jason'.
I will blog about SELECT FROM INSERT and ORDER BY INPUT another time, so we'll let this example stand on its own for now.
Key Properties
- Uniqueness
DB2 will guarantee you that it will never produce a number twice for the same sequence in a given database even if the system crashes.
There are two exceptions however:
First if you alter the sequence all bets are off. For example if you tell the sequence to restart it will oblige and produce duplicate values.
Second if you use explicitly tell the sequence to CYCLE it will do just that.
Lastly you must remember that a sequence generates numbers it is not in control of how you use them.
So if you use a sequence to produce unique numbers for a primary key, but you also provide values of your own from another source the sequence is unaware of that.
So it is your responsibility to prevent collisions from occurring by partitioning the domains and ensuring unique constraints are in place.
- Non transactional
A sequence number, once generated cannot be rolled back any more than you can take an eraser to a check you have written.
You can shred the check and ignore the sequence number.
- No order
Unless specifically defined using the ORDER property DB2 does not guarantee order for generated numbers.
In a simple DB2 setup you are very likely to observe order. However, in a multi-member system such as pureScale or DPF DB2 uses local caches to serve up the numbers.
So one member will allocate numbers 1...20 while the next one gets 21...40 and so on.
So you could see numbers issued such as ...18, 23, 19, 20, 41, ...60, 61, 24.. as members consume caches at different rates.
Again a checkbook works as a reasonable metaphor. You may write one check a year while your spouse uses a lot.
- Gaps
Sequence numbers are non transactional and cached in memory.
So some values may never be consumed because their consuming transaction was rolled back or the cache was lost.
Compatibility
Sequences are part of ANSI SQL and DB2 has developed against that standard. More accurately you may say that we managed to standardized DB2's understanding of sequences.
For compatibility with Oracle however DB2 supports also the pseudo column syntax NEXTVAL and CURRVAL as well as PREVVAL which is DB2 specific :
VALUES orders.orderid.NEXTVAL;
1
-----------
6
VALUES orders.orderid.CURRVAL;
1
-----------
6
VALUES orders.orderid.PREVVAL;
1
-----------
6
The meaning of PREVVAL and NEXTVAL should be self explanatory.
CURRVAL in DB2 is the same as PREVVAL.
In Oracle however CURRVAL can also be NEXTVAL if there is a NEXTVAL in the same VALUES or SELECT.
DB2 chose not to follow this exception since it is very rare and confusing.
A detailed look
Sequence options
The sequence definition used above was very compact and defaults all the numerous settings.
Fully spelled out the DDL looks like this:
CREATE OR REPLACE SEQUENCE orders.orderid AS INTEGER
START WITH 1
NO MINVALUE
INCREMENT BY 1
NO MAXVALUE
NO CYCLE
CACHE 20
NO ORDER;
Let's now look at each option:
- INTEGER
Defines the type to be INTEGER and implicitly also limits the domain of the type to those of a 32bit integer.
The following types are allowed: SMALLINT, INTEGER, BIGINT, DECIMAL(n, 0) where 1 <= n <= 31.
Normally INTEGER is the default. However, if your database is using NUMBER_COMPAT mode then the default is DECIMAL(31, 0)
- START WITH 1
Tells DB2 what shall be the first value to be generated. The default is MINVALUE if specified for positive increments and MAXVALUE for negative increments.
If the relevant boundary is not set then 1 is assumed.
- NO MINVALUE
Indicates what is the lowest possible value DB2 may generate depends on the type.
MINVALUE 1000 for example will set a lower bound of 1000.
- INCREMENT BY 1
DB2 will generate values in increments of 1. E.g. 1, 2, 3, 4 and 21, 22, 23, ..
You can use any whole number including negative numbers. In that case the sequence will be descending from the START WITH value down towards the MINVALUE.
If you want to keep uniqueness across multiple databases you may use a step that is equal to the number of databases and offset the START WITH.
For example you may produce odd numbers in north America and even numbers is Asia and cross-replicate collision free.
- NO MAXVALUE
This is the same as NO MINVALUE, except that it deals with the upper bound.
MAXVALUE 100000 sets the biggest value to be generated to one million.
- NO CYCLE
This important property specifies that DB2 will stop when it reaches MINVALUE or MAXVALUE.
This is default and assures that as long as you do not ALTER SEQUENCE the values generated will be unique.
If you specify CYCLE however DB2 will wrap the sequence around from MAXVALUE to MINVALUE (assuming a positive increment) and may generate duplicates.
This may be what you want because you do not need uniqueness or because you know that the rows using the generated values as primary key are long gone.
- CACHE 20
This value is important for two properties.
First it sets the maximum number of values that may get lost if DB2 is stopped.
Second it allows multiple members of DB2 to work of local caches and thus violate the assumption that a sequence is truly ascending or descending.
In the end using the cache is for performance. When the cache is consumed the new high watermark of the sequence needs to be written to the logs for roll forward recovery.
The more often DB2 needs to write to disk the slower the sequence.
If you absolutely want no loss of numbers due to outage use NO CACHE.
- NO ORDER
This setting is closely related to CACHE.
As stated above in a multi-member environment caches are local and because each member consumes caches at its own pace the issued numbers can, at times look mysterious.
Altering sequences
As mentioned earlier you can also alter a sequence using the ALTER SEQUENCE command. This statement allows you to alter any property of the sequence. In addition it also supports a RESTART WITH.
You may want to use that option when you want to synchronize the sequence with the data after a LOAD.
ALTER SEQUENCE orders.orderid RESTART WITH 1000;
VALUES NEXT VALUE FOR orders.orderid;
1
-----------
1000
NEXT VALUE FOR semantics
Lastly there is some non-obvious trickery with respect to using NEXT VALUE FOR.
If you use NEXT VALUE FOR more than once for the same sequence in the same SELECT list or the same row of a VALUES, then only one value is generated.
VALUES (NEXT VALUE FOR orders.orderid, NEXT VALUE FOR orders.orderid),
(NEXT VALUE FOR orders.orderid, NEXT VALUE FOR orders.orderid);
1 2
----------- -----------
1001 1001
1002 1002
Further more you must not use NEXT VALUE FOR outside of a top level SELECT list or VALUES.
Sequence name resolution and synonyms
Unqualified sequence names, like tables and views use the CURRENT SCHEMA to derive the schema.
VALUES NEXT VALUE FOR orderid;
1
-----------
1003
SET SCHEMA = "ORDERS";
VALUES NEXT VALUE FOR orderid;
1
-----------
1004
If you want to reference a sequence by it's unqualified name in a specific schema or in general you can either define a private or public synonym.
SET SCHEMA = 'SERGE';
CREATE OR REPLACE PUBLIC SYNONYM orderid FOR SEQUENCE orders.orderid;
VALUES NEXT VALUE FOR orderid;
1
-----------
1005
Difference between IDENTITY and SEQUENCE
I may blog about identity and it's difference to sequence in the future.
But in the end an IDENTITY column is a column with a private sequence. DB2 runtime does not know the difference.
Use to your own taste.
#Db2