Db2

Db2

Connect with Db2, Informix, Netezza, open source, and other data experts to gain value from your data, share insights, and solve problems.

 View Only

Pivoting tables 

Tue March 10, 2020 07:54 PM

Posted by: Serge Rielau

  • PIVOT
    The meaning of pivoting a table is to turn n-rows into n-columns.
    For example given a ales table with a composite primary key of year and quarter you may want to get a result that shows only one row for each year, but a separate column for each quarter.
    CREATE TABLE Sales(Year    INTEGER NOT NULL, 
                       Quarter INTEGER NOT NULL,
                       Results INTEGER);

    CREATE UNIQUE INDEX PK_IDX ON Sales(Year, Quarter) INCLUDE(Results);

    ALTER TABLE Sales ADD PRIMARY KEY (Year, Quarter);

    INSERT INTO Sales VALUES
    (2004, 1, 20),
    (2004, 2, 30),
    (2004, 3, 15),
    (2004, 4, 10),
    (2005, 1, 18),
    (2005, 2, 40),
    (2005, 3, 12),
    (2005, 4, 27);
    The desired result is:
    Year Q1 Q2 Q3 Q4
    ---- -- -- -- --
    2004 20 30 15 10
    2005 18 40 12 27
    There a many ways to achieve pivoting, but there is clearly one that is optimal which I will explain here.
    Pivoting involves two events: First the payload (Results in this case) needs to be dispatch into the Q1 through Q4 columns.
    This can be done with a CASE expression or DECODE (which is another syntax for CASE):
    SELECT Year,
           DECODE(Quarter, 1, Results) AS Q1,
           DECODE(Quarter, 2, Results) AS Q2,
           DECODE(Quarter, 3, Results) AS Q3,
           DECODE(Quarter, 4, Results) AS Q4
      FROM Sales;

    YEAR        Q1 Q2      Q3          Q4
    ----------- ----------- ----------- ----------- -----------
           2004          20        -      -       -
           2004           -        30      -       -
           2004           -        -      15       -
           2004           -           -      -       10
           2005          18           -      -       -
           2005           -          40      -       -
           2005           -         -      12       -
           2005           -         -      -          27

    8 record(s) selected.
    Note how DECODE injects NULL as an implicit ELSE.
    The second step is to collapse the rows to get one row per Year.
    The technique of choice is a GROUP BY here.
    When grouping on Year we need to use some sort of column function for the columns Q1 through Q4.
    MAX or MIN() work for most types and they are very cheap. Especially since all but one value will be NULL per group:
    SELECT Year,
    MAX(DECODE(Quarter, 1, Results)) AS Q1,
    MAX(DECODE(Quarter, 2, Results)) AS Q2,      
    MAX(DECODE(Quarter, 3, Results)) AS Q3,
           MAX(DECODE(Quarter, 4, Results)) AS Q4
      FROM Sales
      GROUP BY Year;

    YEAR        Q1          Q2  Q3          Q4
    ----------- ----------- ----------- ----------- -----------
           2004        20      30     15      10
           2005        18      40     12      27

    2 record(s) selected.
    I noted above that this approach is the best. How do I know. Well let's look at the access plan:
    Access Plan:
    -----------
    Total Cost:         0.0134932
        Query Degree:        1

            Rows
           RETURN
           (   1)
            Cost
             I/O
             |
             1.6
           GRPBY
           (   2)
           0.01322
              0
             |
              8
           IXSCAN
           (   3)
          0.0126886
              0
             |
              8
     INDEX:    SERGE  
           PK_IDX
             Q1
    Pretty hard to beat!

  • UNPIVOT
    Unpivot is the inverse operation of PIVOT. Here we have several similar columns which are to be combined into one column, but different rows.
    Let's reuse the same example from above and run it backwards so to speak
    CREATE TABLE Sales(Year INTEGER NOT NULL PRIMARY KEY,
          Q1   INTEGER,
          Q2   INTEGER,
          Q3   INTEGER,
          Q4   INTEGER);

    INSERT INTO Sales VALUES
    (2004, 20, 30, 15, 10),
    (2005, 18, 40, 12, 27);
    The most efficient way to unpivot is to do a cross join between the Sales table and a correlated VALUES of as many rows as columns that need to be unpivoted.
    SELECT Year, Quarter, Results
      FROM Sales AS S,
           LATERAL(VALUES (1, S.Q1),
                          (2, S.Q2),
                          (3, S.Q3),
                        (4, S.Q4)) AS T(Quarter, Results);

    YEAR        QUARTER    RESULTS
    ----------- ----------- -----------
           2004     1   20
           2004     2   30
           2004     3   15
           2004     4   10
           2005     1   18
           2005     2   40
           2005     3   12
           2005     4          27

    8 record(s) selected.
    Again let's look at the explain:
    Access Plan:
    -----------
     Total Cost:         7.57432
        Query Degree:        1

                      Rows
                     RETURN
                     (   1)
                      Cost
                       I/O
                       |
                        8
                     NLJOIN
                     (   2)
                     7.57432
                        1
               /-------+-------\
              2                   4
           TBSCAN              TBSCAN
           (   3)              (   4)
           7.57317          4.95961e-005
              1                   0
             |                   |
              2                   4
     TABLE:    SERGE     TABFNC:   SYSIBM 
            SALES              GENROW
             Q3                  Q1
    Once more this is the most efficient plan. Since it only contains a single scan.

#Db2

Statistics
0 Favorited
25 Views
0 Files
0 Shares
0 Downloads