In DB2, partitioning (or table partitioning) is a technique used to divide a large table or index into smaller, easier-to-manage sections called partitions. Despite being stored and accessed independently, these partitions together constitute a single logical table or index.
Imagine a situation where you are working with a smaller amount of data and managing it is fairly easy with just an application. Over time, the volume of data increases, transactions occur frequently, and it becomes difficult to archive or remove old data from a table due to performance issues and a lot of downtime.
Main problems faced because of this:
1. Slow query performance — Db2 will have to deal with volume of data.
2. Data management becomes poor — Deleting/archiving data becomes challenging.
3. Management issues — Backup and Restore becomes slow as data is stored in a single table space.
Press enter or click to view image in full size
Image shows difference in partitioned and non-partitioned tables where in partition one , there are multiple table spaces with its own pages while in non-partitioned there is single table space with multiple pages.
Leveraging Attach/Detatch for partitioning:
Assuming the table is partitioned by column — which is typically utilized in where conditions or those with date/time ranges — is required in order to accomplish the ease of management and administrative operations.
We can determine how many partitions are necessary once everything is finalized, and each partition may even belong to a separate table space. Instead of deleting a large amount of data from the table, we can choose to detach the partition from table.
Press enter or click to view image in full size
Image shows detatch/attach of partition on a partitioned table.
Following is the syntax to do that:
ALTER TABLE table name DETACH PARTITION partition_name INTO another_tablename;
We can even attach a partition and set its integrity.
ALTER TABLE table name ATTACH PARTITION partition_name STARTING ‘<start_value>’ ENDING ‘<end_value>’ FROM another_tablename;
Since we have separate table space for each partition, It is an added advantage which results in easier management of detatched partition as new table and deal as per our requirements.
Types of Partitioning in DB2: There are multiple types of partitioning which we can decide on the basis of our requirements and needs of the application.
1. Hash
2. List
3. Range
4. Composite
Amongst these multiple partitions, which one to choose from depends on various factors.
1. Table Partition: This is used if we have large data volume and we need to query on basis of date ranges and where we can drop old partitions (unwanted data) to manage the database.
PARTITION BY RANGE(create_date)
2. Database Partition: When you deal with TBs of data in high volume OLAP systems and want data to scale out horizontally across servers.
DISTRIBUTE BY HASH(event_id)
3. No partitioning: When there is less data and can be managed logically at application level.
Why Choose Range Partitioning on a Date Column?
1. Query Performance and Partition Pruning
Query performance is one of the biggest reasons to choose date partitioning over ID partitioning. Most queries in analytical and transactional workloads are time-bound, such as:
“Get sales for the last quarter”
“Retrieve orders from last month”
“Select customers created this year”
Partition pruning is a technique that Db2 LUW can use to increase query performance when partitioning a table by a date column. This means that only the partitions that correspond to the searched date range are scanned, significantly lowering I/O and enhancing efficiency.
On the other hand, Db2 LUW is less likely to prune partitions if you partition by an ID column. Since there is no inherent connection between IDs and the data range, queries based on ID values will still need to scan over all partitions.
2. Data Lifecycle and Archival
Data typically needs to be archived or purged based on time, such as:
Archiving older transactions
Purging data past a certain threshold (e.g., deleting data older than 5 years)
When partitioning by date, the data can be easily detached or archived by partition, making maintenance more manageable. For instance, entire partitions representing a quarter or a year can be moved to another table or system.
For example:
ALTER TABLE sales DETACH PARTITION p_2022 INTO sales_2022_archive;
If you partition by ID, it’s more challenging to manage data lifecycle processes because IDs don’t correlate with age or lifecycle, which complicates archiving and deletion.
3. Natural Data Distribution
When partitioning by date, the data is naturally distributed across partitions based on logical date ranges. For instance, you can partition data by quarter, month, or year. This ensures that each partition holds a reasonably balanced amount of data, avoiding skewed partitions.
In contrast, partitioning by ID may lead to uneven distribution, especially if the IDs are sequential or have an uneven distribution of data. This can result in some partitions being much larger than others, causing performance bottlenecks.
4. Improved Manageability and Maintenance
Date-based partitioning makes partition management and upkeep much simpler. Partition splitting, merging, and deleting older data are all simple tasks. For instance, it is simple to construct new partitions for the future periods when fresh data enters the system.
Maintaining the system is more difficult with ID partitioning since there is no natural mechanism to specify when partitions should be merged or eliminated.
When to Use Range Partitioning on an ID Column?
While date partitioning is generally the promising option, there are scenarios where ID-based partitioning can make sense:
Multi-tenant Applications: If each tenant’s data is isolated and partitioning by tenant ID is necessary for data isolation.
Sharding: In some systems where data is distributed by customer or region, partitioning by ID can be helpful to ensure each partition holds data for a particular group.
The main purposes and benefits of using partitioning in DB2 are:
Performance Impact: A well-defined partition on key columns can highly improve performance and I/O parallelism and expected to improvise at least by 50% . Database invokes partition pruning feature for efficient data access by ignoring irrelevant data partitions. In other words, working with focused data partitions.
Parallelism of partitions: Having multiple partitions on a table and with each partition on a separate tablespace can really help with managing of table . Database at backend, will consider each tablespace as a separate one and its export/import or load can work parallelly while activities on other partitions are going on.
Data Movement/loading and Maintenance: Each partition can be loaded separately which allows for bulk speed load up and maintenance tasks like REORG, or IMPORT can be done per partition, reducing downtime. LOAD and RUNSTATS are on entire table for all partitions.
Database Partitioning Feature: It allows to distribute data and work across multiple partitions in a multi-node Db2 instance which reduce I/O operations, improves performances and addition/removal of partitions easily.
Optimized Index Usage: We can create local indexes per partition, reducing overhead and improving lookup times. Index rebuilds are faster when scoped to a single partition.
Press enter or click to view image in full size
Image shows difference in accessing of table via portioned and non-partitioned indexes where partition one scans its own table space and non-partitioned scans all the table spaces.
Conclusion: While partitioning introduces some complexity in design, the DB2 partitioning features makes table maintenance easier by enabling crucial administrative operations on individual partitions while ensuring table availability, along with higher performance and scalable table structure.