Introduction:
In today's data-driven business landscape, change is the only constant. As organizations grow and evolve, so too must their data models and database schemas. What was once a simple customer table might now need additional columns for additional information, auditing, program information, or new regulatory requirements. But here's the challenge: How do you evolve your database schema while maintaining historical data integrity and ensuring your data replication processes continue uninterrupted?
IBM Data Replication on Cloud Pak for Data (CP4D) provides a powerful change data capture (CDC) solution to handle schema evolution. This blog post will delve into the "Schema Evolution with Same Table" ddl change_log type, its advantages, and its operational principles.
What is Schema Evolution with Same Table?
Schema Evolution with Same Table is a ddl change_log type configuration parameter in IBM Data Replication on CP4D that enables schema history retainment by the creation of a new table or creation of new columns in the target database. When significant structural changes occur (like creating, dropping, truncating, or renaming tables), the system creates a new table in the target database while preserving the original table with its historical data. Whereas in the case of other DDLs when column attributes are modified, the schema history is retained within the same change log table, by backing up columns whose attributes have changed.
This approach ensures that your schema history is retained either by backing up existing tables when major DDLs alter table attributes, or by preserving history within the same change log table for column-level modifications.
Benefits of Schema Evolution with Same Table:
-
Schema History Retention: This approach allows for the preservation of schema history, enabling businesses to track changes to their data models over time.
-
Improved Data Governance: By retaining schema history in the same table, organizations can better understand the evolution of their data models easier and faster.
- Flexibility: Schema Evolution with Same Table provides the flexibility to manage schema changes in a way that suits the organization's needs, without disrupting the existing data pipeline.
How does it work?
When a DDL operation such as create, drop or rename table is performed on the source database, IBM Data Replication on CP4D creates a new table (only for create and rename table DDL) in the target database with the modified table attributes. The original table is renamed to <TABLENAME><OPERATION><SEQUENCE_COUNTER>, where OPERATION indicates the type of DDL operation performed, and SEQUENCE_COUNTER is a unique identifier for the operation.
In the case of other DDLs the schema history is retained in the same table by creating a backup of the changed column by renaming it as <COLUMNNAME><OPERATION><SEQUENCE_COUNTER>, and then creating a new column with modified attributes as needed.
Important Note:
The new table created after the DDL operation will not contain data from the previous table. Instead, the data prior to the DDL operation will remain in the original table, and new data inserted after the DDL operation will be replicated in the new table.
Customer Example:
Let's consider a retail company, XYZ Inc., that uses IBM Data Replication on CP4D to replicate data from their transactional PostgreSQL database to a analytics database on Db2 Warehouse. The company's data model has undergone several changes over the years, and they need to track these changes to ensure data consistency and accuracy.
DDL: RENAME COLUMN
Initially, the company's table, named CUSTOMER_NEW_LOG looked like this:
|
|
|
1
|
John Smith
|
123 Main St
|
2
|
Jane Doe
|
456 Elm St
|
Later, the company decided to rename Address column to Postal_Address:
Using the Schema Evolution with Same Table ddl change_log type, IBM Data Replication on CP4D, will make the changes to the existing table itself as the DDL is a rename column. It will rename the existing column as Address_renamed_1. and create a new column with modified column name.
CUSTOMER_NEW_LOG:
|
|
|
|
1
|
John Smith
|
123 Main St
|
NULL
|
2
|
Jane Doe
|
456 Elm St
|
NULL
|
New data inserted after the DDL operation, will continue to populate in the same change log table
CUSTOMER_NEW_LOG:
|
|
|
|
1
|
John Smith
|
123 Main St
|
NULL
|
2
|
Jane Doe
|
456 Elm St
|
NULL
|
3
|
Chris Johnson
|
NULL
|
789 Oak St
|
DDL: RENAME TABLE
Initially, the company's table, named CUSTOMER_LOG looked like this:
|
|
|
1
|
John Smith
|
123 Main St
|
2
|
Jane Doe
|
456 Elm St
|
Later, the company decided to rename their table, to CUSTOMER_NEW_LOG:
CUSTOMER_NEW_LOG:
|
|
|
(new data) |
(new data) |
(new data) |
Using the Schema Evolution with Same Table ddl change_log type, IBM Data Replication on CP4D created a new table in the target database with the modified schema. The original table was renamed to "CUSTOMER_LOG_RENAMED_2", and it retained the data prior to the DDL operation:
CUSTOMER_LOG_RENAMED_2:
|
|
|
1
|
John Smith
|
123 Main St
|
2
|
Jane Doe
|
456 Elm St
|
The new table, "CUSTOMER_NEW_LOG", will then be populated with data being inserted after the DDL operation
CUSTOMER_NEW_LOG:
|
|
|
|
3
|
New Customer
|
789 Oak St
|
new.customer@example.com
|
Use Cases:
-
Regulatory Compliance and Audit: For industries with strict regulatory requirements, maintaining a complete history of data changes is essential. With this approach, they can preserve complete history of both data and schema changes, demonstrate data lineage for compliance audits and reconstruct historical states of data for investigations. This also making it easier to demonstrate compliance with regulatory requirements.
-
Data Warehousing: Schema Evolution with Same Table is particularly useful in data warehousing scenarios, where schema changes need to be tracked and managed carefully to ensure data consistency and accuracy.
-
Real-time Analytics: This approach enables real-time analytics by providing a mechanism to manage schema changes in a way that supports real-time data processing and analysis.
Conclusion:
The "Schema Evolution with Same Table" ddl change_log type in IBM Data Replication on CP4D represents a sophisticated approach to a common challenge in database management. By intelligently handling schema changes while preserving historical data and structures, it enables organizations to evolve their data models confidently. This capability is particularly valuable in today's fast-paced business environment, where data requirements change rapidly and the ability to adapt quickly—without losing historical context—can provide a significant competitive advantage.