Introduction:
In today's fast-paced data-driven world, schema evolution is a common phenomenon in database management. As businesses grow, their data models need to adapt to changing requirements, leading to modifications in database schemas. IBM Data Replication on Cloud Pak for Data (CP4D) offers a robust change data capture (CDC) solution that helps manage schema evolution. In this blog, we will explore the "Schema Evolution with New Table" ddl change_log type, its benefits, and how it works.
What is Schema Evolution with New Table?
Schema Evolution with New Table is a ddl change_log type configuration parameter in IBM Data Replication on CP4D that enables the creation of a new table in the target database every time a DDL (Data Definition Language) operation is performed on the source database. This approach ensures that the schema history is retained, and the changes are propagated to the target database in a controlled manner.
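To make "DDL operation" concrete: these are ordinary schema-changing statements run against the source database. A few hedged examples (the table and column names here are hypothetical, not taken from the product):

    -- Illustrative DDL statements against a hypothetical source table (PostgreSQL syntax)
    ALTER TABLE orders ADD COLUMN discount_code VARCHAR(20);
    ALTER TABLE orders DROP COLUMN legacy_flag;
    ALTER TABLE orders ALTER COLUMN amount TYPE NUMERIC(12,2);

Per the behavior described above, each such change would result in a new copy of the table being created on the target.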
Benefits of Schema Evolution with New Table:
- Schema History Retention: This approach allows for the preservation of schema history, enabling businesses to track changes to their data models over time.
- Improved Data Governance: By retaining schema history, organizations can better understand the evolution of their data models and ensure that changes are properly documented and approved.
- Flexibility: Schema Evolution with New Table provides the flexibility to manage schema changes in a way that suits the organization's needs, without disrupting the existing data pipeline.
How does it work?
When a DDL operation is performed on the source database, IBM Data Replication on CP4D creates a new table in the target database with the modified schema. The original target table is renamed to <TABLENAME>_<OPERATION>_<SEQUENCE_COUNTER>, where OPERATION indicates the type of DDL operation performed (for example, COLUMN_ADDED) and SEQUENCE_COUNTER is a unique identifier for the operation.
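Conceptually, the target-side outcome of a single column-add can be pictured with the SQL sketch below. This is only an illustration of the renaming behavior described above, not SQL you run yourself; the replication engine applies the change internally, and the column definitions and types are assumed:

    -- Hedged sketch of the target-side effect (Db2 syntax; names and types assumed)
    -- 1. The pre-DDL target table is preserved under a versioned name.
    RENAME TABLE CUSTOMER TO CUSTOMER_COLUMN_ADDED_1;
    -- 2. A fresh table with the evolved schema receives all post-DDL changes.
    CREATE TABLE CUSTOMER (
        ID      INTEGER,
        NAME    VARCHAR(100),
        ADDRESS VARCHAR(200),
        EMAIL   VARCHAR(255)
    );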
Important Note:
The new table created after the DDL operation will not contain data from the previous table. Instead, the data replicated prior to the DDL operation will remain in the original (now renamed) table, and new data inserted after the DDL operation will be replicated to the new table.
Customer Example:
Let's consider a retail company, XYZ Inc., that uses IBM Data Replication on CP4D to replicate data from their transactional PostgreSQL database to an analytics database on Db2 Warehouse. The company's data model has undergone several changes over the years, and they need to track these changes to ensure data consistency and accuracy.
Initially, the company's customer table looked like this:

| ID | Name       | Address     |
|----|------------|-------------|
| 1  | John Smith | 123 Main St |
| 2  | Jane Doe   | 456 Elm St  |
Later, the company added a new column, "Email," to the customer table.
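On the PostgreSQL source, that change would be made with an ordinary ALTER TABLE statement along these lines (the column length is assumed for this sketch):

    -- Hedged sketch of the source-side DDL (PostgreSQL; column length assumed)
    ALTER TABLE customer ADD COLUMN email VARCHAR(255);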
Using the Schema Evolution with New Table ddl change_log type, IBM Data Replication on CP4D created a new table in the target database with the modified schema. The original table was renamed to "CUSTOMER_COLUMN_ADDED_1", and it retained the data prior to the DDL operation:
CUSTOMER_COLUMN_ADDED_1:
| ID | Name       | Address     |
|----|------------|-------------|
| 1  | John Smith | 123 Main St |
| 2  | Jane Doe   | 456 Elm St  |
The new table, "CUSTOMER", was populated with new data inserted after the DDL operation:
CUSTOMER:
| ID | Name         | Address    | Email                    |
|----|--------------|------------|--------------------------|
| 3  | New Customer | 789 Oak St | new.customer@example.com |
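Note that reporting queries against the target now need to read both tables to see the complete customer list. A minimal sketch of such a query, assuming the column names shown above:

    -- Hedged example (Db2 syntax; column names assumed from the tables above)
    SELECT ID, NAME, ADDRESS, CAST(NULL AS VARCHAR(255)) AS EMAIL
      FROM CUSTOMER_COLUMN_ADDED_1   -- rows replicated before the DDL change
    UNION ALL
    SELECT ID, NAME, ADDRESS, EMAIL
      FROM CUSTOMER;                 -- rows replicated after the DDL change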
By using Schema Evolution with New Table, XYZ Inc. was able to track the changes to their data model and ensure that their data pipeline remained intact.
Use Cases:
- Data Warehousing: Schema Evolution with New Table is particularly useful in data warehousing scenarios, where schema changes need to be tracked and managed carefully to ensure data consistency and accuracy.
- Real-time Analytics: This approach enables real-time analytics by providing a mechanism to manage schema changes in a way that supports real-time data processing and analysis.
Conclusion:
The "Schema Evolution with New Table" ddl change_log type in IBM Data Replication on CP4D offers a robust solution for managing schema evolution in a controlled and flexible manner. By retaining schema history and providing a mechanism for tracking changes, this approach enables businesses to improve their data governance and support real-time analytics.