Data Integration

Data Integration

Connect with experts and peers to elevate technical expertise, solve problems and share insights.

 View Only

Mastering Schema Evolution in IBM Data Replication: The Power of Schema Evolution With Same Table DDL Change_Log Type

By SHERIN JOSE posted yesterday

  

Introduction:

        In today's data-driven business landscape, change is the only constant. As organizations grow and evolve, so too must their data models and database schemas. What was once a simple customer table might now need additional columns for additional information, auditing, program information, or new regulatory requirements. But here's the challenge: How do you evolve your database schema while maintaining historical data integrity and ensuring your data replication processes continue uninterrupted?

        IBM Data Replication on Cloud Pak for Data (CP4D) provides a powerful change data capture (CDC) solution to handle schema evolution. This blog post will delve into the "Schema Evolution with Same Table" ddl change_log type, its advantages, and its operational principles.

 

What is Schema Evolution with Same Table?

        Schema Evolution with Same Table is a ddl change_log type configuration parameter in IBM Data Replication on CP4D that enables schema history retainment by the creation of a new table or creation of new columns in the target database. When significant structural changes occur (like creating, dropping, truncating, or renaming tables), the system creates a new table in the target database while preserving the original table with its historical data. Whereas in the case of other DDLs when column attributes are modified, the schema history is retained within the same change log table, by backing up columns whose attributes have changed.

        This approach ensures that your schema history is retained either by backing up existing tables when major DDLs alter table attributes, or by preserving history within the same change log table for column-level modifications.

 

Benefits of Schema Evolution with Same Table:

  1. Schema History Retention: This approach allows for the preservation of schema history, enabling businesses to track changes to their data models over time.

  2. Improved Data Governance: By retaining schema history in the same table, organizations can better understand the evolution of their data models easier and faster.

  3. Flexibility: Schema Evolution with Same Table provides the flexibility to manage schema changes in a way that suits the organization's needs, without disrupting the existing data pipeline.

How does it work?

        When a DDL operation such as create, drop or rename table is performed on the source database, IBM Data Replication on CP4D creates a new table (only for create and rename table DDL) in the target database with the modified table attributes. The original table is renamed to <TABLENAME><OPERATION><SEQUENCE_COUNTER>, where OPERATION indicates the type of DDL operation performed, and SEQUENCE_COUNTER is a unique identifier for the operation.

        In the case of other DDLs the schema history is retained in the same table by creating a backup of the changed column by renaming it as <COLUMNNAME><OPERATION><SEQUENCE_COUNTER>, and then creating a new column with modified attributes as needed.

 

Important Note:

        The new table created after the DDL operation will not contain data from the previous table. Instead, the data prior to the DDL operation will remain in the original table, and new data inserted after the DDL operation will be replicated in the new table.

 

Customer Example:

        Let's consider a retail company, XYZ Inc., that uses IBM Data Replication on CP4D to replicate data from their transactional PostgreSQL database to a analytics database on Db2 Warehouse. The company's data model has undergone several changes over the years, and they need to track these changes to ensure data consistency and accuracy.

 

DDL: RENAME COLUMN

Initially, the company's table, named CUSTOMER_NEW_LOG looked like this:

Customer ID

Name

Address

1

John Smith

123 Main St

2

Jane Doe

456 Elm St

Later, the company decided to rename Address column to Postal_Address:

Using the Schema Evolution with Same Table ddl change_log type, IBM Data Replication on CP4D, will make the changes to the existing table itself as the DDL is a rename column. It will rename the existing column as Address_renamed_1. and create a new column with modified column name.

CUSTOMER_NEW_LOG:

Customer ID

Name

Address_renamed_1

Postal_Address

1

John Smith

123 Main St

NULL

2

Jane Doe

456 Elm St

NULL

New data inserted after the DDL operation, will continue to populate in the same change log table

CUSTOMER_NEW_LOG:

Customer ID

Name

Address_renamed_1

Postal_Address

1

John Smith

123 Main St

NULL

2

Jane Doe

456 Elm St

NULL

3

Chris Johnson

NULL

789 Oak St

 

DDL: RENAME TABLE

Initially, the company's table, named CUSTOMER_LOG looked like this:

Customer ID

Name

Address

1

John Smith

123 Main St

2

Jane Doe

456 Elm St

Later, the company decided to rename their table, to CUSTOMER_NEW_LOG:

CUSTOMER_NEW_LOG:

Customer ID

Name

Address

(new data) (new data) (new data)

Using the Schema Evolution with Same Table ddl change_log type, IBM Data Replication on CP4D created a new table in the target database with the modified schema. The original table was renamed to "CUSTOMER_LOG_RENAMED_2", and it retained the data prior to the DDL operation:

CUSTOMER_LOG_RENAMED_2:

Customer ID

Name

Address

1

John Smith

123 Main St

2

Jane Doe

456 Elm St

The new table, "CUSTOMER_NEW_LOG", will then be populated with data being inserted after the DDL operation

CUSTOMER_NEW_LOG:

Customer ID

Name

Address

Email

3

New Customer

789 Oak St

new.customer@example.com

 

Use Cases:

  1. Regulatory Compliance and Audit: For industries with strict regulatory requirements, maintaining a complete history of data changes is essential. With this approach, they can preserve complete history of both data and schema changes, demonstrate data lineage for compliance audits and reconstruct historical states of data for investigations. This also making it easier to demonstrate compliance with regulatory requirements.

  2. Data Warehousing: Schema Evolution with Same Table is particularly useful in data warehousing scenarios, where schema changes need to be tracked and managed carefully to ensure data consistency and accuracy.

  3. Real-time Analytics: This approach enables real-time analytics by providing a mechanism to manage schema changes in a way that supports real-time data processing and analysis.

 

Conclusion:

        The "Schema Evolution with Same Table" ddl change_log type in IBM Data Replication on CP4D represents a sophisticated approach to a common challenge in database management. By intelligently handling schema changes while preserving historical data and structures, it enables organizations to evolve their data models confidently. This capability is particularly valuable in today's fast-paced business environment, where data requirements change rapidly and the ability to adapt quickly—without losing historical context—can provide a significant competitive advantage.

0 comments
9 views

Permalink