From my previous blog post: "Migrating data from legacy IBM i systems to modern platforms can be complex due to inconsistent formats, redundant records, and missing or inconsistent values." Migrating data to modern relational databases (such as PostgreSQL or MySQL) runs into exactly these challenges. Manually fixing these issues is time-consuming and error-prone. AI-powered tools can automate data analysis, cleansing, and transformation, making migration faster and more accurate.
Let us see how AI can help address those challenges:
| Challenge | How AI addresses the problem |
| --- | --- |
| Duplicate data | AI can identify possible duplicate records even if names or fields are slightly misspelled. |
| Missing values | ML models can fill in missing fields by learning from existing data patterns. |
| Improper file layouts | AI/ML models can read RPG copybooks or DDS and auto-generate table schemas. |
| Incompatible data formats | AI can learn transformation rules (e.g., date formats, units) and apply them during extract, transform, and load operations. |
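To make the last row concrete, here is a minimal sketch of date-format normalization during the transform step, assuming pandas and python-dateutil are installed (the sample values are invented for illustration):

from dateutil import parser
import pandas as pd

# Mixed legacy date formats of the kind often found in DB2 for i extracts
raw = pd.Series(["01/31/2024", "2024-01-31", "31 JAN 2024"])

# Normalize everything to ISO 8601 during transformation
normalized = raw.apply(lambda value: parser.parse(value).strftime("%Y-%m-%d"))
print(normalized.tolist())  # ['2024-01-31', '2024-01-31', '2024-01-31']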
Next, we will see how AI can help with data cleansing, using a sample input from DB2 for i:
| Cust Id | Name | Phone | State | Zip |
| --- | --- | --- | --- | --- |
| CUST001 | John Smith | 1234567890 | TX | |
| CUST002 | J. Smith | 1234567890 | Texas | 77001 |
| CUST003 | Jane | 9876543210 | TX | 77002 |
After AI-driven cleansing:
| Cust Id | Name | Phone | State | Zip | Duplicate of |
| --- | --- | --- | --- | --- | --- |
| CUST001 | John Smith | 1234567890 | TX | 77001 | NULL |
| CUST003 | Jane Doe | 9876543210 | TX | 77002 | NULL |
| CUST002 | J. Smith | 1234567890 | TX | 77001 | CUST001 |
The process includes the following steps:
· Extract the DB2 for i data using SQL scripts and export it to a customers.csv file.
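For example, assuming the IBM i Access ODBC driver is configured, a short Python script can pull the table and write the CSV (the DSN, credentials, and MYLIB.CUSTOMER below are placeholders for your environment):

import pandas as pd
import pyodbc  # requires the IBM i Access ODBC driver

# Placeholder connection string; substitute your own DSN and credentials
conn = pyodbc.connect("DSN=MYIBMI;UID=user;PWD=secret")
df = pd.read_sql("SELECT CUSTID, NAME, PHONE, STATE, ZIP FROM MYLIB.CUSTOMER", conn)
df.to_csv("customers.csv", index=False)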
· Execute the following Python script to detect duplicate records using fuzzy string matching:
from thefuzz import fuzz
import pandas as pd

df = pd.read_csv("customers.csv")

def find_duplicates(df):
    duplicates = []
    for i, row1 in df.iterrows():
        for j, row2 in df.iterrows():
            # Compare each pair once; a threshold of 80 catches abbreviated
            # names such as "J. Smith" vs "John Smith" (token_set_ratio ~83)
            if i < j and fuzz.token_set_ratio(row1['Name'], row2['Name']) > 80:
                duplicates.append((i, j))
    return duplicates

print(find_duplicates(df))
· Predict the missing ZIP codes using a machine-learning model from scikit-learn:
from sklearn.ensemble import RandomForestClassifier

# Normalize State first so equivalent rows share one value ("Texas" -> "TX")
df['State'] = df['State'].replace({"Texas": "TX"})

# Mark missing ZIPs with "0"; cast through int so filled values do not become "77001.0"
df['Zip'] = df['Zip'].fillna(0).astype(int).astype(str)

# Encode State as numeric codes, since scikit-learn cannot fit on raw strings
df['StateCode'] = df['State'].astype('category').cat.codes

train = df[df['Zip'] != "0"]
model = RandomForestClassifier()
model.fit(train[['StateCode']], train['Zip'])

missing = df[df['Zip'] == "0"]
df.loc[missing.index, 'Zip'] = model.predict(missing[['StateCode']])
Note: The above code may need to be refined for better performance or to handle large data sets.
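For large data sets, one common refinement of the duplicate-detection step is blocking: group the records by a coarse key such as State and run the pairwise fuzzy comparison only within each group, shrinking the O(n²) comparison count. A minimal sketch, reusing the find_duplicates function from above:

# Compare names only within the same state instead of across all pairs
for state, group in df.groupby('State'):
    print(state, find_duplicates(group.reset_index(drop=True)))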
Similarly, AI can help map legacy file layouts to suitable relational data structures. Let us say we have a DDS definition like this:
A          R CUSTREC
A            CUSTID        10A
A            NAME          30A
A            PHONE         10S 0
A            STATE          2A
A            ZIP            5S 0
AI tools can convert this into a modern schema:
CREATE TABLE customers (
    cust_id VARCHAR(10),
    name    VARCHAR(30),
    phone   BIGINT,
    state   CHAR(2),
    zip     INTEGER
);
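In practice an AI tool would infer this mapping, but the core idea can be illustrated with a small rule-based sketch (the regex and type mapping below are deliberate simplifications, not a full DDS parser):

import re

# Simplified mapping: 'A' = character field, 'S' = zoned decimal with 0 places
def dds_to_sql(dds_lines, table_name):
    columns = []
    for line in dds_lines:
        match = re.match(r"A\s+(\w+)\s+(\d+)([AS])", line)
        if not match:
            continue  # skip the record-format line and anything unrecognized
        name, length, dds_type = match.group(1), int(match.group(2)), match.group(3)
        if dds_type == "A":
            sql_type = f"VARCHAR({length})"
        else:
            # A zoned field with zero decimal places maps to an integer type
            sql_type = "BIGINT" if length > 9 else "INTEGER"
        columns.append(f"    {name.lower()} {sql_type}")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(columns) + "\n);"

dds = [
    "A          R CUSTREC",
    "A            CUSTID        10A",
    "A            NAME          30A",
    "A            PHONE         10S 0",
    "A            STATE          2A",
    "A            ZIP            5S 0",
]
print(dds_to_sql(dds, "customers"))

A real converter would also refine details this sketch ignores, such as CHAR vs VARCHAR, primary keys, and NOT NULL constraints.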
AI-powered data migration accelerates the process without sacrificing accuracy, and it scales to millions of records.
To conclude, AI combined with traditional ETL processes can modernize legacy systems with less manual effort. Machine learning models help fill data gaps, eliminate duplicates, and transform file-based records into relational databases while preserving business logic. This approach not only accelerates digital transformation but also ensures higher data quality and prepares organizations for cloud-native or analytics-ready systems.