As organizations steer their business strategies to become wholly data-driven, data and analytics are more crucial than ever before. Insights from data improve business outcomes, but managing data quality amidst the explosion of data growth is a tremendous challenge. According to IDC, stored data will grow 250% by 2025 with data rapidly propagating on premises and across clouds, applications, and locations. And, according to Gartner, through 2025, 30% of Generative AI projects will be abandoned after proof of concept (POC) due to poor data quality. How can organizations get a holistic view of data and simultaneously maintain its quality when their data is distributed across silos? Data fabric is the ideal architecture strategy for effective data quality management.
Data fabric is an architecture with an integrated set of technologies and services designed to democratize data access across the enterprise at scale. At the core of the right architecture is the combination of data governance, security, integration, observability, and management capabilities. A larger quantity of data does not necessarily improve business outcomes, but high-quality data does. With a data fabric architecture, data can be integrated and enriched, governed and protected, and accessed across the organization.
At IBM, we provide four entry points to help organizations implement a data fabric architecture.
- Data Governance: automate management of data lifecycles with governance, security, and lineage for self-service data consumption.
- Data Integration: provide readily consumable and properly governed data to your teams, essentially anytime and anywhere.
- Data Observability: deliver reliable data by detecting data incidents earlier and resolving them faster with continuous data observability.
- Master Data Management: drive faster and scalable insights by delivering a comprehensive view of entity data across an enterprise.
Integrating data includes transforming data into useful and meaningful information for organizations. IBM Data Integration helps connect data from disparate sources, build data pipelines, remediate data issues, enrich data, and deliver integrated data to multi-cloud platforms where it can easily be accessed by data consumers or built into data products.
A crucial aspect of downstream consumption is data quality. Studies have shown that 80% of time is spent on data preparation and cleansing, leaving only 20% of time for data analytics. Thus, the earlier in the process that data is cleansed and curated, the less time data consumers need to spend on data preparation and cleansing, which leaves more time for data analysis.
Let’s focus on address data. With high-quality address data, organizations can:
- develop complete, high-quality customer data
- minimize the number of failed deliveries that could drive up operational costs (leveraging validation/correction and CASS certification in the United States)
- increase operational efficiency based on proximity to customers (leveraging geocoding/reverse geocoding)
- improve product deliverability to a global customer base (leveraging transliteration/validation), and much more.
Within IBM’s Data Fabric architecture, IBM QualityStage Address Verification Interface on Cloud Pak for Data allows data engineers to build data pipelines and remediate poor address data. The Address Verification stage is part of the DataStage canvas.
Address Verification Interface (AVI) capabilities include:
- Parsing: performs a lexical analysis of the address without validation against reference data
- Validation, correction & suggestions: validates an address where possible, corrects and enhances it, and makes suggestions if address validation is ambiguous
- Transliteration: transliterates the address character set (Native > Latin > Native)
- Geocoding: provides latitude and longitude based on an address
- Reverse geocoding: identifies locations nearest to a latitude and longitude coordinate
- CASS certification: returns certified addresses in adherence to United States Postal Service format and standards
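To make the reverse geocoding idea more concrete, the sketch below finds the known location nearest to a coordinate using the haversine great-circle distance. It is a minimal illustration in plain Python and is not how AVI implements the capability; the `haversine_km` and `nearest_location` functions and the sample reference points are assumptions made for this example only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_location(lat, lon, locations):
    """Return the (name, lat, lon) entry closest to the given coordinate."""
    return min(locations, key=lambda loc: haversine_km(lat, lon, loc[1], loc[2]))


# Hypothetical reference points (approximate city-centre coordinates).
LOCATIONS = [
    ("New York", 40.7128, -74.0060),
    ("Chicago", 41.8781, -87.6298),
    ("Austin", 30.2672, -97.7431),
]

# Look up the reference point closest to an arbitrary coordinate.
print(nearest_location(40.0, -75.0, LOCATIONS)[0])
```

A linear scan like this is fine for a handful of points; a production service would rely on licensed reference data and a spatial index instead.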
How to leverage Address Verification Interface within IBM DataStage on Cloud Pak for Data
Validate Address Data
Prerequisites:
- An existing DataStage project
- An existing DataStage flow that includes:
  - Data source that contains input address data
  - Address Verification stage
  - Data target to receive the output data
Step 1 - Configure Address Verification stage
- Double-click on the Address Verification stage
- Click on the "Stage" tab and open "Processing"
- Under Configuration, select "Validation" to validate input address data
- Under Validation type, select "Correction only" to return the strongest correction of an address based on the reference data provided through Address Verification
- Under Include geographic location, choose whether to include the latitude and longitude coordinates in the output result. This applies if you installed the AVI Address & Geo reference data set.
- Under Path for initialization and reference files, enter the installation path of the reference files
- Under the Validation Summary Report section, enter a name for the summary report (a text file), which includes statistics on how well your data performed: number of records processed, success rate, and failure rate. By default, this report is generated in the "px-storage" folder.
- Expand "Options". These settings give you a granular statistical summary of your data quality at the field level, e.g. house number, street address.
- Under Include field status codes, select "Yes"
- Under Include field match score, select "Yes" to return a score from 0-100 that indicates the confidence level of each field in the output address data
- Default country or region allows users to give priority to a particular country, sorting the data in memory as much as possible
- How many suggestions to show sets the number of suggested addresses returned for an ambiguous input address when "Suggestion" is selected for "Validation type"
- Click on the "Input" tab
- Expand "Address Columns" and click on Edit
- Map the Address Field to the Input column Name by clicking on the pencil icon on the right
- Click on Apply to save
- Click on Apply and Return once you complete the field mapping
Step 2 - Configure the data target stage and fill out all required fields. In this case, we leverage a Sequential File target to write a CSV file.
Step 3 - Compile and Run the DataStage flow
Step 4 - Validate the output file
AVI generates additional columns with the suffix "_QSAV"; an explanation of the codes can be found here.
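As a rough sketch of how the output file could be inspected, the Python snippet below separates the AVI-generated columns (those ending in "_QSAV") from the original input columns. Only the "_QSAV" suffix comes from the product behavior described above; the `split_columns` helper, the sample rows, and the specific column names such as `City_QSAV` are hypothetical placeholders, not the actual AVI output schema.

```python
import csv
import io

QSAV_SUFFIX = "_QSAV"  # suffix AVI appends to the columns it generates


def split_columns(fieldnames):
    """Separate AVI-generated columns from the original input columns."""
    generated = [name for name in fieldnames if name.endswith(QSAV_SUFFIX)]
    original = [name for name in fieldnames if not name.endswith(QSAV_SUFFIX)]
    return original, generated


# Hypothetical sample standing in for the Sequential File output;
# the column names and values are illustrative only.
SAMPLE_CSV = """id,address,City_QSAV,Status_QSAV
1,"1 Main St, Springfield",Springfield,OK
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
original, generated = split_columns(reader.fieldnames)
print("input columns:", original)
print("AVI columns:  ", generated)
```

To run this against a real output file, replace the `io.StringIO(SAMPLE_CSV)` argument with a file opened via `open(path, newline="")`.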
Address Verification Interface not only enhances data quality by validating and correcting your address data, but also gives you a summary report on how your input address data performs and the confidence level of the output data so you can take action.
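One possible way to act on the confidence information is to route records for manual review when any field scores below a chosen threshold. The sketch below assumes each record carries per-field match scores from 0-100; the record layout, the field names, the `needs_review` and `summarize` helpers, and the threshold of 80 are all hypothetical choices for illustration, not AVI defaults.

```python
def needs_review(field_scores, threshold=80):
    """True if any field's match score (0-100) falls below the threshold."""
    return any(score < threshold for score in field_scores.values())


def summarize(records, threshold=80):
    """Count records processed and the share that pass or need review."""
    total = len(records)
    flagged = sum(1 for r in records if needs_review(r["scores"], threshold))
    return {
        "processed": total,
        "accepted_rate": (total - flagged) / total if total else 0.0,
        "review_rate": flagged / total if total else 0.0,
    }


# Hypothetical records; field names and scores are made up for the example.
records = [
    {"id": 1, "scores": {"house_number": 100, "street": 95}},
    {"id": 2, "scores": {"house_number": 60, "street": 90}},
    {"id": 3, "scores": {"house_number": 80, "street": 80}},
    {"id": 4, "scores": {"house_number": 100, "street": 100}},
]

summary = summarize(records)
print(summary)
```

Requiring every field to meet the threshold is a conservative choice; a team might instead weight fields differently depending on which ones matter most for delivery.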
How IBM customers use Address Verification Interface
Use Case 1: State and Local Government
With multiple services spanning benefits, public housing, infrastructure planning and more, government agencies need complete and trustworthy data to provide high-quality experiences to the public. Government agencies manage issues with siloed data, customer data inaccuracies, and frequent updates in data. These matters make it difficult to capture and manage the public’s information accurately.
IBM QualityStage Address Verification Interface enables agencies to standardize, improve and verify their customers’ addresses. And, with the reverse geocoding capability, they can more easily address use cases such as planning infrastructure based on where their citizens are located.
Use Case 2: Shipping & Logistics
Poor address data quality can lead to failed or late deliveries and a poor customer experience, which can result in lost customers or damage to brand perception.
IBM QualityStage Address Verification Interface on Cloud Pak for Data enables shipping and logistics clients to standardize and format their customer addresses in 249 countries and territories, which improves service reliability and operational efficiency.
Use Case 3: Bank
The collection of customer data through different mediums can increase the risk of having incorrect, incomplete customer data. Poor quality data makes it difficult for organizations to do critical business tasks such as verifying client identity or sending out time-sensitive documents.
IBM QualityStage Address Verification Interface on Cloud Pak for Data enables organizations to address KYC and other regulatory use cases, improve business processes around fraud detection, feed Customer 360 initiatives, personalize customer journeys, and reduce costs for undeliverable addresses.
Enhance your data quality with IBM
Gartner estimates that poor data quality costs organizations an average of $12.9 million every year. IBM’s Data Fabric architecture, using DataStage with the integrated Address Verification Interface data quality add-on, can help prevent such losses so organizations can leverage a holistic view of reliable data to build trusted business models. With the right tools in hand, organizations can jumpstart their data fabric journey with quality data that is integrated and enriched, governed and protected, and easily accessible across the organization.
Start your data quality journey today with Address Verification Interface