Almost daily, we read about another data breach or exposure in which data that belongs to us could be used for identity theft. That’s a scary thought. As IT people, we have the ability and the responsibility to ensure that our companies aren’t the ones on the front page of the daily newspaper. Data is the key to this effort. We have to understand how people can get to our data. And that means understanding “data lineage.”
What Is Data Lineage?
Data lineage can be thought of as the lifecycle of your data. Where does it come from? Where is any given piece of data moving to? Where does it end up? It’s all about understanding how data is used and where it’s being used. In many cases, the path is quite complex and will vary based on inputs to the transaction and other state data. One instance of a transaction may show the data moving one way and the next transaction will have a different data path. The challenge is trying to document and understand these paths.
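One way to picture these varying paths is as a directed graph, where each system is a node and each data movement is an edge. The sketch below is illustrative only: the system names (web_form, order_service and so on) and the helper functions are hypothetical, not taken from any particular lineage product. It lists every path a piece of data could take from one starting point.

```python
from collections import defaultdict


def build_graph(edges):
    """Turn (source, target) pairs into an adjacency list."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def all_paths(graph, start, path=None):
    """Return every path from start to a node with no further onward hop."""
    path = (path or []) + [start]
    paths = []
    for nxt in graph.get(start, []):
        if nxt in path:
            continue  # do not loop forever if systems feed each other
        paths.extend(all_paths(graph, nxt, path))
    # If nothing lies downstream, the path ends at this node.
    return paths or [path]


# Hypothetical flows: the same input can end up in different places.
EDGES = [
    ("web_form", "order_service"),
    ("order_service", "customer_db"),
    ("order_service", "fraud_check"),
    ("fraud_check", "partner_api"),
    ("customer_db", "monthly_report"),
]

if __name__ == "__main__":
    for p in all_paths(build_graph(EDGES), "web_form"):
        print(" -> ".join(p))
```

Even this toy example yields two distinct paths for the same input, which is why documenting lineage by hand gets difficult quickly.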
Why Does Data Lineage Matter to Me?
People are increasingly collecting, mining and sharing huge volumes of data to better understand their business and their customers. While data is increasingly one of the most valuable assets a company has, this also means it is a target for hackers and identity thieves. The more data you have, the greater the risk of exposure. Admittedly, some data has no real meaning to anyone but insiders; if a hacker were to grab some SMF data, we probably wouldn’t worry. But what about Social Security numbers? Bank account numbers? While the files and databases technically reside on disk drives and tape cartridges, the data may flow out to customers and business partners. As data gets farther away from the storage unit, it’s more likely to be exposed. We have to safeguard the data, but first, we have to know where it is. Data lineage tools make this possible by giving you a visual image of your data flow in each of your application processes.
Compliance and Security
When you know where a piece of data is at any stage in the process, you can better protect it, allowing your company to avoid the legal and financial liabilities of a security breach. It’s actually a matter of law for some businesses. For example, in January 2013, the Basel Committee on Banking Supervision issued standard number 239 (BCBS 239), which introduced 14 principles for effective risk data aggregation and risk reporting. This standard established that banks had to be able to demonstrate the data flow used to create risk reports. Regulations governing other lines of business quickly followed. Data lineage is now required to meet compliance obligations, but it's also critical in protecting customer data.
But what does that look like? Well, it means performing a discovery of data sources, application flow and other interconnections, and then establishing a repository of this data to produce quick visualizations as needed.
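As a rough sketch of that idea, the example below stores discovered flows in a small repository and produces a diagram description on demand. It assumes an in-memory SQLite database and Graphviz DOT text as the output format; the table layout, function names and system names are all hypothetical, and a real discovery step would populate the repository automatically rather than by hand.

```python
import sqlite3


def create_repository():
    """Create an in-memory repository of discovered data flows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows ("
        " source TEXT NOT NULL,"
        " target TEXT NOT NULL,"
        " field TEXT NOT NULL,"
        " UNIQUE (source, target, field))"
    )
    return conn


def record_flow(conn, source, target, field):
    """Store one discovered hop; rediscovering the same hop is harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO flows (source, target, field) VALUES (?, ?, ?)",
        (source, target, field),
    )


def to_dot(conn, field):
    """Render every recorded hop for one field as Graphviz DOT text."""
    rows = conn.execute(
        "SELECT source, target FROM flows WHERE field = ?"
        " ORDER BY source, target",
        (field,),
    ).fetchall()
    lines = ["digraph lineage {"]
    for source, target in rows:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    repo = create_repository()
    record_flow(repo, "web_form", "order_service", "ssn")
    record_flow(repo, "order_service", "customer_db", "ssn")
    record_flow(repo, "order_service", "email_service", "email")
    print(to_dot(repo, "ssn"))
```

The DOT text can be handed to any Graphviz-compatible viewer, so the picture is regenerated from the repository each time instead of being maintained by hand.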
What Do You Need?
You can’t do this manually, so you’ll need automation to build your database. You’ll also want a way to classify and organize information so you can identify sensitive data. A visualization tool will make it possible to see where and how that sensitive data is processed, so you can secure the data as appropriate.
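To make the classification step concrete, here is a minimal sketch that labels fields as sensitive based on sample values and then lists the systems that touch them. The patterns are deliberately simplistic assumptions (a U.S. Social Security number format and an 8 to 17 digit account number), and the field and system names are hypothetical; real classification tools use far richer rules.

```python
import re

# Simplified, assumed patterns for illustration only.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")
ACCOUNT_PATTERN = re.compile(r"\d{8,17}")


def classify_field(samples):
    """Label a field by checking whether every sample value fits a pattern."""
    if samples and all(SSN_PATTERN.fullmatch(s) for s in samples):
        return "ssn"
    if samples and all(ACCOUNT_PATTERN.fullmatch(s) for s in samples):
        return "account_number"
    return "non_sensitive"


def sensitive_touchpoints(field_samples, field_usage):
    """Map each sensitive field to its label and the systems that process it."""
    result = {}
    for field, samples in field_samples.items():
        label = classify_field(samples)
        if label != "non_sensitive":
            result[field] = {
                "label": label,
                "systems": sorted(field_usage.get(field, [])),
            }
    return result


if __name__ == "__main__":
    samples = {
        "cust_ssn": ["123-45-6789", "987-65-4321"],
        "notes": ["called customer on Monday"],
    }
    usage = {"cust_ssn": {"crm", "billing"}, "notes": {"crm"}}
    print(sensitive_touchpoints(samples, usage))
```

The output is a short list of the places where sensitive data is processed, which is where security controls matter most.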
It’s clear we all have to do data lineage, but security and compliance aren’t the only benefits. You’ll also have a visualization of all your business processes, which makes it much easier to map business rules to application code. Business and IT can use this common language to communicate more efficiently and quickly create new business value. When you need to migrate or terminate systems, it’s much easier to track what you’re doing. And of course, documenting your system makes problem resolution much easier.
Let’s get started. Make your life a lot easier with a data lineage solution.
Denise P. Kalm is chief innovator of Kalm Kreative Inc. and consultant to CM First Group.