Pat O’Sullivan – Senior Technical Staff Member – IBM Data and AI
The past does not repeat itself, but it rhymes
Twenty-five years ago, at the start of the data warehouse era, many organizations were scrambling to figure out how best to structure their analytics data stores to support the various needs of the organization: how to support specific reporting requirements, how to store the data efficiently, and how to future-proof the design to address as-yet-unknown requirements and use cases.
If you replace the phrase "data stores" with "business vocabulary" in the above paragraph, it is equally relevant today to the challenges facing organizations looking to deploy business vocabularies to support the growing governance layer around their Data Lake and the broader data management ecosystem.
As part of the increased focus on Data Governance, DataOps and self-service BI, most organizations realize that a business vocabulary is a key component of that ecosystem. Such a vocabulary is critical for maintaining a consistent understanding across data assets, reports and other data management artifacts, and for supporting the lifecycles for their creation, management and automation. What is less clear, however, is which practices and approaches are best for building such a business vocabulary.
IBM Knowledge Accelerators
IBM has just announced a suite of industry-specific vocabularies called IBM Knowledge Accelerators, which leverage an extensive array of pre-defined industry content for Healthcare, Energy, Insurance and Financial Services.
These business vocabulary offerings are designed to support day-to-day governance needs by enabling self-service access for the lines of business, data scientists and other users. In addition, these vocabularies are expressly designed to support the data discovery process and search capabilities of IBM Watson Knowledge Catalog.
However, another key benefit of these new vocabularies is that they provide organizations with a foundation on which they can build and grow their business language layer to underpin their current and future DataOps and Data Governance activities.
In recent years IBM has been assisting various clients with the creation of their business vocabularies. From those interactions, a number of key practices have emerged, which have been incorporated into the IBM Knowledge Accelerators.
Addressing critical Business Topics and Business processes
The central component of the IBM Knowledge Accelerators is the Business Core Vocabulary, the central cross-enterprise collection of business terms. However, when all is said and done, the primary goal of any business vocabulary is to support business users, citizen analysts and data scientists in their quest to locate the most appropriate data for their particular need. To that end, the IBM Knowledge Accelerators include a range of pre-defined, industry-specific, analytics-focused groupings of terms called Business Performance Indicators, and broader business-process-oriented views called Business Scopes. Both of these constructs give organizations a quick start in ensuring that any enterprise-wide business vocabulary they create has a strong orientation towards serving the needs of the various business users. Finally, the IBM Knowledge Accelerators also include Industry Alignment Vocabularies that represent the structure of key regulations and standards for each industry.
Detailed metamodel for extensible vocabulary design
As with any resource intended for long-term, cross-enterprise use, it is important to plan the overarching structures that enforce consistency and aid usage by both humans and machines. Without a commonly agreed framework in place, the business vocabulary will at best remain a set of departmental business glossaries with limited applicability and little cross-enterprise relevance. That is why, in creating the IBM Knowledge Accelerators, IBM focused heavily on defining a common and extensible metamodel, so that each type of business term and each relationship type has a precise, explicit specification.
Such a common metamodel ensures that any user or process can rely on a consistent approach to the definition of business language, which increases the likely wider applicability of the vocabulary and encourages reuse and standardization of language across the enterprise.
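To make the idea of a metamodel more concrete, here is a minimal sketch in Python. It is not the metamodel shipped with the IBM Knowledge Accelerators: the term types, relationship types and class names below are all hypothetical, chosen only to illustrate how explicitly typed terms and relationships let a vocabulary reject inconsistent entries.

```python
from dataclasses import dataclass, field

# Hypothetical term types; the real metamodel is not reproduced here.
TERM_TYPES = {"business_term", "performance_indicator", "business_scope"}

# Hypothetical relationship types. Each one names the term type that is
# allowed at the source end and at the target end of the relationship.
RELATIONSHIP_TYPES = {
    "is_a_type_of": ("business_term", "business_term"),
    "is_measured_by": ("business_term", "performance_indicator"),
    "is_in_scope_of": ("business_term", "business_scope"),
}


@dataclass(frozen=True)
class Term:
    name: str
    term_type: str
    definition: str = ""

    def __post_init__(self):
        # Reject any term whose type is not declared in the metamodel.
        if self.term_type not in TERM_TYPES:
            raise ValueError(f"unknown term type: {self.term_type}")


@dataclass
class Vocabulary:
    terms: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_term(self, term):
        self.terms[term.name] = term

    def relate(self, source, rel_type, target):
        # Reject undeclared relationship types and mismatched end types.
        if rel_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {rel_type}")
        expected = RELATIONSHIP_TYPES[rel_type]
        actual = (self.terms[source].term_type, self.terms[target].term_type)
        if actual != expected:
            raise ValueError(f"{rel_type} expects {expected}, got {actual}")
        self.relationships.append((source, rel_type, target))


vocab = Vocabulary()
vocab.add_term(Term("Customer", "business_term"))
vocab.add_term(Term("Customer Churn Rate", "performance_indicator"))
vocab.relate("Customer", "is_measured_by", "Customer Churn Rate")
```

Because every relationship is checked against a declared type, a person or a process reading the vocabulary can rely on each link meaning exactly one thing.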
Tight integration with the Data Discovery Process
For a business vocabulary to be effective, it must be able to grow and react organically to changes in the technical ecosystem that it describes. IBM Watson Knowledge Catalog includes an AI-infused Data Discovery capability to automate and accelerate the classification of incoming data sets and the subsequent assignment of this data to the appropriate terms in the business vocabulary. To support this process, the IBM Knowledge Accelerators include pre-defined references to the data classes already provided with Watson Knowledge Catalog, as well as an extensive set of Reference Data Sets and Values that organizations can use to create their own additional data classes.
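The general flow described above (classify a column, then assign it to a business term) can be sketched as follows. This is an illustration only and does not use or reproduce the Watson Knowledge Catalog API or its classification logic: the reference data set, the data class name, the term name and the 0.8 threshold are all assumptions made for the example.

```python
# Hypothetical reference data set: the valid values for a custom data class.
REFERENCE_DATA = {
    "Policy Status": {"ACTIVE", "LAPSED", "CANCELLED"},
}

# Hypothetical link from a data class to a term in the business vocabulary.
DATA_CLASS_TO_TERM = {
    "Policy Status": "Insurance Policy Status",
}


def classify_column(values, reference_data=REFERENCE_DATA, threshold=0.8):
    """Return (data_class, match_ratio) for the best-matching class, or None.

    A column matches a data class when at least `threshold` of its
    non-null values appear in that class's reference data set.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    best = None
    for data_class, valid_values in reference_data.items():
        ratio = sum(1 for v in present if v in valid_values) / len(present)
        if ratio >= threshold and (best is None or ratio > best[1]):
            best = (data_class, ratio)
    return best


def suggest_term(values, mapping=DATA_CLASS_TO_TERM):
    """Suggest a business term for a column, or None if nothing matches."""
    match = classify_column(values)
    return mapping.get(match[0]) if match else None


suggestion = suggest_term(["ACTIVE", "LAPSED", "ACTIVE", None])
```

In this sketch a column of policy status codes is matched against the reference values and then mapped to a single term, which is the kind of assignment a steward would review.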
Like the folks who defined the early data warehouse architectures, we don’t have a functioning crystal ball, so we cannot say for sure what future use cases and demands will be made of what will be a key part of the evolving DataOps ecosystem. However, we do know that whatever the use case turns out to be, it will be critical that this valuable representation of business knowledge is accessible both to humans and to AI-driven processes. Hence the focus in the IBM Knowledge Accelerators on ensuring that the precise role of each relationship is clear and unambiguous, to assist current and future ML/NLP processes. In addition, the structure and classification of the terms in the IBM Knowledge Accelerators mean that this same business knowledge could in the future also be rendered as OWL/RDF ontologies, if required, to support broader and more expressive use of the semantics in the vocabulary, for example for additional use cases such as natural language queries or chatbots.
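As a rough illustration of what rendering vocabulary content in RDF might look like, the sketch below serializes a few terms and relationships as Turtle. It uses the W3C SKOS vocabulary (skos:Concept, skos:prefLabel, skos:definition, skos:broader) rather than a full OWL ontology, which is a simplification chosen for this example and not something the IBM Knowledge Accelerators are stated to do. The example.org namespace, the function names and the sample terms are hypothetical.

```python
def local_name(label):
    """Turn a term label into a simple Turtle local name."""
    return "".join(ch if ch.isalnum() else "_" for ch in label)


def escape(text):
    """Escape characters that are special inside a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(terms, relationships, base="http://example.org/vocab#"):
    """Render terms and relationships as Turtle text.

    terms: dict mapping a term label to its definition.
    relationships: list of (source_label, predicate, target_label), where
    predicate is a prefixed name such as "skos:broader".
    """
    lines = [
        f"@prefix ex: <{base}> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
    ]
    for label, definition in terms.items():
        lines.append(f"ex:{local_name(label)} a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{escape(label)}"@en ;')
        lines.append(f'    skos:definition "{escape(definition)}"@en .')
    for source, predicate, target in relationships:
        lines.append(
            f"ex:{local_name(source)} {predicate} ex:{local_name(target)} ."
        )
    return "\n".join(lines)


turtle_text = to_turtle(
    {
        "Party": "A person or organization of interest to the business.",
        "Customer": "A party that holds one or more products.",
    },
    [("Customer", "skos:broader", "Party")],
)
print(turtle_text)
```

The point of the sketch is that once each relationship has an unambiguous role, mapping it to a standard predicate is mechanical.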
Another key anticipated growth area is ensuring that the business vocabulary can expand to reflect growth in the underlying data stores. Today the business vocabulary grows as more terms in the IBM Knowledge Accelerators are identified that match the classification of new data. There is also the possibility of automating the generation of net-new terms based on the contents of the physical data stores, ensuring an even tighter linkage between the technical landscape and the business vocabulary.
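One simple way to picture the generation of net-new terms is to derive candidate names from physical column names and keep only those not already in the vocabulary. The sketch below is a naive illustration under that assumption; the function names and sample columns are hypothetical, and a real implementation would need to consider the data contents and steward review rather than column names alone.

```python
import re


def column_to_candidate(column_name):
    """Turn a snake_case or camelCase column name into a readable candidate."""
    # Insert a space at lower-to-upper boundaries, e.g. claimAmount.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", column_name)
    words = re.split(r"[\s_\-]+", spaced)
    return " ".join(word.capitalize() for word in words if word)


def propose_new_terms(column_names, existing_terms):
    """Return candidate terms that are not yet in the vocabulary."""
    known = {term.lower() for term in existing_terms}
    proposals = []
    for column in column_names:
        candidate = column_to_candidate(column)
        is_new = candidate.lower() not in known and candidate not in proposals
        if candidate and is_new:
            proposals.append(candidate)
    return proposals


proposals = propose_new_terms(
    ["policy_start_date", "claimAmount", "claim_amount"],
    existing_terms=["Claim Amount"],
)
```

Here the two spellings of the claim amount column collapse onto the existing term, and only the policy start date is proposed as a new one.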
To learn more about IBM Knowledge Accelerators, visit www.ibm.com/cloud/knowledge-accelerators
Dive deeper into technical details and the associated metamodel by visiting the IBM Knowledge Center.
Related reading: Introducing the IBM Knowledge Accelerators for IBM Watson Knowledge Catalog on Cloud Pak for Data [Coming August 11]