Smart Document Understanding (SDU) lets users train IBM Watson Discovery to interpret the structural elements of documents, including fields, footers, tables, and images. You visually annotate a sample of your documents and instantly see how the machine learning model will interpret the text and retrieve the desired results.
Here are some important features of the solution:
The Data: It can come from any number of sources, including your private data, internet sources, and third-party sources.
The Process: Discovery prepares data for search through ingestion. Documents are lifted and loaded, then converted and enriched with NLP capabilities. Entities are extracted and identified, and semantic relationships are stored for further processing. The data is then cleaned up and analyzed for storage and query.
The Query: Query languages let you search the documents, unlock knowledge and insights you never imagined were present in the data, and use them in your enterprise applications.
The Relevancy Training: You provide relevant data to train Discovery for improved query results. Continuous relevancy training re-ranks the documents so that the most relevant and important information surfaces at the top.
The Supported Data Sources: Web Crawl, Salesforce, SharePoint, Box, Web Chat, and IBM Cloud Object Storage.
Teach Watson Your Domain Language: Combined with Watson Knowledge Studio capabilities, customers can train the solution on their domain language and make it relevant for specific applications and industries.
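As a rough illustration of the query feature above, here is a minimal Python sketch that assembles a Discovery Query Language string. It assumes the commonly documented operators (":" for contains, "::" for exact match, "," for AND, "|" for OR, parentheses for grouping) and backslash escaping of quotes. The helper names term, all_of, and any_of are hypothetical and not part of any IBM SDK, and the field names are only examples of what an enriched collection might contain.

```python
def term(field: str, value: str, exact: bool = False) -> str:
    """Build one clause: field:"value" (contains) or field::"value" (exact match)."""
    op = "::" if exact else ":"
    # Assumption: backslash escapes a literal quote inside a phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}{op}"{escaped}"'


def all_of(*clauses: str) -> str:
    """Join clauses with ',' so every clause must match (AND)."""
    return ",".join(clauses)


def any_of(*clauses: str) -> str:
    """Join clauses with '|' inside parentheses so any clause may match (OR)."""
    return "(" + "|".join(clauses) + ")"


# Hypothetical field names: documents that mention IBM as an entity
# and whose text contains either "merger" or "acquisition".
query = all_of(
    term("enriched_text.entities.text", "IBM"),
    any_of(term("text", "merger"), term("text", "acquisition")),
)
print(query)
# enriched_text.entities.text:"IBM",(text:"merger"|text:"acquisition")
```

A string like this is what would be supplied as the query when calling Discovery. Check the current Discovery documentation for the operators and fields your service version supports.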

See Watson at work: https://www.ibm.com/ae-en/cloud/watson-discovery
#NLP #AI #Search #Data #ContentManagement #Watson #IBM