Enriching your documents can make search more effective

By Bill Murdock posted Fri January 14, 2022 12:06 PM

  

Authors: J William Murdock and Sudarshan Thitte

Some search systems allow you to perform linguistic analysis on your documents and store the results of that analysis with those documents. The analysis then acts as an enrichment to the text of the document, because the combination of text and structured analysis results is a more powerful representation of content than plain text alone would be.

For example, IBM Watson Discovery provides a set of built-in natural-language enrichment capabilities. It also allows you to add custom enrichments. Enrichments can include many kinds of language analysis. A particularly popular and useful form of enrichment is entities. Other examples of enrichments include relationships between entities, parts of speech of words, and types of sentences. IBM Watson Discovery’s built-in enrichments can find common kinds of entities like people or places. Users can also add custom entity detection capabilities. For example, a business that sells cars and trucks may want separate entity types for cars and for trucks.

This article starts by addressing some common misconceptions about how enrichments can be used in search for Watson Discovery. It then elaborates on some things users sometimes mistakenly expect enrichments to do for search. Finally, it describes some of the ways enrichments can be useful for making your search capability more effective.

This article does not provide any details about how to add enrichments to your documents, because that is covered in detail in the product documentation.

The purpose of this article is to focus on how search applications can use enrichments. We hope this article will help you assess whether enriching your documents with entities and/or other information will be valuable for making search more effective.

Common Misconceptions

There are several common misconceptions about how enrichments can be used for search in Watson Discovery:

1. Myth: Enrichments will automatically make Watson Discovery search more effective without any additional effort. Fact: Enrichments can be used to make search more effective (as described in a later section). However, this only works if you build your application to use those enrichments effectively. Watson Discovery’s natural-language query only uses enrichments when it is explicitly configured to do so.

2. Myth: Watson Discovery is just simple keyword search unless I do enrichments. Fact: Watson Discovery has many powerful AI capabilities that do not need any enrichments. For example, Watson Discovery has sophisticated algorithms to convert many different file formats into text. Watson Discovery has exceptionally powerful and flexible passage retrieval. Watson Discovery estimates the probability of each search result being correct. Watson Discovery has answer finding, which uses a pre-trained deep neural network to identify short answer spans in text. All of these capabilities are natural-language processing in the broad sense of the term, but none of them depend on storing enrichments to the text during ingestion.

3. Myth: If my documents have important domain-specific concepts, then I must build custom enrichments to get good search results from Watson Discovery. Fact: Watson Discovery has powerful search capabilities out-of-the-box. You can get a lot of value from Watson Discovery without building any custom enrichments. The accuracy will not be perfect, but it will be quite good.

4. Myth: If I add enough enrichments, eventually my search will be perfect (or close to perfect). Fact: Adding enrichments can make search accuracy better if you find a way to use those enrichments effectively. However, all of the known techniques for using enrichments have their limitations, and the total impact on search accuracy tends to be fairly modest. For many users, the (quite good) out-of-the-box accuracy that you get from Watson Discovery is a better value proposition than adding enrichments. However, for some business-critical applications, even a modest improvement in accuracy is worth a significant investment. For such applications, adding enrichments can be quite valuable.

5. Myth: Adding enrichments is the only way to improve search accuracy for Watson Discovery. Fact: Watson Discovery also provides a variety of other technologies for making your search more accurate including relevancy training, custom stop words, curations, synonyms, etc. As with enrichments, all of these methods require significant effort for relatively modest benefits, so just using the out-of-the-box search with no customization can be a better value proposition for many users. As with enrichments, these tools can be useful for high-stakes applications where modest improvements in accuracy can provide significantly more value (especially relevancy training because it is a fairly reliable source of modest improvement).

What enrichments won’t do for search

As noted earlier, Watson Discovery’s natural-language query only uses enrichments if you take some action to make it use those enrichments. Many users find this surprising. For example, consider the following scenario:

Tina works for Examplomatic Corporation, which makes two kinds of products: widgets and gadgets. She gives Watson Discovery some documents about the products her company makes. She types in a query “what widget is available in purple?”. Her top search result says that the Examplomatic G200 is available in purple, and her second search result says that the Examplomatic W9000 is available in purple. Tina knows that the G200 is a gadget and the W9000 is a widget. So the first result is not relevant to the query but the second one is. So Tina decides to teach Watson about gadgets and widgets by training a custom enrichment in Watson Knowledge Studio. She then applies that model to her collection in Watson Discovery. Tina runs the query again and gets the same results with roughly the same confidence scores. Tina is now confused. “I taught Watson Discovery to distinguish between gadgets and widgets so why is it still getting this wrong?”, she asks. The answer is (again) that Watson Discovery’s natural-language query only uses enrichments if you make it use the enrichments. We will return to this example later to explore what Tina can do in this scenario.

The example above focuses on entity enrichments. The same limitations apply to other enrichments. For example, Watson Discovery lets you specify relations that connect entities. It also lets you identify whether some text has positive or negative sentiment. None of these enrichments are used in a natural-language query unless the calling application does something to use them.

Ways to use enrichments for search

One way to use enrichments is to add search facets to your user interface based on specific entity types. Search facets are extra controls on a search interface that let users filter search results. For example, if you go to the website of a typical clothing retailer and search for “shirt,” you may get some “facets” on the side of your search results for gender, color, age group, and price range. Facets can make your search more effective because they enable users to narrow down the result list to only the entries of interest. In the example in the previous section, Tina can add a facet for Product Type with entries for Gadget and Widget, the two entity types that she trained Watson Knowledge Studio to recognize. If a user types the same sample query, “what widget is available in purple?”, the user still gets the same original results with a mix of gadgets and widgets. The user can then click on Gadget or Widget in the Product Type facet and get only the desired type. That’s not ideal because the user asked for a widget in the query and now needs to do extra work to get only widget results. However, it is useful because it gets the user to the right answer.
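
As a rough illustration of the facet idea, the sketch below counts entity types across a small result list and then filters the list when the user picks a facet entry. It runs entirely in memory and does not call Watson Discovery. The result records, the entity_types field, and the third document are hypothetical stand-ins for whatever your application receives from a query.

```python
from collections import Counter

# Hypothetical search results. "entity_types" stands in for the entity types
# that a custom enrichment attached to each document.
results = [
    {"id": "doc1", "text": "The Examplomatic G200 is available in purple.",
     "entity_types": ["Gadget"]},
    {"id": "doc2", "text": "The Examplomatic W9000 is available in purple.",
     "entity_types": ["Widget"]},
    {"id": "doc3", "text": "The W9000 and the G200 ship together in a bundle.",
     "entity_types": ["Widget", "Gadget"]},
]


def facet_counts(results):
    """Count how many results mention each entity type (once per document)."""
    counts = Counter()
    for doc in results:
        counts.update(set(doc["entity_types"]))
    return dict(counts)


def apply_facet(results, selected_type):
    """Keep only the results that contain an entity of the selected type."""
    return [doc for doc in results if selected_type in doc["entity_types"]]


# Values to show next to each entry in a "Product Type" facet.
print(facet_counts(results))

# What the user sees after clicking "Widget" in the facet.
print([doc["id"] for doc in apply_facet(results, "Widget")])
```

Note that a document mentioning both product types stays in the list under either facet entry, which is one reason facets narrow results rather than guarantee relevance.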

Another option is to allow end users to write structured queries using the Discovery Query Language. This can be useful for applications targeted to professional information workers. If you build an application that is a significant part of users’ daily work experience, then those users might be willing to learn a complex and powerful structured query language. The Discovery Query Language is explained in detail in the product documentation query overview section.

A third option is adding logic to convert natural-language queries to structured queries. For example, Tina could write an algorithm that takes a query like “what widget is available in purple” and turns it into a structured query asking for an entity of type Widget and the word “purple” in the same document. Alternatively, Tina could write an even more complex algorithm that takes this natural-language query and converts it into a structured query asking for an entity of type Widget that has an available-in-color relation connected to an entity with name “purple”. The latter is more precise, but also more brittle. It requires that you also build a custom enrichment for that relation. Queries can then fail to yield results when the relation is not detected. The Watson Discovery development team has experimented with many variations of this concept. In fact, the original Watson for Jeopardy! used logic like this for some entities and relations that are particularly common in Jeopardy! (e.g., an actor in a movie). For Jeopardy!, we were able to find answers using this approach for a small fraction of clues (around 3.5%). We have not yet found a general-purpose approach to this problem that is consistently effective and efficient. As a result, there is no such feature built in to Watson Discovery. It may be easier to build a specific implementation that works well on a single data set, so you might still want to consider taking this approach with your application. However, it is quite difficult to do well, i.e., to generate precise structured queries in a way that applies to a large fraction of actual user requests. So if you do take this approach, expect a significant investment that may or may not have significant impact.
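
To make the simpler of the two conversions concrete, here is a minimal sketch of a rule-based converter. It is not a Watson Discovery feature. The word lists are hypothetical, and the field names (enriched_text.entities.type, text) and operators (:: for exact match, : for contains, a comma for AND) reflect our understanding of the Discovery Query Language; check them against the product documentation and your own collection before relying on them.

```python
import re

# Hypothetical mapping from query words to custom entity types.
TYPE_WORDS = {
    "widget": "Widget", "widgets": "Widget",
    "gadget": "Gadget", "gadgets": "Gadget",
}

# Hypothetical list of attribute words worth requiring in the document text.
COLOR_WORDS = {"purple", "red", "green", "blue"}


def to_structured_filter(natural_language_query):
    """Return a filter string for the query, or None when no rule applies."""
    words = re.findall(r"[a-z0-9]+", natural_language_query.lower())
    types = sorted({TYPE_WORDS[w] for w in words if w in TYPE_WORDS})
    colors = sorted({w for w in words if w in COLOR_WORDS})
    if not types:
        # No known entity type in the query: the caller should fall back
        # to a plain natural-language query.
        return None
    # Assumed syntax: one clause per requirement, joined with "," (AND).
    clauses = [f'enriched_text.entities.type::"{t}"' for t in types]
    clauses += [f'text:"{c}"' for c in colors]
    return ",".join(clauses)


print(to_structured_filter("what widget is available in purple?"))
print(to_structured_filter("how do I reset my password?"))
```

The second call returns None, which illustrates the coverage problem described above: hand-written rules like these apply only to the fraction of queries they anticipate.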

A fourth use of enrichments is to call users’ attention to mentions of concepts such as entities when reviewing documents. That doesn’t make the search any more accurate, but it can make the search experience more valuable because users can spot the information they are looking for in documents more quickly. You may want to let the user control which entity types get highlighted. For example, an engineer trying to understand how metals are used in a product may want to highlight all the entities of type Metal (which would be recognized by a custom enrichment).
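
The highlighting idea can be sketched as follows, assuming each entity mention arrives with a type and begin/end character offsets into the document text. That record shape is an assumption made for illustration, as are the sample sentence and the Part entity type; adapt it to the enrichment output your application actually stores.

```python
def highlight_mentions(text, mentions, selected_types,
                       open_mark="[", close_mark="]"):
    """Wrap mentions of the selected entity types in marker strings."""
    chosen = sorted(
        (m for m in mentions if m["type"] in selected_types),
        key=lambda m: m["begin"],
    )
    pieces = []
    cursor = 0
    for m in chosen:
        if m["begin"] < cursor:
            continue  # skip a mention that overlaps one already highlighted
        pieces.append(text[cursor:m["begin"]])
        pieces.append(open_mark + text[m["begin"]:m["end"]] + close_mark)
        cursor = m["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical document text and mentions (end offsets are exclusive).
text = "The housing is aluminum and the bolts are titanium."
mentions = [
    {"type": "Metal", "begin": 15, "end": 23},
    {"type": "Metal", "begin": 42, "end": 50},
    {"type": "Part", "begin": 4, "end": 11},
]

# The user has chosen to highlight only entities of type Metal.
print(highlight_mentions(text, mentions, {"Metal"}))
```

In a real interface the markers would typically be replaced by styling, and the set of selected types would come from a user-facing control.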

A fifth use of enrichments in search is finding specific kinds of content. For example, Watson Discovery allows users to search across tables in their collections using the table_results.enabled parameter of the Discovery v2 query API. This feature uses the table understanding enrichment to locate tables. Other kinds of enriched content can be found in a similar way: filter for documents that contain a given enrichment type, and then process those documents downstream using custom logic, as applicable.
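
As a small sketch, a request body that turns on table results might be assembled like this. Only table_results.enabled comes from the description above; the natural_language_query and collection_ids field names, and the collection identifier, are assumptions to verify against the Discovery v2 API reference. The code builds and prints the body without sending it anywhere.

```python
import json


def build_table_query(natural_language_query, collection_ids=None):
    """Assemble a query body that asks for table results."""
    body = {
        # Assumed field name for the natural-language query text.
        "natural_language_query": natural_language_query,
        # The parameter named in the text: table_results.enabled.
        "table_results": {"enabled": True},
    }
    if collection_ids:
        # Assumed field name; the identifiers used here are placeholders.
        body["collection_ids"] = list(collection_ids)
    return body


body = build_table_query("what widget is available in purple?",
                         ["hypothetical-collection-id"])
print(json.dumps(body, indent=2))
```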

None of these approaches is right for every application. Facets can be very effective for certain kinds of search, e.g., faceting on color or price in a product search. In other cases, facets can clutter up a user experience while adding little or no value. Even if you are going to use facets, entity types (or other enrichments) may not always be the best tool for identifying facets. Many applications have structured metadata on documents that is quite useful for facets, so no enrichment may be necessary to get good facets. Similarly, the other techniques listed above for using enrichments for search are useful in some applications and not in others. When they are useful, the benefits tend to be modest and incremental (but they can still be significant).

Recommendations

In the example scenario described earlier, we would recommend that Tina deploy a search system for real users before trying to do any enrichments. We would also recommend that Tina’s application store a record of every query it receives. That way Tina can see how real users actually use the search system and use that information to make decisions about whether and how to use enrichments in the future.
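
Keeping a record of every query can be very simple. The sketch below appends one JSON line per query to any writable file-like object and then summarizes which queries occur most often. The record fields and function names are hypothetical, and a production application would also need to consider privacy and retention rules for logged queries.

```python
import io
import json
from collections import Counter
from datetime import datetime, timezone


def log_query(log_file, query_text, result_ids):
    """Append one JSON line describing a query and the results it returned."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query_text,
        "result_ids": list(result_ids),
    }
    log_file.write(json.dumps(record) + "\n")
    return record


def most_common_queries(log_lines, top_n=3):
    """Return the most frequent queries (case-insensitive) with their counts."""
    counts = Counter(json.loads(line)["query"].strip().lower()
                     for line in log_lines if line.strip())
    return counts.most_common(top_n)


# An in-memory buffer stands in for a real log file in this example.
buffer = io.StringIO()
log_query(buffer, "what widget is available in purple?", ["doc1", "doc2"])
log_query(buffer, "What widget is available in purple?", ["doc1", "doc2"])
log_query(buffer, "G200 warranty", ["doc7"])

print(most_common_queries(buffer.getvalue().splitlines()))
```

A summary like this shows, for instance, how often users actually name a product type in their queries, which is the kind of evidence that helps decide whether a custom enrichment is worth building.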

Tina may be concerned that without enrichments the search application she has built using Watson Discovery is not good enough and nobody will want to use it. However, as noted in the previous section, the benefits of enrichments for search are likely to be modest and incremental. So Tina is in one of two states:

1. Most likely, her Watson Discovery application is good enough for real users. Watson Discovery provides excellent AI-driven information finding capabilities without any need for customization. In this state, she is probably better off deploying it to users and seeing how they use it.

2. Perhaps her Watson Discovery search application is not good enough. Maybe her user population just is not interested in a search capability over the content she has unless it is vastly superior to what Watson Discovery can provide out-of-the-box. Because the improvements that can be had using custom enrichments are modest and incremental, it is very unlikely that Tina will be able to satisfy these sorts of users with custom enrichments. Furthermore, competing solutions from other vendors are also unlikely to satisfy these users. The state-of-the-art is what it is. In this state, Tina is still better off deploying the solution right away and seeing it fail. That’s not as bad as investing an enormous amount of work trying to optimize a doomed application and then eventually seeing it fail anyway.

If Tina does deploy her application and does get real usage, then Tina may want to seriously consider making that application even better. The techniques described in this article can be helpful for this purpose. As mentioned earlier, Watson Discovery also has many other improvement tools such as relevancy training, curations, and query expansion. These can be time consuming and none of them will transform a solution from terrible to amazing. However, they can make a noticeable difference and make a good solution even better. If you have an application based on Watson Discovery that is delivering a million dollars of business value per year, then maybe an assortment of incremental improvements can get it to deliver 1.2 million dollars instead. That’s not a night-and-day difference, but it can be a worthy investment of time and effort.

Conclusions

You do not need to enrich documents with entities or other language analysis to make effective use of search in Watson Discovery. You can get excellent, AI-powered information finding right away by just feeding your documents into the product. However, there are many cases where entities and other enrichments can increase the value of a search application based on Watson Discovery. This article has provided some ideas on how to use enrichments to make search more effective. If you can think of any more, feel free to add them to the comments section below!

Acknowledgements

Thank you to Sekar Krishnamurthy for extensive feedback and guidance on the content of this article, and to the rest of the Watson Discovery development team for making this amazing technology.

