Watson Discovery

Answer Finding API beta in Watson Discovery v2

By Bill Murdock posted Fri December 18, 2020 03:26 PM


This post is a brief overview of the answer finding API beta in Watson Discovery v2.  The answer finding API extends the passage retrieval API and allows you to find concise answer spans within a passage.  It uses deep-learning-based Reading Comprehension technology as announced in our recent press release.  This post explains the API and how to use it.  For a broader explanation of why to use it and what it is good for, see our Medium blog post, which will be published soon.

The answer finding API beta adds two new parameters within the passages block of the query API in Watson Discovery v2:

  • find_answers is optional and defaults to false.  If it is set to true (and the natural_language_query parameter is set to some query string), the new answer finding feature is enabled.
  • max_answers_per_passage is optional and defaults to 1.  The answer finding feature finds at most this many answers in any one passage.

When answer finding is used, a new block is added to the return value within each passage object.  That new block is called answers, and it is a list of answer objects.  The list can be up to max_answers_per_passage in length.  Each answer object has the following fields:

  • answer_text is the text of a concise answer to the query.
  • confidence is a number from 0 to 1 that is an estimate of the probability that the answer is correct.  Note that some answers have very low confidence and are very unlikely to be correct, so we recommend that you be selective about what you do with each answer depending on this value.
  • start_offset is the start character offset (the index of the first character) of the answer within the field that the passage came from.  It is guaranteed to be greater than or equal to the start offset of the passage (since the answer must be within the passage).
  • end_offset is the end character offset (the index of the last character, plus one) of the answer within the field that the passage came from.  It is guaranteed to be less than or equal to the end offset of the passage.
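
The advice to be selective about low-confidence answers can be sketched as a small filter over a passage object.  This is only an illustration, not part of the Discovery API: the function name select_answers, the 0.5 threshold, and the second answer in the sample data are all assumptions made up for the example, and a real application would tune the threshold to its own needs.

```python
def select_answers(passage, min_confidence=0.5):
    """Return the passage's answers at or above the threshold, best first.

    `passage` is a dict shaped like a passage object in the query response.
    The 0.5 default is an arbitrary illustrative value, not a recommendation.
    """
    answers = passage.get("answers", [])
    kept = [a for a in answers if a["confidence"] >= min_confidence]
    return sorted(kept, key=lambda a: a["confidence"], reverse=True)


# Sample passage: the first answer mirrors the example response in this post;
# the second answer is hypothetical, added only to show the filter at work.
sample_passage = {
    "answers": [
        {"answer_text": "(ESR 17 and 24)", "confidence": 0.6925222},
        {"answer_text": "version 9.0", "confidence": 0.04},
    ],
}

for answer in select_answers(sample_passage):
    print(answer["answer_text"], answer["confidence"])
```

A passage with no answers block simply yields an empty list, so the same function can be applied whether or not answer finding was enabled for the query.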

Here is an example of a query using this API (this example also appears in the Medium blog post linked to above):

{
  "natural_language_query": "InfoSphere Information Server 1.3 Firefox versions",
  "passages": {
    "enabled": true,
    "max_per_document": 3,
    "characters": 850,
    "fields": ["title", "content"],
    "find_answers": true,
    "max_answers_per_passage": 1
  }
}
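
For readers who want to send this query from code, here is a minimal Python sketch that builds the same body and wraps it in an HTTP request object without sending it.  Several details are assumptions rather than anything stated in this post: the helper names, the placeholder service URL and project ID, the /v2/projects/{project_id}/query path, and the version date 2020-08-30.  Authentication is deliberately left out; check the Discovery v2 API reference for the exact endpoint and credentials your instance needs.

```python
import json
import urllib.request


def build_answer_query(query_text, max_per_document=3, characters=850,
                       fields=("title", "content"), max_answers_per_passage=1):
    """Build the JSON body for a query with answer finding enabled."""
    return {
        "natural_language_query": query_text,
        "passages": {
            "enabled": True,
            "max_per_document": max_per_document,
            "characters": characters,
            "fields": list(fields),
            "find_answers": True,
            "max_answers_per_passage": max_answers_per_passage,
        },
    }


def build_request(service_url, project_id, body, version="2020-08-30"):
    """Wrap the body in a POST request object. Nothing is sent here.

    The URL layout and version date are assumptions; auth headers are omitted.
    """
    url = f"{service_url}/v2/projects/{project_id}/query?version={version}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_answer_query("InfoSphere Information Server 1.3 Firefox versions")
# Placeholder host and project ID, for illustration only.
request = build_request("https://discovery.example.invalid", "my-project-id", body)
print(json.dumps(body, indent=2))
```

Keeping body construction separate from the transport makes it easy to inspect or log the exact parameters before any call is made.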

Here is a corresponding response:

{
  "passage_text": "<em>InfoSphere</em> <em>Information</em> <em>Server</em> Web Console with Internet Explorer 11, you may get the error message: IBM <em>InfoSphere</em> <em>Information</em> <em>Server</em> supports Mozilla <em>Firefox</em> (ESR 17 and 24) and Microsoft Internet Explorer (<em>version</em> 9.0 and 10.0) browsers.",
  "start_offset": 287,
  "end_offset": 526,
  "field": "content",
  "answers": [{
    "answer_text": "(ESR 17 and 24)",
    "start_offset": 446,
    "end_offset": 461,
    "confidence": 0.6925222
  }]
}
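
Because answer offsets are relative to the source field rather than to the passage, a UI that wants to emphasize the answer inside the passage has to convert them.  The sketch below shows one way to do that for the response above.  It assumes that offsets count characters of the field text without the <em> highlighting tags; the function name and the bracket-style emphasis are illustrative choices, not part of the API.

```python
import re


def answer_span_in_passage(passage):
    """Locate the first answer inside the tag-free passage text.

    Assumes offsets refer to the field text without <em> highlighting,
    so the answer begins (answer start - passage start) characters in.
    """
    plain = re.sub(r"</?em>", "", passage["passage_text"])
    answer = passage["answers"][0]
    begin = answer["start_offset"] - passage["start_offset"]
    end = begin + len(answer["answer_text"])
    return plain, begin, end


passage = {
    "passage_text": (
        "<em>InfoSphere</em> <em>Information</em> <em>Server</em> Web Console "
        "with Internet Explorer 11, you may get the error message: IBM "
        "<em>InfoSphere</em> <em>Information</em> <em>Server</em> supports "
        "Mozilla <em>Firefox</em> (ESR 17 and 24) and Microsoft Internet "
        "Explorer (<em>version</em> 9.0 and 10.0) browsers."
    ),
    "start_offset": 287,
    "answers": [{"answer_text": "(ESR 17 and 24)", "start_offset": 446}],
}

plain, begin, end = answer_span_in_passage(passage)
# Mark the answer with square brackets as a stand-in for real UI emphasis.
highlighted = plain[:begin] + "[" + plain[begin:end] + "]" + plain[end:]
print(highlighted)
```

Checking that the computed slice equals answer_text is a cheap sanity test before highlighting anything in a real interface.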

The confidence values shown in the answers are not merely the direct output of the answer finding model, which attempts to find the most likely answer within any single passage.  Instead, the confidence values that we provide as output reflect a combined estimate of how likely the document is to be relevant, how likely the passage is to be relevant, and how likely the answer is to be correct given that passage in that document.

We update the confidence values for documents and the ordering of documents and passages that have answers using the same signals from document retrieval, passage retrieval, and the answer finding model *.  As a result, document retrieval and/or passage retrieval may become more or less accurate when you enable answer finding.  For applications where end-users ask a lot of explicit questions (e.g., "What versions of Firefox does InfoSphere Information Server 1.3 support?") or implicit questions (e.g., "InfoSphere Information Server 1.3 Firefox versions"), we have found that turning on answer finding can substantially improve accuracy.

Because we only perform answer finding on as many documents and passages as you request, you may want to request more documents and/or more passages per document than you actually need, so that the answer finding model can be applied to more candidate documents and passages.  For example, if you want to show 10 documents and 1 passage from each document, consider asking for 20 documents and up to 3 passages from each document with answer finding.  That allows answer finding to search for answers in up to 20*3 = 60 passages.  If it is confident that it has found an answer in one of those passages, that confidence is combined with the document and passage scores to produce a final ranking, which can promote a document or passage you might otherwise have missed.
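
The over-request strategy can be sketched in Python as two steps: ask for more than you need, then trim what you display.  The sketch assumes that the number of documents is controlled by a top-level count parameter, that the response lists documents under results with each document's passages under document_passages, and that both lists arrive already ranked.  Those names are assumptions based on the Discovery v2 query API rather than anything stated in this post, and the helper names are made up for the example.

```python
def overfetch_query(query_text, fetch_docs=20, fetch_passages=3):
    """Request more documents and passages than will be displayed."""
    return {
        "natural_language_query": query_text,
        "count": fetch_docs,  # assumed name of the document-count parameter
        "passages": {
            "enabled": True,
            "max_per_document": fetch_passages,
            "find_answers": True,
        },
    }


def trim_results(response, shown_docs=10, shown_passages=1):
    """Keep only the top documents and top passages for display.

    Relies on the service having already ranked both lists.
    """
    trimmed = []
    for doc in response.get("results", [])[:shown_docs]:
        doc = dict(doc)  # shallow copy so the original response is untouched
        doc["document_passages"] = doc.get("document_passages", [])[:shown_passages]
        trimmed.append(doc)
    return trimmed


# Fake response with 20 documents of 3 passages each, for illustration only.
fake_response = {
    "results": [
        {"document_id": f"doc-{i}",
         "document_passages": [{"passage_text": f"p{i}-{j}"} for j in range(3)]}
        for i in range(20)
    ]
}
shown = trim_results(fake_response)
print(len(shown), len(shown[0]["document_passages"]))
```

With the defaults above, answer finding sees up to 60 passages while the end user still sees only 10 documents with 1 passage each.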

* Note: Reordering documents based on answer confidences does not occur if the per_document parameter of passage retrieval is false.  However, that parameter setting is little used, so most users can ignore this issue.




Fri January 15, 2021 10:32 AM

We expect this to work well on all the languages that Discovery supports.  At https://ai.google.com/research/tydiqa you can see the latest from one of the most prominent multi-lingual question answering benchmarks.  IBM is currently in first place by a fairly wide margin.  The model we have in the Discovery product is not the same one that is reported there, but it is quite similar and does quite well on this data too.

Fri January 15, 2021 10:19 AM

Great, thanks!

Do you expect any issues when we apply this to non English text?

Fri January 15, 2021 09:27 AM

Yes, if you set max_answers_per_passage to a value that is more than 1, you can get more than one answer in the array.  For many UIs, setting it to 1 makes sense because you want to just emphasize the one most likely answer in the passage.  However, if you have a UI where you want to show some sort of ranked list of answers or emphasize different answers to different degrees within a passage, then setting this to more than 1 can be useful.

Fri January 15, 2021 08:39 AM

interesting new capability !

Question: why is an array of answers returned when we can only (always) expect 1 ? Or can we ask for more than 1 answer ?