Watson Discovery


Part II: Pattern Induction - Using Watson Discovery to Extract Patterns?

By Bikalpa Neupane posted Wed November 10, 2021 02:50 PM

  
Part I:

In our previous blog, we briefly explained the structure of patterns and how IBM can assist you with the latest offerings in Watson Discovery. In this blog, we explain Pattern Induction and how you can use it in IBM Watson Discovery.

Pattern Induction is a human-in-the-loop system that combines the expertise of domain experts with automatic learning capabilities to quickly learn a high-quality extractor. Human experts provide a few examples and give feedback on the system's suggestions, which lets the system reach domain-specific results with high coverage and quality.

Let us walk you through a typical Pattern Induction workflow from the perspective of the user. For the sake of the example our goal is to extract revenue information from financial documents, as discussed in our earlier blog.
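To make the goal concrete, here is a minimal Python sketch of what a hand-written extractor for revenue mentions might look like. The regular expression, the function name `extract_revenue`, and the sample sentence are illustrative assumptions. They do not show what Pattern Induction learns internally, only the kind of rule it saves you from writing by hand.

```python
import re

# Hypothetical hand-written rule: the word "revenue", a short gap within the
# same sentence, then a dollar amount with a unit, e.g. "$17.6 billion".
REVENUE_PATTERN = re.compile(
    r"\brevenues?\b[^.$]{0,40}?(\$\d+(?:\.\d+)?\s(?:million|billion))",
    re.IGNORECASE,
)

def extract_revenue(text):
    """Return every dollar amount that closely follows a mention of 'revenue'."""
    return [match.group(1) for match in REVENUE_PATTERN.finditer(text)]

# Made-up sample text, used only to exercise the sketch.
sample = "Total revenue of $17.6 billion, up 0.3 percent. Cloud revenue was $6.5 billion."
print(extract_revenue(sample))
```

Writing and maintaining rules like this for every phrasing in a document collection is tedious, which is the motivation for learning them from a handful of highlighted examples instead.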

Prerequisites: Before starting, please create a Pattern Induction project by following the steps outlined in the “Try out Pattern Induction” section towards the end of this blog.

STEP 1: Highlight a few examples. Once you have completed the prerequisites, start by highlighting a few strings that belong to the pattern you want to extract (see Figure 1 below). Once you have provided enough examples (we recommend at least two for this release), the system will learn the general pattern underlying them.

Tip: We encourage you to start with two examples and wait for the system to finish learning before you give feedback on the learned results or highlight more examples.

Figure 1: User highlights a few examples
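The idea of learning a general pattern from two highlighted examples can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the token shape classes (`<MONEY>`, `<NUMBER>`, `<ANY>`), the function names, and the generalization rule itself. The actual learning algorithm in Watson Discovery is not described in this blog and is certainly more sophisticated.

```python
import re

def shape(token):
    """Map a token to a coarse shape class (a hypothetical, simplified scheme)."""
    if re.fullmatch(r"\$\d+(\.\d+)?", token):
        return "<MONEY>"
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return "<NUMBER>"
    return token.lower()  # every other token is kept as a lowercase literal

def generalize(examples):
    """Keep tokens that all examples share; abstract differing tokens to a shape."""
    token_lists = [example.split() for example in examples]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("this toy sketch only handles examples of equal token length")
    pattern = []
    for column in zip(*token_lists):
        literals = {token.lower() for token in column}
        shapes = {shape(token) for token in column}
        if len(literals) == 1:
            pattern.append(literals.pop())   # identical in every example
        elif len(shapes) == 1:
            pattern.append(shapes.pop())     # different text, same shape
        else:
            pattern.append("<ANY>")          # nothing in common
    return pattern

def matches(pattern, candidate):
    """Check whether a candidate string fits the generalized pattern."""
    tokens = candidate.split()
    if len(tokens) != len(pattern):
        return False
    return all(p == "<ANY>" or p == shape(tok) for p, tok in zip(pattern, tokens))

# Two made-up highlights, mirroring the "at least two examples" recommendation.
learned = generalize(["revenue of $17.6 billion", "revenue of $6.5 billion"])
print(learned)
print(matches(learned, "Revenue of $4.2 billion"))
```

With only two examples, a learner like this cannot know whether "billion" is essential or incidental, which is why the next step asks you to confirm or reject further suggestions.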

STEP 2: Inspect the extractions found by the model and reply to the system’s suggestions. Once the system has processed the highlighted examples and learned a first version of the extractor, it updates the screen with two types of information (see Figure 2 below). First, it highlights in green all pieces of text predicted by the currently learned extractor for you to inspect. Second, it presents a list of yes/no questions for you to answer, which help it understand your intent and correct any wrong extractions.

Tip: We encourage you to answer as many questions as possible (ideally all), as these questions have been strategically chosen by the system to help differentiate between potential patterns that you may want to extract.

Figure 2: System returns a few suggestions for the user to verify
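To see why answering the yes/no questions matters, consider this toy Python sketch of how a question might be chosen to differentiate between candidate patterns. The candidate regular expressions, the scoring rule, and the function names are all hypothetical. The blog only states that questions are chosen strategically, not how.

```python
import re

def pick_question(candidates, strings):
    """Pick the string on which the candidate patterns disagree the most."""
    def split_score(text):
        votes = sum(1 for pattern in candidates if re.fullmatch(pattern, text))
        return abs(votes - len(candidates) / 2)  # 0 means an even yes/no split
    return min(strings, key=split_score)

def apply_answer(candidates, text, is_wanted):
    """Keep only the candidates whose prediction agrees with the user's answer."""
    return [p for p in candidates if bool(re.fullmatch(p, text)) == is_wanted]

# Three hypothetical patterns that all fit the highlight "$17.6 billion".
candidates = [
    r"\$\d+(\.\d+)? (million|billion)",  # any dollar amount with a unit
    r"\$\d+(\.\d+)? billion",            # billions only
    r"\$\d+\.\d+ billion",               # billions with a decimal part only
]
# Made-up strings found in the documents that could be shown as questions.
strings = ["$17.6 billion", "$3 billion", "$250 million", "12 percent"]

first_question = pick_question(candidates, strings)
remaining = apply_answer(candidates, first_question, is_wanted=True)
second_question = pick_question(remaining, strings)
final = apply_answer(remaining, second_question, is_wanted=False)
print(first_question, second_question, final)
```

In this sketch, each answer eliminates at least one candidate, so a few answers are enough to settle on a single pattern. Strings on which every candidate agrees, such as "12 percent", carry no information and are never asked.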

STEP 3: Wait for the system to finish learning. Once the system learns an accurate extractor (composed of a small number of patterns), it will inform you accordingly.

Figure 3: Backend algorithm informs that an accurate algorithm has been learned

STEP 4: Review extracted examples. To ensure the accuracy of the extractions, you can click on the “Review examples” pane and inspect the list of extracted examples. If you identify any mistakes or missing extractions, you can provide additional examples and/or feedback by repeating steps 1–3 above.

Figure 4: User reviews extracted patterns

STEP 5: Save your pattern. If everything looks correct, you can proceed to the final stage of the process: saving the learned patterns for future use. Type a name for your pattern in the top left corner and then click the “Save pattern” button in the top right corner.

Supplementary section: Try out Pattern Induction

Follow these easy steps to try Pattern Induction:

  1. Create an IBM account and set up a Watson Discovery project as described below:

Sign up for an IBM account, then navigate to your cloud dashboard: https://cloud.ibm.com. Click the “Create a resource” button in the top right corner of the screen.

Figure 5: Homepage of your cloud account

Search for “Watson Discovery” on your left, and click on the service titled “Watson Discovery”. Select a plan suitable for you, for example Plus or Premium.

Figure 6: Creating a Watson Discovery Service

After creating the service, navigate to https://cloud.ibm.com/resources. Here, you can view the recently created service, as shown below. Click on your “Watson Discovery” service and click on the button “Launch Watson Discovery”. This will redirect you to the service where you can create a project for your extraction task.

Figure 7: List of resources. Note the “Watson Discovery” services under the section “Services and software”

To create a project, provide a project name, select “Document Retrieval” as project type (see Figure 8), and click “Next”. Complete the steps to upload your dataset.

Figure 8: Select “Document Retrieval” as your project type

  2. To follow along, download one of the following datasets:

  • From the demo, you may try to extract revenues and cash flows from the IBM Press Release Dataset. Click here.
  • Challenge yourself with the FBI press release dataset and extract percentages of crimes of different types. Click here.

After the data upload is complete, navigate to the “Improve and Customize” screen, where you can access Pattern Induction by clicking on “Patterns” under “Teach domain concepts” (see Figure 9).

Figure 9: Accessing Pattern Induction

Click on “Create” to create a new pattern, select documents to create patterns from (or let the system randomly select documents out of your document collection), and then hit “Next” (see Figure 10). This will navigate to Pattern Induction, where you may start creating patterns.

Figure 10: Select documents to create patterns with

In the next blog, we will guide you through best practices and through extractors for use cases beyond finance. Read here.

  • Language Support: The current version of Pattern Induction is available only in English. We are working on extending it to other languages.

Authors: Dr Maeda Hanafi, Dr Yannis Katsis, Dr Yunyao Li, Dr Bikalpa Neupane

