Content Management and Capture


IBM Datacap Accelerator Integrations - Part 1

By Sunanda Rao posted Thu March 18, 2021 01:53 PM

Today, many business operations still rely on manually extracting information from documents. Manual extraction not only slows down processing but is also expensive, time consuming at scale, and error prone. Many of these document-driven processes can be automated to drive business process workflows.

One approach to extracting information from business documents is rule-based. Datacap is a capture tool that extracts features from documents and produces structured output in the form of key-value pairs (KVPs). Intelligent Extractor (IE), also known as Accelerator, adds an additional set of action libraries and provides intelligent extraction. The extraction can be customized to meet specific requirements using the Datacap rules engine.

In addition to the set of action libraries that comes with the Datacap tool, one can create a custom action library to carry out specific functions.

Deploying a custom action library

The custom action library is made available as a DLL. Steps to deploy a custom action library:

  1. Create a new project in the Visual Studio IDE using the custom action template.
  2. Write the custom action in C# on the .NET Framework.
  3. Compile the project (build the solution) to create a DLL.
  4. Place the DLL in the RULES folder of the Datacap application; it then becomes available in the action library section of Datacap Studio, as shown below.
[Image: Action library section in Datacap Studio]

These action libraries (from the action library section in Datacap Studio) are added to rulesets and associated at different levels of the document hierarchy.

Digitization in Datacap

  • Document processing in Datacap is triggered either when documents are placed in the input folder or when documents are scanned and sent to Datacap. In the VScan step, these input documents are converted to image files after certain rules are executed.
  • In the page identification step, the image files are first cleaned up and enhanced, then processed by the OCR engine. The OCR engine detects the textual parts of the image, recognizes them, and extracts the text.
  • In the recognition step, Datacap analyses the page layout, builds the data model file, and identifies the document type for each page.
  • In the extraction step, IE extracts the field values from each page based on its document type.
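The flow above can be pictured as a simple pipeline. The sketch below is purely conceptual: in Datacap these stages are configured as rulesets and actions, not Python functions, and every return value here is a placeholder.

```python
# Conceptual sketch of the Datacap digitization flow described above.
# Illustrative only: stage names mirror the text; outputs are placeholders.

def vscan(input_path):
    """Convert an input document into page image files."""
    return [f"{input_path}#page{i}" for i in (1, 2)]  # placeholder page images

def page_identification(page_image):
    """Clean up and enhance the image, then run OCR to get its text."""
    return f"text recognized from {page_image}"  # placeholder OCR output

def recognition(page_text):
    """Analyse the layout and identify the document type for the page."""
    return "invoice"  # placeholder document type

def extraction(page_text, doc_type):
    """Extract field values for this document type as key-value pairs."""
    return {"doc_type": doc_type, "source_text": page_text}

def process(input_path):
    """Run each page through the four stages in order."""
    results = []
    for page in vscan(input_path):
        text = page_identification(page)
        doc_type = recognition(text)
        results.append(extraction(text, doc_type))
    return results
```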

Ideally, the extracted field values are then sent to the NVerify step for manual verification.

At times, there can be digitization issues such as incorrect text detection. Sending documents with incorrectly detected text for manual verification leads to unnecessary processing and increases the overall processing time. To address digitization issues, it is necessary to validate the digitized data before pushing it for manual verification. Hence, a pre-validation custom action library was built and added before the NVerify step in the workflow.

The pre-validation custom action validates the digitized data by invoking the ODM service, which executes the business rules.

Integration with IBM Operational Decision Manager (ODM)

To decide whether the digitized data from Datacap is valid, Datacap is integrated with Operational Decision Manager (ODM).

Operational Decision Manager is a rule management system that lets you define business rules. Any validation to be performed on the digitized data needs a custom implementation of the rules. Each rule defined is then deployed to the target server.
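To make the idea of such validation rules concrete, here are two examples of the kind of checks a deployed ruleset might encode. This is only an illustration: real ODM rules are authored in ODM's own rule language, not Python, and the field formats below are assumptions.

```python
# Illustrative examples of business checks that validation rules might encode.
# Real ODM rules are written in ODM's rule language; formats here are assumed.

from datetime import datetime

def amount_is_valid(amount_text):
    """An extracted amount must parse as a non-negative number."""
    try:
        return float(amount_text.replace(",", "")) >= 0
    except ValueError:
        return False

def date_is_valid(date_text):
    """An extracted date must match the expected format (assumed YYYY-MM-DD)."""
    try:
        datetime.strptime(date_text, "%Y-%m-%d")
        return True
    except ValueError:
        return False
```

A document whose extracted values fail checks like these would be flagged before it ever reaches an operator.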

The ODM REST service is invoked from Datacap (by the pre-validation action library) with a payload. The payload contains the fields of a given document type whose values need to be validated against the business rules. Depending on the response received from the ODM service, the document is either rejected or considered for further processing, making the workflow more efficient.
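As a rough sketch of this exchange, the snippet below builds a payload from extracted fields and routes the document based on the service's answer. The field names, payload shape, and response shape are hypothetical, and in Datacap this logic would live in the C# .NET custom action, with the payload POSTed to the ODM REST service rather than simulated.

```python
# Sketch of the pre-validation exchange with ODM. Payload and response
# shapes are hypothetical; the real action is implemented in C# .NET.

import json

def build_payload(doc_type, fields):
    """Package the extracted fields for the rule-execution request."""
    return json.dumps({"documentType": doc_type, "fields": fields})

def route_document(odm_response):
    """Decide the next workflow step from the (assumed) ODM response."""
    if odm_response.get("valid"):
        return "NVerify"  # passed the business rules: manual verification
    return "Reject"       # failed validation: stop downstream processing

payload = build_payload("invoice", {"invoiceNumber": "INV-001",
                                    "total": "1250.00"})
# The payload would be POSTed to the ODM REST service here;
# for illustration we act on a simulated response instead:
decision = route_document({"valid": True})
```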

Documents for which the extraction is incorrect, or whose values were extracted correctly but failed the business checks, are not sent to the NVerify step in the workflow. As a result, the operator is prompted to investigate only those documents whose data extraction is valid and which have passed the business rules. Only these documents are sent for manual verification, reducing the overall processing time.

Additionally, the custom action library eliminates unnecessary downstream processing.

Likewise, you can leverage this capability of creating custom actions to integrate IBM Datacap with other applications and meet your business requirements.

Do consider looking into part 2 for another IBM Datacap Accelerator integration.

Check this link to know more about IBM ODM -