Don't Touch That Document: How AI-Powered Document Classification Can Automate Document Processing

By David Jenness, posted Fri July 31, 2020 02:36 PM


Classification is the human need to impose order on nature and find hidden relationships. Only by grouping organisms and species under an organized taxonomy do we begin to understand how the world is ordered. The same rules apply to managing content such as documents, records, images and rich media. Only through classification can huge volumes of data be stored and retrieved quickly.

The problem is that, until recently, this was a manual and tedious process. When organizations began adopting digital capture and content management systems, they had no way to identify documents automatically, so teams of "indexers" viewed each document image one by one to identify and classify the type of document and manually add the metadata required for search. For organizations receiving thousands of documents a day, this process was a serious bottleneck.

With the advent of Machine Learning (ML), which can identify patterns in a document (header, footer, layout, etc.), combined with Natural Language Processing (NLP), which can "read" a document the way a human does, the possibilities became enticing. Automated classification promised to streamline the document ingestion process. For large companies that add hundreds of thousands and sometimes millions of documents a day, the cost and time savings are too large to ignore.

In 2018, IBM introduced Automation Content Analyzer, which applies both ML and NLP, and quickly began developing its classification capabilities. To show you what it can do, IBM Offering Manager Mira Kim demonstrates how it works in this short video from the "Art of Automation" series.

But how does it work in the real world? Recently, Yulin Yin, lead technical engineer at a large Midwestern county government, used Automation Content Analyzer to build an AI-powered application that receives and classifies documents without relying on human indexers. In this short demonstration excerpted from an AIIM-IBM Virtual Event, Yulin shows how documents can be received in two ways, by e-mail or through a simple online portal, and how Automation Content Analyzer extracts data and applies classification.