Build with Watson Apps

Processing First Pages of Large Documents with Watson Discovery Plus t

  • 1.  Processing First Pages of Large Documents with Watson Discovery Plus t

    Posted Mon July 26, 2021 01:01 PM

    We are using Watson Discovery Plus to extract information from PDF files.

    Some of those PDFs can have a large number of pages.

    The information we are looking for is in the first 2 or 3 pages.

    Is there a way to tell Discovery to process only the first X pages?



    ------------------------------
    Alan Monier
    ------------------------------


  • 2.  RE: Processing First Pages of Large Documents with Watson Discovery Plus t

    User Group Leader
    Posted Tue July 27, 2021 09:58 AM
    Alan,

      If you submit these files as full documents, then the entire document will be processed.  You may want to look into tools or services that "split" PDF files, so that you can preprocess the files and upload only the pages you need.  There are a few PDF libraries for Python (such as PyPDF2), and I have found a few good blog posts on handling PDFs in Python.
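
      As a rough illustration of that preprocessing step, here is a minimal sketch that copies the first few pages of a PDF into a new, smaller file before it is uploaded. It assumes PyPDF2 2.0 or later (older releases use the PdfFileReader / PdfFileWriter names instead). The function names, the file names and the default of 3 pages are hypothetical, chosen only for this example.

```python
from pathlib import Path


def pages_to_keep(total_pages: int, max_pages: int) -> range:
    """Return the zero-based indices of the leading pages to keep."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    # Never ask for more pages than the document actually has.
    return range(min(total_pages, max_pages))


def write_first_pages(src: Path, dst: Path, max_pages: int = 3) -> int:
    """Copy the first max_pages pages of src into a new PDF at dst.

    Returns the number of pages written.
    """
    # Imported here so pages_to_keep() works even without PyPDF2 installed.
    # Assumes PyPDF2 >= 2.0, which provides PdfReader and PdfWriter.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(str(src))
    writer = PdfWriter()

    keep = pages_to_keep(len(reader.pages), max_pages)
    for index in keep:
        writer.add_page(reader.pages[index])

    with open(dst, "wb") as out_file:
        writer.write(out_file)
    return len(keep)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    written = write_first_pages(Path("contract.pdf"), Path("contract_first3.pdf"))
    print(f"Wrote {written} page(s)")
```

      The trimmed file (contract_first3.pdf in this sketch) is then what you would upload to Discovery in place of the original, so only those pages are processed.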

    ------------------------------
    Daniel Toczala
    Community Leader and Customer Success Manager - Watson
    dtoczala@us.ibm.com
    ------------------------------