Alan,
If you upload these PDFs as full documents, Discovery will process each document in its entirety. You may want to look into tools or services that split PDF files, so that you can preprocess the files and upload only the pages you need. There are a few PDF libraries for Python (such as PyPDF2), and I have found a few good blog posts on handling PDFs in Python.
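As a rough sketch of that preprocessing step, something like the following could trim each PDF to its leading pages before upload. It assumes a recent PyPDF2 release (2.x or later, which provides `PdfReader` and `PdfWriter`). The function names `pages_to_keep` and `extract_first_pages` are made up for illustration and are not part of PyPDF2 or Discovery.

```python
def pages_to_keep(total_pages: int, max_pages: int) -> range:
    """Return the zero-based indices of the leading pages to keep."""
    if max_pages < 0:
        raise ValueError("max_pages must be non-negative")
    # Never ask for more pages than the document actually has.
    return range(min(total_pages, max_pages))


def extract_first_pages(src: str, dst: str, max_pages: int = 3) -> int:
    """Copy the first `max_pages` pages of `src` into `dst`.

    Returns the number of pages written. Hypothetical helper, assuming
    PyPDF2 2.x or later is installed.
    """
    # Imported lazily so pages_to_keep() stays usable without PyPDF2.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()

    indices = pages_to_keep(len(reader.pages), max_pages)
    for i in indices:
        writer.add_page(reader.pages[i])

    with open(dst, "wb") as out:
        writer.write(out)
    return len(indices)
```

For example, `extract_first_pages("report.pdf", "report_first3.pdf", 3)` (file names are placeholders) would write a shorter file that you upload to Discovery in place of the original.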
------------------------------
Daniel Toczala
Community Leader and Customer Success Manager - Watson
dtoczala@us.ibm.com
------------------------------
Original Message:
Sent: Fri July 23, 2021 12:27 AM
From: Alan Monier
Subject: Processing First Pages of Large Documents with Watson Discovery Plus t
We are using Watson Discovery Plus to extract information from PDF files.
Some of those PDFs have a large number of pages.
The information we are looking for is in the first 2 or 3 pages.
Is there a way to tell Discovery to process only the first X pages?
------------------------------
Alan Monier
------------------------------
#BuildwithWatsonApps
#EmbeddableAI