Processing First Pages of Large Documents with Watson Discovery Plus t

1. Processing First Pages of Large Documents with Watson Discovery Plus t

Alan Monier
Posted Mon July 26, 2021 01:01 PM

We are using Watson Discovery Plus to extract information from PDF files.

Some of those PDFs have a large number of pages.

The information we are looking for is in the first 2 or 3 pages.

Is there a way to tell Discovery to process only the first X pages?

------------------------------
Alan Monier
------------------------------

#BuildwithWatsonApps
#EmbeddableAI
2. RE: Processing First Pages of Large Documents with Watson Discovery Plus t

Daniel Toczala
Posted Tue July 27, 2021 09:58 AM

Alan,

If you are processing these docs as full documents, then the entire document will be processed. You may want to look into tools or services that split PDF files, so that you can preprocess these files before they reach Discovery. There are a few PDF libraries for Python (such as PyPDF2), and I have found a few good blog posts on handling PDFs in Python.
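As a rough illustration of that preprocessing step, here is a minimal sketch that copies only the leading pages of a PDF into a new file, which could then be uploaded in place of the original. It assumes the pypdf package (the successor of PyPDF2) or PyPDF2 3.x is installed; the function names and the default of 3 pages are made up for this example and are not part of any Watson Discovery API.

```python
from pathlib import Path


def pages_to_keep(total_pages: int, max_pages: int) -> range:
    """Return the zero-based indices of the leading pages to keep."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    # A document shorter than max_pages is kept whole.
    return range(min(total_pages, max_pages))


def write_first_pages(src: Path, dst: Path, max_pages: int = 3) -> int:
    """Copy the first `max_pages` pages of `src` into a new PDF at `dst`.

    Returns the number of pages written. Hypothetical helper for this sketch.
    """
    # Imported lazily so pages_to_keep stays usable without a PDF library.
    # Assumption: pypdf or PyPDF2 3.x, which expose the same class names.
    try:
        from pypdf import PdfReader, PdfWriter
    except ImportError:
        from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(str(src))
    writer = PdfWriter()
    indices = pages_to_keep(len(reader.pages), max_pages)
    for index in indices:
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as handle:
        writer.write(handle)
    return len(indices)
```

Calling `write_first_pages(Path("contract.pdf"), Path("contract_head.pdf"), max_pages=3)` (file names are placeholders) would produce a trimmed copy; the trimmed file, not the original, is what you would then send to Discovery. Encrypted or damaged PDFs are not handled here and would need extra care.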

------------------------------
Daniel Toczala
Community Leader and Customer Success Manager - Watson
dtoczala@us.ibm.com
------------------------------
