Image PDF to Searchable PDF

1. Image PDF to Searchable PDF

Chandrashekar A
Posted Thu October 15, 2020 12:16 PM

Team,

We have a requirement to convert non-searchable PDFs (in FileNet) to searchable PDFs and store them back in the same repository (FileNet).
Can you please suggest best practices and approaches for doing this?

Can we import PDFs directly from FileNet using Datacap CMIS ingest?

How can we store the converted document in the same location, as the latest version, in the FileNet repository?

We have 100M documents to process; kindly suggest the best hardware configuration.

We are not going to extract any fields, just convert non-searchable PDFs to searchable PDFs and place them in the same location.

------------------------------
Thanks,
Chandra.
------------------------------
2. RE: Image PDF to Searchable PDF

IBM TechXchange Speaker

TIM PASCARELLA
Posted Thu October 15, 2020 04:50 PM

Hi Chandra,

To get you started: for large sets of documents you will likely want to use FileNet bulk sweep jobs to export the relevant documents from the repository, since they run on FileNet background cycles. There is a Knowledge Center topic for Datacap 9.1.7 on how a FileNet bulk sweep can be configured for export and consumption by Datacap: https://www.ibm.com/support/knowledgecenter/SSZRWV_9.1.7/com.ibm.dc.develop.doc/dcdev629.htm

Once you have processed the documents in Datacap, you will want to use the FNP8_UpdateContent action (I believe the example in the above link shows how to update the metadata properties but not how to version the content itself). Here is a link to the relevant documentation: https://www.ibm.com/support/knowledgecenter/SSZRWV_9.1.7/com.ibm.dc.reference.doc/dcaca938.htm

Regarding hardware configuration: if you have available cycles, the bulk sweep will utilize background cycles and can be scheduled in off-peak hours so as not to affect FileNet responsiveness. You will have to take into account the duration over which you will OCR the documents with Datacap (OCR is one of the most CPU-intensive operations) and how much new-ingestion headroom you have for FileNet, as the upload will occur over the FileNet WSI web service.
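
As a rough way to reason about the OCR duration, here is a back-of-the-envelope sizing sketch in Python. Every input is an assumption you would need to replace with measurements from a pilot batch: the pages per document, the OCR seconds per page, the number of parallel OCR workers, and the hours per day you can run are all placeholders, not Datacap benchmarks. Only the 100M document count comes from the question.

```python
import math


def estimate_ocr_days(total_docs, pages_per_doc, seconds_per_page,
                      ocr_workers, hours_per_day):
    """Estimate calendar days needed to OCR the whole backlog.

    Assumes the work spreads evenly over `ocr_workers` parallel workers,
    each busy for `hours_per_day` hours every day.
    """
    total_ocr_seconds = total_docs * pages_per_doc * seconds_per_page
    capacity_per_day = ocr_workers * hours_per_day * 3600
    return total_ocr_seconds / capacity_per_day


def workers_needed(total_docs, pages_per_doc, seconds_per_page,
                   target_days, hours_per_day):
    """Estimate how many parallel OCR workers finish within `target_days`."""
    total_ocr_seconds = total_docs * pages_per_doc * seconds_per_page
    capacity_per_worker = target_days * hours_per_day * 3600
    return math.ceil(total_ocr_seconds / capacity_per_worker)


if __name__ == "__main__":
    # Hypothetical inputs: measure these on a sample before sizing hardware.
    docs = 100_000_000       # from the original question
    pages = 5                # assumed average pages per document
    sec_per_page = 2.0       # assumed OCR time per page on one worker
    off_peak_hours = 12      # assumed daily processing window

    days = estimate_ocr_days(docs, pages, sec_per_page, 32, off_peak_hours)
    print(f"32 workers: about {days:.0f} days")

    needed = workers_needed(docs, pages, sec_per_page, 365, off_peak_hours)
    print(f"Workers to finish in one year: {needed}")
```

This only covers OCR compute. It ignores the bulk sweep export time and the FileNet upload throughput over WSI, either of which could turn out to be the real bottleneck.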

I hope this helps you get started.

Thanks,
Tim

------------------------------
TIM PASCARELLA
------------------------------
