Content Management and Capture


Image PDF to Searchable PDF

  • 1.  Image PDF to Searchable PDF

    Posted Thu October 15, 2020 12:16 PM
    Team,

    We have a requirement to convert non-searchable (image-only) PDFs stored in FileNet to searchable PDFs and store them back in the same repository (FileNet).
    Can you please suggest best practices and approaches for doing this?

    1. Can we import PDFs directly from FileNet using Datacap CMIS ingest?
    2. How can we place the converted document back in the same location as the latest version in the FileNet repository?
    3. We have 100M documents to process; kindly suggest the best hardware configuration.
    4. We are not going to extract any fields; we only need to convert non-searchable PDFs to searchable PDFs and place them in the same location.



    ------------------------------
    Thanks,
    Chandra.
    ------------------------------


  • 2.  RE: Image PDF to Searchable PDF

    Posted Thu October 15, 2020 04:50 PM
    Hi Chandra,

    To get you started: for large sets of documents, you will likely want to use FileNet bulk sweep jobs to export the relevant documents from the repository using FileNet background cycles. There is a Knowledge Center topic for Datacap 9.1.7 on how a FileNet bulk sweep can be configured for export and consumption by Datacap: https://www.ibm.com/support/knowledgecenter/SSZRWV_9.1.7/com.ibm.dc.develop.doc/dcdev629.htm

    Once you have processed the documents in Datacap, you will want to use the FNP8_UpdateContent action to check the new content in as a new version (I believe the example in the above link shows how to update the metadata properties but not how to version the content itself). Here is a link to the relevant documentation: https://www.ibm.com/support/knowledgecenter/SSZRWV_9.1.7/com.ibm.dc.reference.doc/dcaca938.htm

    Regarding hardware configuration: if you have available cycles, the bulk sweep will utilize background cycles and can be scheduled in off-peak hours so as not to affect FileNet responsiveness. You will have to take into account the duration over which you will OCR the documents with Datacap (OCR is one of the most CPU-intensive operations) and how much new-ingestion headroom you have in FileNet, as the upload will occur over the FileNet WSI web service.
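
    As a rough illustration of the sizing trade-off above, here is a back-of-the-envelope sketch in Python. It is not a Datacap or FileNet tool, and every input except the 100M document count is an assumption you would need to replace with your own measurements: the pages per document, the OCR seconds per page, and the processing window are all hypothetical placeholders. It estimates how many CPU cores would be kept busy by OCR and what sustained upload rate FileNet would need to absorb.

```python
import math


def ocr_cores_needed(total_docs, pages_per_doc, seconds_per_page,
                     window_days, hours_per_day):
    """Estimate the CPU cores needed to OCR a backlog within a time window.

    Assumes OCR is CPU-bound and scales linearly across cores, which is
    an idealization; measure a pilot batch before trusting the result.
    """
    total_cpu_seconds = total_docs * pages_per_doc * seconds_per_page
    seconds_per_core = window_days * hours_per_day * 3600
    return math.ceil(total_cpu_seconds / seconds_per_core)


def required_upload_rate(total_docs, window_days, hours_per_day):
    """Documents per second that must be checked back in to FileNet."""
    return total_docs / (window_days * hours_per_day * 3600)


if __name__ == "__main__":
    # Hypothetical inputs: 5 pages per document, 1 CPU-second per page,
    # processed over one year in an 8-hour off-peak window each day.
    cores = ocr_cores_needed(100_000_000, 5, 1.0, 365, 8)
    rate = required_upload_rate(100_000_000, 365, 8)
    print(f"OCR cores needed: {cores}")
    print(f"Sustained upload rate: {rate:.2f} docs/sec")
```

    Running a small pilot batch first and plugging the measured seconds per page into the function should give a far more reliable figure than any generic hardware recommendation.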

    I hope this helps you get started.

    Thanks,
    Tim

    ------------------------------
    TIM PASCARELLA
    ------------------------------