Content Management and Capture


COLD document compression

  • 1.  COLD document compression

    Posted Tue May 10, 2022 11:00 AM
    I have an Image Services system that holds some COLD documents with Enable Page Compression checked. I want to know how FileNet compresses COLD document content. I tried zlib and gzip and was not able to decompress the content. This must be a very difficult question to answer, and I do not know if there is anyone out there who can shed any light on it.
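
    A quick way to confirm that a page is neither gzip nor zlib before trying to inflate it is to look at the first two bytes. The sketch below is only an illustration of that check: the function name `sniff_compression` is made up for this example, and it says nothing about the actual COLD page format.

```python
import gzip
import zlib


def sniff_compression(data: bytes) -> str:
    """Guess the container format from the leading bytes of a blob."""
    # gzip members (RFC 1952) always start with the magic bytes 1f 8b.
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    # zlib streams (RFC 1950): the low nibble of the first byte is 8 (deflate)
    # and the first two bytes, read as a big-endian integer, are a multiple of 31.
    if len(data) >= 2 and (data[0] & 0x0F) == 8 and ((data[0] << 8) | data[1]) % 31 == 0:
        return "zlib"
    return "unknown"


if __name__ == "__main__":
    print(sniff_compression(gzip.compress(b"sample page")))
    print(sniff_compression(zlib.compress(b"sample page")))
    print(sniff_compression(b"\x00\x01\x02\x03"))
```

    If the bytes read from the COLD page come back as "unknown", standard gzip/zlib tools will reject them, which matches the "not a GZip" style of error.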

    ------------------------------
    Kevin Sheng
    ------------------------------


  • 2.  RE: COLD document compression

    IBM Champion
    Posted Wed May 11, 2022 04:42 AM
    Hi,

    I can save you a lot of time... the compression is proprietary and none of the standard tools will work (as you noticed). Only 3 tools can decompress COLD pages:

    1. IDM Desktop (or one of its libraries)
    2. Image Services Toolkit
    3. Daeja Viewer

    We converted more than 100 million COLD documents to PDF, and we downloaded the pages using several IDM Desktops in parallel. You could try to decompile the Daeja Applet, but IBM frowns on this endeavor because it violates the licensing agreement.

    BUT: once you have the decompressed pages, they will be in P-code (the specs date back to 1987, the last millennium!). What are you going to do with the P-code-encoded pages?

    Hope this helps,

    /Gerold

    ------------------------------
    Gerold Krommer
    ------------------------------



  • 3.  RE: COLD document compression

    Posted Wed May 11, 2022 04:41 PM
    Thank you, Gerold. You made a very good point. As a matter of fact, I did not know I would need to decode the pages even after successful decompression. This makes things even harder, doesn't it?

    I did try decompiling the Daeja Viewer code to search for a clue, but without success. There is a zip package that uses GZip, but when I tried GZip on the compressed bytes I got an error message: not a GZip. I know I am asking for a lot of help, so my guess is that, as you said, it is proprietary compression that no one can get to.

    I am thinking of using ISTK to write some C code, but I cannot find any example of using ISTK. Everything turns into searching for a needle in a haystack.

    I may have to go back to using IDM Desktop even though it requires an IS connection, but unfortunately it will be a hellishly slow process.

    If you can think of anything I should try, I would really appreciate your guidance.


    ------------------------------
    Kevin Sheng
    ------------------------------



  • 4.  RE: COLD document compression

    IBM Champion
    Posted Wed May 11, 2022 05:06 PM
    Kevin,

    It would probably help if you told us your requirements :-)... it makes advising much more accurate....

    When you install the ISTK there will be a bunch of C samples in a subdirectory, but who wants to do the ** pointer-to-pointer C magic nowadays... brrr!

    I have written some programs that actually PRODUCED P-code documents for IS (using ISTK, by the way), so it could well be that I'm the only living veteran who speaks P-code (hurry up, I'm 59). We converted our last customer from IS to P8 in 01/21... (the one with more than 100 million COLD documents)

    Feel free to ask more,

    /Gerold

    ------------------------------
    Gerold Krommer
    ------------------------------



  • 5.  RE: COLD document compression

    Posted Mon February 26, 2024 12:33 PM

    Hi Gerold,

    We need to migrate the COLD documents from Image Services to FileNet P8. Can you tell me how it can be done?



    ------------------------------
    Saranya V
    ------------------------------



  • 6.  RE: COLD document compression

    Posted Mon February 26, 2024 12:33 PM

    Hi Gerold,

    We need to migrate 300M COLD documents from IS to P8. Can you explain how to do it using ISTK?



    ------------------------------
    Saranya V
    ------------------------------



  • 7.  RE: COLD document compression

    Posted Wed February 28, 2024 05:34 AM

    Hello, 

    If you ask me, that would be a perfect RPA use case: use several robots to call the IDM Desktop print function to generate PDFs from the COLD documents.

    Olivier BALTUS



    ------------------------------
    Olivier Baltus
    NSI Luxembourg
    ------------------------------



  • 8.  RE: COLD document compression

    IBM Champion
    Posted Wed February 28, 2024 05:55 AM

    Agreed, it is the simplest way to go, disregarding TOCs and documents > 1000 pages, but...

    At 1 document/sec (an educated guess) and NO interruptions, it takes about 3,500 days if my math serves me right. Even if you can run 10 parallel threads, it is still about a year.
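
    The estimate above can be reproduced with a few lines of arithmetic. This sketch assumes the 300M documents mentioned earlier in the thread, a steady 1 document per second per worker, and perfect scaling across workers; the function name `migration_days` is invented for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def migration_days(documents: int, docs_per_second: float, workers: int = 1) -> float:
    """Days needed to process all documents with no interruptions."""
    # Assumes throughput scales linearly with the number of workers.
    return documents / (docs_per_second * workers) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(round(migration_days(300_000_000, 1.0)))      # 3472
    print(round(migration_days(300_000_000, 1.0, 10)))  # 347
```

    Any downtime, retries, or slower retrieval from optical storage pushes the real figure higher.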

    Kind regards,

    /Gerold



    ------------------------------
    Gerold Krommer
    ------------------------------



  • 9.  RE: COLD document compression

    IBM Champion
    Posted Wed February 28, 2024 05:52 AM

    Hi Saranya,

    this is impossible to describe in detail here.

    In essence you write an ISTK program (in C that is) that retrieves the COLD pages, decompresses them (so far so documented) and THEN you need to interpret the returned P-Code to lay out your text on the page. P-Code is proprietary to IBM (but I happen to have a manual from ANCIENT FileNet times).

    You can imagine P-Code as marked-up text (if you 'cat' such a page you can recognize it). The good news is that if you have only used the native FileNet COLD software, only a limited subset of P-Code commands is used. Some more work is ahead if background images are used.

    It is doable, and we did it for (I think) 140 million documents over the course of a few months while production was running. We did NOT use ISTK though; we used Java and interpreted the P-Code. We also replaced the FileNet COLD with a custom-written COLD (host output to PDF), which has worked fine for years.

    Do not forget the added complexity caused by the TOCs generated by FileNet COLD and (if applicable to your situation) the maximum of 1000 pages per document (imposed by IS). We had many such documents and had to add special logic to combine them.

    So you see, it is quite a massive undertaking requiring knowledge of IS, P-Code, and FileNet Content Manager, especially if you are doing it for the first time...

    Good luck,

    /Gerold



    ------------------------------
    Gerold Krommer
    ------------------------------



  • 10.  RE: COLD document compression

    IBM Champion
    Posted Wed February 28, 2024 05:58 AM

    After checking our code, I stand corrected: we controlled IDM Desktop to generate the PDFs (we did not use Java) and used CEBI to archive them to CPE.



    ------------------------------
    Gerold Krommer
    ------------------------------



  • 11.  RE: COLD document compression

    Posted Wed February 28, 2024 09:55 AM

    Good morning,

    10 years ago, I was faced with the same request. One of our clients asked us to export millions of documents stored in an IMS library. I ended up building "a nuclear power plant" which consisted of a Java application to orchestrate the migration flow, a custom Image Services Java connector based on the Image Services Resource Adapter to collect image metadata, and a custom PDF converter based on a COM bridge that triggered printing via the IDM ActiveX objects and a PDF printer driver to render each document as PDF. Overall process performance was acceptable. The key point here is to parallelize the tasks and, if the store is OSAR, to select the documents based on their families (disks) in order to avoid long access times.
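
    The batching idea can be sketched roughly as follows. Everything named here is hypothetical: `family_id` stands in for whatever disk or family identifier your metadata export provides, and `convert` is a placeholder for the real retrieval and PDF rendering step, which is not shown.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_family(docs):
    """Group (doc_id, family_id) pairs so each family is handled as one batch."""
    groups = defaultdict(list)
    for doc_id, family_id in docs:
        groups[family_id].append(doc_id)
    return dict(groups)


def convert_batch(doc_ids, convert):
    """Convert one family's documents sequentially to keep disk access local."""
    return [convert(doc_id) for doc_id in doc_ids]


def migrate(docs, convert, max_workers=4):
    """Run one batch per family in parallel and collect results by family."""
    groups = group_by_family(docs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            family: pool.submit(convert_batch, ids, convert)
            for family, ids in groups.items()
        }
        return {family: future.result() for family, future in futures.items()}


if __name__ == "__main__":
    sample = [(1, "A"), (2, "B"), (3, "A")]
    # The lambda is a stand-in for the real conversion call.
    print(migrate(sample, lambda doc_id: f"{doc_id}.pdf"))
```

    Keeping each family inside a single worker means documents on the same disk are read together, while different families proceed in parallel.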

    Like I said, if I had to do the same thing again, I would go the RPA route.

    Olivier BALTUS



    ------------------------------
    Olivier Baltus
    NSI Luxembourg
    ------------------------------