Original Message:
Sent: Sun June 30, 2024 09:51 AM
From: Jay Bowen
Subject: regarding Bulk content move sweep
Michael,
There are settings in ACCE, as you show, but also at the FCD and site level. How many documents per minute are you processing using bulk sweep? How many pages are in those document objects?
I recently finished a migration from a physical, 10+ year-old quad-core server with slower SAN disk to P8 on another 10-year-old server via federation. I used bulk sweep; others have seen great numbers with it, but in my case performance was dismal. I ended up writing a program using content move and trying different approaches: single thread, multithread, bulk batch insert, and multithreaded batch insert. One thing FileNet doesn't like is too many threads for content move or batch move, regardless of settings. I altered the content move thread tasks and a lot of the obvious settings you shared in your screen print. I aligned my content move to the IDs in the MSARs I was moving, so each virtual platter would be loaded once and all of its doc IDs read from it.
A colleague suggested prefetch; in his migrations MSAR prefetch made a huge difference, but for me it ended up being just one more thing to do without yielding a big enough difference to keep investing time in it. Eventually I was moving 1 million document objects per day (not counting pages). That was good enough for me, but it did take time to get there. No, I did not see a log entry or a pegged resource telling me to go change setting XYZ; I adjusted incrementally based on IBM documentation and suggestions from colleagues.
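Aligning the move to the IDs on each MSAR surface, so that a platter is loaded only once, amounts to a simple grouping step before any content is moved. This is a minimal sketch, not the program described above: `SurfaceOrdering` and `DocRef` are hypothetical names, and it assumes you already have a list of (P8 GUID, IS doc ID, surface ID) entries, e.g. exported from the IS databases; how the surface ID is obtained is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: groups a content-move work list by IS surface (platter)
 * so that all documents on one surface are processed together.
 */
public class SurfaceOrdering {

    /** Hypothetical work item; field names are illustrative, not IS or CE API names. */
    public static final class DocRef {
        public final String p8Guid;  // ID of the federated document in P8
        public final long isDocId;   // IS document number
        public final int surfaceId;  // IS surface holding the content

        public DocRef(String p8Guid, long isDocId, int surfaceId) {
            this.p8Guid = p8Guid;
            this.isDocId = isDocId;
            this.surfaceId = surfaceId;
        }
    }

    /**
     * Returns documents grouped by surface, with surfaces in ascending order
     * (TreeMap); within a surface the input order is preserved.
     */
    public static Map<Integer, List<DocRef>> groupBySurface(List<DocRef> docs) {
        Map<Integer, List<DocRef>> bySurface = new TreeMap<>();
        for (DocRef d : docs) {
            bySurface.computeIfAbsent(d.surfaceId, k -> new ArrayList<>()).add(d);
        }
        return bySurface;
    }
}
```

Iterating over the returned map yields one surface's documents at a time, which is also a convenient unit for bookkeeping, progress tracking and retries.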
Now, this may sound like the long way, but the other option is reading the MSAR directly, converting to PDF with optional OCR, and importing. You would need vendor tools for this, but it is hands down the best option if converting the file format is desired. Otherwise you migrate images into P8 > export > convert to PDF > version.
Settings I can recommend for federated IS-to-P8 migrations:
- WebSphere
  - Increase the ORB thread pool: either allow thread allocation beyond the maximum or change it to 125 min / 250 max.
  - Increase the JDBC connection pools to 200 max connections for the GCD and object store data sources.
- ACCE
  - Abandoned content cleanup interval: 1036800
  - Temporary file lifetime: 1036800
  - Dispatcher wait: 5 seconds
  - Max content queue worker threads: 100
- FCD device in ACCE at the GCD level
  - Timeout seconds: 50,000
  - FCP pool timeout: 600
  - FCP pool preferred size: 50
  - FCP pool max wait seconds: 5
- IS
  - Increase the CSM I/O process count to one per cache disk.
  - Add a new page cache on fast disk.
  - Change the cache allocation: increase the page cache to 90 or 100%.
- API vs. bulk sweep
  - Bulk sweep or highly multithreaded API jobs would lock the content queue worker; when this occurred we needed to recycle the DB and FileNet. IBM has a technote on it: there is one table with one row, and once that gets stuck you have to recycle to clear it.
  - I queued documents by platter, and some documents had 999 pages while others were one tiny text file. Once I was chugging along my numbers were all over the place, but I did not have a page-count mechanism to tell me whether I was going faster or slower based on total size or pages. I measured in document objects: when I started it was 250,000 per day, and after some time I got to a million documents per day.
  - I preferred the API approach, where I could target the doc IDs (well, GUIDs) for a platter. Every now and then I would get an FCD exception, so I added a retry loop and eventually split out just the doc ID that failed from a batch for later processing or investigation. Turns out a few platters were bad. The only way I would have found this out was by processing by IDs and platters; otherwise you are depending on CE logs. I should point out that I had cases where an entire platter was bad and/or had missed reads and checksum errors at the page level. Since I was tracking the surface, doc ID and P8 GUID by object store, I was able to easily group these errors to certain platters and go to IDM Desktop or the IS admin tools to find the root cause.
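The retry-then-isolate loop from the last bullet can be sketched generically. This is an illustrative sketch, not the code used in that migration: `RetryIsolate` is a hypothetical name, and the batch action is a plain `Consumer` so the example runs without FileNet jars; a real job would plug its own CE call in there. It assumes the batch action is all-or-nothing (or safe to repeat), because a failed batch is re-run one document at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch of "retry the batch, then isolate the failing document".
 * The action stands in for whatever moves content for a batch (not shown).
 */
public class RetryIsolate {

    /** Runs the action on the batch, retrying up to maxAttempts times. */
    static <T> boolean tryBatch(List<T> batch, Consumer<List<T>> action, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.accept(batch);
                return true;
            } catch (RuntimeException e) {
                // Transient errors may clear on retry; a real job would log e here.
            }
        }
        return false;
    }

    /**
     * Processes one batch. If it still fails after retries, each document is
     * retried on its own, and the ones that keep failing are returned so they
     * can be grouped by platter for investigation.
     */
    public static <T> List<T> processWithIsolation(List<T> batch, Consumer<List<T>> action, int maxAttempts) {
        List<T> failed = new ArrayList<>();
        if (tryBatch(batch, action, maxAttempts)) {
            return failed; // whole batch succeeded
        }
        for (T item : batch) {
            if (!tryBatch(List.of(item), action, maxAttempts)) {
                failed.add(item);
            }
        }
        return failed;
    }
}
```

Returning the failures instead of throwing keeps the main loop moving, which matters when a single bad platter would otherwise stall the whole run.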
------------------------------
Jay Bowen
www.bowenecmsolutions.com
Medina, OH
Original Message:
Sent: Fri June 28, 2024 03:20 AM
From: Gerold Krommer
Subject: regarding Bulk content move sweep
Sorry, but what good is it to change parameters if you don't even know where the bottleneck is (= poking around in the fog), or at least whether it is in reading or writing? I do not believe the standard settings are responsible for such abnormal performance. We usually see mid-double-digit documents/sec without modifying anything.
Also, the question about single-page TIFFs is still unanswered. In the worst case, 8,000 IS documents could be 8,000,000 files (a COLD document can have up to 1,000 pages), and that could explain the performance.
If no other strategy comes to your mind, write a small Java program that reads the federated IS documents to disk and see how many docs/sec you get.
Use CEBI to do mass ingestion of documents and see what you get there, THEN we might be able to propose parameters or a strategy.
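Such a read test could report pages/sec alongside docs/sec, since a 999-page document and a one-page text file weigh very differently (the missing page-count mechanism mentioned elsewhere in this thread). This is a minimal sketch under assumptions: `ThroughputMeter` is a hypothetical class, the actual read of a federated IS document is not shown, and you would call `record(pageCount)` once per document read.

```java
/**
 * Hypothetical progress counter for a read or migration test. Time is passed
 * in explicitly (e.g. System.nanoTime()) so the rates are easy to check.
 */
public class ThroughputMeter {
    private final long startNanos;
    private long docs;
    private long pages;

    public ThroughputMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Call once per document after it has been read or moved. */
    public void record(int pageCount) {
        docs++;
        pages += pageCount;
    }

    public double docsPerSecond(long nowNanos) {
        return docs / elapsedSeconds(nowNanos);
    }

    public double pagesPerSecond(long nowNanos) {
        return pages / elapsedSeconds(nowNanos);
    }

    // Guards against division by zero right after start.
    private double elapsedSeconds(long nowNanos) {
        return Math.max(1e-9, (nowNanos - startNanos) / 1e9);
    }
}
```

Typical use is `ThroughputMeter m = new ThroughputMeter(System.nanoTime());` before the loop and `m.docsPerSecond(System.nanoTime())` / `m.pagesPerSecond(System.nanoTime())` when printing progress.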
Kind regards,
/Gerold
------------------------------
Gerold Krommer
Original Message:
Sent: Thu June 27, 2024 10:37 AM
From: Michael Pressler
Subject: regarding Bulk content move sweep
Hello to all,
I am still having performance problems with a BulkMove sweep to migrate documents from IS to P8 (about 8,000 documents/hour).
I must admit that I still have gaps in my knowledge about which settings in P8 (or Image Services) I can use to improve performance. At the moment everything runs more or less with out-of-the-box parameters. Unfortunately, the P8 documentation is not much help here.
For example, it is not entirely clear to me whether MoveContent depends more on the settings in the Replication subsystem or in the CFS Import Agent subsystem, on both, or on neither.


Via the context help I can see what the individual parameters mean, but there is no information about what effect changing them has, or which settings might increase MoveContent throughput.
I am of course aware that many factors play a role here. The FileNet system runs as a container in an AKS cluster in Azure, the IS system on a Windows VM also in Azure, and the content in P8 is written to Azure Blob Storage. There are many possible reasons why MoveContent is so slow, but I want to make sure that everything is set up optimally in FileNet.
I am grateful for any tips and help.
By the way, is there any way to connect the IBM System Dashboard to a containerized FileNet system in an AKS cluster? I tried but failed in the end.
Regards
Michael
------------------------------
Michael Pressler
Original Message:
Sent: Sat April 20, 2024 05:29 AM
From: Gerold Krommer
Subject: regarding Bulk content move sweep
First, we are really deviating from the topic of the original posting.
Second, this has all been explained elsewhere in this forum and even in this topic. COLD is not just text; it is P-Code(!) AND it is most likely compressed (I would hope so, as one can see compression ratios above 70:1). As I said, you need the tools (ISRA or IDM Desktop) to decompress it. THEN you can start worrying about laying out your text.
If it were that easy there wouldn't be the plethora of tools.
Kind regards,
/Gerold
------------------------------
Gerold Krommer
Original Message:
Sent: Fri April 19, 2024 04:01 PM
From: Venkat S
Subject: regarding Bulk content move sweep
Thanks, Gerold/Jay, for your prompt responses.
I ran the queries as you advised and found that all COLD documents are text-based without any background template. If that is the case, can I use simple Java APIs (e.g., Aspose) to convert the text to PDF and store it in FileNet? Please let me know your thoughts on this.
1 - INX_TEXT_DOC - Text/Cold documents without background image
thanks,
Venkat
------------------------------
Venkat S
Original Message:
Sent: Fri April 19, 2024 09:53 AM
From: Gerold Krommer
Subject: regarding Bulk content move sweep
The FileNet Image Services Index and Workflo Database Contents manual is our friend (especially page 74). Assuming Oracle:
SELECT COUNT(*) FROM F_SW.DOCTABA WHERE F_DOCTYPE = 1 OR F_DOCTYPE = 3
gives you the number of documents (1 = text, 3 = mixed; it is unlikely you have any mixed ones).
Number of pages:
SELECT SUM(NVL(F_PAGES, 1)) FROM F_SW.DOCTABA WHERE F_DOCTYPE = 1 OR F_DOCTYPE = 3
(This is from memory as I do not have an IS system in reach any more)
Hope this helps,
/Gerold
------------------------------
Gerold Krommer
Original Message:
Sent: Thu April 18, 2024 08:48 AM
From: Venkat S
Subject: regarding Bulk content move sweep
May I know the database query to run on the IS DOCTABA to identify COLD documents and their count? Basically, I need to know how to tell normal documents and COLD documents apart by looking at the database entries.
Much appreciated if someone can answer my query.
thanks,
Venkat
------------------------------
Venkat S
Original Message:
Sent: Thu April 18, 2024 05:31 AM
From: dorothea vulcan
Subject: regarding Bulk content move sweep
Regarding COLD documents (Computer Output to Laser Disc):
they can be of different types:
- large text documents, compressed and archived
- text documents with templates
For the latter type, the main document is configured with the template (docId) and the positions of the different parts of the text inside the template.
One idea is to export them as images, using the IDM Viewer to load them, but you could also try another tool and export them as text plus template. You can take a look at the first part of the text document and check the positions...
Be aware that the template is, in theory, a very old document, so initially it will not be in page_cache, for example.
The next step is to archive them as images, or however you want to customize it in P8; you may be able to reproduce the same customization in P8. As I said, many tools were built over the years by FileNet Services and later by the IBM team, and there is no sense in rediscovering now what that team did over more than 25 years...
Perhaps this helps.
______________________________
Dorothea Vulcan
phone: +49 171 7832 120
Original Message:
Sent: Wed April 17, 2024 05:46 AM
From: Gerold Krommer
Subject: RE: regarding Bulk content move sweep
Hi,
I have commented on the same question somewhere else in this forum. IS COLD is formatted in P-Code (of which I have VERY old documentation), but if it is compressed you absolutely MUST use one of the mentioned tools to get it uncompressed.
Only then use one of the very valid approaches mentioned previously, or write your own layout engine. If you never used anything other than FileNet COLD, only a limited subset of P-Code is used (which makes such an approach feasible; I would NEVER want to write a generic layout engine, regardless of syntax).
There are/were other tools out there that would, e.g., produce mixed documents (COLD and image mixed on different pages), and then it gets complex.
Hope this helps,
/Gerold
------------------------------
Gerold Krommer
Original Message:
Sent: Tue April 16, 2024 01:54 PM
From: Venkat S
Subject: regarding Bulk content move sweep
Hi Jay,
thanks so much for your response.
I am talking about COLD documents that are template-based; I believe they are nothing but overlays.
best regards,
Venkat
------------------------------
Venkat S
Original Message:
Sent: Tue April 16, 2024 01:33 PM
From: Jay Bowen
Subject: regarding Bulk content move sweep
Hi Venkat, which COLD are we talking about: compressed text or image overlay? If it is compressed COLD, there is a JAR that is part of the IBM ISRA APIs for decompression, for best speed. For overlays, I have found that the best approach, with the highest fidelity, is to reach back into the time machine, create a .NET WinForms app and embed the IDM Viewer. Use the IDM APIs to fetch and view the document in the viewer; the viewer has an API method for printing, so set the default printer to a PDF printer ahead of time and then capture the file. I know, there could be millions of files, so you create 10 or more nodes that just virtual-print 24/7 from a concurrent queue or DB. Your other option is to use the IDM APIs to get the text, rows and columns and manage the layout programmatically. That is much faster but requires you to do the layout. Cost is usually a fraction of other approaches; it's just the calendar time waiting for the process to finish.
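The "many nodes printing from a concurrent queue" setup can be modelled on a single machine as a thread pool draining a shared queue. This is a hedged sketch: in the setup described above each worker would be a separate node running the IDM Viewer and the queue would likely be a database table, so both are stand-ins here, and `renderToPdf` is a hypothetical placeholder for the fetch-and-print-to-PDF step, not an IDM API call.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Sketch: N workers drain one shared queue of document IDs. */
public class RenderQueue {

    /** Returns how many documents renderToPdf reported as successfully rendered. */
    public static int drain(List<String> docIds, int workers, Predicate<String> renderToPdf) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>(docIds);
        AtomicInteger rendered = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                String id;
                // poll() is thread-safe, so each ID is taken by exactly one worker.
                while ((id = queue.poll()) != null) {
                    try {
                        if (renderToPdf.test(id)) {
                            rendered.incrementAndGet();
                        }
                    } catch (RuntimeException e) {
                        // Keep the worker alive; a real job would record the failed ID.
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rendered.get();
    }
}
```

Because workers pull the next ID only when they are free, slow documents (large overlays) do not hold up the rest of the queue.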
I'd be interested in hearing from vendors that have COLD tools about how they do it and their cost brackets.
------------------------------
Jay Bowen
Original Message:
Sent: Tue April 16, 2024 12:34 PM
From: Venkat S
Subject: regarding Bulk content move sweep
Hi Jay/others,
could you please suggest a solution to convert COLD documents to either TIFF or PDF?
Thanks and regards,
Venkat
------------------------------
Venkat S
Original Message:
Sent: Tue April 16, 2024 10:37 AM
From: Jay Bowen
Subject: regarding Bulk content move sweep
Hi, I have migrated a number of IS systems. Typically these were export > convert to PDF > import, plus two traditional IS > P8 migrations via CFS federation. For raw speed, reading the MSAR file directly is hands down the fastest method of extraction; you can process a surface in an hour. The IDM APIs are much slower, but you can have multiple nodes extracting and converting pages, and they allow inline transformations such as PDF, merged TIFF or COLD conversions, things you can't do with a CFS approach. WAL is certainly faster but more complicated and more sensitive to the deployment environment.
Now, getting to IS-to-P8 federation: the first thing you can do to significantly improve performance is increase your page cache by adding a disk or expanding your current volume(s), then increase the TTL for those objects. If you are not scanning into IS, change your allocation percentages so that the retrieval cache is your largest pool. From my own observations, the SAN or disk used for MSAR also has a performance impact; obviously a lot of I/O is going to happen, so organize your migrations by surface.
Batch content move is great, but I have had cases where the convenience of a move sweep revealed weaknesses in the customer's environment, and fixing them caused a mountain of work. Instead I used the CPE APIs, organized my requests by the objects on each surface, and performed the move. It takes more prep time, but you notch progress one surface at a time, and it is very manageable. Optionally, you can prefetch these IDs into cache. It doesn't entirely make sense to me why disk cache is faster than MSAR on disk, other than that nothing is faster than the IS page cache. When using the CPE content move API approach, I tried batch updates, multithreaded calls, large batches and small batches to see what I could squeeze out of CPE/IS. As others mentioned, I ended up with deadlocks and all sorts of issues when I increased the thread pressure using multiple API calls.
In the end I am using batch updates at 10 docs per second per API call, so about 600 doc objects per minute, with various page counts. Other values you can tune:
- WAS: orb.thread.pool 125; JDBC > 100 max connections
- ACCE: GCD abandoned cleanup interval 1036800; temp file lifetime 1036800; content queue max worker threads 100
- Site settings level, on the IS object where the IS user/password/domain is set: I increased timeout seconds to 50000.
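Feeding each surface's documents to the CE in small fixed-size batches (about 10 per call, as described above) needs only a partition step. A minimal sketch, assuming you hand each sublist to one CE batch update (for example an `UpdatingBatch` from the CE Java API with `moveContent` called on each document; check this against the API documentation for your CE version). `Batches` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a work list into fixed-size batches. */
public class Batches {

    /** Splits items into consecutive sublists of at most `size` elements. */
    public static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            // Copy so later changes to `items` don't affect queued batches.
            out.add(new ArrayList<>(items.subList(from, to)));
        }
        return out;
    }
}
```

Keeping the batch size a single parameter makes it easy to try larger or smaller batches, which is exactly the kind of incremental tuning described in this thread.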
------------------------------
Jay Bowen
Original Message:
Sent: Mon April 15, 2024 05:08 AM
From: Michael Pressler
Subject: regarding Bulk content move sweep
Hello everyone,
This is exactly the right thread.
We have just migrated an IS system to P8 and are currently performing the bulk move, which is extremely slow (4-5 documents/s).
P8 is running as a container in an AKS cluster in Azure. The IS system was moved to a Windows VM in Azure as a read-only system. MoveContent is currently running with the default settings (IS and Import Agent).
We are currently looking for the bottleneck and for where we can still improve performance.
I would be very grateful for any kind of hints.
Greetings
Michael
------------------------------
Michael Pressler
Original Message:
Sent: Tue February 06, 2024 08:43 AM
From: Gerold Krommer
Subject: regarding Bulk content move sweep
Hi,
depending on the requirements we have used different strategies including Move Content and export/import.
Assuming, for simplicity, 100 docs/sec, it will take you a net 115 days to move (if my math is correct); given normal maintenance interruptions, it will be half a year of wall-clock time. If you have nothing to convert (a really plain move of content) and the migration is not time-critical (you have to pay maintenance for the two parallel systems), then you are good to go.
If any of the assumptions above does not hold, consider exporting and importing. Using 4 bare-metal Linux servers with ample memory and CPU, two additional virtualized servers, a dedicated network, and dedicated, carefully crafted and tuned import clients, we achieved 600 docs/sec. That was not really usable, though, as the TSM (sorry, Spectrum Protect) server could not keep up (the staging area filled up and we had to pause the import), so we achieved a real (net) import rate of 250 docs/sec. This was also for a billion documents from a host, which were prepared beforehand and staged on disk.
Needless to say, the system wasn't usable during the import (it ran at close to 100% CPU), but it didn't have to be, as the customer switched to P8 after the import...
Not many ECM systems can do this as fast and as stably as P8.
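The 115-day figure is easy to reproduce: 1,000,000,000 documents at 100 docs/sec is 10,000,000 seconds, or about 115.7 days of net run time. The small helper below (hypothetical names) does that arithmetic plus a per-hour conversion; it deliberately ignores maintenance and backup pauses, which are what stretch this to roughly half a year of wall-clock time.

```java
/** Back-of-the-envelope migration arithmetic (hypothetical helper). */
public class MigrationEstimate {
    static final double SECONDS_PER_DAY = 86_400.0;

    /** Net days of run time to move `docs` documents at a sustained rate. */
    public static double netDays(long docs, double docsPerSecond) {
        return docs / docsPerSecond / SECONDS_PER_DAY;
    }

    /** Converts documents per second to documents per hour. */
    public static double perHour(double docsPerSecond) {
        return docsPerSecond * 3600.0;
    }
}
```

`netDays(1_000_000_000L, 100)` is about 115.74, and `perHour(60)` gives 216,000 docs/hour for the 60 docs/sec ballpark mentioned elsewhere in this thread.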
Hope this helps,
/Gerold
------------------------------
Gerold Krommer
Original Message:
Sent: Mon February 05, 2024 03:28 PM
From: Venkat S
Subject: regarding Bulk content move sweep
Hi Gerold,
thanks so much for your response.
So you have followed the same approach, i.e., CFS-IS for metadata migration and a content move sweep for content migration, right?
Based on your experience, do you recommend this approach or some other one? We are planning to migrate around one billion (i.e., 1,000 million) documents.
In my case, we will migrate the documents first and ask users to switch to P8 once the migration is done.
------------------------------
Venkat S
Original Message:
Sent: Mon February 05, 2024 11:23 AM
From: Gerold Krommer
Subject: regarding Bulk content move sweep
Hi,
we did many of those migrations and in addition to what Ruth said:
If your IS system still contains documents with multiple pages stored as single-page TIFFs then you will get one document with many content elements and the user experience with those is probably not what they expect, although Daeja has improved over time.
We migrated IS systems with several hundred million documents from IS to P8 and refrained from much tuning, as it always had collateral damage such as locking errors or degraded performance for users (the migration was done on production systems, as migrations over weekends or the like were impossible).
Having said that - on a very rough ballpark figure on a reasonably sized system - you should be able to achieve 60 documents/second as an order of magnitude (what that is in docs/hour I leave to your math). We found document size to be more of a limiting factor than amount.
Do not forget you might have to stop migration in backup windows.
It is not uncommon for larger migrations to run for months and it is a matter of scrutinizing, bookkeeping and following up on inevitable errors that will appear.
With one customer we used IBM Cloud Object Storage as the target storage (fixed content device as HW retention was required) and didn't find this to be a limiting factor of any kind, but of course there is the staging area...
Hope that helps,
Gerold
------------------------------
Gerold Krommer
Original Message:
Sent: Fri February 02, 2024 10:02 PM
From: Venkat S
Subject: regarding Bulk content move sweep
I have federated the metadata of documents in Image Services to FileNet P8 (5.5.9) and am planning to move the content (from the FCD area to a local FileNet storage area) of the federated documents using a bulk content move sweep, so I need help with the following:
1) Throughput: how many documents can be moved per hour?
2) Can I create multiple sweeps with the same criteria and run them at the same time?
3) Will there be a difference (in terms of throughput and performance) if I use a cloud storage area (S3 or Azure Blob) instead of a local FileNet storage area?
Much appreciated if someone can respond to this quickly.
Thanks in advance,
Venkat
------------------------------
Venkat S
------------------------------