1. regarding Bulk content move sweep

Venkat S
Posted Fri February 02, 2024 10:03 PM

I have federated the metadata of documents in Image Services to FileNet P8 (5.5.9) and am planning to move the content of the federated documents (from the FCD area to a local FileNet storage area) using a bulk content move sweep, so I need help with the following:

1) Throughput: how many documents can be moved per hour?

2) Can I create multiple sweeps with the same criteria and run them at the same time?

3) Will there be a difference (in terms of throughput and performance) if I use a cloud storage area (S3 or Azure Blob) instead of a local FileNet storage area?

It would be much appreciated if someone could respond to this quickly.

Thanks in advance,

Venkat

------------------------------
Venkat S
------------------------------
2. RE: regarding Bulk content move sweep

RUTH Hildebrand-Lund
Posted Mon February 05, 2024 11:04 AM

1) Throughput: how many documents can be moved per hour? <-- This depends on your environment, your network speed, and the tuning you do for the sweep.

2) Can I create multiple sweeps with the same criteria and run them at the same time? <-- Yes, but you need to make sure they use different filter expressions. However, this is not likely to improve the total speed at which your content is moved.

3) Will there be a difference (in terms of throughput and performance) if I use a cloud storage area (S3 or Azure Blob) instead of a local FileNet storage area? <-- There could be; again, the network is going to come into play. Also consider your long-term plan for the content and your FileNet installation. If you plan, long-term, to move to containers or a cloud installation, then moving to S3 or Azure Blob is a good move. If you need retention settings and/or WORM storage, take that into account too.
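On answer 2, here is a minimal sketch of what "different filter expressions" can mean in practice: split one date range into non-overlapping, half-open windows and give each sweep one window. The property name DateCreated and the timestamp literal format (20200101T000000Z, treated as UTC) are assumptions to check against the CE SQL documentation before use. As Ruth notes, parallel sweeps are unlikely to raise total throughput anyway.

```python
from datetime import datetime

# Assumed CE SQL timestamp literal format (UTC); verify before use.
CE_TIMESTAMP = "%Y%m%dT%H%M%SZ"


def partition_filters(start: datetime, end: datetime, parts: int,
                      prop: str = "DateCreated") -> list[str]:
    """Split [start, end) into `parts` half-open windows and return one
    filter expression per window, so that no document matches two sweeps."""
    if parts < 1 or end <= start:
        raise ValueError("need parts >= 1 and end > start")
    step = (end - start) / parts
    filters = []
    for i in range(parts):
        lo = start + step * i
        # The last window ends exactly at `end`; inner bounds are shared
        # with the next window, so there are no gaps or overlaps.
        hi = end if i == parts - 1 else start + step * (i + 1)
        filters.append(
            f"{prop} >= {lo.strftime(CE_TIMESTAMP)} AND {prop} < {hi.strftime(CE_TIMESTAMP)}"
        )
    return filters


for expr in partition_filters(datetime(2010, 1, 1), datetime(2024, 1, 1), 4):
    print(expr)
```

Because each window's upper bound is the next window's lower bound, a document can match only one sweep.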

If you email me directly (rhildebr@us.ibm.com), I'll send you a presentation on Move Sweep that includes performance tuning information.

------------------------------
RUTH Hildebrand-Lund
------------------------------

3. RE: regarding Bulk content move sweep

Venkat S
Posted Mon February 05, 2024 03:11 PM

Hi Ruth, thanks so much for your valuable information.

4. RE: regarding Bulk content move sweep

IBM Champion

Gerold Krommer
Posted Mon February 05, 2024 11:24 AM

Hi,

We did many of these migrations, and in addition to what Ruth said:

If your IS system still contains multi-page documents stored as single-page TIFFs, you will get one document with many content elements, and the user experience with those is probably not what users expect, although Daeja has improved over time.

We migrated IS systems with several hundred million documents to P8 and refrained from much tuning, as it always had collateral damage such as locking errors or degraded performance for users (the migration was done on production systems, as migrating over weekends or the like was impossible).

Having said that, as a very rough ballpark figure on a reasonably sized system, you should be able to achieve 60 documents/second as an order of magnitude (what that is in docs/hour I leave to your math). We found document size to be more of a limiting factor than document count.
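For the math Gerold leaves to the reader, a tiny helper converts a sustained docs/sec rate into larger units. At his 60 docs/sec ballpark that is 216,000 documents per hour, or about 5.2 million per day, assuming the rate holds around the clock.

```python
def throughput_table(docs_per_sec: float) -> dict[str, float]:
    """Convert a sustained docs/sec rate into per-minute, per-hour and per-day figures."""
    return {
        "per_minute": docs_per_sec * 60,
        "per_hour": docs_per_sec * 3_600,
        "per_day": docs_per_sec * 86_400,
    }


print(throughput_table(60))
```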

Do not forget that you might have to stop the migration during backup windows.

It is not uncommon for larger migrations to run for months; it is a matter of scrutinizing, bookkeeping, and following up on the inevitable errors that will appear.

With one customer we used IBM Cloud Object Storage as the target storage (as a fixed content device, since hardware retention was required) and didn't find it to be a limiting factor of any kind, but of course there is the staging area...

Hope that helps,

Gerold

------------------------------
Gerold Krommer
------------------------------

5. RE: regarding Bulk content move sweep

Venkat S
Posted Mon February 05, 2024 03:28 PM

Hi Gerold,

Thanks so much for your response.

So you have followed the same approach, i.e. CFS-IS for the metadata migration and a content move sweep for the content migration, right?

Based on your experience, do you recommend this approach or some other approach? We are planning to migrate around one billion (1,000 million) documents.

In my case, we will migrate the documents first and ask users to switch to P8 once the migration is done.

6. RE: regarding Bulk content move sweep

IBM Champion

Eric Walk
Posted Tue February 06, 2024 08:38 AM

Hey Venkat,

The approach you are taking is likely the slow path. Generally, you'll increase throughput the most by tuning the thread pool for one sweep, not by creating multiple parallel sweeps. The big risk with your approach is the load you'll place on the FNIS servers and the potential impact on production performance (which is why some colleagues of mine built a proprietary tool to migrate a different way over a decade ago).

That said, I'm happy to take a conversation about creative approaches to migrating faster offline, depending on your priorities. You might want to prioritize shutting down FNIS due to license cost concerns, minimizing disruption to users, or raw time to completion; each set of priorities leads to different tradeoffs and different potential approaches.

Please feel free to reach out privately. I've worked on a number of migrations of the scale you're describing, and it's always an adventure.

Best,

Eric

------------------------------
Eric Walk
Director

O: 617-453-9983 | NASDAQ: PRFT | Perficient.com
------------------------------

7. RE: regarding Bulk content move sweep

IBM Champion

Gerold Krommer
Posted Tue February 06, 2024 08:43 AM

Hi,

Depending on the requirements, we have used different strategies, including Move Content and export/import.

Assuming, for simplicity, 100 docs/sec, it will take you 115 days net to move a billion documents (if my math is correct); given normal maintenance interruptions, it will be half a year of wall-clock time. If you have nothing to convert, it is a really plain move of content, and the migration is not time-critical (you have to pay maintenance for the two parallel systems), then you are good to go.
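Gerold's estimate can be reproduced with a small helper. The availability factor (the fraction of wall-clock time the move is actually running) is an assumption added here to model backup and maintenance windows; 0.65 is a hypothetical value that turns the roughly 116 net days into about 178 calendar days, i.e. about half a year.

```python
SECONDS_PER_DAY = 86_400


def migration_days(total_docs: int, docs_per_sec: float,
                   availability: float = 1.0) -> float:
    """Calendar days needed to move `total_docs` at a sustained rate.

    `availability` is the fraction of wall-clock time the move actually runs
    (1.0 gives net time; lower values model backup and maintenance windows).
    """
    if docs_per_sec <= 0 or not 0 < availability <= 1:
        raise ValueError("rate must be > 0 and availability in (0, 1]")
    return total_docs / docs_per_sec / SECONDS_PER_DAY / availability


net = migration_days(1_000_000_000, 100)
wall = migration_days(1_000_000_000, 100, availability=0.65)  # hypothetical availability
print(f"net: {net:.1f} days, wall clock: {wall:.1f} days")
```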

If any of the assumptions above does not hold, consider exporting and importing. Using 4 bare-metal Linux servers with ample memory and CPU, two additional virtualized servers, a dedicated network, and dedicated, carefully crafted and tuned import clients, we achieved 600 docs/sec. That rate was not really usable, though, as the TSM (sorry, Spectrum Protect) server could not keep up with it (the staging area filled up and we had to pause the import), so we achieved a real (net) import rate of 250 docs/sec. This was also for a billion documents, from a host, which were prepared beforehand and staged on disk.

Needless to say, the system wasn't usable during the import (it ran at close to 100% CPU), but it didn't have to be, as the customer switched to P8 after the import...
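The staging-area effect Gerold describes can be sketched as a simple two-stage pipeline: while the import side outpaces the backend, the surplus accumulates in staging until it is full, after which the whole pipeline runs at the slower side's rate. The staging capacity below is hypothetical, and the model ignores pause/resume overhead, so real net rates can come in below the drain rate.

```python
from typing import Optional


def staging_fill_seconds(ingest_rate: float, drain_rate: float,
                         capacity_docs: float) -> Optional[float]:
    """Seconds until a staging area of `capacity_docs` fills up, or None if
    the drain side keeps up. Rates are in documents per second."""
    surplus = ingest_rate - drain_rate
    if surplus <= 0:
        return None
    return capacity_docs / surplus


def sustained_rate(ingest_rate: float, drain_rate: float) -> float:
    """Once the staging area is full, the pipeline runs at the slower side's rate."""
    return min(ingest_rate, drain_rate)


# Hypothetical staging capacity of 10 million documents.
fill = staging_fill_seconds(600, 250, 10_000_000)
print(f"staging full after {fill / 3600:.1f} h, then ~{sustained_rate(600, 250)} docs/sec")
```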

Not a lot of ECM systems can do this as fast and as stably as P8.

Hope this helps,

/Gerold

8. RE: regarding Bulk content move sweep

Michael Pressler
Posted Mon April 15, 2024 05:09 AM

Hello everyone,

This is exactly the right thread.

We have just migrated an IS system to P8 and are currently performing the bulk move, which is extremely slow (4-5 documents/s).

P8 is running as containers in an AKS cluster in Azure. The IS system was moved to a Windows VM in Azure as a read-only system. MoveContent is currently running with the default settings (IS and Import Agent).

We are currently looking for the handbrake, i.e. where we can still improve performance.

I would be very grateful for any kind of hints.

Greetings

Michael

------------------------------
Michael Pressler
------------------------------

9. RE: regarding Bulk content move sweep

IBM Champion

Eric Walk
Posted Mon April 15, 2024 09:12 AM

Hi Michael,

So, a few thoughts.

Increase the number of CPE pods that are running; the move content job can parallelize across them.

I would read the documentation about sweeps carefully: Sweep policies - IBM Documentation

There are some quirks in how workers, dispatching, threading and batching work for the different types of sweeps.

You need to look at both the settings in the sweep itself and the thread and sweep settings at the domain or virtual server level.

Decide whether you really need it to delete from the source. Skipping the delete can save a ton of time. Moving content with sweeps - IBM Documentation

At some point you're just going to hit the physical limit of your IS system. There are a lot of bottlenecks in the architecture of Image Services when it comes to processing this kind of work. We've developed workarounds for high-volume scenarios in the past to get this kind of work done faster by going around ISRA.

Best,

Eric

10. RE: regarding Bulk content move sweep

RUTH Hildebrand-Lund
Posted Mon April 15, 2024 04:58 PM

Some additional suggestions:

• To speed up the dispatcher's search time:
  • Tune the filter expression and its composite index
  • Collect statistics and fix the execution plan
  • The first column must be object_id or a property that efficiently narrows down the search results
  • The composite index must have:
    • All columns which are properties used in the Sweep SQL WHERE clause and SELECT clause
  • The order of the properties in the composite index must be the same as the order of the properties in the filter expression
  • Make sure the database optimizer executes the SQL with your composite index
• Columns to include when creating covering indexes for sweeping:
  • The columns to include depend on the sweep type, the target class and the filter expression
  • Always include:
    • object_id
    • home_id
    • security_id
    • epoch_id
    • recovery_item_id
  • Add to this the columns associated with any properties referenced in the filter expression
  • If the Target Class is Document (or a subclass), add:
    • security_folder_id
    • version_status
  • If the Target Class is Custom Object (or a subclass), add:
    • security_folder_id
  § A covering index created on the table that contains the target objects for a sweep can significantly improve Sweep Framework throughput
  § A covering index is a non-clustered index that includes all the columns referenced in either the SELECT clause or the WHERE clause of a particular query
  § A covering index gains its advantage from the fact that all the information necessary to satisfy the query is contained in the index
• Columns to include when creating covering indexes for a Bulk Move:
  • security_folder_id
  • version_status
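Ruth's column rules can be turned into a small generator for the index definition. This is a sketch under stated assumptions: DocVersion as the table holding Document rows, u1a_migrationstatus as the database column of a hypothetical custom property used in the filter, and i_sweep_cover as a made-up index name. The exact CREATE INDEX syntax (and whether to use INCLUDE columns) depends on your database, so have a DBA review the result and check the execution plan, as Ruth suggests.

```python
# Columns Ruth lists as always required in a sweep covering index.
ALWAYS_INCLUDE = ["object_id", "home_id", "security_id", "epoch_id", "recovery_item_id"]

# Extra columns per target class, per Ruth's list.
EXTRA_BY_TARGET = {
    "Document": ["security_folder_id", "version_status"],
    "CustomObject": ["security_folder_id"],
}


def covering_index_columns(target: str, filter_columns: list[str],
                           lead_column: str = "object_id") -> list[str]:
    """Order the columns: a selective lead column, then the filter properties in
    filter-expression order, then the always-included and class-specific columns."""
    if target not in EXTRA_BY_TARGET:
        raise ValueError(f"unsupported target class: {target}")
    ordered = [lead_column, *filter_columns, *ALWAYS_INCLUDE, *EXTRA_BY_TARGET[target]]
    return list(dict.fromkeys(ordered))  # de-duplicate, keeping the first position


def create_index_sql(index_name: str, table: str, columns: list[str]) -> str:
    """Render a plain composite index; adjust for your database's syntax."""
    return f"CREATE INDEX {index_name} ON {table} ({', '.join(columns)})"


# Hypothetical custom property column and index name; DocVersion is assumed
# to be the table holding Document rows.
cols = covering_index_columns("Document", ["u1a_migrationstatus"])
print(create_index_sql("i_sweep_cover", "DocVersion", cols))
```

For a Bulk Move on documents the same column list applies, since security_folder_id and version_status are already added for the Document target.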

11. RE: regarding Bulk content move sweep

Miroslav Richter
Posted Tue April 16, 2024 03:41 AM

Hi Eric,

I'm not sure your recommendation in the first point is correct. Can the sweep job run in parallel? I'm not sure. In a traditional FileNet P8 installation, the sweep job runs on only one server, even in a multi-node installation. I'm not sure how it is in the container world, but I think things are the same. Sweep policies can run in parallel, but that is not your case.

A few years ago I migrated over 500 million scanned documents from SDS InformationArchive to FileNet ASA, and if I recall correctly, the maximum speed I reached was about 25 docs/sec, at night and on weekends, of course. Otherwise it was half that.

I didn't do any special tuning. But beware of queues and subscriptions: turn off or filter out unnecessary subscriptions.

------------------------------
Miroslav Richter
------------------------------

12. RE: regarding Bulk content move sweep

IBM Champion

Eric Walk
Posted Tue April 16, 2024 12:56 PM

One other note, @Michael Pressler: if none of the tuning ideas anyone has provided work or get you far enough, there are more creative approaches to getting the migration done faster that skip the move content sweep.

We've found it's sometimes possible to get better throughput by building an external tool that calls the moveContent API in batches.

There are also some more creative approaches that avoid some of the bottlenecks in the APIs, especially on the FNIS side.

Best,

Eric

13. RE: regarding Bulk content move sweep

dorothea vulcan
Posted Tue April 16, 2024 04:15 AM

Hi,
I would suggest getting a professional IS-to-P8 tool for such a migration. Please note my comments below:
For export, the tool is written in C on top of the Image Services C API and exports document content, annotations, security and properties (indices); for import, it uses C and the P8 Web Services API to import the content, annotations and security into P8.

The speed cannot be compared to that of the bulk import.

You do not have to buy the tool or pay maintenance. You can simply check with the IBM team what is best for you, and you can rent a license for both tools for a while.

Please do the right thing and get in touch with:
Claudia Völk Fanenbruck: cvoelk-fanenbruck@de.ibm.com or
Bernd Geiss: bernd.geiss@de.ibm.com or
Olaf Schwalb: olaf.schwalb@de.ibm.com

good luck
dorothea vulcan

______________________________

Dorothea Vulcan
email:dorotheavulcan@yahoo.com
phone: +49 171 7832 120
________________________________
This email is strict confidential otherwise not specified. All other receivers are required to use no content and addresses and to delete them from all clients and servers. In any other cases they could be punished conform international laws.
___________________

14. RE: regarding Bulk content move sweep

IBM Champion

Gerold Krommer
Posted Tue April 16, 2024 06:22 AM

Hi,

I'm not sure missing indexes are the problem here. The typical pattern for that would be a looooong time until sweeping starts, but then reasonable speed...

4-5 docs/sec (I assume) is excessively slow. Without any tuning, just with normally functioning infrastructure, I would expect 10 times more throughput... but little is known about your system(s), e.g. are these single-page TIFFs or what...

To trace this down further, I would:

Write a little program (using ISTK and ISRA; I do not think CFS-IS uses ISRA, but I have been wrong before) to download IS documents to the CPE server in question and check that the export rate is reasonable (at least in the upper tens per second).

Use CEBI to import such documents from the CPE server into the same document class and storage area. We should also see upper tens per second there.

If one of them does not perform well, we know where to look.

Only if both of them perform well must the problem be somewhere in the internal/bookkeeping mechanisms of CPE.
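Gerold's two-step check can be organized with a small timing harness. The export and import actions are left as parameters because the real ones (an ISTK/ISRA download, a CEBI import) are outside this sketch, and the 50 docs/sec floor is a hypothetical stand-in for "upper tens per second".

```python
import time
from typing import Callable, Iterable


def measure_rate(action: Callable[[str], None], doc_ids: Iterable[str]) -> float:
    """Run `action` once per document ID and return documents per second."""
    ids = list(doc_ids)
    start = time.perf_counter()
    for doc_id in ids:
        action(doc_id)
    elapsed = time.perf_counter() - start
    return len(ids) / elapsed if elapsed > 0 else float("inf")


def diagnose(export_rate: float, import_rate: float, expected: float = 50.0) -> str:
    """Compare each stage against an expected floor (hypothetical: 50 docs/sec)."""
    slow = [name for name, rate in (("IS export", export_rate), ("CE import", import_rate))
            if rate < expected]
    if not slow:
        return "both stages are fast: look inside CPE (sweep/bookkeeping)"
    return "look at: " + ", ".join(slow)
```

For example, measure_rate(download_to_disk, sample_ids), where download_to_disk is your own (hypothetical) export routine, gives the first number; wrapping the import step in the same harness gives the second.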

And I have to contradict what someone said previously: while 'normal' sweep jobs run on one CPE server only, move content can run in parallel. I remember completely hanging a system by doing so (performance did scale, but online performance was unbearable, with the CPU close to 100%).

Hope this helps,

Gerold

15. RE: regarding Bulk content move sweep

Jay Bowen
Posted Tue April 16, 2024 10:37 AM

Hi, I have migrated a number of IS systems. Typically these were export > convert to PDF > import, with two traditional IS > P8 migrations via CFS federation. For raw speed, reading the MSAR files directly is hands down the fastest method of extraction: you can process a surface in an hour. The IDM APIs are much slower, but you can have multiple nodes extracting and converting pages, and they allow in-line transformations to PDF, merged TIFFs and COLD conversions, things you can't do with a CFS approach. WAL is certainly faster but more complicated and sensitive to the deployment environment.

Now, getting to IS-to-P8 federation: the first thing you can do to significantly improve performance is increase your page cache by adding a disk or expanding your current volume(s), then increase the TTL for those objects. If you are not scanning into IS, change your cache allocation percentages so that the retrieval cache is your largest pool. In my own observation, the SAN or disk used for MSAR also has a performance impact; obviously a lot of I/O is going to happen, so organize your migrations by surface.

Batch content move is great, but I have had cases where the convenient move-sweep approach revealed weaknesses in the customer's environment, and fixing them caused a mountain of work. Instead, I used the CPE APIs, organized my requests by the objects on each surface, and performed the move. It takes more prep time, but you are notching progress one surface at a time, which is very manageable. Optionally, you can prefetch these IDs into cache. It doesn't necessarily make sense to me why disk cache is faster than MSAR on disk, other than that nothing is faster than the IS page cache. When using the CPE content move API approach, I tried batch updates, multi-threaded calls, large batches and small batches to see what I could squeeze out of CPE/IS. As others mentioned, I ended up with deadlocks and all sorts of issues when I increased the thread pressure with multiple API calls.
In the end, I am using batch updates at 10 docs per second per API call, so about 600 document objects per minute with various page counts. Other values you can tune:

WAS: orb.thread.pool 125; WAS JDBC > 100 connections max
ACCE (GCD): abandoned cleanup interval 1036800; temp file lifetime 1036800; content queue max worker threads 100
Site settings level, on the IS object where the IS user/password/domain is set: I increased the timeout seconds to 50000.
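Jay's surface-by-surface ordering can be sketched as a grouping step in front of whatever performs the move. The doc ID to surface mapping is assumed to come from IS (how you extract it is not shown), and the batch size of 10 mirrors Jay's figure. Each yielded batch would then be handed to your own content-move code, for example a CE batch of Document.moveContent calls; check the CE Java API reference for the exact calls.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def batches_by_surface(docs: Iterable[tuple[str, str]],
                       batch_size: int = 10) -> Iterator[tuple[str, list[str]]]:
    """Group (doc_id, surface_id) pairs by surface, then yield fixed-size
    batches per surface, so each surface is read once, in one pass."""
    by_surface: dict[str, list[str]] = defaultdict(list)
    for doc_id, surface in docs:
        by_surface[surface].append(doc_id)
    for surface in sorted(by_surface):
        ids = by_surface[surface]
        for i in range(0, len(ids), batch_size):
            yield surface, ids[i:i + batch_size]


# Hypothetical doc IDs and surface IDs.
docs = [("1001", "surf7"), ("1002", "surf3"), ("1003", "surf7")]
for surface, batch in batches_by_surface(docs, batch_size=2):
    print(surface, batch)
```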

------------------------------
Jay Bowen
------------------------------

16. RE: regarding Bulk content move sweep

Venkat S
Posted Tue April 16, 2024 12:34 PM

Hi Jay/others

Could you please suggest a solution for converting COLD documents to either TIFF or PDF?

Thanks and regards,

Venkat

17. RE: regarding Bulk content move sweep

IBM Champion

Eric Walk
Posted Tue April 16, 2024 12:48 PM
Edited by Eric Walk Tue April 16, 2024 12:48 PM

All the tools I know of are proprietary and provided by service providers. For example, the team at my firm, Perficient, has built a converter that's part of our migration tool suite, and Expert Labs probably has their own as well. Happy to talk offline about this.

I'd be curious whether there's a workable off-the-shelf approach. I know there wasn't one when we originally built our converter (which is why we built it), but that was over a decade ago.

18. RE: regarding Bulk content move sweep

Jay Bowen
Posted Tue April 16, 2024 01:33 PM

Hi Venkat, which COLD are we talking about: compressed text or image overlay? If it is compressed COLD, there is a jar that is part of the IBM ISRA APIs for decompression, for best speed. For overlays, I have found the best, highest-fidelity approach is to reach back into the time machine: create a .NET WinForms app and embed the IDM viewer. Use the IDM APIs to fetch and view the document in the viewer; the viewer has an API method for printing, so set the default printer to a PDF printer ahead of time and then capture the file. I know, there could be millions of files, so you create 10 or more nodes that just virtually print 24/7 from a concurrent queue or DB. Your other option is to use the IDM APIs to get the text, rows and columns and manage the layout programmatically. That is much faster but requires you to do the layout. The cost is usually a fraction of other approaches; it's just the calendar time waiting for the process to finish.

I'd be interested in hearing from vendors that have COLD tools about how they do it and their cost brackets.

19. RE: regarding Bulk content move sweep

Venkat S
Posted Tue April 16, 2024 01:55 PM

Hi Jay,

Thanks so much for your response.

I am talking about COLD documents which are template-based; I believe they are nothing but overlays.

Best regards,

Venkat

20. RE: regarding Bulk content move sweep

IBM Champion

Gerold Krommer
Posted Wed April 17, 2024 05:46 AM

Hi,

I have commented on the same question elsewhere in this forum. IS COLD is formatted in P-Code (of which I have VERY old documentation), but if it is compressed you absolutely MUST use one of the mentioned tools to uncompress it.

Only then use one of the very valid approaches mentioned previously, or write your own layout engine. If you never used anything other than FileNet COLD, only a limited subset of P-Code is used (which makes such an approach feasible; I would NEVER want to write a generic layout engine, regardless of syntax).

There are/were other tools out there that would, e.g., produce mixed documents (COLD and image mixed on different pages), and then it gets complex.

Hope this helps,

/Gerold

21. RE: regarding Bulk content move sweep

dorothea vulcan
Posted Thu April 18, 2024 05:31 AM

Regarding COLD (Computer Output to Laser Disc) documents,
they can be different document types:
- big text documents, compressed and archived
- text documents with templates
For the latter type, the main document specifies the template (docId) and the positions of the different parts of the text inside the template.

One idea could be to export them as images, using the IDM Viewer to load them, but you may also try another tool and export them as text plus template. You could take a look at the first part of the text document and check the positions...

Be aware that, in principle, the template is a very old document and is therefore typically not in the page cache at the start, for example.

The next step will be to archive them as images, or however you want to customize it in P8; you may have the same customizing in P8. As I told you, a lot of tools were made over the years by FileNet Services and later by the IBM team, and there is no sense in rediscovering now what this team did over more than 25 years...

Perhaps this helps.

22. RE: regarding Bulk content move sweep

Venkat S
Posted Thu April 18, 2024 08:49 AM

May I know the database query to run on the IS DOCTABA table to identify COLD documents and their count? Basically, I need to know how to tell normal documents and COLD documents apart by looking at the database entries.

It would be much appreciated if someone could answer my query.

Thanks,

Venkat

23. RE: regarding Bulk content move sweep

Jay Bowen
Posted Fri April 19, 2024 09:00 AM

Hi Venkat, try f_doctype or f_docformat and look at the MIME type. If all of your COLD documents are stored in designated classes or disk families, that would be another way of locating them. As a last option, you can use other advanced queries via the CLI; the following link shows some of those tools: How can I identify a COLD background template document id that was deleted from IBM FileNet Image Services?

24. RE: regarding Bulk content move sweep

IBM Champion

Gerold Krommer
Posted Fri April 19, 2024 09:53 AM

The FileNet Image Services Index and Workflo Database Contents manual is our friend (especially page 74). Assuming Oracle:

SELECT COUNT(*) FROM F_SW.DOCTABA WHERE F_DOCTYPE =1 OR F_DOCTYPE=3

gives you the number of COLD documents (1 = text, 3 = mixed; it is unlikely you have any mixed ones).

Number of pages:

SELECT SUM(NVL(F_PAGES,1)) FROM F_SW.DOCTABA WHERE F_DOCTYPE =1 OR F_DOCTYPE=3

(This is from memory as I do not have an IS system in reach any more)
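To see what the two queries compute, here is a self-contained sketch against an in-memory SQLite table that mimics the DOCTABA columns involved. It is not the IS schema: SQLite has no NVL, so COALESCE stands in for it, and the sample rows (including the non-COLD type code 0) are made up. It also shows why NVL(F_PAGES,1) matters: a document without a page count still counts as one page.

```python
import sqlite3
from typing import Optional

# Mock of the DOCTABA columns used by the two queries; COALESCE replaces Oracle's NVL.
COUNT_SQL = "SELECT COUNT(*) FROM doctaba WHERE f_doctype = 1 OR f_doctype = 3"
PAGES_SQL = "SELECT SUM(COALESCE(f_pages, 1)) FROM doctaba WHERE f_doctype = 1 OR f_doctype = 3"


def cold_counts(rows: list[tuple[int, int, Optional[int]]]) -> tuple[int, int]:
    """Return (number of text/mixed documents, total pages) for the mock rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE doctaba (f_docnumber INTEGER, f_doctype INTEGER, f_pages INTEGER)")
        conn.executemany("INSERT INTO doctaba VALUES (?, ?, ?)", rows)
        (docs,) = conn.execute(COUNT_SQL).fetchone()
        (pages,) = conn.execute(PAGES_SQL).fetchone()
        return docs, pages or 0  # SUM over no rows is NULL
    finally:
        conn.close()


sample = [
    (1001, 0, 3),     # non-COLD document (type code chosen arbitrarily)
    (1002, 1, 40),    # text/COLD document
    (1003, 1, None),  # text/COLD document without a page count: counted as 1
    (1004, 3, 5),     # mixed document
]
print(cold_counts(sample))  # (3, 46)
```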

Hope this helps,

/Gerold

25. RE: regarding Bulk content move sweep

Venkat S
Posted Fri April 19, 2024 04:02 PM

Thanks, Gerold and Jay, for your prompt responses.

I ran the queries as you advised and found that all the COLD documents are text-based, without any background template. If this is the case, can I use simple Java APIs (e.g. Aspose) to convert the text to PDF and store it in FileNet? Please let me know any thoughts on this.

1 - INX_TEXT_DOC - Text/Cold documents without background image

Thanks,

Venkat

26. RE: regarding Bulk content move sweep

IBM Champion

Gerold Krommer
Posted Sat April 20, 2024 05:29 AM

First, we are really deviating from the topic of the original post.

Second, this has all been explained elsewhere in this forum and even in this topic. COLD is not just text, it is P-Code(!), AND it is most likely compressed (I would hope so, as one can see > 70:1 compression ratios). As I said, you need the tools (ISRA or IDM Desktop) to uncompress it. THEN you can start worrying about laying out your text.

If it were that easy, there wouldn't be the plethora of tools.

Kind regards,

/Gerold

27. RE: regarding Bulk content move sweep

Michael Pressler
Posted 29 days ago

Hello to all,

I am still having performance problems with a BulkMove sweep migrating documents from IS to P8 (about 8,000 documents/hour).

I must admit that I still have gaps in my knowledge about which settings in P8 (or Image Services) I can use to improve performance. At the moment, everything runs with more or less out-of-the-box parameters. Unfortunately, the P8 documentation is not much help here.

For example, it is not entirely clear to me whether MoveContent depends more on the settings in the Replication Subsystem or the CFS Import Agent Subsystem, on both, or on neither.

Via the context help I can see what the individual parameters mean, but there is no information about what effect changing them has, or which settings might increase the throughput of MoveContent.

I am of course aware that many factors play a role here. The FileNet system runs as containers in an AKS cluster in Azure, the IS system on a Windows VM, also in Azure, and the content in P8 is written to Azure Blob Storage. There are many possible reasons why MoveContent is so slow, but I want to make sure that everything is set up optimally in FileNet.

I am grateful for any tips and help.

By the way, is there any chance of connecting the IBM System Dashboard to a containerized FileNet system in an AKS cluster? I tried but failed in the end.

Regards

Michael

28. RE: regarding Bulk content move sweep

IBM Champion

Eric Walk
Posted 29 days ago

There's a whole separate set of parameters for the Sweep Subsystem, plus the settings of the specific job itself, and those will have the greatest impact.

29. RE: regarding Bulk content move sweep

RUTH Hildebrand-Lund
Posted 29 days ago

See if the tuning and performance information in this document helps.


Attachment(s)

Chicago - Sweep Framework.pdf (1.33 MB)

30. RE: regarding Bulk content move sweep

IBM Champion

Gerold Krommer
Posted 28 days ago

Sorry, but what good is it to change parameters if you don't even know where the bottleneck is ('im Nebel herumstochern', i.e. poking around in the fog), or at least whether it is on the reading or the writing side? I do not believe the standard settings are responsible for such abnormal performance. We usually see medium two-digit documents/sec without modifying anything.

Also, the question about single-page TIFFs is still unanswered. 8,000 IS documents could in the worst case be 8,000,000 files (a COLD document can have up to 1,000 pages), and that could explain the performance.

If no other strategy comes to mind, write a small Java program that reads the federated IS documents to disk and see how many docs/sec you get.

Use CEBI to do mass ingestion of documents and see what you get there. THEN we might be able to propose parameters or a strategy.

Kind regards,

/Gerold

31. RE: regarding Bulk content move sweep

Jay Bowen
Posted 26 days ago

Michael,

There are settings in ACCE, as you show, but also at the FCD and site level. How many documents per minute are you processing with the bulk sweep? How many pages are in those document objects? I recently finished a migration from a 10+ year-old physical quad-core server with slower SAN disk to P8 on another 10-year-old server, via federation. I used a bulk sweep, which gave others great numbers, but in my case performance was dismal. I ended up writing a program using content move with different approaches: single-threaded, multi-threaded, bulk batch insert, and multi-threaded batch insert. One thing FileNet doesn't like is too many threads for content move or batch move, regardless of settings. I altered the content move thread tasks and a lot of the obvious settings you shared in your screenshot.

I aligned my content move to the IDs in the MSARs I was moving, so each virtual platter would be loaded once and the doc IDs read from it. A colleague suggested prefetch; in his migrations MSAR prefetch made a huge difference, but for me it ended up being just one more thing to do without yielding a big enough difference to keep investing time in it. Eventually I was moving 1 million document objects per day (not counting pages). That was good enough for me, but it did take time to get there. And no, I did not see a log entry or a pegged resource telling me to go change setting XYZ; I adjusted incrementally based on IBM documentation and suggestions from colleagues.

Now, this may sound like the long way, but the other option is reading the MSARs directly, converting to PDF with optional OCR, and importing. You would need vendor tools for this, but it is hands down the best option if converting the file format is desired. Otherwise you migrate the images into P8 > export > convert to PDF > version.

Settings I can recommend for federated IS to P8:

WebSphere

Increase the ORB thread pool: either allow thread allocation beyond the maximum or change it to 125 min and 250 max.

Increase the JDBC connection pools to 200 max connections for the GCD and object store data sources.

ACCE

Abandoned content cleanup 1036800

Temp file life 1036800

Dispatcher wait 5 seconds

Max content queue worker threads 100

FCD device (in ACCE, at the GCD level)

Timeout seconds 50,000

FCP pool timeout 600

FCP pool preferred size 50

FCP pool max wait seconds 5

IS (Image Services)

Increase the CSM I/O process count: one per cache disk.

Add a new page cache on fast disk.

Change the cache allocation: increase the page cache to 90 or 100%.
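To compare an environment against the WebSphere, ACCE and FCD values listed above, they can be kept as data and diffed. This is a minimal Python sketch: the setting keys are hypothetical labels chosen for illustration, not actual WebSphere, ACCE or IS property names, and reading the current values out of each product is out of scope. The IS items are left out because they depend on the number and speed of your cache disks, and the two cleanup/lifetime values are kept as the raw numbers given, since no units were stated.

```python
# Hypothetical keys; values copied from the recommendations in this post.
RECOMMENDED = {
    "websphere.orb_thread_pool_min": 125,
    "websphere.orb_thread_pool_max": 250,
    "websphere.jdbc_max_connections": 200,
    "acce.abandoned_content_cleanup": 1036800,
    "acce.temp_file_life": 1036800,
    "acce.dispatcher_wait_seconds": 5,
    "acce.max_content_queue_worker_threads": 100,
    "fcd.timeout_seconds": 50000,
    "fcd.fcp_pool_timeout": 600,
    "fcd.fcp_pool_preferred_size": 50,
    "fcd.fcp_pool_max_wait_seconds": 5,
}


def diff_settings(current: dict) -> dict:
    """Return {key: (current, recommended)} for every setting that differs or is missing."""
    return {
        key: (current.get(key), wanted)
        for key, wanted in RECOMMENDED.items()
        if current.get(key) != wanted
    }
```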

API vs. bulk sweep

A bulk sweep or highly multi-threaded API jobs would lock up the content queue worker; when this occurred, we had to recycle the database and FileNet. IBM has a technote on it: there is one table with one row, and once it is stuck you recycle to clear it.
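Since FileNet did not react well to too many content move threads, one way to keep an API-based job from over-subscribing the server is a fixed-size worker pool. This is a minimal Python sketch, assuming `task` is a hypothetical callable that moves one item (a document or a batch); the default cap of 4 is an arbitrary placeholder, and a cap on its own is not guaranteed to avoid the content queue lock described above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_bounded(task: Callable[[T], R], items: Iterable[T], max_threads: int = 4) -> List[R]:
    """Run `task` over `items` with at most `max_threads` calls in flight.

    Results come back in input order; an exception raised by a task is
    re-raised here when its result is reached.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be >= 1")
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(task, items))
```

Raising the cap step by step while watching the server is in the spirit of the incremental tuning described earlier.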

I queued documents by platter, and some documents had 999 pages while others were a single tiny text file. Once I was chugging along, my numbers were all over the place, but I did not have a page-count mechanism to tell me whether I was going faster or slower based on total size or pages. I measured in document objects: when I started it was 250,000 per day, but after some time I got to a million documents per day.
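Because documents ranged from 999 pages down to one tiny text file, documents per day alone is a noisy measure. This is a minimal Python sketch of a tracker that also counts pages and bytes, assuming the page count and content size of each document are available to your job at the time it is moved; `ThroughputTracker` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class ThroughputTracker:
    """Accumulates documents, pages and bytes moved so rates can be compared fairly."""
    docs: int = 0
    pages: int = 0
    bytes_moved: int = 0

    def record(self, pages: int, size_bytes: int) -> None:
        """Count one moved document with its page count and content size."""
        self.docs += 1
        self.pages += pages
        self.bytes_moved += size_bytes

    def rates_per_hour(self, elapsed_seconds: float) -> dict:
        """Documents, pages and megabytes (10^6 bytes) per hour over the elapsed time."""
        if elapsed_seconds <= 0:
            raise ValueError("elapsed_seconds must be positive")
        hours = elapsed_seconds / 3600
        return {
            "docs_per_hour": self.docs / hours,
            "pages_per_hour": self.pages / hours,
            "mb_per_hour": self.bytes_moved / hours / 1_000_000,
        }
```

For reference, 1,000,000 documents per day is roughly 41,667 documents per hour.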

I preferred the API approach, where I could target doc IDs (well, GUIDs) for a platter. Every now and then I would get an FCD exception, so I added a retry loop and eventually targeted just the doc ID that failed out of a batch for later processing or investigation. Turns out a few platters were bad. The only way I would have found this out was by processing by IDs and platters; otherwise you are depending on CE logs. I should point out that I had cases where the entire platter was bad and/or there were missed reads and checksum errors at the page level. Since I was tracking the surface, doc ID, and P8 GUID by object store, I was able to easily group these errors by platter and go to IDM Desktop or the IS admin tools to find the root cause.
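The retry-then-isolate approach can be sketched like this. This is a minimal Python sketch, assuming a hypothetical `move` callable that raises an exception if any document in the batch fails, and assuming that re-running a move for an already-moved document is harmless; the retry count is arbitrary. IDs that still fail on their own are recorded under their platter, which is what makes bad platters stand out.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical move function: moves the given doc IDs and raises if any of them fails.
MoveFn = Callable[[Sequence[str]], None]


def move_with_isolation(
    platter_id: str,
    doc_ids: Sequence[str],
    move: MoveFn,
    failures: Dict[str, List[str]],
    retries: int = 2,
) -> None:
    """Move a batch, retrying the whole batch up to `retries` extra times.

    If the batch still fails, move each ID on its own (one attempt each) and
    record the IDs that fail under their platter for later investigation.
    Assumes re-moving an already-moved document is harmless.
    """
    for _ in range(retries + 1):
        try:
            move(doc_ids)
            return
        except Exception:  # sketch: treat any error (e.g. an FCD exception) as retryable
            pass
    for doc_id in doc_ids:
        try:
            move([doc_id])
        except Exception:
            failures.setdefault(platter_id, []).append(doc_id)
```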

------------------------------
Jay Bowen
www.bowenecmsolutions.com
Medina, OH
------------------------------

32. RE: regarding Bulk content move sweep

Michael Pressler
Posted 17 days ago

A short update.

Changing the number of workers from the default of 2 to 4 more than doubled the throughput, from 8,000 to 17,000 documents/hour.
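Normalizing the two reported measurements per worker shows the gain was slightly better than linear in the worker count:

```latex
\frac{17000}{8000} = 2.125,
\qquad
\frac{8000}{2} = 4000
\;\text{vs.}\;
\frac{17000}{4} = 4250 \ \text{documents/hour per worker}
```

This is simple arithmetic on two data points, so it does not show that adding more workers keeps scaling, especially with the memory limit on the CPE pods.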

At the moment, the limiting factor seems to be the AKS cluster and the memory utilization of the CPE pods.

While the CPU load of the CPE pods is at 30%, we see that during MoveContent the heap space reaches 100% quite quickly and garbage collection (GC) then kicks in. We are still optimizing this.

Because the heap fills up, the GC runs regularly. This leads to CPU peaks, and the MoveContent throughput drops noticeably, because the GC pauses the Java processes while it is running.

The cause of the heap usage could be the Azure Blob storage. By default, FileNet operates Azure Blob in "Direct file upload mode". This means there is no staging folder acting as a temporary cache; if I understand the IBM documentation correctly, an "in-memory buffer" is used instead. I also don't see any entries in the ContentQueue table. It looks to me as if MoveContent is writing the documents directly to Azure Blob, bypassing the content queue.
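If the direct upload mode really does hold content in an in-memory buffer, a rough worst case for heap demand is the total size of the largest documents that can be in flight at once. This is a minimal Python sketch of that estimate; the "one whole document per worker" buffering model is an assumption for illustration, not documented FileNet behavior.

```python
from typing import Iterable


def worst_case_buffered_bytes(workers: int, doc_sizes: Iterable[int]) -> int:
    """Rough upper bound on content bytes held in memory at the same time.

    Hypothetical model: each worker buffers one whole document in memory, and
    the `workers` largest documents happen to be in flight together.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return sum(sorted(doc_sizes, reverse=True)[:workers])
```

Under that assumption, going from 2 to 4 workers doubles the number of documents that can be buffered at once, which would fit the heap filling up faster; the result can be compared against the maximum heap configured for the CPE pods.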

Maybe someone here can describe what happens in the background during MoveContent within FileNet, so we can better understand the procedure.

The red lines show the CPU (left diagram) and memory (right diagram) of the CPE pods. If we stop Content Move, CPU and memory go back to normal levels.

------------------------------
Michael Pressler
------------------------------


IBM Business Automation Community


Content Management and Capture

