Content Management and Capture


FileNet S3 Retrieval Latency

  • 1.  FileNet S3 Retrieval Latency

    Posted Fri December 06, 2024 04:02 PM

    Hello,

    We have an on-prem FileNet Content Manager setup, version 5.5.10 with CN 3.0.13; both CN and CPE run on Linux virtual servers. Content is currently stored on NetApp storage. To leverage AWS S3 storage, we also configured an advanced storage area as an alternative option.

    We ran some tests to see the retrieval performance from both on-prem NetApp and cloud S3 storage. Up to about 4-5 MB, documents load in the Content Navigator viewer in almost the same time, but as documents get bigger, S3 retrieval shows significant latency: about 12-15 seconds for a 15 MB document and 20-25 seconds for a 30 MB document.

    I am wondering whether anybody has tried this in their environment, run into a similar issue, and has any suggestions to try.

    Thanks

    Kiran Aithagani



    ------------------------------
    Kiran Aithagani
    ------------------------------


  • 2.  RE: FileNet S3 Retrieval Latency

    Posted Mon December 09, 2024 09:36 AM

    Hi Kiran,
    Unfortunately I do not have good advice for you; however, I am curious whether you have tried NFS against your on-prem NetApp storage to see if it shows the same behavior?



    ------------------------------
    Matt Passini
    Manager - Software Engineering
    Acuity Insurance
    ------------------------------



  • 3.  RE: FileNet S3 Retrieval Latency

    Posted Thu December 12, 2024 09:19 AM

    Have you tried setting the FileNet.Content.GetBlockSizeKB parameter on the ICN server to a higher value?

    The multiple round trips to S3 might add up and significantly increase the overall latency.

    https://www.ibm.com/docs/en/filenet-p8-platform/5.5.x?topic=engine-improving-content-uploads-downloads



    ------------------------------
    Olaf Erasmus
    ------------------------------



  • 4.  RE: FileNet S3 Retrieval Latency

    Posted Thu December 12, 2024 11:47 AM

    Olaf, thank you for your suggestion. Yes, I experimented with different Max Inline Content Size values on CPE first and tried downloading the file from CPE; irrespective of what Max Inline Content Size is set to, CPE always downloads the document in chunks of 1 MB.

    But when I set FileNet.Content.GetBlockSizeKB on CN and try to view/download documents, I get bigger chunks from S3, matching the FileNet.Content.GetBlockSizeKB value I provided, which is good. However, with bigger chunks from S3, it takes longer to stitch the pieces together and stream them to Content Navigator.

    The numbers below are not exact, but I tried changing FileNet.Content.GetBlockSizeKB over a range from 1 MB to 9 MB (with Max Inline Content Size set to 10 MB on CPE). Overall, one way or the other, the latency adds up to about the same amount, maybe 1-2 seconds less or more; none of the combinations we tried made a significant difference.

    1. With the default 1 MB chunks, it does 30 round trips and takes about 25 seconds to retrieve, plus 5 seconds to stitch them together and stream the doc to CN.
    2. With FileNet.Content.GetBlockSizeKB set to 8 MB, it does 4 round trips and finishes the download in 15 seconds, plus 15 seconds to stitch them and stream the doc to CN.
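
    The arithmetic behind these two observations can be checked with a short sketch. It assumes a 30 MB document (inferred from the 30 round trips at 1 MB) and uses the approximate timings quoted above; the function names are illustrative only.

```python
import math

def round_trips(doc_size_mb: float, block_size_mb: float) -> int:
    """Number of block requests needed to fetch a document of the given size."""
    return math.ceil(doc_size_mb / block_size_mb)

def total_seconds(fetch_seconds: float, stitch_seconds: float) -> float:
    """End-to-end time: fetching the blocks plus stitching/streaming to CN."""
    return fetch_seconds + stitch_seconds

DOC_SIZE_MB = 30  # assumption: inferred from "30 round trips" at 1 MB blocks

# (block size in MB, fetch seconds, stitch/stream seconds), approximate figures from the post
observations = [(1, 25, 5), (8, 15, 15)]

for block_mb, fetch_s, stitch_s in observations:
    trips = round_trips(DOC_SIZE_MB, block_mb)
    print(
        f"{block_mb} MB blocks: {trips} round trips, "
        f"{fetch_s / trips:.2f} s per trip, "
        f"{total_seconds(fetch_s, stitch_s)} s end to end"
    )
```

    With these figures, fewer round trips shorten the fetch phase, but the end-to-end time stays the same because the stitch/stream phase grows by the same amount.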


    ------------------------------
    Kiran Aithagani
    ------------------------------



  • 5.  RE: FileNet S3 Retrieval Latency

    Posted Thu December 12, 2024 04:25 PM

    I'm sorry it did not help. 

    Thank you for the detailed feedback though; it is valuable for anyone considering storing large objects on S3.



    ------------------------------
    Olaf Erasmus
    ------------------------------



  • 6.  RE: FileNet S3 Retrieval Latency

    Posted Fri February 21, 2025 01:19 PM
    Edited by Stephen Weckesser Fri February 21, 2025 01:19 PM

    Cloud storage will always have latency issues. I suggest keeping new docs in a local 'staging area' for a period of time and then moving them to cloud storage later. Docs are more likely to be retrieved when they are new; after a few days (it will vary by use case) the retrieval frequency declines.

    You can use a migration policy sweep and filter on the storage area if your containers are broken down by date. This also works well if long-term storage is write-only (e.g. has a retention policy), and it leaves a window of time to allow deletes during a validation period. A sweep will work, but if you have a lot of documents it may take a few days to complete before it starts over.

    I use a custom app, kicked off by a scheduler, that searches for docs by date in the staging area. Since the separate staging area is small (HA and backed up), it only has to examine a few thousand docs instead of every row in the doc version table. I pass the name of the config file so I can run multiple instances that take different parameters. The doc move method does not physically move the document; it just queues it for the server to move. This approach keeps the DBAs happier because sweeps block, and they no longer get constant alerts in the middle of the night. With this, I control the number of rows and the time each run is allowed.

    If you store directly to the cloud, the system may get backed up without anyone noticing. Files may sit in a temp upload area on the server until the weekend comes along. When nobody is expecting it, the retry kicks in and the system will run hot and can get overwhelmed, resulting in memory constraints one day, followed by high CPU load the next as it tries to run garbage collection. A staging area also allows uploads to continue if the cloud storage goes down (or the vendor changes the certs without notifying you... again, Azure?).
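
    As a rough illustration of the selection step in such a scheduled job, here is a minimal sketch. It is not the custom app described above: `StagedDoc`, `select_for_move`, and the in-memory list are hypothetical stand-ins, and a real job would query the staging storage area through the CPE API and queue the move there.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple

class StagedDoc(NamedTuple):
    """Hypothetical record for a document sitting in the staging area."""
    doc_id: str
    date_created: datetime

def select_for_move(
    docs: List[StagedDoc],
    now: datetime,
    min_age: timedelta,
    max_batch: int,
) -> List[StagedDoc]:
    """Pick the oldest staged docs that are past the retention window.

    max_batch caps how many rows a single run may touch, so each run
    stays short and predictable.
    """
    cutoff = now - min_age
    eligible = [d for d in docs if d.date_created < cutoff]
    eligible.sort(key=lambda d: d.date_created)  # oldest first
    return eligible[:max_batch]

now = datetime(2025, 2, 21, 12, 0)
staged = [
    StagedDoc("doc-a", now - timedelta(days=10)),
    StagedDoc("doc-b", now - timedelta(days=4)),
    StagedDoc("doc-c", now - timedelta(hours=6)),  # still "hot", stays local
]
batch = select_for_move(staged, now, min_age=timedelta(days=3), max_batch=2)
print([d.doc_id for d in batch])
```

    The age threshold and batch size would come from the per-instance config file, which is what lets several instances run with different parameters.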



    ------------------------------
    Stephen Weckesser
    ------------------------------



  • 7.  RE: FileNet S3 Retrieval Latency

    Posted Fri February 21, 2025 03:36 PM

    Hi Stephen, very interesting thoughts; thank you for sharing the ideas. I am following up with IBM through a case: we were anticipating some latency with S3 storage, but it is beyond what is acceptable, and we would like to know where the bottleneck is and look into fine-tuning options to make it better. We do have some thoughts on directing content to different storage areas based on size, using a sweep or some sort of subscription, etc.

    Thanks

    Kiran Aithagani



    ------------------------------
    Kiran Aithagani
    ------------------------------



  • 8.  RE: FileNet S3 Retrieval Latency

    Posted Mon February 24, 2025 09:46 AM
    Edited by Chuck Hauble Mon February 24, 2025 09:49 AM

    Kiran,

    In the v5.6 API there is a new API class that will allow you to do a multithreaded download from the CPE  - com.filenet.apiimpl.util.mtstream.MtInputStream

    The download by default is single threaded.

    Maybe it could help with the problem. But I see that you are using ICN instead of a custom application so it may require an enhancement request to the ICN team.
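
    For readers unfamiliar with the idea, a multithreaded download generally means fetching byte ranges in parallel and reassembling them in order. The sketch below shows only that general technique; it does not use or reflect the MtInputStream API, and `fetch_range` is a hypothetical stand-in for whatever call returns the bytes of one range.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

def split_ranges(total_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into half-open (start, end) byte ranges."""
    return [
        (start, min(start + part_size, total_size))
        for start in range(0, total_size, part_size)
    ]

def parallel_download(
    fetch_range: Callable[[int, int], bytes],
    total_size: int,
    part_size: int,
    workers: int = 4,
) -> bytes:
    """Fetch all ranges concurrently and join them in their original order."""
    ranges = split_ranges(total_size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so no re-sorting is needed.
        parts = pool.map(lambda r: fetch_range(r[0], r[1]), ranges)
        return b"".join(parts)

# Stand-in content source: slices of an in-memory buffer instead of a real store.
content = bytes(range(256)) * 1000
result = parallel_download(lambda s, e: content[s:e], len(content), part_size=10_000)
print(len(result), result == content)
```

    Whether this helps in practice depends on where the time is spent; it only pays off when the per-range fetch, not the reassembly or the client, is the bottleneck.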



    ------------------------------
    Chuck Hauble
    Minneapolis
    ------------------------------



  • 9.  RE: FileNet S3 Retrieval Latency

    Posted Mon February 24, 2025 10:09 AM
    Edited by Kiran Aithagani Mon February 24, 2025 10:21 AM

    I saw that in one of the readmes, but I think it comes into play when you use the API. For S3 it is a set configuration, and I am not sure we can control whether a single-threaded or multi-threaded approach is used, unless the 5.6 S3 configuration is different from 5.5.10 CPE and has that option. You are right: if we wanted the content streamed faster between CN and CPE, it may need customization on the CN end.
    ------------------------------
    Kiran Aithagani
    ------------------------------



  • 10.  RE: FileNet S3 Retrieval Latency

    Posted Mon February 24, 2025 01:02 PM

    I am not familiar with the new API so I cannot comment on that. 

    Since you mentioned sweeps, I am assuming uploads and not downloads. Each object storage will have different tuning options. If a customer is doing mortgage docs, the bulk of retrievals will happen in the first 30 days; for another customer it might be 3 days. After that window, the docs are less likely to be retrieved, so you might choose cool storage to save money and after a year change it to cold storage. Cloud storage is cost efficient for archive storage. It may also be the only option if the organization storage policy does not allow docs with PII even if encrypted.

    Cloud storage also offers simplified redundancy.  You can fail over VM's or containers from one DC to another easily if the storage is loosely coupled and replicated. However, if you don't have cloud redundancy/failover, then you need some sort of backing store to be able to retry uploads when they fail. I saw two storage outages last year; when Microsoft updated their root cert and again when they added a new gateway. It only impacted retrieval of older docs but you never want to see any outage from an operations perspective. 

    Latency is environment specific.  The situation where it was running full tilt on weekends was low volume, but multi-tenant, with multiple customer VM's per host and the storage was in another cloud vendor's data center. I never understood the reason behind that but I assume it was either cost or naively specced. I suggest that if you're running in the cloud on AWS, use S3, if you're running in Azure, use Azure if possible.  You can always run a trace route for comparison. If you're on-premises you really shouldn't have any problems with either unless your pipe is undersized or the connection is poor.

    If you're communicating across continents the latency could be elsewhere.  I have twice seen customers outsource validation to India or South America but refuse to replicate LDAP, so the authorization occurred on-premises over a slow VPN.  Even if the storage was replicated across data centers, the Dashboard metrics showed the latency and timeouts were occurring because DNS lookups and LDAP authorizations were taking at least 2 minutes and could intermittently time out under load.

    Since you mentioned sweeps, if the total number of rows in the docVersion table isn't too large, sweeps will work and they run very fast. Even if the sweep is 23x faster, a sweep cannot target a specific storage area. It would be nice if the content migration sweep had an option to specify a source storage area (hint). Instead, it looks at all rows and processes them by object_id and applies a filter, e.g.,

    StorageArea=Object('{F0A47194-0000-CF1D-8122-41E6201DF3D4}')
    and VersionStatus=1
    and DateCreated < NOW() - TimeSpan(1, 'Days')
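
    If you keep several such filters around (one per storage area or age window), a small helper can generate them. This is only a string-formatting sketch: it reproduces the filter text shown above as-is, and `migration_filter` is a hypothetical name, not part of any FileNet tooling.

```python
def migration_filter(storage_area_id: str, min_age_days: int) -> str:
    """Build the sweep filter text for one storage area and minimum age.

    storage_area_id is the GUID of the source storage area, including braces.
    """
    return (
        f"StorageArea=Object('{storage_area_id}')\n"
        "and VersionStatus=1\n"
        f"and DateCreated < NOW() - TimeSpan({min_age_days}, 'Days')"
    )

print(migration_filter("{F0A47194-0000-CF1D-8122-41E6201DF3D4}", 1))
```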

    I would create/run a sample sweep to get an idea of the time and then make your own decision. My alternate approach worked for me because the sweep takes 3 days to complete and I wanted to move the docs to Azure sooner than that. With a query, I could target just the staging storage area (where date_created > X and storage_area_id = Y).  Be sure to validate that your composite index gets chosen by the optimizer. That sweep runs in under an hour and the docs are moved the next day.

    It's really impossible to offer specific advice without knowing the specifics so the above is fairly broad but hopefully gives you some ideas. 



    ------------------------------
    Stephen Weckesser
    ------------------------------



  • 11.  RE: FileNet S3 Retrieval Latency

    Posted Wed March 19, 2025 10:38 AM

    Hi Kiran, can you run a retrieval test using a utility app against CPE (Java or .NET) and compare performance between your NetApp and S3? From the post, ICN is where you are seeing the latency, but if we split this into CPE and ICN we can see how much longer S3 takes than NetApp to fetch content. Obviously your on-prem NetApp should perform far better than S3, but I am curious about your file sizes and retrieval timings.

    I don't know how CPE fetches content, but typically multithreaded retrieval means more workers for content requests, where each content request is fetched by one thread rather than split between many threads. For example, a typical programming approach would use one thread to read a single 50 MB file and stream it back to the client, not five threads x 10 MB with byte offsets, which then need to be restacked on the client.  This is for retrieval, not rendering.
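
    The NetApp-versus-S3 timing comparison suggested here could be sketched roughly as follows. This is a generic harness, not FileNet code: `open_stream` is a hypothetical callable that returns a readable binary stream for one document (in a real test it would wrap the content stream obtained from the CPE API for a document in each storage area).

```python
import io
import time
from typing import BinaryIO, Callable, Dict

def time_download(
    open_stream: Callable[[], BinaryIO],
    block_size: int = 1024 * 1024,
) -> Dict[str, float]:
    """Read a stream to the end in fixed-size blocks and record timings."""
    start = time.perf_counter()
    first_byte_s = None
    total_bytes = 0
    reads = 0
    with open_stream() as stream:
        while True:
            chunk = stream.read(block_size)
            if not chunk:
                break
            if first_byte_s is None:
                # Time until the first block arrived (includes open/connect).
                first_byte_s = time.perf_counter() - start
            total_bytes += len(chunk)
            reads += 1
    return {
        "bytes": total_bytes,
        "reads": reads,
        "first_byte_s": first_byte_s,
        "total_s": time.perf_counter() - start,
    }

# Stand-in source: an in-memory buffer of 3 MB + 1 byte instead of a real document.
payload = b"x" * (3 * 1024 * 1024 + 1)
stats = time_download(lambda: io.BytesIO(payload))
print(stats)
```

    Running the same harness against the same document in each storage area, and then comparing with the ICN download and viewer timings, separates storage latency from anything added downstream.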



    ------------------------------
    Jay Bowen
    www.bowenecmsolutions.com
    Medina, OH
    ------------------------------



  • 12.  RE: FileNet S3 Retrieval Latency

    Posted Wed March 19, 2025 12:01 PM

    Hi Jay, yes, we did that: retrieval on CPE/ACCE and also using the Java API. In both cases the download takes about 8-9 seconds; when the same document is retrieved from CN and displayed in the ViewONE viewer, the time doubles and sometimes triples.  The download from CN without opening the document takes about the same, around 8-9 seconds, sometimes a second more.

    Based on the tests, it is evident that the viewer contributes the majority of the latency, for example breaking the document down into image tiles and rendering them in the browser, but we don't yet have a solution for reducing that. I asked IBM on my open case whether there are plans to roll out the multi-threaded download feature to Content Navigator in the near future, but the answer I got was no.

     



    ------------------------------
    Kiran Aithagani
    ------------------------------



  • 13.  RE: FileNet S3 Retrieval Latency

    Posted Wed March 19, 2025 12:27 PM

    Trace the db calls and check for an index on the security id.  The viewer opens a new session (reuses) and that will do a security lookup. 



    ------------------------------
    Stephen Weckesser
    ------------------------------



  • 14.  RE: FileNet S3 Retrieval Latency

    Posted Wed March 19, 2025 12:56 PM

    Stephen, I am not able to get the full picture of what you want me to try here. Can you please explain in a bit more detail?



    ------------------------------
    Kiran Aithagani
    ------------------------------



  • 15.  RE: FileNet S3 Retrieval Latency

    Posted Wed March 19, 2025 04:51 PM

    The Daeja viewer calls back to the server using remote JavaScript.  The callback hijacks the session and does a separate lookup for security and annotations before it returns. You can use Fiddler or a browser trace to time the JavaScript calls. I am sure support has asked for a .har file?

    There are things that can slow it down: Daeja debug tracing, excessive logging, SSL handshakes, a load balancer round-robining the callback, browser security checks, or a slow db query to validate the authorization.

    On the JVM, add the custom property com.ibm.cacheLocalHost=true to save a DNS lookup. Make sure the repo connection uses either http or iiop (or rir) to match your load balancer if there is one between the ICN and CPE. Make sure the HTTPOnly flag is set correctly (it may be overwritten by an F5 device). Try bundling the signer with the cert on the HTTPS server to make SSL validation go faster. The cookie contains the original server, so if a call lands on another node, the HTTPS plugin will redirect to the original node. Use client-based persistence on the local network if there's an F5 or NLB in front of the HTTPS servers. If you're using WebSphere Edge, turn off hardware checksums on the NIC. On the server itself, increase the number of ephemeral ports and lower the time_wait to 30 seconds.

    Once the request arrives, access has to be authorized.  The security_id in the doc version table has to be looked up and the session identity compared to the access control list. There is no index out of the box, but one is usually needed as the number of docs grows.  If the access lookup is slow, that might explain the additional delay from the time the doc is downloaded until it is displayed. For Oracle, you'll probably need a composite index, so you'll need the trace to figure out what columns are needed for the plan to get chosen.

    Enable db tracing in perflog.properties and trace it during a quiet period. Turn off Daeja tracing and repeat. The db trace will show you the db call times. The DBA can create and validate the index, then create it in ACCE so it's documented. You can collect an Ethereal or Wireshark trace at the same time. Sometimes time differences or packet acks can cause delays as well, so ask the network folks to review.

    Because of the way the callbacks work with the browser, be sure to set the security policy on the CE to enable the right exceptions. That happens a second time to fetch and send annotations (any annotations?). Sometimes things like McAfee can mess with the JavaScript as well.

    There are streaming settings (mostly for multipage TIFFs and PDFs) in ICN. Try adjusting those or the ICN viewer associations to make sure the correct handler for the mime-type is near the top of the list. Finally, configure and download the PKI metrics using Dashboard. Look for things like long delays fetching credentials. Tile rendering performance would be something I haven't seen before, so it is probably the least likely cause. Something else in the mix is probably the root cause. Good luck.
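
    To see where the time goes in a .har file from the browser, a short script can rank the requests by duration. This is a minimal sketch that relies only on the standard HAR layout (`log.entries[]` with `time`, `request.url`, and `timings`); the sample URLs and numbers are made up for illustration.

```python
def _ms(value) -> float:
    """HAR uses -1 (or omits the field) when a timing phase does not apply."""
    return float(value) if isinstance(value, (int, float)) and value > 0 else 0.0

def slowest_requests(har: dict, top: int = 5) -> list:
    """Return the `top` slowest entries with their main timing phases in ms."""
    rows = []
    for entry in har["log"]["entries"]:
        timings = entry.get("timings", {})
        rows.append({
            "url": entry["request"]["url"],
            "total_ms": _ms(entry.get("time")),
            "dns_ms": _ms(timings.get("dns")),
            "connect_ms": _ms(timings.get("connect")),
            "ssl_ms": _ms(timings.get("ssl")),
            "wait_ms": _ms(timings.get("wait")),      # server think time
            "receive_ms": _ms(timings.get("receive")),
        })
    return sorted(rows, key=lambda r: r["total_ms"], reverse=True)[:top]

# Made-up sample; a real file would be loaded with json.load(open(path)).
sample_har = {"log": {"entries": [
    {"time": 120.0, "request": {"url": "https://icn.example.com/page"},
     "timings": {"dns": -1, "connect": -1, "ssl": -1, "wait": 100.0, "receive": 20.0}},
    {"time": 9500.0, "request": {"url": "https://icn.example.com/viewer-callback"},
     "timings": {"dns": 40.0, "connect": 60.0, "ssl": 50.0, "wait": 9000.0, "receive": 350.0}},
]}}

for row in slowest_requests(sample_har):
    print(f"{row['total_ms']:>8.0f} ms  wait={row['wait_ms']:.0f}  {row['url']}")
```

    A large `wait_ms` points at server-side work such as the authorization lookup, while large `dns_ms`, `connect_ms`, or `ssl_ms` values point at the network and handshake items listed above.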



    ------------------------------
    Stephen Weckesser
    ------------------------------



  • 16.  RE: FileNet S3 Retrieval Latency

    Posted Wed March 19, 2025 05:35 PM
    Edited by Stephen Weckesser Wed March 19, 2025 05:57 PM

    I forget - there's an encryption key set when you configure ICN.  If possible, use a local path and not a UNC.
    Copy the config to each node. 

    Check the daeja tmp directory and clear it while there. 

    ICN takes custom properties that map the hostname (think regex search and replace) in an HA configuration.
    In an HA config, you will map the cluster alias to the local machine (or localhost) so that the call is not redirected
    back through the load balancer when the security lookup is done. If that doesn't happen, the call will either fail
    or bounce around. What you want to do is configure your externalhost urls and map them to internalhost (localhost)
    URL's. The steps are described here:

    https://www.ibm.com/support/pages/daeja-viewone-virtual-can-fail-load-balanced-ha-and-sso-environments

    On Windows machines, you also want to configure TCP to prefer IPv4 and make sure localhost maps to 127.0.0.1 
    and not ::1 - The https server can get confused otherwise. 



    ------------------------------
    Stephen Weckesser
    ------------------------------



  • 17.  RE: FileNet S3 Retrieval Latency

    Posted Wed March 19, 2025 09:49 PM

    Hi Stephen, we have shared HAR files with IBM a few times, and most of the configurations you mentioned are in place, if not all. I will go through the details one more time to see whether we are missing anything you mentioned. Thank you so much; I really appreciate your feedback. I am sure it takes a lot of time and patience to put all of this together.



    ------------------------------
    Kiran Aithagani
    ------------------------------



  • 18.  RE: FileNet S3 Retrieval Latency

    Posted Thu March 20, 2025 10:45 AM

    Hi Kiran, 

    Initially I thought this thread was about latency due to storage type, but as you point out, you see the delay mostly in ICN.  The same 30 MB file takes N times longer to render when viewed in ICN? I am curious: if you download from ICN, how does that compare to the render time? Is this a load-balanced or HA ICN instance? If so, the viewer is what IBM calls a mini web app, and you would need entries in the ICN instance that resolve the VIP to localhost so that the ICN viewer does not reach back out to the network. Notes: Daeja ViewONE Virtual can fail in load balanced HA and SSO environments



    ------------------------------
    Jay Bowen
    www.bowenecmsolutions.com
    Medina, OH
    ------------------------------



  • 19.  RE: FileNet S3 Retrieval Latency

    Posted Thu March 20, 2025 10:55 AM

    It's a twofold issue: the latency between CPE and AWS S3, and then the CN latency to get the document pulled from CPE and rendered. The download from CN is fairly close to a direct CPE download. The tests we are running are mainly on a single-node CN/CPE; it's still clustered, but with a single node, and no load balancer is in place.



    ------------------------------
    Kiran Aithagani
    ------------------------------