Original Message:
Sent: 9/10/2024 2:33:00 AM
From: Vivek Mittal
Subject: RE: SFTP Client Get/List service with a directory that has thousands of files
A couple of minor changes to your BP that will lead to slightly improved performance:
1) Don't run three Release operations one after the other. Combine them into a single Release, separating the target paths with |. That will also slightly improve efficiency.
2) Don't use // in your XPath, as that requires traversing all nodes (which, on a large process data, can take time and resources). Use an absolute path if known.
Try using
<operation name="Release Service">
<participant name="ReleaseService"/>
<output message="ReleaseServiceTypeInputMessage">
<assign to="." from="*"></assign>
<assign to="TARGET">/ProcessData/PrimaryDocument | /ProcessData/SFTPClientListServiceResults/Files/File[1] | /ProcessData/GetResults/DocumentList/DocumentId[1]</assign>
</output>
<input message="inmsg">
<assign to="." from="*"></assign>
</input>
</operation>
Edit:
I'm not sure why you are only releasing a subset of GetResults. I would have thought just releasing /ProcessData/GetResults would provide the same outcome.
------------------------------
Vivek Mittal
------------------------------
Original Message:
Sent: Mon September 09, 2024 01:54 PM
From: Attila Toke
Subject: SFTP Client Get/List service with a directory that has thousands of files
I attached the BP
------------------------------
Attila Toke
Original Message:
Sent: Mon September 09, 2024 01:35 PM
From: Mark Murnighan
Subject: SFTP Client Get/List service with a directory that has thousands of files
Attila,
Without your BP, I can only guess at the logic to add.
In its simplest form, here is a sample Release Service to use as a template:
<operation name="Release Service">
<participant name="ReleaseService"/>
<output message="ReleaseServiceTypeInputMessage">
<assign to="TARGET">/ProcessData/DELIVER/FILE[1]</assign>
<assign to="." from="*"></assign>
</output>
<input message="inmsg">
<assign to="." from="*"></assign>
</input>
</operation>
The key line is "/ProcessData/DELIVER/FILE[1]", meaning this will delete the first instance of /ProcessData/DELIVER/FILE from process data. In your case you will want to release entries 501 through the last entry of the list. You can count the list to get the total and replace "last" with that count. If you can't do the big drop [501-last] in one shot, you can always loop on the count, decrementing down to 500, and then exit.
Not in a place to create a sample at the moment.
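Roughly, though, the shape would be something like this (untested and off the top of my head; the list path is a placeholder for whatever element your List service actually returns, and the position() filter is just one way to express 501-to-last):
<operation name="Release all but first 500">
  <participant name="ReleaseService"/>
  <output message="ReleaseServiceTypeInputMessage">
    <assign to="." from="*"></assign>
    <!-- placeholder path: point TARGET at the File elements your List service produces -->
    <assign to="TARGET">/ProcessData/SFTPClientListServiceResults/Files/File[position() &gt; 500]</assign>
  </output>
  <input message="inmsg">
    <assign to="." from="*"></assign>
  </input>
</operation>
If the position() filter gives you trouble, the count-and-loop approach above achieves the same result.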
Mark
------------------------------
Mark Murnighan
Solution Architect
Original Message:
Sent: Mon September 09, 2024 01:00 PM
From: Attila Toke
Subject: SFTP Client Get/List service with a directory that has thousands of files
Hi Mark,
Can you share how we can code that to release files 501-9000 on the 1st release? If I am understanding you, that would drastically shrink the process data after that first release and do what we are looking for.
Thanks,
Attila
------------------------------
Attila Toke
Original Message:
Sent: Mon September 09, 2024 12:03 PM
From: Mark Murnighan
Subject: SFTP Client Get/List service with a directory that has thousands of files
Attila,
Does your BP clean/shrink the list on each iteration? Keeps chopping the 1M bytes for the 9K list down to size.
If you want what you asked for in the enhancement request, add a release after you get the list to chop it down to the first 500, i.e. release [501-9000] so your list holds only the first 500 entries. It will not be as fast as the enhancement in Java, but after that one cleanup it will be as fast as getting the enhancement.
Mark
------------------------------
Mark Murnighan
Solution Architect
Original Message:
Sent: Mon September 09, 2024 11:49 AM
From: Attila Toke
Subject: SFTP Client Get/List service with a directory that has thousands of files
Hello,
I put 9K files in the directory and ran a test with BP persistence set to Start/Stop Only; it took 2 hours to complete. The same test with normal persistence took 3 hours, so Start/Stop is about 1/3 quicker, which helps, but still not great.
Thanks,
Attila
------------------------------
Attila Toke
Original Message:
Sent: Mon September 09, 2024 10:10 AM
From: Mark Murnighan
Subject: SFTP Client Get/List service with a directory that has thousands of files
Hi Attila,
Vivek's #1 is a huge first step. Also make sure to delete the file just processed from the list by releasing it before the next loop iteration; the BP will run faster as you go.
I have never had to use a child process for the get/delete after you clean up the BP and process data.
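Roughly (untested; the rule, sequence, and element names below are placeholders for whatever your BP actually uses), the tail of each loop iteration could look like:
<!-- assumed rule, defined near the top of the BP: keep looping while any File entries remain -->
<rule name="HasMoreFiles">
  <condition>count(SFTPClientListServiceResults/Files/File) &gt; 0</condition>
</rule>
...
<sequence name="FileLoop">
  <!-- Client Get and Client Delete of File[1] happen here -->
  <operation name="Release processed file">
    <participant name="ReleaseService"/>
    <output message="ReleaseServiceTypeInputMessage">
      <assign to="." from="*"></assign>
      <!-- drop the entry just processed so process data shrinks every pass -->
      <assign to="TARGET">/ProcessData/SFTPClientListServiceResults/Files/File[1]</assign>
    </output>
    <input message="inmsg">
      <assign to="." from="*"></assign>
    </input>
  </operation>
  <choice name="MoreFilesCheck">
    <select>
      <case ref="HasMoreFiles" activity="RepeatLoop"/>
    </select>
    <sequence name="RepeatLoop">
      <repeat name="GoAgain" ref="FileLoop"/>
    </sequence>
  </choice>
</sequence>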
Mark
------------------------------
Mark Murnighan
Solution Architect
Original Message:
Sent: Fri September 06, 2024 01:03 AM
From: Vivek Mittal
Subject: SFTP Client Get/List service with a directory that has thousands of files
Hi Attila,
We've come across this situation a couple of times and there are two main ways we've tackled this without resorting to customisations.
1) Look at the persistence level - can it be set to None or Start/Stop with errors? The difference between a persistence level of Full and Start/Stop is very significant. Thousands of files makes process data very big, so removing the database overhead will help greatly.
2) Parallel processing - can you separate the List and the Get/Delete into separate BPs? The first BP contains the LIST and the loop. For each file that you want to download, it triggers a second BP in async mode that does the GET/DELETE. In this way you move from a sequential model to a parallel model and can significantly reduce the overall time to download all files. Sterling's queuing pattern means B2Bi should be able to handle it - just that your queue depth could become quite large and you may see a bit of load balancing going on (if clustered). If other processes are negatively impacted, look at allocating the second BP to a different queue (say, Q6) and constraining the number of threads allocated to Q6. A bit of performance tuning is required here; a rough sketch of the async invoke is below.
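Something like this, assuming the standard Invoke Business Process Service (untested; the child BP name is a placeholder, and the configured participant and parameter names are worth checking against your version's service documentation):
<operation name="Invoke Get/Delete child BP">
  <participant name="InvokeBusinessProcessService"/>
  <output message="InvokeBPInputMessage">
    <!-- placeholder child BP that does the GET/DELETE for one file -->
    <assign to="WFD_NAME">SFTP_GetDelete_Child</assign>
    <!-- fire and forget; the parent loop moves straight on to the next file -->
    <assign to="INVOKE_MODE">ASYNC</assign>
    <assign to="." from="*"></assign>
  </output>
  <input message="inmsg">
    <assign to="." from="*"></assign>
  </input>
</operation>
To keep each child light, consider passing only the one file name into the child's process data rather than copying everything with assign to="." from="*".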
Would love to hear how others have handled such situations.
Regards,
------------------------------
Vivek Mittal
Original Message:
Sent: Thu September 05, 2024 12:00 PM
From: Attila Toke
Subject: SFTP Client Get/List service with a directory that has thousands of files
Hello,
We are facing an issue with SFTP when we go to a customer's site and the directory has thousands of files to pull. Currently we do a CD Service, then a Client List Service to get a list of all the files in the directory, then call the Release Service to make the list smaller, then a Client Get and Client Delete. This repeats over and over until the files are all gone.
The issue is that the Client List Service output is very large when there are thousands of files in the directory, and that causes the overall process to be slow/unusable.
Has anyone encountered this? Has anyone built anything they can share to get around this? I voted on an enhancement to make the list service more configurable and limit how many files it actually returns vs. doing them all, but it will probably be a year before that makes it into the product.
Thanks,
Attila
------------------------------
Attila Toke
------------------------------