Hi Sam,
A 1GB flat file is indeed a large file, and special care should be taken when handling it, considering that the IS JVM cannot have more than ~2.5GB allocated to it.
Below is how I processed a file of this size (> 1GB, containing > 500,000 records):
1> To avoid any overhead, used only IS (no Broker, no TN, no Modeler, etc.).
2> Wrote a fileSplitter Java service. This service reads the input file as a stream, validates each record, and writes the records out to a set of temp files of 20,000 records each (the chunk size is configurable and supplied as an input to the service); its output is the list of temp files created (see the first sketch after this list).
3> The main flow then processes these temp files one at a time in a loop: it appends the results to the desired output file on each iteration and deletes each temp file after processing it (see the second sketch after this list).
4> Found that a split size of 20,000 records was optimal (total processing time < 2 hrs.); setting it higher or lower increased the total processing time.
5> The solution is scalable: if the input file grows in the future, the splitter will simply create more temp files, but IS will still handle only a small chunk of data (20,000 records) at a time, so it will not go out of memory.
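For illustration, here is a minimal sketch of what such a splitter could look like, written as plain Java rather than a full IS Java service (a real IS service would pull its inputs from the pipeline with IDataUtil). The names FileSplitter, splitFile, and chunkSize are mine, not from the actual service; it assumes newline-delimited records, and the per-record validation is left as a placeholder:

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class FileSplitter {

    // Reads the large input file as a stream and writes it out in
    // chunks of chunkSize records, returning the list of temp files.
    // Only one record is held in memory at a time.
    public static List<File> splitFile(File input, int chunkSize) throws IOException {
        List<File> tempFiles = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(input))) {
            BufferedWriter writer = null;
            String record;
            int count = 0;
            while ((record = reader.readLine()) != null) {
                // per-record validation would go here
                if (count % chunkSize == 0) {
                    // start a new temp file every chunkSize records
                    if (writer != null) writer.close();
                    File current = File.createTempFile("chunk_", ".tmp");
                    tempFiles.add(current);
                    writer = new BufferedWriter(new FileWriter(current));
                }
                writer.write(record);
                writer.newLine();
                count++;
            }
            if (writer != null) writer.close();
        }
        return tempFiles;
    }
}
```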
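And a rough Java equivalent of the flow loop in step 3, again only a sketch: process one temp file at a time, append the result to the output file, and delete each chunk as soon as it is done. The transform method is a hypothetical stand-in for whatever mapping the real flow performs:

```java
import java.io.*;
import java.nio.file.*;
import java.util.List;

public class ChunkProcessor {

    // Processes the temp files one at a time, appending the results to
    // the output file, so only ~chunkSize records are in flight at once.
    public static void processAll(List<File> tempFiles, File output) throws IOException {
        for (File chunk : tempFiles) {
            try (BufferedReader in = new BufferedReader(new FileReader(chunk));
                 BufferedWriter out = Files.newBufferedWriter(output.toPath(),
                         StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                String record;
                while ((record = in.readLine()) != null) {
                    out.write(transform(record));
                    out.newLine();
                }
            }
            chunk.delete();  // free disk space as soon as the chunk is processed
        }
    }

    // Hypothetical per-record transformation; in the real integration
    // this mapping was done in the flow service.
    private static String transform(String record) {
        return record;
    }
}
```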
Your integration may be quite different from mine, but I wanted to give you some pointers to ponder while handling a large file (> 1GB).
HTH,
Bhawesh.