webMethods


Large Flat File Processing.

  • 1.  Large Flat File Processing.

    Posted Mon July 24, 2006 08:25 PM

    Hi,

    I am receiving a 1 GB flat file from which I need to process the header, detail, and trailer records. I need to create multiple XML files depending on the header information.

    I am publishing the header, detail, and trailer records to the Broker. I created a trigger that processes my records in single-threaded mode, but I would like to use multiple threads for the detail records.

    1. How can I tell whether all my detail records have been processed by the multi-threaded trigger?
    2. The convertToValues service creates @fields. I could not publish @fields to my Broker. How can I drop them from the convertToValues output?

    Thank you very much for your help.

    Thanks
    Sam.


    #webMethods
    #Integration-Server-and-ESB
    #Adapters-and-E-Standards


  • 2.  RE: Large Flat File Processing.

    Posted Tue July 25, 2006 05:24 PM

    Hi Sam,
    A 1 GB flat file is indeed large, and special care should be taken when handling a file of this type, given that the IS JVM cannot be allocated more than ~2.5 GB.

    Below is how I processed a file of this size (> 1 GB, containing more than 500,000 records):

    1> To avoid any overhead, I used only IS (no Broker, no TN, no Modeller, etc.).

    2> Wrote a fileSplitter Java service. This service reads the input file as a stream, goes through each record to do some validations, creates a bunch of temp files of 20,000 records each (configurable, supplied as an input to the service), and outputs the list of temp files thus created.

    3> The main flow then processes these files one at a time, creates the desired output file, and deletes each temp file after processing, all in a loop. The data is appended to the output file on each pass through the loop.

    4> I found that a split size of 20,000 records was optimal (total processing time < 2 hrs.). Setting it higher or lower increased the total processing time.

    5> The solution is scalable: if the input file grows in the future, the splitter will simply create more temp files, but IS will still handle only a small chunk of data (20,000 records) at a time and so will not run out of memory.
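    The splitter described in step 2> can be sketched in stand-alone Java roughly as follows. This is a minimal illustration of the idea (stream the input, start a new temp file every N records), not the actual service; the class and method names are invented for the sketch.

    ```java
    import java.io.*;
    import java.nio.file.*;
    import java.util.*;

    // Minimal stand-alone sketch of the splitter idea: stream the input line by
    // line and start a new temp file every `chunkSize` records. Class and method
    // names are made up for illustration; this is not the actual fileSplitter.
    public class FileSplitter {

        public static List<Path> split(BufferedReader in, int chunkSize, Path dir) throws IOException {
            List<Path> chunks = new ArrayList<>();
            BufferedWriter out = null;
            String line;
            int count = 0;
            while ((line = in.readLine()) != null) {
                if (count % chunkSize == 0) {        // chunk boundary: open a new temp file
                    if (out != null) out.close();
                    Path chunk = dir.resolve("chunk_" + chunks.size() + ".tmp");
                    chunks.add(chunk);
                    out = Files.newBufferedWriter(chunk);
                }
                out.write(line);                     // per-record validation would go here
                out.newLine();
                count++;
            }
            if (out != null) out.close();
            return chunks;                           // list of temp files for the main flow to process
        }
    }
    ```

    Inside IS this would be wrapped as a Java service that takes the file path and chunk size as pipeline inputs and returns the list of temp-file names for the main flow's loop.
    
    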

    Your integration could be totally different from mine, but I wanted to give you some pointers to ponder while handling a large file (> 1 GB).

    HTH,
    Bhawesh.


    #Adapters-and-E-Standards
    #webMethods
    #Integration-Server-and-ESB


  • 3.  RE: Large Flat File Processing.

    Posted Tue July 25, 2006 05:43 PM

    Nice description Bhawesh.

    Questions for Sam:

    Is there a reason for publishing the records to the Broker?

    Does the order of the records need to be maintained?

    Do the records need to be processed as a group?

    Do the header and trailer have meaningful information or are they just control records to verify that you’ve received the entire file?

    You can publish @fields (attributes) to the Broker. You just can’t read them with anything on the subscriber side other than IS.

    What is the target of this 1G file? Another file in a different format? Inserts into a database table?

    The answers will help guide an appropriate solution.


    #Integration-Server-and-ESB
    #webMethods
    #Adapters-and-E-Standards


  • 4.  RE: Large Flat File Processing.

    Posted Tue July 25, 2006 08:27 PM

    Thank you very much for your help.

    1. Yes, I need to maintain the order at the line-number level, like below.

    Header
    Line 1
    Detail1.1
    Detail1.2
    Line 2
    End of record.

    2. Yes, I need to process the whole file.

    3. The header has meaningful information, but the trailer does not.

    4. I am creating an electronic catalog file.

    In the future I will need to support multiple versions of the XML document from a single flat file.


    #webMethods
    #Adapters-and-E-Standards
    #Integration-Server-and-ESB


  • 5.  RE: Large Flat File Processing.

    Posted Tue July 25, 2006 09:05 PM
    1. Must the data for item 2 be processed after item 1? Of course the lines that make up an item (line1, detail1.1, detail1.2) need to stay together, but can the items be processed independently and in any order?

    2. The question wasn’t whether or not you need to process the entire file. But rather, whether or not all the items in the file must be processed as a single unit–in other words, if one item fails for some reason, can you continue with the remaining items or do you need to stop and rollback all work up to that point?

    3. A flat file? An XML file? An “electronic catalog file” is not descriptive enough.


    #Integration-Server-and-ESB
    #webMethods
    #Adapters-and-E-Standards


  • 6.  RE: Large Flat File Processing.

    Posted Tue July 25, 2006 09:35 PM
    1. Each line must be processed as a unit.
      Line1
      Detail 1.1
      Line2
      Detail 2.1

    2. Yes, I need to rollback if I have any exception during my processing.

    3. It is an XML file.


    #Integration-Server-and-ESB
    #webMethods
    #Adapters-and-E-Standards


  • 7.  RE: Large Flat File Processing.

    Posted Tue July 25, 2006 09:51 PM

    In this case, the process outlined by Bhawesh should work fine for you too. You do not need to publish items to the Broker, so do not use it as part of your solution. Be careful with how you write your file splitter service: do as little processing in it as possible.


    #Adapters-and-E-Standards
    #webMethods
    #Integration-Server-and-ESB


  • 8.  RE: Large Flat File Processing.

    Posted Thu July 27, 2006 10:53 PM

    Hi Sam,
    I can send you the fileSplitter Java service. As Rob mentioned, I have tuned this service for optimal performance, because this is the service that takes the hit of reading the large data.

    Bhawesh.

    #Adapters-and-E-Standards
    #Integration-Server-and-ESB
    #webMethods


  • 9.  RE: Large Flat File Processing.

    Posted Wed January 17, 2007 08:10 PM

    I am having a similar issue to the one described in this thread and would appreciate any insight anyone can offer. I need to process a 60-70 MB file. The format of the file is:
    O - Order header information.
    B (1 occurrence per O)
    S (1 occurrence per O)
    P (multiple occurrences per O)
    There are about 70,000 Order records that need to be processed. I do not need to store these records anywhere, just process them and send an email.

    I am able to run my service if I read a small sample file using getFile. But when I set iterate=true, I cannot do anything with the data. I am mapping ffValues to a document type created from a schema, and I use the correct schema name in the ffSchema input of convertToValues. If I savePipelineToFile, I see data in my document, but if I try to write any of the values to the debug log, they are null.

    As I mentioned before, if I bring the whole file into memory using getFile, the service works fine. I just cannot seem to stream in the data.

    Can anyone help?


    #webMethods
    #Integration-Server-and-ESB
    #Adapters-and-E-Standards


  • 10.  RE: Large Flat File Processing.

    Posted Sat October 13, 2007 03:15 PM

    Bhawesh,

    Can you please send the fileSplitter Java service to me as well? A similar problem has come up for me. My email id is: sspone-wmusers@yahoo.com

    Rgds,
    Sandeep


    #webMethods
    #Integration-Server-and-ESB
    #Adapters-and-E-Standards


  • 11.  RE: Large Flat File Processing.

    Posted Mon October 15, 2007 07:08 PM

    Hi Bhawesh,

    I need to split a huge file.
    Can you send me your fileSplitter java service ?

    My email : thierry.ahcow@bnc.ca

    Thanks
    Thierry


    #Adapters-and-E-Standards
    #Integration-Server-and-ESB
    #webMethods


  • 12.  RE: Large Flat File Processing.

    Posted Tue October 16, 2007 07:59 AM

    Hi Bhawesh,

    I need to split fairly large files being sent between different Integration Servers via a Broker, one in the US and one in ASPAC.
    The files are approximately 50 MB in size, but around 50+ documents get sent to the Broker on the US side, so the Broker is receiving quite a lot of traffic.

    Could you please email me the filesplitter service to aditya.gollakota@customware.net

    Thank you,

    Aditya Gollakota


    #webMethods
    #Adapters-and-E-Standards
    #Integration-Server-and-ESB


  • 13.  RE: Large Flat File Processing.

    Posted Wed November 07, 2007 12:42 PM

    Bhawesh/Sandeep/All,

    Can you please send the fileSplitter Java service to me as well?
    My email id is: datta.saru@gmail.com
    Please send it as soon as you can.

    Regards,
    Datta


    #webMethods
    #Integration-Server-and-ESB
    #Adapters-and-E-Standards


  • 14.  RE: Large Flat File Processing.

    Posted Wed November 07, 2007 12:45 PM

    2> Wrote a fileSplitter Java service. This service reads the input file as stream, goes through each record to do some validations and creates bunch of tmp files each of records 20,000 (configurable, supplied as input to this service) and creates the output as the list of tmp files thus created.

    Please send me the Java service. I would appreciate the favour.
    My Id: datta.saru@gmail.com

    Regards,
    Datta


    #webMethods
    #Integration-Server-and-ESB
    #Adapters-and-E-Standards


  • 15.  RE: Large Flat File Processing.

    Posted Wed December 26, 2007 06:18 AM

    Bhawesh,

    I saw a lot of requests for the file splitter service that you have written. To avoid lengthening the thread with such requests, I would ask you to attach a zip of the code to the post itself.

    Regards


    #Integration-Server-and-ESB
    #webMethods
    #Adapters-and-E-Standards


  • 16.  RE: Large Flat File Processing.

    Posted Thu December 27, 2007 11:41 AM

    Hi,
    I get lots of requests for this service, so I thought it would be a good idea to attach it here.
    HTH,
    Bhawesh Singh.
    sortSplit.zip (16.1 KB)


    #Adapters-and-E-Standards
    #Integration-Server-and-ESB
    #webMethods


  • 17.  RE: Large Flat File Processing.

    Posted Mon December 31, 2007 05:11 PM

    A quick note for all: splitting a file is also doable with the flat file services provided in IS.


    #webMethods
    #Integration-Server-and-ESB
    #Adapters-and-E-Standards


  • 18.  RE: Large Flat File Processing.

    Posted Wed July 02, 2008 11:57 PM

    Oh, they are not necessarily flat text files. They could be zip files as well. I need to connect to the SFTP server and get them. I was thinking of letting the underlying Unix commands fetch the file instead of bringing it into pipeline memory. But how do I do that? Any thoughts?


    #Adapters-and-E-Standards
    #Integration-Server-and-ESB
    #webMethods


  • 19.  RE: Large Flat File Processing.

    Posted Thu July 03, 2008 12:07 AM

    You won’t be able to “split” a zip file, per se. The files within the zip will need to be extracted; then you process the resulting files.

    The FTP facilities can be used such that entire files are not loaded into memory.
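    For the zip case, the standard java.util.zip classes let a Java service walk an archive as a stream, so the whole file is never held in memory. A minimal illustrative sketch (not an IS built-in service; the class name is invented):

    ```java
    import java.io.*;
    import java.util.*;
    import java.util.zip.*;

    // Illustrative sketch: walk a zip archive as a stream with ZipInputStream,
    // so the whole archive is never held in memory. Here we just collect the
    // entry names; a real flow would copy each entry's bytes out to its own
    // file and then process those files as usual.
    public class ZipLister {

        public static List<String> entryNames(InputStream zipData) throws IOException {
            List<String> names = new ArrayList<>();
            try (ZipInputStream zin = new ZipInputStream(zipData)) {
                ZipEntry entry;
                while ((entry = zin.getNextEntry()) != null) {  // advance to the next entry header
                    names.add(entry.getName());
                    zin.closeEntry();                           // skip the entry body without buffering it
                }
            }
            return names;
        }
    }
    ```
    
    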


    #Adapters-and-E-Standards
    #webMethods
    #Integration-Server-and-ESB


  • 20.  RE: Large Flat File Processing.

    Posted Thu November 19, 2009 07:28 PM

    Great job, B Singh. Your code works fine with 32-bit webMethods; however, it sometimes skips a portion of a line on 64-bit webMethods. Please advise. I don’t know Java. Do you have another service that does the same on 64-bit? Thanks.


    #Integration-Server-and-ESB
    #webMethods
    #Adapters-and-E-Standards


  • 21.  RE: Large Flat File Processing.

    Posted Fri October 12, 2018 01:41 AM

    Hi Reamon, can you please tell me more about this? I looked at the flat file services but I couldn’t find any that split a file.

    My problem is that I need to process files from 500 MB up to 1 GB. So the best approach seems to be to split the files into chunks of 100 MB or so and then process each one in webMethods so that it doesn’t run out of memory.

    Can you please let me know the best option? They are going with wM at any cost, and they want to get the data into a database as fast as possible. We are doing batchInsert since we are not doing any transformations: simply take the file, validate it against the schema, and batchInsert.

    Can you please let me know if there is any utility to split files? (Our files are pipe-delimited.)


    #Integration-Server-and-ESB
    #Adapters-and-E-Standards
    #webMethods


  • 22.  RE: Large Flat File Processing.

    Posted Mon November 05, 2018 09:58 AM

    Hi reamon,

    Same question as the previous member: I didn’t find any particular service in IS under the flat file package that could split a file into multiple smaller files for further processing. The only option I know of is the $iterator field; however, that does not split the file, rather it processes one record at a time, and with a huge flat file it could still crash the IS server. Let me know otherwise.

    The splitService provided by Bhawesh is great; thanks for sharing it.


    #webMethods
    #Adapters-and-E-Standards
    #Integration-Server-and-ESB


  • 23.  RE: Large Flat File Processing.

    Posted Thu November 08, 2018 01:03 PM

    Splitting a file into multiple smaller files is okay, but adds complications. Personally, I would not use the flat file services for file splitting. There are other easier techniques for that.

    The key to evaluating options for processing a large file is understanding the content. An approach for a delimited file would be different from the approach for an XML file.

    For a delimited file, the flat file services can be used. convertToValues accepts inputs that support reading a file a bit at a time instead of all at once. The documentation describes the inputs, but as a quick summary: open the file with loadAs set to stream, set iterate to true, and keep track of the ffIterator as you loop over the records.

    Read 1 record, or up to X records, do whatever is needed (map, validate, etc.), then write the records directly to the target, such as a DB.

    If you want to process X records at a time, you’ll need to do a bit more work to gather them into a document list: call convertToValues X times, gathering each record into the list.

    Loop until convertToValues returns no more records.

    For XML, use node iterator techniques described in the documentation.


    #Adapters-and-E-Standards
    #webMethods
    #Integration-Server-and-ESB


  • 24.  RE: Large Flat File Processing.

    Posted Thu November 08, 2018 01:12 PM

    “Best” always needs definition. :-)

    Splitting the large file into smaller files and processing those is one way. Another way is to process the large file iteratively. For a delimited file, my other post describes how to do so. Using iteration, the file is not loaded completely into memory. Out of the box, you can read 1 record at a time, process it, then get the next, etc.

    For efficiency, particularly if the records are to be written to a DB, it would be desirable to read multiple records at time, then write them as a group. Doing that takes a bit more work. You’d do a loop inside a loop:

    Open file as stream
    LOOP until no more records
    …LOOP until groupSize reached or no more records
    ……convertToValues → returns 1 record
    ……add that record to a document list
    …Write the list to the DB
    Close the file

    Obviously this is simplified, but it gives the high-level approach. You can split this into a couple of different services, e.g. have the inner loop be a service that accepts the iterator and returns a document list, then call it until there are no more records.
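    As a plain-Java analogue of the nested-loop pseudocode above (illustrative only: a line reader stands in for convertToValues, and a callback stands in for the DB batchInsert; the names are invented for this sketch):

    ```java
    import java.io.*;
    import java.util.*;
    import java.util.function.Consumer;

    // Plain-Java analogue of the nested-loop flow: read records one at a time,
    // gather up to groupSize into a batch, then hand the batch to a writer
    // (a callback standing in for the DB insert). Not actual IS flow code.
    public class BatchedProcessor {

        public static int processInBatches(BufferedReader in, int groupSize,
                                           Consumer<List<String>> writeBatch) throws IOException {
            int total = 0;
            List<String> batch = new ArrayList<>();
            String record;
            while ((record = in.readLine()) != null) {      // outer loop: until no more records
                batch.add(record);                          // inner loop body: gather into a "document list"
                total++;
                if (batch.size() == groupSize) {            // group is full: write it as one unit
                    writeBatch.accept(batch);
                    batch = new ArrayList<>();
                }
            }
            if (!batch.isEmpty()) writeBatch.accept(batch); // flush the final partial group
            return total;
        }
    }
    ```

    The same shape carries over to IS: the inner gathering step becomes a service that loops over convertToValues, and the batch write becomes the adapter's batchInsert call.
    
    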

    Hope this helps though it may be a bit late.


    #webMethods
    #Adapters-and-E-Standards
    #Integration-Server-and-ESB


  • 25.  RE: Large Flat File Processing.

    Posted Wed September 18, 2019 07:25 AM

    Hello Himanshu Kumar,

    I am looking into large file handling (500 MB) and I saw your post here. I didn’t find any particular service in IS to split files or to handle large files.

    Could you kindly share the split service Bhawesh provided here? That would be really helpful.

    Regards,
    Deepa


    #webMethods
    #Integration-Server-and-ESB
    #Adapters-and-E-Standards