Informix


14.1 WE - fastest way to load 61Mio lines CSV file

  • 1.  14.1 WE - fastest way to load 61Mio lines CSV file

    Posted Wed November 10, 2021 05:25 AM
    Hi all,

    I have a question about loading CSV files into Informix. Currently we're getting really huge CSV files from our ERP systems (61 million lines in one file), and it takes hours to load them via dbload.
    How would you do it? The base is Informix 14.10 Workgroup Edition on RHEL 8, 64-bit.

    ------------------------------
    Marc Demhartner
    ------------------------------

    #Informix


  • 2.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    IBM Champion
    Posted Wed November 10, 2021 05:37 AM
    Hi Marc,

    You should use an external table. You can even parallelize the load by splitting the CSV file.
    Check the documentation for the syntax to use, but the external route is your best way. The target table should be of type RAW if possible (that way it will not use the logical logs). After loading, change the table from RAW back to STANDARD.

    Khaled Bentebal (sent from my phone)
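    Splitting the file for a parallel load can be done with standard OS tools. A minimal sketch, assuming GNU split; the file names are illustrative, and the small generated sample just stands in for the real 61-million-line export:

    ```shell
    # Generate a small pipe-delimited stand-in for the real CSV export.
    seq 1 1000000 | sed 's/$/|demo|row|/' > somedata.csv

    # Cut it into 4 equal chunks; each chunk can then be listed as its own
    # DATAFILES entry of the external table (or loaded by a separate session).
    split -l 250000 --numeric-suffixes=1 somedata.csv part.

    # part.01 .. part.04, 250000 lines each
    wc -l part.*
    ```
    
    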






  • 3.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    Posted Wed November 10, 2021 05:38 AM
    Hi Marc,
    CSV loading is never a "simple trick".
    dbload is a very good utility that performs several validations to avoid loading incorrect data.
    But you can also try newer methods, e.g. external tables (assuming your data is delimited). That might be a good option.

    I am just curious when you say "several hours": how is your dbload storing that? Directly into a permanent table? Do you have a cluster, or only a single-node instance?

    Alexandre Marini





  • 4.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    Posted Wed November 10, 2021 05:47 AM
    Thanks for your feedback. Yes, we're loading it into a freshly created "loading table" (tl_marc, for example) without indexes. The CSV file is 100% valid; we convert it at OS level before loading it into the DB.

    Yes, it's a single-node instance.

    @Khaled Bentebal: WE does not allow parallelizing.

    Can you explain that "external table" thing? 


    ------------------------------
    Marc Demhartner
    ------------------------------



  • 5.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    Posted Wed November 10, 2021 06:25 AM
    Hi,

    1) CSV file "C:\TMP\somedata.csv" contains:

    1|Dog|Haf |
    2|Cat|Mnau|
    3|Snake|Ssss|


    2) External table:

    create external table myanimalscsv (
    id integer,
    animal varchar(10),
    sound varchar(10)
    ) using (  DATAFILES ("DISK:/tmp/somedata.csv")  );

    select * from myanimalscsv;  -- works


    3) Raw table:
    create raw table myanimals (
     id integer,
     animal varchar(10),
     sound varchar(10)
     );

    Loading:
    insert into myanimals select * from myanimalscsv;

    Change to normal table:
    alter table myanimals type (standard);


    Leos
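    One hedged variation on the example above: the sample file is pipe-delimited, which is the Informix default, so no FORMAT clause is needed. If the source file were truly comma-separated, the external table would need the delimiter spelled out explicitly (same columns as above; only the FORMAT/DELIMITER clauses are added):

    ```sql
    -- variation for a comma-separated file
    create external table myanimalscsv (
        id integer,
        animal varchar(10),
        sound varchar(10)
    ) using (
        FORMAT "DELIMITED",
        DELIMITER ",",
        DATAFILES ("DISK:/tmp/somedata.csv")
    );
    ```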






  • 6.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    IBM Champion
    Posted Wed November 10, 2021 07:24 AM
    Marc:

    Myschema can generate external-table unload and load scripts for your tables (note that I'm using the latest, November 4th, release of myschema; the October 28th release was the first to include changing the table to RAW before the load and back to STANDARD after). Here's an example:


    art@Elbereth:~/GoogleDrive/MyConsulting/test$ myschema -d art -t extents --myexport-express --myexport-delimiter >/dev/null
    art@Elbereth:~/GoogleDrive/MyConsulting/test$ ls -laF
    total 28
    drwxrwxr-x  3 art art  4096 Nov 10 07:15 ./
    drwxrwxr-x 75 art art 12288 Oct 29 06:27 ../
    -rw-rw-r--  1 art art   431 Nov 10 07:15 art.extents.myexport.sql
    -rw-rw-r--  1 art art   502 Nov 10 07:15 art.extents.myimport.sql
    art@Elbereth:~/GoogleDrive/MyConsulting/test$ cat art.extents.myimport.sql
    CREATE EXTERNAL TABLE extents_myimport (  
           dbsname "informix".lvarchar(10000),
           tabname "informix".lvarchar(10000),
           chunk INTEGER,
           offset INTEGER,
           size INTEGER,
           truth "informix".boolean
    ) USING ( FORMAT "DELIMITED",  
       DATAFILES ( "disk:/home/art/GoogleDrive/MyConsulting/test/exten01930.unl" ),  
       DELIMITER ",", EXPRESS, ESCAPE ON );  
    ALTER TABLE extents TYPE( RAW );
    INSERT INTO extents  
       SELECT * FROM extents_myimport;
    DROP TABLE extents_myimport;
    ALTER TABLE extents TYPE( STANDARD );

    You would just have to change the DATAFILES option to list the actual path of your CSV file.
    If you are not familiar with myschema, you can always download the latest release of my utils2_ak package which includes myschema, from my web site at: My Utilities

    BTW, you can speed up the dbload run somewhat by altering the table to RAW before the load. It won't be as fast as external tables, but may be fast enough for you.
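    That dbload variant might look like this. A hedged sketch: the table, database, and file names are illustrative, not taken from Marc's actual setup:

    ```sql
    -- before the load: skip logical logging for the target table
    ALTER TABLE tl_marc TYPE( RAW );

    -- run dbload from the shell, e.g. (commit every 50000 rows):
    --   dbload -d mydb -c load.cmd -l load.err -n 50000

    -- after the load: back to a fully logged table
    ALTER TABLE tl_marc TYPE( STANDARD );
    ```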

    Art



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 7.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    IBM Champion
    Posted Wed November 10, 2021 07:36 AM

    Marc,

    As people have suggested, you can use external tables.

    Personally I would use the High Performance Loader, but mainly because I have it all scripted and ready to run.

    Out-of-the-box thinking ...

    • Add a timestamp and use the TimeSeries loader
    • Write a simple UDR, mmap the file, scan, and load.

    Cheers

    Paul






  • 8.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    IBM Champion
    Posted Wed November 10, 2021 08:15 AM
    I'm with Paul - if this is a load you are doing frequently and need it to be as fast as possible, it is worth testing HPL.  My testing against Informix 12 (I have not repeated it against 14) shows better performance loading with HPL than with external tables, and better still if you can split the source file.  Of course, you may have very different results with your data.  Be sure to make the target table RAW (as in the external table example) for the best performance.

    But if you haven't used HPL before, there is a bit of a learning curve.  The "onpladm" utility certainly makes it easier though.
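    For anyone who has not scripted HPL before, the non-graphical route via onpladm looks roughly like this. This is a hedged sketch from memory, not a tested recipe: the job name, database, table, and file path are all illustrative, and the exact options should be verified against the onpladm documentation:

    ```shell
    # create a load job for a delimited file (names are illustrative)
    onpladm create job marc_load \
        -d /data/somedata.csv \
        -D mydb -t tl_marc \
        -fl

    # run the load job
    onpload -j marc_load -fl
    ```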


    ------------------------------
    Mike Walker
    xDB Systems, Inc
    www.xdbsystems.com
    ------------------------------



  • 9.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    IBM Champion
    Posted Thu November 11, 2021 03:28 AM

    Hi Marc,

    very nice challenge :-)

    It depends on the process requirements: single table / transaction needed / HDR cluster installation / constraint checks / insert triggers / isolation level, and so on.


    For a RAW table load, the typical performance ranges from 200,000 records/second (ARM NVIDIA Jetson) to 450,000 records/second (a modern Intel x86 with enough cores for WE). All based on SD/flash drives for the chunks and files.

    My estimate for this task is between 2 and 10 minutes. So you will feel the great speed of Informix: "wait less, work more" :-)
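    A quick back-of-the-envelope check of that estimate (pure shell arithmetic; the rows/second rates are the numbers quoted above, not measured here):

    ```shell
    # 61 million rows at the two quoted RAW-load rates (rows/second)
    rows=61000000
    for rate in 450000 200000; do
      secs=$(( rows / rate ))
      echo "$rate rows/s -> $(( secs / 60 )) min $(( secs % 60 )) s"
    done
    # -> about 2 min 15 s at the fast end, 5 min 5 s at the slow end
    ```
    
    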

    Have fun
    Henri



    ------------------------------
    Henri Cujass
    leolo IT, CTO
    Germany
    IBM Champion 2021
    ------------------------------



  • 10.  RE: 14.1 WE - fastest way to load 61Mio lines CSV file

    Posted Mon November 15, 2021 10:52 AM
    Hi
    HPL is my favourite for this case (create the table without indexes, load the data with HPL only, then create the indexes with max PDQPRIORITY).
    If you are using AIX with Exceed, you can also use HPL through ipload (the same, but graphical and easy to use :) )

    ------------------------------
    John Smith
    ------------------------------