IBM Data Management Community
We are using the Streams Export and Import operators extensively, which are implemented in C++ as mentioned in the documentation.
Do the Export/Import operators use TCP/IP functionality internally?
Q1. Suppose a single job exports data to more than 3 jobs, and the exporting job sends 10 records downstream.
Each downstream job will have its own copy of the data. Does the TCP layer also create a copy of the data for each downstream job and keep it in a TCP buffer?
Also, the congestion policy only applies once a connection is established. If a downstream job is restarted, will any data be lost? How does the exporting job make sure there is no data loss after a restart of a downstream job?
Q2. Job 1 exports data that has 30 columns and connects to 3 downstream jobs.
Ex: Downstream job one needs columns 1 to 20.
Downstream job two needs columns 21 to 28.
Downstream job three needs columns 29 to 30.
Which of the approaches below is advisable?
Approach 1:
Job 1 has a single Export operator connecting to all 3 downstream jobs. After importing, each job filters out the columns it needs.
Approach 2:
Job 1 creates three output streams with the expected columns:
O/p stream 1: 20 columns
O/p stream 2: 8 columns
O/p stream 3: 2 columns
and has 3 Export operators, each connecting to an individual job.
The Export/Import operators do use TCP/IP to communicate data.
Q1: The Export operator will have a socket connection for each Import operator and will perform a write to each socket using the same buffer.
The Export operator does not make any guarantees about data loss. If an Import operator connection is not present (either because it's restarting or hasn't been brought up yet) then it will miss out on that data.
Regarding the congestion policy, you're correct that the congestion policy only takes effect once a connection to an Import operator is established. Depending on that policy, the Export operator may drop the connection from the Import operator if it's not consuming data fast enough (with the 'dropConnection' policy), or it will wait for the connection to accept more data. Again, if the Import operator's connection gets disconnected for whatever reason, the Export operator will move on to the next connection.
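To make the fan-out behavior above concrete, here is a simplified Python model — an assumption for illustration, not the actual C++ implementation — of an Export-style operator performing one write per connected consumer from the same buffer. Each connected Import gets its own copy of the data, while a consumer whose connection is down at send time simply misses those tuples:

```python
import socket

def fan_out(buffer: bytes, connections: list) -> list:
    """Write the same buffer to every live connection; drop ones that fail.

    Simplified sketch: one serialized buffer, one independent write per
    consumer, so each downstream job receives its own copy of the data.
    """
    delivered = []
    for conn in connections:
        try:
            conn.sendall(buffer)   # same buffer, separate write per consumer
            delivered.append(conn)
        except OSError:
            conn.close()           # disconnected consumer: these tuples are lost for it
    return delivered

# Two connected "Import" endpoints, modeled with local socket pairs.
a1, b1 = socket.socketpair()
a2, b2 = socket.socketpair()
fan_out(b"tuple-1", [a1, a2])      # both receivers get their own copy
a2.close()                         # downstream job 2 restarts: its connection is gone
fan_out(b"tuple-2", [a1, a2])      # only job 1 receives this tuple
```

The real operators additionally handle serialization, flow control, and the congestion policy; the point of the sketch is only that every connected downstream job receives its own copy, and a job that is restarting receives nothing for the tuples sent while it was away.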
Q2: Either approach will work, and each has its own pros and cons. However, I think Approach 2 would be more beneficial, since you'll only be sending the required data rather than extra data that's going to be filtered out immediately anyway.
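As a back-of-the-envelope sketch of that trade-off (only the column counts come from the question; the per-column byte size is an assumed figure for illustration):

```python
# Rough comparison of bytes on the wire per exported record.
BYTES_PER_COLUMN = 8            # assumed average encoded size of one column
TOTAL_COLUMNS = 30

# Approach 1: one Export sends all 30 columns to each of the 3 downstream jobs.
approach1 = 3 * TOTAL_COLUMNS * BYTES_PER_COLUMN

# Approach 2: three Exports send only the projected columns (20 + 8 + 2).
approach2 = (20 + 8 + 2) * BYTES_PER_COLUMN

print(approach1, approach2)     # 720 vs 240: Approach 2 sends a third of the bytes
```

With these particular column splits, Approach 2 happens to send exactly one third of the data; the general point is that projecting columns before the Export avoids transmitting data that the importers would discard.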
Thanks for the reply. That gives a good picture of the internal functionality of the Export and Import operators.
Do we have any documentation about these things other than
Operator Export (ibm.com)?
This page here: https://www.ibm.com/support/knowledgecenter/en/SSCRJU_4.2.1/com.ibm.streams.ref.doc/doc/dynamicappcomposition.html
may provide some more insight about Import and Export, but I have not been able to find an exact page that covers our Q&A.