Original Message:
Sent: Fri May 12, 2023 08:50 AM
From: Andreas Legner
Subject: Informix ER -
So you want/need to bring the data over to that (new?) idsLake instance, into one (or fiew) consolidated dataLake database(s), but each row should know its origin - and might have to, to avoid PK clashes?
If NOT, maybe leaving all the data where it resides and configuring the ~30 servers as a shard environment (flavor 'informational shard' where no data movement occurs) would also meet your needs. You can then run so called CLIX queries against the entire setup, or parts of it, which would split the query into branches, executed on local servers, and recombine and consolidate the result sets to the server your connected to.
Should you want to stick with your data consolidation approach, you could use ERKEY, as replication key, and CRCOLS, providing cdrserver + cdrtime shadow columns of which the first would contain the originating (numeric) server ID. Should be good enough as long as you don't plan to update anything on the central server (which would alter the CRCOLS shadow columns - they always contain origin and time of the last modification.) Those shadow columns would not show up in a "select * from ...", but you can select (yet not update) them explicitly.
It would be nice if one could stick a literal value into the select clause of a replicate, but that's not possible now. Moreover these select clauses have a meaning on source and on target, so such literal value had to have an additional source/target property.
What also would help your purpose, but isn't available afaik, is if an incoming ER transaction's source server or database name would be available on SQL level, during ER data apply, so it could be used in an insert or update trigger. If source server ID (from CRCOLS) isn't sufficient, you might be able to translate this into the desired value in a trigger.
Helpful?
BR,
Andreas
------------------------------
Andreas Legner
Original Message:
Sent: Wed May 10, 2023 03:11 AM
From: thibaut vandewalle
Subject: Informix ER -
Hi,
In our organization, we have several informix ids server (~30) and databases (~250) with the same schema on informix IDS server (version 14) all around our country. We would like to build some kind of Data Lake using informix ER and the pattern Primary-Target Replication System - Data Consolidation.
Here is a simplified example :
- Server IDS 1
o DataBase db1
Table customer
Column id (Primary Key)
Column name
…
Table adress
Column id (PK)
Column description
….
o DataBase db2
Table customer
Column id (Primary Key)
Column name
...
Table adress
Column id (PK)
Column description
….
- Server IDS 2
with db3 and db4 etc..
And the schema of our data lake :
- Server IDS idsLake
o Database dataLake
§ Table customer
· Column id (PK)
· Column dataBase (PK)
· Column name
· ….
§ Table Adress
· Column id (PK)
· Column dataBase (PK)
· Column description
· …..
For each table, the primary key is composed with ID and Database to hanble duplicated keys from the different database.
The command to replicate without filling the database column (executed in the Server IDS idsLake) is :
cdr define repl -M g_er -c g_er -C always repl _ind -S row -k ID -f y -D y -T "P db1@ids1:informix.customer" "select name, description from customer" "R dataLake@idsLake:informix.customer" "select name, description from customer";
To fill the column, I hoped i could do something like this
cdr define repl -M g_er -c g_er -C always repl _ind -S row -k ID -f y -D y -T "P db1@ids1:informix.customer" "select name, description from customer" "R dataLake@idsLake:informix.customer" "select name, description, 'ids1' as database from customer";
But it does not work.
Question : how can i know the replication source to fill the column database ?
PS : I know I can handle the database origin with asynchronous post trigger with jsonspl routine but i prefer to avoid it for this use case (because I have to code the replication et manage it by myself)
Thanks by advance
------------------------------
thibaut vandewalle
------------------------------