Informix

 View Only
  • 1.  Informix ER -

    Posted Wed May 10, 2023 11:26 AM

    Hi,

    In our organization, we have several informix ids server (~30) and databases (~250)  with the same schema on informix IDS server (version 14) all around our country. We would like to build some kind of Data Lake using informix ER and the pattern Primary-Target Replication System - Data Consolidation.


    Here is a simplified example :

    •        Server IDS 1
      o   DataBase db1
           Table customer
               Column id (Primary Key)
               Column name
               …
           Table adress
              Column id (PK)
              Column description
              ….
      o   DataBase db2
           Table customer
              Column id (Primary Key)
              Column name
              ...
          Table adress
             Column id (PK)
             Column description
             ….
    •        Server IDS 2
      with db3 and db4 etc..

    And the schema of our data lake :

    -          Server IDS idsLake

    o   Database dataLake

    §  Table customer

    ·         Column id (PK)

    ·         Column dataBase (PK)

    ·         Column name

    ·         ….

    §  Table Adress

    ·         Column id (PK)

    ·         Column dataBase (PK)

    ·         Column description

    ·         …..

    For each table, the primary key is composed with ID and Database to hanble duplicated keys from the different database.

    The command to replicate without filling the database column (executed in the  Server IDS idsLake) is :

    cdr define repl -M g_er -c g_er -C always repl _ind -S row -k ID -f y -D y -T "P db1@ids1:informix.customer" "select name, description from customer" "R dataLake@idsLake:informix.customer" "select name, description from customer";

    To fill the column, I hoped i could do something like this

    cdr define repl -M g_er -c g_er -C always repl _ind -S row -k ID -f y -D y -T "P db1@ids1:informix.customer" "select name, description from customer" "R dataLake@idsLake:informix.customer" "select name, description, 'ids1' as database from customer";

    But it does not work.

    Question : how can i know the replication source to fill the column database ?

    PS : I know I can handle the database origin with asynchronous post trigger with jsonspl routine but i prefer to avoid it for this use case (because I have to code the replication et manage it by myself)

    Thanks by advance



    ------------------------------
    thibaut vandewalle
    ------------------------------


  • 2.  RE: Informix ER -

    Posted Wed May 10, 2023 01:11 PM

    Hi Thibaut,

    If I understand well your target schema (idslake), the customer table will have the primary key on the "id" and "database" columns, but the source schema will have only the primary key on the "id" column for the same table.

    Technically,Enterprise Replication is based on a replication key to replicate data and ensure data consistency/coherence between the different informix instances.

    All tables that are replicated must have a replication key that is composed of one or more columns that uniquely identifies each row. The replication key must be the same on all servers that participate in the replicate.

    If you do not want to add the "database" column and change the primary key definition on each informix instance, you should use the ERKEY shadow columns in place of a primary key.


    -- Franck Thomas ConsultiX franck.thomas@consult-ix.fr http://www.consult-ix.fr Téléphone : 33 (0) 1 39 12 18 00 Mobile    : 33 (0) 6 78 81 09 33
    Le 10/05/2023 à 17:25, thibaut vandewalle via IBM TechXchange Community a écrit :
    010001880644d4ae-8b37ef7a-bc44-4f8f-8a87-62357f36ad6e-000000@email.amazonses.com">
    Hi,   In our organization, we have several informix ids server (~30) and databases (~250)  with the same schema on informix IDS server (version...
    Server IDS idsLake

    o   Database dataLake

    §  Table customer

    ·         Column id (PK)

    ·         Column dataBase (PK)

    ·         Column name

    ·         ....

    §  Table Adress

    ·         Column id (PK)

    ·         Column dataBase (PK)

    ·         Column description

    ·         .....

    For each table, the primary key is composed with ID and Database to hanble duplicated keys from the different database.

     

     

    The command to replicate without filling the database column (executed in the  Server IDS idsLake) is :

    cdr define repl -M g_er -c g_er -C always repl _ind -S row -k ID -f y -D y -T "P db1@ids1:informix.customer" "select name, description from customer" "R dataLake@idsLake:informix.customer" "select name, description from customer";

     

    To fill the column, I hoped i could do something like this

    cdr define repl -M g_er -c g_er -C always repl _ind -S row -k ID -f y -D y -T "P db1@ids1:informix.customer" "select name, description from customer" "R dataLake@idsLake:informix.customer" "select name, description, 'ids1' as database from customer";

    But it does not work.

     Question : how can i know the replication source to fill the column database ?

     

    PS : I know I can handle the database origin with asynchronous post trigger with jsonspl routine but i prefer to avoid it for this use case (because I have to code the replication et manage it by myself)

    Thanks by advance



    ------------------------------
    thibaut vandewalle
    ------------------------------
      Reply to Group Online   View Thread   Recommend   Forward   Flag as Inappropriate  


     
    You are subscribed to "Informix" as franck.thomas@consult-ix.fr. To change your subscriptions, go to My Subscriptions. To unsubscribe from this community discussion, go to Unsubscribe.





  • 3.  RE: Informix ER -

    Posted Thu May 11, 2023 01:02 PM

    Hi,

    Thanks for the answer. You correctly understood my need.
    I already tried the replication with ERKey but in my data lake. It works fine but I really need to know the origin of the record. That's why I add a column Database in the target schema. 
    Is it possible to analyse ifx_erkey1, ifx_erkey2 or ifx_erkey3 to determine where the record comes from (maybe with a trigger) ?

    Thibaut Vandewalle



    ------------------------------
    thibaut vandewalle
    ------------------------------



  • 4.  RE: Informix ER -

    IBM Champion
    Posted Fri May 12, 2023 08:51 AM

    So you want/need to bring the data over to that (new?) idsLake instance, into one (or fiew) consolidated dataLake database(s), but each row should know its origin - and might have to, to avoid PK clashes?

    If NOT, maybe leaving all the data where it resides and configuring the ~30 servers as a shard environment (flavor 'informational shard' where no data movement occurs) would also meet your needs.  You can then run so called CLIX queries against the entire setup, or parts of it, which would split the query into branches, executed on local servers, and recombine and consolidate the result sets to the server your connected to.

    Should you want to stick with your data consolidation approach, you could use ERKEY, as replication key,  and CRCOLS, providing cdrserver + cdrtime shadow columns of which the first would contain the originating (numeric) server ID.  Should be good enough as long as you don't plan to update anything on the central server (which would alter the CRCOLS shadow columns - they always contain origin and time of the last modification.)  Those shadow columns would not show up in a "select * from ...", but you can select (yet not update) them explicitly.

    It would be nice if one could stick a literal value into the select clause of a replicate, but that's not possible now. Moreover these select clauses have a meaning on source and on target, so such literal value had to have an additional source/target property.
    What also would help your purpose, but isn't available afaik, is if an incoming ER transaction's source server or database name would be available on SQL level, during ER data apply, so it could be used in an insert or update trigger.  If source server ID (from CRCOLS) isn't sufficient, you might be able to translate this into the desired value in a trigger.

    Helpful?

    BR,
     Andreas



    ------------------------------
    Andreas Legner
    ------------------------------



  • 5.  RE: Informix ER -

    Posted Tue May 16, 2023 04:56 PM

    Hi,

    I studied your idea to combine ERKey et CRCOLS and with configuring one group by database in my sqlhosts file, i was able to determine with the column cdrserver the origin of my record. That's exactly what i needed.

    Thanks a lot


    g


    ------------------------------
    thibaut vandewalle
    ------------------------------



  • 6.  RE: Informix ER -

    Posted Sun June 04, 2023 11:25 PM

    you can add a column with default values "ids1" in source table



    ------------------------------
    ZhiWei Cui
    GBASE
    ------------------------------