Data Integration

  • 1.  RCP on Cassandra Connector

    Posted Wed March 08, 2023 09:27 AM

    Hi Team, I would like some help and input on an issue I am facing in my project. We are trying to build a reusable job for loading into Cassandra. However, when the job runs with RCP (Runtime Column Propagation), the key column check box is not carried over, which causes the load to fail. Below is the process.

    Job1 - a transform job that creates a load-ready dataset. This is a non-RCP job. The PK column (the same as on the target table) is defined as a key on the output dataset (i.e. the Key check box is checked).

    Job2 - a reusable load job to Cassandra running with RCP. This job fails because the Cassandra connector is looking for a primary key, which is the key defined in Job1.

    Test1 - I created a non-reusable (non-RCP) load job where the columns are defined and the key is checked. It works fine:

    Source -> Copy -> Cassandra Connector. This works fine.

    Test2 - I created another job with a combination of non-RCP and RCP stages. The last 3 stages are:

    Copy stage (non-RCP, columns defined together with the key) -> Copy (RCP, no columns defined) -> Cassandra Connector. This test fails because the Cassandra connector is looking for the key.

    Is this an RCP thing or a Cassandra connector issue? If you have run into this problem as well, could you help me with it?



    ------------------------------
    Ajie LIM
    ------------------------------


  • 2.  RE: RCP on Cassandra Connector

    IBM Champion
    Posted Thu March 09, 2023 02:16 AM

    Hello Ajie,

    although I have no knowledge of Cassandra, I can say that this is an RCP issue. The problem is that the key column check box is not stored in the OSH schema (set $OSH_PRINT_SCHEMAS=True to see the schemas of all links in your flow/job), so this information is not forwarded when using RCP. What you can normally do when working with RCP and a relational target (e.g. in the DB2 Connector) is choose Generate SQL=No and write the Update statement with the WHERE and SET clauses yourself. This normally means you generate the statement before running the job and hand it over as a parameter.
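    The "generate it before running the job" step can be sketched as a small helper that builds the UPDATE statement from a column list and a list of key columns, so the key information lives outside the RCP metadata. This is a minimal sketch, not an IBM tool: the function name `build_update_statement`, the table and column names are hypothetical, and the `ORCHESTRATE.<column>` placeholders assume the input-column reference syntax used in connector user-defined SQL such as the DB2 Connector. Check it against your connector before relying on it.

    ```python
    from typing import Sequence


    def build_update_statement(table: str, columns: Sequence[str],
                               key_columns: Sequence[str]) -> str:
        """Build an UPDATE statement: non-key columns go into SET, key columns into WHERE.

        Input values are referenced as ORCHESTRATE.<column> (assumed connector syntax).
        """
        if not key_columns:
            raise ValueError("at least one key column is required")
        missing = [k for k in key_columns if k not in columns]
        if missing:
            raise ValueError(f"key columns not in column list: {missing}")

        # Every column that is not part of the key is updated.
        set_columns = [c for c in columns if c not in key_columns]
        if not set_columns:
            raise ValueError("no non-key columns left to update")

        set_clause = ", ".join(f"{c} = ORCHESTRATE.{c}" for c in set_columns)
        # The key columns identify the row, i.e. what the Key check box expressed at design time.
        where_clause = " AND ".join(f"{k} = ORCHESTRATE.{k}" for k in key_columns)
        return f"UPDATE {table} SET {set_clause} WHERE {where_clause}"


    if __name__ == "__main__":
        # Hypothetical table and columns, for illustration only.
        stmt = build_update_statement(
            "sales.orders", ["ORDER_ID", "CUSTOMER", "AMOUNT"], ["ORDER_ID"]
        )
        print(stmt)
    ```

    The resulting string would then be passed to the job as a parameter (e.g. a hypothetical `pUpdateSQL`) and used in the connector's user-defined Update statement. Keeping the key list outside the job is what makes a single RCP job reusable across tables.
    
    
    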

    From what I can see, the Cassandra Connector in 11.7 does not have an option to provide your own Update statement. In CP4D, however, the Apache Cassandra connector (not the Apache Cassandra (optimized) one) has the write modes "Update" and "Update statement", which may help.

    KR Ralf



    ------------------------------
    Ralf Martin
    Principal Consultant
    Infologistix GmbH
    Bregenz
    ------------------------------



  • 3.  RE: RCP on Cassandra Connector

    Posted Thu March 09, 2023 09:27 AM

    Hi Ajie,

    that is the normal behaviour. Key information is not part of the RCP metadata and can only be set at design time. If you check the metadata of the generated dataset, you will find no key information. I do not know of any option to define the key information at runtime.



    ------------------------------
    Udo Neumann
    ------------------------------