Cloud Pak for Data

  • 1.  Import of sub sequencers

    Posted 5 days ago

    Dear All,

    While working on a migration from 11.7 to CP4D, I noticed a behavior in CP4D and would like to confirm whether it is expected.

    For this explanation, let's assume I have a sequencer A that calls a sub-sequencer B through a Job Activity stage. As per my migration strategy, I am importing assets individually through dsjob migrate using ISX files. Let's assume B was already migrated. After migrating sequence A without any dependencies, I can see that CP4D identifies the sub-sequencer node that links to B as the "Run DataStage Job" type rather than "Run Pipeline Job".

    When I try to open node B from A using "view asset", CP4D cannot fetch it (404 Not Found CDIWA0078E Pipeline JSON not found for flow), but when I view it as a job, it takes me to the B.Datastage Sequencer page.

    In my opinion, when sequencer jobs are migrated independently from 11.7 to CP4D, nodes that call other sequencers should not be converted to the "Run DataStage Job" type.

    However, when we export both A and B together into one ISX file and migrate it into CP4D, the node that calls B is correctly identified as the "Run Pipeline Job" type.

    Kindly let me know whether we always need to include the dependencies in the ISX file when migrating for the first time to avoid such issues, or whether there is a different way to do this. The problem with this approach is that common components called by organization-wide sequencers have to be exported repeatedly to avoid the issue.

    Thanks for your help.

    Best,

    Tapas



    ------------------------------
    Tapas Pradhan
    ------------------------------


  • 2.  RE: Import of sub sequencers

    Posted 4 days ago

    Hi Tapas,

    On the first point: importing sub-sequence B first and then sequence A, with B appearing as `Run DataStage job`, is a defect. It should be treated as `Run Pipeline Job`. We will fix this as soon as possible.

    In general, when you import an ISX file, you should import with full dependencies. A missing dependency, especially a missing ParameterSet, PROJDEF, shared container, or nested sequence, is a common problem that can lead to frustration. If your project is small to medium-sized (5000 jobs or fewer), exporting the whole project as one ISX file with full dependencies is a much faster way to manage your migration. If it is bigger, you may want to manage the import by partitioning the workload by folder. Even then, it is much better to include full dependencies. Granted, this makes your ISX file a little bigger, but the migration service can resolve the difference properly this way.
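
    As one possible reading of the folder-based partitioning mentioned above, here is a minimal Python sketch that groups job paths into one batch per top-level folder. The path layout, the `depth` parameter, and the function name are assumptions for illustration only; this is not a product API, and each batch would still need to be exported with its full dependencies.

```python
from collections import defaultdict


def partition_by_top_folder(job_paths, depth=2):
    """Group job paths into one migration batch per top-level folder.

    job_paths: iterable of repository paths such as
               "/Jobs/Finance/Daily/SeqA" (hypothetical layout).
    depth:     number of leading folder segments that form the batch key.
    """
    batches = defaultdict(list)
    for path in job_paths:
        segments = [s for s in path.split("/") if s]
        folders = segments[:-1]  # drop the job name itself
        key = "/".join(folders[:depth]) or "(root)"
        batches[key].append(path)
    return {key: sorted(paths) for key, paths in batches.items()}


if __name__ == "__main__":
    paths = [
        "/Jobs/Finance/Daily/SeqA",
        "/Jobs/Finance/SeqB",
        "/Jobs/Sales/SeqC",
    ]
    for folder, jobs in partition_by_top_folder(paths).items():
        print(folder, jobs)
```

    Each resulting batch can then be exported and imported on its own, which keeps individual ISX files manageable on large projects.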



    ------------------------------
    YONG LI
    ------------------------------



  • 3.  RE: Import of sub sequencers

    Posted 4 days ago
    Edited by Tapas Pradhan 4 days ago

    Thank you, Yong, for acknowledging this as a defect.

    I agree with your point that we should import with full dependencies. On a full-fledged 11.7 project, we usually use the incdep option of istool, starting from the master (top-level) sequencer, to work out the complete dependency hierarchy. However, there are two notable issues with this approach when the DataStage project has a huge repository:

    1) istool typically scans the whole repository for each dependency discovery. This takes around 3 to 5 minutes (benchmarked on a project with 17k jobs) to work out the dependencies of a single sequencer, which makes our migration slow when dealing with thousands of sequencers.

    2) istool sometimes also times out when working out larger dependency trees. We tried increasing the timeout settings for istool, but that did not make much difference. Because istool fails under certain conditions, the process is unreliable when we try to script and automate the migration.
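
    To illustrate the kind of alternative I have in mind, here is a rough Python sketch that resolves the dependencies of a sequencer from a call graph held in memory, so the repository would be scanned once rather than once per sequencer. The `call_graph` mapping and the function name are hypothetical; how such a mapping would be extracted from the 11.7 repository is not shown and is an open question.

```python
def full_dependencies(call_graph, root):
    """Return every asset that `root` depends on, directly or indirectly.

    call_graph: {asset_name: [directly referenced assets]}, a hypothetical
                mapping assumed to be built once for the whole project.
    """
    seen = set()
    stack = list(call_graph.get(root, []))
    while stack:
        asset = stack.pop()
        if asset in seen:
            continue  # already handled; this also guards against cycles
        seen.add(asset)
        stack.extend(call_graph.get(asset, []))
    return seen


if __name__ == "__main__":
    graph = {
        "SeqA": ["SeqB", "PS_Common"],
        "SeqB": ["JobLoad"],
        "SeqC": ["SeqB"],
    }
    print(sorted(full_dependencies(graph, "SeqA")))
```

    With the graph in memory, each lookup is a plain graph walk, so resolving thousands of sequencers would not require thousands of repository scans.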

    If there is another reliable way to do this much faster, please guide me. Would you mind sharing some links where I can learn more about "import using partitioning the workload by folder method"?

    I will wait for your reply on which product release is planned to include the fix for the original issue I highlighted. Thank you for taking the time to read my post and reply.



    ------------------------------
    Tapas Pradhan
    ------------------------------



  • 4.  RE: Import of sub sequencers

    Posted 4 days ago

    Hello Tapas,

    If you migrate each DataStage job on its own, how should the migration service know that B is a job sequence? Maybe you as a developer know this because of the naming convention you follow. The migration service only sees a job activity without further information.

    The CP4D documentation says: Make sure that the ISX file export includes any dependencies, such as parameter sets and table definitions.

    https://www.ibm.com/docs/en/cloud-paks/cp-data/4.8.x?topic=data-migrating-datastage-jobs

    Hope this helps.

    Rgds Victoria



    ------------------------------
    Victoria Rickmann
    ------------------------------



  • 5.  RE: Import of sub sequencers

    Posted 4 days ago
    Edited by Tapas Pradhan 4 days ago

    Hi Victoria, 

    You are correct. As developers, we follow naming conventions that help us identify the kind of job we are dealing with without opening it.

    In my opinion, the migration service should have the intelligence (either through its own logic or through user-defined configuration) to work out whether a Job Activity calls a job or a sequencer, and should not convert every Job Activity stage to Run DataStage Job when migrated standalone.

    Let me explain further using the same example. Job sequencer B has already been migrated individually to CP4D. Since every asset name is unique in CP4D, when master sequencer A is migrated without dependencies in the ISX file, the service should ideally identify B as a sequencer (as the metadata is already present in CP4D) and accordingly migrate the "Job Activity" stage to "Run Pipeline job".

    Let's consider another scenario where sequencer A is migrated first and sequencer B has not been migrated yet. In this scenario, I can think of a couple of options, although the service need not necessarily take either approach:

    1) It should error out and ask for the missing dependency.
    2) The Job Activity should not resolve to either Run Pipeline job or Run DataStage job, but rather wait for B to be migrated before deciding which stage to resolve to (practically impossible!)
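
    To make the suggestion concrete, here is a small Python sketch of the lookup I am describing, covering the already-migrated case and option 1 above. It is only an illustration of the proposed behavior, not how the migration service actually works; the `migrated_assets` catalog, the `"sequence"` / `"job"` labels, and all names are hypothetical.

```python
from enum import Enum


class NodeType(Enum):
    RUN_PIPELINE_JOB = "Run Pipeline job"
    RUN_DATASTAGE_JOB = "Run DataStage job"


class MissingDependencyError(Exception):
    """Raised when a Job Activity targets an asset that is not migrated yet."""


def resolve_job_activity(target_name, migrated_assets):
    """Pick the node type for a Job Activity based on its target's kind.

    migrated_assets: {asset_name: "sequence" or "job"}, a hypothetical
                     catalog of assets already present in the target project.
    """
    kind = migrated_assets.get(target_name)
    if kind == "sequence":
        return NodeType.RUN_PIPELINE_JOB
    if kind == "job":
        return NodeType.RUN_DATASTAGE_JOB
    # Option 1: fail loudly instead of guessing the node type.
    raise MissingDependencyError(
        f"'{target_name}' has not been migrated; import it first "
        "or include it in the ISX file."
    )


if __name__ == "__main__":
    catalog = {"B": "sequence", "JobLoad": "job"}
    print(resolve_job_activity("B", catalog).value)
    print(resolve_job_activity("JobLoad", catalog).value)
```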

    I highlighted a few issues with ISX file exports in my previous reply. Also, if common components are called by each master sequencer, exporting and importing the same common components repeatedly with each master sequencer adds overhead and reduces the efficiency of the migration process.

    Please let me know your views on this. Thank you for taking the time to respond on this issue.



    ------------------------------
    Tapas Pradhan
    ------------------------------