Data Integration

 View Only
Expand all | Collapse all

Searching documentation about DataStage export and DSX or ISX

  • 1.  Searching documentation about DataStage export and DSX or ISX

    Posted Fri December 15, 2023 06:37 AM

    Hello,

    A client asked me to extract dataset lineage information from DataStage.

    They send me an export of their process as a DSX file and showed me how to export it on their desktop clickable application.

    They use an on-premise DataStage server.

    Now I'm looking for a solution to automate this export. If possible by calling an API otherwise by calling a non-interactive command line client.

    I'm also looking for documentation, or a specification of the file format DSX.

    During my first search, I've heard about an ISX export format that is said to be more recent. So I'm also looking for information about its format or any API to produce it. I've already found a istool export command.

    However my client seamed reluctant to use this format for some reason.

    Any link to relevant documentation would be appreciated.

    Thank you,



    ------------------------------
    Jérôme Mainaud
    ------------------------------


  • 2.  RE: Searching documentation about DataStage export and DSX or ISX

    Posted Mon December 18, 2023 02:15 AM

    Hi Jerome,

    take a look at https://www.ibm.com/docs/en/iis/11.7?topic=projects-exporting to export a whole Datastage project or parts of it.

    Regards,

    Udo



    ------------------------------
    Udo Neumann
    ------------------------------



  • 3.  RE: Searching documentation about DataStage export and DSX or ISX

    Posted Mon December 18, 2023 04:33 AM

    Hi Jérôme,

    Is it not possible for your client to run the data lineage reports themselves directly on their platform?

    It'd certainly be quicker, easier, cheaper and more accurate than asking you to import their entire DataStage landscape into your environment and running the reports there, 

    John



    ------------------------------
    John McKeever
    ------------------------------



  • 4.  RE: Searching documentation about DataStage export and DSX or ISX

    Posted Mon December 18, 2023 06:55 AM

    Hi John,

    Thank you for your idea. I will look at these reports.

    My goal is to automatically import the lineage in a data catalog every night. Therefore I need to get a parseable result without requiring human action.

    The report can be exported in CSV format, so it is somehow parseable. 

    However I see two limitations for now:

    1. I see no automated ways to trigger this.
    2. "The default maximum number of nodes that are displayed in a lineage report is 500. If more than this number of nodes is present in your lineage report, the report is truncated."

    If I can find a solution for both of them, it cloud be cheaper.

    I see this is a part of the InfoSphere Information Governance Catalog which offers a REST API. This API seems to be focused on assets, but maybe there is information about lineage.



    ------------------------------
    Jérôme Mainaud
    ------------------------------



  • 5.  RE: Searching documentation about DataStage export and DSX or ISX

    Posted Tue December 19, 2023 09:05 AM

    You might also look at doing the export in XML format (as opposed to DSX) - as THEN it could be read by Manta (the lineage company IBM purchased a couple of months ago) and have the lineage created all.

    But - a better solution would be to look at modernizing the platform to the Cloud Pak for Data and DataStage NextGen..  That way the DataStage job would reside in a project and you could schedule the linage job via the on board scheduler or even run thiongs via a 3rd party scheduler.. And all that could run inside of the environment (including the lineage!)



    ------------------------------
    Mark Hickok
    ------------------------------



  • 6.  RE: Searching documentation about DataStage export and DSX or ISX

    Posted Tue December 19, 2023 09:31 AM

    You could analyse the lineage in Information Server if you have IGC licensed. Also you have to differenciate between design-time and runtime lineage. The maintance task could also be automated via scripts.



    ------------------------------
    Udo Neumann
    ------------------------------



  • 7.  RE: Searching documentation about DataStage export and DSX or ISX

    Posted Thu December 21, 2023 09:39 AM

    Thank you for the point. I will have to choose between design-time and real time lineage.



    ------------------------------
    Jérôme Mainaud
    ------------------------------



  • 8.  RE: Searching documentation about DataStage export and DSX or ISX

    Posted Wed December 20, 2023 10:50 AM

    For design-time metadata, I just use an ISX.  An ISX is basically a ZIP file (you can literally rename it to end in .gz or .zip and whatever Zip tool you prefer opens it right up) with all your objects in individual XML files.  I back up all of my objects automatically on a daily basis then I re-extract all the job/sequence/etc descriptions out of the ISX and throw ".txt" on the end of each filename.  I then put all of those files out on a fileshare and people can use Agent Ransack to search for impact analysis stuff.

    We could probably do a lot better, but the point is that an ISX already contains your entire project, neatly bundled up and easy to understand via anything that can handle hierarchical files.  This all happens over batch.  

    I created an authorization file to protect my passwords, the syntax I use is:

     $ISHOME/Clients/istools/cli/istool.sh export -domain "$DS_SERVICES_HOSTNAME" -af "$AUTHFILE"  -archive "/datastore/backup/proj_backup/$PROJECTNAME.isx" -datastage ' -incexec "'$DS_ENGINE_HOSTNAME'/'$PROJECTNAME'/*/*.*" '

    This is in 11.7.1.4 but I have been using the same command since 8.7.

    Then you take the resulting ISX and simply extract out the comments.  Different object types will have different extensions (.pjb = Parallel Job, etc).



    ------------------------------
    Nate Hoy
    ------------------------------



  • 9.  RE: Searching documentation about DataStage export and DSX or ISX

    IBM Champion
    Posted Thu December 21, 2023 06:59 AM

    As Udo already has pointed out, basing lineage on an export (no matter if this is DSX, XML or ISX) will give you the lineage for design time and not runtime. That means, that if in your project you use shared containers (subflows in CP4D), RCP or have e.g. your tablenames parameterized in your jobs, than you are likely to not get the full data lineage.

    KR Ralf



    ------------------------------
    Ralf Martin
    Principal Consultant
    Infologistix GmbH
    Bregenz
    ------------------------------



  • 10.  RE: Searching documentation about DataStage export and DSX or ISX

    Posted Thu December 21, 2023 10:50 AM

    Good points about design time vs run-time..   BUT..  You should NOT have to CHOOSE between them.  With IBM Knowledge Catalog and Manta you can have BOTH!



    ------------------------------
    Mark Hickok
    ------------------------------