Cloud Pak for Data

How to Retrieve the Output of a Notebook Node and Use It in Pipeline Conditions in IBM Cloud Pak for Data?

Sara A posted Sun December 01, 2024 05:50 AM

I am working on a pipeline in IBM Cloud Pak for Data and need help understanding how to view and use the output of a Notebook node. My goal is to create a conditional execution path in the pipeline based on the Notebook's results.

How can I inspect the runtime output of a Notebook node after it runs in the pipeline?
Is it possible to access and pass the Notebook output directly to a condition or another part of the pipeline? If so, how?

What I’ve tried so far:

Environment Variables & Bash Scripts: I attempted to set environment variables and use Bash scripts for passing data between nodes, but this approach resulted in errors.
Pipeline Parameters: I tried using pipeline parameters to share data across nodes but couldn’t figure out how to directly link these parameters to the Notebook node output.
Notebook Configuration Output: I explored configuring the Notebook node Output, but I couldn’t figure out how to link it to a variable from the notebook itself.

[Screenshot: Pipeline start node and downstream]

[Screenshot: Node Configuration Output]

Ralf Martin IBM Champion posted Mon December 09, 2024 02:35 AM

Hi Sara,

I have not yet worked with the Notebook node in Watson Pipelines, but have you tried accessing an attribute from your notebook in the expression editor of, e.g., a "Set user variable" node? There, in the expression editor, you can access tasks.Nodename.results.Option (where Option could be return_value). As of 5.0.3 this is not yet offered in the Expression elements, but you can reach it via the typeahead if you know the keywords:

Pipeline parameters: params.
Variables: vars.
Values from stages: tasks.

I hope this helps.

Sara A posted Mon December 09, 2024 06:51 AM

Hi @Ralf Martin,

Thank you for taking the time to respond to my question. I appreciate your guidance.
I wanted to provide more details about what I’ve tried so far, as I realized I didn’t explain everything thoroughly in my original question:

1. I tried using expressions in the condition to access the notebook variable value. This approach didn’t work for me.

The condition expression:

tasks.run_notebook_job.results == "API"

2. I attempted to save the notebook output from Python as a file and access it using either a condition expression or a pipeline parameter of type "cpd"-->"file" pointing to the notebook output file path.

The condition expression:

tasks.run_notebook_job.results.outputFile.contains("API")

note: outputFile is an environment variable declared in the pipeline notebook node output configuration

The condition expression using the pipeline parameter:

params.OUTPUTFILE.contains("API")

3. Regarding your suggestion, I followed it as best as I understood.

I created a user variable.
I used an expression to access the notebook variable, tasks.run_notebook_job.results. I also tried the other options you mentioned, such as return_value or the name of the variable in the notebook, but I get an error.
I then tried using a condition expression to access these variables via vars.

Note: in this case the Python code has a variable of type int (1 --> api).

The error:

I’m still unsure how to access the Python variable in my notebook through the pipeline, whether as an output file, a variable, an environment variable, or something else. Could you please clarify this point or guide me further if I misunderstood your suggestion?

I’m new to Cloud Pak for Data, so I’m trying to explore and make the best use of the options available while also checking the documentation. I’d be really grateful if you could share more details or guide me further. I value your experience and advice. Whenever you have a moment, would you be open to trying this out and sharing any insights or suggestions you may have?

Ralf Martin IBM Champion posted Mon December 09, 2024 09:48 AM

Hi Sara,

On my side I cannot check this with a Notebook node (honestly, I do not know what it is supposed to do; I use Watson Pipelines solely in a DataStage context), but in general you should be able to access

tasks.NodeName.results.return_value : Return code of your task
tasks.NodeName.results.standard_output : What your task writes to the screen

So if you do an echo/print in your task, you should be able to capture it with tasks.NodeName.results.standard_output.

On pipeline parameters and variables: as far as I know, these live in the context of your pipeline. They can be handed over to, or referenced in, tasks called from it (e.g. in a Run Bash Script node), but you cannot manipulate them there (e.g. change the value of a pipeline variable from your Python script).

See an example where I access two pipeline parameters in a Run Bash Script node. They first have to be made available to the node in its Environment Variables (optional) section. The script does an echo of the two parameters.
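If the Bash script in such a node starts a Python script, the same values should be readable from the process environment, assuming the node exports them to child processes. The following is only a minimal sketch of that idea: the variable name SOURCE_TYPE and both helper functions are hypothetical and not taken from this thread.

```python
import os


def read_pipeline_value(name, default=""):
    """Return an environment variable's value with surrounding whitespace removed.

    `name` is whatever key was entered in the node's Environment Variables
    section; the key used below is a made-up example.
    """
    return os.environ.get(name, default).strip()


def describe_source(value):
    """Reduce the raw value to one token that a condition can compare against."""
    return "API" if value.lower() == "api" else "OTHER"


if __name__ == "__main__":
    # SOURCE_TYPE is an assumed name for illustration only.
    print(describe_source(read_pipeline_value("SOURCE_TYPE")))
```

This only reads values. It does not write anything back to the pipeline, which matches the point above that parameters and variables cannot be changed from inside the task.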

I developed this small pipeline in legacy DataStage and imported it into CP4D. After one of the two Run Bash Script nodes I have UVA_2, where I referenced the Command Output and Return Value; this was translated to the following in CP4D (I guess for compatibility reasons):

ds.GetCommandOutput(tasks.exec1) + '_' + string(ds.GetReturnValue(tasks.exec1))

I guess this is the same as

tasks.exec1.results.standard_output+ '_' +tasks.exec1.results.return_value

You do not have to do this in a Set user variable node; you can do it anywhere you have an expression editor (such as an advanced condition on a link connecting two nodes).

If what you run is a Python script, you can also do this in a Run Bash Script node. Or, if your Python script writes to a flat file, you can do a cat on that file in a Run Bash Script node and again reference the standard_output of that node.
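As a local analogy only (this is plain Python, not pipeline code), the two result fields discussed above behave much like the captured stdout and exit code of a subprocess. The sketch below builds the same kind of "output_returncode" string as the expression above; the function name run_and_summarize is made up for illustration.

```python
import subprocess
import sys


def run_and_summarize(cmd):
    """Run a command and join its stdout and exit code with an underscore.

    The exit code is an int, so it has to be converted to a string before
    concatenation, similar to the string(...) call in the imported expression.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True)
    # strip() drops the trailing newline that print/echo normally appends.
    return completed.stdout.strip() + "_" + str(completed.returncode)


if __name__ == "__main__":
    # Uses the current Python interpreter as a stand-in for the real task.
    print(run_and_summarize([sys.executable, "-c", "print('API')"]))
```

Whether the pipeline trims the trailing newline from standard_output is not confirmed in this thread, so a contains-style check may be more forgiving than an exact comparison.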

Sara A posted Tue December 10, 2024 05:38 AM

Hi @Ralf Martin,

Thanks for taking the time to answer me. Let me keep it simple.

Say I have Python code that reads a file, and the file has a single-line value, e.g. 'API':

with open('/project_data/data_asset/output.txt') as f:
    contents = f.read()

print(contents)

So now print(contents) will output 'API'.

This is the code I will use in the pipeline's Python script job node.

What I struggle to resolve is how I can access the variable "contents" in the pipeline and use it in the condition.
I don't see it in the options list of the CEL expression. I know there is a way to access the job output, since the documentation mentions it, but I can't figure out how.

Documentation:

tasks.run_datastage_job.results.score

Gets the value score of job output

I am trying to use this structure in the condition, but I don't know how to access the job output, by which I mean the notebook variable:

tasks.run_script_job.results.??? == 'api'

Then I tried to configure the Python script node output and added this, but it still looks like it's not reading my variable:

Ralf Martin IBM Champion posted Tue December 10, 2024 06:21 AM

Hi Sara,

I guess that

print(contents)

will print "API" to stdout.

In both my CP4D and my private DaaS, I only have a "Run Bash script" node and not a "Run script job". On the Output tab of the "Run Bash script" node, "Standard Output" and "Return Value" are defined by default, and "Standard Output" can easily be referenced with

tasks.exec1.results.standard_output

So, can you run your Python script from within a "Run Bash script" node? Then you should be able to access your STDOUT.

Or, what I think is easier, do the following in a "Run Bash script" (just use shell instead of python)

cat '/project_data/data_asset/output.txt'

This prints the content of your file to STDOUT.
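If the Python route is preferred over cat, a minimal sketch of a script that could be called from a "Run Bash script" node might look like the following. It assumes the file holds a single value such as 'API'; the function names are hypothetical. It strips the trailing newline before writing to STDOUT and uses the exit code to signal a read failure, so both standard_output and return_value carry something useful.

```python
import sys
from pathlib import Path


def read_marker(path):
    """Return the file's contents without leading or trailing whitespace."""
    return Path(path).read_text(encoding="utf-8").strip()


def main(path):
    """Write the marker to stdout; return 0 on success and 1 if the file is unreadable."""
    try:
        marker = read_marker(path)
    except OSError as exc:
        # Errors go to stderr so they do not pollute the captured stdout.
        print(f"could not read {path}: {exc}", file=sys.stderr)
        return 1
    # write() instead of print() avoids adding a newline to the output.
    sys.stdout.write(marker)
    return 0


if __name__ == "__main__":
    # Path taken from the example earlier in this thread.
    sys.exit(main("/project_data/data_asset/output.txt"))
```

A file saved with a trailing newline would otherwise make an exact comparison such as == 'API' fail, which is one possible reason a condition does not match even when the printed value looks right.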
