SPSS Modeler Product Manager Tips and Tricks (2)

By SARAH DUNWORTH posted Thu September 03, 2015 11:23 AM

  
A common request I get is how to extract coefficients from regression and logistic regression model nuggets in Modeler. This is usually possible for any model that generates PMML: export the PMML, then use the XML source node to parse it and bring the coefficients back into Modeler as data. I'll first show a simple example of how you can do that with scripting, then I will show an alternative that is now possible with Modeler 17, where you don't have to export the PMML at all.

This is my stream (adapted from one of the shipped demo streams), where I am building multiple logistic regression models:

[Screenshot: 2015-09-03_114141]

 

I would like to extract the coefficients generated for each model, but in an automated fashion (i.e. I don't want to manually copy and paste the values from the tables):

[Screenshot: 2015-09-03_113831]

The script editor is accessed by going to Tools > Stream Properties > Execution:

[Screenshot: 2015-09-03_114259]

The script below locates and executes each of the Logistic build nodes in turn to generate a fresh model nugget, then exports the PMML for each model to a file location specified in the script:

from modeler.api import FileFormat

stream = modeler.script.stream()
session = modeler.script.session()
tr = session.getTaskRunner()

# Names of the models to build
targets = [u"credit1", u"credit2"]

# Folder where the PMML files are written
FOLDER = "C:/temp/models/"

def runAndExport(target):
    # Find the Logistic build node with this name and run it
    buildnode = stream.findByType("logreg", target)
    results = []
    buildnode.run(results)
    modeloutput = results[0]
    filename = FOLDER + target + ".xml"
    # print "Exporting", modeloutput, "to", filename
    tr.exportModelToFile(modeloutput, filename, FileFormat.XML)
    # Good practice to delete the model output from the palette
    session.getModelOutputManager().removeModelOutput(modeloutput)

for target in targets:
    runAndExport(target)

 

Once I have written or pasted the script into the script editor window, I can test it by clicking the "Run this Script" button:

[Screenshot: 2015-09-03_115448]

The result should be that the PMML files are generated on the file system in the directory specified in the script, but you still need to parse the PMML to obtain the coefficients.

[Screenshot: 2015-09-03_115634]
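
As a rough illustration of that parsing step, here is a minimal sketch in plain Python, run outside Modeler. It is not the XML source node approach, just a stand-in using the standard library's ElementTree. The helper names (local_name, extract_coefficients) and the trimmed sample PMML are hypothetical. The PMML namespace shown is an assumption, which is why the code matches elements by local name only.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def extract_coefficients(pmml_text):
    """Return (parameterName, beta) pairs for every PCell element found."""
    root = ET.fromstring(pmml_text)
    rows = []
    for elem in root.iter():
        if local_name(elem.tag) == "PCell":
            rows.append((elem.get("parameterName"), float(elem.get("beta"))))
    return rows

# Hypothetical, heavily trimmed PMML fragment for illustration only;
# a real export contains many more elements and attributes.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <GeneralRegressionModel>
    <ParamMatrix>
      <PCell parameterName="P0000001" beta="-1.25" df="1"/>
      <PCell parameterName="P0000002" beta="0.5" df="1"/>
    </ParamMatrix>
  </GeneralRegressionModel>
</PMML>"""

coefficients = extract_coefficients(SAMPLE_PMML)
```

To read one of the exported files instead of a string, ET.parse(filename).getroot() can replace the ET.fromstring call.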

 

With Modeler 17, we added support for a PMML content model, which means the PMML can be manipulated directly within the Python script. The script below uses this new method.

 

from modeler.api import FileFormat

stream = modeler.script.stream()
session = modeler.script.session()

targets = [u"credit1", u"credit2"]

def runAndExtract(target):
    buildnode = stream.findByType("logreg", target)
    results = []
    buildnode.run(results)
    modeloutput = results[0]
    # Query the model's PMML directly through its content model
    cm = modeloutput.getContentModel("PMML")
    # Collect the parameterName and beta attributes of each PCell element
    values = cm.getValuesList("//ParamMatrix/PCell", ["parameterName", "beta"], False)
    # Good practice to delete the model output from the palette
    session.getModelOutputManager().removeModelOutput(modeloutput)
    return values

for target in targets:
    print "Coefficients for", target, "=", runAndExtract(target)

 

The result of executing this script is that the required coefficients can now be referenced within the script and used however they are needed - in this case, they are just being printed to the debug window.

[Screenshot: 2015-09-03_120602]
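
As one example of using the coefficients rather than just printing them, the sketch below writes them out as CSV. It is standalone Python 3 so that it can be tried outside Modeler; inside Modeler's own scripting environment the details may differ. It assumes that each set of extracted values is a list of rows, with each row holding the parameterName and beta in the order requested. That shape is an assumption here, and the coefficients_to_csv helper and the sample values are hypothetical.

```python
import csv
import io

def coefficients_to_csv(values_by_target, out):
    """Write one CSV row per (target, parameterName, beta) to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["target", "parameterName", "beta"])
    for target in sorted(values_by_target):
        for name, beta in values_by_target[target]:
            writer.writerow([target, name, beta])

# Hypothetical values standing in for the extracted coefficients;
# the row shape [parameterName, beta] is an assumption.
sample = {
    "credit1": [["P0000001", "-1.25"], ["P0000002", "0.5"]],
    "credit2": [["P0000001", "0.75"]],
}

buffer = io.StringIO()
coefficients_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

Writing to a file-like object keeps the helper easy to test; passing an opened file instead of the StringIO buffer would put the same rows on disk.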

 

Some tips for adapting this script:

- If you have a different model type: You would need to change "logreg" to that model type. If you don't know how to refer to the model type, you can copy the node (CTRL-C), then press CTRL-SHIFT-P (described in my previous post) within the script editor to learn how to refer to that model type.

- Changing the name/number of models: Make sure each model is given a unique name (i.e. don't use Auto naming), and just refer to them in the list of targets, e.g. targets = [u"credit1", u"credit2", u"credit3", u"modelxyz"]

 

 

For more information on accessing results through these content models, refer to the documentation:

http://www-01.ibm.com/support/knowledgecenter/SS3RA7_17.0.0/clementine/scripting_accessresults.dita

 

Thanks to Julian Clinton who provided the scripts for this post.



#scripting
#SPSSModeler
#WatsonStudio

Comments

Thu January 31, 2019 11:56 AM

I've tried to do it on the Timeseries node, but the .xml file is empty. Why?
My version is SPSS Modeler 18.
Thanks in advance

Thu June 08, 2017 02:34 PM

Hi, thanks for your post.
I get this error with your code:

values = cm.getValuesList(“//ParamMatrix/PCell”, [“parameterName”, “beta”], False)--> Error: AEQMJ0100E: scripts error: invalid byte 2 of 4-byte UTF-8 sequence

I tried to fix it but I can't.

It's SPSS Modeler 18. Can you help me? Thanks

Tue September 08, 2015 08:34 PM

Oh, I found the R2 in the PMML, but only for the Linear node, and not for the Regression node (which had a few outputs only).

Tue September 08, 2015 03:53 PM

What about the R2? It is not included in this model output.