SPSS Modeler 18.1 introduced five new nodes that allow you to embed Python and R code in a Modeler stream. With these nodes you can extend SPSS Modeler with open source and perform tasks you can't easily accomplish with out-of-the-box Modeler nodes.
| Node | Description |
| --- | --- |
| Extension Import | Run R or Python scripts to import data |
| Extension Transform | Take data from a node upstream and apply transformations to it using R or Python scripting |
| Extension Model | Run R or Python scripts to build and score models |
| Extension Output | Run R or Python scripts to display text and graphical outputs on screen or write them to file |
| Extension Export | Run R or Python scripts to export data |
IBM SPSS Modeler Extension Nodes – Python

Python Execution Environment
- Data will be presented in the form of a Spark DataFrame
- The IBM SPSS Modeler installation includes a Spark distribution (for example, IBM SPSS Modeler 18.1.1 includes Spark 2.1.0)
- IBM SPSS Modeler includes a Python distribution
- If you plan to execute Python/Spark scripts against IBM SPSS Analytic Server, you must have a connection to Analytic Server, and Analytic Server must have access to a compatible installation of Apache Spark
Programming Framework – Python

In general, to pass data from an extension node to a node downstream, you must set the schema (data model) of the output and set the data itself in a Spark DataFrame.
The Python code does not run only when the node executes; it runs any time the node's output data model is requested, for example when the Apply or OK button is clicked in the node dialog, when tabs are switched, or when the node is connected. Hence, the output data model must be defined up front, before execution, when the data is not yet available to be passed downstream. See the sample code below.
The second half of the sample code builds the output dataframe when the node executes. The syntax shown in blue in the original article (the spss.pyspark.runtime context calls) is SPSS Modeler specific.
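A minimal sketch of this framework follows (the added field NewField and its constant value are illustrative assumptions, not the stream's actual code):

import spss.pyspark.runtime
from pyspark.sql.types import *
from pyspark.sql.functions import lit

asContext = spss.pyspark.runtime.getContext()
inputSchema = asContext.getSparkInputSchema()

# first half: define the output data model, so it is available even before execution
outputSchema = StructType(inputSchema.fields + [StructField('NewField', StringType(), True)])
asContext.setSparkOutputSchema(outputSchema)

# second half: build the output dataframe only when the node actually executes
if not asContext.isComputeDataModelOnly():
    inputData = asContext.getSparkInputData()
    outputData = inputData.withColumn('NewField', lit('example'))
    asContext.setSparkOutputData(outputData)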
Extension Transform Node – Python

With the Extension Transform node, you can take data from a node upstream and apply transformations to the data. In this example, Extension_Transform_Example_Python.str, we will apply the regexp_extract() function to extract the error code from a string.
import spss.pyspark.runtime
from pyspark.sql.types import *
from pyspark.sql.functions import regexp_extract

pattern = r'Program Ref\. \/[A-Z0-9_]+\/([A-Za-z0-9_]+)\/'

asContext = spss.pyspark.runtime.getContext()
inputSchema = asContext.getSparkInputSchema()

# get the existing schema and add a new field to create the output schema
outputSchema = StructType(inputSchema.fields + [StructField('Code', StringType(), True)])
asContext.setSparkOutputSchema(outputSchema)

if not asContext.isComputeDataModelOnly():
    # get the input dataframe and add the extracted code as a new column
    inputData = asContext.getSparkInputData()
    outputData = inputData.withColumn('Code', regexp_extract("Event Description", pattern, 1))
    asContext.setSparkOutputData(outputData)
Extension Import Node – Python

With the Extension Import node, you can execute Python scripts to import data. In this example, Extension_Import_Example_Python.str, we will import a CSV file from a GitHub repository. Since this is a source node, you must construct the output schema from scratch. The CSV file has two fields.
import wget
import spss.pyspark.runtime
from pyspark.sql.types import *

asContext = spss.pyspark.runtime.getContext()
sqlContext = asContext.getSparkSQLContext()

# construct the output schema from scratch
fieldList = [
    StructField('ID', IntegerType(), True),
    StructField('CHURN', StringType(), True)
]
outputSchema = StructType(fieldList)
asContext.setSparkOutputSchema(outputSchema)

if not asContext.isComputeDataModelOnly():
    # download the file from GitHub
    url_churn = 'https://raw.githubusercontent.com/SidneyPhoon/Data/master/churn.csv'
    wget.download(url_churn, "/Data")
    # read the file
    churn_df = sqlContext.read.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
        .option('header', 'true').option("inferSchema", "true").load("/Data/churn.csv")
    # return the output DataFrame as the result
    asContext.setSparkOutputData(churn_df)
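As an aside, with the Spark 2.x DataFrameReader the same read can be written more compactly; an equivalent sketch:

# shorthand for the long-form CSV format class above
churn_df = sqlContext.read.option('header', 'true') \
    .option('inferSchema', 'true').csv('/Data/churn.csv')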
Extension Model Node – Python

Build and score models with the Extension Model node. In this example, Extension_Model_Example_Python.str, we will build a Spark ML RandomForestClassifier model to predict Mortgage Default.
In the extension node, review the Python model-building syntax; the code builds the model and then saves it.
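The complete syntax is in the node itself; as a minimal sketch of the model-building step (the feature and target field names below are hypothetical, not the stream's actual fields):

import spss.pyspark.runtime
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import StringIndexer, VectorAssembler

ascontext = spss.pyspark.runtime.getContext()
# assemble hypothetical predictor fields into a single feature vector
assembler = VectorAssembler(inputCols=['Income', 'DebtIncomeRatio'], outputCol='features')
# index the hypothetical string target into a numeric label
indexer = StringIndexer(inputCol='Default', outputCol='label')
rf = RandomForestClassifier(labelCol='label', featuresCol='features', numTrees=20)
pipeline = Pipeline(stages=[assembler, indexer, rf])
model = pipeline.fit(ascontext.getSparkInputData())

Once fitted, the model is saved to a temporary folder and attached to the model nugget: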
modelpath = ascontext.createTemporaryFolder()
model.save(modelpath)
# register the saved model with the nugget under the key "model"
ascontext.setModelContentFromPath("model", modelpath)
Review the Python model-scoring syntax; the code builds the output schema to include the new fields to be passed downstream, then loads the model and scores the data in the input dataframe.
from pyspark.ml import PipelineModel

indf = ascontext.getSparkInputData()
# retrieve the saved model content and load the pipeline
model_path = ascontext.getModelContentToPath("model")
model = PipelineModel.load(model_path)
# compute the scores
r1 = model.transform(indf)
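The scoring script then passes the score fields downstream; a minimal sketch of that final step (the renamed column $R-Default is a hypothetical name, and the real script must match the output schema it declared):

# return the scored dataframe downstream; column names depend on the pipeline stages
outdf = r1.withColumnRenamed('prediction', '$R-Default')
ascontext.setSparkOutputData(outdf)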
Extension Output Node – Python

With the Extension Output node, you can display text and graphical outputs on screen or write them to file. In this example, Extension_Model_Example_Python.str, the Python syntax converts the Spark dataframe into a Pandas dataframe, applies describe() to the Pandas dataframe, and prints the results. Since this is a terminal node, there is no need to set an output schema or output dataframe.
indf = ascontext.getSparkInputData()
# display summary statistics
print(indf.toPandas().describe())
IBM SPSS Modeler Extension Nodes – R

R Execution Environment
- You must have installed SPSS Modeler Essentials for R and the supported version of R before running R code in the extension nodes (see ModelerRInstall.pdf)
- Each R extension node runs in its own independent global R environment. Therefore, using functions from an R library in two separate R extension nodes requires loading the library in both R scripts (see the example below)
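For example, if the scripts in two different extension nodes both used an R package (stringr here is purely illustrative), each script would have to load it itself:

# repeat in every extension node script that uses the package
library(stringr)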
Programming Framework – R

There are three important reserved R objects:
- modelerDataModel – defines the data model (schema) of the R dataframe
- modelerData – defines the R dataframe to be passed downstream to the next node
- modelerModel – defines the R model built in the Extension Model node
In general, to pass data from an extension node to a node downstream, you must set the schema of the data to the reserved word modelerDataModel and set the data, in an R data frame, to the reserved word modelerData. The syntax in blue (the reserved Modeler objects) is SPSS Modeler specific.
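A minimal sketch of the pattern (the field Score and its values are illustrative assumptions):

# set the data to pass downstream in the reserved data frame
modelerData <- data.frame(Score = c(1.5, 2.5))
# describe the field's metadata and set the reserved data model
Score <- c(fieldName="Score", fieldLabel="", fieldStorage="real", fieldMeasure="", fieldFormat="", fieldRole="")
modelerDataModel <- data.frame(Score)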
Extension Import Node – R

With the Extension Import node, you can execute R scripts to import data. In this example, Extension_Import_Example_R.str, we will import a CSV file from a GitHub repository.
URL <- "https://raw.githubusercontent.com/SidneyPhoon/Data/master/churn.csv"
destfile <- "/Data/churn.csv"
download.file(URL, destfile)
churn_df <- read.csv(file="/Data/churn.csv", header=TRUE, sep=",")
# set the imported data to the reserved data frame
modelerData <- churn_df
# define the metadata of each field
ID <- c(fieldName="ID", fieldLabel="ID", fieldStorage="integer", fieldMeasure="", fieldFormat="", fieldRole="")
CHURN <- c(fieldName="CHURN", fieldLabel="CHURN", fieldStorage="string", fieldMeasure="", fieldFormat="", fieldRole="")
# construct the output data model from the field metadata
modelerDataModel <- data.frame(ID, CHURN)
Extension Transform Node – R

With the Extension Transform node, you can take data from a node upstream and apply transformations to the data. In this example, Extension_Transform_Example_R.str, we will retrieve weather data from the Weather Underground website and pass it downstream. Review the R syntax in the extension node: it builds the output dataframe and sets it to the reserved word modelerData, then creates the new fields and the corresponding modelerDataModel.
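The stream contains the full script; a minimal sketch of the general pattern for adding a field in a transform node (the field names and the computation are hypothetical):

# compute a hypothetical new field on the incoming data
modelerData$TempF <- modelerData$TempC * 9/5 + 32
# append the new field's metadata to the existing data model
TempF <- c(fieldName="TempF", fieldLabel="", fieldStorage="real", fieldMeasure="", fieldFormat="", fieldRole="")
modelerDataModel <- data.frame(modelerDataModel, TempF)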
Extension Model Node – R

Build and score models with the Extension Model node. In this example, Extension_Model_Example_R.str, we will build a Logistic Regression model to predict Loan Default. The model is set to the reserved word modelerModel. In the extension node, review the R model-building syntax:
modelerModel <- glm(default~employ+address+debtinc+creddebt+alldebt,data= modelerData,family=binomial())
Review the R model-scoring syntax; it scores the data and extends the output schema with the new field, defaultpropensity.
result <- predict(modelerModel, type="response")  # predicted propensity values
modelerData <- cbind(modelerData, result)
# append the new field's metadata to the existing data model
var1 <- c(fieldName="defaultpropensity", fieldLabel="", fieldStorage="real", fieldMeasure="", fieldFormat="", fieldRole="")
modelerDataModel <- data.frame(modelerDataModel, var1)
Extension Output Node – R

With the Extension Output node, you can display text and graphical outputs on screen or write them to file. In this example, Extension_Model_Example_R.str, we aggregate the debt-to-income ratio (debtinc) and plot the value by "Default Group". Since this is a terminal node, there is no need to set an output data model or output dataframe.
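A minimal sketch of what such a script might look like (the grouping field default is a hypothetical name):

# aggregate mean debt-to-income ratio by default group
agg <- aggregate(debtinc ~ default, data = modelerData, FUN = mean)
# plot the aggregated values
barplot(agg$debtinc, names.arg = agg$default, xlab = "Default Group", ylab = "Mean debtinc")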
Troubleshooting Tips

Write and debug Python/R code in an IDE such as Jupyter Notebook or RStudio. Paste the working code into the extension nodes and set the necessary output data model, output dataframe, and model variables.
If you are running the Modeler stream on Modeler Server, you must install SPSS Modeler Essentials for R and the supported version of R on Modeler Server before running R code in the extension nodes.