

Spark + SPSS Modeler: Boosted Trees, K-Means, and Naive Bayes

By Archive User posted Mon April 11, 2016 04:28 PM

We are excited to announce the release of three new extensions for SPSS Modeler that run algorithms implemented in Spark MLlib through PySpark: Gradient-Boosted Trees, K-Means Clustering, and Multinomial Naive Bayes. Niall McCarroll, IBM SPSS Analytic Server Software Engineer, and I developed these extensions in Modeler version 18, which can now run PySpark algorithms locally. This means that users of Modeler 18 with Server Enablement can use these extensions to build models on local data or on data distributed across a Spark cluster on Analytic Server.

  • Gradient-Boosted Trees - A supervised learning algorithm for either binary classification or regression. Learn more about the implementation here.


  • K-Means Clustering - An unsupervised clustering technique that partitions the data into a user-defined number of clusters (k). Learn more about the implementation here.


  • Multinomial Naive Bayes - A supervised variant of Naive Bayes used for classification. Its inputs should be frequencies, that is, non-negative counts. A classic example is document classification using a term-document frequency matrix. Learn more about the implementation here.


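To show what "inputs should be frequencies" means in practice, here is a minimal, self-contained Python sketch of multinomial Naive Bayes trained on term counts. It is not the MLlib implementation or the extension's code. The function names `train_multinomial_nb` and `predict` and the tiny four-word vocabulary are hypothetical, chosen only for illustration. The sketch applies additive (Laplace) smoothing of 1.0 to the per-class term probabilities and uses an unsmoothed class prior, so its scores need not match MLlib's exactly.

```python
import math
from collections import defaultdict


def train_multinomial_nb(rows, labels, smoothing=1.0):
    """Fit class log-priors and per-class term log-probabilities.

    rows: equal-length lists of non-negative term counts, one per document
    labels: one class label per row
    (Hypothetical helper for illustration; not the MLlib API.)
    """
    n_features = len(rows[0])
    docs_per_class = defaultdict(int)
    term_totals = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(rows, labels):
        if any(c < 0 for c in row):
            raise ValueError("multinomial Naive Bayes expects non-negative frequencies")
        docs_per_class[label] += 1
        totals = term_totals[label]
        for j, count in enumerate(row):
            totals[j] += count

    model = {}
    for label, totals in term_totals.items():
        # Laplace smoothing: add `smoothing` to every term count in this class.
        denom = sum(totals) + smoothing * n_features
        log_likelihood = [math.log((c + smoothing) / denom) for c in totals]
        log_prior = math.log(docs_per_class[label] / len(rows))
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one count vector."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, log_likelihood) in model.items():
        # Each term contributes count * log P(term | class).
        score = log_prior + sum(c * lp for c, lp in zip(row, log_likelihood))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy term-document frequency matrix over the vocabulary
# ["ball", "goal", "vote", "election"] (hypothetical data).
rows = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [1, 0, 2, 3]]
labels = ["sports", "sports", "politics", "politics"]

model = train_multinomial_nb(rows, labels)
print(predict(model, [2, 1, 0, 0]))  # sports
print(predict(model, [0, 0, 1, 2]))  # politics
```

Because each term contributes its count times a log-probability, a document's score depends on how often each word occurs. This is why the extension expects frequency inputs rather than arbitrary continuous values.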
Ready to get the extensions and try them out? Great! Search for the extensions by name in the Extension Hub in Modeler 18, or visit the repository for each extension: