SPSS Statistics


SPSS Extends Spark Support to Cloudera & MapR

By Archive User posted Mon December 21, 2015 09:17 PM

IBM this week released an update to IBM SPSS Analytic Server 2.1 that extends SPSS support for Apache Spark to all four leading commercial Apache Hadoop distributions.

The update builds on the Spark integration first released in late September, which supported IBM BigInsights and Hortonworks HDP, by adding support for Cloudera CDH and MapR.

Spark integration delivers a number of benefits to users conducting predictive analytics with IBM SPSS Modeler in these Hadoop environments, including:

Running analytics faster
Complex workloads complete significantly faster in Spark than in Hadoop Map/Reduce. The gain is greatest for workloads that contain iterative operations or chain multiple operations together, because Spark caches data in memory instead of writing it to and reading it back from disk for each task. If Spark is present, Analytic Server automatically uses it to process the Modeler operations that were previously pushed down to Map/Reduce; if Spark is not present, processing falls back to Map/Reduce. All of this happens transparently, so users get the best available performance without writing code or specifying where their jobs will run.

Enabling users to be more productive
Because Spark processes jobs faster and more efficiently, users spend less time waiting for the system. They can build predictive models faster, run more experiments in less time, and work on several models in parallel. As a result, organizations can address more use cases more quickly, shortening time to value and increasing the ROI of their predictive analytics investment.

Democratizing predictive analytics
SPSS is known for extending the benefits of predictive analytics to users who do not want to program. Integration with Spark takes this advantage to an entirely new level. SPSS Modeler users now have access to a broader library of analytic algorithms that addresses even more use cases. In addition to the SPSS algorithms that now run in Spark, data scientists can build extensions that use Python or any of the more than 15 algorithms in Spark’s MLlib machine learning library, and share those extensions with colleagues who do not program.

To help you get started, we’ve posted two sample Spark MLlib extensions, for Collaborative Filtering and PageRank, to the IBM SPSS Predictive Analytics Gallery, where they are freely available for download.
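The gallery extensions themselves are not reproduced here. To show why an iterative algorithm like PageRank benefits from the in-memory processing described earlier, the following is a minimal single-machine sketch of PageRank's power iteration in plain Python. It is not the gallery extension, not MLlib code, and does not use Spark; the function name `pagerank`, the damping factor of 0.85, and the iteration count are illustrative assumptions. Each iteration consumes the full set of ranks produced by the previous one. Map/Reduce would write that set to disk and read it back on every pass, while Spark can keep it cached in memory.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Illustrative power-iteration PageRank (hypothetical helper, not the gallery extension).

    links: dict mapping each page to a list of the pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Pages with no outlinks ("dangling" pages) spread their rank evenly over all pages.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        # Every page starts with the teleport share plus its share of the dangling mass.
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page then splits its damped rank equally among the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * ranks[src] / len(targets)
                for dst in targets:
                    new_ranks[dst] += share
        # This per-iteration state is what Spark can cache in memory between passes.
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 4))
```

In a Spark deployment, the rank table would be a distributed dataset cached across iterations rather than a Python dict, but the per-iteration logic follows the same pattern.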

Beyond the advantages of Spark integration, customers who run Cloudera and MapR now also have access to the expanded set of predictive analytics algorithms included in Analytic Server 2.1. Specifically, Analytic Server 2.1 added Random Trees, CHAID, Linear, Generalized Linear, and Linear Support Vector Machine algorithms to the library of big-data-enabled algorithms previously supported in SPSS Modeler.

For a detailed overview of all the big data algorithms released in SPSS during 2015, see Steve Barbee’s recent post, "SPSS Algorithms Optimized for Apache Spark & Spark Algorithms Extending SPSS Modeler."