Today we are releasing Modeler version 18. There are quite a number of important changes and improvements in this version, which fall into four groups: Big Data Algorithms in Modeler, changes that continue to Extend and Embrace the Value of Open Source, Platform Flexibility, and Other Changes.
Big Data Algorithms in Modeler
Over the past year, a number of algorithms were added to Modeler with the restriction that they could run only with Analytic Server, the connector from Modeler to Hadoop. In version 18, all six of these algorithms are available in Modeler with any type of data. The algorithms are:
• Random Trees, a method popular in the data science community that builds many C&R Trees on bootstrap samples of the records (bagging) and considers only a random sample of the predictors at each split
• Tree-AS, which is based on CHAID
• GLE, which incorporates a number of regression methods
• Linear-AS, which performs linear regression
• Linear Support Vector Machines
• Two-Step-AS clustering
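To make the Random Trees description more concrete, here is a minimal pure-Python sketch of the general random-forest recipe: each tree is grown on a bootstrap sample of the records, and each split considers only a random subset of the predictors. It is not Modeler's implementation; the function names, the Gini split criterion, the depth limit and the majority vote are assumptions chosen for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(rows, labels, n_features, depth, rng):
    """Grow a binary tree; leaves are class labels (assumed not to be tuples)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # Only a random subset of predictors (drawn without replacement) is tried here.
    candidates = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, candidates)
    if split is None:
        return majority
    _, f, t = split
    go_left = [r[f] <= t for r in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_labels = [y for y, g in zip(labels, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_labels = [y for y, g in zip(labels, go_left) if not g]
    return (f, t,
            grow_tree(left_rows, left_labels, n_features, depth - 1, rng),
            grow_tree(right_rows, right_labels, n_features, depth - 1, rng))

def predict_tree(node, row):
    """Walk to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def random_trees(rows, labels, n_trees=25, n_features=1, depth=3, seed=0):
    """Bagging: each tree sees a bootstrap sample (records drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                n_features, depth, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees in the ensemble."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy data: two predictors; the class is "high" when the values are large.
    rows = [(i, 2 * i) for i in range(20)]
    labels = ["low" if i < 10 else "high" for i in range(20)]
    forest = random_trees(rows, labels, n_trees=15, depth=2, seed=1)
    print(predict_forest(forest, (0, 0)), predict_forest(forest, (19, 38)))
```

Sampling predictors at each split decorrelates the trees, which is why the vote of many trees tends to be more stable than a single C&R Tree.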
An important feature of all these algorithms is that they are multi-threaded, i.e. a single model build can use more than one core. This shortens build times for large data sets and makes better use of the available hardware. GLE and Linear SVM support regularization, which prevents overfitting by penalizing models with extreme parameter values. Finally, Tree-AS and Linear SVM perform data preparation behind the scenes that automatically handles common data issues.
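The idea behind regularization can be seen in the simplest possible case. The sketch below is an illustration only, not how GLE or Linear SVM implement it: it assumes an L2 (ridge) penalty, a single predictor and no intercept, and the function names are hypothetical. Minimizing the squared error plus `lam * b**2` gives a closed-form slope that shrinks toward zero as the penalty grows.

```python
def ridge_slope(xs, ys, lam):
    """Slope b minimizing sum((y - b*x)^2) + lam * b^2 (no intercept).

    Setting the derivative to zero gives b = sum(x*y) / (sum(x^2) + lam),
    so a larger penalty lam pulls b toward zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def penalized_loss(xs, ys, b, lam):
    """Squared error plus the L2 penalty on the coefficient."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) + lam * b * b

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 3.9, 6.2, 7.8]
    for lam in (0.0, 10.0, 100.0):
        print(lam, ridge_slope(xs, ys, lam))
```

With this toy data the slope falls from about 1.99 with no penalty to roughly 1.49 and 0.46 as `lam` increases, which is the "penalizing extreme parameter values" effect in miniature.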
We have also added a big data algorithm in Modeler version 18 that was not present in version 17.1: a new version of the time series algorithm. Like the old version, it supports three forecasting methods: exponential smoothing, ARIMA and the Expert Modeler. In version 18, time series runs in Analytic Server and supports multi-threading. The new algorithm also supports split modeling. In Modeler, a variable can be defined as a split variable in the Type node, so that supported algorithms produce a separate model for each split. With version 18, time series joins this list of supported algorithms.
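Split modeling amounts to grouping the records by the split variable and fitting one model per group. The sketch below shows that pattern with simple exponential smoothing, the most basic member of the exponential smoothing family. It is an illustration, not Modeler's procedure: the field names, the smoothing constant `alpha`, the assumption that records arrive in time order and the choice to start the level at the first observation are all hypothetical.

```python
from collections import defaultdict

def simple_exp_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the first value.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def forecast_by_split(records, split_field, value_field, alpha=0.5):
    """Fit one smoothing model per value of the split field (records in time order)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[split_field]].append(rec[value_field])
    return {key: simple_exp_smoothing(values, alpha) for key, values in groups.items()}

if __name__ == "__main__":
    records = [
        {"region": "east", "sales": 10},
        {"region": "west", "sales": 100},
        {"region": "east", "sales": 20},
        {"region": "west", "sales": 80},
    ]
    print(forecast_by_split(records, "region", "sales", alpha=0.5))
```

Because each group is modeled independently, the per-split fits are also a natural unit of work to spread across cores.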
Extend and Embrace the Value of Open Source
For many years we have been extending and embracing the value of open source. As you can see in this community, we have many open source extensions that let non-programmers run open source programs for tasks ranging from modeling to graphing to accessing different types of data. We started in version 16 with R extensions. In version 17.1, we added Python with Spark extensions, but they had to run in Analytic Server. Now, with version 18, Python with Spark extensions run natively in Modeler. We have also included Spark in the Modeler download so that any Python code can access the Spark machine learning libraries; note that Python 2.x must be installed separately. The distribution we used in testing is Anaconda, available at https://www.continuum.io/downloads.
With this change, all Modeler users can run Python extensions. These extensions can invoke the Spark machine learning libraries, which include many algorithms not found in Modeler, such as gradient boosted trees. If the appropriate Python libraries are installed, data scientists can also use common Python libraries such as NumPy, SciPy, scikit-learn and pandas.
We have also made it easier to get extensions from the community. Using the new Extensions menu, Modeler users can open the Extension Hub, where they can find, download and install extensions without having to go to GitHub and transfer files manually.
Platform Flexibility
We have added links in the Help menu to this community, in particular to the forums and the community help page.
Modeler Personal and Professional will be available on Mac OS with version 18. In addition, all editions of Modeler 18 support Windows 10.
Other Changes
Modeler 18 extends its in-database mining capabilities to DB2 for z/OS or IDAA (IBM DB2 Analytics Accelerator). Using a graphical interface, Modeler customers can now build and deploy models with the Decision Tree, Regression Tree, K-Means, Naive Bayes and Two-Step algorithms.
Modeler Premium now includes additional entity analytics capabilities, including the ability to use an external DB2 repository, to use more than four cores and to expose relationships. Please note, though, that usage for more than 10 million records is no longer recommended.