Watson Studio


Run decision trees on Big Data

By Armand Ruiz posted Wed October 14, 2015 09:25 PM

To close this series of posts about the new algorithms in IBM SPSS Modeler 17.1, today it is Tree-AS's turn. The Tree-AS node can be used with data in a distributed environment to build CHAID decision trees, using chi-square statistics to identify optimal splits.
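To make the chi-square split criterion concrete, here is a minimal Python sketch of the core idea: cross-tabulate each candidate predictor against the target, compute Pearson's chi-square statistic, and pick the predictor with the smallest p-value. It is a simplification, not the Tree-AS implementation. Real CHAID also merges predictor categories that do not differ significantly and applies a Bonferroni adjustment, both omitted here. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def chi_square(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for predictor xs vs target ys."""
    table = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)
    total = len(xs)
    stat = 0.0
    for x, row_total in rows.items():
        for y, col_total in cols.items():
            expected = row_total * col_total / total
            stat += (table[(x, y)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)

def chi2_sf(stat, df):
    """Upper-tail p-value of the chi-square distribution.

    Uses the series expansion of the regularized lower incomplete gamma
    function; adequate for illustration, use a statistics library in practice.
    """
    if df <= 0 or stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a
    for n in range(1, 10_000):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = math.exp(a * math.log(x) - x - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - lower)

def best_split(predictors, target):
    """Return the predictor whose chi-square test has the smallest p-value."""
    p_values = {}
    for name, values in predictors.items():
        stat, df = chi_square(values, target)
        p_values[name] = chi2_sf(stat, df)
    best = min(p_values, key=p_values.get)
    return best, p_values[best]

# Hypothetical toy data: "region" is strongly associated with the target, "noise" is not.
target = ["yes", "yes", "no", "no"] * 10
predictors = {
    "region": ["a", "a", "b", "b"] * 10,
    "noise": ["x", "y", "x", "y"] * 10,
}
print(best_split(predictors, target))
```

Comparing p-values rather than raw statistics matters because predictors with more categories have more degrees of freedom, so their raw chi-square values are not directly comparable.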

The pre-existing tree algorithms (CHAID, QUEST and C&RT) can also be used with Analytic Server, but only through PSM (pass, stream, merge), to build multi-threaded split models or averaged ensemble models. Tree-AS, by contrast, truly parallelizes the building of a single model.
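The chi-square statistics behind CHAID splits depend only on contingency counts, and counts computed on separate data partitions can simply be added together, which is one way a single tree can be grown over distributed data. The Python sketch below illustrates that map/merge pattern under this assumption; it is not a description of how Tree-AS or Analytic Server work internally, and the field names and partitions are hypothetical.

```python
from collections import Counter
from functools import reduce

def partition_counts(rows, predictor, target):
    """Map step: count (predictor value, target value) pairs within one partition."""
    return Counter((row[predictor], row[target]) for row in rows)

def merge_counts(partials):
    """Merge step: contingency counts are additive, so partition results are summed."""
    return reduce(lambda acc, part: acc + part, partials, Counter())

# Hypothetical records spread across three partitions (e.g. three data nodes).
partitions = [
    [{"region": "north", "churn": "yes"}, {"region": "south", "churn": "no"}],
    [{"region": "north", "churn": "yes"}, {"region": "north", "churn": "no"}],
    [{"region": "south", "churn": "no"}],
]

merged = merge_counts(partition_counts(p, "region", "churn") for p in partitions)
print(merged)
```

The merged table is identical to what a single pass over all records would produce, so a chi-square test on it gives the same split decision; only the counting step has to touch the distributed data.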

Tree-AS supports:

  • CHAID or Exhaustive CHAID models

  • Binary, categorical and numeric targets

  • PMML and SQL generation

  • Scoring via the database Scoring Adapters
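SQL generation lets a trained tree be evaluated inside the database instead of pulling records out. As a rough illustration only, the Python sketch below turns a small CHAID-style tree with multiway categorical splits into a nested SQL CASE expression. The nested-dict tree format, the column names and the shape of the generated SQL are all hypothetical; this is not the SQL that SPSS Modeler actually emits.

```python
def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"

def tree_to_sql(node):
    """Recursively convert a tree node into a SQL CASE expression.

    A leaf is {"predict": value}; an internal node is
    {"field": column, "children": [(category_list, child_node), ...]}.
    """
    if "predict" in node:
        return sql_literal(node["predict"])
    branches = []
    for values, child in node["children"]:
        in_list = ", ".join(sql_literal(v) for v in values)
        branches.append(f"WHEN {node['field']} IN ({in_list}) THEN {tree_to_sql(child)}")
    return "CASE " + " ".join(branches) + " END"

# Hypothetical tree: CHAID may merge categories, hence multi-value branches.
tree = {
    "field": "region",
    "children": [
        (["north"], {"predict": "yes"}),
        (["south", "east"], {
            "field": "tenure_band",
            "children": [
                (["short"], {"predict": "yes"}),
                (["long"], {"predict": "no"}),
            ],
        }),
    ],
}

print(tree_to_sql(tree))
```

Records whose category matches no branch fall through to NULL, because the sketch emits no ELSE clause.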

It is similar to the existing CHAID node and scales better to large numbers of records, although it does not support all of the same features (for example, there is no Interactive mode).