Algorithm Introduction
Chi-square automatic interaction detection (CHAID) is one of the most popular decision tree techniques (proposed by Kass in 1980). It is a classification method that builds decision trees by using chi-square statistics to identify optimal splits.
The chi-squared test (written as the χ² test) is used to determine whether two categorical fields are independent. If the fields are not independent, they are associated.
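As a minimal sketch of the computation behind this test, the Python snippet below derives the Pearson chi-square statistic and its degrees of freedom from two categorical fields. The function name `chi_square_statistic` and the toy ratings are hypothetical illustrations; they are not survey data and not part of any SPSS API.

```python
from collections import Counter

def chi_square_statistic(xs, ys):
    """Return (statistic, degrees_of_freedom) for two categorical fields."""
    n = len(xs)
    row_totals = Counter(xs)          # counts per category of the first field
    col_totals = Counter(ys)          # counts per category of the second field
    observed = Counter(zip(xs, ys))   # counts per cell of the cross tabulation

    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            # Expected cell count if the two fields were independent.
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical toy data: eight respondents, two categorical fields.
rating = ["Good", "Good", "Good", "Good", "Fair", "Fair", "Fair", "Fair"]
satisfied = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]

stat, dof = chi_square_statistic(rating, satisfied)
print(stat, dof)
```

A larger statistic (for the same degrees of freedom) is stronger evidence against independence. A statistic of zero means the observed counts match the independence assumption exactly.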
CHAID is a tool used to discover the relationships between variables. CHAID analysis builds a predictive model by determining how variables (predictors) best merge to explain the outcome in the given dependent variable (target).
In CHAID analysis, nominal, ordinal, and continuous data can be used. Continuous predictors are split into categories with an approximately equal number of observations. CHAID creates all possible cross tabulations for each categorical predictor until the best outcome is achieved and no further splitting can be performed.
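The binning of continuous predictors can be sketched as generic equal-frequency binning. This is an illustration under assumptions, not the exact procedure IBM SPSS uses: the helper names `equal_frequency_cuts` and `assign_bins`, the choice of three bins, and the `ages` values are all hypothetical.

```python
from bisect import bisect_right

def equal_frequency_cuts(values, n_bins):
    """Return cut points that split the values into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[k * n // n_bins]
        # Skip duplicate cut points, which can appear when values are tied.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts

def assign_bins(values, cuts):
    """Map each value to a bin index; values equal to a cut go to the upper bin."""
    return [bisect_right(cuts, v) for v in values]

# Hypothetical continuous predictor.
ages = [23, 35, 41, 52, 29, 64, 38, 47, 58, 31, 44, 26]

cuts = equal_frequency_cuts(ages, 3)
bins = assign_bins(ages, cuts)
print(cuts)
print(bins)
```

Because tied values always land in the same bin, the bin sizes are only approximately equal when the data contain many repeated values.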
In the CHAID technique, the tree shows visually how the split variables relate to the associated factor. Development of the decision (or classification) tree starts by identifying the target, or dependent, variable, which is considered the root of the tree. CHAID analysis splits the target into two or more categories, called the initial or parent nodes, and these nodes are then split into child nodes using statistical algorithms.
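One way to picture a single node split is to test every candidate predictor against the target and keep the one with the smallest chi-square p-value. The sketch below does only that. It is a simplification under assumptions: CHAID as usually described also merges similar categories and applies Bonferroni-adjusted p-values, both omitted here. The function names, the `alpha` threshold of 0.05, and the toy data are hypothetical.

```python
import math
from collections import Counter

def chi2_sf(x, dof):
    """Upper-tail probability of the chi-square distribution (integer dof)."""
    if dof <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-y) * sum_{k < dof/2} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd dof: erfc(sqrt(y)) plus a finite correction series.
    total = math.erfc(math.sqrt(y))
    for k in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def chi_square_p_value(xs, ys):
    """p-value of the Pearson chi-square test of independence."""
    n = len(xs)
    rows, cols = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return chi2_sf(stat, dof)

def best_split_predictor(predictors, target, alpha=0.05):
    """Return (name, p_value) of the most significant predictor, or (None, p)."""
    p_values = {name: chi_square_p_value(values, target)
                for name, values in predictors.items()}
    name = min(p_values, key=p_values.get)
    p = p_values[name]
    return (name, p) if p <= alpha else (None, p)

# Hypothetical toy data: 12 respondents, one target and two predictors.
target = ["High"] * 6 + ["Low"] * 6
predictors = {
    "signs": ["Good"] * 5 + ["Fair"] + ["Fair"] * 5 + ["Good"],
    "parking": ["Good", "Fair"] * 6,
}

chosen, p = best_split_predictor(predictors, target)
print(chosen, round(p, 4))
```

In this toy data, "signs" is associated with the target while "parking" is not, so "signs" is selected. Applying the same step recursively to each child node, until no predictor passes the threshold, would yield a tree.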
In practice, CHAID is often used in direct marketing to select groups of consumers and to predict how their responses to some variables affect other variables. Other early applications were in medical and psychiatric research.
SPSS CHAID Features and Strengths
Unlike the C&R Tree and QUEST nodes, CHAID can generate nonbinary trees, meaning that some splits have more than two branches. It therefore tends to create a wider tree than the binary growing methods. CHAID works for all types of inputs, and it accepts both case weights and frequency variables.
Main Features:
- Exhaustive CHAID is a modification of CHAID that examines all possible splits more thoroughly by comparing the adjusted p-values for each predictor, but it takes longer to compute.
IBM SPSS CHAID Strengths
- Traditional CHAID grows a tree on a centralized, not distributed, data set, whereas IBM SPSS CHAID can handle multiple data sources, both centralized and distributed.
- Distributed IBM SPSS CHAID can generate the same single tree as in the centralized case.
- This is better than building trees based on data samples.
- This is better than using ensemble models in terms of interpretation.
- It has the capability to deal with large data sets.
Example of a Descriptive Model:
Mike, an airport customer manager, finds that the airport is losing customers. He wants to know the customers' satisfaction level and which factors affect it. He selects the IBM SPSS CHAID node to help him.
He performs an online airport customer survey and gets survey results as shown below:
Stream: AirportSurvey.csv

Using IBM SPSS Modeler, Mike creates a CHAID model and a C&RT model in a stream to analyze which predictors are significant for overall customer satisfaction. (For the purpose of this example, the maximum tree depth is set to 3.)
AirportSurvey.str

By examining the CHAID model, Mike finds "Airport Rating", "Ease of Finding Way Rating" and "Directional Signs Rating" to be the three most important predictors for overall customer satisfaction.

Since all of these items are categorical predictors, some further questions are:
1. What is the relationship between these predictors?
2. For those who rated the airport as "Good", which predictors are important? Are they the same ones as for people who rated the airport as "Fair"?
3. Is the analysis result reliable?
To answer these questions, Mike opened the CHAID viewer chart and found that when the airport rating is Good, "Directional Signs Rating" is the next most important predictor for overall satisfaction. The picture is different when the rating is Fair: in that case, the next most important predictor is "Ease of Finding Way Rating".

To validate the finding, Mike built one more model using the C&R Tree (CART) node and found that the results are consistent.

(Example End)
SPSS CHAID available in
- Product integration with UI
- Spark and Python API
Learn more
Here is a micro-class video introducing CHAID for the University channel. Main contents:
- SPSS CHAID extended introduction.
- Use case demo for airport customer satisfaction analysis