SPSS Statistics

  • 1.  Accounting for Interaction Effects in Regression

    Posted Tue November 12, 2024 08:35 PM

    I have run a series of stepwise regression models for a transit authority, in hopes of identifying priority areas for improvement. Many of my independent variables are highly correlated with one another. Is there a subcommand in regression that will allow me to take into account interaction effects, or perhaps a way to import interaction coefficients from other procedures?

    Thanks in advance

    Bob Walker



    ------------------------------
    Robert Walker
    ------------------------------


  • 2.  RE: Accounting for Interaction Effects in Regression

    Posted Tue November 12, 2024 09:19 PM
    First of all, I would caution you about using stepwise regression.  Although it is very popular, it causes significance levels to be overstated, and it doesn't deal well with multicollinearity.

    You can incorporate interaction terms in the REGRESSION procedure by running the appropriate COMPUTE commands first, but this may make things even worse.  The General Linear Model Univariate command (UNIANOVA) can generate interaction terms for the regression without the need for all those COMPUTE commands.
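
    To make the two routes concrete, here is a minimal sketch in SPSS syntax.  The variable names (satisfaction, wait_time, crowding) are hypothetical stand-ins for a transit survey, not anything from the thread:

    ```spss
    * Route 1: build the interaction by hand with COMPUTE, then use REGRESSION.
    COMPUTE wait_x_crowd = wait_time * crowding.
    EXECUTE.
    REGRESSION
      /DEPENDENT satisfaction
      /METHOD=ENTER wait_time crowding wait_x_crowd.

    * Route 2: let UNIANOVA build the same interaction in the DESIGN subcommand.
    UNIANOVA satisfaction WITH wait_time crowding
      /DESIGN=wait_time crowding wait_time*crowding
      /PRINT=PARAMETER.
    ```

    The /PRINT=PARAMETER subcommand makes UNIANOVA show the coefficient estimates, so the output is comparable to the REGRESSION coefficients table.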

    I prefer the Shapley Value measures over stepwise regression.  STATS RELIMP is an extension command that includes these measures as an option for identifying the most important variables.  You can install it via Extensions > Extension Hub.  It does not do interaction terms for you, but, as I suggested above, I would be doubtful about those in the face of collinearity unless you have a good theoretical reason for including them.

    You might also consider lasso regression, which can help with variable selection, but you would need to read up on it to make sense of the results.


    --





  • 3.  RE: Accounting for Interaction Effects in Regression

    Posted Wed November 13, 2024 08:31 AM

    I support Jon's comments about the perils of "stepwise" variable selection methods.  More info about those perils can be found here:

    • https://www.stata.com/support/faqs/statistics/stepwise-regression-problems/
    • https://discourse.datamethods.org/t/author-checklist/3407#arrow_rightuse-of-stepwise-variable-selection-24

    I hope this helps.



    ------------------------------
    Bruce Weaver
    ------------------------------



  • 4.  RE: Accounting for Interaction Effects in Regression

    Posted Wed November 13, 2024 10:36 AM

    Thanks Bruce - much appreciated. I especially liked the last two reasons in the Stata documentation about stepwise:

    • It allows us to not think about the problem.
    • It uses a lot of paper.

    Good to have a sense of humor!

    Bob Walker



    ------------------------------
    Robert Walker
    ------------------------------



  • 5.  RE: Accounting for Interaction Effects in Regression

    Posted Wed November 13, 2024 10:32 AM

    Thank you Jon, super helpful! The GLM Univariate approach with the full factorial option for the covariates worked nicely to capture interaction effects, which (compared to stepwise without them) were minimal. I especially liked the STATS RELIMP procedure, because I can scale the Shapley values to sum to 100% and get relative contributions. From what I understand, Shapley values capture interactions and dependencies that regression weights do not? Regardless, this approach will be easier for the client to digest.

    Thanks again,

    Bob Walker



    ------------------------------
    Robert Walker
    ------------------------------



  • 6.  RE: Accounting for Interaction Effects in Regression

    Posted Wed November 13, 2024 12:14 PM
    The Shapley Value calculation looks at the contribution of each regressor when added to all combinations of the other regressors, so the biggest numbers go with the variables that are most important across all other possibilities.
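
    As a stripped-down illustration (two regressors only, not from the thread), the Shapley importance of X1 is its marginal contribution to R², averaged over both orders in which it could enter the model:

    ```
    Shapley(X1) = 1/2 * R²({X1})  +  1/2 * [ R²({X1, X2}) - R²({X2}) ]
    ```

    The first term is X1 entering an empty model; the second is X1 being added after X2.  With more regressors, the average runs over every possible subset of the others, which is why the biggest values go to variables that matter across all combinations.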

    You also get a table that shows the estimated effect of each regressor when included in models of all different sizes, i.e., numbers of regressors, so you have an idea of how sensitive the estimated effect is across all possible regressions.

    I like to look at the importance table and then eliminate the variables low on the list.


    --