SPSS Statistics


Logistic regression and assumptions

  • 1.  Logistic regression and assumptions

    Posted Thu September 05, 2024 07:34 AM

    Hello Group!

    I have questions about the logistic regression model assumptions:

    • How does one verify that the expected value of the error term is zero? Is it that important?
    • How does one verify that there is no correlation between the error and the independent variables? What form does this check take if some of the independent variables are categorical?
    • Is it acceptable to use VIF with categorical variables?

    Thanks!



    ------------------------------
    Meni Berger
    ------------------------------


  • 2.  RE: Logistic regression and assumptions

    Posted Thu September 05, 2024 12:13 PM

    Hello, Meni.

    About your questions:

    1) You can compute the model residuals and plot them in a scatterplot against another variable (e.g., the case ID). In logistic regression models, because of the non-conventional form of the error, the absence of significant multicollinearity among the predictors matters more than this assumption. The accuracy of the model itself is also more important.

    2) The first part is easy once you have computed the residuals. When a predictor is categorical, examine descriptive statistics of the residuals within each category of that predictor and judge whether the values differ too much between categories.

    3) To me, it makes no sense, because VIF is the inverse of the tolerance, which is itself an R-squared-derived measure.
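
For readers who want to see the relationship in point 3 written out, here is a minimal sketch in plain Python. It assumes only two predictors, so the R-squared from regressing one on the other is their squared correlation. The data and the names `age`, `smoker`, `r_squared`, and `vif` are hypothetical. It illustrates the definition only and does not settle whether VIF is appropriate for categorical predictors.

```python
# Hypothetical data: one continuous predictor and one 0/1 dummy variable.
age = [23, 35, 41, 52, 29, 47, 60, 33]
smoker = [0, 0, 1, 1, 0, 1, 1, 0]


def r_squared(x, y):
    """R-squared from regressing y on x plus a constant.

    With a single regressor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def vif(target, other):
    """VIF of `target`: the inverse of its tolerance (1 - R-squared)."""
    tolerance = 1.0 - r_squared(other, target)
    return 1.0 / tolerance


vif_smoker = vif(smoker, age)
vif_age = vif(age, smoker)
print(f"VIF(smoker) = {vif_smoker:.3f}, VIF(age) = {vif_age:.3f}")
```

With only two predictors the two VIF values are identical, because each is derived from the same squared correlation. With more predictors, the R-squared would come from a multiple regression of each predictor on all the others.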



    ------------------------------
    Estefano Souza
    ------------------------------



  • 3.  RE: Logistic regression and assumptions

    Posted Fri September 06, 2024 07:52 AM
    The zero residual mean assumption, which applies to linear regression, is automatically satisfied as long as the regression has a constant term or equivalent for factors.  Logistic estimation minimizes a different function.  You should look at the logistic test by segment for fit to verify that the logistic form is appropriate.

    The assumption, again for linear regression, of independence of the error term and the regressors can never be assessed from the estimation results, because the estimation process guarantees that residuals and regressors are uncorrelated.  You justify the assumption from theory, or you build a causal (a.k.a. simultaneous-equation) model to estimate such a relationship.  Econometrics provides many ways to estimate such models, such as two-stage least squares.
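
The two linear-regression points above can be checked numerically. The sketch below, in plain Python with made-up data, fits a simple least-squares line with a constant term and shows that the residual mean and the residual-regressor covariance are both zero up to rounding, whatever the data look like. The variable names are illustrative assumptions.

```python
# Hypothetical data for a simple linear regression with a constant term.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.6, 3.8, 5.1, 7.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Ordinary least squares estimates for slope and intercept.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Both quantities are zero by construction, so they cannot serve as a
# test of the underlying assumptions about the true error term.
resid_mean = sum(residuals) / n
resid_x_cov = sum(r * (a - mean_x) for r, a in zip(residuals, x)) / n

print(f"mean residual: {resid_mean:.2e}")
print(f"cov(residual, x): {resid_x_cov:.2e}")
```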





  • 4.  RE: Logistic regression and assumptions

    Posted Fri September 06, 2024 10:45 AM

    In logistic regression, the errors follow a binomial distribution, since the outcome is binary. The error term is not independent and identically distributed (i.i.d.) normal, and its mean is not expected to be zero. So, for your first question, I do not think we need to verify that the expected value of the error term is zero. It may not be important.



    ------------------------------
    Anil Deshpande
    ------------------------------



  • 5.  RE: Logistic regression and assumptions

    Posted Fri September 06, 2024 10:45 AM
    • In logistic regression, the error structure is different because of the binary outcome, but it is still important to ensure that the independent variables are not correlated with the error term.
    • For the second question, you can use deviance residuals or Pearson residuals. Net net, link tests and multicollinearity checks should also help you.
    • You can use boxplots or strip plots of the residuals for each category (level) of the categorical variable.
      You should observe that the residuals are centered around zero and do not display any systematic pattern or differences across categories.
      If the residuals for one or more categories consistently deviate from zero, this could indicate correlation between the errors and the independent variable, suggesting possible misspecification of the model.
    • For the third question: yes, you can, but I am assuming you are using dummy variables. In that case, a VIF > 10 is a concern: it indicates that the dummy variable is highly collinear with other predictors, which could cause problems in estimating the model.
      Where are we using this regression model? Is it finance or mechanical engineering?
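
The per-category residual check described in this post can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data are made up, the model has a single continuous predictor `x`, the categorical variable `group` is left out of the model and used only to split the residuals, and `fit_logistic` is a hypothetical helper written for this example rather than an SPSS or library routine.

```python
import math

# Hypothetical data: binary outcome y, continuous predictor x (in the model),
# and a categorical variable used only to group the residuals.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
group = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]


def fit_logistic(x, y, iterations=25):
    """Fit intercept and slope by Newton-Raphson (maximum likelihood)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Information matrix (negative Hessian), 2 x 2.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(x, y)
p_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Pearson residual: raw residual scaled by the binomial standard deviation.
pearson = [(yi - pi) / math.sqrt(pi * (1.0 - pi)) for yi, pi in zip(y, p_hat)]
raw_residual_sum = sum(yi - pi for yi, pi in zip(y, p_hat))

# Collect residuals per category and summarize them.
by_group = {}
for g, r in zip(group, pearson):
    by_group.setdefault(g, []).append(r)
group_means = {g: sum(r) / len(r) for g, r in by_group.items()}

for g in sorted(group_means):
    print(f"group {g}: mean Pearson residual = {group_means[g]:.3f}")
```

The lists in `by_group` are what a boxplot or strip plot per category would display. A group mean far from zero would be the kind of systematic deviation the post describes. Note that if `group` were itself entered in the model as dummy variables, the raw residuals would sum to zero within each category by construction.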


    ------------------------------
    Anil Deshpande
    ------------------------------



  • 6.  RE: Logistic regression and assumptions

    Posted Tue September 10, 2024 09:02 AM

    Thank you, Anil (and also Jon and Estefano!)

    I am updating my lecture material for logistic regression and delving deeper into my old assumption-checking part. I am trying to work out what is relevant and make the presentation more informative.

    Do you have any examples of the boxplots or strip plots of the residuals?



    ------------------------------
    Meni Berger
    ------------------------------



  • 7.  RE: Logistic regression and assumptions

    Posted Tue September 10, 2024 12:37 PM
    I am away from my computer but will send some thoughts next week when I return.





  • 8.  RE: Logistic regression and assumptions

    Posted Wed September 11, 2024 09:37 AM

    Thanks. I appreciate it very much.



    ------------------------------
    Meni Berger
    ------------------------------



  • 9.  RE: Logistic regression and assumptions

    Posted Sun September 15, 2024 12:01 PM
    Here is the dialog help for the STATS NTILE ANALYSIS extension command that you might be interested in.

    file:///C:/Users/jkpec/AppData/Roaming/IBM/SPSS%20Statistics/one/CustomDialogs/STATS_NTILE_ANALYSIS/STATS_NTILES_ANALYSISstripped.htm

    More later






  • 10.  RE: Logistic regression and assumptions

    Posted Mon September 16, 2024 05:42 AM

    I came looking for copper and found gold! Thanks. I can't wait to see what more you have in store.

    STATS NTILE ANALYSIS is instrumental in evaluating the Logistic model classification gain and lift.

    I read Ridhima Kumar's article with great pleasure. When I tried to recreate his instructions using STATS NTILE ANALYSIS, I noticed that the Ntile Analysis table was sorted in ascending rather than descending order, which is odd.

    Also, the Target Response Rate % values do not agree with the way Ridhima calculated the % of Responders in his article. The values are the proportion of responses within each Ntile, whereas the % of Responders in the article is calculated across all Ntiles.

    Maybe I am missing something, but is the gain chart upside down?
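
To make the two percentages discussed above concrete, here is a small sketch in plain Python. It is not the STATS NTILE ANALYSIS implementation: the scored cases, the number of tiles, and the column names are hypothetical, and equal-size tiles ranked from highest to lowest predicted probability are assumed.

```python
# Hypothetical scored cases: (predicted probability, actual response 0/1).
scored = [
    (0.95, 1), (0.90, 1), (0.85, 0), (0.80, 1),
    (0.70, 1), (0.65, 0), (0.55, 1), (0.50, 0),
    (0.40, 0), (0.35, 1), (0.25, 0), (0.10, 0),
]
n_tiles = 4

# Rank descending so that tile 1 holds the highest predicted probabilities.
ranked = sorted(scored, key=lambda case: case[0], reverse=True)
tile_size = len(ranked) // n_tiles  # assumes the cases divide evenly
total_responders = sum(response for _, response in ranked)

rows = []
cumulative = 0
for t in range(n_tiles):
    tile = ranked[t * tile_size:(t + 1) * tile_size]
    responders = sum(response for _, response in tile)
    cumulative += responders
    rows.append({
        "tile": t + 1,
        # Responders as a share of the cases in this tile only.
        "within_tile_rate": 100.0 * responders / len(tile),
        # Responders as a share of all responders across every tile.
        "share_of_all_responders": 100.0 * responders / total_responders,
        # Running total of the previous column: the gain curve.
        "cumulative_gain": 100.0 * cumulative / total_responders,
    })

for row in rows:
    print(row)
```

The within-tile rate and the share of all responders use different denominators, so they generally differ for the same tile. If the ranking were ascending instead, the cumulative gain would rise slowly at first and fall below the diagonal, which would look like an upside-down gain chart.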



    ------------------------------
    Meni Berger
    ------------------------------