SPSS Statistics

binary logistic regression - problem with prediction equation

  • 1.  binary logistic regression - problem with prediction equation

    Posted Tue October 19, 2021 03:30 PM
    Since July, my MSc student has been having trouble with her binary logistic regression. She is now picking this back up so she can submit her thesis. Maybe someone here can help?

    She is assessing the use of particular characters to infer sex. Her predictor variables are ordinal (scores for the development of those features) and her outcome is binary (male or female). The regression itself runs well, but we seem unable to predict the sex of an "unknown". I would expect to write a prediction equation to determine sex with a cut-off of 0.5 (which is what we set in SPSS). The problem is that the prediction never works. This works well in the discriminant analysis but not in the binary logistic regression, which is what her data require rather than a DF. What are we doing wrong?

    She is creating the dummy variables through the parameter coding in the logistic regression dialog.

    I would expect the prediction equation to be: log(p/(1-p)) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4

    Her categories become dummy variables (e.g. gonial angle varies from 1-4, so she gets 4 variables this way: gonial angle (1), gonial angle (2), etc.).

    Below is the equation she worked out this way – then she replaces the variable name with a score of 1 or 0 depending on the ordinal variable.

    sex= 75.8 + (-57.296*(gonialangle1)) + (-37.333*(gonialangle2)) + (-18.504*(externaloccipitalprotuberance1)) + (-55.213*(externaloccipitalprotuberance2)) + (38.256*(externaloccipitalprotuberance3)) + (-57.519*(nuchalmarkeringen1)) + (-1.946*(nuchalmarkeringen2))

    Here it is worked out with an actual individual's scores:

    SK001sex = 75.8 + (-57.296*1) + (-37.333*0) + (-18.504*1) + (-55.213*0) + (38.256*0) + (-57.519*1) + (-1.946*0)
    SK001sex = 75.8 + (-57.296) + (-18.504) + (-57.519)
    SK001sex = 75.8 -57.296 -18.504 -57.519
    SK001sex = -57.519

    We never seem to get to a value between 0 and 1. Do we need to do something with the log? Are we not doing this correctly?

    Can someone please help us with this?

    All the best,

    Isabelle

    ------------------------------
    Isabelle De Groote
    ------------------------------

    #SPSSStatistics


  • 2.  RE: binary logistic regression - problem with prediction equation

    IBM Champion
    Posted Tue October 19, 2021 03:39 PM
    Rather than checking the algebra, why don't you / doesn't she use the Scoring Wizard?  Estimate the equation but specify saving the model as XML.  Then, for the new data, use Utilities > Scoring Wizard, select the model file, fill out the dialog, and it will apply the prediction equation giving you predicted outcomes and/or probabilities.

    ------------------------------
    Jon Peck
    ------------------------------



  • 3.  RE: binary logistic regression - problem with prediction equation

    Posted Mon September 11, 2023 10:43 AM

    Hello, dear sir. I have a problem regarding the Scoring Wizard.

    When I derive a logistic regression model and apply it to a new validation cohort using the Scoring Wizard, SPSS provides new variables named "selected probability" and "predicted probability". I also calculated the logit using the regression function (logit = b0 + b1*x1 + b2*x2 + ...).

    But I found that the formula selected probability = 1 / (1 + e^(-logit)) gives a different value from the selected probability that the Scoring Wizard provides.

    Do you have any idea why that is happening?
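
One possible explanation, offered as an assumption rather than a confirmed diagnosis: 1/(1 + e^(-logit)) is the probability of the category the logit refers to (in SPSS binary logistic regression, the higher-coded value of the dependent variable). The probability of the *predicted* category is the larger of p and 1 - p, and a probability reported for a *selected* category would be 1 - p if the other category was the one selected. The Python sketch below only illustrates that arithmetic; the logit value and the function name are hypothetical.

```python
import math

def category_probabilities(logit):
    """Return (p_higher, p_lower) for a binary logistic model.

    p_higher is the probability of the category the logit refers to;
    p_lower is the probability of the other category.
    """
    p_higher = 1.0 / (1.0 + math.exp(-logit))
    return p_higher, 1.0 - p_higher

logit = -1.2  # hypothetical logit for one case
p1, p0 = category_probabilities(logit)

# The predicted category is the more likely one, so its probability
# is max(p, 1 - p), which differs from p whenever p < 0.5.
prob_of_predicted = max(p1, p0)

print(f"p(higher-coded category) = {p1:.4f}")
print(f"p(lower-coded category)  = {p0:.4f}")
print(f"p(predicted category)    = {prob_of_predicted:.4f}")
```

If the hand-computed value and the Scoring Wizard value add up to 1 for every case, the two numbers most likely refer to opposite categories.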



    ------------------------------
    Mehmet Muzaffer
    ------------------------------



  • 4.  RE: binary logistic regression - problem with prediction equation

    IBM Champion
    Posted Tue October 19, 2021 03:40 PM
    p.s. Did you use the natural or common log function?

    ------------------------------
    Jon Peck
    ------------------------------



  • 5.  RE: binary logistic regression - problem with prediction equation

    Posted Tue October 19, 2021 04:08 PM

    Thank you for your reply.


    We haven't used any log.

    For the DFA or linear regression I never have to use a log to make the prediction equation. I only use continuous data in my own research, so I'm a bit at a loss as to how to use the transformed variables and come up with the predicted value.


    All the best,



    ------------------------------
    Isabelle De Groote
    ------------------------------



  • 6.  RE: binary logistic regression - problem with prediction equation

    IBM Champion
    Posted Tue October 19, 2021 04:14 PM
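    Well, use the Scoring Wizard as I suggested, but note that logistic regression is linear in log(p/(1-p)), not in p itself. Either interpret the equation on that scale or use the exponential form, p = 1/(1 + exp(-z)), to get a probability.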
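
Written out, the relationship referred to here is the standard logit and its inverse. In the sketch below, z is shorthand for the linear predictor b0 + b1*x1 + ... + bk*xk, and the logarithm is the natural log:

```latex
\begin{align}
z &= \ln\frac{p}{1-p} = b_0 + b_1 x_1 + \dots + b_k x_k \\
\frac{p}{1-p} &= e^{z} \\
p &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}
\end{align}
```

The linear equation therefore yields a value anywhere on the real line; only after the last step does it become a probability between 0 and 1.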




  • 7.  RE: binary logistic regression - problem with prediction equation

    Posted Tue October 19, 2021 04:17 PM
    Edited by System Fri January 20, 2023 04:19 PM
    Hi, Isabelle. Have your student take a look here, and see if it's any help. She might find it most helpful to pay particular attention to method 2, where COMPUTE predprob = 1/(1 + EXP(-z)) is given.

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 8.  RE: binary logistic regression - problem with prediction equation

    Posted Wed October 20, 2021 04:26 AM
    Thanks, @Rick Marcantonio. This reference is useful!

    ------------------------------
    Meni Berger
    ------------------------------



  • 9.  RE: binary logistic regression - problem with prediction equation

    Posted Wed October 20, 2021 05:47 AM

    Dear Rick

    This is very useful. I am still confused about what she needs to use as values in the prediction equation. It is usual in our field to publish a prediction equation for others to use on their own data, so another scientist will need to plug in his/her own scores to solve the equation, and the resulting value will then be the sex estimate.

    If I understand correctly:

    If we take the example from the link you sent, there were originally 4 categories for REGION, and these have now been turned into region 1, region 2, region 3, and the reference category (for those from the 4th category)? The reference category does not have a coefficient, so it will never appear in the equation, and individuals in the reference category will thus always use a value of 0 for region 1, 2 and 3?

    REGION
    REGION(1) 2.549343
    REGION(2) 2.019601
    REGION(3) -.281382

    So, if the equation has an individual from region 2, then we replace the value of region 2 with value "1" and replace region 1 and 3 with value "0"? 
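
The indicator coding described in this post can be sketched in a few lines of Python. The three coefficients are the REGION values quoted above; treating category 4 as the reference is the assumption made in the post, and the function name is hypothetical.

```python
# Coefficients quoted in the post; the reference category has none.
REGION_COEFS = {1: 2.549343, 2: 2.019601, 3: -0.281382}

def region_term(region):
    """Contribution of REGION to the linear predictor z.

    Each indicator is 1 when the case falls in that category and 0
    otherwise, so at most one coefficient is added. A case in the
    reference category (assumed here to be 4) contributes 0.
    """
    return sum(b * (region == level) for level, b in REGION_COEFS.items())

for r in (1, 2, 3, 4):
    print(f"region {r}: contribution {region_term(r)}")
```

A case from region 2 picks up only the REGION(2) coefficient, and a case from the reference category adds nothing beyond the constant.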

    In our example this would look like: 
    sex = 75.8 + (-57.296*1) + (-37.333*0) + (-18.504*1) + (-55.213*0) + (38.256*0) + (-57.519*1) + (-1.946*0) 

    We then insert the value of sex (z) into this equation: 1/(1 + EXP(-z)). This should return a value between 0 and 1?
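
As a check on the two steps described here, the following Python sketch evaluates the linear predictor for individual SK001 with the constant, coefficients and dummy values quoted in this thread, then applies 1/(1 + EXP(-z)). The variable names are illustrative only.

```python
import math

# Constant and coefficients from the equation in this thread.
CONSTANT = 75.8
COEFS = [-57.296, -37.333, -18.504, -55.213, 38.256, -57.519, -1.946]

# Dummy (0/1) values for individual SK001, in the same order.
SK001 = [1, 0, 1, 0, 0, 1, 0]

# Step 1: linear predictor (the log-odds), which can be any real number.
z = CONSTANT + sum(b * x for b, x in zip(COEFS, SK001))

# Step 2: logistic transform, which always lies between 0 and 1.
p = 1.0 / (1.0 + math.exp(-z))

print(f"z = {z:.3f}")
print(f"p = {p:.3e}")
```

Here z comes out at about -57.519, matching the hand calculation, and p is on the order of 1e-25. That is far below the 0.5 cut-off, so the case would be assigned to whichever sex category the model does not treat as the target (in SPSS, the lower-coded value of the dependent variable).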

    Thank you!

    Isabelle




    ------------------------------
    Isabelle De Groote
    ------------------------------



  • 10.  RE: binary logistic regression - problem with prediction equation

    Posted Wed October 20, 2021 08:12 AM
    Take the example literally. Assume the variables in the logistic equation are:

    GA=GonialAngle
    EOP=ExternalOccipitalProtuberance
    NM=NuchalMarkingen
    and the Constant is 75.8.

    Her syntax would be:

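    * Linear predictor z built from the fitted coefficients.
    COMPUTE z = 75.8 - 57.296*(GA=1) - 37.333*(GA=2) - 18.504*(EOP=1) - 55.213*(EOP=2) + 38.256*(EOP=3) - 57.519*(NM=1) - 1.946*(NM=2).
    * Convert z to a predicted probability.
    COMPUTE predprob = 1/(1 + EXP(-z)).
    * Classify with the 0.5 cut-off.
    COMPUTE predcat = (predprob > .5).
    EXECUTE.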

    Note that she actually enters "GA=1" because that resolves to 0 or 1 given the value of GA for each particular case. There is no need to create her own dummy variables.
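
For readers who want to apply the published equation outside SPSS, the same logic can be sketched in Python. This is an illustrative port under the assumption that scores without a listed coefficient belong to the reference category; the function name `predict_sex` and the example scores are hypothetical.

```python
import math

def predict_sex(ga, eop, nm, cutoff=0.5):
    """Return (z, predprob, predcat) from raw ordinal scores.

    Comparisons such as (ga == 1) evaluate to True/False, which Python
    treats as 1/0 in arithmetic, mirroring (GA=1) in the SPSS syntax.
    """
    z = (75.8
         - 57.296 * (ga == 1) - 37.333 * (ga == 2)
         - 18.504 * (eop == 1) - 55.213 * (eop == 2) + 38.256 * (eop == 3)
         - 57.519 * (nm == 1) - 1.946 * (nm == 2))
    predprob = 1.0 / (1.0 + math.exp(-z))
    predcat = int(predprob > cutoff)
    return z, predprob, predcat

# Scores of 1 on all three traits, as for individual SK001.
z_low, p_low, cat_low = predict_sex(1, 1, 1)
print(f"z = {z_low:.3f}, predcat = {cat_low}")

# Hypothetical case scored in the reference category on every trait.
z_ref, p_ref, cat_ref = predict_sex(3, 4, 3)
print(f"z = {z_ref:.3f}, predcat = {cat_ref}")
```

Passing the raw scores and letting the comparisons produce the 0/1 indicators avoids a separate dummy-coding step, which is the point made about the SPSS syntax above.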


    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------