SPSS Statistics

  • 1.  Propensity Score Matching 1:n Matching

    Posted yesterday

    Hello, Dear community!

     

    I am struggling with the Propensity Score Matching add-on. The instructions in the help section are not clear on the proper way to do 1:n matching (1:2 or 1:3).

     

    The internet has been less than useful, and there is no definitive guide on the application of this practice in SPSS Statistics.

     

    Would anyone happen to have some insight about the best way to conduct a PSM with 1:n Matching? Any special definitions?

     

    Thanks!

     

    Meni Berger |

    Data Scientist

     

     



  • 2.  RE: Propensity Score Matching 1:n Matching

    Posted yesterday
    There are two extensions for PSM.  Data > Propensity Score Matching only provides 1:1 matches.  Data > Case Control Matching provides 1:N matching.
    The number of names in the Names for Match ID field determines the number of matches for each demander case.  (Only one name can be specified if an additional output dataset is created.)

    The PSM dialog provides a simplified matching procedure and uses a logistic regression to calculate the match variable.  CCM uses a bounding box on the match variables rather than a logistic regression for the match distances.  You list a tolerance for each variable used in the match.  That should be zero for categorical variables.  If you want to use the logistic regression approach with CCM, you would run the logistic regression first and save the propensities and then enter that as the single match variable.

    Both procedures rely on the FUZZY extension command for the actual matching.  FUZZY was updated in 2023 to improve the matching algorithm, so if you have an old version installed,  you should update it from the Extension Hub.







  • 3.  RE: Propensity Score Matching 1:n Matching

    Posted yesterday

    Thank you for your response, Jon.

     

    What is a 'Bounding box'? How does it differ from the PSM approach?

     

    Also, do you have sample syntax for the CCM process? Or maybe a simple guide on how to set up the dialog box for 1:n matching in CCM?

     

              

    Meni Berger |

    Data Scientist and Head of Tech  Support



     

     






  • 4.  RE: Propensity Score Matching 1:n Matching

    Posted yesterday
    I'm not clear on where the confusion lies, but here's an example.  Let's suppose that you want to match on age, income, and gender.  In the logistic regression PSM approach, a logistic model with those three variables estimates each observation's probability of being a case, so the matches are based on closeness of the predicted probabilities: the predictors weighted by the logistic regression coefficients and transformed through the inverse logit (logistic) function.  Differences in the predictors generally increase the difference in the predicted probabilities for a pair and make matching those cases less likely, but there is no maximum difference in the input variables beyond which a match is not allowed.
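
    To make that last point concrete, here is a minimal Python sketch (not SPSS syntax, and not what the extension runs internally).  The coefficients are made-up values chosen for illustration; in practice they would come from the fitted logistic regression.  It shows that a control with very different age and income can still end up with nearly the same propensity as the case.

```python
import math

# Hypothetical coefficients, for illustration only.
COEF = {"intercept": -4.0, "age": 0.05, "income": 0.0004, "gender": 0.3}


def propensity(age, income, gender):
    """Predicted probability: inverse logit of the linear predictor."""
    z = (COEF["intercept"]
         + COEF["age"] * age
         + COEF["income"] * income
         + COEF["gender"] * gender)
    return 1.0 / (1.0 + math.exp(-z))


case = propensity(age=40, income=3000, gender=1)

# Close on every predictor, but the small differences all push the same way.
control_near = propensity(age=41, income=3100, gender=1)

# Far on age and income, but the two differences offset each other.
control_far = propensity(age=60, income=500, gender=1)

print(f"case         {case:.4f}")
print(f"control_near {control_near:.4f}  diff {abs(case - control_near):.4f}")
print(f"control_far  {control_far:.4f}  diff {abs(case - control_far):.4f}")
```

    With these made-up numbers the "far" control is the closer propensity match, which is exactly the situation a tolerance on the input variables would rule out.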

    With the CCM procedure, instead, you specify the bounds for the match based on differences in those three variables.  For example, you might specify
    1 500 0
    where 1 is the maximum age difference, 500 is the income difference and 0 indicates exact match on gender.  You enter those three values in the match tolerances field separated by blanks in the same order as the variables are entered.

    Any case-control pair in which any of the variable differences falls outside the specified bound, in either direction, is not eligible for a match.  Then the distance is computed for each eligible pair, and the control with the smallest distance among the available controls is assigned as the match.  The exact formulas for computing the distance are detailed in the dialog and syntax help.  There are three possible distance measures that you select from on the Options subdialog.
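
    As a rough illustration of the bounding-box idea, here is a small Python sketch using the 1 500 0 tolerances from above.  It is not the FUZZY implementation: the function names and data are hypothetical, and the distance used here (absolute differences scaled by the tolerance) is an assumed stand-in for the measures documented in the help.

```python
def within_box(case, control, tolerances):
    """True if every variable difference is inside its tolerance."""
    return all(abs(a - b) <= tol
               for a, b, tol in zip(case, control, tolerances))


def distance(case, control, tolerances):
    """Assumed measure: tolerance-scaled absolute differences.

    Variables with tolerance 0 must match exactly, so they add nothing.
    """
    return sum(abs(a - b) / tol
               for a, b, tol in zip(case, control, tolerances) if tol > 0)


def best_match(case, controls, tolerances):
    """Return the id of the closest eligible control, or None."""
    eligible = [cid for cid, values in controls.items()
                if within_box(case, values, tolerances)]
    if not eligible:
        return None
    return min(eligible,
               key=lambda cid: distance(case, controls[cid], tolerances))


# Variables in order: age, income, gender.
tolerances = (1, 500, 0)
case = (40, 3000, 1)
controls = {
    "c1": (41, 3400, 1),  # inside the box
    "c2": (40, 3600, 1),  # income differs by 600: outside the box
    "c3": (40, 3100, 0),  # gender differs: outside the box
    "c4": (41, 3050, 1),  # inside the box and closer than c1
}

chosen = best_match(case, controls, tolerances)
print(chosen)
```

    Controls c2 and c3 are never considered, however close they are on the other variables; the choice is only between c1 and c4.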

    For multiple matches, this process is repeated as many times as there are names in the Names for Match variables field, taking into account whether sampling is with or without replacement.  The names are entered in that field separated by blanks.
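
    The repeated passes can be sketched in Python as follows.  Again, this is only an illustration under assumptions, not the extension's actual algorithm: the pass order, the tie handling, the distance measure, and the names match1 and match2 are all hypothetical.

```python
def within_box(case, control, tolerances):
    """True if every variable difference is inside its tolerance."""
    return all(abs(a - b) <= tol
               for a, b, tol in zip(case, control, tolerances))


def distance(case, control, tolerances):
    """Assumed measure: tolerance-scaled absolute differences."""
    return sum(abs(a - b) / tol
               for a, b, tol in zip(case, control, tolerances) if tol > 0)


def match_1_to_n(cases, controls, tolerances, match_names,
                 with_replacement=False):
    """One matching pass per name in match_names (so 1:n for n names)."""
    matches = {case_id: {} for case_id in cases}
    used = set()  # controls already assigned to any case
    for name in match_names:
        for case_id, case in cases.items():
            taken = set(matches[case_id].values())  # this case's own matches
            candidates = [
                ctl_id for ctl_id, ctl in controls.items()
                if ctl_id not in taken
                and (with_replacement or ctl_id not in used)
                and within_box(case, ctl, tolerances)
            ]
            if not candidates:
                matches[case_id][name] = None  # no eligible control left
                continue
            best = min(candidates,
                       key=lambda c: distance(case, controls[c], tolerances))
            matches[case_id][name] = best
            used.add(best)
    return matches


# Variables in order: age, income, gender.
tolerances = (1, 500, 0)
cases = {"t1": (40, 3000, 1), "t2": (41, 3200, 1)}
controls = {
    "c1": (40, 3050, 1),
    "c2": (41, 3300, 1),
    "c3": (40, 2900, 1),
    "c4": (42, 3600, 1),
    "c5": (41, 3000, 0),
}
names = ("match1", "match2")  # two names -> 1:2 matching

without_repl = match_1_to_n(cases, controls, tolerances, names)
with_repl = match_1_to_n(cases, controls, tolerances, names,
                         with_replacement=True)
print(without_repl)
print(with_repl)
```

    Without replacement, each control can be used only once, so t2's second match has to be the more distant c4; with replacement, t2 can reuse c1 even though it is already matched to t1.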

    I hope that clears things up.

