SPSS Statistics


Multiple Imputation

Courtney B Francis, Tue September 20, 2022 12:03 AM

  • 1.  Multiple Imputation

    Posted Mon September 19, 2022 01:04 PM
    Hello!

    I have attempted to run Multiple Imputation on my dataset. After the imputation ran, I still had missing data in the pooled dataset.

    Do you know what might have caused this? I'm not sure whether this is common or whether something in my syntax or data is creating the problem. Please see my syntax below:

    SET THREADS = 4.
    USE ALL.
    FILTER OFF.
    SORT CASES BY YEAR SUBJID.
    EXECUTE.
    SET SEED=20220913.
    MULTIPLE IMPUTATION
        INSTDICO
        INSTCONT
        WHITE
        BLACK
        Brown
        Black_Brown_Total
        BLACK_BROWN_AUTISM
        MINORITIZED_POC
        RECODEAUTISM
        HSGPA
        COLLEGE_INVOLVEMENT
        HABITS_OF_MIND_GRP
        ACADEMIC_SELFCONCEPT_GRP
        SOCIAL_SELFCONCEPT_GRP
        DEGASPDICO
        SEX
        FIRSTGEN
        INCOME
        ACT_FINAL
        BLACK_BROWN_XINCOME
        BLACK_BROWN_XHSGPA
        BLACK_BROWN_XHOMG
        BLACK_BROWN_XCOLLINV
        BLACK_BROWN_XASCG
        BLACK_BROWN_XSSCG
        BLACK_BROWN_XACT
        BLACK_BROWN_XSEX
        BLACK_BROWN_XFIRSTGEN
        BLACK_BROWN_XDEGASP
        WHITE_XINCOME
        WHITE_XHSGPA
        WHITE_XHOMG
        WHITE_XCOLLINV
        WHITE_XASCG
        WHITE_XSSCG
        WHITE_XACT
        WHITE_XSEX
        WHITE_XFIRSTGEN
        WHITE_XDEGASP
        AUTISM_XINCOME
        AUTISM_XHSGPA
        AUTISM_XHOMG
        AUTISM_XCOLLINV
        AUTISM_XASCG
        AUTISM_XSSCG
        AUTISM_XACT
        AUTISM_XSEX
        AUTISM_XFIRSTGEN
        AUTISM_XDEGASP
        AUTISM_XBLACK_BROWN
        AUTISM_XWHITE
        RECODE_DISAB01
        RECODE_DISAB02
        RECODE_DISAB04
        RECODE_DISAB05
        RECODE_DISAB06
        RECODE_DISAB07
      /ANALYSISWEIGHT STUDWGT
      /IMPUTE METHOD=FCS MAXITER= 100 NIMPUTATIONS=10 SCALEMODEL=LINEAR INTERACTIONS=NONE
        SINGULAR=1E-012 MAXPCTMISSING=NONE MAXMODELPARAM =10000
    /CONSTRAINTS BLACK_BROWN_XINCOME( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XHSGPA( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XHOMG( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XCOLLINV( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XASCG( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XSSCG( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XACT( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XSEX( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XFIRSTGEN( ROLE=IND)
    /CONSTRAINTS BLACK_BROWN_XDEGASP( ROLE=IND)
    /CONSTRAINTS WHITE_XINCOME( ROLE=IND)
    /CONSTRAINTS WHITE_XHSGPA( ROLE=IND)
    /CONSTRAINTS WHITE_XHOMG( ROLE=IND)
    /CONSTRAINTS WHITE_XCOLLINV( ROLE=IND)
    /CONSTRAINTS WHITE_XASCG( ROLE=IND)
    /CONSTRAINTS WHITE_XSSCG( ROLE=IND)
    /CONSTRAINTS WHITE_XACT( ROLE=IND)
    /CONSTRAINTS WHITE_XSEX( ROLE=IND)
    /CONSTRAINTS WHITE_XFIRSTGEN( ROLE=IND)
    /CONSTRAINTS WHITE_XDEGASP ( ROLE=IND)
    /CONSTRAINTS AUTISM_XINCOME( ROLE=IND)
    /CONSTRAINTS AUTISM_XHSGPA( ROLE=IND)
    /CONSTRAINTS AUTISM_XHOMG( ROLE=IND)
    /CONSTRAINTS AUTISM_XCOLLINV( ROLE=IND)
    /CONSTRAINTS AUTISM_XASCG( ROLE=IND)
    /CONSTRAINTS AUTISM_XSSCG( ROLE=IND)
    /CONSTRAINTS AUTISM_XACT( ROLE=IND)
    /CONSTRAINTS AUTISM_XSEX( ROLE=IND)
    /CONSTRAINTS AUTISM_XFIRSTGEN( ROLE=IND)
    /CONSTRAINTS AUTISM_XDEGASP ( ROLE=IND)
    /CONSTRAINTS AUTISM_XBLACK_BROWN( ROLE=IND)
    /CONSTRAINTS AUTISM_XWHITE( ROLE=IND)
     /CONSTRAINTS DEGASP  (RND=1 MIN=0 MAX=1)
      /MISSINGSUMMARIES NONE
      /IMPUTATIONSUMMARIES MODELS DESCRIPTIVES
      /OUTFILE IMPUTATIONS=courtney_syntax_9_14_22.sav FCSITERATIONS=iteration_history. 

    ------------------------------
    Courtney B Francis
    ------------------------------

    #SPSSStatistics


  • 2.  RE: Multiple Imputation

    IBM Champion
    Posted Mon September 19, 2022 01:27 PM
    The MI output tables indicate if values could not be imputed for some variables.  Also, note that the first dataset in the replications is the original data.
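    Jon's second point explains why missing values can still show up after imputation. The output dataset stacks the original cases (imputation 0) above the imputed copies, and the original block keeps its missing values. A minimal Python sketch, using a hypothetical list of (imputation number, value) pairs in place of the SPSS output file, counts missing values per block:

```python
from collections import Counter

# Hypothetical stacked MI output: (imputation number, value) pairs.
# Imputation 0 is the original data; 1..m are the imputed copies.
STACKED = [
    (0, 3.0), (0, None), (0, 5.0), (0, None),
    (1, 3.0), (1, 4.2), (1, 5.0), (1, 3.9),
    (2, 3.0), (2, 3.6), (2, 5.0), (2, 4.4),
]

def missing_by_imputation(stacked):
    """Count missing (None) values within each imputation block."""
    blocks = Counter(imp for imp, _ in stacked)  # every block, even with no missing values
    missing = Counter(imp for imp, value in stacked if value is None)
    return {imp: missing[imp] for imp in sorted(blocks)}

print(missing_by_imputation(STACKED))  # {0: 2, 1: 0, 2: 0}
```

    Missing values in block 0 are expected. Any that remain in blocks 1..m are the ones that need explaining.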

    --





  • 3.  RE: Multiple Imputation

    Posted Mon September 19, 2022 01:28 PM
    Hi.

    Do you mean apart from the variables specified in /CONSTRAINTS as ROLE=IND?

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 4.  RE: Multiple Imputation

    Posted Mon September 19, 2022 04:05 PM
    Hi @Rick Marcantonio,

    Thank you for your response!

    All of the variables listed, including the ones given ROLE=IND in /CONSTRAINTS, still have missing data in all of the imputed datasets, including the pooled dataset. The amount of missingness is also the same across all imputed datasets.




    ------------------------------
    Courtney B Francis
    ------------------------------



  • 5.  RE: Multiple Imputation

    Posted Mon September 19, 2022 04:11 PM
    So then, no values are being imputed at all?

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 6.  RE: Multiple Imputation

    Posted Mon September 19, 2022 04:17 PM
    Hi @Rick Marcantonio!

    They are being imputed (the number of cases has increased considerably), and the amount of missingness has decreased from the original dataset. But it was my understanding that once all the variables of interest were imputed, there should no longer be any missingness, especially in the pooled dataset.

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 7.  RE: Multiple Imputation

    Posted Mon September 19, 2022 04:46 PM
    Yes, I understand what you mean.

    I'm curious if the student weight variable (STUDWGT) has any 0 or missing values...

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 8.  RE: Multiple Imputation

    IBM Champion
    Posted Mon September 19, 2022 05:04 PM
    But did the tables that the MI procedure produces show that some variables/values could not be imputed?

    --





  • 9.  RE: Multiple Imputation

    Posted Mon September 19, 2022 05:12 PM
    For example, Courtney, this table. What do you have, for "Not imputed"?



    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 10.  RE: Multiple Imputation

    Posted Mon September 19, 2022 08:17 PM
    Hi Rick!

    That section was blank, just as it appears in your image. There were three variables listed as "not imputed" due to missing values, but the "Not Imputed (Too Many Missing Values)" section was blank, as yours appears above.

    Best,
    Courtney B. Francis

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 11.  RE: Multiple Imputation

    Posted Mon September 19, 2022 08:47 PM
    OK, maybe we've narrowed it down to those 3 variables that are not imputed due to missing values.

    Try adding /MISSINGSUMMARIES OVERALL VARIABLES(MAXVARS=100 MINPCTMISSING=0)

    Do those three variables appear at the top of the list in the Variable Summary Table?

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 12.  RE: Multiple Imputation

    Posted Mon September 19, 2022 09:15 PM
    HI @Rick Marcantonio,

    I realized a typo in my reply to your previous message. I meant to say: There were three variables in the box below labeled: "not imputed (no missing values)" 
    Are you suggesting I add /MISSINGSUMMARIES OVERALL VARIABLES (MAXVARS=100 MINPCTMISSING=0) to the overall syntax and rerun the model?

    Those three variables do not appear at the top of the list in the variable summary table. 


    ------------------------------
    Courtney B Francis
    ------------------------------



  • 13.  RE: Multiple Imputation

    Posted Mon September 19, 2022 10:10 PM
    Well, no, they'd be at the bottom of the list, since they have no missing data.

    I am suggesting that you re-run your original syntax, just please change /MISSINGSUMMARIES NONE to /MISSINGSUMMARIES OVERALL VARIABLES(MAXVARS=100 MINPCTMISSING=0).

    I'm trying to get some idea what the "missingness" looks like in these data.


    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 14.  RE: Multiple Imputation

    Posted Mon September 19, 2022 10:25 PM
    Okay @Rick Marcantonio, I will give that a try by including that line in my syntax and share the results.


    Best,

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 15.  RE: Multiple Imputation

    Posted Mon September 19, 2022 11:49 PM
    Edited by System Test Fri January 20, 2023 04:45 PM
    Hi @Rick Marcantonio,

    I ran the model again (with fewer iterations and imputations, for the sake of time) and got the same result: I still have missingness in my pooled dataset for ALL variables.


    I used the same syntax, except that near the bottom I changed /MISSINGSUMMARIES NONE to /MISSINGSUMMARIES OVERALL VARIABLES(MAXVARS=100 MINPCTMISSING=0).

    If you have any insight into what the issue is, please let me know!

    Best,


    ------------------------------
    Courtney B Francis
    ------------------------------



  • 16.  RE: Multiple Imputation

    Posted Mon September 19, 2022 11:58 PM
    I can't see the table I want to see - the Variable Summary.

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 17.  RE: Multiple Imputation

    Posted Tue September 20, 2022 12:03 AM

    I pasted it below:

    Variable Summary

    Variable | Missing N | Missing Percent | Valid N | Mean | Std. Deviation
    ACT_FINAL | 122581 | 25.4% | 359361 | 24.7110 | 16.94311
    WHITE_XACT | 69418 | 14.4% | 412524 | 13.2600 | 40.98339
    RECODE of DISAB01_LEARNING DISABILITY | 63094 | 13.1% | 418848 | .0438 | .63238
    RECODE of INCOME (Parental Income) | 49476 | 10.3% | 432466 | 1.97 | 2.875
    RECODE of DISAB01_CHRONIC ILLNESS | 47338 | 9.8% | 434604 | .0356 | .57287
    RECODE of DISAB01_PSYCHOLOGICAL DISORDER | 46662 | 9.7% | 435280 | .0221 | .45473
    RECODE of DISAB01_OTHER DISABILITY | 46533 | 9.7% | 435409 | .0745 | .81162
    RECODE of DISAB01_ADHD | 45443 | 9.4% | 436499 | .0567 | .71469
    RECODE of DISAB01_LEARNING DISABILITY | 45336 | 9.4% | 436606 | .0328 | .55037
    TFS Likelihood of College Involvement Score | 44385 | 9.2% | 437557 | 49.4252 | 25.04735
    DEGREE ASPIRATIONS DICHOTOMOUS | 42437 | 8.8% | 439505 | .7639 | 1.31083
    BLACK_BROWN_XACT | 41513 | 8.6% | 440429 | 2.6916 | 22.11014
    WHITE_XINCOME | 36537 | 7.6% | 445405 | 1.2194 | 3.98811
    AUTISM_XACT | 31898 | 6.6% | 450044 | .1061 | 5.15740
    AUTISM_XINCOME | 31462 | 6.5% | 450480 | .0103 | .48015
    AUTISM_XCOLLINV | 31354 | 6.5% | 450588 | .2608 | 11.01939
    AUTISM_XSSCG | 31210 | 6.5% | 450732 | .0095 | .41915
    AUTISM_XASCG | 31203 | 6.5% | 450739 | .0117 | .50244
    AUTISM_XHSGPA | 31183 | 6.5% | 450759 | .0367 | 1.50114
    AUTISM_XHOMG | 31168 | 6.5% | 450774 | .0120 | .50833
    RECODE of AUTISM | 31116 | 6.5% | 450826 | .0061 | .23762
    WHITE_XDEGASP | 30886 | 6.4% | 451056 | .4121 | 1.51961
    WHITE_XCOLLINV | 30022 | 6.2% | 451920 | 27.2186 | 76.92240
    AUTISM_XDEGASP | 25605 | 5.3% | 456337 | .0036 | .18283
    TFS Social Self-Concept Group | 21697 | 4.5% | 460245 | 1.92 | 2.366
    TFS Academic Self-Concept Group | 20658 | 4.3% | 461284 | 1.94 | 2.226
    AUTISM_XWHITE | 18863 | 3.9% | 463079 | .0036 | .18470
    BLACK_BROWN_XCOLLINV | 18023 | 3.7% | 463919 | 8.5708 | 58.87801
    BLACK_BROWN_XINCOME | 17799 | 3.7% | 464143 | .2465 | 1.86394
    WHITE_XSSCG | 17458 | 3.6% | 464484 | 1.0970 | 3.41879
    WHITE_XASCG | 16917 | 3.5% | 465025 | 1.1271 | 3.43757
    BLACK_BROWN_XDEGASP | 16077 | 3.3% | 465865 | .1395 | 1.06979
    AUTISM_XSEX | 15996 | 3.3% | 465946 | .0018 | .12976
    RECODE of FIRSTGEN (First generation status based on parent(s) with less than 's | 13686 | 2.8% | 468256 | .19 | 1.217
    BLACK_BROWN_XSSCG | 13200 | 2.7% | 468742 | .3558 | 2.54229
    BLACK_BROWN_XASCG | 12949 | 2.7% | 468993 | .3372 | 2.39447
    WHITE_XHOMG | 12777 | 2.7% | 469165 | 1.1525 | 3.51353
    WHITE_XHSGPA | 11872 | 2.5% | 470070 | 3.8132 | 10.48926
    TFS Habits of Mind Group | 11731 | 2.4% | 470211 | 1.99 | 2.335
    BLACK_BROWN_XHOMG | 11189 | 2.3% | 470753 | .3650 | 2.56174
    AUTISM_XFIRSTGEN | 11000 | 2.3% | 470942 | .0008 | .08888
    BLACK_BROWN_XHSGPA | 10521 | 2.2% | 471421 | 1.0982 | 7.29505
    AUTISM_XBLACK_BROWN | 10239 | 2.1% | 471703 | .0007 | .08021
    WHITE_XFIRSTGEN | 10102 | 2.1% | 471840 | .0649 | .75961
    This is the group of ASAIN & HISPAINIC & OTHER | 9515 | 2.0% | 472427 | .1108 | .96923
    This is the Black and Brown combined variable | 9515 | 2.0% | 472427 | .1909 | 1.21370
    This is the Hispanic Race Code | 9515 | 2.0% | 472427 | .1019 | .93436
    This is the first Recode of the Black Variable | 9515 | 2.0% | 472427 | .0889 | .87919
    This is the White Race Code | 9515 | 2.0% | 472427 | .5818 | 1.52343
    BLACK_BROWN_XFIRSTGEN | 8582 | 1.8% | 473360 | .0821 | .84733
    What was your average grade in high school? | 5275 | 1.1% | 476667 | 6.39 | 4.275
    WHITE_XSEX | 4808 | 1.0% | 477134 | .3077 | 1.42634
    BLACK_BROWN_XSEX | 4808 | 1.0% | 477134 | .1096 | .96532
    ED INST TYPE (UNIVERSITY =0 & 4 YEAR COLLEGE =1) | 794 | 0.2% | 481148 | .5016 | 1.54678
    RECODE of SEX (Your sex:) | 0 | 0.0% | 481942 | .54 | 1.540
    BLACK_BROWN_AUTISM | 0 | 0.0% | 481942 | .0007 | .07935
    INSTITUTIONAL CONTROL | 0 | 0.0% | 481942 | 1.32 | 1.447



    ------------------------------
    Courtney B Francis
    ------------------------------



  • 18.  RE: Multiple Imputation

    Posted Tue September 20, 2022 12:39 AM
    OK, I'm just about out of bullets.

    Try this. In a syntax window, paste this and run it:

    COUNT num_missing= INSTDICO INSTCONT WHITE BLACK Brown Black_Brown_Total
        BLACK_BROWN_AUTISM MINORITIZED_POC RECODEAUTISM HSGPA
        COLLEGE_INVOLVEMENT HABITS_OF_MIND_GRP ACADEMIC_SELFCONCEPT_GRP
        SOCIAL_SELFCONCEPT_GRP DEGASPDICO SEX FIRSTGEN INCOME ACT_FINAL
        BLACK_BROWN_XINCOME BLACK_BROWN_XHSGPA BLACK_BROWN_XHOMG
        BLACK_BROWN_XCOLLINV BLACK_BROWN_XASCG BLACK_BROWN_XSSCG
        BLACK_BROWN_XACT BLACK_BROWN_XSEX BLACK_BROWN_XFIRSTGEN
        BLACK_BROWN_XDEGASP WHITE_XINCOME WHITE_XHSGPA WHITE_XHOMG
        WHITE_XCOLLINV WHITE_XASCG WHITE_XSSCG WHITE_XACT WHITE_XSEX
        WHITE_XFIRSTGEN WHITE_XDEGASP AUTISM_XINCOME AUTISM_XHSGPA
        AUTISM_XHOMG AUTISM_XCOLLINV AUTISM_XASCG AUTISM_XSSCG
        AUTISM_XACT AUTISM_XSEX AUTISM_XFIRSTGEN AUTISM_XDEGASP
        AUTISM_XBLACK_BROWN AUTISM_XWHITE RECODE_DISAB01
        RECODE_DISAB02 RECODE_DISAB04 RECODE_DISAB05 RECODE_DISAB06
        RECODE_DISAB07 (MISSING, SYSMIS).
    
    FRE VAR num_missing.
    I think I would also like to see the correlation matrix of these variables.

    Maybe you could just send me the dataset. 

    marcantr@us.ibm.com

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 19.  RE: Multiple Imputation

    Posted Tue September 20, 2022 08:34 AM
    Should I be running this syntax on the imputed data set? Or the original dataset?

    Best,

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 20.  RE: Multiple Imputation

    Posted Tue September 20, 2022 08:36 AM
    The original data, before imputation.

    Also, it looks like a lot (the majority, perhaps) of these variables are binary (0/1). Is that true?

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 21.  RE: Multiple Imputation

    Posted Tue September 20, 2022 08:39 AM
    Yes!

    majority is binary.

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 22.  RE: Multiple Imputation

    Posted Tue September 20, 2022 09:26 AM
    Courtney;

    I think I see what is happening. Cases that are completely missing (e.g., let's say that case #5 has no valid data at all for any of the analysis variables) are not imputed. Here is an example. Open a new syntax window and run it.

    preserve.
    *output close all.
    set undefined=nowarn.
    dataset close all.
    new file.
    DATA LIST FREE /id a b c d e f g studwgt.
    begin data.
    01 1 4 3 5 6 7 5 12
    02 8 6 7 5 6 4 . 21
    03 4 5 . 5 4 6 3 10
    04 2 1 4 4 3 6 7 21
    05 . . . . . . . 14
    06 7 7 8 6 8 7 4 11
    07 5 4 6 7 8 9 1 16
    08 7 6 . 7 . 6 5 14
    09 5 3 4 3 2 1 3 16
    10 6 6 5 7 . 4 . 12
    11 3 3 4 2 3 1 2 15
    12 . . . . . . . 20
    end data.
    restore.
    recode a b c e g (lo thru 5=0) (6 thru hi=1).
    variable level a to g (scale).
    dataset declare data.
    MULTIPLE IMPUTATION a to g
    /IMPUTE METHOD=FCS MAXITER= 100 NIMPUTATIONS=10 SCALEMODEL=LINEAR INTERACTIONS=NONE
    SINGULAR=1E-012 MAXPCTMISSING=NONE MAXMODELPARAM =10000
    /MISSINGSUMMARIES OVERALL VARIABLES(MAXVARS=100 MINPCTMISSING=0)
    /IMPUTATIONSUMMARIES MODELS DESCRIPTIVES
    /ANALYSISWEIGHT STUDWGT
    /OUTFILE IMPUTATIONS=data.

    dataset activate data.
    des var all.
    ***.

    Go down to the DESCRIPTIVES output. You will see that the 2 cases with no data receive no imputed values.

    I missed that in the manual but it is there:

    "Cases that have a missing value for each analysis variable are included in analyses of missingness but are excluded from imputation. Specifically, values of such cases are not imputed and are excluded when building imputation models. The determination of which cases are completely missing is made after any variables are filtered out of the imputation model by the MAXPCTMISSING keyword."


    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 23.  RE: Multiple Imputation

    Posted Tue September 20, 2022 03:42 PM
    Hi again @Rick Marcantonio,

    Based on what you shared from the manual, what is the solution? How do I go about checking for "cases that have a missing value for each analysis variable"? Would I sort the data to find all the missing values and then somehow remove those cases from my model?

    OR

    Are you saying the MAXPCTMISSING keyword will filter these cases out? If so, I'm not seeing where it does that.

    Please let me know your thoughts.

    Please forgive the delay in response!

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 24.  RE: Multiple Imputation

    Posted Tue September 20, 2022 04:01 PM
    I started suspecting this last night, which is why I asked you to create the COUNT variable. Then I was going to find those completely missing cases by sorting the data in descending order by that variable and looking at it.
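    The COUNT-then-sort check described above can be sketched in Python on the toy data from Rick's earlier example (ids 01 to 12, variables a to g; the weight is left out because it is not an analysis variable). `None` stands in for a system-missing value, and the names `num_missing` and `completely_missing_ids` are hypothetical, chosen only to mirror the SPSS COUNT variable:

```python
from collections import Counter

# Toy data from Rick's example: (id, values of a..g); None marks a missing value.
ANALYSIS_VARS = ["a", "b", "c", "d", "e", "f", "g"]
ROWS = [
    (1, [1, 4, 3, 5, 6, 7, 5]),
    (2, [8, 6, 7, 5, 6, 4, None]),
    (3, [4, 5, None, 5, 4, 6, 3]),
    (4, [2, 1, 4, 4, 3, 6, 7]),
    (5, [None] * 7),
    (6, [7, 7, 8, 6, 8, 7, 4]),
    (7, [5, 4, 6, 7, 8, 9, 1]),
    (8, [7, 6, None, 7, None, 6, 5]),
    (9, [5, 3, 4, 3, 2, 1, 3]),
    (10, [6, 6, 5, 7, None, 4, None]),
    (11, [3, 3, 4, 2, 3, 1, 2]),
    (12, [None] * 7),
]

def num_missing(values):
    """Analogue of COUNT ... (MISSING): missing analysis values in one case."""
    return sum(v is None for v in values)

def completely_missing_ids(rows, n_vars):
    """Cases where every analysis variable is missing; MI leaves these unimputed."""
    return [case_id for case_id, values in rows if num_missing(values) == n_vars]

# Analogue of FRE VAR num_missing.
freq = Counter(num_missing(values) for _, values in ROWS)
for k in sorted(freq):
    print(k, freq[k])

# Analogue of sorting descending by num_missing and eyeballing the top cases.
ranked = sorted(ROWS, key=lambda row: num_missing(row[1]), reverse=True)
print("most-missing cases first:", [case_id for case_id, _ in ranked[:4]])
print("completely missing:", completely_missing_ids(ROWS, len(ANALYSIS_VARS)))
```

    Any case whose count equals the number of analysis variables (here ids 05 and 12) is one the procedure skips, which matches what the DESCRIPTIVES output of Rick's example shows.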

    That's an interesting exercise, but that's about all - unless you have good reason to believe that there were some kind of data entry errors and those cases should have some observed data. But that's not a statistical question.

    As for doing something, there really is nothing to "do." If a person gave no data at all for the variables in your imputation model, then they gave no data... that's that. The good news is that you have plenty of data that was complete and/or imputed; more than enough to draw some solid research conclusions. The "empty" cases are causing no harm by being there. Statistically, we cannot give any analysis degrees of freedom it does not deserve by (essentially) "making up" entire cases, no matter how well-intentioned we are in wanting to.

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 25.  RE: Multiple Imputation

    Posted Tue September 20, 2022 04:49 PM
    Okay,

    that makes sense @Rick Marcantonio

    So the "missing" data I'm seeing in the pooled cases is just going to "be there"? I guess I've never seen that in all my readings, so it has me a bit alarmed. 


    ------------------------------
    Courtney B Francis
    ------------------------------



  • 26.  RE: Multiple Imputation

    Posted Tue September 20, 2022 04:58 PM
    Don't let it alarm you. You did impute missing data where it could be imputed. It isn't like you did nothing. You did quite a bit!

    Missing CASES are a different story. Those can safely be ignored.

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 27.  RE: Multiple Imputation

    IBM Champion
    Posted Tue September 20, 2022 05:24 PM
    It does matter, though, if those cases were systematically missing in a way that relates to the variables of interest.  Obviously, though, you would have to infer that from external facts and empty cases don't talk - even under torture.
    --





  • 28.  RE: Multiple Imputation

    Posted Tue September 20, 2022 05:28 PM
    Yes, that's true. That brings up whether the data are MAR, MCAR, or NMAR.

    MI is going to assume MAR (and then of course MCAR as well).

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 29.  RE: Multiple Imputation

    Posted Wed September 21, 2022 10:23 AM

    Well, thank you both, @Rick Marcantonio and @Jon Peck!

    I appreciate you both supporting me as I try to understand why there are still "missing values" in my pooled dataset.

    I ran the imputation model again with a few variable adjustments (it's taking such a LONG TIME this time around; if you all have any advice for speeding it up, let me know! :'D), and hopefully I can simply move on to the final stages of my analysis, even if the pooled dataset still has missing values.

    Thank you again! If any other thoughts come up regarding this thread, please let me know!

    Best,



    ------------------------------
    Courtney B Francis
    ------------------------------



  • 30.  RE: Multiple Imputation

    Posted Tue September 20, 2022 08:21 AM
    Edited by System Test Fri January 20, 2023 04:40 PM
    By the way, do these means and standard deviations look right to you? The means are very large and variance practically non-existent.

    WAIT. Sorry, this was an artifact of the way it reads in my email. N and Mean have no separator in my view.

    ------------------------------
    Rick Marcantonio
    Quality Assurance
    IBM
    ------------------------------



  • 31.  RE: Multiple Imputation

    Posted Tue September 20, 2022 08:33 AM
    @Rick Marcantonio

    No worries!

    I will do the syntax you suggested earlier and I can email you!

    Best,

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 32.  RE: Multiple Imputation

    Posted Mon September 19, 2022 08:19 PM
    All variables were imputed with the exception of the three in the "Not Imputed (No Missing Values)" section. 

    I used the example image from Rick, below to illustrate.


    ------------------------------
    Courtney B Francis
    ------------------------------



  • 33.  RE: Multiple Imputation

    Posted Mon September 19, 2022 11:51 PM
    Hi @Jon Peck,

    Please see the results from my attempt:
    Let me know if you have any advice for changes I should make.

    Best,

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 34.  RE: Multiple Imputation

    Posted Thu September 22, 2022 02:03 AM
    @Courtney B Francis, while this thread discusses several 'technical' aspects of performing MI in SPSS, let's not forget the impact that MI will have on the validity of the inference drawn from the results of the imputed data set. @Jon Peck and @Rick Marcantonio have already mentioned that the type of missingness (MCAR, MAR, MNAR) is important. Moreover, is there a 'healthy' ratio between the proportions of imputed and observed data? Are we even performing MI based on already imputed data?

    In any case, consider performing sensitivity analyses under different scenarios in order to assess whether and how the imputations impact the results. Do analyses using different assumptions about missing data come to different conclusions regarding your main outcomes of interest?

    ------------------------------
    Frank Furter
    ------------------------------



  • 35.  RE: Multiple Imputation

    Posted Thu September 22, 2022 09:37 PM
    Edited by System Test Fri January 20, 2023 04:25 PM
    Thank you for the insight, @Frank Furter!
    You are very right! I did address patterns of missingness before deciding to run multiple imputation on my original (not already imputed) data. Thank you for tapping in!

    @Rick Marcantonio @Jon Peck @Frank Furter

    I'm curious (not sure if I need to start a separate thread): does anyone know why my pooled dataset now has no values for Minimum, Maximum, or Std. Deviation in the Descriptive Statistics? My original data (and the individual imputed datasets) show N, Minimum, Maximum, Mean, and Std. Deviation.


    My pooled data has an N and a Mean, and that's it. Any thoughts on what might have gone wrong? The syntax is still the same:


    Is there some way to get a pooled dataset that has all the descriptive statistics?

    Best,

    ------------------------------
    Courtney B Francis
    ------------------------------



  • 36.  RE: Multiple Imputation

    Posted Wed April 10, 2024 08:38 PM

    Hello Courtney and others,

    This thread may have run cold, and you may be long gone, but I have the same question. Wondering if you or anyone has a clear answer? Similar to your situation, I have used the MI in SPSS to create a number of sets of data with missing values imputed. So far, when I have tried to run analysis, I get separate lines of output for the original data, as well as imputations 1 thru X. However, the "pooled" section at the end has only contained counts, means, and SE in some cases. I have tried running cross tabs with chi-square and one-way ANOVA so far, and the result is the same (meaning - no actual statistical output such as test values or p-values). 

    My understanding is that the point of creating multiple imputations is to make different models of reasonable data to fill missing values. Further, I figured that since sampling error will be better contained with larger samples, that a pooling of the various imputed sets of data would be the best outcome for analysis. If this is incorrect, could someone please help me better understand? 

    As it stands, I have 11 sets of results (original + 10 MI sets), and reading through them to try and make sense is quite challenging. In the few test analyses I have run, the results from all 11 sets match perfectly, so interpretation is straightforward. However, what happens if 2 or more results patterns emerge? Do I just split the difference or go with the most common result? This seems quite unscientific. Furthermore, what are the standards for reporting results in a paper if I am choosing between 11 different outcomes? 

    I have gotten a little deeper than I planned in this post, but thank you for any advice!

    Best - 

    Matt



    ------------------------------
    Matthew Heller
    ------------------------------



  • 37.  RE: Multiple Imputation

    IBM Champion
    Posted Wed April 10, 2024 09:42 PM
    You are not using MI to create a larger dataset nor to see how results vary by set.  You do the MI step and then just run the procedure you want, e.g., regression.  You don't use the individual repetitions.  They are created in order to get proper estimates of the variances that can be used to pool the results.  What you care about is only the pooled results.  There are some statistics that don't get pooled, but the main results should be present after all the individual samples are estimated.

    --





  • 38.  RE: Multiple Imputation

    Posted Thu April 11, 2024 05:46 PM

    Hello Jon - thank you for your response!

    I may not be clearly communicating the problem (as I understand it), so let me offer a bit more context and some examples. 

    I filled in missing values in my dataset using the SPSS 29 process:

    Analyze --> Multiple Imputation --> Impute Missing Data Values

    The dataset looks right: each imputed set has filled in the missing values, and the imputed values are highlighted in yellow. The default setting is to create 5 sets of imputed data, and because more is (sometimes) better, I created 10 imputed sets. So now my dataset has the original data cases (N = 1552), then imputation 1 (cases 1553 - 3104), imputation 2, and so on.

    As you said, my goal was simply to run the analysis of choice, read the pooled results, and move on with my life. However, SPSS is not producing any helpful pooled results. Here is a screenshot of the relevant section of a basic Frequencies analysis I ran:

    Note that "Imputation 10" includes counts as well as percentage, valid percentage, etc. On the other hand, the "Pooled" section only includes counts, and no additional calculations. I had to manually calculate each percentage to understand the patterns I was seeing. This same pattern repeats in other analyses as well. For example, here is a Crosstabs with chi-square analysis. First, a portion of the Crosstabulation table, where I asked for a bunch of content, such as "expected count", "row/column/total percentages", and "adjusted residual". You can see (in contrast to Imputation 10) that the only output for "Pooled" section is the actual counts:

    And then, the actual chi-square output. Here you can see that SPSS does not even bother to create a section for "Pooled"; it just skips it entirely, ending with Imputation 10:

    I hope this clarifies the problem I am having. I am asking SPSS for an analysis, and not getting what I believe I am asking for. Could this be a problem with using the menus rather than syntax? Could this be something to do with having 10 instead of 5 imputations? Or am I misreading the output and expecting the wrong thing?

    Any help would be appreciated. Thank you!

    Matt



    ------------------------------
    Matthew Heller
    ------------------------------



  • 39.  RE: Multiple Imputation

    IBM Champion
    Posted Thu April 11, 2024 06:34 PM
    The statistics in the pooled output would be computed by aggregating the values in the individual segments, not from the other statistics in the pooled section of the output.  Doing the latter would pretty much require recreating the formulas in each procedure that supports MI analysis.  Some procedure output isn't amenable to that approach.

    There is a section in the Algorithms doc
    Multiple Imputation: Pooling Algorithms
    that might be helpful, but it doesn't enumerate all the possibilities.

    Beyond that, I have to leave this to the statistician team to go deeper.

    --





  • 40.  RE: Multiple Imputation

    Posted Fri April 12, 2024 03:02 AM

    When multiply imputing missing data, you do not pool the imputed data sets and then perform the analysis. Instead, you perform the analysis separately on each of the imputed data sets and then pool the results. Some procedures in SPSS can do this automatically when they recognize a multiply imputed data set generated by the MI procedure, whereas unfortunately others can't. See, e.g., https://www.ibm.com/docs/en/spss-statistics/29.0.0?topic=imputation-analyzing-multiple-data and https://bookdown.org/mwheymans/bookmi/data-analysis-after-multiple-imputation.html
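    "Analyze each imputed dataset, then pool the results" is conventionally done with Rubin's rules. The Python sketch below is a minimal, hypothetical illustration for a single parameter: the function name `pool_rubin` and the input numbers are made up, and SPSS's own pooling (documented under Multiple Imputation: Pooling Algorithms) may differ in details such as the degrees-of-freedom adjustment.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations and one variance per estimate")
    q_bar = mean(estimates)             # pooled point estimate
    u_bar = mean(variances)             # average within-imputation variance
    b = variance(estimates)             # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b         # total variance
    if b == 0:
        df = math.inf                   # imputations agree exactly
    elif u_bar == 0:
        df = m - 1                      # limit of the formula below as r grows
    else:
        r = (1 + 1 / m) * b / u_bar     # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2 # Rubin's (1987) degrees of freedom
    return q_bar, math.sqrt(t), df

# Illustrative numbers only: one coefficient estimated in three imputed datasets.
est, se, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(round(est, 4), round(se, 4), round(df, 4))
```

    The between-imputation term `b` is the reason the separate imputations exist: it is how uncertainty about the missing values enters the pooled standard error.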



    ------------------------------
    Frank Furter
    ------------------------------



  • 41.  RE: Multiple Imputation

    Posted Fri April 12, 2024 01:52 PM

    Thank you for your feedback, Jon and Frank.

    My understanding of what you are saying, and the attached documentation, is that using MI to fill in missing values is not nearly as straightforward in SPSS as I would have liked to see it. It seems that means and N are readily calculated in the pooled condition, but other analyses, including test statistics, p-values, or effect sizes are not in many cases. 

    My original goal in using MI was to fill in reasonable "guesses" for the missing values in my survey responses and then use this more complete data set in analysis. What I am thinking about now, as a solution to my problem, is whether there is a meaningful way that I can take my 10 imputations, and perhaps average them (?), in order to create a single new, complete data set. I feel like I am being distracted by these 10 different imputation sets. Is it accurate to imagine each imputation as a sort of random sample of reasonable data points for that missing value? In other words, one imputation is not better or worse than another, as far as we know, because we do not know the true "population" value, but they should cluster around the population mean, following rules of normal distributions, etc.? If this is correct, then if I averaged the 10 imputations for each missing value and entered them in a final, complete data set, I could run analyses without worrying about MI and which analyses support pooling or not. In other words, I would have a single dataset composed of a) my original, real data, and b) in place of each missing value, the average of its 10 imputations. Would that make sense?



    ------------------------------
    Matthew Heller
    ------------------------------



  • 42.  RE: Multiple Imputation

    IBM Champion
    Posted Fri April 12, 2024 02:14 PM
    This would be defeating the point of multiple imputation.  You would be better off just using the single imputation method with an appropriate choice of imputation method.  The point of MI is to account for variance, and you would be eliminating that. 
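    A small simulation can make this point concrete. It assumes a single normally distributed variable with 30% of values missing completely at random, and a deliberately simplified imputation model that draws each missing value from a normal distribution with the observed mean and SD (no parameter uncertainty, unlike SPSS's FCS method). All sizes and the seed are arbitrary. Averaging the imputed values cell by cell shrinks the spread of the filled-in values, so the completed dataset looks less variable than the data really are:

```python
import random
from statistics import mean, pvariance

random.seed(2022)
N, N_MISSING, M = 1000, 300, 10

# Hypothetical data: standard-normal values; the last N_MISSING are treated as missing (MCAR).
full = [random.gauss(0, 1) for _ in range(N)]
observed = full[: N - N_MISSING]
mu, sd = mean(observed), pvariance(observed) ** 0.5

def impute_once():
    """One simplified stochastic imputation: draw each missing value from N(mu, sd^2)."""
    return observed + [random.gauss(mu, sd) for _ in range(N_MISSING)]

imputations = [impute_once() for _ in range(M)]

# The proposal from the thread: replace each missing cell with the mean of its M imputed values.
averaged = observed + [
    mean(imp[i] for imp in imputations) for i in range(N - N_MISSING, N)
]

var_single = mean(pvariance(imp) for imp in imputations)
var_averaged = pvariance(averaged)
print(f"average variance of the single imputed datasets: {var_single:.3f}")
print(f"variance of the cell-averaged dataset:           {var_averaged:.3f}")
```

    In each single imputation the filled-in values vary about as much as the observed ones. After averaging, their variance drops to roughly one tenth, pulling the overall variance down, and the disagreement between imputations that pooling relies on is gone.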

    --