Original Message:
Sent: 4/12/2024 1:52:00 PM
From: Matthew Heller
Subject: RE: Multiple Imputation
Thank you for your feedback, Jon and Frank.
My understanding of what you are saying, and of the attached documentation, is that using MI to fill in missing values is not nearly as straightforward in SPSS as I would have liked. It seems that means and Ns are readily calculated for the pooled results, but in many cases other output, including test statistics, p-values, and effect sizes, is not.
My original goal in using MI was to fill in reasonable "guesses" for the missing values in my survey responses and then use this more complete data set in analysis. What I am thinking about now, as a solution to my problem, is whether there is a meaningful way that I can take my 10 imputations and perhaps average them (?) in order to create a single new, complete data set. I feel like I am being distracted by these 10 different imputation sets. Is it accurate to imagine each imputation as a sort of random sample of reasonable data points for that missing value? In other words, one imputation is not better or worse than another, as far as we know, because we do not know the true "population" value, but they should cluster around the population mean, following rules of normal distributions, etc.? If this is correct, then if I averaged the 10 imputations for each missing value and entered them in a final, complete data set, I could run analyses without worrying about MI and which analyses support pooling. In other words, I would have a single dataset composed of a) my original, real data, and b) missing values replaced by the average of the 10 imputations for each value. Would that make sense?
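To see why averaging the imputations is usually discouraged, here is a small Python sketch on made-up data. The variable, the sample size, and the 30% missingness are hypothetical, and the imputation model is a deliberately crude stand-in for SPSS's FCS method. Each missing cell gets ten random draws. Analyzing each completed data set keeps the natural spread of the data, while averaging the ten draws pulls every imputed value toward the mean, so the single averaged data set looks less variable than the real data and standard errors or p-values computed from it would tend to be too small.

```python
import random
import statistics

# Hypothetical illustration: one survey item, n = 1000 cases, roughly 30% missing.
random.seed(2024)
n, m = 1000, 10
true_values = [random.gauss(50, 10) for _ in range(n)]
observed = [x if random.random() > 0.3 else None for x in true_values]

obs = [x for x in observed if x is not None]
mu, sd = statistics.mean(obs), statistics.stdev(obs)

# Crude stochastic imputation: each missing cell gets a random draw from N(mu, sd).
# (Proper MI also draws the model parameters; this shortcut is only for illustration.)
imputations = [
    [x if x is not None else random.gauss(mu, sd) for x in observed]
    for _ in range(m)
]

# What MI intends: analyze each completed data set separately.
sd_per_imputation = [statistics.stdev(data) for data in imputations]

# The averaging idea: average the m draws for every cell, then analyze one data set.
averaged = [statistics.mean(values) for values in zip(*imputations)]
sd_averaged = statistics.stdev(averaged)

print("SD in each imputed data set:", [round(s, 2) for s in sd_per_imputation])
print("SD in the averaged data set:", round(sd_averaged, 2))
```

The same shrinkage carries over to correlations and regression coefficients, which is why the replies in this thread recommend analyzing each imputed data set and pooling the results instead.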
------------------------------
Matthew Heller
------------------------------
Original Message:
Sent: Fri April 12, 2024 03:01 AM
From: Frank Furter
Subject: Multiple Imputation
When multiply imputing missing data, you do not pool the imputed data sets and then perform the analysis. Instead, you perform the analysis separately on each of the imputed data sets and then pool the results. Some procedures in SPSS can do this automatically when they recognize a multiply imputed data set generated by the MI procedure, whereas, unfortunately, others can't. See, e.g., https://www.ibm.com/docs/en/spss-statistics/29.0.0?topic=imputation-analyzing-multiple-data and https://bookdown.org/mwheymans/bookmi/data-analysis-after-multiple-imputation.html
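A minimal sketch of the pooling step described above, using Rubin's rules for a single parameter (for example, one regression coefficient). It is written in Python purely as an illustration; the function name and the example numbers are hypothetical, and SPSS performs this step itself for the procedures that support pooling.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates -- the parameter estimate from each imputed data set
    variances -- the squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, each with one variance")
    q_bar = statistics.mean(estimates)      # pooled point estimate
    w = statistics.mean(variances)          # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                 # total variance
    if b == 0:
        df = math.inf                       # the imputations agree exactly
    else:
        r = (1 + 1 / m) * b / w             # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2     # Rubin's (1987) degrees of freedom
    return q_bar, math.sqrt(t), df

# Hypothetical coefficient and squared standard error from each of 5 imputed data sets.
estimates = [0.42, 0.45, 0.40, 0.44, 0.43]
variances = [0.010, 0.012, 0.010, 0.010, 0.012]
est, se, df = pool_rubin(estimates, variances)
print(f"pooled estimate = {est:.3f}, SE = {se:.3f}, df = {df:.1f}")
```

The between-imputation variance b is the reason the separate imputed data sets exist: it carries the extra uncertainty caused by the missing values into the pooled standard error, which a single filled-in data set cannot do.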
------------------------------
Frank Furter
Original Message:
Sent: Thu April 11, 2024 05:45 PM
From: Matthew Heller
Subject: Multiple Imputation
Hello Jon - thank you for your response!
I may not be clearly communicating the problem (as I understand it), so let me offer a bit more context and some examples.
I filled in missing values in my dataset using the SPSS 29 process:
Analyze --> Multiple Imputation --> Impute Missing Data Values
The dataset looks right: each imputed set has filled in the missing values, and the imputed values are highlighted in yellow. The default setting is to create 5 sets of imputed data, and because more is (sometimes) better, I created 10 imputed sets. So now my dataset has the original data cases (N = 1552), then imputation set 1 (cases 1553 - 3104), imputation set 2, and so on.
As you said, my goal was simply to run the analysis of choice, read the pooled results, and move on with my life. However, SPSS is not producing any helpful pooled results. Here is a screenshot of the relevant section of a basic Frequencies analysis I ran:
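The case numbers above follow from how SPSS stacks the file: the original data come first, then each imputation as a block of the same size, with the Imputation_ variable (0 for the original data) marking the block and serving as the split variable for pooled analyses. A short Python sketch of that arithmetic, assuming N = 1552 and 10 imputations as in this post (the function name is hypothetical):

```python
def case_range(imputation, n_cases):
    """Return the first and last case number of one block in a stacked MI file.

    Block 0 is the original data; blocks 1..m are the imputations, each
    holding n_cases rows in the same order as the original data.
    """
    first = imputation * n_cases + 1
    last = (imputation + 1) * n_cases
    return first, last

n_cases, m = 1552, 10
for k in range(m + 1):
    label = "original" if k == 0 else f"imputation {k}"
    print(label, case_range(k, n_cases))
print("total cases in the file:", (m + 1) * n_cases)
```

This is also why the output shows 11 sets of results: the original data (Imputation_ = 0) plus the 10 imputations, followed by the Pooled rows where a procedure supports them.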
![](https://dw1.s81c.com//IMWUC/MessageImages/edeb991d2f054f89a5e24721963d882b.png)
Note that "Imputation 10" includes counts as well as percentages, valid percentages, etc. The "Pooled" section, on the other hand, includes only counts and no additional calculations, so I had to calculate each percentage manually to understand the patterns I was seeing. The same pattern repeats in other analyses. For example, here is a Crosstabs with chi-square analysis. First, a portion of the Crosstabulation table, where I requested a range of statistics, such as "expected count", "row/column/total percentages", and "adjusted residual". You can see that, in contrast to Imputation 10, the only output in the "Pooled" section is the actual counts:
![](https://dw1.s81c.com//IMWUC/MessageImages/ff7d9e5c5e5743ceb21c6d4aeab94310.png)
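Assuming the Pooled counts in a Frequencies table are the averages of the per-imputation counts (an assumption here; check it against the individual imputation tables), the missing percentages can be recovered by dividing each pooled count by the pooled total. Because every imputed data set has the same N, this is the same as averaging the per-imputation percentages. A Python sketch with hypothetical categories and counts:

```python
def pooled_percentages(counts_by_imputation):
    """Average category counts across imputations and convert them to percentages.

    counts_by_imputation -- one dict per imputed data set, mapping category -> count
    """
    m = len(counts_by_imputation)
    categories = list(counts_by_imputation[0])
    pooled = {c: sum(d[c] for d in counts_by_imputation) / m for c in categories}
    total = sum(pooled.values())
    return {c: 100 * pooled[c] / total for c in categories}

# Hypothetical counts for a yes/no item in three of the imputed data sets (N = 1552 each).
counts = [
    {"Yes": 600, "No": 952},
    {"Yes": 610, "No": 942},
    {"Yes": 605, "No": 947},
]
for category, pct in pooled_percentages(counts).items():
    print(f"{category}: {pct:.1f}%")
```

These are pooled point estimates only; they carry no standard error, so they describe the data rather than test anything.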
And then the actual chi-square output. Here you can see that SPSS does not even bother to create a "Pooled" section; it skips it entirely, ending with Imputation 10:
![](https://dw1.s81c.com//IMWUC/MessageImages/f0531aa3e75c4c06aa591e60c01a4af3.png)
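For a chi-square test that SPSS does not pool, one option discussed in the MI literature is the D2 rule of Li, Meng, Raghunathan and Rubin (1991), which combines the m chi-square statistics into a single F statistic. The Python sketch below is a hedged illustration of that rule, not an SPSS feature; the function name and the chi-square values are hypothetical, and in practice you would copy the 10 values from the per-imputation output by hand.

```python
import math
import statistics

def pool_chi_square_d2(chi_squares, df):
    """Pool m chi-square statistics with the D2 rule (Li, Meng, Raghunathan and Rubin, 1991).

    chi_squares -- the chi-square statistic from each imputed data set
    df          -- degrees of freedom k of the test in a single data set
    Returns (D2, df1, df2); D2 is referred to an F(df1, df2) distribution.
    """
    m = len(chi_squares)
    if m < 2:
        raise ValueError("need at least two imputations")
    k = df
    roots = [math.sqrt(x) for x in chi_squares]
    # Relative increase in variance, estimated from the spread of sqrt(chi-square).
    r2 = (1 + 1 / m) * statistics.variance(roots)
    d2 = (statistics.mean(chi_squares) / k - (m + 1) / (m - 1) * r2) / (1 + r2)
    if r2 == 0:
        df2 = math.inf                      # all imputations gave the same statistic
    else:
        df2 = k ** (-3 / m) * (m - 1) * (1 + 1 / r2) ** 2
    return d2, k, df2

# Hypothetical Pearson chi-square values (2 df) from 10 imputed data sets.
chi_squares = [8.1, 9.4, 7.6, 10.2, 8.8, 9.0, 7.9, 9.7, 8.4, 9.1]
d2, df1, df2 = pool_chi_square_d2(chi_squares, df=2)
print(f"D2 = {d2:.3f} on ({df1}, {df2:.1f}) degrees of freedom")
```

With SciPy installed, a p-value can be obtained as `scipy.stats.f.sf(d2, df1, df2)`. Treat the result as an approximation.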
I hope this clarifies the problem I am having. I am asking SPSS for an analysis, and not getting what I believe I am asking for. Could this be a problem based on menu commands (vs. syntax), since I am simply using the menus? Could this be something to do with having 10 instead of 5 imputations? Or am I misreading the output and expecting the wrong thing?
Any help would be appreciated. Thank you!
Matt
------------------------------
Matthew Heller
Original Message:
Sent: Wed April 10, 2024 09:41 PM
From: Jon Peck
Subject: Multiple Imputation
You are not using MI to create a larger dataset nor to see how results vary by set. You do the MI step and then just run the procedure you want, e.g., regression. You don't use the individual repetitions. They are created in order to get proper estimates of the variances that can be used to pool the results. What you care about is only the pooled results. There are some statistics that don't get pooled, but the main results should be present after all the individual samples are estimated.
--
Original Message:
Sent: 4/10/2024 5:37:00 PM
From: Matthew Heller
Subject: RE: Multiple Imputation
Hello Courtney and others,
This thread may have run cold, and you may be long gone, but I have the same question. Wondering if you or anyone has a clear answer? Similar to your situation, I have used the MI in SPSS to create a number of sets of data with missing values imputed. So far, when I have tried to run analysis, I get separate lines of output for the original data, as well as imputations 1 thru X. However, the "pooled" section at the end has only contained counts, means, and SE in some cases. I have tried running cross tabs with chi-square and one-way ANOVA so far, and the result is the same (meaning - no actual statistical output such as test values or p-values).
My understanding is that the point of creating multiple imputations is to make different models of reasonable data to fill missing values. Further, I figured that since sampling error is better contained with larger samples, pooling the various imputed sets of data would be the best outcome for analysis. If this is incorrect, could someone please help me better understand?
As it stands, I have 11 sets of results (original + 10 MI sets), and reading through them to try and make sense is quite challenging. In the few test analyses I have run, the results from all 11 sets match perfectly, so interpretation is straightforward. However, what happens if 2 or more results patterns emerge? Do I just split the difference or go with the most common result? This seems quite unscientific. Furthermore, what are the standards for reporting results in a paper if I am choosing between 11 different outcomes?
I have gotten a little deeper than I planned in this post, but thank you for any advice!
Best -
Matt
------------------------------
Matthew Heller
Original Message:
Sent: Thu September 22, 2022 09:36 PM
From: Courtney B Francis
Subject: Multiple Imputation
Thank you for the insight @Frank Furter !
You are very right! And I did address patterns of missingness before deciding to run multiple imputation on my original (not already imputed) data. Thank you for tapping in!
@Rick Marcantonio @Jon Peck @Frank Furter
I'm curious (not sure if I need to start a separate thread): does anyone know why my pooled data set now has no values for Minimum, Maximum, or Std. Deviation in the Descriptive Statistics table? My original data (and each of the MI data sets) has N, Minimum, Maximum, Mean, and Std. Deviation.
![](https://dw1.s81c.com//IMWUC/MessageImages/f38e5fde89174df983f4566a6d0aea33.png)
My pooled data has an N and a Mean, and that's it. Any thoughts on what might have gone wrong? The syntax is still the same:
![](https://dw1.s81c.com//IMWUC/MessageImages/507f4d243aa8415f8a24fc6f35670929.png)
Is there some way to get a pooled dataset that has all the descriptive statistics?
Best,
------------------------------
Courtney B Francis
Original Message:
Sent: Thu September 22, 2022 02:02 AM
From: Frank Furter
Subject: Multiple Imputation
@Courtney B Francis, while this thread discusses several 'technical' aspects of performing MI in SPSS, let's not forget the impact that MI will have on the validity of the inference drawn from the results of the imputed data set. @Jon Peck and @Rick Marcantonio have already mentioned that the type of missingness (MCAR, MAR, MNAR) is important. Moreover, is there a 'healthy' ratio between the proportions of imputed and observed data? Are we even performing MI based on already imputed data?
In any case, consider performing sensitivity analyses under different scenarios in order to assess whether and how the imputations impact the results. Do analyses using different assumptions about missing data come to different conclusions regarding your main outcomes of interest?
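One common way to run the kind of sensitivity analysis suggested above is delta adjustment: impute under MAR, shift the imputed values by a chosen amount delta to mimic data that are missing not at random, and check whether the conclusions change as delta grows. The Python sketch below uses made-up data and a crude single-variable imputation model; the deltas of -0.5, 0 and +0.5 are arbitrary illustrations, not recommended values.

```python
import random
import statistics

random.seed(7)
# Hypothetical outcome for 800 cases, about 25% of them missing (None).
observed = [random.gauss(3.0, 1.0) if random.random() > 0.25 else None for _ in range(800)]
obs = [y for y in observed if y is not None]
mu, sd = statistics.mean(obs), statistics.stdev(obs)

def delta_adjusted_mean(delta, m=10):
    """Pooled mean of the outcome after shifting every imputed value by delta.

    Each of the m imputations draws missing values from N(mu, sd) (a crude
    MAR model used only for illustration) and adds delta; the pooled point
    estimate is the average of the m completed-data means (Rubin's rules).
    """
    means = []
    for _ in range(m):
        completed = [y if y is not None else random.gauss(mu, sd) + delta for y in observed]
        means.append(statistics.mean(completed))
    return statistics.mean(means)

for delta in (-0.5, 0.0, 0.5):
    print(f"delta = {delta:+.1f}: pooled mean = {delta_adjusted_mean(delta):.3f}")
```

If the substantive conclusion holds across a plausible range of deltas, the results are less sensitive to the MAR assumption; if it flips, that is worth reporting.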
------------------------------
Frank Furter
Original Message:
Sent: Mon September 19, 2022 11:51 PM
From: Courtney B Francis
Subject: Multiple Imputation
Hi @Jon Peck,
Please see the results from my attempt:
Let me know if you have any advice for changes I should make.
Best,
------------------------------
Courtney B Francis
Original Message:
Sent: Mon September 19, 2022 05:03 PM
From: Jon Peck
Subject: Multiple Imputation
But did the tables that the MI procedure produces show that some variables/values could not be imputed?
--
Original Message:
Sent: 9/19/2022 4:46:00 PM
From: Rick Marcantonio
Subject: RE: Multiple Imputation
Yes, I understand what you mean.
I'm curious if the student weight variable (STUDWGT) has any 0 or missing values...
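This question can be checked quickly before re-running the imputation. The Python sketch below lists cases whose weight is missing, zero, or negative; the assumption (worth confirming in the MULTIPLE IMPUTATION documentation for /ANALYSISWEIGHT) is that such cases do not contribute normally to the weighted imputation models. The function name and the weights in the example are hypothetical; in practice they would be the STUDWGT values exported from the data file.

```python
import math

def problem_weights(weights):
    """Return (case number, weight) pairs whose weight is missing, NaN, zero, or negative.

    Case numbers start at 1, matching the row numbers in the SPSS Data Editor.
    """
    flagged = []
    for case, w in enumerate(weights, start=1):
        if w is None or math.isnan(w) or w <= 0:
            flagged.append((case, w))
    return flagged

# Hypothetical STUDWGT values for six cases.
studwgt = [1.2, 0.0, None, float("nan"), -1.0, 0.8]
print(problem_weights(studwgt))
```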
------------------------------
Rick Marcantonio
Quality Assurance
IBM
Original Message:
Sent: Mon September 19, 2022 04:16 PM
From: Courtney B Francis
Subject: Multiple Imputation
Hi @Rick Marcantonio!
They are being imputed (the number of cases has increased considerably), and the amount of missingness has decreased from the original dataset, but it was my understanding that once all the variables of interest were imputed, there should no longer be any missingness, especially in the pooled dataset.
------------------------------
Courtney B Francis
Original Message:
Sent: Mon September 19, 2022 04:11 PM
From: Rick Marcantonio
Subject: Multiple Imputation
So then, no values are being imputed at all?
------------------------------
Rick Marcantonio
Quality Assurance
IBM
Original Message:
Sent: Mon September 19, 2022 04:04 PM
From: Courtney B Francis
Subject: Multiple Imputation
Hi @Rick Marcantonio,
Thank you for your response!
All of the variables listed, including the ones given ROLE=IND in /CONSTRAINTS, have missing data in all the imputed datasets, including the pooled data set, and the amount of missingness is the same across all imputed datasets.
------------------------------
Courtney B Francis
Original Message:
Sent: Mon September 19, 2022 01:27 PM
From: Rick Marcantonio
Subject: Multiple Imputation
Hi.
Do you mean apart from the variables specified in /CONSTRAINTS as ROLE=IND?
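This question points at one likely cause. In the syntax further down this thread, every interaction term is listed in /CONSTRAINTS with ROLE=IND, which tells MULTIPLE IMPUTATION to use the variable as a predictor only, so its missing values are left as they are. A quick way to see which variables still have gaps is to count missing values per variable within each imputation block. The Python sketch below assumes the stacked file has been read into a list of dicts (a hypothetical structure; None marks a missing value) that includes the Imputation_ variable SPSS creates; the rows are made up.

```python
from collections import defaultdict

def missing_by_imputation(rows, variables):
    """Count missing values per variable within each block of a stacked MI file.

    rows      -- list of dicts, each with an 'Imputation_' key (0 = original data)
    variables -- variable names to check; None marks a missing value
    Returns {imputation number: {variable: number of missing values}}.
    """
    counts = defaultdict(lambda: dict.fromkeys(variables, 0))
    for row in rows:
        block = counts[row["Imputation_"]]
        for var in variables:
            if row.get(var) is None:
                block[var] += 1
    return dict(counts)

# Hypothetical rows: HSGPA is imputed; BLACK_BROWN_XHSGPA has ROLE=IND and is not.
rows = [
    {"Imputation_": 0, "HSGPA": None, "BLACK_BROWN_XHSGPA": None},
    {"Imputation_": 0, "HSGPA": 3.2, "BLACK_BROWN_XHSGPA": 3.2},
    {"Imputation_": 1, "HSGPA": 3.5, "BLACK_BROWN_XHSGPA": None},
    {"Imputation_": 1, "HSGPA": 3.2, "BLACK_BROWN_XHSGPA": 3.2},
]
print(missing_by_imputation(rows, ["HSGPA", "BLACK_BROWN_XHSGPA"]))
```

A variable whose missing count stays the same in every imputation block, as Courtney describes, is one that was not imputed at all.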
------------------------------
Rick Marcantonio
Quality Assurance
IBM
Original Message:
Sent: Mon September 19, 2022 11:32 AM
From: Courtney B Francis
Subject: Multiple Imputation
Hello!
I have attempted to run Multiple Imputation on my dataset. After the imputation ran I still had missing data in the pooled data set.
Do you know what might have caused this? I'm not sure whether this is common or whether something in my syntax or data is causing it. Please see my syntax below:
SET THREADS = 4.
USE ALL.
FILTER OFF.
SORT CASES BY YEAR SUBJID.
EXECUTE.
SET SEED=20220913.
MULTIPLE IMPUTATION
INSTDICO
INSTCONT
WHITE
BLACK
Brown
Black_Brown_Total
BLACK_BROWN_AUTISM
MINORITIZED_POC
RECODEAUTISM
HSGPA
COLLEGE_INVOLVEMENT
HABITS_OF_MIND_GRP
ACADEMIC_SELFCONCEPT_GRP
SOCIAL_SELFCONCEPT_GRP
DEGASPDICO
SEX
FIRSTGEN
INCOME
ACT_FINAL
BLACK_BROWN_XINCOME
BLACK_BROWN_XHSGPA
BLACK_BROWN_XHOMG
BLACK_BROWN_XCOLLINV
BLACK_BROWN_XASCG
BLACK_BROWN_XSSCG
BLACK_BROWN_XACT
BLACK_BROWN_XSEX
BLACK_BROWN_XFIRSTGEN
BLACK_BROWN_XDEGASP
WHITE_XINCOME
WHITE_XHSGPA
WHITE_XHOMG
WHITE_XCOLLINV
WHITE_XASCG
WHITE_XSSCG
WHITE_XACT
WHITE_XSEX
WHITE_XFIRSTGEN
WHITE_XDEGASP
AUTISM_XINCOME
AUTISM_XHSGPA
AUTISM_XHOMG
AUTISM_XCOLLINV
AUTISM_XASCG
AUTISM_XSSCG
AUTISM_XACT
AUTISM_XSEX
AUTISM_XFIRSTGEN
AUTISM_XDEGASP
AUTISM_XBLACK_BROWN
AUTISM_XWHITE
RECODE_DISAB01
RECODE_DISAB02
RECODE_DISAB04
RECODE_DISAB05
RECODE_DISAB06
RECODE_DISAB07
/ANALYSISWEIGHT STUDWGT
/IMPUTE METHOD=FCS MAXITER= 100 NIMPUTATIONS=10 SCALEMODEL=LINEAR INTERACTIONS=NONE
SINGULAR=1E-012 MAXPCTMISSING=NONE MAXMODELPARAM =10000
/CONSTRAINTS BLACK_BROWN_XINCOME( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XHSGPA( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XHOMG( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XCOLLINV( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XASCG( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XSSCG( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XACT( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XSEX( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XFIRSTGEN( ROLE=IND)
/CONSTRAINTS BLACK_BROWN_XDEGASP( ROLE=IND)
/CONSTRAINTS WHITE_XINCOME( ROLE=IND)
/CONSTRAINTS WHITE_XHSGPA( ROLE=IND)
/CONSTRAINTS WHITE_XHOMG( ROLE=IND)
/CONSTRAINTS WHITE_XCOLLINV( ROLE=IND)
/CONSTRAINTS WHITE_XASCG( ROLE=IND)
/CONSTRAINTS WHITE_XSSCG( ROLE=IND)
/CONSTRAINTS WHITE_XACT( ROLE=IND)
/CONSTRAINTS WHITE_XSEX( ROLE=IND)
/CONSTRAINTS WHITE_XFIRSTGEN( ROLE=IND)
/CONSTRAINTS WHITE_XDEGASP ( ROLE=IND)
/CONSTRAINTS AUTISM_XINCOME( ROLE=IND)
/CONSTRAINTS AUTISM_XHSGPA( ROLE=IND)
/CONSTRAINTS AUTISM_XHOMG( ROLE=IND)
/CONSTRAINTS AUTISM_XCOLLINV( ROLE=IND)
/CONSTRAINTS AUTISM_XASCG( ROLE=IND)
/CONSTRAINTS AUTISM_XSSCG( ROLE=IND)
/CONSTRAINTS AUTISM_XACT( ROLE=IND)
/CONSTRAINTS AUTISM_XSEX( ROLE=IND)
/CONSTRAINTS AUTISM_XFIRSTGEN( ROLE=IND)
/CONSTRAINTS AUTISM_XDEGASP ( ROLE=IND)
/CONSTRAINTS AUTISM_XBLACK_BROWN( ROLE=IND)
/CONSTRAINTS AUTISM_XWHITE( ROLE=IND)
/CONSTRAINTS DEGASP (RND=1 MIN=0 MAX=1)
/MISSINGSUMMARIES NONE
/IMPUTATIONSUMMARIES MODELS DESCRIPTIVES
/OUTFILE IMPUTATIONS=courtney_syntax_9_14_22.sav FCSITERATIONS=iteration_history.
------------------------------
Courtney B Francis
------------------------------