Thank you - makes perfect sense- I think that is why I am digging so deep to try to understand why I am doing what I am doing, and being really pedantic even with minor analyses that likely won't make it into the body of the thesis (or subsequent papers) - it is definitely not a volume thing as I'm busy ditching my original plans to analyse everything :D ... and I have to say (no criticism) that stats at the lower levels is taught as pretty much an exact science, which I expect is a continuation of the journey from maths at school (didn't do stats - I am a practical science sort of bod - where correct methods really matter - i.e., lives in your hands) and this research thing is a big eye opener/smack in the face after many years of doing the job :) My luck dictates I will get at least one really rotten antagonistic sharpster in my thesis defence viva so thank you for making me think it through :D
Original Message:
Sent: Tue December 03, 2024 11:28 AM
From: Bruce Weaver
Subject: Pooling multiple imputations
I am reminded of what Robert P. Abelson wrote in his book, Statistics as Principled Argument.
For many students, statistics is an island, separated from other aspects of the research enterprise. Statistics is viewed as an unpleasant obligation, to be dismissed as rapidly as possible so that they can get on with the rest of their lives. Furthermore, it is very hard to deal with uncertainty, whether in life or in the little world of statistical inference. Many students try to avoid ambiguity by seizing upon tangible calculations, with stacks of computer output to add weight to their numbers. Students become rule-bound, thinking of statistical practice as a medical or religious regimen. They ask questions such as, "Am I allowed to analyze my data with this method?" in the querulous manner of a patient or parishioner anxious to avoid sickness or sin, and they seem to want a prescriptive answer, such as, "Run an analysis of variance according to the directions on the computer package, get lots of sleep, and call me in the morning."
For years, I always responded to students who asked, "Can I do this?" by saying something like, "You can do anything you want, but if you use method M you'll be open to criticism Z. You can argue your case effectively, however, if you use procedure P and are lucky enough to get result R. If you don't get Result R, then I'm afraid you'll have to settle for a weaker claim."
Eventually, I began to appreciate an underlying implication of the way I found myself responding: namely, that the presentation of the inferences drawn from statistical analysis importantly involves rhetoric. When you do research, critics may quarrel with the interpretation of your results, and you had better be prepared with convincing counterarguments. (These critics may never in reality materialize, but the anticipation of criticism is fundamental to good research and data analysis. In fact, imagined encounters with antagonistic sharpsters should inform the design of your research in the first place.) There are analogous features between the claims of a statistical analyst and a case presented by a lawyer--the case may be persuasive or flimsy (even fishy), the style of inference may be loose or tight, prior conventions and rules of evidence may be invoked or flouted, and so on. (p. xii)
I hope this helps, should you encounter any "antagonistic sharpsters".
------------------------------
Bruce Weaver
Original Message:
Sent: Tue December 03, 2024 11:06 AM
From: Sharon Cooksey
Subject: Pooling multiple imputations
Think I was having a bit of a brain meltdown (I really hate equations) ... thank you - I have gone down the route you suggest - based on what I can understand from van Ginkel (2019) ... but I am happier now it seems that no one really has an answer (i.e., I am not doing it wrong) and so long as I can support what I am doing some way or other it will do :)
Thanks again!
------------------------------
Sharon Cooksey
Original Message:
Sent: Tue December 03, 2024 10:35 AM
From: Bruce Weaver
Subject: Pooling multiple imputations
Let me say again that I have not read the van Ginkel (2019) article you referred to. But the final sentence of the abstract suggests that reporting simple descriptive stats for the R2 and adjusted R2 values across all imputed datasets might be as good an approach as any other.
Of the methods for pooling the point estimates of R2 no method clearly performs best, but it is argued that the average of R2's across imputed data set is preferred.
Granted, van Ginkel just suggests reporting the mean. But it seems to me you could report some other basic descriptive stats, or maybe even show a boxplot. YMMV.
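The descriptive summary suggested above could be sketched as follows. This is only an illustration in Python, not something SPSS produces: it assumes the R2 (or adjusted R2) value from each imputed dataset has already been collected into a list, and the function name `describe_r2` and the example values are hypothetical.

```python
from statistics import mean, median, quantiles


def describe_r2(r2_values):
    """Summarise R2 values collected from m imputed datasets.

    The mean is the pooled point estimate preferred in the abstract quoted
    above; the other statistics only show how much R2 varies between
    imputations.
    """
    if len(r2_values) < 2:
        raise ValueError("need R2 values from at least two imputed datasets")
    # Quartiles by the 'inclusive' method (the data are treated as the
    # whole set of imputations rather than a sample from a larger set).
    q1, _, q3 = quantiles(r2_values, n=4, method="inclusive")
    return {
        "m": len(r2_values),
        "mean": mean(r2_values),
        "median": median(r2_values),
        "min": min(r2_values),
        "q1": q1,
        "q3": q3,
        "max": max(r2_values),
    }


if __name__ == "__main__":
    # Hypothetical R2 values, one per imputed dataset.
    example = [0.40, 0.42, 0.44, 0.46, 0.48]
    for name, value in describe_r2(example).items():
        print(f"{name}: {value}")
```

The same five-number summary is what a boxplot would display, so either form carries the same information.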
------------------------------
Bruce Weaver
Original Message:
Sent: Tue December 03, 2024 09:54 AM
From: Sharon Cooksey
Subject: Pooling multiple imputations
Hi Bruce,
Many thanks for the quick reply - I think I am making my struggle look more confused because my language is poor to describe what I am attempting to do - I get I am not pooling the datasets (I thought it was a bit dodgy but there are some methods described on various YouTube videos - however, Jon explained that and I get it) ... so what I am trying to do is find one number/figure to estimate the pooled data parameter results so I can write them in my results chapter (and understand what is going on in the studies).
So, as I understand it - the chapter you have pointed me to is explaining how the values which are listed as 'pooled' in SPSS output are arrived at... not explaining how to calculate any value which is not provided already by SPSS. This may be completely incorrect but it is how I read the article ... so the question was how could I find out the pooled estimates which aren't already provided by SPSS ... so in regression adjusted R2, in ANOVA F etc. - the article I cited suggests it is possible - but I really can't get my head around all the equations (I have a mental block for equations)
I may have this completely muddled but I am getting nowhere quickly and I am short on time (and getting frustrated with it as this comes on the back of weeks trying to understand and get MI or EM etc. to work - my dataset is so large it took a high-powered computer 5 days to run the MI and I kept thinking I must have done something wrong - I hadn't but all of the restarts cost more days...). The analyses in SPSS are the minor analyses - which I am trying to do before I switch to the more involved ones, which are more important for the research results - and I thought I would be finished by now :( (I also have 2 more very large datasets to analyse after this one and am starting to feel it is impossible.) I have tried asking the stats tutors at my uni but they don't know how to help even with MI in SPSS, and my supervisors also have no experience, so I am on my own.
Thank you so much for any help you can give
------------------------------
Sharon Cooksey
Original Message:
Sent: Tue December 03, 2024 09:08 AM
From: Bruce Weaver
Subject: Pooling multiple imputations
Hello Sharon. Your statement that you are trying to "calculate F, Rsquared, beta, etc. values for the pooled data" (emphasis added) suggests to me that you are still struggling with the idea that when you use multiple imputation, you do not pool the data. Rather, you use Rubin's rules to compute pooled estimates of parameters and their standard errors, and then use those pooled estimates for significance testing. See this chapter, for example.
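As a rough illustration of the pooling step described above, here is a minimal Python sketch of Rubin's rules for a single parameter. It assumes you already have that parameter's estimate and standard error from each of the m imputed datasets (for example, one regression coefficient); the function name `pool_rubin` and the example numbers are hypothetical. It covers only estimates that come with a standard error, so it does not by itself give a pooled R2 or F.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and SEs from >= 2 imputations")

    q_bar = mean(estimates)                         # pooled point estimate
    within = mean([se ** 2 for se in std_errors])   # average sampling variance
    between = variance(estimates)                   # variance between imputations (m - 1 divisor)
    total = within + (1 + 1 / m) * between          # total variance of the pooled estimate

    # Rubin's (1987) large-sample degrees of freedom for the t reference
    # distribution; infinite when the estimates do not vary at all.
    if between == 0:
        df = math.inf
    else:
        r = (1 + 1 / m) * between / within          # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2

    return {
        "estimate": q_bar,
        "within_var": within,
        "between_var": between,
        "se": math.sqrt(total),
        "df": df,
    }


if __name__ == "__main__":
    # Hypothetical coefficient estimates and standard errors from 3 imputations.
    pooled = pool_rubin([0.50, 0.54, 0.46], [0.10, 0.10, 0.10])
    print(pooled)
    print("t =", pooled["estimate"] / pooled["se"])  # statistic for the significance test
```

The pooled estimate divided by its pooled standard error gives the test statistic, which is referred to a t distribution with the computed degrees of freedom.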
I hope this helps.
PS- I have not read that particular article by Joost R. van Ginkel, but have found previous articles by him to be credible.
------------------------------
Bruce Weaver
Original Message:
Sent: Tue December 03, 2024 07:15 AM
From: Sharon Cooksey
Subject: Pooling multiple imputations
Hi Jon - I hope it is OK to now ask another question. I am working with the whole imputed dataset (so 100 imputations and it is pretty unwieldy! but that's OK as you pointed out it is the correct way) but I'm now wanting to run a series of regressions (and will also want to run ANOVAs) and so I have spent over 3 hours this morning trying to understand how to calculate F, Rsquared, beta, etc. values for the pooled data - or to at least be able to report a single pooled figure for each of the analyses so I can report something for the imputed data .... I found several papers and tried to follow the advice from several other places - but I still really do not have any idea, other than that simply calculating the means of those values from the imputed datasets is not acceptable (because that is what I was trying to do before) and a headache ... I found this paper which seems to be credible -
Joost R. van Ginkel (2019) Significance Tests and Estimates for R2 for Multiple Regression in Multiply Imputed Datasets: A Cautionary Note on Earlier Findings, and Alternative Solutions, Multivariate Behavioral Research, 54:4, 514-529, DOI: 10.1080/00273171.2018.1540967
But I just cannot get my head around it. Is there a simple way, or something I could follow to calculate the values I need please? Many thanks in advance :)
------------------------------
Sharon Cooksey
Original Message:
Sent: Sun December 01, 2024 10:40 PM
From: Jon Peck
Subject: Pooling multiple imputations
You should never combine all the imputations into a single dataset. That would be very misleading. Each imputation round produces an entire dataset. You can pick out any single dataset, including the original (unimputed) data, based on the imputation_ variable, by using Data > Select Cases.
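For anyone handling an exported copy of the stacked file outside SPSS, the same idea (keep each imputation as its own dataset rather than merging them) might be sketched in Python as below. This is only an illustration: it assumes each row carries the imputation index in a field called `Imputation_`, with 0 marking the original data, and the helper names and example rows are hypothetical.

```python
from collections import defaultdict


def split_by_imputation(rows, key="Imputation_"):
    """Group stacked rows into one dataset per imputation index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def imputed_only(groups, original_index=0):
    """Drop the original (unimputed) data, keeping the imputed datasets."""
    return {k: v for k, v in groups.items() if k != original_index}


if __name__ == "__main__":
    # Hypothetical stacked file: original data (index 0) plus two imputations.
    stacked = [
        {"Imputation_": 0, "id": 1, "score": None},
        {"Imputation_": 1, "id": 1, "score": 4.2},
        {"Imputation_": 2, "id": 1, "score": 3.9},
    ]
    datasets = split_by_imputation(stacked)
    print(sorted(datasets))                 # imputation indices present
    print(sorted(imputed_only(datasets)))   # indices of the imputed datasets only
```

Each analysis is then run once per imputed dataset, and only the resulting estimates are combined.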
--
Original Message:
Sent: 12/1/2024 10:16:00 PM
From: Sharon Cooksey
Subject: Pooling multiple imputations
Hi - I have a dataset consisting of 100 imputations and the original dataset. Is it possible to pool the imputed datasets to make a single pooled dataset please? (I would like to do this as it is suggested that it is helpful to use the imputed dataset from time 1 alongside original time 2 data when imputing time 2 datasets in longitudinal studies.)
If it is possible, please can you tell me the step-by-step method for SPSS?
Many thanks!
------------------------------
Sharon Cooksey
------------------------------