Original Message:
Sent: 7/29/2021 10:40:00 AM
From: Claire Tyers
Subject: Applying a sampling weight

Hi all

I need to apply a sampling weight to my data and then run a range of basic frequencies and statistics on the weighted data. I soon realised that this isn't possible in the base package, so I have now purchased Complex Samples, but I'm struggling.

I just want to apply a weight that has already been calculated. The sample isn't stratified or complex; it's just that we got greater participation in a survey from some respondents than others. The survey was originally sent out to an entire population. I have created a plan file with no strata or clusters but with a sample weight added. I selected the WR estimation method. When I then run Complex Samples Descriptives, the means that are produced are completely different from those obtained by a colleague using R. R, as I understand it, just multiplies the weight by the value of the numerical variable. I can replicate the R results when I do this manually in Excel.

What am I doing wrong? Any help would be much appreciated, as I'm a newbie to Complex Samples.

Many thanks
------------------------------
Claire Tyers
------------------------------
#SPSSStatistics
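For readers following along, the manual calculation described in the question can be sketched in a few lines of Python. This is only an illustration under assumptions: the data and the name `weighted_mean` are hypothetical, and it uses the usual definition of a weighted mean (the sum of weight times value, divided by the sum of the weights), which may or may not be exactly what the colleague's R code does.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data: five survey responses and their precomputed weights.
scores = [3.0, 4.0, 5.0, 2.0, 4.0]
weights = [0.5, 1.5, 1.0, 2.0, 1.0]

unweighted = sum(scores) / len(scores)     # every case counts equally
weighted = weighted_mean(scores, weights)  # cases count in proportion to weight

print(f"unweighted mean: {unweighted:.4f}")
print(f"weighted mean:   {weighted:.4f}")
```

Comparing a hand calculation like this against each package's output on a handful of cases is one way to narrow down where two tools start to disagree.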
This may be a good time to show your students that the world is a more complicated place than they thought :-)

Weighting is a complicated subject, and statisticians are not all in agreement on how weights should be used. There is a section called "Using Rake-Weighted Data in SPSS Statistics Procedures" in the Raking with IBM SPSS Statistics.pdf file installed with the SPSSINC RAKE extension command that might be helpful.

Weights can arise for several different reasons, including matching control totals, complex samples, importance, and heteroscedastic errors, to name a few, and procedures have different ways of handling them. The differences show up mainly when the weights are not integers, as would be the case after raking.

CROSSTABS offers five methods for handling weights, but by default it rounds the cell counts. FREQUENCIES uses the fractional values as is; however, the counts are displayed with zero decimal places if the variable format has zero decimals. If you expand the decimals in the pivot table or in the variable format setting, you will see the fractional values.

Some of the nonparametric procedures work by replicating cases according to the weights, so they have to round the weights.

The complex samples procedures treat the weights as arising from the sampling scheme, which means a sampling design must be specified.

In regression, the weights might also be used for heteroscedasticity correction, so the fractional values would be used.

One important improvement in CTABLES is that it can use either the usual weight set by the WEIGHT CASES command or effective base weighting, which ignores the weight set by WEIGHT CASES and uses an approximation for sampling weights, originally specified by Kish, that does not need a sampling plan.
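To make the effective base idea concrete, here is a minimal Python sketch of Kish's approximation as it is usually written, n_eff = (sum of weights)^2 / (sum of squared weights). The function name and the weights are hypothetical, and whether CTABLES computes its effective base in exactly this way is an assumption, not something confirmed in this thread.

```python
def kish_effective_base(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


# Hypothetical weights: equal weights versus fractional, raking-style weights.
equal_weights = [1.0, 1.0, 1.0, 1.0, 1.0]
raked_weights = [0.5, 1.5, 1.0, 2.0, 1.0]

# With equal weights the effective base equals the number of cases.
print(kish_effective_base(equal_weights))
# With unequal weights it falls below the number of cases.
print(kish_effective_base(raked_weights))
```

The more the weights vary, the further the effective base drops below the raw case count, which is the penalty for weighting that this approximation is meant to capture.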
An example of effective base weighting is included in the document mentioned above.

Overall, tabulations in general can use the fractional weights as they are, but modeling procedures such as regression might do things differently if these are sampling weights. Gelman suggests not weighting the data at all but instead including the variables that determine the weights as additional regressors in the estimation process.

What I tell people who are understandably uncertain about what to do in the modeling scenario is to experiment with the various possibilities, remembering that the underlying assumption in regression is that the same model applies to all the cases. If the weights make a big difference in the coefficient estimates, that suggests that there are problems with the model.

I hope this is helpful.
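The experiment suggested above can be tried on a toy scale. The following Python sketch fits a one-predictor line with and without weights, using the closed-form weighted least squares solution. Everything in it is hypothetical (the helper `fit_line`, the data, and the weights), and it illustrates the diagnostic idea only, not any SPSS procedure.

```python
def fit_line(x, y, w=None):
    """Fit y = a + b*x by (weighted) least squares; return (a, b)."""
    if w is None:
        w = [1.0] * len(x)  # unweighted fit: every case counts equally
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 1.0, 1.0, 3.0, 3.0]        # hypothetical weights favouring large x

y_linear = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly y = 1 + 2x: the line is the right model
y_curved = [1.0, 4.0, 9.0, 16.0, 25.0]  # y = x^2: a straight line is misspecified

for label, y in (("linear", y_linear), ("curved", y_curved)):
    a_u, b_u = fit_line(x, y)
    a_w, b_w = fit_line(x, y, w)
    print(f"{label}: unweighted slope {b_u:.3f}, weighted slope {b_w:.3f}")
```

When the straight line really does describe every case, the weights leave the coefficients unchanged; when it does not, the weighted and unweighted slopes drift apart, which is the warning sign described in the reply.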