Hi @Alberto Camardiel
I can't speak directly to this problem but I will ask:
- Do you get the same results in the current version (IBM SPSS Statistics 29.0.1.0)? If this is in fact a defect in IBM SPSS Statistics, the remedy would be included in a release after version 29.0.1.0. IBM SPSS Statistics 22.0 reached End of Support on 2019-09-30.
- Are you able to open a Support case? IBM SPSS Support could help you more effectively and bring more resources to bear if there were an open Support case. I can see this issue needing an exchange of command syntax and data, as well as input from statisticians and developers. With a Support case, we could better ensure the security of your data and better understand the parameters of the problem.
Start at the IBM Support Portal and use the blue "Open a case" button in the upper right of that page.
------------------------------
David Dwyer
SPSS Technical Support
IBM Software
------------------------------
Original Message:
Sent: Wed June 28, 2023 08:23 PM
From: Alberto Camardiel
Subject: Complex Sampling
I believe there is an error in the programming of the Complex Samples (CS) module of SPSS version 22. I copy below the syntax used to generate a two-stage probability proportional to size (PPS) cluster sample design and selection plan for a population with nine clusters of varying size described in the table below. The design selects 3 clusters out of 9 with PPS:
Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Households | 20 | 100 | 50 | 15 | 18 | 43 | 20 | 36 | 13 |
The syntax program I ran is as follows:
* Sampling Wizard.
CSPLAN SAMPLE
  /PLAN FILE='C:\Users\gaine\Documents\apgea\mis_cursos_ceadcs_pipe\Cohorte '+
    '2022-2023\16_muestreo_para_investigadores_2022\04_clases_modalidad_virtual\tema_07_preparacion'+
    '\00_apoyos\spss\prueba.csplan'
  /PLANVARS SAMPLEWEIGHT=SampleWeight_Final_
  /PRINT PLAN
  /DESIGN STAGELABEL='1' CLUSTER=Manzana
  /METHOD TYPE=PPS_SAMPFORD ESTIMATION=DEFAULT
  /MOS VARIABLE=Vivienda
  /SIZE VALUE=3
  /STAGEVARS INCLPROB(InclusionProbability_1_) CUMWEIGHT(SampleWeightCumulative_1_)
    POPSIZE(PopulationSize_1_) SAMPSIZE(SampleSize_1_) RATE(SamplingRate_1_) WEIGHT(SampleWeight_1_)
  /DESIGN STAGELABEL='2'
  /METHOD TYPE=SIMPLE_WOR
  /SIZE VALUE=7
  /STAGEVARS INCLPROB(InclusionProbability_2_) CUMWEIGHT(SampleWeightCumulative_2_)
    POPSIZE(PopulationSize_2_) SAMPSIZE(SampleSize_2_) RATE(SamplingRate_2_) WEIGHT(SampleWeight_2_).
CSSELECT
  /PLAN FILE='C:\Users\gaine\Documents\apgea\mis_cursos_ceadcs_pipe\Cohorte '+
    '2022-2023\16_muestreo_para_investigadores_2022\04_clases_modalidad_virtual\tema_07_preparacion'+
    '\00_apoyos\spss\prueba.csplan'
  /CRITERIA STAGES=1 2 SEED=1
  /CLASSMISSING EXCLUDE
  /SAMPLEFILE OUTFILE='C:\Users\gaine\Documents\apgea\mis_cursos_ceadcs_pipe\Cohorte '+
    '2022-2023\16_muestreo_para_investigadores_2022\04_clases_modalidad_virtual\tema_07_preparacion'+
    '\00_apoyos\spss\prueba.sav'
  /JOINTPROB OUTFILE='C:\Users\gaine\Documents\apgea\mis_cursos_ceadcs_pipe\Cohorte '+
    '2022-2023\16_muestreo_para_investigadores_2022\04_clases_modalidad_virtual\tema_07_preparacion'+
    '\00_apoyos\spss\proba.sav'
  /PRINT SELECTION.
In the sample selected by the SPSS CS module, the first-stage selection probabilities are all equal to 0.33 instead of varying with cluster size. The values that should have been reported as first-stage selection probabilities (3 × households / 315, since the nine clusters contain 315 households in total) are:
Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
PPS first stage | 0.19 | 0.95 | 0.48 | 0.14 | 0.17 | 0.41 | 0.19 | 0.34 | 0.12 |
SPSS | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 |
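The expected first-stage values in the table above can be reproduced outside SPSS. The following is a minimal Python sketch, assuming the usual PPS-without-replacement relation in which a cluster's inclusion probability is the number of clusters drawn times its share of the total measure of size. The function and variable names are illustrative only and are not part of SPSS; the household counts are copied from the table in this post.

```python
# Households per cluster, copied from the table in the post.
households = {1: 20, 2: 100, 3: 50, 4: 15, 5: 18, 6: 43, 7: 20, 8: 36, 9: 13}
n_clusters_sampled = 3  # corresponds to /SIZE VALUE=3 in stage 1


def pps_inclusion_probabilities(sizes, n):
    """Return n * size / total for each cluster (hypothetical helper)."""
    total = sum(sizes.values())
    probs = {cluster: n * size / total for cluster, size in sizes.items()}
    # A probability above 1 would mean the cluster is too large for this n.
    if any(p > 1 for p in probs.values()):
        raise ValueError("a cluster is too large for PPS with this sample size")
    return probs


expected = pps_inclusion_probabilities(households, n_clusters_sampled)

# Value every cluster would get if the 3 clusters were drawn with equal probability.
equal_probability = n_clusters_sampled / len(households)

for cluster, p in expected.items():
    print(f"cluster {cluster}: expected {p:.2f}, equal-probability {equal_probability:.2f}")
```

The expected values sum to 3, the number of clusters drawn, and the equal-probability value 3/9 rounds to the 0.33 that SPSS reported for every cluster. This only checks the arithmetic; it does not say why the module produced those values.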
The SPSS CS manual does not help explain what might be going on. Perhaps I am wrong, which is precisely why I am posting this.
------------------------------
Alberto Camardiel
------------------------------