SPSS Statistics

Problem finding outliers via IQR

• 1.  Problem finding outliers via IQR

Posted Mon May 29, 2023 09:50 AM

Hello, I am new to the group. I am completing my dissertation and trying to learn SPSS on my own. I've done my best to explain the problem, but if I have left something out, please let me know.

My design is a retrospective one-group pretest-posttest: I'm looking at multiple (13 total) cardiac risk factors before and after participation in a cardiac rehab program. Not everyone had both pretest and posttest results, so I decided to include a participant in the analysis of a given risk factor only when both a pretest and a posttest value were available for it. This means I do not have the same number of pairs for every risk factor, and I will have to run tests of normality on each risk factor separately. Before doing that, I wanted to look for outliers using the IQR method. I followed the steps below, which I found online and confirmed by watching several tutorials on the same subject:

1) Analyze > Descriptive Statistics > Explore
2) Move the variable to the Dependent List
3) Under Statistics, select Descriptives, Outliers, and Percentiles
4) Under Plots, de-select Stem-and-leaf and leave the rest as is
5) Leave Options and Bootstrap as is

When I follow these instructions and the only variable I select is Pre.A1c, I get the following:

EXAMINE VARIABLES=Pre.A1c
/PLOT BOXPLOT
/COMPARE GROUPS
/PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
/STATISTICS DESCRIPTIVES EXTREME
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.

When I run this again but select all of the pretest variables (including Pre.A1c), this is the log and boxplot I receive:

EXAMINE VARIABLES=Pre.A1c Post.eAG Pre.SBP Pre.DBP Pre.TotChol Pre.HDL Pre.CholHDLRatio Pre.VLDL
Pre.LDL Pre.Tri Pre.NonHSL Pre.Wt Pre.BMI
/PLOT BOXPLOT
/COMPARE GROUPS
/PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
/STATISTICS DESCRIPTIVES EXTREME
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.

I do not understand why the same variable displays different information when the only difference is the number of variables I selected. The variables are independent of one another, but I see that the box I must move them into under Explore is the Dependent List. Is this why there is a difference? If so, would I correct this by simply running each variable by itself, as I did in the first example?

Thank you in advance. Brian Foster

• 2.  RE: Problem finding outliers via IQR

IBM Champion
Posted Mon May 29, 2023 06:44 PM
First, you are using LISTWISE deletion of missing values. That means a missing value on any variable named in the command causes the whole case to be ignored for every variable, which is why Pre.A1c looks different once the other variables are added. You can choose pairwise deletion instead (under Options in the Explore dialog, or /MISSING PAIRWISE in syntax). Each variable is then evaluated independently, so missing values in one variable will not affect the results for a different one.

Listwise deletion gives you a consistent case base across the variables and is typically used in statistical procedures such as regression, but pairwise deletion shows you more about the individual variables.
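As a minimal sketch of that change, here is the multi-variable command from the original post with only the MISSING subcommand altered. The variable names are copied as posted; nothing else is assumed to differ.

```spss
* Same Explore run as in the original post, but with pairwise deletion.
* Each variable now uses every case that has a valid value for it.
EXAMINE VARIABLES=Pre.A1c Post.eAG Pre.SBP Pre.DBP Pre.TotChol Pre.HDL Pre.CholHDLRatio Pre.VLDL
    Pre.LDL Pre.Tri Pre.NonHSL Pre.Wt Pre.BMI
  /PLOT BOXPLOT
  /COMPARE GROUPS
  /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING PAIRWISE
  /NOTOTAL.
```

With this setting the results for Pre.A1c should match what you get when you run it alone, since cases are no longer dropped because of missing values in the other variables.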

Second, remember that a value is really an outlier only in reference to its peers, i.e., given the other variables that define a group. So you might want to try Data > Identify Unusual Cases for a more useful evaluation. You might also try clustering to see which cases do not fit well into any cluster.

Third, consider what you will do with values identified as outliers. Unless you have a good reason to remove a case, such as a data recording error, removing it is probably a bad idea.
