Hello Bindu. This sentence in your recent post caught my eye:
"Regarding the comment about normality testing as a prerequisite, I always recommend checking normality before proceeding with a t-test, ANOVA, or similar procedures, because there are many statistical reasons supporting this."
I gather you are referring to and disagreeing with the tongue-in-cheek presentation in which I argued that it is both silly and pointless to rely on statistical tests of normality to justify use of a t-test or ANOVA etc. If so, that's fine. You are certainly free to disagree. But I am curious about what specifically you disagree with. For example:
- Do you not agree that statistical tests of normality have too little power when n is small and too much power as n increases?
- Do you not agree that for OLS models (including t-tests & ANOVA), normality of the errors is a sufficient normality condition, but the necessary normality condition is approximate normality of the sampling distributions of the parameter estimates? (I say approximate normality, because I agree with the things George Box said about straight lines and normal distributions in the real world in his 1976 article--see the excerpt pasted below.)
- Can you provide references supporting the use of statistical tests of normality as prerequisites to t-tests, ANOVA, etc.? (I ask because the only ones I remember seeing are in introductory- to intermediate-level textbooks written by non-statisticians.)
- When you do use statistical tests of normality, do you use them on the outcome variable, or on the residuals from the model of interest?
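On the first bullet, a quick simulation makes the power issue concrete. This is a minimal sketch in Python using scipy's `shapiro`; the choice of a t distribution with 10 df as the "mildly non-normal" population is my illustrative assumption, not something from the thread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def rejection_rate(n, n_sims=300, alpha=0.05):
    """Share of simulated samples for which Shapiro-Wilk rejects
    normality at level alpha. Samples are drawn from a t distribution
    with 10 df -- mildly non-normal (heavier tails than Gaussian)."""
    rejections = 0
    for _ in range(n_sims):
        x = rng.standard_t(df=10, size=n)
        if stats.shapiro(x).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

small = rejection_rate(20)    # small n: the test rarely flags the deviation
large = rejection_rate(2000)  # large n: the same deviation is flagged routinely
print(f"rejection rate at n=20:   {small:.2f}")
print(f"rejection rate at n=2000: {large:.2f}")
```

With a deviation this mild, the rejection rate at n = 20 typically sits near the nominal 5%, while at n = 2000 the test flags nearly every sample, even though a t(10) population is harmless for a t-test at that sample size.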
Thanks for clarifying.
Cheers,
Bruce
Excerpt from Box (1976)
In applying mathematics to subjects such as physics or statistics we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless. The physicist knows that particles have mass and yet certain results, approximating what really happens, may be derived from the assumption that they do not. Equally, the statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world.
https://www-sop.inria.fr/members/Ian.Jermyn/philosophy/writings/Boxonmaths.pdf
------------------------------
Bruce Weaver
------------------------------
Original Message:
Sent: Mon March 17, 2025 04:28 AM
From: Bindu Krishnan
Subject: Shapiro-Wilk in SPSS
Hi Juan
I tried the Shapiro-Wilk test on your dataset with the NORMALITY ANALYSIS extension procedure and got the same p-value of 0.402. You can also have a look at the attached screenshot for the p-values obtained by other methods. Like Jon Peck mentioned, I also suggest using the NORMALITY ANALYSIS extension procedure for more tests of normality. Note that you need to select at least two variables here, but you will obtain output for the univariate and multivariate cases separately. "Regarding the comment about normality testing as a prerequisite, I always recommend checking normality before proceeding with a t-test, ANOVA, or similar procedures, because there are many statistical reasons supporting this."
Thanks
------------------------------
Bindu Krishnan
Senior Statistician
IBM SPSS Statistics
Original Message:
Sent: Wed March 12, 2025 06:42 PM
From: Juan Quik
Subject: Shapiro-Wilk in SPSS
Hello, I am using SPSS to perform the Shapiro-Wilk test. I input data with a sample size of less than 50 and export the results to PDF using the menu option: Analyze > Descriptive Statistics > Explore. I leave the default settings so that the software exports the results. For example, I input the following 10 data points:
56.0, 360.9, 391.5, 395.8, 476.0, 477.0, 481.0, 790.0, 835.0, 988.0
SPSS generates a Shapiro-Wilk statistic of 0.925 and a significance level (p-value) of 0.402. However, when I consult a Shapiro-Wilk coefficient table, I find the same statistic (0.925), but the tabled value is very different: 0.842 at the 5% significance level.
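A cross-check outside SPSS may help here; a minimal sketch in Python with scipy, assuming scipy's `shapiro` (which implements Royston's approximation) agrees with SPSS, and assuming my reading of the run-together data list is correct:

```python
from scipy import stats

# Juan's 10 data points, as parsed from the run-together list in the
# post (an assumption: one decimal place per value, sorted ascending).
data = [56.0, 360.9, 391.5, 395.8, 476.0, 477.0,
        481.0, 790.0, 835.0, 988.0]

w, p = stats.shapiro(data)
print(f"W = {w:.3f}, p = {p:.3f}")  # SPSS reports W = 0.925, p = 0.402

# Published Shapiro-Wilk tables list CRITICAL VALUES of W, not
# p-values; 0.842 matches the 5% critical value for n = 10. The
# decision rule is: reject normality if W falls BELOW the critical
# value. Since 0.925 > 0.842, do not reject -- consistent with
# p = 0.402 > 0.05.
```

In other words, the 0.842 from the table is not a p-value at all but the cutoff for W at the 5% level, which would explain the apparent discrepancy.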
My questions are:
Why is this value (p-value) so different?
Is there a way to configure SPSS to use a 5% significance level?
Thank you very much for your support!
------------------------------
Juan Quik
------------------------------