Normality tests for statistical analysis: a guide for non-statisticians.

Abstract

Statistical errors are common in the scientific literature, and about 50% of published articles contain at least one error. The assumption of normality needs to be checked for many statistical procedures, namely parametric tests, because their validity depends on it. The aim of this commentary is to give an overview of checking for normality in statistical analysis using SPSS.

Keywords: Normality; Statistical Analysis

Year: 2012 PMID: 23843808 PMCID: PMC3693611 DOI: 10.5812/ijem.3505

Source DB: PubMed Journal: Int J Endocrinol Metab ISSN: 1726-913X

1. Background

Statistical errors are common in the scientific literature, and about 50% of published articles contain at least one error (1). Many statistical procedures, including correlation, regression, t tests, and analysis of variance (that is, parametric tests), are based on the assumption that the data follow a normal, or Gaussian, distribution (after Johann Carl Friedrich Gauss, 1777–1855); that is, it is assumed that the populations from which the samples are taken are normally distributed (2-5). The assumption of normality is especially critical when constructing reference intervals for variables (6). Normality and other assumptions should be taken seriously, for when these assumptions do not hold, it is impossible to draw accurate and reliable conclusions about reality (2, 7). With large enough sample sizes (> 30 or 40), violation of the normality assumption should not cause major problems (4); this implies that we can use parametric procedures even when the data are not normally distributed (8). If we have samples consisting of hundreds of observations, we can ignore the distribution of the data (3). According to the central limit theorem, (a) if the sample data are approximately normal, then the sampling distribution will also be normal; (b) in large samples (> 30 or 40), the sampling distribution tends to be normal regardless of the shape of the data (2, 8); and (c) the means of random samples from any distribution will themselves have an approximately normal distribution (3). Although true normality is considered to be a myth (8), we can look for normality visually by using normal plots (2, 3) or by significance tests, that is, by comparing the sample distribution to a normal one (2, 3). It is important to ascertain whether data show a serious deviation from normality (8). The purpose of this report is to give an overview of the procedures for checking normality in statistical analysis using SPSS.
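The central limit theorem statement above can be illustrated with a small simulation. The sketch below is not part of the original commentary; it assumes an exponential population (chosen only because it is strongly right-skewed), and the function names, seed, and number of replicates are arbitrary choices for illustration. It draws repeated samples of a given size and reports the skewness of the resulting sample means, which should shrink towards zero as the sample size grows.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def simulate_sample_means(sample_size, n_samples=5000, seed=1):
    """Draw repeated samples from an exponential population; return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


if __name__ == "__main__":
    # The population itself is skewed; the means become more symmetric as n grows.
    for n in (2, 10, 40):
        means = simulate_sample_means(n)
        print(f"n = {n:>2}: skewness of sample means = {sample_skewness(means):.2f}")
```

The simulation only shows the tendency described in point (b); it says nothing about how large a sample is "large enough" for a particular data set.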

2. Visual Methods

Visual inspection of the distribution may be used for assessing normality, although this approach is usually unreliable and does not guarantee that the distribution is normal (2, 3, 7). However, when data are presented visually, readers of an article can judge the distribution assumption by themselves (9). The frequency distribution (histogram), stem-and-leaf plot, boxplot, P-P plot (probability-probability plot), and Q-Q plot (quantile-quantile plot) are used for checking normality visually (2). The frequency distribution, which plots the observed values against their frequency, provides both a visual judgment about whether the distribution is bell shaped and insights about gaps in the data and outlying values (10). The stem-and-leaf plot is a method similar to the histogram, although it retains information about the actual data values (8). The P-P plot plots the cumulative probability of a variable against the cumulative probability of a particular distribution (e.g., normal distribution). After the data are ranked and sorted, the z-score corresponding to each rank is calculated; this is the expected value that the score should have in a normal distribution. The scores themselves are then converted to z-scores as $z = (x - \bar{x}) / s$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation. The actual z-scores are plotted against the expected z-scores. If the data are normally distributed, the result would be a straight diagonal line (2). A Q-Q plot is very similar to the P-P plot except that it plots the quantiles (values that split a data set into equal portions) of the data set instead of every individual score in the data. Moreover, Q-Q plots are easier to interpret in the case of large sample sizes (2). The boxplot shows the median as a horizontal line inside the box and the interquartile range (the range between the 25th and 75th percentiles) as the length of the box. The whiskers (lines extending from the top and bottom of the box) represent the minimum and maximum values when they are within 1.5 times the interquartile range from either end of the box (10). Scores that lie more than 1.5 times the interquartile range beyond either end of the box fall outside the whiskers and are considered outliers, and those more than 3 times the interquartile range beyond the box are extreme outliers. A boxplot that is symmetric, with the median line at approximately the center of the box and with symmetric whiskers that are slightly longer than the subsections of the center box, suggests that the data may have come from a normal distribution (8).
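The plotting steps described above can be sketched in a few lines of code. This example is not from the original commentary and does not reproduce SPSS output: the plotting position (rank - 0.5) / n and the quartile method used by Python's `statistics.quantiles` are assumptions, and SPSS may use different conventions. The function names are hypothetical. The first function returns the expected and observed z-scores for a normal probability plot; the others apply the 1.5 and 3 times interquartile range rules for outliers.

```python
from statistics import NormalDist, fmean, quantiles, stdev


def normal_plot_points(data):
    """Return (expected z, observed z) pairs for a normal probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    mean, sd = fmean(ordered), stdev(ordered)
    standard_normal = NormalDist()
    points = []
    for rank, value in enumerate(ordered, start=1):
        # Assumed plotting position; other conventions exist.
        expected_z = standard_normal.inv_cdf((rank - 0.5) / n)
        observed_z = (value - mean) / sd  # z = (x - mean) / s
        points.append((expected_z, observed_z))
    return points


def boxplot_fences(data):
    """Return the 1.5 x IQR (inner) and 3 x IQR (outer) limits around the box."""
    q1, _median, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify(value, fences):
    """Label a score relative to the boxplot limits."""
    low_outer, high_outer = fences["outer"]
    low_inner, high_inner = fences["inner"]
    if value < low_outer or value > high_outer:
        return "extreme outlier"
    if value < low_inner or value > high_inner:
        return "outlier"
    return "within whiskers"


if __name__ == "__main__":
    scores = list(range(1, 12))  # hypothetical data, for illustration only
    fences = boxplot_fences(scores)
    print(fences)
    print(classify(20, fences), "/", classify(30, fences))
    for expected, observed in normal_plot_points(scores):
        print(f"{expected:6.2f} {observed:6.2f}")
```

If the points returned by `normal_plot_points` lie close to the diagonal, the data are roughly consistent with a normal distribution; as noted above, this visual judgment is not a formal test.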

3. Normality Tests

The normality tests are supplementary to the graphical assessment of normality (8). The main tests for the assessment of normality are the Kolmogorov-Smirnov (K-S) test (7), Lilliefors corrected K-S test (7, 10), Shapiro-Wilk test (7, 10), Anderson-Darling test (7), Cramer-von Mises test (7), D’Agostino skewness test (7), Anscombe-Glynn kurtosis test (7), D’Agostino-Pearson omnibus test (7), and the Jarque-Bera test (7). Among these, the K-S test is widely used (11), and the K-S and Shapiro-Wilk tests can be conducted in the SPSS Explore procedure (Analyze → Descriptive Statistics → Explore → Plots → Normality plots with tests) (8). The tests mentioned above compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation; the null hypothesis is that “sample distribution is normal.” If the test is significant, the distribution is non-normal. For small sample sizes, normality tests have little power to reject the null hypothesis and therefore small samples most often pass normality tests (7). For large sample sizes, significant results would be derived even in the case of a small deviation from normality (2, 7), although this small deviation will not affect the results of a parametric test (7). The K-S test is an empirical distribution function (EDF) test in which the theoretical cumulative distribution function of the test distribution is contrasted with the EDF of the data (7). A limitation of the K-S test is its high sensitivity to extreme values; the Lilliefors correction renders this test less conservative (10). It has been reported that the K-S test has low power and it should not be seriously considered for testing normality (11). Moreover, it is not recommended when parameters are estimated from the data, regardless of sample size (12). The Shapiro-Wilk test is based on the correlation between the data and the corresponding normal scores (10) and provides better power than the K-S test even after the Lilliefors correction (12). Power, that is, the ability to detect that a sample comes from a non-normal distribution, is the most frequent measure of the value of a test for normality (11). Some researchers recommend the Shapiro-Wilk test as the best choice for testing the normality of data (11).
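To make the EDF comparison concrete, the following sketch computes the K-S distance D between the empirical distribution function of a sample and a normal distribution whose mean and standard deviation are estimated from that same sample (the setting addressed by the Lilliefors correction). It is an illustration only, with a hypothetical function name and made-up data. It returns the statistic alone: the P value requires Lilliefors's critical values or simulation, which are not shown here, and the Shapiro-Wilk test is not implemented.

```python
from statistics import NormalDist, fmean, stdev


def ks_distance_to_fitted_normal(data):
    """Largest vertical gap between the sample EDF and a normal CDF fitted to it."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    distance = 0.0
    for i, value in enumerate(ordered):
        cdf = fitted.cdf(value)
        # The EDF steps from i/n to (i + 1)/n at this value; check both sides.
        distance = max(distance, cdf - i / n, (i + 1) / n - cdf)
    return distance


if __name__ == "__main__":
    # Hypothetical samples: one roughly symmetric, one strongly right-skewed.
    symmetric = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
    skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9, 30]
    print(f"D (symmetric) = {ks_distance_to_fitted_normal(symmetric):.3f}")
    print(f"D (skewed)    = {ks_distance_to_fitted_normal(skewed):.3f}")
```

A larger D indicates a larger departure of the sample from the fitted normal curve; whether it is significant depends on the sample size and on the correction applied.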

4. Testing Normality Using SPSS

We consider two examples from previously published data: serum magnesium levels in 12–16 year old girls (with normal distribution, n = 30) (13) and serum thyroid stimulating hormone (TSH) levels in adult control subjects (with non-normal distribution, n = 24) (14). SPSS provides the K-S (with Lilliefors correction) and the Shapiro-Wilk normality tests and recommends these tests only for a sample size of less than 50 (8). In the Figure, both the frequency distributions and the P-P plots show that serum magnesium data follow a normal distribution while serum TSH levels do not. Results of the K-S test with Lilliefors correction and the Shapiro-Wilk test for serum magnesium and TSH levels are shown in the Table. For serum magnesium concentrations, both tests have a P value greater than 0.05, which indicates a normal distribution of the data, while serum TSH concentrations are not normally distributed, as both P values are less than 0.05. Lack of symmetry (skewness) and pointiness (kurtosis) are the two main ways in which a distribution can deviate from normal. The values for these parameters should be zero in a normal distribution. These values can be converted to a z-score as follows: $Z_{\text{skewness}} = \text{Skewness} / SE_{\text{skewness}}$ and $Z_{\text{kurtosis}} = \text{Kurtosis} / SE_{\text{kurtosis}}$.

Figure. Histograms (Left) and P-P Plots (Right) for Serum Magnesium and TSH Levels

Table. Skewness, Kurtosis, and Normality Tests for Serum Magnesium and TSH Levels Provided by SPSS

Variable	No.	Mean ± SD	Mean ± SEM	Skewness	SE skewness	Z skewness	Kurtosis	SE kurtosis	Z kurtosis	K-S (Lilliefors) statistic	K-S (Lilliefors) Df	K-S (Lilliefors) P value	Shapiro-Wilk statistic	Shapiro-Wilk Df	Shapiro-Wilk P value
Serum magnesium, mg/dL	30	2.08 ± 0.175	2.08 ± 0.03	0.745	0.427	1.74	0.567	0.833	0.681	0.137	30	0.156	0.955	30	0.236
Serum TSH, mU/L	24	1.67 ± 1.53	1.67 ± 0.31	1.594	0.472	3.38	1.401	0.918	1.52	0.230	24	0.002	0.750	24	<0.001

Abbreviations: Df, Degree of freedom; K-S, Kolmogorov-Smirnov; SD, Standard deviation; SE, Standard error; SEM, Standard error of mean; TSH, Thyroid stimulating hormone

An absolute value of the z-score greater than 1.96 is significant at P < 0.05, greater than 2.58 is significant at P < 0.01, and greater than 3.29 is significant at P < 0.001. In small samples, an absolute value greater than 1.96 is sufficient to indicate a departure from normality. However, in large samples (200 or more) with small standard errors, this criterion should be changed to ± 2.58, and in very large samples no criterion should be applied (that is, significance tests of skewness and kurtosis should not be used) (2). Results presented in the Table indicate that parametric statistics should be used for serum magnesium data and non-parametric statistics should be used for serum TSH data.
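The z-scores in the Table can be checked by hand with a short calculation. The sketch below is illustrative and not taken from the original article. It uses the standard-error formulas commonly given for sample skewness and excess kurtosis; that SPSS uses these formulas is an assumption here, although they reproduce the standard errors listed in the Table for n = 30 and n = 24. The function names are hypothetical, and the cut-offs are the ones quoted in the text.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for sample size n (assumed formula)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis for sample size n (assumed formula)."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def z_score(statistic, standard_error):
    """Skewness or kurtosis divided by its standard error."""
    return statistic / standard_error


def significance(z):
    """Apply the 1.96 / 2.58 / 3.29 cut-offs to the absolute z-score."""
    magnitude = abs(z)
    if magnitude > 3.29:
        return "P < 0.001"
    if magnitude > 2.58:
        return "P < 0.01"
    if magnitude > 1.96:
        return "P < 0.05"
    return "not significant"


if __name__ == "__main__":
    # (label, n, skewness, kurtosis) taken from the Table.
    rows = [("Serum magnesium", 30, 0.745, 0.567), ("Serum TSH", 24, 1.594, 1.401)]
    for label, n, skewness, kurtosis in rows:
        z_skew = z_score(skewness, se_skewness(n))
        z_kurt = z_score(kurtosis, se_kurtosis(n))
        print(f"{label}: Z skewness = {z_skew:.2f} ({significance(z_skew)}), "
              f"Z kurtosis = {z_kurt:.2f} ({significance(z_kurt)})")
```

Because the inputs are the rounded values printed in the Table, the recomputed z-scores may differ from the tabulated ones in the last digit.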

5. Conclusions

According to the available literature, the normality assumption should be assessed before parametric statistical tests are used. It seems that the most popular test for normality, that is, the K-S test, should no longer be used owing to its low power. It is preferable that normality be assessed both visually and through normality tests, of which the Shapiro-Wilk test, provided by the SPSS software, is highly recommended. The normality assumption also needs to be considered when validating data presented in the literature, as it shows whether the correct statistical tests have been used.

7 in total

1. Driscoll P, Lecky F, Crosby M. An introduction to everyday statistics--1. J Accid Emerg Med. 2000-05.

2. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Am J Physiol Endocrinol Metab. 2004-08.

3. Royston P. Estimating departure from normality. Stat Med. 1991-08.

4. Altman DG, Bland JM. Detecting skewness from summary information. BMJ. 1996-11-09.

5. Altman DG, Bland JM. Statistics notes: the normal distribution. BMJ. 1995-02-04.

6. Ghasemi A, Syedmoradi L, Zahediasl S, Azizi F. Pediatric reference values for serum magnesium levels in Iranian subjects. Scand J Clin Lab Invest. 2010-10.

7. Zahedi Asl S, Khalili Brojeni N, Ghasemi A, Faraji F, Hedayati M, Azizi F. Alterations in osmotic fragility of the red blood cells in hypo- and hyperthyroid patients. J Endocrinol Invest. 2009-01.
