
Statistical notes for clinical researchers: effect size.

Hae-Young Kim

Abstract

Year:  2015        PMID: 26587420      PMCID: PMC4650530          DOI: 10.5395/rde.2015.40.4.328

Source DB:  PubMed          Journal:  Restor Dent Endod        ISSN: 2234-7658


In most clinical studies, the p value is the final result of data analysis. A small p value is interpreted as a significant difference between the experimental group and the control group. However, reporting the p value alone is not enough to convey the actual difference. The problem with the p value is that it depends on the sample size, n: even a trivial, meaningless difference can yield an extremely small p value when the sample size is large. To make up for this weak point, we need to report the 'effect size' as well as the p value. The effect size is a simple way to express the actual difference, independent of the sample size.

1. Reporting p value is not enough

In statistical testing we first set a null hypothesis and calculate a test statistic, such as a t value, under the assumption that the null hypothesis is true. Finally, a p value is obtained, which represents the probability of observing the current data by chance when the null hypothesis is true. In most scientific articles, conclusions are based on p values compared with the chosen alpha level, e.g., 0.05, and a p value smaller than the alpha level is interpreted as statistical significance. However, there are serious problems in relying on the p value alone. First, depending on the sample size, a wide range of p values can be obtained for the same size of difference, which can lead to contradictory conclusions: either statistically significant or insignificant. Examples 1 and 2 in Table 1 have the same trivial difference of 3 between before and after treatment, assuming that a clinically meaningful difference is 10. The two results are contradictory: statistically significant (p = 0.001, Example 2) or insignificant (p = 0.382, Example 1), depending on whether the sample size is large (n = 10,000) or small (n = 100). Moreover, as shown in Example 2, it is a serious problem that a clinically meaningless difference is concluded to be statistically significant. The treatment in Example 2 is clinically insignificant but statistically significant! What would you reasonably conclude in this case? This is a problem caused by using an inappropriately large sample size.
Table 1

Examples of results of significance testing using the p value, with comparative effect sizes

Example | Before | After | SD* | Diff. | n | t value | p value | Effect size | Characteristics
1 | 145 | 142 | 100 | 3 | 100 | 0.3 = 3/(100/√100) | 0.382 | 0.03 = 3/100 | Trivial effect & insignificant
2 | 145 | 142 | 100 | 3 | 10,000 | 3 = 3/(100/√10,000) | 0.001 | 0.03 = 3/100 | Trivial effect & significant
3 | 145 | 115 | 100 | 30 | 100 | 3 = 30/(100/√100) | 0.001 | 0.3 = 30/100 | Substantial effect & significant

*SD, standard deviation.

Second, the information provided by the size of a p value is confusing, because it is confounded by the sample size. We might expect a small p value to tell us how much the observed data differ from the null hypothesis. However, the same p value can be obtained from quite different situations: Example 2, with a trivial effect and a larger sample size, and Example 3, with a substantial effect and a smaller sample size, both show the same p value of 0.001 in Table 1. This shows that p values are confounded with the sample size. Both problems above can be overcome by controlling the sample size. To avoid this discordant situation, a sample size determination procedure must be performed at the design stage of an experimental study: we generally need to calculate an appropriate sample size in consideration of the expected difference, SD, alpha error, and power. The conclusion of a significance test is reliable only when an appropriate sample size was applied. When we analyze survey data with a large sample size, we need to consider the effect of the large sample size when interpreting the test results. The weakness of the p value can also be compensated for by reporting the effect size alongside it. As shown in Table 1, effect sizes exactly reflect the magnitude of the actual effect: 0.03 for a trivial difference and 0.3 for a substantial one.
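The dependence of the p value on n, and the independence of the effect size from it, can be sketched in a few lines of Python. This reproduces the numbers in Table 1 using the t statistic and a normal approximation to its distribution (accurate at these sample sizes); the function names are my own.

```python
import math

def one_sided_p(t: float) -> float:
    """One-sided p value for a t statistic, using the normal
    approximation to the t distribution (fine for large df)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def summarize(diff: float, sd: float, n: int):
    """Return (t, p, effect size) for a mean difference `diff`."""
    t = diff / (sd / math.sqrt(n))  # t = Diff / (SD / sqrt(n))
    d = diff / sd                   # effect size = Diff / SD
    return t, one_sided_p(t), d

# The three examples of Table 1: same SD = 100, different diff and n
for diff, n in [(3, 100), (3, 10_000), (30, 100)]:
    t, p, d = summarize(diff, 100, n)
    print(f"diff={diff:>2}, n={n:>6}: t={t:.1f}, p={p:.3f}, effect size={d:.2f}")
```

Running it reproduces Table 1: the same difference of 3 yields p = 0.382 at n = 100 but p = 0.001 at n = 10,000, while the effect size stays 0.03 in both cases.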

2. What is effect size?

'Effect size' is simply a way of quantifying the difference between compared groups, in other words, the actual effect.1 While the p value has an important meaning in statistical inference, the effect size expresses a descriptive importance. In Table 1, the effect sizes were expressed as the difference between two group means divided by the standard deviation of the group. Comparing Example 2 and Example 3, their effect sizes are quite different, 0.03 versus 0.3, while their p values are the same. Suppose clinicians generally think a change of at least '10' is clinically meaningful, while a change of 3 after treatment is negligible. They would not apply the treatment for the small change of 3, even though the significance test concluded the treatment is effective based on a highly significant p value. Conversely, they would apply the treatment in Example 3, because they can expect a substantial change of '30' and the statistical test confirmed its significance. These results show that the effect size exactly reflects the actual difference or effect. Therefore, reporting both the p value and the effect size is necessary in order to consider both statistical significance and actual clinical significance.

3. Types of effect size

Generally, there are two common types of effect size indices: standardized differences between groups and measures of association. Table 2 shows the types of effect size indices and the general standards of small, medium, and large effects for each type.
Table 2

Common effect size indices2

Index | Description | Standard | Comment

Between groups:
Cohen's d or Glass's Δ | d or Δ = (Mean1 - Mean2) / SD* (d: use pooled SD; Δ: use SD of control group) | Small 0.2, Medium 0.5, Large 0.8, Very large 1.3 | For continuous outcomes
Odds ratio (OR) | OR = odds1 / odds2 | Small 1.5, Medium 2, Large 3 | Degree of association between binary outcomes
Relative risk or risk ratio (RR) | RR = p1 / p2 | Small 2, Medium 3, Large 4 | For binary outcomes, ratio of two proportions

Measures of association:
Pearson's r correlation | Range -1 to 1 | Small ±0.2, Medium ±0.3, Large ±0.5 | Measures the degree of linear relationship
Coefficient of determination (r²) | Range 0 to 1 | Small 0.04, Medium 0.09, Large 0.25 | Proportion of variance explained

*SD, standard deviation.

Between groups
1) Cohen's d or Glass's Δ: defined as the difference between two group means divided by a standard deviation, for continuous outcomes. Cohen's d divides by the pooled standard deviation, under the assumption of equal variances, while Glass's Δ divides by the standard deviation of the control group.
2) Odds ratio (OR): defined as the ratio of the odds of the two compared groups, for binary outcomes.
3) Relative risk (RR): defined as the ratio of the proportions of the two compared groups, for binary outcomes.

Measures of association
1) Pearson's r correlation: an effect size representing the association of two variables.
2) Coefficient of determination (r²): the proportion of variation explained.
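As a minimal sketch, the between-group indices above can be computed directly from summary statistics. The function names and the binary-outcome proportions are hypothetical, chosen only for illustration:

```python
import math

def cohens_d(m1, m2, sd1, sd2, n1, n2):
    """Cohen's d: mean difference over the pooled SD (assumes equal variances)."""
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

def glass_delta(m1, m2, sd_control):
    """Glass's delta: mean difference over the control group's SD."""
    return (m1 - m2) / sd_control

def odds_ratio(p1, p2):
    """OR for binary outcomes: odds1 / odds2."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

def relative_risk(p1, p2):
    """RR for binary outcomes: ratio of two proportions."""
    return p1 / p2

# Means and SD of Example 3 in Table 1 (equal SDs of 100, hypothetical n = 100 each)
print(cohens_d(145, 115, 100, 100, 100, 100))  # 0.3
# Hypothetical event proportions 0.4 vs 0.25
print(round(odds_ratio(0.4, 0.25), 2))         # 2.0
```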

4. Interpretation of effect size

Then, how should we interpret the degree of an effect size? An effect size is exactly equivalent to a Z score of a standard normal distribution, assuming the data are normally distributed. If Cohen's d is calculated to be zero, there is no mean difference between the two groups: the mean of the experimental group is exactly the same as the mean of the control group, so 50% of observations in the control group lie below the mean of the experimental group (Table 3). A relatively 'small' effect size of 0.2 means the mean of the experimental group is located 0.2 standard deviations above the mean of the control group; a Z score of 0.2 corresponds to the 58th percentile, i.e., 58% of control-group observations lie below it (Figure 1). Similarly, Cohen's d values of 0.5 and 0.8 place the experimental mean at the 69th and 79th percentiles of the control group's distribution, respectively.
Table 3

Interpretation of Cohen's d, which represents a standardized difference [(Mean1 - Mean2) / SD*]1,3

Relative size | Effect size | % of control group below the mean of experimental group
- | 0.0 | 50%
Small | 0.2 | 58%
Medium | 0.5 | 69%
Large | 0.8 | 79%
- | 1.4 | 92%

*SD, standard deviation.
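The percentile interpretation in Table 3 is simply the standard normal CDF evaluated at d. A minimal sketch, with the CDF computed from the error function (the function name is illustrative):

```python
import math

def pct_control_below(d: float) -> float:
    """Percent of control-group observations lying below the experimental
    mean: the standard normal CDF at d, times 100."""
    return 100 * 0.5 * (1 + math.erf(d / math.sqrt(2)))

# Reproduces the percentages of Table 3: 50, 58, 69, 79, 92%
for d in [0.0, 0.2, 0.5, 0.8, 1.4]:
    print(f"d = {d}: {pct_control_below(d):.0f}%")
```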

Figure 1

Distribution of control group (solid line) and experimental group (dotted line), and position of Cohen's d = 0.2.1

5. Conversion of effect sizes to Pearson r correlation coefficient

The Pearson r correlation coefficient is an effect size that is widely understood and frequently used. Converting various statistics, such as t or F values, into the Pearson r correlation coefficient can be advantageous because it facilitates interpretation. Cohen's d can also be converted into r. Table 4 provides the conversion formulas with brief explanations.
Table 4

Conversion from various statistics to Pearson r correlation coefficient association measures3

Statistic | Conversion formula | Comment
χ² (df = 1) | r = √(χ²(df=1) / N) | A single degree of freedom chi-square value divided by the number of cases
t | r = √(t² / (t² + df)) | From a t value to the r correlation coefficient
F (df numerator = 1) | r = √(F / (F + df_error)) | From an F value with a single-df numerator to r
Cohen's d | r = d / √(d² + 4) | From Cohen's d to r
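The conversion formulas in Table 4 translate directly into code. A small sketch (function names are my own):

```python
import math

def r_from_chi2(chi2: float, n: int) -> float:
    """r from a single-df chi-square: sqrt(chi2 / N)."""
    return math.sqrt(chi2 / n)

def r_from_t(t: float, df: int) -> float:
    """r from a t statistic: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t**2 / (t**2 + df))

def r_from_f(f: float, df_error: int) -> float:
    """r from an F statistic with a single-df numerator: sqrt(F / (F + df_error))."""
    return math.sqrt(f / (f + df_error))

def r_from_d(d: float) -> float:
    """r from Cohen's d: d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d**2 + 4)

print(round(r_from_d(0.8), 2))  # 0.37
# For a single-df numerator, F = t^2, so the two conversions agree:
print(abs(r_from_f(9.0, 100) - r_from_t(3.0, 100)) < 1e-12)  # True
```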

6. Summary

Though p values convey information on statistical significance, they are confounded with the sample size. The effect size makes up for this weak point by providing information on the actual effect, independent of the sample size. Therefore, reporting the effect size as well as the p value is recommended.
References

1. Sullivan GM, Feinn R. Using effect size - or why the p value is not enough. J Grad Med Educ 2012;4(3):279-282.
