
Statistical notes for clinical researchers: effect size.

Hae-Young Kim

Abstract

Year:  2015        PMID: 26587420      PMCID: PMC4650530          DOI: 10.5395/rde.2015.40.4.328

Source DB:  PubMed          Journal:  Restor Dent Endod        ISSN: 2234-7658


In most clinical studies, the p value is the final result of data analysis. A small p value is interpreted as a significant difference between the experimental group and the control group. However, reporting the p value alone is not enough to convey the actual difference. The problem with the p value is that it depends on the sample size, n: even a trivial, meaningless difference can yield an extremely small p value when the sample size is large. To make up for this weak point, we need to report the 'effect size' as well as the p value. The effect size is a simple way to express the actual difference, independent of the sample size.

1. Reporting p value is not enough

In statistical testing we first set a null hypothesis and calculate a test statistic, such as a t value, under the assumption that the null hypothesis is true. Finally, a p value is obtained, which represents the probability of observing the current data by chance when the null hypothesis is true. In most scientific articles, conclusions are based on p values compared with the chosen alpha level, e.g., 0.05, and a p value smaller than the alpha level is interpreted as statistical significance. However, there are serious problems in relying on the p value alone. First, depending on the sample size, a wide range of p values can be obtained for the same size of difference, which can lead to contradictory conclusions: either statistically significant or insignificant. Examples 1 and 2 in Table 1 have the same trivial difference of 3 between before and after treatment, assuming that a clinically meaningful difference is 10. The two results are contradictory: statistically significant (p = 0.001, Example 2) or insignificant (p = 0.382, Example 1), depending on whether the sample size is large (n = 10,000) or small (n = 100). Moreover, as shown in Example 2, it is a serious problem that a clinically meaningless difference is concluded to be statistically significant. The treatment in Example 2 is clinically insignificant but statistically significant! What would you reasonably conclude in this case? This is a problem caused by using an inappropriately large sample size.
Table 1

Examples of results of significance testing using the p value, with comparative effect sizes

Example | Before | After | SD* | Diff. | n | t value | p value | Effect size | Characteristics
1 | 145 | 142 | 100 | 3 | 100 | 0.3 = 3/(100/√100) | 0.382 | 0.03 = 3/100 | Trivial effect & insignificant
2 | 145 | 142 | 100 | 3 | 10,000 | 3 = 3/(100/√10,000) | 0.001 | 0.03 = 3/100 | Trivial effect & significant
3 | 145 | 115 | 100 | 30 | 100 | 3 = 30/(100/√100) | 0.001 | 0.3 = 30/100 | Substantial effect & significant

*SD, standard deviation.

Second, the information provided by the size of a p value is confusing, because it is confounded by the sample size. We might expect a small p value to tell us how much the observed data differ from the null hypothesis. However, the same p value can be obtained from quite different situations: Example 2, with a trivial effect and a larger sample size, and Example 3, with a substantial effect and a smaller sample size, both show the same p value of 0.001 in Table 1. This shows that p values are confounded with the sample size. Both problems above can be overcome by controlling the sample size. To avoid this discordant situation, a sample size determination procedure must be performed at the design stage of an experimental study: we generally need to calculate an appropriate sample size in consideration of the expected difference, SD, alpha error, and power. The conclusion of a significance test is reliable only when an appropriate sample size was applied. When we analyze survey data with a large sample size, we need to consider the effect of the large sample size when interpreting the test results. The weakness of the p value can also be compensated for by reporting the effect size alongside it. As shown in Table 1, effect sizes exactly reflect the magnitude of the actual effect: 0.03 for a trivial difference and 0.3 for a substantial one.
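The dependence of the p value on n, and the independence of the effect size from it, can be sketched in a few lines of Python. This reproduces the numbers in Table 1 using the t statistic and a normal approximation to its distribution (accurate at these sample sizes); the function names are my own.

```python
import math

def one_sided_p(t: float) -> float:
    """One-sided p value for a t statistic, using the normal
    approximation to the t distribution (fine for large df)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def summarize(diff: float, sd: float, n: int):
    """Return (t, p, effect size) for a mean difference `diff`."""
    t = diff / (sd / math.sqrt(n))  # t = Diff / (SD / sqrt(n))
    d = diff / sd                   # effect size = Diff / SD
    return t, one_sided_p(t), d

# The three examples of Table 1: same SD = 100, different diff and n
for diff, n in [(3, 100), (3, 10_000), (30, 100)]:
    t, p, d = summarize(diff, 100, n)
    print(f"diff={diff:>2}, n={n:>6}: t={t:.1f}, p={p:.3f}, effect size={d:.2f}")
```

Running it reproduces Table 1: the same difference of 3 yields p = 0.382 at n = 100 but p = 0.001 at n = 10,000, while the effect size stays 0.03 in both cases.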

2. What is effect size?

'Effect size' is simply a way of quantifying the difference between compared groups, in other words, the actual effect.1 While the p value has an important meaning in statistical inference, the effect size expresses a descriptive importance. In Table 1, the effect sizes were expressed as the difference between two group means divided by the standard deviation of the group. Comparing Example 2 and Example 3, their effect sizes are quite different, 0.03 versus 0.3, while their p values are the same. Suppose clinicians generally think a change of at least '10' is clinically meaningful, while a change of 3 after treatment is negligible. They would not apply the treatment for the small change of 3, even though the significance test concluded the treatment is effective based on a highly significant p value. Conversely, they would apply the treatment in Example 3, because they can expect a substantial change of '30' and the statistical test confirmed its significance. These results show that the effect size exactly reflects the actual difference or effect. Therefore, reporting both the p value and the effect size is necessary in order to consider both statistical significance and actual clinical significance.

3. Types of effect size

Generally, there are two common types of effect size indices: standardized differences between groups and measures of association. Table 2 shows the types of effect size indices and the general standards of small, medium, and large effects for each type.
Table 2

Common effect size indices2

Index | Description | Standard | Comment

Between groups:
Cohen's d or Glass's Δ | d or Δ = (Mean1 - Mean2) / SD* (d: use pooled SD; Δ: use SD of control group) | Small 0.2, Medium 0.5, Large 0.8, Very large 1.3 | For continuous outcomes
Odds ratio (OR) | OR = odds1 / odds2 | Small 1.5, Medium 2, Large 3 | Degree of association between binary outcomes
Relative risk or risk ratio (RR) | RR = p1 / p2 | Small 2, Medium 3, Large 4 | For binary outcomes, ratio of two proportions

Measures of association:
Pearson's r correlation | Range -1 to 1 | Small ±0.2, Medium ±0.3, Large ±0.5 | Measures the degree of linear relationship
Coefficient of determination (r²) | Range 0 to 1 | Small 0.04, Medium 0.09, Large 0.25 | Proportion of variance explained

*SD, standard deviation.

Between groups
1) Cohen's d or Glass's Δ: defined as the difference between two group means divided by a standard deviation, for continuous outcomes. Cohen's d divides by the pooled standard deviation, under the assumption of equal variances, while Glass's Δ divides by the standard deviation of the control group.
2) Odds ratio (OR): defined as the ratio of the odds of the two compared groups, for binary outcomes.
3) Relative risk (RR): defined as the ratio of the proportions of the two compared groups, for binary outcomes.

Measures of association
1) Pearson's r correlation: an effect size representing the association of two variables.
2) Coefficient of determination (r²): the proportion of variation explained.
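As a minimal sketch, the between-group indices above can be computed directly from summary statistics. The function names and the binary-outcome proportions are hypothetical, chosen only for illustration:

```python
import math

def cohens_d(m1, m2, sd1, sd2, n1, n2):
    """Cohen's d: mean difference over the pooled SD (assumes equal variances)."""
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

def glass_delta(m1, m2, sd_control):
    """Glass's delta: mean difference over the control group's SD."""
    return (m1 - m2) / sd_control

def odds_ratio(p1, p2):
    """OR for binary outcomes: odds1 / odds2."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

def relative_risk(p1, p2):
    """RR for binary outcomes: ratio of two proportions."""
    return p1 / p2

# Means and SD of Example 3 in Table 1 (equal SDs of 100, hypothetical n = 100 each)
print(cohens_d(145, 115, 100, 100, 100, 100))  # 0.3
# Hypothetical event proportions 0.4 vs 0.25
print(round(odds_ratio(0.4, 0.25), 2))         # 2.0
```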

4. Interpretation of effect size

Then, how should we interpret the degree of an effect size? An effect size is exactly equivalent to a Z score of a standard normal distribution, assuming the data are normally distributed. If Cohen's d is calculated to be zero, there is no mean difference between the two groups: the mean of the experimental group is exactly the same as the mean of the control group, so 50% of observations in the control group lie below the mean of the experimental group (Table 3). A relatively 'small' effect size of 0.2 means the mean of the experimental group is located 0.2 standard deviations above the mean of the control group; a Z score of 0.2 corresponds to the 58th percentile, i.e., 58% of control-group observations lie below it (Figure 1). Similarly, Cohen's d values of 0.5 and 0.8 place the experimental mean at the 69th and 79th percentiles of the control group's distribution, respectively.
Table 3

Interpretation of Cohen's d, which represents a standardized difference [(Mean1 - Mean2) / SD*]1,3

Relative size | Effect size | % of control group below the mean of experimental group
- | 0.0 | 50%
Small | 0.2 | 58%
Medium | 0.5 | 69%
Large | 0.8 | 79%
- | 1.4 | 92%

*SD, standard deviation.
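The percentile interpretation in Table 3 is simply the standard normal CDF evaluated at d. A minimal sketch, with the CDF computed from the error function (the function name is illustrative):

```python
import math

def pct_control_below(d: float) -> float:
    """Percent of control-group observations lying below the experimental
    mean: the standard normal CDF at d, times 100."""
    return 100 * 0.5 * (1 + math.erf(d / math.sqrt(2)))

# Reproduces the percentages of Table 3: 50, 58, 69, 79, 92%
for d in [0.0, 0.2, 0.5, 0.8, 1.4]:
    print(f"d = {d}: {pct_control_below(d):.0f}%")
```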

Figure 1

Distribution of control group (solid line) and experimental group (dotted line), and position of Cohen's d = 0.2.1

5. Conversion of effect sizes to Pearson r correlation coefficient

The Pearson r correlation coefficient is an effect size that is widely understood and frequently used. Converting various statistics, such as t or F values, into the Pearson r correlation coefficient can be advantageous because it facilitates interpretation. Cohen's d can also be converted into r. Table 4 provides the conversion formulas with brief explanations.
Table 4

Conversion from various statistics to Pearson r correlation coefficient association measures3

Statistic | Conversion formula | Comment
χ² (df = 1) | r = √(χ²(df=1) / N) | A single degree of freedom chi-square value divided by the number of cases
t | r = √(t² / (t² + df)) | From a t value to the r correlation coefficient
F (df numerator = 1) | r = √(F / (F + df_error)) | From an F value with a single-df numerator to r
Cohen's d | r = d / √(d² + 4) | From Cohen's d to r
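The conversion formulas in Table 4 translate directly into code. A small sketch (function names are my own):

```python
import math

def r_from_chi2(chi2: float, n: int) -> float:
    """r from a single-df chi-square: sqrt(chi2 / N)."""
    return math.sqrt(chi2 / n)

def r_from_t(t: float, df: int) -> float:
    """r from a t statistic: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t**2 / (t**2 + df))

def r_from_f(f: float, df_error: int) -> float:
    """r from an F statistic with a single-df numerator: sqrt(F / (F + df_error))."""
    return math.sqrt(f / (f + df_error))

def r_from_d(d: float) -> float:
    """r from Cohen's d: d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d**2 + 4)

print(round(r_from_d(0.8), 2))  # 0.37
# For a single-df numerator, F = t^2, so the two conversions agree:
print(abs(r_from_f(9.0, 100) - r_from_t(3.0, 100)) < 1e-12)  # True
```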

6. Summary

Though p values convey information on statistical significance, they are confounded with the sample size. The effect size makes up for this weak point by providing information on the actual effect, independent of the sample size. Therefore, reporting the effect size as well as the p value is recommended.
References

1. Sullivan GM, Feinn R. Using effect size - or why the p value is not enough. J Grad Med Educ 2012;4(3):279-282.
