Joseph F Mudge, Leanne F Baker, Christopher B Edge, Jeff E Houlahan.
Abstract
Null hypothesis significance testing has been under attack in recent years, partly owing to the arbitrary nature of setting α (the decision-making threshold and probability of Type I error) at a constant value, usually 0.05. If the goal of null hypothesis testing is to present conclusions in which we have the highest possible confidence, then the only logical decision-making threshold is the value that minimizes the probability (or occasionally, cost) of making errors. Setting α to minimize the combination of Type I and Type II error at a critical effect size can easily be accomplished for traditional statistical tests by calculating the α associated with the minimum average of α and β at the critical effect size. This technique also has the flexibility to incorporate prior probabilities of null and alternate hypotheses and/or relative costs of Type I and Type II errors, if known. Using an optimal α results in stronger scientific inferences because it estimates and minimizes both Type I errors and relevant Type II errors for a test. It also results in greater transparency concerning assumptions about relevant effect size(s) and the relative costs of Type I and II errors. By contrast, the use of α = 0.05 results in arbitrary decisions about what effect sizes will likely be considered significant, if real, and results in arbitrary amounts of Type II error for meaningful potential effect sizes. We cannot identify a rationale for continuing to arbitrarily use α = 0.05 for null hypothesis significance tests in any field, when it is possible to determine an optimal α.
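The search for an optimal α described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: it swaps the t distribution for a normal approximation so that only the standard library is needed, it finds the minimum by a simple grid search, and the names `type_ii_error`, `optimal_alpha`, and `cost_ratio` are hypothetical. The cost weighting assumes the form (C·α + β)/(C + 1), where C is the Type I/Type II error cost ratio.

```python
from statistics import NormalDist

# Assumption: a standard normal stands in for the t distribution, so the
# numbers are approximate, especially for small samples.
STD_NORMAL = NormalDist()


def type_ii_error(alpha, effect_size, n1, n2):
    """Approximate beta for a two-tailed, two-sample test at a given alpha.

    effect_size is the critical effect size in units of the common SD.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the critical effect is real.
    shift = effect_size * (n1 * n2 / (n1 + n2)) ** 0.5
    power = (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)
    return 1 - power


def optimal_alpha(effect_size, n1, n2, cost_ratio=1.0, steps=10_000):
    """Grid-search the alpha that minimizes the (cost-weighted) average error.

    cost_ratio is the assumed Type I / Type II error cost ratio; 1.0 gives
    the plain average of alpha and beta.
    """
    def weighted_error(alpha):
        beta = type_ii_error(alpha, effect_size, n1, n2)
        return (cost_ratio * alpha + beta) / (cost_ratio + 1)

    candidates = (i / steps for i in range(1, steps))
    best = min(candidates, key=weighted_error)
    return best, type_ii_error(best, effect_size, n1, n2), weighted_error(best)


if __name__ == "__main__":
    for n in (3, 10, 30):
        alpha, beta, omega = optimal_alpha(effect_size=1.0, n1=n, n2=n)
        print(f"n = {n:2d}: alpha = {alpha:.3f}, beta = {beta:.3f}, omega = {omega:.3f}")
```

Because of the normal approximation, the values printed here should not be expected to match results computed with the exact t-based calculation; the sketch only shows the shape of the procedure: fix a critical effect size, compute β for each candidate α, and keep the α with the smallest combined error.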
Year: 2012 PMID: 22389720 PMCID: PMC3289673 DOI: 10.1371/journal.pone.0032734
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The non-linear relationship between α and β.
The relationship between α and β for an independent 2-sample, 2-tailed t-test with n1 = n2 = 10, and critical effect size = 1σ.
Figure 2. Determination of optimal α from the a priori combined probabilities of Type I and Type II error.
α and ω (the average of Type I and Type II error) for independent, 2-tailed, 2-sample t-tests (n1 = n2). Data are for 3 (dotted line), 10 (solid line), and 30 (double line) samples per group, with critical effect sizes of 1 SD of either group. Drop lines indicate the minimum average of Type I and Type II error and its associated value of α.
Figure 3. The average of the probabilities of Type I and Type II error, ω (a) and the cost-weighted probability of errors, ωc (b).
The combined probabilities of Type I and Type II error, ω (a), and the cost-weighted probability of errors, ωc (b). The α level at i) minimizes average error (assuming a Type I/Type II error cost ratio of 1), while the α level at ii) minimizes the cost-weighted probability of errors at a Type I/Type II error cost ratio of 4.
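The two quantities compared in this figure can be written out as follows. This is a sketch inferred from the caption and from the tabulated values later in the text, not a formula quoted from the source; the symbol C for the Type I/Type II error cost ratio is introduced here for illustration only.

```latex
\[
\omega = \frac{\alpha + \beta}{2},
\qquad
\omega_c = \frac{C\,\alpha + \beta}{C + 1}
\]
```

With C = 1 the two expressions coincide, which corresponds to case i). With C = 4 a Type I error counts four times as heavily as a Type II error, so the minimum shifts toward a smaller α, as in case ii).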
Probabilities of Type I (α), Type II (β) and average error (ω), with corresponding test conclusions for large, medium and small effect sizes (δ) using standard α levels and by setting α to minimize combined probabilities of Type I and Type II error.
| Critical Effect Size | Choice of α | α | β | ω | Result |
| large ( | Standard | 0.05 | 0.493 | 0.272 | non-significant |
| | Optimal | 0.191 | 0.212 | 0.202 | significant |
| medium ( | Standard | 0.05 | 0.738 | 0.394 | non-significant |
| | Optimal | 0.266 | 0.372 | 0.319 | significant |
| small ( | Standard | 0.05 | 0.898 | 0.474 | non-significant |
| | Optimal | 0.323 | 0.563 | 0.443 | significant |
p-value used for significance testing is 0.14 [10].
Probabilities are calculated for a two-sample t-test (two-tailed) with n1 = 3, n2 = 8, and σp = 17.27, from [10].
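The arithmetic behind this table can be checked with a few lines of Python. The sketch below recomputes ω as the plain average of α and β and derives each conclusion by comparing the reported p-value of 0.14 with α; the rule "significant when p < α" is the usual convention and is assumed here, and the helper names are hypothetical. The α, β, ω, and result values are copied from the table.

```python
P_VALUE = 0.14  # p-value reported for this example

# (effect size, choice of alpha, alpha, beta, reported omega, reported result)
TABLE_ROWS = [
    ("large", "Standard", 0.05, 0.493, 0.272, "non-significant"),
    ("large", "Optimal", 0.191, 0.212, 0.202, "significant"),
    ("medium", "Standard", 0.05, 0.738, 0.394, "non-significant"),
    ("medium", "Optimal", 0.266, 0.372, 0.319, "significant"),
    ("small", "Standard", 0.05, 0.898, 0.474, "non-significant"),
    ("small", "Optimal", 0.323, 0.563, 0.443, "significant"),
]


def average_error(alpha, beta):
    """Average of the Type I and Type II error probabilities."""
    return (alpha + beta) / 2


def conclusion(p_value, alpha):
    """Assumed decision rule: reject the null hypothesis when p < alpha."""
    return "significant" if p_value < alpha else "non-significant"


def check_rows(rows, p_value, tolerance=1e-3):
    """Return the rows whose reported omega or result disagrees with the recomputation."""
    mismatches = []
    for size, choice, alpha, beta, omega, result in rows:
        omega_ok = abs(average_error(alpha, beta) - omega) <= tolerance
        result_ok = conclusion(p_value, alpha) == result
        if not (omega_ok and result_ok):
            mismatches.append((size, choice))
    return mismatches


if __name__ == "__main__":
    print(check_rows(TABLE_ROWS, P_VALUE))
```

The tolerance of 0.001 allows for the three-decimal rounding of the reported values.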
Probabilities of Type I (α), Type II (β) and average error (ω), with corresponding test conclusions for large, medium and small effect sizes (δ) using standard α levels and by setting α to minimize combined probabilities of Type I and Type II error.
| Critical Effect Size | Choice of α | α | β | ω | Result |
| large (R² ≥ 0.75) | Standard | 0.05 | 7.37 × 10⁻¹¹ | 0.0250 | significant |
| | Optimal | 0.0000286 | 0.0000266 | 0.0000276 | non-significant |
| medium (R² ≥ 0.5) | Standard | 0.05 | 0.000136 | 0.0251 | significant |
| | Optimal | 0.00378 | 0.00384 | 0.00381 | significant |
| small (R² ≥ 0.25) | Standard | 0.05 | 0.0635 | 0.0568 | significant |
| | Optimal | 0.0531 | 0.0603 | 0.0567 | significant |
p-value used for significance testing is 0.0012 [11].
Probabilities are calculated for a simple linear regression with N = 43, from [11].
Probabilities of Type I (α), Type II (β), cost-weighted average error (ωc), and average error (ω), with corresponding test conclusions for Type I/Type II error cost ratios of 4, 1, and 0.25 using standard α levels and by setting α to minimize the cost-weighted average of probabilities of Type I and Type II error.
| Type I/Type II error cost ratio | Choice of α | α | β | ωc | ω | Result |
| 4 | Standard | 0.05 | 0.00213 | 0.0404 | 0.0261 | significant |
| | Optimal | 0.00658 | 0.0274 | 0.0107 | 0.0170 | non-significant |
| 1 | Standard | 0.05 | 0.00213 | 0.0261 | 0.0261 | significant |
| | Optimal | 0.0142 | 0.0121 | 0.0132 | 0.0132 | non-significant |
| 0.25 | Standard | 0.05 | 0.00213 | 0.0117 | 0.0261 | significant |
| | Optimal | 0.0282 | 0.00506 | 0.00967 | 0.0166 | significant |
p-value used for significance testing is 0.02495 [12].
Probabilities are calculated for a one-way ANOVA with N = 30, k = 3, σp (within groups) = 3.4, and critical effect size = σp (within groups), from [12].
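The cost-weighted column of this table appears to follow ωc = (C·α + β)/(C + 1), with C the Type I/Type II error cost ratio. That form is inferred from the tabulated numbers, not quoted from the source, and the function names below are hypothetical. The following Python sketch recomputes ωc and ω for every row and reports the largest gap from the reported values.

```python
# (cost ratio, choice of alpha, alpha, beta, reported omega_c, reported omega)
TABLE_ROWS = [
    (4.0, "Standard", 0.05, 0.00213, 0.0404, 0.0261),
    (4.0, "Optimal", 0.00658, 0.0274, 0.0107, 0.0170),
    (1.0, "Standard", 0.05, 0.00213, 0.0261, 0.0261),
    (1.0, "Optimal", 0.0142, 0.0121, 0.0132, 0.0132),
    (0.25, "Standard", 0.05, 0.00213, 0.0117, 0.0261),
    (0.25, "Optimal", 0.0282, 0.00506, 0.00967, 0.0166),
]


def cost_weighted_error(alpha, beta, cost_ratio):
    """Assumed form: Type I error weighted by the Type I/Type II cost ratio."""
    return (cost_ratio * alpha + beta) / (cost_ratio + 1)


def max_discrepancy(rows):
    """Largest absolute gap between recomputed and reported omega_c or omega."""
    worst = 0.0
    for ratio, _choice, alpha, beta, omega_c, omega in rows:
        worst = max(
            worst,
            abs(cost_weighted_error(alpha, beta, ratio) - omega_c),
            abs(cost_weighted_error(alpha, beta, 1.0) - omega),
        )
    return worst


if __name__ == "__main__":
    print(f"largest discrepancy: {max_discrepancy(TABLE_ROWS):.6f}")
```

A discrepancy below 0.0001 is consistent with rounding of the reported values to three significant figures.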