Joachim I Krueger, Patrick R Heck.
Abstract
Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
Keywords: Bayes’ theorem; NHST; null hypotheses; replicability; reverse inference; statistical significance testing
Year: 2017 PMID: 28649206 PMCID: PMC5465841 DOI: 10.3389/fpsyg.2017.00908
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
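The reverse inference discussed in the abstract, from p(D|H) to p(H|D), follows from Bayes' theorem. The sketch below is a minimal illustration and not the authors' simulation code: it assumes that p(H), p(D|H), and p(D|∼H) are drawn independently from uniform distributions, and the names `posterior`, `pearson_r`, and `simulate` are hypothetical.

```python
import random


def posterior(p_h, p_d_given_h, p_d_given_not_h):
    """Bayes' theorem: p(H|D) from the prior p(H) and the two likelihoods."""
    numerator = p_h * p_d_given_h
    evidence = numerator + (1.0 - p_h) * p_d_given_not_h
    return numerator / evidence


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_draws=10000, seed=1):
    """Correlate p(D|H) with p(H|D) over random draws.

    Assumption (not taken from the source): all three inputs are
    independent and uniformly distributed. The lower bounds keep the
    denominator of Bayes' theorem away from zero.
    """
    rng = random.Random(seed)
    p_values, posteriors = [], []
    for _ in range(n_draws):
        p_h = rng.uniform(0.01, 0.99)
        p_d_h = rng.uniform(0.001, 1.0)
        p_d_not_h = rng.uniform(0.001, 1.0)
        p_values.append(p_d_h)
        posteriors.append(posterior(p_h, p_d_h, p_d_not_h))
    return pearson_r(p_values, posteriors)
```

Calling `simulate()` returns the correlation between the sampled p-values and the resulting posteriors. Under these assumptions it should come out positive, because a larger p(D|H) raises p(H|D) when the prior and p(D|∼H) are held fixed.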
Crossed proportions of conditional probability terms (p < 0.05).
| | p(H\|D) ≤ 0.50 | p(H\|D) > 0.50 |
|---|---|---|
| p(D\|H) ≤ 0.05 | 8.080 | 1.937 |
| p(D\|H) > 0.05 | 42.030 | 47.953 |
| p(D\|H) ≤ 0.01 | 1.89 | 0.14 |
| p(D\|H) > 0.01 | 48.38 | 49.59 |
| p(D\|H) ≤ 0.001 | 0.22 | 0.00002 |
| p(D\|H) > 0.001 | 49.40 | 50.38 |
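The crossed proportions above can be condensed into the FA ratio, Miss ratio, and Phi statistics reported in the later tables. The sketch below is a worked example under assumed definitions, since the source does not spell them out: the FA ratio is taken as the share of significant results for which H is nonetheless probable, the Miss ratio as the share of nonsignificant results for which H is improbable, and Phi as the standard 2×2 association coefficient. The function name `summarize_2x2` is hypothetical.

```python
from math import sqrt


def summarize_2x2(a, b, c, d):
    """Summarize a 2x2 table of significance (rows) by posterior (columns).

    a: p <= alpha and p(H|D) <= 0.50  (H rejected, H improbable)
    b: p <= alpha and p(H|D) >  0.50  (H rejected, H probable: false alarm)
    c: p >  alpha and p(H|D) <= 0.50  (H retained, H improbable: miss)
    d: p >  alpha and p(H|D) >  0.50  (H retained, H probable)

    The two ratio definitions are assumptions made for this example.
    """
    fa_ratio = b / (a + b)        # false alarms among significant results
    miss_ratio = c / (c + d)      # misses among nonsignificant results
    phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return fa_ratio, miss_ratio, phi


# Cell percentages from the p <= 0.05 rows of the table above.
fa, miss, phi = summarize_2x2(8.080, 1.937, 42.030, 47.953)
print(f"FA ratio = {fa:.3f}, miss ratio = {miss:.3f}, phi = {phi:.3f}")
```

With these definitions the p ≤ 0.05 rows give roughly 0.19, 0.47, and 0.20, which lies close to the zero-correlation rows of the correlation tables (0.200, 0.465, 0.201 and 0.198, 0.468, 0.199). That agreement supports the assumed definitions but does not confirm them.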
Positive correlation between p(H) and p(D|H).
| | | FA ratio | Miss ratio | Phi |
|---|---|---|---|---|
| 0 | 0.267 | 0.200 | 0.465 | 0.201 |
| 0.1 | 0.343 | 0.157 | 0.460 | 0.229 |
| 0.2 | 0.415 | 0.120 | 0.449 | 0.260 |
| 0.3 | 0.494 | 0.092 | 0.444 | 0.278 |
| 0.4 | 0.565 | 0.063 | 0.436 | 0.302 |
| 0.5 | 0.628 | 0.046 | 0.430 | 0.313 |
| 0.6 | 0.698 | 0.031 | 0.425 | 0.327 |
| 0.7 | 0.760 | 0.018 | 0.416 | 0.338 |
| 0.8 | 0.826 | 0.008 | 0.411 | 0.349 |
| 0.9 | 0.891 | 0.003 | 0.405 | 0.356 |
Negative correlation between p(D|H) and p(D|∼H).
| | | FA ratio | Miss ratio | Phi |
|---|---|---|---|---|
| 0 | 0.260 | 0.198 | 0.468 | 0.199 |
| –0.1 | 0.287 | 0.181 | 0.464 | 0.213 |
| –0.2 | 0.311 | 0.165 | 0.462 | 0.225 |
| –0.3 | 0.345 | 0.144 | 0.462 | 0.236 |
| –0.4 | 0.363 | 0.144 | 0.463 | 0.234 |
| –0.5 | 0.390 | 0.135 | 0.461 | 0.242 |
| –0.6 | 0.411 | 0.132 | 0.461 | 0.245 |
| –0.7 | 0.437 | 0.126 | 0.459 | 0.249 |
| –0.8 | 0.461 | 0.123 | 0.463 | 0.248 |
| –0.9 | 0.492 | 0.125 | 0.456 | 0.253 |
Correlations for a simulation varying sampling proportion from 0.1 to 0.9, effect size from 0.01 to 1.0, and p(H) from 0.01 to 0.99.
| | Sampling proportion | δ | p(H) | p(∼H) | p(D\|H) | p(D\|∼H) | p(H\|D) | Updated p(H\|D) |
|---|---|---|---|---|---|---|---|---|
| δ | 0.000 | – | | | | | | |
| p(H) | 0.000 | 0.000 | – | | | | | |
| p(∼H) | 0.000 | 0.000 | –1.000 | – | | | | |
| p(D\|H) | 0.564 | –0.642 | 0.000 | 0.000 | – | | | |
| p(D\|∼H) | –0.577 | –0.636 | 0.000 | 0.000 | 0.200 | – | | |
| p(H\|D) | 0.713 | –0.002 | 0.394 | –0.394 | –0.400 | – | |
| Updated p(H\|D) | 0.767 | 0.000 | 0.279 | –0.279 | –0.444 | 0.969 | – | |
| Sample mean | –0.634 | 0.673 | 0.000 | 0.000 | –0.800 | –0.054 | –0.593 | –0.601 |
Varying sample size and effect size.
| δ | N | Mdn p | r(p(D\|H),p(H\|D)) | r² | | FA ratio | Miss ratio | Phi |
|---|---|---|---|---|---|---|---|---|
| 0.2 | 20 | 0.321 | 0.156 | 0.024 | 0.025 | 0.000 | 0.503 | 0.000 |
| | 50 | 0.239 | 0.340 | 0.116 | 0.118 | 0.192 | 0.496 | 0.088 |
| | 100 | 0.157 | 0.552 | 0.305 | 0.319 | 0.162 | 0.429 | 0.316 |
| | 200 | 0.079 | 0.743 | 0.552 | 0.644 | 0.106 | 0.222 | 0.662 |
| 0.5 | 20 | 0.134 | 0.643 | 0.414 | 0.445 | 0.147 | 0.340 | 0.476 |
| | 50 | 0.032 | 0.761 | 0.579 | 0.747 | 0.134 | 0.078 | 0.786 |
| | 100 | 0.006 | 0.651 | 0.424 | 0.650 | 0.261 | 0.000 | 0.691 |
| | 200 | 0.000 | 0.519 | 0.270 | 0.400 | 0.340 | 0.000 | 0.557 |
| 0.8 | 20 | 0.032 | 0.759 | 0.577 | 0.742 | 0.172 | 0.052 | 0.764 |
| | 50 | 0.002 | 0.584 | 0.341 | 0.506 | 0.285 | 0.000 | 0.644 |
| | 100 | 0.000 | 0.482 | 0.232 | 0.331 | 0.369 | 0.000 | 0.507 |
| | 200 | 0.000 | 0.374 | 0.140 | 0.203 | 0.420 | 0.000 | 0.404 |
Varying sample size and effect size.
| δ | N | Mdn p | r(p(D\|H),p(H\|D)) | r² | | FA ratio | Miss ratio | Phi |
|---|---|---|---|---|---|---|---|---|
| 0.2 | 20 | 0.321 | 0.300 | 0.090 | 0.091 | 0.000 | 0.507 | 0.000 |
| | 50 | 0.239 | 0.583 | 0.340 | 0.348 | 0.051 | 0.494 | 0.128 |
| | 100 | 0.157 | 0.785 | 0.617 | 0.655 | 0.035 | 0.403 | 0.433 |
| | 200 | 0.079 | 0.820 | 0.672 | 0.845 | 0.026 | 0.158 | 0.804 |
| 0.5 | 20 | 0.134 | 0.826 | 0.682 | 0.762 | 0.031 | 0.287 | 0.632 |
| | 50 | 0.032 | 0.772 | 0.595 | 0.840 | 0.079 | 0.020 | 0.899 |
| | 100 | 0.006 | 0.632 | 0.400 | 0.629 | 0.260 | 0.000 | 0.692 |
| | 200 | 0.000 | 0.507 | 0.257 | 0.382 | 0.344 | 0.000 | 0.554 |
| 0.8 | 20 | 0.032 | 0.767 | 0.588 | 0.817 | 0.132 | 0.009 | 0.846 |
| | 50 | 0.002 | 0.569 | 0.323 | 0.484 | 0.285 | 0.000 | 0.644 |
| | 100 | 0.000 | 0.478 | 0.228 | 0.325 | 0.364 | 0.000 | 0.511 |
| | 200 | 0.000 | 0.370 | 0.137 | 0.199 | 0.422 | 0.000 | 0.403 |