Chris H J Hartgerink, Robbie C M van Aert, Michèle B Nuijten, Jelte M Wicherts, Marcel A L M van Assen.
Abstract
Previous studies provided mixed findings on peculiarities in p-value distributions in psychology. This paper examined 258,050 test results across 30,710 articles from eight high-impact journals to investigate the existence of a peculiar prevalence of p-values just below .05 (i.e., a bump) in the psychological literature, and a potential increase thereof over time. We indeed found evidence for a bump just below .05 in the distribution of exactly reported p-values in the journals Developmental Psychology, Journal of Applied Psychology, and Journal of Personality and Social Psychology, but the bump did not increase over the years and disappeared when using recalculated p-values. We found clear and direct evidence for the questionable research practice (QRP) "incorrect rounding of p-value" (John, Loewenstein & Prelec, 2012) in all psychology journals. Finally, we also investigated monotonic excess of p-values, an effect of certain QRPs that has been neglected in previous research, and developed two measures to detect this by modeling the distributions of statistically significant p-values. Using simulations and applying the two measures to the retrieved test results, we argue that, although one of the measures suggests the use of QRPs in psychology, it is difficult to draw general conclusions concerning QRPs based on modeling of p-value distributions.
Keywords: Caliper test; Data peeking; NHST; QRP; p-values
Year: 2016 PMID: 27077017 PMCID: PMC4830257 DOI: 10.7717/peerj.1935
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Distributions of 20 million p-values each, when Cohen’s standardized effect size d = 0 (bump; A), d = .2 (bump; B), and d = .5 (monotonic excess; C), given data peeking (solid) or no data peeking (dashed).
Simulations were run for two-sample t-tests with n = 24. For data peeking, a maximum of three rounds of additional sampling occurred if the result was nonsignificant, with each round adding 1∕3 of the original sample size.
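The data-peeking procedure described in this caption can be sketched as a small simulation. This is a minimal sketch, not the authors' code. It assumes that n = 24 means 24 observations per group, that the data are normal with unit variance, that the test is a two-sided pooled-variance t-test at α = .05, and it uses far fewer replications than the 20 million behind the figure. All function names are hypothetical.

```python
import math
import random


def t_two_sided_p(t, df, steps=200):
    """Two-sided p-value of Student's t via Simpson integration of the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_mass)


def two_sample_p(a, b):
    """Pooled-variance two-sample t-test; returns the two-sided p-value."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    se = math.sqrt(ss / df * (1 / na + 1 / nb))
    return t_two_sided_p((ma - mb) / se, df)


def simulate(d, n=24, max_peeks=3, alpha=0.05, reps=2000, seed=1):
    """Return the share of significant results (without peeking, with peeking).

    Assumption: n is the per-group sample size.
    """
    rng = random.Random(seed)
    step = n // 3  # each extra round adds 1/3 of the original sample size
    sig_first = sig_peek = 0
    for _ in range(reps):
        a = [rng.gauss(d, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p = two_sample_p(a, b)
        sig_first += p < alpha  # result if the researcher stops here
        peeks = 0
        while p >= alpha and peeks < max_peeks:
            a += [rng.gauss(d, 1) for _ in range(step)]
            b += [rng.gauss(0, 1) for _ in range(step)]
            p = two_sample_p(a, b)
            peeks += 1
        sig_peek += p < alpha
    return sig_first / reps, sig_peek / reps


no_peek, peek = simulate(d=0.0)
print(f"d = 0: significant without peeking {no_peek:.3f}, with peeking {peek:.3f}")
```

Because both rates are computed from the same simulated data sets, the rate with peeking can never fall below the rate without it; under d = 0 the gap between the two is the inflation of the Type I error rate.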
Articles downloaded, articles with extracted results in American Psychological Association (APA) style, and number of extracted APA test results per journal.
| Journal | Acronym | Timespan | Articles downloaded | Articles with extracted results (%) | APA results extracted |
|---|---|---|---|---|---|
| Developmental Psychology | DP | 1985–2013 | 3,381 | 2,607 (77%) | 37,658 |
| Frontiers in Psychology | FP | 2010–2013 | 2,126 | 702 (33%) | 10,149 |
| Journal of Applied Psychology | JAP | 1985–2013 | 2,782 | 1,638 (59%) | 15,134 |
| Journal of Consulting and Clinical Psychology | JCCP | 1985–2013 | 3,519 | 2,413 (69%) | 27,429 |
| Journal of Experimental Psychology General | JEPG | 1985–2013 | 1,184 | 821 (69%) | 18,921 |
| Journal of Personality and Social Psychology | JPSP | 1985–2013 | 5,108 | 4,346 (85%) | 101,621 |
| Public Library of Science | PLOS | 2000–2013 | 10,303 | 2,487 (24%) | 31,539 |
| Psychological Science | PS | 2003–2013 | 2,307 | 1,681 (73%) | 15,654 |
Composition of extracted APA test results with respect to exact and inexact reporting of p-values or test statistics.
| | Exact test statistic | Inexact test statistic |
|---|---|---|
| Exact p-value | 68,776 | 274 |
| Inexact p-value | 187,617 | 1,383 |
Figure 2. Distributions of all reported p-values (white) and exactly reported p-values (blue) across eight psychology journals.
Binwidth = .00125.
Caliper test for exactly reported p-values per journal for different binwidths.
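| Journal | x (0.00125) | N (0.00125) | Pr (0.00125) | p (0.00125) | x (0.0025) | N (0.0025) | Pr (0.0025) | p (0.0025) | x (0.005) | N (0.005) | Pr (0.005) | p (0.005) | x (0.01) | N (0.01) | Pr (0.01) | p (0.01) |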
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All | ||||||||||||||||
| DP | ||||||||||||||||
| FP | 96 | 193 | 0.497 | 0.557 | 105 | 227 | 0.463 | 0.884 | 141 | 304 | 0.464 | 0.906 | 215 | 458 | 0.469 | 0.912 |
| JAP | 85 | 154 | 0.552 | 0.113 | 101 | 183 | 0.552 | 0.092 | ||||||||
| JCCP | 246 | 517 | 0.476 | 0.874 | 267 | 562 | 0.475 | 0.889 | 308 | 641 | 0.48 | 0.848 | 395 | 823 | 0.48 | 0.882 |
| JEPG | 147 | 285 | 0.516 | 0.318 | 159 | 310 | 0.513 | 0.346 | 195 | 375 | 0.52 | 0.235 | 258 | 509 | 0.507 | 0.395 |
| JPSP | ||||||||||||||||
| PLOS | 307 | 649 | 0.473 | 0.921 | 366 | 760 | 0.482 | 0.854 | 489 | 1,000 | 0.489 | 0.766 | 744 | 1,558 | 0.478 | 0.964 |
| PS | 237 | 497 | 0.477 | 0.859 | 256 | 539 | 0.475 | 0.886 | 299 | 652 | 0.459 | 0.984 | 418 | 886 | 0.472 | 0.957 |
Notes.
x, frequency of p-values in .05 minus binwidth through .05
N, total frequency of p-values across both intervals in the comparison; Pr, x∕N
p, p-value of the binomial test
Significant results (α = .05, one-tailed) indicate an excess of p-values just below .05 and are reported in bold.
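The quantities in these notes can be computed with an exact binomial test. The following is a minimal sketch, assuming the null hypothesis is Pr = .5 and the one-tailed alternative is Pr > .5 (an excess in the bin just below .05); the function name `caliper_test` is hypothetical and this is not the authors' implementation.

```python
from math import comb


def caliper_test(x, n_total):
    """One-tailed exact binomial test of H0: Pr = .5 against Pr > .5.

    x       -- number of p-values in the bin just below .05
    n_total -- number of p-values in both bins being compared
    """
    pr = x / n_total
    # P(X >= x) for X ~ Binomial(n_total, 0.5), using exact integer arithmetic
    tail = sum(comb(n_total, k) for k in range(x, n_total + 1))
    p_value = tail / 2 ** n_total
    return pr, p_value


# FP row, binwidth 0.00125: x = 96, N = 193
pr, p = caliper_test(96, 193)
print(f"Pr = {pr:.3f}, p = {p:.3f}")
```

For the FP row at binwidth 0.00125 this agrees with the tabled Pr = 0.497 and p = 0.557 to three decimals.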
Figure 3. Distribution of recalculated p-values where the p-value is reported as p = .05.
9.7% of the results fall outside the range of the plot, with 3.6% at the left tail and 6.1% at the right tail. Binwidth = .00125
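How recalculation can expose incorrect rounding is easiest to illustrate with a z statistic, whose two-tailed p-value needs only the standard library. This is a sketch under assumptions: the two test statistics are hypothetical and not taken from the data set, the function names are invented here, and the consistency rule (the recalculated value must round to the reported value at the reported precision) is one plausible reading rather than the article's stated criterion.

```python
import math


def p_from_z(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def consistent_with_reported(p_recalc, p_reported, decimals=2):
    """True if p_recalc rounds to p_reported at the reported precision."""
    half_unit = 0.5 * 10 ** -decimals
    return p_reported - half_unit <= p_recalc < p_reported + half_unit


for z in (1.96, 1.90):  # hypothetical results, both reported as "p = .05"
    p = p_from_z(z)
    print(f"z = {z}: recalculated p = {p:.4f}, "
          f"consistent with p = .05: {consistent_with_reported(p, 0.05)}")
```

Under this rule, a result with z = 1.90 reported as p = .05 would land in the right tail of a plot like this one, because its recalculated p-value rounds to .06.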
Figure 4. Recalculated p-values for exactly reported test statistics (white bars), and recalculated p-values for exactly reported test statistics where p-values are also exactly reported (blue bars).
Binwidth = .00125
Caliper test for exactly recalculated p-values per journal for different binwidths.
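| Journal | x (0.00125) | N (0.00125) | Pr (0.00125) | p (0.00125) | x (0.0025) | N (0.0025) | Pr (0.0025) | p (0.0025) | x (0.005) | N (0.005) | Pr (0.005) | p (0.005) | x (0.01) | N (0.01) | Pr (0.01) | p (0.01) |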
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All | 1,404 | 2,808 | 0.5 | 0.508 | 2,808 | 5,761 | 0.487 | 0.973 | 5,761 | 11,824 | 0.487 | 0.997 | 11,824 | 25,142 | 0.47 | >.999 |
| DP | 184 | 382 | 0.482 | 0.779 | 382 | 829 | 0.461 | 0.989 | 829 | 1,710 | 0.485 | 0.9 | 1,710 | 3,579 | 0.478 | 0.996 |
| FP | 30 | 69 | 0.435 | 0.886 | 69 | 172 | 0.401 | 0.996 | 172 | 376 | 0.457 | 0.956 | 376 | 799 | 0.471 | 0.955 |
| JAP | 73 | 145 | 0.503 | 0.5 | 145 | 270 | 0.537 | 0.124 | 270 | 556 | 0.486 | 0.765 | 556 | 1,168 | 0.476 | 0.952 |
| JCCP | 160 | 308 | 0.519 | 0.265 | 308 | 633 | 0.487 | 0.763 | 633 | 1,267 | 0.5 | 0.522 | 1,267 | 2,706 | 0.468 | >.999 |
| JEPG | 81 | 164 | 0.494 | 0.593 | 164 | 332 | 0.494 | 0.608 | 332 | 683 | 0.486 | 0.778 | 683 | 1,535 | 0.445 | >.999 |
| JPSP | 640 | 1,268 | 0.505 | 0.379 | 1,268 | 2,557 | 0.496 | 0.668 | 2,557 | 5,174 | 0.494 | 0.802 | 5,174 | 10,976 | 0.471 | >.999 |
| PLOS | 125 | 260 | 0.481 | 0.752 | 260 | 541 | 0.481 | 0.828 | 541 | 1,170 | 0.462 | 0.995 | 1,170 | 2,544 | 0.46 | >.999 |
| PS | 111 | 212 | 0.524 | 0.268 | 212 | 427 | 0.496 | 0.577 | 427 | 888 | 0.481 | 0.88 | 888 | 1,835 | 0.484 | 0.919 |
Notes.
x, frequency of p-values in .05 minus binwidth through .05
N, total frequency of p-values across both intervals in the comparison; Pr, x∕N
p, p-value of the binomial test
Significant results (α = .05, one-tailed) indicate an excess of p-values just below .05 and are reported in bold.
Caliper tests for exactly recalculated and exactly reported p-values per journal, including alternative binwidths.
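| Journal | x (0.00125) | N (0.00125) | Pr (0.00125) | p (0.00125) | x (0.0025) | N (0.0025) | Pr (0.0025) | p (0.0025) | x (0.005) | N (0.005) | Pr (0.005) | p (0.005) | x (0.01) | N (0.01) | Pr (0.01) | p (0.01) |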
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All | 809 | 1,617 | 0.5 | 0.5 | 1,617 | 3,403 | 0.475 | 0.998 | 3,403 | 7,402 | 0.46 | 1 | ||||
| DP | 46 | 87 | 0.529 | 0.334 | 87 | 185 | 0.47 | 0.811 | 185 | 358 | 0.517 | 0.281 | 358 | 756 | 0.474 | 0.932 |
| FP | 15 | 27 | 0.556 | 0.351 | 27 | 87 | 0.31 | >.999 | 87 | 192 | 0.453 | 0.915 | 192 | 437 | 0.439 | 0.995 |
| JAP | 8 | 20 | 0.4 | 0.868 | | | | | 29 | 65 | 0.446 | 0.839 | 65 | 141 | 0.461 | 0.844 |
| JCCP | 43 | 78 | 0.551 | 0.214 | 78 | 161 | 0.484 | 0.682 | 161 | 364 | 0.442 | 0.988 | 364 | 780 | 0.467 | 0.971 |
| JEPG | 27 | 50 | 0.54 | 0.336 | 50 | 98 | 0.51 | 0.46 | 98 | 209 | 0.469 | 0.834 | 209 | 479 | 0.436 | 0.998 |
| JPSP | 547 | 1,117 | 0.49 | 0.764 | 1,117 | 2,451 | 0.456 | >.999 | ||||||||
| PLOS | 76 | 149 | 0.51 | 0.435 | 149 | 323 | 0.461 | 0.926 | 323 | 698 | 0.463 | 0.978 | 698 | 1,470 | 0.475 | 0.975 |
| PS | 57 | 93 | 0.613 | 0.019 | 93 | 187 | 0.497 | 0.558 | 187 | 400 | 0.468 | 0.912 | 400 | 888 | 0.45 | 0.999 |
Notes.
x, frequency of p-values in .05 minus binwidth through .05
N, total frequency of p-values across both intervals in the comparison; Pr, x∕N
p, p-value of the binomial test
Significant results (α = .05, one-tailed) indicate an excess of p-values just below .05 and are reported in bold.
Linear regression coefficients as a test of increasing excess of p-values just below .05.
Intercept indicates the degree of excess for the first year of the estimated timespan (> 0 = excess).
| Journal | Timespan | Coefficient | Estimate | SE | t | p |
|---|---|---|---|---|---|---|
| All | 1985–2013 | Intercept | 0.007 | 0.017 | 0.392 | 0.698 |
| All | 1985–2013 | Years (centered) | −0.001 | 0.001 | −0.492 | 0.627 |
| DP | 1985–2013 | Intercept | −0.043 | 0.056 | −0.769 | 0.448 |
| DP | 1985–2013 | Years (centered) | 0.001 | 0.003 | 0.193 | 0.849 |
| FP | 2010–2013 | Intercept | −0.182 | 0.148 | −1.233 | 0.343 |
| FP | 2010–2013 | Years (centered) | 0.055 | 0.079 | 0.694 | 0.560 |
| JAP | 1985–2013 | Intercept | 0.041 | 0.081 | 0.504 | 0.619 |
| JAP | 1985–2013 | Years (centered) | −0.001 | 0.005 | −0.208 | 0.837 |
| JCCP | 1985–2013 | Intercept | 0.077 | 0.058 | 1.315 | 0.200 |
| JCCP | 1985–2013 | Years (centered) | −0.006 | 0.004 | −1.546 | 0.134 |
| JEPG | 1985–2013 | Intercept | −0.022 | 0.124 | −0.176 | 0.862 |
| JEPG | 1985–2013 | Years (centered) | 0.001 | 0.007 | 0.097 | 0.924 |
| JPSP | 1985–2013 | Intercept | −0.002 | 0.027 | −0.062 | 0.951 |
| JPSP | 1985–2013 | Years (centered) | 0.000 | 0.002 | −0.005 | 0.996 |
| PLOS | 2006–2013 | Intercept | | | | |
| PLOS | 2006–2013 | Years (centered) | | | | |
| PS | 2003–2013 | Intercept | 0.081 | 0.078 | 1.045 | 0.323 |
| PS | 2003–2013 | Years (centered) | −0.009 | 0.013 | −0.669 | 0.520 |
Notes.
Significant results (α = .05, two-tailed) are reported in bold.
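A regression of this form can be sketched with ordinary least squares on years centered at the first year of the timespan, so that the intercept is the fitted excess in that first year. The yearly values below are hypothetical, the function name is invented here, and treating the yearly excess as Pr − .5 is an assumption rather than something stated in this record.

```python
import math


def ols_line(years, excess):
    """Simple OLS of excess on years centered at the first year.

    Returns (intercept, slope, se_intercept, se_slope).
    """
    x = [y - years[0] for y in years]  # first year becomes 0
    n = len(x)
    mx, my = sum(x) / n, sum(excess) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, excess))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((yi - intercept - slope * xi) ** 2
                   for xi, yi in zip(x, excess))
    sigma2 = resid_ss / (n - 2)  # residual variance
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + mx ** 2 / sxx))
    return intercept, slope, se_intercept, se_slope


# Hypothetical yearly excess values (assumed to be Pr - .5), illustration only
years = [2009, 2010, 2011, 2012, 2013]
excess = [0.02, -0.01, 0.03, 0.00, 0.01]
b0, b1, se0, se1 = ols_line(years, excess)
print(f"Intercept: {b0:.3f} (SE {se0:.3f}, t = {b0 / se0:.3f})")
print(f"Years (centered): {b1:.3f} (SE {se1:.3f}, t = {b1 / se1:.3f})")
```

Each t value is the estimate divided by its standard error, which is also the relation that holds between the Estimate, SE, and t entries of the table. A positive and significant slope would indicate an increasing excess over the years.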
Results of parameter estimation of the distribution of effect sizes and measures of data peeking as a function of population effect size (δ, ρ), population heterogeneity (τ), and data peeking, for the simulated data.
Results are based on all p-values 0–1, p-values ≤.05, and ≤.00125.
| Data peeking | p-values used | Quantity | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Without data peeking | 0–1 | Estimated population effect | 0 | 0.103 | 0.258 | 0.413 | 0 | 0.103 | 0.258 | 0.413 |
| | | Estimated population heterogeneity | 0 | 0 | 0 | 0 | 0.077 | 0.077 | 0.077 | 0.077 |
| | 0–.05 | Estimated population effect | 0 | 0.103 | 0.258 | 0.413 | 0 | 0.103 | 0.258 | 0.413 |
| | | Estimated population heterogeneity | 0 | 0 | 0 | 0.001 | 0.077 | 0.077 | 0.077 | 0.077 |
| | | Misfit | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0–.00125 | Estimated population effect | 0 | 0.103 | 0.258 | 0.413 | 0.1 | 0.107 | 0.259 | 0.413 |
| | | Estimated population heterogeneity | 0 | 0 | 0 | 0.001 | 0.025 | 0.076 | 0.077 | 0.077 |
| | | Misfit | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | D | 1 | 1 | 1 | 1 | 1.205 | 1.006 | 1.003 | 1.001 |
| With data peeking | 0–.05 | Estimated population effect | 0 | 0 | 0.117 | 0.345 | 0 | 0 | 0.075 | 0.360 |
| | | Estimated population heterogeneity | 0 | 0 | 0 | 0.038 | 0 | 0.055 | 0.137 | 0.091 |
| | | Misfit | | | | | | | | |
| | | N | 759,812 | 811,296 | 936,517 | 994,974 | 434,660 | 525,023 | 707,650 | 889,681 |
| | 0–.00125 | Estimated population effect | 0 | 0.075 | 0.218 | 0.366 | 0.066 | 0.161 | 0.283 | 0.402 |
| | | Estimated population heterogeneity | 0 | 0 | 0 | 0 | 0.036 | 0 | 0 | 0.012 |
| | | Misfit | 6.9 | 3.2 | 7.1 | 2 | 1.9 | 2.6 | 2.1 | |
| | | N | 9,729 | 21,576 | 95,615 | 350,482 | 14,791 | 34,530 | 124,991 | 366,875 |
| | | D | 1.977 | 1.976 | 1.835 | 1.166 | 1.628 | 1.620 | 1.472 | 1.164 |
Notes.
The first two rows of each block give the estimated population effect and the estimated population heterogeneity; misfit 0–.05, misfit of estimates based on p-values 0–.05; misfit 0–.00125, misfit of estimates based on p-values 0–.00125 (bold indicates p < .05); N, number of results included in estimation; D, comparison of observed and expected p-value frequencies.
Figure 5. Observed proportions of p-values (circles) and expected proportions of p-values based on the population effect and heterogeneity estimated from p-values 0–.00125 (crosses).