Abstract
In recent years, researchers have attempted to provide an indication of the prevalence of inflated Type 1 error rates by analyzing the distribution of p-values in the published literature. De Winter & Dodou (2015) analyzed the distribution (and its change over time) of a large number of p-values automatically extracted from abstracts in the scientific literature. They concluded there is a 'surge of p-values between 0.041–0.049 in recent decades' which 'suggests (but does not prove) questionable research practices have increased over the past 25 years.' I show that the changes in the ratio of fractions of p-values between 0.041–0.049 over the years are better explained by assuming that average power has decreased over time. Furthermore, I propose that their observation that p-values just below 0.05 increase more strongly than p-values above 0.05 can be explained by an increase in publication bias (or the file drawer effect) over the years (cf. Fanelli, 2012; Pautasso, 2010), which has led to a relative decrease of 'marginally significant' p-values in abstracts in the literature (instead of an increase in p-values just below 0.05). I explain why researchers analyzing large numbers of p-values need to relate their assumptions to a model of p-value distributions that takes into account the average power of the performed studies, the ratio of true positives to false positives in the literature, the effects of publication bias, and the Type 1 error rate (and possible mechanisms through which it has inflated). Finally, I discuss why publication bias and underpowered studies might be a bigger problem for science than inflated Type 1 error rates, and explain the challenges of drawing conclusions about inflated Type 1 error rates from a large heterogeneous set of p-values.
Keywords: False positives; Publication bias; Statistics; p-curve; p-value
Year: 2015 PMID: 26246976 PMCID: PMC4525697 DOI: 10.7717/peerj.1142
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Expected percentage of p-values between 0.001–0.049 based on 42% and 55% power.
Note that columns do not sum to 0.55 and 0.42 because some p-values are not included in the analysis (e.g., p-values between 0.049–0.050).
| p-value bin | Expected (55% power) | Expected (42% power) |
|---|---|---|
| 0.001–0.009 | 0.300 | 0.199 |
| 0.011–0.019 | 0.085 | 0.072 |
| 0.021–0.029 | 0.056 | 0.051 |
| 0.031–0.039 | 0.042 | 0.034 |
| 0.041–0.049 | 0.034 | 0.033 |
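The relation between power and the expected share of p-values per bin can be sketched as follows. This is a minimal illustration and not the author's computation: it assumes a two-sided z-test with a normally distributed test statistic, an alpha of 0.05, and bin edges taken literally from the bin labels. The function names (`prob_p_below`, `noncentrality_for_power`, `expected_bin_shares`) are hypothetical. Because the record does not state the test or the exact bin edges used, the output should not be expected to match the tabulated values exactly.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

# Bin edges read literally from the bin labels (an assumption).
BINS = [(0.001, 0.009), (0.011, 0.019), (0.021, 0.029),
        (0.031, 0.039), (0.041, 0.049)]


def prob_p_below(p, delta):
    """P(two-sided p-value <= p) when the z statistic is N(delta, 1)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - p / 2)
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - delta)
    lower_tail = _STD_NORMAL.cdf(-z_crit - delta)
    return upper_tail + lower_tail


def noncentrality_for_power(power, alpha=0.05):
    """Find delta >= 0 giving the requested power, by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if prob_p_below(alpha, mid) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_bin_shares(power):
    """Expected share of all test results that fall in each bin."""
    delta = noncentrality_for_power(power)
    return [prob_p_below(high, delta) - prob_p_below(low, delta)
            for low, high in BINS]


if __name__ == "__main__":
    for power in (0.55, 0.42):
        shares = expected_bin_shares(power)
        print(power, [round(s, 3) for s in shares])
```

Under these assumptions, lower power places relatively more of the significant p-values in the bin just below 0.05 compared with the lowest bin, which is the direction of the effect the abstract describes.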
Percentage of papers that report p-values in abstracts, and the number of reconstructed and observed (De Winter & Dodou, 2015) p-values between 0.001–0.049 in 1990 and 2013 for each bin.
| p-value bin | % (1990) | % (2013) | Reconstructed # (1990) | Reconstructed # (2013) | Observed # (1990) | Observed # (2013) |
|---|---|---|---|---|---|---|
| 0.001–0.009 | 0.01 | 0.1 | 1,676 | 46,064 | 1,770 | 44,970 |
| 0.011–0.019 | 0.01 | 0.1 | 480 | 16,690 | 462 | 14,885 |
| 0.021–0.029 | 0.01 | 0.1 | 315 | 11,698 | 268 | 10,630 |
| 0.031–0.039 | 0.01 | 0.1 | 237 | 9,189 | 240 | 9,108 |
| 0.041–0.049 | 0.01 | 0.1 | 190 | 7,629 | 178 | 8,250 |
Ratio of fractions of reconstructed p-values and p-value ratios observed by De Winter & Dodou (2015) between 0.001–0.049 for 1990 and 2013.
| p-value bin | Reconstructed fraction (1990) | Reconstructed fraction (2013) | Reconstructed 1990/2013 ratio of fractions | Observed fraction (1990) | Observed fraction (2013) | Observed 1990/2013 ratio of fractions |
|---|---|---|---|---|---|---|
| 0.001–0.009 | 0.299 | 1.993 | 6.674 | 0.315 | 1.945 | 6.17 |
| 0.011–0.019 | 0.085 | 0.722 | 8.454 | 0.082 | 0.644 | 7.83 |
| 0.021–0.029 | 0.056 | 0.506 | 9.017 | 0.048 | 0.460 | 9.63 |
| 0.031–0.039 | 0.042 | 0.398 | 9.417 | 0.043 | 0.394 | 9.21 |
| 0.041–0.049 | 0.034 | 0.330 | 9.740 | 0.032 | 0.367 | 11.28 |
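The arithmetic behind the ratio columns can be checked with a short sketch. It assumes that the first fraction in each pair is the 1990 value and the second the 2013 value, and that each published ratio is the 2013 fraction divided by the 1990 fraction, which is what the tabulated numbers imply. The names `TABLE` and `ratio_of_fractions` are hypothetical. Ratios recomputed from the rounded fractions differ slightly from the published ones, presumably because the published ratios were derived from unrounded values.

```python
# Rows copied from the table: bin -> (fraction 1990, fraction 2013, published ratio)
TABLE = {
    "reconstructed": {
        "0.001–0.009": (0.299, 1.993, 6.674),
        "0.011–0.019": (0.085, 0.722, 8.454),
        "0.021–0.029": (0.056, 0.506, 9.017),
        "0.031–0.039": (0.042, 0.398, 9.417),
        "0.041–0.049": (0.034, 0.330, 9.740),
    },
    "observed": {
        "0.001–0.009": (0.315, 1.945, 6.17),
        "0.011–0.019": (0.082, 0.644, 7.83),
        "0.021–0.029": (0.048, 0.460, 9.63),
        "0.031–0.039": (0.043, 0.394, 9.21),
        "0.041–0.049": (0.032, 0.367, 11.28),
    },
}


def ratio_of_fractions(fraction_1990, fraction_2013):
    """Growth factor of a bin's fraction from 1990 to 2013."""
    return fraction_2013 / fraction_1990


def recomputed_ratios(kind):
    """Map each bin to (recomputed ratio, published ratio)."""
    return {
        bin_label: (ratio_of_fractions(f1990, f2013), published)
        for bin_label, (f1990, f2013, published) in TABLE[kind].items()
    }


if __name__ == "__main__":
    for kind in ("reconstructed", "observed"):
        for bin_label, (ratio, published) in recomputed_ratios(kind).items():
            print(f"{kind:13s} {bin_label}: {ratio:.2f} (published {published})")
```

In both the reconstructed and the observed columns, the highest bin grows by a larger factor than the lowest bin, which is the pattern the abstract attributes to a decrease in average power.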
Ratios of fractions of the percentage of p-values in abstracts for 2013 relative to the percentages in 23 preceding years, for the 5 p-value bins. Each column gives the Year/2013 ratio of fractions for one p-value bin.
| Year compared to 2013 | 0.001–0.009 | 0.011–0.019 | 0.021–0.029 | 0.031–0.039 | 0.041–0.049 |
|---|---|---|---|---|---|
| 1990 | 6.17 | 7.83 | 9.63 | 9.22 | 11.26 |
| 1991 | 5.10 | 6.81 | 6.90 | 8.02 | 9.19 |
| 1992 | 4.07 | 5.04 | 5.72 | 6.30 | 6.55 |
| 1993 | 3.03 | 4.06 | 4.39 | 5.01 | 4.92 |
| 1994 | 2.56 | 3.26 | 3.61 | 4.28 | 4.31 |
| 1995 | 2.10 | 2.72 | 2.98 | 3.22 | 3.36 |
| 1996 | 1.62 | 2.23 | 2.26 | 2.48 | 2.52 |
| 1997 | 1.42 | 1.84 | 2.15 | 2.19 | 2.16 |
| 1998 | 1.26 | 1.76 | 1.85 | 2.15 | 1.95 |
| 1999 | 1.12 | 1.52 | 1.64 | 1.70 | 1.70 |
| 2000 | 1.02 | 1.29 | 1.39 | 1.42 | 1.50 |
| 2001 | 0.95 | 1.19 | 1.27 | 1.34 | 1.25 |
| 2002 | 0.88 | 1.08 | 1.09 | 1.17 | 1.15 |
| 2003 | 0.77 | 0.88 | 0.90 | 0.97 | 0.98 |
| 2004 | 0.67 | 0.76 | 0.78 | 0.81 | 0.81 |
| 2005 | 0.57 | 0.64 | 0.66 | 0.67 | 0.67 |
| 2006 | 0.52 | 0.57 | 0.59 | 0.60 | 0.62 |
| 2007 | 0.47 | 0.51 | 0.51 | 0.54 | 0.54 |
| 2008 | 0.42 | 0.45 | 0.46 | 0.45 | 0.48 |
| 2009 | 0.39 | 0.42 | 0.41 | 0.43 | 0.42 |
| 2010 | 0.36 | 0.38 | 0.38 | 0.40 | 0.39 |
| 2011 | 0.31 | 0.33 | 0.32 | 0.33 | 0.33 |
| 2012 | 0.27 | 0.28 | 0.27 | 0.27 | 0.28 |
Type 1 error rates, absolute number of reconstructed Type 1 errors between 0.001–0.049 from 1990 to 2013, and their ratio.
| p-value bin | Type 1 error rate 1990 | Type 1 error rate 2013 | Significant (1990) | Significant (2013) | Reconstructed 1990/2013 ratio of fractions |
|---|---|---|---|---|---|
| 0.001–0.009 | 0.008 | 0.009 | 1,899 | 50,338 | 6.44 |
| 0.011–0.019 | 0.008 | 0.014 | 581 | 17,072 | 7.14 |
| 0.021–0.029 | 0.008 | 0.018 | 408 | 13,706 | 8.16 |
| 0.031–0.039 | 0.008 | 0.022 | 327 | 12,755 | 9.47 |
| 0.041–0.049 | 0.008 | 0.028 | 279 | 13,262 | 11.56 |