Alexander Etz, Joachim Vandekerckhove.
Abstract
We revisit the results of the recent Reproducibility Project: Psychology by the Open Science Collaboration. We compute Bayes factors (a quantity that can express comparative evidence for a hypothesis, but also for the null hypothesis) for a large subset (N = 72) of the original papers and their corresponding replication attempts. In our computation, we take into account the likely scenario that publication bias had distorted the originally published results. Overall, 75% of studies gave qualitatively similar results in terms of the amount of evidence provided. However, the evidence was often weak (i.e., Bayes factor < 10). The majority of the studies (64%) did not provide strong evidence for either the null or the alternative hypothesis in either the original or the replication, and no replication attempts provided strong evidence in favor of the null. In all cases where the original paper provided strong evidence but the replication did not (15%), the sample size in the replication was smaller than the original. Where the replication provided strong evidence but the original did not (10%), the replication sample size was larger. We conclude that the apparent failure of the Reproducibility Project to replicate many target effects can be adequately explained by overestimation of effect sizes (or overestimation of evidence against the null hypothesis) due to small sample sizes and publication bias in the psychological literature. We further conclude that traditional sample sizes are insufficient and that a more widespread adoption of Bayesian methods is desirable.
Year: 2016 PMID: 26919473 PMCID: PMC4769355 DOI: 10.1371/journal.pone.0149794
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Descriptive labels for certain Bayes factors.
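| Label | Bayes factor | Posterior probability of the alternative (a) |
|---|---|---|
| Strongly support the alternative | 10 | 91% |
| Weakly support the alternative | 3 | 75% |
| Ambiguous information | 1 | 50% |
| Weakly support the null | 1/3 | 25% |
| Strongly support the null | 1/10 | 9% |

a: The posterior probability of the alternative hypothesis, assuming prior equiprobability between the null and the alternative hypotheses.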
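
The percentages in this table follow from Bayes' rule in odds form: posterior odds equal the Bayes factor times the prior odds, and with equal prior probabilities the prior odds are 1. The sketch below illustrates that arithmetic; the function name `posterior_probability` and its `prior_prob` argument are illustrative choices, not names taken from the paper.

```python
def posterior_probability(bayes_factor, prior_prob=0.5):
    """Posterior probability of the alternative hypothesis.

    bayes_factor: evidence for the alternative relative to the null.
    prior_prob:   prior probability of the alternative (0.5 = equiprobable).
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Bayes factors corresponding to the five labels in the table.
for bf in (10, 3, 1, 1 / 3, 1 / 10):
    print(f"Bayes factor {bf:.3g}: posterior probability {posterior_probability(bf):.0%}")
```

A Bayes factor of 10 gives 10/11, or about 91%, and a Bayes factor of 1/10 gives 1/11, or about 9%, matching the first and last rows.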
Fig 1. Predicted distributions of t statistics in the literature.
Predicted distributions are shown under the four censoring mechanisms we consider (columns) and two possible states of nature (top row: null hypothesis true (δ = 0); bottom row: null hypothesis false (δ ≠ 0)).
The four weighting functions.
| Model | Weight | Parameters |
|---|---|---|
| No bias | | None |
| Extreme bias | | None |
| Constant bias | | |
| Exponential bias | | |
Note: w(x) is always 1 for results that are statistically significant at the .05 level. The dependency on the design and data properties that determine statistical significance is implied.
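
The weight formulas did not survive in this record, so the following is only a rough sketch of what four such weighting functions could look like. Everything beyond the note above is an assumption: the functions take a p-value rather than the test statistic, "significant" is read as p < .05, extreme bias is read as a weight of 0 for nonsignificant results, and the parameters `c` and `lam` are hypothetical names. The exponential form in particular is a guess at a decaying weight, not the paper's formula.

```python
import math

ALPHA = 0.05  # significance level stated in the table note


def w_no_bias(p):
    # No censoring: every result is equally likely to be published.
    return 1.0


def w_extreme_bias(p):
    # ASSUMPTION: nonsignificant results are never published.
    return 1.0 if p < ALPHA else 0.0


def w_constant_bias(p, c):
    # ASSUMPTION: nonsignificant results are published at a constant
    # relative rate c, with 0 <= c <= 1 (hypothetical parameter).
    return 1.0 if p < ALPHA else c


def w_exponential_bias(p, lam):
    # ASSUMPTION: the weight decays exponentially as p moves above ALPHA;
    # lam >= 0 is a hypothetical decay-rate parameter.
    return 1.0 if p < ALPHA else math.exp(-lam * (p - ALPHA))


for p in (0.01, 0.10, 0.50):
    print(p, w_no_bias(p), w_extreme_bias(p),
          w_constant_bias(p, c=0.3), round(w_exponential_bias(p, lam=5.0), 3))
```

All four functions return 1 for significant results, in line with the note. They differ only in how strongly nonsignificant results are down-weighted.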
Consistency of Bayes factors across original and replicate studies.
Columns indicate the magnitude of the mitigated Bayes factor from the original study, and rows indicate the magnitude of the Bayes factor obtained in the replication project.
| Replication \ Original | 0 − 1/10 | 1/10 − 1/3 | 1/3 − 3 | 3 − 10 | 10 − ∞ | Total |
|---|---|---|---|---|---|---|
| 0 − 1/10 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1/10 − 1/3 | 0 | 0 | 18 | 4 | 3 | 25 |
| 1/3 − 3 | 0 | 0 | 16 | 4 | 7 | 27 |
| 3 − 10 | 0 | 0 | 3 | 1 | 1 | 5 |
| 10 − ∞ | 0 | 1 | 6 | 0 | 8 | 15 |
| Total | 0 | 1 | 43 | 9 | 19 | 72 |
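
The percentages quoted in the abstract can be recovered from these counts. The sketch below treats a Bayes factor as "strong" when it falls in the outermost categories (below 1/10 or above 10), which matches the abstract's threshold of 10. Reading "qualitatively similar" as "both strong or both not strong" is an assumption; it is one reading that reproduces the 75% figure.

```python
# Rows: replication category; columns: original (mitigated) category.
# Category order: 0-1/10, 1/10-1/3, 1/3-3, 3-10, 10-inf.
counts = [
    [0, 0, 0, 0, 0],
    [0, 0, 18, 4, 3],
    [0, 0, 16, 4, 7],
    [0, 0, 3, 1, 1],
    [0, 1, 6, 0, 8],
]
STRONG = {0, 4}  # indices of the two outermost categories

total = sum(sum(row) for row in counts)
tally = {"both": 0, "neither": 0, "original_only": 0, "replication_only": 0}

for r, row in enumerate(counts):
    for c, n in enumerate(row):
        rep_strong, orig_strong = r in STRONG, c in STRONG
        if rep_strong and orig_strong:
            tally["both"] += n
        elif orig_strong:
            tally["original_only"] += n
        elif rep_strong:
            tally["replication_only"] += n
        else:
            tally["neither"] += n

for name, n in tally.items():
    print(f"{name}: {n} of {total} ({n / total:.0%})")

# ASSUMPTION: "similar" means both strong or both not strong.
similar = tally["both"] + tally["neither"]
print(f"similar: {similar} of {total} ({similar / total:.0%})")
```

Under these definitions, 46 of 72 pairs (64%) show strong evidence in neither study, 11 (15%) only in the original, 7 (10%) only in the replication, and 54 (75%) agree in strength.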
Fig 2. Evidence resulting from replicated studies plotted against evidence resulting from the original publications.
For the original publications, evidence for the alternative hypothesis was calculated taking into account the possibility of publication bias. Small crosses indicate cases where neither the replication nor the original gave strong evidence. Circles indicate cases where one or the other gave strong evidence, with the size of each circle proportional to the ratio of the replication sample size to the original sample size (a reference circle appears in the lower right). The area labeled ‘replication uninformative’ contains cases where the original provided strong evidence but the replication did not, and the area labeled ‘original uninformative’ contains cases where the reverse was true. Two studies that fell beyond the limits of the figure in the top right area (i.e., that yielded extremely large Bayes factors both times) and two that fell above the top left area (i.e., large Bayes factors in the replication only) are not shown. The effect that relative sample size has on Bayes factor pairs is shown by the systematic size difference of circles going from the bottom right to the top left. All values in this figure can be found in S1 Table.