Christopher J Bryan, David S Yeager, Joseph M O'Brien.
Abstract
In recent years, the field of psychology has begun to conduct replication tests on a large scale. Here, we show that "replicator degrees of freedom" make it far too easy to obtain and publish false-negative replication results, even while appearing to adhere to strict methodological standards. Specifically, using data from an ongoing debate, we show that commonly exercised flexibility at the experimental design and data analysis stages of replication testing can make it appear that a finding was not replicated when, in fact, it was. The debate that we focus on is representative, on key dimensions, of a large number of other replication tests in psychology that have been published in recent years, suggesting that the lessons of this analysis may be far reaching. The problems with current practice in replication science that we uncover here are particularly worrisome because they are not adequately addressed by the field's standard remedies, including preregistration. Implications for how the field could develop more effective methodological standards for replication are discussed.
Keywords: null hacking; p-hacking; replication crisis; reproducibility; researcher degrees of freedom
Year: 2019 PMID: 31767750 PMCID: PMC6925985 DOI: 10.1073/pnas.1910951116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
List of the potentially important design features in the original experiments by Bryan et al. (57) that the replicating authors chose to deviate from in one or both of their published replication tests (18, 35)
| No. | Design features | Original experiments | Replication test #1 | Replication test #2 |
| --- | --- | --- | --- | --- |
| 1 | Medium used to administer noun vs. verb manipulation | Online survey | | Online survey |
| 2 | Screened out prospective participants who had already voted? | Yes | | |
| 3 | Screened out nonnative English speakers? | Yes | | |
| 4 | Participants treated on Election Day until close of polls? | No | No | |
| 5 | Participants treated on Election Day only before 9 AM? | Yes | | |
| 6 | Participants treated 1 d before Election Day? | Yes | Yes | Yes |
| 7 | Participants treated 2 d before Election Day? | No | | |
| 8 | Participants treated 3 d before Election Day? | No | | |
| 9 | Participants treated 4 d before Election Day? | No | | No |
| 10 | Salient election context? | Yes: US presidential general and gubernatorial general (NJ) | | Yes: gubernatorial general (LA, MS, and KY) and mayoral general (Houston) |
Design deviations are shown in bold font.
Fig. 1. Plot of the effect-size estimates from 1,200 models in the stage-2 specification curve, which includes the 270 specifications from the stage-1 specification curve and 930 models that represent analytical choices made by the replicating authors. Panels A–F represent subsets of study days. The triangular points along the bottom of each panel indicate specifications that use the exact set of 95 covariates used by Gerber et al. (35).
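The specification-curve idea behind Fig. 1 can be illustrated with a small sketch: enumerate every combination of analytic choices, estimate the treatment effect under each, and sort the estimates. This is a minimal sketch under stated assumptions, not the authors' code or data. The dataset is synthetic, the two analytic choices (which study days to include, and whether to drop participants who had already voted) are simplified stand-ins loosely echoing rows of the table, the estimator is a plain difference in turnout proportions rather than the covariate-adjusted models used in the paper, and every name (`Participant`, `simulate`, `effect`, `specification_curve`) is hypothetical.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Participant:
    noun_condition: bool  # True = noun wording, False = verb wording
    days_before: int      # days between treatment and Election Day (0 = Election Day)
    already_voted: bool   # hypothetical screening flag
    voted: bool           # outcome: turnout


def simulate(n: int, seed: int = 0) -> list[Participant]:
    """Generate HYPOTHETICAL synthetic data (not the study's data).

    Assumption for illustration only: the noun wording raises turnout by
    10 points when given 0-1 days before the election and has no effect otherwise.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        noun = rng.random() < 0.5
        days = rng.randint(0, 4)  # inclusive bounds: 0..4
        already = rng.random() < 0.1
        p_vote = 0.5 + (0.1 if noun and days <= 1 else 0.0)
        voted = already or rng.random() < p_vote
        people.append(Participant(noun, days, already, voted))
    return people


def effect(sample: list[Participant]) -> Optional[float]:
    """Difference in turnout proportions (noun minus verb); None if a condition is empty."""
    treated = [p.voted for p in sample if p.noun_condition]
    control = [p.voted for p in sample if not p.noun_condition]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def specification_curve(people: list[Participant], days=(0, 1, 2, 3, 4)):
    """Estimate the effect under every combination of two analytic choices, sorted."""
    # Analytic choice 1: any nonempty subset of study days (2**5 - 1 = 31 options).
    day_subsets = [
        set(combo)
        for r in range(1, len(days) + 1)
        for combo in itertools.combinations(days, r)
    ]
    specs = []
    # Analytic choice 2: whether to screen out participants who had already voted.
    for day_set, screen in itertools.product(day_subsets, (False, True)):
        sample = [
            p for p in people
            if p.days_before in day_set and not (screen and p.already_voted)
        ]
        estimate = effect(sample)
        if estimate is not None:
            specs.append((estimate, tuple(sorted(day_set)), screen))
    specs.sort(key=lambda spec: spec[0])  # ascending order forms the "curve"
    return specs


if __name__ == "__main__":
    curve = specification_curve(simulate(2000))
    low, high = curve[0], curve[-1]
    print(f"{len(curve)} specifications")
    print(f"lowest estimate  {low[0]:+.3f} (days={low[1]}, screened={low[2]})")
    print(f"highest estimate {high[0]:+.3f} (days={high[1]}, screened={high[2]})")
```

Plotting the sorted estimates yields a curve of the kind shown in Fig. 1. A fuller analysis would also attach a confidence interval or p-value to each specification, which this sketch omits for brevity.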