Benjamin G Farrar, Markus Boeckle, Nicola S Clayton.
Abstract
Direct replication studies follow an original experiment's methods as closely as possible. They provide information about the reliability and validity of an original study's findings. The present paper asks what comparative cognition should expect if its studies were directly replicated, and how researchers can use this information to improve the reliability of future research. Because published effect sizes are likely overestimated, comparative cognition researchers should not expect findings with p-values just below the significance level to replicate consistently. Nevertheless, there are several statistical and design features that can help researchers identify reliable research. However, researchers should not simply aim for maximum replicability when planning studies; comparative cognition faces strong replicability-validity and replicability-resource trade-offs. Next, the paper argues that it may not even be possible to perform truly direct replication studies in comparative cognition because of: 1) a lack of access to the species of interest; 2) real differences in animal behavior across sites; and 3) sample size constraints producing very uncertain statistical estimates, meaning that it will often not be possible to detect statistical differences between original and replication studies. These three reasons suggest that many claims in the comparative cognition literature are practically unfalsifiable, and this presents a challenge for cumulative science in comparative cognition. To address this challenge, comparative cognition can begin to formally assess the replicability of its findings, improve its statistical thinking and explore new infrastructures that can help to form a field that can create and combine the data necessary to understand how cognition evolves.
Keywords: Comparative cognition; Evidence; Replication; Reproducibility
Year: 2020 PMID: 32626823 PMCID: PMC7334049 DOI: 10.26451/abc.07.01.02.2020
Source DB: PubMed Journal: Anim Behav Cogn ISSN: 2372-4323
The Results of 40,000 Simulated Comparative Cognition Studies by Power.
| Power (%) | Proportion published | Unstandardized effect size, all samples | Unstandardized effect size, published | Mean overestimation in “published” findings |
|---|---|---|---|---|
| 80 | 0.796 | 5.77 | 6.53 | 13% |
| 50 | 0.494 | 3.82 | 5.56 | 45% |
| 20 | 0.205 | 1.89 | 4.88 | 158% |
| 5 | 0.053 | 0.007 | 4.45 | 64486% |
Note. The proportion published represents the proportion of studies producing p < .05, and the unstandardized effect sizes are the mean differences between the groups.
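The selection effect summarized in the table can be illustrated with a minimal sketch. It is not the authors' simulation: the group size (`n_per_group`), the standard deviation (`sd`), and the one-sided z-test with known variance are all assumptions chosen so the example runs with the Python standard library alone, so the numbers it prints will not match the table exactly. The pattern should be the same, though: the lower the power, the more the "published" studies overestimate the mean difference.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def simulate_published_bias(power, n_per_group=10, sd=5.0, alpha=0.05,
                            n_studies=10_000, seed=1):
    """Simulate two-group studies and 'publish' those with one-sided p < alpha.

    n_per_group, sd, and the one-sided z-test are illustrative assumptions,
    not the settings behind the table.
    """
    rng = random.Random(seed)
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the mean difference
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # True mean difference that gives the requested power for this test.
    true_effect = (z_crit + STD_NORMAL.inv_cdf(power)) * se

    all_effects, published = [], []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        diff = fmean(treatment) - fmean(control)
        p_value = 1 - STD_NORMAL.cdf(diff / se)
        all_effects.append(diff)
        if p_value < alpha:  # the publication filter
            published.append(diff)

    return {
        "proportion_published": len(published) / n_studies,
        "mean_all": fmean(all_effects),
        "mean_published": fmean(published),
    }


if __name__ == "__main__":
    for power in (0.80, 0.50, 0.20, 0.05):
        result = simulate_published_bias(power)
        print(power, {key: round(value, 3) for key, value in result.items()})
```

With `alpha = 0.05`, a power of 0.05 corresponds to a true effect of zero in this sketch, which is why the mean over all samples sits near zero at that level while the "published" mean does not.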
The Mathematically Derived Probability of a Successful Replication Attempt of an Original Study Randomly Selected from a Given Range of p-values from the 15,471 "Published" Simulation Studies.
| p-value of original study | p < 0.01 | 0.01 < p < 0.02 | 0.02 < p < 0.03 | 0.03 < p < 0.04 | 0.04 < p < 0.05 |
|---|---|---|---|---|---|
| Probability of successful replication | 0.67 | 0.57 | 0.52 | 0.50 | 0.48 |
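One plausible reading of "mathematically derived" is that a same-design replication succeeds with probability equal to the original design's true power, so the value for each p-value range is the average true power of the "published" studies that fall in it. The sketch below follows that reading. The equal split of studies across the four power levels, the one-sided z-test, and the bin edges are assumptions, not details confirmed by the source.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05
POWERS = (0.80, 0.50, 0.20, 0.05)  # assumed equal mix of designs
BIN_WIDTH = 0.01                   # assumed p-value bins: 0-0.01, ..., 0.04-0.05


def replication_probability_by_p_bin(n_per_power=10_000, seed=1):
    """Average true power of 'published' studies within each p-value bin."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - ALPHA)
    power_sums = [0.0] * 5
    counts = [0] * 5
    for power in POWERS:
        # Expected z-statistic of a design with this power (one-sided test).
        shift = z_crit + STD_NORMAL.inv_cdf(power)
        for _ in range(n_per_power):
            z = rng.gauss(shift, 1.0)
            p_value = 1 - STD_NORMAL.cdf(z)
            if p_value >= ALPHA:
                continue  # not "published"
            bin_index = min(int(p_value / BIN_WIDTH), 4)
            # A same-design replication succeeds with probability = power.
            power_sums[bin_index] += power
            counts[bin_index] += 1
    return [total / count for total, count in zip(power_sums, counts)]


if __name__ == "__main__":
    for index, prob in enumerate(replication_probability_by_p_bin()):
        low = index * BIN_WIDTH
        print(f"{low:.2f} to {low + BIN_WIDTH:.2f}: {prob:.2f}")
```

Under these assumptions the probability declines as the original p-value approaches .05, because results just under the threshold come disproportionately from low-powered designs.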
Figure 1. Graphs showing example data from nine different designs of a looking-time experiment in rooks. These designs vary in the underlying effect size (50 ms, 100 ms, 200 ms) and the number of trials per condition (1, 5, 100). N = 7 for all designs. The power of each design is printed below each graph and was calculated by simulating 10,000 studies in each design and calculating the proportion of p-values less than .05.
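The power-by-simulation procedure described in the caption can be sketched as follows. The trial-level standard deviation (`trial_sd_ms`), the paired two-sided t-test, and the absence of between-subject variation in the effect are assumptions made for illustration; the caption does not state the noise model, so the power values this code produces are not those printed in the figure. Rather than computing a p-value, the code compares |t| with the two-sided 5% critical value for 6 degrees of freedom, which is equivalent to checking p < .05 for that test.

```python
import random
from statistics import fmean, stdev

N_SUBJECTS = 7
# Two-sided 5% critical value of Student's t with 6 degrees of freedom (N = 7).
T_CRIT_DF6 = 2.447


def simulate_power(effect_ms, trials_per_condition, trial_sd_ms=400.0,
                   n_sims=10_000, seed=1):
    """Estimate power as the share of simulated studies with |t| > critical value.

    trial_sd_ms is a hypothetical noise level, not a value from the source.
    """
    rng = random.Random(seed)
    # SD of one subject's mean looking time over its trials in one condition.
    # Drawing the mean directly is equivalent to averaging individual trials.
    mean_sd = trial_sd_ms / trials_per_condition ** 0.5
    significant = 0
    for _ in range(n_sims):
        # Per-subject difference between condition means; a shared subject
        # baseline would cancel in this difference, so it is omitted.
        diffs = [rng.gauss(effect_ms, mean_sd) - rng.gauss(0.0, mean_sd)
                 for _ in range(N_SUBJECTS)]
        t_stat = fmean(diffs) / (stdev(diffs) / N_SUBJECTS ** 0.5)
        if abs(t_stat) > T_CRIT_DF6:
            significant += 1
    return significant / n_sims


if __name__ == "__main__":
    for effect_ms in (50, 100, 200):
        for trials in (1, 5, 100):
            power = simulate_power(effect_ms, trials)
            print(f"effect={effect_ms} ms, trials={trials}: power={power:.2f}")
```

With a fixed sample of seven birds, adding trials per condition is the only lever in this sketch for reducing noise, which is why power rises so sharply from 1 to 100 trials.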
Recommended Reading to Introduce Readers to Current Topics in Replication Research
| Title | Reference |
|---|---|
| Reproducibility of scientific results | |
| Detecting and avoiding likely false-positive findings—A practical guide. | |
| Justify your alpha | |
| The regression trap and other pitfalls of replication science | |
| Re-thinking reproducibility as a criterion for research quality | |