Jessie R Baldwin, Jean-Baptiste Pingault, Tabea Schoeler, Hannah M Sallis, Marcus R Munafò.
Abstract
Analysis of secondary data sources (such as cohort studies, survey data, and administrative records) has the potential to provide answers to science and society's most pressing questions. However, researcher biases can lead to questionable research practices in secondary data analysis, which can distort the evidence base. While pre-registration can help to protect against researcher biases, it presents challenges for secondary data analysis. In this article, we describe these challenges and propose novel solutions and alternative approaches. Proposed solutions include approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) help ensure that pre-registered analyses will be appropriate for the data, and (4) address difficulties arising from reduced analytic flexibility in pre-registration. For each solution, we provide guidance on implementation for researchers and data guardians. The adoption of these practices can help to protect against researcher bias in secondary data analysis, to improve the robustness of research based on existing data.
Keywords: Open science; Pre-registration; Researcher bias; Secondary data analysis
Year: 2022 PMID: 35025022 PMCID: PMC8791887 DOI: 10.1007/s10654-021-00839-0
Source DB: PubMed Journal: Eur J Epidemiol ISSN: 0393-2990 Impact factor: 8.082
Limitations in the use of pre-registration to address questionable research practices (QRPs)
| Limitation | Example |
|---|---|
| Pre-registration may not prevent selective reporting/outcome switching | The COMPare Trials Project [ |
| Pre-registration may be performed retrospectively after the results are known | Mathieu et al. [ |
| Deviations from pre-registered protocols are common | Claesen et al. [ |
| Pre-registration may not improve the credibility of hypotheses | Rubin [ |
Challenges and potential solutions regarding sharing pre-existing data
| Challenge | Potential solutions |
|---|---|
| Many datasets cannot be publicly shared because of ethical and legal requirements | Share a synthetic dataset (a simulated dataset which mimics an original dataset by preserving its statistical properties and associations between variables). For a tutorial, see Quintana [ |
| | Provide specific instructions on how data can be accessed and links to codebooks/data dictionaries with variable information [ |
| If different researchers conduct similar statistical tests on a dataset and do not correct for multiple testing, this increases the risk of false positives [ | Test whether findings replicate in independent samples, as the chance of two identical false positives occurring in independent samples is small |
| | Ensure that the research question is distinct from prior studies on the given dataset, to help ensure that proposed analyses are part of a different statistical family. Multiple analyses on a single dataset will not lead to false positives if the analyses are part of different statistical families |
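
Two ideas in the table above lend themselves to a short illustration: generating a synthetic dataset that preserves associations between variables, and the arithmetic behind false positives under repeated testing. The Python sketch below is a minimal illustration under strong assumptions, not the procedure from the cited tutorial. The function names (`make_synthetic`, `familywise_error_rate`, `joint_false_positive_rate`) are hypothetical. The synthetic-data step assumes all variables are continuous and approximately jointly normal, so only means, variances, and pairwise linear associations are preserved. The error-rate functions assume independent tests with all null hypotheses true.

```python
import numpy as np


def make_synthetic(data: np.ndarray, n_rows: int, seed: int = 0) -> np.ndarray:
    """Draw synthetic rows from a multivariate normal fitted to `data`.

    Assumption (not from the source): columns are continuous and roughly
    jointly normal. Real cohort data usually need richer models.
    """
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=0)           # per-variable means
    cov = np.cov(data, rowvar=False)   # variances and pairwise covariances
    return rng.multivariate_normal(mean, cov, size=n_rows)


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across `n_tests` independent
    tests at level `alpha`, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def joint_false_positive_rate(alpha: float, n_samples: int = 2) -> float:
    """Chance that the same null effect is falsely significant in each of
    `n_samples` independent samples (ignoring the direction of effect)."""
    return alpha ** n_samples


# Hypothetical "original" data: two variables correlated at about 0.6.
_rng = np.random.default_rng(42)
original = _rng.multivariate_normal([0.0, 1.0], [[1.0, 0.6], [0.6, 1.0]], size=5000)
synthetic = make_synthetic(original, n_rows=5000, seed=1)

print("original r: ", np.corrcoef(original, rowvar=False)[0, 1])
print("synthetic r:", np.corrcoef(synthetic, rowvar=False)[0, 1])
print("FWER, 10 tests at 0.05:", familywise_error_rate(0.05, 10))   # about 0.40
print("same false positive in 2 samples:", joint_false_positive_rate(0.05))
```

Under these assumptions, ten uncorrected tests at the 0.05 level carry roughly a 40% chance of at least one false positive. The chance of the same false positive arising in two independent samples is 0.05 × 0.05 = 0.0025, which is why replication in an independent sample is an effective safeguard. A synthetic dataset generated in this way does not by itself guarantee confidentiality, so disclosure risk should still be assessed before sharing.
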
Fig. 1 Challenges in pre-registering secondary data analysis and potential solutions (according to researcher motivations). Note: In the “Potential solution” column, blue boxes indicate solutions that are researcher-led; green boxes indicate solutions that should be facilitated by data guardians