Heidi Seibold, Severin Czerny, Siona Decke, Roman Dieterle, Thomas Eder, Steffen Fohr, Nico Hahn, Rabea Hartmann, Christoph Heindl, Philipp Kopper, Dario Lepke, Verena Loidl, Maximilian Mandl, Sarah Musiol, Jessica Peter, Alexander Piehler, Elio Rojas, Stefanie Schmid, Hannah Schmidt, Melissa Schmoll, Lennart Schneider, Xiao-Yin To, Viet Tran, Antje Völker, Moritz Wagner, Joshua Wagner, Maria Waize, Hannah Wecker, Rui Yang, Simone Zellner, Malte Nalenz.
Abstract
Computational reproducibility is a cornerstone of sound and credible research. In complex statistical analyses, such as the analysis of longitudinal data, reproducing results is far from simple, especially if no source code is available. In this work we aimed to reproduce the longitudinal data analyses of 11 articles published in PLOS ONE. Inclusion criteria were the availability of data and author consent. We investigated the types of methods and software used and whether we were able to reproduce the data analysis using open source software. Most articles provided overview tables and simple visualisations. Generalised Estimating Equations (GEEs) were the most popular statistical models among the selected articles. Only one article used open source software and only one published part of the analysis code. Replication was difficult in most cases and required reverse engineering of results or contacting the authors. For three articles we were not able to reproduce the results, and for another two we could reproduce only parts of them. For all but two articles we had to contact the authors to be able to reproduce the results. Our main lesson is that reproducing papers is difficult if no code is supplied, which places a high burden on those conducting the reproductions. Open data policies in journals are good, but to truly boost reproducibility we suggest adding open code policies.
Year: 2021 PMID: 34153038 PMCID: PMC8216542 DOI: 10.1371/journal.pone.0251194
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Data selection.
Data selection procedure according to our requirements and number of papers fulfilling the respective requirements.
Selected papers.
| Ref. | Citation | Title |
|---|---|---|
| [ | Wagner et al (2017) | Airway Microbial Community Turnover Differs by BPD Severity in Ventilated Preterm Infants |
| [ | Meda et al (2017) | Longitudinal Influence of Alcohol and Marijuana Use on Academic Performance in College Students |
| [ | Visaya et al (2015) | Analysis of Binary Multivariate Longitudinal Data via 2-Dimensional Orbits: An Application to the Agincourt Health and Socio-Demographic Surveillance System in South Africa |
| [ | Vo et al (2018) | Optimizing Community Screening for Tuberculosis: Spatial Analysis of Localized Case Finding from Door-to-Door Screening for TB in an Urban District of Ho Chi Minh City, Viet Nam |
| [ | Aerenhouts et al (2015) | Estimating Body Composition in Adolescent Sprint Athletes: Comparison of Different Methods in a 3 Years Longitudinal Design |
| [ | Tabatabai et al (2016) | Racial and Gender Disparities in Incidence of Lung and Bronchus Cancer in the United States: A Longitudinal Analysis |
| [ | Rawson et al (2015) | Association of Functional Polymorphisms from Brain-Derived Neurotrophic Factor and Serotonin-Related Genes with Depressive Symptoms after a Medical Stressor in Older Adults |
| [ | Kawaguchi, Desrochers (2018) | A Time-Lagged Effect of Conspecific Density on Habitat Selection by Snowshoe Hare |
| [ | Lemley et al (2016) | Morphometry Predicts Early GFR Change in Primary Proteinuric Glomerulopathies: A Longitudinal Cohort Study Using Generalized Estimating Equations |
| [ | Carmody et al (2018) | Fluctuations in Airway Bacterial Communities Associated with Clinical States and Disease Stages in Cystic Fibrosis |
| [ | Villalonga-Olives et al (2017) | Longitudinal Changes in Health Related Quality of Life in Children with Migrant Backgrounds |
Which statistical methods were used by the papers?
| Ref. | Overview Tables | Visualisations | Models Used |
|---|---|---|---|
| [ | Baseline demographics | Several, e.g. spaghetti plot | Beta Binomial Mixed Model |
| [ | Baseline demographics, model output | Several, e.g. scatter plots (alcohol vs. marijuana use) at different time points | LMM |
| [ | Overview of household types | Several, e.g. lasagna plot | GEE |
| [ | Baseline demographics | none | GEE |
| [ | Correlation | none | LMM (cross-classified) |
| [ | Many, especially smoking and lung cancer incidence rates for different years, genders, races and regions | Mean curves | LMM |
| [ | Baseline demographics | Mean curves | GEE |
| [ | Data overview | Mean curves | GEE |
| [ | Correlation matrix | Mean curves | GEE |
| [ | Sample characteristics | Several, e.g. FEV1 over time | GEE |
| [ | Baseline demographics | DAG | GEE |
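The "Models Used" column above can be tallied to check the abstract's statement that GEEs were the most popular model. The following is a minimal sketch, assuming the eleven entries are transcribed by hand from the table and that the cross-classified LMM is grouped with the other LMMs; the helper name `model_family` is hypothetical and not part of the original study.

```python
from collections import Counter

# Entries of the "Models Used" column, transcribed by hand in table order.
models_used = [
    "Beta Binomial Mixed Model",
    "LMM",
    "GEE",
    "GEE",
    "LMM (cross-classified)",
    "LMM",
    "GEE",
    "GEE",
    "GEE",
    "GEE",
    "GEE",
]


def model_family(label: str) -> str:
    """Drop a parenthesised qualifier (assumption: qualifiers do not change the family)."""
    return label.split("(")[0].strip()


# Count how many articles used each model family.
family_counts = Counter(model_family(m) for m in models_used)
most_common_family, most_common_count = family_counts.most_common(1)[0]

for family, count in family_counts.most_common():
    print(f"{family}: {count} of {len(models_used)} articles")
```

Grouping by family is a judgment call; counting the cross-classified LMM separately would not change which model is most frequent.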
Were the results reproducible?
| Ref. | Method documentation | Contact Attempts | Author Responses | Models Computable | Same Interpretation | Classification of Failure |
|---|---|---|---|---|---|---|
| [ | Missing Details | 2 | 1 | partly | no | Software differences |
| [ | Missing Details | 0 | 0 | yes | yes | |
| [ | yes | 1 | 1 | partly | yes | Software differences |
| [ | Missing Details | 1 | 1 | yes | yes | |
| [ | Missing Details | 3 | 2 | partly | no | Software differences |
| [ | yes | 1 | 0 | no | no | Software differences, Model Description |
| [ | Correlation Structure missing | 1 | 1 | yes | yes | |
| [ | Correlation Structure missing | 1 | 1 | yes | yes | |
| [ | Correlation Structure missing | 3 | 1 | yes | yes | |
| [ | 4 | 1 | no | Data and Model description | ||
| [ | yes | 0 | 0 | yes | yes | |
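The "Contact Attempts" and "Author Responses" columns can be summarised to cross-check the abstract's claim that authors had to be contacted for all but two articles. This is a minimal sketch with values transcribed by hand in table order. It assumes that in the tenth row, whose cells appear shifted, the values 4 and 1 are the contact attempts and author responses; the variable names are illustrative only.

```python
# Values transcribed by hand from the table, one entry per article in row order.
# Assumption: in the tenth row (cells shifted in the source) 4 = attempts, 1 = responses.
contact_attempts = [2, 0, 1, 1, 3, 1, 1, 1, 3, 4, 0]
author_responses = [1, 0, 1, 1, 2, 0, 1, 1, 1, 1, 0]

n_articles = len(contact_attempts)

# Articles for which no contact attempt was recorded.
no_contact = sum(1 for a in contact_attempts if a == 0)

# Articles where authors were contacted, and those where at least one reply arrived.
contacted = n_articles - no_contact
answered = sum(
    1 for a, r in zip(contact_attempts, author_responses) if a > 0 and r > 0
)

total_attempts = sum(contact_attempts)
total_responses = sum(author_responses)

print(f"Articles without any contact attempt: {no_contact} of {n_articles}")
print(f"Articles contacted: {contacted}, with at least one response: {answered}")
print(f"Total attempts: {total_attempts}, total responses: {total_responses}")
```

Under these assumptions the count of articles with zero contact attempts is two, which agrees with the abstract.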
Which software was used by the papers?
| Ref. | Software | Open Source | Source Code | Computing Environment |
|---|---|---|---|---|
| [ | SAS | no | partly | SAS version |
| [ | SPSS | no | no | SPSS version |
| [ | no information (email contact states Stata) | no | no | no information |
| [ | no information (email contact states Stata) | no | no | no information |
| [ | SAS | no | no | SAS version |
| [ | SAS | no | no | SAS version |
| [ | SAS | no | no | SAS version |
| [ | R | yes | upon request | Package version |
| [ | SAS | no | no | SAS version |
| [ | SPSS | no | no | SPSS version |
| [ | MPlus | no | no | MPlus version |
Fig 2. Original and reproduced model parameter estimates for the ewbGEE model of article [15].
In this article the differences in parameters do not lead to a different interpretation.