Abstract
In medicine and other academic settings, (doctoral) students often work in interdisciplinary teams together with researchers from the pharmaceutical sciences, the natural sciences in general, or biostatistics. They should be taught the fundamentals of good research practice, especially with regard to statistical analysis. This includes reproducibility as a central aspect. Acknowledging that even experienced researchers and supervisors might be unfamiliar with necessary aspects of a perfectly reproducible workflow, a lecture series on reproducible research (RR) was developed for young scientists in clinical pharmacology. The pilot series highlighted definitions of RR, reasons for RR, potential merits of RR, and ways to work accordingly. In trying to actually reproduce a published analysis, several practical obstacles arose. In this article, the reproduction of a working example is commented on to emphasize the manifold facets of RR, to provide possible explanations for difficulties and solutions, and to argue that harmonized curricula for (quantitative) clinical researchers should include RR principles. These experiences should raise awareness among educators and students, supervisors and young scientists. RR working habits are beneficial not only for ourselves or our students, but also for other researchers within an institution, for scientific partners, for the scientific community, and eventually for the public profiting from research findings.
Keywords: Heterogeneous treatment effects; Machine learning; Medical education; Reproducibility; Reproducible research
Year: 2021 PMID: 34886890 PMCID: PMC8656016 DOI: 10.1186/s13104-021-05862-8
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Sample principle of a workflow for RR with a standardized folder structure
| Main folder | Contents | Comment on item |
|---|---|---|
| Project name | Naming according to recurring pattern, which may include initials of the researcher and the date | |
| └ | Readme | Mandatory (text) file with important project information on prerequisites, scientific and technical background, and instructions on how to run the project code. Can also include a list of necessary software (package) versions, if not supplied as a separate file |
| └ | Folder “data” | Folder with (raw) data or preprocessed data |
| └ | Folder “lib” | Folder for storing literature or cross-project scripts (e.g., R functions, R packages, …) |
| └ | Folder “results” | Folder for saving results of any kind (tables, figures, R-images, …) |
| └ | Folder “src” | Folder with all executable [“source()”] files |
| └ | Folder “paper” | Folder for storing publication drafts of any kind (e.g., Word documents, Markdown results, Shiny apps, …) |
| └ | Folder “old” | Optional collection folder for old versions of scripts or similar |
| └ | Make | Central executable files for reproduction of the project |
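
The folder layout in the table above could be scaffolded with a few lines of code. The following is a minimal sketch in Python, not part of the original article: the function name `create_project`, the file names `README.txt` and `make.py`, and the placeholder file contents are assumptions chosen for illustration; only the subfolder names come from the table.

```python
import tempfile
from pathlib import Path

# Subfolder names as listed in the table; everything else is illustrative.
SUBFOLDERS = ["data", "lib", "results", "src", "paper", "old"]


def create_project(root: Path, name: str) -> Path:
    """Create the standardized folder structure under root/name.

    Existing folders and files are left untouched, so the function
    can be run repeatedly without overwriting anything.
    """
    project = root / name
    for folder in SUBFOLDERS:
        (project / folder).mkdir(parents=True, exist_ok=True)

    # Hypothetical file name for the mandatory readme.
    readme = project / "README.txt"
    if not readme.exists():
        readme.write_text(
            "Project: " + name + "\n"
            "Prerequisites:\n"
            "Background:\n"
            "How to run:\n"
            "Software (package) versions:\n",
            encoding="utf-8",
        )

    # Hypothetical file name for the central executable ("Make") file.
    make = project / "make.py"
    if not make.exists():
        make.write_text(
            "# Central entry point: run the scripts in src/ in order.\n",
            encoding="utf-8",
        )
    return project


if __name__ == "__main__":
    # Demonstration in a temporary directory that is removed afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project(Path(tmp), "demo_project")
        for entry in sorted(p.name for p in project.iterdir()):
            print(entry)
```

The project name passed to `create_project` would follow the recurring naming pattern described in the first table row, for example including the researcher's initials and the date.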
Fig. 1 Bootstrapped performance metrics used to derive mean estimates and 95% confidence intervals from the original publication [12], from the reproduction using the supplied code written in Python, and from the supplied code translated to R. Average risk reductions were calculated for two subgroups (buckets): patients with predicted benefit in absolute risk reduction (ARR > 0) and patients without predicted benefit (ARR ≤ 0). A calibration line was fitted between quintiles of ARRs and predicted risk; its slope is included in this set of performance metrics. As a decision value, the model-predicted restricted mean survival time [RMST (days)] indicates the mean time to event if treatment choice had been based on the predicted individual benefit [and is thus to be compared with the baseline value of 1061.2 days, 95% confidence interval: (1057.4; 1064.1)]. The c-for-benefit is a metric reflecting the model’s ability to predict treatment benefit (rather than risk for an outcome) [20]. Using the Python implementation to calculate this metric, the individual risk estimates reproduced in R yielded an estimate of 0.61 (0.55; 0.72). Of note, we restrict the presentation of results to distributions from resampling and their summary parameters; further numerical metrics to quantify reproducibility are left out for simplicity. Analyses were performed using the Anaconda distribution of Python version 3.7.3 (Anaconda Software Distribution, version 2–2.4.0) and the R software environment version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria)
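
To illustrate how resampling can yield a mean estimate with a 95% confidence interval, as summarized in the figure, the following is a minimal Python sketch. It is not the code of the original publication or of the reproduction: the percentile method, the function name `bootstrap_mean_ci`, the number of resamples, and the input values are all assumptions made for illustration. Fixing the random seed is what makes the resampling itself reproducible.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap for the mean of `values` (illustrative sketch).

    Returns (mean of bootstrap means, lower bound, upper bound).
    """
    rng = random.Random(seed)  # fixed seed: same resamples on every run
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n observations with replacement and record their mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()

    # Percentile interval: cut alpha/2 of the sorted means from each tail.
    lower_idx = max(0, round(alpha / 2 * n_resamples))
    upper_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return statistics.fmean(means), means[lower_idx], means[upper_idx]


if __name__ == "__main__":
    # Hypothetical metric values, not data from the article.
    example_values = [0.58, 0.61, 0.63, 0.55, 0.66, 0.60, 0.59, 0.64]
    estimate, lower, upper = bootstrap_mean_ci(example_values, seed=42)
    print(f"{estimate:.3f} ({lower:.3f}; {upper:.3f})")
```

Calling the function twice with the same seed returns identical results, whereas a different seed generally shifts the interval bounds slightly. This is one reason why reproduced summary statistics may not match published ones exactly when the seed is not reported.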