Daniel Nüst, Stephen J Eglen.
Abstract
The traditional scientific paper falls short of effectively communicating computational research. To help improve this situation, we propose a system by which the computational workflows underlying research articles are checked. The CODECHECK system uses open infrastructure and tools and can be integrated into review and publication processes in multiple ways. We describe these integrations along multiple dimensions (importance, who, openness, when). In collaboration with academic publishers and conferences, we demonstrate CODECHECK with 25 reproductions of diverse scientific publications. These CODECHECKs show that asking for reproducible workflows during a collaborative review can effectively improve executability. While CODECHECK has clear limitations, it may represent a building block in Open Science and publishing ecosystems for improving the reproducibility, appreciation, and, potentially, the quality of non-textual research artefacts. The CODECHECK website can be accessed here: https://codecheck.org.uk/.
Keywords: Open Science; code sharing; data sharing; peer review; quality control; reproducibility; reproducible research; scholarly publishing
Year: 2021 PMID: 34367614 PMCID: PMC8311796 DOI: 10.12688/f1000research.51738.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. The inverse problem in reproducible research.
The left half of the diagram shows a diverse range of materials used within a laboratory. These materials are often then condensed for sharing with the outside world via the research paper, a static PDF document. Working backwards from the PDF to the underlying materials is impossible. This prohibits reuse and is not only non-transparent for a specific paper but is also ineffective for science as a whole. By sharing the materials on the left, others outside the lab can enhance this work.
Figure 2. The CODECHECK example workflow implementation.
Codecheckers act as detectives: they investigate and record issues, but do not fix them. Numbers in bold refer to steps outlined in the text.
Figure 3. The dimensions of implementing a CODECHECK workflow.
Register of completed certificates as of December 2020.
An interactive version is available at .
| Certificate | Research area | Description |
|---|---|---|
| | Machine learning | Code for benchmarking ML classification tool checked post acceptance of manuscript and before its … |
| | Neuroscience | Code written for this project checked by second project member as demonstration using paper from … |
| | Neuroscience | Code written for this project checked by second project member as demonstration using classic paper … |
| | Neuroscience | Code written for this project checked by second project member as demonstration using classic paper … |
| | Neuroscience | Check of independent reimplementation of spike-timing-dependent plasticity (STDP) model |
| | Neuroscience | Check of independent reimplementation of a generalized linear integrate-and-fire neural model |
| | Neuroscience | Check of independent reimplementation of analysing spike patterns of neurons |
| | COVID-19 | Code for modelling of interventions on COVID-19 cases in the UK checked at preprint stage |
| | COVID-19 | Code for analysis of effectiveness of measures to reduce transmission of SARS-CoV-2 checked as … |
| | COVID-19 | Code for analysis of non-pharmaceutical interventions (Report 9) checked as a preprint |
| | COVID-19 | Code for modelling of COVID-19 spread across Europe was provided by authors and checked while … |
| | COVID-19 | Code for modelling of COVID-19 spread across the USA was checked as preprint |
| | Neuroscience | Code for analysis of rest-activity patterns in people without cone-mediated vision was checked as a … |
| | Neuroscience | Code for analysis of perturbation patterns of neural activity was checked after publication as part of … |
| | Neuroscience | Code for a neural network model for human focal seizures was checked after publication as part of … |
| | GIScience | Code for models demonstrating the Modifiable Areal Unit Problem (MAUP) in spatial data science |
| | GIScience | Code for spatial data handling, analysis, and visualisation using a variety of R packages |
| | GIScience | AGILE conference reproducibility report using a demonstration data subset with cellular automaton for … |
| | GIScience | AGILE conference reproducibility report with subsampled dataset for reachability analysis of suburban … |
| | GIScience | AGILE conference reproducibility report using a container for checking in-database window operators |
| | GIScience | AGILE conference reproducibility report checking code for comparing supervised machine learning … |
| | GIScience | AGILE conference reproducibility report checking code for visualising text analysis on intents and … |
| | GIScience | AGILE conference reproducibility report on analysis of spatial footprints of geo-tagged extreme weather … |
| | Neuroscience | Code for multi-agent system for concept drift detection in electromyography |
| | GIScience | Adaptation and application of Local Indicators for Categorical Data (LICD) to archaeological data |
Figure 4. Annotated certificate 2020–012 (first four pages only).