Sheeba Samuel (1, 2), Birgitta König-Ries (1, 2).
Abstract
Scientific data management plays a key role in the reproducibility of scientific results. To reproduce results, not only the results but also the data and steps of scientific experiments must be made findable, accessible, interoperable, and reusable. Tracking, managing, describing, and visualizing provenance improves the understandability, reproducibility, and reuse of experiments for the scientific community. Current systems lack a link between the data, steps, and results from the computational and non-computational processes of an experiment. Such a link, however, is vital for the reproducibility of results. We present a novel solution for the end-to-end provenance management of scientific experiments. We provide a framework, CAESAR (CollAborative Environment for Scientific Analysis with Reproducibility), which allows scientists to capture, manage, query, and visualize the complete path of a scientific experiment consisting of computational and non-computational data and steps in an interoperable way. CAESAR integrates the REPRODUCE-ME provenance model, extended from existing semantic web standards, to represent the whole picture of an experiment describing the path it took from its design to its result. ProvBook, an extension for Jupyter Notebooks, is developed and integrated into CAESAR to support computational reproducibility. We have applied and evaluated our contributions on a set of scientific experiments in microscopy research projects. ©2022 Samuel and König-Ries.
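The abstract says REPRODUCE-ME extends existing semantic web standards so that computational and non-computational steps share one representation. The sketch below assumes that base standard is the W3C PROV-O vocabulary and uses only its core terms (prov:Entity, prov:Activity, prov:used, prov:wasGeneratedBy); REPRODUCE-ME's own classes are not reproduced, and every example.org resource name is hypothetical. It shows how such a graph lets one trace a result back through a wet-lab step and a notebook step alike.

```python
from collections import deque

# Real W3C namespaces; the EX namespace and all resource names are hypothetical.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/experiment/"

# Provenance as (subject, predicate, object) triples. A lab step and an
# analysis step are both prov:Activity, so they live in one graph.
triples = [
    (EX + "sample1", RDF_TYPE, PROV + "Entity"),
    (EX + "imaging", RDF_TYPE, PROV + "Activity"),    # non-computational step
    (EX + "imaging", PROV + "used", EX + "sample1"),
    (EX + "image1", PROV + "wasGeneratedBy", EX + "imaging"),
    (EX + "analysis", RDF_TYPE, PROV + "Activity"),   # computational step
    (EX + "analysis", PROV + "used", EX + "image1"),
    (EX + "result1", PROV + "wasGeneratedBy", EX + "analysis"),
]


def lineage(start, graph):
    """Breadth-first walk backwards from `start` over wasGeneratedBy/used edges."""
    generated_by, used = {}, {}
    for s, p, o in graph:
        if p == PROV + "wasGeneratedBy":
            generated_by.setdefault(s, []).append(o)
        elif p == PROV + "used":
            used.setdefault(s, []).append(o)

    seen, order = set(), []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # An entity leads to the activity that produced it;
        # an activity leads to the entities it consumed.
        queue.extend(generated_by.get(node, []))
        queue.extend(used.get(node, []))
    return order


if __name__ == "__main__":
    for node in lineage(EX + "result1", triples):
        print(node.replace(EX, ""))
```

Run on the toy graph, the walk visits result1, analysis, image1, imaging, and sample1 in that order, i.e. the full path from the result back to the physical sample.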
Keywords: Jupyter Notebooks; Ontology; Provenance; Reproducibility; Research data management platform; Scientific experiments; Semantic Web; Visualization
Year: 2022 PMID: 35494870 PMCID: PMC9044346 DOI: 10.7717/peerj-cs.921
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. The architecture of CAESAR.
The data management platform consists of modules for provenance capture, representation, storage, comparison, and visualization. It also provides additional services, including API access and a SPARQL endpoint.
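Figure 1 lists a SPARQL endpoint among CAESAR's services. As a hedged sketch, assuming the endpoint follows the standard SPARQL 1.1 Protocol "query via GET" form, a client could build a request as below. The endpoint URL is hypothetical (the source does not give one), and the query uses only core PROV-O terms rather than REPRODUCE-ME's vocabulary. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; CAESAR's real endpoint path is not stated in the source.
ENDPOINT = "http://localhost:8080/sparql"

# List each activity together with the entities it used (core PROV-O terms only).
QUERY = """PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?input WHERE {
  ?activity a prov:Activity ;
            prov:used ?input .
}"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # The query text travels URL-encoded in the `query` parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, QUERY)
    print(req.get_method(), req.full_url)
    # Against a live endpoint, urllib.request.urlopen(req) would execute it.
```

Keeping construction separate from sending makes the request easy to inspect or reuse with another HTTP client.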
Figure 2. The difference between the inputs and outputs of two different executions of a code cell in ProvBook.
Deleted elements are marked in red; newly added or created elements are marked in green.
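The comparison in Figure 2 can be approximated with a line-based diff. The sketch below uses Python's standard `difflib.ndiff` as a stand-in; it is not ProvBook's implementation. The "red"/"green" labels are hypothetical placeholders for ProvBook's highlighting, and the same function can be applied to a cell's source or to its output text.

```python
import difflib

# Hypothetical labels standing in for ProvBook's red/green highlighting.
MARKS = {"- ": "deleted (red)", "+ ": "added (green)"}


def diff_executions(old, new):
    """Return (label, line) pairs for lines removed from or added to `old`."""
    changes = []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        tag = line[:2]
        # ndiff prefixes: "- " removed, "+ " added, "  " unchanged, "? " hint lines.
        if tag in MARKS:
            changes.append((MARKS[tag], line[2:]))
    return changes


if __name__ == "__main__":
    run1 = "x = 1\nprint(x)"
    run2 = "x = 2\nprint(x)"
    for label, text in diff_executions(run1, run2):
        print(label, text)
```

For the two runs above, only the changed assignment is reported: `x = 1` as deleted and `x = 2` as added, while the unchanged `print(x)` line is skipped.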
Figure 3. ProvTrack: tracking provenance of scientific experiments.
Figure 4. Part of the results for the competency question.