Thomas Pasquier, Matthew K Lau, Ana Trisovic, Emery R Boose, Ben Couturier, Mercè Crosas, Aaron M Ellison, Valerie Gibson, Chris R Jones, Margo Seltzer.
Abstract
In the last few decades, data-driven methods have come to dominate many fields of scientific inquiry. Open data and open-source software have enabled the rapid implementation of novel methods to manage and analyze the growing flood of data. However, it has become apparent that many scientific fields exhibit distressingly low rates of reproducibility. Although there are many dimensions to this issue, we believe that there is a lack of formalism used when describing end-to-end published results, from the data source to the analysis to the final published results. Even when authors do their best to make their research and data accessible, this lack of formalism reduces the clarity and efficiency of reporting, which contributes to issues of reproducibility. Data provenance aids reproducibility through systematic and formal records of the relationships among data sources, processes, datasets, publications and researchers.
Year: 2017 PMID: 28872630 PMCID: PMC5584398 DOI: 10.1038/sdata.2017.114
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. A simple W3C PROV-DM compliant provenance graph.
In this example, two processes, Process 1 and Process 2, use the data from the inputs File 1 and File 2, respectively. The processes are associated with the users Alice and Bob, respectively. Process 1 informed (transferred information to) Process 2, which generated the output File 3.
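The graph described in this caption can be sketched as a small list of edges. The following is a minimal illustration, not the authors' tooling: it uses plain Python rather than a PROV library, and the names `EDGES` and `upstream` are hypothetical. The relation labels (`used`, `wasAssociatedWith`, `wasInformedBy`, `wasGeneratedBy`) follow the PROV-DM vocabulary, with each edge pointing from the later element back to what it depended on.

```python
from collections import defaultdict

# Each edge is (subject, relation, object). As in PROV-DM, edges point
# "backwards in time": from an element to the thing it depended on.
EDGES = [
    ("Process 1", "used", "File 1"),
    ("Process 2", "used", "File 2"),
    ("Process 1", "wasAssociatedWith", "Alice"),
    ("Process 2", "wasAssociatedWith", "Bob"),
    ("Process 2", "wasInformedBy", "Process 1"),
    ("File 3", "wasGeneratedBy", "Process 2"),
]


def upstream(node, edges=EDGES):
    """Return every element that `node` depends on, directly or indirectly."""
    graph = defaultdict(list)
    for subject, _relation, obj in edges:
        graph[subject].append(obj)

    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for dependency in graph[current]:
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    return seen


if __name__ == "__main__":
    # Full lineage of the output file: both processes, both inputs, both users.
    print(sorted(upstream("File 3")))
```

Walking the edges from File 3 recovers its full lineage, which is the kind of query a formal provenance record is meant to make possible.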
Figure 2. Research teams across the sciences are integrating data provenance methods into their research practices in response to increases in computational demands.
On the left: (Photo Credit: A. Trisovic) The Compact Muon Solenoid (CMS) experiment at CERN during the technical stop in February 2017. On the right: (Photo Credit: M.K. Lau) One of several research towers used for ecological data collection at Harvard Forest. In addition to providing infrastructure for researchers to view the forest at multiple levels in the forest canopy, many instruments for automated observations, such as wind speed, CO2 flux, and leaf phenology, are placed on these towers. Data are relayed to a controlling computer via a wireless network.