Nicholas J Tierney, Karthik Ram.
Abstract
Numerous arguments strongly support the practice of open science, which offers several societal and individual benefits. For individual researchers, sharing research artifacts such as data can increase trust and transparency, improve the reproducibility of one's own work, and catalyze new collaborations. Despite a general appreciation of the benefits of data sharing, research data are often only available to the original investigators. For data that are shared, a lack of useful metadata and documentation makes them challenging to reuse. In this paper, we argue that a lack of incentives and infrastructure for making data useful is the biggest barrier to creating a culture of widespread data sharing. We compare data with code, examine computational environments in the context of their ability to facilitate the reproducibility of research, provide some practical guidance on how researchers can improve the chances of their data being reusable, and partially bridge the incentive gap. While previous papers have focused on describing ideal best practices for data and code, we focus on common-sense ideas for sharing tabular data for a target audience of academics working in data science-adjacent fields who are about to submit for publication.
Keywords: DSML 4: Production: Data science output is validated, understood, and regularly used for multiple domains/platforms
Year: 2021 PMID: 34950899 PMCID: PMC8672137 DOI: 10.1016/j.patter.2021.100368
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
Figure 1. The mechanisms for behavior change, the incentives, and our assessment of where the elements of data, code, and computational environment rank in terms of completing these aspects. We note that data are often required, but the preceding steps are not, in contrast to code, which has no policy.