Ellen K Bledsoe, Joseph B Burant, Gracielle T Higino, Dominique G Roche, Sandra A Binning, Kerri Finlay, Jason Pither, Laura S Pollock, Jennifer M Sunday, Diane S Srivastava.
Abstract
Historical and long-term environmental datasets are imperative to understanding how natural systems respond to our changing world. Although immensely valuable, these data are at risk of being lost unless actively curated and archived in data repositories. The practice of data rescue, which we define as identifying, preserving, and sharing valuable data and associated metadata at risk of loss, is an important means of ensuring the long-term viability and accessibility of such datasets. Improvements in policies and best practices around data management will hopefully limit future need for data rescue; these changes, however, do not apply retroactively. While rescuing data is not new, the term lacks formal definition, is often conflated with other terms (i.e. data reuse), and lacks general recommendations. Here, we outline seven key guidelines for effective rescue of historically collected and unmanaged datasets. We discuss prioritization of datasets to rescue, forming effective data rescue teams, preparing the data and associated metadata, and archiving and sharing the rescued materials. In an era of rapid environmental change, the best policy solutions will require evidence from both contemporary and historical sources. It is, therefore, imperative that we identify and preserve valuable, at-risk environmental data before they are lost to science.
Keywords: data archiving; historical data; long-term ecological data; open science; reproducibility; transparency
Year: 2022 PMID: 35855607 PMCID: PMC9297007 DOI: 10.1098/rspb.2022.0938
Source DB: PubMed Journal: Proc Biol Sci ISSN: 0962-8452 Impact factor: 5.530
Box 2.1. Photograph of loose data sheets, maps, reports and picture slides; these items and many more filled the boxes of research material left behind by Dr La Roi. Image credit: A. Hesketh.
Box 2.2. Example of non-standard data to be rationalized and digitized, representing the significance of correlations between habitat features. These symbols were converted to numeric factors during digitization. Reproduced with modification from Lancaster [30, see Appendix 4, pp. 103–104 therein].
Figure 1. Prioritizing data for rescue: balancing the value of the data and its risk of loss. With many datasets in need of preservation and limited resources, the first step in the data rescue process requires developing a list of priorities for consideration and identifying relevant datasets (figure 2). We consider data prioritization to be a balance between the assessed value of a dataset in question and the potential risk of its loss in the absence of intervention (see Data prioritization under Guidelines). Alt-text is available in the electronic supplementary material. (Online version in colour.)
Figure 2. Steps in the data rescue assembly line. First, data must be prioritized for rescue (Step 1). After team creation (Step 2) and metadata creation (Step 3), the data must be transferred and compiled into a logical format (Step 4). After data cleaning and validation (Step 5) is complete, the finalized data and metadata should be archived on a long-term data repository (Step 6). The ultimate goal is to have the rescued data openly available for re-use (Step 7). Alt-text is available in the electronic supplementary material.