Literature DB >> 30395209

A parasite's perspective on data sharing.

YoSon Park1, Casey S Greene1,2.   

Abstract

Data generation is expensive in terms of both time and money. Sharing data enables the rapid replication, validation, and application of discoveries, increasing the pace and accuracy of research. As research parasites, or users of other people's data, we recognize that a strong science ecosystem requires that those who share best be recognized. We find that widely accessible benchmark datasets have provided outsized benefits, and we hope that the benefits of sharing will also begin to accrue to individual investigators who share well. Funders can enhance progress by adjusting incentives to better support data sharers, which will make their programmatic investments more effective. We note some funders who are making such efforts.

Entities:  

Mesh:

Year:  2018        PMID: 30395209      PMCID: PMC6258825          DOI: 10.1093/gigascience/giy129

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


The primary goal of scientific publishing is to disseminate findings to the community and disclose supporting observations that allow the community to infer their validity and robustness. Widely used forms of data release, such as making data available only upon reasonable requests, can delay sharing indefinitely. Though these practices reduce the scientists’ ability to critically evaluate or build upon previous work, some journals still allow these practices. Some editors have questioned whether data sharing may bring harm to certain fields, going so far as to call someone who has re-analyzed shared data without providing co-authorships to sharers “research parasites” [1]. This has given rise to “Research Parasite Awards” focused on data reuse [2] and “Research Symbiont Awards” focused on data sharing [3] that seek to recognize exemplar participants in this scientific ecosystem. As research parasites, we provide our perspective on some of the best examples of data sharing. We note how these practices provide an efficient path to accurate, impactful findings. We also discuss how funders can encourage innovation and improvements in science by preferring to support researchers who are sharing. During the outbreak of the deadly Zika virus (ZIKV) in South and Central America in 2013 and 2014, the virus spread so rapidly that researchers studying it felt the need to share their data before completing the publication process. Acknowledging this, several researchers announced a data release, including sequence data of candidate strains, to the World Health Organization to enable other researchers to develop vaccines [4]. The Zika experimental-science team, who were studying rhesus macaques infected with Zika virus, made their data public rather than publishing them in a journal and continued to update their results daily for others to benefit from their progress [5]. These scientists committed to sharing information rapidly during public health emergencies, but there were no guidelines for formal recognition and rewards for such efforts at the time. There are also sharing-first projects outside of infectious disease research. As part of a £62 million initiative, the UK Biobank recruited 500,000 individuals from the United Kingdom aged between 40–69 years in 2006–2010 and publicly released all generated data for bona fide research use [6]. The consortium behind this effort collected and continues to collect extensive phenotypic data including various physical measurements, accelerometry, questionnaires, biochemical assays, and genotyping data from all of its participants. The conventional model for novel, multisite data collection is to restrict access to only a small set of participants who produce publications and other accomplishments linked to the resource. However, the UK Biobank is transforming the way public genetic and health data should be shared. This consortium, while providing strategically de-identified electronic health records, genotypes, and other research data to benefit public health, explicitly stated that no resource users will be expected to add the UK Biobank as co-authors on their publications or share income generated from research using these resources. Certain data resources undergo continuous updates. In an example from ecology, researchers are actively building platforms to embrace evolving or dynamic data [7]. Yenni and colleagues describe an automated workflow where researchers can automatically integrate new data with versioning and archiving and update associated manuscripts and expand relevant supplementary tables. This framework not only allows faster integration and automated analysis of continuously updated data, it also ensures systematic review of the workflow and faster dissemination of results and provides transparency for others to validate the study and its findings. Figure 1 depicts citations for publicly shared data between 2008 and the present. For the past decade, we estimated that approximately 6,516 publications benefited from these public resources.
Figure 1:

Citations for publicly shared data between 2008 and the present. For four main public data repositories, the Cancer Genome Atlas (TCGA), Genotype-Tissue Expression Project (GTEx), Encyclopedia of DNA Elements (ENCODE), and UK Biobank, we estimated the number of publications citing each dataset on PubMed [8]. For ENCODE and ENCODE_Community publications, the ENCODE official website [9] was used in place of PubMed for searches up to its finalization in 2016. For the past decade, we estimated that approximately 6,516 publications benefited from these public resources.

Citations for publicly shared data between 2008 and the present. For four main public data repositories, the Cancer Genome Atlas (TCGA), Genotype-Tissue Expression Project (GTEx), Encyclopedia of DNA Elements (ENCODE), and UK Biobank, we estimated the number of publications citing each dataset on PubMed [8]. For ENCODE and ENCODE_Community publications, the ENCODE official website [9] was used in place of PubMed for searches up to its finalization in 2016. For the past decade, we estimated that approximately 6,516 publications benefited from these public resources. If data sharing plays such a key role in more rapid and accurate scientific progress, how can this behavior be better encouraged? Funders have a unique role in the scientific ecosystem. Those working at universities and other research institutions often face incentive structures that are challenging for those who seek to make change, such as promotion and tenure committees that are not accustomed to crediting data sharing. Alberts et al. [10], in an article describing flaws in the biomedical research ecosystem, note that at many institutions, a faculty salary is almost entirely covered by research grants. This increases universities’ reliance on funder dollars and provides funders with an expanded role in driving evaluation metrics throughout the ecosystem. Funders benefit when scientists find errors earlier and make discoveries sooner, and some are strengthening their resource-sharing expectations. The Chan-Zuckerberg Initiative expects grantees to share protocols and preprints for projects that they fund. Alex's Lemonade Stand Foundation asks applicants to describe their past sharing behavior alongside their plans for sharing resources that result from the proposal, and both are assessed for impact. Since 2017, the Bill & Melinda Gates Foundation has required peer-reviewed publications and associated data to be discoverable and accessible openly and has offered to pay for open access as needed. These are important changes because researchers who share face challenges due to the scientific norms around credit. These challenges include the costs associated with open access publishing and high-quality data sharing as well as the risks that errors could be found in their work or that others could extend their work, causing them to lose future accomplishments and credit. By providing incentives for sharing, funders help to mitigate these perceived risks and thus improve the value of their investment. Funders can make their investments go further by supporting a culture among investigators that values behaviors that accelerate scientific progress.

Note from the Editors

The 2019 Research Parasite Awards will be awarded at the 2019 Pacific Symposium on Biocomputing in early January, and GigaScience, for the third year running, is proud to support the Junior Parasite prize with a travel grant.

Competing interests

The authors declare that they have no competing interests. Click here for additional data file.
  5 in total

1.  Data Sharing.

Authors:  Dan L Longo; Jeffrey M Drazen
Journal:  N Engl J Med       Date:  2016-01-21       Impact factor: 91.245

2.  Data-Sharing Models

Authors:  J Brian Byrd; Casey S Greene
Journal:  N Engl J Med       Date:  2017-06-08       Impact factor: 91.245

3.  Rescuing US biomedical research from its systemic flaws.

Authors:  Bruce Alberts; Marc W Kirschner; Shirley Tilghman; Harold Varmus
Journal:  Proc Natl Acad Sci U S A       Date:  2014-04-14       Impact factor: 11.205

4.  Celebrating parasites.

Authors:  Casey S Greene; Lana X Garmire; Jack A Gilbert; Marylyn D Ritchie; Lawrence E Hunter
Journal:  Nat Genet       Date:  2017-03-30       Impact factor: 38.330

5.  UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.

Authors:  Cathie Sudlow; John Gallacher; Naomi Allen; Valerie Beral; Paul Burton; John Danesh; Paul Downey; Paul Elliott; Jane Green; Martin Landray; Bette Liu; Paul Matthews; Giok Ong; Jill Pell; Alan Silman; Alan Young; Tim Sprosen; Tim Peakman; Rory Collins
Journal:  PLoS Med       Date:  2015-03-31       Impact factor: 11.069

  5 in total
  2 in total

1.  A field guide to cultivating computational biology.

Authors:  Gregory P Way; Casey S Greene; Piero Carninci; Benilton S Carvalho; Michiel de Hoon; Stacey D Finley; Sara J C Gosline; Kim-Anh Lȇ Cao; Jerry S H Lee; Luigi Marchionni; Nicolas Robine; Suzanne S Sindi; Fabian J Theis; Jean Y H Yang; Anne E Carpenter; Elana J Fertig
Journal:  PLoS Biol       Date:  2021-10-07       Impact factor: 8.029

Review 2.  Bringing machine learning to research on intellectual and developmental disabilities: taking inspiration from neurological diseases.

Authors:  Chirag Gupta; Pramod Chandrashekar; Ting Jin; Chenfeng He; Saniya Khullar; Qiang Chang; Daifeng Wang
Journal:  J Neurodev Disord       Date:  2022-05-02       Impact factor: 4.074

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.