Data archiving and availability in an era of open science.

Edward N Baker

Abstract

The importance of preserving and making available the original experimental data underlying biological structural models is discussed, both for crystallography, where the raw data images pose particular challenges, and for other structure determination techniques.

Keywords:  data archiving; open science; raw diffraction data

Year:  2017        PMID: 28250935      PMCID: PMC5331459          DOI: 10.1107/S2052252516020340

Source DB:  PubMed          Journal:  IUCrJ        ISSN: 2052-2525            Impact factor:   4.769


The current moves to increasing openness in science have both philosophical and scientific rationales, and carry great potential benefits for science. The belief that the results of publicly funded research should be freely available to all is only part of this. Science itself is a form of international cultural heritage, and can best develop if the ideas it brings are spread as widely as possible. Open-access publishing, as exemplified by this journal, is one means by which this can be done. Of equal importance, however, is the need to preserve the underlying experimental data, preferably in a manner that makes them available to others. This enables results to be validated, re-evaluated or extended, increasing the value of the original work and opening possibilities for new directions.

Crystallography has an inbuilt advantage here in that it is data-rich and the data are readily stored in electronic form. (This does not apply to the original biological or chemical samples, unfortunately, or to crystals, but that is another story.) We have also been extraordinarily fortunate, since the earliest days of structural biology, in having scientists within our discipline with the vision to see the importance of archiving the structural and diffraction data, to preserve them, organize and annotate them and make them freely available (Berman et al., 2016). The Protein Data Bank (PDB) and its successor, the worldwide PDB (wwPDB), which is curated by its United States, European and Japanese partners (Berman et al., 2003), is a wonderful resource today, well managed and forward looking.

Today, more than 120 000 macromolecular structures determined by crystallography are archived in the wwPDB and are joined by some 11 000 determined by NMR and 1100 by cryo-EM. The latter bring different kinds of data to be archived, and require different forms of validation, which are currently being worked through by expert taskforces for implementation within the wwPDB. Structures determined by cryo-EM, in particular, tend to be very large and complex, and with the development of a new generation of detectors are growing explosively in number (Kühlbrandt, 2014; Subramaniam et al., 2016). A highlight for me as a card-carrying crystallographer, at the recent conference of the Asian Crystallographic Association (AsCA) in Hanoi, was to hear a beautiful account by Wah Chiu (Baylor College of Medicine, USA) of the ways in which cryo-EM map and model quality can now be assessed.

But science does not stand still, and these three principal structure determination methods are increasingly being complemented by data from other sources (Sali et al., 2015), such as small-angle X-ray scattering (SAXS) and other solution scattering approaches. These help to expand the reach of structural biology into more complex systems, and it is important that these data, too, should be preserved. For these and other complementary methods there are difficult questions to be resolved. What are the key data that should be archived, and what metadata need to be captured with the experimental data if they are to be useful to other researchers? A forthcoming article by Kroon-Batenburg et al. in this journal (Kroon-Batenburg et al., 2017) highlights some of these issues.

With the vastly expanded capacity of modern electronic media it is timely to ask whether raw crystallographic data files (the real primary data) could or should be archived in repositories where they can be accessed by other researchers. The advantages are many. With improved processing methods, better structures, at higher resolution, may be obtained. Other crystal phenomena such as diffuse scattering (often ignored in the pursuit of atomic structural models) could give new information on dynamics. ‘Pathological’ data sets that the original researchers had given up on could be reprocessed and might possibly bring valuable new structural information; we might bring to life a few of the skeletons that adorn our closets!

The article by Kroon-Batenburg et al. is the latest update from a Working Group set up by the IUCr in 2011 to consider the practicalities of raw diffraction data deposition, and follows earlier papers on the topic, published in Acta Crystallographica Section D in 2014 (Terwilliger, 2014). It considers the present options for archiving raw data, and focuses particularly on the need for appropriate metadata to accompany the primary data if these data are to be truly useful into the future. I am sure I am not alone in having in my office old nine-track magnetic tapes, DAT tapes and other media containing raw data sets from the past, none of them readable now as technologies become outdated. Science will be the poorer if our primary experimental data are lost, as some of these now are, and in the spirit of open science I consider these to be challenges that really must be addressed.

References

1.  Announcing the worldwide Protein Data Bank.

Authors:  Helen Berman; Kim Henrick; Haruki Nakamura
Journal:  Nat Struct Biol       Date:  2003-12

2.  Biochemistry. The resolution revolution.

Authors:  Werner Kühlbrandt
Journal:  Science       Date:  2014-03-28       Impact factor: 47.728

3.  Archiving raw crystallographic data.

Authors:  Thomas C Terwilliger
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2014-09-30

4.  Outcome of the First wwPDB Hybrid/Integrative Methods Task Force Workshop.

Authors:  Andrej Sali; Helen M Berman; Torsten Schwede; Jill Trewhella; Gerard Kleywegt; Stephen K Burley; John Markley; Haruki Nakamura; Paul Adams; Alexandre M J J Bonvin; Wah Chiu; Matteo Dal Peraro; Frank Di Maio; Thomas E Ferrin; Kay Grünewald; Aleksandras Gutmanas; Richard Henderson; Gerhard Hummer; Kenji Iwasaki; Graham Johnson; Catherine L Lawson; Jens Meiler; Marc A Marti-Renom; Gaetano T Montelione; Michael Nilges; Ruth Nussinov; Ardan Patwardhan; Juri Rappsilber; Randy J Read; Helen Saibil; Gunnar F Schröder; Charles D Schwieters; Claus A M Seidel; Dmitri Svergun; Maya Topf; Eldon L Ulrich; Sameer Velankar; John D Westbrook
Journal:  Structure       Date:  2015-06-18       Impact factor: 5.006

5.  The archiving and dissemination of biological structure data.

Authors:  Helen M Berman; Stephen K Burley; Gerard J Kleywegt; John L Markley; Haruki Nakamura; Sameer Velankar
Journal:  Curr Opin Struct Biol       Date:  2016-07-21       Impact factor: 6.809

6.  CryoEM at IUCrJ: a new era.

Authors:  Sriram Subramaniam; Werner Kühlbrandt; Richard Henderson
Journal:  IUCrJ       Date:  2016-01-01       Impact factor: 4.769
