
Findable Accessible Interoperable Re-usable (FAIR) diffraction data are coming to protein crystallography.

John R Helliwell, Wladek Minor, Manfred S Weiss, Elspeth F Garman, Randy J Read, Janet Newman, Mark J van Raaij, Janos Hajdu, Edward N Baker.

Keywords:  FAIR; IUCr policy; diffraction data

Year:  2019        PMID: 31063147      PMCID: PMC6503765          DOI: 10.1107/S2059798319004844

Source DB:  PubMed          Journal:  Acta Crystallogr D Struct Biol        ISSN: 2059-7983            Impact factor:   7.652


The unprecedented progress of modern science is driven, to a large extent, by the fast propagation of information. Descriptions of experiments and results, and their interpretation, are no longer disseminated solely in peer-reviewed scientific publications, but are frequently distributed through non-reviewed channels such as preprints, data repository entries and databases. Thanks to ever faster computers and internet connections, many experimental results are now available instantaneously at the click of a mouse, irrespective of the location of the source or the consumer. In many instances, experiments performed and interpreted by one scientific group stimulate enough interest among other scientists to spur research in further laboratories. Not infrequently, the results of these follow-up experiments disagree with the previously obtained results and/or interpretations (Baker, 2016), notably in psychology and the clinical sciences. In some cases, the original results cannot even be reproduced well enough to allow follow-up experiments to commence (Prinz et al., 2011). Repeating an entire experiment performed by others is usually not feasible because of the significant time, effort and funds it would require (Baker, 2015). So the question is: what should be done in this new era? How can new technical developments best be exploited to further science and scientific output?
The structural biology community has always been at the forefront of sharing processed, i.e. analysed, results. Since its creation in 1971, the Protein Data Bank (PDB; Berman et al., 2000) has become an indispensable daily resource for hundreds of thousands of scientists. Initially, the PDB curated only the molecular structure coordinate files, but since 2008 the deposition of the processed diffraction data, i.e. intensities or structure-factor amplitudes, has been mandatory for each derived coordinate set.
At present, all serious scientific journals require the deposition of the structure coordinates and the associated diffraction data, as well as the submission of a PDB validation report with the manuscript for review. Also notable is a recent initiative by Science: the introduction of a Statistical Board of Reviewing Editors (McNutt, 2014a,b). This initiative is similar to the practice of some referees who insist on access to the underpinning crystallographic data (Helliwell, 2018). Certainly, the PDB is an indispensable resource not only for structural biology but for all of modern biological, biomedical and biochemical science (Burley et al., 2019). However, even though diffraction data are part of every macromolecular crystallographic deposition in the PDB, and even assuming 'perfect' data reduction and processing of the original diffraction images, some experimental information, e.g. diffuse scattering, is irrevocably lost. Moreover, our experience shows that the processing of diffraction images is quite often far from perfect: for example, the data could be processed to higher resolution as software improves, data are sometimes processed in an incorrect space group, the correction for radiation decay may not be optimal, corrupted images may be included in processing, and instrument malfunctions may go unidentified (Zimmerman et al., 2014). Recovery from such errors is very difficult, sometimes even impossible, and suboptimal or even incorrect macromolecular structures are often the result (Weiss et al., 2016). This can adversely affect subsequent research that uses the structure, for example for data mining, for drug discovery or as a training set for artificial intelligence (AI) programs. Overreliance on incorrectly processed data in the original publication may mislead or even ruin subsequent research efforts.
Not too long ago, establishing a repository of macromolecular crystallography diffraction image data sets was perceived to be a 'mission impossible' task, mainly because of the prohibitive cost of storage, but also because of the apparent difficulties in organizing such a repository and validating the metadata describing the experiment (Baker, 2017). However, in the past few years two initiatives have made large-scale repositories dedicated to diffraction experiments available: the Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC, currently with over 3800 experiments and 7000 data sets; Grabowski et al., 2016) and the Structural Biology Grid Consortium (SBGrid, currently with 400 diffraction experiments and 500 data sets; Meyer et al., 2016). These are complemented by several repositories that are smaller in terms of the number of publicly available data sets, such as the Australian Store.Synchrotron facility (https://store.synchrotron.org.au/) and the data repository for X-ray lasers (CXIDB, https://www.cxidb.org), which hosts terabyte-range data sets. Universities have also started providing data archives for their researchers, such as the repository at the University of Manchester (http://www.itservices.manchester.ac.uk/ourservices/catalogue/research/servers/archive/). Diffraction image data sets are also deposited in general research data repositories such as Zenodo (https://zenodo.org/). Data sets stored in all of these repositories are assigned digital object identifiers (dois), which are widely agreed to be a primary requirement. In 2011, the IUCr established the Diffraction Data Deposition Working Group (DDDWG) in order to 'address the growing calls within the crystallographic community for the deposition of diffraction data images, with some mechanism that allows their retrieval by other scientists for such purposes as re-analysis, software and methods development, validation and review'.
In 2017, the DDDWG published its final report (https://www.iucr.org/resources/data/dddwg/final-report), which sets out detailed recommendations together with a summary of several community-based workshops and the publications arising from them. The top two recommendations were as follows:
(i) Authors should provide a permanent and prominent link from their article to the raw data sets that underpin their journal publication and the associated database deposition of processed diffraction data (e.g. structure-factor amplitudes and intensities) and coordinates; these raw diffraction data sets should obey the 'FAIR' principles, i.e. they should be Findable, Accessible, Interoperable and Re-usable (https://www.force11.org/group/fairgroup/fairprinciples).
(ii) A registered Digital Object Identifier (doi), rather than a Uniform Resource Locator (url), should be the persistent identifier of choice, as the most sustainable way to identify and locate a raw diffraction data set.
In 2018, the IUCr Commission on Biological Macromolecules (CBM) and the IUCr Committee on Data submitted a memorandum to the IUCr Executive Committee proposing a mechanism for making diffraction experiments publicly available. The goal of ensuring better reproducibility of scientific discoveries in structural biology would be achieved, in part, by:
(1) Allowing the scientific community to identify and re-use the original diffraction image data from a diffraction experiment, which are the primary source of information used to determine a particular macromolecular structure.
(2) Facilitating structure re-determination using those original diffraction image data.
(3) Providing researchers with a straightforward mechanism for assessing the correctness of the structure determination process.
(4) Providing a mechanism to ensure that the structures in the PDB, and the publications derived from them, are of the highest possible quality.
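The DDDWG recommendation that a doi, rather than a url, should identify a raw diffraction data set works because a doi is resolved through the central https://doi.org/ resolver, so the link survives if the hosting repository moves. As a minimal illustration only (not IUCr or repository tooling), the Python sketch below normalizes a doi as an author might paste it and builds its resolver link. The syntax check is a simplified assumption (a "10." prefix, a numeric registrant code, a slash and a non-empty suffix), not the full DOI specification, and the function names `normalize_doi` and `resolver_url` are hypothetical.

```python
import re
from urllib.parse import quote

# Simplified (assumed) DOI syntax: "10." + numeric registrant code, optionally
# subdivided by dots, then "/" and a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d+(?:\.\d+)*/\S+$")

# Common ways a DOI is written in manuscripts; compared case-insensitively.
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Strip a known prefix and return the bare DOI, or raise ValueError."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a syntactically valid DOI: {raw!r}")
    return doi


def resolver_url(raw: str) -> str:
    """Build the https://doi.org/ resolver link, percent-encoding unsafe characters."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


# Example using the doi of this editorial.
print(resolver_url("doi:10.1107/S2059798319004844"))
# → https://doi.org/10.1107/S2059798319004844
```

A plain url such as https://example.org/data/123 is rejected by `normalize_doi`, which mirrors the preference for a registered persistent identifier over a location that may change.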
IUCr Journals are now taking the lead by encouraging authors to provide a doi for their deposited original raw diffraction data when they submit an article describing a new structure or a new method tested on unpublished diffraction data. In the case of methods developed or tested with raw diffraction data, these data must be available to referees, and deposition of such data will eventually become compulsory. Permanent and prominent links will be provided from articles to the underpinning experimental data of each published research study. We believe that these actions will keep crystallography at the forefront of the effort to enhance the transparency and reproducibility of scientific results. In addition to the references cited above, readers interested in the hows, whys and whats of diffraction data archiving are referred to the recent in-depth texts by Guss & McMahon (2014), Kroon-Batenburg & Helliwell (2014), Kroon-Batenburg et al. (2017), Helliwell et al. (2017), Terwilliger (2014) and Terwilliger & Bricogne (2014).
References

1.  The Protein Data Bank and the challenge of structural genomics.

Authors:  H M Berman; T N Bhat; P E Bourne; Z Feng; G Gilliland; H Weissig; J Westbrook
Journal:  Nat Struct Biol       Date:  2000-11

2.  Believe it or not: how much can we rely on published data on potential drug targets?

Authors:  Florian Prinz; Thomas Schlange; Khusru Asadullah
Journal:  Nat Rev Drug Discov       Date:  2011-08-31

3.  Journals unite for reproducibility.

Authors:  Marcia McNutt
Journal:  Science       Date:  2014-11-07

4.  Raising the bar.

Authors:  Marcia McNutt
Journal:  Science       Date:  2014-07-04

5.  Archiving raw crystallographic data.

Authors:  Thomas C Terwilliger
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2014-09-30

6.  Data management in the modern structural biology and biomedical research environment.

Authors:  Matthew D Zimmerman; Marek Grabowski; Marcin J Domagalski; Elizabeth M Maclean; Maksymilian Chruszcz; Wladek Minor
Journal:  Methods Mol Biol       Date:  2014

7.  Experiences with making diffraction image data available: what metadata do we need to archive?

Authors:  Loes M J Kroon-Batenburg; John R Helliwell
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2014-09-30

8.  How to make deposition of images a reality.

Authors:  J Mitchell Guss; Brian McMahon
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2014-09-30

9.  Continuous mutual improvement of macromolecular structure models in the PDB and of X-ray crystallographic software: the dual role of deposited experimental data.

Authors:  Thomas C Terwilliger; Gerard Bricogne
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2014-09-30

10.  Data publication with the structural biology data grid supports live analysis.

Authors:  Peter A Meyer; Stephanie Socias; Jason Key; Elizabeth Ransey; Emily C Tjon; Alejandro Buschiazzo; Ming Lei; Chris Botka; James Withrow; David Neau; Kanagalaghatta Rajashankar; Karen S Anderson; Richard H Baxter; Stephen C Blacklow; Titus J Boggon; Alexandre M J J Bonvin; Dominika Borek; Tom J Brett; Amedeo Caflisch; Chung-I Chang; Walter J Chazin; Kevin D Corbett; Michael S Cosgrove; Sean Crosson; Sirano Dhe-Paganon; Enrico Di Cera; Catherine L Drennan; Michael J Eck; Brandt F Eichman; Qing R Fan; Adrian R Ferré-D'Amaré; J Christopher Fromme; K Christopher Garcia; Rachelle Gaudet; Peng Gong; Stephen C Harrison; Ekaterina E Heldwein; Zongchao Jia; Robert J Keenan; Andrew C Kruse; Marc Kvansakul; Jason S McLellan; Yorgo Modis; Yunsun Nam; Zbyszek Otwinowski; Emil F Pai; Pedro José Barbosa Pereira; Carlo Petosa; C S Raman; Tom A Rapoport; Antonina Roll-Mecak; Michael K Rosen; Gabby Rudenko; Joseph Schlessinger; Thomas U Schwartz; Yousif Shamoo; Holger Sondermann; Yizhi J Tao; Niraj H Tolia; Oleg V Tsodikov; Kenneth D Westover; Hao Wu; Ian Foster; James S Fraser; Filipe R N C Maia; Tamir Gonen; Tom Kirchhausen; Kay Diederichs; Mercè Crosas; Piotr Sliz
Journal:  Nat Commun       Date:  2016-03-07
