Marek Grabowski, Marcin Cymborowski, Przemyslaw J Porebski, Tomasz Osinski, Ivan G Shabalin, David R Cooper, Wladek Minor.
Abstract
It has been increasingly recognized that preservation and public accessibility of primary experimental data are cornerstones necessary for the reproducibility of empirical sciences. In the field of molecular crystallography, many journals now recommend that authors of manuscripts presenting a new crystal structure should deposit their primary experimental data (X-ray diffraction images) to one of the dedicated resources created in recent years. Here, we describe our experiences developing the Integrated Resource for Reproducibility in Molecular Crystallography (IRRMC) and describe several examples of a crucial role that diffraction data can play in improving previously determined protein structures. In its first four years, several hundred crystallographers have deposited data from over 5200 diffraction experiments performed at over 60 different synchrotron beamlines or home sources all over the world. In addition to improving the resource and curating submitted data, we have been building a pipeline for extraction or, in some cases, reconstruction of the metadata necessary for seamless automated processing. Preliminary analysis indicates that about 95% of the archived data can be automatically reprocessed. A high rate of reprocessing success shows the feasibility of using the automated metadata extraction and automated processing as a validation step for the deposition of raw diffraction images. The IRRMC is guided by the Findable, Accessible, Interoperable, and Reusable data management principles.
Year: 2019 PMID: 31768399 PMCID: PMC6874509 DOI: 10.1063/1.5128672
Source DB: PubMed Journal: Struct Dyn ISSN: 2329-7778 Impact factor: 2.920
FIG. 1. Synchrotron beamlines with the highest number of structures represented in IRRMC.
Summary of automatic reprocessing and structure redetermination.
| | MAD | SAD | MR | Other | Total |
|---|---|---|---|---|---|
| Attempted reprocessing | 1072 | 737 | 1694 | 83 | 3584 |
| Successfully reprocessed | 1037 | 701 | 1592 | 74 | 3404 |
| Same space group and cell as in PDB | 942 | 596 | 1291 | 57 | 2886 |
| Same space group and cell, same or better resolution as in PDB | 503 | 282 | 637 | 25 | 1447 |
| Successful automatic phasing by SAD | 730 | 505 | … | … | 1235 |
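
The reprocessing success rate quoted in the abstract (about 95%) can be checked against the first two rows of the table. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative assumptions, and the counts, including the Total column, are copied from the table as printed rather than recomputed.

```python
# Counts copied from the table rows "Attempted reprocessing" and
# "Successfully reprocessed"; the Total column is used as printed.
attempted = {"MAD": 1072, "SAD": 737, "MR": 1694, "Other": 83, "Total": 3584}
reprocessed = {"MAD": 1037, "SAD": 701, "MR": 1592, "Other": 74, "Total": 3404}


def success_rate(done: int, tried: int) -> float:
    """Return the percentage of attempted data sets that were reprocessed."""
    return 100.0 * done / tried


# Per-method success rate, rounded to one decimal place.
rates = {
    method: round(success_rate(reprocessed[method], attempted[method]), 1)
    for method in attempted
}

for method, rate in rates.items():
    print(f"{method}: {rate}%")
```

Under these counts the overall rate comes to roughly 95%, in line with the figure given in the abstract; the same ratio can be taken for any other row of the table.
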
FIG. 2. Resolution achieved by automated reprocessing of data vs the resolution in the original PDB deposit for structures determined by SAD.
FIG. 3. Resolution achieved by automated reprocessing of data vs the resolution in the original PDB deposit for structures determined by MAD.