Swanand Gore1, Eduardo Sanz García1, Pieter M S Hendrickx1, Aleksandras Gutmanas2, John D Westbrook3, Huanwang Yang3, Zukang Feng3, Kumaran Baskaran4, John M Berrisford1, Brian P Hudson3, Yasuyo Ikegawa5, Naohiro Kobayashi5, Catherine L Lawson3, Steve Mading4, Lora Mak1, Abhik Mukhopadhyay1, Thomas J Oldfield1, Ardan Patwardhan1, Ezra Peisach3, Gaurav Sahni1, Monica R Sekharan3, Sanchayita Sen1, Chenghua Shao3, Oliver S Smart1, Eldon L Ulrich4, Reiko Yamashita5, Martha Quesada3, Jasmine Y Young3, Haruki Nakamura5, John L Markley4, Helen M Berman3, Stephen K Burley6, Sameer Velankar1, Gerard J Kleywegt1.
Abstract
The Worldwide PDB recently launched a deposition, biocuration, and validation tool: OneDep. At various stages of OneDep data processing, validation reports for three-dimensional structures of biological macromolecules are produced. These reports are based on recommendations of expert task forces representing crystallography, nuclear magnetic resonance, and cryoelectron microscopy communities. The reports provide useful metrics with which depositors can evaluate the quality of the experimental data, the structural model, and the fit between them. The validation module is also available as a stand-alone web server and as a programmatically accessible web service. A growing number of journals require the official wwPDB validation reports (produced at biocuration) to accompany manuscripts describing macromolecular structures. Upon public release of the structure, the validation report becomes part of the public PDB archive. Geometric quality scores for proteins in the PDB archive have improved over the past decade.
Keywords: 3D macromolecular structure; PDB; biocuration; data archiving; data deposition; structural biology; structure data quality; validation; wwPDB
Year: 2017 PMID: 29174494 PMCID: PMC5718880 DOI: 10.1016/j.str.2017.10.009
Source DB: PubMed Journal: Structure ISSN: 0969-2126 Impact factor: 5.006
Figure 1. Summary Quality Metrics in the wwPDB Validation Reports
Sliders (top) and residue plots (bottom). (A) Relatively good structure; (B) relatively poor structure. The solid sliders show how a given structure ranks relative to all structures in the PDB. The open sliders show how it compares with structures derived in a similar fashion (X-ray crystallographic structures are compared with other X-ray structures solved at a similar resolution, while NMR and EM structures are ranked relative to other NMR and EM structures in the PDB, respectively). Residue sequence plots flag residues that have unusual geometric features (i.e., bond-length, bond-angle, Ramachandran, RNA suiteness, or other torsion-angle outliers). Residues are color coded as follows: green, no geometric outliers; yellow, one type of outlier; orange, two types of outliers; red, three or more types of outliers; gray, atomic coordinates not available; cyan, atomic coordinates ill-defined by the NMR ensemble. For X-ray crystal structures, a red dot above a residue indicates a poor fit to electron density (RSRZ > 2).
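The color-coding rule for the residue plots can be expressed as a small lookup. The following is a minimal sketch and not the pipeline's own code: the function names, the input representation (a count of distinct outlier types per residue), and the precedence given to gray and cyan over the outlier colors are all assumptions made for illustration.

```python
from typing import Optional


def residue_color(outlier_types: Optional[int], ill_defined: bool = False) -> str:
    """Map one residue to its plot color, following the Figure 1 legend.

    outlier_types: number of distinct geometric outlier types, or None when
                   atomic coordinates are not available (hypothetical encoding).
    ill_defined:   True when the residue is ill-defined by the NMR ensemble.
    """
    if outlier_types is None:
        return "gray"   # no coordinates deposited for this residue
    if outlier_types < 0:
        raise ValueError("outlier_types must be non-negative")
    if ill_defined:
        return "cyan"   # assumed to take precedence over outlier colors
    if outlier_types == 0:
        return "green"
    if outlier_types == 1:
        return "yellow"
    if outlier_types == 2:
        return "orange"
    return "red"        # three or more outlier types


def poor_density_fit(rsrz: float) -> bool:
    """Red-dot flag for X-ray crystal structures: RSRZ > 2."""
    return rsrz > 2.0
```

For example, a residue with both a Ramachandran and a rotamer outlier would be drawn orange, and it would also carry a red dot if its RSRZ exceeded 2.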
Key Validation Metrics Reported in the wwPDB Structure Validation Reports and Used for Percentile Rank Calculation
| Metric | Details | Software Package and References |
|---|---|---|
| Rfree | cross-validation of goodness of fit between the model and the experimental diffraction data not used for refinement. Applicable to crystallographic structures | DCC |
| Clashscore | number of too-close contacts in an entry normalized per 1,000 atoms | MolProbity |
| Ramachandran outliers | fraction of polypeptide residues deemed to have very unusual backbone conformation (<0.5% of those observed in a high-quality reference set) | MolProbity |
| Side-chain outliers | fraction of polypeptide residues in non-rotameric side-chain conformations (<0.5% of those observed in a high-quality reference set) | MolProbity |
| RSRZ outliers | fraction of polypeptide and/or polynucleotide residues that do not fit the electron density well when compared with other instances of the same residues in structures at similar resolution. Applicable to crystallographic structures | EDS |
| RNA backbone | average score over all RNA nucleotides in the entry indicating the quality of the observed RNA backbone conformation | MolProbity |
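The arithmetic behind two of the metrics in the table, and behind a percentile rank, is simple enough to sketch. This is an illustrative sketch only: the function names are hypothetical, and the percentile-rank definition used here (the percentage of reference entries with a strictly worse value) is a simplifying assumption, not the documented wwPDB procedure.

```python
from typing import Sequence


def clashscore(n_close_contacts: int, n_atoms: int) -> float:
    """Too-close contacts normalized per 1,000 atoms."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return 1000.0 * n_close_contacts / n_atoms


def outlier_fraction(n_outliers: int, n_residues: int) -> float:
    """Fraction of residues flagged as outliers (e.g., Ramachandran)."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_outliers / n_residues


def percentile_rank(value: float, reference: Sequence[float]) -> float:
    """Percentage of reference entries that are worse than `value`.

    Assumes a lower-is-better metric, so "worse" means strictly higher.
    """
    if not reference:
        raise ValueError("reference set is empty")
    worse = sum(1 for r in reference if r > value)
    return 100.0 * worse / len(reference)


# 30 too-close contacts in a 6,000-atom entry
score = clashscore(30, 6000)
# rank against a small, made-up reference set of clashscores
rank = percentile_rank(score, [1.0, 5.0, 10.0, 20.0])
```

Passing all PDB entries as the reference set corresponds to the solid slider of Figure 1; passing only entries determined in a similar fashion corresponds to the open slider.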
Modes of Validation Pipeline Invocation
| Mode of Execution | Distinct Features of the Report | Access |
|---|---|---|
| Web-service API | preliminary, as input files may not have final nomenclature, optionally accepts experimental data. Watermarked “Preliminary” | installation instructions at |
| Stand-alone web server | preliminary, optionally accepts experimental data. Watermarked “Preliminary” | |
| Deposition interface | preliminary, contains deposition session identifier, requires experimental data for X-ray crystal, NMR, and EM structures. Watermarked “Preliminary” | |
| Biocuration | complete, as the input files have been updated to conform to PDB standards and nomenclature, confidential, recommended by wwPDB to accompany manuscript submissions, contains PDB entry code and title. Watermarked “Confidential” | only accessible by wwPDB biocurators |
| Public release | complete, includes PDB entry code, title, and authors. Not watermarked | |
Figure 2. List of the 25 Journals That Publish the Most Papers Describing PDB Structures, Ranked According to Their Citation in the PDB from 2012 to 2016
Journals that require wwPDB validation reports for manuscript review are shown in black, while those that do not yet require the reports are shown in gray. Obsoleted entries are included in these statistics only if they were superseded by a different PDB entry; obsoleted (retracted) entries were excluded.
Component Software Packages Included in the 2017 Version of the Validation Pipeline
| Software Package | Which Section and Metric of the Report the Package Is Used for | Reference |
|---|---|---|
| MolProbity | model geometry: bond lengths and bond angles of standard protein residues and nucleotides, too-close contacts, Ramachandran outliers, rotamer outliers, RNA suiteness | |
| MAXIT | model geometry: symmetry-related too-close contacts, stereochemistry issues, identification of | Maxit (Z.F., |
| Mogul | model geometry: bond-length and bond-angle outliers in small molecules | |
| Xtriage (Phenix) | crystallographic data and refinement statistics: signal-to-noise, twinning | |
| DCC | crystallographic data and refinement statistics: | |
| EDS | fit to crystallographic data: real-space | |
| Cyrange | NMR ensemble composition: identification of well-defined protein cores | |
| RCI | NMR chemical shifts: prediction of protein backbone order parameter from chemical shifts | |
| PANAV | NMR chemical shifts: suggested referencing corrections in chemical shift assignments | |
Step-by-Step Flow of Validation Pipeline Web-Service API
| Step | Description |
|---|---|
| Start a new validation session | returns a new unique code to reference subsequent API steps |
| Upload data files | coordinate models and supporting experimental data files (e.g., X-ray structure-factor amplitudes, NMR chemical shifts) |
| Submit pipeline execution request | queue the session for execution |
| Check completion status | return a completion status for the current session (e.g., queued, running, successfully completed, or failed completion) |
| Session file inventory | return a list of data and result files within the current session |
| Download output files | recover a session result file |
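The six steps in the table form a simple session workflow: create, upload, submit, poll, list, download. The sketch below walks through that sequence against an in-memory stand-in. Everything named here (`MockValidationService`, its methods, the status strings, and the `report.txt` output file) is hypothetical and chosen only for illustration; it is not the real wwPDB client interface, whose calls and endpoints are described in its own installation instructions.

```python
import time
from typing import Dict, List


class MockValidationService:
    """In-memory stand-in for the web service (hypothetical)."""

    def __init__(self) -> None:
        self._files: Dict[str, Dict[str, bytes]] = {}
        self._status: Dict[str, str] = {}
        self._next_id = 1

    def new_session(self) -> str:
        session_id = f"session-{self._next_id}"
        self._next_id += 1
        self._files[session_id] = {}
        self._status[session_id] = "created"
        return session_id

    def upload(self, session_id: str, name: str, data: bytes) -> None:
        self._files[session_id][name] = data

    def submit(self, session_id: str) -> None:
        if not self._files[session_id]:
            raise ValueError("no input files uploaded")
        self._status[session_id] = "queued"

    def status(self, session_id: str) -> str:
        current = self._status[session_id]
        # Simulate progress: each poll advances the job by one stage.
        if current == "queued":
            self._status[session_id] = "running"
        elif current == "running":
            self._files[session_id]["report.txt"] = b"mock validation report"
            self._status[session_id] = "completed"
        return current

    def inventory(self, session_id: str) -> List[str]:
        return sorted(self._files[session_id])

    def download(self, session_id: str, name: str) -> bytes:
        return self._files[session_id][name]


def run_validation(service, inputs: Dict[str, bytes],
                   poll_interval: float = 0.0,
                   max_polls: int = 100) -> Dict[str, bytes]:
    """Run all six steps and return the result files by name."""
    session_id = service.new_session()                 # 1. start session
    for name, data in inputs.items():                  # 2. upload data files
        service.upload(session_id, name, data)
    service.submit(session_id)                         # 3. queue for execution
    for _ in range(max_polls):                         # 4. check status
        state = service.status(session_id)
        if state in ("completed", "failed"):
            break
        time.sleep(poll_interval)
    else:
        raise TimeoutError("validation did not finish in time")
    if state == "failed":
        raise RuntimeError("validation failed")
    names = service.inventory(session_id)              # 5. file inventory
    return {n: service.download(session_id, n)         # 6. download outputs
            for n in names if n not in inputs}


results = run_validation(MockValidationService(), {"model.cif": b"data_demo"})
```

Bounding the polling loop with `max_polls` keeps a stalled session from blocking the caller indefinitely; a real client would use a non-zero `poll_interval`.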
Figure 3. Trends in Geometric Quality Metrics for Protein Structures in the PDB
Trends between 1995 and 2016 of geometric validation scores for X-ray crystal and NMR entries in the PDB as reported by MolProbity (Chen et al., 2010).
(A–C) Validation metrics for X-ray crystal structures: (A) Ramachandran outliers; (B) rotamer outliers; (C) clashscore.
(D–F) Validation metrics for well-defined regions of solution NMR structures: (D) Ramachandran outliers; (E) rotamer outliers; (F) clashscore.
In each plot, the thick red line represents the median value of each metric for the given year, the box shows the quartile range (25%–75%), and the whiskers show the 1%–99% range. The worst and the best 1% of entries (outside of the whisker range) are plotted as dots.
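The per-year summary drawn in each box plot (median, 25%–75% box, 1%–99% whiskers, and dots for entries beyond the whiskers) can be computed as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the percentile interpolation scheme (`method="inclusive"`) is a choice made here, since the caption does not say which one the authors used.

```python
import statistics
from typing import Dict, List, Sequence, Union


def box_summary(values: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Summarize one year's metric values as described in the caption."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # 99 cut points: cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[0], cuts[98]
    return {
        "whisker_low": low,      # 1st percentile
        "q1": cuts[24],          # 25th percentile (bottom of box)
        "median": cuts[49],      # thick red line
        "q3": cuts[74],          # 75th percentile (top of box)
        "whisker_high": high,    # 99th percentile
        # best and worst 1% of entries, plotted as individual dots
        "dots": sorted(v for v in values if v < low or v > high),
    }


summary = box_summary(list(range(101)))
```

Whiskers at the 1st and 99th percentiles, rather than at a multiple of the interquartile range, keep the plotted range comparable across years with very different numbers of entries.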
Figure 4. Trends in Geometric Quality Metrics for Small Molecules in the PDB
Trends between 1995 and 2016 of bond-length and bond-angle RMSZ metrics as determined by Mogul (Bruno et al., 2004) for small molecules in X-ray crystal structures in the PDB at better than 2.5 Å resolution. (A) Ligands with 1–20 non-hydrogen atoms; (B) ligands with 21–40 non-hydrogen atoms; (C) ligands with 41–60 non-hydrogen atoms. In each box plot, the thick red line represents the median value per year, the box shows the interquartile range (25%–75%), and the whiskers show the 1%–99% range. Values outside of the whisker range are plotted as dots. In each plot, the top panel shows the bond-length RMSZ metric, the middle panel shows the bond-angle RMSZ metric, and the bottom panel shows the number of such ligands deposited in each year.
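RMSZ is the root-mean-square of per-feature Z-scores, where each Z-score is the deviation of an observed bond length or angle from a reference mean, in units of the reference standard deviation. The sketch below shows only that general definition. The function name is hypothetical, and the reference means and standard deviations are taken as given inputs; how Mogul derives them is not reproduced here.

```python
import math
from typing import Sequence


def rmsz(observed: Sequence[float],
         expected: Sequence[float],
         sigma: Sequence[float]) -> float:
    """Root-mean-square Z-score over a set of bonds (or angles)."""
    if not (len(observed) == len(expected) == len(sigma)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Z-score of each feature: deviation in units of standard deviation.
    z = [(o - e) / s for o, e, s in zip(observed, expected, sigma)]
    return math.sqrt(sum(zi * zi for zi in z) / len(z))


# Two made-up features with Z-scores of +1 and -2
value = rmsz([2.0, 0.0], [1.0, 1.0], [1.0, 0.5])
```

Because the Z-scores are squared before averaging, a single badly distorted bond raises a ligand's RMSZ more than several mildly distorted ones.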