Robbie P Joosten, Krista Joosten, Garib N Murshudov, Anastassis Perrakis.
Abstract
Developments of the PDB_REDO procedure that combine re-refinement and rebuilding within a unique decision-making framework to improve structures in the PDB are presented. PDB_REDO uses a variety of existing and custom-built software modules to choose an optimal refinement protocol (e.g. anisotropic, isotropic or overall B-factor refinement, TLS model) and to optimize the geometry versus data-refinement weights. Next, it proceeds to rebuild side chains and peptide planes before a final optimization round. PDB_REDO works fully automatically, without the need for intervention by a crystallographic expert. The pipeline was tested on 12 000 PDB entries, and the great majority of the test cases improved both in terms of crystallographic criteria such as R free and in terms of widely accepted geometric validation criteria. It is concluded that PDB_REDO is useful for updating the otherwise ‘static’ structures in the PDB to modern crystallographic standards. The publicly available PDB_REDO database provides better model statistics and contributes to better refinement and validation targets.
Year: 2012 PMID: 22505269 PMCID: PMC3322608 DOI: 10.1107/S0907444911054515
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
Model-quality metrics
| Metric | Description |
|---|---|
| | The standard |
| | Like |
| | The expected ratio of |
| | The weighted |
| | The weighted free |
| σ(R free) | The estimated standard deviation of R free |
| | The expected |
| | The |
| | The maximal allowed |
| fit(ρ) | The weighted mean fit of a group of atoms |
| | The Wilson |
| r.m.s. | The root-mean-square |
| r.m.s. | Like r.m.s. |
Extracted from the header of the input PDB file.
Calculated by REFMAC before refinement.
Calculated during TLS refinement in REFMAC directly after resetting the B factors.
Used as a cutoff value for picker.
Complex refers to the model with the most (B-factor-related) parameters and simple to the model with the fewest.
Calculated by REFMAC after the final refinement.
The terms are swapped to compensate for the ‘lower-is-better’ nature of R free.
Calculated by WHAT_CHECK for the input PDB file.
Calculated by WHAT_CHECK for the final model.
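The metrics table lists σ(R free), which the Fig. 2 caption uses as the tolerance for an unchanged R free, but this record does not say how it is estimated. As a minimal sketch, assuming the widely used approximation σ(R free) ≈ R free/√N_test, with N_test the number of test-set reflections (an assumption, not taken from this record), the estimate is a one-liner; the function name and the numbers are hypothetical.

```python
import math


def sigma_rfree(r_free: float, n_test: int) -> float:
    """Approximate standard deviation of R free.

    Assumption: sigma(R free) ~ R free / sqrt(N_test), where N_test is the
    number of reflections in the test (free) set. Not taken from the source.
    """
    if n_test <= 0:
        raise ValueError("the test set must contain at least one reflection")
    return r_free / math.sqrt(n_test)


# Hypothetical example: R free = 0.240 with 1600 test-set reflections.
sigma = sigma_rfree(0.240, 1600)
print(f"sigma(R free) = {sigma:.4f}")  # 0.240 / 40 -> prints sigma(R free) = 0.0060
```

In this hypothetical example 2σ(R free) = 0.012, so a change in R free of up to about 1.2 percentage points would count as unchanged under the Fig. 2 criterion.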
Programs in the PDB_REDO pipeline
| Program | Software suite | Application in PDB_REDO |
|---|---|---|
| | | Removes unwanted atoms and edits LINK records in PDB files |
| | | Checks and standardizes reflection data in mmCIF files |
| | | Extracts the description of the structure model and refinement from a PDB file |
| | | Compares |
| | | Fixes chirality errors |
| | | Selects |
| | | Selects the best refinement from a set |
| | | Removes waters |
| | | Rebuilds side chains in real space and adds missing ones |
| | | Flips peptide planes |
| | | Parses |
| | | Performs reciprocal-space refinement |
| | | Checks TLS-group definitions and converts total |
| | | Converts reflection data from mmCIF to MTZ format |
| | | Converts reflection data from MTZ to mmCIF format |
| | | Converts reflection intensities to amplitudes |
| | | Manipulates MTZ files |
| | | Merges MTZ files |
| | | Creates all possible reflections given unit-cell parameters and resolution |
| | | Creates and completes |
| | | Calculates completeness, twinning fraction and |
| | — | Assigns secondary structure |
| | | Validates carbohydrates in the structure model |
| | | Validates the structure model |
| | | Converts |
| | | Creates scenes for result visualization |
Structure-model categories
| Category | Cutoff: reflections per atom | Cutoff: data resolution |
|---|---|---|
| xlow | <1.0 reflections per atom | Resolution ≥ 5.00 Å |
| vlow | 1.0 ≤ reflections per atom < 2.5 | 3.50 Å ≤ resolution < 5.00 Å |
| low | NA | 2.80 Å ≤ resolution < 3.50 Å |
| medium | NA | 1.70 Å ≤ resolution < 2.80 Å |
| high | NA | 1.20 Å ≤ resolution < 1.70 Å |
| atomic | NA | Resolution < 1.20 Å |
Reflections per atom takes precedence over data resolution.
In these categories only resolution cutoffs are used.
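The category rules can be written as a small decision function. This is a minimal sketch with a hypothetical function name. The table leaves open what happens when a model has at least 2.5 reflections per atom but data at 3.50 Å or worse; here we assume such a model is classed as low.

```python
from typing import Optional


def model_category(resolution: float, refl_per_atom: Optional[float] = None) -> str:
    """Assign a structure-model category from the cutoff table (sketch).

    Reflections per atom, when known, takes precedence and decides the two
    lowest categories; otherwise resolution (in angstrom) is used throughout.
    Assumption: a model with >= 2.5 reflections per atom but a resolution of
    3.50 A or worse falls through to 'low'; the table does not cover this case.
    """
    if refl_per_atom is not None:
        if refl_per_atom < 1.0:
            return "xlow"
        if refl_per_atom < 2.5:
            return "vlow"
    elif resolution >= 5.00:
        return "xlow"
    elif resolution >= 3.50:
        return "vlow"
    # Only resolution cutoffs are used for the remaining categories.
    if resolution >= 2.80:
        return "low"
    if resolution >= 1.70:
        return "medium"
    if resolution >= 1.20:
        return "high"
    return "atomic"


print(model_category(2.1))                     # medium
print(model_category(2.1, refl_per_atom=1.8))  # vlow
```

The second call shows the precedence rule: 2.1 Å data alone would give medium, but fewer than 2.5 reflections per atom moves the model to vlow.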
Figure 1. YASARA scene showing the changes made to PDB entry 2ask (Silvian et al., 2006) by PDB_REDO. The atoms are coloured by atomic shift, with warmer colours marking larger shifts. (a) Overview of the structure model with atoms as spheres. Grey atoms were newly built by SideAide. (b) The residue with the greatest atomic shift (ArgA85) before (left) and after (right) PDB_REDO. The side chain has moved to a completely different rotamer. (c) The rotamer change in Leu112 has led to a large displacement of the Cδ atoms, whereas the Cγ atom has hardly moved. (d) HisA32 with typical colouring for a side-chain flip. In the new conformation the side chain makes hydrogen bonds (thin orange rods) to sulfate A504 and water A508.
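Colouring by atomic shift implies a per-atom distance between the input and output coordinates. The sketch below computes these shifts and reports the residue holding the most-shifted atom. It assumes atoms are matched by a hypothetical (chain, residue number, residue name, atom name) key and that coordinates are already parsed into tuples; the coordinates are toy values, not taken from 2ask, and reading real PDB files is not shown.

```python
import math
from typing import Dict, Tuple

# Hypothetical atom key: (chain, residue number, residue name, atom name).
AtomKey = Tuple[str, int, str, str]
Coord = Tuple[float, float, float]


def atomic_shifts(before: Dict[AtomKey, Coord], after: Dict[AtomKey, Coord]) -> Dict[AtomKey, float]:
    """Euclidean shift (in angstrom) of every atom present in both models.

    Atoms that exist only in `after` (e.g. newly built side-chain atoms)
    have no shift and are skipped.
    """
    return {key: math.dist(before[key], after[key]) for key in after if key in before}


def residue_with_largest_shift(shifts: Dict[AtomKey, float]) -> Tuple[str, int, str]:
    """Return (chain, number, name) of the residue holding the most-shifted atom."""
    chain, resnum, resname, _atom = max(shifts, key=shifts.get)
    return chain, resnum, resname


# Toy coordinates (hypothetical).
before = {
    ("A", 85, "ARG", "CZ"): (10.0, 5.0, 3.0),
    ("A", 32, "HIS", "NE2"): (2.0, 1.0, 0.0),
}
after = {
    ("A", 85, "ARG", "CZ"): (13.0, 9.0, 3.0),  # moved 5.0 A
    ("A", 32, "HIS", "NE2"): (2.0, 1.0, 1.0),  # moved 1.0 A
    ("A", 40, "LYS", "NZ"): (0.0, 0.0, 0.0),   # newly built, so no shift
}
shifts = atomic_shifts(before, after)
print(residue_with_largest_shift(shifts))  # ('A', 85, 'ARG')
```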
Whole data-set averages for model-quality metrics
| Metric | PDB entry | Re-refined model | Final model |
|---|---|---|---|
| R (%) | 19.8 | 18.3 | 18.4 |
| R free (%) | 24.0 | 22.0 | 22.2 |
| Ramachandran plot | −1.30 | −0.66 | −0.61 |
| Side-chain rotamers | −1.21 | −0.69 | −0.24 |
| Coarse packing | −0.24 | −0.16 | −0.12 |
| Fine packing | −0.97 | −0.85 | −0.70 |
| No. of atomic bumps | 108 | 78 | 82 |
| No. of unsatisfied hydrogen-bond donors/acceptors | 43 | 37 | 37 |
Values extracted from the PDB header.
Model-normality Z scores from WHAT_CHECK with respect to a test set of 500+ high-resolution structure models. Higher values are better.
Figure 2. Traffic-light diagrams of the change in structure model-quality metrics after re-refinement (left column) and after full model optimization (re-refined and rebuilt; right column) for 12 000 structure models. Green bars represent improved structure models and red bars deteriorated models. A model is considered unchanged for a given metric (yellow bars) if |ΔR free| ≤ 2σ(R free), |ΔZ score| ≤ 0.1 (for the Ramachandran plot, rotamer, coarse-packing and fine-packing Z scores), |Δ(No. of bumps)| ≤ 10 or |Δ(No. of unsatisfied hydrogen-bond donors/acceptors)| ≤ 2, respectively.
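The Fig. 2 rule can be applied one metric at a time. A change within the tolerance is yellow; otherwise the sign of the change, together with whether higher or lower values are better, decides between green and red. R free, bumps and unsatisfied donors/acceptors are lower-is-better, and the WHAT_CHECK Z scores are higher-is-better. The sketch below is illustrative only: the metric keys, the `Change` enum and the example values are hypothetical, and σ(R free) must be supplied by the caller.

```python
from enum import Enum


class Change(Enum):
    IMPROVED = "green"
    SAME = "yellow"
    DETERIORATED = "red"


def classify_change(metric: str, before: float, after: float, sigma_rfree: float = 0.0) -> Change:
    """Classify the change in one model-quality metric (traffic-light sketch).

    Tolerances follow the Fig. 2 caption; the metric keys are hypothetical.
    """
    # metric -> (tolerance, True if higher values are better)
    rules = {
        "rfree": (2 * sigma_rfree, False),
        "ramachandran_z": (0.1, True),
        "rotamer_z": (0.1, True),
        "coarse_packing_z": (0.1, True),
        "fine_packing_z": (0.1, True),
        "bumps": (10, False),
        "unsatisfied_hbonds": (2, False),
    }
    tolerance, higher_is_better = rules[metric]
    delta = after - before
    if abs(delta) <= tolerance:
        return Change.SAME
    improved = delta > 0 if higher_is_better else delta < 0
    return Change.IMPROVED if improved else Change.DETERIORATED


print(classify_change("bumps", 108, 82).value)              # green
print(classify_change("unsatisfied_hbonds", 43, 44).value)  # yellow
```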
Figure 3. Box-and-whisker plot of the Ramachandran plot Z score (higher is better) of the original PDB entries (white) and the fully optimized PDB_REDO models (grey) in 0.2 Å resolution bins; the size of each bin is given in the bar chart. One severe outlier, PDB entry 2ac3 (Jauch et al., 2005), was caused by a TLS-related bug in PDB_REDO; with the latest version of PDB_REDO, its final Z score is −1.1.
Figure 4. Box-and-whisker plots of R free as extracted from the PDB header and for the fully optimized PDB_REDO models in 0.2 Å resolution bins; the size of each bin is given in the bar chart. The data are divided into two subsets: models for which the initial re-refinement was successful (10 662 models; left) and models for which it failed (1338 models; right). When the initial re-refinement is successful, R free improves over the entire resolution range. The five marked outliers were tested with a new version of PDB_REDO: PDB entry 1ocw (James et al., 2003) was removed from PDB_REDO because R head could not be reproduced; 2ac3 (Jauch et al., 2005) and 1u74 (Kang et al., 2004) were no longer outliers; and 2cvf (Akiba et al., 2005) and 2bx5 (James et al., 2007) could no longer be re-refined successfully and will be investigated further. If the initial re-refinement fails, R free typically increases, with many severe outliers.
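Figs 3 and 4 summarize per-model values in 0.2 Å resolution bins. The sketch below shows one way to compute the five-number summary behind such box plots. None of the following is specified in the source: resolutions are taken in ångström, each bin is closed at its lower edge, and quartiles come from Python's `statistics.quantiles` with the "inclusive" method. Whisker conventions (for example 1.5 × IQR) vary between plotting tools and are not reproduced. For Fig. 4, the function would simply be called once per subset (successful and failed re-refinement).

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BIN_WIDTH_HUNDREDTHS = 20  # 0.2 A bins, counted in 0.01 A steps to avoid float drift


def resolution_bin(resolution: float) -> Tuple[float, float]:
    """Return the (lower, upper) edges in angstrom of the 0.2 A bin holding `resolution`."""
    index = round(resolution * 100) // BIN_WIDTH_HUNDREDTHS
    return index * BIN_WIDTH_HUNDREDTHS / 100, (index + 1) * BIN_WIDTH_HUNDREDTHS / 100


def box_plot_summary(models: Iterable[Tuple[float, float]]) -> Dict[Tuple[float, float], dict]:
    """Five-number summary per 0.2 A resolution bin.

    `models` yields (resolution, value) pairs, e.g. (resolution, R free).
    """
    bins: Dict[Tuple[float, float], List[float]] = defaultdict(list)
    for resolution, value in models:
        bins[resolution_bin(resolution)].append(value)
    summary = {}
    for edges, values in sorted(bins.items()):
        if len(values) >= 2:
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:  # a single model: all quartiles collapse onto its value
            q1 = median = q3 = values[0]
        summary[edges] = {"n": len(values), "min": min(values), "q1": q1,
                          "median": median, "q3": q3, "max": max(values)}
    return summary


# Toy (resolution, R free) pairs; hypothetical values.
toy = [(1.81, 0.20), (1.85, 0.22), (1.90, 0.24), (1.95, 0.26), (1.99, 0.28), (2.05, 0.25)]
for edges, stats in box_plot_summary(toy).items():
    print(edges, stats["n"], stats["median"])
```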
Figure 5. Percentage of structures in the test set as a function of the number of model-quality metrics (see Fig. 2) that improve (grey; left) or deteriorate (black; right). 85% of the structures improve in three or more metrics, whereas only 6% deteriorate in three or more metrics.
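The Fig. 5 percentages come from counting, for each structure, how many of the Fig. 2 metrics improved or deteriorated. The following is a minimal sketch of that count. The metric keys, the outcome labels and the four toy models are hypothetical and are not data from the test set.

```python
from typing import Dict, List

# Hypothetical keys for the seven metrics of Fig. 2.
METRICS = ["rfree", "ramachandran_z", "rotamer_z", "coarse_packing_z",
           "fine_packing_z", "bumps", "unsatisfied_hbonds"]


def make_model(improved: int, same: int, deteriorated: int) -> Dict[str, str]:
    """Build a toy per-metric outcome record for one structure."""
    labels = ["improved"] * improved + ["same"] * same + ["deteriorated"] * deteriorated
    if len(labels) != len(METRICS):
        raise ValueError("outcomes must cover every metric exactly once")
    return dict(zip(METRICS, labels))


def share_with_at_least(outcomes: List[Dict[str, str]], label: str, k: int) -> float:
    """Percentage of structures with at least `k` metrics carrying `label`."""
    if not outcomes:
        raise ValueError("no structures given")
    hits = sum(1 for model in outcomes
               if sum(1 for result in model.values() if result == label) >= k)
    return 100.0 * hits / len(outcomes)


models = [make_model(5, 2, 0), make_model(3, 3, 1), make_model(1, 3, 3), make_model(2, 5, 0)]
print(share_with_at_least(models, "improved", 3))      # 50.0
print(share_with_at_least(models, "deteriorated", 3))  # 25.0
```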
Figure 6. Overall bond-angle deviations from target values expressed as root-mean-square Z scores (calculated by WHAT_CHECK). Each point is the average of all values in a 0.2 Å resolution bin. Only models with successful initial re-refinement were used. The values in the PDB (wcori; solid line) follow a downward trend to 1.9 Å and then level off; the values after full optimization in PDB_REDO (wcfin; dashed line) follow a downward trend to 2.7 Å and then increase. Bond-length deviations (not shown) follow the same trend.
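Fig. 6 plots bond-angle r.m.s. Z scores averaged per 0.2 Å resolution bin. The sketch below assumes the usual definition, r.m.s.Z = √((1/N) Σ ((x_i − t_i)/σ_i)²), where x_i is an observed angle, t_i its target and σ_i the target's standard deviation. It computes this score for one model and then averages models per bin. WHAT_CHECK's own targets and σ values are not reproduced here, and all inputs are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rms_z(observed: List[float], targets: List[float], sigmas: List[float]) -> float:
    """Root-mean-square Z score of geometric deviations from their targets."""
    if not (len(observed) == len(targets) == len(sigmas)) or not observed:
        raise ValueError("need equally long, non-empty lists")
    squares = [((obs - tgt) / sig) ** 2 for obs, tgt, sig in zip(observed, targets, sigmas)]
    return math.sqrt(sum(squares) / len(squares))


def binned_means(points: Iterable[Tuple[float, float]], width_hundredths: int = 20) -> Dict[float, float]:
    """Mean value per resolution bin (default 0.2 A), keyed by the bin's lower edge."""
    bins: Dict[float, List[float]] = defaultdict(list)
    for resolution, value in points:
        index = round(resolution * 100) // width_hundredths  # work in 0.01 A steps
        bins[index * width_hundredths / 100].append(value)
    return {edge: sum(vals) / len(vals) for edge, vals in sorted(bins.items())}


# Two angles deviating by +1.5 and -1.5 degrees from targets with sigma 1.5.
print(rms_z([111.0, 108.0], [109.5, 109.5], [1.5, 1.5]))  # 1.0
# Per-model r.m.s. Z values (hypothetical) averaged per 0.2 A bin.
print(binned_means([(1.85, 1.2), (1.95, 0.8), (2.05, 1.0)]))
```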