Gunnar F Schröder, Michael Levitt, Axel T Brunger.
Abstract
X-ray diffraction plays a pivotal role in the understanding of biological systems by revealing atomic structures of proteins, nucleic acids and their complexes, with much recent interest in very large assemblies like the ribosome. As crystals of such large assemblies often diffract weakly (resolution worse than 4 Å), we need methods that work at such low resolution. In macromolecular assemblies, some of the components may be known at high resolution, whereas others are unknown: current refinement methods fail as they require a high-resolution starting structure for the entire complex. Determining the structure of such complexes, which are often of key biological importance, should be possible in principle as the number of independent diffraction intensities at a resolution better than 5 Å generally exceeds the number of degrees of freedom. Here we introduce a method that adds specific information from known homologous structures but allows global and local deformations of these homology models. Our approach uses the observation that local protein structure tends to be conserved as sequence and function evolve. Cross-validation with R(free) (the free R-factor) determines the optimum deformation and influence of the homology model. For test cases at 3.5-5 Å resolution with known structures at high resolution, our method gives significant improvements over conventional refinement in the model as monitored by coordinate accuracy, the definition of secondary structure and the quality of electron density maps. For re-refinements of a representative set of 19 low-resolution crystal structures from the Protein Data Bank, we find similar improvements. Thus, a structure derived from low-resolution diffraction data can have quality similar to a high-resolution structure. Our method is applicable to the study of weakly diffracting crystals using X-ray micro-diffraction as well as data from new X-ray light sources. Use of homology information is not restricted to X-ray crystallography and cryo-electron microscopy: as optical imaging advances to subnanometre resolution, it can use similar tools.
Year: 2010 PMID: 20376006 PMCID: PMC2859093 DOI: 10.1038/nature08892
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Results for the penicillopepsin test calculations using the MLHL target function (experimental phase information)
In all panels, black lines refer to DEN refinements, whereas red lines refer to noDEN refinements. (a) Showing how the (γ, wDEN) grid search determines the values that give the best Rfree value for the synthetic diffraction data set at dmin = 4.5 Å. The Rfree value is contoured using values calculated on a 6 × 5 grid (marked by small ‘+’ signs) where the parameter γ was [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] and wDEN was [3, 10, 30, 100, 300]. For each parameter pair we performed an extensive refinement protocol (Online Methods). The contour plot shows clear minima and maxima with the value of Rfree varying from 0.295 to 0.35. (b) Showing the contour map of the all-atom RMSD between the target structure 3app and the DEN-refined structure (repeat with the lowest Rfree value) at each grid point in (a). Again there are clear minima and maxima with the RMSD varying from 1.47 to 1.60 Å. (c) Showing the Rfree value as a function of dmin of the four synthetic diffraction data sets. Thick lines mark the lowest Rfree values obtained from the ten repeats using the optimum parameters; the corresponding thin lines mark the highest Rfree values. For the synthetic data sets at dmin ≥ 4 Å, DEN refinement performs much better than noDEN, reaching lower Rfree values. (d) Showing how Zemla’s GDT (<1 Å) score (ref. 17), which measures structural similarity to the target structure 3app, varies as a function of dmin; the dashed line indicates the GDT score of the initial model. At all resolutions, DEN outperforms noDEN and gives GDT values that are more favorable (higher) than those of the initial structure. (e) Showing how the RMSD of all atoms to the 3app target structure varies with dmin of the four synthetic diffraction data sets. Once again DEN gives lower RMSD values, especially at low resolution. The DEN-refined models used in (d) and (e) correspond to the best models among ten repeats as assessed by Rfree (black dots in panel (c)). Black ellipses on the contour maps indicate the values corresponding to the structure with the lowest Rfree value obtained for dmin = 4.5 Å.
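The parameter scan described for panel (a) can be sketched roughly as follows. This is only an illustration of the search logic: `toy_refine` is a hypothetical stand-in that returns a made-up Rfree from an arbitrary surface (it does not run any refinement), and `grid_search` is an assumed helper name. The γ values are assumed to be evenly spaced from 0 to 1.

```python
import itertools
import math

# Grid from the Figure 1a caption (gamma assumed evenly spaced from 0 to 1).
GAMMAS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
W_DENS = [3, 10, 30, 100, 300]
N_REPEATS = 10  # ten repeats per parameter pair, as in the caption


def toy_refine(gamma, w_den, seed):
    """Hypothetical stand-in for one refinement run; returns a made-up Rfree.

    The surface is arbitrary and only gives the search something to minimize.
    """
    return 0.30 + 0.05 * gamma + 0.01 * abs(math.log10(w_den) - 1.0) + 0.001 * seed


def grid_search(refine, gammas, w_dens, n_repeats):
    """Return the (gamma, w_den) pair with the lowest best-of-repeats Rfree."""
    best_rfree = {}
    for gamma, w_den in itertools.product(gammas, w_dens):
        # Keep only the repeat with the lowest Rfree at each grid point.
        best_rfree[(gamma, w_den)] = min(
            refine(gamma, w_den, seed) for seed in range(n_repeats)
        )
    optimum = min(best_rfree, key=best_rfree.get)
    return optimum, best_rfree


optimum, best_rfree = grid_search(toy_refine, GAMMAS, W_DENS, N_REPEATS)
print(optimum, round(best_rfree[optimum], 3))
```

In a real setting `refine` would wrap an actual refinement run and return its Rfree; the selection step (best repeat per grid point, then best grid point) would stay the same.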
DEN Refinement Improves Structures Refined against Four Synthetic Data Sets of Penicillopepsin
| Target Function | Resolution (Å) | Rfree DEN | Rfree noDEN | Rfree Improvement | Rfree−Rwork DEN | Rfree−Rwork noDEN | Ramachandran Score DEN | Ramachandran Score noDEN | Ramachandran Score Improvement |
|---|---|---|---|---|---|---|---|---|---|
| MLHL | 3.50 | 0.331 | 0.357 | 0.0256 | 0.05 | 0.09 | 0.783 | 0.783 | 0.0000 |
| MLHL | 4.00 | 0.322 | 0.328 | 0.0058 | 0.07 | 0.09 | 0.754 | 0.772 | −0.0184 |
| MLHL | 4.50 | 0.293 | 0.358 | 0.0651 | 0.02 | 0.11 | 0.702 | 0.632 | 0.0699 |
| MLHL | 5.00 | 0.300 | 0.400 | 0.0991 | 0.02 | 0.14 | 0.790 | 0.599 | 0.1912 |
| MLF | 3.50 | 0.378 | 0.390 | 0.0123 | 0.10 | 0.11 | 0.757 | 0.699 | 0.0588 |
| MLF | 4.00 | 0.347 | 0.391 | 0.0445 | 0.09 | 0.15 | 0.732 | 0.658 | 0.0735 |
| MLF | 4.50 | 0.348 | 0.413 | 0.0655 | 0.08 | 0.12 | 0.702 | 0.544 | 0.1581 |
| MLF | 5.00 | 0.341 | 0.425 | 0.0841 | 0.13 | 0.18 | 0.599 | 0.551 | 0.0478 |
| Average | 4.25 | 0.332 | 0.383 | 0.0503 | 0.07 | 0.12 | 0.727 | 0.655 | 0.0726 |
| Minimum | 3.50 | 0.293 | 0.328 | 0.0058 | 0.02 | 0.09 | 0.599 | 0.544 | −0.0184 |
| Maximum | 5.00 | 0.378 | 0.425 | 0.0991 | 0.13 | 0.18 | 0.790 | 0.783 | 0.1912 |
Starting from a homology model of penicillopepsin (PDB 3app) that was built using the endothiapepsin structure (PDB 4ape) as a template with an initial RMSD of 1.7 Å, DEN refinements were performed (Online Methods). DEN-refined structures are dramatically improved over noDEN structures, especially at low resolution (> 4 Å), with an average improvement of 0.078 in Rfree for resolutions of 4.50 and 5.00 Å, with or without phases. At these same resolutions, the secondary structure definition also improved for DEN structures, as shown by a higher Ramachandran Score (as determined by MolProbity, ref. 30). At the higher resolutions of 3.50 and 4.00 Å, the Ramachandran Score only improves without phase information, which shows that DEN provides little new information at higher resolution when experimental phase information is available. As expected, Rfree values are lower when using phase information for both DEN and noDEN refinements, with an average improvement of 0.042 for DEN and 0.045 for noDEN. In each column, green shading marks the most favorable maximum or minimum value (high Ramachandran Score or a low R-value); pink shading marks the least favorable value.
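The averages quoted in this note can be checked against the rounded table entries with a few lines of arithmetic. The sketch below only re-uses numbers from the table above; the variable and helper names are illustrative.

```python
# Rows copied from the penicillopepsin table:
# (target function, dmin in Å, Rfree DEN, Rfree noDEN, listed Rfree improvement)
ROWS = [
    ("MLHL", 3.50, 0.331, 0.357, 0.0256),
    ("MLHL", 4.00, 0.322, 0.328, 0.0058),
    ("MLHL", 4.50, 0.293, 0.358, 0.0651),
    ("MLHL", 5.00, 0.300, 0.400, 0.0991),
    ("MLF", 3.50, 0.378, 0.390, 0.0123),
    ("MLF", 4.00, 0.347, 0.391, 0.0445),
    ("MLF", 4.50, 0.348, 0.413, 0.0655),
    ("MLF", 5.00, 0.341, 0.425, 0.0841),
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average Rfree improvement at 4.50 and 5.00 Å, with or without phases.
low_res_gain = mean(r[4] for r in ROWS if r[1] >= 4.5)

# Benefit of phase information: mean Rfree without phases (MLF)
# minus mean Rfree with phases (MLHL).
phase_gain_den = mean(r[2] for r in ROWS if r[0] == "MLF") - mean(
    r[2] for r in ROWS if r[0] == "MLHL"
)
phase_gain_noden = mean(r[3] for r in ROWS if r[0] == "MLF") - mean(
    r[3] for r in ROWS if r[0] == "MLHL"
)

print(f"{low_res_gain:.4f} {phase_gain_den:.4f} {phase_gain_noden:.4f}")
```

From the rounded entries this gives about 0.078 for the low-resolution improvement and about 0.042 for the DEN phase benefit, in line with the note. The noDEN phase benefit comes out near 0.044 rather than 0.045, a difference that could plausibly arise from rounding of the tabulated values.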
Figure 2. Re-refinement of nineteen low-resolution PDB structures
(a) Rfree values of PDB structures refined with DEN (blue) and without DEN (noDEN, orange). In every case the DEN-refined structure has the lower Rfree value. For each protein, (γ, wDEN) parameter optimization was performed (Online Methods, Supplementary Fig. 4), and the structure with the lowest Rfree value was used for analysis. (b) Fraction of residues in the favored region of the Ramachandran plot as determined by MolProbity (ref. 30), termed here the Ramachandran Score. (c) Significant correlation (correlation coefficient 0.83) is seen between Rfree Improvement and Ramachandran Score Improvement for DEN vs. noDEN.
DEN Refinement Improves Low Resolution Structures in the PDB
| PDB Identifier | Resolution (Å) | Number of Residues | Rfree DEN | Rfree noDEN | Rfree Improvement | Rfree−Rwork DEN | Rfree−Rwork noDEN | Ramachandran Score DEN | Ramachandran Score noDEN | Ramachandran Score Improvement | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1av1 | 4.00 | 804 | 0.335 | 0.336 | 0.0012 | 0.07 | 0.07 | 0.840 | 0.872 | −0.0314 | |
| 1isr | 4.00 | 448 | 0.233 | 0.237 | 0.0043 | 0.07 | 0.07 | 0.833 | 0.833 | 0.0000 | |
| 1jl4 | 4.30 | 557 | 0.353 | 0.354 | 0.0009 | 0.12 | 0.11 | 0.718 | 0.705 | 0.0127 | |
| 1pgf | 4.50 | 1102 | 0.284 | 0.295 | 0.0108 | 0.08 | 0.11 | 0.856 | 0.804 | 0.0519 | Small differences throughout |
| 1r5u | 4.50 | 3517 | 0.334 | 0.335 | 0.0003 | 0.05 | 0.05 | 0.714 | 0.710 | 0.0046 | |
| 1xdv | 4.10 | 1517 | 0.358 | 0.367 | 0.0089 | 0.12 | 0.11 | 0.780 | 0.783 | −0.0034 | |
| 1xxi | 4.10 | 3532 | 0.407 | 0.465 | 0.0582 | 0.05 | 0.12 | 0.842 | 0.612 | 0.2301 | Large differences (~4 Å domain motions) |
| 1ye1 | 4.50 | 574 | 0.312 | 0.350 | 0.0381 | 0.08 | 0.15 | 0.894 | 0.705 | 0.1890 | Small differences throughout |
| 1yi5 | 4.20 | 1356 | 0.323 | 0.336 | 0.0139 | 0.07 | 0.09 | 0.758 | 0.709 | 0.0497 | Local differences in several chains |
| 1z9j | 4.50 | 821 | 0.317 | 0.331 | 0.0135 | 0.07 | 0.09 | 0.838 | 0.762 | 0.0761 | Large differences in chain A (domain motion) |
| 2a62 | 4.50 | 319 | 0.340 | 0.353 | 0.0131 | 0.07 | 0.09 | 0.590 | 0.606 | −0.0159 | |
| 2bf1 | 4.00 | 304 | 0.479 | 0.492 | 0.0131 | 0.12 | 0.12 | 0.467 | 0.507 | −0.0400 | |
| 2i36 | 4.10 | 962 | 0.387 | 0.401 | 0.0137 | 0.02 | 0.03 | 0.839 | 0.687 | 0.1520 | Local difference in chain B |
| 2qag | 4.00 | 702 | 0.392 | 0.401 | 0.0091 | 0.02 | 0.02 | 0.616 | 0.614 | 0.0016 | |
| 2vkz | 4.00 | 10941 | 0.327 | 0.337 | 0.0095 | 0.05 | 0.07 | 0.832 | 0.762 | 0.0692 | Large differences in subdomain placements |
| 3bbw | 4.00 | 543 | 0.304 | 0.334 | 0.0304 | 0.01 | 0.04 | 0.876 | 0.776 | 0.0998 | Significant local difference |
| 3crw | 4.00 | 485 | 0.324 | 0.338 | 0.0136 | 0.09 | 0.11 | 0.836 | 0.777 | 0.0589 | Large difference in one domain (hinge motion) |
| 3dmk | 4.19 | 2127 | 0.407 | 0.428 | 0.0211 | 0.08 | 0.11 | 0.742 | 0.653 | 0.0896 | Differences throughout, reference model only 50% |
| 3du7 | 4.10 | 1839 | 0.332 | 0.336 | 0.0039 | 0.09 | 0.09 | 0.730 | 0.707 | 0.0225 | |
| Average | 4.19 | 1708 | 0.345 | 0.359 | 0.0146 | 0.07 | 0.09 | 0.768 | 0.715 | 0.0535 | |
| Minimum | 4.00 | 304 | 0.233 | 0.237 | 0.0003 | 0.01 | 0.02 | 0.467 | 0.507 | −0.0400 | |
| Maximum | 4.50 | 10941 | 0.479 | 0.492 | 0.0582 | 0.12 | 0.15 | 0.894 | 0.872 | 0.2301 | |
Nineteen PDB structures were re-refined with and without DEN (Online Methods). The tested proteins show a wide range of sizes, extending from 304 residues for 2bf1 to 10941 residues for 2vkz. The final Rfree and Rfree − Rwork values, as well as Ramachandran Scores, are shown. In all cases, DEN refinement shows improvement of Rfree as compared to noDEN; eleven out of nineteen cases show an Rfree improvement that is larger than 0.01. In fifteen of the nineteen cases DEN refinement also improves the Ramachandran Score (four exceptions are 2bf1, 1av1, 2a62 and 1xdv). As expected, Rfree is larger than Rwork (the R-factor that was optimized), with average differences of 0.07 and 0.09 for DEN and noDEN refinement, respectively. In each column, green shading marks the most favorable maximum or minimum value (high Ramachandran Score or low R-value); pink shading marks the least favorable value. The comments refer to the differences between the reference models and the corresponding DEN-refined crystal structures for the cases with γ<1 (cf. Supplementary Table 4). Two particular examples of these differences are shown in Supplementary Fig. 5.
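As a rough consistency check, the correlation quoted for Figure 2c and the count of Rfree improvements above 0.01 can be recomputed from the two Improvement columns of the table. The sketch below uses only the rounded table entries and a plain Pearson coefficient; whether the authors computed the coefficient in exactly this way is an assumption, and the `pearson` helper is illustrative.

```python
import math

# (PDB identifier, Rfree improvement, Ramachandran Score improvement),
# copied from the table of nineteen re-refined structures.
ROWS = [
    ("1av1", 0.0012, -0.0314), ("1isr", 0.0043, 0.0000),
    ("1jl4", 0.0009, 0.0127), ("1pgf", 0.0108, 0.0519),
    ("1r5u", 0.0003, 0.0046), ("1xdv", 0.0089, -0.0034),
    ("1xxi", 0.0582, 0.2301), ("1ye1", 0.0381, 0.1890),
    ("1yi5", 0.0139, 0.0497), ("1z9j", 0.0135, 0.0761),
    ("2a62", 0.0131, -0.0159), ("2bf1", 0.0131, -0.0400),
    ("2i36", 0.0137, 0.1520), ("2qag", 0.0091, 0.0016),
    ("2vkz", 0.0095, 0.0692), ("3bbw", 0.0304, 0.0998),
    ("3crw", 0.0136, 0.0589), ("3dmk", 0.0211, 0.0896),
    ("3du7", 0.0039, 0.0225),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rfree_gain = [row[1] for row in ROWS]
rama_gain = [row[2] for row in ROWS]

r = pearson(rfree_gain, rama_gain)
n_large = sum(1 for g in rfree_gain if g > 0.01)  # Rfree improvement > 0.01

print(f"r = {r:.2f}, cases with Rfree improvement > 0.01: {n_large}")
```

On the rounded values this gives a coefficient close to the reported 0.83 and eleven cases above the 0.01 threshold.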
Figure 3. Electron density map improvement upon DEN refinement for three structures 3dmk, 1ye1, and 1xxi
The 1ye1 (c,d) and 1xxi (e,f) structures are among the cases that benefit most from DEN refinement, whereas the 3dmk (a,b) structure showed only moderate improvement of the Rfree value (Table 2). Nevertheless, in all three cases DEN refinement dramatically improves the electron density maps. The structures refined with DEN (DEN, in blue) and without DEN (noDEN, in orange) are superimposed, and the corresponding phase-combined σA-weighted 2Fo-Fc electron density maps are shown in blue and red, respectively. The density maps for 3dmk and 1xxi were B-factor sharpened (B = −50 Å²) and the contour level was set to 1.5 σ.
Figure 4. DEN provides information for degrees of freedom that are weakly defined by the experimental diffraction data
(a) Showing DEN (green) and noDEN (red) histograms of RMSDD, the root-mean-square deviation of DEN restraint distances in the target structure (3app) from those in the ten refinement repeats (starting from the 4ape initial model with dmin = 4.5 Å, the MLHL target function (ref. 24), and DEN optimum parameters (γ, wDEN) = (0, 10); see Fig. 1a). The largest RMSDD is much smaller for DEN compared to noDEN. Inset: the RMS fluctuations of each distance over the ten repeats of noDEN refinement (RMSF) are plotted against RMSDD for DEN (b, green) and noDEN (c, red). Large RMSF values (>1.5 Å) represent the DEN distances that are not well defined by the diffraction data. For DEN, these distances have small RMSDD values (<1.0 Å), whereas for noDEN they have large RMSDD values. Restraint distances are much closer to the distances in the target structure for DEN, which effectively provides information missing from low-resolution experimental data.
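One way to read the two quantities in this caption is sketched below. The per-restraint formulas are an interpretation of the caption's wording, not a confirmed description of the authors' calculation, and the function names and toy distances are invented for illustration. RMSDD is taken as the RMS deviation of one restraint distance from its target value across repeats, and RMSF as the RMS fluctuation of that distance about its own mean across repeats.

```python
import math


def rmsdd(target_distances, repeat_distances):
    """Per-restraint RMS deviation from the target distance over all repeats.

    target_distances: one distance per restraint (target structure).
    repeat_distances: list of repeats, each a list of distances in the same order.
    """
    n_repeats = len(repeat_distances)
    return [
        math.sqrt(sum((rep[i] - target) ** 2 for rep in repeat_distances) / n_repeats)
        for i, target in enumerate(target_distances)
    ]


def rmsf(repeat_distances):
    """Per-restraint RMS fluctuation about the mean distance over all repeats."""
    n_repeats = len(repeat_distances)
    result = []
    for i in range(len(repeat_distances[0])):
        values = [rep[i] for rep in repeat_distances]
        mean = sum(values) / n_repeats
        result.append(math.sqrt(sum((v - mean) ** 2 for v in values) / n_repeats))
    return result


# Toy data (made up): two restraints, two repeats, distances in Å.
target = [5.0, 8.0]
repeats = [[6.0, 7.0], [6.0, 9.0]]

print(rmsdd(target, repeats))  # deviation from the target structure
print(rmsf(repeats))           # spread between repeats
```

In this toy case the first distance is reproduced identically in both repeats but is offset from the target (zero RMSF, non-zero RMSDD), while the second scatters around the target value (non-zero RMSF). The caption's point concerns the second kind of distance: those with large RMSF are poorly defined by the data, and DEN keeps their RMSDD small.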