Abstract
The prospect of phasing diffraction data sets ‘de novo’ for proteins with previously unseen folds is appealing but largely untested. In a first systematic exploration of phasing with Rosetta de novo models, it is shown that all-atom refinement of coarse-grained models significantly improves both model quality and performance in molecular replacement with the Phaser software. Fifteen new cases of diffraction data sets that are unambiguously phased with de novo models are presented. These diffraction data sets represent nine space groups and span a large range of solvent contents (33–79%) and asymmetric unit copy numbers (1–4). No correlation is observed between the ease of phasing and the solvent content or asymmetric unit copy number. Instead, a weak correlation is found with the length of the modeled protein: larger proteins required somewhat less accurate models for successful molecular replacement. Overall, the results of this survey suggest that de novo models can phase diffraction data for approximately one sixth of proteins of 100 residues or fewer. However, for many of these cases, ‘de novo phasing with de novo models’ requires a significant investment of computational power, much greater than 10³ CPU days per target. Improvements in conformational search methods will be necessary if molecular replacement with de novo models is to become a practical tool for targets without homology to previously solved protein structures.
Year: 2009 PMID: 19171972 PMCID: PMC2631639 DOI: 10.1107/S0907444908020039
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
Table 1. De novo phasing benchmark
| No. of residues in model | No. of molecules in ASU | Solvent content (%) | No. of models, 100 CPU days | No. of models, large-scale | Minimum F1Å: low-resolution models, 100 CPU days | Minimum F1Å: all-atom models, 100 CPU days | Minimum F1Å: all-atom models, large-scale | Minimum F1Å: models, native constraints | Minimum F1Å: overall |
|---|---|---|---|---|---|---|---|---|---|
| 51 | 1 | 43 | 3.5 × 10⁵ | 1.7 × 10⁷ | — | — | — | 0.882 | 0.882 |
| 51 | 1 | 43 | 3.5 × 10⁵ | 1.7 × 10⁷ | — | — | — | 0.627 | 0.627 |
| 55 | 1 | 46 | 2.7 × 10⁵ | 4.2 × 10⁵ | — | — | 0.745 | 0.891 | 0.709 |
| 55 | 4 | 70 | 2.6 × 10⁵ | 7.4 × 10⁵ | — | — | 0.927 | 0.982 | 0.709 |
| 61 | 2 | 72 | 2.3 × 10⁵ | 7.3 × 10⁵ | — | 0.541 | 0.656 | 0.787 | 0.541 |
| 61 | 3 | 59 | 2.3 × 10⁵ | 7.3 × 10⁵ | — | 0.672 | 0.689 | 0.836 | 0.639 |
| 65 | 1 | 41 | 2.8 × 10⁵ | 2.8 × 10⁵ | — | 0.754 | 0.708 | 0.800 | 0.677 |
| 68 | 1 | 47 | 2.4 × 10⁵ | 3.2 × 10⁵ | — | — | — | 0.882 | 0.515 |
| 71 | 2 | 35 | 2.0 × 10⁵ | 5.4 × 10⁷ | — | — | — | — | 1.000 |
| 71 | 2 | 60 | 2.0 × 10⁵ | 5.4 × 10⁷ | — | — | — | — | 0.901 |
| 71 | 1 | 33 | 2.0 × 10⁵ | 5.4 × 10⁷ | — | — | 0.690 | 0.662 | 0.549 |
| 71 | 2 | 58 | 2.0 × 10⁵ | 5.4 × 10⁷ | — | — | — | — | 0.915 |
| 71 | 1 | 73 | 2.0 × 10⁵ | 5.4 × 10⁷ | — | — | — | 0.549 | 0.549 |
| 74 | 1 | 54 | 2.8 × 10⁵ | 4.9 × 10⁵ | 0.649 | 0.622 | 0.500 | 0.635 | 0.419 |
| 74 | 4 | 60 | 2.8 × 10⁵ | 4.9 × 10⁵ | — | 0.635 | 0.716 | 0.811 | 0.635 |
| 75 | 1 | 43 | 2.3 × 10⁵ | 8.3 × 10⁶ | — | — | — | 0.307 | 0.307 |
| 85 | 1 | 28 | 2.3 × 10⁵ | 8.4 × 10⁶ | — | — | — | 0.753 | 0.459 |
| 85 | 1 | 33 | 2.3 × 10⁵ | 8.4 × 10⁶ | — | — | — | 0.800 | 0.800 |
| 89 | 2 | 49 | 1.7 × 10⁵ | 7.0 × 10⁶ | — | — | — | 0.494 | 0.494 |
| 89 | 2 | 46 | 1.7 × 10⁵ | 7.0 × 10⁶ | — | — | — | 0.674 | 0.674 |
| 99 | 1 | 51 | 1.6 × 10⁵ | 9.2 × 10⁵ | — | — | — | — | 0.747 |
| 105 | 1 | 35 | 1.5 × 10⁵ | 4.4 × 10⁵ | — | — | 0.400 | 0.600 | 0.400 |
| 106 | 1 | 43 | 1.8 × 10⁵ | 1.5 × 10⁵ | — | 0.453 | 0.443 | 0.491 | 0.283 |
| 106 | 2 | 45 | 1.8 × 10⁵ | 1.5 × 10⁵ | — | — | 0.660 | 0.594 | 0.585 |
| 106 | 4 | 42 | 1.8 × 10⁵ | 1.5 × 10⁵ | — | 0.538 | — | 0.689 | 0.538 |
| 117 | 2 | 47 | 1.5 × 10⁵ | 1.1 × 10⁵ | — | 0.453 | 0.521 | 0.897 | 0.436 |
| 128 | 2 | 57 | 1.2 × 10⁵ | 3.5 × 10⁶ | — | — | 0.508 | 0.398 | 0.398 |
| 128 | 1 | 79 | 1.2 × 10⁵ | 3.5 × 10⁶ | — | 0.430 | 0.359 | 0.367 | 0.313 |
| 128 | 1 | 41 | 1.2 × 10⁵ | 3.5 × 10⁶ | — | — | — | 0.492 | 0.320 |
| 128 | 2 | 43 | 1.2 × 10⁵ | 3.5 × 10⁶ | — | — | 0.398 | 0.422 | 0.398 |
F1Å is a measure of model accuracy: the fraction of Cα atoms within 1 Å of the crystal structure of the modeled sequence. A dash (—) indicates that no model in the specified subset gave an unambiguous Phaser solution.
The Rosetta-modeled sequences were taken from an in-house curated benchmark used to test de novo modeling; in some cases the sequence does not include terminal segments (typically loops) or particular mutations present in the crystallized sequence.
Results of 100 CPU days per target without all-atom refinement, as is typically achievable by a state-of-the-art computer cluster; application of the same computational effort but including all-atom refinement led to pools of approximately one third the size.
Results from 10⁴–10⁵ CPU days per target, with all-atom refinement, as is achievable with distributed computing.
Out of each pool of de novo models, the 200 models with best energies were tested for molecular replacement.
Out of pools of approximately 50 000 models produced with the de novo method constrained with coarse native information for the backbone torsion angles, 40 models with the lowest Cα r.m.s.d. were tested for molecular replacement.
Minimum F1Å that led to an unambiguous Phaser solution among all models tested in this study, including an additional 50 models with the lowest Cα r.m.s.d. to the crystal structure for each set (results not separately shown). These values are used as estimates of the ‘ease of phasing’ for each data set (see Table 2).
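The accuracy measure and the ‘ease of phasing’ estimate described in these notes can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code: it assumes the model and crystal Cα coordinates are already residue-matched and superimposed (how the superposition is chosen is not specified here), and it divides by the number of matched residues, which may differ from how truncated termini were handled. The `ModelResult` record, its field names and the `phaser_unambiguous` flag are hypothetical stand-ins; the default `k=200` mirrors the 200 lowest-energy models tested from each pool.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def f1a(model_ca: Sequence[Coord], crystal_ca: Sequence[Coord], cutoff: float = 1.0) -> float:
    """Fraction of Cα atoms lying within `cutoff` Å of the matching crystal Cα.

    Assumption: both lists are residue-matched and already superimposed.
    """
    if len(model_ca) != len(crystal_ca) or not crystal_ca:
        raise ValueError("need non-empty, residue-matched coordinate lists")
    close = sum(1 for m, c in zip(model_ca, crystal_ca) if math.dist(m, c) <= cutoff)
    return close / len(crystal_ca)


@dataclass
class ModelResult:
    # Hypothetical record for one de novo model.
    name: str
    energy: float             # lower is better
    f1a: float                # accuracy against the crystal structure
    phaser_unambiguous: bool  # True if molecular replacement gave an unambiguous solution


def lowest_energy(pool: Sequence[ModelResult], k: int = 200) -> List[ModelResult]:
    """The k models with the best (lowest) energies in a pool."""
    return sorted(pool, key=lambda m: m.energy)[:k]


def minimum_successful_f1a(tested: Sequence[ModelResult]) -> Optional[float]:
    """Smallest F1Å among models that phased; None plays the role of the table's dash."""
    hits = [m.f1a for m in tested if m.phaser_unambiguous]
    return min(hits) if hits else None


if __name__ == "__main__":
    # Made-up coordinates and energies for illustration only.
    print(f1a([(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)], [(0.0, 0.0, 0.5), (0.0, 0.0, 0.0)]))  # 0.5
    pool = [
        ModelResult("m1", -120.0, 0.82, True),
        ModelResult("m2", -118.5, 0.55, True),
        ModelResult("m3", -110.0, 0.31, False),
    ]
    print(minimum_successful_f1a(lowest_energy(pool, k=2)))  # 0.55
```

Taking the minimum over successes, rather than an average, matches the notes' use of the least accurate model that still phases as the ‘ease of phasing’ estimate.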
Figure 1. New examples of successful molecular replacement with Rosetta de novo models. (a)–(c) and (g)–(i) display correlations of Phaser translation-function Z score (TFZ) with model accuracy (the fraction of Cα atoms within 1 Å of the crystal structure). For each target, the displayed subsets are 200 randomly selected all-atom refined models (black) and 200 models with lowest energy from the 100 CPU-day low-resolution set (gray), from the 100 CPU-day all-atom refined set (magenta) and from the large-scale all-atom refined set (red). The solid line and dashed line display the mean TFZ scores and a cutoff value five standard deviations above the mean TFZ, respectively, in the randomly chosen models. Larger open circles indicate Phaser solutions with correct orientations in the unit cell (see text). (d)–(f) and (j)–(l) give overlays corresponding to each plot in (a)–(c) and (g)–(i), respectively, of the least accurate model that passes the TFZ cutoff value (red, partly transparent), nearly complete models built by ARP/wARP after molecular replacement (green) and the crystal structure (blue). In some cases, the modeled sequence did not include terminal segments present in the crystal structures [see red structures in (d)–(f) and (j)–(l)].
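The significance threshold in the Figure 1 caption (the mean TFZ of the randomly selected models plus five standard deviations) and the ‘least accurate model that passes’ can be expressed in a few lines of Python. This sketch assumes the sample standard deviation (the caption does not say which estimator was used) and a strict `>` comparison; the function names and the `(f1a, tfz)` pair layout are hypothetical. It applies only the TFZ cutoff and does not check whether the placed molecules have correct orientations, which the caption treats separately.

```python
import statistics
from typing import Optional, Sequence, Tuple


def tfz_cutoff(random_tfz: Sequence[float], n_sd: float = 5.0) -> float:
    """Mean TFZ of randomly selected models plus n_sd (sample) standard deviations."""
    return statistics.mean(random_tfz) + n_sd * statistics.stdev(random_tfz)


def least_accurate_passing(models: Sequence[Tuple[float, float]], cutoff: float) -> Optional[float]:
    """Lowest F1Å among (f1a, tfz) pairs whose TFZ exceeds the cutoff, or None if none pass."""
    passing = [f1a for f1a, tfz in models if tfz > cutoff]
    return min(passing) if passing else None


if __name__ == "__main__":
    # Made-up values for illustration only.
    random_tfz = [4.0, 5.0, 6.0, 5.0, 5.0]
    cut = tfz_cutoff(random_tfz)  # 5.0 + 5 * sqrt(0.5), about 8.54
    candidates = [(0.90, 12.0), (0.60, 9.0), (0.40, 7.0)]
    print(round(cut, 2), least_accurate_passing(candidates, cut))
```

Calibrating the cutoff on randomly chosen models, rather than on a fixed TFZ value, ties the threshold to the score distribution that each particular data set produces for mostly incorrect models.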
Table 2. Correlation of different crystallographic parameters with the minimal accuracy of a de novo model required to phase the 30 diffraction data sets in Table 1
| Crystallographic parameter | Correlation coefficient | P value |
|---|---|---|
| No. of modeled residues | −0.592 | 5.7 × 10⁻⁴ |
| Highest resolution reflection | 0.232 | 0.22 |
| Lowest resolution reflection | −0.229 | 0.22 |
| No. of copies in asymmetric unit | 0.200 | 0.29 |
| No. of reflections | −0.146 | 0.44 |
| Matthews coefficient | −0.095 | 0.62 |
| No. of reflections > 4 Å | −0.091 | 0.64 |
| No. of reflections > 6 Å | −0.079 | 0.68 |
| No. of residues in asymmetric unit | −0.038 | 0.84 |
| Solvent content | −0.022 | 0.91 |
Correlations are computed against F1Å (the fraction of Cα atoms within 1 Å of the crystal structure) of the least accurate model that gives an unambiguous Phaser hit (see Table 1).
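Table 2 does not state how its P values were obtained. They appear consistent with a two-tailed t test of Pearson's r against zero over all 30 data sets (n − 2 = 28 degrees of freedom): for example, r = −0.592 gives P ≈ 5.7 × 10⁻⁴ and r = 0.232 gives P ≈ 0.22. The Python sketch below assumes that test and evaluates the Student t distribution with the classical finite-series expressions for integer degrees of freedom, so no statistics library is needed. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_two_tailed_p(t: float, nu: int) -> float:
    """P(|T| >= |t|) for Student's t with an integer number of degrees of freedom nu >= 1."""
    theta = math.atan(abs(t) / math.sqrt(nu))
    s, c = math.sin(theta), math.cos(theta)
    if nu % 2 == 0:
        # A = sin(theta) * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ... up to c^(nu-2)]
        term = total = 1.0
        for k in range(1, nu // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        a = s * total
    else:
        # A = (2/pi) * (theta + sin(theta) * [c + (2/3)c^3 + (2*4)/(3*5)c^5 + ... up to c^(nu-2)])
        total = 0.0
        if nu > 1:
            term = total = c
            for j in range(1, (nu - 1) // 2):
                term *= (2 * j) / (2 * j + 1) * c * c
                total += term
        a = (2.0 / math.pi) * (theta + s * total)
    return 1.0 - a  # a is P(|T| < |t|)


def correlation_p_value(r: float, n: int) -> float:
    """Two-tailed P value for H0: rho = 0, using t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    nu = n - 2
    t = r * math.sqrt(nu) / math.sqrt(1.0 - r * r)
    return t_two_tailed_p(t, nu)


if __name__ == "__main__":
    # Coefficients from the first two rows of Table 2, n = 30 data sets.
    for r in (-0.592, 0.232):
        print(r, f"{correlation_p_value(r, 30):.2g}")
```

The same test also explains why the abstract calls the size dependence weak yet significant: with only 30 data sets, |r| ≈ 0.6 is enough for P < 10⁻³, while |r| ≈ 0.2 is far from significant.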
Figure 2. Dependence of de novo phasing on crystallographic parameters. The ease of phasing is estimated as the minimal accuracy required for successful molecular replacement (minimum F1Å, the fraction of Cα atoms within 1 Å of the crystal structure). No correlation is observed with the crystal solvent content (a) or the number of molecules in each asymmetric unit (b), but a statistically significant correlation is found with the number of residues in the molecular-replacement model (c). See also Table 2.