Gábor Bunkóczi, Björn Wallner, Randy J Read.
Abstract
Predicted structures submitted for CASP10 have been evaluated as molecular replacement models against the corresponding sets of structure factor amplitudes. It has been found that the log-likelihood gain score computed for each prediction correlates well with common structure quality indicators but is more sensitive when the accuracy of the models is high. In addition, it was observed that using coordinate error estimates submitted by predictors to weight the model can improve its utility in molecular replacement dramatically, and several groups have been identified who reliably provide accurate error estimates that could be used to extend the application of molecular replacement for low-homology cases.
Year: 2015 PMID: 25619999 PMCID: PMC4321884 DOI: 10.1016/j.str.2014.11.020
Source DB: PubMed Journal: Structure ISSN: 0969-2126 Impact factor: 5.006
Figure 1. Typical LLG versus GDT_TS Scatter Plots Observed for Targets
(A) Target TR705 contains two domains and refinement of one of these was requested. If the second domain is not taken into account in the likelihood calculations, the black curve is obtained, which shows no correlation between the two scores. However, by taking the contribution from the second domain into account (grey curve), a clear correlation is obtained (for scores shown, the contribution of the second domain alone is subtracted for the plot). ASU, asymmetric unit.
(B) Uninformative LLG plot for target T0653 with all models falling into the low accuracy zone.
(C) Very sensitive LLG plot for target T0717, domain 2 (taking the unpredicted domain 1 into account). Predictors have managed to model residues Val67 to Gly119 (out of 166 residues) very accurately, and this gives a clear signal in scoring with the 1.9 Å X-ray data. For the “outlier” models above GDT_TS = 35, the accuracy of the named residue segment is comparable with that of the rest of the structure.
(D) Atypically small signal observed for target T0704.
Summary of Results for Groups that Submitted Meaningful Error Estimates, Compared with the Three Best Structure-Only Predictors
| Code | Name | % Rms B Factor | I Score Constant B Factor | I Score Rms B Factor | Models above Baseline (%) | Citation |
|---|---|---|---|---|---|---|
| TS026 | ProQ2clust | 68 | −0.304 | −0.149 | 14.5 | |
| TS088 | Panther | 77 | −0.534 | −0.426 | 2.7 | |
| TS130 | Pcomb | 66 | −0.276 | −0.098 | 13.7 | |
| TS273 | IntFOLD2 | 81 | −0.416 | −0.248 | 6.5 | |
| TS277 | Bilab-ENABLE | 42 | −0.429 | −0.327 | 6.0 | |
| TS280 | ProQ2clust2 | 66 | −0.293 | −0.122 | 15.1 | |
| TS285 | McGuffin | 59 | −0.268 | −0.153 | 11.3 | |
| TS388 | ProQ2 | 80 | −0.308 | −0.204 | 11.2 | |
| TS479 | Boniecki_LoCoGRef | 55 | −0.465 | −0.408 | 7.2 | |
| TS498 | IntFOLD | 48 | −0.411 | −0.380 | 6.8 | |
| TS028 | YASARA | NA | −0.183 | | 9.8 | |
| TS301 | LEE | NA | −0.200 | | 9.3 | |
| TS330 | BAKER-ROSETTASERVER | NA | −0.186 | | 12.4 | |
% Rms B factor is the percentage of models for which B factors calculated from submitted error estimates gave the highest LLG score of all B-factor schemes evaluated. I scores are defined in Equation 4. Models above baseline indicates the percentage of models yielding higher LLG scores than the corresponding baseline structures used in the I score calculation. NA, no data available; rms, root mean square.
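The two percentage columns defined above can be illustrated with a minimal sketch. The data layout, the function names, and the LLG values are hypothetical and are not taken from the study. Two choices are assumptions: a tie between schemes is counted in favour of the rms scheme, and a model is compared with its baseline using its best LLG over the three schemes.

```python
def percent_rms_best(models):
    """Percentage of models whose rms-scheme LLG is the highest of all schemes.

    Assumption: ties are counted in favour of the rms scheme.
    """
    if not models:
        raise ValueError("at least one model is required")
    hits = sum(1 for m in models if m["llg"]["rms"] >= max(m["llg"].values()))
    return 100.0 * hits / len(models)


def percent_above_baseline(models):
    """Percentage of models scoring a higher LLG than their baseline structure.

    Assumption: the best LLG over all B-factor schemes is used for the comparison.
    """
    if not models:
        raise ValueError("at least one model is required")
    hits = sum(1 for m in models if max(m["llg"].values()) > m["baseline_llg"])
    return 100.0 * hits / len(models)


# Hypothetical LLG values for four models from one group.
example_models = [
    {"llg": {"original": 10.0, "rms": 25.0, "constant": 20.0}, "baseline_llg": 30.0},
    {"llg": {"original": 5.0, "rms": 8.0, "constant": 12.0}, "baseline_llg": 10.0},
    {"llg": {"original": 40.0, "rms": 55.0, "constant": 50.0}, "baseline_llg": 45.0},
    {"llg": {"original": 3.0, "rms": 2.0, "constant": 1.0}, "baseline_llg": 9.0},
]

print(percent_rms_best(example_models))
print(percent_above_baseline(example_models))
```

The I scores themselves are not sketched here because Equation 4 is not reproduced in this record.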
Figure 2. Average Z Scores for Predictors Calculated with All Three B-Factor Schemes
In the original scheme, the numbers appearing in the B-factor field were used as is; in the root mean square (Rms) scheme, these were converted into a B factor using Equation 1 and, in the constant scheme, these were set to a constant number.
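The three schemes can be sketched in a few lines. Equation 1 is not reproduced in this record; the sketch assumes it is the standard isotropic relation B = (8π²/3)·e² between a B factor (Å²) and an rms coordinate error e (Å). The function names and the default constant value are hypothetical.

```python
import math


def b_from_rms_error(rms_error):
    """Convert an rms coordinate error (in angstroms) into an isotropic B factor.

    Assumption: Equation 1 is the standard relation B = (8 * pi^2 / 3) * e^2.
    """
    return 8.0 * math.pi ** 2 / 3.0 * rms_error ** 2


def apply_scheme(b_field_values, scheme, constant_b=20.0):
    """Return per-atom B factors under one of the three schemes.

    b_field_values: numbers read from the B-factor field of a submitted model.
    constant_b: hypothetical value for the constant scheme.
    """
    if scheme == "original":
        return list(b_field_values)  # use the field as is
    if scheme == "rms":
        return [b_from_rms_error(v) for v in b_field_values]  # field holds error estimates
    if scheme == "constant":
        return [constant_b] * len(b_field_values)  # ignore the field
    raise ValueError(f"unknown scheme: {scheme!r}")


# Hypothetical error estimates of 0.5, 1.0 and 2.0 angstroms.
print(apply_scheme([0.5, 1.0, 2.0], "rms"))
```

Because the B factor grows with the square of the error estimate, atoms predicted to be poorly placed are strongly down-weighted in the rms scheme.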
Figure 3. Number of Targets Improved upon the Baseline Structure, Taking into Account All Three B-Factor Schemes
Improvement Scores for Borderline Molecular Replacement Models, Comparing the Effect of Error Estimates Using Structure-Based and Sequence-Based Alignments
| Target Code | Target No. of Residues | Target Resolution (Å) | Template Code | Template Identity (%) | Improvement (%), LSQMAN | Improvement (%), MUSCLE |
|---|---|---|---|---|---|---|
| 2har | 263 | 1.90 | 1fby_a | 15 | 80.73 | 114.79 |
| 1w69 | 390 | 2.20 | 2alx_a | 19 | −8.20 | −7.89 |
| 1vyg | 135 | 2.40 | 3elx_a | 21 | 12.49 | 6.16 |
| 1vyg | 135 | 2.40 | 2f73_a | 28 | 30.89 | 18.55 |
| 1vyg | 135 | 2.40 | 1crb_a | 28 | 23.00 | 33.32 |
| 1u2y | 496 | 1.95 | 1bli_a | 14 | −10.36 | 58.67 |
| 1lke | 184 | 1.90 | 2hzq_a | 21 | 39.74 | 73.15 |
| 1lke | 184 | 1.90 | 1z24_a | 32 | 22.63 | −1.85 |
| 1yhf | 115 | 2.00 | 2b8m_a | 12 | 28.41 | 29.55 |
| 1ot2 | 686 | 2.10 | 3edd_a | 18 | −0.24 | 29.15 |
| 1p3c | 215 | 1.50 | 1mza_a | 17 | 17.81 | −21.49 |
| 1icn | 131 | 1.74 | 2ft9_a | 30 | 36.03 | 23.79 |
| 1z07 | 166 | 1.81 | 1r4a_a | 20 | 41.01 | 72.38 |
| 1z07 | 166 | 1.81 | 1zd9_a | 23 | 63.00 | 74.62 |
| 1dzx | 215 | 2.18 | 2irp_a | 23 | 15.82 | 24.83 |
| 1eem | 241 | 2.00 | 1fw1_a | 22 | 52.94 | 31.34 |
| 2ikg | 316 | 1.43 | 1pz1_a | 19 | 66.45 | 26.14 |
| 1t40 | 316 | 1.80 | 1pz1_a | 19 | 80.87 | 20.99 |
| 7taa | 478 | 1.99 | 3dhu_a | 17 | 25.36 | −11.44 |
| 1e0s | 174 | 2.28 | 2eqb_a | 16 | 25.04 | −27.30 |
The resolution column corresponds to the resolution of the data used for the calculation and not the full resolution of the data. Improvement is defined as the difference between the error-weighted LLG, computed using B factors calculated from coordinate errors predicted using ProQ2, and the LLG, computed using constant B factors, normalized by the absolute value of the LLG calculated with the constant B-factor scheme.
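The improvement score defined above can be written as a short function. This is a minimal sketch; the function name and the LLG values in the usage lines are hypothetical and are not taken from the study.

```python
def improvement_percent(llg_weighted, llg_constant):
    """Relative LLG change (%) due to error weighting.

    llg_weighted: LLG computed with B factors derived from predicted coordinate errors.
    llg_constant: LLG computed with the constant B-factor scheme.
    """
    if llg_constant == 0:
        raise ValueError("constant-scheme LLG must be non-zero for normalization")
    return 100.0 * (llg_weighted - llg_constant) / abs(llg_constant)


# Hypothetical LLG values.
print(improvement_percent(60.0, 40.0))   # weighting raises the LLG
print(improvement_percent(30.0, 40.0))   # weighting lowers the LLG
```

Normalizing by the absolute value keeps the sign of the score tied to the direction of the change, so a negative constant-scheme LLG still yields a positive improvement when weighting raises the score.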