John C Faver, Mark L Benson, Xiao He, Benjamin P Roberts, Bing Wang, Michael S Marshall, C David Sherrill, Kenneth M Merz.
Abstract
The routine prediction of three-dimensional protein structure from sequence remains a challenge in computational biochemistry. It has been intuited that calculated energies from physics-based scoring functions are able to distinguish native from nonnative folds based on previous performance with small proteins and that conformational sampling is the fundamental bottleneck to successful folding. We demonstrate that as protein size increases, errors in the computed energies become a significant problem. We show, by using error probability density functions, that physics-based scores contain significant systematic and random errors relative to accurate reference energies. These errors propagate throughout an entire protein and distort its energy landscape to such an extent that modern scoring functions should have little chance of success in finding the free energy minima of large proteins. Nonetheless, by understanding errors in physics-based score functions, they can be reduced in a post-hoc manner, improving accuracy in energy computation and fold discrimination.
Year: 2011 PMID: 21541343 PMCID: PMC3081830 DOI: 10.1371/journal.pone.0018868
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Example model systems used to build up interactions in proteins.
Accurate interaction energies for the model systems are assumed to yield accurate global interaction energies for a folded protein.
Figure 2. Distortions in computed energy landscapes due to error propagation.
If each microstate of a protein under study contains a significant amount of error in its calculated energy (shown here as error bars), computed folding surfaces become distorted with respect to the actual folding surface. This effect introduces difficulty in distinguishing between local minima on the folding surface and in finding the native folds of proteins. This effect is magnified for especially large proteins with many intramolecular contacts contributing to their stable protein folds.
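The effect described in this caption can be made concrete with a small sketch. It assumes, purely for illustration, that the errors in the two computed energies are independent and Gaussian with zero mean; the function name and the example numbers are hypothetical and do not come from the paper.

```python
import math


def misranking_probability(true_gap, sigma_a, sigma_b):
    """Probability that computed energies invert the order of two states.

    true_gap: actual energy of state B minus state A (kcal/mol), positive
              when A is the true lower-energy state.
    sigma_a, sigma_b: standard deviations of the random error in each
              computed energy. Independent zero-mean Gaussian errors are
              an assumption of this sketch.
    """
    # The error in the computed gap has variance sigma_a^2 + sigma_b^2.
    sigma_gap = math.hypot(sigma_a, sigma_b)
    if sigma_gap == 0.0:
        return 0.0 if true_gap > 0 else 1.0
    # P(computed gap < 0) is the standard normal CDF at -true_gap / sigma_gap.
    return 0.5 * math.erfc(true_gap / (sigma_gap * math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical 5 kcal/mol gap, evaluated with small and large random errors.
    for sigma in (1.0, 5.0, 15.0):
        p = misranking_probability(5.0, sigma, sigma)
        print(f"sigma = {sigma:5.1f} kcal/mol -> misranking probability {p:.3f}")
```

As the random error grows relative to the true gap, the probability approaches 0.5, at which point the score cannot tell the two minima apart.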
Interaction Energy Error Statistics of the 1UBQ Fragment Database.
| Method | μ (all) | σ² (all) | μ (vdW) | σ² (vdW) | μ (polar) | σ² (polar) | R-factor (a) |
| GAFF | 0.36 | 3.26 | 0.25 | 0.36 | 0.46 | 5.64 | 0.127 |
| FF99SB (b) | 0.73 | 4.04 | 0.12 | 1.27 | 1.22 | 5.83 | 0.170 |
| FF03 (b) | 0.83 | 6.61 | 0.18 | 0.81 | 1.36 | 10.86 | 0.259 |
| AM1 | 3.15 | 9.50 | 1.04 | 0.70 | 4.85 | 10.28 | 0.373 |
| PM3 | 2.65 | 7.89 | 0.14 | 0.77 | 4.67 | 4.59 | 0.352 |
| PM6 | 1.67 | 2.24 | 0.84 | 0.32 | 2.34 | 2.82 | 0.211 |
| PM6-DH2 | 0.30 | 1.23 | −0.09 | 0.10 | 0.62 | 1.95 | 0.071 |
| PDDG | 3.21 | 16.23 | −0.62 | 0.90 | 6.30 | 7.48 | 0.484 |
| HF/6-31G* | 1.94 | 1.32 | 2.27 | 1.14 | 1.68 | 1.30 | 0.153 |
| HF/aDZ | 2.14 | 1.22 | 2.29 | 1.11 | 2.02 | 1.29 | 0.176 |
| HF/aTZ | 2.10 | 1.17 | 2.28 | 1.10 | 1.95 | 1.17 | 0.171 |
| HF/aQZ | 2.08 | 1.16 | 2.28 | 1.10 | 1.93 | 1.15 | 0.170 |
| MP2/6-31G* | 1.24 | 0.64 | 1.12 | 0.28 | 1.34 | 0.91 | 0.146 |
| MP2/aDZ | 0.48 | 0.16 | 0.21 | 0.01 | 0.69 | 0.19 | 0.061 |
| MP2/aTZ | 0.16 | 0.02 | 0.05 | 0.00 | 0.24 | 0.02 | 0.023 |
| B97-D/TZVP | 0.20 | 1.06 | −0.29 | 0.02 | 0.60 | 1.58 | 0.087 |
| M06/6-31G* | 0.75 | 0.42 | 0.63 | 0.12 | 0.85 | 0.64 | 0.104 |
| M06/aTZ | 0.73 | 0.16 | 0.57 | 0.08 | 0.85 | 0.19 | 0.090 |
| M06-L/6-31G* | 0.71 | 0.43 | 0.40 | 0.10 | 0.96 | 0.57 | 0.103 |
| M06-L/aTZ | 0.75 | 0.14 | 0.55 | 0.07 | 0.91 | 0.14 | 0.096 |
Mean (μ) and variance (σ²) of interaction energy deviations (kcal/mol) from reference energies (a mix of MP2/CBS and CCSD(T)/CBS) for the interacting fragment molecules present in ubiquitin. The set of fragments was divided into 42 van der Waals interactions and 50 polar interactions. The related plots are presented in Table S1. (a) The calculated R-factor serves as an analogy to the residual minimized in crystallographic structure refinement. A desirable value of R-factor would be less than 0.1. (b) The force field-based atomic charge parameters were scaled to yield the correct net charge on each fragment system.
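The record does not spell out how the R-factor is computed. By analogy with the crystallographic residual, which divides the summed absolute differences between observed and calculated values by the summed absolute observed values, one plausible form is sketched below. The formula, the function name `energy_r_factor`, and the example energies are assumptions made here for illustration, not the authors' stated definition.

```python
def energy_r_factor(reference, computed):
    """Crystallography-style residual applied to interaction energies.

    Assumed form: R = sum(|E_ref - E_calc|) / sum(|E_ref|).
    Both arguments are sequences of interaction energies in kcal/mol,
    one entry per fragment pair, in the same order.
    """
    if len(reference) != len(computed):
        raise ValueError("reference and computed must have the same length")
    denominator = sum(abs(e_ref) for e_ref in reference)
    if denominator == 0.0:
        raise ValueError("reference energies sum to zero in magnitude")
    numerator = sum(abs(e_ref - e_calc) for e_ref, e_calc in zip(reference, computed))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical energies for three fragment pairs (not data from the table).
    reference = [-5.0, -2.0, -3.0]
    computed = [-4.5, -2.5, -3.0]
    print(f"R-factor = {energy_r_factor(reference, computed):.3f}")
```

Under this reading, a method meets the caption's target when the summed absolute deviations are less than a tenth of the summed absolute reference energies.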
Figure 3. Histogram and probability density functions describing errors in B97-D/TZVP absolute electronic interaction energies of molecular fragments built from the native fold of ubiquitin.
Figure 4. Dependence of random error estimates on chain length.
Larger protein folds have more intramolecular interactions and thus larger propagated random errors in evaluated total energies. This effect is expected to lead to difficulty in predicting the native folds of large proteins since it leads to unpredictable distortions in the overall energy surface.
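A minimal sketch of how per-interaction error statistics might propagate to a whole fold is given below. It uses standard first-order error propagation under the assumption that the errors of individual interactions are independent: systematic errors add linearly, while random errors add in quadrature. The per-interaction means and variances are the GAFF values from the table above; the function names, the sign convention (deviation = computed minus reference), and the post-hoc correction step are illustrative assumptions, not necessarily the authors' exact procedure.

```python
import math

# (mean, variance) of the per-interaction error in kcal/mol, GAFF row of the table.
GAFF_VDW = (0.25, 0.36)
GAFF_POLAR = (0.46, 5.64)


def propagate_errors(n_vdw, n_polar, vdw=GAFF_VDW, polar=GAFF_POLAR):
    """Return (systematic, random) error estimates for a fold in kcal/mol.

    Assumes independent per-interaction errors: means sum linearly and
    variances sum, so the random error is the square root of the total variance.
    """
    systematic = n_vdw * vdw[0] + n_polar * polar[0]
    random_error = math.sqrt(n_vdw * vdw[1] + n_polar * polar[1])
    return systematic, random_error


def corrected_energy(computed_energy, systematic):
    """Remove the estimated systematic error post hoc (assumed sign convention)."""
    return computed_energy - systematic


if __name__ == "__main__":
    # Interaction counts from the ubiquitin fragment set: 42 vdW and 50 polar.
    systematic, random_error = propagate_errors(42, 50)
    print(f"systematic error: {systematic:.1f} kcal/mol")
    print(f"random error:     {random_error:.1f} kcal/mol")
```

With the ubiquitin counts this gives a systematic error of 33.5 kcal/mol and a random error of roughly 17.2 kcal/mol. Doubling both interaction counts doubles the systematic term but increases the random term only by a factor of √2. Subtracting the systematic estimate leaves the random term untouched, so under these assumptions the random term still grows with chain length.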