Ivan Getov, Marharyta Petukh, Emil Alexov.
Abstract
Folding free energy is an important biophysical characteristic of proteins that reflects the overall stability of the 3D structure of macromolecules. Changes in the amino acid sequence, naturally occurring or made in vitro, may affect the stability of the corresponding protein and thus could be associated with disease. Several approaches that predict the changes of the folding free energy caused by mutations have been proposed, but no method is clearly superior to the others. The ultimate goal is not only to accurately predict the folding free energy changes, but also to characterize the structural changes induced by mutations and the physical nature of the predicted folding free energy changes. Here we report a new method to predict the Single Amino Acid Folding free Energy Changes (SAAFEC) based on a knowledge-modified Molecular Mechanics Poisson-Boltzmann (MM/PBSA) approach. The method comprises two main components: an MM/PBSA component and a set of knowledge-based terms derived from a statistical study of the biophysical characteristics of proteins. The predictor utilizes a multiple linear regression model with weighted coefficients of various terms optimized against a set of experimental data. This approach yields a correlation coefficient of 0.65 when benchmarked against 983 cases from 42 proteins in the ProTherm database. Availability: The webserver can be accessed via http://compbio.clemson.edu/SAAFEC/.
Keywords: MM/PBSA method; energy calculation; folding free energy; missense mutation
Year: 2016 PMID: 27070572 PMCID: PMC4848968 DOI: 10.3390/ijms17040512
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The effect of different dielectric constants for charged (CRG), polar (PLR), and other (OTR) amino acid groups on the correlation coefficient. The value of the correlation coefficient is represented in color, and the color scheme is on the right of the figure. We performed multiple linear regression analysis of predicted folding free energy against experimental data using EE, VE, and SP energy terms.
Figure 2. The distribution of the absolute values of the folding free energy change, |ΔΔG|, within sDB (statistical dataset).
The probabilities of the wild type (WT) and mutant type (MT) residues to cause “large effect”. The probability (P) values were calculated using the unperturbed (original) |ΔΔG|. The P′ values were calculated using the altered |ΔΔG| based on Equation (2).
| AA | WT Total Cases | WT P | WT P′ | MT Total Cases | MT P | MT P′ |
|---|---|---|---|---|---|---|
| A | 91 | 0.60 | 0.56 | 374 | 0.54 | 0.53 |
| C | 7 | 0.69 | 0.55 | 36 | 0.44 | 0.51 |
| D | 52 | 0.48 | 0.50 | 34 | 0.58 | 0.63 |
| E | 93 | 0.44 | 0.48 | 43 | 0.62 | 0.64 |
| F | 52 | 0.72 | 0.69 | 85 | 0.45 | 0.41 |
| G | 79 | 0.52 | 0.57 | 167 | 0.68 | 0.69 |
| H | 41 | 0.36 | 0.37 | 10 | 0.78 | 0.79 |
| I | 103 | 0.66 | 0.63 | 65 | 0.40 | 0.32 |
| K | 114 | 0.22 | 0.20 | 32 | 0.47 | 0.45 |
| L | 92 | 0.83 | 0.80 | 69 | 0.33 | 0.25 |
| M | 23 | 0.58 | 0.59 | 15 | 0.43 | 0.45 |
| N | 40 | 0.62 | 0.60 | 19 | 0.65 | 0.67 |
| P | 42 | 0.35 | 0.36 | 16 | 0.40 | 0.36 |
| Q | 25 | 0.25 | 0.30 | 41 | 0.26 | 0.30 |
| R | 37 | 0.51 | 0.49 | 20 | 0.62 | 0.53 |
| S | 40 | 0.30 | 0.21 | 55 | 0.60 | 0.53 |
| T | 86 | 0.50 | 0.46 | 39 | 0.74 | 0.70 |
| V | 141 | 0.61 | 0.59 | 107 | 0.58 | 0.52 |
| W | 23 | 0.81 | 0.86 | 17 | 0.44 | 0.45 |
| Y | 81 | 0.67 | 0.66 | 18 | 0.58 | 0.65 |
The probabilities of the type of the location and the secondary structure element (SSE) of the mutated site to cause “large effect”. The P′ values are calculated based on the distribution of the altered |ΔΔG| (see Equation (2)). If the total number of cases is less than 5, we assign the probability to be 0.5. See the Materials and Methods section for the definitions of SSE Type: BB, CC, CH, CS, CT, HH, HS, HT, SS, ST, and TT.
| Location Type | Total Cases | P | P′ | SSE Type | Total Cases | P | P′ |
|---|---|---|---|---|---|---|---|
| B-B | 102 | 0.74 | 0.70 | BB | 14 | 0.26 | 0.27 |
| B-PE | 132 | 0.78 | 0.76 | CC | 182 | 0.47 | 0.45 |
| E-E | 457 | 0.31 | 0.29 | CH | 6 | 0.81 | 0.65 |
| E-PE | 130 | 0.56 | 0.55 | CS | 8 | 0.59 | 0.61 |
| PE-PE | 441 | 0.65 | 0.65 | CT | 6 | 0.15 | 0.16 |
| ‒ | ‒ | ‒ | ‒ | HH | 378 | 0.55 | 0.53 |
| ‒ | ‒ | ‒ | ‒ | HS | 1 | 0.50 | 0.50 |
| ‒ | ‒ | ‒ | ‒ | HT | 2 | 0.50 | 0.50 |
| ‒ | ‒ | ‒ | ‒ | SS | 455 | 0.63 | 0.61 |
| ‒ | ‒ | ‒ | ‒ | ST | 2 | 0.50 | 0.50 |
| ‒ | ‒ | ‒ | ‒ | TT | 208 | 0.39 | 0.38 |
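The small-sample rule stated in the caption (fewer than 5 cases gives a probability of 0.5) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `threshold` separating “small” from “large” effects, and the toy data are hypothetical, and the P′ variant is not reproduced because Equation (2) is not shown in this record.

```python
from collections import defaultdict

def large_effect_probabilities(cases, threshold, min_cases=5, default=0.5):
    """Estimate the probability of a "large effect" per category.

    cases     : iterable of (category, abs_ddg) pairs
    threshold : hypothetical |ddG| cutoff separating small from large effects
    min_cases : categories with fewer cases than this get `default`
    Returns {category: (total_cases, probability)}.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [total, large]
    for category, abs_ddg in cases:
        counts[category][0] += 1
        if abs_ddg > threshold:
            counts[category][1] += 1

    result = {}
    for category, (total, large) in counts.items():
        # Too few observations: fall back to the uninformative value 0.5.
        p = large / total if total >= min_cases else default
        result[category] = (total, p)
    return result

# Toy data (made up for illustration only).
toy_cases = [("HH", 0.2), ("HH", 1.5), ("HH", 2.0), ("HH", 0.1), ("HH", 3.0),
             ("HS", 4.0)]
toy_probs = large_effect_probabilities(toy_cases, threshold=1.0)
print(toy_probs)
```

In the toy data the "HS" category has a single case, so it receives the default value regardless of its |ΔΔG|, mirroring the HS, HT, and ST rows of the table.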
The optimized weights and the corresponding p-values of the multiple linear regression analysis between calculated and experimental values of the change of folding free energy. The correlation coefficient R is reported separately for “small” and “large” effect cases. The bottom row, R final, is reported for two cases: on the right, for the entire database without distinguishing “small” and “large” effect cases; on the left, after applying Equation (3) to predict the corresponding probabilities and then applying Equations (11) and (12), respectively. The correlation coefficient in parentheses is obtained via 5-fold cross-validation.
| Term | Weight, Small | p-value, Small | Weight, Large | p-value, Large | Weight, All | p-value, All |
|---|---|---|---|---|---|---|
| Y-intercept | −7.44 × 10⁻¹ | 0.00 × 10⁰ | −2.27 × 10⁰ | 0.00 × 10⁰ | −1.58 × 10⁰ | 0.00 × 10⁰ |
| IE | 9.28 × 10⁻² | 1.36 × 10⁻² | ‒ | ‒ | ‒ | ‒ |
| EE | 5.93 × 10⁻¹ | 3.37 × 10⁻⁷ | 8.54 × 10⁻¹ | 0.00 × 10⁰ | 8.93 × 10⁻¹ | 0.00 × 10⁰ |
| VE | 7.51 × 10⁻² | 2.03 × 10⁻⁴ | 1.63 × 10⁻¹ | 0.00 × 10⁰ | 1.69 × 10⁻¹ | 0.00 × 10⁰ |
| SP | 4.53 × 10⁻¹ | 5.14 × 10⁻⁸ | 6.32 × 10⁻¹ | 0.00 × 10⁰ | 6.68 × 10⁻¹ | 0.00 × 10⁰ |
| S | ‒ | ‒ | 4.07 × 10⁻¹ | 4.18 × 10⁻² | 4.85 × 10⁻¹ | 1.03 × 10⁻³ |
| HYDR | ‒ | ‒ | ‒ | ‒ | −1.57 × 10⁰ | 9.63 × 10⁻³ |
| Ssum | −1.26 × 10⁻¹ | 1.99 × 10⁻⁵ | −6.55 × 10⁻¹ | 4.05 × 10⁻⁴ | −6.67 × 10⁻¹ | 2.24 × 10⁻⁶ |
| SASMT | NA | NA | 9.36 × 10⁻⁵ | 1.10 × 10⁻⁴ | −5.46 × 10¹ | 2.88 × 10⁻³ |
| SN/SASMT | NA | NA | −7.71 × 10⁻¹ | 6.84 × 10⁻³ | −2.78 × 10¹ | 4.77 × 10⁻² |
| R | 0.36 | ‒ | 0.62 | ‒ | ‒ | ‒ |
| # Points | 426 | ‒ | 558 | ‒ | 984 | ‒ |
| R final | 0.65 (0.61) | ‒ | ‒ | ‒ | 0.62 | ‒ |
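The kind of analysis summarized in this table, a multiple linear regression of experimental values on several energy terms followed by a 5-fold cross-validated correlation coefficient, can be sketched as below. This is a generic illustration under stated assumptions, not the SAAFEC implementation: it uses ordinary least squares from NumPy, the function names are hypothetical, the cross-validated R is computed from pooled out-of-fold predictions (the record does not say how the authors aggregated folds), and the demonstration data are synthetic placeholders for terms such as EE, VE, and SP.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y ~ b0 + X @ w. Returns (b0, w)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def predict(b0, w, X):
    """Apply the fitted linear model to a feature matrix."""
    return b0 + X @ w

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def cross_validated_r(X, y, k=5, seed=0):
    """Correlation between y and pooled out-of-fold predictions (assumed scheme)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_pred = np.empty(len(y), dtype=float)
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        b0, w = fit_mlr(X[train], y[train])      # fit on k-1 folds
        y_pred[test] = predict(b0, w, X[test])   # predict the held-out fold
    return pearson_r(y, y_pred)

# Synthetic demonstration: three made-up feature columns and arbitrary weights.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = 0.5 + X_demo @ np.array([0.9, 0.2, 0.7]) + rng.normal(scale=0.5, size=100)
b0_demo, w_demo = fit_mlr(X_demo, y_demo)
print("fitted R:", round(pearson_r(y_demo, predict(b0_demo, w_demo, X_demo)), 2))
print("5-fold cross-validated R:", round(cross_validated_r(X_demo, y_demo), 2))
```

A cross-validated R somewhat below the fitted R, as with the 0.61 versus 0.65 reported in the table, is the expected pattern, because each held-out prediction comes from a model that never saw that case.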
Figure 3. Correlation between experimental data and values calculated with the Single Amino Acid Folding free Energy Changes (SAAFEC) approach of the change in folding free energy due to single point amino acid mutations.
Performance of Single Amino Acid Folding free Energy Changes (SAAFEC) method in predicting the effect of specific groups of mutations.
| Group | Type | Cases | R | Slope | Y-Intercept | Min | Max |
|---|---|---|---|---|---|---|---|
| SSE | HS, HH, SS | 652 | 0.67 | 0.92 | 0.07 | 0.00 | 7.09 |
| SSE | CC, CT, TT | 310 | 0.58 | 0.67 | −0.08 | 0.00 | 6.13 |
| Location | B-B | 83 | 0.60 | 0.86 | −0.02 | 0.02 | 7.09 |
| Location | B-PE | 99 | 0.62 | 0.93 | 0.08 | 0.00 | 6.71 |
| Location | PE-PE | 308 | 0.64 | 0.84 | 0.04 | 0.00 | 6.39 |
| Location | E-PE | 102 | 0.52 | 0.80 | 0.01 | 0.03 | 6.39 |
| Location | E-E | 396 | 0.37 | 0.64 | −0.12 | 0.00 | 4.51 |
| Residues | Any→A | 301 | 0.69 | 0.89 | 0.18 | 0.00 | 5.54 |
| Residues | Large (RFWY)→Small (AGSV) | 67 | 0.67 | 0.86 | 0.06 | 0.04 | 6.39 |
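Per-group statistics of the kind listed in this table (number of cases, R, slope, and Y-intercept) can be computed with a short script such as the one below. It is a sketch only: the function names and toy records are hypothetical, and it assumes the line is fitted with predicted values as a function of experimental values, which this record does not state.

```python
from math import sqrt

def line_fit_stats(x, y):
    """Return (R, slope, intercept) of the least-squares line y ~ slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return r, slope, intercept

def stats_by_group(records):
    """records: iterable of (group, experimental, predicted).

    Returns {group: (cases, R, slope, intercept)}, regressing predicted
    on experimental values (an assumed convention).
    """
    grouped = {}
    for group, experimental, predicted in records:
        xs, ys = grouped.setdefault(group, ([], []))
        xs.append(experimental)
        ys.append(predicted)
    return {g: (len(xs),) + line_fit_stats(xs, ys) for g, (xs, ys) in grouped.items()}

# Toy records (made up): (group label, experimental value, predicted value).
toy_records = [
    ("B-B", 0.0, 1.0), ("B-B", 1.0, 3.0), ("B-B", 2.0, 5.0),
    ("E-E", 0.0, 0.5), ("E-E", 1.0, 0.4), ("E-E", 2.0, 1.5), ("E-E", 3.0, 1.2),
]
for group, (cases, r, slope, intercept) in stats_by_group(toy_records).items():
    print(group, cases, round(r, 2), round(slope, 2), round(intercept, 2))
```

Each group needs at least two cases with distinct experimental and predicted values, otherwise the slope or R is undefined and the division fails.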