Ludovica Montanucci, Emidio Capriotti, Yotam Frank, Nir Ben-Tal, Piero Fariselli.
Abstract
BACKGROUND: Predicting the effect of single point variations on protein stability is a crucial step toward understanding the relationship between protein structure and function. To this end, several methods have been developed to predict changes in the Gibbs free energy of unfolding (∆∆G) between wild-type and variant proteins, using sequence and structure information. Most of the available methods, however, do not exhibit the anti-symmetric prediction property, which guarantees that the predicted ∆∆G value for a variation is the exact opposite of that predicted for the reverse variation, i.e., ∆∆G(A → B) = −∆∆G(B → A), where A and B are amino acids.
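The anti-symmetry property can be illustrated with a minimal sketch. The toy score table and all function names below are hypothetical and are not part of any published method; the antisymmetrization shown (half the difference between the forward and reverse predictions) is one generic construction and is not claimed to be the one used by the authors.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical, made-up scores (kcal/mol): NOT a real stability model,
# only a deliberately asymmetric lookup used to illustrate the property.
TOY_SCORES = {
    ("A", "G"): -1.2,
    ("G", "A"): 0.7,
    ("L", "V"): -0.5,
    ("V", "L"): 0.9,
}

Predictor = Callable[[str, str], float]


def toy_predict(wild: str, variant: str) -> float:
    """Return a made-up ddG for the substitution wild -> variant."""
    return TOY_SCORES.get((wild, variant), 0.0)


def antisymmetrize(predict: Predictor) -> Predictor:
    """Wrap a predictor so that f(A, B) == -f(B, A) holds by construction."""
    def wrapped(wild: str, variant: str) -> float:
        return 0.5 * (predict(wild, variant) - predict(variant, wild))
    return wrapped


def is_antisymmetric(predict: Predictor,
                     pairs: Iterable[Tuple[str, str]],
                     tol: float = 1e-9) -> bool:
    """Check ddG(A -> B) + ddG(B -> A) == 0 (within tol) for every pair."""
    return all(abs(predict(a, b) + predict(b, a)) <= tol for a, b in pairs)
```

With the toy table, `toy_predict` fails the check on the pairs `("A", "G")` and `("L", "V")`, while `antisymmetrize(toy_predict)` passes it.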
Keywords: Multiple site variation; Protein stability; Protein variant; Unfolding free energy change
Year: 2019 PMID: 31266447 PMCID: PMC6606456 DOI: 10.1186/s12859-019-2923-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Pearson correlation between scores and ∆∆G
| Score | VariBench (1564 variants) | Broom et al. (605 variants) | S2648 (2648 variants) | P53 (42 variants) | Myoglobin (134 variants) |
|---|---|---|---|---|---|
| | 0.269 | 0.354 | 0.284 | 0.636 | 0.148 |
| | 0.398 | 0.423 | 0.387 | 0.328 | 0.454 |
| | 0.248 | 0.263 | 0.298 | 0.143 | 0.500 |
| | 0.452 | 0.581 | 0.497 | 0.423 | 0.548 |
The Pearson correlation coefficient and the scores were calculated as described in the Method section. The composition of the data sets is summarized in the Data sets section.
Performance of the sequence-based and structure-based baseline methods on the single-site variation data sets
| Coefficients derived from | Method | VariBench (1564 variants) | Broom et al. (605 variants) | S2648 (2648 variants) | P53 (42 variants) | Myoglobin (134 variants) |
|---|---|---|---|---|---|---|
| VariBench | DDGun | 0.50, 1.71 | 0.52, 1.77 | 0.50, 1.40 | 0.70, 1.45 | 0.48, 1.20 |
| VariBench | DDGun3D | 0.54, 1.70 | 0.62, 1.68 | 0.57, 1.33 | 0.67, 1.54 | 0.57, 1.0 |
| S2648 | DDGun | 0.50, 1.71 | 0.52, 1.77 | 0.50, 1.38 | 0.70, 1.48 | 0.48, 1.16 |
| S2648 | DDGun3D | 0.54, 1.71 | 0.62, 1.68 | 0.57, 1.33 | 0.67, 1.57 | 0.58, 0.98 |
| Broom et al. | DDGun | 0.48, 1.73 | 0.52, 1.78 | 0.49, 1.42 | 0.71, 1.41 | 0.45, 1.29 |
| Broom et al. | DDGun3D | 0.54, 1.69 | 0.62, 1.66 | 0.57, 1.32 | 0.68, 1.51 | 0.56, 1.0 |
| Average | DDGun | 0.49, 1.72 | 0.52, 1.78 | 0.50, 1.4 | 0.7, 1.45 | 0.47, 1.21 |
| Average | DDGun3D | 0.54, 1.70 | 0.62, 1.67 | 0.57, 1.33 | 0.67, 1.55 | 0.57, 0.99 |
Each cell reports the Pearson correlation coefficient followed by the root mean squared error (RMSE) in kcal/mol; both are defined in the Methods section.
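As a rough illustration of the two figures reported in each cell, the sketch below computes a Pearson correlation coefficient and an RMSE from paired observed and predicted ∆∆G values. It uses the standard textbook definitions, which are assumed (not confirmed here) to match those in the paper's Methods section; the function names are illustrative.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the inputs (kcal/mol)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

A high correlation does not imply a low RMSE: predictions that are perfectly correlated with the observations but on a different scale still give r = 1 with a non-zero error, which is why the tables report both.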
Anti-symmetry performances of DDGun on the Ssym data set [28]
| Method | Performance: direct variants (Pearson r, RMSE) | Performance: inverse variants (Pearson r, RMSE) | Anti-symmetry: r_dir-inv | Anti-symmetry: <δ> (kcal/mol) |
|---|---|---|---|---|
| DDGun | 0.48, 1.47 | 0.48, 1.50 | −0.99 | −0.007 |
| DDGun3D | 0.56, 1.42 | 0.53, 1.46 | −0.99 | −0.02 |
| PopMusicSym^a | 0.48, 1.58 | 0.48, 1.62 | −0.77 | 0.03 |
| SDM^a | 0.51, 1.74 | 0.32, 2.28 | −0.75 | −0.32 |
| Maestro^a | 0.52, 1.36 | 0.32, 2.09 | −0.34 | −0.58 |
| FoldX^a | 0.63, 1.56 | 0.39, 2.13 | −0.38 | −0.47 |
The Pearson correlation coefficient (r), the root mean square error (RMSE), the correlation coefficient between the ∆∆G values predicted for direct and inverse variations (r_dir-inv), and the bias (<δ>) are defined in the Method section. RMSE and <δ> are expressed in kcal/mol. ^a Values for these methods are taken from Pucci et al. [28]; they are the two best performing methods in terms of anti-symmetry (PopMusicSym and SDM) and the two methods that can also predict multiple variations (Maestro and FoldX).
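The two anti-symmetry measures can be sketched as follows. This is a hedged illustration only: the paper's own equations are not reproduced in this record, so the code assumes that r_dir-inv is the Pearson correlation between the predictions for direct and inverse variations and that the bias is the mean of their per-pair sum. Some definitions divide by 2N instead of N, which would halve the bias; that choice is an assumption here, as are the function names.

```python
import math
from typing import Sequence, Tuple


def antisymmetry_metrics(pred_direct: Sequence[float],
                         pred_inverse: Sequence[float]) -> Tuple[float, float]:
    """Return (r_dir_inv, bias) for paired direct/inverse predictions.

    r_dir_inv: Pearson correlation between direct and inverse predictions;
               -1 for a perfectly anti-symmetric predictor.
    bias:      mean of (direct + inverse) per pair, in kcal/mol;
               0 for a perfectly anti-symmetric predictor.
               ASSUMPTION: normalized by N (the number of pairs).
    """
    n = len(pred_direct)
    if n != len(pred_inverse) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_d = sum(pred_direct) / n
    mean_i = sum(pred_inverse) / n
    cov = sum((d - mean_d) * (i - mean_i)
              for d, i in zip(pred_direct, pred_inverse))
    var_d = sum((d - mean_d) ** 2 for d in pred_direct)
    var_i = sum((i - mean_i) ** 2 for i in pred_inverse)
    if var_d == 0 or var_i == 0:
        raise ValueError("correlation is undefined for constant predictions")
    r_dir_inv = cov / math.sqrt(var_d * var_i)
    bias = sum(d + i for d, i in zip(pred_direct, pred_inverse)) / n
    return r_dir_inv, bias
```

The two measures are complementary: a predictor whose inverse predictions equal the negated direct predictions plus a constant offset still has r_dir-inv = −1, and only the non-zero bias reveals the shift.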
Performance on the 914 multiple-site variations from ProTherm
| Method | Performance: direct and inverse (Pearson r, RMSE) | Performance: direct variants (Pearson r, RMSE) | Performance: inverse variants (Pearson r, RMSE) | Anti-symmetry: r_dir-inv | Anti-symmetry: <δ> (kcal/mol) |
|---|---|---|---|---|---|
| DDGun | 0.44, 2.23 | 0.37, 2.23 | 0.37, 2.23 | −1.00 | 0.00 |
| DDGun3D | 0.45, 2.27 | 0.39, 2.24 | 0.38, 2.25 | −0.99 | −0.007 |
| Maestro | 0.30, 2.59 | 0.55, 1.96 | 0.08, 3.10 | −0.20 | −0.92 |
| FoldX | 0.44, 3.10 | 0.41, 2.95 | 0.33, 3.24 | −0.71 | −0.21 |
The Pearson correlation coefficient (r), the root mean square error (RMSE), the correlation coefficient between the ∆∆G values predicted for direct and inverse variations (r_dir-inv), and the bias (<δ>) are defined in the Method section (Eqs. 9–12). RMSE and <δ> are expressed in kcal/mol.
Fig. 1 Scatter plot of the predictions of Maestro, FoldX, and DDGun3D on the PTmul data set. The x-axis shows the prediction for the direct variation, and the y-axis shows the prediction for the corresponding reciprocal variation. The predictions from Maestro, FoldX, and DDGun3D are plotted in yellow, green, and red, respectively.
Composition of the data sets used in this study
| Data set | Reference | Total variants | Number of proteins | Stabilizing (ΔΔG ≥ 0) | Destabilizing (ΔΔG < 0) |
|---|---|---|---|---|---|
| VariBench | Yang et al. | 1564 | 99 | 436 | 1128 |
| Broom | Broom et al. | 605 | 58 | 147 | 458 |
| S2648 | Dehouck et al. | 2648 | 132 | 602 | 2046 |
| P53 | Pires et al. | 42 | 1 | 11 | 31 |
| Myoglobin | Kepp et al. | 134 | 1 | 38 | 96 |
| Ssym | Pucci et al. [28] | 684 | 15 wild-type | 342 | 342 |
| PTmul | From ProTherm | 914 | 90 | 310 | 604 |
Fig. 2 Analysis of the overlap between the single-site variant data sets. Each cell reports the percentage of mutations common to the two corresponding data sets.
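A pairwise overlap of this kind could be computed along the following lines. The sketch makes two assumptions that the caption does not settle: each variant is identified by a hypothetical (protein identifier, mutation label) pair, and each percentage is taken relative to the size of the row data set, so the resulting matrix is not symmetric. The identifiers in the usage note are made up.

```python
from typing import Dict, Set, Tuple

# ASSUMPTION: a variant is keyed by (protein identifier, mutation label).
Variant = Tuple[str, str]


def overlap_percentage(reference: Set[Variant], other: Set[Variant]) -> float:
    """Percentage of variants in `reference` that also occur in `other`."""
    if not reference:
        raise ValueError("reference data set is empty")
    return 100.0 * len(reference & other) / len(reference)


def overlap_matrix(
    datasets: Dict[str, Set[Variant]]
) -> Dict[Tuple[str, str], float]:
    """Map every (row, column) pair of data set names to its overlap.

    ASSUMPTION: each cell is normalized by the size of the row data set.
    """
    return {
        (row, col): overlap_percentage(datasets[row], datasets[col])
        for row in datasets
        for col in datasets
    }
```

For example, with a four-variant set `set_a` and a two-variant set `set_b` that share a single variant, `overlap_percentage(set_a, set_b)` is 25.0 while `overlap_percentage(set_b, set_a)` is 50.0, which shows why the choice of normalization matters when reading such a figure.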