Josef Laimer, Heidi Hofer, Marko Fritz, Stefan Wegenkittl, Peter Lackner.
Abstract
BACKGROUND: Point mutations can have a strong impact on protein stability. A change in stability may subsequently lead to dysfunction and finally cause diseases. Moreover, protein engineering approaches aim to deliberately modify protein properties, where stability is a major constraint. To support basic research and protein design tasks, several computational tools for predicting the change in stability upon mutation have been developed. Comparative studies have shown the usefulness but also the limitations of such programs.
Year: 2015 PMID: 25885774 PMCID: PMC4403899 DOI: 10.1186/s12859-015-0548-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Scheme of MAESTRO’s main components and data flow.
Overview of the validation data sets used in this work
| Data set | Content | Type | Usage | Source |
|---|---|---|---|---|
| SP1 | 2648 mutations | | 5-fold cross validation | [ |
| SP2 | 350 mutations | | Performance test | [ |
| SP3 | 1925 mutations | | 20-fold cross validation | [ |
| SP4 | 1765 mutations | | 10-fold cross validation | ProTherm |
| MP | 479 multi-point mutations | | 10-fold cross validation | ProTherm |
| SS1 | 75 disulfide bonds | S-S bond | Performance test | [ |
| SS2 | 15 engineered disulfide bonds | S-S bond | Performance test | [ |
Figure 2. Correlation with experimental data. Regression results for MAESTRO on the single mutation data sets SP1 (left) and SP3 (right).
Performance comparison using the SP2 set
| Method | Predictions (a) | Pearson’s ρ (b) | Standard error (b) |
|---|---|---|---|
| AUTOMUTE | 315 | 0.46/0.45/0.45 | 1.43/1.46/1.99 |
| CUPSAT | 346 | 0.37/0.35/0.50 | 1.91/1.96/2.14 |
| Dmutant | 350 | 0.48/0.47/0.57 | 1.81/1.87/2.31 |
| Eris | 334 | 0.35/0.34/0.49 | 4.12/4.28/3.91 |
| I-Mutant-2.0 | 346 | 0.29/0.27/0.27 | 1.65/1.69/2.39 |
| PopMuSiC-2.0 | 350 | 0.67/0.67/0.71 | 1.16/1.19/1.67 |
| SDM | 350 | 0.52/0.53/0.63 | 1.80/1.81/2.11 |
| mCSM | 350 | 0.73/0.74/0.82 | 1.08/1.10/1.48 |
| MAESTRO-Score | 350 | 0.56/0.57/0.68 | −/−/− |
| MAESTRO | 350 | 0.70/0.69/0.76 | 1.13/1.17/1.67 |
Results except for MAESTRO are taken from Dehouck et al. [12] and Pires et al. [14], respectively. (a) The test set contains 350 entries; however, several methods failed to compute a ΔΔG prediction for some mutants, resulting in a reduced number of predictions. In these cases ΔΔG was set to 0.0 kcal/mol for calculating the correlation coefficient. (b) Three values are given for Pearson’s ρ as well as for the associated standard errors. They correspond to (i) the whole validation set, (ii) the subset of 309 mutants for which all methods provide a result, and (iii) the subset of 87 mutants with an experimental ΔΔG ≥ 2 kcal/mol or ΔΔG ≤ −2 kcal/mol.
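The handling of failed predictions described in the table note can be illustrated with a minimal Python sketch. It assumes that a failed prediction is represented as `None`; the function names and the sample values are hypothetical and not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_with_failures(experimental, predicted):
    """Pearson correlation where failed predictions (None) count as 0.0 kcal/mol."""
    filled = [0.0 if p is None else p for p in predicted]
    return pearson(experimental, filled)


# Hypothetical ddG values in kcal/mol; the second prediction failed.
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, None, 3.0, 4.0]
rho = correlation_with_failures(experimental, predicted)
```

Substituting 0.0 keeps every method on the same 350 entries, at the cost of penalising methods that fail on mutants with a large experimental ΔΔG.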
Figure 3. Binary classification. Classification performance of MAESTRO-Score and MAESTRO on the data sets SP1 and SP2. The data are derived from n-fold cross validation experiments.
Performance comparison on blind tests
| Method | Data set | Validation | Pearson’s ρ | Standard error |
|---|---|---|---|---|
| mCSM (a) | SP1 | 5-fold Position (c) | 0.54 | 1.23 |
| MAESTRO | SP1 | 5-fold Position (c) | 0.67 | 1.12 |
| mCSM (a) | SP1 | 5-fold Protein (d) | 0.51 | 1.26 |
| MAESTRO | SP1 | 5-fold Protein (d) | 0.63 | 1.17 |
| mCSM (a) | SP1 351 (e) | Blind Position | 0.67 | 1.19 |
| DUET (b) | SP1 351 (e) | Blind Position | 0.71 | 1.13 |
| MAESTRO | SP1 351 (e) | Blind Position | 0.71 | 1.16 |
(a) Results derived from Pires et al. [14], supplementary material. (b) Results derived from Pires et al. [15]. (c) 5-fold cross validation on position level. All mutations of a certain mutation site are either in the test or training set. (d) 5-fold cross validation on protein level. All mutations of a certain protein are either in the test or training set. (e) Blind test on a subset of the SP1 data set, provided by Pires et al. [14]. The set includes 351 mutants, whose positions are not in the remaining training set.
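The position-level and protein-level splits described in the notes can be sketched as a grouped k-fold partition. The following Python sketch is only an illustration: the record layout, the function name `grouped_folds`, the round-robin assignment, and the protein identifiers are assumptions, not the procedure used by the authors.

```python
import random
from collections import defaultdict


def grouped_folds(records, key, k=5, seed=0):
    """Split records into k folds so that records sharing key(record) stay together."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    group_keys = sorted(groups)
    random.Random(seed).shuffle(group_keys)
    folds = [[] for _ in range(k)]
    # Assign whole groups round-robin so that no group is split across folds.
    for i, g in enumerate(group_keys):
        folds[i % k].extend(groups[g])
    return folds


# Made-up mutation records: (protein id, residue position, mutant residue).
mutations = [
    ("P1", 10, "A"), ("P1", 10, "G"), ("P1", 25, "L"),
    ("P2", 7, "V"), ("P2", 7, "I"), ("P2", 90, "S"),
    ("P3", 3, "K"), ("P3", 44, "E"), ("P4", 12, "D"), ("P5", 8, "F"),
]

# Position level: all mutations at one site share a fold.
position_folds = grouped_folds(mutations, key=lambda m: (m[0], m[1]))
# Protein level: all mutations of one protein share a fold.
protein_folds = grouped_folds(mutations, key=lambda m: m[0])
```

Each fold then serves once as the test set while the remaining folds form the training set, so no mutation site (or protein) contributes to both sides of a split.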
10-fold cross validation results for our own data sets SP4 (single point) and MP (multi-point), as well as for a joined data set that includes the mutations of both SP4 and MP
| Mutations per variant | n | ρ | Standard error | ρ | Standard error |
|---|---|---|---|---|---|
| 1 | 1765 | 0.68 | 1.31 | 0.68 | 1.32 |
| >1 | 479 | 0.77 | 1.41 | 0.71 | 1.52 |
| 2 | 285 | 0.70 | 1.56 | 0.64 | 1.69 |
| 3 | 109 | 0.84 | 1.06 | 0.80 | 1.14 |
| ≥4 | 85 | 0.88 | 1.27 | 0.84 | 1.37 |
| ≥1 | 2244 | 0.69 | 1.36 | | |
Figure 4. Confidence estimation and prediction error. Deviation between experimentally determined ΔΔG values and the predictions for different confidence value ranges. The prediction error is defined as the absolute difference between the experimentally determined ΔΔG and the predicted ΔΔG. Data are given for the three main single point mutation sets (SP1, SP3, SP4) as well as the multi-point mutation set (MP). The number of predictions per group is shown at the top. In all cases, the deviation shrinks with higher confidence values.
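The grouping behind this figure can be sketched in a few lines of Python. The prediction error follows the definition in the caption (absolute difference between experimental and predicted ΔΔG). The assumption that confidence values lie in [0, 1], the number of bins, the function name, and the sample rows are illustrative only.

```python
from collections import defaultdict


def mean_abs_error_by_confidence(rows, n_bins=5):
    """Mean prediction error per confidence bin.

    rows: iterable of (experimental_ddg, predicted_ddg, confidence),
    with confidence assumed to lie in [0, 1].
    Returns {bin index: (number of predictions, mean absolute error)}.
    """
    errors = defaultdict(list)
    for experimental, predicted, confidence in rows:
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(confidence * n_bins), n_bins - 1)
        errors[idx].append(abs(experimental - predicted))
    return {idx: (len(v), sum(v) / len(v)) for idx, v in sorted(errors.items())}


# Made-up rows: (experimental ddG, predicted ddG, confidence).
rows = [
    (1.0, 0.5, 0.10),
    (2.0, 0.0, 0.15),
    (-1.0, -1.2, 0.90),
]
summary = mean_abs_error_by_confidence(rows)
```

A trend like the one reported in the caption would appear here as a mean error that decreases with increasing bin index.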
Prediction of disulfide bonds in 15 structures with known engineered disulfide bonds
| PDB ID | Disulfide bond (a) | Abs.Rank | Rel.Rank | Abs.Rank | Rel.Rank | Abs.Rank | Rel.Rank |
|---|---|---|---|---|---|---|---|
| 1FG9 | Glu7:A–Ser69:A | 1 | 0.04 | 0 | 0.00 | 15 | 0.71 |
| 1LMB | Tyr88:3–Tyr88:4 | 10 | 0.20 | 9 | 0.18 | 32 | 0.48 |
| 1RNB | Ala43–Ser80 | 2 | 0.03 | 9 | 0.14 | 3 | 0.07 |
| 1RNB | Ser85–His102 | 33 | 0.52 | 40 | 0.63 | 0 | 0.00 |
| 1SNO | Gly79–Asn118 | 3 | 0.04 | 4 | 0.05 | 22 | 0.34 |
| 1XNB | Ser100–Asn148 | 1 | 0.01 | 13 | 0.10 | 8 | 0.08 |
| 2CBA | Leu60–Ser173 | 66 | 0.45 | 68 | 0.46 | 55 | 0.47 |
| 2CI2 | Thr22–Val82 | 5 | 0.18 | 4 | 0.14 | 0 | 0.00 |
| 2LZM | Ile9–Leu164 | 31 | 0.44 | 38 | 0.54 | 13 | 0.21 |
| 2RN2 | Cys13–Asn44 | 14 | 0.16 | 21 | 0.24 | 25 | 0.33 |
| 2ST1 | Thr22–Ser87 | 37 | 0.16 | 30 | 0.13 | 44 | 0.23 |
| 3GLY | Asn20–Ala27 | 104 | 0.39 | 163 | 0.62 | 187 | 0.82 |
| 3GLY | Thr246–Cys320 | 35 | 0.13 | 34 | 0.13 | 19 | 0.08 |
| 4DFR | Pro39–Cys85 | 11 | 0.10 | 11 | 0.10 | 16 | 0.13 |
| 9RAT | Ala4–Val118 | 8 | 0.15 | 10 | 0.19 | 25 | 0.68 |
Abs.Rank: absolute rank; Rel.Rank: relative rank. (a) The letter or digit after the colon denotes the chain(s) used in case of multi-chain PDB entries.