Abstract
BACKGROUND: Point mutations can affect protein stability. Among machine learning approaches that predict protein stability from sequence information, current encoding schemes include sparse encoding and amino acid property encoding. Property encoding schemes exploit physicochemical information about the environment of the mutated residue; however, they become more complex as more properties are added to the scheme. This complexity introduces noise that reduces the accuracy of machine learning algorithms. To overcome this problem, we describe a new encoding scheme that grades the twenty amino acids into groups according to their specific property values.
Year: 2012 PMID: 22435732 PMCID: PMC3820156 DOI: 10.1186/1471-2105-13-44
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Cross-validation performance of the sequence-based SVM method with different encoding schemes
| Encoding scheme | Overall accuracy | MCC | Q(N) | Q(+) | Q(-) | Specificity (N) | Specificity (+) | Specificity (-) | PPV (N) | PPV (+) | PPV (-) | NPV (N) | NPV (+) | NPV (-) | MCC (N) | MCC (+) | MCC (-) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Capriotti¶ | 56.00 | 0.27 | 48.00 | 54.00 | 54.00 | 62.00 | 44.00 | 44.00 | - | - | - | - | - | - | 0.17 | 0.29 | 0.29 |
| Sparse | 56.81 | 0.28 | 58.85 | 54.12 | 53.51 | 61.34 | 79.31 | 81.10 | 65.35 | 65.11 | 62.31 | 62.40 | 75.16 | 72.37 | 0.21 | 0.31 | 0.33 |
| 11-factors | 56.92 | 0.28 | 59.07 | 52.32 | 51.27 | 63.56 | 80.77 | 82.87 | 68.29 | 65.46 | 60.17 | 63.32 | 72.32 | 71.37 | 0.19 | 0.32 | 0.33 |
| HEC | 56.91 | 0.29 | 58.32 | 50.56 | 52.47 | 66.55 | 81.39 | 80.34 | 65.28 | 65.43 | 63.45 | 65.79 | 74.56 | 71.57 | 0.21 | 0.32 | 0.32 |
| K-D | 55.98 | 0.25 | 57.81 | 51.64 | 49.73 | 63.72 | 78.29 | 81.57 | 63.54 | 62.14 | 63.30 | 66.57 | 73.22 | 71.11 | 0.20 | 0.34 | 0.31 |
| AAproperty15 | 59.57 | 0.31 | 61.72 | 56.13 | 57.40 | 60.96 | 79.57 | 81.48 | 68.16 | 65.89 | 67.87 | 65.02 | 76.83 | 76.71 | 0.30 | 0.35 | 0.34 |
| AAproperty15Grade | 63.63 | 0.36 | 64.15 | 58.23 | 57.62 | 61.95 | 80.35 | 82.07 | 69.81 | 62.52 | 69.12 | 67.18 | 78.31 | 78.96 | 0.34 | 0.39 | 0.36 |
All values except MCC are percentages. +, - and N: the indexes are evaluated for increasing, decreasing and neutral protein free energy stability change, respectively, according to the classification described in section 2 of Results and Discussions; for the definitions of the indexes, see "Scoring the performance" in Methods. ¶ Data from Capriotti [19].
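The per-class indexes in the table can be illustrated with a one-vs-rest calculation on a three-class confusion matrix. The following is a minimal sketch, not the paper's scoring code: it assumes the standard textbook definitions of sensitivity, specificity, PPV, NPV and MCC, assumes that Q(s) denotes the sensitivity of class s (this excerpt does not define it), and uses made-up counts. The function name `one_vs_rest_metrics` is hypothetical.

```python
from math import sqrt


def one_vs_rest_metrics(confusion, k):
    """Per-class indexes for class k, treating all other classes as negative.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                       # true k, predicted other
    fp = sum(confusion[i][k] for i in range(n)) - tp  # true other, predicted k
    tn = total - tp - fn - fp

    def ratio(a, b):
        # Return NaN instead of raising when a class is empty.
        return a / b if b else float("nan")

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "mcc": ratio(tp * tn - fp * fn, denom),
    }


# Hypothetical counts, rows and columns ordered as ("+", "-", "N").
classes = ("+", "-", "N")
confusion = [
    [30, 10, 10],
    [5, 40, 5],
    [10, 10, 30],
]

# Overall accuracy: fraction of samples on the diagonal.
correct = sum(confusion[i][i] for i in range(len(classes)))
overall_accuracy = correct / sum(sum(row) for row in confusion)

for index, label in enumerate(classes):
    print(label, one_vs_rest_metrics(confusion, index))
```

Multiplying every index except MCC by 100 gives values on the same scale as the table.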
Performance on independent datasets
| Test set | Overall accuracy | MCC | Q(N) | Q(+) | Q(-) | Specificity (N) | Specificity (+) | Specificity (-) | PPV (N) | PPV (+) | PPV (-) | NPV (N) | NPV (+) | NPV (-) | MCC (N) | MCC (+) | MCC (-) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| clean.TEST_ | 71.82 | 0.51 | 80.65 | 63.21 | 62.35 | 66.90 | 92.10 | 92.17 | 71.19 | 72.75 | 72.72 | 77.13 | 88.34 | 88.09 | 0.48 | 0.58 | 0.51 |
| clean.S1615 | 72.71 | 0.54 | 79.68 | 66.79 | 65.62 | 70.26 | 91.45 | 92.14 | 72.73 | 71.95 | 73.15 | 76.98 | 89.21 | 88.94 | 0.50 | 0.60 | 0.55 |
| clean.S388 | 74.49 | 0.56 | 81.84 | 67.70 | 66.59 | 70.29 | 92.67 | 93.19 | 73.16 | 75.37 | 76.28 | 79.31 | 89.57 | 89.29 | 0.52 | 0.63 | 0.57 |
| clean.PoPMuSiC | 72.17 | 0.53 | 76.57 | 68.40 | 66.94 | 72.04 | 90.48 | 91.18 | 72.94 | 70.57 | 71.34 | 75.39 | 89.60 | 89.22 | 0.48 | 0.59 | 0.56 |
| clean.Potapov | 71.58 | 0.52 | 79.30 | 64.47 | 63.80 | 68.34 | 91.77 | 91.77 | 71.08 | 72.41 | 72.11 | 76.97 | 88.42 | 88.21 | 0.48 | 0.58 | 0.52 |
| Average | 72.55 | 0.53 | 79.61 | 66.11 | 65.06 | 69.57 | 91.69 | 92.09 | 72.22 | 72.61 | 73.12 | 77.16 | 89.03 | 88.75 | 0.49 | 0.60 | 0.54 |
For notation, see Table 1. For independent test set details and statistics, see Tables S1 and S2.
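As a quick consistency check, the "Average" row appears to be the unweighted mean over the five test sets. The sketch below recomputes it for the first two numeric columns only, with values copied from the table above; the helper `column_means` is a hypothetical name, and treating the row as a plain arithmetic mean is an assumption that this check happens to support.

```python
# First two numeric columns of each test-set row, copied from the table.
rows = {
    "clean.TEST_": (71.82, 0.51),
    "clean.S1615": (72.71, 0.54),
    "clean.S388": (74.49, 0.56),
    "clean.PoPMuSiC": (72.17, 0.53),
    "clean.Potapov": (71.58, 0.52),
}
reported_average = (72.55, 0.53)


def column_means(table):
    """Unweighted mean of each column across all rows, rounded to 2 decimals."""
    columns = zip(*table.values())
    return tuple(round(sum(col) / len(table), 2) for col in columns)


means = column_means(rows)
print(means, reported_average)
```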
Figure 1. ROC curves for different encoding schemes of the sequence-based predictor.
The amino acid property scores used in the AAproperty15 encoding scheme
| AA | Steric parameter | Hydrogen Bond Donors | Hydrophobicity scale | Hydrophilicity scale | Average Accessible surface area | van der Waals Parameter R0 | van der Waals Parameter Epsilon | Free Energy | Average side | Polarity | Isoelectric point | He | Ee | Ce | KDe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0.510 | 0.169 | 0.471 | 0.279 | 0.141 | 0.294 | 0.000 | 0.262 | 0.512 | 0.000 | 0.404 | 0.811 | 0.667 | 0.700 | 0.858 |
| R | 0.667 | 0.726 | 0.321 | 1.000 | 0.905 | 0.529 | 0.327 | 0.169 | 0.372 | 1.000 | 1.000 | 0.777 | 0.691 | 0.719 | 0.011 |
| N | 0.745 | 0.390 | 0.164 | 0.658 | 0.510 | 0.235 | 0.140 | 0.313 | 0.116 | 0.065 | 0.330 | 0.691 | 0.655 | 0.790 | 0.029 |
| D | 0.745 | 0.304 | 0.021 | 0.793 | 0.515 | 0.235 | 0.140 | 0.601 | 0.140 | 0.956 | 0.000 | 0.725 | 0.624 | 0.783 | 0.029 |
| C | 0.608 | 0.314 | 0.760 | 0.072 | 0.000 | 0.559 | 0.140 | 0.947 | 0.907 | 0.028 | 0.285 | 0.661 | 0.804 | 0.737 | 0.924 |
| Q | 0.667 | 0.531 | 0.178 | 0.649 | 0.608 | 0.529 | 0.140 | 0.416 | 0.023 | 0.068 | 0.360 | 0.778 | 0.683 | 0.722 | 0.029 |
| E | 0.667 | 0.482 | 0.092 | 0.883 | 0.602 | 0.529 | 0.140 | 0.561 | 0.163 | 0.960 | 0.056 | 0.812 | 0.652 | 0.707 | 0.029 |
| G | 0.000 | 0.000 | 0.275 | 0.189 | 0.103 | 0.000 | 0.000 | 0.240 | 0.581 | 0.000 | 0.401 | 0.619 | 0.665 | 0.821 | 0.401 |
| H | 0.686 | 0.554 | 0.326 | 0.468 | 0.402 | 0.529 | 0.140 | 0.313 | 0.581 | 0.992 | 0.603 | 0.715 | 0.754 | 0.732 | 0.039 |
| I | 1.000 | 0.650 | 1.000 | 0.000 | 0.083 | 0.824 | 0.308 | 0.424 | 0.930 | 0.003 | 0.407 | 0.734 | 0.844 | 0.658 | 0.989 |
| L | 0.961 | 0.650 | 0.734 | 0.081 | 0.138 | 0.824 | 0.308 | 0.463 | 0.907 | 0.003 | 0.402 | 0.792 | 0.768 | 0.664 | 0.978 |
| K | 0.667 | 0.692 | 0.000 | 0.568 | 1.000 | 0.529 | 0.327 | 0.313 | 0.000 | 0.952 | 0.872 | 0.755 | 0.701 | 0.731 | 0.020 |
| M | 0.765 | 0.612 | 0.603 | 0.171 | 0.206 | 0.765 | 0.308 | 0.405 | 0.814 | 0.028 | 0.372 | 0.794 | 0.763 | 0.665 | 0.870 |
| F | 0.686 | 0.772 | 0.665 | 0.000 | 0.114 | 0.853 | 0.682 | 0.462 | 1.000 | 0.007 | 0.339 | 0.747 | 0.807 | 0.676 | 0.943 |
| P | 0.353 | 0.372 | 0.012 | 0.198 | 0.411 | 0.588 | 0.271 | 0.000 | 0.302 | 0.030 | 0.442 | 0.629 | 0.608 | 0.835 | 0.168 |
| S | 0.520 | 0.172 | 0.155 | 0.477 | 0.303 | 0.206 | 0.000 | 0.240 | 0.419 | 0.032 | 0.364 | 0.681 | 0.711 | 0.773 | 0.310 |
| T | 0.490 | 0.349 | 0.256 | 0.523 | 0.337 | 0.235 | 0.140 | 0.313 | 0.419 | 0.032 | 0.362 | 0.667 | 0.780 | 0.748 | 0.332 |
| W | 0.686 | 1.000 | 0.681 | 0.207 | 0.219 | 1.000 | 1.000 | 0.537 | 0.674 | 0.040 | 0.390 | 0.759 | 0.815 | 0.661 | 0.289 |
| Y | 0.686 | 0.796 | 0.591 | 0.477 | 0.454 | 0.853 | 0.682 | 1.000 | 0.419 | 0.031 | 0.362 | 0.721 | 0.813 | 0.692 | 0.214 |
| V | 0.745 | 0.487 | 0.859 | 0.036 | 0.094 | 0.647 | 0.234 | 0.369 | 0.674 | 0.003 | 0.399 | 0.714 | 0.864 | 0.655 | 0.985 |
The graded amino acid property encoding scheme AAproperty15Grade
| AA | Steric | Donors | Hydrophobicity | Hydrophilicity | Accessible | R0 | Epsilon | FreeEnergy | Angle | Polarity | Isoelectric | He | Ee | Ce | KDe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0.1 | 0.1 | 0.5 | 0.5 | 0.1 | 0.1 | 0.1 | 0.1 | 0.5 | 0.1 | 0.9 | 0.9 | 0.1 | 0.5 | 0.9 |
| R | 0.5 | 0.9 | 0.5 | 0.9 | 0.9 | 0.5 | 0.5 | 0.1 | 0.5 | 0.9 | 0.9 | 0.9 | 0.5 | 0.5 | 0.1 |
| N | 0.9 | 0.5 | 0.1 | 0.9 | 0.9 | 0.1 | 0.5 | 0.1 | 0.1 | 0.5 | 0.5 | 0.1 | 0.1 | 0.9 | 0.1 |
| D | 0.9 | 0.1 | 0.1 | 0.9 | 0.9 | 0.1 | 0.5 | 0.9 | 0.1 | 0.9 | 0.1 | 0.5 | 0.1 | 0.9 | 0.1 |
| C | 0.1 | 0.1 | 0.9 | 0.1 | 0.1 | 0.5 | 0.5 | 0.9 | 0.9 | 0.5 | 0.1 | 0.1 | 0.9 | 0.5 | 0.9 |
| Q | 0.5 | 0.5 | 0.1 | 0.9 | 0.9 | 0.5 | 0.5 | 0.5 | 0.1 | 0.5 | 0.5 | 0.9 | 0.5 | 0.5 | 0.1 |
| E | 0.5 | 0.5 | 0.1 | 0.9 | 0.9 | 0.5 | 0.5 | 0.9 | 0.1 | 0.9 | 0.1 | 0.9 | 0.1 | 0.5 | 0.1 |
| G | 0.1 | 0.1 | 0.5 | 0.5 | 0.1 | 0.1 | 0.1 | 0.1 | 0.5 | 0.1 | 0.5 | 0.1 | 0.1 | 0.9 | 0.5 |
| H | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.1 | 0.5 | 0.9 | 0.9 | 0.1 | 0.5 | 0.5 | 0.1 |
| I | 0.9 | 0.5 | 0.9 | 0.1 | 0.1 | 0.9 | 0.5 | 0.5 | 0.9 | 0.1 | 0.9 | 0.5 | 0.9 | 0.1 | 0.9 |
| L | 0.9 | 0.5 | 0.9 | 0.1 | 0.1 | 0.9 | 0.5 | 0.5 | 0.9 | 0.1 | 0.9 | 0.9 | 0.5 | 0.1 | 0.9 |
| K | 0.5 | 0.9 | 0.1 | 0.9 | 0.9 | 0.5 | 0.5 | 0.1 | 0.1 | 0.9 | 0.9 | 0.5 | 0.5 | 0.5 | 0.1 |
| M | 0.9 | 0.5 | 0.5 | 0.1 | 0.5 | 0.9 | 0.5 | 0.5 | 0.9 | 0.5 | 0.5 | 0.9 | 0.5 | 0.1 | 0.9 |
| F | 0.5 | 0.9 | 0.9 | 0.1 | 0.1 | 0.9 | 0.9 | 0.5 | 0.9 | 0.1 | 0.5 | 0.5 | 0.9 | 0.1 | 0.9 |
| P | 0.1 | 0.5 | 0.1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.1 | 0.5 | 0.5 | 0.9 | 0.1 | 0.1 | 0.9 | 0.5 |
| S | 0.1 | 0.1 | 0.1 | 0.5 | 0.5 | 0.1 | 0.1 | 0.1 | 0.5 | 0.5 | 0.5 | 0.1 | 0.5 | 0.5 | 0.5 |
| T | 0.1 | 0.1 | 0.5 | 0.5 | 0.5 | 0.1 | 0.5 | 0.1 | 0.5 | 0.5 | 0.5 | 0.1 | 0.5 | 0.5 | 0.5 |
| W | 0.5 | 0.9 | 0.9 | 0.5 | 0.5 | 0.9 | 0.9 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.9 | 0.1 | 0.5 |
| Y | 0.5 | 0.9 | 0.5 | 0.5 | 0.5 | 0.9 | 0.9 | 0.9 | 0.5 | 0.5 | 0.5 | 0.5 | 0.9 | 0.1 | 0.5 |
| V | 0.9 | 0.5 | 0.9 | 0.1 | 0.1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.1 | 0.5 | 0.1 | 0.9 | 0.1 | 0.9 |
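The relationship between the AAproperty15 scores and the graded values can be sketched for a single property. The example below maps the steric parameter column onto the three grades 0.1, 0.5 and 0.9 and compares the result with the Steric column of the graded table. The cut points 0.65 and 0.70 are hypothetical: they were chosen by inspection because they reproduce this one column, and this excerpt does not state how the authors chose the groups. The function name `grade` is likewise an assumption.

```python
# Steric parameter scores, copied from the AAproperty15 table.
STERIC = {
    "A": 0.510, "R": 0.667, "N": 0.745, "D": 0.745, "C": 0.608,
    "Q": 0.667, "E": 0.667, "G": 0.000, "H": 0.686, "I": 1.000,
    "L": 0.961, "K": 0.667, "M": 0.765, "F": 0.686, "P": 0.353,
    "S": 0.520, "T": 0.490, "W": 0.686, "Y": 0.686, "V": 0.745,
}

# Steric column of the AAproperty15Grade table, for comparison.
GRADED_STERIC = {
    "A": 0.1, "R": 0.5, "N": 0.9, "D": 0.9, "C": 0.1,
    "Q": 0.5, "E": 0.5, "G": 0.1, "H": 0.5, "I": 0.9,
    "L": 0.9, "K": 0.5, "M": 0.9, "F": 0.5, "P": 0.1,
    "S": 0.1, "T": 0.1, "W": 0.5, "Y": 0.5, "V": 0.9,
}


def grade(value, low_cut, high_cut, levels=(0.1, 0.5, 0.9)):
    """Map a continuous property value onto one of three grades."""
    if value < low_cut:
        return levels[0]
    if value < high_cut:
        return levels[1]
    return levels[2]


# Hypothetical cut points, picked only to match the Steric column.
graded = {aa: grade(score, 0.65, 0.70) for aa, score in STERIC.items()}
print(graded == GRADED_STERIC)
```

Other properties would need their own cut points; fixed equal-width bins at 1/3 and 2/3 would not reproduce the Steric column, since alanine (0.510) is graded 0.1.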