Shaolei Teng, Anand K Srivastava, Liangjiang Wang.
Abstract
BACKGROUND: Protein destabilization is a common mechanism by which amino acid substitutions cause human diseases. Although several machine learning methods have been reported for predicting protein stability changes upon amino acid substitutions, previous studies did not use relevant sequence features representing biological knowledge for classifier construction.
Year: 2010 PMID: 21047386 PMCID: PMC2975416 DOI: 10.1186/1471-2164-11-S2-S5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
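Effect of window size on sequence-based prediction of protein stability changes. In this and the following tables, AC, SN, SP and ST are percentages (accuracy, sensitivity, specificity, and strength, where ST is the mean of SN and SP), MCC is the Matthews correlation coefficient, and ROC is the area under the ROC curve.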
| Window size | AC | SN | SP | ST | MCC | ROC |
|---|---|---|---|---|---|---|
| 1 | 66.92 | 70.69 | 65.20 | 67.94 | 0.3349 | 0.7425 |
| 3 | 73.91 | 74.83 | 73.49 | 74.16 | 0.4554 | 0.7996 |
| 5 | 77.51 | 76.67 | 77.90 | 77.28 | 0.5194 | 0.8512 |
| 7 | 80.80 | 76.43 | 82.83 | 79.63 | 0.5750 | 0.8737 |
| 9 | 81.28 | 75.66 | 83.78 | 79.72 | 0.5774 | 0.8755 |
| 11 | 81.82 | 74.48 | 85.11 | 79.79 | 0.5843 | 0.8804 |
| 13 | 82.10 | 71.84 | 86.67 | 79.26 | 0.5824 | 0.8797 |
| 15 | 81.45 | 69.71 | 86.75 | 78.23 | 0.5665 | 0.8775 |
| 17 | 81.88 | 69.50 | 87.58 | 78.54 | 0.5779 | 0.8799 |
| 19 | 81.21 | 68.80 | 86.98 | 77.89 | 0.5627 | 0.8779 |
| 21 | 81.29 | 68.98 | 86.98 | 77.98 | 0.5645 | 0.8735 |
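The measures reported in the table can be computed from confusion-matrix counts. The following is a minimal sketch, assuming AC, SN and SP denote accuracy, sensitivity and specificity, and that ST is the mean of SN and SP (this matches the rows above, e.g. (74.83 + 73.49) / 2 = 74.16 for window size 3). The function name and the example counts are hypothetical and do not come from the study.

```python
import math


def stability_metrics(tp, fn, tn, fp):
    """Return AC, SN, SP, ST (in percent) and MCC from confusion-matrix counts.

    Assumption: ST is the arithmetic mean of SN and SP, as the table rows suggest.
    """
    sn = 100.0 * tp / (tp + fn)                    # sensitivity
    sp = 100.0 * tn / (tn + fp)                    # specificity
    ac = 100.0 * (tp + tn) / (tp + fn + tn + fp)   # overall accuracy
    st = (sn + sp) / 2.0                           # mean of SN and SP
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; use 0.0 then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"AC": ac, "SN": sn, "SP": sp, "ST": st, "MCC": mcc}


# Hypothetical counts, for illustration only (not taken from the study).
m = stability_metrics(tp=70, fn=30, tn=80, fp=20)
print({name: round(value, 4) for name, value in m.items()})
```

Because ST averages SN and SP, it is less sensitive to class imbalance than AC, which may explain why AC keeps rising for larger windows while ST levels off.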
Figure 1. ROC curves showing the effect of context information on prediction of protein stability changes upon amino acid substitutions.
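The window sizes compared above refer to the number of residues of sequence context taken around the substitution site. As a rough illustration only, a window of odd size centred on a site could be extracted as follows. The function name, the 0-based site index, and the use of `X` as a padding character at the sequence ends are assumptions made for this sketch, not details confirmed by the source.

```python
def sequence_window(sequence, site, size, pad="X"):
    """Return `size` residues centred on the 0-based position `site`.

    Positions that fall outside the sequence are filled with `pad`
    (an assumed convention for handling sites near the termini).
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    if not 0 <= site < len(sequence):
        raise ValueError("site is outside the sequence")
    half = size // 2
    padded = pad * half + sequence + pad * half
    # After padding, the residue at `site` sits at index site + half,
    # so the window starts at index `site` of the padded string.
    return padded[site:site + size]


# Toy sequence, for illustration only.
print(sequence_window("ACDEFGHIK", 4, 5))  # DEFGH
print(sequence_window("ACDEFGHIK", 0, 5))  # XXACD
```

A window size of 1 reduces to the substituted residue alone, which corresponds to the first row of the table.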
Predictive performance of classifiers constructed using single sequence features.
| Features | AC | SN | SP | ST | MCC | ROC |
|---|---|---|---|---|---|---|
| H | 75.88 | 71.62 | 77.79 | 74.70 | 0.4728 | 0.8237 |
| K | 73.29 | 73.90 | 73.02 | 73.46 | 0.4402 | 0.7925 |
| M | 68.06 | 73.52 | 65.62 | 69.57 | 0.3629 | 0.7480 |
| P | 75.94 | 71.24 | 78.04 | 74.64 | 0.4718 | 0.8234 |
| Co | 70.18 | 71.62 | 69.53 | 70.58 | 0.3838 | 0.7586 |
| A | 76.41 | 74.29 | 77.36 | 75.82 | 0.4904 | 0.8206 |
| B | 78.18 | 74.48 | 79.83 | 77.15 | 0.5199 | 0.8503 |
| C | 72.18 | 71.05 | 72.68 | 71.86 | 0.4116 | 0.7847 |
| Aa | 79.12 | 76.57 | 80.26 | 78.41 | 0.5431 | 0.8459 |
| Bu | 82.06 | 75.62 | 84.94 | 80.28 | 0.5919 | 0.8777 |
| S1 | 69.82 | 70.86 | 69.36 | 70.11 | 0.3756 | 0.7754 |
| S2 | 70.24 | 72.19 | 69.36 | 70.78 | 0.3875 | 0.7665 |
| S3 | 82.53 | 72.19 | 87.15 | 79.67 | 0.5922 | 0.8835 |
| F | 61.41 | 63.62 | 60.43 | 62.02 | 0.2226 | 0.6728 |
| R | 66.47 | 65.14 | 67.06 | 66.10 | 0.3008 | 0.7140 |
| Mc | 78.35 | 73.52 | 80.51 | 77.02 | 0.5202 | 0.8417 |
| No | 69.82 | 74.86 | 67.57 | 71.22 | 0.3944 | 0.7656 |
| Rf | 62.06 | 73.71 | 56.85 | 65.28 | 0.2831 | 0.6889 |
| Rm | 75.94 | 69.90 | 78.64 | 74.27 | 0.4672 | 0.8118 |
| Tt | 83.59 | 66.48 | 91.23 | 78.86 | 0.6035 | 0.8704 |
Figure 2. ROC curves showing the different performance levels of classifiers constructed using individual sequence features.
Predictive performance of classifiers constructed by combining the best single features.
| Features | AC | SN | SP | ST | MCC | ROC |
|---|---|---|---|---|---|---|
| S3 | 82.53 | 72.19 | 87.15 | 79.67 | 0.5922 | 0.8835 |
| S3, Bu | 83.41 | 68.00 | 90.30 | 79.15 | 0.6019 | 0.8821 |
| S3, Bu, Tt | 82.88 | 61.90 | 92.26 | 77.08 | 0.5822 | 0.8725 |
| S3, Bu, Tt, B | 83.65 | 62.10 | 93.28 | 77.69 | 0.6009 | 0.8768 |
| S3, Bu, Tt, B, Aa | 83.65 | 61.90 | 93.36 | 77.63 | 0.6009 | 0.8743 |
| S3, Bu, Tt, B, Aa, Mc | 83.59 | 61.71 | 93.36 | 77.54 | 0.5993 | 0.8737 |
| All 20 features | 82.88 | 56.00 | 94.89 | 75.45 | 0.5791 | 0.8690 |
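The order in which features are added in this table appears to follow their single-feature ROC values from the previous table. The sketch below ranks the 20 features by those values and builds the cumulative subsets; it only reproduces the ordering and does not train any classifier. The variable names are illustrative, and ranking by ROC is an inference from the numbers rather than a procedure stated in this record.

```python
# Single-feature ROC values copied from the single-feature table above.
single_feature_roc = {
    "H": 0.8237, "K": 0.7925, "M": 0.7480, "P": 0.8234, "Co": 0.7586,
    "A": 0.8206, "B": 0.8503, "C": 0.7847, "Aa": 0.8459, "Bu": 0.8777,
    "S1": 0.7754, "S2": 0.7665, "S3": 0.8835, "F": 0.6728, "R": 0.7140,
    "Mc": 0.8417, "No": 0.7656, "Rf": 0.6889, "Rm": 0.8118, "Tt": 0.8704,
}

# Rank features from highest to lowest ROC value.
ranked = sorted(single_feature_roc, key=single_feature_roc.get, reverse=True)

# Cumulative subsets of the top-ranked features: top 1, top 2, ..., top 6.
cumulative_subsets = [ranked[:k] for k in range(1, 7)]

for subset in cumulative_subsets:
    print(", ".join(subset))
```

The final line printed is `S3, Bu, Tt, B, Aa, Mc`, matching the largest combined subset in the table. Note that adding features in this order does not improve ROC beyond S3 alone, whereas the subsets in the next table do.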
Predictive performance of classifiers constructed using the optimal subsets of sequence features.
| Features | AC | SN | SP | ST | MCC | ROC |
|---|---|---|---|---|---|---|
| S3 | 82.53 | 72.19 | 87.15 | 79.67 | 0.5922 | 0.8835 |
| Bu, Co | 83.00 | 74.10 | 86.98 | 80.54 | 0.6057 | 0.8872 |
| B, Co, S3 | 84.12 | 69.33 | 90.72 | 80.03 | 0.6194 | 0.8924 |
| B, Co, H, S3 | 84.29 | 69.33 | 90.98 | 80.16 | 0.6231 | 0.8940 |
| A, Aa, B, Co, P | 84.47 | 70.48 | 90.72 | 80.60 | 0.6287 | 0.8954 |
| A, Aa, B, Co, No, P | 84.59 | 70.29 | 90.98 | 80.63 | 0.6310 | 0.8961 |
Figure 3. ROC curves for sequence-based prediction of protein stability change using multiple sequence features.
Figure 4. Sample output from the MuStab web server.