Yang Yang, Xuesong Ding, Guanchen Zhu, Abhishek Niroula, Qiang Lv, Mauno Vihinen.
Abstract
BACKGROUND: Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties does not keep pace with the increase in new sequence data, and therefore even basic properties are not known for the vast majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples.
Keywords: Machine learning; Prediction; Protein stability; Proteome properties
Year: 2019 PMID: 31684883 PMCID: PMC6830000 DOI: 10.1186/s12864-019-6138-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Performance of prediction methods on 10-fold cross-validation and blind test, by number of top-importance features

| Measure | 50 | 100 | 200 | 300 | 500 | 1000 | 2077 |
|---|---|---|---|---|---|---|---|
| 10-fold cross-validation | |||||||
| PCC | 0.790 | 0.793 | 0.790 | 0.786 | 0.779 | 0.772 | 0.767 |
| RMSE | 0.165 | 0.164 | 0.165 | 0.166 | 0.169 | 0.171 | 0.173 |
| R2 | 30.5 | 47.8 | 28.2 | 35.7 | 39.2 | 27.7 | 32.4 |
| MSE | 0.030 | 0.024 | 0.030 | 0.028 | 0.026 | 0.029 | 0.026 |
| MAE | 0.133 | 0.125 | 0.141 | 0.133 | 0.134 | 0.133 | 0.135 |
| Blind test | |||||||
| Blind PCC | 0.702 | 0.736 | 0.735 | 0.740 | 0.756 | 0.755 | 0.758 |
| Blind RMSE | 0.197 | 0.189 | 0.189 | 0.187 | 0.183 | 0.184 | 0.183 |
| Blind R2 | −10.9 | −8.5 | −12.2 | −5.1 | −1.1 | −5.2 | −6.7 |
| Blind MSE | 0.039 | 0.036 | 0.036 | 0.035 | 0.033 | 0.034 | 0.033 |
| Blind MAE | 0.160 | 0.146 | 0.145 | 0.145 | 0.142 | 0.142 | 0.143 |
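
The measures in the table (PCC, RMSE, R2, MSE, MAE) are standard regression metrics that compare predicted and experimental values. The following is a minimal sketch of their textbook definitions, not the authors' evaluation code. The function name `regression_metrics` and the sample values are hypothetical, and R2 is assumed to be the usual coefficient of determination; how the R2 rows above were computed or scaled is not stated in this record.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return PCC, RMSE, R2, MSE and MAE for paired observations.

    Hypothetical helper using textbook definitions; it is an assumption
    that these match the definitions behind the table above.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Error-based measures: MSE, RMSE, MAE
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n

    # Sums of squares around the means, needed for PCC and R2
    ss_true = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_true == 0 or ss_pred == 0:
        raise ValueError("PCC and R2 are undefined for constant input")

    # Pearson correlation coefficient
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    pcc = cov / math.sqrt(ss_true * ss_pred)

    # Coefficient of determination: 1 - SS_residual / SS_total
    r2 = 1.0 - (mse * n) / ss_true

    return {"PCC": pcc, "RMSE": rmse, "R2": r2, "MSE": mse, "MAE": mae}


# Made-up values for illustration only (not data from the study)
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.5, 0.7, 0.9]
metrics = regression_metrics(observed, predicted)
```

In this made-up example every prediction is offset by the same amount, so PCC is 1 while R2 is below 1. Correlation ignores systematic bias whereas R2 does not, which is one reason a predictor can show a high PCC together with a low or negative R2.
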
Fig. 1. Correlation of protein length and Tm for the experimentally defined training dataset
Fig. 2. Differences in predicted stabilities of isoforms vs chain length. Top: the longest isoform; middle: the second longest isoform; bottom: other isoforms. Data are only for proteins with at least two isoforms. The graphs show melting temperature (Tm) vs protein sequence length
Fig. 3. Analysis of the relationship of Tm to predicted sensitivity for harmful variants