Chi-Wei Chen, Meng-Han Lin, Chi-Chou Liao, Hsung-Pin Chang, Yen-Wei Chu.
Abstract
Protein mutations can lead to structural changes that affect protein function and result in disease. In the protein engineering, drug design, and optimization industries, mutations are often used to improve protein stability or to change protein properties while maintaining stability. To provide candidates for novel protein design, several computational tools for predicting protein stability changes have been developed. Although many prediction tools are available, each employs different algorithms and features. This can produce conflicting predictions that make it difficult for users to decide on the correct protein design. Therefore, this study proposes an integrated prediction tool, iStable 2.0, which integrates 11 sequence-based and structure-based prediction tools by machine learning and adds protein sequence information as features. Three coding modules are designed for the system: an Online Server Module, a Stand-alone Module, and a Sequence Coding Module, which together improve on the prediction performance of the previous version of the system. The final integrated structure-based classification model has a higher Matthews correlation coefficient than the best single prediction tool (0.708 vs. 0.547), and the Pearson correlation coefficient of the regression model likewise improves from 0.669 to 0.714. The sequence-based model not only successfully integrates off-the-shelf predictors but also improves on the Matthews correlation coefficient of the best single sequence-based prediction tool by at least 0.161, and it also outperforms the individual structure-based prediction tools. In addition, when the integrated online tools are unavailable, the Sequence Coding Module and the Stand-alone Module together maintain performance with only a 5% decrease in the Matthews correlation coefficient. iStable 2.0 is available at http://ncblab.nchu.edu.tw/iStable2.
Keywords: Integrated prediction; Machine learning; Protein stability change
Year: 2020 PMID: 32226595 PMCID: PMC7090336 DOI: 10.1016/j.csbj.2020.02.021
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. System architecture.
Performance of the classifier and regression models with structure-based and sequence-based tools with S3568.
| Model | Method | Sn | Sp | Acc | MCC | PCC |
|---|---|---|---|---|---|---|
| Structure-based | iStable2.0_PDB | 0.758 | 0.964 | 0.908 | 0.758 | 0.864 |
| | DUET | 0.499 | 0.891 | 0.787 | 0.421 | 0.655 |
| | SDM | 0.566 | 0.748 | 0.700 | 0.293 | 0.474 |
| | SDM2 | 0.562 | 0.744 | 0.696 | 0.286 | 0.485 |
| | mCSM | 0.298 | 0.955 | 0.781 | 0.354 | 0.638 |
| | CUPSAT | 0.513 | 0.798 | 0.723 | 0.305 | 0.188 |
| | I-Mutant2.0_PDB | 0.650 | 0.922 | 0.850 | 0.601 | 0.689 |
| | PoPMuSiC | 0.333 | 0.936 | 0.776 | 0.347 | 0.626 |
| | AUTO-MUTE2.0_SVM | 0.802 | 0.848 | 0.840 | 0.560 | 0.716 |
| | AUTO-MUTE2.0_RF/TR | 0.604 | 0.979 | 0.879 | 0.676 | 0.725 |
| | MAESTRO | 0.457 | 0.850 | 0.746 | 0.322 | 0.566 |
| Sequence-based | iStable2.0_SEQ | 0.670 | 0.953 | 0.877 | 0.672 | 0.820 |
| | iPTREE-STAB | 0.537 | 0.945 | 0.837 | 0.550 | 0.484 |
| | I-Mutant2.0_SEQ | 0.565 | 0.919 | 0.825 | 0.525 | 0.625 |
| | MUpro_SVM | 0.599 | 0.906 | 0.825 | 0.531 | – |
| | MUpro_NN | 0.536 | 0.924 | 0.821 | 0.509 | – |
Sn, Sp, Acc, and MCC are classification metrics; PCC is the regression metric.
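The classification columns reported above (Sn, Sp, Acc, MCC) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard formulas; the function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the test set")
    sn = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fn=20, tn=270, fp=30)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because MCC uses all four counts, it is less flattering than accuracy when the classes are imbalanced, which may be why the tables report it alongside Acc.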
Impact evaluation of OSM for prediction performance from the classification model.
| Model | Method | S3568 Sn | S3568 Sp | S3568 Acc | S3568 MCC | S630 Sn | S630 Sp | S630 Acc | S630 MCC |
|---|---|---|---|---|---|---|---|---|---|
| Structure-based | iStable2.0_PDB | 0.758 | 0.964 | 0.908 | 0.758 | 0.718 | 0.953 | 0.892 | 0.708 |
| | iStable2.0_PDB_SASC | 0.740 | 0.964 | 0.904 | 0.747 | 0.687 | 0.955 | 0.886 | 0.689 |
| Sequence-based | iStable2.0_SEQ | 0.670 | 0.953 | 0.877 | 0.672 | 0.644 | 0.953 | 0.873 | 0.652 |
| | iStable2.0_SEQ_SASC | 0.640 | 0.941 | 0.861 | 0.625 | 0.620 | 0.938 | 0.856 | 0.603 |
SASC: models that use only the Stand-alone Module (SAM) and the Sequence Coding Module (SCM), without the Online Server Module (OSM).
Impact evaluation of OSM for prediction performance from the regression model.
| Model | Method | S3568 PCC | S630 PCC |
|---|---|---|---|
| Structure-based | iStable2.0_PDB_Regression | 0.864 | 0.714 |
| | iStable2.0_PDB_Regression_SASC | 0.861 | 0.709 |
| Sequence-based | iStable2.0_SEQ_Regression | 0.820 | 0.695 |
| | iStable2.0_SEQ_Regression_SASC | 0.821 | 0.651 |
SASC: models that use only the Stand-alone Module (SAM) and the Sequence Coding Module (SCM), without the Online Server Module (OSM).
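The regression results above are summarized by the Pearson correlation coefficient (PCC) between predicted and experimentally measured stability changes. A minimal pure-Python sketch of the standard PCC formula follows; the function name and the ΔΔG values are hypothetical illustrations, not data from the paper.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered cross-products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical experimental and predicted stability changes (kcal/mol).
experimental = [-1.2, 0.4, -0.3, 1.1, -2.0]
predicted = [-0.9, 0.1, -0.5, 0.8, -1.4]
print(round(pearson_correlation(experimental, predicted), 3))
```

PCC measures only linear agreement, so a model can score well while still being systematically offset or scaled relative to the measured values.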
Performance of classifiers and regression with structure-based and sequence-based models with S630.
| Model | Tool | Sn | Sp | Acc | MCC | PCC |
|---|---|---|---|---|---|---|
| Structure-based | iStable2.0_PDB | 0.718 | 0.953 | 0.892 | 0.708 | 0.714 |
| | iStable_PDB | 0.744 | 0.901 | 0.860 | 0.640 | 0.665 |
| | DUET | 0.405 | 0.906 | 0.776 | 0.358 | 0.458 |
| | SDM | 0.620 | 0.392 | 0.451 | 0.010 | 0.349 |
| | SDM2 | 0.497 | 0.771 | 0.700 | 0.256 | 0.352 |
| | mCSM | 0.239 | 0.953 | 0.768 | 0.285 | 0.447 |
| | CUPSAT | 0.442 | 0.820 | 0.722 | 0.266 | 0.274 |
| | I-Mutant2.0_PDB | 0.571 | 0.929 | 0.837 | 0.547 | 0.669 |
| | PoPMuSiC | 0.344 | 0.901 | 0.757 | 0.291 | 0.424 |
| | AUTO-MUTE2.0_SVM | 0.245 | 0.981 | 0.790 | 0.370 | 0.520 |
| | AUTO-MUTE2.0_RF/TR | 0.350 | 0.981 | 0.817 | 0.473 | 0.534 |
| | MAESTRO | 0.417 | 0.807 | 0.706 | 0.227 | 0.329 |
| Sequence-based | iStable2.0_SEQ | 0.644 | 0.953 | 0.873 | 0.652 | 0.695 |
| | iStable_SEQ | 0.702 | 0.903 | 0.849 | 0.611 | – |
| | iPTREE-STAB | 0.350 | 0.970 | 0.810 | 0.443 | 0.496 |
| | I-Mutant2.0_SEQ | 0.509 | 0.927 | 0.819 | 0.491 | 0.546 |
| | MUpro_SVM | 0.264 | 0.923 | 0.752 | 0.247 | – |
| | MUpro_NN | 0.245 | 0.934 | 0.756 | 0.248 | – |
| | EASE-MM | 0.693 | 0.732 | 0.722 | 0.384 | 0.541 |
| | INPS | 0.472 | 0.857 | 0.757 | 0.343 | 0.449 |
Sn, Sp, Acc, and MCC are classification metrics; PCC is the regression metric.
Comparison of the performance of models with different thresholds tested with S630.
| Model | Method | Sn | Sp | Acc | MCC |
|---|---|---|---|---|---|
| Structure-based | iStable2.0_PDB | 0.718 | 0.953 | 0.892 | 0.708 |
| | Threshold: 0.5 | 0.451 | 0.960 | 0.894 | 0.475 |
| | Threshold: 0.1 | 0.601 | 0.953 | 0.876 | 0.613 |
| | Threshold: 0 | 0.577 | 0.949 | 0.852 | 0.590 |
| | Threshold: −0.1 | 0.608 | 0.932 | 0.841 | 0.585 |
| | Threshold: −0.5 | 0.684 | 0.887 | 0.802 | 0.590 |
| Sequence-based | iStable2.0_SEQ | 0.644 | 0.953 | 0.873 | 0.652 |
| | Threshold: 0.5 | 0.476 | 0.951 | 0.889 | 0.468 |
| | Threshold: 0.1 | 0.601 | 0.945 | 0.870 | 0.595 |
| | Threshold: 0 | 0.613 | 0.942 | 0.857 | 0.607 |
| | Threshold: −0.1 | 0.593 | 0.929 | 0.832 | 0.569 |
| | Threshold: −0.5 | 0.609 | 0.898 | 0.776 | 0.539 |
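The threshold comparison above can be read as turning a continuous prediction score into a class label at different cut-offs. The sketch below illustrates that idea under stated assumptions: that the thresholds apply to a predicted stability change (ΔΔG) and that scores above the cut-off are assigned to the positive class. The source does not spell out this convention, and the function names, scores, and labels are hypothetical.

```python
def classify(scores, threshold):
    """Label a score as positive (1) when it exceeds the threshold (assumed convention)."""
    return [1 if score > threshold else 0 for score in scores]


def sensitivity_specificity(y_true, y_pred):
    """Return (Sn, Sp) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_sweep(scores, y_true, thresholds):
    """Evaluate (Sn, Sp) at each threshold."""
    return {
        thr: sensitivity_specificity(y_true, classify(scores, thr))
        for thr in thresholds
    }


# Hypothetical predicted scores and true labels, for illustration only.
scores = [0.8, 0.3, 0.05, -0.05, -0.3, -0.7, 0.6, -0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
for thr, (sn, sp) in threshold_sweep(scores, labels, [0.5, 0.1, 0.0, -0.1, -0.5]).items():
    print(f"threshold {thr:+.1f}: Sn={sn:.2f} Sp={sp:.2f}")
```

Under this convention, lowering the threshold can only add predicted positives, so sensitivity never decreases and specificity never increases. A fixed cut-off on a regression score therefore trades one for the other.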
Evaluation of prediction results with data from pH-temperature ranges by accuracy.
| pH | ≦6 | ≦6 | ≦6 | 6 ~ 8 | 6 ~ 8 | 6 ~ 8 | >8 | >8 | >8 |
|---|---|---|---|---|---|---|---|---|---|
| Temperature (°C) | ≦37 | 37 ~ 65 | >65 | ≦37 | 37 ~ 65 | >65 | ≦37 | 37 ~ 65 | >65 |
| iStable2.0_PDB | 0.857 | 0.870 | 1.000 | 0.936 | 0.837 | 0.929 | 0.556 | 1.000 | 1.000 |
| iStable_PDB | 0.898 | 0.844 | 1.000 | 0.900 | 0.741 | 0.571 | 0.722 | 0.800 | 1.000 |
| DUET | 0.755 | 0.766 | 1.000 | 0.829 | 0.660 | 0.714 | 0.667 | 0.900 | 0.000 |
| SDM | 0.367 | 0.636 | 0.969 | 0.819 | 0.558 | 0.500 | 0.500 | 0.800 | 1.000 |
| SDM2 | 0.673 | 0.662 | 0.938 | 0.758 | 0.605 | 0.500 | 0.500 | 0.800 | 0.500 |
| mCSM | 0.694 | 0.779 | 1.000 | 0.833 | 0.626 | 0.786 | 0.722 | 0.800 | 0.000 |
| CUPSAT | 0.796 | 0.857 | 0.969 | 0.740 | 0.619 | 0.571 | 0.278 | 0.700 | 0.000 |
| I-Mutant2.0_PDB | 0.837 | 0.909 | 1.000 | 0.890 | 0.748 | 0.714 | 0.278 | 0.900 | 0.000 |
| PoPMuSiC | 0.694 | 0.740 | 1.000 | 0.822 | 0.653 | 0.643 | 0.611 | 0.700 | 0.000 |
| AUTO-MUTE2.0_SVM | 0.714 | 0.818 | 1.000 | 0.904 | 0.639 | 0.643 | 0.278 | 0.600 | 0.000 |
| AUTO-MUTE2.0_RF | 0.878 | 0.870 | 1.000 | 0.911 | 0.653 | 0.714 | 0.278 | 0.600 | 0.000 |
| MAESTRO | 0.714 | 0.753 | 1.000 | 0.694 | 0.694 | 0.643 | 0.278 | 0.800 | 0.500 |
| iStable2.0_SEQ | 0.878 | 0.844 | 1.000 | 0.911 | 0.830 | 0.857 | 0.500 | 0.900 | 1.000 |
| iStable_SEQ | 0.878 | 0.857 | 1.000 | 0.922 | 0.769 | 0.500 | 0.722 | 0.900 | 1.000 |
| iPTREE-STAB | 0.755 | 0.844 | 1.000 | 0.872 | 0.667 | 0.786 | 0.722 | 0.900 | 0.000 |
| I-Mutant2.0_SEQ | 0.796 | 0.896 | 1.000 | 0.865 | 0.741 | 0.571 | 0.333 | 1.000 | 0.000 |
| MUpro_SVM | 0.714 | 0.727 | 0.969 | 0.883 | 0.585 | 0.571 | 0.278 | 0.500 | 0.000 |
| MUpro_NN | 0.714 | 0.753 | 0.969 | 0.883 | 0.585 | 0.500 | 0.278 | 0.600 | 0.000 |
| EASE-MM | 0.816 | 0.766 | 1.000 | 0.687 | 0.701 | 0.571 | 0.556 | 0.900 | 0.500 |
| INPS | 0.776 | 0.597 | 0.844 | 0.808 | 0.755 | 0.643 | 0.500 | 0.800 | 1.000 |
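The accuracy-by-condition breakdown above amounts to grouping test mutations into pH and temperature ranges and computing accuracy within each group. A minimal sketch of that grouping follows. The bin edges match the table headers, but how exact boundary values (pH 6 or 8, 37 °C or 65 °C) are assigned is an assumption, and the function names and records are hypothetical.

```python
from collections import defaultdict


def ph_bin(ph):
    """Assign a pH value to one of the three ranges (boundary handling assumed)."""
    if ph <= 6:
        return "<=6"
    return "6-8" if ph <= 8 else ">8"


def temperature_bin(temp_c):
    """Assign a temperature in degrees Celsius to one of the three ranges."""
    if temp_c <= 37:
        return "<=37"
    return "37-65" if temp_c <= 65 else ">65"


def accuracy_by_condition(records):
    """Map (pH bin, temperature bin) to (accuracy, sample count).

    Each record is a tuple: (pH, temperature in Celsius, true label, predicted label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ph, temp_c, y_true, y_pred in records:
        key = (ph_bin(ph), temperature_bin(temp_c))
        total[key] += 1
        correct[key] += int(y_true == y_pred)
    return {key: (correct[key] / total[key], total[key]) for key in total}


# Hypothetical records for illustration only.
records = [
    (5.0, 25, 1, 1),
    (5.5, 30, 0, 1),
    (7.0, 50, 0, 0),
    (7.4, 60, 1, 1),
    (9.0, 70, 1, 0),
]
for key, (acc, n) in sorted(accuracy_by_condition(records).items()):
    print(key, f"accuracy={acc:.3f}", f"n={n}")
```

Returning the sample count next to each accuracy makes it easier to judge how much weight a bin deserves, since a bin with very few mutations can only take a handful of distinct accuracy values.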