Zahoor Ahmed1, Hasan Zulfiqar1, Abdullah Aman Khan2,3, Ijaz Gul1,4, Fu-Ying Dao1, Zhao-Yue Zhang1, Xiao-Long Yu5, Lixia Tang1.
Abstract
Thermophilic proteins are valuable in biotechnology and industrial processes, and identifying them correctly provides important information for their application in engineering. Biochemical identification of thermophilic proteins is laborious, time-consuming, and costly. Therefore, there is an urgent need for a fast and accurate method to identify thermophilic proteins. To address this need, we constructed a reliable benchmark dataset containing 1,368 thermophilic and 1,443 non-thermophilic proteins. A multi-layer perceptron (MLP) model based on a multi-feature fusion strategy was proposed to discriminate thermophilic proteins from non-thermophilic proteins. On an independent dataset, the proposed model achieved an accuracy of 96.26%, which demonstrates that the model has good application prospects. To make the model convenient to use, a user-friendly software package called iThermo was established and can be freely accessed at http://lin-group.cn/server/iThermo/index.html. The high accuracy of the model and the practicality of the developed software package indicate that this study can accelerate the discovery and engineering application of thermally stable proteins.
Keywords: feature selection; iThermo; neural network; protein feature extraction; thermophilic proteins
Year: 2022 PMID: 35273581 PMCID: PMC8902591 DOI: 10.3389/fmicb.2022.790063
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Flow chart of a framework for predicting thermophilic proteins.
Best hyperparameters for MLP classifier.
| Hyperparameters | Value |
| Batch size | 60 |
| Epochs | 1200 |
| Learning rate | 0.001 |
| Momentum | 0.8 |
| Decay | 1e-8 |
| Nesterov | True |
| Verbose | 1 |
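The momentum, decay, and Nesterov entries above describe a stochastic gradient descent optimizer. The record does not say which library or exact update rule was used, so the following is only a minimal sketch under stated assumptions: time-based learning-rate decay of the form lr / (1 + decay * step) and one common formulation of the Nesterov momentum update. The names `SGDConfig`, `decayed_lr`, and `sgd_step` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SGDConfig:
    """Values copied from the hyperparameter table; the field names are illustrative."""
    batch_size: int = 60
    epochs: int = 1200
    learning_rate: float = 0.001
    momentum: float = 0.8
    decay: float = 1e-8
    nesterov: bool = True


def decayed_lr(cfg: SGDConfig, step: int) -> float:
    # Assumption: time-based decay, lr_t = lr / (1 + decay * step).
    return cfg.learning_rate / (1.0 + cfg.decay * step)


def sgd_step(params, grads, velocity, cfg: SGDConfig, step: int):
    """Apply one momentum-SGD update to flat lists of floats.

    Returns the updated parameters and the updated velocity.
    """
    lr = decayed_lr(cfg, step)
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        v_new = cfg.momentum * v - lr * g
        if cfg.nesterov:
            # Nesterov variant: look ahead along the freshly updated velocity.
            p_new = p + cfg.momentum * v_new - lr * g
        else:
            # Classical momentum.
            p_new = p + v_new
        new_params.append(p_new)
        new_velocity.append(v_new)
    return new_params, new_velocity


# Toy usage: a few steps on f(w) = w**2, whose gradient is 2 * w.
cfg = SGDConfig()
w, vel = [1.0], [0.0]
for step in range(3):
    w, vel = sgd_step(w, [2.0 * w[0]], vel, cfg, step)
```

With a decay of 1e-8, the learning rate barely changes over a realistic number of updates, so under this assumed schedule the step size is close to constant at 0.001.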
Performance of descriptors before and after feature selection and in feature fusion.
| | Descriptor | SN | SP | ACC | MCC | AUC |
| Before feature selection | AAC | 0.9304 | 0.9308 | 0.9306 | 0.8626 | 0.9723 |
| | tPseAAC | 0.9011 | 0.8793 | 0.8899 | 0.7914 | 0.9551 |
| | aPseAAC | 0.8901 | 0.8720 | 0.8808 | 0.7714 | 0.9519 |
| | DC | 0.7546 | 0.8720 | 0.8149 | 0.5963 | 0.8812 |
| | DDE | 0.8022 | 0.8374 | 0.8203 | 0.6319 | 0.9081 |
| | CKSAAP | 0.7912 | 0.5398 | 0.6619 | 0.3855 | 0.7365 |
| | CTD | 0.9377 | 0.9100 | 0.9235 | 0.8612 | 0.9786 |
| After feature selection | AAC | 0.9524 | 0.9239 | 0.9377 | 0.8902 | 0.9735 |
| | tPseAAC | 0.8938 | 0.8962 | 0.8950 | 0.7943 | 0.9580 |
| | aPseAAC | 0.8971 | 0.8824 | 0.8895 | 0.7863 | 0.9610 |
| | DC | 0.8859 | 0.8754 | 0.8416 | 0.6620 | 0.9143 |
| | DDE | 0.7802 | 0.8651 | 0.8238 | 0.6430 | 0.9165 |
| | CKSAAP | 0.7070 | 0.8374 | 0.7740 | 0.5156 | 0.8349 |
| | CTD | 0.9167 | 0.9135 | 0.9150 | 0.8330 | 0.9644 |
| Feature fusion | | 0.9634 | 0.9619 | 0.9626 | 0.9269 | 0.9864 |
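The sensitivity (SN), specificity (SP), accuracy, and Matthews correlation coefficient (MCC) values reported above are standard functions of a binary confusion matrix. As a hedged illustration, the sketch below computes them from raw counts using the textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study. AUC is omitted because it requires the classifier's continuous scores rather than counts.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute SN, SP, ACC, and MCC from confusion-matrix counts.

    Assumes both classes are present, so tp + fn > 0 and tn + fp > 0.
    """
    sn = tp / (tp + fn)                    # sensitivity: recall on positives
    sp = tn / (tn + fp)                    # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MCC uses all four cells of the confusion matrix, it remains informative when the two classes differ in size, which is one reason it is commonly reported alongside accuracy.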
FIGURE 2. Performance comparison of MLP classifier with other classifiers.
FIGURE 3. Contribution of features of all descriptors to model performance.