| Literature DB >> 35910197 |
Rui Fan, Bing Suo, Yijie Ding.
Abstract
The prediction of protein function is a common topic in the field of bioinformatics. In recent years, advances in machine learning have inspired a growing number of algorithms for predicting protein function. Large numbers of parameters and fairly complex neural networks are often used to improve prediction performance, an approach that is time-consuming and costly. In this study, we leveraged traditional features and machine learning classifiers to boost the performance of vesicular transport protein identification and to make the prediction process faster. We adopted the pseudo position-specific scoring matrix (PsePSSM) feature and our proposed classifier, the hypergraph regularized k-local hyperplane distance nearest neighbour (HG-HKNN), to classify vesicular transport proteins, and we addressed the dataset imbalance with random undersampling. On the benchmark dataset, our strategy achieves an area under the receiver operating characteristic curve (AUC) of 0.870 and a Matthews correlation coefficient (MCC) of 0.53, outperforming all state-of-the-art methods on the same dataset; the other metrics of our model are also comparable to existing methods.
Keywords: hypergraph learning; local hyperplane; membrane proteins; protein function prediction; transport proteins
Year: 2022 PMID: 35910197 PMCID: PMC9326258 DOI: 10.3389/fgene.2022.960388
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
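The PsePSSM feature named in the abstract condenses a variable-length L × 20 position-specific scoring matrix into a fixed-length vector: 20 column averages plus, for each sequence lag, 20 squared-difference correlation terms. A minimal sketch of one common formulation (the function name, the lag parameter `lam`, and the exact correlation form are assumptions, not taken from this paper):

```python
import numpy as np

def pse_pssm(pssm, lam=2):
    """Sketch of a PsePSSM-style feature: 20 column means plus, for each
    lag 1..lam, 20 mean squared differences between rows `lag` apart.
    `pssm` is an L x 20 position-specific scoring matrix (L = sequence length)."""
    mean_part = pssm.mean(axis=0)                       # 20 average scores
    corr_parts = []
    for lag in range(1, lam + 1):
        diff = (pssm[:-lag] - pssm[lag:]) ** 2          # (L - lag) x 20
        corr_parts.append(diff.mean(axis=0))            # 20 lag-correlation terms
    return np.concatenate([mean_part] + corr_parts)     # 20 * (1 + lam) features

# toy example: a random "PSSM" for a length-50 sequence
rng = np.random.default_rng(0)
feat = pse_pssm(rng.normal(size=(50, 20)), lam=2)
print(feat.shape)  # (60,)
```

Whatever the exact variant, the point is that every protein maps to the same feature dimension regardless of its length, which is what lets a fixed-input classifier such as HG-HKNN consume it.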
FIGURE 1 Flowchart of our model.
Details of the dataset used in our study.
| Class | Original | Train Set | Train Set (RUS) | Test Set |
|---|---|---|---|---|
| Vesicular transport | 2533 | 2214 | 2214 | 319 |
| Non-vesicular transport | 9086 | 7573 | 2214 | 1513 |
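The RUS column above balances the training set by keeping all 2214 minority-class samples and drawing an equally sized subset of the 7573 majority-class samples without replacement. A minimal sketch (the index arrays and seed are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(42)

# illustrative index arrays matching the table's training-set sizes
pos_idx = np.arange(2214)                  # vesicular transport (minority)
neg_idx = np.arange(2214, 2214 + 7573)     # non-vesicular transport (majority)

# random undersampling: keep every minority sample, draw a same-sized
# subset of the majority class without replacement
neg_kept = rng.choice(neg_idx, size=len(pos_idx), replace=False)
balanced = np.concatenate([pos_idx, neg_kept])

print(len(balanced))  # 4428 = 2214 + 2214
```

Note that only the training set is undersampled; the test set keeps its natural 319 : 1513 imbalance, which is why threshold-robust metrics such as AUC and MCC are emphasized.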
FIGURE 2 Sketch of an HKNN.
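The HKNN idea sketched in Figure 2 is: for each class, take the k nearest training points of the test sample, fit the local affine hyperplane through them, and assign the sample to the class whose hyperplane lies closest. A minimal unregularized sketch (the hypergraph-regularized HG-HKNN adds a penalty on the combination weights, which is omitted here):

```python
import numpy as np

def hknn_predict(x, X, y, k=3):
    """Sketch of k-local hyperplane distance nearest neighbour (HKNN).
    For each class c, fit the affine hyperplane through the k nearest
    training points of class c and assign x to the class whose
    hyperplane is closest."""
    dists = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # k nearest neighbours of x within class c
        nn = Xc[np.argsort(np.linalg.norm(Xc - x, axis=1))[:k]]
        # hyperplane: nn[0] + span(nn[i] - nn[0]); project x by least squares
        V = (nn[1:] - nn[0]).T                          # d x (k-1) basis
        a, *_ = np.linalg.lstsq(V, x - nn[0], rcond=None)
        proj = nn[0] + V @ a
        dists[c] = np.linalg.norm(x - proj)
    return min(dists, key=dists.get)

# toy 2-class example (k=2, so each local hyperplane is a line)
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
print(hknn_predict(np.array([0.2, 0.2]), X_toy, y_toy, k=2))  # 0
```

Unlike plain KNN, the distance is measured to a local linear surface rather than to individual points, which smooths the decision boundary when classes are locally flat.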
FIGURE 3 A hypergraph and its association matrix H.
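A common way to build the matrix H of Figure 3 is kNN-based: each vertex generates one hyperedge containing itself and its k nearest neighbours, so H is an n × n binary matrix with H[v, e] = 1 iff vertex v belongs to hyperedge e. A sketch of this standard construction (the paper's exact construction may differ):

```python
import numpy as np

def knn_hypergraph_incidence(X, k=2):
    """Build a hypergraph association (incidence) matrix H from feature
    vectors X: sample i spawns one hyperedge holding i and its k nearest
    neighbours, giving an n x n binary matrix."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    H = np.zeros((n, n))
    for e in range(n):
        members = np.argsort(D[e])[:k + 1]  # vertex e itself plus k neighbours
        H[members, e] = 1.0
    return H

# four 1-D points forming two tight clusters
H = knn_hypergraph_incidence(np.array([[0.0], [0.1], [5.0], [5.1]]), k=1)
print(H)
```

From H one can derive vertex/hyperedge degree matrices and a hypergraph Laplacian, which is what a hypergraph regularization term is typically built on.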
Details of parameter tuning of k.
| k | AUC | ACC | Precision | Specificity |
|---|---|---|---|---|
| 200 | 0.8127 | 0.7256 | 0.7677 | 0.8035 |
| 350 | 0.8241 | 0.7319 | 0.7897 | 0.8311 |
| 500 | 0.8284 | 0.7362 | 0.7940 | 0.8338 |
| 650 | 0.8292 | 0.7398 | 0.7954 | 0.8333 |
| 800 | 0.8287 | 0.7425 | 0.7927 | 0.8265 |
| 950 | 0.8279 | 0.7437 | 0.7840 | 0.8134 |
Comparison of classification metrics among different kernels.
| Kernel Type | AUC | MCC | ACC | Precision | Specificity |
|---|---|---|---|---|---|
| Linear | 0.7618 | 0.3739 | 0.6719 | 0.7833 | 0.8686 |
| Polynomial | 0.8021 | 0.4664 | 0.7322 | 0.7519 | 0.7687 |
| Laplacian | 0.8243 | 0.5153 | 0.7575 | 0.7592 | 0.7597 |
| RBF | 0.8309 | 0.5099 | 0.7538 | 0.7760 | 0.7922 |
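The four kernels compared above are standard similarity functions; for instance, the RBF kernel uses the squared Euclidean distance, exp(-γ‖x - z‖²), while the Laplacian kernel uses the L1 distance, exp(-γ‖x - z‖₁). A sketch of all four (the γ, degree, and coef0 values are illustrative, not the paper's tuned settings):

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def poly_kernel(x, z, degree=3, coef0=1.0):
    return (x @ z + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))   # squared L2 distance

def laplacian_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum(np.abs(x - z)))  # L1 distance

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(rbf_kernel(x, z), laplacian_kernel(x, z))
```

The table's pattern, with the two distance-based kernels (Laplacian, RBF) clearly ahead of the linear kernel, is consistent with a nonlinear decision boundary in PsePSSM feature space.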
Comparison of classification metrics among different models.
| Techniques | AUC | MCC | ACC | Precision | Specificity |
|---|---|---|---|---|---|
| KNN | 0.7824 | 0.4189 | 0.7078 | 0.6886 | 0.6519 |
| RF | 0.8019 | 0.4576 | 0.7285 | 0.7267 | 0.7231 |
| SVM | 0.8091 | 0.4820 | 0.7405 | 0.7466 | 0.7502 |
| HKNN | 0.8203 | 0.4976 | 0.7484 | 0.7442 | 0.7371 |
| OG-HKNN | 0.8289 | 0.4944 | 0.7446 | 0.7843 | 0.8130 |
| HG-HKNN | 0.8309 | 0.5099 | 0.7538 | 0.7760 | 0.7922 |
Comparison of our model with other existing methods.
| Techniques | AUC | MCC | ACC (%) | Sensitivity (%) | Precision (%) | Specificity (%) |
|---|---|---|---|---|---|---|
| GRU | 0.848 | 0.44 | 79.2 | 70.8 | 44.0 | 81.0 |
| BLSTM | 0.846 | 0.46 | 84.6 | 54.2 | 55.8 | 90.9 |
| BLAST | 0.82 | 0.43 | 83.6 | 54.1 | 52.8 | 89.8 |
| Vesicular-GRU | 0.861 | 0.52 | 82.3 | 79.2 | 48.7 | 82.9 |
| HG-HKNN | 0.870 | 0.53 | 84.1 | 72.1 | 53.2 | 86.7 |
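The MCC values reported throughout follow directly from confusion-matrix counts, which makes them robust on imbalanced test sets like the 319 : 1513 split used here. A sketch (the counts below are illustrative, not the paper's):

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# illustrative counts on an imbalanced test set of 319 positives / 1513 negatives
print(round(mcc(tp=230, fp=200, tn=1313, fn=89), 3))  # a value near the reported 0.53
```

Unlike accuracy, MCC stays near 0 for a classifier that simply predicts the majority class, which is why it is reported alongside AUC as a headline metric.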