Jie Pan1, Shiwei Wang1, Changqing Yu2, Liping Li2,3, Zhuhong You4, Yanmei Sun1.
Abstract
Protein-protein interactions (PPIs) are crucial for understanding cellular processes, including signaling cascades, DNA transcription, metabolic cycles, and repair. In the past decade, a multitude of high-throughput methods have been introduced to detect PPIs. However, these techniques are time-consuming and laborious, and they suffer from high false-negative rates. There is therefore a great need for new computational methods that can serve as supplementary tools for PPI prediction. In this article, we present a novel sequence-based model for predicting PPIs that combines the Discrete Hilbert transform (DHT) with Rotation Forest (RoF). The method has three stages. First, each amino acid sequence is converted into a Position-Specific Scoring Matrix (PSSM), which carries rich information about protein evolution. Then, a 400-dimensional DHT descriptor is constructed for each protein pair. Finally, these feature descriptors are fed to the RoF classifier to identify the potential PPI class. On the Yeast, Human, and Oryza sativa PPI datasets, the proposed model achieved prediction accuracies of 91.93%, 96.35%, and 94.24%, respectively. We also conducted numerous experiments on cross-species PPI datasets, where the method likewise showed strong predictive capacity. To further assess the prediction ability of the proposed approach, we compared RoF with four powerful classifiers: Support Vector Machine (SVM), Random Forest (RF), K-nearest Neighbor (KNN), and AdaBoost. We also compared it with several strong existing methods. These comprehensive experimental results further confirm the effectiveness and feasibility of the proposed approach. In future work, we hope it can serve as a supplementary tool for proteomics analysis.
Keywords: Discrete Hilbert transform; position-specific scoring matrices; protein–protein interaction; rotation forest
Year: 2022 PMID: 35625503 PMCID: PMC9139052 DOI: 10.3390/biology11050775
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
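The abstract names the Discrete Hilbert transform as the feature-extraction step but does not say how it is applied to the PSSM or how the 400-dimensional descriptor is assembled. The following is only a minimal sketch of the transform itself, using the common DFT-based analytic-signal definition (the same convention as `scipy.signal.hilbert`). The function name `discrete_hilbert` and the idea of applying it to a single numeric sequence, such as one PSSM column, are assumptions made for illustration and are not taken from the paper.

```python
import cmath
import math


def discrete_hilbert(x):
    """Discrete Hilbert transform of a real sequence via the analytic signal.

    Illustrative sketch only: O(n^2) DFT, standard library only.
    """
    n = len(x)
    if n == 0:
        return []

    # Forward DFT of the input sequence.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0

    # Inverse DFT gives the analytic signal; its imaginary part is the
    # Hilbert transform of x.
    analytic = [
        sum(spectrum[k] * weights[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]
    return [z.imag for z in analytic]


# Sanity check on a known pair: the Hilbert transform of a cosine is a sine.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
transformed = discrete_hilbert(signal)
```

A real pipeline would more likely use an FFT-based implementation for speed; the quadratic version here is kept only so the sketch has no dependencies.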
Five-fold cross-validation results of the proposed model on the Yeast PPI dataset.
| Fold | ACC (%) | Sens. (%) | Spec. (%) | PR (%) | MCC (%) | AUC |
|---|---|---|---|---|---|---|
| 1 | 90.88 | 89.13 | 92.67 | 92.57 | 83.42 | 0.9562 |
| 2 | 91.55 | 90.31 | 92.79 | 92.55 | 84.52 | 0.9581 |
| 3 | 92.40 | 89.42 | 95.37 | 95.04 | 85.93 | 0.9581 |
| 4 | 92.49 | 90.90 | 94.15 | 94.20 | 86.10 | 0.9608 |
| 5 | 92.31 | 89.15 | 95.30 | 94.73 | 85.75 | 0.9599 |
| Average | 91.93 ± 0.69 | 89.78 ± 0.79 | 94.05 ± 1.30 | 93.82 ± 1.19 | 85.14 ± 1.15 | 0.9586 ± 0.0018 |
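The "Average" row appears to report the mean and the sample standard deviation (n - 1 in the denominator) over the five folds. That reading is an inference from the numbers rather than something the record states. A short check on the ACC column of the Yeast table, with variable names chosen for illustration:

```python
import statistics

# Per-fold accuracies (%) copied from the Yeast table above.
yeast_acc = [90.88, 91.55, 92.40, 92.49, 92.31]

mean_acc = statistics.mean(yeast_acc)
# statistics.stdev is the sample standard deviation (divides by n - 1).
std_acc = statistics.stdev(yeast_acc)

print(f"{mean_acc:.2f} ± {std_acc:.2f}")  # prints 91.93 ± 0.69
```

Using the population standard deviation (`statistics.pstdev`) instead gives about 0.62, which does not match the reported 0.69.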
Five-fold cross-validation results of the proposed model on the Human PPI dataset.
| Fold | ACC (%) | Sens. (%) | Spec. (%) | PR (%) | MCC (%) | AUC |
|---|---|---|---|---|---|---|
| 1 | 96.20 | 95.23 | 97.08 | 96.72 | 92.67 | 0.9834 |
| 2 | 95.47 | 95.23 | 95.69 | 95.47 | 91.34 | 0.9808 |
| 3 | 96.94 | 97.10 | 96.78 | 96.62 | 94.06 | 0.9850 |
| 4 | 96.63 | 95.73 | 97.40 | 96.89 | 93.44 | 0.9817 |
| 5 | 96.51 | 95.53 | 97.41 | 97.14 | 93.24 | 0.9846 |
| Average | 96.35 ± 0.56 | 95.76 ± 0.78 | 96.87 ± 0.71 | 96.57 ± 0.64 | 92.95 ± 1.03 | 0.9831 ± 0.0018 |
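The column abbreviations are not defined in this record. Assuming they carry their usual meanings (ACC = accuracy, Sens. = sensitivity, Spec. = specificity, PR = precision, MCC = Matthews correlation coefficient), the textbook definitions in terms of a confusion matrix can be sketched as below. The function name and the example counts are hypothetical, and the paper's exact formulas are not shown here, so values computed this way need not reproduce the tabulated figures.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Textbook binary-classification metrics, returned as fractions in [0, 1]
    (MCC lies in [-1, 1]). Assumes every denominator below is non-zero."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sens = tp / (tp + fn)   # true positive rate (recall)
    spec = tn / (tn + fp)   # true negative rate
    prec = tp / (tp + fp)   # positive predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return {"acc": acc, "sens": sens, "spec": spec, "prec": prec, "mcc": mcc}


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```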
Five-fold cross-validation results of the proposed model on the Oryza sativa PPI dataset.
| Fold | ACC (%) | Sens. (%) | Spec. (%) | PR (%) | MCC (%) | AUC |
|---|---|---|---|---|---|---|
| 1 | 93.91 | 94.64 | 93.22 | 92.94 | 88.55 | 0.9635 |
| 2 | 94.38 | 94.64 | 94.13 | 93.83 | 89.38 | 0.9656 |
| 3 | 94.79 | 95.09 | 94.48 | 94.70 | 90.12 | 0.9674 |
| 4 | 94.22 | 95.28 | 93.17 | 93.22 | 89.10 | 0.9628 |
| 5 | 93.91 | 92.84 | 95.08 | 95.40 | 88.54 | 0.9689 |
| Average | 94.24 ± 0.37 | 94.50 ± 0.97 | 94.02 ± 0.82 | 94.02 ± 1.03 | 89.14 ± 0.66 | 0.9667 ± 0.0022 |
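Each table above lists one row per fold of a five-fold cross-validation. As a rough sketch of how such a partition could be produced, the snippet below shuffles the sample indices once and deals them into five disjoint test folds. Whether the authors shuffled, stratified by class, or fixed a seed is not stated in this record, so those choices, along with the function name `five_fold_indices`, are assumptions.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]

    splits = []
    for i, test_idx in enumerate(folds):
        # Training set is everything outside the current test fold.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


# Example with a hypothetical dataset of 23 protein pairs.
splits = five_fold_indices(23)
```

Each sample lands in exactly one test fold, so every protein pair is scored once across the five rounds.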
Figure 1. ROC curves generated by the proposed model on the Yeast dataset.
Figure 2. ROC curves generated by the proposed model on the Human dataset.
Figure 3. ROC curves generated by the proposed model on the Oryza sativa dataset.
Predictive performance of RoF compared with four other classifiers.
| Dataset | Method | ACC (%) | Sens. (%) | Spec. (%) | PR (%) | MCC (%) | AUC |
|---|---|---|---|---|---|---|---|
| Yeast | SVM | 84.44 ± 0.84 | 83.14 ± 1.01 | 85.77 ± 1.40 | 85.37 ± 1.68 | 73.71 ± 1.17 | 0.9149 ± 0.0061 |
| Yeast | RF | 81.97 ± 0.41 | 80.26 ± 1.27 | 83.68 ± 0.48 | 83.09 ± 0.84 | 70.41 ± 0.55 | 0.8979 ± 0.0038 |
| Yeast | KNN | 81.39 ± 1.07 | 75.19 ± 2.16 | 87.63 ± 1.17 | 85.88 ± 1.21 | 69.47 ± 1.37 | 0.8967 ± 0.0057 |
| Yeast | AdaBoost | 78.15 ± 1.82 | 76.88 ± 1.90 | 79.45 ± 2.95 | 85.46 ± 2.85 | 65.87 ± 1.97 | 0.8546 ± 0.0120 |
| Yeast | RoF | 91.93 ± 0.69 | 89.78 ± 0.79 | 94.05 ± 1.30 | 93.82 ± 1.19 | 85.14 ± 1.15 | 0.9586 ± 0.0018 |
| Human | SVM | 87.93 ± 0.86 | 85.78 ± 1.28 | 89.89 ± 1.37 | 88.59 ± 1.53 | 78.69 ± 1.31 | 0.9446 ± 0.0069 |
| Human | RF | 95.32 ± 0.96 | 92.63 ± 1.93 | 97.82 ± 0.94 | 97.50 ± 1.03 | 91.04 ± 1.74 | 0.9804 ± 0.0016 |
| Human | KNN | 87.92 ± 1.19 | 76.67 ± 2.44 | 98.23 ± 0.49 | 97.51 ± 0.74 | 78.10 ± 1.96 | 0.9758 ± 0.0046 |
| Human | AdaBoost | 75.64 ± 1.69 | 71.36 ± 3.87 | 79.53 ± 3.04 | 76.19 ± 2.29 | 62.88 ± 1.83 | 0.8362 ± 0.0170 |
| Human | RoF | 96.35 ± 0.56 | 95.76 ± 0.78 | 96.87 ± 0.71 | 96.57 ± 0.64 | 92.95 ± 1.03 | 0.9831 ± 0.0018 |
| Oryza sativa | SVM | 85.58 ± 1.27 | 84.06 ± 1.08 | 87.16 ± 2.46 | 86.73 ± 2.63 | 75.32 ± 1.78 | 0.9246 ± 0.0085 |
| Oryza sativa | RF | 84.19 ± 0.92 | 81.71 ± 1.23 | 86.68 ± 1.03 | 85.99 ± 0.85 | 73.34 ± 1.24 | 0.9070 ± 0.0096 |
| Oryza sativa | KNN | 76.51 ± 0.70 | 85.19 ± 0.88 | 67.82 ± 0.91 | 72.58 ± 1.10 | 63.50 ± 0.77 | 0.8327 ± 0.0040 |
| Oryza sativa | AdaBoost | 80.82 ± 1.37 | 81.50 ± 1.87 | 80.16 ± 1.61 | 80.40 ± 2.00 | 69.01 ± 1.67 | 0.8876 ± 0.0132 |
| Oryza sativa | RoF | 94.24 ± 0.37 | 94.50 ± 0.97 | 94.02 ± 0.82 | 94.02 ± 1.03 | 89.14 ± 0.66 | 0.9667 ± 0.0022 |
Figure 4. Comparison of the results produced by different classifier models on the three benchmark datasets: (a) accuracy; (b) AUC.
Prediction accuracy on the four independent datasets.
| Species | Test Pairs | Our Method | Ding et al. [ | Huang et al. [ | Zhan et al. [ | Wang et al. [ |
|---|---|---|---|---|---|---|
| | 1412 | 94.29% | 90.23% | 82.22% | 91.93% | 80.10% |
| | 1420 | 91.67% | 90.34% | 82.18% | 91.34% | N/A |
| | 313 | 93.12% | 91.37% | 79.87% | 94.89% | 89.14% |
| | 4013 | 92.14% | 86.72% | 81.19% | 93.20% | 92.96% |
Performance comparisons of computational methods on the Yeast dataset.
| Author | Method | ACC (%) | PR (%) | Sens. (%) | MCC (%) |
|---|---|---|---|---|---|
| Guo et al. [ | ACC + SVM | 89.33 | 89.93 | 88.87 | N/A |
| Yang et al. [ | LD + KNN | 86.15 | 90.24 | 81.30 | N/A |
| Wang et al. [ | 3-MER + CNN | 90.26 | 91.65 | 88.14 | 82.38 |
| Zhou et al. [ | LD + SVM | 88.56 | 89.50 | 87.37 | 77.15 |
| An et al. [ | PSSMMF + SVM | 90.48 | 90.58 | 90.26 | 82.84 |
| You et al. [ | PCA + ELLM | 87.00 | 87.59 | 86.15 | 77.36 |
| Our method | DHT + RoF | 91.93 | 93.82 | 89.78 | 85.14 |
Performance comparisons of computational methods on the Human dataset.
| Author | Method | ACC (%) | PR (%) | Sens. (%) | MCC (%) |
|---|---|---|---|---|---|
| Ding et al. [ | MMI + RF | 96.08 | 96.67 | 95.05 | 92.17 |
| Li et al. [ | OLPP + RoF | 96.09 | 96.56 | 95.20 | 92.47 |
| Pan et al. [ | LDA + SVM | 90.70 | N/A | 89.7 | 81.3 |
| Li et al. [ | IWLD + SVM | 90.57 | 89.01 | 91.61 | 81.22 |
| Our method | DHT + RoF | 96.35 | 96.57 | 95.76 | 92.95 |