Xinke Zhan, Mang Xiao, Zhuhong You, Chenggang Yan, Jianxin Guo, Liping Wang, Yaoqi Sun, Bingwan Shang.
Abstract
Protein-protein interactions (PPIs) play an essential role in many cellular functions. However, identifying PPIs through traditional experimental methods remains tedious and time-consuming, so an efficient computational method for predicting PPIs is needed. This paper explores a novel computational method for detecting PPIs from protein sequences that combines a feature extraction method, Locality Preserving Projections (LPP), with a classifier, Rotation Forest (RF). Specifically, we first represent each protein sequence as a Position Specific Scoring Matrix (PSSM), which retains biological evolutionary information. The LPP descriptor is then applied to extract feature vectors from the PSSM, and these feature vectors are fed into the RF to obtain the final results. Applied to two datasets, Yeast and H. pylori, the proposed method obtained average accuracies of 92.81% and 92.56%, respectively. To better evaluate its performance, we also compare it with K nearest neighbors (KNN) and support vector machine (SVM) classifiers. In summary, the experimental results indicate that the proposed approach is stable and robust for predicting PPIs and is a promising tool for proteomics research.
Keywords: KNN; PSSM; SVM; locality preserving projections; rotation forest
Year: 2022 PMID: 36101379 PMCID: PMC9311754 DOI: 10.3390/biology11070995
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
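The abstract names LPP as the feature extraction step without giving its details. The following is a minimal sketch of generic LPP in Python with NumPy, not the authors' implementation. It builds a k-nearest-neighbor graph with heat-kernel weights and solves the generalized eigenproblem X^T L X a = lambda X^T D X a for the smallest eigenvalues. The function name `lpp`, its parameters (`n_neighbors`, `t`, `reg`), and the assumption that each row of `X` is one fixed-length descriptor derived from a PSSM are illustrative choices that do not come from the source.

```python
import numpy as np


def lpp(X, n_components=2, n_neighbors=5, t=1.0, reg=1e-6):
    """Generic LPP sketch; each row of X is one sample (assumed input layout)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # k-nearest-neighbor graph with heat-kernel weights exp(-dist^2 / t).
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)  # a sample is not its own neighbor
    nbrs = np.argsort(masked, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist2[rows, cols] / t)
    W = np.maximum(W, W.T)  # symmetrize the graph

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(d)  # small ridge keeps B positive definite

    # Solve A a = lambda B a by whitening with B^(-1/2).
    w_b, v_b = np.linalg.eigh(B)
    B_inv_sqrt = v_b @ np.diag(1.0 / np.sqrt(w_b)) @ v_b.T
    M = B_inv_sqrt @ A @ B_inv_sqrt
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)  # eigenvalues in ascending order

    P = B_inv_sqrt @ vecs[:, :n_components]  # projection matrix, shape (d, n_components)
    return X @ P, P
```

In a pipeline like the one described, the projected rows returned by `lpp` would be the feature vectors passed to the classifier. How the authors turn a variable-length PSSM into the input matrix is not stated in this record.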
The results of different feature vectors on the Yeast and H. pylori datasets.
| Feature Vectors | Dataset | Acc. (%) | Prec. (%) | Sen. (%) | MCC. (%) |
|---|---|---|---|---|---|
| 40 | Yeast | | | | |
| 40 | H. pylori | 92.18 ± 0.70 | 93.66 ± 2.21 | 90.56 ± 1.52 | 85.54 ± 1.15 |
| 60 | Yeast | 92.55 ± 0.32 | 96.56 ± 0.53 | 88.25 ± 0.81 | 86.16 ± 0.31 |
| 60 | H. pylori | 92.49 ± 2.18 | 94.59 ± 2.13 | 90.12 ± 2.59 | 86.16 ± 3.67 |
| 80 | Yeast | 92.60 ± 0.32 | 96.37 ± 0.55 | 88.51 ± 0.54 | 86.23 ± 0.57 |
| 80 | H. pylori | | | | |
| 100 | Yeast | 91.90 ± 0.44 | 94.94 ± 0.90 | 88.52 ± 0.46 | 85.08 ± 0.73 |
| 100 | H. pylori | 92.21 ± 1.19 | 94.10 ± 1.74 | 90.12 ± 2.31 | 85.63 ± 2.03 |
| 120 | Yeast | 92.56 ± 0.75 | 96.44 ± 0.79 | 88.40 ± 0.93 | 86.19 ± 1.27 |
| 120 | H. pylori | 91.90 ± 1.66 | 93.94 ± 1.14 | 89.56 ± 2.56 | 85.14 ± 2.81 |
| 140 | Yeast | 92.52 ± 0.48 | 95.96 ± 0.32 | 88.77 ± 0.79 | 86.12 ± 0.81 |
| 140 | H. pylori | 91.46 ± 1.09 | 92.74 ± 2.54 | 89.89 ± 1.84 | 84.34 ± 1.84 |
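The Acc., Prec., Sen., and MCC columns are standard binary classification measures. For reference, the sketch below computes them from confusion-matrix counts using the usual textbook definitions. The function name `binary_metrics` and the example counts are hypothetical, and this record does not state the exact formulas the authors used.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, sensitivity and MCC as fractions in [0, 1] (MCC in [-1, 1])."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "prec": prec, "sen": sen, "mcc": mcc}


# Hypothetical counts for illustration only (not taken from the paper).
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

The tables report these quantities as percentages, so the returned fractions would be multiplied by 100 for comparison.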
Figure 1. The accuracy performance on the Yeast and H. pylori datasets.
Prediction performance of the RF classifier on the Yeast dataset based on five-fold cross-validation.
| Testing Set | Acc. (%) | Prec. (%) | Sen. (%) | MCC. (%) | AUC |
|---|---|---|---|---|---|
| 1 | 92.58 | 97.34 | 88.14 | 86.23 | 0.9509 |
| 2 | 92.80 | 96.24 | 88.78 | 86.58 | 0.9502 |
| 3 | 92.85 | 97.39 | 88.35 | 86.68 | 0.9511 |
| 4 | 92.00 | 95.91 | 87.44 | 85.20 | 0.9472 |
| 5 | 93.83 | 97.13 | 90.02 | 88.37 | 0.9535 |
| Average | 92.81 | | | | |
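Per-fold results such as these are usually summarized as mean ± standard deviation. The sketch below does this with Python's `statistics` module, assuming the sample standard deviation (n - 1 denominator). That convention is not stated in the source, but it reproduces the 80.72 ± 0.81 accuracy reported for SVM on Yeast from that classifier's five fold values. The helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Fold accuracies (%) copied from the five-fold tables in this record.
rf_yeast_acc = [92.58, 92.80, 92.85, 92.00, 93.83]
svm_yeast_acc = [81.27, 81.18, 79.48, 80.33, 81.36]


def summarize(values):
    """Format fold values as 'mean ± sample standard deviation'."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


print("RF  on Yeast:", summarize(rf_yeast_acc))
print("SVM on Yeast:", summarize(svm_yeast_acc))
```

For the RF fold accuracies the mean rounds to 92.81, which matches the average accuracy quoted in the abstract.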
Prediction performance of the RF classifier on the H. pylori dataset based on five-fold cross-validation.
| Testing Set | Acc. (%) | Prec. (%) | Sen. (%) | MCC. (%) | AUC |
|---|---|---|---|---|---|
| 1 | 92.80 | 93.50 | 91.52 | 86.61 | 0.9449 |
| 2 | 91.77 | 93.19 | 89.97 | 84.87 | 0.9373 |
| 3 | 91.60 | 93.49 | 90.10 | 84.59 | 0.9364 |
| 4 | 93.65 | 95.02 | 92.07 | 88.10 | 0.9564 |
| 5 | 92.97 | 95.32 | 90.44 | 86.91 | 0.9565 |
| Average | 92.56 | | | | |
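Five-fold cross-validation splits the samples into five parts, and each part serves once as the testing set while the other four are used for training. A minimal sketch of such a split follows. The function name `k_fold_indices`, the shuffling, and the fixed seed are assumptions, since the record does not say how the authors partitioned the data or whether the folds were stratified.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Each sample index appears in exactly one testing set.
for fold_number, (train, test) in enumerate(k_fold_indices(10, k=5), start=1):
    print(fold_number, sorted(test))
```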
Figure 2. ROC curves yielded by RF on Yeast.
Figure 3. ROC curves yielded by RF on H. pylori.
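The AUC values in the tables summarize ROC curves like these. One common way to compute AUC, sketched below, is as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical, and the source does not say which tool the authors used to compute AUC.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but slow for large test sets.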
Prediction performance of the SVM classifier on the Yeast dataset based on five-fold cross-validation.
| Testing Set | Acc. (%) | Prec. (%) | Sen. (%) | MCC. (%) | AUC |
|---|---|---|---|---|---|
| 1 | 81.27 | 83.41 | 79.90 | 69.53 | 0.8866 |
| 2 | 81.18 | 81.28 | 80.02 | 69.42 | 0.8802 |
| 3 | 79.48 | 80.85 | 78.37 | 67.37 | 0.8700 |
| 4 | 80.33 | 80.54 | 79.07 | 68.37 | 0.8791 |
| 5 | 81.36 | 80.88 | 80.95 | 69.65 | 0.8860 |
| Average | 80.72 ± 0.81 | 81.39 ± 1.16 | 79.66 ± 0.98 | 68.87 ± 0.98 | 0.8804 ± 0.0067 |
Prediction performance of the SVM classifier on the H. pylori dataset based on five-fold cross-validation.
| Testing Set | Acc. (%) | Prec. (%) | Sen. (%) | MCC. (%) | AUC |
|---|---|---|---|---|---|
| 1 | 88.16 | 88.21 | 87.28 | 79.11 | 0.9495 |
| 2 | 89.37 | 92.83 | 85.12 | 80.91 | 0.9305 |
| 3 | 87.65 | 90.81 | 84.82 | 78.32 | 0.9356 |
| 4 | 88.85 | 94.47 | 82.41 | 80.02 | 0.9287 |
| 5 | 89.54 | 92.96 | 85.67 | 81.21 | 0.9477 |
| Average | 88.71 ± 0.80 | 91.86 ± 2.42 | 85.06 ± 1.76 | 79.91 ± 1.21 | 0.9384 ± 0.0097 |
Figure 4. The ROC curve of the SVM classifier on the Yeast dataset.
Figure 5. The ROC curve of the SVM classifier on the H. pylori dataset.
The experimental results compared with other prediction models on the Yeast and H. pylori datasets.
| Dataset | Model | Accu. (%) | Prec. (%) | Sen. (%) | MCC. (%) | AUC |
|---|---|---|---|---|---|---|
| Yeast | RF | 92.81 | | | | |
| Yeast | SVM | 80.72 ± 0.81 | 81.39 ± 1.16 | 79.66 ± 0.98 | 68.87 ± 0.98 | 0.8804 ± 0.0067 |
| Yeast | KNN | 74.73 ± 1.38 | 76.57 ± 2.18 | 71.28 ± 1.18 | 62.15 ± 1.31 | 0.7472 ± 0.0139 |
| H. pylori | RF | 92.56 | | | | |
| H. pylori | SVM | 88.71 ± 0.80 | 91.86 ± 2.42 | 85.06 ± 1.76 | 79.91 ± 1.21 | 0.9384 ± 0.0097 |
| H. pylori | KNN | 91.05 ± 1.01 | 91.85 ± 1.72 | 90.12 ± 0.94 | 83.70 ± 1.64 | 0.9104 ± 0.0101 |
Figure 6. Comparison of ROC curves for the RF, SVM, and KNN classifiers on two datasets: Yeast and H. pylori.
Prediction results obtained on four independent datasets.
| Species | Test Pairs | Accu. (%) |
|---|---|---|
| | 1412 | |
| | 313 | |
| | 1420 | |
| | 4013 | |
Comparison results of different methods on H. pylori.
| Model | Acc. (%) | Prec. (%) | Sen. (%) | MCC. (%) |
|---|---|---|---|---|
| Ensemble of HKNN | 86.60 | 85.00 | 86.70 | N/A |
| HKNN | 84.00 | 84.00 | 86.00 | N/A |
| Ensemble ELM | 87.50 | 88.95 | 86.15 | 78.13 |
| Signature products | 83.40 | 85.70 | 79.90 | N/A |
| Phylogenetic bootstrap | 75.80 | 80.20 | 69.80 | N/A |
| Boosting | 79.52 | 81.69 | 80.37 | 70.64 |
| Proposed method | 92.56 | | | |
Comparison results of different methods on Yeast.
| Method | Model | Acc. (%) | Prec. (%) | Sen. (%) | MCC. (%) |
|---|---|---|---|---|---|
| You’s work | PCA-EELM | 87.00 ± 0.29 | 87.59 ± 0.32 | 86.15 ± 0.43 | 77.36 ± 0.44 |
| Zhou’s work | SVM+LD | 88.56 ± 0.33 | 89.50 ± 0.60 | 87.37 ± 0.22 | 77.15 ± 0.68 |
| Yang’s work | Cod1 | 75.08 ± 1.13 | 74.75 ± 1.23 | 75.81 ± 1.20 | N/A |
| Yang’s work | Cod2 | 80.04 ± 1.06 | 95.44 ± 0.30 | 96.25 ± 1.26 | N/A |
| Yang’s work | Cod3 | 80.41 ± 0.47 | 65.50 ± 1.44 | 97.90 ± 1.06 | N/A |
| Yang’s work | Cod4 | 86.15 ± 1.17 | 90.24 ± 1.34 | 81.03 ± 1.74 | N/A |
| Guo’s work | ACC | 89.33 ± 2.67 | 88.87 ± 6.16 | 89.93 ± 3.68 | N/A |
| Guo’s work | AC | 87.36 ± 1.38 | 87.82 ± 4.33 | 87.30 ± 4.68 | N/A |
| Proposed method | | 92.81 | | | |