Yang Li¹, Li-Ping Li², Lei Wang³, Chang-Qing Yu⁴, Zheng Wang¹, Zhu-Hong You¹.
Abstract
Proteins play a critical role in regulating cellular functions. Because proteins usually perform their functions by interacting with other proteins, determining whether two proteins interact has become a fundamental problem. Although high-throughput biotechnology has produced a large amount of protein–protein interaction (PPI) data, biological experimental techniques are time-consuming and costly. Computational methods for predicting protein interactions have therefore become an active research area. In this research, we propose an efficient computational method that combines a Rotation Forest (RF) classifier with the Local Binary Pattern (LBP) feature extraction method, applied to the Position-Specific Scoring Matrix (PSSM), to predict PPIs. The proposed method achieved average accuracies of 92.12%, 96.21%, and 86.59% on the Yeast, Human, and H. pylori datasets, respectively. We also evaluated the proposed method on four independent datasets: C. elegans, H. pylori, H. sapiens, and M. musculus. These experimental results show that our model is feasible and robust for predicting PPIs.
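The abstract does not state exactly how LBP is applied to the PSSM. As a minimal sketch, assuming the classic 3×3, 8-neighbor LBP operator is run over the L×20 PSSM as if it were a grayscale image and the resulting codes are pooled into a normalized 256-bin histogram, the feature extraction step could look like this. The function name `lbp_histogram`, the neighbor order, the `>=` comparison and the normalization are illustrative assumptions, not the authors' confirmed choices.

```python
import numpy as np

def lbp_histogram(pssm: np.ndarray) -> np.ndarray:
    """Map an (L, 20) PSSM to a normalized 256-bin LBP histogram.

    Assumption: classic 3x3, 8-neighbor LBP; border cells are skipped
    because they lack a full neighborhood.
    """
    m = np.asarray(pssm, dtype=float)
    if m.ndim != 2 or min(m.shape) < 3:
        raise ValueError("PSSM must be a 2-D array of at least 3x3")
    rows, cols = m.shape
    center = m[1:-1, 1:-1]
    # Neighbor offsets, clockwise from top-left; neighbor k sets bit k.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for k, (dr, dc) in enumerate(offsets):
        # Shifted view aligned with `center`, so each cell sees neighbor k.
        neighbor = m[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        codes |= (neighbor >= center).astype(np.int64) << k
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pssm = rng.integers(-5, 6, size=(120, 20))  # random stand-in for a PSI-BLAST PSSM
    print(lbp_histogram(pssm).shape)  # (256,)
```

Each protein thus maps to a fixed-length 256-dimensional vector regardless of its sequence length L, which is what allows a standard classifier to compare proteins of different lengths.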
Keywords: position-specific scoring matrix; protein sequence; protein–protein interactions; rotation forest
Year: 2019 PMID: 31319578 PMCID: PMC6679202 DOI: 10.3390/ijms20143511
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The workflow of the proposed method.
5-fold cross-validation results obtained using the proposed method on three datasets.
| Dataset | ACC (%) | PE (%) | SN (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| Yeast | 92.12 ± 0.54 | 94.20 ± 0.78 | 89.76 ± 0.96 | 85.46 ± 0.92 | 96.11 ± 0.77 |
| Human | 96.21 ± 0.76 | 97.23 ± 1.19 | 94.77 ± 1.09 | 92.70 ± 1.42 | 98.62 ± 0.48 |
| H. pylori | 86.59 ± 0.48 | 87.70 ± 1.89 | 85.17 ± 2.20 | 76.73 ± 0.74 | 92.69 ± 0.48 |
ACC = accuracy, PE = precision, SN = sensitivity, MCC = Matthews correlation coefficient, AUC = area under the ROC curve.
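ACC, PE, SN and MCC follow their standard definitions from the binary confusion matrix (TP, FP, TN, FN). A minimal sketch that reports each value multiplied by 100 to match the tables; the function name `classification_metrics` and the choice to return 0 when a denominator is zero are assumptions.

```python
import math

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return ACC, PE, SN and MCC, each scaled to a percentage."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total                      # accuracy
    pe = tp / (tp + fp) if tp + fp else 0.0      # precision
    sn = tp / (tp + fn) if tp + fn else 0.0      # sensitivity (recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"ACC": 100 * acc, "PE": 100 * pe, "SN": 100 * sn, "MCC": 100 * mcc}
```

Unlike ACC, MCC stays near zero for a classifier that ignores the minority class, which is why it is reported alongside accuracy.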
Figure 2. Receiver Operating Characteristic (ROC) curves obtained by the proposed method on the Yeast protein–protein interaction (PPI) dataset.
Figure 3. Receiver Operating Characteristic (ROC) curves obtained by the proposed method on the Human protein–protein interaction (PPI) dataset.
Figure 4. Receiver Operating Characteristic (ROC) curves obtained by the proposed method on the H. pylori protein–protein interaction (PPI) dataset.
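An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, and the AUC is the area under that curve. A minimal sketch, assuming real-valued prediction scores and labels in {0, 1}; tied scores are grouped into a single step and the area is integrated with the trapezoidal rule. The helper names `roc_points` and `auc` are hypothetical, since the authors' plotting code is not described.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points as the threshold sweeps from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    pos = sum(1 for _, y in pairs if y == 1)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every sample tied at this score before emitting a point,
        # so ties produce one diagonal segment instead of an arbitrary staircase.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1.0 means every positive pair is scored above every negative pair, while 0.5 corresponds to random ranking.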
Comparison of the proposed model (RF: Rotation Forest) and a Support Vector Machine (SVM) model on the three datasets.
| Dataset | Classifier | ACC (%) | PE (%) | SN (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| Yeast | RF | 92.12 ± 0.54 | 94.20 ± 0.78 | 89.76 ± 0.96 | 85.46 ± 0.92 | 96.11 ± 0.77 |
|  | SVM | 86.99 ± 0.43 | 88.05 ± 0.88 | 85.62 ± 1.23 | 77.36 ± 0.64 | 93.66 ± 0.64 |
| Human | RF | 96.21 ± 0.76 | 97.23 ± 1.19 | 94.77 ± 1.09 | 92.70 ± 1.42 | 98.62 ± 0.48 |
|  | SVM | 92.56 ± 0.70 | 93.71 ± 1.06 | 90.47 ± 0.82 | 86.18 ± 1.23 | 97.36 ± 0.65 |
| H. pylori | RF | 86.59 ± 0.48 | 87.70 ± 1.89 | 85.17 ± 2.20 | 76.73 ± 0.74 | 92.69 ± 0.48 |
|  | SVM | 81.62 ± 1.22 | 80.73 ± 3.79 | 83.40 ± 3.56 | 69.93 ± 1.56 | 89.52 ± 0.53 |
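The ± values in these tables summarize variation across cross-validation folds. A minimal sketch of shuffled k-fold splitting and of the "mean ± standard deviation" summary, assuming unstratified folds and the sample standard deviation; neither choice is stated in the source, and `kfold_indices` and `mean_pm_std` are hypothetical names.

```python
import random
import statistics

def kfold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) lists for k shuffled, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # The first n_samples % k folds get one extra sample.
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

def mean_pm_std(values) -> str:
    """Format per-fold scores as 'mean ± sample standard deviation'."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"
```

In use, a classifier would be fit on each `train` split and scored on the matching `test` split, and the five per-fold values of each metric passed to `mean_pm_std`.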
Figure 5. Receiver Operating Characteristic (ROC) curves obtained by the Support Vector Machine (SVM) method on the Yeast protein–protein interaction (PPI) dataset.
Figure 6. Receiver Operating Characteristic (ROC) curves obtained by the Support Vector Machine (SVM) method on the Human protein–protein interaction (PPI) dataset.
Figure 7. Receiver Operating Characteristic (ROC) curves obtained by the Support Vector Machine (SVM) method on the H. pylori protein–protein interaction (PPI) dataset.
Performance comparison of different methods on the Yeast dataset.
| Author | Model | ACC (%) | PE (%) | SN (%) | MCC (%) |
|---|---|---|---|---|---|
| Guo et al.’s work [ | ACC | 89.33 ± 2.67 | 88.87 ± 6.16 | 89.93 ± 3.68 | N/A |
|  | AC | 87.36 ± 1.38 | 87.82 ± 4.33 | 87.30 ± 4.68 | N/A |
| You et al.’s work [ | PCA-EELM | 87.00 ± 0.29 | 87.59 ± 0.32 | 86.15 ± 0.43 | 77.36 ± 0.44 |
| Yang et al.’s work [ | Cod1 | 75.08 ± 1.13 | 74.75 ± 1.23 | 75.81 ± 1.20 | N/A |
|  | Cod2 | 80.04 ± 1.06 | 82.17 ± 1.35 | 76.77 ± 0.69 | N/A |
|  | Cod3 | 80.41 ± 0.47 | 81.86 ± 0.99 | 78.14 ± 0.90 | N/A |
|  | Cod4 | 86.15 ± 1.17 | 90.24 ± 1.34 | 81.03 ± 1.74 | N/A |
| Zhou et al.’s work [ | SVM + LD | 88.56 ± 0.33 | 89.50 ± 0.60 | 87.37 ± 0.22 | 77.15 ± 0.68 |
| Wang et al.’s work [ | PCVM + ZM | 94.48 ± 1.2 | 93.92 ± 2.4 | 95.13 ± 2.0 | 89.58 ± 2.2 |
| Our method | SVM + PSSM | 86.99 ± 0.43 | 88.05 ± 0.88 | 85.62 ± 1.23 | 77.36 ± 0.64 |
|  | RF + PSSM | 92.12 ± 0.54 | 94.20 ± 0.78 | 89.76 ± 0.96 | 85.46 ± 0.92 |
Model abbreviations: ACC: Auto Cross Covariance (distinct from the ACC accuracy column); AC: Auto Covariance; PCA-EELM: Principal Component Analysis-Ensemble Extreme Learning Machine; LD: Local Description; PCVM + ZM: Probabilistic Classification Vector Machines + Zernike Moments; RF: Rotation Forest.
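For reference, the auto covariance (AC) descriptor in Guo et al.’s work is commonly written as AC(lag, j) = 1/(n - lag) · Σ_{i=1}^{n-lag} (X_{i,j} - mean_j)(X_{i+lag,j} - mean_j), where X is an n × d matrix of per-residue property values for a sequence of length n and mean_j is the average of column j. A minimal sketch of that formula, assuming the property matrix has already been built and normalized; which properties and which maximum lag Guo et al. used are not given here, and `auto_covariance` is a hypothetical name.

```python
import numpy as np

def auto_covariance(x: np.ndarray, max_lag: int) -> np.ndarray:
    """AC features for an (n, d) matrix of per-residue property values.

    Output has length max_lag * d, ordered lag-major:
    [AC(1, 0..d-1), AC(2, 0..d-1), ..., AC(max_lag, 0..d-1)].
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2:
        raise ValueError("x must be a 2-D (n, d) array")
    n, d = x.shape
    if not 1 <= max_lag < n:
        raise ValueError("need 1 <= max_lag < sequence length")
    centered = x - x.mean(axis=0)  # subtract each property's mean over the sequence
    feats = [
        # Average product of residues `lag` positions apart, per property.
        (centered[:-lag] * centered[lag:]).sum(axis=0) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]
    return np.concatenate(feats)
```

Like the LBP histogram, this yields a fixed-length vector (max_lag × d values) independent of sequence length.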
Performance comparison of different methods on the Human dataset.
| Model | ACC (%) | SN (%) | MCC (%) |
|---|---|---|---|
| LDA + RF [ | 96.4 | 94.2 | 92.8 |
| LDA + RoF | 95.7 | 97.6 | 91.8 |
| LDA + SVM | 90.7 | 89.7 | 81.3 |
| AC + RF | 95.5 | 94.0 | 91.4 |
| AC + RoF | 95.1 | 93.3 | 91.0 |
| AC + SVM | 89.3 | 94.0 | 79.2 |
| Our method | 96.21 | 94.77 | 92.70 |
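LDA: Linear discriminant analysis; RoF: Rotation forest; RF: Random forest. Note that in this table RF denotes Random forest, whereas elsewhere RF refers to the Rotation Forest classifier used by the proposed method.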
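Rotation forest (RoF), the classifier family the proposed method builds on, trains each base tree on features rotated by a block-diagonal matrix obtained from PCA on random feature subsets. A minimal numpy sketch of that rotation step only, assuming disjoint random feature subsets, a random 75% subsample of rows per subset (drawn without replacement here), and all principal components kept. The original algorithm also draws a random subset of classes before the PCA, which is omitted, and the base learner (typically a decision tree) is not shown. `rotation_matrix` is a hypothetical name.

```python
import numpy as np

def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Build one rotation-forest-style rotation matrix for data X of shape (n, d).

    Simplified (assumption): no random class selection, rows are subsampled
    without replacement, and every principal component is kept.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 1 <= n_subsets <= d:
        raise ValueError("need 1 <= n_subsets <= number of features")
    # Disjoint random feature subsets of near-equal size.
    subsets = np.array_split(rng.permutation(d), n_subsets)
    R = np.zeros((d, d))
    for cols in subsets:
        # At least len(cols) rows are needed for a square (full) PCA basis.
        m = max(len(cols), int(sample_frac * n))
        if m > n:
            raise ValueError("too few samples for this feature subset size")
        rows = rng.choice(n, size=m, replace=False)
        block = X[np.ix_(rows, cols)]
        block = block - block.mean(axis=0)
        # Rows of vt are the principal axes of this feature subset.
        _, _, vt = np.linalg.svd(block, full_matrices=False)
        # Columns of vt.T become the new axes for these features.
        R[np.ix_(cols, cols)] = vt.T
    return R

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 9))
    R = rotation_matrix(X, n_subsets=3, rng=rng)
    X_rot = X @ R  # a base tree would be trained on X_rot
    print(X_rot.shape)  # (40, 9)
```

A full ensemble would repeat this for every tree, fit the base learner on `X @ R`, and combine the trees' votes. Because `R` is orthogonal, the rotation preserves Euclidean distances while changing which axis-aligned splits a tree can make.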
Predicted results on four independent datasets.
| Species | Test Pairs | ACC (%) |
|---|---|---|
| C. elegans | 4013 | 94.82 |
| H. pylori | 1420 | 94.79 |
| H. sapiens | 1412 | 95.11 |
| M. musculus | 313 | 93.93 |