Zhen-Guo Gao, Lei Wang, Shi-Xiong Xia, Zhu-Hong You, Xin Yan, Yong Zhou.
Abstract
Protein-protein interactions (PPIs) play vital roles in most biological activities. Although high-throughput biological technologies have generated considerable PPI data for various organisms, many problems remain far from solved. A number of computational methods based on machine learning have been developed to facilitate the identification of novel PPIs. In this study, a novel predictor was designed using the Rotation Forest (RF) algorithm combined with Autocovariance (AC) features extracted from the Position-Specific Scoring Matrix (PSSM). More specifically, PSSMs are first generated from protein amino acid sequences. Then, an effective sequence-based feature representation, Autocovariance, is employed to extract features from the PSSMs. Finally, the RF model is used as a classifier to distinguish between interacting and noninteracting protein pairs. The proposed method achieves promising prediction performance when applied to the PPIs of Yeast, H. pylori, and independent datasets. These results show that the proposed model is suitable for PPI prediction and could also provide a useful supplementary tool for solving other bioinformatics problems.
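
The feature-extraction step described in the abstract can be illustrated with a small sketch. The abstract does not give the AC formula, so the code below assumes the commonly used definition, in which the autocovariance of PSSM column j at a given lag is the mean product of mean-centred scores at positions i and i + lag. The function names `autocovariance_features` and `pair_features`, the parameter `max_lag`, and the choice to concatenate the two proteins' vectors are illustrative assumptions, not details confirmed by the paper.

```python
from typing import List, Sequence


def autocovariance_features(pssm: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Autocovariance descriptor of one PSSM (assumed, commonly used definition).

    pssm    : n rows (sequence positions) by c columns (typically 20 amino acids).
    max_lag : largest lag to use; the result has max_lag * c entries.
    """
    n = len(pssm)
    if n == 0:
        raise ValueError("PSSM must contain at least one row")
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")

    # Column means over the whole sequence, used to centre the scores.
    means = [sum(row[j] for row in pssm) / n for j in range(n_cols)]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(n - lag)
            )
            features.append(total / (n - lag))
    return features


def pair_features(pssm_a, pssm_b, max_lag: int) -> List[float]:
    """Hypothetical pair encoding: concatenate the two proteins' AC vectors."""
    return autocovariance_features(pssm_a, max_lag) + autocovariance_features(pssm_b, max_lag)
```

Under this definition every protein maps to a fixed-length vector regardless of sequence length, which is what allows a standard classifier such as Rotation Forest to be trained on protein pairs.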
Year: 2016 PMID: 27437399 PMCID: PMC4942601 DOI: 10.1155/2016/4563524
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The workflow of our method.
Figure 2. Average prediction accuracy for different values of lg in the AC algorithm of the proposed model.
Figure 3. Accuracy surface obtained from Rotation Forest when optimizing the regularization parameters K and L.
Five-fold cross-validation results of the proposed method on the Yeast dataset.
| Testing set | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 97.59 | 100.00 | 95.14 | 95.28 |
| 2 | 97.54 | 100.00 | 95.03 | 95.19 |
| 3 | 98.17 | 100.00 | 96.40 | 96.40 |
| 4 | 97.59 | 100.00 | 95.01 | 95.27 |
| 5 | 97.99 | 99.82 | 96.27 | 96.06 |
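
Cross-validation results are usually summarized as a mean and standard deviation over the folds. The sketch below does this for the per-fold values listed in the Yeast table; the helper name `summarize_folds` and the use of the sample standard deviation (n - 1 in the denominator) are assumptions made for illustration, and the paper's own summary row is not reproduced here.

```python
from statistics import mean, stdev

# Per-fold values copied from the Yeast cross-validation table.
yeast_folds = {
    "Accu. (%)": [97.59, 97.54, 98.17, 97.59, 97.99],
    "Prec. (%)": [100.00, 100.00, 100.00, 100.00, 99.82],
    "Sen. (%)": [95.14, 95.03, 96.40, 95.01, 96.27],
    "MCC (%)": [95.28, 95.19, 96.40, 95.27, 96.06],
}


def summarize_folds(folds):
    """Return {metric: (mean, sample standard deviation)} across folds."""
    return {name: (mean(values), stdev(values)) for name, values in folds.items()}


if __name__ == "__main__":
    for name, (avg, sd) in summarize_folds(yeast_folds).items():
        print(f"{name}: {avg:.2f} ± {sd:.2f}")
```

The same helper can be applied to any other per-fold table by passing a dictionary of the same shape.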
Figure 4. ROC curves obtained by the proposed method on the Yeast PPI dataset.
Five-fold cross-validation results of the proposed method on the H. pylori dataset.
| Testing set | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 85.76 | 87.45 | 82.87 | 75.52 |
| 2 | 83.53 | 82.65 | 84.38 | 72.49 |
| 3 | 86.11 | 87.55 | 83.57 | 76.02 |
| 4 | 81.99 | 83.27 | 79.51 | 70.42 |
| 5 | 86.82 | 90.88 | 83.55 | 77.06 |
Figure 5. ROC curves obtained by the proposed method on the H. pylori dataset.
Performance comparison of different methods on the Yeast dataset.
| Model | Test set | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) |
|---|---|---|---|---|---|
| Guo et al.'s work | ACC | 89.33 ± 2.67 | 88.87 ± 6.16 | 89.93 ± 3.68 | N/A |
| Guo et al.'s work | AC | 87.36 ± 1.38 | 87.82 ± 4.33 | 87.30 ± 4.68 | N/A |
| You et al.'s work | PCA-EELM | 87.00 ± 0.29 | 87.59 ± 0.32 | 86.15 ± 0.43 | 77.36 ± 0.44 |
| Yang et al.'s work | Cod1 | 75.08 ± 1.13 | 74.75 ± 1.23 | 75.81 ± 1.20 | N/A |
| Yang et al.'s work | Cod2 | 80.04 ± 1.06 | 82.17 ± 1.35 | 76.77 ± 0.69 | N/A |
| Yang et al.'s work | Cod3 | 80.41 ± 0.47 | 81.86 ± 0.99 | 78.14 ± 0.90 | N/A |
| Yang et al.'s work | Cod4 | 86.15 ± 1.17 | 90.24 ± 0.45 | 81.03 ± 1.74 | N/A |
| Zhou et al.'s work | SVM + LD | 88.56 ± 0.33 | 89.50 ± 0.60 | 87.37 ± 0.22 | 77.15 ± 0.68 |
Performance comparison of different methods on the H. pylori dataset.
| Model | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) |
|---|---|---|---|---|
| Phylogenetic bootstrap | 75.80 | 80.20 | 69.80 | N/A |
| HKNN | 84.00 | 84.00 | 86.00 | N/A |
| Ensemble of HKNN | 86.60 | 85.00 | 86.70 | N/A |
| Signature products | 83.40 | 85.70 | 79.90 | N/A |
| Boosting | 79.52 | 81.69 | 80.37 | 70.64 |
| Ensemble ELM | 87.50 | 86.15 | 88.95 | 78.13 |
Prediction results on independent datasets.
| Species | Test pairs | Accu. (%) |
|---|---|---|
| | 4013 | 96.01 |
| | 6954 | 97.73 |
| | 1412 | 98.30 |
| | 313 | 96.81 |