Yu-An Huang, Zhu-Hong You, Xin Gao, Leon Wong, Lirong Wang.
Abstract
Increasing demand for knowledge about protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction networks. Although high-throughput technologies have generated a considerable amount of PPI data for various organisms, they have inevitable drawbacks: they are costly and time-consuming, and they have an inherently high false positive rate. For this reason, computational methods for predicting PPIs are drawing increasing attention. In this study, we report a computational method for predicting PPIs from protein sequence information. The main improvements come from a novel protein sequence representation, obtained by applying the discrete cosine transform (DCT) to the substitution matrix representation (SMR), and from a weighted sparse representation-based classifier (WSRC). On the Yeast, Human, and H. pylori PPI datasets, the method achieved average accuracies of 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. These promising results show that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. We also performed extensive experiments in which the Yeast PPI samples were used as the training set to predict PPIs in five other species datasets.
Year: 2015 PMID: 26634213 PMCID: PMC4641304 DOI: 10.1155/2015/902198
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
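The abstract's sequence representation applies the DCT to a substitution matrix representation (SMR) of each protein. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' code: the SMR of a sequence of length N is taken to be the N × |alphabet| matrix whose i-th row is the substitution-matrix row of the i-th residue; an unnormalized 2-D DCT-II is computed by direct summation; and the first `n_rows` coefficient rows are kept (zero-padded for short sequences) to give a fixed-length vector. The function names, the `n_rows` parameter, and the truncation rule are hypothetical choices for illustration. The demo uses a 3-letter excerpt of BLOSUM62 (entries for A, C, and D) only to keep the example small.

```python
import math

# Toy 3-letter excerpt of BLOSUM62 (A, C, D); a real run would use all 20 amino acids.
TOY_SUBST = {
    "A": {"A": 4, "C": 0, "D": -2},
    "C": {"A": 0, "C": 9, "D": -3},
    "D": {"A": -2, "C": -3, "D": 6},
}


def smr(sequence, subst, alphabet):
    """Substitution matrix representation: row i holds the scores of residue i vs. every letter."""
    return [[subst[res][other] for other in alphabet] for res in sequence]


def dct2(matrix):
    """Unnormalized 2-D DCT-II by direct summation (O(M^2 N^2), fine for small matrices)."""
    m, n = len(matrix), len(matrix[0])
    out = [[0.0] * n for _ in range(m)]
    for k1 in range(m):
        for k2 in range(n):
            total = 0.0
            for i in range(m):
                row_cos = math.cos(math.pi * (i + 0.5) * k1 / m)
                for j in range(n):
                    col_cos = math.cos(math.pi * (j + 0.5) * k2 / n)
                    total += matrix[i][j] * row_cos * col_cos
            out[k1][k2] = total
    return out


def dct_smr_features(sequence, subst, alphabet, n_rows):
    """Hypothetical fixed-length encoding: first n_rows rows of the DCT of the SMR, flattened."""
    coeffs = dct2(smr(sequence, subst, alphabet))
    width = len(alphabet)
    # Zero-pad sequences shorter than n_rows so every protein yields n_rows * width values.
    padding = [[0.0] * width for _ in range(max(0, n_rows - len(coeffs)))]
    rows = coeffs[:n_rows] + padding
    return [value for row in rows for value in row]


features = dct_smr_features("ACD", TOY_SUBST, "ACD", n_rows=4)
print(len(features), features[0])  # 12 9.0
```

The first coefficient is the DC term, i.e. the sum of all SMR entries, and the low-frequency coefficients summarize most of the matrix's energy, which is why truncation gives a compact descriptor. For real data, Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")` provides the full matrix and `scipy.fft.dctn` computes the same transform much faster (SciPy's unnormalized DCT-II is larger by a constant factor of 4 in two dimensions). How two proteins' vectors are combined into one pair descriptor (for example, by concatenation) is a further assumption not fixed by this sketch.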
Five-fold cross-validation results of the proposed method on the Yeast dataset.
| Testing set | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 96.74 | 100.00 | 93.60 | 93.68 | 97.07 |
| 2 | 95.89 | 100.00 | 91.93 | 92.10 | 96.04 |
| 3 | 96.92 | 100.00 | 93.75 | 94.00 | 96.83 |
| 4 | 95.75 | 100.00 | 91.50 | 91.84 | 95.49 |
| 5 | 96.12 | 99.60 | 92.39 | 92.50 | 96.03 |
| Average | 96.28 | | | | |
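The Average row summarizes the five folds, and comparisons of this kind are usually reported as mean ± standard deviation. A minimal sketch for recomputing such a summary from the per-fold rows of the Yeast table, assuming the sample (n − 1) standard deviation, which may differ from the convention used in the paper:

```python
import statistics

# Per-fold Yeast results from the table above: Accu., Prec., Sen., MCC, AUC (all in %).
YEAST_FOLDS = [
    (96.74, 100.00, 93.60, 93.68, 97.07),
    (95.89, 100.00, 91.93, 92.10, 96.04),
    (96.92, 100.00, 93.75, 94.00, 96.83),
    (95.75, 100.00, 91.50, 91.84, 95.49),
    (96.12, 99.60, 92.39, 92.50, 96.03),
]
METRICS = ("Accu.", "Prec.", "Sen.", "MCC", "AUC")


def summarize(folds):
    """Map each metric to (mean, sample standard deviation) across the folds."""
    return {
        name: (statistics.mean(column), statistics.stdev(column))
        for name, column in zip(METRICS, zip(*folds))
    }


for name, (mean, sd) in summarize(YEAST_FOLDS).items():
    print(f"{name}: {mean:.2f} ± {sd:.2f}")
```

The mean accuracy obtained this way rounds to 96.28%, the value reported in the abstract.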
Five-fold cross-validation results of the proposed method on the Human dataset.
| Testing set | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 96.20 | 99.73 | 92.53 | 92.66 | 96.85 |
| 2 | 96.32 | 99.72 | 92.48 | 92.85 | 95.52 |
| 3 | 96.32 | 99.06 | 93.32 | 92.89 | 96.59 |
| 4 | 96.45 | 99.72 | 92.73 | 93.08 | 96.60 |
| 5 | 96.20 | 99.72 | 92.12 | 92.61 | 96.78 |
| Average | 96.30 | | | | |
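The tables report accuracy, precision, sensitivity, and the Matthews correlation coefficient (MCC). Assuming the standard definitions in terms of true/false positives and negatives (TP, TN, FP, FN), a small sketch that computes them from confusion-matrix counts (as fractions; multiply by 100 for the percentages shown):

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity (recall) and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Accu.": accuracy, "Prec.": precision, "Sen.": sensitivity, "MCC": mcc}


print(classification_metrics(tp=90, tn=95, fp=5, fn=10))
```

Read this way, a precision of 100.00% in a fold (as in several Yeast folds) means FP = 0, that is, no non-interacting pair in that fold was predicted to interact.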
Five-fold cross-validation results of the proposed method on the H. pylori dataset.
| Testing set | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 86.08 | 84.95 | 85.87 | 75.99 | 89.42 |
| 2 | 84.71 | 84.62 | 85.47 | 74.08 | 88.08 |
| 3 | 88.83 | 89.17 | 87.59 | 80.13 | 91.11 |
| 4 | 87.29 | 87.02 | 87.02 | 77.79 | 90.15 |
| 5 | 86.82 | 89.29 | 86.21 | 76.97 | 90.48 |
| Average | 86.74 | | | | |
Figure 1. ROC curves of the proposed method on the Yeast PPI dataset.
Figure 3. ROC curves of the proposed method on the Human PPI dataset.
Figure 5. ROC curves of the proposed method on the H. pylori PPI dataset.
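Figures 1, 3, and 5 plot ROC curves, and the AUC column is the area under such a curve. A minimal sketch, assuming each test pair receives a real-valued score where a higher value means "more likely to interact": sweep the decision threshold from the highest score to the lowest, record one (FPR, TPR) point per distinct score, and integrate with the trapezoidal rule. The function names are hypothetical.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold through each distinct score."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together, giving one diagonal step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr))
    )


fpr, tpr = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
print(auc(fpr, tpr))  # 0.75
```

Grouping tied scores matters: processing them one at a time would make the AUC depend on the order of the input.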
Comparison with the support vector machine (SVM) classifier on the three datasets.
| Dataset | Classifier | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| Yeast | WSRC | 96.28 | | | | |
| | SVM | 84.97 ± 0.93 | 85.46 ± 1.21 | 84.30 ± 0.83 | 74.46 ± 1.29 | 92.35 ± 0.72 |
| Human | WSRC | 96.30 | | | | |
| | SVM | 85.33 ± 1.29 | 86.92 ± 1.92 | 81.59 ± 2.40 | 74.81 ± 1.89 | 93.15 ± 1.11 |
| H. pylori | WSRC | 86.74 | | | | |
| | SVM | 80.67 ± 1.95 | 83.18 ± 9.85 | 79.89 ± 11.83 | 67.69 ± 3.33 | 90.39 ± 1.91 |
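The WSRC rows refer to the weighted sparse representation-based classifier. The following is a minimal pure-Python sketch of the general idea, not the authors' implementation: a query vector y is coded as a sparse combination of the unit-normalized training vectors, and it is assigned to the class whose coefficients reconstruct y with the smallest residual. Assumptions: the sparse coding problem is solved in its Lagrangian form, minimize ½‖y − Ax‖² + λ Σᵢ wᵢ|xᵢ|, with iterative soft-thresholding (ISTA) rather than whatever solver the paper used; the Gaussian locality weights wᵢ = exp(‖y − aᵢ‖²/(2σ²)) grow with distance so that nearby training samples are cheaper to use; and `sigma`, `lam`, and `iters` are hypothetical defaults.

```python
import math


def _unit(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def wsrc_predict(train_x, train_y, query, sigma=1.0, lam=0.01, iters=500):
    """Weighted sparse representation classification of one query vector (sketch)."""
    atoms = [_unit(a) for a in train_x]  # columns of the dictionary A
    y = _unit(query)
    dim = len(y)
    # Locality weights (assumed form): far-away training samples are penalized more.
    weights = [
        math.exp(sum((p - q) ** 2 for p, q in zip(y, a)) / (2 * sigma**2)) for a in atoms
    ]
    # Step 1/L with L = ||A||_F^2, an upper bound on the largest eigenvalue of A^T A.
    step = 1.0 / sum(sum(x * x for x in a) for a in atoms)
    coef = [0.0] * len(atoms)
    for _ in range(iters):
        # ISTA: gradient step on 0.5 * ||y - A x||^2, then weighted soft-thresholding.
        resid = [sum(c * a[d] for c, a in zip(coef, atoms)) - y[d] for d in range(dim)]
        for i, a in enumerate(atoms):
            z = coef[i] - step * sum(r * ad for r, ad in zip(resid, a))
            shrink = step * lam * weights[i]
            coef[i] = math.copysign(max(abs(z) - shrink, 0.0), z)
    # Assign the class whose own coefficients reconstruct y with the smallest residual.
    best_label, best_err = None, math.inf
    for label in sorted(set(train_y)):
        recon = [
            sum(c * a[d] for c, a, lab in zip(coef, atoms, train_y) if lab == label)
            for d in range(dim)
        ]
        err = math.sqrt(sum((yd - rd) ** 2 for yd, rd in zip(y, recon)))
        if err < best_err:
            best_label, best_err = label, err
    return best_label


X = [[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]]
labels = [0, 0, 1, 1]  # toy data, e.g. 1 = interacting pair, 0 = non-interacting pair
print(wsrc_predict(X, labels, [0.9, 0.2]))  # 0
```

Direct summation keeps the sketch dependency-free but scales poorly; with thousands of training pairs a vectorized or dedicated L1 solver would be the practical choice, and σ and λ would need tuning, for example by cross-validation.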
Figure 2. ROC curves of the SVM-based method on the Yeast PPI dataset.
Figure 4. ROC curves of the SVM-based method on the Human PPI dataset.
Figure 6. ROC curves of the SVM-based method on the H. pylori PPI dataset.
Prediction results on five other species, using the model trained on the Yeast PPI samples.
| Species | Test pairs | Accuracy |
|---|---|---|
| | 6954 | 66.08% |
| | 4013 | 81.19% |
| | | |
| | | |
| | 313 | 79.87% |
Performance comparison of different methods on the Yeast dataset.
| Model | Variant | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) |
|---|---|---|---|---|---|
| Guo's work | ACC | 89.33 ± 2.67 | 88.87 ± 6.16 | 89.93 ± 3.68 | N/A |
| | AC | 87.36 ± 1.38 | 87.82 ± 4.33 | 87.30 ± 4.68 | N/A |
| Zhou's work | SVM + LD | 88.56 ± 0.33 | 89.50 ± 0.60 | 87.37 ± 0.22 | 77.15 ± 0.68 |
| Yang's work | Cod1 | 75.08 ± 1.13 | 74.75 ± 1.23 | 75.81 ± 1.20 | N/A |
| | Cod2 | 80.04 ± 1.06 | 82.17 ± 1.35 | 76.77 ± 0.69 | N/A |
| | Cod3 | 80.41 ± 0.47 | 81.86 ± 0.99 | 78.14 ± 0.90 | N/A |
| | Cod4 | 86.15 ± 1.17 | 90.24 ± 1.34 | 81.03 ± 1.74 | N/A |
| Wong's work | RF + PR-LPQ | 93.92 ± 0.36 | 96.45 ± 0.45 | 91.10 ± 0.31 | 88.56 ± 0.63 |
| You's work | PCA-EELM | 87.00 ± 0.29 | 87.59 ± 0.32 | 86.15 ± 0.43 | 77.36 ± 0.44 |
| Proposed method | | 96.28 | | | |
Performance comparison of different methods on the H. pylori dataset.
| Model | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) |
|---|---|---|---|---|
| Phylogenetic bootstrap | 75.80 | 80.20 | 69.80 | N/A |
| HKNN | 84.00 | 84.00 | 86.00 | N/A |
| Signature products | 83.40 | 85.70 | 79.90 | N/A |
| Ensemble of HKNN | 86.60 | 85.00 | 86.70 | N/A |
| Boosting | 79.52 | 81.69 | 80.37 | 70.64 |
| Ensemble ELM | 87.50 | 86.15 | 88.95 | 78.13 |
| Proposed method | 86.74 | | | |
Performance comparison of different methods on the Human dataset.
| Model | Accu. (%) | Prec. (%) | Sen. (%) | MCC (%) |
|---|---|---|---|---|
| LDA + RF | 96.4 | N/A | 94.2 | 92.8 |
| LDA + RoF | 95.7 | N/A | 97.6 | 91.8 |
| LDA + SVM | 90.7 | N/A | 89.7 | 81.3 |
| AC + RF | 95.5 | N/A | 94.0 | 91.4 |
| AC + RoF | 95.1 | N/A | 93.3 | 91.0 |
| AC + SVM | 89.3 | N/A | 94.0 | 79.2 |
| Proposed method | 96.30 | | | |