Tao Wang, Liping Li, Yu-An Huang, Hui Zhang, Yahong Ma, Xing Zhou.
Abstract
Protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of cells; thus, detecting PPIs is one of the most important issues in current molecular biology. Although much effort has been devoted to identifying PPIs with high-throughput techniques, these experimental methods are both time-consuming and costly, and they yield high rates of false positive and false negative results. Moreover, most of the proposed computational methods are limited by their reliance on information about protein homology or the interaction marks of the protein partners. In this paper, we report a computational method that uses only the information from protein sequences. The main improvements come from a novel protein sequence representation that combines the continuous and discrete wavelet transforms, and from adopting a weighted sparse representation-based classifier (WSRC). The proposed method was used to predict PPIs on three different datasets: yeast, human and H. pylori. In addition, we used the prediction model trained on the yeast PPIs dataset to predict the PPIs of six datasets from other species. To further evaluate the performance of the prediction model, we compared WSRC with the state-of-the-art support vector machine (SVM) classifier. When predicting PPIs on the yeast, human and H. pylori datasets, we obtained high average prediction accuracies of 97.38%, 98.92% and 93.93%, respectively. In the cross-species experiments, most of the prediction accuracies are over 94%. These promising results show that the proposed method is indeed capable of achieving higher performance in PPI detection.
Keywords: protein sequence; protein-protein interaction; weighted sparse representation
Year: 2018 PMID: 29617272 PMCID: PMC6017726 DOI: 10.3390/molecules23040823
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Five-fold cross-validation results obtained in predicting the yeast PPIs dataset.
| Test Set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 97.32 | 100.00 | 94.70 | 94.77 | 97.69 |
| 2 | 97.63 | 100.00 | 95.24 | 95.37 | 97.67 |
| 3 | 97.05 | 100.00 | 94.08 | 94.26 | 97.05 |
| 4 | 97.76 | 100.00 | 95.63 | 95.63 | 97.55 |
| 5 | 97.14 | 100.00 | 94.13 | 94.43 | 97.41 |
| Average | 97.38 ± 0.31 | 100.00 ± 0.00 | 94.76 ± 0.68 | 94.89 ± 0.59 | 97.48 ± 0.26 |
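The "Average" row can be reproduced from the per-fold values. The following is a minimal sketch, assuming the "±" term is the sample standard deviation (n − 1 denominator) over the five folds; that assumption is not stated in the table, but it appears consistent with the reported accuracy figures. The helper name `summarize` is illustrative only.

```python
import statistics

# Accuracy (%) per fold, copied from the yeast table above.
yeast_accuracy = [97.32, 97.63, 97.05, 97.76, 97.14]


def summarize(values):
    """Return (mean, sample standard deviation) of per-fold scores.

    Assumption: the table's "±" is the sample standard deviation
    (n - 1 denominator); the source does not say which estimator is used.
    """
    return statistics.mean(values), statistics.stdev(values)


mean, std = summarize(yeast_accuracy)
print(f"{mean:.2f} ± {std:.2f}")
```

The same helper can be applied to any other column or dataset by swapping in the corresponding per-fold values.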
Five-fold cross-validation results obtained in predicting the H. pylori PPIs dataset.
| Test Set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 92.28 | 96.72 | 88.04 | 85.72 | 92.72 |
| 2 | 93.83 | 95.94 | 91.23 | 88.38 | 93.74 |
| 3 | 94.68 | 95.99 | 92.93 | 89.91 | 94.20 |
| 4 | 95.20 | 97.70 | 93.40 | 90.81 | 95.36 |
| 5 | 93.66 | 95.70 | 90.41 | 88.01 | 94.99 |
| Average | 93.93 ± 1.11 | 96.41 ± 0.81 | 91.20 ± 2.15 | 88.57 ± 1.95 | 94.20 ± 1.05 |
Figure 1. ROC curves yielded by five-fold cross-validation: (a) ROC of the proposed method on the yeast protein-protein interactions (PPIs) dataset; (b) ROC of the proposed method on the H. pylori PPIs dataset; (c) ROC of the proposed method on the human PPIs dataset; (d) comparison of ROCs between the weighted sparse representation-based classifier (WSRC) method and the support vector machine (SVM) method on the first fold of the human PPIs dataset.
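The AUC column in the tables summarizes ROC curves such as those in Figure 1. As a rough illustration of what that number measures, the sketch below computes AUC with the standard pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = interacting pair).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three of the four positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; for datasets of the size reported here a rank-based or library implementation would be the usual choice.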
Five-fold cross-validation results obtained in predicting the human PPIs dataset.
| Classification Model | Testing Set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|---|
| Proposed Method | 1 | 99.19 | 99.86 | 98.38 | 98.39 | 99.41 |
| | 2 | 98.88 | 100.00 | 97.60 | 97.78 | 99.03 |
| | 3 | 98.70 | 99.87 | 97.47 | 97.43 | 98.78 |
| | 4 | 98.64 | 100.00 | 97.07 | 97.29 | 98.41 |
| | 5 | 99.20 | 100.00 | 98.33 | 98.40 | 99.03 |
| | Average | 98.92 ± 0.27 | 99.95 ± 0.07 | 97.77 ± 0.57 | 97.86 ± 0.52 | 98.93 ± 0.37 |
| Combined Wavelet Feature with SVM | 1 | 91.39 | 96.15 | 84.57 | 83.91 | 94.71 |
| | 2 | 90.77 | 97.17 | 82.53 | 82.83 | 94.77 |
| | 3 | 88.85 | 96.08 | 80.53 | 79.87 | 93.65 |
| | 4 | 88.79 | 94.53 | 80.56 | 79.62 | 91.50 |
| | 5 | 90.85 | 96.75 | 83.85 | 83.14 | 95.33 |
| | Average | 90.13 ± 1.22 | 96.14 ± 1.00 | 82.41 ± 1.85 | 81.87 ± 1.99 | 93.99 ± 1.52 |
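The accuracy, precision, sensitivity and MCC columns follow from the confusion matrix of each test fold. The sketch below uses the standard textbook definitions; the source does not list its formulas or raw counts, so the function name `binary_metrics` and the example counts are hypothetical. Values are returned as fractions, whereas the tables report percentages.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "mcc": mcc,
    }


# Hypothetical balanced fold with no false positives, so precision is 1.0
# while sensitivity and accuracy stay below 1.0.
example = binary_metrics(tp=947, fp=0, tn=1000, fn=53)
for name, value in example.items():
    print(f"{name}: {100 * value:.2f}%")
```

With zero false positives, precision is 100% regardless of how many interacting pairs are missed, which is why a row can show 100.00% precision alongside a lower sensitivity.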
Performance comparison of different methods on the yeast dataset (N/A means not applicable).
| Method | Approach | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) |
|---|---|---|---|---|---|
| Guo's work | ACC | 89.33 ± 2.67 | 88.87 ± 6.16 | 89.93 ± 3.68 | N/A |
| | AC (Auto Covariance) | 87.36 ± 1.38 | 87.82 ± 4.33 | 87.30 ± 4.68 | N/A |
| Zhou's work | SVM + LD | 88.56 ± 0.33 | 89.50 ± 0.60 | 87.37 ± 0.22 | 77.15 ± 0.68 |
| Yang's work | Cod1 | 75.08 ± 1.13 | 74.75 ± 1.23 | 75.81 ± 1.20 | N/A |
| | Cod2 | 80.04 ± 1.06 | 82.17 ± 1.35 | 76.77 ± 0.69 | N/A |
| | Cod3 | 80.41 ± 0.47 | 81.86 ± 0.99 | 78.14 ± 0.90 | N/A |
| | Cod4 | 86.15 ± 1.17 | 90.24 ± 1.34 | 81.03 ± 1.74 | N/A |
| Huang's work | CW + PseAAC | 92.05 ± 0.59 | 95.87 ± 0.89 | 88.82 ± 0.98 | 86.09 ± 1.02 |
| Our work | WSRC + AM | 96.03 ± 0.55 | 100.00 ± 0.00 | 92.07 ± 1.03 | 92.36 ± 1.01 |
| | WSRC + BGR | 96.14 ± 0.43 | 100.00 ± 0.00 | 92.29 ± 0.77 | 92.55 ± 0.80 |
| | WSRC + LBP-HF | 96.60 ± 0.31 | 100.00 ± 0.00 | 93.20 ± 0.69 | 93.42 ± 0.58 |
| | WSRC + LPQ | 96.25 ± 0.17 | 100.00 ± 0.00 | 92.51 ± 0.45 | 92.77 ± 0.33 |
| | WSRC + CW&DW | 97.38 ± 0.31 | 100.00 ± 0.00 | 94.76 ± 0.68 | 94.89 ± 0.59 |
Performance comparison of different methods on the H. pylori dataset (N/A means not applicable).
| Method | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) |
|---|---|---|---|---|
| Phylogenetic Bootstrap | 75.80 | 80.20 | 69.80 | N/A |
| HKNN | 84.00 | 84.00 | 86.00 | N/A |
| Signature Products | 83.40 | 85.70 | 79.90 | N/A |
| Ensemble of HKNN | 86.60 | 85.00 | 86.70 | N/A |
| Boosting | 79.52 | 81.69 | 80.37 | 70.64 |
| Proposed Method | 93.93 | 96.41 | 91.20 | 88.57 |
Prediction results for six species based on our model.
| Species | Test Pairs | Accuracy |
|---|---|---|
| | 21774 | 97.36% |
| | 6897 | 86.56% |
| | 4013 | 96.64% |
| | 1406 | 94.24% |
| | 1420 | 95.07% |
| | 312 | 94.23% |