Lei Wang (1,2), Zhu-Hong You (3), Xing Chen (4), Jian-Qiang Li (5), Xin Yan (6), Wei Zhang (2), Yu-An Huang (5).
Abstract
Protein-protein interactions (PPIs) are not only critical components of various biological processes in cells but also key to understanding the mechanisms that lead to healthy and diseased states in organisms. However, identifying interactions among proteins with biological experiments is time-consuming and costly. Hence, developing more efficient computational methods has become an attractive topic in the post-genomic era. In this paper, we propose a novel method for inferring protein-protein interactions from protein amino acid sequences alone. Specifically, each protein amino acid sequence is first transformed into a Position-Specific Scoring Matrix (PSSM) generated by multiple sequence alignment; then Pseudo PSSM (PsePSSM) is used to extract feature descriptors. Finally, an ensemble Rotation Forest (RF) learning system is trained to predict PPIs based solely on protein sequence features. When applied to three benchmark data sets (Yeast, H. pylori, and an independent data set), the proposed method achieves average accuracies of 98.38%, 89.75%, and 96.25%, respectively. To further evaluate its prediction performance, we also compare the proposed method with other methods on the same benchmark data sets. The experimental results demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Therefore, our method is effective and robust and can serve as a useful tool for exploring and discovering new relationships between proteins. A web server is publicly available at http://202.119.201.126:8888/PsePSSM/ for academic use.
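As a rough illustration of the feature-extraction step, the sketch below computes a PsePSSM-style descriptor from an n x 20 PSSM. The descriptor has 20 column averages plus, for each lag λ, 20 correlation factors θ_j^λ = (1/(n−λ)) Σ_i (p_{i,j} − p_{i+λ,j})². It assumes the PSSM has already been produced (for example by PSI-BLAST) and is passed in as a list of rows. The logistic normalization, the `max_lag` value, and the hypothetical helpers `normalize_pssm`, `pse_pssm`, and `pair_vector` (which concatenates the two proteins' descriptors) are illustrative assumptions rather than the authors' exact settings.

```python
import math

def normalize_pssm(pssm):
    """Squash raw PSSM scores into (0, 1) with a logistic function.

    ASSUMPTION: the logistic squash is one common choice; the exact
    normalization used in the paper is not given in this record.
    """
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in pssm]

def pse_pssm(pssm, max_lag=1):
    """Return a (20 + 20 * max_lag)-length PsePSSM descriptor for an n x 20 PSSM."""
    p = normalize_pssm(pssm)
    n = len(p)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_cols = len(p[0])  # 20 amino-acid columns
    # Part 1: column means summarize the evolutionary profile of the whole sequence.
    features = [sum(row[j] for row in p) / n for j in range(n_cols)]
    # Part 2: lag-based correlation factors add sequence-order information.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            diff_sq = sum((p[i][j] - p[i + lag][j]) ** 2 for i in range(n - lag))
            features.append(diff_sq / (n - lag))
    return features

def pair_vector(pssm_a, pssm_b, max_lag=1):
    """HYPOTHETICAL pair encoding: concatenate the two proteins' descriptors."""
    return pse_pssm(pssm_a, max_lag) + pse_pssm(pssm_b, max_lag)
```

With `max_lag=1`, each protein yields 40 values and a protein pair yields an 80-value vector that a classifier can consume.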
Keywords: cancer; disease; multiple sequences alignments; position-specific scoring matrix
Year: 2017 PMID: 28029645 PMCID: PMC5354898 DOI: 10.18632/oncotarget.14103
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1: Accuracy surface of the rotation forest obtained when optimizing the parameters K and L
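Figure 1 tunes K and L. In the usual Rotation Forest formulation, K is the number of disjoint feature subsets and L is the number of base classifiers. The sketch below assumes that reading and uses NumPy to show only how the L rotation matrices are built. For every feature subset, each matrix runs PCA on a 75% bootstrap sample and keeps all principal axes, so the resulting matrix is orthogonal. Several choices here are simplifications or assumptions rather than the authors' code: the function names, the 75% sampling fraction, and dropping the original algorithm's random class-subset selection. The base learner (a decision tree in standard Rotation Forest) is not shown.

```python
import numpy as np

def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """One rotation matrix: PCA on each of n_subsets (K) random feature groups."""
    n_samples, n_features = X.shape
    if not 1 <= n_subsets <= n_features:
        raise ValueError("n_subsets must be between 1 and the number of features")
    # Randomly partition the feature indices into n_subsets disjoint groups.
    groups = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for cols in groups:
        # Bootstrap sample (with replacement) of sample_frac of the rows.
        size = max(2, int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=size, replace=True)
        sub = X[np.ix_(rows, cols)]
        # Covariance of this feature group; reshape also covers one-feature groups.
        cov = np.cov(sub, rowvar=False).reshape(len(cols), len(cols))
        # eigh returns orthonormal eigenvectors, i.e. the principal axes.
        _, axes = np.linalg.eigh(cov)
        # Place the block so that X @ R projects group `cols` onto its axes.
        R[np.ix_(cols, cols)] = axes
    return R

def rotation_matrices(X, n_subsets, n_classifiers, seed=0):
    """Return n_classifiers (L) independently drawn rotation matrices for X."""
    rng = np.random.default_rng(seed)
    return [rotation_matrix(X, n_subsets, rng) for _ in range(n_classifiers)]
```

Base classifier l would then be trained on `X @ mats[l]`. At prediction time, each test sample is rotated with the same matrix before the L classifiers vote.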
5-fold cross-validation results of the proposed method on the Yeast data set
| Testing set | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) |
|---|---|---|---|---|
| 1 | 98.17 | 100.00 | 96.32 | 96.40 |
| 2 | 98.30 | 100.00 | 96.69 | 96.66 |
| 3 | 98.17 | 100.00 | 96.37 | 96.40 |
| 4 | 98.30 | 99.62 | 96.88 | 96.65 |
| 5 | 98.97 | 100.00 | 97.93 | 97.97 |
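The tables report accuracy, precision, sensitivity, and MCC but do not give formulas. The sketch below assumes the standard confusion-matrix definitions, treating an interacting pair as a positive:

- Accu = (TP+TN)/(TP+FP+TN+FN)
- Prec = TP/(TP+FP)
- Sen = TP/(TP+FN)
- MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN))

The function name `ppi_metrics` is hypothetical.

```python
import math

def ppi_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, returned as percentages.

    ASSUMPTION: an undefined ratio (zero denominator) is reported as 0.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Accu": 100 * accuracy, "Prec": 100 * precision,
            "Sen": 100 * sensitivity, "MCC": 100 * mcc}
```

For example, `ppi_metrics(90, 0, 100, 10)` gives Accu 95.0, Prec 100.0, Sen 90.0, and MCC of about 90.45. A perfect precision combined with slightly lower sensitivity is the same pattern seen in most folds above.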
Figure 2: ROC curves of the proposed method on the Yeast data set
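Figures 2 and 3 show ROC curves. As a generic illustration, not the authors' plotting code, the sketch below sweeps the decision threshold over predicted interaction scores and collects (false-positive rate, true-positive rate) points. It then integrates those points with the trapezoidal rule to obtain the AUC. Tied scores are processed together so the curve does not depend on input order. The function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping the threshold high to low.

    labels are 1 for interacting pairs and 0 for non-interacting pairs.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A classifier that ranks every interacting pair above every non-interacting pair reaches an AUC of 1.0. Random scoring stays near 0.5.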
5-fold cross-validation results of the proposed method on the H. pylori data set
| Testing set | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) |
|---|---|---|---|---|
| 1 | 92.45 | 93.44 | 92.23 | 86.00 |
| 2 | 88.16 | 86.93 | 88.49 | 79.10 |
| 3 | 90.05 | 92.06 | 87.63 | 82.06 |
| 4 | 89.37 | 90.56 | 88.10 | 80.99 |
| 5 | 88.70 | 87.93 | 89.16 | 79.95 |
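The average accuracies quoted in the abstract for Yeast and H. pylori can be checked against the per-fold accuracies in the two 5-fold tables. The sketch below averages them with Python's `statistics` module. This record does not say whether the paper's ± figures use the sample or the population standard deviation, so the sample version here is an assumption.

```python
from statistics import mean, stdev

# Per-fold accuracies (%) copied from the 5-fold cross-validation tables.
yeast_acc = [98.17, 98.30, 98.17, 98.30, 98.97]
hpylori_acc = [92.45, 88.16, 90.05, 89.37, 88.70]

def summarize(values):
    """Return (mean, sample standard deviation), each rounded to two decimals."""
    return round(mean(values), 2), round(stdev(values), 2)

for name, accs in [("Yeast", yeast_acc), ("H. pylori", hpylori_acc)]:
    avg, sd = summarize(accs)
    print(f"{name}: {avg:.2f} ± {sd:.2f}")
```

The means come out to 98.38 for Yeast and 89.75 for H. pylori, matching the figures in the abstract.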
Figure 3: ROC curves of the proposed method on the H. pylori data set
Performance comparison of different models on the Yeast data set
| Model | Method | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) |
|---|---|---|---|---|---|
| Guo's work […] | ACC | 89.33±2.67 | 88.87±6.16 | 89.93±3.68 | N/A |
| | AC | 87.36±1.38 | 87.82±4.33 | 87.30±4.68 | N/A |
| Zhou's work […] | SVM + LD | 88.56±0.33 | 89.50±0.60 | 87.37±0.22 | 77.15±0.68 |
| Yang's work […] | Cod1 | 75.08±1.13 | 74.75±1.23 | 75.81±1.20 | N/A |
| | Cod2 | 80.04±1.06 | 82.17±1.35 | 76.77±0.69 | N/A |
| | Cod3 | 80.41±0.47 | 81.86±0.99 | 78.14±0.90 | N/A |
| | Cod4 | 86.15±1.17 | 90.24±0.45 | 81.03±1.74 | N/A |
| You's work […] | PCA-EELM | 87.00±0.29 | 87.59±0.32 | 86.15±0.43 | 77.36±0.44 |
Performance comparison of different models on the H. pylori data set
| Model | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) |
|---|---|---|---|---|
| Phylogenetic bootstrap […] | 75.80 | 80.20 | 69.80 | N/A |
| HKNN […] | 84.00 | 84.00 | 86.00 | N/A |
| Signature products […] | 83.40 | 85.70 | 79.90 | N/A |
| Ensemble of HKNN […] | 86.60 | 85.00 | 86.70 | N/A |
| Boosting […] | 79.52 | 81.69 | 80.37 | 70.64 |
| Ensemble ELM […] | 87.50 | 86.15 | 88.95 | 78.13 |
| Our method | 89.75 | 90.18 | 89.12 | 81.62 |
Prediction results of our model on four species
| Species | Test pairs | Accu.(%) |
|---|---|---|
| | 4013 | 98.50 |
| | 6954 | 91.00 |
| | 1412 | 97.45 |
| | 313 | 98.08 |
Newly confirmed high-probability PPIs predicted in the Yeast data set
| Protein ID | Protein ID | Predicted interaction probability | Evidence |
|---|---|---|---|
| DIP:1113N | DIP:655N | 0.9917 | DIP |
| sw:P29295 | sw:P20604 | 0.9912 | MINT |
| sw:P47054 | sw:P49687 | 0.9908 | IntAct |
| DIP:1040N | DIP:2463N | 0.9891 | DIP |
| sw:P04050 | sw:P16370 | 0.9869 | MINT |
| DIP:2808N | DIP:6282N | 0.9854 | DIP |
| DIP:1408N | DIP:6416N | 0.9848 | DIP |
| DIP:1558N | DIP:2370N | 0.9846 | DIP |
| DIP:5037N | DIP:799N | 0.9840 | DIP |
| sw:Q12176 | sw:Q03532 | 0.9839 | MINT, IntAct |
| DIP:1364N | DIP:2483N | 0.9836 | DIP |
| DIP:1726N | DIP:834N | 0.9833 | DIP |
| DIP:2417N | DIP:5630N | 0.9831 | DIP |
| sw:P18888 | sw:P32591 | 0.9826 | MINT, IntAct |
| sw:Q04067 | sw:P40217 | 0.9812 | MINT, IntAct |
Figure 4: Schematic diagram of the prediction model