Ji-Yong An1, Zhu-Hong You2, Xing Chen3, De-Shuang Huang4, Zheng-Wei Li1, Gang Liu5, Yin Wang1.
Abstract
Self-interacting proteins (SIPs) play an essential role in a wide range of biological processes, such as gene expression regulation, signal transduction, enzyme activation and immune response. Because experimental identification of self-interacting proteins has limitations, developing an effective sequence-based computational method to detect SIPs is important. In this study, we propose a novel computational approach called RVMBIGP that combines the Relevance Vector Machine (RVM) model and bi-gram probability (BIGP) to predict SIPs from protein sequences. The proposed prediction model consists of the following steps: (1) an effective feature extraction method named BIGP is used to represent protein sequences on a Position Specific Scoring Matrix (PSSM); (2) Principal Component Analysis (PCA) is employed to integrate the useful information and reduce the influence of noise; (3) the robust classifier Relevance Vector Machine (RVM) is used to carry out classification. On the yeast and human datasets, the proposed RVMBIGP model achieves average accuracies of 95.48% and 98.80%, respectively. The experimental results show that our proposed method is very promising and may provide a cost-effective alternative for SIP identification. In addition, to facilitate future proteomics research, the RVMBIGP server is freely available for academic use at http://219.219.62.123:8888/RVMBIGP.
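Step (1) can be illustrated with a minimal sketch. It assumes the commonly used bi-gram definition on a PSSM, B[m][n] = Σ_i P[i][m] · P[i+1][n], which yields a 20 × 20 = 400-dimensional feature vector for a standard PSSM; the abstract does not spell out the formula, so this definition, the function name `bigram_features` and the toy input are assumptions for illustration. The PCA and RVM steps are not shown.

```python
from typing import List


def bigram_features(pssm: List[List[float]]) -> List[float]:
    """Flatten the bi-gram matrix B[m][n] = sum_i P[i][m] * P[i+1][n].

    `pssm` has one row per residue and one column per amino-acid type
    (20 columns for a real PSSM). Assumed definition, not taken from the paper.
    """
    if not pssm:
        return []
    width = len(pssm[0])
    bigram = [[0.0] * width for _ in range(width)]
    # Walk over consecutive residue pairs (i, i + 1).
    for row, next_row in zip(pssm, pssm[1:]):
        for m, p_m in enumerate(row):
            for n, p_n in enumerate(next_row):
                bigram[m][n] += p_m * p_n
    # Row-major flattening gives a fixed-length vector regardless of sequence length.
    return [value for line in bigram for value in line]


# Hypothetical toy "PSSM": 3 residues, 2 columns instead of 20.
toy = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
features = bigram_features(toy)
print(features)  # [0.5, 1.0, 0.0, 0.5]
```

The point of the representation is that sequences of any length map to a vector of the same size, which is what allows PCA and a classifier to be applied afterwards.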
Keywords: cancer; disease; position-specific scoring matrix; protein self-interaction
Year: 2016 PMID: 27732957 PMCID: PMC5347703 DOI: 10.18632/oncotarget.12517
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Prediction performance of the proposed method on the yeast dataset over five test sets
| Testing set | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| 1 | 94.79 | 69.39 | 79.31 | 70.37 |
| 2 | 95.66 | 74.17 | 86.41 | 78.53 |
| 3 | 95.37 | 68.00 | 91.40 | 77.13 |
| 4 | 95.75 | 72.73 | 88.89 | 78.86 |
| 5 | 95.85 | 80.00 | 84.75 | 80.81 |
Prediction performance of the proposed method on the human dataset over five test sets
| Testing set | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| 1 | 98.90 | 89.86 | 95.12 | 91.94 |
| 2 | 98.93 | 92.77 | 93.97 | 92.86 |
| 3 | 98.83 | 91.80 | 94.12 | 92.40 |
| 4 | 98.45 | 87.92 | 94.72 | 90.54 |
| 5 | 98.90 | 89.87 | 96.38 | 92.54 |
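The column abbreviations are not defined in this record. The sketch below assumes they denote accuracy (Ac), sensitivity (Sn), precision (Pe) and Matthews correlation coefficient (Mcc) with their standard confusion-matrix definitions. The counts passed in are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return Ac, Sn, Pe and Mcc as percentages from confusion-matrix counts.

    Standard textbook definitions; assumes tp + fn > 0 and tp + fp > 0.
    """
    total = tp + tn + fp + fn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Ac": 100.0 * (tp + tn) / total,
        "Sn": 100.0 * tp / (tp + fn),
        "Pe": 100.0 * tp / (tp + fp),
        # Mcc is reported as 0 when any marginal count is zero.
        "Mcc": 100.0 * (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical, imbalanced test fold: 13 positives and 87 negatives.
metrics = classification_metrics(tp=8, tn=85, fp=2, fn=5)
print({name: round(value, 2) for name, value in metrics.items()})
```

With few positives, accuracy can stay high while sensitivity is low, which is the pattern visible when Ac and Sn are compared across the tables.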
Comparison of the prediction performance of the RVM and SVM classifiers based on BIGP on the yeast dataset
| Testing set | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| RVM+PSSM+BIGP | | | | |
| 1 | 94.79 | 69.39 | 79.31 | 70.37 |
| 2 | 95.66 | 74.17 | 86.41 | 78.53 |
| 3 | 95.37 | 68.00 | 91.40 | 77.13 |
| 4 | 95.75 | 72.73 | 88.89 | 78.86 |
| 5 | 95.85 | 80.00 | 84.75 | 80.81 |
| Average | 95.48 ± 0.42 | 72.86 ± 4.70 | 85.07 ± 6.73 | 77.14 ± 4.01 |
| SVM+PSSM+BIGP | | | | |
| 1 | 92.86 | 29.59 | 85.29 | 49.87 |
| 2 | 90.93 | 22.50 | 96.43 | 44.52 |
| 3 | 89.77 | 20.00 | 80.65 | 38.99 |
| 4 | 91.31 | 25.62 | 100.0 | 48.30 |
| 5 | 91.89 | 35.20 | 93.62 | 55.25 |
| Average | 91.35 ± 1.14 | 26.58 ± 6.00 | 91.20 ± 8.01 | 47.21 ± 5.99 |
Comparison of the prediction performance of the RVM and SVM classifiers based on BIGP on the human dataset
| Testing set | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| RVM+PSSM+BIGP | | | | |
| 1 | 98.90 | 89.86 | 95.12 | 91.94 |
| 2 | 98.93 | 92.77 | 93.97 | 92.86 |
| 3 | 98.83 | 91.80 | 94.12 | 92.40 |
| 4 | 98.45 | 87.92 | 94.72 | 90.54 |
| 5 | 98.90 | 89.87 | 96.38 | 92.54 |
| Average | 98.80 ± 0.20 | 90.44 ± 1.89 | 94.86 ± 0.97 | 92.06 ± 0.91 |
| SVM+PSSM+BIGP | | | | |
| 1 | 95.44 | 39.63 | 98.85 | 61.14 |
| 2 | 95.58 | 46.81 | 97.35 | 66.02 |
| 3 | 95.41 | 48.36 | 94.40 | 66.11 |
| 4 | 94.85 | 44.53 | 98.33 | 64.43 |
| 5 | 95.51 | 50.21 | 90.84 | 66.23 |
| Average | 95.35 ± 0.30 | 45.91 ± 4.08 | 95.95 ± 3.33 | 64.79 ± 2.17 |
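The "Average" rows can be checked with a short sketch. It assumes each entry is the mean of the five test-set values followed by the sample standard deviation (n − 1 denominator); the record does not state which deviation was used, but this choice reproduces the reported 98.80 ± 0.20 for RVM accuracy on the human dataset.

```python
from statistics import mean, stdev

# Ac (%) of RVM+PSSM+BIGP on the five human test sets, copied from the table.
human_rvm_ac = [98.90, 98.93, 98.83, 98.45, 98.90]

average = mean(human_rvm_ac)
spread = stdev(human_rvm_ac)  # sample standard deviation (assumed convention)

summary = f"{average:.2f} ± {spread:.2f}"
print(summary)  # 98.80 ± 0.20
```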
Figure 1. Flowchart of the proposed feature extraction method based on the PSI-BLAST-constructed position-specific scoring matrix
Figure 2. Performance comparisons between RVM and SVM on the yeast dataset
Performance comparison of RVMBIGP and other methods on the yeast dataset
| Model | Ac (%) | Sp (%) | Sn (%) | Mcc (%) |
|---|---|---|---|---|
| SLIPPER | 71.90 | 72.18 | 69.72 | 28.42 |
| DXECPPI | 87.46 | 94.93 | 29.44 | 28.25 |
| PPIevo | 66.28 | 87.46 | 60.14 | 18.01 |
| LocFuse | 66.66 | 68.10 | 55.49 | 15.77 |
| CRS | 72.69 | 74.37 | 59.58 | 23.68 |
| SPAR | 76.96 | 80.02 | 53.24 | 24.84 |
Performance comparison of RVMBIGP and other methods on the human dataset
| Model | Ac (%) | Sp (%) | Sn (%) | Mcc (%) |
|---|---|---|---|---|
| SLIPPER | 91.10 | 95.06 | 47.26 | 41.97 |
| DXECPPI | 30.90 | 25.83 | 87.08 | 8.25 |
| PPIevo | 78.04 | 25.82 | 87.83 | 20.82 |
| LocFuse | 80.66 | 80.50 | 50.83 | 20.26 |
| CRS | 91.54 | 96.72 | 34.17 | 36.33 |
| SPAR | 92.09 | 97.40 | 33.33 | 38.36 |
Figure 3. Performance comparisons between RVM and SVM on the human dataset