Ji-Yong An, Zhu-Hong You, Fan-Rong Meng, Shu-Juan Xu, Yin Wang.
Abstract
Protein-protein interactions (PPIs) play essential roles in most cellular processes. Knowledge of PPIs is becoming increasingly important, which has prompted the development of technologies capable of discovering PPIs at large scale. Although many high-throughput biological technologies have been proposed to detect PPIs, they have unavoidable shortcomings, including cost, time intensity, and inherently high false positive and false negative rates. For these reasons, in silico methods are attracting much attention due to their good performance in predicting PPIs. In this paper, we propose a novel computational method, RVM-AB, that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) to predict PPIs from protein sequences. The main improvements come from representing protein sequences using the AB feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using an RVM-based classifier. We performed five-fold cross-validation experiments on yeast and Helicobacter pylori datasets and achieved accuracies of 92.98% and 95.58%, respectively, which are significantly better than those of previous works. In addition, we obtained prediction accuracies of 88.31%, 89.46%, 91.08%, 91.55%, and 94.81% on five other independent datasets (C. elegans, M. musculus, H. sapiens, H. pylori, and E. coli) for cross-species prediction. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-AB method outperforms the SVM-based method (92.98% versus 85.49% average accuracy). The promising experimental results show the efficiency and simplicity of the proposed method, which can serve as an automatic decision support tool.
To facilitate future proteomics research, we developed a freely available web server, RVMAB-PPI, written in PHP (Hypertext Preprocessor), for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/ppi_ab/.
Keywords: PSSM; average blocks; protein sequence; relevance vector machine
Year: 2016 PMID: 27213337 PMCID: PMC4881578 DOI: 10.3390/ijms17050757
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
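The abstract names the Average Blocks (AB) representation but does not spell out its details. As a rough illustration only, the sketch below assumes the commonly described variant: the PSSM of a protein (one row per residue, 20 columns) is split into 20 contiguous blocks along the sequence, and the rows of each block are averaged, giving a fixed-length vector of 20 × 20 = 400 values regardless of sequence length. The function name `average_blocks`, the block count, and the toy matrix are hypothetical and not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per amino acid type


def average_blocks(pssm: Matrix, n_blocks: int = 20) -> List[float]:
    """Average the PSSM rows inside each contiguous block and concatenate.

    n_blocks=20 is an assumption, not a value taken from the abstract.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must contain at least one row")
    n_cols = len(pssm[0])
    features: List[float] = []
    for b in range(n_blocks):
        # Block boundaries scale with sequence length; every block keeps >= 1 row.
        start = b * length // n_blocks
        end = max((b + 1) * length // n_blocks, start + 1)
        rows = pssm[start:end]
        for c in range(n_cols):
            features.append(sum(row[c] for row in rows) / len(rows))
    return features


# Toy PSSM (made-up values): 40 residues x 20 columns, row i holds i everywhere.
toy_pssm = [[float(i)] * 20 for i in range(40)]
ab_features = average_blocks(toy_pssm)
print(len(ab_features), ab_features[0])
```

Because each block covers a fixed fraction of the sequence rather than a fixed number of residues, proteins of different lengths map to vectors of the same size, which is what a downstream PCA step and classifier would need.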
Five-fold cross-validation results of the proposed method on the yeast dataset. Ac: Accuracy; Sn: Sensitivity; Pe: Precision; Mcc: Matthews correlation coefficient.
| Testing Set | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| 1 | 93.12 | 92.46 | 93.64 | 87.18 |
| 2 | 94.32 | 94.77 | 94.02 | 89.29 |
| 3 | 92.00 | 92.21 | 91.96 | 85.28 |
| 4 | 92.00 | 91.38 | 92.14 | 85.26 |
| 5 | 93.48 | 93.11 | 93.94 | 87.11 |
| Average | 92.98 ± 0.99 | 92.79 ± 1.2 | 93.14 ± 1.00 | 86.82 ± 1.66 |
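For readers unfamiliar with the four column headings, the sketch below computes Ac, Sn, Pe, and Mcc from confusion-matrix counts using the standard textbook definitions. This excerpt does not show the paper's own formulas, so the code is not guaranteed to reproduce the table values exactly. The function name `binary_metrics` and the example counts are made up for illustration.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics, returned as fractions in [0, 1]."""
    total = tp + fp + tn + fn
    ac = (tp + tn) / total            # accuracy
    sn = tp / (tp + fn)               # sensitivity (recall)
    pe = tp / (tp + fp)               # precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Ac": ac, "Sn": sn, "Pe": pe, "Mcc": mcc}


# Hypothetical counts for one test fold (not data from the paper).
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```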
Five-fold cross-validation results of the proposed method on the H. pylori dataset.
| Testing Set | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| 1 | 95.54 | 97.53 | 93.56 | 91.47 |
| 2 | 95.71 | 94.70 | 96.95 | 91.79 |
| 3 | 97.08 | 97.25 | 96.92 | 94.34 |
| 4 | 94.17 | 94.14 | 93.45 | 88.98 |
| 5 | 95.38 | 94.50 | 96.69 | 91.16 |
| Average | 95.58 ± 1.0 | 95.62 ± 1.6 | 95.51 ± 1.83 | 91.55 ± 1.91 |
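The "Average" rows report a mean and a spread over the five folds. The sketch below recomputes both for the accuracy columns of the yeast and H. pylori tables. It assumes the spread is the sample standard deviation (divisor n − 1); the source does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import List, Tuple

# Per-fold accuracies (%) copied from the two cross-validation tables.
yeast_ac = [93.12, 94.32, 92.00, 92.00, 93.48]
pylori_ac = [95.54, 95.71, 97.08, 94.17, 95.38]


def summarize(values: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the fold results."""
    return mean(values), stdev(values)  # stdev uses the n - 1 divisor


for name, folds in (("yeast", yeast_ac), ("H. pylori", pylori_ac)):
    m, s = summarize(folds)
    print(f"{name}: {m:.2f} ± {s:.2f}")
```

Under that assumption the means come out at about 92.98 and 95.58, and the standard deviations at roughly 1.0 in both cases, in line with the reported 92.98 ± 0.99 and 95.58 ± 1.0 up to rounding.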
Five-fold cross-validation results of the SVM-based and RVM-based classifiers on the yeast dataset, both using PSSM-based AB features. SVM: Support Vector Machine; PSSM: Position Specific Scoring Matrix; AB: Average Blocks; RVM: Relevance Vector Machine.
| Testing Set | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| SVM + PSSM + AB | | | | |
| - | 84.49 | 84.65 | 84.27 | 73.79 |
| - | 87.22 | 88.04 | 86.81 | 77.69 |
| - | 84.09 | 84.41 | 84.11 | 73.23 |
| - | 85.47 | 85.05 | 85.12 | 75.15 |
| - | 86.16 | 85.47 | 86.63 | 76.15 |
| Average | 85.49 ± 1.26 | 85.52 ± 1.46 | 85.39 ± 1.28 | 75.20 ± 1.80 |
| RVM + PSSM + AB | | | | |
| - | 93.12 | 92.46 | 93.77 | 87.18 |
| - | 94.32 | 94.77 | 93.86 | 89.29 |
| - | 92.00 | 92.21 | 91.79 | 85.28 |
| - | 92.00 | 91.38 | 92.59 | 85.26 |
| - | 93.48 | 93.11 | 93.86 | 87.11 |
| Average | 92.98 ± 0.99 | 92.79 ± 1.2 | 93.14 ± 1.00 | 86.82 ± 1.66 |
Figure 1. Comparison of receiver operating characteristic (ROC) curves of the Relevance Vector Machine (RVM) and the support vector machine (SVM) on the yeast dataset.
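The figure itself is not reproduced here. As a reminder of how such a curve is built, the sketch below traces ROC points from classifier scores and integrates the area under the curve with the trapezoid rule. The function names and the toy labels and scores are hypothetical; they are not outputs of the RVM or SVM models in the paper.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, threshold high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Samples with the same score cross the threshold together.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: 1 = interacting pair, 0 = non-interacting pair.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(toy_labels, toy_scores))
print(round(toy_auc, 3))
```

A curve that lies closer to the top-left corner yields a larger area, which is the sense in which one classifier's ROC curve is said to dominate another's.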
Prediction performance on five species based on our model. The species labels of the individual rows were lost in extraction and are left blank. PPV: Positive Predictive Value; NPV: Negative Predictive Value; F1: F-Score.
| Species | Ac (%) | Sn (%) | PPV (%) | NPV (%) | F1 (%) |
|---|---|---|---|---|---|
| | 91.55 | 91.55 | 100 | 0 | 95.59 |
| | 89.46 | 89.46 | 100 | 0 | 94.44 |
| | 91.08 | 91.08 | 100 | 0 | 95.33 |
| | 94.81 | 94.81 | 100 | 0 | 97.34 |
| | 88.31 | 88.31 | 100 | 0 | 93.79 |
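The F1 column can be checked from the PPV and Sn columns, since F1 is the harmonic mean of the two. The sketch below does that arithmetic for the five rows. The helper name `f1_score` is just for illustration. The pattern PPV = 100 with Ac equal to Sn would be consistent with test sets containing only positive (interacting) pairs, but that reading is an inference from the numbers, not something this excerpt states.

```python
from typing import List


def f1_score(ppv: float, sn: float) -> float:
    """Harmonic mean of precision (PPV) and sensitivity, in the same units as the inputs."""
    if ppv + sn == 0:
        return 0.0
    return 2 * ppv * sn / (ppv + sn)


# Sn (%) values from the five rows of the table; PPV is 100% in every row.
sn_values: List[float] = [91.55, 89.46, 91.08, 94.81, 88.31]
f1_values = [round(f1_score(100.0, sn), 2) for sn in sn_values]
print(f1_values)
```

The recomputed values agree with the F1 column of the table to two decimal places.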
Predictive performance of different methods on the yeast dataset. ACC: Auto Covariance; LD: Local Description; PCA: Principal Component Analysis; EELM: Ensemble Extreme Learning Machines; N/A: Not Available.
| Model | Method | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|---|
| Guo's work [ | ACC | 89.33 ± 2.67 | 89.93 ± 3.60 | 88.77 ± 6.16 | N/A |
| | AC | 87.36 ± 1.38 | 87.30 ± 4.68 | 87.82 ± 4.33 | N/A |
| Zhou's work [ | SVM + LD | 88.56 ± 0.33 | 87.37 ± 0.22 | 89.50 ± 0.60 | 77.15 ± 0.68 |
| Yang's work [ | Cod1 | 75.08 ± 1.13 | 75.81 ± 1.20 | 74.75 ± 1.23 | N/A |
| | Cod2 | 80.04 ± 1.06 | 76.77 ± 0.69 | 82.17 ± 1.35 | N/A |
| | Cod3 | 80.41 ± 0.47 | 78.14 ± 0.90 | 81.66 ± 0.99 | N/A |
| | Cod4 | 86.15 ± 1.17 | 81.03 ± 1.74 | 90.24 ± 1.34 | N/A |
| You's work [ | PCA-EELM | 87.00 ± 0.29 | 86.15 ± 0.43 | 87.59 ± 0.32 | 77.36 ± 0.44 |
| Proposed method | RVM | 92.98 ± 0.99 | 92.79 ± 1.2 | 93.14 ± 1.00 | 86.82 ± 1.66 |
Predictive performance of different methods on the H. pylori dataset.
| Model | Ac (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| Nanni [ | 83 | 86 | 85.1 | N/A |
| Nanni [ | 84 | 86 | 84 | N/A |
| Nanni and Lumini [ | 86.6 | 86.7 | 85 | N/A |
| Z-H You [ | 87.5 | 88.95 | 86.15 | 78.13 |
| L Nanni [ | 84 | 84 | 86 | N/A |
| Proposed method | 95.58 | 95.62 | 95.51 | 91.55 |
Five-fold cross-validation accuracy of the proposed method on the yeast dataset, using the original dataset and the dataset processed by PCA.
| Dataset | Testing Set | Ac (%) |
|---|---|---|
| The Original Dataset | 1 | 87.66 |
| | 2 | 88.16 |
| | 3 | 87.17 |
| | 4 | 87.84 |
| | 5 | 85.92 |
| The Dataset Processed by Using PCA | 1 | 93.12 |
| | 2 | 94.32 |
| | 3 | 92.00 |
| | 4 | 92.00 |
| | 5 | 93.48 |
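The table lists per-fold accuracies only. To make the size of the PCA effect explicit, the short sketch below averages the two groups of folds and takes the difference. The averages are derived here from the table rows and are not figures quoted in this excerpt; the variable names are illustrative.

```python
from statistics import mean

# Per-fold accuracies (%) copied from the table.
original_ac = [87.66, 88.16, 87.17, 87.84, 85.92]
pca_ac = [93.12, 94.32, 92.00, 92.00, 93.48]

mean_original = mean(original_ac)
mean_pca = mean(pca_ac)
gain = mean_pca - mean_original  # improvement in percentage points

# Fold-by-fold differences show whether the gain is consistent across folds.
per_fold_gain = [round(p - o, 2) for o, p in zip(original_ac, pca_ac)]

print(f"original: {mean_original:.2f}%, with PCA: {mean_pca:.2f}%, gain: {gain:.2f} points")
print(per_fold_gain)
```

On these numbers the PCA-processed features are ahead in every fold, by roughly five to six percentage points on average.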
Figure 2. The flow chart of the proposed method.