Shao-Wu Zhang, Li-Yang Hao, Ting-He Zhang.
Abstract
Protein-protein interactions (PPIs) play a key role in many cellular processes. Unfortunately, the experimental methods currently used to identify PPIs are both time-consuming and expensive. These obstacles could be overcome by developing computational approaches to predict PPIs. Here, we report two amino acid feature extraction methods for representing protein sequences: (i) distance frequency with principal component analysis (PCA) for dimension reduction (DFPCA) and (ii) amino acid index distribution (AAID). To obtain the most robust and reliable PPI predictions, we combined a pairwise kernel function with support vector machines (SVM), which removes any dependence on the order in which the feature vectors of the two proteins are concatenated. In the 10-fold cross-validation (10CV) test, the highest prediction accuracies of AAID and DFPCA were 94.00% and 93.96%, respectively, and the pairwise radial basis function (PRBF) kernel considerably outperformed the standard radial basis function (RBF) kernel (roughly 93.7–94.0% versus 89.3–89.9% accuracy with DFPCA features). The resulting PPI prediction tool, termed PPI-PKSVM, is freely available at http://159.226.118.31/PPI/index.html and promises to be useful in areas such as bio-analysis and drug development.
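The abstract does not spell out how distance frequency (DF) features are computed. As a minimal sketch under an assumed definition: suppose each residue has already been replaced by one of three physicochemical group labels ("1", "2", "3"); for every ordered pair of labels and every sequence distance d up to a hypothetical cutoff `max_dist`, the feature is the fraction of residue pairs separated by d that carry that label pair. The function name and the normalization are illustrative assumptions, not the authors' method.

```python
from itertools import product

def distance_frequency(encoded: str, max_dist: int, alphabet: str = "123") -> list[float]:
    """Distance-frequency features for a group-encoded sequence (assumed definition).

    For each distance d in 1..max_dist and each ordered label pair (a, b), the
    feature is the fraction of positions i with encoded[i] == a and
    encoded[i + d] == b. The vector has len(alphabet)**2 * max_dist entries.
    """
    n = len(encoded)
    features: list[float] = []
    for d in range(1, max_dist + 1):
        windows = n - d  # number of (i, i + d) position pairs at this distance
        for a, b in product(alphabet, repeat=2):
            if windows <= 0:
                features.append(0.0)  # sequence too short for this distance
                continue
            hits = sum(1 for i in range(windows)
                       if encoded[i] == a and encoded[i + d] == b)
            features.append(hits / windows)
    return features
```

For example, `distance_frequency("1123213", max_dist=2)` returns 18 values, and for each distance the nine entries sum to 1. Because the vector has 9 × `max_dist` entries under this assumed definition, it grows quickly with the cutoff, which is where a dimension-reduction step such as PCA becomes attractive.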
Year: 2014 PMID: 24566145 PMCID: PMC3958907 DOI: 10.3390/ijms15023220
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Results of DFPCA and AAID with the PRBF SVM in the 10CV test.
| Feature Set | | | Accuracy (%) | MCC |
|---|---|---|---|---|
| Hf | 95.94 ± 1.92 | 91.98 ± 2.88 | 93.78 ± 1.44 | 0.8765 |
| Vf | 95.66 ± 2.75 | 92.52 ± 2.40 | 93.96 ± 1.86 | 0.8798 |
| Pf | 95.78 ± 2.23 | 92.07 ± 1.69 | 93.76 ± 1.93 | 0.8760 |
| Zf | 96.06 ± 1.24 | 91.71 ± 3.13 | 93.69 ± 1.86 | 0.8747 |
| LEWP710101 | 95.86 ± 2.23 | 92.08 ± 4.32 | 93.80 ± 2.42 | 0.8768 |
| QIAN880138 | 96.06 ± 2.83 | 92.27 ± 1.50 | 94.00 ± 1.22 | 0.8808 |
| NADH010104 | 95.82 ± 2.98 | 92.04 ± 2.51 | 93.76 ± 1.66 | 0.8760 |
| NAGK730103 | 96.06 ± 2.83 | 92.09 ± 4.02 | 93.90 ± 3.31 | 0.8789 |
| AURR980116 | 95.94 ± 2.07 | 92.33 ± 1.42 | 93.98 ± 1.24 | 0.8804 |
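Assuming the last column is the Matthews correlation coefficient (MCC) and the accuracy column is the usual percentage of correctly classified pairs, both follow from the confusion-matrix counts (true/false positives and negatives). A minimal sketch of the standard formulas; the counts in the usage line are hypothetical, not taken from the paper.

```python
import math

def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return accuracy (percent) and the Matthews correlation coefficient."""
    total = tp + tn + fp + fn
    acc = 100.0 * (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any row or column sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return acc, mcc
```

With hypothetical counts `accuracy_and_mcc(90, 85, 15, 10)`, accuracy is 87.5% and MCC is about 0.751. Unlike accuracy, MCC stays informative when the positive and negative sets differ in size.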
Results of the RBF and PRBF kernels with DFPCA features in the 10CV test.
| Feature Set | Kernel Function | | | Accuracy (%) |
|---|---|---|---|---|
| Hf | RBF | 89.96 ± 0.52 | 89.65 ± 2.17 | 89.88 ± 1.05 |
| Hf | PRBF | 95.94 ± 1.92 | 91.98 ± 2.88 | 93.78 ± 1.44 |
| Vf | RBF | 90.20 ± 1.31 | 89.33 ± 2.60 | 89.72 ± 1.72 |
| Vf | PRBF | 95.66 ± 2.75 | 92.52 ± 2.40 | 93.96 ± 1.86 |
| Pf | RBF | 89.32 ± 0.86 | 89.26 ± 2.91 | 89.28 ± 1.44 |
| Pf | PRBF | 95.78 ± 2.23 | 92.07 ± 1.69 | 93.76 ± 1.93 |
| Zf | RBF | 90.84 ± 1.85 | 88.79 ± 2.50 | 89.64 ± 1.18 |
| Zf | PRBF | 96.06 ± 1.24 | 91.71 ± 3.13 | 93.69 ± 1.86 |
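The RBF and PRBF kernels above differ in how they treat a protein pair. This section does not give the exact PRBF formula, so the sketch below assumes the common symmetric pairwise construction K((a, b), (c, d)) = k(a, c)·k(b, d) + k(a, d)·k(b, c) with an RBF base kernel k, and contrasts it with a plain RBF applied to concatenated vectors [a; b]. The vectors and `gamma` are made-up illustrative values.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Standard RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def concat_rbf(pair1, pair2, gamma: float) -> float:
    """RBF on concatenated vectors [a; b] vs [c; d]; depends on concatenation order."""
    (a, b), (c, d) = pair1, pair2
    return rbf(list(a) + list(b), list(c) + list(d), gamma)

def pairwise_rbf(pair1, pair2, gamma: float) -> float:
    """Assumed symmetric pairwise RBF: k(a,c)k(b,d) + k(a,d)k(b,c)."""
    (a, b), (c, d) = pair1, pair2
    return rbf(a, c, gamma) * rbf(b, d, gamma) + rbf(a, d, gamma) * rbf(b, c, gamma)

# Hypothetical feature vectors for proteins a, b (pair 1) and c, d (pair 2).
a, b = [0.1, 0.9], [0.7, 0.2]
c, d = [0.3, 0.8], [0.6, 0.1]
gamma = 0.5

# Swapping the order of the proteins in pair 1 changes the concatenated kernel...
print(concat_rbf((a, b), (c, d), gamma) == concat_rbf((b, a), (c, d), gamma))  # False
# ...but leaves the symmetric pairwise kernel unchanged.
print(math.isclose(pairwise_rbf((a, b), (c, d), gamma),
                   pairwise_rbf((b, a), (c, d), gamma)))  # True
```

Such a kernel can be passed to any SVM implementation that accepts a precomputed Gram matrix; the symmetrization is what makes (A, B) and (B, A) indistinguishable to the classifier.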
Results of DF and DFPCA features with the PRBF SVM in the 10CV test.
| Feature Set | Feature Extraction Approach | | | Accuracy (%) | MCC |
|---|---|---|---|---|---|
| Hf | DF | 97.37 ± 2.55 | 66.67 ± 27.8 | 74.34 ± 24.3 | 0.5485 |
| Hf | DFPCA | 95.94 ± 1.92 | 91.98 ± 2.88 | 93.78 ± 1.44 | 0.8765 |
| Vf | DF | 97.21 ± 2.39 | 71.40 ± 23.0 | 78.17 ± 27.1 | 0.6093 |
| Vf | DFPCA | 95.66 ± 2.75 | 92.52 ± 2.40 | 93.96 ± 1.86 | 0.8798 |
| Pf | DF | 97.13 ± 4.70 | 69.48 ± 25.5 | 77.23 ± 27.2 | 0.5937 |
| Pf | DFPCA | 95.78 ± 2.23 | 92.07 ± 1.69 | 93.76 ± 1.93 | 0.8760 |
| Zf | DF | 97.65 ± 4.82 | 62.29 ± 29.5 | 69.26 ± 23.6 | 0.4680 |
| Zf | DFPCA | 96.06 ± 1.24 | 91.71 ± 3.13 | 93.69 ± 1.86 | 0.8747 |
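DFPCA differs from DF by a principal component analysis step. The number of retained components and the PCA implementation are not given here, so the sketch below is a dependency-free PCA under stated assumptions: it centers the feature vectors, builds the sample covariance matrix, and extracts the top `k` eigenvectors by power iteration with deflation (`k`, `iters` and `seed` are hypothetical parameters; at least two samples are required). A real pipeline would more likely call a library SVD routine.

```python
import math
import random

def pca_reduce(rows, k, iters=200, seed=0):
    """Project rows onto their top-k principal components.

    Dependency-free sketch: builds the sample covariance matrix, then finds
    eigenvectors one at a time by power iteration, deflating after each.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Start from a random positive vector, normalized to unit length.
        v = [rng.random() + 0.1 for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction; keep the current v
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return [[sum(x[j] * c[j] for j in range(m)) for c in components] for x in centered]
```

For points lying on the line y = 2x, `pca_reduce([[t, 2 * t] for t in range(5)], k=1)` maps them to one coordinate along that line, which illustrates how PCA can compress redundant, correlated DF entries into fewer features.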
Effect of randomly sampling the noninteracting protein subchain pairs on the performance of PPI-PKSVM (DFPCA features, PRBF SVM) in the 10CV test.
| Sampling Run | | | Accuracy (%) | MCC |
|---|---|---|---|---|
| 1 | 95.38 ± 3.35 | 91.20 ± 3.37 | 93.09 ± 3.45 | 0.8627 |
| 2 | 95.42 ± 1.39 | 91.52 ± 3.24 | 93.29 ± 1.65 | 0.8665 |
| 3 | 95.46 ± 3.03 | 91.21 ± 1.63 | 93.13 ± 2.29 | 0.8635 |
| 4 | 95.46 ± 3.03 | 91.49 ± 1.70 | 93.29 ± 2.13 | 0.8666 |
| 5 | 95.94 ± 1.92 | 91.98 ± 2.88 | 93.78 ± 1.44 | 0.8765 |
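Each sampling run above uses a different random draw of noninteracting pairs. How the draw was made is not stated in this section; the sketch below assumes the negative set is sampled without replacement from a pool of candidate noninteracting pairs and matched in size to the positive set, with a different seed per run. The pair identifiers and pool are hypothetical.

```python
import random

def sample_negatives(candidate_pairs, n_positive, seed):
    """Draw n_positive distinct non-interacting pairs (assumed balanced design)."""
    if n_positive > len(candidate_pairs):
        raise ValueError("not enough candidate non-interacting pairs")
    rng = random.Random(seed)  # per-run seed keeps each draw reproducible
    return rng.sample(candidate_pairs, n_positive)

# Hypothetical pool of 100 candidate negative pairs and 20 positive pairs.
pool = [(f"P{i}", f"Q{j}") for i in range(10) for j in range(10)]
runs = {seed: sample_negatives(pool, 20, seed) for seed in range(1, 6)}
```

Repeating training and evaluation once per entry in `runs` gives a spread of results like the five rows above, showing how sensitive the model is to which negatives happen to be chosen.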
Performance comparison of different PPI prediction methods on Shen’s dataset (a) in the 10CV test.
| Method | | | Accuracy (%) | MCC |
|---|---|---|---|---|
| LEWP710101 | 97.3 ± 0.04 | 99.2 ± 0.04 | 98.3 ± 0.00 | 0.966 ± 0.0006 |
| QIAN880138 | 97.3 ± 0.10 | 99.1 ± 0.10 | 98.3 ± 0.10 | 0.966 ± 0.002 |
| NADH010104 | 97.2 ± 0.07 | 99.2 ± 0.04 | 98.3 ± 0.05 | 0.965 ± 0.0007 |
| NAGK730103 | 97.2 ± 0.06 | 99.2 ± 0.04 | 98.2 ± 0.06 | 0.965 ± 0.0004 |
| AURR980116 | 97.3 ± 0.04 | 99.1 ± 0.06 | 98.2 ± 0.06 | 0.965 ± 0.0006 |
| Hf-DFPCA | 97.6 ± 0.20 | 99.1 ± 0.10 | 98.4 ± 0.10 | 0.967 ± 0.002 |
| Vf-DFPCA | 97.5 ± 0.10 | 98.9 ± 1.00 | 98.3 ± 0.80 | 0.965 ± 0.007 |
| Pf-DFPCA | 96.9 ± 0.10 | 99.5 ± 0.60 | 98.2 ± 0.60 | 0.964 ± 0.004 |
| Zf-DFPCA | 97.9 ± 0.90 | 96.0 ± 0.20 | 96.9 ± 1.10 | 0.939 ± 0.002 |
| LDA-RF | 94.2 ± 0.40 | 98.0 ± 0.30 | 96.4 ± 0.30 | 0.928 ± 0.006 |
| LDA-RoF | 93.7 ± 0.50 | 97.6 ± 0.60 | 95.7 ± 0.40 | 0.918 ± 0.007 |
| LDA-SVM | 89.7 ± 1.30 | 91.5 ± 1.10 | 90.7 ± 0.90 | 0.813 ± 0.018 |
| AC-RF | 94.0 ± 0.60 | 96.6 ± 0.40 | 95.5 ± 0.30 | 0.914 ± 0.007 |
| AC-RoF | 93.3 ± 0.70 | 97.1 ± 0.70 | 95.1 ± 0.60 | 0.910 ± 0.009 |
| AC-SVM | 94.0 ± 0.60 | 84.9 ± 1.70 | 89.3 ± 0.80 | 0.792 ± 0.014 |
| PseAAC-RF | 94.1 ± 0.90 | 96.9 ± 0.30 | 95.6 ± 0.40 | 0.912 ± 0.007 |
| PseAAC-RoF | 93.6 ± 0.90 | 96.7 ± 0.40 | 95.3 ± 0.50 | 0.907 ± 0.009 |
| PseAAC-SVM | 89.9 ± 0.70 | 92.0 ± 0.40 | 91.2 ± 0.4 | 0.821 ± 0.006 |
(a) Shen’s dataset contains two subdatasets, C and D, which are available at http://www.csbio.sjtu.edu.cn/bioinf/LR_PPI/Data.htm.
(b) These results are taken from Table 4 of reference [25].
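The ± values in these tables are presumably the spread of each metric over the ten folds of the 10CV test. The section does not say how folds were drawn or whether a sample or population standard deviation was used, so the sketch below assumes an unstratified random split and the sample standard deviation; `kfold_indices` and `mean_pm_sd` are hypothetical helper names.

```python
import random
import statistics

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def mean_pm_sd(values):
    """Format per-fold scores as 'mean ± sample standard deviation'."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"
```

Each fold serves once as the test set while the other nine are used for training; collecting the ten per-fold accuracies and passing them to `mean_pm_sd` yields entries in the style of the tables, e.g. `mean_pm_sd([93.0, 94.0, 95.0])` gives `"94.00 ± 1.00"`.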
Amino acid groups classified according to physicochemical properties.
| Physicochemical property | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| Hydrophobicity | |||
| van der Waals volume | |||
| Polarity | |||
| Polarizability | | | |
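The group memberships in the table above are not shown. To illustrate how such a table turns a protein sequence into a three-letter alphabet, the sketch below assumes the widely used three-group hydrophobicity split from the composition/transition/distribution descriptor literature (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW); treat these memberships as an assumption, not as the paper's actual groups.

```python
# Assumed three-group hydrophobicity split; not taken from the table above.
HYDROPHOBICITY_GROUPS = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}

def encode(sequence: str, groups: dict[str, str]) -> str:
    """Replace each residue with the label of the group that contains it."""
    lookup = {aa: label for label, members in groups.items() for aa in members}
    try:
        return "".join(lookup[aa] for aa in sequence.upper())
    except KeyError as err:
        raise ValueError(f"residue {err.args[0]!r} not covered by the grouping") from err
```

For example, `encode("MKTAYIAK", HYDROPHOBICITY_GROUPS)` returns `"31222321"`. A grouping for van der Waals volume, polarity, or polarizability would plug into the same function with a different dictionary.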