Leon Wong, Zhu-Hong You, Zhong Ming, Jianqiang Li, Xing Chen, Yu-An Huang.
Abstract
Protein-protein interactions (PPIs) play a vital role in most cellular processes. Although considerable effort has gone into detecting protein interactions with high-throughput experiments, these methods are expensive and time-consuming. To address these drawbacks, this study develops a computational method that predicts PPIs from protein sequence information, with high efficiency and accuracy. The improvement comes mainly from combining the Rotation Forest (RF) classifier with the Local Phase Quantization (LPQ) descriptor extracted from the Physicochemical Property Response (PR) Matrix of protein amino acids. On three PPI datasets, Saccharomyces cerevisiae, Homo sapiens, and Helicobacter pylori, the method achieved average accuracies of 93.8%, 97.96%, and 89.47%, respectively, which are higher than those reported in previous studies. The Rotation Forest ensemble classifier was also validated extensively against the state-of-the-art Support Vector Machine (SVM) classifier. These promising results indicate that the proposed method might play a complementary role in future proteomics research.
Keywords: Local Phase Quantization; Physicochemical Property Response Matrix (PR); Rotation Forest; protein-protein interaction
Year: 2015 PMID: 26712745 PMCID: PMC4730268 DOI: 10.3390/ijms17010021
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. (a) Overall prediction accuracy rate with increasing K of feature subsets; (b) Overall prediction accuracy rate with increasing L of decision trees.
The prediction results of the H. pylori dataset using the proposed method.
| Test Set | Sensitivity (%) | Precision (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| 1 | 90.57 | 91.81 | 91.08 | 0.8375 | 0.9158 |
| 2 | 90.48 | 88.96 | 89.54 | 0.8126 | 0.9048 |
| 3 | 87.15 | 90.61 | 89.19 | 0.8070 | 0.8896 |
| 4 | 89.04 | 89.66 | 89.37 | 0.8099 | 0.8823 |
| 5 | 88.65 | 87.11 | 88.16 | 0.7912 | 0.8842 |
| Average | 89.18 ± 1.42 | 89.63 ± 1.77 | 89.47 ± 1.05 | 0.81 ± 0.0167 | 0.90 ± 0.0145 |
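The "Average" row can be checked directly from the per-fold values. The sketch below recomputes the accuracy entry for the H. pylori dataset. It assumes the reported spread is the sample standard deviation (n − 1 denominator); the source does not say which estimator was used, but this choice is consistent with the reported 89.47 ± 1.05.

```python
from statistics import mean, stdev

# Per-fold accuracies (%) copied from the H. pylori table above.
accuracy = [91.08, 89.54, 89.19, 89.37, 88.16]

avg = mean(accuracy)
sd = stdev(accuracy)  # sample standard deviation (assumed estimator)

print(f"{avg:.2f} ± {sd:.2f}")
```

The same two calls can be applied to any other column of the table.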
The prediction results of the S. cerevisiae dataset using the proposed method.
| Test Set | Sensitivity (%) | Precision (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| 1 | 89.22 | 97.16 | 93.34 | 0.8752 | 0.9381 |
| 2 | 91.18 | 95.61 | 93.47 | 0.8779 | 0.9387 |
| 3 | 90.40 | 96.61 | 93.52 | 0.8786 | 0.9368 |
| 4 | 91.07 | 97.06 | 94.32 | 0.8924 | 0.9331 |
| 5 | 91.34 | 96.88 | 94.37 | 0.8933 | 0.9358 |
| Average | 90.64 ± 0.87 | 96.66 ± 0.62 | 93.80 ± 0.50 | 0.88 ± 0.009 | 0.94 ± 0.002 |
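The columns in these tables are standard binary-classification metrics. This excerpt does not reproduce their formulas, so the following is a minimal sketch using the usual textbook definitions in terms of true/false positives and negatives. The function name `ppi_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math

def ppi_metrics(tp, tn, fp, fn):
    """Return sensitivity, precision, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)          # fraction of true interactions recovered
    precision = tp / (tp + fp)            # fraction of predicted interactions that are real
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts for one test fold (illustration only).
result = ppi_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included because it requires the ranked prediction scores rather than a single confusion matrix.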
Figure 2. Receiver Operating Characteristic (ROC) from proposed method result for H. pylori protein-protein interaction (PPI) dataset.
Figure 3. ROC from proposed method result for S. cerevisiae PPI dataset.
The prediction results of the Human dataset using the proposed method compared with SVM.
| Model | Test Set | Sensitivity (%) | Precision (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| Rotation Forest | 1 | 97.68 | 97.93 | 97.91 | 0.9590 | 0.9768 |
| Rotation Forest | 2 | 98.07 | 97.57 | 97.91 | 0.9591 | 0.9793 |
| Rotation Forest | 3 | 96.21 | 99.06 | 97.79 | 0.9566 | 0.9765 |
| Rotation Forest | 4 | 96.98 | 98.40 | 97.85 | 0.9578 | 0.9779 |
| Rotation Forest | 5 | 97.64 | 98.80 | 98.34 | 0.9673 | 0.9853 |
| Rotation Forest | Average | 97.32 ± 0.73 | 98.35 ± 0.61 | 97.96 ± 0.22 | 0.96 ± 0.004 | 0.98 ± 0.004 |
| SVM | 1 | 87.52 | 93.59 | 90.92 | 0.8343 | 0.9055 |
| SVM | 2 | 86.28 | 92.07 | 89.88 | 0.8170 | 0.8959 |
| SVM | 3 | 85.46 | 93.00 | 90.01 | 0.8185 | 0.8985 |
| SVM | 4 | 85.62 | 93.05 | 90.44 | 0.8244 | 0.9047 |
| SVM | 5 | 84.93 | 93.27 | 89.82 | 0.8156 | 0.8935 |
| SVM | Average | 85.96 ± 0.99 | 93.00 ± 0.57 | 90.21 ± 0.46 | 0.82 ± 0.008 | 0.90 ± 0.005 |
Figure 4. ROC from proposed method result for Human PPI dataset.
Figure 5. ROC from SVM-based method result for Human PPI dataset.
Comparison of other methods on the S. cerevisiae dataset.
| Model | Method | Sensitivity (%) | Precision (%) | Accuracy (%) | MCC (%) |
|---|---|---|---|---|---|
| Zhou’s work | SVM + LD | 87.37 ± 0.22 | 89.50 ± 0.60 | 88.56 ± 0.33 | 77.15 ± 0.68 |
| Guo’s work | ACC | 89.93 ± 3.68 | 88.87 ± 6.16 | 89.33 ± 2.67 | N/A |
| Guo’s work | AC | 87.30 ± 0.22 | 87.82 ± 4.33 | 87.36 ± 1.38 | N/A |
| Yang’s work | Cod1 | 75.81 ± 1.20 | 74.75 ± 1.23 | 75.08 ± 1.13 | N/A |
| Yang’s work | Cod2 | 76.77 ± 0.69 | 82.17 ± 1.35 | 80.04 ± 1.06 | N/A |
| Yang’s work | Cod3 | 78.14 ± 0.90 | 81.86 ± 0.99 | 80.41 ± 0.47 | N/A |
| Yang’s work | Cod4 | 81.03 ± 1.74 | 90.24 ± 1.34 | 86.15 ± 1.17 | N/A |
| Proposed Method | Average | 90.64 ± 0.87 | 96.66 ± 0.62 | 93.80 ± 0.50 | 88.35 ± 0.87 |
N/A means none available.
Comparison of other methods on the H. pylori dataset.
| Model | Sensitivity (%) | Precision (%) | Accuracy (%) | MCC (%) |
|---|---|---|---|---|
| Phylogenetic bootstrap | 69.80 | 80.20 | 75.80 | N/A |
| Boosting | 80.37 | 81.69 | 79.52 | 70.64 |
| Signature products | 79.90 | 85.70 | 83.40 | N/A |
| HKNN | 86.00 | 84.00 | 84.00 | N/A |
| Proposed Method | 89.18 | 89.63 | 89.47 | 81.16 |
N/A means none available.
The values of the hydrophobicity property for each amino acid.
| Amino Acids | A | R | N | D | C | Q | E | G | H | I |
|---|---|---|---|---|---|---|---|---|---|---|
| Values | 0.61 | 0.60 | 0.06 | 0.46 | 1.07 | 0 | 0.47 | 0.07 | 0.61 | 2.22 |
| Amino Acids | L | K | M | F | P | S | T | W | Y | V |
| Values | 1.53 | 1.15 | 1.18 | 2.02 | 1.95 | 0.05 | 0.05 | 2.65 | 1.88 | 1.32 |
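A table like this is typically used as a lookup that turns a protein sequence into a numeric signal. The sketch below shows only that lookup step, not the paper's PR matrix or LPQ computation. It assumes the second row of values follows the conventional residue order L, K, M, F, P, S, T, W, Y, V, which the table does not state explicitly. The names `HYDROPHOBICITY` and `encode_sequence` and the example sequence are hypothetical.

```python
# Residue order: first ten as labeled in the table; the last ten are an
# assumption (conventional order L, K, M, F, P, S, T, W, Y, V).
RESIDUES = "ARNDCQEGHILKMFPSTWYV"
VALUES = [
    0.61, 0.60, 0.06, 0.46, 1.07, 0.0, 0.47, 0.07, 0.61, 2.22,
    1.53, 1.15, 1.18, 2.02, 1.95, 0.05, 0.05, 2.65, 1.88, 1.32,
]
HYDROPHOBICITY = dict(zip(RESIDUES, VALUES))

def encode_sequence(sequence):
    """Map each residue of a protein sequence to its hydrophobicity value."""
    encoded = []
    for residue in sequence.upper():
        if residue not in HYDROPHOBICITY:
            raise ValueError(f"unsupported residue: {residue!r}")
        encoded.append(HYDROPHOBICITY[residue])
    return encoded

# Hypothetical short peptide, for illustration only.
signal = encode_sequence("ACDW")
print(signal)
```

Rejecting unknown symbols with an error, rather than silently skipping them, keeps the encoded vector the same length as the input sequence.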