Zheng-Wei Li, Zhu-Hong You, Xing Chen, Jie Gui, Ru Nie.
Abstract
Protein-protein interactions (PPIs) occur at almost all levels of cell function and play crucial roles in various cellular processes. Identifying PPIs is therefore critical for deciphering molecular mechanisms and gaining insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, the pairs identified experimentally cover only a small fraction of the whole PPI network, and these approaches have inherent disadvantages: they are time-consuming and expensive, and they have a high false-positive rate. It is therefore important to develop automatic in silico approaches that predict PPIs efficiently and accurately. In this article, we propose a novel feature extraction method that combines physicochemical and evolutionary information, and we use it to predict PPIs with our newly developed discriminative vector machine (DVM) classifier. The main improvements of the proposed method are an effective feature extraction method that captures discriminative features from evolutionary information and physicochemical characteristics, and a powerful and robust DVM classifier. To the best of our knowledge, this is the first time that the DVM model has been applied in the field of bioinformatics. When applied to the Yeast and Helicobacter pylori (H. pylori) datasets, the proposed method achieves prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs and can serve as a useful supplement to traditional experimental methods in future proteomics research.
Keywords: discriminative vector machine; evolutionary information; physicochemical characteristics; protein interactions; protein sequence
Year: 2016 PMID: 27571061 PMCID: PMC5037676 DOI: 10.3390/ijms17091396
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Fivefold cross-validation results of the proposed method on the Yeast dataset.
| Testing Set | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 93.52 | 93.07 | 98.34 | 88.94 |
| 2 | 94.76 | 92.41 | 96.56 | 87.57 |
| 3 | 93.83 | 93.64 | 95.68 | 87.61 |
| 4 | 94.43 | 93.52 | 96.67 | 90.02 |
| 5 | 95.21 | 92.19 | 95.33 | 91.19 |
| Average | 94.35 ± 0.68 | 92.97 ± 0.65 | 96.52 ± 1.17 | 89.07 ± 1.56 |
Acc: accuracy; Sen: sensitivity; Pre: precision; MCC: Matthews correlation coefficient.
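The four measures reported in these tables have standard definitions in terms of the confusion matrix. The following is a minimal sketch of those textbook formulas, not the authors' evaluation code; the function name `classification_metrics` and the example counts are hypothetical. Values are returned as fractions, whereas the tables report percentages.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute Acc, Sen, Pre and MCC from confusion-matrix counts.

    Assumes at least one actual positive and one predicted positive,
    so that Sen and Pre are defined.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)   # fraction of correct predictions
    sen = tp / (tp + fn)                    # sensitivity (recall)
    pre = tp / (tp + fp)                    # precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sen": sen, "Pre": pre, "MCC": mcc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
print({name: round(100 * value, 2) for name, value in metrics.items()})
```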
Fivefold cross-validation results of the proposed method on the H. pylori dataset.
| Testing Set | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 92.81 | 92.28 | 93.40 | 83.32 |
| 2 | 89.59 | 90.73 | 91.73 | 81.08 |
| 3 | 90.82 | 93.37 | 90.19 | 84.51 |
| 4 | 91.06 | 89.64 | 89.27 | 83.61 |
| 5 | 88.75 | 90.59 | 89.12 | 81.43 |
| Average | 90.61 ± 1.55 | 91.32 ± 1.48 | 90.74 ± 1.81 | 82.79 ± 1.47 |
Acc: accuracy; Sen: sensitivity; Pre: precision; MCC: Matthews correlation coefficient.
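The "Average" row summarizes the five folds as mean ± spread. As a small worked check, the sketch below recomputes the accuracy summary from the five per-fold accuracies listed in the H. pylori table. It assumes the spread is the sample standard deviation (n − 1 denominator), which is consistent with the reported accuracy value; the record itself does not state which estimator was used.

```python
from statistics import mean, stdev

# Per-fold accuracies (%) copied from the H. pylori table.
fold_acc = [92.81, 89.59, 90.82, 91.06, 88.75]

avg = mean(fold_acc)
spread = stdev(fold_acc)  # sample standard deviation (assumption)

print(f"{avg:.2f} ± {spread:.2f}")
```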
Fivefold cross-validation results on the Yeast dataset for our method (DVM) and a support vector machine (SVM).
| Model | Testing Set | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|---|
| SVM | 1 | 85.12 | 84.87 | 86.34 | 75.92 |
| | 2 | 86.16 | 83.91 | 85.36 | 74.99 |
| | 3 | 87.96 | 85.64 | 86.61 | 77.58 |
| | 4 | 85.42 | 85.80 | 88.67 | 75.02 |
| | 5 | 84.21 | 86.70 | 85.33 | 74.76 |
| | Average | 85.77 ± 1.41 | 85.38 ± 1.05 | 86.46 ± 1.36 | 75.65 ± 1.16 |
| DVM | 1 | 93.52 | 93.07 | 98.34 | 88.94 |
| | 2 | 94.76 | 92.41 | 96.56 | 87.57 |
| | 3 | 93.83 | 93.64 | 95.68 | 87.61 |
| | 4 | 94.43 | 93.52 | 96.67 | 90.02 |
| | 5 | 95.21 | 92.19 | 95.33 | 91.19 |
| | Average | 94.35 ± 0.68 | 92.97 ± 0.65 | 96.52 ± 1.17 | 89.07 ± 1.56 |
Average accuracy (Acc), sensitivity (Sen), precision (Pre), and MCC.
Figure 1. Comparison of receiver operating characteristic (ROC) curves between discriminative vector machine (DVM) and support vector machine (SVM) on the Yeast dataset.
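An ROC curve such as the one in Figure 1 is obtained by sweeping a decision threshold over a classifier's scores and plotting the true-positive rate against the false-positive rate. The following is a minimal, library-free sketch of that general procedure with a trapezoidal area under the curve; it is not the authors' plotting code, and the function names, labels, and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels are 1 (interacting) or 0 (non-interacting); both classes
    must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score: lowering the threshold admits samples in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))
```

A curve that lies closer to the top-left corner gives a larger area, which is the usual basis for comparing two classifiers on the same dataset.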
Prediction results of different methods on the Yeast dataset.
| Model | Method | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|---|
| Guo | ACC | 89.33 ± 2.67 | 89.93 ± 3.68 | 88.87 ± 6.16 | N/A |
| | AC | 87.36 ± 1.38 | 87.30 ± 4.68 | 87.82 ± 4.33 | N/A |
| Yang | Cod1 | 75.08 ± 1.13 | 75.81 ± 1.20 | 74.75 ± 1.23 | N/A |
| | Cod2 | 80.04 ± 1.06 | 76.77 ± 0.69 | 82.17 ± 1.35 | N/A |
| | Cod3 | 80.41 ± 0.47 | 78.14 ± 0.90 | 81.66 ± 0.99 | N/A |
| | Cod4 | 86.15 ± 1.17 | 81.03 ± 1.74 | 90.24 ± 1.34 | N/A |
| You | PCA-EELM | 87.00 ± 0.29 | 86.15 ± 0.43 | 87.59 ± 0.32 | 77.36 ± 0.44 |
| Wong | RF + PR-LPQ | 93.92 ± 0.36 | 91.10 ± 0.31 | 96.45 ± 0.45 | 88.56 ± 0.63 |
| Proposed Method | DVM | 94.35 ± 0.67 | 92.97 ± 0.51 | 96.52 ± 0.57 | 89.07 ± 1.30 |
N/A—Not applicable.
Prediction results of different methods on the H. pylori dataset.
| Model | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|
| Nanni | 83.00 | 86.00 | 85.10 | N/A |
| Nanni | 84.00 | 86.00 | 84.00 | N/A |
| Nanni and Lumini | 86.60 | 86.70 | 85.00 | N/A |
| You | 87.50 | 88.95 | 86.15 | 78.13 |
| Martin | 83.40 | 79.90 | 85.70 | N/A |
| Wong | 89.47 | 89.18 | 89.63 | 81.00 |
| Proposed Method | 90.61 | 91.32 | 90.74 | 82.79 |
N/A—Not applicable.
Numerical indices of the four physicochemical characteristics for the 20 amino acids.
| Amino Acid Name | Hydrophobicity | Polarity | Polarizability | van der Waals Volume |
|---|---|---|---|---|
| Alanine | 0.61 | 8.1 | 0.046 | 1.00 |
| Arginine | 0.60 | 10.5 | 0.291 | 6.13 |
| Asparagine | 0.06 | 11.6 | 0.134 | 2.95 |
| Aspartic Acid | 0.46 | 13.0 | 0.105 | 2.78 |
| Cysteine | 1.07 | 5.5 | 0.128 | 2.43 |
| Glutamine | 0.0 | 10.5 | 0.180 | 3.95 |
| Glutamic Acid | 0.47 | 12.3 | 0.151 | 3.78 |
| Glycine | 0.07 | 9.0 | 0.000 | 0.00 |
| Histidine | 0.61 | 10.4 | 0.230 | 4.66 |
| Isoleucine | 2.22 | 5.2 | 0.186 | 4.00 |
| Leucine | 1.53 | 4.9 | 0.186 | 4.00 |
| Lysine | 1.15 | 11.3 | 0.219 | 4.77 |
| Methionine | 1.18 | 5.7 | 0.221 | 4.43 |
| Phenylalanine | 2.02 | 5.2 | 0.290 | 5.89 |
| Proline | 1.95 | 8.0 | 0.131 | 2.72 |
| Serine | 0.05 | 9.2 | 0.062 | 1.60 |
| Threonine | 0.05 | 8.6 | 0.108 | 2.60 |
| Tryptophan | 2.65 | 5.4 | 0.409 | 8.08 |
| Tyrosine | 1.88 | 6.2 | 0.298 | 6.47 |
| Valine | 1.32 | 5.9 | 0.140 | 3.00 |
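To illustrate how such an index table can be used, the sketch below maps a protein sequence to four numeric series, one per physicochemical characteristic, using the values listed above. It covers only the lookup step. How the paper transforms these series and combines them with evolutionary information is not described in this record and is not attempted here. The names `INDEX`, `PROPERTIES`, and `encode` are hypothetical, and the one-letter residue codes are the standard IUPAC ones.

```python
PROPERTIES = ("hydrophobicity", "polarity", "polarizability", "van_der_waals_volume")

# One-letter code -> (hydrophobicity, polarity, polarizability, vdW volume),
# copied from the table above.
INDEX = {
    "A": (0.61, 8.1, 0.046, 1.00),   # Alanine
    "R": (0.60, 10.5, 0.291, 6.13),  # Arginine
    "N": (0.06, 11.6, 0.134, 2.95),  # Asparagine
    "D": (0.46, 13.0, 0.105, 2.78),  # Aspartic acid
    "C": (1.07, 5.5, 0.128, 2.43),   # Cysteine
    "Q": (0.0, 10.5, 0.180, 3.95),   # Glutamine
    "E": (0.47, 12.3, 0.151, 3.78),  # Glutamic acid
    "G": (0.07, 9.0, 0.000, 0.00),   # Glycine
    "H": (0.61, 10.4, 0.230, 4.66),  # Histidine
    "I": (2.22, 5.2, 0.186, 4.00),   # Isoleucine
    "L": (1.53, 4.9, 0.186, 4.00),   # Leucine
    "K": (1.15, 11.3, 0.219, 4.77),  # Lysine
    "M": (1.18, 5.7, 0.221, 4.43),   # Methionine
    "F": (2.02, 5.2, 0.290, 5.89),   # Phenylalanine
    "P": (1.95, 8.0, 0.131, 2.72),   # Proline
    "S": (0.05, 9.2, 0.062, 1.60),   # Serine
    "T": (0.05, 8.6, 0.108, 2.60),   # Threonine
    "W": (2.65, 5.4, 0.409, 8.08),   # Tryptophan
    "Y": (1.88, 6.2, 0.298, 6.47),   # Tyrosine
    "V": (1.32, 5.9, 0.140, 3.00),   # Valine
}


def encode(sequence):
    """Return one numeric series per property for an amino acid sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    rows = [INDEX[residue] for residue in sequence.upper()]
    return {name: [row[k] for row in rows] for k, name in enumerate(PROPERTIES)}


# Hypothetical short peptide, for illustration only.
series = encode("MKV")
print(series["hydrophobicity"])
```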
Figure 2. The flow chart of the proposed method.