Zhu-Hong You1, Jianqiang Li1, Xin Gao2, Zhou He3, Lin Zhu4, Ying-Ke Lei4, Zhiwei Ji4.
Abstract
Proteins and their interactions lie at the heart of most biological processes. Consequently, accurate detection of protein-protein interactions (PPIs) is of fundamental importance for understanding the molecular mechanisms of biological systems. Although advances in high-throughput experimental techniques have made it possible to detect large numbers of PPIs, the data generated by these methods are unreliable and may not cover all possible PPIs. To address this problem, this study develops a novel computational approach for detecting protein interactions. The approach combines a novel matrix-based representation of the protein sequence, which fully considers the sequence order and dipeptide information of the protein primary sequence, with a support vector machine (SVM). On yeast PPI datasets, the proposed method reaches 90.06% prediction accuracy with 94.37% specificity at a sensitivity of 85.74%, indicating that the predictor is a useful tool for predicting PPIs. The results also demonstrate that the approach can be a helpful supplement to interactions that have been detected experimentally.
Year: 2015 PMID: 26000305 PMCID: PMC4426769 DOI: 10.1155/2015/867516
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The schematic diagram for detecting protein-protein interactions by integrating experimental PPI data with an SVM model.
The matrix-based representation for a protein amino acid sequence.
| | | | | ⋯ | |
|---|---|---|---|---|---|
| AA | | | | ⋯ | |
| AR | | | | ⋯ | |
| AN | | | | ⋯ | |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| VV | | | | ⋯ | |
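The row labels above (AA, AR, AN, ..., VV) enumerate the 400 ordered dipeptides over the 20 standard amino acids. The cell values did not survive in this copy, so the Python sketch below only illustrates one plausible reading: a binary dipeptide-by-position indicator matrix. The alphabet order `AMINO_ACIDS`, the function names, and the 0/1 encoding are assumptions made for illustration, not the authors' confirmed definition.

```python
from itertools import product

# Assumed alphabet order, chosen so that the rows run AA, AR, AN, ..., VV.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
ROW_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_position_matrix(sequence):
    """Return a 400 x (len(sequence) - 1) list-of-lists of 0/1 flags.

    Entry [i][j] is 1 when the dipeptide starting at position j of the
    sequence is the i-th dipeptide (hypothetical encoding).
    """
    n_cols = max(len(sequence) - 1, 0)
    matrix = [[0] * n_cols for _ in DIPEPTIDES]
    for j in range(n_cols):
        row = ROW_INDEX.get(sequence[j:j + 2])
        if row is not None:  # skip dipeptides with non-standard residues
            matrix[row][j] = 1
    return matrix


def dipeptide_composition(sequence):
    """Collapse the columns: fraction of positions held by each dipeptide."""
    matrix = dipeptide_position_matrix(sequence)
    n_cols = max(len(sequence) - 1, 1)
    return [sum(row) / n_cols for row in matrix]


m = dipeptide_position_matrix("AARNA")  # toy sequence, not real data
print(DIPEPTIDES[:3], DIPEPTIDES[-1])   # ['AA', 'AR', 'AN'] VV
print(len(m), len(m[0]))                # 400 4
```

Summing each row and normalising, as `dipeptide_composition` does, yields a plain dipeptide-frequency vector that no longer records where in the sequence each dipeptide occurs; keeping the columns is what preserves sequence order in this sketch.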
Comparison of the prediction performance of the proposed method and several state-of-the-art methods on the yeast dataset. Here, N/A means not available.
| Model | Test set | SN (%) | PPV (%) | ACC (%) | MCC (%) |
|---|---|---|---|---|---|
| Proposed method | SVM | | | | |
| Guo's work | ACC | 89.93 ± 3.68 | 88.87 ± 6.16 | 89.33 ± 2.67 | N/A |
| | AC | 87.30 ± 4.68 | 87.82 ± 4.33 | 87.36 ± 1.38 | N/A |
| Zhou's work | SVM + LD | 87.37 ± 0.22 | 89.50 ± 0.60 | 88.56 ± 0.33 | 77.15 ± 0.68 |
| Yang's work | Cod1 | 75.81 ± 1.20 | 74.75 ± 1.23 | 75.08 ± 1.13 | N/A |
| | Cod2 | 76.77 ± 0.69 | 82.17 ± 1.35 | 80.04 ± 1.06 | N/A |
| | Cod3 | 78.14 ± 0.90 | 81.86 ± 0.99 | 80.41 ± 0.47 | N/A |
| | Cod4 | 81.03 ± 1.74 | 90.24 ± 1.34 | 86.15 ± 1.17 | N/A |
Comparison of the prediction performance of the proposed method and the amino acid dipeptide composition (AADC) method on the yeast dataset.
| Methods | Kernel | Mean/std. | Testing | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | ACC | SN | SP | PPV | NPV | F1 | MCC | AUC |
| The proposed method | Sigmoid | Mean | 0.8734 | 0.8379 | 0.9092 | 0.9032 | 0.8474 | 0.8693 | 0.7784 | 0.9385 |
| | | Variance | 0.0073 | 0.0093 | 0.0078 | 0.0087 | 0.0063 | 0.0088 | 0.0111 | 0.0071 |
| | Gaussian | Mean | | | | | | | | |
| | | Variance | 0.0064 | 0.0094 | 0.0095 | 0.0098 | 0.0048 | 0.0076 | 0.0103 | 0.0064 |
| | Polynomial | Mean | 0.8963 | 0.8517 | 0.9408 | 0.9351 | 0.8639 | 0.8915 | 0.8134 | 0.9506 |
| | | Variance | 0.0079 | 0.0072 | 0.0112 | 0.0118 | 0.0050 | 0.0085 | 0.0124 | 0.0061 |
| | Linear | Mean | 0.8642 | 0.8267 | 0.9016 | 0.8938 | 0.8389 | 0.8589 | 0.7646 | 0.9238 |
| | | Variance | 0.0048 | 0.0098 | 0.0114 | 0.0103 | 0.0073 | 0.0052 | 0.0068 | 0.0038 |
| AADC method | Sigmoid | Mean | 0.6776 | 0.6726 | 0.6825 | 0.6792 | 0.6760 | 0.6758 | 0.5630 | 0.7343 |
| | | Variance | 0.0088 | 0.0194 | 0.0098 | 0.0107 | 0.0136 | 0.0133 | 0.0062 | 0.0129 |
| | Gaussian | Mean | 0.8654 | 0.8349 | 0.8959 | 0.8892 | 0.8443 | 0.8612 | 0.7666 | 0.9292 |
| | | Variance | 0.0065 | 0.0104 | 0.0047 | 0.0041 | 0.0119 | 0.0058 | 0.0095 | 0.0087 |
| | Polynomial | Mean | 0.8514 | 0.8196 | 0.8833 | 0.8754 | 0.8305 | 0.8465 | 0.7465 | 0.7540 |
| | | Variance | 0.0063 | 0.0144 | 0.0078 | 0.0072 | 0.0110 | 0.0077 | 0.0090 | 0.3751 |
| | Linear | Mean | 0.8409 | 0.8150 | 0.8668 | 0.8597 | 0.8240 | 0.8367 | 0.7320 | 0.9021 |
| | | Variance | 0.0060 | 0.0050 | 0.0146 | 0.0128 | 0.0070 | 0.0049 | 0.0080 | 0.0030 |
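The column abbreviations in the table above follow the usual confusion-matrix vocabulary. As a minimal sketch, assuming the standard textbook formulas (this page does not state which definitions the source used, in particular for MCC), the Python below computes the metrics from hypothetical counts `tp`, `fp`, `tn`, `fn`. It also checks that the value between NPV and MCC is consistent with F1 = 2·PPV·SN/(PPV + SN), using the sigmoid-kernel means of the proposed method from the table.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; assumes no denominator is zero."""
    sn = tp / (tp + fn)    # sensitivity (recall)
    sp = tn / (tn + fp)    # specificity
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sn / (ppv + sn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "SN": sn, "SP": sp, "PPV": ppv,
            "NPV": npv, "F1": f1, "MCC": mcc}


def f1_from(ppv, sn):
    """Harmonic mean of precision (PPV) and sensitivity (SN)."""
    return 2 * ppv * sn / (ppv + sn)


# Hypothetical counts, only to exercise the function.
example = classification_metrics(tp=40, fp=10, tn=40, fn=10)

# Sigmoid-kernel means reported in the table: PPV = 0.9032, SN = 0.8379.
f1_check = f1_from(0.9032, 0.8379)
print(round(f1_check, 4))  # 0.8693, matching the tabulated value
```

The same check reproduces the polynomial-kernel entry (PPV = 0.9351, SN = 0.8517 gives about 0.8915), which supports reading that column as the F1 score.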
Figure 2. The ROC (receiver operating characteristic) curve illustrating the performance of different activation functions. The curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity).
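To make the axes of such a curve concrete, the following Python sketch sweeps a decision threshold over classifier scores, collects (false positive rate, true positive rate) points, and integrates them with the trapezoid rule to obtain an AUC. The labels and scores are invented toy values and the function names are hypothetical; this is an illustration of the general technique, not the evaluation code behind the figure.

```python
def roc_curve_points(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping the decision threshold.

    labels: 1 for interacting pairs, 0 for non-interacting pairs.
    scores: classifier outputs, higher meaning more likely to interact.
    Assumes both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy data: three positives and three negatives with made-up scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve_points(labels, scores)
auc = auc_trapezoid(fpr, tpr)
print(auc)
```

For this toy input, 8 of the 9 positive-negative pairs are ranked correctly, so the area equals 8/9; an AUC of 1.0 would mean every interacting pair scores above every non-interacting pair.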
Performance comparison of different methods on the H. pylori dataset. Here, N/A means not available.
| Methods | SN (%) | PE (%) | ACC (%) | MCC (%) |
|---|---|---|---|---|
| Phylogenetic bootstrap | 69.8 | 80.2 | 75.8 | N/A |
| HKNN | 86 | 84 | 84 | N/A |
| Signature products | 79.9 | 85.7 | 83.4 | N/A |
| Boosting | 80.37 | 81.69 | 79.52 | 70.64 |
| Proposed method | | | | |