Li-Ping Li, Yan-Bin Wang, Zhu-Hong You, Yang Li, Ji-Yong An.
Abstract
Protein–protein interactions (PPIs) are central to protein function and to the regulation of processes such as the cell cycle, DNA replication, and cellular signaling. Detecting whether a pair of proteins interacts is therefore of great importance in molecular biology. As researchers have recognized the value of computational methods for predicting PPIs, many such techniques have been developed; however, few of them fully meet the needs of their users. In this paper, we develop a novel and efficient sequence-based method for predicting PPIs. Evolutionary features are extracted from the position-specific scoring matrix (PSSM) of each protein and then fed into a robust relevance vector machine (RVM) classifier, which distinguishes interacting from non-interacting protein pairs. To evaluate the method, we performed five-fold cross-validation on the Saccharomyces cerevisiae dataset and obtained an accuracy of 94.56%, with 94.79% sensitivity at 94.36% precision. These results indicate that the proposed approach can extract the most significant features from each protein sequence and can serve as a useful tool for proteomics research.
Keywords: evolutionary information; low rank; protein sequence; protein–protein interactions (PPI); relevance vector machine (RVM)
Year: 2018 PMID: 29596363 PMCID: PMC5979371 DOI: 10.3390/ijms19041029
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
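The five-fold cross-validation protocol described in the abstract can be sketched as below, assuming each sample is one labelled protein pair (interacting or non-interacting). This is not the authors' code: the helper name `five_fold_splits` and the fixed shuffling seed are hypothetical, and PSSM feature extraction and RVM training are omitted. Each pair serves as test data exactly once, and the per-fold scores are then averaged.

```python
import random
from typing import List, Tuple


def five_fold_splits(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # hypothetical fixed seed, for reproducibility only
    # Spread samples as evenly as possible: the first n % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        test_set = set(test)
        train = [i for i in indices if i not in test_set]  # all remaining pairs train the model
        splits.append((train, test))
        start += size
    return splits


if __name__ == "__main__":
    for fold, (train, test) in enumerate(five_fold_splits(12), start=1):
        print(f"fold {fold}: {len(train)} training pairs, {len(test)} test pairs")
```

In a real run, a classifier would be trained on `train` and scored on `test` for each of the five folds.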
Five-fold cross-validation results of the proposed method (PSSM+LR+RVM), and of the same features with an SVM classifier (PSSM+LR+SVM), on the Yeast dataset.
| Model | Testing Set | Accuracy | Sensitivity | Specificity | PPV | NPV | MCC |
|---|---|---|---|---|---|---|---|
| PSSM+LR+RVM | 1 | 94.7% | 95.4% | 94.0% | 93.9% | 95.46% | 89.3% |
| | 2 | 95.3% | 96.1% | 94.5% | 94.7% | 95.96% | 91.1% |
| | 3 | 93.9% | 93.9% | 93.8% | 93.8% | 93.91% | 88.5% |
| | 4 | 93.8% | 93.6% | 94.1% | 94.4% | 93.22% | 88.4% |
| | 5 | 95.1% | 94.9% | 95.2% | 94.9% | 95.2% | 90.6% |
| | Average | | | | | | |
| PSSM+LR+SVM | 1 | 88.3% | 87.3% | 89.3% | 88.8% | 87.8% | 79.4% |
| | 2 | 89.3% | 89.4% | 89.1% | 89.2% | 89.3% | 80.8% |
| | 3 | 89.8% | 89.2% | 90.3% | 90.7% | 88.8% | 81.6% |
| | 4 | 89.7% | 88.3% | 91.2% | 90.9% | 88.6% | 81.6% |
| | 5 | 90.0% | 88.4% | 91.5% | 90.8% | 89.2% | 81.9% |
| | Average | | | | | | |
SVM: support vector machine; PSSM: position-specific scoring matrix; AB: average blocks; RVM: relevance vector machine; PPV: Positive Predictive Value; NPV: Negative Predictive Value; MCC: Matthews Correlation Coefficient.
Figure 1. Comparison of the receiver operating characteristic (ROC) curves of the relevance vector machine (RVM) classifier and the support vector machine (SVM) classifier on the Yeast dataset.
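As an illustration of how an ROC curve such as those in Figure 1 is built from a classifier's decision scores (for example, RVM posterior probabilities), the sketch below lowers the decision threshold one distinct score at a time and integrates the area under the curve with the trapezoidal rule. The section does not state how the authors produced their curves. The function names `roc_points` and `auc` and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by lowering the decision threshold step by step."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    # Highest scores first; pairs with equal scores share one threshold and one ROC point.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, integrated with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy decision scores (1 = interacting pair, 0 = non-interacting); not data from the paper.
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts, auc(pts))
```

For the toy scores, the curve passes through (0, 0.5) and (0.5, 1.0), giving an AUC of 0.75: three of the four positive/negative pairs are ranked correctly.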
Prediction performance of different methods on the Yeast dataset.
| Work | Method | Acc (%) | Sen (%) | Pre (%) | Mcc (%) |
|---|---|---|---|---|---|
| Guo’s work [ | ACC | 89.3 ± 2.6 | 89.9 ± 3.6 | 88.8 ± 6.1 | N/A |
| | AC | 87.4 ± 1.3 | 87.3 ± 4.6 | 87.8 ± 4.3 | N/A |
| Zhou’s work [ | SVM+LD | 88.6 ± 0.3 | 87.4 ± 0.2 | 89.5 ± 0.6 | 77.2 ± 0.7 |
| Yang’s work [ | Cod1 | 75.1 ± 1.1 | 75.8 ± 1.2 | 74.8 ± 1.2 | N/A |
| | Cod2 | 80.0 ± 1.0 | 76.8 ± 0.6 | 82.2 ± 1.3 | N/A |
| | Cod3 | 80.4 ± 0.4 | 78.1 ± 0.9 | 81.7 ± 0.9 | N/A |
| | Cod4 | 86.2 ± 1.1 | 81.0 ± 1.7 | 90.2 ± 1.3 | N/A |
| You’s work [ | PCA-EELM | 87.0 ± 0.2 | 86.2 ± 0.4 | 87.6 ± 0.3 | 77.4 ± 0.4 |
| Proposed method | LRA+RVM | 94.6 ± 0.6 | 94.8 ± 1.0 | 94.4 ± 0.4 | 89.6 ± 1.2 |
ACC: Auto Cross Covariance; AC: Auto Covariance; LD: Local Description; PCA: Principal Component Analysis; EELM: Ensemble Extreme Learning Machines; N/A: Not Available; Acc: Accuracy; Sen: Sensitivity; Pre: Precision; Mcc: Matthews Correlation Coefficient.
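The proposed method's row in this table is a mean ± standard deviation over the five folds. The sketch below recomputes these summaries from the per-fold RVM values in the five-fold table. The dictionary layout and the `summarize` helper are illustrative assumptions. The per-fold values are rounded, and the section does not say whether a sample or population standard deviation was used, so the spreads printed here may differ slightly from the ± values reported. The mean accuracy reproduces the 94.56% given in the abstract, and the sensitivity and precision means agree with the abstract's figures to within the rounding of the per-fold values.

```python
import statistics

# Per-fold values (%) for PSSM+LR+RVM, copied from the five-fold Yeast table.
rvm_folds = {
    "accuracy": [94.7, 95.3, 93.9, 93.8, 95.1],
    "sensitivity": [95.4, 96.1, 93.9, 93.6, 94.9],
    "precision": [93.9, 94.7, 93.8, 94.4, 94.9],  # the PPV column
    "mcc": [89.3, 91.1, 88.5, 88.4, 90.6],
}


def summarize(values):
    """Return (mean, sample standard deviation) of the per-fold scores."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    for metric, values in rvm_folds.items():
        mean, sd = summarize(values)
        print(f"{metric}: {mean:.2f} ± {sd:.2f}")
```

Replacing `statistics.stdev` with `statistics.pstdev` gives the population standard deviation instead.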
Prediction performance of different methods on the Helicobacter pylori protein–protein interaction (PPI) dataset.
| Methods | Acc (%) | Sen (%) | Pre (%) | Mcc (%) |
|---|---|---|---|---|
| HKNN | 84.0 | 86.0 | 84.0 | N/A |
| Phylogenetic bootstrap | 75.8 | 69.8 | 80.2 | N/A |
| Signature Products | 83.4 | 79.9 | 85.7 | N/A |
| Boosting | 79.5 | 80.4 | 81.7 | N/A |
| Proposed method | | | | |
HKNN: Hyperplane Distance Nearest Neighbor.
Figure 2. Flow chart of the proposed method.