Yan-Bin Wang, Zhu-Hong You, Li-Ping Li, Yu-An Huang, Hai-Cheng Yi.
Abstract
Protein-protein interactions (PPIs) play an important role in most cellular processes. Although a great deal of research has been devoted to detecting PPIs through high-throughput technologies, these methods are expensive and cumbersome. Compared with traditional experimental methods, computational methods have attracted much attention because of their good performance in detecting PPIs. In this work, we propose a novel computational method, named PCVM-LM, that combines the probabilistic classification vector machine (PCVM) model with Legendre moments (LMs) to predict PPIs from amino acid sequences. The improvement mainly comes from using LMs to extract discriminatory information embedded in the position-specific scoring matrix (PSSM) and using the PCVM classifier for prediction. The proposed method was evaluated on the Yeast and Helicobacter pylori datasets with five-fold cross-validation. It achieves average accuracies of 96.37% and 93.48%, respectively, higher than those of other well-known methods. For further evaluation, we also compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the same datasets. The comparison shows that our method outperforms the SVM-based method and the other existing methods. These promising results indicate the reliability and effectiveness of the proposed method, which can serve as a useful decision-support tool for protein research.
Keywords: Legendre moments; position specific scoring matrix; probabilistic classification vector machine; protein-protein interactions
Year: 2017 PMID: 28820478 PMCID: PMC6152086 DOI: 10.3390/molecules22081366
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. The flowchart of the proposed feature extraction method.
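The abstract describes feature extraction as taking Legendre moments of the PSSM. A minimal sketch of that idea follows, in plain Python. The function names, the midpoint mapping of matrix indices onto [-1, 1], the normalisation, and the maximum order are all assumptions made for illustration; the source does not state these details.

```python
def legendre_values(order, x):
    """Return [P_0(x), ..., P_order(x)] using the three-term recurrence
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


def legendre_moments(matrix, max_order):
    """Approximate the Legendre moments of a 2-D matrix for orders 0..max_order.

    Assumption: cell (i, j) is sampled at its midpoint after mapping the row
    and column index ranges onto [-1, 1].
    """
    rows, cols = len(matrix), len(matrix[0])
    xs = [(2 * i + 1) / rows - 1 for i in range(rows)]
    ys = [(2 * j + 1) / cols - 1 for j in range(cols)]
    px = [legendre_values(max_order, x) for x in xs]
    py = [legendre_values(max_order, y) for y in ys]

    moments = [[0.0] * (max_order + 1) for _ in range(max_order + 1)]
    for p in range(max_order + 1):
        for q in range(max_order + 1):
            total = sum(
                px[i][p] * py[j][q] * matrix[i][j]
                for i in range(rows)
                for j in range(cols)
            )
            # (2p+1)(2q+1)/4 times the cell area 4/(rows*cols)
            moments[p][q] = (2 * p + 1) * (2 * q + 1) / (rows * cols) * total
    return moments


# Hypothetical toy input standing in for a PSSM (a real PSSM has 20 columns).
toy_matrix = [[1.0] * 4 for _ in range(3)]
toy_moments = legendre_moments(toy_matrix, max_order=2)
# Flatten the moments into a fixed-length feature vector.
feature_vector = [value for row in toy_moments for value in row]
```

Because the number of moments depends only on `max_order`, matrices of different sizes yield feature vectors of the same length, which is what makes this kind of descriptor convenient as classifier input.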
Five-fold cross-validation results using our proposed method on the Yeast dataset.
| Testing Set | Acc (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| 1 | 96.67 | 97.52 | 95.93 | 93.55 |
| 2 | 96.33 | 96.59 | 95.93 | 92.93 |
| 3 | 96.25 | 96.40 | 96.24 | 92.78 |
| 4 | 96.42 | 95.78 | 96.93 | 93.09 |
| 5 | 96.17 | 96.69 | 95.74 | 92.63 |
| Average | 96.37 ± 0.2 | 96.60 ± 0.6 | 96.15 ± 0.5 | 93.00 ± 0.4 |
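The four columns in these tables are commonly computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions of accuracy, sensitivity, precision, and the Matthews correlation coefficient, all expressed as percentages to match the tables. It assumes the source uses these standard formulas; the counts in the example are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return Acc, Sn, Pe and Mcc (all in %) from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes occur and
    both labels are predicted at least once.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)          # sensitivity (recall)
    pe = tp / (tp + fp)          # precision
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Acc": 100 * acc, "Sn": 100 * sn, "Pe": 100 * pe, "Mcc": 100 * mcc}


# Hypothetical counts for one test fold of 100 positive and 100 negative pairs.
example = binary_metrics(tp=95, tn=97, fp=3, fn=5)
```

Unlike accuracy, Mcc uses all four counts symmetrically, which is why it is reported alongside Acc even on balanced datasets.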
Five-fold cross-validation results using our proposed method on the H. pylori dataset.
| Testing Set | Acc (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| 1 | 93.48 | 94.74 | 92.15 | 87.81 |
| 2 | 93.48 | 89.72 | 96.56 | 87.75 |
| 3 | 93.14 | 89.97 | 96.42 | 87.20 |
| 4 | 93.65 | 89.94 | 97.88 | 88.09 |
| 5 | 93.66 | 92.61 | 94.27 | 88.12 |
| Average | 93.48 ± 0.2 | 94.40 ± 2.2 | 95.46 ± 2.3 | 87.79 ± 0.4 |
Figure 2. ROC curves of the probabilistic classification vector machine (PCVM) model on the Yeast dataset.
Figure 3. ROC curves of the PCVM model on the H. pylori dataset.
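An ROC curve of this kind can be traced by sweeping a decision threshold over the classifier's output scores. The following sketch shows one simple way to do that and to compute the area under the curve with the trapezoidal rule. The labels and scores are hypothetical, and the code assumes that no two samples share a score and that both classes are present.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists obtained by lowering the threshold one sample at a time."""
    positives = sum(labels)
    negatives = len(labels) - positives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the (fpr, tpr) curve."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Hypothetical labels (1 = interacting pair) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(labels, scores)
auc_value = area_under_curve(fpr, tpr)
```

The resulting area equals the fraction of positive-negative pairs in which the positive sample receives the higher score.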
Five-fold cross-validation results using the SVM-based method on the Yeast dataset.
| Testing Set | Acc (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| 1 | 92.83 | 96.20 | 90.23 | 86.66 |
| 2 | 92.67 | 97.10 | 88.91 | 86.38 |
| 3 | 92.25 | 85.60 | 99.05 | 85.60 |
| 4 | 92.25 | 98.15 | 87.65 | 85.62 |
| 5 | 92.34 | 85.45 | 99.23 | 85.73 |
| Average | 92.47 ± 0.3 | 92.50 ± 6.4 | 93.01 ± 5.7 | 86.00 ± 0.5 |
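The "Average" row summarises the five folds as mean ± standard deviation. The sketch below reproduces the Sn entry of the table above from its per-fold values. Using the sample standard deviation (divisor n - 1) is an assumption: it matches the reported figure, but the source does not state which estimator was used.

```python
import statistics

# Sn (%) per fold for the SVM-based method on the Yeast dataset (table above).
sn_per_fold = [96.20, 97.10, 85.60, 98.15, 85.45]


def summarize(values):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return statistics.mean(values), statistics.stdev(values)


mean_sn, std_sn = summarize(sn_per_fold)
print(f"{mean_sn:.2f} ± {std_sn:.1f}")
```

The large spread reflects folds 3 and 5, where sensitivity drops to about 85.5% while precision rises above 99%.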
Five-fold cross-validation results using the SVM-based method on the H. pylori dataset.
| Testing Set | Acc (%) | Sn (%) | Pe (%) | Mcc (%) |
|---|---|---|---|---|
| 1 | 90.74 | 99.65 | 84.27 | 82.99 |
| 2 | 90.22 | 99.65 | 83.38 | 82.14 |
| 3 | 90.74 | 81.94 | 100.00 | 82.98 |
| 4 | 90.39 | 82.47 | 99.22 | 82.48 |
| 5 | 90.41 | 100.00 | 83.53 | 82.42 |
| Average | 90.50 ± 0.2 | 92.74 ± 9.6 | 90.08 ± 8.7 | 82.60 ± 0.4 |
Figure 4. ROC curves of the support vector machine (SVM) on the Yeast dataset.
Prediction results of the proposed method on four other species.
| Species | Test Pairs | Accuracy |
|---|---|---|
| | 4013 | 92.60% |
| | 6954 | 92.80% |
| | 1412 | 80.10% |
| | 313 | 89.14% |
Prediction results of different methods on the Yeast dataset. N/A: not available.
| Model | Testing Set | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|---|
| | ACC | 89.33 ± 2.67 | 89.93 ± 3.68 | 88.87 ± 6.16 | N/A |
| | AC | 87.36 ± 1.38 | 87.30 ± 4.68 | 87.82 ± 4.33 | N/A |
| | Cod1 | 75.08 ± 1.13 | 75.81 ± 1.20 | 74.75 ± 1.23 | N/A |
| | Cod2 | 80.04 ± 1.06 | 76.77 ± 0.69 | 82.17 ± 1.35 | N/A |
| | Cod3 | 80.41 ± 0.47 | 78.14 ± 0.90 | 81.66 ± 0.99 | N/A |
| | Cod4 | 86.15 ± 1.17 | 81.03 ± 1.74 | 90.24 ± 1.34 | N/A |
| | PCA-EELM | 87.00 ± 0.29 | 86.15 ± 0.43 | 87.59 ± 0.32 | 77.36 ± 0.44 |
| | RF-PR-LPQ | 93.92 ± 0.36 | 91.10 ± 0.31 | 96.45 ± 0.45 | 88.56 ± 0.63 |
| | PCVM | 96.37 ± 0.20 | 96.60 ± 0.6 | 96.15 ± 0.5 | 93.00 ± 0.4 |
Prediction results of different methods on the H. pylori dataset. N/A: not available.
| Model | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|
| Nanni | 83.00 | 86.00 | 85.10 | N/A |
| Nanni | 84.00 | 86.00 | 84.00 | N/A |
| Nanni and Lumini | 86.60 | 86.70 | 85.00 | N/A |
| Z-H You | 87.50 | 88.95 | 86.15 | 78.13 |
| L Nanni | 84.00 | 84.00 | 84.00 | N/A |
| Proposed Method | 93.48 | 94.40 | 95.46 | 87.79 |