Zheng-Wei Li, Zhu-Hong You, Xing Chen, Li-Ping Li, De-Shuang Huang, Gui-Ying Yan, Ru Nie, Yu-An Huang.
Abstract
Identification of protein-protein interactions (PPIs) is of critical importance for deciphering the underlying mechanisms of almost all biological processes of the cell and provides great insight into the study of human disease. Although much effort has been devoted to identifying PPIs from various organisms, existing high-throughput biological techniques are time-consuming and expensive, and they suffer from high false-positive and false-negative rates. It is therefore urgent to develop in silico methods that predict PPIs efficiently and accurately in this post-genomic era. In this article, we report a novel computational model that combines our newly developed discriminative vector machine (DVM) classifier with an improved Weber local descriptor (IWLD) for the prediction of PPIs. Two components, differential excitation and orientation, are exploited to build evolutionary features for each protein sequence. The main characteristics of the proposed method are the introduction of an effective feature descriptor, IWLD, which captures highly discriminative evolutionary information from the position-specific scoring matrices (PSSMs) of protein data, and the use of the powerful and robust DVM classifier. When applied to the Yeast and H. pylori data sets, the proposed method achieved prediction accuracies of 96.52% and 91.80%, respectively, which are significantly better than those of previous methods. Extensive experiments were then performed on cross-species PPI prediction, and the results were also promising. To further validate the performance of the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the Human data set. The experimental results indicate that our method is highly effective for PPI prediction and can serve as a supplementary tool for future proteomics research.
Keywords: cancer; disease; position-specific scoring matrix; protein-protein interactions; weber local descriptor
Year: 2017 PMID: 28423569 PMCID: PMC5410333 DOI: 10.18632/oncotarget.15564
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Performance of the proposed method using five-fold cross validation on Yeast data set
| Test set | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 95.89 | 94.06 | 97.82 | 91.85 |
| 2 | 96.16 | 94.41 | 97.63 | 92.35 |
| 3 | 96.87 | 95.23 | 98.65 | 93.80 |
| 4 | 96.92 | 95.14 | 98.60 | 93.89 |
| 5 | 96.74 | 95.44 | 97.85 | 93.50 |
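The four columns reported above (Acc, Sen, Pre, MCC) are standard binary-classification measures. As a minimal sketch, assuming the usual confusion-matrix definitions (this excerpt does not spell them out) and reporting MCC on a percentage scale as the tables do, they could be computed as follows. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return Acc, Sen, Pre and MCC as percentages from confusion-matrix counts.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives (hypothetical inputs).
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total          # accuracy
    sen = tp / (tp + fn)             # sensitivity (recall)
    pre = tp / (tp + fp)             # precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"Acc": 100 * acc, "Sen": 100 * sen, "Pre": 100 * pre, "MCC": 100 * mcc}


# Illustrative counts only; they do not come from any data set in the paper.
example = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print({name: round(value, 2) for name, value in example.items()})
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is presumably why it is reported next to the other three measures.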
Performance of the proposed method using five-fold cross validation on H. pylori data set
| Test set | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 92.62 | 93.25 | 92.95 | 85.18 |
| 2 | 91.08 | 89.96 | 91.27 | 82.13 |
| 3 | 92.11 | 93.15 | 91.28 | 84.24 |
| 4 | 90.74 | 91.07 | 90.44 | 81.48 |
| 5 | 92.47 | 93.33 | 91.41 | 84.95 |
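The per-fold accuracies in the Yeast and H. pylori tables can be averaged to check them against the figures quoted in the abstract. The short sketch below does that. The helper name `summarize` is illustrative, and the use of the sample standard deviation is an assumption, since this excerpt does not say how the spread across folds was computed.

```python
from statistics import mean, stdev

# Per-fold accuracies (%) copied from the two five-fold tables above.
yeast_acc = [95.89, 96.16, 96.87, 96.92, 96.74]
hpylori_acc = [92.62, 91.08, 92.11, 90.74, 92.47]


def summarize(values):
    """Return (mean, sample standard deviation), both rounded to 2 decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


for name, values in [("Yeast", yeast_acc), ("H. pylori", hpylori_acc)]:
    avg, spread = summarize(values)
    print(f"{name}: mean Acc = {avg:.2f}%, sample std = {spread:.2f}")
```

The means come out to 96.52% for Yeast and 91.80% for H. pylori, which matches the accuracies stated in the abstract.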
Figure 1. ROC curves of proposed method on Yeast data set
Figure 2. ROC curves of proposed method on H. pylori data set
Five-fold cross validation results performed on Human data set
| Model | Test set | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|---|
| DVM | 1 | 97.18 | 95.61 | 98.40 | 94.37 |
| DVM | 2 | 97.30 | 95.05 | 99.62 | 94.71 |
| DVM | 3 | 96.38 | 94.73 | 97.43 | 92.75 |
| DVM | 4 | 97.73 | 96.29 | 98.95 | 95.48 |
| DVM | 5 | 97.92 | 96.83 | 98.65 | 95.82 |
| SVM | 1 | 89.89 | 90.83 | 88.21 | 79.79 |
| SVM | 2 | 91.54 | 91.79 | 91.57 | 83.08 |
| SVM | 3 | 89.40 | 90.78 | 86.99 | 78.82 |
| SVM | 4 | 90.93 | 92.96 | 88.64 | 81.95 |
| SVM | 5 | 91.24 | 91.68 | 89.66 | 82.44 |
Figure 3. ROC curves of proposed DVM-based method on Human data set
Figure 4. ROC curves of SVM-based method on Human data set
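For readers unfamiliar with how ROC curves such as those in the figures are obtained, the following is a generic sketch and not the authors' plotting code. It sweeps a decision threshold over classifier scores, records the false-positive and true-positive rates, and integrates the curve with the trapezoidal rule. The labels and scores are made-up toy values, and both function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), lowering the threshold one score at a time.

    labels: 1 for interacting pairs, 0 for non-interacting pairs.
    scores: classifier outputs, higher meaning "more likely to interact".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(toy_labels, toy_scores)), 4))
```

An area close to 1 means positives are almost always scored above negatives, while 0.5 corresponds to random ranking.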
Predictive results of proposed method on five other species
| Species | Test pairs | Accuracy (%) |
|---|---|---|
| | 6954 | 76.23 |
| | 4013 | 92.72 |
| | 1406 | 89.40 |
| | 1420 | 86.37 |
| | 312 | 87.69 |
Predictive results of different methods on Yeast data set
| Model | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|
| ACC | 89.33±2.67 | 89.93±3.68 | 88.87±6.16 | N/A |
| AC | 87.36±1.38 | 87.30±4.68 | 87.82±4.33 | N/A |
| Cod1 | 75.08±1.13 | 75.81±1.20 | 74.75±1.23 | N/A |
| Cod2 | 80.04±1.06 | 76.77±0.69 | 82.17±1.35 | N/A |
| Cod3 | 80.41±0.47 | 78.14±0.90 | 81.66±0.99 | N/A |
| Cod4 | 86.15±1.17 | 81.03±1.74 | 90.24±1.34 | N/A |
| EELM | 87.00±0.29 | 86.15±0.43 | 87.59±0.32 | 77.36±0.44 |
| RF+PR-LPQ | 93.92±0.36 | 91.10±0.31 | 96.45±0.45 | 88.56±0.63 |
| DVM | | | | |
Predictive results of different methods on H. pylori data set
| Model | Acc (%) | Sen (%) | Pre (%) | MCC (%) |
|---|---|---|---|---|
| | 83.00 | 86.00 | 85.10 | N/A |
| | 84.00 | 86.00 | 84.00 | N/A |
| | 86.60 | 86.70 | 85.00 | N/A |
| | 87.50 | 88.95 | 86.15 | 78.13 |
| | 83.40 | 79.90 | 85.70 | N/A |
| | 89.47 | 89.18 | 89.63 | 81.00 |
Figure 5. Four filters used in the original WLD
Figure 6. Sobel operators used in the improved WLD (IWLD)
Figure 7. Flow chart of our proposed method for the prediction of PPIs
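The abstract names two descriptor components, differential excitation and orientation, and the figure captions indicate that the improved descriptor uses Sobel operators. The sketch below illustrates the general idea on a small numeric matrix. It is not the authors' formulation: it assumes the differential excitation of the original Weber local descriptor (arctangent of the summed neighbour-to-centre differences divided by the centre value) and assumes the orientation is the angle of the Sobel gradient. The function name, the `eps` guard, and the assumption that the input matrix has been scaled to positive values are all illustrative.

```python
import math

# Standard 3x3 Sobel kernels (horizontal and vertical gradients).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def wld_components(matrix, eps=1e-6):
    """Return (excitation, orientation) maps for the interior cells of `matrix`.

    Assumption: `matrix` is a PSSM-like 2D list already scaled to positive
    values; `eps` only guards against division by a near-zero centre value.
    """
    rows, cols = len(matrix), len(matrix[0])
    excitation, orientation = [], []
    for r in range(1, rows - 1):
        exc_row, ori_row = [], []
        for c in range(1, cols - 1):
            center = matrix[r][c]
            window = [[matrix[r - 1 + i][c - 1 + j] for j in range(3)]
                      for i in range(3)]
            # Sum over the 8 neighbours of (neighbour - centre).
            diff_sum = sum(sum(row) for row in window) - 9 * center
            denom = center if abs(center) > eps else eps
            exc_row.append(math.atan(diff_sum / denom))
            # Sobel responses; their angle is used as the orientation.
            gx = sum(SOBEL_X[i][j] * window[i][j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * window[i][j]
                     for i in range(3) for j in range(3))
            ori_row.append(math.atan2(gy, gx))
        excitation.append(exc_row)
        orientation.append(ori_row)
    return excitation, orientation


# Toy 3x3 input: values increase from top to bottom, so the gradient is vertical.
exc, ori = wld_components([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(exc, ori)
```

In a descriptor of this kind, the two maps are typically quantised and pooled into a histogram to give a fixed-length feature vector per protein. That step is omitted here because this excerpt does not describe it.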