Da Xu, Hanxiao Xu, Yusen Zhang, Wei Chen, Rui Gao.
Abstract
Identifying protein-protein interactions (PPIs) is essential for understanding protein functions and cellular biological activities. However, traditional experiment-based methods are time-consuming and laborious, so reliable computational approaches have great practical significance for identifying PPIs. In this paper, we propose PPI-GE, a novel method that predicts PPIs using graph energy. For feature extraction, we designed two new methods: the physicochemical graph energy, based on the ionization equilibrium constant and isoelectric point, and the contact graph energy, based on the contact information of amino acids. The dipeptide composition method was used to capture the order information of amino acids. After multi-information fusion, principal component analysis (PCA) was applied to eliminate noise, and a robust weighted sparse representation-based classification (WSRC) classifier was used for sample classification. Under five-fold cross-validation, the prediction accuracies on the human, Helicobacter pylori (H. pylori), and yeast data sets were 99.49%, 97.15%, and 99.56%, respectively. In addition, on five independent data sets and two significant PPI networks, comparative experiments show that PPI-GE outperformed the compared methods.
Keywords: WSRC classifier; contact information; graph energy; physicochemical properties; protein-protein interaction
Year: 2020 PMID: 32316294 PMCID: PMC7221971 DOI: 10.3390/molecules25081841
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Performance comparison of different classifiers. WSRC: weighted sparse representation-based classification; KNN: K-nearest neighbors; SVM: support vector machine.
Five-fold cross-validation results on the human data set.
| Testing Set | ACC (%) | SEN (%) | MCC (%) | Pre (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 99.33 | 98.76 | 98.66 | 99.87 | 99.99 |
| 2 | 99.63 | 99.34 | 99.26 | 99.87 | 100 |
| 3 | 99.57 | 99.22 | 99.14 | 99.87 | 100 |
| 4 | 99.39 | 99.24 | 98.77 | 99.49 | 99.98 |
| 5 | 99.51 | 99.49 | 99.02 | 99.49 | 99.99 |
| Average | 99.49 | 99.21 | 98.97 | 99.72 | 99.99 |
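The table reports one score per test fold plus their mean. As a minimal sketch of how such a partition and the Average row can be produced, the snippet below deals shuffled indices into five folds and averages the ACC column. The function names and the seed are hypothetical, and plain (non-stratified) shuffling is an assumption: the record does not say how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k near-equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at position i.
    return [indices[i::k] for i in range(k)]


def mean_score(fold_scores):
    """Average per-fold scores, rounded to two decimals as in the table."""
    return round(sum(fold_scores) / len(fold_scores), 2)


# 8161 is the number of protein pairs in the human data set.
folds = k_fold_indices(8161, k=5)
human_acc = [99.33, 99.63, 99.57, 99.39, 99.51]  # ACC column of the table above
print([len(fold) for fold in folds], mean_score(human_acc))
```

Each fold serves once as the testing set while the remaining four are used for training, so every pair is tested exactly once.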
Five-fold cross-validation results on the H. pylori data set.
| Testing Set | ACC (%) | SEN (%) | MCC (%) | Pre (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 95.55 | 98.21 | 91.24 | 92.88 | 99.27 |
| 2 | 97.94 | 98.26 | 95.89 | 97.59 | 99.34 |
| 3 | 98.11 | 99.65 | 96.27 | 96.64 | 98.94 |
| 4 | 97.94 | 97.69 | 95.88 | 98.34 | 99.41 |
| 5 | 96.23 | 97.32 | 92.46 | 95.41 | 99.01 |
| Average | 97.15 | 98.23 | 94.35 | 96.17 | 99.19 |
Five-fold cross-validation results on the yeast data set.
| Testing Set | ACC (%) | SEN (%) | MCC (%) | Pre (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 99.60 | 99.18 | 99.20 | 100 | 100 |
| 2 | 99.46 | 98.95 | 98.93 | 100 | 100 |
| 3 | 99.55 | 99.20 | 99.11 | 99.91 | 100 |
| 4 | 99.51 | 99.00 | 99.02 | 100 | 100 |
| 5 | 99.69 | 99.38 | 99.38 | 100 | 100 |
| Average | 99.56 | 99.14 | 99.13 | 99.98 | 100 |
Note: ACC: accuracy; SEN: sensitivity; MCC: Matthews correlation coefficient; Pre: precision; AUC: area under the curve.
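For readers who want to recompute these scores, the following sketch derives ACC, SEN, Pre, and MCC from confusion-matrix counts using their standard textbook definitions. The function name and the example counts are hypothetical and are not taken from the paper; AUC is omitted because it needs ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return ACC, SEN, Pre and MCC as fractions (multiply by 100 for %)."""

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": safe_div(tp + tn, tp + tn + fp + fn),  # overall accuracy
        "SEN": safe_div(tp, tp + fn),                 # sensitivity (recall)
        "Pre": safe_div(tp, tp + fp),                 # precision
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),  # Matthews correlation
    }


# Hypothetical counts, for illustration only.
scores = classification_metrics(tp=90, tn=85, fp=15, fn=10)
print({name: round(100 * value, 2) for name, value in scores.items()})
```

MCC uses all four cells of the confusion matrix, which is why it can sit noticeably below ACC when errors are unevenly split between classes, as in fold 1 of the H. pylori table.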
Figure 2. Comparison results of different methods on the human data set.
Figure 3. Comparison results of different methods on the H. pylori data set.
Figure 4. Comparison results of different methods on the yeast data set.
Figure 5. The prediction results of the crossover network (Wnt-related network).
Comparison of different methods on the Wnt-related network, using the yeast data set as the training data set.
| Wnt-Related Network | Proportion | Accuracy (%) |
|---|---|---|
| Proposed method | 92/96 | 95.83 |
| Ding’s work | 89/96 | 92.71 |
| Shen’s work | 73/96 | 76.04 |
| Zhou’s work | 87/96 | 90.63 |
| Chen’s work | 89/96 | 92.71 |
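The Accuracy column is the Proportion column expressed as a percentage. The sketch below reproduces that conversion; the helper name is hypothetical, and half-up rounding is an assumption inferred from the 87/96 row, where the exact value 90.625 appears as 90.63.

```python
from decimal import Decimal, ROUND_HALF_UP


def proportion_to_percent(correct, total):
    """Convert correct/total to a percentage rounded half-up to 2 decimals."""
    value = Decimal(correct) * 100 / Decimal(total)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for correct in (92, 89, 73, 87):
    print(f"{correct}/96 -> {proportion_to_percent(correct, 96)}")
```

`Decimal` is used instead of the built-in `round` because `round(90.625, 2)` rounds the exact tie to the even neighbour and yields 90.62, which would not match the table.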
Figure 6. The prediction results of the multi-core network.
Comparison of the accuracy (%) of different methods on the independent data sets, using the yeast data set as the training data set.
| Data Set | Testing Pairs | Proposed Method | Huang’s Work | Du’s Work | Ding’s Work |
|---|---|---|---|---|---|
|  | 1420 | 93.80 | 85.77 | 93.66 | 92.03 |
|  | 1412 | 99.93 | 88.81 | 93.77 | 94.58 |
|  | 4013 | 86.24 | 72.79 | 94.84 | 90.28 |
|  | 313 | 94.57 | 83.39 | 91.37 | 92.25 |
|  | 21975 | 99.87 | 89.35 | N/A | N/A |
Note: N/A means not available.
Figure 7. The flowchart of the proposed method for predicting protein-protein interactions (PPIs).
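One step of the pipeline named in the abstract, the dipeptide composition, can be illustrated as follows. This is a minimal sketch of the conventional 400-dimensional dipeptide frequency vector, not the authors' code: the function name, the alphabet ordering, the choice to skip pairs containing non-standard residues, and the toy sequence are all assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    valid_pairs = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard letters such as X
            counts[pair] += 1
            valid_pairs += 1
    if valid_pairs == 0:
        return [0.0] * len(DIPEPTIDES)
    # Normalise so that the frequencies sum to 1.
    return [counts[d] / valid_pairs for d in DIPEPTIDES]


features = dipeptide_composition("ACAC")  # toy sequence, not real data
print(len(features), features[DIPEPTIDES.index("AC")])
```

Because adjacent residues are counted as ordered pairs, the vector keeps local order information that a plain amino acid composition would lose.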
Details of the three benchmark data sets.
| Data Set | Protein Pairs | Interaction Pairs | Non-Interaction Pairs | References |
|---|---|---|---|---|
| human | 8161 | 3899 | 4262 |  |
| H. pylori | 2916 | 1458 | 1458 |  |
| yeast | 11188 | 5594 | 5594 |  |
Figure 8. The schematic diagram of protein sequence feature extraction.
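Both new features named in the abstract rest on graph energy, which is conventionally defined as the sum of the absolute eigenvalues of a graph's adjacency matrix. The sketch below computes that quantity for a small symmetric matrix using a dependency-free Jacobi eigenvalue routine. How PPI-GE builds its physicochemical and contact graphs from a sequence is not described in this record, so the function names and the toy graphs are purely illustrative assumptions.

```python
import math


def symmetric_eigenvalues(matrix, max_sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle (at most pi/4 in size) that zeroes a[p][q].
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def graph_energy(adjacency):
    """Graph energy: sum of absolute eigenvalues of the adjacency matrix."""
    return sum(abs(value) for value in symmetric_eigenvalues(adjacency))


# Toy 3-node path graph (hypothetical, not derived from a protein).
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(graph_energy(path_graph))
```

With NumPy available, `numpy.linalg.eigvalsh` would replace the hand-written eigenvalue routine; the pure-Python version is used here only to keep the example self-contained.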