Lei Wang, Zhu-Hong You, Xin Yan, Shi-Xiong Xia, Feng Liu, Li-Ping Li, Wei Zhang, Yong Zhou.
Abstract
Interactions among proteins are essential to all life activities and underlie all metabolic activities of cells. By studying protein-protein interactions (PPIs), researchers can better interpret protein function and decode the phenomena of life, which is of great practical value, especially in the design of new drugs. Although many high-throughput techniques have been devised for large-scale detection of PPIs, these methods are still expensive and time-consuming. For this reason, computational methods for predicting PPIs at the entire proteome scale are much needed. In this article, we propose a new approach to predict PPIs using a Rotation Forest (RF) classifier combined with a matrix-based protein sequence representation. We apply the Position-Specific Scoring Matrix (PSSM), which contains biological evolutionary information, to represent protein sequences and extract features through the two-dimensional Principal Component Analysis (2DPCA) algorithm. The descriptors are then sent to the rotation forest classifier for classification. We obtained 97.43% prediction accuracy with 94.92% sensitivity at a precision of 99.93% when the proposed method was applied to the PPI data of yeast. To evaluate the performance of the proposed method, we compared it with other methods on the same dataset and validated it on independent datasets. The results show that the proposed method is an appropriate and promising method for predicting PPIs.
Year: 2018 PMID: 30150728 PMCID: PMC6110764 DOI: 10.1038/s41598-018-30694-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
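The abstract names 2DPCA as the feature-extraction step applied to PSSM matrices. The following is a minimal sketch of standard 2DPCA, not the authors' implementation. It assumes that every input matrix has already been brought to a common shape (this record does not say how variable-length PSSMs are handled), and the function names `fit_2dpca` and `transform_2dpca`, the toy data, and the choice of 5 components are all hypothetical.

```python
import numpy as np


def fit_2dpca(matrices, n_components):
    """Return the mean matrix and the top 2DPCA projection axes.

    matrices: array-like of shape (M, m, n), i.e. M matrices of equal size.
    """
    stack = np.asarray(matrices, dtype=float)
    mean = stack.mean(axis=0)
    centered = stack - mean
    # Image covariance matrix G = (1/M) * sum_i (A_i - mean)^T (A_i - mean), shape (n, n)
    cov = np.einsum("imj,imk->jk", centered, centered) / stack.shape[0]
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]


def transform_2dpca(matrix, axes):
    """Project one (m, n) matrix onto the axes, giving an (m, d) feature matrix."""
    return np.asarray(matrix, dtype=float) @ axes


# Toy stand-in for PSSMs: 6 random matrices of shape 50 x 20 (assumed sizes).
rng = np.random.default_rng(0)
pssm_like = rng.normal(size=(6, 50, 20))

mean_matrix, axes = fit_2dpca(pssm_like, n_components=5)
features = transform_2dpca(pssm_like[0], axes)
feature_vector = features.ravel()  # flattened descriptor for a classifier
```

Unlike ordinary PCA, 2DPCA works on the matrices directly, so the covariance matrix here is only 20 x 20 regardless of how many rows each matrix has.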
The five-fold cross-validation results achieved on the Yeast dataset using the proposed method.
| Testing set | Accu. (%) | Sen. (%) | Prec. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 97.50 | 95.04 | 100.00 | 95.11 | 97.27 |
| 2 | 96.92 | 94.32 | 99.63 | 93.98 | 96.88 |
| 3 | 97.63 | 95.22 | 100.00 | 95.37 | 97.89 |
| 4 | 97.68 | 95.38 | 100.00 | 95.46 | 97.46 |
| 5 | 97.41 | 94.66 | 100.00 | 94.94 | 98.05 |
Figure 1. ROC curves obtained on the Yeast dataset using the proposed method.
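The headline figures in the abstract (97.43% accuracy, 94.92% sensitivity, 99.93% precision) can be recovered by averaging the five Yeast folds listed above. The sketch below does that arithmetic with the Python standard library. The use of the sample standard deviation (n - 1 denominator) is an assumption; it is consistent with the 99.93 ± 0.17 value that appears in the descriptor comparison table.

```python
from statistics import mean, stdev

# Per-fold values copied from the Yeast cross-validation table above.
yeast_folds = {
    "Accu.": [97.50, 96.92, 97.63, 97.68, 97.41],
    "Sen.": [95.04, 94.32, 95.22, 95.38, 94.66],
    "Prec.": [100.00, 99.63, 100.00, 100.00, 100.00],
    "MCC": [95.11, 93.98, 95.37, 95.46, 94.94],
    "AUC": [97.27, 96.88, 97.89, 97.46, 98.05],
}


def summarize(folds):
    """Return {metric: (mean, sample standard deviation)}, rounded to 2 decimals."""
    return {
        name: (round(mean(values), 2), round(stdev(values), 2))
        for name, values in folds.items()
    }


summary = summarize(yeast_folds)
for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.2f} ± {sd:.2f}")
```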
The five-fold cross-validation results achieved on the H. pylori dataset using the proposed method.
| Testing set | Accu. (%) | Sen. (%) | Prec. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 88.68 | 79.23 | 96.98 | 78.51 | 88.36 |
| 2 | 86.79 | 77.03 | 96.20 | 75.21 | 86.71 |
| 3 | 88.68 | 79.12 | 98.33 | 79.00 | 90.22 |
| 4 | 88.16 | 78.05 | 97.39 | 77.76 | 89.18 |
| 5 | 88.01 | 77.55 | 98.28 | 77.83 | 89.33 |
Figure 2. ROC curves obtained on the H. pylori dataset using the proposed method.
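The column abbreviations in these tables (Accu., Sen., Prec., MCC) are not defined in this record. Assuming they follow the usual confusion-matrix definitions of accuracy, sensitivity, precision and the Matthews correlation coefficient, a minimal sketch of the calculation looks like this. The function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, precision and MCC from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "mcc": mcc,
    }


# Hypothetical balanced test fold: 100 interacting and 100 non-interacting pairs.
example = binary_metrics(tp=95, fp=0, tn=100, fn=5)
for name, value in example.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because MCC uses all four confusion counts, a fold with perfect precision can still show a clearly lower MCC when some true interactions are missed.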
The five-fold cross-validation results achieved on the Yeast dataset using the SVM classifier.
| Testing set | Accu. (%) | Sen. (%) | Prec. (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 87.84 | 85.37 | 90.00 | 75.79 | 94.80 |
| 2 | 85.47 | 81.48 | 89.20 | 71.27 | 94.13 |
| 3 | 87.71 | 83.84 | 90.63 | 75.60 | 94.41 |
| 4 | 89.23 | 86.41 | 91.71 | 78.59 | 95.47 |
| 5 | 86.21 | 85.00 | 86.36 | 72.38 | 94.16 |
| Average | 87.29 ± 1.48 | 84.42 ± 1.88 | 89.58 ± 2.02 | 74.73 ± 2.93 | 94.59 ± 0.56 |
Figure 3. ROC curves obtained on the Yeast dataset using the SVM classifier.
The performance comparison among different descriptors on the Yeast dataset.
| Descriptor | Accu. (%) | Sen. (%) | Prec. (%) | MCC (%) |
|---|---|---|---|---|
| AC | 93.14 ± 0.69 | 86.28 ± 1.23 | | 87.10 ± 1.20 |
| DCT | 93.65 ± 0.67 | 87.30 ± 1.41 | | 88.02 ± 1.21 |
| Original | 81.50 ± 0.62 | 70.55 ± 0.51 | 90.33 ± 1.94 | 64.57 ± 1.65 |
| 2DPCA | | | 99.93 ± 0.17 | |
The performance comparison between different methods on the Yeast dataset.
| Author | Model | Accu. (%) | Sen. (%) | Prec. (%) | MCC (%) |
|---|---|---|---|---|---|
| Guo's work | ACC | 89.33 ± 2.67 | 89.93 ± 3.68 | 88.87 ± 6.16 | N/A |
| Guo's work | AC | 87.36 ± 1.38 | 87.30 ± 4.68 | 87.82 ± 4.33 | N/A |
| Zhou's work | SVM + LD | 88.56 ± 0.33 | 87.37 ± 0.22 | 89.50 ± 0.60 | 77.15 ± 0.68 |
| Yang's work | Cod1 | 75.08 ± 1.13 | 75.81 ± 1.20 | 74.75 ± 1.23 | N/A |
| Yang's work | Cod2 | 80.04 ± 1.06 | 76.77 ± 0.69 | 82.17 ± 1.35 | N/A |
| Yang's work | Cod3 | 80.41 ± 0.47 | 78.14 ± 0.90 | 81.86 ± 0.99 | N/A |
| Yang's work | Cod4 | 86.15 ± 1.17 | 81.03 ± 1.74 | 90.24 ± 0.45 | N/A |
| You's work | PCA-EELM | 87.00 ± 0.29 | 86.15 ± 0.43 | 87.59 ± 0.32 | 77.36 ± 0.44 |
The performance comparison of different methods on the H. pylori dataset.
| Model | Accu. (%) | Sen. (%) | Prec. (%) | MCC (%) |
|---|---|---|---|---|
| Signature products | 83.40 | 79.90 | 85.70 | N/A |
| Ensemble ELM | 87.50 | 88.95 | 86.15 | |
| Phylogenetic bootstrap | 75.80 | 69.80 | 80.20 | N/A |
| HKNN | 84.00 | 86.00 | 84.00 | N/A |
| Ensemble of HKNN | 86.60 | 86.70 | 85.00 | N/A |
| Boosting | 79.52 | 80.37 | 81.69 | 70.64 |
| Proposed method | | | | 77.66 |
Predictive results of four species based on the proposed method.
| Species | Test pairs | Accu. (%) |
|---|---|---|
| | 4013 | 91.43 |
| | 6954 | 99.93 |
| | 1412 | 92.00 |
| | 313 | 90.73 |
Figure 4. Flow chart of the proposed method.