Zhan-Heng Chen, Li-Ping Li, Zhou He, Ji-Ren Zhou, Yangming Li, Leon Wong.
Abstract
Self-interacting proteins (SIPs), in which two or more copies of the same protein interact with each other, play significant roles in cellular processes and cell functions. Although a number of experimental methods have been designed to detect SIPs, they remain extremely time-consuming, expensive, and challenging. Computational methods for predicting SIPs are therefore urgently needed. In this study, we propose a deep forest based predictor for accurate prediction of SIPs from protein sequence information. More specifically, we introduce a novel feature representation method that integrates the position-specific scoring matrix (PSSM) with the wavelet transform. To evaluate the performance of the proposed method, cross-validation tests are performed on two widely used benchmark datasets. The experimental results show that the proposed model achieves accuracies of 95.43% and 93.65% on the human and yeast datasets, respectively. The corresponding AUC values are 0.9586 for the human dataset and 0.9203 for the yeast dataset. To further show the advantage of the proposed method, it is compared with several existing methods. The results indicate that the proposed model performs better than the other SIP prediction methods compared. This work offers biologists an effective architecture for detecting new SIPs.
Keywords: deep learning; disease; position-specific scoring matrix; self-interacting proteins; wavelet transform
Year: 2019 PMID: 30881376 PMCID: PMC6405691 DOI: 10.3389/fgene.2019.00090
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
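The abstract describes a feature representation that combines a PSSM with a wavelet transform but does not give the construction. The following is a minimal sketch of one way such a combination could look, not the authors' method: it assumes a single-level Haar wavelet, applies it to each of the 20 PSSM columns along the sequence axis, and summarizes the coefficients with means and standard deviations. The function names, the choice of wavelet, the summary statistics, and the random stand-in matrix are all illustrative assumptions.

```python
import numpy as np

def haar_dwt(signal):
    """Single-level Haar DWT of a 1-D signal; returns (approximation, detail)."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:                      # pad odd-length signals by repeating the last value
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def pssm_wavelet_features(pssm):
    """Map an (L, 20) PSSM to a fixed-length vector (hypothetical layout)."""
    pssm = np.asarray(pssm, dtype=float)
    feats = []
    for col in pssm.T:                  # one column per amino-acid type
        approx, detail = haar_dwt(col)
        # Four summary statistics per column, so 20 columns give 80 features.
        feats.extend([approx.mean(), approx.std(), detail.mean(), detail.std()])
    return np.array(feats)

rng = np.random.default_rng(0)
toy_pssm = rng.integers(-5, 6, size=(37, 20))   # random stand-in, not real PSSM data
features = pssm_wavelet_features(toy_pssm)
print(features.shape)
```

Summarizing the coefficients gives every protein a vector of the same length regardless of sequence length, which a forest-based classifier needs; the actual paper may achieve this differently.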
FIGURE 1. The flowchart of our work.
FIGURE 2. Cascade forest structure.
FIGURE 3. Flowchart of the multi-grained scanning approach.
Performance of the proposed model on the human and yeast datasets.
| Dataset | Accu (%) | Spec (%) | Prec (%) | Recall (%) | F1 score (%) | MCC (%) |
|---|---|---|---|---|---|---|
| Human | 95.43 | 99.09 | 84.07 | 54.06 | 65.81 | 65.26 |
| Yeast | 93.65 | 99.28 | 88.73 | 47.01 | 61.46 | 61.87 |
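The six measures in this table have standard definitions in terms of confusion-matrix counts. The sketch below writes them out and then checks that the reported F1 scores follow from the reported precision and recall. The confusion-matrix counts are invented for illustration; only the precision, recall, and F1 percentages come from the table.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall (same units in and out)."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, chosen only to exercise the formulas.
toy = binary_metrics(tp=50, fp=10, tn=900, fn=40)
print({name: round(value, 4) for name, value in toy.items()})

# Consistency check on the reported percentages.
print(round(f1_from(84.07, 54.06), 2))   # human row; table reports 65.81
print(round(f1_from(88.73, 47.01), 2))   # yeast row; table reports 61.46
```

Both recomputed F1 values agree with the table to within rounding.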
FIGURE 4. Performance comparison between GCForest and SVM on the human dataset.
FIGURE 5. Performance comparison between GCForest and SVM on the yeast dataset.
Comparison of GCForest with other methods on the human dataset.
| Model | Accu (%) | Spec (%) | Recall (%) | MCC (%) | F1 score (%) |
|---|---|---|---|---|---|
| SLIPPER | 91.10 | 95.06 | 47.26 | 41.97 | 46.82 |
| DXECPPI | 30.90 | 25.83 | 87.08 | 8.25 | 17.28 |
| PPIevo | 78.04 | 25.82 | 87.83 | 20.82 | 27.73 |
| LocFuse | 80.66 | 80.50 | 50.83 | 20.26 | 27.65 |
| CRS | 91.54 | 96.72 | 34.17 | 36.33 | 36.83 |
| SPAR | 92.09 | 97.40 | 33.33 | 38.36 | 41.13 |
| Random forest | 94.33 | 100.00 | 29.14 | 52.39 | 45.13 |
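Several rows pair an accuracy above 90% with a recall below 50%, which is the usual signature of class imbalance. When accuracy, specificity, and recall come from the same confusion matrix, accuracy equals the prevalence-weighted mix `p * recall + (1 - p) * specificity`, where `p` is the fraction of positive samples. The sketch below solves this identity for `p`. It assumes each row was computed from a single confusion matrix, which the table does not state, so the result is an implied value rather than a reported one; the function name is illustrative.

```python
def implied_positive_fraction(accuracy, specificity, recall):
    """Solve accuracy = p * recall + (1 - p) * specificity for p.

    All three inputs must share the same units (here: percent).
    """
    if specificity == recall:
        raise ValueError("p is undetermined when specificity equals recall")
    return (specificity - accuracy) / (specificity - recall)

# Percentages taken from the human-dataset comparison table.
rows = {
    "SLIPPER": (91.10, 95.06, 47.26),
    "Random forest": (94.33, 100.00, 29.14),
}
for model, (acc, spec, rec) in rows.items():
    p = implied_positive_fraction(acc, spec, rec)
    print(f"{model}: implied positive fraction = {p:.3f}")
```

Under that assumption, both rows imply that roughly 8% of the human samples are positive, which is why a classifier can reach high accuracy while missing most SIPs, and why MCC and F1 are the more informative columns here.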
Comparison of GCForest with other methods on the yeast dataset.
| Model | Accu (%) | Spec (%) | Recall (%) | MCC (%) | F1 score (%) |
|---|---|---|---|---|---|
| SLIPPER | 71.90 | 72.18 | 69.72 | 28.42 | 36.16 |
| DXECPPI | 87.46 | 94.93 | 29.44 | 28.25 | 34.89 |
| PPIevo | 66.28 | 87.46 | 60.14 | 18.01 | 28.92 |
| LocFuse | 66.66 | 68.10 | 55.49 | 15.77 | 27.53 |
| CRS | 72.69 | 74.37 | 59.58 | 23.68 | 33.05 |
| SPAR | 76.96 | 80.02 | 53.24 | 24.84 | 34.54 |
| Random forest | 92.77 | 100.00 | 44.10 | 63.81 | 61.21 |
FIGURE 6. ROC curve of GCForest on the human SIPs dataset.
FIGURE 7. ROC curve of GCForest on the yeast SIPs dataset.
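The AUC values quoted in the abstract summarize ROC curves like these in a single number. A minimal sketch of how AUC can be computed from classifier scores is given below, using the rank interpretation: AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are toy values, not data from the paper, and the quadratic loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5          # ties contribute half a win
    return wins / (len(pos) * len(neg))

# Toy example: 3 positives, 3 negatives; 7 of the 9 pairs are ranked correctly.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
print(roc_auc(toy_labels, toy_scores))
```

Because AUC depends only on how samples are ranked, it is unaffected by the decision threshold and by the positive-to-negative ratio, which makes it a useful complement to accuracy on imbalanced datasets.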