Yan-Bin Wang (1, 2), Zhu-Hong You (2), Li-Ping Li (2), De-Shuang Huang (3), Feng-Feng Zhou (4), Shan Yang (2).
Abstract
Self-interacting proteins (SIPs) play a significant role in most of the important molecular processes in cells, such as signal transduction, gene expression regulation, immune response and enzyme activation. Although traditional experimental methods can generate SIP data, relying on biological techniques alone is expensive and time-consuming. It is therefore important and urgent to develop efficient computational methods for SIP detection. In this study, we present a novel machine-learning method for identifying SIPs that combines a Zernike Moments (ZMs) descriptor computed on the Position Specific Scoring Matrix (PSSM) with a Stacked Sparse Auto-Encoder (SSAE) and a Probabilistic Classification Vector Machine (PCVM). Specifically, ZMs are first used to extract feature vectors from each protein's PSSM; a deep neural network (the SSAE) then reduces the feature dimensionality and noise; finally, the PCVM performs the classification. The prediction performance of the proposed method is evaluated on the S. cerevisiae and Human SIP datasets via five-fold cross-validation, where it achieves average accuracies of 92.55% and 97.47%, respectively. To further evaluate our scheme, we also compared the PCVM classifier with a Support Vector Machine (SVM) and with other existing methods on the same datasets. The comparison shows that the proposed strategy outperforms the other methods in accuracy, MCC and AUC, and could be a useful tool for identifying SIPs.
Keywords: Deep learning; Probabilistic Classification Vector Machines; Zernike Moments
Year: 2018 PMID: 29989064 PMCID: PMC6036743 DOI: 10.7150/ijbs.23817
Source DB: PubMed Journal: Int J Biol Sci ISSN: 1449-2288 Impact factor: 6.580
Figure 1. General case of mapping transforms.
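The first stage in the abstract treats each protein's PSSM (an L × 20 matrix for a sequence of length L) as an image and describes it with Zernike moments, Z_nm = (n+1)/π ∬ f(x, y) V*_nm(x, y) dx dy over the unit disk, where V_nm = R_nm(ρ) e^{imθ}. Figure 1 concerns the mapping of the image onto that disk; the exact transform, the PSSM normalization and the maximum moment order used in the paper are not given here. The sketch below is a minimal version under stated assumptions: each axis is linearly scaled to [-1, 1], only cells whose centres fall inside the unit disk contribute, and `zernike_moments` and `max_order` are hypothetical names.

```python
import math

import numpy as np


def radial_poly(n: int, m: int, rho: np.ndarray) -> np.ndarray:
    """Zernike radial polynomial R_n^m(rho); assumes |m| <= n and n - |m| even."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coef = ((-1) ** s * math.factorial(n - s)
                / (math.factorial(s)
                   * math.factorial((n + m) // 2 - s)
                   * math.factorial((n - m) // 2 - s)))
        out += coef * rho ** (n - 2 * s)
    return out


def zernike_moments(img: np.ndarray, max_order: int) -> np.ndarray:
    """Magnitudes |Z_nm| for all 0 <= m <= n <= max_order with n - m even.

    Assumption: each axis of the matrix is scaled to [-1, 1] (so a non-square
    PSSM is stretched onto the square) and only cells whose centres fall
    inside the unit disk contribute to the discrete sum.
    """
    rows, cols = img.shape
    y = (2 * np.arange(rows) + 1 - rows) / rows    # cell-centre coordinates
    x = (2 * np.arange(cols) + 1 - cols) / cols
    xx, yy = np.meshgrid(x, y)                     # both shaped (rows, cols)
    rho = np.hypot(xx, yy)
    theta = np.arctan2(yy, xx)
    inside = rho <= 1.0
    cell_area = (2.0 / rows) * (2.0 / cols)        # dx * dy for the discrete sum
    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            conj_basis = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
            z = (n + 1) / math.pi * cell_area * np.sum(img[inside] * conj_basis[inside])
            feats.append(abs(z))                   # magnitude discards the rotation phase
    return np.array(feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pssm = rng.normal(size=(120, 20))              # stand-in for an L x 20 PSSM
    print(zernike_moments(pssm, max_order=4).shape)  # (9,)
```

Keeping only the magnitudes |Z_nm| is what makes the descriptor insensitive to rotating the image; on a square grid, a 90° rotation leaves the magnitudes unchanged up to floating-point error.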
Figure 2. Illustration of the architecture of the SSAE.
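A stacked sparse auto-encoder is commonly trained layer by layer: each layer is an auto-encoder fitted to reconstruct its input while a KL-divergence penalty keeps each hidden unit's average activation near a small target ρ, and the trained encoders are then chained so the last hidden code serves as the reduced feature vector. The sketch below shows only that per-layer objective and the stacked forward pass (no optimizer). The sigmoid units, the [0, 1] input range and the hyperparameters ρ = 0.05, β = 3 and λ = 1e-4 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def kl_sparsity(rho, rho_hat):
    """Sum_j KL(rho || rho_hat_j) between Bernoulli means (the sparsity penalty)."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)   # avoid log(0)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))


def sae_loss(X, W1, b1, W2, b2, rho=0.05, beta=3.0, lam=1e-4):
    """Objective of one sparse auto-encoder layer; rows of X are samples in [0, 1]."""
    H = sigmoid(X @ W1 + b1)            # hidden code, shape (N, k)
    X_hat = sigmoid(H @ W2 + b2)        # reconstruction, shape (N, d)
    recon = 0.5 * np.sum((X_hat - X) ** 2) / X.shape[0]
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    sparsity = beta * kl_sparsity(rho, H.mean(axis=0))
    return recon + decay + sparsity


def stacked_encode(X, encoders):
    """Feed X through trained encoder layers [(W, b), ...]; the final code is the reduced feature."""
    H = X
    for W, b in encoders:
        H = sigmoid(H @ W + b)
    return H
```

In a greedy scheme, the first layer is trained on the ZM features, the second on the first layer's hidden codes, and so on; `stacked_encode` then applies the encoders in order.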
Five-fold cross-validation results of our scheme on the S. cerevisiae dataset.
| Testing Set | Acc (%) | Sn (%) | Sp (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 92.96 | 39.84 | 98.39 | 53.11 |
| 2 | 92.85 | 43.70 | 98.83 | 58.41 |
| 3 | 92.20 | 40.41 | 99.09 | 57.07 |
| 4 | 92.60 | 51.02 | 98.18 | 61.97 |
| 5 | 92.13 | 44.65 | 99.08 | 60.56 |
| Average | 92.55 ± 0.3 | 43.92 ± 4.4 | 98.71 ± 0.4 | 58.22 ± 3.4 |
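The tables report accuracy (Acc), sensitivity (Sn), specificity (Sp) and the Matthews correlation coefficient (MCC). Assuming the standard confusion-matrix definitions with SIPs as the positive class, and assuming each "Average" row is the mean ± standard deviation over the five testing sets, they can be computed as below. The estimator behind the table's ± values is not stated, so the sample standard deviation used here need not reproduce them exactly; `binary_metrics` and `mean_sd` are hypothetical helpers.

```python
import math
import statistics


def binary_metrics(tp, tn, fp, fn):
    """Acc, Sn, Sp and MCC (as fractions) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)                  # sensitivity: recall on SIPs
    sp = tn / (tn + fp)                  # specificity: recall on non-SIPs
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sn, sp, mcc


def mean_sd(values):
    """Mean and sample standard deviation across folds."""
    return statistics.mean(values), statistics.stdev(values)


# Per-fold accuracies of our scheme on S. cerevisiae, copied from the table above.
fold_acc = [92.96, 92.85, 92.20, 92.60, 92.13]
mean_acc, sd_acc = mean_sd(fold_acc)
print(f"Acc = {mean_acc:.2f} ± {sd_acc:.2f}")   # Acc = 92.55 ± 0.37
```

The mean matches the reported 92.55%; the spread differs slightly from the printed ± 0.3, consistent with a different estimator or rounding convention.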
Five-fold cross-validation results of our scheme on the Human dataset.
| Testing Set | Acc (%) | Sn (%) | Sp (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 97.55 | 70.07 | 100 | 82.61 |
| 2 | 97.55 | 69.53 | 100 | 82.30 |
| 3 | 97.29 | 68.98 | 100 | 81.85 |
| 4 | 97.90 | 74.65 | 100 | 85.43 |
| 5 | 97.07 | 64.46 | 100 | 79.03 |
| Average | 97.47 ± 0.3 | 69.54 ± 3.6 | 100 | 82.24 ± 2.2 |
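Each table lists five testing sets, one per fold of five-fold cross-validation: every sample is held out exactly once while the remaining four folds train the model. A minimal sketch of such a split, assuming a plain shuffled split; whether the authors stratified the folds by class is not stated (a stratified split, which shuffles positives and negatives separately, is a common alternative when classes are unbalanced). `kfold_indices` and `train_test_splits` are hypothetical helpers.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle 0..n_samples-1 and deal the indices into k disjoint folds.

    Dealing round-robin keeps fold sizes within one of each other.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the testing set exactly once."""
    folds = kfold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


if __name__ == "__main__":
    for fold, (train, test) in enumerate(train_test_splits(23), start=1):
        print(f"testing set {fold}: {len(train)} train / {len(test)} test")
```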
Figure 3. ROC curves obtained by our scheme on the S. cerevisiae dataset.
Figure 4. ROC curves obtained by our scheme on the Human dataset.
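ROC curves such as those in Figures 3 and 4 plot the true-positive rate against the false-positive rate as the decision threshold varies, and the comparison tables summarize them by the area under the curve (AUC). Assuming the classifier outputs one score per protein, a minimal sketch follows: the AUC equals the probability that a randomly chosen SIP is scored above a randomly chosen non-SIP, with ties counted as one half, which coincides with the trapezoidal area under the ROC points. The helpers are hypothetical and quadratic-time, intended only to show the definitions.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the threshold down through every distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def trapezoid_area(points):
    """Area under a piecewise-linear ROC curve; agrees with roc_auc on the same data."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```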
Five-fold cross-validation results of the SVM-based method on the S. cerevisiae dataset.
| Testing Set | Acc (%) | Sn (%) | Sp (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 91.24 | 30.01 | 99.86 | 32.85 |
| 2 | 91.08 | 40.64 | 96.48 | 53.17 |
| 3 | 90.51 | 30.92 | 99.64 | 42.47 |
| 4 | 90.51 | 30.41 | 99.91 | 42.37 |
| 5 | 90.92 | 40.14 | 98.08 | 55.27 |
| Average | 90.85 ± 0.3 | 34.42 ± 5.4 | 98.79 ± 1.5 | 45.22 ± 9.1 |
Five-fold cross-validation results of the SVM-based method on the Human dataset.
| Testing Set | Acc (%) | Sn (%) | Sp (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 97.06 | 64.08 | 100 | 78.80 |
| 2 | 96.69 | 58.78 | 100 | 75.33 |
| 3 | 96.60 | 61.06 | 100 | 76.72 |
| 4 | 96.81 | 61.46 | 100 | 77.06 |
| 5 | 96.95 | 63.07 | 100 | 78.13 |
| Average | 96.82 ± 0.2 | 61.69 ± 2.0 | 100 | 77.21 ± 1.3 |
Figure 5. ROC curves obtained by the SVM-based method on the S. cerevisiae dataset.
Figure 6. ROC curves obtained by the SVM-based method on the Human dataset.
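The SVM comparison swaps out the PCVM classifier. For orientation, a PCVM-style model scores a sample with a kernel expansion over retained basis vectors passed through a probit link, P(y = +1 | x) = Φ(Σ_i w_i k(x, x_i) + b), where each weight is constrained to carry the sign of its basis vector's class label. The sketch below covers prediction only, with hand-set hypothetical weights and an RBF kernel whose width `gamma` is an illustrative assumption; the Bayesian procedure that fits the weights and prunes basis vectors is not shown, and `pcvm_predict_proba` is a hypothetical name.

```python
import math

import numpy as np


def rbf_kernel(X, centers, gamma):
    """Gaussian basis functions exp(-gamma * ||x - c||^2), shape (len(X), len(centers))."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def probit(z):
    """Standard normal CDF, element-wise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))


def pcvm_predict_proba(X, centers, center_labels, weights, bias, gamma=1.0):
    """P(y = +1 | x) = Phi(sum_i w_i * k(x, c_i) + b) for already-fitted parameters.

    PCVM requires each weight to share the sign of its basis vector's label
    (w_i >= 0 when y_i = +1, w_i <= 0 when y_i = -1); this is checked, not learned.
    """
    if np.any(weights * center_labels < 0):
        raise ValueError("weight signs must match the basis-vector labels")
    f = rbf_kernel(X, centers, gamma) @ weights + bias
    return probit(f)
```

Unlike an SVM's signed margin, the output is a class probability, which is the kind of score an ROC curve can be drawn from directly.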
Prediction results of different methods on the S. cerevisiae dataset.
| Model | Acc (%) | Sn (%) | Sp (%) | MCC (%) | AUC |
|---|---|---|---|---|---|
| SLIPPER | 71.90 | 69.72 | 72.18 | 28.42 | 0.7723 |
| DXECPPI | 87.46 | 29.44 | 94.93 | 28.25 | 0.6934 |
| PPIevo | 66.28 | 60.14 | 87.46 | 18.01 | 0.6728 |
| LocFuse | 66.66 | 55.49 | 68.10 | 15.77 | 0.7087 |
| CRS | 72.69 | 59.58 | 74.37 | 23.68 | 0.7115 |
| SPAR | 76.96 | 53.24 | 80.02 | 24.84 | 0.7455 |
| Our method | 92.55 | 43.92 | 98.71 | 58.22 | 0.8937 |
Prediction results of different methods on the Human dataset.
| Model | Acc (%) | Sn (%) | Sp (%) | MCC (%) | AUC |
|---|---|---|---|---|---|
| SLIPPER | 91.10 | 47.26 | 95.06 | 41.97 | 0.8723 |
| DXECPPI | 30.90 | 87.08 | 25.83 | 8.25 | 0.5806 |
| PPIevo | 78.04 | 87.83 | 25.82 | 20.82 | 0.7329 |
| LocFuse | 80.66 | 50.83 | 80.50 | 20.26 | 0.7087 |
| CRS | 91.54 | 34.17 | 96.72 | 36.33 | 0.8196 |
| SPAR | 92.09 | 33.33 | 94.70 | 38.36 | 0.8229 |
| Our method | 97.47 | 69.54 | 100.00 | 82.24 | 0.9987 |
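Because accuracy is the class-weighted average of sensitivity and specificity, Acc = w·Sn + (1 − w)·Sp with w the fraction of positive samples, a row's Acc must lie between its Sn and Sp. This holds exactly for a single confusion matrix and only approximately for fold-averaged figures when class shares differ between folds, so a flagged row marks a value worth re-checking against its original source rather than proving an error. A short check over the Human comparison table, with the rows copied from the table above; `out_of_bounds` and `implied_positive_share` are hypothetical helpers.

```python
# Human comparison table above: model -> (Acc, Sn, Sp), all in percent.
HUMAN = {
    "SLIPPER": (91.10, 47.26, 95.06),
    "DXECPPI": (30.90, 87.08, 25.83),
    "PPIevo": (78.04, 87.83, 25.82),
    "LocFuse": (80.66, 50.83, 80.50),
    "CRS": (91.54, 34.17, 96.72),
    "SPAR": (92.09, 33.33, 94.70),
    "Our method": (97.47, 69.54, 100.00),
}


def out_of_bounds(table):
    """Models whose Acc falls outside [min(Sn, Sp), max(Sn, Sp)]."""
    return [name for name, (acc, sn, sp) in table.items()
            if not min(sn, sp) <= acc <= max(sn, sp)]


def implied_positive_share(acc, sn, sp):
    """Solve Acc = w * Sn + (1 - w) * Sp for w, the fraction of positive samples."""
    return (acc - sp) / (sn - sp)


print(out_of_bounds(HUMAN))                                    # ['LocFuse']
print(round(implied_positive_share(*HUMAN["Our method"]), 3))  # 0.083
```

On the printed values, only LocFuse is flagged (its Acc of 80.66% exceeds both its Sn and Sp), and the row for our method implies that roughly 8% of the Human samples are SIPs.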