Wei Du1, Yu Sun1, Gaoyang Li1, Huansheng Cao2, Ran Pang1, Ying Li3.
Abstract
BACKGROUND: Compared with disease biomarkers in blood and urine, biomarkers in saliva have distinct advantages in clinical tests, as they can be conveniently examined through noninvasive sample collection. Therefore, identifying human saliva-secretory proteins and further detecting protein biomarkers in saliva have significant value in clinical medicine. There are only a few methods for predicting saliva-secretory proteins based on conventional machine learning algorithms, and all are highly dependent on annotated protein features. Unlike conventional machine learning algorithms, deep learning algorithms can automatically learn feature representations from input data and thus hold promise for predicting saliva-secretory proteins.
Keywords: Capsule network; Convolutional neural network; Deep learning; Saliva-secretory protein
Year: 2020 PMID: 32517646 PMCID: PMC7285745 DOI: 10.1186/s12859-020-03579-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The flowchart for predicting human saliva-secretory proteins
The performances of CapsNet-SSP and other methods on the training set
| Methods | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUC |
|---|---|---|---|---|---|---|---|
| KNN | 0.835 | 0.763 | 0.907 | 0.891 | 0.822 | 0.677 | 0.878 |
| Decision Tree | 0.820 | 0.789 | 0.852 | 0.842 | 0.815 | 0.642 | 0.800 |
| Random Forest | 0.810 | 0.752 | 0.867 | 0.850 | 0.798 | 0.623 | 0.879 |
| AdaBoost | 0.830 | 0.754 | 0.905 | 0.889 | 0.836 | 0.667 | 0.905 |
| SVM | 0.830 | 0.760 | 0.899 | 0.883 | 0.822 | 0.666 | 0.877 |
The threshold is set where the MCC reaches the maximum value
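The footnote above says the decision threshold is chosen where the Matthews correlation coefficient (MCC) is highest. A minimal sketch of that selection is given here, assuming binary labels and one predicted score per protein; the function names and the toy data are illustrative and are not taken from the paper.

```python
import math


def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def best_mcc_threshold(y_true, scores):
    """Try every distinct score as a threshold and keep the one with the highest MCC."""
    best_t, best_m = None, -2.0  # MCC lies in [-1, 1], so -2 is a safe start
    for t in sorted(set(scores)):
        m = mcc(*confusion_counts(y_true, scores, t))
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m


# Toy data (hypothetical): 4 positives, 3 negatives
y_true = [0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
best_t, best_m = best_mcc_threshold(y_true, scores)
print(best_t, round(best_m, 3))
```

Once the threshold is fixed, the other columns (accuracy, sensitivity, specificity, precision, F-score) follow from the same four confusion counts; AUC does not depend on the threshold.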
Fig. 2 ROC curves and precision-recall curves of different methods on the training dataset
The performances of CapsNet-SSP and other methods on the independent test set
| Methods | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUC |
|---|---|---|---|---|---|---|---|
| KNN | 0.778 | 0.649 | 0.907 | 0.875 | 0.745 | 0.575 | 0.809 |
| Decision Tree | 0.772 | 0.692 | 0.851 | 0.823 | 0.752 | 0.550 | 0.740 |
| Random Forest | 0.781 | 0.804 | 0.758 | 0.769 | 0.786 | 0.563 | 0.836 |
| AdaBoost | 0.792 | 0.703 | 0.881 | 0.855 | 0.772 | 0.593 | 0.847 |
| SVM | 0.781 | 0.784 | 0.778 | 0.779 | 0.782 | 0.562 | 0.857 |
The threshold is set where the MCC reaches the maximum value
Fig. 3 ROC curves and precision-recall curves of different methods on the independent test set
Performance comparison using different architectures in CapsNet-SSP
| Architectures | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUC |
|---|---|---|---|---|---|---|---|
| One-Lane Conv | 0.792 (8.8e-08) | 0.827 (5.0e-07) | 0.758 (2.3e-07) | 0.771 (4.0e-08) | 0.811 (3.6e-07) | 0.602 (1.6e-07) | 0.863 (1.1e-06) |
| Multi-Lane Conv | 0.812 (4.4e-08) | 0.806 (6.4e-06) | 0.818 (1.0e-07) | 0.814 (1.0e-08) | 0.810 (9.6e-07) | 0.624 (4.1e-08) | 0.869 (2.5e-06) |
| One-Lane CapsNet | 0.832 (0.015) | 0.847 (0.04) | 0.818 (0.013) | 0.822 (0.013) | 0.834 (0.013) | 0.665 (0.025) | 0.915 (0.001) |
| | (N/A) | (N/A) | (N/A) | (N/A) | (N/A) | (N/A) | (N/A) |
The threshold is set where the MCC reaches the maximum value, and the values in brackets are p-values
Fig. 4 The ROC curves and precision-recall curves using different architectures in CapsNet-SSP
Performance comparison of deep learning architectures
| Architectures | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUC |
|---|---|---|---|---|---|---|---|
| DeepSig | 0.792 (0.011) | 0.745 (0.030) | 0.838 (0.009) | 0.820 (0.016) | 0.781 (7.8e-04) | 0.586 (0.011) | 0.867 (5.8e-07) |
| DanQ | 0.802 (1.4e-05) | 0.745 (1.8e-05) | 0.859 (3.5e-05) | 0.839 (3.3e-05) | 0.789 (2.2e-05) | 0.608 (1.8e-05) | 0.886 (6.3e-06) |
| DeepLoc | 0.843 (0.013) | 0.755 (0.029) | 0.929 (0.037) | 0.914 (0.038) | 0.827 (0.016) | 0.695 (0.013) | 0.891 (0.015) |
The threshold is set where the MCC reaches the maximum value, and the values in brackets are p-values
Fig. 5 The ROC curves and precision-recall curves of different deep learning architectures
Comparison of the performances of different strategies for class imbalance
| Strategies | Accuracy | Sensitivity | Specificity | Precision | F-score | MCC | AUC |
|---|---|---|---|---|---|---|---|
| No strategy | 0.853 | 0.796 | 0.909 | 0.897 | 0.843 | 0.710 | 0.916 |
| Hybrid-based | 0.868 | 0.857 | 0.879 | 0.875 | 0.866 | 0.736 | 0.939 |
| Boosting-based | 0.868 | 0.827 | 0.909 | 0.900 | 0.862 | 0.738 | 0.918 |
| Bagging-based | 0.888 | 0.847 | 0.929 | 0.922 | 0.884 | 0.779 | 0.948 |
The threshold is set where the MCC reaches the maximum value
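The table names a bagging-based strategy for class imbalance, but this record does not describe how it works. One common form is sketched here as an assumption, not as the authors' procedure: the majority (negative) class is split into disjoint subsets, each subset is paired with the full positive set to train one model, and the models' scores are averaged. The one-dimensional centroid scorer is a hypothetical stand-in for a real learner, and all names and data are illustrative.

```python
import random
from statistics import mean


def balanced_bags(positives, negatives, n_bags, seed=0):
    """Split the negatives into n_bags disjoint parts; pair each with all positives."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    return [(positives, shuffled[i::n_bags]) for i in range(n_bags)]


def train_centroid_model(pos, neg):
    """Toy learner (assumption): score a value by its closeness to the positive mean."""
    mu_pos, mu_neg = mean(pos), mean(neg)

    def score(x):
        d_pos, d_neg = abs(x - mu_pos), abs(x - mu_neg)
        total = d_pos + d_neg
        # Score near 1 when x is close to the positive mean, near 0 otherwise
        return 0.5 if total == 0 else d_neg / total

    return score


def bagging_predict(models, x):
    """Average the scores of all bag-level models."""
    return mean(model(x) for model in models)


# Toy data (hypothetical): 3 positives against 9 negatives
positives = [0.8, 0.9, 1.0]
negatives = [0.0, 0.1, 0.2, 0.3, 0.1, 0.2, 0.0, 0.3, 0.2]

bags = balanced_bags(positives, negatives, n_bags=3)
models = [train_centroid_model(pos, neg) for pos, neg in bags]
print(bagging_predict(models, 0.95), bagging_predict(models, 0.05))
```

Each bag is balanced (3 positives, 3 negatives), so no single model is trained on the skewed class ratio, while every negative example is still used by exactly one model.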
Ranking result comparison for experimentally verified human saliva-secretory proteins
| Top number | SVM | DeepLoc | CapsNet-SSP |
|---|---|---|---|
| 1000 | 3 (0.168) | 5 (0.025) | 16 (5.28E-12) |
| 2000 | 7 (0.042) | 19 (2.13E-10) | 24 (5.35E-16) |
| 3000 | 7 (0.132) | 27 (2.83E-15) | 29 (7.28E-18) |
| 4000 | 9 (0.121) | 30 (1.65E-15) | 33 (1.56E-19) |
Comparison of the ranking results for different cancer biomarkers in saliva
| Top number | HNSCC | OSCC | LC | BC |
|---|---|---|---|---|
| 1000 | 7 (1.39E-4) | 13 (7.31E-9) | 6 (5.23E-6) | 3 (0.010) |
| 2000 | 11 (8.41E-6) | 28 (6.73E-22) | 7 (2.02E-9) | 4 (0.011) |
| 3000 | 15 (2.16E-7) | 34 (2.03E-26) | 9 (1.40E-6) | 6 (0.001) |
| 4000 | 19 (2.01E-9) | 34 (4.55E-22) | 10 (8.16E-9) | 9 (3.75E-6) |
Fig. 6 Architecture of the proposed model
Fig. 7 Computation between the PrimaryCaps and HiddenCaps layers of each group