Jiquan Shen, Jiawei Shi, Junwei Luo, Haixia Zhai, Xiaoyan Liu, Zhengjiang Wu, Chaokun Yan, Huimin Luo.
Abstract
MOTIVATION: Studies have shown that classifying cancer subtypes can provide valuable information for a range of cancer research, from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually adopt gene expression data to perform cancer subtype classification. However, cancer samples are scarce, and the high-dimensional features of their gene expression data are too sparse to allow most methods to achieve desirable classification results.
Keywords: Cancer subtype; Classification; Deep learning
Year: 2022 PMID: 36253710 PMCID: PMC9575247 DOI: 10.1186/s12859-022-04980-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 DCGN architecture
Fig. 2 Function image of activation functions
Fig. 3 Structure of the GRU (where σ represents the sigmoid function)
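For readers without access to the figure, the block below writes out one commonly used GRU formulation. It is a reference sketch, not a transcription of the paper's Fig. 3: the symbols $W$, $U$, $b$, $x_t$, $h_t$ are assumed names, and implementations differ on whether $z_t$ or $1 - z_t$ weights the previous state.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

Here $\sigma$ is the sigmoid function named in the caption and $\odot$ denotes element-wise multiplication.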
Cancers and their specific subtypes
| Cancer category | Classification systems | Specific typology | Genes | Number of samples |
|---|---|---|---|---|
| BRCA | PAM50 | Basal, HER2+, luminal A/B, normal-like, normal | 20,000 | 4221 |
| BLCA | MDA | Luminal, basal, p53-like | 20,087 | 1010 |
| BLCA | TCGA | Luminal_infiltrated, Luminal_papillary, Luminal, Neuronal, Basal_squamous | 20,000 | 761 |
| BLCA | CIT-Curie | MC1, MC2, MC3, MC4, MC5, MC6, MC7 | 20,087 | 909 |
| BLCA | Lund | UroA-Prog, UroB, UroC, Uro-Inf, GU, GU-Inf, Mes-like, Ba/Sq, Sc/NE-like, Ba/Sq-Inf | 20,000 | 1185 |
Fig. 4 Activation functions accuracy curve (experimental results of different activation functions on other datasets are provided in the Additional file 1: Fig. S2)
Fig. 5 Ablation experiment results
BRCA 20,000-dimension dataset results
| Methods | DCNN | SVM | GBDT | LightGBM | gcForest | SAE | BiGRU | DCGN |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 90.2 | 94.7 | 94.3 | 94.6 | 95.2 | 94.2 | 95 | 96 |
| Precision | 95.1 | 94.9 | 94.5 | 94.7 | 95.4 | 95.3 | 95.4 | 98.7 |
| Recall | 94.8 | 94.8 | 94.3 | 94.5 | 95.2 | 94.8 | 94.7 | 98.7 |
| F1-score | 94.6 | 94.8 | 94.3 | 94.6 | 95.3 | 94.7 | 94.8 | 98.6 |
| Accuracy | 88.7 | 94 | 93.2 | 93.6 | 94.1 | 93.7 | 94 | 94.8 |
| Precision | 92.5 | 94 | 93.4 | 93.7 | 94.2 | 93.1 | 93.9 | 96.8 |
| Recall | 92.2 | 94.1 | 93.2 | 93.5 | 94.2 | 93.2 | 94 | 96.7 |
| F1-score | 92.3 | 94 | 93.2 | 93.6 | 94.3 | 93.2 | 94.1 | 97 |
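As a reminder of how the four reported metrics are defined for a multi-class problem, here is a minimal sketch in plain Python. It assumes macro-averaging (an unweighted mean over classes); the source does not state which averaging scheme was used, and the function name `macro_metrics` and the toy labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Assumption: macro-averaging. The source does not say which
    averaging scheme produced the table values.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []

    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Convention used here: an undefined ratio counts as 0.
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    k = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Toy labels for illustration only; these are not data from the paper.
toy_true = ["Basal", "Basal", "LumA", "LumA", "HER2", "HER2"]
toy_pred = ["Basal", "Basal", "LumA", "HER2", "HER2", "HER2"]
toy_scores = macro_metrics(toy_true, toy_pred)
```

With macro-averaging, recall can differ from accuracy when classes are imbalanced, whereas sample-weighted recall always equals accuracy.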
Fig. 6 Confusion matrix derived for several well-performing methods
Kappa coefficient and Hamming distance values of each model on the BRCA 20,000-dimension dataset
| Methods | DCNN | SVM | GBDT | LightGBM | gcForest | SAE | BiGRU | DCGN |
|---|---|---|---|---|---|---|---|---|
| Kappa | 0.937 | 0.947 | 0.937 | 0.937 | 0.936 | 0.953 | 0.937 | 0.984 |
| Hamming distance | 0.051 | 0.043 | 0.052 | 0.052 | 0.053 | 0.039 | 0.051 | 0.013 |
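The two agreement measures in this table can be computed from true and predicted labels as in the sketch below. It uses the standard definitions of Cohen's kappa and, for single-label classification, the Hamming loss (the fraction of misclassified samples); whether the paper computed its values in exactly this way is an assumption. The function names and toy labels are hypothetical.

```python
from collections import Counter


def hamming_loss(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by
    chance from the marginal label frequencies.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    p_o = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Counter returns 0 for labels that never appear in the predictions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one label occurs
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels for illustration only; these are not data from the paper.
toy_true = ["Basal", "Basal", "LumA", "LumA", "HER2", "HER2"]
toy_pred = ["Basal", "Basal", "LumA", "HER2", "HER2", "HER2"]
toy_kappa = cohen_kappa(toy_true, toy_pred)
toy_hamming = hamming_loss(toy_true, toy_pred)
```

Higher kappa and lower Hamming distance both indicate better agreement between predictions and true subtypes.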
Experimental results of BLCA datasets
| Methods | DCNN | SVM | GBDT | LightGBM | gcForest | SAE | BiGRU | DCGN |
|---|---|---|---|---|---|---|---|---|
| Dataset | BLCA-MDA | | | | | | | |
| Accuracy | 91.5 (90) | 93 (91.1) | 92.5 (91.4) | 92.5 (90) | 92.7 (89.5) | 93 (92.1) | 93.5 (92.7) | 95.5 (94.2) |
| Precision | 94.3 (92) | 93.4 (91.2) | 92.9 (90.6) | 93 (90.4) | 92.7 (90) | 93.2 (92.2) | 93.6 (92.6) | 97.4 (94.5) |
| Recall | 93 (91.8) | 93.3 (91) | 92.5 (90.4) | 93.2 (90) | 92.7 (90) | 93 (92) | 93.5 (92.4) | 97.3 (94.2) |
| F1-score | 93.3 (92) | 93.4 (91) | 92.6 (90.4) | 93.3 (90) | 92.6 (89.7) | 93 (92.1) | 93.4 (92.4) | 97.3 (94.2) |
The value before the parentheses is the highest result, and the value in parentheses is the average result over ten iterations.
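The "highest (average)" cell format can be reproduced from a list of per-run scores as in this small sketch. The helper name `format_cell` and the ten scores are hypothetical and are not the paper's per-run results.

```python
from statistics import mean


def format_cell(scores, ndigits=1):
    """Render per-run scores as 'best (mean)', rounded to ndigits decimals."""
    if not scores:
        raise ValueError("scores must not be empty")
    best = round(max(scores), ndigits)
    avg = round(mean(scores), ndigits)
    # ':g' drops a trailing '.0', matching cells such as '93 (91.1)'.
    return f"{best:g} ({avg:g})"


# Hypothetical scores from ten runs, for illustration only.
toy_runs = [90.0, 91.0, 92.0, 93.0, 94.0, 90.0, 91.0, 92.0, 93.0, 94.0]
toy_cell = format_cell(toy_runs)
```

Reporting both numbers separates the best case from typical behaviour across repeated runs.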
Hamming distance and Kappa coefficient values obtained when applying models to BLCA datasets
| Methods | Kappa (MDA) | Kappa (Lund) | Kappa (TCGA) | Kappa (CIT-Curie) | Hamming distance (MDA) | Hamming distance (Lund) | Hamming distance (TCGA) | Hamming distance (CIT-Curie) |
|---|---|---|---|---|---|---|---|---|
| DCNN | 0.896 | 0.908 | 0.956 | 0.978 | 0.067 | 0.081 | 0.024 | 0.02 |
| SVM | 0.91 | 0.93 | 0.98 | 0.984 | 0.059 | 0.06 | 0.019 | 0.014 |
| GBDT | 0.895 | 0.924 | 0.982 | 0.979 | 0.069 | 0.067 | 0.016 | 0.017 |
| LightGBM | 0.91 | 0.902 | 0.983 | 0.975 | 0.059 | 0.087 | 0.015 | 0.02 |
| gcForest | 0.881 | 0.915 | 0.983 | 0.98 | 0.079 | 0.075 | 0.015 | 0.016 |
| SAE | 0.895 | 0.898 | 0.941 | 0.98 | 0.069 | 0.096 | 0.045 | 0.016 |
| BiGRU | 0.903 | 0.929 | 0.966 | 0.973 | 0.064 | 0.063 | 0.021 | 0.022 |
| DCGN | 0.933 | 0.938 | 0.991 | 0.993 | 0.044 | 0.054 | 0.006 | 0.005 |