Jianlin Wang1, Xuebing Dai1, Huimin Luo1, Chaokun Yan1, Ge Zhang1, Junwei Luo2.
Abstract
The Pan-Cancer Atlas, which consists of original sequencing data from various sources, provides the opportunity to perform systematic studies of the commonalities and differences between diverse cancers. Analysis of the pan-cancer dataset could help researchers identify the key factors that could trigger cancer. In this paper, we present a novel pan-cancer classification method, referred to as MI_DenseNetCAM, to identify a set of genes that can differentiate all tumor types accurately. First, Mutual Information (MI) was used to eliminate noise and redundancy from the pan-cancer datasets. Then, the gene data were converted to 2D images. Next, the DenseNet model was adopted as a classifier and the Guided Grad-CAM algorithm was applied to identify the key genes. Extensive experimental results on public RNA-seq datasets with 33 different tumor types show that our method outperforms other state-of-the-art classification methods. Moreover, gene analysis further demonstrated that the genes selected by our method were related to the corresponding tumor types.
Keywords: DenseNet; RNA-seq data; cancer classification; guided grad-CAM algorithm; pan-cancer
Year: 2021 PMID: 34149811 PMCID: PMC8209511 DOI: 10.3389/fgene.2021.670232
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
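The first two steps named in the abstract (MI-based gene filtering, then conversion of the remaining genes to a 2D image) can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width binning into three bins, the row-by-row layout in MI rank order, the function names, and the toy data are all assumptions made for illustration. With `image_dimension = 60`, as in the parameter settings, the same logic would keep 60 × 60 = 3600 genes.

```python
import math
from collections import Counter


def mutual_information(feature_bins, labels):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    px = Counter(feature_bins)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def discretize(values, n_bins=3):
    """Equal-width binning of one gene's expression values (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def select_and_reshape(samples, labels, image_dimension):
    """Keep the image_dimension**2 genes with the highest MI, then lay each
    sample's selected values out row by row as a square grid."""
    n_genes = len(samples[0])
    k = image_dimension * image_dimension
    scores = []
    for g in range(n_genes):
        column = [s[g] for s in samples]
        scores.append(mutual_information(discretize(column), labels))
    top = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]
    images = []
    for s in samples:
        flat = [s[g] for g in top]
        images.append([flat[r * image_dimension:(r + 1) * image_dimension]
                       for r in range(image_dimension)])
    return top, images


# Hypothetical toy data: 6 samples, 5 genes, 2 tumor classes.
labels = ["A", "A", "A", "B", "B", "B"]
samples = [
    [0.1, 5.0, 1.0, 3.0, 9.0],
    [0.2, 5.0, 2.0, 1.0, 9.1],
    [0.0, 5.0, 3.0, 2.0, 9.2],
    [9.0, 5.0, 1.0, 2.5, 0.1],
    [9.5, 5.0, 2.0, 1.5, 0.2],
    [9.9, 5.0, 3.0, 3.0, 0.0],
]
top, images = select_and_reshape(samples, labels, image_dimension=2)
```

In the toy data, genes 0 and 4 separate the two classes perfectly, so they rank first and occupy the first row of every 2 × 2 image.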
Figure 1. The workflow of MI_DenseNetCAM. (A) Cancer classification and prediction through MI and deep learning combined analysis from pan-cancer datasets. (B) Principle diagram of the Guided Grad-CAM algorithm.
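The combination step of Guided Grad-CAM can be sketched on precomputed inputs. Grad-CAM weights each feature map by the spatial average of the class-score gradient, sums the weighted maps, and applies a ReLU; Guided Grad-CAM then multiplies the upsampled map element-wise with a guided-backpropagation map. The sketch below is a toy under stated assumptions: the activations, gradients, and guided-backpropagation values are hypothetical numbers rather than network outputs, and nearest-neighbor upsampling stands in for the interpolation a real implementation would normally use. Because each pixel of the input image holds one gene, large values in the result would point to influential genes.

```python
def grad_cam(activations, gradients):
    """activations, gradients: K feature maps, each an H x W nested list."""
    height, width = len(activations[0]), len(activations[0][0])
    # alpha_k: spatial average of the class-score gradient for feature map k
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only locations that increase the class score
    return [[max(v, 0.0) for v in row] for row in cam]


def upsample_nearest(cam, factor):
    """Repeat every cell factor times along both axes (an assumption)."""
    out = []
    for row in cam:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def guided_grad_cam(guided_backprop, cam, factor):
    """Element-wise product of the guided-backprop map and the upsampled CAM."""
    up = upsample_nearest(cam, factor)
    return [[g * c for g, c in zip(g_row, c_row)]
            for g_row, c_row in zip(guided_backprop, up)]


# Hypothetical values: two 2 x 2 feature maps and their gradients.
activations = [[[1.0, 0.0], [0.0, 2.0]],
               [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
# Hypothetical guided-backpropagation map at input resolution (4 x 4).
guided = [[1.0, 2.0, 3.0, 4.0],
          [0.0, 1.0, 0.0, 1.0],
          [5.0, 5.0, 2.0, 0.0],
          [1.0, 1.0, 1.0, 3.0]]

cam = grad_cam(activations, gradients)
saliency = guided_grad_cam(guided, cam, factor=2)
```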
Figure 2. The structure of the DenseNet.
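How `growth_rate` and `compression_factor` shape a DenseNet can be shown with simple channel arithmetic. In a dense block every layer adds `growth_rate` feature maps to the concatenated input, and each transition layer between blocks scales the channel count by `compression_factor`. The sketch below uses the values from the parameter settings (16 and 0.5); the number of layers per block and the initial channel count are assumptions, since this record does not state them.

```python
def densenet_channels(growth_rate, compression_factor, block_layers, initial_channels):
    """Trace the channel count through dense blocks and transition layers."""
    channels = initial_channels
    trace = []
    for index, n_layers in enumerate(block_layers):
        # Each layer concatenates growth_rate new feature maps onto its input.
        channels += n_layers * growth_rate
        trace.append(("dense_block", channels))
        # A transition layer follows every block except the last one.
        if index < len(block_layers) - 1:
            channels = int(channels * compression_factor)
            trace.append(("transition", channels))
    return trace


# growth_rate and compression_factor come from the parameter settings;
# block_layers and initial_channels are hypothetical.
trace = densenet_channels(
    growth_rate=16,
    compression_factor=0.5,
    block_layers=(6, 12, 24, 16),
    initial_channels=32,
)
```

With these assumed block sizes, the first block grows from 32 to 128 channels and its transition halves that to 64.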
Parameter settings.
| Method | Parameters |
| --- | --- |
| MI_DenseNetCAM | learning_rate = 0.0001, num_epochs = 200, batch_size = 32, growth_rate = 16, compression_factor = 0.5, image_dimension = 60 |
| MI_KNN | n_neighbors = 5 |
| Var_CNN | learning_rate = 0.0001, num_epochs = 200, batch_size = 500 |
| rL-GenSVM | phi = 1/3, |
| ET-SVM | C = 0.004, kernel = "linear", decision_function_shape = "ovo", gamma = 1 |
The experimental results of five methods.
| Method | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| MI_DenseNetCAM | 96.81% | 96.89% | 96.81% | 96.85% |
| MI_KNN | 92.61% | 92.46% | 92.61% | 92.40% |
| Var_CNN | 95.59% | 95.54% | 95.59% | 95.43% |
| rL-GenSVM | 87.29% | 87.73% | 87.29% | 86.91% |
| ET-SVM | 90.73% | 90.22% | 90.73% | 89.99% |
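The four values per method read as accuracy, precision, recall, and F1-score. In every row the first and third values are identical (for example 96.81% for MI_DenseNetCAM), which is consistent with averaging the per-class scores weighted by class size: support-weighted recall always equals overall accuracy. The averaging scheme is not stated in this record, so the sketch below is an assumption, and the labels and predictions in it are hypothetical.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1-score."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        class_precision = tp / predicted if predicted else 0.0
        class_recall = tp / count
        denom = class_precision + class_recall
        class_f1 = 2 * class_precision * class_recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        weight = count / n
        precision += weight * class_precision
        recall += weight * class_recall
        f1 += weight * class_f1
    return accuracy, precision, recall, f1


# Hypothetical predictions for eight samples from three tumor types.
y_true = ["BRCA"] * 4 + ["COAD"] * 3 + ["READ"]
y_pred = ["BRCA"] * 4 + ["COAD"] * 3 + ["COAD"]
accuracy, precision, recall, f1 = weighted_scores(y_true, y_pred)
```

Here accuracy and weighted recall are both 0.875, while weighted precision is lower because the class that is never predicted contributes a precision of zero.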
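Benchmark datasets.
| Cancer type | Abbreviation | Samples | MI_DenseNetCAM | MI_KNN | Var_CNN | rL-GenSVM | ET-SVM |
| --- | --- | --- | --- | --- | --- | --- | --- |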
| Adrenocortical carcinoma | ACC | 79 | 1 | 0.95 | 0.95 | 0.63 | 0.92 |
| Bladder urothelial carcinoma | BLCA | 408 | 0.98 | 0.87 | 0.97 | 0.53 | 0.78 |
| Breast invasive carcinoma | BRCA | 1093 | 0.99 | 0.99 | 0.99 | 0.92 | 0.99 |
| Cervical and endocervical cancers | CESC | 304 | 0.95 | 0.88 | 0.93 | 0.65 | 0.86 |
| Cholangiocarcinoma | CHOL | 36 | 0.75 | 0.58 | 0.56 | 0.40 | 0 |
| Colon adenocarcinoma | COAD | 457 | 0.95 | 0.99 | 0.95 | 0.82 | 0.98 |
| Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | DLBC | 48 | 1 | 1 | 1 | 1 | 1 |
| Esophageal carcinoma | ESCA | 184 | 0.85 | 0.69 | 0.77 | 0.50 | 0.45 |
| Glioblastoma multiforme | GBM | 160 | 0.95 | 0.92 | 0.94 | 0.83 | 0.81 |
| Head and Neck squamous cell carcinoma | HNSC | 520 | 0.99 | 0.95 | 0.98 | 0.96 | 0.94 |
| Kidney Chromophobe | KICH | 66 | 0.89 | 0.75 | 0.87 | 0.80 | 0.64 |
| Kidney renal clear cell carcinoma | KIRC | 533 | 0.94 | 0.93 | 0.95 | 0.89 | 0.95 |
| Kidney renal papillary cell carcinoma | KIRP | 290 | 0.94 | 0.86 | 0.93 | 0.82 | 0.83 |
| Acute Myeloid Leukemia | LAML | 179 | 1 | 1 | 1 | 1 | 1 |
| Brain Lower Grade Glioma | LGG | 516 | 1 | 0.95 | 0.98 | 0.96 | 0.98 |
| Liver hepatocellular carcinoma | LIHC | 371 | 0.97 | 0.96 | 0.97 | 0.91 | 0.96 |
| Lung adenocarcinoma | LUAD | 515 | 0.95 | 0.91 | 0.95 | 0.91 | 0.96 |
| Lung squamous cell carcinoma | LUSC | 501 | 0.93 | 0.85 | 0.91 | 0.84 | 0.82 |
| Mesothelioma | MESO | 87 | 0.99 | 0.95 | 0.94 | 0.89 | 0.62 |
| Ovarian serous cystadenocarcinoma | OV | 304 | 1 | 0.98 | 0.99 | 1 | 1 |
| Pancreatic adenocarcinoma | PAAD | 178 | 1 | 0.97 | 0.97 | 0.95 | 0.64 |
| Pheochromocytoma and Paraganglioma | PCPG | 179 | 1 | 0.99 | 1 | 0.95 | 0.96 |
| Prostate adenocarcinoma | PRAD | 497 | 0.99 | 1 | 1 | 0.96 | 0.99 |
| Rectum adenocarcinoma | READ | 166 | 0 | 0 | 0.35 | 0 | 0 |
| Sarcoma | SARC | 259 | 0.98 | 0.95 | 0.97 | 0.74 | 0.98 |
| Skin Cutaneous Melanoma | SKCM | 469 | 0.98 | 0.97 | 0.98 | 1 | 0.96 |
| Stomach adenocarcinoma | STAD | 415 | 0.96 | 0.90 | 0.96 | 0.93 | 0.98 |
| Testicular Germ Cell Tumors | TGCT | 150 | 1 | 0.99 | 0.99 | 1 | 0.83 |
| Thyroid carcinoma | THCA | 501 | 1 | 1 | 1 | 1 | 0.99 |
| Thymoma | THYM | 120 | 1 | 0.98 | 0.99 | 1 | 0.91 |
| Uterine Corpus Endometrial Carcinoma | UCEC | 545 | 0.95 | 0.92 | 0.96 | 0.95 | 0.78 |
| Uterine Carcinosarcoma | UCS | 57 | 0.83 | 0.72 | 0.81 | 0.83 | 0 |
| Uveal Melanoma | UVM | 80 | 1 | 1 | 0.99 | 1 | 1 |
The performance evaluation results of different preprocessing strategies.
| Method | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| Var_DenseNet | 94.46% | 94.62% | 94.46% | 94.37% |
| Chi2_DenseNet | 95.42% | 95.54% | 95.42% | 95.40% |
| FTest_DenseNet | 95.03% | 95.20% | 95.03% | 95.01% |
| MI_DenseNetCAM | 96.81% | 96.89% | 96.81% | 96.85% |
The performance evaluation results of four different classifiers.
| Method | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| MI_KNN | 92.61% | 92.46% | 92.61% | 92.40% |
| MI_CNN | 94.30% | 94.37% | 94.30% | 94.28% |
| MI_SVM | 91.53% | 91.67% | 91.53% | 90.97% |
| MI_DenseNetCAM | 96.81% | 96.89% | 96.81% | 96.85% |
The performance evaluation results of different image dimensions.
| Image dimension | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| 30 * 30 | 93.60% | 93.54% | 93.60% | 93.46% |
| 50 * 50 | 95.03% | 94.82% | 95.03% | 94.85% |
| 60 * 60 | 96.81% | 96.89% | 96.81% | 96.85% |
| 70 * 70 | 95.22% | 95.41% | 95.22% | 95.23% |
| 90 * 90 | 94.17% | 94.18% | 94.17% | 94.07% |
| 110 * 110 | 92.93% | 93.18% | 92.93% | 92.81% |
| 130 * 130 | 93.41% | 93.88% | 93.41% | 93.34% |
Figure 3. The classification accuracy for different numbers of genes.
Selected genes.
| Number of genes | Genes |
| --- | --- |
| 40 | GSTA1, C4A, COL3A1, PABPC1, COL1A1, KRT13, S100A6, SERPINA1, FGA, MUC2, COL1A2, APOE, KRT5, MALAT1, GFAP, TUBA1A, KRT14, KLK1, ATP1A1, RGS5, SPP1, CLU, S100A9, TF, APOC1, MUC1, ADAM6, SFTPA2, BCAM, TTR, CHGA, SCG2, FASN, PDLIM5, LGALS4, CA2, MYH11, SILV, PGC, TG |
The KEGG pathway analysis.
| Pathway ID | Pathway | P-value | Genes |
| --- | --- | --- | --- |
| hsa04610 | Complement and coagulation cascades | 9.50E-09 | C3,CLU,C4A,FGA,SERPINA1 |
| hsa05133 | Pertussis | 6.40E-07 | C3,CALML3,SFTPA2,C4A |
| hsa04974 | Protein digestion and absorption | 1.22E-06 | COL3A1,COL1A2,ATP1A1,COL1A1 |
| hsa05146 | Amoebiasis | 1.50E-06 | COL3A1,MUC2,COL1A2,COL1A1 |
| hsa04611 | Platelet activation | 4.19E-06 | COL3A1,COL1A2,FGA,COL1A1 |
| hsa04918 | Thyroid hormone synthesis | 3.80E-05 | TG,ATP1A1,TTR |
| hsa04971 | Gastric acid secretion | 3.95E-05 | CALML3,CA2,ATP1A1 |
| hsa04933 | AGE-RAGE signaling pathway in diabetic complications | 9.05E-05 | COL3A1,COL1A2,COL1A1 |
| hsa04926 | Relaxin signaling pathway | 1.93E-04 | COL3A1,COL1A2,COL1A1 |
| hsa04964 | Proximal tubule bicarbonate reclamation | 2.02E-04 | CA2,ATP1A1 |
| hsa04915 | Estrogen signaling pathway | 2.29E-04 | CALML3,KRT14,KRT13 |
| hsa04145 | Phagosome | 3.02E-04 | C3,TUBA1A,SFTPA2 |
| hsa04979 | Cholesterol metabolism | 8.78E-04 | APOE,APOC1 |
| hsa04961 | Endocrine and other factor-regulated calcium reabsorption | 8.78E-04 | KLK1,ATP1A1 |
| hsa04978 | Mineral absorption | 9.82E-04 | TF,ATP1A1 |
| hsa05150 | Staphylococcus aureus infection | 1.58E-03 | C3,C4A |
| hsa04976 | Bile secretion | 1.77E-03 | CA2,ATP1A1 |
| hsa04512 | ECM-receptor interaction | 2.49E-03 | COL1A2,COL1A1 |
| hsa04970 | Salivary secretion | 2.71E-03 | CALML3,ATP1A1 |
| hsa04972 | Pancreatic secretion | 3.20E-03 | CA2,ATP1A1 |
| hsa04925 | Aldosterone synthesis and secretion | 3.20E-03 | CALML3,ATP1A1 |
| hsa04916 | Melanogenesis | 3.39E-03 | CALML3,TYRP1 |
| hsa04270 | Vascular smooth muscle contraction | 5.65E-03 | CALML3,MYH11 |
| hsa05322 | Systemic lupus erythematosus | 5.73E-03 | C3,C4A |
| hsa04910 | Insulin signaling pathway | 6.07E-03 | CALML3,FASN |
| hsa05418 | Fluid shear stress and atherosclerosis | 6.24E-03 | CALML3,GSTA1 |
| hsa01100 | Metabolic pathways | 6.77E-03 | TYRP1,BCAM,FASN,GSTA1,CA2 |
| hsa04261 | Adrenergic signaling in cardiomyocytes | 7.12E-03 | CALML3,ATP1A1 |
| hsa04022 | cGMP-PKG signaling pathway | 8.84E-03 | CALML3,ATP1A1 |
| hsa04530 | Tight junction | 9.14E-03 | MYH11,TUBA1A |
| hsa05010 | Alzheimer disease | 9.24E-03 | CALML3,APOE |
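Enrichment p-values of this kind are commonly obtained from a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of pathway genes in a selected set drawn at random from a background. The tool and background behind the table above are not stated in this record, so the sketch below is only an illustration: the background size and pathway size are hypothetical, and the result will not reproduce the tabulated values. Only the selected-set size (40) and the overlap (5, as in the first row) come from the tables.

```python
from math import comb


def enrichment_p_value(background, pathway_size, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(background, pathway_size, selected)."""
    upper = min(pathway_size, selected)
    total = comb(background, selected)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# background and pathway_size are hypothetical placeholders.
p_value = enrichment_p_value(background=20000, pathway_size=85, selected=40, overlap=5)
```

A small p-value means that an overlap this large would rarely arise by chance for a pathway of that size.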
Figure 4. The heat map of the top 40 genes across all tumor samples.