Hao Zhang, Zhou Jin, Ling Cheng, Bin Zhang.
Abstract
Lung cancer is a highly prevalent cancer with a poor 5-year survival rate of about 4-17%. Eighty percent of lung cancers are non-small-cell lung cancer (NSCLC). For a long time, the treatment of NSCLC has been guided mostly by tumor stage, and the therapy strategies for lung adenocarcinoma (LUAD) and squamous cell lung carcinoma (SCLC), the two major subtypes of NSCLC, have not differed significantly. In recent years, important molecular differences between LUAD and SCLC have been increasingly identified, indicating that targeted therapy will become more histologically specific in the future. To investigate the differences between LUAD and SCLC on a multi-omics scale, we analyzed methylation and gene expression data together. Using the Boruta method to remove irrelevant features and the Monte Carlo Feature Selection (MCFS) method to identify the most important features, we identified 113 key methylation features and 23 key gene expression features. HNF1B and TP63 were found to be dysfunctional at both the methylation and gene expression levels. The experimentally determined interaction network suggested that TP63 may play an important role in connecting methylation genes and expression genes. Many of the discovered signature genes are supported by the literature. Our results may provide directions for the precision diagnosis and therapy of LUAD and SCLC.
Keywords: Boruta; Monte Carlo Feature Selection; gene expression; lung adenocarcinoma; methylation; squamous cell lung carcinoma
Year: 2020 PMID: 32117905 PMCID: PMC7019569 DOI: 10.3389/fbioe.2020.00003
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
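The 136 methylation and gene expression signature features identified with the MCFS method.
| Rank | Feature | Rank | Feature | Rank | Feature | Rank | Feature |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | DSC3 | 35 | cg08796240 | 69 | cg14487292 | 103 | cg08621277 |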
| 2 | KRT5 | 36 | cg08198430 | 70 | cg03545620 | 104 | cg13387113 |
| 3 | cg02194717 | 37 | cg10969178 | 71 | DSG3 | 105 | S1PR5 |
| 4 | cg17814481 | 38 | cg07838427 | 72 | cg10991454 | 106 | cg14769121 |
| 5 | cg00415665 | 39 | cg15958289 | 73 | ANXA8L1 | 107 | cg25634000 |
| 6 | cg04432660 | 40 | cg19445207 | 74 | cg18736431 | 108 | cg07417666 |
| 7 | cg12932675 | 41 | DLX5 | 75 | cg14108894 | 109 | cg18383680 |
| 8 | cg13715502 | 42 | cg26117023 | 76 | cg17775621 | 110 | cg11640015 |
| 9 | cg08436756 | 43 | cg16148454 | 77 | cg15221831 | 111 | cg02328660 |
| 10 | cg02771299 | 44 | cg13089599 | 78 | cg26150462 | 112 | cg08379517 |
| 11 | cg06555468 | 45 | cg00180559 | 79 | cg11288202 | 113 | cg04778236 |
| 12 | cg13626676 | 46 | cg21845794 | 80 | cg27623451 | 114 | cg11416243 |
| 13 | KRT6C | 47 | cg26819757 | 81 | cg02459569 | 115 | cg18368125 |
| 14 | cg01397507 | 48 | cg03782130 | 82 | cg24228306 | 116 | cg09853371 |
| 15 | SPRR2A | 49 | cg17005319 | 83 | RORC | 117 | cg16260888 |
| 16 | cg23613253 | 50 | cg26795540 | 84 | cg07538160 | 118 | cg10842126 |
| 17 | cg24235613 | 51 | cg17957094 | 85 | cg12448539 | 119 | cg17094593 |
| 18 | cg16969274 | 52 | cg17543218 | 86 | cg08774902 | 120 | cg15335334 |
| 19 | FAT2 | 53 | cg13522118 | 87 | cg04488647 | 121 | KRT17 |
| 20 | cg02579706 | 54 | cg26431815 | 88 | cg08190615 | 122 | RFC4 |
| 21 | TMEM63A | 55 | cg06332339 | 89 | cg09470758 | 123 | cg27009392 |
| 22 | cg07568117 | 56 | cg19883066 | 90 | cg21922731 | 124 | TP63 |
| 23 | KRT6A | 57 | cg21013395 | 91 | cg20197694 | 125 | cg08327518 |
| 24 | cg25922471 | 58 | cg19526267 | 92 | ACSL5 | 126 | cg05800082 |
| 25 | cg23628350 | 59 | cg02634861 | 93 | KRT6B | 127 | cg05128003 |
| 26 | cg19032799 | 60 | cg20803931 | 94 | RAE1 | 128 | cg04926361 |
| 27 | cg04703476 | 61 | cg05351785 | 95 | cg24083274 | 129 | cg01943337 |
| 28 | cg01176141 | 62 | cg21936454 | 96 | cg23037777 | 130 | cg06520450 |
| 29 | cg12788467 | 63 | cg03361585 | 97 | cg07112556 | 131 | cg15441535 |
| 30 | cg24211826 | 64 | cg20637223 | 98 | cg26807301 | 132 | cg25521254 |
| 31 | MUC1 | 65 | ANXA8 | 99 | HNF1B | 133 | cg21176488 |
| 32 | FMO5 | 66 | cg15247247 | 100 | cg18771553 | 134 | cg05267427 |
| 33 | cg06200607 | 67 | cg06411879 | 101 | cg18720506 | 135 | cg05575304 |
| 34 | VSNL1 | 68 | cg10720966 | 102 | cg04345366 | 136 | cg20544605 |
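The list above mixes two kinds of features: methylation probes, whose Illumina identifiers start with `cg`, and gene expression features, which appear as gene symbols. A minimal sketch of how the four-column layout could be flattened and split by type is given below. The function names are hypothetical, only the first three rows are used as sample input, and treating the numbers as list positions is an assumption.

```python
import re

# Illumina CpG probe identifiers have the form "cg" followed by eight digits.
CPG_PROBE = re.compile(r"^cg\d{8}$")


def parse_rows(rows):
    """Flatten pipe-delimited 'number | feature' pairs into {number: feature}."""
    ranked = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        # Cells alternate: number, feature, number, feature, ...
        for number, feature in zip(cells[0::2], cells[1::2]):
            ranked[int(number)] = feature
    return ranked


def split_by_type(ranked):
    """Return (methylation_numbers, expression_numbers), each sorted."""
    methylation = sorted(n for n, f in ranked.items() if CPG_PROBE.match(f))
    expression = sorted(n for n, f in ranked.items() if not CPG_PROBE.match(f))
    return methylation, expression


# Sample input: the first three rows of the table above.
sample_rows = [
    "| 1 | DSC3 | 35 | cg08796240 | 69 | cg14487292 | 103 | cg08621277 |",
    "| 2 | KRT5 | 36 | cg08198430 | 70 | cg03545620 | 104 | cg13387113 |",
    "| 3 | cg02194717 | 37 | cg10969178 | 71 | DSG3 | 105 | S1PR5 |",
]

ranked = parse_rows(sample_rows)
methylation_numbers, expression_numbers = split_by_type(ranked)
print("methylation:", methylation_numbers)
print("expression:", [ranked[n] for n in expression_numbers])
```

Applied to all 34 rows, the same split should separate the 113 methylation features from the 23 gene expression features reported in the abstract.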
FIGURE 1. The heatmap of LUAD and SCLC lung cancer patients with 113 methylation features. Almost all samples were correctly clustered using the 113 methylation features and only three SCLC samples were misclassified.
FIGURE 2. The heatmap of LUAD and SCLC lung cancer patients with 23 gene expression features. Almost all samples were correctly clustered using the 23 gene expression features and only three SCLC samples were misclassified.
The confusion matrix using 136 mixed methylation and gene expression features.
| | Actual LUAD | Actual SCLC |
| --- | --- | --- |
| Predicted LUAD | 77 | 2 |
| Predicted SCLC | 0 | 20 |
| Performance Measurements | Sensitivity: 1.000, specificity: 0.909, accuracy: 0.980, MCC: 0.941 | |
The confusion matrix using 23 gene expression features.
| | Actual LUAD | Actual SCLC |
| --- | --- | --- |
| Predicted LUAD | 77 | 3 |
| Predicted SCLC | 0 | 19 |
| Performance Measurements | Sensitivity: 1.000, specificity: 0.864, accuracy: 0.970, MCC: 0.912 | |
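The performance measurements reported with the confusion matrices can be checked directly from the four cell counts. The sketch below assumes that LUAD is the positive class and that the two numeric columns hold the actual LUAD and actual SCLC counts; this reading is inferred from the reported values, not stated in the tables. The function name is hypothetical.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy and MCC for a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": round(sensitivity, 3),
        "specificity": round(specificity, 3),
        "accuracy": round(accuracy, 3),
        "mcc": round(mcc, 3),
    }


# Assumed reading: LUAD is the positive class.
# 136 mixed features: 77 LUAD correct, 2 SCLC predicted as LUAD, 20 SCLC correct.
mixed_136 = binary_metrics(tp=77, fn=0, fp=2, tn=20)
# 23 gene expression features: 3 SCLC predicted as LUAD, 19 SCLC correct.
expression_23 = binary_metrics(tp=77, fn=0, fp=3, tn=19)

print(mixed_136)
print(expression_23)
```

Under this reading the computed values agree with the reported ones to three decimals (for example, specificity 20/22 = 0.909 and 19/22 = 0.864). The matrix for the 113 methylation features has the same counts as the 136-feature matrix, so it gives the same measurements.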
FIGURE 3. The methylation genes and expression genes with experimentally determined interactions in the STRING network. The light-yellow nodes were methylation genes, and the light-blue nodes were expression genes. The overlapped methylation and expression genes were marked in red, and the overlapped methylation and CNV genes were marked in pink. TP63 played an important role in connecting methylation genes and expression genes.
The GO enrichment results of the identified signature.
| GO term | FDR | P value | Gene count |
| --- | --- | --- | --- |
| GO:0070268 cornification | 8.58E-05 | 5.39E-09 | 9 |
| GO:0009913 epidermal cell differentiation | 0.0109 | 1.42E-06 | 11 |
| GO:0031424 keratinization | 0.0109 | 2.05E-06 | 9 |
| GO:0030216 keratinocyte differentiation | 0.0109 | 2.73E-06 | 10 |
| GO:0060429 epithelium development | 0.0115 | 3.59E-06 | 20 |
| GO:0030855 epithelial cell differentiation | 0.0130 | 4.91E-06 | 15 |
| GO:0043588 skin development | 0.0172 | 7.57E-06 | 11 |
| GO:0009888 tissue development | 0.0202 | 1.01E-05 | 25 |
| GO:0008544 epidermis development | 0.0319 | 1.80E-05 | 11 |
| GO:0005737 cytoplasm | 0.0045 | 2.34E-06 | 79 |
| GO:0005829 cytosol | 0.0083 | 8.55E-06 | 46 |
The confusion matrix using 113 methylation features.
| | Actual LUAD | Actual SCLC |
| --- | --- | --- |
| Predicted LUAD | 77 | 2 |
| Predicted SCLC | 0 | 20 |
| Performance Measurements | Sensitivity: 1.000, specificity: 0.909, accuracy: 0.980, MCC: 0.941 | |