Dajie Zhou, Yilin Sun, Yanfei Jia, Duanrui Liu, Jing Wang, Xiaowei Chen, Yujie Zhang, Xiaoli Ma.
Abstract
Smoking is one of the most important factors associated with the development of lung cancer. However, the signaling pathways and driver genes in smoking-associated lung adenocarcinoma remain unknown. The present study analyzed 433 samples of smoking-associated lung adenocarcinoma and 75 samples of non-smoking lung adenocarcinoma from The Cancer Genome Atlas database. Gene Ontology (GO) analysis was performed using the Database for Annotation, Visualization and Integrated Discovery and the ggplot2 R/Bioconductor package. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was performed using the R packages RSQLite and org.Hs.eg.db. Multivariate Cox regression analysis was performed to screen factors associated with patient survival. Kaplan-Meier and receiver operating characteristic curves were used to analyze the potential clinical significance of the identified biomarkers as molecular prognostic markers for the five-year overall survival time. A total of 373 differentially expressed genes (DEGs; |log2-fold change|≥2.0 and P<0.01) were identified, of which 71 were downregulated and 302 were upregulated. These DEGs were associated with 28 significant GO functions and 11 significant KEGG pathways (false discovery rate <0.05). Two hundred thirty-eight proteins were associated with the 373 DEGs, and a protein-protein interaction network was constructed. Multivariate regression analysis revealed that seven mRNAs (cytochrome P450 family 17 subfamily A member 1, PKHD1 like 1, retinoid isomerohydrolase RPE65, neurotensin receptor 1, fetuin B, insulin-like growth factor binding protein 1 and glucose-6-phosphatase catalytic subunit) significantly distinguished between non-smoking and smoking-associated adenocarcinomas. Kaplan-Meier analysis demonstrated that patients in the high-risk group defined by the seven-mRNA signature had a significantly worse prognosis than those in the low-risk group. The data obtained in the current study suggest that these genes may serve as potential novel prognostic biomarkers of smoking-associated lung adenocarcinoma.
Keywords: bioinformatics analysis; differentially expressed genes; lung adenocarcinoma; prognostic value; smoking
Year: 2019 PMID: 31516576 PMCID: PMC6732981 DOI: 10.3892/ol.2019.10733
Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967
Figure 1. Analysis of differentially expressed mRNAs in smoking-associated adenocarcinomas compared with non-smoking lung adenocarcinomas. (A) Heatmap displaying the expression levels of the differentially expressed genes. (B) Volcano plot of the log2FC and -log10(FDR). Significant RNA expression differences in smoking and non-smoking lung adenocarcinoma are presented (upregulated genes in red and downregulated genes in green). FC, fold change; FDR, false discovery rate.
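The thresholds behind the volcano plot can be illustrated with a minimal sketch. It assumes the cutoffs quoted in the abstract (|log2FC| ≥ 2.0 and P < 0.01); the function names and the toy rows are hypothetical and are not taken from the study's own scripts.

```python
from math import log10

# Cutoffs quoted in the abstract: |log2FC| >= 2.0 and P < 0.01.
LOG2FC_CUTOFF = 2.0
P_CUTOFF = 0.01

def classify_gene(log2_fc, p_value):
    """Return 'up', 'down' or 'ns' (not significant) for one gene."""
    if p_value >= P_CUTOFF or abs(log2_fc) < LOG2FC_CUTOFF:
        return "ns"
    return "up" if log2_fc > 0 else "down"

def volcano_point(log2_fc, fdr):
    """Coordinates of one gene on the volcano plot: (log2FC, -log10 FDR)."""
    return (log2_fc, -log10(fdr))

# Hypothetical rows (gene, log2FC, P); the values are invented for illustration.
toy = [
    ("GENE_A", 3.1, 0.0004),
    ("GENE_B", -2.5, 0.002),
    ("GENE_C", 1.2, 0.0001),
    ("GENE_D", 4.0, 0.2),
]
labels = {gene: classify_gene(fc, p) for gene, fc, p in toy}
```

In this sketch a gene must pass both cutoffs to be coloured; a large fold change with a weak P-value (GENE_D) stays unlabelled.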
Differentially expressed genes in smoking-associated lung adenocarcinoma compared with non-smoking adenocarcinoma.
| A, Upregulated genes |
|---|
| CALB1, HIST1H4C, HIST1H1E, HIST1H1B, POU5F2, HIST1H4B, HIST2H2AB, HIST1H4E, HIST1H2BB, WFDC5, HIST1H4D, HIST1H1D, HIST1H2BI, PNMA5, HIST1H3B, HIST1H2AB, WFDC12, HIST1H2AJ, TEX19, KIR2DL1, HIST1H2BL, MSTN, HIST1H2AH, HIST1H2BE, GPR22, HIST1H3C, TAS2R30, NNAT, NTS, APOA1, GPR52, DHRS2, HIST1H2BM, HIST2H2AC, HIST1H3F, PRH2, HIST1H4A, HIST1H2BH, HIST1H3J, LRRC38, APOA2, AFP, HIST1H1A, HIST1H3A, HIST1H2AL, HIST1H3I, PRB4, HIST1H2BO, HIST2H3D, NECAB2, PRB3, CHGA, HRG, INSM1, TAC3, IFNK, MYT1, MAEL, SCG2, HIST1H4F, PRSS48, ACTN3, HIST1H4L, C10orf113, NSG2, HIST1H2BF, VTN, IRX4, SPIC, LRRTM2, TAS2R13, GAL, DPPA2, PSG11, FABP7, TKTL1, SEZ6, ZPBP2, NKX2-3, PSG1, KCNH6, ADGRB1, GABRA2, TAS2R46, TUBA3E, ADAM20, PSG8, STXBP5L, MARCH4, OR6T1, ANGPTL3, ZP2, PSG5, F2, TAGLN3, PSG3, HBE1, FXYD4, SERPINB13, TDRD12, PNMA6E, SPATA21, CDK5R2, BOLL, RPE65, SPINK4, HIST1H2AD, PTPRN, HMX2, SPRR2E, PBOV1, SLC14A2, SPRR2G, MAB21L2, CT45A1, AKR1C4, RNF113B, BHMT, PSG2, AMBP, PRSS56, HRH3, PI3, KRT14, TSPYL6, SLC1A6, CHRNB2, RBM46, TDRD15, MPC1L, XKR7, ACTL6B, NOS1, CLCA4, PSG7, FGF4, LIPF, KIR3DL2, EPHA5, KRT13, KCNJ13, C12orf40, OR4A16, FEV, GC, SBSN, DPPA5, CXorf67, LRTM2, CGA, APOC3, TSPY2, PSG6, KNG1, NEUROD4, FRG2C, NKX2-2, TAS2R50, CNGA3, KRT5, TAS2R3, CDH9, GCG, APOB, HHLA1, HEPACAM2, KLK13, VSX2, KRT31, NEUROG3, NTSR1, ADH7, CA6, SLC7A14, MSMB, KRT33A, C6orf10, FOXI1, VGLL2, SNX31, PTF1A, DKK4, LGALS14, UGT2A1, CLEC2A, TSPY3, DEFA5, KRT83, BANF2, FETUB, PRB1, TMIGD1, LCE3D, KRT77, TEX13B, CBLN1, OR51B5, CRISP1, SERPINA11, FAM83C, MYBPC1, NRSN1, RAX, SPRR2A, KPRP, H3.Y, SCG3, NPY, NLRP11, PPP1R3A, CALY, PAH, FGF3, DSPP, PSG4, MUC2, CACNG7, AMBN, SOHLH1, INS, SLC6A2, TUNAR, FAM205C, GPR50, BPIFB4, IGFBP1, G6PC, SPINT4, TAS2R43, KRT9, TMPRSS11A, ALB, CRYBA2, GMNC, HSD3B1, SLC6A19, ADAMTS19, MORC1, SLC6A5, RBP3, ADGRG7, SULT1C3, PNMA6F, PAQR9, PRLHR, UCN3, NEUROD1, HDGFL1, SPRR2D, SRARP, TLE7, FGF21, CERS3, CT45A10, LUZP4, CLCA1, TAC1, FRG2, S100A7, ZNF560, ZMAT4, SAGE1, SLC17A6, HIST1H2BA, CACNG2, UGT3A1, AMELY, NTSR2, LCN9, LIN28A, C10orf99, TFAP2B, OR13H1, GNAT3, UGT1A7, HAO1, TAAR1, LGALS13, DSG3, MAGEA11, CPLX2, OTX2, RBFOX1, CRH, STRA8, TSPY1, GLRA4, NR0B1, PCSK2, ST8SIA3, ASCL1, NLRP13, BLID, KRT76, CRYGD, AMELX, PRODH2, DMRTB1, CT47B1, SPRR2B, CALCA, AC187653.1, OR56A3 |
| B, Downregulated genes |
| ITLN1, PRG4, MYRFL, CYP17A1, STAR, HSD3B2, MYL2, TNMD, PKHD1L1, ASIC2, FAM9C, BMX, C21orf62, EBF3, GPR26, FAM9A, PDZRN4, RSPO1, CYP11B1, SLC3A1, CRB2, CYP4F8, AXDND1, SPAG11B, CYP21A2, CYP11B2, SERTM1, MYH7, RHAG, MC2R, SSX3, ANKRD1, FABP1, FBN2, EMX2, CALN1, HPR, STAC2, SORCS3, PCDH8, TUSC5, BARHL2, PRSS38, CEACAM18, OLFM4, DCX, SULT2A1, SCGB2A2, SPAG11A, AGXT2, CASR, C1orf94, BTNL3, HOXA13, VCX3B, BNC1, CRABP1, SNTG1, REG3A, DPCR1, REG3G, REG4, SPANXD, SPANXC, MUC17, ADIPOQ, UGT1A8, SLC2A2, CALML5, TRIM48, FTHL17 |
Figure 2. Significantly enriched GO terms and KEGG pathway analysis in smoking-associated lung adenocarcinoma. (A) GO analysis classified the DEGs by biological process, cellular component and molecular function. (B) Significantly enriched GO terms for the DEGs in smoking lung adenocarcinoma (functions). (C) Top 10 significant GO terms and associated hub genes. The color key represents the corresponding GO terms. (D) KEGG pathway analysis of significantly enriched genes and hub gene counts. For each term, the number of enriched genes is indicated by the bar size, while the level of significance is represented by the color. Blue indicates low significance, while red represents high significance (FDR<0.05). GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; DEGs, differentially expressed genes; FC, fold change; FDR, false discovery rate.
Significant GO enrichment analysis of differentially expressed genes in smoking-associated lung adenocarcinoma.
| Term ID | Term | Count | False discovery rate |
|---|---|---|---|
| GO:0000786 | Nucleosome | 34 | 2.66×10⁻³⁰ |
| GO:0006334 | Nucleosome assembly | 31 | 2.32×10⁻²² |
| GO:0005576 | Extracellular region | 87 | 1.87×10⁻¹⁶ |
| GO:0032200 | Telomere organization | 13 | 3.83×10⁻¹¹ |
| GO:0000788 | Nuclear nucleosome | 15 | 4.70×10⁻¹¹ |
| GO:0000183 | Chromatin silencing at rDNA | 14 | 1.23×10⁻¹⁰ |
| GO:0046982 | Protein heterodimerization activity | 37 | 3.58×10⁻¹⁰ |
| GO:0006335 | DNA replication-dependent nucleosome assembly | 13 | 4.57×10⁻¹⁰ |
| GO:0045814 | Negative regulation of gene expression, epigenetic | 14 | 9.80×10⁻⁹ |
| GO:0044267 | Cellular protein metabolic process | 19 | 1.41×10⁻⁸ |
| GO:0051290 | Protein heterotetramerization | 13 | 1.89×10⁻⁸ |
| GO:0045815 | Positive regulation of gene expression, epigenetic | 14 | 1.87×10⁻⁷ |
| GO:0000228 | Nuclear chromosome | 13 | 2.76×10⁻⁷ |
| GO:0008544 | Epidermis development | 15 | 1.07×10⁻⁶ |
| GO:0031047 | Gene silencing by RNA | 16 | 4.51×10⁻⁶ |
| GO:0000784 | Nuclear chromosome, telomeric region | 15 | 2.08×10⁻⁴ |
| GO:0006704 | Glucocorticoid biosynthetic process | 6 | 4.59×10⁻⁴ |
| GO:0042393 | Histone binding | 14 | 5.15×10⁻⁴ |
| GO:0045653 | Negative regulation of megakaryocyte differentiation | 7 | 0.001 |
| GO:0007565 | Female pregnancy | 12 | 0.001 |
| GO:0060968 | Regulation of gene silencing | 6 | 0.001 |
| GO:0005615 | Extracellular space | 52 | 0.001 |
| GO:0034774 | Secretory granule lumen | 6 | 0.002 |
| GO:0010951 | Negative regulation of endopeptidase activity | 13 | 0.005 |
| GO:0016233 | Telomere capping | 7 | 0.005 |
| GO:0006336 | DNA replication-independent nucleosome assembly | 7 | 0.012 |
| GO:0007218 | Neuropeptide signaling pathway | 11 | 0.035 |
| GO:0006705 | Mineralocorticoid biosynthetic process | 4 | 0.043 |
GO, Gene Ontology.
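
The Count and false discovery rate columns can be read through a generic over-representation test. The sketch below uses a plain hypergeometric tail probability followed by Benjamini-Hochberg adjustment; this is a stand-in for illustration and is not necessarily the exact statistic computed by the annotation tool used in the study. The function names and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n carry the term."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: 12 of 100 DEGs carry a term that annotates
# 200 genes in a background of 5000 genes.
p_term = hypergeom_sf(12, 5000, 200, 100)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Terms with an adjusted value below 0.05 would be retained under the threshold stated in the abstract.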
Significant Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis of differentially expressed genes in smoking-associated lung adenocarcinoma.
| Pathway ID | Pathway | Count | Adjusted P-value | Genes |
|---|---|---|---|---|
| hsa05322 | Systemic lupus erythematosus | 30 | 6.82×10⁻²² | HIST1H4C, HIST1H4B, HIST2H2AB, HIST1H4E, HIST1H2BB, HIST1H4D, HIST1H2BI, HIST1H3B, HIST1H2AB, HIST1H2AJ, HIST1H2BL, HIST1H2AH, HIST1H2BE, HIST1H3C, HIST1H2BM, HIST2H2AC, HIST1H3F, HIST1H4A, HIST1H2BH, HIST1H3J, HIST1H3A, HIST1H2AL, HIST1H3I, HIST1H2BO, HIST2H3D, HIST1H4F, HIST1H4L, HIST1H2BF, HIST1H2AD, HIST1H2BA |
| hsa05034 | Alcoholism | 33 | 1.65×10⁻²¹ | HIST1H4C, HIST1H4B, HIST2H2AB, HIST1H4E, HIST1H2BB, HIST1H4D, HIST1H2BI, HIST1H3B, HIST1H2AB, HIST1H2AJ, HIST1H2BL, HIST1H2AH, HIST1H2BE, HIST1H3C, HIST1H2BM, HIST2H2AC, HIST1H3F, HIST1H4A, HIST1H2BH, HIST1H3J, HIST1H3A, HIST1H2AL, HIST1H3I, HIST1H2BO, HIST2H3D, HIST1H4F, HIST1H4L, HIST1H2BF, HIST1H2AD, NPY, CALML5, HIST1H2BA, CRH |
| hsa00140 | Steroid hormone biosynthesis | 10 | 1.21×10⁻⁵ | CYP17A1, HSD3B2, CYP11B1, CYP21A2, CYP11B2, AKR1C4, UGT2A1, UGT1A8, HSD3B1, UGT1A7 |
| hsa05203 | Viral carcinogenesis | 16 | 8.13×10⁻⁵ | HIST1H4C, HIST1H4B, HIST1H4E, HIST1H2BB, HIST1H4D, HIST1H2BI, HIST1H2BL, HIST1H2BE, HIST1H2BM, HIST1H4A, HIST1H2BH, HIST1H2BO, HIST1H4F, HIST1H4L, HIST1H2BF, HIST1H2BA |
| hsa04927 | Cortisol synthesis and secretion | 8 | <0.001 | CYP17A1, STAR, HSD3B2, CYP11B1, CYP21A2, MC2R, HSD3B1, NR0B1 |
| hsa04742 | Taste transduction | 9 | <0.001 | ASIC2, TAS2R30, TAS2R13, GABRA2, TAS2R46, TAS2R50, TAS2R3, TAS2R43, GNAT3 |
| hsa04950 | Maturity onset diabetes of the young | 5 | 0.003 | NKX2-2, NEUROG3, SLC2A2, INS, NEUROD1 |
| hsa04913 | Ovarian steroidogenesis | 6 | 0.007 | CYP17A1, STAR, HSD3B2, CGA, INS, HSD3B1 |
| hsa04979 | Cholesterol metabolism | 6 | 0.007 | STAR, APOA1, APOA2, ANGPTL3, APOC3, APOB |
| hsa04925 | Aldosterone synthesis and secretion | 7 | 0.046 | STAR, HSD3B2, CYP21A2, CYP11B2, MC2R, CALML5, HSD3B1 |
| hsa03320 | Peroxisome proliferator-activated receptor signaling pathway | 6 | 0.049 | FABP1, APOA1, APOA2, FABP7, APOC3, ADIPOQ |
hsa, Homo sapiens.
Figure 3. DEG protein-protein interaction network and hub gene analysis. A total of 238 DEGs were filtered into a PPI network containing 360 nodes and 1116 edges. Upregulated proteins are shown in red, and downregulated proteins are shown in blue. DEG, differentially expressed gene; PPI, protein-protein interaction.
Top five most significant gene clusters analyzed by MCODE in the protein-protein interaction network.
| Cluster | Number of nodes | Number of edges | Genes |
|---|---|---|---|
| 1 | 30 | 420 | HIST1H4C, HIST1H3F, HIST1H4D, HIST1H4L, HIST1H4E, HIST1H3A, HIST1H4F, HIST1H3I, HIST1H2AH, HIST1H4B, HIST2H2AC, HIST2H2AB, HIST1H2BH, HIST1H2AB, HIST1H2AJ, HIST2H3D, HIST1H2BM, HIST1H4A, HIST1H2BL, HIST1H2BA, HIST1H2BF, HIST1H2BB, HIST1H2BO, HIST1H2AD, HIST1H3J, HIST1H3B, HIST1H3C, HIST1H2BE, HIST1H2AL, HIST1H2BI |
| 2 | 16 | 89 | NPY, GAL, TAS2R13, ALB, KNG1, GCG, TAS2R46, GNAT3, HRH3, TAS2R43, TAS2R3, TAC1, CASR, TAS2R30, NTS, TAS2R50 |
| 3 | 23 | 81 | HSD3B2, RHAG, HBE1, APOA2, CYP11B2, CALCA, CYP17A1, SULT2A1, AMBP, CRH, MC2R, STAR, CYP21A2, IGFBP1, NR0B1, APOB, APOA1, TAC3, AFP, CYP11B1, NTSR2, NTSR1, APOC3 |
| 4 | 5 | 10 | LGALS13, PSG2, PSG1, PSG3, PSG6 |
| 5 | 4 | 6 | SPINT4, SPAG11B, SPAG11A, CRISP1 |
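
The node and edge counts in the table above can be turned into a simple edge density (observed edges divided by the maximum possible number of edges in an undirected graph without self-loops). This is a minimal sketch for reading the table, not a reimplementation of MCODE; the `density` helper is a hypothetical name.

```python
# (cluster, nodes, edges) copied from the table above.
clusters = [(1, 30, 420), (2, 16, 89), (3, 23, 81), (4, 5, 10), (5, 4, 6)]

def density(nodes, edges):
    """Observed edges over the n*(n-1)/2 edges of a complete graph."""
    max_edges = nodes * (nodes - 1) // 2
    return edges / max_edges

densities = {c: round(density(n, e), 3) for c, n, e in clusters}
for cluster_id, value in densities.items():
    print(f"cluster {cluster_id}: density {value}")
```

Under this definition clusters 4 and 5 are fully connected (10 of 10 and 6 of 6 possible edges), and the histone cluster 1 is close to fully connected (420 of 435 possible edges).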
Figure 4. Cox proportional hazards regression model. (A) Kaplan-Meier curves for the analysis of overall survival differences in low- and high-risk patients (P<0.001). (B) ROC curves of the sensitivity and specificity of the 7 mRNAs in overall survival prediction in patients. (C) Risk heat map constructed from the 7 mRNAs that had a significant impact on survival in 414 patients. The risk value gradually increases from left to right. ROC, receiver operating characteristic; AUC, area under the curve; CI, confidence interval; CYP17A1, cytochrome P450 family 17 subfamily A member 1; FETUB, fetuin B; G6PC, glucose-6-phosphatase catalytic subunit; IGFBP1, insulin-like growth factor binding protein 1; NTSR1, neurotensin receptor 1; PKHD1L1, PKHD1 like 1; RPE65, retinoid isomerohydrolase RPE65.
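
A risk grouping of this kind is commonly built as a linear combination of expression values weighted by Cox regression coefficients. The sketch below illustrates that idea only: the record does not report the coefficients or the cutoff, so the coefficients here are placeholders and the median split is an assumption. The function names and patient identifiers are hypothetical.

```python
from statistics import median

# The seven genes named in the abstract and the Figure 4 legend.
SIGNATURE = ["CYP17A1", "PKHD1L1", "RPE65", "NTSR1", "FETUB", "IGFBP1", "G6PC"]

def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the signature genes."""
    return sum(coefficients[g] * expression[g] for g in SIGNATURE)

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Placeholder coefficients: NOT values from the study.
placeholder_coef = {g: 0.5 for g in SIGNATURE}

# Hypothetical patients with invented, constant expression levels.
patients = {
    "P1": {g: 1.0 for g in SIGNATURE},
    "P2": {g: 2.0 for g in SIGNATURE},
    "P3": {g: 3.0 for g in SIGNATURE},
    "P4": {g: 4.0 for g in SIGNATURE},
}
scores = {pid: risk_score(expr, placeholder_coef) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Sorting patients by `scores` gives the left-to-right ordering described for the risk heat map, and `groups` gives the two arms compared by the Kaplan-Meier curves.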