Joshua R Freeman, Su Chu, Thomas Hsu, Yen-Tsung Huang.
Abstract
Tobacco smoke is a well-established lung carcinogen. We hypothesize that epigenetic processes underlie carcinogenesis. The objective of this study is to examine the effects of smoke exposure on DNA methylation in order to search for novel susceptibility loci. We obtained epigenome-wide DNA methylation data from lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) tissues in The Cancer Genome Atlas (TCGA). We performed a two-stage discovery (n = 326) and validation (n = 185) analysis to investigate the association of DNA methylation level with cigarette smoking pack-years. We also externally validated our findings in an independent dataset. Linear models with least-squares estimation and spline regression were used to examine the association between DNA methylation and smoking. We identified five CpG sites highly associated with pack-years of cigarette smoking. Smoking was negatively associated with methylation levels at cg25771041 (WWTR1, p = 3.6 × 10⁻⁹), cg16200496 (NFIX, p = 3.4 × 10⁻¹²), cg22515201 (PLA2G6, p = 1.0 × 10⁻⁹) and cg24823993 (NHP2L1, p = 5.1 × 10⁻⁸), and positively associated with the methylation level at cg11875268 (SMUG1, p = 4.3 × 10⁻⁸). The CpG-smoking association was stronger in LUSC than in LUAD. Of the five loci, smoking explained the most variation in cg16200496 (R² = 0.098 [both types] and 0.144 [LUSC]). These five novel CpG candidates demonstrate differential methylation patterns associated with smoke exposure in lung neoplasms.
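The per-site association test described in the abstract can be sketched as a simple linear regression of one CpG site's methylation M-value on log pack-years. This is a minimal, unadjusted illustration: the function name `ols_slope_r2`, the use of the natural logarithm, and all data values are assumptions made for the example, and the study's actual models may have included covariates that are not shown here.

```python
import math

def ols_slope_r2(x, y):
    """Least-squares fit of y = a + b*x; returns (slope b, intercept a, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R^2 equals the squared Pearson correlation.
    r2 = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r2

# Hypothetical data: pack-years and M-values at one CpG site for six tumors.
pack_years = [10, 20, 30, 40, 60, 80]
m_values = [0.9, 0.6, 0.1, -0.1, -0.4, -0.8]

# Log-transform the exposure (natural log is an assumption of this sketch).
log_pack_years = [math.log(p) for p in pack_years]

slope, intercept, r2 = ols_slope_r2(log_pack_years, m_values)
print(f"slope per unit log pack-years = {slope:.3f}, R^2 = {r2:.3f}")
```

A negative slope in this setup corresponds to lower methylation with heavier smoking, which is the direction reported for four of the five sites; R² is the share of variation in the M-value explained by smoking.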
Keywords: DNA methylation; epigenetics; non-small cell lung cancer; smoking
Year: 2016 PMID: 27602958 PMCID: PMC5342499 DOI: 10.18632/oncotarget.11831
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Demographic characteristics of the sample by two-stage analysis and smoking status
| Covariates | Discovery Set: Light | Discovery Set: Heavy | Discovery Set: p | Validation Set: Light | Validation Set: Heavy | Validation Set: p | Discovery (all) | Validation (all) | Discovery vs. Validation: p |
|---|---|---|---|---|---|---|---|---|---|
| Age | 66.36 (58, 74) | 70 (61, 73) | 0.04 | 66 (59, 72) | 67 (62, 72.25) | 0.048 | 67 (60, 73) | 66.36 (61, 72) | 0.81 |
| % of Male | 91 (53%) | 103 (67%) | 0.007 | 54 (55.6%) | 61 (69.3%) | 0.055 | 194 (59.5%) | 115 (62.2%) | 0.56 |
| Race | | | 0.37 | | | 0.68 | | | 0.81 |
| Black | 10 (5.7%) | 4 (2.6%) | | 6 (6.2%) | 3 (3.4%) | | 14 (4.3%) | 9 (4.9%) | |
| White | 136 (78.6%) | 124 (81%) | | 74 (76%) | 69 (78.4%) | | 260 (79.8%) | 143 (77.3%) | |
| Other | 27 (15.6%) | 25 (16.3%) | | 17 (17.5%) | 16 (18.2%) | | 52 (15.95%) | 33 (17.8%) | |
| KRAS mutation | 5 (2.9%) | 2 (1.3%) | 0.55 | 4 (4.12%) | 1 (1.13%) | 0.43 | 7 (2.1%) | 5 (2.7%) | 0.92 |
| EGFR mutation | 5 (2.9%) | 3 (2%) | 0.86 | 1 (1.03%) | 2 (2.27%) | 0.94 | 8 (2.5%) | 3 (1.6%) | 0.76 |
| PackYears | 3.367 (3.045, 3.58) | 4.11 (3.93, 4.39) | 2.20E-16 | 3.26 (3.05, 3.58) | 4.08 (3.93, 4.39) | 2.20E-16 | 3.71 (3.31, 4.11) | 3.714 (3.26, 4.04) | 0.71 |
| % of ACA | 105 (60.7%) | 59 (38.6%) | 1.00E-04 | 63 (65%) | 41 (46.6%) | 0.018 | 164 (50.3%) | 104 (56%) | 0.23 |
| Smoking History | 40 (23.1%) | 61 (39.9%) | N/A | 23 (23.7%) | 36 (41%) | 0.02 | 101 (31%) | 59 (32.2%) | 0.16 |
Median (1st, 3rd quartiles).
Light and Heavy were determined by a median cutoff for smoking pack-years.
P-values were calculated using Chi-square tests. All other p-values were calculated using Student's t-test.
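As an illustration of the Chi-square comparison mentioned in the table notes, the sketch below tests sex against smoking group in the discovery set. The male counts (91 and 103) come from the table; the group sizes (173 light, 153 heavy) are inferred from the reported percentages and are an assumption, as is the choice of a Pearson test without continuity correction. The helper name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Discovery set: males vs. non-males among light and heavy smokers.
# Group sizes are inferred from the percentages, not stated in the table.
light_total, heavy_total = 173, 153
light_male, heavy_male = 91, 103

stat, p = chi_square_2x2(
    light_male, light_total - light_male,
    heavy_male, heavy_total - heavy_male,
)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Under these assumptions the p-value comes out at roughly 0.007, in line with the value reported for sex in the discovery set.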
Figure 1. Manhattan plot of p-values for internally validated (No. of loci = 98) and externally validated (No. of loci = 5) CpG sites by chromosome
The Bonferroni genome-wide significance threshold (−log10(p) = 6.73) is shown as a horizontal solid line. Red dots are internally validated sites; green dots are both internally and externally validated sites.
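The value 6.73 is a Bonferroni cutoff expressed on the −log10(p) scale used by the Manhattan plot. The sketch below shows the arithmetic in both directions. The number of tested CpG sites is not stated in this section, so the code only back-calculates what a line at 6.73 would imply; the family-wise alpha of 0.05 and both function names are assumptions made for illustration.

```python
import math

def bonferroni_line(n_tests, alpha=0.05):
    """Height of the Bonferroni threshold on the -log10(p) axis."""
    return -math.log10(alpha / n_tests)

def implied_tests(line, alpha=0.05):
    """Number of tests implied by a Bonferroni line at the given -log10(p) height."""
    return alpha * 10 ** line

# Back-calculate the approximate number of tests behind a line at 6.73,
# assuming alpha = 0.05 (not confirmed by the source).
n_implied = implied_tests(6.73)
print(f"implied number of tests: about {n_implied:,.0f}")

# Round trip: feeding that count back in recovers the plotted threshold.
print(f"threshold: {bonferroni_line(n_implied):.2f}")
```

Any site whose −log10(p) exceeds this line remains significant after correcting for all tests at the assumed alpha.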
Results of the 5 externally validated CpG sites
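| CpG ID | Chr | Symbol | Location | Discovery: Beta | Discovery: p | Discovery: FDR | Validation: Beta | Validation: p | Pooled: Beta | Pooled: p | Without documented KRAS or EGFR Mutations: Beta | Without documented KRAS or EGFR Mutations: p | With Adjustment for Cancer Cell Stage: Beta | With Adjustment for Cancer Cell Stage: p | External Validation (binary): Beta | External Validation (binary): p | External Validation (ordinal): Beta | External Validation (ordinal): p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|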
| cg25771041 | 3 | WWTR1 | 149376042 | –0.353 | 1.48E–05 | 0.023 | –0.463 | 8.28E–05 | –0.387 | 3.63E–09 | –0.393 | 6.71E–09 | –0.383 | 5.42E–09 | –1.13 | 0.046 | –0.292 | 0.357 |
| cg11875268 | 12 | SMUG1 | 54576025 | 0.849 | 2.92E–05 | 0.035 | 0.978 | 3.31E–04 | 0.875 | 4.28E–08 | 0.866 | 1.12E–07 | 0.876 | 4.54E–08 | 2.23 | 0.0017 | 0.67 | 0.097 |
| cg16200496 | 19 | NFIX | 13107141 | –0.634 | 6.71E–07 | 0.003 | –0.878 | 2.27E–06 | –0.721 | 3.40E–12 | –0.742 | 4.40E–12 | –0.721 | 3.94E–12 | –2.23 | 0.0344 | –0.324 | 0.585 |
| cg22515201 | 22 | PLA2G6 | 38577827 | –0.929 | 1.64E–06 | 0.006 | –0.649 | 9.82E–04 | –0.859 | 1.04E–09 | –0.851 | 2.09E–09 | –0.851 | 6.64E–10 | –2.33 | 0.0947 | –1.01 | 0.193 |
| cg24823993 | 22 | NHP2L1 | 42085003 | –0.406 | 3.42E–05 | 0.038 | –0.244 | 6.52E–06 | –0.351 | 5.13E–08 | –0.354 | 1.02E–07 | –0.356 | 3.58E–08 | –3.06 | 0.0513 | –1.29 | 0.143 |
cg25771041 is in a CpG island; cg11875268 is not in a CpG island; cg16200496, cg22515201 and cg24823993 are all in CpG islands and promoter-associated.
In the TCGA analyses, Beta is the difference in methylation M-value per one-unit increase in log-transformed smoking pack-years.
In the external validation with binary smoking status, Beta is the difference in methylation M-value comparing ever smokers with never smokers.
In the external validation with ordinal smoking status, Beta is the difference in methylation M-value between current and former smokers, as well as between former and never smokers.
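The Beta coefficients above are differences on the M-value scale, which is the logit transform of the methylation beta value (the proportion methylated, between 0 and 1). The sketch below uses the conventional base-2 definition, M = log2(beta / (1 − beta)); whether the authors used exactly this base is an assumption, and the function names are hypothetical.

```python
import math

def beta_to_m(beta):
    """Convert a methylation beta value (0 < beta < 1) to an M-value."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Convert an M-value back to a methylation beta value."""
    return 2.0 ** m / (2.0 ** m + 1.0)

# A half-methylated site sits at M = 0; the scale is symmetric around it.
for beta in (0.2, 0.5, 0.8):
    m = beta_to_m(beta)
    print(f"beta = {beta:.1f} -> M = {m:+.2f} -> beta = {m_to_beta(m):.1f}")
```

Because the transform is nonlinear, a fixed difference in M-value corresponds to a larger change in proportion methylated near beta = 0.5 than near 0 or 1, which is worth keeping in mind when reading the coefficients.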
Figure 2. Methylation signal profiles for genes with externally validated methylation sites
The analyses are plotted by genomic dataset. TCGA represents the TCGA dataset, and External-Binary (Ext-Bin) and External-Ordinal (Ext-Ord) represent the external validation analyses conducted in the GSE56044 dataset using binary and ordinal categorizations of smoking status. Signal strength is plotted as transformed p-values (−log10(p)) by genomic location (Mb) for each gene.
Figure 3. Dose-response relationships for externally validated CpG sites by M-values (logit-transformed beta values) and smoking in pack-years
The plots are based on generalized additive models with penalized splines using a thin-plate smoothing basis. Degrees of freedom (DoF) are provided for each plot. The solid black line represents the linear spline model of the change in M-value by pack-years of smoking. The red dotted lines represent the upper and lower 95% confidence bounds. (A) cg11875268 in SMUG1; (B) cg16200496 in NFIX; (C) cg22515201 in PLA2G6; (D) cg24823993 in NHP2L1; (E) cg25771041 in WWTR1.