Xu Gao, Yan Zhang, Lutz Philipp Breitling, Hermann Brenner.
Abstract
Lung cancer is a leading cause of cancer-related mortality worldwide, and cigarette smoking is the major environmental hazard for its development. This study examined whether smoking alters methylation of genes at lung cancer risk loci identified by genome-wide association studies (GWASs). Through a systematic literature review, we selected 75 genomic candidate regions based on 120 single-nucleotide polymorphisms (SNPs). DNA methylation levels of 2854 corresponding cytosine-phosphate-guanine (CpG) candidates in whole blood samples were measured with the Illumina Infinium HumanMethylation450 BeadChip array in two independent subsamples of the ESTHER study. After correction for multiple testing, we confirmed associations with smoking for one previously identified CpG site within the KLF6 gene and identified 12 novel sites located in 7 genes: STK32A, TERT, MSH5, ACTA2, GATA3, VTI1A and CHRNA5 (FDR <0.05). Current smoking was linked to a 0.74% to 2.4% decrease of DNA methylation compared to never smoking in 11 loci, and all but one of the 13 validated sites showed significant associations (FDR <0.05) with lifetime cumulative smoking (pack-years). In conclusion, our study demonstrates the impact of tobacco smoking on DNA methylation of lung cancer related genes, which suggests that lung cancer susceptibility genes might be regulated by methylation changes in response to smoking. Nevertheless, this mechanism warrants further exploration in future epigenetic and biomarker studies.
Keywords: DNA methylation; lung cancer; tobacco smoking; whole blood sample
Year: 2016 PMID: 27323854 PMCID: PMC5312292 DOI: 10.18632/oncotarget.10007
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Characteristics of study population in discovery and validation panels
| Characteristics | Discovery Panel | Validation Panel | p-value |
|---|---|---|---|
| N | 978 | 531 | |
| Age (years) | 62.1 (6.5) | 62.0 (6.6) | 0.817 |
| Sex | | | <0.001 |
| Male | 495 (50.6%) | 207 (39.0%) | |
| Female | 483 (49.4%) | 324 (61.0%) | |
| Smoking status | | | 0.877 |
| Current smoker | 181 (18.5%) | 98 (18.4%) | |
| Former smoker | 328 (33.5%) | 182 (34.3%) | |
| Never smoker | 469 (48.0%) | 251 (47.3%) | |
| Body mass index (kg/m²) | | | 0.246 |
| Underweight (<18.5) | 8 (0.8%) | 1 (0.2%) | |
| Normal (18.5-<25.0) | 237 (24.3%) | 161 (30.3%) | |
| Overweight (25.0-<30.0) | 472 (48.4%) | 228 (42.9%) | |
| Obese (≥30.0) | 258 (26.5%) | 141 (26.6%) | |
| Alcohol consumption | | | 0.511 |
| Abstainer | 311 (34.1%) | 169 (34.4%) | |
| Low | 531 (58.2%) | 290 (59.1%) | |
| Intermediate | 53 (5.8%) | 27 (5.5%) | |
| High | 17 (1.9%) | 5 (1.0%) | |
| Physical activity | | | 0.061 |
| Inactive | 189 (19.3%) | 109 (20.5%) | |
| Low | 433 (44.3%) | 261 (49.2%) | |
| Medium or high | 356 (36.4%) | 161 (30.3%) | |
| Diabetes | | | 0.647 |
| Not prevalent | 819 (84.4%) | 436 (83.5%) | |
| Prevalent | 151 (15.6%) | 86 (16.5%) | |
| CVD | | | 0.627 |
| Not prevalent | 796 (81.5%) | 438 (82.5%) | |
| Prevalent | 181 (18.5%) | 93 (17.5%) | |
| Cancer | | | 0.748 |
| Not prevalent | 892 (93.4%) | 487 (93.8%) | |
| Prevalent | 63 (6.6%) | 32 (6.2%) | |
| Leukocyte distribution | | | |
| CD8+ T-cells | 0.081 (0.039) | 0.098 (0.041) | <0.001 |
| CD4+ T-cells | 0.166 (0.058) | 0.171 (0.056) | 0.041 |
| NK cells | 0.098 (0.044) | 0.096 (0.042) | 0.281 |
| B-cells | 0.063 (0.024) | 0.070 (0.019) | <0.001 |
| Monocytes | 0.101 (0.022) | 0.100 (0.020) | 0.867 |
| Granulocytes | 0.548 (0.097) | 0.531 (0.094) | 0.002 |
| Cumulative smoking exposure (pack-years) | | | |
| Current smokers | 36.8 (19.3) | 33.9 (17.5) | 0.250 |
| Former smokers | 23.3 (16.3) | 19.9 (15.1) | 0.031 |
| Time since smoking cessation (years) | 17.3 (11.3) | 17.6 (10.6) | 0.755 |
Mean values (SD) for continuous variables and n (%) for categorical variables; Kruskal-Wallis Test was applied to examine continuous variables and Chi-Square test was applied to examine categorical variables
Data missing for 3 participants in discovery panel
Data missing for 66 and 40 participants, respectively, in discovery and validation panels. Categories defined as follows: abstainer; low [women: 0 to <20 g/d, men: 0 to <40 g/d]; intermediate [20 to <40 g/d and 40 to <60 g/d, respectively]; high [≥40 g/d and ≥60 g/d, respectively]
Categories defined as follows: inactive [<1 h of physical activity/week]; medium or high [≥2 h of vigorous and ≥2 h of light physical activity/week]; low [other]
Data missing for 8 and 9 participants, respectively, in discovery and validation panels
CVD: cardiovascular disease. Data missing for 1 participant in discovery panel
Data missing for 23 and 12 participants, respectively, in discovery and validation panels
Estimated by the Houseman algorithm [27]
A pack-year was defined as having smoked 20 cigarettes per day for 1 year. Includes all participants from the validation panel; pack-years = 0 for never smokers
Former smokers only, data missing for 9 and 3 participants, respectively, in discovery and validation panels; cessation time equals age at recruitment minus age at cessation
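The chi-square comparison named in the table notes can be illustrated with the male/female counts reported above. The sketch below is a plain Pearson chi-square test of independence without continuity correction. The function names are hypothetical, and the source does not say whether the authors' software applied a correction, so the exact p-value may differ from theirs.

```python
import math


def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Rows: male, female. Columns: discovery panel, validation panel.
sex_counts = [[495, 207], [483, 324]]
stat, dof = chi_square_independence(sex_counts)
p = p_value_1df(stat)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.2g}")
```

Under these assumptions the result falls below 0.001, in line with the value reported for the sex distribution.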
Figure 1. Manhattan plot of discovery panel
Red line: raw p-value corresponding to FDR = 0.05; green dots: the 31 significant sites; Chr: chromosome position
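The red line marks the raw p-value that corresponds to FDR = 0.05. The source does not name the FDR procedure. Assuming the widely used Benjamini-Hochberg step-up method, that cutoff is the largest sorted p-value p(k) with p(k) ≤ k·q/m, where m is the number of tests. A minimal sketch with a hypothetical function name and made-up p-values (not study data):

```python
def bh_cutoff(p_values, q=0.05):
    """Largest raw p-value declared significant by Benjamini-Hochberg at level q.

    Returns None when no p-value passes.
    """
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p(k) that satisfies p(k) <= k * q / m.
        if p <= q * rank / m:
            cutoff = p
    return cutoff


# Toy p-values for illustration only; the study tested 2854 CpG candidates.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
cutoff = bh_cutoff(example_p)
print(cutoff)
```

Every test with a raw p-value at or below the returned cutoff is declared significant, which is what a horizontal threshold line on a Manhattan plot expresses.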
Significant associations between tobacco smoking and methylation of lung cancer related genes in validation panel
| CpG site | Gene | Mean β (SD), never smoker | Mean β (SD), current smoker | Effect size | Estimate (se) | p-value | FDR |
|---|---|---|---|---|---|---|---|
| cg00640087 | MSH5 | 0.165 (0.036) | 0.159 (0.035) | −0.006 | −7.4e-3 (3.1e-3) | 0.019 | 0.049 |
| cg03281572 | VTI1A | 0.812 (0.028) | 0.793 (0.036) | −0.019 | −0.018 (3.0e-3) | 3.8e-7 | 1.2e-5 |
| cg07269053 | VTI1A | 0.733 (0.039) | 0.715 (0.052) | −0.018 | −0.013 (5.0e-3) | 0.007 | 0.023 |
| cg10163955 | GATA3 | 0.669 (0.043) | 0.640 (0.049) | −0.029 | −0.024 (5.1e-3) | 5.1e-4 | 6.6e-5 |
| cg11430077 | GATA3 | 0.147 (0.032) | 0.132 (0.029) | −0.015 | −0.013 (4.0e-3) | 0.001 | 0.006 |
| cg12324353 | TERT | 0.788 (0.032) | 0.779 (0.032) | −0.009 | −0.011 (3.5e-3) | 0.002 | 0.011 |
| cg17928584 | STK32A | 0.156 (0.053) | 0.161 (0.052) | 0.005 | 0.012 (5.0e-3) | 0.020 | 0.049 |
| cg19335412 | ACTA2 | 0.461 (0.036) | 0.451 (0.033) | −0.010 | −0.011 (4.1e-3) | 0.009 | 0.026 |
| cg19696491 | CHRNA5 | 0.470 (0.058) | 0.488 (0.060) | 0.018 | 0.018 (6.9e-3) | 0.010 | 0.028 |
| cg20640261 | MSH5 | 0.443 (0.048) | 0.424 (0.048) | −0.019 | −0.015 (5.0e-3) | 0.003 | 0.013 |
| cg22770911 | GATA3 | 0.481 (0.033) | 0.458 (0.042) | −0.023 | −0.015 (4.5e-3) | 0.001 | 0.005 |
| cg24287110 | KLF6 | 0.365 (0.056) | 0.349 (0.053) | −0.016 | −0.022 (6.0e-3) | 6.2e-4 | 0.005 |
| cg24908166 | TERT | 0.926 (0.021) | 0.916 (0.026) | −0.010 | −0.010 (2.6e-3) | 0.0001 | 0.001 |
Adjusted for age (years), sex, random batch effects, leukocyte distribution (Houseman algorithm [27]), alcohol consumption (abstainer/ low/ intermediate/ high), body mass index (BMI, underweight/ normal weight/ overweight/ obese), physical activity (inactive/ low/ medium or high), prevalence of cardiovascular diseases (yes/no), prevalence of diabetes (yes/no) and prevalence of cancer (yes/no)
All 31 loci identified in the discovery panel were tested in the validation panel with the three models, using an FDR threshold of 0.05. A total of 13 CpG sites were validated as significant smoking-related CpG sites
Effect size = mean β (current smokers) − mean β (never smokers)
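The effect-size definition is a simple difference of group means, so the tabulated values can be checked directly. The sketch below recomputes it for two sites using the mean β values reported in the table above; the function name is hypothetical. The regression estimate in the table is a separate, covariate-adjusted quantity and is not reproduced here.

```python
def effect_size(mean_beta_never, mean_beta_current):
    """Unadjusted effect size: mean beta in current smokers minus mean beta in never smokers."""
    return round(mean_beta_current - mean_beta_never, 3)


# (never smoker, current smoker) mean beta values taken from the table.
mean_beta = {
    "cg03281572": (0.812, 0.793),
    "cg19696491": (0.470, 0.488),
}

effects = {cpg: effect_size(never, current) for cpg, (never, current) in mean_beta.items()}
for cpg, value in effects.items():
    # A beta difference of 0.01 equals one percentage point of methylation.
    print(f"{cpg}: {value:+.3f} ({value * 100:+.1f} percentage points)")
```

A negative value means lower methylation in current smokers than in never smokers; a positive value means higher methylation.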
Associations of cumulative smoking exposure (pack-years) and cessation time (year) with methylation of validated CpG sites
| CpG site | Gene | Cumulative smoking exposure: estimate (se) | Cumulative smoking exposure: p-value | Cumulative smoking exposure: FDR | Smoking cessation time: estimate (se) | Smoking cessation time: p-value | Smoking cessation time: FDR |
|---|---|---|---|---|---|---|---|
| cg00640087 | MSH5 | −2.3e-4 (7.2e-5) | 1.4e-3 | 1.7e-3 | 2.7e-4 (1.9e-4) | 0.155 | 0.252 |
| cg03281572 | VTI1A | −3.8e-4 (8.8e-5) | 1.6e-5 | 5.1e-5 | 4.9e-4 (2.6e-4) | 0.060 | 0.131 |
| cg07269053 | VTI1A | −2.5e-4 (1.0e-4) | 0.015 | 0.016 | 2.3e-4 (3.0e-4) | 0.455 | 0.493 |
| cg10163955 | GATA3 | −5.5e-4 (1.1e-4) | 5.8e-7 | 3.7e-6 | 7.0e-4 (3.2e-4) | 0.030 | 0.131 |
| cg11430077 | GATA3 | −3.0e-4 (8.8e-5) | 8.0e-4 | 1.1e-3 | 5.3e-4 (2.4e-4) | 0.032 | 0.131 |
| cg12324353 | TERT | −3.1e-4 (7.3e-5) | 2.3e-5 | 6.1e-5 | 3.7e-4 (1.9e-4) | 0.052 | 0.131 |
| cg17928584 | STK32A | 3.7e-4 (1.1e-4) | 7.0e-4 | 1.1e-3 | −3.5e-4 (2.9e-4) | 0.230 | 0.307 |
| cg19335412 | ACTA2 | −3.2e-4 (9.2e-5) | 6.0e-4 | 1.1e-3 | 4.4e-4 (2.5e-4) | 0.084 | 0.156 |
| cg19696491 | CHRNA5 | 2.0e-4 (1.5e-4) | 0.178 | 0.178 | 4.7e-4 (4.2e-4) | 0.262 | 0.310 |
| cg20640261 | MSH5 | −5.3e-4 (1.1e-4) | 2.5e-6 | 1.1e-5 | 6.4e-4 (3.1e-4) | 0.040 | 0.131 |
| cg22770911 | GATA3 | −4.8e-4 (9.1e-5) | 2.2e-7 | 2.9e-6 | 6.0e-4 (2.7e-4) | 0.027 | 0.131 |
| cg24287110 | KLF6 | −5.7e-4 (1.4e-4) | 5.1e-5 | 1.1e-4 | 4.5e-4 (3.8e-4) | 0.236 | 0.307 |
| cg24908166 | TERT | −1.8e-4 (5.5e-5) | 1.5e-3 | 1.7e-3 | 4.3e-5 (1.6e-4) | 0.788 | 0.788 |
Estimated by mixed linear regression in the validation panel. Both models were adjusted for age (years), sex, batch effects, leukocyte distribution (Houseman algorithm [27]), alcohol consumption (abstainer/ low/ intermediate/ high), body mass index (BMI, underweight/ normal weight/ overweight/ obese), physical activity (inactive/ low/ medium or high), prevalence of cardiovascular diseases (yes/no), prevalence of diabetes (yes/no) and prevalence of cancer (yes/no). The threshold of FDR (false discovery rate) is 0.05
A pack-year was defined as having smoked 20 cigarettes per day for 1 year. Includes all participants from the validation panel; pack-years = 0 for never smokers
Cessation time was defined as age at the time of recruitment minus age at cessation. Includes former and current smokers from the validation panel; cessation time = 0 for current smokers
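The two exposure definitions in these notes are plain arithmetic. A minimal sketch follows, with hypothetical function and parameter names and made-up input values; it only mirrors the stated definitions and is not the authors' code.

```python
def pack_years(cigarettes_per_day, years_smoked):
    """Pack-years: one pack-year equals 20 cigarettes per day for one year.

    Never smokers get 0 by passing zero cigarettes or zero years.
    """
    return cigarettes_per_day / 20.0 * years_smoked


def cessation_time(age_at_recruitment, age_at_cessation=None, current_smoker=False):
    """Years since quitting: age at recruitment minus age at cessation.

    Current smokers are assigned 0, as in the table note.
    """
    if current_smoker:
        return 0.0
    if age_at_cessation is None:
        raise ValueError("age_at_cessation is required for former smokers")
    return float(age_at_recruitment - age_at_cessation)


# Illustrative inputs only, not study data.
print(pack_years(30, 25))                       # 30 cigarettes/day for 25 years
print(cessation_time(62, 45))                   # former smoker who quit at age 45
print(cessation_time(62, current_smoker=True))  # current smoker
```

Coding current smokers as 0 years since cessation lets former and current smokers enter the same cessation-time model, which matches the inclusion rule stated in the note.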
Characteristics of the validated CpG sites
| CpG site | Position | Gene | Function | Placement | Reported SNPs | SNP position |
|---|---|---|---|---|---|---|
| cg17928584 | chr5:146,614,458 | STK32A | Encodes a member of the serine/threonine kinase family, which has a paramount role in cellular homeostasis, transcription factor phosphorylation and cell-cycle regulation | TSS200 | rs2895680 | chr5:146,643,865-146,644,365 |
| cg12324353; cg24908166 | chr5:1,269,197; chr5:1,268,801 | TERT | Encodes human telomerase reverse transcriptase, which is important in the maintenance of telomere length | Body; Body | rs2736100; rs2853677; rs465498 | chr5:1,286,266-1,286,766; chr5:1,286,944-1,287,444; chr5:1,325,553-1,326,053 |
| cg00640087; cg20640261 | chr6:31,707,203; chr6:31,707,020 | MSH5 | Encodes a member of the mutS family of proteins, which are involved in DNA mismatch repair and meiotic recombination | TSS1500; TSS1500 | rs3117582 | chr6:31,620,270-31,620,770 |
| cg19335412 | chr10:90,694,875 | ACTA2 | Encodes a protein of the actin family, highly conserved proteins that play a role in cell motility, structure and integrity | 3′UTR | rs1926203 | chr10:90,727,084-90,727,584 |
| cg10163955; cg11430077; cg22770911 | chr10:8,101,402; chr10:8,099,019; chr10:8,101,307 | GATA3 | Encodes a protein that belongs to the GATA family of transcription factors | Body; Body; Body | rs1663689 | chr10:9,024,945-9,025,445 |
| cg24287110 | chr10:3,824,688 | KLF6 | Encodes a member of the Kruppel-like family of transcription factors, which is a transcriptional activator and functions as a tumor suppressor | Body | rs10508266; rs3750861 | chr10:3,839,764-3,840,264; chr10:3,824,183-3,824,683 |
| cg03281572; cg07269053 | chr10:114,502,318; chr10:114,497,612 | VTI1A | Encodes vesicle transport through interaction with t-SNAREs homolog 1A | Body; Body | rs7086803 | chr10:114,498,226-114,498,726 |
| cg19696491 | chr15:78,857,125 | CHRNA5 | Encodes a nicotinic acetylcholine receptor subunit, a member of a superfamily of ligand-gated ion channels that mediate fast signal transmission at synapses | TSS1500 | rs1051730 | chr15:78,894,089-78,894,589; chr15:78,882,675-78,883,175; chr15:78,805,773-78,806,273 |
According to GRCh37/hg19
This SNP is located close to GATA3
CHRNA5 is a cis-eQTL gene of this SNP
Figure 2. Flowchart of selection of CpG sites