Literature DB >> 30425263

DNA methylation and associated gene expression in blood prior to lung cancer diagnosis in the Norwegian Women and Cancer cohort.

Torkjel Manning Sandanger1, Therese Haugdahl Nøst2, Florence Guida3,4, Charlotta Rylander2, Gianluca Campanella3, David C Muller3, Jenny van Dongen5, Dorret I Boomsma5, Mattias Johansson4, Paolo Vineis3,6, Roel Vermeulen3,7, Eiliv Lund2, Marc Chadeau-Hyam3,7.   

Abstract

The majority of lung cancer is caused by tobacco smoking, and lung cancer-relevant epigenetic markers have been identified in relation to smoking exposure. Still, smoking-related markers appear to mediate little of the effect of smoking on lung cancer. Thus in order to identify disease-relevant markers and enhance our understanding of pathways, a wide search is warranted. Through an epigenome-wide search within a case-control study (131 cases, 129 controls) nested in a Norwegian prospective cohort of women, we found 25 CpG sites associated with lung cancer. Twenty-three were classified as associated with smoking (LC-AwS), and two were classified as unassociated with smoking (LC-non-AwS), as they remained associated with lung cancer after stringent adjustment for smoking exposure using the comprehensive smoking index (CSI): cg10151248 (PC, CSI-adjusted odds ratio (OR) = 0.34 [0.23-0.52] per standard deviation change in methylation) and cg13482620 (B3GNTL1, CSI-adjusted OR = 0.33 [0.22-0.50]). Analysis among never smokers and a cohort of smoking-discordant twins confirmed the classification of the two LC-non-AwS CpG sites. Gene expression profiles demonstrated that the LC-AwS CpG sites had different enriched pathways than LC-non-AwS sites. In conclusion, using blood-derived DNA methylation and gene expression profiles from a prospective lung cancer case-control study in women, we identified 25 CpG lung cancer markers prior to diagnosis, two of which were LC-non-AwS markers and related to distinct pathways.

Entities:  

Mesh:

Year:  2018        PMID: 30425263      PMCID: PMC6233189          DOI: 10.1038/s41598-018-34334-6

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Lung cancer is the leading cause of cancer death worldwide, causing as many deaths as the next four most deadly cancers combined (breast, prostate, colon, and pancreas), and the incidence of lung cancer is projected to double by 2050[1]. Recent advances in molecular biology, and the emergence of cost efficient OMICs data has contributed new insight into mechanisms involved in lung carcinogenesis. For instance, several epigenome-wide association studies have identified methylation changes that are associated with lung cancer risk[2-8], using mainly tumor tissue collected at the time of diagnosis, but also blood in prevalent cases[9]. Other studies focusing on established lung carcinogens such as smoking used peripheral blood and identified several differentially methylated CpG sites[10-13]. Of these, a recent meta-analysis including 15,907 participants identified 2,623 differentially methylated CpG sites in relation to smoking status[14]. Although smoking is an established causal risk factor responsible for a vast proportion of lung cancer incidence, identified smoking-related CpG sites have been shown to mediate some or little of the effects of smoking on lung cancer[6,15]. Further, the most common histological subtype of lung cancer in never smokers is adenocarcinoma[16], which might originate from specific, and as yet unknown, molecular mechanisms different from those involved in smoking-induced lung cancer[17]. It is therefore of interest to identify lung cancer biomarkers that are unrelated to smoking exposure to (i) gain better understanding of the etiology of lung cancer and (ii) to investigate whether biological pathways affected by smoking-induced changes in DNA-methylation are similar to those affected by differential methylation at CpG sites that are not associated with smoking exposure. To-date, very few studies have investigated methylation markers of lung cancer risk that are not associated with such exposure[8,18]. In the current study, we used full-resolution DNA methylation profiles from prospectively collected blood samples to identify methylation alterations in relation to future lung cancer diagnosis and assess the relationship between these markers and smoking exposure. Specifically, we adopted a stringent adjustment strategy to identify disease-related methylation changes at CpG sites that are not associated with smoking exposure and compare them to changes in methylation level at disease-related CpG sites that are also associated with smoking exposure. Finally, we exploited gene expression data measured in the same individuals to aid functional interpretation by exploring biological pathways of gene expression profiles affected by methylation changes at all lung cancer-related CpG sites.

Materials and Methods

Participants

Our study population included women from a lung cancer case–control study nested in the post genome cohort (N ~ 50 000) within the Norwegian Women and Cancer Study (NOWAC)[19-21]. All participating women were cancer-free at recruitment (1991–2006) and at time of blood sampling (2003–2006). Linkage to the national cancer registry identified 134 incident lung cancer cases. Cases were diagnosed between 2004 and 2011, and for each case, one control was matched on time since blood sampling and birth year. All participants gave written informed consent and the study was approved by the Regional Committee for Medical and Health Research Ethics and the Norwegian Data Inspectorate. We confirm that all methods employed in the study were performed in accordance with the relevant guidelines and regulations. We used methylation data from the Netherlands Twin Register (NTR) for replication. Subjects in the NTR biobank study were recruited between 2004 and 2011[22,23]. The study included 769 monozygotic (MZ) and 424 dizygotic (DZ) twin pairs. A blood sample was collected at inclusion and we included in the present study 125 MZ and 146 DZ adult twin pairs who were discordant with respect to their smoking status at time of blood sampling. We included pairs in which one twin never smoked and the other twin was a current smoker (N = 53 MZ, and 77 DZ), and pairs including one never smoker and one former smoker (N = 72 MZ, and 69 DZ).

DNA methylation and gene expression microarray data

Genome-wide DNA methylation profiles from bisulphite-converted, hybridized genomic DNA from buffy coat samples were generated using Illumina Infinium HumanMethylation450 Bead-Chips following a protocol described previously for both NOWAC[24] and NTR[22,25] samples. DNA methylation levels at each locus were expressed as the ratio of intensities arising from methylated cytosines over total intensities. For NOWAC, sample preparation and data pre-processing were performed as described elsewhere[24]. In brief, probes (i) on sex chromosomes, (ii) reported to be cross-reactive[26] and (iii) for which methylation levels were measured in <20% of the samples were excluded. Five samples did not pass quality controls and three subjects were excluded due to >95% missing in DNA methylation results. The final analysis included 428,629 probes targeting autosomal CpG loci in 260 women (131 cases and 129 controls). In NTR data, sample- and probe-level quality checks and data pre-processing were performed as described in detail previously[25] and only CpG sites identified in the NOWAC discovery data were interrogated. For 248 of the 260 women from the NOWAC study with DNA methylation data, gene expression profiles were also available and were generated at the Norwegian University of Science and Technology. Total RNA was isolated using established protocols[27] and microarray analyses were performed using the IlluminaHuman HT-12 expression Bead-Chips. Microarray data were quality-checked and pre-processed as previously described[28]. Original probe values were background-corrected and probes reported to have poor quality from Illumina or detected in <95% of samples were filtered out. Only transcripts on autosomal chromosomes were included in the analysis. The final gene expression data set included 18,955 transcripts assayed in 248 individuals.

Statistical models

We investigated the relationship between future lung cancer status and methylation levels using unconditional logistic regression models. As already described[29], we corrected for technically-induced variation in methylation and gene expression data by fitting a preliminary linear mixed model including technical covariates (chip ID and position on the chip for methylation data and date of mRNA isolation and date of complementary RNA generation for gene expression data) as random intercepts, and, to account for the case control matching we adjusted (fixed effects) our models for the two matching criteria: age at blood collection and sample storage time. Methylation and gene expression levels used in the downstream analyses were represented by the residuals from these mixed models. Multiple testing was accounted for by using a Bonferroni correction ensuring a family-wise error rate below 5% (corresponding per test significance level was set to 1.16e-07). We report as effect size estimates the odds ratios (OR) for one standard deviation change in the methylation levels. We further adjusted the logistic regression models for blood cell composition, estimated according to the methods proposed by Houseman[30,31]. We specifically adjusted for estimated proportions of leukocytes (excluding natural killer cells and eosinophil granulocytes). Lung cancer-related CpG sites that are not associated with smoking exposure (LC-non-AwS) were defined as those (i) found significantly associated to lung cancer in the main logistic model, and (ii) remaining associated to lung cancer upon adjustment for smoking. Conversely, lung cancer-related CpG sites that are associated with smoking exposure (LC-AwS) are defined as those losing statistical significance upon adjustment for smoking exposure. Confirmation of their lack of association to smoking exposure was sought in never smokers and an independent study including smoking-discordant twins. We investigated three measures of smoking exposure: smoking status, pack-years, and the comprehensive smoking index (CSI)[32]. CSI scores (Table S1) were obtained using duration of smoking (dur; years), intensity (int; average number of cigarettes per day during years of smoking), and time since smoking cessation (tsc; years) and fitting the following model to our data: X2 = (1 − 0.5dur∗/τ)(0.5tsc∗/τ) ln(int + 1), where τ is the estimated half-life parameter, and δ is an estimated lag time parameter describing tsc and total duration as follows: tsc∗ = max(tsc − δ, 0) and dur∗ = max(dur + tsc − δ) − tsc∗. To further assess possible relations between methylation levels at disease-related sites and smoking exposure, we used the methylation data of the NTR study and ran paired Student’s T-test analyses comparing the mean methylation differences within pairs of MZ and DZ smoking discordant twins. Paired T-tests were performed on residual methylation levels, which were obtained by adjusting the methylation levels (beta-values) for sex, age at blood sampling, measured cell counts (percentage of monocytes, eosinophils, and neutrophils), and technical covariates: array row and sample plate. We ran a series of sensitivity analyses that included conditional logistic regressions for the (N = 128) case-control complete pairs. Further sensitivity analyses were restricted to (i) cases from each of the main histological subtypes separately (adenocarcinomas (N = 64), small cell and squamous (N = 43), others (N = 24))[33], (ii) cases diagnosed before or after the median time elapsed from blood collection to diagnosis (4.2 years), (iii) cases that were current (N = 81), former (N = 36), or never (N = 14) smokers, separately. In these stratified analyses subsets of cases were compared to all healthy controls (N = 37, 35, 57 in current, former and never smokers) included in the study and, because case-control pairs were broken, we used unconditional logistic regression models as defined for the main analysis. In order to ensure a wide explorative search of (N = n1) LC-non-AwS markers, which are likely weaker and less numerous than the (N = n2) LC-AwS markers, we complemented our list of n1 LC-non-AwS markers by defining a ‘second order’ set of (N = n1′) LC-non-AwS CpG sites as defined by those associated to a least one of the n1 LC-non-AwS markers, but not directly to disease status. These were identified by regressing the methylation levels of the n1 LC-non-AwS CpG sites against the (428,629 − n1) remaining CpG sites. As before, we used Bonferroni corrected per-test significance level here defined as 0.05/(n1x(428,629 − n1)). In order to help functional interpretation of the resulting epigenetic alterations, gene expression data measured in the same individuals were linked to the DNA methylation levels of the identified markers. Specifically, we ran linear regression models assessing the association between the 18,955 assayed transcripts and (i) each of the first and second order (n1 + n1′) LC-non-AwS CpG sites and (ii) each of the n2 LC-AwS CpG sites. Statistical significance of each CpG-transcript pair was evaluated adopting a Bonferroni corrected per-test significance level (0.05/((n1 + n1′) × 18,955), and 0.05/(n2 × 18,955), respectively). The transcripts involved in any significant CpG-transcript pair were subsequently included in overrepresentation analyses based on hypergeometric tests setting a nominal p-value of 0.05 using the ‘enrichGO’ function of the Bioconductor ‘clusterProfiler‘ package[34]. All statistical analyses were performed using R (ver. 3.1.2, Foundation for Statistical Computing, Vienna, Austria).

Results

Sample description and overall lung cancer risk

Baseline characteristics of the NOWAC women and NTR study populations are summarized in Table 1. As expected, NOWAC cases were more commonly current smokers at blood sampling (62%) than controls (29%) (Table S1). Logistic regression models demonstrated elevated lung cancer risk in former smokers (OR = 4.07 (95% CI: 1.97–8.79), and in current smokers (OR = 8.46 (95% CI: 4.31–17.53)) as compared to never smokers. The model including CSI score alone indicated an OR of 3.66 (95% CI: 2.53–5.42) for one unit increase in CSI values and the model provided the better fit compared to other smoking metrics (including smoking status and pack years, AIC results not shown). Estimated cell type proportions were similar in cases and controls, except for the natural killer cells, which were underrepresented in cases and among current smokers (Table S2).
Table 1

Characteristics of the NOWAC (women only) and the NTR populations.

VariableNOWAC studyNTR study
Cases N = 131Controls N = 129Monozygotic N = 250 (125 pairs)Dizygotic N = 292 (146 pairs)
MeanSDMeanSDMeanSDMeanSD
Age at sample56.554.0156.573.9837.7711.9133.248.18
Age at diagnosis60.54.12
Time to diagnosis3.881.99
Pack-years20.4613.38.6211.18
CSI1.30.650.630.71
N % N % N % N %
Histological subtypes
Adenocarcinomas6448.9
Small cell carcinomas2519.1
Squamous cell carcinomas1813.7
Others2418.3
Smoking status
Current8161.83728.685321.27726.4
Former3627.53527.137228.86923.6
Never1410.75744.191255014650
Characteristics of the NOWAC (women only) and the NTR populations.

Differentially methylated CpG sites associated to lung cancer risk

We identified 25 CpG sites at which lower methylation levels were associated to higher lung cancer risk (Tables 2 and S3; boxplots of the methylation according to case/control status in Figure S1 and volcano plot in Figure S2). After adjustment for smoking, n2 = 23 of these sites were classified as LC-AwS markers, as their associations lost statistical significance (Table 2). Among the different smoking metrics considered, CSI appeared to provide the most stringent adjustment as depicted by flattened p-value distribution (Figure S3, estimates for the covariates adjusted for are presented in Table S4). Only n1 = 2 CpGs remained associated with lung cancer risk after controlling for CSI and were classified as LC-non-AwS markers (Table 2): cg10151248; PC (OR = 0.34) and cg13482620; B3GNTL1 (OR = 0.33). These two LC-non-AwS CpGs were also significantly associated with lung cancer after further adjustment for blood cell composition (Table S5). The correlations between the n = 2 LC-non-AwS CpG sites and the n2 = 23 LC-AwS sites were moderate (Fig. 1). Conversely, we observed stronger block correlations within the LC-AwS sites, and in particular a subset of eight CpG sites (Figure S4). Results in figures and tables are presented separately for LC-AwS and LC-non-AwS sites.
Table 2

25 Bonferroni significant CpG sites differentially methylated in cases as compared to controls (N = 131 cases, 129 controls) that were un-associated with smoking (LC-non-AwS), or associated with smoking (LC-AwS).

Probe IDGene nameChromosomeUnadjusted modelAdjusted model - smoking statusAdjusted model - pack-yearsAdjusted model - CSI
OR95% CIp-valueOR95% CIp-valueOR95% CIp-valueOR95% CIp-value
LC-Non-AwS CpG sites
cg10151248 PC 110.360.25–0.51 1.2E-08 0.360.25–0.52 7.2E-08 0.350.24–0.51 5.0E-08 0.340.23–0.5 7.1E-08
cg13482620 B3GNTL1 170.410.3–0.57 8.5E-08 0.390.27–0.552.1E-070.330.22–0.5 6.7E-08 0.330.22–0.5 8.4E-08
LC-AwS CpG sites
cg05575921 AHRR 50.370.27–0.49 2.9E-11 0.380.23–0.642.5E-040.550.37–0.812.2E-030.560.35–0.91.6E-02
cg03636183 F2RL3 190.380.28–0.52 2.5E-10 0.490.32–0.758.5E-040.580.4–0.833.2E-030.580.38–0.891.2E-02
cg06126421 NA 60.380.28–0.52 8.6E-10 0.540.36–0.82.1E-030.590.41–0.854.7E-030.630.42–0.942.5E-02
cg21566642 NA 20.420.31–0.56 2.5E-09 0.600.38–0.942.7E-020.670.46–0.983.7E-020.690.43–1.111.3E-01
cg02152091 NA 80.390.29–0.53 3.7E-09 0.390.28–0.55 2.8E-08 0.400.28–0.55 6.5E-08 0.430.31–0.65.3E-07
cg03898802 DOPEY2 210.390.28–0.53 6.8E-09 0.410.29–0.582.6E-070.410.29–0.584.4E-070.420.29–0.598.7E-07
cg06500852 NA 20.400.29–0.55 7.8E-09 0.430.31–0.65.7E-070.450.33–0.632.5E-060.460.33–0.644.9E-06
cg20024310 NA 70.370.26–0.52 1.7E-08 0.350.24–0.52 9.2E-08 0.360.24–0.532.7E-070.360.24–0.546.0E-07
cg06368429 KPNA7 70.390.28–0.54 1.9E-08 0.440.31–0.612.0E-060.410.28–0.585.6E-070.430.3–0.624.1E-06
cg02451831 KIAA0087 70.410.3–0.56 1.9E-08 0.500.35–0.78.4E-050.530.38–0.741.9E-040.540.39–0.764.0E-04
cg13936208 NA 120.380.27–0.53 2.1E-08 0.380.26–0.553.7E-070.420.29–0.62.1E-060.410.29–0.62.8E-06
cg13525026 MYO15A 170.430.32–0.58 2.3E-08 0.480.35–0.665.0E-060.480.35–0.667.8E-060.480.34–0.669.6E-06
cg08928494 CA5A 160.390.28–0.55 2.7E-08 0.410.29–0.585.8E-070.390.27–0.574.7E-070.400.28–0.588.6E-07
cg25305703 NA 80.430.32–0.58 3.2E-08 0.560.4–0.786.0E-040.570.41–0.796.4E-040.610.44–0.853.3E-03
cg00395990 PDZD3 110.420.31–0.57 3.7E-08 0.460.34–0.643.3E-060.480.34–0.661.2E-050.500.36–0.693.8E-05
cg25324976 CSHL1 170.430.31–0.58 6.0E-08 0.450.32–0.632.7E-060.450.32–0.635.6E-060.480.34–0.672.1E-05
cg01940273 NA 20.460.35–0.61 6.1E-08 0.670.44–15.1E-020.730.51–1.037.1E-020.790.53–1.172.4E-01
cg11635401 MYO9B 190.430.31–0.58 9.1E-08 0.400.28–0.573.0E-070.400.28–0.572.5E-070.410.29–0.585.7E-07
cg22475974 NA 40.440.33–0.6 9.2E-08 0.470.34–0.642.2E-060.480.35–0.653.6E-060.500.37–0.692.1E-05
cg21838013 CRTAM 110.410.3–0.57 1.0E-07 0.440.31–0.622.1E-060.420.3–0.61.3E-060.450.32–0.646.3E-06
cg23069177 OCA2 150.430.31–0.58 1.0E-07 0.430.3–0.61.7E-060.430.31–0.612.3E-060.410.29–0.591.4E-06
cg21161138 AHRR 50.450.34–0.61 1.0E-07 0.660.45–0.952.7E-020.700.5–0.973.3E-020.770.54–1.111.6E-01
cg16976547 FES 150.420.31–0.58 1.1E-07 0.440.31–0.623.2E-060.450.32–0.646.4E-060.460.33–0.661.6E-05

OR, odds ratio; CI, confidence interval. Regression models include residual DNA methylation levels (DNA methylation adjusted for technical covariates and matching variables; see Method section) as an independent variable. Results are presented for the unadjusted unconditional logistic regression and the same model adjusted for three selected smoking metrics (smoking status as categories never/former/current, pack-years, and comprehensive smoking index (CSI)). Model results for unconditional logistic regressions adjusted for WBCs and CSI + WBCs are presented in Table S4. Bolded p-values are significant according to the Bonferroni threshold.

Figure 1

Heatmap of the correlation between the two CpGs un-associated with smoking (LC-non-AwS) and the 23 CpGs associated with smoking (LC-AwS). Figure note: The correlation strength is represented by color as indicated in the bar to the right.

25 Bonferroni significant CpG sites differentially methylated in cases as compared to controls (N = 131 cases, 129 controls) that were un-associated with smoking (LC-non-AwS), or associated with smoking (LC-AwS). OR, odds ratio; CI, confidence interval. Regression models include residual DNA methylation levels (DNA methylation adjusted for technical covariates and matching variables; see Method section) as an independent variable. Results are presented for the unadjusted unconditional logistic regression and the same model adjusted for three selected smoking metrics (smoking status as categories never/former/current, pack-years, and comprehensive smoking index (CSI)). Model results for unconditional logistic regressions adjusted for WBCs and CSI + WBCs are presented in Table S4. Bolded p-values are significant according to the Bonferroni threshold. Heatmap of the correlation between the two CpGs un-associated with smoking (LC-non-AwS) and the 23 CpGs associated with smoking (LC-AwS). Figure note: The correlation strength is represented by color as indicated in the bar to the right. The two LC-non-AwS CpG sites were also associated with lung cancer risk in never smokers (OR = 0.36 (95% CI: 0.17–0.77) and OR = 0.31 (95% CI: 0.14–0.67) for cg10151248-PC and cg13482620-B3GNTL1, respectively). Estimates were consistent in current smokers for cg10151248-PC and cg13482620-B3GNTL1 (OR = 0.32 (95% CI: 0.19–0.57) and OR = 0.33 (95% CI: 0.18–0.61)), respectively) but slightly weaker in former smokers (OR = 0.43 (95% CI: 0.23–0.82) and 0.50 (95% CI: 0.30–0.85)). The stratified analysis showed that 10 CpG sites among the 23 LC-AwS CpG sites were significantly associated to lung cancer status in never smokers (Table S6). The methylation levels of the two most strongly associated LC-AwS CpG sites: cg05575921-AHRR and cg03636183-F2RL3, were not associated with lung cancer risk in never smokers (OR = 0.27 (95% CI: 0.03–2.14) and 1.22 (95% CI: 0.32–4.68), respectively. Additional stratification on histological subtypes provided consistent OR estimates for cg10151248-PC across histological subtypes (Table S7; range: 0.36–0.39), and stronger effects of methylation levels were estimated in cases with shorter time to diagnosis (0.33 vs 0.41 for short and long time to diagnosis, respectively). For cg13482620-B3GNTL1, effect size estimates were consistent in both time to diagnosis classes, but the OR was lower in adenocarcinoma cases (OR = 0.35) than in ‘all other subtypes’ and ‘squamous and small cell’ cases (OR >0.49). Corresponding stratified analyses for LC-AwS CpGs are also presented in Table S7. Using conditional logistic regressions unadjusted for smoking exposure, as a sensitivity analysis, only two of the n + n = 25 candidate CpG sites reached Bonferroni significance level (cg05575921 and cg06126421, p-values 3.99e−08 and 6.68e−08, respectively). When comparing mean methylation levels for the 25 candidate CpG sites within pairs of smoking-discordant twins (MZ or all), we found no differences between never smokers and ever/current smokers for the two LC-non-AwS markers (Table 3). Comparison of the mean methylation levels at the LC-AwS CpG sites, showed significant differences at eight CpG sites while comparing smokers (current or ever) to never smokers. When restricting these comparisons to MZ twin pairs, six and eight CpG sites were significantly different in never to current and never to ever comparisons, respectively (Table 3).
Table 3

Difference in methylation in twins discordant according to smoking status in the NTR study for the CpG sites associated with lung cancer identified as un-associated with smoking (LC-non-AwS), or associated with smoking (LC-AwS) in the NOWAC study.

CpG siteAll twinsMonozygotic twins
Never vs. CurrentNever vs. EverNever vs. CurrentNever vs. Ever
Mean diff.95% CIp-valueMean diff.95% CIp-valueMean diff.95% CIp-valueMean diff.95% CIp-value
LC-Non-AwS CpG sites
cg10151248−0.001−0.003–0.0012.7E-010.000−0.002–0.0015.1E-01−0.002−0.005–08.3E-020.000−0.002–0.0016.4E-01
cg13482620−0.002−0.005–0.0023.7E-01−0.002−0.004–0.0011.7E-01−0.002−0.008–0.0045.6E-01−0.002−0.005–0.0023.1E-01
LC-AwS CpG sites
cg055759210.1390.118–0.159 7.8E-26 0.0860.073–0.099 4.5E-31 0.1320.1–0.164 3.0E-11 0.0730.055–0.09 2.8E-13
cg036361830.0650.053–0.077 1.4E-20 0.0450.037–0.052 6.2E-27 0.0620.045–0.08 2.9E-09 0.0390.029–0.049 4.7E-13
cg061264210.0560.044–0.067 1.6E-16 0.0420.035–0.049 9.4E-27 0.0500.033–0.067 2.9E-07 0.0340.025–0.042 1.3E-11
cg215666420.0920.077–0.107 7.5E-23 0.0670.058–0.077 7.6E-34 0.0920.069–0.115 1.2E-10 0.0590.047–0.072 1.8E-15
cg02152091−0.003−0.007–0.0022.0E-01−0.003−0.006–04.8E-020.000−0.007–0.0079.9E-01−0.003−0.007–0.0011.6E-01
cg03898802−0.002−0.005–0.0011.5E-01−0.001−0.003–01.6E-01−0.002−0.006–0.0034.4E-010.000−0.002–0.0038.5E-01
cg06500852−0.002−0.004–0.0011.5E-01−0.001−0.002–0.0015.3E-01−0.004−0.008–0.0012.6E-02−0.002−0.004–0.0011.8E-01
cg20024310−0.006−0.011–06.5E-02−0.003−0.007–0.0011.7E-01−0.004−0.012–0.0054.1E-010.003−0.003–0.0092.7E-01
cg06368429−0.006−0.013–0.0019.5E-02−0.003−0.007–0.0011.6E-01−0.008−0.019–0.0031.5E-01−0.002−0.008–0.0045.1E-01
cg024518310.0140.009–0.02 3.0E-06 0.0100.006–0.013 1.8E-07 0.0130.005–0.0212.9E-030.0080.003–0.013 1.2E-03
cg13936208−0.004−0.008–04.5E-02−0.003−0.006–02.2E-02−0.003−0.008–0.0032.8E-01−0.001−0.005–0.0024.8E-01
cg13525026−0.003−0.006–0.0011.2E-01−0.001−0.003–0.0012.9E-01−0.007−0.012–0.0026.0E-03−0.003−0.007–05.0E-02
cg253057030.0210.012–0.03 8.9E-06 0.0150.01–0.021 3.1E-07 0.0200.006–0.0345.3E-030.0130.005–0.021 1.2E-03
cg00395990−0.002−0.006–0.0023.1E-01−0.001−0.003–0.0013.2E-01−0.002−0.007–0.0045.2E-010.000−0.003–0.0038.5E-01
cg019402730.0590.049–0.07 2.3E-20 0.0420.036–0.049 5.9E-29 0.0580.042–0.074 1.7E-09 0.0370.028–0.046 2.1E-13
cg11635401−0.002−0.007–0.0034.5E-01−0.001−0.004–0.0025.5E-01−0.006−0.014–0.0021.5E-01−0.003−0.008–0.0021.9E-01
cg224759740.000−0.002–0.0016.1E-010.000−0.002–0.0014.3E-01−0.002−0.004–0.0013.0E-010.000−0.002–0.0016.5E-01
cg21838013−0.003−0.008–0.0022.1E-01−0.001−0.005–0.0036.0E-01−0.001−0.009–0.0078.5E-010.002−0.003–0.0084.0E-01
cg23069177−0.001−0.005–0.0036.2E-01−0.001−0.003–0.0025.6E-010.000−0.006–0.0069.4E-01−0.001−0.005–0.0036.0E-01
cg211611380.0440.036–0.052 1.1E-20 0.0260.021–0.031 4.3E-20 0.0440.031–0.057 1.5E-08 0.0200.013–0.028 8.3E-07
cg169765470.001−0.003–0.0055.8E-010.001−0.001–0.0035.4E-010.003−0.006–0.0124.8E-010.001−0.003–0.0056.6E-01

Diff: difference; Bolded numbers for p-values are considered significant using a Bonferroni threshold.

Difference in methylation in twins discordant according to smoking status in the NTR study for the CpG sites associated with lung cancer identified as un-associated with smoking (LC-non-AwS), or associated with smoking (LC-AwS) in the NOWAC study. Diff: difference; Bolded numbers for p-values are considered significant using a Bonferroni threshold.

Functional investigation of the 25 candidate CpG sites

No significant association was found linking DNA methylation levels at either LC-non-AwS sites (cg10151248-PC and cg13482620-B3GNTL1) and the gene expression levels at the 18,955 transcripts assayed (containing one transcript each for PC and B3GNTL1 genes). We identified a total of n1’ = 1987 ‘second order’ LC-non-AwS CpG sites whose methylation levels were associated to that of at least one of the n1 = 2 LC-non-AwS CpG sites, and not directly with disease risk. Of these, 160 and 1,876 were associated with methylation levels of cg10151248-PC and cg13482620-B3GNTL1, respectively and their pairwise correlation is presented in Figure S5A. When regressing the n1’ ‘second order’ set of CpG sites against the gene expression levels we identified (i) 19 significant CpG-transcript pairs for cg10151248-PC (Table S8), corresponding to 19 unique transcripts and one unique CpG site (Table 4), and (ii) 137 CpG-transcript pairs for cg13482620-B3GNTL1 (Table S9), including 127 unique transcripts and nine CpG sites (Table 4). The correlations between transcripts associated to the methylation levels of at least one ‘second order’ CpG site are presented in Figure S5B. Overrepresentation analyses of transcripts involved in these significant LC-non-AsW CpG-transcript pairs demonstrated distinct enriched ontology categories relating to immune response, and involving beta cells (Fig. 2, Tables S10 and S11, respectively).
Table 4

The number of transcripts associated to the ‘second-order’ CpGs un-associated with smoking (LC-non-AwS) and the CpGs associated with smoking (LC-AwS).

CpG NameGeneChromosomeNo. of significant transcripts
cg10151248- PC associated
cg21570493 NA 119
cg13482620- B3GNTL1 associated
cg03160057 WDR66 12104
cg10115918 B3GNTL1 1711
cg05664421 ZFYVE28 410
cg13752749 CRB2 91
cg01676996 C6orf136 62
cg11654904 ZNF642 16
cg04909834 ARHGEF10 81
cg06836020 LOC440354 161
cg27665823 DECR2 161
LC-AwS CpG sites
cg06126421 NA 675
cg05575921 AHRR 543
cg01940273 NA 219
cg21566642 NA 210
cg03636183 F2RL3 197
cg21161138 AHRR 57
cg02451831 KIAA0087 74
cg25305703 NA 83
Figure 2

Network visualizations of gene ontology categories in which genes were significantly overrepresented, for the genes associated to the ‘second order’ CpGs un-associated with smoking (CpGs associated with cg10151248-PC and cg13482620-B3GNTL1) as well as those associated to the 23 CpGs associated with smoking (LC-AwS). Figure note: Biological processes categories are colored according to the significance of the overrepresentation and the gene ratio signifies the number of genes in each list relative to the number of genes in the ontology categories.

The number of transcripts associated to the ‘second-order’ CpGs un-associated with smoking (LC-non-AwS) and the CpGs associated with smoking (LC-AwS). Network visualizations of gene ontology categories in which genes were significantly overrepresented, for the genes associated to the ‘second order’ CpGs un-associated with smoking (CpGs associated with cg10151248-PC and cg13482620-B3GNTL1) as well as those associated to the 23 CpGs associated with smoking (LC-AwS). Figure note: Biological processes categories are colored according to the significance of the overrepresentation and the gene ratio signifies the number of genes in each list relative to the number of genes in the ontology categories. For the n2 = 23 LC-AwS CpG sites we identified 168 significant CpG-transcript pairs (Tables 4 and S12), corresponding to 100 unique transcripts and eight unique CpG sites. Overrepresentation analyses identified ontology categories distinctly different from those identified above and mostly related to responses to external stressors (Fig. 2 and Table S13).

Discussion

We combined genome-wide methylation and gene expression profiles from prospective blood samples in the NOWAC study to identify markers of lung cancer risk in Norwegian women and investigated to what extent these associations were driven by exposure to smoking. We identified 25 CpG sites associated with lung cancer risk, of which 23 were classified as LC-AwS, as they lost statistical significance after stringent adjustment for smoking exposure metrics. The two remaining CpG sites (cg10151248-PC and cg13482620- B3GNTL1) were classified as LC-non-AwS CpG sites, as they remained statistically significant after adjustment for CSI and demonstrated low correlation to the other 23 CpGs. For the majority of markers the case control difference was larger with shorter time to diagnosis. Of the 23 LC-AwS CpG sites, eight have been acknowledged as epigenetic signatures of cigarette smoking in a recent large meta-analysis of DNA methylation and smoking[14]. Pairwise correlations among the same eight LC-AwS CpG sites were also markedly higher than correlations with the other 15 LC-AwS CpG sites, supporting the evidence of these being linked to smoking. Furthermore, the same eight LC-AwS CpG sites were differentially methylated in smoking discordant twins. For the majority of the 23 LC-AwS markers, the association with risk was also stronger in the smoking related histological subtypes as compared to adenocarcinoma. The evidence to classify the 23 CpG sites as LC-AwS was not equally strong, but was considered sufficient for them to be treated separately as LC-AwS markers in the downstream analyses. The two LC-non-AwS CpG sites were consistently not associated to smoking exposure in all the analyses performed, which indicates that they are minimally associated with smoking. The association between LC-non-AwS CpG sites and risk was stronger in adenocarcinoma cases compared to the other more smoking-induced histological subtypes[35], which was not the trend observed in LC-AwS markers. Although hampered by statistical power in stratified analyses of never smokers, the same two LC-non-AwS markers were found to be statistically significant, along with 10 of the LC-AwS CpGs. Comparing pairs of smoking discordant twins revealed no difference in methylation levels at LC-non-AwS CpG sites. Finally, the two LC-non-AwS CpGs were not identified in the large meta-analyses for epigenetic smoking signatures[14] and we did not identify any single-nucleotide polymorphisms reported in the vicinity of these two sites[36], hence arguing against possible genetic confounding. Taken together, this supports that the two LC-non-AwS CpG sites, and in particular cg10151248-PC, are not associated with smoking and are distinct from the LC-AwS markers. To enable deeper investigation of the functional role of the methylation changes at the LC-non-AwS CpG sites, we defined ‘second order’ CpG sites as being associated with the methylation levels at any of these two LC-non-AwS CpG sites but not directly with lung cancer risk. None of the 160 CpG sites associated with cg10151248-PC were associated with smoking status in a recent large meta-analysis of DNA methylation and smoking status[14], and 22 of the 1,876 for cg13482620-B3GNTL1 (1.2%) were reported as LC-AwS markers. On this basis, we explored whether the two non-LC-AwS markers and complemented list of markers less associated with smoking could provide novel pathway information relevant for lung cancer development. The candidate methylation markers were further investigated by exploring the association between methylation levels at lung cancer related markers and gene expression data available in the same individuals. In order to ensure a comprehensive search for distinguishable pathways the full sets of markers were explored separately for the LC-AwS and LC-non-AwS CpG sites. Because regulation of gene expression through differential methylation obeys complex and multivariate mechanisms and can operate remotely (‘trans’ effects)[37], all assayed transcripts were investigated. No transcript was directly associated with methylation levels at cg10151248-PC and cg13482620-B3GNTL1 (neither PC or B3GNTL1 transcripts) which may not be surprising as both are highly methylated and show small, although significant, differences between cases and controls. However, we identified associations between methylation levels at the ‘second order’ CpG sites, and transcripts. The significant CpG-transcript pairs for the 160 cg10151248-PC-related CpG sites involved 19 transcripts, none of which were AwS markers either in our data or in the large meta-analysis of gene expression data[38], while 33 of 127 transcripts involved in cg13482620-B3GNTL1-related CpG-transcript pairs were identified in the large meta-analysis[38]. In the exploration of LC-non-AwS markers of lung cancer, which are likely to be more subtle signals than LC-AwS markers, an enriched CpG list was assessed when comparing potential functional roles of the different sets of markers identified. The gene ontology categories identified for the transcripts of the ‘second order’ CpG sites of cg10151248-PC and cg13482620-B3GNTL1, showed a large degree of overlap for categories linked to immune responses. The genes and consequently the categories indicated for cg10151248-PC clearly differed from those derived from LC-AwS CpG sites (categories linked to response to external stressors). Results from cg13482620-B3GNTL1 showed similarity with those from cg10151248-PC but also exhibited some common categories with LC-AwS sites. Thus indicating that a wide search provided novel information on potential pathways of relevance for lung cancer. There is very limited evidence in the literature linking the methylation or expression levels at the two LC-non-AwS CpG sites and health outcomes. Notably, the CpG methylations of PC and B3GNTL1 (located in unknown gene region and shelf region, respectively) were not associated with transcript expression for same genes. Nevertheless, hypermethylation at another CpG site in the gene B3GNTL1 has been observed in colorectal tumors compared to adjacent tissue[39] and the upregulated expression of this gene has been indicated as a potential marker for colorectal cancer[40]. Conversely, to the best of our knowledge there are no reported characterized description of the downstream consequences of altered methylation levels at cg10151248-PC. Residual confounding by smoking in our adjusted analyses cannot be disregarded. However, CSI appeared to be a stringent adjustment for exposure to smoking and the argumentation above supports the manner in which we classified markers as being LC-AwS or LC-non-AwS (or not directly for cg13482620-B3GNTL1). Further, adjustment for estimates of white blood cell composition were not emphasized here due to the potential over-adjustment by smoking. In conclusion, using blood-derived DNA methylation and gene expression profile from a prospective lung cancer study in Norwegian women, our study identified 25 differentially methylated CpG sites prior to lung cancer diagnosis, of which two appeared to be LC-non-AwS, in particular cg10151248-PC. These LC-non-AwS CpG sites seemed to be involved in biological pathways distinct from those related to LC-AwS CpG sites, and linked to immunological changes in blood prior to cancer diagnosis. Although the study size is limited, the use of a stringent significance level when assessing DNA methylation and gene expression data has revealed markers that represent prospective population-specific markers of smoking exposure as well as markers potentially relevant to lung cancer development and warrant further study. Supplementary Material
  37 in total

1.  Modeling smoking history: a comparison of different approaches.

Authors:  Karen Leffondré; Michal Abrahamowicz; Jack Siemiatycki; Bernard Rachet
Journal:  Am J Epidemiol       Date:  2002-11-01       Impact factor: 4.897

2.  clusterProfiler: an R package for comparing biological themes among gene clusters.

Authors:  Guangchuang Yu; Li-Gen Wang; Yanyan Han; Qing-Yu He
Journal:  OMICS       Date:  2012-03-28

Review 3.  Cohort profile: The Norwegian Women and Cancer Study--NOWAC--Kvinner og kreft.

Authors:  Eiliv Lund; Vanessa Dumeaux; Tonje Braaten; Anette Hjartåker; Dagrun Engeset; Guri Skeie; Merethe Kumle
Journal:  Int J Epidemiol       Date:  2007-07-20       Impact factor: 7.196

Review 4.  Lung cancer in never smokers: disease characteristics and risk factors.

Authors:  Athanasios G Pallis; Konstantinos N Syrigos
Journal:  Crit Rev Oncol Hematol       Date:  2013-08-04       Impact factor: 6.312

5.  Methylation markers for small cell lung cancer in peripheral blood leukocyte DNA.

Authors:  Liang Wang; Jeremiah A Aakre; Ruoxiang Jiang; Randolph S Marks; Yanhong Wu; Jun Chen; Stephen N Thibodeau; V Shane Pankratz; Ping Yang
Journal:  J Thorac Oncol       Date:  2010-06       Impact factor: 15.609

6.  Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains.

Authors:  Benjamin P Berman; Daniel J Weisenberger; Joseph F Aman; Toshinori Hinoue; Zachary Ramjan; Yaping Liu; Houtan Noushmehr; Christopher P E Lange; Cornelis M van Dijk; Rob A E M Tollenaar; David Van Den Berg; Peter W Laird
Journal:  Nat Genet       Date:  2011-11-27       Impact factor: 38.330

7.  Pathogenic mechanisms of lung adenocarcinoma in smokers and non-smokers determined by gene expression interrogation.

Authors:  Yunqian Hu; Guohan Chen
Journal:  Oncol Lett       Date:  2015-07-07       Impact factor: 2.967

8.  Comparison and combination of blood DNA methylation at smoking-associated genes and at lung cancer-related genes in prediction of lung cancer mortality.

Authors:  Yan Zhang; Lutz P Breitling; Yesilda Balavarca; Bernd Holleczek; Ben Schöttker; Hermann Brenner
Journal:  Int J Cancer       Date:  2016-08-22       Impact factor: 7.396

9.  Epigenome-wide association study in the European Prospective Investigation into Cancer and Nutrition (EPIC-Turin) identifies novel genetic loci associated with smoking.

Authors:  Natalie S Shenker; Silvia Polidoro; Karin van Veldhoven; Carlotta Sacerdote; Fulvio Ricceri; Mark A Birrell; Maria G Belvisi; Robert Brown; Paolo Vineis; James M Flanagan
Journal:  Hum Mol Genet       Date:  2012-11-21       Impact factor: 6.150

10.  Dynamics of the risk of smoking-induced lung cancer: a compartmental hidden Markov model for longitudinal analysis.

Authors:  Marc Chadeau-Hyam; Pascale Tubert-Bitter; Chantal Guihenneuc-Jouyaux; Gianluca Campanella; Sylvia Richardson; Roel Vermeulen; Maria De Iorio; Sandro Galea; Paolo Vineis
Journal:  Epidemiology       Date:  2014-01       Impact factor: 4.822

View more
  15 in total

1.  Transcriptomic signals in blood prior to lung cancer focusing on time to diagnosis and metastasis.

Authors:  Therese H Nøst; Marit Holden; Tom Dønnem; Hege Bøvelstad; Charlotta Rylander; Eiliv Lund; Torkjel M Sandanger
Journal:  Sci Rep       Date:  2021-04-01       Impact factor: 4.379

2.  Novel blood-based hypomethylation of SH3BP5 is associated with very early-stage lung adenocarcinoma.

Authors:  Rong Qiao; Runbo Zhong; Chunlan Liu; Feifei Di; Zheng Zhang; Ling Wang; Tian Xu; Yue Wang; Liping Dai; Wanjian Gu; Baohui Han; Rongxi Yang
Journal:  Genes Genomics       Date:  2021-11-16       Impact factor: 1.839

3.  Sub-GOFA: A tool for Sub-Gene Ontology function analysis in clonal mosaicism using semantic (logical) similarity.

Authors:  Tadaaki Katsuda; Noriko Sato; Kaoru Mogushi; Takeshi Hase; Masaaki Muramatsu
Journal:  Bioinformation       Date:  2022-01-31

4.  Epigenetic mechanisms of lung carcinogenesis involve differentially methylated CpG sites beyond those associated with smoking.

Authors:  Dusan Petrovic; Barbara Bodinier; Florence Guida; Marc Chadeau-Hyam; Sonia Dagnino; Matthew Whitaker; Maryam Karimi; Gianluca Campanella; Therese Haugdahl Nøst; Silvia Polidoro; Domenico Palli; Vittorio Krogh; Rosario Tumino; Carlotta Sacerdote; Salvatore Panico; Eiliv Lund; Pierre-Antoine Dugué; Graham G Giles; Gianluca Severi; Melissa Southey; Paolo Vineis; Silvia Stringhini; Murielle Bochud; Torkjel M Sandanger; Roel C H Vermeulen
Journal:  Eur J Epidemiol       Date:  2022-05-20       Impact factor: 12.434

5.  Epigenome-wide scan identifies differentially methylated regions for lung cancer using pre-diagnostic peripheral blood.

Authors:  Naisi Zhao; Mengyuan Ruan; Devin C Koestler; Jiayun Lu; Carmen J Marsit; Karl T Kelsey; Elizabeth A Platz; Dominique S Michaud
Journal:  Epigenetics       Date:  2021-05-19       Impact factor: 4.528

6.  Individual and joint contributions of genetic and methylation risk scores for enhancing lung cancer risk stratification: data from a population-based cohort in Germany.

Authors:  Haixin Yu; Janhavi R Raut; Ben Schöttker; Bernd Holleczek; Yan Zhang; Hermann Brenner
Journal:  Clin Epigenetics       Date:  2020-06-18       Impact factor: 6.551

7.  Lifetime Ultraviolet Radiation Exposure and DNA Methylation in Blood Leukocytes: The Norwegian Women and Cancer Study.

Authors:  Christian M Page; Vera Djordjilović; Therese H Nøst; Reza Ghiasvand; Torkjel M Sandanger; Arnoldo Frigessi; Magne Thoresen; Marit B Veierød
Journal:  Sci Rep       Date:  2020-03-11       Impact factor: 4.379

8.  Prognostic role of alternative splicing events in head and neck squamous cell carcinoma.

Authors:  Yanni Ding; Guang Feng; Min Yang
Journal:  Cancer Cell Int       Date:  2020-05-14       Impact factor: 5.722

9.  Reply to P-A Dugué et al.

Authors:  Pooja R Mandaviya; Joyce B J van Meurs; Sandra G Heil
Journal:  Am J Clin Nutr       Date:  2020-01-01       Impact factor: 7.045

10.  AHRR methylation in heavy smokers: associations with smoking, lung cancer risk, and lung cancer mortality.

Authors:  Laurie Grieshober; Stefan Graw; Matt J Barnett; Mark D Thornquist; Gary E Goodman; Chu Chen; Devin C Koestler; Carmen J Marsit; Jennifer A Doherty
Journal:  BMC Cancer       Date:  2020-09-22       Impact factor: 4.430

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.