Literature DB >> 34932879

Epigenome-wide three-way interaction study identifies a complex pattern between TRIM27, KIAA0226, and smoking associated with overall survival of early-stage NSCLC.

Xinyu Ji1, Lijuan Lin1, Juanjuan Fan1, Yi Li2, Yongyue Wei1,3,4, Sipeng Shen1, Li Su3, Andrea Shafer5, Maria Moksnes Bjaanaes6, Anna Karlsson7, Maria Planck7, Johan Staaf7, Åslaug Helland6,8, Manel Esteller9,10,11,12, Ruyang Zhang1,3,4, Feng Chen1,4,13,14, David C Christiani3,5.   

Abstract

The interaction between DNA methylation of tripartite motif containing 27 (cg05293407TRIM27 ) and smoking has previously been identified to reveal histologically heterogeneous effects of TRIM27 DNA methylation on early-stage non-small-cell lung cancer (NSCLC) survival. However, to understand the complex mechanisms underlying NSCLC progression, we searched three-way interactions. A two-phase study was adopted to identify three-way interactions in the form of pack-year of smoking (number of cigarettes smoked per day × number of years smoked) × cg05293407TRIM27  × epigenome-wide DNA methylation CpG probe. Two CpG probes were identified with FDR-q ≤ 0.05 in the discovery phase and P ≤ 0.05 in the validation phase: cg00060500KIAA0226 and cg17479956EXT2 . Compared to a prediction model with only clinical information, the model added 42 significant three-way interactions using a looser criterion (discovery: FDR-q ≤ 0.10, validation: P ≤ 0.05) had substantially improved the area under the receiver operating characteristic curve (AUC) of the prognostic prediction model for both 3-year and 5-year survival. Our research identified the complex interaction effects among multiple environment and epigenetic factors, and provided therapeutic target for NSCLC patients.
© 2021 The Authors. Molecular Oncology published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.

Entities:  

Keywords:  DNA methylation; non-small-cell lung cancer; overall survival; prognostic prediction; three-way interaction

Mesh:

Substances:

Year:  2022        PMID: 34932879      PMCID: PMC8807353          DOI: 10.1002/1878-0261.13167

Source DB:  PubMed          Journal:  Mol Oncol        ISSN: 1574-7891            Impact factor:   7.449


area under the receiver operating characteristic curve confidence interval concordance index differentially expressed genes false discovery rate gene ontology hazard ratio honest significant difference integrated covariates and Three‐Way INteraction Score Kyoto Encyclopedia of Genes and Genomes lung adenocarcinomas lung squamous cell carcinomas non‐small‐cell lung cancer quality control receiver operating characteristic standard deviation single nucleotide polymorphisms The Cancer Genome Atlas tumor‐infiltrating immune cells tumor immune microenvironment

Introduction

Lung cancer, as one of the most commonly diagnosed cancer, is the leading cause of cancer mortality worldwide, with an estimated 1.8 million deaths every year [1]. More than 85% of lung cancer cases are non‐small‐cell lung cancer (NSCLC), of which lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the two major histological subtypes [2, 3]. Despite recent advances in early diagnosis and therapy, the 5‐year survival rate is low, ranging from 4 to 17%, which depends on clinical characteristics [4]. Even with the same histological type and at the same clinical stage, there exists wide heterogeneity in overall survival because of varying therapeutic responses [5]. However, molecular mechanisms underlying these therapeutic responses remain largely unclear [6]. DNA methylation is a reversible epigenetic modification [7, 8], with aberrant methylation being an early precursor of cancer progression [9] and a possible therapeutic target for NSCLC [10, 11, 12, 13, 14, 15, 16, 17] and other cancers [18, 19]. Furthermore, epigenetic features are modifiable by various environmental exposures (e.g., cigarette smoking), and changes contribute to the development and progression of cancer as well. Cigarette smoking is one of the leading risk factors to lung cancer morbidity and mortality [20, 21]. Our previous study found that a two‐way interaction between cg05293407 and smoking reveals the histologically heterogeneous effect of TRIM27 DNA methylation on survival among early‐stage NSCLC patients [22]. As is well known, gene–gene and gene–environment interactions provide important clues regarding biologic mechanisms of complex diseases [23] and are associated with the overall survival of NSCLC [10, 11, 14, 22, 24]. However, tumorigenesis and progression of complex diseases, such as NSCLC, often involves a multistep, multigenic, and multicausal biological process [25]; interactions between two relevant factors may only provide limited information understanding the cancer process driven by multiple factors [26]. Therefore, high‐order interactions may be needed to characterize complex association patterns relating to cancer recurrence and survival [27, 28], including NSCLC [29, 30]. This is because some well‐confirmed low‐order interactions, such as cg05293407  × smoking, may also be subjected to the influence of a third epigenetic factor, which necessitates the study of three‐way interactions arising from confirmed two‐way interactions [30]. To our knowledge, epigenetic analysis with three‐way interaction has seldom been conducted because of the complex nature of such analysis. Hereby, we performed a two‐phase epigenome‐wide study of three‐way interactions on overall survival of early‐stage NSCLC based on the previously confirmed two‐way interaction between cg05293407 and smoking, using patients from international consortium and the Cancer Genome Atlas (TCGA) database.

Materials and methods

Study populations

Included in the study were early‐stage (stages I and II) NSCLC patients, with harmonized DNA methylation data and gene expression data, from five international study centers, including Harvard [31], Spain [32], Norway [33], and Sweden [34], and TCGA. All of these studies were approved by each Institutional Review Board, and patients provided written informed consent. The study methodologies conformed to the standards set by the Declaration of Helsinki and was approved by the local ethics committee.

Harvard

The Harvard Lung Cancer Study cohort was described previously [31]. All patients were recruited at Massachusetts General Hospital since 1992. They were newly diagnosed and histologically confirmed as primary NSCLC at the time of recruitment. Snap‐frozen tumor samples were taken from patients during complete resection of the therapeutic procedure. Total 149 early‐stage patients, who had complete survival information, were enrolled in our study. Tumor DNA was extracted from 5‐μm‐thick histopathologic sections. Each specimen was evaluated by a pathologist for amount (tumor cellularity > 70%) and quality of tumor cells. Among them, 26 patients had mRNA sequencing data.

Spain

As described previously [32], tumors were collected by surgical resection from 207 patients. DNA extraction was performed on tumor specimens (tumor cellularity > 50%). However, Spanish study center did not profile mRNA sequencing data.

Norway

The Norwegian study consisted of 132 LUAD patients with operable lung cancer tumors seen at Oslo University Hospital, Rikshospitalet, Norway, in 2006–2011 [33]. Tumor tissues collected during surgery were snap‐frozen in liquid nitrogen and stored at −80 °C until DNA isolation. None of the enrolled patients received chemotherapy or radiotherapy before surgery. In addition, 93 of these patients had mRNA sequencing data.

Sweden

Tumor tissue samples were collected from 36 patients with early‐stage NSCLC, at Skane University Hospital, Lund, Sweden [34]. The study was developed under the approval of the Regional Ethical Review Board in Lund, Sweden (Registration no. 2004/762 and 2008/702). Meanwhile, mRNA sequencing data were collected from 34 of these patients.

TCGA

TCGA was composed of 227 LUAD and 241 LUSC cases with complete overall survival time and covariates. Level‐1 HumanMethylation450 DNA methylation data were downloaded at October 1, 2015. In addition, 221 LUAD and 235 LUSC patients had gene expression data. The TCGA workgroup completed processing and quality control (QC) of mRNA sequencing data. Raw counts were normalized by RNA‐seq expectation maximization (RSEM). Level‐3 gene quantification data were downloaded to further check quality. All of the gene expression value was transformed on a log2 scale and standardized before association analysis.

Quality control procedures for DNA methylation data

DNA methylation was profiled using Infinium HumanMethylation450 BeadChips (Illumina Inc., SanDiego, CA, USA). All studies followed the same QC procedures prior to association study. GenomeStudio Methylation Module V1.8 (Illumina Inc.) was used to convert raw image data into beta values (continuous numbers ranging from 0 to 100%) for background subtraction and control normalization. Unqualified probes meeting any one of the following criteria were excluded: (a) failed detection (P > 0.05) in more than 5% samples; (b) coefficient of variance (CV) < 5%; (c) methylated or unmethylated in all samples; (d) common single nucleotide polymorphisms (SNP) located in the probe sequence or 10‐bp flanking regions; (e) cross‐reactive probes or cross‐hybridizing probes [35]; or (f) did not pass quality control in all centers. Samples with > 5% undetectable probes or missing of pack‐year of smoking were excluded. Methylation signals were further processed for quantile normalization (betaqn function in R package minfi) as well as type I and II probes correction (BMIQ function in R package lumi), and were further adjusted for batch effects (ComBat function in R package sva) according to the best pipeline by a comparative study [36]. The QC processes are detailed in Fig. S1.

Statistical analysis

A two‐phase study of three‐way interaction

The analysis flow was depicted in Fig. 1, showing a two‐phase study to detect three‐way interactions that affect overall survival for early‐stage NSCLC on an epigenome‐wide scale. In the discovery phase, a histology‐stratified Cox proportional hazards model adjusted for age, sex, clinical stage, and study centers. The analysis flow was used to identify CpG probes that interact with pack‐year of smoking ×cg05293407 . That is, we tried to find three‐way interactions in the form of pack‐year of smoking × cg05293407  × CpG probe, by using samples from Harvard, Spain, Norway, and Sweden. Hazard ratios (HR) and 95% confidence intervals (CI) were computed for per 1% level of methylation increment. Multiple testing corrections were performed by controlling the false discovery rate (FDR) at a 5% level [37]. In the validation phase, the identified three‐way interactions from the discovery phase were further tested by using samples from TCGA. Significant interactions were those meeting the following criteria: (a) FDR‐q ≤ 0.05 in the discovery phase and P ≤ 0.05 in the validation phase; and (b) with consistent effect directions across two phases. Kaplan–Meier curves were used to compare survival differences between patients with high or low methylation level of each identified CpG probe, under different levels of smoking intensity and cg05293407 methylation.
Fig. 1

Flow chart of study design and statistical analyses. Patients with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) from the Harvard, Spain, Norway, and Sweden cohorts were used in the discovery phase for screening, whereas data from the Cancer Genome Atlas (TCGA) were used for validation. Subgroup analyses were based on quartiles of iTWINS.

Flow chart of study design and statistical analyses. Patients with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) from the Harvard, Spain, Norway, and Sweden cohorts were used in the discovery phase for screening, whereas data from the Cancer Genome Atlas (TCGA) were used for validation. Subgroup analyses were based on quartiles of iTWINS.

iTWINS subgroup analysis

An integrated covariates and Three‐Way INteraction Score (iTWINS) was defined by a weighted linear combination of all components inside the significant three‐way interaction, with weights derived from the aforementioned multivariable Cox proportional hazards model. Specifically, where and were the coefficients estimated in the Cox model. We conducted the following analyses. (a) We divided patients into four subgroups by the quartiles of iTWINS and compared their survival by Kaplan–Meier curves. (b) To explore the molecular characteristics of these subgroups, we preformed gene expression differential analysis through ANOVA (aov function in R package multcomp) and Tukey's honest significant difference (HSD) test (TukeyHSD function in R package multcomp). Genes with FDR‐q ≤ 0.05 and HSD‐q ≤ 0.05 were regarded as differentially expressed genes (DEGs). (c) We performed functional annotation and gene enrichment pathway analyses of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) on the DEGs by using the WebGestaltR package in R [38, 39, 40]. FDR‐q ≤ 0.05 was considered statistically significant. (d) To explore the pattern of tumor immune microenvironment (TIME) among subgroups, we computed the immune score representing the level of infiltrating immune cells based on gene expression data by using the ESTIMATE algorithm [41]. (e) We used CIBERSORT, a deconvolution algorithm based on linear support vector regression, to quantify the compositions of 22 types of tumor‐infiltrating immune cells (TIICs) [42].

Development of a prognostic prediction model

We further selected three‐way interactions by using a looser criterion (discovery: FDR‐q ≤ 0.10, validation: P ≤ 0.05). Then, they were included in a prognostic prediction model of NSCLC together with cg05293407  × pack‐year of smoking. The 3‐ and 5‐year overall survival of patients were estimated for each subgroup by using the Kaplan–Meier method [43]. The prediction accuracy was illustrated using a receiver operating characteristic (ROC) curve and was measured by area under the ROC curve (AUC) by the R package survivalROC. The 95% CI and P value of the AUC increase were calculated by 1000‐time bootstrap resampling. The concordance index (C‐index), an average accuracy of predictive survival across follow‐up years, which ranges from 0.5 to 1.0, were calculated to estimate the predictive performance. Continuous variables were expressed as mean ± standard deviation (SD), and categorical variables were expressed in frequency (n) and proportion (%). Statistical analysis was performed using R version 3.6.1 (The R Foundation of Statistical Computing).

Results

After quality control, our epigenome‐wide DNA methylation data were composed of 311,891 CpG probes from 992 early‐stage NSCLC patients; 524 patients (N LUAD = 425 and N LUSC = 99) were in the discovery phase and 468 patients (N LUAD = 227 and N LUSC = 241) were in the validation phase. Demographic and clinical information for these patients were detailed in Table 1 and Table S1.
Table 1

Demographic and clinical descriptions for early‐stage NSCLC patients with DNA methylation data in five international study centers.

VariableDiscoveryValidationCombined
USA (N = 149)Spain a (N = 207)Norway (N = 132)Sweden (N = 36)All (N = 524)TCGA (N = 468)Overall (N = 992)
Age (years)67.78 ± 9.9265.77 ± 10.6665.4 ± 9.2872.25 ± 7.2266.7 ± 10.0466.52 ± 9.366.61 ± 9.69
Pack‐year of smoking51.49 ± 40.6243.24 ± 31.7226.93 ± 17.6222.64 ± 25.1940.06 ± 33.0047.05 ± 28.4043.36 ± 31.10
Sex
Female66 (44.30%)99 (47.83%)70 (53.03%)22 (61.11%)257 (49.05%)185 (39.53%)442 (44.56%)
Male83 (55.70%)108 (52.17%)62 (46.97%)14 (38.89%)267 (50.95%)283 (60.47%)550 (55.44%)
Smoking status
Never18 (12.08%)28 (13.53%)16 (12.12%)18 (50.00%)80 (15.27%)0 (0.00%)80 (8.06%)
Former81 (54.36%)113 (54.59%)74 (56.06%)11 (30.56%)279 (53.24%)323 (69.02%)602 (60.69%)
Current50 (33.56%)66 (31.88%)42 (31.82%)7 (19.44%)165 (31.49%)145 (30.98%)310 (31.25%)
Clinical stage
I102 (68.46%)167 (80.68%)92 (69.70%)34 (94.44%)395 (75.38%)301 (64.32%)696 (70.16%)
II47 (31.54%)40 (19.32%)40 (30.30%)2 (5.56%)129 (24.62%)167 (35.68%)296 (29.84%)
Histology
LUAD96 (64.43%)169 (81.64%)132 (100.00%)28 (77.78%)425 (81.11%)227 (48.50%)652 (65.73%)
LUSC53 (35.57%)38 (18.36%)0 (0.00%)8 (22.22%)99 (18.89%)241 (51.50%)340 (34.27%)
Chemotherapy
No140 (93.96%)166 (91.21%)101 (76.52%)25 (89.29%)432 (87.98%)151 (75.88%)583 (84.49%)
Yes9 (6.04%)16 (8.79%)31 (23.48%)3 (10.71%)59 (12.02%)48 (24.12%)107 (15.51%)
Unknown0250833269302
Radiotherapy
No130 (87.25%)172 (94.51%)131 (99.24%)28 (100.00%)461 (93.89%)190 (95.48%)651 (94.35%)
Yes19 (12.75%)10 (5.49%)1 (0.76%)0 (0.00%)30 (6.11%)9 (4.52%)39 (5.65%)
Unknown0250833269302
Adjuvant therapy b
No125 (83.89%)157 (86.26%)100 (75.76%)25 (89.29%)407 (82.89%)146 (73.37%)553 (80.14%)
Yes24 (16.11%)25 (13.74%)32 (24.24%)3 (10.71%)84 (17.11%)53 (26.63%)137 (19.86%)
Unknown0250833269302
Survival year c
Median (95% CI)6.61 (5.14–7.49)3.83 (3.21–4.46)5.40 (5.16–5.77)3.25 (2.09–4.39)5.01 (4.53–5.25)0.58 (0.50–0.70)2.31 (2.00–2.64)
Censoring rate18.79%54.59%68.94%52.78%47.90%76.07%61.19%

TCGA, The Cancer Genome Atlas; 95% CI, 95% confidence interval; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma.

Spain center is a collaborative study center, containing samples from Spain, Italy, the UK, France, and the USA.

Adjuvant therapy included chemotherapy and/or radiotherapy.

Restricted mean survival time was given because median was not available; proportion of samples lost to follow‐up or alive at the end of study.

Demographic and clinical descriptions for early‐stage NSCLC patients with DNA methylation data in five international study centers. TCGA, The Cancer Genome Atlas; 95% CI, 95% confidence interval; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma. Spain center is a collaborative study center, containing samples from Spain, Italy, the UK, France, and the USA. Adjuvant therapy included chemotherapy and/or radiotherapy. Restricted mean survival time was given because median was not available; proportion of samples lost to follow‐up or alive at the end of study.

Two significant three‐way interactions identified in the two‐phase study

In the discovery phase, 91 three‐way interactions (FDR‐q ≤ 0.05) were identified, while only 2 of them remained significant (P ≤ 0.05) in the validation phase and showed robust association in the combined data (Table 2). Two CpG probes, together with pack‐year of smoking and cg05293407 , had significant three‐way interaction effects on NSCLC survival, including cg00060500 (HR interaction = 0.993, 95% CI: 0.990‐0.996, P = 7.79 × 10‐6, FDR‐q = 0.039 in the discovery phase; HR interaction = 0.992, 95% CI: 0.986‐0.999, P = 0.021 in the validation phase; HR interaction = 0.993, 95% CI: 0.991‐0.996, P = 8.91 × 10−7 in the combined data) and cg17479956 (HR interaction = 0.997, 95% CI: 0.996‐0.998, P = 1.16 × 10−5, FDR‐q = 0.046 in the discovery phase; HR interaction = 0.993, 95% CI: 0.987‐0.998, P = 0.011 in the validation phase; HR interaction = 0.997, 95% CI: 0.996‐0.999, P = 5.72 × 10‐5 in the combined data); however, these two probes did not have significant marginal effects (Table S2) and were significantly associated with the expression of their corresponding genes (KIAA0226 and EXT2), respectively (Fig. S2).
Table 2

Association results of two three‐way interactions in the discovery phase, the validation phase, and the combined data.

Discovery phaseValidation phaseCombined data
HR95% CI P FDR‐q HR95% CI P HR95% CI P
cg052934070.130.030.534.21E‐030.230.016.940.3960.140.040.492.02E‐03
cg000605000.670.510.884.24E‐030.590.291.190.1420.650.510.836.09E‐04
Pack‐year0.900.860.942.06E‐060.880.800.970.0120.900.860.931.04E‐07
cg05293407 × cg000605001.261.081.483.91E‐031.250.841.850.2711.261.091.451.30E‐03
cg05293407 × pack‐year1.071.041.102.32E‐061.071.011.140.0161.071.041..091.14E‐07
cg00060500 × pack‐year1.011.011.024.23E‐061.011.001.030.0191.011.011.026.62E‐07
cg05293407 × cg00060500 × pack‐year0.9930.9900.9967.79E‐060.0390.9920.9860.9990.0210.9930.9910.9968.91E‐07
cg052934070.220.100.464.97E‐050.200.021.760.1460.320.170.615.23E‐04
cg174799560.730.640.832.57E‐060.630.410.970.0340.770.680.874.12E‐05
pack‐year0.950.930.971.37E‐050.910.850.990.0190.950.930.971.74E‐05
cg05293407 × cg174799561.181.101.271.94E‐061.321.031.700.0311.141.071.224.18E‐05
cg05293407 × pack‐year1.031.021.043.15E‐061.061.011.110.0151.031.021.044.69E‐06
cg17479956 × pack‐year1.0051.0031.0082.46E‐051.0111.0021.0190.0171.0051.0031.0078.99E‐05
cg05293407 × cg17479956 × pack‐year0.9970.9960.9981.16E‐050.0460.9930.9870.9980.0110.9970.9960.9995.72E‐05
Association results of two three‐way interactions in the discovery phase, the validation phase, and the combined data. To visualize the three‐way interactions, we generated a 3D figure illustrating, for example, the varying effects of cg00060500 at different levels of pack‐year of smoking and cg05293407 (Fig. 2A). As the smoking intensity increased and methylation level of cg05293407 decreased, we observed an enhanced risk associated with cg00060500 ; the risk of cg00060500 was elevated as smoking intensity decreased and methylation level of cg05293407 increased.
Fig. 2

The three‐way interaction pattern between pack‐year of smoking, cg05293407 and cg00060500 . (A) The pattern illustrated by a 3D plot. (B‐E) Kaplan–Meier survival analysis of cg00060500 stratified by pack‐year of smoking and cg05293407 . The number of patients in each subgroup was 266, 282, 211 and 233. Hazard ratio (HR), 95% CI, and P‐values were derived from a Cox proportional hazards regression model adjusted for age, sex, clinical stage and study centers.

The three‐way interaction pattern between pack‐year of smoking, cg05293407 and cg00060500 . (A) The pattern illustrated by a 3D plot. (B‐E) Kaplan–Meier survival analysis of cg00060500 stratified by pack‐year of smoking and cg05293407 . The number of patients in each subgroup was 266, 282, 211 and 233. Hazard ratio (HR), 95% CI, and P‐values were derived from a Cox proportional hazards regression model adjusted for age, sex, clinical stage and study centers. To more clearly illustrate the effect modifiers, we compared the effect of cg00060500 (high vs low defined by median) on survival in four subgroups, defined by the median of pack‐year of smoking and cg05293407 (Fig. 2B‐E). For patients with low level of pack‐year of smoking, the effect of cg00060500 on survival was harmful (HR High vs Low = 1.70, 95% CI: 1.02‐2.83, P = 0.042) when cg05293407 was high (Fig. 2B), and was protective (HR High vs Low = 0.34, 95% CI: 0.17‐0.66, P = 1.37 × 10‐3) when cg05293407 was low (Fig. 2C); for patients with high level of pack‐year of smoking, cg00060500 had a protective effect (HR High vs Low = 0.50, 95% CI: 0.29‐0.86, P = 0.012) when cg05293407 was high (Fig. 2D), and a detrimental effect (HR High vs Low = 1.65, 95% CI: 1.02‐2.68, P = 0.041) when cg05293407 was low (Fig. 2E). Similar patterns were also observed for the three‐way interaction of pack‐year of smoking × cg05293407  × cg17479956 (Fig. S3). As demonstrated in Fig. S4A, these two three‐way interaction modules were highly correlated (r = 0.82, P = 2.66 × 10‐241), and KIAA0226 and EXT2 gene expression were also correlated significantly (r = 0.61, P = 7.71 × 10‐52) (Fig. S4B). Therefore, the subsequent analysis focused on the interaction involving cg00060500 that had a lower P value.

Significant heterogeneity among iTWINS subgroups

We categorized patients into four subgroups based on the quartiles of iTWINS, with Groups 1 and 4 representing the lowest and highest risk patients, respectively. As shown in Fig. 3A, patients with higher iTWINS had a shorter median survival year (0.97, 95% CI: 0.75–1.25) and a higher risk of mortality (HR Group4 vs 1 = 3.98, 95% CI: 2.98–5.33, P = 1.74 × 10‐20; HR Group3 vs 1 = 2.23, 95% CI: 1.67–2.98, P = 5.13 × 10‐8; HR Group2 vs 1 = 1.39, 95% CI: 1.03–1.87, P = 0.033) (Fig. 3B). For comparison, we also divided patients into four subgroups based on the clinical score, a weighted linear combination of demographic and clinical variables, as shown in gray lines in Fig. 3A. iTWINS obviously outperformed the clinical score in discriminability, and the association between iTWINS and overall survival remained significant in all subgroup analyses stratified by various covariates (Fig. 4).
Fig. 3

Estimated survival curves for patients among iTWINS subgroups. (A) Kaplan–Meier survival curves for patients grouped by iTWINS subgroups. Patients were categorized into four subgroups by using the quantile of iTWINS as the cutoffs. The number of patients in groups 1‐4 was 248. (B) Hazard ratio (HR) and P‐values were derived from the Cox proportional hazards model for patients with different quartile levels of the iTWINS.

Fig. 4

Forest plots of results from stratification analysis of iTWINS. Hazard ratio (HR) with 95% CI of iTWINS on non‐small‐cell lung cancer (NSCLC) survival in various subgroups is stratified by clinical characteristics. LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma. Hazard ratio (HR), 95% CI, and P‐values were derived from a Cox proportional hazards regression model for patients in different subgroups.

Estimated survival curves for patients among iTWINS subgroups. (A) Kaplan–Meier survival curves for patients grouped by iTWINS subgroups. Patients were categorized into four subgroups by using the quantile of iTWINS as the cutoffs. The number of patients in groups 1‐4 was 248. (B) Hazard ratio (HR) and P‐values were derived from the Cox proportional hazards model for patients with different quartile levels of the iTWINS. Forest plots of results from stratification analysis of iTWINS. Hazard ratio (HR) with 95% CI of iTWINS on non‐small‐cell lung cancer (NSCLC) survival in various subgroups is stratified by clinical characteristics. LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma. Hazard ratio (HR), 95% CI, and P‐values were derived from a Cox proportional hazards regression model for patients in different subgroups. Exploring the molecular characteristics of these iTWINS subgroups, we performed differential expression analysis and identified a total of 3150 DEGs, which were significantly enriched in 106 KEGG pathways, including NSCLC (Fig. 5A). In addition, GO enrichment analysis identified 284 significant biological process pathways (Fig. 5B), 76 significant cellular component pathways (Fig. 5C) and 90 significant molecular function pathways (Fig. 5D). Moreover, iTWINS was significantly but negatively correlated with the immune score (r = −0.096, P = 0.018) (Fig. S5A). For example, compared to Group 4 of iTWINS that had the worst survival, Group 1 with the best survival had the highest immune score (P = 0.0092) (Fig. S5B), indicating that abundant immune cells are beneficial for NSCLC survival. Further, the compositions of 14 types of immune cells were differently distributed among iTWINS subgroups (Fig. 6A); the correlation of iTWINS with each immune cell composition varied, including negative (e.g., with T‐cell subsets gamma delta, r = −0.29, P = 2.67 × 10‐13) and positive correlations (e.g., with M2 macrophages, r = 0.32, P = 3.17 × 10‐16) (Fig. 6B).
Fig. 5

The functional enrichment analyses of DEGs. (A) Top 36 significant KEGG pathways. (B) Top 36 significant biological process pathways. (C) Top 36 significant cellular component pathways and (D) top 36 significant molecular function pathways.

Fig. 6

The association analysis between immune cells and iTWINS. (A) Comparisons of the abundances of 22 immune cells in iTWINS subgroups. ‘ns’ means P > 0.05, ∗ means P < 0.05, ∗∗∗ means P < 0.001 and ∗∗∗∗ means P < 0.0001. (B) Correlation analysis between immune cells and iTWINS. Yellow bars meant correlation coefficient > 0 and P ≤ 0.05, blue bars meant correlation coefficient< 0 and P ≤ 0.05, and gray bars meant P > 0.05. Correlation coefficients and hypothesis tests were based on Pearson correlation tests.

The functional enrichment analyses of DEGs. (A) Top 36 significant KEGG pathways. (B) Top 36 significant biological process pathways. (C) Top 36 significant cellular component pathways and (D) top 36 significant molecular function pathways. The association analysis between immune cells and iTWINS. (A) Comparisons of the abundances of 22 immune cells in iTWINS subgroups. ‘ns’ means P > 0.05, ∗ means P < 0.05, ∗∗∗ means P < 0.001 and ∗∗∗∗ means P < 0.0001. (B) Correlation analysis between immune cells and iTWINS. Yellow bars meant correlation coefficient > 0 and P ≤ 0.05, blue bars meant correlation coefficient< 0 and P ≤ 0.05, and gray bars meant P > 0.05. Correlation coefficients and hypothesis tests were based on Pearson correlation tests.

Three‐way interaction empowered prognostic prediction model

We built a prognostic prediction model of NSCLC by incorporating significant three‐way interactions. The model with only demographic and clinical variables had a limited prediction ability (AUC3‐year = 0.64, AUC5‐year = 0.68, C‐index = 0.62), and the prediction accuracy slightly increased by adding the cg05293407  × pack‐year of smoking (AUC3‐year = 0.67, AUC5‐year = 0.71, C‐index = 0.63) or the two significant three‐way interactions (AUC3‐year = 0.72, AUC5‐year = 0.74, C‐index = 0.66). However, by adding 42 three‐way interactions obtained by using a looser criterion (discovery: FDR‐q ≤ 0.10, validation: P ≤ 0.05), the AUC increased by 32.8% (95% CI: 32.6%‐33.1%, P = 2.20 × 10‐16) and 29.4% (95% CI: 29.2%‐29.6%, P = 2.20 × 10‐16) for 3‐ and 5‐year survival (AUC3‐year = 0.85, AUC5‐year = 0.88, C‐index = 0.76), respectively (Fig. 7).
Fig. 7

ROC curves for different predictive models using the covariates (clinical information), two‐way interaction (cg05293407 and pack‐year of smoking) and three‐way interactions (cg05293407 , pack‐year of smoking and another CpG probe) with FDR‐q ≤ 0.05 or FDR‐q ≤ 0.10. (A) Three‐year survival prediction. (B) Five‐year survival prediction. The AUC increase (%) was evaluated by comparing the model with that with only the covariates. P‐values and 95% CIs were calculated by using 1000 bootstrap samples and t‐tests. AUC, area under the receiver operating characteristic curve; ROC, receiver operating characteristic.

ROC curves for different predictive models using the covariates (clinical information), two‐way interaction (cg05293407 and pack‐year of smoking) and three‐way interactions (cg05293407 , pack‐year of smoking and another CpG probe) with FDR‐q ≤ 0.05 or FDR‐q ≤ 0.10. (A) Three‐year survival prediction. (B) Five‐year survival prediction. The AUC increase (%) was evaluated by comparing the model with that with only the covariates. P‐values and 95% CIs were calculated by using 1000 bootstrap samples and t‐tests. AUC, area under the receiver operating characteristic curve; ROC, receiver operating characteristic.

Discussion

We were among the first few to perform an epigenome‐wide three‐way interaction study on overall survival of early‐stage NSCLC. We identified two CpG probes (cg00060500 and cg17479956 ), each of which, by interacting with cg05293407  × pack‐year of smoking, displayed a significant three‐way interaction effect on the overall survival of NSCLC. The proposed iTWINS had some interesting properties. It enabled us to categorize patients with various risk levels of mortality into four subgroups, which involved a series of significant DEGs belonging to several biological pathways. Also, iTWINS was negatively correlated with the immune score, representing the level of infiltrating immune cells. Finally, prognostic models with three‐way interactions substantially improved the prediction accuracy. It is known that gene–gene and gene–environment interactions on an epigenetic level play a crucial role in revealing biological mechanisms of complex diseases [23], including NSCLC [11, 24]. Emerging evidence from higher‐order interaction studies with high‐throughput data [44, 45] indicates that accommodation of such complex association patterns can improve the predictive power [46]. Building upon our previous study that identified two‐way interactions contributing to the improvement of the prediction accuracy [14], the current study found that inclusion of three‐way interactions might largely enhance prediction performance for both 3‐ and 5‐year survival. Therefore, complex association patterns (e.g., higher‐order interactions) among multiple factors should be factored in for studies of complex diseases, such as NSCLC. To further elaborate on the three‐way interactions among pack‐year of smoking, cg05293407 and cg00060500 /cg17479956 , we generated 3D figures and observed varying effects of cg00060500 /cg17479956 at different levels of pack‐year of smoking and cg05293407 . Nevertheless, EXT2 is found to be the likely antigens in the immune complexes and is possibly the target antigens in autoimmune membranous nephropathy [47]. Meanwhile, KIAA0226 is identified as a multifaceted immune modulatory protein that regulates autophagy and phagocytosis [48]. They were all related to immunity, and their gene expression was significantly correlated. Thus, the subsequent analysis only focused on the interaction involving cg00060500 . Biologically, KIAA0226 was identified as a novel Beclin‐1‐binding protein, and hence dubbed as Rubicon (RUN domain and cysteine‐rich domain containing, Beclin‐1‐interacting protein) [49, 50]. Rubicon is a multifaceted immune modulatory protein that regulates autophagy and phagocytosis [48]. Cigarette smoke induced extrinsic apoptosis partly dependent on autophagy protein Beclin‐1 and caused a reduction of Beclin‐1 expression [51, 52]. Furthermore, decreased expression of Beclin‐1 may lead to changes in Rubicon expression, thus affecting cross‐regulation between apoptosis and autophagy [53]. That said, biological mechanisms underlying this three‐way interaction need to be more systematically studied. Notably, besides the heterogeneous survival among four epigenetic iTWINS subgroups, there were a series of significant transcriptional DEGs enriched in cancer related pathways, including NSCLC pathway and EGFR tyrosine kinase inhibitor resistance pathway which directly affected the overall survival of NSCLC [54]. In addition, the subgroup with the lowest iTWINS had the best survival and the highest immune score, consistent with the previous study [55]. Furthermore, among 13 immune cell types identified in NSCLC tumors [56], 7 of them were significantly correlated with iTWINS in our study, including T cells CD4, T cells CD8, B cells, macrophages, eosinophils, mast cells, and dendritic cells. For example, M2 macrophages were positively correlated with iTWINS, indicating the higher abundance of M2 macrophages, the worse the survival of NSCLC, as confirmed by several studies [57, 58]. Our iTWINS subgroup analysis further indicated that potential NSCLC‐related pathways and TIICs might be the driven factors of lung cancer progression. Our study has several strengths. First, to our knowledge, this is the first three‐way interaction study between DNA methylation of CpG probes and pack‐year of smoking, shedding some light on the tumor recurrence and progression. Second, we controlled false positives in the discovery phase and also performed external validation using an independent population. In our study, the final significant signals were defined as these with FDR‐q ≤ 0.05 in the discovery phase and meanwhile with P ≤ 0.05 in the validation phase. Thus, the overall false positive rate is less than 0.05, which is approximatively around 0.0025 (0.05 × 0.05). Such two‐phase design is a very popular and quite conservative manner used in the omics analysis to control false positives. Third, 3D plots were designed and used to illustrate the complex three‐way interactions, facilitating interpretation and dissemination of our results. Fourth, by exploring the molecular characteristics of different subgroups defined by iTWINS, we gained a better understanding of the heterogeneous response to immunotherapy, and therefore, a potential cause of survival heterogeneity. Finally, our prognostic model by incorporating three‐way interactions may aid physicians in making clinical decisions or guiding immunotherapy. We acknowledge limitations. First, the study did not elucidate the biological mechanism underlying the identified three‐way interactions, although the results of immune‐related and enrichment analyses might provide insight into the functional mechanisms. Second, the censoring rate (76.07%) of survival time for the TCGA cohort is relatively high, since early‐stage NSCLC patients need longer follow‐up time. Thus, the validation phase using TCGA population had low statistical power. Nonetheless, the three‐way interactions remained significant in TCGA. Finally, as the majority of our population was Caucasian (89%), generalization of the results to the other ethnicity groups should be cautioned.

Conclusion

Our study identified two CpG probes (cg00060500 and cg17479956 ), together with pack‐year of smoking and cg05293407 , had genome‐wide significant three‐way interaction effects on NSCLC survival. iTWINS, which involved in TIME, can distinguish high risk NSCLC patients susceptible to mortality, by significantly improving prognostic prediction accuracy for NSCLC survival. Our findings suggested that lung cancer progression may be driven by complex interacting effects that involve multiple environmental and epigenetic factors, which can provide potential therapeutic targets of NSCLC.

Conflict of interest

The authors declare no conflict of interest.

Peer review

The peer review history for this article is available at https://publons.com/publon/10.1002/1878‐0261.13167.

Author contributions

XJ, LL, JF, RZ, FC, and DCC contributed to the study design. RZ, LS, AS, MMB, AK, MP, JS, ÅH, ME, and DCC contributed to data collection. XJ, LL, JF, YL, YW, SS, RZ, FC, and DCC performed statistical analysis and interpretation and drafted the manuscript. XJ, LL, JF, YL, YW, and SS revised the manuscript. All authors contributed to critical revision of the manuscript and approved its final version. Financial support and study supervision were provided by RZ, FC, and DCC.

Consent for publication

All participants or their surrogate care providers gave written informed consent. All authors have reviewed the manuscript and consented for publication.

Web resources

TCGA: https://portal.gdc.cancer.gov Table S1. Demographic and clinical descriptions of early‐stage NSCLC patients with gene expression data in four international study centers. Table S2. The association results of variables derived from Cox proportional hazards model adjusted for covariates in NSCLC samples. Fig. S1. Quality control processes for DNA methylation data. Fig. S2. Correlation between DNA methylation probe and its corresponding gene expression. Fig. S3. The three‐way interaction pattern between pack‐year of smoking, cg05293407 and cg17479956 . Fig. S4. Association between KIAA0226 and EXT2. Fig. S5. The association analysis between immune score and iTWINS. Click here for additional data file.
  57 in total

1.  Pack-years of cigarette smoking as a prognostic factor in patients with stage IIIB/IV nonsmall cell lung cancer.

Authors:  Yelena Y Janjigian; Kevin McDonnell; Mark G Kris; Ronglai Shen; Camelia S Sima; Peter B Bach; Naiyer A Rizvi; Gregory J Riely
Journal:  Cancer       Date:  2010-02-01       Impact factor: 6.860

Review 2.  The role of gene-environment interaction in the aetiology of human cancer: examples from cancers of the large bowel, lung and breast.

Authors:  L A Mucci; S Wedren; R M Tamimi; D Trichopoulos; H O Adami
Journal:  J Intern Med       Date:  2001-06       Impact factor: 8.989

Review 3.  Non-small-cell lung cancers: a heterogeneous set of diseases.

Authors:  Zhao Chen; Christine M Fillmore; Peter S Hammerman; Carla F Kim; Kwok-Kin Wong
Journal:  Nat Rev Cancer       Date:  2014-08       Impact factor: 60.716

4.  Incorporating higher-order representative features improves prediction in network-based cancer prognosis analysis.

Authors:  Shuangge Ma; Michael R Kosorok; Jian Huang; Ying Dai
Journal:  BMC Med Genomics       Date:  2011-01-12       Impact factor: 3.063

5.  A multi-omic study reveals BTG2 as a reliable prognostic marker for early-stage non-small cell lung cancer.

Authors:  Sipeng Shen; Ruyang Zhang; Yichen Guo; Elizabeth Loehrer; Yongyue Wei; Ying Zhu; Qianyu Yuan; Sebastian Moran; Thomas Fleischer; Maria M Bjaanaes; Anna Karlsson; Maria Planck; Johan Staaf; Åslaug Helland; Manel Esteller; Li Su; Feng Chen; David C Christiani
Journal:  Mol Oncol       Date:  2018-05-04       Impact factor: 6.603

6.  Trans-omics biomarker model improves prognostic prediction accuracy for early-stage lung adenocarcinoma.

Authors:  Xuesi Dong; Ruyang Zhang; Jieyu He; Linjing Lai; Raphael N Alolga; Sipeng Shen; Ying Zhu; Dongfang You; Lijuan Lin; Chao Chen; Yang Zhao; Weiwei Duan; Li Su; Andrea Shafer; Moran Salama; Thomas Fleischer; Maria Moksnes Bjaanæs; Anna Karlsson; Maria Planck; Rui Wang; Johan Staaf; Åslaug Helland; Manel Esteller; Yongyue Wei; Feng Chen; David C Christiani
Journal:  Aging (Albany NY)       Date:  2019-08-21       Impact factor: 5.682

7.  Immune Cell Composition in Human Non-small Cell Lung Cancer.

Authors:  Branislava Stankovic; Heidi Anine Korsmo Bjørhovde; Renate Skarshaug; Henrik Aamodt; Astri Frafjord; Elisabeth Müller; Clara Hammarström; Kahsai Beraki; Espen S Bækkevold; Per Reidar Woldbæk; Åslaug Helland; Odd Terje Brustugun; Inger Øynebråten; Alexandre Corthay
Journal:  Front Immunol       Date:  2019-02-01       Impact factor: 7.561

8.  Targeting c-Myc to Overcome Acquired Resistance of EGFR Mutant NSCLC Cells to the Third-Generation EGFR Tyrosine Kinase Inhibitor, Osimertinib.

Authors:  Lei Zhu; Zhen Chen; Hongjing Zang; Songqing Fan; Jiajia Gu; Guojing Zhang; Kevin D-Y Sun; Qiming Wang; Yong He; Taofeek K Owonikoko; Suresh S Ramalingam; Shi-Yong Sun
Journal:  Cancer Res       Date:  2021-07-21       Impact factor: 12.701

9.  An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform.

Authors:  Francesco Marabita; Malin Almgren; Maléne E Lindholm; Sabrina Ruhrmann; Fredrik Fagerström-Billai; Maja Jagodic; Carl J Sundberg; Tomas J Ekström; Andrew E Teschendorff; Jesper Tegnér; David Gomez-Cabrero
Journal:  Epigenetics       Date:  2013-02-19       Impact factor: 4.528

10.  Independent Validation of Early-Stage Non-Small Cell Lung Cancer Prognostic Scores Incorporating Epigenetic and Transcriptional Biomarkers With Gene-Gene Interactions and Main Effects.

Authors:  Ruyang Zhang; Chao Chen; Xuesi Dong; Sipeng Shen; Linjing Lai; Jieyu He; Dongfang You; Lijuan Lin; Ying Zhu; Hui Huang; Jiajin Chen; Liangmin Wei; Xin Chen; Yi Li; Yichen Guo; Weiwei Duan; Liya Liu; Li Su; Andrea Shafer; Thomas Fleischer; Maria Moksnes Bjaanæs; Anna Karlsson; Maria Planck; Rui Wang; Johan Staaf; Åslaug Helland; Manel Esteller; Yongyue Wei; Feng Chen; David C Christiani
Journal:  Chest       Date:  2020-02-28       Impact factor: 9.410

View more
  1 in total

1.  Epigenome-wide gene-age interaction study reveals reversed effects of MORN1 DNA methylation on survival between young and elderly oral squamous cell carcinoma patients.

Authors:  Ziang Xu; Yan Gu; Jiajin Chen; Xinlei Chen; Yunjie Song; Juanjuan Fan; Xinyu Ji; Yanyan Li; Wei Zhang; Ruyang Zhang
Journal:  Front Oncol       Date:  2022-07-28       Impact factor: 5.738

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.