
A novel CpG island methylation panel predicts survival in lung adenocarcinomas.

Pingzhao Yan1, Xiaohua Yang2, Jianhua Wang1, Shichang Wang1, Hong Ren3.   

Abstract

The lack of clinically useful biomarkers compromises the personalized management of lung adenocarcinomas (ADCs); epigenetic events, and DNA methylation in particular, have exhibited potential value as biomarkers. By comparing genome-wide DNA methylation data of paired lung ADCs and normal tissues from 6 public datasets, cancer-specific CpG island (CGI) methylation changes were identified using a pre-specified criterion. Correlations between DNA methylation and expression data for each gene were assessed by Pearson correlation analysis. A prognostically relevant CGI methylation signature was constructed by risk-score analysis, and was validated using a training-validation approach. Survival data were analyzed by log-rank test and Cox regression model. In total, 134 lung ADC-specific CGI CpGs were identified, among which a panel of 9 CGI loci was selected as prognostic candidates and used to construct a risk-score signature. The novel CGI methylation signature classified patients into distinct prognostic subgroups across different datasets, and was demonstrated to be a potent independent prognostic factor for overall survival time of patients with lung ADCs. In addition, cancer-specific CGI hypomethylation of RPL39L, along with the corresponding gene expression, provided optimized prognostication of lung ADCs. In summary, cancer-specific CGI methylation aberrations are optimal candidates for novel biomarkers of lung ADCs; the 9-CpG methylation panel and hypomethylation of RPL39L exhibited particularly promising significance.


Keywords:  CpG island; DNA methylation; biomarker; lung adenocarcinomas; prognostication

Year:  2019        PMID: 31423161      PMCID: PMC6607393          DOI: 10.3892/ol.2019.10431

Source DB:  PubMed          Journal:  Oncol Lett        ISSN: 1792-1074            Impact factor:   2.967


Introduction

Non-small cell lung cancer (NSCLC) is the leading cause of cancer-associated mortality worldwide, and adenocarcinoma (ADC) is its most common histological subtype (1). Despite multiple treatment modalities, NSCLC is commonly associated with unfavorable outcomes, and has a 5-year survival rate of <20% (1). Several factors are known to contribute to the poor prognosis of patients with NSCLC, including late diagnosis of disease and a lack of effective drugs (2). NSCLCs are a clinically and molecularly heterogeneous group of diseases, and survival outcome or treatment response varies among individuals (3). Therefore, the absence of clinically informative biomarkers for stratifying different risk subgroups or guiding targeted treatment decisions is also notable. Efforts to identify potential biomarkers have been made, with a focus on genetic alterations including somatic mutations, copy number variations and gene expression; however, few are suitable for routine use in the field of NSCLC treatment (3–5). Epigenetic changes, and particularly those at the DNA methylation level, are implicated in tumor initiation and progression (6). Hypermethylation of CpG islands (CGI) at the promoter regions of tumor-suppressor genes and consequent transcriptional silencing represents the best-known epigenetic event in cancer biology (6). As a novel molecular candidate for cancer biomarker discovery, DNA methylation has numerous advantages over the genetic alteration- or gene expression-based biomarkers for clinical application, including reliable DNA samples, stable methylation changes, informative biological relevance and drug-induced reversibility (7). Early efforts with candidate-gene approaches have identified a number of useful prognostic biomarkers based on the CGI methylation status of key genes, including Ras association domain family 1 isoform A (RASSF1A), runt-related transcription factor 3, and deleted in esophageal and lung cancer 1, in NSCLC (8). 
Unfortunately, these single-gene methylation events did not demonstrate consistent prognostic ability in independent validation studies, and have therefore not changed routine practice (8). High-throughput genome-wide DNA methylation profiling techniques are increasingly used to detect cancer genome markers. These methods may provide a comprehensive and unbiased identification of prognostic DNA methylation events throughout the epigenome, eventually leading to the improvement of personalized medicine for NSCLC (3). The present study aimed to identify clinically useful epigenetic biomarkers from lung ADC-specific CGI methylation changes at different gene regions using genome-wide DNA methylation microarray data of lung ADCs and matched normal tissues from 6 publicly available datasets. Accordingly, a 9-CpG CGI methylation panel and hypomethylation/overexpression of ribosomal protein L39 like (RPL39L) were identified, which may be of potential value for optimizing the risk stratification and personalized management of lung ADCs.

Materials and methods

Public datasets

The Cancer Genome Atlas (TCGA)

Genome-wide DNA methylation data and corresponding clinical information were retrieved from TCGA data portal (https://tcga-data.nci.nih.gov/tcga/, accessed March 2016), including a dataset of 65 lung ADCs [female/male, 35/30; Tumor-Node-Metastasis (TNM) staging, I to IV (1); median age, 67 years; age range, 38–84 years] and 24 matched non-tumor lung samples assayed using an Illumina Infinium 27k BeadChip system (TCGA-27k set) and a dataset of 456 tumor samples [female/male, 244/212; TNM staging, I to IV (1); median age, 66 years; age range, 33–88 years] and 29 matched normal samples assayed using an Illumina Infinium 450k BeadChip system (TCGA-450k set) (3). Six tumor samples had both Infinium 27k and 450k DNA methylation data. For the transcriptome data, Level 3 Illumina HiSeq_RNASeqV2 data were obtained for all tumor samples from the TCGA-27k set, and for 452 tumor samples and 58 matched normal samples from the TCGA-450k set. Among the aforementioned TCGA datasets, Level 2 IlluminaGA_DNASeq data were also available for 490 samples, and Level 3 Affymetrix Genome_Wide_SNP_6 data for 512 samples. Somatic copy number data were analyzed within the GISTIC2.0 module on GenePattern (http://genepattern.broadinstitute.org/gp/; accessed March 2016). An amplitude threshold of ±0.2 was used.

Gene Expression Omnibus (GEO)

Genome-wide DNA methylation microarray data were also obtained from 4 GEO series (https://www.ncbi.nlm.nih.gov/geo/; accessed March 2016), including: i) A dataset of 59 matched lung ADCs [female/male, 45/14; TNM stage I to IV (1); median age, 68 years; age range, 39–86 years] and non-tumor lung samples [accession no. GSE32861; Selamat et al set (9)]; ii) a dataset of 26 matched tumor [female/male, 14/12; TNM stage I to IV (1); median age, unknown] and normal lung samples [accession no. GSE32866; Ontario Tumor Bank set (9)]; iii) a dataset of 28 matched tumor [female/male, 22/6; TNM stage I to IV (1); median age, 65 years; age range, unknown] and normal lung samples of never-smokers [accession no. GSE62948; Mansfield et al set (10)]; and iv) a dataset of 35 matched tumors [female/male, 19/16; TNM stage I to II (1); median age, 63 years; age range, 47–88 years] and normal lung samples of patients with lung ADCs [accession no. GSE63384; Robles et al set (11)].

Ethical approval

All procedures performed in studies involving humans were conducted in accordance with the ethical standards of the institutional research committees and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants as reported by included datasets (3,9–11).

Microarray data processing

For the Level 3 DNA methylation microarray data (Infinium BeadChips, Illumina Inc.), the methylation level of each interrogated CpG locus was summarized as a β-value, providing a continuous and quantitative index of DNA methylation, ranging from 0 (completely unmethylated) to 1 (completely methylated). To ensure that β-values were comparable across each dataset/platform, batch effects were adjusted by a non-parametric empirical Bayes approach (ber R package; version 3.2.5; http://www.r-project.org/) (12–14). The empirical Bayes correction was demonstrated to effectively remove batch effects following initial microarray data normalization (12,13). M-value transformation was applied prior to the batch effect adjustment to avoid negative β-values, as described previously (15). For the gene-level analysis of the Level 3 Illumina HiSeq_RNASeqV2 data, expression values of 0 were set as the overall minimum value, and all data were log2 transformed and standardized to z-scores within each gene. All missing values were imputed by nearest neighbor averaging (impute R package) (3).
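The transformations described above can be illustrated with a minimal Python sketch. It assumes the commonly used logit definition of the M-value, M = log2(β / (1 − β)); the clamping constant `eps`, the use of the population standard deviation, and the caller-supplied `floor` (standing in for the "overall minimum value" that replaces zeros) are assumptions made for illustration, not details stated in the study.

```python
import math
from statistics import mean, pstdev


def beta_to_m(beta, eps=1e-6):
    """Map a beta-value in [0, 1] to an M-value: M = log2(beta / (1 - beta))."""
    # Clamp away from 0 and 1 so the logarithm stays finite (eps is an assumed choice).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse map: beta = 2^M / (2^M + 1), which mathematically lies between 0 and 1."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def log2_zscores(expression, floor):
    """Log2-transform one gene's expression values and standardize them to z-scores.

    floor is the positive value substituted for zeros (assumed to be supplied by the caller).
    """
    logged = [math.log2(v if v > 0 else floor) for v in expression]
    mu, sd = mean(logged), pstdev(logged)
    return [(x - mu) / sd for x in logged]
```

Because the inverse map always returns a value between 0 and 1, any adjustment carried out on the M-value scale cannot produce a negative β-value once it is transformed back.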

Cancer-specific CGI methylation loci and their correlation with gene expression

The CpG probes interrogated by both the Infinium 27k and 450k platforms were retained for analysis, and were annotated using the Infinium Human Methylation 450k annotation file. Prior selection of CpG probes was performed by removal of those that: i) Targeted the X and Y chromosomes; ii) contained a single-nucleotide polymorphism within 5 base pairs of and including the targeted CpGs; and iii) were not located at CGI regions of a gene; CGI was defined by the UCSC genome reference (http://genome.ucsc.edu/; accessed March 2016). For CpGs corresponding to multiple annotation terms, the first one in the 450k annotation file was used in the present study, to simplify data interpretation. Finally, a total of 9,270 CpG probes were included for additional analysis. Differentially methylated CpGs were computed by two-sample Wilcoxon test (samr R package). Lung ADC-specific CpGs were defined as those having a median β difference ≥0.2 between matched tumor and non-tumor lung samples and a false discovery rate (FDR) q-value ≤0.05 in at least 4 of the 6 datasets. Methylation and expression data were paired based on each Entrez Gene ID (https://www.ncbi.nlm.nih.gov/gene/; accessed March 2016). The correlation between methylation and expression level of each gene was evaluated by Pearson's correlation analysis, and those having an absolute Pearson correlation coefficient (r) ≥0.3, 0.2–0.3, or 0.1–0.2 and P≤0.05 were defined as strong, moderate or weak correlations, respectively.
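The two decision rules in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the "median β difference" is read here as the absolute difference between the tumor and normal medians, the q-values are taken as already computed, and the function names and input layout are hypothetical.

```python
from statistics import median


def correlation_strength(r, p):
    """Label a methylation-expression correlation using the thresholds in the text."""
    if p > 0.05:
        return "none"
    a = abs(r)
    if a >= 0.3:
        return "strong"
    if a >= 0.2:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "none"


def is_cancer_specific(per_dataset, min_delta=0.2, max_q=0.05, min_datasets=4):
    """Apply the 'at least 4 of 6 datasets' rule to one CpG.

    per_dataset: list of (tumor_betas, normal_betas, q_value) tuples, one per dataset.
    """
    hits = 0
    for tumor, normal, q in per_dataset:
        # Assumed reading: absolute difference of the group medians.
        delta = abs(median(tumor) - median(normal))
        if delta >= min_delta and q <= max_q:
            hits += 1
    return hits >= min_datasets
```

For example, `correlation_strength(-0.668, 0.0001)` returns `"strong"`, which matches how the RPL39L locus is described later in the text. The sketch does not check that the direction of change agrees across datasets; whether the study required this is not stated.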

Construction and validation of a CGI methylation-based risk score signature

The training-validation approach was used to construct a prognostic CGI methylation signature. The training phase was performed using the TCGA-450k set, where the methylation levels of lung ADC-specific CpGs were correlated with overall survival (OS) time by univariate Cox regression analysis with permutation correction within the Biometric Research Branch-Array Tools (http://brb.nci.nih.gov/BRB-ArrayTools, accessed March 2016). Those that exhibited significant correlation with OS (permutation P≤0.05), and high variability [standard deviation (SD)≥0.10] were finally selected as prognostic methylation candidates. A higher SD indicates that the interrogated CpG loci may have more opportunities to be dysregulated across tumors. These CpGs may therefore be more likely to serve roles in tumor biology, and the alterations in those CpGs may be easier to detect (16). The prognostic model was established by risk-score analysis, where each patient was assigned a risk score that is a linear combination of the methylation levels of each CpG weighted by their corresponding Cox regression coefficients (17). The median risk score (3.08) from the training set was pre-specified as the cut-off for stratifying low-risk and high-risk subgroups. The validation phase was performed on the aforementioned TCGA-27k (3) and Robles et al (11) datasets. An additional dataset of patients with lung ADC [female/male, 127/125; TNM stage I to IV (1); median age, 65 years; age range, 40–90 years] with relapse-free survival (RFS) time was also included for independent validation [accession no. GSE39279; n=252; Sandoval et al set (2)].
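The candidate-selection step can be sketched as a simple filter. The sketch assumes that the permutation P-values have already been produced by the Cox analysis and that the SD is the sample standard deviation of the β-values across tumors; the function name and input dictionaries are hypothetical.

```python
from statistics import stdev


def select_candidates(betas_by_probe, perm_p_by_probe, max_p=0.05, min_sd=0.10):
    """Keep probes with permutation P <= max_p and beta-value SD >= min_sd.

    betas_by_probe: dict mapping probe ID to a list of beta-values across tumors.
    perm_p_by_probe: dict mapping probe ID to its permutation P-value.
    """
    selected = []
    for probe, betas in betas_by_probe.items():
        if perm_p_by_probe[probe] <= max_p and stdev(betas) >= min_sd:
            selected.append(probe)
    return selected
```

Both conditions must hold, so a probe with a significant association but little variation across tumors is excluded, as is a highly variable probe without a significant association.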

Database for Annotation, Visualization and Integrated Discovery (DAVID) annotation clustering analysis

DAVID (version 6.7; https://david.ncifcrf.gov/; accessed March 2016) (18) was used to create functional annotation for genes corresponding to cancer-specific differentially-methylated CpGs with Gene Ontology (19), BioCarta (20) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway tools (21).

Statistical analysis

Survival data were estimated by the Kaplan-Meier method, and compared using the log-rank test. Survival data were summarized as median OS time or RFS time. The associations between variables and survival data were evaluated using the univariate Cox regression model. A multivariate Cox regression model was used to evaluate the independence of each potential prognostic indicator by incorporating those significant variables from the univariate Cox model. Pooled survival data were analyzed by meta-analysis with the inverse-variance method, where either fixed- or random-effect models were used on the basis of the inter-dataset heterogeneity. Heterogeneity was analyzed using the χ² test and I² statistic, with P for heterogeneity <0.1 or an I² value >50% being considered significant. When integrating the DNA methylation and gene expression data of RPL39L, the optimal cut-off values to segregate patients into poor and good prognostic subgroups were determined by the maximally selected rank statistics, as described previously (22). All calculations were performed with SPSS v19.0 (SPSS Software, Inc., Chicago, IL, USA) and R software version 3.2.5, and P≤0.05 was considered to indicate a statistically significant difference.
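The inverse-variance pooling mentioned above can be sketched for the fixed-effect case. The sketch assumes that each dataset contributes a hazard ratio with a 95% confidence interval, from which the standard error of log(HR) is recovered; the function names are hypothetical, and the random-effect variant that the study could also have used is not shown.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_se_from_ci(lower, upper):
    """Recover the standard error of log(HR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


def pool_fixed_effect(studies):
    """Pool hazard ratios; studies is a list of (hr, ci_lower, ci_upper) tuples."""
    log_hrs = [math.log(hr) for hr, _, _ in studies]
    # Each dataset is weighted by the inverse of its variance on the log scale.
    weights = [1.0 / log_se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / total
    se = math.sqrt(1.0 / total)
    # Cochran's Q and the I^2 statistic quantify heterogeneity between datasets.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_hrs))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * se), math.exp(pooled + Z_95 * se)),
        "q": q,
        "i2_percent": i2,
    }
```

A call such as `pool_fixed_effect([(1.5, 1.0, 2.25), (1.5, 1.0, 2.25)])` uses made-up inputs; the per-dataset hazard ratios of the study are not reported in this text. Under the stated rule, a heterogeneity P-value <0.1 or an I² above 50% would argue for a random-effect model instead.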

Results

Identification of cancer-specific CGI methylation loci in lung ADCs

By comparing the genome-wide DNA methylation data of matched lung ADCs and non-tumor tissues from the 6 included datasets, a total of 134 CGI loci (corresponding to 119 genes) that met the study criteria of lung ADC-specific methylation changes were identified. Almost all of these CGI CpGs gained DNA methylation, whereas only 3 loci [cg07693270 (RPL39L); cg24898753 (ferritin heavy chain 1); and cg06038133 (CORO6)] were hypomethylated in lung ADCs (Table I). DAVID annotation analysis (18) revealed that those cancer-specific methylation changes often affected genes with roles in the regulation of transcription (49 genes, P=3.60×10⁻¹⁰), cell-cell signaling (17 genes, P=1.29×10⁻⁵) and cell surface receptor linked signal transduction (22 genes, P=0.042). Furthermore, by integrating TCGA gene expression data, it was identified that the methylation levels of 27 (20%), 23 (17%) and 45 (34%) CpGs exhibited strong, moderate and weak correlations with their gene expressions, respectively. Accordingly, among those that were strongly associated with DNA methylation (n=82), 64 genes (78%) were differentially expressed between tumor and non-tumor tissues. In summary, ADC-specific CGI loci, and those with corresponding gene expression aberrations in particular, may serve as potential biomarker candidates with diagnostic and prognostic possibilities.
Table I.

Characteristics of the identified lung ADC-specific CpGs at CpG island regions.

Probe | Chr. | Symbol | Gene ID | Association with CpG island | Association with gene region | Methylation status between tumor and normal tissues | Pearson coefficient between DNA methylation and gene expression [a] | Log2 fold change between tumor and normal tissues [b]
cg18335068 | 19 | ZNF677 | 342926 | Island | 5′UTR | Hypermethylation | −0.603 | −0.860
cg08089301 | 17 | HOXB4 | 3214 | Island | 1stExon | Hypermethylation | −0.536 | −0.326
cg04317399 | 7 | HOXA4 | 3201 | Island | 1stExon | Hypermethylation | −0.492 | −1.446
cg07533148 | 1 | TRIM58 | 25893 | Island | 1stExon | Hypermethylation | −0.475 | −1.407
cg07703401 | 16 | HBQ1 | 3049 | Island | 1stExon | Hypermethylation | −0.461 | −0.552
cg23432345 | 7 | HOXA7 | 3204 | Island | 1stExon | Hypermethylation | −0.436 | −0.648
cg12880658 | 5 | CDO1 | 1036 | Island | 1stExon | Hypermethylation | −0.414 | −1.599
cg02919422 | 8 | SOX17 | 64321 | Island | 5′UTR | Hypermethylation | −0.410 | −1.527
cg25875213 | 19 | ZNF781 | 163115 | Island | 5′UTR | Hypermethylation | −0.402 | −1.279
cg14458834 | 17 | HOXB4 | 3214 | Island | 1stExon | Hypermethylation | −0.389 | −0.326
cg10088985 | 4 | CXCL5 | 6374 | Island | 1stExon | Hypermethylation | −0.375 | −0.392
cg04048259 | 20 | EDN3 | 1908 | Island | TSS200 | Hypermethylation | −0.363 | −1.415
cg04062391 | 19 | ZNF560 | 147741 | Island | 5′UTR | Hypermethylation | −0.341 | Not Significant
cg16428251 | 3 | SOX14 | 8403 | Island | TSS200 | Hypermethylation | −0.341 | Not Significant
cg07621046 | 10 | C10orf82 | 143379 | Island | TSS200 | Hypermethylation | −0.337 | −0.355
cg18536148 | 17 | TBX4 | 9496 | Island | 5′UTR | Hypermethylation | −0.332 | −1.540
cg23290344 | 8 | NEFM | 4741 | Island | TSS1500 | Hypermethylation | −0.328 | −0.336
cg21233722 | 5 | DOCK2 | 1794 | Island | Body | Hypermethylation | −0.325 | −0.946
cg14384532 | 15 | NTRK3 | 4916 | Island | TSS1500 | Hypermethylation | −0.322 | −1.120
cg02008154 | 7 | TBX20 | 57057 | Island | 1stExon | Hypermethylation | −0.320 | Not Significant
cg21546671 | 17 | HOXB4 | 3214 | Island | 1stExon | Hypermethylation | −0.319 | −0.326
cg19885761 | 5 | CPLX2 | 10814 | Island | 5′UTR | Hypermethylation | −0.318 | −0.476
cg03734874 | 14 | TMEM179 | 388021 | Island | TSS1500 | Hypermethylation | −0.313 | 0.530
cg17525406 | 1 | AJAP1 | 55966 | Island | Body | Hypermethylation | −0.295 | −0.943
cg20616414 | 9 | WNK2 | 65268 | Island | 1stExon | Hypermethylation | −0.288 | 0.946
cg10235817 | 4 | ADRA2C | 152 | Island | 1stExon | Hypermethylation | −0.269 | −1.000
cg10141715 | 12 | SLC5A8 | 160728 | Island | 1stExon | Hypermethylation | −0.254 | −0.829
cg00015770 | 4 | QRFPR | 84109 | Island | 1stExon | Hypermethylation | −0.246 | Not Significant
cg07536847 | 1 | PAX7 | 5081 | Island | TSS1500 | Hypermethylation | −0.237 | 0.777
cg25484904 | 4 | CWH43 | 80157 | Island | TSS1500 | Hypermethylation | −0.237 | −0.971
cg13870866 | 7 | TBX20 | 57057 | Island | 1stExon | Hypermethylation | −0.235 | Not Significant
cg06092815 | 2 | SPHKAP | 80309 | Island | TSS200 | Hypermethylation | −0.233 | −0.977
cg23710218 | 8 | MSC | 9242 | Island | 1stExon | Hypermethylation | −0.225 | 0.735
cg12111714 | 13 | ATP8A2 | 51761 | Island | Body | Hypermethylation | −0.220 | −1.184
cg00548268 | 7 | NPTX2 | 4885 | Island | TSS1500 | Hypermethylation | −0.215 | 0.843
cg21376883 | 1 | ACTN2 | 88 | Island | Body | Hypermethylation | −0.213 | −1.639
cg08441806 | 10 | NKX6-2 | 84504 | Island | 1stExon | Hypermethylation | −0.212 | −0.576
cg20959866 | 1 | AJAP1 | 55966 | Island | TSS1500 | Hypermethylation | −0.211 | −0.943
cg00662556 | 18 | GALR1 | 2587 | Island | Body | Hypermethylation | −0.211 | −0.322
cg20792062 | 12 | KCNA5 | 3741 | Island | 5′UTR | Hypermethylation | −0.211 | −1.276
cg10556064 | 16 | SMPD3 | 55512 | Island | 5′UTR | Hypermethylation | −0.206 | −0.405
cg20291049 | 2 | POU3F3 | 5455 | Island | 1stExon | Hypermethylation | −0.200 | 0.385
cg12614105 | 7 | NPY | 4852 | Island | 5′UTR | Hypermethylation | −0.195 | Not Significant
cg09619146 | 10 | CPXM2 | 119587 | Island | 1stExon | Hypermethylation | −0.193 | Not Significant
cg04490714 | 16 | SLC6A2 | 6530 | Island | 1stExon | Hypermethylation | −0.190 | −0.375
cg13929328 | 10 | FOXI2 | 399823 | Island | 1stExon | Hypermethylation | −0.189 | −0.781
cg18081258 | 14 | NDRG2 | 57447 | Island | TSS1500 | Hypermethylation | −0.188 | −1.450
cg15343119 | 18 | GALR1 | 2587 | Island | TSS1500 | Hypermethylation | −0.187 | −0.322
cg00891541 | 16 | SMPD3 | 55512 | Island | 5′UTR | Hypermethylation | −0.187 | −0.405
cg10486998 | 18 | GALR1 | 2587 | Island | TSS1500 | Hypermethylation | −0.187 | −0.322
cg21245652 | 2 | MAL | 4118 | Island | TSS1500 | Hypermethylation | −0.181 | −1.204
cg06675478 | 13 | SOX1 | 6656 | Island | TSS200 | Hypermethylation | −0.178 | 0.352
cg26721264 | 18 | GALR1 | 2587 | Island | TSS1500 | Hypermethylation | −0.178 | −0.322
cg18952647 | 15 | BNC1 | 646 | Island | TSS1500 | Hypermethylation | −0.177 | −0.498
cg01683883 | 16 | CMTM2 | 146225 | Island | TSS1500 | Hypermethylation | −0.175 | −1.336
cg06722633 | 1 | GRIK3 | 2899 | Island | Body | Hypermethylation | −0.175 | Not Significant
cg25942450 | 5 | TLX3 | 30012 | Island | TSS200 | Hypermethylation | −0.173 | 0.474
cg27009703 | 7 | HOXA9 | 3205 | Island | 1stExon | Hypermethylation | −0.170 | Not Significant
cg04534765 | 18 | GALR1 | 2587 | Island | 1stExon | Hypermethylation | −0.170 | −0.322
cg19064258 | 16 | HS3ST2 | 9956 | Island | 1stExon | Hypermethylation | −0.163 | −0.265
cg02164046 | 3 | SST | 6750 | Island | 1stExon | Hypermethylation | −0.159 | Not Significant
cg12768605 | 19 | LYPD5 | 284348 | Island | TSS200 | Hypermethylation | −0.153 | −0.346
cg25720804 | 5 | TLX3 | 30012 | Island | 1stExon | Hypermethylation | −0.153 | 0.474
cg10883303 | 7 | HOXA13 | 3209 | Island | 1stExon | Hypermethylation | −0.150 | 0.831
cg12457773 | 6 | NRSN1 | 140767 | Island | 5′UTR | Hypermethylation | −0.150 | −0.521
cg14008883 | 10 | SLC18A3 | 6572 | Island | 1stExon | Hypermethylation | −0.148 | 0.725
cg03544320 | 4 | CRMP1 | 1400 | Island | 1stExon | Hypermethylation | −0.147 | −0.610
cg24199834 | 4 | POU4F2 | 5458 | Island | 1stExon | Hypermethylation | −0.145 | Not Significant
cg19456540 | 14 | SIX6 | 4990 | Island | 1stExon | Hypermethylation | −0.144 | 0.392
cg08572611 | 7 | ACTL6B | 51412 | Island | Body | Hypermethylation | −0.142 | Not Significant
cg00489401 | 5 | FLT4 | 2324 | Island | Body | Hypermethylation | −0.133 | −1.340
cg05373457 | 8 | KCNS2 | 3788 | Island | 5′UTR | Hypermethylation | −0.133 | Not Significant
cg14991487 | 2 | HOXD9 | 3235 | Island | TSS200 | Hypermethylation | −0.123 | Not Significant
cg02774439 | 4 | HAND2 | 9464 | Island | 5′UTR | Hypermethylation | −0.122 | −0.364
cg02757432 | 10 | GPR26 | 2849 | Island | 1stExon | Hypermethylation | −0.114 | Not Significant
cg25044651 | 5 | LVRN | 206338 | Island | 1stExon | Hypermethylation | −0.112 | Not Significant
cg01354473 | 7 | HOXA9 | 3205 | Island | 1stExon | Hypermethylation | −0.112 | Not Significant
cg08109815 | 6 | NMBR | 4829 | Island | 5′UTR | Hypermethylation | −0.107 | −0.506
cg10303487 | 8 | DPYS | 1807 | Island | 1stExon | Hypermethylation | −0.107 | −0.816
cg18555440 | 11 | MYOD1 | 4654 | Island | 1stExon | Hypermethylation | −0.094 | Not Significant
cg09936561 | 4 | DRD5 | 1816 | Island | 1stExon | Hypermethylation | −0.085 | Not Significant
cg14859460 | 5 | GRM6 | 2916 | Island | TSS200 | Hypermethylation | −0.079 | Not Significant
cg18722841 | 11 | PHOX2A | 401 | Island | 1stExon | Hypermethylation | −0.079 | 0.428
cg09229912 | 12 | CUX2 | 23316 | Island | 1stExon | Hypermethylation | −0.076 | Not Significant
cg20404387 | 1 | FAM43B | 163933 | Island | 1stExon | Hypermethylation | −0.072 | 0.314
cg12782180 | 7 | LEP | 3952 | Island | TSS1500 | Hypermethylation | −0.070 | 0.987
cg15489294 | 5 | LVRN | 206338 | Island | TSS1500 | Hypermethylation | −0.068 | Not Significant
cg25993718 | 20 | CBLN4 | 140689 | Island | TSS200 | Hypermethylation | −0.067 | −0.431
cg16787600 | 10 | SORCS3 | 22986 | Island | 1stExon | Hypermethylation | −0.062 | Not Significant
cg07307078 | 18 | TUBB6 | 84617 | Island | TSS1500 | Hypermethylation | −0.059 | −1.308
cg08832227 | 12 | KCNA1 | 3736 | Island | Body | Hypermethylation | −0.058 | −0.405
cg01381846 | 7 | HOXA9 | 3205 | Island | 1stExon | Hypermethylation | −0.055 | Not Significant
cg02332525 | 3 | GRM7 | 2917 | Island | 1stExon | Hypermethylation | −0.050 | −0.368
cg15748507 | 10 | PRLHR | 2834 | Island | Body | Hypermethylation | −0.049 | Not Significant
cg15191648 | 18 | SALL3 | 27164 | Island | TSS200 | Hypermethylation | −0.048 | 0.690
cg26609631 | 13 | GSX1 | 219409 | Island | 5′UTR | Hypermethylation | −0.048 | Not Significant
cg13302823 | 8 | SCRT1 | 83482 | Island | 1stExon | Hypermethylation | −0.033 | Not Significant
cg01839464 | 18 | DCC | 1630 | Island | Body | Hypermethylation | −0.029 | −1.162
cg25691167 | 7 | FERD3L | 222894 | Island | 1stExon | Hypermethylation | −0.025 | Not Significant
cg05345286 | 6 | MDFI | 4188 | Island | Body | Hypermethylation | −0.024 | 0.960
cg25574024 | 11 | IGF2AS | 51214 | Island | Body | Hypermethylation | −0.020 | Not Significant
cg11525285 | 14 | VSX2 | 338917 | Island | 1stExon | Hypermethylation | −0.019 | −0.269
cg22187630 | 19 | CACNA1A | 773 | Island | 1stExon | Hypermethylation | −0.016 | 0.290
cg21296230 | 15 | GREM1 | 26585 | Island | 5′UTR | Hypermethylation | −0.010 | 1.327
cg13791131 | 11 | IGF2AS | 51214 | Island | Body | Hypermethylation | −0.009 | Not Significant
cg01295203 | 8 | PRDM14 | 63978 | Island | TSS1500 | Hypermethylation | 0.002 | Not Significant
cg26252167 | 6 | GPR6 | 2830 | Island | 1stExon | Hypermethylation | 0.004 | Not Significant
cg13547644 | 1 | ACTA1 | 58 | Island | 5′UTR | Hypermethylation | 0.012 | Not Significant
cg22881914 | 14 | NID2 | 22795 | Island | TSS1500 | Hypermethylation | 0.028 | 0.667
cg23207990 | 4 | SFRP2 | 6423 | Island | TSS1500 | Hypermethylation | 0.041 | 0.729
cg13323752 | 12 | SLC2A14 | 144195 | Island | TSS200 | Hypermethylation | 0.054 | −0.701
cg09643544 | 19 | ZNF177 | 7730 | Island | 1stExon | Hypermethylation | 0.064 | −0.725
cg08575537 | 7 | EPO | 2056 | Island | Body | Hypermethylation | 0.064 | −0.262
cg15107670 | 11 | WT1 | 7490 | Island | 1stExon | Hypermethylation | 0.067 | 0.569
cg26186727 | 18 | NETO1 | 81832 | Island | 1stExon | Hypermethylation | 0.086 | 1.372
cg06958829 | 17 | ACSF2 | 80221 | Island | Body | Hypermethylation | 0.091 | −0.500
cg04907257 | 5 | ADCY2 | 108 | Island | TSS1500 | Hypermethylation | 0.097 | −0.578
cg21591742 | 2 | HOXD10 | 3236 | Island | TSS1500 | Hypermethylation | 0.114 | 0.507
cg03958979 | 6 | NR2E1 | 7101 | Island | TSS1500 | Hypermethylation | 0.123 | 1.200
cg02245378 | 2 | PAX3 | 5077 | Island | Body | Hypermethylation | 0.126 | Not Significant
cg14144305 | 11 | ALX4 | 60529 | Island | Body | Hypermethylation | 0.129 | Not Significant
cg25902889 | 19 | FSD1 | 79187 | Island | Body | Hypermethylation | 0.141 | 0.842
cg22660578 | 17 | LHX1 | 3975 | Island | TSS1500 | Hypermethylation | 0.151 | 0.784
cg22341310 | 19 | ZNF541 | 84215 | Island | Body | Hypermethylation | 0.172 | −0.621
cg13462129 | 7 | DLX5 | 1749 | Island | Body | Hypermethylation | 0.193 | 0.678
cg11376198 | 1 | AKR7L | 246181 | Island | TSS200 | Hypermethylation | 0.243 | 0.531
cg26316946 | 6 | GRIK2 | 2898 | Island | 1stExon | Hypermethylation | 0.246 | 0.659
cg03874199 | 2 | HOXD12 | 3238 | Island | TSS200 | Hypermethylation | 0.283 | 0.501
cg23130254 | 2 | HOXD12 | 3238 | Island | 1stExon | Hypermethylation | 0.317 | 0.501
cg00767581 | 2 | HOXD4 | 3233 | Island | TSS1500 | Hypermethylation | 0.353 | Not Significant
cg18702197 | 2 | HOXD3 | 3232 | Island | TSS1500 | Hypermethylation | 0.355 | 0.422
cg07693270 | 3 | RPL39L | 116832 | Island | 5′UTR | Hypomethylation | −0.668 | 1.296
cg24898753 | 11 | FTH1 | 2495 | Island | TSS1500 | Hypomethylation | 0.053 | −0.589
cg06038133 | 17 | CORO6 | 84940 | Island | Body | Hypomethylation | 0.054 | −0.849

[a] Pearson coefficients were calculated using all TCGA lung ADC samples with paired DNA methylation and gene expression data.

[b] Log2 fold changes were calculated using the expression data from all paired lung ADCs and normal tissues from TCGA. TSS, transcription start site; UTR, untranslated region; TCGA, The Cancer Genome Atlas; ADC, adenocarcinoma; Chr., chromosome.

Identification of a novel CGI methylation signature that is a potent prognostic indicator for OS time in lung ADCs

Within the univariate Cox regression model incorporating methylation data of those ADC-specific CGI loci, a total of 9 CGI CpGs were identified from the training set (TCGA-450k set) that were significantly associated with OS (permutation P≤0.05), and that had higher variability (SD≥0.10) in lung ADCs. Characteristics of the 9 CGI CpGs are summarized in Table II. Methylation data of 7 and 2 CpGs exhibited negative and positive associations with OS, respectively (Table II). Accordingly, the risk score formula for the CpGs of the MyoD family inhibitor (MDFI), homeobox D3 (HOXD3), CKLF like MARVEL transmembrane domain containing 2 (CMTM2), paired box 3 (PAX3), LY6/PLAUR domain containing 5 (LYPD5), laeverin (LVRN), RPL39L, glutamate ionotropic receptor kainate type subunit 2 (GRIK2) and complexin 2 (CPLX2) genes was established as follows: Risk score = (1.403 × β-value of cg05345286 (MDFI)) + (1.564 × β-value of cg18702197 (HOXD3)) + (1.646 × β-value of cg01683883 (CMTM2)) + (1.526 × β-value of cg02245378 (PAX3)) + (0.984 × β-value of cg12768605 (LYPD5)) + (1.316 × β-value of cg25044651 (LVRN)) + (−1.130 × β-value of cg07693270 (RPL39L)) + (1.088 × β-value of cg26316946 (GRIK2)) + (−0.835 × β-value of cg19885761 (CPLX2)). On the basis of the risk formula, each patient from the TCGA-450k set was assigned a risk score, and then classified into low-risk or high-risk groups using the median score as a cut-off (3.08). Survival analysis indicated that in the TCGA-450k set, the low-risk group was associated with increased OS times compared with the high-risk group [54.4 vs. 42.3 months, respectively; P=0.006 (log-rank test); Fig. 1A].
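The risk formula above can be written as a short Python sketch. The coefficients and the cut-off of 3.08 are taken from the text; the function names and the handling of a score exactly equal to the cut-off (assigned here to the low-risk group) are assumptions made for illustration.

```python
# Univariate Cox coefficients for the 9 CpGs, as given in the risk score formula.
COEFFICIENTS = {
    "cg05345286": 1.403,   # MDFI
    "cg18702197": 1.564,   # HOXD3
    "cg01683883": 1.646,   # CMTM2
    "cg02245378": 1.526,   # PAX3
    "cg12768605": 0.984,   # LYPD5
    "cg25044651": 1.316,   # LVRN
    "cg07693270": -1.130,  # RPL39L
    "cg26316946": 1.088,   # GRIK2
    "cg19885761": -0.835,  # CPLX2
}
CUTOFF = 3.08  # median risk score in the TCGA-450k training set


def risk_score(betas):
    """Linear combination of beta-values weighted by the Cox coefficients.

    betas: dict mapping each of the 9 probe IDs to a beta-value in [0, 1].
    """
    return sum(coef * betas[probe] for probe, coef in COEFFICIENTS.items())


def risk_group(betas, cutoff=CUTOFF):
    """Assign a patient to the high-risk group when the score exceeds the cut-off."""
    return "high-risk" if risk_score(betas) > cutoff else "low-risk"
```

Seven of the nine coefficients are positive, so higher methylation at those loci raises the score, whereas higher methylation at cg07693270 (RPL39L) or cg19885761 (CPLX2) lowers it.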
Table II.

Characteristics of the 9-CpG CpG island methylation panel.

Probe ID | Symbol | Association with gene region | Chr. | Methylation status between tumor and normal tissues [a] | Expression status between tumor and normal tissues [b] | Pearson coefficient between DNA methylation and gene expression [c] | Univariate Cox coefficient [d] | Permutation P-value [d]
cg05345286 | MDFI | Body | 6 | Hyper | Up | −0.024 | 1.403 | 0.003
cg18702197 | HOXD3 | TSS1500 | 2 | Hyper | Up | 0.355 | 1.564 | 0.004
cg01683883 | CMTM2 | TSS1500 | 16 | Hyper | Down | −0.175 | 1.646 | 0.012
cg02245378 | PAX3 | Body | 2 | Hyper | NS | 0.126 | 1.526 | 0.017
cg12768605 | LYPD5 | TSS200 | 19 | Hyper | Down | −0.153 | 0.984 | 0.024
cg25044651 | LVRN | 1stExon | 5 | Hyper | NS | −0.112 | 1.316 | 0.030
cg07693270 | RPL39L | 5′UTR | 3 | Hypo | Up | −0.668 | −1.130 | 0.043
cg26316946 | GRIK2 | 1stExon | 6 | Hyper | Up | 0.246 | 1.088 | 0.043
cg19885761 | CPLX2 | 5′UTR | 5 | Hyper | Down | −0.318 | −0.835 | 0.049

[a] Methylation status in all the 6 included datasets.

[b] Expression status in all matched lung adenocarcinomas and normal tissues within the combined TCGA dataset (TCGA-27k and TCGA-450k sets).

[c] Pearson coefficients in all TCGA tumor samples with paired DNA methylation and gene expression data.

[d] Calculated within the TCGA-450k training set. Chr., chromosome; hyper, hypermethylation; hypo, hypomethylation; up, upregulation; down, downregulation; NS, not significantly altered; TCGA, The Cancer Genome Atlas; TSS, transcription start site; UTR, untranslated region; MDFI, MyoD family inhibitor; HOXD3, homeobox D3; CMTM2, CKLF like MARVEL transmembrane domain containing 2; PAX3, paired box 3; LYPD5, LY6/PLAUR domain containing 5; LVRN, laeverin; RPL39L, ribosomal protein L39 like; GRIK2, glutamate ionotropic receptor kainate type subunit 2; CPLX2, complexin 2.

Figure 1.

Kaplan-Meier curves of overall survival time using the 9-CpG island methylation signature across each dataset. (A) TCGA-450k dataset. (B) TCGA-27k dataset. (C) Robles et al (11) dataset. TCGA, The Cancer Genome Atlas.

To confirm its prognostic relevance, the CGI methylation signature was analyzed in 2 additional datasets, the TCGA-27k and Robles et al (11) datasets. By directly applying the risk formula and the pre-specified cut-off, the TCGA-27k set was divided into a low-risk group (n=21) and a high-risk group (n=44). In concordance with the training set, patients within the low-risk group exhibited increased OS times compared with those within the high-risk group [77.3 vs. 34.2 months; P=0.039 (log-rank test); Fig. 1B]. Similar results were also observed within the Robles et al (11) set, where low-risk patients were associated with improved OS compared with the high-risk patients [median time not reached for either group; P=0.009 (log-rank test); Fig. 1C]. Pooled analysis at dataset level confirmed the prognostic relevance of the CGI methylation signature for lung ADCs [hazard ratio (HR)=1.61, 95% confidence interval (CI), 1.20–2.17; P=0.002; I²=29%, P=0.25]. Univariate Cox regression analysis of all patients from TCGA datasets (combined TCGA-27k and TCGA-450k sets) indicated that only tumor stage and the CGI methylation signature were significantly associated with OS, while patient age, sex, smoking status, MET proto-oncogene, receptor tyrosine kinase amplification and mutations in key genes including KRAS proto-oncogene, GTPase, epidermal growth factor receptor, tumor protein P53 and B-Raf proto-oncogene, serine/threonine kinase were not. Finally, the multivariate Cox regression analysis demonstrated the prognostic significance of the CGI methylation signature of the present study in lung ADCs (Table III).
Table III.

Results from Cox regression models within all The Cancer Genome Atlas samples.

Variables | Univariate Cox model, HR (95% CI) | Univariate Cox model, P-value | Multivariate Cox model, HR (95% CI) | Multivariate Cox model, P-value
Tumor stage | 1.651 (1.441–1.893) | <0.001 | 1.611 (1.405–1.847) | <0.001
CGI methylation signature | 1.606 (1.199–2.152) | 0.001 | 1.449 (1.078–1.947) | 0.014
Sex | 1.057 (0.794–1.407) | 0.705 | - | -
Smoking status | 0.915 (0.611–1.371) | 0.666 | - | -
Age | 1.009 (0.993–1.024) | 0.271 | - | -
BRAF mutations | 0.707 (0.402–1.246) | 0.231 | - | -
EGFR mutations | 1.230 (0.830–1.824) | 0.302 | - | -
KRAS mutations | 1.176 (0.858–1.610) | 0.314 | - | -
TP53 mutations | 1.332 (0.990–1.793) | 0.058 | - | -
MET amplification | 1.027 (0.833–1.268) | 0.801 | - | -

HR, hazard ratio; CI, confidence interval; CGI, CpG island; BRAF, B-Raf proto-oncogene, serine/threonine kinase; EGFR, epidermal growth factor receptor; KRAS, KRAS proto-oncogene, GTPase; TP53, tumor protein P53; MET, MET proto-oncogene, receptor tyrosine kinase.

CGI methylation signature is not a strong prognostic indicator of RFS in lung ADCs

To investigate the association of the CGI methylation signature of the present study with RFS, it was analyzed within the TCGA-450k set, which yielded a marginally significant difference in RFS between each risk group [33.9 vs. 27.0 months; P=0.049 (log-rank test); Fig. 2A]. Then, in the TCGA-27k set, low-risk patients appeared to exhibit longer RFS compared with the high-risk patients, but the difference did not reach significance (68.2 vs. 17.0 months; log-rank test P=0.072; Fig. 2B). An additional large cohort of lung ADCs was finally introduced into the validation phase, where the CGI methylation signature also failed to significantly stratify patients into subgroups with distinct RFS outcomes [62.6 vs. 55.6 months; P=0.492 (log-rank test); Fig. 2C]. Despite that, the pooled analysis of the 3 datasets yielded a significant difference in RFS between the risk groups (HR, 1.30; 95% CI, 1.04–2.62; P=0.020; I²=0%; P=0.38). The inconsistent results from different analysis levels indicated that the CGI methylation signature is not a robust indicator for RFS in lung ADCs.
Figure 2.

Kaplan-Meier curves of the relapse-free survival time using the 9-CpG island methylation signature across each dataset. (A) TCGA-450k dataset. (B) TCGA-27k dataset. (C) Sandoval et al (2) dataset. TCGA, The Cancer Genome Atlas.

Novel classification approach based on the integration of DNA methylation and gene expression of RPL39L in lung ADCs

By characterizing each member of the CGI methylation panel, it was identified that one CGI locus (cg07693270) was consistently hypomethylated in lung ADCs (Fig. 3A), and the methylation data were closely correlated with gene expression (RPL39L, Pearson coefficient r=−0.668; P<0.0001; Fig. 3B), indicating a methylation-dependent transcriptional regulatory mechanism for RPL39L. In line with its epigenetic status, RPL39L was upregulated in lung ADCs, indicating a tumor-promoting role (Fig. 3C). Early in the present study, the methylation level of RPL39L was found to be positively correlated with OS. Therefore, the present study attempted to prognostically classify patients by the single-locus methylation status of RPL39L, and it was identified that tumors with a methylated CGI of RPL39L were associated with increased OS compared with the unmethylated tumors within TCGA samples (Fig. 3D). Additionally, it was identified that patients may also be classified into distinct prognostic subgroups based on RPL39L expression levels: tumors exhibiting decreased RPL39L expression levels were associated with increased OS time compared with those with increased expression levels [59.7 vs. 42.7 months; P=0.002 (log-rank test); Fig. 3E]. These data indicated the possibility of a promising classification approach based on the integration of the DNA methylation and gene expression of RPL39L. Consequently, the present study identified that tumors with increased methylation and decreased expression of RPL39L exhibited the best OS among all cases (Fig. 3F and G). The multivariate Cox model demonstrated the prognostic independence of the integrated approach (HR, 0.54; 95% CI, 0.36–0.81; P=0.003) as compared with tumor stages (HR, 1.66; 95% CI, 1.45–1.91; P<0.001) within TCGA samples. These data indicated that RPL39L may serve oncogenic roles in the progression of lung ADCs, and may represent a novel promising therapeutic target for this disease. 
The integrated epigenetic and transcriptional assessment of RPL39L may be useful for optimizing the risk stratification of patients with lung ADC, and for identifying the appropriate subgroups sensitive to targeted drugs against RPL39L.
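The integrated assessment described above combines two per-tumor measurements: the methylation level at the RPL39L CGI locus and the RPL39L expression level. A minimal sketch of the correlation and the combined grouping follows. The cutoffs, the sample values and the function names are hypothetical; the source does not state in this section how the methylated/unmethylated and high/low expression thresholds were chosen.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def classify(beta, expression, beta_cutoff, expr_cutoff):
    """Assign one of four methylation/expression groups (cutoffs are assumptions)."""
    methylation = "methy" if beta >= beta_cutoff else "unmethy"
    level = "low" if expression < expr_cutoff else "high"
    return f"{methylation}/{level}-expression"

# Hypothetical methylation beta values and expression levels for five tumors
betas = [0.82, 0.75, 0.40, 0.31, 0.15]
exprs = [2.1, 2.9, 5.4, 6.0, 7.8]

r = pearson_r(betas, exprs)
groups = [classify(b, e, beta_cutoff=0.5, expr_cutoff=4.0) for b, e in zip(betas, exprs)]
print(f"r={r:.3f}")
print(groups)
```

In this scheme the "methy/low-expression" group corresponds to the subgroup reported to have the best OS; survival comparison between groups would then use a log-rank test or Cox model, which is outside this sketch.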
Figure 3.

Integration of DNA methylation and gene expression of RPL39L within The Cancer Genome Atlas samples. (A) Methylation status between matched lung adenocarcinomas and normal tissues. (B) Expression status between matched lung adenocarcinomas and normal tissues. (C) Pearson correlation between DNA methylation and RPL39L gene expression. (D) Patient classification on the basis of single-locus methylation levels of RPL39L. (E) Patient classification on the basis of the expression levels of RPL39L. (F) Patient classification on the basis of the combined assessment of DNA methylation and gene expression of RPL39L. (G) Patients with increased methylation and decreased expression of RPL39L experienced an improved survival time compared with that in the other subgroups. RPL39L, ribosomal protein L39 like; methy, methylated; unmethy, unmethylated.

Discussion

The study of epigenetic markers, particularly DNA methylation, represents one of the most promising and fastest expanding areas in cancer biomarker identification (23). Similar to other tumors, lung ADCs are characterized by distinct genome-wide DNA methylation landscapes, where the global hypomethylation of DNA repeats occurs concomitantly with CGI hypermethylation of gene regions (8). Among those cancer-specific DNA methylation aberrations, the promoter-specific CGI de novo methylation of tumor suppressor genes is the best-known epigenetic abnormality in lung cancer patients (8). Studies using candidate gene approaches have identified a large number of known tumor suppressors, including cyclin-dependent kinase inhibitor 2A (24), RAS association domain family member 1 (25), O-6-methylguanine-DNA methyltransferase (26) and APC, WNT signaling pathway regulator (27), to be consistently methylated in lung ADCs. A number of those epigenetic alterations were identified to serve crucial roles in tumorigenesis via the regulation of gene expression and to exhibit promise in the diagnosis and prognostication of patients with lung cancer (24–27). Previously, efforts have been made to comprehensively assess cancer epigenomes using genome-wide DNA methylation profiling techniques, including Illumina array-based assays, restriction landmark genome scanning gel-based analysis, and next-generation sequencing-based analysis (23,28). The application of those high-throughput detection approaches may provide an unbiased and clear view of the lung cancer epigenome, and assist in identifying useful DNA methylation events for diagnostic and prognostic purposes. The reproducibility of results from genome-wide DNA methylation analysis may be an issue for making definitive conclusions from these types of studies, as false-positive data are common in microarray analysis where the number of interrogated loci within each tumor is larger compared with the number of participants (29). 
Batch effects appear to be a common phenomenon in high-throughput microarray data, particularly for the Infinium Methylation BeadChip (13). In the present study, the empirical Bayes method was adopted to remove potential non-biological differences in methylation data across the datasets. Genome-wide DNA methylation data of lung ADCs and matched control tissues from 6 publicly available datasets were then independently re-analyzed, and stricter criteria were adopted to identify robust cancer-specific CGI methylation loci in lung ADCs. In total, 134 cancer-specific CpGs were consistently observed in at least 4 of the 6 datasets examined in the present study, 11 of which had been described by previous studies with other DNA methylation detection approaches, for example genes in HOX clusters (30) including HOXB4, HOXA7 and HOXA9, as well as TRIM58 (31) and GALR1 (32) (Table I). The methylation status of these genes exhibited promise for the early detection and risk prediction of lung cancer (30–32). In addition, by integrating gene expression data, it was identified that a considerable proportion of these cancer-specific CGI methylation changes may have significant effects on the expression of their relevant genes, indicating potential functional value in the tumorigenesis of lung ADCs. Well-studied examples are the de novo CGI methylation of zinc finger protein 677 (33), cysteine dioxygenase type 1 (34,35), SRY-box 1 (SOX1) (36) and SOX17 (37) in NSCLCs. The data from the present study were corroborated by the validation of the identified CGI methylation candidates in the literature (30–34). In addition, the present study also identified a panel of previously unknown cancer-specific CGI methylation loci that may have potential roles in determining the fate of patients with lung cancer, which warrants future investigation. Clinically or functionally characterizing each CGI candidate is beyond the scope of the present study. 
Instead, by applying a univariate Cox regression model and permutation correction, a panel of 9 CGI CpGs that were significantly associated with OS time was identified in a large cohort of patients with lung ADCs (TCGA-450k set; n=450). The detection of a panel of biomarkers, compared with single markers, may have a higher sensitivity and specificity for specific clinical purposes (38). Therefore, a risk score-based prognostic classifier was established based on the methylation patterns of the 9 CpGs to assist in stratifying patients into distinct prognostic subgroups. The novel methylation signature indicated consistent prognostic ability in different patient cohorts. Finally, a multivariate Cox model demonstrated its prognostic significance in the context of different tumor stages. However, with respect to RFS, which is an additional notable clinical outcome, the novel methylation signature demonstrated limited value for risk stratification, and future validation is required to justify a definitive conclusion. In summary, the data in the present study indicated that the CGI methylation signature may be a potent prognostic indicator for OS outcome in patients with all-stage lung ADCs. Additional supporting evidence for this novel CGI methylation signature may support its potential biological relevance in cancer biology. In the present study, it was identified that the methylation levels of 8 CpG loci were significantly correlated with gene expression (positively correlated: HOXD3, GRIK2 and PAX3; and negatively correlated: RPL39L, CPLX2, CMTM2, LYPD5 and LVRN). Accordingly, the majority of the genes were differentially expressed between lung ADCs and normal tissues (upregulated: RPL39L, GRIK2 and HOXD3; and downregulated: CMTM2, CPLX2 and LYPD5). 
The majority of these genes have been demonstrated to be abnormally methylated and expressed in a number of human cancer types, including breast, colorectal and prostate cancer, and were closely associated with patient prognosis and tumor aggressiveness (39–42). However, limited data have been acquired on their functional roles in cancer biology. RPL39L was identified to confer drug resistance in lacrimal gland adenoid cystic carcinoma (43) and the lung cancer A549 cell line (44), but the others have not been fully characterized in cancer. Future functional investigation of these genes will assist in understanding the biological implications of the CGI methylation signature of the present study, and in identifying promising epigenetic therapeutic targets in lung ADCs. Unlike the cancer-specific de novo DNA methylation at CGI regions of genes, the presence and functional roles of CGI hypomethylation have been much less well characterized in cancer biology. The present study identified that CGI hypomethylation of RPL39L was consistently observed in lung ADCs. This epigenetic event may have functional significance in the initiation and progression of lung cancer, as it markedly affected gene transcription and resulted in the upregulation of RPL39L in tumor tissues. In line with the aforementioned data, it was also demonstrated that within TCGA samples, either epigenetic or transcriptional activation of RPL39L was associated with poorer OS time in patients with lung ADCs. Furthermore, the integration of DNA methylation and gene expression data identified a refined subset of tumors with favorable prognoses whose RPL39L gene was epigenetically and transcriptionally repressed. RPL39L is a recently evolved ribosomal protein paralog that exhibits highly specific tissue expression patterns in mice and humans (45). This gene was previously described to be highly expressed in the testis and to be upregulated in multiple cancer cell lines (45). 
Wong et al (45) demonstrated that RPL39L was highly upregulated in mouse embryonic stem cells, and that its expression was markedly associated with tumor aggressiveness and vascular invasiveness of hepatocellular carcinomas. High expression of RPL39L may also confer the drug-resistant phenotype of the lung cancer A549 cell line (44). However, RPL39L was demonstrated to be associated with hypermethylation and gene inactivation in prostate cancer cell lines (39). Together, these results indicated that epigenetic and transcriptional abnormalities in RPL39L were commonly implicated in the initiation and progression of human cancer. Notably, the data from the present study are of interest as they provide novel evidence for the contributing roles of CGI hypomethylation and gene re-activation in lung cancer. In addition, the data also raise concerns surrounding the current non-specific demethylating anticancer approach, as it may promote cancer development via the exacerbation of cancer-specific hypomethylation. Targeted epigenetic therapy that has distinct effects on cancer-specific hypermethylation and hypomethylation may be a promising option for the future development of anticancer therapy. Unfortunately, the oncogenic roles of RPL39L have not been studied extensively in lung ADCs. Future functional studies may assist in developing targeted therapies against this gene. Finally, the integrated assessment of RPL39L may be a promising approach for optimizing risk stratification, and improving personalized medicine in lung ADCs.
There were several limitations to the present study. The incompleteness of certain important clinical information for the included patients, including performance status and treatment modality, compromised the prognostic robustness of the study-specific methylation signature. The clinical and methodological heterogeneity across the datasets may also introduce uncertainty in data interpretation. 
Other limitations include the relatively small sample size of the validation sets, and the lack of functional validation of those CGI methylation candidates. The results of the present study were preliminary and primarily derived from microarray data analysis. Additional studies will be required to validate these results in vivo, and in a clinical setting.
In conclusion, by comparing genome-wide DNA methylation and gene expression profiles of lung ADCs and matched non-tumor tissues from multiple independent datasets, the present study identified a number of cancer-specific CGI methylation changes in lung ADCs, and characterized their associations with gene expression. Those CGI methylation changes may be useful for the identification of novel biomarkers for diagnostic and prognostic purposes in lung ADCs. One example is the identification of a 9-CpG methylation panel that was demonstrated to be a potent prognostic indicator for OS time. Furthermore, the identification of CGI hypomethylation and consequent gene re-activation of RPL39L provides novel insights into treatment development and risk stratification for lung ADCs.
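The risk score-based classifier built from the 9-CpG panel, as discussed above, can be illustrated with a short sketch. It assumes the common construction in which each CpG's methylation value is weighted by its Cox regression coefficient and patients are split at a cutoff; the source does not give the formula, the coefficients or the cutoff in this section. The probe IDs other than cg07693270, all coefficients, the patient values and the median split are therefore hypothetical.

```python
from statistics import median

def risk_score(betas, coefs):
    """Linear risk score: sum of (Cox coefficient x methylation value) over panel CpGs."""
    return sum(coefs[cpg] * betas[cpg] for cpg in coefs)

def stratify(scores, cutoff=None):
    """Label each patient 'high' or 'low' risk; the median cutoff is an assumption."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical coefficients for a 3-CpG toy panel (the real panel has 9 CpGs).
# A negative coefficient means higher methylation lowers the risk score.
coefs = {"cg07693270": -1.2, "cg_hypothetical_1": 0.8, "cg_hypothetical_2": 0.5}

# Hypothetical methylation beta values per patient
patients = {
    "P1": {"cg07693270": 0.80, "cg_hypothetical_1": 0.20, "cg_hypothetical_2": 0.30},
    "P2": {"cg07693270": 0.20, "cg_hypothetical_1": 0.70, "cg_hypothetical_2": 0.60},
    "P3": {"cg07693270": 0.50, "cg_hypothetical_1": 0.40, "cg_hypothetical_2": 0.40},
}

scores = {pid: risk_score(b, coefs) for pid, b in patients.items()}
print(stratify(scores))
```

A single continuous score makes it straightforward to apply the same classifier to independent validation cohorts, provided the coefficients and cutoff are fixed in the training set.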