Xindong Zhang (1), Lin Gao (1), Zhi-Ping Liu (2), Songwei Jia (1), Luonan Chen (3)
(1) School of Computer Science and Technology, Xidian University, Xi'an 710000, China.
(2) Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Shandong 250061, China.
(3) Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China.
Abstract
As smoking rates decrease, proportionally more cases of lung adenocarcinoma occur in never-smokers, and aberrant DNA methylation has been suggested to contribute to the tumorigenesis of lung adenocarcinoma. Among the large number of differentially methylated genes, it is extremely difficult to distinguish those that play key roles in tumorigenic processes via DNA methylation-mediated gene silencing. By integrating gene expression and DNA methylation data, we design a pipeline combined with differential network analysis to uncover driver methylation genes and responsive modules, which show distinctive expression and network topology in tumors with aberrant DNA methylation. In total, 135 genes are recognized as candidate driver genes in early stage lung adenocarcinoma, and the 30 top-ranked genes are recognized as driver methylation genes. Functional annotation and differential network analysis indicate the roles of the identified driver genes in tumorigenesis, while a literature study reveals significant correlations of the top 30 genes with early stage lung adenocarcinoma in never-smokers. The analysis pipeline can also be used to identify driver epigenetic events in other cancers characterized by matched gene expression and DNA methylation data.
1. Introduction
Lung cancer, a leading cause of death worldwide, is mainly attributed to smoking in both men and women [1, 2], and its most common histological subtype is adenocarcinoma. However, as smoking rates decrease, proportionally more cases occur in never-smokers [3]. Lung adenocarcinoma in never-smokers differs markedly in clinical features and molecular mechanisms from that in cigarette smokers [4]. Both genetic and epigenetic alterations in cancer genomes have been suggested to account for the development of lung adenocarcinoma.

As one of the vital epigenetic mechanisms, DNA methylation regulates gene expression without alterations in DNA sequence [5, 6] and plays key roles in X chromosome inactivation, genome stability, chromatin structure, embryonic development, differentiation, and maintenance of pluripotency in normal somatic cells [7, 8]. Genome-scale methylation-profiling techniques have confirmed the existence of widespread aberrations of DNA methylation patterns in the human cancer genome [9-12]. Studies of DNA methylation have suggested that global DNA hypomethylation, gene-specific hypermethylation, and gene body methylation may all contribute to the initiation and progression of tumorigenesis [13-15]. In cancer research and therapy, it is challenging but of great significance to distinguish genes whose methylation changes are crucial in cancer occurrence, progression, or metastasis from genes whose methylation changes merely have effects on the process of tumorigenesis [13]. Unlike somatic mutations in the genome, DNA methylation is inherently reversible and therefore offers potential drug targets in cancer intervention [16, 17].

Numerous studies have focused on discovering genes whose DNA methylation potentially plays key roles in the tumorigenesis of lung adenocarcinoma, including studies that integrate genome-scale DNA methylation and gene expression [18-21].
The main idea of these works is to search for genes whose expression fluctuations are highly correlated with DNA methylation changes. However, this approach has a limitation that stems from the complexity of gene expression regulation. In complex diseases, genetic alterations, epigenetic alterations, and transcription factors can all contribute to gene expression in sophisticated ways [22, 23]. In tumors, differential expression of a gene may be induced by aberrant DNA methylation in its promoter, but it may also be a consequence of regulation by upstream genes. This calls for greater attention to uncovering driver DNA methylation events, which play major roles in methylation-associated gene silencing and drive malignant transformation [5, 13]. In this work, we refine the general description of driver methylation into two properties. (1) Driver DNA methylation should induce distinctive expression in tumors with differential DNA methylation (T-DM) when compared to expression in matched adjacent nontumor tissue (normal) and in tumors with nondifferential DNA methylation (T-NDM), and (2) driver methylation should induce a distinct regulation module from the network perspective. The first property guarantees the major role of DNA methylation in the regulation of gene expression, while the second property guarantees the functional effects of driver genes on tumorigenesis.

Focusing on genes differentially expressed among matched adjacent nontumor tissue (normal), tumors with aberrant DNA methylation (T-DM), and tumors without aberrant DNA methylation (T-NDM), we integrate genome-wide DNA methylation data and gene expression data to uncover driver methylation events in early stage lung adenocarcinoma in never-smokers. Differential network analyses show significant changes in the network topology of DNA methylation-responsive modules across normal, T-DM, and T-NDM, which suggest potential mechanisms by which the identified driver genes contribute to tumorigenesis.
2. Materials and Methods
2.1. Data Sets
Both the DNA methylation data and the gene expression data are downloaded from the NCBI Gene Expression Omnibus (GEO) under accession number GSE32867 [18]. The series contains 59 samples with paired genome-scale DNA methylation profiling and gene expression. Stage I and stage II are merged as early stage, and stages III-IV are labeled as late stage [18]. After removing noisy data [18], 22 samples are labeled as both "never smoking" and "early stage". Paired DNA methylation data and gene expression data of these 22 samples are collected for further analysis. Probes in the gene expression data are first mapped to Entrez gene IDs, and expression values sharing the same Entrez gene ID are averaged within each sample.
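The probe-to-gene averaging step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the function name `average_by_gene`, the dictionary layout, and the toy probe IDs are assumptions made for the example, and it assumes that averaging is done per sample across the probes of a gene.

```python
from collections import defaultdict

def average_by_gene(probe_values, probe_to_gene):
    """Collapse probe-level expression to gene level by per-sample averaging.

    probe_values:  dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> Entrez gene ID (unmapped probes are skipped)
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            grouped[gene].append(values)
    # zip(*rows) walks sample by sample across all probes of one gene
    return {gene: [sum(col) / len(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}

# Toy data (hypothetical): two probes map to gene 7157, one to 672, one is unmapped
expr = {"p1": [1.0, 3.0], "p2": [3.0, 5.0], "p3": [2.0, 2.0], "p4": [9.0, 9.0]}
mapping = {"p1": "7157", "p2": "7157", "p3": "672"}
gene_expr = average_by_gene(expr, mapping)
```

Probes without an Entrez mapping are dropped here; whether the original analysis handled them the same way is not stated in the text.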
2.2. Schematic Overview of the Analysis Pipeline
The schematic overview of the analysis pipeline is shown in Figure 1, and detailed procedures are described in the following sections.
Figure 1
Schematic overview of the pipeline proposed in this work. (a) Candidate gene selection. The methylation matrix of continuous beta values is converted into a difference matrix and discretized by a kernel distribution function, which partitions samples into normal, T-DM, and T-NDM. Probes are mapped to genes after noise filtering, and genes passing the consistency test are collected as candidate driver genes. (b) For each candidate gene, a subset of DM-responsive genes is collected and DM-responsive modules are constructed by the CLR method. Candidate driver genes are ranked by differential scores derived from the differential network analysis.
2.2.1. Candidate Driver Gene Selection
Figure 1(a) shows a brief schematic overview of this procedure. The difference matrix is first created to measure differences in DNA methylation beta values between tumor and normal samples. A kernel probability distribution with a normal smoothing function is used to estimate the probability density for each probe in the difference matrix (Figure 1(a)). The hypothesis is that the differences of beta values for a given probe come from a distribution with mean 0 and unknown variance. The cumulative distribution function (CDF) is used to estimate the probability of a beta value difference falling within a given interval. Hypermethylation and hypomethylation are determined by the upper bound CDF > 0.95 and the lower bound CDF < 0.05, respectively. For each probe, tumors are partitioned into two groups: tumors with differential methylation (T-DM) and tumors without differential methylation (T-NDM).

Then, the two-sample t-test is used to evaluate differential expression between conditions [24], and p values are adjusted by the procedure introduced by Storey [25]. DNA methylation probes are mapped to gene expression by shared Entrez gene ID. A probe is retained if its mapped gene is differentially expressed in T-DM when compared to normal and T-NDM (adjusted p value < 0.05), which implies that the differential methylation of the probe in T-DM is more likely to induce significant expression changes. Probes mapping to the same gene are removed if hypermethylation and hypomethylation coexist in more than 5 samples. The T-DM and T-NDM samples of probes sharing an Entrez gene ID are then merged, respectively, and serve as the T-DM and T-NDM groups of the gene.

We then search for genes whose expressions are highly discriminative and consistent in T-DM when compared to normal and T-NDM.
Many types of statistics, such as the Wilcoxon score, the Pearson correlation coefficient (PCC), or mutual information (MI), could be used to score the relationship between gene expression and class labels; a T-score method is used in this work [26]. For a given gene, let a be its expression levels across samples with class labels c; the discriminative score s(a, c) is defined as the t-test statistic. To determine whether the discriminative level of the gene among groups is consistent, we permute the class labels c 1000 times and obtain a background distribution of discriminative scores s(a, c′) derived from the expression levels a and the permuted labels c′. Genes with significant scores (p value < 0.05) in both comparisons (normal versus T-DM and T-DM versus T-NDM) are considered differentially methylated and serve as candidates for further analysis.
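The label-permutation test described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes a Welch-type t statistic, a two-sided comparison on the absolute score, and the common (hits + 1) / (n + 1) estimate of the empirical p value, none of which are specified in the text. The names `t_score` and `permutation_p` and the toy expression values are hypothetical.

```python
import math
import random
from statistics import mean, variance

def t_score(a, labels):
    """Welch t statistic between samples labeled 1 and samples labeled 0."""
    x = [v for v, c in zip(a, labels) if c == 1]
    y = [v for v, c in zip(a, labels) if c == 0]
    diff = mean(x) - mean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0.0:  # both groups constant: avoid division by zero
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def permutation_p(a, labels, n_perm=1000, seed=0):
    """Empirical p value of |t| against n_perm random relabelings."""
    rng = random.Random(seed)
    observed = abs(t_score(a, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved by shuffling labels
        if abs(t_score(a, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy example: six samples of one group (0) and six of another (1)
expression = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4, 1.1, 0.9, 1.2, 1.0, 0.8, 1.3]
groups = [0] * 6 + [1] * 6
p_separated = permutation_p(expression, groups)
```

In the pipeline, such a test would be run twice per gene, once for normal versus T-DM and once for T-DM versus T-NDM, and the gene kept only if both p values fall below 0.05.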
2.2.2. Detection of DNA Methylation-Responsive Module
To construct the DNA methylation-responsive module for a candidate gene g, we first identify a set of genes whose expressions are highly discriminative among groups defined by the DNA methylation profile of g. These genes are potentially responsive to aberrant DNA methylation of g.

The Context Likelihood of Relatedness (CLR) method [27] is used to assess regulatory relationships among these genes. CLR estimates MI for each pair of variables and corrects the MI via a background-correction procedure. In particular, for mutual information $I(X_i; X_j)$, CLR scores the relatedness between a pair of variables $X_i$ and $X_j$ by the joint likelihood measurement

$$z_{ij} = \sqrt{z_i^2 + z_j^2},$$

where

$$z_i = \max\left(0, \frac{I(X_i; X_j) - \mu_i}{\sigma_i}\right),$$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation derived from the empirical distribution of MI between $X_i$ and arbitrary variables $X_k$ ($k = 1, 2, \ldots, n$), and $I(X_i; X_j)$ is the mutual information of $X_i$ and $X_j$.

CLR employs a B-spline smoothing and discretization method [28] to estimate the MI for a pair of variables. However, it is time-consuming in this work under diverse conditions and permutations. Thus, we use the following estimation method to calculate MI for a pair of variables $X_i$ and $X_j$ [29]; that is,

$$I(X_i; X_j) = -\frac{1}{2}\log\left(1 - \rho^2\right),$$

where $\rho$ is the PCC of $X_i$ and $X_j$.

An empirical threshold δ is necessary when CLR is employed. A larger threshold results in a higher precision but a smaller size of responsive modules. The size of more than 70% modules is less than three when δ = 4.46, while the size of 80% modules is larger than 3 when δ = 4.46, and approximately the same ranking lists of the top 30 genes are obtained when δ falls in the interval between 3.96 and 5.46. Thus, we set δ = 4.46 in this work.
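The CLR scoring with the correlation-based MI estimate can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the published CLR implementation: it uses the Gaussian MI estimate I = -1/2 log(1 - ρ²), the population standard deviation for the background, a small clipping constant `eps` to keep MI finite when |ρ| = 1, and non-constant expression vectors. The function names and toy genes are hypothetical.

```python
import math
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def gaussian_mi(rho, eps=1e-12):
    """MI estimate from a correlation; eps clips 1 - rho^2 away from zero."""
    return -0.5 * math.log(max(1.0 - rho * rho, eps))

def clr_scores(data):
    """data: dict gene -> expression list (at least two genes).
    Returns dict (gene_i, gene_j) -> CLR score z_ij for each unordered pair."""
    genes = sorted(data)
    mi = {g: {} for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            m = gaussian_mi(pearson(data[gi], data[gj]))
            mi[gi][gj] = mi[gj][gi] = m
    # Background of each gene: mean and std of its MI with every other gene
    mu = {g: mean(list(mi[g].values())) for g in genes}
    sd = {g: pstdev(list(mi[g].values())) for g in genes}

    def z(g, h):
        if sd[g] == 0.0:
            return 0.0
        return max(0.0, (mi[g][h] - mu[g]) / sd[g])

    return {(gi, gj): math.hypot(z(gi, gj), z(gj, gi))
            for i, gi in enumerate(genes) for gj in genes[i + 1:]}

def module_edges(scores, delta=4.46):
    """Keep edges whose CLR score reaches the threshold delta."""
    return {edge: s for edge, s in scores.items() if s >= delta}

# Toy data: g1 and g2 are nearly collinear, g3 and g4 are unrelated
toy = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [1.1, 2.0, 2.9, 4.2, 5.1, 5.9],
    "g3": [3, 1, 4, 1, 5, 9],
    "g4": [2, 7, 1, 8, 2, 8],
}
scores = clr_scores(toy)
```

With only a handful of genes the z-scores are bounded well below 4.46, so the toy example needs a much smaller threshold (for instance `module_edges(scores, delta=1.5)`); the value 4.46 only becomes meaningful for gene sets of realistic size.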
2.2.3. Scoring Candidate Driver Genes by Differential Network Analysis
Differential network analysis reveals dynamic changes of pathways and potential mechanisms in complex diseases including cancers [30]. For each candidate gene, we calculate CLR scores for edges in responsive modules under normal and T-NDM. Differential scores are calculated to estimate network differences among groups. The differential score (DS) is computed from the CLR scores of the module's edges, where $w_i$ is the CLR score of the ith edge and k is the number of edges in the driver methylation-responsive module. Candidate genes are then prioritized by DS in descending order.
3. Results
We focus on the detection of differentially methylated genes which play key roles in tumorigenesis (“driver methylation gene”) and modules responsive to aberrant methylation of these genes. Rather than genes with consistent expressions to DNA methylation levels in whole tumors, we detect genes differentially expressed and consistent with DNA methylation in T-DM when compared to normal and T-NDM.
3.1. Identification of Candidate Driver Genes in Tumorigenesis
By integrating DNA methylation and the corresponding gene expression data, the samples are partitioned into three groups (normal, T-DM, and T-NDM) for each gene (Figure 1(a)). First, we remove genes that are not differentially expressed in T-DM when compared to normal and T-NDM. Then a permutation test is performed to determine the significance of the consistency of gene expression changes in T-DM when compared to T-NDM. To obtain the significance level of the differences, we randomly permute T-DM and T-NDM and calculate the differences. After 1000 permutations, a background distribution of differences is constructed. After removing genes with an absolute mean beta value less than 0.1, 135 genes remain in the candidate list (see Supplementary File in Supplementary Material available online at http://dx.doi.org/10.1155/2016/2090286). We perform a functional enrichment analysis using DAVID [31, 32]. Of these 135 genes, 115 are annotated to GO terms including cancer-related functions such as response to stimulus, development process, cell differentiation, cell adhesion, cell growth and cell death, DNA repair, and apoptosis, which suggests potential relationships between these genes and cancer.
3.2. Detection of Responsive Modules of Candidate Driver Genes
Biological networks reveal the functional organization of the cell [33]. To characterize the functional implications of candidate driver genes in tumorigenesis, we detect modules responsive to differential methylation of candidate driver genes (Section 2). In total, 130 of 135 modules have at least one edge when the CLR threshold is set to 4.46, and the mean size of these 130 modules is 15.
3.3. Prioritization of Candidate Driver Genes by Differential Network Analysis
We argue that a driver DNA methylation event can induce not only distinctive gene expression in T-DM but also a distinctive module responsive to the alteration. We score each candidate driver gene by the differential level of its responsive module. Candidate driver genes are ranked by differential score in descending order.

We test the significance of the differential score against a background distribution derived from random permutations. For a given candidate driver gene, genes are randomly selected from its possible responsive genes with the module size maintained, a new module is constructed by CLR with δ = 4.46, and its differential score is calculated. A sequence DS′ of random differential scores is obtained after 1000 random permutations. Of the 135 candidate driver genes, 130 pass the test with p value < 0.01.

We also perform a differential network analysis of responsive modules under different CLR thresholds from 1.96 to 6.96 with step 0.5. Almost all modules obtain significant differential scores under these CLR cutoffs (Supplementary File). Table 1 lists details of the top 30 genes.
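The size-preserving random-module test can be sketched as below. This is only an illustration under assumptions: the scoring function `score_fn` (which would build a module by CLR and return its differential score) is supplied by the caller and is hypothetical here, and the (hits + 1) / (n + 1) empirical p value is one common convention, not necessarily the one used to obtain the p values reported in Table 1.

```python
import random

def module_permutation_p(observed_ds, candidate_genes, module_size,
                         score_fn, n_perm=1000, seed=0):
    """Empirical p value of an observed differential score.

    candidate_genes: pool of possible responsive genes for the driver gene
    module_size:     number of genes drawn per random module (must not
                     exceed the pool size)
    score_fn:        hypothetical callable mapping a gene list to a
                     differential score
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        random_module = rng.sample(candidate_genes, module_size)
        if score_fn(random_module) >= observed_ds:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy usage with a placeholder score (gene-name length sum), not a real DS
pool = ["TP53", "EGFR", "KRAS", "MYC", "NOTCH1", "CDH13", "SOX17", "TCF21"]
p_toy = module_permutation_p(100.0, pool, 3,
                             lambda genes: float(sum(len(g) for g in genes)),
                             n_perm=200)
```

With 1000 permutations this estimate cannot fall below about 0.001, so much smaller p values would require a different approach, such as fitting a parametric distribution to the random scores.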
Table 1
Top 30 genes ranked by differential score in lung adenocarcinoma.

| Gene symbol (a) | Differential score | Number of samples in T-DM group (b) | p value |
|---|---|---|---|
| FAM107A [34] | 16.301 | 20 | 7.80E-06 |
| SPARCL1 [35, 36] | 14.920 | 20 | 1.40E-07 |
| TRPC6 [37] | 14.649 | 11 | <1.0E-10 |
| CRYAB [38] | 14.508 | 12 | 3.84E-10 |
| WFDC3 | 14.483 | -14 | <1.0E-10 |
| EFEMP2 [39] | 13.958 | 20 | <1.0E-10 |
| MX2 [40, 41] | 13.895 | -18 | 2.12E-05 |
| PLA2G4C [42] | 13.870 | -8 | <1.0E-10 |
| ST6GALNAC5 [43] | 13.848 | 9 | <1.0E-10 |
| PLAT [44] | 13.690 | 8 | 2.45E-04 |
| TCF21 [45] | 13.664 | 22 | <1.0E-10 |
| SOX17 [46] | 13.368 | 22 | <1.0E-10 |
| SH3GL2 [47] | 13.300 | 5 | <1.0E-10 |
| MAMDC2 [18] | 13.274 | 19 | 4.54E-07 |
| GCNT3 [48] | 13.238 | -14 | <1.0E-10 |
| MSR1 [49] | 13.144 | -16 | <1.0E-10 |
| PPP1R14D [50] | 13.057 | -12 | <1.0E-10 |
| COL5A2 [51] | 13.045 | 19 | 6.67E-04 |
| PTPRH [52] | 12.967 | -16 | 8.98E-13 |
| HKDC1 [53] | 12.961 | -20 | <1.0E-10 |
| CDH13 [54] | 12.932 | -20 | 3.34E-04 |
| CFI [55] | 12.932 | 5 | 1.20E-04 |
| ARL14 | 12.880 | -12 | 2.06E-04 |
| MMP9 [56] | 12.866 | 7 | <1.0E-10 |
| CELSR3 | 12.856 | 16 | 4.65E-10 |
| CDO1 [57] | 12.846 | 22 | <1.0E-10 |
| AGR2 [58] | 12.836 | -22 | <1.0E-10 |
| S100P [59, 60] | 12.828 | -10 | 2.29E-04 |
| DOCK2 [61] | 12.777 | 20 | 2.54E-03 |
| TNFRSF1B [62] | 12.736 | 13 | <1.0E-10 |

(a) Bold: gene annotated to lung cancer in the literature.
(b) A minus sign (-) indicates that the gene is hypomethylated in the samples.
4. Discussion
We build two reference lists to test the accuracy of the ranked list. The first consists of genes that show an absolute mean fold change larger than 0.2 in T-DM and are annotated to lung cancer in the literature. In total, 29 genes are contained in the first list, which is denoted as Standard-Lit. The other comes from the 76 genes reported by Selamat et al. [18]. This list is not fully suitable because the genes in Selamat et al. are confounded with genes differentially methylated under smoking and late stage. Thus, we select the genes shared by the list from Selamat et al. and our list. In total, 19 genes are in this list, which is denoted as Standard-Sel. Genes in these two lists are listed in the Supplementary File.

We test the accuracy of our list against Standard-Lit and Standard-Sel. Figure 2(a) shows the ROC curves with AUC = 0.686 and AUC = 0.628, respectively, which indicates that genes in the two standard lists tend to be ranked higher in our list than expected by chance. Figure 2(b) shows the overlaps of the top 30 genes in our list with Standard-Lit and Standard-Sel. For Standard-Lit, 12 of 29 genes overlap (Fisher exact test p value = 0.0018), while for Standard-Sel, 10 of 19 genes overlap (Fisher exact test p value = 2.67E-04).
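The two evaluation measures used above can be sketched with the standard library alone. This is an illustrative sketch under assumptions: the overlap test is written as a one-sided hypergeometric tail (one common form of the Fisher exact test; whether the reported p values are one-sided or two-sided is not stated), and the AUC is computed as the fraction of (standard gene, other gene) pairs in which the standard gene is ranked higher. The function names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(n_total, n_standard, n_top, n_overlap):
    """P(X >= n_overlap) when n_top genes are drawn without replacement
    from n_total genes, n_standard of which belong to the standard list."""
    upper = min(n_standard, n_top)
    favorable = sum(comb(n_standard, k) * comb(n_total - n_standard, n_top - k)
                    for k in range(n_overlap, upper + 1))
    return favorable / comb(n_total, n_top)

def ranked_list_auc(ranked_genes, positives):
    """AUC of a ranked list (best first) against a set of positive genes."""
    positives = set(positives)
    negatives_below = 0  # negatives seen so far while walking up from the bottom
    wins = 0             # (positive, negative) pairs with the positive ranked higher
    n_pos = 0
    for gene in reversed(ranked_genes):
        if gene in positives:
            n_pos += 1
            wins += negatives_below
        else:
            negatives_below += 1
    return wins / (n_pos * negatives_below)

# Overlap of the top 30 of 135 candidates with a 29-gene standard list (12 shared)
p_lit = hypergeom_upper_tail(135, 29, 30, 12)

# Toy ranking: positives at ranks 1 and 3 out of 4
auc_toy = ranked_list_auc(["a", "b", "c", "d"], {"a", "c"})
```

`ranked_list_auc` assumes the list contains at least one positive and one negative gene and that there are no ties in rank.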
Figure 2
Comparison of the ranked list to two standard sets denoted by Standard-Lit and Standard-Sel. (a) ROC curves of our ranked list compared to Standard-Lit and Standard-Sel with AUC equal to 0.686 and 0.628, respectively. (b) Venn diagram showing the overlap of top 30 ranked genes in our list to Standard-Lit and Standard-Sel.
The ranked list is also validated by literature annotation. Of the top 30 genes, 27 have previously been reported to be cancer-relevant, and 17 of them are related to lung cancer or non-small-cell lung cancer (Table 1).

We also annotate the responsive modules of the top 30 ranked genes to KEGG signaling pathways. The responsive modules of 18 genes are enriched in KEGG signaling pathways at significance level p value < 0.01, which implies significant relations of these responsive modules to cancer processes (Table 2) and indicates potential mechanism changes induced by aberrant DNA methylation. The KEGG signaling pathways are collected from MSigDB [63, 64].
Table 2
Functional annotation of driver-responsive network to KEGG signaling pathways (p value < 0.01).

| Gene symbol | Enriched KEGG signaling pathway | p value |
|---|---|---|
| SPARCL1 | CYTOSOLIC_DNA_SENSING | 3.22E-03 |
| TRPC6 | PPAR_SIGNALING | 9.50E-03 |
| TRPC6 | P53_SIGNALING | 9.50E-03 |
| TRPC6 | MTOR_SIGNALING | 7.16E-03 |
| TRPC6 | NOTCH_SIGNALING | 6.47E-03 |
| EFEMP2 | NOTCH_SIGNALING | 9.70E-03 |
| MX2 | RIG_I_LIKE_RECEPTOR_SIGNALING | 9.33E-04 |
| PLA2G4C | PPAR_SIGNALING | 9.50E-03 |
| PLA2G4C | P53_SIGNALING | 9.50E-03 |
| PLA2G4C | MTOR_SIGNALING | 7.16E-03 |
| PLA2G4C | NOTCH_SIGNALING | 6.47E-03 |
| ST6GALNAC5 | PPAR_SIGNALING | 9.50E-03 |
| ST6GALNAC5 | P53_SIGNALING | 9.50E-03 |
| ST6GALNAC5 | MTOR_SIGNALING | 7.16E-03 |
| ST6GALNAC5 | NOTCH_SIGNALING | 6.47E-03 |
| PLAT | TOLL_LIKE_RECEPTOR_SIGNALING | 3.09E-03 |
| PLAT | NOD_LIKE_RECEPTOR_SIGNALING | 1.16E-03 |
| PLAT | CYTOSOLIC_DNA_SENSING | 9.44E-04 |
| PLAT | JAK_STAT_SIGNALING | 6.99E-03 |
| TCF21 | FC_EPSILON_RI_SIGNALING | 4.89E-03 |
| GCNT3 | NOTCH_SIGNALING | 9.70E-03 |
| MSR1 | NOTCH_SIGNALING | 9.70E-03 |
| PTPRH | B_CELL_RECEPTOR_SIGNALING | 9.80E-03 |
| HKDC1 | PPAR_SIGNALING | 9.50E-03 |
| HKDC1 | P53_SIGNALING | 9.50E-03 |
| HKDC1 | MTOR_SIGNALING | 7.16E-03 |
| HKDC1 | NOTCH_SIGNALING | 6.47E-03 |
| CDH13 | ERBB_SIGNALING | 1.55E-03 |
| CDH13 | T_CELL_RECEPTOR_SIGNALING | 2.38E-03 |
| CFI | PPAR_SIGNALING | 2.90E-03 |
| CFI | MAPK_SIGNALING | 3.47E-03 |
| ARL14 | VEGF_SIGNALING | 3.93E-03 |
| S100P | HEDGEHOG_SIGNALING | 6.47E-04 |
| S100P | TGF_BETA_SIGNALING | 1.51E-03 |
| DOCK2 | CHEMOKINE_SIGNALING | 3.49E-05 |
| DOCK2 | TOLL_LIKE_RECEPTOR_SIGNALING | 4.85E-03 |
| DOCK2 | NOD_LIKE_RECEPTOR_SIGNALING | 3.27E-05 |
| DOCK2 | T_CELL_RECEPTOR_SIGNALING | 5.42E-03 |
| DOCK2 | B_CELL_RECEPTOR_SIGNALING | 2.66E-03 |
| TNFRSF1B | NOTCH_SIGNALING | 7.06E-03 |
| TNFRSF1B | FC_EPSILON_RI_SIGNALING | 1.24E-03 |
Of the 30 top-ranked genes, FAM107A, MAMDC2, SOX17, TCF21, PTPRH, and CDO1 have previously been reported with aberrant DNA methylation in lung cancer [18, 34, 45, 46, 52, 57]. All these genes obtain higher occurrences (n > 19) in lung adenocarcinoma. AGR2, CDH13, CRYAB, MX2, S100P, and SH3GL2 are reported with aberrant gene expression [38, 40, 47, 54, 58, 59], while AGR2, CDH13, and MX2 show high occurrences of aberrant DNA methylation (n ≥ 18). Differential expression of these genes has been reported to play crucial roles in key pathways in tumorigenesis or to serve as a potential prognostic target. For AGR2, CDH13, and MX2, which have higher occurrences, the correlation between differential gene expression and aberrant DNA methylation has been reported to be relevant to lung adenocarcinoma [18].

Alpha B-crystallin (CRYAB) is one of the important members of the small heat-shock protein family, with aberrant DNA methylation occurring in 12 of 22 samples. Upregulated expression of CRYAB is reported to be associated with poor survival of patients with non-small-cell lung cancer (NSCLC) [38]. Interestingly, we find a contrary expression pattern in early stage lung adenocarcinoma in nonsmoking patients (Figure 3). Decreased expression is observed in both T-DM (p value = 8.20E-11) and T-NDM (p value = 7.72E-08) when compared to normal, while only a relatively weak difference is observed between the T-DM group and the T-NDM group (mean fold change difference = 0.07, p value = 0.15), which implies that multiple mechanisms, in addition to DNA hypermethylation, regulate CRYAB. The responsive module of CRYAB is highly changed in normal and T-NDM (DS = 14.508, p value = 3.84E-10). A similar case is SH3GL2, deletion of which downregulates tumor growth by modulating EGFR signaling [47].
Figure 3
Genes show consistently significant changes in gene expression and DNA methylation in T-DM (red diamond) when compared to normal (green star) and T-NDM (blue plus). Results indicate different distributions of gene expression with altered DNA methylation in three groups of top ranked genes.
Another interesting case is S100P, which has been reported as a key gene in tumor progression in both the initial stage and the advanced stage of lung adenocarcinoma [60]. The gene shows distinctive expression among normal, T-DM, and T-NDM. There is nearly no change in gene expression between normal and T-NDM, while upregulation is observed in T-DM, which implies that the upregulation of S100P may be an important step in the early stage of lung adenocarcinoma.

Some genes are reported in the literature as relevant to cancers other than lung cancer (COL5A2 [51], SPARCL1 [35], EFEMP2 [39], MSR1 [49], and DOCK2 [61]). SPARCL1 and DOCK2 have shown downregulation in several types of cancer [36, 61], and both show downregulated gene expression in T-DM with high occurrences of DNA hypermethylation. Similar to CRYAB, EFEMP2 shows expression patterns in our observation that are contrary to those in gliomas [39]. EFEMP2 has high occurrences of DNA hypermethylation and downregulated gene expression in a total of 20 samples, while 2 samples in T-NDM show little difference when compared to matched normal. COL5A2 also shows T-DM-specific upregulation of gene expression and DNA hypermethylation with high occurrences.

We show the responsive module of MSR1 in Figure 4(a) as a representative of the responsive modules of cancer-related genes. All these genes exhibit significant changes in responsive modules in T-DM when compared to normal and T-NDM.
Figure 4
Differential representation of responsive modules for MSR1, ARL14, CELSR3, and WFDC3 in T-DM (left), normal (middle), and T-NDM (right). Significant changes of responsive modules for identified driver genes (red diamond) imply functional alterations of driver genes in tumorigenesis.
Besides cancer-related genes, three genes, ARL14, CELSR3, and WFDC3, are also observed in our list. These three genes show T-DM-specific expression changes (Figure 3), and regulatory correlations in their responsive modules show significant differences in T-DM when compared to normal and T-NDM (Figures 4(b)–4(d)), which also implies potential roles of the three genes in the tumorigenesis of lung adenocarcinoma.

All top 30 genes show significant changes in responsive modules in T-DM; detailed information on the top 30 genes and their responsive modules is listed in the Supplementary File.
5. Conclusions
By integrating gene expression and DNA methylation data, we analyzed 22 matched lung adenocarcinoma/nontumor lung pairs from nonsmokers with early stage lung adenocarcinoma. By focusing on differences in gene expression patterns and responsive modules derived from T-DM compared to those in normal and T-NDM, we proposed a pipeline that employs a differential network analysis strategy. In total, 135 candidate genes are analyzed, and the top 30 genes are studied in detail in this work. All 135 genes are differentially expressed in T-DM when compared to matched normal and T-NDM, while 130 of them show significant changes in regulatory correlations of responsive modules. Literature mining of the top 30 genes indicates a high proportion of lung cancer-relevant genes, which suggests that these genes may disturb functions and pathways via differential methylation mechanisms and thereby drive the tumorigenesis of early stage lung adenocarcinoma. In conclusion, we provide a bioinformatics pipeline to identify driver genes with aberrant DNA methylation by fully considering differential expression and network changes in T-DM, normal, and T-NDM. The analysis pipeline can also be used to identify driver genes with aberrant DNA methylation in other cancers characterized by paired gene expression and DNA methylation data.