Literature DB >> 30400878

A pan-cancer analysis of driver gene mutations, DNA methylation and gene expressions reveals that chromatin remodeling is a major mechanism inducing global changes in cancer epigenomes.

Ahrim Youn1,2, Kyung In Kim2, Raul Rabadan3,4, Benjamin Tycko5, Yufeng Shen3,4,6, Shuang Wang7.   

Abstract

BACKGROUND: Recent large-scale cancer sequencing studies have discovered many novel cancer driver genes (CDGs) in human cancers. Some studies also suggest that CDG mutations contribute to cancer-associated epigenomic and transcriptomic alterations across many cancer types. Here we aim to improve our understanding of the connections between CDG mutations and altered cancer cell epigenomes and transcriptomes on pan-cancer level and how these connections contribute to the known association between epigenome and transcriptome.
METHOD: Using multi-omics data including somatic mutation, DNA methylation, and gene expression data of 20 cancer types from The Cancer Genome Atlas (TCGA) project, we conducted a pan-cancer analysis to identify CDGs, when mutated, have strong associations with genome-wide methylation or expression changes across cancer types, which we refer as methylation driver genes (MDGs) or expression driver genes (EDGs), respectively.
RESULTS: We identified 32 MDGs, among which, eight are known chromatin modification or remodeling genes. Many of the remaining 24 MDGs are connected to chromatin regulators through either regulating their transcription or physically interacting with them as potential co-factors. We identified 29 EDGs, 26 of which are also MDGs. Further investigation on target genes' promoters methylation and expression alteration patterns of these 26 overlapping driver genes shows that hyper-methylation of target genes' promoters are significantly associated with down-regulation of the same target genes and hypo-methylation of target genes' promoters are significantly associated with up-regulation of the same target genes.
CONCLUSION: This finding suggests a pivotal role for genetically driven changes in chromatin remodeling in shaping DNA methylation and gene expression patterns during tumor development.

Entities:  

Keywords:  DNA methylation; Pan-cancer analysis; TCGA; chromatic remodeling; expression driver gene; gene expression; methylation driver gene; somatic mutation

Mesh:

Substances:

Year:  2018        PMID: 30400878      PMCID: PMC6218985          DOI: 10.1186/s12920-018-0425-z

Source DB:  PubMed          Journal:  BMC Med Genomics        ISSN: 1755-8794            Impact factor:   3.063


Background

Cancer arises through accumulation of somatically acquired genetic and epigenetic aberrations that lead to malignant transformation [1, 2]. Comprehensive characterization of somatic mutations in cancer genomes using next-generation sequencing technology has led to discoveries of cancer driver genes (CDGs) in human cancers [2]. The interplay between genetic and epigenetic alterations was only recently revealed through genome-wide scale genomic and epigenomic analyses. Specifically, genome-wide change of DNA methylation was observed in patients with mutations in epigenetic regulators [2-4], affecting both the global levels of 5-methyl-cytosine (5mC) and the precise DNA methylation patterns in diverse regulatory sequences across the genome [2, 3]. A recent study investigated associations between driver gene mutations and DNA methylation alterations across many cancer types [5], and identified associations between mutated driver genes and site-specific methylation changes as well as some genome-wide trends in specific cancer types. They further used these mutation-methylation associations to better define cancer subtypes. However, it remains largely unknown how the CDG mutations contribute to changes in cancer cell epigenomes on a pan-cancer level [6]. A better understanding of the connections between CDGs and altered cancer cell epigenomes is an important goal, particularly since mutations in epigenetic regulators could be novel targets for anti-cancer therapies [6]. Studies have integrated multi-scale omics data, including somatic mutation data, epigenomes, and transcriptomes across various cancer types to improve the mechanistic understanding of the interplay between cancer genome and cancer epigenome and transcriptome. An integrative analysis of DNA methylation data and gene expression data of various cancer types identified pan-cancer hypo- and hyper-methylated genes that are predictive of transcription as well as methylation-driven subgroups with clinical implications [7]. Another integrative analysis on a set of known epigenetic regulators with DNA methylation data and gene expression data from various cancer types identified key epigenetic regulators whose deregulation patterns are associated with genome-wide DNA methylation changes, which transcend cancer types [8]. Here we aim to improve our understanding of the connections between CDGs and altered cancer cell epigenomes and altered cancer cell transcriptome on pan-cancer level, and how these connections contribute to the known association between cancer epigenome and transcriptome. We used somatic mutation, DNA methylation, and gene expression data of 20 cancer types from The Cancer Genome Atlas (TCGA) project to identify CDGs that, when mutated, have strong associations with genome-wide methylation or expression changes across cancer types, which we refer as methylation driver genes (MDGs) or expression driver genes (EDGs). We identified 32 MDGs and found that most of them are either chromatin regulators (genes involved in chromatin remodeling) or ones that regulate the expression of or physically interact with chromatin regulators. We also identified 29 EDGs and found that 26 of them overlap with the 32 MDGs. We further investigated target genes’ methylation and expression alteration patterns that are associated with mutation status of these 26 overlapping driver genes and found that hyper-methylation of target genes’ promoters are significantly associated with down-regulation of the same target genes and hypo-methylation of target genes’ promoters are significantly associated with up-regulation of the same target genes. This finding shows that dysregulation of chromatin regulators is potentially an important mechanism that induces global change of DNA methylation and gene expression in tumor development.

Methods

We downloaded somatic mutation data, DNA methylation 450K array data, and gene-level RNA-seq data of 20 tumor types with at least 100 samples available in all three data types from TCGA. For DNA methylation 450K array data, we conducted standard quality control steps removing CpG sites that overlap with known single nuclear polymorphisms (SNPs), sites on the sex chromosomes and sites with missing values for more than 5% of the tumor samples within a tumor type. After these steps, 370,877 CpG sites remained. We then corrected for the type I/II probe bias using the BMIQ algorithm [9]

Selection of candidate CDGs

We obtained level 2 somatic mutation data of the above-mentioned 20 tumor types from Broad Institute TCGA Genome Data Analysis Center Firehose [10] and selected candidate CDGs using the MutSIG [11] algorithm that tests how frequently a gene is mutated in a tumor type comparing to the background mutation rate. We used the false discovery rate (FDR) < 0.1 to select candidate CDGs. We then assessed the functional impact of mutations at gene levels using the MutationAssessor [12] algorithm to further remove mutations classified as neutral. Additional steps were done for COAD and STAD when an abnormally large number of candidate CDGs remained (1,433 and 553, respectively) after these steps to avoid potential high false positive discovery rate of CDGs. Specifically, we only kept the genes that were identified in any of the other 18 tumor types as well as identified in the Cancer Gene Census [13] and the numbers of candidate CDGs in COAD and STAD then dropped to 193 and 67. The number of candidate CDGs selected in all 20 tumor types is provided in Additional file 1: Table S1. To conduct pan-cancer analysis associating mutation and methylation/expression, within a tumor type, we selected CDGs that have mutations in at least 5 samples with matched methylation data or expression data in order to have not-too-sparse numbers in the mutated group. For matched mutation and methylation data, 445 CDGs were selected across the 20 tumor types. Here we analyzed somatic mutations at the gene level and a gene is considered mutated in a tumor sample as long as there is any mutation in this gene. Within these driver genes, the number of tumor types in which a driver gene was mutated in at least five samples varies from 1 to 15 (Additional file 2: Table S2) where most of the CDGs were mutated in only one or two tumor types. TP53 was mutated in 15 tumor types and PTEN was mutated in 14 tumor types. For matched mutation data and expression data, 422 CDGs were similarly selected. Of them, 403 CDGs overlap with the CDGs selected for matched mutation data and methylation data. For the 422 CDGs, the number of tumor types in which a CDG is mutated in at least five samples varies from 1 to 14 (Additional file 2: Table S2), where TP53 and PTEN were mutated in 14 tumor types.

Pan-cancer analysis to identify MDGs

We described the details in the pan-cancer analysis associating driver genes and genome-wide methylation alterations across cancer types. Similar procedures with necessary modifications to associate driver genes and gene expression changes were described in the Additional file 3: Text S1.

Associate CDGs and DNA methylation in one cancer type

For CDG i, let A denote the set of tumor types in which CDG i is mutated in at least 5 tumor samples with methylation data available. We then determine the hyper- or hypo-methylation status per CpG site by the mutation status of CDG i using the nonparametric Wilcoxon test. Since methylation levels range from 0 to 1 and are often bimodally distributed across tumor samples and the numbers of samples in the mutated and non-mutated groups are extremely unbalanced. With the Wilcoxon test, we define a set of genome-wide hyper-methylated sites S+ whose methylation levels are significantly increased at significance level 0.01 in the mutated group comparing to the non-mutated group of CDG i in cancer type k. We similarly define a set of hypo-methylated sites S−. Since the goal is not to identify specific CpG sites that are affected by the mutation status but to see how the mutation status is associated with genome-wide methylation changes, no multiple comparisons adjustment is applied to the site-level differential methylation association test. To determine if mutation status of CDG i is significantly associated with genome-wide methylation changes in cancer type k, we calculate the p-value pi,k, which is the probability of observing the number of differentially (hyper- or hypo-) methylated sites or more that are associated with the mutation status of CDG i in cancer type k under the null hypothesis that the mutation status of CDG i is not associated with genome-wide methylation changes. To do so, we generate a “methylation null pool”, which has the number of differentially methylated sites under the null hypothesis. We first selected genes that were mutated in at least 5 samples with methylation data available within a tumor type. We then further selected only top 500 highly mutated genes within each tumor type for computational efficiency and also excluded the 445 CDGs selected above. We ended up with 7,019 mutation genes (those are considered as passenger mutation genes) across 20 tumor types in the “methylation null pool” (see Additional file 4: Table S3 for the number of mutation genes from each tumor type). The 7,019 mutation genes have similar mutation rate (average number of mutations in a cancer type) with that of the 445 CDGs. The average mutation rate of these 7,019 mutation genes is 0.082 with standard deviation (SD) 0.10 while the average mutation rate of the 445 CDGs is 0.085 with SD = 0.13 (p-value=0.54 from a t-test). Within each cancer type, we calculated , the number of differentially methylated sites that are associated with the mutation status of the methylation null gene jnull= 1,…,7019, which form the “methylation null pool”. The p-value pi,k, is then calculated as the proportion of numbers in the “methylation null pool” that is greater than or equal to the observed number of differentially methylated sites , that is, p= , where I(.) is the indicator function. To investigate the potential selection bias in the “methylation null pool”, we also generated the null distribution of number of genome-wide differentially methylated sites by randomly splitting tumor samples of a tumor type into mutation and non-mutation groups, varying the percentage of mutation from 5 to 40% based on the mutation rate of the TCGA 20 tumor types and calculated numbers of differentially methylated sites between the two groups. We repeated this 10 times for each percentage from 5 to 40%, increasing by 1%. Therefore, we ended up with 360*20 values for the number of differentially methylated sites across 20 tumor types. We found that these numbers are on average much smaller than those from the “methylation null pool” generated using passenger mutations, making the p-values of CDGs more significant. This indicates that there is some association between passenger mutations and global methylation changes that random sampling cannot capture. Therefore, the methylation null pool generated by using the passenger mutations rather than randomly splitting may represent a better null distribution. The MDGs identified this way are those associated with methylation changes beyond what is expected for passenger mutations. We classify the effect of CDG i on genome-wide methylation in tumor type k as:

Associate CDGs and DNA methylation across multiple cancer types

To calculate the p-value, pi, testing if CDG i is significantly associated with genome-wide methylation changes across multiple cancer types, we compare , the observed total number of differentially methylated sites associated with CDG i summed over Ai cancer types, to B resampled values generated from the “methylation null pool” where we set B=one million. More specifically, for CDG i that was mutated in |Ai| number of tumor types, the null distribution is generated using the B sets of sum of |Ai| random samples from the “methylation null pool”. We then calculate pi as follows: where r is a random number between 1 and 7,019 from the bth resampling. We use Benjamini-Hochberg procedure to adjust for multiple comparisons on p, which is done within groups of CDGs that were mutated in the same number of tumor types. The MDGs are then identified as those CDGs with adjusted pi < 0.05.

Results

TCGA 20 Cancer Types

We assembled somatic mutation data, HM450 DNA methylation data and gene-level RNA-Seq data (upper-quantile-normalized count data) of 20 tumor types with at least 100 samples available in all three data types from TCGA. This includes breast invasive carcinoma (BRCA), bladder urothelial carcinoma (BLCA), cervical squamous cell carcinoma (CESC), colon adenocarcinoma (COAD), glioblastoma (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), acute myeloid leukemia (LAML), lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic adenocarcinoma (PAAD), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), sarcoma (SARC), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), testicular germ cell tumor (TGCT), and uterine corpus endometrial carcinoma (UCEC)). For detailed steps on processing DNA methylation data and selecting candidate CDGs, see Methods. We refer candidate CDGs as CDGs from now on for notation simplicity.

The Pan-Cancer Analysis

We conducted a pan-cancer analysis to identify methylation driver genes (MDGs)/expression driver genes (EDGs) that, when mutated, have strong associations with genome-wide methylation/expression changes across multiple cancer types through integrating somatic mutation and DNA methylation/gene expression data of 20 TCGA tumor types (Fig. 1a, b).
Fig. 1

Rationale underlying the pan-cancer analysis to identify (a) MDGs that are associated with genome-wide methylation changes across cancer types and (b) EDGs that are associated with genome-wide expression changes across cancer types, with further analysis that reveals (c) MDGs mostly consist of chromatin regulators that directly affect the genome-wide methylation patterns or genes that regulate expression of or physically interact with chromatin regulator.

Rationale underlying the pan-cancer analysis to identify (a) MDGs that are associated with genome-wide methylation changes across cancer types and (b) EDGs that are associated with genome-wide expression changes across cancer types, with further analysis that reveals (c) MDGs mostly consist of chromatin regulators that directly affect the genome-wide methylation patterns or genes that regulate expression of or physically interact with chromatin regulator. We then showed that some of the identified MDGs are chromatin regulators that directly affect the genome-wide methylation patterns and some are connected to chromatin regulators through either regulating their transcription or physically interacting with them as potential co-factors (Fig. 1c). We first tested whether mutations in a CDG are significantly associated with changes in genome-wide methylation patterns in one cancer type. For this, we performed CpG-site-level association analysis within a cancer type, where a nonparametric Wilcoxon test was used since the numbers of samples in the mutated and non-mutated groups are extremely unbalanced and methylation measures were usually enriched at 0 and 1 [14]. We then used the number of genome-wide differentially methylated sites as the test statistic to measure degree of genome-wide methylation changes associated with the mutation status of a CDG for one cancer type. Note that we used significance level 0.01 to determine site-level association without multiple comparisons adjustment since the goal is to measure genome-wide degree of differential methylation due to mutation status but not to claim any associated CpG sites. To assess the significance of the genome-wide methylation changes by a CDG in one cancer type, we first generated an empirical null distribution with numbers of genome-wide differentially methylated sites by mutations of non-CDGs and then calculated the p-value pi,k for CDG i in cancer type k by comparing the number of genome-wide differentially methylated sites by the mutation of CDG i in cancer type k with the empirical null distribution. We then classify the effect of CDG i in tumor type k as hyper-methylated if pi,k<0.05 and the number of genome-wide hyper-methylated sites is greater than that of hypo-methylated sites or hypo-methylated if pi,k<0.05 and the number of genome-wide hypo-methylated sites is greater than that of hyper-methylated sites. Finally, to determine the significance of genome-wide methylation changes across multiple cancer types by a CDG, we compare the observed total number of differentially methylated sites associated with a CDG summed over all cancer types with its null distribution to calculate the p-value pi. We use Benjamini-Hochberg procedure to adjust for multiple comparisons for pi, where the adjustment is done within the group of CDGs that were mutated in the same number of cancer types. The MDGs are then identified as those CDGs with adjusted p-values < 0.05. Similar steps are applied to mutation and expression data to identify EDGs. Detailed steps of how to identify MDGs/EDGs are provided in the Methods.

Thirty-two MDGs were identified that, when mutated, have strong association with genome-wide methylation changes across 20 cancers

The pan-cancer analysis of the 20 TCGA cancer types identified 32 MDGs (Table 1). For the complete list of CDGs whose mutation states were significantly associated with genome-wide methylation changes within each cancer type (gene i with pi,k<0.05 in the cancer type k), see Additional file 5: Table S4. The genes in Table 1 and Additional file 5: Table S4 highly overlap with the genes identified as the CDGs whose mutation states are associated with genome-wide methylation changes by Chen et al. [5]. They used Principal Component Analysis (PCA) to identify driver genes whose mutations are associated with the top five PCs within each cancer. Although the two methods used different approaches, the identified genes are very similar, providing further validation of the results.
Table 1

The identified 32 MDGs

MDGs|Ai||Ti|PiDiT-iT+i
TP53 158< e-06bothaBLCA BRCA HNSC LIHCLUAD STAD UCECLGGa
PTEN 1430.00147bothLGGSTAD UCEC
RB1 1120.00861bothLGGBLCA
PIK3CA 1110.0277hyperSTAD
ARID1A 1010.0401hyperSTAD
KRAS 811.2e-05hypoTGCT
KMT2D 820.00575bothBLCASTAD
NF1 620.000227hypoLGG PCPG
CTNNB1 620.00185hypoLIHC UCEC
SETD2 520.00351hyperKIRC KIRP
KMT2C b 510.00681hyperSTAD
EGFR 419.5e-05hypoLGG
HRAS 420.000266bothPCPGHNSC
BRAF 420.00474bothTHCACOAD
IDH1 32< e-06hyperGBM LGG
CIC 311.3e-05hyperLGG
NRAS 321.4e-05bothTGCTTHCA
RNF43 325.1e-05bothSTADCOAD
ATRX 310.000487hyperaLGGa
ZBTB20 320.00151hyperCOAD STAD
NOTCH1 310.00189hyperLGG
CDH1 320.00213hyperBRCA STAD
KEAP1 310.00578hypoLUAD
SMARCA4 310.00687hypoLUAD
FOXA1 b 310.00953hypoPRAD
EPHA2 b 310.0104hyperSTAD
KIT 21< e-06hypoTGCT
KMT2B 220.000511bothSTADCOAD
FGFR3 b 210.000697hypoBLCA
STK11 b 210.00193hypoLUAD
NSD1 11< e-06hypoHNSC
BAP1 b 110.000142hyperLIHC

|Ai|: number of tumor types in which CDG i is mutated in ≥ 5 samples with available methylation data;

|Ti|: number of tumor types whose genome-wide methylation levels are significantly associated with the mutation status of CDG i;

p: p-value testing if CDG i is significantly associated with genome-wide methylation changes across tumor types;

Di: direction of methylation changes associated with mutation status of CDG i;

T+i: tumor types that are hyper-methylated by CDG i;

T-i: tumor types that are hypo-methylated by CDG i;

a: Further stratified analysis by IDH1 mutation status in LGG tumor samples suggests an opposite direction from hyper- to hypo-methylation;

b: genes that are not overlapping driver genes

__ : genes that are identified as associated with genome-wide patterns of aberrant methylation by Chen et al. [5]

The identified 32 MDGs |Ai|: number of tumor types in which CDG i is mutated in ≥ 5 samples with available methylation data; |Ti|: number of tumor types whose genome-wide methylation levels are significantly associated with the mutation status of CDG i; p: p-value testing if CDG i is significantly associated with genome-wide methylation changes across tumor types; Di: direction of methylation changes associated with mutation status of CDG i; T+i: tumor types that are hyper-methylated by CDG i; T-i: tumor types that are hypo-methylated by CDG i; a: Further stratified analysis by IDH1 mutation status in LGG tumor samples suggests an opposite direction from hyper- to hypo-methylation; b: genes that are not overlapping driver genes __ : genes that are identified as associated with genome-wide patterns of aberrant methylation by Chen et al. [5] The 32 MDGs were mutated with different frequencies in each cancer types (Additional file 6: Figure S1) and the mutation status of the 32 MDGs is associated with different genome-wide number of hyper- and hypo-methylated sites (Fig. 2a). Cancer types COAD and STAD have the highest mutation rate with many of the identified MDGs being mutated. KIRP, PCPG, TGCT and THCA have the smallest number of mutated MDGs. In CESC, LUSC, PAAD, and SARC tumor types, genome-wide methylation patterns were not significantly affected by mutations of any of the identified 32 MDGs, potentially due to small sample sizes or fewer number of CDGs. TP53 mutations are associated with significant genome-wide methylation changes in 8 out of the 15 tumor types in which it was mutated in more than 5 samples (Table 1). Among these 8 tumor types, more CpG sites were hypo-methylated in all but LGG. Instead, in LGG, TP53 mutations are associated with more hyper-methylated CpG sites. However, almost all LGG tumors with TP53 mutations also have IDH1 mutations (Additional file 6: Figure S1), which are known to lead to hyper-methylation in LGG [15-17]. IDH1 is also identified as one of the 32 MDGs, where in GBM and LGG, it is associated with more CpGs to be hyper-methylated. Given the prominent role of IDH1 in LGG, we stratified LGG tumor samples by the IDH1 mutation status and further examined the effect of the other 31 MDGs within the IDH1 mutation stratum and the IDH1 wild-type stratum and found that TP53 mutations are now significantly associated with more hypo-methylation genome-wide in each stratum (Additional file 3: Text S1). Similar stratified analyses were conducted in all other tumor types whose genome-wide methylation patterns were significantly associated with mutations of the identified MDGs. Similar patterns as in the non-stratified analysis were observed (Additional file 7: Table S5).
Fig. 2

a Number of genome-wide hyper- (1) and hypo-methylated (2) sites that are associated with the mutation status of the 32 identified MDGs (columns) for each of the 20 TCGA tumor types (rows). (b) Number of genome-wide up-regulated- (1) and down-regulated- (2) genes that are associated with the mutation status of the 29 identified EDGs (columns) for each of the 20 TCGA tumor types (rows). The color code represents the number of differentially methylated sites/differentially expressed genes. Only driver genes that were mutated in ≥ 5 samples for the given tumor type were colored

a Number of genome-wide hyper- (1) and hypo-methylated (2) sites that are associated with the mutation status of the 32 identified MDGs (columns) for each of the 20 TCGA tumor types (rows). (b) Number of genome-wide up-regulated- (1) and down-regulated- (2) genes that are associated with the mutation status of the 29 identified EDGs (columns) for each of the 20 TCGA tumor types (rows). The color code represents the number of differentially methylated sites/differentially expressed genes. Only driver genes that were mutated in ≥ 5 samples for the given tumor type were colored

Many MDGs are known chromatin regulators or the ones that regulate the expression of or physically interact with chromatin regulators

Among the identified 32 MDGs, 8 are known chromatin regulators that are either histone modification enzymes (KMT2D, KMT2C, KMT2B, NSD1, and SETD2), or part of ATP-dependent chromatin remodeling complexes (ARID1A, ATRX, and SMARCA4, all from the SWI/SNF family [18]). We hypothesize that among the remaining 24 MDGs, some are “epigenetic modulators” in that these genes change genome-wide methylation patterns by regulating the expression of chromatin regulators/DNA methyltransferases, or through physically interacting with these epigenomic regulators as cofactors. To test this hypothesis, we examined whether mutations of these 24 MDGs are associated with the expression changes of known epigenomic regulator genes across the 20 tumor types, where we used the exon level RNA-Seq data of the 20 tumor tissue types from TCGA. We also investigated if epigenomic regulator genes are over-represented among genes that physically interact with these 24 MDGs. We created two lists of known chromatin regulator genes. List A has 720 DNA/RNA, histone and chromatin-modifying enzymes and their co-factors from the EpiFactors database [19]. List B has 18 epigenetic regulators that were identified as the master regulators of global DNA methylation by Yang et al. [8], including EYA4, SETBP1, PRDM2, PRDM5, CBX7, DUSP1, KAT2B, RAD54L, WHSC1, EZH2, UHRF1, PCNA, TTF2, KDM1A, SUV39H2, HDAC1, TDG and TET3, plus the DNA methyltransferase DNMT1, 3A, 3B that were not identified as master regulators by Yang et al. [8]. Among the remaining 24 MDGs that are potentially new “epigenetic modulators”, 12 are associated with genome-wide methylation changes in more than one cancer type (Table 1), including TP53, PTEN, RB1, NF1, CTNNB1, HRAS, BRAF, IDH1, NRAS, RNF43,ZBTB20, and CDH1. To focus on pan-cancer effects, we worked on only these 12 driver genes that are associated with genome-wide methylation changes in more than one cancer types. For each of these 12 genes, we first identified genome-wide target genes whose expression levels were dysregulated by the mutation status commonly across tumor types. We compared the expression levels of all genes between mutated and non-mutated groups of a MDG using a two-sample t-test and identified target genes that show significantly differential expressions (p-value <0.05) in all tumor types whose methylation patterns are associated with the mutation status. Similarly, since the goal here is not to identify specific target genes that are affected by the MDGs but to quantify degree of dysregulation by MDGs through number of target genes that are commonly dysregulated across cancer types, we used a loose p-value cutoff without considering multiple comparisons. We then examined if known chromatin regulators in lists A and B were over represented among the genome-wide dysregulated target genes by a MDG. The expression data were only available for 717 out of the 720 genes in list A, but were available for all 21 genes in list B. We used a hypergeometric distribution to calculate p-values for the enrichment of known chromatin regulators and described the procedure to examine this mechanistic hypothesis in more details in Fig. 3.
Fig. 3

Steps to test if chromatin regulators are enriched among the dysregulated target genes associated with the mutation status of the identified MDGs

Steps to test if chromatin regulators are enriched among the dysregulated target genes associated with the mutation status of the identified MDGs We found that among the genome-wide dysregulated target genes by each of the 12 MDGs, known chromatin regulators were clearly enriched (Table 2). In addition, 7 out of the 12 MDGs are associated with differential expression of the DNMT genes. Additionally, the results in Table 2 confirm some previously known interactions between MDGs and chromatin regulators. For example, TP53 mutations are associated with upregulated KDM1A expression levels across all tumor types whose genome-wide methylation patterns are also significantly associated with TP53 mutations. KDM1A is known to physically interact with TP53 [20] and it demethylates histone lysine residues 9 of histone 3, which in turn leads to extensive hypo-methylation in that region [21]. This analysis suggests that KDM1A may play a role in the association between TP53 mutations and genome-wide hypo-methylation changes across tumor types. Other notable associations that were confirmed by results in Table 2 include interactions between RB1 and DNMT1, and between RAS genes (HRAS, NRAS) and HDAC1 [22].
Table 2

MDGs dysregulate expression levels of chromatin regulators

MDGs|Ti|N1 (# of dysregulated genes)N2 (# ofdysregulatedgenes in list A)Enrichment PvalueAN3 (# ofdysregulatedgenes in list B)Enrichment PvalueBGenes in list B that are dysregulatedaGenes in list B that aredifferentially methylatedb
TP53 8233190.0005653.2e-06 CBX7 RAD54L TTF2 KDM1A SUV39H2
PTEN 31,534810.0001299.1e-06 DNMT1 DNMT3A SETBP1 PRDM5 CBX7 RAD54L EZH2 PCNA HDAC1 DNMT3A PRDM5 HDAC1
RB1 21,447835.2e-0640.056 DNMT1 EYA4 EZH2 PCNA PCNA
NF1 2766360.04401
CTNNB1 25,5152070.12120.0033 DNMT1 SETBP1 PRDM2 PRDM5 CBX7 KAT2B RAD54L WHSC1 EZH2 UHRF1 TDG TET3 PRDM2 CBX7
HRAS 21,137632.0e-0430.11 PRDM2 RAD54L HDAC1 PRDM2
BRAF 23,1281330.00880.0091 DNMT1 DNMT3A DNMT3B KAT2B WHSC1 EZH2 UHRF1 KDM1A DNMT3A UHRF1
IDH1 23,5602082.8e-1520.9 SUV39H2 TET3
NRAS 21,609872.8e-0510.82 HDAC1
RNF43 23,2121574.4e-0680.011 DNMT3A DNMT3B PRDM2 RAD54L WHSC1 UHRF1 HDAC1 TET3 DNMT3A DNMT3B PRDM2 WHSC1
ZBTB20 21,563780.0008860.0039 DNMT3A RAD54L WHSC1 UHRF1 TTF2 HDAC1 DNMT3A WHSC1 UHRF11
CDH1 22,7921362.7e-05143.3e-08 DNMT3B SETBP1 PRDM2 CBX7 DUSP1 RAD54L WHSC1 EZH2 PCNA KDM1A SUV39H2 HDAC1 TDG TET3 PRDM2 CBX7 WHSC1 KDM1A

|Ti|: number of tumor types whose genome-wide methylation levels are significantly associated with the mutation status of CDG i;

N: number of genes whose expression levels are dysregulated by MDG i in all |Ti| tumor types;

N: number of genes in list A that are dysregulated in all |Ti| tumor types;

N: number of genes in list B that are dysregulated in all |Ti| tumor types;

Enrichment PvalueA, and PvalueB are calculated using hypergeometric distributions testing if genes in lists A and B are enriched among genome-wide differentially expressed target genes;

aGenes in list B that are dysregulated in all |Ti| tumor types;

bGenes in list B that are differentially methylated in all |Ti| tumor types.

MDGs dysregulate expression levels of chromatin regulators |Ti|: number of tumor types whose genome-wide methylation levels are significantly associated with the mutation status of CDG i; N: number of genes whose expression levels are dysregulated by MDG i in all |Ti| tumor types; N: number of genes in list A that are dysregulated in all |Ti| tumor types; N: number of genes in list B that are dysregulated in all |Ti| tumor types; Enrichment PvalueA, and PvalueB are calculated using hypergeometric distributions testing if genes in lists A and B are enriched among genome-wide differentially expressed target genes; aGenes in list B that are dysregulated in all |Ti| tumor types; bGenes in list B that are differentially methylated in all |Ti| tumor types. We next investigated if the epigenomic regulator genes in lists A and B are over represented among genes that physically interact with these 12 MDGs. To do so, we first obtained a list of genes that physically interact with each of them from the HumanMine database [23]. We then tested if the epigenomic regulator genes in lists A and B are over-represented among them. Since the number of physically interacting genes was too small for some MDGs, the enrichment analysis was only conducted for the genes in list A that had enough overlap with the interacting genes. We found that known epigenomic regulator genes in list A were highly enriched in the lists of interacting genes of the 12 MDGs (Table 3). These results support our hypothesis that these 12 MDGs that are not known chromatin regulators but are associated with changes in epigenomes either through regulating expression of epigenomic regulators or through physically interacting with them.
Table 3

Chromatin regulators are enriched in genes that physically interact with MDGs

MDGs|Ti|N4 (# of physically interacting genes)N5 (# of physicallyinteracting genes in list A)Enrichment PvalueAi
TP53 89231920
PTEN 3224190.00035
RB1 2250640
NF1 22930.079
CTNNB1 2364630
HRAS 28660.08
BRAF 25480.00053
IDH1 24970.0015
NRAS 23630.13
RNF43 21871.4e-06
ZBTB20 21710.45
CDH1 2144182.9e-06

|Ti|: number of tumor types whose genome-wide methylation levels are significantly associated with the mutation status of CDG i;

N: number of genes physically interact with MDG i;

N: number of genes physically interact with MDG i that are also in list A;

Enrichment PvalueAi is calculated using a hypergeometric distribution testing if genes in list A are enriched among selected physically interacting genes.

Chromatin regulators are enriched in genes that physically interact with MDGs |Ti|: number of tumor types whose genome-wide methylation levels are significantly associated with the mutation status of CDG i; N: number of genes physically interact with MDG i; N: number of genes physically interact with MDG i that are also in list A; Enrichment PvalueAi is calculated using a hypergeometric distribution testing if genes in list A are enriched among selected physically interacting genes. We also investigated whether differential expression of target genes in list B of 21 epigenetic regulators are directly or indirectly associated with differential methylation of the same genes. We found only a small fraction of genes in list B whose expression and methylation levels are both associated with the mutation status of the MDGs (Table 2), which suggests that the differential expression of these target genes may be directly associated with mutations of these MDGs instead of being indirectly associated through changes in their methylation patterns. We further investigated mutation status of genes in list B to examine if the mutations affect their expression or methylation levels directly and found that the majority of genes in list B were rarely mutated across tumor types (Additional file 8: Table S6). Although CIC was not included in the above analyses since it was mutated only in LGG, due to its important role in LGG tumors, we examined how CIC regulates expressions of target genes and found that chromatin remodeling genes in list A were significantly enriched among dysregulated target genes, in both full LGG tumor samples and in stratified samples by IDH1 mutation status (Additional file 9: Table S7).

Twenty-nine EDGs were identified, out of which, 26 overlaps with the identified 32 MDGs

We conducted similar pan-cancer analysis to associate driver genes and gene expression across the 20 TCGA cancer types. We identified 29 CDGs as the expression driver genes (EDGs) that, when mutated, are significantly associated with genome-wide expression changes across multiple cancer types (Table 4). The mutation status of these 29 EDGs is associated with different genome-wide number of up- and down-regulated genes (Fig. 2b). For the complete list of CDGs whose mutation states were significantly associated with genome-wide expression changes within each cancer type, see Additional file 10: Table S8.
Table 4

Identified 29 EDGs

EDGsAiEipi B i \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {E}_i^{-} $$\end{document}Ei \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {E}_i^{+} $$\end{document}Ei+
TP53 1411< e-06bothHNSC LGG SARCBLCA BRCA COAD GBM KIRC LIHC LUAD STAD
PTEN 1410.00299upLGG
PIK3CA 1120.00536downBRCA STAD
RB1 1039e-05upBLCA LGG LUAD
ARID1A 1020.00557bothSTADCESC
KRAS 844e-06downCOAD LUAD PAAD TGCT
KMT2D 810.0265downSTAD
NF1 610.00274upLGG
CTNNB1 510.00572downLIHC
†SETD2 500.00833NANANA
BRAF 421.1e-05downCOAD THCA
EGFR 425.3e-05upLGG LUAD
HRAS 437e-05downHNSC PCPG THCA
CIC 31< e-06downLGG
IDH1 32< e-06bothLGGGBM
RNF43 321e-06downCOAD STAD
ATRX 322e-06bothLGGGBM
CDH1 327e-06bothBRCASTAD
NRAS 320.000199upTGCT THCA
KEAP1 310.000809downLUAD
ZBTB20 320.00119downCOAD STAD
NOTCH1 320.00182downHNSC LGG
SMARCA4 310.0108upLUAD
GATA3 a 310.0138downBRCA
KMT2B 220.000317downCOAD STAD
KIT 210.000889downTGCT
FUBP1 a 11< e-06downLGG
SPOP a 11< e-06downPRAD
NSD1 110.000432upHNSC

∣A′∣ = number of tumor types in which EDG i is mutated in ≥ 5 samples with expression data;

∣E∣ = number of tumor types whose genome-wide expression levels are significantly associated with CDG i;

p-value testing if CDG i is significantly associated with genome-wide expression changes across tumor types;

Bi is the direction of change of expression levels associated with the mutation status of CDG i;

tumor types that are up-regulated by CDG i, = tumor types that are down-regulated by CDG i;

a : genes that are not overlapping driver genes.

†Note that SETD2 gene has a significant p-value for testing association of genome-wide expression changes across multiple tumor types, but there is not a specific tumor type in which SETD2 mutation is significantly associated with genome-wide expression changes.

Identified 29 EDGs ∣A′∣ = number of tumor types in which EDG i is mutated in ≥ 5 samples with expression data; ∣E∣ = number of tumor types whose genome-wide expression levels are significantly associated with CDG i; p-value testing if CDG i is significantly associated with genome-wide expression changes across tumor types; Bi is the direction of change of expression levels associated with the mutation status of CDG i; tumor types that are up-regulated by CDG i, = tumor types that are down-regulated by CDG i; a : genes that are not overlapping driver genes. †Note that SETD2 gene has a significant p-value for testing association of genome-wide expression changes across multiple tumor types, but there is not a specific tumor type in which SETD2 mutation is significantly associated with genome-wide expression changes. Of the 29 EDGs, 26 overlap with the 32 MDGs. To understand this high rate of overlap, within each cancer type, we examined the overlap between CDGs that are significantly associated with genome-wide methylation changes and CDGs that are significantly associated with genome-wide expression changes, and found they overlap highly. Moreover, there is a high correlation between the number of differentially methylated sites and the number of differentially expressed genes by each CDG (Additional file 11: Table S9), which implies a close connection between genome-wide methylation changes and genome-wide expression changes. We further investigated patterns of target genes’ methylation in promoter regions and target genes’ expression changes of the 26 overlapping driver genes. A target gene is hyper-methylated if the number of hyper-methylated sites is larger than that of hypo-methylated in the promoter region of the gene (1,500 base pairs upstream of the transcription start site) and hypo-methylated otherwise. If there are the same numbers of hyper-/hypo-methylated sites or no hyper/hypo-methylated sites in the promoter region, the gene is considered not differentially methylated. The signature patterns of target genes’ methylation and expression changes by the overlapping driver genes could be hyper-methylated and up-regulated, the “++” pattern; hyper-methylated and down-regulated, the “+-” pattern; hypo-methylated and up-regulated, the “-+” pattern; and hypo-methylated and down- regulated, the “--” pattern. We used a hypergeometric distribution to calculate p-values for the enrichment of each pattern in a cancer type and combined per tumor type p-values using the Fisher’s method (Additional file 12: Figure S2, Table 5).
Table 5

Patterns of target genes’ promoter regions methylation and expression changes by mutations of the overlapping driver genes across tumor types

Overlapping drivergenesTiEiTi ∩ EiDMDE\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \frac{\mid DM\cap DE\mid }{\mid DE\mid } $$\end{document}DMDEDE (%)p(DM ∙ DE)p(−−)p(+−)p(−+)p(++)p.methylp.exp
TP53 81171038910066522.8e-3219.7e-957.1e-65100
PTEN 3111225910293610.01411e-151e-1512.98e-100
RB1 23190498062460.00660.810.051e-15100
PIK3CA 121139335822735e-1211e-155.4e-0511.93e-110
ARID1A 12194878573531e-150.981e-150.29100
KRAS 14115041842773117.5e-1210.8700
KMT2D 21198028485524.3e-1411e-150.01417.37e-100
NF1 211102887435539e-0711e-151e-15100
CTNNB1 21175837591370.6411e-151100
EGFR 1211315610606660.009611e-151e-1517.63e-070
HRAS 23287586300473.7e-190.977e-298.1e-20100
BRAF 222912010092471.3e-060.961.3e-147e-29109.99e-16
IDH1 2221059610134556.6e-1617e-297e-29100
CIC 1111406612718711.2e-0811e-151e-15100
NRAS 221841310581435.6e-0611e-151e-15100
RNF43 22299789292514.9e-0813.9e-292e-14100
ATRX 1211164612114602.2e-1611e-151e-15100
ZBTB20 22284156667431.1e-100.948.6e-192.1e-09100
NOTCH1 121102257849551e-1511e-151e-15100
CDH1 22286348254453.6e-1611.1e-233.5e-26100
KEAP1 1111006010030500.411e-150.01711.25e-120
SMARCA4 11161186234300.8611e-150.9311.11e-163.56e-11
KIT 1111503510164740.8714.4e-160.0290.952.54e-091.07e-06
KMT2B 22280666649441.7e-2417e-292.3e-24100
NSD1 111130528233671.2e-100.551.9e-143.1e-101NANA

|DM|: number of differentially methylated genes averaged across T ∩ E tumor types;

|DE|: number of differentially expressed genes averaged across T ∩ E tumor types;

(%): percent of differentially methylated target genes out of differentially expressed target genes, averaged across tumor types T ∩ E

p(DM ∙ DE): p-value testing if number of target genes that are differentially methylated and expression is larger than expected using a hypergeometric distribution combined across tumor types T ∩ Eusing the Fisher’s method.

p(−−), p(+−), p(−+), p(++): p-values that test if number of target genes with “--”,“+-”,“-+”,“++” pattern of methylation and expression changes is larger than expected a using hypergeometric distribution combined across tumor types using the Fisher’s method.

p.methyl: median p-value from testing if the number of overlapping target genes that are differentially methylated by the mutation of the CDG between any pair of two tumor types is larger than expected using a hypergeometric distribution.

p.exp: median p-value from testing if the number of overlapping target genes that are differentially expressed by the mutation of the CDG between any pair of two tumor types is larger than expected using a hypergeometric distribution.

Patterns of target genes’ promoter regions methylation and expression changes by mutations of the overlapping driver genes across tumor types |DM|: number of differentially methylated genes averaged across T ∩ E tumor types; |DE|: number of differentially expressed genes averaged across T ∩ E tumor types; (%): percent of differentially methylated target genes out of differentially expressed target genes, averaged across tumor types T ∩ E p(DM ∙ DE): p-value testing if number of target genes that are differentially methylated and expression is larger than expected using a hypergeometric distribution combined across tumor types T ∩ Eusing the Fisher’s method. p(−−), p(+−), p(−+), p(++): p-values that test if number of target genes with “--”,“+-”,“-+”,“++” pattern of methylation and expression changes is larger than expected a using hypergeometric distribution combined across tumor types using the Fisher’s method. p.methyl: median p-value from testing if the number of overlapping target genes that are differentially methylated by the mutation of the CDG between any pair of two tumor types is larger than expected using a hypergeometric distribution. p.exp: median p-value from testing if the number of overlapping target genes that are differentially expressed by the mutation of the CDG between any pair of two tumor types is larger than expected using a hypergeometric distribution. It is clear that across the 26 overlapping driver genes, target genes’ hyper-methylation are significantly associated with their down-regulation (“+-” pattern) and target genes’ hypo-methylation are significantly associated with their up-regulation (“-+” pattern). A specific example of a target gene that is hypo-methylated and up-regulated by the mutation of TP53 is HSF1 gene. It is hypo-methylated and up-regulated by TP53 mutations across 9 tumor types. Dysregulation of chromatin regulators induces global change of chromatin architecture, which is highly interconnected with DNA methylation. DNA methylation and histone modification interact with each other to determine the chromatin state as an euchromatic (on) or heterochromatic (off) state, where euchromatic state is associated with hypomethylation and active gene expression and heterochromatic state is associated with hypermethylation and repressed gene expression [24]. We also investigated the consistency of the differential gene expression and DNA methylation patterns across tumor types. For each CDG, for every pair of tumor types in which it is mutated in more than five samples, we tested using a hypergeometric distribution if the number of overlapping target genes that are differentially methylated by the mutation of the CDG is larger than expected. We then reported the median p-values (Table 5) from all pairs of two tumor types and repeated the same analysis for differential expression. Both median p-values for differential expression and methylation are ‘0’ or close to ‘0’ for most CDGs, which indicates that the differential expression or methylation associated with CDGs are consistent across tumor types. Note that NSD1 gene was only mutated in one tumor type. Our findings on how CDG mutations contribute to pan-cancer-associated epigenomic alterations and transcriptomic alterations suggest that there are potentially three mechanisms (Fig. 4): 1) genome-wide methylation and expression changes are associated with changes in chromatin states induced by malfunctions of chromatin regulators directly through mutations of these genes; 2) or indirectly through mutations of other genes that regulate the expression of chromatin regulators; 3) or indirectly through mutations of other genes with which chromatin regulators physically interact with for epigenomic regulation.
Fig. 4

Chromatin remodeling is one of the major mechanisms that amplify the impact of mutations in CDGs by global methylation and gene expression changes

Chromatin remodeling is one of the major mechanisms that amplify the impact of mutations in CDGs by global methylation and gene expression changes

Discussion

We conducted a pan-cancer analysis to identify CDGs whose somatic mutations are associated with genome-wide methylation/expression changes across multiple cancer types. We used a straightforward method to compare methylation/expression levels between mutated and non-mutated groups of each CDG. The MDGs identified highly overlap with the driver genes identified whose mutation states are associated with genome-wide methylation changes by Chen et al. [5] (these overlapping genes are underlined in Table 1), where they used a different method, the Principal Component Analysis (PCA). This provides further validation of the MDGs results. However, our method also identified several MDGs that were not identified by Chen et al. [5] including well-known chromatin regulators KMT2B, KMT2C, KMT2D and SMARCA4. The MDGs that we identified do not overlap the 18 master regulators identified by Yang et al. [8]. This is because they did not consider mutation data and focused only on epigenetic enzymes, which exhibit consistently differential expression and DNA methylation instability correlation patterns across cancer types. However, as Table 2 shows, the deregulation of the 18 master regulators in list B are correlated with the mutation status of the MDGs. Interestingly, the MDGs and EDGs also include genes that are associated with telomere length (TL) elongation in cancers. Telomeres shorten with each cell division, therefore, maintenance of telomere length is critical in tumorigenesis. While telomere shortening is often prevented by activation of telomerase reverse transcriptase (TERT), it is also prevented by a homologous recombination-based process known as alternative lengthening of telomeres (ALT), where remodeling of the telomeric architecture may play a key role [25]. Floris et al. [26] recently performed a comprehensive analysis of association between TL and somatic alterations in cancers. To identify TERT-independent TL regulation, they associated somatic alterations of 196 telomere-associated genes to TL ratio between matching tumor and normal samples and found alterations of ATRX, IDH1, TP53, BCOR, and RB1 were significantly associated with relative TL elongation under FDR<0.05. Our MDGs include four out of these five genes, suggesting that chromatin remodeling plays an important role in ALT. In a recent review by Feinberg et al. [27], an epigenetic functional classification system was introduced that classifies epigenetic genes into three categories 1) “epigenetic mediators”, which correspond to tumor progenitor genes that are targets of epigenetic modification; 2) “epigenetic modifiers”, which modify DNA methylation or chromatin structure; and 3) “epigenetic modulators”, which influence activities of epigenetic modifiers to destabilize epigenetic states. Among the 32 MDGs, 8 are well-known chromatin regulators that fall in the category of “epigenetic modifiers”. The remaining 24 genes are considered as new candidates of “epigenetic modulators” that are associated with genome-wide methylation changes through regulating or interacting with chromatin regulators. Further analysis that examined whether mutations of 12 MDGs out of these 24 MDGs are associated with the expression of known epigenetic modifiers across cancer types supports our mechanistic hypothesis that some of these MDGs are the ones that regulate expression of chromatin regulators. Similarly, analysis that examined whether chromatin regulators are enriched among genes that physically interact with the 12 MDGs supports our mechanistic hypothesis that some of these MDGs are the ones that physically interact with chromatin regulators. Seven out of the 24 MDGs: PTEN, PIK3CA, KRAS, HRAS, BRAF, NRAS, and KIT belong to the PI3K/AKT signaling pathway which is known to target and change the function of chromatin-modifying enzymes in SWI/SNF family members [28]. Previous studies provide strong evidence that all these 7 genes are involved in chromatin remodeling. BRAF mutation is known to be tightly associated with a CpG island methylator phenotype (CIMP) and alteration of SWI/SNF chromatin remodeling pathway [29]. RAS genes which were classified as epigenetic modifiers in the review by Feinberg et al. induce global and local chromatin modifications [27]. There is also evidence for direct or indirect interactions between chromatin regulators or chromatin themselves and KIT [30], PIK3CA [31], and PTEN [32]. Another highly enriched function among the identified MDGs is DNA repair. Eight of the 24 MDGs: TP53, PTEN, RB1, FOXA1, BAP1, IDH1 and NF1 are known to play a role in DNA repair, when DNA repair is known to interact with chromatin remodeling. Studies of DNA repair have uncovered that many histone modifications occur after induction of a double-strand break [33]. TP53 binds to and regulates chromatin regulators, including the methyltransferases KMT2A and KMT2D and acetyltransferase KAT6A, resulting in genome-wide increases of histone methylation and acetylation [34]. RB1 is also known to bind to and regulate DNA methyltransferase, histone methyltransferases and histone acetyltransferase [35]. FOXA1 is a pioneer transcription factor whose recruitment to enhancers is associated with DNA demethylation and induction of histone H3 lysine 4 methylation at these enhancers [36, 37]. It was recently uncovered that FOXA1 interacts with components of DNA repair complexes and that the FOXA1-associated DNA repair complex is implicated in active DNA demethylation [38]. BAP1, which is critical for promoting DNA repair by homologous recombination [39], plays a key role in chromatin remodeling by mediating deubiquitination of histone H2A and HCFC1 [40]; IDH1 is classified as an epigenetic modifier in the review by Feinberg et al. [27] and its mutation is known to induce the genome-wide alterations in DNA methylation by inhibiting function of histone and DNA demethylases [41], which also impairs DNA repair [42]. NF1 is also known to participate in chromatin remodeling activities [43]. For the rest of the MDGs, there is evidence supporting many of their involvement in chromatin modification either by interacting with histone modification enzymes or chromatin remodeling complexes or with chromatin directly, such as CDH1 [44], CTNNB1 [45], EGFR [46], KEAP1 [47], NOTCH1 [48], STK11[49], and ZBTB20 [50]. Especially gene CIC, a transcription repressor in the central nervous system identified as the MDG in LGG, physically interacts with a histone methyltransferase KMT3A [51]. Note that CIC mutations are associated with hyper-methylation in LGG both among IDH1 wild-type tumors and IDH1 mutated tumors. Further studies are needed to investigate if the observed clinical and biological impact of CIC mutations in LGG is through hyper-methylation of the epigenome. In this study, we identified CDGs whose somatic mutations are associated with pan-cancer genome-wide methylation/expression changes by using a simple and straightforward method to compare methylation or expression levels between mutated and non-mutated groups of each CDG. We acknowledge that the difference between the two groups may be confounded by other factors, such as mutations in other genes as we observed for TP53 and IDH1 in LGG tumors. However, multivariate approaches such as regression models to control for other gene mutations may not be feasible for our purpose due to highly non-normal distribution of methylation levels and sparseness of mutations. Although we focused on associations of somatic mutations with genome-wide methylation and expression changes in this study, this approach can be readily modified to examine association between copy number variations or structural variations with genome-wide methylation and expression changes.

Conclusions

Our pan-cancer analysis examining connections between somatic mutation and DNA methylation/gene expression identified CDGs (32 MDGs and 29 EDGs) whose somatic mutations are associated with genome-wide methylation/expression changes across multiple cancer types. Many of the identified MDGs are either chromatin regulators or the ones that regulate the expression of or physically interact with chromatin regulators. Twenty-six out of the 29 EDGs overlap with the 32 MDGs. We further confirmed the enrichment of target gene patterns being hyper-methylated and down-regulated or hypo-methylated and up-regulated, by the 26 overlapping genes. These findings highlight that the dysregulation of chromatin regulation is an important mechanism that amplifies the impact of mutations in CDGs by global methylation and gene expression changes. Table S1. Selection of candidate CDGs across tumor types. (XLSX 9 kb) Table S2. A. Number of CDGs mutated in at least 5 samples with methylation data in one tumor type; B. Number of CDGs mutated in at least 5 samples with expression data in one tumor type. (XLSX 8 kb) Text S1. Details with some modified steps for identifying EDGs.; Results of MDGs in LGG; Results from stratified analysis in all other cancers other than LGG. (DOCX 68 kb) Table S3. Number of genes in the methylation null pool. (XLSX 36 kb) Table S4. CDGs associated with significant genome-wide methylation changes in one cancer type. Hyper-methylation is defined as CDGs with pi,k <0.05 and the number of hyper-methylated sites is larger than the number of hypo-methylated sites (). Hypo-methylation is defined as CDGs with pi,k <0.05 and the number of hyper-methylated sites is smaller than the number of hypo-methylated sites (). (XLSX 11 kb) Figure S1. Mutation patterns of the identified 32 MDGs across 20 TCGA tumor types. Each row represents a tumor sample and each column represents a MDG. Light color indicates no mutation and dark color indicates mutations. (TIF 287 kb) Table S5. Results from stratified analysis for the rest of the 15 cancer types (other than LGG) whose genome-wide methylation patterns were affected by one of the identified 32 major MDGs. (XLSX 10 kb) Table S6. Number of tumor samples with mutations in each of the 21 chromatin regulator genes in list B. (XLSX 10 kb) Table S7. Chromatin regulators dysregulated by IDH1 and CIC in LGG. |Ti| is number of tumor types whose genome-wide methylation levels are significantly associated with the mutation status of CDG i; N is number of genome-wide genes whose expression levels are dysregulated by MDG i in all |Ti| tumor types; N is number of genes in list A that are dysregulated in all |Ti| tumor types; N is number of genes in list B that are dysregulated in all |Ti| tumor types; Enrichment PvalueA and PvalueB are calculated using hypergeometric distributions testing if genes in lists A and B occur more frequently than expected by random chance among genome-wide differentially expressed target genes. (XLSX 9 kb) Table S8. CDGs associated with significant genome-wide expression changes in one cancer type. Up-regulation is defined as CDGs with <0.05 and the number of up-regulated genes is larger than the number of down-regulated genes (). Down-regulation is defined as CDGs with <0.05 and the number of up-regulated genes is smaller than the numsber of down-regulated genes (). (XLSX 9 kb) Table S9. Correlation between methylation and expression changes in each cancer type. K1 is number of CDGs mutated in ≥ 5 samples with expression data available; K2 is number of CDGs mutated in ≥ 5 samples with methylation data available; E is set of CDGs with p’<0.05, that is, CDGs whose mutation status are significantly associated with f differentially expressed genes; M is set of CDGs with p<0.05, that is, CDGs whose mutation status are significantly associated with differential methylation; ‘cor’ stands for correlation between the number of differentially methylated sites and the number of differentially expressed genes by the CDG mutation. (XLSX 9 kb) Figure S2. Significance of overlap between genome-wide up/down-regulation and hyper/hypo-methylation associated with the mutation status of the overlapping driver genes. (a) We examined signature patterns of target genes’ promotor regions methylation and expression changes by the overlapping driver genes, i.e., target genes that are hyper-methylated and up-regulated by overlapping driver gene i, the “++” pattern; target genes that are hyper- methylated and down-regulated by overlapping driver gene i, the “+-” pattern; target genes that are hypo-methylated and up-regulated by overlapping driver gene i, the “-+” pattern; and target genes that are hypo-methylated and down- regulated by overlapping driver gene i, the “–” pattern. (b) We calculated a p-value that tests if number of target genes that are differentially methylated and expressed is larger than expected using a hypergeometric distribution, and a p-value that tests if number of target genes with one of the 4 pattern of methylation and expression changes is larger than expected using a hypergeometric distribution, where we combined per tumor type p-values across tumor types using the Fisher’s method. (TIF 94 kb)
  50 in total

1.  The chromatin remodelling factor Brg-1 interacts with beta-catenin to promote target gene activation.

Authors:  N Barker; A Hurlstone; H Musisi; A Miles; M Bienz; H Clevers
Journal:  EMBO J       Date:  2001-09-03       Impact factor: 11.598

2.  Epigenetic switch involved in activation of pioneer factor FOXA1-dependent enhancers.

Authors:  Aurélien A Sérandour; Stéphane Avner; Frédéric Percevault; Florence Demay; Maud Bizot; Céline Lucchetti-Miganeh; Frédérique Barloy-Hubler; Myles Brown; Mathieu Lupien; Raphaël Métivier; Gilles Salbert; Jérôme Eeckhoute
Journal:  Genome Res       Date:  2011-01-13       Impact factor: 9.043

3.  Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer.

Authors:  Gary C Hon; R David Hawkins; Otavia L Caballero; Christine Lo; Ryan Lister; Mattia Pelizzola; Armand Valsesia; Zhen Ye; Samantha Kuan; Lee E Edsall; Anamaria Aranha Camargo; Brian J Stevenson; Joseph R Ecker; Vineet Bafna; Robert L Strausberg; Andrew J Simpson; Bing Ren
Journal:  Genome Res       Date:  2011-12-07       Impact factor: 9.043

4.  A protein-protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration.

Authors:  Janghoo Lim; Tong Hao; Chad Shaw; Akash J Patel; Gábor Szabó; Jean-François Rual; C Joseph Fisk; Ning Li; Alex Smolyar; David E Hill; Albert-László Barabási; Marc Vidal; Huda Y Zoghbi
Journal:  Cell       Date:  2006-05-19       Impact factor: 41.582

Review 5.  Epigenetic modulators, modifiers and mediators in cancer aetiology and progression.

Authors:  Andrew P Feinberg; Michael A Koldobskiy; Anita Göndör
Journal:  Nat Rev Genet       Date:  2016-03-14       Impact factor: 53.242

6.  EGFR-mediated chromatin condensation protects KRAS-mutant cancer cells against ionizing radiation.

Authors:  Meng Wang; Ashley M Kern; Marieke Hülskötter; Patricia Greninger; Anurag Singh; Yunfeng Pan; Dipanjan Chowdhury; Mechthild Krause; Michael Baumann; Cyril H Benes; Jason A Efstathiou; Jeff Settleman; Henning Willers
Journal:  Cancer Res       Date:  2014-03-19       Impact factor: 12.701

7.  Sox2 cooperates with Chd7 to regulate genes that are mutated in human syndromes.

Authors:  Erik Engelen; Umut Akinci; Jan Christian Bryne; Jun Hou; Cristina Gontan; Maaike Moen; Dorota Szumska; Christel Kockx; Wilfred van Ijcken; Dick H W Dekkers; Jeroen Demmers; Erik-Jan Rijkers; Shoumo Bhattacharya; Sjaak Philipsen; Larysa H Pevny; Frank G Grosveld; Robbert J Rottier; Boris Lenhard; Raymond A Poot
Journal:  Nat Genet       Date:  2011-05-01       Impact factor: 38.330

8.  COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Authors:  Simon A Forbes; David Beare; Prasad Gunasekaran; Kenric Leung; Nidhi Bindal; Harry Boutselakis; Minjie Ding; Sally Bamford; Charlotte Cole; Sari Ward; Chai Yin Kok; Mingming Jia; Tisham De; Jon W Teague; Michael R Stratton; Ultan McDermott; Peter J Campbell
Journal:  Nucleic Acids Res       Date:  2014-10-29       Impact factor: 16.971

9.  Mutant IDH1 Downregulates ATM and Alters DNA Repair and Sensitivity to DNA Damage Independent of TET2.

Authors:  Satoshi Inoue; Wanda Y Li; Alan Tseng; Isabel Beerman; Andrew J Elia; Sean C Bendall; François Lemonnier; Ken J Kron; David W Cescon; Zhenyue Hao; Evan F Lind; Naoya Takayama; Aline C Planello; Shu Yi Shen; Alan H Shih; Dana M Larsen; Qinxi Li; Bryan E Snow; Andrew Wakeham; Jillian Haight; Chiara Gorrini; Christian Bassi; Kelsie L Thu; Kiichi Murakami; Alisha R Elford; Takeshi Ueda; Kimberly Straley; Katharine E Yen; Gerry Melino; Luisa Cimmino; Iannis Aifantis; Ross L Levine; Daniel D De Carvalho; Mathieu Lupien; Derrick J Rossi; Garry P Nolan; Rob A Cairns; Tak W Mak
Journal:  Cancer Cell       Date:  2016-07-14       Impact factor: 31.743

10.  Alternative lengthening of telomeres: remodeling the telomere architecture.

Authors:  Dimitri Conomos; Hilda A Pickett; Roger R Reddel
Journal:  Front Oncol       Date:  2013-02-20       Impact factor: 6.244

View more
  7 in total

Review 1.  Epigenetics/epigenomics and prevention by curcumin of early stages of inflammatory-driven colon cancer.

Authors:  Renyi Wu; Lujing Wang; Ran Yin; Rasika Hudlikar; Shanyi Li; Hsiao-Chen D Kuo; Rebecca Peter; Davit Sargsyan; Yue Guo; Xia Liu; A N Kong
Journal:  Mol Carcinog       Date:  2019-12-09       Impact factor: 4.784

2.  EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer.

Authors:  Leila Mirsadeghi; Reza Haji Hosseini; Ali Mohammad Banaei-Moghaddam; Kaveh Kavousi
Journal:  BMC Med Genomics       Date:  2021-05-07       Impact factor: 3.063

3.  Characterization of Epigenomic Alterations in HPV16+ Head and Neck Squamous Cell Carcinomas.

Authors:  Anders Berglund; Clarisse Muenyi; Erin M Siegel; Abidemi Ajidahun; Steven A Eschrich; Denise Wong; Leah E Hendrick; Ryan M Putney; Sungjune Kim; D Neil Hayes; David Shibata
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2022-04-01       Impact factor: 4.090

4.  Coupled Genome-Wide DNA Methylation and Transcription Analysis Identified Rich Biomarkers and Drug Targets in Triple-Negative Breast Cancer.

Authors:  Maoni Guo; Siddharth Sinha; San Ming Wang
Journal:  Cancers (Basel)       Date:  2019-11-04       Impact factor: 6.639

5.  Widespread redundancy in -omics profiles of cancer mutation states.

Authors:  Jake Crawford; Brock C Christensen; Maria Chikina; Casey S Greene
Journal:  Genome Biol       Date:  2022-06-27       Impact factor: 17.906

6.  A pan-cancer analysis of the oncogenic role of dual-specificity tyrosine (Y)-phosphorylation- regulated kinase 2 (DYRK2) in human tumors.

Authors:  Xinyue Qiu; Cheng Shen; Wenjing Zhao; Xunlei Zhang; Dakun Zhao; Xuming Wu; Lei Yang
Journal:  Sci Rep       Date:  2022-09-14       Impact factor: 4.996

7.  Charting differentially methylated regions in cancer with Rocker-meth.

Authors:  Matteo Benelli; Gian Marco Franceschini; Alberto Magi; Dario Romagnoli; Chiara Biagioni; Ilenia Migliaccio; Luca Malorni; Francesca Demichelis
Journal:  Commun Biol       Date:  2021-11-02
  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.