Chromatin regulators (CRs) regulate the gene transcription process through combinatorial patterns, which currently remain obscure for pan-cancer. This study identified the interaction of CRs and constructed CR-CR interaction networks across five tumor cell lines. The global interaction analysis revealed that CRs tend to function in synergistically. In addition, common and specific CRs in interaction networks were identified, and the epigenetic processes of these CRs in regulating gene transcription were analyzed. Common CRs have conserved binding sites but cooperate with different partners in multiple tumor cell lines. They also participate in gene transcription regulation, through mediation of different histone modifications (HMs). Specific CRs, ATF2 and PRDM10 were found to distinguish liver cancer samples with different prognosis. PRDM10 participates in gene transcription regulation, by exertion of influence on the DNA methylation level of liver cancer. Through analysis of the edges in the CR-CR interaction networks, it was found EP300-TAF1 has genome-wide distinct signaling patterns, which exhibit different effects on downstream targets. This analysis provides novel insights for the understanding of synergistic mechanism of CRs function, as controllers of gene transcription across cancer types.
Chromatin regulators (CRs) regulate the gene transcription process through combinatorial patterns, which currently remain obscure for pan-cancer. This study identified the interaction of CRs and constructed CR-CR interaction networks across five tumor cell lines. The global interaction analysis revealed that CRs tend to function in synergistically. In addition, common and specific CRs in interaction networks were identified, and the epigenetic processes of these CRs in regulating gene transcription were analyzed. Common CRs have conserved binding sites but cooperate with different partners in multiple tumor cell lines. They also participate in gene transcription regulation, through mediation of different histone modifications (HMs). Specific CRs, ATF2 and PRDM10 were found to distinguish liver cancer samples with different prognosis. PRDM10 participates in gene transcription regulation, by exertion of influence on the DNA methylation level of liver cancer. Through analysis of the edges in the CR-CR interaction networks, it was found EP300-TAF1 has genome-wide distinct signaling patterns, which exhibit different effects on downstream targets. This analysis provides novel insights for the understanding of synergistic mechanism of CRs function, as controllers of gene transcription across cancer types.
Some epigenetic regulators play an essential role in the regulation of chromatin structure, to control DNA templating, and are broadly reprogrammed in the development and progression of malignant tumors [1], [2]. Chromatin regulators (CRs), which are indispensable in epigenetics, act as master controllers of gene transcription in many common cellular processes [3]. Chromatin regulators include a class of enzymes with specialized function domains, which could recognize, shape, and maintain the epigenetic state in a cell context-dependent fashion, such as histone modifiers, DNA-modifying enzymes, and chromatin remodelers [4], [5], [6], [7]. Histone modifiers modify the basic residues of histone tail, which lead to the alteration of chromatin structure, and in turn influences the transcriptional regulation. Such as histone demethylase, histone acetyltransferase and histone methyltransferase [8]. DNA-modifying enzymes (such as DNA methyltransferases (DNMTs)) could establish and maintain DNA methylation. It could also regulate several DNA mediated processes sequentially, to influence transcriptional regulation [9]. Chromatin remodelers are responsible for chromatin structural changes, and regulation of gene transcription by affecting chromatin accessibility [10], [11].Growing evidence suggests that CRs tend to regulate gene transcription by synergistic interactions, rather than an independent role [12], [13]. For instance, FOXA1 functions as a chromatin remodeler by facilitating access to chromatin for steroid hormone receptors. Histone demethylase LSD1 (KDM1A) acts as an eraser. It was reported that KDM1A cooperated with FOXA1, and they caused the androgen receptor to fail in the completion of the transcription process, which in turn inhibits the growth of prostate tumors [14]. H3K9 and H3K27me methyltransferase enhancer of zeste homolog 2 (EZH2) is a critical histone modifier in the germinal center (GC) reaction. It recruits histone deacetylases HDAC 1/2 and DNA methyltransferases (DNMTs) to inhibit the expression of PRC2, which could silence tumor suppressor genes [15], [16], [17]. These studies demonstrated that the synergy of CRs regulates the transcription process of genes, and affects tumor progression.The increased attention paid to CRs and the advances in genomic technologies (ChIP-seq) [18], [19], allow researchers to characterize chromatin structure at the whole-genome scale. It helps to determine how CRs mediate distinct histone modifications (HMs) and decipher how they influence transcription processes. Integrated analysis of ChIP-Seq datasets of CR and HM could systematically characterize the genome-wide functions of CRs. The latest research revealed that H3K36 methyltransferase NSD1 mediated H3K36me2 is required for the recruitment of DNA methyltransferase DNMT3A, and the maintenance of DNA methylation at intergenic regions [20]. Jhd2 and its human homologs, KDM5B and KDM5A, function as an H3K4 demethylase in human cancer [21], [22], [23]. Combined activities of histone methyltransferase Set1 and H3K4 demethylase KDM5 via H3K4 methylation contribute to positive or negative transcriptional regulation [24]. These results suggest that synergetic CRs play important roles in transcription processes by mediating epigenetic features, including histone modifications and DNA methylation.To reveal the interactions between CRs and the HMs (or histone variants) in related tumors, this study performed an integrative analysis of ChIP-seq datasets for CRs and HMs in five cell types. Next, the CR-CR interaction networks were built, and the common and specific CR nodes were identified across the five networks. In different cell lines, common CRs were associated with diverse histone modifications through cooperation with different partners at similar genomic locations, and further affect the downstream gene expression and biological processes. Additionally, this study also confirmed the synergy between different types of CRs. PRDM10, as a histone methyltransferase, was screened to be the specific CR in the hepatocellular carcinoma (HepG2) cell line. The results show that PRDM10 cooperates with DNA methylator ZBTB33, and causes downstream genome-wide hypomethylation, which further contributed to differences in survival of liver cancer patients. Further detailed analysis shows that the synergistic regulation of CR-CR to target genes was fairly complicated. Even for a pair of specific CR-CR, the multiple patterns had different effects on the regulation of downstream HMs, and target genes such as EP300-TAF1 interaction. In summary, CRs regulate gene transcription processes through a variety of synergistic patterns.
Materials and methods
Data sets
CR and corresponding ChIP-seq datasets: the list of CRs was collected from our previous study, including 870 CRs (Supplementary Table S1) [5]. Total of 644 alignment BAM files were obtained in human non-small cell lung cancer (A549), hepatocellular carcinoma (HepG2), myelogenous leukemia (K562), breast cancer cells (MCF-7), and neuroblastoma (SK-N-SH) tumor cell lines, from the Encyclopedia of DNA Elements (ENCODE) and ChIP-Atlas [25], [26]. A total of 166 CRs were collected. Readings from CRs ChIP-seq data were aligned to hg38/GRCh38 assembly of the human genome. The full ENCODE or ChIP-Atlas accession of CR ChIP-seq datasets and their whole cell extracts were shown in Supplementary Table S2. In the subsequent analysis, only the CR with ChIP-seq dataset in more than two cell lines were retained.Histone modification (HM) ChIP-seq datasets: for each of these cell lines, the alignment BAM files from ChIP-seq experiments for HMs were also downloaded from ENCODE. The sequence readings were mapped to the human reference genome by hg38/GRCh38 version. The data covers various types of HMs and histone variants, which include H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K9me3, H2AFZ, H3F3A, H3K79me2, H3K9ac, H3K9me2 and H4K20me1. The full ENCODE accession of HM ChIP-seq datasets and their whole cell extracts are shown in Supplementary Table S3.Gene expression datasets: the RNA-Seq data of liver tissue was downloaded from UCSC Xena (https://xena.ucsc.edu/) and quantified gene expression as FPKM. The expression data of the HepG2 cell line was downloaded from the Cancer Cell Line Encyclopedia (CCLE).DNA methylation dataset: the HM450 DNA methylation profile of liver cancer tissue was downloaded from the TCGA database. Probes with missing values in >30% of the samples were removed, and other missing values were imputed using the k-nearest neighbors (KNN) imputation procedure.
Analysis of ChIP-Seq data (Peak Calling)
The MACS2 (version 2.1.4) [27] peak calling was used to compare the ChIP-seq signals with a corresponding whole cell extract sequenced control. The enriched intervals with the threshold of q value <0.05 were considered as binding profiles (peaks) for a CR or HM. It is only considered a peak, if it was found in at least two replicate ChIP-seq experiments. DeepTools2 (version 3.5.1) [28] was used to normalize and visualize the genome-wide data.
Identification of CR-CR pairs and CR-HM relationship pairs
The identification of CR-CR pairs and CR-HM relationship pairs were based on Oren Ram's method, which was published in Cell in 2011 [6]. First, five sets of CR binding sites (CRBSs) intervals were compiled, by merging all peaks of CRs for each cell type. Next, sliding windows were constructed by ‘makewindows’ function of Bedtools software (version 2.30.0) [29]. The size of sliding windows was set as 200 bp center on peak summit positions. The BAM files were transformed into BigWig files with bamCoverage (version 3.5.1) via the arguments normalizeUsing RPKM. Used the sliding windows of peaks and the BigWig files of CRs in each tumor cell line as input for the function multiBigwigSummary in DeepTools2, the average scores of these sliding windows in each tumor cell line were got. Then, the Pearson correlation coefficient R and BH-FDR values between each pair of CRs were calculated, based on their average scores across all windows. Those CR-CR pairs with R > 0.4 and BH-FDR < 0.01 were considered as synergistic CR-CR interactions. The CR-CR pairs with R < -0.4 and BH-FDR < 0.01 were defined as the antagonistic pairs (Fig. 1A). Similarly, the CR-HM regulatory interactions were identified by using the same co-location method, and set the threshold as R > 0.3, BH-FDR < 0.01. The visualization of CR-CR interaction networks was achieved through Cytoscape software (version 3.9.0) [30], [31].
Fig. 1
The pipeline of identifying the interactions of CR-CR and the global CR-CR interactive patterns. (A) The processes of identifying CR-CR interactions for each cell type. Step I: collection and preprocessing of CR ChIP-seq datasets of five cancer cell lines from Encode database. Step II: obtain the binding sites of CRs, merge these intervals and normalize the signal to 200 bp windows. Step III: pairwise calculation of Pearson correlation coefficient based on window-based signal distribution to get the CR-CR interactions using the method described in section “Materials and Methods”. (B) The correlation matrix reflects pairwise correlations of CRs in five cell lines respectively, including positive correlation (purple), no correlation (white) and negative correlation (yellow) between CRs. CRs divide into modules based on correlated binding signal in each cell line.
The pipeline of identifying the interactions of CR-CR and the global CR-CR interactive patterns. (A) The processes of identifying CR-CR interactions for each cell type. Step I: collection and preprocessing of CR ChIP-seq datasets of five cancer cell lines from Encode database. Step II: obtain the binding sites of CRs, merge these intervals and normalize the signal to 200 bp windows. Step III: pairwise calculation of Pearson correlation coefficient based on window-based signal distribution to get the CR-CR interactions using the method described in section “Materials and Methods”. (B) The correlation matrix reflects pairwise correlations of CRs in five cell lines respectively, including positive correlation (purple), no correlation (white) and negative correlation (yellow) between CRs. CRs divide into modules based on correlated binding signal in each cell line.
Identification of CR-gene regulatory interactions and analysis of DNA sequence motif
Gene annotation of the UCSC hg38 genome was used. The promoter of a gene was defined as 1 kb upstream and downstream of the transcriptional start site (TSS). For each cell type, a gene was considered as a target of a CR if it had CR binding site in the promoter region. DNA sequence motif analysis was performed using HOMER (version 4.10). CR motifs with p < 0.005 were identified as enriched motifs.
Evaluating the effect of CRs on abnormal DNA methylation
To calculate the methylation regulation of CRs on liver cancer samples, HyperZ and HypoZ indices were calculated based on the downloaded DNA methylation data of liver cancer samples. The values could measure the genome-wide hypermethylation and hypomethylation levels of each sample, respectively. The indices were performed in reference to a previous study [32].All CpG sites in CGI (CpG islands) or open sea regions were extracted. A maximum cluster width of 1500 bp and a maximum gap of 500 bp between any two neighboring CpGs, CpG sites were particularly grouped into CGI clusters or open sea clusters. This was completed with the boundedClusterMaker function of the bumphunter BioC package. The methylation level of these clusters was defined as the average beta value of CpG sites.For each given tumor sample cluster, labeled s, in a given cluster labeled r, the Z score was defined as:and denote the mean and standard deviation of the DNA methylation level of the regional cluster r over all the normal tissue samples. Z score reflects the absolution deviation in DNA methylation of that cluster in the given cancer sample relative to all normal samples of the liver tissue [32].Since cluster regions in promoter CGIs and open sea regions usually show hypermethylation and hypomethylation in cancer samples respectively, only positive and negative Z scores were used to calculate the hypermethylation and hypomethylation for each cancer sample. The mean methylation level of the CGI cluster in the cancer sample was defined as HyperZ, which represents the genome-wide hypermethylation value of the sample. Similarly, HypoZ was calculated by the mean methylation level of the open sea cluster in the cancer sample, which reflects the genome-wide hypomethylation value of the sample.Then the Pearson correlation coefficient and BH-FDR between the expression of CRs and their HyperZ/ HypoZ indices were calculated respectively. If the expression of a CR was significantly correlated with HyperZ or HypoZ (BH-FDR < 0.05), it was believed that this particular CR had a regulatory role in genome-wide hypermethylation or hypomethylation.
Prognosis analysis of specific CRs in HepG2 cell line
Survival data of 463 patients with liver cancer were downloaded from TCGA Data Portal. The patients were divided into three groups, in accordance with the expression of synergistic CRs with PRDM10 or ATF2 (two specific CRs of liver cell line), through hierarchical clustering method. The Kaplan–Meier survival plots and log-rank tests were used to evaluate the survival differences between groups of patients. This process was performed with the R package ‘survminer’.
Functional analysis
Metascape (https://metascape.org) was used to further verify the function enrichment of synergistic CRs [33]. The cutoff value was set to P < 0.05, Gene Ontology (GO) annotation was visualized by R software. GREAT web interface was used (version 4.0.4) (https://great.stanford.edu/public/html/) to visualize EP300 and RCOR1 putatively regulated genes TSS distance and annotate their enriched biological process with whole genome (GRCh38/hg38) as background parameters [34].
Results
The general synergistic patterns of CR across five tumor cell lines
A pipeline was proposed to identify CR-CR interactions in single human tumor cell line, by integrating ChIP-seq datasets of CRs. This involved three steps (Fig. 1A). First, the study selected the cell line in which number of CRs ChIP-seq data was more than ten, including A549, K562, HepG2, MCF-7, and SK-N-SH cell lines. Second, the peaks of CRs binding sites (CRBSs) were obtained, then the Pearson correlation coefficient and significance for each pair of CRs were calculated, based on signal distributions across these CRBSs intervals. Related CR-CR pairs (R > 0.4 or R < -0.4, BH-FDR < 0.01) were identified. CR-CR interaction networks across five cell lines were constructed (Supplementary Fig. S1), through the integration of the CR-CR pairs.All CR-CR pairs for each cell line were hierarchically clustered. The dendrogram and correlation heatmap results revealed the major synergistic pattern of CRs across five cell lines (Fig. 1B). In A549 and SK-N-SH cell lines, CRs were aggregated into smaller modules, with 2–4 CRs in each module. In the HepG2, K562 and MCF-7 cell lines, more CRs were aggregated into larger modules, with 14–20 CRs in each module. Notably, 69% of CRs were aggregated into a large module in the MCF-7 cell line. Meanwhile, the modules across different cell lines tended to be consistent, such as CBX8 (PRC1 components) cooperated with RNF2 in A549 and K562 cell lines [35], [36]. This is consistent with a prior study, which described the combinatorial patterning of 29 CRs in K562 cell line. The same synergistic modules of SMARCE1, DPF2 and MTA1 were found in K562 and MCF-7 cell lines. Apart from the extensive synergistic interaction between CRs, there were also several CRs which showed antagonistic binding against other CRs, such as CTCF. It was reported as a mark for chromatin insulators [37], [38]. CTCF was discovered antagonizing with other CRs in the A549 cell line except RCOR1. CBX2, CBX8 and RNF2 tend to antagonize with most other CRs that play active roles in K562 cell line, which was consistent with their repressive roles in a previous study [6]. These results suggest that most CRs tended to coordinate with certain CRs to perform functions in a modular manner.Through the extraction of significantly synergistic or antagonistic CR-CR pairs, 4 to 408 related CR-CR pairs across five cell lines were obtained. This involved 7 to 64 CRs (Table 1). Next, the clustering coefficients of each CR-CR interaction network were analyzed, except for SK-N-SH due to its small number of nodes. Compared with random networks, the clustering coefficient of each CR-CR interaction network was significantly higher (all P < 0.001, Supplementary Fig. S1). Among these CR-CR pairs, the vast majority were synergistic pairs, and only a few pairs showed antagonistic effects. Overall, 3 antagonistic CR pairs were found in the A549 cell line, and 17 antagonistic CR pairs in the K562 cell line. In summary, the results show that most of the CR-CR pairs present a synergistic effect.
Table 1
The number of nodes and edges of CR-CR associated networks across five cell lines.
Cell lines
Number of nodes
Number of edges
A549
21
46
HepG2
56
408
K562
64
396
MCF-7
29
275
SK-N-SH
7
4
The number of nodes and edges of CR-CR associated networks across five cell lines.
Common CRs have diverse synergistic partners across five cell lines
There were large-scale synergistic CRs in five cell lines, thus the study aimed to further explore the common and specific characteristics across various tumor types, based on the synergistic pattern of CRs. As shown in Supplementary Table S4, eight CRs (EP300, SIN3A, RCOR1, TAF1, ZBTB33, MAX, REST, and CTCF) have ChIP-seq data in five tumor cell lines, however only four common CRs (EP300, SIN3A, RCOR1 and TAF1) are associated with other CRs (FDR < 0.01 and |R| > 0.4). In addition, five specific CRs including ATF2, PRDM10, CTCF, EZH2 and KAT2B play roles in a specific cell line (Fig. 2A).
Fig. 2
Common CRs have conserved binding sites and different synergistic partners across five cell lines. (A) Venn diagram shows intersection of the CRs in A549, K562, HepG2, MCF-7 and SK-N-SH cell lines. (B) Examples of common CRs have conserved binding sites in chromosome 1 and chromosome 3. ChIP-seq signals for common CRs are shown on four genome loci (chromosome 3: 149,179,900–149,180,500; chromosome 1: 77,682,310–77,683,450; chromosome 3: 167,734,000–173,734,000; chromosome 1: 77,779,450–77,780,370). (C-F) Alluvial diagram of common CRs show their synergistic CRs in different cell lines.
Common CRs have conserved binding sites and different synergistic partners across five cell lines. (A) Venn diagram shows intersection of the CRs in A549, K562, HepG2, MCF-7 and SK-N-SH cell lines. (B) Examples of common CRs have conserved binding sites in chromosome 1 and chromosome 3. ChIP-seq signals for common CRs are shown on four genome loci (chromosome 3: 149,179,900–149,180,500; chromosome 1: 77,682,310–77,683,450; chromosome 3: 167,734,000–173,734,000; chromosome 1: 77,779,450–77,780,370). (C-F) Alluvial diagram of common CRs show their synergistic CRs in different cell lines.Correspond to target genes among five cell types for each common CR, there are same common binding sites (Table 2, Fig. 2B). A large proportion of these binding sites are shared between cell lines as shown in Supplementary Fig. S2. Then it was explored whether each common CR cooperated with the same CRs in different cell lines. The results show that the same synergistic interactions exist in no more than three cell lines. There were only 7.1% synergistic pairs in three cell lines, and 30.5% synergistic pairs maintained in two cell lines, while the majority of synergistic pairs were cell line specific at a ratio of 62.4%. The synergistic pair of EP300 and HDAC2 in HepG2, K562 and MCF-7 cell lines was consistent with prior reports, which noted that the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300 were mediated by cell-line specific motifs [39] (Fig. 2C). The latest research found the role of LSD1(KDM1A)/RCOR1 corepressor complex for proteasomal degradation in human hematopoietic stem cells [40], [41]. The synergistic role of KDM1A and RCOR1 was also found in the K562 cell line (Fig. 2D). SIN3A had been reported to be a major HDAC1/2 co-repressor [42], [43], [44], [45]. SIN3A cooperated with HDAC2 in A549, HepG2, and K562 cell lines in our study (Fig. 2E). Recent research showed the binding of YY1 and TAF1 in schizophrenia [46]. Their synergistic relationship also exists in A549, HepG2 and K562 cell lines (Fig. 2F). In addition, new synergistic CRs which had not been reported in the previous research were also identified. For an instance, EP300 cooperated with DPF2 in the HepG2, K562 and MCF-7 cell lines. In summary, many important CRs were identified in multiple cancer types. These include both the synergistic CRs proven in the research, and the novel synergistic model of CRs. The results underlined that common CRs play an important role in cell lines, but their synergistic CRs were rewired in these cell lines, which suggest that the role of common CRs may vary across different cell lines.
Table 2
The number of common CRs binding sites in five cell types.
Cell lines
EP300
RCOR1
SIN3A
TAF1
A549
20,590
827
31,603
14,469
HepG2
8967
16,897
17,204
22,671
K562
26,026
21,259
24,297
25,197
MCF-7
8499
14,883
29,501
2534
SK-N-SH
27,903
17,168
32,444
35,251
Overlap
684
417
9943
2269
The number of common CRs binding sites in five cell types.
Common CRs have conserved binding sites whereas mediate rewired histone modifications
The binding sites in five cell lines were analyzed, to uncover their combinatorial binding of common CRs to the genome. The results show that the binding pattern of common CRs in different cell types was relatively similar, which is consistent with Fig. 2B. SIN3A and TAF1 were integrally characterized, through preferentially binding in promoter regions (Fig. 3A and Supplementary Fig. S3A). Around 40% to 75% of the binding sites overlaps TSSs, which is consistent with prior reports [47]. However, the binding sites of SIN3A and TAF1 are not completely identical across the five cell lines. SIN3A had 11,864 peaks on the promoter regions in SK-N-SH, and the numbers of the promoter regions peaks in A549, HepG2, K562 and MCF-7 cell lines are 16,307, 11,982, 17,957 and 19,438 respectively (Fig. 3A). Compared to the other four cell lines, the binding sites of TAF1 have preference location at the promoter regions in the MCF-7 cell line (Supplementary Fig. S3A). However, EP300 preferentially bound intron, intergenic and distal regulatory elements, the same trend was observed in all tumor cell lines (Fig. 3A). RCOR1 preferentially bound to promoters, introns, and distal intergenic regions. In K562, MCF-7 and SK-N-SH cell lines, the binding sites are more evenly distributed in different regions, while in A549 and HepG2 cell lines, more sites are bound to the promoter regions (Supplementary Fig. S3A). Further analysis of the binding sites for each common CR synergistic partner showed that partners of common CR are roughly similar to them in different cell lines, but there are some differences (Fig. 3B and Supplementary Fig. S3B). For instance, <50% of SIN3A's synergistic partners located at promoters. The results demonstrate that common CRs tend to bind conserved sites, but there are some differences in different cell lines. On the other hand, although partners of common CR are rewired across tumor cell lines, the corporate targeting sites of common CRs and their partners are generally conserved across five cell lines.
Fig. 3
Common CRs SIN3A and EP300 combine with specific regulatory elements while mediate different histone modifications across five cell lines. (A) Pie chart depicts the distribution of SIN3A and EP300 binding sites in the genome. (B) The ratio of binding regulatory elements of CRs that cooperate with SIN3A and EP300. (C) The histone modifications mediated by SIN3A and EP300 across five cell lines. The values are colored based on the Pearson correlation coefficients between CRs and HMs. (D) Dynamics of SIN3A mediated histone modifications centered on SIN3A binding sites in the promoter region. (E) Dynamics of EP300 mediated histone modifications centered on EP300 binding sites in the promoter region.
Common CRs SIN3A and EP300 combine with specific regulatory elements while mediate different histone modifications across five cell lines. (A) Pie chart depicts the distribution of SIN3A and EP300 binding sites in the genome. (B) The ratio of binding regulatory elements of CRs that cooperate with SIN3A and EP300. (C) The histone modifications mediated by SIN3A and EP300 across five cell lines. The values are colored based on the Pearson correlation coefficients between CRs and HMs. (D) Dynamics of SIN3A mediated histone modifications centered on SIN3A binding sites in the promoter region. (E) Dynamics of EP300 mediated histone modifications centered on EP300 binding sites in the promoter region.The four common CRs are all histone modifiers, thus the pattern of common CRs associated with histone modifications were explored. We found that SIN3A is in association with H3K4me2, H3K4me3, H3K27ac and H3K9ac, in five cell types (Fig. 3C). This is consistent with a recent study about RREB1 recruiting SIN3A and KDM1A to control H3K4 methylation, at MAPK pathway gene promoters [48]. The pattern of TAF1 is similar to SIN3A. This may be due to the similarity between TAF1 and SIN3A cis-regulatory regions (Supplementary Fig. S3C). A prior study reported that over 70% of EP300 sites are distal from TSSs, and ∼50% of those distal regions are enriched for modifications, which correlate with enhancer activity, such as H3K4me1 and H3K27ac [49], [50]. Similarly, the peak of EP300 overlapped with H3K4me1 and H3K27ac modifications in MCF-7 and HepG2 cell lines (Fig. 3C). This lead to the activation of downstream gene expression [51]. RCOR1 is associated with H3K4me2 modification in A549, HepG2, K562 and MCF-7 (Supplementary Fig. S3C). It inhabits the expression of downstream target genes [52]. This study found that common CRs associated with rewired histone modifications across five cell lines, all common CRs have conservative binding sites at promoters. Then the dynamics of histone modifications at the binding sites of common CRs promoter regions were analyzed. For the same histone modification, its positioning pattern on the common CRs promoter binding regions were the same (Supplementary Fig. S3D, E and Fig. 3D, E). SIN3A is associated with different patterns of H3K27ac, H3K9ac, H3K4me2 and H3K4me3 in all cell lines (Fig. 3D). However, EP300 deposited similar patterns of H3K27ac in A549, HepG2, K562 and MCF-7 cell lines (Fig. 3E). Importantly, for four common CRs, the positioning pattern of histone modifications in MCF-7 are all completely different from other cell lines. These may be due to the large-scale synergistic CRs in MCF-7 cell line (Fig. 1B). The study also found, that in diverse cell lines, common CRs generally regulate different important biological processes, which play key roles in gene transcription (Supplementary Fig. S4A-D). In addition, EP300 and RCOR1 have more peaks away from the TSS of their putative regulatory gene (Supplementary Fig. S4E-F). SIN3A participate in ribosomal large subunit biogenesis, regulation of mitochondrial gene expression, the regulation of telomerase RNA localization, the regulation of RNA splicing and DNA-templated transcription in A549, HepG2, K562, MCF-7 and SK-N-SH cell lines, respectively.In summary, each common CR plays an important role in different cell lines. Its binding sites are relatively similar, but its synergistic CRs may be different, and are associated with rewired downstream HMs. These further affect different biological functions, which may ultimately lead to different phenotypes.
Specific CRs play vital roles in particular cell lines through mediate downstream epigenetic modifications
In addition to common CRs, five specific CRs were also identified, including CTCF, EZH2, KAT2B, ATF2 and PRDM10 (Fig. 2A). Among them, CTCF tends to play an independent role in the A549 cell line. EZH2 specifically exhibits synergistic and antagonistic dual effects in the K562 cell line. It synergized with SUZ12 and antagonized with SP1 and SRSF. EZH2 is in association with H3K27me3 histone modification in the K562 and HepG2 cell lines. The Pearson correlation coefficient of them is 0.617 and 0.623 respectively. This is consistent with the report that H3K27me3 is mediated by EZH2. KAT2B associated with histone variant H2AFZ and histone modification H3K9me3 with its binding sites evenly distributed across the genome. The detailed information of their synergistic CRs, associated with histone modifications and histone variants are listed in Supplementary Table S5 and S6. Due to lacking corresponding transcription data in the K562 cell line, special CRs ATF2 and PRDM10 in HepG2 were chosen for further study.It was found that ATF2 cooperates with 5 chromatin remodelers (SFPQ, SRSF1, SP1, NFYB and MTA1) and 8 histone modifiers (CBX5, EP300, HDAC1/2/6, TFDP1, BRCA1 and KDM4B) (Supplementary Fig. S5A). ATF2 is in association with H3K4me2, H3K4me3, H3K27ac and H3K9ac in the HepG2 cell line, with about 45% promoter binding sites, and shows a consistent bimodal distribution trend (Supplementary Fig. S5B-C). These results also illustrate the nucleosome occupancy around TSS. CRs usually play a key regulatory role in the occurrence and development of tumors, the expression of ATF2 in liver tumor tissue is significantly higher than that of normal tissue (Wilcoxon rank-sum test P = 1.2e-12, Supplementary Fig. S5D). It is widely known that tumors exist multiple subtypes with different molecular perturbations and treatment responses. Therefore, this study devotes to investigating the contribution of CRs to tumor subtyping. Based on transcriptome data of ATF2 and its synergistic CRs, the liver tumor samples were divided into three groups (Supplementary Fig. S5E). The Kaplan–Meier survival plot and log-rank test show that the difference between the three groups is significant (Log-rank P = 0.0081, Supplementary Fig. S5F). These results suggest that the expression of ATF2 and its synergistic CRs could distinguish liver cancer patients (LIHC) with different survival times, the result is consistent with the report that ATF2 influences the development of hepatocellular carcinoma [53].PRDM10 cooperated with 11 CRs, which include the histone deacetylase HDAC1/2. This is usually related to the formation of SMRT/NCOR complexes [54] (Fig. 4A). Its binding sites distribution is similar to histone deacetylases, with more binding sites located at promoters (Fig. 4B). Noticeably, the zinc finger protein ZBTB33 is a synergistic CR of PRDM10. ZBTB33 is a DNA methylator and has been shown to bind preferentially to methylated DNA and to interact with the SMRT/NCOR histone deacetylase complexes in vitro [55]. Combing through PRDM10 expression data and methylation data of LIHC downloaded from the TCGA database, PRDM10 was found to be significantly negatively correlated with the HypoZ indice (P < 0.05, Fig. 4C). Meanwhile, the expression of PRDM10 in LIHC is higher than in normal tissue (P = 1.3e-14, Fig. 4D). PRDM10 was reported as a supporting factor of NK cell function and it has connection with immune evasion of hepatocellular carcinoma cells, it might influence the progression of hepatocellular carcinoma [56]. In this study PRDM10 and its synergistic CRs also are associate with the development of hepatocellular carcinoma (Log-rank P < 0.0001, Fig. 4E-F). Notably, PRDM10 and ATF2 shared nine CRs, among these shared CRs, eight CRs including HDAC1/2, TFDP1, SRSF1, MTA1, SP1, BRCA1 and KDM4B were all reported in connection with the invasion and progression of hepatocellular carcinoma cells [57], [58], [59], [60], [61], [62], [63], [64]. These results suggest that CRs regulate the diversity of the transcription process. It not only mediates histone modifications, but also affects DNA methylation and different types of CRs can collaboratively influence biological processes, such as the occurrence and development of cancer.
Fig. 4
Specific CR PRDM10 influenced liver cancer progression by affecting DNA methylation. (A) The network of CRs cooperate with PRDM10. ZBTB33 is one of the DNA methylators. (B) Pie chart depicts the distribution of PRDM10 binding sites in the genome. (C) Scatter plots show correlation of PRDM10 expression with the HypoZ indice in LIHC. R represents the Pearson correlation coefficient, and ρ represents the Spearman correlation coefficient. (D) PRDM10 exhibits higher expression in LIHC patients. The p value was calculated by Wilcoxon rank-sum test. (E) The heatmap depicts the expression of PRDM10 and its synergistic CRs in three groups of LIHC patients. (F) Kaplan–Meier survival curves plotted for three groups LIHC patients stratified by PRDM10 and its synergistic CRs expression; differences assessed with the log-rank test.
Specific CR PRDM10 influenced liver cancer progression by affecting DNA methylation. (A) The network of CRs cooperate with PRDM10. ZBTB33 is one of the DNA methylators. (B) Pie chart depicts the distribution of PRDM10 binding sites in the genome. (C) Scatter plots show correlation of PRDM10 expression with the HypoZ indice in LIHC. R represents the Pearson correlation coefficient, and ρ represents the Spearman correlation coefficient. (D) PRDM10 exhibits higher expression in LIHC patients. The p value was calculated by Wilcoxon rank-sum test. (E) The heatmap depicts the expression of PRDM10 and its synergistic CRs in three groups of LIHC patients. (F) Kaplan–Meier survival curves plotted for three groups LIHC patients stratified by PRDM10 and its synergistic CRs expression; differences assessed with the log-rank test.
The hierarchical effects of synergistic CRs binding levels with their associated histone modifications
The edges were then analyzed in the CR-CR interaction network. EP300 and TAF1 are the common CRs in five cell types, and they have a synergistic effect in the HepG2 cell line. Then all of their binding sites were divided into three clusters using the Elbow method, based on signal distributions across enriched intervals. These three clusters involve 2869, 2188 and 4574 sites, respectively. The cluster 1 and cluster 3 are primarily located in the promoter regions, while the cluster 2 is mostly located in the intron and distal intergenic regions (Supplementary Fig. S6 and Fig. 5E). There are three patterns in binding sites of EP300 and TAF1: the signal of TAF1 higher than EP300 (cluster 1), the signal of TAF1 lower than EP300 (cluster 2), the signal of TAF1 nearly equal to EP300 (cluster 3). The signal distributions of EP300, TAF1 and their associated HMs in the HepG2 cell line were divided into three clusters. The HMs include H3K27ac, H3K9ac and H3K4me2 (Fig. 5A). The synergistic CRs exist multiple signal patterns in different genomic regions. These multiple signal patterns of synergistic CRs further affect associated histone modifications. Among the genome regions in cluster 1, the signal distributions of H3K27ac, H3K9ac and H3K4me2 are persistently occupied. However, in the cluster 2, the signal distributions of H3K27ac further enhanced while the signal of H3K9ac and H3K4me2 weakened. The sharp drop in CRs signal in cluster 3 is accompanied by the weak signal distribution of these three histone modifications (Fig. 5A). These results revealed that there are varying signal patterns of synergistic CRs in different genomic regions. Compared with binding genomic regions, histone modifications are more affected by the signal patterns of particular CRs. The correlation between histone modifications signal and CRs signal is maintained in these patterns, for example, the signal pattern of H3k27ac is same as that of EP300.
Fig. 5
The synergetic regulation of EP300 and TAF1 has a complex model with different effects on downstream target genes. (A) ChIP-seq signal distribution of EP300, TAF1 and their mediated histone modifications. A region around ±3 kb of EP300 and TAF1 binding sites is shown. The peaks are grouped into three clusters based on the dynamics of EP300 and TAF1 occupancy. (B) TFs enriched in three clusters. (C) The expression of target genes in three clusters. (D) Biological processes enriched by target genes of three clusters. (E) Genome browser snapshots for representative genomic loci depict three clusters mediate histone modifications.
The synergetic regulation of EP300 and TAF1 has a complex model with different effects on downstream target genes. (A) ChIP-seq signal distribution of EP300, TAF1 and their mediated histone modifications. A region around ±3 kb of EP300 and TAF1 binding sites is shown. The peaks are grouped into three clusters based on the dynamics of EP300 and TAF1 occupancy. (B) TFs enriched in three clusters. (C) The expression of target genes in three clusters. (D) Biological processes enriched by target genes of three clusters. (E) Genome browser snapshots for representative genomic loci depict three clusters mediate histone modifications.Based on the sequences of cluster 1 and cluster 3, the binding sites of ELK4, ELK1 and ELF1 are all significantly enriched. The cluster 1 and cluster 3 mainly concentrate in the promoter regions. For the cluster 2, three CRs were identified significantly binding to the sequences which are different from the other two clusters (Fig. 5B). These revealed that synergistic CR combining is more affected by binding genomic regions. The impact of three clusters on downstream target genes were further analyzed. The result shows no difference between target gene expression of cluster 1 and cluster 2 in CCLE (Wilcoxon rank-sum test P = 0.9, Fig. 5C). However, the expression of target genes in cluster 3 is significantly higher than cluster 1 and cluster 2 (Wilcoxon rank-sum test P < 2.2e-16 and P = 0.00097, respectively). These results were also confirmed in liver tissue samples (Fig. 5C). This implies that different signal patterns of synergistic CRs is associated with various downstream HM intensity, which result in downstream gene expression differences. Functional enrichment of target genes show that three clusters are respectively enriched in different biological functions, cluster 1 enriched to DNA repair and autophagy, cluster 2 participate in the regulation of protein modifications, and cluster 3 is involved in the mRNA metabolic process (Fig. 5D). The cluster 1 and cluster 3 are localized at the promoters of DOT1L and FOXA1 respectively. The cluster 2 is positioned in the distal of HLF. These target genes were reported by EP300 or TAF1 [65], [66], [67](Fig. 5E). The binding sites of cluster 1 and cluster 3 tend to bind to promoter regions. However, the binding sites of cluster 2 are almost all located in the distal regulatory region. It implies a unique binding pattern for each pair of CR-CR.These results revealed that there are multiple signal patterns of synergistic CRs in their binding sites. The signal patterns of CRs have various effects on histone modifications, binding motif, regulation of downstream target genes and biological functions. These further supports the notion of CRs regulate the transcription process of downstream target genes in a variety of ways.
Discussion
CRs are crucial regulatory factors of epigenetics, which have important roles in many cellular events [68], [69]. Despite their indispensability, the systematic binding patterns in cancers remain largely unknown. In this study, the landscape of CRs synergistic patterns in A549, HepG2, K562, MCF-7 and SK-N-SH cell lines from ENCODE database were comprehensively depicted. The results showed that CR-CR interactions have both synergistic and antagonistic effects, and most CRs tend to act synergistically. These maybe due to a lot more CRs with active roles than CRs with repressive roles were selected and antagonistic effects are more likely from a pair of CRs with opposite roles. Common CRs which simultaneously cooperated with other CRs in multiple cell lines were defined. Specific CRs which cooperated with other CRs only in one cell type were also defined. Furthermore, common CRs have conserved binding sites, but cooperate with different partners and associate with diverse histone modifications in multiple tumor cell lines. For example, TAF1 have 2269 same binding sites in five cell lines, it associates with H3K9ac and H3K4me3 by cooperating with YY1, KDM5B, MAZ, MBD2 etc. Meanwhile, specific CRs and their synergistic CRs play important roles in identifying prognostic biomarkers. This reinforced the understanding of the underlying mechanisms for the pathogenesis of human cancers.Through analysis, common CR-CR synergistic pairs and CR-HM regulatory pairs were found in the 5 cell lines. Furthermore, it was analyzed whether the correlation strength between CR-CR synergistic pairs and CR-HM regulatory pairs is conserved between different cell lines. The trend of CR-CR pairs is conservative in A549 and K562 cell lines (ρ = 0.67, P = 0.04). Fewer CR-CR pairs were recognized in the SK-N-SH cell line. The same CR-CR pairs were not found in other cell lines. The trend of CR-HM pairs is conserved in A549, K562, HepG2 and SK-N-SH cell lines, with all the Pearson correlation >0.3, and P values less than or equal to 0.05 (Supplementary Fig. S7). Besides, the strength of the relationship between CR-CR and CR-HM obtained in the MCF-7 cell line is quite different from other cell lines. The specific reasons need to be further studied. We also found that common CRs associate with rewired histone modifications across five cell lines. There are more CRs associated with histone modifications in the MCF-7 cell line and fewer CR-HM regulatory pairs in the SK-N-SH cell line. This may be due to a lack of comparable number of ChIP-seq data, there were 11 CRs have ChIP-seq data in SK-N-SH cell line but 33 CRs with ChIP-seq data in MCF-7 cell line.Besides DNA-modifying enzymes and histone modifiers, CRs can also act as chromatin remodelers [10], [70]. They achieve downstream regulation by twisting and folding chromatin [11], [71]. However, due to the limitation of the corresponding data such as Assay for Transposase-Accessible Chromatin with high throughput sequencing (ATAC-seq) and High-throughput chromosome conformation capture (Hi-C), we only analyzed how DNA-modifying enzymes and histone modifiers regulate the transcription process.Overall, this study comprehensively analyzed the interactions among CRs and how they regulate gene transcription. We found that CRs perform their regulatory function in multiple ways. First, CRs mediate different histone modifications by binding regulatory elements. Second, CRs cooperate with DNA-modifying enzymes to affect the DNA methylation regulation of target genes. At the same time, the synergistic regulation of a particular CR-CR pair has multiple signal patterns in their binding sites. These patterns of particular CR-CR pair can affect histone modifications, binding motif, regulation of downstream target genes and biological functions.
Funding
This work was supported by Major Science and Technology Program of Hainan Province (ZDKJ2021040 and ZDKJ202003), the National Natural Science Foundation of China (32160152, 31900493), Hainan Province Science and Technology special fund (ZDYF2021SHFZ097, ZDYF2021SHFZ247 and ZDYF2020132), Hainan Provincial Natural Science Foundation of China (820RC637 and 822QN462), China Postdoctoral Science Foundation (2019M661296 and 2020T130163), Open Foundation of Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University (2021TTM006), Innovation Research Fund for Graduate Students (Hyb2020-56, Hys2020-378, HYYS2021A33, Qhys2021-354, 202111810014, X202111810069 and X202111810108), and Hainan Province Clinical Medical Center (QWYH202175).
Author contributions
KL, JL and BW came up with the design and conception. Development of methodology was performed by MC, LW, DX and XB. Acquisition, analysis and interpretation of data were performed by MC, LW, DX, XB, SG, ZX, LC, DZ, PL, JX, SZ and HW. Writing, review, and/or revision of the manuscript were performed by MC, LW, DX, XB, KL, JL and BW. MC, LW, DX and XB contributed equally to this study and shared co-first authors. All authors read and approved the final manuscript.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors: Jennifer K Lue; Sathyen A Prabhu; Yuxuan Liu; Yulissa Gonzalez; Akanksha Verma; Prabhjot S Mundi; Nebiyu Abshiru; Jeannie M Camarillo; Swasti Mehta; Emily I Chen; Changhong Qiao; Renu Nandakumar; Serge Cremers; Neil L Kelleher; Olivier Elemento; Jennifer E Amengual Journal: Clin Cancer Res Date: 2019-04-12 Impact factor: 13.801
Authors: Fidel Ramírez; Devon P Ryan; Björn Grüning; Vivek Bhardwaj; Fabian Kilpert; Andreas S Richter; Steffen Heyne; Friederike Dündar; Thomas Manke Journal: Nucleic Acids Res Date: 2016-04-13 Impact factor: 16.971
Authors: Oliver A Kent; Manipa Saha; Etienne Coyaud; Helen E Burston; Napoleon Law; Keith Dadson; Sujun Chen; Estelle M Laurent; Jonathan St-Germain; Ren X Sun; Yoshinori Matsumoto; Justin Cowen; Aaryn Montgomery-Song; Kevin R Brown; Charles Ishak; Jose La Rose; Daniel D De Carvalho; Housheng Hansen He; Brian Raught; Filio Billia; Peter Kannu; Robert Rottapel Journal: Nat Commun Date: 2020-09-16 Impact factor: 14.919