Positional specificity of different transcription factor classes within enhancers.

Sharon R Grossman, Jesse Engreitz, John P Ray, Tung H Nguyen, Nir Hacohen, Eric S Lander.

Abstract

Gene expression is controlled by sequence-specific transcription factors (TFs), which bind to regulatory sequences in DNA. TF binding occurs in nucleosome-depleted regions of DNA (NDRs), which generally encompass regions with lengths similar to those protected by nucleosomes. However, less is known about where within these regions specific TFs tend to be found. Here, we characterize the positional bias of inferred binding sites for 103 TFs within ∼500,000 NDRs across 47 cell types. We find that distinct classes of TFs display different binding preferences: Some tend to have binding sites toward the edges, some toward the center, and some at other positions within the NDR. These patterns are highly consistent across cell types, suggesting that they may reflect TF-specific intrinsic structural or functional characteristics. In particular, TF classes with binding sites at NDR edges are enriched for those known to interact with histones and chromatin remodelers, whereas TFs with central enrichment interact with other TFs and cofactors such as p300. Our results suggest distinct regiospecific binding patterns and functions of TF classes within enhancers.
Copyright © 2018 the Author(s). Published by PNAS.

Keywords:  chromatin structure; gene regulation; genomics; transcription factor binding

Year:  2018        PMID: 29987030      PMCID: PMC6065035          DOI: 10.1073/pnas.1804663115

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


To investigate the characteristic positions of transcription factor (TF)-binding sites in distal regulatory elements (enhancers), we identified active regulatory elements across numerous cell types and characterized predicted functional TF-binding sites within these elements. We defined putative active regulatory elements by first identifying nucleosome-depleted regions of DNA (NDRs) in 47 cell types based on DNaseI-hypersensitive (DHS) sites defined by the Roadmap Epigenomics project (1) and Assay for Transposase-Accessible Chromatin-sequencing (ATAC-seq) experiments performed in each cell type (2–4). We then further selected those NDRs marked by the active chromatin modification H3K27ac using ChIP-sequencing (ChIP-seq) data from the Roadmap Epigenomics project and other studies; two example regions from K562 cells are shown in . We and others have previously shown by massively parallel reporter assays (MPRA) that genomic sites satisfying these criteria are highly enriched for enhancer activity compared with other genomic sites and random sequences (5–8). Overall, we identified ∼40,000–160,000 putative active regulatory elements per cell type, together representing a total of ∼500,000 distinct (nonoverlapping) elements. The edges of flanking nucleosomes appear to occur ∼100 ± 50 bp from the peak of the DHS/ATAC-seq signal, as assayed by micrococcal nuclease-digestion assays (MNase-seq) (Fig. 1). The regions are enriched for transcriptional initiation, consistent with previous reports (9); the peak of transcription initiation is ∼55 bp away from the peak of the DHS/ATAC-seq signal and ∼45 bp before the position of the flanking nucleosome (Fig. 1). As expected, cell types with similar anatomical and developmental origins tended to have correlated regulatory elements.
Because developmental enhancers and housekeeping enhancers are typically regulated by distinct sets of TFs (10, 11), in our analysis we distinguished between cell type-restricted enhancers (active in <50% of the cell types) and ubiquitous enhancers (active in >90% of the cell types).
Fig. 1.

Chromatin structure around putative regulatory NDRs. The nucleosome-depleted region at putative regulatory elements tends to span ∼200 bp centered around the peak of the DHS signal and is generally flanked by well-positioned nucleosomes centered around +200 bp and −200 bp. (A and B) Composite plot (Upper) and heatmap (Lower) of the DHS signal (A) and MNase-seq reads (B) in a 1-kb region aligned around the peak of the DHS signal. Five thousand regions from K562 cells sorted by maximum DHS score are shown in heatmaps. (C) Composite profile of CAGE reads, indicating transcriptional initiation on the plus strand (red) and minus strand (blue) from 14 cell types aligned around the peak of the DHS signal in NDRs. The initiation of gene and enhancer RNA transcription peaks ∼55 bp away from the peak of the DHS signal and is oriented outwards from the accessible region. (D) Overlay of DHS (solid black line), MNase-seq (dashed line), and CAGE (red and blue lines) signals in 400-bp region centered around the peak of the DHS signal.

We next sought to infer functional TF-binding sites within the active regulatory elements. In a recent study (5), we found that TF binding is strongly correlated with the quantitative DNA accessibility of a region. Furthermore, the TF motifs associated with enhancer activity in reporter assays in a cell type corresponded closely to those that are most enriched in the genomic sequences of active regulatory elements in that cell type (5). In these assays, disrupting occurrences of the 20–30 most enriched motifs in such genomic regulatory sequences frequently caused significant changes in enhancer activity, indicating that many represent functional TF-binding sites. Together, these results suggest that occurrences of highly enriched motifs in highly accessible regions very likely represent functional TF-binding sites for a cell type. We used this approach to define a set of candidate functional TF-binding sites.
For each of the 47 cell types, we selected the 7,500 cell type-restricted NDRs (active in <50% of cell types) with the strongest DHS/ATAC-seq signals, with an average of 6% being promoter-proximal regions [<1 kb from an annotated transcription start site (TSS)] and 94% being distal enhancers. Within these regions, we identified all occurrences of 1,796 known motifs (corresponding to 777 TFs) and focused on the 20 most enriched motifs in the cell type (after removing highly similar motifs). Overall, these enriched motifs corresponded to 103 different TFs across the 47 cell types. As expected, the motif-enrichment profiles were correlated among related cell types. We then studied the positions of inferred binding sites for each of the 103 TFs relative to the peak of the DHS/ATAC-seq signal in the active regulatory elements. Different TFs show strikingly different positional binding-site patterns (Fig. 2). Some are strongly concentrated at the peak of the DHS/ATAC-seq signal (e.g., CTCF); some are enriched over a more widely distributed central region (e.g., ELF1); some are clustered near the edges of the region (e.g., FOXP1 and ARID3A); and some tend to bind at a specific distance from the center of the NDRs (e.g., EPAS1 and RREB1).
Fig. 2.

Positional binding patterns of TF motifs show striking differences. Distribution of the position of motif sites for CTCF (A), ELF1 (B), FOXP1 (C), ARID3A (D), EPAS1 (E), and RREB1 (F) across NDR regions, centered around the peak of the DHS signal. (Upper) Histograms show the density of motif sites in 10-bp bins tiled across the NDR. (Lower) Heatmaps show the position of 10,000 motif sites in NDRs. Colors indicate motif sites in different cell types (see for the color key).

To classify these patterns, we calculated the density profiles in ±200-bp regions around the peak and clustered them using k-medoids clustering. The analysis identified six clusters of distinct position patterns (Fig. 3). The clusters are clearly significant: The mean Kullback–Leibler divergence between density profiles within the same cluster is one to two orders of magnitude smaller than the mean divergence between density profiles in different clusters (Fig. 3), and the density profiles cannot be explained by local sequence composition. Three of these clusters represent motifs that occur most frequently near the center of NDRs, while the other three clusters tend to occur nearer to the edges (Fig. 3).
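The clustering step can be illustrated with a short Python sketch. It is not the authors' implementation: the use of a symmetrized Kullback–Leibler divergence as the dissimilarity, the pseudocount, the simple alternating update rule, and the names `kl_divergence`, `symmetric_kl`, and `k_medoids` are all assumptions made for illustration, and the initial medoids are supplied by the caller.

```python
import math

def kl_divergence(p, q, pseudocount=1e-9):
    """KL(p || q) for two discrete profiles; the pseudocount avoids log(0) (assumption)."""
    p = [x + pseudocount for x in p]
    q = [x + pseudocount for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))

def symmetric_kl(p, q):
    """Symmetrized divergence, used here as the clustering dissimilarity (assumption)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def k_medoids(profiles, initial_medoids, max_iter=100):
    """Alternating k-medoids: assign each profile to its nearest medoid, then move
    each medoid to the member with the smallest total within-cluster dissimilarity."""
    n = len(profiles)
    dist = [[symmetric_kl(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]

    def assign(medoids):
        return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster happens to be empty.
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Toy example: two center-peaked and two edge-peaked profiles, k = 2.
profiles = [[0.1, 0.8, 0.1], [0.15, 0.7, 0.15], [0.45, 0.1, 0.45], [0.4, 0.2, 0.4]]
medoids, labels = k_medoids(profiles, initial_medoids=[0, 2])
```

In the toy example the two center-peaked profiles and the two edge-peaked profiles end up in separate clusters, and the within-cluster divergence is smaller than the between-cluster divergence, which is the property the text uses to argue that the clusters are distinct.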
Fig. 3.

TF motif position patterns fall into six distinct clusters. (A) Motif-density profiles in 400-bp regions centered around the peak of the DHS signal (gray lines) were clustered using k-medoids clustering with k = 6. Density profiles were generated by calculating the frequency of motif occurrences in 20-bp bins tiled every 1 bp in the region. Blue lines depict the smoothed overall density profile of the cluster using the LOESS method. MNase-seq read density (indicating the position of the flanking nucleosomes) is shown by dashed dotted lines for context. (B) Average Kullback–Leibler divergence between the motif-density profiles of pairs of motifs in the same cluster (diagonal boxes) and different clusters (off-diagonal boxes). Motif-density profiles within the same cluster are substantially more similar than those in different clusters. (C) Schematic of NDR structure and motif positions. The arrows indicate the peak of transcriptional initiation estimated from CAGE data. The colored bars represent regions for each cluster with motif densities above the mean. Tick marks occur at 20-bp intervals.
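The density profiles described in this legend (20-bp bins tiled every 1 bp across a 400-bp region) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `density_profile`, the half-open window convention, the restriction to windows lying fully inside the region, and the normalization so that the profile sums to 1 are all hypothetical choices.

```python
def density_profile(positions, half_width=200, window=20, step=1):
    """Frequency of motif sites in `window`-bp bins tiled every `step` bp.

    positions: motif-site offsets (bp) relative to the DHS/ATAC-seq summit.
    Returns (window_centers, densities); densities sum to 1 if any site is counted.
    """
    starts = list(range(-half_width, half_width - window + 1, step))
    counts = []
    for start in starts:
        end = start + window  # half-open window [start, end)
        counts.append(sum(1 for p in positions if start <= p < end))
    total = sum(counts)
    densities = [c / total if total else 0.0 for c in counts]
    centers = [s + window / 2 for s in starts]
    return centers, densities

# Toy example: three sites near the summit and one near the upstream edge.
centers, dens = density_profile([0, 0, 5, -150])
```

Because the windows overlap, each site contributes to up to 20 adjacent windows, which smooths the profile before any further smoothing (such as the LOESS curve mentioned in the legend) is applied.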

Cluster 1 contains nine TFs with inferred binding sites that are strongly biased toward the peak of highest DNA accessibility at the middle of the NDR, suggesting that their binding directly shapes local chromatin architecture.
For six of these TFs (CTCF, NF-I, C/EBPβ, KLF7, GRHL1, and TFAP2) there is clear functional evidence to support this notion: (i) CTCF induces stably positioned arrays of nucleosomes around its genomic binding sites (12); (ii) NF-I, C/EBPβ, KLF7, and GRHL1 can function as pioneer factors that can establish and maintain chromatin accessibility (13–18); (iii) a recent systematic analysis of the TF-dependent changes in chromatin accessibility induced by the binding of 733 TFs identified CTCF, KLF7, and TFAP2 as having some of the strongest effects on local chromatin accessibility during ES cell differentiation (19); (iv) CTCF, NF-I, C/EBPβ, and GRHL1 show unusually stable binding to DNA and long residence times (14, 20–22); and (v) motifs in cluster 1 have especially strong DNaseI footprinting signals (), a feature associated with a slow DNA-binding off-rate (17, 23). The properties of the six TFs may enable them to serve as central anchor points for displacing the central nucleosome, adapting the surrounding chromatin, and stabilizing the NDR and flanking nucleosomes (14). The remaining three TFs in cluster 1 are nuclear receptors (ESRRB, HNF4A, and PPAR). Unlike the other TFs in the cluster, nuclear receptors are characterized by transient binding to DNA with short residence times (24, 25) and localize almost exclusively to preaccessible chromatin (16, 25–27). Nuclear receptor binding to genomic motif sites is often aided by assisted loading by a partner factor, which binds to a site overlapping or adjacent to the nuclear receptor motif site and opens the chromatin (28). Notably, two of the pioneer TFs in cluster 1 (C/EBPβ and NF-I) have been shown to catalyze the assisted loading of several nuclear receptors (13, 16, 29–31). The central location of the nuclear receptor motifs may be related to the assisted loading by pioneer TFs in cluster 1. 
Cluster 2 contains 32 TFs whose binding sites also are peaked at the center of the NDR but with a wider distribution than for cluster 1. The cluster is strongly enriched for transcriptional activators [Gene Ontology (GO) category enrichment, PBenjamini = 3.2 × 10^−16], such as the activator protein 1 (AP-1) subunits (JUN, FOS, ATF, and MAF factors) and activating factors from the TCF, TEA, RUNX, IRF, and KLF families. Based on known interactions reported in the bioGRID and IntAct databases (32, 33), these TFs are enriched for interactions with numerous transcriptional coactivators, including p300, CREB-binding protein (CBP), YAP1, KDM1A, KAT2B, and WWTR1 (Fig. 4). Furthermore, the TFs in this cluster interact frequently with each other [average of 1.8 pairwise interactions among the 32 TFs vs. 0.7–1.4 (mean = 1.0) interactions among the TFs in other clusters], suggesting they could cooperatively activate transcription. For example, studies of the IFNβ enhancer have shown that two TFs from this cluster (ATF2 and Jun) bind overlapping motif sites to form a scaffold that recruits CBP/p300 through multidentate interactions (34), leading to synergistic transcriptional activation in response to viral infection (35, 36). Interestingly, TFs in cluster 2 are twice as likely to participate in signaling pathways as the TFs in other clusters [52% of TFs in cluster 2 vs. 16–30% of TFs in other clusters, based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (37)] (Fig. 4), and AP-1 factors are required to maintain accessible chromatin to facilitate the binding of stimulus-regulated TFs (38). Therefore, the tightly clustered pattern of the motifs in this cluster may promote cooperativity by both facilitating TF–TF interactions and positioning TFs to form complexes that contact multiple sites on cofactors, thereby allowing enhancers to link multiple signaling pathways and respond in a highly synergistic fashion to specific regulatory cues.
Fig. 4.

TF clusters are enriched for distinct functional and structural properties. Selected enrichments for general annotation (Entrez Gene), GO categories, protein–protein interactions, and protein structural domains in the TF clusters. All terms included in the heatmap are significantly enriched (PBenjamini < 0.05) in at least one cluster. See for all significant enrichments.

Cluster 3, which contains 23 TFs, also peaks at the center of the NDR with a broader distribution than cluster 2. These TFs are generally characterized by greater cell-type specificity in expression across the 57 cell types profiled in the Epigenomics Roadmap project and greater motif enrichment than the TFs in other clusters. Consistent with this observation, cluster 3 contains numerous TFs that play critical roles in development, including all the homeobox, POU, SOX, ETS, and GATA factors in our dataset (39–44). Furthermore, 20 of the 23 TFs have functional annotations in GO related to differentiation and development in a wide range of tissues (Fig. 4), including erythrocytes (GATA1, GATA3, and ETS1), myeloid and lymphoid cells (SPI1), osteoblasts (TP63 and ID4), keratinocytes (TP63 and POU3F1), blastocysts (SPIC, POU5F1, and ELF3), neurons (ASCL2, FEV, and ZEB1), and more. Although clusters 2 and 3 may represent a continuum of broad-occupancy profiles, the TFs in cluster 3 have fewer annotated interactions with cofactors and other TFs than the TFs in cluster 2 (average 5.2 vs. 10.2 interactions per TF). One possible explanation is that the TFs in cluster 3 participate in fewer physically mediated cooperative interactions and therefore are less tightly clustered in the NDR. These TFs may function more independently or through indirect cooperation with other factors.
Cluster 4, which contains 16 TFs, is unusual in several respects. The motif profiles show both a central peak and flanking peaks at ∼70 bp upstream and downstream.
Moreover, many of the motifs in this cluster are asymmetric. When the NDRs are oriented so that the motif occurrences for each TF all appear on the same strand (), the motif occurrences in the flanking peaks show a clear preferred orientation relative to the center of the NDRs—that is, one of the two reverse-complementary sequences defining the motifs preferentially points inward (). This bias indicates that one side of the TFs is generally positioned facing the edges of the NDR, while the other side faces the NDR core. The motifs in cluster 4 are also strongly enriched in promoter-proximal regions (6% of such NDRs contain occurrences for motifs in cluster 4 vs. 1–3% for other clusters) (Fig. 5). ENCODE ChIP-seq data for 39 TFs in our dataset show greater enrichment in promoter regions for TFs in cluster 4 than for TFs in other clusters (38% of reported peaks within <1 kb of a TSS vs. 8–28% for other clusters) (Fig. 5). One of the TFs in this cluster, SP1, is a well-characterized promoter-proximal factor that binds GC-rich elements in a wide variety of cellular and viral promoters. Many of the other TFs in the cluster (including SP3, EGR1, EPAS1, ZBTB7B, E2F, KLF15, MEF2C, WT1, and PURA) also bind GC-rich motifs and are known to interact with SP1 at promoters (45–55). We compared the motif-density profiles in NDRs classified as promoter-proximal versus distal enhancers but found them to be indistinguishable ().
Fig. 5.

TFs in cluster 4 are enriched in promoters and are associated with transcriptional initiation. (A) The fraction of motif sites in NDRs in our analysis that occur in promoters (<1 kb upstream of the annotated TSS) for TFs in each cluster. (B) The fraction of ChIP-seq peaks for TFs in each cluster that overlap promoter (data for 39 TFs profiled in ENCODE are included). Cluster 4 motifs and TF binding occur in promoters far more frequently than do motifs in other clusters. (C) Composite of CAGE reads on the plus strand (red) and minus strand (blue) aligned to the center of each TF motif. Thin red and blue lines correspond to CAGE profiles of individual TF motifs, and thick red and blue lines show the average CAGE profile of all motifs in the cluster. Motifs in clusters 3 and 4 show a peak of transcriptional initiation at the location of the motif site. (D and E) Empirical cumulative distribution function (ECDF) of the number of cluster 3 (D) and cluster 6 (E) motif sites in NDRs, conditional on the number of cluster 4 motifs. NDRs with cluster 4 motif sites are coenriched with cluster 3 motif sites and depleted of cluster 6 motif sites.

TFs in this cluster are also enriched for interactions with p300 [false-discovery rate (FDR) = 2.6 × 10^−6] and Dnmt1, a DNA methyltransferase that plays a key role in maintaining CpG island methylation (56) (FDR = 0.03). Notably, functional studies have demonstrated that SP1 stimulates transcription when bound close to the initiation site but not in distal positions (57, 58), unlike distal enhancer-binding factors from clusters 1–3. These results suggest that SP1 and other TFs in cluster 4 may belong to a distinct functional class of TFs with specialized promoter-associated functions. Because a key function of promoters is transcript initiation, we hypothesized that the flanking peaks and orientation of TFs in cluster 4 might reflect a role in establishing or stabilizing TSSs at both promoters and enhancers.
Recent studies have suggested that, in addition to such features as TATA boxes and INR elements, TF-binding sites also contribute to determining the position of the TSS (9, 59). To examine the relationship of TFs in each cluster with the TSS, we examined cap analysis of gene expression (CAGE) data for both enhancer and promoter-proximal NDRs for 14 of the cell lines in our dataset (60). Transcriptional initiation tends to peak at 50–60 bp from the center of the NDRs (as noted above) (Fig. 1) and ∼50 bp away for TF motif occurrences (Fig. 5). However, 64% of TFs in cluster 4 and 42% of TFs in cluster 3 (vs. 0–8% in other clusters) show an additional peak of transcriptional initiation ∼10 bp away from the location of motif sites (EGR1, EGR4, MAZ, PURA, SP1, SP3, ZBTB7B, and ZNF281 from cluster 4 and ELF1, ELF2, ELF5, FLI1, SPI1, and SPIC from cluster 3) (Fig. 5). This observation suggests these TFs play unique roles in positioning the site of initiation.
Cluster 5 contains six TFs whose binding sites are not enriched at the center of NDRs but have peaks at ∼60 bp upstream or downstream. The TFs in this cluster all belong to the FOX family of TFs and include the two best-characterized pioneer factors, FOXA and FOXO. The DNA-binding domain (DBD) of FOX factors structurally resembles the DBD of linker histones H1 and H5 (61, 62), and FOXA factors can compete for binding to linker histone-binding sites, which are located near the edges of the core nucleosome, ∼65 bp away from its center (61, 63–65). However, whereas linker histone binding leads to the compaction of nucleosomal arrays, FOXA binding destabilizes nucleosomes and opens the region for binding by other TFs (66–68). Since enhancer activation typically entails the elimination of a well-positioned central nucleosome (69), motif sites for FOXA and other FOX factors in cluster 5 may be positioned ±60 bp to displace linker histones and destabilize the central nucleosome, helping other TFs bind their target sites.
Finally, cluster 6 contains 14 TFs with binding sites enriched near the edges of the accessible region (80–200 bp from the center), suggesting these TFs could interact with the surrounding chromatin. As with cluster 4, the TFs in cluster 6 have asymmetric motifs and mostly exhibit a preferred orientation relative to the center of the region, allowing directional interactions with the surrounding nucleosomes and larger chromatin landscape. Consistent with this notion, 10 of the 14 TFs in cluster 6 are known to play roles in chromatin remodeling. These include BPTF, the DNA-binding subunit of nucleosome remodeling factor (NURF), which recognizes H3K4me3 and facilitates ATP-dependent nucleosome sliding (70–72); ARID3A, which facilitates the opening of the IgH enhancer (73–75); and several FOX factors, which interact with histones and mediate recruitment of chromatin remodeling complexes such as SWI/SNF (68, 76). Many of the motifs in this cluster are A/T-rich. It is possible that they also recruit additional members of the ARID (A+T-rich interaction domain) family that binds nonspecifically to A/T sequences and has been implicated in chromatin remodeling, including ARID1A/BAF250, the DNA-binding subunit of the BAF chromatin remodeling complex (77). The TFs in cluster 6 also play roles in nuclear attachment, DNA bending, and DNA unwinding. These TFs are enriched for interactions with the chromatin organizers SATB1 and SATB2, which induce chromatin looping and tether DNA to the nuclear matrix (78, 79). For example, ARID3A binds to sites on the periphery of the IgH enhancer to mediate the attachment of the nuclear matrix (80). Several of the TFs (ARID3A, SRY, and YY1) induce significant DNA bending (74, 81, 82), facilitating TF binding and TF–TF interactions (83, 84). Finally, some (SRY and FUBP1) unwind the DNA double helix, which can promote transcriptional initiation and attachment to the nuclear matrix (82, 85, 86).
To test directly whether TFs in cluster 6 interact with the surrounding nucleosomes, we used MNase-seq data from two cell types (GM12878 and K562) to infer the position of the flanking nucleosomes for each individual NDR and then aligned the TF motifs. For the TFs present in these cell types, we examined the motif distribution relative to the inferred edge of the flanking nucleosome (rather than to the peak of the DHS signal). The TFs in clusters 1–5 did not show peaks of motif sites adjacent to the nucleosome edge, but five of the eight TFs in cluster 6 (FOXC1, FOXJ3, FOXO1, FOXP1, and ARID3A) showed a peak. The remaining three TFs (FUBP1, IRF1, and IRF5) are not known to play roles in chromatin remodeling.
Finally, we wondered whether certain classes of TFs tend to co-occur in enhancers. To investigate this, we examined whether the distribution of motif sites from each class in the NDRs varied with the presence or absence of motif sites from each of the other classes (Fig. 5). We counted the number of nonoverlapping motif sites from each cluster in the NDRs and calculated the odds ratio (OR) for coenrichment between the motifs from each pair of clusters. To control for motif similarities, we also calculated the baseline OR for each pair of clusters in shuffled sequences. Significantly coenriched or codepleted cluster pairs were defined as pairs for which the OR falls outside the 95% CI of the OR in shuffled sequences. We found that all six clusters showed significant preferences for coenrichment and codepletion with specific other clusters. For example, regulatory elements with TF motif sites in cluster 4 (associated with TSS-related functions) contain significantly more TF motif sites from cluster 3 (associated with cell type-specific activation) (Fig. 5) and significantly fewer motif sites from clusters 5 and 6 (associated with nucleosome remodeling and chromatin architecture) (Fig. 5) than regulatory elements without cluster 4 motifs.
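The co-occurrence test can be sketched in Python as follows. This is an illustration only: reducing each NDR to presence or absence of sites from a cluster, the 0.5 continuity correction, and the use of empirical percentiles of shuffled-sequence ORs as the 95% interval are simplifying assumptions, and the function names `odds_ratio` and `outside_baseline` are hypothetical.

```python
def odds_ratio(has_a, has_b, correction=0.5):
    """Odds ratio for co-occurrence of cluster A and cluster B motif sites.

    has_a, has_b: per-NDR booleans (True if the NDR contains at least one site).
    A continuity correction is added to every cell of the 2x2 table (assumption).
    """
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(has_a, has_b):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    n11, n10, n01, n00 = (x + correction for x in (n11, n10, n01, n00))
    return (n11 * n00) / (n10 * n01)

def outside_baseline(observed_or, shuffled_ors, level=0.95):
    """True if the observed OR lies outside the central `level` interval of the
    ORs obtained from shuffled sequences (empirical percentiles; assumption)."""
    ordered = sorted(shuffled_ors)
    tail = (1 - level) / 2
    low = ordered[int(tail * (len(ordered) - 1))]
    high = ordered[int(round((1 - tail) * (len(ordered) - 1)))]
    return observed_or < low or observed_or > high

# Toy example: 100 NDRs in which clusters A and B tend to occur together.
has_a = [True] * 50 + [False] * 50
has_b = [True] * 40 + [False] * 10 + [True] * 10 + [False] * 40
observed = odds_ratio(has_a, has_b)
baseline = [0.8 + 0.004 * i for i in range(100)]  # hypothetical shuffled-sequence ORs
coenriched = outside_baseline(observed, baseline) and observed > 1
```

An OR above the shuffled-sequence interval would indicate coenrichment and an OR below it codepletion; comparing against shuffled sequences rather than against 1 is what controls for similarity between motifs of different clusters.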
Importantly, these cluster associations are consistent across cell types, even though the specific set of TFs active in each cell type differs. Thus, the TF clusters may constitute a general regulatory code, with different cell types substituting specific TFs to activate different sets of enhancers.
It has long been suggested that TFs may belong to different functional classes. In some cases, prior biological knowledge of certain TFs has been used to categorize TFs into classes, such as pioneer factors that have the capability to bind motif sites in closed chromatin versus nonpioneer factors that bind motif sites only in open chromatin and cell type-specific versus ubiquitous factors. However, there have been few systematic approaches to recognize distinct classes and properties independent of the known biological properties of the individual TFs. One such functional study was recently performed in Drosophila, in which investigators asked which TFs could substitute for each other across a variety of regulatory contexts (10). Here, we show that, solely by looking at the positional distribution of motif sites within NDRs, we are able to recognize six distinct classes of TFs. These classes bring together factors that have a number of similar properties, such as binding stability, interactions with other TFs and cofactors, cell-type specificity, and pioneering ability. Furthermore, the position of motif sites appears to be related to their known functions, for example, localizing pioneer factors to the optimal positions to displace nucleosomes and targeting chromatin remodelers in close proximity to flanking nucleosomes.
The degree to which the arrangement of motif sites within regulatory elements determines their function remains an open question. At one end of the spectrum, there are examples of enhanceosomes, such as the IFNβ enhancer, that are exquisitely sensitive to the spacing and orientation of the motif sites (34, 87, 88).
However, the activity of other regulatory elements, referred to as “billboard” enhancers, appears to be relatively insensitive to the arrangement of motif sites (89–91). Instead, our work suggests a different kind of constraint, whereby TFs play distinct roles in forming a functional enhancer, facilitated by their position within a regulatory sequence. The classes identified here also help shed light on the properties of some less characterized TFs. For example, they suggest that several other FOX factors in cluster 5 may use a mechanism similar to that of FOXA1 to displace nucleosomes and that the uncharacterized zinc finger TFs in cluster 6 (ZNF148, ZNF202, and ZNF35) may have pioneering abilities. In addition, the positional preferences identified may prove useful in building predictors of enhancer activity and recognizing functional enhancers in genomic sequence. While here we focused on the classes of TFs, these results naturally raise the question of whether different functional classes of enhancers are formed based on these classes of TFs. Identifying such enhancer classes may shed light on the classes of TFs that must come together to accomplish all the functions necessary to build a functional enhancer. Finally, in addition to helping us understand natural enhancers, better knowledge about the constraints and the functional implications of TF positions may aid in creating synthetic enhancers with specific properties that can be used in synthetic biology.

Materials and Methods

ATAC-Seq for Jurkat and U937 Cell Lines.

Cells were washed with ice-cold FACS buffer and were kept on ice until cell sorting. Twenty-five thousand live cells from each condition were sorted into FACS buffer and were pelleted by centrifugation at 500 × g for 5 min at 4 °C in a precooled fixed-angle centrifuge. Cell lines then were tagmented according to the previously described Fast-ATAC protocol (92). Briefly, all supernatant was removed with care taken not to disturb the not-visible cell pellet. Transposase mixture (50 μL: 25 μL of 2× TD, 2.5 μL of TDE1, 0.5 μL of 1% digitonin, 22 μL of nuclease-free water) (catalog no. FC-121-1030, Illumina; catalog no. G9441, Promega) was added to the cells, and the pellet was dissociated by pipetting. Transposition reactions were incubated at 37 °C for 30 min in an Eppendorf ThermoMixer with agitation at 300 rpm. Transposed DNA was purified using a Qiagen MinElute Reaction Cleanup kit (catalog no. 28204), and purified DNA was eluted in 12 μL of elution buffer (10 mM Tris⋅HCl, pH 8). Transposed fragments were amplified and purified as described previously (93) with modified primers (94). Libraries were quantified using qPCR before sequencing. All Fast-ATAC libraries were sequenced using paired-end, dual-index sequencing on a NextSeq sequencer (Illumina) with 76 × 8 × 8 × 76 cycle reads at an average read depth of 30 million reads per sample.

Definition of NDRs.

To define NDRs for our analysis, we used DNaseI-seq and H3K27ac ChIP-seq data for 45 cell types in the Roadmap Epigenomics and ENCODE projects (1, 60). We supplemented this dataset with ATAC-seq data for Jurkat and U937 cells generated in the N.H. laboratory and H3K27ac ChIP-seq data for Jurkat and U937 cells from studies deposited in the National Center for Biotechnology Information Gene Expression Omnibus database (accession no. SRR1057274) (95) and the European Nucleotide Archive database (accession no. ERR671846), respectively. We aligned the ATAC-seq and H3K27ac data for Jurkat and U937 cells as described in ref. 96 and called peaks using MACS2 (97) with the standard parameters used by the Roadmap Epigenomics project. To select our initial set of NDRs, we intersected DHS/ATAC-seq narrowPeaks regions and H3K27ac gappedPeaks regions. We then filtered out NDRs that were present in more than 24 (50%) of the cell types in our analysis and selected the top 7,500 cell type-restricted NDRs for motif enrichment and positioning analysis. We defined the coordinates in the NDRs relative to the summit called by MACS2 (i.e., the position with the maximum DHS/ATAC-seq signal). For MNase-seq analysis, we used data from GM12878 and K562 cells generated by the ENCODE project. The center of the nucleosomes flanking the NDRs was estimated by identifying the position with the highest MNase-seq read coverage in the 300 bp upstream and downstream of the peak of the DHS signal.
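The selection logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the tuple representation of peaks, and the half-open coordinate convention are assumptions made for illustration, and the ranking used to choose the top 7,500 NDRs is omitted because the criterion is not specified here.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def select_ndrs(accessible_peaks, h3k27ac_peaks):
    """Keep accessibility peaks (DHS/ATAC-seq) that overlap any H3K27ac peak."""
    return [p for p in accessible_peaks
            if any(overlaps(p, h) for h in h3k27ac_peaks)]


def cell_type_restricted(ndr_to_cell_types, max_cell_types=24):
    """Drop NDRs present in more than `max_cell_types` cell types."""
    return {ndr: cts for ndr, cts in ndr_to_cell_types.items()
            if len(cts) <= max_cell_types}


def summit_relative(position, summit):
    """Coordinate of a genomic position relative to the peak summit (0 = summit)."""
    return position - summit
```

The quadratic overlap scan is adequate for a toy example; for genome-scale peak sets an interval-tree or sorted-sweep approach would be the usual choice.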

Motif Enrichment Analysis.

We calculated motif counts for all vertebrate motifs in TRANSFAC (98), JASPAR (99), and CIS-BP (100) in the genomic NDR sequences as well as scrambled genomic NDR sequences (holding dinucleotide frequencies constant). To identify enriched motifs in each cell type, we used AME (101) with the mhg method to calculate the enrichment of the total number of matches of each motif in the genomic sequences compared with the scrambled sequences. When the combined databases contained multiple position weight matrices (PWMs) corresponding to a single TF, we selected the most enriched motif in each cell type corresponding to each TF. To remove highly similar motifs, we calculated the pairwise similarity of the motifs using the R package PWMEnrich and removed motifs that had a similarity of >0.8 with a more highly enriched motif. We then selected the top 20 motifs from the filtered list in each cell type for positioning analysis. We called motif sites in the genomic and scrambled sequences by running FIMO (102) with a P value threshold of 10⁻⁴.
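The redundancy filter and top-20 selection can be sketched as follows. This is an illustrative reading of the description, not the authors' code: `enrichment` is assumed to map each motif to a score where higher means more enriched, `similarity` is a hypothetical callable standing in for the PWMEnrich similarity measure, and ties in enrichment are broken by input order.

```python
def filter_similar_motifs(enrichment, similarity, threshold=0.8, top_n=20):
    """Remove motifs too similar to a more enriched motif, then keep the top_n.

    enrichment: dict mapping motif name -> score (assumed: higher = more enriched).
    similarity: callable(motif_a, motif_b) -> float (hypothetical stand-in).
    """
    # Rank motifs from most to least enriched.
    ranked = sorted(enrichment, key=enrichment.get, reverse=True)
    kept = []
    for i, motif in enumerate(ranked):
        # Compare against every more highly enriched motif, as the text states.
        if all(similarity(motif, other) <= threshold for other in ranked[:i]):
            kept.append(motif)
    return kept[:top_n]
```

Comparing against all more enriched motifs, rather than only those already retained, follows the wording literally; the two variants differ only when similar motifs form chains.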

Motif-Position Profiles and Clustering.

To analyze the positioning of the motifs within NDRs, we collapsed the motif matches to their central position and calculated the density of each motif in 20-bp windows tiled every 1 bp across the 400 bp centered around the position of maximum DHS/ATAC signal in each NDR. The motif-position profiles were then clustered using the pam function from the R package cluster with k = 6. To assess how much each motif-position profile is due to the variation in dinucleotide content across the regions, we calculated the background motif-density profiles in shuffled sequences, holding the dinucleotide content at each position constant, and normalized the genomic-density profiles by subtracting out the background motif frequencies.
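A minimal sketch of the windowed density calculation and background subtraction is given below. The units of density (matches per bp per NDR), the integer summit-relative coordinates, and the function names are assumptions for illustration; the clustering step (pam with k = 6 in R) is not reproduced.

```python
def motif_density_profile(centers, n_regions, span=400, window=20):
    """Sliding-window motif density around the NDR summit.

    centers: summit-relative central positions of motif matches pooled over
             all NDRs (0 = summit), as integers.
    Returns one value per window start (span - window + 1 values), expressed
    as matches per bp per NDR (an assumed normalization).
    """
    half = span // 2
    counts = [0] * span  # counts[i] holds matches at offset i - half
    for c in centers:
        if -half <= c < half:
            counts[c + half] += 1

    profile = []
    running = sum(counts[:window])  # matches in the first window
    for start in range(span - window + 1):
        profile.append(running / (window * n_regions))
        if start + window < span:
            # Slide the window by 1 bp: add the new position, drop the oldest.
            running += counts[start + window] - counts[start]
    return profile


def subtract_background(genomic, background):
    """Normalize a genomic profile by subtracting the shuffled-sequence profile."""
    return [g - b for g, b in zip(genomic, background)]
```

With the default settings the profile has 381 values, one for each 20-bp window whose start lies between offsets -200 and +180.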

TF Cluster Feature Enrichment Analysis.

Enrichment analysis was performed using DAVID (103) for each of the six clusters for four types of features: protein domains (PFAM, PIR, and SMART), functional annotations (GO and Entrez Gene), protein–protein interactions (BioGRID interaction and IntAct databases), and pathways (KEGG and BioCarta). P values were corrected for multiple testing using the Benjamini method.
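For readers unfamiliar with this type of correction, the sketch below shows the Benjamini-Hochberg step-up adjustment. It assumes that the "Benjamini" correction reported by DAVID corresponds to this procedure; the function name is illustrative and the code is independent of DAVID itself.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```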

TF Coenrichment Analysis.

We tested for coenrichment and codepletion of motifs from the six TF motif clusters in genomic NDR sequences using a Fisher exact test. For each pair of cluster A and cluster B, we calculated the odds ratio (OR) that a genomic sequence contains a motif from cluster B, conditional on the presence of a motif from cluster A. To control for motif similarities between motifs in different clusters, we also calculated the same OR in scrambled sequences (holding dinucleotide content constant). To identify significantly coenriched or codepleted pairs, we selected pairs for which the 95% CI of the genomic OR did not overlap the 95% CI of the scrambled OR.
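The comparison of genomic and scrambled ORs can be sketched as below. The method used to obtain the 95% CI is not stated here, so this example substitutes a Wald interval on the log OR as a stand-in; the function names and the 2 × 2 count layout are likewise assumptions for illustration.

```python
import math


def odds_ratio_ci(both, a_only, b_only, neither, z=1.96):
    """Odds ratio and Wald 95% CI from a 2 x 2 table of sequence counts.

    both: sequences with motifs from clusters A and B; a_only / b_only:
    sequences with only one of them; neither: sequences with neither.
    """
    cells = (both, a_only, b_only, neither)
    if min(cells) <= 0:
        raise ValueError("Wald interval requires all four counts to be positive")
    log_or = math.log((both * neither) / (a_only * b_only))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


def is_significant(genomic_counts, scrambled_counts):
    """True if the genomic and scrambled 95% CIs do not overlap."""
    _, g_lo, g_hi = odds_ratio_ci(*genomic_counts)
    _, s_lo, s_hi = odds_ratio_ci(*scrambled_counts)
    return g_hi < s_lo or s_hi < g_lo
```

A pair is called coenriched when the genomic interval lies entirely above the scrambled interval and codepleted when it lies entirely below.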
  103 in total

Review 1.  ARID proteins: a diverse family of DNA binding proteins implicated in the control of cell growth, differentiation, and development.

Authors:  Deborah Wilsker; Antonia Patsialou; Peter B Dallas; Elizabeth Moran
Journal:  Cell Growth Differ       Date:  2002-03

2.  Transcription factor FoxA (HNF3) on a nucleosome at an enhancer complex in liver chromatin.

Authors:  D Chaya; T Hayamizu; M Bustin; K S Zaret
Journal:  J Biol Chem       Date:  2001-09-24       Impact factor: 5.157

Review 3.  POU-domain transcription factors: pou-er-ful developmental regulators.

Authors:  M G Rosenfeld
Journal:  Genes Dev       Date:  1991-06       Impact factor: 11.361

4.  The HMG domain of lymphoid enhancer factor 1 bends DNA and facilitates assembly of functional nucleoprotein structures.

Authors:  K Giese; J Cox; R Grosschedl
Journal:  Cell       Date:  1992-04-03       Impact factor: 41.582

Review 5.  Nuclease hypersensitive sites in chromatin.

Authors:  D S Gross; W T Garrard
Journal:  Annu Rev Biochem       Date:  1988       Impact factor: 23.643

Review 6.  Hox genes and regional patterning of the vertebrate body plan.

Authors:  Moises Mallo; Deneen M Wellik; Jacqueline Deschamps
Journal:  Dev Biol       Date:  2010-05-07       Impact factor: 3.582

7.  Tissue-specific nuclear architecture and gene expression regulated by SATB1.

Authors:  Shutao Cai; Hye-Jung Han; Terumi Kohwi-Shigematsu
Journal:  Nat Genet       Date:  2003-05       Impact factor: 38.330

Review 8.  Chromatin structure and gene expression.

Authors:  G Felsenfeld; J Boyes; J Chung; D Clark; V Studitsky
Journal:  Proc Natl Acad Sci U S A       Date:  1996-09-03       Impact factor: 11.205

9.  Integrative analysis of 111 reference human epigenomes.

Authors:  Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal:  Nature       Date:  2015-02-19       Impact factor: 69.504

10.  Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution.

Authors:  M Ryan Corces; Jason D Buenrostro; Beijing Wu; Peyton G Greenside; Steven M Chan; Julie L Koenig; Michael P Snyder; Jonathan K Pritchard; Anshul Kundaje; William J Greenleaf; Ravindra Majeti; Howard Y Chang
Journal:  Nat Genet       Date:  2016-08-15       Impact factor: 38.330

