Sharon R Grossman, Jesse Engreitz, John P Ray, Tung H Nguyen, Nir Hacohen, Eric S Lander.
Abstract
Gene expression is controlled by sequence-specific transcription factors (TFs), which bind to regulatory sequences in DNA. TF binding occurs in nucleosome-depleted regions of DNA (NDRs), which generally encompass regions with lengths similar to those protected by nucleosomes. However, less is known about where within these regions specific TFs tend to be found. Here, we characterize the positional bias of inferred binding sites for 103 TFs within ∼500,000 NDRs across 47 cell types. We find that distinct classes of TFs display different binding preferences: Some tend to have binding sites toward the edges, some toward the center, and some at other positions within the NDR. These patterns are highly consistent across cell types, suggesting that they may reflect TF-specific intrinsic structural or functional characteristics. In particular, TF classes with binding sites at NDR edges are enriched for those known to interact with histones and chromatin remodelers, whereas TFs with central enrichment interact with other TFs and cofactors such as p300. Our results suggest distinct regiospecific binding patterns and functions of TF classes within enhancers.
Keywords: chromatin structure; gene regulation; genomics; transcription factor binding
Year: 2018 PMID: 29987030 PMCID: PMC6065035 DOI: 10.1073/pnas.1804663115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Chromatin structure around putative regulatory NDRs. The nucleosome-depleted region at putative regulatory elements tends to span ∼200 bp centered around the peak of the DHS signal and is generally flanked by well-positioned nucleosomes centered around +200 bp and −200 bp. (A and B) Composite plot (Upper) and heatmap (Lower) of the DHS signal (A) and MNase-seq reads (B) in a 1-kb region aligned around the peak of the DHS signal. Five thousand regions from K562 cells sorted by maximum DHS score are shown in heatmaps. (C) Composite profile of CAGE reads, indicating transcriptional initiation on the plus strand (red) and minus strand (blue) from 14 cell types aligned around the peak of the DHS signal in NDRs. The initiation of gene and enhancer RNA transcription peaks ∼55 bp away from the peak of the DHS signal and is oriented outwards from the accessible region. (D) Overlay of DHS (solid black line), MNase-seq (dashed line), and CAGE (red and blue lines) signals in a 400-bp region centered around the peak of the DHS signal.
Fig. 2. Positional binding patterns of TF motifs show striking differences. Distribution of the position of motif sites for CTCF (A), ELF1 (B), FOXP1 (C), ARID3A (D), EPAS1 (E), and RREB1 (F) across NDR regions, centered around the peak of the DHS signal. (Upper) Histograms show the density of motif sites in 10-bp bins tiled across the NDR. (Lower) Heatmaps show the position of 10,000 motif sites in NDRs. Colors indicate motif sites in different cell types (see for the color key).
Fig. 3. TF motif position patterns fall into six distinct clusters. (A) Motif-density profiles in 400-bp regions centered around the peak of the DHS signal (gray lines) were clustered using k-medoids clustering with k = 6. Density profiles were generated by calculating the frequency of motif occurrences in 20-bp bins tiled every 1 bp in the region. Blue lines depict the smoothed overall density profile of the cluster using the LOESS method. MNase-seq read density (indicating the position of the flanking nucleosomes) is shown by dashed dotted lines for context. (B) Average Kullback–Leibler divergence between the motif-density profiles of pairs of motifs in the same cluster (diagonal boxes) and different clusters (off-diagonal boxes). Motif-density profiles within the same cluster are substantially more similar than those in different clusters. (C) Schematic of NDR structure and motif positions. The arrows indicate the peak of transcriptional initiation estimated from CAGE data. The colored bars represent regions for each cluster with motif densities above the mean. Tick marks occur at 20-bp intervals.
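The density profile and the divergence measure described in Fig. 3 can be illustrated with a minimal sketch. It assumes that motif positions are given as bp offsets from the DHS peak, that a profile is normalized to sum to 1, and that a small pseudocount is added before taking logarithms; the function names, the normalization, and the pseudocount are illustrative assumptions, not details taken from the paper.

```python
import math

def density_profile(positions, half_width=200, window=20):
    """Frequency of motif sites in `window`-bp bins tiled every 1 bp.

    `positions` are motif-site offsets (bp) relative to the DHS peak.
    The region spans [-half_width, half_width); both defaults follow the
    400-bp region and 20-bp bins named in the caption.
    """
    starts = range(-half_width, half_width - window + 1)
    counts = [sum(1 for p in positions if s <= p < s + window) for s in starts]
    total = sum(counts)
    if total == 0:
        raise ValueError("no motif sites fall inside the region")
    # Normalization to a probability vector is an assumption of this sketch.
    return [c / total for c in counts]

def kl_divergence(p, q, pseudocount=1e-6):
    """KL(p || q) in nats; the pseudocount (an assumption) avoids log(0)."""
    p = [x + pseudocount for x in p]
    q = [x + pseudocount for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))

# Toy example: an edge-biased motif versus a center-biased motif.
edge = density_profile([-150, -140, 150, 160])
center = density_profile([-5, 0, 5, 10])
print(kl_divergence(edge, center))
```

Averaging such pairwise values within and between clusters would give a matrix of the kind shown in panel B, although the exact form of the divergence used there (for example, whether it was symmetrized) is not stated in the caption.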
Fig. 4. TF clusters are enriched for distinct functional and structural properties. Selected enrichments for general annotation (Entrez Gene), GO categories, protein–protein interactions, and protein structural domains in the TF clusters. All terms included in the heatmap are significantly enriched (Benjamini-corrected P < 0.05) in at least one cluster. See for all significant enrichments.
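The significance threshold in Fig. 4 refers to a Benjamini-corrected P value. As a rough illustration, the sketch below assumes the standard Benjamini-Hochberg step-up adjustment; the caption does not name the exact procedure or the tool used, so this is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy P values for four hypothetical enrichment terms.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

Under this adjustment a term would enter the heatmap if its adjusted value falls below 0.05 in at least one cluster.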
Fig. 5. TFs in cluster 4 are enriched in promoters and are associated with transcriptional initiation. (A) The fraction of motif sites in NDRs in our analysis that occur in promoters (<1 kb upstream of the annotated TSS) for TFs in each cluster. (B) The fraction of ChIP-seq peaks for TFs in each cluster that overlap promoters (data for 39 TFs profiled in ENCODE are included). Cluster 4 motifs and TF binding occur in promoters far more frequently than do motifs in other clusters. (C) Composite of CAGE reads on the plus strand (red) and minus strand (blue) aligned to the center of each TF motif. Thin red and blue lines correspond to CAGE profiles of individual TF motifs, and thick red and blue lines show the average CAGE profile of all motifs in the cluster. Motifs in clusters 3 and 4 show a peak of transcriptional initiation at the location of the motif site. (D and E) Empirical cumulative distribution function (ECDF) of the number of cluster 3 (D) and cluster 6 (E) motif sites in NDRs, conditional on the number of cluster 4 motifs. NDRs with cluster 4 motif sites are coenriched with cluster 3 motif sites and depleted of cluster 6 motif sites.
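The conditional ECDFs in panels D and E of Fig. 5 can be sketched as follows. The example assumes each NDR is represented as a dictionary of motif-site counts per cluster; the keys `c3` and `c4`, the function names, and the toy counts are hypothetical and serve only to show the computation.

```python
def ecdf(values):
    """Return (x, fraction of observations <= x) for each distinct x."""
    n = len(values)
    if n == 0:
        raise ValueError("ECDF is undefined for an empty sample")
    return [(x, sum(1 for v in values if v <= x) / n) for x in sorted(set(values))]

def conditional_ecdf(ndrs, target, given, given_count):
    """ECDF of `target` counts over NDRs whose `given` count equals `given_count`."""
    subset = [ndr[target] for ndr in ndrs if ndr[given] == given_count]
    return ecdf(subset)

# Toy data: motif-site counts per NDR for two clusters (made-up numbers).
ndrs = [
    {"c3": 0, "c4": 0},
    {"c3": 1, "c4": 0},
    {"c3": 2, "c4": 1},
    {"c3": 2, "c4": 1},
    {"c3": 1, "c4": 1},
]
print(conditional_ecdf(ndrs, "c3", "c4", 0))
print(conditional_ecdf(ndrs, "c3", "c4", 1))
```

A curve that rises more slowly for NDRs with cluster 4 sites indicates that those NDRs tend to carry more sites of the target cluster, which is how coenrichment would appear in such a plot.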