Mapping and analysis of chromatin state dynamics in nine human cell types.

Jason Ernst, Pouya Kheradpour, Tarjei S Mikkelsen, Noam Shoresh, Lucas D Ward, Charles B Epstein, Xiaolan Zhang, Li Wang, Robbyn Issner, Michael Coyne, Manching Ku, Timothy Durham, Manolis Kellis, Bradley E Bernstein.

Abstract

Chromatin profiling has emerged as a powerful means of genome annotation and detection of regulatory activity. The approach is especially well suited to the characterization of non-coding portions of the genome, which critically contribute to cellular phenotypes yet remain largely uncharted. Here we map nine chromatin marks across nine cell types to systematically characterize regulatory elements, their cell-type specificities and their functional interactions. Focusing on cell-type-specific patterns of promoters and enhancers, we define multicell activity profiles for chromatin state, gene expression, regulatory motif enrichment and regulator expression. We use correlations between these profiles to link enhancers to putative target genes, and predict the cell-type-specific activators and repressors that modulate them. The resulting annotations and regulatory predictions have implications for the interpretation of genome-wide association studies. Top-scoring disease single nucleotide polymorphisms are frequently positioned within enhancer elements specifically active in relevant cell types, and in some cases affect a motif instance for a predicted regulator, thus suggesting a mechanism for the association. Our study presents a general framework for deciphering cis-regulatory connections and their roles in disease. ©2011 Macmillan Publishers Limited. All rights reserved

Year: 2011 PMID: 21441907 PMCID: PMC3088773 DOI: 10.1038/nature09906

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

Introduction

A major challenge in biology is to understand how a single genome can give rise to an organism comprising hundreds of distinct cell types. Much emphasis has been placed on the application of high-throughput tools to study interacting cellular components1. The field of systems biology has exploited dynamic gene expression patterns to reveal functional modules, pathways and networks2. Yet cis-regulatory elements, which may be equally dynamic, remain largely uncharted across cellular conditions. Chromatin profiling provides a systematic means for detecting cis-regulatory elements, given the central role of chromatin in mediating regulatory signals and controlling DNA access, and the paucity of recognizable sequence signals. Specific histone modifications correlate with regulator binding, transcriptional initiation and elongation, enhancer activity and repression1,3-6. Combinations of modifications can provide even more precise insight into chromatin state7,8. Here, we apply a high-throughput pipeline to map 9 chromatin marks and input controls across 9 cell types. We use recurrent combinations of marks to define 15 chromatin states corresponding to repressed, poised, and active promoters, strong and weak enhancers, putative insulators, transcribed regions, and large-scale repressed and inactive domains. We use directed experiments to validate biochemical and functional distinctions between states. The resulting chromatin state maps portray a highly dynamic landscape, with the specific patterns of change across cell types revealing strong correlations between interacting functional elements. We use correlated patterns of activity between chromatin state, gene expression and regulator activity to connect enhancers to likely target genes, to predict cell type-specific activators and repressors, and to identify individual binding motifs responsible for these interactions. Our results have implications for interpreting genome-wide association studies. 
We find that disease variants frequently coincide with enhancer elements specific to a relevant cell type. In several cases, we can predict upstream regulators whose regulatory motif instances are affected or target genes whose expression may be altered, thereby proposing specific mechanistic hypotheses for how disease-associated genotypes lead to the observed disease phenotypes.

Results

Systematic mapping of chromatin marks in multiple cell types

To explore chromatin state in a uniform way across multiple cell types, we applied a production pipeline for chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) to generate genome-wide chromatin datasets (see Methods, Fig. 1a). We profiled nine human cell types, including common lines designated by the ENCODE consortium1 and primary cell types. These consist of embryonic stem cells (H1 ES), erythrocytic leukemia cells (K562), B-lymphoblastoid cells (GM12878), hepatocellular carcinoma cells (HepG2), umbilical vein endothelial cells (HUVEC), skeletal muscle myoblasts (HSMM), normal lung fibroblasts (NHLF), normal epidermal keratinocytes (NHEK), and mammary epithelial cells (HMEC).

Figure 1

Chromatin state discovery and characterization

a, Top: Profiles for nine chromatin marks (grayscale) are shown across the wntless (WLS) gene in four cell types, and summarized in a single chromatin state annotation track for each (colored according to b). WLS is poised in ES cells, repressed in GM12878 cells, and transcribed in HUVEC and NHLF. Its TSS switches accordingly between poised (purple), repressed (grey) and active (red) promoter states; enhancer regions within the gene body become strongly activated (orange, yellow); and its gene body changes from low signal (white) to transcribed (green). These chromatin state changes summarize coordinated changes in many chromatin marks; for example, H3K27me3, H3K4me3 and H3K4me2 jointly mark a poised promoter, while loss of H3K27me3 and gain of H3K27ac and H3K9ac mark promoter activation. Bottom: Nine chromatin state tracks, one per cell type, in a 900kb region centered at WLS summarize 90 chromatin tracks in directly-interpretable dynamic annotations, showing activation and repression patterns for 6 genes and hundreds of regulatory regions, including enhancer states. b, Chromatin states learned jointly across cell types by a multivariate HMM. Table shows emission parameters learned de novo based on genome-wide recurrent combinations of chromatin marks. Each entry denotes the frequency with which a given mark is found at genomic positions corresponding to the chromatin state. c, Genome coverage, functional enrichments, and candidate annotations for each chromatin state. Blue shading indicates intensity, scaled by column. d, Box plot depicts enhancer activity for predicted regulatory elements. 250bp-long sequences corresponding to strong or weak/poised HepG2 enhancer elements, or GM12878-specific strong enhancer elements were inserted upstream of a luciferase gene and transfected into HepG2 cells. Reporter activity was measured in relative light units. 
Robust activity is seen for strong enhancers in the matched cell type, but not for weak/poised enhancers or for strong enhancers specific to a different cell type. Box-and-whiskers indicate 5th, 25th, 50th, 75th and 95th percentiles.

We used antibodies for histone H3 lysine 4 tri-methylation (H3K4me3), a modification associated with promoters4,5,9; H3K4me2, associated with promoters and enhancers1,3,6,9; H3K4me1, preferentially associated with enhancers1,6; lysine 9 acetylation (H3K9ac) and H3K27ac, associated with active regulatory regions9,10; H3K36me3 and H4K20me1, associated with transcribed regions3-5; H3K27me3, associated with Polycomb-repressed regions3,4; and CTCF, a sequence-specific insulator protein with diverse functions11. We validated each antibody by Western blots and peptide competitions, and sequenced input controls for each cell type. We also collected data for H3K9me3, RNAPII, and H2A.Z in a subset of cells. This resulted in 90 chromatin maps corresponding to ~2.4 billion reads covering ~100 billion bases across nine cell types, which we set out to interpret computationally.

Learning a common set of chromatin states across cell types

To summarize these datasets into nine readily interpretable annotations, one per cell type, we applied a multivariate Hidden Markov Model (HMM) that uses combinatorial patterns of chromatin marks to distinguish chromatin states8. The approach explicitly models mark combinations in a set of ‘emission’ parameters and spatial relationships between neighboring genomic segments in a set of ‘transition’ parameters (see Methods). It has the advantage of capturing regulatory elements with greater reliability, robustness and precision relative to studying individual marks8. We learned chromatin states jointly by creating a virtual concatenation of all chromosomes from all cell types. We selected 15 states which showed distinct biological enrichments and were consistently recovered (Fig. 1a,b; Supplementary Fig. 1). Even though states were learned de novo based solely on the patterns of chromatin marks and their spatial relationships, they showed distinct associations with transcriptional start sites (TSSs), transcripts, evolutionarily-conserved non-coding regions, DNase hypersensitive sites12, binding sites for the regulators, c-Myc13 and NF-κB14, and inactive genomic regions associated with the nuclear lamina15 (Fig. 1c).

We distinguished six broad classes of chromatin states, which we refer to as promoter, enhancer, insulator, transcribed, repressed, and inactive states (Fig. 1c). Within them, active, weak and poised4 promoters (states 1-3) differ in expression levels, strong and weak candidate enhancers (states 4-7) differ in expression of proximal genes, and strongly and weakly transcribed regions (states 9-11) also differ in their positional enrichments along transcripts. Similarly, Polycomb-repressed regions (state 12) differ from heterochromatic and repetitive states (states 13-15), which are also enriched for H3K9me3 (Supplementary Fig. 2-4). The states vary widely in their average segment length (~500bp for promoter and enhancer states vs. 10 kb for inactive regions), and in the portion of the genome covered (<1% for promoter and enhancer states vs. >70% for inactive state 13). For each state, coverage was relatively stable across cell types (Supplementary Fig. 5), with the exception of ES cells in which the poised promoter state is more abundant while strong enhancer and Polycomb-repressed states are depleted, consistent with the unique biology of pluripotent cells4,16.

We confirmed that promoter and enhancer states showed distinct biochemical properties (Supplementary Fig. 6). RNAPII was highly enriched at strong promoters, weakly enriched at strong enhancers, and nearly undetectable at weak/poised enhancers, consistent with strong transcription at promoters, and reports of weak transcription at active enhancers17,18. H2A.Z, a histone variant associated with nucleosome free regions19, was enriched in active promoters and strong enhancers, consistent with nucleosome displacement at TSSs and sites of abundant transcription factor (TF) binding in active enhancers. We also used luciferase reporter assays to validate the functionality of predicted enhancers, the distinction between strong and weak enhancer states, and their predicted cell type-specificity. We tested strong enhancers, weak enhancers, and strong enhancers specific to an unmatched cell type by transfection in HepG2 cells. We observed strong luciferase activity only for strong enhancer elements from the matched cell type (Fig. 1d).

These results and additional properties of the model (Supplementary Fig. 7-10) suggest that chromatin states are an inherent, biologically-informative feature of the genome. The framework enables us to reason about coordinated differences in marks by directly studying chromatin state changes between cell types (which we refer to as ‘changes’ or ‘dynamics’ without implying any temporal relationship).
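The roles of the ‘emission’ and ‘transition’ parameters can be illustrated with a small decoding sketch. It assumes that each genomic bin is summarized as a binary vector of mark presence and that marks are emitted independently given the state, which is in line with the emission table being described as per-state mark frequencies but is still an assumption here. The two states, three marks and all probabilities are invented for illustration, the model is not trained, and Viterbi decoding is our own choice rather than a step taken from the paper.

```python
import math

def log_emission(obs, probs):
    """Log-probability of a binary mark vector under independent Bernoulli emissions."""
    return sum(math.log(p if o else 1.0 - p) for o, p in zip(obs, probs))

def viterbi(observations, init, trans, emit):
    """Most likely state path for a sequence of binary mark vectors (log-space)."""
    n_states = len(init)
    score = [math.log(init[s]) + log_emission(observations[0], emit[s])
             for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            # Best predecessor of state s at this bin.
            best = max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + log_emission(obs, emit[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: 2 states x 3 marks; every number below is invented.
EMIT = [[0.90, 0.80, 0.10],   # state 0: promoter-like (marks 1 and 2 frequent)
        [0.05, 0.10, 0.70]]   # state 1: transcribed-like (mark 3 frequent)
TRANS = [[0.8, 0.2],
         [0.1, 0.9]]
INIT = [0.5, 0.5]
BINS = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (0, 0, 1), (0, 0, 0)]

path = viterbi(BINS, INIT, TRANS, EMIT)
print(path)
```

In this toy run the last bin carries no marks at all, yet it is still assigned to state 1 because the transition parameters favor staying in the current state. This is the kind of spatial smoothing that the transition parameters contribute.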

Extent and significance of chromatin state changes across cell types

We next explored the extent to which chromatin states vary between pairs of cell types. The overall patterns of variability (Supplementary Fig. 11,12) suggest that regulatory regions vary dramatically in activity levels across cell types. Enhancer states show frequent interchange between strong and weak enhancers, and promoter states vary between active, weak and poised. Promoter states appear more stable than enhancers; they are eight times more likely to remain promoter states, controlling for coverage. Switching was also observed between promoter, enhancer, and transcriptional transition states, but no preferential changes were found to other groups. These general patterns suggest that despite varying activity levels, enhancer and promoter regions tend to preserve their chromatin identity as regions of regulatory potential. Chromatin state differences between cell types relate to cell type-specific gene functions. An unbiased clustering of chromatin state profiles across annotated TSSs in lymphoblastoid and skeletal muscle cells distinguished informative patterns predictive of downstream gene expression and functional gene classes (Supplementary Fig. 13,14). Cell type-specific patterns were also evident when TSSs were simply assigned to the most prevalent chromatin state. Promoters active in skeletal muscle were associated with extracellular structure genes (8.5-fold enrichment), those active in lymphoblastoid cells with immune response genes (7.2-fold enrichment), and those active in both with metabolic housekeeping genes.

Clustering of promoter and enhancer states based on their activity patterns

Extending our pair-wise promoter analysis, we clustered strong promoter and strong enhancer regions across all cell types (see Methods). This revealed clusters showing common activity and associated with highly coherent functions (Fig. 2a,b). For promoter clusters, these include immune response (GM12878-specific clusters, p<10^-18), cholesterol transport (HepG2-specific, 10^-4), and metabolic processes (all cells, 10^-131). Remarkably, genes assigned to enhancer clusters by proximity also showed strong functional enrichments, including immune response (GM12878-specific, 10^-6), lipid metabolism (HepG2-specific, 10^-5) and angiogenesis (HUVEC-specific, 10^-4).
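The Methods Summary states that this multi-cell type clustering used the k-means algorithm. The following is a minimal sketch of that idea on made-up activity profiles, where each row is a genomic location and each column a cell type. The three-column layout, the activity values and the deterministic seeding with the first k distinct rows are all assumptions made for illustration.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded with the first k distinct points (a simplification)."""
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical activity of six regions across three cell types (values invented).
REGIONS = [
    (0.9, 0.1, 0.0),  # active mainly in cell type 1
    (0.1, 0.8, 0.0),  # active mainly in cell type 2
    (0.9, 1.0, 0.8),  # active in all three
    (1.0, 0.0, 0.1),
    (0.0, 1.0, 0.1),
    (1.0, 0.9, 1.0),
]

labels, centroids = kmeans(REGIONS, k=3)
print(labels)
```

Each resulting centroid is itself an average activity profile across cell types, which is the form in which clusters are summarized when they are later compared with gene expression.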

Figure 2

Cell type-specific promoter and enhancer states and associated functional enrichments

a, Clustering of genomic locations (rows) assigned to active promoter state 1 (red) across cell types (columns) reveals 20 common patterns of activity (A-T) (see Methods). For each cluster, enriched gene ontology (GO) terms are shown with hypergeometric P-value and fold-enrichment, based on nearest TSS. For most clusters, several cell types show strong (dark red) or moderate (light red) activity. b, Analogous clustering and functional enrichments for strong enhancer state 4 (yellow). Enhancer states show greater cell type-specificity, with most clusters active in only one cell type.

Promoters and enhancers differed in their overall specificity. The majority of promoter clusters showed activity in multiple cell types, consistent with previous work5,10 (Fig. 2a). Enhancer clusters are significantly more cell type-specific, with few regions showing activity in more than two cell types and a majority being specific to a single cell type (Fig. 2b). We also found differences in the relative contributions of enhancer-based and promoter-based regulation among gene classes. Developmental genes appear strongly regulated by both, showing the highest number of proximal enhancers and diverse promoter states, including poised and Polycomb-repressed (Supplementary Fig. 15). Tissue-specific genes (e.g. immune genes, steroid metabolism genes) appear more dependent on enhancer regulation, showing multiple tissue-specific enhancers but less diverse promoter states. Lastly, housekeeping genes are primarily promoter-regulated with few enhancers in their vicinity. Overall, this dynamic view of the chromatin landscape suggests that multi-cell chromatin profiles can be as productive for systems biology as expression analysis has traditionally been, and may hold additional information on genome regulatory programs, which we explore next.

Correlations in activity profiles link enhancers to target genes

We next investigated functional interconnections between enhancers, the factors that activate or repress them, and the genes whose expression they regulate, by defining ‘activity profiles’ for each across the cell types (Fig. 3). We complemented these enhancer activity profiles (Fig. 3a) with profiles for gene expression (Fig. 3b), sequence motif enrichment (Fig. 3d), and the expression of TFs recognizing each motif (Fig. 3e). We used correlations between these profiles to probabilistically link enhancers to their downstream targets and upstream regulators (see Methods).

Figure 3

Correlations in activity patterns link enhancers to gene targets and upstream regulators

a, Average enhancer activity across the cell types (columns) for each enhancer cluster (rows) defined in Figure 2b (labeled A-T) and number of 200bp windows in each cluster. b, Average mRNA expression of nearest gene across the cell types and correlation with enhancer activity profile from a. High correlations between enhancer activity and gene expression provide a means for linking enhancers to target genes. c, Enrichment for Oct4 binding in ES cells24 and NF-κB binding in lymphoblastoid cells14 for each cluster. d, Strongly enriched (red) or depleted (blue) motifs for each cluster, from a catalog of 323 consensus motifs. e, Predicted causal regulators for each cluster based on positive (activators) or negative (repressors) correlations between motif enrichment (top left triangles) and TF expression (bottom right triangles). For example, a red/yellow combination predicts Oct4 as a positive regulator of ES-specific enhancers, as its motif-based predicted targets are enriched (red upper triangle) for enhancers active in ES (cluster A), and the Oct4 gene is expressed specifically in ES cells, resulting in a positive TF expression correlation (yellow triangle). Overall correlations between motif and TF expression across all clusters denote predicted activators (positive correlation, orange) and repressors (negative correlation, purple).

We found that patterns of enhancer activity (Fig. 2b,3a) correlated strongly with patterns of nearest-gene expression (Fig. 3b, correlation >0.9 in 16 of 20 clusters). Since this correlation remained high even for large distances (>50kb), we used activity correlation as a complement to genomic distance for linking enhancers to target genes (see Methods). Activity-based linking yielded increased functional gene class enrichments for several clusters (Supplementary Fig. 16). We validated our approach using quantitative trait locus (QTL) mapping studies which use co-variation between SNP alleles and gene expression levels to link cis-regulatory regions to target genes. Investigation of four recent QTL studies in liver20 and lymphoblastoid cells21-23 revealed remarkable agreement with our enhancer predictions. Enhancers linked to a given target gene by our method were significantly enriched for SNPs correlated with the gene’s expression level (Supplementary Fig. 17), thus confirming our enhancer-gene linkages with orthogonal data.
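The core of activity-based linking can be sketched as a correlation between one enhancer's activity profile and the expression profiles of nearby genes across the nine cell types. This is a simplified illustration only: the method described in the paper is probabilistic and distance-dependent, whereas the sketch below ranks candidates by plain Pearson correlation within a fixed window. The gene names, the expression values and the 125 kb window are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def link_enhancer(activity, genes, max_distance=125_000):
    """Rank candidate genes within max_distance by correlation with enhancer activity."""
    scored = [(pearson(activity, g["expression"]), g["name"])
              for g in genes if abs(g["distance"]) <= max_distance]
    return sorted(scored, reverse=True)

# Hypothetical enhancer that is active in only one of nine cell types.
ENHANCER = [0, 0, 1, 0, 0, 0, 0, 0, 0]
GENES = [
    {"name": "geneA", "distance": 60_000, "expression": [1, 1, 9, 1, 2, 1, 1, 1, 1]},
    {"name": "geneB", "distance": 5_000, "expression": [5, 5, 5, 6, 5, 5, 4, 5, 5]},
    {"name": "geneC", "distance": 400_000, "expression": [1, 1, 8, 1, 1, 1, 1, 1, 1]},
]

ranked = link_enhancer(ENHANCER, GENES)
print(ranked)
```

In this toy case the more distant geneA outranks the nearest gene, geneB, because only geneA's expression tracks the enhancer's activity. This is the situation in which activity correlation adds information beyond genomic distance.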

Correlations with TF expression and motif enrichment predict upstream regulators

We next predicted sequence-specific TFs likely to target enhancers in a given cluster based on regulatory motif enrichments. This implicated a number of TFs whose known biological roles matched the respective cell types (Fig. 3d, Supplementary Fig. 18). When ChIP-seq data were available in the relevant cell type, we confirmed that enriched motifs were preferentially bound by the cognate factor (Fig. 3c). Oct4 motif instances in cluster A (ES-specific enhancers) were preferentially bound by Oct4 in ES cells24, and NF-κB motif instances in cluster F (lymphoblastoid-specific enhancers) were preferentially bound by NF-κB in lymphoblastoid cells14. In both cases, motif instances in cell type-specific enhancers showed a ~5-fold increase in binding compared to other enhancers. However, sequence-based motif enrichments do not distinguish causality. Enrichment could reflect a parallel binding event that does not affect the chromatin state, or the motif could actually be antagonistic to the enhancer state through specific repression in orthogonal cell types. To distinguish between these possibilities, we complemented the observed motif enrichments with cell type-specific expression for the corresponding TFs (Fig. 3e). We then correlated a ‘motif score’ based on motif enrichment in a given cluster, and a ‘TF-expression score’ based on the agreement between the TF expression pattern and the cluster activity profile (see Methods). A positive correlation between the two scores implies that the TF may be establishing or reinforcing the chromatin state. A negative correlation would instead imply that the TF may act as a repressor. For example, in addition to the enrichment of the Oct4 motif in the ES-specific cluster A, Oct4 is specifically expressed in ES cells, leading to its prediction as a causal regulator of ES cells (Fig. 3e), consistent with known biology16. For 18 of the 20 clusters, this analysis revealed one or more candidate regulators.
Recovery of known roles for well-studied regulators validated our approach. For example, HNF1, HNF4, and PPARγ are predicted as activators of HepG2-specific enhancers (clusters H,I), PU.1 and NF-κB as activators of lymphoblastoid (GM12878) enhancers (clusters C,F,G), Gata1 as an activator of K562-specific enhancers (cluster B) and Myf as an activator of skeletal muscle (HSMM) enhancers (cluster O)14,25-27. The analysis also revealed potentially novel regulatory interactions. ETS factors (Elk1,Tel2,Ets) are predicted activators of enhancers active in both GM12878 and HUVEC (cluster G), but not of GM12878-specific or HUVEC-specific clusters emphasizing the value of unbiased clustering. These connections are consistent with reported roles for ETS factors in lymphopoiesis and endothelium28. The prediction of p53 as an activator in HSMM, NHLF, NHEK and HMEC (clusters N,Q,R) likely reflects its maintained activity in these primary cells as opposed to other cell models where it may be suppressed by mutation (K562)29, viral inactivation (GM12878)30 or cytoplasmic localization (ES cells)31. A widespread role for p53 in regulating distal elements is consistent with its known binding to distal regions32,33. Our analysis also revealed several repressor signatures, including Gfi1 in K562 and GM12878 cells (clusters B,C) and Bach2 in ES cells (cluster A). Both regulators are known to repress transcription by recruiting histone deacetylases and methyltransferases to proximal promoters34,35, and Gfi1 has also been implicated in silencing of satellite repeats35. Our regulatory inferences suggest that they also modulate chromatin to inhibit enhancer activity, thus proposing a new mechanism for distal gene regulation.
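The activator/repressor logic described above can be sketched by correlating, for a single TF, its motif score and its TF-expression score across clusters. The scores, the TF labels and the 0.5 cut-off are hypothetical, and the way each score is actually computed is described in the paper's Methods rather than reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither score vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

def classify_regulator(motif_scores, tf_expression_scores, threshold=0.5):
    """Label a TF by the sign of the correlation between its two per-cluster scores."""
    r = pearson(motif_scores, tf_expression_scores)
    if r >= threshold:
        return "activator", r
    if r <= -threshold:
        return "repressor", r
    return "unclear", r

# Hypothetical scores over five enhancer clusters (all values invented).
# TF_A: motif enriched where the TF is expressed.
label_a, r_a = classify_regulator([2.0, 0.1, -0.3, 0.0, 0.2],
                                  [0.9, -0.1, -0.2, 0.0, 0.1])
# TF_B: motif depleted where the TF is expressed.
label_b, r_b = classify_regulator([-1.5, 0.2, 0.1, 0.0, 0.3],
                                  [0.8, -0.1, 0.0, -0.2, 0.1])
print(label_a, round(r_a, 2), label_b, round(r_b, 2))
```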

Validation of predicted binding events and regulatory outcomes

The regulatory inferences above imply TF binding events at motif instances within enhancer regions in specific cellular contexts, which we sought to validate using a general molecular signature. Binding events are associated with nucleosome displacement, a structural change evident in ChIP-seq data for histones36. We thus studied local depletions in the chromatin intensity profiles (‘dips’) as indicative of TF binding. We confirmed that dips were present in individual signal tracks at active enhancers, and were associated with preferential sequence conservation and regulatory motif instances (Fig. 4a).

Figure 4

Validation of regulatory predictions by nucleosome depletions and enhancer activity

a, Dips in chromatin intensity profiles in a K562-specific strong enhancer (orange) coincide with a predicted causal GATA motif instance (logo). The dips likely reflect nucleosome displacement associated with TF binding, supported by DNase hypersensitivity12 and GATA1 binding25. b, Superposition of H3K27ac signal across loci containing GATA motifs, centered on motif instances, shows dips in K562 cells, as predicted. c, Superposition of H3K4me2 signal for HepG2 cells shows dips over HNF4 motifs in strong enhancer states, as predicted. d, HepG2-specific strong enhancers with predicted causal HNF motifs were tested in reporter assays. Constructs with permuted HNF motifs (red) led to significantly reduced luciferase activity compared to wild type (blue), with an average 2-fold reduction. Mean luciferase relative light units over three replicates and 95% confidence intervals are indicated.

To test our specific predictions, we superimposed chromatin profiles of coordinately regulated enhancer regions, anchoring them on the implied motif instances. Striking dips precisely coincide with regulatory motifs, and are both cell type-specific and region-specific, exactly as predicted (Fig. 4b,c). As dips only appear when the factor is expressed, they also support the identity of the trans-acting TF. To validate that predicted causal motifs contribute to enhancer activity, we used luciferase reporters. Our model implicated HNF regulators as activators of HepG2-specific enhancers (Fig. 3), and context-specific dips supported binding interactions (Fig. 4c). We thus selected for functional analysis 10 sites with HNF motifs showing dips in strong HepG2-specific enhancers, and evaluated them with and without the HNF motif. We found that permutation of the motif consistently led to a reduction in enhancer activity (Fig. 4d), supporting its predicted causal role.
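The superposition of chromatin profiles anchored on motif instances can be sketched as an average signal per offset around each motif position, followed by a comparison of the central bin with its flanks. The signal values, the motif positions and the center-to-flank ratio used as a dip measure are assumptions made for illustration, not the paper's procedure.

```python
def aggregate_profile(signal, motif_positions, half_width):
    """Average signal at each offset in [-half_width, +half_width] around motif centers."""
    profile = [0.0] * (2 * half_width + 1)
    used = 0
    for pos in motif_positions:
        if pos - half_width < 0 or pos + half_width >= len(signal):
            continue  # skip motifs too close to either end of the track
        for i, offset in enumerate(range(-half_width, half_width + 1)):
            profile[i] += signal[pos + offset]
        used += 1
    return [v / used for v in profile] if used else profile

def dip_ratio(profile, core=0):
    """Mean signal in the central bin(s) divided by mean signal in the flanks."""
    mid = len(profile) // 2
    center = profile[mid - core: mid + core + 1]
    flanks = profile[:mid - core] + profile[mid + core + 1:]
    return (sum(center) / len(center)) / (sum(flanks) / len(flanks))

# Hypothetical binned histone-mark signal with two motif instances (indices 4 and 13).
SIGNAL = [5, 6, 7, 8, 2, 8, 7, 6, 5, 5, 6, 8, 9, 3, 9, 8, 6, 5]
MOTIFS = [4, 13]

profile = aggregate_profile(SIGNAL, MOTIFS, half_width=3)
ratio = dip_ratio(profile)
print(profile, round(ratio, 2))
```

A ratio well below 1 indicates a local depletion centered on the motif. Running the same calculation on a track from a cell type in which the factor is not expressed would be expected to give a ratio near 1, in line with the cell type-specificity described above.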

Assigning candidate regulatory functions to disease-associated variants

Finally, we explored whether our chromatin annotations and regulatory predictions can provide insight into sequence variants associated with disease phenotypes. To that effect, we gathered a large set of non-coding SNPs from GWAS catalogs, an exceedingly small proportion of which are currently understood37. We found that disease-associated SNPs are significantly more likely to coincide with strong enhancers (states 4,5; 2-fold enrichment, p<10^-10), despite the fact that no notable association with these states is seen for SNPs in general or for those SNPs tested in the studies. To test whether SNPs associated with a particular disease might have even more specific correspondences, we examined 426 GWAS datasets. We identified 10 studies38-47 whose variants showed significant correspondences to cell type-specific strong enhancer states (see Methods; Fig. 5a).

Figure 5

Disease variants annotated by chromatin dynamics and regulatory predictions

a, Intersection of strong enhancer states (4,5) with disease-associated SNPs from GWAS studies shows significant enrichment (blue shading) in relevant cell types (see Methods). Fold-enrichments of the SNPs in strong enhancer states for each cell type are indicated. b, For three GWAS datasets38-40, state annotations are shown for a subset of lead SNPs in the 9 cell types (colors as in Figure 1b, except state 11 is white). Strong enhancer state (orange) is most prevalent in cell types related to the phenotype. For SNPs overlapping strong enhancers, proximal genes with correlated expression are indicated, with linking score and distance. c, Example GWAS locus with blood lipid traits41 association, where the lead variant (red circle) has no functional annotation but a linked SNP (arrow) coincides with a HepG2-specific strong enhancer (orange), and may represent a causal variant. Strong enhancer annotations are shown for all cell types. d, Example GWAS loci where disease SNP affects a conserved instance of a predicted causal motif. Left: Lead SNP rs9374080 in the erythrocyte phenotype GWAS38 is <100 bp from a strong enhancer in K562 erythroleukemia cells and strengthens a motif for Gfi1b, a predicted repressor in K562 (Fig. 3d). Right: SNP rs9271055 associated with lupus39 coincides with a lymphoblastoid (GM12878)-specific strong enhancer and strengthens a motif for Ets1, a predicted activator of lymphoblastoid enhancers (Fig. 3d). This factor is further implicated by lupus-associated variants that directly affect the Ets1 locus39.

Individual variants from these studies were strongly enriched in enhancer states specifically active in relevant cell types (Fig. 5a,b). For example, SNPs associated with erythrocyte phenotypes38 were found in erythroleukemia cell (K562) enhancers, SNPs associated with systemic lupus erythematosus39 were found in lymphoblastoid cell (GM12878) enhancers, while SNPs associated with triglyceride40 phenotypes or blood lipid phenotypes41 were found in hepatocellular carcinoma cell (HepG2) enhancers. We also applied our model to chromatin data for T-cells3 (Supplementary Fig. 19), for which strong enhancer states correlated to variants associated with risk of childhood acute lymphoblastic leukemia48, further validating our approach. We also used our predicted enhancer-target gene associations to find candidate downstream genes whose expression might be affected by cis-changes occurring in the enhancer region. Although most of the predicted target genes are proximal to the enhancer, a subset of more distal predicted targets could reflect novel candidates for the disease phenotypes (Fig. 5b). In addition, we identified several instances where a lead GWAS variant does not correspond to a particular chromatin element but a linked variant coincides with an enhancer with the predicted cell type-specificity (Fig. 5c). Thus, chromatin profiles may provide a general means to triage variants within a haplotype block, a common problem faced in GWAS. Lastly, we identified several cases in which a disease-associated SNP created or disrupted a regulatory motif instance for a predicted causal TF in the relevant cell type (Fig. 5d), suggesting a specific molecular mechanism by which the disease-associated genotype could lead to the observed disease phenotype consistent with our regulatory predictions.
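The enrichment of disease SNPs in strong enhancer states can be sketched as a fold-enrichment over genome coverage, with an empirical P-value from randomizing SNP locations, as the Methods Summary indicates. The sketch below assumes uniform random placement on a single toy chromosome and non-overlapping half-open intervals. The coordinates, the SNP positions and the number of permutations are invented, and any constraints used in the actual randomization are not reproduced.

```python
import random
from bisect import bisect_right

def in_intervals(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping [start, end) intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]

def enrichment(snps, intervals, genome_length, n_perm=1000, seed=0):
    """Fold-enrichment of SNPs in intervals and a permutation-based P-value."""
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    observed = sum(in_intervals(p, starts, ends) for p in snps)
    covered = sum(e - s for s, e in intervals)
    expected = len(snps) * covered / genome_length
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Place the same number of SNPs uniformly at random and recount overlaps.
        shuffled = sum(in_intervals(rng.randrange(genome_length), starts, ends)
                       for _ in snps)
        if shuffled >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)  # add-one correction avoids P = 0
    return observed / expected, p_value

# Hypothetical 100 kb "genome" with three strong-enhancer intervals (3% coverage).
ENHANCERS = [(1_000, 2_000), (50_000, 51_000), (80_000, 81_000)]
SNPS = [1_500, 1_700, 50_500, 80_100, 80_900, 30_000, 60_000, 99_000, 1_200, 50_900]

fold, p_value = enrichment(SNPS, ENHANCERS, genome_length=100_000)
print(round(fold, 1), p_value)
```

Repeating the calculation with the enhancer intervals of each cell type in turn would give one fold-enrichment per cell type for a given study, which is the form in which the results are reported in Fig. 5a.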

Discussion

Our work provides a systematic view of many chromatin marks across many cell types, demonstrating the power of chromatin profiling as an additional and dynamic layer of genome annotation. We presented methods to distinguish different classes of functional elements, elucidate their cell type-specificities, and reveal cis-regulatory interactions that govern them and ultimately drive target gene expression. By intersecting our predictions with non-coding SNPs from GWAS datasets, we propose potential mechanistic explanations for disease variants, either through their presence within cell type-specific enhancer states, or by their effect on binding motifs for predicted regulators. Chromatin states dramatically reduced the large combinatorial space of 90 chromatin datasets (2^90 combinations) into a manageable set of biologically-interpretable annotations, thus providing an efficient and robust way to track coordinated changes across cell types. This enabled the systematic identification and comparison of >100 thousand promoter and enhancer elements. Both types of elements are cell type-specific, associated with motif enrichments, and assume strong, weak and poised states that correlate with neighboring gene expression and function. Enhancers showed exquisite tissue-specificity, enrichment in the vicinity of developmental and cell type-specific genes, and predictive power for proximal gene expression, reinforcing their roles as sentinels of tissue-specific gene expression49. By elucidating enhancers systematically, and linking them to upstream regulators and downstream genes, our analysis can help provide a missing link between regulators and target genes. The power of the approach should increase considerably as additional phenotypically-distinct cell types are surveyed, and enable a greater proportion of enhancer elements to be incorporated into the connectivity network. 
The inferred cis-regulatory interactions make specific testable predictions, many of which were confirmed through additional experiments and analyses. Our enhancer-target gene linkages are supported by cis-regulatory inferences from QTL mapping studies. Predicted TF-motif interactions within cell type-specific enhancers were confirmed in specific cases by TF binding and more generally by depletions in the chromatin profiles at causal motifs in appropriate cellular contexts. Motifs predicted as causal regulators of cell type-specific enhancers were also confirmed in enhancer assays. The regulatory inferences afforded by multi-cell chromatin profiles are unique and highly complementary to datasets for TF binding, expression, chromatin accessibility, nucleosome positioning, and chromosome conformation50. For example, our regulatory predictions can help narrow the spectrum of TF binding events to a smaller number of functional interactions. The chromatin-centric approach also complements the extensive body of work on biological network inference from expression data, with the potential to introduce enhancers and other genomic elements into connectivity networks.
Our study has important implications for the understanding of disease. Our detailed and dynamic functional annotations of the relatively uncharted non-coding genome can facilitate the interpretation of GWAS datasets by predicting specific cell types and regulators related to specific diseases and phenotypes. Furthermore, the connections we derive between enhancer regions and their upstream regulators and downstream genes suggest cis- and trans-acting interactions that may be modulated by the sequence variants. While the current study represents only a first small step in this direction, we expect that future iterations with a greater diversity of cell types and improved methodologies will help define the molecular underpinnings of human disease.

Methods Summary

ChIP-seq analysis was performed in biological replicate as described4 using antibodies validated by Western blots and peptide competitions. ChIP DNA and input controls were sequenced using the Illumina Genome Analyzer. Expression profiles were acquired using Affymetrix GeneChip arrays. Chromatin states were learned jointly by applying a hidden Markov model (HMM)8 to the 10 data tracks (9 chromatin marks plus input control) for each of the 9 cell types. We focused on a 15-state model that provides sufficient resolution to resolve biologically meaningful patterns yet is reproducible across cell types when independently processed. We used this model to produce 9 genome-wide chromatin state annotations, which were validated by additional ChIP experiments and reporter assays. Multi-cell type clustering was conducted using the k-means algorithm on locations assigned to strong promoter state 1 (or strong enhancer state 4) in at least one cell type. Enhancer-target gene linkages were predicted by correlating normalized signal intensities of H3K27ac, H3K4me1 and H3K4me2 with gene expression across cell types as a function of distance to the TSS. Upstream regulators were predicted using a set of known TF motifs assembled from multiple sources. Motif instances were identified by sequence match and evolutionary conservation. P-values for the GWAS analyses were based on randomizing the location of SNPs, and the FDR was based on randomizing the assignment of SNPs across studies. Datasets are available from the ENCODE website (http://genome.ucsc.edu/ENCODE), the supporting website for this paper (http://compbio.mit.edu/ENCODE_chromatin_states), and the Gene Expression Omnibus (GSE26386).
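The enhancer-target gene linking step can be pictured as a correlation across cell types between an enhancer's chromatin signal and a nearby gene's expression. The following is a minimal sketch under stated assumptions, not the published procedure: it uses one pre-combined signal value per cell type rather than separate H3K27ac, H3K4me1 and H3K4me2 tracks, it replaces the distance-dependent treatment with a hard distance cutoff, and the function names, the 100 kb window and the 0.8 correlation threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def link_enhancers(enhancers, genes, max_distance=100_000, min_r=0.8):
    """Link each enhancer to nearby genes whose expression tracks its signal.

    enhancers: name -> (position, signal per cell type)
    genes:     name -> (TSS position, expression per cell type)
    The window and threshold are illustrative assumptions only.
    """
    links = []
    for e_name, (e_pos, e_signal) in enhancers.items():
        for g_name, (tss, expression) in genes.items():
            if abs(e_pos - tss) > max_distance:
                continue  # too far from the TSS to consider
            r = pearson(e_signal, expression)
            if r >= min_r:
                links.append((e_name, g_name, r))
    return links


# Toy data (hypothetical): nine values per element, one per cell type.
enhancers = {
    "enhA": (50_000, [9, 1, 1, 1, 1, 1, 1, 1, 1]),
    "enhB": (400_000, [1, 1, 8, 1, 1, 1, 1, 1, 1]),
}
genes = {
    "gene1": (60_000, [7, 0, 1, 0, 1, 0, 0, 1, 0]),
    "gene2": (70_000, [0, 5, 0, 6, 0, 5, 6, 0, 5]),
    "gene3": (900_000, [1, 1, 8, 1, 1, 1, 1, 1, 1]),
}
links = link_enhancers(enhancers, genes)
for e_name, g_name, r in links:
    print(f"{e_name} -> {g_name} (r = {r:.3f})")
```

In this toy setting only enhA and gene1 are linked: gene2 is close but anticorrelated, and gene3 matches enhB's profile exactly but lies outside the window, which shows how activity pattern and distance both constrain the prediction.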

References (65 in total)

1. JASPAR: an open-access database for eukaryotic transcription factor binding profiles.

Authors: Albin Sandelin; Wynand Alkema; Pär Engström; Wyeth W Wasserman; Boris Lenhard
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

2. Feeder-independent culture of human embryonic stem cells.

Authors: Tenneille E Ludwig; Veit Bergendahl; Mark E Levenstein; Junying Yu; Mitchell D Probasco; James A Thomson
Journal: Nat Methods Date: 2006-08 Impact factor: 28.547

3. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities.

Authors: Michael F Berger; Anthony A Philippakis; Aaron M Qureshi; Fangxue S He; Preston W Estep; Martha L Bulyk
Journal: Nat Biotechnol Date: 2006-09-24 Impact factor: 54.908

4. Direct multiplexed measurement of gene expression with color-coded probe pairs.

Authors: Gary K Geiss; Roger E Bumgarner; Brian Birditt; Timothy Dahl; Naeem Dowidar; Dwayne L Dunaway; H Perry Fell; Sean Ferree; Renee D George; Tammy Grogan; Jeffrey J James; Malini Maysuria; Jeffrey D Mitton; Paola Oliveri; Jennifer L Osborn; Tao Peng; Amber L Ratcliffe; Philippa J Webster; Eric H Davidson; Leroy Hood; Krassen Dimitrov
Journal: Nat Biotechnol Date: 2008-02-17 Impact factor: 54.908

5. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors: Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal: Proc Natl Acad Sci U S A Date: 2009-05-27 Impact factor: 11.205

6. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome.

Authors: Nathaniel D Heintzman; Rhona K Stuart; Gary Hon; Yutao Fu; Christina W Ching; R David Hawkins; Leah O Barrera; Sara Van Calcar; Chunxu Qu; Keith A Ching; Wei Wang; Zhiping Weng; Roland D Green; Gregory E Crawford; Bing Ren
Journal: Nat Genet Date: 2007-02-04 Impact factor: 38.330

7. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors: Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; 
Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather 
Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal: Nature Date: 2007-06-14 Impact factor: 49.962

8. Discovery and characterization of chromatin states for systematic annotation of the human genome.

Authors: Jason Ernst; Manolis Kellis
Journal: Nat Biotechnol Date: 2010-07-25 Impact factor: 54.908

Review 9. Transcriptional regulatory circuits: predicting numbers from alphabets.

Authors: Harold D Kim; Tal Shay; Erin K O'Shea; Aviv Regev
Journal: Science Date: 2009-07-24 Impact factor: 47.728

10. Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans.

Authors: Sekar Kathiresan; Olle Melander; Candace Guiducci; Aarti Surti; Noël P Burtt; Mark J Rieder; Gregory M Cooper; Charlotta Roos; Benjamin F Voight; Aki S Havulinna; Björn Wahlstrand; Thomas Hedner; Dolores Corella; E Shyong Tai; Jose M Ordovas; Göran Berglund; Erkki Vartiainen; Pekka Jousilahti; Bo Hedblad; Marja-Riitta Taskinen; Christopher Newton-Cheh; Veikko Salomaa; Leena Peltonen; Leif Groop; David M Altshuler; Marju Orho-Melander
Journal: Nat Genet Date: 2008-01-13 Impact factor: 38.330
