Eugenio Marco¹, Wouter Meuleman², Jialiang Huang¹, Kimberly Glass³, Luca Pinello¹, Jianrong Wang², Manolis Kellis², Guo-Cheng Yuan¹
Abstract
Chromatin-state analysis is widely applied in studies of development and disease. However, existing methods operate at a single length scale and therefore cannot distinguish large domains from isolated elements of the same type. To overcome this limitation, we present a hierarchical hidden Markov model, diHMM, to systematically annotate chromatin states at multiple length scales. We apply diHMM to analyse a public ChIP-seq data set. diHMM not only accurately captures nucleosome-level information but also identifies domain-level states that vary in nucleosome-level state composition, spatial distribution and functionality. The domain-level states recapitulate known patterns such as super-enhancers, bivalent promoters and Polycomb-repressed regions, and identify additional patterns whose biological functions are not yet characterized. By integrating chromatin-state information with gene expression and Hi-C data, we identify context-dependent functions of nucleosome-level states. Thus, diHMM provides a powerful tool for investigating the role of higher-order chromatin structure in gene regulation.
Year: 2017; PMID: 28387224; PMCID: PMC5385569; DOI: 10.1038/ncomms15011
Source DB: PubMed; Journal: Nat Commun; ISSN: 2041-1723; Impact factor: 14.919
Figure 1. A schematic overview of diHMM.
(a) The underlying graphical model of diHMM, with two levels of hidden states corresponding to the domain level (rectangles) and the nucleosome level (squares). Multidimensional input ChIP-seq data are represented by circles. Arrows indicate the conditional dependence structure of diHMM. Nucleosome-level state transitions depend on the domain-level state at the end position of the transition but not at its initial position. The emission probability is conditionally independent of the domain-level state given the nucleosome-level state (see Methods and Supplementary Fig. 1 for additional details). (b) Genome tracks displaying diHMM domain- and nucleosome-level state calls in H1 cells, together with nine histone marks, across the HOXB cluster on chromosome 17. The grey box, covering a region of ∼8 kb, is expanded in c. In the domain-level track, black bars indicate transitions between different domains.
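The dependence structure in panel (a) can be illustrated by decoding on the joint (domain, nucleosome) state space, where each transition factors as P(d_t | d_{t-1}) · P(n_t | n_{t-1}, d_t) and emissions depend on the nucleosome state only. The Python sketch below is a minimal illustration, not the authors' inference procedure, and rests on assumptions not stated in this entry: marks are binarized and emitted as independent Bernoulli draws, both levels advance one step per genomic bin, and the function name `joint_viterbi` and its parameter layouts are hypothetical.

```python
import math
from itertools import product


def _log(p):
    # log(0) -> -inf, so zero-probability moves are never selected
    return math.log(p) if p > 0 else float("-inf")


def joint_viterbi(obs, dom_init, dom_trans, nuc_init, nuc_trans, emit):
    """Most likely joint (domain, nucleosome) state path for binarized marks.

    obs       : list of 0/1 tuples, one per genomic bin, one entry per mark
    dom_init  : dom_init[d]        = P(d_1 = d)
    dom_trans : dom_trans[a][d]    = P(d_t = d | d_{t-1} = a)
    nuc_init  : nuc_init[d][n]     = P(n_1 = n | d_1 = d)
    nuc_trans : nuc_trans[d][m][n] = P(n_t = n | n_{t-1} = m, d_t = d)
    emit      : emit[n][k]         = P(mark k present | n_t = n)
    """
    states = list(product(range(len(dom_init)), range(len(emit))))

    def log_emit(n, x):
        # Marks are independent Bernoulli draws given the nucleosome state
        # alone; the domain state does not enter the emission.
        return sum(_log(emit[n][k] if v else 1.0 - emit[n][k])
                   for k, v in enumerate(x))

    score = {(d, n): _log(dom_init[d]) + _log(nuc_init[d][n]) + log_emit(n, obs[0])
             for d, n in states}
    pointers = []
    for x in obs[1:]:
        new_score, ptr = {}, {}
        for d, n in states:
            # Move (a, m) -> (d, n): the nucleosome step is conditioned on the
            # domain state d at the end position, not on the previous domain a.
            def step(prev):
                a, m = prev
                return score[prev] + _log(dom_trans[a][d]) + _log(nuc_trans[d][m][n])
            best = max(states, key=step)
            new_score[(d, n)] = step(best) + log_emit(n, x)
            ptr[(d, n)] = best
        score = new_score
        pointers.append(ptr)

    # Trace back from the best joint state at the last bin.
    path = [max(states, key=score.get)]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return [d for d, _ in path], [n for _, n in path]
```

Flattening into D × N joint states keeps the code short but costs O(T·(DN)²) time, which grows quickly for a 30 × 30 state model applied genome-wide.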
Figure 2. Annotation of the chromatin states identified by diHMM.
(a) Emission probability matrix for our diHMM model, which contains 30 domain-level and 30 nucleosome-level states. The scale varies linearly between 0 (white) and 1 (dark purple). The colour legend on the left shows our nucleosome-level state annotations. (b) Genomic annotation enrichment for our 30 nucleosome-level states in all cell types combined. Each column shows relative enrichment on a linear scale between 0 (white) and 1 (dark orange). (c) Fraction of genomic coverage in each cell type for each nucleosome-level state. The scale varies logarithmically between 10⁻⁴ (white) and 1 (dark blue). (d) Significant fold enrichments for combinations of nucleosome- and domain-level states. Only combinations with a false discovery rate (FDR) < 0.01 (Fisher's exact test) are displayed above background level. The scale varies logarithmically between 1 (white) and 50 (dark green). The colour legend on the left shows our domain-level annotations. (e) Fraction of genomic coverage in each cell type for each domain-level state. The scale varies logarithmically between 10⁻⁴ (white) and 1 (dark blue).
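Panels (c) to (e) rest on simple counts over state calls. The Python sketch below assumes one nucleosome-level and one domain-level call per genomic bin, scores each (nucleosome, domain) combination with a one-sided Fisher's exact test on a 2×2 table of bin counts, and controls the FDR with the Benjamini-Hochberg procedure. The one-sided test, the Benjamini-Hochberg correction and all function names are assumptions for illustration; the caption specifies only Fisher's exact test and an FDR threshold of 0.01.

```python
from collections import Counter
from math import comb


def coverage_fractions(calls):
    """Fraction of bins assigned to each state (panels c and e, one cell type)."""
    counts = Counter(calls)
    return {state: k / len(calls) for state, k in counts.items()}


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value, P(X >= a), for the 2x2 table
    [[a, b], [c, d]] with fixed margins (hypergeometric null)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def significant_combinations(nuc_calls, dom_calls, fdr=0.01):
    """Fold enrichment of each (nucleosome, domain) pair kept at FDR < fdr."""
    n = len(nuc_calls)
    pairs = Counter(zip(nuc_calls, dom_calls))
    nuc_n, dom_n = Counter(nuc_calls), Counter(dom_calls)
    keys, folds, pvals = [], [], []
    for s in sorted(nuc_n):
        for t in sorted(dom_n):
            a = pairs[(s, t)]              # bins in state s inside domain t
            b = nuc_n[s] - a               # state s outside domain t
            c = dom_n[t] - a               # domain t, other nucleosome states
            d = n - a - b - c              # everything else
            keys.append((s, t))
            folds.append(a * n / (nuc_n[s] * dom_n[t]))   # observed / expected
            pvals.append(fisher_greater(a, b, c, d))
    qvals = benjamini_hochberg(pvals)
    return {k: f for k, f, q in zip(keys, folds, qvals) if q < fdr and f > 1}
```

On genome-scale counts the exact hypergeometric sum becomes slow; `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided p-value.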
Figure 3. Context-specific functionality of diHMM nucleosome-level states.
(a,b) Heatmaps of average gene expression (z-score for each gene and cell line, obtained from a panel of 17 cell lines studied by ENCODE2) for genes mapped to enhancers in different domain contexts. In each row, genes are selected by proximity (±2 kb from the TSS) to nucleosome-level enhancers (states N9–N13) either in super-enhancer domains (D10–D13) or in the remaining domains, as indicated by the small cartoon in each heatmap. Each column shows the average expression of each gene set in a different cell line. Numbers indicate the fraction of enhancers distributed among the different domains. (c–e) Heatmaps of average gene expression for genes mapped to bivalent promoter state N7 in different domain contexts, as indicated.
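The gene-selection and averaging steps behind panels (a,b) can be sketched as follows. This Python sketch assumes that expression is z-scored per gene across the cell-line panel (population standard deviation), that a gene is selected when its ±2 kb TSS window overlaps any bin carrying one of the requested nucleosome-level states inside one of the requested domain-level states, and that state calls are supplied as (chrom, start, end, nucleosome state, domain state) tuples. These choices and all function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_per_gene(expr):
    """expr maps gene -> expression values across cell lines; each gene is
    standardized across the panel (a gene with zero variance gets all zeros)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def genes_near_state(tss, bins, nuc_states, dom_states, window=2000):
    """Genes whose TSS lies within +/- window bp of a bin called in one of
    nuc_states while sitting inside one of dom_states.

    tss  : gene -> (chrom, position)
    bins : list of (chrom, start, end, nucleosome_state, domain_state),
           with half-open [start, end) coordinates
    """
    selected = set()
    for gene, (chrom, pos) in tss.items():
        for c, start, end, nuc, dom in bins:
            overlaps = start <= pos + window and end > pos - window
            if c == chrom and overlaps and nuc in nuc_states and dom in dom_states:
                selected.add(gene)
                break
    return selected


def average_profile(z, genes):
    """Column-wise mean z-score of a gene set: one value per cell line."""
    return [mean(col) for col in zip(*(z[g] for g in sorted(genes)))]
```

Calling `genes_near_state` with the complementary domain set gives the row for the remaining domains; a genome-wide version would replace the nested loop with an interval index.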