Michele Di Pierro, Ryan R Cheng, Erez Lieberman Aiden, Peter G Wolynes, José N Onuchic.
Abstract
Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which that locus resides, as measured by DNA-DNA proximity ligation (Hi-C). Next, the chromatin types inferred by this neural network are used as input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the resulting 3D conformational ensembles for the even-numbered chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis that phase separation is the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible.
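The first step of the pipeline, mapping the ChIP-Seq signals at each locus to a chromatin type while training on odd-numbered chromosomes and predicting even-numbered ones, can be sketched as follows. This is a minimal illustration under stated assumptions, not MEGABASE itself: the authors use a neural network, whereas this sketch swaps in a multinomial logistic-regression (maximum-entropy) classifier, and the three "tracks", the two toy types, and the synthetic data are all hypothetical.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of class scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(W, b, x):
    """Linear score for each class c: W[c] . x + b[c]."""
    return [sum(w * v for w, v in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]


def train_maxent(X, y, n_classes, lr=0.1, epochs=200):
    """Batch gradient descent on the cross-entropy of a softmax classifier.

    Hypothetical stand-in for MEGABASE's neural network."""
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_feat for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = softmax(scores(W, b, x))
            for c in range(n_classes):
                err = p[c] - (1.0 if c == label else 0.0)
                gb[c] += err
                for f in range(n_feat):
                    gW[c][f] += err * x[f]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / len(X)
            for f in range(n_feat):
                W[c][f] -= lr * gW[c][f] / len(X)
    return W, b


def predict(W, b, x):
    """Index of the most probable chromatin type for one locus."""
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


def make_synthetic_loci(seed=0, n_chroms=6, per_chrom=40):
    """Hypothetical data: (chromosome, [active, repressive, noise] signals, type)."""
    rng = random.Random(seed)
    loci = []
    for chrom in range(1, n_chroms + 1):
        for _ in range(per_chrom):
            label = rng.randrange(2)  # 0 = toy type "A", 1 = toy type "B"
            active = rng.gauss(2.0 if label == 0 else 0.0, 0.5)
            repressive = rng.gauss(0.0 if label == 0 else 2.0, 0.5)
            loci.append((chrom, [active, repressive, rng.gauss(0.0, 1.0)], label))
    return loci


loci = make_synthetic_loci()
train_set = [(x, t) for chrom, x, t in loci if chrom % 2 == 1]  # odd-numbered chromosomes
test_set = [(x, t) for chrom, x, t in loci if chrom % 2 == 0]   # even-numbered chromosomes
W, b = train_maxent([x for x, _ in train_set], [t for _, t in train_set], n_classes=2)
accuracy = sum(predict(W, b, x) == t for x, t in test_set) / len(test_set)
print(f"held-out accuracy on even chromosomes: {accuracy:.2f}")
```

The odd/even split keeps every evaluated locus on a chromosome the model never saw during training, which is what makes the even-chromosome predictions a genuine held-out test.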
Keywords: Hi-C; energy landscape theory; epigenetics; genomic architecture; machine learning
Year: 2017 PMID: 29087948 PMCID: PMC5699090 DOI: 10.1073/pnas.1714980114
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Schematic illustration of the MEGABASE + MiChroM computational pipeline. (1) ChIP-Seq data constitute the only input to our pipeline. ChIP-Seq tracks obtained from a publicly available resource (ENCODE) are converted into a sequence of chromatin structural types using a neural network dubbed MEGABASE. The neural network encodes the relationship between compartmentalization and the biochemical state of each locus along the genome. (2) Sequences of chromatin structural types are used as input to a physical model for chromatin folding (MiChroM) to obtain the ensembles of 3D structures of specific chromosomes (17). MiChroM is an effective energy landscape model consisting of a generic polymer with chromatin-type interactions and a translationally invariant local ordering term (Ideal Chromosome). (3) Ensembles of 3D structures are validated by comparing the predicted contact maps with those determined experimentally by Hi-C.
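Step (2) describes MiChroM as a generic polymer plus chromatin-type interactions plus a translationally invariant Ideal Chromosome term. A toy energy with that three-part structure might look like the sketch below. The tanh contact function, the defaults mu = 3.22 and rc = 1.78, the harmonic bonds, and the interaction tables are assumptions for illustration only; the angle and excluded-volume terms of a realistic polymer model are omitted.

```python
import math


def contact_prob(r, mu=3.22, rc=1.78):
    """Smooth contact indicator f(r) = 0.5 * (1 + tanh(mu * (rc - r))).

    The tanh form and the default mu, rc are assumptions for this sketch."""
    return 0.5 * (1.0 + math.tanh(mu * (rc - r)))


def toy_energy(coords, types, alpha, gamma, k_bond=30.0, r0=1.0):
    """Toy three-part effective energy (hypothetical parameters throughout).

    bonds : sum_i (k_bond / 2) * (r_{i,i+1} - r0)^2   generic polymer (bonds only)
    types : sum_{i<j} alpha[t_i][t_j] * f(r_ij)       chromatin-type interactions
    ideal : sum_{j-i>=3} gamma(j - i) * f(r_ij)       depends only on genomic separation
    """
    n = len(coords)
    e_bond = sum(0.5 * k_bond * (math.dist(coords[i], coords[i + 1]) - r0) ** 2
                 for i in range(n - 1))
    e_type = 0.0
    e_ideal = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            f = contact_prob(math.dist(coords[i], coords[j]))
            e_type += alpha[types[i]][types[j]] * f
            if j - i >= 3:
                e_ideal += gamma(j - i) * f  # same gamma for every pair at separation j - i
    return e_bond + e_type + e_ideal


# Hypothetical usage: a straight 4-bead chain with unit bonds and alternating types,
# where only A-A pairs attract (a negative alpha lowers the energy when in contact).
alpha = {"A": {"A": -1.0, "B": 0.0}, "B": {"A": 0.0, "B": 0.0}}
chain = [(float(i), 0.0, 0.0) for i in range(4)]
energy = toy_energy(chain, ["A", "B", "A", "B"], alpha, gamma=lambda d: 0.0)
print(f"toy energy: {energy:.3f}")
```

"Translationally invariant" is what the gamma term encodes: its weight depends only on the genomic separation j - i, not on where along the chromosome the pair sits, whereas the alpha term depends on the types assigned to each locus.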
Fig. 2. Predicting the 1D chromatin sequences, 3D conformations, and 2D contact probabilities of human chromosomes from epigenetic marking patterns. We apply MEGABASE + MiChroM to obtain an ensemble of 3D structures for all of the autosomes of cell line GM12878. For illustrative purposes, predictions for chromosome 2 (Left) and chromosome 10 (Right) are shown. (A) Ninety-five ChIP-Seq tracks are downloaded from the ENCODE database and used as input for MEGABASE to predict 1D sequences of chromatin types (shown in B). The 3D structure of each chromosome is encoded in its specific 1D sequence of chromatin structural types. (C) A typical 3D conformation obtained by MiChroM is shown for chromosomes 2 and 10. (D) Approximately 50,000 structures are collected from simulation to generate high-quality contact maps. These contact maps are compared with the Hi-C maps shown in E. The simulations correctly predict the long-range contact probability patterns that are observed in Hi-C maps, as seen in the magnified regions.
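Panel D turns an ensemble of simulated structures into a contact map. One simple way to do this, assuming a hard distance cutoff (the hypothetical parameter r_cut) defines a contact, is to record the fraction of conformations in which each pair of loci is in contact. The Pearson correlation of the off-diagonal entries is included as one possible way to compare a simulated map with a Hi-C map; it is an assumption here, not necessarily the authors' metric.

```python
import math


def contact_map(ensemble, r_cut=1.5):
    """P[i][j] = fraction of conformations in which loci i and j are within r_cut.

    r_cut is a hypothetical hard cutoff in simulation length units."""
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for conf in ensemble:
        for i in range(n):
            for j in range(i, n):
                if math.dist(conf[i], conf[j]) <= r_cut:
                    counts[i][j] += 1
                    if i != j:
                        counts[j][i] += 1
    m = len(ensemble)
    return [[c / m for c in row] for row in counts]


def upper_triangle(P, offset=1):
    """Entries above the main diagonal, skipping trivial self-contacts."""
    n = len(P)
    return [P[i][j] for i in range(n) for j in range(i + offset, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical two-structure ensemble of a 3-locus chain (simulation units).
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
P_sim = contact_map(ensemble)
# With a Hi-C map P_hic binned to the same loci, the comparison would be
# pearson(upper_triangle(P_sim), upper_triangle(P_hic)).
```

Averaging over many structures (about 50,000 in the caption) is what turns this per-conformation yes/no count into a smooth contact probability.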
Fig. 3. Simulated conformational ensembles predict the distances measured by 3D FISH experiments. Simulations and 3D FISH experiments support the idea that the compartmentalization observed in Hi-C maps emerges from the phase separation of chromatin structural types. (A and B) Cartesian distances between four loci (L1, L2, L3, and L4) in chromosome 14 (cell line GM06990) were measured in two distinct 3D FISH experiments reported by Lieberman-Aiden et al. (11). The same distances were measured in the ensembles generated by the MEGABASE + MiChroM pipeline. The positions of the fluorescent probes are illustrated in representative 3D configurations from simulations, as well as along the chromosome. As the MEGABASE annotations shown in the figure illustrate, the four loci belong to alternating chromatin types: L1 and L3 are type A chromatin, and L2 and L4 are type B chromatin. (C and D) Cumulative distribution functions (CDF) show that loci of the same chromatin type tend to be closer in space than loci of different types, despite the interlaced order and despite lying at greater genomic distances. This phenomenon is observed in FISH experiments, and it is correctly predicted by our ChIP-Seq–based modeling. The comparison between the predicted and measured probability distributions shows excellent agreement for both the average distance and the distance fluctuations (more examples of validation with FISH data are provided in ). The average ratio between simulated distances and FISH-measured distances was used to calibrate the length scale of the simulation: one unit of length in simulation corresponds to 0.17 μm, which implies that a simulated chromosomal territory is ∼2–3 μm across, consistent with previous reports by Cremer and Cremer (2).
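Panels C and D compare cumulative distribution functions, and the caption calibrates the simulation length scale from the average ratio of FISH to simulated distances. Both steps are sketched below. The probe-pair labels and distance values are hypothetical, chosen so the calibration lands on the 0.17 μm per unit quoted in the caption, and averaging per-pair ratios of mean distances is an assumed reading of "average ratio".

```python
import bisect
import statistics


def ecdf(samples):
    """Empirical CDF: F(x) = fraction of samples <= x."""
    xs = sorted(samples)
    return lambda x: bisect.bisect_right(xs, x) / len(xs)


def length_scale(sim_by_pair, fish_by_pair):
    """Micrometres per simulation unit: mean over probe pairs of
    (mean FISH distance) / (mean simulated distance). Assumed reading of the caption."""
    return statistics.mean(
        statistics.mean(fish_by_pair[p]) / statistics.mean(sim_by_pair[p])
        for p in sim_by_pair
    )


sim = {"L1-L3": [4.0, 6.0], "L1-L2": [8.0, 10.0]}     # hypothetical, simulation units
fish = {"L1-L3": [0.80, 0.90], "L1-L2": [1.50, 1.56]}  # hypothetical, micrometres
scale = length_scale(sim, fish)
F_same = ecdf([d * scale for d in sim["L1-L3"]])  # same-type pair (A-A)
F_diff = ecdf([d * scale for d in sim["L1-L2"]])  # different-type pair (A-B)
print(f"{scale:.2f} um per unit; F_same(1.0) = {F_same(1.0)}, F_diff(1.0) = {F_diff(1.0)}")
```

A same-type CDF that rises earlier than the different-type CDF is the signature described in panels C and D. At 0.17 μm per unit, a territory ∼2–3 μm across corresponds to roughly 12 to 18 simulation units (2/0.17 ≈ 11.8, 3/0.17 ≈ 17.6).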
Fig. 4. Process of microphase separation explains compartmentalization in chromosomes. The MEGABASE + MiChroM model hypothesizes that chromatin characterized by homogeneous epigenetic markings undergoes a process similar to phase separation under the action of the proteome present in the nucleus. In simulations, we observe that segments of chromatin belonging to the same structural type tend to segregate, forming liquid droplets that rearrange dynamically by splitting and fusing. This simple process of phase separation is sufficient to explain the emergence of compartmentalization in genomes, as observed in DNA-DNA ligation assays and microscopy experiments.
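The segregation of same-type chromatin can be quantified directly from a contact map. The sketch below uses an observed/expected normalization (a common Hi-C convention, assumed here rather than taken from the caption) to factor out the decay of contacts with genomic distance, then compares same-type with different-type pairs. The function name and the 4-locus map are hypothetical.

```python
import statistics


def compartment_enrichment(P, types):
    """Mean observed/expected contact for same-type pairs divided by that for
    different-type pairs. Expected(d) is the mean of the d-th diagonal of P, which
    removes the overall decay of contact probability with genomic separation d."""
    n = len(types)
    same, diff = [], []
    for d in range(1, n):
        diagonal = [P[i][i + d] for i in range(n - d)]
        expected = sum(diagonal) / len(diagonal)
        if expected == 0:
            continue  # no contacts at this separation; O/E is undefined
        for i in range(n - d):
            ratio = P[i][i + d] / expected
            (same if types[i] == types[i + d] else diff).append(ratio)
    return statistics.mean(same) / statistics.mean(diff)


# Hypothetical symmetric 4-locus contact map: two A loci followed by two B loci.
P = [
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
]
enrichment = compartment_enrichment(P, ["A", "A", "B", "B"])
print(f"same-type / different-type O/E: {enrichment:.3f}")
```

A ratio above 1 means same-type loci contact each other more often than expected at their genomic separation, which is how phase-separated compartments would show up in a Hi-C or simulated map.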