Soya Shinkai1, Shuichi Onami1, Ryuichiro Nakato2. 1. Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, Kobe 650-0047, Japan. 2. Laboratory of Computational Genomics, Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 113-0032, Japan.
Abstract
The three-dimensional (3D) genome organization and its role in biological activities have been investigated for over a decade in the field of cell biology. Recent studies using live-imaging and polymer simulation have suggested that the higher-order chromatin structures are dynamic; the stochastic fluctuations of nucleosomes and genomic loci cannot be captured by bulk-based chromosome conformation capture techniques (Hi-C). In this review, we focus on the physical nature of the 3D genome architecture. We first describe how to decode bulk Hi-C data with polymer modeling. We then introduce our recently developed PHi-C method, a computational tool for modeling the fluctuations of the 3D genome organization in the presence of stochastic thermal noise. We also present another new method that analyzes the dynamic rheology property (represented as microrheology spectra) as a measure of the flexibility and rigidity of genomic regions over time. By applying these methods to real Hi-C data, we highlighted a temporal hierarchy embedded in the 3D genome organization; chromatin interaction boundaries are more rigid than the boundary interior, while functional domains emerge as dynamic fluctuations within a particular time interval. Our methods may bridge the gap between live-cell imaging and Hi-C data and elucidate the nature of the dynamic 3D genome organization.
The three-dimensional (3D) genome organization and its role in biological activities have been investigated for over a decade in the field of cell biology. Recent studies using live-imaging and polymer simulation have suggested that the higher-order chromatin structures are dynamic; the stochastic fluctuations of nucleosomes and genomic loci cannot be captured by bulk-based chromosome conformation capture techniques (Hi-C). In this review, we focus on the physical nature of the 3D genome architecture. We first describe how to decode bulk Hi-C data with polymer modeling. We then introduce our recently developed PHi-C method, a computational tool for modeling the fluctuations of the 3D genome organization in the presence of stochastic thermal noise. We also present another new method that analyzes the dynamic rheology property (represented as microrheology spectra) as a measure of the flexibility and rigidity of genomic regions over time. By applying these methods to real Hi-C data, we highlighted a temporal hierarchy embedded in the 3D genome organization; chromatin interaction boundaries are more rigid than the boundary interior, while functional domains emerge as dynamic fluctuations within a particular time interval. Our methods may bridge the gap between live-cell imaging and Hi-C data and elucidate the nature of the dynamic 3D genome organization.
The three-dimensional (3D) genome architecture plays a key role in diverse, intricately cooperating biological activities [1], [2], [3]. Mounting evidence suggests that the chromatin structure and concomitant intra- and inter-chromosome interactions are related to the proper regulation of transcription, replication, DNA damage repair, and cell differentiation [4], [5], [6]. The disruption of chromatin structures causes cancer and other diseases [7], [8], [9].In the last decade, a plethora of studies have emerged that provide a mechanistic understanding of the role of chromatin interactions in genomic functions. This progress has been supported by the advancement of technology, including super-resolution microscopy, genome-wide chromosome conformation capture techniques (Hi-C), and in silico polymer simulation. The results of these studies suggested that chromosomes can be classified as belonging to either compartment A (regions with active transcription and open structures) or compartment B (largely inactive regions) [10]. Within these compartments, chromatin is further categorized into megabase-scale topologically associating domains (TADs) [11]. Recent experiments based on polymer simulation and in vitro systems support the idea that the higher-order chromatin structures such as TADs and enhancer-promoter loops are dynamic and can be partly explained by the “loop extrusion model,” an active ATP-dependent process regulated by multiple factors such as cohesin, condensin, and CTCF [12], [13], [14].In this review, we focus on the dynamic nature of chromosome structures, particularly from a physical perspective. We highlight our recently developed procedures for modeling the four-dimensional (4D) genome organization [15], [16]. First, we provide a brief overview of polymer modeling of 3D genome organization. Second, we explain how the physical implications of Hi-C data can be decoded using polymer modeling. Next, we explain the unique points of our computational method, PHi-C [15], and compare it with other Hi-C data-driven polymer modeling methods. Then, we demonstrate the application of PHi-C analysis to Hi-C data with molecular perturbations. Finally, we propose a new interpretation of the dynamic 3D genome organization that can be used to understand Hi-C data better both physically and quantitatively.
Polymer modeling of chromatin for 3D genome organization
Chromosomes consist of long chromatin fibers packed inside the nucleus. Labeling of DNA within the nucleus and observation of its organization have led the field of chromosome biology since Walther Flemming described the rod-shaped structure of mitotic chromosomes [17]. Fluorescence in situ hybridization (FISH) technologies revealed that interphase chromosomes occupy distinct positions known as chromosome territories [18] and that labeled pairs of DNA sequences separated by 100–2000-kb are spatially distributed in interphase chromatin as polymers [19]. Therefore, polymer physical approaches, including polymer simulations, have been competent to uncover the nature of chromosome organization.A polymer is a macromolecule consisting of N monomers subject to thermal fluctuations at the sub-micron scales [20]. The conformation of the polymer system is represented by the set of position vectors of monomers . In terms of physics, describing all interaction potentials in the polymer system is modeling the polymer and allows for theoretical analyses and simulations of the polymer dynamics and structure. For example, the polymeric connection between adjacent monomers is represented by the bonding potential . Additionally, one can add M diffusive molecules, , interacting with the polymer system. Then, the potential can describe the interaction between a monomer of the polymer and a diffusive molecule.Since Hi-C technology revealed the 3D genome organization within the nucleus, polymer modeling has bifurcated into two approaches: first-principles (or bottom-up, forward, mechanistic) models, and data-driven (or top-down, inverse, fitting-based) models [21], [22], [23]. The former consists of minimal physical assumptions and aims to reveal a set of minimal physical mechanisms underlying the 3D genome organization. In contrast, the latter uses Hi-C data as a constraint condition of the 3D genome organization and aims to reproduce the input Hi-C data and 3D conformations of the polymer model via an optimization procedure. In the next section, we focus on the latter modeling approaches to decipher Hi-C data. Here, we provide an overview of the historical results of the first-principles polymer models with simulations.In 1998, using polymer modeling with simulations, researchers first evaluated how chromosome territories within the nucleus and the internal structure as a polymer were organized. The multiloop subcompartment model considered not only an excluded-volume interaction between monomers as a repulsive interaction but also subcompartments consisting of 120-kb loops [24]. The simulation results showed the formation of chromosome territories with separated chromosome arms, which agreed with experimental results regarding the radial density and position of genes. However, these results were derived from physical interaction potentials to correspond to experimental observations. In 2008, Rosa and Everaers presented simulations of a generic and minimal polymer model. Interestingly, the results showed that chromosome territories spontaneously emerge due to the slow relaxation from the mitotic chromosome structure because of generic polymer effects [25].After Hi-C technology was developed [10], novel genomic domains such as A/B compartments and TADs were identified, and the scaling law in 3D chromatin folding was revealed. The first-principles polymer modeling has attempted to determine the mechanism underlying chromatin domain formation according to the scaling law. The strings-binders-switch model was the first attempt to account for diffusive binding molecules interacting with sequence-specific chromatin-binding sites [26]. The binding molecules can bridge binding sites on chromatin and form dynamic chromatin loops via the molecules. This mechanism is known as bridging-induced attraction and can drive chromatin-induced phase separation [27], [23]. Additionally, the binding effects were refined as epigenetically related attractive interactions between monomers of the block copolymer model [28]. These models suggest that sequence-specific chromatin-chromatin interactions contribute to the formation of dynamic loops and clusters in the 3D genome organization. Recently, the loop extrusion model suggested that the active loop extrusion process on chromatin dynamically forms TADs in concert with loop extruding and boundary factors [29]. Interestingly, the polymer simulations revealed the molecular roles of the chromatin architectural proteins, such as cohesin and CTCF. In the conclusion of this section, we summarize the remarkable features of the first-principles polymer models with their physical assumptions in terms of the interaction potentials (Table 1).
Table 1
First-principles polymer models for 3D genome organization with simulations., the bonding potential along the polymer backbone; , the repulsive interaction potential between monomers and/or molecules of the system; , the territory potential around each modeled polymer chain; , the bending potential along the polymer backbone; , the loop interaction potential between looping monomers; , the interaction potential of the polymer inferred from Hi-C matrix; , the constraint potential from the nucleus structure; , the interaction potential between monomers of the polymer and diffusive molecules; , the attractive interaction potential between monomers of the polymer. These potentials are function of elements of the set . The detailed forms of these potentials are slightly different with each modeling. SAW, self-avoiding walk.
Author (reference)
Interaction potentials
Main features
Year
Münkel & Langowski [24]
Ubond, Urepulsive, Uterritory, Uloop(120-kb)
Formation of chromosome territories with separated chromosome arms and multiloop subcompartments
1998
Rosa & Everaers [25]
Ubond, Urepulsive, Ubend
Existence and stability of chromosome territories due to generic polymer effects
2008
Mateos-Langerak et al. [82]
Ubond, Urepulsive, Uloop(random)
Foloding of chromatin at 0.5–75 Mb length scales
2009
Tokuda et al. [83]
Ubond, Urepulsive, Ubend, UHiC, Unucleus
Large fluctuation of the genome structure forming chromosome territories in the nucleus
2012
Barbieri et al. [26]
SAW polymer on a cubic lattice, binder affinity and concentration
Scaling properties of chromatin folding, Fractal state of chromatin, Processes of domain formation
2012
Brackley et al. [27]
Ubond, Urepulsive, Ubend, Upolymer-molecule
Clustering of DNA-binding proteins driven by the bridging-induced attraction
2013
Jost et al. [28]
Ubond, Urepulsive, Uattractive
Chromatin folding due to epigenetic-specific attractive interactions, Existence of different four phases: (i) coil, (ii) globule, (iii) microphase separation and (iv) multistability
2014
Fudenberg et al. [29]
Ubond, Urepulsive, Ubend, Uloop(extrusion)
Formation of TADs via active loop extrusion process limited by boundary elements
2016
First-principles polymer models for 3D genome organization with simulations., the bonding potential along the polymer backbone; , the repulsive interaction potential between monomers and/or molecules of the system; , the territory potential around each modeled polymer chain; , the bending potential along the polymer backbone; , the loop interaction potential between looping monomers; , the interaction potential of the polymer inferred from Hi-C matrix; , the constraint potential from the nucleus structure; , the interaction potential between monomers of the polymer and diffusive molecules; , the attractive interaction potential between monomers of the polymer. These potentials are function of elements of the set . The detailed forms of these potentials are slightly different with each modeling. SAW, self-avoiding walk.
Polymer modeling of Hi-C data
How to describe contacts in polymer modeling
Chromosome conformation capture (3C)-based technologies detect genomic pair fragments that are in proximity to one another. Calculating a population-averaged Hi-C contact map requires a massive number of pairs from several million cells [30]. In this vast ensemble, the contact frequency between a DNA pair can be described as a probability. The binning resolution and matrix size of a contact map depend on the number of sequencing reads. Therefore, a dense contact matrix at an appropriate resolution is ideal for quantitatively understanding the contact probabilities.In Hi-C data-driven polymer modeling, we first model the polymeric chromatin fibers as successively connected beads, whose physical size is variable according to the binning resolution (Fig. 1A). Then, the contacts between the ith and jth beads are represented by certain conformations of the bead model (Fig. 1B). This model mimics the genomic contacts in 3C-derivative experiments. As FISH experiments have revealed conformational variety between two genomic loci in the cell nucleus, the distance between the ith and jth beads, , should be distributed in the conformation ensemble according to a probability density function . Additionally, to gather contributions of within the contact distance , here, we introduce a contact kernel function (Fig. 1C). Based on the above setting, in general, we can describe the contact probability between the ith and jth beads as
Fig. 1
Modeling the contacts in Hi-C experiments. A, Hi-C data are expressed as a heat map of the contact matrix binned at an appropriate resolution. Corresponding to the binning, successively connected beads represent a toy polymer for the genomic region of interest. B, The contact between the ith and jth beads occurs in the proximity of the distance . C, The probability density function represents a variety of the distance in polymer conformations. The contact kernel function gathers the contribution of the distances within the contact distance based on Eq. (1).
Modeling the contacts in Hi-C experiments. A, Hi-C data are expressed as a heat map of the contact matrix binned at an appropriate resolution. Corresponding to the binning, successively connected beads represent a toy polymer for the genomic region of interest. B, The contact between the ith and jth beads occurs in the proximity of the distance . C, The probability density function represents a variety of the distance in polymer conformations. The contact kernel function gathers the contribution of the distances within the contact distance based on Eq. (1).This formula is a generalization of the relationship between FISH and 3C [31], [32]. So far, in computational 3D genome modeling, the contact probability has been assumed to be a function of the spatial distance as follows [33]:However, this is an ad hoc relationship established for reconstructing 3D structures and lacks a physical basis.Here, we provide a concrete functional form of Eq. (1). To calculate the integral, we need information for and . The former depends on the binning resolution of the Hi-C matrix, the physical size of the modeled beads, and the contact range represented by the contact distance for the modeled beads pair. Once we establish a physical assumption for the bead model, we can calculate the latter. To find a clue, we first started with a simple model, in which we adopt the bead-spring model and the Gaussian-type contact kernel function,
[15]. The analytical calculation provided a formulawhere represents the variance for spatial conformations of the ith and jth beads and stems from the structural fluctuations of a chromosome in a thermal environment. Interestingly, this formula differs from Eq. (2). While Eq. (2) implies that the contact probability can be interpreted as a static distance, Eq. (3) states that the contact probability is a function of the variety of chromosome structures. Also, another contact kernel represented by the step function states that is a function of the variance [34], [35]:Notably, these theoretical expressions of the contact probability require the probability of the diagonal elements to be normalized as .
Polymer modeling with network-like interactions can depict any contact pattern and simulate dynamic and structural features
Although Eqs. (3), (4) have provided new mathematical insight into the contacts in Hi-C experiments, the quantitative interpretation of contact matrix data remains unclear. A useful polymer model allows us to simulate and predict the chromatin dynamics and structure in living cells from Hi-C data. To establish such a model, we need to include additional assumptions. However, as a minimal model, we need to limit the number of physical assumptions and parameters as much as possible. Handling NN-values of the contact matrix presents a problem, which can be addressed by mapping one contact matrix to another matrix through the relationship described in Eq. (3). Therefore, we introduced a polymer model with network-like interactions between two beads described by the interaction matrix
[15] (Fig. 2A). In terms of polymer physics, the positive or negative value of the matrix element represents the normalized intensity of attractive or repulsive interactions between the ith and jth beads, corresponding to the interaction potential . Then, we identified a one-to-one matrix transformation between the matrices and . This transformation allows us to depict any contact pattern by setting the values of the interaction matrix.
Fig. 2
Polymer modeling with network-like interactions bridges the gap between the genome dynamics and organization. A, Network-like attractive and repulsive interactions in a polymer model are expressed as a heat map of the interaction matrix . The one-to-one matrix transformation converts the matrix into a contact matrix . B, Based on the stochastic dynamics in thermal fluctuations, the matrix allows for 4D polymer dynamics simulations, sampling polymer conformations, mean-squared displacement (MSD) curves of modeled monomers, and microrheology spectra depictions.
Polymer modeling with network-like interactions bridges the gap between the genome dynamics and organization. A, Network-like attractive and repulsive interactions in a polymer model are expressed as a heat map of the interaction matrix . The one-to-one matrix transformation converts the matrix into a contact matrix . B, Based on the stochastic dynamics in thermal fluctuations, the matrix allows for 4D polymer dynamics simulations, sampling polymer conformations, mean-squared displacement (MSD) curves of modeled monomers, and microrheology spectra depictions.As live-cell imaging experiments with single nucleosomes have revealed [36], [37], the dynamic organization of chromatin domains occurs in a system experiencing stochastic thermal fluctuations, which inevitably drive the movements of the genome molecules present in the microscale cell environment. Polymer physics and polymer simulations are powerful ways to understand the relationship between chromatin dynamics and organization [38], [39], [21], [22], [23]. In our polymer modeling system, the interaction matrix also allows us to create simulations of 4D polymer dynamics featuring stochastic thermal fluctuations. With these simulations, we can conduct conformational sampling, calculate mean-squared displacement (MSD) curves of modeled monomers, and depict microrheology spectra [15], [16] (Fig. 2B). Notably, the matrix , which is the equivalent of an ensemble-averaged contact matrix, provides dynamics information. The 4D polymer dynamics simulation demonstrates the dynamic change in a modeled chromatin fiber that occurs along with time evolution. For example, the simulation can reveal both open and compact structures embedded in the dynamics. Since the stochastic dynamics of the polymer model are ergodic [40], the time-averaged contact matrix in the dynamics coincides with an ensemble-averaged contact matrix consistent with the matrix . Furthermore, we can sample many polymer conformations in thermal equilibrium and quantitatively analyze the structural information such as the gyration radius. Moreover, recently, we developed a method, microrheology for Hi-C data, that can analyze the dynamic rheology property as a measure of the rigidity and flexibility of genomic regions along with the time evolution [16]. We can obtain a microrheology spectrum using the microrheology transformation formula [41] of the MSD curves derived from the matrix . The viscoelastic response to the periodic perturbation with a frequency is expressed as a heat map of a rheological quantity along two axes of the genomic coordinate and the frequency. The spectrum reveals the dynamic and hierarchical properties of the 3D genome organization embedded in the contact matrix.The positive and negative interactions in the interaction matrix imply that attractive and repulsive forces between two monomers do work. However, these forces should be thought of in theory as auxiliary, rather than as measurably real. Importantly, the matrix can be converted into measurable quantities such as the contact matrix, MSD curves, and dynamic rheological spectra. If these quantities predict the nature of chromosomes, the theory would be valid.
Deciphering Hi-C data by using the PHi-C optimization procedure
So far, we have explained the theoretical impacts of defining the contacts and Hi-C data-driven polymer modeling with the interaction matrix . We applied the theory to experimental Hi-C data to assess its validity. Mathematically, the inverse transformation from the contact matrix to might be sufficient to decode Hi-C data. However, this does not work well in practice because in experiments, we do not know the contact kernel function, there is experimental noise, and the number of sequencing reads might be insufficient for the ensemble-averaged contact matrix. Therefore, we developed an optimization algorithm for deciphering Hi-C data. Although other data-driven polymer models have also developed computational methods for finding optimized solutions to reconstruct 3D structures and contact maps [42], [43], [44], [45], [46], [35], the molecular dynamics (MD) simulations and Monte Carlo (MC) sampling of polymer models have high computational costs associated with the optimization procedures. Conversely, in our method, we can determine the one-to-one matrix transformation needed to calculate a contact matrix without the MD and MC simulations. This transformation could reduce the computational costs of optimization.Fig. 3 summarizes the strategy and flowchart of our optimization procedure. First, we convert a temporary interaction matrix into a contact matrix and compare it to an input Hi-C contact matrix by using a cost function. Then, we slightly change the value of a randomly selected element of the interaction matrix and calculate the cost function. If the cost function decreases, we accept the change and update the interaction matrix. Practically, the cost function decreases with each iteration of these steps. Finally, we can obtain an optimized interaction matrix . Of note, another study used a similar strategy based on the Gaussian effective model (GEM) [34]. To complete the description of the procedure, we have to define the cost function in detail. The values of the contact probability are in the range of more than three orders of magnitude. Therefore, we might neglect the contributions of relatively fewer values by comparing two contact matrices in the linear scale. Thus, we compare two contact matrices in the logarithmic scale and use the Frobenius norm as the distance between them, as follows:
Fig. 3
PHi-C’s optimization procedure outputs an optimized interaction matrixfrom an input contact matrix. Iterative optimization steps update a temporary interaction matrix so that the difference between the contact matrices and in the logarithmic scale decreases.
PHi-C’s optimization procedure outputs an optimized interaction matrixfrom an input contact matrix. Iterative optimization steps update a temporary interaction matrix so that the difference between the contact matrices and in the logarithmic scale decreases.Taken together with the simulation pipeline shown in Fig. 2, we developed a 4D simulation method, PHi-C (polymer dynamics deciphered from Hi-C data) [15]. Unlike other data-driven polymer modeling methods (Table 2), which also use experimental Hi-C data to obtain optimized physical parameters of the polymer model, PHi-C allows 2D Hi-C data to be interpreted as physical interaction parameters of the matrix and 4D polymer dynamics consistent with dynamic features of the genomic loci observed in live-cell imaging experiments to be demonstrated.
Table 2
Data-driven polymer modeling methods reconstructing the input Hi-C matrix. The average Pearson’s correlation values were calculated from Pearson’s correlation values to compare experimental and reconstructed contact matrices in each paper.
Method
Average Pearson’s correlation
Comparison between Hi-C matrices
Output models
Available from
MiChroM [44]
0.956
CHi-C and CReconstructed
3D structures
PRISMR [46]
0.93
CHi-C and CReconstructed
3D structures
GEM [34]
0.91
CHi-C and CReconstructed
4D dynamics
https://github.com/gletreut/gem_reconstruction
HLM [35]
0.955
Selected contact pairs
3D structures
PHi-C [15]
0.960
log10CHi-C and log10CReconstructed
4D dynamics
https://github.com/soyashinkai/PHi-C
Data-driven polymer modeling methods reconstructing the input Hi-C matrix. The average Pearson’s correlation values were calculated from Pearson’s correlation values to compare experimental and reconstructed contact matrices in each paper.
Demonstration of PHi-C with microrheology analysis
Here, we will demonstrate how to decode Hi-C data by using PHi-C. The loop extrusion model is powerful enough to provide molecular mechanistic insight into how TADs are formed on chromatin through active molecular processes, and can also predict features of Hi-C maps by using molecular perturbations [29], [47]. Also, recent Hi-C experiments have shown that condensin, cohesin, and CTCF as molecular candidates of the loop extrusion factors (LEFs) and the boundary elements (BEs) contribute 3D genome folding [48], [49], [50]. The depletion of cohesin release factor, Wapl, strengthened looping interactions at TAD boundaries due to an increase in cohesin processivity [51], [52]. Loop extrusion polymer simulations taking into account the processivity of LEFs and the boundary strength of BEs predicted the consequences of cohesin, CTCF, and Wapl perturbations [47]. However, the parameters of the polymer model are not currently measurable within the living cell nucleus. Instead, we can determine the parameters of our polymer model in the PHi-C method through the optimization procedure. As long as the optimized contact map is in good agreement with the input experimental Hi-C matrix, the 4D simulations and microrheological features of our polymer model will be consistent with the Hi-C data. On the other hand, our polymer modeling cannot take into account active molecular processes.Interestingly, without presuming that the modeling includes active processes, the PHi-C method depicts highly similar contact patterns for Hi-C data with perturbations to cohesin, CTCF, and Wapl [52] (Fig. 4A). These protein factors are thought to be necessary for the correct loop extrusion process in wild-type (WT) cells. Next, we applied our newly developed method, microrheology for Hi-C data [16], to characterize dynamic viscoelastic response property embedded in Hi-C data. As shown in Fig. 2, we converted each optimized interaction matrix into the microrheology spectrum of complex compliance as a measure of the dynamic flexibility of the genome (Fig. 4B). The vertical stripe patterns were variable along the chromosome, suggesting that viscoelastic responses differed depending on the genomic region. The viscoelastic responses in boundary regions demarcating adjacent square areas on the contact matrices were relatively slower than those in the intra-square areas. As we have already reported, this indicates that TAD boundaries are more rigid as nodes than intra-TAD sequences [16].
Fig. 4
PHi-C method deciphers Hi-C data with perturbations to cohesin, CTCF, and Wapl. A, Experimental (upper right triangle) and optimized (lower left triangle) contact matrices for HeLa cells (WT), auxin-induced SCC1 degradation cells, auxin-induced CTCF degradation cells, and Wapl RNAi cells [52]. SCC1 is a cohesin subunit. As a demonstration, we focused on the 64- to 69-Mb genomic region of chromosome 1 at 50-kb resolution. We observed high values for Pearson’s correlation coefficient, r, between the optimized matrices and each input Hi-C matrix. We used the contact kernel function by Eq. (4) in the PHi-C optimization. B, Microrheology spectra of complex compliance, , as a measure of the dynamic flexibility of four cases. The inverse of the frequency corresponds to a time scale. C, Maps of the ratio between the microrheology spectra of Hi-C data and the spectra of WT data.
PHi-C method deciphers Hi-C data with perturbations to cohesin, CTCF, and Wapl. A, Experimental (upper right triangle) and optimized (lower left triangle) contact matrices for HeLa cells (WT), auxin-induced SCC1 degradation cells, auxin-induced CTCF degradation cells, and Wapl RNAi cells [52]. SCC1 is a cohesin subunit. As a demonstration, we focused on the 64- to 69-Mb genomic region of chromosome 1 at 50-kb resolution. We observed high values for Pearson’s correlation coefficient, r, between the optimized matrices and each input Hi-C matrix. We used the contact kernel function by Eq. (4) in the PHi-C optimization. B, Microrheology spectra of complex compliance, , as a measure of the dynamic flexibility of four cases. The inverse of the frequency corresponds to a time scale. C, Maps of the ratio between the microrheology spectra of Hi-C data and the spectra of WT data.We also compared the microrheology spectra of the Hi-C perturbation data to the spectra of WT cells (Fig. 4C). Cohesin degradation increases rheological mobility in a manner consistent with single-nucleosome dynamics in living cells [37]. Surprisingly, we observed white vertical stripes with ratio values of zero at the boundaries of the square areas on Hi-C maps of CTCF and Wapl depletion cases, suggesting that the dynamic rheological properties of the boundaries changed very little. Red regions with positive ratio values inside domain boundaries revealed the enhancement of rheological mobility, which heightens inter-domain interactions. As a result, the insulation ability at the boundaries may decrease; it has been reported that CTCF depletion disrupts the insulation of neighboring TADs [49], [50]. Blue regions with negative ratio values showed a more rigid response with a time scale that corresponded to the inverse of the frequency . As Wapl-deficient cells accumulate contact at multi-TAD corners [52], [51], chromatin formation becomes more compact. The compaction may increase the rheological rigidity of the nested interaction regions.
Emergence of chromatin domains in systems with dynamic fluctuations
There are two novel features present in the PHi-C modeling method. First, the PHi-C optimization procedure provides the full parameter set of the polymer model with network-like interactions, which can be used to reconstruct an input Hi-C contact matrix. Notably, here, inferring 3D genome structures by using the relationship between the contact probability and the pairwise distance (Eq. (2)) is not necessary for understanding Hi-C data in terms of polymer physics. Second, PHi-C’s 4D simulations and microrheology spectra enable us to interpret ensemble-averaged Hi-C data as dynamic information, since the stochastic dynamics of the polymer model in thermal equilibrium satisfies ergodic properties [15], [16], [40]. Fig. 5 shows the snapshots of polymer conformations and the associated distance maps at each conformation in a 4D PHi-C simulation. Although it is difficult to find globular domains among the conformations and stationary high-contact triangles on the distance maps, time-averaging the contacts in the dynamics yields a time-averaged contact matrix equivalent to the ensemble-averaged contact matrix, due to the ergodicity of the stochastic dynamics.
Fig. 5
PHi-C’s 4D simulation yields the time-averaged contact matrix. For a polymer conformation at each time frame in a 4D simulation, the distance map is shown as a heat map of the normalized distance matrix . Gathering the contacts in the dynamics generates a time-averaged contact matrix. We used the optimized interaction matrix for WT cells in Fig. 4A, calculated the 4D simulation and the distance maps, and depicted the optimized. contact matrix.
PHi-C’s 4D simulation yields the time-averaged contact matrix. For a polymer conformation at each time frame in a 4D simulation, the distance map is shown as a heat map of the normalized distance matrix . Gathering the contacts in the dynamics generates a time-averaged contact matrix. We used the optimized interaction matrix for WT cells in Fig. 4A, calculated the 4D simulation and the distance maps, and depicted the optimized. contact matrix.This dynamics-based perspective can be used to amend the conventional perspective of the 3D genome organization. Recent advances in super-resolution chromatin imaging of fixed cells have revealed the physical 3D conformations of labeled chromatin domains in a few hundred kb sizes [53], [54]. Furthermore, super-resolution chromatin tracing reveals TAD-like structures, which are separated globules in single cells, and the sum of the contacts in the 3D conformations is consistent with population-averaged Hi-C data [55], [56]. Although TADs have often been depicted as distinct, separate globules in schematic figures (Fig. 6A), it is unclear what is a structural feature of TADs in microscope observations [54]. Interestingly, the snapshots by cryo-electron microscopy (EM) [57], [58] and EM tomography [59] techniques reveal not distinct chromatin globules but rather irregularly folded chromatin fibers. Furthermore, in living cells, super-resolution single-nucleosome imaging has unveiled the dynamic organization of chromatin domains [37]. Single nucleosomes mainly driven by thermal fluctuations move around within regions of a few hundred nanometers for a few seconds at a time. Therefore, without a distinct, separated globular organization, chromatin domains would be dynamically organized and behave with liquid-like properties [60], [61], [62].
Fig. 6
Schematic illustrations for understanding the dynamic organization of chromatin domains. A, Classic picture of Hi-C data depicts distinct and separated globules corresponding to genomic triangles in a Hi-C matrix. B, Dynamic picture by PHi-C depicts the emergence of chromatin domains during fluctuations within a time interval . The domain boundaries are more rigid, with less mobility than the interior, and can interfere with interdomain interactions.
Schematic illustrations for understanding the dynamic organization of chromatin domains. A, Classic picture of Hi-C data depicts distinct and separated globules corresponding to genomic triangles in a Hi-C matrix. B, Dynamic picture by PHi-C depicts the emergence of chromatin domains during fluctuations within a time interval . The domain boundaries are more rigid, with less mobility than the interior, and can interfere with interdomain interactions.Using PHi-C with microrheology analysis to process Hi-C data, we figured out that TAD boundaries are more rigid as nodes than intra-TAD sequences [16]. Relatively less mobility at the boundaries due to the rigidity than in the interior can interfere with interdomain interactions between adjacent domains. Besides, as we can infer the spatial hierarchy from the Hi-C contact map, we revealed the existence of the temporal hierarchy of genome organization. The genomic position of rigid boundaries depends on the time evolution of chromatin dynamics. Therefore, here, we propose that functional chromatin domains emerge during dynamic fluctuations within a particular time interval (Fig. 6B). Notably, this dynamic picture does not rely on the assumption of the existence of globular chromatin domains and, therefore, can reconcile bulk-based ensemble-averaged Hi-C data and dynamic genome organization in living single cells.
Summary and outlook
In this review, we have provided an overview of how the PHi-C modeling method deciphers Hi-C data and incorporates it into the concept of the dynamic 3D genome organization in a state undergoing thermal fluctuations. Without inferring static 3D genome structures, the PHi-C optimization procedure generated a contact matrix in good agreement with an input Hi-C matrix (Fig. 3), and the optimized interaction matrix of the polymer model provided the dynamic features of the 3D genomic organization (Fig. 2). As the stochastic movements of single nucleosomes and genomic loci are visible within the range of a few hundred nanometers in living cell nuclei [36], [37], [63], [64], [65], the PHi-C analysis was able to describe and depict the dynamic nature of chromatin. For example, a live-cell imaging experiment showed a marked difference in the movements of Nanog and Oct4 loci in mouse embryonic stem cells (mESCs) [66]. Our PHi-C analysis of the two genomic loci yielded a consistent result that reflected the difference and revealed that the genomic region around Nanog adopts a more compact organization than the region around Oct4
[15]. However, the physical parameters representing the spatial and temporal scales could not be determined by PHi-C analysis. Linking Hi-C data with the dynamic genomic movements in living cells is necessary for improving the PHi-C method.The PHi-C analysis method is a post-Hi-C processing pipeline and is independent of bioinformatic analysis as it generates Hi-C matrix data from sequence data. Interestingly, with respect to defining the contacts of our polymer model, the normalization condition that the contact probability of the diagonal elements satisfies may be a new criterion for evaluating the quality of Hi-C experiments. For an experiment assessing the physical properties of a chromatin fiber, the contact frequencies along the genome coordinate within a single chromosome should be constant. We recommend combining our method with other normalization techniques to reduce and eliminate experimental biases [67], [68], [69], [70], [71], [72].Practically, the size of an input contact matrix in the PHi-C pipeline depends on the sequencing depth and binning size of the matrix. In our polymer model, we assume that the single chromatin polymer is in thermal equilibrium. Therefore, the ensemble-averaged contact probability possesses quantitative meaning, and the contact matrix is not sparse but dense. According to the sequencing depth of the Hi-C experiment, an input contact matrix should be adjusted so that the matrix is as dense as possible. Moreover, the computational cost of the PHi-C optimization procedure is proportional to due to the iterative adjustments by random choice of matrix elements. So far, a 500500-sized matrix is a practical upper limit for obtaining an optimized solution within a few days. The theoretical formalism and computational methods for sparse and large-sized contact matrices still require improvement.Although 3C-based techniques have unveiled the role of 3D genome organization in diverse biological activities, our picture of the emergence of chromatin domains illustrates that the rheological rigidity at the boundaries along the 1D genome coordinate could regulate the formation of chromatin domains without the loop extrusion mechanism. Physical features such as chromatin domains must be regulated as a result of molecular orchestration on chromatin. Chromatin stiffness may create insulation at TAD boundaries [11]. Therefore, further studies are needed to elucidate how molecular interactions involving chromatin can regulate the physical stiffness and mobility of the genome. Of additional importance is the viewpoint that chromatin itself is a heterogeneous reaction field. Recently, quantitative live-imaging methods showed that intra-domain enhancer-promoter communication occurs in the absence of TADs, suggesting that boundary elements cause distal enhancers to activate target promoters independently of TAD formation in Drosophila embryos [73]. In mESCs, direct enhancer-promoter proximity does not drive contemporaneous Sox2 transcription [65]. High enrichment not only of cohesin and CTCF but also of many other regulatory proteins at TAD boundaries might facilitate the formation of a local reaction field such as a transcription hub [73]. The nucleation mechanism of the reaction field might relate to liquid–liquid phase separation [74], [75], [76].The cell-to-cell variability of chromosome structures revealed by single-cell Hi-C experiments implies that single chromosomes adopt stochastic conformations [77], [78], [79]. The summation of the conformational snapshots is consistent with the ensemble-averaged Hi-C data. Given that chromosomes are highly dynamic and that the stochastic dynamics satisfy the ergodic condition [80], PHi-C enables us to replace the ensemble-averaged Hi-C data with time-averaged data. Therefore, monitoring the spatiotemporal organization of chromatin in living cells [81] could provide consistent information about the Hi-C data of fixed cells. To understand the nature of the dynamic 3D genome, we need developments of labeling and seeing multi genomic loci with functional activities such as transcription in living cells. Mechanical manipulation and perturbation to a genomic locus could also uncover rheological features of the dynamic 3D genome. Our PHi-C model may bridge the gap between live-cell imaging and Hi-C data, contributing to the physical understanding of the dynamic 3D genome organization.
Author contributions
SS analyzed Hi-C data and generated the figures, with contributions from all authors. SS, SO, and RN wrote the manuscript.
CRediT authorship contribution statement
Soya Shinkai: Conceptualization, Methodology, Software, Data curation, Investigation, Visualization, Writing - original draft. Shuichi Onami: Supervision, Project administration, Writing - review & editing, Funding acquisition. Ryuichiro Nakato: Conceptualization, Data curation, Writing - original draft, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.