Claudia Caudai1, Monica Zoppè2, Anna Tonazzini1, Ivan Merelli3, Emanuele Salerno1.
Abstract
The three-dimensional structure of chromatin in the cellular nucleus carries important information that is connected to physiological and pathological correlates and dysfunctional cell behaviour. As direct observation is not feasible at present, several experimental techniques have been developed to provide information on the spatial organization of the DNA in the cell, and several computational methods have been developed to process the experimental data and infer 3D chromatin conformations. The most relevant experimental methods are Chromosome Conformation Capture and its derivatives, chromatin immunoprecipitation and sequencing techniques (CHIP-seq), RNA-seq, fluorescence in situ hybridization (FISH) and other genetic and biochemical techniques. All of them provide important and complementary information that relates to the three-dimensional organization of chromatin. However, these techniques employ very different experimental protocols and provide information that is not easily integrated, because of the different contexts and resolutions. Here, we present an open-source tool for inferring the 3D structure of chromatin, which is an expansion of the previously reported code ChromStruct. By exploiting a multilevel approach, it allows an easy integration of information derived from different experimental protocols and referring to different resolution levels of the structure, from a few kilobases up to megabases. Our results show that the introduction of chromatin modelling features related to CTCF CHIA-PET data, histone modification CHIP-seq, and RNA-seq data produces appreciable improvements in ChromStruct's 3D reconstructions, compared to the use of HI-C data alone, at a local level and at a very high resolution.
Keywords: CHIP-seq; CTCF CHIA-PET data; HI-C data; RNA-seq; Bayesian statistics; chromatin conformation; chromatin conformation capture
Year: 2021 PMID: 33923796 PMCID: PMC8072831 DOI: 10.3390/biology10040338
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Figure 1. Flow of ChromStruct: (blue) the input HI-C contact frequency matrix is subdivided into diagonal blocks; (red) the chromatin fibre is modelled as a chain of partially penetrable beads and subdivided into sub-chains; (green) geometrical perturbations are performed in the quaternion algebra and the solution space is sampled by a Bayesian method; (violet) as the last step, a multilevel 3D reconstruction generates output chromatin conformations (gray) that are compatible with the input and the constraints.
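The caption states only that geometrical perturbations are performed in the quaternion algebra. As an illustration of what such a perturbation can look like, the following minimal sketch rotates the tail of a bead chain about an axis through a pivot bead, using unit quaternions. The pivot-style move, the function names and the tuple-based bead representation are assumptions made for this example and are not taken from ChromStruct.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def rotate(point, q):
    """Rotate a 3D point by the unit quaternion q, computing q * p * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *point)), conj)
    return (rx, ry, rz)

def rotate_subchain(beads, pivot_index, axis, angle):
    """Hypothetical pivot move: rotate every bead after `pivot_index`
    about an axis passing through the pivot bead. Bond lengths are preserved."""
    q = quat_from_axis_angle(axis, angle)
    pivot = beads[pivot_index]
    moved = list(beads[: pivot_index + 1])
    for bead in beads[pivot_index + 1:]:
        rel = tuple(c - p for c, p in zip(bead, pivot))
        rot = rotate(rel, q)
        moved.append(tuple(r + p for r, p in zip(rot, pivot)))
    return moved

chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = rotate_subchain(chain, 1, (0.0, 0.0, 1.0), math.pi / 2.0)
```

Because a rotation is rigid, a move of this kind changes the chain shape without changing the distance between consecutive beads, which is one reason quaternion rotations are convenient for perturbing bead-chain models.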
Presence of structural information derived from H3K27ME3 CHIP-seq, RNA-seq and CTCF-binding experiments for a 3.5 Mb portion of chromosome 12 [111.5 Mb–115 Mb], corresponding to blocks 1750 to 1799 identified by the block-detection algorithm.
| Block | Dimension (kb) | Total Contacts | Data | Corr 1 | Corr 2 |
|---|---|---|---|---|---|
| 1750 | 75 | 272 | Expr genes | 0.128 | 0.134 |
| 1751 | 50 | 102 | Expr genes | 0.489 | 0.428 |
| 1752 | 65 | 295 | Expr genes | 0.246 | 0.191 |
| 1753 | 55 | 142 | Expr genes | 0.264 | 0.217 |
| 1754 | 100 | 133 | Expr genes, CTCF | 0.219 | 0.219 |
| 1755 | 70 | 41 | Expr genes, CTCF | 0.251 | 0.242 |
| 1756 | 65 | 158 | Expr genes | 0.358 | 0.295 |
| 1757 | 85 | 128 | Expr genes, CTCF | 0.286 | 0.217 |
| 1758 | 100 | 211 | Expr genes | 0.222 | 0.204 |
| 1759 | 45 | 89 | Expr genes | 0.320 | 0.371 |
| 1760 | 70 | 247 | Expr genes | 0.325 | 0.341 |
| 1761 | 70 | 153 | | 0.113 | 0.185 |
| 1762 | 50 | 149 | | 0.137 | 0.280 |
| 1763 | 55 | 178 | | 0.322 | 0.364 |
| 1764 | 70 | 168 | | 0.056 | 0.119 |
| 1765 | 100 | 228 | | 0.133 | 0.161 |
| 1766 | 50 | 81 | Expr genes | 0.154 | 0.186 |
| 1767 | 45 | 163 | Expr genes | 0.346 | 0.255 |
| 1768 | 40 | 343 | | 0.164 | 0.201 |
| 1769 | 55 | 268 | | 0.222 | 0.160 |
| 1770 | 50 | 78 | Expr genes, CTCF | 0.326 | 0.204 |
| 1771 | 60 | 38 | Expr genes | 0.178 | 0.244 |
| 1772 | 110 | 389 | Expr genes | 0.303 | 0.235 |
| 1773 | 70 | 90 | | 0.041 | 0.136 |
| 1774 | 100 | 637 | | 0.230 | 0.178 |
| 1775 | 50 | 86 | | 0.184 | 0.163 |
| 1776 | 80 | 383 | H3K27ME3 | 0.233 | 0.236 |
| 1777 | 45 | 77 | H3K27ME3 | 0.582 | 0.499 |
| 1778 | 60 | 143 | | 0.306 | 0.318 |
| 1779 | 65 | 179 | CTCF | 0.408 | 0.342 |
| 1780 | 55 | 77 | | 0.428 | 0.443 |
| 1781 | 85 | 66 | CTCF | 0.305 | 0.304 |
| 1782 | 105 | 249 | | 0.324 | 0.241 |
| 1783 | 50 | 39 | | 0.473 | 0.443 |
| 1784 | 85 | 218 | | 0.330 | 0.313 |
| 1785 | 63 | 45 | CTCF | 0.209 | 0.210 |
| 1786 | 70 | 123 | | 0.230 | 0.257 |
| 1787 | 45 | 104 | H3K27ME3 | 0.423 | 0.291 |
| 1788 | 70 | 283 | | 0.145 | 0.131 |
| 1789 | 70 | 142 | | 0.081 | 0.123 |
| 1790 | 45 | 80 | | 0.202 | 0.224 |
| 1791 | 35 | 30 | | 0.220 | 0.258 |
| 1792 | 65 | 185 | H3K27ME3 | 0.425 | 0.392 |
| 1793 | 70 | 208 | H3K27ME3 | 0.407 | 0.303 |
| 1794 | 50 | 53 | | −0.05 | 0.014 |
| 1795 | 40 | 168 | H3K27ME3 | 0.449 | 0.373 |
| 1796 | 140 | 1659 | H3K27ME3 | 0.250 | 0.286 |
| 1797 | 70 | 240 | H3K27ME3 | 0.266 | 0.155 |
| 1798 | 60 | 289 | H3K27ME3 | 0.233 | 0.150 |
| 1799 | 65 | 264 | | 0.141 | 0.063 |
Corr 1: Pearson correlation between the original input contact matrix and the synthetic contact matrix produced by ChromStruct integrating HI-C, CHIP-seq, RNA-seq and CTCF data. Corr 2: Pearson correlation between the original input contact matrix and the synthetic contact matrix produced by ChromStruct using HI-C data only.
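The correlations in this table compare an input contact matrix with a synthetic one. A minimal sketch of such a comparison is given below. It assumes symmetric square matrices and uses only the entries strictly above the diagonal, so that each bin pair is counted once. Whether ChromStruct treats the diagonal in this way is not stated here, and the function names are hypothetical.

```python
import math

def upper_triangle(matrix):
    """Flatten the entries strictly above the main diagonal of a square matrix."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("contact matrix must be square")
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def contact_matrix_correlation(original, synthetic):
    """Pearson correlation between two symmetric contact matrices.

    Assumption: only off-diagonal bin pairs are compared.
    """
    return pearson(upper_triangle(original), upper_triangle(synthetic))

# Toy data, not taken from the paper.
original = [
    [0, 5, 2, 1],
    [5, 0, 4, 2],
    [2, 4, 0, 6],
    [1, 2, 6, 0],
]
rescaled = [[2 * v + 1 for v in row] for row in original]
r = contact_matrix_correlation(original, rescaled)
```

Pearson correlation is invariant to positive linear rescaling, so in this toy case `r` equals 1 up to rounding. Real synthetic matrices differ from the input in more than scale, which is why the values in the table are well below 1.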
Figure 2. Comparison of the distributions of the Pearson correlation between the contact matrices obtained with ChromStruct and the original contact matrix, for blocks belonging to a 3.5 Mb portion of chromosome 12. Blocks containing expressed genes (left), H3K27ME3 (centre), and CTCF CHIA-PET sites (right) show a higher correlation if the relevant information is used.
Pearson correlations between synthetic contact matrices and original HI-C contact matrix of the whole chromosome 12 for two populations of conformations: using HI-C data only (Experiment 1) and using HI-C, CHIP-seq, RNA-seq and CTCF-binding site data (Experiment 2).
| | HI-C Contacts | RNA-seq | CHIP-seq | CTCF-Binding | Nr of Runs | Correlation |
|---|---|---|---|---|---|---|
| Experiment 1 | ✓ | | | | 100 | 0.7188371 |
| Experiment 2 | ✓ | ✓ | ✓ | ✓ | 100 | 0.6963284 |
Pearson correlation of the original HI-C contact matrix and the ChromStruct’s synthetic contact matrix at the first reconstruction-step resolution (average dimension of blocks is 800 kb).
Figure 3. (a) Reconstruction of a portion of chromosome 12 [from 111.5 Mb to 115 Mb] at a 5 kb resolution (starting point in black, end point in purple); in green, the part of the chromatin occupied by active genes, which is more expanded; in red, the part marked by H3K27ME3, which is more compact. (b) Reconstruction of the whole chromosome 12 at a 500 kb resolution: the part in green, occupied by active genes, is not only more expanded, but also outermost in the whole chromosome. (c) Representation of HI-C, CHIP-seq and RNA-seq data for the same portion of chromosome 12 at a 5 kb resolution (plot from ENCODE). The areas with active genes show a lower concentration of H3K27ME3, while the areas with fewer genes, which are more methylated and more compact, correspond to higher HI-C contact frequencies (more yellow in the contact matrix heatmap). (d) Plot of the CHIP-seq and RNA-seq information in ChromStruct's input: a score of 1, −1 or 0 for every bin associated with expressed genes, H3K27ME3 or neither, respectively (see Supplementary Materials for details).
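Panel (d) describes a per-bin score of 1, −1 or 0. The sketch below shows one way such a track could be built from genomic intervals at a fixed bin size. The function name, the half-open interval convention and the rule that expressed genes take precedence when a bin overlaps both kinds of interval are assumptions made for illustration; the actual encoding is described in the paper's supplementary material.

```python
def score_bins(start, end, bin_size, expressed, h3k27me3):
    """Assign 1 (expressed gene), -1 (H3K27ME3) or 0 (neither) to each bin.

    Intervals are half-open [s, e) in base pairs.
    Assumption: if a bin overlaps both kinds of interval, the
    expressed-gene score wins. This tie-break is not from the paper.
    """
    if bin_size <= 0 or end <= start:
        raise ValueError("need bin_size > 0 and end > start")
    n_bins = -(-(end - start) // bin_size)  # ceiling division
    scores = [0] * n_bins

    def mark(intervals, value):
        for s, e in intervals:
            s, e = max(s, start), min(e, end)  # clip to the region
            if s >= e:
                continue  # interval lies outside the region
            first = (s - start) // bin_size
            last = (e - 1 - start) // bin_size
            for i in range(first, last + 1):
                scores[i] = value

    mark(h3k27me3, -1)
    mark(expressed, 1)  # applied last so that it overrides -1 (assumption)
    return scores

# Toy coordinates, not taken from the paper: five 5 kb bins.
track = score_bins(0, 25_000, 5_000,
                   expressed=[(0, 6_000)],
                   h3k27me3=[(15_000, 20_000)])
# track == [1, 1, 0, -1, 0]
```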
Figure 4. Graphical User Interface of ChromStruct. Three groups of quantities are displayed: the first (GEOMETRY) includes geometrical features, the second (METHOD) sets up the TADs extraction and the score function, and the third (ALGORITHM) is only related to the Simulated Annealing parameters.