Abstract
BACKGROUND: Hi-C and its nucleosome-resolution variant Micro-C provide a window into the 3D spatial packing of a genome within the cell. Although neither technique depends directly on the binding of specific antibodies, previous work has revealed enriched interactions and domain structures around multiple chromatin marks: epigenetic modifications and transcription factor binding sites. However, the joint impact of chromatin marks on Hi-C and Micro-C interactions has not been globally characterized, which limits our understanding of 3D genome characteristics. An emerging question is whether 3D genome characteristics and interactions can be deduced by integrative analysis of multiple chromatin marks, and whether interactions can be associated with the functionality of the interacting loci.
RESULT: We propose PROBC, a probabilistic method that decomposes Hi-C and Micro-C interactions by known chromatin marks. PROBC is based on convex likelihood optimization, which directly takes into account both the existence and the nonexistence of interactions. Using PROBC, we find that four histone modifications (H3K27ac, H3K9me3, H3K4me3, H3K4me1) and CTCF are particularly predictive of Hi-C and Micro-C contacts across cell types and species. Moreover, histone modifications are more effective than transcription factor binding sites in explaining the genome's 3D shape through these interactions. PROBC can successfully predict Hi-C and Micro-C interactions in a given species even when it is trained on different cell types or species. For instance, when trained on mouse ES cells, it can predict missing nucleosome-resolution Micro-C interactions in human ES cells from only these 5 chromatin marks with an AUC above 0.75. Additionally, PROBC outperforms existing methods in predicting interactions across almost all chromosomes.
Keywords: Chromatin organization; Epigenetics; Hi-C; Machine learning; Micro-C
Year: 2022 PMID: 35397520 PMCID: PMC8994916 DOI: 10.1186/s12864-022-08498-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Histone modifications and transcription factor binding sites used in our experiments
| Species & cell type | Histone modifications | Transcription factor binding sites |
|---|---|---|
| Human ES | H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K79me2, H3K36me3, H4K20me1, H3K27me3, H3K56ac, H3K23ac, H2AK5ac, H2A.Z, H3K9ac, H3K4me2, H4K8ac, H3K18ac | CTCF, SMC3, MAFK, CHD1, POLR2A, DNase I, RAD21, CEBPB, MAZ, FOS, USF2, RCOR1, RFX5, ELK1, MXI1, NFE2L2 |
| Human IMR90 | H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K79me2, H3K36me3, H4K20me1, H3K27me3, H3K56ac, H3K23ac, H2AK5ac, H2A.Z, H3K9ac, H3K4me2, H4K8ac, H3K18ac | CTCF, SMC3, MAFK, CHD1, POLR2A, DNase I, RAD21, CEBPB, MAZ, FOS, USF2, RCOR1, RFX5, ELK1, MXI1, NFE2L2 |
| Mouse ES | H3K4me3, H3K27ac, H3K36me3, H3K4me1, H3K9me3, H3K27me3, H3K9ac | CTCF, POLR2A, EP300, MAFK, CHD2, HCFC1, ZC3H11A, ZNF384 |
Fig. 1 Probability of each histone modification appearing in the PROBC solution when PROBC is applied to each chromosome independently on human ES cells
Fig. 2 PROBC-inferred genome-wide interaction probabilities between histone modifications only, for human ES cells. λ1 = 2.5 and λ2 = 1.0 were selected by fivefold nested cross-validation
Top 5 histone modification interactions ranked by the difference in interaction probability between human ES and human IMR90 cells
| Interactions | Human ES | Human IMR90 | Abs. diff. |
|---|---|---|---|
| H3K27me3 - H3K36me3 | 0.196 | 0.25 | 0.05 |
| H3K9me3 - H3K79me2 | 0 | 0.04 | 0.04 |
| H3K9me3 - H3K9me3 | 0.71 | 0.67 | 0.04 |
| H3K9me3 - H4K20me1 | 0.12 | 0.09 | 0.03 |
| H3K4me2 - H3K4me3 | 0.06 | 0.04 | 0.02 |
Fig. 3 PROBC-inferred genome-wide interaction probabilities considering only transcription factor binding sites, for human ES cells
Fig. 4 AUC scores in predicting Hi-C interactions for each pair of chromosomes on human ES cells. We apply fivefold cross-validation to each pair of chromosomes
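The AUC scores reported in these figures summarize how well predicted interaction probabilities rank true contacts above non-contacts. The sketch below is only an illustration of that metric, not the authors' evaluation code: the function name `auc_score` and the toy labels and scores are hypothetical. It computes the AUC as the fraction of (contact, non-contact) pairs in which the contact receives the higher score, counting ties as half.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = observed interaction, 0 = no interaction)
    scores: iterable of predicted interaction probabilities
    Hypothetical helper for illustration; quadratic in the number of pairs.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # contact ranked above non-contact
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values, not data from the paper)
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
```

For genome-scale contact matrices a rank-based implementation from an established library would be preferable to this quadratic loop.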
Fig. 5 Real vs predicted Hi-C interactions in human ES cells, chromosome 4
Fig. 6 Comparison of predicted vs real Hi-C interaction matrices at 10 kb resolution for human ES cells, chromosome 1
Fig. 7 Hi-C prediction performance comparison of PROBC, HiC-Reg, and Rambutan by AUC score on human ES cells
Fig. 8 ROC curves comparing the Hi-C prediction performance of PROBC and distance decay heuristics with various exponents on human ES cells, chromosome 1
Fig. 9 TAD and compartment prediction performance of PROBC, measured by Normalized Variation of Information, using only histone modifications, only transcription factor binding sites, and the combination of both on human ES cells
Real vs predicted TAD boundaries in human ES cells across all chromosomes, by resolution
| Metric | 1 kb | 5 kb | 40 kb | 100 kb |
|---|---|---|---|---|
| True Positives | 10,245 | 13,452 | 12,546 | 4,724 |
| True Positive Rate (%) | 12.31 | 27.59 | 62.31 | 69.83 |
| False Negatives | 72,779 | 35,304 | 7,588 | 2,042 |
| False Negative Rate (%) | 87.69 | 72.41 | 37.69 | 30.17 |
| False Positives | 41,123 | 21,134 | 6,020 | 672 |
| Predicted TAD Boundaries | 51,368 | 34,586 | 18,566 | 5,396 |
| True TAD Boundaries | 83,204 | 48,756 | 20,134 | 6,766 |
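The rates in this table appear to be percentages of the true TAD boundaries, with false positives being the predicted boundaries that are not true positives. The sketch below recomputes those derived rows from the counts in the table under that assumption; the function name `boundary_metrics` and the dictionary layout are hypothetical, and small differences in the last decimal place may arise from rounding.

```python
def boundary_metrics(true_positives, predicted, true_total):
    """Derive rate and false-positive figures from boundary counts.

    Assumes (not stated in the source) that rates are expressed as a
    percentage of the true TAD boundaries.
    """
    tpr_pct = 100.0 * true_positives / true_total
    return {
        "tpr_pct": tpr_pct,                       # true positive rate, %
        "fnr_pct": 100.0 - tpr_pct,               # false negative rate, %
        "false_positives": predicted - true_positives,
    }


# (true positives, predicted boundaries, true boundaries) from the table
counts = {
    "1 kb": (10245, 51368, 83204),
    "5 kb": (13452, 34586, 48756),
    "40 kb": (12546, 18566, 20134),
    "100 kb": (4724, 5396, 6766),
}

metrics = {res: boundary_metrics(*c) for res, c in counts.items()}

for res, m in metrics.items():
    print(f"{res}: TPR {m['tpr_pct']:.2f}%, FNR {m['fnr_pct']:.2f}%, "
          f"FP {m['false_positives']}")
```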
Fig. 10 AUC scores for interaction prediction: a) human ES from mouse ES cells, b) mouse ES from human ES cells. We apply fivefold cross-validation to each pair of chromosomes
Fig. 11 AUC scores for interaction prediction on human ES from human IMR90 cells. We apply fivefold cross-validation to each pair of chromosomes
Fig. 12 PROBC is robust to changes in λ1 and λ2
Fig. 13 PROBC is stable across enzyme replicates and robust across resolution parameters