Hamid Alinejad-Rokny, Rassa Ghavami Modegh, Hamid R Rabiee, Ehsan Ramezani Sarbandi, Narges Rezaie, Kin Tung Tam, Alistair R R Forrest.
Abstract
Hi-C is a genome-wide chromosome conformation capture technology that detects interactions between pairs of genomic regions and probes higher-order chromatin structure. Conceptually, Hi-C data count interaction frequencies between every position in the genome and every other position. Biologically functional interactions are expected to occur more frequently than transient background and artefactual interactions. To identify biologically relevant interactions, several background models that account for biases such as distance, GC content and mappability have been proposed. Here we introduce MaxHiC, a background correction tool that deals with these complex biases and robustly identifies statistically significant interactions in both Hi-C and capture Hi-C experiments. MaxHiC uses a negative binomial distribution model and a maximum likelihood technique to correct biases in both Hi-C and capture Hi-C libraries. We systematically benchmark MaxHiC against major Hi-C background correction tools, including Hi-C significant interaction callers (SICs) and Hi-C loop callers, using published Hi-C, capture Hi-C, and Micro-C datasets. Our results demonstrate that 1) interacting regions identified by MaxHiC have significantly greater overlap with known regulatory features (e.g. active chromatin histone marks, CTCF binding sites, DNase sensitivity) and with disease-associated SNPs from genome-wide association studies than those identified by existing models, 2) the pairs of interacting regions are more likely to be linked by eQTL pairs, and 3) they are more likely to link known regulatory features, including known functional enhancer-promoter pairs validated by CRISPRi, than the interactions identified by any of the existing methods. We also demonstrate that interactions between different genomic region types have distinct distance distributions that only MaxHiC reveals. MaxHiC is publicly available as a Python package for the analysis of Hi-C, capture Hi-C and Micro-C data.
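The core statistical idea named above, scoring an observed read count against a negative binomial background fitted by maximum likelihood, can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not MaxHiC's actual model: it fits a single pooled negative binomial (mean `mu`, size `r`) to simulated background counts, whereas MaxHiC also models distance and region-specific biases. The helper names `fit_nb_mle` and `upper_tail_pvalue` are hypothetical; only the NumPy and SciPy calls are real library APIs.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln
from scipy.stats import nbinom

# Illustrative sketch only: one pooled NB background, not MaxHiC's full bias model.


def nb_loglik(counts, mu, r):
    """Log-likelihood of counts under a negative binomial with mean mu and size r."""
    k = np.asarray(counts, dtype=float)
    return float(np.sum(
        gammaln(k + r) - gammaln(r) - gammaln(k + 1)
        + r * np.log(r / (r + mu))
        + k * np.log(mu / (r + mu))
    ))


def fit_nb_mle(counts):
    """Hypothetical helper: maximum-likelihood fit of (mu, r) to background counts.

    Assumes a single pooled background (no distance or bin-specific bias terms).
    The MLE of mu is the sample mean, so only r needs a 1-D search on log(r).
    """
    k = np.asarray(counts, dtype=float)
    mu = float(k.mean())
    res = minimize_scalar(lambda log_r: -nb_loglik(k, mu, np.exp(log_r)),
                          bounds=(-10.0, 10.0), method="bounded")
    return mu, float(np.exp(res.x))


def upper_tail_pvalue(observed, mu, r):
    """Hypothetical helper: P(X >= observed) under the fitted background."""
    p = r / (r + mu)  # SciPy's nbinom(n=r, p) has mean r * (1 - p) / p = mu
    return float(nbinom.sf(observed - 1, r, p))


# Usage on simulated background counts (assumed: NB with mean 5 and size 2).
rng = np.random.default_rng(0)
background = rng.negative_binomial(2.0, 2.0 / 7.0, size=5000)
mu_hat, r_hat = fit_nb_mle(background)
p_obs = upper_tail_pvalue(40, mu_hat, r_hat)
print(f"mu={mu_hat:.2f} r={r_hat:.2f} p={p_obs:.2e} significant={p_obs < 0.001}")
```

The 0.001 cut-off in the usage line mirrors the P-value threshold quoted for the significant interaction callers in this entry's summary table; fitting the mean analytically and searching only over log(r) keeps the optimisation one-dimensional and the size parameter positive.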
Year: 2022 PMID: 35749574 PMCID: PMC9262194 DOI: 10.1371/journal.pcbi.1010241
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
Statistical summary of significant interactions identified by MaxHiC, GOTHiC, Fit-Hi-C, Fit-Hi-C2, HiCDC+, HiCCUPS, Peakachu, and Mustache in the GM12878 Hi-C library from Rao et al. [5] at bin sizes of 1kb, 5kb and 10kb.
Statistical summaries for the HMEC library are provided in , respectively.
| Bin size | Statistic | MaxHiC | GOTHiC | Fit-Hi-C | Fit-Hi-C2 | HiCDC+ | HiCCUPS | Peakachu | Mustache | All |
|---|---|---|---|---|---|---|---|---|---|---|
| 1kb | Percentage of significant interactions | 0.03 | 9.4 | 16.6 | 17.1 | 0.05 | - | - | - | 100 |
| 1kb | Average read-count of interactions | 18.1 | 2.9 | 1.1 | 1.5 | 11.8 | - | - | - | 1.2 |
| 1kb | Min read-count of cis interactions | 4 | 2 | 1 | 1 | 4 | - | - | - | 1 |
| 1kb | Median distance of cis interactions (kb) | 3 | 14 | 30832 | 29426 | 5 | - | - | - | 544 |
| 5kb | Percentage of significant interactions | 0.17 | 9.25 | 0.8 | 0.9 | 0.07 | 0.0095 | 0.0172 | 0.0091 | 100 |
| 5kb | Average read-count of interactions | 21.3 | 7.85 | 5.6 | 5.9 | 16.7 | 32.39 | 28.49 | 36.16 | 1.77 |
| 5kb | Min read-count of cis interactions | 3 | 2 | 1 | 1 | 3 | 11 | 10 | 14 | 1 |
| 5kb | Median distance of cis interactions (kb) | 130 | 85 | 5235 | 5014 | 245 | 110 | 145 | 165 | 15173 |
| 10kb | Percentage of significant interactions | 0.32 | 7.32 | 1.06 | 1.1 | 0.11 | 0.0094 | 0.0163 | 0.0066 | 100 |
| 10kb | Average read-count of interactions | 29.4 | 15.17 | 12.3 | 13.5 | 26.6 | 58.62 | 46.37 | 62.74 | 2.38 |
| 10kb | Min read-count of interactions | 3 | 2 | 2 | 2 | 5 | 11 | 9 | 16 | 1 |
| 10kb | Median distance of cis interactions (kb) | 270 | 170 | 7590 | 7426 | 480 | 150 | 230 | 330 | 6990 |
* HiCCUPS, Peakachu, and Mustache call interactions with FDR < 0.05 significant. For all other methods, we used a threshold of P-value < 0.001 to identify significant interactions.
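The four statistics in the table, and the two kinds of threshold in the footnote, can be computed from a per-pair call list with a short Python sketch. It assumes a hypothetical input of parallel arrays (read count, cis distance in kb with `NaN` for trans pairs, and one P-value per bin pair) and uses the Benjamini-Hochberg procedure as a generic stand-in for an FDR cut-off. HiCCUPS, Peakachu and Mustache each estimate FDR in their own way, so this does not reproduce their calls; the helper names are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted q-values (a generic FDR stand-in, an assumption)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Step-up: running minima from the largest p-value keep q non-decreasing in p.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def summarize_calls(counts, distances_kb, significant):
    """Hypothetical helper: the four table rows for one caller at one bin size.

    distances_kb holds the genomic distance of cis pairs and NaN for trans pairs.
    """
    counts = np.asarray(counts, dtype=float)
    distances_kb = np.asarray(distances_kb, dtype=float)
    sig = np.asarray(significant, dtype=bool)
    cis_sig = sig & ~np.isnan(distances_kb)
    return {
        "percent_significant": 100.0 * sig.sum() / sig.size,
        "mean_read_count": float(counts[sig].mean()),
        "min_cis_read_count": float(counts[cis_sig].min()),
        "median_cis_distance_kb": float(np.median(distances_kb[cis_sig])),
    }


# Toy input (hypothetical values), one entry per bin pair.
counts = np.array([1, 2, 40, 3, 55, 1, 8, 60])
distances_kb = np.array([10, 500, 20, 1500, 30, np.nan, 100, 40])
pvals = np.array([0.9, 0.5, 1e-6, 0.3, 1e-8, 0.8, 0.01, 1e-7])

by_pvalue = summarize_calls(counts, distances_kb, pvals < 0.001)                  # P-value style
by_fdr = summarize_calls(counts, distances_kb, benjamini_hochberg(pvals) < 0.05)  # FDR style
everything = summarize_calls(counts, distances_kb, np.ones(counts.size, bool))    # "All" column
print(by_pvalue, by_fdr, everything, sep="\n")
```

Passing a mask that is true for every pair yields the "All" column, whose percentage is 100 by construction.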
Fig 5. Aggregate peak analysis (APA) of significant interactions called by MaxHiC and HiCCUPS.
Interactions were called at 10kb resolution using the GM12878 Hi-C dataset from Rao et al. The analysis shows the aggregate interaction frequency of the significantly interacting bins together with the local background, i.e. the 5 bins upstream and downstream of them. For HiCCUPS, the bins at the centroids of the loop regions are used, and the APA is shown for all 5,348 interactions. For MaxHiC, APA plots are shown for the top 5k, 10k, 50k and all 243k significant interactions.
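The APA computation described in this caption can be sketched as summing the 11 × 11-bin window of the contact matrix centred on every called interaction. The Python below is a minimal illustration, not the implementation used for Fig 5: it assumes a dense, symmetric, intra-chromosomal contact matrix and a hypothetical list of (row, column) bin indices for the calls (for HiCCUPS these would be the loop-centroid bins), and it simply skips windows that would run off the matrix edge.

```python
import numpy as np


def aggregate_peak_analysis(contacts, calls, flank=5):
    """Hypothetical helper: sum the (2*flank+1)-square windows centred on each call.

    contacts : dense symmetric contact matrix for one chromosome (assumed input)
    calls    : iterable of (i, j) bin indices of significant interactions
    flank    : bins upstream and downstream of each anchor (5, as in Fig 5)
    Returns the aggregate matrix and the number of calls that contributed;
    calls whose window would cross the matrix edge are skipped.
    """
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    size = 2 * flank + 1
    apa = np.zeros((size, size))
    used = 0
    for i, j in calls:
        if min(i, j) - flank < 0 or max(i, j) + flank >= n:
            continue  # window would fall off the matrix
        apa += contacts[i - flank:i + flank + 1, j - flank:j + flank + 1]
        used += 1
    return apa, used


# Toy example: Poisson background with three planted "loops" (hypothetical data).
rng = np.random.default_rng(1)
m = rng.poisson(2.0, size=(100, 100)).astype(float)
m = np.triu(m) + np.triu(m, 1).T  # make the matrix symmetric
calls = [(20, 50), (30, 70), (40, 80), (2, 60)]  # (2, 60) is too close to the edge
for i, j in calls[:3]:
    m[i, j] = m[j, i] = 50.0
apa, used = aggregate_peak_analysis(m, calls)
print(used, apa[5, 5], apa.mean())
```

Producing the top 5k, 10k, 50k and all-call panels then amounts to passing progressively longer prefixes of a ranked call list.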
Fig 1. Background modelling as a function of distance in MaxHiC and other Hi-C tools.
a) Average read-count of significant (blue) and insignificant (red) interactions at different genomic distances, identified by different Hi-C interaction callers at 10kb bin size on the Rao et al. [5] GM12878 data. b) Number of significant interactions identified by the five Hi-C interaction callers at different genomic distances on the Rao et al. GM12878 sample at 10kb bin size.
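Both panels of Fig 1 are distance-stratified summaries of one call set: the mean read-count of significant and insignificant pairs per distance bin (panel a) and the number of significant pairs per bin (panel b). A minimal Python sketch, assuming hypothetical parallel arrays of cis distances (kb), read counts and a significance mask, and arbitrarily chosen distance-bin edges; the helper name `distance_profile` is hypothetical.

```python
import numpy as np


def distance_profile(distances_kb, counts, significant, edges_kb):
    """Hypothetical helper: per distance bin, the mean read-count of significant and
    insignificant pairs (Fig 1a) and the number of significant pairs (Fig 1b).

    Bins are half-open [edges_kb[k], edges_kb[k + 1]); pairs outside all bins are
    ignored and empty groups yield NaN.
    """
    d = np.asarray(distances_kb, dtype=float)
    c = np.asarray(counts, dtype=float)
    sig = np.asarray(significant, dtype=bool)
    idx = np.digitize(d, edges_kb) - 1  # distance-bin index of each pair
    n_bins = len(edges_kb) - 1
    mean_sig = np.full(n_bins, np.nan)
    mean_insig = np.full(n_bins, np.nan)
    n_sig = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = idx == k
        if np.any(in_bin & sig):
            mean_sig[k] = c[in_bin & sig].mean()
        if np.any(in_bin & ~sig):
            mean_insig[k] = c[in_bin & ~sig].mean()
        n_sig[k] = int(np.count_nonzero(in_bin & sig))
    return mean_sig, mean_insig, n_sig


# Toy cis pairs (hypothetical values) and two illustrative distance bins in kb.
distances_kb = np.array([10, 20, 15, 150, 300, 800, 900, 950])
counts = np.array([30, 5, 40, 12, 3, 9, 1, 2])
significant = np.array([True, False, True, True, False, True, False, False])
mean_sig, mean_insig, n_sig = distance_profile(
    distances_kb, counts, significant, edges_kb=[0, 100, 1000])
print(mean_sig, mean_insig, n_sig)
```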