John C Stansfield, Kellen G Cresswell, Vladimir I Vladimirov, Mikhail G Dozmorov.
Abstract
BACKGROUND: Changes in spatial chromatin interactions are now emerging as a unifying mechanism orchestrating the regulation of gene expression. Hi-C sequencing technology allows insight into chromatin interactions on a genome-wide scale. However, Hi-C data contain many DNA sequence- and technology-driven biases. These biases prevent effective comparison of chromatin interactions aimed at identifying genomic regions that interact differentially between, e.g., disease and normal states or different cell types. Several methods have been developed for normalizing individual Hi-C datasets. However, they fail to account for biases between two or more Hi-C datasets, hindering comparative analysis of chromatin interactions.
Keywords: Chromosome conformation capture; Comparison; Differential analysis; Hi-C; HiCcompare; Normalization
Year: 2018 PMID: 30064362 PMCID: PMC6069782 DOI: 10.1186/s12859-018-2288-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 HiCcompare flow chart. Processed Hi-C libraries in the form of sparse upper triangular matrices are the starting data type for HiCcompare. The data are then plotted on the MD plot, and a loess model is fit to remove bias between the libraries. Next, the filtering threshold is determined. Finally, the libraries are compared for differences and plotted again on the MD plot
Fig. 2 Distance-centric (off-diagonal) view of chromatin interaction matrices. Each off-diagonal vector of interaction frequencies represents interactions at a given distance between pairs of regions. Triangles mark pairs of genomic regions interacting at the same distance. Data for chromosome 1, K562 cell line, 50 KB resolution, spanning 0–7.5 Mb is shown
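The distance-centric view in Fig. 2 can be illustrated with a minimal sketch. It assumes the contact matrix is a dense, square list of lists; the helper name `off_diagonal` and the toy counts are hypothetical and are not taken from HiCcompare.

```python
def off_diagonal(matrix, k):
    """Return the k-th off-diagonal of a square matrix (list of lists).

    Entry i of the result is matrix[i][i + k], the interaction
    frequency between bins i and i + k, which lie k bins apart.
    """
    n = len(matrix)
    return [matrix[i][i + k] for i in range(n - k)]


# Toy symmetric 4x4 contact matrix (made-up counts, not real Hi-C data).
contacts = [
    [90, 40, 12, 5],
    [40, 85, 38, 10],
    [12, 38, 95, 42],
    [5, 10, 42, 88],
]

resolution_bp = 50_000  # bin size in base pairs; 50 KB as in the figure

# Group interaction frequencies by genomic distance in base pairs.
by_distance = {
    k * resolution_bp: off_diagonal(contacts, k)
    for k in range(len(contacts))
}
```

Every vector in `by_distance` collects bin pairs separated by the same distance, which is the grouping the triangles in the figure mark.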
Fig. 3 MD plot data visualization and the effects of different normalization techniques. MD plots of the differences M between two replicated Hi-C datasets (GM12878 cell line, chromosome 11, 1 MB resolution, DpnII and MboI restriction enzymes) plotted vs. distance D between interacting regions. a Before normalization, b after loess joint normalization, c ChromoR, d Iterative Correction and Eigenvector decomposition (ICE), e Knight-Ruiz (KR), f Sequential Component Normalization (SCN). The general shift of the data above M = 0 is due to one of the Hi-C libraries having more total reads. The trends emphasized by the loess curve superimposed on the data are due to distance-dependent between-dataset biases, which only HiCcompare’s joint normalization procedure successfully removes
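The MD coordinates can be sketched as follows, under two assumptions that the caption does not spell out: M is taken to be the log2 ratio of the two interaction frequencies (the MA-plot convention), and D is the distance in bins. The function names are hypothetical. The second function is not a loess fit; it subtracts the mean M at each distance, a much cruder stand-in that only shows how a distance-dependent trend is removed jointly from a pair of datasets.

```python
import math
from collections import defaultdict


def md_values(sparse1, sparse2, resolution):
    """Compute (D, M) for every bin pair present in both datasets.

    sparse1, sparse2: dicts mapping (start_i, start_j) -> interaction
    frequency, a sparse upper triangular layout with start_i <= start_j.
    D is the distance in bins; M = log2(IF2 / IF1) (assumed definition).
    """
    points = []
    for start_i, start_j in sorted(sparse1.keys() & sparse2.keys()):
        d = (start_j - start_i) // resolution
        m = math.log2(sparse2[(start_i, start_j)] / sparse1[(start_i, start_j)])
        points.append((d, m))
    return points


def center_by_distance(points):
    """Subtract the mean M observed at each distance D.

    Crude stand-in for a loess curve: it removes a per-distance offset
    but does not smooth across neighbouring distances.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, m in points:
        sums[d] += m
        counts[d] += 1
    return [(d, m - sums[d] / counts[d]) for d, m in points]


# Toy libraries at 1 MB resolution; lib2 has roughly twice the reads.
lib1 = {(0, 0): 100, (0, 1_000_000): 40, (1_000_000, 2_000_000): 20, (0, 2_000_000): 10}
lib2 = {(0, 0): 200, (0, 1_000_000): 80, (1_000_000, 2_000_000): 80, (0, 2_000_000): 20}

points = md_values(lib1, lib2, resolution=1_000_000)
centered = center_by_distance(points)
```

In the toy data every M is positive before centering, mirroring the upward shift the caption attributes to unequal total read counts.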
Evaluation of the effect of normalization on differential chromatin interaction detection
| Fold change | HiCcompare | MA | ICE | SCN | KR | ChromoR |
|---|---|---|---|---|---|---|
| 2 | 0.847 | 0.823 | 0.835 | 0.768 | 0.748 | 0.149 |
| 3 | 0.973 | 0.934 | 0.802 | 0.721 | 0.764 | 0.380 |
| 4 | 0.995 | 0.98 | 0.953 | 0.881 | 0.868 | 0.532 |
Matthews Correlation Coefficient of detecting 200 controlled differences in jointly (HiCcompare) vs. individually normalized GM12878 datasets, chromosome 1, 1 MB resolution. Matrices were normalized with methods corresponding to column labels; differences were detected using HiCcompare
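For reference, the Matthews Correlation Coefficient reported in the table can be computed from a confusion matrix as in the sketch below. The function name and the example counts are hypothetical; they are not the counts behind the table.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to 1 (perfect detection).
    Returns 0.0 when any marginal total is zero and the ratio is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical outcome: 200 true differences among 10,000 tested pairs,
# of which 180 are detected, with 10 false detections.
mcc = matthews_corrcoef(tp=180, fp=10, tn=9790, fn=20)
```

Because true negatives vastly outnumber true differences in this setting, MCC is more informative than plain accuracy, which would be close to 1 even for a detector that finds nothing.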
Gene enrichment results for HiCcompare analyses
| Pathway | 1 MB | 100 KB | 50 KB |
|---|---|---|---|
| Systemic lupus erythematosus | 3.807e-06 | 6.302e-17 | 1.025e-02 |
| Antigen processing and presentation | 3.807e-06 | 6.808e-01 | 9.974e-01 |
| | 8.170e-03 | 2.354e-01 | 7.604e-01 |
| Viral myocarditis | 8.170e-03 | 1.038e-01 | 9.657e-01 |
| Allograft rejection | 8.170e-03 | 1.518e-01 | 9.974e-01 |
| Viral carcinogenesis | 3.327e-02 | 3.659e-08 | 3.273e-01 |
| Pathways in cancer | 9.162e-01 | 2.236e-02 | 9.409e-01 |
KEGG pathways and their corresponding FDR-corrected p-values for the enrichment analyses of HiCcompare-detected differences at 1 MB, 100 KB, and 50 KB resolutions. Differentially interacting regions detected by HiCcompare were intersected with gene locations, and the overlapping genes were tested for enrichment using EnrichR [37]
Similarity between A/B compartments detected following various normalization methods
| Comparison | Mean Absolute Correlation | Mean Percentage | Jaccard A | Jaccard B |
|---|---|---|---|---|
| Loess vs. Raw | 0.9954 | 0.8537 | 0.7971 | 0.7823 |
| MA vs. Raw | 0.9950 | 0.8539 | 0.7881 | 0.7706 |
| ICE vs. Raw | 0.9795 | 0.7850 | 0.6731 | 0.6277 |
| KR vs. Raw | 0.9489 | 0.7771 | 0.5945 | 0.5000 |
| SCN vs. Raw | 0.9309 | 0.8083 | 0.6134 | 0.5495 |
| ChromoR vs. Raw | 0.8093 | 0.6810 | 0.5210 | 0.4803 |
“Mean Absolute Correlation” - Pearson correlation coefficient between principal components defining A/B compartments in raw vs. normalized Hi-C data; “Mean Percentage” - the proportion of regions with matching signs defining A/B compartments; “Jaccard A” and “Jaccard B” - Jaccard overlap statistics for A and B compartments, respectively. All values represent averages over all chromosomes
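The sign-matching and Jaccard statistics can be sketched as follows. The sketch assumes that positive principal-component values mark compartment A and negative values mark compartment B, and that both vectors are already oriented the same way; the function names and toy values are hypothetical.

```python
def jaccard(set_a, set_b):
    """Jaccard overlap |A ∩ B| / |A ∪ B|; 0.0 by convention if both are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def compartment_similarity(pc_raw, pc_norm):
    """Compare A/B calls derived from two principal-component vectors.

    Assumes positive values are compartment A and negative values are
    compartment B. Returns (proportion of regions with matching signs,
    Jaccard overlap for A, Jaccard overlap for B).
    """
    a_raw = {i for i, v in enumerate(pc_raw) if v > 0}
    a_norm = {i for i, v in enumerate(pc_norm) if v > 0}
    b_raw = {i for i, v in enumerate(pc_raw) if v < 0}
    b_norm = {i for i, v in enumerate(pc_norm) if v < 0}
    matches = sum(
        1 for x, y in zip(pc_raw, pc_norm)
        if (x > 0 and y > 0) or (x < 0 and y < 0)
    )
    return matches / len(pc_raw), jaccard(a_raw, a_norm), jaccard(b_raw, b_norm)


# Toy principal components for six regions (made-up values).
pc_raw = [0.5, 0.2, -0.1, -0.4, 0.3, -0.6]
pc_norm = [0.4, -0.1, -0.2, -0.3, 0.2, -0.5]

prop_match, jaccard_a, jaccard_b = compartment_similarity(pc_raw, pc_norm)
```

Since the sign of a principal component is arbitrary, a real analysis would first align the orientation of the two vectors, which is presumably why the table reports the correlation in absolute value.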