Ye Zheng, Sündüz Keleş.
Abstract
The ability to simulate high-throughput chromatin conformation (Hi-C) data is foundational for benchmarking Hi-C data analysis methods. Here we present a nonparametric strategy named FreeHi-C to simulate Hi-C data from the interacting genome fragments. Data from FreeHi-C exhibit high fidelity to biological Hi-C data. FreeHi-C boosts the precision and power of differential chromatin interaction detection through data augmentation under preserved false discovery rate control.
Year: 2019 PMID: 31712779 PMCID: PMC8136837 DOI: 10.1038/s41592-019-0624-3
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. FreeHi-C enables simulating high-fidelity Hi-C data.
a. FreeHi-C simulation workflow. Black arrows connect the processing procedures, and grey arrows show the data flow. b. Hierarchical clustering of the original Hi-C biological replicates and the FreeHi-C simulated replicates for the ring, trophozoite, and schizont stages of P. falciparum. Heatmap clustering is obtained with the inherited R function hcluster in the pheatmap package using the default parameters. Distance is quantified by HiCRep[13]. c. Pearson correlation analysis of the A/B compartment eigenvector between the seed biological replicate (delineated at the top of each panel) and FreeHi-C simulations at 1 × and 5 × the original sequencing depth, and other biological replicates. The A/B compartment eigenvector is calculated by CscoreTool[19] (n = 3). d. Jaccard index of the TADs detected using the seed biological replicate (delineated at the top of each panel) and FreeHi-C simulations at 1 × and 5 × the original sequencing depth, and other biological replicates. TAD boundaries are detected using the Insulation Score[20] (n = 3). e, f. Hierarchical clustering of the FreeHi-C simulated (e) and downsampled (f) replicates matching the sequencing depth of the original P. falciparum trophozoite stage sample. Distance is calculated by HiCRep[13]. g. HiCRep[13] reproducibility of the contact matrices between pairs of biological replicates of GM12878 simulated by FreeHi-C (orange) or downsampled (purple) to 0.5 × the sequencing depth of replicate6, 1 × the sequencing depth of replicate6, and the sequencing depths of replicate4, replicate2, and then replicate3, respectively (n = 4). In c, d, and g, the center lines indicate medians, and box limits indicate the 25th and 75th percentiles. The upper whisker extends from the hinge to the largest value no further than 1.5 × the inter-quartile range from the hinge. The lower whisker extends from the hinge to the smallest value no further than 1.5 × the inter-quartile range from the hinge. Data beyond the end of the whiskers are outlying points and are plotted individually.
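The whisker rule described in this caption can be made concrete with a small sketch. The helper names `percentile` and `box_stats` are hypothetical, and the linear-interpolation quantile convention is an assumption; the plotting software used for the figure may compute hinges slightly differently.

```python
def percentile(sorted_vals, q):
    """Quantile by linear interpolation between order statistics.

    Assumption: this convention is chosen for illustration only.
    """
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def box_stats(values):
    """Return median, hinges, whisker ends, and outlying points."""
    xs = sorted(values)
    q1 = percentile(xs, 0.25)
    median = percentile(xs, 0.50)
    q3 = percentile(xs, 0.75)
    iqr = q3 - q1
    # Whiskers reach the most extreme data points within 1.5 x IQR of each hinge.
    upper = max(x for x in xs if x <= q3 + 1.5 * iqr)
    lower = min(x for x in xs if x >= q1 - 1.5 * iqr)
    # Anything beyond the whiskers is plotted individually as an outlier.
    outliers = [x for x in xs if x < lower or x > upper]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower,
        "upper_whisker": upper,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Made-up numbers, not data from the paper.
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

Note that the whiskers end at observed data values, not at the fence positions themselves, which is why the upper whisker can sit well below the 1.5 × inter-quartile-range limit.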
Figure 2. Data augmentation with FreeHi-C simulated replicates improves differential chromatin interaction (DCI) detection.
a (n = 16) and b (n = 16) refer to one replicate per condition (ORPC) settings. c (n = 3) and d (n = 16) refer to multiple replicates per condition (MRPC) settings. a (n = 16) and c (n = 3) delineate observed false discovery rates of within-sample comparisons for A549 data (i.e., comparisons of replicate(s) of A549 with other replicate(s) of A549). The dashed lines are y = x. b (n = 16) and d (n = 16) display precision, computed as the percentage of top significant DCIs of each specific analysis that appear in the gold standard differential chromatin interaction list, as a function of the number of top-ranking DCIs. The gold standard set is defined by comparing the full set of 4 replicates of GM12878 with 4 replicates of A549, filtered by FDR ≤ 0.01. |logFC| refers to the absolute value of the natural-log-transformed fold-change. Differential chromatin interaction detection is performed by HiCcompare[10], which converts the normalized contact counts into Z-scores, and by multiHiCcompare[17], which uses a quasi-likelihood negative binomial generalized log-linear model (one-sided test). The p-values are adjusted by the Benjamini-Hochberg procedure[18] for multiple comparisons. For all the boxplots in this figure, the center lines correspond to the medians, box limits correspond to the 25th and 75th percentiles, and whiskers comprise all data points within 1.5 × the inter-quartile range.
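As an illustration of two quantities named in this caption, the sketch below applies a Benjamini-Hochberg adjustment to a list of p-values and computes precision among the top-ranking calls against a gold standard set. It is a minimal sketch, not the HiCcompare or multiHiCcompare implementation; the function names, the integer interaction indices, and the example numbers are all hypothetical, and precision is returned as a fraction where the figure reports a percentage.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def precision_at_top_k(pvalues, gold_standard, k):
    """Fraction of the k most significant calls found in the gold standard.

    `gold_standard` is a set of interaction indices (hypothetical encoding).
    """
    top = sorted(range(len(pvalues)), key=lambda i: pvalues[i])[:k]
    hits = sum(1 for i in top if i in gold_standard)
    return hits / k


if __name__ == "__main__":
    # Made-up p-values for four candidate interactions, not data from the paper.
    pvals = [0.01, 0.04, 0.03, 0.005]
    gold = {2, 3}
    print(benjamini_hochberg(pvals))
    for k in (1, 2, 3):
        print(k, precision_at_top_k(pvals, gold, k))
```

Sweeping `k` over a range of top-ranking cutoffs yields the kind of precision curve the caption describes for panels b and d.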