Ferhat Ay, William S Noble.
Abstract
The rapidly increasing quantity of genome-wide chromosome conformation capture data presents great opportunities and challenges in the computational modeling and interpretation of the three-dimensional genome. In particular, with recent trends towards higher-resolution high-throughput chromosome conformation capture (Hi-C) data, the diversity and complexity of biological hypotheses that can be tested necessitate rigorous computational and statistical methods as well as scalable pipelines to interpret these datasets. Here we review computational tools to interpret Hi-C data, including pipelines for mapping, filtering, and normalization, and methods for confidence estimation, domain calling, visualization, and three-dimensional modeling.
Year: 2015 PMID: 26328929 PMCID: PMC4556012 DOI: 10.1186/s13059-015-0745-7
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Overview of Hi-C analysis pipelines. These pipelines start from raw reads and produce raw and normalized contact maps for further interpretation. The colored boxes represent alternative ways to accomplish a given step in the pipeline. RE, restriction enzyme. At each step, commonly used file formats (‘.fq’, ‘.bam’, and ‘.txt’) are indicated. a, The blue, pink and green boxes correspond to pre-truncation, iterative mapping and allowing split alignments, respectively. b, Several filters are applied to individual reads. c, The blue and pink boxes correspond to strand filters and distance filters, respectively. d, Three alternative methods for normalization.
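The last step before normalization, turning filtered read pairs into a raw contact map, can be sketched as follows. This is a minimal illustration and not the code of any tool reviewed here: the function name `bin_read_pairs`, the `(chrom, pos1, pos2)` input format, and the fixed-size binning are all assumptions made for the example.

```python
from collections import Counter


def bin_read_pairs(pairs, resolution):
    """Count intra-chromosomal read pairs per pair of fixed-size bins.

    pairs: iterable of (chrom, pos1, pos2) tuples with 0-based positions
           (a hypothetical, simplified input format).
    resolution: bin width in base pairs, e.g. 10_000 for 10 kb.
    Returns a Counter keyed by (chrom, bin_i, bin_j) with bin_i <= bin_j.
    """
    counts = Counter()
    for chrom, pos1, pos2 in pairs:
        # Order the two bins so that each contact is stored once.
        bin_i, bin_j = sorted((pos1 // resolution, pos2 // resolution))
        counts[(chrom, bin_i, bin_j)] += 1
    return counts


# Toy read pairs on one chromosome (made-up coordinates).
pairs = [
    ("chr8", 1_200, 25_300),
    ("chr8", 24_000, 3_500),
    ("chr8", 12_000, 18_000),
]
contact_map = bin_read_pairs(pairs, resolution=10_000)
```

Storing only the upper triangle keeps the map sparse, which matters at high resolution where most bin pairs have no contacts.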
Fig. 2 Impact of normalization on Hi-C contact maps. a, b Hi-C contact maps of chromosome 8 from the schizont stage of the parasite Plasmodium falciparum [16] at 10 kb resolution before and after normalization. Blue dashed lines represent the centromere location. c, d Density scatter plots of counts before (x-axis) and after (y-axis) normalization of Hi-C data from the human cell line IMR90 [15] at two different resolutions. Correlation values are computed using all intra-chromosomal contacts within human chromosome 8. Only a subset of points is shown for visualization purposes.
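The kind of before/after comparison shown in this figure can be sketched on a toy matrix. The example below assumes a simple iterative matrix-balancing scheme (repeatedly dividing each entry by the relative coverage of its row and column) and a Pearson correlation over the upper triangle; the function names, the toy counts, and the convergence settings are illustrative assumptions, not the procedure used to produce the figure.

```python
import math


def balance_matrix(matrix, max_iter=500, tol=1e-9):
    """Iteratively rescale a symmetric count matrix until all row sums are equal.

    Returns (balanced, bias) where raw[i][j] ~= balanced[i][j] * bias[i] * bias[j].
    """
    n = len(matrix)
    balanced = [list(map(float, row)) for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        row_sums = [sum(row) for row in balanced]
        mean_sum = sum(row_sums) / n
        if mean_sum == 0:
            break
        # Relative coverage of each bin; empty bins are left untouched.
        scale = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return balanced, bias


def upper_triangle(matrix):
    """Flatten the upper triangle (including the diagonal) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy symmetric contact map (made-up counts).
raw = [
    [10, 4, 1],
    [4, 8, 3],
    [1, 3, 12],
]
normalized, bias = balance_matrix(raw)
r = pearson(upper_triangle(raw), upper_triangle(normalized))
```

After balancing, every bin has the same total coverage, so any remaining differences between entries reflect something other than per-bin visibility.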
Software tools for Hi-C data analysis
| Tool | Short-read aligner(s) | Mapping improvement | Read filtering | Read-pair filtering | Normalization | Visualization | Confidence estimation | Implementation language(s) |
|---|---|---|---|---|---|---|---|---|
| HiCUP [ | Bowtie/Bowtie2 | Pre-truncation | ✓ | ✓ | − | − | − | Perl, R |
| Hiclib [ | Bowtie2 | Iterative | ✓ | ✓ | Matrix balancing | ✓ | − | Python |
| HiC-inspector [ | Bowtie | − | ✓ | ✓ | − | ✓ | − | Perl, R |
| HIPPIE [ | STAR | ✓ | ✓ | ✓ | − | − | − | Python, Perl, R |
| HiC-Box [ | Bowtie2 | − | ✓ | ✓ | Matrix balancing | ✓ | − | Python |
| HiCdat [ | Subread | − | ✓ | ✓ | Three options | ✓ | − | C++, R |
| HiC-Pro [ | Bowtie2 | Trimming | ✓ | ✓ | Matrix balancing | − | − | Python, R |
| TADbit [ | GEM | Iterative | ✓ | ✓ | Matrix balancing | ✓ | − | Python |
| HOMER [ | − | − | ✓ | ✓ | Two options | ✓ | ✓ | Perl, R, Java |
| Hicpipe [ | − | − | − | − | Explicit-factor | − | − | Perl, R, C++ |
| HiBrowse [ | − | − | − | − | − | ✓ | ✓ | Web-based |
| Hi-Corrector [ | − | − | − | − | Matrix balancing | − | − | ANSI C |
| GOTHiC [ | − | − | ✓ | ✓ | − | − | ✓ | R |
| HiTC [ | − | − | − | − | Two options | ✓ | ✓ | R |
| chromoR [ | − | − | − | − | Variance stabilization | − | − | R |
| HiFive [ | − | − | ✓ | ✓ | Three options | ✓ | − | Python |
| Fit-Hi-C [ | − | − | − | − | − | ✓ | ✓ | Python |
a Hiclib keeps the reads with only one mapped end (single-sided reads) for use in coverage computations.
b HIPPIE states that it rescues chimeric reads. No details are given.
c HiCdat reports no substantial improvement in successfully aligned read pairs when iterative mapping in Hiclib is used for Arabidopsis thaliana Hi-C data.
d HiCdat provides three options for normalization: coverage and distance correction, HiCNorm and ICE.
e HOMER provides two options for normalization: simpleNorm corrects for sequencing coverage only and norm corrects for coverage plus the genomic distance between loci.
f HiTC provides two options for normalization: normLGF implements HiCNorm and normICE implements the ICE algorithm from Hiclib.
g HiFive provides three options (Probability, Express, and Binning) for normalization. The Express and Binning algorithms correspond to matrix balancing and explicit-factor correction schemes, respectively.
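The difference between correcting for coverage only and correcting for coverage plus genomic distance, as described in the footnotes above, can be illustrated with a small sketch. This is not the implementation used by HOMER, HiCdat, or any other tool in the table: both functions, their names, and the toy counts are assumptions chosen to show the general idea of dividing observed counts by an expected value.

```python
def coverage_normalize(matrix):
    """Observed / expected, where expected assumes contacts depend only on coverage.

    expected[i][j] = coverage[i] * coverage[j] / total
    """
    n = len(matrix)
    coverage = [sum(row) for row in matrix]
    total = sum(coverage)
    return [
        [
            matrix[i][j] * total / (coverage[i] * coverage[j])
            if coverage[i] and coverage[j]
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def distance_normalize(matrix):
    """Divide each entry by the mean of all entries at the same bin distance."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [
            matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy symmetric contact map (made-up counts).
raw = [
    [10, 4, 1],
    [4, 8, 3],
    [1, 3, 12],
]
coverage_only = coverage_normalize(raw)
coverage_and_distance = distance_normalize(coverage_only)
```

In the second result, values near 1 indicate contacts that occur about as often as expected for loci at that separation, so enrichment is no longer dominated by the strong decay of contact frequency with distance.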
Fig. 3 Visualization of Hi-C data. An Epigenome Browser snapshot of a 4 Mb region of human chromosome 10. Top track shows Refseq genes. All other tracks display data from the human lymphoblastoid cell line GM12878. From top to bottom these tracks are: smoothed CTCF signal from ENCODE [130]; significant contact calls by Fit-Hi-C using 1 kb resolution Hi-C data (only the contacts >50 kb distance and −log(p-value) ≤ 25 are shown) [20]; arrowhead domain calls at 5 kb resolution [18]; Armatus multiscale domain calls for three different values of the domain-length scaling factor γ [87]; DI HMM TAD calls at 50 kb resolution [15]; and the heatmap of 10 kb resolution normalized contact counts for GM12878 Hi-C data [18]. The color scale of the heatmap is truncated to the range 20 to 400, with higher contact counts corresponding to a darker color.