Koustav Pal, Mattia Forcato, Francesco Ferrari.
Abstract
In the epigenetics field, large-scale functional genomics datasets of ever-increasing size and complexity have been produced using experimental techniques based on high-throughput sequencing. In particular, the study of the 3D organization of chromatin has raised increasing interest, thanks to the development of advanced experimental techniques. In this context, Hi-C has been widely adopted as a high-throughput method to measure pairwise contacts between virtually any pair of genomic loci, thus yielding unprecedented challenges for analyzing and handling the resulting complex datasets. In this review, we focus on the increasing complexity of available Hi-C datasets, which parallels the adoption of novel protocol variants. We also review the complexity of the multiple data analysis steps required to preprocess Hi-C sequencing reads and extract biologically meaningful information. Finally, we discuss solutions for handling and visualizing such large genomics datasets.
Keywords: Chromatin 3D architecture; Chromosome conformation capture; Computational biology; Epigenomics; High-throughput sequencing
Year: 2018 PMID: 30570701 PMCID: PMC6381366 DOI: 10.1007/s12551-018-0489-1
Source DB: PubMed Journal: Biophys Rev ISSN: 1867-2450
Hi-C studies over the past decade that marked forward leaps in resolution or dataset size
| Study | Organism | HindIII (6 bp) | NcoI (6 bp) | DpnII (4 bp) | MboI (4 bp) | Hi-C protocol | Read pairs | Max. binning res. |
|---|---|---|---|---|---|---|---|---|
| Lieberman-Aiden et al. | Human | ✓ | ✓ | | | Dilution | 30M | 1 Mb |
| Sexton et al. | Drosophila | | | ✓ | | Simplified | 362.7M | 10 kb |
| Dixon et al. | Human, mouse | ✓ | | | | Dilution | 806.1M | 40 kb |
| Jin et al. | Human | ✓ | | | | Dilution | 2.9B | 5 kb |
| Rao et al. | Human, mouse | | | ✓ | ✓ | In situ | 6.5B | 950 bp |
| Rao et al. | Human | | | | ✓ | In situ | 6.8B | 5 kb |
| Bonev et al. | Mouse | | | ✓ | | In situ | 7.3B | 850 bp |
| Wang et al. | Drosophila | | | ✓ | | In situ | 1.5B | frag. |
The table reports the original publication (study), organisms examined, restriction enzymes used, protocol variation, maximum number of read pairs sequenced per sample, and maximum binning resolution used in the analyses presented by the original authors. The size of restriction sites (6 bp or 4 bp) is also indicated
M is for million read pairs, B is for billion read pairs, frag. is for fragment level analysis
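As a rough, back-of-the-envelope illustration of why finer binning resolution calls for deeper sequencing, the sketch below counts the bin pairs in a genome-wide contact matrix at a few of the resolutions listed in the table. The genome length of about 3.1 Gb is an assumed round figure for human, the helper `n_bin_pairs` is hypothetical, and chromosome boundaries are ignored for simplicity.

```python
import math

# Assumption: approximate human genome length in base pairs (round figure).
GENOME_LENGTH = 3_100_000_000


def n_bin_pairs(genome_length, bin_size):
    """Number of distinct bin pairs (including the diagonal) in a symmetric matrix."""
    n_bins = math.ceil(genome_length / bin_size)
    return n_bins * (n_bins + 1) // 2


for label, bin_size in [("1 Mb", 1_000_000), ("40 kb", 40_000), ("5 kb", 5_000)]:
    print(label, n_bin_pairs(GENOME_LENGTH, bin_size))
```

Dividing the number of sequenced read pairs by the number of bin pairs gives a crude estimate of the average coverage per matrix cell, which drops quickly as bins get smaller.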
Fig. 1 Hi-C data, from generation to contact matrix. The figure shows a schematic representation of Hi-C data analysis, starting from a cartoon depicting cross-linked chromatin and a prototypic pair of mate reads positioned on the restriction fragments from which they originate. Raw sequencing paired-end reads (in FASTQ files) are aligned to the reference genome considering the mate reads independently. Then, aligned reads (in BAM files) are assigned to their fragment of origin and paired. The paired reads are stored in a sorted file that can be in either plain text, indexed text (pairix), or binary (e.g., HDF) formats, depending on the pipeline. Finally, after filtering and binning, the read counts are stored in contact matrix files, including plain text (e.g., 2D or sparse matrix) or binary (e.g., hic or cool) file formats.
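The binning step described in the caption can be illustrated with a minimal sketch. It assumes the read pairs are already aligned, paired, and filtered, and are available as simple `(chrom1, pos1, chrom2, pos2)` tuples. The function name `bin_read_pairs` and the toy coordinates are hypothetical and not taken from any of the pipelines discussed here. Counts are kept in a dictionary keyed by bin pairs, which mirrors the idea behind a sparse matrix text format.

```python
from collections import Counter


def bin_read_pairs(pairs, bin_size):
    """Count read pairs per pair of fixed-size genomic bins (sparse, one triangle only)."""
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        bin_a = (chrom1, pos1 // bin_size)
        bin_b = (chrom2, pos2 // bin_size)
        # Store each contact once, with the smaller bin first,
        # so that (a, b) and (b, a) are counted together.
        key = (bin_a, bin_b) if bin_a <= bin_b else (bin_b, bin_a)
        counts[key] += 1
    return counts


# Hypothetical toy read pairs with 0-based positions.
pairs = [
    ("chr1", 1_200, "chr1", 48_000),
    ("chr1", 45_500, "chr1", 3_900),
    ("chr1", 12_000, "chr2", 7_000),
]
matrix = bin_read_pairs(pairs, bin_size=10_000)
for (bin_a, bin_b), n in sorted(matrix.items()):
    print(bin_a, bin_b, n)
```

Only non-zero cells are stored, which is why sparse representations scale better than full 2D matrices at fine resolutions.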
List of tools for downstream analyses on Hi-C data
| Method | Compartment | TAD | Interaction | Visualization | Input format (text) | Input format (binary) | Reference |
|---|---|---|---|---|---|---|---|
| CScoreTool | ✓ | RP | Zheng and Zheng | ||||
| HiTC | ✓ | ✓ | SM | Servant et al. | |||
| HOMER | ✓ | ✓ | ✓ | RP | Heinz et al. | ||
| Juicer (HiCCUPS, Arrowhead, Juicebox) | ✓ | ✓ | ✓ | ✓ | hic | Durand et al. | |
| 4D NAT | ✓ | ✓ | 2D,SM | HDF | Seaman and Rajapakse | ||
| 3DNetMod | ✓ | SM | Norton et al. | ||||
| Armatus | ✓ | 2D | Filippova et al. | ||||
| CaTCH_R | ✓ | SM | Zhan et al. | ||||
| ClusterTAD | ✓ | 2D | Oluwadare and Cheng | ||||
| domainCaller | ✓ | 2D | Dixon et al. | ||||
| deDoc | ✓ | 2D | Li et al. | ||||
| HiCDB | ✓ | 2D | Chen et al. | ||||
| HiTAD | ✓ | SM | Wang et al. | ||||
| HiCseg | ✓ | 2D | Lévy-Leduc et al. | ||||
| IC-Finder | ✓ | 2D | Haddad et al. | ||||
| InsulationScore | ✓ | 2D | Crane et al. | ||||
| Lavaburst | ✓ | 2D | Schwarzer et al. | ||||
| MrTADFinder | ✓ | SM | Yan et al. | ||||
| Matryoshka | ✓ | 2D | Malik and Patro | ||||
| TADBit | ✓ | 2D | Serra et al. | ||||
| TADTree | ✓ | 2D | Weinreb and Raphael | ||||
| TADtool | ✓ | ✓ | 2D,SM | npy | Kruse et al. | ||
| CHiCAGO | ✓ | BAM | Cairns et al. | ||||
| diffHiC | ✓ | ✓ | HDF | Lun and Smyth | |||
| FastHiC | ✓ | SM | Xu et al. | ||||
| Fit-Hi-C | ✓ | SM | Ay et al. | ||||
| GOTHiC | ✓ | RP | Mifsud et al. | ||||
| HIPPIE | ✓ | Hwang et al. | |||||
| PSYCHIC | ✓ | 2D | Ron et al. | ||||
| 3D Genome Browser | ✓ | RP,2D,SM | BUTLR | Wang et al. | |||
| HiCExplorer | ✓ | ✓ | RP | Ramírez et al. | |||
| HiGlass | ✓ | HDF | Kerpedjiev et al. | ||||
| gcMapExplorer | ✓ | HDF | Kumar et al. | ||||
| WashU Epigenome Browser | ✓ | hic, HDF | Zhou et al. | ||||
The table lists each method by its reference name and indicates whether it can call compartments, TADs, or interactions, or visualize data. Tools are grouped by their main analysis focus (compartment calling, TAD calling, interaction calling, and visualization) and, within each group, sorted alphabetically by name. The input data format is reported, specifying whether each tool accepts text or binary input files. The last column gives the reference publication for each tool.
File format abbreviations: read pairs (RP), 2D matrix (2D), sparse matrix (SM), and Python NumPy matrix (npy).
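The difference between the 2D matrix and sparse matrix text formats can be sketched as follows. This is a minimal illustration on a hypothetical 3-bin symmetric matrix. The function names are not from any listed tool, and real sparse formats typically also carry genomic coordinates for each bin.

```python
def dense_to_sparse(matrix):
    """Convert a symmetric 2D contact matrix to (row, col, count) triplets.

    Only non-zero cells of the upper triangle (including the diagonal) are kept.
    """
    triplets = []
    for i, row in enumerate(matrix):
        for j in range(i, len(row)):
            if row[j] != 0:
                triplets.append((i, j, row[j]))
    return triplets


def sparse_to_dense(triplets, n_bins):
    """Rebuild the full symmetric 2D matrix from upper-triangle triplets."""
    dense = [[0] * n_bins for _ in range(n_bins)]
    for i, j, count in triplets:
        dense[i][j] = count
        dense[j][i] = count
    return dense


# Hypothetical toy contact matrix with three bins.
dense = [
    [5, 2, 0],
    [2, 3, 1],
    [0, 1, 4],
]
triplets = dense_to_sparse(dense)
restored = sparse_to_dense(triplets, n_bins=3)
print(triplets)
```

The round trip is lossless for symmetric matrices, while the sparse form stores only the cells that actually contain contacts.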