Kai Kruse, Clemens B Hug, Juan M Vaquerizas.
Abstract
Chromosome conformation capture data, particularly from high-throughput approaches such as Hi-C, are typically very complex to analyse. Existing analysis tools are often single-purpose, or limited in compatibility to a small number of data formats, frequently making Hi-C analyses tedious and time-consuming. Here, we present FAN-C, an easy-to-use command-line tool and powerful Python API with a broad feature set covering matrix generation, analysis, and visualisation for C-like data (https://github.com/vaquerizaslab/fanc). Due to its compatibility with the most prevalent Hi-C storage formats, FAN-C can be used in combination with a large number of existing analysis tools, thus greatly simplifying Hi-C matrix analysis.
Keywords: Chromatin loops; Chromosomal compartments; Chromosome conformation capture; Hi-C; Hi-C analysis; Hi-C visualisation; Topologically associating domains (TAD)
Year: 2020 PMID: 33334380 PMCID: PMC7745377 DOI: 10.1186/s13059-020-02215-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Overview of FAN-C functionality. a Overview of Hi-C from an experimental (left) and computational (right) perspective. RE: restriction enzyme. b Matrix generation features. c Hi-C matrix analysis features. d Hi-C visualisation features. e Helper tools
Feature comparison of different Hi-C analysis tools. Tools included in the comparison are Cooler [38]/HiGlass [39], Juicer [37]/Juicebox [40], HOMER [41], HiC-Pro [42], HiC-bench [43], TADbit [44], HiFive [45], HicDat [46], HiCInspector [47], HiCUP, HiCExplorer [48, 49], and HiCeekR [50]. 1: Only for interactive plotting; 2: Support for Juicer and Cooler multi-resolution files, but no native support; 3: Cooler ecosystem includes pairtools, cooler, cooltools, HiGlass, and distiller; 4: In conjunction with Juicebox; 5: Provides instructions for mapping, but no dedicated command; 6: Visualisation through Treeview; 7: With export for Fit-Hi-C; 8: Through compatibility with HiCPlotter; 9: Via HiCNorm; 10: Fit-Hi-C, C-loops, and targeted virtual 5C (in-house); 11: Only pre-processing; 12: For interactive visualisation; 13: SAM/BAM visualisation through SeqMonk; 14: via pyGenomeTracks; 15: Only when previously marked in BAM file; 16: via spacewalk; 17: no dedicated function, but possible via API; 18: Via Galaxy; 19: Includes hierarchical clustering of TADs; 20: Personal communication by developers, not currently documented; 21: insulation and compartment scores; 22: via TADkit
| Feature | FAN-C | Cooler | Juicer | HOMER | HiC-Pro | HiC-bench | TADbit | HiFive | HicDat | HiC-inspector | HiCUP | HiCExplorer | HiCeekR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Command line | x | x | x | x | x | x | x | x | x | x | x | ||
| Programmatic access (API) | x | x | x | x | x | x | |||||||
| Graphical user interface (GUI) | 1 | 1 | 4 | 1 | 11 | 12 | 18 | x | |||||
| | |||||||||||||
| Juicer | x | x | 20 | ||||||||||
| Cooler | x | x | 20 | x | |||||||||
| Custom/Native | x | x | x | x | x | x | x | x | x | x | x | x | x |
| | |||||||||||||
| Juicer | x | x | x | x | |||||||||
| Cooler | x | x | x | x | x | ||||||||
| TXT file | x | x | x | x | x | x | x | x | x | ||||
| FASTQ | x | x | x | 5 | x | x | x | 5 | x | x | x | ||
| SAM/BAM | x | x | x | x | x | x | x | x | x | x | |||
| hiclib | x | ||||||||||||
| | |||||||||||||
| Juicer | x | x | x | x | 20 | x | |||||||
| Cooler | x | x | x | 20 | x | x | |||||||
| GInteractions | x | ||||||||||||
| TXT file | x | x | x | x | x | x | x | x | x | x | x | ||
| | |||||||||||||
| Simple mapping | x | x | 5 | x | x | x | 5 | x | x | 5 | |||
| Iterative mapping | x | x | |||||||||||
| Ligation junction split | x | x | x | x | x | x | |||||||
| | |||||||||||||
| Mapping Quality | x | x | x | x | x | x | x | ||||||
| Multi-mapping reads | x | x | x | x | x | x | x | x | |||||
| Restriction site distance | x | x | x | x | x | x | x | x | x | x | x | ||
| Ligation errors | x | x | x | x | x | x | x | x | x | x | |||
| Self-ligations | x | x | x | x | x | x | x | x | x | x | x | x | |
| PCR duplicates | x | x | x | x | x | x | x | x | x | x | 15 | ||
| Unusual read density | x | x | x | ||||||||||
| Quality statistics | x | x | x | x | x | x | x | x | x | x | x | ||
| | |||||||||||||
| Fragment-level Hi-C | x | x | x | x | x | x | x | x | x | ||||
| Equi-distant bins | x | x | x | x | x | x | x | x | x | x | x | x | |
| Multi-resolution Hi-C | 2 | x | x | 20 | x | x | |||||||
| Matrix balancing | x | x | x | x | x | x | x | x | x | x | x | ||
| Probabilistic normalisation | x | x | |||||||||||
| Matrix merge | x | x | x | x | x | x | x | ||||||
| Allele-specific matrices | x | x | x | ||||||||||
| Mixed restriction cut sites | x | x | x | x | x | ||||||||
| | |||||||||||||
| Minimum coverage | x | x | x | x | x | ||||||||
| Diagonal | x | x | x | x | |||||||||
| | |||||||||||||
| PCA (sample comparison) | x | 17 | 21 | ||||||||||
| Matrix fold-change | x | x | x | 20 | x | x | |||||||
| Matrix difference | x | 17 | x | 20 | x | ||||||||
| Score/feature comparisons | x | 17 | x | x | x | x | x | ||||||
| Correlations | 17 | x | x | x | x | x | x | ||||||
| | |||||||||||||
| Insulation score | x | x | x | x | x | ||||||||
| Directionality index | x | x | x | x | |||||||||
| Arrowhead | x | ||||||||||||
| TAD calling | x | x | x | x | 19 | x | |||||||
| | |||||||||||||
| HICCUPS | x | x | x | ||||||||||
| Other | x | 7 | 10 | x | |||||||||
| | |||||||||||||
| Expected values | x | x | x | x | x | x | x | x | x | x | |||
| AB compartments | x | x | x | x | 20 | x | x | x | x | ||||
| Aggregate Hi-C matrices | x | x | x | x | |||||||||
| 3D modelling | 16 | x | |||||||||||
| | |||||||||||||
| Compaction | x | ||||||||||||
| Hi-C matrix | x | x | x | 6 | 8 | 8 | 22 | x | x | x | 13 | x | x |
| Triangular Hi-C matrix | x | x | 6 | 8 | 8 | 22 | x | x | |||||
| Other genomic tracks | x | x | x | 8 | 8 | 22 | 14 | x | |||||
| Genes | x | x | x | 8 | 8 | 22 | |||||||
| Virtual 4C | x | x | x | 8 | |||||||||
Fig. 2 FAN-C matrix generation. a–c Schematic overview of the matrix generation pipeline. a Mapping features. b Processing and filtering of Hi-C read pairs. c Assembly, filtering and normalisation of the Hi-C matrix from valid read pairs. d–f FAN-C statistics plots using data from HUVEC Hi-C [25]. d Ligation error plot as in [52, 53]. Dashed line indicates expected values. e Density plot of the sum of restriction site distances (insert size) measured from the mapping location of a read to the nearest restriction site. Dashed line indicates median insert size. f Summary statistics plot showing the read pairs removed by various filters. g Coverage plot of a Hi-C matrix binned at 1 kb resolution. Dashed line indicates the chosen coverage cutoff at 25% median coverage
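
The coverage cutoff shown in panel g can be illustrated with a minimal sketch. This is not FAN-C's implementation: the function name `low_coverage_bins` and the toy matrix are hypothetical, coverage is assumed to be the row sum of the matrix, and the median is assumed to be taken over all bins, neither of which the caption specifies.

```python
from statistics import median


def low_coverage_bins(matrix, relative_cutoff=0.25):
    """Return indices of bins whose coverage is below relative_cutoff * median.

    Coverage of a bin is taken here as the sum of its matrix row (an assumption).
    """
    coverage = [sum(row) for row in matrix]
    threshold = relative_cutoff * median(coverage)
    return [i for i, c in enumerate(coverage) if c < threshold]


# Toy symmetric contact matrix in which the last bin is poorly covered.
toy = [
    [10, 5, 2, 0],
    [5, 12, 6, 1],
    [2, 6, 11, 0],
    [0, 1, 0, 1],
]
filtered = low_coverage_bins(toy)
```

In this toy case the bin coverages are 17, 24, 19, and 2, so the median is 18 and the 25% threshold is 4.5; only the last bin falls below it.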
Approximate runtimes of the FAN-C matrix generation pipeline. Data from the GM12878 B-lymphocyte dataset [25]. Runtimes are normalised to a single thread, and expressed as minutes per 100 million read pairs. All processing performed on a Linux SGE cluster with AMD Opteron Processor 6376
| Step | Minutes/100 M read pairs (single thread) |
|---|---|
| BWA mem, ligation junction split | 4061.95 |
| Loading + mappability, quality, and uniqueness filters | 264.03 |
| PCR duplicate, RE distance, ligation error filters | 224.15 |
| Fragment-level assembly | 55.72 |
| Merge | 91.09 |
| 1 Mb bins | 167.68 |
| 25 kb bins | 822.15 |
| 5 kb bins | 1455.1 |
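
As a worked example of how these numbers combine, the sketch below sums the per-step runtimes and scales them to a dataset size. It assumes that every listed step is run exactly once (including all three binning resolutions) and that runtime scales linearly with read pairs and inversely with thread count; the second assumption is an idealisation, and the helper `estimate_hours` is hypothetical.

```python
# Per-step runtimes from the table, in minutes per 100 M read pairs (single thread).
runtimes = {
    "BWA mem, ligation junction split": 4061.95,
    "Loading + mappability, quality, and uniqueness filters": 264.03,
    "PCR duplicate, RE distance, ligation error filters": 224.15,
    "Fragment-level assembly": 55.72,
    "Merge": 91.09,
    "1 Mb bins": 167.68,
    "25 kb bins": 822.15,
    "5 kb bins": 1455.1,
}

total_minutes = sum(runtimes.values())
mapping_share = runtimes["BWA mem, ligation junction split"] / total_minutes


def estimate_hours(read_pairs, threads=1):
    """Rough estimate in hours, assuming perfect linear scaling (an idealisation)."""
    return total_minutes * (read_pairs / 100e6) / threads / 60


single_thread_hours = estimate_hours(100e6)
```

Under these assumptions the steps add up to about 7142 single-thread minutes (roughly 119 hours) per 100 M read pairs, of which mapping accounts for about 57%.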
Fig. 3 FAN-C analysis features. All analyses performed on GM12878 cells [25] on the 10 kb resolution matrix, unless otherwise noted. a Schematic representation of the analysis types available for FAN-C, Cooler, and Juicer matrices. b Hi-C matrix plot of a sample region with 10 kb resolution. c Log-log “Distance decay” plot of the expected normalised contact frequency against locus distance. d Log2-observed/expected (O/E) matrix for the same region as in b. e 500 kb resolution correlation matrix/A/B compartment plot of chromosome 1 (top) and its first eigenvector (EV) (bottom). f “Saddle plot” showing preferential interactions of active/active and inactive/inactive regions (top), and bar plot showing the cutoffs used for binning regions by the corresponding EV entry magnitude (bottom). Note the outlier on the far right. g Aggregate TAD plot showing the average log2-O/E in and around arrowhead domains [25]. h Aggregate loop plot showing the average log2-O/E at peaks called by HICCUPS [25]. i–n Example region on chromosome 18 highlighting additional analyses available in FAN-C and the possibility of “genome browser” style plotting. i Triangular Hi-C matrix plot. j Line plot showing CTCF occupancy (fold-change over input) as measured by ChIP. Data from GEO:GSM733752. k Heatmap showing insulation scores calculated using different window sizes. l Insulation score track for a window size of 100 kb. m Heatmap showing directionality index results for multiple window sizes. n Directionality index track for a window size of 1 Mb. o Gene plot using data from Gencode (v19) [57]
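
The relationship between the distance decay curve (panel c) and the log2-O/E matrix (panel d) can be sketched in a few lines of plain Python. This is a simplified illustration rather than FAN-C's method: the function names are hypothetical, the expected value is assumed to be the plain mean of each matrix diagonal, and details such as masking of filtered bins or smoothing are ignored.

```python
import math


def expected_by_distance(matrix):
    """Mean contact value for each bin separation d (the d-th diagonal)."""
    n = len(matrix)
    return [
        sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
        for d in range(n)
    ]


def log2_obs_exp(matrix):
    """Log2 of observed over distance-matched expected; NaN where undefined."""
    n = len(matrix)
    expected = expected_by_distance(matrix)
    result = [[math.nan] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            e = expected[abs(i - j)]
            if matrix[i][j] > 0 and e > 0:
                result[i][j] = math.log2(matrix[i][j] / e)
    return result


# Toy symmetric, normalised contact matrix with three bins.
toy = [
    [8, 4, 2],
    [4, 8, 1],
    [2, 1, 8],
]
expected = expected_by_distance(toy)
oe = log2_obs_exp(toy)
```

Positive entries of `oe` mark bin pairs that interact more than the average pair at the same separation, and negative entries mark pairs that interact less.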
Fig. 4 FAN-C comparison workflow for neuronal differentiation. ESC, embryonic stem cells; NPC, neuronal precursor cells; CN, cortical neurons. a Saddle plots showing contacts relative to expectation among regions with different compartment eigenvector values (binned by 2% percentiles). b Compartment strength barplot. c Heatmap of insulation scores at all boundaries in ESC, NPC, and CN, sorted by insulation score in CN. d Heatmap of insulation scores at differentially insulated regions between ESC and CN. e Aggregate matrices of 1 Mb windows centred at all boundaries in ESC, NPC, and CN. f Aggregate matrices of 1 Mb windows centred at ESC-specific boundaries. g Example of a differentially insulated region at the Pbx1 locus, showing Hi-C matrices for ESC and CN, a difference Hi-C matrix of CN − ESC, insulation scores of ESC and CN at various window sizes, insulation score difference between ESC and CN, and genes in the region, coloured by strand (orange = forward, cyan = reverse)
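
The insulation score comparison in panels c, d, and g can be sketched as follows. This is a generic sliding-window insulation score, not FAN-C's implementation: the function names, the window definition, the normalisation to the track mean, and the toy matrices labelled `esc` and `cn` are all assumptions made for illustration and do not reproduce the data in the figure.

```python
import math


def insulation_scores(matrix, window):
    """Log2 of mean contacts spanning each bin, relative to the track mean.

    For bin i, contacts between the `window` bins upstream and the `window`
    bins downstream are averaged. Bins too close to the matrix edge get NaN.
    """
    n = len(matrix)
    raw = [math.nan] * n
    for i in range(window, n - window):
        values = [
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i + 1, i + window + 1)
        ]
        raw[i] = sum(values) / len(values)
    valid = [v for v in raw if not math.isnan(v)]
    if not valid or sum(valid) <= 0:
        return [math.nan] * n
    mean = sum(valid) / len(valid)
    return [
        math.log2(v / mean) if not math.isnan(v) and v > 0 else math.nan
        for v in raw
    ]


def block_matrix(sizes, inside, outside):
    """Toy matrix: `inside` within each block (domain), `outside` elsewhere."""
    labels = [k for k, size in enumerate(sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]


esc = block_matrix([3, 3], inside=4, outside=1)  # two domains, one boundary
cn = block_matrix([6], inside=4, outside=1)      # hypothetical: boundary absent
esc_scores = insulation_scores(esc, window=1)
cn_scores = insulation_scores(cn, window=1)
difference = [c - e for c, e in zip(cn_scores, esc_scores)]
```

In the toy `esc` matrix the score dips below zero at the bins flanking the domain boundary, while the `cn` track is flat, so the difference track is positive where the boundary is present in only one sample.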