Rajendra Kumar, Haitham Sobhy, Per Stenberg, Ludvig Lizana.
Abstract
Hi-C experiments generate data in the form of large genome contact maps (Hi-C maps). These maps show that chromosomes are arranged in a hierarchy of three-dimensional compartments. To understand how these compartments form and how much they affect genetic processes such as gene regulation, biologists and bioinformaticians need efficient tools to visualize and analyze Hi-C data. This is technically challenging because the maps are large. In this paper, we addressed this problem, in part by implementing an efficient file format, and developed the genome contact map explorer platform. Apart from tools to process Hi-C data, such as normalization methods and a programmable interface, we made a graphical interface that lets users browse, scroll and zoom Hi-C maps to visually search for patterns in the Hi-C data. The software can also display several maps simultaneously and plot related genomic data. The software is openly accessible to the scientific community.
Year: 2017 PMID: 28973466 PMCID: PMC5622372 DOI: 10.1093/nar/gkx644
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Tools currently available in the gcMapExplorer package and their functions
| Tools | Function |
|---|---|
| Graphical User Interfaces (GUI) | |
| browser | To browse contact maps with genomic datasets |
| cmapImporter | To import data files or convert them to gcMapExplorer-compatible formats |
| cmapNormalizer | To normalize contact maps with various methods |
| h5Converter | To convert bigwig/wig/bed files to browser-compatible hdf5 format |
| Commands for normalization of Hi-C maps | |
| normKR | Knight-Ruiz matrix balancing |
| normIC | Iterative-correction matrix balancing |
| normMCFS | Median contact frequency scaling |
| Commands to import contact map files | |
| coo2cmap | COO sparse matrix format to ccmap or gcmap format |
| homer2cmap | HOMER Hi-C matrix format to ccmap or gcmap format |
| bc2cmap | Pair of Bin-Contact files to ccmap or gcmap format |
| pairCoo2cmap | Paired COO sparse matrix format to ccmap or gcmap format |
| Commands to convert genomic track files | |
| bigwig2h5 | bigwig format to browser compatible hdf5 format |
| wig2h5 | wig format to browser compatible hdf5 format |
| bed2h5 | bed format to browser compatible hdf5 format |
| encode2h5 | Download and convert files from ENCODE portal |
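To illustrate the kind of input that a COO importer such as coo2cmap handles, the following is a minimal sketch that expands COO-style text lines into a dense, symmetric contact matrix. It is not the gcMapExplorer implementation: the function name `coo_to_dense`, the three-column layout (`start_i start_j count`, with start coordinates in base pairs) and the example values are all assumptions made for illustration.

```python
def coo_to_dense(lines, resolution):
    """Expand COO-style text lines into a dense symmetric matrix.

    Assumed (hypothetical) layout per line: 'start_i start_j count',
    where the start coordinates are in base pairs and `resolution`
    is the bin size in base pairs.
    """
    entries = []
    max_bin = -1
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        start_i, start_j, count = line.split()
        i = int(start_i) // resolution  # genomic coordinate -> bin index
        j = int(start_j) // resolution
        entries.append((i, j, float(count)))
        max_bin = max(max_bin, i, j)

    n = max_bin + 1
    matrix = [[0.0] * n for _ in range(n)]
    for i, j, count in entries:
        # COO files usually store one triangle only, so mirror each entry.
        matrix[i][j] = count
        matrix[j][i] = count
    return matrix


# Made-up example at 20 kb resolution (three bins).
example = ["0 0 10", "0 20000 4", "20000 40000 7", "40000 40000 12"]
dense = coo_to_dense(example, resolution=20000)
```

A dense list of lists is only practical for small examples; genome-wide maps at fine resolution call for sparse or on-disk storage, which is the motivation for a dedicated file format.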
Figure 1. Browser (left) and importer (right) GUI applications for browsing Hi-C maps and importing data, respectively. Options in the browser are labeled as follows: (A) Browse maps up, down, left, right, along the diagonal and off the diagonal; (B) Zoom in and out; (C) Go to a specific coordinate on the map; (D) Change spacing between plots; (E) List all plots; (F) Reset all maps; (G) Resolution and unit of currently active map; (H) Name of contact map; (I) Selectable settings for several options. In the state shown, a colormap has been selected, so its options are displayed in the interface. Other options, not shown here, are available for genomic dataset plots and markers. (J) Real-time status of the mouse pointer with contact frequency. On the right, the graphical user interface (GUI) for importing data is shown, with options for converting files in COO format to either genome contact map (gcmap) or chromosomal contact map (ccmap) files. A menu for selecting the input format is displayed at the top of the window.
Figure 2. A comparison of two maps by visualization. Differences in the contact map patterns at the MEIS1 gene locus in the GM12878 (left) and K562 (right) cell lines are shown. Gene expression (RNA-seq) and active histone modifications (H3K9ac and H3K27ac, ChIP-seq), shown below the maps, also differ between the two cell lines at the same locus.
Figure 3. A comparison of normalization methods. An example is shown to demonstrate differences between normalized maps and raw maps at a resolution of 20 kb.
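To make the idea behind matrix-balancing normalization concrete, the following is a minimal pure-Python sketch of iterative correction, in which each contact is repeatedly divided by the relative coverage of its two bins until all non-empty rows have equal sums. It is an illustration under stated assumptions, not the normIC code from gcMapExplorer: the function name `iterative_correction`, its parameters and the small example matrix are hypothetical.

```python
def iterative_correction(matrix, max_iter=200, tol=1e-8):
    """Balance a symmetric, non-negative matrix by iterative correction.

    Returns (balanced, bias) such that, approximately,
    matrix[i][j] == balanced[i][j] * bias[i] * bias[j].
    """
    n = len(matrix)
    balanced = [row[:] for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in balanced]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left unchanged.
        factor = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= factor[i] * factor[j]
        bias = [b * f for b, f in zip(bias, factor)]
        if max(abs(f - 1.0) for f in factor) < tol:
            break  # all row sums are already (nearly) equal
    return balanced, bias


# Made-up symmetric raw counts for three bins.
raw = [[10.0, 4.0, 1.0],
       [4.0, 8.0, 2.0],
       [1.0, 2.0, 6.0]]
balanced, bias = iterative_correction(raw)
row_sums = [sum(row) for row in balanced]
```

After balancing, the entries of `row_sums` agree to within the tolerance, and `bias` holds the per-bin correction factors. This plain update can oscillate on matrices with no off-diagonal contacts (for example, a purely diagonal matrix), so treat it as a teaching sketch rather than a robust implementation.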
Figure 4. File format of genome contact maps and genomic datasets. (A) Contact maps (2D arrays, shown in magenta) for different chromosomes at various resolutions, along with their attributes. Each map includes attributes such as minimum and maximum contact values, resolution and map shape, while the chromosome’s attributes are names along the X- and Y-axes. (B) Genomic dataset file containing several datasets (1D arrays, shown in yellow) at different resolutions. (C) A comparison of contact map file sizes in different formats. Sizes for Knight-Ruiz (KR) normalized maps and raw observed maps are shown separately.