shinyheatmap: Ultra fast low memory heatmap web interface for big data genomics.

Bohdan B Khomtchouk^1,2, James R Hennessy^3, Claes Wahlestedt^1,2.

Abstract

BACKGROUND: Transcriptomics, metabolomics, metagenomics, and other various next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources as well as programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers-to-entry for creating highly customizable heatmaps.
RESULTS: We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low memory footprint program, making it particularly well-suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5 to 10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed.
CONCLUSIONS: shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap. Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on GitHub: https://github.com/Bohdan-Khomtchouk/fastheatmap.

Year: 2017 PMID: 28493881 PMCID: PMC5426587 DOI: 10.1371/journal.pone.0176334

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Heatmap software can be generally classified into two categories: static heatmap software [1-9] and interactive heatmap software [10-20]. Static heatmaps are pictorially frozen snapshots of genomic activity displayed as colored images generated from the underlying data. Interactive heatmaps are dynamic palettes that allow users to zoom in and out of the contents of a heatmap to investigate a specific region, cluster, or even single gene while, at the same time, being able to hover the mouse pointer over any specific row and column entry in order to glean information about an individual cell’s contents (e.g., gene name, expression level, and column name). Interactive heatmaps are especially important for visualizing large gene expression datasets wherein individual gene labels eventually become unreadable due to text overlap, a common drawback seen in static heatmaps of large input data matrices. As such, interactive heatmaps are popular for examining the entire landscape of a large gene expression dataset while, at the same time, allowing users to zoom into specific sectors of the heatmap to visualize them in a magnified manner (i.e., at various resolution levels). Currently, there is a pressing need for modern libraries that are able to visually scale millions of data points at various resolutions [21]. In general, new software infrastructure that facilitates interactive navigation and smooth scaling at different resolution levels is necessary for on-the-fly calculations of both the frontend and backend algorithms in big data visualization software [22]. Even though static heatmaps are still the preferred type of publication figure in many studies, interactive heatmaps are becoming increasingly adopted by the scientific community to emphasize and visualize specific sectors of a dataset, where individual numerical values are rendered as user-specified colors. 
As a whole, the concept of interactivity is gradually shifting the heatmap visualization field into data analytics territory, for example, by synergizing interactive heatmap software with integrated statistical and genomic analysis suites such as PCA, differential expression, gene ontology, and network analysis [18, 23]. However, currently existing interactive heatmap software is limited by implicit restrictions on file input size, which functionally constrains its range of utility. For example, in ClustVis [23], which employs the pheatmap R package [9] for heatmap generation, input datasets larger than 1000 rows are discouraged [24] for performance reasons. Similarly, in MicroScope, the user is prompted to perform differential expression analysis on the input dataset first, thereby shrinking the number of rows rendered in the interactive heatmap to encompass only statistically significant genes [18]. In general, the standard way of thinking has been to avoid the production of big heatmaps due to a combination of various factors such as poor readability, as static heatmaps are not zoomable; computational infeasibility, since large interactive heatmaps require supercomputer-level memory resources to perform efficient, lag-free zooming and panning [25-31]; and unclear interpretation, since large heatmaps contain so much information that the standard recommended approach has been to preemptively subset the input data matrix into a smaller size [32]. Nevertheless, NGS-driven research studies often produce datasets on the order of 10^4 rows (e.g., transcriptome studies such as the HTA 2.0 array [33] that have up to 400,000 rows, each representing individual exons). Likewise, single-cell RNA-seq studies often produce datasets ranging from several thousand to several hundred thousand cells [34, 35], posing significant computational challenges to efficient data visualization.
Currently, interactively visualizing such big data is not possible using existing state-of-the-art methodologies, despite existing efforts in this direction [36, 37]. Unlocking the computational ability to visualize interactive heatmaps on such unprecedented size scales would allow researchers to investigate high-dimensional numerical data as a colored grid of cells that is easily zoomable to any desired resolution, thereby aiding the exploratory data analysis process. With the advent of increasingly sophisticated interactive heatmap software and the rise of big data coupled with a growing community interest to examine it interactively, there has arisen an unmet and pressing need to address the computational limitations that hinder the production of large, interactive heatmaps. Examining such heatmaps would be valuable for visualizing the landscape of both global gene expression patterns as well as individual genes. Motivated to address these objectives, we propose an ultra fast and low memory user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive heatmaps in a web browser.

Materials and methods

shinyheatmap is hosted online as an R Shiny web server application. shinyheatmap may also be run locally from within RStudio, as shown here: https://github.com/Bohdan-Khomtchouk/shinyheatmap. shinyheatmap leverages the cumulative utility of R’s heatmaply [36], shiny [38], data.table [39], and gplots [40] libraries to create a cohesive web browser-based software experience requiring absolutely no programming experience from the user, or even the need to download R on a local computer. This kind of user-friendliness is geared towards the broader biological community, but will also appeal to the bioinformatics and computational biology communities. In contrast to most existing state-of-the-art heatmap software, shinyheatmap provides users with an extensive array of user-friendly hierarchical clustering methods, both in the form of multiple distance metrics and various linkage algorithms. This is especially useful for exploratory data analysis, particularly when the underlying data structure is unknown [41]. Since the choice of distance measure and linkage algorithm will directly influence the hierarchical clustering results, it is recommended to try different hierarchical clustering settings during analysis [41]. Agglomerative hierarchical clustering algorithms and their properties are described in detail in [42-46]. For static heatmap generation, shinyheatmap employs the heatmap.2 function of the gplots library. For interactive heatmap generation, shinyheatmap employs the heatmaply R package, which directly calls the plotly.js engine, in order to create fast, interactive heatmaps from large input datasets. The heatmaply R package is a descendant of the d3heatmap R package [47], which successfully creates advanced interactive heatmaps but is incapable of handling large inputs (e.g., 2000+ rows) due to memory considerations.
As such, heatmaply constitutes a much-needed performance upgrade to d3heatmap, one that is made possible by the plotly R package [48], which itself relies on the sophisticated and complex plotly.js engine [49]. Therefore, it is the technical innovations of the plotly.js source code that make drawing extremely large heatmaps both a fast and efficient process. However, heatmaply also adds certain features not present in either the plotly.js engine or the plotly R package, namely the ability to perform advanced hierarchical clustering and dendrogram-side zooming. Despite these advantages, heatmaply is inadequate for plotting large datasets beyond a certain size limit, even with computationally expensive operations like hierarchical clustering disabled; for instance, in certain cases, simple input matrices as small as 5000 × 5 may cause severe efficiency problems during heatmap rendering and zooming, even with no clustering present [37]. Due to this limitation, we developed a high performance web plug-in to shinyheatmap, called fastheatmap [50], which can rapidly plot interactive heatmaps of datasets as large as 10^5 to 10^7 rows within seconds directly in a web browser. Zooming in and out of such extremely large heatmaps is achievable in milliseconds, in contrast to d3heatmap or heatmaply, which take minutes or even hours, if it is possible at all (due to memory limitations). This constitutes an unprecedented performance benchmark that positions shinyheatmap and its high performance computing server, fastheatmap, at the forefront of big data genomics heatmap visualization technology. In fact, to the best of our knowledge, the shinyheatmap/fastheatmap duo is the first big data software to appear on the biological heatmap visualization scene. All source code from the fastheatmap project is made publicly available at: https://github.com/Bohdan-Khomtchouk/fastheatmap.
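
The recommendation above to try several distance measures and linkage algorithms can be illustrated with a minimal sketch. This is not the shinyheatmap implementation (which is written in R); it is a small, standard-library Python version of agglomerative clustering, and the function names, the toy `profiles` matrix, and the choice of two clusters are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two row profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-column differences between two row profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Each linkage rule reduces the list of pairwise distances between the
# members of two clusters to a single cluster-to-cluster distance.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda distances: sum(distances) / len(distances),
}


def agglomerate(rows, n_clusters, distance=euclidean, linkage="average"):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(rows))]  # start with one row per cluster

    def cluster_distance(pair):
        a, b = pair
        return combine(
            [distance(rows[i], rows[j]) for i in clusters[a] for j in clusters[b]]
        )

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return sorted(sorted(cluster) for cluster in clusters)


# Toy matrix (hypothetical values): five rows spaced at increasing gaps.
profiles = [[0, 5], [10, 5], [22, 5], [36, 5], [52, 5]]

for name in ("single", "complete"):
    print(name, agglomerate(profiles, 2, linkage=name))
```

On this toy input, single linkage chains neighbouring rows into one long cluster, whereas complete linkage splits the rows more evenly, which is why the same data can yield visibly different dendrograms depending on the settings chosen.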

Results

To use shinyheatmap, input data must be in the form of a matrix of integer values. The value in the i-th row and the j-th column of the matrix denotes how many reads (or fragments, for paired-end RNA-seq) have been unambiguously assigned to gene i in sample j [51]. Analogously, for other types of assays, the rows of the matrix might correspond, e.g., to binding regions (with ChIP-seq), species of bacteria (with metagenomic datasets), or peptide sequences (with quantitative mass spectrometry). For detailed usage considerations, shinyheatmap provides a convenient Instructions tab panel upon login. Upon uploading the input dataset, both static and interactive heatmaps are automatically created, each in its own respective tab panel. The user can then proceed to customize the static heatmap through a suite of available parameter settings located in the sidebar panel (Fig 1). For example, hierarchical clustering, color schemes, scaling, color keys, trace, and font size can all be set to the specifications of the user. In addition, a download button is provided for users to save publication quality heatmap figures. Likewise, the user can customize the interactive heatmap through its own respective hoverable toolbar panel located at the upper right corner of the heatmap (Fig 2). This toolbar provides extensive download, zoom, pan, lasso and box select, autoscale, reset, and hover features for interacting with the heatmap. Users with large input datasets will be directed by shinyheatmap to its fastheatmap plug-in by way of a user-friendly message that automatically recognizes the dimensions of the input data matrix (Fig 3). Performance benchmarks indicate (Fig 4) that fastheatmap significantly outperforms the latest state-of-the-art interactive heatmap software by several orders of magnitude. All benchmarks were tested on a 64-bit Windows 10 Pro desktop machine with 16.0 GB of RAM and an Intel(R) Core(TM) i7-5820K CPU at 3.30 GHz.
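
The input requirements and the size-based redirect described above can be sketched as follows. This is an illustrative standard-library Python sketch, not the application's R code: the tab-delimited layout (header row of sample names, gene identifier in the first field), the function names, and the `LARGE_INPUT_ROWS` cutoff are all assumptions, since the text does not state the file layout or the row count at which users are pointed to fastheatmap.

```python
import csv
import io

# Hypothetical cutoff: the text does not state the row count at which
# shinyheatmap points users to fastheatmap.
LARGE_INPUT_ROWS = 100_000


def read_count_matrix(handle, delimiter="\t"):
    """Parse a delimited table into (sample_names, gene_names, counts).

    Assumed layout: a header row of sample names, then one row per gene
    with the gene identifier in the first field and integer counts after it.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]
    genes, counts = [], []
    for line_number, record in enumerate(reader, start=2):
        if not record:  # skip blank lines
            continue
        if len(record) != len(header):
            raise ValueError(f"line {line_number}: expected {len(header)} fields")
        try:
            values = [int(field) for field in record[1:]]
        except ValueError:
            raise ValueError(f"line {line_number}: counts must be integers") from None
        if min(values, default=0) < 0:
            raise ValueError(f"line {line_number}: counts must be non-negative")
        genes.append(record[0])
        counts.append(values)
    return samples, genes, counts


def suggested_interface(n_rows, threshold=LARGE_INPUT_ROWS):
    """Pick an interface from the row count alone (illustrative rule)."""
    return "fastheatmap" if n_rows >= threshold else "shinyheatmap"


# Tiny in-memory example with made-up gene and sample names.
demo_table = io.StringIO(
    "gene\tctrl_1\tctrl_2\ttreated_1\n"
    "GeneA\t12\t15\t240\n"
    "GeneB\t0\t3\t1\n"
)
samples, genes, counts = read_count_matrix(demo_table)
print(len(genes), "rows x", len(samples), "columns ->", suggested_interface(len(genes)))
```

Here `counts[i][j]` holds the number of reads assigned to gene i in sample j, matching the matrix convention given in the text.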

Fig 1

shinyheatmap static heatmap.

shinyheatmap UI showcasing the visualization of a static heatmap generated from a large input dataset. Parameters such as hierarchical clustering (including options for distance metrics and linkage algorithms), color schemes, scaling, color keys, trace, and font size can all be set by the user. Progress bars appear during the heatmap rendering process to alert the user if any technical issues may arise. Sample input files of various sizes are provided as part of the web application, whose source code can be viewed on GitHub.

Fig 2

shinyheatmap interactive heatmap.

shinyheatmap UI showcasing the visualization of an interactive heatmap generated from a large input dataset. An embedded panel that appears top right on-hover provides extensive download, zoom, pan, lasso and box select, autoscale, reset, and other features for interacting with the heatmap.

Fig 3

fastheatmap & shinyheatmap are linked together.

A) shinyheatmap contains an auto-detector that detects the size of a user’s input matrix and, if the input matrix is too large, the user will be provided with a direct link to access shinyheatmap’s high performance computing server: fastheatmap. B) fastheatmap UI upon clicking on the URL link shown in Panel A.

Fig 4

shinyheatmap performance benchmarks.

shinyheatmap’s HPC plug-in, fastheatmap, performs more than 100,000 times faster than other state-of-the-art interactive heatmap software. “Number of Rows” denotes the number of rows in the input file, “inf” (infinity) denotes a system crash due to memory overload, “s” denotes seconds, “min” denotes minutes, and “ms” denotes milliseconds.

Conclusion

We provide access to a user-friendly web application designed to quickly and efficiently create static and interactive heatmaps within the R programming environment, without any prerequisite programming skills required of the user. Our software tool aims to enrich the genomic data exploration experience by providing a variety of customization options to investigate large input datasets.

24 in total

Review 1. Computational analysis of microarray data.

Authors: J Quackenbush
Journal: Nat Rev Genet Date: 2001-06 Impact factor: 53.242

2. Java Treeview--extensible visualization of microarray data.

Authors: Alok J Saldanha
Journal: Bioinformatics Date: 2004-06-04 Impact factor: 6.937

3. Molecular Property eXplorer: a novel approach to visualizing SAR using tree-maps and heatmaps.

Authors: Christopher Kibbey; Alain Calvet
Journal: J Chem Inf Model Date: 2005 Mar-Apr Impact factor: 4.956

4. GenePattern 2.0.

Authors: Michael Reich; Ted Liefeld; Joshua Gould; Jim Lerner; Pablo Tamayo; Jill P Mesirov
Journal: Nat Genet Date: 2006-05 Impact factor: 38.330

5. RNA-Seq workflow: gene-level exploratory analysis and differential expression.

Authors: Michael I Love; Simon Anders; Vladislav Kim; Wolfgang Huber
Journal: F1000Res Date: 2015-10-14

6. HeatMapper: powerful combined visualization of gene expression profile correlations, genotypes, phenotypes and sample characteristics.

Authors: Roel G W Verhaak; Mathijs A Sanders; Maarten A Bijl; Ruud Delwel; Sebastiaan Horsman; Michael J Moorhouse; Peter J van der Spek; Bob Löwenberg; Peter J M Valk
Journal: BMC Bioinformatics Date: 2006-07-12 Impact factor: 3.169

7. ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap.

Authors: Tauno Metsalu; Jaak Vilo
Journal: Nucleic Acids Res Date: 2015-05-12 Impact factor: 16.971

Review 8. Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future.

Authors: Georgios A Pavlopoulos; Dimitris Malliarakis; Nikolas Papanikolaou; Theodosis Theodosiou; Anton J Enright; Ioannis Iliopoulos
Journal: Gigascience Date: 2015-08-25 Impact factor: 6.524

9. Heatmapper: web-enabled heat mapping for all.

Authors: Sasha Babicki; David Arndt; Ana Marcu; Yongjie Liang; Jason R Grant; Adam Maciejewski; David S Wishart
Journal: Nucleic Acids Res Date: 2016-05-17 Impact factor: 16.971

10. MicroScope: ChIP-seq and RNA-seq software analysis suite for gene expression heatmaps.

Authors: Bohdan B Khomtchouk; James R Hennessy; Claes Wahlestedt
Journal: BMC Bioinformatics Date: 2016-09-22 Impact factor: 3.169
