heatmaply: an R package for creating interactive cluster heatmaps for online publishing.

Tal Galili, Alan O'Callaghan, Jonathan Sidi, Carson Sievert.

Abstract

Summary: heatmaply is an R package for easily creating interactive cluster heatmaps that can be shared online as a stand-alone HTML file. Interactivity includes a tooltip display of values when hovering over cells, as well as the ability to zoom in to specific sections of the figure from the data matrix, the side dendrograms, or annotated labels. Thanks to the synergistic relationship between heatmaply and other R packages, the user is empowered by a refined control over the statistical and visual aspects of the heatmap layout.

Availability and implementation: The heatmaply package is available under the GPL-2 Open Source license. It comes with a detailed vignette, and is freely available from: http://cran.r-project.org/package=heatmaply.

Contact: tal.galili@math.tau.ac.il.

Supplementary information: Supplementary data are available at Bioinformatics online.

Year:  2018        PMID: 29069305      PMCID: PMC5925766          DOI: 10.1093/bioinformatics/btx657

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

A cluster heatmap is a popular graphical method for visualizing high-dimensional data. In it, a table of numbers is scaled and encoded as a tiled matrix of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by dendrograms and extra columns of categorical annotation. The ongoing development of this iconic visualization, spanning more than a century, has provided the foundation for one of the most widely used of all bioinformatics displays (Wilkinson and Friendly, 2009).

When using the R language for statistical computing (R Core Team, 2016), there are many available packages for producing static heatmaps, such as stats, gplots, heatmap3, fheatmap, pheatmap and others. Recently released packages also allow for more complex layouts; these include gapmap, superheat and ComplexHeatmap (Gu et al., 2016).

The next evolutionary step has been to create interactive cluster heatmaps, and several solutions are already available. However, these solutions, such as the idendro R package (Sieger et al.), often focus on providing an interactive output that can be explored only on the researcher's personal computer. Some solutions do exist for creating shareable interactive heatmaps, but these either depend on a specific online provider, such as XCMS Online, or require JavaScript knowledge to operate, such as InCHlib. In practice, when publishing in academic journals, the reader is left with a static figure only (often in png or pdf format).

To fill this gap, we have developed the heatmaply R package for easily creating a shareable HTML file that contains an interactive cluster heatmap. The interactivity is based on client-side JavaScript code that is generated from the user's data after running the following command:

    heatmaply(data, file = "my_heatmap.html")

The HTML file contains a publication-ready, interactive figure that allows the user to zoom in as well as see values when hovering over the cells. This self-contained HTML file can be made available to interested readers by uploading it to the researcher's homepage or as Supplementary Material on the journal's server. Concurrently, this interactive figure can be displayed in RStudio's viewer pane, included in a Shiny application, or embedded in knitr/RMarkdown HTML documents.

The rest of this paper offers guidelines for creating effective cluster heatmap visualizations. Figure 1 demonstrates the suggestions from this section on data from Project Tycho (van Panhuis et al., 2013), while the online Supplementary Material includes the interactive version, as well as several examples of using the package on real-world biological data.
Fig. 1

The (square root) number of people infected by measles in 50 states, from 1928 to 2003. Vaccines were introduced in 1963. An interactive version is available at the following URL: https://cdn.rawgit.com/talgalili/heatmaplyExamples/564da09e/inst/doc/measles_heatmaply.html

2 heatmaply: a simple example

The generation of cluster heatmaps is a subtle process (Gehlenborg and Wong, 2012; Weinstein, 2008), requiring the user to make many decisions along the way. The major decisions concern the data matrix and the dendrogram. The raw data often need to be transformed in order to have a meaningful and comparable scale, and an appropriate color palette should be picked. Clustering the data requires us to decide on a distance measure between the observations, a linkage function, and a rotation and coloring of branches that highlight interpretable clusters. Each such decision can affect the patterns and interpretations that emerge. In this section, we go through some of the arguments of the function heatmaply, aiming to make it easy for the user to tune these important statistical and visual parameters.

Our toy example visualizes the effect of vaccines on measles infection. The output is given in the static Figure 1, while an interactive version is available online in the Supplementary file ‘measles.html’. Both were created using:

    heatmaply(x = sqrt(measles),
              color = viridis, # the default
              Colv = NULL,
              hclust_method = "average",
              k_row = NA,
              # …
              file = c("measles.html", "measles.png"))

The first argument of the function (x) accepts a matrix of the data. In the measles data, each row corresponds to a state, each column to a year (from 1928 to 2003), and each cell to the number of people infected with measles per 100 000 people. In this example, the data were scaled twice. First, instead of giving the raw number of measles cases, the counts were scaled relative to 100 000 people, making it easier to compare between states. Second, the square root of the values was taken. This was done since all the values in the data represent the same unit of measure, but come from a right-tailed distribution of count data with some extreme observations.
Taking the square root brings extreme observations closer to one another, which helps to prevent an extreme observation from masking the general pattern. Other transformations that may be considered come from the Box-Cox or Yeo-Johnson families of power transformations.

If each column of the data were to represent a different unit of measure, then leaving the values unchanged would often make the entire figure unusable, because the column with the largest range of values would take over most of the colors in the figure. Possible per-column transformations include the scale function, suitable for data that are relatively normal. The normalize and percentize functions bring the data to a comparable 0–1 scale for each column. The normalize function preserves the shape of each column's distribution by subtracting the column's minimum and then dividing by the maximum of the shifted values (i.e. the column's range). The percentize function is similar to ranking but with the simpler interpretation of each value being replaced by the percent of observations that have that value or below. It uses the empirical cumulative distribution function of each variable on its own values. The sparseness of the dataset can be explored using is.na10.

Once the data are adequately scaled, it is important to choose a good color palette for the data. Other than being pretty, an ideal color palette should have three (somewhat conflicting) properties: (i) Colorful, spanning as wide a palette as possible so as to make differences easy to see; (ii) Perceptually uniform, so that values close to each other have similar-appearing colors compared with values that are far away, consistently across the range of values; and (iii) Robust to colorblindness, so that the above properties hold true for people with common forms of colorblindness, as well as printing well in grey scale.
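The two per-column rescalings described in this section (normalize and percentize) can be sketched in base R. This is a minimal illustration and not the package's implementation: the helper names normalize_col and percentize_col and the toy matrix are assumptions made for the example, whereas heatmaply exports its own normalize and percentize functions, which should be preferred in practice.

```r
# Hypothetical helpers illustrating the two rescalings (names are assumptions;
# heatmaply ships its own normalize() and percentize()).

# Min-max rescaling: subtract the column minimum, divide by the column range.
normalize_col <- function(v) {
  rng <- range(v, na.rm = TRUE)
  (v - rng[1]) / (rng[2] - rng[1])
}

# Empirical-CDF rescaling: share of observations at or below each value.
percentize_col <- function(v) {
  ecdf(v)(v)
}

# Toy data (assumed): two columns measured on very different scales,
# with one extreme observation in the second column.
x <- cbind(height_cm = c(150, 160, 170, 180, 190),
           income    = c(10, 20, 30, 40, 1000))

x_norm <- apply(x, 2, normalize_col)
x_perc <- apply(x, 2, percentize_col)

print(x_norm)
print(x_perc)
```

In this toy matrix the single extreme income value squeezes the other normalized incomes close to 0, while the percentized incomes stay evenly spread between 0.2 and 1, which is one reason to prefer percentize when a column contains outliers.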
The default passed to the color argument in heatmaply is viridis, a sequential color palette that offers a good balance of these properties. A divergent color scale should be preferred when visualizing a correlation matrix, as it is important to make the low and high ends of the range visually distinct. A helpful divergent palette available in the package is cool_warm (other alternatives in the package include RdBu, BrBG, or RdYlBu, based on the RColorBrewer package). In that case it is also advisable to set the limits argument to range from -1 to 1.

Passing NULL to the Colv argument, in our example, removes the column dendrogram (since we wish to keep the order of the columns, which corresponds to the years). The row dendrogram is automatically calculated using hclust with a Euclidean distance measure and the average linkage function. The user can choose an alternative clustering function (hclustfun), distance measure (dist_method), or linkage function (hclust_method), or can have a dendrogram only in the rows/columns or none at all (through the dendrogram argument). Users can also supply their own dendrogram objects to the Rowv (or Colv) arguments. The preparation of the dendrograms can be made easier using the dendextend R package (Galili, 2015) for comparing and adjusting dendrograms. These choices are all left for the user to decide.

Setting the k_col/k_row argument to NA makes the function search for the number of clusters (from 2 to 10) by which to color the branches of the dendrogram. The number picked is the one that yields the highest average silhouette coefficient (based on the find_k function from dendextend). Lastly, the heatmaply function uses the seriation package to find an ‘optimal’ ordering of rows and columns (Hahsler et al.).
This is controlled using the seriation argument, where the default is "OLO" (optimal-leaf-order), which rotates the branches so that the sum of distances between each pair of adjacent leaves (labels) is minimized (i.e. it optimizes the Hamiltonian path length restricted by the dendrogram structure). The other arguments in the example were omitted since they are self-explanatory; the exact code is available in the heatmaplyExamples package.

In order to make some of the above easier, we created the shinyHeatmaply package (available on CRAN), which offers a GUI to help guide the researcher through the heatmap construction, with the functionality to export the heatmap as an HTML file and to summarize the parameter specifications needed to reproduce the heatmap with heatmaply. For a more detailed step-by-step demonstration of using heatmaply on biological datasets, see the heatmaplyExamples package (https://github.com/talgalili/heatmaplyExamples).
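The row-clustering defaults described in this section (Euclidean distance, average linkage, then coloring branches by k clusters) can be sketched with base R alone. The toy matrix and the fixed k = 2 are assumptions made for illustration; heatmaply performs these steps internally and, with k_row = NA, chooses k through find_k from dendextend rather than using a fixed value.

```r
# Toy data (assumed): six observations forming two well-separated groups.
x <- rbind(a = c(1.0, 1.1), b = c(1.2, 0.9), c = c(0.9, 1.0),
           d = c(5.0, 5.1), e = c(5.2, 4.9), f = c(4.9, 5.0))

d    <- dist(x, method = "euclidean")  # distance measure between rows
hc   <- hclust(d, method = "average")  # average linkage
dend <- as.dendrogram(hc)              # a dendrogram object, the kind Rowv accepts

clusters <- cutree(hc, k = 2)          # fixed k, for illustration only
print(clusters)
print(labels(dend))                    # leaf order implied by the dendrogram
```

Building the dendrogram explicitly in this way is useful when the tree needs to be adjusted (for example with dendextend) before being passed to the Rowv or Colv argument.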
References

1.  Biochemistry. A postgenomic visual icon.

Authors:  John N Weinstein
Journal:  Science       Date:  2008-03-28       Impact factor: 47.728

2.  Points of View: Heat maps

Authors:  Nils Gehlenborg; Bang Wong
Journal:  Nat Methods       Date:  2012-03       Impact factor: 28.547

3.  Complex heatmaps reveal patterns and correlations in multidimensional genomic data.

Authors:  Zuguang Gu; Roland Eils; Matthias Schlesner
Journal:  Bioinformatics       Date:  2016-05-20       Impact factor: 6.937

4.  Contagious diseases in the United States from 1888 to the present.

Authors:  Willem G van Panhuis; John Grefenstette; Su Yon Jung; Nian Shong Chok; Anne Cross; Heather Eng; Bruce Y Lee; Vladimir Zadorozhny; Shawn Brown; Derek Cummings; Donald S Burke
Journal:  N Engl J Med       Date:  2013-11-28       Impact factor: 91.245

5.  dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering.

Authors:  Tal Galili
Journal:  Bioinformatics       Date:  2015-07-23       Impact factor: 6.937
