
dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering.

Tal Galili

Abstract

dendextend is an R package for creating and comparing visually appealing tree diagrams. It provides utility functions for manipulating dendrogram objects (their color, shape and content), as well as several advanced methods for comparing trees to one another, both statistically and visually. As such, dendextend offers a flexible framework for enhancing R's rich ecosystem of packages for performing hierarchical clustering of items.
AVAILABILITY AND IMPLEMENTATION: The dendextend R package (including detailed introductory vignettes) is released under the GPL-2 open-source license and can be downloaded free of charge from CRAN at http://cran.r-project.org/package=dendextend.
CONTACT: Tal.Galili@math.tau.ac.il.
© The Author 2015. Published by Oxford University Press.


Year: 2015; PMID: 26209431; PMCID: PMC4817050; DOI: 10.1093/bioinformatics/btv428

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


1 Introduction

Hierarchical cluster analysis (HCA) is a widely used family of unsupervised statistical methods for classifying a set of items into a hierarchy of clusters (groups) according to the similarities among the items. The R language (R Core Team, 2014), a leading, cross-platform and open-source statistical programming environment, has many implementations of HCA algorithms (Chipman and Tibshirani, 2006; Hornik, 2014; Schmidtlein et al.; Witten and Tibshirani, 2010). The output of these algorithms is commonly stored in the hclust object class. The dendrogram class is an alternative object class that is often used as the go-to intermediate representation for visualizing HCA output.

In many R packages, a figure is adjusted by supplying the plot function with both the object to be plotted and the graphical parameters to be modified (colors, sizes, etc.). The base R plot.dendrogram function behaves differently. It is given a dendrogram object that itself contains (most of) the graphical parameters used when plotting the tree. Internally, the dendrogram class is represented as a nested list of lists with attributes for colors, height, etc. (with useful methods from the stats package).

Until now, no comprehensive framework has been available in R for flexibly controlling the various attributes of dendrogram objects. The dendextend package aims to fill this gap by providing a large number of new functions for controlling a dendrogram's structure and graphical attributes. It also implements methods for visually and statistically comparing different dendrogram objects. The package is extensively validated through unit testing (Wickham, 2011). It offers a C++ speed-up (Eddelbuettel and François, 2011) for some of its core functions through the dendextendRcpp package, and it includes three detailed vignettes. The dendextend package is primarily geared towards HCA. For phylogenetic analysis, the phylo object class (from the ape package) is recommended (Paradis et al.). A comprehensive comparison of dendextend, ape and other software for tree analysis is available in the supplementary materials.
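The nested-list representation described above can be mimicked outside R. Below is a minimal Python sketch that is not part of dendextend. Each node is a dict in which a hypothetical "children" key plays the role of R's nested list, while the remaining keys stand in for attributes such as height and label. The merge heights and the edge_col attribute are illustrative only.

```python
def leaf(label, **attrs):
    """Create a leaf (height 0) with a label and optional extra attributes."""
    return {"label": label, "height": 0.0, "children": [], **attrs}


def merge(height, *children, **attrs):
    """Create an internal node that joins the given sub-trees at `height`."""
    return {"height": float(height), "children": list(children), **attrs}


def leaves(node):
    """Return the leaf labels from left to right (depth-first traversal)."""
    if not node["children"]:
        return [node["label"]]
    return [lab for child in node["children"] for lab in leaves(child)]


def members(node):
    """Count the leaves under a node (R keeps this in a 'members' attribute)."""
    return len(leaves(node))


# Illustrative tree over the labels "1".."5"; heights are made up.
tree = merge(4,
             merge(1, leaf("1"), leaf("2")),
             merge(2, leaf("3"), merge(1, leaf("4"), leaf("5"))),
             edge_col="black")  # hypothetical graphical attribute kept on the node

print(leaves(tree))   # ['1', '2', '3', '4', '5']
print(members(tree))  # 5
```

As with R's dendrogram objects, the graphical settings travel with the tree itself rather than being passed to the plotting call.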

2 Description

2.1 Updating a dendrogram for visualization

The dendextend function set() accepts a dendrogram (the dend argument) as input and returns it after adjusting one of its properties. The parameter what is a character string naming the property of the tree to be adjusted (see Table 1), and value supplies the new setting(s). The user can repeatedly pass a tree through set, with different configurations, until the desired outcome is reached.
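The "adjust and return" pattern of set can be illustrated with a short Python sketch. It assumes a dict-based tree, and set_prop is a hypothetical helper rather than dendextend code. set_prop returns a modified copy for one option, and functools.reduce chains several configurations in the same way that repeated set calls are chained in R. Only two Table 1 option names are mimicked here, and their values are recycled over the leaves from left to right.

```python
import copy
from functools import reduce


def leaf(label):
    return {"label": label, "children": [], "attrs": {}}


def node(*children):
    return {"children": list(children), "attrs": {}}


def iter_leaves(tree):
    """Yield the leaf nodes of `tree` from left to right."""
    if not tree["children"]:
        yield tree
    for child in tree["children"]:
        yield from iter_leaves(child)


def set_prop(tree, what, value):
    """Return an adjusted copy of `tree`, leaving the input untouched."""
    if what not in ("labels_colors", "labels_cex"):
        raise ValueError(f"unsupported option: {what}")
    values = list(value) if isinstance(value, (list, tuple)) else [value]
    if not values:
        raise ValueError("value must not be empty")
    new = copy.deepcopy(tree)
    # Recycle the values over the leaves, left to right.
    for i, lf in enumerate(iter_leaves(new)):
        lf["attrs"][what] = values[i % len(values)]
    return new


dend = node(node(leaf("a"), leaf("b")), leaf("c"))

# Funnel the tree through several configurations, like chained set() calls.
steps = [("labels_colors", ["red", "blue"]), ("labels_cex", 2)]
styled = reduce(lambda d, step: set_prop(d, *step), steps, dend)

print([lf["attrs"]["labels_colors"] for lf in iter_leaves(styled)])  # ['red', 'blue', 'red']
print([lf["attrs"] for lf in iter_leaves(dend)])  # [{}, {}, {}]
```

Returning a copy instead of mutating in place keeps each step side-effect free, so intermediate trees can be reused.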
Table 1.

Available options for the ‘what’ parameter when using the set function for adjusting the look of a dendrogram

Description | Option name
Set the labels' names, color (per color, or with k clusters) and size, or convert them to character | labels, labels_colors, labels_cex, labels_to_character
Set the leaves' point type, color, size and height | leaves_pch, leaves_col, leaves_cex, hang_leaves
Set all nodes' point type, color and size | nodes_pch, nodes_col, nodes_cex
Set the branches' line type, color and width: per branch, by clustering the labels, or for specific labels | branches_lty, branches_col, branches_lwd, branches_k_color, by_labels_branches_lty, by_labels_branches_col, by_labels_branches_lwd
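Several options in Table 1 color labels or branches "with k clusters". The Python sketch below illustrates the underlying idea under stated assumptions. It cuts the tree into k groups by undoing its k-1 highest merges, then gives each group a color from a palette. The helpers cut_k and colors_for_k are hypothetical, and the tree heights are illustrative. Clusters here are numbered left to right, whereas R's cutree may number them differently.

```python
def leaf(label):
    return {"label": label, "height": 0.0, "children": []}


def merge(height, *children):
    return {"height": float(height), "children": list(children)}


def leaf_labels(node):
    """Leaf labels under `node`, left to right."""
    if not node["children"]:
        return [node["label"]]
    return [lab for child in node["children"] for lab in leaf_labels(child)]


def cut_k(tree, k):
    """Split `tree` into k clusters by undoing its k-1 highest merges."""
    groups = [tree]
    while len(groups) < k:
        splittable = [j for j, g in enumerate(groups) if g["children"]]
        if not splittable:
            raise ValueError("k exceeds the number of leaves")
        # Replace the group formed at the greatest height by its sub-trees,
        # keeping the left-to-right order of the dendrogram.
        i = max(splittable, key=lambda j: groups[j]["height"])
        groups[i:i + 1] = groups[i]["children"]
    return [leaf_labels(g) for g in groups]


def colors_for_k(tree, k, palette):
    """Map each leaf label to a color according to its cluster among k."""
    return {lab: palette[c % len(palette)]
            for c, cluster in enumerate(cut_k(tree, k))
            for lab in cluster}


# Illustrative tree (heights are made up, not the output of a real clustering).
tree = merge(4,
             merge(1, leaf("1"), leaf("2")),
             merge(2, leaf("3"), merge(1, leaf("4"), leaf("5"))))

print(cut_k(tree, 3))  # [['1', '2'], ['3'], ['4', '5']]
print(colors_for_k(tree, 2, ["red", "blue"]))
# {'1': 'red', '2': 'red', '3': 'blue', '4': 'blue', '5': 'blue'}
```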
Figure 1 is created by clustering the vector of numbers 1 to 5 into a dendrogram.
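The clustering step behind Figure 1 can be sketched without R. Below is a naive agglomerative clustering of the numbers 1 to 5 in Python, using absolute differences as distances. The linkage options and the rule of breaking ties by scan order are assumptions of this sketch. Its merge order therefore need not match what R's hclust produces for the same input.

```python
from itertools import combinations

LINKAGES = {
    "single": min,                            # nearest pair of points
    "complete": max,                          # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}


def agglomerate(points, linkage="complete"):
    """Naive agglomerative clustering of 1-D points.

    Returns a list of (cluster_a, cluster_b, height) merges, where each cluster
    is a tuple of the original points. Ties go to the first pair scanned.
    """
    link = LINKAGES[linkage]
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = link([abs(x - y) for x in clusters[a] for y in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merged cluster keeps slot a
        del clusters[b]                          # b > a, so slot a is unaffected
    return merges


for step in agglomerate([1, 2, 3, 4, 5]):
    print(step)
# ((1,), (2,), 1)
# ((3,), (4,), 1)
# ((3, 4), (5,), 2)
# ((1, 2), (3, 4, 5), 4)
```

The list of merges and their heights carries the same information that an hclust object stores, and it is what a dendrogram is drawn from.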
Fig. 1.

A dendrogram after modifying various graphical attributes

The code used to create Figure 1 relies on the convenient forward-pipe operator %>% (Milton and Wickham, 2014). With this operator, x %>% f(y) is equivalent to f(x, y), so a chain of pipes is equivalent to a set of nested function calls. The tree is then plotted after repeated calls to the set function. The ‘value’ vector is recycled in depth-first order, with the root node treated as having a branch (which is not plotted by default). The attributes of the new tree can be explored with the functions get_nodes_attr and get_leaves_attr. A tree can also be rotated or pruned with the rotate and prune functions, respectively.
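Depth-first recycling is easy to misread, so here is a minimal Python sketch of it. The dict-based tree and the helpers set_nodes_attr and collect_nodes_attr are hypothetical, and collect_nodes_attr only loosely mirrors the role of get_nodes_attr. Nodes, including leaves, are visited parent-first and left to right. The root is visited first as if it had its own branch, and the value vector is cycled over this sequence.

```python
from itertools import cycle


def leaf(label):
    return {"label": label, "children": [], "attrs": {}}


def node(*children):
    return {"children": list(children), "attrs": {}}


def preorder(tree):
    """Yield every node, parent before children, left to right (depth-first)."""
    yield tree
    for child in tree["children"]:
        yield from preorder(child)


def set_nodes_attr(tree, name, values):
    """Recycle `values` over all nodes (leaves included) in depth-first order.

    The root is visited first, as if it had its own (undrawn) branch, so the
    first value is always consumed by the root.
    """
    for nd, v in zip(preorder(tree), cycle(values)):
        nd["attrs"][name] = v
    return tree


def collect_nodes_attr(tree, name):
    """Read one attribute back from every node, in the same depth-first order."""
    return [nd["attrs"].get(name) for nd in preorder(tree)]


# Tree of shape ((1, 2), (3, (4, 5))): 4 internal nodes + 5 leaves = 9 nodes.
tree = node(node(leaf("1"), leaf("2")),
            node(leaf("3"), node(leaf("4"), leaf("5"))))

set_nodes_attr(tree, "branches_lwd", [2, 1, 2])
print(collect_nodes_attr(tree, "branches_lwd"))  # [2, 1, 2, 2, 1, 2, 2, 1, 2]
```

Because the root consumes the first value, the first visible branch receives the second value. Keep this in mind when choosing value vectors.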

2.2 Comparing two dendrograms

The tanglegram function allows two dendrograms, from different algorithms or experiments, to be compared visually. It plots them facing each other and connects their matching labels with lines. Branches leading to distinct sub-trees are marked with a dashed line. For convenient plotting, dendlist combines the two dendrograms into a single object, while untangle rotates the trees in search of a layout in which their labels are better aligned. Figure 2 compares two clustering algorithms (single versus complete linkage) on a subset of 15 flowers from the well-known Iris dataset. The entanglement function measures the quality of a tanglegram layout. The correlation between tree topologies can be measured in several ways: with cor.dendlist (e.g. cophenetic correlation; Sokal and Rohlf, 1962), with Bk_plot (the Fowlkes-Mallows Bk index; Fowlkes and Mallows, 1983), or with dist.dendlist. Permutation tests and bootstrap confidence intervals are also available. Together, these methods support sensitivity and replicability analyses for researchers who wish to validate their hierarchical clustering results.
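The two comparison ideas can be made concrete with a Python sketch. This is not dendextend's implementation. The entanglement score assumes one plausible definition: each label's position in the two leaf orders is compared using an exponent L (1.5 here, an assumption), and the total is normalized by the score of a fully reversed order. dendextend's entanglement function may differ in its details. The Bk function follows the standard Fowlkes-Mallows formula for two trees that have each been cut into k clusters.

```python
import math
from collections import Counter


def entanglement(order1, order2, L=1.5):
    """Tanglegram entanglement: 0 = perfectly aligned, 1 = fully reversed.

    order1 and order2 list the same leaf labels, left to right, for each tree.
    """
    n = len(order1)
    pos2 = {lab: i for i, lab in enumerate(order2)}
    score = sum(abs(i - pos2[lab]) ** L for i, lab in enumerate(order1))
    # Normalize by the score of a completely reversed order.
    worst = sum(abs(i - (n - 1 - i)) ** L for i in range(n))
    return score / worst if worst else 0.0


def fowlkes_mallows_bk(clusters1, clusters2):
    """Fowlkes-Mallows Bk for two flat clusterings of the same n items.

    clusters1[i] and clusters2[i] are the cluster ids of item i when each tree
    is cut into k groups. Bk = T / sqrt(P * Q), built from pair counts.
    """
    n = len(clusters1)
    t = sum(m * m for m in Counter(zip(clusters1, clusters2)).values()) - n
    p = sum(m * m for m in Counter(clusters1).values()) - n
    q = sum(m * m for m in Counter(clusters2).values()) - n
    if p == 0 or q == 0:
        raise ValueError("Bk is undefined when a clustering has only singletons")
    return t / math.sqrt(p * q)


print(entanglement(list("abcd"), list("abcd")))   # 0.0
print(entanglement(list("abcd"), list("dcba")))   # 1.0
print(fowlkes_mallows_bk([1, 1, 2, 2, 2], [1, 1, 1, 2, 2]))  # 0.5
```

Evaluating Bk for k = 2, 3, ... gives a curve of agreement between two trees across cut levels. That is the kind of summary that Bk_plot displays.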
Fig. 2.

A tanglegram for comparing two clustering algorithms used on 15 flowers from the Iris dataset. Similar sub-trees are connected by lines of the same color, while branches leading to distinct sub-trees are marked by a dashed line

3 Enhancing other packages

The R ecosystem is abundant with functions that use dendrograms, and dendextend offers many functions for interacting with them and enhancing their visual display. The function rotate_DendSer (Hurley and Earle, 2013) rotates a dendrogram to optimize a visualization-based cost function. Other functions highlight clusters created by cutting the tree unevenly (at different heights) with the dynamicTreeCut package (Langfelder et al.). They can also highlight ‘significant’ clusters identified with the pvclust package (Suzuki and Shimodaira, 2006). These functions can be combined to create a highly customized (rotated, colored, etc.) static heatmap using heatmap.2 from gplots (Warnes et al.), or an interactive D3 heatmap using the d3heatmap package. The circlize_dendrogram function produces a simple circular tree layout, while more complex circular layouts can be achieved with the circlize package (Gu et al.). Beyond base R graphics, a ggplot2 dendrogram can be created with the as.ggdend function. In conclusion, the dendextend package simplifies the creation, comparison and integration of dendrograms into fine-tuned, publication-quality graphs. A demonstration of the package on various datasets is available in the supplementary materials.
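rotate_DendSer, mentioned above, builds on the seriation methods of Hurley and Earle (2013). As a hedged illustration of the general idea only, the Python sketch below tries every rotation of a small binary tree exhaustively. It keeps the leaf order with the smallest path length, the sum of distances between neighboring leaves, which is a cost chosen here as an assumption. The helpers are hypothetical. Because the number of orders doubles with every internal node, exhaustive search is practical only for tiny trees.

```python
from itertools import product


def leaf(x):
    return {"label": x, "children": []}


def node(left, right):
    return {"children": [left, right]}


def all_orders(tree):
    """Every leaf order reachable by flipping (rotating) internal nodes."""
    if not tree["children"]:
        return [[tree["label"]]]
    left, right = tree["children"]
    orders = []
    for lo, ro in product(all_orders(left), all_orders(right)):
        orders.append(lo + ro)  # keep this node's children as they are
        orders.append(ro + lo)  # rotate this node: swap its two children
    return orders


def path_length(order, dist):
    """Assumed cost: total distance between neighboring leaves in the drawing."""
    return sum(dist(a, b) for a, b in zip(order, order[1:]))


def best_rotation(tree, dist):
    """Leaf order with the lowest cost (the first one found if several tie)."""
    return min(all_orders(tree), key=lambda order: path_length(order, dist))


# Leaves are numbers; the right-hand cluster is stored as (10, 4).
tree = node(node(leaf(1), leaf(2)), node(leaf(10), leaf(4)))
print(best_rotation(tree, lambda a, b: abs(a - b)))  # [1, 2, 4, 10]
```

Rotation never changes the tree's topology, only the drawing order of its leaves. This is why a cost-driven rotation can be combined freely with the coloring and heatmap functions described above.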
References

1.  Hybrid hierarchical clustering with applications to microarray data.

Authors:  Hugh Chipman; Robert Tibshirani
Journal:  Biostatistics       Date:  2005-11-21       Impact factor: 5.899

2.  Pvclust: an R package for assessing the uncertainty in hierarchical clustering.

Authors:  Ryota Suzuki; Hidetoshi Shimodaira
Journal:  Bioinformatics       Date:  2006-04-04       Impact factor: 6.937

3.  Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R.

Authors:  Peter Langfelder; Bin Zhang; Steve Horvath
Journal:  Bioinformatics       Date:  2007-11-16       Impact factor: 6.937

4.  circlize Implements and enhances circular visualization in R.

Authors:  Zuguang Gu; Lei Gu; Roland Eils; Matthias Schlesner; Benedikt Brors
Journal:  Bioinformatics       Date:  2014-06-14       Impact factor: 6.937

5.  A framework for feature selection in clustering.

Authors:  Daniela M Witten; Robert Tibshirani
Journal:  J Am Stat Assoc       Date:  2010-06-01       Impact factor: 5.033

6.  APE: Analyses of Phylogenetics and Evolution in R language.

Authors:  Emmanuel Paradis; Julien Claude; Korbinian Strimmer
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937
