
UpSetR: an R package for the visualization of intersecting sets and their properties.

Jake R Conway, Alexander Lex, Nils Gehlenborg.

Abstract

MOTIVATION: Venn and Euler diagrams are a popular yet inadequate solution for quantitative visualization of set intersections. A scalable alternative to Venn and Euler diagrams for visualizing intersecting sets and their properties is needed.
RESULTS: We developed UpSetR, an open source R package that employs a scalable matrix-based visualization to show intersections of sets, their size, and other properties.
AVAILABILITY AND IMPLEMENTATION: UpSetR is available at https://github.com/hms-dbmi/UpSetR/ and released under the MIT License. A Shiny app is available at https://gehlenborglab.shinyapps.io/upsetr/.
CONTACT: nils@hms.harvard.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2017        PMID: 28645171      PMCID: PMC5870712          DOI: 10.1093/bioinformatics/btx364

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The visualization of sets and their intersections is a common challenge for researchers who work with biological and biomedical data. For example, a researcher might need to compare multiple algorithms that identify single nucleotide polymorphisms (Xu et al., 2012, Supplementary Fig. S1) or show orthologs of genes in newly sequenced species across genomes of related species (D’Hont et al., 2012, Supplementary Fig. S2). Although many alternative set visualization techniques exist (Alsallakh et al.), such data are typically visualized using Venn and Euler diagrams. Such diagrams can be generated with R packages such as venneuler (Wilkinson, 2012) and VennDiagram (Chen and Boutros, 2011). These closely related techniques have well-known shortcomings: they are hard to generate for more than a small number of sets, and because they represent intersection size by irregularly shaped and unaligned areas, it is hard to answer essential questions such as ‘What is the biggest intersection?’ or ‘Is intersection X larger than intersection Y?’ (Cleveland and McGill, 1984).

2 Materials and methods

Here we present an R package named ‘UpSetR’ based on the ‘UpSet’ technique (Lex et al., 2014; Lex and Gehlenborg, 2014) that employs a matrix-based layout to show intersections of sets and their sizes. It is implemented using ggplot2 (Wickham, 2009) and allows data analysts to easily generate UpSet plots for their own data. UpSetR supports three input formats: (i) a table in which the rows represent elements and the columns include set assignments and additional attributes; (ii) sets of element names; and (iii) an expression describing the sizes of the set intersections, as introduced by the venneuler package (Wilkinson, 2012). UpSetR provides support for the visualization of attributes associated with the elements contained in the sets, enabling researchers to explore and characterize the intersections. UpSetR differs from the original UpSet technique in that it is optimized for static plots and for integration into typical bioinformatics workflows. We also provide a Shiny app that allows researchers to create publication-quality UpSet plots directly in a web browser.

UpSetR visualizes intersections of sets as a matrix in which the rows represent the sets and the columns represent their intersections (Fig. 1; see Supplementary Figs. S1 and S2 for comparisons of Venn and Euler diagrams with UpSetR plots). For each set that is part of a given intersection, a black filled circle is placed in the corresponding matrix cell. If a set is not part of the intersection, a light gray circle is shown. A vertical black line connects the topmost black circle with the bottommost black circle in each column to emphasize the column-based relationships. The size of the intersections is shown as a bar chart placed on top of the matrix so that each column lines up with exactly one bar. A second bar chart showing the size of each set is shown to the left of the matrix.
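
The three input formats and the matrix layout described above can be illustrated with a small, language-neutral sketch. The Python code below is not UpSetR code and does not use its API; the function names, the toy gene identifiers and the `A&B` key syntax are assumptions made for illustration. It reduces each input format to the same table of intersection sizes (taken here to mean the number of elements that belong to exactly that combination of sets) and prints a plain-text version of the matrix, without the connecting vertical lines.

```python
from collections import Counter


def from_table(rows, set_names):
    """Format (i): one row per element; set columns hold 0/1 flags."""
    return {row["id"]: frozenset(s for s in set_names if row[s]) for row in rows}


def from_sets(named_sets):
    """Format (ii): mapping of set name -> collection of element names."""
    membership = {}
    for name, elements in named_sets.items():
        for element in elements:
            membership.setdefault(element, set()).add(name)
    return {element: frozenset(names) for element, names in membership.items()}


def from_expression(expression):
    """Format (iii): mapping such as {"A&B": 1}; values are intersection sizes."""
    return Counter({frozenset(key.split("&")): size for key, size in expression.items()})


def intersection_sizes(membership):
    """Count the elements that fall into each exact combination of sets."""
    return Counter(membership.values())


def render(set_names, sizes):
    """Plain-text matrix: one column per intersection, one row per set."""
    columns = sorted(sizes, key=lambda combo: (-sizes[combo], sorted(combo)))
    lines = [" " * 6 + "".join(f"{sizes[combo]:>4}" for combo in columns)]
    for name in set_names:
        marks = ("#" if name in combo else "." for combo in columns)
        lines.append(f"{name:<6}" + "".join(f"{mark:>4}" for mark in marks))
    return "\n".join(lines)


# Toy data (made up): the same information in all three formats.
set_names = ["A", "B", "C"]
named_sets = {"A": ["g1", "g2", "g3", "g4"], "B": ["g3", "g4", "g5"], "C": ["g4", "g6"]}
table = [
    {"id": "g1", "A": 1, "B": 0, "C": 0},
    {"id": "g2", "A": 1, "B": 0, "C": 0},
    {"id": "g3", "A": 1, "B": 1, "C": 0},
    {"id": "g4", "A": 1, "B": 1, "C": 1},
    {"id": "g5", "A": 0, "B": 1, "C": 0},
    {"id": "g6", "A": 0, "B": 0, "C": 1},
]
expression = {"A": 2, "B": 1, "C": 1, "A&B": 1, "A&B&C": 1}

sizes = intersection_sizes(from_sets(named_sets))
print(render(set_names, sizes))
```

In this sketch the top row of numbers plays the role of the intersection size bar chart, and each `#` corresponds to a filled circle in the matrix.
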
Fig. 1

An UpSetR plot of variants across eight ICGC cancer studies with three intersection queries, one element query, four attribute plots, and two set metadata plots. Data for LUSC-KR and LAML-KR are based on whole-genome sequencing, all others on whole-exome sequencing. The three intersection queries are the one-way intersections of LUSC-US (blue) and THCA-SA (purple), and the two-way intersection of LUSC-KR and LAML-KR (green). The element query (yellow) selects mutations classified as deletions. Three custom transition/transversion plots display the relative frequency of substitution events for the intersection queries. The bar chart attribute plot displays the contribution of variants unique to the THCA-SA cohort (purple) to each mutation type. Set metadata is plotted to the left of the set size bar charts.

3 Usage scenario

To illustrate the utility and features of UpSetR, we retrieved variant calls for eight cancer studies from the ICGC Data Portal (see Supplementary Material). Each cancer study represents a set and each variant represents an element that is contained in one or more sets (Supplementary Fig. S3). UpSetR supports queries on the data to highlight features. Intersection queries select the subset of elements in the dataset defined by an intersection. Each query is assigned a unique color and its results are plotted on top of the intersection size bar chart. For example, this can be used to select elements in particular intersections (Supplementary Fig. S4). Additionally, UpSetR supports queries that select elements based on attributes associated with the elements in the sets. Attributes can be numerical, Boolean or categorical. In our example, the element attributes are the chromosome, genomic location, and variant type (deletion, insertion, substitution) associated with each variant. UpSetR element queries select elements across intersections and sets based on particular attribute values. Basic built-in queries can be extended to arbitrarily complex queries by providing a custom query function that operates on any combination of attributes. Element queries can be used to select variants of a particular type, such as deletions, and to view them across intersections (Supplementary Fig. S5). UpSetR can integrate additional attribute plots that visualize attributes of elements selected by an intersection or element query. Support for scatter plots and histograms is built into UpSetR. Additional plot types can be integrated by providing a function that returns a ggplot object to visualize the data. When element or intersection queries are applied, query results can also be overlaid on attribute plots in addition to the intersection size bar plot.
Figure 1 demonstrates how these features, including the visualization of metadata about the sets, can be combined into a plot that, among other things, reveals a notable over-representation of unique deletions among the variants in the THCA-SA study.
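
The difference between the two query types can be made concrete with a minimal sketch. The Python code below is a conceptual illustration only, not UpSetR's query interface; the variant records are invented toy data (they are not the ICGC variant calls), and the function names are hypothetical. An intersection query keeps the elements of one exact combination of sets, whereas an element query applies a predicate on attributes across all intersections, so its result can be counted per intersection and overlaid on the size bars.

```python
from collections import Counter

# Invented toy records: each variant lists the studies it occurs in and its type.
variants = [
    {"id": "v1", "sets": frozenset({"THCA-SA"}), "type": "deletion"},
    {"id": "v2", "sets": frozenset({"THCA-SA"}), "type": "deletion"},
    {"id": "v3", "sets": frozenset({"THCA-SA"}), "type": "substitution"},
    {"id": "v4", "sets": frozenset({"LUSC-US"}), "type": "substitution"},
    {"id": "v5", "sets": frozenset({"LUSC-KR", "LAML-KR"}), "type": "insertion"},
    {"id": "v6", "sets": frozenset({"LUSC-KR", "LAML-KR"}), "type": "deletion"},
]


def intersection_query(elements, combination):
    """Elements whose set membership is exactly the given combination."""
    target = frozenset(combination)
    return [e for e in elements if e["sets"] == target]


def element_query(elements, predicate):
    """Elements from any intersection that satisfy a custom predicate."""
    return [e for e in elements if predicate(e)]


def counts_per_intersection(elements):
    """Number of elements per exact combination of sets."""
    return Counter(e["sets"] for e in elements)


# Intersection query: variants shared by LUSC-KR and LAML-KR only.
shared = intersection_query(variants, {"LUSC-KR", "LAML-KR"})

# Element query: deletions, regardless of the intersection they fall into.
deletions = element_query(variants, lambda e: e["type"] == "deletion")

# Overlay: for each intersection, how many of its elements the query selected.
totals = counts_per_intersection(variants)
overlay = counts_per_intersection(deletions)
for combo in totals:
    label = " & ".join(sorted(combo))
    print(f"{label}: {overlay[combo]} of {totals[combo]} variants are deletions")
```

Passing an arbitrary predicate mirrors the idea of a custom query function that operates on any combination of attributes; a predicate could, for instance, combine variant type with chromosome.
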

4 Conclusion

UpSetR is a highly customizable tool for data exploration and generation of set visualizations. By making UpSetR compatible with the input formats of the existing popular Venn and Euler diagram packages and by offering a Shiny web interface, we incentivize the use of UpSet diagrams and enable users without programming skills to generate effective set visualizations. Through its seamless integration with ggplot2 and its ability to apply virtually any query, it is possible to customize and explore data in ways not supported by any other set visualization package. In addition, the integration of UpSetR with ggplot2 allows developers to extend UpSetR for use in their own software packages.

References

1.  Exact and approximate area-proportional circular Venn and Euler diagrams.

Authors:  Leland Wilkinson
Journal:  IEEE Trans Vis Comput Graph       Date:  2012-02       Impact factor: 4.579

2.  A fast and accurate SNP detection algorithm for next-generation sequencing data.

Authors:  Feng Xu; Weixin Wang; Panwen Wang; Mulin Jun Li; Pak Chung Sham; Junwen Wang
Journal:  Nat Commun       Date:  2012       Impact factor: 14.919

3.  The banana (Musa acuminata) genome and the evolution of monocotyledonous plants.

Authors:  Angélique D'Hont; France Denoeud; Jean-Marc Aury; Franc-Christophe Baurens; Françoise Carreel; Olivier Garsmeur; Benjamin Noel; Stéphanie Bocs; Gaëtan Droc; Mathieu Rouard; Corinne Da Silva; Kamel Jabbari; Céline Cardi; Julie Poulain; Marlène Souquet; Karine Labadie; Cyril Jourda; Juliette Lengellé; Marguerite Rodier-Goud; Adriana Alberti; Maria Bernard; Margot Correa; Saravanaraj Ayyampalayam; Michael R Mckain; Jim Leebens-Mack; Diane Burgess; Mike Freeling; Didier Mbéguié-A-Mbéguié; Matthieu Chabannes; Thomas Wicker; Olivier Panaud; Jose Barbosa; Eva Hribova; Pat Heslop-Harrison; Rémy Habas; Ronan Rivallan; Philippe Francois; Claire Poiron; Andrzej Kilian; Dheema Burthia; Christophe Jenny; Frédéric Bakry; Spencer Brown; Valentin Guignon; Gert Kema; Miguel Dita; Cees Waalwijk; Steeve Joseph; Anne Dievart; Olivier Jaillon; Julie Leclercq; Xavier Argout; Eric Lyons; Ana Almeida; Mouna Jeridi; Jaroslav Dolezel; Nicolas Roux; Ange-Marie Risterucci; Jean Weissenbach; Manuel Ruiz; Jean-Christophe Glaszmann; Francis Quétier; Nabila Yahiaoui; Patrick Wincker
Journal:  Nature       Date:  2012-08-09       Impact factor: 49.962

4.  UpSet: Visualization of Intersecting Sets.

Authors:  Alexander Lex; Nils Gehlenborg; Hendrik Strobelt; Romain Vuillemot; Hanspeter Pfister
Journal:  IEEE Trans Vis Comput Graph       Date:  2014-12       Impact factor: 4.579

5.  VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R.

Authors:  Hanbo Chen; Paul C Boutros
Journal:  BMC Bioinformatics       Date:  2011-01-26       Impact factor: 3.307
