PanViz: interactive visualization of the structure of functionally annotated pangenomes.

Thomas Lin Pedersen, Intawat Nookaew, David Wayne Ussery, Maria Månsson.

Abstract

Summary: PanViz is a novel, interactive visualization tool for pangenome analysis. PanViz allows visualization of changes in gene group (groups of similar genes across genomes) classification as different subsets of pangenomes are selected, as well as comparisons of individual genomes to pangenomes with gene ontology based navigation of gene groups. Furthermore, it allows for rich and complex visual querying of gene groups in the pangenome. PanViz visualizations require no external programs and are easily sharable, allowing for rapid pangenome analyses.

Availability and Implementation: PanViz is written entirely in JavaScript and is available at https://github.com/thomasp85/PanViz. A companion R package that facilitates the creation of PanViz visualizations from a range of data formats is released through Bioconductor and is available at https://bioconductor.org/packages/PanVizGenerator.

Contact: thomasp85@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year: 2017; PMID: 28057677; PMCID: PMC5859990; DOI: 10.1093/bioinformatics/btw761

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


1 Introduction

Visualization plays an integral part in modern biology research, as the size and complexity of biological data have increased (Land et al., 2015). Many new tools for analysis of pangenomes have recently been published (Grant et al., 2012; Hallin et al., 2008; Lechat et al., 2013; Rokicki et al.). However, several of these do not scale well. A general tendency for pangenome visualizations is to use the chromosome of a reference genome as an axis and plot synteny between genomes along it. Visualization based on a reference genome fails to take into account novel pan-genes not present in the reference, and as more genomes are added, the reference genome becomes less representative of the full dataset. Some attempts have been made to create reference-free, scalable pangenome visualizations of different types. GenoSets (Cain et al., 2012) is a visualization that uses parallel sets to facilitate gene group selections based on presence/absence in pangenome subsets, and GenomeRing (Herbig et al.) tries to overcome the reference bias by merging all chromosomes into a superchromosome that can be used as a backbone for visualization. Here we present a new interactive visualization, PanViz, aimed at letting users explore the structure of functionally annotated pangenomes and pangenome subsets, while performing visual queries to search for gene groups. PanViz is based purely on GO annotation and the presence/absence pattern of gene groups, and is thus not dependent on a single reference genome.

2 Implementation

PanViz is written entirely in JavaScript using D3 (Bostock et al., 2011). It is completely self-contained, embedded in a single HTML file, and does not require any connection to external sources. A companion R package, PanVizGenerator, has been released on Bioconductor (Gentleman et al., 2004; Huber et al.) that facilitates the creation of new PanViz visualizations. A gene group in this context is a group of similar genes across genomes. The input data needed by PanVizGenerator are a pangenome matrix giving the presence/absence pattern of each gene group across the included genomes, as well as a Gene Ontology (GO) (Ashburner et al.) based functional annotation of each gene group. The latter can be derived by analyzing a representative sequence for each gene group using e.g. InterProScan (Jones et al., 2014; Zdobnov and Apweiler, 2001) or Blast2GO (Conesa et al., 2005). The simple nature of the input data means that PanViz works with any pangenome tool, although the way the pangenome matrix is obtained may vary between tools.
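
To make the two inputs concrete, the following is a minimal Python sketch of a pangenome matrix and a GO annotation. The genome names, gene group identifiers, term assignments and the dictionary layout are all invented for illustration; this is not the file format that PanVizGenerator itself reads.

```python
# Toy pangenome matrix: rows are gene groups, columns are genomes,
# 1 = gene group present in the genome, 0 = absent.
# All names and values are hypothetical.
genomes = ["genomeA", "genomeB", "genomeC"]

pangenome_matrix = {
    "group1": [1, 1, 1],
    "group2": [1, 0, 1],
    "group3": [0, 0, 1],
}

# GO annotation: each gene group maps to zero or more GO term identifiers
# (the assignments here are made up for the example).
go_annotation = {
    "group1": ["GO:0008152"],
    "group2": ["GO:0008152", "GO:0006810"],
    "group3": [],
}


def genomes_with_group(group):
    """Return the names of the genomes in which the gene group is present."""
    row = pangenome_matrix[group]
    return [name for name, present in zip(genomes, row) if present]


def validate(matrix, annotation, genome_names):
    """Check that every row has one 0/1 entry per genome and an annotation entry."""
    for group, row in matrix.items():
        if len(row) != len(genome_names):
            raise ValueError(f"{group}: expected {len(genome_names)} columns")
        if any(value not in (0, 1) for value in row):
            raise ValueError(f"{group}: entries must be 0 or 1")
        if group not in annotation:
            raise ValueError(f"{group}: missing GO annotation entry")


validate(pangenome_matrix, go_annotation, genomes)
print(genomes_with_group("group2"))  # ['genomeA', 'genomeC']
```

A gene group without any GO term (group3 here) is kept with an empty annotation list, so that it still counts towards the presence/absence structure.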

3 Overview

PanViz consists of four main areas (see Supplementary Fig. S1 in supplementary material). The left part is reserved for (pan)genome navigation (Supplementary Fig. S1A) and the center part for pangenome visualization (Supplementary Fig. S1B), while the right part holds legends and additional lookup information (Supplementary Fig. S1C). At the bottom, a list of all currently selected gene groups is available, together with tools to modify the selection (Supplementary Fig. S1D).

3.1 Genome navigation

The genomes in the pangenome are represented by a dual linked view, with a principal component analysis/multidimensional scaling based scatterplot on top and a zoomable hierarchical clustering at the bottom (Supplementary Fig. S1A), both based on the pangenome matrix. Both views support selecting a single genome in order to transition into the genome-pangenome comparison state, and the dendrogram additionally allows subsets of the pangenome to be selected by clicking its branch points. The overview plots are also linked to the gene group table (Supplementary Fig. S1D), so that all genomes containing the gene group currently hovered over are highlighted.
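
As a rough illustration of how a dendrogram can be derived from presence/absence data alone, the sketch below computes pairwise distances between genomes and merges them agglomeratively. The text does not state which distance measure or linkage PanViz uses; Jaccard distance and single linkage are assumptions chosen for brevity, and the genome and gene group names are hypothetical.

```python
from itertools import combinations

# Hypothetical presence/absence data: genome -> set of gene groups it contains.
genome_groups = {
    "genomeA": {"g1", "g2", "g3"},
    "genomeB": {"g1", "g2", "g3", "g4"},
    "genomeC": {"g1", "g5"},
    "genomeD": {"g1", "g5", "g6"},
}


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; 0 for identical sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage(data):
    """Naive agglomerative clustering.

    Returns the merges in order as (members_1, members_2, distance).
    """

    def cluster_distance(pair):
        # Single linkage: smallest distance between any two members.
        left, right = pair
        return min(jaccard_distance(data[x], data[y]) for x in left for y in right)

    clusters = [frozenset([name]) for name in sorted(data)]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((sorted(left), sorted(right), cluster_distance((left, right))))
        clusters = [c for c in clusters if c not in (left, right)] + [left | right]
    return merges


for left, right, distance in single_linkage(genome_groups):
    print(left, right, round(distance, 3))
```

Each merge corresponds to one branch point of the dendrogram, and the members below a branch point form the subset of genomes that selecting it would yield.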

3.2 Pangenome overview

The main view of the visualization is a radial representation of the three presence-based classes of gene groups in the pangenome: Core, Accessory and Singleton (Supplementary Fig. S1B1). Each class is further divided according to the distribution of top level biological process GO terms. As different sub-pangenomes are selected, the changes in the pangenome are animated by moving sections of each GO term arc around. Once the animation has ended, the changes can additionally be shown as chords by hovering over a specific GO arc.
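
The reclassification that drives the animation can be sketched as follows. The text does not give exact thresholds, so this example assumes the common convention that a gene group is Core when present in every selected genome, Singleton when present in exactly one, and Accessory otherwise; all identifiers are hypothetical.

```python
def classify(presence, selected):
    """Classify gene groups as Core, Accessory or Singleton within a selection.

    presence: dict mapping gene group -> set of genomes containing it.
    selected: collection of genome names forming the (sub-)pangenome.
    Gene groups absent from every selected genome are left out.
    """
    selected = set(selected)
    result = {}
    for group, carriers in presence.items():
        count = len(carriers & selected)
        if count == 0:
            continue
        if count == len(selected):
            result[group] = "Core"  # also covers a selection of one genome
        elif count == 1:
            result[group] = "Singleton"
        else:
            result[group] = "Accessory"
    return result


presence = {
    "g1": {"A", "B", "C", "D"},
    "g2": {"A", "B"},
    "g3": {"C"},
    "g4": {"A", "B", "C"},
}

full = classify(presence, ["A", "B", "C", "D"])
subset = classify(presence, ["A", "B"])

# Gene groups whose class changes when the selection shrinks to A and B.
changed = {g: (full[g], subset[g]) for g in subset if full[g] != subset[g]}
print(changed)
```

In this toy data, g2 and g4 move from Accessory to Core when only genomes A and B are selected, while g3 drops out of the sub-pangenome entirely.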

3.3 Genome-pangenome comparison

When one or two genomes are selected, the main view transitions into a stacked bar chart showing the pangenome in the middle (Supplementary Fig. S1B2). The genomes are represented by their GO term composition, and weighted Bézier curves connect the genes in the genomes to their location in the pangenome (if present). If two genomes are selected, the proportion of each GO term that they share with each other and with the pangenome is shown as a darker shaded Bézier curve.
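
The quantities behind such a comparison can be sketched in a few lines of Python. The data and function names below are hypothetical, and the sketch only covers the counting step (GO term composition and the shared proportion per term), not the drawing of bars or curves.

```python
from collections import defaultdict

# Hypothetical data: top level GO term per gene group, and gene groups per genome.
go_term = {"g1": "metabolism", "g2": "metabolism", "g3": "transport", "g4": "transport"}
genome_1 = {"g1", "g2", "g3"}
genome_2 = {"g2", "g3", "g4"}


def composition(groups):
    """Number of gene groups per GO term in a set of gene groups."""
    counts = defaultdict(int)
    for group in groups:
        counts[go_term[group]] += 1
    return dict(counts)


def shared_fraction(first, second):
    """Per GO term, the fraction of the first genome's gene groups also in the second."""
    total = composition(first)
    shared = composition(first & second)
    return {term: shared.get(term, 0) / n for term, n in total.items()}


print(composition(genome_1))
print(shared_fraction(genome_1, genome_2))
```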

3.4 Gene ontology navigation

To give insight into the distribution of lower level GO terms, a treemap is shown upon selecting a top level GO term bar from the pangenome (Supplementary Fig. S1B3). Each term in the treemap is weighted by the number of gene groups in the current pangenome annotated with that term. The treemap is zoomable and features descriptions of each included GO term.
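
The weighting can be illustrated with a small Python sketch. It assumes a simplified hierarchy in which every term has a single parent (the real Gene Ontology is a graph in which terms may have several parents), and the term names and annotations are invented for the example.

```python
from collections import Counter

# Hypothetical GO-like hierarchy: term -> parent term (None for top level).
parent = {
    "metabolic process": None,
    "lipid metabolism": "metabolic process",
    "fatty acid synthesis": "lipid metabolism",
    "carbohydrate metabolism": "metabolic process",
    "localization": None,
    "ion transport": "localization",
}

# Hypothetical annotation of the gene groups in the current pangenome.
annotation = {
    "g1": ["fatty acid synthesis"],
    "g2": ["lipid metabolism", "ion transport"],
    "g3": ["carbohydrate metabolism"],
    "g4": ["fatty acid synthesis"],
}


def ancestors(term):
    """Yield the term itself followed by each of its ancestors up to the top level."""
    while term is not None:
        yield term
        term = parent[term]


def treemap_weights(top_term):
    """Count gene groups per term lying below top_term; a group counts once per term."""
    weights = Counter()
    for terms in annotation.values():
        covered = set()
        for term in terms:
            lineage = list(ancestors(term))
            if top_term in lineage:
                covered.update(t for t in lineage if t != top_term)
        weights.update(covered)
    return dict(weights)


print(treemap_weights("metabolic process"))
```

Here a gene group annotated with a specific term also contributes to that term's ancestors, which is one possible reading of "having a specific term"; counting only direct annotations would be an equally simple alternative.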

3.5 Visual querying

As each visual element represents a set of gene groups, it makes sense to build the querying mechanism on set arithmetic (union, intersection, complement, etc.). An icon on top of the gene group table indicates the different set operations available (Supplementary Fig. S1D). The selected operation is performed between the current content of the table and the gene groups contained in the visual element selected. By chaining the six available operations, it is possible to build complex gene group queries intuitively, guided by the insights gained from the visualization.
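
A query built this way amounts to a chain of set operations, as in the Python sketch below. The text names union, intersection and complement but does not enumerate all six operations, so the remaining operation names here are assumptions, as are the gene group identifiers.

```python
# Universe of gene groups in the pangenome (hypothetical identifiers).
all_groups = {"g1", "g2", "g3", "g4", "g5"}

# Illustrative operations combining the table content with a selected element.
OPERATIONS = {
    "union": lambda table, element: table | element,
    "intersection": lambda table, element: table & element,
    "difference": lambda table, element: table - element,
    "symmetric_difference": lambda table, element: table ^ element,
    "complement": lambda table, element: all_groups - element,
    "replace": lambda table, element: set(element),
}


def apply_query(table, steps):
    """Apply a sequence of (operation name, element gene groups) steps to the table."""
    for name, element in steps:
        table = OPERATIONS[name](table, element)
    return table


core = {"g1", "g2"}
metabolism = {"g2", "g3", "g4"}
genome_x = {"g1", "g2", "g4", "g5"}

# "Metabolism gene groups found in genome X that are not core"
result = apply_query(set(), [
    ("replace", metabolism),
    ("intersection", genome_x),
    ("difference", core),
])
print(sorted(result))  # ['g4']
```

Each step corresponds to selecting one visual element (a GO term arc, a genome, a presence class) with a particular operation active.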

4 Conclusion

PanViz offers a novel and unbiased approach to visualizing the structure of pangenome data. Interactions and animations invite users to investigate the data, and the reliance on a single self-contained HTML file makes visualizations easy to share with fellow researchers. Because the main visualization relies on summaries, it scales to thousands of gene groups and genomes, although larger pangenomes will require faster hardware due to the dynamic nature of the visualization. PanViz helps researchers understand how pangenomes differ on a functional level rather than simply in terms of shared gene groups. Future work will focus on saving state within the URL, to facilitate sharing of different states of a PanViz visualization, as well as on performance improvements to the implementation.

Funding

This work was supported by The Danish Agency for Science, Technology and Innovation.

Conflict of Interest: none declared.

References

1. E M Zdobnov; R Apweiler. InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics, 2001-09.

2. Michael Bostock; Vadim Ogievetsky; Jeffrey Heer. D³: Data-Driven Documents. IEEE Trans Vis Comput Graph, 2011-12.

3. Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol, 2004-09-15.

4. Peter F Hallin; Tim T Binnewies; David W Ussery. The genome BLASTatlas-a GeneWiz extension for visualization of whole-genome homology. Mol Biosyst, 2008-03-17.

5. Ana Conesa; Stefan Götz; Juan Miguel García-Gómez; Javier Terol; Manuel Talón; Montserrat Robles. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 2005-08-04.

6. Aurora A Cain; Robert Kosara; Cynthia J Gibas. GenoSets: visual analytic methods for comparative genomics. PLoS One, 2012-10-03.

7. Jason R Grant; Adriano S Arantes; Paul Stothard. Comparing thousands of circular genomes using the CGView Comparison Tool. BMC Genomics, 2012-05-23.

8. Philip Jones; David Binns; Hsin-Yu Chang; Matthew Fraser; Weizhong Li; Craig McAnulla; Hamish McWilliam; John Maslen; Alex Mitchell; Gift Nuka; Sebastien Pesseat; Antony F Quinn; Amaia Sangrador-Vegas; Maxim Scheremetjew; Siew-Yit Yong; Rodrigo Lopez; Sarah Hunter. InterProScan 5: genome-scale protein function classification. Bioinformatics, 2014-01-21.

9. Pierre Lechat; Erika Souche; Ivan Moszer. SynTView - an interactive multi-view genome browser for next-generation comparative microorganism genomics. BMC Bioinformatics, 2013-09-22.

10. Miriam Land; Loren Hauser; Se-Ran Jun; Intawat Nookaew; Michael R Leuze; Tae-Hyuk Ahn; Tatiana Karpinets; Ole Lund; Guruprased Kora; Trudy Wassenaar; Suresh Poudel; David W Ussery. Insights from 20 years of bacterial genome sequencing (review). Funct Integr Genomics, 2015-02-27.
