
Phandango: an interactive viewer for bacterial population genomics.

James Hadfield, Nicholas J Croucher, Richard J Goater, Khalil Abudahab, David M Aanensen, Simon R Harris.

Abstract

SUMMARY: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesizing and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner.
AVAILABILITY AND IMPLEMENTATION: Phandango is a web application freely available for use at www.phandango.net and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at https://github.com/jameshadfield/phandango.
© The Author 2017. Published by Oxford University Press.


Year:  2018        PMID: 29028899      PMCID: PMC5860215          DOI: 10.1093/bioinformatics/btx610

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Bacterial population genomics has advanced rapidly in terms of the number of genomes sequenced, with recent publications analysing hundreds or even thousands of bacterial genomes. Such studies often base their understanding upon a phylogenetic tree, onto which epidemiological, comparative genomic and phenotypic data can be mapped. In bacterial species which undergo homologous recombination, horizontal sequence transfer means that whole-genome phylogenies often have to be adjusted to mitigate the confounding effects of recombination using methods such as Gubbins (Croucher et al.) or BRATNextGen (Marttinen et al.). These methods also predict regions of horizontally imported DNA in the genome of each bacterial isolate, which can only be practically interpreted when displayed in the context of the phylogeny. An alternative approach to large-scale comparative genomics is to investigate the distribution of the pan-genome across a set of isolates using software such as ROARY (Page et al.). Finally, increasing sample sizes have opened the way for genetic and phenotypic data to be combined in genome-wide association studies (GWAS) using programs such as PLINK or SEER (Purcell et al.; Lees et al.). These approaches have proved successful in identifying serotype switching within populations and in finding variants associated with antimicrobial resistance (Chewapreecha et al.; Croucher et al.). Increasingly, web application development provides us with methods to link and visualize complex genomic data interactively (Argimón et al.). However, recombination, pan-genome and GWAS analyses all produce large amounts of output data that are typically explored separately in visually distinct styles, relative to a phylogeny, a reference sequence or both. Currently, exploratory analyses are often represented as single static images that provide a simple overview but do not allow visual investigation of the data or the ability to relate output from multiple analyses to one another.
The ability to interactively visualize such complex and information-rich datasets would allow clearer interpretation and facilitate novel biological discoveries. Phandango is an interactive web application which runs directly in web browsers. Data are uploaded by dragging and dropping files onto the browser window and are analysed client-side, such that no data are transferred to servers. Figure 1 illustrates the grid layout produced when a phylogenetic tree, an associated metadata file, a reference sequence annotation file and the output from Gubbins and BRATNextGen are simultaneously uploaded into Phandango. The resulting visualization is fully interactive, allowing users to zoom and manipulate both the phylogeny and the view along the length of the reference sequence using intuitive controls. The space allocated to panels within the grid can be easily adjusted by dragging. The framework allows loci of interest highlighted by any of the supported population genomic analysis data formats to be easily cross-referenced with functional information associated with the reference genome. This means that multiple population genomic analyses can be interactively compared in a single environment.
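When the output of two recombination-detection methods is loaded together, as in Fig. 1, the regions where both methods agree amount to the intersection of two sets of genomic intervals. The sketch below is one way to compute that intersection for a single isolate. It is an illustration only and is not taken from the Phandango source code; the function names and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive start and end coordinates on the reference


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or are directly adjacent."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both interval lists."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical blocks for one isolate; not taken from the paper's dataset.
gubbins_blocks = [(100, 500), (900, 1200)]
brat_blocks = [(400, 950), (1100, 1300)]
print(overlap(gubbins_blocks, brat_blocks))
```

Merging each list first keeps the two-pointer sweep linear in the number of blocks once they are sorted.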
Fig. 1. Screenshot of recombination events inferred in a collection of Streptococcus pneumoniae genomes by Gubbins (blue blocks) and BRATNextGen (yellow blocks), with green blocks representing overlap between methods

Phandango is versatile in the data formats it can display, all of which are detailed on the GitHub page. Briefly, phylogenies are expected in Newick format; recombination, GWAS and pan-genome data are expected in the default output formats of the software that produced them (currently supported software are Gubbins, BRATNextGen, PLINK, SEER and ROARY); genome annotations are expected in GFF3 format; and metadata in simple CSV format. Since all of these inputs are simple text files, it is relatively simple for any custom data structure to be converted by the user into one of these formats and subsequently displayed.
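As an illustration of such a conversion, the following sketch writes custom per-isolate records to CSV text with one row per leaf of a Newick tree. It assumes that metadata rows are matched to tree tips by the name in the first column and that leaf labels are unquoted; the exact conventions Phandango expects are described on the GitHub wiki and should be checked there. The function names, the "name" column header and the example records are hypothetical.

```python
import csv
import io
import re
from typing import Dict, List


def newick_leaf_names(newick: str) -> List[str]:
    """Extract leaf labels from a simple Newick string.

    Assumes unquoted labels without spaces: a leaf label is any label
    that directly follows "(" or ",".
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def metadata_csv(newick: str, records: Dict[str, Dict[str, str]]) -> str:
    """Build CSV text with one row per tree leaf, keyed by leaf name."""
    leaves = newick_leaf_names(newick)
    missing = [name for name in leaves if name not in records]
    if missing:
        raise ValueError(f"no metadata for leaves: {missing}")
    columns = sorted({key for row in records.values() for key in row})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name"] + columns)  # "name" is an assumed header label
    for leaf in leaves:
        writer.writerow([leaf] + [records[leaf].get(col, "") for col in columns])
    return buffer.getvalue()


# Hypothetical tree and records for illustration only.
tree = "((iso1:0.1,iso2:0.2):0.05,iso3:0.3);"
records = {
    "iso1": {"country": "UK", "serotype": "19F"},
    "iso2": {"country": "UK", "serotype": "23F"},
    "iso3": {"country": "Thailand", "serotype": "19F"},
}
print(metadata_csv(tree, records))
```

Checking that every leaf has a record before writing catches mismatched isolate names early, which is a common reason for metadata failing to line up with a tree.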

2 User interface

Phandango initially presents the entirety of the user’s data (normally consisting of the entire phylogeny and the entire reference sequence or pan-genome) simultaneously. The exact nature of the layout depends on the data loaded—for instance, one can view simply a phylogeny and associated metadata, or a genome annotation together with GWAS results without a phylogeny. The user can then quickly and easily zoom into regions of the genomic data, effectively expanding the view horizontally to focus on particular genomic loci. This allows rapid biological interpretation of complex data by quickly viewing the genomic regions of interest in greater detail. Combined with the ability to interact with the phylogeny by zooming to focus on particular leaf nodes or selecting and drawing sub-trees, the user can, for example, explore lineage-specific recombination or pan-genome profiles and compare these results against the overall dataset. Hovering over the genome annotation (top) or the metadata (between the phylogeny and the genomic information) displays any annotation associated with that data. A line graph is automatically generated and displayed under the genomic information panel. Depending on the data type displayed, the line graph represents either the recombination prevalence along the sequence or the number of isolates containing a particular gene. If subclades are selected on the tree, a second line graph is overlaid showing the same data for the selected taxa. In this way, features of sublineages may be easily compared with those of the overall dataset.
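The line graph described above can be thought of as a per-position count of how many isolates are affected by a recombination block, computed once for all taxa and once for a selected subclade. The sketch below shows one way to compute such counts with a difference array. It is not Phandango's implementation; the block representation, function name and example data are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

# (start, end, taxa): 1-based inclusive coordinates and the isolates affected.
Block = Tuple[int, int, Set[str]]


def prevalence(
    blocks: List[Block],
    genome_length: int,
    selected: Optional[Set[str]] = None,
) -> List[int]:
    """Count, at each position, the block/taxon pairs that cover it.

    If `selected` is given, only taxa in that set are counted. A taxon
    covered by two overlapping blocks is counted once per block.
    """
    diff = [0] * (genome_length + 2)
    for start, end, taxa in blocks:
        n = len(taxa if selected is None else taxa & selected)
        diff[start] += n      # count rises where the block begins
        diff[end + 1] -= n    # and falls just after it ends
    counts: List[int] = []
    running = 0
    for pos in range(1, genome_length + 1):
        running += diff[pos]
        counts.append(running)
    return counts


# Hypothetical blocks on a 10 bp "genome".
blocks = [(2, 5, {"A", "B", "C"}), (4, 8, {"D", "E"})]
print(prevalence(blocks, 10))                       # all taxa
print(prevalence(blocks, 10, selected={"A", "D"}))  # selected subclade
```

Plotting the two resulting lists on the same axes gives the kind of overlay described in the text, with the subclade curve always at or below the whole-dataset curve.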

3 Conclusions

Phandango is an intuitive, user-friendly application that requires no installation or command line knowledge. It allows rapid viewing and interactive exploration of large genomic datasets and aids biological understanding of complex data through linking the output of multiple genomic analysis methods into a single, intuitive interface.

Funding

This work was supported by Wellcome Trust grant number 098051 awarded to the Wellcome Trust Sanger Institute. N.J.C. is funded by a Sir Henry Dale Fellowship, jointly funded by the Wellcome Trust and Royal Society (Grant Number 104169/Z/14/Z). D.M.A., R.J.G. and K.A. are funded through The Centre for Genomic Pathogen Surveillance and Wellcome Trust grant number 099202.

Conflict of Interest: none declared.
References

1.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

2.  Rapid pneumococcal evolution in response to clinical interventions.

Authors:  Nicholas J Croucher; Simon R Harris; Christophe Fraser; Michael A Quail; John Burton; Mark van der Linden; Lesley McGee; Anne von Gottberg; Jae Hoon Song; Kwan Soo Ko; Bruno Pichon; Stephen Baker; Christopher M Parry; Lotte M Lambertsen; Dea Shahinas; Dylan R Pillai; Timothy J Mitchell; Gordon Dougan; Alexander Tomasz; Keith P Klugman; Julian Parkhill; William P Hanage; Stephen D Bentley
Journal:  Science       Date:  2011-01-28       Impact factor: 47.728

3.  Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins.

Authors:  Nicholas J Croucher; Andrew J Page; Thomas R Connor; Aidan J Delaney; Jacqueline A Keane; Stephen D Bentley; Julian Parkhill; Simon R Harris
Journal:  Nucleic Acids Res       Date:  2014-11-20       Impact factor: 16.971

4.  Detection of recombination events in bacterial genomes from large population samples.

Authors:  Pekka Marttinen; William P Hanage; Nicholas J Croucher; Thomas R Connor; Simon R Harris; Stephen D Bentley; Jukka Corander
Journal:  Nucleic Acids Res       Date:  2011-11-07       Impact factor: 16.971

5.  Roary: rapid large-scale prokaryote pan genome analysis.

Authors:  Andrew J Page; Carla A Cummins; Martin Hunt; Vanessa K Wong; Sandra Reuter; Matthew T G Holden; Maria Fookes; Daniel Falush; Jacqueline A Keane; Julian Parkhill
Journal:  Bioinformatics       Date:  2015-07-20       Impact factor: 6.937

6.  Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes.

Authors:  John A Lees; Minna Vehkala; Niko Välimäki; Simon R Harris; Claire Chewapreecha; Nicholas J Croucher; Pekka Marttinen; Mark R Davies; Andrew C Steer; Steven Y C Tong; Antti Honkela; Julian Parkhill; Stephen D Bentley; Jukka Corander
Journal:  Nat Commun       Date:  2016-09-16       Impact factor: 14.919

7.  Microreact: visualizing and sharing data for genomic epidemiology and phylogeography.

Authors:  Silvia Argimón; Khalil Abudahab; Richard J E Goater; Artemij Fedosejev; Jyothish Bhai; Corinna Glasner; Edward J Feil; Matthew T G Holden; Corin A Yeats; Hajo Grundmann; Brian G Spratt; David M Aanensen
Journal:  Microb Genom       Date:  2016-11-30

8.  Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes.

Authors:  Claire Chewapreecha; Pekka Marttinen; Nicholas J Croucher; Susannah J Salter; Simon R Harris; Alison E Mather; William P Hanage; David Goldblatt; Francois H Nosten; Claudia Turner; Paul Turner; Stephen D Bentley; Julian Parkhill
Journal:  PLoS Genet       Date:  2014-08-07       Impact factor: 5.917
