
PanExplorer: A web-based tool for exploratory analysis and visualization of bacterial pan-genomes.

Alexis Dereeper, Marilyne Summo, Damien F Meyer.

Abstract

MOTIVATION: Pan-genome approaches are widely used for bacterial comparative genomics and evolutionary analyses, but they remain difficult for biologists without bioinformatics training to carry out. There is therefore a need for a tool that facilitates the exploration of bacterial pan-genomes.
RESULTS: PanExplorer is a web application providing various genomic analyses and reports, giving intuitive views that enable a better understanding of bacterial pan-genomes. As an example, we produced the pan-genome for 121 Anaplasmataceae strains (including 30 Ehrlichia, 15 Anaplasma, 68 Wolbachia).
AVAILABILITY AND IMPLEMENTATION: PanExplorer is written in Perl CGI and relies on several JavaScript libraries for visualization (hotmap.js, MauveViewer, CircosJS). It is freely available at http://panexplorer.southgreen.fr. The source code has been released in a GitHub repository at https://github.com/SouthGreenPlatform/PanExplorer. A documentation section is available on the PanExplorer website.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press.

Year:  2022        PMID: 35916725      PMCID: PMC9477528          DOI: 10.1093/bioinformatics/btac504

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


1 Introduction

In the past decade, the pan-genome concept has been widely used in bacterial comparative genomics and evolutionary analyses (Hyun ; Jonkheer ; Vernikos, 2020). Many programs have been developed for this purpose, such as Roary (Page ) or more recently PanACoTA (Perrin ), but there is still a need for efficient storage, exploitation and visualization of the data derived from pan-genome analyses. To address this need, we developed a web-based application, PanExplorer, which performs pan-genome analysis online and presents the resulting information in a comprehensive and easy-to-use form, through several modules that facilitate the exploration of gene clusters and the interpretation of data. Several web applications have recently been published for this purpose. PanX (Ding ) is a sophisticated web application for browsing clusters from pre-computed pan-genome analyses, but it does not allow matrix visualization, synteny exploration or genome import. On the other hand, PGAweb (Chen ), PGAPX (Zhao ) and PanWeb (Pantoja ) are web servers for running the PGAP software (Zhao ) online on users’ GenBank files, but their output files and images cannot be manipulated or browsed interactively. Thus, in contrast to the existing web-based tools for exploring pan-genomes, PanExplorer brings together in a single web application a whole set of data representation modules that allow pan-genomic information to be viewed from different angles. It also offers the possibility to submit online a customized selection of bacterial strains, provided their genomes are completely assembled and annotated.

2 The PanExplorer application

The application takes a list of GenBank identifiers as input, and the server successively executes: (i) retrieval of the corresponding published complete genome sequences and their annotations, (ii) gene clustering and pan-genome analysis using PGAP (Zhao ), Roary (Page ) or PanACoTA (Perrin ), (iii) attribution of the Clusters of Orthologous Groups (COG) functional categories using RPSblast against the COG database (Tatusov ) and (iv) GC content calculation using SkewIT (Lu ) (Fig. 1A). The PanExplorer application is developed in Perl CGI; web interactivity is provided by external JavaScript libraries: hotmap.js (https://github.com/nconrad/hotmap), MauveViewer (https://github.com/PATRIC3/mauve-viewer), CircosJS (https://github.com/nicgirault/circosJS), MSAViewer (Yachdav ), phylotree.js (Shank ) and D3 (Bostock ).
Fig. 1.

The PanExplorer application process. (A) Overview of the process: data upload, pre-processing using a PGAP/Roary/PanACoTA-based pipeline, data exploration. (B) After data processing, the application provides an overview of results, which includes the PAV matrix, the distribution of core and accessory genome and the global phylogeny. Exploratory analysis consists of Circos representations of core-genes and strain-specific genes, synteny investigation or visual inspection of clusters of interest.

3 Interactive data exploration

3.1 Pan-genome visualization

The pan-genome is displayed as a presence/absence variation (PAV) matrix (Fig. 1B) using the hotmap.js JavaScript library. This overview makes it easy to identify and distinguish core-genes (present in all strains), dispensable genes (genes from the accessory genome) and strain-specific genes. Furthermore, this section allows a particular cluster of interest to be selected for further investigation and displayed in detail (Fig. 1B). It also plots the distribution of COG functional categories for each analyzed genome and highlights functional categories that are potentially over- or under-represented in given strains. In addition, it lists in a table the COGs and COG categories assigned to each gene cluster of the core-genome.
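The distinction between core-genes, dispensable genes and strain-specific genes can be read directly off a PAV matrix. The following is a minimal Python sketch of that classification, not the PanExplorer implementation (which is written in Perl). The matrix layout (a dictionary mapping each cluster to per-strain counts), the function name `classify_clusters` and the example cluster and strain names are all assumptions made for illustration.

```python
from typing import Dict, List


def classify_clusters(pav: Dict[str, Dict[str, int]],
                      strains: List[str]) -> Dict[str, str]:
    """Label each gene cluster from its presence/absence pattern.

    pav maps cluster id -> {strain name -> gene count}; a count > 0 means
    the cluster is present in that strain (hypothetical layout).
    """
    labels = {}
    for cluster, row in pav.items():
        n_present = sum(1 for s in strains if row.get(s, 0) > 0)
        if n_present == 0:
            labels[cluster] = "absent"
        elif n_present == len(strains):
            labels[cluster] = "core"             # present in all strains
        elif n_present == 1:
            labels[cluster] = "strain-specific"  # present in exactly one strain
        else:
            labels[cluster] = "dispensable"      # accessory, shared by some strains
    return labels


# Toy matrix with three hypothetical strains and three clusters.
example_strains = ["strainA", "strainB", "strainC"]
example_pav = {
    "CL1": {"strainA": 1, "strainB": 1, "strainC": 1},
    "CL2": {"strainA": 1, "strainB": 0, "strainC": 1},
    "CL3": {"strainA": 0, "strainB": 1, "strainC": 0},
}
example_labels = classify_clusters(example_pav, example_strains)
```

In this toy example CL1 is labeled core, CL2 dispensable and CL3 strain-specific. When only one strain is supplied, every present cluster is labeled core, because the core test is applied first.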

3.2 Circos

The physical map of core-genes and strain-specific genes can be displayed as a circular genomic representation (Krzywinski ) (Fig. 1B), for each genome taken independently. Each gene is colored according to its COG category. The plot also shows the GC skew values, computed in sliding windows along the genome.
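As an illustration of what a sliding-window GC skew track contains, here is a minimal Python sketch using the usual definition, skew = (G - C) / (G + C) per window. It is not the SkewIT method cited above and not PanExplorer's code. The function name `gc_skew_windows` and the default window and step sizes are assumptions, and the sequence is treated as linear (windows do not wrap around a circular chromosome).

```python
from typing import List, Tuple


def gc_skew_windows(seq: str, window: int = 1000,
                    step: int = 1000) -> List[Tuple[int, float]]:
    """Return (window start, GC skew) pairs along a linear sequence.

    GC skew is (G - C) / (G + C); windows without any G or C get 0.0.
    Trailing bases that do not fill a complete window are ignored.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        values.append((start, skew))
    return values


# Tiny toy sequence: a G-rich, a C-rich and an AT-only window.
toy_skew = gc_skew_windows("GGGGCCCCAAAA", window=4, step=4)
```

For the toy sequence the three windows give skews of 1.0, -1.0 and 0.0. On a real bacterial chromosome the sign of the skew typically changes near the origin and terminus of replication, which is why the track is informative on a circular plot.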

3.3 Synteny

The conservation of gene order between genomes can be investigated using two graphical representations (Fig. 1B). A Hive Plot built with d3.js (Bostock ) gives a global macro-synteny overview, which makes it possible to detect rearrangements between strains (translocation, duplication or inversion), while a Mauve viewer (Darling ) allows zooming in down to the level of individual gene information. In both representations, each cluster of the core-genome is represented as a link between genomes. Comparison is possible after selecting three genomes among those available in the project.
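The idea of drawing each core-genome cluster as a link between genomes can be sketched as follows for a pair of genomes. This is an illustrative Python example under stated assumptions, not the data model used by PanExplorer: the gene tuples `(cluster_id, start, strand)`, the function name `synteny_links` and the toy coordinates are hypothetical, and each cluster is assumed to occur at most once per genome.

```python
from typing import List, Tuple

# (cluster id, start coordinate, strand "+" or "-"); hypothetical layout.
Gene = Tuple[str, int, str]
Link = Tuple[str, int, int, bool]


def synteny_links(genome_a: List[Gene], genome_b: List[Gene]) -> List[Link]:
    """Build one link per cluster found in both genomes.

    Each link is (cluster id, start in A, start in B, same_strand).
    A link with same_strand == False is a candidate inversion.
    """
    index_b = {cluster: (start, strand) for cluster, start, strand in genome_b}
    links = []
    for cluster, start_a, strand_a in genome_a:
        if cluster in index_b:
            start_b, strand_b = index_b[cluster]
            links.append((cluster, start_a, start_b, strand_a == strand_b))
    return links


# Toy example: CL2 and CL3 swap order in genome B, and CL2 flips strand.
toy_a = [("CL1", 100, "+"), ("CL2", 900, "+"), ("CL3", 2000, "-")]
toy_b = [("CL1", 150, "+"), ("CL3", 700, "-"), ("CL2", 1800, "-")]
toy_links = synteny_links(toy_a, toy_b)
candidate_inversions = [link[0] for link in toy_links if not link[3]]
```

Links whose positions cross when the two genomes are laid out side by side point to a change in gene order, and links joining opposite strands point to a possible inversion. A viewer then only has to draw these links.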

3.4 Visual inspection of a specific cluster

PanExplorer offers the possibility to focus on any cluster of the pan-genome and to access its composition in terms of protein or DNA sequences, their alignment using Muscle (Edgar ) and the subsequent gene-based phylogeny (NJ distance tree) or colorized minimum spanning network (Fig. 1B). This view can be reached from any cluster table, produced either by the cluster or gene search panel, or directly by clicking on a gene in the Circos or pan-genome matrix overview.

3.5 Dynamic reevaluation using a subset of genomes

The application allows the PAV matrix to be reduced to a subset of strains selected from the original collection of genomes (e.g. to focus on one species or one particular phenotype), which results in a different list of genes defined as core-genes.
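Why the core-gene list changes with the selected strains can be shown with a short Python sketch. It is an illustration only, with an assumed matrix layout (cluster id mapped to per-strain counts) and hypothetical names (`core_clusters`, strains E1, E2, W1, W2); it does not reproduce PanExplorer's own code.

```python
from typing import Dict, List


def core_clusters(pav: Dict[str, Dict[str, int]],
                  strains: List[str]) -> List[str]:
    """Return the sorted ids of clusters present in every selected strain."""
    if not strains:
        raise ValueError("at least one strain must be selected")
    return sorted(
        cluster for cluster, row in pav.items()
        if all(row.get(s, 0) > 0 for s in strains)
    )


# Toy PAV matrix: two hypothetical strains of one genus (E1, E2)
# and two of another (W1, W2).
toy_pav = {
    "CL1": {"E1": 1, "E2": 1, "W1": 1, "W2": 1},
    "CL2": {"E1": 1, "E2": 1, "W1": 0, "W2": 0},
    "CL3": {"E1": 0, "E2": 1, "W1": 1, "W2": 1},
}
core_all = core_clusters(toy_pav, ["E1", "E2", "W1", "W2"])
core_subset = core_clusters(toy_pav, ["E1", "E2"])
```

With all four strains only CL1 is core. Restricting the selection to E1 and E2 adds CL2 to the core list, because a cluster only has to be present in the strains that remain selected.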

4 Conclusions

PanExplorer offers a broad spectrum of tools that let scientists without programming skills explore pan-genomes easily (the process is automated and no installation is required). Thanks to a connection to the NCBI FTP server, the application guarantees up-to-date availability of public genomes, to be analyzed on-the-fly, and represents a versatile tool for genome exploration that fills a need in the bacteriology community. As further developments, new functionalities may be implemented, such as online pan-genome-wide association studies (pan-GWAS) based on the Scoary software (Brynildsrud ) or COG statistical enrichment analyses. By allowing visualization of high-dimensional data, PanExplorer can assist investigations of the pan-genome dynamics of various bacterial lifestyles. It efficiently combines in-depth search of individual genes and their mutations with broad examination of pan-genomes. With the routine publication of bacterial genomes, PanExplorer is a web application of choice to assist microbiological research aimed at better control of bacterial infectious diseases.
References

1.  Mauve: multiple alignment of conserved genomic sequence with rearrangements.

Authors:  Aaron C E Darling; Bob Mau; Frederick R Blattner; Nicole T Perna
Journal:  Genome Res       Date:  2004-07       Impact factor: 9.043

2.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19       Impact factor: 16.971

3.  D³: Data-Driven Documents.

Authors:  Michael Bostock; Vadim Ogievetsky; Jeffrey Heer
Journal:  IEEE Trans Vis Comput Graph       Date:  2011-12       Impact factor: 4.579

4.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

5.  PGAP: pan-genomes analysis pipeline.

Authors:  Yongbing Zhao; Jiayan Wu; Junhui Yang; Shixiang Sun; Jingfa Xiao; Jun Yu
Journal:  Bioinformatics       Date:  2011-11-29       Impact factor: 6.937

6.  Roary: rapid large-scale prokaryote pan genome analysis.

Authors:  Andrew J Page; Carla A Cummins; Martin Hunt; Vanessa K Wong; Sandra Reuter; Matthew T G Holden; Maria Fookes; Daniel Falush; Jacqueline A Keane; Julian Parkhill
Journal:  Bioinformatics       Date:  2015-07-20       Impact factor: 6.937

7.  MSAViewer: interactive JavaScript visualization of multiple sequence alignments.

Authors:  Guy Yachdav; Sebastian Wilzbach; Benedikt Rauscher; Robert Sheridan; Ian Sillitoe; James Procter; Suzanna E Lewis; Burkhard Rost; Tatyana Goldberg
Journal:  Bioinformatics       Date:  2016-07-13       Impact factor: 6.937

8.  SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes.

Authors:  Jennifer Lu; Steven L Salzberg
Journal:  PLoS Comput Biol       Date:  2020-12-04       Impact factor: 4.475

9.  Comparative pangenomics: analysis of 12 microbial pathogen pangenomes reveals conserved global structures of genetic and functional diversity.

Authors:  Jason C Hyun; Jonathan M Monk; Bernhard O Palsson
Journal:  BMC Genomics       Date:  2022-01-04       Impact factor: 3.969

10.  PGAweb: A Web Server for Bacterial Pan-Genome Analysis.

Authors:  Xinyu Chen; Yadong Zhang; Zhewen Zhang; Yongbing Zhao; Chen Sun; Ming Yang; Jinyue Wang; Qian Liu; Baohua Zhang; Meili Chen; Jun Yu; Jiayan Wu; Zhong Jin; Jingfa Xiao
Journal:  Front Microbiol       Date:  2018-08-21       Impact factor: 5.640
