Ignacio Ferrés, Gregorio Iraola.
Abstract
Pangenome analysis is fundamental to exploring the molecular evolution occurring in bacterial populations. Here, we introduce Pagoo, an R framework that enables straightforward handling of pangenome data. The encapsulated nature of Pagoo allows the storage of complex molecular and phenotypic information using an object-oriented approach. This makes it easy to move back and forth through the data within a single programming environment and to save any stage of the analysis (including the raw data) in a single file, making it sharable and reproducible. Pagoo provides tools to query, subset, compare, visualize, and perform statistical analyses, in concert with other microbial genomics packages available in the R ecosystem. As working examples, we used 1,000 Escherichia coli genomes to show that Pagoo is scalable, and a global dataset of Campylobacter fetus genomes to identify evolutionary patterns and genomic markers of host adaptation in this pathogen.
Keywords: R; bacterial comparative genomics; bacterial evolution; data visualization; object-oriented programming; pangenome analysis; pangenome reconstruction
Year: 2021 PMID: 35474671 PMCID: PMC9017228 DOI: 10.1016/j.crmeth.2021.100085
Source DB: PubMed Journal: Cell Rep Methods ISSN: 2667-2375
Figure 1. Framework and overall design of Pagoo
(A) Example of the relational structure implemented to store, link, and operate over different pangenome data types.
(B) General description of the workflow from assembled genomes to Pagoo analysis. Once pangenome files have been created with any available pangenome reconstruction software, they can be loaded to create the Pagoo object. Specific R6 classes store and manage the different data types; the resulting object can be saved with all its information in a single file or used for comparative analyses through the R console interface or the Pagoo Shiny application.
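The relational idea in panel (A) can be illustrated with a small sketch. It is written in Python rather than R and is not Pagoo's API: the class `MiniPangenome`, its methods, and the toy gene, cluster, and organism names are all hypothetical. It only shows how a gene table keyed by organism and cluster lets a derived presence/absence matrix stay consistent when an organism is hidden from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MiniPangenome:
    """Toy container linking genes, clusters, and organisms (hypothetical, not Pagoo's API)."""
    genes: list      # rows of (gene_id, organism, cluster)
    org_meta: dict   # organism -> metadata, e.g. {"host": "bovine"}
    dropped: set = field(default_factory=set)  # organisms currently hidden

    def organisms(self):
        """Organisms that are currently visible, in sorted order."""
        return sorted(o for o in self.org_meta if o not in self.dropped)

    def pan_matrix(self):
        """Gene counts per organism (rows) and cluster (columns), as nested dicts."""
        counts = {o: defaultdict(int) for o in self.organisms()}
        for _gene_id, org, cluster in self.genes:
            if org in counts:
                counts[org][cluster] += 1
        # Only clusters observed in the visible organisms become columns.
        clusters = sorted({c for row in counts.values() for c in row})
        return {o: {c: counts[o][c] for c in clusters} for o in counts}

    def drop(self, organism):
        """Hide an organism; every derived view excludes it from now on."""
        self.dropped.add(organism)

    def recover(self, organism):
        """Make a previously hidden organism visible again."""
        self.dropped.discard(organism)


pg = MiniPangenome(
    genes=[
        ("g1", "orgA", "clu1"), ("g2", "orgA", "clu2"), ("g3", "orgA", "clu2"),
        ("g4", "orgB", "clu1"), ("g5", "orgC", "clu1"), ("g6", "orgC", "clu3"),
    ],
    org_meta={
        "orgA": {"host": "bovine"},
        "orgB": {"host": "human"},
        "orgC": {"host": "bovine"},
    },
)
full = pg.pan_matrix()    # three organisms, three clusters
pg.drop("orgC")
subset = pg.pan_matrix()  # orgC and its private cluster clu3 disappear
```

Because the matrix is recomputed from the linked tables instead of being stored separately, hiding an organism cannot leave the derived views out of sync with the underlying gene records.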
Figure 2. Results extracted from the pangenome object
Exploration of the C. fetus pangenome using information directly extracted from the pangenome object and customized esthetics.
(A) Pangenome and core genome curves, with gray circles representing different sub-samples at increasing numbers of genomes; the black lines show the fits to power-law and exponential decay functions, respectively.
(B) The distribution of genes in different subsets of genomes.
(C) The distribution of the pangenome in core genes and accessory genes (shell and cloud genes).
(D) A principal components analysis generated from the gene presence/absence matrix whose first principal component (PC1) clearly distinguishes between two groups of genomes, one of mostly bovine-derived strains (right), and the other with various hosts (left).
(E) A heatmap (right) of gene count (paralog) abundance, with genes in columns and organisms in rows, aligned to a phylogenetic tree (left) inferred from a concatenated alignment of core genes. Between the tree and the heatmap, the color bars indicate, from left to right, the host associated with each isolate and three different depth levels of subpopulations inferred by the Rhierbaps package.
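The quantities behind panels (A) and (C) can be sketched from a presence/absence table alone. The Python below is an illustration under stated assumptions, not Pagoo's implementation: the function names, the toy data, and the `core_level` and `cloud_level` thresholds are hypothetical, and the core genome in the curves is taken as the strict intersection of the sampled genomes. Curve fitting (power law, exponential decay) is omitted.

```python
import random


def classify_clusters(presence, core_level=0.95, cloud_level=0.15):
    """Split clusters into core, shell, and cloud by the fraction of genomes carrying them.

    `presence` maps cluster -> set of organisms. Both thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    n = len({o for orgs in presence.values() for o in orgs})
    groups = {"core": [], "shell": [], "cloud": []}
    for cluster, orgs in sorted(presence.items()):
        freq = len(orgs) / n
        if freq >= core_level:
            groups["core"].append(cluster)
        elif freq <= cloud_level:
            groups["cloud"].append(cluster)
        else:
            groups["shell"].append(cluster)
    return groups


def pan_core_curves(presence, n_perm=10, seed=0):
    """Pangenome and core genome sizes for random sub-samples of 1..N genomes."""
    rng = random.Random(seed)
    organisms = sorted({o for orgs in presence.values() for o in orgs})
    by_org = {o: {c for c, orgs in presence.items() if o in orgs} for o in organisms}
    points = []  # (number of genomes, pangenome size, core genome size)
    for _ in range(n_perm):
        order = organisms[:]
        rng.shuffle(order)  # one random order of genome addition
        pan, core = set(), None
        for k, org in enumerate(order, start=1):
            pan |= by_org[org]  # union grows: pangenome
            core = set(by_org[org]) if core is None else core & by_org[org]  # intersection shrinks: core
            points.append((k, len(pan), len(core)))
    return points


presence = {
    "clu1": {"A", "B", "C", "D"},
    "clu2": {"A", "B", "C", "D"},
    "clu3": {"A", "B"},
    "clu4": {"D"},
}
groups = classify_clusters(presence, core_level=0.95, cloud_level=0.25)
curves = pan_core_curves(presence, n_perm=5)
```

Each tuple in `curves` corresponds to one gray circle in a plot like panel (A): the pangenome size can only grow as genomes are added, while the core genome size can only shrink.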
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Pagoo | This paper | |
| Prokka | | |
| ggplot2 | | |
| DECIPHER | | |
| Phangorn | | |
| Rhierbaps | | |
| Ape | | |
| Roary | | |
| Scripts to run all the analysis | This paper | |
| Singularity | | |
| Singularity container hosted at singularity-hub | | |