
ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses.

Natasha Pavlovikj, Joao Carlos Gomes-Neto, Jitender S Deogun, Andrew K Benson.

Abstract

Whole Genome Sequence (WGS) data from bacterial species is used for a variety of applications ranging from basic microbiological research to diagnostics and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species presents a tremendous opportunity for discovery and hypothesis-generating research into the ecology and evolution of these microorganisms. However, the limited flexibility, scalability, and user-friendliness of existing pipelines for population-scale inquiry restrict the application of systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, reproducible, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: (1) Automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; (2) Use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault-tolerance, and robust file management throughout the process; (3) Use of high-performance and high-throughput computational platforms; (4) Generation of hierarchical-based population structure analyses based on combinations of multi-locus and Bayesian statistical approaches to classification, for ecological and epidemiological inquiries; (5) Association of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases with the hierarchically-related genotypic classifications; and (6) Production of pan-genome annotations and data compilation that can be utilized for downstream analyses such as identification of population-specific genomic signatures. The scalability of ProkEvo was measured with two datasets comprising markedly different numbers of input genomes (one with ~2,400 genomes, and the second with ~23,000 genomes).
Depending on the dataset and the computational platform used, the running time of ProkEvo varied from approximately 3 to 26 days. ProkEvo can be used with virtually any bacterial species, and the Pegasus WMS uniquely facilitates addition or removal of programs from the workflow or modification of options within them. To demonstrate the versatility of the ProkEvo platform, we performed hierarchical-based population structure analyses on available genomes of three distinct pathogenic bacterial species as individual case studies. The specific case studies illustrate how hierarchical analyses of population structures, genotype frequencies, and distribution of specific gene functions can be integrated into an analysis. Collectively, our study shows that ProkEvo is a practical, viable option for scalable, automated analyses of bacterial populations with direct applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.
© 2021 Pavlovikj et al.

Keywords:  Bacteria; High-performance computing; High-throughput computing; Pan-genome; Pipeline; Population-genomics; Scalability; Workflow-management system

Year: 2021; PMID: 34055480; PMCID: PMC8142932; DOI: 10.7717/peerj.11376

Source DB: PubMed; Journal: PeerJ; ISSN: 2167-8359; Impact factor: 2.984


