
BacWGSTdb 2.0: a one-stop repository for bacterial whole-genome sequence typing and source tracking.

Ye Feng1,2, Shengmei Zou2, Hangfei Chen1, Yunsong Yu1, Zhi Ruan1.   

Abstract

An increasing prevalence of hospital acquired infections and foodborne illnesses caused by pathogenic and multidrug-resistant bacteria has stimulated a pressing need for benchtop computational techniques to rapidly and accurately classify bacteria from genomic sequence data, and based on that, to trace the source of infection. BacWGSTdb (http://bacdb.org/BacWGSTdb) is a free publicly accessible database we have developed for bacterial whole-genome sequence typing and source tracking. This database incorporates extensive resources for bacterial genome sequencing data and the corresponding metadata, combined with specialized bioinformatics tools that enable the systematic characterization of the bacterial isolates recovered from infections. Here, we present BacWGSTdb 2.0, which encompasses several major updates, including (i) the integration of the core genome multi-locus sequence typing (cgMLST) approach, which is highly scalable and appropriate for typing isolates belonging to different lineages; (ii) the addition of a multiple genome analysis module that can process dozens of user uploaded sequences in a batch mode; (iii) a new source tracking module for comparing user uploaded plasmid sequences to those deposited in the public databases; (iv) the number of species encompassed in BacWGSTdb 2.0 has increased from 9 to 20, which represents bacterial pathogens of medical importance; (v) a newly designed, user-friendly interface and a set of visualization tools for providing a convenient platform for users are also included. Overall, the updated BacWGSTdb 2.0 bears great utility in continuing to provide users, including epidemiologists, clinicians and bench scientists, with a one-stop solution to bacterial genome sequence analysis.
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2021        PMID: 33010178      PMCID: PMC7778894          DOI: 10.1093/nar/gkaa821

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The history of the world is intertwined with the impact that bacterial infectious diseases have had on humans. The advent of antimicrobials fostered the belief that, as Sir Macfarlane Burnet stated in 1962, ‘Almost all of the major practical problems of dealing with infectious disease had been solved’ (1). However, this belief soon gave way to disillusionment with the emergence and rapid dissemination of antimicrobial resistance (AMR). In particular, in the current era of globalization, increasing international travel and food transportation have led to cross-border bacterial transmission events and even pandemics (2). A well-known example from the past decade is the large outbreak caused by Shiga-toxin-producing Escherichia coli O104:H4, which started in Germany in the summer of 2011 with the consumption of sprouts and spread within only two months to at least 16 countries (3). Although relatively rare, such modes of transmission also occur with hospital-acquired infections. For example, Acinetobacter baumannii, a notorious multidrug-resistant nosocomial bacterium, is believed to have initially infected American soldiers in Iraq and to have been brought back by them to military hospitals in the United States, after which it disseminated rapidly throughout the nation (4). Given the potential for cross-border transmission of pathogenic bacteria and the consequent threat to global public health, there is a pressing need for global surveillance and the early detection of infectious disease outbreaks. Traditional epidemiological studies have usually focused initially on patients with certain epidemiological links, recovered bacterial isolates from these subjects’ clinical samples and finally determined whether their isolates had clonal relationships through bacterial typing techniques (5). However, such suspected links are often missing due to untimely or incomplete epidemiological surveys or asymptomatic carriers (6).
Therefore, it is imperative to establish a reverse strategy, i.e. when bacterial isolates are found to be sufficiently similar by high-resolution typing techniques, they are deemed to result from infection from a common source (7). Owing to its ultimate single-base-pair resolution, whole-genome sequencing (WGS) has met this demand and gradually replaced pulsed-field gel electrophoresis (PFGE) and conventional seven-locus multi-locus sequence typing (MLST) as the new ‘gold standard’ typing technique (8,9). Concomitantly, there is a pressing need to organize, standardize and share bacterial genome sequences in a globally accessible database in order to exploit the full power of this reverse strategy for international surveillance and early outbreak detection of bacterial infections. In 2015, we introduced BacWGSTdb, a bacterial whole-genome sequence typing and source tracking database designed for nine bacterial species of medical importance (10). This database incorporates extensive resources from bacterial genome sequence data as well as the corresponding metadata retrieved from the NCBI GenBank and BioSample databases (11,12). By implementing a reference genome-based single-nucleotide polymorphism (SNP) approach, BacWGSTdb provides instant comparisons between user-uploaded genomes and an unprecedentedly large global set of genomes. Thus, clinicians, microbiologists and epidemiologists who work in medical facilities or public health institutions and have no specialist knowledge of bioinformatics can use the database to determine clonal relationships and track sources. Bench scientists can also use BacWGSTdb as a convenient tool for preliminary evolutionary and comparative genomic analyses. Since the first version of BacWGSTdb, a vast number of additional genomic sequences have been determined experimentally and, more importantly, a variety of newly discovered phenotypic traits (e.g. AMR and virulence) can now be associated with the WGS data in the public domain (13,14). The demand for database updates and for a uniform platform for real-time in silico prediction of these phenotypic traits from genomic sequence data has become urgent. Here, we have updated BacWGSTdb to version 2.0, which reflects not only a large increase in the curated dataset of bacterial genome sequencing data for the existing bacterial species, but also the addition of eleven common pathogenic species. In addition to its updated content, a novel module for tracking the source of newly sequenced plasmids has been incorporated. AMR has complicated the treatment of infectious diseases, which pose a serious threat to public health. As the major vectors carrying AMR genes, plasmids are prone to horizontal transfer and confer AMR on originally antimicrobial-susceptible bacteria, making treatment even more difficult. In this sense, tracing the transmission of AMR-carrying plasmids is as important as tracing that of bacteria. Furthermore, the phylogenetic analysis in BacWGSTdb 2.0 is enhanced by the addition of the core genome MLST (cgMLST) approach, thereby meeting different genotyping demands. Taken together, we expect that BacWGSTdb 2.0 will continue to benefit users by providing a one-stop repository for bacterial whole-genome sequence typing and source tracking.

DATABASE UPDATE AND ENHANCEMENTS

BacWGSTdb 1.0 included nine bacterial species; in this update, eleven more species have been added. The species and the number of isolates included in BacWGSTdb are listed in Table 1. BacWGSTdb thus now covers almost all common nosocomial, community-acquired and foodborne bacterial pathogens.
Table 1.

Name of species and number of isolates in BacWGSTdb

Species                      Version 1  Version 2  Number of isolates (Version 1)  Number of isolates (Version 2)
Acinetobacter baumannii      Yes        Yes        1026                            4260
Bacillus anthracis           Yes        Yes        158                             227
Bacillus cereus              No         Yes        -                               1017
Campylobacter coli           No         Yes        -                               820
Campylobacter jejuni         No         Yes        -                               1621
Clostridioides difficile     No         Yes        -                               2322
Enterococcus faecalis        No         Yes        -                               1523
Enterococcus faecium         No         Yes        -                               1885
Escherichia coli             Yes        Yes        6328                            21733
Klebsiella pneumoniae        Yes        Yes        3279                            9020
Listeria monocytogenes       No         Yes        -                               3243
Mycobacterium abscessus      No         Yes        -                               1611
Mycobacterium tuberculosis   Yes        Yes        2532                            6512
Salmonella enterica          Yes        Yes        5276                            14658
Staphylococcus aureus        Yes        Yes        4890                            11505
Streptococcus agalactiae     No         Yes        -                               1398
Streptococcus pneumoniae     Yes        Yes        3018                            8347
Streptococcus suis           No         Yes        -                               1252
Vibrio cholerae              No         Yes        -                               818
Yersinia pestis              Yes        Yes        257                             369
BacWGSTdb is organized into two function modules, ‘Tools’ and ‘Browse’. The former allows users to upload their own FASTA-formatted genome sequence(s) to find closely related records in the database. The latter allows users to browse the isolates in the database, compare their clinical and microbiological traits, and investigate their phylogenetic relationships. With the release of BacWGSTdb 2.0, we are introducing multiple major changes in both the backend and the web interface that provide users with more effective and efficient ways to browse and query the WGS data and to quickly cluster and identify related sequences for uncovering potential transmission sources. This helps clinicians and public health scientists investigate hospital-acquired or foodborne infection outbreaks. In addition, BacWGSTdb 2.0 provides online analysis of conventional seven-locus MLST, prediction of AMR and virulence genes, and source tracking of plasmid sequences. An overview of the database schematic is shown in Figure 1.
Figure 1.

Overview of the content and function modules of BacWGSTdb 2.0. ‘Infrastructure’ lists the public database and tools integrated in BacWGSTdb 2.0. ‘Browse’ functions to visualize and compare the genetic relationships among isolates deposited in BacWGSTdb 2.0. ‘Tools’ functions for whole genome sequence typing and source tracking based on user uploaded sequence(s). ‘Species’ represents 20 bacterial species currently supported by BacWGSTdb 2.0.


Single genome analysis

In BacWGSTdb 1.0, the Single Genome Analysis module performed conventional seven-locus MLST analysis on the user's uploaded preassembled genome and searched the database for its genetically closest relatives using the SNP approach. In detail, MUMmer 3.22 is used for alignment with the reference genome and subsequent SNP identification, and the phylogenetic tree is generated by Clearcut 1.0 (15,16). The most important update to this module in BacWGSTdb 2.0 is that it now offers both SNP and cgMLST analysis to compare the user-uploaded genome sequence against those deposited in the database. The SNP approach identifies single-nucleotide differences of each isolate relative to a designated reference genome and can be used to investigate the clonal relationship among isolates sharing high genetic relatedness (e.g. collected from outbreaks of hospital or foodborne infections). By comparison, the cgMLST approach, another widely used sequence typing approach for bacterial genomes, extends the conventional seven-locus MLST scheme by expanding the range of target genes to the whole-genome level. It is often used to resolve the detailed phylogenetic relatedness within a species and is suitable for investigating the middle- to long-term evolutionary history of bacterial pathogens. Thus, the two approaches are complementary and meet different genotyping demands (17). To perform a cgMLST analysis, a pre-defined reference database (cgMLST scheme) for each species, which contains all known allelic variants in the coding regions for all genomes deposited in BacWGSTdb, is prepared at the backend of BacWGSTdb 2.0 by LOCUST 1.0 (18). The user-uploaded genomic sequence is compared to the species-specific cgMLST scheme using BLASTn (19). The resulting allelic profile is then compared with that of each isolate deposited in BacWGSTdb 2.0 to determine the closely related isolates according to the number of pairwise allelic differences.
GrapeTree is used to construct and visualize the minimum spanning tree based on the cgMLST allelic profiles, and supports manipulation of both the tree layout and user-specified metadata attributes (20) (Figure 2).
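The comparison of cgMLST allelic profiles described above can be illustrated with a minimal Python sketch. It is not the BacWGSTdb implementation: the profile representation, the handling of uncalled loci (skipped here), the function names and the `max_diff` cutoff are all assumptions made for illustration, and the locus names, allele numbers and isolate names are invented.

```python
from typing import Dict, List, Optional, Tuple

# An allelic profile maps each core-genome locus to an allele number.
# None marks a locus that could not be called in that genome (assumption).
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    shared = a.keys() & b.keys()
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def closest_isolates(
    query: Profile, database: Dict[str, Profile], max_diff: int
) -> List[Tuple[str, int]]:
    """Return (isolate, distance) pairs within max_diff, nearest first."""
    hits = [(name, allelic_distance(query, prof)) for name, prof in database.items()]
    return sorted((h for h in hits if h[1] <= max_diff), key=lambda h: (h[1], h[0]))


# Toy data: every name and number below is made up.
query = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": None}
database = {
    "isolate_A": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "isolate_B": {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7},
    "isolate_C": {"locus1": 3, "locus2": 5, "locus3": 9, "locus4": 7},
}
result = closest_isolates(query, database, max_diff=1)
print(result)
```

Skipping uncalled loci is only one possible convention; counting them as differences would make draft assemblies with missing genes appear more distant.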
Figure 2.

Screenshots of the updated web interface of the Single Genome Analysis module and the detailed outputs using a Salmonella enterica isolate as an example. After the preassembled genomic sequence is uploaded and the appropriate parameters are set, the analytical results are returned in three sections: conventional seven-locus MLST, identification of AMR and virulence genes, and source tracking of plasmids and bacteria. In particular, the phylogenetic analysis revealed that the query isolate shows a high degree of genetic relatedness to isolates in the database, suggesting that these isolates might have originated from the same source. The entire analysis process takes 3–5 min.

Another important update in this module is the integration of a novel function for typing and source tracking of plasmids. BacWGSTdb 2.0 stores all complete plasmid sequences from the NCBI GenBank database together with their metadata. When a draft bacterial genome, which usually contains the plasmid sequences of the sequenced isolate, is uploaded, its plasmid replicon types are determined by searching the genomic sequence against the database of plasmid replicon genes with BLASTn (21). Moreover, pairwise distances between the user's sequence and each of the plasmids in BacWGSTdb are computed using Mash 2.2 with the maximal P-value set to 0.1 and the minimal identity set to 0.9 (22). The analytical results include a table listing the records of similar plasmids based on Mash distance, together with a world map displaying the records with available geographical location data (Figure 2). This update makes it possible to trace the transmission routes of plasmids, which we believe is at least as important as tracing those of chromosomes. Furthermore, the in silico identification of acquired AMR and virulence genes is also incorporated into BacWGSTdb 2.0.
The user-uploaded sequence is searched against the ResFinder 3.2 and VFDB 2019 databases by BLASTn, with a minimal identity and a threshold coverage of 90% (23–25).
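As a rough illustration of the identity and coverage filtering applied to BLASTn hits, the sketch below parses tab-separated hits and keeps those passing both thresholds. Several points are assumptions rather than details from the paper: the column layout (as would be produced by a BLAST+ call with `-outfmt "6 qseqid sseqid pident length slen"`), the definition of coverage as aligned length over reference gene length, the reading of 90% as applying to both identity and coverage, and the hypothetical gene and contig names.

```python
from typing import Dict, List, Union

Hit = Dict[str, Union[str, float]]


def parse_hits(tabular: str) -> List[Hit]:
    """Parse tab-separated rows: qseqid, sseqid, pident, length, slen (assumed layout)."""
    hits: List[Hit] = []
    for line in tabular.strip().splitlines():
        _query, gene, pident, length, slen = line.split("\t")
        # Coverage: aligned length as a percentage of the reference gene length,
        # capped at 100 because gapped alignments can be longer than the gene.
        coverage = min(100.0, 100.0 * int(length) / int(slen))
        hits.append({"gene": gene, "identity": float(pident), "coverage": coverage})
    return hits


def filter_hits(
    hits: List[Hit], min_identity: float = 90.0, min_coverage: float = 90.0
) -> List[Hit]:
    """Keep hits that meet both the identity and the coverage threshold."""
    return [
        h for h in hits
        if h["identity"] >= min_identity and h["coverage"] >= min_coverage
    ]


# Toy hits with invented contig and gene names.
blast_table = "\n".join([
    "contig1\tgeneA\t99.5\t860\t861",    # high identity, nearly full length
    "contig2\tgeneB\t95.0\t400\t900",    # high identity, partial gene only
    "contig3\tgeneC\t85.0\t1200\t1200",  # full length, low identity
])
kept = [h["gene"] for h in filter_hits(parse_hits(blast_table))]
print(kept)
```

Requiring both thresholds guards against reporting a gene on the basis of a short conserved fragment or a distant homologue.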

Multiple genome analysis

The newly designed multiple genome analysis module can process up to 30 user uploaded genome sequences in a batch mode. In silico identifications of conventional seven-locus MLST, AMR and virulence genes are performed for each of the uploaded genomes. A more important purpose for this module is that when users collect multiple isolates which they suspect belong to the same outbreak event, the module can help determine whether there is a clonal relationship among these isolates according to the pairwise comparison of the cgMLST alleles or SNP differences. To this end, the phylogenetic relatedness among the user uploaded multiple genomic sequences will be determined following both the SNP and cgMLST approaches (Figure 3).
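The pairwise comparison underlying this module can be sketched as follows for the SNP approach. This is an illustrative assumption about the data layout, not the database's code: each isolate is represented by its variant calls against a shared reference genome, positions without a call are taken to carry the reference base, and the `threshold` used to flag closely related pairs is a hypothetical parameter that would need to be chosen per species and setting.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Variant calls against a shared reference: position -> variant base.
# A position missing from the dict is assumed to carry the reference base.
SnpCalls = Dict[int, str]


def snp_distance(a: SnpCalls, b: SnpCalls) -> int:
    """Count reference positions at which two isolates carry different bases."""
    positions = a.keys() | b.keys()
    return sum(1 for pos in positions if a.get(pos) != b.get(pos))


def pairwise_distances(isolates: Dict[str, SnpCalls]) -> Dict[Tuple[str, str], int]:
    """Compute the SNP distance for every unordered pair of isolates."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


def closely_related(
    distances: Dict[Tuple[str, str], int], threshold: int
) -> List[Tuple[str, str]]:
    """List pairs whose distance does not exceed the (hypothetical) threshold."""
    return [pair for pair, d in distances.items() if d <= threshold]


# Toy data: isolate names, positions and bases are invented.
isolates = {
    "iso1": {101: "A", 2050: "T", 30400: "G"},
    "iso2": {101: "A", 2050: "T"},
    "iso3": {77: "C", 101: "G", 5000: "T", 9000: "A"},
}
distances = pairwise_distances(isolates)
pairs = closely_related(distances, threshold=2)
print(distances, pairs)
```

The same pairwise structure applies to cgMLST, with allele differences per locus taking the place of base differences per position.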
Figure 3.

Screenshots of the updated web interface of the Multiple Genome Analysis module and the detailed outputs using nine S. enterica isolates as an example. The Results page lists, for each of the uploaded genomes, the conventional seven-locus MLST results and the predictions of AMR and virulence genes. In addition, the phylogenetic trees based on the SNP and cgMLST approaches both indicate that the query isolates were not part of a single outbreak event, since they differ from one another by over 100 SNPs or cgMLST loci. The entire analysis process takes 5–8 min.


Browse and Search

The Browse and Search module is designed to visualize and compare the genetic relationships among isolates deposited in BacWGSTdb. Each ‘Browse’ page lists the clinical and microbiological metadata of each isolate deposited in the database, including its seven-locus MLST sequence type, host, clinical outcome, collection date and geographical location. Data can be sorted by clicking on a column header and downloaded as a tab-delimited file. The updated annotations of the AMR and virulence genes for each isolate are also displayed. In BacWGSTdb 1.0, users could only select multiple isolates belonging to the same sequence type and build phylogenetic trees using the SNP approach. This restriction has been lifted in BacWGSTdb 2.0: users can select and compare any isolates of interest, even those belonging to different sequence types. Both the SNP and cgMLST approaches are applied to construct the phylogenetic trees. In addition, a newly designed search function enables users to look up keywords of interest (e.g. sequence types) across various categories (Figure 4). We therefore believe that the updated Browse and Search module will allow users to retrieve information from the database in a convenient and time-saving manner.
Figure 4.

Screenshots of the updated Browse interface. Users can sort and select isolates based on their various attributes. For the selected isolates, phylogenetic trees following both the SNP and cgMLST approaches are provided.


CONCLUSIONS AND FUTURE PERSPECTIVES

Next-generation sequencing and bioinformatics are expediting pathogen characterization, transforming the response to infectious disease outbreaks, and providing new insights into disease emergence and transmission. Standardized and user-friendly online databases make WGS analysis more accessible, even to those lacking bioinformatics expertise. Here, we reported a major update of BacWGSTdb, including the significantly expanded content of the database, additional analytical and visualization tools, and a newly designed, user-friendly interface, all of which greatly facilitate the genomic epidemiological surveillance of bacterial pathogens. We believe these additions significantly enrich our database, which is expected to provide a one-stop solution to bacterial genome analysis and, more importantly, translate whole genome sequencing from proof-of-concept to routine use in clinical practice.
  25 in total

1.  An outbreak of multidrug-resistant Acinetobacter baumannii-calcoaceticus complex infection in the US military health care system associated with military operations in Iraq.

Authors:  Paul Scott; Gregory Deye; Arjun Srinivasan; Clinton Murray; Kimberly Moran; Ed Hulten; Joel Fishbain; David Craft; Scott Riddell; Luther Lindler; James Mancuso; Eric Milstrey; Christian T Bautista; Jean Patel; Alessa Ewell; Tacita Hamilton; Charla Gaddy; Martin Tenney; George Christopher; Kyle Petersen; Timothy Endy; Bruno Petruccelli
Journal:  Clin Infect Dis       Date:  2007-05-08       Impact factor: 9.079

2.  The global dissemination of bacterial infections necessitates the study of reverse genomic epidemiology.

Authors:  Zhi Ruan; Yunsong Yu; Ye Feng
Journal:  Brief Bioinform       Date:  2020-03-23       Impact factor: 11.622

3.  GrapeTree: visualization of core genomic relationships among 100,000 bacterial pathogens.

Authors:  Zhemin Zhou; Nabil-Fareed Alikhan; Martin J Sergeant; Nina Luhmann; Cátia Vaz; Alexandre P Francisco; João André Carriço; Mark Achtman
Journal:  Genome Res       Date:  2018-07-26       Impact factor: 9.043

4.  Versatile and open software for comparing large genomes.

Authors:  Stefan Kurtz; Adam Phillippy; Arthur L Delcher; Michael Smoot; Martin Shumway; Corina Antonescu; Steven L Salzberg
Journal:  Genome Biol       Date:  2004-01-30       Impact factor: 13.583

Review 5.  Defining and combating antibiotic resistance from One Health and Global Health perspectives.

Authors:  Sara Hernando-Amado; Teresa M Coque; Fernando Baquero; José L Martínez
Journal:  Nat Microbiol       Date:  2019-08-22       Impact factor: 17.745

6.  LOCUST: a custom sequence locus typer for classifying microbial isolates.

Authors:  Lauren M Brinkac; Erin Beck; Jason Inman; Pratap Venepally; Derrick E Fouts; Granger Sutton
Journal:  Bioinformatics       Date:  2017-06-01       Impact factor: 6.937

7.  Identification of acquired antimicrobial resistance genes.

Authors:  Ea Zankari; Henrik Hasman; Salvatore Cosentino; Martin Vestergaard; Simon Rasmussen; Ole Lund; Frank M Aarestrup; Mette Voldby Larsen
Journal:  J Antimicrob Chemother       Date:  2012-07-10       Impact factor: 5.790

8.  Escherichia coli O104:H4 infections and international travel.

Authors:  David C Alexander; Weilong Hao; Matthew W Gilmour; Sandra Zittermann; Alicia Sarabia; Roberto G Melano; Analyn Peralta; Marina Lombos; Keisha Warren; Yuri Amatnieks; Evangeline Virey; Jennifer H Ma; Frances B Jamieson; Donald E Low; Vanessa G Allen
Journal:  Emerg Infect Dis       Date:  2012-03       Impact factor: 6.883

9.  BacWGSTdb, a database for genotyping and source tracking bacterial pathogens.

Authors:  Zhi Ruan; Ye Feng
Journal:  Nucleic Acids Res       Date:  2015-10-03       Impact factor: 16.971

10.  Predicting Phenotypic Polymyxin Resistance in Klebsiella pneumoniae through Machine Learning Analysis of Genomic Data.

Authors:  Nenad Macesic; Oliver J Bear Don't Walk; Itsik Pe'er; Nicholas P Tatonetti; Anton Y Peleg; Anne-Catrin Uhlemann
Journal:  mSystems       Date:  2020-05-26       Impact factor: 6.496
