xBASE2: a comprehensive resource for comparative bacterial genomics.

Roy R Chaudhuri, Nicholas J Loman, Lori A S Snyder, Christopher M Bailey, Dov J Stekel, Mark J Pallen.

Abstract

xBASE is a genome database aimed at helping laboratory-based bacteriologists make best use of bacterial genome sequence data, with a particular emphasis on comparative genomics. The latest version, xBASE 2.0 (http://xbase.bham.ac.uk), now provides comprehensive coverage of all bacterial genomes and features an updated modularized backend and an improved user interface, which includes a taxonomy browser and a powerful full-text search facility.

Year: 2007 PMID: 17984072 PMCID: PMC2238843 DOI: 10.1093/nar/gkm928

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

xBASE grew out of coliBASE, CampyDB and several similar projects, which were restricted to selected taxonomic groups of bacteria (1,2). In its initial implementation, xBASE was simply an umbrella term applied to a set of distinct databases that relied on a similar schema. In creating xBASE2, we have developed a new integrated, comprehensive bacterial genomes database, with greatly increased coverage, many new features and an improved user interface and code base.

FEATURES

The implementation of a scalable xBASE schema has allowed us to incorporate all bacterial genome sequences available from the NCBI (currently >800), including finished genomes and partially assembled genomes from the WGS division of GenBank. At the time of writing, 516 complete genome sequences and 367 draft genome sequences are included in the database, with the total number expected to reach well over 1000 by the end of 2007. The current design is intended to scale to at least 10 000 genomes. The database is updated monthly via an automated process so that new genome sequence data are incorporated into xBASE soon after they are released into the public domain. A numerical scheme for naming updates allows researchers to record and cite the particular version of xBASE used in their studies.

The wealth of data within xBASE presents new challenges in locating a genome sequence of interest. The user interface provides a number of options to address this problem. The xBASE top page contains a complete list of the included genomes, which can be filtered to a more manageable list of ‘popular’ genomes, where popularity is determined by the hit rate of each genome sequence, i.e. is set by the xBASE user community. A taxonomy browser is provided, which can be toggled between a full taxonomy view and a list of popular taxa. Although xBASE2 maintains the identities of the earlier databases (coliBASE, CampyDB, etc.) to ease the transition to the new version for our current user base, these are now merely flags of convenience for a small subset of all potential taxon-specific views. For example, the coliBASE tab takes the user to a set of genomes defined by the link from the taxon Enterobacteriaceae. However, users can now select their genome sets of interest from any point on the NCBI bacterial taxonomic hierarchy, from phyla such as Actinobacteria to subspecies like Zymomonas mobilis subsp. mobilis. Genome sequences can also be located using the full-text search facility, with results ranked on the popularity of the organism.

The number of genomes within xBASE2 means that pre-calculating all pairwise genome sequence alignments is no longer feasible. Instead, MUMmer (3) alignments are now performed on demand, within seconds, and are cached so that they are redisplayed instantly. Nucleotide alignments are now performed using the nucmer component of MUMmer. This has the advantage over mummer that it allows comparisons between multiple-contig genomes, which is particularly useful when working with unfinished genomes. The alignment display code has been modified to allow multiple pairwise comparisons to be stacked, so that the same region can be viewed in many different genome sequences (Figure 1).
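The on-demand, cached alignment strategy can be illustrated with a minimal Python sketch. This is not the xBASE code: the function names, the cache layout and the key derivation are all assumptions made for illustration. The only external detail relied on is the documented nucmer command line, in which `--prefix` sets the output prefix and the alignment is written to `<prefix>.delta`.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def cache_key(ref_fasta: Path, qry_fasta: Path) -> str:
    """Derive a stable key from the ordered (reference, query) pair.

    Hypothetical scheme: a hash of the two resolved paths. A real system
    would more likely key on genome accessions and the database version.
    """
    joined = f"{ref_fasta.resolve()}\n{qry_fasta.resolve()}"
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def run_command(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(list(cmd), check=True)


def aligned_delta(
    ref_fasta: Path,
    qry_fasta: Path,
    cache_dir: Path,
    runner: Callable[[Sequence[str]], None] = run_command,
) -> Path:
    """Return the nucmer delta file for a genome pair, aligning only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    prefix = cache_dir / cache_key(ref_fasta, qry_fasta)
    delta = Path(f"{prefix}.delta")
    if not delta.exists():
        # Cache miss: run nucmer once; later requests reuse the stored file.
        runner(["nucmer", f"--prefix={prefix}", str(ref_fasta), str(qry_fasta)])
    return delta
```

Passing the runner as a parameter keeps the caching logic testable without MUMmer installed. With the default runner, `aligned_delta(Path("ref.fasta"), Path("qry.fasta"), Path("cache"))` would invoke nucmer on the first call only.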

Figure 1.

Stacked alignment of regions from multiple Escherichia coli and Shigella genomes. The region displayed shows the ‘black hole’ deletion of the cadA gene from the Shigella genomes. The product of this gene has been shown to attenuate Shigella virulence (5).

An XY plot allows the plotting of data such as nucleotide composition and principal axes derived from correspondence analysis of relative synonymous codon usage data (Figure 2). This facilitates identification of genes that show unusual patterns of composition or codon usage that might be indicative of horizontal gene transfer. The graphical display of genomic regions now includes genomic coordinates, and nucleotide sequences are displayed at high zoom (Figure 3). Circular chromosomes and plasmids are now handled correctly, so the display can wrap around the origin of the sequence. An xBASE blog has been launched, with regular contributions from the development team, including status updates, discussions of new features and requests from users for additional functionality.
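For readers unfamiliar with relative synonymous codon usage (RSCU), the following Python sketch shows how the per-gene values that feed a correspondence analysis might be computed. It is an illustration under stated assumptions, not the xBASE implementation: the function name `rscu` is hypothetical, the standard codon assignments are used, and single-codon amino acids (Met, Trp) are omitted because their RSCU is always 1. The correspondence analysis itself is not shown.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict

# Standard codon assignments, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> Dict[str, float]:
    """Relative synonymous codon usage for one coding sequence.

    RSCU of a codon is its observed count divided by the mean count of all
    codons encoding the same amino acid, so 1.0 means no bias.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values: Dict[str, float] = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0 or len(codons) == 1:
            continue  # amino acid absent from the gene, or no synonymous choice
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values
```

Applying `rscu` to every gene in a genome gives one row of codon values per gene; genes whose rows differ strongly from the bulk are the ones that stand apart on a plot of the principal axes.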

Figure 2.

Plot of the two principal axes determined by correspondence analysis of relative synonymous codon usage of genes from E. coli strain 536. The plot shows a typical ‘rabbit's head’ (6), with the right ‘ear’ corresponding to highly expressed genes with optimal codon usage. The left ‘ear’ includes genes likely to be of foreign origin, most of which show a low GC content.

Figure 3.

Alignment of two Staphylococcus aureus genomes, zoomed in to show the nucleotide sequence. The MW2 gene MW1571, which encodes a tRNA methyltransferase, is split into two in the NCTC 8325 genome. The display indicates that this is due to a single base deletion that occurs in a run of 5 T residues, suggesting that it may be an artefact caused by a sequencing error.

xBASE2 IMPLEMENTATION

The xBASE database backend has been completely rewritten to make use of a modified BioSQL schema, with additional tables to allow storage of genome alignment data and a flattened annotation table to aid rapid full-text searching. The code has also been modularized. The streamlined code brings benefits to both maintainers and end users, allowing new features to be developed rapidly and delivering a significant improvement in response time. Performance has also been improved by investment in new hardware. The full-text indexing facilities offered by the tsearch2 component of PostgreSQL have proven vital in creating a fast, accurate search facility. Researchers can now rapidly search for text strings such as gene names, phrases and keywords within both genome annotation and the documentation associated with each genome in the Genomes On Line Database (GOLD) (4). xBASE source code and database are freely available on request.
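As an illustration of how a flattened annotation table supports full-text search, the sketch below builds a parameterized query in Python. Everything schema-related is an assumption: the tables `annotation_flat` and `genome`, their columns and the `hit_count` popularity field are hypothetical names, not the xBASE schema. The sketch also uses the text search functions built into PostgreSQL 8.3 and later (`to_tsvector`, `plainto_tsquery`, `@@`), which superseded the tsearch2 contrib module, and `%s` placeholders in the style of the psycopg2 driver.

```python
from typing import Tuple

# Hypothetical schema: annotation_flat(genome_id, locus_tag, description)
# and genome(genome_id, name, hit_count). None of these names come from xBASE.
SEARCH_SQL = """
SELECT g.name, a.locus_tag, a.description
FROM annotation_flat AS a
JOIN genome AS g ON g.genome_id = a.genome_id
WHERE to_tsvector('english', a.description) @@ plainto_tsquery('english', %s)
ORDER BY g.hit_count DESC
LIMIT %s
"""


def build_search_query(text: str, limit: int = 20) -> Tuple[str, Tuple[str, int]]:
    """Return (sql, params) for a full-text search ordered by genome popularity.

    The search text is passed as a bound parameter, never interpolated
    into the SQL string.
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("search text must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return SEARCH_SQL, (cleaned, limit)
```

With a database driver, the returned pair would be passed to something like `cursor.execute(sql, params)`. In practice the `to_tsvector` expression would be backed by an index or a precomputed column so that the search does not scan every row.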

FUTURE CHALLENGES

xBASE is funded until 2012. A key challenge will be maintaining usability in the face of thousands of new genome sequences, which are likely to be delivered within this timeframe by the application of next-generation sequencing to bacteriology. Within xBASE, multiple stacked pairwise comparisons will have to be supplemented with a view that collapses down identical regions and only shows representative displays of syntenic regions. Another challenge will be integration of experimental data (e.g. microarray and chromatin immunoprecipitation data) and networks (e.g. of protein–protein interactions) into the xBASE facility. Work has already begun on xBASE 3.0, which will feature a new ‘Web 2.0’ user interface exploiting AJAX and advanced browser visual layout techniques, so that different data sets can be overlain and manipulated on the same page as a ‘mashup’ and full zooming and panning across a genome become possible (as in Google Maps). User tracking will enable us to provide a personalized interface, which will include a community annotation facility. We are thus confident that xBASE will continue to serve the bacteriology community well into the second decade of the new millennium. xBASE is available at http://xbase.bham.ac.uk.

6 in total

1. Evidence for horizontal gene transfer in Escherichia coli speciation.

Authors: C Médigue; T Rouxel; P Vigier; A Hénaut; A Danchin
Journal: J Mol Biol Date: 1991-12-20 Impact factor: 5.469

2. "Black holes" and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli.

Authors: A T Maurelli; R E Fernández; C A Bloch; C K Rode; A Fasano
Journal: Proc Natl Acad Sci U S A Date: 1998-03-31 Impact factor: 11.205

3. coliBASE: an online database for Escherichia coli, Shigella and Salmonella comparative genomics.

Authors: Roy R Chaudhuri; Arshad M Khan; Mark J Pallen
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

4. Versatile and open software for comparing large genomes.

Authors: Stefan Kurtz; Adam Phillippy; Arthur L Delcher; Michael Smoot; Martin Shumway; Corina Antonescu; Steven L Salzberg
Journal: Genome Biol Date: 2004-01-30 Impact factor: 13.583

5. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide.

Authors: Konstantinos Liolios; Nektarios Tavernarakis; Philip Hugenholtz; Nikos C Kyrpides
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

6. xBASE, a collection of online databases for bacterial comparative genomics.

Authors: Roy R Chaudhuri; Mark J Pallen
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

80 in total

1. Optical mapping and sequencing of the Escherichia coli KO11 genome reveal extensive chromosomal rearrangements, and multiple tandem copies of the Zymomonas mobilis pdc and adhB genes.

Authors: Peter C Turner; Lorraine P Yomano; Laura R Jarboe; Sean W York; Christy L Baggett; Brélan E Moritz; Emily B Zentz; K T Shanmugam; Lonnie O Ingram
Journal: J Ind Microbiol Biotechnol Date: 2011-11-11 Impact factor: 3.346

2. Substitutions in the BamA β-barrel domain overcome the conditional lethal phenotype of a ΔbamB ΔbamE strain of Escherichia coli.

Authors: Rene Tellez; Rajeev Misra
Journal: J Bacteriol Date: 2011-10-28 Impact factor: 3.490

3. Complete sequence of a novel 178-kilobase plasmid carrying bla(NDM-1) in a Providencia stuartii strain isolated in Afghanistan.

Authors: Patrick McGann; Jun Hang; Robert J Clifford; Yu Yang; Yoon I Kwak; Robert A Kuschner; Emil P Lesho; Paige E Waterman
Journal: Antimicrob Agents Chemother Date: 2012-01-30 Impact factor: 5.191

4. Deletion of TnAbaR23 results in both expected and unexpected antibiogram changes in a multidrug-resistant Acinetobacter baumannii strain.

Authors: Mandira Kochar; Marialuisa Crosatti; Ewan M Harrison; Barbara Rieck; Jacqueline Chan; Chrystala Constantinidou; Mark Pallen; Hong-Yu Ou; Kumar Rajakumar
Journal: Antimicrob Agents Chemother Date: 2012-01-30 Impact factor: 5.191

5. Performance comparison of benchtop high-throughput sequencing platforms.

Authors: Nicholas J Loman; Raju V Misra; Timothy J Dallman; Chrystala Constantinidou; Saheer E Gharbia; John Wain; Mark J Pallen
Journal: Nat Biotechnol Date: 2012-05 Impact factor: 54.908

6. Genome sequences of two plant growth-promoting fluorescent Pseudomonas strains, R62 and R81.

Authors: N Mathimaran; R Srivastava; A Wiemken; A K Sharma; T Boller
Journal: J Bacteriol Date: 2012-06 Impact factor: 3.490

7. The homodimeric GBS1074 from Streptococcus agalactiae.

Authors: Anshuman Shukla; Mark Pallen; Mark Anthony; Scott A White
Journal: Acta Crystallogr Sect F Struct Biol Cryst Commun Date: 2010-10-27

8. RamA, a member of the AraC/XylS family, influences both virulence and efflux in Salmonella enterica serovar Typhimurium.

Authors: Andrew M Bailey; Al Ivens; Rob Kingsley; Jennifer L Cottell; John Wain; Laura J V Piddock
Journal: J Bacteriol Date: 2010-01-15 Impact factor: 3.490

9. Characterization of 17 chaperone-usher fimbriae encoded by Proteus mirabilis reveals strong conservation.

Authors: Lisa Kuan; Jessica N Schaffer; Christos D Zouzias; Melanie M Pearson
Journal: J Med Microbiol Date: 2014-05-08 Impact factor: 2.472

10. Comparative analysis of two Neisseria gonorrhoeae genome sequences reveals evidence of mobilization of Correia Repeat Enclosed Elements and their role in regulation.

Authors: Lori A S Snyder; Jeff A Cole; Mark J Pallen
Journal: BMC Genomics Date: 2009-02-09 Impact factor: 3.969