
BacMap: an interactive picture atlas of annotated bacterial genomes.

Paul Stothard, Gary Van Domselaar, Savita Shrivastava, Anchi Guo, Brian O'Neill, Joseph Cruz, Michael Ellison, David S Wishart.

Abstract

BacMap is an interactive visual database containing fully labeled, zoomable and searchable chromosome maps from more than 170 bacterial (archaebacterial and eubacterial) species. It uses a recently developed visualization tool (CGView) to generate high-resolution circular genome maps from sequence feature information. Each map includes an interface that allows the image to be expanded and rotated. In the default view, identified genes are drawn to scale and colored according to coding directions. When a region of interest is expanded, gene labels are displayed. Each label is hyperlinked to a custom 'gene card' which provides several fields of information concerning the corresponding DNA and protein sequences. Each genome map is searchable via a local BLAST search and a gene name/synonym search. BacMap is freely available at http://wishart.biology.ualberta.ca/BacMap/.

Year:  2005        PMID: 15608206      PMCID: PMC540029          DOI: 10.1093/nar/gki075

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Since the first bacterial genome was completed in 1995 (1), more than 170 bacterial genomes have been sequenced (2). Another 500 are currently being sequenced and many will likely be released in the coming year. With current sequencing technology, it is now possible to sequence and assemble an entire bacterial genome in less than a week. This flood of extremely valuable genomic data is threatening to overwhelm our capacity to assimilate and process it. Never before has so much information been available about so many different bacterial species. A growing challenge facing microbiologists and bioinformaticians alike is to find ways to better manage, display and compare these data. Several database resources have appeared over the past few years to help in this regard. GenBank and EMBL now maintain up-to-date microbial genome sequence archives, while MIPS's PEDANT (3) and TIGR's CMR (4) provide more detailed annotations and statistics on many microbial genomes. However, the tendency for most consolidated microbial databases is to present sequence-related data in a text-only or in a relatively limited graphic format. This text-only approach ignores the rich information that can be gained by displaying interactive graphical maps of individual genes or open reading frames (ORFs) in their genomic context or by comparing different genomic maps to one another.

Given the enormous success of Ensembl's (5) dual textual and graphical approach to presenting metazoan genomes, we decided to investigate the possibility of using a similar concept to present bacterial genomic data. Here, we describe a highly visual, fully interactive, self-updating, web-enabled database called BacMap. BacMap is a unique resource that allows users to rapidly compare, explore and search through hundreds of bacterial genomes. BacMap is essentially an electronic atlas of bacterial genomes with hundreds of colorful, clickable maps, all linked to thousands of megabytes of detailed annotation. By pre-calculating the image maps and annotations, BacMap allows users to rapidly search, zoom in, rotate and zoom out of its circular genomic maps. Furthermore, no special browser plugins or applets are required, making BacMap fully compatible with a wide range of web browsers, hardware configurations and operating systems.

DATABASE DESCRIPTION

The BacMap homepage contains a list of all publicly released eubacterial and archaebacterial genomes (177 at the time of this writing). This list is presented alphabetically, according to genus, species and strain. Each bacterial name is hyperlinked to a ‘species card’, which provides detailed information about the organism in tabular format, including its taxonomy, Gram staining properties and number of chromosomes. A brief description of the species, discussing its physiology, general characteristics, ecological niche and relevance to human or animal disease, is also given. In addition, an image of the organism (if available) is provided.

Below each genome entry is a list of the genome's constituent chromosomes, sorted by length. Five buttons are provided for each chromosome: ‘MAP’, ‘TEXT SEARCH’, ‘BLAST’, ‘STATS’ and ‘DOWNLOAD’. The ‘MAP’ button displays a graphical map of the chromosome in a new window. The ‘TEXT SEARCH’ and ‘BLAST’ buttons are linked to the text search and BLAST search interfaces for the chromosome. The ‘STATS’ button displays several graphs concerning the chromosome's genomic and proteomic characteristics. Finally, the ‘DOWNLOAD’ button opens the data download page for the chromosome.

Clicking on the ‘MAP’ button generates a full-screen circular image of the entire bacterial chromosome. On the lower edge of the image is a brief synopsis of the chromosome, including the GenBank accession number of the source sequence. The full view map consists of two concentric rings of forward and reverse strand genes (protein and RNA), with tick marks indicating chromosomal position. Some maps may contain additional feature rings (e.g. COG functional classifications) depending on which annotations are currently available for the chromosome. Clicking on any of the tick marks in the sequence ruler expands the map by a pre-defined step, and centers the view on the base closest to the tick mark that was clicked.
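The expand-and-center behaviour of the map can be modelled as a window moving over a circular coordinate system. The following Python sketch is illustrative only and is not taken from CGView or BacMap: the zoom factor of 2, the half-window rotation step, and all names (`View`, `click_tick`, `rotate`, `bounds`, `count_views`) are assumptions made for the example.

```python
import math
from dataclasses import dataclass

ZOOM_FACTOR = 2  # hypothetical: each expand step halves the visible span


@dataclass(frozen=True)
class View:
    """A window onto a circular chromosome, in 1-based base coordinates."""
    length: int  # chromosome length in bases
    center: int  # base at the middle of the view
    zoom: int    # 0 = whole chromosome

    @property
    def span(self) -> int:
        """Number of bases visible at this zoom level."""
        return max(1, self.length // ZOOM_FACTOR ** self.zoom)


def wrap(pos: int, length: int) -> int:
    """Map any integer position onto 1..length, wrapping around the origin."""
    return (pos - 1) % length + 1


def click_tick(view: View, tick_base: int) -> View:
    """Expand by one step and center the view on the clicked tick mark."""
    return View(view.length, wrap(tick_base, view.length), view.zoom + 1)


def rotate(view: View, direction: int) -> View:
    """Shift the view half a window clockwise (+1) or counterclockwise (-1)."""
    shift = direction * max(1, view.span // 2)
    return View(view.length, wrap(view.center + shift, view.length), view.zoom)


def bounds(view: View) -> tuple[int, int]:
    """First and last visible base; start > stop means the view spans the origin."""
    start = wrap(view.center - view.span // 2, view.length)
    stop = wrap(start + view.span - 1, view.length)
    return start, stop


def count_views(length: int, max_zoom: int) -> int:
    """Images needed if every reachable view is rendered ahead of time."""
    total = 1  # the full map
    for zoom in range(1, max_zoom + 1):
        span = max(1, length // ZOOM_FACTOR ** zoom)
        step = max(1, span // 2)
        total += math.ceil(length / step)
    return total
```

Under these assumptions the number of distinct views roughly doubles with each additional zoom level, so rendering every view ahead of time for around ten levels already implies thousands of images per chromosome.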
The map view can also be manipulated using the control panel located at the bottom of the map. The ‘Expand +’ button zooms in on the current view, while ‘Expand −’ returns to a view showing more of the map with reduced detail. The view can also be shifted along the chromosome backbone in the clockwise and counterclockwise directions, using the ‘Rotate +’ and ‘Rotate −’ buttons, respectively. Hyperlinked gene labels are visible from the first zoom level onwards. Pointing to a gene label displays the start and stop positions of the gene, as well as its known or predicted function. Clicking on the gene label replaces the map view with the corresponding ‘gene card’.

The rapid response to these operations is achieved by pre-rendering all the images for each chromosome and the corresponding HTML image maps. The image rendering is done by a recently developed in-house program called CGView, which was designed to generate annotated images of circular chromosomes or plasmids. On average, more than 4000 PNG images and HTML image maps were generated for each chromosome in BacMap. These pre-rendered images typically occupy 100 MB of disk per chromosome.

The chromosome maps can be explored manually, or with the assistance of two search tools integrated with the BacMap database. One tool is a Boolean text search, which can be used to search for specific gene names, protein names, alternate names or partial names. Any matches returned from a text search are shown on a dynamically generated search results map. A textual list of the matching genes is also returned, with hyperlinks provided so that the relevant pre-rendered chromosomal map views and gene cards can be quickly retrieved. The second searching method uses BLAST (6) to identify genes that are similar to a user-supplied sequence. The BLAST method can be used as an alternate route to find genes/proteins if the text search is unsuccessful, or as a means to identify orthologs and paralogs of a sequence of interest. As with the text search, the BLAST results are shown graphically and textually, with hyperlinks provided for accessing the chromosome maps and gene cards. Figure 1 provides a montage of BacMap images to demonstrate the image quality, utility and general operation of the BacMap database.
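To make the idea of a Boolean text search over gene names, protein names, alternate names and partial names concrete, a minimal Python sketch follows. The query syntax actually used by BacMap is not described here, so the AND/OR/NOT term lists, the case-insensitive substring matching, and the `Gene` record with its field names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Minimal stand-in for a gene record; the field names are hypothetical."""
    name: str
    product: str
    synonyms: list[str] = field(default_factory=list)

    def searchable_text(self) -> str:
        """All names for this gene, lower-cased and joined into one string."""
        return " ".join([self.name, self.product, *self.synonyms]).lower()


def text_search(genes, all_of=(), any_of=(), none_of=()):
    """Return the genes that satisfy a simple Boolean query.

    all_of  : every term must occur (AND)
    any_of  : at least one term must occur, if any are given (OR)
    none_of : no term may occur (NOT)
    Terms match as case-insensitive substrings, so partial names also hit.
    """
    hits = []
    for gene in genes:
        text = gene.searchable_text()
        if not all(term.lower() in text for term in all_of):
            continue
        if any_of and not any(term.lower() in text for term in any_of):
            continue
        if any(term.lower() in text for term in none_of):
            continue
        hits.append(gene)
    return hits
```

For example, `text_search(genes, all_of=["dna"], none_of=["polymerase"])` would keep every record mentioning "dna" in any of its names except those that also mention "polymerase".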
Figure 1

A screenshot montage of the BacMap database. Two different views of the chromosomal map for Aeropyrum pernix are shown in the foreground. The full chromosome map is on the left, while an expanded view providing more detail is on the right. Behind the full chromosome map is a nucleotide composition graph for plasmid lp28-1 from Borrelia burgdorferi, and next to this graph is a gene card displaying textual information about a predicted gene from Staphylococcus aureus. These views of data can be accessed from the BacMap homepage, or from the hyperlinked results of BLAST searches and text searches.
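The hyperlinked gene labels in the expanded views rely on standard HTML image maps paired with pre-rendered PNG files. The Python sketch below shows, under stated assumptions, how one such image map could be generated: the pixel boxes, the gene dictionary keys, the gene card URL and the function names `area_tag` and `image_map` are hypothetical and do not reflect CGView's actual output.

```python
from html import escape


def area_tag(label_box, gene, card_url):
    """Build one clickable <area> element for a gene label.

    label_box is (left, top, right, bottom) in image pixels; the tooltip
    carries the start and stop positions and the known or predicted function.
    """
    coords = ",".join(str(int(v)) for v in label_box)
    tooltip = f"{gene['name']}: {gene['start']}..{gene['stop']}; {gene['function']}"
    return (
        f'<area shape="rect" coords="{coords}" '
        f'href="{escape(card_url, quote=True)}" '
        f'title="{escape(tooltip, quote=True)}" '
        f'alt="{escape(gene["name"], quote=True)}">'
    )


def image_map(map_name, image_file, labelled_genes):
    """Return an HTML fragment pairing a pre-rendered PNG with its image map.

    labelled_genes is an iterable of (label_box, gene, card_url) tuples.
    """
    areas = "\n".join(
        "  " + area_tag(box, gene, url) for box, gene, url in labelled_genes
    )
    name = escape(map_name, quote=True)
    return (
        f'<img src="{escape(image_file, quote=True)}" usemap="#{name}" '
        f'alt="chromosome map">\n'
        f'<map name="{name}">\n{areas}\n</map>'
    )
```

Because the fragment is plain HTML, a browser needs no plugin or applet to show the tooltip on hover and follow the link on click.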

A particularly useful feature of BacMap is the extensive annotation provided for each gene. These annotations, presented as gene cards, are accessible by clicking on the gene labels displayed on individual chromosome maps, or by using the text and BLAST searches. The cards are built using a variety of public databases, such as UniProt (7) and PDB (8), as well as numerous in-house prediction programs. Indeed, many of the annotation methods employed for BacMap were originally developed and validated in the preparation of the CyberCell database, a comprehensive molecular database on Escherichia coli (9). Each time a new bacterial chromosome is added to the BacMap database, an initial set of gene cards is constructed. These preliminary cards, which can be generated rapidly, serve as a source of basic gene and protein information until the more extensively annotated cards have emerged from the analysis pipeline. The final cards typically contain 50 or more fields of annotation, including information on a variety of sequence statistics, potential orthologs and paralogs, predicted function, predicted secondary structure and predicted subcellular location. The annotations are continually updated as more information becomes available and as better prediction programs are developed. A partial listing of BacMap's current annotation fields along with their sources and/or methods is given in Table 1.
Table 1.

Annotations added to gene entries in the BacMap database

Annotation                      Source and/or method
DNA sequence                    GenBank record parsing
Downstream 100 bases            GenBank record parsing
Following gene                  GenBank record parsing
GC content                      Calculated from sequence
Gene Ontology                   InterPro (10)
Gene position                   GenBank record parsing
NCBI GI number                  GenBank record parsing
Orthologues                     BLAST (6) with heuristics
Paralogues                      BLAST (6) with heuristics
Pfam family                     Pfam (11)
Preceding gene                  GenBank record parsing
PROSITE families and domains    PROSITE (12)
Protein length                  Calculated from sequence
Protein molecular weight        Calculated from sequence
Protein sequence                GenBank record parsing
Secondary structure             PSIPRED (13)
Subcellular location            PSORTb (14)
Theoretical pI                  Calculated from sequence
Transmembrane domains           PredictTM
Upstream 100 bases              GenBank record parsing

The annotation fields are added in several phases, as results are obtained from the various database searching and sequence analysis programs. The fields and their contents may change as new programs are added to the BacMap annotation pipeline.
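Several of the fields in Table 1 can be derived directly from the chromosome sequence and the gene coordinates. The Python sketch below illustrates two of them, GC content and the 100 bases flanking a gene on a circular chromosome. It is not BacMap's pipeline code: the function names, the 1-based inclusive coordinate convention and the strand encoding (+1/-1) are assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def circular_slice(chrom: str, start: int, length: int) -> str:
    """Return `length` bases from 0-based index `start`, wrapping at the origin."""
    n = len(chrom)
    return "".join(chrom[(start + i) % n] for i in range(length))


def flanking_bases(chrom: str, start: int, stop: int, strand: int, flank: int = 100):
    """Upstream and downstream bases for a gene at 1-based inclusive [start, stop].

    strand is +1 (forward) or -1 (reverse). Both flanks are returned in the
    gene's own orientation, so 'upstream' always precedes the gene's first base.
    """
    left = circular_slice(chrom, start - 1 - flank, flank)  # bases before `start`
    right = circular_slice(chrom, stop, flank)              # bases after `stop`
    if strand == 1:
        return left, right
    return reverse_complement(right), reverse_complement(left)
```

Treating the chromosome as circular matters for genes near the origin, where a naive slice would return fewer than 100 bases.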

In addition to an extensive collection of genome maps and annotations, BacMap also supports flat file downloads of all textual data associated with each genome. This includes complete genomic DNA sequence data, FASTA files of all identified genes, FASTA files of all identified proteins and a flat file of all the BacMap annotations for each gene/protein. These files are accessed by clicking on the ‘DOWNLOAD’ button associated with each chromosome listed on the BacMap homepage. In addition, BacMap provides a number of charts and graphs for each chromosome, which illustrate nucleotide and amino acid usage, distribution of protein lengths, distribution of functions and distribution of predicted subcellular locations. These charts, which are regularly updated in conjunction with the BacMap gene cards, can be accessed using the ‘STATS’ buttons.

While BacMap does not yet support relational queries for comparative genomics, it does support cross-genome comparisons. By separately querying and displaying chromosome segments from two or more species, users may visually compare bacterial orthologs, operon structure, gene synteny or gene context, chromosomal gene distributions or functional localizations.

The BacMap database is automatically updated on a weekly basis using custom updating software. This software obtains new bacterial genome GenBank records from the NCBI and then passes them to the programs that generate the BacMap database content. The only database entries that require some limited manual editing are the ‘species cards’.

To summarize, BacMap is an interactive electronic atlas of bacterial genomes. It builds on and extends the very successful visualization concepts originally introduced in Ensembl, providing a ‘map-centric’ or visually centered approach to exploring bacterial genomic data. The rich annotations in BacMap, coupled with its detailed, color-coded image maps, should permit users to look at bacterial genomes with increasing detail. The BacMap database can be freely accessed at http://wishart.biology.ualberta.ca/BacMap/.
REFERENCES (10 of 14 shown)

1.  Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.

Authors:  R D Fleischmann; M D Adams; O White; R A Clayton; E F Kirkness; A R Kerlavage; C J Bult; J F Tomb; B A Dougherty; J M Merrick
Journal:  Science       Date:  1995-07-28       Impact factor: 47.728

2.  Genomes OnLine Database (GOLD): a monitor of genome projects world-wide.

Authors:  A Bernal; U Ear; N Kyrpides
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

3.  The PEDANT genome database.

Authors:  Dmitrij Frishman; Martin Mokrejs; Denis Kosykh; Gabi Kastenmüller; Grigory Kolesov; Igor Zubrzycki; Christian Gruber; Birgitta Geier; Andreas Kaps; Kaj Albermann; Andreas Volz; Christian Wagner; Matthias Fellenberg; Klaus Heumann; Hans-Werner Mewes
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

4.  The Comprehensive Microbial Resource.

Authors:  J D Peterson; L A Umayam; T Dickinson; E K Hickey; O White
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

8.  The distribution and query systems of the RCSB Protein Data Bank.

Authors:  Philip E Bourne; Kenneth J Addess; Wolfgang F Bluhm; Li Chen; Nita Deshpande; Zukang Feng; Ward Fleri; Rachel Green; Jeffrey C Merino-Ott; Wayne Townsend-Merino; Helge Weissig; John Westbrook; Helen M Berman
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

10.  The InterPro Database, 2003 brings increased coverage and new features.

Authors:  Nicola J Mulder; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Daniel Barrell; Alex Bateman; David Binns; Margaret Biswas; Paul Bradley; Peer Bork; Phillip Bucher; Richard R Copley; Emmanuel Courcelle; Ujjwal Das; Richard Durbin; Laurent Falquet; Wolfgang Fleischmann; Sam Griffiths-Jones; Daniel Haft; Nicola Harte; Nicolas Hulo; Daniel Kahn; Alexander Kanapin; Maria Krestyaninova; Rodrigo Lopez; Ivica Letunic; David Lonsdale; Ville Silventoinen; Sandra E Orchard; Marco Pagni; David Peyruc; Chris P Ponting; Jeremy D Selengut; Florence Servant; Christian J A Sigrist; Robert Vaughan; Evgueni M Zdobnov
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

11.  The Pfam protein families database.

Authors:  Alex Bateman; Lachlan Coin; Richard Durbin; Robert D Finn; Volker Hollich; Sam Griffiths-Jones; Ajay Khanna; Mhairi Marshall; Simon Moxon; Erik L L Sonnhammer; David J Studholme; Corin Yeats; Sean R Eddy
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

12.  Recent improvements to the PROSITE database.

Authors:  Nicolas Hulo; Christian J A Sigrist; Virginie Le Saux; Petra S Langendijk-Genevaux; Lorenza Bordoli; Alexandre Gattiker; Edouard De Castro; Philipp Bucher; Amos Bairoch
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

13.  The PSIPRED protein structure prediction server.

Authors:  L J McGuffin; K Bryson; D T Jones
Journal:  Bioinformatics       Date:  2000-04       Impact factor: 6.937

14.  PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria.

Authors:  Jennifer L Gardy; Cory Spencer; Ke Wang; Martin Ester; Gábor E Tusnády; István Simon; Sujun Hua; Katalin deFays; Christophe Lambert; Kenta Nakai; Fiona S L Brinkman
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971