Hélène Chiapello, Annie Gendrault, Christophe Caron, Jérome Blum, Marie-Agnès Petit, Meriem El Karoui.
Abstract
BACKGROUND: The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level, but their use and the biological interpretation of their results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons.

DESCRIPTION: Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The comparison strategy we developed allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Defining these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons.
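The backbone/variable distinction can be illustrated on a toy multiple alignment. The sketch below is an assumption-laden simplification, not MOSAIC's actual post-processing: it treats any alignment column in which every genome has a nucleotide (no gap) as backbone and every other column as variable, then merges consecutive columns with the same label into segments. The function name `segment_alignment` and the gap character `-` are illustrative; the real pipeline may also apply identity or minimum-length cut-offs that the abstract does not describe.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap symbol in the aligned rows


def segment_alignment(rows: List[str]) -> List[Tuple[str, int, int]]:
    """Split alignment columns into runs labelled 'backbone' or 'variable'.

    Hypothetical rule: a column is backbone when no genome has a gap there,
    otherwise it is variable. Returns (label, start, end) tuples using
    0-based, end-exclusive column coordinates.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned rows must have the same length")

    segments: List[Tuple[str, int, int]] = []
    for col in range(width):
        label = "backbone" if all(r[col] != GAP for r in rows) else "variable"
        if segments and segments[-1][0] == label:
            # Same label as the previous column: extend the current run.
            _, start, _ = segments[-1]
            segments[-1] = (label, start, col + 1)
        else:
            # Label changed: open a new segment at this column.
            segments.append((label, col, col + 1))
    return segments
```

For example, `segment_alignment(["ACGT--ACGT", "ACGTTTACGT", "ACGT--ACGA"])` yields a backbone run over columns 0 to 4, a variable run over columns 4 to 6 (where two strains have gaps), and a second backbone run over columns 6 to 10; the single mismatch in the last column does not break the backbone under this simplified rule.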
Year: 2008 PMID: 19038022 PMCID: PMC2607288 DOI: 10.1186/1471-2105-9-498
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Comparison strategy used to construct the MOSAIC database. When at least two strains of a species have been sequenced, their genomes are first extracted from the GR database [step (1)]. Systematic intra-species pairwise genome alignments are then performed with MAUVE [step (2)]. In step (3), the number and size of the MAUVE LCBs (Locally Collinear Blocks) are used to test each aligned pair for rearrangements, so that genomes are designated either as collinear or as rearranged. Collinear genomes are realigned with MGA (at least pairwise, and possibly as a maximal multiple alignment when sequences of more than two strains are available). Rearranged genomes are aligned with MAUVE (maximal multiple alignments). Finally, the MGA and MAUVE alignments are post-processed, the genomes are segmented into backbone regions and variable segments [step (4)], and the results are integrated into the MOSAIC database together with the annotations [step (5)].
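The routing decision in step (3) can be sketched as follows. The caption only says that the test uses the number and size of MAUVE LCBs; the concrete rule here (a pair counts as collinear when at most one LCB reaches a minimum length) and the `min_lcb_length` default of 10 kb are hypothetical choices for illustration, as is the `LCB` record type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LCB:
    """Hypothetical LCB record: block coordinates in genome 1 and genome 2."""
    start1: int
    end1: int
    start2: int
    end2: int

    def length(self) -> int:
        # Size measured on the first genome (an assumption of this sketch).
        return self.end1 - self.start1


def is_collinear(lcbs: List[LCB], min_lcb_length: int = 10_000) -> bool:
    """Assumed rule: collinear if no more than one LCB is 'large'.

    Small LCBs below min_lcb_length (a hypothetical threshold) are ignored,
    so only sizeable blocks can signal a rearrangement.
    """
    large = [b for b in lcbs if b.length() >= min_lcb_length]
    return len(large) <= 1


def choose_aligner(lcbs: List[LCB], min_lcb_length: int = 10_000) -> str:
    """Route collinear pairs to MGA and rearranged pairs to MAUVE."""
    return "MGA" if is_collinear(lcbs, min_lcb_length) else "MAUVE"
```

With one chromosome-length LCB plus a 2 kb fragment, `choose_aligner` returns `"MGA"`; with two multi-megabase LCBs whose order differs between genomes, it returns `"MAUVE"`. Counting only large blocks keeps short spurious LCBs from sending an otherwise collinear pair to the rearranged branch.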
Table 1. MAUVE parameter setup using the collinear genomes of E. coli MG1655 and Sakai strains.

Table 1a – Effect of the minimal recursive gap length (Min_rec_gap_length) on backbone fragmentation (seed = 19, weight = default).

| Min_rec_gap_length | | 200 (default) | 1000 | 5000 | 10000 |
|---|---|---|---|---|---|
| Number of backbone segments with a length ≤30 bp | 37 | 588 | 242 | 93 | 45 |
| Total number of backbone segments | 617 | 1363 | 959 | 782 | 717 |

Table 1b – Effect of weight on the number of LCBs (Min_rec_gap_length = 5000, seed = 19).

| Weight | | 57 (default) | 500 | 1000 | 2000 | 3000 | 5000 | 10000 |
|---|---|---|---|---|---|---|---|---|
| Number of LCBs | 1 | 113 | 47 | 25 | 10 | 4 | 4 | 1 |
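The two fragmentation measures reported in Table 1a are simple counts over backbone segment lengths. The sketch below computes them from a list of segment coordinates; it assumes 1-based inclusive `(start, end)` pairs (so length is `end - start + 1`), and the function name `fragmentation_stats` is hypothetical.

```python
from typing import Iterable, Tuple


def fragmentation_stats(
    backbone: Iterable[Tuple[int, int]], short_max: int = 30
) -> Tuple[int, int]:
    """Return (number of segments <= short_max bp, total number of segments).

    Segments are assumed to be 1-based inclusive (start, end) pairs.
    """
    lengths = [end - start + 1 for start, end in backbone]
    short = sum(1 for n in lengths if n <= short_max)
    return short, len(lengths)
```

For instance, segments of 20, 4901, 30, and 31 bp give `(2, 4)`: the 30 bp segment is counted as short because the table's criterion is "≤30 bp". A high share of short segments, as in the 200 bp Min_rec_gap_length column, indicates a fragmented backbone.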
The 35 maximal multiple chromosome alignments included in the current release of MOSAIC.
(1) Number of aligned genomes.
(2) Type of aligner (MGA or MAUVE).
(3) Number of Locally Collinear Blocks for MAUVE alignments.
(4) Mean ratio of backbone length to genome length.
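Footnote (4) can be read as an average, over the aligned genomes, of each genome's total backbone length divided by its genome length. The source does not spell out the averaging, so this sketch assumes an unweighted mean over genomes and non-overlapping, 1-based inclusive backbone segments; `mean_backbone_ratio` and its arguments are hypothetical names.

```python
from typing import Dict, List, Tuple


def mean_backbone_ratio(
    backbone_by_genome: Dict[str, List[Tuple[int, int]]],
    genome_length: Dict[str, int],
) -> float:
    """Unweighted mean over genomes of (total backbone length / genome length).

    Assumes each genome's segments do not overlap and are 1-based inclusive.
    """
    if not backbone_by_genome:
        raise ValueError("at least one genome is required")
    ratios = []
    for name, segments in backbone_by_genome.items():
        covered = sum(end - start + 1 for start, end in segments)
        ratios.append(covered / genome_length[name])
    return sum(ratios) / len(ratios)
```

With 800 bp of backbone in a 1000 bp genome (ratio 0.8) and 800 bp in a 1200 bp genome (ratio about 0.667), the mean is about 0.733. Averaging per-genome ratios, rather than pooling all lengths, gives each strain equal weight regardless of genome size.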
Figure 2. Example of access to a genome comparison through the MOSAIC Web interface. Twelve Streptococcus pyogenes strains are compared. (a) Main MOSAIC table describing the general properties of the comparison. Clicking the "genome comparison viewer" link gives access to the graphical overview of the five LCBs shown in (b). Clicking any LCB of the first genome zooms in on the backbone/variable segment organization resulting from the alignments, as shown in (c). Backbone regions are shown as grey bars and variable segments as green bars; genome annotations are superimposed (genes in blue, tRNAs in red). The main table (a) also gives access to browse Backbones, Intervals, or Variable Segments, as shown in (d).