Valérie Barbe, Stéphane Cruveiller, Frank Kunst, Patricia Lenoble, Guillaume Meurice, Agnieszka Sekowska, David Vallenet, Tingzhang Wang, Ivan Moszer, Claudine Médigue, Antoine Danchin.
Abstract
Comparative genomics is the cornerstone of identification of gene functions. The immense number of living organisms precludes experimental identification of functions except in a handful of model organisms. The bacterial domain is split into large branches, among which the Firmicutes occupy a considerable space. Bacillus subtilis has been the model of Firmicutes for decades and its genome has been a reference for more than 10 years. Sequencing the genome involved more than 30 laboratories, with different areas of expertise, in an attempt to make the most of the experimental information that could be associated with the sequence. This had the expected drawback that the sequencing expertise was quite varied among the groups involved, especially at a time when sequencing genomes was extremely hard work. The recent development of very efficient, fast and accurate sequencing techniques, in parallel with the development of high-level annotation platforms, motivated the present resequencing work. The updated sequence has been reannotated in agreement with the UniProt protein knowledge base, keeping in perspective the split between the paleome (genes necessary for sustaining and perpetuating life) and the cenome (genes required for occupation of a niche, suggesting here that B. subtilis is an epiphyte). This should permit investigators to make reliable inferences to prepare validation experiments in a variety of domains of bacterial growth and development, as well as to build up accurate phylogenies.
Year: 2009 PMID: 19383706 PMCID: PMC2885750 DOI: 10.1099/mic.0.027839-0
Source DB: PubMed Journal: Microbiology (Reading) ISSN: 1350-0872 Impact factor: 2.777
Fig. 1. Comparison between the previously published sequence of strain 168 and the strain resequenced without cloning. SNPs and indels are as indicated, as well as the uneven distribution of G+C nucleotides in the sequence. Under the line representing the genome are displayed the positions of the different regions attributed to the various members of the sequencing consortium. It can be seen that the amount of variation is dependent on the sequencing group, not on the nucleotide composition of the genome. In some regions there is precious little variation, compared with the present sequence, despite the fact that the techniques used between 10 and 20 years ago were very different from those used today.
Comparing the new sequence with the old one
| Identical genes | 3323 (78.3 %) |
| Amino acid variations | 426 (10.0 %) |
| Adjusted start codons | 50 (1.2 %) |
| C-terminal variations only | 221 (5.2 %) |
| N-terminal variations only | 4 (0.09 %) |
| C-terminal and N-terminal variations | 11 (0.26 %) |
| Fusions | 20 (0.47 %) |
| Fissions | 20 (0.47 %) |
| Newly annotated genes | 171† (4.0 %) |
*In the case of fusion/fission events, the number of new genes resulting from the event is indicated.
†Including 48 pseudogenes or gene remnants.
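The percentages in the table can be checked against the raw counts. The following is a minimal Python sketch, assuming that the denominator is the sum of all nine categories (the source does not state the denominator explicitly); the names `counts` and `percentages` are illustrative only.

```python
# Counts copied from the table above.
# Assumption: each percentage is relative to the sum of all rows.
counts = {
    "Identical genes": 3323,
    "Amino acid variations": 426,
    "Adjusted start codons": 50,
    "C-terminal variations only": 221,
    "N-terminal variations only": 4,
    "C-terminal and N-terminal variations": 11,
    "Fusions": 20,
    "Fissions": 20,
    "Newly annotated genes": 171,
}


def percentages(table):
    """Return each category's share of the column total, in percent."""
    denom = sum(table.values())
    return {name: 100.0 * n / denom for name, n in table.items()}


total = sum(counts.values())
shares = percentages(counts)

for name, pct in shares.items():
    print(f"{name:40s} {counts[name]:5d}  {pct:6.2f} %")
```

Under this assumption the rounded shares agree with the values printed in the table, which suggests that the categories are counted as mutually exclusive.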
Fig. 2. (a) Distribution of gene length in the B. subtilis 168 genome. The absence of any overrepresentation of short CDSs supports the view that most if not all gene sequences predicted in the present annotation are authentic. (b) Correspondence analysis of the proteome of B. subtilis. Proteins in the proteome can be separated into two well-identified classes. The green cloud corresponds to proteins that are integral inner-membrane proteins (IIMPs). Note that the IIMP cloud is driven by the opposition between charged amino acids (D, E and K) and hydrophobic ones (F, L, M, W).
Fig. 3. Circular representation of the B. subtilis 168 genome for several specific genome features. Circles display the following, from the inside out. (1) GC skew ((G−C)/(G+C) using a 1 kb sliding window). (2) GC deviation (mean GC content in a 1 kb window − overall mean GC). Red areas indicate that deviation is higher than 1.5 standard deviations. (3) tRNA (dark green) and rDNA (blue). (4) Location of genomic regions with specific features differentiating them from the average sequence. Boxes coloured in light blue indicate regions of phage origin. The nonsymmetrical distribution (right and left halves of the circle) is to be emphasized. (5) Scale. (6, 7, 8) Genes having a presumed orthologue in other Bacillus species (B. licheniformis, B. amyloliquefaciens and B. pumilus respectively).
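The two innermost circles can be illustrated with a short windowed computation. The Python sketch below is not the authors' pipeline: it uses the conventional definition of GC skew, (G − C)/(G + C), and it assumes non-overlapping windows (the step size is not given in the caption), an overall mean GC taken as the mean over windows, and a population standard deviation of the per-window GC content. The function names are hypothetical.

```python
from statistics import mean, pstdev


def windows(seq, size=1000, step=1000):
    """Yield (start, subsequence) for each full window; step is an assumption."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def gc_skew(window):
    """Conventional GC skew (G - C) / (G + C); 0.0 if the window has no G or C."""
    g, c = window.count("G"), window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def gc_content(window):
    """Fraction of G and C nucleotides in the window."""
    return (window.count("G") + window.count("C")) / len(window) if window else 0.0


def gc_profile(seq, size=1000, step=1000, threshold=1.5):
    """Per-window skew, GC deviation from the mean, and a high-deviation flag."""
    seq = seq.upper()
    rows = [(start, gc_skew(w), gc_content(w)) for start, w in windows(seq, size, step)]
    contents = [gc for _, _, gc in rows]
    overall = mean(contents)   # assumption: mean over windows
    sd = pstdev(contents)      # assumption: population standard deviation
    profile = []
    for start, skew, gc in rows:
        deviation = gc - overall
        profile.append({
            "start": start,
            "skew": skew,
            "deviation": deviation,
            # Flag windows whose deviation exceeds threshold * SD (drawn in red in Fig. 3)
            "flagged": sd > 0 and deviation > threshold * sd,
        })
    return profile


# Toy usage with 4-nt windows so the result can be checked by hand.
toy = gc_profile("GGGGCCCCATATGCGC", size=4, step=4)
for row in toy:
    print(row)
```

On a real genome, `seq` would be the full chromosome sequence and `size` would be left at 1000 to match the 1 kb window of the caption.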
Genes of the cenome suggesting that B. subtilis is an epiphyte
| Locus | |||||
| CymR and CysL regulons | |||||
| Glucomannan utilization operon ( |