Elleke F Bosma, Jasper J Koehorst, Sacha A F T van Hijum, Bernadet Renckens, Bastienne Vriesendorp, Antonius H P van de Weijer, Peter J Schaap, Willem M de Vos, John van der Oost, Richard van Kranenburg.
Abstract
Bacillus smithii is a facultatively anaerobic, thermophilic bacterium able to use a variety of sugars that can be derived from lignocellulosic feedstocks. Being genetically accessible, it is a potential new host for biotechnological production of green chemicals from renewable resources. We determined the complete genomic sequence of the B. smithii type strain DSM 4216(T), which consists of a 3,368,778 bp chromosome (GenBank accession number CP012024.1) and a 12,514 bp plasmid (GenBank accession number CP012025.1), together encoding 3,880 genes. Genome annotation via RAST was complemented by a protein domain analysis. Unique features of B. smithii central metabolism in comparison to related organisms include the lack of a standard acetate production pathway, with no apparent pyruvate formate lyase, phosphotransacetylase, or acetate kinase genes, even though acetate was the second fermentation product.
Keywords: Bacillus smithii; Biotechnology; Genome sequence; Lactic acid; Thermophile; Thermophilic bacillus
Year: 2016 PMID: 27559429 PMCID: PMC4995803 DOI: 10.1186/s40793-016-0172-8
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Fig. 1 Scanning electron micrographs of B. smithii DSM 4216T
Classification and general features of B. smithii DSM 4216T according to MIGS standards
| MIGS ID | Property | Term | Evidence code a |
|---|---|---|---|
| | Classification | Domain Bacteria | TAS [ |
| | | Phylum | TAS [ |
| | | Class | TAS [ |
| | | Order | TAS [ |
| | | Family | TAS [ |
| | | Genus | TAS [ |
| | | Species | TAS [ |
| | | Type strain: DSM 4216T | |
| | Gram stain | Positive b | TAS [ |
| | Cell shape | Rod | IDA (Fig. |
| | Motility | Motile | TAS [ |
| | Sporulation | Terminal or subterminal, oval or cylindrical endospores, non-swollen to slightly swollen sporangia | IDA (Fig. |
| | Temperature range | 25–65 °C | TAS [ |
| | Optimum temperature | 55 °C | IDA |
| | pH range; Optimum | 5.5–6.8; 6.5 | TAS [ |
| | Carbon source | D-glucose, D-xylose, L-xylose, L-arabinose, D-ribose, glycerol, D-adonitol, D-fructose, L-sorbose, D-galactose, L-rhamnose, inositol, D-mannitol, sucrose, D-trehalose, xylitol, methyl-α-D-glucopyranoside, esculin, salicin, D-maltose, D-turanose, D-lyxose, D-tagatose, D-arabitol, K-gluconate, K-5-ketogluconate | IDA (API), TAS [ |
| MIGS-6 | Habitat | Type strain: cheese. Other strains: evaporated milk, canned food, compost, hot spring soil, sugar beet juice from extraction installations. | TAS [ |
| MIGS-6.3 | Salinity | Not in 3 % NaCl (w/v) | TAS [ |
| MIGS-22 | Oxygen requirement | Facultative anaerobe | TAS [ |
| MIGS-15 | Biotic relationship | Free-living | TAS [ |
| MIGS-14 | Pathogenicity | Non-pathogen | TAS [ |
| MIGS-4 | Geographic location | USA | TAS [ |
| MIGS-5 | Sample collection | ~1946 | TAS [ |
| MIGS-4.1 | Latitude | Unknown | |
| MIGS-4.2 | Longitude | Unknown | |
| MIGS-4.4 | Altitude | Unknown | |
a Evidence codes: IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project.
b As described in the species description by Nakamura et al.: “Young cells of both groups were Gram positive. With increasing age the cells became Gram variable and finally Gram negative. The KOH and aminopeptidase tests were negative, as is typical for Gram-positive organisms.”
Fig. 2 Phylogenetic trees based on 16S rRNA gene sequences (left) and protein domains (right). Horizontal lines connect the two trees for comparison, showing the position of Bacillus smithii DSM 4216T relative to other Bacillus strains and to several industrially important lactic acid bacteria strains. Only strains with a complete genome sequence available as of 18 September 2014 were included, so that the domain-based analysis could be performed. The 16S sequences were aligned using DECIPHER (R) [29], and the distance analysis was performed using a Jukes-Cantor correction. For the domain-based tree, all proteins from the selected genomes were re-annotated using InterProScan 5-RC7 and the resulting domains were transformed into an absence-presence matrix. Distances were calculated as standard Euclidean distances, and clustering was performed with the complete-linkage method using hclust. The trees were compared using dendextend. Note that “unique” nodes between the 16S and domain-based trees are indicated with dashed lines (i.e. the order is the same but the subclustering is not). GenBank IDs of the whole genome sequences used, in order from top to bottom: AE016877.1, AL009126.3, CP000002.3, BA000004.3, CP012024.1, CP002472.1, CP002835.1, CP002293.1, CP001638.1, CP000557.1, CP006254.2, CP002442.1, CP002050.1, CP004008.1, CP003125.1, BA000043.1, CP000922.1, CP002222.1, CP001617.1
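The two distance measures named in the caption can be illustrated with a minimal Python sketch. This is not the authors' pipeline (they used DECIPHER, hclust and dendextend in R); the function names, the toy sequences and the placeholder domain identifiers below are assumptions made purely for illustration.

```python
import math


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    # Compare only alignment columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed mismatch fraction
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def presence_matrix(domains_by_genome):
    """Turn {genome: set of domain IDs} into {genome: 0/1 vector}."""
    all_domains = sorted(set().union(*domains_by_genome.values()))
    return {
        genome: [1 if d in domains else 0 for d in all_domains]
        for genome, domains in domains_by_genome.items()
    }


def euclidean(u, v):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical toy input: genome labels and domain IDs are placeholders.
toy = {"genome_A": {"dom1", "dom2"}, "genome_B": {"dom2", "dom3"}}
matrix = presence_matrix(toy)
domain_distance = euclidean(matrix["genome_A"], matrix["genome_B"])
```

For binary vectors the squared Euclidean distance equals the number of domains present in exactly one of the two genomes, which is why it is a natural choice for an absence-presence matrix.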
Project information of the whole genome sequence of B. smithii DSM 4216T
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Finished |
| MIGS-28 | Libraries used | Mate-pair (average 4,260 bp), paired-end (average 273 bp), PacBio (2,075 and 2,775 kbp) |
| MIGS-29 | Sequencing platforms | Illumina and PacBio |
| MIGS-31.2 | Fold coverage | Illumina paired-end: 187×, Illumina mate-pair: 311×, PacBio: 56× |
| MIGS-30 | Assemblers | CLCbio Genomics Workbench 5.0, SSPACE Premium 2.0, GapFiller 1.10 |
| MIGS-32 | Gene calling method | RAST and domain analysis |
| | Locus tag | BSM4216 |
| | GenBank ID | CP012024.1 (chromosome); CP012025.1 (plasmid) |
| | GenBank date of release | 8 July 2015 |
| | GOLD ID | NA |
| | BIOPROJECT | PRJNA258357 |
| MIGS-13 | Source material identifier | DSM 4216T |
| | Project relevance | Biotechnological |
Genome statistics of B. smithii DSM 4216T
| Attribute | Value | % of total |
|---|---|---|
| Genome size (bp) | 3,381,292 | 100.0 |
| DNA coding (bp) | 2,799,365 | 82.8 |
| DNA G + C (bp) | 1,378,026 | 40.8 |
| DNA scaffolds | 2 | |
| Total genes | 3,880 | 100.0 |
| Protein coding genes | 3,627 a | 93.5 |
| RNA genes | 127 | 3.3 |
| Pseudo genes | 126 | 3.2 |
| Genes in internal clusters | ND | |
| Genes with function prediction | 2,063 | 53.1 |
| Genes assigned to COGs | 2,619 | 67.4 |
| Genes with Pfam domains | 2,596 | 66.8 |
| Genes with signal peptides | 122 | 3.1 |
| Genes with transmembrane helices | 795 | 20.5 |
| CRISPR repeats | 69 | |
a Excluding 126 pseudogenes
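The headline figures in this table can be cross-checked with simple arithmetic. The sketch below uses only values quoted in the record (chromosome and plasmid sizes from the abstract, counts from the table); the variable names and the `pct` helper are illustrative assumptions, not part of the original analysis.

```python
# Replicon sizes as given in the abstract (bp).
chromosome_bp = 3_368_778
plasmid_bp = 12_514
genome_bp = chromosome_bp + plasmid_bp  # combined genome size


def pct(part: int, total: int) -> float:
    """Percentage of total, rounded to one decimal as in the table."""
    return round(100.0 * part / total, 1)


coding_pct = pct(2_799_365, genome_bp)  # DNA coding (bp)
gc_pct = pct(1_378_026, genome_bp)      # DNA G + C (bp)

# Gene counts from the table: protein coding, RNA, pseudogenes.
protein_coding, rna_genes, pseudogenes = 3_627, 127, 126
total_genes = protein_coding + rna_genes + pseudogenes
protein_coding_pct = pct(protein_coding, total_genes)

print(genome_bp, coding_pct, gc_pct, total_genes, protein_coding_pct)
```

The combined size and the three gene classes reproduce the "Genome size" and "Total genes" rows, which confirms that the table reports chromosome and plasmid together.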
Summary of the B. smithii DSM 4216T genome: one chromosome and one plasmid
| Label | Size (bp) | Topology | INSDC identifier | RefSeq ID |
|---|---|---|---|---|
| Chromosome | 3,368,778 | Circular | CP012024.1 | NZ_CP012024.1 |
| Plasmid | 12,514 | Circular | CP012025.1 | NZ_CP012025.1 |
Fig. 3 Chromosome and plasmid map of B. smithii DSM 4216T. The outer circle shows base pair positions; genes on the forward strand are shown in red and genes on the reverse strand in blue; the inner circle shows GC skew, with positive values in red and negative values in green
Number of genes associated with general COG functional categories
| Code | Value | Percentage | Description |
|---|---|---|---|
| J | 162 | 4.46 | Translation, ribosomal structure and biogenesis |
| A | 0 | 0.00 | RNA processing and modification |
| K | 179 | 4.92 | Transcription |
| L | 160 | 4.40 | Replication, recombination and repair |
| B | 1 | 0.03 | Chromatin structure and dynamics |
| D | 28 | 0.77 | Cell cycle control, Cell division, chromosome partitioning |
| V | 31 | 0.85 | Defense mechanisms |
| T | 125 | 3.44 | Signal transduction mechanisms |
| M | 132 | 3.63 | Cell wall/membrane biogenesis |
| N | 64 | 1.76 | Cell motility |
| U | 42 | 1.16 | Intracellular trafficking and secretion |
| O | 92 | 2.53 | Posttranslational modification, protein turnover, chaperones |
| C | 156 | 4.29 | Energy production and conversion |
| G | 174 | 4.79 | Carbohydrate transport and metabolism |
| E | 291 | 8.01 | Amino acid transport and metabolism |
| F | 74 | 2.04 | Nucleotide transport and metabolism |
| H | 107 | 2.94 | Coenzyme transport and metabolism |
| I | 94 | 2.59 | Lipid transport and metabolism |
| P | 154 | 4.24 | Inorganic ion transport and metabolism |
| Q | 70 | 1.93 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 382 | 10.51 | General function prediction only |
| S | 236 | 6.49 | Function unknown |
| - | 1,321 | 36.34 | Not in COGs |
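The record does not state which total the percentage column is based on. The sketch below tests one assumption: that the denominator is 3,635, the ORF count listed for B. smithii in the genome comparison table. The counts are copied from the table above; `ASSUMED_TOTAL` and `cog_percent` are hypothetical names introduced for illustration.

```python
# Assumption: percentages are relative to 3,635 ORFs (not stated in the source).
ASSUMED_TOTAL = 3_635

# Counts per COG category, copied from the table ("-" = not in COGs).
cog_counts = {
    "J": 162, "A": 0, "K": 179, "L": 160, "B": 1, "D": 28, "V": 31,
    "T": 125, "M": 132, "N": 64, "U": 42, "O": 92, "C": 156, "G": 174,
    "E": 291, "F": 74, "H": 107, "I": 94, "P": 154, "Q": 70, "R": 382,
    "S": 236, "-": 1_321,
}


def cog_percent(count: int, total: int = ASSUMED_TOTAL) -> float:
    """Share of the assumed total, rounded to two decimals as in the table."""
    return round(100.0 * count / total, 2)


percentages = {code: cog_percent(n) for code, n in cog_counts.items()}
sum_of_counts = sum(cog_counts.values())
```

Under this assumption the computed values agree with the printed column for the rows checked (for example J, E, R and "-"). The counts sum to more than the assumed total, which would be consistent with some genes being assigned to more than one category, although the record does not say so.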
Comparison of several published complete genome sequences of the genus Bacillus
| Species/strain | Genome size (bp) | GC % a | ORFs b | Plasmid number | Growth c | Ref |
|---|---|---|---|---|---|---|
| B. smithii DSM 4216T | 3,368,778 | 40.8 | 3,635 | 1 | TT | This study |
| | 3,018,045 | 47.2 | 3,437 | 0 | TT | [ |
| | 3,552,226 | 46.5 | 3,306 | 0 | TT | [ |
| | 3,073,079 | 47.3 | 2,985 | 1 | TT | [ |
| | 2,846,746 | 41.8 | 2,863 | 0 | TT | [ |
| | 4,317,010 | 45.9 | 4,650 | 0 | TT | [ |
| | 4,222,748 | 46.2 | 4,286 | 0 | TT | [ |
| | 5,426,909 | 35.3 | 5,366 | 1 | MP | [ |
| | 4,202,353 | 43.7 | 4,066 | 0 | MP | [ |
| | 4,214,810 | 43.5 | 4,100 | 0 | MP | [ |
| | 3.75 Mb | 43.9 | 4,300 | 0 | TP | [ |
| | 3,550,319 | 48.9 | 3,499 | 1 | TP | [ |
| | 3,544,776 | 52.0 | 3,498 | 1 | TP | [ |
| | 3,596,620 | 52.3 | 3,887 | 0 | TP | [ |
Currently available thermophilic Bacillus genomes are shown, as well as a selection of genomes of mesophilic model organisms
*Sequence not fully closed
a GC% of chromosome and plasmid combined as weighted average
b Open Reading Frames as a total on chromosome and plasmid(s)
c MP: mesophile, TP: thermophile, TT: thermotolerant (grows at mesophilic as well as thermophilic temperatures)
Fig. 4 Reconstruction of central carbon metabolism of B. smithii DSM 4216T. Blue lines indicate pathways for which the EC number was identified only via domainome analysis; grey lines indicate pathways identified by neither RAST annotation nor domainome analysis. Abbreviations: XI: xylose isomerase; XK: xylulokinase; PTS: phosphotransferase system; GK: glucokinase; glpF: glycerol facilitator; glyK: glycerol kinase; Gly3P-DH: glycerol-3-phosphate dehydrogenase; PGI: glucose-6-phosphate isomerase; G6PDH: glucose-6-phosphate dehydrogenase; 6PGDH: 6-phosphogluconate dehydrogenase; RPI: phosphopentose isomerase; RPE: phosphopentose epimerase; TKL: transketolase; TAL: transaldolase; FBP: fructose bisphosphatase; PFK: phosphofructokinase; FBA: fructose bisphosphate aldolase; TPI: triosephosphate isomerase; GAP: glyceraldehyde 3-phosphate dehydrogenase; PGK: phosphoglycerate kinase; PGM: phosphoglycerate mutase; ENO: enolase; PCK: phosphoenolpyruvate carboxykinase; PPC: phosphoenolpyruvate carboxylase; PYK: pyruvate kinase; PYC: pyruvate carboxylase; PDHC: pyruvate dehydrogenase complex; ME: malic enzyme; MDH: malate dehydrogenase; MQO: malate:quinone oxidoreductase; CS: citrate synthase; ACN: aconitase; ICL: isocitrate lyase; MS: malate synthase; ICD: isocitrate dehydrogenase; OOR: 2-oxoglutarate reductase; ODH: 2-oxoglutarate dehydrogenase; SCS: succinyl-CoA synthetase; SDH: succinate dehydrogenase; FH: fumarate hydratase; ALS: acetolactate synthase; NOD: non-enzymatic oxidative decarboxylation; BDH: butanediol dehydrogenase; ACH: acetoin dehydrogenase; LDHL: L-lactate dehydrogenase; ACDH: acetyl-CoA dehydrogenase; ADH: alcohol dehydrogenase; ACS: acetyl-CoA synthetase; MGS: methylglyoxal synthase; MGR: methylglyoxal reductase; GLXI: glyoxalase I; GLXII: glyoxalase II; LADH: lactaldehyde dehydrogenase