Archana Sharma, T Satyanarayana.
Abstract
With the advent of high-throughput sequencing platforms and relevant analytical tools, the rate of microbial genome sequencing has accelerated, which has in turn led to a better understanding of microbial molecular biology and genetics. The complete genome sequences of important industrial organisms provide opportunities for human health, industry, and the environment. Bacillus species are the dominant workhorses in industrial fermentations. Today, genome sequences of several Bacillus species are available, and comparative genomics of this genus helps in understanding their physiology, biochemistry, and genetics. The genomes of these bacterial species are the sources of many industrially important enzymes and antibiotics and, therefore, provide an opportunity to tailor enzymes with desired properties to suit a wide range of applications. A comparative account of the strengths and weaknesses of the different sequencing platforms is also presented in the review.
Keywords: Bacillus; comparative genomics; genome sequencing; industrial enzymes; sequencing platforms
Year: 2013 PMID: 26217108 PMCID: PMC4510601 DOI: 10.4137/GEI.S12732
Source DB: PubMed Journal: Genomics Insights ISSN: 1178-6310
Figure 1. Pictorial representation of whole genome sequencing.
Technical specifications of some important NGS platforms.
| Platforms | Illumina MiSeq | Ion Torrent PGM | PacBio RS | Illumina GAIIx | Illumina HiSeq 2000 |
|---|---|---|---|---|---|
| Chemistry | Reversible terminator | Proton detection | Real time fluorescence DNA polymerization | Reversible terminator | Reversible terminator |
| Instrument cost ($) | 128 K | 80 K (includes costing of PGM, server, OneTouch and OneTouch ES) | 695 K | 256 K | 654 K |
| Sequence yield per run | 1.5–2 Gb | 20–50 Mb on 314 chip; 100–200 Mb on 316 chip; 1 Gb on 318 chip | 100 Mb | 30 Gb | 600 Gb |
| Sequencing cost ($) per Gb | 502 | 1000 (318 chip) | 2000 | 148 | 41 |
| Run Time | 27 h (along with 2 h cluster generation) | 2 h | 2 h | 10 days | 11 days |
| Reported Accuracy | Mostly > Q30 | Mostly Q20 | <Q10 | Mostly > Q30 | Mostly > Q30 |
| Observed Raw Error Rate (%) | 0.80 | 1.71 | 12.86 | 0.76 | 0.26 |
| Read length (bases) | Up to 150 | ~200 | Average 1500 | Up to 150 | Up to 150 |
| Correct SNP calls (%) | 76 | ~82 | ~71 | 70 | 69 |
| Paired reads | Yes | Yes | No | Yes | Yes |
| Insert size | Up to 700 bases | Up to 250 bases | Up to 10 kb | Up to 700 bases | Up to 700 bases |
| Typical DNA Requirements | 50–1000 ng | 100–1000 ng | ~1 μg | 50–1000 ng | 50–1000 ng |
All cost calculations are based on list price quotations obtained from the manufacturer and assume expected sequence yield.
Mean mapped read length (adapter and reverse strand sequences). Subread lengths, i.e. the individual stretches of sequence from the sequenced fragment, are significantly shorter.
Adapted from Quail et al. (ref. 52).
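The "Reported Accuracy" and "Observed Raw Error Rate" rows of the table above can be compared on one scale using the standard Phred relation Q = −10 log10(P), where P is the per-base error probability. The following is a minimal sketch of that conversion, not part of the original review; the function names are illustrative, and the error rates are simply copied from the table.

```python
import math

def phred_to_error_rate(q: float) -> float:
    """Per-base error probability for a Phred quality score Q."""
    return 10 ** (-q / 10)

def error_rate_to_phred(p: float) -> float:
    """Phred quality score for a per-base error probability P (0 < P <= 1)."""
    return -10 * math.log10(p)

# Observed raw error rates (%) copied from the table above.
OBSERVED_ERROR_PERCENT = {
    "Illumina MiSeq": 0.80,
    "Ion Torrent PGM": 1.71,
    "PacBio RS": 12.86,
    "Illumina GAIIx": 0.76,
    "Illumina HiSeq 2000": 0.26,
}

def effective_phred(error_percent: dict) -> dict:
    """Convert each observed error rate (%) to its equivalent Phred score."""
    return {
        platform: error_rate_to_phred(pct / 100)
        for platform, pct in error_percent.items()
    }

if __name__ == "__main__":
    for q in (10, 20, 30):
        print(f"Q{q}: error probability {phred_to_error_rate(q):.4f}")
    for platform, q in effective_phred(OBSERVED_ERROR_PERCENT).items():
        print(f"{platform}: observed error rate is equivalent to about Q{q:.1f}")
```

On this scale Q30 corresponds to one error in 1000 bases and Q20 to one in 100, so an observed raw error rate of 12.86% falls below Q10, in line with the accuracy reported for PacBio RS.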
Details of whole genome sequences of Bacillus spp.
| S. no. | Microorganism | Genome size (Mbp) | GC (%) | Sequencing | Annotation | Fold coverage | Reference |
|---|---|---|---|---|---|---|---|
| 1. | | 5.22 | 35.4 | TIGR microbial shotgun projects | | 13 | |
| 2. | | 3.99 | 45.7 | Roche 454 GS-FLX system | RAST | 70 | |
| 3. | | 5.23 | 35.0 | 454 GS-FLX | UniRef90, NCBI nr, COG, KEGG | 30 | |
| 4. | | 5.43 | 35.3 | Applied Biosystems 3700 DNA sequencers | ERGO | 6 | |
| 5. | | 3.55 | 46.5 | Combination of Roche 454 & Sanger | UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG | 9 | |
| 6. | | 4.20 | 43.7 | ABI Prism 377 | | 7.1 | |
| 7. | | 4.0 | 38.77 | Roche 454 GS | PROKKA | 34 | |
| 8. | | 4.22 | 46.2 | Mega-BACE 1000/4000 ABI Prism 377 DNA | ERGO tool | 7.2 | |
| 9. | | 4.14 | | 454 Newbler | Phred/Phrap/Consed | | |
| 10. | | 4.21 | 43.5 | – | | | |
| 11. | | 4.09 | 43.85 | Illumina Solexa GA IIx | PGAAP | 169 | |
| 12. | | 3.60 | 43.89 | Illumina HiSeq 2000 | PGAAP | – | |
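Fold coverage in the table above relates sequencing effort to genome size: coverage = (number of reads × read length) / genome size. The sketch below, which is not from the original review, works through that arithmetic for entry 11 (4.09 Mbp at 169-fold coverage). The 150-base read length is an assumption taken from the upper limit listed for Illumina GAIIx in the platform table; the read length actually used for that genome is not stated here.

```python
import math

def fold_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp

def reads_needed(genome_size_bp: int, target_coverage: float, read_length_bp: int) -> int:
    """Smallest number of reads whose total length reaches the target coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

if __name__ == "__main__":
    genome_size_bp = 4_090_000   # 4.09 Mbp, entry 11 in the table above
    target_coverage = 169        # fold coverage reported for entry 11
    read_length_bp = 150         # ASSUMPTION: upper read length listed for GAIIx

    n = reads_needed(genome_size_bp, target_coverage, read_length_bp)
    print(f"Reads needed: {n:,}")
    print(f"Total bases: {n * read_length_bp / 1e6:.1f} Mb")
    print(f"Coverage check: {fold_coverage(n, read_length_bp, genome_size_bp):.2f}x")
```

Under these assumptions roughly 4.6 million reads (about 691 Mb of sequence) are required, a small fraction of the 30 Gb per-run yield listed for the GAIIx.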
Comparative analysis of some industrially important Bacillus spp.
| Characteristics | |||||
|---|---|---|---|---|---|
| Pathways | Embden-Meyerhof-Parnas glycolytic pathway, tricarboxylic acid (TCA) cycle | TCA cycle, aminoacyl-tRNA synthetases | Glycolysis, pentose cycle, TCA cycle, glyoxylate bypass | – | Embden-Meyerhof pathway, Krebs cycle, glyoxylate pathway |
| Open reading frames (%) | 87 | 85 | 86 | 89.6 | 83.3 |
| Predicted number | 4100 | 4066 | 4208 | 4325 | 5124 |
| Conserved with function assigned | 2379 | 2144 | – | 3691 | – |
| Conserved with unknown function | 668 | 1182 | – | 211 | – |
| Non-conserved | 1053 | 743 | – | 423 | – |
| Start codon ATG (%) | 78 | 78 | – | – | – |
| Start codon TTG (%) | 13 | 10 | – | – | – |
| Start codon GTG (%) | 9.0 | 12 | – | – | – |
| Avg. gene Length (bp) | 890 | 877 | 873 | – | – |
| rRNAs | 10 | 9 | 7 | 22 | 11 |
| tRNAs | 86 | 78 | 72 | 72 | 115 |
| ABC transporters | 77 | 75 | – | – | – |
| Secondary Metabolism | 4% | – | 82 secreted protein & enzymes | – | – |
| Phage-associated genes | 268 | 42 | 71 | – | – |
| Reference | |||||
Figure 2. Phylogenetic relationships among Bacillus spp. Closely related industrially important Bacillus spp. are highlighted in blue and the pathogenic Bacillus spp. are highlighted in red.
Enzymes identified in the industrially important strain of B. licheniformis and the corresponding orthologs present in B. subtilis.
| Gene ID | Gene function | Gene designation in B. subtilis |
|---|---|---|
| BLi00656 | α-Amylase precursor (EC 3.2.1.1) | |
| BLi02117 | α-Glucosidase (EC 3.2.1.20) | |
| BLi03021 | α- | |
| BLi01295 | Arabinan endo-1,5- | |
| BLi04220 | Arabinan endo-1,5-α- | |
| BLi04276 | Arabinogalactan endo-1,4-α-galactosidase | |
| BLi00447 | β-Galactosidase | |
| BLi04214 | β-Glucosidase | |
| BLi01882 | Cellulase (EC 3.2.1.4) | |
| BLi01881 | Cellulose 1,4-β-cellobiosidase | |
| BLi00338 | Chitinase (EC 3.2.1.14) | |
| BLi02088 | Endo-1,4-β-glucanase | |
| BLi01883 | Endo-1,4-β-mannosidase | |
| BLi00655 | Endo-1,4-β-xylanase | |
| BLi01880 | Endo-1,4-glucanase (EC 3.2.1.4) | |
| BLi00545 | Esterase/lipase | |
| BLi00340 | Glutamic acid-specific protease | |
| BLi02827 | Levanase | |
| BLi03707 | Levanase | |
| BLi03706 | Levansucrase | |
| BLi03370 | Lipase/esterase | |
| BLi00658 | Maltogenic α-amylase (EC 3.2.1.1) | |
| BLi04019 | Minor extracellular serine protease | |
| BLi01123 | Minor extracellular serine protease | |
| BLi01404 | Pectate lyase | |
| BLi03053 | Pectate lyase | |
| BLi03741 | Pectate lyase | |
| BLi03498 | Pectin methylesterase | |
| BLi04177 | Peptidase T | |
| BLi01399 | Polysugar-degrading enzyme | |
| BLi02863 | Protease | |
| BLi02862 | Protease |
Adapted from Veith et al. (ref. 4).
Figure 3. Comparison of the orthologous gene complements of B. licheniformis ATCC 14580, B. subtilis 168 and B. halodurans C-125. Numbers in the rectangular boxes show the number of pairwise orthologs between neighboring species (BLAST threshold E = 1 × 10⁻⁵). Numbers in the outer circles indicate the total number of CDSs predicted in each genome, numbers in areas of overlap represent the number of orthologs predicted by reciprocal BLASTP analysis (threshold E = 1 × 10⁻⁵), and the number in the center represents the number of orthologous sequences common to all three genomes. The outer circles show the paralogs in the genome of these Bacillus spp. (modified from Rey et al., ref. 27).
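The caption above refers to orthologs predicted by reciprocal BLASTP analysis with an E-value threshold of 1 × 10⁻⁵. A common way to implement this idea is the reciprocal best hit criterion, sketched below. This is an illustration only: selecting the best hit by bit score is an assumption, the exact procedure used by Rey et al. is not described here, and the gene identifiers and scores are toy values rather than real BLAST output.

```python
from typing import Dict, Iterable, Set, Tuple

# One BLASTP hit: (query ID, subject ID, E-value, bit score)
Hit = Tuple[str, str, float, float]

def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """For each query, keep the highest-scoring subject that passes the E-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # hit is not significant at the chosen threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], max_evalue: float = 1e-5
) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hits(a_vs_b, max_evalue)
    best_ba = best_hits(b_vs_a, max_evalue)
    return {
        (a, b)
        for a, (b, _) in best_ab.items()
        if b in best_ba and best_ba[b][0] == a
    }

# Toy data (hypothetical IDs and scores): genome A searched against genome B, and back.
A_VS_B = [
    ("A1", "B1", 1e-50, 200.0),
    ("A1", "B2", 1e-10, 60.0),
    ("A2", "B2", 1e-30, 120.0),
    ("A3", "B3", 1e-3, 30.0),   # fails the E-value threshold
]
B_VS_A = [
    ("B1", "A1", 1e-48, 198.0),
    ("B2", "A1", 1e-9, 58.0),
    ("B2", "A2", 1e-28, 118.0),
    ("B3", "A3", 1e-3, 30.0),   # fails the E-value threshold
]

if __name__ == "__main__":
    for pair in sorted(reciprocal_best_hits(A_VS_B, B_VS_A)):
        print(pair)
```

Applying the same function to each pair of genomes and intersecting the results would give counts analogous to the pairwise and three-way overlaps shown in the figure.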