Miriam Land, Loren Hauser, Se-Ran Jun, Intawat Nookaew, Michael R Leuze, Tae-Hyuk Ahn, Tatiana Karpinets, Ole Lund, Guruprased Kora, Trudy Wassenaar, Suresh Poudel, David W Ussery.
Abstract
Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Bacterial genome sequencing is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is heavily skewed towards a few phyla that contain model organisms. The breadth continues to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
Year: 2015 PMID: 25722247 PMCID: PMC4361730 DOI: 10.1007/s10142-015-0433-4
Source DB: PubMed Journal: Funct Integr Genomics ISSN: 1438-793X Impact factor: 3.410
Fig. 1 Number of bacterial and archaeal genomes sequenced each year and submitted to NCBI. Source: GenBank prokaryotes.txt file downloaded 4 February 2015
Fig. 2 Number of bases added each year since 1982. The dates for the first bacterial genome (H. influenzae) to be sequenced, and 10- and 20-year anniversaries are marked. Due to the scale, WGS and GenBank bases are essentially flat. Source: GenBank and SRA, accessed 4 February 2015
Number of sequenced genomes for 6 selected phyla and the percent of all genomes found in each phylum
| Phylum | Number of genomes | % of total |
|---|---|---|
| Actinobacteria | 4059 | 13 |
| Bacteroidetes/Chlorobi group | 932 | 3 |
| Cyanobacteria | 340 | 1 |
| Firmicutes | 9628 | 31 |
| Proteobacteria | 14,268 | 46 |
| Spirochaetes | 525 | 2 |
| Other | 1500 | 5 |
Source: GenBank prokaryotes.txt file downloaded 4 February 2015
Fig. 3 Number of genome sequences from the largest four sources. All sources with fewer than 1000 genomes are combined in the “Other” category. Source: GenBank prokaryotes.txt file downloaded 4 February 2015
Number of genomes found within each GOLD-defined ecosystem
| Ecosystem | Category | Total |
|---|---|---|
| Host-associated | | 11,816 |
| | Humans | 4973 |
| | Animal | 1804 |
| | Plants | 1410 |
| | Mammals | 867 |
| | Other | 2762 |
| Environmental | | 6774 |
| | Aquatic | 4559 |
| | Terrestrial | 2057 |
| | Other | 158 |
| Engineered systems | | 1658 |
| | Food production | 440 |
| | Wastewater | 410 |
| | Lab synthesis | 387 |
| | Other | 418 |
| Total | | 20,248 |
Source: GOLD, accessed 4 February 2015
Fig. 4 Genome size and percent GC of 2139 finished genomes plotted for the ecosystem types of (1) engineered systems, (2) environmental sources, and (3) host-associated genomes. Source: GOLD, accessed 4 February 2015
Number of genomes found within each temperature range
| Temperature range | Number of genomes |
|---|---|
| Mesophile | 3173 |
| Thermophile | 171 |
| Hyperthermophile | 75 |
| Psychrophile | 36 |
| Psychrotolerant | 17 |
| Psychrotrophic | 6 |
| Thermotolerant | 3 |
| Unknown | 20,626 |
Source: IMG Metadata Categories, accessed 4 February 2015
Number of complete and permanent draft genomes and the percent of those genomes with each project status
| Project status | Bacteria | Archaea | Plasmids | Total |
|---|---|---|---|---|
| Finished | 3060 | 173 | 1186 | 4419 |
| Permanent draft | 19,696 | 312 | 9 | 20,017 |
| Draft | 672 | 4 | 1 | 677 |
| Total | 23,428 | 489 | 1196 | 25,113 |
Source: IMG Statistics, accessed 4 February 2015
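The table above lists counts only, while its caption also refers to the percent of genomes with each project status. A minimal Python sketch of that arithmetic follows, assuming the "Total" column of the table is the denominator; the function name `status_percentages` is introduced here for illustration only and does not come from the source.

```python
# Counts copied from the "Total" column of the project status table above.
STATUS_COUNTS = {
    "Finished": 4419,
    "Permanent draft": 20017,
    "Draft": 677,
}


def status_percentages(counts):
    """Return each status as a percentage of the summed counts.

    Assumption: percentages are taken relative to the sum of all rows,
    which matches the table's grand total of 25,113.
    """
    total = sum(counts.values())
    return {status: 100.0 * n / total for status, n in counts.items()}


if __name__ == "__main__":
    for status, pct in status_percentages(STATUS_COUNTS).items():
        print(f"{status}: {pct:.1f} %")
```

On these counts, permanent drafts make up roughly four fifths of the entries, which is in line with the abstract's point that most available genomes are not finished.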
Fig. 5 A branching pattern of E. coli and Shigella on an alignment-free whole proteome phylogeny. Source: data used with permission from whole proteome phylogeny of E. coli and Shigella by FFP method (Jun et al. 2010)
Fig. 6 Core and pan-genome of 2085 E. coli genomes. Core gene families are defined as those families with at least one member in at least 95 % of genomes
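The core-genome definition in the Fig. 6 caption (a gene family is core if at least one member is present in at least 95 % of genomes) can be illustrated with a short Python sketch. This is not the authors' pipeline: it assumes genes have already been clustered into families by some upstream step, and the function name `pan_and_core`, the parameter `min_percent`, and the toy genome and family names are all hypothetical.

```python
from collections import Counter


def pan_and_core(genomes, min_percent=95):
    """Compute pan- and core-genome family sets.

    genomes: dict mapping a genome name to the set of gene-family IDs
             that have at least one member in that genome (assumed to
             come from a prior clustering step not shown here).
    min_percent: a family is "core" if it occurs in at least this
                 percentage of genomes (95 in the Fig. 6 caption).
    Returns (pan, core) as two sets of family IDs.
    """
    total = len(genomes)
    if total == 0:
        return set(), set()

    # Count, for each family, how many genomes contain it.
    presence = Counter()
    for families in genomes.values():
        presence.update(set(families))

    pan = set(presence)
    # Integer comparison avoids floating-point rounding at the threshold.
    core = {fam for fam, n in presence.items() if n * 100 >= min_percent * total}
    return pan, core


if __name__ == "__main__":
    # Toy input with hypothetical names, for illustration only.
    toy = {
        "genome_1": {"famA", "famB", "famC"},
        "genome_2": {"famA", "famB"},
        "genome_3": {"famA", "famD"},
    }
    pan, core = pan_and_core(toy)
    print(len(pan), sorted(core))
```

With only three toy genomes, the 95 % threshold is met only by families present in all three; with thousands of genomes, as in the E. coli comparison, the threshold tolerates a family missing from a small fraction of them, for example because of incomplete draft assemblies.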