Karla B Heidelberg, Jack A Gilbert, Ian Joint.
Abstract
The composition and activities of microbes from diverse habitats have been the focus of intense research during the past decade, spurred largely by advances in molecular biology and genomic technologies. In recent years environmental microbiology has firmly entered the age of the 'omics': (meta)genomics, proteomics, metabolomics and transcriptomics, with probably others on the rise. Microbes are essential participants in all biogeochemical processes on our planet, and the practical applications of what we are learning from molecular approaches have altered how we view biological systems. In addition, there is considerable potential to use information about uncultured microbes in biodiscovery research, as microbes provide a rich source of novel genes, enzymes and metabolic pathways. This review explores the brief history of genomic and metagenomic approaches to studying environmental microbial assemblages and describes some of the future challenges involved in broadening our approaches, leading to new insights for understanding environmental problems and enabling biodiscovery research.
Year: 2010 PMID: 20953417 PMCID: PMC2948669 DOI: 10.1111/j.1751-7915.2010.00193.x
Source DB: PubMed Journal: Microb Biotechnol ISSN: 1751-7915 Impact factor: 5.813
Figure 1. Publicly available completed reference genomes. Data are reported by publication date or, if not published, by the date the genome data were deposited in the NCBI GenBank data repository (http://www.ncbi.nlm.nih.gov/sites/genome). Data presented represent a total of 1106 genomes (966 Bacteria, 70 Archaea and 70 Eukaryotes). Eukaryotic data are for full or draft genomic data only and do not include mitochondrial or plasmid projects.
Summary of Roche 454, Illumina GA and ABI SOLiD sequencing capabilities.
| | Roche 454 FLX Titanium | Illumina Solexa GA | ABI SOLiD 3 |
|---|---|---|---|
| Reads per run (M) | 1.25 | 250 | 320 |
| Average read length (bp) | 330 | 75 or 100 | 50 |
| Usable reads that pass quality filters | > 99.5% | 55% | 35% |
| Raw accuracy reads | 96.0–97.0% | 96.2–99.7% | 99.0 to > 99.9% |
| Primary bias | Homopolymer read errors | Short read length | Short read length |
| Biases in eukaryotic sequencing | Minimal low coverage of AT‐rich regions | Low coverage in AT‐rich repetitive regions | Low coverage in AT‐rich repetitive regions |
| Amplicon overrepresentation in 50 bp end regions | 5% | 56% | 11% (after amplicon end removal) |
| Saturating level of redundant sequence coverage | 43× | 188× | 841× |
Reviewed by Metzker (2010).
Reviewed by Harismendy and colleagues (2009).
Reviewed by Chan (2009).
Higher accuracy achieved by reading each base twice in a two‐base encoding scheme.
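As a rough back-of-envelope reading of the table, the reads per run, average read length and fraction of usable reads can be multiplied to estimate usable sequence per run. The sketch below is illustrative only: the function name `usable_yield_gb` and the formula are assumptions rather than anything stated in the source, the "> 99.5%" entry is taken at its lower bound, and the 100 bp option is used for the Illumina read length.

```python
# Values copied from the table above. The yield formula itself is an
# illustrative assumption, not a calculation reported in the source.
PLATFORMS = {
    "Roche 454 FLX Titanium": {"reads_millions": 1.25, "read_length_bp": 330, "usable_fraction": 0.995},
    "Illumina Solexa GA": {"reads_millions": 250, "read_length_bp": 100, "usable_fraction": 0.55},
    "ABI SOLiD 3": {"reads_millions": 320, "read_length_bp": 50, "usable_fraction": 0.35},
}


def usable_yield_gb(reads_millions, read_length_bp, usable_fraction):
    """Approximate usable bases per run, in gigabases (hypothetical helper)."""
    return reads_millions * 1e6 * read_length_bp * usable_fraction / 1e9


for name, params in PLATFORMS.items():
    print(f"{name}: ~{usable_yield_gb(**params):.2f} Gb usable per run")
```

Under these assumptions the short-read platforms deliver more usable sequence per run than the 454 platform despite their lower pass rates, because the difference in read count outweighs the differences in read length and filtering.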
Figure 2. Challenges for processing genomic sequence data and Moore's law. Moore's law describes the rate at which computational infrastructure grows: roughly a doubling every two years. Genomic data in the public domain are growing faster than the computational capacity to process them. Costs for BLAST analysis are presented in Amazon EC2 units and do not include storage or transfer costs [Figure modified from F. Meyer (IGSB, Argonne National Lab), with data from Wilkening ].
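The doubling relationship in the Figure 2 caption can be made concrete with a short calculation. This is a minimal sketch: the two-year doubling time comes from the caption, but the one-year doubling time for sequence data is a hypothetical value chosen only to illustrate how a gap widens, not a figure from the source.

```python
def growth_factor(years, doubling_time_years):
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


MOORE_DOUBLING_YEARS = 2.0  # from the caption: a doubling roughly every two years
DATA_DOUBLING_YEARS = 1.0   # hypothetical value, for illustration only

for years in (2, 4, 6, 8, 10):
    compute = growth_factor(years, MOORE_DOUBLING_YEARS)
    data = growth_factor(years, DATA_DOUBLING_YEARS)
    print(f"{years:>2} years: compute x{compute:.0f}, data x{data:.0f}, gap x{data / compute:.0f}")
```

Any data doubling time shorter than two years gives a gap that itself grows exponentially, which is the processing challenge the figure illustrates.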
Figure 3. Schematic representation of sample handling options for metagenomic studies. One strategy is to concentrate the sample (often after pre‐filtration to remove larger eukaryotic organisms) and extract the total DNA or RNA for shotgun or large‐insert library construction. A second strategy is to enrich the sample for a particular community member through specific culturing techniques or through FACS. Most recently, single‐cell approaches are being attempted to clone and sequence the uncultured majority (Figure courtesy of Tanja Woyke, DOE Joint Genome Institute).