Lutz Krause, Naryttza N Diaz, Alexander Goesmann, Scott Kelley, Tim W Nattkemper, Forest Rohwer, Robert A Edwards, Jens Stoye.
Abstract
Metagenomics is providing striking insights into the ecology of microbial communities. The recently developed massively parallel 454 pyrosequencing technique makes it possible to obtain metagenomic sequences rapidly, at low cost and without cloning bias. However, the phylogenetic analysis of the short reads it produces is a significant computational challenge. This article describes CARMA, a phylogenetic algorithm for predicting the source organisms of environmental 454 reads. The algorithm searches the unassembled reads of a sample for conserved Pfam domains and protein families. The matching gene fragments (environmental gene tags, EGTs) are classified into a higher-order taxonomy by reconstructing a phylogenetic tree for each matching Pfam family. The method is highly accurate across a wide range of taxonomic groups, and EGTs as short as 27 amino acids can be phylogenetically classified down to the rank of genus. The algorithm was applied in a comparative study of three aquatic microbial samples obtained by 454 pyrosequencing, and it clearly revealed profound differences in their taxonomic composition.
Year: 2008 PMID: 18285365 PMCID: PMC2367736 DOI: 10.1093/nar/gkn038
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Unrooted phylogenetic tree reconstructed from a toy multiple alignment. The alignment was constructed from taxonomically known members of a given Pfam family (PF1,…,PF7) and from EGTs matching that family (EGT1, EGT2, EGT3). The phylogenetic tree reconstructed from the alignment is shown on the right. The environmental gene tag EGT1 is located in a subtree c*(EGT1) of cyanobacteria (depicted in blue) and is therefore classified as ‘Bacteria; Cyanobacteria’. Because c*(EGT1) contains cyanobacteria from different genera, EGT1 is classified as an unknown taxon at the rank of genus.
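The assignment rule illustrated in Figure 1 can be sketched as a consensus over the taxonomically known members of the subtree c*(EGT): the EGT inherits every rank on which all of those members agree and is reported as an unknown taxon from the first disagreeing rank downward. This is a minimal sketch under stated assumptions, not CARMA's implementation: the lineages of the known members are taken as already extracted from the subtree (tree reconstruction and the choice of c*(EGT) are not modelled), the `RANKS` tuple, the `UNKNOWN` marker and the function `consensus_lineage` are hypothetical, and the lineage names in the example are placeholders.

```python
from typing import List, Sequence

# Hypothetical fixed rank order; each lineage lists one name per rank.
RANKS = ("superkingdom", "phylum", "class", "order", "genus")
UNKNOWN = "unknown"


def consensus_lineage(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Assign an EGT the taxa shared by all known members of its subtree.

    Sketch of the rule shown in Figure 1 (not CARMA's code). `lineages` holds
    one lineage per taxonomically known leaf of c*(EGT), which is ASSUMED to
    have been determined already. From the first rank at which the members
    disagree, the EGT is reported as an unknown taxon.
    """
    if not lineages:
        raise ValueError("subtree contains no taxonomically known members")
    if any(len(lin) != len(RANKS) for lin in lineages):
        raise ValueError("every lineage needs one name per rank in RANKS")
    result: List[str] = []
    agreed = True
    for i in range(len(RANKS)):
        names = {lin[i] for lin in lineages}
        if agreed and len(names) == 1:
            result.append(next(iter(names)))
        else:
            agreed = False  # once members disagree, all deeper ranks are unknown
            result.append(UNKNOWN)
    return result


# Placeholder lineages: two cyanobacteria from different genera, as in Figure 1.
members = [
    ("Bacteria", "Cyanobacteria", "ClassX", "OrderX", "GenusA"),
    ("Bacteria", "Cyanobacteria", "ClassX", "OrderX", "GenusB"),
]
for rank, taxon in zip(RANKS, consensus_lineage(members)):
    print(f"{rank}: {taxon}")
```

Stopping at the first disagreement, rather than checking each rank independently, keeps the assigned lineage a consistent path through the taxonomy: an EGT whose subtree spans two phyla is never given a class name even if the class labels happen to coincide.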
Figure 2. Accuracy obtained for the taxonomic assignment of 80–120-bp fragments from 77 complete genomes. The sensitivity (Sens), specificity (Spec), false negative rate (FNrate) and proportion of EGTs that could not be assigned to any taxonomic group (Urate) are shown as colored bars.
Figure 3. False positive rate for the phylogenetic classification of 80–120-bp fragments from 77 complete genomes. Shown is the proportion of EGTs misclassified into different taxonomic groups at four taxonomic ranks: superkingdom, phylum, class and order.
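The accuracy measures in Figures 2 and 3 can be computed per rank once each test fragment has a true taxon and a prediction. The record does not spell out the formulas, so the definitions below are assumptions: each prediction is a taxon name, the hypothetical marker `UNKNOWN` (classified at a higher rank but unresolved at this one, as for EGT1 at genus in Figure 1), or `None` (not assigned to any taxonomic group). Under that encoding, sensitivity is correct/all, specificity is correct/(correct + misclassified), the false positive rate is misclassified/all, the false negative rate is unknown/all and Urate is unassigned/all. The function name `rank_accuracy` and the labels in the example are illustrative.

```python
from typing import Dict, Optional, Sequence

UNKNOWN = "unknown"  # hypothetical marker: classified, but unresolved at this rank


def rank_accuracy(truth: Sequence[str], predicted: Sequence[Optional[str]]) -> Dict[str, float]:
    """Per-rank accuracy measures under ASSUMED definitions (see lead-in).

    truth[i] is the true taxon of EGT i at this rank; predicted[i] is a taxon
    name, UNKNOWN, or None if EGT i was not assigned to any taxonomic group.
    """
    if not truth or len(truth) != len(predicted):
        raise ValueError("truth and predicted must be non-empty and equally long")
    n = len(truth)
    correct = wrong = unknown = unassigned = 0
    for t, p in zip(truth, predicted):
        if p is None:
            unassigned += 1
        elif p == UNKNOWN:
            unknown += 1
        elif p == t:
            correct += 1
        else:
            wrong += 1
    named = correct + wrong  # EGTs given a concrete taxon at this rank
    return {
        "Sens": correct / n,
        "Spec": correct / named if named else 0.0,  # 0.0 by convention if nothing was named
        "FPrate": wrong / n,
        "FNrate": unknown / n,
        "Urate": unassigned / n,
    }


# Toy example: five EGTs at one rank, with hypothetical taxon labels.
scores = rank_accuracy(["A", "A", "B", "B", "C"], ["A", "B", UNKNOWN, None, "C"])
print(scores)
```

With these assumed definitions, Sens, FPrate, FNrate and Urate partition the test fragments and therefore sum to 1, while Spec measures how often a concrete assignment is right.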
Taxonomic characterization of three metagenomes obtained by 454 pyrosequencing. The sample size (number of reads produced), the number of identified EGTs and the proportion of EGTs for which a taxonomic origin was predicted at each taxonomic rank are shown.

| Sample | Size (reads) | EGTs | Superkingdom (%) | Phylum (%) | Class (%) | Order (%) | Genus (%) |
|---|---|---|---|---|---|---|---|
| Coral reef | 188,445 | 3,577 | 75 | 66 | 53 | 53 | 33 |
| Stromatolite | 124,694 | 7,414 | 92 | 77 | 72 | 70 | 37 |
| Solar saltern | 582,681 | 55,605 | 92 | 71 | 57 | 56 | 42 |
| Average | | | 86 | 68 | 61 | 60 | 37 |
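Because the table gives both the number of reads and the number of EGTs per sample, the EGT yield per read follows directly and makes the samples easier to compare. A short sketch, reading the Size and EGTs columns as integer counts (188,445 reads and 3,577 EGTs for the coral reef sample, and so on); the dictionary `SAMPLES` and the function `egt_yield` are illustrative names.

```python
# Read and EGT counts from the table (Size column read as the number of reads).
SAMPLES = {
    "Coral reef": (188_445, 3_577),
    "Stromatolite": (124_694, 7_414),
    "Solar saltern": (582_681, 55_605),
}


def egt_yield(reads: int, egts: int) -> float:
    """Number of identified EGTs per sequenced read."""
    if reads <= 0:
        raise ValueError("reads must be positive")
    return egts / reads


for name, (reads, egts) in SAMPLES.items():
    print(f"{name}: {egt_yield(reads, egts):.3f} EGTs per read")
```

On these counts the coral reef sample yields roughly 0.02 EGTs per read, compared with roughly 0.06 for the stromatolite and 0.10 for the solar saltern.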
Figure 4. Taxonomic characterization of three environmental samples obtained by 454 pyrosequencing. Bars illustrate the proportion of EGTs classified into different taxonomic groups. pEGTs is the fraction of EGTs classified as bacteria or archaea.
Prokaryotic diversity (H′) and evenness (J) in three aquatic microbial samples at the ranks of phylum, class, order and genus.

| Sample | Phylum H′ | Phylum J | Class H′ | Class J | Order H′ | Order J | Genus H′ | Genus J |
|---|---|---|---|---|---|---|---|---|
| Coral reef | 1.2 | 0.46 | 1.7 | 0.55 | 3.9 | 0.81 | 4.2 | 0.83 |
| Stromatolite | 1.1 | 0.42 | 1.16 | 0.37 | 2.7 | 0.55 | 3.6 | 0.70 |
| Solar saltern | 0.8 | 0.31 | 1.0 | 0.32 | 1.4 | 0.28 | 2.6 | 0.45 |
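The diversity table reports H′ and J without giving their formulas. Assuming H′ is the Shannon index with natural logarithms, H′ = -Σ p_i ln p_i, where p_i is the fraction of prokaryotic EGTs assigned to taxon i at a given rank, and J is Pielou's evenness J = H′ / ln S, with S the number of taxa observed at that rank, both values can be computed from per-taxon EGT counts as sketched below. The function name `shannon_evenness`, the example counts and the convention of returning J = 0 when only one taxon is present are illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def shannon_evenness(counts: Sequence[int]) -> Tuple[float, float]:
    """Return (H', J) from per-taxon EGT counts at one rank.

    ASSUMED definitions: H' = -sum(p_i * ln p_i) and J = H' / ln(S),
    where S is the number of taxa with a non-zero count.
    """
    present = [c for c in counts if c > 0]  # taxa not observed do not contribute
    total = sum(present)
    if total == 0:
        raise ValueError("no classified EGTs")
    h = -sum((c / total) * math.log(c / total) for c in present)
    s = len(present)
    j = h / math.log(s) if s > 1 else 0.0  # J is undefined for S = 1; 0.0 by convention
    return h, j


# Hypothetical EGT counts for four phyla in one sample.
h, j = shannon_evenness([120, 45, 30, 5])
print(f"H' = {h:.2f}, J = {j:.2f}")
```

Because J divides H′ by its maximum possible value ln S, it lies between 0 and 1 and equals 1 only when every observed taxon receives the same number of EGTs.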