Yanan Yu, Mya Breitbart, Pat McNairnie, Forest Rohwer.
Abstract
BACKGROUND: High-throughput sequencing makes it possible to rapidly obtain thousands of 16S rDNA sequences from environmental samples. Bioinformatic tools for the analysis of large 16S rDNA sequence databases are needed to comprehensively describe and compare these datasets.
Year: 2006 PMID: 16464253 PMCID: PMC1386709 DOI: 10.1186/1471-2105-7-57
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of the process for analyzing microbial communities using FastGroupII. A) Protocol for high-throughput sequencing of environmental microbial communities. B) Protocol for 16S rDNA analyses used in FastGroupII. Sequences are trimmed and dereplicated according to user-specified parameters. FastGroupII can perform rarefaction analysis and calculate the Chao1 richness estimator and the Shannon-Wiener diversity index. The output from FastGroupII is formatted for submission to sequence classification programs such as BLAST [10] and RDP Classifier [11].
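The rarefaction step named in the caption can be illustrated with a short sketch. The source does not say how FastGroupII computes its rarefaction curves, so the analytical (hypergeometric) expectation used here is an assumption, and the function name `rarefy` and the example group sizes are hypothetical.

```python
import math

def rarefy(group_sizes, n):
    """Expected number of groups seen in a random subsample of n sequences.

    Assumption: uses the analytical hypergeometric expectation, which may
    differ from the procedure FastGroupII actually implements.
    group_sizes: number of sequences in each group (one entry per group).
    """
    total = sum(group_sizes)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of sequences")
    denom = math.comb(total, n)
    # For each group, 1 - P(no member of that group is drawn).
    return sum(1 - math.comb(total - k, n) / denom for k in group_sizes)

# Hypothetical community: one group of 2 sequences and two singletons.
sizes = [2, 1, 1]
curve = [rarefy(sizes, n) for n in range(sum(sizes) + 1)]
```

Plotting `curve` against the subsample size gives a rarefaction curve; it rises from 0 to the observed number of groups once every sequence is included.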
Figure 2. FastGroupII online analysis tool at FastGroupII Tools [6]. A FASTA-formatted file containing the raw 16S rDNA sequences is first uploaded or pasted as the input file. The user then specifies the trimming and grouping criteria and selects the desired output. After submission, the analysis is performed on the remote server and the results are returned to the user on the same web page.
Comparison of different grouping algorithms available within FastGroupII and DOTUR. A total of 621 16S rDNA sequences were grouped 20 times using the PSI, PSI with Gaps, and Seq-Match methods. During each separate grouping, Query Sequences were chosen at random to determine if there was any effect of input order. Data from these 20 groupings are shown as the average ± standard deviation. The Tree-parsing and DOTUR methods use global alignments, so randomization was not used. The 3 methods in DOTUR use the PHYLIP distance matrix generated from a global alignment in ClustalW (FN: Furthest Neighbor, NN: Nearest Neighbor, AN: Average Neighbor).
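| Metric | PSI | PSI with Gaps | Seq-Match | Tree-parsing | DOTUR (FN) | DOTUR (NN) | DOTUR (AN) |
|---|---|---|---|---|---|---|---|
| # of groups | 209 ± 2 | 160 ± 4 | 140 ± 3 | 200 | 132 | 122 | 126 |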
| Richness (Chao1) | 599 ± 27 | 359 ± 22 | 281 ± 8 | 440 | 249 | 241 | 246 |
| Diversity (Shannon-Wiener) | 3.98 ± 0.04 | 3.62 ± 0.10 | 3.35 ± 0.19 | 4.5 | 3.58 | 3.04 | 3.07 |
| # of singletons | 148 ± 2 | 99.7 ± 3.2 | 80.8 ± 1.7 | 120 | 72 | 69 | 71 |
| # of doubletons | 28.2 ± 1.5 | 25.3 ± 2.7 | 23.2 ± 0.9 | 29 | 22 | 20 | 21 |
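The richness and diversity rows can be related to the group, singleton, and doubleton counts with a small sketch. It assumes the classic Chao1 form, S_obs + F1² / (2·F2), and a Shannon-Wiener index with natural logarithms; the source does not state which variants FastGroupII or DOTUR use, and the function names are hypothetical.

```python
import math

def chao1_from_counts(s_obs, f1, f2):
    """Classic Chao1 from observed groups, singletons (f1), doubletons (f2)."""
    if f2 == 0:
        # Assumed fallback: bias-corrected form, which avoids division by zero.
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 ** 2 / (2.0 * f2)

def chao1(group_sizes):
    """Chao1 computed from a list with one size per group."""
    f1 = sum(1 for n in group_sizes if n == 1)
    f2 = sum(1 for n in group_sizes if n == 2)
    return chao1_from_counts(len(group_sizes), f1, f2)

def shannon(group_sizes):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i); natural log is assumed."""
    total = sum(group_sizes)
    return -sum((n / total) * math.log(n / total) for n in group_sizes)

# Counts taken from the table column reporting 122 groups,
# 69 singletons and 20 doubletons.
estimate = chao1_from_counts(122, 69, 20)  # 122 + 4761 / 40 = 241.025
```

Under these assumptions the estimate of about 241 agrees with the richness value reported in that column. Columns given as averages over 20 randomized groupings will not reproduce exactly from their averaged counts.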
Speed of the 4 grouping methods in FastGroupII, and a comparison with FastGroup 1.0. The time in seconds was determined by trimming and grouping the 16S rDNA test dataset found on the FastGroupII website. A total of 621 sequences were dereplicated. A percentage sequence identity of 97% was used to group similar sequences in the PSI, PSI with Gaps, and Tree-parsing methods. A percentage sequence identity of 83% was used in the Seq-Match method.
| Method | Time (s) |
|---|---|
| PSI | 2 |
| PSI with Gaps | 5 |
| Seq-Match | 10 |
| Tree-parsing | 7152 (ClustalW) + 0.1 (tree-parsing time) |
| FastGroup 1.0 | 360 |
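Grouping at a percentage sequence identity threshold can be sketched as a greedy pass in which each sequence is compared against one query sequence per existing group. This is only an illustration of the general idea: the position-by-position identity measure, the first-match rule, and the names `percent_identity` and `greedy_group` are assumptions, not FastGroupII's actual PSI implementation.

```python
def percent_identity(a, b):
    """Ungapped percent identity over the shorter of two sequences.

    Assumption: a simplified stand-in for a real PSI calculation.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n

def greedy_group(seqs, threshold=97.0):
    """Assign each sequence to the first group whose query sequence matches."""
    groups = []  # each group is a list; its first element is the query sequence
    for s in seqs:
        for g in groups:
            if percent_identity(s, g[0]) >= threshold:
                g.append(s)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([s])
    return groups

# Hypothetical toy input: two near-identical sequences and one unrelated one.
toy = ["A" * 100, "A" * 99 + "C", "C" * 100]
sizes = [len(g) for g in greedy_group(toy)]  # [2, 1]
```

Because the first sequence placed in a group serves as its query sequence, the result of a greedy pass like this can depend on input order, which is why randomizing the order is a useful check.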
Figure 3. Rank-abundance curves predicted from the test dataset using FastGroupII and methods in DOTUR. The curves reveal similar grouping patterns predicted using the different methods. For clarity, the tails of singletons were excluded from the figure.
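A rank-abundance curve of this kind can be derived from group sizes alone. The following is a minimal sketch; the function name `rank_abundance` and the example sizes are hypothetical, and dropping singletons mirrors the truncation described in the caption.

```python
def rank_abundance(group_sizes, drop_singletons=False):
    """Return (rank, abundance) pairs, with the most abundant group at rank 1."""
    sizes = sorted(group_sizes, reverse=True)
    if drop_singletons:
        # Omit the long tail of groups that contain a single sequence.
        sizes = [n for n in sizes if n > 1]
    return list(enumerate(sizes, start=1))

# Hypothetical group sizes from one grouping run.
points = rank_abundance([1, 5, 2, 1, 3], drop_singletons=True)
# points == [(1, 5), (2, 3), (3, 2)]
```

Plotting abundance against rank for each grouping method on the same axes allows the curves to be compared directly.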