Bernd W Brandt, Marc J Bonder, Susan M Huse, Egija Zaura.
Abstract
Amplicon sequencing of the hypervariable regions of the small subunit ribosomal RNA gene is a widely accepted method for identifying the members of complex bacterial communities. Several rRNA gene sequence reference databases can be used to assign taxonomic names to the sequencing reads using BLAST, USEARCH, GAST or the RDP classifier. Next-generation sequencing methods produce ample reads, but they are short, currently ∼100-450 nt (depending on the technology), as compared to the full rRNA gene of ∼1550 nt. It is important, therefore, to select the right rRNA gene region for sequencing. The primers should amplify the species of interest and the hypervariable regions should differentiate their taxonomy. Here, we introduce TaxMan: a web-based tool that trims reference sequences based on user-selected primer pairs and returns an assessment of the primer specificity by taxa. It allows interactive plotting of taxa, both amplified and missed in silico by the primers used. Additionally, using the trimmed sequences improves the speed of sequence matching algorithms. The smaller database greatly improves run times (up to 98%) and memory usage, not only of similarity searching (BLAST), but also of chimera checking (UCHIME) and of clustering the reads (UCLUST). TaxMan is available at http://www.ibi.vu.nl/programs/taxmanwww/.
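The primer-based trimming described in the abstract can be illustrated with a minimal in silico PCR sketch. This is not TaxMan's implementation: it assumes exact matching of IUPAC-degenerate primers (no mismatches allowed), assumes the primers themselves are excluded from the trimmed region, and uses hypothetical function names (`primer_regex`, `reverse_complement`, `trim_to_amplicon`) and a made-up toy sequence.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, keeping IUPAC codes valid."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def trim_to_amplicon(seq, forward, reverse):
    """Return the region between the primers, or None if either primer
    does not match (the sequence is then 'missed' in silico).

    Assumption: exact matching only, primers excluded from the result.
    """
    seq = seq.upper().replace("U", "T")
    fwd = primer_regex(forward).search(seq)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so search for
    # its reverse complement downstream of the forward primer.
    rev = primer_regex(reverse_complement(reverse)).search(seq, fwd.end())
    if rev is None:
        return None
    return seq[fwd.end():rev.start()]


# Toy reference sequence (hypothetical, not real rRNA data).
reference = "GGACGTACTTTTCCCCGATCGAAA"
print(trim_to_amplicon(reference, "ACGTRC", "TCGATC"))  # amplified
print(trim_to_amplicon("GGGGCCCC", "ACGTRC", "TCGATC"))  # missed -> None
```

Applying such a function to every reference sequence yields both the smaller, trimmed database and the list of sequences the primer pair would miss.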
Year: 2012 PMID: 22618877 PMCID: PMC3394339 DOI: 10.1093/nar/gks418
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Data on CPU time, run time (hr:mm:ss format), physical memory (mem) and virtual memory (vmem) usage (in kb) as reported by the cluster software (PBS). BLAST was run on eight cores, the other programs on one core. Percentage improvement is calculated as the relative difference (original − trimmed)/original, expressed as a percentage.
| Program | Measure | Original set | Trimmed set | % Improvement | Fold improvement |
|---|---|---|---|---|---|
| BLAST | CPU time | 9:17:27 | 4:05:52 | 56 | 2.3 |
| | run time | 1:12:48 | 0:41:26 | 43 | 1.8 |
| | mem | 396 360 | 189 848 | 52 | 2.1 |
| | vmem | 1 297 204 | 974 756 | 25 | 1.3 |
| UCLUST ref | CPU time | 0:05:17 | 0:00:47 | 85 | 6.7 |
| | run time | 0:05:25 | 0:00:56 | 83 | 5.8 |
| | mem | 9 456 156 | 699 388 | 93 | 14 |
| | vmem | 12 575 444 | 869 316 | 93 | 14 |
| UCLUST ref opt | CPU time | 73:46:16 | 1:14:17 | 98 | 60 |
| | run time | 73:54:41 | 1:14:35 | 98 | 59 |
| | mem | 9 374 752 | 1 384 260 | 85 | 6.8 |
| | vmem | 12 473 384 | 1 780 908 | 86 | 7.0 |
| UCHIME | CPU time | 29:57:17 | 3:13:00 | 89 | 9.3 |
| | run time | 30:00:50 | 3:13:26 | 89 | 9.3 |
| | mem | 1 009 776 | 164 896 | 84 | 6.1 |
| | vmem | 1 118 688 | 267 052 | 76 | 4.2 |
a UCLUST ref: UCLUST reference mode.
b UCLUST ref opt: UCLUST reference optimal mode.
c The concordance is 93.5%.
The fold improvement is the ratio original/trimmed.
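The two improvement measures defined for the table can be reproduced with a short calculation. The sketch below is illustrative only; the helper names `to_seconds` and `improvement` are hypothetical, and the inputs are the BLAST values from the table.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement(original, trimmed):
    """Return (percentage improvement, fold improvement).

    Percentage improvement: 100 * (original - trimmed) / original
    Fold improvement:       original / trimmed
    """
    pct = 100 * (original - trimmed) / original
    fold = original / trimmed
    return pct, fold


# BLAST CPU time, original versus trimmed reference set.
pct, fold = improvement(to_seconds("9:17:27"), to_seconds("4:05:52"))
print(f"CPU time: {pct:.0f}% improvement, {fold:.1f}-fold")

# BLAST physical memory in kb (no time conversion needed).
pct, fold = improvement(396360, 189848)
print(f"mem: {pct:.0f}% improvement, {fold:.1f}-fold")
```

The two measures carry the same information: a fold improvement of f corresponds to a percentage improvement of 100 × (1 − 1/f), which is why a 98% improvement appears as roughly 60-fold.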
Figure 1. Partial tree view of the amplicons based on the CORE database. For each node, it shows the number of sequences targeted by the given primers, followed by the number in the original reference and the percentage. The data used for the tree (except the percentages) are downloadable as the tab-delimited lineage file.
Figure 2. An example of pie plots for the amplicons (CORE database). Each chart plots the distribution of sub-categories within one of three taxonomic levels, named in the chart titles. The percentage threshold is 0 for all plots. The top panel series is obtained by clicking on Bacteria (Root pie) and then Actinobacteria (Bacteria pie). Clicking a pie slice or legend label produces the next chart and hides the legend of the previous one (except the legend of the Root pie). The bottom panel series is similar, but for the phylum Actinobacteria it shows a plot of differences, indicated by the pink header. Here, the data refer to the number of sequences missed by the amplicons as compared with the reference data. For the class Actinobacteridae, 46 out of 110 sequences are missing (see legend). The ‘100%’ in the Actinobacteridae pie slice indicates that all missed sequences in the phylum Actinobacteria belong to the Actinobacteridae class. For Coriobacteridae, no sequences are missing (indicated by 0/9 in the legend). Hovering over a ‘legend’ label always displays the number of targeted sequences in the pie (Actinobacteridae; cnt: 64/110), so this information is the same in both types of pies for Actinobacteria.
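The counts quoted in the Figure 2 caption can be turned into the two kinds of percentages the pies display. The following is a minimal sketch, not TaxMan's code: the function name `coverage` and the dictionary layout are hypothetical, and only the two categories named in the caption are included (any other categories in the figure are omitted here).

```python
def coverage(counts):
    """Summarize primer coverage per taxon.

    counts maps taxon -> (targeted sequences, sequences in the reference).
    For each taxon, report the percentage targeted, the number missed,
    and that taxon's share of all missed sequences in the parent group.
    """
    missed = {taxon: total - hit for taxon, (hit, total) in counts.items()}
    total_missed = sum(missed.values())
    report = {}
    for taxon, (hit, total) in counts.items():
        share = 100 * missed[taxon] / total_missed if total_missed else 0.0
        report[taxon] = {
            "targeted_pct": 100 * hit / total,
            "missed": missed[taxon],
            "missed_share_pct": share,
        }
    return report


# Counts taken from the Figure 2 caption (phylum Actinobacteria).
actinobacteria = {
    "Actinobacteridae": (64, 110),  # 46 of 110 missed
    "Coriobacteridae": (9, 9),      # 0 of 9 missed
}
for taxon, stats in coverage(actinobacteria).items():
    print(taxon, stats)
```

With these inputs, Actinobacteridae accounts for all missed sequences, which corresponds to the ‘100%’ slice in the differences plot.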