Abstract
SUMMARY: In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ~10,000 times faster than BLASTX, while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and producing gene and taxon abundance profiles that are highly correlated with those obtained with BLASTX. PAUDA requires <80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil, for which a previous BLASTX analysis (on a subset of 176 million reads) reportedly required 800,000 CPU hours, and leads to the same clustering of samples by functional profiles.
AVAILABILITY: PAUDA is freely available from: http://ab.inf.uni-tuebingen.de/software/pauda. Supplementary method details are also available from this website.
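The CPU-hour figures in the summary refer to datasets of different sizes, so a per-read comparison is more informative than the raw ratio. The following is a minimal sketch of that arithmetic using only the numbers quoted above; the constant and function names are illustrative, and the PAUDA figure is an upper bound ("<80 CPU hours"), so the result is a lower bound on the per-read speed-up.

```python
# Figures quoted in the summary above.
PAUDA_CPU_HOURS = 80          # upper bound ("<80 CPU hours")
PAUDA_READS = 246_000_000
BLASTX_CPU_HOURS = 800_000    # "reportedly required"
BLASTX_READS = 176_000_000    # BLASTX was run on a subset of the reads


def cpu_seconds_per_read(cpu_hours: float, reads: int) -> float:
    """Average CPU time spent per read, in seconds."""
    return cpu_hours * 3600 / reads


pauda_cost = cpu_seconds_per_read(PAUDA_CPU_HOURS, PAUDA_READS)
blastx_cost = cpu_seconds_per_read(BLASTX_CPU_HOURS, BLASTX_READS)
per_read_speedup = blastx_cost / pauda_cost

print(f"PAUDA:  {pauda_cost:.5f} CPU s/read (upper bound)")
print(f"BLASTX: {blastx_cost:.2f} CPU s/read")
print(f"per-read speed-up: at least {per_read_speedup:,.0f}x")
```

Under these figures the per-read speed-up works out to roughly 14,000, the same order of magnitude as the ~10,000-fold figure stated in the summary.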
Year: 2013 PMID: 23658416 PMCID: PMC3866550 DOI: 10.1093/bioinformatics/btt254
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. An overview of the PAUDA approach.
Alignment of Illumina reads from permafrost data against the KEGG database
| Method (a) | Time in min (b) | Speed-up (c) | Reads assigned (d) | Reads assigned, % (d) | KOs (e) | KOs, % (e) | True KOs (f) | True KOs, % (f) |
|---|---|---|---|---|---|---|---|---|
| PAUDA | 7 | | 155 824 | 33% | 4182 | 78% | 1717 | 99.0% |
| RAPSearch2 | 510 | | 449 144 | 96% | 5237 | 98% | 1712 | 98.7% |
| BLASTX | 30 240 | | 465 588 | 100% | 5363 | 100% | 1735 | 100% |

(a) The method used.
(b) The number of wall-clock minutes required on 48 cores to process all 12 datasets.
(c) The speed-up over BLASTX.
(d) The number and percentage of reads that obtain a KO assignment.
(e) The number and percentage of different KO groups identified.
(f) The number of ‘true’ KO groups identified, defined as those that account for 99% of all reads with BLASTX hits.
Percentages are relative to the results obtained by BLASTX. Note that half of the runtime reported here for PAUDA is start-up overhead, so the speed-up is higher on larger datasets.
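Two quantities in the table can be derived from simple rules: the speed-up is the BLASTX runtime divided by each method's runtime, and the 'true' KO groups are those that account for 99% of all reads with BLASTX hits. The sketch below illustrates both. The function names are illustrative, the ranking-and-accumulating reading of the 99% rule is an assumption about how the definition is applied, and the KO read counts are hypothetical toy values, not data from the paper.

```python
from typing import Dict, List

# Wall-clock minutes from the Time column of the table above.
TIME_MIN = {"PAUDA": 7, "RAPSearch2": 510, "BLASTX": 30_240}


def speedup_over_blastx(times: Dict[str, float]) -> Dict[str, float]:
    """Speed-up of each method, defined as BLASTX time / method time."""
    baseline = times["BLASTX"]
    return {method: baseline / t for method, t in times.items()}


def true_kos(blastx_counts: Dict[str, int], coverage: float = 0.99) -> List[str]:
    """Most abundant KOs that together cover `coverage` of assigned reads.

    Assumption: KOs are ranked by BLASTX read count and added one by one
    until the cumulative count reaches the coverage threshold.
    """
    total = sum(blastx_counts.values())
    ranked = sorted(blastx_counts.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    covered = 0
    for ko, count in ranked:
        if covered >= coverage * total:
            break
        selected.append(ko)
        covered += count
    return selected


# Hypothetical read counts for five KO groups (not data from the paper).
toy_counts = {"K00001": 600, "K00002": 300, "K00003": 95, "K00004": 4, "K00005": 1}

speedups = speedup_over_blastx(TIME_MIN)
kept = true_kos(toy_counts)

for method, factor in speedups.items():
    print(f"{method}: {factor:.0f}x")
print("true KOs in toy example:", kept)
```

With the runtimes in the table, this gives a speed-up of 4320 for PAUDA and about 59 for RAPSearch2 on this dataset. In the toy example the two rarest KO groups fall outside the 99% threshold and are dropped.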
Fig. 2. KEGG comparison of PAUDA and BLASTX. Left: Each true KO group is represented by a dot with coordinates that correspond to the number of reads assigned to the KO group by BLASTX (on the x-axis) and PAUDA (on the y-axis). Right: To show the low-abundance KO groups more clearly, the same data are plotted on a logarithmic scale.
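The comparison in this figure can be summarized numerically as a correlation between per-KO read counts from the two methods, on both the linear and the logarithmic scale. The following is a minimal sketch, not the authors' analysis: the read counts are hypothetical, the function names are illustrative, and the `log10(count + 1)` transform is an assumption chosen so that KO groups with zero reads stay finite.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def log_scale(counts: Sequence[int]) -> List[float]:
    """log10(count + 1); the +1 keeps zero counts finite (an assumption)."""
    return [math.log10(c + 1) for c in counts]


# Hypothetical per-KO read counts (not data from the paper).
blastx_reads = [12000, 3400, 800, 150, 40, 6]
pauda_reads = [4100, 1000, 290, 45, 15, 1]

r_linear = pearson(blastx_reads, pauda_reads)
r_log = pearson(log_scale(blastx_reads), log_scale(pauda_reads))

print(f"linear-scale r = {r_linear:.3f}")
print(f"log-scale r    = {r_log:.3f}")
```

The linear-scale coefficient is dominated by the few most abundant KO groups, whereas the log-scale coefficient gives low-abundance groups comparable weight, which mirrors the reason the figure shows both panels.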