Hannes Hauswedell, Jochen Singer, Knut Reinert.
Abstract
MOTIVATION: Next-generation sequencing technologies produce unprecedented amounts of data, leading to completely new research fields. One of these is metagenomics, the study of large-size DNA samples containing a multitude of diverse organisms. A key problem in metagenomics is to functionally and taxonomically classify the sequenced DNA, to which end the well-known BLAST program is usually used. But BLAST has dramatic resource requirements at metagenomic scales of data, imposing a high financial or technical burden on the researcher. Multiple attempts have been made to overcome these limitations and present a viable alternative to BLAST.
Year: 2014 PMID: 25161219 PMCID: PMC4147892 DOI: 10.1093/bioinformatics/btu439
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Pipeline of Lambda (BlastX mode). The lambda_indexer takes as input a file with the subject sequences and, optionally, an interval file computed by seqmasker, and creates the index for seed identification. The query sequences are translated from the DNA alphabet to the amino acid alphabet, reduced, and then converted into a search trie. Afterwards, the search trie is used together with the pre-computed index to find candidate regions, which are then extended and verified.
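The translation and reduction steps of the pipeline can be sketched as follows. This is a minimal illustration, not Lambda's implementation: the standard genetic code is used for translation, while the ten-group reduction in `GROUPS` and the function names are assumptions chosen for the example, since the grouping used by Lambda is not stated here.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# ASSUMPTION: an illustrative 10-group reduced amino acid alphabet.
# Each residue is replaced by the first letter of its group.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def translate(dna):
    """Translate one reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the three forward and three reverse-complement translations."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    return [translate(strand[off:]) for strand in (forward, reverse) for off in range(3)]


def reduce_alphabet(protein):
    """Map each residue to its group representative; other symbols pass through."""
    return "".join(REDUCE.get(aa, aa) for aa in protein)


if __name__ == "__main__":
    for frame in six_frames("ATGGCC"):
        print(frame, reduce_alphabet(frame))
```

Searching in a reduced alphabet lets residues from the same group count as a seed match, which is the usual motivation for this step; the specific trade-off depends on the grouping chosen.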
Fig. 2. Banded right extension of the alignment. The seed alignment is orange and the right extension purple. White solid cells are computed; grey cells are outside the band; tilted cells would have been computed but were not, owing to the x-drop; the cell marked + contains the last and global maximum.
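The idea behind a banded extension with an x-drop can be sketched as below. This is a simplified illustration under stated assumptions, not the extension routine used by Lambda: it scores nucleotide-style strings with hypothetical `match`, `mismatch`, and linear `gap` values, restricts the dynamic programming matrix to a diagonal band of half-width `band`, and skips every cell whose score falls more than `xdrop` below the best score seen so far.

```python
def xdrop_extend(query, subject, band=2, xdrop=6, match=2, mismatch=-3, gap=-4):
    """Extend to the right from cell (0, 0), i.e. from the end of a seed.

    Returns (best_score, (i, j)), where i and j are the numbers of query and
    subject characters covered by the best-scoring extension.
    All scoring parameters are illustrative assumptions.
    """
    best, best_cell = 0, (0, 0)
    prev = {}  # scores of surviving cells in the previous row, keyed by column
    for i in range(len(query) + 1):
        row = {}
        lo = max(0, i - band)
        hi = min(len(subject), i + band)
        for j in range(lo, hi + 1):  # cells outside [lo, hi] are out of the band
            if i == 0 and j == 0:
                score = 0
            else:
                candidates = []
                if i > 0 and j > 0 and (j - 1) in prev:  # diagonal move
                    same = query[i - 1] == subject[j - 1]
                    candidates.append(prev[j - 1] + (match if same else mismatch))
                if i > 0 and j in prev:  # gap in subject
                    candidates.append(prev[j] + gap)
                if j > 0 and (j - 1) in row:  # gap in query
                    candidates.append(row[j - 1] + gap)
                if not candidates:
                    continue  # no surviving predecessor
                score = max(candidates)
            if score < best - xdrop:
                continue  # x-drop: this cell is not kept
            row[j] = score
            if score > best:
                best, best_cell = score, (i, j)
        if not row:
            break  # every cell in the row was dropped, so the extension stops
        prev = row
    return best, best_cell


if __name__ == "__main__":
    print(xdrop_extend("ACGTACGT", "ACGTACGT"))      # identical sequences
    print(xdrop_extend("ACGTTTTTTT", "ACGTGGGGGG"))  # extension stops after ACGT
```

The band bounds the work per row, and the x-drop ends the extension early once no cell in a row can plausibly recover, which is why only part of the banded region is computed.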
Results overview of dataset I (Illumina reads)
Note: Relative measures are highlighted and given in comparison with BLAST; green indicates the best results of a column, followed by yellow (satisfactory) and red (worst). The ‘≳’ column contains only recalls with bit scores ≳ BLAST’s.
Results overview of dataset II (Sanger reads)
Note: Relative measures are highlighted and given in comparison with BLAST; green indicates the best results of a column, followed by yellow (satisfactory) and red (worst). The ‘≳’ column contains only recalls with bit scores ≳ BLAST’s.
Properties of the two datasets
| | Dataset I | Dataset II |
|---|---|---|
| Origin | Bovine Gut | Sargasso Sea |
| Average read length | 72 bp | 818 bp |
| Number of reads | 58 240 283 | 1 982 807 |
| Number of reads selected | 1 200 000 | 100 000 |
| Selected read lengths | 72 bp | |
| Minimum bit score | 42.0695 | 48.5243 |