Alberto Policriti, Nicola Prezza.
Abstract
BACKGROUND: The high throughput of modern NGS sequencers, coupled with the huge sizes of the genomes currently analysed, poses ever greater algorithmic challenges for aligning short reads quickly and accurately against a reference sequence. A crucial additional requirement is that the data structures used should be light. Available modern solutions are usually a compromise between these constraints: in particular, indexes based on the Burrows-Wheeler transform offer reduced memory requirements at the price of lower sensitivity, while hash-based text indexes guarantee high sensitivity at the price of significant memory consumption.
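The two index families contrasted above can be illustrated on a toy scale. The following is a minimal sketch, not the implementation used by BW-ERNE or any other tool mentioned here: `kmer_hash_index` is a hypothetical plain k-mer dictionary (fast lookups, but it stores every position explicitly), and `count_occurrences` is the textbook backward search over a Burrows-Wheeler transform built by naive suffix sorting. All function names are assumptions made for illustration, and the occurrence tables are kept uncompressed for readability, so the sketch shows the query mechanics rather than the memory savings of a real succinct index.

```python
from collections import Counter, defaultdict


def kmer_hash_index(text, k):
    """Hash-based index: map every k-mer to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def bwt_via_suffix_array(text):
    """Return the BWT of text + '$' using naive suffix sorting (toy sizes only)."""
    text += "$"  # '$' is a sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)


def count_occurrences(bwt, pattern):
    """Backward search: count exact occurrences of pattern using only the BWT."""
    counts = Counter(bwt)
    # first[c] = number of characters in the text strictly smaller than c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    lo, hi = 0, len(bwt)  # half-open interval of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGTTACG"
bwt = bwt_via_suffix_array(reference)
hash_index = kmer_hash_index(reference, 3)
print(count_occurrences(bwt, "ACG"), hash_index["ACG"])
```

The hash index answers a lookup in one dictionary access but keeps a position list for every k-mer, whereas the backward search needs only the transformed text plus counting tables, which real tools store in compressed or succinct form.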
Year: 2015 PMID: 26051265 PMCID: PMC4464037 DOI: 10.1186/1471-2105-16-S9-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Space required by the tested tools (Vitis Vinifera genome). This plot compares the space needed by some of the most popular short-read aligners to index the Vitis Vinifera genome. We report space on disk (storage of the index) and in RAM (structures loaded in memory). Full-text indexes such as the hash data structure implemented in ERNE require much more space than the succinct and compressed indexes used by the other tools. Notice that the space required by succinct (BW-ERNE) and compressed (Bowtie, SOAP2, BWA) indexes is almost the same in DNA indexing, because DNA is, in general, extremely difficult to compress.
Figure 2. Results on 5M 100 bp single-end reads simulated using the tool SimSeq (Vitis Vinifera genome). These experiments allowed us to judge how the presence of reliable base qualities affected the quality-aware strategies of Bowtie and BW-ERNE. The left plot shows that BW-ERNE makes the best use of reliable base qualities: our tool was several times faster than the other tools, while at the same time correctly aligning the highest number of reads (together with ERNE).
Results of the GCAT experiment (data from the GCAT website).
| Tool | Total Reads | Correct | Incorrect | Unmapped |
|---|---|---|---|---|
| BW-ERNE | 11,945,249 | 97.30% | 2.311% | 0.3900% |
| Bowtie2 | 11,945,249 | 93.52% | 5.284% | 1.192% |
| Novoalign | 11,945,249 | 97.47% | 0.08329% | 2.445% |
| Novoalign3 | 11,945,249 | 97.47% | 0.08300% | 2.442% |
| BWA | 11,945,249 | 93.91% | 1.707% | 4.385% |
| BWA-SW | 11,971,702 | 94.29% | 4.139% | 1.576% |
| BWA-MEM | 11,951,583 | 97.47% | 2.515% | 0.01361% |
BW-ERNE ranks among the most precise tools, correctly aligning a number of reads comparable to that of slower aligners such as Novoalign.
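As a quick sanity check on the table above, the following sketch re-enters the reported percentages, verifies that each row sums to roughly 100%, and ranks the tools by correctly aligned reads. The names `ROWS`, `row_sum` and `approx_correct_reads` are hypothetical helpers introduced for illustration; the read counts they produce are only approximations derived from the rounded percentages, not figures reported by GCAT.

```python
# tool: (total reads, % correct, % incorrect, % unmapped), copied from the table
ROWS = {
    "BW-ERNE":    (11_945_249, 97.30, 2.311, 0.3900),
    "Bowtie2":    (11_945_249, 93.52, 5.284, 1.192),
    "Novoalign":  (11_945_249, 97.47, 0.08329, 2.445),
    "Novoalign3": (11_945_249, 97.47, 0.08300, 2.442),
    "BWA":        (11_945_249, 93.91, 1.707, 4.385),
    "BWA-SW":     (11_971_702, 94.29, 4.139, 1.576),
    "BWA-MEM":    (11_951_583, 97.47, 2.515, 0.01361),
}


def row_sum(tool):
    """Sum of the three percentage columns; should be close to 100."""
    _, correct, incorrect, unmapped = ROWS[tool]
    return correct + incorrect + unmapped


def approx_correct_reads(tool):
    """Approximate number of correctly aligned reads (rounded percentages)."""
    total, correct, _, _ = ROWS[tool]
    return round(total * correct / 100)


# Rank tools from highest to lowest percentage of correctly aligned reads.
ranking = sorted(ROWS, key=lambda tool: ROWS[tool][1], reverse=True)
for tool in ranking:
    print(f"{tool:<11} sum={row_sum(tool):8.3f}%  correct~{approx_correct_reads(tool):,}")
```

Because the published percentages are rounded to four significant digits, the row sums deviate slightly from 100%, which is expected.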