José Salavert, Andrés Tomás, Joaquín Tárraga, Ignacio Medina, Joaquín Dopazo, Ignacio Blanquer.
Abstract
BACKGROUND: Short sequence mapping methods for Next Generation Sequencing consist of a combination of seeding techniques followed by local alignment based on dynamic programming. Most seeding algorithms rely on backward search alignment, using the Burrows-Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms perform very well, but their computational cost increases sharply when errors are allowed. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data.
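The backward search and pruned search-tree exploration described in the abstract can be illustrated with a toy sketch. It is not the paper's algorithm. It assumes a naive FM-index (the suffix array is built by sorting all suffixes, and the Occ table is stored in full), allows mismatches only (no indels), and borrows the BWA-backtrack lower-bound array `D` as a stand-in pruning rule, computing `D` with plain substring tests instead of a reverse BWT. All function names (`build_fm_index`, `extend`, `lower_bounds`, `inexact_search`) are hypothetical. Each recursion level extends the current suffix-array interval by one base, reading the read right to left. A branch is cut as soon as the remaining error budget is smaller than the lower bound for the unread prefix.

```python
# Toy sketch: FM-index backward search with bounded mismatches.
# Assumption: the pruning rule is BWA-backtrack's D-array lower bound, used as a
# stand-in; the paper's own pruning strategies may differ.

def build_fm_index(text):
    """Toy FM-index over text + '$': suffix array, C table and full Occ table.
    Naive construction by sorting all suffixes; only suitable for tiny inputs."""
    s = text + "$"  # '$' sorts before A, C, G and T
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = [s[i - 1] for i in sa]  # s[-1] is '$' when i == 0
    C, total = {}, 0
    for c in sorted(set(s)):
        C[c] = total  # number of symbols in s smaller than c
        total += s.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in C}  # occ[c][i] = count of c in bwt[:i]
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def extend(c, lo, hi, C, occ):
    """One backward-search step: SA interval [lo, hi) of W -> interval of c + W."""
    return C[c] + occ[c][lo], C[c] + occ[c][hi]


def lower_bounds(pattern, text):
    """BWA-style D array: D[i] is a lower bound on the mismatches needed
    to place pattern[:i + 1] anywhere in text (greedy split into absent pieces)."""
    D, z, start = [], 0, 0
    for i in range(len(pattern)):
        if pattern[start:i + 1] not in text:  # this piece needs at least one difference
            z += 1
            start = i + 1
        D.append(z)
    return D


def inexact_search(pattern, text, max_mm, alphabet="ACGT"):
    """Sorted start positions where pattern matches text with <= max_mm mismatches."""
    sa, C, occ = build_fm_index(text)
    D = lower_bounds(pattern, text)
    hits = set()

    def recurse(i, z, lo, hi):
        # i: next pattern index to read (right to left); z: errors still allowed
        if i < 0:  # whole pattern consumed: every suffix in [lo, hi) is a hit
            hits.update(sa[k] for k in range(lo, hi))
            return
        if z < D[i]:  # prune: the unread prefix needs more errors than remain
            return
        for c in alphabet:  # one child per base: the edges of the search tree
            if c not in C:
                continue
            nlo, nhi = extend(c, lo, hi, C, occ)
            cost = 0 if c == pattern[i] else 1
            if nlo < nhi and cost <= z:
                recurse(i - 1, z - cost, nlo, nhi)

    recurse(len(pattern) - 1, max_mm, 0, len(sa))
    return sorted(hits)
```

With `max_mm=0` the recursion reduces to exact backward search. For example, `inexact_search("ACGA", "ACGTACGTTACG", 1)` returns `[0, 4]`, while the same call with `max_mm=0` returns an empty list, pruned before a single extension step because `D` already shows that one error is unavoidable.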
Year: 2015 PMID: 25626517 PMCID: PMC4384339 DOI: 10.1186/s12859-014-0438-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Complete inexact search algorithm. Example for 2 errors; steps I, II and III are shown from top to bottom.
Results for SOAP2, Bowtie 1, BWA-backtrack and the new algorithm.
| Errors | Time | % | Count |
|---|---|---|---|
| 0 | 25 s | 0.51% | 11025 |
| 1 | 41 s | 3.22% | 71365 |
| 2 | 6 m 34 s | 10.25% | 243599 |
| | | | |
| 0 | 24 s | 0.51% | 11025 |
| 1 | 51 s | 3.22% | 71365 |
| 2 | 4 m 58 s | 10.25% | 243599 |
| 3 | 12 m 13 s | 22.53% | 594626 |
| | | | |
| 0 | 1 m 17 s | 0.51% | |
| 1 | 1 m 19 s | 3.22% | |
| 2 | 1 m 30 s | 10.29% | |
| 3 | 2 m 2 s | 22.65% | |
| 4 | 4 m 35 s | 38.73% | |
| 5 | 16 m 28 s | 55.44% | |
| 6 | 60 m 17 s | 69.78% | |
| | | | |
| 0 | 21 s | 0.51% | 11025 |
| 1 | 35 s | 3.22% | 72546 |
| 2 | 52 s | 10.29% | 253479 |
| 3 | 1 m 28 s | 22.66% | 644415 |
| 4 | 2 m 45 s | 38.74% | 1188595 |
| 5 | 7 m 24 s | 55.46% | 1820725 |
| 6 | 23 m 45 s | 69.78% | 2830556 |
| | | | |
| 0 | 21 s | 0.51% | 11025 |
| 1 | 31 s | 3.22% | 72546 |
| 2 | 51 s | 10.29% | 246841 |
| 3 | 1 m 21 s | 22.60% | 515399 |
| 4 | 2 m 1 s | 38.44% | 881503 |
| 5 | 3 m 3 s | 54.46% | 1296682 |
| 6 | 4 m 40 s | 67.46% | 1716369 |
| 7 | 7 m 12 s | 76.19% | 2135443 |
The dataset contains 2 million 250 bp reads.
Figure 2. BWT and SW tools. 2 million 250 bp reads. Execution times comparing the new algorithm, the modern mappers and the combination of both.
Figure 3. BWT and SW tools. 2 million 400 bp reads. Execution times comparing the new algorithm, the modern mappers and the combination of both.
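Figures 2 and 3 combine BWT-based mapping with SW tools. Assuming SW refers to Smith-Waterman local alignment (the dynamic-programming step the abstract mentions after seeding), a minimal score-only version looks like this. The function name `smith_waterman`, the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the linear gap penalty are illustrative assumptions; production aligners typically use affine gaps, banding and vectorization, and recover the alignment itself by traceback.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (score only, no traceback).
    Scoring values are illustrative assumptions; gaps use a linear penalty."""
    prev = [0] * (len(b) + 1)  # row i - 1 of the DP matrix H
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below 0, so an alignment can
            # restart anywhere instead of carrying a negative prefix.
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `smith_waterman("ACGTTGCA", "ACGTGCA")` scores 12: seven matches (+14) and one gap (-2). Keeping only the previous DP row reduces memory to O(len(b)), at the cost of losing the traceback.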
Figure 4. BWT and csalib runtimes. 2 million 250 bp reads. Execution times from 0 to 7 errors with a stack size of 500.