Andreas Döring, David Weese, Tobias Rausch, Knut Reinert.
Abstract
BACKGROUND: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.
Year: 2008 PMID: 18184432 PMCID: PMC2246154 DOI: 10.1186/1471-2105-9-11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Genome comparison tools and their algorithmic components.
Figure 2. SeqAn Contents Overview.
Figure 3. Runtimes of String Matching Algorithms. We compared three exact string matching algorithms from SeqAn with the member function basic_string::find of the standard library, as it was implemented for Microsoft Visual C++. The left figure shows the runtimes (in ms) for searching a DNA sequence (human chromosome 21), the right figure for searching a protein database. The search patterns were taken at random from the sequence. The figures show the average time needed to find all occurrences of patterns of a given length.
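For orientation, the baseline measured in Figure 3 (reporting all occurrences of a pattern with the standard library's `basic_string::find`) can be sketched as follows. This is only an illustrative sketch, not the benchmark code used for the figure: the helper name `find_all` is hypothetical, overlapping occurrences are assumed to count, and no SeqAn interface is shown.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the start positions of all (possibly
// overlapping) occurrences of `pattern` in `text`, using
// std::string::find as the only search primitive.
std::vector<std::size_t> find_all(const std::string& text,
                                  const std::string& pattern) {
    std::vector<std::size_t> hits;
    if (pattern.empty()) {
        return hits;  // an empty pattern would match at every position
    }
    // Restart the search one character after each hit so that
    // overlapping occurrences are reported as well.
    for (std::size_t pos = text.find(pattern);
         pos != std::string::npos;
         pos = text.find(pattern, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}
```

Calling `find_all("ACGTACGT", "ACG")` yields the positions 0 and 4. A timing experiment in the spirit of the figure would wrap such a call in a loop over many randomly chosen patterns of a fixed length and average the elapsed time.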
Runtimes and internal space requirements for computing sequence alignments. The table shows average time and space requirements for aligning the genomes of two human influenza viruses, each of length about 15.6 kbp. Runtimes printed in bold face show for each library the time of the fastest algorithm for computing an alignment using edit distance.
| time (s) | space (MB) | time (s) | space (MB) | |
| Needleman-Wunsch | 3.3 | 236 | 6.3 | 236 |
| Hirschberg | 14.7 | 4 | ||
| Myers-Hirschberg | 3 | |||
| Needleman-Wunsch | 245 | |||
| Hirschberg | 6.6 | 14 | ||
| 2100 | 28.0 | ≈6000 | ||
| 933 | ||||
| 2000 | 93 | ≈6000 | ||
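The gap in space requirements between the quadratic-space and linear-space rows above comes from how much of the dynamic programming matrix is kept in memory. The sketch below is not taken from SeqAn or any of the compared libraries. It computes only the edit distance score with two matrix rows, and the function name `edit_distance` is an assumption made for illustration. It is not Hirschberg's algorithm, which additionally recovers the alignment itself in linear space by divide and conquer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch: unit-cost edit distance between a and b.
// Only two rows of the DP matrix are stored, so memory use is
// proportional to b.size() instead of a.size() * b.size().
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);

    // Row 0: turning the empty prefix of a into b[0..j) costs j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = j;
    }

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // deleting the first i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            const std::size_t del = prev[j] + 1;      // delete a[i-1]
            const std::size_t ins = curr[j - 1] + 1;  // insert b[j-1]
            curr[j] = std::min({subst, del, ins});
        }
        std::swap(prev, curr);  // the finished row becomes the previous row
    }
    return prev[b.size()];
}
```

For example, `edit_distance("kitten", "sitting")` returns 3. Storing the full matrix, as a textbook Needleman-Wunsch traceback does, needs memory proportional to the product of the two sequence lengths, which is consistent with the much larger space figures in the quadratic-space rows.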
Runtimes and internal space requirements for finding MUMs. We compared MUMmer 3.19 [8], MGA [9], and SeqAn for different DNA sequences on a 3.2 GHz Intel Xeon computer with 3 GB of internal memory running Linux. Because MUMmer finds MUMs of not more than two sequences, its results on the Chlamydia and Escherichia coli strains are left empty. For the last dataset, we used SeqAn's external memory data structures to limit the internal memory consumption.
| sequence | size (Mbp) | MUMmer time (m:s) | MUMmer space (MB) | MGA time (m:s) | MGA space (MB) | SeqAn time (m:s) | SeqAn space (MB) |
| C. trachomatis D/UW-3/CX | 1.043 | | | | | | |
| C. muridarum Nigg | 1.073 | - | - | 0:06 | 33.8 | 0:04 | 31.6 |
| C. trachomatis A/HAR-13 | 1.044 | | | | | | |
| E. coli K12 | 4.640 | | | | | | |
| E. coli O157:H7 str. Sakai | 5.498 | | | | | | |
| E. coli CFT073 | 5.231 | - | - | 105:40 | 353.5 | 0:58 | 304.6 |
| E. coli UTI89 | 5.066 | | | | | | |
| E. coli 536 | 4.939 | | | | | | |
| E. coli APEC O1 | 5.082 | | | | | | |
| H. sapiens (chr. 21) | 46.94 | 2:25 | 568 | 4:44 | 1188 | 4:06 | 1307 |
| M. musculus (chr. 16) | 98.25 | | | | | | |
| H. sapiens (chr. 16) | 98.25 | 2:55 | 1362 | 18:48 | 1500 | 5:38 | 1627 |
| P. troglodytes (chr. 16) | 88.83 | | | | | | |
| H. sapiens (chr. 1) | 247.2 | insufficient memory | insufficient memory | insufficient memory | insufficient memory | 66:20 | 510 |
| M. musculus (chr. 1) | 197.1 | | | | | | |