Kay Prüfer, Udo Stenzel, Michael Dannemann, Richard E Green, Michael Lachmann, Janet Kelso.
Abstract
We present a tool suited for searching for many short nucleotide sequences in large databases, allowing for a predefined number of gaps and mismatches. The command-line driven program implements a non-deterministic automata matching algorithm on a keyword tree of the search strings. Queries both with and without ambiguity codes can be searched. Search time is short for perfect matches, and retrieval time rises exponentially with the number of edits allowed. AVAILABILITY: The C++ source code for PatMaN is distributed under the GNU General Public License and has been tested on the GNU/Linux operating system. It is available from http://bioinf.eva.mpg.de/patman. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
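The matching idea in the abstract can be illustrated with a minimal Python sketch. This is not PatMaN's implementation (which is written in C++): it assumes mismatches only, with no gaps, and it assumes that an IUPAC ambiguity code in a query matches any base it stands for. The names `build_trie`, `search_with_mismatches` and the `END` marker are hypothetical.

```python
# Illustrative sketch only; not PatMaN's code. Gaps are not handled.
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
END = "$"  # hypothetical marker key storing the full query at its last node


def build_trie(queries):
    """Build a keyword tree (trie) as nested dicts, one path per query."""
    root = {}
    for query in queries:
        node = root
        for symbol in query:
            node = node.setdefault(symbol, {})
        node[END] = query
    return root


def search_with_mismatches(queries, text, max_mismatches):
    """Return sorted (start, query, mismatches) hits of queries in text."""
    root = build_trie(queries)
    hits = []
    active = []  # all (trie node, mismatches so far) states alive at once
    for i, base in enumerate(text):
        active.append((root, 0))  # a match may start at every position
        next_active = []
        for node, mismatches in active:
            for symbol, child in node.items():
                if symbol == END:
                    continue
                # An ambiguity code matches any base in its expansion.
                matched = base in IUPAC.get(symbol, symbol)
                cost = mismatches + (0 if matched else 1)
                if cost > max_mismatches:
                    continue  # prune: mismatch budget exhausted
                if END in child:
                    query = child[END]
                    hits.append((i - len(query) + 1, query, cost))
                next_active.append((child, cost))
        active = next_active
    return sorted(hits)


if __name__ == "__main__":
    for hit in search_with_mismatches(["CCC", "GA", "GT"], "GACCAGT", 1):
        print(hit)
```

Keeping every live (node, mismatch count) pair at once is what makes the walk non-deterministic. With a budget of zero, only exact paths survive; each additional allowed mismatch lets many more branches of the tree stay alive, which is consistent with the steep growth in retrieval time noted in the abstract.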
Year: 2008 PMID: 18467344 PMCID: PMC2718670 DOI: 10.1093/bioinformatics/btn223
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Keyword tree with suffix links after adding the sequences ‘CCC’, ‘GA’ and ‘GT’. The keyword tree (represented as bold lines) encodes each probe sequence as a path leading from the root node on the left side to a leaf on the right side. Suffix links are shown as arrows, but have been omitted at leaf nodes for brevity.
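The structure in Fig. 1 can be reproduced with a short Python sketch. It assumes that the suffix links are meant in the Aho-Corasick sense, where each node links to the node for the longest proper suffix of its path that is also a path in the tree. The `Node` class and both function names are hypothetical, and the exact search shown here covers only the zero-edit case.

```python
from collections import deque


class Node:
    """One node of the keyword tree (hypothetical helper class)."""

    def __init__(self):
        self.children = {}  # base -> child Node
        self.suffix = None  # suffix link, filled in by build_keyword_tree
        self.word = None    # full sequence if a probe ends at this node


def build_keyword_tree(words):
    """Insert all words, then add suffix links breadth-first."""
    root = Node()
    for word in words:
        node = root
        for base in word:
            node = node.children.setdefault(base, Node())
        node.word = word

    root.suffix = root
    queue = deque()
    for child in root.children.values():
        child.suffix = root  # depth-1 nodes always link back to the root
        queue.append(child)
    while queue:
        node = queue.popleft()
        for base, child in node.children.items():
            # Follow the parent's suffix links until one can be extended.
            link = node.suffix
            while link is not root and base not in link.children:
                link = link.suffix
            child.suffix = link.children.get(base, root)
            queue.append(child)
    return root


def find_exact(root, text):
    """Return (start, word) for every exact occurrence of a word in text."""
    hits = []
    node = root
    for i, base in enumerate(text):
        while node is not root and base not in node.children:
            node = node.suffix  # fall back instead of restarting the scan
        node = node.children.get(base, root)
        link = node
        while link is not root:  # report words ending at this position
            if link.word is not None:
                hits.append((i - len(link.word) + 1, link.word))
            link = link.suffix
    return hits


if __name__ == "__main__":
    tree = build_keyword_tree(["CCC", "GA", "GT"])
    print(find_exact(tree, "GACCCGT"))
```

For the three sequences in the figure, the node for ‘CC’ links to the node for ‘C’, while the nodes for ‘G’ and ‘GA’ link to the root because no other path starts with their suffixes.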
HGU95-A probes and Bonobo Reads against Chromosome 22
| Dataset | Edits | Gaps | Run time | Hits |
|---|---|---|---|---|
| HGU95-A probes | 0 | 0 | 0m13.31s | 93 225 |
| HGU95-A probes | 1 | 0 | 1m51.87s | 327 028 |
| HGU95-A probes | 1 | 1 | 3m36.92s | 496 296 |
| HGU95-A probes | 2 | 1 | 1h21m59s | 1 843 008 |
| Bonobo Solexa GAII data | 2 | 2 | 12h58m50s | 14.3×10⁹ |
HGU95-A probes: benchmarking was performed on a 2.2 GHz workstation. Independently of the chosen parameters, ∼260 MB of RAM was used.
Bonobo Solexa GAII data: benchmarking was performed on a 1.8 GHz workstation, and 8.6 GB of RAM was used during execution. The dataset contains 2.8 million reads of 38 bp length of genomic sequence from a Bonobo individual sequenced on the Solexa GAII platform.
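To make the growth in run time easier to read, the HGU95-A rows of the table can be converted to seconds and compared with the exact-match run. The Python sketch below uses only the run times listed above; the helper names `to_seconds` and `slowdowns` are hypothetical. Note that consecutive rows differ in the number of gaps as well as edits, so the ratios indicate the trend rather than a pure per-edit factor.

```python
import re

# Run times copied from the HGU95-A rows of the table above.
RUNS = [
    ("edits=0, gaps=0", "0m13.31s"),
    ("edits=1, gaps=0", "1m51.87s"),
    ("edits=1, gaps=1", "3m36.92s"),
    ("edits=2, gaps=1", "1h21m59s"),
]

# Optional hours, optional minutes, then seconds (possibly fractional).
_RUN_TIME = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def to_seconds(run_time):
    """Convert a string such as '1h21m59s' or '0m13.31s' to seconds."""
    match = _RUN_TIME.fullmatch(run_time)
    if match is None:
        raise ValueError(f"unrecognised run time: {run_time!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


def slowdowns(runs):
    """Return (label, seconds, factor relative to the first run)."""
    base = to_seconds(runs[0][1])
    return [(label, to_seconds(t), to_seconds(t) / base) for label, t in runs]


if __name__ == "__main__":
    for label, seconds, factor in slowdowns(RUNS):
        print(f"{label}: {seconds:.2f} s ({factor:.1f}x the exact-match run)")
```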