A space-efficient construction of the Burrows-Wheeler transform for genomic data.

Ross A Lippert, Clark M Mobarry, Brian P Walenz.

Abstract

Algorithms for exact string matching have substantial application in computational biology. Time-efficient data structures that support a variety of exact string matching queries, such as the suffix tree and the suffix array, have been applied to such problems. As sequence databases grow, space-efficient approaches to exact matching are becoming more important. One such data structure, the compressed suffix array (CSA), based on the Burrows-Wheeler transform, has been shown to require memory nearly equal to that of the original database while still answering common kinds of queries time-efficiently. However, building a CSA from a sequence in efficient space and time is challenging. In 2002, the first space-efficient CSA construction algorithm was presented. That implementation used $(1 + 2\log_2 |\Sigma|)(1 + \epsilon)$ bits per character, where $|\Sigma|$ is the alphabet size and $\epsilon$ is a small fraction. The construction algorithm ran in as much as twice that space, in $O(|\Sigma|\, n \log n)$ time. We have created an implementation that achieves the same asymptotic bounds but, for small alphabets, uses only $\frac{1}{2}(1 + |\Sigma|)(1 + \epsilon)$ bits per character, a factor of 2 less space for nucleotide alphabets. We present time and space results for CSA construction and querying with our implementation on publicly available genome data, which demonstrate the practicality of this approach.
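To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the Burrows-Wheeler transform by naively sorting all suffixes (which is exactly the space-hungry approach the paper sets out to avoid) and evaluates the two bits-per-character formulas quoted above. The function names `naive_bwt`, `bits_per_char_2002`, and `bits_per_char_small_alphabet`, and the `$` sentinel convention, are assumptions introduced here for illustration.

```python
import math


def naive_bwt(text: str, sentinel: str = "$") -> str:
    """Return the BWT of text by sorting all suffixes directly.

    Illustration only: this materialises every suffix while sorting,
    so it is far from space-efficient.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Suffix array by direct comparison; '$' sorts before letters in ASCII.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # Each BWT character is the one preceding a sorted suffix
    # (index -1 wraps around to the sentinel).
    return "".join(s[i - 1] for i in suffix_array)


def bits_per_char_2002(sigma: int, eps: float = 0.0) -> float:
    """(1 + 2 log2 |Sigma|)(1 + eps), the bound quoted for the 2002 implementation."""
    return (1 + 2 * math.log2(sigma)) * (1 + eps)


def bits_per_char_small_alphabet(sigma: int, eps: float = 0.0) -> float:
    """(1/2)(1 + |Sigma|)(1 + eps), the bound quoted for small alphabets."""
    return 0.5 * (1 + sigma) * (1 + eps)


if __name__ == "__main__":
    print(naive_bwt("banana"))              # annb$aa
    print(bits_per_char_2002(4))            # 5.0
    print(bits_per_char_small_alphabet(4))  # 2.5
```

For a nucleotide alphabet ($|\Sigma| = 4$) and ignoring $\epsilon$, the two formulas give 5 and 2.5 bits per character respectively, which is the factor of 2 mentioned in the abstract.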

Year: 2005; PMID: 16201914; DOI: 10.1089/cmb.2005.12.943

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479

Cited by: 7 in total

1. Short read fragment assembly of bacterial genomes.

2. Parallel continuous flow: a parallel suffix tree construction tool for whole genomes.

3. BarraCUDA - a fast short read sequence aligner using graphics processing units.

4. Clinical integration of next-generation sequencing technology. (Review)

5. MICA: desktop software for comprehensive searching of DNA databases.

6. Towards computational improvement of DNA database indexing and short DNA query searching.

7. Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes.