A space-efficient construction of the Burrows-Wheeler transform for genomic data.

Ross A Lippert, Clark M Mobarry, Brian P Walenz.

Abstract

Algorithms for exact string matching have substantial application in computational biology. Time-efficient data structures that support a variety of exact string matching queries, such as the suffix tree and the suffix array, have been applied to such problems. As sequence databases grow, space-efficient approaches to exact matching are becoming more important. One such data structure, the compressed suffix array (CSA), based on the Burrows-Wheeler transform, has been shown to require memory nearly equal to that of the original database while supporting common kinds of queries time-efficiently. However, building a CSA from a sequence in efficient space and time is challenging. In 2002, the first space-efficient CSA construction algorithm was presented. That implementation used (1 + 2 log2 |Σ|)(1 + ε) bits per character (where |Σ| is the alphabet size and ε is a small fraction). The construction algorithm ran in as much as twice that space, in O(|Σ| n log n) time. We have created an implementation that also achieves these asymptotic bounds but, for small alphabets, uses only 1/2 (1 + |Σ|)(1 + ε) bits per character, a factor of 2 less space for nucleotide alphabets. We present time and space results for CSA construction and querying with our implementation on publicly available genome data, which demonstrate the practicality of this approach.
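
As a rough illustration of the quantities the abstract mentions, the following Python sketch computes a Burrows-Wheeler transform by naively sorting all rotations and evaluates the two bits-per-character expressions for a nucleotide alphabet. It is not the space-efficient construction described in the paper: the function names, the "$" sentinel, and the choice of ε = 0 are assumptions made only for this example.

```python
import math


def bwt_naive(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations.

    Naive textbook method, O(n^2 log n) time and O(n^2) space.
    This is NOT the space-efficient algorithm from the paper.
    The sentinel character is an assumption of this sketch.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Build every cyclic rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def bits_per_char_2002(sigma: int, eps: float = 0.0) -> float:
    """(1 + 2 log2 |Σ|)(1 + ε), the bound quoted for the 2002 implementation."""
    return (1 + 2 * math.log2(sigma)) * (1 + eps)


def bits_per_char_small_alphabet(sigma: int, eps: float = 0.0) -> float:
    """1/2 (1 + |Σ|)(1 + ε), the bound quoted for the authors' implementation."""
    return 0.5 * (1 + sigma) * (1 + eps)


if __name__ == "__main__":
    print(bwt_naive("banana"))
    # Nucleotide alphabet {A, C, G, T}: |Σ| = 4, with ε taken as 0.
    print(bits_per_char_2002(4), bits_per_char_small_alphabet(4))
```

With |Σ| = 4 and ε = 0, the two expressions evaluate to 5 and 2.5 bits per character, which is the factor of 2 stated for nucleotide alphabets.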

Year:  2005        PMID: 16201914     DOI: 10.1089/cmb.2005.12.943

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  7 in total

1.  Short read fragment assembly of bacterial genomes.

Authors:  Mark J Chaisson; Pavel A Pevzner
Journal:  Genome Res       Date:  2007-12-14       Impact factor: 9.043

2.  Parallel continuous flow: a parallel suffix tree construction tool for whole genomes.

Authors:  Matteo Comin; Montse Farreras
Journal:  J Comput Biol       Date:  2014-03-05       Impact factor: 1.479

3.  BarraCUDA - a fast short read sequence aligner using graphics processing units.

Authors:  Petr Klus; Simon Lam; Dag Lyberg; Ming Sin Cheung; Graham Pullan; Ian McFarlane; Giles Sh Yeo; Brian Yh Lam
Journal:  BMC Res Notes       Date:  2012-01-13

4.  Clinical integration of next-generation sequencing technology. (Review)

Authors:  R R Gullapalli; M Lyons-Weiler; P Petrosko; R Dhir; M J Becich; W A LaFramboise
Journal:  Clin Lab Med       Date:  2012-12       Impact factor: 1.935

5.  MICA: desktop software for comprehensive searching of DNA databases.

Authors:  William A Stokes; Benjamin S Glick
Journal:  BMC Bioinformatics       Date:  2006-10-03       Impact factor: 3.169

6.  Towards computational improvement of DNA database indexing and short DNA query searching.

Authors:  Done Stojanov; Sašo Koceski; Aleksandra Mileva; Nataša Koceska; Cveta Martinovska Bande
Journal:  Biotechnol Biotechnol Equip       Date:  2014-10-31       Impact factor: 1.632

7.  Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes.

Authors:  Jiarong Guo; John F Quensen; Yanni Sun; Qiong Wang; C Titus Brown; James R Cole; James M Tiedje
Journal:  Front Genet       Date:  2019-10-15       Impact factor: 4.599
