BIND - an algorithm for loss-less compression of nucleotide sequence data.

Tungadri Bose, Monzoorul Haque Mohammed, Anirban Dutta, Sharmila S Mande.

Abstract

Recent advances in DNA sequencing technologies have enabled the current generation of life science researchers to probe deeper into the genomic blueprint. The amount of data generated by these technologies has been increasing exponentially over the last decade. Storage, archival and dissemination of such huge data sets require efficient solutions, from both the hardware and the software perspective. The present paper describes BIND, an algorithm specialized for compressing nucleotide sequence data. By adopting a unique 'block-length' encoding for representing binary data (as a key step), BIND achieves significant compression gains compared with the widely used general-purpose compression algorithms (gzip, bzip2 and lzma). Moreover, in contrast to implementations of existing specialized genomic compression approaches, the implementation of BIND can handle non-ATGC and lowercase characters. This makes BIND a loss-less compression approach that is suitable for practical use. More importantly, validation results of BIND (with real-world data sets) indicate that reasonable compression and decompression speeds can be achieved with minimal processor and memory usage. BIND is available for download at http://metagenomics.atc.tcs.com/compression/BIND. No license is required for academic or non-profit use.
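The abstract names a 'block-length' encoding of binary data but does not describe it. The sketch below is only one plausible reading, not BIND's actual implementation: each base gets a hypothetical 2-bit code, the two bit positions are split into separate streams, and each stream is stored as the lengths of its alternating runs of 0s and 1s. The mapping in `CODES` and all function names are assumptions made for illustration, and the sketch handles uppercase A, C, G and T only.

```python
from itertools import groupby

# Hypothetical 2-bit mapping; the abstract does not specify one.
CODES = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
BASES = {bits: base for base, bits in CODES.items()}


def block_lengths(bits):
    """Return (first_bit, run_lengths) for a list of 0/1 values."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    first = bits[0] if bits else 0
    return first, runs


def expand_blocks(first, runs):
    """Rebuild the bit list from its first bit and alternating run lengths."""
    bits, current = [], first
    for n in runs:
        bits.extend([current] * n)
        current ^= 1  # runs alternate between 0 and 1
    return bits


def encode(seq):
    """Split a sequence into two bit streams and run-length encode each."""
    hi = [CODES[base][0] for base in seq]
    lo = [CODES[base][1] for base in seq]
    return block_lengths(hi), block_lengths(lo)


def decode(encoded):
    """Invert encode(): expand both streams and map bit pairs back to bases."""
    hi = expand_blocks(*encoded[0])
    lo = expand_blocks(*encoded[1])
    return "".join(BASES[pair] for pair in zip(hi, lo))
```

For example, `encode("AAAACCGGTT")` yields run lengths `[6, 4]` for the first stream and `[4, 2, 2, 2]` for the second, and `decode` restores the original string. A practical tool would additionally have to record lowercase and non-ATGC characters separately in order to stay loss-less, as the abstract says BIND does.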

Year: 2012 PMID: 22922203 DOI: 10.1007/s12038-012-9230-6

Source DB: PubMed Journal: J Biosci ISSN: 0250-5991 Impact factor: 1.826

5 in total

1. DNACompress: fast and effective DNA sequence compression.

Authors: Xin Chen; Ming Li; Bin Ma; John Tromp
Journal: Bioinformatics Date: 2002-12 Impact factor: 6.937

2. Sequencing technologies - the next generation. (Review)

Authors: Michael L Metzker
Journal: Nat Rev Genet Date: 2009-12-08 Impact factor: 53.242

3. The impact of next-generation sequencing on genomics. (Review)

Authors: Jun Zhang; Rod Chiodini; Ahmed Badr; Genfa Zhang
Journal: J Genet Genomics Date: 2011-03-15 Impact factor: 4.275

4. On the representability of complete genomes by multiple competing finite-context (Markov) models.

Authors: Armando J Pinho; Paulo J S G Ferreira; António J R Neves; Carlos A C Bastos
Journal: PLoS One Date: 2011-06-30 Impact factor: 3.240

5. The International Nucleotide Sequence Database Collaboration.

Authors: Guy Cochrane; Ilene Karsch-Mizrachi; Yasukazu Nakamura
Journal: Nucleic Acids Res Date: 2010-11-23 Impact factor: 16.971

3 in total

1. FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information.

2. Efficient DNA sequence compression with neural networks.

3. Algorithms designed for compressed-gene-data transformation among gene banks with different references.