BIND - an algorithm for loss-less compression of nucleotide sequence data.

Tungadri Bose, Monzoorul Haque Mohammed, Anirban Dutta, Sharmila S Mande.

Abstract

Recent advances in DNA sequencing technologies have enabled the current generation of life science researchers to probe deeper into the genomic blueprint. The amount of data generated by these technologies has been increasing exponentially over the last decade. Storage, archival and dissemination of such huge data sets require efficient solutions, from both the hardware and the software perspective. The present paper describes BIND, an algorithm specialized for compressing nucleotide sequence data. By adopting a unique 'block-length' encoding for representing binary data (as a key step), BIND achieves significant compression gains compared to the widely used general-purpose compression algorithms (gzip, bzip2 and lzma). Moreover, in contrast to implementations of existing specialized genomic compression approaches, the implementation of BIND can handle non-ATGC and lowercase characters. This makes BIND a loss-less compression approach that is suitable for practical use. More importantly, validation results of BIND (with real-world data sets) indicate that reasonable compression and decompression speeds can be achieved with minimal processor/memory usage. BIND is available for download at http://metagenomics.atc.tcs.com/compression/BIND. No license is required for academic or non-profit use.
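
The abstract names a 'block-length' encoding of binary data as the key step but does not spell it out. The following is only an illustrative sketch of one way such an encoding could work, not the authors' implementation: it assumes a hypothetical 2-bit code for the four bases, splits the sequence into two bit streams, and stores each stream as the lengths of its alternating runs ("blocks") of 0s and 1s. The names `CODE`, `block_lengths`, `expand_blocks`, `encode` and `decode` are invented for this example, and the handling of non-ATGC and lowercase characters that the abstract mentions is deliberately left out.

```python
from itertools import groupby

# Hypothetical 2-bit code for the four bases (an assumption for this sketch,
# not taken from the paper).
CODE = {"A": (0, 0), "T": (0, 1), "G": (1, 0), "C": (1, 1)}
BASE = {bits: base for base, bits in CODE.items()}


def block_lengths(bits):
    """Return (first_bit, run_lengths) for a list of 0/1 values."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    first = bits[0] if bits else 0
    return first, runs


def expand_blocks(first, runs):
    """Invert block_lengths: rebuild the bit list from alternating runs."""
    bits, bit = [], first
    for n in runs:
        bits.extend([bit] * n)
        bit ^= 1  # runs alternate between 0 and 1
    return bits


def encode(seq):
    """Split an uppercase ATGC string into two bit streams and
    store each stream as block lengths."""
    pairs = [CODE[ch] for ch in seq]
    high = [p[0] for p in pairs]
    low = [p[1] for p in pairs]
    return block_lengths(high), block_lengths(low)


def decode(encoded):
    """Rebuild the original ATGC string from the two block-length streams."""
    (first_hi, runs_hi), (first_lo, runs_lo) = encoded
    high = expand_blocks(first_hi, runs_hi)
    low = expand_blocks(first_lo, runs_lo)
    return "".join(BASE[pair] for pair in zip(high, low))
```

In this sketch, `decode(encode(seq))` returns `seq` for any uppercase ATGC string. A practical compressor would still need to serialize the run lengths compactly and record the positions of any other characters separately.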

Year:  2012        PMID: 22922203     DOI: 10.1007/s12038-012-9230-6

Source DB:  PubMed          Journal:  J Biosci        ISSN: 0250-5991            Impact factor:   1.826


  5 in total

1.  DNACompress: fast and effective DNA sequence compression.

Authors:  Xin Chen; Ming Li; Bin Ma; John Tromp
Journal:  Bioinformatics       Date:  2002-12       Impact factor: 6.937

2.  Sequencing technologies - the next generation. [Review]

Authors:  Michael L Metzker
Journal:  Nat Rev Genet       Date:  2009-12-08       Impact factor: 53.242

3.  The impact of next-generation sequencing on genomics. [Review]

Authors:  Jun Zhang; Rod Chiodini; Ahmed Badr; Genfa Zhang
Journal:  J Genet Genomics       Date:  2011-03-15       Impact factor: 4.275

4.  On the representability of complete genomes by multiple competing finite-context (Markov) models.

Authors:  Armando J Pinho; Paulo J S G Ferreira; António J R Neves; Carlos A C Bastos
Journal:  PLoS One       Date:  2011-06-30       Impact factor: 3.240

5.  The International Nucleotide Sequence Database Collaboration.

Authors:  Guy Cochrane; Ilene Karsch-Mizrachi; Yasukazu Nakamura
Journal:  Nucleic Acids Res       Date:  2010-11-23       Impact factor: 16.971

  3 in total

1.  FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information.

Authors:  Tungadri Bose; Anirban Dutta; Mohammed Mh; Hemang Gandhi; Sharmila S Mande
Journal:  J Biosci       Date:  2015-09       Impact factor: 1.826

2.  Efficient DNA sequence compression with neural networks.

Authors:  Milton Silva; Diogo Pratas; Armando J Pinho
Journal:  Gigascience       Date:  2020-11-11       Impact factor: 6.524

3.  Algorithms designed for compressed-gene-data transformation among gene banks with different references.

Authors:  Qiuming Luo; Chao Guo; Yi Jun Zhang; Ye Cai; Gang Liu
Journal:  BMC Bioinformatics       Date:  2018-06-18       Impact factor: 3.169

