
SeqCompress: an algorithm for biological sequence compression.

Muhammad Sardaraz, Muhammad Tahir, Ataul Aziz Ikram, Hassan Bajwa.

Abstract

The growth of Next Generation Sequencing technologies presents significant research challenges, specifically the design of bioinformatics tools that handle massive amounts of data efficiently. The cost of storing biological sequence data has become a noticeable proportion of the total cost of its generation and analysis. In particular, the DNA sequencing rate is increasing significantly faster than disk storage capacity, so the volume of data may eventually exceed the available storage. It is essential to develop algorithms that handle large data sets via better memory management. This article presents SeqCompress, a DNA sequence compression algorithm that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses a statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that the proposed algorithm achieves better compression gain than other existing algorithms.
Copyright © 2014 Elsevier Inc. All rights reserved.

Keywords:  Compression; DNA; Genome sequences; NGS technologies

Year:  2014        PMID: 25173568     DOI: 10.1016/j.ygeno.2014.08.007

Source DB:  PubMed          Journal:  Genomics        ISSN: 0888-7543            Impact factor:   5.736


  2 in total

1.  Efficient DNA sequence compression with neural networks.

Authors:  Milton Silva; Diogo Pratas; Armando J Pinho
Journal:  Gigascience       Date:  2020-11-11       Impact factor: 6.524

2.  LFastqC: A lossless non-reference-based FASTQ compressor.

Authors:  Sultan Al Yami; Chun-Hsi Huang
Journal:  PLoS One       Date:  2019-11-14       Impact factor: 3.240
