An Adaptive Difference Distribution-based Coding with Hierarchical Tree Structure for DNA Sequence Compression.

Wenrui Dai, Hongkai Xiong, Xiaoqian Jiang, Lucila Ohno-Machado.

Abstract

Previous reference-based compression methods for DNA sequences do not fully exploit the intrinsic statistics of the data, because they consider only approximate matches. This paper proposes an adaptive difference distribution-based coding framework that operates on fragments of nucleotides organized in a hierarchical tree structure. To keep the distribution of the difference sequence between the reference and target sequences concentrated, the sub-fragment size and the matching offset used for prediction adapt to a stepped size structure. Matching against approximate repeats in the reference uses a Hamming-like weighted distance measure within a local region close to the current fragment, so that matching accuracy is balanced against the overhead of describing the matching offset. A well-designed coding scheme compactly represents both the difference sequence and the additional parameters, e.g. sub-fragment size and matching offset. Experimental results show that the proposed scheme achieves a 150% compression improvement over GReEn, the best reference-based compressor.
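
The local matching step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a plain, unweighted Hamming distance in place of the paper's Hamming-like weighted measure, whose weights the abstract does not specify. The function names (`hamming`, `best_local_offset`, `difference`) and the `max_shift` parameter are hypothetical and chosen only for illustration. Ties are resolved in favor of the smaller absolute offset, as a rough stand-in for keeping the cost of describing the offset low.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_local_offset(reference: str, target: str, start: int,
                      size: int, max_shift: int):
    """Find the reference window closest to target[start:start+size].

    Only shifts in [-max_shift, max_shift] around `start` are tried
    (the "local region"). Returns (shift, distance), or None if no
    window fits inside the reference. Assumption: unweighted Hamming
    distance, not the weighted measure used in the paper.
    """
    fragment = target[start:start + size]
    best = None
    # Try small shifts first so that ties keep the cheaper offset.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        pos = start + shift
        if pos < 0 or pos + len(fragment) > len(reference):
            continue  # window would fall outside the reference
        dist = hamming(reference[pos:pos + len(fragment)], fragment)
        if best is None or dist < best[1]:
            best = (shift, dist)
    return best


def difference(ref_window: str, fragment: str):
    """List (index, target_base) pairs where the fragment departs
    from the matched reference window."""
    return [(i, t) for i, (r, t) in enumerate(zip(ref_window, fragment))
            if r != t]


reference = "TTACGTACGGAT"
target = "ACGTTCGGAT"
shift, dist = best_local_offset(reference, target, start=0, size=8, max_shift=3)
residual = difference(reference[shift:shift + 8], target[:8])
```

In this toy input the best window lies two positions to the right with a single mismatch, so only the offset and one substituted base would need to be coded. A good match leaves a sparse difference sequence, which is what keeps its distribution concentrated.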

Year:  2013        PMID: 26501129      PMCID: PMC4617277          DOI: 10.1109/DCC.2013.45

Source DB:  PubMed          Journal:  Proc Data Compress Conf        ISSN: 2375-0383


  8 in total

1.  Biological sequence compression algorithms.

Authors:  T Matsumoto; K Sadakane; H Imai
Journal:  Genome Inform Ser Workshop Genome Inform       Date:  2000

2.  A compression algorithm for DNA sequences.

Authors:  C Xin; K Sam; L Ming
Journal:  IEEE Eng Med Biol Mag       Date:  2001 Jul-Aug

3.  DNACompress: fast and effective DNA sequence compression.

Authors:  Xin Chen; Ming Li; Bin Ma; John Tromp
Journal:  Bioinformatics       Date:  2002-12       Impact factor: 6.937

4.  Data structures and compression algorithms for genomic sequence data.

Authors:  Marty C Brandon; Douglas C Wallace; Pierre Baldi
Journal:  Bioinformatics       Date:  2009-05-15       Impact factor: 6.937

5.  Human genomes as email attachments.

Authors:  Scott Christley; Yiming Lu; Chen Li; Xiaohui Xie
Journal:  Bioinformatics       Date:  2008-11-07       Impact factor: 6.937

6.  On the future of genomic data.

Authors:  Scott D Kahn
Journal:  Science       Date:  2011-02-11       Impact factor: 47.728

7.  A novel compression tool for efficient storage of genome resequencing data.

Authors:  Congmao Wang; Dabing Zhang
Journal:  Nucleic Acids Res       Date:  2011-01-25       Impact factor: 16.971

8.  GReEn: a tool for efficient compression of genome resequencing data.

Authors:  Armando J Pinho; Diogo Pratas; Sara P Garcia
Journal:  Nucleic Acids Res       Date:  2011-12-01       Impact factor: 16.971

  2 in total

1.  Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review.

Authors:  Kelvin V Kredens; Juliano V Martins; Osmar B Dordal; Mauri Ferrandin; Roberto H Herai; Edson E Scalabrin; Bráulio C Ávila
Journal:  PLoS One       Date:  2020-05-26       Impact factor: 3.240

2.  Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes.

Authors:  Diogo Pratas; Raquel M Silva; Armando J Pinho
Journal:  Entropy (Basel)       Date:  2018-05-23       Impact factor: 2.524

