
Compression of Multiple DNA Sequences Using Intra-Sequence and Inter-Sequence Similarities.

Kin-On Cheng, Paula Wu, Ngai-Fong Law, Wan-Chi Siu.   

Abstract

Traditionally, intra-sequence similarity is exploited to compress a single DNA sequence. Recently, remarkable compression performance has been achieved for an individual DNA sequence from the same population by encoding its differences from a nearly identical reference sequence. Nevertheless, there is a lack of general algorithms that also allow less similar reference sequences. In this work, we extend intra-sequence similarity to inter-sequence similarity: approximate matches of subsequences are found between the DNA sequence and a set of reference sequences. Hence, a set of nearly identical DNA sequences from the same population, or a set of partially similar DNA sequences such as chromosome sequences and DNA sequences of related species, can be compressed together. For practical compressors, the compressed size is usually influenced by the order in which the sequences are compressed. Fast search algorithms for the optimal compression order are thus developed for multiple-sequence compression. Experimental results on artificial and real datasets demonstrate that the proposed multiple-sequence compression methods with fast compression-order search achieve good compression performance under different levels of similarity among the DNA sequences.
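
To illustrate why compression order matters, the following is a minimal sketch and not the authors' algorithm. It assumes a general-purpose compressor (Python's zlib) as a stand-in for a DNA-specific coder, and a simple greedy heuristic in place of the paper's fast order search. The function names (csize, conditional_size, greedy_order) and the toy sequences are hypothetical. Note that zlib only finds matches within a 32 KB window, so the sketch is meaningful only for short sequences.

```python
import random
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes, using zlib as a stand-in compressor."""
    return len(zlib.compress(data, 9))


def conditional_size(target: str, refs: list[str]) -> int:
    """Approximate extra bytes needed for `target` when `refs` come first."""
    context = "".join(refs).encode()
    return csize(context + target.encode()) - csize(context)


def greedy_order(seqs: dict[str, str]) -> list[str]:
    """Greedy heuristic (an assumption, not the paper's search): start with the
    sequence that is cheapest alone, then repeatedly append the sequence that
    is cheapest given everything already chosen."""
    remaining = dict(seqs)
    first = min(remaining, key=lambda n: (csize(remaining[n].encode()), n))
    order = [first]
    del remaining[first]
    while remaining:
        refs = [seqs[n] for n in order]
        nxt = min(remaining,
                  key=lambda n: (conditional_size(remaining[n], refs), n))
        order.append(nxt)
        del remaining[nxt]
    return order


# Toy data: a random sequence, a near-identical variant, an unrelated one.
random.seed(0)
base = "".join(random.choice("ACGT") for _ in range(2000))
# Substitute every 100th base to mimic a nearly identical individual.
variant = "".join(
    "ACGT"[("ACGT".index(c) + 1) % 4] if i % 100 == 0 else c
    for i, c in enumerate(base)
)
unrelated = "".join(random.choice("ACGT") for _ in range(2000))

seqs = {"base": base, "variant": variant, "unrelated": unrelated}
order = greedy_order(seqs)
```

In this sketch, conditional_size(variant, [base]) is much smaller than the size of variant compressed alone, which is the inter-sequence effect the abstract describes. An exhaustive search over all orders grows factorially with the number of sequences, which is why a fast order search is of interest.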


Year: 2015    PMID: 26671804    DOI: 10.1109/TCBB.2015.2403370

Source DB: PubMed    Journal: IEEE/ACM Trans Comput Biol Bioinform    ISSN: 1545-5963    Impact factor: 3.710


  2 in total

1.  Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review.

Authors:  Kelvin V Kredens; Juliano V Martins; Osmar B Dordal; Mauri Ferrandin; Roberto H Herai; Edson E Scalabrin; Bráulio C Ávila
Journal:  PLoS One       Date:  2020-05-26       Impact factor: 3.240

2.  Sketch distance-based clustering of chromosomes for large genome database compression.

Authors:  Tao Tang; Yuansheng Liu; Buzhong Zhang; Benyue Su; Jinyan Li
Journal:  BMC Genomics       Date:  2019-12-30       Impact factor: 3.969

