Toward a Better Compression for DNA Sequences Using Huffman Encoding.

Anas Al-Okaily, Badar Almarri, Sultan Al Yami, Chun-Hsi Huang.

Abstract

Due to the significant amount of DNA data generated by next-generation sequencing machines for genomes ranging in length from megabases to gigabases, there is an increasing need to compress such data so that they occupy less space and can be transmitted faster. Different implementations of Huffman encoding that incorporate the characteristics of DNA sequences prove to compress DNA data better. These implementations center on selecting frequent repeats so as to force a skewed Huffman tree, and on constructing multiple Huffman trees when encoding. The implementations demonstrate improvements in compression ratio for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all DNA sequence compression algorithms that use conventional Huffman encoding. Accompanying software is publicly available (Al-Okaily, 2016).
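To make the baseline concrete, the following is a minimal sketch of conventional Huffman encoding applied to a DNA string split into non-overlapping k-mers. It is not the authors' software and does not implement their repeat selection or multiple-tree construction; the function names `huffman_code_lengths` and `encoded_bits` and the toy sequence are assumptions made for illustration. It only shows why a skewed symbol distribution shortens the encoded output relative to a fixed 2 bits per base.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman tree built from freqs."""
    if len(freqs) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encoded_bits(sequence, k=1):
    """Bits needed to Huffman-encode sequence as non-overlapping k-mers.

    The cost of storing the code table itself is ignored in this sketch.
    """
    tokens = [sequence[i:i + k] for i in range(0, len(sequence), k)]
    freqs = Counter(tokens)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[t] * lengths[t] for t in freqs)


# Toy sequences (hypothetical data, not from the paper).
skewed = "A" * 24 + "C" * 4 + "G" * 2 + "T" * 2
uniform = "ACGT" * 8

print(encoded_bits(skewed), "bits vs", 2 * len(skewed), "at 2 bits per base")
print(encoded_bits(uniform), "bits vs", 2 * len(uniform), "at 2 bits per base")
print(encoded_bits(uniform, k=2), "bits when the repeat is captured by 2-mers")
```

On the skewed toy sequence the code lengths come out shorter for the dominant base, so the total falls below the fixed 2-bit encoding, while a uniform base distribution gains nothing at k=1. Choosing symbols that capture repeats (here, 2-mers) is what shifts the distribution; the paper's schemes pursue that effect in a more elaborate way than this sketch does.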

Entities: Gene

Keywords: DNA sequences compression; Huffman encoding; compression algorithm

Year: 2016 PMID: 27960065 PMCID: PMC5372760 DOI: 10.1089/cmb.2016.0151

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
