
CoGI: Towards Compressing Genomes as an Image.

Xiaojing Xie, Shuigeng Zhou, Jihong Guan.   

Abstract

Genomic science now faces an explosive increase in data thanks to the rapid development of sequencing technology. This poses serious challenges for genomic data storage and transfer. Compressing the data is desirable to reduce storage and transfer costs, and thus to improve the efficiency of data distribution and utilization. A number of algorithms and tools have been developed for compressing genomic sequences. Most existing algorithms treat genomes as one-dimensional text strings and compress them using dictionaries or probability models. In contrast, this paper proposes a novel approach to genome compression called CoGI (Compressing Genomes as an Image), which transforms the genomic sequences into a two-dimensional binary image (or bitmap) and then applies a rectangular partition coding algorithm to compress the binary image. CoGI can be used as either a reference-based or a reference-free compressor. For the former, we develop two entropy-based algorithms to select a proper reference genome. Performance is evaluated on various genomes. Experimental results show that reference-based CoGI significantly outperforms two state-of-the-art reference-based genome compressors, GReEn and RLZ-opt, in both compression ratio and compression efficiency. It also achieves a comparable compression ratio but two orders of magnitude higher compression efficiency than XM, a state-of-the-art reference-free genome compressor. Furthermore, our approach performs much better than Gzip, a general-purpose and widely used compressor, in both compression speed and compression ratio. CoGI can therefore serve as an effective and practical genome compressor. The source code and other related documents of CoGI are available at: http://admis.fudan.edu.cn/projects/cogi.htm.
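To make the "genome as an image" idea concrete, the following is a minimal Python sketch of laying out a DNA sequence as a two-dimensional bitmap. It is not CoGI's implementation: the abstract does not state the base-to-bit mapping, the image width, or how a reference is used, so the 2-bit code in `BASE_BITS`, the `width` parameter, and the XOR difference image in `xor_bitmaps` are all assumptions chosen for illustration. The rectangular partition coding step is not shown.

```python
# Hypothetical 2-bit code per base; the abstract does not specify CoGI's mapping.
BASE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def sequence_to_bitmap(seq, width):
    """Lay a DNA sequence out as rows of bits, `width` bits per row.

    The last row is zero-padded. `width` must be a positive even number so
    that the two bits of a base never straddle two rows.
    """
    if width <= 0 or width % 2 != 0:
        raise ValueError("width must be a positive even number")
    bits = [bit for base in seq.upper() for bit in BASE_BITS[base]]
    bits += [0] * (-len(bits) % width)  # pad to a whole number of rows
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def xor_bitmaps(target, reference):
    """Pixel-wise XOR of two equally sized bitmaps.

    Assumption for illustration only: if the target and reference genomes are
    similar, the difference image is mostly zeros, which is easier to compress.
    """
    if len(target) != len(reference) or any(
        len(t) != len(r) for t, r in zip(target, reference)
    ):
        raise ValueError("bitmaps must have the same shape")
    return [[t ^ r for t, r in zip(trow, rrow)]
            for trow, rrow in zip(target, reference)]


if __name__ == "__main__":
    target = sequence_to_bitmap("ACGT", width=4)
    reference = sequence_to_bitmap("ACGA", width=4)
    for row in xor_bitmaps(target, reference):
        print("".join(map(str, row)))
```

In this sketch, two sequences that differ in a single base produce a difference image with set bits only at that base's position, which gives an intuition for why choosing a similar reference genome could matter.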


Year:  2015        PMID: 26671800     DOI: 10.1109/TCBB.2015.2430331

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  8 in total

1.  Efficient DNA sequence compression with neural networks.

Authors:  Milton Silva; Diogo Pratas; Armando J Pinho
Journal:  Gigascience       Date:  2020-11-11       Impact factor: 6.524

2.  Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review.

Authors:  Kelvin V Kredens; Juliano V Martins; Osmar B Dordal; Mauri Ferrandin; Roberto H Herai; Edson E Scalabrin; Bráulio C Ávila
Journal:  PLoS One       Date:  2020-05-26       Impact factor: 3.240

3.  Algorithms designed for compressed-gene-data transformation among gene banks with different references.

Authors:  Qiuming Luo; Chao Guo; Yi Jun Zhang; Ye Cai; Gang Liu
Journal:  BMC Bioinformatics       Date:  2018-06-18       Impact factor: 3.169

4.  HRCM: An Efficient Hybrid Referential Compression Method for Genomic Big Data.

Authors:  Haichang Yao; Yimu Ji; Kui Li; Shangdong Liu; Jing He; Ruchuan Wang
Journal:  Biomed Res Int       Date:  2019-11-16       Impact factor: 3.411

5.  Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes.

Authors:  Diogo Pratas; Raquel M Silva; Armando J Pinho
Journal:  Entropy (Basel)       Date:  2018-05-23       Impact factor: 2.524

6.  Information Theory in Computational Biology: Where We Stand Today.

Authors:  Pritam Chanda; Eduardo Costa; Jie Hu; Shravan Sukumar; John Van Hemert; Rasna Walia
Journal:  Entropy (Basel)       Date:  2020-06-06       Impact factor: 2.524

7.  SparkGC: Spark based genome compression for large collections of genomes.

Authors:  Haichang Yao; Guangyong Hu; Shangdong Liu; Houzhi Fang; Yimu Ji
Journal:  BMC Bioinformatics       Date:  2022-07-25       Impact factor: 3.307

8.  Sketch distance-based clustering of chromosomes for large genome database compression.

Authors:  Tao Tang; Yuansheng Liu; Buzhong Zhang; Benyue Su; Jinyan Li
Journal:  BMC Genomics       Date:  2019-12-30       Impact factor: 3.969

