ERGC: an efficient referential genome compression algorithm.

Subrata Saha, Sanguthevar Rajasekaran.

Abstract

MOTIVATION: Genome sequencing has become faster and more affordable. Consequently, the number of available complete genomic sequences is increasing rapidly. As a result, the cost to store, process, analyze and transmit the data is becoming a bottleneck for research and future medical applications. The need for efficient data compression and data reduction techniques for biological sequencing data is therefore growing by the day. Although a number of standard data compression algorithms exist, they are not efficient at compressing biological data, because these generic algorithms do not exploit inherent properties of the sequencing data. To exploit the statistical and information-theoretic properties of genomic sequences, we need specialized compression algorithms. Five different next-generation sequencing data compression problems have been identified and studied in the literature. We propose a novel algorithm for one of these problems, known as reference-based genome compression.
RESULTS: We have performed extensive experiments using five real sequencing datasets. The results on real genomes show that our proposed algorithm is competitive: it achieves better compression ratios than the best currently known algorithms for this problem. The time to compress and decompress the whole genome is also very promising.
AVAILABILITY AND IMPLEMENTATION: The implementations are freely available for non-commercial purposes. They can be downloaded from http://engr.uconn.edu/~rajasek/ERGC.zip. CONTACT: rajasek@engr.uconn.edu.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2015        PMID: 26139636      PMCID: PMC4838057          DOI: 10.1093/bioinformatics/btv399

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  Comment on: 'ERGC: an efficient referential genome compression algorithm'.

Authors:  Sebastian Deorowicz; Szymon Grabowski; Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal:  Bioinformatics       Date:  2015-11-28       Impact factor: 6.937

2.  NRGC: a novel referential genome compression algorithm.

Authors:  Subrata Saha; Sanguthevar Rajasekaran
Journal:  Bioinformatics       Date:  2016-08-02       Impact factor: 6.937

3.  A Hybrid Data-Differencing and Compression Algorithm for the Automotive Industry.

Authors:  Sabin Belu; Daniela Coltuc
Journal:  Entropy (Basel)       Date:  2022-04-19       Impact factor: 2.738

4.  Sequence Factorization with Multiple References.

Authors:  Sebastian Wandelt; Ulf Leser
Journal:  PLoS One       Date:  2015-09-30       Impact factor: 3.240

5.  Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review.

Authors:  Kelvin V Kredens; Juliano V Martins; Osmar B Dordal; Mauri Ferrandin; Roberto H Herai; Edson E Scalabrin; Bráulio C Ávila
Journal:  PLoS One       Date:  2020-05-26       Impact factor: 3.240

6.  Algorithms designed for compressed-gene-data transformation among gene banks with different references.

Authors:  Qiuming Luo; Chao Guo; Yi Jun Zhang; Ye Cai; Gang Liu
Journal:  BMC Bioinformatics       Date:  2018-06-18       Impact factor: 3.169

7.  HRCM: An Efficient Hybrid Referential Compression Method for Genomic Big Data.

Authors:  Haichang Yao; Yimu Ji; Kui Li; Shangdong Liu; Jing He; Ruchuan Wang
Journal:  Biomed Res Int       Date:  2019-11-16       Impact factor: 3.411

8.  Sketch distance-based clustering of chromosomes for large genome database compression.

Authors:  Tao Tang; Yuansheng Liu; Buzhong Zhang; Benyue Su; Jinyan Li
Journal:  BMC Genomics       Date:  2019-12-30       Impact factor: 3.969
