
NRGC: a novel referential genome compression algorithm.

Subrata Saha, Sanguthevar Rajasekaran.

Abstract

MOTIVATION: Next-generation sequencing techniques produce millions to billions of short reads. The procedure is not only very cost-effective but can also be carried out in a laboratory environment. State-of-the-art sequence assemblers then reconstruct the whole genomic sequence from these reads. Current computing technology makes it possible to build genomic sequences from billions of reads at minimal cost and in little time. As a consequence, we have seen an explosion of biological sequence data in recent years. In turn, the cost of storing these sequences in physical memory or transmitting them over the internet is becoming a major bottleneck for research and future medical applications. Data compression is one of the most important remedies in this context. We need compression algorithms that can exploit the inherent structure of biological sequences: although standard data compression algorithms are prevalent, they do not compress biological sequencing data effectively. In this article, we propose a novel referential genome compression algorithm (NRGC) to compress genomic sequences effectively and efficiently.
RESULTS: We have performed rigorous experiments to evaluate NRGC on a set of real human genomes. The results show that our algorithm is indeed an effective genome compression algorithm that performs better than the best-known algorithms in most cases. Its compression and decompression times are also very impressive.
AVAILABILITY AND IMPLEMENTATION: The implementations are freely available for non-commercial purposes. They can be downloaded from: http://www.engr.uconn.edu/~rajasek/NRGC.zip
CONTACT: rajasek@engr.uconn.edu.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
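To make the term "referential" concrete: a referential compressor stores a target genome as a list of copy instructions into a reference genome that the decompressor also has, plus literal bases wherever the two sequences differ. The Python sketch below is a minimal, generic greedy version of that idea built on a k-mer hash index. It is not the NRGC algorithm described in the article, and the function names (`build_index`, `compress`, `decompress`), the token format and the default k = 4 are illustrative assumptions.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical illustration of generic referential compression (not NRGC).
# A token is either a match (ref_start, length) into the reference
# or a single literal character that could not be matched.
Token = Union[Tuple[int, int], str]


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target: str, reference: str, k: int = 4) -> List[Token]:
    """Greedily encode target as the longest reference matches plus literals."""
    index = build_index(reference, k)
    tokens: List[Token] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Candidates are reference positions sharing the k-mer that starts at i;
        # a shorter tail near the end of target never hits the index.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target) and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            tokens.append((best_pos, best_len))  # copy instruction
            i += best_len
        else:
            tokens.append(target[i])  # literal base
            i += 1
    return tokens


def decompress(tokens: List[Token], reference: str) -> str:
    """Rebuild the target by copying reference substrings and literals."""
    parts: List[str] = []
    for tok in tokens:
        if isinstance(tok, tuple):
            start, length = tok
            parts.append(reference[start:start + length])
        else:
            parts.append(tok)
    return "".join(parts)


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA"
    tgt = "ACGTACGATTGACCA"  # differs from ref by one substitution
    toks = compress(tgt, ref)
    print(toks)  # → [(0, 7), 'A', (8, 7)]
    assert decompress(toks, ref) == tgt
```

In the usage example, a 15-base target collapses to two copy instructions and one literal because it differs from the reference at a single position. A practical tool would additionally entropy-code these tokens and handle large genomes far more carefully; this sketch only shows why a closely related reference makes the target cheap to store.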

Year: 2016; PMID: 27485445; PMCID: PMC5939913; DOI: 10.1093/bioinformatics/btw505

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

7 in total

1.  Mining Cancer Transcriptomes: Bioinformatic Tools and the Remaining Challenges. (Review)

Authors:  Thomas Milan; Brian T Wilhelm
Journal:  Mol Diagn Ther       Date:  2017-06       Impact factor: 4.074

2.  A Hybrid Data-Differencing and Compression Algorithm for the Automotive Industry.

Authors:  Sabin Belu; Daniela Coltuc
Journal:  Entropy (Basel)       Date:  2022-04-19       Impact factor: 2.738

3.  Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review.

Authors:  Kelvin V Kredens; Juliano V Martins; Osmar B Dordal; Mauri Ferrandin; Roberto H Herai; Edson E Scalabrin; Bráulio C Ávila
Journal:  PLoS One       Date:  2020-05-26       Impact factor: 3.240

4.  HRCM: An Efficient Hybrid Referential Compression Method for Genomic Big Data.

Authors:  Haichang Yao; Yimu Ji; Kui Li; Shangdong Liu; Jing He; Ruchuan Wang
Journal:  Biomed Res Int       Date:  2019-11-16       Impact factor: 3.411

5.  Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes.

Authors:  Diogo Pratas; Raquel M Silva; Armando J Pinho
Journal:  Entropy (Basel)       Date:  2018-05-23       Impact factor: 2.524

6.  SparkGC: Spark based genome compression for large collections of genomes.

Authors:  Haichang Yao; Guangyong Hu; Shangdong Liu; Houzhi Fang; Yimu Ji
Journal:  BMC Bioinformatics       Date:  2022-07-25       Impact factor: 3.307

7.  Sketch distance-based clustering of chromosomes for large genome database compression.

Authors:  Tao Tang; Yuansheng Liu; Buzhong Zhang; Benyue Su; Jinyan Li
Journal:  BMC Genomics       Date:  2019-12-30       Impact factor: 3.969