
iDoComp: a compression scheme for assembled genomes.

Idoia Ochoa, Mikel Hernaez, Tsachy Weissman.

Abstract

MOTIVATION: With the release of the latest next-generation sequencing (NGS) machine, the HiSeq X by Illumina, the cost of sequencing a human genome has dropped to a mere $4000. We are thus approaching a milestone in sequencing history, known as the $1000 genome era, in which the sequencing of individuals is affordable, opening the door to effective personalized medicine. Massive generation of genomic data, including assembled genomes, is expected in the coming years. There is a crucial need for genome compression that is guaranteed to perform well simultaneously on different species, from simple bacteria to humans, and that will ease the transmission, dissemination and analysis of genomes. Further, most of the new genomes to be compressed will correspond to individuals of a species for which a reference already exists in the database. It is therefore natural to propose compression schemes that assume and exploit the availability of such references.
RESULTS: We propose iDoComp, a compressor of assembled genomes presented in FASTA format that compresses an individual genome using a reference genome for both the compression and the decompression. In terms of compression efficiency, iDoComp outperforms previously proposed algorithms in most of the studied cases, with comparable or better running time. For example, we observe compression gains of up to 60% in several cases, including H. sapiens data, compared with the best compression performance among the previously proposed algorithms.
AVAILABILITY: iDoComp is written in C and can be downloaded from: http://www.stanford.edu/~iochoa/iDoComp.html. We also provide a full explanation of how to run the program and an example with all the necessary files to run it.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
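
The abstract does not describe iDoComp's internal algorithm, so the following is only a minimal illustrative sketch of the general idea of reference-based compression that it refers to: the target genome is expressed as a list of matches against a reference that both the compressor and the decompressor hold. All names here (`build_index`, `encode`, `decode`, the k-mer length `k`, and the toy sequences) are hypothetical and are not taken from iDoComp.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def encode(target, reference, k=4):
    """Greedily parse target into (ref_pos, match_len, literal) triples.

    Each triple means: copy match_len characters from the reference starting
    at ref_pos, then append one literal character (empty at the very end).
    """
    index = build_index(reference, k)
    triples = []
    i = 0
    while i < len(target):
        best_pos, best_len = 0, 0
        # Try every reference position that shares the next k-mer.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        i += best_len
        literal = target[i] if i < len(target) else ""
        triples.append((best_pos, best_len, literal))
        i += 1
    return triples


def decode(triples, reference):
    """Rebuild the target from the triples and the same reference."""
    parts = []
    for pos, length, literal in triples:
        parts.append(reference[pos:pos + length])
        parts.append(literal)
    return "".join(parts)


# Toy example: the target differs from the reference by one substitution.
reference = "ACGTTAGCCGATAGCTAGGCTAACGT"
target = reference[:10] + "T" + reference[11:]
triples = encode(target, reference)
restored = decode(triples, reference)
```

In this sketch the decoder needs the same reference as the encoder, which mirrors the requirement stated in the abstract. A practical tool would additionally entropy-code the triples and handle FASTA headers and line lengths; none of that is shown here.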

Year:  2014        PMID: 25344501      PMCID: PMC5855886          DOI: 10.1093/bioinformatics/btu698

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


