Robust relative compression of genomes with random access.

Sebastian Deorowicz, Szymon Grabowski.

Abstract

MOTIVATION: Storing, transferring and maintaining genomic databases has become a major challenge because of rapid technological progress in DNA sequencing and the correspondingly growing pace at which sequencing data are produced. Efficient compression, with support for extraction of arbitrary snippets of any sequence, is the key to maintaining those huge amounts of data.
RESULTS: We present an LZ77-style compression scheme for relative compression of multiple genomes of the same species. While the solution bears similarity to known algorithms, it offers significantly higher compression ratios at a compression speed over an order of magnitude greater. In particular, 69 differentially encoded human genomes are compressed over 400 times with fast compression, or even 1000 times with slower compression (the reference genome itself needs much more space). Adding fast random access to text snippets decreases the ratio to ~300.
AVAILABILITY: GDC is available at http://sun.aei.polsl.pl/gdc.
CONTACT: sebastian.deorowicz@polsl.pl.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
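
The idea of LZ77-style relative compression mentioned in the results can be illustrated with a minimal sketch. It is not the GDC implementation: the seed length `K`, the token layout, and the function names `build_index`, `compress` and `decompress` are all assumptions made for illustration, and random access is not covered. The sketch encodes a target sequence as a list of (position, length) matches into a reference sequence, falling back to single literal symbols where no sufficiently long match exists.

```python
from collections import defaultdict

K = 4  # hypothetical seed length; a real tool would use a much longer seed


def build_index(reference, k=K):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def compress(target, reference, k=K):
    """Greedily encode target as matches into reference plus literals."""
    index = build_index(reference, k)
    tokens = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference position sharing the next k-mer and extend it.
        for pos in index.get(target[i:i + k], ()):
            length = 0
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            tokens.append(("match", best_pos, best_len))
            i += best_len
        else:
            # No usable match: emit one symbol verbatim.
            tokens.append(("literal", target[i]))
            i += 1
    return tokens


def decompress(tokens, reference):
    """Rebuild the target from the token list and the same reference."""
    out = []
    for tok in tokens:
        if tok[0] == "match":
            _, pos, length = tok
            out.append(reference[pos:pos + length])
        else:
            out.append(tok[1])
    return "".join(out)


# Toy data (made up): the target differs from the reference by one base.
reference = "ACGTTAGCCGATAGGCTAACGGTC"
target = "ACGTTAGCCGTTAGGCTAACGGTC"
tokens = compress(target, reference)
restored = decompress(tokens, reference)
```

Because genomes of the same species differ in few places, most of a target genome collapses into a handful of long matches, which is where the high ratios of relative compression come from. The decoder needs the same reference that the encoder used.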

Year: 2011   PMID: 21896510   DOI: 10.1093/bioinformatics/btr505

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937

