Literature DB >> 24747219

DSRC 2--Industry-oriented compression of FASTQ files.

Lukasz Roguski1, Sebastian Deorowicz1.   

Abstract

SUMMARY: Modern sequencing platforms produce huge amounts of data. Archiving them raises major problems but is crucial for reproducibility of results, one of the most fundamental principles of science. The widely used gzip compressor, used for reduction of storage and transfer costs, is not a perfect solution, so a few specialized FASTQ compressors were proposed recently. Unfortunately, they are often impractical because of slow processing, lack of support for some variants of FASTQ files or instability. We propose DSRC 2 that offers compression ratios comparable with the best existing solutions, while being a few times faster and more flexible.
AVAILABILITY AND IMPLEMENTATION: DSRC 2 is freely available at http://sun.aei.polsl.pl/dsrc. The package contains command-line compressor, C and Python libraries for easy integration with existing software and technical documentation with examples of usage. CONTACT: sebastian.deorowicz@polsl.pl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Entities:  

Mesh:

Year:  2014        PMID: 24747219     DOI: 10.1093/bioinformatics/btu208

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  19 in total

1.  LFQC: a lossless compression algorithm for FASTQ files.

Authors:  Marius Nicolae; Sudipta Pathak; Sanguthevar Rajasekaran
Journal:  Bioinformatics       Date:  2015-06-20       Impact factor: 6.937

2.  QVZ: lossy compression of quality values.

Authors:  Greg Malysa; Mikel Hernaez; Idoia Ochoa; Milind Rao; Karthik Ganesan; Tsachy Weissman
Journal:  Bioinformatics       Date:  2015-05-28       Impact factor: 6.937

3.  SPRING: a next-generation compressor for FASTQ data.

Authors:  Shubham Chandak; Kedar Tatwawadi; Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal:  Bioinformatics       Date:  2019-08-01       Impact factor: 6.937

4.  Comparison of high-throughput sequencing data compression tools.

Authors:  Ibrahim Numanagić; James K Bonfield; Faraz Hach; Jan Voges; Jörn Ostermann; Claudio Alberti; Marco Mattavelli; S Cenk Sahinalp
Journal:  Nat Methods       Date:  2016-10-24       Impact factor: 28.547

5.  Effect of lossy compression of quality scores on variant calling.

Authors:  Idoia Ochoa; Mikel Hernaez; Rachel Goldfeder; Tsachy Weissman; Euan Ashley
Journal:  Brief Bioinform       Date:  2017-03-01       Impact factor: 11.622

6.  A cluster-based approach to compression of Quality Scores.

Authors:  Mikel Hernaez; Idoia Ochoa; Tsachy Weissman
Journal:  Proc Data Compress Conf       Date:  2016-12-19

7.  CoLoRd: compressing long reads.

Authors:  Marek Kokot; Adam Gudyś; Heng Li; Sebastian Deorowicz
Journal:  Nat Methods       Date:  2022-03-28       Impact factor: 47.990

8.  GDC 2: Compression of large collections of genomes.

Authors:  Sebastian Deorowicz; Agnieszka Danek; Marcin Niemiec
Journal:  Sci Rep       Date:  2015-06-25       Impact factor: 4.379

9.  Tentacle: distributed quantification of genes in metagenomes.

Authors:  Fredrik Boulund; Anders Sjögren; Erik Kristiansson
Journal:  Gigascience       Date:  2015-09-07       Impact factor: 6.524

10.  Light-weight reference-based compression of FASTQ data.

Authors:  Yongpeng Zhang; Linsen Li; Yanli Yang; Xiao Yang; Shan He; Zexuan Zhu
Journal:  BMC Bioinformatics       Date:  2015-06-09       Impact factor: 3.169

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.