Lukasz Roguski, Sebastian Deorowicz. Polish-Japanese Institute of Information Technology, 02-008 Warszawa, and Institute of Informatics, Silesian University of Technology, 44-100 Gliwice, Poland.
Abstract
SUMMARY: Modern sequencing platforms produce huge amounts of data. Archiving these data raises major problems but is crucial for the reproducibility of results, one of the most fundamental principles of science. The widely used gzip compressor, which reduces storage and transfer costs, is not an ideal solution, so several specialized FASTQ compressors have been proposed recently. Unfortunately, they are often impractical because of slow processing, lack of support for some variants of FASTQ files, or instability. We propose DSRC 2, which offers compression ratios comparable with the best existing solutions while being several times faster and more flexible.
AVAILABILITY AND IMPLEMENTATION: DSRC 2 is freely available at http://sun.aei.polsl.pl/dsrc. The package contains a command-line compressor, C and Python libraries for easy integration with existing software, and technical documentation with usage examples.
CONTACT: sebastian.deorowicz@polsl.pl
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
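As a rough illustration of the "compression ratio" by which FASTQ compressors are compared against the gzip baseline, the Python sketch below compresses synthetic FASTQ records with the standard-library gzip module. It reports the original size divided by the compressed size. The record generator, the read names, and the fixed 100-base read length are hypothetical. Real FASTQ files are far less regular, so the printed ratio says nothing about real data or about DSRC 2 itself, whose API is not shown here.

```python
import gzip


def fastq_record(i: int) -> str:
    """Build one synthetic 4-line FASTQ record (hypothetical test data)."""
    bases = "ACGT"
    # Deterministic pseudo-sequence so the example is reproducible.
    seq = "".join(bases[(i * 7 + j * 3) % 4] for j in range(100))
    # Quality string of the same length, using printable Phred+33 characters.
    qual = "".join(chr(33 + 30 + (i + j) % 10) for j in range(100))
    # Line 1: read ID, line 2: bases, line 3: '+' separator, line 4: qualities.
    return f"@read_{i}\n{seq}\n+\n{qual}\n"


def gzip_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by gzip-compressed size at the given level."""
    compressed = gzip.compress(data, compresslevel=level)
    return len(data) / len(compressed)


if __name__ == "__main__":
    fastq = "".join(fastq_record(i) for i in range(1000)).encode("ascii")
    print(f"input: {len(fastq)} bytes, gzip ratio: {gzip_ratio(fastq):.2f}")
```

Because the synthetic reads are highly repetitive, gzip compresses them far better than it would compress real sequencing output. The sketch only shows how the ratio is computed.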