
FaStore: a space-saving solution for raw sequencing data.

Lukasz Roguski, Idoia Ochoa, Mikel Hernaez, Sebastian Deorowicz.

Abstract

Motivation: The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw sequencing data. These data must be stored, processed and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. FaStore does not use any reference sequences for compression and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs.
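The abstract does not spell out what the lossy modes do. One common kind of lossy transformation for FASTQ data is quality-score binning, which collapses the Phred quality alphabet into a few representative values so that a downstream entropy coder has less to encode. The sketch below is a generic illustration of that idea and is not FaStore's implementation. The bin table is an assumption loosely modeled on Illumina's 8-level binning, and all names (`bin_quality`, `bin_quality_string`, `compressed_size`, `BINS`) are hypothetical. Generic DEFLATE (`zlib`) stands in for a specialized FASTQ coder.

```python
import bisect
import random
import zlib

PHRED_OFFSET = 33  # Phred+33 (Sanger / Illumina 1.8+) quality encoding

# Hypothetical bin table of (lowest Phred score in bin, representative score),
# loosely modeled on Illumina's 8-level binning. Scores below the first bound
# are left unchanged, giving 8 output levels in total.
BINS = [(2, 6), (10, 15), (20, 22), (25, 27), (30, 33), (35, 37), (40, 40)]
_LOWER_BOUNDS = [lower for lower, _ in BINS]


def bin_quality(q: int) -> int:
    """Map one Phred score to the representative value of its bin."""
    idx = bisect.bisect_right(_LOWER_BOUNDS, q) - 1
    if idx < 0:
        return q  # below the first bin: keep as-is
    return BINS[idx][1]


def bin_quality_string(qual: str) -> str:
    """Bin every character of a Phred+33 quality string."""
    return "".join(
        chr(bin_quality(ord(c) - PHRED_OFFSET) + PHRED_OFFSET) for c in qual
    )


def compressed_size(text: str) -> int:
    """Size in bytes after generic DEFLATE compression (zlib, level 9)."""
    return len(zlib.compress(text.encode("ascii"), 9))


if __name__ == "__main__":
    print(bin_quality_string("IIIIHHGG?5+#"))  # prints IIIIFFFFB70'

    # Synthetic, uniformly random qualities (Q2-Q41); real data differ.
    random.seed(0)
    raw = "".join(chr(random.randint(2, 41) + PHRED_OFFSET) for _ in range(100_000))
    binned = bin_quality_string(raw)
    print("raw:", compressed_size(raw), "bytes; binned:", compressed_size(binned), "bytes")
```

Binning discards information irreversibly, which is why the abstract's question of whether lossy modes affect variant calling matters. The synthetic comparison only shows that a smaller quality alphabet compresses better. It says nothing about how binning affects downstream analyses on real data.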
Results: FaStore in the lossless mode achieves a significant improvement in compression ratio with respect to previously proposed algorithms. We perform an analysis on the effect that the different lossy modes have on variant calling, the most widely used application for clinical decision making, especially important in the era of precision medicine. We show that lossy compression can offer significant compression gains, while preserving the essential genomic information and without affecting the variant calling performance.

Availability and implementation: FaStore can be downloaded from https://github.com/refresh-bio/FaStore.

Supplementary information: Supplementary data are available at Bioinformatics online.


Year:  2018        PMID: 29617939     DOI: 10.1093/bioinformatics/bty205

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  SPRING: a next-generation compressor for FASTQ data.

Authors:  Shubham Chandak; Kedar Tatwawadi; Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal:  Bioinformatics       Date:  2019-08-01       Impact factor: 6.937

2.  CoLoRd: compressing long reads.

Authors:  Marek Kokot; Adam Gudyś; Heng Li; Sebastian Deorowicz
Journal:  Nat Methods       Date:  2022-03-28       Impact factor: 47.990

3.  Crumble: reference free lossy compression of sequence quality values.

Authors:  James K Bonfield; Shane A McCarthy; Richard Durbin
Journal:  Bioinformatics       Date:  2019-01-15       Impact factor: 6.937

4.  LFastqC: A lossless non-reference-based FASTQ compressor.

Authors:  Sultan Al Yami; Chun-Hsi Huang
Journal:  PLoS One       Date:  2019-11-14       Impact factor: 3.240

5.  Productive visualization of high-throughput sequencing data using the SeqCode open portable platform.

Authors:  Enrique Blanco; Mar González-Ramírez; Luciano Di Croce
Journal:  Sci Rep       Date:  2021-10-01       Impact factor: 4.379

6.  BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs.

Authors:  Rongjie Wang; Junyi Li; Yang Bai; Tianyi Zang; Yadong Wang
Journal:  PeerJ       Date:  2018-10-19       Impact factor: 2.984

7.  Better quality score compression through sequence-based quality smoothing.

Authors:  Yoshihiro Shibuya; Matteo Comin
Journal:  BMC Bioinformatics       Date:  2019-11-22       Impact factor: 3.169

8.  Hamming-shifting graph of genomic short reads: Efficient construction and its application for compression.

Authors:  Yuansheng Liu; Jinyan Li
Journal:  PLoS Comput Biol       Date:  2021-07-19       Impact factor: 4.475

