Compression of DNA sequence reads in FASTQ format.

Sebastian Deorowicz, Szymon Grabowski.

Abstract

MOTIVATION: Modern sequencing instruments are able to generate at least hundreds of millions of short reads of genomic data. Such huge volumes of data require effective storage that provides quick access to any record and enables fast decompression.
RESULTS: We present a specialized compression algorithm for genomic data in FASTQ format which dominates its competitor, G-SQZ, as shown on a number of datasets from the 1000 Genomes Project (www.1000genomes.org).
AVAILABILITY: DSRC is freely available at http://sun.aei.polsl.pl/dsrc.
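
The requirements named in the motivation (compact storage, quick access to any record, fast decompression) can be illustrated with a minimal sketch. This is not the DSRC algorithm, whose details the abstract does not give. It assumes simple four-line FASTQ records, uses general-purpose zlib as a stand-in compressor, and introduces hypothetical helper names (`parse_fastq`, `compress_blocks`, `get_record`) purely for illustration. Records are grouped into fixed-size blocks, and titles, bases and qualities are compressed as separate streams, so that fetching one record requires decompressing only its block.

```python
import zlib


def parse_fastq(text):
    """Parse four-line FASTQ text into (title, sequence, quality) tuples."""
    lines = text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        title, seq, plus, qual = lines[i:i + 4]
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        records.append((title[1:], seq, qual))
    return records


def compress_blocks(records, block_size):
    """Compress records in blocks, one zlib stream per field per block."""
    blocks = []
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        # Separate the three fields: each has different statistics,
        # so compressing them apart tends to help a generic compressor.
        streams = ["\n".join(rec[field] for rec in chunk) for field in range(3)]
        blocks.append(tuple(zlib.compress(s.encode("ascii")) for s in streams))
    return blocks


def get_record(blocks, index, block_size):
    """Return record `index`, decompressing only the block that holds it."""
    block = blocks[index // block_size]
    titles, seqs, quals = (
        zlib.decompress(stream).decode("ascii").split("\n") for stream in block
    )
    offset = index % block_size
    return titles[offset], seqs[offset], quals[offset]
```

The block size is the design trade-off in this sketch: larger blocks usually compress better, while smaller blocks make access to a single record cheaper.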

Year:  2011        PMID: 21252073     DOI: 10.1093/bioinformatics/btr014

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


