
Transformations for the compression of FASTQ quality scores of next-generation sequencing data.

Raymond Wan, Vo Ngoc Anh, Kiyoshi Asai.

Abstract

MOTIVATION: The growth of next-generation sequencing means that more effective and efficient archiving methods are needed to store the generated data for public dissemination and in anticipation of more mature analytical methods later. This article examines methods for compressing the quality score component of the data to partly address this problem.
RESULTS: We compare several compression policies for quality scores, in terms of both compression effectiveness and overall efficiency. The policies employ lossy and lossless transformations with one of several coding schemes. Experiments show that both lossy and lossless transformations are useful, and that simple coding methods, which consume fewer computing resources, are highly competitive, especially when random access to reads is needed.
AVAILABILITY AND IMPLEMENTATION: Our C++ implementation, released under the Lesser General Public License, is available for download at http://www.cb.k.u-tokyo.ac.jp/asailab/members/rwan.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
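The abstract does not spell out the individual transformations or coding schemes. As a rough illustration only, the Python sketch below assumes Phred+33 quality strings and uses uniform binning as a stand-in lossy transformation, frequency-rank remapping as a stand-in lossless transformation, and a fixed-width binary code stored separately for each read. None of these choices is claimed to be the authors' method, and the function names (`decode_quals`, `bin_scores`, `rank_table`, `compress`, `decompress_read`) are hypothetical.

```python
from collections import Counter

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def decode_quals(qual_line):
    """Turn a FASTQ quality line into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual_line]


def bin_scores(scores, width):
    """Hypothetical lossy step: map each score to the lower bound of its bin."""
    return [(q // width) * width for q in scores]


def rank_table(reads):
    """Hypothetical lossless step: most frequent score gets rank 0, and so on."""
    counts = Counter(q for read in reads for q in read)
    ordered = sorted(counts, key=lambda q: (-counts[q], q))  # ties broken by value
    return {q: r for r, q in enumerate(ordered)}


def compress(reads, width=None):
    """Optionally bin, then rank-map, then pack each read with a fixed-width code."""
    if width is not None:
        reads = [bin_scores(r, width) for r in reads]
    table = rank_table(reads)
    bits = max(1, (len(table) - 1).bit_length())  # ceil(log2(alphabet size))
    packed = []
    for read in reads:
        word = 0
        for q in read:
            word = (word << bits) | table[q]
        packed.append((word, len(read)))  # one self-contained record per read
    return table, bits, packed


def decompress_read(table, bits, packed, i):
    """Random access: decode read i without touching any other read."""
    inverse = {r: q for q, r in table.items()}
    word, length = packed[i]
    mask = (1 << bits) - 1
    return [inverse[(word >> (bits * (length - 1 - k))) & mask] for k in range(length)]


reads = [decode_quals(line) for line in ["IIIIHHG#", "IIHHH###"]]
table, bits, packed = compress(reads)            # lossless policy
assert decompress_read(table, bits, packed, 1) == reads[1]
```

Because each read is stored as its own fixed-width record, `decompress_read` needs only the shared table and that read's record. Adaptive or context-based coders usually compress better, but they tend to make this kind of per-read access harder, which is the trade-off the abstract alludes to.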

Year:  2011        PMID: 22171329     DOI: 10.1093/bioinformatics/btr689

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

Related articles: 15 in total; the first 10 are listed below.

1.  Using Genome Query Language to uncover genetic variation.

Authors:  Christos Kozanitis; Andrew Heiberg; George Varghese; Vineet Bafna
Journal:  Bioinformatics       Date:  2013-06-10       Impact factor: 6.937

2.  LFQC: a lossless compression algorithm for FASTQ files.

Authors:  Marius Nicolae; Sudipta Pathak; Sanguthevar Rajasekaran
Journal:  Bioinformatics       Date:  2015-06-20       Impact factor: 6.937

3.  QVZ: lossy compression of quality values.

Authors:  Greg Malysa; Mikel Hernaez; Idoia Ochoa; Milind Rao; Karthik Ganesan; Tsachy Weissman
Journal:  Bioinformatics       Date:  2015-05-28       Impact factor: 6.937

4.  SCALCE: boosting sequence compression algorithms using locally consistent encoding.

Authors:  Faraz Hach; Ibrahim Numanagic; Can Alkan; S Cenk Sahinalp
Journal:  Bioinformatics       Date:  2012-10-09       Impact factor: 6.937

5.  Effect of lossy compression of quality scores on variant calling.

Authors:  Idoia Ochoa; Mikel Hernaez; Rachel Goldfeder; Tsachy Weissman; Euan Ashley
Journal:  Brief Bioinform       Date:  2017-03-01       Impact factor: 11.622

6.  Traversing the k-mer Landscape of NGS Read Datasets for Quality Score Sparsification.

Authors:  Y William Yu; Deniz Yorukoglu; Bonnie Berger
Journal:  Res Comput Mol Biol       Date:  2014-04

7.  Piecewise polynomial representations of genomic tracks.

Authors:  Maxime Tarabichi; Vincent Detours; Tomasz Konopka
Journal:  PLoS One       Date:  2012-11-15       Impact factor: 3.240

8.  QualComp: a new lossy compressor for quality scores based on rate distortion theory.

Authors:  Idoia Ochoa; Himanshu Asnani; Dinesh Bharadia; Mainak Chowdhury; Tsachy Weissman; Golan Yona
Journal:  BMC Bioinformatics       Date:  2013-06-08       Impact factor: 3.169

9.  NGC: lossless and lossy compression of aligned high-throughput sequencing data.

Authors:  Niko Popitsch; Arndt von Haeseler
Journal:  Nucleic Acids Res       Date:  2012-10-12       Impact factor: 16.971

10.  Compression of FASTQ and SAM format sequencing data.

Authors:  James K Bonfield; Matthew V Mahoney
Journal:  PLoS One       Date:  2013-03-22       Impact factor: 3.240
