
AQUa: an adaptive framework for compression of sequencing quality scores with random access functionality.

Tom Paridaens, Glenn Van Wallendael, Wesley De Neve, Peter Lambert.

Abstract

Motivation: The past decade has seen the introduction of new technologies that significantly lowered the cost of genome sequencing. As a result, the amount of genomic data that must be stored and transmitted is increasing exponentially. To mitigate storage and transmission issues, we introduce a framework for lossless compression of quality scores.
Results: This article proposes AQUa, an adaptive framework for lossless compression of quality scores. To compress these quality scores, AQUa makes use of a configurable set of coding tools, extended with a Context-Adaptive Binary Arithmetic Coding scheme. When benchmarked against generic single-pass compressors, AQUa reduces file sizes by up to 38.49% compared with GNU Gzip and by up to 6.48% compared with 7-Zip at the Ultra setting, while still providing support for random access. Compared with SCALCE, a purpose-built, single-pass, state-of-the-art compressor that does not support random access, file sizes are reduced by up to 21.14%. Compared with QVZ, a purpose-built, dual-pass, state-of-the-art compressor that does not support random access, file sizes are larger by 6.42% to 33.47%. However, for one test file, the file size is 0.38% smaller, illustrating the strength of our single-pass compression framework. This work has been spurred by the current activity on genomic information representation (MPEG-G) within the ISO/IEC SC29/WG11 technical committee.
Availability and implementation: The software is available on GitHub: https://github.com/tparidae/AQUa.
Contact: tom.paridaens@ugent.be.
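The combination of compression and random access described in the Results can be illustrated with a minimal sketch. It is not AQUa's implementation: it uses Python's standard `zlib` as a stand-in for AQUa's coding tools and CABAC stage, and the function names, the `block_size` parameter, and the percentage formula are assumptions made for illustration only. The idea shown is that compressing quality-score lines in independent blocks and keeping a block index lets a reader decode one block instead of the whole file.

```python
import zlib


def compress_blocks(quality_lines, block_size=4):
    """Compress quality-score lines in independent blocks (hypothetical helper).

    Returns (payload, index), where index[i] is the (offset, length)
    of compressed block i inside payload.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(quality_lines), block_size):
        block = "\n".join(quality_lines[start:start + block_size]).encode("ascii")
        packed = zlib.compress(block, 9)  # zlib stands in for the real coder
        index.append((len(payload), len(packed)))
        payload += packed
    return bytes(payload), index


def read_line(payload, index, line_no, block_size=4):
    """Random access: decode only the block that holds line_no."""
    offset, length = index[line_no // block_size]
    block = zlib.decompress(payload[offset:offset + length]).decode("ascii")
    return block.split("\n")[line_no % block_size]


def size_reduction_percent(baseline_size, new_size):
    """Relative size reduction versus a baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_size - new_size) / baseline_size


if __name__ == "__main__":
    lines = ["IIIIHHHHGGGG", "IIIIIIIIHHHH", "FFFFHHHHIIII", "GGGGGGGGFFFF", "IIIIHHHHFFFF"]
    payload, index = compress_blocks(lines, block_size=2)
    print(read_line(payload, index, 3, block_size=2))
    print(size_reduction_percent(200, 150))
```

Smaller blocks make access finer-grained but give the compressor less context per block, so the block size trades compression ratio against access cost.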
© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Year: 2018    PMID: 29028894    DOI: 10.1093/bioinformatics/btx607

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  3 in total

1.  FCLQC: fast and concurrent lossless quality scores compressor.

Authors:  Minhyeok Cho; Albert No
Journal:  BMC Bioinformatics       Date:  2021-12-20       Impact factor: 3.169

2.  CMIC: an efficient quality score compressor with random access functionality.

Authors:  Hansen Chen; Jianhua Chen; Zhiwen Lu; Rongshu Wang
Journal:  BMC Bioinformatics       Date:  2022-07-23       Impact factor: 3.307

3.  LCQS: an efficient lossless compression tool of quality scores with random access functionality.

Authors:  Jiabing Fu; Bixin Ke; Shoubin Dong
Journal:  BMC Bioinformatics       Date:  2020-03-18       Impact factor: 3.169

