Comparison of high-throughput sequencing data compression tools.

Ibrahim Numanagić, James K Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, Claudio Alberti, Marco Mattavelli, S Cenk Sahinalp.

Abstract

High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.

Year:  2016        PMID: 27776113     DOI: 10.1038/nmeth.4037

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   28.547


  22 in total

1.  Aligned genomic data compression via improved modeling.

Authors:  Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal:  J Bioinform Comput Biol       Date:  2014-12       Impact factor: 1.122

2.  A FASTQ compressor based on integer-mapped k-mer indexing for biologist.

Authors:  Yeting Zhang; Khyati Patel; Tony Endrawis; Autumn Bowers; Yazhou Sun
Journal:  Gene       Date:  2015-12-30       Impact factor: 3.688

3.  Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform.

Authors:  Anthony J Cox; Markus J Bauer; Tobias Jakobi; Giovanna Rosone
Journal:  Bioinformatics       Date:  2012-05-03       Impact factor: 6.937

4.  LFQC: a lossless compression algorithm for FASTQ files.

Authors:  Marius Nicolae; Sudipta Pathak; Sanguthevar Rajasekaran
Journal:  Bioinformatics       Date:  2015-06-20       Impact factor: 6.937

5.  Disk-based compression of data from genome sequencing.

Authors:  Szymon Grabowski; Sebastian Deorowicz; Łukasz Roguski
Journal:  Bioinformatics       Date:  2014-12-22       Impact factor: 6.937

6.  FQC: A novel approach for efficient compression, archival, and dissemination of fastq datasets.

Authors:  Anirban Dutta; Mohammed Monzoorul Haque; Tungadri Bose; C V S K Reddy; Sharmila S Mande
Journal:  J Bioinform Comput Biol       Date:  2015-02-08       Impact factor: 1.122

7.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

8.  Compression of next-generation sequencing reads aided by highly efficient de novo assembly.

Authors:  Daniel C Jones; Walter L Ruzzo; Xinxia Peng; Michael G Katze
Journal:  Nucleic Acids Res       Date:  2012-08-16       Impact factor: 16.971

9.  Light-weight reference-based compression of FASTQ data.

Authors:  Yongpeng Zhang; Linsen Li; Yanli Yang; Xiao Yang; Shan He; Zexuan Zhu
Journal:  BMC Bioinformatics       Date:  2015-06-09       Impact factor: 3.169

10.  The Scramble conversion tool.

Authors:  James K Bonfield
Journal:  Bioinformatics       Date:  2014-06-14       Impact factor: 6.937

  21 in total

1.  SPRING: a next-generation compressor for FASTQ data.

Authors:  Shubham Chandak; Kedar Tatwawadi; Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal:  Bioinformatics       Date:  2019-08-01       Impact factor: 6.937

2.  CALQ: compression of quality values of aligned sequencing data.

Authors:  Jan Voges; Jörn Ostermann; Mikel Hernaez
Journal:  Bioinformatics       Date:  2018-05-15       Impact factor: 6.937

3.  Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis.

Authors:  Shubham Chandak; Kedar Tatwawadi; Tsachy Weissman
Journal:  Bioinformatics       Date:  2018-02-15       Impact factor: 6.937

4.  SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors:  Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal:  Bioinformatics       Date:  2022-05-18       Impact factor: 6.931

5.  Representation of k-Mer Sets Using Spectrum-Preserving String Sets.

Authors:  Amatur Rahman; Paul Medvedev
Journal:  J Comput Biol       Date:  2020-12-07       Impact factor: 1.479

6.  LW-FQZip 2: a parallelized reference-based compression of FASTQ files.

Authors:  Zhi-An Huang; Zhenkun Wen; Qingjin Deng; Ying Chu; Yiwen Sun; Zexuan Zhu
Journal:  BMC Bioinformatics       Date:  2017-03-20       Impact factor: 3.169

7.  Optimal compressed representation of high throughput sequence data via light assembly.

Authors:  Antonio A Ginart; Joseph Hui; Kaiyuan Zhu; Ibrahim Numanagić; Thomas A Courtade; S Cenk Sahinalp; David N Tse
Journal:  Nat Commun       Date:  2018-02-08       Impact factor: 14.919

8.  BugMat and FindNeighbour: command line and server applications for investigating bacterial relatedness.

Authors:  Oriol Mazariegos-Canellas; Trien Do; Tim Peto; David W Eyre; Anthony Underwood; Derrick Crook; David H Wyllie
Journal:  BMC Bioinformatics       Date:  2017-11-13       Impact factor: 3.169

9.  Disk compression of k-mer sets.

Authors:  Amatur Rahman; Rayan Chikhi; Paul Medvedev
Journal:  Algorithms Mol Biol       Date:  2021-06-21       Impact factor: 1.405

10.  ARSDA: A New Approach for Storing, Transmitting and Analyzing Transcriptomic Data.

Authors:  Xuhua Xia
Journal:  G3 (Bethesda)       Date:  2017-12-04       Impact factor: 3.154
