Comparison of high-throughput sequencing data compression tools.

Ibrahim Numanagić¹, James K Bonfield², Faraz Hach¹,³, Jan Voges⁴, Jörn Ostermann⁴, Claudio Alberti⁵, Marco Mattavelli⁵, S Cenk Sahinalp¹,³,⁶.

Abstract

High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.

Year: 2016 PMID: 27776113 DOI: 10.1038/nmeth.4037

Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547

22 in total

1. Aligned genomic data compression via improved modeling.

Authors: Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal: J Bioinform Comput Biol Date: 2014-12 Impact factor: 1.122

2. A FASTQ compressor based on integer-mapped k-mer indexing for biologist.

Authors: Yeting Zhang; Khyati Patel; Tony Endrawis; Autumn Bowers; Yazhou Sun
Journal: Gene Date: 2015-12-30 Impact factor: 3.688

3. Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform.

Authors: Anthony J Cox; Markus J Bauer; Tobias Jakobi; Giovanna Rosone
Journal: Bioinformatics Date: 2012-05-03 Impact factor: 6.937

4. LFQC: a lossless compression algorithm for FASTQ files.

Authors: Marius Nicolae; Sudipta Pathak; Sanguthevar Rajasekaran
Journal: Bioinformatics Date: 2015-06-20 Impact factor: 6.937

5. Disk-based compression of data from genome sequencing.

Authors: Szymon Grabowski; Sebastian Deorowicz; Łukasz Roguski
Journal: Bioinformatics Date: 2014-12-22 Impact factor: 6.937

6. FQC: A novel approach for efficient compression, archival, and dissemination of fastq datasets.

Authors: Anirban Dutta; Mohammed Monzoorul Haque; Tungadri Bose; C V S K Reddy; Sharmila S Mande
Journal: J Bioinform Comput Biol Date: 2015-02-08 Impact factor: 1.122

7. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

8. Compression of next-generation sequencing reads aided by highly efficient de novo assembly.

Authors: Daniel C Jones; Walter L Ruzzo; Xinxia Peng; Michael G Katze
Journal: Nucleic Acids Res Date: 2012-08-16 Impact factor: 16.971

9. Light-weight reference-based compression of FASTQ data.

Authors: Yongpeng Zhang; Linsen Li; Yanli Yang; Xiao Yang; Shan He; Zexuan Zhu
Journal: BMC Bioinformatics Date: 2015-06-09 Impact factor: 3.169

10. The Scramble conversion tool.

Authors: James K Bonfield
Journal: Bioinformatics Date: 2014-06-14 Impact factor: 6.937

21 in total

1. SPRING: a next-generation compressor for FASTQ data.

Authors: Shubham Chandak; Kedar Tatwawadi; Idoia Ochoa; Mikel Hernaez; Tsachy Weissman
Journal: Bioinformatics Date: 2019-08-01 Impact factor: 6.937

2. CALQ: compression of quality values of aligned sequencing data.

Authors: Jan Voges; Jörn Ostermann; Mikel Hernaez
Journal: Bioinformatics Date: 2018-05-15 Impact factor: 6.937

3. Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis.

Authors: Shubham Chandak; Kedar Tatwawadi; Tsachy Weissman
Journal: Bioinformatics Date: 2018-02-15 Impact factor: 6.937

4. SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors: Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal: Bioinformatics Date: 2022-05-18 Impact factor: 6.931

5. Representation of k-Mer Sets Using Spectrum-Preserving String Sets.

Authors: Amatur Rahman; Paul Medvedev
Journal: J Comput Biol Date: 2020-12-07 Impact factor: 1.479

6. LW-FQZip 2: a parallelized reference-based compression of FASTQ files.

Authors: Zhi-An Huang; Zhenkun Wen; Qingjin Deng; Ying Chu; Yiwen Sun; Zexuan Zhu
Journal: BMC Bioinformatics Date: 2017-03-20 Impact factor: 3.169

7. Optimal compressed representation of high throughput sequence data via light assembly.

Authors: Antonio A Ginart; Joseph Hui; Kaiyuan Zhu; Ibrahim Numanagić; Thomas A Courtade; S Cenk Sahinalp; David N Tse
Journal: Nat Commun Date: 2018-02-08 Impact factor: 14.919

8. BugMat and FindNeighbour: command line and server applications for investigating bacterial relatedness.

Authors: Oriol Mazariegos-Canellas; Trien Do; Tim Peto; David W Eyre; Anthony Underwood; Derrick Crook; David H Wyllie
Journal: BMC Bioinformatics Date: 2017-11-13 Impact factor: 3.169

9. Disk compression of k-mer sets.

Authors: Amatur Rahman; Rayan Chikhi; Paul Medvedev
Journal: Algorithms Mol Biol Date: 2021-06-21 Impact factor: 1.405

10. ARSDA: A New Approach for Storing, Transmitting and Analyzing Transcriptomic Data.

Authors: Xuhua Xia
Journal: G3 (Bethesda) Date: 2017-12-04 Impact factor: 3.154