
FQC: A novel approach for efficient compression, archival, and dissemination of fastq datasets.

Anirban Dutta, Mohammed Monzoorul Haque, Tungadri Bose, C V S K Reddy, Sharmila S Mande.

Abstract

Sequence data repositories archive and disseminate fastq data in compressed format. Although GZIP offers relatively lower compression efficiency than the available specialized fastq compression algorithms, data repositories continue to prefer it because of its ease of deployment, high processing speed, and portability. This study presents FQC, a fastq compression method that, in addition to providing significantly higher compression gains over GZIP, incorporates features necessary for universal adoption by data repositories and end-users. This study also proposes a novel archival strategy that allows sequence repositories to simultaneously store and disseminate lossless as well as (multiple) lossy variants of fastq files without requiring additional storage. For academic users, Linux, Windows, and Mac implementations (both 32-bit and 64-bit) of FQC are freely available for download at: https://metagenomics.atc.tcs.com/compression/FQC

Keywords:  Data compaction and compression; NGS data; algorithms for biological data management; sequencing data archival

Year:  2015        PMID: 25790783     DOI: 10.1142/S0219720015410036

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  2 in total

1.  FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information.

Authors:  Tungadri Bose; Anirban Dutta; Mohammed Mh; Hemang Gandhi; Sharmila S Mande
Journal:  J Biosci       Date:  2015-09       Impact factor: 1.826

2.  Comparison of high-throughput sequencing data compression tools.

Authors:  Ibrahim Numanagić; James K Bonfield; Faraz Hach; Jan Voges; Jörn Ostermann; Claudio Alberti; Marco Mattavelli; S Cenk Sahinalp
Journal:  Nat Methods       Date:  2016-10-24       Impact factor: 28.547

