Anirban Dutta, Mohammed Monzoorul Haque, Tungadri Bose, C V S K Reddy, Sharmila S Mande.
Abstract
Sequence data repositories archive and disseminate fastq data in compressed format. Although GZIP offers relatively low compression efficiency, data repositories continue to prefer it over the available specialized fastq compression algorithms because of its ease of deployment, high processing speed, and portability. This study presents FQC, a fastq compression method that, in addition to providing significantly higher compression gains than GZIP, incorporates the features necessary for universal adoption by data repositories and end-users. This study also proposes a novel archival strategy that allows sequence repositories to simultaneously store and disseminate lossless as well as (multiple) lossy variants of fastq files without requiring any additional storage. For academic users, Linux, Windows, and Mac implementations (both 32- and 64-bit) of FQC are freely available for download at https://metagenomics.atc.tcs.com/compression/FQC.
Keywords: Data compaction and compression; NGS data; algorithms for biological data management; sequencing data archival
Year: 2015 PMID: 25790783 DOI: 10.1142/S0219720015410036
Source DB: PubMed Journal: J Bioinform Comput Biol ISSN: 0219-7200 Impact factor: 1.122