Kirill Kryukov1, Mahoko Takahashi Ueda2, So Nakagawa1,2, Tadashi Imanishi1.
Abstract
SUMMARY: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF), a new file format for lossless reference-free compression of FASTA- and FASTQ-formatted nucleotide sequences. The NAF compression ratio is comparable to that of the best DNA compressors, while its decompression is dramatically faster. We compared our format with the DNA compressors DELIMINATE and MFCompress, and with the general-purpose compressors gzip, bzip2, xz, brotli and zstd.
Year: 2019 PMID: 30799504 PMCID: PMC6761962 DOI: 10.1093/bioinformatics/btz144
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Compression strength and decompression speed of eight compressors. Human genome (GRCh38, 3.3 GB) was used as test dataset. ‘mfc’ and ‘dlim’ represent MFCompress and DELIMINATE, respectively. Each compressor was used with its strongest compression setting: ‘gzip -9’, ‘bzip2 -9’, ‘brotli -11’, ‘zstd --ultra -22’, ‘xz -e9’, ‘ennaf -22’, ‘delim a’, ‘MFCompressC -3’. CPU used: Intel Xeon E5-2643v3 (3.4 GHz)
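
The kind of measurement summarized in Fig. 1 (compression strength against decompression speed) can be sketched for the general-purpose compressors that ship with the Python standard library. This is only an illustration under stated assumptions, not the authors' benchmark: the input is a small synthetic FASTA record rather than GRCh38, only gzip, bzip2 and xz are covered (NAF, DELIMINATE, MFCompress, brotli and zstd are not included), and the helper names `make_fasta` and `benchmark` are hypothetical.

```python
import bz2
import gzip
import lzma
import random
import time


def make_fasta(n_bases=200_000, line_width=60, seed=0):
    """Build a synthetic single-record FASTA text (illustrative data, not a real genome)."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(n_bases))
    lines = [seq[i:i + line_width] for i in range(0, n_bases, line_width)]
    return (">synthetic_sequence\n" + "\n".join(lines) + "\n").encode("ascii")


# Each entry maps a name to (compress, decompress), using the strongest
# setting the standard-library module offers.
COMPRESSORS = {
    "gzip": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "xz": (lambda d: lzma.compress(d, preset=9 | lzma.PRESET_EXTREME), lzma.decompress),
}


def benchmark(data):
    """Return compression ratio and decompression time for each compressor."""
    results = {}
    for name, (compress, decompress) in COMPRESSORS.items():
        packed = compress(data)
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        assert restored == data  # the round trip must be lossless
        results[name] = {
            "ratio": len(data) / len(packed),
            "decompress_seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    for name, r in benchmark(make_fasta()).items():
        print(f"{name}: ratio={r['ratio']:.2f}, "
              f"decompression={r['decompress_seconds'] * 1000:.2f} ms")
```

Timings from a single run on a small random sequence are noisy and say little about real genomes, which contain repeats that random data lacks; a serious comparison would use real datasets and repeated measurements.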