Jan Voges, Tom Paridaens, Fabian Müntefering, Liudmila S Mainzer, Brian Bliss, Mingyu Yang, Idoia Ochoa, Jan Fostier, Jörn Ostermann, Mikel Hernaez.
Abstract
MOTIVATION: In response to the ever-increasing amount of genomic data being generated, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding (CABAC), binarization schemes and transformations, into a straightforward solution for the compression of sequencing data.
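The abstract names binarization and context-adaptive binary arithmetic coding without further detail. As a minimal sketch, and not GABAC's actual implementation, the Python below binarizes symbols with truncated unary or order-0 Exp-Golomb codes (two binarizations commonly used with CABAC), then models each bin with its own adaptive probability, selected by bin position as the context. Instead of running an arithmetic coder, it sums the ideal code length, -log2 p per bin, which an arithmetic coder approaches closely. The count-based (Laplace) estimator and all function and class names are assumptions for illustration; CABAC as standardized in H.264/AVC and HEVC uses a table-driven finite-state probability estimator instead.

```python
import math
from collections import defaultdict


def truncated_unary(n: int, c_max: int) -> str:
    """n ones followed by a terminating zero; the zero is omitted when n == c_max."""
    if not 0 <= n <= c_max:
        raise ValueError("symbol out of range")
    return "1" * n + ("0" if n < c_max else "")


def exp_golomb0(n: int) -> str:
    """Order-0 Exp-Golomb: (k - 1) leading zeros, then the k-bit binary form of n + 1."""
    if n < 0:
        raise ValueError("symbol must be non-negative")
    suffix = bin(n + 1)[2:]
    return "0" * (len(suffix) - 1) + suffix


class AdaptiveBitModel:
    """Hypothetical stand-in for a CABAC context model.

    Estimates P(bit) from counts that start at 1 (Laplace estimator).
    """

    def __init__(self) -> None:
        self.counts = [1, 1]

    def cost_and_update(self, bit: int) -> float:
        """Return the ideal cost of `bit` in bits, then adapt the model to it."""
        p = self.counts[bit] / (self.counts[0] + self.counts[1])
        self.counts[bit] += 1
        return -math.log2(p)


def ideal_code_length(symbols, binarize) -> float:
    """Binarize each symbol; bin i of every symbol shares context model i."""
    contexts = defaultdict(AdaptiveBitModel)
    total = 0.0
    for s in symbols:
        for i, bit in enumerate(binarize(s)):
            total += contexts[i].cost_and_update(int(bit))
    return total


# Usage: a highly repetitive stream costs far less than one bit per bin.
stream = [0] * 1000
tu_cost = ideal_code_length(stream, lambda s: truncated_unary(s, c_max=3))
print(f"{tu_cost:.2f} bits for {len(stream)} symbols")
print(exp_golomb0(3))  # 00100
```

For the all-zero stream each symbol becomes a single "0" bin, and the adaptive model's total cost is log2(1001), about 9.97 bits, versus 1000 bits for a fixed model with P = 0.5. This adaptivity is what lets a context-adaptive coder exploit skewed descriptor statistics.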
Year: 2020; PMID: 31830243; PMCID: PMC7141842; DOI: 10.1093/bioinformatics/btz922
Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937
Fig. 1. Ranks for compression performance (left) and speed (right). Dots were jittered for clarity. The x-axes show the ID of the test set (01 or 02) from which the descriptor stream files were generated; the y-axes show the ranks. Each dot is the rank a codec achieved on one descriptor stream file. The red lines mark the mean ranks, averaged over both test items. (A color version of this figure is available at Bioinformatics online.)
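The caption describes ranking the codecs on each descriptor stream file and then averaging those ranks. A minimal sketch of that bookkeeping, assuming rank 1 goes to the lowest score (smallest output or shortest time) and that tied codecs share the average of the positions they occupy; the tie rule, the function names, and the codec names and sizes in the usage example are hypothetical, not taken from the paper.

```python
from statistics import mean


def rank_codecs(scores: dict[str, float]) -> dict[str, float]:
    """Rank codecs on one file: 1 = lowest score (smallest size or shortest time).

    Tied codecs share the average of the positions they occupy.
    """
    ordered = sorted(scores.values())
    ranks = {}
    for codec, score in scores.items():
        first = ordered.index(score) + 1  # 1-based position of the first tied entry
        ties = ordered.count(score)       # number of codecs with exactly this score
        ranks[codec] = first + (ties - 1) / 2
    return ranks


def mean_ranks(per_file: list[dict[str, float]]) -> dict[str, float]:
    """Average each codec's rank over all files (every file must score the same codecs)."""
    ranked = [rank_codecs(scores) for scores in per_file]
    return {codec: mean(r[codec] for r in ranked) for codec in per_file[0]}


# Hypothetical compressed sizes (bytes) for three codecs on two stream files
files = [
    {"codec_a": 120, "codec_b": 100, "codec_c": 150},
    {"codec_a": 90, "codec_b": 110, "codec_c": 130},
]
print(mean_ranks(files))  # {'codec_a': 1.5, 'codec_b': 1.5, 'codec_c': 3.0}
```

The same functions apply to the speed panel by passing encoding or decoding times instead of sizes. Ranks rather than raw sizes keep files of very different lengths from dominating the average.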
Compressed sizes (in bytes) for different codec sets applied to the CRAM and DeeZ descriptor streams
| Stream set | Test item | Uncompressed size | CRAM | CRAM+GABAC | CRAM+GABAC−gzip−rANS-0 |
|---|---|---|---|---|---|
| CRAM streams | 01 | 2 196 327 888 | 412 453 039 | (not recovered) | (not recovered) |
| CRAM streams | 02 | 1 991 360 268 | 406 796 124 | (not recovered) | 404 080 825 |
| DeeZ streams | 01 | 2 838 530 295 | 538 155 557 | (not recovered) | (not recovered) |
| DeeZ streams | 02 | 2 555 531 164 | 566 444 432 | (not recovered) | 564 864 140 |
Note: Best results in bold.
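The sizes in the table can be compared as a compression ratio (uncompressed size divided by compressed size) or as a percentage size reduction. A minimal sketch of that arithmetic, applied to the uncompressed sizes and the CRAM codec-set results for test item 01; the function names are illustrative and not part of any codec's API.

```python
def compression_ratio(uncompressed: int, compressed: int) -> float:
    """Uncompressed size divided by compressed size (higher is better)."""
    if compressed <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed / compressed


def percent_reduction(baseline: int, candidate: int) -> float:
    """How much smaller `candidate` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline size must be positive")
    return 100.0 * (baseline - candidate) / baseline


# Test item 01 with the CRAM codec set: (uncompressed, compressed) sizes in bytes from the table
rows = {
    "CRAM streams": (2_196_327_888, 412_453_039),
    "DeeZ streams": (2_838_530_295, 538_155_557),
}
for name, (raw, packed) in rows.items():
    print(f"{name}: ratio {compression_ratio(raw, packed):.2f}, "
          f"{percent_reduction(raw, packed):.1f}% smaller")
```

Both descriptor-stream sets shrink by a factor of about 5.3 with the CRAM codec set for test item 01. The same `percent_reduction` call compares two codec-set columns directly by passing one column's size as the baseline and the other's as the candidate.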