Shuang Wang, Xiaoqian Jiang, Feng Chen, Lijuan Cui, Samuel Cheng.
Abstract
We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding (DSC) theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of varying code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS).
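The adaptive choice between hash coding and syndrome coding can be illustrated with a toy sketch. It is not the authors' codec: it assumes binary 7-bit blocks and swaps in a (7,4) Hamming code where the paper uses LDPC codes, and the names `short_hash`, `syndrome`, `encode`, and `decode` are hypothetical. The point it shows is that the client only computes a hash and a few parity bits, while the decoder, which holds the reference, does the recovery work.

```python
import hashlib

# Parity-check matrix of the (7,4) Hamming code (a stand-in for LDPC, assumed
# here for brevity). Column j (1-based) is the binary representation of j, so
# a single differing bit at position j produces a syndrome equal to j.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(bits):
    """Return the 3-bit syndrome H * bits (mod 2)."""
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)

def short_hash(bits, nbytes=4):
    """Truncated SHA-256 of a block; the length is an illustrative choice."""
    return hashlib.sha256(bytes(bits)).digest()[:nbytes]

def encode(block):
    """Client side: cheap, and never looks at the reference."""
    return short_hash(block), syndrome(block)

def decode(reference, block_hash, block_syndrome):
    """Server side: recover the source block using the reference as side information."""
    # Hash coding: the reference already equals the source.
    if short_hash(reference) == block_hash:
        return list(reference), "hash"
    # Syndrome coding: the XOR of the two syndromes is the syndrome of the
    # difference pattern, which locates a single differing bit.
    diff = [a ^ b for a, b in zip(syndrome(reference), block_syndrome)]
    pos = diff[0] * 4 + diff[1] * 2 + diff[2]
    candidate = list(reference)
    if pos:
        candidate[pos - 1] ^= 1
    if short_hash(candidate) != block_hash:
        raise ValueError("reference differs from source in more than one bit")
    return candidate, "syndrome"

if __name__ == "__main__":
    source = [1, 0, 1, 1, 0, 0, 1]
    h, s = encode(source)
    reference = list(source)
    reference[4] ^= 1  # one variation between source and reference
    print(decode(source, h, s))
    print(decode(reference, h, s))
```

In this toy the hash is longer than the block itself, so nothing is actually compressed; a real protocol would apply the hash to long subsequences and send syndromes only where the hash check fails.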
Keywords: distributed source coding; genome compression; graphical model
Year: 2014; PMID: 25520552; PMCID: PMC4256044; DOI: 10.4137/CIN.S13879
Source DB: PubMed; Journal: Cancer Inform; ISSN: 1176-9351
Figure 1. Workflow of genome compression based on DSC.
Figure 2. Workflow of genome compression based on DSC.
Figure 3. The diagram of hash-based coding.
Figure 4. Factor graph of genome compression based on DSC.
Figure 5. The empirical statistics of (A) the DNA bases {“A”, “T”, “G”, “C”, “N”} and (B) the local offsets, ranging from −4 to 4.
Figure 6. Compression performance of the proposed codec on the TAIR dataset: (A) the average code rates vs. the maximum local offset in syndrome coding; (B) the overall compression performance (ie, hash bits + syndromes) for all 5 chromosomes.
Figure 7. Performance comparison between GRS and our proposed codec on the TAIR dataset.
Performance of our proposed method on chromosome 4 of TIGR5 (35.8 MB).
| LDPC CODE LENGTH | COMPRESSION SIZE (KB) | ENCODING TIME (SECONDS) | DECODING TIME (SECONDS) |
|---|---|---|---|
| 528 | 3.68 | 0.04 | 298 |
| 1056 | 4.58 | 0.009 | 596 |
| 1584 | 5.67 | 0.01 | 787 |
| 2112 | 6.97 | 0.01 | 1102 |
| 2640 | 6.72 | 0.08 | 1374 |
Performance of GRS on chromosome 4 of TIGR5 (35.8 MB).
| COMPRESSION SIZE (KB) | ENCODING TIME (SECONDS) | DECODING TIME (SECONDS) |
|---|---|---|
| 26.34 | 12 | 6 |
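The trade-off in the two tables can be made explicit with a short calculation. The snippet below is only a convenience sketch: the numbers are copied from the tables above, and the names `PROPOSED`, `GRS`, and `compare` are introduced here for illustration.

```python
# Values copied from the two tables above (chromosome 4 of TIGR5, 35.8 MB).
# LDPC code length -> (compressed size in KB, encoding time in s, decoding time in s)
PROPOSED = {
    528: (3.68, 0.04, 298),
    1056: (4.58, 0.009, 596),
    1584: (5.67, 0.01, 787),
    2112: (6.97, 0.01, 1102),
    2640: (6.72, 0.08, 1374),
}
GRS = (26.34, 12, 6)

def compare(code_length):
    """Return (size reduction, encoding speed-up, decoding slow-down) relative to GRS."""
    size, enc, dec = PROPOSED[code_length]
    grs_size, grs_enc, grs_dec = GRS
    return grs_size / size, grs_enc / enc, dec / grs_dec

if __name__ == "__main__":
    for n in PROPOSED:
        size_gain, enc_gain, dec_cost = compare(n)
        print(f"{n:>5}: {size_gain:5.2f}x smaller, "
              f"{enc_gain:7.1f}x faster encoding, "
              f"{dec_cost:6.1f}x slower decoding")
```

For the 528 code length this works out to an output roughly 7 times smaller and encoding roughly 300 times faster than GRS, at the cost of decoding roughly 50 times slower, which is consistent with the stated goal of keeping the client side lightweight.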