Renan Sales Barros, Silvia Delgado Olabarriaga, Jordi Borst, Marianne A A van Walderveen, Jorrit S Posthuma, Geert J Streekstra, Marcel van Herk, Charles B L M Majoie, Henk A Marquering.
Abstract
The increasing size of medical imaging data, in particular time series such as CT perfusion (CTP), requires new and fast approaches to deliver timely results for acute care. Cloud architectures based on graphics processing units (GPUs) can provide the processing capacity required for delivering fast results. However, the size of CTP datasets makes transfers to cloud infrastructures time-consuming and therefore unsuitable in acute situations. To reduce this transfer time, this work proposes a fast and lossless compression algorithm for CTP data. The algorithm exploits redundancies in the temporal dimension and preserves random read-only access to the image elements directly from the compressed data on the GPU. To the best of our knowledge, this is the first work to present a GPU-ready method for medical image compression with random access to the image elements from the compressed data.
Keywords: Acute care; CT perfusion; GPU; Lossless compression; Parallel processing
Year: 2015 PMID: 26105146 PMCID: PMC4799275 DOI: 10.1007/s11517-015-1331-6
Source DB: PubMed Journal: Med Biol Eng Comput ISSN: 0140-0118 Impact factor: 2.602
Fig. 1 CTP data processing pipeline in a GPU-based cloud infrastructure: the CTP data are produced at the scanner (A), compressed in a terminal (B), and sent to the GPU-based cloud infrastructure (C). While being processed, the CTP data can be transferred several times between host application memory (D) and GPU memory (E)
Fig. 2 Sample slice of CTP data at time step 12, and the intensity values of the voxels at and over time. The intensity values at are not strongly affected by the contrast agent, and the intensity values at are strongly affected by the contrast agent
Fig. 3 Number of bits required to represent the variation of voxel intensities over time in the selected slice. The effect of motion artifacts is visible, and for this reason more bits are required to represent the area around the skull. Nevertheless, this larger number of bits (9–11 bits) is considerably smaller than the original 16 bits used by the uncompressed data. Furthermore, the motion affects only a small portion of the image: only 6 % of the voxels require more than eight bits to represent their intensity variation over time
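The per-voxel bit count described in this caption can be illustrated with a minimal sketch. It assumes that the "variation" of a voxel is the spread between its minimum and maximum intensity over time, and that a constant voxel needs zero bits. The function names are hypothetical and are not taken from the paper.

```python
def bits_for_variation(series):
    """Bits needed to store each sample as an offset from the voxel's minimum.

    Assumption: variation is measured as max - min over the time series.
    """
    return (max(series) - min(series)).bit_length()


def fraction_needing_more_than(voxels, threshold_bits=8):
    """Share of voxels whose temporal variation needs more than threshold_bits."""
    wide = sum(1 for series in voxels
               if bits_for_variation(series) > threshold_bits)
    return wide / len(voxels)
```

For a voxel with intensities 1000, 1003, 1001, 1007 the spread is 7, so under this assumption three bits per sample would suffice instead of 16.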
Fig. 4 Data structures used in the implementation. B is a constant-size array of 8-bit elements that stores the number of bits used to encode the intensity values of a voxel. C is a constant-size array of 16-bit elements used to store all the values. D is a constant-size array of 32-bit elements used to store all the D sets. O is an offset that determines where a set D begins in the array D
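A minimal sketch of how arrays like B, C, D, and O could support random reads directly from compressed data. Several details are assumptions rather than the paper's stated design: C is taken to hold each voxel's minimum intensity, each D set holds the offsets from that minimum, every D set starts on a 32-bit word boundary, and O is kept as a per-voxel list of word indices. The names `encode` and `read_value` are hypothetical, and plain Python lists stand in for the fixed-width arrays.

```python
WORD_BITS = 32
WORD_MASK = 0xFFFFFFFF


def encode(voxels):
    """Pack each voxel's time series as (bit width, base value, packed offsets)."""
    B, C, O, D = [], [], [], []
    for series in voxels:
        base = min(series)                       # assumed content of C
        width = (max(series) - base).bit_length()
        B.append(width)
        C.append(base)
        O.append(len(D))                         # word index where this D set starts
        packed = 0
        for t, value in enumerate(series):
            packed |= (value - base) << (t * width)
        n_words = (width * len(series) + WORD_BITS - 1) // WORD_BITS
        for w in range(n_words):
            D.append((packed >> (w * WORD_BITS)) & WORD_MASK)
    return B, C, O, D


def read_value(B, C, O, D, voxel, t):
    """Read one sample without decompressing anything else."""
    width = B[voxel]
    if width == 0:                               # constant voxel, no D set stored
        return C[voxel]
    word, shift = divmod(t * width, WORD_BITS)
    bits = D[O[voxel] + word] >> shift
    available = WORD_BITS - shift
    if width > available:                        # sample straddles two words
        bits |= D[O[voxel] + word + 1] << available
    return C[voxel] + (bits & ((1 << width) - 1))
```

Because every read touches at most two words of D plus one entry each of B, C, and O, each sample can be fetched independently, which is the property a per-thread GPU read would need.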
Hardware configuration used to execute the compression methods evaluated in our experiments
| Component | Specification |
|---|---|
| CPU name | Intel Xeon E5-2620 |
| CPU clock | 2.00 GHz |
| CPU cores | 6 |
| CPU threads | 12 |
| RAM memory | 64 GB |
| GPU name | GeForce GTX TITAN |
| GPU driver version | 331.65 |
| GPU cores | 2688 |
| GPU clock | 836 MHz |
| Dedicated video memory | 6 GB GDDR5 |
Compression time, reading time, and compression ratio for 20 datasets (mean ± SD [min., max.]) using different compression methods. The best results are underlined
| Compression method | Compression time (ms) | Reading time (ms) | Compression ratio |
|---|---|---|---|
| JPEG LS | 9911 ± 398 [8879, 10806] | 58267 ± 2546 [49924, 62052] | 4.64 ± 0.29 [4.14, 5.55] |
| JPEG | 14552 ± 742 [12234, 16095] | 43443 ± 1791 [37033, 44997] | 2.09 ± 0.16 [2.74, 3.55] |
| RLE | 9679 ± 947 [8286, 11110] | 15554 ± 634 [13468, 16669] | 2.31 ± 0.10 [2.12, 2.66] |
| DICOPP CPU | 20350 ± 2602 [14157, 24239] | 0.15 ± 0.36 [0, 1] | 2.20 ± 0.17 [1.95, 2.75] |
| DICOPP CPU PR | 17718 ± 1413 [14934, 20712] | 0.15 ± 0.36 [0, 1] | 2.20 ± 0.17 [1.95, 2.75] |
| DICOPP GPU | 5944 ± 711 [4826, 7873] | 0.15 ± 0.36 [0, 1] | 2.20 ± 0.17 [1.95, 2.75] |
Fig. 5 Maximum, mean, and minimum times (vertical axis) spent to compress 20 CTP datasets using different numbers of threads (horizontal axis)
Total transfer time (in s) for 20 datasets compressed by different methods and using different network speeds (mean ± SD [min., max.])
| | OC-3/STM-1 (s) | OC-12/STM-4 (s) | 1000BASE-T (s) | OC-48/STM-16 (s) |
|---|---|---|---|---|
| Original Data | 207.82 | 51.79 | 32.21 | 13.42 |
| JPEG LS | 113 ± 5.3 [96, 122] | 79 ± 3.2 [68, 84] | 75 ± 3.2 [64, 79] | 71 ± 3.0 [61, 75] |
| JPEG | 127 ± 5.4 [107, 134] | 75 ± 3.0 [63, 78] | 68 ± 2.7 [58, 71] | 62 ± 2.5 [53, 64] |
| RLE | 115 ± 4.7 [100, 123] | 47 ± 1.9 [41, 50] | 39 ± 1.6 [34, 41] | 31 ± 1.4 [27, 33] |
| DICOPP CPU | 115 ± 7.9 [89, 127] | 44 ± 3.5 [32, 48] | 35 ± 3.0 [25, 38] | 26 ± 2.7 [19, 30] |
| DICOPP CPU PR | 112 ± 7.4 [90, 123] | 41 ± 2.5 [33, 45] | 32 ± 2.0 [26, 36] | 23 ± 1.6 [19, 27] |
| DICOPP GPU | 100 ± 6.9 [80, 111] | 29 ± 1.9 [23, 32] | 20 ± 1.3 [16, 22] | 12 ± 0.8 [9, 14] |
The total transfer time is the sum of the compression time, time to transfer the compressed data, and the decompression time. The first row shows the transfer times of an uncompressed CTP dataset. The data transfer times were calculated based on the theoretical transfer rate of the network standards: OC-3/STM-1 [16], OC-12/STM-4 [16], 1000BASE-T [28], and OC-48/STM-16 [16]. Respectively, these transfer rates are: 155, 622, 1000, and 2400 Mbps. The best results are underlined
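The arithmetic behind this table can be sketched as follows. The uncompressed dataset size is not stated here, so the sketch infers it from the first row (207.82 s at 155 Mbps), which is an assumption; the function names are hypothetical. The example combines mean values from the tables, so it only approximates the reported per-dataset means.

```python
# Theoretical transfer rates quoted in the table note, in Mbps.
RATES_MBPS = {
    "OC-3/STM-1": 155,
    "OC-12/STM-4": 622,
    "1000BASE-T": 1000,
    "OC-48/STM-16": 2400,
}

# Assumption: size inferred from the "Original Data" row, not stated in the text.
ORIGINAL_BITS = 207.82 * RATES_MBPS["OC-3/STM-1"] * 1e6


def wire_time_s(size_bits, rate_mbps):
    """Seconds needed to send size_bits at the theoretical line rate."""
    return size_bits / (rate_mbps * 1e6)


def total_transfer_time_s(compress_s, ratio, rate_mbps, decompress_s,
                          size_bits=ORIGINAL_BITS):
    """Compression time + time on the wire for the compressed data + decompression time."""
    return compress_s + wire_time_s(size_bits / ratio, rate_mbps) + decompress_s


# DICOPP CPU means: 20350 ms to compress, 0.15 ms to read, ratio 2.20.
dicopp_cpu_oc48 = total_transfer_time_s(
    compress_s=20.350, ratio=2.20,
    rate_mbps=RATES_MBPS["OC-48/STM-16"], decompress_s=0.00015,
)
```

Under these assumptions the estimate for DICOPP CPU over OC-48/STM-16 comes to roughly 26.5 s, in line with the 26 ± 2.7 s reported in the table.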