Seongjin Park, Jeongjin Lee, Hyunna Lee, Juneseuk Shin, Jinwook Seo, Kyoung Ho Lee, Yeong-Gil Shin, Bohyoung Kim.
Abstract
This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with the intention of overcoming a theoretical weakness of the SRG algorithm: its computation time is directly proportional to the size of the segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, suggesting that it can substantially assist segmentation during massive CT screening tests.
Year: 2014 PMID: 25309619 PMCID: PMC4189527 DOI: 10.1155/2014/856453
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Thread and memory hierarchy of CUDA architecture.
Figure 2. Loading of the original volume data and mask volume data from CPU main memory to CUDA global memory. GOV is GPU original volume, CMV is CPU mask volume, TMV is threshold mask volume, and RMV is region mask volume.
Pseudocode 1. Pseudocode of 3D region growing. num_updated is the number of updated voxels in the current iteration.
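The body of Pseudocode 1 is not reproduced in this record, so the following is only a minimal CPU-side sketch of the kind of iterative region growing the caption describes, not the authors' kernel. It assumes 6-connectivity and a simple intensity window, neither of which is stated here. The names `tmv` and `rmv` borrow the abbreviations from Figure 2, while `region_grow`, `lo`, and `hi` are hypothetical. Each pass of the `while` loop stands in for one data-parallel sweep over all voxels and stops when `num_updated` is zero.

```python
from itertools import product

# 6-connected neighbour offsets (an assumption; the connectivity is not stated in this record).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def region_grow(volume, seed, lo, hi):
    """Grow a region from `seed` over voxels whose intensity lies in [lo, hi].

    `volume` is a nested list indexed as volume[z][y][x].
    Returns (region_mask, iterations). The iteration count includes the final
    sweep that finds nothing left to update.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Threshold mask: True where a voxel is a candidate for the region.
    tmv = [[[lo <= volume[z][y][x] <= hi for x in range(nx)]
            for y in range(ny)] for z in range(nz)]
    # Region mask: True where a voxel already belongs to the region.
    rmv = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    sz, sy, sx = seed
    if not tmv[sz][sy][sx]:
        return rmv, 0
    rmv[sz][sy][sx] = True

    iterations = 0
    num_updated = 1
    while num_updated > 0:
        num_updated = 0
        # Read from the previous mask and write to a copy, so every voxel
        # sees the same state, as independent GPU threads would.
        nxt = [[row[:] for row in plane] for plane in rmv]
        for z, y, x in product(range(nz), range(ny), range(nx)):
            if rmv[z][y][x] or not tmv[z][y][x]:
                continue
            for dz, dy, dx in NEIGHBOURS:
                pz, py, px = z + dz, y + dy, x + dx
                inside = 0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                if inside and rmv[pz][py][px]:
                    nxt[z][y][x] = True
                    num_updated += 1
                    break
        rmv = nxt
        iterations += 1
    return rmv, iterations
```

In this formulation the number of sweeps depends on how far the region extends from the seed rather than on how many voxels it contains, which is one plausible reading of why the results tables report iteration counts alongside computation times.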
Synthetic datasets.
| Segmented-region size (Mvoxels) | Cube side length (pixels) | Cylinder height (pixels) | Cylinder radius (pixels) | Sphere radius (pixels) |
|---|---|---|---|---|
| 10 | 219 | 238 | 119 | 136 |
| 20 | 276 | 298 | 149 | 171 |
| 30 | 316 | 342 | 172 | 196 |
| 40 | 347 | 376 | 188 | 215 |
| 50 | 374 | 406 | 203 | 232 |
| 60 | 398 | 432 | 216 | 247 |
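As a rough cross-check of the table above, the dimensions can be plugged into the standard volume formulas for a cube, a cylinder, and a sphere. The sketch below assumes that 1 Mvoxel means 2^20 voxels, which is inferred from the cube side lengths and not stated in this record. It also treats the continuous volume as an approximation of the discrete voxel count. All function and variable names are hypothetical.

```python
import math

MVOXEL = 2 ** 20  # assumption: inferred from the cube side lengths, not stated in the source

# (nominal size in Mvoxels, cube side, cylinder height, cylinder radius, sphere radius)
ROWS = [
    (10, 219, 238, 119, 136),
    (20, 276, 298, 149, 171),
    (30, 316, 342, 172, 196),
    (40, 347, 376, 188, 215),
    (50, 374, 406, 203, 232),
    (60, 398, 432, 216, 247),
]


def cube_volume(side):
    return side ** 3


def cylinder_volume(height, radius):
    return math.pi * radius ** 2 * height


def sphere_volume(radius):
    return 4.0 / 3.0 * math.pi * radius ** 3


def relative_errors():
    """Relative deviation of each shape's continuous volume from its nominal size."""
    errors = []
    for nominal, side, height, cyl_radius, sph_radius in ROWS:
        target = nominal * MVOXEL
        for volume in (cube_volume(side),
                       cylinder_volume(height, cyl_radius),
                       sphere_volume(sph_radius)):
            errors.append((volume - target) / target)
    return errors


if __name__ == "__main__":
    for nominal, side, height, cyl_radius, sph_radius in ROWS:
        print(nominal,
              round(cube_volume(side) / MVOXEL, 2),
              round(cylinder_volume(height, cyl_radius) / MVOXEL, 2),
              round(sphere_volume(sph_radius) / MVOXEL, 2))
```

Under that assumption, every shape lands within about 2% of its nominal size, which is consistent with the three datasets being designed to have matching segmented-region sizes.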
Figure 3. Computation time results for the (a) cube, (b) cylinder, and (c) sphere datasets.
Figure 4. Computation time ratios of single-core and quad-core CPU implementations to that of CUDA implementation for the cube dataset.
Patient datasets.
| Patient data | Number of slice images | Segmented-region size (Mvoxels) |
|---|---|---|
| Lung | 358.8 ± 14.6 (332, 376) | 13.6 ± 1.8 (11.2, 16.5) |
| Colon | 507.2 ± 31.5 (468, 580) | 7.0 ± 3.1 (2.9, 13.2) |
Note: data are means ± SD (minimum and maximum range) for 10 CT scans. The resolution of each CT image is 512 × 512.
Figure 5. Volume-rendered images of a segmented (a) lung and (b) colon.
Computation time (s)ᵃ.
| Patient data | CUDA | HLSL | Quad-core | Single-core |
|---|---|---|---|---|
| Lung | 0.6 ± 0.1 (222.8 ± 22.2) | 4.5 ± 1.3 (145.1 ± 52.6) | 8.7 ± 2.8 (n/a) | 19.3 ± 2.4 (n/a) |
| Colon | 1.3 ± 0.4 (461.1 ± 189.3) | 13.2 ± 4.0 (337.1 ± 98.0) | 4.2 ± 1.8 (n/a) | 7.4 ± 3.9 (n/a) |
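To make the comparison in the table above easier to read, the mean computation times can be turned into ratios relative to the CUDA implementation. This is only an illustrative sketch: it divides the reported means, which is not the same as averaging per-scan ratios, and the names `MEAN_TIME_S`, `speedup_over`, and `SPEEDUPS` are hypothetical.

```python
# Mean computation times in seconds, copied from the table above.
MEAN_TIME_S = {
    "Lung": {"CUDA": 0.6, "HLSL": 4.5, "Quad-core": 8.7, "Single-core": 19.3},
    "Colon": {"CUDA": 1.3, "HLSL": 13.2, "Quad-core": 4.2, "Single-core": 7.4},
}


def speedup_over(times, baseline="CUDA"):
    """Ratio of each implementation's mean time to the baseline's mean time."""
    ref = times[baseline]
    return {name: t / ref for name, t in times.items() if name != baseline}


SPEEDUPS = {organ: speedup_over(times) for organ, times in MEAN_TIME_S.items()}

if __name__ == "__main__":
    for organ, ratios in SPEEDUPS.items():
        for name, ratio in ratios.items():
            print(f"{organ}: {name} takes {ratio:.1f}x the CUDA time")
```

On these means, the single-core CPU takes roughly 32 times as long as CUDA for the lung scans and between 5 and 6 times as long for the colon scans.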
| Patient data | Overall | CUDA versus HLSL | CUDA versus quad-core CPUs | CUDA versus single-core CPUs | HLSL versus quad-core CPUs | HLSL versus single-core CPUs | Quad-core CPUs versus |
|---|---|---|---|---|---|---|---|
| Lung | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 |
| Colon | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 |
Note: ᵃ data are means ± SD of the computation time (mean ± SD of the number of iterations) for 10 CT scans.