Miriam Leeser, Saoni Mukherjee, James Brock.
Abstract
BACKGROUND: Biomedical image reconstruction applications must produce high-fidelity images in or close to real time. We have implemented reconstruction of three-dimensional cone-beam computed tomography (CBCT) volumes from two-dimensional projections. The algorithm takes slices of the target, weights and filters them, backprojects the data, and then creates the final 3D volume. We have implemented the algorithm using several hardware and software approaches that take advantage of different types of parallelism in modern processors. The two hardware platforms used are a Central Processing Unit (CPU) and a heterogeneous system that combines a CPU and a GPU. On the CPU we implemented serial MATLAB, parallel MATLAB, C, and parallel C with OpenMP extensions. These codes are compared against the heterogeneous versions written in CUDA-C and OpenCL.
Year: 2014 PMID: 25176282 PMCID: PMC4167268 DOI: 10.1186/1756-0500-7-582
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Co-ordinate system for backprojection.
Figure 2. Overview of the serial CPU implementation and the implementation that uses a combination of CPU and GPU.
Hardware details
| Processor | Clock speed | Number of cores | Cache size | Memory size |
|---|---|---|---|---|
| Intel Xeon E5-2620 | 2.00 GHz | 6 | 15 MB | 32 GB |
| NVIDIA Tesla C2075 | 1.15 GHz | 448 | 768 KB | 6 GB |
| AMD Radeon HD 7970 | 925 MHz | 2048 | 768 KB | 3 GB |
Figure 3. A projection of the mathematical phantom (left) and the mouse phantom (right).
Figure 4. Comparison of results obtained using C, OpenCL and CUDA-C run on phantom data.
Performance of different implementations (times in seconds)
| Dataset | Approach | Backprojection time | Total time | Speedup over MATLAB | Speedup over C |
|---|---|---|---|---|---|
| Phantom | MATLAB | 51.06 | 51.11 | – | – |
| Phantom | C | 3.93 | 3.95 | 12.94 | – |
| Phantom | C + OpenMP (4 threads) | 0.85 | 0.89 | 57.43 | 4.44 |
| Phantom | OpenCL (NVIDIA) | 0.01 | 0.30 | 170.37 | 13.17 |
| Phantom | CUDA (NVIDIA) | 0.01 | 0.30 | 170.37 | 13.17 |
| Phantom | OpenCL (AMD) | 0.01 | 0.32 | 159.72 | 12.34 |
| Mouse scan | MATLAB | 33760.40 | 33777.33 | – | – |
| Mouse scan | MATLAB PCT | 22506.49 | 22513.90 | 1.50 | – |
| Mouse scan | C | 18451.77 | 18462.60 | 1.83 | – |
| Mouse scan | C + OpenMP | 5112.94 | 5615.65 | 6.01 | 3.29 |
| Mouse scan | OpenCL (NVIDIA) | 49.44 | 60.45 | 558.76 | 305.42 |
| Mouse scan | CUDA (NVIDIA) | 47.79 | 58.87 | 573.76 | 313.62 |
| Mouse scan | OpenCL (AMD) | 16.01 | 28.02 | 1205.47 | 658.91 |
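The speedup columns in the table appear consistent with ratios of total times (baseline total time divided by the implementation's total time) rather than backprojection times alone. The short C sketch below illustrates that arithmetic; the helper name `speedup` is hypothetical and the input values are taken from the table.

```c
/* Speedup of an implementation relative to a baseline, both given as
   total run times in seconds. Assumption: the tabulated speedups are
   computed from the "Total time" column. */
double speedup(double baseline_total_s, double impl_total_s)
{
    return baseline_total_s / impl_total_s;
}
```

For example, `speedup(33777.33, 28.02)` gives about 1205.47 (mouse scan, OpenCL on AMD versus MATLAB), and `speedup(3.95, 0.89)` gives about 4.44 (phantom, C + OpenMP versus C).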
Figure 5. Runtimes of different implementations applied to phantom data (top), mouse data (middle) and mouse data for each implementation component (bottom).