Yixun Liu, Andriy Kot, Fotis Drakopoulos, Chengjun Yao, Andriy Fedorov, Andinet Enquobahrie, Olivier Clatz, Nikos P Chrisochoides.
Abstract
As part of the ITK v4 project, we have developed ITK filters for physics-based non-rigid registration (PBNRR) that satisfy the following requirements: account for tissue properties in the registration, improve accuracy compared to rigid registration, and reduce execution time using GPU and multi-core accelerators. The implementation has three main components: (1) Feature Point Selection, (2) Block Matching (mapped to both multi-core and GPU processors), and (3) a Robust Finite Element Solver. The use of multi-core and GPU accelerators in ITK v4 provides substantial performance improvements. For example, for the non-rigid registration of brain MRIs, the block matching filter is on average about 10 times faster when 12 hyperthreaded cores are used and about 83 times faster when the NVIDIA Tesla GPU is used in a Dell workstation.
Keywords: GPU; ITK; block matching; finite element; image-guided neurosurgery; non-rigid registration
Year: 2014 PMID: 24778613 PMCID: PMC3985035 DOI: 10.3389/fninf.2014.00033
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
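The abstract names Feature Point Selection as the first component but does not say here how points are chosen. A minimal 2D sketch, assuming candidate blocks are ranked by intensity variance and that the "Selection fraction" parameter (0.05 in the experiments) is the share of candidate blocks kept. The variance criterion, the function names, and the optional mask argument are illustrative assumptions, not the ITK filter's API.

```python
# Hypothetical sketch of feature point selection: rank candidate blocks by
# intensity variance (an assumed criterion) and keep the top `fraction`.

def block_values(image, cy, cx, r):
    """Intensities of the (2r+1) x (2r+1) block centred at (cy, cx)."""
    return [image[y][x]
            for y in range(cy - r, cy + r + 1)
            for x in range(cx - r, cx + r + 1)]


def select_feature_blocks(image, block_r, fraction, mask=None):
    """Return centres of the highest-variance blocks, most variable first."""
    h, w = len(image), len(image[0])
    scored = []
    for cy in range(block_r, h - block_r):
        for cx in range(block_r, w - block_r):
            # Optionally restrict candidates to a binary mask (e.g. the brain).
            if mask is not None and not mask[cy][cx]:
                continue
            vals = block_values(image, cy, cx, block_r)
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            scored.append((var, cy, cx))
    # Sort by variance only; the stable sort keeps scan order among ties.
    scored.sort(key=lambda t: t[0], reverse=True)
    k = max(1, int(fraction * len(scored)))
    return [(cy, cx) for _, cy, cx in scored[:k]]


# A 7x7 image that is flat except for one bright pixel at (1, 1).
img = [[0] * 7 for _ in range(7)]
img[1][1] = 9
print(select_feature_blocks(img, block_r=1, fraction=0.2))
# -> [(1, 1), (1, 2), (2, 1), (2, 2), (1, 3)]
```

Flat blocks score zero variance and carry little information for block matching, which is the motivation for ranking this way in the sketch.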
Figure 1. (A) Block matching. For a small block in the floating image, the corresponding block is searched for within a predefined search window in the reference (fixed) image; the displacement associated with the block is then the offset between the two blocks. Depending on the application, the block can be defined in either the floating or the reference image. (B) Block matching results. Arrows indicate the direction of the displacement and the color scale encodes its magnitude. The similarity metric is normalized cross-correlation (NCC); for clarity, only 1% of the dense displacement field is shown.
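The search in Figure 1A can be sketched as an exhaustive scan of the search window that keeps the offset with the highest NCC. This is a 2D, pure-Python illustration under stated assumptions: `block_r` and `window_r` mirror the "Block radius" and "Window radius" parameters (given in 3D as [1,1,1] and [5,5,5] in the experiments), while the function names and the synthetic test pattern are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # flat blocks carry no information


def block(img, cy, cx, r):
    """Flatten the (2r+1) x (2r+1) block centred at (cy, cx)."""
    return [img[y][x] for y in range(cy - r, cy + r + 1)
            for x in range(cx - r, cx + r + 1)]


def match_block(floating, reference, cy, cx, block_r, window_r):
    """Return the displacement (dy, dx) with the highest NCC, and that score."""
    h, w = len(reference), len(reference[0])
    fb = block(floating, cy, cx, block_r)
    best_score, best_d = -2.0, (0, 0)  # NCC lies in [-1, 1]
    for dy in range(-window_r, window_r + 1):
        for dx in range(-window_r, window_r + 1):
            y, x = cy + dy, cx + dx
            # Skip candidates whose block would leave the reference image.
            if not (block_r <= y < h - block_r and block_r <= x < w - block_r):
                continue
            score = ncc(fb, block(reference, y, x, block_r))
            if score > best_score:
                best_score, best_d = score, (dy, dx)
    return best_d, best_score


def pattern(y, x):
    """Pseudo-random texture so that only the true offset matches exactly."""
    return ((y * 73856093) ^ (x * 19349663)) % 101


# Synthetic test: the reference is the floating image shifted by (+2, -1).
floating = [[pattern(y, x) for x in range(20)] for y in range(20)]
reference = [[pattern(y - 2, x + 1) for x in range(20)] for y in range(20)]
print(match_block(floating, reference, 10, 10, block_r=1, window_r=3))
```

Because each block is matched independently of every other block, the outer loop over selected blocks is what a multi-threaded or GPU implementation can distribute, one block per thread or kernel invocation.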
Figure 2. The main filter, PhysicsBasedNonRigidRegistrationMethod. This filter takes the fixed, moving, and mask images as required inputs (solid lines) and a mesh as an optional input (dashed line), and outputs a deformation field image/deformed moving image. Figures 3A,B detail the two highlighted components.
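The filter in Figure 2 can output either the deformation field or the deformed moving image. A minimal sketch of how a dense displacement field yields a deformed image, assuming a backward-mapping convention (each output pixel samples the moving image at its own position plus the displacement) and nearest-neighbour lookup; both choices, and the `warp_image` name, are assumptions for illustration rather than the filter's actual resampling.

```python
def warp_image(moving, field, default=0.0):
    """Backward-warp a 2D image with a dense displacement field.

    output[y][x] = moving[y + dy][x + dx] with (dy, dx) = field[y][x],
    using nearest-neighbour lookup; samples outside `moving` get `default`.
    """
    h, w = len(moving), len(moving[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = field[y][x]
            # Nearest grid point (Python's round sends halves to even).
            sy, sx = round(y + dy), round(x + dx)
            inside = 0 <= sy < h and 0 <= sx < w
            row.append(moving[sy][sx] if inside else default)
        out.append(row)
    return out


moving = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
# Every output pixel samples the moving image one column to the right.
shift_right = [[(0, 1)] * 3 for _ in range(3)]
print(warp_image(moving, shift_right))  # -> [[1, 2, 0.0], [4, 5, 0.0], [7, 8, 0.0]]
```

Backward mapping is used in the sketch because it visits every output pixel exactly once and so never leaves holes, which forward scattering of moving pixels can.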
Figure 3. (A) Flow chart of one block-matching thread/kernel. (B) Flow chart of RobustSolver, which has two parts: outlier rejection and approximation to interpolation. Outlier rejection proceeds as a least trimmed squares (LTS) regression (Liu et al., 2009): solve for U first, then detect and remove outliers, and solve for U again. F is used to reset the strain energy so that the mesh can be deformed further. The two parts differ only in that the approximation-to-interpolation part performs no outlier rejection. RobustSolver supports both the VNL and Itpack solvers for the linear system of equations; Itpack runs faster than VNL and is the default solver in RobustSolver.
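The solve, reject, re-solve loop of Figure 3B can be illustrated on a much smaller problem. This sketch swaps the finite element system for a toy straight-line least-squares fit; `rejection_fraction` and `steps` mirror the "Rejection fraction" and "Num of outlier rejection steps" parameters, and spreading the rejections evenly across the steps is an assumption. The strain-energy reset (F) and the approximation-to-interpolation part are not modelled.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


def trimmed_fit(points, rejection_fraction=0.25, steps=10):
    """Iterative outlier rejection in the spirit of LTS regression.

    Solve, rank points by residual, drop the worst, and re-solve; the total
    rejected share grows linearly until it reaches `rejection_fraction`.
    """
    total_reject = int(rejection_fraction * len(points))
    inliers = list(points)
    a, b = fit_line(inliers)  # initial solve on all points
    for step in range(1, steps + 1):
        keep = len(points) - total_reject * step // steps
        # Detect outliers: keep the `keep` points with the smallest residuals.
        inliers = sorted(inliers, key=lambda p: abs(p[1] - (a * p[0] + b)))[:keep]
        a, b = fit_line(inliers)  # re-solve without the rejected points
    return a, b, inliers


# Points on y = 2x + 1 with four gross outliers (+40) mixed in.
pts = [(x, 2 * x + 1 + (40 if x in (3, 8, 12, 17) else 0)) for x in range(20)]
a, b, kept = trimmed_fit(pts)
print(a, b, len(kept))  # -> 2.0 1.0 15
```

Rejecting gradually rather than all at once keeps early, contaminated fits from discarding good points; once the outliers are gone the residuals of the remaining points are zero and further trimming no longer changes the fit.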
Figure 4. Synthetic evaluation of FEMScatteredDataPointSetToImageFilter. (A) The undeformed lung image, (B) the lung image deformed according to (C), (C) the ground-truth deformation field image, (D) the estimated deformation field image, (E) the checkerboard before registration, and (F) the checkerboard after registration. The red bounding box highlights a region where accuracy improves markedly after registration.
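Panels E and F of Figure 4 use checkerboard composites to show alignment. A minimal sketch of such a composite, assuming it interleaves two same-size images (for example the fixed image and the moving or warped image) in square tiles; the `checkerboard` function and the `tile` size are hypothetical.

```python
def checkerboard(img_a, img_b, tile):
    """Compose two same-size 2D images as alternating square tiles.

    Tiles whose (row, column) tile indices sum to an even number come from
    img_a, the others from img_b, so structures that fail to line up across
    tile borders reveal residual misalignment.
    """
    h, w = len(img_a), len(img_a[0])
    return [[img_a[y][x] if (y // tile + x // tile) % 2 == 0 else img_b[y][x]
             for x in range(w)]
            for y in range(h)]


zeros = [[0] * 4 for _ in range(4)]
ones = [[1] * 4 for _ in range(4)]
for row in checkerboard(zeros, ones, tile=2):
    print(row)
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [1, 1, 0, 0]
# [1, 1, 0, 0]
```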
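Patient information for the five cases from the Surgical Planning Laboratory (SPL), Harvard Medical School.
| Case | Sex | Tumor location | Histology |
|---|---|---|---|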
| 1 | F | R occipital | Anaplastic oligodendroglioma WHO III/IV |
| 2 | F | L posterior temporal | Glioblastoma WHO IV |
| 3 | N/A | R frontal | Oligodendroglioma WHO II/IV |
| 4 | N/A | R occipital | N/A |
| 5 | F | R frontal | Oligoastrocytoma WHO II/IV |
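Registration accuracy for the five cases, evaluated by Hausdorff distance (HD) and landmarks.
| Case | Rigid | PBNRR | BSpline | PBNRR improvement | BSpline improvement |
|---|---|---|---|---|---|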
| 1 | 25.980 (12.874) | 20.099 (8.522) | 25.199 (10.853) | 22.6 (33.8) | 3.0 (15.7) |
| 2 | 9.110 (7.490) | 4.690 (2.073) | 9.695 (6.539) | 48.5 (72.3) | −6.4 (12.7) |
| 3 | 9.433 (5.542) | 5.385 (2.768) | 8.124 (4.922) | 42.9 (50.1) | 13.9 (11.2) |
| 4 | 9.695 (5.881) | 7.000 (4.002) | 9.434 (5.306) | 27.8 (32.0) | 2.7 (9.8) |
| 5 | 6.708 (4.773) | 4.123 (2.128) | 7.141 (3.020) | 38.5 (55.4) | −6.5 (36.7) |
Landmark evaluation results are given in parentheses. A BSpline-based non-rigid registration in 3D Slicer served as the comparison for PBNRR. The PBNRR parameters for all cases are: block radius [1,1,1], window radius [5,5,5], selection fraction 0.05, rejection fraction 0.25, number of outlier rejection steps 10, number of approximation steps 10, Young's modulus 694 Pa, and Poisson's ratio 0.45. The BSpline-based registration parameters are: iterations 20, grid size 18, histogram bins 100, and spatial samples 50,000. Registration errors are in mm and improvements in %; the improvement is the error reduction relative to rigid registration, 100 × (rigid − method)/rigid.
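The two quantities in the accuracy table can be computed as follows. The Hausdorff distance sketch is a brute-force version over two point sets; which point sets the evaluation compares (for example edge points of the intraoperative and warped preoperative images) is an assumption, as are the function names. The improvement formula reproduces the table's percentage columns from its distance columns.

```python
import math


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Farthest distance from a point of p to its nearest point in q.
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))


def improvement(rigid, method):
    """Percentage error reduction of `method` relative to rigid registration."""
    return 100.0 * (rigid - method) / rigid


print(hausdorff([(0, 0), (1, 0)], [(0, 0), (4, 0)]))  # -> 3.0
# HD values for cases 1 and 2, taken from the table above.
print(round(improvement(25.980, 20.099), 1))  # PBNRR, case 1   -> 22.6
print(round(improvement(9.110, 9.695), 1))    # BSpline, case 2 -> -6.4
```

A negative improvement, as for the BSpline method in cases 2 and 5, means the method increased the error relative to rigid registration. The brute-force distance costs O(|A|·|B|) comparisons, which is acceptable for a sketch but slow for dense 3D edge sets.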
Figure 5. Qualitative results of the PBNRR filter for the five cases. Each column corresponds to one case; the rows, from top to bottom, show the preoperative MRI, the intraoperative MRI, the preoperative MRI warped using PBNRR, and the preoperative MRI warped using BSpline-based NRR.
Running time (seconds) of the five cases on three workstations.
| 1 | 54.53 | 37.73 | 33.50 | 54.40 | 37.83 | 33.62 | 136.72 | 116.25 | 105.46 |
| 2 | 60.36 | 41.49 | 37.60 | 59.72 | 41.44 | 37.57 | 155.70 | 126.70 | 120.95 |
| 3 | 52.19 | 35.79 | 32.25 | 52.45 | 35.90 | 32.38 | 131.51 | 111.05 | 102.54 |
| 4 | 65.14 | 44.60 | 40.24 | 65.54 | 45.60 | 40.75 | 173.15 | 145.60 | 135.79 |
| 5 | 52.36 | 35.44 | 32.20 | 52.50 | 35.59 | 32.55 | 129.22 | 111.17 | 101.42 |
Dell 1: one Intel® Core™ i7 CPU 260 @ 2.80 GHz, NVIDIA Quadro 6000, and 8 GB RAM. Dell 2: Intel® Xeon® CPU X5690 @ 3.47 GHz, NVIDIA Quadro 6000, and 96 GB RAM. Cray XK7: one AMD 6276 Interlagos processor with 8 Bulldozer cores, 32 GB RAM, and NVIDIA Tesla K20X.
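Block matching running time (seconds) using CPU and GPU.
| Case | 1 thread | Multi-thread | GPU | 1 thread | Multi-thread | GPU |
|---|---|---|---|---|---|---|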
| 1 | 19.06 | 1.83 | 0.49 | 31.29 | 3.32 | 0.37 |
| 2 | 21.08 | 1.96 | 0.54 | 34.57 | 3.58 | 0.41 |
| 3 | 18.88 | 1.96 | 0.50 | 30.98 | 3.27 | 0.37 |
| 4 | 23.43 | 2.65 | 0.60 | 38.19 | 3.96 | 0.45 |
| 5 | 18.97 | 1.77 | 0.48 | 28.13 | 3.29 | 0.37 |
| Average speedup | | 10.07 | 38.85 | | 9.35 | 82.70 |
Speedup is relative to the single-threaded (1 thread) run on the same machine.
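The "Average speedup" row can be reproduced from the per-case times. Reading the six timing columns as single-threaded, multi-threaded, and GPU times for two machines, the reported values match the mean of the per-case ratios. The variable and function names below are illustrative; the numbers are copied from the table.

```python
# Block-matching times (s) for cases 1-5, first three timing columns.
one_thread = [19.06, 21.08, 18.88, 23.43, 18.97]
multi_thread = [1.83, 1.96, 1.96, 2.65, 1.77]
gpu = [0.49, 0.54, 0.50, 0.60, 0.48]


def mean_speedup(baseline, accelerated):
    """Average of the per-case speedups baseline[i] / accelerated[i]."""
    ratios = [b / a for b, a in zip(baseline, accelerated)]
    return sum(ratios) / len(ratios)


print(f"{mean_speedup(one_thread, multi_thread):.2f}")  # -> 10.07
print(f"{mean_speedup(one_thread, gpu):.2f}")           # -> 38.85
```

Dividing the mean times instead (101.42 s / 10.17 s) would give about 9.97 for the first column, so the table averages per-case ratios rather than taking a ratio of averages.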