Literature DB >> 33266904

Nonrigid Medical Image Registration Using an Information Theoretic Measure Based on Arimoto Entropy with Gradient Distributions.

Bicao Li¹, Huazhong Shu², Zhoufeng Liu¹, Zhuhong Shao³, Chunlei Li¹, Min Huang⁴, Jie Huang¹.

Abstract

This paper introduces a new nonrigid registration approach for medical images applying an information theoretic measure based on Arimoto entropy with gradient distributions. A normalized dissimilarity measure based on Arimoto entropy is presented, which is employed to measure the independence between two images. In addition, a regularization term is integrated into the cost function to obtain the smooth elastic deformation. To take the spatial information between voxels into account, the distance of gradient distributions is constructed. The goal of nonrigid alignment is to find the optimal solution of a cost function including a dissimilarity measure, a regularization term, and a distance term between the gradient distributions of two images to be registered, which would achieve a minimum value when two misaligned images are perfectly registered using limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization scheme. To evaluate the test results of our presented algorithm in non-rigid medical image registration, experiments on simulated three-dimension (3D) brain magnetic resonance imaging (MR) images, real 3D thoracic computed tomography (CT) volumes and 3D cardiac CT volumes were carried out on elastix package. Comparison studies including mutual information (MI) and the approach without considering spatial information were conducted. These results demonstrate a slight improvement in accuracy of non-rigid registration.

Entities: CellLine Chemical Disease Gene Species

Keywords: Arimoto entropy; free-form deformations; gradient distributions; non-rigid registration; nonextensive entropy; normalized divergence measure

Year: 2019 PMID： 33266904 PMCID： PMC7514671 DOI： 10.3390/e21020189

Source DB: PubMed Journal: Entropy (Basel) ISSN： 1099-4300 Impact factor: 2.524

1. Introduction

Volume registration is an essential task of image processing, especially in medical field, such as aiding diagnosis, surgical applications, and image-guided radiation therapy [1,2]. The images to be registered are generally obtained at different times and from different imaging sensors, namely, multi-modality imaging. Different medical imaging modalities could provide various and complementary information. For instance, CT and MRI display the anatomic structures of an organ, while positron emission tomography (PET) and single photon emission computed tomography (SPECT) provide the functional and metabolic information. Therefore, in clinic, these multi-modal volumes are often registered and fused together, in this way, much complementary information derived from different modalities are supplied with the physicians to improve the diagnosis accuracy and assessment efficiency of lesion progression. Image registration is related by the process to find the optimal mapping function between two images to be aligned [3]. In recent years, the registration algorithms using the similarity based on information theory have been attracted more and more attention in medical image registration, among which maximization of MI was early reported for registration of medical images from different modalities by Collignon et al. [4], Maes et al. [5], Wells et al. [6]. Studholme et al. [7] studied a normalized similarity measure called normalized MI (NMI) to tackle the problem of changing field of view (FOV). MI and NMI are both estimated by the probability distributions of images to be registered. Besides, the concept of cumulative probability distributions was introduced into image registration, and cumulative residual entropy (CRE) was investigated in [8]. Additionally, the relations between CRE and Shannon entropy were ulteriorly researched in [9], and a comparison with MI was reported in [10]. In this new measure, cumulative density functions (CDF) instead of probability density functions (PDF) were adopted to calculate values of the similarity measure, which illustrates a good robustness to noise. The aforementioned measures based on information theory—such as MI, NMI, and CRE—are constructed by Shannon entropy. Nonetheless, the additivity of Shannon entropy signifies that it is an extensive entropy. However, Antolin et al. [11] pointed out that the extensive entropy does not consider the correlation of two variables. Consequently, they presented a similarity measure exploiting Tsallis entropy. Subsequently, this new divergence was employed to construct a non-rigid registration model for medical images [12,13]. Illuminated by the reference [11], a divergence measure based on Arimoto entropy was presented called the Jensen–Arimoto divergence (JAD) [14]. In [15], some properties, such as the concavity of Arimoto entropy and the boundedness of JAD, have been further investigated. However, the registration method based on JAD does not take the spatial information between voxels into account, which is of significance to medical image registration. This paper aims to present a novel nonrigid registration method of medical images adopting a normalized measure based on Arimoto entropy and gradient distributions. Firstly, the properties of Arimoto entropy and JAD are analyzed, and a distance of gradient distributions is constructed. Secondly, a nonrigid deformation model is chosen and the registration process is formulated by an optimization procedure. In the sequel, the continuous probability distributions are estimated using Parzen window method applying B-splines and the analytical gradient of objective function can be obtained. Finally, the L-BFGS optimization [16] is adopted to obtain the optimal deformation parameters. To assess the performance of our registration framework for medical images, several groups of non-rigid experiments on simulated volumes and real 3D data are implemented. Our contributions are twofold. Firstly, the related measures based on Shannon entropy do not consider the correlation. Therefore, we present a normalized measure based on Arimoto entropy, a non-extensive entropy, as the dissimilarity measure. Secondly, in the existing measures, such as MI, NMI and JAD, the intensity values are directly exploited to calculate the similarity measure, while the spatial information has been not considered. To take the spatial information between voxels into account, a distance term of gradient distributions is constructed and incorporated into the objective function. The rest of this work is arranged as follows. In Section 2, the knowledge of information theory was firstly reviewed, and then introduce the Arimoto entropy and JAD measure, constructing gradient distribution distance. We formulate the model of nonrigid registration and give the detailed description of the registration method in Section 3 adopting a normalized measure based on Arimoto entropy with gradient distribution. Section 4 demonstrates the nonrigid test results on 3D MR volumes and real 3D clinical datasets, with the compared results to other registration algorithms illustrated. Finally, we provide the conclusions and perspectives in Section 5.

2. Preliminaries

For this section, we briefly review the theoretical concept of information theory, and then introduce the Arimoto entropy and JAD, along with studying their properties. In addition, the gradient distributions of reference image and float image are constructed and a distance of them is derived.

2.1. Shannon Entropy and Mutual Information

For an arbitrary random variable X(x1, x2,…, x), with its probability distributions p(x1, x2,…, x), the Shannon entropy of X is used to measure the amount of average uncertainty included in this random variable, which is also employed to measure the account of information provided by this random variable. Then, considering another random variable Y, H(X|Y) is remarked as the conditional entropy of X when Y is known. The reduction in uncertainty due to Y is called the MI. MI of X and Y is defined by where p(x) and p(y) denote marginal probability of two random variables, as well as p(x, y) being the joint distribution of them. MI is applied as a measure of the dependence between X and Y. MI is symmetric in X and Y and always nonnegative [17].

2.2. Arimoto Entropy

Arimoto [18] introduced a generalized form of Shannon entropy, Arimoto entropy is defined by Boekee et al. [19] investigated some significant properties of Arimoto entropy, here, we only exhibit several useful properties as follows. Non-negativity: Pseudo-additivity: Concavity: Symmetry: Upper bound: See [19] for the detailed proof of these properties. Property one ensures that Arimoto entropy is non-negative. Its pseudo-addivity illustrates that Arimoto entropy accounts for a non-extensive entropy. The parameter α in (3) accounts for the degree of non-extensivity.

2.3. Jensen Arimoto Divergence

For a random variable X, and is probability distributions on X. JAD is defined to be [15] where denotes Arimoto entropy and is a weighted vector to constrain and . In the following, we review the properties of JAD. JAD has these properties of non-negativity, symmetry. Also, JAD is identical to 0 when and only when all of the probability distributions are the same as each other. The proof has been reported in [ Li et al. pointed that a distance must fulfill four requirements [20]. JAD does not meet the triangle inequality, so JAD is not a true distance metric. Nonetheless, JAD can still be adopted to measure the disparity among the probability distributions of two random variables. The JA divergence has the maximum when According to Proposition 1 and 2, it is obviously observed that JAD is bounded, .

2.4. Gradient Distributions Distance

Given the reference image R and float image F, ∇F, and ∇R represent the gradients of F and R, along with q(∇F) and p(∇R) representing the gradient distributions of F and R. Kullback–Leibler divergence (KLD) can be used to calculate the distance of q(∇F) and p(∇R) as where x denotes any point of image gradient, x = [x, y, z]T. KLD is also known as relative entropy, measuring the diversity of two probability distributions. In (10), we use the convention that 0·log(0/0) = 0. In other word, when R and F are completely registered, KLD between gradient distributions defined in (10) achieves the minimum value. Considering the spatial transformation T, μ denoted by the parameters of transformation model, the gradient distribution distance of F and the transformed F is given by where ∇F(T(x)) and ∇R(x) represent the gradients of transformed float image and reference image, and d is the dimension of the images to be registered. Subsequently, the Parzen method is employed to estimate the gradient distributions. Denote β(0) and β(3) by zero-order and three-order B-spline, respectively. In the process of image registration, the transformation parameters μ does not affect the reference image, the gradient of reference image is also constant in registration process. Consequently, the gradient of reference image can be calculated before registration to improve the implementation efficiency. In addition, a zero-order B-spline is exploited to estimate the gradient distributions ∇R(x) of reference image R(x), with the probability density function of ∇R(x) defined by where Ω is volume domain to estimate probabilities. V denotes the number of voxels of Ω domain, as well as d being the image dimension. r represents intensity levels of ∇R(x), and ‘∇’ is the gradient operator. The three-order B-spline is adopted to compute the gradient distributions ∇F(T(x)) of transformed float image, with the probability density function of ∇F(T(x)) shown as In Equations (12) and (13), Δb and Δb are the widths of bins. and is the minimum values in two images, and f represents intensity levels of ∇F(T(x)). Substituting (12) and (13) into (11), we can obtain the following formula. where H is the Shannon entropy. In this paper, KLD of gradient distributions will be applied as a distance term in medical image registration, regularizing that the gradient distribution of float image is similar to the gradient distribution of reference image.

3. Description of Proposed Nonrigid Registration Method

The details of the registration method using a normalized information theoretic measure based on Arimoto entropy with gradient distributions is described. Firstly, an appropriate transformation model needs to be selected, where the registration criteria (objective function or cost function) and optimization algorithm is applied to optimize the criteria. Finally, the images to be registered are aligned using an optimal solution obtained by the optimization scheme. Figure 1 displayed the block diagram of our nonrigid registration framework.

Figure 1

Block diagram of our registration algorithm.

3.1. Formulation

Figure 2a,b depict the corresponding planes in two 3D MR images, which account for T1-weighted MR and T2-weighted MR. Non-rigid registration is formulated as the process of searching for the optimal spatial deformation function of reference image R and float image F as where g(x; μ) is the deformation function, μ denotes the vector of deformation parameters, with x and y being the coordinates of arbitrary point in R and F, respectively. We can formulate the non-rigid registration of F to R as where D represents a dissimilarity measure that can achieves its minimum in registration of R(x) and F(g(x; μ)).

Figure 2

(a) MR T1 image; (b) MR T2 image; (c) deformation field; (d) deformation vector.

However, the process of image registration is an ill-posed issue, and a penalty term need to be incorporated to obtain a smooth transformation. Considering the regularization term, the cost function E is expressed by

3.2. Transformation Model

Clinically, there exist some large deformations between medical images. Therefore, a nonrigid transformation is generally employed to deal with organ or tissue deformation. Figure 2c,d display the deformation fields and vectors of the nonrigid transformation between Figure 2a,b. The free-form deformations (FFD) model can preferably described local deformations between medical images. Therefore, cubic B-splines are employed to constructed FFD model [22] to simulate this elastic deformation. In a 3D image, Φ is a mesh with the size of n × n × n, as well as the control points [ω, ω, ω], and δ is the size of spacing. Then, 3D deformation function of x = [x, y, z] could be defined as where μ is the vector of deformation coefficients, and β(3) is the third-order B spine function.

3.3. Registration Criteria

The dissimilarity measure D is the most significant part of objective functions, which is used to measure the difference between two images to be registered. When D achieve the minimum value, the similarity of two volumes is maximized, resulting in the two volumes completely registered. We have employed JAD as similarity measure to register medical images in the presence of non-rigid transformation, in which a negative sign is assigned to the JAD to construct the dissimilarity measure [15]. In this paper, according to Proposition 2 in Section 2.3, a normalized dissimilarity measure based on JAD is introduced. Its definition is given as In [15], JAD is expressed by where f = (f1, f2, …, f) and r = (r1, r2, …, r) are the intensity values in F(g(x; μ)) and R(x). Also, M is bins number. Also, p(f|r) is the conditional probability. JAD and A(ω) are substituted into (19). In consequence, the dissimilarity measure D in (19) is rewritten as where M is the number of bins. To address the problem of nonrigid registration, the smooth deformation needs to be acquired by regularizing the deformation model. Incorporated with the regularization term, the objective function E is rewritten by where D is the dissimilarity measure defined in (21), and S is the regularization term, with its expression given as follows. However, objective function shown in (22) does not take into account the spatial information between voxels. To deal with the issue, the distance between two gradient distributions q(∇F(g(x; μ))) and p(∇R(x)) displayed in (14) is introduced to (22). As a result, the nonrigid registration process is expressed by where KLD represents gradient distribution distance, as well as λ1 and λ2 being weight parameters, balancing the tradeoff among a dissimilarity measure D, a regularization S, and a distance term KLD.

3.4. Optimization

Newton–Raphson algorithm has been widely exploited, in which second-order derivatives can show better convergence [23] compared with these strategies based on first-order gradient. L-BFGS [24] does not calculate second-order information. Thus, a high computation efficiency can be achieved. A second-order Taylor approximation [16] of E with respect to μ is given as where Δμ is the increment of μ, ∇ is gradient operation. The deformation parameter μ of the L-BFGS optimization algorithm is updated as In the sequel, the derivative of objective function E with respect to μ need to computed. The pseudo code of our registration approach is displayed in Algorithm 1. To solve the optimization process, we need calculate the analytical gradient of the objective function E. Traditionally, the probability distributions expressed in (21) was not continuous. Hence, the continuous probability density function (pdf) needs to be estimated by Parzen-window method. The continuous marginal and joint pdfs of two images to be registered have been calculated [15]. The continuous expression of gradient distribution distance has been also provided in Section 2.4. Equalization (18) is substituted into (23), the continuity of smoothness term is acquired. Consequently, the objective function E is continuous and its analytical derivative with respect to μ can be calculated.

Derivative of the Objective Function

The objective function defined in (24) includes dissimilarity measure D, a regularization term S, and a distance term KLD. The derivative of D is deduced as According to (20), we obtain where represents derivative of estimated joint probability. The derivative of is calculated by with β(0) and β′(3) being the zero-order B-Splines and the derivative of the three-order B-Splines, respectively. R0 and F0 are the minimal intensities in R(x) and F(g(x; μ)), as well as being the gradient of the deformed float image F(g(x; μ)), can be estimated by FFD model. To obtain the derivative of the penalty term S, we rewrite (23) as where x represents the points in image region, and V denotes the number of pixels. The derivative of S has been provided by Staring and Klein [25], where denotes Hessian matrix of deformation function g(x; μ), is the Jacobi of Hessian matrix. Next, we calculate the derivative of KLD, where and represent the gradient distributions of the transforms float image and reference image, respectively. The derivative of is calculated as where β′(3) is derivative of three-order B-spline, and represents the second-order gradient of F(g(x; μ)), as well as being the derivative of deformation function g(x; μ) with respect to the parameter μ. Substituting (12), (13), and (35) into (34), we can obtain the derivative of gradient distribution distance. In the terms of (29), (33), and (34), the derivative of E will be easily calculated.

4. Experiments and Results

To evaluate the registration method using the normalized JAD with gradient distribution (NJAD-GD), we designed several groups of tests and performed on simulated and real 3D data, respectively. In Section 4.1, the experimental data is depicted, including simulated and real medical images. The non-rigid registration of simulated MR volumes is performed in Section 4.2. The tests on real 3D thoracic CT images and 3D cardiac data are implemented, and the experimental results are shown in Section 4.3 and Section 4.4, respectively. Our nonrigid registration algorithm employing JAD and gradient distributions was implemented in the elastix package [26].

4.1. Experimental Data

In this paper, simulated brain MR volumes, thoracic CT volumes and real 3D cardiac CT images were exploited as experimental data. The detailed descriptions of brain MR and 3D thoracic CT images have been reported in [15]. Additionally, non-rigid tests were also performed on twelve 4D cardiac CT sequences acquired from twelve patients. Each of 4D CT sequence consists of 10 3D cardiac CT images, which were obtained from one whole cardiac cycle of one patient. These CT images have 256 × 256 pixels along axial direction. Figure 3 exhibits 10 3D cardiac CT volumes of one 4D CT sequence. It is obviously observed that some elastic deformations are existed between 10 images.

Figure 3

The axis slice of 10 3D cardiac CT images in one 4D sequence. (a–j) represent the 10 frames acquired from one whole cardiac cycle of one patient.

4.2. Nonrigid Registration of Simulated Brain Images

Simulated brain volumes were firstly used to design the elastic alignment experiments. Furthermore, we employ a multiresolution hierarchical strategy with three levels to carry out these non-rigid tests. Also, a comparison with JAD without gradient distribution and MI is also reported. We selected 60 warping indexes (see parameters m of the warping function in [15]), which were yielded randomly from the interval [1,7]. Consequently, 60 float images were produced based on the 60 deformations for each pair of test volumes and 540 nonrigid trials of three pairs of brain MR volumes for NJAD-GD, JAD, and MI algorithms in total. To assess quantitatively test results of these trails, we exploit registration error as the evaluation standard. Here, the registration error is defined as the difference of true values that can be calculated by warping indexes and the obtained values by optimization strategy. In the registration trails of brain images, the involved parameters are set as follows: the nonextensive parameter α = 1.5, bins M = 16, the number of random samples N = 2000, δ = 20 × 20 × 20. The weighting parameters λ1 = 0.005 and λ2 = 0.001 can provide a good tradeoff among three terms: D, S, and gradient distribution term KLD in the objective function E. Figure 4 shows the test results of all 540 non-rigid registrations. From Figure 4, the NJAD-GD registration algorithm could result in the lower errors of three pairs of test volumes compared to other two approaches.

Figure 4

The registration results of the simulated 3D brain MR T1 & MR T2, MR T1 & MR PD, and MR T2 & MR PD volumes using three algorithms. The red color crosses for each box represents these outliers.

4.3. Experiments of 3D Thoracic CT Images

3D thoracic CT volumes were chosen as the test images to carry out non-rigid registrations. These volumes consist of four 4D sequences, and each of them includes 10 3D volumes. A three-level implementation scheme was still employed to decrease registration accuracy and improve computation efficiency. We denote the 10 volumes from each 4D sequence by T00-T90, in which the maximal inhalation and maximum exhalation are included, with indicated by T10 and T60, respectively. Then, we designed the following experiments: in each 4D sequence, the T60 frame is applied as reference image and the residual nine frames are chosen as float image, leading to nine non-rigid tests. Hence, 36 trails of elastic alignments were yielded in total for four 4D CT sequences. We also compared the results adopting NJAD-GD and JAD without considering spatial information. Finally, 72 elastic registration tests were conducted for two methods. In order to quantify the test errors, target registration error (TRE) and Hausdorff distance meansure (HDM) were calculated. HDM is a widely-used measure to calculate the distance of two clouds of points. In the 3D thoracic CT registration, the manually marked landmarks can be applied to calculate HDM of two images. Figure 5 and Figure 6 demonstrate the registration results of 72 tests, along with TREs before registration and after alignment. Figure 7 illustrates the box-and-whisker plots of HDM values of four 4D CT sequences. It is observed from these results that the registration errors applying NJAD-GD algorithm are less than these obtained by the method based on JAD without gradient distribution.

Figure 5

The TREs obtained when employing NJAD-GD algorithm, the registration method based on JAD without gradient distribution.

Figure 6

Statistics of TREs before registration and after alignment exploiting the NJAD-GD, JAD methods.

Figure 7

HDMs obtained when employing NJAD-GD algorithm, the registration method based on JAD without gradient distribution.

In implementation of experiments, the nonextensive parameter α = 1.5, bins M = 16, the number of random samples N = 8000, the spacing of mesh points δ = 20 × 20 × 20. The weighting parameters λ1 = 0.005 and λ2 = 0.001.

4.4. Registration of 3D Cardiac CT Image

The cardiac CT data consists of 12 groups of 4D image sequence, and each of which includes 10 3D images acquired from one whole cardiac cycle. One cardiac cycle consists of the phase of systole and phase of diastole. In each 4D CT sequence, two 3D images with the maximum deformation were employed as the test images, and 12 nonrigid registration experiments were carried out adopting NJAD-GD approach. Figure 8 illustrates the checkboard of 12 examples adopting our non-rigid framework (α = 1.50) for 12 4D CT sequences. As it can be seen, the registration algorithm based on gradient distribution demonstrates the accuracy results.

Figure 8

Registration results of 12 groups of 3D cardiac images. (a–l) display the test results of patient 1 to 12, respectively. In each group, left image represents the checkboard before registration, and the right accounts for the result after registration.

In these tests, a multiresolution scheme with three levels was also exploited to implement these nonrigid registrations. Due to the large deformation, the number of random samples N and the spacing of mesh points δ were set to 10,000 and 10 × 10 × 10, with other parameters being as follows: bins M = 16, the weighting parameters λ1 = 0.005 and λ2 = 0.001, the maximum number of iterations of the limited memory BFGS scheme NMAX = 200.

5. Conclusions

In this work, we review the definition and properties of Arimoto entropy, with an information measure based on Arimoto entropy, called JAD. The gradient distributions of reference image and float image are constructed and a distance between them is derived. Additionally, a normalized dissimilarity measure based on JAD was presented. A nonrigid registration method exploiting the normalized measure with gradient distributions is proposed. Arimoto entropy is regarded as a generalized form of the classical Shannon entropy. In the aforementioned section, it is proofed that the JAD measure is equal to MI when α tends to 1. We adopted FFDs as the parameter space for non-rigid registration, along with objective function E including three elements: the normalized JAD as the dissimilarity measure, a regularization to acquire the smooth deformation and a distance term of the gradient distributions.

11 in total

Nonrigid Medical Image Registration Using an Information Theoretic Measure Based on Arimoto Entropy with Gradient Distributions.

1. Introduction

2. Preliminaries

2.1. Shannon Entropy and Mutual Information

2.2. Arimoto Entropy

2.3. Jensen Arimoto Divergence

2.4. Gradient Distributions Distance

3. Description of Proposed Nonrigid Registration Method

3.1. Formulation

3.2. Transformation Model

3.3. Registration Criteria

3.4. Optimization

Derivative of the Objective Function

4. Experiments and Results

4.1. Experimental Data

4.2. Nonrigid Registration of Simulated Brain Images

4.3. Experiments of 3D Thoracic CT Images

4.4. Registration of 3D Cardiac CT Image

5. Conclusions

1. PET-CT image registration in the chest using free-form deformations.

2. Non-Rigid Multi-Modal Image Registration Using Cross-Cumulative Residual Entropy.

3. Jensen-Tsallis divergence and atomic dissimilarity for position and momentum space electron densities.

4. 3D nonrigid medical image registration using a new information theoretic measure.

5. elastix: a toolbox for intensity-based medical image registration.

6. Multi-modal volume registration by maximization of mutual information.

7. Nonrigid image registration using an entropic similarity.

8. Multimodality image registration by maximization of mutual information.

Review 9. A review of recent developments in image-guided radiation therapy in cervix cancer.

Review 10. A Review on Medical Image Registration as an Optimization Problem.