Geng Chen1, Jian Zhang1,2, Yong Zhang3, Bin Dong4, Dinggang Shen1,5, Pew-Thian Yap1. 1. Department of Radiology and Biomedical Research Imaging Center (BRIC) University of North Carolina, Chapel Hill, United States of America. 2. School of Information and Electrical Engineering, Hunan University of Science & Technology, Xiangtan, China. 3. Vancouver Research Center, Huawei, Burnaby, Canada. 4. Beijing International Center for Mathematical Research, Peking University, Beijing, China. 5. Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea.
Abstract
Diffusion MRI derives its contrast from MR signal attenuation induced by the movement of water molecules in microstructural environments. Associated with the signal attenuation is the reduction of signal-to-noise ratio (SNR). Methods based on total variation (TV) have shown superior performance in image noise reduction. However, TV denoising can result in stair-casing effects due to the inherent piecewise-constant assumption. In this paper, we propose a tight wavelet frame based approach for edge-preserving denoising of diffusion-weighted (DW) images. Specifically, we employ the unitary extension principle (UEP) to generate frames that are discrete analogues to differential operators of various orders, which will help avoid stair-casing effects. Instead of denoising each DW image separately, we collaboratively denoise groups of DW images acquired with adjacent gradient directions. In addition, we introduce a very efficient method for solving an ℓ0 denoising problem that involves only thresholding and solving a trivial inverse problem. We demonstrate the effectiveness of our method qualitatively and quantitatively using synthetic and real data.
Diffusion MRI derives its contrast from MR signal attenuation induced by the movement of water molecules in microstructural environments. Associated with the signal attenuation is the reduction of signal-to-noise ratio (SNR). Methods based on total variation (TV) have shown superior performance in image noise reduction. However, TV denoising can result in stair-casing effects due to the inherent piecewise-constant assumption. In this paper, we propose a tight wavelet frame based approach for edge-preserving denoising of diffusion-weighted (DW) images. Specifically, we employ the unitary extension principle (UEP) to generate frames that are discrete analogues to differential operators of various orders, which will help avoid stair-casing effects. Instead of denoising each DW image separately, we collaboratively denoise groups of DW images acquired with adjacent gradient directions. In addition, we introduce a very efficient method for solving an ℓ0 denoising problem that involves only thresholding and solving a trivial inverse problem. We demonstrate the effectiveness of our method qualitatively and quantitatively using synthetic and real data.
Diffusion MRI affords in vivo insights into brain tissue microstructure and allows reconstruction of white matter pathways for neuroscience studies involving development, aging, and disorders [1-5]. However, since diffusion MRI derives its contrast from MR signal attenuation, it suffers from low signal-to-noise-ratio (SNR), which complicates subsequent quantitative analyses. To improve SNR, multiple repetitive scans are typically acquired and averaged for noise reduction. This however inevitably prolongs acquisition times and is hence prohibitive in clinical settings. Post-acquisition algorithms, such as total variation (TV) denoising [6], have been widely adopted due to their ability to remove noise without requiring additional acquisition time.Diffusion-weighted (DW) images are typically acquired with non-collinear gradient directions. As shown in Fig 1, DW images that are scanned with similar gradient directions share a lot of commonalities. However, these commonalities diminish very quickly if the difference between the gradient directions increases. Denoising performance can be improved by making full use of information between images scanned with similar gradient directions; however, images scanned with very different gradient directions have to be avoided to reduce artifacts. We can also observe from the figure that the DW images are typically very noisy, indicating the great importance of denoising.
Fig 1
Diffusion-weighted images scanned at different gradient directions.
The left and middle images were scanned with similar gradient directions. The right image was scanned at a nearly perpendicular gradient direction with respect to the reference.
Diffusion-weighted images scanned at different gradient directions.
The left and middle images were scanned with similar gradient directions. The right image was scanned at a nearly perpendicular gradient direction with respect to the reference.In this paper, we propose a group ℓ0 minimization denoising framework that utilizes tight wavelet frames and takes advantage of the correlation between DW images scanned with neighboring gradient directions. The power of tight wavelet frames lies in their ability to sparsely approximate piecewise smooth functions and the existence of fast decomposition and reconstruction algorithms associated with them. In contrast, TV based methods are effective on restoring images that are piecewise constant, e.g., binary or cartoon-like images. They will, however, cause staircasing effects in images that are not piecewise constant [6].TV denoising is typically realized by penalizing the ℓ1-norm of image gradients. Instead of ℓ1 regularization, which has been shown in the theory of compressed sensing [7] to produce sparse solutions, we opt to use ℓ0 regularization. In [8], wavelet frame based ℓ0 regularization shows better edge-preserving quality compared with the conventional ℓ1 regularization. In [9], iterative hard threshoding algorithms show better performance than iterative soft thresholding algorithms. Based on these facts, we propose a group version of ℓ0 minimization to take advantage of the correlation between DW images. Extensive experiments were carried out using synthetic data with different levels of noncentral chi (nc-χ) noise and real diffusion MRI data. The experimental results demonstrate that the proposed method outperforms TV denoising and non-local means (NLM) denoising [10]. Part of this work has been presented in a workshop [11]. Herein, we provide additional examples, results, derivations, and insights that are not part of the workshop publication. The rest of the paper is organized as follows: In Approach Section, we will provide detailed descriptions for our method. In Experiments Section, we will demonstrate the effectiveness of our method using extensive experiments on synthetic data and real data. In Discussion Section, we will provide in-depth discussions of our method. Finally, we will conclude this work in Conclusion Section.
Approach
We will provide first a brief introduction to framelets, followed by details on how framelets can be incorporated into an ℓ0 minimization framework for DWI denoising.
Tight framelets
A wavelet system is called a tight wavelet frame of [12] if
where 〈⋅, ⋅〉 is the inner product of . It is clear that an orthonormal basis is a tight frame, since the identity hold for arbitrary orthonormal bases in . When X(Ψ) forms an orthonormal basis of , X(Ψ) is called an orthonormal wavelet basis. Tight frames are generalization of orthonormal bases with greater redundancy—a property central to sparse representation and often desirable in applications such as denoising [13].Given a set of generators , which are desirably (anti)symmetric and compact functions, the corresponding quasi-affine system X(Ψ) from level J is the collection of dilations and shifts of Ψ:
with
When X(Ψ) forms a (tight) frame of , each function ψ, r = 1, …, R, is called a (tight) framelet and the whole system X(Ψ) is called a (tight) wavelet frame system. A tight wavelet frame is also called a Parseval frame. Note that in the literature the affine (or wavelet) system, which corresponds to the decimated wavelet (frame) transforms, is commonly used. The quasi-affine system above, introduced and analyzed in [14], corresponds to the undecimated wavelet (frame) transforms and essentially oversamples the wavelet frame system starting from level J − 1 and downwards. In this paper, we focus on the quasi-affine system because it has been shown to work better in image restoration [12]. We set J = 0 and consider only l < 0.An approach to constructing framelets Ψ is by utilizing multiresolution analysis (MRA) [12]. One starts with a refinable function ϕ with refinement mask satisfying
and , where denotes Fourier transform of ϕ. The key is to find the masks that gives
The finite sequences a1, …, a are called wavelet frame masks, or the high pass filters of the system. The refinement mask a0 is also known as the low pass filter. The two equations above can be combined by defining ψ0 ≔ ϕ. The unitary extension principle (UEP) [14] provides a general theory for constructing MRA-based tight wavelet frames. That is, as long as {a1, …, a} are finitely supported and their Fourier series satisfy
for all ν ∈ {0, π} and ξ ∈ [−π, π], the quasi-affine system X(Ψ) forms a tight frame in .For example, consider the centered B-splines of order p, i.e.,
with j = 0 when p is even and j = 1 when p is odd. The corresponding refinement mask is given as
and the p wavelet masks as
where r = 1, 2, …, p. It is straightforward to show that the UEP conditions (6) are satisfied. Wavelet frame masks for p = 1, 2, 4 are shown in Table 1. It is worth noting that these masks corresponds to differential operators of various orders. For example, for piecewise linear B-spline, the masks a1 and a2 correspond to the first order and second order difference operators respectively up to a scaling factor.
Table 1
Wavelet frame masks.
Piecewise Constant(p = 1)
Piecewise Linear(p = 2)
Piecewise Cubic(p = 4)
a0=12[1,1]
a0=14[1,2,1]
a0=116[1,4,6,4,1]
a1=12[1,-1]
a1=24[1,0,-1]
a1=18[-1,-2,0,2,1]
a2=14[-1,2,-1]
a2=616[1,0,-2,0,1]
a3=18[-1,2,0,-2,1]
a4=116[1,-4,6,-4,1]
When a tight wavelet frame is used, the given data is considered to be sampled as a local average
Noting that [12]
where the dilated sequence is defined as
The decomposition and reconstruction down to level −L [12], i.e.,
where
can be realized with convolution using the masks. Denoting by W the L-level framelet decomposition, i.e.,
with
we have
where * denotes the convolution operator. If we use W⊤ to denote the framelet reconstruction, we have W⊤W = I, i.e., f = W⊤Wf.Given a 1-dimensional framelet system for , the d-dimensional tight wavelet frame system for can be easily constructed by using tensor products of the 1-dimensional framelets [12].
Problem formulation
Given a multi-channel or vector-valued image f of an arbitrary dimension with voxel i ∈ {1, …, N} consisting of vector f ∈ ℜ, where N is the number of voxels and M is the number of channels, we are interested in restoring its denoised counterpart u by solving the following problem:
Here, u( is the m-th channel of u. The regularization term is in fact a summation of G terms, each of which grouping a number of channels. The g-th grouping (with associated tuning parameter λ), where g = {1, 2, …, G}, is defined according to the set of weights {w}, where m ∈ {1, 2, …, M}. Channels with w ≠ 0 are included in the grouping and their weighted framelet coefficients are jointly considered via ℓ2-norm for penalization. The different groups can possibly overlap, implying that each channel can be included in different groups at the same time. This is in spirit similar to the overlapped group Lasso [15]. We set
Here λ is a constant that can be set independently of the weights.
Optimization
Problem (18) can be solved effectively using penalty decomposition (PD) [16]. Defining auxiliary variables
this amounts to minimizing the following objective function with respect to u and v ≔ {v}:In PD, we (i) alternate between solving for u and v using block coordinate descent (BCD). Once this converges, we (ii) increase μ > 0 by a multiplicative factor δ > 1 and repeat step (i). This is repeated until increasing μ does not result in further changes to the solution [16]. See Algorithm 1 for a summary of the algorithm. Convergence analysis is provided in the S1 Appendix.
First subproblem
We solve for v in the first problem, i.e., min
L(u, v). This is a group ℓ0 problem and the solution can be obtained via hard-thresholding:
where
This subproblem can be replaced using soft-thresholding to obtain an ℓ1 version of the algorithm.
Second subproblem
By taking the partial derivative with respect to u(, the solution to the second subproblem, i.e., min
L(u, v), is for each m
where we have dropped the subscript i for notation simplicity. Note that since we have , the the problem can be simplified to
Solving the above equation for u( is trivial and involves only simple division.Algorithm 1: Penalty Decomposition (PD)Data : Multi-channel image f.Parameters : Tuning parameter λ; initial penalty factor μ0 > 0; multiplicative factor δ > 1; BCD tolerance ϵBCD; PD tolerance ϵPD.Initialization : Iteration index k = 0; initial solution u0,0; a constant ϒ ≥ Φ(u0,0).Output : Denoised image u./* Main Steps */(1) For a fixed μ, obtain BCD solution (u, v) for . That is, set k′ = 0 and iterate the following steps:(1a) Solve .(1b) Solve .(1c) If u satisfies the BCD stopping criterionset (u, v) = (u, v) and go to Step (2).(1d) Set k′ ← k′ + 1 and go to Step (1).(2) If u satisfies the PD stopping criterionstop and output u. Otherwise, set μ = δμ.(3) If , set u = u0,0. Otherwise, set u = u.(4) Set k → k + 1 and go to Step (1).
Setting the weights
In our case, each channel corresponds to a DW image. In setting the weights {w}, we note that the weights should decay with the dissimilarity between gradient directions associated with a pair of DW images. To reflect this, we let G = M and set, for g, m ∈ {1, …, M},
where κ ≥ 0 is a parameter that determines the rate of decay of the weight. The exponential function is in fact modified from the probability density function of the Watson distribution [17] with concentration parameter κ. Essentially, this implies that for the g-th DW image acquired at gradient direction ν, there is a corresponding regularization group that includes a set of images with associated weights {w}. The weight is maximal at w = 1 and is attenuated when m ≠ g. Weights of images scanned at a gradient direction with angle greater than θ in relation to ν are set to 0, and the respective images are hence discarded from the group. We set θ = 30°.
Debiasing
The magnitude of the complex MR signal is commonly used because the phase of the complex signal is highly sensitive to many experimental factors [18, 19]. The magnitude MR signal is not affected by the phase error and it follows a nc-χ distribution [20, 21] rather than a Gaussian distribution and bias correction needs to be carried out especially when the SNR is low [18]. Bias correction can be performed before [22] or after [23] denoising. In our case, we adopted the latter for unbiased noise reduction [23].
Experiments
The main goal in the following experiments is to demonstrate that denoising performance can be improved by usingUEP-based tight wavelet frames, which avoids the staircasing effect;ℓ0 over ℓ1 regularization;Collaborative utilization of angularly neighboring DW images.Unless stated otherwise, we used the piecewise linear tight wavelet frame with L = 2 levels of decomposition. The optimal λ values for ℓ0 and ℓ1 were in (1, 8], determined using grid search from 0.2 to 50 in steps of 0.2 based on the maximal peak signal-to-noise ratio (PSNR) defined as
where MAX is the maximal signal value and MSE is the mean square error.For debiasing, the noise level is estimated from the image background using the method described in [24]. More advanced noise estimation methods [23, 25] can be used for improved accuracy.We utilized NLM filtering as a comparison baseline. Following the work presented in [26], we set the patch radius to 1 and search radius to 2.
Datasets
Spiral data
A synthetic dataset of a spiral was generated for quantitative evaluation. The parameters used for synthetic data simulation were consistent with the real data described next: b = 2000s/mm3, 48 gradient directions, 64 × 64 × 16 voxels with resolution 2 × 2 × 2 mm3. Three levels of 32-channel nc-χ noise [27] was added: σ = 5, 7.5, and 10, corresponding to SNR = 30, 20, 10. SNR is defined as η/σ [28], where η is the true signal value, which in our case is the white matter non-DW signal. The varying curvature reflects the various degree of bending of white matter pathways and gives us a good basis for evaluating how denoising performance changes in different conditions.
ISBI phantom
Evaluation was also performed using the realistic diffusion MRI phantom adopted in the ISBI 2013 HARDI challenge (http://hardi.epfl.ch/static/events/2013_ISBI/). A python package, called phantomαs [29], was used to generate the noise free phantom, with gradient directions and diffusion weighting consistent with the spiral data described above. Three levels of 32-channel nc-χ noise, similar to the spiral data, was added to the noise free phantom.
Real data
DW images were acquired using Siemens 3T TRIO MR scanner with the same gradient directions and b-value as the spiral data. The imaging protocol is as follows: 128 × 96 imaging matrix, voxel size of 2 × 2 × 2 mm3, TE = 97 ms, TR = 11, 300 ms, 32-channel receiver coil. Imaging acquisition was repeatedly performed on the same subject for 8 times. We averaged the 8 sets of DW images and removed the nc-χ noise bias to obtain the ground truth for evaluation. Informed written consent was obtained from the subject and the experimental protocol was approved by the Institutional Review Board of the University of North Carolina (UNC) School of Medicine. The study was carried out in accordance with the approved guidelines.
Results
The staircasing effect
The staircasing effect is often observed in denoising based on TV regularization [30]. The power of tight wavelet frames lies in their ability to sparsely approximate piecewise smooth functions. They are hence better suited for images with gradual intensity changes. In Fig 2, we show an example of how piecewise linear framelet denoising avoids the staircasing effect and results in a smoother image without blocking artifacts. In contrast, TV denoising causes patch artifacts when the image is not piecewise constant.
Fig 2
An illustration of how piecewise linear framelet denoising avoids the staircasing artifacts created by TV denoising.
Bias correction
The noise-induced bias on the estimated magnitude signal is especially prominent when the diffusion weighting is high. We removed the nc-χ bias using the method described in [27]. Fig 3 indicates that the nc-χ noise results in a noise floor especially when the signal is low [27]. This manifests as elevation of intensity value after denoising. Removing the noise bias produces an image that is closer to the ground truth.
Fig 3
Debiasing the denoising outcome overcomes the noise floor and results in (σ = 7.5).
Type of framelets and number of levels
Using the spiral data for evaluation, our results shown in Fig 4 indicate that piecewise linear framelet denoising performs better than other types of framelets.
Fig 4
ℓ0 denoising using Haar, piecewise linear, and piecewise cubic framelets (L = 2, σ = 5).
Fig 5 indicates that denoising performance improves with the increase in the number of levels, L. However, the time cost increases dramatically with L, i.e., 27 s for L = 1, 29 s for L = 2, and 378 s for L = 3 (based on a 4-core Intel i7 processor). Therefore, we choose L = 2 for reasonable denoising performance with a reasonable time cost.
Fig 5
ℓ0 denoising in relation to the level of decomposition, L (piecewise linear framelets, σ = 5).
Effects of grouping
Fig 6 shows the results of denoising with and without grouping of angularly neighboring images. Grouping can be observed to significantly improve PSNR.
Fig 6
The effects of grouping on denoising.
Comparison between methods
The PSNR and SSIM [31] results for the spiral data and ISBI phantom, shown in Figs 7 and 8, indicate that the proposed ℓ0-based framelet method gives the best performance for all noise levels. The DW images, shown in Figs 9 and 10, indicate that both ℓ1 and ℓ0 give sharper edges compared with NLM. Noise, however, is not totally removed for the case of ℓ1. Only ℓ0 is able to effectively remove noise and preserve edges. Note that, for both synthetic and real data, nc-χ bias was removed using the method described in [27].
Fig 7
Performance comparison based on the spiral data.
Fig 8
Performance comparison based on the ISBI phantom.
Fig 9
Comparison of denoised DW images given by different methods (σ = 5).
Fig 10
Comparison of denoised DW images given by different methods (σ = 7.5).
We also compared the computational times of different methods using the spiral data with a computer equipped with a 4-core Intel i7 processor. The results, shown in Table 2, indicate that our method and ℓ1 perform more efficiently than NLM.
Table 2
Computation times.
NLM
ℓ1
Proposed
Time (sec)
186
34
27
For the real data, we used the average image as the ground truth for quantitative evaluations. The results for all 8 datasets, shown in Fig 11, are consistent with Figs 7 and 8, indicating that ℓ0 gives the best performance. The visual results in Fig 12 indicate that the results given by ℓ0 is closest to the ground truth. This is confirmed by the root-mean-square error (RMSE) map computed between the denoised data and the ground truth data. In contrast, NLM over-smooths the image and edge information is hence lost.
Fig 11
Performance comparison based on the real data.
Fig 12
Comparison of denoised DW images using the real data.
Discussion
In this paper, we have introduced a method that harnesses correlations between DW images scanned with similar gradient directions for effective edge-preserving denoising. Our main contributions lie in three aspects. Firstly, UEP was employed to generate frames that were discrete analogues to differential operators of various orders; Secondly, instead of the conventional ℓ1 regularization, a very efficient method was proposed in order to solve an ℓ0 denoising problem that involves only thresholding and a trivial inverse problem; Thirdly, DW images acquired using neighboring gradient directions were used for collaborative denoising.NLM was used as a comparison baseline in our evaluation. However, similar to [22, 32], we found the performance of NLM to be unsatisfactory. NLM can be improved by designing better metrics for patch matching, instead of the conventional Euclidean distance. For instance, inspired by the human visual system, Foi and Boracchi [33] proposed a patch foveation operator for measuring patch distance. Baselice [34] proposed to measure pixel using the Kolmogorov—Smirnov distance, showing promising performance in reducing speckle noise in ultrasound images. NLM can be further improved by extending its search volume. For instance, collaborative NLM [35] extended the search volume to a number of co-denoising images to enrich the similar information used in noise reduction. Chen et al. [36, 37] proposed to improve NLM by considering the similar information in both spatial domain and diffusion wavevector domain. This idea was further employed to improve atlas building [38] and resolution enhancement [39].
Conclusion
In conclusion, we have proposed a method to remove the noise in DW images. The proposed method takes advantage of multi-channel framelet and the correlations among DW images for effective noise removal. The associated ℓ0 optimization problem is solved by an effective iterative hard thresholding algorithm. Extensive experiments on synthetic data and real data demonstrate the advantage of our method over various noise reduction methods, including TV regularization, NLM, and the ℓ1 counterpart of our method.