Kirsten Koolstra (1), Rob Remis (2)
(1) Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands.
(2) Circuits & Systems Group, Electrical Engineering, Mathematics and Computer Science Faculty, Delft University of Technology, Delft, The Netherlands.
Abstract
PURPOSE: To learn a preconditioner that accelerates parallel imaging (PI) and compressed sensing (CS) reconstructions. METHODS: A convolutional neural network (CNN) with residual connections was used to train a preconditioning operator. Training and validation data were simulated using 50% brain images and 50% white Gaussian noise images. Each multichannel training example contains a simulated sampling mask, complex coil sensitivity maps, and two regularization parameter maps. The trained model was integrated in the preconditioned conjugate gradient (PCG) method as part of the split Bregman CS method. The acceleration performance was compared with that of a circulant PI-CS preconditioner for varying undersampling factors, number of coil elements and anatomies. RESULTS: The learned preconditioner reduces the number of PCG iterations by a factor of 4, yielding a similar acceleration as an efficient circulant preconditioner. The method generalizes well to different sampling schemes, coil configurations and anatomies. CONCLUSION: It is possible to learn adaptable preconditioners for PI and CS reconstructions that meet the performance of state-of-the-art preconditioners. Further acceleration could be achieved by optimizing the network architecture and the training set. Such a preconditioner could also be integrated in fully learned reconstruction methods to accelerate the training process of unrolled networks.
Compressed sensing (CS) allows for a reduction in scan time at the cost of longer reconstruction times, which are a consequence of the nonlinear formulation of the CS minimization problem: it uses $\ell_1$-norm terms that promote sparsity of the image in certain transform domains. Examples of such sparsifying transformations are the total variation operator, wavelet transformations or a combination of these.
Preconditioning has been proposed in the past to accelerate CS reconstructions.
This technique can reduce the number of iterations that are needed to converge to the optimal solution.
Designing an efficient preconditioner, however, is not straightforward. It is often difficult to approximate the inverse operation of the signal model of interest in a computationally inexpensive manner.
For CS this is especially the case when multiple coil sensitivity maps are taken into account, which adds a level of complexity to the structure of the system matrix. In Ref. [8], this problem was addressed by approximating the CS system matrix by a block circulant matrix with circulant blocks, resulting in a speed-up factor of 5 when using the conjugate gradient (CG) method in combination with the split Bregman (SB) reconstruction framework. Furthermore, Ong et al. constructed a diagonal k-space preconditioner for the primal-dual hybrid gradient method that supports non-Cartesian trajectories.
Although these preconditioners have already been shown to reduce the number of iterations needed for CS reconstructions, the approximations made in their design often limit the acceleration factor that can be achieved.
In the past decade, deep learning reconstruction approaches have also gained popularity. Once the model has been trained, these approaches are often inherently fast.
An additional advantage is that regularization parameters and regularization functions of the CS formulation can be learned during the training process,
potentially allowing larger undersampling factors. On the other hand, training a neural network requires tuning many other machine learning hyperparameters instead. While deep learning reconstructions have shown great reconstruction performance over the past years,
a difficulty of this class of approaches is the risk of the trained network not generalizing well to unseen cases or scans with different SNR and sampling patterns,
and the difficulty of understanding and detecting the corresponding image artifacts. For this reason, most recent deep learning approaches integrate prior (physics) knowledge into the training process, thereby constraining the solution space.
Unrolled networks tend to be relatively large, since they follow the general structure of model-based reconstruction algorithms, such that each network block represents one iteration of the CS problem. These types of approaches could therefore also benefit from a preconditioner that reduces the number of network blocks and hence simplifies the network structure.
In this work, we integrate deep learning techniques into a physics-driven model-based reconstruction framework. We explore the feasibility of learning a preconditioner using a convolutional neural network (CNN) to accelerate classical CS reconstructions. Such a preconditioner supports reconstruction acceleration without introducing uncertainty in the reconstruction quality. We learn the action of a preconditioner on a vector, such that evaluating the learned preconditioner is fast and requires little memory. We first analyze the training performance of the preconditioning model on simulated test data. We then demonstrate the acceleration performance of the learned preconditioner when integrated in CG as part of the SB reconstruction framework. The final preconditioned reconstruction algorithm is tested for multiple anatomies, coil configurations and undersampling factors, and the results are compared with those obtained with the circulant preconditioner designed in Ref. [8].
METHODS
Compressed sensing
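The 2D compressed sensing problem can be written as
$$\hat{x} = \arg\min_{x} \left\{ \frac{1}{2} \sum_{i=1}^{N_c} \left\| R F S_i x - y_i \right\|_2^2 + \frac{\lambda}{2} \left( \left\| \nabla_x x \right\|_1 + \left\| \nabla_y x \right\|_1 \right) + \frac{\gamma}{2} \left\| W x \right\|_1 \right\}. \tag{1}$$
In this formulation, $x$ is the unknown image, $y_i$ is the acquired k-space data for coil $i$ with $S_i$ the corresponding coil sensitivity map, $R$ is the sampling mask and $F$ is the uniform 2D Fourier transform. The finite difference operators, $\nabla_x$ and $\nabla_y$, and the wavelet transform, $W$, are used as sparsifying operations that promote sparsity of the unknown image solution in its transformed domain. The regularization parameters $\lambda$ and $\gamma$ determine the balance between the data consistency term and the regularization terms. Such a nonlinear problem can be solved iteratively using SB.
This is a reconstruction framework in which the $\ell_1$-norm terms are decoupled from the $\ell_2$-norm terms, such that each iteration of SB solves a linear system of equations followed by Bregman parameter updates. In this algorithm, the most time-consuming step is repeatedly solving the linear system of equations
$$A \hat{x}^{k+1} = b, \tag{2}$$
in which
$$A = \sum_{i=1}^{N_c} (R F S_i)^H R F S_i + \lambda \left( \nabla_x^H \nabla_x + \nabla_y^H \nabla_y \right) + \gamma W^H W \tag{3}$$
and
$$b = \sum_{i=1}^{N_c} (R F S_i)^H y_i + \lambda \left[ \nabla_x^H \left( d_x^k - b_x^k \right) + \nabla_y^H \left( d_y^k - b_y^k \right) \right] + \gamma W^H \left( d_w^k - b_w^k \right). \tag{4}$$
The Bregman parameters $b_x^k$, $b_y^k$, $b_w^k$ and the auxiliary parameters $d_x^k$, $d_y^k$, $d_w^k$ in Bregman iteration $k$ are introduced by the Bregman scheme.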
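To make the structure of the linear system concrete, the following NumPy sketch applies a system operator of the kind described above (a multi-coil data-consistency term plus finite-difference and wavelet terms) to an image without ever forming a matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `make_system_operator`, the periodic forward differences, the binary sampling mask, and the orthonormal wavelet transform (so that the wavelet term reduces to a scaled identity) are all choices made here for brevity.

```python
import numpy as np

def make_system_operator(mask, coil_maps, lam, gam):
    """Return a function v -> A v for
    A = sum_i (R F S_i)^H (R F S_i) + lam * (Dx^H Dx + Dy^H Dy) + gam * I.

    Assumptions (not taken from the text): binary mask R, periodic forward
    differences Dx, Dy, and an orthonormal wavelet transform (W^H W = I).
    """
    def fft2c(x):
        return np.fft.fft2(x, norm="ortho")      # unitary 2D FFT

    def ifft2c(k):
        return np.fft.ifft2(k, norm="ortho")     # its adjoint (inverse)

    def diff(v, axis):
        return np.roll(v, -1, axis=axis) - v     # forward difference

    def diff_adj(v, axis):
        return np.roll(v, 1, axis=axis) - v      # adjoint of forward difference

    def apply_A(v):
        out = np.zeros_like(v, dtype=complex)
        for s in coil_maps:
            # (R F S_i)^H (R F S_i) v, using R^H R = R for a binary mask
            out += np.conj(s) * ifft2c(mask * fft2c(s * v))
        for axis in (0, 1):
            out += lam * diff_adj(diff(v, axis), axis)
        out += gam * v                           # gam * W^H W v with W^H W = I
        return out

    return apply_A
```

Because the operator is only available as a function, an iterative solver such as CG is the natural way to invert it, which is where a preconditioner becomes useful.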
Preconditioner
Solving the linear system of equations in Equation (2) is done iteratively and can be accelerated with the help of a preconditioner $M$. An efficient preconditioner reduces the number of iterations that are needed to solve the linear system. A reduction in iterations is typically achieved when the following holds:
$$M^{-1} A \approx I. \tag{5}$$
The reduction in iterations needs to outweigh the additional computational costs that are introduced by incorporating the preconditioner into the iterative scheme. In this case, the total computation time of solving the preconditioned system is shorter than that of solving the original system. Since the preconditioned numerical scheme solves the same model as the original numerical scheme, shorter computation times are obtained without affecting the accuracy of the numerical solution.
In this work, we train a network that learns such a preconditioner from a simulated training set. Specifically, we train a network that learns the inverse operation of Equation (2), $M^{-1} \approx A^{-1}$, such that Equation (5) is satisfied. Instead of learning $M^{-1}$ in the form of a full matrix, we design the model such that the network learns the action of the preconditioner on an arbitrary vector $v$. This makes using the learned preconditioner computationally inexpensive ( s per evaluation). Moreover, little memory storage is required.
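The role of a preconditioner that is only available through its action on a vector can be illustrated with a generic preconditioned CG routine. The sketch below is a textbook PCG loop, not the authors' code: `pcg`, `apply_A`, `apply_Minv`, `tol` and `max_iter` are names and defaults assumed here. The preconditioner is passed as a black-box callable, which is how a trained network could be plugged in; note that standard PCG theory assumes a fixed, linear, Hermitian positive-definite preconditioner, which a neural network does not guarantee.

```python
import numpy as np

def pcg(apply_A, b, apply_Minv=None, tol=1e-6, max_iter=200):
    """Preconditioned conjugate gradient for A x = b (A Hermitian pos. def.).

    apply_A    : callable returning A v
    apply_Minv : callable returning an approximation of A^{-1} r;
                 None disables preconditioning (plain CG)
    Returns (x, number of iterations performed).
    """
    if apply_Minv is None:
        apply_Minv = lambda r: r
    x = np.zeros_like(b)
    r = b.copy()                     # residual b - A x for the zero start
    z = apply_Minv(r)                # preconditioned residual
    p = z.copy()
    rz = np.vdot(r, z)
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k              # relative residual below tolerance
        Ap = apply_A(p)
        alpha = rz / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = apply_Minv(r)
        rz_new = np.vdot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Toy usage: an ill-conditioned diagonal system with an exact preconditioner.
d = np.linspace(1.0, 1000.0, 200)
rhs = np.ones(200)
x_plain, it_plain = pcg(lambda v: d * v, rhs)
x_prec, it_prec = pcg(lambda v: d * v, rhs, apply_Minv=lambda r: r / d)
```

In this toy case the preconditioner is the exact inverse, so the preconditioned run stops after a single iteration; a learned or circulant preconditioner only approximates the inverse and therefore gives a smaller, but still useful, reduction.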
Simulation of training examples
The training set was simulated in MATLAB (Mathworks Inc, Natick, MA) and constructed from equal numbers of white Gaussian complex noise images and complex brain images. For the complex brain images, magnitude brain images (70 MP-RAGE and 63 $T_2$-weighted 3D brain images) were downloaded from the Human Connectome Project (https://ida.loni.usc.edu/login.jsp) and transformed into transverse (MP-RAGE: 40, $T_2$-weighted: 65) and sagittal (MP-RAGE: 36, $T_2$-weighted: 66) slices. White Gaussian noise was added to the background of the images. A Gaussian-shaped phase was randomly added to each image, after which the resulting brain image set was augmented by rotating each image four times in steps of , resulting in 53 160 images. These were used to simulate the actual training examples, for which a wide variety of undersampling factors, sampling masks, coil sensitivity maps and regularization parameters were taken into account.
Since computing $A^{-1}$ for each combination is computationally expensive, we compute the training examples using only the forward operator $A$, according to $b = Ax$. After this, we swap the operator's input and output before feeding them to the network: $b$ is used as the network's input image, while $x$ is used as the label image. In this way, the network learns the inverse operator $A^{-1}$, even though the training data were simulated using the forward operator $A$. Note that both $b$ and $x$ represent image-domain vectors. Each pair $(b, x)$ was simulated with an integer undersampling factor ranging between 1 and 4 and Gaussian-shaped complex coil sensitivity maps for a number of coil elements ranging between 1 and 16. The parameters $\lambda$ and $\gamma$ were randomly selected from the interval (0, 5) and transformed into uniform regularization maps. This range was based on empirical tuning of the regularization parameters in SB reconstructions without using the learned network as preconditioner. The input images were normalized to have a maximum magnitude value of 1 and the label images were scaled accordingly. The images and the coil sensitivity maps were split into real and imaginary components, and the resulting matrices were stacked (Figure 1A). If the number of coil elements was smaller than 16, zero-valued maps were added to the set of coil sensitivity maps until the number of coil sensitivity maps was equal to 16. This yielded a 37-channel network input. Slices from one MP-RAGE and one $T_2$-weighted volunteer were selected to create a model validation set.
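The channel count follows from the stacking described above: 2 channels for the complex image, 1 for the sampling mask, 32 for the 16 zero-padded complex coil maps and 2 for the regularization maps. The NumPy sketch below assembles such an input and applies the normalization step. It is a hypothetical illustration: the function names, the channel ordering and the channels-last layout are assumptions made here, since the text does not specify them.

```python
import numpy as np

MAX_COILS = 16  # coil maps are zero-padded up to this number

def normalize_pair(net_input, label):
    """Scale the input image to unit maximum magnitude; scale the label alike."""
    scale = np.abs(net_input).max()
    return net_input / scale, label / scale

def stack_network_input(image, mask, coil_maps, lam, gam):
    """Build a real-valued (H, W, 37) array from one training example.

    Channel order (an assumption): image real/imag, sampling mask,
    16 x (coil real, coil imag), then two uniform regularization maps.
    """
    h, w = image.shape
    padded = np.zeros((MAX_COILS, h, w), dtype=complex)
    padded[: coil_maps.shape[0]] = coil_maps     # missing coils stay zero
    channels = [image.real, image.imag, mask.astype(float)]
    for s in padded:
        channels += [s.real, s.imag]
    channels += [np.full((h, w), float(lam)), np.full((h, w), float(gam))]
    return np.stack(channels, axis=-1)
```

Feeding the mask, coil maps and regularization maps alongside the image is what lets a single trained model adapt to different acquisition settings, at the cost of a fixed upper limit on the number of coil elements.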
FIGURE 1
Model input and output and network architecture. A, Each 37-channel input contains a complex input image, a real sampling mask (R), 16 complex coil sensitivity maps and two real regularization maps. The 2-channel output contains the corresponding complex output image. Note that all complex matrices are first split into real and imaginary components before stacking them in the network's input and output volumes. B, A convolutional neural network with residual connections is used for training. Each residual block contains two layers. A 2D convolution ( kernels, 128 features) was performed at every layer with ReLU (hidden layers) and tanh (output layer) activation functions.
Model and training
To train the preconditioner, we used a residual neural network with three residual blocks containing two layers each, schematically shown in Figure 1B. A 2D convolution ( kernels, 128 features) was performed at every layer. ReLU activation functions were chosen for the hidden layers, whereas the tanh activation function was chosen for the output layer to enable positive as well as negative matrix element predictions: magnitude values on the interval (0, 1) imply real and imaginary component values on the interval ($-1$, 1). We minimized the mean absolute error over 50 epochs using the Adam optimizer with an initial learning rate of , a batch size of 16 and dropout with a probability of 0.25. Training was performed in TensorFlow with a 24 GB Quadro RTX 6000 GPU, resulting in a training time of 2 days and 20 h.
MR data acquisition
Experiments were performed in three healthy volunteers, who gave informed consent. The Leiden University Medical Center Committee for Medical Ethics approved the experiment. Undersampled and fully sampled scans were acquired on an Ingenia 3T dual transmit MR system (Philips, Best, The Netherlands), equipped with a 15-channel head coil and a 16-channel knee coil. The following data were acquired:
Brain $T_2$-weighted turbo spin-echo (TSE): FOV = mm2; in-plane resolution mm2; 4 mm slice thickness; 1 slice; TE/TR/TSE factor = 80 ms/3000 ms/16; refocusing angle = ; WFS = 2.5 pixels; scan time = 00:30 min ().
Brain fluid-attenuated inversion recovery (FLAIR): FOV = mm2; in-plane resolution mm2; 4 mm slice thickness; 1 slice; TE/TR/TSE factor = 120 ms/9000 ms/24; IR delay = 2500 ms; refocusing angle = ; WFS = 2.7 pixels; scan time = 01:30 min ().
Knee gradient echo (FFE): FOV = mm2; in-plane resolution mm2; 3 mm slice thickness; 32 slices; TE/TR = 10 ms/455 ms; FA = ; WFS = 1.4 pixels; scan time = 1:01 min (), retrospectively undersampled to .
Brain $T_1$-weighted inversion recovery turbo spin-echo (IR TSE): FOV = mm2; in-plane resolution mm2; 4 mm slice thickness; 24 slices; TE/TR/TSE factor = 20 ms/2000 ms/8; IR delay = 800 ms; refocusing angle = ; WFS = 2.6 pixels; scan time = 5:50 min (), retrospectively undersampled to .
Image reconstruction
Reconstruction of the images was performed in Python (Python Software Foundation, Beaverton, OR) and run on a Linux machine with an Intel Xe 6234 CPU and 256 GB internal memory. For each data set, the complex coil sensitivity maps were estimated from the center of k-space using ESPIRiT and masked outside the object. SB was used as the $\ell_1$-norm minimization algorithm to solve the compressed sensing problem, following the implementation described in Ref. [8]. The number of inner Bregman iterations was set to 1. The tolerance of PCG was set to . The regularization parameters were empirically tuned. Reconstructions were performed with and without the learned preconditioner, and compared to the circulant preconditioner designed in Ref. [8]. For reconstructions with the learned preconditioner, the trained TensorFlow model was imported once at the beginning of the reconstruction pipeline and used for inference in every PCG iteration of SB. The methods were compared for three matrix sizes (, and ), three different undersampling factors (, and ), two different anatomies and coil configurations (brain and knee) and three different MR contrasts (TSE, FFE and FLAIR).
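Besides the PCG solve, each SB iteration contains two inexpensive steps per sparsifying transform: a soft-thresholding (shrinkage) update of the auxiliary variable and an update of the Bregman variable. The NumPy sketch below shows these two steps in their generic textbook form for complex-valued data. The function names are hypothetical, and the threshold is left as a free parameter because its relation to the regularization parameters depends on how the problem is scaled.

```python
import numpy as np

def shrink(z, threshold):
    """Complex soft-thresholding: reduce the magnitude by `threshold`
    (clipping at zero) while keeping the phase unchanged."""
    mag = np.abs(z)
    scale = np.maximum(mag - threshold, 0.0) / np.maximum(mag, 1e-30)
    return scale * z

def bregman_step(Tx, b, threshold):
    """Auxiliary-variable and Bregman update for one sparsifying transform.

    Tx : transform of the current image (finite differences or wavelets)
    b  : current Bregman variable for this transform
    Returns the updated auxiliary variable d and Bregman variable b.
    """
    d = shrink(Tx + b, threshold)
    b_new = b + Tx - d
    return d, b_new
```

Both steps are element-wise, so their cost is negligible compared with the linear solve, which is why accelerating PCG has the largest effect on the total reconstruction time.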
RESULTS
Figure 2A shows the loss function during training for the training data (red) and the validation data (blue). Figure 2B,C shows that the performance of the trained network is not dependent on the undersampling factor and on the number of coil elements for the validation data. Figure 2D, however, shows that the validation error is smallest for large regularization parameters, for which the inverse linear system is best conditioned.
FIGURE 2
Training and validation performance. A, The mean absolute error (MAE) loss function during training for training data (red) and validation data (blue) over 50 epochs, plotted on a log scale. B, C, The validation error after 50 epochs does not show a large dependence on the undersampling factor or the number of coil elements. D, The validation error after 50 epochs is smallest for large regularization factors, for which the inverse operation is best defined.
Figure 3 shows the network's input (sampling mask R, regularization maps, coil sensitivities and image) and output for (A) a brain and (B) a noise example from the validation set. The network's prediction of the inverse operation is close to the ground truth image for both cases. This is confirmed by the low absolute errors (normalized error norm < 0.18) in the error map.
FIGURE 3
The network's prediction performance for two validation examples. The network's prediction is close to the ground truth, which is confirmed by the small error values both for the brain case (A) and for the noise case (B). The brain example was simulated with an 8-channel receive coil and an undersampling factor of 3, whereas the noise example was simulated with a 15-channel receive coil and an undersampling factor of 4. A Fourier shift was applied to the sampling mask as a preprocessing step before training. The brain images in (A) were rotated by the data augmentation step.
The reconstruction quality and the convergence behavior with and without the learned preconditioner integrated in SB are shown in Figure 4 for an IR TSE brain scan (, matrix size , 15 coil elements) and an FFE knee scan (, matrix size , 16 coil elements). The learned preconditioner results in an acceleration factor of 4.0 in CG compared with the reconstruction without a preconditioner. This is in the same range as the acceleration factor of 4.3 obtained with the circulant preconditioner. The reconstruction error plotted as a function of time for the entire SB algorithm shows that the learned preconditioner is slightly more expensive to evaluate in each CG iteration than the circulant preconditioner.
FIGURE 4
Accelerated SB reconstructions of retrospectively undersampled data using the learned and the circulant preconditioner. A, The CS reconstruction of an IR TSE brain scan (, 15 coil elements) is close to the fully sampled image, which is confirmed by the low values in the error map (magnified 10 times). The learned preconditioner reduces the number of PCG iterations by a factor of 4.0 over 20 Bregman iterations, which is of the same order as the acceleration factor of 4.3 obtained with the circulant preconditioner. The final acceleration for the entire SB algorithm is slightly larger for the circulant preconditioner than for the learned preconditioner, because the circulant preconditioner is slightly cheaper to apply in each iteration. B, Similar results are obtained for an FFE scan in the knee, acquired with an undersampling factor of 3 and 16 coil elements. The regularization parameters were set to (A): and (B): . The number of outer/inner Bregman iterations was set to 60/1. The reconstruction time with the learned preconditioner was (A): 16.9 s and (B): 4.7 s.
Similar results are presented in Figure 5 for a prospectively undersampled FLAIR scan (, matrix size , 15 coil elements) and a TSE scan (undersampling factor of 2, matrix size 256 × 256, 13 coil elements) in the brain, which show that the learned preconditioner can slightly outperform the circulant preconditioner. Figure 5B also shows that the model trained on both brain and noise images results in a much better preconditioning performance than a model trained on either noise or brain images alone.
FIGURE 5
Accelerated SB reconstructions of prospectively undersampled data using the learned and the circulant preconditioner. A, The CS reconstructions of a FLAIR brain scan (, 15 coil elements, matrix size of ) are close to the fully sampled image. Note that the relative difference (magnified 1e6 times) between the reconstructed images obtained with the learned and the circulant preconditioner is negligible, since the CG tolerance level was fixed for all reconstructions. The learned preconditioner reduces the number of PCG iterations, yielding a performance similar to that of the circulant preconditioner. B, Similar results are obtained for a TSE scan in the brain, acquired with an undersampling factor of 2, 13 coil elements and a matrix size of 256 × 256. These results show that the performance of the preconditioner is much more stable when the preconditioner is trained on noise images only than when it is trained on brain images only. This suggests that the distribution of the brain images from the Human Connectome Project database differs from that of the acquired test images. This could be due to a difference in SNR level, image contrast or image phase, for example. Training the preconditioner on both brain and noise images in equal amounts results in the best preconditioning performance. The regularization parameters were set to (A): and (B): . The number of outer/inner Bregman iterations was set to 20/1. The reconstruction time with the learned preconditioner was (A): 5.1 s and (B): 5.6 s.
DISCUSSION
The results in this paper showed that it is possible to learn a preconditioner for CS and PI reconstructions using a relatively small CNN. The model was designed such that the learned preconditioner operates on a vector, which makes integrating the preconditioner in a numerical scheme computationally inexpensive and memory efficient compared to learning a full preconditioning matrix. The learned preconditioner reduced the number of PCG iterations by a factor of approximately 4, thereby meeting the performance of an efficient circulant preconditioner.
The training data were simulated for a wide range of sampling masks, coil sensitivity maps and regularization parameters, such that the same trained model can be used to reconstruct images acquired with different coil configurations and of different anatomies. This provides us with a flexible preconditioning framework, in which the learned preconditioner is not restricted to a certain structure of the system matrix, as is the case for circulant preconditioners.
For this proof-of-principle study, a relatively small CNN was chosen as network architecture, on the one hand to support accelerated reconstructions of varying matrix size and resolution and on the other hand to limit the inference time. With this simple network architecture, the learned preconditioner was able to meet and sometimes slightly improve upon the performance of the efficient circulant preconditioner. Network, model and training set optimization is expected to increase the acceleration factor further, although these design choices are not trivial. Supporting Information Figure S1 shows, for example, that using a deeper network does not directly lead to a better preconditioning performance. Furthermore, by learning the preconditioner's action on a vector instead of learning the preconditioning matrix itself, the preconditioning performance becomes dependent on the input vector. To minimize this effect, the input vectors should be as general as possible, with as few coherent structures between training examples as possible. Therefore, combining noise and brain images in the training set results in a better preconditioning performance than using only noise or brain images. Further research is needed to investigate which image types complement the current brain and noise images best for optimal generalizability of the preconditioner. Finally, the Adam optimizer has shown good performance in combination with the current network architecture, but it is worth investigating whether other optimizers can achieve a higher preconditioning performance for the test data in this application.
The current method has demonstrated the feasibility of learning a preconditioner for 2D CS reconstructions. To provide the deep learning model with as much information about the underlying MR physics as possible, coil sensitivity maps and the sampling mask were fed to the network along with the input image and regularization parameters, yielding a 37-channel network input for each training example. Following the same procedure for 3D reconstructions would increase the dimensionality of the network input further due to the volumetric nature of the coil sensitivity maps. Such a large network input may hinder efficient learning of the inverse operation of the SB system matrix. Coil compression and grouped convolutions could be used to overcome this limitation and coil sensitivity map-independent preconditioners should be investigated.
This work focused on preconditioning as a means to speed up classical model-based reconstructions, while fully learned reconstructions are often inherently fast. The advantage of accelerating reconstructions via preconditioning compared to using fully learned reconstructions is that preconditioners do not affect the reconstruction quality, whereas the stability of fully learned reconstructions still needs refining.
An ill-designed preconditioner could, in a worst-case scenario, result in slower convergence than reconstruction without the preconditioner. We cannot guarantee that the learned network leads to (fast) convergence of CG in all cases, but a nonconverging reconstruction is easily and automatically detected by monitoring the CG residual. In our work we observed fast convergence behavior for all studied cases. However, besides having short reconstruction times, the large degree of freedom in fully learned reconstructions can also help to better exploit image features such as sparsity,
possibly allowing reconstructions from higher undersampling factors. Systems of similar structure as the one described in Equations (2)–(4) are often encountered in such variational networks, which could therefore also benefit from the preconditioning technique described in this work. A preconditioner could potentially be learned simultaneously with the reconstruction model, similar to how ADMM‐Net learns the inverse operation corresponding to the system matrix.
Integration of learned preconditioners in variational networks might reduce the number of network blocks (representing CS iterations) needed for convergence and hence accelerate the learned reconstruction process further. This could be particularly relevant for applications where real-time reconstructions are needed.
In conclusion, it is possible to learn a preconditioner for CS and PI reconstructions using a relatively small CNN. Such an approach can potentially help to further accelerate CS reconstructions compared to existing preconditioners. Future research is needed to optimize the efficiency of the learned preconditioner.
FIGURE S1 Comparison of the preconditioning performance for different numbers of network layers and network blocks. During training of the network, the model was stored for each of the 50 epochs. Afterwards, the brain TSE image (see Fig. 5B) was reconstructed 50 times (x-axis), each time using the stored model from a different training epoch as preconditioner. The y-axis represents the outer Bregman iteration number. The colors represent the number of CG iterations needed to reach the desired tolerance level in each outer Bregman iteration. (A) The network used in this work contains 3 blocks with 2 layers each. The total number of CG iterations decreases as the epoch number increases. The convergence behavior does not vary much after 40 epochs. (B) Using a larger number of network blocks (5 blocks, 2 layers each) results in a similar convergence behavior for some of the epochs, but shows unstable behavior for larger epoch numbers. This is an indication that the network starts to overfit after approximately 40 epochs. (C) When a larger number of network layers is used in each block (3 blocks, 4 layers each), this unstable behavior is not observed, but the required number of CG iterations is still larger than for the small network used in (A). Note that the maximum number of CG iterations was set to 20 to detect nonconvergent behavior for the early epochs.