
Stabilizing deep tomographic reconstruction: Part B. Convergence analysis and adversarial attacks.

Weiwen Wu1,2,3, Dianlin Hu4, Wenxiang Cong1, Hongming Shan1,5, Shaoyu Wang6, Chuang Niu1, Pingkun Yan1, Hengyong Yu6, Varut Vardhanabhuti3, Ge Wang1.   

Abstract

Due to the lack of kernel awareness, some popular deep image reconstruction networks are unstable. To address this problem, here we introduce the bounded relative error norm (BREN) property, which is a special case of Lipschitz continuity. Then, we perform a convergence study consisting of two parts: (1) a heuristic analysis of the convergence of the analytic compressed iterative deep (ACID) scheme (with the simplification that the CS module achieves a perfect sparsification) and (2) a mathematically denser analysis (with two approximations: [1] $A^T$ is viewed as an inverse $A^{-1}$ from the perspective of an iterative reconstruction procedure and [2] a pseudo-inverse is used for a total variation operator $H$). Also, we present adversarial attack algorithms to perturb the selected reconstruction networks respectively and, more importantly, to attack the ACID workflow as a whole. Finally, we show the numerical convergence of the ACID iteration in terms of the Lipschitz constant and the local stability against noise.
© 2022 The Authors.


Keywords:  analytic compressed iterative deep framework; bounded relative error norm; compressed sensing; deep reconstruction network; instability; kernel awareness

Year:  2022        PMID: 35607615      PMCID: PMC9122974          DOI: 10.1016/j.patter.2022.100475

Source DB:  PubMed          Journal:  Patterns (N Y)        ISSN: 2666-3899


Introduction

The vulnerability of neural networks has been demonstrated with adversarial attacks in all major deep learning tasks, from misclassification examples to deep reconstruction instabilities. In a landmark paper, Antun et al. showed that deep reconstruction is unstable due to a lack of kernel awareness, whereas sparsity-promoting reconstruction does not have such a problem. To address these instabilities, we designed an analytic compressed iterative deep (ACID) network. The key idea behind ACID is to combine data-driven priors and sparsity constraints to outperform either simple-minded deep reconstruction networks or established compressed sensing-based reconstruction methods. In our study, we have not only experimentally shown the merits of ACID but also theoretically analyzed the rationale of ACID in terms of its converging behavior and solution characteristics. In the following, we put our analysis of ACID in the context of others' analyses, of computational optimization in general and of existing representative image reconstruction networks in particular.

There are profound non-computability results in the field of computer science. Computational optimization is important not only in computer science but also in real-world applications. The theoretical research on this theme can be traced back to Turing's ground-breaking paper on machine intelligence and Smale's list of problems for the twenty-first century. Recently, Bastounis et al. made remarkable progress in settling this theoretical issue. Their theory bears a major implication for Smale's 18th problem about the boundary of artificial intelligence (AI), especially deep learning as the current mainstream of AI. They show that it is in general non-computable to construct a neural network via loss minimization and apply it to testing data, and that such a neural network is generally unstable. For example, there are in principle many classification problems for which "one may have 100% success rate on arbitrarily large training and validation datasets, and yet there are uncountably many points arbitrarily close to the training data for which the trained network will fail."

Tomographic reconstruction is an important type of computational optimization problem, and, interestingly enough, deep networks for image reconstruction can or cannot be computed under different conditions. In the context of these inverse problems, the article by Antun et al. reported instabilities of deep reconstruction networks due to the lack of kernel awareness. Then, a comprehensive follow-up analysis by Antun et al. established the boundary of deep-learning-inspired tomographic reconstruction, which helps address Smale's 18th problem. Among their contributions, the following three points are clearly made on (1) the existence, (2) the non-existence, and (3) the conditional existence of desirable networks. That is, while the existence of neural networks for an excellent functional representation is proved in the literature, the non-existence of any algorithm that trains or computes such a neural network in a general setting is also proved. However, the conditional existence of such an algorithm is proved as well: an accurate and stable network can be computed to solve meaningful inverse problems, such as Fourier imaging from sparse data. Specifically, the existence of a network for a universal representation is well known (Theorem 2.1 in Antun et al.), but how to train a network to achieve an accurate and stable approximation is a difficult issue.
It has been shown that a counterexample can always be found in a general setting so that the accuracy and robustness of a network cannot be obtained simultaneously (Theorem 2.2 in Antun et al.). On the other hand, under certain conditions, such as sparsity in levels, an accurate and stable network can indeed be obtained (Theorems 5.5 and 5.10 in Antun et al.), with the FIRENET network as a good example. At the core of the construction of FIRENET is kernel awareness. Clearly, training the network defined in subsection 5.1 in Antun et al. cannot obtain kernel awareness and is subject to the phase transition of solutions to the inverse problems. In other words, if the difference between two images lies close to the null space of the measurement matrix and is bounded from below, the Lipschitz constant of the inverse mapping can be very large, yielding a poor imaging performance. Fortunately, an algorithm can be used to utilize sparsity in levels and find a stable and accurate neural network (Theorems 5.5 and 5.10 in Antun et al., with uniform recovery guarantees, geometric convergence, and bounds on the number of samples and the number of layers of a network for a pre-specified accuracy).

In addition to the excellent work by Antun et al., active research efforts have been going on to develop deep networks for accurate and stable deep tomographic reconstruction. Representative results include the Learned Experts' Assessment-based Reconstruction Network (LEARN), the ItNet network, Momentum-Net, the null-space network, as well as deep equilibrium networks. In Chen et al., an iterative reconstruction algorithm in the CS framework was unrolled and trained in an end-to-end fashion. The experimental results from the resultant LEARN network on the Mayo Clinic low-dose computed tomography (CT) dataset are competitive with those of representative methods in terms of artifact reduction, feature preservation, and computational speed. In Genzel et al., an iterative deep-learning-based reconstruction network was designed to solve underdetermined inverse problems accurately and stably (ItNet, shown in Figure 1 in Genzel et al.). In comparison with total-variation minimization, their results reveal that standard end-to-end network architectures are resilient against not only statistical noise but also adversarial perturbations. In Chun et al., another iterative neural network, referred to as Momentum-Net, was prototyped by combining data-driven regression and model-based image reconstruction (MBIR). Momentum-Net is convergent under reasonable conditions (quadratic majorization via M-Lipschitz continuous gradients). Their results show that Momentum-Net outperformed MBIR and several other networks, but the effect of adversarial attacks on Momentum-Net was not evaluated. In Schwab et al., a null-space network was studied to offer a theoretical justification for deep learning-based tomographic reconstruction via so-called Φ-regularization. The convergence of the overall reconstruction workflow is proved, assuming Lipschitz continuity and preserving data consistency (illustrated in Figure 1 in Schwab et al.). In Gilton et al., deep equilibrium models were adapted to find the fixed point with guaranteed convergence under ε-Lipschitz continuity. Then, a trade-off can be made between reconstruction quality and computational cost. In connection with the above results, our ACID network has significant merits and unique features.
First, ACID is dedicated to overcoming the instabilities of the neural networks examined on extensive datasets in Antun et al. As a result, we have made a solid step forward along the direction of stabilizing deep reconstruction networks, showing that accurate and stable deep reconstruction is feasible and remains an exciting research opportunity. Second, the ACID network is the first prototype that combines an established sparsity-oriented algorithm, a data-driven direct-reconstruction network, and an iterative data-fidelity enforcement (for example, LEARN and the multi-domain integrative Swin transformer ignore data consistency, the ItNet network lacks kernel awareness, the Momentum-Net and DRONE networks miss a learned mapping from data to images, the null-space network uses no sparsity, and deep equilibrium networks focus only on the fixed point, which does not imply image sparsity or data fidelity). Third, the converging behavior and solution characteristics of ACID have been analyzed under a reasonable assumption. The assumption is called the bounded relative error norm (BREN), which is a special case of Lipschitz continuity. The Lipschitz continuity we used in our convergence analysis, which is practically interpreted as the BREN property and experimentally verified in our study, is consistent with previous studies on non-convex optimization, such as the aforementioned network convergence analyses [10, 11, 12]. Furthermore, note that we do not require that the measurement matrix satisfy a compressed sensing condition such as the restricted isometry property (RIP). This means that a standard sparsity-promotion algorithm may not give a unique solution. In this case, ACID promises to outperform sparsity-minimization reconstruction alone, because the data prior plays a significant role in filling the gap in deep reconstruction. Last but not least, in addition to an accurate reconstruction performance, ACID is stable in two related aspects: (1) ACID can stabilize an unstable deep reconstruction network (by putting it in the ACID framework), and (2) ACID as a whole iterative procedure is resilient against adversarial attacks. Both aspects of the ACID stability are studied systematically in this work.

Results

Our ACID architecture is heuristically obtained by minimizing an overall objective function. It is necessary to perform a convergence analysis of the iterative scheme to interpret the ACID algorithm. Although the following theoretical analysis relies on several approximations, our findings do improve our understanding of the initially heuristically derived ACID scheme. It is underlined that there is no closed-form solution for the non-linear optimization problem, and a computationally efficient iterative formula is preferred for a stable solution. In the process, the errors will be suppressed via ACID iterations so that the ACID algorithm converges to a desirable solution in the intersection of the space constrained by measured data, the space of sparse solutions, and the space of data-driven deep priors. This mechanism is similar to that of the conventional algebraic reconstruction technique (ART)/simultaneous algebraic reconstruction technique (SART) algorithms, whose convergence was rigorously proved for convex optimization [15, 16, 17, 18].

Interpretation of ACID convergence

In the medical imaging field, a tomographic imaging task can be simplified to a system of linear equations $y = A\bar{x} + n$, where $A \in \mathbb{R}^{M \times N}$ is a system matrix (for example, the discrete Radon transform for CT and the Fourier transform for MRI), $y$ is the original measurement data, $\bar{x}$ is the ground-truth image of the object to be reconstructed, and $n$ is data noise with a noise level $\|n\|_2 \le \varepsilon_n$. We focus on few-view imaging for CT and sparse sampling for MRI. In these cases, the row number of the system matrix is less than its column number, that is, $M < N$, meaning that the inverse problem is underdetermined. For the underdetermined problem, additional prior knowledge must be introduced to uniquely and stably recover the original image. Without loss of generality, we assume that the sparsifying transform $\Psi$ is unitary, $\Psi^{*}$ is the adjoint of $\Psi$, $A$ satisfies the RIP of order $s$, and $\Psi\bar{x}$ is s-sparse. We further assume that the function $\Phi$ models a well-trained neural network, and it continuously maps measurement data to an image. Although $\Phi$ outputs an image from the measurement, which is an inverse process of the linear system, we have an approximate form $\Phi \approx A^{-1}$ in some sense, such as satisfying the aforementioned BREN. Because the system matrix is underdetermined and the neural network is unstable, $\Phi$ may generate an artifact image: $\Phi(y) = \bar{x} + \hat{e} + \tilde{e}$, where $\hat{e}$ is observable and $\tilde{e}$ is in the null space of $A$; that is, $\tilde{e}$ satisfies $A\tilde{e} = 0$, and $\Phi(y) = \bar{x}$ if $\hat{e} = \tilde{e} = 0$. In this work, our goal is to design an iterative framework to stabilize an unstable neural network aided by a CS-based sparsity-promoting module. As an idealized setting to show the essential idea, we assume that the input to the neural network is the dataset $y$ and the output of the CS module is $x$. Let us introduce a residual error in the projection domain, $\Delta y = y - Ax - n$, as a target of a correction mechanism. Then, we want to minimize the following objective function:

$$\min_{x,\,n}\; \lambda_{1}\,\|\Phi(y-n)-x\|_{2}^{2} + \lambda_{2}\,\|n\|_{2}^{2} + \lambda_{3}\,\|y - Ax - n\|_{2}^{2} + \|\Psi x\|_{1}, \quad (\text{Equation 2})$$

where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are hyperparameters, the first term is the difference between the outputs of the neural network and the CS-based sparsifying module, the second term is the measured noise energy, the third term is the residual error energy, also in the projection domain, and the last term enforces the sparsity of the output image of the CS module, which is subject to the data-fidelity constraint in the projection domain. Let us define the iterates $x^{(k)}$ and $n^{(k)}$, with $k$ the iteration index. Then, we can use the block coordinate descent method to optimize Equation 2 as follows. To update $n$, we need to solve the following problem:

$$n^{(k+1)} = \arg\min_{n}\; \lambda_{1}\,\|\Phi(y-n)-x^{(k)}\|_{2}^{2} + \lambda_{2}\,\|n\|_{2}^{2} + \lambda_{3}\,\|y - Ax^{(k)} - n\|_{2}^{2}. \quad (\text{Equation 4})$$

Computing the partial derivative of the right side of (Equation 4), we have

$$-\lambda_{1}\,(\Phi')^{T}\big(\Phi(y-n)-x^{(k)}\big) + \lambda_{2}\,n - \lambda_{3}\,\big(y - Ax^{(k)} - n\big) = 0. \quad (\text{Equation 5})$$

Because the neural network is well trained to solve the problem $\Phi(Ax) = x$, we assume $\Phi(Ax) = x$ (at least on a training dataset). By taking the derivative on both sides of $\Phi(Ax) = x$, we have $\Phi'(Ax)\,A = I$, where $I$ is the identity matrix. This means $\Phi' = A^{-1}$ (in the sense of a pseudo-inverse for an underdetermined matrix, which can be obtained by classical methods such as truncated singular value decomposition [SVD]). In the classic and modern iterative CT reconstruction methods (e.g., SART), while $A^{T} \ne A^{-1}$, the residual error correction mechanism and the resultant cumulative effect of the whole iterative process will make the final solution converge to an optimal solution for projections that are sufficiently sampled. In this sense, treating a backprojection operator as an approximate inverse to the projection operator in each iteration is reasonable. Furthermore, in the ACID iterative framework, we also make the approximation $A^{-1} \approx A^{T}$. Hence, we have the approximations $\Phi' \approx A^{-1}$ and $(\Phi')^{T} \approx A$. In Equation 5, $(\Phi')^{T}$ is the operator transforming a reconstructed image into a measurement dataset, and it is approximated as $A$.
By ignoring the observable artifact image $\hat{e}$ from the neural network (since, in the iterative correction, the artifact image will be gradually reduced; see the section "method details" for justification), we have $\Phi(y-n) - x^{(k)} \approx A^{-1}(y - n - Ax^{(k)})$. Therefore, Equation 5 can be simplified as

$$n^{(k+1)} = \frac{\lambda_{1}+\lambda_{3}}{\lambda_{1}+\lambda_{2}+\lambda_{3}}\,\big(y - Ax^{(k)}\big). \quad (\text{Equation 6})$$

To update $x$, we solve the following problem:

$$x^{(k+1)} = \arg\min_{x}\; \lambda_{1}\,\|\Phi(y-n^{(k+1)})-x\|_{2}^{2} + \lambda_{3}\,\|y - Ax - n^{(k+1)}\|_{2}^{2} + \|\Psi x\|_{1}. \quad (\text{Equation 7})$$

With $u = \Psi x$ and $x = \Psi^{*}u$, Equation 7 is rewritten as follows:

$$u^{(k+1)} = \arg\min_{u}\; \lambda_{1}\,\|\Phi(y-n^{(k+1)})-\Psi^{*}u\|_{2}^{2} + \lambda_{3}\,\|y - A\Psi^{*}u - n^{(k+1)}\|_{2}^{2} + \|u\|_{1}. \quad (\text{Equation 8})$$

Computing the partial derivative of the right side of Equation 8, we have

$$2\lambda_{1}\,\Psi\big(\Psi^{*}u - \Phi(y-n^{(k+1)})\big) + 2\lambda_{3}\,\Psi A^{T}\big(A\Psi^{*}u - (y - n^{(k+1)})\big) + \partial\|u\|_{1} \ni 0. \quad (\text{Equation 9})$$

Similarly treating $A^{T}$ as $A^{-1}$ and $\Phi$ as $A^{-1}$, Equation 9 can be simplified as

$$2(\lambda_{1}+\lambda_{3})\,\Psi\big(\Psi^{*}u - \Phi(y-n^{(k+1)})\big) + \partial\|u\|_{1} \ni 0. \quad (\text{Equation 10})$$

From Equation 6, we have

$$y - n^{(k+1)} = \frac{\lambda_{2}}{\lambda_{1}+\lambda_{2}+\lambda_{3}}\,y + \frac{\lambda_{1}+\lambda_{3}}{\lambda_{1}+\lambda_{2}+\lambda_{3}}\,Ax^{(k)}. \quad (\text{Equation 11})$$

By substituting Equation 11 into Equation 10, noting that $\Psi\Psi^{*} = I$, and simplifying and rewriting the optimality condition in a component-wise form (Equations 12, 13, 14, and 15), we obtain $u^{(k+1)}$ via soft-threshold filtering:

$$u^{(k+1)} = S_{\beta}\!\big(\Psi\,\Phi(y - n^{(k+1)})\big), \qquad \beta = \frac{1}{2(\lambda_{1}+\lambda_{3})}, \quad (\text{Equation 16})$$

where the soft-thresholding kernel is defined as

$$S_{\beta}(t) = \operatorname{sgn}(t)\,\max\big(|t|-\beta,\, 0\big). \quad (\text{Equation 17})$$

Combining Equations 6 and 16, we obtain a set of formulas:

$$\begin{cases} n^{(k+1)} = \dfrac{\lambda_{1}+\lambda_{3}}{\lambda_{1}+\lambda_{2}+\lambda_{3}}\,\big(y - Ax^{(k)}\big),\\[2mm] x^{(k+1)} = \Psi^{*}\,S_{\beta}\!\big(\Psi\,\Phi(y - n^{(k+1)})\big). \end{cases} \quad (\text{Equation 18})$$

Let us denote $\rho = \frac{\lambda_{1}+\lambda_{3}}{\lambda_{1}+\lambda_{2}+\lambda_{3}}$ and simplify Equation 18 as

$$x^{(k+1)} = \Psi^{*}\,S_{\beta}\!\big(\Psi\,\Phi((1-\rho)\,y + \rho\,Ax^{(k)})\big). \quad (\text{Equation 19})$$

Clearly, Equation 19 agrees with our heuristically derived ACID network after the weighting parameters $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ (and hence $\rho$) are properly selected. In other words, ACID is a special case of Equation 19. Although a unitary property is assumed for the sparse transform to obtain Equation 19, as is the case for the orthogonal wavelet decomposition, similar results can also be obtained in non-unitary cases. Poon studied the problem of recovering a 1D or 2D discrete signal that is approximately sparse in its gradient transform from an incomplete subset of its Fourier coefficients. To obtain a high-quality reconstruction with high probability, robust to noise and stable to inexact gradient sparsity of order $s$, Poon proved that it is sufficient to draw on the order of $s\log N$ of the available Fourier coefficients uniformly at random. With Poon's results, we can extend Equation 19 to a non-unitary discrete gradient transform for total variation (TV) minimization. Specifically, the sparsity term in Equation 2 is specialized as a total-variation function based on the discrete gradient transform:

$$\|x\|_{TV} = \sum_{i=1}^{W}\sum_{j=1}^{H}\sqrt{\big(x_{i+1,j}-x_{i,j}\big)^{2} + \big(x_{i,j+1}-x_{i,j}\big)^{2}}, \quad (\text{Equation 20})$$

where $W$ and $H$ represent the width and height of a reconstructed image, and the gradients on the image border are assumed to be zero. An FFT-based algorithm, FTVd, can be employed to find the sparse solution of the TV subproblem. Note that the generic TV favors piecewise constant regions, while high-order TV encourages piecewise polynomials. Here, the input to the CS-based sparsifying module is normalized to [0, 1] to facilitate the selection of the regularization parameters, which requires de-normalization of the output of the CS module. In the CS framework, the robust null-space property ensures the stability of sparsity-regularized recovery. Let us denote by $D$ the discrete gradient transform (Equation 21). Equation 19 can be rewritten as

$$x^{(k+1)} = D^{*}\,S_{\beta}\!\big(D\,\Phi((1-\rho)\,y + \rho\,Ax^{(k)})\big). \quad (\text{Equation 22})$$

The discrete gradient function defined in Equation 20 can be interpreted as $\|Dx\|_{1}$ with a non-unitary transform matrix $D$. Because $D$ is not invertible, the adjoint matrix $D^{*}$ in Equation 22 is not the inverse of $D$. However, due to the fact that both $D$ and $D^{*}$ work as a pair before and after the soft-thresholding filtering in Equation 22, $D^{*}$ can be interpreted as a pseudo-inverse of the discrete gradient transform $D$. Hence, each pixel of $x^{(k+1)}$ at the position $(i,j)$ in Equation 22 can be expressed as follows:

$$x^{(k+1)}_{i,j} = \Big(D^{*}\,S^{\dagger}_{\beta}\!\big(S_{\beta}(D\,z^{(k)})\big)\Big)_{i,j}, \qquad z^{(k)} = \Phi\big((1-\rho)\,y + \rho\,Ax^{(k)}\big), \quad (\text{Equation 23})$$

where $S^{\dagger}_{\beta}$ is the pseudo-inverse of the soft-thresholding kernel for a given threshold $\beta$. The pseudo-inverse is defined as:

$$S^{\dagger}_{\beta}(t) = \begin{cases} t + \beta\,\operatorname{sgn}(t), & t \ne 0,\\ 0, & t = 0. \end{cases} \quad (\text{Equation 24})$$

With the pseudo-inverse (Equation 24), although the discrete gradient transform is neither unitary nor invertible, the iterative framework (Equation 19) can still be applied for TV minimization using a compressed sensing technique.
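To make the final iterative formula concrete, here is a minimal NumPy sketch of the soft-thresholding kernel (Equation 17) and one pass of the iteration (Equation 19), using the notation reconstructed above; the network `Phi`, the unitary transform `Psi`, and the weight `rho` are placeholders for the trained modules and tuned hyperparameters, not the authors' released implementation.

```python
import numpy as np

def soft_threshold(t, beta):
    """Soft-thresholding kernel (Equation 17): shrink each component toward zero by beta."""
    return np.sign(t) * np.maximum(np.abs(t) - beta, 0.0)

def acid_step(x, y, A, Phi, Psi, rho, beta):
    """One pass of the iteration in Equation 19 (sketch):
    x_{k+1} = Psi^* S_beta( Psi Phi((1 - rho) y + rho A x_k) ).
    Phi is a callable mapping data to an image (the recon-net, assumed given);
    Psi is a unitary sparsifying matrix, so its adjoint is Psi.conj().T."""
    z = (1.0 - rho) * y + rho * (A @ x)      # re-weighted data (Equation 11)
    u = soft_threshold(Psi @ Phi(z), beta)   # sparsify the network output (Equation 16)
    return Psi.conj().T @ u                  # back to the image domain
```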
Under practically reasonable conditions such as noisy and insufficient data, the ACID iteration will converge to a feasible solution subject to an uncertain range proportional to the noise level (see the convergence analysis in the section “method details”).

BREN property

Our theoretical analysis requires the following BREN property of a reconstruction neural network $\Phi$ that reconstructs an image from measurement $y$. If a reconstruction network satisfies the BREN property, we call it a well-designed and well-trained reconstruction network, or a proper network. Definition: a reconstruction network has the BREN property if the ratio between the norm of the reconstruction error and the norm of the corresponding ground truth is less than $\varepsilon$, with $0 < \varepsilon < 1$. For an s-sparse observable image $\bar{x}$, there are different ways to formulate the Lipschitz continuity, such as our BREN property. Let us assume that the function $\Phi$ models a well-trained neural network. Denote the output of the neural network as $\Phi(y) = \bar{x} + \hat{e} + \tilde{e}$, where the second and third terms are the observable and null-space components of the error image associated with the ground truth image $\bar{x}$ and the measurement matrix $A$. The BREN property is defined as

$$\frac{\|\hat{e} + \tilde{e}\|_{2}}{\|\bar{x}\|_{2}} \le \varepsilon < 1. \quad (\text{Equation 25})$$

Equation 25 implies that $\|\hat{e}\|_{2} \le \varepsilon\|\bar{x}\|_{2}$ and $\|\tilde{e}\|_{2} \le \varepsilon\|\bar{x}\|_{2}$. Remark 1: in the literature of deep imaging, including the paper on instabilities of deep reconstruction, a reconstruction network, even if it is unstable, will still produce an output not too far from the ground truth in the sense of the BREN property. The involved errors of types I and II have significant clinical impacts, but the norm of these errors in combination is assumed to be small relative to that of the underlying image. This is how a proper reconstruction network is defined and commonly expected in practice. For example, the most popular loss function of a reconstruction network is in the L2 norm, so that a reconstructed image should be close to the ground truth in the sense of the L2 norm without an adversarial attack. Furthermore, in the adversarial attack cases of our interest, the BREN property is assumed to be valid as the condition for our convergence analysis below. Remark 2: for deep reconstruction in the supervised mode, a training dataset is typically in the format of $\{(y_i, x_i)\}$. We assume that the imaging model is linear, so we can augment the training dataset to $\{(cy_i, cx_i)\}$, where $c$ is any constant within a reasonable range. With the augmented data, the network will map an input of a small norm to an output of a proportionally small norm. Alternatively, we can include normalization layer(s) in the reconstruction network so that the network performance is insensitive to the magnitude of data and images. Remark 3: our assumption of the BREN property is needed for our convergence analysis below, just as RIP/rNSP is required in CS theory for unique image recovery. If the requirement is not met, the theoretical arguments below will not be valid. We have shown that our BREN ratio is substantially less than 1 for the datasets in the PNAS study by Antun et al. Specifically, all the experimental results with perturbations were repeated for the CT and MRI cases [26, 27, 28, 29] reported in Antun et al. Then, the BREN ratios were computed using different reconstruction networks with various perturbations. It was found that all these ratios in the CT and MRI experiments are substantially less than 1. As shown in Table 1 and Figures 1, 2, 3, and 4, AUTOMAP seems more sensitive to the perturbations; i.e., small perturbations cause large changes in the sense of the L2-norm. Clearly, the BREN property is satisfied in this context. It is easy to observe that the perturbed images contain artifacts; for example, in the MED-50, AUTOMAP, and MRI-VN results.
In these cases, the sparsity of the reconstructed images was corrupted, and the feedforward data estimated from these reconstruction results are usually not consistent with the original measurement. Searching for a feasible solution within the space of sparse solutions is central to traditional iterative reconstruction. Furthermore, ACID searches for a reconstruction that is good in three aspects: image sparsity, a big-data-driven prior, and iterative calibration to eliminate unexplained residual data. When the final image satisfies all three constraints, it will be our best possible solution.
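For concreteness, the BREN ratios reported in Table 1 below reduce to a one-line computation once a reconstruction and its ground truth are available as arrays; this sketch only assumes NumPy inputs.

```python
import numpy as np

def bren_ratio(x_rec, x_true):
    """BREN ratio: the L2 norm of the reconstruction error over that of the ground truth.
    A proper network keeps this ratio substantially below 1."""
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)

# For instance, the 2.90% entry for Med-50 in Table 1 corresponds to
# bren_ratio(x_rec, x_true) evaluating to about 0.029.
```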
Table 1

BREN ratios (%) associated with different reconstruction networks

Methods     r1       r2       r3       r4
Med-50      2.90     x        x        x
AUTOMAP     10.39    23.09    47.85    85.86
Deep MRI    2.73     8.03     13.28    x
MRI-VN      3.53     x        x        x
Figure 1

Reconstruction results using MED-50 from Antun et al.

The first to fourth images represent the original image, MED-50 result without perturbation, perturbation image, and MED-50 result with perturbation, respectively.

Figure 2

Reconstruction results using AUTOMAP from Antun et al.

The first to fourth columns represent the original, perturbation, original plus perturbation, and perturbed AUTOMAP images, respectively. The first to fourth rows represent different strengths of perturbation.

Figure 3

Reconstruction results using deep MRI

These results were adapted from Antun et al. The first to third columns represent the original, perturbation, and perturbed deep MRI (DM) results, respectively. The first to third rows present different strengths of perturbation.

Figure 4

Reconstruction results using MRI-VN from Antun et al.

The first to third images represent the original, perturbation, and perturbed MRI-VN results, respectively.


Lipschitz convergence with perturbations

Let the combination of the measurement matrix and the neural network be $F = \Phi \circ A$. According to the definition of the Lipschitz constant, if we employ the L2 norm, the Lipschitz constant is the minimal constant $L$ for which the following inequality holds: $\|F(x_1) - F(x_2)\|_2 \le L\,\|x_1 - x_2\|_2$. For each fixed $x_1$, we generated a series of perturbations to obtain $x_2$ and computed the value of the ratio $\|F(x_1) - F(x_2)\|_2 / \|x_1 - x_2\|_2$. Specifically, we computed this ratio for many images and found the upper and lower bounds of the Lipschitz constant as our empirically estimated ranges in the CT and MRI cases, respectively. Because the authors of AUTOMAP did not provide the original data and code for sufficient training and testing, we only performed this experiment on DAGAN and Ell-50. Here, only 500 pairs of ellipse phantoms were used for Ell-50. Each pair contains $x_1$ and $x_2$, where $x_2$ was generated by adding an adversarial attack to the clean image using the aforementioned adversarial method. Specifically, the lower and upper bounds in the Ell-50 case are 0.4674 and 0.6424, respectively. In contrast, 14,866 pairs of MRI images were used to determine the lower and upper bounds in the DAGAN case, and the corresponding lower and upper bounds are 1.4854 and 12.0737, respectively. We showed the convergence of ACID with respect to PSNR in Part A of our work, and here we show the convergence of ACID in terms of the Lipschitz constant with respect to the number of iterations. As representative examples, the convergence curves in the C3 and M4 cases (see the supplemental information of Wu et al. for more details) are given in Figure 5. It can be observed that the Lipschitz constants of ACID for both CT and MRI decrease monotonically and finally converge to a constant level.
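The empirical lower and upper bounds quoted above amount to extremizing the ratio in the defining inequality over many image pairs; a minimal sketch follows, where `F` stands for the composition of the measurement matrix with the network (or with the whole ACID workflow) and `pairs` for the perturbed test pairs described in the text.

```python
import numpy as np

def lipschitz_ratio(F, x1, x2):
    """Ratio ||F(x1) - F(x2)||_2 / ||x1 - x2||_2 for one pair of images."""
    return np.linalg.norm(F(x1) - F(x2)) / np.linalg.norm(x1 - x2)

def empirical_lipschitz_bounds(F, pairs):
    """Empirical lower/upper bounds of the Lipschitz constant over (x, x_perturbed) pairs,
    e.g., 500 Ell-50 phantom pairs or 14,866 DAGAN MRI pairs as described above."""
    ratios = [lipschitz_ratio(F, x1, x2) for x1, x2 in pairs]
    return min(ratios), max(ratios)
```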
Figure 5

Convergence curves of ACID in terms of the Lipschitz constant

(A and B) The convergence curves of ACID with respect to the number of iterations in the C3 and M4 cases, respectively.


ACID against noise

Although some examples of the insensitivity of ACID to noise are reported in Part A of Wu et al., here we followed up the study in Koonjoo et al. and performed a similar local stability test on ACID with DAGAN and Ell-50, respectively, built in. This local robustness was assessed using the maximum ratio between variations in the output space and variations in the imaging object space, $\|\mathrm{ACID}(x') - \mathrm{ACID}(x)\|_2 / \|x' - x\|_2$, for two adjacent images $x$ and $x'$. In this test, 500 pairs of CT phantoms were selected from the Ell-50 test dataset. Then, additive white Gaussian noise was added, with zero mean and a standard deviation of 11–30 HU. In this way, we obtained 500 cases. Furthermore, 14,866 image pairs were chosen from the MRI dataset. Similarly, additive white Gaussian noise was randomly added to each of these images to generate the noisy counterparts. A maximum output-input variation ratio of 3.023 was observed for these noisy inputs. The histograms in the CT and MRI cases are given in Figure 6. The results empirically demonstrate the local stability of the ACID reconstruction against noise.
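The local stability test can be outlined in a few lines; `acid` below is a placeholder callable taking an imaging object to its ACID reconstruction (with the measurement step folded in), and the noise level mimics the 11-30 HU standard deviation used in the CT case.

```python
import numpy as np

rng = np.random.default_rng(0)

def stability_ratios(acid, images, sigma):
    """Output-to-input variation ratios under additive white Gaussian noise;
    histogram the returned array as in Figure 6."""
    ratios = []
    for x in images:
        x_noisy = x + rng.normal(0.0, sigma, size=x.shape)
        ratios.append(np.linalg.norm(acid(x_noisy) - acid(x)) /
                      np.linalg.norm(x_noisy - x))
    return np.array(ratios)
```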
Figure 6

ACID is locally stable with respect to noise

(A) The histogram of the output-to-input variation ratio between noise-free and Gaussian-noise-contaminated inputs, where ACID has Ell-50 built in, giving a maximum value of 0.229.

(B) The histogram of the output-to-input variation ratio between noise-free and Gaussian-noise-contaminated inputs, where ACID has DAGAN embedded, with a maximum ratio of 3.023.


Discussion

Kernel awareness and network stability

Kernel awareness is an important concept. When a reconstruction algorithm lacks kernel awareness, a "cardinal crime" ("cardinal sin") could be committed, which implies that a well-trained network model could produce highly unstable results, defeating the purpose of medical imaging. In that scenario, the trained network would produce significantly different images from essentially identical input datasets, between which there are subtle differences representing invisible perturbations. Specifically, a deep network is trained on a dataset using an optimization technique. The learning procedure would normally converge to a network model $\Phi$ with optimized parameters, which is usually a continuous transform such that, for each $x$ in the training set,

$$\|\Phi(Ax) - x\| \le \eta, \quad (\text{Equation 27})$$

where $\|\cdot\|$ is a suitable norm, $A$ is a measurement matrix, and the constant $\eta > 0$ is a bound. To evaluate the stability of the network model, an ε-Lipschitz metric is defined as follows:

$$L^{\varepsilon}(\Phi, y) = \sup_{0 < \|z\| \le \varepsilon} \frac{\|\Phi(y+z) - \Phi(y)\|}{\|z\|}. \quad (\text{Equation 28})$$

A formula can be derived for a lower bound of the ε-Lipschitz index estimation for $\Phi$. In fact,

$$L^{\varepsilon}(\Phi, Ax) \ge \frac{\|\Phi(Ax') - \Phi(Ax)\|}{\|A(x' - x)\|} \quad (\text{Equation 29})$$

for any $x'$ with $0 < \|A(x' - x)\| \le \varepsilon$. An inverse problem, such as few-view CT and sparse MRI, involves solving $y = Ax + e$, where $A$ is an $M \times N$ matrix, $M < N$, and $e$ is measurement noise. Clearly, the transform $A$ would have a null space (kernel) with

$$\dim\big(\mathcal{N}(A)\big) \ge N - M > 0. \quad (\text{Equation 30})$$

Then, there is a nonzero vector $v$ in, or arbitrarily close to, $\mathcal{N}(A)$ and a scale factor $\alpha$ for a large number $\alpha\|v\|$ such that

$$x' = x + \alpha v, \qquad \|A(x' - x)\| \le \varepsilon, \quad (\text{Equation 31})$$

where $\|x' - x\| = \alpha\|v\|$ is large. If the training set has at least two such elements $x$ and $x'$, we have, by Equations 27 and 29,

$$L^{\varepsilon}(\Phi, Ax) \ge \frac{\|x' - x\| - 2\eta}{\|A(x' - x)\|} \gg 1. \quad (\text{Equation 32})$$

From Equation 32, the instability is intrinsic; that is, when input data are very close to the null space of the associated imaging operator and are slightly perturbed, a large variation would be induced in the reconstructed image. The instability of the trained network would yield artifacts in reconstructed images, leading to either false-positive or false-negative diagnoses.
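The null-space argument can be made tangible with a toy experiment: for an underdetermined random matrix, two very different objects can produce numerically identical data, so any map that recovers both must have an enormous local Lipschitz constant. The matrix and numbers below are arbitrary illustrations, not the imaging operators used in the paper.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
m, n = 20, 50                       # fewer measurements than unknowns: M < N
A = rng.normal(size=(m, n))

v = null_space(A)[:, 0]             # a unit vector in the kernel of A
x1 = rng.normal(size=n)
x2 = x1 + 10.0 * v                  # a very different object...

print(np.linalg.norm(A @ x1 - A @ x2))   # ...with essentially identical data (~1e-13)
# A network recovering both x1 and x2 from their data must amplify this tiny
# data difference into ||x1 - x2||_2 = 10, i.e., a huge epsilon-Lipschitz index.
```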

BREN and Lipschitz continuity

Assuming the BREN property, our analysis (see the section "method details" and Figures 7, 8, and 9) shows that ACID is stable against adversarial attacks. In fact, BREN can be viewed as a special case of Lipschitz continuity; i.e., they are consistent.
Figure 7

ACID architecture for stabilizing deep tomographic image reconstruction

ACID consists of the following components: deep reconstruction, compressed sensing-based sparsity promotion, analytic mapping, and iterative refinement. $y_0$ is the original tomographic data, and $\Delta y_k$, $k = 1, 2, 3, \ldots, K$, represents an estimated residual dataset in the $k$th iteration between $y_0$ and the currently reconstructed counterpart. $x_k$ is an output of the deep reconstruction module, and $\hat{x}_k$ represents the image after compressed sensing-based regularization.

Figure 8

Feedforward propagation of adversarial data in the ACID framework

Figure 9

Backpropagation process of ACID

Let us first define measurement and reconstruction operators on two metric spaces $X$ (images) and $Y$ (data). Let us measure an image $x \in X$ tomographically to obtain a measurement $y \in Y$. We assume that each image in $X$ is non-trivial in the sense that $\|x\| > 0$. Let us denote the measurement operator by $M$; that is, $y = M(x)$. Suppose that the measurement process is totally transparent to us; thus, we know the 1-to-1 correspondence between $X$ and $Y$ perfectly. For example, in our case, the measurement matrix $A$ satisfies the RIP of order $s$, $x$ is s-sparse, $y = Ax$, and there exists a 1-to-1 map. Then, we can define the ideal reconstruction operator $R = M^{-1}$. Reasonably, we assume that the operator $M$ is a Lipschitz continuous (LC) function that satisfies $\|M(x_1) - M(x_2)\| \le L_M\,\|x_1 - x_2\|$ for a constant $L_M$. With a big dataset, we can train a deep network $\Phi$ to approximate the ideal reconstruction operator $R$; $\Phi$ is an LC function that satisfies $\|\Phi(y_1) - \Phi(y_2)\| \le L_\Phi\,\|y_1 - y_2\|$ for a constant $L_\Phi$. Furthermore, we assume that the network $\Phi$ is well designed and well trained so that, for a training tomographic dataset, we have $\Phi(y_i) = x_i$, $i = 1, 2, \ldots, I$. For a new dataset $y$ from an underlying image $x$, the BREN property requires that $\|\Phi(y) - x\| \le \varepsilon\,\|x\|$. Suppose that the image $x$ is close to an image $x_i$ in the training dataset. In this setting, we have the following relations:

$$\|\Phi(y) - x\| = \|\Phi(y) - \Phi(y_i) + x_i - x\| \le \|\Phi(y) - \Phi(y_i)\| + \|x_i - x\| \quad (\text{Equation 33})$$
$$\le L_\Phi\,\|y - y_i\| + \|x - x_i\| \quad (\text{Equation 34})$$
$$\le (L_\Phi L_M + 1)\,\|x - x_i\|, \quad (\text{Equation 35})$$

where Equation 33 is due to the fact that the network is well designed and well trained ($\Phi(y_i) = x_i$), and Equations 34 and 35 are due to the Lipschitz continuity of $\Phi$ and $M$, respectively. Therefore, we have

$$\|\Phi(y) - x\| \le (L_\Phi L_M + 1)\,\|x - x_i\|. \quad (\text{Equation 36})$$

Therefore, under the condition that

$$(L_\Phi L_M + 1)\,\|x - x_i\| \le \varepsilon\,\|x\|, \quad (\text{Equation 37})$$

we have the BREN. The condition can be simplified to $\|x - x_i\| \le \varepsilon\|x\|/(L_\Phi L_M + 1)$, which is roughly $\|x - x_i\| \le \varepsilon\|x\|/L$ with $L = L_\Phi L_M$. That is, as long as an image is fairly close to the training dataset, the BREN property is satisfied. Heuristically, if the image norm is greater than the product of the LC constant and the distance between an image to be reconstructed and its closest reference point, we have the BREN property. For a big dataset, $\|x - x_i\|$ is small, so that $L$ can be large, which is especially true if we interpret $X$ and $Y$ as appropriate low-dimensional manifolds. Because there is a perfect 1-to-1 correspondence, we can treat the combination of the measurement matrix and the neural network as a new LC function $F = \Phi \circ M$, which satisfies

$$\|F(x_1) - F(x_2)\| \le L\,\|x_1 - x_2\|. \quad (\text{Equation 38})$$

The Lipschitz continuity assumption is useful to assess the convergence of a deep reconstruction algorithm. In the section "results", we verified the BREN property for the data used in Antun et al. Those results support the practical relevance of the BREN property. More importantly, one can calculate the Lipschitz constant directly for both the MRI and CT data using Equation 38. Unlike the establishment of the instabilities, it is mathematically insufficient to prove the general applicability of ACID using only a finite number of positive experimental results. Hence, a theoretical analysis of the convergence of the ACID iteration is desirable. Although a thorough characterization is rather challenging (since the field of non-convex optimization is still in its infancy), we have assumed an experimentally motivated BREN property of the reconstruction network, which is a special case of the Lipschitz continuity that is widely used in non-convex optimization to establish various converging properties. The BREN property means that the relative error of a deep network-based reconstruction is under control in the L2 norm. Based on BREN, we have made an initial effort to understand the converging mechanism of the ACID iteration.
Specifically, we have provided not only (1) a heuristic analysis based on the simplification that the CS module allows a perfect sparsification but also (2) a mathematically denser analysis of the convergence under two approximations (the first is to invert the underdetermined system matrix $A$ approximately via $A^T$, and the other is to minimize TV with a non-unitary transform $D$).
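To see what the simplified condition means in numbers, take illustrative constants (not values measured in this study): with $L_\Phi L_M = 9$ and $\varepsilon = 0.1$, Equation 37 reads

```latex
(L_\Phi L_M + 1)\,\|x - x_i\| \le \varepsilon\,\|x\|
\;\Longleftrightarrow\;
\|x - x_i\| \le \frac{0.1}{10}\,\|x\| = 0.01\,\|x\|,
```

that is, any image within one percent of its own norm from some training sample would satisfy the BREN property under these constants.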

Experimental procedures

Resource availability

Lead contact

Hengyong Yu, PhD (e-mail: Hengyong-yu@ieee.org).

Materials availability

The study did not generate new unique reagents.

Method details

Adversarial attacks to a selected network

In the image reconstruction field, the continuous imaging system can be discretized into a linear model $y = Ax$, where $A$ is the system matrix, $y$ represents collected data, and $M$ and $N$ define the size of the system matrix $A \in \mathbb{R}^{M \times N}$. The aim of image reconstruction is to reconstruct $x$ from $y$ for a given system matrix $A$. To assess the stability of image reconstruction, it is necessary to compute a tiny perturbation or adversarial attack. In this context, Antun et al. first computed a tiny perturbation for the linear model

$$y = Ax. \quad (\text{Equation 39})$$

For a well-trained neural network $\Phi$ to solve Equation 39, similar to the adversarial attack in image classification, we compute the instabilities by formulating the following problem:

$$r^{*} = \arg\max_{r}\ \|\Phi(A(x+r)) - \Phi(Ax)\| \quad \text{subject to} \quad \|r\| \le \epsilon. \quad (\text{Equation 40})$$

With Equation 40, when $r \ne 0$, the relationship $\Phi(A(x+r)) \approx x + r$ might not hold. One can consider the constrained Lasso variant of Equation 40 as follows:

$$r^{*} = \arg\max_{\|r\| \le \epsilon}\ \|\Phi(A(x+r)) - \Phi(Ax)\| - \lambda\,\|r\|. \quad (\text{Equation 41})$$

There is no infeasibility issue for Equation 41. An unconstrained Lasso-inspired version of Equation 41 is given by

$$r^{*} = \arg\max_{r}\ \frac{1}{2}\,\|\Phi(A(x+r)) - \Phi(Ax)\|_{2}^{2} - \frac{\lambda}{2}\,\|r\|_{2}^{2}. \quad (\text{Equation 42})$$

With $y = Ax$, Equation 42 is further converted to

$$r^{*} = \arg\max_{r}\ \frac{1}{2}\,\|\Phi(y + Ar) - \Phi(y)\|_{2}^{2} - \frac{\lambda}{2}\,\|r\|_{2}^{2}, \quad (\text{Equation 43})$$

where $\Phi$ operates on a reconstructed image for image-domain post-processing networks [38, 39, 40] and directly on measured data with an end-to-end network (such as AUTOMAP and iRadonMap). Note that Equation 43 works in the image domain to find perturbations. One generates a reconstructed image in an easy way and then compares the original image with a perturbed one to determine whether the perturbed image is acceptable or unacceptable. Now, we describe the details of how to generate perturbations for a single neural network. Since the neural network is a non-linear function, it is difficult to search for a global maximum of Equation 43. Here, we use the same strategy as in Antun et al. to search for tiny perturbations. In other words, one usually can reach a local maximum of Equation 43 using a gradient search method. Specifically, one defines the following objective function:

$$Q(r) = \frac{1}{2}\,\|\Phi(A(x+r)) - \Phi(Ax)\|_{2}^{2} - \frac{\lambda}{2}\,\|r\|_{2}^{2}. \quad (\text{Equation 44})$$

Regarding the optimization of Equation 44, gradient ascent search is a very common method. See the supplemental information for details of the algorithm implementation.
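A minimal PyTorch sketch of this gradient ascent search for Equation 44 is given below; `net` and `A` are placeholders for a trained reconstruction network and the system matrix, and the step size and iteration count are illustrative rather than the settings of Antun et al.

```python
import torch

def find_perturbation(net, A, x, lam=1.0, step=1e-2, iters=200):
    """Gradient ascent on Q(r) = 0.5*||net(A(x+r)) - net(Ax)||^2 - 0.5*lam*||r||^2
    (Equation 44), reaching a local maximum as described in the text."""
    ref = net(A @ x).detach()                # unperturbed reconstruction
    r = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        q = 0.5 * (net(A @ (x + r)) - ref).pow(2).sum() \
            - 0.5 * lam * r.pow(2).sum()
        q.backward()
        with torch.no_grad():
            r += step * r.grad               # ascend the objective
            r.grad.zero_()
    return r.detach()
```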

Adversarial attacks to ACID as a whole

The iterative process of ACID is to find the optimized solution in the intersection of (1) the space of data-driven priors, (2) the space of sparse images, and (3) the space of solutions satisfying the measurement, as shown in Figure 7. With a tiny perturbation to our proposed ACID workflow, the feedforward propagation of the perturbation is illustrated in Figure 8. Specifically, the formula of Equation 44 is converted to

$$Q_{\mathrm{ACID}}(r) = \frac{1}{2}\,\|\mathrm{ACID}(A(x+r)) - \mathrm{ACID}(Ax)\|_{2}^{2} - \frac{\lambda}{2}\,\|r\|_{2}^{2}, \quad (\text{Equation 45})$$

where $\mathrm{ACID}(\cdot)$ is different from $\Phi(\cdot)$, as the image is computed by the neural network and then stabilized in the ACID framework; it is the solution minimizing the objective function (Equation 1). For the optimization problem (Equation 1), we now compute a tiny perturbation via gradient ascent search. Specifically, we compute the gradient of Equation 45 with respect to $r$ (Equation 46). The backpropagation process for ACID is shown in Figure 9. More clearly, we define the cost function of ACID as

$$J(r) = \frac{1}{2}\,\|\hat{x}_{r} - \hat{x}_{0}\|_{2}^{2}, \quad (\text{Equation 47})$$

where $r$ is the perturbation, $\hat{x}_{r}$ is the output of the ACID system with the perturbation $r$, and $\hat{x}_{0}$ is the corresponding output without $r$. To find an effective $r$, we need to compute the gradient $\nabla_{r}J$ and then refine the perturbation using a gradient ascent algorithm. For clarity, the iteration index for ACID is changed to $t$ in this subsection. In Figure 9, there are two branches contributing to $\nabla_{r}J$; i.e., branches 1 and 2. To compute the gradient $\nabla_{r}J$, we take both branches into account. Because the ACID workflow composed with the loss function is complicated, we cannot directly compute the gradient of $J$ in closed form. Fortunately, $\nabla_{r}J$ can be solved using the backpropagation algorithm, which is commonly used in deep learning. Then, $\nabla_{r}J$ can be split into the contributions of the two branches. Now, let us start with the backpropagation process for ACID, as shown in Figure 9. First, we can decompose the ACID system into three modules, keyed to the deep reconstruction network, the CS-based sparsity-promoting module, and the residual data update, respectively; the whole procedure is shown in Figure 9. The inputs and outputs of the three modules, as well as their gradients, are denoted accordingly. Following the same steps as for the selected network, we will use the gradient ascent method to iteratively compute adversarial attacks for the whole ACID system, and the target to be attacked is changed from a single unstable neural network to our whole ACID workflow. There are two iterative loops: the outer loop is for the gradient ascent search, and the inner loop is for ACID feedforward and backpropagation. The stopping criteria for finding an adversarial attack on the whole ACID are: (1) the number of iterations reaches the maximum number of iterations for computing an adversarial attack (AA), denoted as K; or (2) the noise strength of the adversarial attack is greater than that used in attacking the single neural network, recorded in our study in terms of the L2-norm. As mentioned above, the maximum number of iterations of the inner loop is K for the ACID feedforward process. Because each whole inner loop can be considered as an intermediate node, we can use the idea of backpropagation to search for a desirable perturbation.
See the supplemental information for details of the algorithm implementation.
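In code, attacking the whole workflow follows the same gradient ascent pattern as for a single network, except that the forward pass now runs all the inner ACID iterations and automatic differentiation plays the role of the hand-derived backpropagation in Figure 9. The sketch below assumes a differentiable `acid_forward` implementing the inner loop; names and settings are illustrative.

```python
import torch

def attack_acid(acid_forward, A, x, lam=1.0, step=1e-2, outer_iters=100, max_norm=None):
    """Outer loop: gradient ascent on the ACID cost (Equation 47).
    acid_forward: callable running the K inner ACID iterations on a data tensor."""
    ref = acid_forward(A @ x).detach()       # ACID output without perturbation
    r = torch.zeros_like(x, requires_grad=True)
    for _ in range(outer_iters):
        loss = 0.5 * (acid_forward(A @ (x + r)) - ref).pow(2).sum() \
               - 0.5 * lam * r.pow(2).sum()
        loss.backward()
        with torch.no_grad():
            r += step * r.grad
            r.grad.zero_()
        if max_norm is not None and r.norm() > max_norm:
            break                            # stopping criterion (2) in the text
    return r.detach()
```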

Heuristic analysis on the ACID convergence

Let $y_0$ denote measured data, which are generally incomplete, inconsistent, and noisy. Specifically, the data can be a sinogram or k-space data. Then, we need three key functions in the ACID scheme. First, the imaging model $A$ is the forward model from an underlying image to tomographic data, which is assumed to be linear without loss of generality. Second, the recon-net $\Phi$ consists of a data-enhancement sub-net and a direct-reconstruction sub-net. This network may be unstable. Note that even if the recon-net is unstable, we assume that it respects the BREN property for our convergence analysis. Third, the CS module $\Gamma$ can be a standard CS algorithm or a network version of the CS algorithm. This module is an image post-processor that maps an image reconstructed by the recon-net to a refined image within the space of sparse solutions. The loss function of the CS module can be a weighted sum of the fidelity term and the sparsity term. The fidelity term can be the L2 norm of the difference between the input and output images. Let $k$ be the index for iteration, $k = 1, 2, \ldots, K$, and let us define the following variables: $x_k$, the tomographic image produced by the recon-net, where $x_1$ is assumed to be a good initial image based on the BREN assumption and $x_k$, $k \ge 2$, represents successive refinements; $\hat{x}_k$, the tomographic image refined by the CS module, which should be in the space of sparse solutions in the CS framework and may or may not be the ground truth, depending on whether RIP/rNSP is satisfied; $y_k = A\hat{x}_k$, the estimated data based on the output of the CS module, which should eventually become as close as possible to the measured data $y_0$; and $\Delta y_k$, the unexplained residual errors based on $y_k$ in reference to the measured data $y_0$, defined as $\Delta y_k = y_0 - y_k$. This data residual will be small when $k$ is sufficiently large to obtain a good image quality. Now, let us analyze the first cycle of the ACID workflow in the following steps. The first step is to generate $x_1$ from the original data $y_0$, which is done by the recon-net: $x_1 = \Phi(y_0)$. Since the recon-net may be unstable, $x_1$ can be generally decomposed into the following three components: (1) $\bar{x}$, the ground truth image, which is assumed to be s-sparse; (2) $e_{s}$, artifacts in the space of sparse solutions of the CS module $\Gamma$, which cannot be eliminated based on the sparsity consideration; and (3) $e_{n}$, artifacts not in the space of sparse solutions, which can be suppressed by the CS module $\Gamma$. That is, $x_1 = \bar{x} + e_{s} + e_{n}$. Then, $x_1$ is processed by the CS module to obtain $\hat{x}_1 = \Gamma(x_1)$, so that the difference between $x_1$ and $\hat{x}_1$ is minimized subject to $\hat{x}_1$ being in the space of sparse solutions of $\Gamma$ under the constraint of the measurement. As a result, $\hat{x}_1 = \bar{x} + e_{s}$ (without loss of generality, here we assume that the sparsification can be perfectly achieved). Without loss of generality, let us take CT as an example. Based on $\hat{x}_1$, $y_1$ can be estimated with the forward imaging model as $y_1 = A\hat{x}_1$. Generally, we consider $e_{s} = \hat{e}_{s} + \tilde{e}_{s}$, where the two components $\hat{e}_{s}$ and $\tilde{e}_{s}$ are observable and unobservable, respectively (an unobservable image is in the null space of $A$). When $\hat{e}_{s}$ is nonzero, the estimated data $y_1$ and the measured data $y_0$ must be inconsistent. This discrepancy is quantified as the data residual $\Delta y_1 = y_0 - y_1$. When $A$ does not satisfy RIP/rNSP, the intersection of the data-constrained space and the data prior space may contain many solutions, and thus it could be possible that $A\hat{x}_1 = y_0$ but $\hat{x}_1 \ne \bar{x}$ (that is, $\hat{x}_1$ and $\bar{x}$ explain the data equally well). Nevertheless, it is highly unlikely in practice that the residual data become zero, and the ACID iterative process will not converge immediately. The nonzero data residual can be further reconstructed into an image increment using the recon-net $\Phi$; that is, $\delta x_1 = \Phi(\Delta y_1)$.
Then, the current tomographic image is updated to $x_2 = \hat{x}_1 + \Phi(\Delta y_1)$ (the sum of two prior-consistent images is assumed to be still consistent with the data-driven prior, which can be alternatively achieved by applying the recon-net to the augmented data). Generally speaking, $x_2$ will be closer to the ground truth than the previous image $x_1$, since our reconstructed image should explain as much of the data $y_0$ as possible. With $x_2$, the residual error in the data domain will be reduced. Now, we can describe the converging mechanism of ACID, assuming that the recon-net satisfies the BREN property. Our key arguments include the following three steps. First, after we have $x_1 = \bar{x} + \hat{e} + \tilde{e}$, by BREN we have $\|\hat{e} + \tilde{e}\|_2 \le \varepsilon\|\bar{x}\|_2$. That is, $\|\hat{e}\|_2 \le \varepsilon\|\bar{x}\|_2$ and $\|\tilde{e}\|_2 \le \varepsilon\|\bar{x}\|_2$, because $\hat{e}$ is orthogonal to $\tilde{e}$; that is, both $\hat{e}$ and $\tilde{e}$ are fractions of $\bar{x}$ in norm. Second, we use the forward model to synthesize the unexplained residual data $\Delta y_1 = -A\hat{e}$. Then, $\Delta y_1$ is fed to the recon-net to reconstruct the image increment. Since $\Delta y_1$ is due to $-\hat{e}$, $-\hat{e}$ can be reconstructed up to a new artifact image $\hat{e}' + \tilde{e}'$. By BREN again, $\|\hat{e}'\|_2 \le \varepsilon\|\hat{e}\|_2$ and $\|\tilde{e}'\|_2 \le \varepsilon\|\hat{e}\|_2$; that is, both $\hat{e}'$ and $\tilde{e}'$ are fractions of $\hat{e}$ in norm. Third, we can repeat this process for $k = 2, 3, \ldots$, obtaining observable artifact components whose norms are bounded by $\varepsilon^{k}\|\bar{x}\|_2$ (see the display after this paragraph). Because $\varepsilon$ is smaller than 1, these bounds shrink geometrically, and the observable error vanishes as the iteration proceeds. That is, ACID will converge to a solution in the intersection of the space of solutions satisfying the measured data, the space of sparse solutions, and the space of data-driven solutions. While this ACID scheme may converge to an image still containing a nonzero null-space component when RIP/rNSP is not satisfied, the key point is that, under the same condition (i.e., RIP/rNSP is not satisfied), a sparsity-promoting algorithm cannot eliminate such a nonzero null-space component either; more importantly, ACID enforces the powerful deep prior so that the space of feasible solutions is greatly reduced relative to that permitted with a sparsity-promoting algorithm alone. In our experiments, we have shown that ACID with the kernel awareness embedded consistently outperforms the selected sparsity-promoting algorithms that do not utilize a big-data-driven prior. In other words, the data prior is instrumental in recovering the nonzero null-space component that cannot be measured by the system matrix. Although the above analysis is not mathematically rigorous, it indeed sheds light on the inner workings of ACID. This analysis adds value, especially in the current situation where a general non-convex optimization theory is yet to be developed. In the above analysis, we have assumed the BREN property of the recon-net. As a result, even if the network is not ideal (which means producing a substantial nonzero artifact image), the convergence is still guaranteed, as long as the relative error is under control (less than 100%) in the L2 norm, which is a practically motivated condition. On the other hand, it is underlined that if the network is indeed optimized or nearly optimized so that the artifact image is small in the first place, the iterative process will converge rapidly, and in that case the whole ACID workflow can be unrolled into a compact feedforward network.
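The three steps above amount to a geometric-series argument, which can be summarized in one display (a sketch under the stated BREN assumption with ratio $\varepsilon < 1$):

```latex
\|\hat{e}^{(k)}\|_2 \le \varepsilon^{k}\,\|\bar{x}\|_2,
\qquad
\|\tilde{e}^{(k)}\|_2 \le \varepsilon^{k}\,\|\bar{x}\|_2,
\qquad k = 1, 2, \ldots,
```

so the observable error shrinks geometrically, and the accumulated null-space contribution is bounded by the convergent series $\sum_{k\ge 1}\varepsilon^{k}\|\bar{x}\|_2 = \frac{\varepsilon}{1-\varepsilon}\|\bar{x}\|_2$.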

Mathematical analysis on the ACID convergence

In the theoretical iterative framework (Equation 19) and with the BREN property of the neural network, we will show that the final solution converges to an optimal image, and in particular to the ground truth when RIP/rNSP is satisfied, subject to a noise-induced uncertainty distance in terms of the L2 norm and the null-space component. While this convergence analysis is not rigorous, it helps rationalize the ACID workflow, and in this context the convergence to the optimal solution implies stability. Now, let us analyze the convergence of our ACID scheme. Denoting $z_k = (1-\rho)\,y + \rho\,Ax_k$ and $\Gamma(\cdot) = \Psi^{*}S_{\beta}(\Psi\,\cdot)$, Equation 19 can be simplified to our heuristically designed ACID iteration:

$$x_{k+1} = \Gamma\big(\Phi(z_k)\big). \quad (\text{Equation 49})$$

In this subsection, we replace the iterate $x^{(k)}$ in Equation 19 with $x_k$ in Equation 49, abusing the notation a bit. Let us analyze the convergence of our ACID network for a noise-free measurement as follows, assuming an initial image $x_0 = \Phi(y) = \bar{x} + \hat{e} + \tilde{e}$. By the BREN property, we have

$$\|\hat{e} + \tilde{e}\|_2 \le \varepsilon\,\|\bar{x}\|_2. \quad (\text{Equation 50})$$

Since $\hat{e}$ and $\tilde{e}$ are orthogonal, we have $\|\hat{e}\|_2^2 + \|\tilde{e}\|_2^2 \le \varepsilon^2\|\bar{x}\|_2^2$ (Equation 51). This implies that $\|\hat{e}\|_2 \le \varepsilon\|\bar{x}\|_2$ and $\|\tilde{e}\|_2 \le \varepsilon\|\bar{x}\|_2$ (Equation 52). Since $x_1$ is the output of the soft-thresholding filtering, it can be expressed as

$$x_1 = \Psi^{*}\big(\Psi\,\Phi(z_0) + \eta\big), \quad (\text{Equation 53})$$

where $\eta$ is a noise background in the transform domain. If we denote $\eta_n$ as the $n$th component of $\eta$, there will be $|\eta_n| \le \beta$, which is a noise floor. Without loss of generality, in the transform domain we assume that the first $s$ components span the s-sparse space of $\Psi\bar{x}$. Because only the first $s$ components of the transform coefficients are observable, let us decompose $\eta$ into two parts, $\hat{\eta}$ and $\tilde{\eta}$, where $\hat{\eta}$ is observable and $\tilde{\eta}$ corresponds to the null space of $A$. Then, Equation 53 can be rewritten as

$$x_1 = \Phi(z_0) + \Psi^{*}\hat{\eta} + \Psi^{*}\tilde{\eta}, \quad (\text{Equation 54})$$

where $\Psi^{*}\tilde{\eta}$ is in the null space of $A$. Proceeding from Equations 49 and 54 for the case $k = 1$, and then again from Equation 49 for the case $k = 2$, each pass multiplies the norm of the observable error by the BREN factor $\varepsilon$ while adding at most one noise-floor term. If we continue the above procedure for a general $k$, it is easy to obtain that the observable error after $k$ iterations is bounded by $\varepsilon^{k}$ times the initial error plus a geometric sum of noise-floor terms. Denoting the ground truth image corresponding to the measured data as $\bar{x}$, that is, $y = A\bar{x}$, and noting that each component of $\eta$ is bounded by $\beta$, we have

$$\|x_k - \bar{x}\|_2 \le \varepsilon^{k}\,\|x_0 - \bar{x}\|_2 + \frac{1-\varepsilon^{k}}{1-\varepsilon}\,C\beta, \quad (\text{Equation 70})$$

for a constant $C$ depending on the dimension of the transform domain. When $k \rightarrow \infty$, Equation 70 shows

$$\lim_{k\rightarrow\infty}\|x_k - \bar{x}\|_2 \le \frac{C\beta}{1-\varepsilon}. \quad (\text{Equation 71})$$

Because the parameter $\beta$ for the soft-thresholding kernel should match the system tolerance level, it is a noise floor. Equation 71 implies that $x_k$ will converge to a noise-induced uncertainty range of the imaging system. For an ideal noise-free case, the matching threshold $\beta \rightarrow 0$, and $x_k \rightarrow \bar{x}$. The bound (Equation 70) will monotonically decrease as long as $\varepsilon < 1$. In other words, if the image is not too noisy, the ACID algorithm will converge to a solution in the intersection of the space constrained by measured data, the space of sparse solutions, and the space of deep priors. In the above analysis, we have assumed that the input to the neural network is noise free; that is, $n = 0$. When there is a noise component $n$ in the projection data, this noise can be decomposed into two parts, $\hat{n}$ and $\tilde{n}$, where $\hat{n}$ satisfies $\hat{n} = A\hat{x}_n$, with $\hat{x}_n$ being the observable image corresponding to the noise, so that $\hat{n}$ is still consistent with both the data-driven prior and the sparsity condition, and $\tilde{n}$ is the complement of $\hat{n}$. Because the image $\hat{x}_n$ can be absorbed by $\bar{x}$, we can ignore $\hat{n}$ and only consider $\tilde{n}$. Because $\tilde{n}$ is outside the intersection of the three spaces constrained by (1) the data-driven prior, (2) the sparsity condition, and (3) the measurement data, and thus makes no contribution to the final image, we can just modify the system tolerance level accordingly to accommodate the effect of the noise without affecting the above convergence analysis.
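As a quick numerical sanity check of the limiting behavior in Equations 70 and 71, one can iterate the scalar error recursion $d_{k+1} = \varepsilon d_k + C\beta$ that underlies the bound above; the constants are arbitrary illustrative values.

```python
eps, noise_floor = 0.6, 0.05   # contraction factor and noise-floor term C*beta (illustrative)
d = 1.0                        # initial error ||x_0 - x_bar||_2

for _ in range(50):
    d = eps * d + noise_floor  # per-iteration bound: shrink by eps, add the noise floor

print(d)                             # ~0.125
print(noise_floor / (1.0 - eps))     # limit C*beta / (1 - eps) = 0.125, per Equation 71
```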
References (28 in total)

1.  Analysis of an exact inversion algorithm for spiral cone-beam CT.

Authors:  Alexander Katsevich
Journal:  Phys Med Biol       Date:  2002-08-07       Impact factor: 3.609

2.  Deep learning. (Review)

Authors:  Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal:  Nature       Date:  2015-05-28       Impact factor: 49.962

3.  A Deep Cascade of Convolutional Neural Networks for Dynamic MR Image Reconstruction.

Authors:  Jo Schlemper; Jose Caballero; Joseph V Hajnal; Anthony N Price; Daniel Rueckert
Journal:  IEEE Trans Med Imaging       Date:  2017-10-13       Impact factor: 10.048

4.  A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction.

Authors:  Eunhee Kang; Junhong Min; Jong Chul Ye
Journal:  Med Phys       Date:  2017-10       Impact factor: 4.071

5.  Image reconstruction by domain-transform manifold learning.

Authors:  Bo Zhu; Jeremiah Z Liu; Stephen F Cauley; Bruce R Rosen; Matthew S Rosen
Journal:  Nature       Date:  2018-03-21       Impact factor: 49.962

6.  LEARN: Learned Experts' Assessment-Based Reconstruction Network for Sparse-Data CT.

Authors:  Hu Chen; Yi Zhang; Yunjin Chen; Junfeng Zhang; Weihua Zhang; Huaiqiang Sun; Yang Lv; Peixi Liao; Jiliu Zhou; Ge Wang
Journal:  IEEE Trans Med Imaging       Date:  2018-06       Impact factor: 10.048

7.  Stabilizing deep tomographic reconstruction: Part A. Hybrid framework and experimental results.

Authors:  Weiwen Wu; Dianlin Hu; Wenxiang Cong; Hongming Shan; Shaoyu Wang; Chuang Niu; Pingkun Yan; Hengyong Yu; Varut Vardhanabhuti; Ge Wang
Journal:  Patterns (N Y)       Date:  2022-04-06

8.  Momentum-Net: Fast and convergent iterative neural network for inverse problems.

Authors:  Il Yong Chun; Zhengyu Huang; Hongki Lim; Jeff Fessler
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2020-07-29       Impact factor: 6.226

9.  Boosting the signal-to-noise of low-field MRI with deep learning image reconstruction.

Authors:  N Koonjoo; B Zhu; G Cody Bagnall; D Bhutto; M S Rosen
Journal:  Sci Rep       Date:  2021-04-15       Impact factor: 4.379

