
Learning with Known Operators reduces Maximum Training Error Bounds.

Andreas K Maier¹, Christopher Syben¹, Bernhard Stimpel¹, Tobias Würfl¹, Mathis Hoffmann¹, Frank Schebesch¹, Weilin Fu¹, Leonid Mill¹, Lasse Kling², Silke Christiansen³.

Abstract

We describe an approach for incorporating prior knowledge into machine learning algorithms. We aim at applications in physics and signal processing in which we know that certain operations must be embedded into the algorithm. Any operation that allows computation of a gradient or sub-gradient with respect to its inputs is suited for our framework. We derive a maximal error bound for deep nets that demonstrates that inclusion of prior knowledge results in its reduction. Furthermore, we show experimentally that known operators reduce the number of free parameters. We apply this approach to various tasks ranging from CT image reconstruction and vessel segmentation to the derivation of previously unknown imaging algorithms. As such, the concept is widely applicable for many researchers in physics, imaging, and signal processing. We assume that our analysis will support further investigation of known operators in other fields of physics, imaging, and signal processing.


Year: 2019 PMID: 31406960 PMCID: PMC6690833 DOI: 10.1038/s42256-019-0077-5

Source DB: PubMed Journal: Nat Mach Intell ISSN: 2522-5839

Introduction

Pattern analysis and machine intelligence have focussed predominantly on tasks that mimic perceptual problems. These are typically modelled as classification or regression tasks in which the reference stems from a human observer who defines the ground truth. As we have only a limited understanding of how these man-made classes emerge from the human mind, there is only limited prior knowledge available. As such, pattern recognition has relied on expert knowledge to design features that are suited to a particular recognition task [1]. In order to alleviate the task of feature design, researchers started to learn feature descriptors as part of the training procedure [2]. Implementation of such methods on efficient hardware gave rise to the first models that could outperform classical feature extraction methods significantly [3], and was one of the milestone works in the emerging field of deep learning.

With the rise of deep learning [4], researchers became aware that these methods of general function learning are applicable to a much wider range of problems than mere perceptual tasks. Examples range from image super resolution [5] and image denoising and inpainting [6] to computed tomography [7]. In these fields, the methods from deep learning are often directly applied and often show performances that are either on par with or even significantly better than results found with state-of-the-art methods. Yet, there are also reports that present surprising results in which parts of the image are hallucinated [8, 9]. In particular, [9] demonstrates that mismatches between training and test data lead to dramatic changes in the produced result. Hence, blind deep learning methods have to be applied with care in order to be successful.

In this article, we explore the use of known operations within machine learning algorithms. First, we analyze the problem from a theoretical perspective and study the effect of prior knowledge in terms of maximal error bounds. This is followed by three applications in which we use prior operators to study their effect on the respective regression or classification problem. Lastly, we discuss our observations in relation to other works in the literature and give an outlook on future work. Note that some of the work presented here is based on prior conference publications [10, 11, 12, 13].

Known Operator Learning

The general idea of known operator learning is to embed entire operations into a learning problem. Figure 1 presents the idea graphically. We generally refer to the N-dimensional input of our trained algorithm as $x' \in \mathbb{R}^{N}$. In order to increase readability, we use an extended version $x \in \mathbb{R}^{N+1}$ such that inner products with some weight vector $w'$ plus bias $w_0$ can be conveniently written, i.e. $w'^{\top} x' + w_0 = w^{\top} x$. Before looking into the properties of this approach, and in particular maximal error bounds, we briefly summarize the Universal Approximation Theorem as it is closely related to our analysis. Note that the supplementary material to this article contains all proofs for the theorems presented in this section.

Figure 1

Schematic of the idea of known operator learning

One or more known operators (here Operator u and Operator g) are embedded into a network. Doing so allows a dramatic reduction of the number of parameters that have to be estimated during the learning process. The minimal requirement for the operator is that it must allow the computation of a sub-gradient for use in combination with the back-propagation algorithm. This requirement is met by a large class of operations.
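The sub-gradient requirement can be illustrated with a minimal sketch. The layer classes, the random matrix standing in for a known operator, and the squared-error loss are all hypothetical choices made for illustration and are not taken from the article. The known operator implements only a gradient with respect to its input, while the trainable diagonal layer additionally stores a gradient with respect to its parameters.

```python
import numpy as np

class TrainableDiagonal:
    """Trainable layer: element-wise weights w (a diagonal matrix)."""
    def __init__(self, weights):
        self.w = np.asarray(weights, dtype=float)
    def forward(self, x):
        self.x = x
        return self.w * x
    def backward(self, grad_out):
        self.grad_w = grad_out * self.x   # gradient w.r.t. the parameters
        return grad_out * self.w          # gradient w.r.t. the input

class KnownLinearOperator:
    """Fixed operator y = M x: no parameters, only an input gradient."""
    def __init__(self, matrix):
        self.matrix = matrix
    def forward(self, x):
        return self.matrix @ x
    def backward(self, grad_out):
        return self.matrix.T @ grad_out

class ReLU:
    """Fixed non-linearity; backward uses the sub-gradient (0 at the kink)."""
    def forward(self, x):
        self.mask = x > 0
        return np.where(self.mask, x, 0.0)
    def backward(self, grad_out):
        return grad_out * self.mask

def loss_with_backprop(layers, x, target):
    """Forward pass, squared-error loss, then back-propagation through all layers."""
    h = x
    for layer in layers:
        h = layer.forward(h)
    residual = h - target
    grad = residual
    for layer in reversed(layers):
        grad = layer.backward(grad)
    return 0.5 * float(residual @ residual)

rng = np.random.default_rng(0)
diag = TrainableDiagonal(rng.normal(size=6))
layers = [diag, KnownLinearOperator(rng.normal(size=(4, 6))), ReLU()]
x, target = rng.normal(size=6), rng.normal(size=4)

loss_with_backprop(layers, x, target)
analytic_grad = diag.grad_w.copy()

# Central finite differences as an independent check of the back-propagated gradient.
eps = 1e-6
numeric_grad = np.zeros_like(diag.w)
for i in range(diag.w.size):
    original = diag.w[i]
    diag.w[i] = original + eps
    loss_plus = loss_with_backprop(layers, x, target)
    diag.w[i] = original - eps
    loss_minus = loss_with_backprop(layers, x, target)
    diag.w[i] = original
    numeric_grad[i] = (loss_plus - loss_minus) / (2 * eps)
```

Only the six diagonal weights are trainable in this toy chain; the 4 × 6 operator contributes no free parameters, yet the gradient still flows through it.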

Universal Approximation

Theorem 1 (Universal Approximation Theorem). Let $\varphi(t) : \mathbb{R} \to \mathbb{R}$ be a non-constant, bounded, and continuous function and $u(x)$ be a continuous function on a compact set $\mathcal{D} \subset \mathbb{R}^{N}$. Then, there exist an integer $N$, weights $w_i \in \mathbb{R}^{N+1}$, and $u_i \in \mathbb{R}$ that form an approximation

$$\hat{u}(x) = \sum_{i=1}^{N} u_i \, \varphi(w_i^{\top} x)$$

such that the inequality

$$|u(x) - \hat{u}(x)| \le \epsilon$$

holds for all $x \in \mathcal{D}$ and $\epsilon > 0$.

Theorem 1 states that for any continuous function $u(x)$ an approximation can be found such that the difference between true function and approximation is bounded by $\epsilon$. With increasing number of nodes $N$, $\epsilon$ will decrease. In the literature, this result is often referred to as the Universal Approximation Theorem [14, 15] and forms the fundamental result that neural networks with just a single hidden layer are general function approximators. Yet, this type of approximation may require a very large $N$, which is the reason why stacked layers of different types are known to be more successful [16]. We can extend Theorem 1 to vector-valued functions $u(x) : \mathcal{D} \to \mathbb{R}^{M}$ on $\mathcal{D}$ by postulating Theorem 1 for each of their components $k$:

$$\hat{u}_k(x) = \sum_{i} u_{i,k} \, \varphi(w_{i,k}^{\top} x) \qquad (3)$$

Hence, universal approximation generally also applies to N-dimensional functions.
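The effect of the number of nodes can be sketched numerically. The following is a simplified illustration and not the construction used in the proof: the hidden weights are drawn at random and kept fixed, only the output coefficients are fitted by least squares, and the target function, the tanh activation, and all sizes are assumptions chosen for the example.

```python
import numpy as np

def fit_single_hidden_layer(x, target, hidden_w, hidden_b):
    """Fit output coefficients u_i of sum_i u_i * tanh(w_i * x + b_i) by least squares."""
    features = np.tanh(np.outer(x, hidden_w) + hidden_b)  # shape (samples, nodes)
    coeffs, *_ = np.linalg.lstsq(features, target, rcond=None)
    return features @ coeffs

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(2.0 * x) + 0.5 * x        # hypothetical continuous u(x)

max_nodes = 64
w_all = rng.normal(scale=2.0, size=max_nodes)  # random, untrained hidden weights
b_all = rng.normal(scale=2.0, size=max_nodes)  # random, untrained biases

rms_errors = {}
for nodes in (2, 8, 32, 64):
    # Nested feature sets: a larger network contains every smaller one.
    approx = fit_single_hidden_layer(x, target, w_all[:nodes], b_all[:nodes])
    rms_errors[nodes] = float(np.sqrt(np.mean((target - approx) ** 2)))

for nodes, err in rms_errors.items():
    print(f"nodes = {nodes:3d}   RMS error = {err:.3e}")
```

Because the feature sets are nested, the least-squares residual cannot grow when nodes are added, which mirrors the statement that the error shrinks with increasing number of nodes.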

Known Operator Error Bounds

Knowing the limits of general function approximation, we are now interested in finding limits for mixing known and approximated operators. As previously mentioned, deep networks are never constructed out of a single layer, but rather take the form of the configuration shown in Figure 1. Hence, we need to consider layered networks to analyze the maximal error bounds. Instead of investigating entire networks, we choose to simplify our theoretical analysis to the special case $f(x) = g(u(x))$ with $g(x) : \mathcal{S} \to \mathbb{R}$, $u(x) : \mathcal{D} \to \mathcal{S}$, and compact sets $\mathcal{S} \subset \mathbb{R}^{M}$ and $\mathcal{D} \subset \mathbb{R}^{N}$. Note that this simplification does not limit the generality of our analysis, as we can map any knowledge on the structure of the network architecture either onto the output function $g(x)$, the intermediate function $u(x)$, or directly as a transform of the inputs $x$. Generalisation to N-dimensional functions is also possible following the idea shown in Eq. 3.

The previous definition of $f(x)$ allows us to investigate different forms of approximation. In particular, we are able to introduce approximations $\hat{u}$, $\hat{g}$, and $\hat{f}$ of $u$, $g$, and $f$ following Theorem 1. Here $|e_u| \le \epsilon_u$, $|e_g| \le \epsilon_g$, and $|e_f| \le \epsilon_f$ denote the errors that are introduced by the respective approximation of $u$, $g$, and $f$. Next, we are interested in finding bounds on $|e_f|$ using the above approximations. For the case of known $u(x)$, we can substitute $x^{*} := u(x)$, as $u(x)$ is a fixed function. In this case Theorem 1 directly applies and a bound on $|e_f|$ is found as $|e_f| \le \epsilon_g$ with $|e_g| \le \epsilon_g$. If we knew $g(x)$ in addition, $e_g$ would be 0 and the bound would shrink to the case of equality. The case described in Eq. 5 is slightly more complicated, but we are also able to find general bounds as shown in Theorem 2.

Theorem 2 (Known Output Operator Theorem). Let $\varphi(x) : \mathbb{R} \to \mathbb{R}$ be a non-constant, bounded, and continuous function and $f(x) = g(u(x)) : \mathcal{D} \to \mathbb{R}$ be a continuous function on $\mathcal{D} \subset \mathbb{R}^{N}$.
Further let $g(x) : \mathcal{S} \to \mathbb{R}$ be a Lipschitz-continuous function with Lipschitz constant $l_g = \sup\{\|\nabla g(x)\|_p\}$ with $p \in \{1, 2\}$ on $\mathcal{S} \subset \mathbb{R}^{M}$ and $\hat{u}(x)$ be a general function approximator of $u(x)$ with an integer number of nodes, weights $w_i \in \mathbb{R}^{N+1}$, and coefficients $u_i \in \mathbb{R}$. Then, the error $e_f = g(u(x)) - g(\hat{u}(x))$, as $g$ is known, is generally bounded for all $x \in \mathcal{D}$ by

$$|e_f| \le l_g \cdot \|e_u\|_p \qquad (8)$$

with $e_u = u(x) - \hat{u}(x)$ the vector of component-wise errors of $\hat{u}(x)$.

The bound for $|e_f|$ is found using a Lipschitz constant $l_g$ on $g(x)$, which implies that the theorem will only hold if Lipschitz-bounded functions are used for $g(x)$. Analysis of Eq. 8 reveals that knowing $u(x)$ in this case would imply $e_u = 0$, which also yields equality on both sides. We further explore this idea in Theorem 3. It describes a bound for the case that both $g(x)$ and $u(x)$ are approximated.

Theorem 3 (Unknown Operator Theorem). Let $\varphi(x) : \mathbb{R} \to \mathbb{R}$ be a non-constant, bounded, and continuous function with Lipschitz-bound $l_\varphi$ and $f(x) = g(u(x)) : \mathcal{D} \to \mathbb{R}$ be a continuous function on $\mathcal{D} \subset \mathbb{R}^{N}$. Further let $\hat{g}(x)$ and $\hat{u}(x)$ be general function approximators of $g(x) : \mathcal{S} \to \mathbb{R}$ and $u(x) : \mathcal{D} \to \mathcal{S}$ with integer numbers of nodes, weight vectors, real coefficients $g_j$ and $u_i$, and compact sets $\mathcal{S} \subset \mathbb{R}^{M}$ and $\mathcal{D} \subset \mathbb{R}^{N}$. Then, the error $e_f = g(u(x)) - \hat{g}(\hat{u}(x))$ is generally bounded for all $x \in \mathcal{D}$, where $e_u$ is the vector of errors introduced by the components of $\hat{u}(x)$.

The bound is comprised of two terms in an additive relation, where the first term vanishes if $u(x)$ is known, as $|e_{u,j}| = 0 \; \forall j$, and the second term vanishes for known $g(x)$, as $e_g = 0$. Hence, for all of the considered cases, knowing $g(x)$ or $u(x)$ is beneficial and allows us to shrink the maximal training error bounds. Given the previous observations, we can now also explore deeper networks that try to mimic the structure of the original function. This gives rise to Theorem 4.

Theorem 4 (Unknown Operators in Deep Networks). Let $u_\ell(x_\ell) : \mathcal{D}_\ell \to \mathcal{D}_{\ell-1}$ be a continuous function with Lipschitz-bound $l_{u_\ell}$ on $\mathcal{D}_\ell \subset \mathbb{R}^{N_\ell}$ with integer $\ell > 0$.
Further let $f_\ell(x_\ell) : \mathcal{D}_\ell \to \mathcal{D}_0$ be a function composed of $\ell$ layers / function blocks defined as the recursion $f_\ell(x_\ell) = f_{\ell-1}(u_\ell(x_\ell))$ with $f_0(x) = x$ on compact set $\mathcal{D}_\ell \subset \mathbb{R}^{N_\ell}$, bound by Lipschitz constant $l_{f_\ell}$. The recursive function $\hat{f}_\ell(x_\ell) = \hat{f}_{\ell-1}(\hat{u}_\ell(x_\ell))$ is then an approximation of $f_\ell(x_\ell)$. Then, the error $e_{f_\ell} = f_\ell(x_\ell) - \hat{f}_\ell(x_\ell)$ is generally bounded for all $x_\ell \in \mathcal{D}_\ell$ and for all $\ell > 0$ in each component $k$, where $e_{u_\ell}$ is the vector of errors introduced by $\hat{u}_\ell(x_\ell)$.

If we investigate Theorem 4 closely, we identify similar properties to Theorem 3. The errors of each layer / function block $u_\ell(x_\ell)$ are additive. If a layer is known, the respective error vector $e_{u_\ell} := 0$ vanishes and the respective part of the bound cancels out. Furthermore, later layers have a multiplier effect on the error, as their Lipschitz constants amplify $\|e_{u_\ell}\|$. Note that the relation is shown in the supplementary material. A large advantage of Theorem 4 over Theorem 3 is that the Lipschitz constants $l_{f_\ell}$ that appear in the error term are the ones of the true function $f_\ell(x_\ell)$. Therefore, the amplification effects depend only on the structure of the true function and are independent of the actual choice of the universal function approximator. The approximator only influences the actual error $e_{u_\ell}$.

The above observations pave the way to incorporating prior operators into different architectures. In the following, we will highlight several applications in which we explore blending deep learning with prior operators.
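A Lipschitz bound of the kind used for a known output operator, $|g(u(x)) - g(\hat{u}(x))| \le l_g \|u(x) - \hat{u}(x)\|_2$, can be checked numerically. The functions below are hypothetical stand-ins chosen for illustration: g sums tanh over three components, so every entry of its gradient lies in (0, 1] and $l_g = \sqrt{3}$ is a valid constant for the 2-norm, and the approximator of u is a crude truncated Taylor expansion rather than a trained network.

```python
import numpy as np

def g(z):
    """Known output operator: sum of tanh over the last axis (Lipschitz, l_g = sqrt(3))."""
    return np.tanh(z).sum(axis=-1)

def u(x):
    """Hypothetical true intermediate function with three components."""
    return np.stack([np.sin(x), np.cos(2.0 * x), 0.5 * x], axis=-1)

def u_hat(x):
    """Crude stand-in approximator of u (truncated Taylor series plus a small offset)."""
    return np.stack([x - x**3 / 6.0, 1.0 - 2.0 * x**2, 0.5 * x + 0.01], axis=-1)

x = np.linspace(-1.0, 1.0, 201)
lipschitz_g = np.sqrt(3.0)

e_u = u(x) - u_hat(x)                 # component-wise approximation errors
e_f = g(u(x)) - g(u_hat(x))           # resulting error of the composed function
bound = lipschitz_g * np.linalg.norm(e_u, axis=-1)

print("max |e_f| :", np.abs(e_f).max())
print("max bound :", bound.max())
```

If u were known exactly, `e_u` would be zero everywhere and the bound would collapse to zero together with the error, which is the behaviour the theorems describe for known operators.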

Application Examples

We believe that known operators have a wide range of applications in physics and signal processing. Here, we highlight three approaches to use such operators. All three applications are from the domain of medical imaging, yet the method is applicable to many more disciplines to be discovered in the future. The results presented here are based on conference contributions [10, 11, 13]. Note that the supplementary material contains descriptions of experiments, data, and additional figures that were omitted here for brevity.

Deep Learning Computed Tomography

In computed tomography, we are interested in computing a reconstruction $y$ from a set of projection images $x$. Both are related by the X-ray transform $A$:

$$A y = x$$

Solving for $y$ requires inversion of the above formula. The Moore-Penrose inverse of $A$ yields the following solution:

$$y = A^{\top} (A A^{\top})^{-1} x$$

This type of inversion gives rise to the class of filtered back-projection methods, as it can be shown that $(AA^{\top})^{-1}$ takes the form of a circulant matrix $K$, i.e. $K = (AA^{\top})^{-1} = F^{H} C F$, where $F$ denotes the Fourier transform, $F^{H}$ its inverse, and $C$ a diagonal matrix that corresponds to the Fourier transform of $K$. As $K$ is typically associated with a large receptive field, it is typically implemented in Fourier space. In order to be applicable to other geometries, such as fan-beam reconstruction, additional Parker and cosine weights have to be incorporated. They can elegantly be summarised in an additional diagonal matrix $W$ to yield

$$y = \mathrm{ReLu}(A^{\top} K W x) \qquad (14)$$

as the final reconstruction algorithm, where ReLu(·) suppresses negative values. Following the paradigm of known operator learning, Eq. 14 can also be interpreted as a neural network structure, as it only contains diagonal, circulant, and fully connected matrix operators, as displayed in Figure 2. A practical limitation of $A$ is that it is typically a very large and sparse matrix. In practice, it is therefore never instantiated, but only evaluated on the fly using fast ray-tracing methods. For 3-D problems, the full matrix size is far beyond the memory restrictions of today's compute systems. Furthermore, none of the parameters need to be trained, as all of them are known for complete data acquisitions.
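The layer chain ReLu(A⊤ K W x) can be sketched on a toy problem. Everything concrete below is an assumption made for illustration: the projector is a small dense random matrix rather than a ray-tracing operator, the filter kernel and weights are random rather than the analytical ramp filter and Parker weights, and a single one-dimensional projection stands in for a full sinogram. The sketch only shows that applying the circulant matrix K as a diagonal matrix C between a Fourier transform and its inverse gives the same result as an explicit matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
n_det, n_pix = 16, 10

A = rng.random((n_det, n_pix))      # hypothetical dense stand-in for the projector A
kernel = rng.normal(size=n_det)     # first column of the circulant filter matrix K
c_diag = np.fft.fft(kernel)         # diagonal of C: Fourier transform of the kernel
w_diag = rng.random(n_det)          # diagonal of the weighting matrix W

def reconstruct(x):
    """y = ReLu(A^T K W x) with K applied in Fourier space."""
    weighted = w_diag * x                                        # point-wise weights W
    filtered = np.real(np.fft.ifft(c_diag * np.fft.fft(weighted)))  # circular convolution K
    return np.maximum(A.T @ filtered, 0.0)                       # back-projection and ReLu

# Explicit circulant matrix for comparison: column j is the kernel shifted by j.
K = np.stack([np.roll(kernel, j) for j in range(n_det)], axis=1)

x = rng.random(n_det)
y_fft = reconstruct(x)
y_dense = np.maximum(A.T @ (K @ (w_diag * x)), 0.0)
```

In this toy layout only `c_diag` and `w_diag` would be candidates for training (2 × 16 values), whereas a dense replacement for K alone would already need 16 × 16 entries.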

Figure 2

Deep Learning Computed Tomography

Reconstruction network for y = ReLu(A⊤KWx) from projections x to image y. As W is a diagonal matrix, it is merely a point-wise multiplication, followed by convolution K and back-projection A⊤.

Incomplete data cannot be reconstructed with this type of algorithm and would lead to strong artifacts. We can still tackle limited data problems if we apply additional training of our network. As A ⊤ is large, we treat it as fixed during the training and only allow modification of W and K. Results and experimental details are demonstrated in the supplementary material. Training of both matrices clearly improves the image reconstruction result. In particular, the trained algorithm learns to compensate for the loss of mass in areas of the reconstruction in which rays are missing. As the trained algorithm is mathematically equivalent to the original filtered back-projection method, we are able to map the trained weights back onto their original interpretation which allows comparison to state-of-the-art weights. In Figure 3, we can see that the trained weights show similarity with the approach published by Schäfer et al. [18]. In contrast to Schäfer et al. who arrived at their weights following intuition, our approach is optimal with respect to our training data. In our present model, we have to re-train the algorithm for every new geometry. This could be avoided by modelling the weights using a continuous function which is sampled by the reconstruction network.

Figure 3

Improved interpretability in deep networks

The trained reconstruction algorithm can be mapped back into its original interpretation. Hence, we can compare the trained weights (c) to reconstruction weights after (a) Parker [17] and (b) Schäfer [18]. (c) shows significant similarity to (b), which is also able to compensate for the loss of mass. While (b) was only arrived at heuristically, (c) can be shown to be data optimal here.

Learning from Heuristic Algorithms

Incorporating known operators generally allows blending of deep learning methods with traditional image processing approaches. In particular, we are able to choose heuristic methods that are well understood and augment them with neural network layers. One example for such a heuristic method is Frangi's vesselness [19]. The vesselness values for dark tubes are calculated using the following formula:

$$V_0 = \begin{cases} 0, & \text{if } \lambda_2 < 0 \\ \exp\left(-\dfrac{R_B^2}{2\beta^2}\right)\left(1 - \exp\left(-\dfrac{S^2}{2c^2}\right)\right), & \text{otherwise} \end{cases} \qquad (15)$$

where $|\lambda_1| < |\lambda_2|$ are the eigenvalues of the Hessian matrix, $S = \sqrt{\lambda_1^2 + \lambda_2^2}$ is the second order structureness, $R_B = \lambda_1 / \lambda_2$ is the blobness measure, $\beta$, $c$ are image-dependent parameters for the blobness and structureness terms, and $V_0$ stands for the vesselness value. The entire multi-scale framework of the Frangi filter can be mapped onto a neural network architecture [11]. In Frangi-Net, each step of the Frangi filter is replaced with a network counterpart and data normalization layers are added to enable end-to-end training. Multi-scale analysis is formed as a series of trainable filters, followed by eigenvalue computation in specialized fixed function network blocks. This is followed by another fixed function, the actual vesselness measure as described in Eq. 15.

Figure 4

Architecture of Frangi-Net over 8 scales σ

For each single scale σ, Frangi-Net computes second-order spatial derivatives of the image. These are used to form a Hessian matrix, from which the eigenvalues λ1 and λ2 are extracted. Both are used to compute structureness S and blobness R_B, which are required to compute the final vesselness V at each pixel.
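The fixed-function part of this pipeline, from Hessian entries to a vesselness value, can be sketched as follows. This is a minimal single-scale illustration using the standard two-dimensional Frangi measure for dark tubes; the function names and the default values beta = 0.5 and c = 1.0 are assumptions for the example, and the trainable filters and normalization layers of Frangi-Net are not modelled.

```python
import numpy as np

def hessian_eigenvalues(hxx, hxy, hyy):
    """Eigenvalues of the symmetric 2x2 Hessian, ordered so that |lam1| <= |lam2|."""
    mean = 0.5 * (hxx + hyy)
    root = np.sqrt((0.5 * (hxx - hyy)) ** 2 + hxy ** 2)
    first, second = mean + root, mean - root
    swap = np.abs(first) > np.abs(second)
    lam1 = np.where(swap, second, first)
    lam2 = np.where(swap, first, second)
    return lam1, lam2

def vesselness_dark_tubes(lam1, lam2, beta=0.5, c=1.0):
    """Vesselness for dark tubes; zero wherever lam2 < 0."""
    # Blobness ratio lam1 / lam2, set to 0 where lam2 == 0 (then lam1 == 0 as well).
    ratio = np.divide(lam1, lam2, out=np.zeros_like(lam1), where=lam2 != 0)
    structureness_sq = lam1 ** 2 + lam2 ** 2
    response = np.exp(-ratio ** 2 / (2 * beta ** 2)) * (
        1 - np.exp(-structureness_sq / (2 * c ** 2))
    )
    return np.where(lam2 < 0, 0.0, response)

# Three hypothetical pixels: a dark tube, a bright tube, and a dark blob.
hxx = np.array([2.0, -2.0, 2.0])
hxy = np.array([0.0, 0.0, 0.0])
hyy = np.array([0.0, 0.0, 2.0])

lam1, lam2 = hessian_eigenvalues(hxx, hxy, hyy)
vesselness = vesselness_dark_tubes(lam1, lam2)
print(vesselness)
```

Both functions use only differentiable array operations apart from the sign test on lam2, which is why such blocks can sit inside a network as fixed layers while the filters in front of them are trained.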

We compare the segmentation result of the proposed Frangi-Net with the original Frangi filter and show that Frangi-Net outperforms the Frangi filter on all evaluation metrics. In comparison to the state-of-the-art image segmentation model U-Net, Frangi-Net contains less than 6% of the number of trainable parameters while achieving an AUC score of around 0.960, which is only 1% inferior to that of the U-Net. Adding a trainable guided filter before Frangi-Net as a preprocessing step yields an AUC of 0.972 with only 8.5% of the trainable parameters of U-Net, which is statistically not distinguishable from U-Net's AUC of 0.974. Hence, using our approach of known operators, we are able to augment heuristic methods by blending them with methods of deep learning, saving many trainable parameters.

Deriving Networks

A third application of known operator learning that we would like to highlight in this paper is the derivation of new network architectures from mathematical relations of the signal processing problem at hand. In the following, we are interested in hybrid imaging of magnetic resonance imaging (MRI) and X-ray imaging simultaneously. One major problem is that MRI k-space acquisitions typically allow parallel projection geometries, i.e. a line through the center of k-space, while X-rays are associated with a divergent geometry such as fan- or cone-beam geometries. Both modalities allow different contrast mechanisms, and simultaneous acquisition and overlay in the same image would be highly desirable for improved interventional guidance.

In the following, we assume to have sampled MRI projections $x$ in k-space. By inverse Fourier transform $F^{H}$, they can be transformed into parallel projections $p_{PB} = F^{H} x$. Both parallel and cone-beam projections $p_{CB}$ are related to the volume under consideration $v$ by associated projection operations $A_{PB}$ and $A_{CB}$:

$$A_{PB} \, v = p_{PB} \qquad (16)$$

$$A_{CB} \, v = p_{CB} \qquad (17)$$

As $v$ appears in both relations, we can solve Eq. 16 for $v$ using the Moore-Penrose pseudo-inverse:

$$v = A_{PB}^{\top} (A_{PB} A_{PB}^{\top})^{-1} p_{PB}$$

Next, we can use $v$ in Eq. 17 to yield

$$p_{CB} = A_{CB} A_{PB}^{\top} (A_{PB} A_{PB}^{\top})^{-1} F^{H} x$$

Note that all operations on the path from k-space to $p_{CB}$ are known. Yet, $(A_{PB} A_{PB}^{\top})^{-1}$ is expensive to determine and may need significant amounts of memory. As we know from reconstruction theory, this matrix often takes the form of a circulant matrix, i.e. a convolution. As such, we can approximate it with the chain of operations $F^{H} C F$, where $C$ is a diagonal matrix. In order to add a few more degrees of freedom, we further add another diagonal operator $W$ in the spatial domain to yield the parallel-to-cone rebinning formula. In this formulation, only $C$ and $W$ are unknown and need to be trained. By design, both matrices are diagonal and therefore only have few unknown parameters. 
Even though the training was conducted merely on numerical phantoms, we can apply the learned algorithm to data acquired with a real MRI system without any loss of generality. Using only 15 parallel-beam MR projections, we were able to compute a stacked fan-beam projection with both approaches. In Figure 5 the results of the analytical and learned algorithms are shown. The result of the learned algorithm gives a much sharper visual impression compared to the analytical approach, which intrinsically suffers from ray-by-ray interpolation and thus from a blurring effect. Note that additional smoothing could be incorporated into the network by regularization of the filter or by additional hard-coded filter steps on request.
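The pseudo-inverse step of this derivation can be sketched with small dense matrices. The sizes and the random stand-ins for the parallel-beam and cone-beam projectors are assumptions for illustration only; real projectors are large, sparse, and never instantiated. The sketch solves the parallel-beam relation for the volume via $A_{PB}^{\top}(A_{PB}A_{PB}^{\top})^{-1}$ and then applies the cone-beam projector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vox, n_pb, n_cb = 12, 5, 7

A_pb = rng.normal(size=(n_pb, n_vox))   # hypothetical parallel-beam projector
A_cb = rng.normal(size=(n_cb, n_vox))   # hypothetical cone-beam projector

v_true = rng.normal(size=n_vox)
p_pb = A_pb @ v_true                    # simulated parallel-beam projections

# Minimum-norm solution v = A_pb^T (A_pb A_pb^T)^{-1} p_pb, without forming the inverse.
gram_solution = np.linalg.solve(A_pb @ A_pb.T, p_pb)
v_hat = A_pb.T @ gram_solution

# Rebinned cone-beam projections from the estimated volume.
p_cb_hat = A_cb @ v_hat

# Reference: NumPy's pseudo-inverse gives the same volume estimate.
v_pinv = np.linalg.pinv(A_pb) @ p_pb
```

With fewer measurements than unknowns, `v_hat` reproduces the parallel-beam data exactly but generally differs from `v_true`, so `p_cb_hat` is the rebinning implied by the minimum-norm volume. Replacing the costly Gram-matrix solve with a trained diagonal filter in Fourier space is the approximation the text describes.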

Figure 5

Classical analytical rebinning vs. derived neural networks

The trained rebinning algorithm can be directly applied to real MR projection data. Parallel-beam MR projection data is rebinned to a stacked fan-beam geometry with the analytical (a) and the learned algorithm (b). Note that the result of the learned method is much sharper as it avoids ray-by-ray interpolation.

Discussion

For many applications, we do not know which operation is required in the ideal processing pipeline. Most machine learning tasks focus either on perceptual problems or man-made classes. Therefore, we only have limited knowledge of the ideal processing chain. In many cases, the human brain seems to have identified suitable solutions. Yet, our knowledge of the human brain is incomplete and the search for well-suited deep architectures is a process of trial and error. Still, deep learning has been shown to be able to solve tasks that were deemed hard or close to impossible [20]. Now that deep learning also starts addressing fields of physics and classical signal processing, we are entering areas in which we have a much better understanding of the underlying processes and therefore know what kind of mathematical operations need to be present in order to solve specific problems. Yet, during the derivation of our mathematical models, we often introduce simplifications that allow more compact descriptions and a more elegant solution. Still, these simplifications introduce slight errors along the way and are often compensated using heuristic correction methods [21].

In this paper, we have shown that inclusion of known operators is beneficial in terms of maximal error bounds. We demonstrated that in all cases in which we are able to use partial knowledge of the function at hand, the maximal errors that may remain after training of the network are reduced, even for networks of arbitrary depth. Note that in the future tighter error bounds than the ones described in this work might be identified that are independent of the use of known operators. Yet, our error analysis is still useful, as for an increasing number of known operations in the network, the magnitude of the bound shrinks up to the point of identity, if all operations are known. 
To the knowledge of the authors, this is the first paper to attempt such a theoretical analysis of the use of known operators in neural network training.

In our experiments with CT reconstruction, we could demonstrate that we are able to tackle limited angle reconstructions using a standard filtered back-projection type of algorithm. In fact, we only adapted weights while run-time, behaviour, and computational complexity remained unchanged. As we can map the trained algorithm back onto its original interpretation, we could also investigate shape and function of the learned weights. They demonstrated similarity to a heuristic method that could previously only be explained by intuition rather than by showing optimality. For the case of our trained weights, we can demonstrate that they are optimal with respect to the training data.

Based on Frangi's vesselness, we could develop a trainable network for vessel detection. In our experiments, we could demonstrate that training of this net already yields improved filters for vessel detection that are close in terms of performance to a much more complex U-Net. Further inclusion of a trainable denoising step yielded an accuracy that is statistically not distinguishable from U-Net.

As the last application of our approach, we investigated rebinning of MR data to a divergent beam geometry. For this kind of rebinning procedure, a fast convolution-based algorithm was previously unknown. Prior approaches relied on ray-by-ray interpolation, which typically introduces blurring. With our hypothesis that the inverse matrix operator takes the form of a circulant matrix in the spatial domain in combination with an additional multiplicative weight, we could train a new algorithm successfully. The new approach is not just computationally efficient, it also produces images with a degree of sharpness that was previously not reported in the literature. 
Although only applications from the medical domain are shown in this paper, this does not limit the generality of our theoretical analysis. Similar problems are found in many fields, e.g. computer vision [22], image super resolution [23], or audio signal processing [24].

Obviously, known operators have been embedded into neural networks already for a long time. LeCun et al. [2] already suggested convolution and pooling operators. Jaderberg et al. introduced differentiable spatial transformations and their resampling into deep learning [25]. Lin et al. use this for image generation [26]. Kulkarni et al. developed an entire deep convolutional graphics pipeline [27]. Zhu et al. include differentiable projectors to disentangle 3D shape information from appearance [28]. Tewari et al. integrate a differentiable model-based image generator to synthesize facial images [29]. Adler et al. show an approach to partially learn the solution for ill-posed inverse problems [30]. Ye et al. [31] introduced the Wavelet transform as a multi-scale operator, Hammernik et al. [32] mapped entire energy minimization problems onto networks, and Wu et al. even included the guided filter as a layer in a network [33]. As this list could be continued with many more references, we see this as an emerging trend in deep learning. In fact, any operation that allows the computation of a sub-gradient [34] is suited to be used in combination with the back-propagation algorithm. In order to integrate a new operator, only the partial derivatives / sub-gradients with respect to its inputs and its parameters have to be implemented. This allows inclusion of a wide range of operations. To the best of our knowledge, this is the first paper giving a general argument for the effectiveness of such approaches.

Next, the introduction of a known operator is also associated with a reduction of trainable parameters. We demonstrate this in this paper in all of our experiments. 
This allows us to work with far less training data and helps us to create models that can be transferred from synthetic training data to real measured data. Zarei et al. [35] take this approach so far that they are able to train user-dependent image denoising filters using only a few clicks from alternate forced-choice experiments. Thus, we believe that known operators may be a suitable approach to problems for which only limited data is available.

At present we are unaware of how to predict the benefit of using known operators before the actual experiment. Our analysis only focuses on maximum error bounds. Therefore, investigation of expected errors, following for example the approach of Barron, seems interesting for future work [36]. Analysis of the bias-variance trade-off also seems interesting. In [37, Chapter 9] Duda and Hart already hinted at the positive effect of prior knowledge on this trade-off.

Lastly, we believe that known operators may be key to gaining a better understanding of deep networks. Similar to our experiments with Frangi-Net, we can start replacing layers with known operations and observe the effect on the performance of the network. From our theoretical analysis, we expect that inclusion of a known operation will not, or only insignificantly, reduce the system's performance. This may allow us to find configurations for networks that only have few unknown operations while showing large parts that are explainable and understood. Figure 6 shows a variant of this process that is inspired by [38]. Here, we offer a set of known operations in parallel and determine their optimal superposition by training of the network. In a second step, connections with low weights can be removed to iteratively determine the optimal sequence of operations. Furthermore, any known operator sequence can also be regarded as a hypothesis for a suitable algorithm for the problem at hand. By training, we are able to validate or falsify our hypothesis, similar to our example of the derivation of a new network architecture.

Figure 6

Towards operator discovery and sequence analysis

We hypothesise that Known Operator Learning may also be used to disentangle information efficiently. Offering several operators in parallel allows the network to find the best sequence of operations during the training process. In a subsequent step, blocks can be removed step-by-step to determine the minimal block networks.
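The idea of offering operators in parallel and pruning low-weight connections can be sketched on a toy signal. The operator bank, the target, and the pruning threshold are hypothetical, and a closed-form least-squares fit stands in for network training; the sketch is not the procedure of the article or of [38].

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=200)

# Hypothetical bank of known operators offered in parallel.
operators = {
    "identity": lambda s: s,
    "difference": lambda s: s - np.roll(s, 1),
    "smoothing": lambda s: (np.roll(s, 1) + s + np.roll(s, -1)) / 3.0,
}

# Toy target produced by a single operator with gain 1.5.
target = 1.5 * operators["difference"](signal)

# "Training": find the superposition weights of the parallel branches by least squares.
outputs = np.stack([op(signal) for op in operators.values()], axis=1)
weights, *_ = np.linalg.lstsq(outputs, target, rcond=None)

# Pruning: drop branches whose weight is negligible.
threshold = 1e-6
kept = [name for name, w in zip(operators, weights) if abs(w) > threshold]
print(dict(zip(operators, np.round(weights, 6))), kept)
```

Here the fit assigns essentially all weight to the difference branch, so the other two branches can be removed, leaving a minimal block network for this toy problem.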

Conclusion

We believe that the use of known operators in deep networks is a promising method. In this paper, we demonstrate that their use reduces maximal error bounds and experimentally show a reduction in the number of trainable parameters. Furthermore, we applied this to the case of learning CT reconstruction, yielding networks that are interpretable and that can be analysed with classical signal processing tools. Mixing of deep and known operator learning is also beneficial, as it allows us to build smaller networks with only 6% of the parameters of a competing U-Net while being close with respect to their performance. Lastly, the known operators can also be found using mathematical derivation of networks. While keeping large parts of the mathematical operations, we only replace inefficient or unknown operations with deep learning techniques to find entirely new imaging formulas. While all of the applications shown in this paper stem from the medical domain, we believe that this approach is applicable to all fields of physics and signal processing, which is the focus of our future work.

9 in total

1. Ridge-based vessel segmentation in color images of the retina.

Authors: Joes Staal; Michael D Abràmoff; Meindert Niemeijer; Max A Viergever; Bram van Ginneken
Journal: IEEE Trans Med Imaging Date: 2004-04 Impact factor: 10.048

2. Mastering the game of Go with deep neural networks and tree search.

Authors: David Silver; Aja Huang; Chris J Maddison; Arthur Guez; Laurent Sifre; George van den Driessche; Julian Schrittwieser; Ioannis Antonoglou; Veda Panneershelvam; Marc Lanctot; Sander Dieleman; Dominik Grewe; John Nham; Nal Kalchbrenner; Ilya Sutskever; Timothy Lillicrap; Madeleine Leach; Koray Kavukcuoglu; Thore Graepel; Demis Hassabis
Journal: Nature Date: 2016-01-28 Impact factor: 49.962

Review 3. Deep learning.

Authors: Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal: Nature Date: 2015-05-28 Impact factor: 49.962

4. Image Reconstruction is a New Frontier of Machine Learning.

Authors: Ge Wang; Jong Chu Ye; Klaus Mueller; Jeffrey A Fessler
Journal: IEEE Trans Med Imaging Date: 2018-06 Impact factor: 10.048

Review 5. A gentle introduction to deep learning in medical image processing.

Authors: Andreas Maier; Christopher Syben; Tobias Lasser; Christian Riess
Journal: Z Med Phys Date: 2019-01-25 Impact factor: 4.820

6. Image reconstruction by domain-transform manifold learning.

Authors: Bo Zhu; Jeremiah Z Liu; Stephen F Cauley; Bruce R Rosen; Matthew S Rosen
Journal: Nature Date: 2018-03-21 Impact factor: 49.962

7. Optimal short scan convolution reconstruction for fanbeam CT.

Authors: D L Parker
Journal: Med Phys Date: 1982 Mar-Apr Impact factor: 4.071

8. Learning a variational network for reconstruction of accelerated MRI data.

Authors: Kerstin Hammernik; Teresa Klatzer; Erich Kobler; Michael P Recht; Daniel K Sodickson; Thomas Pock; Florian Knoll
Journal: Magn Reson Med Date: 2017-11-08 Impact factor: 4.668

9. Technical Note: PYRO-NN: Python reconstruction operators in neural networks.

Authors: Christopher Syben; Markus Michen; Bernhard Stimpel; Stephan Seitz; Stefan Ploner; Andreas K Maier
Journal: Med Phys Date: 2019-08-27 Impact factor: 4.071
