
Approximating anatomically-guided PET reconstruction in image space using a convolutional neural network.

Georg Schramm¹, David Rigie², Thomas Vahle³, Ahmadreza Rezaei⁴, Koen Van Laere⁴, Timothy Shepherd⁵, Johan Nuyts⁴, Fernando Boada².

Abstract

In the last two decades, it has been shown that anatomically-guided PET reconstruction can lead to improved bias-noise characteristics in brain PET imaging. However, despite promising results in simulations and first studies, anatomically-guided PET reconstructions are not yet available for routine clinical use, for several reasons. In light of this, we investigate whether the improvements of anatomically-guided PET reconstruction methods can be achieved entirely in the image domain with a convolutional neural network (CNN). An entirely image-based CNN post-reconstruction approach has the advantage that no access to PET raw data is needed and, moreover, that the prediction times of trained CNNs are extremely fast on state-of-the-art GPUs, which will substantially facilitate the evaluation, fine-tuning and application of anatomically-guided PET reconstruction in real-world clinical settings. In this work, we demonstrate that anatomically-guided PET reconstruction using the asymmetric Bowsher prior can be well approximated by a purely shift-invariant convolutional neural network in image space, allowing the generation of anatomically-guided PET images in almost real-time. We show that applying dedicated data augmentation techniques in the training phase, in which 16 [18F]FDG and 10 [18F]PE2I data sets were used, leads to a CNN that is robust against the used PET tracer, the noise level of the input PET images and the input MRI contrast. A detailed analysis of our CNN in 36 [18F]FDG, 18 [18F]PE2I, and 7 [18F]FET test data sets demonstrates that the image quality of our trained CNN is very close to that of the target reconstructions in terms of regional mean recovery and regional structural similarity.


Keywords: Image reconstruction; Machine learning; Magnetic resonance imaging; Molecular imaging; Quantification

Year: 2020 PMID: 32971267 PMCID: PMC7812485 DOI: 10.1016/j.neuroimage.2020.117399

Source DB: PubMed Journal: Neuroimage ISSN: 1053-8119 Impact factor: 6.556

Introduction

Positron emission tomography (PET) is an important quantitative clinical molecular imaging technique with high sensitivity and high specificity that is increasingly used in oncology, neurology, infection/inflammation, and cardiology. However, compared with structural imaging modalities such as x-ray computed tomography (CT) or magnetic resonance imaging (MRI), it suffers from a number of limitations. First, due to physical and technical constraints, acquired PET data are contaminated by a high level of Poisson noise, which propagates into the reconstructed images. Second, reconstructed PET images from clinical whole-body scanners have a relatively low spatial resolution of ca. 4–5 mm. This relatively low resolution and the resulting partial volume effects (Erlandsson et al., 2012; Fazio and Perani, 2000; Hutton et al., 2013; Mueller-Gaertner et al., 1992; Yang et al., 1996) hinder accurate quantification of tracer uptake within small structures, such as small lymph nodes, grey matter nuclei, or atrophic cortex, leading to a concomitant decrease in sensitivity and specificity. Moreover, they reduce the detectability of small structures with low contrast. To overcome these challenges, especially in PET brain imaging, many techniques for anatomically-guided PET reconstruction using high-resolution structural MR images have been developed in the past (see e.g. Baete et al., 2004; Bland et al., 2017; Bowsher et al., 2004; Comtat et al., 2002; Ehrhardt et al., 2016; Knoll et al., 2016; Lipinski et al., 1997; Loeb et al., 2015; Mehranian et al., 2018; Nuyts, 2007; Schramm et al., 2017; Vunckx et al., 2012; Vunckx et al., 2014; Vunckx and Nuyts, 2010). These techniques have received increased interest recently due to the introduction of whole-body, clinical PET/MRI scanners during the last decade, which effectively removed all logistical and practical constraints for procuring the required high-resolution MR images. 
Unfortunately up to now, despite promising results in simulations and first studies (Shepherd et al., 2019), anatomically-guided PET reconstructions are not available for routine clinical use. The reasons for that are manifold. First, until now, no vendor has implemented any of the anatomically-guided PET reconstructions into their products which prohibits the use and evaluation by standard clinical users on large patient populations. Furthermore, these methods are not applied even in the few academic centers that are able to perform anatomically-guided PET reconstructions. This is because anatomically-guided PET reconstructions are usually rather time-consuming (ca. 5 min - 1.5 h depending on the used hardware and the data size) which complicates the inevitable (task-based) tuning of the prior weight (β) that controls the degree of anatomically-guided regularization. Moreover, so far, validation of anatomically-guided PET reconstructions in big cohorts for specific clinical tasks is missing which limits the trust of clinicians in those reconstructions. The latter is especially true in the case where structures such as gradients are not shared between the PET and MR images. In light of this, we sought to investigate whether the improvements of anatomically-guided PET reconstruction methods can be achieved entirely in the image domain with a convolutional neural network (CNN). The purpose of this work is to train a CNN to transform standard clinical PET and high-resolution MR images into an anatomically-guided PET reconstruction. Such image-based CNNs would have the advantage that they could be easily deployed due to the availability of excellent free and open-source toolboxes (Abadi et al., 2015; Chollet, 2015). They could also be applied retrospectively to all PET/MRI studies available in clinical picture archiving systems, even if the PET raw data is no longer available. 
Finally, since the predictions made by CNNs are very fast (a prediction of our network on a recent GPU takes ca. 1–2 s), the use of an image-based CNN approach dramatically facilitates the application, fine-tuning (by using multiple CNNs trained with different target βs), and evaluation of anatomically-guided PET reconstructions for a wider range of clinical applications.

Related work

Deep learning techniques using convolutional neural networks have shown great potential in different classical computer vision tasks such as denoising and image upsampling (super-resolution problem), as demonstrated e.g. in Dong et al. (2016); Liu et al. (2018); Zhang et al. (2018). The use of convolutional neural networks in the field of PET image post-processing and PET image reconstruction has also strongly increased in recent years. Most published works focus on convolutional neural networks designed and trained to predict high count from low count input PET images to reduce the injected activity or the acquisition time. Cui et al. (2017) proposed a stacked sparse auto-encoder framework to denoise dynamic PET images. Xiang et al. (2017) designed a 2D convolutional neural network that predicts a high count PET image from an input low count PET image and an aligned T1-weighted MR image. Wang et al. (2018) proposed a 3D conditional generative adversarial network to predict high count PET images from low count ones. Chen et al. (2019) designed a U-net like convolutional network to predict full-dose PET images from ultra low dose PET images (1% of the full dose) and an aligned T1 MR image. In a more advanced approach, Kim et al. (2018) proposed an iterative reconstruction approach using a deep learned prior to obtain low noise PET reconstructions under variable noise conditions. Yang et al. (2018) designed a multilayer perceptron neural network to predict enhanced PET image patches below the classically achievable bias-variance trade-off line from a series of reconstructed patches with different degrees of regularization using the Green smoothness (log cosh) prior. Gong et al. (2019b) showed that their proposed approach of embedding a pre-trained deep residual convolutional neural network exploiting inter-patient information into iterative PET reconstruction is superior to the post-processing approach using a convolutional neural network for denoising. 
Moreover, the same authors also showed in Gong et al. (2019a) that a technique similar to the deep image prior method (Ulyanov et al., 2018) can be used to improve iterative anatomically-guided PET reconstruction without the need for training pairs. Hong et al. (2018) proposed a deep residual convolutional neural network to perform super-resolution on PET sinograms before image reconstruction to improve the reconstructed spatial resolution. Häggström et al. (2019) introduced a deep convolutional encoder-decoder network called DeepPET that was trained to directly predict 2D PET reconstructions from pre-corrected 2D non-time-of-flight sinogram data.

Aim of this work

In contrast to the works summarized in the previous subsection, we aimed to design and train a purely image-based convolutional neural network that learns to predict anatomically-guided PET reconstructions in almost real-time using standard clinical OSEM PET images and high-resolution structural MR images as input. Recently (Rigie et al., 2018), we demonstrated in an initial proof of concept study that this is possible for data from a single scanner, a single PET tracer and similar noise levels. In this work, we demonstrate that this approach can be extended to multiple tracers, multiple PET scanners and multiple noise levels, and that the resulting CNN is very robust and extensible to other tracers without the need for retraining.

Materials and methods

Anatomically-guided PET reconstruction

Since the aim of this work is to approximate anatomically-guided PET reconstruction with a convolutional neural network in image space, we first give a brief overview of the former. This description is based on our earlier work in Schramm et al. (2017). In general, the penalized-likelihood PET reconstruction problem using an MRI-based penalty function can be written as

$$\hat{u} = \operatorname*{argmin}_{u \geq 0} \; L_{\mathrm{PET}}\big(y, \bar{y}(u)\big) + \beta R(u) \tag{1}$$

where $y_i$ are the measured coincidences for line of response $i$ [1], $L_{\mathrm{PET}}$ is the negative of the Poisson log-likelihood, $u$ is the non-negative discretized PET image to be reconstructed, $R$ is a penalty function (prior) and $\beta$ is a non-negative scalar controlling the weight of the regularization. The forward model for computing the expected coincidences is given by

$$\bar{y}_i(u) = \sum_j P_{ij} u_j + s_i \tag{2}$$

In (2), the operator with matrix elements $P_{ij}$ is the forward projection including the effects of sensitivity, attenuation, finite spatial resolution, and time of flight (if available). The noise-free estimate of additive contaminations such as random and scattered events is denoted as $s_i$. In the case of anatomically-guided PET reconstruction, $R$ is a function that incorporates information from an aligned anatomical prior MR image $v$. To investigate whether learning anatomically-guided PET reconstruction is possible, we focus in this proof of concept work on one specific prior function, namely the asymmetric Bowsher prior (Bowsher et al., 2004; Schramm et al., 2017; Vunckx and Nuyts, 2010). This prior, proposed in Vunckx and Nuyts (2010) as a refinement of Bowsher et al. (2004), is a smoothing Markov prior operating on a position-dependent set of voxels. In the discrete setting, it can be written as

$$R(u) = \sum_j \sum_k w_{jk} \, \phi(u_j, u_k) \tag{3}$$

where $\phi(u_j, u_k)$ is a function penalizing differences between the reconstructed PET uptake in voxels $k$ and $j$. The neighbor weights $w_{jk}$ are computed from the aligned MR image: $w_{jk}$ is 1 if voxel $k$ is in a set $B_j$, which consists of the $n$ most similar voxels around $j$, and 0 otherwise. 
In all reconstructions shown in this work, this set $B_j$ includes the 4 out of the 18 nearest voxels around $j$ that, according to the MR image intensity $v$, are most similar to voxel $j$. The absolute MRI intensity difference was used as the similarity measure. For the penalty function $\phi$, the relative difference prior as defined in Nuyts et al. (2002) with γ = 0 was used. Solving the penalized PET reconstruction problem (1) is commonly done using iterative algorithms. In this work, the heuristic algorithm proposed in Nuyts et al. (2002) and applied to PET reconstruction with the asymmetric Bowsher prior using the relative difference prior in Vunckx and Nuyts (2010) was used for all reconstructions. The computation time of those iterative algorithms can be high, since usually several iterations (updates) containing many time-consuming forward and back projection operations are needed to reach convergence. Depending on the computational efficiency of the forward and back projectors, the computational hardware, the optimization algorithm, the voxel size, and the size of the (TOF) projection data, solving (1) can take between several minutes and several hours.
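The neighbor selection described above can be illustrated with a minimal NumPy sketch. It is not the reconstruction software used in this work: the function names are hypothetical, the "18 nearest voxels" are assumed to be the 6 face and 12 edge neighbors of a voxel, the relative difference penalty is written in the form commonly attributed to Nuyts et al. (2002), and the small constant `eps` is an added assumption to avoid division by zero.

```python
import numpy as np


def neighbor_offsets_18():
    """Offsets of the 18 nearest neighbors in 3D (assumed: 6 face + 12 edge neighbors)."""
    offsets = []
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                n_nonzero = abs(dz) + abs(dy) + abs(dx)
                if 1 <= n_nonzero <= 2:  # excludes the center and the 8 corners
                    offsets.append((dz, dy, dx))
    return offsets


def bowsher_weights(mr, j, n_keep=4):
    """Return {offset: w_jk} for an interior voxel j of the MR image.

    w_jk is 1 for the n_keep neighbors whose MR intensity is closest to that
    of voxel j (absolute intensity difference), and 0 otherwise.
    """
    offsets = neighbor_offsets_18()
    diffs = [abs(mr[tuple(np.add(j, off))] - mr[j]) for off in offsets]
    order = np.argsort(diffs, kind="stable")
    selected = set(order[:n_keep].tolist())
    return {off: (1 if i in selected else 0) for i, off in enumerate(offsets)}


def relative_difference(u_j, u_k, gamma=0.0, eps=1e-12):
    """Relative difference penalty; eps is an assumption to avoid 0/0."""
    return (u_j - u_k) ** 2 / (u_j + u_k + gamma * abs(u_j - u_k) + eps)


def bowsher_penalty_at(pet, mr, j, n_keep=4):
    """Contribution of voxel j to the prior: sum_k w_jk * phi(u_j, u_k)."""
    weights = bowsher_weights(mr, j, n_keep)
    return sum(
        w * relative_difference(pet[j], pet[tuple(np.add(j, off))])
        for off, w in weights.items()
    )
```

Because the weights depend only on the MR image, they can be computed once before the iterations start; only the penalty values change as the PET estimate is updated.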

Data sets

The numbers of data sets that were used in this work to train, validate and test our convolutional neural network are listed in detail in Table 1. For training and validation (monitoring of potential over-fitting and adaptation of the learning rate), 26 and 6 subject acquisitions, respectively, with two different tracers ([18 F]FDG - tracer for glucose metabolism, [18 F]PE2I - tracer for dopamine transporter) from two different PET/MRI scanners - the Biograph mMR (Siemens Healthcare, Erlangen, Germany) (Delso et al., 2011) and the GE SIGNA PET/MR (GE Healthcare, Milwaukee, US) (Grant et al., 2016) - were used. For the mMR [18 F]FDG acquisitions, 1 min and 3 min data sets were created in addition to the 20 min data sets to generate [18 F]FDG OSEM input images with different noise levels. Note that in the training phase of the CNN, the Bowsher reconstructions derived from the full 20 min data were always used as target, which means that our network is also implicitly trained to perform a “low counts to high counts” mapping. Evaluation (testing) of the trained network was performed on 54 cases acquired with the two tracers used for training and on 7 additional cases acquired with [18 F]FET (tracer for amino acid metabolism in brain tumors), a tracer that was not used during training.

Table 1

Summary of subjects and acquisitions used for training, validation, and testing of our network.

Cohort	Tracer	Scanner	n	Acquisition time	Frame center

Training	[¹⁸ F]FDG	mMR	13	1 min, 3 min, 20 min	50 min p.i.
		SIGNA	3	20 min	50 min p.i.
	[¹⁸ F]PE2I	SIGNA	10	20 min	50 min p.i.
Validation	[¹⁸ F]FDG	mMR	2	1 min, 3 min, 20 min	50 min p.i.
		SIGNA	1	20 min	50 min p.i.
	[¹⁸ F]PE2I	SIGNA	3	20 min	50 min p.i.
Testing	[¹⁸ F]FDG	mMR	36	20 min	50 min p.i.
	[¹⁸ F]PE2I	SIGNA	18	20 min	50 min p.i.
	[¹⁸ F]FET	SIGNA	7	25 min	72.5 min p.i.

All data were retrospectively collected with approval from the NYU institutional review board and the Ethical Committee of UZ Leuven, respectively.

Data acquisition and image reconstruction

Before training, validation, and testing of our convolutional neural network, conventional ordered subset expectation maximization (OSEM) PET images and anatomically-guided PET images using the asymmetric Bowsher prior with a prior weight of β = 10 were reconstructed using in-house developed prototype reconstruction software. The prior weight was chosen in a previous study based on FDG brain images by an experienced radiologist. Note that, depending on the clinical task, β = 10 is not necessarily the optimal regularization weight for non-FDG tracers. However, careful tuning of the regularization weight β is beyond the scope of this work, and, moreover, network training can be repeated with target reconstructions that used a different level of regularization. All PET reconstruction parameters are listed in Table 2. The forward and back projections were implemented using the distance driven approach (De Man and Basu, 2004). For the OSEM images, no post smoothing was applied.

Table 2

Summary of parameters used in the OSEM and anatomically-guided Bowsher reconstructions for the mMR and SIGNA cases.

Scanner	Reconstruction	Iterations	Subsets	Voxel size	TOF resolution	β	Resolution modeling

mMR	OSEM	3	21	1.04 × 1.04 × 2.03 mm³	-	-	4.5 mm Gaussian convolution in sinogram space (radial and transaxial)
	Bowsher	10	42	1.04 × 1.04 × 2.03 mm³	-	10	4.5 mm Gaussian convolution in sinogram space (radial and transaxial)
SIGNA	OSEM	3	28	1.39 × 1.39 × 1.39 mm³	450 ps	-	4.5 mm Gaussian convolution in image space
SIGNA	Bowsher	10	28	1.39 × 1.39 × 1.39 mm³	450 ps	10	4.5 mm Gaussian convolution in image space

For the anatomically-guided PET reconstructions, MPRAGE (sequence parameters: imaging mode 3D, flip angle 12°, TI 900 ms, TE 2.75 ms, TR 2400 ms, voxel size 1 × 1 × 1 mm³) and BRAVO (sequence parameters: imaging mode 3D, flip angle 12°, TI 450 ms, TE 3.2 ms, TR 9.5 ms, voxel size 0.7 × 1 × 1 mm³) MRI acquisitions were used as prior images for the mMR and SIGNA data sets, respectively. To mitigate the influence of small residual misalignments, e.g. due to patient motion, the anatomical prior MR image was always rigidly coregistered to the OSEM PET reconstruction by maximizing mutual information using in-house developed software.

Convolutional neural network design

Fig. 1 shows the network architecture chosen to learn anatomically-guided PET reconstruction in image space. The network takes two 3D images (OSEM PET and structural MRI) of size (n1, n2, n3) as input and outputs an anatomically-guided PET reconstruction of the same size.

Fig. 1.

Architecture of our convolutional neural network to predict a 3D anatomically-guided PET reconstruction from an input 3D OSEM PET image and a 3D structural MR image. See text for details.

The network consists of eight 3D convolutional layers using 3×3×3 kernels and 30 features, each followed by a parametric rectified linear unit (PReLU). The features of the first convolutional layer are split into 15 PET-only and 15 MRI-only features, which are concatenated before the second convolutional layer. A final 3D convolutional layer using a 1×1×1 kernel and a PReLU was added to guarantee that the size of the 3D output image is the same as the size of the input. After the last convolutional layer, the PET input is added to the output such that the network learns to predict the residual between the OSEM input PET and the anatomically-guided target PET image. Note that since the spatial size of the output of every layer is kept constant, which is achieved by padding with zeros before applying the convolutions, 3D images of arbitrary size can be given as input to the network. This, in turn, means that the network can be trained on small patches, while the predictions for the test cases can be done on the whole images, provided that enough memory is available. Transforming an OSEM PET image with the help of a structural MR image into an anatomically-guided PET image is related to a constrained deblurring and denoising problem. The latter is a local problem and shift-invariant if we assume that the OSEM PET point spread function is shift-invariant, which holds for the case of brain PET imaging with a whole-body PET system. Hence, we decided to use a purely convolutional shift-invariant neural network without any fully connected, downsampling or upsampling layers. The fact that our problem is local and that the network is shift-invariant makes it possible to train the network on relatively small 3D patches. This has the advantage that, first, the training becomes much faster, and second, that we can create many small training patches from every single PET/MRI acquisition. This, in turn, allows the network to be trained on a relatively small number of PET/MRI acquisitions. 
In total, our network has 171421 free parameters that were optimized during training.
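As a rough sanity check of the architecture described above, the parameter count and the receptive field can be tallied by hand. The sketch below is only an approximation under stated assumptions: every convolution is assumed to carry a bias term, and every PReLU is assumed to have one learnable slope per feature channel. The text does not specify these details, so exact agreement with the reported number of free parameters is not expected.

```python
def conv3d_params(in_channels, out_channels, kernel):
    """Weights plus biases of a 3D convolution with a cubic kernel (bias assumed)."""
    return kernel ** 3 * in_channels * out_channels + out_channels


def prelu_params(channels):
    """Assumption: one learnable PReLU slope per feature channel."""
    return channels


n_params = 0
# Layer 1: separate PET and MR branches, each mapping 1 channel to 15 features.
n_params += 2 * (conv3d_params(1, 15, 3) + prelu_params(15))
# Layers 2 to 8: seven 3x3x3 convolutions mapping 30 features to 30 features.
n_params += 7 * (conv3d_params(30, 30, 3) + prelu_params(30))
# Final 1x1x1 convolution mapping 30 features to a single output channel.
n_params += conv3d_params(30, 1, 1) + prelu_params(1)

# Each 3x3x3 layer widens the receptive field by 2 voxels per dimension;
# the 1x1x1 layer does not widen it.
receptive_field = 1 + 8 * (3 - 1)

print(n_params, receptive_field)
```

Under these assumptions the tally gives 171422 parameters, within one of the reported value, and a receptive field of 17 voxels per dimension. With a 1 mm voxel grid, each output voxel therefore depends only on input voxels within 8 mm along each axis, which is consistent with treating the problem as local.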

Data preprocessing, augmentation, and CNN training

To train our convolutional neural network, pairs of input OSEM PET and structural MR images and target anatomically-guided PET reconstructions using the asymmetric Bowsher prior (BOW) reconstructed with a prior weight of β = 10 were used. Before training and prediction, all OSEM PET and MR input images were trilinearly interpolated to a common voxel size of 1×1×1 mm³ and normalized by dividing each image by its 99.9th percentile.[2] Moreover, the field of view was cropped to the bounding box of the head in the MR image. For training, the target BOW images were interpolated and cropped in the same way, and the normalization factor of the corresponding input OSEM image was applied. Data augmentation was performed by randomly flipping the contrast of the input MR image, randomly applying spatial flipping in all 3 directions of input and target images, and by using [18 F]FDG OSEM images with different acquisition times and thus different noise levels. These data augmentation techniques were used to obtain a trained network that is robust against the input MRI contrast, the patient anatomy, and the noise level of the input OSEM PET image. Training was performed in batches of 64 small 3D patches of size (29,29,29). The data set and the location of every 3D patch were randomly sampled in all batches. The mean square error loss function was optimized using the Adam optimizer with an initial learning rate of 10⁻³ and 10,000 epochs with 20 steps per epoch, resulting in 200,000 updates of the network weights. Note that since the patch extraction is completely randomized, the definition of an epoch is arbitrary. The learning rate was reduced by a factor of 2 as soon as the validation loss reached a plateau, but limited to be at least 10⁻⁴. The network weights were initialized with random numbers drawn from a normal distribution with a standard deviation of 0.07. 
Training of the network was implemented in Keras v2.2.2 (Chollet, 2015) with tensorflow v1.9 (Abadi et al., 2015) backend and performed on four Nvidia P100 GPUs resulting in a training time of approximately 8.5 h. Supplementary Fig. S9 shows the training and validation loss as a function of the epoch.
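The preprocessing and augmentation steps described above can be sketched with NumPy as follows. This is an illustrative sketch, not the training code used in this work: the function names are hypothetical, the images are assumed to be already interpolated to the common grid and cropped, and "flipping the contrast" of the MR image is assumed to mean inverting its intensities about the patch maximum.

```python
import numpy as np


def normalize(img):
    """Divide an image by its 99.9th percentile and return the scale used."""
    scale = np.percentile(img, 99.9)
    return img / scale, scale


def sample_training_patch(osem, mr, bow, rng, patch=29):
    """Draw one augmented (input, target) patch pair from aligned 3D images."""
    # Normalization is redone per call for brevity; in practice it would be cached.
    osem_n, osem_scale = normalize(osem)
    mr_n, _ = normalize(mr)
    bow_n = bow / osem_scale  # the target uses the OSEM normalization factor

    # Random patch location inside the volume.
    start = [int(rng.integers(0, s - patch + 1)) for s in osem.shape]
    sl = tuple(slice(a, a + patch) for a in start)
    x_pet, x_mr, y = osem_n[sl], mr_n[sl], bow_n[sl]

    # Random MR contrast flip (assumption: intensity inversion).
    if rng.random() < 0.5:
        x_mr = x_mr.max() - x_mr

    # Random spatial flips, applied identically to inputs and target.
    for axis in range(3):
        if rng.random() < 0.5:
            x_pet, x_mr, y = (np.flip(a, axis) for a in (x_pet, x_mr, y))

    # Channels-last layout: two input channels (PET, MR), one target channel.
    return np.stack([x_pet, x_mr], axis=-1), y[..., np.newaxis]
```

A batch of 64 such pairs would then be assembled by calling the function repeatedly with a randomly chosen data set, including the shorter [18 F]FDG acquisitions as inputs with the 20 min Bowsher reconstruction as target.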

Quantitative regional analysis

The quality of the anatomically-guided PET images predicted by the CNN (BOWCNN) with respect to the ground truth of the iteratively reconstructed anatomically-guided PET images (BOW) was evaluated based on two metrics. First, we computed regional recovery coefficients in different regions of interest (ROIs) defined by Freesurfer v6 (Fischl et al., 2004) segmentations of the structural MR images as

$$\mathrm{RC}_{\mathrm{mean}} = \frac{\mathrm{ROI}_{\mathrm{mean}}(\mathrm{BOW}_{\mathrm{CNN}})}{\mathrm{ROI}_{\mathrm{mean}}(\mathrm{BOW})}$$

where ROImean(x) denotes the mean signal intensity of image x in a given ROI. Second, we calculated the average regional structural similarity as

$$\mathrm{SSIM}_{\mathrm{mean}} = \mathrm{ROI}_{\mathrm{mean}}\big(\mathrm{SSIM}(\mathrm{BOW}_{\mathrm{CNN}}, \mathrm{BOW})\big)$$

where SSIM(BOWCNN, BOW) denotes the structural similarity image obtained by calculating the structural similarity between BOWCNN and BOW. The voxel-wise structural similarity image was computed using the Python package scikit-image version 0.14 with parameters chosen to match the original SSIM definition proposed in Wang et al. (2004). The dynamic range L of the input floating-point images, needed to calculate the constants c1 = (0.01 L)² and c2 = (0.03 L)² in the SSIM definition, was set to twice the maximum of the reference BOW image. In total, 5490 ROIs in 61 test subjects were analyzed. To summarize the results, the ROIs were grouped, based on their anatomical location, into the following composite regions: frontal cortex, temporal cortex, occipital cortex, parietal cortex, hippocampus, cingulate cortex, thalamus, basal ganglia, cerebellum, white matter, and ventricle.
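The two regional metrics can be sketched in a few lines of NumPy. The function names below are hypothetical, and the voxel-wise SSIM image is assumed to have been computed beforehand (for instance with scikit-image); only the regional averaging and the SSIM constants described above are shown.

```python
import numpy as np


def roi_mean(img, roi_mask):
    """Mean intensity of img inside a boolean ROI mask."""
    return float(img[roi_mask].mean())


def rc_mean(bow_cnn, bow, roi_mask):
    """Regional recovery coefficient of the CNN prediction relative to BOW."""
    return roi_mean(bow_cnn, roi_mask) / roi_mean(bow, roi_mask)


def ssim_mean(ssim_map, roi_mask):
    """Regional average of a precomputed voxel-wise SSIM image."""
    return roi_mean(ssim_map, roi_mask)


def ssim_constants(bow):
    """SSIM constants c1, c2 with dynamic range L = 2 * max(BOW)."""
    dynamic_range = 2.0 * float(bow.max())
    return (0.01 * dynamic_range) ** 2, (0.03 * dynamic_range) ** 2
```

An RCmean of 1 means that the regional mean of the prediction matches that of the reference reconstruction. Note that 5490 ROIs over 61 subjects corresponds to 90 ROIs per subject.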

Additional network robustness tests

To test the robustness of our trained CNN with respect to the noise level of the input OSEM PET images and the contrast of the input structural MR image, three additional tests were performed on two of the [18 F]FDG data sets. First, predictions from 1 min, 3 min, and 20 min input OSEM PET images and the same T1-weighted structural MR image were compared. Second, predictions of a 20 min OSEM PET image and two types of input MRI contrasts (T1-weighted vs. FLAIR) were analyzed. Moreover, we performed predictions where we artificially degraded the image quality of the anatomical prior image by removing contrast (using a flat MR image), shifting the MR by 2.8 mm in the left-right direction, introducing a susceptibility artifact in the MR, and replacing the MR by a pseudo-CT generated from the MR (Burgos et al., 2014). Finally, we also performed a prediction for a 10 min PET/MR acquisition of the cervical spine with [18 F]DPA-714 to test the performance in an acquisition outside the brain.

Results

Fig. 2 and supplementary Table S1 show the results for RCmean and SSIM between BOWCNN and BOW evaluated in the test cases for three different PET tracers. For all three tracers and all regions except the ventricles in [18 F]PE2I, the regionally averaged RCmean ranges from 0.98 to 1.03 with a standard deviation below 0.03. For [18 F]PE2I, the ventricles show a small positive bias in the RCmean (1.08 ± 0.05). However, it should be noted that the [18 F]PE2I uptake in the ventricles is extremely low. Only 243 out of the 5490 analyzed ROIs (4.4%) showed more than 5% bias in the RCmean. Out of those, 76 ROIs were ventricles with low tracer uptake.

Fig. 2.

Boxplots of regional values for RCmean (top) and SSIMmean (bottom) between the BOWCNN and BOW in the [18 F]FDG (blue), [18 F]PE2I (orange), and [18 F]FET (green) test cases.

The regionally averaged SSIM between BOWCNN and BOW ranges from 0.958 in the thalamus to 0.974 in white matter for [18 F]FDG, from 0.963 in the basal ganglia to 0.993 in the cingulate cortex for [18 F]PE2I, and from 0.983 in the frontal cortex to 0.987 in white matter for [18 F]FET indicating that the image quality of BOWCNN is very close to BOW in all regions for all tracers. Note that no [18 F]FET cases were used during training. Fig. 3 shows one of the [18 F]FDG test cases acquired on the mMR. The visual appearance of BOWCNN compared to BOW is very similar with both showing better contrast between gray matter and white matter and fewer Gibbs artifacts in the cortical gray matter compared to the input OSEM image. Moreover, BOW and BOWCNN show more anatomical detail compared to OSEM (e.g. differences in tracer uptake in the external capsule/claustrum between the insula and the putamen indicated by the red arrow).

Fig. 3.

Example [18 F]FDG test case acquired on the mMR. (top row) structural T1-weighted MRI used as prior image in the iterative anatomically-guided PET reconstruction using the Bowsher prior. (2nd row) Standard OSEM PET reconstruction obtained from 20 min emission data. (3rd row) reference iterative anatomically-guided PET reconstruction using the Bowsher prior (BOW). (4th row) prediction of our trained convolutional neural network (BOWCNN) using the OSEM PET image and the structural MRI as input. (5th row) absolute difference between BOWCNN and BOW. The red arrow indicates the location of the right claustrum between the insula and putamen where BOW and BOWCNN show more anatomical detail compared to OSEM. The blue arrow shows a region of fringing artifacts in BOW that are less apparent in BOWCNN.

In the coronal slice, the target BOW shows small fringing artifacts (indicated by the blue arrow) that are due to suboptimal alignment of the structural MRI that was used during the anatomically-guided iterative PET reconstruction using the Bowsher prior. Those artifacts are strongly reduced in BOWCNN. Supplementary Figure S3 shows an [18 F]FDG acquisition with the SIGNA PET/MRI scanner. Again, the visual appearance and regional quantification of BOWCNN is very similar to BOW, although the noise level and the noise correlations of the input OSEM image are slightly different compared to the mMR example due to lower injected activity and the availability of time-of-flight information during reconstruction. Note that 3 [18 F]FDG data sets from the SIGNA also were used during training. Figs. 4 and 5 show two examples of network predictions for non-FDG data sets ([18 F]PE2I and [18 F]FET) acquired on the SIGNA. Again for both tracers, the similarity in the visual appearance between BOWCNN and BOW is high. Moreover, BOWCNN and BOW both show much lower noise levels compared to OSEM while preserving details and anatomical boundaries. While these results might be expected for the [18 F]PE2I cases, the promising results for the [18 F]FET cases are rather surprising since, first, no [18 F]FET data sets were used during training, and second, the regional contrasts in the [18 F]FET cases are substantially different from the [18 F]FDG and [18 F]PE2I cases used for training. Moreover, supplementary Fig. S1 shows the result of our trained CNN applied to a single [11 C]PIB image (20 min acquisition, 42 min p.i., 351 MBq) where very similar promising results were obtained in a control subject without amyloid binding (only aspecific white matter binding is seen).

Fig. 4.

Same as Fig. 3 for a [18 F]PE2I test case acquired on the SIGNA.

Fig. 5.

Same as Fig. 3 for a [18 F]FET test case acquired on the SIGNA. In this case the acquisition time was 25 min.

Fig. 6 shows predictions of the same [18 F]FDG data set as in Fig. 3 for different noise levels of the input OSEM image originating from reconstructions of 20 min, 3 min, and 1 min of emission data. In terms of regional contrast and detail, the predicted BOWCNN images are very similar to the target (the reconstructed Bowsher image from 20 min of emission data). However, it can be seen that a noise cluster visible in the anterior right putamen of the 1 min and 3 min OSEM images, indicated by the red arrows, leads to a small focus with slightly increased signal in the BOWCNN images predicted from the 1 min and 3 min OSEM images, which is seen neither in the BOWCNN nor in the BOW from the 20 min data. All in all, this additional test indicates that in general the output of the network is very robust against the input noise level of the OSEM images, which was achieved by using data sets with different noise levels during training.

Fig. 6.

Impact of noise level in the input OSEM PET image on the image quality of the predicted anatomically-guided PET image (BOWCNN). The case shown here is the same as in Fig. 3. (top row left) structural T1-weighted MRI used as prior image in the iterative anatomically-guided PET reconstruction using the Bowsher prior. (top row right) reference iterative anatomically-guided PET reconstruction using the Bowsher prior (BOW). (2nd to 4th row left) OSEM PET reconstruction obtained from 20 min, 3 min, and 1 min of emission data. (2nd to 4th row right) corresponding predictions of our trained convolutional neural network (BOWCNN) using the respective OSEM PET image and the structural MRI as input. Note that although the noise level of the input OSEM images varies a lot, the noise level and the level of detail in the BOWCNN images are remarkably constant and comparable to the BOW image of the full 20 min emission data. The red arrows indicate a noise cluster in the 1 min and 3 min OSEM images that leads to a small focus with slightly increased signal in the BOWCNN images predicted from those OSEM images, which is seen neither in the BOWCNN nor in the BOW from the 20 min data.

Supplementary Figure S4 shows two predictions using the same [18F]FDG input OSEM image with two different MRI contrasts (T1-weighted and FLAIR), together with the reconstructed Bowsher images using the T1-weighted and the FLAIR MR image as structural prior image. Comparing the BOWCNN images predicted from the T1-weighted and FLAIR MR images shows that, while the overall image quality is quite similar, the gyri in the FLAIR BOWCNN image are slightly wider and softer. However, this is also the case for the reconstructed Bowsher image using the FLAIR MRI as structural prior image (shown in the bottom right) and can be attributed to the slightly wider and more homogeneous appearance of the gyri in the FLAIR MRI compared to the T1-weighted MRI. This appearance is a direct result of T2 decay during the much longer readout used for a FLAIR sequence, which is a fast spin echo based sequence. In contrast to the T1 BOWCNN image, the FLAIR BOWCNN image shows the locally increased uptake in the red nuclei in the midbrain much better, as indicated by the red arrows. This might be because the red nuclei show only very low contrast in the T1-weighted MRI while having very good contrast in the FLAIR MRI due to their high iron content.

The results shown in supplementary Figure S2 demonstrate that the output of our trained CNN is very close to the input OSEM image when no MR information (constant MR image as input) is used. In this case, the prediction actually shows increased Gibbs overshoots similar to an iterative deblurring without any regularization. This, in turn, indicates that the trained CNN did not learn specific anatomical patterns seen in the input OSEM PET images of the training cohort, but rather performs an anatomically-guided deconvolution and denoising.

Supplementary Figs. S5, S6, S7, and S8 demonstrate that the amount and structure of the bias introduced by artificially degrading the image quality of the anatomical prior image in several ways are very comparable between BOW and BOWCNN. Note that BOWCNN slightly ameliorates high-frequency artifacts introduced by misregistration. Supplementary Fig. S10 shows that the image quality of BOW and BOWCNN is also very similar for a non-brain image. However, in this non-brain case, acquired with another tracer not used during training, BOWCNN is slightly smoother than BOW.

Prediction time

Table 3 shows the prediction time of the trained network as a function of the input size for an Intel(R) Xeon(R) CPU E5-2699 v4 and an NVIDIA Tesla P100 SXM2 GPU. For typical brain image sizes of (220,220,220) on a 1×1×1 mm³ voxel grid, the prediction time is roughly 1 min on the Intel Xeon E5-2699 CPU and roughly 1 s on a single NVIDIA Tesla P100 GPU.

Table 3

Prediction time of the trained network as a function of input size for an Intel(R) Xeon(R) CPU E5-2699 v4 and an NVIDIA Tesla P100 SXM2 GPU. Mean and standard deviation over seven predictions are given.

Input size	Prediction time (ms), Intel(R) Xeon(R) CPU E5-2699 v4	Prediction time (ms), NVIDIA Tesla P100 SXM2 GPU
(10,10,10,2)	14 ± 1	4 ± 0.5
(50,50,50,2)	826 ± 38	15 ± 0.1
(100,100,100,2)	5170 ± 118	100 ± 0.7
(150,150,150,2)	16900 ± 455	360 ± 3.7
(200,200,200,2)	41100 ± 545	851 ± 5
(250,250,250,2)	77000 ± 1680	1720 ± 10
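The timings in Table 3 can be turned into GPU-over-CPU speed-up factors and a rough estimate for the typical (220,220,220) brain volume. The following is a minimal sketch, not part of the original work: the mean values are copied from Table 3, while the helper names `speedup` and `interpolate_ms` and the choice of log-log interpolation between the two nearest tabulated sizes are assumptions made for illustration.

```python
import math

CPU, GPU = 0, 1  # column indices into the tuples below

# Mean prediction times in ms from Table 3, keyed by the spatial edge length.
TABLE3_MS = {
    10: (14.0, 4.0),
    50: (826.0, 15.0),
    100: (5170.0, 100.0),
    150: (16900.0, 360.0),
    200: (41100.0, 851.0),
    250: (77000.0, 1720.0),
}


def speedup(edge):
    """Ratio of mean CPU time to mean GPU time for a tabulated edge length."""
    cpu_ms, gpu_ms = TABLE3_MS[edge]
    return cpu_ms / gpu_ms


def interpolate_ms(edge, device, lo=200, hi=250):
    """Estimate the time for an untabulated edge length.

    Assumes (hypothetically) that time follows a power law in the edge
    length between the tabulated sizes `lo` and `hi`.
    """
    t_lo = TABLE3_MS[lo][device]
    t_hi = TABLE3_MS[hi][device]
    exponent = math.log(t_hi / t_lo) / math.log(hi / lo)
    return t_lo * (edge / lo) ** exponent


if __name__ == "__main__":
    for edge in sorted(TABLE3_MS):
        print(f"edge {edge:3d}: GPU speed-up {speedup(edge):5.1f}x")
    print(f"CPU estimate at 220: {interpolate_ms(220, CPU) / 1000:.1f} s")
    print(f"GPU estimate at 220: {interpolate_ms(220, GPU) / 1000:.2f} s")
```

Under these assumptions the interpolated values are consistent with the "roughly 1 min" (CPU) and "roughly 1 s" (GPU) figures quoted in the text.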

Discussion

The results obtained in this proof of concept study demonstrate that a purely image-based shift-invariant CNN can be used to approximate the results from anatomically-guided brain PET reconstruction using the asymmetric Bowsher prior. The predictions of our trained network preserve regional quantification compared to the anatomically-guided PET images obtained from iterative model-based reconstruction. Moreover, the similarity in image quality between the predictions and the ground truth (the anatomically-guided PET reconstruction obtained with model-based iterative reconstruction) is also very high, as shown in our examples and quantified by the regional structural similarity (see Fig. 2). Interestingly, our network also performed well on tracers not used during training, such as the seven test cases acquired with [18F]FET and the single case acquired with [11C]PIB (see Figs. 2 and 5, and supplementary Fig. S1). Those results, together with the fact that the local contrast of [18F]FET and [11C]PIB is different from that of the two tracers used during training, indicate that the network did not learn the typical local contrasts and uptake patterns seen during training, but rather learned how to perform the anatomically-constrained deconvolution and denoising of the input OSEM PET images. As shown in Fig. 6 and supplementary Fig. S4, the image quality of the predicted images of our trained network is robust against the input MRI contrast and the noise level of the input PET images. Both of these desirable properties were achieved by using dedicated data augmentation techniques during training, namely, random contrast flipping of the training MR images and training on OSEM PET images with different count (noise) levels. Moreover, training on data from two different PET/MRI scanners resulted in a trained network that is robust against small differences in the point spread function and noise correlations of the input OSEM PET images.
Note that in our CNN training, BOW images reconstructed from the highest available counts (20 min) with a fixed prior strength were used as target, even for input OSEM images reconstructed from lower count data (1 min, 3 min). In that sense, the behavior of BOWCNN differs from that of BOW for low count data, where the latter leads to more noise in the image for a fixed prior strength. As shown in Fig. 6, this training strategy leads to a CNN whose output noise level depends less on the input noise level of the OSEM images. We believe that this training strategy is desirable for clinical applications. Moreover, we are convinced that, by changing the target BOW images during training, a CNN that replicates the exact behavior of BOW in terms of bias and noise for a fixed prior strength, also for low count data, could be trained if needed. Note that the differences in the point spread function and noise correlations between the OSEM images from the SIGNA and the mMR are due to slightly different crystal sizes, the availability of ca. 400 ps time-of-flight resolution in the SIGNA, and the number of iterations and subsets used in the OSEM reconstructions. Overall, these results suggest that our trained model should also perform well on tracers that we have not tested in this proof of concept work. More extensive validation of this hypothesis will be part of future work. Using a purely convolutional shift-invariant network allowed us to train the network on small 3D patches. This in turn meant that many small training patches carrying different information could be obtained from a single PET/MRI acquisition, which enabled us to train the network on a relatively small number of PET/MRI acquisitions (26 in total). This property is important since, to the best of our knowledge, large pooled PET/MRI databases containing PET emission raw data and structural MRI data that can be used to generate large sets of training pairs do not exist so far.
We expect that the performance of the network can be improved even further by using more training data sets. Note that with our implementation, training of our network took ca. 8.5 h, such that retraining on new data can be performed overnight. We believe that the very fast prediction time (ca. 1 s on a recent GPU) is the biggest advantage of using a convolutional neural network to predict anatomically-guided PET reconstructions in image space. This means that, first, there is no clinically meaningful time delay to generate an anatomically-guided PET reconstruction next to the standard OSEM PET images. Second, those ultra-fast prediction times allow the generation of anatomically-guided PET reconstructions with different levels of regularization (from different trained networks) in almost real-time. Consequently, it would be possible to quickly switch between PET images with different levels of MRI-based regularization, which will help clinical readers to understand the influence of MRI-based regularization and, moreover, help to tune the level of regularization for different local clinical applications. Because our network can be applied retrospectively to reconstructed DICOM images, it will facilitate the execution of bigger multi-center studies to assess the value of anatomically-guided PET images for different clinical applications.
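The patch-based training and the MR contrast flipping discussed above can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function name `sample_training_patch`, the patch size of 29 voxels, the flip probability, and the specific flip formula (MR maximum minus patch value) are hypothetical choices made for illustration. The two-channel input layout (OSEM and MR stacked along the last axis) follows the input sizes listed in Table 3.

```python
import numpy as np


def sample_training_patch(osem, mr, bow, patch_size=29, flip_prob=0.5, rng=None):
    """Draw one random 3D training pair from co-registered volumes.

    osem, mr, bow: 3D arrays of identical shape (input PET, structural MR,
    target Bowsher reconstruction). patch_size and flip_prob are
    illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Pick one random corner and cut the same sub-volume from all images.
    corner = [int(rng.integers(0, s - patch_size + 1)) for s in osem.shape]
    sl = tuple(slice(c, c + patch_size) for c in corner)
    osem_p, mr_p, bow_p = osem[sl], mr[sl], bow[sl]

    # Random contrast flipping of the MR patch (assumed form: max - value),
    # so that the network cannot rely on one fixed MR contrast.
    if rng.random() < flip_prob:
        mr_p = mr.max() - mr_p

    x = np.stack([osem_p, mr_p], axis=-1).astype(np.float32)  # (n, n, n, 2)
    y = bow_p[..., np.newaxis].astype(np.float32)             # (n, n, n, 1)
    return x, y


if __name__ == "__main__":
    vol_rng = np.random.default_rng(42)
    osem = vol_rng.random((64, 64, 64))
    mr = vol_rng.random((64, 64, 64))
    bow = vol_rng.random((64, 64, 64))
    x, y = sample_training_patch(osem, mr, bow)
    print(x.shape, y.shape)
```

Because a purely convolutional network is shift-invariant, a model trained on such small patches can be applied to full-size volumes at prediction time. Many patches drawn from one acquisition each contribute a distinct training pair.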

Limitations

Our proof of concept study has a few limitations. First, the similarity in image quality between the anatomically-guided PET images obtained from the network prediction and from model-based iterative reconstruction was assessed by two mathematical quantities and not by an observer study using a well-defined clinical application. However, given the very small differences documented by these measures and the improved sensitivity of the anatomically-guided PET reconstruction for the detection of pathology (Shepherd et al., 2019), we foresee that the clinical benefits of our proposed CNN approach will be readily apparent. We plan to present the results of an observer study evaluating the performance of anatomically-guided PET images predicted from our network in the context of brain [18F]FDG in epilepsy in upcoming work. Second, in this work, we only evaluated a network trained on anatomically-guided PET reconstructions using one specific type of segmentation-free MRI-based prior, namely, the asymmetric Bowsher prior. This prior was chosen based on a previous study (Schramm et al., 2017), where we could show that it has slightly superior performance compared to the parallel level sets prior (Ehrhardt et al., 2016). However, even though we did not investigate additional priors in our study, we expect that a network trained to predict PET images reconstructed with a different prior (such as parallel level sets) can attain similar performance. Third, in this work we only trained and thoroughly tested our network on brain images. Extensive validation of applications of the network to regions outside the brain and head will be part of future work. Until then, we recommend using the CNN of this work, which was trained only on brain images, only for the prediction of brain acquisitions. Finally, the performance of the trained network on different tracers is of course influenced by the number of and balance between tracers or PET contrasts used during training. In our case, the training data set was dominated by FDG images. To achieve even better performance for an application using a specific single tracer only, retraining the network with a different set of training tracers might be desirable.

Conclusion

We demonstrated that anatomically-guided model-based iterative PET reconstruction can be well approximated by a purely shift-invariant convolutional neural network in image space, allowing the generation of anatomically-guided PET images in almost real-time. Using dedicated data augmentation techniques during training resulted in a CNN that is robust against the PET tracer used, the noise level of the input PET images, and the input MRI contrast.

Data and code availability

The trained network, the implementation of the network training, and examples of how to do predictions (from clinical DICOM and NIfTI images) will be provided online at https://gschramm.github.io/pyapetnet/ and can be used for non-commercial research purposes upon acceptance of this article for publication. The training, validation and test data sets cannot be made available due to privacy protection restrictions of our institutions.

31 in total

1. Clinically feasible reconstruction of 3D whole-body PET/CT data using blurred anatomical labels.

Authors: Claude Comtat; Paul E Kinahan; Jeffrey A Fessler; Thomas Beyer; David W Townsend; Michel Defrise; Christian Michel
Journal: Phys Med Biol Date: 2002-01-07 Impact factor: 3.609

2. Evaluation of three MRI-based anatomical priors for quantitative PET brain imaging.

Authors: Kathleen Vunckx; Ameya Atre; Kristof Baete; Anthonin Reilhac; Christophe M Deroose; Koen Van Laere; Johan Nuyts
Journal: IEEE Trans Med Imaging Date: 2011-10-27 Impact factor: 10.048

3. Evaluation of Parallel Level Sets and Bowsher's Method as Segmentation-Free Anatomical Priors for Time-of-Flight PET Reconstruction.

Authors: Georg Schramm; Martin Holler; Ahmadreza Rezaei; Kathleen Vunckx; Florian Knoll; Kristian Bredies; Fernando Boada; Johan Nuyts
Journal: IEEE Trans Med Imaging Date: 2018-02 Impact factor: 10.048

4. PET Image Reconstruction Using Deep Image Prior.

Authors: Kuang Gong; Ciprian Catana; Jinyi Qi; Quanzheng Li
Journal: IEEE Trans Med Imaging Date: 2018-12-19 Impact factor: 10.048

Review 5. A review of partial volume correction techniques for emission tomography and their applications in neurology, cardiology and oncology.

Authors: Kjell Erlandsson; Irène Buvat; P Hendrik Pretorius; Benjamin A Thomas; Brian F Hutton
Journal: Phys Med Biol Date: 2012-10-16 Impact factor: 3.609

6. Artificial Neural Network Enhanced Bayesian PET Image Reconstruction.

Authors: Bao Yang; Leslie Ying; Jing Tang
Journal: IEEE Trans Med Imaging Date: 2018-06 Impact factor: 10.048

7. Performance measurements of the Siemens mMR integrated whole-body PET/MR scanner.

Authors: Gaspar Delso; Sebastian Fürst; Björn Jakoby; Ralf Ladebeck; Carl Ganter; Stephan G Nekolla; Markus Schwaiger; Sibylle I Ziegler
Journal: J Nucl Med Date: 2011-11-11 Impact factor: 10.057

8. DeepPET: A deep encoder-decoder network for directly solving the PET image reconstruction inverse problem.

Authors: Ida Häggström; C Ross Schmidtlein; Gabriele Campanella; Thomas J Fuchs
Journal: Med Image Anal Date: 2019-03-30 Impact factor: 8.545

9. Iterative PET Image Reconstruction Using Convolutional Neural Network Representation.

Authors: Georges El Fakhri
Journal: IEEE Trans Med Imaging Date: 2018-09-12 Impact factor: 10.048

10. Deep reconstruction model for dynamic PET images.

Authors: Jianan Cui; Xin Liu; Yile Wang; Huafeng Liu
Journal: PLoS One Date: 2017-09-21 Impact factor: 3.240

7 in total

Review 1. Applications of artificial intelligence in nuclear medicine image generation.

Authors: Zhibiao Cheng; Junhai Wen; Gang Huang; Jianhua Yan
Journal: Quant Imaging Med Surg Date: 2021-06

Review 2. Application of artificial intelligence in brain molecular imaging.

Authors: Satoshi Minoshima; Donna Cross
Journal: Ann Nucl Med Date: 2022-01-14 Impact factor: 2.668

Review 3. Machine Learning Algorithms in Neuroimaging: An Overview.

Authors: Vittorio Stumpo; Julius M Kernbach; Christiaan H B van Niftrik; Martina Sebök; Jorn Fierstra; Luca Regli; Carlo Serra; Victor E Staartjes
Journal: Acta Neurochir Suppl Date: 2022

Review 4. Artificial Intelligence-Based Image Enhancement in PET Imaging: Noise Reduction and Resolution Enhancement.

Authors: Juan Liu; Masoud Malekzadeh; Niloufar Mirian; Tzu-An Song; Chi Liu; Joyita Dutta
Journal: PET Clin Date: 2021-10

Review 5. Deep learning-based image reconstruction and post-processing methods in positron emission tomography for low-dose imaging and resolution enhancement.

Authors: Cameron Dennis Pain; Gary F Egan; Zhaolin Chen
Journal: Eur J Nucl Med Mol Imaging Date: 2022-03-21 Impact factor: 10.057

6. MR-guided motion-corrected PET image reconstruction for cardiac PET-MR.

Authors: Camila Munoz; Sam Ellis; Stephan G Nekolla; Karl P Kunze; Teresa Vitadello; Radhouene Neji; Rene M Botnar; Julia A Schnabel; Andrew J Reader; Claudia Prieto
Journal: J Nucl Med Date: 2021-05-28 Impact factor: 11.082

7. Super-resolution reconstruction for parallel-beam SPECT based on deep learning and transfer learning: a preliminary simulation study.

Authors: Zhibiao Cheng; Junhai Wen; Jun Zhang; Jianhua Yan
Journal: Ann Transl Med Date: 2022-04
