| Literature DB >> 35522642 |
Li Sun, Junxiang Chen, Yanwu Xu, Mingming Gong, Ke Yu, Kayhan Batmanghelich.
Abstract
Generative Adversarial Networks (GANs) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation. Due to the limited memory of Graphical Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images; these models either cannot scale to high resolution or are prone to patchy artifacts. In this work, we propose a novel end-to-end GAN architecture that can generate high-resolution 3D images. We achieve this goal by using different configurations between training and inference. During training, we adopt a hierarchical structure that simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image. The hierarchical design has two advantages: first, the memory demand for training on high-resolution images is amortized among sub-volumes; second, anchoring the high-resolution sub-volumes to a single low-resolution image ensures anatomical consistency between sub-volumes. During inference, our model can directly generate full high-resolution images. We also incorporate an encoder with a similar hierarchical structure into the model to extract features from the images. Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms the state of the art in image generation. We also demonstrate clinical applications of the proposed model in data augmentation and clinically relevant feature extraction.
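The amortization idea in the abstract can be sketched numerically: during training only a randomly selected slab of axial slices is materialized at high resolution, so per-step activation memory scales with the sub-volume fraction rather than the full volume. A minimal NumPy sketch, with hypothetical sizes (256³ full volume, 32-slice sub-volumes) standing in for the actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes standing in for the paper's setting: a 256^3 target
# volume and high-resolution sub-volumes of 32 axial slices.
FULL, SUB = 256, 32

def select_subvolume(volume, start):
    """Crop a contiguous block of SUB axial slices starting at `start`."""
    return volume[start:start + SUB]

# Stand-in for a full-resolution volume (depth, height, width).
full_volume = rng.standard_normal((FULL, FULL, FULL)).astype(np.float32)

# During training, only a randomly selected sub-volume is materialized
# at high resolution.
start = int(rng.integers(0, FULL - SUB + 1))
sub = select_subvolume(full_volume, start)

# The high-resolution memory cost per step is amortized: SUB / FULL of
# the full volume's activation footprint.
ratio = sub.nbytes / full_volume.nbytes
print(sub.shape, ratio)  # (32, 256, 256) 0.125
```

With these illustrative sizes, each training step touches only one eighth of the high-resolution slices; the low-resolution branch supplies the global context that keeps sub-volumes anatomically consistent.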
Year: 2022 PMID: 35522642 PMCID: PMC9413516 DOI: 10.1109/JBHI.2022.3172976
Source DB: PubMed Journal: IEEE J Biomed Health Inform ISSN: 2168-2194 Impact factor: 7.021
Important Notations in This Paper (Table I)
- The common block of the generator.
- The low-resolution block of the generator.
- The high-resolution block of the generator.
- The discriminator for high-resolution images.
- The discriminator for low-resolution images.
- The high-resolution block of the encoder.
- The ground block of the encoder.
- The high-resolution sub-volume selector.
- The low-resolution sub-volume selector.
- Latent representations.
- Reconstructed latent representations.
- GOLD score.
- The index of the starting slice for sub-volume selection.
- The real high-resolution image.
- The real low-resolution image.
- The generated high-resolution image.
- The generated high-resolution sub-volume starting at the selected slice.
- The generated low-resolution image.
- Intermediate feature maps for the whole image.
- Intermediate feature maps for the sub-volume starting at the selected slice.
- Reconstructed intermediate feature maps for the whole image.
- Reconstructed intermediate feature maps for the selected sub-volume.
- The indices of the starting slices that partition the full volume.
Fig. 1. Left: The architecture of HA-GAN (the encoder is omitted here for clarity). At training time, instead of directly generating the high-resolution full volume, our generator contains two branches for high-resolution sub-volume and low-resolution full-volume generation, respectively. The two branches share the common block G. A sub-volume selector is used to select a part of the intermediate features for sub-volume generation. Right: The schematic of the hierarchical encoder trained with two reconstruction losses, one on the high-resolution sub-volume level (upper right) and another on the low-resolution full-volume level (lower right). The meanings of the notations used can be found in Table I. The model adopts a 3D architecture, with details presented in the Supplementary Material.
Fig. 2. Inference with the hierarchical generator and encoder. Since the memory demand is lower at inference time, we directly forward the input through the high-resolution branch for full-image generation and encoding.
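The train/inference switch in Figs. 1-2 works because the high-resolution block acts locally along the depth axis: decoding a selected slab of the intermediate features (training) agrees with the corresponding slab of a full-volume decode (inference). A toy NumPy sketch, with nearest-neighbour upsampling standing in for the actual high-resolution block and all sizes hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical intermediate feature map from the common block:
# (channels, depth, height, width).
feat = rng.standard_normal((8, 16, 16, 16)).astype(np.float32)
SCALE = 4  # hypothetical upsampling factor of the high-resolution block

def high_res_block(f):
    """Toy stand-in for the high-resolution block: nearest-neighbour upsampling."""
    return f.repeat(SCALE, axis=1).repeat(SCALE, axis=2).repeat(SCALE, axis=3)

# Training configuration: decode only a selected slab of the features.
start = 4  # feature-space index of the selected sub-volume
sub_out = high_res_block(feat[:, start:start + 4])

# Inference configuration: decode the whole feature map in one pass.
full_out = high_res_block(feat)

# Because the block acts slice-wise, the training-time sub-volume matches
# the corresponding slab of the full inference output.
assert np.allclose(sub_out, full_out[:, start * SCALE:(start + 4) * SCALE])
```

Real convolutional blocks have finite receptive fields rather than acting strictly slice-wise, so this equivalence is approximate near slab borders; the sketch only illustrates why one set of weights can serve both configurations.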
Evaluation for Image Synthesis on COPDGene Dataset
| Model | FID↓ (128³) | MMD↓ (128³) | IS↑ (128³) | FID↓ (256³) | MMD↓ (256³) | IS↑ (256³) |
|---|---|---|---|---|---|---|
| WGAN | 0.012±.001 | 0.092±.059 | 1.99±.07 | 0.161±.044 | 0.471±.110 | 1.97±.05 |
| VAE-GAN | 0.139±.002 | 1.065±.008 | 1.19±.03 | 0.328±.007 | 1.028±.008 | 1.18±.03 |
|  | 0.010±.004 | 0.089±.056 | 1.89±.04 | 0.043±.094 | 0.323±.080 | 1.96±.03 |
| Progressive GAN | 0.015±.007 | 0.150±.072 | 1.75±.11 | 0.107±.037 | 0.287±.123 | 1.76±.11 |
| StyleGAN 2 | 0.011±.001 | 0.071±.002 | 2.03±.02 | 0.081±.003 | 0.225±.008 | 2.06±.01 |
| CCE-GAN | 0.010±.004 | 0.087±.039 | 1.97±.05 | 0.074±.038 | 0.252±.116 | 1.95±.04 |
| HA-GAN |  |  |  |  |  |  |
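The tables report FID, MMD, and IS between real and synthetic images. As one concrete example, MMD can be estimated from extracted feature vectors with a linear kernel; this is a minimal sketch on synthetic data, since the paper's exact kernel and feature extractor are not specified in this record:

```python
import numpy as np

rng = np.random.default_rng(0)

def mmd2_linear(x, y):
    """Squared MMD with a linear kernel between two feature sets (n, d):
    the squared distance between the empirical feature means."""
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)

# Hypothetical 64-d features extracted from real and synthetic volumes.
real = rng.standard_normal((200, 64))
matched = rng.standard_normal((200, 64))        # same distribution as `real`
shifted = rng.standard_normal((200, 64)) + 0.5  # mean-shifted distribution

mmd_close = mmd2_linear(real, matched)
mmd_far = mmd2_linear(real, shifted)
assert mmd_far > mmd_close  # a larger distribution gap gives a larger MMD
```

Lower MMD between features of synthetic and real images indicates that the generator's output distribution is closer to the data distribution, which is why MMD↓ is marked "lower is better" in the tables.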
Evaluation for Image Synthesis on GSP Dataset
| Model | FID↓ (128³) | MMD↓ (128³) | IS↑ (128³) | FID↓ (256³) | MMD↓ (256³) | IS↑ (256³) |
|---|---|---|---|---|---|---|
| WGAN | 0.006±.002 | 0.406±.143 | 1.37±.02 | 0.025±.013 | 0.328±.139 | 1.43±.03 |
| VAE-GAN | 0.075±.004 | 0.667±.026 | 1.03±.01 | 0.635±.040 | 0.702±.028 | 1.06±.06 |
|  | 0.010±.007 | 0.606±.204 | 1.39±.03 | 0.029±.016 | 0.428±.141 | 1.34±.08 |
| Progressive GAN | 0.017±.008 | 0.818±.217 | 1.25±.10 | 0.127±.055 | 1.041±.239 | 1.25±.10 |
| StyleGAN 2 | 0.014±.001 | 0.369±.175 | 1.26±.01 | 0.048±.001 | 0.370±.020 | 1.32±.01 |
| CCE-GAN | 0.005±.004 | 0.301±.147 | 1.38±.02 | 0.030±.011 | 0.411±.106 | 1.41±.04 |
| HA-GAN |  |  |  |  |  |  |
Results of Ablation Study
| Model | FID↓ (COPDGene, Lung) | MMD↓ (COPDGene, Lung) | FID↓ (GSP, Brain) | MMD↓ (GSP, Brain) |
|---|---|---|---|---|
| HA-GAN w/o Low-resolution branch | 0.030±.018 | 0.071±.039 | 0.118±.078 | 0.876±.182 |
| HA-GAN w/o Encoder | 0.010±.003 | 0.034±.006 | 0.006±.003 | 0.099±.028 |
| HA-GAN w/ Deterministic | 0.014±.003 | 0.035±.007 | 0.061±.016 | 0.612±.157 |
| HA-GAN |  |  |  |  |
Fig. 3. Images randomly generated by different models, alongside real images. The figure illustrates that HA-GAN generates sharper images than the baselines.
Fig. 4. Comparison of the embeddings of different models. We embed the features extracted from synthesized images into a 2-dimensional space with MDS. The ellipses are fitted to the scatter of each model for better visualization. The figure shows that the embedding region of HA-GAN overlaps the most with that of the real images, compared with the baselines.
Training Speed (iter/s) for Different Models (Higher is Better)
| Model | WGAN | VAE-GAN | PGGAN | CCE-GAN | StyleGAN | HA-GAN |
|---|---|---|---|---|---|---|
| Speed (iter/s) | 2.0 | 1.0 | 1.3 | 1.6 | 0.35 | 0.23 |
Evaluation Result for GAN-Based Data Augmentation
| Method | Accuracy (%) |
|---|---|
| Baseline | 59.7 |
| Augmented with | 61.7 |
| Augmented with HA-GAN |  |
R2 for Predicting Clinical-Relevant Measurements
| Method | log | log | log |
|---|---|---|---|
| VAE-GAN | 0.215 | 0.315 | 0.375 |
|  | 0.512 | 0.622 | 0.738 |
| HA-GAN |  |  |  |
We do not include the results of WGAN and Progressive GAN, because they do not incorporate an encoder.
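The R² evaluation above amounts to regressing each (log-transformed) clinical measurement on the codes produced by the model's encoder and scoring the fit. A minimal sketch with synthetic data, using hypothetical 16-dimensional latent codes and ordinary least squares as a stand-in for whatever regressor the paper actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical 16-d latent codes and a clinical measurement that is
# (noisily) linear in them.
z = rng.standard_normal((300, 16))
w = rng.standard_normal(16)
y = z @ w + 0.1 * rng.standard_normal(300)

# Ordinary least squares regression of the measurement on the codes.
coef, *_ = np.linalg.lstsq(z, y, rcond=None)
r2 = r_squared(y, z @ coef)
assert r2 > 0.9  # codes that retain the clinical signal yield high R^2
```

Higher R² means the encoder's latent representation retains more of the clinical signal, which is why HA-GAN's larger values in the table indicate more clinically relevant features.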
Fig. 5. Latent space exploration on thorax CT images. The figure shows synthetic images generated by changing the latent code in two different directions, corresponding to lung and bone volume, respectively. The number below each slice indicates the percentage of the lung region of the synthetic image occupied by the volume of interest. The segmentation masks are plotted in green.
Fig. 6. Results of the memory usage test. Note that HA-GAN is the only model that can generate 256³ images without memory overflow on a high-end GPU with 16 GB of VRAM.
Number of Model Parameters and Memory Usage Under Different Resolutions
| Output Resolution | Memory Usage (MB) | #Parameters |
|---|---|---|
| 32³ | 2573 | 74.7M |
| 64³ | 2665 | 78.7M |
| 128³ | 3167 | 79.6M |
| 256³ | 5961 | 79.7M |
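As a rough sanity check on why full-volume high-resolution training is memory-hungry, the raw size of even one single-channel float32 volume grows cubically with resolution. A back-of-the-envelope sketch (real training stores many channels and per-layer activations, multiplying these figures substantially):

```python
# Raw size of one single-channel float32 volume at each output resolution
# listed in the table above.
for res in (32, 64, 128, 256):
    mib = res ** 3 * 4 / 2 ** 20  # float32 = 4 bytes; 2^20 bytes per MiB
    print(f"{res}^3: {mib:g} MiB")  # 0.125, 1, 8 and 64 MiB respectively
```

The 512x jump from 32³ to 256³ in raw volume size contrasts with the table's modest growth in measured usage, reflecting how HA-GAN's sub-volume training amortizes the high-resolution activation cost.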