Guotai Wang1,2,3, Wenqi Li1,2, Michael Aertsen4, Jan Deprest1,4,5,6, Sébastien Ourselin2, Tom Vercauteren1,2,6.
Abstract
Despite the state-of-the-art performance for medical image segmentation, deep convolutional neural networks (CNNs) have rarely provided uncertainty estimations regarding their segmentation outputs, e.g., model (epistemic) and image-based (aleatoric) uncertainties. In this work, we analyze these different types of uncertainties for CNN-based 2D and 3D medical image segmentation tasks at both pixel level and structure level. We additionally propose a test-time augmentation-based aleatoric uncertainty to analyze the effect of different transformations of the input image on the segmentation output. Test-time augmentation has previously been used to improve segmentation accuracy, yet it has not been formulated in a consistent mathematical framework. Hence, we also propose a theoretical formulation of test-time augmentation, where a distribution of the prediction is estimated by Monte Carlo simulation with prior distributions of parameters in an image acquisition model that involves image transformations and noise. We compare and combine our proposed aleatoric uncertainty with model uncertainty. Experiments with segmentation of fetal brains and brain tumors from 2D and 3D Magnetic Resonance Images (MRI) showed that 1) the test-time augmentation-based aleatoric uncertainty provides a better uncertainty estimation than calculating the test-time dropout-based model uncertainty alone and helps to reduce overconfident incorrect predictions, and 2) our test-time augmentation outperforms a single-prediction baseline and dropout-based multiple predictions.
Keywords: Convolutional neural networks; Data augmentation; Medical image segmentation; Uncertainty estimation
Year: 2019 PMID: 31595105 PMCID: PMC6783308 DOI: 10.1016/j.neucom.2019.01.103
Source DB: PubMed Journal: Neurocomputing ISSN: 0925-2312 Impact factor: 5.719
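The Monte Carlo procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `tta_predict` and `predict_fn` are hypothetical names, the transform priors (random flip, random multiple of 90 degrees, additive Gaussian noise with `noise_std`) are placeholders chosen for simplicity, and the per-pixel uncertainty is taken as the entropy of the foreground vote frequency across the N runs.

```python
import numpy as np

def tta_predict(image, predict_fn, n_samples=20, noise_std=0.05, seed=0):
    """Monte Carlo test-time augmentation for a 2D binary segmentation.

    predict_fn is a hypothetical callable that maps a 2D float array to a
    2D array of foreground probabilities with the same shape.
    """
    rng = np.random.default_rng(seed)
    labels = []
    for _ in range(n_samples):
        # Sample transform parameters from simple priors (assumed here).
        flip = rng.random() < 0.5
        k = int(rng.integers(0, 4))  # number of 90-degree rotations

        # Forward model: spatial transform followed by additive noise.
        x = np.fliplr(image) if flip else image
        x = np.rot90(x, k)
        x = x + rng.normal(0.0, noise_std, size=x.shape)

        # Predict, then undo the spatial transform on the output
        # (inverse order: rotate back first, then flip back).
        p = predict_fn(x)
        p = np.rot90(p, -k)
        p = np.fliplr(p) if flip else p
        labels.append(p > 0.5)

    labels = np.stack(labels).astype(np.float64)  # shape (N, H, W)
    freq = labels.mean(axis=0)                    # fraction of runs voting foreground
    segmentation = freq > 0.5                     # majority vote
    eps = 1e-12                                   # avoids log(0)
    entropy = -(freq * np.log(freq + eps) + (1.0 - freq) * np.log(1.0 - freq + eps))
    return segmentation, entropy
```

Pixels on which all N runs agree get an entropy close to zero; pixels where the runs split evenly approach ln 2. A trained network would replace the toy `predict_fn` used for checking.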
Fig. 1. Visual comparison of different types of uncertainties and their corresponding segmentations for fetal brain. The uncertainty maps in odd columns are based on Monte Carlo simulation with N = 20 and encoded by the color bar in the upper-left corner (low uncertainty shown in purple and high uncertainty shown in yellow). The white arrows in (a) show the aleatoric and hybrid uncertainties in a challenging area, and the white arrows in (b) and (c) show mis-segmented regions with very low epistemic uncertainty. TTD: test-time dropout, TTA: test-time augmentation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. Dice of 2D fetal brain segmentation for different values of N, the number of Monte Carlo simulation runs.
Dice (%) and ASSD (mm) evaluation of 2D fetal brain segmentation with different training and testing methods. Tr – Aug: training without data augmentation. Tr + Aug: training with data augmentation. * denotes significant improvement from the baseline of single prediction in Tr – Aug and Tr + Aug respectively (p-value < 0.05). † denotes significant improvement from Tr – Aug with TTA + TTD (p-value < 0.05).
| Train | Test | Dice (%): FCN | Dice (%): U-Net | Dice (%): P-Net | ASSD (mm): FCN | ASSD (mm): U-Net | ASSD (mm): P-Net |
|---|---|---|---|---|---|---|---|
| Tr – Aug | Baseline | 91.05 ± 3.82 | 90.26 ± 4.77 | 90.65 ± 4.29 | 2.68 ± 2.93 | 3.11 ± 3.34 | 2.83 ± 3.07 |
| Tr – Aug | TTD | 91.13 ± 3.60 | 90.38 ± 4.30 | 90.93 ± 4.04 | 2.61 ± 2.85 | 3.04 ± 2.29 | 2.69 ± 2.90 |
| Tr – Aug | TTA | 91.99 ± 3.48* | 91.64 ± 4.11* | 92.02 ± 3.85* | 2.26 ± 2.56* | 2.51 ± 3.23* | 2.28 ± 2.61* |
| Tr – Aug | TTA + TTD | | | | | | |
| Tr + Aug | Baseline | 92.03 ± 3.44 | 91.93 ± 3.21 | 91.98 ± 3.92 | 2.21 ± 2.52 | 2.12 ± 2.23 | 2.32 ± 2.71 |
| Tr + Aug | TTD | 92.08 ± 3.41 | 92.00 ± 3.22 | 92.01 ± 3.89 | 2.17 ± 2.52 | 2.03 ± 2.13 | 2.15 ± 2.58 |
| Tr + Aug | TTA | 92.79 ± 3.34* | 92.88 ± 3.15* | 93.05 ± 2.96* | 1.88 ± 2.08 | 1.70 ± 1.75 | 1.62 ± 1.77* |
| Tr + Aug | TTA + TTD | | | | | | |
Fig. 3. Dice distributions of segmentation results with different testing methods for five example stacks of 2D slices of fetal brain MRI. Note TTA’s higher mean value and variance compared with TTD.
Fig. 4. Normalized joint histogram of prediction uncertainty and error rate for 2D fetal brain segmentation. The average error rates at different uncertainty levels are depicted by the red curves. The dashed ellipses show that TTA leads to a lower occurrence of overconfident incorrect predictions than TTD. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
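The average error rate at each uncertainty level (the quantity plotted as the red curves) can be computed along the following lines. This is a sketch under assumptions: the function name `error_rate_by_uncertainty`, the equal-width binning, and the default upper bound of ln 2 (the maximum entropy of a binary label) are illustrative choices, not details taken from the paper.

```python
import numpy as np

def error_rate_by_uncertainty(uncertainty, prediction, ground_truth,
                              n_bins=10, max_uncertainty=np.log(2)):
    """Mean pixel-wise error rate within equal-width uncertainty bins.

    Returns (edges, rates); rates[i] is NaN for bins that contain no pixels.
    """
    u = np.asarray(uncertainty, dtype=np.float64).ravel()
    # A pixel is an error when the predicted label differs from the reference.
    err = (np.asarray(prediction).ravel() != np.asarray(ground_truth).ravel())
    err = err.astype(np.float64)

    edges = np.linspace(0.0, max_uncertainty, n_bins + 1)
    # np.digitize returns 1-based bin indices; clip so that values at or
    # above the upper edge fall in the last bin.
    idx = np.clip(np.digitize(u, edges) - 1, 0, n_bins - 1)

    counts = np.bincount(idx, minlength=n_bins)
    errors = np.bincount(idx, weights=err, minlength=n_bins)

    rates = np.full(n_bins, np.nan)
    np.divide(errors, counts, out=rates, where=counts > 0)
    return edges, rates
```

A high error rate in the lowest-uncertainty bins corresponds to the overconfident incorrect predictions highlighted by the dashed ellipses.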
Fig. 5. Structure-wise uncertainty in terms of volume variation coefficient (VVC) vs 1 − Dice for different testing methods in 2D fetal brain segmentation.
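As a rough illustration of the structure-wise measure, the volume variation coefficient can be read as the standard deviation of the segmented volume across the N Monte Carlo runs divided by its mean. The sketch below assumes that reading, uses the population standard deviation (`ddof=0`), and returns 0 when every run segments nothing; the function name and these conventions are assumptions, not details from the caption.

```python
import numpy as np

def volume_variation_coefficient(masks):
    """VVC of N binary segmentations stacked along the first axis.

    masks: array of shape (N, ...) with one binary mask per Monte Carlo run.
    """
    masks = np.asarray(masks, dtype=bool)
    # Volume of each run = number of foreground pixels or voxels.
    volumes = masks.reshape(masks.shape[0], -1).sum(axis=1).astype(np.float64)
    mean_volume = volumes.mean()
    if mean_volume == 0:
        return 0.0  # assumed convention for an empty structure
    return float(volumes.std() / mean_volume)
```

Multiplying the voxel counts by a physical voxel volume would not change the result, because the ratio is scale-invariant.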
Fig. 6. Visual comparison of different testing methods for 3D brain tumor segmentation. The uncertainty maps in odd columns are based on Monte Carlo simulation with N = 40 and encoded by the color bar in the upper-left corner (low uncertainty shown in purple and high uncertainty shown in yellow). TTD: test-time dropout, TTA: test-time augmentation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Dice (%) and ASSD (mm) evaluation of 3D brain tumor segmentation with different training and testing methods. Tr − Aug: training without data augmentation. Tr + Aug: training with data augmentation. W-Net is a 2.5D network and W-Net (ASC) denotes the fusion of axial, sagittal and coronal views according to [43]. * denotes significant improvement from the baseline of single prediction in Tr − Aug and Tr + Aug respectively (p-value < 0.05). † denotes significant improvement from Tr − Aug with TTA + TTD (p-value < 0.05).
| Train | Test | Dice (%): W-Net (ASC) | Dice (%): 3D U-Net | Dice (%): V-Net | ASSD (mm): W-Net (ASC) | ASSD (mm): 3D U-Net | ASSD (mm): V-Net |
|---|---|---|---|---|---|---|---|
| Tr − Aug | Baseline | 87.81 ± 7.27 | 87.26 ± 7.73 | 86.84 ± 8.38 | 2.04 ± 1.27 | 2.62 ± 1.48 | 2.86 ± 1.79 |
| Tr − Aug | TTD | 88.14 ± 7.02 | 87.55 ± 7.33 | 87.13 ± 8.14 | 1.95 ± 1.20 | 2.55 ± 1.41 | 2.82 ± 1.75 |
| Tr − Aug | TTA | 89.16 ± 6.48* | 88.58 ± 6.50* | 87.86 ± 6.97* | 1.42 ± 0.93* | 1.79 ± 1.16* | 1.97 ± 1.40* |
| Tr − Aug | TTA + TTD | | | | | | |
| Tr + Aug | Baseline | 88.76 ± 5.76 | 88.43 ± 6.67 | 87.44 ± 7.84 | 1.61 ± 1.12 | 1.82 ± 1.17 | 2.07 ± 1.46 |
| Tr + Aug | TTD | 88.92 ± 5.73 | 88.52 ± 6.66 | 87.56 ± 7.78 | 1.57 ± 1.06 | 1.76 ± 1.14 | 1.99 ± 1.33 |
| Tr + Aug | TTA | 90.07 ± 5.69* | 89.41 ± 6.05* | 88.38 ± 6.74* | 1.13 ± 0.54* | 1.45 ± 0.81 | 1.67 ± 0.98* |
| Tr + Aug | TTA + TTD | | | | | | |
Fig. 7. Normalized joint histogram of prediction uncertainty and error rate for 3D brain tumor segmentation. The average error rates at different uncertainty levels are depicted by the red curves. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Structure-wise uncertainty in terms of volume variation coefficient (VVC) vs 1 − Dice for different testing methods in 3D brain tumor segmentation.