| Literature DB >> 35741682 |
Yun Jiang, Yuan Zhang, Xin Lin, Jinkun Dong, Tongtong Cheng, Jing Liang.
Abstract
Brain tumor semantic segmentation is a critical medical image processing task that aids clinicians in diagnosing patients and determining the extent of lesions. Convolutional neural networks (CNNs) have demonstrated exceptional performance in computer vision tasks in recent years. For 3D medical image tasks, deep convolutional neural networks based on an encoder-decoder structure with skip connections have been widely used. However, CNNs struggle to learn global and long-range semantic information. The transformer, on the other hand, has recently found success in natural language processing and computer vision thanks to its self-attention mechanism for global information modeling. Both local and global features are critical for dense prediction tasks such as 3D medical image segmentation. In this work we propose SwinBTS, a new 3D medical image segmentation approach that combines a transformer, a convolutional neural network, and an encoder-decoder structure to formulate 3D brain tumor semantic segmentation as a sequence-to-sequence prediction problem. A 3D Swin Transformer is used as the network's encoder and decoder to extract contextual information, and convolutional operations are employed for downsampling and upsampling. Finally, an enhanced transformer module, designed to strengthen detail feature extraction, produces the segmentation results. Extensive experiments on the BraTS 2019, BraTS 2020, and BraTS 2021 datasets show that SwinBTS outperforms state-of-the-art 3D algorithms for brain tumor segmentation on 3D MRI images.
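As a rough illustration of the layout the abstract describes (transformer stages for the encoder and decoder, convolutional downsampling and upsampling, and skip connections), the following minimal PyTorch sketch substitutes plain global self-attention for the shifted-window attention the actual model uses; all layer names and sizes here are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TransformerBlock3D(nn.Module):
    """Stand-in for a 3D Swin block: global self-attention over the
    flattened token grid (real Swin attention is windowed and shifted)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, x):                      # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)       # (B, D*H*W, C) token sequence
        n = self.norm1(t)
        t = t + self.attn(n, n, n)[0]
        t = t + self.mlp(self.norm2(t))
        return t.transpose(1, 2).reshape(b, c, d, h, w)

class TinySwinLikeSegNet(nn.Module):
    """Two-stage encoder-decoder: patch-embedding conv, conv downsampling
    between transformer stages, transposed-conv upsampling, one skip."""
    def __init__(self, in_ch=4, dim=32, num_classes=4):
        super().__init__()
        self.embed = nn.Conv3d(in_ch, dim, kernel_size=4, stride=4)
        self.enc1 = TransformerBlock3D(dim)
        self.down = nn.Conv3d(dim, dim * 2, kernel_size=2, stride=2)
        self.enc2 = TransformerBlock3D(dim * 2)
        self.up = nn.ConvTranspose3d(dim * 2, dim, kernel_size=2, stride=2)
        self.dec1 = TransformerBlock3D(dim)
        self.head = nn.ConvTranspose3d(dim, num_classes, kernel_size=4, stride=4)

    def forward(self, x):
        s1 = self.enc1(self.embed(x))          # (B, dim, D/4, H/4, W/4)
        bottom = self.enc2(self.down(s1))      # (B, 2*dim, D/8, H/8, W/8)
        out = self.dec1(self.up(bottom) + s1)  # skip connection
        return self.head(out)                  # back to full resolution

logits = TinySwinLikeSegNet()(torch.randn(1, 4, 32, 32, 32))
print(logits.shape)  # torch.Size([1, 4, 32, 32, 32])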
Keywords: 3D CNN; Swin Transformer; brain tumor segmentation; depth-wise separable convolution
Year: 2022 PMID: 35741682 PMCID: PMC9221215 DOI: 10.3390/brainsci12060797
Source DB: PubMed Journal: Brain Sci ISSN: 2076-3425
Figure 1. Overview of the UNETR architecture. A 3D input volume (e.g., a multi-channel MRI volume) is divided into a sequence of uniform, non-overlapping patches and projected into an embedding space using a linear layer. A position embedding is added to the sequence, which is then used as the input to a transformer model. The encoded representations from different layers of the transformer are extracted and merged with a decoder via skip connections to predict the final segmentation.
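A minimal sketch of the patch embedding step this caption describes; the patch size, embedding dimension, and channel count below are hypothetical, not taken from the paper:

```python
import torch
import torch.nn as nn

B, C, D, H, W = 1, 4, 96, 96, 96          # hypothetical multi-channel MRI volume
P, E = 16, 768                            # hypothetical patch size / embed dim
vol = torch.randn(B, C, D, H, W)

# Cut into non-overlapping P x P x P patches and flatten each one.
patches = vol.unfold(2, P, P).unfold(3, P, P).unfold(4, P, P)    # (B,C,6,6,6,P,P,P)
patches = patches.reshape(B, C, -1, P ** 3).permute(0, 2, 1, 3)  # (B, N, C, P^3)
patches = patches.flatten(2)              # (B, N, C*P^3) with N = 6*6*6 = 216

proj = nn.Linear(C * P ** 3, E)           # linear projection into embedding space
pos = nn.Parameter(torch.zeros(1, patches.shape[1], E))  # learned position embed
tokens = proj(patches) + pos              # (B, 216, 768): transformer input
print(tokens.shape)
```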
Figure 2. The structure of the enhanced transformer module.
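This record does not reproduce the module's internal layout. Given the depth-wise separable convolution keyword above, such a detail-extraction module would plausibly build on the standard depthwise + pointwise factorization, sketched here as a generic building block rather than the authors' design:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Depthwise separable 3D convolution: a per-channel 3x3x3 (depthwise)
    convolution followed by a 1x1x1 (pointwise) channel-mixing convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 16, 16, 16)
print(DepthwiseSeparableConv3d(32, 64)(x).shape)  # (1, 64, 16, 16, 16)
```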
Segmentation Dice results (%) on the BraTS 2019 test set.

| Method | ET | TC | WT | AVG. |
|---|---|---|---|---|
| 3D U-Net | 66.15 ± 0.339 | 66.94 ± 0.322 | 86.89 ± 0.071 | 73.33 |
| Attention U-Net | 67.06 ± 0.327 | 71.95 ± 0.264 | 86.69 ± 0.100 | 75.23 |
| UNETR | 67.19 ± 0.346 | 74.39 ± 0.256 | 88.57 ± 0.122 | 76.72 |
| TransBTS | 71.08 ± 0.347 | 78.67 ± 0.207 | 89.75 ± 0.070 | 79.83 |
| VTU-Net | 73.53 ± 0.311 | 78.09 ± 0.242 | 89.56 ± 0.089 | 80.39 |
| SwinBTS | 74.43 ± 0.294 | 79.28 ± 0.232 | 89.75 ± 0.070 | 81.15 |
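ET, TC, and WT denote the standard BraTS sub-regions (enhancing tumor, tumor core, and whole tumor). For reference, a minimal NumPy sketch of the Dice coefficient on binary masks; the paper's exact evaluation pipeline is not reproduced here:

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((8, 8, 8)); a[2:6, 2:6, 2:6] = 1
b = np.zeros((8, 8, 8)); b[3:7, 3:7, 3:7] = 1
print(dice(a, b))  # 0.421875: 2 * 27 overlapping voxels / (64 + 64)
```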
Segmentation mIOU results (%) on the BraTS 2019 test set.

| Method | ET | TC | WT | AVG. |
|---|---|---|---|---|
| 3D U-Net | 55.96 ± 0.308 | 52.72 ± 0.304 | 78.02 ± 0.104 | 62.23 |
| Attention U-Net | 57.85 ± 0.309 | 61.73 ± 0.275 | 77.76 ± 0.137 | 65.78 |
| TransBTS | 62.63 ± 0.322 | 69.16 ± 0.228 | 82.38 ± 0.100 | 71.39 |
| VTU-Net | 65.00 ± 0.300 | 69.09 ± 0.251 | 81.12 ± 0.114 | 71.74 |
| SwinBTS | 66.03 ± 0.296 | 70.23 ± 0.216 | 83.33 ± 0.104 | 73.20 |
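For the IoU metric above, a matching sketch on the same binary-mask convention; note the identity IoU = Dice / (2 − Dice) for a single pair of masks:

```python
import numpy as np

def iou(pred, target, eps=1e-8):
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

a = np.zeros((8, 8, 8)); a[2:6, 2:6, 2:6] = 1
b = np.zeros((8, 8, 8)); b[3:7, 3:7, 3:7] = 1
print(iou(a, b))  # 27 / 101 ≈ 0.2673 (Dice for the same pair is 0.4219)
```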
Figure 3. Dice score boxplots of the SwinBTS and TransBTS methods on the BraTS 2019 test dataset.
Segmentation results on the BraTS 2020 validation dataset (Dice score, %, and 95% Hausdorff distance, mm).

| Method | Dice ET | Dice TC | Dice WT | Dice AVG. | HD95 ET | HD95 TC | HD95 WT | HD95 AVG. |
|---|---|---|---|---|---|---|---|---|
| 3D U-Net | 70.63 ± 0.284 | 73.70 ± 0.128 | 85.84 ± 0.250 | 76.72 | 34.30 | 18.86 | 10.93 | 21.36 |
| V-Net | 68.97 | 77.90 | 86.11 | 77.66 | 43.52 | 16.15 | 14.49 | 24.72 |
| Residual U-Net | 71.63 | 76.47 | 82.46 | 76.85 | 37.42 | 13.11 | 12.34 | 20.95 |
| Attention U-Net | 71.83 ± 0.317 | 75.96 ± 0.126 | 85.57 ± 0.245 | 77.79 | 32.94 | 19.43 | 11.91 | 21.42 |
| UNETR | 71.18 ± 0.297 | 75.85 ± 0.100 | 88.30 ± 0.226 | 78.44 | 34.46 | 10.63 | 8.18 | 17.75 |
| TransBTS | 76.31 ± 0.272 | 80.36 ± 0.075 | 88.78 ± 0.174 | 81.82 | 29.83 | 9.77 | 5.60 | 15.06 |
| VTU-Net | 76.45 ± 0.267 | 80.39 ± 0.107 | 88.73 ± 0.218 | 81.86 | 28.99 | 14.76 | 9.54 | 17.76 |
| SwinBTS | 77.36 ± 0.224 | 80.30 ± 0.079 | 89.06 ± 0.130 | 82.24 | 26.84 | 15.78 | 8.56 | 17.06 |
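The HD95 columns report the 95th-percentile Hausdorff distance. One common formulation, sketched with SciPy on foreground voxel coordinates (a dense distance matrix, so only suitable for small masks; the paper's exact variant is not specified in this record):

```python
import numpy as np
from scipy.spatial.distance import cdist

def hd95(pred, target):
    """95th-percentile symmetric Hausdorff distance (in voxel units)
    between the foreground voxel sets of two binary masks."""
    p = np.argwhere(pred)          # (Np, 3) foreground coordinates
    t = np.argwhere(target)        # (Nt, 3)
    d = cdist(p, t)                # pairwise Euclidean distances
    d_pt = d.min(axis=1)           # each pred voxel -> nearest target voxel
    d_tp = d.min(axis=0)           # each target voxel -> nearest pred voxel
    return max(np.percentile(d_pt, 95), np.percentile(d_tp, 95))

a = np.zeros((8, 8, 8), bool); a[2:6, 2:6, 2:6] = True
b = np.zeros((8, 8, 8), bool); b[3:7, 3:7, 3:7] = True
print(hd95(a, b))
```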
Segmentation results on the BraTS 2021 validation dataset (Dice score, %, and 95% Hausdorff distance, mm).

| Method | Dice ET | Dice TC | Dice WT | Dice AVG. | HD95 ET | HD95 TC | HD95 WT | HD95 AVG. |
|---|---|---|---|---|---|---|---|---|
| SwinBTS | 83.21 ± 0.222 | 84.75 ± 0.227 | 91.83 ± 0.078 | 86.60 | 16.03 | 14.51 | 3.65 | 11.39 |
Ablation experiments of each module (Dice score, %).

| Model | ET | TC | WT | AVG. |
|---|---|---|---|---|
| SwinUnet3D | 71.75 | 76.74 | 88.40 | 78.96 |
| SwinUnet3D + NFCE | 73.00 | 77.48 | 89.01 | 79.83 (+0.87) |
| SwinUnet3D + NFCE + Trans | 73.42 | 77.91 | 90.07 | 80.46 (+0.63) |
| SwinUnet3D + NFCE + Conv | 72.55 | 77.97 | 87.85 | 79.46 (−0.37) |
| SwinUnet3D + NFCE + ETrans | 74.43 | 79.28 | 89.75 | 81.15 (+1.32) |
Experiments with different depths (Dice score, %).

| Method | Depth | ET | TC | WT | AVG. |
|---|---|---|---|---|---|
| SwinBTS | 1 | 73.06 | 78.60 | 89.08 | 80.24 |
| SwinBTS | 2 | 74.43 | 79.28 | 89.75 | 81.15 |
| SwinBTS | 4 | 72.88 | 79.19 | 89.63 | 80.57 |
Figure 4. Comparing the effects of the ETrans module through heatmaps.
Figure 5. Noise comparison on T2 images: (a) without noise; (b) with noise.
Experiments with different degrees of noise (Dice score, %).

| Method | Noise σ | ET | TC | WT | AVG. |
|---|---|---|---|---|---|
| SwinBTS | 0 | 74.43 | 79.28 | 89.75 | 81.15 |
| SwinBTS | 1 | 69.62 | 75.84 | 85.80 | 77.08 |
| SwinBTS | 5 | 59.67 | 69.85 | 82.85 | 70.79 |
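A sketch of the perturbation presumably used in this robustness test, assuming additive Gaussian noise with standard deviation sigma applied voxel-wise to the input volume (the exact noise model and intensity normalization are not given in this record):

```python
import numpy as np

def add_gaussian_noise(volume, sigma, seed=0):
    """Additive voxel-wise Gaussian noise with standard deviation sigma
    (assumed formulation; sigma = 0 returns the volume unchanged)."""
    rng = np.random.default_rng(seed)
    return volume + rng.normal(0.0, sigma, size=volume.shape)

vol = np.random.rand(4, 64, 64, 64)      # toy 4-modality MRI volume
for sigma in (0, 1, 5):                  # noise levels from the table above
    noisy = add_gaussian_noise(vol, sigma)
    print(sigma, float(np.abs(noisy - vol).mean()))
```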
Figure 6. Visual comparison of MRI image segmentation results.