Liang Zhao, Jiajun Ma, Yu Shao, Chaoran Jia, Jingyuan Zhao, Hong Yuan.
Abstract
The global annual incidence of brain tumors is approximately 7 per 100,000, accounting for 2% of all tumors. Their mortality rate ranks first among children under 12 and tenth among adults. The localization and segmentation of brain tumors in medical images is therefore an active field of research. Traditional manual segmentation is time-consuming, laborious, and subjective, and the information provided by a single imaging modality is often too limited for clinical application. In this study, we therefore developed a multimodality feature fusion network, MM-UNet, for brain tumor segmentation, adopting a multi-encoder, single-decoder structure. In the proposed network, each encoder independently extracts low-level features from its corresponding imaging modality, and a hybrid attention block strengthens these features. After fusion with the high-level semantics of the decoder path through skip connections, the decoder restores pixel-level segmentation results. We evaluated the proposed model on the BraTS 2020 dataset. MM-UNet achieved a mean Dice score of 79.2% and a mean Hausdorff distance of 8.466, a consistent improvement over the U-Net, Attention U-Net, and ResUNet baselines that demonstrates the effectiveness of the proposed model.
Keywords: brain tumor; dilated convolution; hybrid attention mechanism; medical image segmentation; multimodality fusion
Year: 2022 PMID: 36059677 PMCID: PMC9434799 DOI: 10.3389/fonc.2022.950706
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
Figure 1. Illustration of the brain tumor segmentation pipeline of MM-UNet.
Figure 2. Proposed MM-UNet network architecture.
Table 1. Details of the operations and layer settings in each encoding and decoding stage of the proposed network.
| Stage | Encoder path | Output features and size | Decoder path | Output features and size |
|---|---|---|---|---|
| 1 | Input | 4@160×160×1 | Conv2D [1×1] (output layer) | 160×160×4 |
| | Conv2D [3×3, BatchNorm, ReLU] ×2 | 4@160×160×32 | Conv2D [3×3, BatchNorm, ReLU] ×2 | 160×160×128 |
| | Max pooling [2×2] | 4@80×80×32 | Upsampling (deconvolution) [2×2, stride 2×2] | 160×160×256 |
| 2 | Conv2D [3×3, BatchNorm, ReLU] ×2 | 4@80×80×64 | Conv2D [3×3, BatchNorm, ReLU] ×2 | 80×80×256 |
| | Max pooling [2×2] | 4@40×40×64 | Upsampling (deconvolution) [2×2, stride 2×2] | 80×80×512 |
| 3 | Conv2D [3×3, BatchNorm, ReLU] ×2 | 4@40×40×128 | Conv2D [3×3, BatchNorm, ReLU] ×2 | 40×40×512 |
| | Max pooling [2×2] | 4@20×20×128 | Upsampling (deconvolution) [2×2, stride 2×2] | 40×40×1,024 |
| 4 | Conv2D [3×3, BatchNorm, ReLU] ×2 | 4@20×20×256 | Conv2D [3×3, BatchNorm, ReLU] ×2 | 20×20×1,024 |
| | Max pooling [2×2] | 4@10×10×256 | Upsampling (deconvolution) [2×2, stride 2×2] | 20×20×2,048 |
| 5 | Multimodal fusion | 10×10×1,024 | Conv2D [3×3, BatchNorm, ReLU] ×2 | 10×10×2,048 |
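For concreteness, the following PyTorch sketch wires up the encoder side of this table: four independent per-modality encoders, each built from double Conv-BatchNorm-ReLU stages with 2×2 max pooling, whose 10×10×256 bottleneck outputs are concatenated into the 10×10×1,024 fused tensor of stage 5. Class names (`DoubleConv`, `ModalityEncoder`) are ours, not from the paper's released code; the decoder and attention blocks are omitted.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 Conv-BatchNorm-ReLU layers, as in each table stage."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class ModalityEncoder(nn.Module):
    """One per-modality encoder path: four DoubleConv + 2x2 max-pool stages."""
    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        chans = [1] + list(widths)
        self.stages = nn.ModuleList(
            DoubleConv(chans[i], chans[i + 1]) for i in range(4)
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skips = []
        for stage in self.stages:
            x = stage(x)
            skips.append(x)  # kept for the decoder's skip connections
            x = self.pool(x)
        return x, skips

# Four independent encoders, one per MRI modality; their 10x10x256
# bottleneck outputs concatenate into the 10x10x1,024 fused tensor.
encoders = nn.ModuleList(ModalityEncoder() for _ in range(4))
volume = torch.randn(1, 4, 160, 160)  # batch x modalities x H x W
outs = [enc(volume[:, i:i + 1]) for i, enc in enumerate(encoders)]
fused = torch.cat([o[0] for o in outs], dim=1)  # -> (1, 1024, 10, 10)
print(fused.shape)
```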
Table 2. Computational requirements of the proposed method on the BraTS 2020 dataset.
| Method | Params (M) | FLOPs (G) |
|---|---|---|
| Baseline | 110.1 | 69.8 |
| Baseline + Dilated Convolution Block | 110.8 | 70.9 |
| Baseline + Hybrid Attention Block | 110.7 | 69.8 |
| Our Method | 111.4 | 71.0 |
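The Dilated Convolution Block (DCB) adds only about 0.7 M parameters and 1.1 GFLOPs over the baseline. The paper's exact block layout is not reproduced in this record, so the sketch below shows one common construction consistent with that budget: parallel 3×3 convolutions with growing dilation rates enlarging the receptive field at little extra cost, fused by a 1×1 convolution. The class name and dilation rates are hypothetical.

```python
import torch
import torch.nn as nn

class DilatedConvBlock(nn.Module):
    """Hypothetical dilated convolution block: parallel 3x3 convolutions
    with increasing dilation rates widen the receptive field cheaply,
    consistent with the small Params/FLOPs increase in Table 2."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding == dilation keeps the spatial size for a 3x3 kernel
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        # 1x1 convolution fuses the concatenated branches back to the
        # input width.
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```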
Table 3. Ablation study of the proposed method on the BraTS 2020 dataset.
| Method | DSC (ET) | DSC (WT) | DSC (TC) | Hausdorff95 (ET) | Hausdorff95 (WT) | Hausdorff95 (TC) |
|---|---|---|---|---|---|---|
| Baseline | 0.714 | 0.841 | 0.737 | 6.554 | 12.645 | 11.035 |
| Baseline + Dilated Convolution Block | 0.765 | 0.789 | 0.728 | 6.586 | 11.253 | 15.174 |
| Baseline + Hybrid Attention Block | 0.739 | 0.832 | 0.738 | 6.993 | 8.597 | 11.046 |
| Our Method | 0.762 | 0.850 | 0.765 | 6.389 | 8.243 | 10.766 |
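DSC here is the Dice similarity coefficient, computed separately for the enhancing tumor (ET), whole tumor (WT), and tumor core (TC) regions. A minimal NumPy sketch follows; it assumes the standard BraTS label convention (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor). Hausdorff95 can be computed analogously on the same binary masks with, e.g., `medpy.metric.binary.hd95`.

```python
import numpy as np

def dice_score(pred, target, eps=1e-6):
    """Dice similarity coefficient between two binary masks:
    DSC = 2|P & G| / (|P| + |G|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def brats_region_dice(pred_labels, gt_labels):
    """Per-region Dice; BraTS regions are nested label sets
    (label convention assumed: 1 = core, 2 = edema, 4 = enhancing)."""
    regions = {"WT": (1, 2, 4), "TC": (1, 4), "ET": (4,)}
    return {
        name: dice_score(np.isin(pred_labels, labels),
                         np.isin(gt_labels, labels))
        for name, labels in regions.items()
    }
```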
Figure 3. Sample segmentation results. Column 1, input images; column 2, ground-truth (GT) segmentations; column 3, U-Net results; column 4, Attention U-Net results; column 5, ResUNet results; column 6, results of our method (ET, blue; TC, yellow + blue; WT, green + yellow + blue).
Table 4. Comparative results on the BraTS 2020 dataset.
| Method | DSC (ET) | DSC (WT) | DSC (TC) | Hausdorff95 (ET) | Hausdorff95 (WT) | Hausdorff95 (TC) |
|---|---|---|---|---|---|---|
| U-Net | 0.707 | 0.825 | 0.732 | 9.035 | 12.174 | 14.361 |
| Attention U-Net | 0.710 | 0.809 | 0.703 | 13.983 | 22.887 | 22.799 |
| ResUNet | 0.723 | 0.813 | 0.738 | 6.613 | 9.075 | 11.225 |
| Our Method | 0.762 | 0.850 | 0.765 | 6.389 | 8.243 | 10.766 |
Table 5. Control experiments on the BraTS 2020 dataset.
| Method | DSC (ET) | DSC (WT) | DSC (TC) | Hausdorff95 (ET) | Hausdorff95 (WT) | Hausdorff95 (TC) |
|---|---|---|---|---|---|---|
| Spatial attention first | 0.737 | 0.802 | 0.713 | 41.072 | 7.990 | 30.989 |
| Channel and spatial attention in parallel | 0.737 | 0.851 | 0.768 | 8.381 | 7.846 | 12.359 |
| DCB after the fourth downsampling | 0.773 | 0.854 | 0.750 | 6.573 | 14.199 | 11.409 |
| Dice loss as loss function | 0.745 | 0.844 | 0.729 | 6.001 | 8.206 | 11.021 |
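The first two rows of Table 5 vary the ordering of the hybrid attention block's two gates, which implies that the proposed block applies channel attention before spatial attention in series. Below is a CBAM-style sketch of that serial ordering; it is our reading of the control experiment, not the authors' exact implementation, and all class names are ours.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel gate."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(x)

class SpatialAttention(nn.Module):
    """7x7 convolution over pooled channel statistics yields a spatial mask."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(stats))

class HybridAttention(nn.Module):
    """Channel attention first, then spatial attention: the serial order
    that Table 5 compares against spatial-first and parallel variants."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```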