Tao Wang1,2, Junlin Lan1,2, Zixin Han1,2, Ziwei Hu1,2, Yuxiu Huang1,2, Yanglin Deng1,2, Hejun Zhang3, Jianchao Wang3, Musheng Chen3, Haiyan Jiang2,4, Ren-Guey Lee5, Qinquan Gao1,2,6, Ming Du1, Tong Tong1,2,6, Gang Chen3,7.
Abstract
The application of deep learning in the medical field has continuously produced major breakthroughs in recent years. Based on the convolutional neural network (CNN), the U-Net framework has become the benchmark for medical image segmentation tasks. However, this framework cannot fully learn global and long-range semantic information. The transformer structure has been demonstrated to capture global information better than the U-Net, but its ability to learn local information is weaker than that of the CNN. Therefore, we propose a novel network, referred to as the O-Net, which combines the advantages of the CNN and the transformer to fully exploit both global and local information for improving medical image segmentation and classification. In the encoder part of our proposed O-Net framework, we combine the CNN and the Swin Transformer to acquire both global and local contextual features. In the decoder part, the results of the Swin Transformer and the CNN blocks are fused to obtain the final results. We have evaluated the proposed network on the Synapse multi-organ CT dataset and the ISIC 2017 challenge dataset for the segmentation task. The classification network is simultaneously trained by using the encoder weights of the segmentation network. The experimental results show that our proposed O-Net achieves superior segmentation performance compared with state-of-the-art approaches, and that the segmentation results help improve the accuracy of the classification task. The code and models of this study are available at https://github.com/ortonwang/O-Net.
Keywords: CNN; classification; deep learning; medical image segmentation; transformer
Year: 2022 PMID: 35720715 PMCID: PMC9201625 DOI: 10.3389/fnins.2022.876065
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
Figure 1. The architecture of our proposed O-Net.
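To make the dual-branch design of Figure 1 concrete, here is a minimal PyTorch sketch of the idea: a CNN branch and a transformer branch process the image in parallel, and their outputs are fused. All module names are hypothetical, a plain `TransformerEncoderLayer` stands in for the Swin Transformer, averaging logits is just one simple fusion choice, and `n_classes=9` assumes the 8 Synapse organ classes plus background; see the authors' repository for the real implementation.

```python
# Sketch only: parallel CNN + transformer branches with a simple fusion,
# not the authors' implementation (https://github.com/ortonwang/O-Net).
import torch
import torch.nn as nn

class ConvBranch(nn.Module):
    """Local-feature branch (CNN)."""
    def __init__(self, in_ch=3, ch=64, n_classes=9):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, n_classes, 3, padding=1),
        )
    def forward(self, x):
        return self.body(x)

class TransformerBranch(nn.Module):
    """Global-feature branch; a single TransformerEncoder layer stands in
    for the Swin Transformer used in the paper."""
    def __init__(self, in_ch=3, dim=64, n_classes=9, patch=4):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.head = nn.Conv2d(dim, n_classes, 1)
        self.patch = patch
    def forward(self, x):
        t = self.embed(x)                                # (B, dim, H/p, W/p)
        B, C, H, W = t.shape
        t = self.encoder(t.flatten(2).transpose(1, 2))   # (B, HW, dim) tokens
        t = t.transpose(1, 2).reshape(B, C, H, W)
        t = nn.functional.interpolate(t, scale_factor=self.patch,
                                      mode="bilinear", align_corners=False)
        return self.head(t)

class ONetSketch(nn.Module):
    """Fuse the two branches by averaging their logits (one simple choice)."""
    def __init__(self, n_classes=9):
        super().__init__()
        self.cnn = ConvBranch(n_classes=n_classes)
        self.trans = TransformerBranch(n_classes=n_classes)
    def forward(self, x):
        return (self.cnn(x) + self.trans(x)) / 2

logits = ONetSketch()(torch.randn(1, 3, 224, 224))  # -> (1, 9, 224, 224)
```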
Figure 2. Two successive Swin Transformer blocks.
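Figure 2 shows the standard pairing defined in the Swin Transformer paper (Liu et al.): a window-based multi-head self-attention (W-MSA) block followed by a shifted-window (SW-MSA) block. For two successive blocks $l$ and $l+1$, with LN denoting layer normalization, the computation is:

```latex
\begin{aligned}
\hat{z}^{l}   &= \text{W-MSA}\big(\text{LN}(z^{l-1})\big) + z^{l-1}, \\
z^{l}         &= \text{MLP}\big(\text{LN}(\hat{z}^{l})\big) + \hat{z}^{l}, \\
\hat{z}^{l+1} &= \text{SW-MSA}\big(\text{LN}(z^{l})\big) + z^{l}, \\
z^{l+1}       &= \text{MLP}\big(\text{LN}(\hat{z}^{l+1})\big) + \hat{z}^{l+1}.
\end{aligned}
```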
Figure 3. The architecture of the EfficientNet block.
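The basic unit of EfficientNet is the MBConv block (an inverted residual with squeeze-and-excitation). Below is a minimal PyTorch sketch of that standard block; treating it as the content of Figure 3 is an assumption, since the figure itself is not reproduced in this record.

```python
# Sketch of a standard EfficientNet MBConv block (assumed for Figure 3).
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Simplified MBConv: 1x1 expand -> depthwise conv -> squeeze-and-
    excitation -> 1x1 project, with a residual when shapes match."""
    def __init__(self, in_ch, out_ch, expand=4, stride=1):
        super().__init__()
        mid = in_ch * expand
        self.use_res = stride == 1 and in_ch == out_ch
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.SiLU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU())
        self.se = nn.Sequential(                     # squeeze-and-excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(mid, mid // 16, 1), nn.SiLU(),
            nn.Conv2d(mid // 16, mid, 1), nn.Sigmoid())
        self.project = nn.Sequential(
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch))

    def forward(self, x):
        h = self.dwconv(self.expand(x))
        h = h * self.se(h)                           # channel-wise reweighting
        h = self.project(h)
        return x + h if self.use_res else h

y = MBConv(24, 24)(torch.randn(1, 24, 56, 56))       # -> (1, 24, 56, 56)
```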
Figure 4. The architecture of the decoder module.
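As a rough illustration of one decoder step that upsamples features and fuses them with an encoder skip connection, here is a generic U-Net-style block in PyTorch; the paper's exact decoder design in Figure 4 may differ, and all names here are hypothetical.

```python
# Generic decoder step: upsample, concatenate skip, refine with convs.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One U-Net-style decoder step (a sketch, not the paper's module)."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x, skip):
        x = self.up(x)                                # double spatial size
        return self.conv(torch.cat([x, skip], dim=1)) # fuse with skip features

out = DecoderBlock(256, 128, 128)(torch.randn(1, 256, 14, 14),
                                  torch.randn(1, 128, 28, 28))  # -> (1, 128, 28, 28)
```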
Experimental results of different methods on the Synapse multi-organ CT dataset.
| Methods | DSC↑ | HD↓ | Aorta | Gallbladder | Kidney (L) | Kidney (R) | Liver | Pancreas | Spleen | Stomach |
|---|---|---|---|---|---|---|---|---|---|---|
| V-Net Milletari et al. | 68.81 | – | 75.34 | 51.87 | 77.10 | | 87.84 | 40.05 | 80.56 | 56.98 |
| DARR Fu et al. | 69.77 | – | 74.74 | 53.77 | 72.31 | 73.24 | 94.08 | 54.18 | 89.90 | 45.96 |
| R50 ViT Chen et al. | 71.29 | 32.87 | 73.73 | 55.13 | 75.80 | 72.20 | 91.51 | 45.99 | 81.99 | 73.95 |
| U-SegNet Kumar et al. | 72.61 | 43.94 | 85.69 | 64.33 | 75.12 | 66.41 | 91.72 | 50.59 | 84.07 | 62.96 |
| R50 U-Net Chen et al. | 74.68 | 36.87 | 87.74 | 63.66 | 80.60 | 78.19 | 93.74 | 56.90 | 85.87 | 74.16 |
| AION Gehlot and Gupta | 75.54 | 32.27 | 87.59 | 58.74 | 82.47 | 73.45 | 93.47 | 49.44 | 87.52 | 71.61 |
| R50 Att-UNet Chen et al. | 75.57 | 36.97 | 55.92 | 63.91 | 79.20 | 72.71 | 93.56 | 49.37 | 87.19 | 74.95 |
| U-Net Ronneberger et al. | 76.85 | 39.70 | 89.07 | | 77.77 | 68.60 | 93.43 | 53.98 | 86.67 | 75.58 |
| EDNFC-Net Gehlot et al. | 77.21 | 35.07 | 86.08 | 62.47 | 84.31 | 78.27 | 92.61 | 57.31 | 85.36 | 71.24 |
| TransUNet Chen et al. | 77.48 | 31.69 | 87.23 | 63.13 | 81.87 | 77.02 | 94.08 | 55.86 | 85.08 | 75.62 |
| Att-UNet Oktay et al. | 77.77 | 36.02 | | 68.88 | 77.98 | 71.11 | 93.57 | 58.04 | 87.30 | 75.75 |
| TransFuse Zhang et al. | 78.95 | 26.59 | 87.09 | 61.64 | 82.20 | 76.91 | 94.19 | 59.01 | 89.86 | 80.73 |
| Swin-Unet Cao et al. | 79.13 | 21.55 | 85.47 | 66.53 | 83.28 | 79.61 | 94.29 | 56.58 | | 76.60 |
| O-Net | | 21.04 | 88.36 | 67.45 | 84.44 | 77.13 | 95.24 | 61.52 | 90.03 | 80.74 |
The symbol ↑ means the higher the value, the better.
The symbol ↓ means the lower the value, the better.
Bold font highlights the optimal values.
Figure 5. Comparison of different methods on the Synapse multi-organ dataset by visualization. From left to right: (A) Ground Truth, (B) O-Net, (C) SwinUNet, (D) TransUNet, and (E) R50 AttUNet.
Segmentation results of different methods on the ISIC2017 dataset.
| Methods | | | | | | |
|---|---|---|---|---|---|---|
| U-Net Ronneberger et al. | 85.22 | 78.40 | 91.17 | 73.98 | 77.80 | 91.19 |
| R50-U-Net Xiao et al. | 87.48 | 80.86 | 92.99 | 78.19 | 81.70 | 92.19 |
| U-SegNet Kumar et al. | 87.87 | 81.22 | 90.50 | 81.13 | 82.49 | 92.33 |
| EDNFC-Net Gehlot et al. | 88.00 | 81.43 | 90.26 | | 82.80 | 92.29 |
| M-Net Fu et al. | 88.33 | 82.25 | 94.46 | 79.04 | 83.38 | 92.67 |
| AION Gehlot and Gupta | 88.84 | 82.56 | 92.26 | 81.95 | 84.02 | 92.88 |
| CE-Net Gu et al. | 89.64 | 83.56 | 95.40 | 80.47 | 84.99 | 93.67 |
| Swin-Unet Cao et al. | 88.77 | 82.69 | 94.64 | 79.16 | 83.51 | 94.04 |
| TransFuse Zhang et al. | 89.63 | 83.78 | 95.56 | 80.35 | 84.75 | 93.73 |
| TransUNet Chen et al. | 89.99 | 84.21 | 95.59 | 81.21 | 85.42 | 93.97 |
| O-Net | | | | 81.72 | | |
Bold font highlights the optimal values.
Classification accuracy of different methods on the ISIC2017 dataset.
| Methods | | | | | |
|---|---|---|---|---|---|
| Swin Transformer Liu et al. | 80.22 | 89.50 | 81.18 | 62.16 | 91.76 |
| AION Gehlot and Gupta | 81.55 | 85.33 | 76.01 | 50.74 | 86.86 |
| TransMed Dai et al. | 84.11 | 89.19 | 80.10 | 61.90 | 92.16 |
| MobileNetV3 Howard et al. | 84.89 | 89.33 | 81.53 | 60.83 | 90.78 |
| EfficientNet-B3 Tan and Le | 85.22 | 90.67 | 82.64 | 66.67 | 93.33 |
| Inception v4 Szegedy et al. | 85.33 | 89.16 | 81.45 | 60.16 | 90.39 |
| ResNet50 He et al. | 85.44 | 91.00 | 82.97 | 68.37 | 93.92 |
| DenseNet201 Huang et al. | 86.56 | | | 69.81 | 93.73 |
| O-Net | | 91.67 | 83.51 | | |

| Methods | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Swin Transformer Liu et al. | 80.22 | 73.00 | 71.63 | 84.07 | 73.91 | 78.17 | 68.45 | 45.33 | 83.02 |
| AION Gehlot and Gupta | 81.55 | 77.33 | 75.69 | 85.40 | 74.40 | 82.00 | 69.73 | 54.46 | 90.48 |
| TransMed Dai et al. | 84.11 | 79.17 | 77.13 | 84.72 | 71.50 | 84.00 | 79.83 | 59.63 | 55.56 |
| MobileNetV3 Howard et al. | 84.89 | 80.50 | 78.78 | 86.51 | 75.36 | 84.83 | 74.59 | 62.75 | 92.13 |
| EfficientNet-B3 Tan and Le | 85.22 | 80.50 | 78.82 | 86.70 | 75.85 | 84.50 | | 59.84 | 89.86 |
| Inception v4 Szegedy et al. | 85.33 | 80.83 | 79.05 | 86.39 | 74.88 | 86.00 | 75.94 | 67.37 | 93.58 |
| ResNet50 He et al. | 85.44 | 81.50 | 79.99 | | | 83.83 | 75.28 | 57.69 | 88.61 |
| DenseNet201 Huang et al. | 86.56 | 83.50 | 81.55 | 86.57 | 73.91 | 84.17 | 72.48 | 61.96 | 92.75 |
| O-Net | | | | 84.49 | 67.63 | | 74.19 | | |
Bold font highlights the optimal values.
Figure 6. Receiver operating characteristic curves of the different methods for the classification task on the ISIC2017 dataset.
Figure 7. Comparison of different methods on the ISIC2017 dataset by visualization. (A) Image, (B) Ground Truth, (C) O-Net, (D) TransUNet, (E) Swin-UNet, (F) CE-Net, (G) R50 AttUNet, and (H) UNet.
Ablation study on the choice of CNN encoder.
| Methods | Params (M) | DSC↑ | HD↓ | Aorta | Gallbladder | Kidney (L) | Kidney (R) | Liver | Pancreas | Spleen | Stomach |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MobileNetV3 Howard et al. | | 76.66 | 26.27 | 86.12 | 62.25 | 82.07 | | 94.06 | 55.72 | 88.62 | 73.77 |
| DenseNet201 Huang et al. | 20.01 | 78.91 | 20.45 | 87.52 | 65.52 | 82.61 | 78.20 | 95.05 | 57.42 | 86.40 | 78.65 |
| ResNet50 He et al. | 25.55 | 79.16 | 23.01 | 87.71 | 66.86 | 81.73 | 75.22 | 94.18 | 58.86 | | 78.31 |
| Inception v3 Szegedy et al. | 23.83 | 80.36 | 22.78 | 88.09 | 63.76 | 82.19 | 79.25 | 95.16 | | 87.12 | |
| EfficientNet-b3 Tan and Le | 12.23 | | 21.04 | 88.36 | 67.45 | 84.44 | 77.13 | 95.24 | 61.52 | 90.03 | 80.74 |
The symbol ↑ means the higher the value, the better.
The symbol ↓ means the lower the value, the better.
Bold font highlights the optimal values.
Ablation study on the combination of the CNN and Swin Transformer methods.
| | | | | DSC↑ | HD↓ | Aorta | Gallbladder | Kidney (L) | Kidney (R) | Liver | Pancreas | Spleen | Stomach |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ✓ | ✓ | | | 78.86 | 28.86 | 87.72 | 62.19 | 83.11 | 76.67 | 94.49 | 56.61 | 89.48 | 80.58 |
| ✓ | ✓ | | | 79.93 | 26.88 | 87.90 | 68.09 | 83.89 | 76.05 | 94.42 | | 87.32 | 78.86 |
| ✓ | ✓ | ✓ | | 80.34 | 22.53 | | 67.38 | 83.95 | 77.01 | 95.12 | 60.06 | 88.76 | |
| ✓ | ✓ | | | 77.55 | 31.03 | 86.14 | 63.49 | 81.59 | 75.82 | 93.68 | 54.61 | 90.19 | 74.87 |
| ✓ | ✓ | | | 79.13 | 21.55 | 85.47 | 66.53 | 83.28 | 79.61 | 94.29 | 56.58 | | 76.60 |
| ✓ | ✓ | ✓ | | 79.38 | 22.34 | 87.60 | 62.53 | 84.86 | | 94.42 | 58.75 | 90.64 | 75.66 |
| ✓ | ✓ | ✓ | | 79.47 | 29.19 | 87.71 | 66.21 | 81.64 | 74.69 | 94.65 | 61.61 | 89.19 | 80.02 |
| ✓ | ✓ | ✓ | | 80.41 | 27.33 | 86.74 | 71.19 | 84.32 | 77.29 | 94.30 | 60.63 | 89.20 | 79.64 |
| ✓ | ✓ | ✓ | ✓ | | 21.04 | 88.36 | 67.45 | 84.44 | 77.13 | 95.24 | 61.52 | 90.03 | 80.74 |
The symbol ↑ means the higher the value, the better.
The symbol ↓ means the lower the value, the better.
Bold font highlights the optimal values.
Ablation study on learning rate and batch size.
| Learning rate | DSC↑ | HD↓ | Aorta | Gallbladder | Kidney (L) | Kidney (R) | Liver | Pancreas | Spleen | Stomach |
|---|---|---|---|---|---|---|---|---|---|---|
| 1e-1 | 79.21 | | 86.56 | 63.48 | | | 94.32 | 56.99 | | 78.69 |
| 5e-2 | | 21.04 | 88.36 | 67.45 | 84.44 | 77.13 | 95.24 | 61.52 | 90.03 | 80.74 |
| 1e-2 | 79.07 | 20.14 | 87.64 | 67.74 | 81.95 | 74.69 | 94.71 | 58.33 | 89.44 | 78.03 |
| 5e-3 | 79.76 | 23.07 | 88.18 | | 83.60 | 76.92 | 94.42 | 58.84 | 88.59 | 79.03 |
| 1e-3 | 76.57 | 30.37 | 85.28 | 62.96 | 81.61 | 74.51 | 92.96 | 54.13 | 86.37 | |

| Batch size | DSC↑ | HD↓ | Aorta | Gallbladder | Kidney (L) | Kidney (R) | Liver | Pancreas | Spleen | Stomach |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 78.81 | | 88.12 | 44.45 | | | 94.73 | | 89.97 | 81.00 |
| 16 | 78.36 | 18.25 | | 38.12 | 84.57 | 79.46 | 95.16 | 66.20 | | |
| 24 | | 21.04 | 88.36 | 67.45 | 84.44 | 77.13 | 95.24 | 61.52 | 90.03 | 80.74 |
| 32 | 80.35 | 27.93 | 88.32 | 66.70 | 81.94 | 76.19 | | 64.06 | 88.94 | 81.27 |
The symbol ↑ means the higher the value, the better.
The symbol ↓ means the lower the value, the better.
Bold font highlights the optimal values.
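In the two ablations above, the rows whose per-organ scores match the final O-Net results use a learning rate of 5e-2 and a batch size of 24. A minimal sketch of a training setup under those settings follows; the optimizer, momentum, weight decay, loss function, and the toy dataset/model are assumptions for illustration, not details taken from this record.

```python
# Sketch of training under the ablation's best settings: lr 5e-2, batch 24.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the real Synapse data and the O-Net model.
dataset = TensorDataset(torch.randn(48, 3, 224, 224),
                        torch.randint(0, 9, (48, 224, 224)))
loader = DataLoader(dataset, batch_size=24, shuffle=True)   # batch size 24
model = torch.nn.Conv2d(3, 9, 1)                            # placeholder for O-Net
optim = torch.optim.SGD(model.parameters(), lr=5e-2,        # lr 5e-2 per the ablation
                        momentum=0.9, weight_decay=1e-4)    # assumed hyperparameters
loss_fn = torch.nn.CrossEntropyLoss()                       # assumed loss

for images, masks in loader:
    optim.zero_grad()
    loss = loss_fn(model(images), masks)
    loss.backward()
    optim.step()
```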