Gaihua Wang, Qianyu Zhai.
Abstract
Contextual information is a key factor in semantic segmentation. Many recent methods use the self-attention mechanism to capture more contextual information, but self-attention is computationally expensive. To address this, a novel self-attention network, FFANet, is designed to capture contextual information efficiently; it reduces the amount of computation through strip pooling and linear layers. The network introduces a feature fusion (FF) module that computes an affinity matrix capturing the relationships between pixels. The affinity matrix is then multiplied with the feature map, which selectively increases the weight of regions of interest. Extensive experiments on two public datasets (PASCAL VOC 2012 and CityScapes) and a remote sensing dataset (DLRSD) yield mean IoU scores of 74.5%, 70.3%, and 63.9%, respectively. On all three datasets, the proposed method achieves the highest mIoU among the methods compared.
Year: 2021 PMID: 34711889 PMCID: PMC8553855 DOI: 10.1038/s41598-021-00585-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The overall structure of our network.
Figure 2. The main architecture of the FF module.
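The abstract describes the FF module only at a high level. As a rough illustration, and not the authors' implementation, the NumPy sketch below shows one way strip pooling and linear layers could produce an affinity matrix that reweights a feature map. The function name `strip_affinity_reweight`, the weight matrices `w_row` and `w_col`, the scaled dot product, and the sigmoid are all assumptions introduced here for illustration.

```python
import numpy as np

def strip_affinity_reweight(x, w_row, w_col):
    """Hypothetical sketch. x: (C, H, W) feature map; w_row, w_col: (D, C) weights."""
    # Strip pooling: average along one spatial axis at a time.
    row_desc = x.mean(axis=2)              # (C, H), one descriptor per row strip
    col_desc = x.mean(axis=1)              # (C, W), one descriptor per column strip
    # Linear layers project each strip descriptor to D dimensions.
    row_feat = w_row @ row_desc            # (D, H)
    col_feat = w_col @ col_desc            # (D, W)
    # Assumed affinity: scaled dot product between row and column strips.
    affinity = (row_feat.T @ col_feat) / np.sqrt(row_feat.shape[0])   # (H, W)
    weights = 1.0 / (1.0 + np.exp(-affinity))                         # sigmoid
    # Multiply the affinity-derived weights with the feature map.
    return x * weights[None, :, :], weights

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 32))       # C=8, H=16, W=32
w_row = rng.standard_normal((4, 8))        # D=4
w_col = rng.standard_normal((4, 8))
out, weights = strip_affinity_reweight(x, w_row, w_col)

# Size of the affinity under this sketch vs. a full pixel-to-pixel affinity.
strip_entries = 16 * 32
full_entries = (16 * 32) ** 2
```

Under these assumptions the affinity has H × W entries, whereas a full pixel-to-pixel affinity over the same map would have (H · W)² entries, which is the kind of saving the abstract attributes to strip pooling.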
Results of ablation experiments.
| Method | Backbone | FF | mIoU (%) | PA (%) |
|---|---|---|---|---|
| FCN8s | | – | 64.4 | 90.8 |
| Ours | | × 1 | 72.8 | 92.9 |
| Ours | MobileNet v2 | × 2 | 67.7 | 91.2 |
| Ours | EfficientNet b0 | × 2 | 70.4 | 90 |
| Ours | | × 2 | 74.5 | 93.5 |
| Ours | ResNet101 | × 2 | 75.8 | 93.8 |
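The tables in this record report mIoU and pixel accuracy (PA). As a minimal sketch of how these two metrics are commonly computed from a confusion matrix, the example below uses tiny made-up label maps; the helper names are hypothetical, and the paper's exact evaluation protocol (for example, handling of ignored labels) is not stated here and may differ.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows are ground-truth classes, columns are predicted classes.
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    # Fraction of pixels whose predicted class matches the ground truth.
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0                      # skip classes absent from both maps
    return (tp[valid] / union[valid]).mean()

# Toy 2 x 3 label maps with 3 classes (illustrative data only).
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
pa = pixel_accuracy(cm)
miou = mean_iou(cm)
```

In this toy case four of six pixels are correct, and the per-class IoUs are 1/3, 2/3, and 1/2.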
Figure 3. Visualization of PASCAL VOC 2012 (val).
Performance comparison of different models on PASCAL VOC 2012 (val).
| Method | Publication | Weights | mIoU (%) | PA (%) |
|---|---|---|---|---|
| FCN8s | CVPR2015 | 180 MB | 64.4 | 90.8 |
| PSPNet | CVPR2017 | 392 MB | 70.8 | 92 |
| Deeplab | ECCV2018 | 309 MB | 71.8 | 92.3 |
| UPerNet | ECCV2018 | 817 MB | 69.4 | 91.8 |
| CCNet | ICCV2019 | 363 MB | 71 | 92.5 |
| DANet | CVPR2019 | 363 MB | 73.2 | 93 |
| DRANet | IEEE | 400 MB | 73.8 | 93.8 |
| SPNet | CVPR2020 | 346 MB | 69.2 | 91.4 |
| Ours | | 279 MB | 74.5 | 93.5 |
Per-class results on PASCAL VOC 2012 (val).
| Method | Background | Airplane | Bicycle | Bird | Boat | Bottle | Bus | Car | Cat | Chair | Cow | Dining table | Dog | Horse | Motorcycle | Person | Potted plant | Sheep | Sofa | Train | TV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FCN8s | 90.6 | 80.1 | 55.4 | 77.7 | 59.7 | 57.5 | 73.5 | 75.5 | 77.4 | 22.7 | 68 | 40.5 | 71.7 | 67.1 | 72.6 | 80.6 | 48.1 | 67.5 | 35.8 | 71.9 | 59.4 |
| PSPNet | 91.8 | 86.2 | 53.8 | 86.3 | 65.6 | 74.1 | 83.4 | 81 | 86.1 | 27.3 | 83.1 | 48.7 | 78.5 | 79 | 82 | 80.33 | 46.3 | 83.9 | 42.6 | 74.6 | 63.5 |
| Deeplab | 91.6 | 88 | 51.9 | 84.4 | 66.3 | 61.1 | 87.2 | 81.8 | 88.6 | 30.7 | 81 | 59 | 81.7 | 74.2 | 74.7 | 82.9 | 51.5 | 69.5 | 47.2 | 84.3 | 69.3 |
| UPerNet | 91.3 | 84.1 | 55.4 | 82.1 | 65.1 | 65.7 | 83.6 | 78.8 | 82.8 | 28.1 | 74.3 | 54.3 | 77.6 | 75.3 | 78.4 | 80.7 | 47.3 | 74.5 | 42.3 | 71.8 | 64.6 |
| CCNet | 91.6 | 87.2 | 48.1 | 84.2 | 67.1 | 74.9 | 84.8 | 82.1 | 87.7 | 28.9 | 81.5 | 44.3 | 77.6 | 77.7 | 71.6 | 82.3 | 48.4 | 82.8 | 40 | 81.2 | 66.9 |
| DANet | 92.6 | 87.3 | 54.8 | 87.5 | 66.8 | 76.9 | 86.6 | 87.2 | 85.6 | 29.4 | 82 | 56.4 | 76.5 | 79.4 | 79.3 | 83.8 | 56.1 | 78.1 | 42.3 | 78.8 | 69.3 |
| DRANet | 92.3 | 88.6 | 56.4 | 85.8 | 71.7 | 76 | 86 | 85.1 | 90.6 | 31.2 | 80.1 | 58.5 | 83.1 | 74.7 | 80.2 | 82.7 | 55.1 | 81.8 | 41.9 | 82.6 | 66.3 |
| SPNet | 90.6 | 85.4 | 51.6 | 81.7 | 65.9 | 71.6 | 90.7 | 82.5 | 85 | 24.4 | 76 | 49.8 | 77.5 | 60 | 71.4 | 80.2 | 38.9 | 79.4 | 36.8 | 83.7 | 70.3 |
| Ours | 92.8 | 88.6 | 61.3 | 87.2 | 69.3 | 76.9 | 86.4 | 83 | 89.3 | 28.3 | 81.9 | 51.6 | 82.2 | 79 | 80.4 | 84.1 | 60.8 | 79 | 46.2 | 85.4 | 70.1 |
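As a quick consistency check, the per-class IoU values in the "Ours" row can be averaged and compared with the overall mIoU reported for PASCAL VOC 2012. The sketch below assumes the reported mIoU is the unweighted mean over the 21 classes (background plus 20 object classes); this record does not state the averaging explicitly.

```python
# Per-class IoU (%) for the "Ours" row, in column order (Background ... TV).
ours_iou = [
    92.8, 88.6, 61.3, 87.2, 69.3, 76.9, 86.4, 83, 89.3, 28.3, 81.9,
    51.6, 82.2, 79, 80.4, 84.1, 60.8, 79, 46.2, 85.4, 70.1,
]

# Unweighted mean over classes (assumed definition of mIoU).
mean_of_classes = sum(ours_iou) / len(ours_iou)
rounded = round(mean_of_classes, 1)
```

The mean comes to about 74.47, which rounds to the 74.5% mIoU reported for the proposed method.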
Segmentation results on the CityScapes (val).
| Method | mIoU (%) | PA (%) |
|---|---|---|
| FCN8s | 62.9 | 94.4 |
| U-Net | 61.3 | 94.2 |
| PSPNet | 67.1 | 95.2 |
| DeepLab | 68.6 | 95.5 |
| CCNet | 66 | 95 |
| DANet | 67.4 | 95.1 |
| DRANet | 69.2 | 95.7 |
| SPNet | 67.6 | 95.1 |
| Ours | 70.3 | 95.7 |
Figure 4. Visualization of CityScapes (val).
Segmentation results on the DLRSD (val).
| Network | mIoU (%) | PA (%) | FLOPs | Params |
|---|---|---|---|---|
| FCN8s | 52.7 | 71.8 | 6.7 × 10^9 | 11,853,788 |
| U-Net | 59.3 | 74.4 | 1.38 × 10^10 | 28,957,521 |
| PSPNet | 59.9 | 77.6 | 4.33 × 10^10 | 23,357,160 |
| DeepLab | 61 | 77.4 | 4.46 × 10^10 | 20,237,465 |
| UPerNet | 60.3 | 77.6 | 1.11 × 10^11 | 53,619,560 |
| DANet | 59.1 | 77.2 | 4.81 × 10^10 | 47,565,905 |
| LedNet | 56.6 | 75.7 | 1.26 × 10^10 | 11,419,797 |
| SPNet | 55.2 | 74.6 | 3.78 × 10^10 | 45,371,537 |
| Ours | 63.9 | 79 | 3.81 × 10^10 | 37,862,993 |
Figure 5. DLRSD (val) dataset visualization results.