Xiaoyu Chen, Chuan Wang, Jun Lu, Lianfa Bai, Jing Han.
Abstract
Road-scene parsing is complex and changeable; interference from the background destroys the visual structure in the image data, increasing the difficulty of target detection. The key to addressing road-scene parsing is to amplify the feature differences between the targets, as well as those between the targets and the background. This paper proposes a novel scene-parsing network, the Attentional Prototype-Matching Network (APMNet), which segments targets by matching candidate features with target prototypes regressed from labeled road-scene data. To obtain reliable target prototypes, we designed the Sample-Selection and Class-Repellence Algorithms for the prototype-regression process. We also built class-to-class and target-to-background attention mechanisms to increase feature recognizability based on the targets' visual characteristics and spatial distribution. Experiments conducted on two road-scene datasets, CamVid and Cityscapes, demonstrate that our approach effectively improves the representation of targets and achieves impressive results compared with other approaches.
Keywords: attention mechanism; intelligent vehicles; prototype learning; scene-parsing
Year: 2022 PMID: 36015919 PMCID: PMC9415761 DOI: 10.3390/s22166159
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
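The prototype-matching idea at the heart of the abstract — regress one prototype per class from labeled features, then label each pixel by its most similar prototype — can be illustrated with a minimal NumPy sketch. This is a simplified stand-in, not the paper's method: prototypes here are plain masked means and matching is cosine similarity, with the Sample-Selection, Class-Repellence, and attention components omitted; all function names are hypothetical.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Regress one prototype per class as the mean feature vector of
    that class's labeled pixels (a simplified stand-in for the paper's
    Sample-Selection / Class-Repellence prototype regression).

    features: (H, W, C) feature map, labels: (H, W) integer class map.
    """
    C = features.shape[-1]
    protos = np.zeros((num_classes, C))
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            protos[k] = features[mask].mean(axis=0)  # masked average
    return protos

def match_segmentation(features, protos):
    """Assign each pixel the class of its most similar prototype,
    using cosine similarity as the matching score."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = protos / (np.linalg.norm(protos, axis=-1, keepdims=True) + 1e-8)
    sim = f @ p.T              # (H, W, K) similarity to each prototype
    return sim.argmax(axis=-1)  # (H, W) predicted class map
```

In the paper, the candidate features come from a learned backbone and the prototypes are refined during training; the sketch only shows the matching step that turns prototypes plus features into a segmentation map.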
Figure 1The flowchart of APMNet.
Figure 2The visualization of the main steps in training and testing APMNet.
Table 1 Comparison of different settings in Class-Prototype Regression.
| Setting | PU | SSA | CRA | mIoU |
|---|---|---|---|---|
| - | | | | 61.4 |
| (a) | ✓ | | | 65.3 |
| (b) | ✓ | ✓ | | 65.8 |
| (c) | ✓ | | ✓ | 65.7 |
| (d) | ✓ | ✓ | ✓ | 67.0 |
Figure 3The visualization of class prototypes. (a–d) in the figure indicate the corresponding settings according to Table 1.
Table 2 Comparison of different settings in the Sample-Selection Algorithm.
| ε | mIoU |
|---|---|
| 0.001 | 65.1 |
| 0.005 | 65.8 |
| 0.010 | 67.0 |
| 0.050 | 66.3 |
| 0.100 | 65.9 |
Table 3 Comparison of the Attentional Features.
| C2C | T2B | mIoU | FPS |
|---|---|---|---|
| | | 74.71 | 65.5 |
| ✓ | | 76.23 | 41.9 |
| | ✓ | 75.17 | 37.8 |
| ✓ | ✓ | 78.88 | 28.5 |
Figure 4The visualization of the Class-to-Class Attention and the Target-to-Background Attention. (a) C2C Attention, (b) T2B Attention.
Table 4 Results with CamVid; “-” indicates that the results were not given in the original paper. The bold results indicate the best for the classes. (Blank cells were unrecoverable from the source record.)
| Algorithms | Building | Tree | Sky | Car | Sign/Symbol | Road | Pedestrian | Fence | Pole | Sidewalk | Bicycle | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SegNet-A [ ] | 75.0 | 84.6 | 91.2 | 82.7 | 36.9 | 93.3 | 55.0 | 47.5 | | 74.1 | 16.0 | - |
| SegNet-B | | | 92.4 | 82.1 | 20.5 | | 57.1 | 49.3 | 27.5 | 84.4 | 30.7 | 55.6 |
| ENet [ ] | 74.7 | 77.8 | | 82.4 | 51.0 | 95.1 | 67.2 | 51.7 | 35.4 | | 34.1 | 51.3 |
| BiSeNet-A [ ] | 82.2 | 74.4 | 91.9 | 80.8 | 42.8 | 93.3 | 53.8 | 49.7 | 25.4 | 77.3 | 50.0 | 65.6 |
| BiSeNet-B | 83.0 | 75.8 | 92.0 | 83.7 | 46.5 | 94.6 | 58.8 | | 31.9 | 81.4 | 54.0 | 68.7 |
| DeepLab [ ] | 81.5 | 74.6 | 89.0 | 82.2 | 42.3 | 92.2 | 48.4 | 27.2 | 14.3 | 75.4 | 50.0 | 61.6 |
| Dilation8 [ ] | 82.6 | 76.2 | 89.9 | 84.0 | 46.9 | 92.2 | 56.3 | 35.8 | 23.4 | 75.3 | 55.5 | 65.3 |
| APMNet | 86.5 | 78.8 | 92.5 | | | 96.3 | | 44.4 | 39.5 | 86.5 | | |
Figure 5The visualization of the results of APMNet and the baseline with the CamVid dataset: (a) Inputs; (b) Ground Truth; (c) BiSeNet results; (d) APMNet results.
Table 5 Results with Cityscapes; “-” indicates that the results were not given in the original paper. The bold results indicate the best for the classes.
| Algorithms | IoU Class | iIoU Class | IoU Category | iIoU Category |
|---|---|---|---|---|
| DPN [ | 66.8 | 39.1 | 86.0 | 69.1 |
| DeepLab [ | 70.4 | 42.6 | 86.4 | 67.7 |
| LDFNet [ | 71.3 | 46.3 | 88.5 | 74.2 |
| GLR [ | 77.3 | 53.4 | 90.0 | 76.8 |
| PSPNet [ | 78.4 | 56.7 | 90.6 | 78.6 |
| BiSeNet [ | 78.9 | - | - | - |
| AAF [ | 79.1 | 56.1 | 90.8 | 78.5 |
| LDN [ | 79.3 | 54.7 | 90.7 | 78.4 |
| CFNet [ | 79.6 | - | - | - |
| DFN [ | 80.3 | 58.3 | 90.8 | 79.6 |
| HANet [ | 80.9 | 58.6 | 91.2 | 79.5 |
| APMNet | | | | |
Figure 6 The visualization of the results of APMNet and the baseline on the Cityscapes dataset: (a) Inputs; (b) Ground Truth; (c) BiSeNet results; (d) APMNet results.