Kewei Wang, Fuwu Yan, Bin Zou, Luqi Tang, Quan Yuan, Chen Lv.
Abstract
Deep convolutional neural networks have led the trend in vision-based road detection; however, obtaining the full road area despite occlusion from monocular vision remains challenging due to the dynamic scenes encountered in autonomous driving. Inferring the occluded road area requires a comprehensive understanding of the geometry and semantics of the visible scene. To this end, we create a small but effective dataset based on the KITTI dataset, named KITTI-OFRS (KITTI occlusion-free road segmentation), and propose a lightweight and efficient fully convolutional neural network called OFRSNet (occlusion-free road segmentation network) that learns to predict occluded portions of the road in the semantic domain by looking around foreground objects and at the visible road layout. In particular, a global context module is used to build the down-sampling and joint context up-sampling blocks in our network, which improves performance. Moreover, a spatially-weighted cross-entropy loss is designed that significantly increases the accuracy on this task. Extensive experiments on different datasets verify the effectiveness of the proposed approach, and comparisons with current state-of-the-art methods show that the proposed method outperforms the baseline models by achieving a better trade-off between accuracy and runtime, which makes our approach applicable to autonomous vehicles in real time.
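The abstract names a spatially-weighted cross-entropy loss but does not spell it out here. A minimal numpy sketch of one plausible reading, assuming the weight map up-weights pixels near the road edge (consistent with Figure 6) and that the exact weighting scheme is this author's illustration, not the paper's definition:

```python
import numpy as np

def spatially_weighted_ce(logits, labels, weight_map, eps=1e-12):
    """Per-pixel cross-entropy scaled by a spatial weight map.

    logits:     (H, W, C) raw class scores
    labels:     (H, W) integer class ids
    weight_map: (H, W) per-pixel weights (e.g. > 1 near road edges)
    """
    # numerically stable softmax over the class axis
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # probability assigned to the true class at each pixel
    h, w = labels.shape
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    ce = -np.log(p_true + eps)                 # (H, W) per-pixel loss
    # weighted average, normalized by the total weight mass
    return (weight_map * ce).sum() / weight_map.sum()
```

With a uniform weight map this reduces to ordinary mean cross-entropy; concentrating weight in the edge band shifts the gradient budget toward the road boundary, where occlusion reasoning is hardest.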
Keywords: autonomous vehicles; occlusion reasoning; road detection; scene understanding
Year: 2019 PMID: 31671547 PMCID: PMC6864472 DOI: 10.3390/s19214711
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Comparison of road segmentation and the proposed occlusion-free road segmentation. (a) RGB image; (b) visualization of the road segmentation results; (c) visualization of the semantic representation of the scene, which could be obtained by semantic segmentation algorithms in real applications or by human annotation in the training phase; (d) visualization of the results of the proposed occlusion-free road segmentation. Green denotes the road area in (b) and (d).
Our network architecture in detail. Size refers to the output feature map size for an input of 384 × 1248.
| Stage | Block Type | Size |
|---|---|---|
| Encoder | Context Down-sampling | 192 × 624 × 16 |
| | Context Down-sampling | 96 × 312 × 32 |
| | Factorized Blocks | 96 × 312 × 32 |
| | Context Down-sampling | 48 × 156 × 64 |
| | Dilated Blocks | 48 × 156 × 64 |
| | Context Down-sampling | 24 × 78 × 128 |
| | Dilated Blocks | 24 × 78 × 128 |
| Decoder | Joint Context Up-sampling | 48 × 156 × 64 |
| | Bottleneck Blocks | 48 × 156 × 64 |
| | Joint Context Up-sampling | 96 × 312 × 32 |
| | Bottleneck Blocks | 96 × 312 × 32 |
| | Joint Context Up-sampling | 192 × 624 × 16 |
| | Bottleneck Blocks | 192 × 624 × 16 |
| | Deconv | 384 × 1248 × 2 |
Figure 2The proposed occlusion-free road segmentation network architecture.
Figure 3The context convolution block.
Figure 4The joint context up-sampling block.
Figure 5Residual blocks in our network.
Figure 6Visualization of the road edge region. (a) The road segmentation label; (b) road edge obtained from (a) by the Canny algorithm; (c) road edge region with a width of 10 pixels.
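Figure 6 derives a road-edge band from the segmentation label: the boundary is extracted (the paper uses the Canny algorithm) and widened to 10 pixels. A numpy-only sketch of the same idea on a binary road mask, substituting repeated 4-neighbour dilation/erosion for Canny (a simplification on my part, since for a clean binary mask the boundary is just where the mask changes value):

```python
import numpy as np

def edge_region(road_mask, width=10):
    """Approximate the road-edge band of a binary mask.

    road_mask: (H, W) bool array, True on road.
    width:     band width in pixels (the paper uses 10).
    Returns a bool mask that is True within width/2 pixels of the
    road boundary on either side.
    """
    def dilate(m):
        # 4-neighbour binary dilation by one pixel
        out = m.copy()
        out[1:, :] |= m[:-1, :]
        out[:-1, :] |= m[1:, :]
        out[:, 1:] |= m[:, :-1]
        out[:, :-1] |= m[:, 1:]
        return out

    grown, shrunk = road_mask, road_mask
    for _ in range(width // 2):
        grown = dilate(grown)
        shrunk = ~dilate(~shrunk)   # erosion = dilation of the complement
    return grown & ~shrunk          # band straddling the boundary
```

The band is exactly the region where the spatially-weighted loss (Table results below) would place extra weight.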
Figure 7. An example sample from the KITTI-occlusion-free road segmentation (KITTI-OFRS) dataset. (a) The RGB image; (b) the semantic segmentation annotation; (c) the full-road-area annotation, where white denotes road.
Evaluation results of models trained with spatially-weighted cross-entropy loss (CE-SW).
| Model | Parameters | GFLOPs | FPS | ACC | PRE | REC | F1 | IoU |
|---|---|---|---|---|---|---|---|---|
| ENet | 0.37M | 3.83 | — | 91.8% | 92.1% | 89.3% | 90.7% | 82.9% |
| ERFNet | 2.06M | 24.43 | 25 | 92.3% | 92.6% | 89.7% | 91.2% | 83.8% |
| SegNet | 29.46M | 286.03 | 16 | 92.9% | 93.6% | 90.2% | 91.8% | 84.9% |
| ORBNet | 1.91M | 48.48 | 11.5 | 92.7% | 93.4% | 89.9% | 91.6% | 84.5% |
| OFRSNet | 0.39M | 2.99 | 46 | — | — | — | — | — |
Figure 8. Qualitative results on the KITTI-OFRS dataset. The columns from left to right show the results of GT, ENet, ORBNet, and OFRSNet, respectively. Red denotes false negatives, blue false positives, and green true positives.
Evaluation results of models trained with the plain cross-entropy loss (CE). The values in parentheses show the metric degradation relative to training with the spatially-weighted cross-entropy loss (CE-SW).
| Model | ACC | PRE | REC | F1 | IoU |
|---|---|---|---|---|---|
| ENet | 90.4%(−1.4%) | 90.5%(−1.6%) | 87.6%(−1.7%) | 89.0%(−1.7%) | 80.2%(−2.7%) |
| ERFNet | 90.5%(−1.8%) | 90.9%(−1.7%) | 87.3%(−2.4%) | 89.1%(−2.1%) | 80.3%(−3.5%) |
| SegNet | 92.1%(−0.8%) | 92.6%(−1.0%) | 89.4%(−0.8%) | 91.0%(−0.8%) | 83.5%(−1.4%) |
| ORBNet | 91.5%(−1.2%) | 92.2%(−1.2%) | 88.4%(−1.5%) | 90.2%(−1.4%) | 82.2%(−2.3%) |
| OFRSNet | 91.7%(−1.5%) | 92.4%(−1.8%) | 88.6%(−1.7%) | 90.5%(−1.7%) | 82.6%(−2.9%) |
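The ACC, PRE, REC, F1, and IoU columns in these tables are standard pixel-level metrics for the road class, all derivable from the confusion counts (TP, FP, FN, TN). A small sketch of the definitions:

```python
def road_metrics(tp, fp, fn, tn):
    """Pixel-level metrics for a binary (road / non-road) segmentation.

    tp, fp, fn, tn: confusion counts accumulated over all test pixels.
    Returns (accuracy, precision, recall, F1, IoU).
    """
    acc = (tp + tn) / (tp + fp + fn + tn)   # overall pixel accuracy
    pre = tp / (tp + fp)                    # precision
    rec = tp / (tp + fn)                    # recall
    f1 = 2 * pre * rec / (pre + rec)        # harmonic mean of pre/rec
    iou = tp / (tp + fp + fn)               # intersection over union
    return acc, pre, rec, f1, iou
```

Note that F1 and IoU are linked by F1 = 2·IoU/(1 + IoU), which is consistent with the tables above (e.g. SegNet's IoU of 84.9% implies an F1 of about 91.8%).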
Performance comparison of the model with and without the global context module.
| Model | Context | Parameters | GFLOPs | ACC | PRE | REC | F1 | IoU |
|---|---|---|---|---|---|---|---|---|
| OFRSNet | w/o | 0.34M | 2.96 | 92.7% | 92.8% | 90.4% | 91.6% | 84.5% |
| OFRSNet | w/ | 0.39M | 2.99 | — | — | — | — | — |
Figure 9Qualitative results on the Cityscapes dataset using ground truth semantics as input. Green represents the detected full road area.
Figure 10Qualitative results on the Cityscapes dataset using predicted semantics as input, which were obtained by the DeepLabv3+ algorithm. Green represents the detected full road area.