Qusay Sellat¹, Sukant Kishoro Bisoy¹, Rojalina Priyadarshini¹, Ankit Vidyarthi², Sandeep Kautish³, Rabindra K Barik⁴
Abstract
Situational understanding is a critical component of any self-driving system. Accurate, real-time processing of visual signals to produce pixel-wise classified images, a task known as semantic segmentation, is essential both for scene comprehension and for the eventual acceptance of this new technology. Because of the intricate interactions between pixels in each frame of the incoming camera data, the required processing speed and accuracy could not be achieved before recent advances in deep learning. In this study, we present an effective approach to semantic segmentation for self-driving cars. Our model combines deep learning architectures, such as convolutional neural networks and autoencoders, with state-of-the-art techniques such as feature pyramid networks and bottleneck residual blocks. The model is trained and tested on the CamVid dataset, to which extensive data augmentation has been applied. To validate the proposed model, we compare its results with those of several baseline models reported in the literature.
Year: 2022 PMID: 35082843 PMCID: PMC8786485 DOI: 10.1155/2022/6390260
Source DB: PubMed Journal: Comput Intell Neurosci
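Semantic segmentation, as described in the abstract, assigns a class to every pixel. A minimal sketch of the final step, assuming the network outputs one score per class at each pixel, is to pick the highest-scoring class per pixel. The tiny score array and the class names below are hypothetical, and this is not the authors' code.

```python
from typing import List


def segment(logits: List[List[List[float]]]) -> List[List[int]]:
    """Convert per-pixel class scores into a label map.

    logits[i][j][c] is the score of class c at pixel (i, j); each output
    pixel holds the index of its highest-scoring class (ties go to the
    lowest index, because max() keeps the first maximum it sees).
    """
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in logits
    ]


# Hypothetical 2x2 frame with 3 classes (0 = road, 1 = car, 2 = sky).
logits = [
    [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]],
    [[0.1, 0.2, 3.0], [1.1, 1.0, 0.9]],
]
print(segment(logits))  # [[0, 1], [2, 0]]
```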
Figure 1. Architecture of the feature pyramid network.
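A feature pyramid network merges coarse, semantically strong feature maps into finer ones through a top-down pathway with lateral connections. The sketch below is a deliberately simplified, single-channel illustration of that merge step, not the authors' implementation: each lateral 1 × 1 convolution is reduced to a scalar weight, upsampling is nearest-neighbour by a factor of 2, and the 3 × 3 smoothing convolution that the original FPN applies after each merge is omitted. All function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2 in each direction."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def lateral(x: Grid, w: float) -> Grid:
    """A 1x1 convolution on a single-channel map reduces to scaling each pixel."""
    return [[w * v for v in row] for row in x]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps of equal size."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(features: List[Grid], weights: List[float]) -> List[Grid]:
    """Merge backbone maps ordered fine -> coarse (each level half the size).

    The coarsest output is its lateral projection alone; every finer output
    adds the 2x-upsampled output of the level above it.
    """
    outputs = [lateral(features[-1], weights[-1])]
    for feat, w in zip(reversed(features[:-1]), reversed(weights[:-1])):
        outputs.append(add(lateral(feat, w), upsample2x(outputs[-1])))
    return outputs[::-1]  # back to fine -> coarse order


c2 = [[0.0] * 4 for _ in range(4)]
c3 = [[1.0, 2.0], [3.0, 4.0]]
c4 = [[1.0]]
p2, p3, p4 = fpn_top_down([c2, c3, c4], [1.0, 1.0, 1.0])
print(p3)  # [[2.0, 3.0], [4.0, 5.0]]
```

In a real FPN every map has many channels and the lateral 1 × 1 convolutions project them all to a common width so that the element-wise sums are defined; the scalar weights above stand in for that projection.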
Figure 2. Types of bottleneck residual blocks (BRBs). (a) A BRB that performs neither spatial nor channel-dimension reduction. (b) A BRB that performs both, so no residual connection is possible.
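The rule in Figure 2 follows from how an identity shortcut works: the block's input x is added element-wise to its output F(x), so both tensors must have the same height, width, and channel count. The sketch below checks this on shapes only; the (height, width, channels) layout, the "same" padding assumption, and the function names are illustrative assumptions rather than details from the paper.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def block_output_shape(in_shape: Shape, stride: int, out_channels: int) -> Shape:
    """Output shape of a block whose strided 3x3 convolution uses 'same' padding."""
    h, w, _ = in_shape
    return (-(-h // stride), -(-w // stride), out_channels)  # ceil(h / s), ceil(w / s)


def allows_identity_shortcut(in_shape: Shape, out_shape: Shape) -> bool:
    """x + F(x) is only defined when F(x) has exactly the shape of x."""
    return in_shape == out_shape


x = (64, 64, 32)
# (a) no spatial or channel reduction: the shortcut can be added
print(allows_identity_shortcut(x, block_output_shape(x, stride=1, out_channels=32)))  # True
# (b) stride 2 and fewer channels: shapes differ, so no shortcut
print(allows_identity_shortcut(x, block_output_shape(x, stride=2, out_channels=16)))  # False
```

Note that either reduction on its own is already enough to break the shape match; case (b) in the figure simply performs both.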
Details of the bottleneck residual block.
| Input | Operator | Output |
|---|---|---|
| … | 1 × 1 conv2d, ReLU6 | … |
| … | 3 × 3 dwise, … | … |
| … | linear 1 × 1 conv2d | … |
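The operators listed in the table (a 1 × 1 convolution with ReLU6, a 3 × 3 depthwise convolution, and a linear 1 × 1 convolution) match the inverted residual block of MobileNetV2. Assuming that design, with an expansion factor `t` and stride `s` (parameter names borrowed from MobileNetV2, not from this section), the sketch below traces the tensor shape and weight count after each stage. It ignores biases and batch normalisation, and the ReLU6 after the depthwise stage is an assumption because that table cell is truncated.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def trace_bottleneck(in_shape: Shape, t: int, s: int, k_out: int) -> List[Tuple[str, Shape, int]]:
    """Return (operator, output shape, weight count) for each stage.

    Assumes MobileNetV2-style layers: 1x1 expansion by factor t, 3x3 depthwise
    convolution with stride s ('same' padding), and a linear 1x1 projection.
    Bias and batch-norm parameters are ignored.
    """
    h, w, k = in_shape
    tk = t * k
    h2, w2 = -(-h // s), -(-w // s)  # ceil(h / s), ceil(w / s)
    return [
        ("1x1 conv2d, ReLU6", (h, w, tk), k * tk),
        ("3x3 dwise, ReLU6", (h2, w2, tk), 9 * tk),  # one 3x3 filter per channel
        ("linear 1x1 conv2d", (h2, w2, k_out), tk * k_out),  # no activation
    ]


for op, shape, n_weights in trace_bottleneck((56, 56, 24), t=6, s=2, k_out=32):
    print(f"{op:<20} {shape} {n_weights}")
```

The weight counts show why such blocks are cheap: the depthwise stage needs only 9 × tk weights, whereas a standard 3 × 3 convolution over the same tk channels would need 9 × tk × tk.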
Figure 3. Proposed architecture of the bottleneck residual block.
Figure 4. Samples from the CamVid dataset [20].
Details of the dataset utilised for validating the proposed architecture.
| Attribute | Details |
|---|---|
| Name | CamVid |
| Description | Videos with object-class semantic labels (ground-truth labels for 32 semantic classes, such as building, tree, sky, sidewalk, column-pole, fence, and pedestrian) |
| Size | 604 MB |
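The abstract states that the CamVid data underwent considerable augmentation, but this section does not list the transforms used. As one common, label-preserving example (an assumption, not necessarily one the authors applied), a random horizontal flip must be applied identically to the image and its label mask so that every label stays on its pixel. The function name and the toy pixel values are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[Tuple[int, int, int]]]  # rows of RGB pixels
Mask = List[List[int]]                    # rows of class indices


def random_hflip(image: Image, mask: Mask, p: float = 0.5,
                 rng: Optional[random.Random] = None) -> Tuple[Image, Mask]:
    """Mirror image and mask left-right together with probability p.

    One random draw decides for both, so each label stays on its pixel.
    """
    if rng is None:
        rng = random.Random()
    if rng.random() < p:
        return [row[::-1] for row in image], [row[::-1] for row in mask]
    return image, mask


image = [[(255, 0, 0), (0, 0, 255)]]  # one row: red pixel, blue pixel
mask = [[3, 7]]                        # hypothetical class indices
print(random_hflip(image, mask, p=1.0))
# ([[(0, 0, 255), (255, 0, 0)]], [[7, 3]])
```

Geometric transforms like this must touch the mask, whereas photometric ones (brightness, contrast) should change only the image.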
Figure 5. Hardware specifications used in this work.
Figure 6. Sample semantic segmentation results on test images from the CamVid dataset. Left column: original image; middle column: ground-truth labels; right column: predicted labels.
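Comparison between the proposed model and baseline models (mCA: mean class accuracy; mIoU: mean intersection over union; #Params: number of parameters in millions).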
| Model | mCA (%) | mIoU (%) | #Params (M) |
|---|---|---|---|
| SegNet-Basic […] | 62.9 | 46.2 | — |
| SegNet […] | 65.2 | 55.6 | 29.5 |
| FCN-8s […] | — | 57 | 134.5 |
| ApesNet […] | 69.3 | 48 | — |
| ENet […] | 68.3 | 51.3 | 0.36 |
| ESPNet […] | 68.3 | 55.6 | 0.36 |
| ESCNet […] | 70.9 | 56.1 | 0.185 |
| DeepLab-LFOV […] | — | 61.6 | 37.3 |
| Dilated-8 […] | — | 65.3 | 140.8 |
| Proposed model | 78.03 | 58.275 | 5.2 |
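Both metrics in the table can be computed from a pixel-level confusion matrix: the per-class accuracy is TP / (TP + FN) and the per-class IoU is TP / (TP + FP + FN), each averaged over classes. The sketch below is not the authors' evaluation code; it assumes that classes absent from the ground truth are skipped and that void or unlabelled pixels have already been removed, and evaluation protocols differ on both points. The pixel data is hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], n: int) -> List[List[int]]:
    """cm[i][j] counts pixels whose ground truth is i and prediction is j."""
    cm = [[0] * n for _ in range(n)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mca_miou(cm: List[List[int]]) -> Tuple[float, float]:
    """Mean class accuracy and mean IoU, both in percent.

    Classes that never occur in the ground truth are skipped (an assumption;
    protocols differ).
    """
    n = len(cm)
    accs, ious = [], []
    for c in range(n):
        tp = cm[c][c]
        gt_total = sum(cm[c])                         # TP + FN
        pred_total = sum(cm[r][c] for r in range(n))  # TP + FP
        if gt_total == 0:
            continue
        accs.append(tp / gt_total)
        ious.append(tp / (gt_total + pred_total - tp))  # TP / (TP + FP + FN)
    return 100 * sum(accs) / len(accs), 100 * sum(ious) / len(ious)


# Six hypothetical pixels, three classes, flattened.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
mca, miou = mca_miou(confusion_matrix(gt, pred, 3))
print(f"mCA = {mca:.2f}%, mIoU = {miou:.2f}%")  # mCA = 66.67%, mIoU = 50.00%
```

Because IoU additionally penalises false positives, a class's IoU can never exceed its accuracy, which is consistent with every row above that reports both metrics showing a lower mIoU than mCA.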