| Literature DB >> 35632150 |
Xiaowei Xu1,2, Hao Xiong1,2, Liu Zhan1,2, Grzegorz Królczyk3, Rafal Stanislawski4, Paolo Gardoni5, Zhixiong Li3,6.
Abstract
When performing multiple target detection, it is difficult to detect small and occluded targets in complex traffic scenes. To this end, an improved YOLOv4 detection method is proposed in this work. First, the network structure of the original YOLOv4 is adjusted: the 4× down-sampling feature map of the backbone network is introduced into the neck network and spliced with the 8× down-sampling feature map to form a four-scale detection structure, which enhances the fusion of deep and shallow semantic information and improves the detection accuracy of small targets. Then, the convolutional block attention module (CBAM) is added to the neck network to enhance the model's ability to learn spatial and channel-wise features. Lastly, the detection rate of occluded targets is improved by using a soft non-maximum suppression (Soft-NMS) algorithm based on the distance intersection over union (DIoU), which avoids incorrectly deleting the bounding boxes of overlapping targets. Experimental evaluation on the KITTI dataset demonstrates that the proposed model effectively improves multiple target detection accuracy: its mean average precision (mAP) reaches 81.23%, which is 3.18 percentage points higher than that of the original YOLOv4, and its computation speed reaches 47.32 FPS. Compared with existing popular detection models, the proposed model achieves higher detection accuracy at a competitive computation speed.
Keywords: improved YOLOv4; multi-scale detection; multiple target detection
Year: 2022 PMID: 35632150 PMCID: PMC9144427 DOI: 10.3390/s22103742
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Improved YOLOv4 algorithm framework.
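As a rough illustration of the added fourth detection scale described in the abstract, the sketch below upsamples an 8× down-sampled neck feature and splices it with the 4× down-sampled backbone feature. The channel counts, layer choices, and the module name `FourScaleNeck` are illustrative assumptions, not the authors' exact layers.

```python
import torch
import torch.nn as nn

class FourScaleNeck(nn.Module):
    """Minimal sketch: fuse the 4x backbone feature with the upsampled 8x neck feature
    to create a fourth, higher-resolution detection branch for small targets."""
    def __init__(self, c4=128, c8=256, out_channels=128):
        super().__init__()
        self.reduce = nn.Conv2d(c8, c4, kernel_size=1)        # align channels before upsampling
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Sequential(                            # splice shallow detail with deeper semantics
            nn.Conv2d(c4 * 2, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, feat_4x, feat_8x):
        # feat_4x: backbone feature at 1/4 resolution, e.g. (B, 128, 104, 104) for a 416x416 input
        # feat_8x: neck feature at 1/8 resolution,     e.g. (B, 256, 52, 52)
        up = self.upsample(self.reduce(feat_8x))              # -> (B, 128, 104, 104)
        return self.fuse(torch.cat([feat_4x, up], dim=1))     # 104x104 feature for the extra head
```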
Figure 2. Architecture of CBAM. The module has two sequential sub-modules: channel and spatial.
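Figure 2 shows the standard CBAM structure (channel attention followed by spatial attention). A minimal PyTorch sketch of the two sequential sub-modules is given below, assuming the commonly used reduction ratio of 16 and a 7×7 spatial convolution; the paper's exact hyperparameters may differ.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: shared MLP over global average- and max-pooled descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention: 7x7 conv over channel-wise average- and max-pooled maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """CBAM: channel attention followed by spatial attention, applied sequentially."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)        # reweight channels
        return x * self.sa(x)     # then reweight spatial positions
```

In the improved model, a block like this would be inserted on the neck feature maps, e.g. `y = CBAM(channels)(x)`.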
Figure 3. Graph of the change in training loss values.
Figure 4. Comparison of detection results. (a) YOLOv4. (b) Improved YOLOv4. Red, blue, and green boxes represent the labels of car, cyclist, and pedestrian, respectively.
Experimental results before and after the improvements to the YOLOv4 algorithm.
| Model | Improvements | Car AP@0.5 (%) | Pedestrian AP@0.5 (%) | Cyclist AP@0.5 (%) | mAP@0.5 (%) | Model Size (MB) |
|---|---|---|---|---|---|---|
| A | YOLOv4 | 87.52 | 68.21 | 78.42 | 78.05 | 256.2 |
| B | A + added scale detection layer | 88.31 | 71.06 | 80.45 | 79.94 | 258.7 |
| C | B + DIoU-based Soft-NMS | 88.53 | 71.31 | 80.54 | 80.13 | 258.7 |
| D | B + CBAM | 89.15 | 72.68 | 81.02 | 80.95 | 269.3 |
| E | D + DIoU-based Soft-NMS | 89.52 | 73.03 | 81.15 | 81.23 | 269.3 |
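Rows C and E above replace standard NMS with DIoU-based Soft-NMS at inference time. The NumPy sketch below illustrates the idea: the scores of overlapping boxes are decayed according to their DIoU with the highest-scoring box instead of being deleted outright. The Gaussian decay form and the `sigma`/`score_thresh` defaults are assumptions for illustration, not the paper's reported settings.

```python
import numpy as np

def diou(box, boxes):
    """DIoU between one box and an array of boxes; boxes are [x1, y1, x2, y2]."""
    # Intersection and IoU
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared distance between box centres
    cx_a, cy_a = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    cx_b, cy_b = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
    center_dist = (cx_a - cx_b) ** 2 + (cy_a - cy_b) ** 2
    # Squared diagonal of the smallest enclosing box
    ex1 = np.minimum(box[0], boxes[:, 0]); ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2]); ey2 = np.maximum(box[3], boxes[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return iou - center_dist / diag

def diou_soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Soft-NMS that decays neighbours' scores by their DIoU with the best box."""
    boxes, scores = boxes.copy(), scores.copy()
    keep, idxs = [], np.arange(len(scores))
    while len(idxs) > 0:
        best = idxs[np.argmax(scores[idxs])]
        keep.append(best)
        idxs = idxs[idxs != best]
        if len(idxs) == 0:
            break
        overlap = diou(boxes[best], boxes[idxs])
        scores[idxs] *= np.exp(-np.clip(overlap, 0, None) ** 2 / sigma)
        idxs = idxs[scores[idxs] > score_thresh]
    return keep
```

Because scores are decayed rather than zeroed, heavily occluded targets whose boxes overlap a stronger detection can still survive with a reduced score.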
Figure 5. Visualized position prediction heat map with a detection scale of 26 × 26. (a) Input image. (b) Heat map of the YOLOv4 output. (c) Heat map of the YOLOv4 + CBAM output.
Experimental results compared with those of other algorithms on the KITTI dataset.
| Algorithm | Car AP@0.5 (%) | Pedestrian AP@0.5 (%) | Cyclist AP@0.5 (%) | mAP (%) | FPS (frames/s) |
|---|---|---|---|---|---|
| Faster R-CNN | 83.07 | 62.78 | 60.83 | 68.89 | 14.21 |
| Cascade R-CNN | 88.15 | 75.24 | 74.50 | 79.30 | 8.20 |
| SSD | 75.33 | 50.06 | 49.67 | 58.35 | 45.13 |
| YOLOv3 | 80.28 | 69.01 | 75.06 | 74.78 | 40.93 |
| YOLOv4 | 87.52 | 68.21 | 78.42 | 78.05 | 51.68 |
| Improved YOLOv4 | 89.52 | 73.03 | 81.15 | 81.23 | 47.32 |
Figure 6. Precision-recall curves of the different detection methods on the KITTI dataset.
Experimental results compared with those of other algorithms on the BDD100K dataset.
| Algorithm | Car AP@0.5 (%) | Pedestrian AP@0.5 (%) | Cyclist AP@0.5 (%) | mAP (%) | FPS (frames/s) |
|---|---|---|---|---|---|
| Faster R-CNN | 60.02 | 48.83 | 46.17 | 51.67 | 13.10 |
| Cascade R-CNN | 65.77 | 50.41 | 47.36 | 54.51 | 7.40 |
| SSD | 50.35 | 39.26 | 38.76 | 42.79 | 44.52 |
| YOLOv3 | 62.72 | 47.60 | 48.32 | 52.88 | 40.28 |
| YOLOv4 | 72.26 | 50.86 | 54.78 | 59.30 | 51.45 |
| Improved YOLOv4 | 73.92 | 54.26 | 56.53 | 61.57 | 46.83 |