Jingwei Cao, Chuanxue Song, Silun Peng, Shixin Song, Xu Zhang, Yulong Shao, Feng Xiao.
Abstract
Pedestrian detection is an important aspect of the development of intelligent vehicles. To address the problems that traditional pedestrian detection is susceptible to environmental factors and unable to meet the requirements of real-time accuracy, this study proposes a pedestrian detection algorithm for intelligent vehicles in complex scenarios. YOLOv3 is one of the best-performing deep learning-based object detection algorithms at present. In this article, the basic principle of YOLOv3 is first elaborated and analyzed to determine its limitations in pedestrian detection. Then, several improvements are made to the original YOLOv3 network model: the grid cell size is modified, an improved k-means clustering algorithm is adopted, multi-scale bounding box prediction is improved on the basis of the receptive field, and the Soft-NMS algorithm is used. Finally, pedestrian detection experiments are conducted on the INRIA person and PASCAL VOC 2012 datasets to test the performance of the algorithm in various complex scenarios. The experimental results show that the mean Average Precision (mAP) reaches 90.42% and the average processing time per frame is 9.6 ms. Compared with other detection algorithms, the proposed algorithm combines accuracy with real-time performance, exhibits good robustness, anti-interference ability, and generalization in complex scenarios, maintains high network stability, and markedly improves both detection accuracy and detection speed. Such improvements are significant for protecting the road safety of pedestrians and reducing traffic accidents, and are conducive to the steady development of intelligent vehicle driving assistance technology.
Keywords: YOLOv3; convolutional neural network; driving assistance; intelligent vehicle; pedestrian detection
Year: 2020 PMID: 32610635 PMCID: PMC7374403 DOI: 10.3390/s20133646
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
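The abstract names Soft-NMS as one of the improvements but does not spell out its decay function. Below is a minimal sketch of the Gaussian variant of Soft-NMS (the Gaussian decay, the `sigma` value, and the score threshold are illustrative assumptions, not the authors' exact settings): instead of discarding every box that overlaps a higher-scoring detection, Soft-NMS decays its score, which helps retain true positives in crowded pedestrian scenes.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box against an array of boxes; format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes by
    exp(-IoU^2 / sigma) instead of removing them outright."""
    scores = scores.copy()
    idxs = np.arange(len(scores))
    keep = []
    while len(idxs) > 0:
        best = idxs[np.argmax(scores[idxs])]
        keep.append(int(best))
        idxs = idxs[idxs != best]
        if len(idxs) == 0:
            break
        decay = np.exp(-iou(boxes[best], boxes[idxs]) ** 2 / sigma)
        scores[idxs] *= decay
        idxs = idxs[scores[idxs] > score_thresh]  # drop near-zero detections
    return keep
```

With a small `sigma` the decay becomes aggressive and the behavior approaches hard NMS; a larger `sigma` suppresses neighbors more gently, which matters when pedestrians overlap.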
The basic parameters of the Darknet-53 network framework.
| Processing Mode | Residual Block Number (n) | Step Size | Convolution Kernel Number | Output Scale |
|---|---|---|---|---|
| Conv_BN | - | 1 | 32 | 416 × 416 |
| Conv_BN | - | 2 | 64 | 208 × 208 |
| Res_Conv_n | 1 | 1 | 64 | 208 × 208 |
| Conv_BN | - | 2 | 128 | 104 × 104 |
| Res_Conv_n | 2 | 1 | 128 | 104 × 104 |
| Conv_BN | - | 2 | 256 | 52 × 52 |
| Res_Conv_n | 8 | 1 | 256 | 52 × 52 |
| Conv_BN | - | 2 | 512 | 26 × 26 |
| Res_Conv_n | 8 | 1 | 512 | 26 × 26 |
| Conv_BN | - | 2 | 1024 | 13 × 13 |
| Res_Conv_n | 4 | 1 | 1024 | 13 × 13 |
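To make the two processing modes in the table concrete, here is a minimal PyTorch sketch of Conv_BN and Res_Conv_n (the class names, the 1 × 1/3 × 3 kernel layout, and the LeakyReLU slope of 0.1 follow the public Darknet-53 design; this is an illustration under those assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class ConvBN(nn.Module):
    """Conv_BN: 3 x 3 convolution + batch normalization + LeakyReLU.
    A stride of 2 halves the output scale (e.g., 416 -> 208)."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.block(x)

class ResConv(nn.Module):
    """Res_Conv_n: n residual blocks at stride 1; each block is a 1 x 1
    bottleneck followed by a 3 x 3 convolution, added to its input."""
    def __init__(self, channels, n):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels // 2, 1, bias=False),
                nn.BatchNorm2d(channels // 2),
                nn.LeakyReLU(0.1),
                nn.Conv2d(channels // 2, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.LeakyReLU(0.1),
            )
            for _ in range(n)
        )

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)  # skip connection
        return x

# Reproduces the first three table rows: 416 x 416 input -> 208 x 208, 64 kernels.
stem = nn.Sequential(ConvBN(3, 32, stride=1), ConvBN(32, 64, stride=2), ResConv(64, n=1))
print(stem(torch.randn(1, 3, 416, 416)).shape)  # torch.Size([1, 64, 208, 208])
```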
Figure 1. The schematic diagram of the detection process using YOLOv3.
The pedestrian detection performance of YOLOv3 under different grid cell sizes.
| Sequence Number | Grid Cell Size | mAP (%) | Average Processing Time per Frame (ms) |
|---|---|---|---|
| 1 | 7 × 7 | 83.54 | 13.5 |
| 2 | 10 × 10 | 85.22 | 13.8 |
| 3 | 14 × 14 | 85.25 | 15.1 |
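For context on what the grid cell size changes: a YOLO detection head predicts a fixed tensor per scale, with S × S grid cells, each carrying B boxes of 5 values (x, y, w, h, objectness) plus C class scores. The sketch below assumes B = 3 boxes per cell (YOLOv3's default per scale) and C = 1 class for the single-class pedestrian task; both values are illustrative assumptions.

```python
def yolo_head_shape(S, B=3, C=1):
    """Output tensor shape of one YOLO detection scale with an S x S grid.
    B = 3 boxes per cell and C = 1 class (pedestrian) are assumptions."""
    return (S, S, B * (5 + C))  # 5 = (x, y, w, h, objectness)

for S in (7, 10, 14):  # the grid sizes compared in the table above
    print(S, yolo_head_shape(S))  # (7, 7, 18), (10, 10, 18), (14, 14, 18)
```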
Figure 2. An example image of the improved grid cell size.
The receptive field size of the last feature layer at each relevant scale in the original YOLOv3.
| Convolutional Layer | Receptive Field Size | Feature Layer Size | Description |
|---|---|---|---|
| 11th_layer | 29 × 29 | 104 × 104 | - |
| 36th_layer | 165 × 165 | 52 × 52 | Output Layer |
| 61st_layer | 437 × 437 | 26 × 26 | Output Layer |
| 82nd_layer | 917 × 917 | 13 × 13 | Output Layer |
The receptive field size of the last feature layer at each relevant scale in the improved YOLOv3.
| Convolutional Layer | Receptive Field Size | Feature Layer Size | Description |
|---|---|---|---|
| 29th_layer | 77 × 77 | 104 × 104 | Output Layer |
| 54th_layer | 213 × 213 | 52 × 52 | Output Layer |
| 79th_layer | 485 × 485 | 26 × 26 | Output Layer |
| 100th_layer | 965 × 965 | 13 × 13 | Output Layer |
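Both receptive-field tables can be derived with the standard recurrence for stacked convolutions: each layer widens the field by (k − 1) times the cumulative stride ("jump"), and each strided layer multiplies the jump. A minimal sketch follows; the layer stack shown is a shortened illustration, not the full YOLOv3 configuration.

```python
def receptive_field(layers):
    """layers: sequence of (kernel_size, stride) pairs for stacked convolutions.
    Returns the receptive field of one output unit on the input image."""
    r, j = 1, 1  # receptive field size and cumulative stride ("jump")
    for k, s in layers:
        r += (k - 1) * j  # each layer widens the field by (k - 1) input jumps
        j *= s            # the jump grows with every strided layer
    return r

# Shortened illustrative stack: 3 x 3 convolutions, two of them with stride 2.
stack = [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
print(receptive_field(stack))  # 21
```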
Figure 3. A sample image from the INRIA person dataset.
Figure 4. Loss function curves of the original and improved YOLOv3.
Test results under different overlap thresholds.
| Sequence Number | Overlap Threshold | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| 1 | 0.05 | 83.58 | 93.06 | 88.05 |
| 2 | 0.10 | 89.14 | 91.38 | 90.23 |
| 3 | 0.15 | 91.98 | 89.67 | 90.80 |
| 4 | 0.20 | 93.74 | 88.14 | 90.85 |
| 5 | 0.25 | 95.09 | 86.60 | 90.64 |
| 6 | 0.30 | 96.01 | 85.09 | 90.21 |
| 7 | 0.35 | 96.76 | 83.58 | 89.68 |
| 8 | 0.40 | 97.32 | 81.96 | 88.97 |
| 9 | 0.45 | 97.70 | 80.24 | 88.12 |
| 10 | 0.50 | 98.27 | 78.42 | 87.13 |
| 11 | 0.55 | 98.38 | 76.27 | 85.96 |
| 12 | 0.60 | 98.65 | 73.96 | 84.54 |
| 13 | 0.65 | 98.93 | 71.51 | 82.98 |
| 14 | 0.70 | 99.12 | 68.45 | 80.99 |
| 15 | 0.75 | 99.26 | 64.93 | 78.52 |
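The F1-score column is the harmonic mean of precision and recall; a quick check against the table (small residual differences come from the rounding of the reported precision and recall values):

```python
def f1_score(p, r):
    """Harmonic mean of precision p and recall r (both in percent)."""
    return 2 * p * r / (p + r)

print(round(f1_score(83.58, 93.06), 2))  # 88.07; row 1 lists 88.05
print(round(f1_score(99.26, 64.93), 2))  # 78.51; row 15 lists 78.52
```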
Figure 5. Pedestrian detection test results in the dim scenario.
Figure 6. Pedestrian detection test results under occlusion.
Figure 7. Pedestrian detection test results in the multi-scale scenario.
Figure 8. Pedestrian detection test results under a complex background.
Figure 9. Pedestrian detection test results in the object-intensive scenario.
Object detection test results for various object types in actual road scenarios.
| Sequence Number | Object Type | Original mAP (%) | Improved mAP (%) |
|---|---|---|---|
| 1 | Person | 79.20 | 90.60 |
| 2 | Car | 85.90 | 92.80 |
| 3 | Bus | 86.70 | 94.10 |
| 4 | Bicycle | 84.00 | 91.90 |
| 5 | Motorbike | 84.20 | 86.30 |
| - | Total | 84.00 | 91.14 |
Comparison of algorithm performance based on the INRIA person dataset.
| Sequence Number | Method | mAP (%) | Average Processing Time per Frame (ms) | System Environment |
|---|---|---|---|---|
| 1 | ACF | 83.17 | 65.9 | Intel Core i7-4710HQ @ 2.50 GHz, 12 GB RAM |
| 2 | ACF + CNN | 84.87 | 295.9 | Intel Core i7-4710HQ @ 2.50 GHz, 12 GB RAM |
| 3 | HOG + DWT | 85.12 | 1.5 | 3.4 GHz CPU machine |
| 4 | Original YOLOv3 | 83.54 | 13.5 | Intel Core i7-7700 @ 3.60 GHz |
| 5 | Improved YOLOv3 (ours) | 90.42 | 9.6 | Intel Core i7-7700 @ 3.60 GHz |