Yuhong Liu, Chunyan Han, Lin Zhang, Xin Gao.
Abstract
In recent years, pedestrian detection from a single 2D image has improved dramatically. In very crowded scenes, however, detection performance deteriorates seriously and cannot meet the requirements of autonomous driving perception. Multi-view methods have significantly improved pedestrian detection in crowded or blurred scenes and have become widely used in autonomous driving. In this paper, we construct a double-branch feature fusion structure: the first branch adopts a lightweight structure, and the second branch extracts further features and collects the feature map from each layer. The receptive field is enlarged with dilated convolution. To improve the speed of the model, a keypoint is regressed instead of the entire object, so no NMS post-processing is required, and the whole model can be trained end to end. The method maintains its accuracy and speed even when many people are present. On the standard Wildtrack and MultiviewX datasets, it outperforms the state-of-the-art model in both accuracy and running speed, which is of practical significance for autonomous driving.
Keywords: autonomous driving; convolution fusion; keypoints; multiview; pedestrian detection
Year: 2022 PMID: 35205460 PMCID: PMC8870950 DOI: 10.3390/e24020165
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
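The abstract states that a keypoint is regressed instead of the whole object and that no NMS step is needed. One common way to realize this, assumed here rather than taken from the paper, is to treat every local maximum of the predicted center heatmap as one detection. The sketch below uses plain Python lists; the function name `heatmap_peaks`, the 3x3 neighbourhood, and the 0.5 threshold are illustrative choices, not values from the source.

```python
from typing import List, Tuple


def heatmap_peaks(
    heatmap: List[List[float]], threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every cell that is above `threshold`
    and strictly greater than all of its neighbours in a 3x3 window.

    Assumption: one peak corresponds to one pedestrian center, so no
    box-overlap suppression (NMS) is applied afterwards.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the neighbours that exist inside the map borders.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Toy 4x4 heatmap with two well-separated centers.
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.7],
]
print(heatmap_peaks(toy))  # [(1, 1, 0.9), (3, 3, 0.7)]
```

The cell at (2, 3) scores 0.6 but is dropped because its neighbour at (3, 3) is higher, which is the role NMS would otherwise play. With a strict comparison, a flat plateau of equal scores yields no peak, so a real implementation would need a tie-breaking rule.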
Figure 1. Description of the model structure.
Figure 2. Double-branch convolution fusion structure.
Figure 3. Central heat map results for pedestrian detection. (a) RGB image; (b) predicted object center heatmap.
Figure 4. Feature projection process diagram. (a) Input: N RGB views; (b) feature maps; (c) projected feature maps.
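The projection step in Figure 4 maps each view's feature map onto a common ground plane. The source does not give the transformation, so the following is a minimal sketch that assumes a planar homography per camera and nearest-neighbour sampling; `apply_homography`, `project_to_ground`, and the toy matrix are hypothetical names and values chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def apply_homography(H: Matrix, x: float, y: float):
    """Map the point (x, y) through a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # assumes w != 0 for the points of interest


def project_to_ground(
    feature: Matrix, H_ground_to_image: Matrix, out_rows: int, out_cols: int
) -> Matrix:
    """Fill a ground-plane grid by looking up, for each ground cell,
    the image-plane feature it maps to (inverse warping)."""
    rows, cols = len(feature), len(feature[0])
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for gy in range(out_rows):
        for gx in range(out_cols):
            u, v = apply_homography(H_ground_to_image, gx, gy)
            c, r = round(u), round(v)  # nearest-neighbour sample
            if 0 <= r < rows and 0 <= c < cols:
                out[gy][gx] = feature[r][c]
    return out


# Toy example: a 4x4 single-channel feature map whose value encodes
# its position, and a homography that simply scales coordinates by 2.
feature = [[float(10 * r + c) for c in range(4)] for r in range(4)]
H = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
print(project_to_ground(feature, H, 2, 2))  # [[0.0, 2.0], [20.0, 22.0]]
```

Ground cells that fall outside a camera's image stay at zero, so each view contributes only where it can see. Repeating this for all N views gives N projected maps on the same grid, which can then be combined.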
Comparison of Wildtrack and MultiviewX.
| Dataset | Camera Number | Resolution | Area | Crowdedness |
|---|---|---|---|---|
| Wildtrack | 7 | | | 20 person/frame |
| MultiviewX | 6 | | | 40 person/frame |
Multiview aggregation and backbone in different methods.
| Method | Multiview Aggregation | Backbone |
|---|---|---|
| RCNN and clustering | detection results | The new DCNN |
| DeepMCD | anchor box features | GoogLeNet |
| Deep-Occlusion | anchor box features | VGG |
| MVDet | feature maps | ResNet-18 |
| Ours | feature maps | ResNet-18+feature fusion |
Performance comparison with different methods for the Wildtrack dataset.
| Method | MODA/% | MODP/% | Precision/% | Recall/% |
|---|---|---|---|---|
| RCNN and clustering | 11.3 | 18.4 | 68 | 43 |
| DeepMCD | 67.8 | 64.2 | 85 | 82 |
| Deep-Occlusion | 74.1 | 53.8 | 95 | 80 |
| MVDet | 88.2 | 75.7 | 94.7 | 93.6 |
| Ours | 90.0 | 76.2 | 94.5 | 94.7 |
Performance comparison with different methods for the MultiviewX dataset.
| Method | MODA/% | MODP/% | Precision/% | Recall/% |
|---|---|---|---|---|
| RCNN and clustering | 18.7 | 46.4 | 63.5 | 43.9 |
| DeepMCD | 70.0 | 73.0 | 85.7 | 83.3 |
| Deep-Occlusion | 75.2 | 54.7 | 97.8 | 80.2 |
| MVDet | 83.9 | 79.6 | 89.5 | 85.9 |
| Ours | 89.5 | 83.4 | 98.1 | 91.3 |
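The tables report MODA alongside precision and recall without stating how the metrics relate. Under the usual definition, MODA = 1 - (FP + FN) / N_gt, where N_gt is the number of ground-truth pedestrians. If precision and recall are computed from the same counts, this reduces to MODA = R * (2 - 1/P). The sketch below assumes that definition applies here; the function names are illustrative.

```python
def moda_from_counts(tp: float, fp: float, fn: float) -> float:
    """MODA = 1 - (false positives + misses) / ground-truth count."""
    n_gt = tp + fn
    return 1.0 - (fp + fn) / n_gt


def moda_from_precision_recall(precision: float, recall: float) -> float:
    """Same quantity expressed through precision P and recall R.

    With R = TP / N_gt and P = TP / (TP + FP):
        FN / N_gt = 1 - R
        FP / N_gt = R * (1 / P - 1)
    so MODA = 1 - (1 - R) - R * (1 / P - 1) = R * (2 - 1 / P).
    """
    return recall * (2.0 - 1.0 / precision)


# "Ours" row of the MultiviewX table: precision 98.1 %, recall 91.3 %.
moda = moda_from_precision_recall(0.981, 0.913)
print(round(100 * moda, 1))  # 89.5
```

For that row the identity reproduces the reported MODA of 89.5 %. Other rows may differ slightly because the published precision and recall are rounded.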
Results of the ablation experiment for the Wildtrack dataset.
| Method | MODA/% | MODP/% | Precision/% | Recall/% |
|---|---|---|---|---|
| MVDet | 88.2 | 75.7 | 94.7 | 93.6 |
| Keypoints | 88.7 | 75.3 | 95.2 | 94.5 |
| Feature fusion | 88.2 | 75.3 | 95.8 | 94.1 |
| Ours | 90.0 | 76.2 | 94.5 | 94.1 |
Results of the ablation experiment for the MultiviewX dataset.
| Method | MODA/% | MODP/% | Precision/% | Recall/% |
|---|---|---|---|---|
| MVDet | 83.9 | 79.6 | 89.5 | 85.9 |
| Keypoints | 84.7 | 80.7 | 97.8 | 86.6 |
| Feature fusion | 88.8 | 82.7 | 98.6 | 90.1 |
| Ours | 89.5 | 83.4 | 98.1 | 91.3 |
Figure 5. Comparison with the state-of-the-art model on the Wildtrack dataset. (a) Wildtrack MODA; (b) Wildtrack MODP.
Figure 6. Comparison with the state-of-the-art model on the MultiviewX dataset. (a) MultiviewX MODA; (b) MultiviewX MODP.
Figure 7. Predictions and results on different datasets. (a) MultiviewX; (b) Wildtrack.
Running time of the models on different test sets.
| Method | Wildtrack/FPS | MultiviewX/FPS |
|---|---|---|
| MVDet | 3.42 | 4.09 |
| Ours | 3.58 | 4.30 |
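To put the speed table in perspective, the snippet below converts the reported values into per-frame latency and relative speedup over MVDet. It assumes the numbers are frames per second, as the table header suggests; the helper names are illustrative.

```python
# Values copied from the running-time table (assumed to be frames per second).
fps = {
    "Wildtrack": {"MVDet": 3.42, "Ours": 3.58},
    "MultiviewX": {"MVDet": 4.09, "Ours": 4.30},
}


def latency_ms(frames_per_second: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def speedup_percent(baseline_fps: float, new_fps: float) -> float:
    """Relative throughput gain of `new_fps` over `baseline_fps`."""
    return 100.0 * (new_fps / baseline_fps - 1.0)


for dataset, row in fps.items():
    gain = speedup_percent(row["MVDet"], row["Ours"])
    print(
        f"{dataset}: {latency_ms(row['MVDet']):.1f} ms -> "
        f"{latency_ms(row['Ours']):.1f} ms per frame (+{gain:.1f}% FPS)"
    )
```

On these figures the gain is roughly 5% on both datasets, which corresponds to about 12 to 13 ms saved per frame.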