Xinbiao Gao, Junhua Xu, Chuan Luo, Jun Zhou, Panling Huang, Jianxin Deng.
Abstract
Detection of the human lower body offers a way to implement automatic tracking and accurate relocation for automatic vehicles. Building on the conventional SSD and ResNet, this paper proposes an improved detection algorithm, R-SSD, for human lower-body detection. R-SSD replaces VGG16 with ResNet50 to strengthen the model's feature extraction. To suit the acquisition equipment used in the application, the model input resolution is increased to 448 × 448, which expands the model's detection range. Six feature maps of the higher-resolution network are selected for detection, and the bounding boxes of the lower-body image dataset are clustered by aspect ratio into five categories, which are assigned evenly to every feature detection map. Experimental results show that the trained R-SSD reaches a detection accuracy of 85.1% mAP, 7 percentage points higher than the original SSD. In practical applications the detection confidence exceeds 99%, which lays the foundation for subsequent tracking and relocation of automatic vehicles.
Keywords: ResNet; SSD; object detection
Year: 2022 PMID: 35271157 PMCID: PMC8914923 DOI: 10.3390/s22052008
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. SSD object recognition algorithm architecture.
Figure 2. Visualization of the VGG network architecture.
Figure 3. The residual network of ResNet.
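The residual idea behind Figure 3 can be illustrated with a toy, pure-Python sketch. It is not the ResNet50 bottleneck that R-SSD uses (that block stacks 1 × 1, 3 × 3 and 1 × 1 convolutions with batch normalization); small dense layers stand in for the convolutions here, and all function names are hypothetical. The block computes a residual F(x) and adds the input back through an identity shortcut before the final ReLU, so with all-zero weights it reduces to ReLU(x).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by a vector (a stand-in for a conv layer)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Toy residual block: y = ReLU(F(x) + x) with F(x) = W2 * ReLU(W1 * x)."""
    f = matvec(w2, relu(matvec(w1, x)))            # residual branch F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])  # identity shortcut, then ReLU


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

print(residual_block(x, zeros, zeros))  # [1.0, 0.0, 3.0]
print(residual_block(x, eye, eye))      # [2.0, 0.0, 6.0]
```

The zero-weight case shows why residual blocks are easy to stack: a block that has learned nothing useful still passes its input through instead of destroying it.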
Figure 4. Structure of the improved SSD with ResNet (R-SSD). The backbone network extracts features, the extra network extracts deeper features, and the prediction network detects objects on multiscale feature maps, performing both category prediction and position regression.
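In SSD-style detectors, the position regression of the prediction network outputs offsets relative to each default box rather than absolute coordinates. The source does not give R-SSD's encoding; the sketch below assumes the standard SSD center-size encoding with variances (0.1, 0.2), and the function names and example box values are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), relative to image size


def decode(default: Box, offsets: Box, variances=(0.1, 0.2)) -> Box:
    """Turn predicted offsets (tx, ty, tw, th) into a box (assumed SSD encoding)."""
    dcx, dcy, dw, dh = default
    tx, ty, tw, th = offsets
    cx = dcx + tx * variances[0] * dw        # center shifts scale with box size
    cy = dcy + ty * variances[0] * dh
    w = dw * math.exp(tw * variances[1])     # sizes are regressed in log space
    h = dh * math.exp(th * variances[1])
    return (cx, cy, w, h)


def encode(default: Box, box: Box, variances=(0.1, 0.2)) -> Box:
    """Inverse of decode: the regression target for a ground-truth box."""
    dcx, dcy, dw, dh = default
    cx, cy, w, h = box
    return ((cx - dcx) / (variances[0] * dw),
            (cy - dcy) / (variances[0] * dh),
            math.log(w / dw) / variances[1],
            math.log(h / dh) / variances[1])


# Hypothetical default box and ground-truth lower body.
default_box = (0.5, 0.5, 0.134, 0.317)
ground_truth = (0.52, 0.48, 0.15, 0.30)
target = encode(default_box, ground_truth)
print(target)
print(decode(default_box, target))  # recovers ground_truth up to rounding
```

Encoding and decoding are exact inverses, so training targets and inference outputs use the same parameterization.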
Figure 5. The structure of the feature extraction network of R-SSD.
Parameters of the multiscale object detection network.
| Feature Layer Group | Feature Map Size | Default Boxes per Location | Total Default Boxes |
|---|---|---|---|
| Bottleneck3_4 | 56 × 56 | 5 | 15,680 |
| Bottleneck4_6 | 28 × 28 | 5 | 3920 |
| Bottleneck5_3 | 14 × 14 | 5 | 980 |
| Bottleneck6 | 7 × 7 | 5 | 245 |
| Conv7 | 4 × 4 | 5 | 80 |
| Conv9 | 1 × 1 | 5 | 5 |
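Each row's total is the feature-map area multiplied by the five default boxes per location, and the first map (56 × 56) corresponds to a stride of 8 on the 448 × 448 input. A short sketch that checks the table's arithmetic; the layer names and sizes are copied from the table, and the helper name is hypothetical.

```python
# Feature maps and default boxes per location, copied from the table above.
FEATURE_MAPS = [
    ("Bottleneck3_4", 56),
    ("Bottleneck4_6", 28),
    ("Bottleneck5_3", 14),
    ("Bottleneck6", 7),
    ("Conv7", 4),
    ("Conv9", 1),
]
BOXES_PER_CELL = 5  # one box per clustered aspect ratio
INPUT_SIZE = 448


def default_box_counts(maps, boxes_per_cell):
    """Return {layer: size * size * boxes_per_cell} for square feature maps."""
    return {name: size * size * boxes_per_cell for name, size in maps}


counts = default_box_counts(FEATURE_MAPS, BOXES_PER_CELL)
for name, n in counts.items():
    print(f"{name:14s} {n:6d}")
print("total", sum(counts.values()))  # total 20910
print("stride of first map:", INPUT_SIZE // FEATURE_MAPS[0][1])  # 8
```

Most of the 20,910 default boxes come from the 56 × 56 map, which is where small lower bodies are detected.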
Relative sizes and aspect ratios of the five clustered box shapes assigned to the feature detection maps.
| Cluster | Relative Size (w, h) | Aspect Ratio (w/h) |
|---|---|---|
| 1 | (0.134, 0.317) | 0.42 |
| 2 | (0.082, 0.183) | 0.45 |
| 3 | (0.395, 0.861) | 0.46 |
| 4 | (0.243, 0.406) | 0.60 |
| 5 | (0.416, 0.416) | 1.0 |
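The aspect-ratio column equals the first relative-size value divided by the second, which suggests the relative sizes are (width, height) fractions of the input. Under that reading (an inference from the numbers, not stated in the source), the sketch below recomputes the ratios and converts each cluster to pixels at the 448 × 448 input. The helper name is hypothetical.

```python
# Relative (width, height) of each cluster, copied from the table above.
CLUSTERS = {
    1: (0.134, 0.317),
    2: (0.082, 0.183),
    3: (0.395, 0.861),
    4: (0.243, 0.406),
    5: (0.416, 0.416),
}
INPUT_SIZE = 448  # R-SSD input resolution in pixels


def aspect_ratios(clusters):
    """Width / height of each cluster, rounded like the table."""
    return {k: round(w / h, 2) for k, (w, h) in clusters.items()}


ratios = aspect_ratios(CLUSTERS)
for k, (w, h) in CLUSTERS.items():
    # Assumes relative sizes are fractions of the input width and height.
    print(f"cluster {k}: aspect {ratios[k]:.2f}, "
          f"{w * INPUT_SIZE:.0f} x {h * INPUT_SIZE:.0f} px")
```

Four of the five clusters are tall and narrow (ratio below 0.5 or near 0.6), matching the shape of a standing lower body.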
Figure 6. (a) Relationship between the number of clusters and the average IoU; (b) clustered bounding boxes of the dataset.
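The source does not give its clustering procedure. A common choice for deriving box shapes (popularized by YOLOv2) is k-means on the (width, height) of labeled boxes with distance 1 - IoU, where IoU is computed as if the boxes shared a center; the average IoU against the nearest centroid is then the quantity plotted against the number of clusters. A minimal sketch under that assumption, with hypothetical function names and invented toy data:

```python
import random
from typing import List, Tuple

WH = Tuple[float, float]  # (width, height), relative to image size


def iou_wh(a: WH, b: WH) -> float:
    """IoU of two boxes aligned at the same center (only shapes matter)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_iou(boxes: List[WH], k: int, iters: int = 100, seed: int = 0) -> List[WH]:
    """k-means with distance 1 - IoU; centroids are the mean (w, h) of each group."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        groups: List[List[WH]] = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IoU distance is the same as the largest IoU.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            groups[best].append(box)
        new = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centroids[i]  # keep an empty cluster's old centroid
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def average_iou(boxes: List[WH], centroids: List[WH]) -> float:
    """Mean IoU between each box and its best-matching centroid."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)


# Toy boxes, invented for illustration only.
rng = random.Random(42)
toy = [(rng.uniform(0.05, 0.45), rng.uniform(0.15, 0.9)) for _ in range(200)]
for k in range(1, 7):
    print(k, round(average_iou(toy, kmeans_iou(toy, k)), 3))
```

Using 1 - IoU instead of Euclidean distance keeps large boxes from dominating the clustering, since IoU is scale-normalized.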
Figure 7. Lower bodies labeled in the LabelImg software.
Number of labeled images and lower-body instances used for training.
| Images | Lower-Body Instances in Complex Images | Lower-Body Instances in Simple Images | Total Lower-Body Instances |
|---|---|---|---|
| 1132 | 1367 | 213 | 1580 |
Figure 8. Loss curves during training for SSD with a 300 × 300 pixel input and R-SSD with a 448 × 448 pixel input. The red line is the training loss and the black line is the validation loss. Because the training data are relatively simple and the validation loss is computed only after each epoch of training, the validation loss is slightly lower than the training loss in the initial stage. (a) SSD training loss; (b) R-SSD training loss.
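The source does not spell out the loss plotted in Figure 8. The original SSD objective, assumed here to carry over to R-SSD, combines a softmax classification loss with a Smooth L1 localization loss on matched (positive) default boxes, normalized by the number of matches N: L = (L_conf + α L_loc) / N. The pure-Python sketch below uses hypothetical function names and omits SSD's hard negative mining (which keeps roughly 3 negatives per positive) for brevity.

```python
import math
from typing import Sequence


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Smooth L1 summed over the four box offsets (cx, cy, w, h)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one default box (label 0 = background)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multibox_loss(conf_logits, labels, loc_preds, loc_targets, alpha=1.0):
    """(L_conf + alpha * L_loc) / N; no hard negative mining in this sketch."""
    positives = [i for i, y in enumerate(labels) if y > 0]
    n = len(positives)
    if n == 0:
        return 0.0
    # Classification loss over every box (SSD would keep only mined negatives).
    l_conf = sum(cross_entropy(z, y) for z, y in zip(conf_logits, labels))
    # Localization loss only for boxes matched to a ground-truth lower body.
    l_loc = sum(smooth_l1(loc_preds[i], loc_targets[i]) for i in positives)
    return (l_conf + alpha * l_loc) / n
```

Smooth L1 behaves like L2 for small errors and like L1 for large ones, so a few badly placed boxes early in training do not dominate the gradient.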
Detection accuracy of SSD and R-SSD on the test set.
| Method | Input Size (px) | Training Data | Pre-Trained | BN | Clustered Aspect Ratios | mAP |
|---|---|---|---|---|---|---|
| SSD | 300 | Complex | √ | × | × | 78.1% |
| R-SSD | 300 | Complex | √ | √ | × | 80.7% |
| R-SSD | 448 | Complex | √ | √ | × | 83.0% |
| R-SSD | 448 | Complex + Simple | √ | √ | × | 84.5% |
| R-SSD | 448 | Complex + Simple | √ | √ | √ | 85.1% |
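Reading the table row by row isolates the effect of each change, and the gains add up to the 7-point improvement over SSD reported in the abstract. A short sketch that recomputes the step-wise gains in mAP percentage points; the values are copied from the table, while the step labels and helper name are descriptive and hypothetical.

```python
# (change introduced in this row, mAP in %), copied from the table above.
ROWS = [
    ("SSD, 300 px input (baseline)", 78.1),
    ("R-SSD: ResNet50 backbone with BN", 80.7),
    ("input raised to 448 px", 83.0),
    ("simple images added to training data", 84.5),
    ("clustered aspect ratios", 85.1),
]


def step_gains(rows):
    """mAP gain of each row over the previous row, rounded to 0.1 point."""
    return [round(b[1] - a[1], 1) for a, b in zip(rows, rows[1:])]


for (change, _), gain in zip(ROWS[1:], step_gains(ROWS)):
    print(f"{change:40s} {gain:+.1f} pts")
print(f"total gain over SSD: {ROWS[-1][1] - ROWS[0][1]:.1f} pts")
```

The backbone swap and the higher input resolution account for most of the improvement; aspect-ratio clustering adds the smallest step.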
Figure 9. Object detection results in different environments: (a) sufficient illumination; (b) dim illumination; (c) object scale change; (d) object occlusion.