Ye Wang, Zhenyi Liu, Weiwen Deng.
Abstract
Region proposal network (RPN) based object detection, such as Faster Regions with CNN (Faster R-CNN), has gained considerable attention due to its high accuracy and speed. However, it leaves room for improvement in special applications such as on-board vehicle detection. The original RPN places multiscale anchors uniformly at each pixel of the last feature map and classifies each anchor as foreground or background using a single pixel of that feature map. The receptive field of each pixel in the last feature map is fixed in the original Faster R-CNN and does not coincide with the anchor size. Hence, only part of a large vehicle is visible to the classifier, while the feature for a small vehicle contains too much irrelevant information, which reduces detection accuracy. Furthermore, perspective projection makes the size of a vehicle bounding box depend on its position in the image, so uniform anchor generation is inefficient, which reduces both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROIs) are generated. The ROI pooling layer projects each ROI onto the last feature map and forms a new fixed-size feature map for final classification and box regression. The number of feature map pixels in the projected region also influences detection performance, but it has not been accurately controlled in previous work. In this paper, the original Faster R-CNN is optimized specifically for on-board vehicle detection to address the problems described above. The proposed method is tested on the KITTI dataset, and the results show a significant improvement without extensive parameter tuning or training tricks. The proposed method can also be applied to other objects with obvious foreshortening effects, such as in on-board pedestrian detection. The basic idea does not rely on a concrete implementation; thus, most deep learning based object detectors with multiscale feature maps can be optimized with it.
Keywords: ROI assignment; anchor generation optimization; receptive field matching; vehicle detection
Year: 2019 PMID: 30832452 PMCID: PMC6427343 DOI: 10.3390/s19051089
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Mismatch of the receptive field and the vehicle bounding box.
Figure 2. Three subnetworks in the proposed method. The FPN-based feature extractor is shown in (a); it generates multiscale feature maps. The header network of the RPN is shown in (b), with rectangular convolutional kernels on top of it. Combined with the FPN feature extractor, it generates multishape receptive fields. The header network of Faster R-CNN is shown in (c).
Twelve receptive fields with different shapes. Each anchor has the same shape as its corresponding receptive field. R and FS are the receptive field size and the stride of each feature map, respectively.
| Kernel | P2 Feature Map | P3 Feature Map | P4 Feature Map | P5 Feature Map |
|---|---|---|---|---|
| 1 × 1 kernel | R2 × R2 = 18 × 18 | R3 × R3 = 48 × 48 | R4 × R4 = 108 × 108 | R5 × R5 = 228 × 228 |
| 1 × 7 kernel | R2 × (R2 + (7 − 1) × FS2) = 18 × 30 | R3 × (R3 + (7 − 1) × FS3) = 48 × 72 | R4 × (R4 + (7 − 1) × FS4) = 108 × 156 | R5 × (R5 + (7 − 1) × FS5) = 228 × 324 |
| 1 × 13 kernel | R2 × (R2 + (13 − 1) × FS2) = 18 × 42 | R3 × (R3 + (13 − 1) × FS3) = 48 × 96 | R4 × (R4 + (13 − 1) × FS4) = 108 × 204 | R5 × (R5 + (13 − 1) × FS5) = 228 × 420 |
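
The entries in the table above follow one rule: a 1 × k kernel on a feature map with receptive field R and stride FS yields a receptive field of R × (R + (k − 1) × FS). The following minimal sketch reproduces the twelve shapes. The stride values (2, 4, 8, 16) are not stated in this section; they are back-solved from the table's own arithmetic and should be treated as an assumption, as should the function and variable names.

```python
# Receptive field size R of each FPN level, copied from the 1 x 1 kernel row.
R = {"P2": 18, "P3": 48, "P4": 108, "P5": 228}

# Feature strides FS: ASSUMPTION, back-solved from the table entries
# (e.g. 18 + (7 - 1) * FS2 = 30 gives FS2 = 2); not stated in the text.
FS = {"P2": 2, "P3": 4, "P4": 8, "P5": 16}

# Kernel lengths of the three rectangular RPN kernels (1 x 1, 1 x 7, 1 x 13).
KERNEL_LENGTHS = (1, 7, 13)


def receptive_field(level, kernel_length):
    """Return the receptive field of a 1 x kernel_length kernel on `level`.

    The pair is returned in the same order as the table entries:
    (R, R + (kernel_length - 1) * FS).
    """
    r = R[level]
    return r, r + (kernel_length - 1) * FS[level]


# Build all twelve shapes, keyed by (kernel_length, level).
table = {
    (k, level): receptive_field(level, k)
    for k in KERNEL_LENGTHS
    for level in R
}

if __name__ == "__main__":
    for k in KERNEL_LENGTHS:
        row = ", ".join(f"{lvl}: {table[(k, lvl)][0]} x {table[(k, lvl)][1]}" for lvl in R)
        print(f"1 x {k} kernel -> {row}")
```

Because each anchor shape equals its receptive field shape, the same dictionary can serve as the anchor shape list for each feature map.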
Figure 3. Different receptive field shapes and anchor shapes in P2.
Figure 4. Illustration of the values in equations.
Figure 5. Our method for determining the part of the image in which each feature map is involved in the region proposal stage.
Parameter values in this section.
| H | h | f/ρ | Hv | δv | α |
|---|---|---|---|---|---|
| 375 | 1.65 m | 721.54 | 1.6 m | 0.4 m | 2° |
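
To illustrate why perspective projection ties bounding box size to bounding box position, the sketch below applies a plain flat-road pinhole model to the values in the table above. It is not the paper's derivation, whose equations are not reproduced in this section. The reading of each symbol is an assumption (375 as image height in pixels, h as camera height, f/ρ as focal length in pixels, Hv as nominal vehicle height), as are the zero camera pitch and the principal point at the image center; δv and α are left out because their role is not defined here. All function names are hypothetical.

```python
# Values copied from the parameter table; the meaning given to each
# symbol below is an ASSUMPTION, not something stated in this section.
IMAGE_HEIGHT_PX = 375     # first column: taken to be image height in pixels
CAMERA_HEIGHT_M = 1.65    # h: taken to be camera height above the road
FOCAL_PX = 721.54         # f/rho: taken to be focal length in pixels
VEHICLE_HEIGHT_M = 1.6    # Hv: taken to be nominal vehicle height

# ASSUMPTION: zero pitch and principal point at the image center,
# so the horizon lies on the middle image row.
HORIZON_ROW = IMAGE_HEIGHT_PX / 2.0


def project_vehicle(distance_m):
    """Project a vehicle standing on a flat road at `distance_m` ahead.

    Returns (bottom_row, box_height), both in pixels.
    """
    bottom_row = HORIZON_ROW + FOCAL_PX * CAMERA_HEIGHT_M / distance_m
    box_height = FOCAL_PX * VEHICLE_HEIGHT_M / distance_m
    return bottom_row, box_height


def box_height_from_bottom_row(bottom_row):
    """Expected box height given only the row of the box's bottom edge.

    Eliminating distance from the two projections above gives
    box_height = (bottom_row - horizon_row) * Hv / h.
    """
    return (bottom_row - HORIZON_ROW) * VEHICLE_HEIGHT_M / CAMERA_HEIGHT_M


if __name__ == "__main__":
    for z in (10.0, 20.0, 40.0):
        bottom, height = project_vehicle(z)
        print(f"{z:5.1f} m: bottom row {bottom:7.2f}, box height {height:6.2f} px")
```

Under these assumptions the expected box height depends only on the image row of the box's bottom edge, which is the kind of size-position coupling that makes uniformly placed anchors of every scale wasteful.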
Figure 6. Computing results and valid region of each feature map.
Figure 7. Appropriate feature stride for ROI pooling.
Performance of RPN in different experiments.
| Experiments | Anchor Number | Computing Time | AR | AR_S | AR_M | AR_L |
|---|---|---|---|---|---|---|
| 1 (Original RPN) | 15 K | 0.011 s | 0.288 | 0.213 | 0.296 | 0.330 |
| 2 (RPN with FPN) | 463 K | 0.023 s | 0.352 | 0.348 | 0.356 | 0.344 |
| 3 (Proposed method) | 142 K | 0.016 s | | | | |
Performance of ROI assignments in different experiments.
| Experiments | AP | AP_S | AP_M | AP_L |
|---|---|---|---|---|
| 1 (Without ROI assignment) | 0.762 | 0.403 | 0.793 | 0.834 |
| 2 (Assignment in Original FPN) | 0.849 | 0.868 | 0.833 | 0.856 |
| 3 (Proposed method) | | | | |
Performance of final detection results in different experiments.
| Experiments | Computing Time | AP | AP_S | AP_M | AP_L |
|---|---|---|---|---|---|
| 1 (Original Faster R-CNN) | 0.037 s | 0.785 | 0.632 | 0.790 | 0.865 |
| 2 (Faster R-CNN with FPN) | 0.07 s | 0.849 | 0.868 | 0.833 | 0.856 |
| 3 (Proposed method) | 0.055 s | | | | |
Figure 8. Precision–recall curves of different detectors.
Figure 9. Examples of the detection results of different vehicle detectors on KITTI.