Alejandro González, Zhijie Fang, Yainuvis Socarras, Joan Serrat, David Vázquez, Jiaolong Xu, Antonio M. López.
Abstract
Despite the significant advances that computer vision has brought to pedestrian detection for driving assistance, it remains a challenging problem. One reason is the extreme variation in lighting conditions under which such a detector must operate, namely daytime and nighttime. Recent research has shown that combining visible and non-visible imaging modalities may increase detection accuracy, with the infrared spectrum playing a critical role. The goal of this paper is to assess the accuracy gain of different pedestrian models (holistic, part-based, patch-based) when training with images in the far infrared spectrum. Specifically, we compare detection accuracy on test images recorded at daytime and nighttime when the detector is trained (and tested) using (a) plain color images; (b) just infrared images; and (c) both of them. To obtain results for the last item, we propose an early fusion approach that combines features from both modalities. We base the evaluation on a new dataset that we have built for this purpose as well as on the publicly available KAIST multispectral dataset.
Keywords: day/nighttime; far infrared; pedestrian detection
Year: 2016 PMID: 27271635 PMCID: PMC4934246 DOI: 10.3390/s16060820
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
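The abstract states that features from both modalities are combined by early fusion, but does not describe the mechanism. One common reading of early fusion is to concatenate the per-window descriptors of the two modalities before training a single classifier. The sketch below illustrates only that reading; the function name `early_fusion`, the random descriptors and their dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np


def early_fusion(visible_features, fir_features):
    """Concatenate per-window descriptors from two registered modalities.

    Both inputs are (n_windows, n_features) arrays whose rows refer to the
    same candidate windows. Assumption: fusion is plain concatenation.
    """
    visible_features = np.asarray(visible_features, dtype=float)
    fir_features = np.asarray(fir_features, dtype=float)
    if visible_features.shape[0] != fir_features.shape[0]:
        raise ValueError("both modalities must describe the same windows")
    # One longer descriptor per window; a single classifier is trained on it.
    return np.hstack([visible_features, fir_features])


# Hypothetical descriptors for 5 windows, 8 dimensions per modality.
rng = np.random.default_rng(0)
vis = rng.random((5, 8))
fir = rng.random((5, 8))
fused = early_fusion(vis, fir)
print(fused.shape)
```

Concatenation of this kind presupposes that the visible and far infrared frames are registered, so that a given window covers the same scene region in both images.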
Figure 1. Camera setup for the CVC-14 dataset and registered sample frames showing the different fields of view. (a) Fields of view of the visible and far infrared cameras and (b) example images.
Figure 2. Sample pedestrians from the CVC-14 dataset.
FLIR Tau 2 and UI-3240CP camera specifications.
| Specifications | FLIR Tau 2 | IDS UI-3240CP |
|---|---|---|
| Resolution | 640 × 512 pixels | 1280 × 1024 pixels |
| Pixel size | 17 μm | 5.3 μm |
| Focal length | 13 mm | Adjustable (fixed 4 mm) |
| Sensitive area | 10.88 mm × 8.7 mm | 6.784 mm × 5.427 mm |
| Frame rate | 30/25 Hz (NTSC/PAL) | 60 fps |
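The specifications above are enough to estimate why the two cameras see different fields of view. The following is a rough sketch under an ideal pinhole model (no lens distortion, focus at infinity), taking the pixel sizes to be in micrometres; the helper `field_of_view_deg` is a hypothetical name introduced for illustration. It also checks that resolution times pixel size reproduces the listed sensitive area.

```python
import math


def field_of_view_deg(sensor_mm: float, focal_mm: float) -> float:
    """Angular field of view (degrees) of an ideal pinhole camera."""
    return math.degrees(2.0 * math.atan(sensor_mm / (2.0 * focal_mm)))


# Values copied from the specification table (sizes and focal lengths in mm).
cameras = {
    "FLIR Tau 2": {
        "pixels": (640, 512), "pitch_um": 17.0,
        "sensor_mm": (10.88, 8.7), "focal_mm": 13.0,
    },
    "IDS UI-3240CP": {
        "pixels": (1280, 1024), "pitch_um": 5.3,
        "sensor_mm": (6.784, 5.427), "focal_mm": 4.0,
    },
}

fov = {}
for name, cam in cameras.items():
    # Sanity check: pixel count times pixel pitch should match the sensor size.
    for n_pixels, size_mm in zip(cam["pixels"], cam["sensor_mm"]):
        assert abs(n_pixels * cam["pitch_um"] / 1000.0 - size_mm) < 0.01
    # Horizontal and vertical field of view under the pinhole assumption.
    fov[name] = tuple(
        field_of_view_deg(size_mm, cam["focal_mm"]) for size_mm in cam["sensor_mm"]
    )
    print(f"{name}: {fov[name][0]:.1f} x {fov[name][1]:.1f} degrees (H x V)")
```

Under these assumptions the horizontal field of view comes out at roughly 45° for the far infrared camera and roughly 81° for the visible camera, which is consistent with the need to register the two image streams.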
New CVC-14 dataset summary of images and annotated pedestrians.
| Set | Variable | FIR (Day) | FIR (Night) | Visible (Day) | Visible (Night) |
|---|---|---|---|---|---|
| Training | Positive Frames | 2232 | 1386 | 2232 | 1386 |
| Training | Negative Frames | 1463 | 2004 | 1463 | 2004 |
| Training | Annotated Pedestrians | 2769 | 2222 | 2672 | 2007 |
| Training | Mandatory Pedestrians | 1327 | 1787 | 1514 | 1420 |
| Testing | Frames | 706 | 727 | 706 | 727 |
| Testing | Annotated Pedestrians | 2433 | 1895 | 2302 | 1589 |
| Testing | Mandatory Pedestrians | 2184 | 1541 | 2079 | 1333 |
Average miss rate (AMR) in the CVC-14 dataset.
| Detector | Features | Day: Visible | Day: FIR | Night: Visible | Night: FIR |
|---|---|---|---|---|---|
| SVM | HOG | 42.9 | 22.7 | 71.8 | 25.4 |
| SVM | LBP | 40.6 | 21.6 | 87.6 | 32.1 |
| SVM | HOG+LBP | 37.6 | 21.5 | 76.9 | |
| DPM | HOG | 28.6 | 18.9 | 73.6 | 24.1 |
| DPM | HOG+LBP | 25.2 | 18.3 | 76.4 | 31.6 |
| RF | HOG | 39.9 | 20.7 | 68.2 | 24.4 |
| RF | HOG+LBP | 26.6 | | 81.2 | 24.8 |
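The table reports average miss rates but this record does not say how they are computed. A widely used definition in pedestrian detection is the Caltech-style log-average miss rate: the geometric mean of the miss rate sampled at points evenly spaced in log scale over false positives per image (FPPI) between 10^-2 and 10^0. The sketch below assumes that definition; the function name, the number of sample points and the toy curve are illustrative and are not taken from the paper.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at log-spaced FPPI reference points.

    Assumption: Caltech-style protocol over FPPI in [1e-2, 1e0].
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    order = np.argsort(fppi)
    fppi, miss_rate = fppi[order], miss_rate[order]

    sampled = []
    for ref in np.logspace(-2.0, 0.0, num_points):
        # Last curve point whose FPPI does not exceed the reference value.
        idx = np.searchsorted(fppi, ref, side="right") - 1
        # If the curve never reaches this FPPI, count it as missing everything.
        sampled.append(miss_rate[idx] if idx >= 0 else 1.0)

    # Clip to avoid log(0) when a detector reaches a zero miss rate.
    sampled = np.maximum(sampled, 1e-10)
    return float(np.exp(np.mean(np.log(sampled))))


# Toy curve (not data from the paper): a constant 25% miss rate.
toy_fppi = [0.001, 0.01, 0.1, 1.0]
toy_miss = [0.25, 0.25, 0.25, 0.25]
amr = log_average_miss_rate(toy_fppi, toy_miss)
print(f"{100 * amr:.1f}")
```

Averaging in log space keeps the summary from being dominated by the high-FPPI end of the curve, where miss rates are lowest.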
Figure 3. Results using different detectors over the CVC-14 dataset. The first row plots results using detectors based on (a) SVM/HOG, (b) SVM/HOG+LBP, (c) DPM/HOG, (d) DPM/HOG+LBP, (e) RF/HOG and (f) RF/HOG+LBP.
AMR (average miss rate) in the KAIST dataset. For each detector, the three rows give the AMR for near, medium and reasonable pedestrians, as explained in the text.
| Detector | Features | Subset | Day: Visible | Day: FIR | Day: Visible + FIR | Night: Visible | Night: FIR | Night: Visible + FIR |
|---|---|---|---|---|---|---|---|---|
| RF | HOG + LBP | Near | 39.7 | 31.5 | | 76.0 | 29.4 | |
| RF | HOG + LBP | Medium | 74.5 | 72.5 | | 93.2 | 61.7 | |
| RF | HOG + LBP | Reasonable | 72.7 | 70.5 | | 91.4 | 56.7 | |
Figure 4. Results using different test subsets over the KAIST multispectral dataset during daytime. Results obtained with RF/HOG+LBP for (a) reasonable, (b) near and (c) medium pedestrian subsets.
Figure 5. Qualitative results comparing HOG/LinSVM detectors in different time/sensor conditions. The top row shows results over visible spectrum images, the bottom row over far infrared images. Blue boxes represent correct detections (true positives), while red boxes represent missed detections (false negatives).