Hirokatsu Kataoka, Yutaka Satoh, Yoshimitsu Aoki, Shoko Oikawa, Yasuhiro Matsui.
Abstract
This paper addresses the emerging problem of fine-grained pedestrian action recognition, which enables advanced pre-crash safety by estimating a pedestrian's intention in advance. Fine-grained pedestrian actions differ only slightly in appearance (e.g., walking straight versus crossing) and are therefore difficult to distinguish from each other. Fine-grained action recognition is expected to support pedestrian intention estimation in advanced driver-assistance systems (ADAS). Three difficulties must be addressed to achieve fine-grained and accurate pedestrian action recognition: (i) to analyze the fine-grained motion of a pedestrian captured by a vehicle-mounted driving recorder, a method is needed that describes subtle changes in motion characteristics occurring over a short time; (ii) subtle changes in the pedestrian's motion must be detected even when the background moves considerably because the vehicle is driving; (iii) collecting large-scale fine-grained action data is very difficult, so the method must work with a relatively small database. We investigate how to learn an effective recognition model from only a small-scale database. To this end, we thoroughly evaluate several configurations to identify an effective approach to fine-grained pedestrian action recognition without a large-scale database. Moreover, we collected two different datasets to study the problem. Our proposal attained 91.01% on the National Traffic Science and Environment Laboratory database (NTSEL) and 53.23% on the near-miss driving recorder database (NDRDB), improvements of +8.28% and +6.53%, respectively, over the baseline two-stream fusion convnets.
Keywords: advanced driver-assistance systems (ADAS); driving recorder; fine-grained pedestrian action recognition; two-stream convnets
Year: 2018 PMID: 29461473 PMCID: PMC5855092 DOI: 10.3390/s18020627
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Fine-grained pedestrian actions in the self-collected databases: (a) crossing; (b) walking straight; (c) turning; and (d) riding a bicycle. Fine-grained pedestrian action recognition is an important problem for safety systems, which must distinguish between actions that differ only by subtle changes. The concept is important for improving recent safety systems such as advanced driver-assistance systems (ADAS) and self-driving cars, because it allows a pedestrian's intention to be estimated in advance.
Summary of databases.
| DB | NTSEL Database (NTSEL) | Near-Miss Driving Database (NDRDB) |
|---|---|---|
| | #Video (#Frame) | #Video (#Frame) |
| # | 25 (2648) | 15 (515) |
| # | 25 (2726) | 43 (1773) |
| # | 25 (923) | 13 (593) |
| # | 25 (1632) | 11 (457) |
| #Total | 100 (7929) | 82 (3338) |
Figure 2. Flowchart of the proposed architecture for fine-grained pedestrian action recognition. We adopt the two-stream fusion convnets [45] originally proposed by Feichtenhofer et al., which apply a channel sum to two convolutional maps taken from an intermediate layer of the spatial and temporal streams. After the channel fusion layer (“fusion” in the architecture), we add several convolutional and pooling layers (conv and pool) to generate a strong feature, e.g., one that captures subtle differences in a walking pedestrian. In the classification step, we employ deep convolutional activation features (DeCAF; the 4096-d vector of the first fully-connected (FC) layer) combined with support vector machines (SVM) [46] so that training converges on the small-scale database. The two-stream fusion convnets and DeCAF + SVM are trained on the training sets of the self-collected databases.
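The channel-sum fusion mentioned in the caption can be illustrated with a short sketch. This is not the authors' implementation: it assumes both streams produce feature maps of identical shape, stored as nested lists indexed `[channel][row][column]`, and the function name `channel_sum_fusion` is hypothetical.

```python
def channel_sum_fusion(spatial_map, temporal_map):
    """Fuse two feature maps by summing matching channels element-wise.

    Both inputs are nested lists indexed [channel][row][column] and must
    share the same shape (an assumption of this sketch).
    """
    if len(spatial_map) != len(temporal_map):
        raise ValueError("streams must have the same number of channels")
    fused = []
    for s_channel, t_channel in zip(spatial_map, temporal_map):
        if len(s_channel) != len(t_channel):
            raise ValueError("channels must have the same height")
        fused_channel = []
        for s_row, t_row in zip(s_channel, t_channel):
            if len(s_row) != len(t_row):
                raise ValueError("rows must have the same width")
            fused_channel.append([s + t for s, t in zip(s_row, t_row)])
        fused.append(fused_channel)
    return fused


# Toy example: 2 channels of 2x2 activations per stream (made-up values).
spatial = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
temporal = [[[0.5, 0.5], [0.5, 0.5]], [[2.0, 2.0], [2.0, 2.0]]]
fused = channel_sum_fusion(spatial, temporal)
print(fused)
```

In the pipeline described above, the fused map would then pass through the additional conv and pool layers before the FC activations are handed to the SVM; those stages are omitted here.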
With or without fine-tuning.
| | NTSEL (%) | NDRDB (%) |
|---|---|---|
| End-to-End | N/A | N/A |
| Without fine-tuning (DeCAF) | 82.73 | 46.70 |
| With fine-tuning (DeCAF) | | |
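The gains quoted in the abstract can be cross-checked against this table with simple arithmetic. The sketch below subtracts the reported improvements (+8.28 and +6.53 points) from the final accuracies (91.01% and 53.23%); the variable names are illustrative only.

```python
# Final accuracies and improvements as reported in the abstract.
final_accuracy = {"NTSEL": 91.01, "NDRDB": 53.23}
improvement = {"NTSEL": 8.28, "NDRDB": 6.53}

# Accuracies from the "Without fine-tuning (DeCAF)" row of the table.
without_fine_tuning = {"NTSEL": 82.73, "NDRDB": 46.70}

# Baseline implied by the abstract: final accuracy minus the reported gain.
implied_baseline = {
    name: round(final_accuracy[name] - improvement[name], 2)
    for name in final_accuracy
}
print(implied_baseline)
print(implied_baseline == without_fine_tuning)
```

The implied baselines coincide with the row without fine-tuning, which suggests that this row is the two-stream fusion baseline the abstract refers to.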
Performance rates for various numbers of FC units on the self-collected databases.
| #Fc-Unit | NTSEL (%) | NDRDB (%) |
|---|---|---|
| 128 | 88.58 | 51.01 |
| 256 | 88.73 | 48.47 |
| 512 | 89.30 | 51.49 |
| 1024 | 86.30 | |
| 2048 | 89.01 | 49.87 |
| 4096 | | |
Figure 3. SVM parameter tuning: (a) relationship between performance rate and SVM parameter on NTSEL; (b) relationship between performance rate and SVM parameter on NDRDB.
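A parameter sweep of the kind plotted in Figure 3 can be sketched as follows. The caption does not say which SVM parameter was tuned or over what range, so the cost parameter `C`, the candidate grid, and the `toy_accuracy` scorer below are all assumptions for illustration. In practice the scorer would train an SVM on DeCAF features and return a validation accuracy.

```python
import math


def sweep_parameter(candidates, evaluate):
    """Score every candidate value and return (best_value, scores).

    `evaluate` maps a parameter value to an accuracy; on ties the
    earliest candidate in the list is kept.
    """
    scores = {value: evaluate(value) for value in candidates}
    best_value = max(candidates, key=lambda value: scores[value])
    return best_value, scores


def toy_accuracy(c):
    """Hypothetical stand-in for train-and-validate; peaks at C = 1."""
    return 90.0 - 5.0 * abs(math.log10(c))


# Assumed logarithmic grid of cost values from 1e-3 to 1e3.
grid = [10.0 ** k for k in range(-3, 4)]
best_c, scores = sweep_parameter(grid, toy_accuracy)
print(best_c, scores[best_c])
```

Plotting `scores` against the grid on a logarithmic axis, once per dataset, would give curves analogous to panels (a) and (b).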
Performance rates on the NTSEL and NDRDB datasets.
| Approach | NTSEL (%) | NDRDB (%) |
|---|---|---|
| IDT (HOG) | 70.18 | 50.43 |
| IDT (HOF) | 64.76 | 52.05 |
| IDT (MBH) | 65.38 | 49.12 |
| IDT | 74.52 | 52.19 |
| DeCAF (ImageNet) | 53.78 | 49.94 |
| DeCAF (ImageNet with VGG-16) | 53.63 | 50.54 |
| DeCAF (Places205) | 67.48 | 49.02 |
| DeCAF (Hybrid) | 58.91 | 47.17 |
| DeCAF (Combined) | 67.44 | 49.07 |
| Two-stream ConvNets (Spatial) | 69.04 | 48.47 |
| Two-stream ConvNets (Temporal) | 64.05 | 45.93 |
| Two-stream ConvNets | 50.50 | |
| TDD | 68.39 | |
| Ours | 91.01 | 53.23 |
Figure 4. Visual results on the NTSEL dataset. The first three rows show success cases for walking and turning actions. The last row shows a failure case in a sequence where a person is riding a bicycle. In the second row in particular, the pedestrian's intention was successfully estimated in advance. Recognizing the turning action is important for a safety system.