Ikechukwu Ofodile, Ahmed Helmi, Albert Clapés, Egils Avots, Kerttu Maria Peensoo, Sandhra-Mirella Valdma, Andreas Valdmann, Heli Valtna-Lukner, Sergey Omelkov, Sergio Escalera, Cagri Ozcinar, Gholamreza Anbarjafari.
Abstract
Action recognition is a challenging task that plays an important role in many robotic systems, which highly depend on visual input feeds. However, due to privacy concerns, it is important to find a method which can recognise actions without using a visual feed. In this paper, we propose a concept for detecting actions while preserving the test subject's privacy. Our proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. Each data trace recording one action contains a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at a 1 GHz repetition rate. Information about both the distance to the object and its shape is embedded in the traces. We apply machine learning in the form of recurrent neural networks for data analysis and demonstrate successful action recognition. The experimental results show that the proposed method achieves an average accuracy of 96.47% on the actions walking forward, walking backwards, sitting down, standing up, and waving a hand, using a recurrent neural network.
Keywords: action recognition; single pixel single photon image acquisition; time-of-flight
Year: 2019 PMID: 33267128 PMCID: PMC7514902 DOI: 10.3390/e21040414
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
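As the abstract describes, each recorded action is a sequence of one-dimensional voltage arrays from a single-pixel detector. A minimal sketch of how such a trace could be laid out for a recurrent model; the file-free construction, array sizes, and variable names below are illustrative assumptions, not taken from the paper's dataset:

```python
import numpy as np

# Hypothetical trace: one action recording is a sequence of 1-D voltage
# arrays, one array per laser pulse scattered back from the scene.
num_pulses, samples_per_pulse = 200, 1024  # assumed sizes for illustration
trace = np.random.randn(num_pulses, samples_per_pulse).astype(np.float32)

# For an RNN this maps directly onto a (time, features) layout:
# time steps = num_pulses, features = samples_per_pulse.
print(trace.shape)  # (200, 1024)
```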
Figure 1. The data collection setup: a Fianium laser delivers 30 ps light pulses. The collimated laser beam is directed onto a scatterer, which creates a divergent speckle pattern (with a 40-degree apex angle of divergence) inside a black box specially designed for the robot. The scattered illumination reduces potential interference effects at the detector, and a controlled speckle pattern could be used to increase the lateral resolution. The light scattered from the moving object (a NAO V4 robot) and the walls is detected with a single-pixel hybrid photodetector (HPD), which records the temporal evolution of the backscattered light.
Summary of the performed actions. Walk Forward through Object Setup are one-robot tasks; Same Action and Different Actions are two-robot tasks.

| Task | Walk Forward | Walk Reverse | Sit Down | Stand Up | Hand Wave | Object Setup | Same Action | Different Actions |
|---|---|---|---|---|---|---|---|---|
| Repetitions | 125 | 125 | 50 | 50 | 50 | 156 | 70 | 20 |
Figure 2. Start and end points of the robot's paths during directional walking.
Forward (F) movement.
| Task | A1 | A2 | B1 | C1 | C2 |
|---|---|---|---|---|---|
| Repetitions | 25 | 25 | 25 | 25 | 25 |
| Start location | A | A | B | C | C |
| Stop location | a | c | b | c | a |
Reverse (R) movement.
| Task | A1 | A2 | B1 | C1 | C2 |
|---|---|---|---|---|---|
| Repetitions | 25 | 25 | 25 | 25 | 25 |
| Start location | a | c | b | c | a |
| Stop location | A | A | B | C | C |
Figure 3. Positions of the sitting down and standing up actions.
Tasks performed in specific locations.

| Action | Locations | Repetitions per location |
|---|---|---|
| Sit down (sd) | 1, 2, 3, 4, 5 | 10 |
| Stand up (su) | 1, 2, 3, 4, 5 | 10 |
| Hand-wave (hw) | 2, 5 | 25 |
Tasks in the presence of an object (object locations are shown in Figure 4).

| Task | 1 | 2 | 3 | 4 | 5 | 6 * |
|---|---|---|---|---|---|---|
| Forward path | A to a | C to c | A to c | C to a | B to b | hw |
| Repetitions | 12 | 12 | 12 | 12 | 12 | 12 |
| Reverse path | a to A | c to C | a to C | c to A | b to B | su/sd |
| Repetitions | 12 | 12 | 12 | 12 | 12 | 12/12 |

* In Task 6, the robot did not walk forward or in reverse, but performed the hand-wave, stand-up, and sit-down actions, each repeated 12 times.
Figure 4. Positions of the object during various robot actions.
Figure 5. Positions of the two robots during actions.
Actions performed by two robots.
| Repetition | Robot 1 Action | Position | Robot 2 Action | Position |
|---|---|---|---|---|
| 10 | Forward | 1 to 4 | Forward | 2 to 3 |
| 10 | Sit Down | 1 | Sit Down | 2 |
| 10 | Stand Up | 1 | Stand Up | 2 |
| 10 | Sit Down | 3 | Sit Down | 4 |
| 10 | Stand Up | 3 | Stand Up | 4 |
| 10 | Hand-Wave | 1 | Hand-Wave | 2 |
| 10 | Hand-Wave | 3 | Hand-Wave | 4 |
| 10 | Stand | 1 | Forward | 2 to 3 |
| 10 | Stand | 1 | Forward | 2 to 4 |
Figure 6. Visualisation of the traces over time (x-axis). Columns correspond to different actions (forward walking, reverse walking, sitting down, standing up, and waving, respectively), whereas rows correspond to different examples. The subplot titles correspond to the sequence files in the dataset.
Figure 7. The two-layer bidirectional GRU baseline architecture. Arrows represent information flow, grey rectangles are bidirectional GRU layers, and circles represent the concatenation operation.
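A minimal PyTorch sketch of such a two-layer bidirectional GRU classifier is given below. The input size, number of classes, and the use of the top layer's final forward/backward hidden states are assumptions inferred from the figure, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    """Two-layer bidirectional GRU baseline (sketch).

    Assumes each trace is a sequence of 1-D feature vectors of size
    `input_size`; the final forward and backward hidden states of the
    top layer are concatenated and fed to a linear classifier.
    """
    def __init__(self, input_size, hidden_size=64, num_classes=2):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers=2,
                          batch_first=True, bidirectional=True)
        # 2 * hidden_size: forward and backward states are concatenated
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, time, input_size)
        _, h_n = self.gru(x)  # h_n: (num_layers * 2, batch, hidden_size)
        # Concatenate the top layer's forward (h_n[-2]) and backward
        # (h_n[-1]) final hidden states.
        h = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.fc(h)

model = BiGRUClassifier(input_size=1024, hidden_size=64, num_classes=2)
logits = model(torch.randn(8, 200, 1024))  # 8 traces, 200 time steps each
```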
Comparison of GRU models with multiple layers and/or bidirectionality. In this ablation, we define a set of five binary problems: forward, reverse, sit-down, stand-up, and hand-wave actions. The reported results are class-weighted accuracies averaged over 10-fold cross-validation. The "Average" column is the mean performance across the binary problems.
| Model | Forward | Reverse | Sit-Down | Stand-Up | Handwave | Average |
|---|---|---|---|---|---|---|
| GRU (one-layer, 64-hidden) | 87.28 | 83.85 | 76.48 | 78.15 | 94.49 | 84.05 |
| GRU (two-layer, 64-hidden) | 88.48 | 90.06 | 85.75 | 86.41 | 94.99 | 89.14 |
| biGRU (one-layer, 64-hidden) | 89.28 | 87.62 | 82.49 | 86.85 | 96.02 | 88.45 |
| biGRU (two-layer, 64-hidden) | 91.42 | 91.08 | 90.07 | 92.51 | 97.01 | 92.42 |
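A note on the metric: if "class-weighted accuracy" is read as per-class accuracy averaged with equal weight per class, it coincides with scikit-learn's balanced accuracy. A minimal sketch under that assumption:

```python
from sklearn.metrics import balanced_accuracy_score

# Toy binary labels: class-0 recall = 2/3, class-1 recall = 2/2
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1]
# (2/3 + 1) / 2 = 0.8333 -> 83.33 as a percentage
print(100 * balanced_accuracy_score(y_true, y_pred))
```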
Hidden layer size experiments on the five binary problems (see Columns 2–6). The reported results are class-weighted accuracies averaged over 10-fold cross-validation. The "Average" column is the mean performance across the binary problems.
| Model | Forward | Reverse | Sit-Down | Stand-Up | Handwave | Average |
|---|---|---|---|---|---|---|
| biGRU (two-layer, 32-hidden) | 89.97 | 89.78 | 87.89 | 89.47 | 95.92 | 90.61 |
| biGRU (two-layer, 64-hidden) | 91.42 | 91.08 | 90.07 | 92.51 | 97.01 | 92.42 |
| biGRU (two-layer, 128-hidden) | 93.10 | 93.63 | 92.98 | 95.32 | 97.70 | 94.55 |
| biGRU (two-layer, 256-hidden) | 93.76 | 94.17 | 94.34 | 95.52 | 98.89 | 95.34 |
| biGRU (two-layer, 512-hidden) | 94.94 | 95.20 | 95.02 | 96.70 | 99.29 | 96.23 |
GRU versus LSTM on the five binary problems (see Columns 2–6). The reported results are class-weighted accuracies averaged over 10-fold cross-validation. The "Average" column is the mean performance across the binary problems.
| Model | Forward | Reverse | Sit-Down | Stand-Up | Handwave | Average |
|---|---|---|---|---|---|---|
| biGRU (two-layer, 64-hidden) | 91.42 | 91.08 | 90.07 | 92.51 | 97.01 | 92.42 |
| biLSTM (two-layer, 64-hidden) | 91.56 | 96.91 | 92.02 | 89.84 | 94.58 | 92.98 |
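The biLSTM variant differs from the biGRU sketch above only in the recurrent cell; in PyTorch this is essentially a one-line swap (sizes below reuse the earlier illustrative assumptions):

```python
import torch.nn as nn

# Same two-layer bidirectional configuration as the GRU baseline sketch,
# but with an LSTM cell, which additionally maintains a cell state c_n.
lstm = nn.LSTM(input_size=1024, hidden_size=64, num_layers=2,
               batch_first=True, bidirectional=True)
# Forward pass: output, (h_n, c_n) = lstm(x), versus output, h_n for nn.GRU.
```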
Classification on four multiclass problems obtained by the biGRU (two-layer, 512-hidden) baseline. The reported results are class-weighted accuracies averaged over 10-fold cross-validation.
| Actions | Path | Directed-Path | Setup |
|---|---|---|---|
| 92.67 | 86.23 | 86.65 | 90.00 |
Figure 8. Confusion matrices (row-wise normalised) for the multiclass classification experiments in Table 10.
Leave-one-repetition-set-out cross-validation (LOROCV) experiment using the biGRU (two-layer, 512-hidden). The 10-fold CV row repeats the last row of Table 8 for comparison with LOROCV.
| Model | F | R | sd | su | hw | Average |
|---|---|---|---|---|---|---|
| biGRU (two-layer, 512-hidden, 10fCV) | 94.94 | 95.20 | 95.02 | 96.70 | 99.29 | 96.23 |
| biGRU (two-layer, 512-hidden, LOROCV) | 96.77 | 95.14 | 93.36 | 97.11 | 100.0 | 96.47 |
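LOROCV can be expressed with scikit-learn's grouped splitters by assigning each repetition set a group id; a hypothetical sketch (the data shapes and group assignment below are illustrative, not the paper's actual split):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.randn(400, 200)            # hypothetical per-trace features
y = np.random.randint(0, 2, size=400)    # binary action labels
rep_set = np.repeat(np.arange(10), 40)   # repetition-set id for each trace

# Each fold holds out one full repetition set, so no repetitions of the
# held-out set leak into training (unlike 10-fold CV over shuffled traces).
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=rep_set):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
```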
Two-robot experiments in three different scenarios: one robot standing up while the other performs a particular action, the two robots performing the same action, and the two robots each performing a different action. Each scenario is a separate test set with a different number of examples. In brackets, the number of positive examples for each class in each scenario. Since the positive/negative classes are imbalanced, we report class-weighted accuracies (%).
| Scenario | #Examples | F | R | sd | su | hw |
|---|---|---|---|---|---|---|
| One robot standing up and sitting down | 100 | 80.00 (50) | 88.00 (50) | 95.00 (0) | 15.00 (100) | 100.00 (0) |
| Same two actions | 70 | 25.00 (10) | 72.86 (10) | 75.00 (20) | 54.00 (20) | 50.00 (20) |
| Two different actions | 20 | 50.00 (10) | 100.00 (0) | 95.00 (10) | 55.00 (10) | 55.00 (10) |