| Literature DB >> 32344755 |
Itsaso Rodríguez-Moreno, José María Martínez-Otzeta, Izaro Goienetxea, Igor Rodriguez-Rodriguez, Basilio Sierra.
Abstract
Action recognition in robotics is a research field that has gained momentum in recent years. In this work, a video activity recognition method is presented, which has the ultimate goal of endowing a robot with action recognition capabilities for a more natural social interaction. The application of Common Spatial Patterns (CSP), a signal processing approach widely used in electroencephalography (EEG), is presented in a novel manner to be used in activity recognition in videos taken by a humanoid robot. A sequence of skeleton data is considered as a multidimensional signal and filtered according to the CSP algorithm. Then, characteristics extracted from these filtered data are used as features for a classifier. A database with 46 individuals performing six different actions has been created to test the proposed method. The CSP-based method along with a Linear Discriminant Analysis (LDA) classifier has been compared to a Long Short-Term Memory (LSTM) neural network, showing that the former obtains similar or better results than the latter, while being simpler.
Keywords: action recognition; common spatial patterns; social robotics
Year: 2020 PMID: 32344755 PMCID: PMC7219491 DOI: 10.3390/s20082436
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
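The pipeline the abstract outlines — treating a skeleton-joint sequence as a multichannel signal, filtering it with two-class CSP, and extracting variance-based features for a classifier — can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the channel layout, the value of q, and the log-variance normalization are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, q=3):
    """Two-class CSP: return 2*q spatial filters (as rows) that maximize
    the variance ratio between the two classes.

    Each trial is a (channels, samples) array, e.g. one skeleton-joint
    coordinate per channel over the frames of a video.
    """
    def mean_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem ca·w = lam·(ca + cb)·w; eigenvalues come
    # back in ascending order, so the two extremes hold the filters most
    # discriminative for each class.
    vals, vecs = eigh(ca, ca + cb)
    picks = np.r_[np.arange(q), np.arange(len(vals) - q, len(vals))]
    return vecs[:, picks].T

def csp_features(trial, w):
    """Normalized log-variance of the CSP-filtered signal (a common CSP
    feature choice; the paper's exact characteristics may differ)."""
    z = w @ trial
    var = z.var(axis=1)
    return np.log(var / var.sum())
```

A per-pair classifier such as LDA (e.g. scikit-learn's `LinearDiscriminantAnalysis`) would then be trained on these feature vectors, with one CSP filter bank per pair of action categories, matching the pairwise rows in the results tables below.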
Figure 1. Interaction example.
Figure 2. Proposed approach.
Figure 3. Position and orientation of Pepper’s RGB cameras.
Figure 4. Skeleton’s joint positions and matrix representation of the extracted signals.
Characteristics of each action category.
| Category | # Videos | Resolution | FPS |
|---|---|---|---|
| COME | 46 | 320 × 480 | 10 |
| FIVE | 45 | 320 × 480 | 10 |
| HANDSHAKE | 45 | 320 × 480 | 10 |
| HELLO | 44 | 320 × 480 | 10 |
| IGNORE | 46 | 320 × 480 | 10 |
| LOOK AT | 46 | 320 × 480 | 10 |
Figure 5. Frame sequence examples for different categories.
Figure 6. Linear interpolation example.
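The linear interpolation of Figure 6 (presumably used to fill skeleton joints missing in some frames; the NaN-gap representation below is an assumption) can be sketched with a single `numpy.interp` call:

```python
import numpy as np

def fill_missing_frames(signal):
    """Linearly interpolate NaN gaps in a 1-D joint-coordinate signal."""
    x = np.asarray(signal, dtype=float)
    idx = np.arange(x.size)
    ok = ~np.isnan(x)
    # np.interp rebuilds the missing samples from the surrounding valid ones.
    return np.interp(idx, idx[ok], x[ok])

# e.g. fill_missing_frames([0.0, np.nan, 2.0, np.nan, np.nan, 5.0])
# → [0. 1. 2. 3. 4. 5.]
```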
Results obtained applying Common Spatial Patterns (CSP) with different q values and using LDA as classifier.
| Pair of Categories | Variance (q₁) | Variance (q₂) | Variance (q₃) | Var, Max, Min, IQR (q₁) | Var, Max, Min, IQR (q₂) | Var, Max, Min, IQR (q₃) |
|---|---|---|---|---|---|---|
| COME-FIVE | 0.7579 ± 0.13 | 0.8124 ± 0.12 | 0.7667 ± 0.17 | 0.7578 ± 0.12 | 0.7667 ± 0.16 | |
| COME-HANDSHAKE | 0.8019 ± 0.12 | 0.6910 ± 0.17 | 0.7900 ± 0.12 | 0.6567 ± 0.16 | | |
| COME-HELLO | 0.5000 ± 0.09 | 0.5000 ± 0.14 | 0.4778 ± 0.16 | 0.4444 ± 0.09 | 0.4778 ± 0.15 | |
| COME-IGNORE | 0.9667 ± 0.05 | 0.9667 ± 0.05 | 0.9667 ± 0.05 | 0.9667 ± 0.05 | 0.9444 ± 0.06 | |
| COME-LOOK_AT | 0.8678 ± 0.09 | 0.8789 ± 0.11 | 0.8678 ± 0.10 | 0.8356 ± 0.14 | 0.8033 ± 0.14 | |
| FIVE-HANDSHAKE | 0.9333 ± 0.06 | 0.9223 ± 0.05 | 0.9333 ± 0.11 | 0.9000 ± 0.11 | 0.9000 ± 0.08 | |
| FIVE-HELLO | 0.7986 ± 0.15 | 0.7764 ± 0.17 | 0.7750 ± 0.18 | 0.7528 ± 0.18 | 0.7319 ± 0.21 | |
| FIVE-IGNORE | 0.9556 ± 0.11 | 0.9556 ± 0.11 | 0.9556 ± 0.11 | | | |
| FIVE-LOOK_AT | 0.9556 ± 0.06 | 0.9556 ± 0.06 | 0.9556 ± 0.08 | 0.9556 ± 0.08 | 0.9011 ± 0.17 | |
| HANDSHAKE-HELLO | 0.7431 ± 0.19 | 0.7861 ± 0.14 | 0.8097 ± 0.10 | 0.7111 ± 0.24 | 0.7889 ± 0.21 | 0.8000 ± 0.10 |
| HANDSHAKE-IGNORE | 0.9889 ± 0.04 | 1.0000 ± 0.00 | 0.9889 ± 0.04 | 0.9889 ± 0.04 | | |
| HANDSHAKE-LOOK_AT | 0.7789 ± 0.16 | 0.7567 ± 0.12 | 0.8122 ± 0.17 | 0.7467 ± 0.17 | 0.7456 ± 0.12 | |
| HELLO-IGNORE | 0.9333 ± 0.14 | 0.9221 ± 0.14 | 0.9333 ± 0.11 | 0.9444 ± 0.14 | 0.9444 ± 0.11 | |
| HELLO-LOOK_AT | 0.8445 ± 0.11 | 0.8334 ± 0.12 | 0.8556 ± 0.14 | 0.8556 ± 0.09 | 0.8000 ± 0.10 | |
| IGNORE-LOOK_AT | 0.9889 ± 0.04 | 0.9778 ± 0.05 | 0.9678 ± 0.05 | 0.9678 ± 0.05 | | |
| MEAN | 0.8623 | 0.8506 | 0.8586 | 0.8448 | 0.8301 | |
Results obtained applying CSP with different q values and using Random Forest (RF) as classifier.
| Pair of Categories | Variance (q₁) | Variance (q₂) | Variance (q₃) | Var, Max, Min, IQR (q₁) | Var, Max, Min, IQR (q₂) | Var, Max, Min, IQR (q₃) |
|---|---|---|---|---|---|---|
| COME-FIVE | 0.6800 ± 0.29 | 0.6022 ± 0.24 | 0.5811 ± 0.19 | 0.6244 ± 0.23 | 0.5922 ± 0.21 | |
| COME-HANDSHAKE | 0.7000 ± 0.20 | 0.6900 ± 0.29 | 0.6344 ± 0.29 | 0.6678 ± 0.32 | 0.6344 ± 0.32 | |
| COME-HELLO | 0.3889 ± 0.21 | 0.4222 ± 0.17 | 0.4889 ± 0.22 | 0.4222 ± 0.20 | 0.3889 ± 0.20 | |
| COME-IGNORE | 0.8900 ± 0.17 | 0.8800 ± 0.18 | 0.8911 ± 0.15 | 0.8578 ± 0.20 | | |
| COME-LOOK_AT | 0.7800 ± 0.20 | 0.7456 ± 0.25 | 0.8122 ± 0.23 | 0.8122 ± 0.24 | 0.7789 ± 0.24 | |
| FIVE-HANDSHAKE | 0.7778 ± 0.15 | 0.6444 ± 0.17 | 0.8444 ± 0.17 | 0.7667 ± 0.12 | 0.6667 ± 0.17 | |
| FIVE-HELLO | 0.5500 ± 0.22 | 0.5028 ± 0.23 | 0.5361 ± 0.23 | 0.5236 ± 0.24 | | |
| FIVE-IGNORE | 0.9444 ± 0.14 | 0.9344 ± 0.14 | 0.9344 ± 0.14 | 0.9456 ± 0.11 | 0.9233 ± 0.14 | |
| FIVE-LOOK_AT | 0.9000 ± 0.19 | 0.8889 ± 0.21 | 0.8233 ± 0.23 | 0.9000 ± 0.21 | 0.8556 ± 0.25 | |
| HANDSHAKE-HELLO | 0.6875 ± 0.18 | 0.5708 ± 0.14 | 0.6111 ± 0.20 | 0.5819 ± 0.16 | 0.6556 ± 0.15 | |
| HANDSHAKE-IGNORE | 0.9578 ± 0.07 | 0.9133 ± 0.12 | 0.9578 ± 0.07 | 0.9244 ± 0.11 | | |
| HANDSHAKE-LOOK_AT | 0.7344 ± 0.26 | 0.6789 ± 0.29 | 0.7456 ± 0.26 | 0.7456 ± 0.28 | 0.6678 ± 0.25 | |
| HELLO-IGNORE | 0.9000 ± 0.14 | 0.8889 ± 0.17 | 0.8667 ± 0.21 | 0.8889 ± 0.17 | 0.8667 ± 0.21 | |
| HELLO-LOOK_AT | 0.7667 ± 0.22 | 0.6556 ± 0.32 | 0.6556 ± 0.35 | 0.7556 ± 0.29 | 0.7333 ± 0.28 | |
| IGNORE-LOOK_AT | 0.9222 ± 0.12 | 0.9222 ± 0.14 | 0.9111 ± 0.15 | | | |
| MEAN | 0.7985 | 0.7509 | 0.7211 | 0.7605 | 0.7335 | |
Comparison between the proposed approach and the LSTM approach.
| Pair of Categories | CSP (Variance and …) | LSTM |
|---|---|---|
| COME-FIVE | 0.7579 ± 0.13 | |
| COME-HANDSHAKE | | 0.7739 ± 0.16 |
| COME-HELLO | 0.5334 ± 0.16 | |
| COME-IGNORE | | 0.9575 ± 0.06 |
| COME-LOOK_AT | | 0.7849 ± 0.10 |
| FIVE-HANDSHAKE | | 0.8125 ± 0.14 |
| FIVE-HELLO | 0.8208 ± 0.14 | |
| FIVE-IGNORE | 0.9668 ± 0.07 | |
| FIVE-LOOK_AT | | 0.8889 ± 0.11 |
| HANDSHAKE-HELLO | | 0.7108 ± 0.21 |
| HANDSHAKE-IGNORE | | 0.9764 ± 0.05 |
| HANDSHAKE-LOOK_AT | 0.8235 ± 0.18 | |
| HELLO-IGNORE | 0.9333 ± 0.14 | |
| HELLO-LOOK_AT | | 0.5733 ± 0.18 |
| IGNORE-LOOK_AT | | 0.9775 ± 0.05 |
| MEAN | | 0.8505 |