Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang, Hang Wang, Li Zhang.
Abstract
Dynamic hand gesture recognition has attracted increasing attention because of its importance for human-computer interaction. In this paper, we propose a novel motion feature augmented network (MFA-Net) for dynamic hand gesture recognition from skeletal data. MFA-Net exploits motion features of finger and global movements to augment the features of a deep network for gesture recognition. To describe articulated finger movements, finger motion features are extracted from the hand skeleton sequence via a variational autoencoder. Global motion features are utilized to represent the global movements of the hand skeleton. These motion features, along with the skeleton sequence, are then fed into three branches of a recurrent neural network (RNN), which augments the RNN's features and improves classification performance. The proposed MFA-Net is evaluated on two challenging skeleton-based dynamic hand gesture datasets: the DHG-14/28 dataset and the SHREC'17 dataset. Experimental results demonstrate that our proposed method achieves comparable performance on the DHG-14/28 dataset and better performance on the SHREC'17 dataset when compared with state-of-the-art methods.
Keywords: feature augmentation; gesture recognition; recurrent neural networks; skeleton
Year: 2019 PMID: 30634583 PMCID: PMC6359639 DOI: 10.3390/s19020239
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The framework of our proposed motion feature augmented network (MFA-Net). Finger motion features and global motion features are extracted from the input dynamic hand gesture skeleton sequence. These motion features, along with the skeleton sequence, are fed into different branches of a Long Short-Term Memory (LSTM) network to obtain the predicted class of the input gesture.
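The pipeline in Figure 1 can be summarized in code. Below is a minimal PyTorch sketch of the three-branch recurrent fusion, assuming single-layer LSTMs, fusion by concatenating the final hidden states, and illustrative dimensions; the class and argument names (`MFANet`, `skel_dim`, etc.) are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MFANet(nn.Module):
    """Sketch: three LSTM branches over skeleton, finger-motion and global-motion streams."""
    def __init__(self, skel_dim=66, finger_dim=32, global_dim=6,
                 hidden=100, n_classes=14):
        super().__init__()
        # One recurrent branch per input stream.
        self.skel_rnn = nn.LSTM(skel_dim, hidden, batch_first=True)
        self.finger_rnn = nn.LSTM(finger_dim, hidden, batch_first=True)
        self.global_rnn = nn.LSTM(global_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(3 * hidden, n_classes)

    def forward(self, skel, finger_mf, global_mf):
        # Inputs are (batch, time, dim); keep each branch's last hidden state.
        _, (h_s, _) = self.skel_rnn(skel)
        _, (h_f, _) = self.finger_rnn(finger_mf)
        _, (h_g, _) = self.global_rnn(global_mf)
        fused = torch.cat([h_s[-1], h_f[-1], h_g[-1]], dim=-1)
        return self.classifier(fused)  # logits over gesture classes
```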
Figure 2. PoseVAE: variational autoencoder for hand pose. We use the encoder to obtain latent features of the hand skeleton that describe the articulated movements of the fingers.
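For reference, a PoseVAE-style model can be sketched as a generic variational autoencoder over a single-frame hand pose. The sketch below assumes a 66-D pose vector (22 joints × 3 coordinates, as in DHG-14/28) and an arbitrary 32-D latent space; at feature-extraction time the latent mean would serve as the per-frame finger motion feature.

```python
import torch
import torch.nn as nn

class PoseVAE(nn.Module):
    """Sketch: variational autoencoder over a single hand-pose frame."""
    def __init__(self, pose_dim=66, hidden=128, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(pose_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, pose_dim))

    def encode(self, x):
        # Latent description of a pose; the mean can be used as a motion feature.
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar  # reconstruction plus KL terms for training
```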
Comparison of recognition rates (%) with state-of-the-art methods on DHG-14/28 dataset.
| Method | DHG-14 (Fine) | DHG-14 (Coarse) | DHG-14 (Both) | DHG-28 (Both) |
|---|---|---|---|---|
| HON4D [ | - | - | 75.53 | 74.03 |
| HOG² | - | - | 80.85 | 76.53 |
| Smedt et al. [ | - | - | 82.50 | 68.11 |
| SoCJ + HoHD + HoWR [ | 73.60 | 88.33 | 83.07 | 80.0 |
| NIUKF-LSTM [ | - | - | 84.92 | 80.44 |
| SL-fusion-Average [ | 76.00 | 90.72 | 85.46 | 74.19 |
| CNN + LSTM [ | | 89.8 | 85.6 | |
| MFA-Net (Ours) | 75.60 | | | 81.04 |
Figure A1. The confusion matrix of the proposed approach for DHG-14.
Figure A2. The confusion matrix of the proposed approach for DHG-28.
Comparison of recognition rates (%) with state-of-the-art methods on SHREC’17 dataset.
| Method | 14 Gestures | 28 Gestures |
|---|---|---|
| HON4D [ | 78.53 | 74.03 |
| Riemannian Manifold [ | 79.61 | 62.00 |
| Key Frames [ | 82.90 | 71.90 |
| HOG² | 83.85 | 76.53 |
| SoCJ + HoHD + HoWR [ | 88.24 | 81.90 |
| 3 cent + OED + FAD [ | 89.52 | - |
| Boulahia et al. [ | 90.48 | 80.48 |
| MFA-Net (Ours) | | |
Figure A3. The confusion matrix of the proposed approach for SHREC’17 (14 gestures) dataset.
Figure A4. The confusion matrix of the proposed approach for SHREC’17 (28 gestures) dataset.
Recognition rates (%) of self-comparison experiments on DHG-14 dataset.
| Method | Fine (Best) | Fine (Worst) | Fine (Avg ± Std) | Coarse (Best) | Coarse (Worst) | Coarse (Avg ± Std) | Both (Best) | Both (Worst) | Both (Avg ± Std) |
|---|---|---|---|---|---|---|---|---|---|
| Skeleton | 86.0 | 42.0 | 61.2 ± 12.37 | 97.78 | 74.44 | 86.44 ± 7.94 | 93.57 | 67.86 | 77.43 ± 6.82 |
| MF(Kinematic) | 84.0 | 46.0 | 71.5 ± 11.44 | 96.67 | 64.44 | 81.94 ± 8.17 | 90.0 | 58.57 | 78.21 ± 7.49 |
| S + MF(Kinematic) | 90.0 | | | 97.78 | 72.22 | 89.0 ± 7.55 | 94.29 | 67.86 | 84.68 ± 6.67 |
| S + MF(VAE) | | 48.0 | 75.6 ± 10.29 | | | | | | |
Recognition rates (%) of MFA-Net with/without DAD strategy on DHG-14 dataset.
| Method | Fine (Best) | Fine (Worst) | Fine (Avg ± Std) | Coarse (Best) | Coarse (Worst) | Coarse (Avg ± Std) | Both (Best) | Both (Worst) | Both (Avg ± Std) |
|---|---|---|---|---|---|---|---|---|---|
| MFA-Net w/o DAD | 92.0 | 42.0 | 74.2 ± 11.81 | 100.0 | 75.56 | 90.39 ± 6.89 | | 67.86 | 84.60 ± 7.22 |
| MFA-Net | | | | | | | 96.43 | | |
Comparison of recognition rates (%) for different classifiers on SHREC’17 dataset.
| Method | 14 Gestures | 28 Gestures |
|---|---|---|
| | 90.60 | 86.07 |
| CD | 90.83 | 86.07 |
| Random Forest | 90.36 | 85.24 |
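The classifier swap in the table above amounts to training a conventional classifier on the learned embeddings. Below is a hedged scikit-learn sketch using synthetic stand-in features; the 300-D size, split sizes, and hyperparameters are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for embeddings extracted before the FC layers.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1960, 300)), rng.integers(0, 14, size=1960)
X_test, y_test = rng.normal(size=(840, 300)), rng.integers(0, 14, size=840)

# Train a conventional classifier on the fixed embeddings.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.4f}")
```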
Figure 3. 2D t-SNE visualization of features before the FC layers: (Left) feature embeddings of the training set on the SHREC’17 dataset; (Right) feature embeddings of the testing set.
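A visualization like Figure 3 can be reproduced with scikit-learn's t-SNE. The sketch below uses random placeholder embeddings and labels where the real pre-FC features would go; the perplexity value is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder features and labels; substitute the real pre-FC embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(840, 300))
labels = rng.integers(0, 14, size=840)

# Project the embeddings to 2D and color points by gesture class.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab20", s=5)
plt.title("2D t-SNE of pre-FC feature embeddings")
plt.show()
```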