Yongqiang Zhang, Lixin Peng, Guilei Ma, Menghua Man, Shanghe Liu.
Abstract
In this article, a model that combines a multi-layer convolutional neural network (ResNet-18) with a Long Short-Term Memory (LSTM) network is proposed for dynamic gesture recognition. The Soli dataset consists of dynamic gesture signals collected by millimeter-wave radar. As a gesture-sensing radar, Soli has high positional accuracy and can recognize small movements, which serves the goal of Human-Computer Interaction (HCI). A set of range-Doppler images transformed from the original signal is used as the input of the model. ResNet-18 is used to extract deeper spatial features while avoiding vanishing or exploding gradients. LSTM is used to extract temporal features and to handle long-term dependencies. The model was evaluated on the Soli dataset in a dynamic gesture recognition experiment, where it reached a recognition accuracy of 92.55%. Finally, the model is compared with traditional methods. The results show that the proposed model achieves higher accuracy in dynamic gesture recognition, and the experiments verify the validity of the model.
Keywords: Human-Computer Interaction; LSTM; ResNet-18; gesture recognition; millimeter-wave radar
Year: 2022 PMID: 35747074 PMCID: PMC9211067 DOI: 10.3389/fnbot.2022.903197
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 3.493
Figure 1. (A) A random example from the dataset. (B) 20 frames of selected images for this example.
Figure 2. A diagram of the model architecture. The CNNs form a ResNet-18 model (He et al., 2016). The long short-term memory (LSTM) network is an RNN model with two hidden layers, each with 512 units.
Figure 3. The residual block structure of ResNet.
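The residual block in Figure 3 adds a block's input back to the output of its stacked layers. The following is a minimal NumPy sketch of that shortcut connection only. It assumes a toy branch of two fully connected layers in place of the convolutions and batch normalization used in ResNet-18, and the names `residual_block`, `w1`, and `w2` are illustrative, not taken from the paper.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Toy residual block: y = relu(F(x) + x), where F is two linear layers."""
    f = relu(x @ w1) @ w2  # residual branch F(x)
    return relu(f + x)     # identity shortcut adds the input back


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 samples, 8 features (toy sizes)
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = rng.normal(size=(8, 8)) * 0.1
y = residual_block(x, w1, w2)
```

Because the shortcut passes `x` through unchanged, the block reduces to `relu(x)` when the branch weights are zero, which is one way to see why stacking such blocks does not block the gradient path.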
Figure 4. Long short-term memory internal structure diagram.
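The internal structure in Figure 4 can be illustrated with one step of a standard LSTM cell. This is a sketch under stated assumptions, not the authors' implementation: the gate ordering inside the stacked weight matrices, the toy layer sizes, and the use of `tanh` (the textbook choice; the training parameters list ReLU as the LSTM activation) are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,).
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[..., 0 * units:1 * units])   # input gate
    f = sigmoid(z[..., 1 * units:2 * units])   # forget gate
    g = np.tanh(z[..., 2 * units:3 * units])   # candidate cell state
    o = sigmoid(z[..., 3 * units:4 * units])   # output gate
    c_t = f * c_prev + i * g                   # updated cell state
    h_t = o * np.tanh(c_t)                     # updated hidden state
    return h_t, c_t


rng = np.random.default_rng(0)
input_dim, units, steps = 6, 5, 20  # toy sizes; 20 mirrors the frames per example
W = rng.normal(size=(input_dim, 4 * units)) * 0.1
U = rng.normal(size=(units, 4 * units)) * 0.1
b = np.zeros(4 * units)

h = np.zeros((1, units))
c = np.zeros((1, units))
for _ in range(steps):
    x_t = rng.normal(size=(1, input_dim))  # stand-in for per-frame CNN features
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state `c` is updated additively through the forget and input gates, which is the mechanism that lets the network carry information across many frames.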
Figure 5. Long short-term memory classification model.
Training parameters.
| Parameter | Value |
|---|---|
| Framework | TensorFlow |
| Epochs | 100 |
| Loss function | Cross entropy loss |
| Optimizer algorithm | RMSProp |
| Learning rate | 0.00001 |
| LSTM | Units = 512 |
| LSTM activation | ReLU |
| Dense | Units = 11 |
| Dense activation | softmax |
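To make the listed settings concrete, the sketch below works through a softmax output over 11 classes, the cross entropy loss, and a single RMSProp update at a learning rate of 0.00001. It is written in plain NumPy rather than TensorFlow and is not the authors' training code. The decay rate `rho`, the constant `eps`, the batch size, and the random stand-in features are assumptions that the source does not state.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood for integer class labels."""
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))


def rmsprop_step(param, grad, cache, lr=1e-5, rho=0.9, eps=1e-7):
    """One RMSProp update; rho and eps are assumed values, not from the source."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache


rng = np.random.default_rng(0)
batch, units, classes = 8, 512, 11
features = rng.normal(size=(batch, units))   # stand-in for LSTM outputs
labels = rng.integers(0, classes, size=batch)
W = rng.normal(size=(units, classes)) * 0.01
b = np.zeros(classes)

probs = softmax(features @ W + b)
loss = cross_entropy(probs, labels)

# Gradient of the mean cross entropy with respect to the logits.
grad_logits = probs.copy()
grad_logits[np.arange(batch), labels] -= 1.0
grad_logits /= batch
grad_W = features.T @ grad_logits

W_new, cache_W = rmsprop_step(W, grad_W, np.zeros_like(W))
new_loss = cross_entropy(softmax(features @ W_new + b), labels)
```

RMSProp divides each gradient by a running root-mean-square of its recent values, so the step size per weight stays close to the learning rate regardless of the raw gradient scale.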
Figure 6. (A) Accuracy change in the ResNet-18 and LSTM model. (B) Loss change in the ResNet-18 and LSTM model.
Model training results.
| Run | Training accuracy (%) | Evaluation accuracy (%) |
|---|---|---|
| 1 | 100.00 | 92.15 |
| 2 | 100.00 | 92.73 |
| 3 | 100.00 | 92.32 |
| 4 | 100.00 | 92.64 |
| 5 | 100.00 | 92.91 |
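The last column of this table averages to the 92.55% quoted in the abstract, which suggests that the reported accuracy is the mean over the five rows. That reading is an inference from the numbers, not a statement in the source. A short check:

```python
from statistics import mean

# Values copied from the last column of the training results table.
last_column = [92.15, 92.73, 92.32, 92.64, 92.91]

average = mean(last_column)
print(f"{average:.2f}")  # prints 92.55
```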
Figure 7. Confusion matrix for the Soli 60–40% split for training and evaluation. (A) Confusion matrix for 11 categories of gestures from 10 users. (B) The same confusion matrix normalized to percentages.
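Panel (B) of Figure 7 shows the confusion matrix as percentages. One common way to obtain such a view is to divide each row by its total, so that every true class sums to 100%. The sketch below assumes this row-wise convention, which the source does not specify, and uses hypothetical 3-class counts, not the paper's data.

```python
import numpy as np


def normalize_rows(cm):
    """Convert raw confusion counts to row-wise percentages."""
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1, keepdims=True)
    # Rows with no samples stay at zero instead of dividing by zero.
    return 100.0 * np.divide(cm, totals, out=np.zeros_like(cm), where=totals > 0)


# Hypothetical counts: rows are true classes, columns are predicted classes.
counts = np.array([
    [48, 2, 0],
    [3, 45, 2],
    [0, 5, 45],
])
percent = normalize_rows(counts)
```

With this convention the diagonal of `percent` gives the per-class recall, which makes classes of different sizes directly comparable.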
State-of-the-art radar-based gesture recognition.
| Model | Accuracy (%) | Dataset | Reference | Gestures |
|---|---|---|---|---|
| ResNet-18+LSTM | 92.55 | Soli | This model | 11 |
| CNN+LSTM | 87.17 | Soli | Wang et al. | 11 |
| Random forest | 92.1 | Soli | Lien et al. | 4 |