| Literature DB >> 33921769 |
Haohua Huang, Pan Zhou, Ye Li, Fangmin Sun.
Abstract
Wearable-sensor-based gait recognition is an effective method of identifying people by the unique way they walk. Recently, the adoption of deep learning networks for gait recognition has achieved significant performance improvements and has become a promising new trend. However, most existing studies have focused mainly on improving recognition accuracy while ignoring model complexity, which makes them unsuitable for wearable devices. In this study, we proposed a lightweight attention-based Convolutional Neural Network (CNN) model for wearable gait recognition. Specifically, a four-layer lightweight CNN was first employed to extract gait features. Then, a novel attention module based on contextual encoding information and depthwise separable convolution was designed and integrated into the lightweight CNN to enhance the extracted gait features and reduce the model's complexity. Finally, a Softmax classifier was used for classification to realize gait recognition. We conducted comprehensive experiments to evaluate the performance of the proposed model on the whuGait and OU-ISIR datasets. The effects of the proposed attention mechanism, of different data segmentation methods, and of different attention mechanisms on gait recognition performance were studied and analyzed. Comparison with similar existing studies in terms of recognition accuracy and number of model parameters showed that our proposed model not only achieved higher recognition performance but also reduced model complexity by 86.5% on average.
Keywords: CNN; attention mechanism; gait recognition; lightweight model; wearable devices
Year: 2021 PMID: 33921769 PMCID: PMC8072684 DOI: 10.3390/s21082866
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Information on the datasets used in this paper [42].
| Dataset | Number of Subjects | Data Segmentation Method | Overlap of Samples | Samples for Training | Samples for Test |
|---|---|---|---|---|---|
| Dataset #1 | 118 | Gait-cycle-based segmentation (two gait cycles per sample) | 50% | 33,104 | 3740 |
| Dataset #2 | 20 | Gait-cycle-based segmentation (two gait cycles per sample) | 0 | 44,339 | 4936 |
| Dataset #3 | 118 | Fixed-length-based segmentation (sample length = 128) | 50% | 26,283 | 2991 |
| Dataset #4 | 20 | Fixed-length-based segmentation (sample length = 128) | 0 | 35,373 | 3941 |
| OU-ISIR | 744 | Fixed-length-based segmentation (sample length = 128) | 61% | 13,212 | 1409 |
Note: (1) For gait-cycle-based segmentation, each sample is linearly interpolated to a fixed length of 128. (2) There is no overlap between the training set and the test set. (3) The datasets used in this paper are available at https://github.com/qinnzou/ (accessed on 15 December 2020).
Figure 1. One input sample. Ax, Ay, Az are the 3-axis acceleration data, and Gx, Gy, Gz are the 3-axis angular velocity data. The 6-axis sensor data are combined into a matrix with the shape of 6 × 128.
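The segmentation step described in the note above is straightforward to reproduce. Below is a minimal sketch (Python/NumPy, with hypothetical variable names) of how a variable-length two-gait-cycle segment can be linearly interpolated to the fixed length of 128, and how the six sensor axes of Figure 1 are stacked into one 6 × 128 sample:

```python
# Minimal sketch of the gait-cycle-based segmentation described above.
# The paper only states that each sample is linearly interpolated to a
# fixed length of 128; variable names here are hypothetical.
import numpy as np

TARGET_LEN = 128  # fixed sample length used throughout the paper

def resample_segment(segment: np.ndarray, target_len: int = TARGET_LEN) -> np.ndarray:
    """Linearly interpolate a (6, T) segment to shape (6, target_len)."""
    t_old = np.linspace(0.0, 1.0, segment.shape[1])
    t_new = np.linspace(0.0, 1.0, target_len)
    return np.stack([np.interp(t_new, t_old, channel) for channel in segment])

# Example: combine 3-axis acceleration and 3-axis angular velocity
# (as in Figure 1) into one 6 x T matrix, then resample to 6 x 128.
T = 143                        # length of two detected gait cycles; varies per sample
acc = np.random.randn(3, T)    # Ax, Ay, Az
gyro = np.random.randn(3, T)   # Gx, Gy, Gz
sample = resample_segment(np.vstack([acc, gyro]))
assert sample.shape == (6, TARGET_LEN)
```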
Figure 2. The architecture of the gait identification network.
Structure and parameters of the lightweight CNN.
| Layer Name | Kernel Size | Kernel Num. | Feature Map |
|---|---|---|---|
| Conv1 | 1 × 9 | 32 | 6 × 64 × 32 |
| Pool1 | 1 × 2 | / | 6 × 32 × 32 |
| BN | / | / | 6 × 32 × 32 |
| ReLU | / | / | 6 × 32 × 32 |
| Conv2 | 1 × 3 | 64 | 6 × 32 × 64 |
| Conv3 | 1 × 3 | 128 | 6 × 32 × 128 |
| Pool2 | 1 × 2 | / | 6 × 16 × 128 |
| BN | / | / | 6 × 16 × 128 |
| ReLU | / | / | 6 × 16 × 128 |
| Conv4 | 6 × 1 | 128 | 1 × 16 × 128 |
| BN | / | / | 1 × 16 × 128 |
| ReLU | / | / | 1 × 16 × 128 |
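The table fully determines the kernel sizes, filter counts, and feature-map shapes; the strides and paddings in the sketch below are our assumptions, chosen so that a PyTorch implementation reproduces the listed shapes. The final Linear layer plus cross-entropy loss stands in for the Softmax classifier mentioned in the abstract:

```python
# A minimal PyTorch sketch of the four-layer lightweight CNN in the table
# above. Kernel sizes, channel counts, and feature maps follow the table;
# strides, paddings, and the classification head are assumptions.
import torch
import torch.nn as nn

class LightweightCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            # Conv1: 1x9 kernel, 32 filters; stride (1,2) halves the time axis: 6x128 -> 6x64
            nn.Conv2d(1, 32, kernel_size=(1, 9), stride=(1, 2), padding=(0, 4)),
            nn.MaxPool2d((1, 2)),                                     # Pool1: 6x64 -> 6x32
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=(1, 3), padding=(0, 1)),    # Conv2: 6x32x64
            nn.Conv2d(64, 128, kernel_size=(1, 3), padding=(0, 1)),   # Conv3: 6x32x128
            nn.MaxPool2d((1, 2)),                                     # Pool2: 6x32 -> 6x16
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=(6, 1)),  # Conv4: collapses the 6 sensor axes -> 1x16x128
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
        )
        # Classification head; softmax is applied implicitly by
        # nn.CrossEntropyLoss during training.
        self.classifier = nn.Linear(128 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 6, 128) -- one 6-axis sample per Figure 1
        return self.classifier(self.features(x).flatten(1))

model = LightweightCNN(num_classes=118)      # Dataset #1 has 118 subjects
logits = model(torch.randn(4, 1, 6, 128))    # -> shape (4, 118)
```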
Figure 3. Structure of the proposed attention mechanism.
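The paper describes the attention module (CEDS) only as combining contextual encoding information with depthwise separable convolution. The following is a speculative sketch of one channel-attention block consistent with that description; the exact structure in Figure 3 may differ:

```python
# Speculative sketch of a "contextual encoding + depthwise separable
# convolution" (CEDS) channel-attention block, in the spirit of the
# paper's description. This is our assumption, not the authors' code.
import torch
import torch.nn as nn

class CEDSAttention(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.context = nn.AdaptiveAvgPool2d(1)  # global context encoding per channel
        # Depthwise then pointwise 1-D convs over the channel descriptor,
        # replacing the fully connected layers of SE-style attention.
        self.depthwise = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)
        self.pointwise = nn.Conv1d(1, 1, 1)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) feature map from the lightweight CNN
        w = self.context(x).squeeze(-1).transpose(1, 2)  # (N, 1, C)
        w = self.pointwise(self.depthwise(w))            # (N, 1, C)
        w = self.gate(w).transpose(1, 2).unsqueeze(-1)   # (N, C, 1, 1)
        return x * w                                     # reweight channels
```

Because the convolutions here are shared across channels, the block adds only a handful of parameters, which matches the direction of the parameter reductions reported in the next table.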
Comparison of the network with and without the attention mechanism.
| Dataset Name | Model | Accuracy | Recall | F1-Score | Parameters Num. |
|---|---|---|---|---|---|
| Dataset #1 | CNN | 93.96% | 93.95% | 93.21% | 372,598 |
| | CNN+CEDS (Ours) | 94.71% | 94.67% | 93.98% | 344,055 |
| Dataset #2 | CNN | 97.21% | 94.96% | 94.89% | 171,796 |
| | CNN+CEDS (Ours) | 97.67% | 95.51% | 95.37% | 168,341 |
| Dataset #3 | CNN | 92.88% | 92.02% | 90.90% | 372,598 |
| | CNN+CEDS (Ours) | 95.09% | 95.26% | 94.45% | 343,543 |
| Dataset #4 | CNN | 97.97% | 96.50% | 96.87% | 171,796 |
| | CNN+CEDS (Ours) | 98.58% | 97.38% | 97.81% | 168,341 |
| OU-ISIR | CNN | 59.62% | 58.69% | 53.40% | 1,657,321 |
| | CNN+CEDS (Ours) | 97.16% | 96.96% | 96.20% | 1,468,266 |
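For reference, the table's three metrics can be computed from per-sample predictions as sketched below; the paper does not state its averaging mode, so macro averaging across subjects is an assumption:

```python
# Sketch of computing the table's metrics with scikit-learn.
# Macro averaging over subject classes is our assumption.
from sklearn.metrics import accuracy_score, f1_score, recall_score

def gait_metrics(y_true, y_pred) -> dict:
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1-score": f1_score(y_true, y_pred, average="macro"),
    }
```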
Comparison of different attention mechanisms.
| Dataset Name | Methods | Accuracy | Recall | F1-Score |
|---|---|---|---|---|
| Dataset #1 | CNN+SE | 94.20% | 93.99% | 93.13% |
| | CNN+CEDS (Ours) | 94.71% | 94.67% | 93.98% |
| Dataset #2 | CNN+SE | 97.24% | 94.93% | 94.76% |
| | CNN+CEDS (Ours) | 97.67% | 95.51% | 95.37% |
| Dataset #3 | CNN+SE | 93.38% | 93.16% | 92.10% |
| | CNN+CEDS (Ours) | 95.09% | 95.26% | 94.45% |
| Dataset #4 | CNN+SE | 98.05% | 96.33% | 96.78% |
| | CNN+CEDS (Ours) | 98.58% | 97.38% | 97.81% |
| OU-ISIR | CNN+SE | 60.04% | 58.65% | 54.20% |
| | CNN+CEDS (Ours) | 97.16% | 96.96% | 96.20% |
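The SE baseline in the table above is the standard Squeeze-and-Excitation block (Hu et al.); a minimal sketch follows. Unlike the CEDS sketch earlier, SE uses two fully connected layers whose size grows with the channel count, which is consistent with the parameter savings reported for CEDS:

```python
# Standard Squeeze-and-Excitation (SE) channel attention, the baseline
# compared against CEDS in the table above.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average pooling
        self.fc = nn.Sequential(             # excitation: two FC layers
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w  # channel-wise reweighting
```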
Comparative analysis with existing gait recognition methods.
| Dataset Name | Model | Accuracy | AUC | Parameters | Memory Required |
|---|---|---|---|---|---|
| Dataset #1 | CNN+LSTM [ ] | 93.52% | - | 4,716,406 | 56.7 MB |
| | LSTM & CNN [27] | 94.15% | - | - | - |
| | CNN+CEDS (Ours) | 94.71% | 94.81% | 344,055 | 4.24 MB |
| Dataset #2 | CNN+LSTM [ ] | 97.33% | - | 4,415,252 | 53.1 MB |
| | CNN+CEDS (Ours) | 97.67% | 97.96% | 168,341 | 2.13 MB |
| OU-ISIR | LSTM [ ] | 72.32% | - | 4,986,601 | 59.9 MB |
| | LSTM & CNN [27] | 89.79% | - | - | - |
| | CNN+CEDS (Ours) | 97.16% | 97.32% | 1,468,266 | 17.7 MB |
Note: Since Tran et al. [27] did not share their source code, we could not compare the number of model parameters with their method.
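For reproducibility, the "Parameters" column can be checked by counting trainable tensors in PyTorch, as sketched below; note that the LightweightCNN sketch given earlier will not match the paper's exact figures, since its strides and attention details are assumptions:

```python
# Counting trainable parameters, matching the "Parameters" column in spirit.
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g. count_parameters(LightweightCNN(num_classes=118))
```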
Figure 4. Gait feature visualization of the 20 subjects in Dataset #2: (a) features extracted by the CNN and (b) features extracted by our proposed CNN+CEDS. Dots of different colors represent the extracted gait features of different subjects.
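A Figure-4-style plot can be produced by projecting the extracted gait features to two dimensions and coloring points by subject. The paper does not name its projection method; the sketch below uses t-SNE as one plausible choice:

```python
# Sketch of a Figure-4-style scatter plot: embed high-dimensional gait
# features in 2-D and color by subject ID. t-SNE is our assumption; the
# paper does not state which projection it used.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_features(features, labels):
    # features: (num_samples, feature_dim) array; labels: integer subject IDs
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab20", s=4)
    plt.title("Gait feature embedding (one color per subject)")
    plt.show()
```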