| Literature DB >> 34960562 |
Iram Noreen1, Muhammad Hamid2, Uzma Akram1, Saadia Malik3, Muhammad Saleem4.
Abstract
Recently, several computer applications have offered operation through pointing fingers, waving hands, and body movement instead of mouse, keyboard, audio, or touch input, such as sign language recognition, robot control, games, appliance control, and smart surveillance. With the rise of hand-pose-based applications, new challenges in this domain have also emerged. Support vector machines and neural networks have been used extensively in this domain with conventional RGB data, which do not yield adequate performance. Recently, depth data have become popular because they capture posture attributes better. In this study, a multiple parallel stream 2D CNN (two-dimensional convolutional neural network) model is proposed to recognize hand postures. The proposed model comprises multiple steps and layers to detect hand poses from image maps obtained from depth data. The hyperparameters of the proposed model are tuned through experimental analysis. Three publicly available benchmark datasets, Kaggle, First Person, and Dexter, are used independently to train and test the proposed approach. The accuracy of the proposed method is 99.99%, 99.48%, and 98% using the Kaggle hand posture dataset, First Person hand posture dataset, and Dexter dataset, respectively. Further, the F1 and AUC scores obtained are also near-optimal. Comparative analysis shows that the proposed model outperforms previous state-of-the-art methods.
Keywords: 2D CNN; classification; deep learning; depth data; hand posture; multi stream
Year: 2021 PMID: 34960562 PMCID: PMC8708730 DOI: 10.3390/s21248469
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. A few hand posture samples from the selected hand posture datasets: (left) Kaggle samples [33], (middle) Dexter samples [22], (right) First Person samples [34].
Details of hand posture datasets.
| Title | Total Frames | Training Frames | Testing Frames | Number of Classes | Dimension |
|---|---|---|---|---|---|
| Kaggle [33] | 20,000 | 13,375 | 6625 | 10 | 320 × 240 |
| First Person [34] | 105,469 | 98,842 | 6627 | 45 | 320 × 240 |
| Dexter [22] | 26,000 | 19,519 | 6481 | 7 | 320 × 240 |
Figure 2. Proposed methodology.
Details concerning the network parameters.
| Network Parameters | Values |
|---|---|
| Total parameters | 114,078 |
| Trainable parameters | 113,316 |
| Non-trainable parameters | 762 |
| Learning rate | 0.001 |
| Optimizer | Adam |
| Epochs | 25 |
| Iteration per epoch | 552 |
Network architecture details.
| No | Name | Type | Activations | Learnable | Total Learnable |
|---|---|---|---|---|---|
| 1 | Image input | Image input | 28 × 28 × 1 | - | 0 |
| 2 | Conv_1 | Convolution | 28 × 28 × 3 | Weights 5 × 5 × 1 × 3 | 78 |
| 3 | Conv_3 | Convolution | 28 × 28 × 3 | Weights 5 × 5 × 1 × 3 | 78 |
| 4 | reLu_3 | ReLU | 28 × 28 × 3 | - | 0 |
| 5 | batchnorm_3 | Batch normalization | 28 × 28 × 3 | Offset 1 × 1 × 3 | 6 |
| 6 | maxpool_3 | Max pooling | 28 × 28 × 3 | - | 0 |
| 7 | reLu_1 | ReLU | 28 × 28 × 3 | - | 0 |
| 8 | batchnorm_1 | Batch normalization | 28 × 28 × 3 | Offset 1 × 1 × 3 | 6 |
| 9 | maxpool_1 | Max pooling | 28 × 28 × 3 | - | 0 |
| 10 | Conv_4 | Convolution | 28 × 28 × 3 | Weights 5 × 5 × 1 × 3 | 78 |
| 11 | reLu_4 | ReLU | 28 × 28 × 3 | - | 0 |
| 12 | batchnorm_4 | Batch normalization | 28 × 28 × 3 | Offset 1 × 1 × 3 | 6 |
| 13 | maxpool_4 | Max pooling | 28 × 28 × 3 | - | 0 |
| 14 | Conv_2 | Convolution | 28 × 28 × 3 | Weights 5 × 5 × 1 × 3 | 78 |
| 15 | reLu_2 | ReLU | 28 × 28 × 3 | - | 0 |
| 16 | batchnorm_2 | Batch normalization | 28 × 28 × 3 | Offset 1 × 1 × 3 | 6 |
| 17 | maxpool_2 | Max pooling | 28 × 28 × 3 | - | 0 |
| 18 | Depthcat | Depth concatenation | 28 × 28 × 12 | - | 0 |
| 19 | Fc | Fully connected | 1 × 1 × 12 | Weights 12 × 9408 | 112,908 |
| 20 | SoftMax | SoftMax | 1 × 1 × 12 | - | 0 |
| 21 | Classoutput | Classification output | - | - | 0 |
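As a sanity check, the per-layer learnable counts listed in the architecture table follow from the standard parameter formulas: a convolution layer learns its kernel weights plus one bias per filter, batch normalization learns an offset and a scale per channel, and the fully connected layer learns a weight matrix plus biases over the flattened 28 × 28 × 12 concatenated tensor. A minimal sketch of this arithmetic (the formulas are standard; only the shapes are taken from the table):

```python
# Back-of-the-envelope check of the per-layer learnable counts in the
# architecture table: 5x5 conv with 1 input channel and 3 filters, batch
# norm over 3 channels, and an FC layer mapping 28*28*12 features to 12.
def conv2d_params(kh, kw, cin, cout):
    return kh * kw * cin * cout + cout   # kernel weights + one bias per filter

def batchnorm_params(channels):
    return 2 * channels                  # learnable offset + scale per channel

def dense_params(fan_in, fan_out):
    return fan_in * fan_out + fan_out    # weight matrix + biases

conv = conv2d_params(5, 5, 1, 3)         # 78 per stream, as listed
bn = batchnorm_params(3)                 # 6 per stream, as listed
fc = dense_params(28 * 28 * 12, 12)      # 112,908 for the FC layer, as listed
print(conv, bn, fc)
```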
Experimental results of parameter tuning for the Kaggle hand posture dataset (Adam optimizer; accuracy in %).
| Batch Size | LR = 0.01 | LR = 0.001 | LR = 0.0001 |
|---|---|---|---|
| 25 | 91.57 | 99.60 | 98.78 |
| 30 | 92.27 | 99.99 | 99.32 |
| 60 | 98.51 | 99.58 | 97.74 |
| 90 | 98.43 | 99.62 | 98.99 |
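The tuning table amounts to a grid search over learning rate and batch size under the Adam optimizer. Replaying it over the reported accuracies (a sketch over the table's values, not an actual training run) picks out the best configuration:

```python
# Accuracies (%) from the tuning table, keyed by (learning_rate, batch_size);
# the grid search simply keeps the configuration with the highest accuracy.
results = {
    (0.01, 25): 91.57, (0.001, 25): 99.60, (0.0001, 25): 98.78,
    (0.01, 30): 92.27, (0.001, 30): 99.99, (0.0001, 30): 99.32,
    (0.01, 60): 98.51, (0.001, 60): 99.58, (0.0001, 60): 97.74,
    (0.01, 90): 98.43, (0.001, 90): 99.62, (0.0001, 90): 98.99,
}
best_config = max(results, key=results.get)
print(best_config, results[best_config])  # (0.001, 30) 99.99
```

This matches the settings the paper carries forward: learning rate 0.001 with a batch size of 30.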
Summary of testing results of the proposed methodology with three datasets.
| Dataset | Accuracy | Precision | Recall | F1 Score | AUC | Mean Error |
|---|---|---|---|---|---|---|
| Kaggle [33] | 99.99% | 1 | 0.99 | 1 | 0.9992 | 0.01 |
| Dexter [22] | 99.48% | 1 | 1 | 1 | 1 | 0.52 |
| First Person [34] | 98% | 0.93 | 0.92 | 0.92 | 0.9900 | 2.31 |
Figure 3. Confusion matrices of the proposed model on the (a) Kaggle, (b) Dexter, and (c) First Person datasets.
Figure 4. Macro-average ROC curve and AUC score of the proposed approach on the Kaggle dataset.
Figure 5. Macro-average ROC curve and AUC score of the proposed approach on the Dexter dataset.
Figure 6. Macro-average ROC curve and AUC score of the proposed approach on the First Person dataset.
Figure 7. Plot of validation accuracy and loss.
Comparison of the proposed model with state-of-the-art techniques for all datasets.
| Dataset | Year | Previous Approach | Technique | Accuracy | Proposed Approach Accuracy |
|---|---|---|---|---|---|
| Dexter [22] | 2018 | Sanchez-Riera et al. | CNN | 87% | 99.48% |
| First Person [34] | 2017 | Garcia-Hernando et al. | TF | 80.69% | 98% |
| First Person [34] | 2019 | Tekin et al. | LSTM + MLP | 88.47% | 98% |
| Kaggle [33] | 2021 | Gadekallu et al. | CNN | 100% | 99.99% |