Alberto Tellaeche Iglesias, Ignacio Fidalgo Astorquia, Juan Ignacio Vázquez Gómez, Surajit Saikia.
Abstract
Gestures are one of the main forms of human-machine interaction (HMI) in many fields, from advanced industrial robotics setups to multimedia devices at home. Almost every gesture detection system uses computer vision as its fundamental technology and therefore inherits the well-known problems of image processing: changes in lighting conditions, partial occlusions, and variations in color, among others. Deep learning techniques have proven very effective at addressing these issues. This research proposes a hand gesture recognition system based on convolutional neural networks and color images that is robust against environmental variations, runs in real time on embedded systems, and addresses the image processing problems listed above. A new CNN has been specifically designed with a small architecture, in terms of both number of layers and total number of neurons, for use on computationally limited devices. The system achieves an average success rate of 96.92%, a better score than those obtained by the previous algorithms discussed in the state of the art.
Keywords: deep learning; embedded systems; gesture detection; real time
Year: 2021 PMID: 34960294 PMCID: PMC8705809 DOI: 10.3390/s21248202
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Basic schema of the RCNN detector.
Figure 2. Basic schema of the Fast RCNN detector.
Figure 3. Final schema for the Faster RCNN detector.
Main small CNNs considered for use in the RCNN detector.
| Network Name | Year of Creation | Depth | Total Number of Layers | Size in Memory | Parameters (Millions) |
|---|---|---|---|---|---|
| SqueezeNet | 2016 | 18 | 68 | 4.6 MB | 1.24 |
| GoogleNet | 2015 | 22 | 144 | 27 MB | 7.0 |
| Mobilenetv2 | 2018 | 53 | 155 | 13 MB | 3.5 |
Hand position for each gesture: Agree, Halt, Ok, Run.
Figure 4. Image for the Halt gesture.
Figure 5. Schema of the Darknet network model.
Detail of the convolutional layers in the Darknet CNN.
| Conv. Layer | Filter Size | Num. Filters | Stride | Dilation Factor | Padding |
|---|---|---|---|---|---|
| Conv. Layer 1 | 3 × 3 | 16 | 1, 1 | 1, 1 | same |
| Conv. Layer 2 | 3 × 3 | 32 | 1, 1 | 1, 1 | same |
| Conv. Layer 3 | 3 × 3 | 64 | 1, 1 | 1, 1 | same |
| Conv. Layer 4 | 3 × 3 | 128 | 1, 1 | 1, 1 | same |
| Conv. Layer 5 | 3 × 3 | 256 | 1, 1 | 1, 1 | same |
| Conv. Layer 6 | 3 × 3 | 512 | 1, 1 | 1, 1 | same |
| Conv. Layer 7 | 3 × 3 | 1024 | 1, 1 | 1, 1 | same |
| Conv. Layer 8 | 1 × 1 | 1000 | 1, 1 | 1, 1 | same |
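The table above gives enough detail to estimate how many weights each convolutional layer holds. The following is a minimal sketch, not the authors' code. It assumes an RGB input (3 channels, since the abstract mentions color images), that the eight layers are chained so each layer's input depth equals the previous layer's filter count, and that every filter has one bias term. Parameters of any other layer type (for example batch normalization) are not counted, so the total is not expected to match the figure reported for the full network. The names `CONV_LAYERS`, `conv_params`, and `count_params` are illustrative.

```python
# Layer specs transcribed from the table above: (kernel_h, kernel_w, num_filters).
CONV_LAYERS = [
    (3, 3, 16), (3, 3, 32), (3, 3, 64), (3, 3, 128),
    (3, 3, 256), (3, 3, 512), (3, 3, 1024), (1, 1, 1000),
]


def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Weights (plus optional biases) of one 2-D convolutional layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def count_params(layers, in_channels=3):
    """Per-layer parameter counts, assuming the layers are chained in order.

    in_channels=3 is an assumption (RGB input), not a value from the table.
    """
    counts = []
    channels = in_channels
    for kernel_h, kernel_w, filters in layers:
        counts.append(conv_params(kernel_h, kernel_w, channels, filters))
        channels = filters  # output depth becomes the next layer's input depth
    return counts


per_layer = count_params(CONV_LAYERS)
total = sum(per_layer)

for index, count in enumerate(per_layer, start=1):
    print(f"Conv. Layer {index}: {count:,} parameters")
print(f"Total over the 8 convolutional layers: {total:,}")
```

Under these assumptions most of the weights sit in Conv. Layer 7, which suggests why trimming the deepest layers is the most direct way to shrink such a network.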
Figure 6. Proposed changes in the Darknet architecture (left) to create the new CNN (right).
Characteristics of the CNNs used in the RCNN detectors.
| Network | Year | Layers | Trainable Parameters (Millions) |
|---|---|---|---|
| SqueezeNet | 2016 | 68 | 1.24 |
| GoogleNet | 2015 | 144 | 7.0 |
| Mobilenetv2 | 2018 | 155 | 3.5 |
| Darknet | 2016 | 32 | 8.5 |
| Proposed CNN | 2019 | 24 | 6.3 |
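For embedded targets, the parameter counts above translate roughly into a storage requirement. The sketch below is a back-of-the-envelope estimate, assuming every trainable parameter is stored as a 32-bit float (4 bytes) and that 1 MB means 10^6 bytes; neither assumption comes from the source, and real model files also carry metadata and non-trainable values. The names `PARAMS_MILLIONS` and `estimated_size_mb` are illustrative.

```python
# Trainable parameters in millions, copied from the table above.
PARAMS_MILLIONS = {
    "SqueezeNet": 1.24,
    "GoogleNet": 7.0,
    "Mobilenetv2": 3.5,
    "Darknet": 8.5,
    "Proposed CNN": 6.3,
}

BYTES_PER_PARAM = 4  # assumption: float32 weights


def estimated_size_mb(params_millions, bytes_per_param=BYTES_PER_PARAM):
    """Approximate weight storage in MB (10**6 bytes) for a parameter count."""
    total_bytes = params_millions * 1_000_000 * bytes_per_param
    return total_bytes / 1_000_000


estimates = {name: estimated_size_mb(p) for name, p in PARAMS_MILLIONS.items()}

for name, size in estimates.items():
    print(f"{name}: about {size:.1f} MB of weights")
```

For the three networks whose memory size is listed in the first table (4.6 MB, 27 MB, and 13 MB), this estimate lands in the same range, which suggests the float32 assumption is reasonable but not exact.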
Figure 7. Final layers of the gesture detector using the CNN proposed in this work in a Faster RCNN architecture.
Results obtained with the RCNN detector.
| CNN | CCP Metric | Accuracy | Mean Detection Time (s) |
|---|---|---|---|
| Squeezenet | 98.33% | 99.9% | 3.68 s |
| Googlenet | 100% | 99.58% | 4.57 s |
| Mobilenetv2 | 100% | 99.99% | 8.97 s |
| Proposed CNN | 100% | 99.83% | 3.54 s |
Results obtained with the Fast RCNN detector.
| CNN | CCP Metric | Accuracy | Mean Detection Time (s) |
|---|---|---|---|
| Squeezenet | 100% | 96.11% | 1 s |
| Googlenet | 96.66% | 93.65% | 1.21 s |
| Mobilenetv2 | 98.33% | 95.38% | 1.24 s |
| Proposed CNN | 100% | 98.64% | 0.984 s |
Results obtained with the Faster RCNN detector.
| CNN | CCP Metric | Accuracy | Mean Detection Time (s) |
|---|---|---|---|
| Squeezenet | 100% | 99.83% | 0.16 s |
| Googlenet | 100% | 99.85% | 0.37 s |
| Mobilenetv2 | 98.33% | 99.86% | 0.26 s |
| Proposed CNN | 98.33% | 94.19% | 0.145 s |
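To relate these detection times to the real-time goal, it can help to convert them into detections per second and into a speedup between detector families. The sketch below does this for the proposed CNN only, using the mean detection times from the three results tables. It assumes one detection per image and no pipelining, so the rate is simply the reciprocal of the mean time; the function names are illustrative.

```python
# Mean detection time in seconds for the proposed CNN, copied from the
# RCNN, Fast RCNN, and Faster RCNN results tables.
PROPOSED_CNN_TIME_S = {
    "RCNN": 3.54,
    "Fast RCNN": 0.984,
    "Faster RCNN": 0.145,
}


def detections_per_second(mean_time_s):
    """Throughput, assuming images are processed one at a time."""
    return 1.0 / mean_time_s


def speedup(baseline_time_s, new_time_s):
    """How many times faster the new detector is than the baseline."""
    return baseline_time_s / new_time_s


rates = {name: detections_per_second(t) for name, t in PROPOSED_CNN_TIME_S.items()}
rcnn_to_faster = speedup(
    PROPOSED_CNN_TIME_S["RCNN"], PROPOSED_CNN_TIME_S["Faster RCNN"]
)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} detections per second")
print(f"Faster RCNN vs. RCNN speedup: {rcnn_to_faster:.1f}x")
```

With these figures the proposed CNN in the Faster RCNN configuration handles several images per second, roughly 24 times the rate of the plain RCNN configuration.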
Results of the proposed network architecture using 10-fold cross-validation.
| Fold | CCP Metric | Mean Accuracy | Mean Detection Time (s) |
|---|---|---|---|
| 1 | 97.56% | 99.75% | 0.124 s |
| 2 | 98.45% | 97.34% | 0.147 s |
| 3 | 95.3% | 95% | 0.152 s |
| 4 | 100% | 99.3% | 0.161 s |
| 5 | 96.34% | 92.47% | 0.153 s |
| 6 | 98.5% | 96.15% | 0.139 s |
| 7 | 94.3% | 94.56% | 0.17 s |
| 8 | 97% | 99.4% | 0.142 s |
| 9 | 92.8% | 97.83% | 0.122 s |
| 10 | 99% | 95.8% | 0.151 s |
| Mean | 96.92% | 96.76% | 0.1461 s |
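The "Mean" row can be checked directly from the ten fold rows. The sketch below recomputes the three column means with Python's standard library and also reports the sample standard deviation, which the table does not give but which indicates how much the folds vary. The fold values are copied from the table; the variable names are illustrative.

```python
from statistics import mean, stdev

# (CCP metric %, mean accuracy %, mean detection time s) for folds 1 to 10,
# copied from the table above.
FOLDS = [
    (97.56, 99.75, 0.124),
    (98.45, 97.34, 0.147),
    (95.3, 95.0, 0.152),
    (100.0, 99.3, 0.161),
    (96.34, 92.47, 0.153),
    (98.5, 96.15, 0.139),
    (94.3, 94.56, 0.17),
    (97.0, 99.4, 0.142),
    (92.8, 97.83, 0.122),
    (99.0, 95.8, 0.151),
]

# Transpose the rows into one sequence per column.
ccp, accuracy, time_s = zip(*FOLDS)

mean_ccp = mean(ccp)
mean_accuracy = mean(accuracy)
mean_time_s = mean(time_s)

print(f"CCP metric: mean {mean_ccp:.3f}%, std {stdev(ccp):.2f}")
print(f"Accuracy:   mean {mean_accuracy:.2f}%, std {stdev(accuracy):.2f}")
print(f"Time:       mean {mean_time_s:.4f} s, std {stdev(time_s):.4f}")
```

The recomputed accuracy and time means agree with the table. The CCP mean comes out at 96.925%, which the table reports as 96.92%.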