Diego G Santos, Bruno J T Fernandes, Byron L D Bezerra.
Abstract
The hand is an important part of the body used to express information through gestures, and its movements can be used in dynamic gesture recognition systems based on computer vision with practical applications such as medicine, games and sign language. Although depth sensors have led to great progress in gesture recognition, hand gesture recognition is still an open problem because of its complexity, which stems from the large number of small articulations in a hand. This paper proposes a novel approach for hand gesture recognition with depth maps generated by the Microsoft Kinect Sensor (Microsoft, Redmond, WA, USA), using a variation of the CIPBR (convex invariant position based on RANSAC) algorithm and a hybrid classifier composed of dynamic time warping (DTW) and hidden Markov models (HMM), called the hybrid approach for gesture recognition with depth maps (HAGR-D). The experiments show that the proposed model outperforms other algorithms presented in the literature in hand gesture recognition tasks, achieving a classification rate of 97.49% on the MSRGesture3D dataset and 98.43% on the RPPDI dynamic gesture dataset.
Keywords: CIPBR; DTW; HCI; HMM; dynamic gesture
Year: 2015 PMID: 26569262 PMCID: PMC4701301 DOI: 10.3390/s151128646
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. HAGR-D training architecture.
Notations and definitions used to describe the HAGR-D.
| Symbol | Description |
|---|---|
| C | Hand contour center of mass |
| P | The hand contour highest point |
| | The line segment between points |
| Θ | Maximum circumcircle |
| Ψ | Convex hull points of the hand contour |
| | Contour point of the convex hull Ψ |
| | Point of the Ψ set |
| | Line segment calculated between the points |
| | Set of distances |
| | Intersection between |
| | Distance from the |
| | distance |
| | Set of angles |
| | Angle from the |
| | Angle |
| | Final feature vector returned by depth CIPBR |
| | Set of positions in a search dimension |
| | Position in a search dimension |
| | Set of velocities in a search dimension |
| | Velocity in a search dimension |
| | Best position found by a particle |
| | Best position found by the swarm |
| | Feature vector |
| | Reduced feature vector |
| | Number of gesture candidates returned by DTW at classification time |
| | Cost matrix generated for DTW to compare two patterns |
| | Size of |
| | Size of |
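The notation above mentions a cost matrix that DTW builds to compare two patterns and a number of gesture candidates that DTW returns at classification time. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each gesture is a sequence of equal-length feature vectors, uses Euclidean distance as the local cost, and uses the standard three-neighbor DTW recurrence. The names `dtw_cost`, `dtw_candidates`, and the toy templates are hypothetical.

```python
import math


def dtw_cost(a, b):
    """Return (accumulated cost matrix, total alignment cost) for two sequences.

    Assumption: a and b are lists of feature vectors of equal dimension, and
    the local cost is the Euclidean distance between vectors.
    """
    n, m = len(a), len(b)
    # Extra row and column hold the boundary condition of the recurrence.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query advances, template stays
                cost[i][j - 1],      # template advances, query stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost, cost[n][m]


def dtw_candidates(query, templates, k):
    """Return the k distinct labels whose best template has the lowest DTW cost.

    templates is a list of (label, sequence) pairs (hypothetical layout).
    """
    best = {}
    for label, seq in templates:
        _, total = dtw_cost(query, seq)
        if label not in best or total < best[label]:
            best[label] = total
    return sorted(best, key=best.get)[:k]


# Toy 2-D trajectories standing in for per-frame feature vectors.
templates = [
    ("up", [(0, 0), (0, 1), (0, 2)]),
    ("right", [(0, 0), (1, 0), (2, 0)]),
    ("diag", [(0, 0), (1, 1), (2, 2)]),
]
query = [(0, 0), (0, 0.9), (0, 1.1), (0, 2.1)]
shortlist = dtw_candidates(query, templates, k=2)
print(shortlist)
```

The shortlist keeps one score per label, so a class with many stored templates does not crowd out the others. Whether HAGR-D ranks candidates this way is not stated in this record.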
Figure 2. Depth CIPBR architecture.
Figure 3. (a) The hand segmented using the Microsoft Kinect; (b) the hand posture contour; (c) the center-of-mass point drawn on the hand posture contour (dark gray point); (d) the convex hull points with the maximum circumcircle Θ (red circle), the center-of-mass point (dark gray point), the highest point (red point) and the line segment (green line).
Figure 4. HAGR-D classification architecture.
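The abstract describes the classifier as a hybrid of DTW and HMM, and the notation lists a number of gesture candidates returned by DTW. One plausible reading, offered here only as an assumption, is that an HMM stage rescores a short list of candidate labels. The sketch below shows such a rescoring step using the scaled forward algorithm on a discrete-observation HMM. The discrete observations, the model layout `(start, trans, emit)`, and the names `forward_log_likelihood` and `pick_gesture` are hypothetical and are not taken from the paper.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[s]    : initial probability of state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                emit[s][obs[t]] * sum(alpha[p] * trans[p][s] for p in range(n))
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # the model cannot produce this sequence
        # Rescale to avoid underflow; the log of each scale sums to log P.
        alpha = [a / scale for a in alpha]
        log_like += math.log(scale)
    return log_like


def pick_gesture(obs, candidates, models):
    """Return the candidate label whose HMM gives obs the highest likelihood."""
    return max(
        candidates,
        key=lambda label: forward_log_likelihood(obs, *models[label]),
    )


# Two toy left-to-right models over a two-symbol alphabet (assumed values).
models = {
    "a": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.9, 0.1]]),
    "b": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.1, 0.9]]),
}
print(pick_gesture([0, 0, 0], ["a", "b"], models))
```

Restricting the `max` to the candidate list means only a few HMMs are evaluated per query, which is the usual motivation for a shortlist stage in front of a costlier model.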
Figure 5. Comparison between HAGR-D, depth CIPBR with DTW and depth CIPBR with HMM for hand gesture recognition.
Confusion matrix of the MSRGesture3D database results classified by HAGR-D.
| | Z | J | Milk | Where | Store | Pig | Past | Green | Finish | Bathroom | Hungry | Blue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Z | 100% | - | - | - | - | - | - | - | - | - | - | - |
| J | - | 95% | - | - | - | - | - | - | - | - | - | 5% |
| Milk | - | - | 100% | - | - | - | - | - | - | - | - | - |
| Where | - | 2% | - | 94% | - | - | 4% | - | - | - | - | - |
| Store | - | 5% | - | - | 95% | - | - | - | - | - | - | - |
| Pig | - | - | - | - | - | 100% | - | - | - | - | - | - |
| Past | - | - | - | - | 3% | - | 92% | - | 4% | - | 1% | - |
| Green | - | - | - | - | 7% | - | - | 93% | - | - | - | - |
| Finish | - | - | - | - | - | - | - | - | 100% | - | - | - |
| Bathroom | - | - | - | - | - | - | - | - | - | 100% | - | - |
| Hungry | - | - | - | - | - | - | - | - | - | - | 100% | - |
| Blue | - | - | - | 2% | - | - | - | - | - | - | - | 98% |
Confusion matrix of the MSRGesture3D database results classified by the CIPBR + DTW combination.
| | Z | J | Milk | Where | Store | Pig | Past | Green | Finish | Bathroom | Hungry | Blue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Z | 92% | - | - | 4% | - | - | - | - | 2% | - | - | 2% |
| J | - | 90% | - | - | - | - | 4% | 1% | - | - | - | 5% |
| Milk | - | - | 98% | 2% | - | - | - | - | - | - | - | - |
| Where | - | 4% | - | 91% | - | - | 5% | - | - | - | - | - |
| Store | - | 10% | - | - | 90% | - | - | - | - | - | - | - |
| Pig | - | - | - | - | - | 93% | 3% | - | - | - | - | - |
| Past | - | - | - | - | 3% | - | 89% | - | 7% | 1% | - | - |
| Green | - | - | - | - | 14% | - | - | 82% | 4% | - | - | - |
| Finish | - | - | - | - | - | - | - | - | 100% | - | - | - |
| Bathroom | 2% | - | - | - | 3% | - | 5% | - | - | 90% | - | - |
| Hungry | - | - | - | - | - | - | - | - | - | - | 100% | - |
| Blue | - | - | 2% | 2% | - | - | - | - | - | - | - | 96% |
Figure 6. Examples of sequences of the gestures store and green.
Figure 7. Example of two classes generated by the CIPBR algorithm.
Comparison of results on MSRGesture3D using leave-one-subject-out cross-validation.
| Method | Classification Rate (%) |
|---|---|
| HAGR-D | 97.49 |
| Depth CIPBR + DTW | 91.53 |
| Depth CIPBR + HMM | 88.98 |
| Actionlet | 95.29 |
| HON4D + | 92.45 |
| HON4D | 87.29 |
| ROP, Wang | 88.50 |
| Depth motion maps, Yang | 89.20 |
| Kurakin | 87.70 |
| Klaser | 85.23 |
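The comparison above is reported under leave-one-subject-out cross-validation: each subject's recordings are held out once while the classifier is trained on everyone else. A minimal sketch of that protocol follows. The sample layout `(subject_id, features, label)`, the `fit` and `predict` callables, and the choice to pool correct predictions across folds (rather than average per-fold rates) are assumptions for illustration; the record does not say which aggregation the authors used.

```python
def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) for each subject in turn.

    samples is a list of (subject_id, features, label) tuples (assumed layout).
    """
    subjects = sorted({subject for subject, _, _ in samples})
    for held_out in subjects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def loso_accuracy(samples, fit, predict):
    """Pooled accuracy over all folds: correct predictions / total test samples."""
    correct = 0
    total = 0
    for _, train, test in leave_one_subject_out(samples):
        model = fit(train)
        for _, features, label in test:
            correct += predict(model, features) == label
            total += 1
    return correct / total


# Toy data: three subjects, one scalar feature, and a 1-nearest-neighbor rule.
samples = [
    (1, 0.0, "a"), (1, 1.0, "b"),
    (2, 0.1, "a"), (2, 0.9, "b"),
    (3, 0.2, "a"), (3, 1.1, "b"),
]


def fit(train):
    return train  # 1-NN simply memorizes the training samples


def predict(model, features):
    return min(model, key=lambda s: abs(s[1] - features))[2]


print(loso_accuracy(samples, fit, predict))
```

Splitting by subject rather than by sample keeps a person's gestures out of both sides of a fold, so the score reflects performance on unseen users.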
Figure 8. Example of each gesture performed in the RPPDI dynamic gesture dataset.
Number of sequences captured for each gesture in the RPPDI dynamic gesture dataset.
| Gesture | Number of Sequences |
|---|---|
| Gesture 1 | 24 |
| Gesture 2 | 24 |
| Gesture 3 | 31 |
| Gesture 4 | 18 |
| Gesture 5 | 26 |
| Gesture 6 | 33 |
| Gesture 7 | 32 |
Comparison of results on the RPPDI dynamic gesture dataset.
| Method | Classification Rate (%) |
|---|---|
| HAGR-D | 98.43 |
| Speeded-Up Robust Features (SURF) + HMM | 75.00 |
| Local Contour Sequence (LCS) + HMM | 77.00 |
| Convex SURF (CSURF) + HMM | 91.00 |
| Convex LCS (CLCS) + HMM | 91.00 |
| SURF + DTW | 38.00 |
| LCS + DTW | 78.00 |
| CSURF + DTW | 93.00 |
| CLCS + DTW | 97.00 |
Confusion matrix of the RPPDI dynamic gesture database classified by the HAGR-D system.
| Gesture | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 100% | - | - | - | - | - | - |
| 2 | - | 100% | - | - | - | - | - |
| 3 | - | - | 89% | - | - | 11% | - |
| 4 | - | - | - | 100% | - | - | - |
| 5 | - | - | - | - | 100% | - | - |
| 6 | - | - | - | - | - | 100% | - |
| 7 | - | - | - | - | - | - | 100% |
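The confusion matrices in this record are given as percentages whose rows sum to roughly 100%, which suggests that each row is normalized by its class total. The sketch below shows how such a table can be derived from raw predictions. It assumes rows index the true class and columns the predicted class, which the record does not state explicitly. The function names and the toy labels are hypothetical.

```python
def confusion_counts(true_labels, predicted_labels, classes):
    """Count matrix with rows = true class, columns = predicted class (assumed)."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        counts[index[truth]][index[predicted]] += 1
    return counts


def row_percentages(counts):
    """Normalize each row to percentages of that row's total."""
    table = []
    for row in counts:
        total = sum(row)
        table.append([100.0 * c / total if total else 0.0 for c in row])
    return table


def overall_accuracy(counts):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in counts)
    return sum(counts[i][i] for i in range(len(counts))) / total


# Toy labels, not data from the paper.
truth = ["a", "a", "b", "b", "b", "b"]
predicted = ["a", "b", "b", "b", "b", "a"]
counts = confusion_counts(truth, predicted, ["a", "b"])
print(row_percentages(counts), overall_accuracy(counts))
```

Because rows are normalized separately, the mean of the diagonal percentages equals the overall accuracy only when every class has the same number of test samples.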