Hanguen Kim, Sangwon Lee, Dongsung Lee, Soonmin Choi, Jinsun Ju, Hyun Myung.
Abstract
In this paper, we present human pose estimation and gesture recognition algorithms that use only depth information. The proposed methods are designed to run on a CPU (central processing unit) alone, so that they can be deployed on a low-cost platform such as an embedded board. The human pose estimation method is based on an SVM (support vector machine) and superpixels, and requires no prior knowledge of a human body model. The gesture recognition method recognizes gestures from the estimated pose of the human body. To recognize gestures regardless of motion speed, the proposed method uses keyframe extraction. Gesture recognition is performed by comparing the input keyframes with the keyframes of each registered gesture, and the gesture yielding the smallest comparison error is chosen as the recognized gesture. To avoid reporting a gesture when a person performs one that is not registered, we derive the maximum allowable comparison error for each registered gesture by comparing it with the other registered gestures. We evaluated our method using a dataset that we generated. The experimental results show that our method performs fairly well and is applicable in real environments.
Keywords: depth information; gesture recognition; human pose estimation; low-cost platform
Year: 2015 PMID: 26016921 PMCID: PMC4507703 DOI: 10.3390/s150612410
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
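The matching rule described in the abstract (pick the registered gesture with the smallest comparison error, and reject it if the error exceeds a per-gesture limit) can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' implementation: keyframes are taken to be fixed-length pose feature vectors, sequences are assumed to have equal length, the comparison error is a mean Euclidean distance, and the hypothetical `margin` parameter scales the smallest error between a gesture and any other registered gesture to obtain its limit.

```python
import numpy as np


def sequence_error(a, b):
    """Mean Euclidean distance between two equal-length keyframe sequences.

    Assumed error measure; the abstract does not specify the one used.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean(np.linalg.norm(a - b, axis=-1)))


def max_allowable_errors(registered, margin=0.5):
    """Derive a rejection limit per gesture from the other registered gestures.

    `margin` is a hypothetical tuning parameter: the limit is `margin` times
    the smallest error between this gesture and any other registered gesture.
    """
    limits = {}
    for name, frames in registered.items():
        others = [sequence_error(frames, f)
                  for other, f in registered.items() if other != name]
        limits[name] = margin * min(others)
    return limits


def recognize(keyframes, registered, limits):
    """Return the best-matching gesture name, or None if it is rejected."""
    errors = {name: sequence_error(keyframes, f)
              for name, f in registered.items()}
    best = min(errors, key=errors.get)
    return best if errors[best] <= limits[best] else None


# Toy 2-D "poses" with two keyframes per gesture (illustrative data only).
registered = {
    "a": [[0.0, 0.0], [1.0, 0.0]],
    "b": [[0.0, 0.0], [0.0, 1.0]],
}
limits = max_allowable_errors(registered)
print(recognize([[0.0, 0.0], [0.9, 0.1]], registered, limits))
print(recognize([[5.0, 5.0], [5.0, 5.0]], registered, limits))
```

With these toy values the first input lies close to gesture `a`, while the second is far from both registered gestures and is rejected.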
Figure 1. Flow diagram of the proposed human pose estimation method.
Figure 2. Results of superpixel generation on human bodies. The blue points on the human bodies are the generated superpixels.
Figure 3. An example of: (a) superpixel classification; (b) optimization; (c) pose estimation. Misclassified superpixels removed in the optimization process are indicated by white rectangles.
Figure 4. Example of the measurement update step when the hand occludes the torso. The hand tracker extracts the depth measurements for hand candidates. The final hand position is estimated by the depth measurement with the smallest Mahalanobis distance from the previous hand position.
Figure 5. Flow diagram of the proposed gesture recognition method.
Figure 6. An example of an action sequence.
Figure 7. Action sequence matching.
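The selection step described in the Figure 4 caption (choose the hand candidate with the smallest Mahalanobis distance from the previous hand position) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tracker: the function name, the 3-D position layout, and the covariance matrix passed in are all hypothetical, and the source does not say how the covariance is obtained.

```python
import numpy as np


def select_hand_measurement(candidates, prev_position, covariance):
    """Pick the candidate closest to the previous hand position.

    Closeness is the Mahalanobis distance under `covariance` (assumed to be
    the tracker's position uncertainty). Returns (position, distance).
    """
    candidates = np.asarray(candidates, dtype=float)
    diff = candidates - np.asarray(prev_position, dtype=float)
    cov_inv = np.linalg.inv(np.asarray(covariance, dtype=float))
    # Squared Mahalanobis distance d_i^2 = diff_i^T * cov_inv * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    idx = int(np.argmin(d2))
    return candidates[idx], float(np.sqrt(d2[idx]))


# Illustrative values: larger uncertainty along the depth (z) axis.
prev = [0.0, 0.0, 0.0]
cov = np.diag([1.0, 1.0, 100.0])
cands = [[2.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
position, distance = select_hand_measurement(cands, prev, cov)
print(position, distance)
```

In this toy case the second candidate wins even though it is farther away in Euclidean terms, because the assumed covariance tolerates more motion along the depth axis.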
Average computation times of the proposed methods.
| Time | 65 ms | less than 1 ms | 15 fps |
Experimental results of pose estimation. L, left; R, right.
| Body part | | | | |
|---|---|---|---|---|
| Head | 26.0 | 26.0 | 25.2 | 15.5 |
| Neck | 53.3 | 37.5 | 28.4 | 26.4 |
| Torso | 75.3 | 121.0 | 31.7 | 32.8 |
| L Shoulder | 41.2 | 45.7 | 21.9 | 24.6 |
| L Elbow | 86.2 | 80.1 | 69.2 | 66.4 |
| L Hand | 199.4 | 128.2 | 332.6 | 156.1 |
| R Shoulder | 49.6 | 34.3 | 20.9 | 23.2 |
| R Elbow | 87.7 | 68.9 | 61.9 | 52.2 |
| R Hand | 190.8 | 97.4 | 306.7 | 120.2 |
| L Hip | 44.2 | 34.9 | 29.0 | 27.2 |
| L Knee | 133.0 | 44.3 | 44.3 | 24.6 |
| L Foot | 114.0 | 68.7 | 28.6 | 41.3 |
| R Hip | 58.2 | 43.9 | 23.8 | 29.6 |
| R Knee | 130.0 | 52.6 | 47.0 | 26.2 |
| R Foot | 96.8 | 81.6 | 28.0 | 74.4 |
| Average | 92.3 | 64.3 | 73.3 | 48.8 |
Average initial human pose estimation time (unit: ms).
| Time | 67.0 | 2413.1 |
Computational cost of human pose estimation (unit: GFlops).
| Computational cost for each frame | 0.81 | 1.44 |
Figure 8. Experimental results of pose estimation. (a) Superpixel classification results; (b) pose estimation results.
Experimental results of gesture recognition.
| Gesture | Correct recognitions | False recognitions | Recognition rate | False recognition rate |
|---|---|---|---|---|
| “Request for help” | 47 | 11 | 94.0% | 5.5% |
| “Emergency” | 43 | 4 | 86.0% | 2.0% |
| “Request for emergency supplies” | 49 | 24 | 98.0% | 12.0% |
| “Complete” | 46 | 7 | 92.0% | 3.5% |
| “Suspension of work” | 47 | 1 | 94.0% | 0.5% |
| Total | 232 | 47 | 92.8% | 4.7% |
False gesture recognition results.
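| | “Request for help” | “Emergency” | “Request for emergency supplies” | “Complete” | “Suspension of work” |
|---|---|---|---|---|---|
| “Request for help” | None | 0 | 0 | 1 | 0 |
| “Emergency” | 0 | None | 5 | 0 | 0 |
| “Request for emergency supplies” | 1 | 4 | None | 0 | 0 |
| “Complete” | 0 | 0 | 5 | None | 1 |
| “Suspension of work” | 10 | 0 | 14 | 6 | None |
| Total | 11 | 4 | 24 | 7 | 1 |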
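The percentages and totals in the two gesture tables above can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from the tables; the trial counts are assumptions inferred from the reported figures (47 correct at 94.0% suggests 50 trials per gesture, and 11 false recognitions at 5.5% suggests 200 trials of other gestures), since the record itself does not state them.

```python
# Counts copied from the tables above.
correct = {
    "Request for help": 47,
    "Emergency": 43,
    "Request for emergency supplies": 49,
    "Complete": 46,
    "Suspension of work": 47,
}
# Rows of the false-recognition table; None marks the diagonal entries.
false_matrix = [
    [None, 0, 0, 1, 0],
    [0, None, 5, 0, 0],
    [1, 4, None, 0, 0],
    [0, 0, 5, None, 1],
    [10, 0, 14, 6, None],
]

TRIALS_PER_GESTURE = 50      # assumption: inferred from 47 -> 94.0%
NEGATIVES_PER_GESTURE = 200  # assumption: inferred from 11 -> 5.5%

# Column totals of the false-recognition table, skipping the diagonal.
false_counts = [
    sum(row[j] for row in false_matrix if row[j] is not None)
    for j in range(len(false_matrix))
]

recognition_rates = {
    name: 100.0 * count / TRIALS_PER_GESTURE for name, count in correct.items()
}
false_rates = [100.0 * count / NEGATIVES_PER_GESTURE for count in false_counts]

total_recognition_rate = (
    100.0 * sum(correct.values()) / (TRIALS_PER_GESTURE * len(correct))
)
total_false_rate = (
    100.0 * sum(false_counts) / (NEGATIVES_PER_GESTURE * len(false_counts))
)

print(false_counts)
print(round(total_recognition_rate, 1), round(total_false_rate, 1))
```

Under these assumed trial counts, the column totals reproduce the false-recognition counts in the results table, and the overall rates agree with the reported totals.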
Figure 9. Example experimental results of gesture recognition. (a) The experimental results: “request for help”; (b) the experimental results: “emergency”; (c) the experimental results: “request for emergency supplies”; (d) the experimental results: “complete”; (e) the experimental results: “suspension of work”.