Weiya Chen, Chenchen Yu, Chenyu Tu, Zehua Lyu, Jing Tang, Shiqi Ou, Yan Fu, Zhidong Xue.
Abstract
Real-time sensing and modeling of the human body, especially the hands, is an important research endeavor for applications such as natural human–computer interaction. Hand pose estimation is a major academic and technical challenge due to the complex structure and dexterous movement of human hands. Boosted by advances in both hardware and artificial intelligence, various prototypes of data gloves and computer-vision-based methods have been proposed in recent years for accurate and rapid hand pose estimation. However, existing reviews have focused either on data gloves or on vision methods, or even on a particular type of camera, such as the depth camera. The purpose of this survey is to conduct a comprehensive and timely review of recent research advances in sensor-based hand pose estimation, covering both wearable and vision-based solutions. Hand kinematic models are discussed first. An in-depth review is then conducted of data gloves and of vision-based sensor systems, together with their corresponding modeling methods. In particular, this review also discusses deep-learning-based methods, which are very promising for hand pose estimation. Moreover, the advantages and drawbacks of current hand pose estimation methods, their scope of application, and related challenges are discussed.
Keywords: computer vision; data gloves; deep learning; hand pose estimation; human–computer interaction; wearable devices
Year: 2020 PMID: 32079124 PMCID: PMC7071082 DOI: 10.3390/s20041074
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The hand skeleton seen from the palmar side. Originally published in [17]. DIP: distal interphalangeal; PIP: proximal interphalangeal; MCP: metacarpophalangeal; CMC: carpometacarpal.
Figure 2. Common kinematic models applied for pose estimation. (a) A kinematic hand model with 27 degrees of freedom (DoF) [15]. (b) Another kinematic model with 26 DoF [14].
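A kinematic model such as the 26-DoF one in Figure 2b is typically flattened into a single parameter vector for optimization or regression. The sketch below is illustrative only (the joint layout of 6 global DoF plus 4 DoF per finger is a common convention; the names and structure are not taken from the paper):

```python
# Minimal sketch of a 26-DoF hand pose parameterization:
# 6 global DoF (wrist translation and orientation) plus
# 4 DoF per finger (2 at the MCP joint, 1 each at PIP and DIP).
# All names are illustrative assumptions, not from the survey.

FINGERS = ["thumb", "index", "middle", "ring", "little"]

def make_pose(global_pose, finger_angles):
    """Assemble a flat 26-element pose vector.

    global_pose   -- 6 values: (tx, ty, tz, roll, pitch, yaw)
    finger_angles -- dict mapping finger name to 4 joint angles:
                     (mcp_flexion, mcp_abduction, pip_flexion, dip_flexion)
    """
    assert len(global_pose) == 6
    pose = list(global_pose)
    for finger in FINGERS:
        angles = finger_angles[finger]
        assert len(angles) == 4
        pose.extend(angles)
    return pose

# A rest pose: all parameters zero; 6 + 5 * 4 = 26 DoF in total.
rest = make_pose([0.0] * 6, {f: [0.0] * 4 for f in FINGERS})
print(len(rest))  # 26
```

Both generative and discriminative methods reviewed below ultimately estimate a vector of this kind; the 27-DoF model in Figure 2a simply allocates one additional parameter.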
Figure 3. Data gloves based on bend sensors. (a) Data glove from Zheng et al. [28]. (b) Soft rubber data-collecting glove [29].
Figure 4. Data gloves made with stretch (strain) sensors. (a) A kinesthetic glove composed of five knitted piezoresistive fabric (KPF) goniometers [33]; (b) wearable soft artificial sensing skin made of a hyperelastic elastomer material [36]; (c) data glove made of soft Ecoflex material [37]; (d) wearable glove based on highly stretchable textile–silicone capacitive sensors [38]; (e) glove made of a fully soft composite of a stretchable capacitive silicone sensor array [41]. TA: thumb abduction; MM: middle metacarpal; RP: ring proximal; LM: little metacarpal; LA: little abduction.
Figure 5. Data gloves made with inertial measurement units (IMUs) or magnetic sensors. (a) Keyglove Prototype E [48]. (b) IMUs combined with stretchable materials [44]. (c) Noitom Hi5 VR glove [49].
Comparison of different types of wearable sensors.
| Type | Accuracy | Response time | Lifetime | Cost | Ease of Wearing |
|---|---|---|---|---|---|
| Bend (Flex) sensor | high | medium | medium | medium | medium |
| Stretch (Strain) sensor | medium | slow | short | low | easy |
| IMU | medium | fast | long | low | hard |
| Magnetic sensor | low | fast | long | medium | hard |
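As background to the bend (flex) sensors in the table above: such sensors change electrical resistance as they bend, and a glove typically converts the raw reading to a joint angle via a simple per-user calibration. The sketch below assumes a linear two-point calibration between a flat pose and a fully bent reference pose; the resistance values are made up for illustration:

```python
# Hedged sketch of mapping a bend (flex) sensor reading to a joint
# angle. A two-point calibration records the raw reading at a flat
# (0 degree) and a fully bent (90 degree) reference pose, then
# linearly interpolates. Calibration numbers are illustrative only.

def calibrate(flat_reading, bent_reading, bent_angle=90.0):
    """Return a function mapping raw readings to degrees of flexion."""
    scale = bent_angle / (bent_reading - flat_reading)
    return lambda reading: (reading - flat_reading) * scale

# Example: resistance (in ohms) rises from 10 kOhm flat to 25 kOhm bent.
to_degrees = calibrate(flat_reading=10_000, bent_reading=25_000)
print(to_degrees(17_500))  # halfway reading -> about 45 degrees
```

Real flex sensors are only approximately linear, which is one reason the table rates their accuracy "high" but their response time only "medium": practical systems often add per-joint nonlinear calibration and filtering.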
Figure 6. The workflow of generative methods for hand pose estimation.
Figure 7. Hand models made of geometric primitives. (a) A hand model consisting of color-coded geometric primitives (yellow: elliptic cylinders; red: ellipsoids; green: spheres; blue: cones) [64]. (b) The hand's collision model, consisting of 25 spheres [64]. (c) A hand model using 48 spheres [19].
Figure 8. A hand model composed of meshes. (a) The skeleton. (b) The deformed triangulated hand surface [55].
Summary of generative methods for hand pose estimation with RGB input.
| Literature | Features | Hand Model | DoF | Parameters | Optimization Method | FPS |
|---|---|---|---|---|---|---|
| Oikonomidis et al. [ | Skin and edge | GCM 1 | 26 | 27 | PSO | - |
| Oikonomidis et al. [ | Skin and edge | GCM | 26 | 27 | PSO | - |
| Gorce et al. [ | Surface texture and illuminant | DPMM 2 | 22 | - | Quasi-Newton method | 40 |
| Ballan et al. [ | Skin, edges, optical flow, and collisions | DPMM | 35 | - | Levenberg–Marquardt | 50 |
1 GCM: Generalized Cylindrical Model. 2 DPMM: Deformable Polygonal Mesh Model.
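The generative pipeline of Figure 6, with the particle swarm optimization (PSO) listed in the table above, can be sketched as: hypothesize pose parameters, "render" the hand model, score the hypothesis against the observation, and let the optimizer refine it. The renderer and objective below are toy stand-ins on a 2-parameter pose, not the methods of the cited papers:

```python
# Toy sketch of the generative loop: a bare-bones particle swarm
# optimizer (PSO, as used by Oikonomidis et al.) searches pose space
# to minimize a discrepancy between the rendered model and the
# observation. Renderer, objective, and TARGET are stand-ins.
import random

TARGET = [0.7, -0.3]  # unknown "true" pose parameters to recover

def render(pose):
    # Stand-in for rendering a hand model from pose parameters.
    return pose

def discrepancy(pose, observation):
    # Stand-in for the image-space error (e.g. skin/edge/depth terms).
    return sum((r - o) ** 2 for r, o in zip(render(pose), observation))

def pso(observation, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    dim = len(observation)
    pos = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best = [p[:] for p in pos]  # per-particle best positions
    gbest = min(best, key=lambda p: discrepancy(p, observation))[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Standard PSO velocity update: inertia + cognitive + social.
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (best[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if discrepancy(pos[i], observation) < discrepancy(best[i], observation):
                best[i] = pos[i][:]
                if discrepancy(best[i], observation) < discrepancy(gbest, observation):
                    gbest = best[i][:]
    return gbest

random.seed(0)
estimate = pso(TARGET)
print(estimate)  # converges near TARGET
```

In the actual systems, the discrepancy combines the feature terms listed in the tables (skin color, edges, depth, collisions), and the search runs over the full 26+ DoF pose vector, which is why reported frame rates vary so widely.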
Summary of generative methods for hand pose estimation with RGB and depth inputs.
| Literature | Features | Hand Model | DoF | Parameters | Optimization Method | FPS |
|---|---|---|---|---|---|---|
| Oikonomidis et al. [ | Skin and depth | GCM | 26 | 27 | PSO | 15 |
| Oikonomidis et al. [ | Skin and depth | GCM | 26 | 27 | PSO | 4 |
| Qian et al. [ | Depth | GCM | 26 | 26 | ICP–PSO | 25 |
| Sridhar et al. [ | Skin and depth | DPMM | 26 | - | Gradient ascent | 10 |
| Tzionas et al. [ | Skin and depth | DPMM | 37 | - | Self-built method | 60 |
Figure 9. A search process based on a binary latent tree model (LTM) [75] for skeletal joint position estimation.
Figure 10. Global regression estimates the wrist parameters, while local regression estimates the parameters of the five fingers [20].
Figure 11. The augmented skeleton space model [83]. HPE: hand pose estimator; HPG: hand pose generator; HPD: hand pose discriminator.
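Unlike the generative methods above, the discriminative methods summarized in the following tables learn a direct mapping from image features to joint positions offline, then apply it per frame. The toy nearest-neighbour regressor below stands in for the random forests and CNNs in the tables; the feature vectors and pose labels are synthetic illustrations:

```python
# Minimal sketch of the discriminative idea: learn (here, memorize)
# a direct mapping from image features to a pose, with no online
# model fitting. A 1-nearest-neighbour lookup stands in for the
# random-forest and CNN regressors surveyed; data are synthetic.

def nn_regress(train_features, train_poses, query):
    """Return the pose whose training feature is closest to the query."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best_i = min(range(len(train_features)),
                 key=lambda i: dist2(train_features[i], query))
    return train_poses[best_i]

# Synthetic "training set": feature vectors paired with pose labels.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
poses = ["open hand", "fist", "pinch"]

print(nn_regress(features, poses, [0.9, 0.1]))  # fist
```

Because inference is a single forward pass (or tree traversal) rather than an iterative search, discriminative methods reach the high frame rates reported in the tables, at the cost of depending heavily on the coverage of their training datasets.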
Summary of discriminative methods for hand pose estimation with RGB and depth inputs.
| Literature | Datasets | Method | FPS |
|---|---|---|---|
| Keskin et al. [ | Self-built dataset | RF: Random decision forests | - |
| Tang et al. [ | Self-built dataset | RF: STR | 25 |
| Liang et al. [ | Self-built dataset | RF: SMRF | - |
| Tang et al. [ | Self-built dataset | RF: LRF | 62.5 |
| Sun et al. [ | Self-built dataset | RF: Cascaded regression | 300 |
| Wan et al. [ | - | RF: FCRF | 29.4 |
| Tompson et al. [ | Self-built dataset | RDF + CNN | 24.9 |
| Sinha et al. [ | Dexter1 [ | CNN: DeepHand | 32 |
| Oberweger et al. [ | NYU, ICVL | CNN: Deep-Prior | 500 |
| Ge et al. [ | MSRA, NYU | CNN: Multi-View CNNs | 82 |
| Che et al. [ | NYU, ICVL | CNN: HHLN and WR-OCNN | - |
| Ge et al. [ | NYU, ICVL, MSRA | CNN | 41.8 |
| Ge et al. [ | NYU, ICVL, MSRA | CNN | 48 |
| Dou et al. [ | NYU, MSRA | CNN | 70 |
| Li and Lee [ | NYU, Hands 2017 Challenge dataset [ | CNN | - |
| Deng et al. [ | NYU, ICVL | CNN: Hand3d | 30 |
| Ge et al. [ | MSRA, NYU | CNN | 215 |
| Moon et al. [ | ICVL, MSRA, NYU, HANDS2017 [ | CNN: 3D CNN | 35 |
| Ge et al. [ | MSRA, NYU, ICVL | CNN: 3D CNN | 91 |
Summary of discriminative methods for hand pose estimation with RGB input.
| Literature | Datasets | Method | FPS |
|---|---|---|---|
| Zimmermann and Brox [ | Stereo hand pose (STB) [ | CNN: HandSegNet, PoseNet | - |
| Iqbal et al. [ | Dexter [ | CNN | 150 |
| Rad et al. [ | LINEMOD [ | CNN, FCN | 116 |
| Cai et al. [ | STB, RHD | CNN | - |
| Ge et al. [ | STB, RHD | Graph CNN | 50 |
Commonly used public datasets for vision-based hand pose estimation.
| Dataset | Image Type | Number of Images | Camera | Number of Annotated Joints | Description |
|---|---|---|---|---|---|
| ICVL [ | D | 331,000 | Intel Creative Gesture Camera | 16 | Real hand and manual labeling |
| NYU [ | RGB-D | 81,009 | Prime Sense Carmine 1.09 | 36 | Real hand and automatic labeling |
| BigHand 2.2M [ | D | 2.2M | Intel RealSense SR300 | 21 | Real hand and automatic labeling |
| HandNet [ | D | 12,773 | Intel RealSense Camera | Fingertip and palm coordinates | Real hand and automatic labeling |
| MSRC [ | D | 102,000 | - | 22 | Synthetic data |
| MSHD [ | D | 101,000 | Kinect2 | - | Synthetic data |
| MSRA14 [ | D | 2400 | - | 21 | Real hand and manual labeling |
| MSRA15 [ | D | 76,500 | Intel’s Creative Interactive Camera | 21 | Real hand and semi-automatic labeling |
| OpenPose hand dataset [ | RGB | 16,000 | - | 21 | Manual labeling from MPII [ |
| Stereo hand pose (STB) [ | RGB | 18,000 frame pairs | Point Grey Bumblebee2 Stereo Camera | 21 | Real-world stereo image pairs with two subsets: STB–BB and STB–SK |
| Rendered hand (RHD) [ | RGB-D | 43,986 | - | 21 | Synthetic dataset with 20 different characters performing 39 actions in different settings |