Abstract
Hand gesture recognition is an important area of research in computer vision, with diverse applications in the human–computer interaction (HCI) community. Major applications of gesture recognition span domains such as sign language, medical assistance, and virtual/augmented reality. The initial task of a hand gesture-based HCI system is to acquire raw data, which can be accomplished primarily through two approaches: sensor based and vision based. The sensor-based approach requires instruments or sensors to be physically attached to the arm/hand of the user to extract information, whereas vision-based schemes acquire images or videos of the hand gestures through a still/video camera. Here, we mainly discuss vision-based hand gesture recognition, with a brief introduction to sensor-based data acquisition techniques. This paper surveys the primary methodologies in vision-based hand gesture recognition for HCI. Major topics include the different types of gestures, gesture acquisition systems, the major problems of gesture recognition systems, and the steps of gesture recognition: acquisition, detection and pre-processing, representation and feature extraction, and recognition. We provide an elaborated list of databases and also discuss recent advances and applications of hand gesture-based systems. A detailed discussion is given on feature extraction and the major classifiers in current use, including deep learning techniques. Special attention is paid to classifying the schemes/approaches at the various stages of a gesture recognition system, for a better understanding of the topic and to facilitate further research in this area.
Keywords: Deep learning methods; Human–computer interaction (HCI); Static and dynamic gestures; Vision-based gesture recognition (VGR)
Year: 2021 PMID: 34485925 PMCID: PMC8403257 DOI: 10.1007/s42979-021-00827-x
Source DB: PubMed Journal: SN Comput Sci ISSN: 2661-8907
Fig. 1Overview of human–computer interaction [73]
Fig. 2Classification of different gestures based on used body-part
Fig. 3Human–computer interaction using: a CyberGlove-II (picture courtesy: https://www.cyberglovesystems.com/products/cyberglove-II/photos-video), b vision-based system
Fig. 4General taxonomy of HCI system based on input channels
Fig. 5Effect of illumination variations on perceived skin color: a skin color in low and high illumination conditions, b 2D color histogram in YCbCr space, and c 2D color histogram in CIE-Lab space
Fig. 6Effect of complex background on skin color segmentation: a original images, b segmentation results, and c ground truth
Fig. 7Effect of camouflage on skin color segmentation (left column: original image, right column: segmented image): a African, b Asian, and c Caucasian
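The illumination and camouflage effects shown in Figs. 5–7 are usually confronted at the detection stage by thresholding in a color space that separates luminance from chrominance. As a minimal sketch (the fixed Cb/Cr ranges below are commonly quoted rule-of-thumb values from the skin-detection literature, not thresholds taken from this paper), skin masking in YCbCr can be written as:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) RGB array (0-255 range) to YCbCr (ITU-R BT.601)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of likely skin pixels via fixed Cb/Cr thresholds.

    Illumination mostly affects the Y channel, so thresholding only the
    chrominance channels gives some robustness to brightness changes.
    """
    ycbcr = rgb_to_ycbcr(np.asarray(rgb, dtype=np.float64))
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```

Fixed thresholds fail under exactly the conditions the figures illustrate (strong illumination shifts and skin-colored, camouflaging backgrounds), which is why adaptive and model-based segmentation schemes are preferred in practice.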
Fig. 8Different hand poses and their side views
Fig. 9Multiple camera-based gesture recognition
Fig. 10Skeletal hand model: a hand anatomy [48], b the kinematic model [123]
Fig. 11a Movement epenthesis problem [18], b gesture co-articulation (marked with a red line) [202], and c sub-gesture problem (here gesture ‘5’ is a sub-gesture of gesture ‘8’) [7]
Fig. 12The basic architecture of a typical gesture recognition system
Fig. 13Different skin segmentation techniques
Fig. 14Different hand models for hand gesture representation
Major features used in gesture recognition
| Feature type | Examples | Static | Dynamic | Advantages | Limitations |
|---|---|---|---|---|---|
| Spatial domain (2D) | Fingertip locations, finger direction, and silhouette | ✓ | ✓ | • Easy to extract • Rotation invariant | • Unreliable under occlusion or varying illumination • Object-view dependent |
| | Motion chain code (MCC) | | ✓ | | • A distorted hand trajectory distorts the MCC as well |
| Spatial domain (3D) | Joint angles, hand location, surface texture, and surface illumination | ✓ | ✓ | • 3D modeling can most accurately represent the state of a hand, and can thus give higher recognition accuracy | • Difficult to accurately estimate the 3D shape of a hand |
| Transform domain | Fourier descriptors | ✓ | ✓ | • RST (rotation, scale, translation) invariant | • Not able to perfectly distinguish some different gestures |
| Moments | Geometric moments, orthogonal moments | ✓ | ✓ | • Moments can be used to derive RST-invariant global features | • Moments are in general global features, so they cannot effectively represent an occluded hand |
| Curve-fitting based | Curvature scale space | ✓ | | • RST invariant • Resistant to noise | • Sensitive to distortion of the boundary |
| Histogram based | Histogram of oriented gradients (HoG) | ✓ | ✓ | • Invariant to geometry and illumination changes | • Performance is not satisfactory for images with a complex background and noise |
| Interest-point based | Scale-invariant feature transform (SIFT) | ✓ | ✓ | • RST and illumination invariant | • Computationally expensive, hence not the best choice for real-time applications |
| Mixture of features | Combined features | ✓ | ✓ | • Incorporates the advantages of different feature types | • Classification performance may degrade due to the curse of dimensionality |
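Among the moment features listed above, Hu's invariants are the classic example of RST-invariant global descriptors derived from geometric moments. A minimal sketch of the first two Hu moments of a binary hand silhouette, in plain NumPy (illustrative only, not code from any of the surveyed systems):

```python
import numpy as np

def hu_moments(mask):
    """First two Hu invariant moments of a binary silhouette.

    Invariant to translation and rotation of the region and, up to
    discretization error, to scale.
    """
    ys, xs = np.nonzero(mask)          # pixel coordinates of the region
    m00 = len(xs)                      # area (zeroth-order moment)
    xbar, ybar = xs.mean(), ys.mean()  # centroid

    def mu(p, q):                      # central moment mu_pq
        return ((xs - xbar) ** p * (ys - ybar) ** q).sum()

    def eta(p, q):                     # scale-normalized central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

Being computed over the whole silhouette, these values illustrate the table's caveat: a partially occluded hand changes every moment, so global descriptors degrade where local (interest-point) features do not.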
Fig. 15Conventional dynamic gesture recognition techniques
Fig. 16a An HMM, b a directed conditional model (MEMM), and c a conditional random field, which accommodates arbitrarily overlapping features and long-term dependencies of the observation sequence [203]
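The HMM of Fig. 16a is typically applied to dynamic gestures by training one model per gesture class and labeling a test sequence with the class whose model yields the highest likelihood. A minimal sketch of the scaled forward algorithm for discrete observation symbols (illustrative; in practice the parameters would come from Baum–Welch training on quantized trajectory features):

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs : sequence of observation-symbol indices
    pi  : (N,) initial state probabilities
    A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]          # initialization step
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()        # rescale to avoid underflow
    for o in obs[1:]:                  # induction over the sequence
        alpha = (alpha @ A) * B[:, o]
        log_lik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_lik

def classify(obs, models):
    """Pick the gesture class whose HMM assigns the highest likelihood.

    models: dict mapping class label -> (pi, A, B) tuple.
    """
    return max(models, key=lambda k: forward_log_likelihood(obs, *models[k]))
```

The per-class-likelihood scheme is what makes continuous recognition hard: movement epenthesis and sub-gesture overlaps (Fig. 11) mean segment boundaries are not known in advance, motivating the conditional models of Fig. 16b, c.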
Summary of hand gesture databases with brief description
| Sl. No. | Dataset | Contents | Description |
|---|---|---|---|
| 1 | NUS hand posture dataset-I, 2010 | 10 classes, 1 subject, 240 samples | Both color and grayscale |
| 2 | NUS hand posture dataset-II, 2012 | 10 classes, 40 subjects, 2750 samples | Complex natural backgrounds |
| 3 | UNIGEhands dataset, 2015 | 37.21 and 37.63 minutes of positive and negative video sequences | Egocentric videos in 5 natural locations (office, bar, kitchen, bench, street) |
| 4 | OUHANDS hand gesture dataset, 2016 | 2150 training and 1000 test images | Different backgrounds; contains body gestures; collected with Intel RealSense |
| 5 | Cambridge hand gesture dataset, 2007 | 9 classes, 2 subjects, 900 image sequences | Different illumination conditions |
| 6 | Gesture dataset by Shen et al., 2012 | 10 classes, 15 subjects, 1050 samples | Different poses of thumb, fist, all fingers extended |
| 7 | Sebastien Marcel hand posture and gesture datasets, 2001 | Three static datasets with 10 (grayscale), 12 (color), and 6 (grayscale) classes; one dynamic dataset with 4 classes | Both simple and complex backgrounds |
| 8 | Aalborg Video Database, 2004 | 9 static and 4 dynamic classes | Hand gestures over a wooden table |
| 9 | Sebastien Marcel interact play database, 2004 | 16 classes, 22 subjects, 50 samples/subject | Single- and both-hand dataset |
| 10 | Gesture dataset by Yoon et al., 2001 | 48 classes, 20 subjects, 9600 samples | Alphabetical gestures containing sequences of xy coordinates |
| 11 | Keck gesture dataset, 2009 | | Military signals with the training set in a simple background and the test set in a complex background |
| 12 | Massey gesture dataset, 2005 | 6 classes, 5 subjects, about 1500 frames | Image frames of gestures under different illumination |
| 13 | IDIAP two-handed gesture dataset, 2005 | 7 classes, 7 subjects | Special color glove to differentiate between the right and left hand |
| 14 | FABO gesture dataset, 2006 | 21 classes divided into two sets | Face and body gesture dataset with a fixed background |
| 15 | IBGHT dataset, 2015 | 36 classes, 60 video sequences | 0–9 numeric and A–Z alphabetic color dataset |
| 16 | 10 Palm Graffiti Digits dataset, 2009 | 10 classes, 30 examples per class | 0–9 digits in a continuous stream; colored glove in the training set; both easy and hard test sets |
| 17 | NITS hand gesture dataset, 2015 | 40 classes, 20 subjects, divided into 7 sets | Gestures collected in a lab environment with a colored fingertip |
| 18 | The 20BN-jester dataset, 2019 | 148,092 videos in total: 118,562 for training, 14,787 for validation, and 14,743 for testing | Densely labeled video clips of humans performing predefined hand gestures in front of a laptop camera or webcam |
| 19 | NTU posture dataset by Ren et al., 2011 | 10 classes, 10 subjects, 1000 samples | Color as well as depth maps; cluttered backgrounds; recorded with Kinect |
| 20 | ColorTip dataset, 2013 | 7 subjects, 9 classes, 7 training sequences of between 600 and 2000 depth frames | Fingertips covered with a colored glove for automatic annotation |
| 21 | NYU hand pose dataset, 2014 | 72,757 and 8252 frames in the training and test sets | 2 users; data from 3 Kinects (frontal and 2 side views) |
| 22 | General-HANDS dataset, 2014 | 22 sequences | Different viewpoints, scales, poses, and occlusions |
| 23 | VPU Hand Gesture dataset (HGds), 2008 | 12 classes, 11 subjects | One static-pose video per gesture (252 grayscale frames); collected with a time-of-flight camera |
| 24 | ChaLearn gesture data, 2011 | 62,000 samples | Hand gestures including body gestures; recorded with Kinect |
| 25 | MSRC-12 Kinect gesture dataset, 2012 | 12 classes, 30 subjects, 6244 samples | Human movement including body gestures; recorded with Kinect |
| 26 | ChaLearn multi-modal gesture dataset, 2013 | 20 classes, 27 subjects, 13,858 samples | Includes body gestures |
| 27 | NATOPS aircraft handling signals database, 2011 | 24 classes, 20 subjects, 9600 samples | Includes body gestures |
| 28 | ChAirGest multi-modal dataset, 2013 | 10 classes, 10 subjects, 1200 samples | Recorded with Kinect and inertial motion units |
| 29 | Sheffield KInect Gesture (SKIG) dataset, 2013 | 10 classes, 6 subjects, 2160 samples | Two illumination conditions; recorded with Kinect and RGB cameras |
| 30 | Full Body Gesture (FBG) database, 2006 | 14 normal gestures of daily life, 10 abnormal gesture classes, 20 subjects | Full-body 3D dataset |
| 31 | 10 3D digit dataset by Berman et al., 2013 | 10 classes, 8 subjects | 0–9 in a continuous stream; collected with a PrimeSense 3D camera |
| 32 | 6D Motion Gesture (6DMG) dataset, 2012 | 10 digit classes; 26 upper- and lower-case alphabet classes each | Recorded with a Wii device with trajectories in space; also includes some body gestures |
| 33 | Hand gesture datasets, University of Padova, 2014 | 10 ASL classes, 14 subjects | Collected with both a Leap Motion controller and Kinect; the first dataset of its kind collected with both |
| 34 | Hand gesture datasets, University of Padova, 2015 | Several static gestures | Collected with a Senz3D device |
| 35 | Hand gesture datasets, University of Polytechnique, Madrid, 2015 | 10 classes, divided into 2 sets with 5 gestures each | Collected with a Senz3D device |
| 36 | SP-EMD dataset, 2015 | 10 gestures with 20 different poses, 5 subjects | Two different illumination conditions; collected with Kinect |
| 37 | DHG-14/28, 2016 | 14 classes, 20 subjects | Gestures collected with Kinect in two ways: using one finger and using the whole hand |
| 38 | DVS128 gesture dataset, 2017 | 11 classes, 29 subjects | 3 illumination conditions; collected with DVS128 |
| 39 | BigHand2.2M hand posture dataset, 2017 | 2.2 million depth maps | Collected with Intel RealSense; includes some egocentric images |
| 40 | EgoGesture dataset, 2017 | 83 classes, 50 subjects, 6 scenes, 24,161 RGB-D video samples | First-person-view gestures; collected with an Intel RealSense SR300 |
| 41 | VIVA dataset, 2014 | 19 classes, 8 subjects, 885 RGB-D video samples | Driver hand gestures in a single scene; collected with a Microsoft Kinect |
| 42 | NVIDIA Gesture (nvGesture) dataset, 2016 | 25 classes, 20 subjects, 1532 RGB-D video samples | Driver hand gestures collected with a SoftKinetic DS325 and a top-mounted DUO 3D sensor recording a pair of stereo-IR streams |
| 43 | Dataset by Kawulok et al., 2014 | 32 classes, 18 subjects | Gestures from Polish Sign Language and American Sign Language (ASL) |
| 44 | ASL Finger Spelling Dataset, 2011 | 24 classes, 9 subjects, 65,000 samples | Alphabet depth dataset |
| 45 | Massey 2D Static ASL dataset, 2011 | 2425 gestures, 5 subjects | Color ASL dataset |
| 46 | Purdue RVL-SLLL ASL Database, 2006 | Different ASL gestures by 14 subjects | Alphanumeric dataset |
| 47 | RWTH-BOSTON-104 Database, 2007 | 104 signs, 201 videos, about 15,000 image frames | Grayscale ASL dataset |
| 48 | RWTH-BOSTON-400, 2008 | 406 signs; extends the 2007 dataset | Color ASL dataset |
| 49 | MSR/MSRA Gesture 3D dataset, 2011 | 12 ASL gesture classes, 10 subjects | Hand-tracking ASL dataset; some are daily gestures |
| 50 | Kaggle Sign Language dataset, 2017 | 24 classes A–Z excluding J and Z, and 10 classes of digits 0–9; mimics EMNIST | ASL image dataset |
Publicly available hand gesture databases with their sources
| Sl. No. | Dataset | Static (S) and/or Dynamic (D) | Source |
|---|---|---|---|
| 1 | NUS hand posture dataset-I, 2010 | S | |
| 2 | NUS hand posture dataset-II, 2012 | S | |
| 3 | UNIGEhands dataset, 2015 | S | |
| 4 | OUHANDS hand gesture dataset, 2016 | S | |
| 5 | Cambridge hand gesture dataset, 2007 | S and D | |
| 6 | Gesture dataset by Shen et al., 2012 | S and D | |
| 7 | Sebastien Marcel hand posture and gesture datasets, 2001 | S and D | |
| 8 | Aalborg Video Database, 2004 | S and D | |
| 9 | Sebastien Marcel interact play database, 2004 | D | |
| 10 | Gesture dataset by Yoon et al., 2001 | D | Available on e-mail request to yoonhs@etri.re.kr |
| 11 | Keck gesture dataset, 2009 | D | |
| 12 | Massey gesture dataset, 2005 | D | |
| 13 | IDIAP two-handed gesture dataset, 2005 | D | |
| 14 | FABO gesture dataset, 2006 | D | |
| 15 | IBGHT dataset, 2015 | D | |
| 16 | 10 Palm Graffiti Digits dataset, 2009 | D | |
| 17 | NITS dataset, 2015 | D | |
| 18 | The 20BN-jester dataset, 2019 | D | |
| 19 | NTU posture dataset by Ren et al., 2011 | S | |
| 20 | ColorTip dataset, 2013 | S | |
| 21 | NYU hand pose dataset, 2014 | S | |
| 22 | General-HANDS dataset, 2014 | S | |
| 23 | VPU Hand Gesture dataset (HGds), 2008 | S | |
| 24 | ChaLearn gesture data, 2011 | D | |
| 25 | MSRC-12 Kinect gesture dataset, 2012 | D | |
| 26 | ChaLearn multi-modal gesture dataset, 2013 | D | |
| 27 | NATOPS aircraft handling signals database, 2011 | D | |
| 28 | ChAirGest multi-modal dataset, 2013 | D | |
| 29 | Sheffield KInect Gesture (SKIG) dataset, 2013 | D | |
| 30 | Full Body Gesture (FBG) database, 2006 | D | |
| 31 | 10 3D digit dataset by Berman et al., 2013 | D | Available on e-mail request to sigalbe@bgu.ac.il |
| 32 | 6D Motion Gesture (6DMG) dataset, 2012 | D | |
| 33 | Hand gesture datasets, University of Padova, 2014 | S | |
| 34 | Hand gesture datasets, University of Padova, 2015 | D | |
| 35 | Hand gesture datasets, University of Polytechnique, Madrid, 2015 | S and D | |
| 36 | SP-EMD dataset, 2015 | D | |
| 37 | DHG-14/28, 2016 | D | |
| 38 | DVS128 gesture dataset, 2017 | D | |
| 39 | BigHand2.2M hand posture dataset, 2017 | S | Available on e-mail request to hands.iccv17@outlook.com |
| 40 | EgoGesture dataset, 2017 | D | |
| 41 | VIVA dataset, 2014 | D | |
| 42 | NVIDIA Gesture (nvGesture) dataset, 2016 | D | |
| 43 | Dataset by Kawulok et al., 2014 | S | |
| 44 | ASL Finger Spelling Dataset, 2011 | S | |
| 45 | Massey 2D Static ASL dataset, 2011 | S | |
| 46 | Purdue RVL-SLLL ASL Database, 2006 | D | Available on e-mail request to wilbur@purdue.edu |
| 47 | RWTH-BOSTON-104 Database, 2007 | D | |
| 48 | RWTH-BOSTON-400, 2008 | D | |
| 49 | MSR/MSRA Gesture 3D dataset, 2011 | D | |
| 50 | Kaggle Sign Language dataset, 2017 | S | |
Fig. 17Applications of hand gesture recognition systems: a virtual reality, b gesture-based interaction with robots (Picture courtesy http://www.robots-dreams.com/pc-based-robosapien-control-project), c desktop computing application, d virtual computer games using gesture, e sign language recognition, f vehicle control (picture courtesy: http://www.automotiveworld.com/news-releases/3D-gesture-recognition-virtual-touch-screen-bring-new-meaning-vehicle-controls/), g gesture controlled robotic surgery (Pic. courtesy: http://www.purdueexponent.org/campus/collection_daa8e8c2-3e15-11e0-bb90-0017a4a78c22.html) and h television and desktop controlling
Fig. 18Use of GPU in gesture recognition (courtesy: http://community.arm.com/groups/arm-mali-graphics/blog/2013/10/06/improved-gesture-detection-with-mali-gpu-compute)