Wenjuan Gong, Xuena Zhang, Jordi Gonzàlez, Andrews Sobral, Thierry Bouwmans, Changhe Tu, El-Hadi Zahzah.
Abstract
Human pose estimation is the task of locating body parts in an image and determining how they are connected. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but each focuses on a particular category, such as model-based approaches or human motion analysis. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advances in deep learning have produced novel algorithms for this problem. This paper presents a comprehensive survey of human pose estimation from monocular images, covering both milestone works and recent advances. Following a standard pipeline for solving computer vision problems, the survey splits the problem into several modules: feature extraction and description, human body models, and modeling methods. Modeling methods are categorized in two ways: as top-down versus bottom-up methods, and as generative versus discriminative methods. Because one direct application of human pose estimation is to provide initialization for automatic video surveillance, each module includes an additional section on motion: motion features, motion models, and motion-based methods. Finally, the paper collects 26 publicly available data sets for validation and describes frequently used error measurement methods.
Keywords: bottom-up methods; discriminative methods; generative methods; human body models; human pose estimation; top-down methods
Year: 2016 PMID: 27898003 PMCID: PMC5190962 DOI: 10.3390/s16121966
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The Composition of the Review. The survey considers three processing units and dedicates one section to each. After these three processing units, human poses can be estimated from images. Each directed flow chart denotes the composition of specific types of methods. Rectangle units are motion-based components.
A complete overview of human pose estimation from monocular images.
| Components | Categories | Sub-Categories | |
|---|---|---|---|
| Randomized trees | |||
| Boosting pose estimation accuracy iteratively | |||
| Motion model | |||
Figure 2. Edge Filter and Extracted Edge Feature Examples. (a) Haar Filters as Edge Filters; (b) Edge Features in [37].
Figure 3. Examples of Silhouette Extraction in [227]. (a) The background image; (b) An original image; (c) The extracted silhouette from (b).
Figure 4. Contour Features from [228]. (a) An original image; (b) Extracted contours.
Figure 5. Shape context examples. (a) Log-polar coordinates in shape context; (b) Shape Context Encoding.
Figure 6. Two widely used feature extractors and descriptors. (a) Scale Invariant Feature Transform (SIFT); (b) Histogram of Oriented Gradients (HOG) templates [229].
Figure 7. Edgelet features [230]. (a) The Sobel convolution result; (b) Examples of edgelet features and orientation quantization.
Figure 8. Shapelet Features from Two Sample Images. Each computed in one direction [59].
Figure 9. Illustration of the Optical Flow Descriptor. (a,b) Reference images at time t and t + 1; (c) Computed optical flow.
Figure 10. Three Types of Human Body Models. (a) Kinematic model; (b) Cardboard model; (c) Volumetric model.
Figure 11. Geometric Reconstruction of 3D Poses. (a) Perspective camera models; (b) An example pose and its projection.
Figure 12. Bag-of-words feature representation pipeline.
Figure 13. The convolutional network architecture used in [156]. It includes: one input layer, two convolution and downsampling layers, one convolution layer, two fully connected layers, one logistic regression layer, and one output layer. Note, “LCN” stands for local contrast normalization, and ReLU and logistic are activation functions.
Figure 14. One of the trees that make up a random forest [167]. The tree consists of split nodes (blue) and leaf nodes (green). The red arrows indicate the path that is taken for a particular input.
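The structure described in the Figure 14 caption (split nodes, leaf nodes, and a single path followed by each input) can be illustrated with a minimal Python sketch. This is not the method of [167]: the threshold test on one feature, the class names `Split` and `Leaf`, the helpers `route` and `forest_vote`, and the body-part labels are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class Leaf:
    """Leaf node: holds the prediction for every input that reaches it."""
    prediction: str


@dataclass
class Split:
    """Split node: tests one feature against a threshold (assumed test form)."""
    feature: int
    threshold: float
    left: "Node"   # followed when x[feature] < threshold
    right: "Node"  # followed otherwise


Node = Union[Leaf, Split]


def route(tree: Node, x: Sequence[float]) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf; return the prediction and the path taken."""
    path: List[str] = []
    node = tree
    while isinstance(node, Split):
        go_left = x[node.feature] < node.threshold
        path.append("L" if go_left else "R")
        node = node.left if go_left else node.right
    return node.prediction, path


def forest_vote(trees: Sequence[Node], x: Sequence[float]) -> str:
    """Combine the trees by majority vote over their leaf predictions."""
    votes = Counter(route(tree, x)[0] for tree in trees)
    return votes.most_common(1)[0][0]
```

Here `route` records the sequence of left/right decisions, which corresponds to the highlighted path in the figure, and `forest_vote` shows one common way of aggregating several such trees. Real systems typically store a distribution at each leaf rather than a single label.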
Figure 15. Overview of the method that combines discriminative and generative approaches.
Figure 16. Tree-structured human body model in human pose estimation. (a) Tree-structured body model; (b) A pose estimation example.
Publicly available human pose estimation data sets.
| Type | Name | Content | Image No. |
|---|---|---|---|
| Still Images | PASCAL VOC 2009 | Phoning, Riding Horse, Running, Walking | 7054 |
| | Gamesourcing | 300 images each from PARSE, BUFFY, LEEDS | 48 |
| | Leeds Sports Pose Dataset | Athletics, Badminton, Baseball, Gymnastics, Parkour, Soccer, Tennis, Volleyball | 2000 |
| | “We are family” stickmen | | |
| | PASCAL VOC 2012 | Ten actions, including jumping, phoning, playing instrument, etc. | 11,530 |
| | PASCAL Stickmen | | 549 |
| | PEAR | Five subjects performing seven predefined | |
| | KTH Multiview Football Dataset I | 2D dataset | 5907 |
| | KTH Multiview Football Dataset II | 3D dataset | 2400 |
| | FLIC (Frames Labeled In Cinema) | Images in 30 movies | 5003 |
| | FLIC-full | Images in 30 movies | 20,928 |
| | FLIC-plus | | |
| | PARSE | Mostly playing sports | 305 |
| | MPII Human Pose Dataset | Ice hockey, rope skipping, trampoline, rock climbing, cricket batting, etc. | 25,000 |
| | Poses in the Wild | | 900 |
| | Multi Human Pose | | |
| | Human3.6M (H36M) | Seventeen scenarios, including discussion, smoking, taking photo, talking on the phone, etc. | 3.6 million |
| | ChaLearn Looking at People 2015: Human Pose Recovery | | 8000 |
| Image Sequences | CMU-Mocap | Jumping Jacks, Climbing a ladder, Walking | |
| | Utrecht Multi-Person Motion | Multi-person motion image sequences | |
| | HumanEva-I | Walk, Jog, Gestures, ThrowCatch, Box | 74,267 |
| | HumanEva-II | | |
| | TUM Kitchen | | >20,000 |
| | Buffy Pose Classes (BPC) | Episodes 2 to 6 of the 5th season of the TV show “Buffy the Vampire Slayer” (BTVS) | 748 |
| | Buffy Stickmen V3.01 | Five episodes of the fifth season of BTVS | |
| | H3D database | With 3D joint positions | 1240 |
| | Video Pose | Forty-four short clips from Buffy the Vampire Slayer, Friends, and LOST | 1286 |
| | Video Pose 2.0 dataset | | 900 |