Literature DB record: PMID 34901426
Yazeed Ghadi, Israr Akhter, Mohammed Alarfaj, Ahmad Jalal, Kibum Kim.
Abstract
The study of human posture analysis and gait event detection from various types of input is a key contribution to the human life log, and the resulting technologies can save costs in terms of time and utility resources. In this paper we present a robust approach to human posture analysis and gait event detection from complex video-based data. Initially, posture information, landmark information, and a human 2D skeleton mesh are extracted; using this information set, the human model is reconstructed from 2D to 3D. Contextual features, namely degrees of freedom over detected body parts, joint angle information, periodic and non-periodic motion, and human motion direction flow, are then extracted. For feature mining, a rule-based feature mining technique is applied and, for gait event detection and classification, a deep-learning-based CNN is applied over the MPII video pose, COCO, and Posetrack datasets. For the MPII video pose dataset, we achieved a human landmark detection mean accuracy of 87.09% and a gait event recognition mean accuracy of 90.90%. For the COCO dataset, we achieved 87.36% and 89.09%, respectively. For the Posetrack dataset, we achieved 87.72% and 88.18%, respectively. The proposed system shows a significant improvement over existing state-of-the-art frameworks. © 2021 Ghadi et al.
Keywords: 2D to 3D reconstruction; Convolutional neural network; Gait event classification; Human posture analysis; Landmark detection; Silhouette optimization; Synthetic model
Year: 2021 PMID: 34901426 PMCID: PMC8627229 DOI: 10.7717/peerj-cs.764
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Comprehensive review of relevant research.
- Using contextual, stationary, and vibration attributes, an effective randomized-forest-based methodology for human body part localization was developed; videos and photographs were used to evaluate different human actions.
- A micro, horizontal, and vertical differential function was proposed as part of an automated procedure. To classify human behavior, a Deep Neural Network (DNN) was used, with DNN-based features obtained from the convolution layers of a pre-trained Convolutional Neural Network.
- Adaptation-Oriented Features (AOF), an integrated framework with one-shot image classification for approximating human actions, was defined. The system applies to all classes, and AOF parameters were incorporated for enhanced performance.
- They created a multilayer structure with significant human skeleton details using RGB images, and used Histogram of Oriented Gradients (HOG) descriptor attributes to identify human actions.
- They defined a single Convolutional Neural Network (CNN)-based data-communication and information-channel method, utilizing vision methods to gather information through non-monitoring instruments. The CNN predicts temporal features, alongside deep auto-encoders and deep features, in order to monitor human behavior.
- They developed an integrated approach to calculate vibrant human motion in sports events using movement-tracker sensors. The major contribution is the computation of human events in sports datasets by estimating the kinematics of human body joints, motion, and velocity, and by recreating the human pose.
- They developed a lightweight event recognition strategy based on spatial development and social body pose; kinematic knowledge of the attached human body parts is used to characterize tree-based features.
- Using a Hidden Markov methodology, they built a solid framework for event identification, accomplished using time-continuous dependent features and body-marker detectors.
- With the assistance of a human tracking methodology, they developed a comprehensive new approach for estimating the accuracy of human motion; a Deep Neural Network (DNN) is used to identify events.
- They introduced a multidimensional function method for estimating human motion and gestures, and used a late mean combination algorithm to recognize events in complex scenes.
- They developed a lightweight organizational approach focused on optimal allocation, optical flow, and a histogram of the extracted optical flow. Effective event recognition was achieved using a standard optimization process, body-joint restoration, and a reduced and compressed coefficient dictionary learning (LRCCDL) methodology.
- Through task identification, isolation of sequential 2D posture characteristics, and a convolutional sequence network, a coherent framework for event recognition with athletes in motion was created; a number of sporting events were correctly identified.
- Their work describes a probabilistic framework for detecting events in specific interchanges in soccer rivalry videos. This is done using a replay recognition approach that identifies the most important background features for fulfilling spectator needs and generating replay storytelling clips.
- A comprehensive deep learning framework for identifying anomalous and natural events was developed. The findings were obtained using differentiation, grouping, and graph-based techniques; natural and unusual features were discovered for event-duration use with deep learning techniques.
- This article is based on a real-time method for detecting the 2D posture of numerous individuals in a picture. The technique learns to associate body parts with individuals in the image using Part Affinity Fields (PAFs), a nonparametric representation.
- They devised a reliable method for analyzing the movement of human body parts through multiple cameras that monitor body-part detection, and also created a 2D-3D simulation of human body joints.
- They designed an example-based synthesis methodology using a single class-based object database that holds example reinforcements of realistic mappings, owing to the complexity of the objects.
- To define facial dimensionality, an effective 2D-to-3D hybrid face reconstruction technique is used to recreate a customizable 3D face template from a single frontal face picture with a neutral expression and regular lighting. Immersive-looking faces spanning different pose, illumination, and expression (PIE) conditions are synthesized from the customizable 3D model.
- To enhance the classification of the roots from each 2D image, they initially model the context as a harmonic function. Second, they analyze the formalized visual hull definition, which eliminates jitter and diffusion by maintaining continuity with a single 2D image. Third, they maintain connectivity by adjusting the 3D reconstruction through global error minimization.
- They proposed a heuristic approach for human activity detection and human posture analysis, utilizing human body joint angle information with the help of a hidden Markov model (HMM).
- The researchers created a deep learning system for detecting abnormal and normal events. Distinction, classification, and graph-based methods were used to obtain the results; natural and uncommon features were explored for event-interval use with deep learning methods.
- To retrieve the spatial locations of deep features in composite images, a guided Long Short-Term Memory (LSTM) approach based on a Convolutional Neural Network (CNN) system was evaluated. For person authentication, the state-of-the-art YOLOv3 model was used and, for event recognition, a directed LSTM-driven method was used.
- They developed home-based patient monitoring strategies based on body-marker detectors. To record data from patients, body-marker sensors with a color-indicator framework are attached to the joints.
- For sporting events, human-movement-monitoring body-marker tools were used to establish a Trunk Motion Method (TMM) with low-power body-worn sensors (BWS). Twelve removable detectors were used to measure 3D trunk movements in this process.
- A robust wireless strategy was developed for detecting physical human behavior. A magnetic flux cable was used to monitor human behavior, and thematic maps were attached to the body joints. A research-lab approximation function and a deep Recurrent Neural Network (RNN) were used to enhance efficiency.
Figure 1. The proposed system model's structural design.
Figure 2. Flow chart of the proposed method.
Figure 3. Results of different background subtraction techniques: (A) change detection, (B) floor detection, (C) Markov random field, and (D) spatial-temporal differencing.
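Of the four background-subtraction baselines compared in Figure 3, the spatial-temporal differencing variant (panel D) is the simplest to illustrate. Below is a minimal sketch of temporal frame differencing with OpenCV; the threshold value, kernel size, and the input file `walking.mp4` are illustrative assumptions, not values from the paper.

```python
# Minimal temporal frame-differencing sketch (one standard form of
# spatial-temporal differencing); parameters are illustrative assumptions.
import cv2
import numpy as np

def temporal_difference_mask(prev_frame, curr_frame, thresh=25):
    """Return a binary foreground mask from two consecutive frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)           # per-pixel temporal change
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    # Close small holes so the extracted silhouette stays connected.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

cap = cv2.VideoCapture("walking.mp4")   # hypothetical input video
ok, prev = cap.read()
while ok:
    ok, curr = cap.read()
    if not ok:
        break
    mask = temporal_difference_mask(prev, curr)
    prev = curr
```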
Figure 4. Results of (A) optimized human silhouette, (B) human head detection, and (C) human detection in RGB videos and image sequences.
Figure 5. Human body landmark detection results: (A) the landmark results using an HSV color map, (B) the eleven human body points.
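The eleven body points (abbreviated HP, NP, REP, RHP, LEP, LHP, MP, LKP, RKP, LFP, and RFP in the tables below) are located on the optimized silhouette. As a rough illustration only, coarse landmark candidates can be read off a binary silhouette from its geometric extremes; the paper's HSV-color-map step is assumed to color-code body regions before point extraction, and its exact rules are not reproduced here.

```python
# Illustrative sketch: coarse landmark candidates from silhouette extremes
# (topmost pixel for the head, leftmost/rightmost floor pixels for the feet).
import numpy as np

def coarse_landmarks(mask):
    """mask: binary (H, W) silhouette array. Returns a dict of (row, col) points."""
    ys, xs = np.nonzero(mask)
    head = (ys.min(), int(xs[ys == ys.min()].mean()))   # topmost silhouette pixel
    bottom = ys.max()
    feet_xs = xs[ys > bottom - 10]                      # pixels near the floor line
    left_foot = (bottom, feet_xs.min())
    right_foot = (bottom, feet_xs.max())
    mid = (int(ys.mean()), int(xs.mean()))              # silhouette centroid as mid-point
    return {"HP": head, "MP": mid, "LFP": left_foot, "RFP": right_foot}
```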
Figure 6. The human 2D skeleton model results over eleven human body parts.
Figure 7. The results of the computational model with ellipsoids over human body points.
Figure 8. The results of the synthetic model with super quadrics over human body points: (A) human 2D skeleton, (B) computational model with ellipsoids, (C) synthetic model with super quadrics.
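For reference, a superquadric body-part primitive (Figure 8) is commonly written as the implicit surface below; this is the standard general form, not necessarily the paper's exact parameterization:

$$\left( \left|\frac{x}{a_1}\right|^{2/\varepsilon_2} + \left|\frac{y}{a_2}\right|^{2/\varepsilon_2} \right)^{\varepsilon_2/\varepsilon_1} + \left|\frac{z}{a_3}\right|^{2/\varepsilon_1} = 1,$$

where $a_1, a_2, a_3$ are the semi-axes and $\varepsilon_1, \varepsilon_2$ are the shape exponents. Setting $\varepsilon_1 = \varepsilon_2 = 1$ recovers the ellipsoid primitive of Figure 7.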
Figure 9. The theme concept of local and global coordinate systems. The left side shows the local coordinate system over the human left knee; the right side shows the DOF-based global coordinate system.
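In the local coordinate system, a joint angle is determined by the two limb segments meeting at the joint, e.g. the angle at the knee between the thigh vector (knee to hip) and the shank vector (knee to ankle). A minimal sketch, assuming 2D keypoints; the paper's exact joint set and DOF bookkeeping are not reproduced:

```python
# Joint angle at the knee from three 2D keypoints (illustrative point values).
import numpy as np

def joint_angle(hip, knee, ankle):
    """Each argument is an (x, y) pair; returns the knee angle in degrees."""
    thigh = np.asarray(hip) - np.asarray(knee)
    shank = np.asarray(ankle) - np.asarray(knee)
    cos_theta = np.dot(thigh, shank) / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

print(joint_angle((0.4, 0.2), (0.45, 0.5), (0.43, 0.8)))  # ~167 deg, near-straight leg
```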
Figure 10. The results of the 3D ellipsoid reconstruction over the synthetic model with super quadrics and joint angle estimation.
Figure 11. A few DOF results examples.
Figure 12. Rotational angular joint results and the pattern of rotational angles.
Figure 13. The most accurate feature results via the rule-based feature mining approach over the MPII video pose, COCO, and Posetrack datasets.
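The mining rules themselves are not spelled out in this record, so the following is only a hedged sketch of rule-based feature mining as a fixed-threshold filter; the variance-ratio score and the threshold value are assumptions, not the paper's rules.

```python
# Hedged sketch: keep a feature only if it satisfies a fixed rule on class
# separability (between-class vs. within-class variance ratio above a threshold).
import numpy as np

def select_features(X, y, threshold=1.5):
    """X: (n_samples, n_features) feature matrix, y: integer class labels.
    Returns indices of features passing the variance-ratio rule."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = sum((X[y == c].mean(axis=0) - overall_mean) ** 2 * (y == c).sum()
                  for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    score = between / (within + 1e-12)     # per-feature separability score
    return np.nonzero(score > threshold)[0]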
Figure 14. CNN model overview.
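As a minimal sketch of such a classifier (assumptions: the mined features arrive as a fixed-length 1D vector per gait window, there are 11 event classes as in the tables below, and the layer sizes are illustrative rather than the paper's architecture):

```python
# Small 1D CNN sketch in PyTorch for 11 gait-event classes; sizes are assumptions.
import torch
import torch.nn as nn

class GaitEventCNN(nn.Module):
    def __init__(self, in_channels=1, n_classes=11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(64), nn.ReLU(),   # infers the flattened size at first call
            nn.Linear(64, n_classes),
        )

    def forward(self, x):                    # x: (batch, 1, feature_length)
        return self.classifier(self.features(x))

model = GaitEventCNN()
logits = model(torch.randn(8, 1, 128))       # 8 windows of 128 features each
print(logits.shape)                          # torch.Size([8, 11])
```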
Human body parts recognition and detection accuracy.
| Body key points | Distance | MPII (%) | Distance | COCO (%) | Distance | Posetrack (%) |
|---|---|---|---|---|---|---|
| HP | 11.2 | 88 | 9.70 | 88 | 9.90 | 91 |
| NP | 10.8 | 86 | 10.2 | 86 | 11.1 | 88 |
| REP | 11.5 | 82 | 10.1 | 83 | 14.1 | 86 |
| RHP | 12.1 | 81 | 11.7 | 82 | 12.7 | 83 |
| LEP | 11.1 | 83 | 11.9 | 79 | 11.0 | 88 |
| LHP | 12.0 | 77 | 11.7 | 81 | 12.0 | 79 |
| MP | 10.1 | 91 | 13.1 | 90 | 11.9 | 91 |
| LKP | 13.2 | 94 | 12.8 | 92 | 12.3 | 87 |
| RKP | 9.90 | 91 | 10.3 | 91 | 11.7 | 81 |
| LFP | 10.3 | 94 | 11.2 | 95 | 14.1 | 94 |
| RFP | 11.5 | 91 | 10.3 | 94 | 13.8 | 97 |
Human body parts results for multi-person detection on the MPII video pose dataset.
| Body parts | Human1 | Human2 | Human3 | Human4 | Human5 |
|---|---|---|---|---|---|
| HP | ✓ | ✓ | ✓ | ✓ | × |
| NP | × | ✓ | × | × | ✓ |
| REP | ✓ | ✓ | ✓ | ✓ | ✓ |
| RHP | ✓ | × | × | ✓ | ✓ |
| LEP | × | ✓ | ✓ | ✓ | × |
| LHP | × | ✓ | ✓ | × | ✓ |
| MP | ✓ | × | ✓ | ✓ | ✓ |
| LKP | ✓ | ✓ | × | × | ✓ |
| RKP | × | × | ✓ | ✓ | × |
| LFP | ✓ | ✓ | × | ✓ | ✓ |
| RFP | ✓ | ✓ | ✓ | ✓ | ✓ |
| Accuracy | 63.63% | 72.72% | 63.63% | 72.72% | 72.72% |
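Each per-person accuracy row in these multi-person tables is simply the number of detected body parts divided by the eleven points; for example, with the counts from the MPII table above:

```python
# Per-person accuracy = detected parts / 11 body points.
detected = {"Human1": 7, "Human2": 8, "Human3": 7, "Human4": 8, "Human5": 8}
for person, k in detected.items():
    print(f"{person}: {k / 11:.2%}")   # 8/11 -> 72.73% (the tables truncate to 72.72%)
```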
Human body parts results for multi-person detection on the COCO dataset.
| Body parts | Human1 | Human2 | Human3 | Human4 | Human5 |
|---|---|---|---|---|---|
| HP | ✓ | ✓ | ✓ | ✓ | ✓ |
| NP | ✓ | ✓ | ✓ | × | ✓ |
| REP | ✓ | × | × | ✓ | ✓ |
| RHP | ✓ | × | × | × | × |
| LEP | ✓ | ✓ | ✓ | ✓ | ✓ |
| LHP | × | ✓ | ✓ | ✓ | × |
| MP | × | × | ✓ | ✓ | ✓ |
| LKP | ✓ | ✓ | × | × | ✓ |
| RKP | ✓ | ✓ | ✓ | ✓ | × |
| LFP | ✓ | ✓ | ✓ | ✓ | ✓ |
| RFP | ✓ | ✓ | ✓ | ✓ | ✓ |
| Accuracy | 81.81% | 72.72% | 72.72% | 72.72% | 72.72% |
Human body parts results for multi-person detection on the Posetrack dataset.
| Body parts | Human1 | Human2 | Human3 | Human4 | Human5 |
|---|---|---|---|---|---|
| HP | ✓ | ✓ | ✓ | ✓ | ✓ |
| NP | × | × | ✓ | × | ✓ |
| REP | ✓ | ✓ | × | ✓ | ✓ |
| RHP | × | × | ✓ | × | × |
| LEP | ✓ | ✓ | × | ✓ | ✓ |
| LHP | ✓ | × | ✓ | × | × |
| MP | × | ✓ | ✓ | ✓ | ✓ |
| LKP | ✓ | ✓ | × | ✓ | ✓ |
| RKP | ✓ | × | ✓ | × | × |
| LFP | ✓ | ✓ | × | ✓ | ✓ |
| RFP | × | ✓ | ✓ | ✓ | ✓ |
| Accuracy | 63.63% | 63.63% | 63.63% | 63.63% | 72.72% |
Figure 15. Confusion matrix results using CNN over the MPII video pose dataset.
Figure 16. Confusion matrix results using CNN over the COCO dataset.
Figure 17. Confusion matrix results using CNN over the Posetrack dataset.
Precision, recall, and F1 comparison of the artificial neural network (ANN), decision tree (DT), and CNN over the MPII video pose dataset.
| Events | ANN Precision | ANN Recall | ANN F1 | DT Precision | DT Recall | DT F1 | CNN Precision | CNN Recall | CNN F1 |
|---|---|---|---|---|---|---|---|---|---|
| Bi | 0.778 | 0.700 | 0.737 | 0.667 | 0.600 | 0.632 | 0.818 | 0.900 | 0.857 |
| Ce | 0.700 | 0.700 | 0.700 | 0.700 | 0.700 | 0.700 | 0.692 | 0.900 | 0.783 |
| Da | 0.857 | 0.600 | 0.706 | 0.818 | 0.900 | 0.857 | 0.909 | 0.909 | 0.909 |
| Fh | 0.909 | 1.000 | 0.952 | 0.727 | 0.800 | 0.762 | 1.000 | 0.900 | 0.947 |
| MP | 0.900 | 0.900 | 0.900 | 0.727 | 0.800 | 0.762 | 0.900 | 0.900 | 0.900 |
| Ra | 0.889 | 0.800 | 0.842 | 0.889 | 0.800 | 0.842 | 1.000 | 0.900 | 0.947 |
| SP | 0.727 | 0.889 | 0.800 | 0.833 | 1.000 | 0.909 | 0.900 | 0.900 | 0.900 |
| Tr | 0.875 | 0.778 | 0.824 | 0.909 | 1.000 | 0.952 | 1.000 | 0.800 | 0.889 |
| Wl | 0.875 | 0.700 | 0.778 | 1.000 | 0.700 | 0.824 | 1.000 | 0.818 | 0.900 |
| Wa | 0.818 | 1.000 | 0.900 | 1.000 | 0.900 | 0.947 | 1.000 | 1.000 | 1.000 |
| Wn | 0.769 | 1.000 | 0.870 | 0.800 | 0.800 | 0.800 | 0.750 | 0.900 | 0.818 |
Precision, recall, and F1 comparison of the artificial neural network (ANN), decision tree (DT), and CNN over the COCO dataset.
| Events | ANN Precision | ANN Recall | ANN F1 | DT Precision | DT Recall | DT F1 | CNN Precision | CNN Recall | CNN F1 |
|---|---|---|---|---|---|---|---|---|---|
| Bi | 0.818 | 0.750 | 0.783 | 0.700 | 0.700 | 0.700 | 0.889 | 0.800 | 0.842 |
| Da | 0.750 | 0.750 | 0.750 | 0.889 | 0.800 | 0.842 | 0.909 | 1.000 | 0.952 |
| Ce | 0.889 | 0.667 | 0.762 | 0.833 | 1.000 | 0.909 | 0.818 | 0.900 | 0.857 |
| Fh | 0.900 | 1.000 | 0.947 | 0.818 | 0.900 | 0.857 | 1.000 | 0.800 | 0.889 |
| Ra | 0.909 | 0.909 | 0.909 | 0.818 | 0.900 | 0.857 | 0.714 | 1.000 | 0.833 |
| Mp | 0.900 | 0.818 | 0.857 | 0.900 | 0.900 | 0.900 | 0.889 | 0.800 | 0.842 |
| Sp | 0.667 | 0.857 | 0.750 | 0.889 | 0.800 | 0.842 | 1.000 | 0.900 | 0.947 |
| Wl | 0.900 | 0.818 | 0.857 | 0.727 | 1.000 | 0.842 | 1.000 | 0.900 | 0.947 |
| Tr | 0.900 | 0.750 | 0.818 | 0.889 | 0.800 | 0.842 | 0.800 | 0.800 | 0.800 |
| Wa | 0.800 | 1.000 | 0.889 | 1.000 | 0.800 | 0.889 | 1.000 | 0.900 | 0.947 |
| Wn | 0.727 | 1.000 | 0.842 | 0.875 | 0.700 | 0.778 | 0.909 | 1.000 | 0.952 |
| Mean | 0.833 | 0.847 | 0.833 | 0.849 | 0.845 | 0.842 | 0.903 | 0.891 | 0.892 |
Precision, recall, and F1 comparison of the artificial neural network (ANN), decision tree (DT), and CNN over the Posetrack dataset.
| Events | ANN Precision | ANN Recall | ANN F1 | DT Precision | DT Recall | DT F1 | CNN Precision | CNN Recall | CNN F1 |
|---|---|---|---|---|---|---|---|---|---|
| Ce | 0.818 | 0.750 | 0.783 | 0.769 | 1.000 | 0.870 | 1.000 | 0.700 | 0.824 |
| Bi | 0.700 | 0.700 | 0.700 | 0.750 | 0.818 | 0.783 | 1.000 | 0.800 | 0.889 |
| Da | 0.889 | 0.667 | 0.762 | 0.875 | 0.700 | 0.778 | 0.900 | 0.900 | 0.900 |
| Mp | 0.900 | 1.000 | 0.947 | 0.778 | 0.700 | 0.737 | 0.833 | 1.000 | 0.909 |
| Fh | 0.875 | 0.875 | 0.875 | 0.692 | 0.900 | 0.783 | 0.769 | 1.000 | 0.870 |
| Ra | 0.900 | 0.818 | 0.857 | 0.818 | 0.900 | 0.857 | 0.909 | 1.000 | 0.952 |
| Tr | 0.750 | 0.900 | 0.818 | 0.700 | 0.700 | 0.700 | 0.889 | 0.800 | 0.842 |
| Sp | 0.857 | 0.750 | 0.800 | 0.875 | 0.700 | 0.778 | 1.000 | 0.900 | 0.947 |
| Wl | 0.857 | 0.667 | 0.750 | 1.000 | 0.750 | 0.857 | 0.833 | 1.000 | 0.909 |
| Wn | 0.800 | 1.000 | 0.889 | 1.000 | 1.000 | 1.000 | 0.727 | 0.800 | 0.762 |
| Wa | 0.769 | 1.000 | 0.870 | 0.778 | 0.778 | 0.778 | 1.000 | 0.800 | 0.889 |
| Mean | 0.829 | 0.830 | 0.823 | 0.821 | 0.813 | 0.811 | 0.896 | 0.882 | 0.881 |
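The per-event precision, recall, and F1 values in the three tables above follow directly from the confusion matrices of Figures 15-17; a generic per-class computation is sketched below with a toy matrix, not the paper's data.

```python
# Per-class precision/recall/F1 from a confusion matrix.
import numpy as np

def per_class_metrics(cm):
    """cm[i, j]: count of samples with true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)     # column sums: all predictions of class j
    recall = tp / cm.sum(axis=1)        # row sums: all true members of class i
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm = np.array([[9, 1, 0],               # toy 3-class example
               [1, 8, 1],
               [0, 2, 8]])
p, r, f = per_class_metrics(cm)
print(np.round(p, 3), np.round(r, 3), np.round(f, 3))
```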
Performance analysis of various extracted features over the MPII, COCO, and Posetrack datasets.
| Features | MPII (%) | COCO (%) | Posetrack (%) |
|---|---|---|---|
|  | 71.20 | 73.21 | 72.59 |
|  | 76.73 | 76.07 | 75.13 |
Gait event mean accuracy comparison with other methods over the MPII, COCO, and Posetrack datasets.
| Methods | MPII (%) | Methods | COCO (%) | Methods | Posetrack (%) |
|---|---|---|---|---|---|
|  | 73.00 |  | 74.20 |  | 71.08 |
|  | 87.10 |  | 82.30 | Bao et al. (2020) | 72.03 |
|  | 90.50 |  | 83.10 |  | 74.02 |
Figure 18. Some examples of limitations and failure cases.
Human body parts recognition and detection accuracy using the OpenPose CNN.
| Body key points | Distance | MPII (%) | Distance | COCO (%) | Distance | Posetrack (%) |
|---|---|---|---|---|---|---|
| HP | 12.5 | 86 | 10.1 | 84 | 11.1 | 83 |
| NP | 12.3 | 83 | 12.3 | 83 | 12.5 | 81 |
| REP | 10.6 | 81 | 12.5 | 82 | 12.9 | 80 |
| RHP | 13.1 | 88 | 14.1 | 86 | 10.1 | 85 |
| LEP | 12.3 | 84 | 10.6 | 80 | 10.9 | 82 |
| LHP | 14.0 | 78 | 10.8 | 83 | 11.6 | 77 |
| MP | 11.3 | 88 | 12.2 | 88 | 9.9 | 92 |
| LKP | 11.0 | 85 | 11.9 | 87 | 10.5 | 89 |
| RKP | 15.1 | 83 | 13.3 | 89 | 12.8 | 85 |
| LFP | 14.2 | 90 | 11.6 | 91 | 11.3 | 91 |
| RFP | 12.3 | 88 | 11.1 | 93 | 10.1 | 92 |
Comparison of the OpenPose CNN model with the proposed method.
| Dataset | OpenPose CNN | Proposed method |
|---|---|---|
| MPII (%) | 84.90 | 87.09 |
| COCO (%) | 86.00 | 87.36 |
| Posetrack (%) | 85.18 | 87.72 |
Figure 19. Results of the OpenPose CNN-based classification.