Audrius Kulikajevas¹, Rytis Maskeliunas¹, Robertas Damaševičius²,³
Abstract
Human posture detection allows the capture of the kinematic parameters of the human body, which is important for many applications, such as assisted living, healthcare, physical exercising and rehabilitation. This task can greatly benefit from recent developments in deep learning and computer vision. In this paper, we propose a novel deep recurrent hierarchical network (DRHN) model based on MobileNetV2 that allows for greater flexibility by reducing or eliminating posture detection problems related to limited visibility of the human torso in the frame, i.e., the occlusion problem. The DRHN network accepts RGB-Depth frame sequences and produces a representation of semantically related posture states. We achieved 91.47% accuracy at a rate of 10 fps for sitting posture recognition.
Keywords: Artificial neural network; Computer vision; Deep learning; Depth sensors; Posture detection; Sitting posture; e-Health
Year: 2021 PMID: 33834109 PMCID: PMC8022631 DOI: 10.7717/peerj-cs.442
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. Our recurrent hierarchical ANN architecture using MobileNetV2 as the main backbone.
It takes the RGB-D frame sequence as input and outputs the flattened prediction tree as a result.
Figure 2. Flattened hierarchy representation of postures expanded into a hierarchical tree.
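One way to read a flattened hierarchy is as a vector of per-node sigmoid scores that is decoded top-down. The sketch below assumes a hypothetical grouping (sitting straight; hunched with three children; lying with two children). This grouping yields 8 nodes, which matches the 8 sigmoid outputs in the layer table, but the actual node order and grouping used by the authors are not stated here. `HIERARCHY`, `flatten_order` and `decode` are illustrative names, and greedy descent is only one possible decoding rule.

```python
# Hypothetical posture hierarchy (an assumption, not taken from the paper).
# Top-level nodes map to their children; leaves have no children.
HIERARCHY = {
    "sitting straight": [],
    "hunched": ["lightly hunched", "hunched over", "extremely hunched"],
    "lying": ["partially lying", "lying down"],
}


def flatten_order(hierarchy):
    """Return node names in flattened order: top level first, then children."""
    order = list(hierarchy)
    for children in hierarchy.values():
        order.extend(children)
    return order


NODE_ORDER = flatten_order(HIERARCHY)  # 8 nodes in this assumed layout


def decode(probs):
    """Greedy top-down decoding of a flattened sigmoid vector into a tree path."""
    if len(probs) != len(NODE_ORDER):
        raise ValueError(f"expected {len(NODE_ORDER)} outputs, got {len(probs)}")
    score = dict(zip(NODE_ORDER, probs))
    # Choose the most probable top-level node first.
    top = max(HIERARCHY, key=score.__getitem__)
    path = [top]
    # Then descend into its most probable child, if it has any.
    children = HIERARCHY[top]
    if children:
        path.append(max(children, key=score.__getitem__))
    return path


probs = [0.1, 0.8, 0.2, 0.3, 0.7, 0.1, 0.1, 0.05]
print(decode(probs))  # ['hunched', 'hunched over']
```

Because each node has its own sigmoid rather than sharing a softmax, a parent and one of its children can both score high at the same time. This is what lets a single flat output layer encode a tree.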
Layers of the proposed neural network architecture for human posture recognition.
| Type | Filters | Size/stride | Output |
|---|---|---|---|
| Input | – | – | |
| Depthwise convolution | – | 11 × 11/2 | |
| Convolution | 64 | 1 × 1 | |
| Spatial dropout | – | – | |
| Depthwise convolution | – | 5 × 5/2 | |
| Convolution | 128 | 1 × 1 | |
| Spatial dropout | – | – | |
| LSTM convolution | 16 | 3 × 3 | 160 × 120 |
| Spatial dropout | – | – | 160 × 120 |
| MobileNetV2 backbone | – | – | 4 × 5 |
| Global average pooling | – | – | 1,280 |
| Dropout | – | – | 1,280 |
| Fully-connected (sigmoid) | – | – | 8 |
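The spatial sizes in the layer table are consistent with simple stride arithmetic. The sketch below traces them under three assumptions that the table does not state: a 640 × 480 input (the resolution listed for this method in the comparison table), "same"-style padding so that a stride-2 layer outputs ceil(n / 2), and the standard MobileNetV2 overall downsampling factor of 32. The 1 × 1 convolutions and the convolutional LSTM are taken to preserve spatial size. `same_pad_out` and `trace_shapes` are illustrative helper names.

```python
import math


def same_pad_out(size, stride):
    """Output length of a 'same'-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)


def trace_shapes(width=640, height=480):
    """Trace (width, height) through the stride-2 layers of the assumed network."""
    w, h = width, height
    shapes = [("Input", (w, h))]
    # Two stride-2 depthwise convolutions precede the convolutional LSTM.
    for name in ("Depthwise 11x11/2", "Depthwise 5x5/2"):
        w, h = same_pad_out(w, 2), same_pad_out(h, 2)
        shapes.append((name, (w, h)))
    # MobileNetV2 halves the resolution five times (overall stride 32).
    for _ in range(5):
        w, h = same_pad_out(w, 2), same_pad_out(h, 2)
    shapes.append(("MobileNetV2 feature map", (w, h)))
    return shapes


for name, (w, h) in trace_shapes():
    print(f"{name}: {w} x {h}")
```

The trace gives 320 × 240 after the first depthwise convolution, 160 × 120 after the second (matching the LSTM row) and a 5 × 4 (width × height) backbone feature map. Read as height × width, that is the table's 4 × 5 entry. Global average pooling over it leaves the 1,280 channels of MobileNetV2's final convolution.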
Figure 3. Activity diagram of the proposed method for sitting posture state recognition.
Frame count in the dataset.
| Posture class | Training | Testing | Share of training frames (%) |
|---|---|---|---|
| Sitting straight | 3390 | 505 | 21.53 |
| Lightly hunched | 2230 | 200 | 14.16 |
| Hunched over | 2534 | 321 | 16.09 |
| Extremely hunched | 1918 | 182 | 12.18 |
| Partially lying | 2053 | 339 | 13.04 |
| Lying down | 3622 | 302 | 23.00 |
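The percentage column matches each class's share of the 15,747 training frames. It does not match the share of all 17,596 frames: for example, 3,390 / 15,747 ≈ 21.53%, whereas (3,390 + 505) / 17,596 ≈ 22.14%. The short check below recomputes the column from the frame counts in the table. `COUNTS` and `training_shares` are illustrative names.

```python
# Frame counts copied from the table: (training, testing).
COUNTS = {
    "Sitting straight": (3390, 505),
    "Lightly hunched": (2230, 200),
    "Hunched over": (2534, 321),
    "Extremely hunched": (1918, 182),
    "Partially lying": (2053, 339),
    "Lying down": (3622, 302),
}


def training_shares(counts):
    """Return the training total and each class's percentage of it."""
    total = sum(train for train, _ in counts.values())
    shares = {
        name: round(100 * train / total, 2) for name, (train, _) in counts.items()
    }
    return total, shares


total, shares = training_shares(COUNTS)
print(total)  # 15747
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```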
Examples of images in the dataset (right-side view).
| Posture class | RGB and Depth images |
|---|---|
| Sitting straight | |
| Lightly hunched forward | |
| Hunched over forward | |
| Extremely hunched forward | |
| Partially lying down in the chair | |
| Lying down in the chair | |
Figure 4. Confusion matrix indicating expected labels versus network predictions.
Accuracy values are given in percent. Diagonal values indicate correct predictions.
Figure 5. Confusion matrix indicating bottom-level expected labels versus network predictions.
Accuracy values are given in percent. Diagonal values indicate correct predictions.
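Percent-valued confusion matrices like those in Figures 4 and 5 are commonly built by normalizing each row of raw counts by the number of true samples in that class. With this normalization, each diagonal entry is that class's recall. Whether the figures use row normalization is an assumption, and the labels below are toy values, not results from the paper. `confusion_matrix` and `row_percentages` are illustrative, dependency-free helpers.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are expected labels, columns are predictions."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def row_percentages(m):
    """Normalize each row to percentages of that row's true samples."""
    out = []
    for row in m:
        total = sum(row)
        out.append([100.0 * c / total if total else 0.0 for c in row])
    return out


# Toy labels for three classes (illustrative only, not data from the paper).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]
pct = row_percentages(confusion_matrix(y_true, y_pred, 3))
for row in pct:
    print(" ".join(f"{v:6.2f}" for v in row))
```

Here class 0 is correct in 3 of 4 cases (75% on the diagonal), and its remaining 25% sits off the diagonal in the column of the class it was confused with.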
Comparison of posture recognition methods.
| Method | Frame resolution, px | Frame rate, fps | Accuracy, % | Task | Reference |
|---|---|---|---|---|---|
| Real-time deformable detector | 320 × 240 | 10 | 75.33 | Hand posture recognition | |
| Ensemble of InceptionResNetV2 | 640 × 480 | n/a | 95.34 | Four postures (standing, sitting, lying, and lying crouched) | |
| LVQ (learning vector quantization) neural network | 640 × 480 | 333 | 99.01 | Five full-skeleton postures (standing, sitting, stooping, kneeling, and lying) | |
| Multi-stage convolutional neural network (M-CNN) | n/a | 5 | 98.70 | Two postures for fall detection | |
| LVQ neural network | 48 × 16 | 10 | 99.95 | Eight postures (stand, hand raise, akimbo, open wide arms, squat, toe touch, crawl, and lie) | |
| Deep CNN | 24 × 8 | 9 | 99.99 | 26 yoga postures | |
| D CNN | n/a | n/a | 98.16 | Detection of 10 standstill body poses | |
| Deep recurrent hierarchical network | 640 × 480 | 10 | 91.47 | Spine posture recognition while sitting | This paper |
Note: n/a, data not available.