| Literature DB >> 32882884 |
Neziha Jaouedi, Francisco J. Perales, José Maria Buades, Noureddine Boujnah, Med Salim Bouhlel.
Abstract
The recognition of human activities is often assumed to be a simple procedure, but problems arise in complex scenes and at high speeds. Activity prediction through Artificial Intelligence (AI) and numerical analysis has attracted the attention of many researchers. Human activity recognition is an important challenge in various fields and has many valuable applications, including smart homes, assistive robotics, human-computer interaction, and improved protection in areas such as security, transport, education, and medicine, for example through fall detection or medication-intake monitoring for elderly people. The rapid progress and success of deep learning techniques in computer vision encourage their use in video processing. Representing the person is a key challenge when analyzing human behavior through activity: a person in a video sequence can be described by their motion, skeleton, and/or spatial characteristics. In this paper, we present a novel approach to human activity recognition from videos that uses a Recurrent Neural Network (RNN) for activity classification and a Convolutional Neural Network (CNN) with a new human skeleton structure for feature representation. The aims of this work are to improve the representation of the person by combining different features and to exploit the new RNN structure for activity classification. The performance of the proposed approach is evaluated on the RGB-D sensor dataset CAD-60. The experimental results demonstrate the performance of the proposed approach, with an average error rate of 4.5%.
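The pipeline the abstract describes — per-frame skeleton features fed to a recurrent classifier whose output is a distribution over activity classes — can be sketched minimally. The dimensions, random weights, and stubbed frame features below are illustrative assumptions, not the paper's trained model.

```python
# Sketch: a small recurrent classifier over a sequence of per-frame
# feature vectors, soft-maxed over activity classes. All sizes and
# weights are illustrative assumptions.
import math, random

def rnn_classify(frames, w_in, w_rec, w_out):
    """frames: list of feature vectors; returns class probabilities."""
    h = [0.0] * len(w_rec)                      # hidden state
    for x in frames:                            # one recurrent step per frame
        h = [math.tanh(sum(wi * xi for wi, xi in zip(w_in[j], x)) +
                       sum(wr * hi for wr, hi in zip(w_rec[j], h)))
             for j in range(len(h))]
    logits = [sum(wo * hi for wo, hi in zip(row, h)) for row in w_out]
    exps = [math.exp(l - max(logits)) for l in logits]  # stable softmax
    return [e / sum(exps) for e in exps]

random.seed(0)
feat_dim, hidden, classes = 4, 3, 2
w_in = [[random.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(hidden)]
w_rec = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(hidden)]
w_out = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(classes)]
probs = rnn_classify([[0.1, 0.2, 0.3, 0.4]] * 5, w_in, w_rec, w_out)
# probs is a distribution over the (here two) activity classes
```

The paper uses a trained CNN to produce the per-frame features; here they are stubbed constants so the recurrent structure itself is visible.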
Keywords: action recognition; deep association metric; deep learning; human activities; human detection; motion tracking; skeleton features
Mesh:
Year: 2020 PMID: 32882884 PMCID: PMC7506930 DOI: 10.3390/s20174944
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
State-of-the-art methods and their interpretation.
| Authors | Methods | Interpretation |
|---|---|---|
| AlbuSlava 2016 | 3D CNN | Spatial features |
| Murad and Ryun 2017 | Deep recurrent neural networks and multimodal sensors | Motion features |
| Ning et al., 2017 | Local optical flow of a global human silhouette | Motion features |
| Nicolas et al., 2016 | GRU + RCN | Spatio-temporal features |
| Xu et al., 2016 | RCNN | Spatio-temporal features |
| Zhang et al., 2016 | Vector of locally aggregated descriptors, SIFT and ISA | Spatio-temporal features |
| Zhao et al., 2017 | RNN + GRU + 3D CNN | Spatio-temporal features |
| Faria et al., 2012 | Dynamic Bayesian mixture model | Skeleton features |
| Koppula et al., 2013 | HMM | Skeleton features |
| Bingbing et al., 2013 | Histogram of oriented gradients and SVM | Spatio-temporal features |
| Wang et al., 2014 | LOM | Skeleton features |
| Shan and Akella 2014 | Pose Kinetic Energy + SVM | Skeleton features |
| Gaglio et al., 2015 | K-means + HMM + SVM | Skeleton features |
| Manzi et al., 2017 | K-means + Sequential Minimal Optimization | Skeleton features |
| Srijan et al., 2018 | RGB-D + CNN + LSTM model | Skeleton and contextual features |
| Yanli et al., 2018 | VS-CNN | Skeleton and contextual features |
| Hug et al., 2019 | Conversion of the distance between joint pairs to color points + CNN | Skeleton and contextual features |
CNN: Convolutional Neural Network, GRU: Gated Recurrent Units, LOM: Local Occupation Model, LSTM: Long Short Term Memory, RCN: Recurrent Convolution Networks, RNN: Recurrent Neural Network, SVM: Support Vector Machines, VS-CNN: View-guided Skeleton-CNN.
Figure 1. The newly proposed model for human activity recognition. Our model is divided into two parts: model training and activity recognition. This model is based on human pose estimation and human tracking.
Figure 2. The MobileNet architecture for skeleton joint representation.
Figure 3. The human skeleton representation used in our paper. This skeleton model comprises 18 joints, each projected onto the 2D plane. For the collection of human features, we used 36 values (18 × 2), for example, nose_x, nose_y, neck_x, neck_y, Right_shoulder_x, Right_shoulder_y, Right_elbow_x, Right_elbow_y, etc.
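The 36-value feature vector described for Figure 3 (18 joints × 2 coordinates, flattened in a fixed joint order) can be sketched as follows; the joint names shown and the zero-fill convention for undetected joints are assumptions for illustration.

```python
# Sketch: flattening an 18-joint 2D skeleton into the 36-value feature
# vector described in the paper. Only four joint names are listed here;
# the paper's model uses 18.
JOINTS = ["nose", "neck", "right_shoulder", "right_elbow"]  # ... 18 in total

def skeleton_to_features(joints):
    """joints: dict mapping joint name -> (x, y) in image coordinates.
    Returns a flat [x1, y1, x2, y2, ...] list; joints the pose estimator
    did not detect are encoded as (0.0, 0.0) (an assumed convention)."""
    feats = []
    for name in JOINTS:
        x, y = joints.get(name, (0.0, 0.0))
        feats.extend([float(x), float(y)])
    return feats

pose = {"nose": (120, 40), "neck": (118, 70)}   # partial detection
vec = skeleton_to_features(pose)
# vec[:4] -> [120.0, 40.0, 118.0, 70.0]; the undetected joints map to zeros
```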
Figure 4. The structure of our human activity recognition system.
Figure 5. Human tracking using the Kalman filter.
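The Kalman-filter tracking of Figure 5 can be illustrated on a single coordinate of a tracked person's position. The scalar constant-position filter below, with assumed process- and measurement-noise variances, is a sketch of the idea, not the paper's tracker.

```python
# Minimal sketch of Kalman filtering for one coordinate of a tracked
# person (constant-position model). q and r are assumed noise variances.
def kalman_track(measurements, q=1e-2, r=1.0):
    """q: process-noise variance, r: measurement-noise variance."""
    x, p = measurements[0], 1.0      # initial state estimate and covariance
    estimates = [x]
    for z in measurements[1:]:
        p = p + q                    # predict: uncertainty grows
        k = p / (p + r)              # Kalman gain
        x = x + k * (z - x)          # update toward the new measurement
        p = (1.0 - k) * p            # updated uncertainty
        estimates.append(x)
    return estimates

smooth = kalman_track([100, 104, 99, 103, 150, 102])
# the outlier measurement of 150 is strongly damped in the smoothed track
```

In practice a constant-velocity model over (x, y) is more common for tracking people across frames; the scalar version keeps the predict/update cycle readable.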
Figure 6. Six activity classes included in the CAD-60 dataset. The video samples were captured by Microsoft Kinect sensors concurrently at 25 fps. The activities were performed in the bathroom and kitchen.
Figure 7. Confusion matrix of the proposed human activity recognition system. The true labels are presented in the rows, and the labels predicted by the proposed model are presented in the columns.
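Given a confusion matrix laid out as in Figure 7 (rows = true labels, columns = predictions), the per-class recall, precision, and F1 reported below can be derived directly; the small 3-class matrix here is illustrative, not the paper's results.

```python
# Sketch: per-class recall, precision, and F1 from a confusion matrix
# with rows = true labels and columns = predicted labels.
def per_class_metrics(cm):
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # rest of the row
        fp = sum(cm[r][c] for r in range(n)) - tp  # rest of the column
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((recall, precision, f1))
    return metrics

cm = [[9, 1, 0],
      [0, 8, 2],
      [1, 0, 9]]
m = per_class_metrics(cm)
# class 0: recall 9/10 = 0.9, precision 9/10 = 0.9
```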
Classification recall, precision, and F1 scores on the Cornell Activity Dataset (CAD-60).
| Metric | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 | A13 | A14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Recall | 1 | 0.93 | 1 | 0.95 | 0.95 | 0.92 | 0.90 | 0.94 | 0.93 | 0.97 | 0.95 | 0.99 | 0.94 | 0.96 |
| Precision | 0.95 | 1 | 0.94 | 0.96 | 0.96 | 0.91 | 0.95 | 0.93 | 0.93 | 0.95 | 0.97 | 0.90 | 0.95 | 1 |
| F1 | 0.97 | 0.96 | 0.96 | 0.95 | 0.95 | 0.91 | 0.92 | 0.93 | 0.93 | 0.96 | 0.96 | 0.94 | 0.94 | 0.98 |
Performance of our proposed system according to human locations using the CAD-60. Five locations are presented: bathroom, bedroom, kitchen, living room, and office.
| Location | Activity | Prediction (%) |
|---|---|---|
| Bathroom | Brushing teeth | 100% |
| Bedroom | Drinking water | 95% |
| Kitchen | Cooking (chopping) | 93% |
| Living room | Random | 92% |
| Office | Writing on board | 96% |
| Average | | |
Figure 8. Some activities classified for four people in the UIB lab: relaxing on the couch, rinsing teeth, random, and talking on the phone.
Average recognition accuracies (%) of our approach and comparison with previous works using the CAD-60 dataset. The best accuracy level is presented in bold.
| Methods | Year | Acc. (%) |
|---|---|---|
| Dynamic Bayesian Mixture Model | 2014 | 91.9 |
| Support Vector Machine + Hidden Markov Model | 2015 | 77.3 |
| Multiclass Support Vector Machine | 2016 | 93.5 |
| Classifier Ensemble | 2018 | 92.3 |
| Weighted 3D joints | 2019 | 94.4 |
| Our System | 2020 | |