Neha Gupta1,2, Suneet K Gupta1, Rajesh K Pathak3, Vanita Jain2, Parisa Rashidi4, Jasjit S Suri5,6.
Abstract
Human activity recognition (HAR) has multifaceted applications owing to the widespread use of acquisition devices such as smartphones and video cameras and their ability to capture human activity data. While electronic devices and their applications are steadily growing, advances in artificial intelligence (AI) have revolutionized the ability to extract deep hidden information for accurate detection and interpretation. Bringing the three pillars of HAR (rapidly growing acquisition devices, AI, and applications) under one roof yields a better understanding of the field. Many review articles have been published on the general characteristics of HAR, but few have compared all the HAR devices at the same time, and few have explored the impact of evolving AI architectures. This review presents a detailed narration of the three pillars of HAR covering the period from 2011 to 2021. Further, the review presents recommendations for an improved HAR design, its reliability, and stability. The five major findings were: (1) HAR rests on three major pillars: devices, AI, and applications; (2) HAR has dominated the healthcare industry; (3) hybrid AI models are in their infancy and need considerable work to provide a stable and reliable design; further, these trained models need solid prediction, high accuracy, and generalization, and must meet the objectives of the applications without bias; (4) little work was observed in abnormality detection during actions; and (5) almost no work has been done in forecasting actions. We conclude that: (a) the HAR industry will evolve in terms of the three pillars of electronic devices, applications, and the type of AI; (b) AI will provide a powerful impetus to the HAR industry in the future. Supplementary Information: The online version contains supplementary material available at 10.1007/s10462-021-10116-x.
Keywords: Deep learning; Device-free; Human activity recognition; Hybrid models; Imaging; Machine learning; Radio frequency-based identification; Sensor-based; Vision-based
Year: 2022 PMID: 35068651 PMCID: PMC8763438 DOI: 10.1007/s10462-021-10116-x
Source DB: PubMed Journal: Artif Intell Rev ISSN: 0269-2821 Impact factor: 9.588
Fig. 1 Four stages of HAR process (Hx et al. 2017)
Fig. 2 PRISMA model for the study selection
Sensor-based HAR
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| | DS& | Activities | Datasets | Subjects | Scenarios | Total # actions | Performance evaluation | CIT* |
| R1 | WS | ADL: 10, Sports: 11 | Proprietary dataset | 13- 10 M and 3 F | RT, Sports | 1300/1100 | 99.65%, 99.92% | Hsu et al. ( |
| R2 | SPS | ADL: 6, 5 | UCI-HAR, Weakly labelled (WL) | 30 subs (UCI), 7 participants (WL) | Waist mounted SPS | 76,157 | UCIHAR- 93.41%, WL- 93.83% | Wang et al. ( |
| R3 | SPS | ADL: 7 | Proprietary dataset | 100 participants | Texting, handheld, trouser pocket, backpack | 235 977 Samples | CNN-7–95.06%, Ensemble-96.11% | Zhu et al. ( |
| R4 | SPS | ADL: 4 | Proprietary dataset | Active: 147, inactive: 99, walking: 200 and driving: 120. Total 574 | Lab env | 4,99,276 | Mean accuracy-74.39% | Garcia-Gonzalez et al. ( |
| R5 | SPS | ADL: 6 | HHAR | 9 subs | Biking, walking, stairs | 4,39,30,257 | 96.79% | Sundaramoorthy and Gudur ( |
| R6 | SPS | ADL: 6, 8, 9 | HHAR, RWHAR, MobiAct | 9, 15 (M-8, F-7), 57 (M-42, F-15) | Indoor and outdoor | 4,39,30,257/2500 | F1-measure on three datasets | Gouineua et al. ( |
| R7 | SPS | ADL: 8 | RWHAR | 15 subs | Experimental setup | 8,85,360 | F1-Score:0.94 | Lawal and Bano |
| R8 | SPS | ADL: 6, 9, 14 | HHAR, PAMAP, USC-HAD | 9 (HHAR), 15 (RWHAR) | Waist mounted SPS, Experimental setup | 4,39,30,257/38,50,505 | F1-score: 0.848, 0.723, 0.702 | Buffelli and Vandin ( |
| R9 | SPS | ADL: 6 | HHAR | 9 subs | Biking, walking, stairs | 4,39,30,257 | 98% | Yao et al. ( |
| R10 | SPS | ADL: 6 | HHAR, carTrack | 9 subs | Biking, walking, stairs | 4,39,30,257 | 94.2% | Yao et al. ( |
| R11 | WS, SPS | ADL: 6, 7, 17 | UCIHAR, WISDM, OPPORTUNITY | 30 (UCI-HAR), 4 (OPPORTUNITY), 51 (WISDM) | Waist mounted SPS, Experimental setup, Biking, walking, stairs | 76,157/2,551/15,630,426 | F1-score: 92.63%, 95.85%, 95.78% | Xia et al. ( |
| R12 | SPS | ADL: 13, 5, 6 | UniMibShar, MobiAct | 57 (MobiAct) | Fall scenario | 11,771/2500 | 87.30 | Ferrari et al. ( |
| R13 | SPS | ADL: 50 | Vaizman dataset | 60 subs | Indoor, running (Phone in pocket) | 300 k | 92.80 | Fazli et al. ( |
| R14 | WS | 19 (ADL) | 19 NonSense | 13 subs (5 Female 8 male) of age (19–45 years) | 9 activities indoor and 9 outdoor | _ | Precision: 93.41%, recall: 93.16% | Pham et al. ( |
| R15 | WS | ADL: (6, 12,6,11), GC | UCI-HAR, DaphNet, OPPORTUNITY, Skoda | 30 (UCI-HAR), 4 (OPPORTUNITY) | ADL (UCI-HAR & Opportunity) Skoda: GC | 76,157/15,630,426 | 96.7%, 97.8%, 92.5%, 94.1%, 92.6% | Murad and Pyun ( |
| R16 | WS | 18 ADL | 19 NonSense | 13 subs of age (19–45 years) | 9 indoor and 9 outdoor | – | F1-score: 77.7 | Pham et al. ( |
*CIT citation
&DS data source
WS wearable sensor, SPS smartphone sensor, Acc accelerometer, Gyro gyroscope, Mag magnetometer, ADL activities of daily living, RT routine tasks, GC gestures in car maintenance
Vision-based HAR
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| | DS& | Activities | Datasets | Subjects | Scenarios | # Actions | Performance evaluation | CIT* |
| R1 | DC | 10, 10 | UTKinect Action dataset, and CAD-60 | 10 (30 joints), 4 | ADL: Home, kitchen, bedroom, bathroom, living room | 60 | 97%, 96.5% | Phyo et al. ( |
| R2 | DC | 14, 20, 16 | CAD-60, MSR Action3D and MSR Daily Activity 3D | 4 (2 M-2F), 10, 10 | Indoor | 60/1000/567 | 94.12%, 86.81%, 68.75% | Qi et al. ( |
| R3 | Vid | 10 | Weizmann | 9 | Non uniform background, walk | 90 | 95.90% | Deep and Zheng ( |
| R4 | Vid | 101,51 | UCF-101 HMDB-51 | YT, GV, MC | UCF-101: H–O, ADL, RT. HMDB-51: FE, H–O, H–H, ADL | 13,320/7000 | 92.5%,65.4% | Feichtenhofer et al. ( |
| R5 | Vid | 101, 51 | UCF-101 HMDB-51 | YT, GV, MC | UCF-101: H–O, ADL, RT. HMDB-51: FE, H–O, H–H, ADL | 13,320/7000 | UCF 101: 93.4%, HMDB 51: 66.4% | Feichtenhofer et al. ( |
| R6 | Vid | 400, 600 | AVA, Kinetics 400, Kinetics-600, charades | MC and YT | H–O, H–H, ADL, RT | 57,600/3,00,000/5,00,000 | Kinetics:75.6 and 92.1, Kinetics 600: 81.8, 95.1 | Feichtenhofer et al. ( |
| R7 | Vid | - | AVA | Movie data | H–O, H–H, ADL, RT | 57,600 | Feichtenhofer and Ai ( | |
| R8 | Vid | 101, 51, 400, 174 | UCF-101, HMDB-51, kinetics-400, Something Something v1 | YT, GV, MC | H–O, ADL, RT | 13,320/7000/240 k/86,017 | 74.9%, 98.1%, 80.9%, 53.0% | Crasto et al. ( |
| R9 | Vid | Proposed HVU dataset (with YT-8 M), Kinetics-600 and HACS dataset | MC, YT | H–O, H–H | 572 k videos—9 M annotation | _ | Diba et al. ( | |
| R10 | Vid | 101 | UCF 101 | YT | H–O, H–H, playing musical instruments, and sports | 13,320 | 90.2% | Diba et al. ( |
| R11 | Vid | – | UCSD: Ped1 and Ped 2, Avenue, Subway: entrance and exit | SVF | Crowd data from surveillance camera | _ | 90.5%, 88.9%, 90.3%, 91.6%, 98.4% | Wang et al. ( |
| R12 | Vid | 101, 51 | UCF-101, HMDB-51 | YT, GV, MC | UCF-101: H–O, ADL, RT. HMDB-51: FE, H–O, H–H, ADL | 13,320/7000 | 98% | Simonyan and Zisserman ( |
| R13 | Vid | 174, 27, 101, 51, 400 | Something something v1 and v2, Jester, UCF-101, HMDB-51 and Kinetics-400 | Actions while using objects | V1, v2: H–O, Jester: crowd acted, | 108,499/220,887/148,092/300,000/13,320/6766 | 80.4, 89.8%, 99.9%, 91.6%,96.2%, 72.2% | Jiang et al. ( |
| R14 | Vid | UCSD Ped1, ped2 | SVF | Crowd data of surveillance camera | 70/28 | 97–98% accuracy on UMN dataset | Thida et al. ( | |
| R15 | Vid | 10, 6, 8 | Weizmann, KTH, Ballet | ADL, Ballet dance | Weizmann: walk pattern from different angles, Ballet: DVD | 90/6 × 100/44 | _ | Vishwakarma and Singh ( |
| R16 | DC | 17 (15 J), 4 Walk (20 T) | SPHERE, DGD: Gait Dataset | 9 and 7 subjects | SPHERE: with three anomalies (short stop, stairs with right and stairs with left leg, DGD: four gait types | 48/56 | F1-score: SPHERE: 1, DGD: 0.98 | Chaaraoui ( |
| R17 | DC | Full body poses | FLIC, MPII | Elbow and wrist action images | FLIC: images from films, MPII: daily activity pose images | 5003/25 k images | FLIC: 99% (elbow), 97% (wrist), MPII: 91.2% (elbow), 87.1% (wrist) | Newell Alejandro ( |
| R18 | DC | 20 | Proprietary dataset | 10 (9 M–1F) | Daily activities | 6220 frames | 90.91% | Xia et al. ( |
*CIT citation, &DS data source, DC depth camera, Vid video, J joints, ADL activities of daily living, H–H Human to human interaction, H–O Human to object interaction, RT routine tasks, YT YouTube, MC Movie clip, GV Google video
RFID-based HAR
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| | Data source | Activities | Datasets | Subjects | Scenarios | # Actions | Accuracy | CIT* |
| R1 | RFID passive tags | 10 free weight exercise | Proprietary dataset | 15 (10 exercise) | Two weeks in a gym environment | Exercise Data 15 subs for 10 activities | 90% | Ding et al. ( |
| R2 | RFID passive tags | 10 actions and 5 phases of resuscitation | Proprietary dataset | RSS during 16 actual trauma resuscitation | Realtime resuscitation | 16 Trauma resuscitation data | 80.2% | Li et al. ( |
| R3 | RFID passive tags | 23 orientation sensitive activities | Proprietary dataset | 6 (5 F, 1 M) | Lab environment | 23 actions performed by each subject for 120 secs | 96% | Yao et al. ( |
| R4 | RFID passive tags | 10 ADL activities | Replaced sensors with RFID tags in Ordonez dataset | 2 users’ data of 21 days | Indoor environment, H–O | 2747 instances | 78.3% | Du et al. ( |
| R5 | Smart Wall (passive RFID tags) | 12 simple & complex ADL activities | Proprietary dataset | 4 | 12 activities performed by 4 volunteers in mock room with SmartWall | _ | 97.9% | Oguntala et al. ( |
| R6 | RFID tags | 4 ADL | Proprietary dataset | 10 | Indoor | _ | 83.17% | Fan et al. ( |
H–O human–object interaction, M male, F female, ADL activities of daily living, RSS received signal strength
* CIT citation
Device-free HAR
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| | Data source | Activities | Datasets | Subs | Scenarios | # Actions | Accuracy | CIT* |
| R1 | TL-WDR6500 AP Tx and Intel 5300 NIC card RP Rx | 10 daily activities | Proprietary dataset | 6 volunteers (3 male, 3 female) | Meeting room and lab room | 4400 | 94.20% | Yan et al. ( |
| R2 | Intel 5300 NIC | 27 multi variation activities, Gaits: walk 10 m of volunteers | Proprietary dataset | 5 volunteers (multi variation activities), 10 volunteers (of 10 walk gait data) | Lab environment | 50 groups data for each volunteer | 95% | Fei et al. ( |
| R3 | Intel 5300 NIC | 8 activities divided into 2 groups: torso based and gesture based | Proprietary dataset | Test model performance for 6 volunteers’ data | Training in lab while testing in big hall, apartment and small office | 5760 | 96% | Wang et al. ( |
NIC network interface card, Tx transmitter, Rx receiver, M male, F female
*CIT citation
Sensor-based HAR models
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| | # Features | Feature extraction | ML or DL model | Architecture | Metrics | Validation | Hyper-parameters/optimizer/loss function | CIT* |
| R1 | 6/Time domain | Hand-crafted | SVM | SVM classifier for different kernels (polynomial, Radial basis function and linear) | F1-score, accuracy | tenfold | C, ℽ & degree in grid search | Garcia-Gonzalez et al. ( |
| R2 | Spatial features | Automatic | CNN | C (32) − C (64) − C (128) − P − C (128) − P − C (128) − P – FC (128) – SM | Accuracy | 10% data for validation | LR: 0.001, BS: 50/Adam | Wang et al. ( |
| R3 | Frequency domain | Automatic | CNN | 3C with MP and dropout, 2 FC with dropout and SM | F1-score, Precision, recall | CV | LR: 0.01, DO/Adam | Lawal and Bano ( |
| R4 | Time domain | Automatic | CNN-RNN with attention mechanism | TRASEND: (C1- C2- C3)- flatten and concat, merge layer- temporal information extractor using an 8-headed self-attention mechanism RNN, o/p layer | F1-score | Leave one user out and CV | LR: {0.001, 0.0001,,00,001}/Adam/Cross Entropy | Buffelli and Vandin ( |
| R5 | Spatial, Temporal | Automatic | LSTM-CNN | 2 LSTM layer (32 neurons), CNN (64), Max pooling, CNN (128), GAP, BN, o/p layer(softmax) | F1-score, accuracy | _ | LR: 0.001/Adam/Cross entropy | Xia et al. ( |
| R6 | 18/Time & Frequency domain | Hand-crafted | AdaBoost, AdaBoost-CNN, CNN-SVM | For AdaBoost CNN- 4C, AP, FC, SM | Accuracy | Sub-out validation | Experiment with and without personalization similarity | Ferrari et al. ( |
| R7 | 225 sensory features | Automatic | DNN | Layer 1(256), layer2 (512), layer 3 (128), O/p (softmax) | Accuracy, F1-score, Specificity, Sensitivity | 5% training data is used | No. of layers, no. of nodes per layer, appropriate regularization function | Fazli et al. ( |
| R8 | Time domain | Automatic | CNN- CapsNet architecture | SenseCapsNet: I/p, 1D C (K = 5, S = 1), Primary caps: C2(K = 5, S = 2) and squash, Activity caps where k is kernel size and S is strides | Precision, recall | tenfold CV | mini batches:64,LR: 0.01, DO/SGD | Pham et al. ( |
CV cross validation, LOSO leave one subject out, C convolution, P pooling, AP average pooling, MP max pooling, FC fully connected, SM softmax, BN batch normalization layer, LR learning rate, DO dropout, BS batch size, SGD stochastic gradient descent, concat concatenation, Spec. specificity, Sens sensitivity, TL transfer learning, CIT citations
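Several rows in the table above rely on subject-wise validation (leave one user out, sub-out validation, LOSO). The following is a minimal Python sketch of how such folds could be generated, assuming each sample carries a subject identifier. The function name `leave_one_subject_out` and the toy subject labels are hypothetical and are not taken from any of the cited studies.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        # Every sample from every other subject goes into the training split.
        train_idx = sorted(
            i for sid, idxs in by_subject.items() if sid != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy data: six sensor windows recorded from three subjects.
subjects = ["s1", "s1", "s2", "s3", "s3", "s2"]
folds = list(leave_one_subject_out(subjects))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because no subject appears in both splits of a fold, the resulting score reflects generalization to unseen users rather than to unseen windows from already-seen users.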
Vision-based HAR models
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| | # Features | Feature extraction | ML/DL model | Architecture | Metrics | Validation | Hyper-parameters/optimizer/loss function | CIT* |
| R1 | T. domain | Hand-crafted | 3DCNN on Color-skl-MHI and RJI | I/p layer with skeletal joints, Color-skl-MHI followed by 3D-DCNN, RJI followed by 3D-DCNN, decision fusion, o/p | Accuracy | Cross validation | DO ratios for the three hidden layers (0.1%, 0.2%,0.3%)/SGD | Phyo et al. ( |
| R2 | Spatio temporal | Automatic | VGG-16, VGG-19, inception v3 | A 224 × 224 image is input and features are extracted from the fc1 layer, giving a 4096-dimensional vector per image | Accuracy, precision, recall, F1-score | 10% data is used for validation | All 3 CNNs trained on ImageNet then trained on Weizmann reusing same weights | Deep and Zheng ( |
| R3 | Spatio temporal | Automatic | ResNet-50 | C1, MP, C2-C5, AP, FC (2048), SM | Accuracy | Evaluates UCF-101 and HMDB-51 | BS- 128, DO, LR: | Feichtenhofer et al. ( |
| R4 | Spatio temporal | Automatic | ResNet-50 | Raw clip i/P C1- P- C2- C3- C4- C5- GAP- FC- No. of classes. Pre-train on Kinetics-400, Kinetics-600 and kinetics-700 | mAPS, GFLOPS | Evaluate model performance on AVA dataset | LR, WD: | Feichtenhofer and Ai ( |
| R5 | Spatio temporal | Automatic | MERS model with ResNeXt-101 | MERS: Train using flow, freeze weights, train with RGB using MSE loss. MARS: Train using privileged flow n/w, freeze weights, use RGB frames during test phase | top-1 mean accuracy | Kinetics 40: 20 k, MiniKinetics: 5 k | WD = 0.0005, LR = 0.1, momentum = 0.9 and LR = 0.1 for 64f-clips/SGD/Cross entropy | Crasto et al. ( |
| R6 | Spatio temporal | Automatic | HATnet based on ResNet-50 and STCnet | 2D ConvNets: to extract spatial structure, 3DConv: to deal with interaction in frames. Both 2D and 3D use ResNet-50 | Top-1 mAPS | Kinetics 400 and 600 | Fine tune on UCF-101 & HMDB-51/Cross entropy | Diba et al. ( |
| R7 | Spatio-temporal | Automatic | 2D ResNet 50 with STM blocks | Video frames i/p, C1, C2x, C3x, C4x, C5x, FC, o/p. Replace all residual block with STM block (1 × 1 2D conv, followed by CMM and CSTM blocks, then 1 × 1 2D Conv) | top-1, top-5 accuracy | Kinetics 400: 19,095 | LR = 0.01, LR = 0.001 for 25 epochs, momentum = 0.9, WD = 2.5 | Jiang et al. ( |
T time, F frequency, CV cross validation, LOSO leave one subject out, C convolution, P pooling, AP average pooling, MP max pooling, FC fully connected, SM softmax, BN batch normalization layer, LR learning rate, DO dropout, BS batch size, SGD stochastic gradient descent, mAPs mean average precision, GFLOP giga floating point operations per second, Spec specificity, Sens sensitivity, AUC area under curve, EER equal error rate, TL transfer Learning
*CIT citations
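The video models in the table above are mostly compared by top-1 and top-5 accuracy. As a rough illustration of what these metrics measure, the sketch below counts a prediction as correct when the true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the score matrix are made up for the example.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical class scores for four clips over three action classes.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 2, 2]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

Top-k accuracy can only stay the same or rise as k grows, which is why top-5 figures reported on large label sets such as Kinetics are always at least as high as the top-1 figures.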
RFID-based HAR models
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| | # Features | Feature extraction | ML/DL model | Architecture | Metrics | Validation | Hyper-parameters/optimizer/loss function | CIT* |
| R1 | 84 features for F-statistics, Relief-F, Fisher | Hand-crafted | Canonical Correlation Analysis (CCA) | Divide RSSI stream into segments, CCA for extracting features by computing canonical correlation for each feature pair, activity specific dictionary is formed: Sparse coding and dictionary is updated sequentially using K-SVD | F1-score | One sub out validation strategy | _ | Yao et al. ( |
| R2 | Frequency | Automatic | LSTM | I/p layer, two hidden layers and o/p layer | Precision, accuracy | _ | Timestep, neuron in hidden layers/Adam/Cross entropy | Du et al. ( |
| R3 | Frequency | Hand-crafted | Multi variate Gaussian Approach | ADL activity data gathering, score each activity with gaussian pdf, Human Activity Recognition Based on Maximum likelihood Estimation, Activity classification | Accuracy, precision, recall, F1-score, root mean square error (RMSE) | _ | _ | Oguntala et al. ( |
*CIT citations, DTW dynamic time warping, DO dropout, LR learning rate, SVM support vector machine, LSTM long short-term memory
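Row R3 of the table above describes scoring each activity with a Gaussian pdf and classifying by maximum likelihood. The sketch below illustrates that general idea under a simplifying assumption that is not stated in the source: features are treated as independent, so each activity is modelled by per-feature means and variances rather than a full covariance matrix. All function names and the RSSI-like toy values are hypothetical, and this is not the cited authors' implementation.

```python
import math
from collections import defaultdict


def fit_gaussians(samples, labels, var_floor=1e-6):
    """Maximum-likelihood mean and variance of each feature, per activity."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    params = {}
    for activity, rows in grouped.items():
        n, dims = len(rows), len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # Floor the variance so a constant feature does not cause division by zero.
        variances = [
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, var_floor)
            for d in range(dims)
        ]
        params[activity] = (means, variances)
    return params


def log_likelihood(x, means, variances):
    """Log of the product of independent univariate Gaussian densities."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x, params):
    """Return the activity whose Gaussian model gives x the highest likelihood."""
    return max(params, key=lambda a: log_likelihood(x, *params[a]))


# Hypothetical two-tag signal-strength features for two activities.
samples = [[-60.0, -70.0], [-62.0, -71.0], [-61.0, -69.0],
           [-45.0, -80.0], [-47.0, -82.0], [-46.0, -81.0]]
labels = ["sit", "sit", "sit", "walk", "walk", "walk"]
params = fit_gaussians(samples, labels)
print(classify([-61.5, -70.5], params), classify([-46.0, -80.5], params))
```

Working in log space avoids numerical underflow when many features are multiplied together.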
Device-free HAR models
| | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| | # Features | Feature extraction | ML/DL model | Architecture | Metrics | Validation | Hyper-parameters/optimizer/loss function | CIT* |
| R1 | Frequency | Hand-crafted | ELM | AACA: using the difference between the activity and the stationary parts in the signal variance feature. For recognition use 3-layer ELM with an i/p layer with 200 neurons, an o/p (10) and hidden layer (40) neurons | Accuracy | Tested the performance on 1100 samples | No. of hidden layer neurons (after 400 becomes stable), different users, impact of total no. of samples | Yan et al. ( |
| R2 | Frequency CSI | Hand-crafted | DTW: by comparing similarity b/w waveforms | CP decomposition: decompose the CSI signals with CP and each rank-one tensor after decomposition is regarded as the feature. With DTW, we can compare the similarity between 2 waveforms and identify action | Accuracy | Recognition of gaits using MARS | Impact of nearby people, test for system delay | Fei et al. ( |
| R3 | CSI (time & amplitude) | Automatic | LSTM | CNN extracts spatial features from multiple antenna pairs, then CNN o/p is given to LSTM followed by FC | FPR, precision, recall, F1-score | DO, Layer size, recognition method | Wang et al. ( |
*CIT citations, DTW dynamic time warping, FPR false positive rate, DO dropout, ELM extreme learning machine, CSI channel state information
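Row R2 of the table above uses DTW to compare the similarity between two waveforms and identify the action. A minimal sketch of the classic dynamic-programming DTW distance follows, together with a nearest-template lookup. The template names and waveform values are invented for illustration, and the sketch operates on plain one-dimensional sequences rather than decomposed CSI tensors.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_template(query, templates):
    """Return the name of the template waveform closest to the query under DTW."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))


# Hypothetical action templates and a time-stretched query waveform.
templates = {"wave": [0, 1, 0, 1, 0], "push": [0, 1, 2, 3, 4]}
query = [0, 0, 1, 2, 3, 4, 4]
print(nearest_template(query, templates))
```

Because DTW allows one sample to align with several samples of the other sequence, the stretched query matches the "push" template at zero cost even though the two sequences differ in length.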
Fig. 3 a Changing pattern of HAR devices over time, b distribution of machine learning (ML) and deep learning (DL) articles in last decade
Fig. 4 a Types of HAR devices, b sensor-based devices, c vision-based devices, d HAR applications. WS: wearable sensors, SPS: smartphone sensor, sHome: smart home, mHealthcare: health care monitoring, cSurv: crowd surveillance, fDetect: fall detection, eMonitor: exercise monitoring, gAnalysis: gait analysis
Nine review articles published between 2013 and 2020
| S. no. | Citation | Year | Focus area | Keywords | #K* | Period | #CI$ |
|---|---|---|---|---|---|---|---|
| 1 | Wang et al. ( | 2020 | Device-free HAR | HAR, gesture recognition, motion detection, Device-free, dense sensing, IoT, RFID, human object interaction | 7 | 2011–2017 | 212 |
| 2 | Wang et al. ( | 2020 | Sensor-based HAR | Mobile sensing, human-behaviour, human behaviour inference, cloud-edge computing, activity recognition, context awareness | 6 | 2007–2018 | 136 |
| 3 | Demrozi et al. ( | 2020 | Sensor-based HAR | HAR, DL, ML, available datasets, accelerometer, sensors | 6 | 2015–2019 | 219 |
| 4 | Beddiar et al. ( | 2020 | Vision-based | HAR, behaviour understanding, Action representation, action detection, computer vision | 4 | 2010–2019 | 237 |
| 5 | Dhiman Chhavi ( | 2019 | Vision-based recognition | Two-dimension anomaly detection, three-dimensional anomaly detection, crowd anomaly, skeleton based fall detection, AAL | 5 | 2006–2018 | 226 |
| 6 | Lima et al. ( | 2019 | Sensor-based HAR | HAR, smartphones, feature extraction, inertial sensors | 4 | 2006–2017 | 149 |
| 7 | Lara and Labrador ( | 2018 | Sensor-based HAR | AAL, HAR, ADL, activity recognition system (ARS), dataset | 6 | 2003–2017 | 134 |
| 8 | Hx et al. ( | 2017 | Sensor-based HAR | DL, activity recognition, pervasive computing, pattern recognition | 4 | 2011–2017 | 80 |
| 9 | Ke et al. ( | 2013 | Video-based activity recognition | HAR, segmentation, feature representation, health monitoring, security surveillance, human computer interface | 5 | 2003– 2012 | 145 |
#K keywords, $#CI #citations, AAL ambient assistive living, ADL activity of daily living
Multifaceted HAR application with varying HAR devices and AI
| App | Application | HAR device | Activity-type | AI model | Architecture | *CIT |
|---|---|---|---|---|---|---|
| cSurv | Crowd surveillance | Subway camera video footage | Group | DL | Laplacian eigenmap feature extraction and k-means clustering-based recognition | Thida et al. ( |
| | | Video data from mobile clips | ADL | DL | Transfer learning-based model using VGG-16 and InceptionV3 | Deep and Zheng ( |
| mHealthcare | Health monitoring | Smartphone sensor | ADL | DL | CNN (two variants CNN-2 and CNN-7) | Zhu et al. ( |
| | | Smartphone sensor | ADL | DL | Hierarchical model-based on DNN | Fazli et al. ( |
| | | On-body sensor (Watch and shoe) | ADL | DL | CapSense (a CNN capsule n/w) | Pham et al. ( |
| | | RFID data collection in an actual trauma room | Single person | DL | CNN model based on data collected from passive RFID tags for trauma resuscitation | Li et al. ( |
| sHome | Smart home/Smart cities | Smartphone sensor | ADL | DL | CNN (two variants CNN-2 and CNN-7) | Zhu et al. ( |
| | | Depth sensor | ADL | DL | 3DCNN (Color-skl-MHI and RJI) for elderly care in smart home | Phyo et al. ( |
| | | Wireless sensor | AAL | ML | Child activity monitoring based HAR model | Nam and Park ( |
| | | Video | Daily routine | ML | Disabled care HAR model | Jalal et al. ( |
| | | Wireless sensor | Forget and Repeat (AAL) | DL | RNN based smart home HAR model for dementia suffering patients | Arifoglu and Bouchachia ( |
| fDetect | Fall detection | Smartphone accelerometer | ADL | ML and DL | AdaBoost-HC, AdaBoost-CNN, SVM-CNN | Ferrari et al. ( |
| eMonitor | Exercise monitoring | Weizmann, KTH with ADL, Ballet: from DVD | ADL and Ballet dance moves | ML | SVM-KNN with PCA | Vishwakarma and Singh ( |
| | | Free weight exercise data recorded with RFID tags | Free weight exercise | RF | FEMO with Doppler shift profile | Ding et al. ( |
| gAnalysis | Gait analysis | SPHERE, DGD: DAI Gait | Gait pattern | ML | JMH feature and BagofKeyPoses recognition | Chaaraoui ( |
*CIT citation, RF conventional RF profiling, ML machine learning and DL deep learning, ADL activities of daily living, FEMO free weight exercise monitoring, Color-skl-MHI color skeleton motion history images, and RJI relative joint image
Fig. 5 a CNN model for HAR, b TL-based model for HAR, and c hybrid HAR model (CNN-LSTM)
Previous reviews versus proposed review
| | Citation (C1) | Year (C2) | HAR devices (C3): Sensor | HAR devices (C3): Vision | HAR devices (C3): RFID | HAR devices (C3): DFp | AI type (C4): ML | AI type (C4): DL | Dataset (C5): Sensor | Dataset (C5): Vision | Dataset (C5): RFID | Dataset (C5): DFp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R1 | Hussain et al | 2020 | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| R2 | Carvalho and Sofia ( | 2020 | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| R3 | Demrozi et al. ( | 2020 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| R4 | Beddiar et al | 2020 | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| R5 | Dhiman Chhavi ( | 2019 | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| R6 | Lima et al | 2019 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| R7 | De-La-Hoz-Franco ( | 2018 | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| R8 | Hx et al. ( | 2017 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ |
| R9 | Ke et al. ( | 2013 | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| R10 | Proposed | 2021 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
DFp Device-free
Sensor datasets
| | Dataset name/Publicly available (C1) | Year (C2) | Source (C3) | # Classes (C4) | # Actors (C5) | Sensor location (C6) | Activity type (C7): FE | Activity type (C7): H–O | Activity type (C7): H–H | Activity type (C7): ADL | Activity type (C7): G | Activity type (C7): RT | Single/multiple person (C8) | Size (C9) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R1 | Skoda/Pub | 2008 | WS | 10 gestures in car maintenance | 1 subject | 19 sensors on both arms | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | Single | _ |
| R2 | USC-HAD/Pub | 2012 | IMU with Acc, Gyro, Mag | 12 activities | 14 (7 M, 7 F) | Front right hip | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | Single | _ |
| R3 | PAMAP/Pub | 2012 | WS (IMU, 3 Colibri), HR monitor | 18 activities | 9 subjects | _ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | Single | 3,850,505 (52 attributes) |
| R4 | Opportunity/Pub | 2012 | WS:(7 IMU, 12 Acc,7 Loc), OS (12), AS (21) | 6 runs per subject (5 ADL and 6th for drill) | 4 subjects | Upper body, hip and leg | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | Single | 2551 (242 attributes) |
| R5 | UCI-HAR/Pub | 2012 | SPS (Acc, Gyro) | 6 activities | 30 (19–48 years) | Samsung galaxy SII mounted on waist | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | Single | 10,299 (561 attributes) |
| R6 | Heterogeneity (HHAR)/Pub | 2015 | SPS and SW Acc, Gyro | 6 activities | 9 users | 8 SPS & 4 SW | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | Single | 43,930,257 (16 attributes) |
| R7 | MobiAct/Pub | 2016 | SP (Acc, Gyro) | 9 ADL activities and 4 types of falls | 57 subjects (42 M, 15F) of (20–57 years) | Samsung Galaxy S3 SP in trousers’ pocket | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | Single | 2500 |
| R8 | UniMibShar/Pub | 2017 | SPS | 17 activities (9 ADL and 8 fall) | 30 of (18- 60 years) | _ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | Single | 11,771 samples |
| R9 | WISDM/Pub | 2019 | SPS and SW’s Acc, Gyro | 18 activities | 51 | _ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | Single | 15,630,426 |
| R10 | 19NonSense/Pri | 2020 | e-Shoe, Samsung gear SW | 18 activities (9 indoor and 9 outdoor sports) | 13 (5F, 8 M) of (19–45 years) | Foot, arm | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | Single | Not public dataset |
* CIT citations, SPS smartphone sensor, WS wearable sensor, SP smartphone, Acc accelerometer, Gyro gyroscope, Loc location, SW smartwatch, ADL activities of daily living, M male, F female, Pub publicly available, Prop proprietary
Video datasets
| | Dataset name/Publicly available (C1) | Year (C2) | Source (C3) | # Classes (C4) | # Actor (C5) | Body part involved (C6) | Activity type (C7): FE | Activity type (C7): H–O | Activity type (C7): H–H | Activity type (C7): ADL | Activity type (C7): G | Activity type (C7): RT | Single/multiple person (C8) | Size (C9) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R1 | CAD-60/Pub | 2009 | Kinect | 5 (environments) 12 (activities) | 4 (2 M, 2F) | Whole body joint | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | Single | 60 videos |
| R2 | CAD-120/Pub | 2009 | Kinect | 10 (High level) 10 (sub activity labels) 12 (sub affordance labels) | 4 (2 M, 2F) | Whole body joint | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | Single | 120 videos |
| R3 | MSR Action 3D/Pub | 2009 | DC | 10 subjects, 20 actions, 20 3D joints | 10 | Whole body | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | Single | 336 action files |
| R4 | UT Kinect/Pub | 2012 | Kinect | 10 actions | 10 | Whole body | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | Single | 1.79 GB |
| R5 | AVA/Pub | 2018 | MC | 80 atomic visual actions | 192 movies | Whole body | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | Both | 57,600 videos |
| R6 | UCF-101/Pub | 2012 | YT | 101 actions | 2,500 videos | Whole body | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ | Both | 13 K clips 27 h |
| R7 | HMDB51/Pub | 2011 | YT, GV, MC | 51 action classes | 3,312 videos | Whole body | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | Both | 6,766 clips of 2 GB |
| R8 | Charades/Pub | 2016 | ADL | 157 classes | 267 | Whole body | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | Single | 4855 KB |
| R9 | Kinetics 400/Pub | 2017 | YT | 400 | 400–1000 clips/class | Whole body | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | Both | 3,00,000 |
| R10 | Kinetics 600/Pub | 2018 | YT | 600 | 600–1000 clips/class | Whole body | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | Both | 5,00,000 |
| R11 | SomethingSomething/Pub | 2018 | Objects Actions | 174 classes | H-I actions | Whole body | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | H–O interaction | 108,499 videos |
| R12 | Weizmann/Pub | 2005 | ADL | 10 action classes | 2 subs | Whole body | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | Single | 90 videos |
| R13 | UCSD/Pub | 2013 | camera | Peds1 and Peds2 | Subway | People group | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | Surveillance data | Peds1: 60 & Peds2: 28 |
*CIT citations, ADL activities of daily living, M male, F female, YT YouTube, MC movie clip, DC depth camera, Pub publicly available, Prop proprietary
Evaluation metrics
| S. no. | Metrics | Description |
|---|---|---|
| 1 | Accuracy: $(TP + TN)/(TP + TN + FP + FN)$ | Ratio of the number of correct predictions to the total number of input samples |
| 2 | Precision: $P = TP/(TP + FP)$ | Number of correct positives divided by the number of predicted positives |
| 3 | Recall: $R = TP/(TP + FN)$ | Number of correct positives divided by the total number of true positives and false negatives |
| 4 | F1-score: $2PR/(P + R)$ | Harmonic mean of precision and recall |
| 5 | False positive rate: $FP/(FP + TN)$ | Proportion of actual negatives predicted as positives |
| 6 | Sensitivity: $TP/(TP + FN)$ | Proportion of actual positives predicted as positives |
| 7 | Positive likelihood ratio (LHR) | LHR assesses the goodness of fit of two competing statistical models based on their likelihoods |
P precision, R recall, TP true positive, TN true negative, FP false positive, and FN false negative, LHR likelihood ratio
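The descriptions in the table above can be turned into ratios of the confusion counts TP, TN, FP, and FN defined in the legend. The sketch below is one way to compute them in Python. The function name and the example counts are hypothetical, and labelling rows 5 and 6 as false positive rate and sensitivity is an interpretation of their descriptions rather than something stated explicitly in the source.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common binary-classification metrics from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,  # identical to sensitivity (true positive rate)
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


# Hypothetical counts for a single activity class treated as "positive".
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For multi-class HAR problems these quantities are usually computed per activity in a one-vs-rest fashion and then averaged, which is why the tables above report a single F1-score per dataset.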