Saad Irfan1, Nadeem Anjum1, Nayyer Masood1, Ahmad S Khattak2, Naeem Ramzan3.
Abstract
In recent years, a plethora of algorithms have been devised for efficient human activity recognition. Most of these algorithms consider only basic human activities and neglect postural transitions because of their low incidence and short duration. However, postural transitions play a significant role in the implementation of an activity recognition framework and cannot be neglected. This work proposes a hybrid multi-model activity recognition approach that handles both basic and transition activities by running multiple deep learning models simultaneously. For final classification, a dynamic decision fusion module is introduced. The experiments are performed on publicly available datasets. The proposed approach achieved a classification accuracy of 96.11% and 98.38% for the transition and basic activities, respectively. The results show that the proposed method outperforms state-of-the-art methods in terms of accuracy and precision.
Keywords: deep learning; human activity recognition; hybrid models; transition activities
Year: 2021 PMID: 34960321 PMCID: PMC8706790 DOI: 10.3390/s21248227
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
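The abstract mentions a dynamic decision fusion module that combines several deep learning models, but this record does not describe its rule. The sketch below is therefore only a generic confidence-weighted soft vote, shown to illustrate what fusing per-model class probabilities can look like. The function name `fuse_decisions`, the model keys, and the probability values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def fuse_decisions(model_probs: Dict[str, List[float]]) -> Tuple[int, List[float]]:
    """Confidence-weighted soft vote over per-model class probabilities.

    Hypothetical rule for illustration only; not the paper's fusion module.
    """
    if not model_probs:
        raise ValueError("need at least one model output")
    n_classes = len(next(iter(model_probs.values())))
    fused = [0.0] * n_classes
    for name, probs in model_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"{name}: expected {n_classes} probabilities")
        weight = max(probs)  # weight each model by its own top probability
        for k, p in enumerate(probs):
            fused[k] += weight * p
    total = sum(fused)
    if total == 0:
        raise ValueError("all probabilities are zero")
    fused = [v / total for v in fused]  # renormalise so the scores sum to 1
    label = max(range(n_classes), key=lambda k: fused[k])
    return label, fused


# Made-up softmax outputs for one input window and three classes.
outputs = {
    "cnn": [0.6, 0.3, 0.1],
    "lstm": [0.2, 0.7, 0.1],
    "bilstm": [0.1, 0.8, 0.1],
}
label, scores = fuse_decisions(outputs)
print(label, [round(s, 3) for s in scores])
```

Weighting by each model's top probability lets a confident model outvote an uncertain one. A fixed per-model weight, for example one derived from validation accuracy, would be an equally plausible choice under the same interface.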
Literature Summary.
| Ref. | Model Type | Network | Accuracy (%) | Transition Activities | Weaknesses |
|---|---|---|---|---|---|
| [ | Machine Learning | SVM + SFFS | 96.80 | No | Higher accuracy on smaller datasets—increase in data causes decrease in accuracy. |
| [ | Machine Learning | STD-TA | 80.00 | Yes | A conventional SVM with an average accuracy that extracts statistical features to differentiate between transitional and basic activities. |
| [ | Machine Learning | SVM-TED | 81.62 | Yes | A traditional SVM with a transition event detection module to detect postural transitions but lacks accuracy for efficient identification of an action. |
| [ | Deep Learning | CNN | 91.00 | No | Requires strongly labeled data as well as increased features in data. |
| [ | Deep Learning | BiLSTM | 87.50 | Yes | Single BiLSTM unit cannot extract quality features from the input, no past information to correlate the data with. Works better on time series data. |
| [ | Deep Learning | Multi-LSTM | 89.00 | Yes | Multiple pipelined LSTM units used in this approach, causing the network to train slowly and increasing the complexity of the whole model. Any fault or irregularity in a single LSTM unit affects the overall pipeline of LSTM units. |
| [ | Deep Learning | DBN | 95.80 | Yes | DBN makes the network architecture more complex to train, and it has been replaced with ReLU, which better handles the vanishing gradient problem. |
| [ | Hybrid | INN + RNN | 94.00 | No | INN has poor initialization, which makes it hard to debug, thus increasing the cost of the system. Moreover, a fine-tuned CNN can achieve the same or better performance than INN, which is no longer used in state-of-the-art systems. |
| [ | Hybrid | GBDT | 94.90 | Yes | Gives best results on smaller datasets whereas accuracy decreases as the data increase. |
| [ | Hybrid | CNN + LSTM | 95.80 | Yes | The model itself is complex and the CNN used is a conventional CNN with a basic three-layered structure that is not optimized at all. Complex activities and their transitions were not considered. |
Figure 1. Architecture of the proposed system.
Figure 2. LSTM unit.
LSTM & BiLSTM Parameters.
| Parameter | Value-Dataset A | Value-Dataset B |
|---|---|---|
| | 561 | 60 |
| | 0.002 | 0.002 |
| | 100 | 50 |
| Optimizer | Adam | Adam |
| | 0.5 | 0.5 |
| Epochs | 400 | 100 |
Figure 3. BiLSTM unit.
Figure 4. CNN unit.
CNN parameters.
| Parameter | Value-Dataset A | Value-Dataset B |
|---|---|---|
| | 24 × 24 | 8 × 8 |
| | 1 | 1 |
| | 8 | 8 |
| | 18 | 18 |
| | 2 × 4 | 2 × 4 |
| | 2 × 8 | 2 × 8 |
| | 2 | 2 |
| | ReLU | ReLU |
| | 0.002 | 0.002 |
| Epochs | 50 | 50 |
Human Activities and Postural Transitions dataset (Dataset A) overview.
| Activity | Training Instances | Test Instances |
|---|---|---|
| Walking | 1226 | 496 |
| Walking Upstairs | 1073 | 471 |
| Walking Downstairs | 987 | 420 |
| Sitting | 1293 | 508 |
| Standing | 1423 | 556 |
| Laying | 1413 | 545 |
| Stand to Sit | 47 | 23 |
| Sit to Stand | 23 | 10 |
| Sit to Lie | 75 | 32 |
| Lie to Sit | 60 | 25 |
| Stand to Lie | 90 | 49 |
| Lie to Stand | 57 | 27 |
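The counts in the Dataset A table show how rare the postural transitions are relative to the basic activities. The short calculation below sums the training instances from that table and reports the share that belongs to the six transition classes. The grouping into basic and transition classes is read off the activity names and is an assumption of this sketch.

```python
# Training instances copied from the Dataset A overview table.
train_counts = {
    "Walking": 1226,
    "Walking Upstairs": 1073,
    "Walking Downstairs": 987,
    "Sitting": 1293,
    "Standing": 1423,
    "Laying": 1413,
    "Stand to Sit": 47,
    "Sit to Stand": 23,
    "Sit to Lie": 75,
    "Lie to Sit": 60,
    "Stand to Lie": 90,
    "Lie to Stand": 57,
}

# Assumption: every "X to Y" activity is a postural transition.
transitions = {name for name in train_counts if " to " in name}

total = sum(train_counts.values())
transition_total = sum(train_counts[name] for name in transitions)
share = transition_total / total

print(f"{transition_total} of {total} training instances "
      f"({share:.2%}) are postural transitions")
```

With these counts the transitions account for 352 of 7767 training instances, about 4.53%, which is why per-class precision and recall are more informative for them than a single overall accuracy figure.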
Human Activity dataset (Dataset B) overview.
| Activity | Training Instances | Test Instances |
|---|---|---|
| Sitting | 5265 | 585 |
| Standing | 5598 | 622 |
| Walking | 4856 | 540 |
| Running | 3561 | 395 |
| Dancing | 2388 | 267 |
Comparison with state-of-the-art approaches in terms of average accuracy (Dataset A).
| Approach | Average Accuracy (%) |
|---|---|
| STD-TA [ | 80.00 |
| SVM-TED [ | 81.62 |
| LSTM [ | 89.00 |
| GBDT [ | 94.90 |
| DBN [ | 95.80 |
| CNN-LSTM [ | 95.80 |
| Proposed | 96.11 |
Accuracy, precision, recall and F-measure of various activities—dataset A.
| Activity ID | Accuracy (%) | Precision (%) | Recall (%) | F-Measure (%) |
|---|---|---|---|---|
| A1 | 99.34 | 97.00 | 99.00 | 98.00 |
| A2 | 99.11 | 98.00 | 96.00 | 97.00 |
| A3 | 99.56 | 99.00 | 98.00 | 98.00 |
| A4 | 98.13 | 96.00 | 92.00 | 94.00 |
| A5 | 98.36 | 94.00 | 97.00 | 95.00 |
| A6 | 100.00 | 100.00 | 100.00 | 100.00 |
| A7 | 99.49 | 62.00 | 78.00 | 69.00 |
| A8 | 99.97 | 91.00 | 100.00 | 95.00 |
| A9 | 99.75 | 84.00 | 90.00 | 87.00 |
| A10 | 99.59 | 73.00 | 76.00 | 75.00 |
| A11 | 99.46 | 80.00 | 82.00 | 81.00 |
| A12 | 99.46 | 70.00 | 56.00 | 62.00 |
Accuracy, precision, recall and F-measure of various activities—dataset B.
| Activity ID | Accuracy (%) | Precision (%) | Recall (%) | F-Measure (%) |
|---|---|---|---|---|
| B1 | 99.92 | 100.00 | 100.00 | 100.00 |
| B2 | 99.88 | 100.00 | 100.00 | 100.00 |
| B3 | 99.58 | 99.00 | 99.00 | 99.00 |
| B4 | 98.71 | 96.00 | 96.00 | 96.00 |
| B5 | 98.67 | 93.00 | 95.00 | 94.00 |
Average accuracy of the proposed approach on two datasets.
| Approach | Average Accuracy (%), Dataset A | Average Accuracy (%), Dataset B |
|---|---|---|
| Proposed Approach | 96.11 | 98.38 |
Confusion matrix of activities—dataset A.
| Actual / Predicted | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 490 | 12 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A2 | 3 | 454 | 7 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| A3 | 3 | 1 | 410 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A4 | 0 | 0 | 0 | 467 | 15 | 0 | 2 | 0 | 0 | 0 | 1 | 0 |
| A5 | 0 | 0 | 0 | 35 | 540 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| A6 | 0 | 0 | 0 | 0 | 0 | 545 | 0 | 0 | 0 | 0 | 0 | 0 |
| A7 | 0 | 4 | 0 | 6 | 1 | 0 | 18 | 0 | 0 | 0 | 0 | 0 |
| A8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 10 | 0 | 0 | 0 | 0 |
| A9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 0 | 5 | 0 |
| A10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 2 | 5 |
| A11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 36 | 6 |
| A12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 14 |
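The per-activity accuracy, precision, recall and F-measure can be recomputed from the dataset A confusion matrix. The record does not state the formulas, so the sketch below assumes the standard one-vs-rest definitions. The helper name `class_metrics` is hypothetical, and the matrix orientation is exposed as a parameter. Reading each row as the predictions for that class is an assumption of this sketch, chosen because it gives A7 values that round to the figures in the metrics table (62, 78, 69 and 99.49).

```python
from typing import Dict, List

# Cell counts copied from the dataset A confusion matrix (rows A1..A12).
cm: List[List[int]] = [
    [490, 12, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [3, 454, 7, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [3, 1, 410, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 467, 15, 0, 2, 0, 0, 0, 1, 0],
    [0, 0, 0, 35, 540, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 545, 0, 0, 0, 0, 0, 0],
    [0, 4, 0, 6, 1, 0, 18, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 10, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 5, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 2, 5],
    [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 36, 6],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 14],
]


def class_metrics(matrix: List[List[int]], k: int,
                  rows_are_predictions: bool = True) -> Dict[str, float]:
    """One-vs-rest accuracy, precision, recall and F1 for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    row_total = sum(matrix[k])
    col_total = sum(row[k] for row in matrix)
    if rows_are_predictions:
        predicted, actual = row_total, col_total
    else:
        predicted, actual = col_total, row_total
    fp = predicted - tp          # predicted as k, but another class
    fn = actual - tp             # class k, but predicted as something else
    tn = total - tp - fp - fn
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


a7 = class_metrics(cm, 6)  # index 6 is activity A7
print({name: round(100 * value, 2) for name, value in a7.items()})
```

Passing `rows_are_predictions=False` swaps the precision and recall values, so the flag is worth checking against whichever axis labels a given matrix uses.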
Confusion matrix of activities—dataset B.
| Actual / Predicted | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|
| A1 | 584 | 1 | 0 | 0 | 0 |
| A2 | 0 | 619 | 7 | 0 | 0 |
| A3 | 1 | 2 | 536 | 3 | 0 |
| A4 | 0 | 0 | 0 | 378 | 14 |
| A5 | 0 | 0 | 4 | 14 | 251 |
Figure 5. Execution time of the CNN-LSTM and proposed approach on dataset A.