Ye Htet, Thi Thi Zin, Pyke Tin, Hiroki Tamura, Kazuhiro Kondo, Etsuo Chosa.
Abstract
Addressing the problems facing the elderly, whether living independently or in managed care facilities, is considered one of the most important applications of action recognition research. However, existing systems are not yet ready for automation or for effective use in continuous operation. Therefore, we have developed theoretical and practical foundations for a new real-time action recognition system. This system is based on a Hidden Markov Model (HMM) combined with colorized depth maps. The use of depth cameras provides privacy protection. Colorizing depth images in the hue color space makes it possible to compress and visualize the depth data and to detect persons. The detector used for person detection is You Only Look Once (YOLOv5). Appearance and motion features are extracted from depth map sequences and are represented with a Histogram of Oriented Gradients (HOG). These HOG feature vectors are converted into observation sequences and then fed into the HMM. Finally, the Viterbi Algorithm is applied to recognize the sequential actions. The system has been tested on real-world data featuring three participants in a care center. We tried out three combinations of HMM with classification algorithms and found that a fusion with Support Vector Machine (SVM) had the best average results, achieving an accuracy of 84.04%.
Keywords: Hidden Markov Model; Histogram of Oriented Gradients; Support Vector Machine; Viterbi Algorithm; YOLOv5; action recognition; depth colorization; e-Healthcare; older persons; person detection
Year: 2022 PMID: 36231351 PMCID: PMC9566476 DOI: 10.3390/ijerph191912055
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. Overview of the proposed system.
Comparison of system storage space.
| Data Type | Data Recorded Duration | Storage Space |
|---|---|---|
| CSV files | 1 h | 1.73 GB |
| Colorized Images | 1 h | 100 MB |
Figure 2. The hue color scale used for the colorization process.
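
The idea of colorizing depth in the hue color space can be illustrated with a minimal sketch. It assumes a linear mapping from depth to hue; the working range `D_MIN_MM`/`D_MAX_MM`, the hue span, and the function names are hypothetical choices for illustration, not the authors' exact scheme.

```python
import colorsys

# Hypothetical working range of the depth camera, in millimetres.
D_MIN_MM, D_MAX_MM = 500, 4000
# Assumption: use hues from red (0) to blue (2/3) so the scale never wraps around.
HUE_SPAN = 2.0 / 3.0


def depth_to_rgb(depth_mm):
    """Encode one depth value as an 8-bit RGB pixel; out-of-range depth becomes black."""
    if not (D_MIN_MM <= depth_mm <= D_MAX_MM):
        return (0, 0, 0)
    t = (depth_mm - D_MIN_MM) / (D_MAX_MM - D_MIN_MM)
    r, g, b = colorsys.hsv_to_rgb(t * HUE_SPAN, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def rgb_to_depth(rgb):
    """Approximately recover depth from a colorized pixel; black means 'no depth'."""
    if rgb == (0, 0, 0):
        return None
    h, _, _ = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    return D_MIN_MM + (h / HUE_SPAN) * (D_MAX_MM - D_MIN_MM)


if __name__ == "__main__":
    pixel = depth_to_rgb(1800)
    print(pixel, rgb_to_depth(pixel))
```

Because each channel is quantized to 8 bits, the recovered depth is only approximate; under the assumed range the round-trip error stays within a few millimetres.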
Figure 3. Alternating pixel values between depth and color values.
Figure 4. Results of the colorization process.
Figure 5. Sample person detection results.
Figure 6. Person silhouette cropping, resizing, and depth recovery.
Figure 7. The architecture of feature extraction.
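
To make the HOG representation more concrete, the following is a simplified sketch of the core step, an orientation histogram for a single cell. It is an illustration only: it uses hard binning, skips border pixels, and omits the block normalization of a full HOG descriptor. The function name and the choice of 9 bins are assumptions, not details taken from the paper.

```python
import math


def hog_cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    `cell` is a 2D list of intensity (or depth) values. Border pixels are
    skipped so that central differences are always defined.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    # L2-normalise so the histogram does not depend on overall contrast.
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


if __name__ == "__main__":
    # A cell whose values increase left to right: all gradient energy is horizontal.
    ramp = [[10 * x for x in range(6)] for _ in range(6)]
    print(hog_cell_histogram(ramp))
```

Concatenating such histograms over all cells of the cropped person region gives one feature vector per frame.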
Figure 8. Specific action images.
Figure 9. HMM model structure.
Training datasets used for calculating HMM emission probability distribution B.
| Action | Transition | Seated | Standing | Sitting | Lying |
|---|---|---|---|---|---|
| State | | | | | |
| # Sequence in Dataset- | 900 | 900 | 900 | 900 | 900 |
| # Sequence in Dataset- | 100 | 100 | 100 | 100 | 100 |
Figure 10. Heatmap visualization after training with the Baum-Welch Algorithm: (a) HMM transition probability matrix A; (b) HMM emission probability matrix B.
Figure 11. HMM prediction on testing image sequences.
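
The Viterbi decoding step can be sketched as follows. This is a generic log-space implementation for discrete observations. The two-state model and all probabilities in the example are toy values invented for illustration; they are not the trained matrices A and B from the paper.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for a discrete observation sequence."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = log-probability of the best path that ends in state s
    best = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    back = []  # one dict of back-pointers per time step after the first
    for o in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            r_best = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            pointers[s] = r_best
            best[s] = prev[r_best] + log(trans_p[r_best][s]) + log(emit_p[s][o])
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example (hypothetical numbers): actions tend to persist between frames,
# and the per-frame observation is right 80% of the time.
STATES = ["Seated", "Standing"]
START = {"Seated": 0.5, "Standing": 0.5}
TRANS = {
    "Seated": {"Seated": 0.9, "Standing": 0.1},
    "Standing": {"Seated": 0.1, "Standing": 0.9},
}
EMIT = {
    "Seated": {"Seated": 0.8, "Standing": 0.2},
    "Standing": {"Seated": 0.2, "Standing": 0.8},
}

if __name__ == "__main__":
    noisy = ["Seated", "Seated", "Standing", "Seated", "Seated"]
    print(viterbi(noisy, STATES, START, TRANS, EMIT))
```

In this toy setting the single "Standing" observation is treated as noise and the decoded sequence stays "Seated" throughout, which shows how the transition probabilities smooth frame-level errors.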
Confusion matrix of HMM prediction results tested on the training dataset.
Mean + HMM (rows: actual actions; columns: predicted actions)
| Actual Action | Transition | Seated | Standing | Sitting | Lying |
|---|---|---|---|---|---|
| Transition | 90 | 1 | 7 | 2 | 0 |
| Seated | 14 | 86 | 0 | 0 | 0 |
| Standing | 7 | 0 | 93 | 0 | 0 |
| Sitting | 14 | 0 | 0 | 86 | 0 |
| Lying | 2 | 0 | 0 | 0 | 98 |
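
As a worked example of how a confusion matrix of this kind is read, the short sketch below computes the per-action recall and the overall accuracy from the Mean + HMM counts in the table. The function names are illustrative, and the overall figure is derived here from the table rather than reported in the source.

```python
LABELS = ["Transition", "Seated", "Standing", "Sitting", "Lying"]

# Rows are actual actions, columns are predicted actions (Mean + HMM table).
CONFUSION = [
    [90, 1, 7, 2, 0],
    [14, 86, 0, 0, 0],
    [7, 0, 93, 0, 0],
    [14, 0, 0, 86, 0],
    [2, 0, 0, 0, 98],
]


def per_class_recall(matrix):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    for label, recall in zip(LABELS, per_class_recall(CONFUSION)):
        print(f"{label}: {recall:.2%}")
    print(f"Overall: {overall_accuracy(CONFUSION):.2%}")
```

Each row sums to 100 sequences, so the diagonal entries can be read directly as percentages; most errors are confusions with the Transition state.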
Confusion matrices of HMM prediction results tested on the training dataset.
(a) k-NN + HMM (rows: actual actions; columns: predicted actions)
| Actual Action | Transition | Seated | Standing | Sitting | Lying |
|---|---|---|---|---|---|
| Transition | 100 | 0 | 0 | 0 | 0 |
| Seated | 1 | 99 | 0 | 0 | 0 |
| Standing | 0 | 0 | 100 | 0 | 0 |
| Sitting | 1 | 0 | 0 | 99 | 0 |
| Lying | 1 | 0 | 0 | 0 | 99 |
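
(b) SVM + HMM (rows: actual actions; columns: predicted actions)
| Actual Action | Transition | Seated | Standing | Sitting | Lying |
|---|---|---|---|---|---|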
| Transition | 100 | 0 | 0 | 0 | 0 |
| Seated | 0 | 100 | 0 | 0 | 0 |
| Standing | 1 | 0 | 99 | 0 | 0 |
| Sitting | 2 | 0 | 0 | 98 | 0 |
| Lying | 0 | 0 | 0 | 0 | 100 |
Comparison of three methods (average accuracy, in %, over all sequences in each room).
| Room ID | Total Sequences | Mean + HMM | k-NN + HMM | SVM + HMM |
|---|---|---|---|---|
| 1 | 22 | 87.05 | 95.19 | 90.28 |
| 2 | 10 | 74.83 | 89.01 | 81.37 |
| 3 | 17 | 79.41 | 57.56 | 80.48 |
| Average Accuracy | | 80.43 | 80.59 | 84.04 |
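
The "Average Accuracy" row is consistent with an unweighted mean of the three room-level accuracies, so each room counts equally regardless of how many sequences it contributed. The short check below reproduces the reported values from the table; the variable and function names are illustrative.

```python
# Room-level average accuracies (%) copied from the comparison table.
ROOM_ACCURACY = {
    "Mean + HMM": [87.05, 74.83, 79.41],
    "k-NN + HMM": [95.19, 89.01, 57.56],
    "SVM + HMM": [90.28, 81.37, 80.48],
}


def unweighted_mean(values):
    """Plain arithmetic mean: every room has the same weight."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for method, accuracies in ROOM_ACCURACY.items():
        print(f"{method}: {unweighted_mean(accuracies):.2f}")
```

Note that k-NN + HMM is the best method in Rooms 1 and 2 but drops sharply in Room 3, which is why SVM + HMM has the higher overall average.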
Accuracy (%) for each specific action tested with the SVM + HMM method.
| Room ID | Transition | Seated | Standing | Sitting | Lying | Overall |
|---|---|---|---|---|---|---|
| 1 | 63.37 | 95.45 | 97.24 | 95.41 | 91.65 | 90.28 |
| 2 | 54.93 | 75.09 | - | 98.83 | 74.61 | 81.37 |
| 3 | 74.13 | 48.68 | 59.96 | 83.87 | 91.89 | 80.48 |
| Average | 64.14 | 73.07 | 78.60 | 92.70 | 86.05 | 84.04 |
Accuracy for each testing sequence from each room tested with the SVM + HMM method.
(a) Room 1 Sequences
| Sequence | Duration (hour) | Total Frames | Start Date and Time | Accuracy (%) | Processing Time (hour) |
|---|---|---|---|---|---|
| Seq_1 | 0.05 | 168 | 21 October 2019_07:19:43 | 86.31 | 0.02 |
| Seq_2 | 0.12 | 433 | 24 October 2019_08:44:51 | 87.76 | 0.07 |
| Seq_3 | 0.15 | 530 | 21 October 2019_18:25:28 | 85.66 | 0.07 |
| Seq_4 | 0.34 | 1230 | 19 October 2019_12:47:07 | 72.11 | 0.19 |
| Seq_5 | 0.47 | 1684 | 21 October 2019_17:37:38 | 91.75 | 0.25 |
| Seq_6 | 0.61 | 2192 | 22 October 2019_17:16:55 | 94.30 | 0.33 |
| Seq_7 | 0.62 | 2235 | 19 October 2019_13:32:27 | 96.06 | 0.35 |
| Seq_8 | 1.08 | 3876 | 24 October 2019_07:27:36 | 83.18 | 0.58 |
| Seq_9 | 1.85 | 6673 | 24 October 2019_11:34:20 | 86.68 | 1.00 |
| Seq_10 | 1.86 | 6687 | 19 October 2019_08:12:33 | 95.10 | 1.02 |
| Seq_11 | 2.60 | 9347 | 21 October 2019_11:54:52 | 90.46 | 1.38 |
| Seq_12 | 2.74 | 9865 | 20 October 2019_11:44:59 | 97.77 | 1.51 |
| Seq_13 | 2.80 | 10,063 | 22 October 2019_11:19:33 | 85.44 | 1.55 |
| Seq_14 | 3.04 | 10,927 | 18 October 2019_12:08:33 | 99.35 | 1.66 |
| Seq_15 | 3.25 | 11,688 | 23 October 2019_11:38:46 | 96.89 | 1.78 |
| Seq_16 | 7.34 | 26,412 | 12 October 2019_20:54:48 | 96.41 | 4.05 |
| Seq_17 | 10.98 | 39,535 | 20 October 2019_18:57:47 | 86.78 | 6.10 |
| Seq_18 | 11.16 | 40,158 | 21 October 2019_18:40:47 | 86.38 | 6.23 |
| Seq_19 | 11.50 | 41,407 | 22 October 2019_18:02:55 | 92.67 | 6.42 |
| Seq_20 | 11.73 | 42,234 | 18 October 2019_17:58:57 | 91.69 | 6.90 |
| Seq_21 | 12.18 | 43,842 | 23 October 2019_17:05:09 | 93.37 | 7.36 |
| Seq_22 | 12.60 | 45,366 | 19 October 2019_17:51:09 | 90.11 | 7.99 |
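| Average Accuracy | | | | 90.28 | |

(b) Room 2 Sequences
| Sequence | Duration (hour) | Total Frames | Start Date and Time | Accuracy (%) | Processing Time (hour) |
|---|---|---|---|---|---|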
| Seq_1 | 0.10 | 362 | 26 October 2019_07:20:50 | 67.96 | 0.05 |
| Seq_2 | 0.30 | 1078 | 26 October 2019_07:32:06 | 92.12 | 0.18 |
| Seq_3 | 0.68 | 2455 | 28 October 2019_04:45:41 | 93.93 | 0.4 |
| Seq_4 | 1.17 | 4214 | 26 October 2019_06:02:33 | 92.05 | 0.71 |
| Seq_5 | 2.17 | 7808 | 27 October 2019_04:35:50 | 83.03 | 1.32 |
| Seq_6 | 2.36 | 8494 | 26 October 2019_09:54:09 | 94.59 | 1.43 |
| Seq_7 | 2.64 | 9488 | 25 October 2019_11:07:16 | 77.18 | 1.58 |
| Seq_8 | 11.35 | 40,846 | 25 October 2019_17:01:02 | 53.04 | 6.89 |
| Seq_9 | 11.73 | 42,214 | 26 October 2019_15:10:58 | 63.57 | 7.55 |
| Seq_10 | 12.19 | 43,896 | 27 October 2019_14:02:05 | 96.22 | 7.4 |
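| Average Accuracy | | | | 81.37 | |

(c) Room 3 Sequences
| Sequence | Duration (hour) | Total Frames | Start Date and Time | Accuracy (%) | Processing Time (hour) |
|---|---|---|---|---|---|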
| Seq_1 | 0.05 | 171 | 26 October 2019_07:56:27 | 84.80 | 0.03 |
| Seq_2 | 0.42 | 1511 | 27 October 2019_04:24:47 | 72.34 | 0.27 |
| Seq_3 | 0.45 | 1630 | 12 October 2019_12:09:13 | 93.56 | 0.28 |
| Seq_4 | 0.54 | 1941 | 26 October 2019_07:23:18 | 80.94 | 0.35 |
| Seq_5 | 0.65 | 2325 | 22 October 2019_19:16:20 | 81.29 | 0.38 |
| Seq_6 | 0.87 | 3146 | 25 October 2019_12:21:40 | 69.52 | 0.55 |
| Seq_7 | 1.03 | 3697 | 12 October 2019_13:47:28 | 93.70 | 0.64 |
| Seq_8 | 1.42 | 5112 | 27 October 2019_06:12:16 | 64.10 | 0.91 |
| Seq_9 | 2.01 | 7240 | 28 October 2019_05:11:31 | 79.93 | 1.29 |
| Seq_10 | 10.58 | 38,085 | 18 October 2019_19:45:54 | 79.95 | 6.37 |
| Seq_11 | 10.71 | 38,551 | 20 October 2019_19:23:14 | 77.13 | 6.57 |
| Seq_12 | 11.28 | 40,608 | 12 October 2019_17:54:12 | 93.84 | 6.83 |
| Seq_13 | 11.38 | 40,971 | 26 October 2019_16:48:13 | 71.05 | 5.85 |
| Seq_14 | 11.69 | 42,092 | 19 October 2019_19:33:25 | 73.67 | 7.31 |
| Seq_15 | 11.74 | 42,263 | 21 October 2019_18:32:10 | 70.23 | 7.42 |
| Seq_16 | 11.90 | 42,823 | 27 October 2019_15:47:34 | 92.31 | 7.18 |
| Seq_17 | 11.91 | 42,882 | 25 October 2019_17:39:06 | 89.82 | 7.38 |
| Average Accuracy | | | | 80.48 | |
Comparison of recognition accuracy between the proposed methods and those in previous works.
| Approach | Method | No. of Actions | Accuracy (%) |
|---|---|---|---|
| RGB Images | CNN [ | 15 | 71.00 |
| Skeleton | Random Forest [ | 20 | 70.00 |
| Sensor Data | Two-Layer HMM [ | 13 | 74.85 |
| Sensor Data | Hierarchical HMM [ | 12 | 65.20 |
| Depth Images | Mean HOG + HMM (Proposed) | 5 | 80.43 |
| Depth Images | k-NN + HMM (Proposed) | 5 | 80.59 |
| Depth Images | SVM + HMM (Proposed) | 5 | 84.04 |
Figure 12. Examples of the common false action recognitions in Situation 1 (left two images), Situation 2 (middle two images), and Situation 3 (right two images).