Milagros Jaén-Vargas1, Karla Miriam Reyes Leiva1,2, Francisco Fernandes3, Sérgio Barroso Gonçalves4, Miguel Tavares Silva4, Daniel Simões Lopes3,5, José Javier Serrano Olmedo1,6.
Abstract
Deep learning (DL) models are very useful for human activity recognition (HAR); among other advantages, they achieve better accuracy for HAR than traditional machine learning methods. DL models learn from unlabeled data and extract features directly from raw data, as in the case of time-series acceleration. Sliding windowing is a feature extraction technique; when used to preprocess time-series data, it improves accuracy, latency, and processing cost. Preprocessing time and cost benefit especially from a small window size, but how small can the window be while maintaining good accuracy? The objective of this research was to analyze the performance of four DL models, a simple deep neural network (DNN), a convolutional neural network (CNN), a long short-term memory network (LSTM), and a hybrid model (CNN-LSTM), while varying the size of fixed-overlap sliding windows, in order to identify an optimal window size for HAR. We compare the effects for two acceleration sources: wearable inertial measurement unit (IMU) sensors and motion capture (MOCAP) systems. Short sliding windows of 5, 10, 15, 20, and 25 frames were compared against long ones of 50, 75, 100, and 200 frames. The models were fed raw acceleration data acquired under experimental conditions for three activities: walking, sit-to-stand, and squatting. Results show that the optimal window is 20-25 frames (0.20-0.25 s) for both sources, yielding an accuracy of 99.07% and an F1-score of 87.08% for the CNN-LSTM on the wearable sensor data, and an accuracy of 98.8% and an F1-score of 82.80% on the MOCAP data; similarly accurate results were obtained with the LSTM model. There is almost no difference in accuracy for larger windows (100, 200 frames), whereas smaller windows show a decrease in F1-score. Regarding inference time, data with a sliding window of 20 frames can be preprocessed about 4× (LSTM) and 2× (CNN-LSTM) faster than data using 100 frames.
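The fixed-overlap sliding-window segmentation described in the abstract can be sketched in a few lines of NumPy; the function name, the `step` parameter, and the toy input are illustrative, not taken from the paper:

```python
import numpy as np

def sliding_windows(data, window, step=1):
    """Segment a (samples, channels) time series into overlapping windows.

    With step=1 the windows are maximally overlapped, matching the
    fixed-overlap scheme described in the abstract. The function name
    and signature are illustrative, not from the paper.
    """
    n_windows = (len(data) - window) // step + 1
    return np.stack([data[i * step : i * step + window]
                     for i in range(n_windows)])

# 6-channel acceleration stream (e.g., two 3-axis accelerometers)
stream = np.random.default_rng(0).normal(size=(100, 6))
windows = sliding_windows(stream, window=20)
print(windows.shape)  # (81, 20, 6)
```

Each window then becomes one (window, channels) training example for the networks below.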
©2022 Jaén-Vargas et al.
Keywords: Accelerometer; Deep learning; Human activity recognition; Motion capture; Pattern recognition; Sliding windows
Year: 2022 PMID: 36091986 PMCID: PMC9455026 DOI: 10.7717/peerj-cs.1052
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1: Representative set of IMU signals from participant 1.
Figure 2: Sliding window schematic.
Distribution of IMU and MOCAP window sizes for the 8 people used to train the model.
| Source | Total samples | Window 5 | Window 10 | Window 15 | Window 20 | Window 25 |
|---|---|---|---|---|---|---|
| IMU | 281161 | (281156, 5, 6) | (281151, 10, 6) | (281146, 15, 6) | (281141, 20, 6) | (281136, 25, 6) |
| MOCAP | 247804 | (247799, 5, 6) | (247794, 10, 6) | (247789, 15, 6) | (247784, 20, 6) | (247779, 25, 6) |
Notes.
Window sizes for IMU and MOCAP.
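A quick arithmetic check of the table above: assuming step-1 overlapping windows, every reported window count equals the source's sample total minus the window size:

```python
# Reported sample totals per source (from the table above).
totals = {"IMU": 281161, "MOCAP": 247804}

for source, n_samples in totals.items():
    for w in (5, 10, 15, 20, 25):
        n_windows = n_samples - w  # matches the shapes in the table
        print(f"{source}: window {w} -> ({n_windows}, {w}, 6)")
# First line: IMU: window 5 -> (281156, 5, 6)
```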
Parameters of each deep learning architecture.
| Parameter | DNN | CNN | LSTM | CNN-LSTM |
|---|---|---|---|---|
| Layers with Neurons | 3 Dense (32 neurons), Flatten, SoftMax | 1D-Conv (32 neurons, filter=3 kernel=3), | LSTM (100 neurons), | 2 1D-Conv (64 neurons, filter=3 kernel=3), |
| Dropout rate | 0 | 0.5 | 0.5 | 0.5 |
| Activation function | ReLU | ReLU | ReLU | ReLU |
| Optimizer | Adam | Adam | Adam | Adam |
| Loss function | Sparse categorical crossentropy | Sparse categorical crossentropy | Sparse categorical crossentropy | Sparse categorical crossentropy |
| Batch size | 64 | 32 | 64 | 64 |
| Epochs | 100 | 100 | 100 | 100 |
Notes.
Software: Python, TensorFlow, Google Colab.
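The CNN-LSTM column of the table can be sketched in Keras from the tabulated hyperparameters; the cells the table truncates (the dense head and the exact position of dropout) are filled in as assumptions, and the LSTM width of 100 is borrowed from the LSTM column:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn_lstm(window=20, channels=6, n_classes=3):
    # From the table: two 1D convolutions with 64 filters and kernel
    # size 3, ReLU, dropout 0.5, Adam, sparse categorical crossentropy.
    # The LSTM width and softmax dense head are assumptions, since the
    # corresponding table cells are truncated.
    model = tf.keras.Sequential([
        layers.Input(shape=(window, channels)),
        layers.Conv1D(64, 3, activation="relu"),
        layers.Conv1D(64, 3, activation="relu"),
        layers.Dropout(0.5),
        layers.LSTM(100),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_lstm()
```

The table's batch size of 64 and 100 epochs would then be passed to `model.fit`.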
Figure 3: (1) Deep neural network architecture. (2) Convolutional neural network architecture. (3) Long short-term memory architecture. (4) Hybrid model (CNN-LSTM) architecture.
Figure 4: Class 0 is not equally balanced.
Performance metrics in window sizes of 200, 100, 75, and 50 for IMU and MOCAP.
| Model | Accuracy (200) | F1 % (200) | Accuracy (100) | F1 % (100) | Accuracy (75) | F1 % (75) | Accuracy (50) | F1 % (50) |
|---|---|---|---|---|---|---|---|---|
| DNN | 0.99 | 95.64 | 0.99 | 91.85 | 0.99 | 89.38 | 0.98 | 88.41 |
| CNN | 0.99 | 94.91 | 0.98 | 91.35 | 0.97 | 92.66 | 0.94 | 90.36 |
| LSTM | 0.94 | 95.95 | 0.99 | 91.85 | 0.99 | 92.34 | 0.99 | 88.79 |
| CNN-LSTM | 0.99 | 96.84 | 0.99 | 99.97 | 0.99 | 91.17 | 0.99 | 90.82 |
Performance metrics in window sizes of 5, 10, 15, 20, and 25 for IMU and MOCAP.
| Model | Accuracy (5) | F1 % (5) | Accuracy (10) | F1 % (10) | Accuracy (15) | F1 % (15) | Accuracy (20) | F1 % (20) | Accuracy (25) | F1 % (25) |
|---|---|---|---|---|---|---|---|---|---|---|
| DNN | 0.86 | 79.53 | 0.89 | 80.70 | 0.91 | 81.68 | 0.94 | 83.36 | 0.96 | 82.72 |
| CNN | 0.82 | 79.74 | 0.87 | 83.59 | 0.89 | 84.76 | 0.90 | 88.01 | 0.92 | 88.87 |
| LSTM | 0.91 | 80.80 | 0.96 | 82.22 | 0.99 | 83.47 | 1.00 | 85.18 | 1.00 | 86.93 |
| CNN-LSTM | 0.89 | 83.56 | 0.95 | 84.57 | 0.98 | 85.40 | 0.99 | 87.08 | 1.00 | 87.50 |
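The accuracy and F1 figures tabulated above can be recomputed from a confusion matrix. A plain NumPy sketch, using an invented 3-class matrix for illustration (not data from the paper):

```python
import numpy as np

def macro_scores(cm):
    """Accuracy and macro precision/recall/F1 from a confusion matrix.

    cm[i, j] = number of samples of true class i predicted as class j.
    A generic re-implementation of the reported metrics, not the
    authors' code.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)
    recall = tp / cm.sum(axis=1)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Three classes: walking, sit-to-stand, squatting (counts are invented)
cm = [[90, 5, 5],
      [4, 88, 8],
      [6, 7, 87]]
acc, p, r, f1 = macro_scores(cm)
print(f"accuracy={acc:.3f} macro-F1={f1:.3f}")  # accuracy=0.883 macro-F1=0.883
```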
Recall and precision for IMU and MOCAP.
| Model | Precision % (5) | Recall % (5) | Precision % (10) | Recall % (10) | Precision % (15) | Recall % (15) | Precision % (20) | Recall % (20) | Precision % (25) | Recall % (25) |
|---|---|---|---|---|---|---|---|---|---|---|
| DNN | 80.56 | 79.14 | 81.62 | 80.51 | 82.45 | 81.60 | 83.45 | 83.27 | 83.07 | 82.55 |
| CNN | 80.73 | 79.37 | 84.27 | 83.28 | 85.79 | 84.63 | 85.18 | 87.81 | 89.26 | 88.69 |
| LSTM | 81.08 | 80.60 | 82.30 | 82.16 | 83.89 | 83.67 | 85.17 | 85.19 | 87.00 | 86.88 |
| CNN-LSTM | 83.80 | 83.45 | 84.82 | 84.48 | 85.47 | 85.19 | 87.23 | 87.83 | 87.64 | 87.44 |
Sensitivity and specificity for IMU and MOCAP.
| Model | Sensitivity % (5) | Specificity % (5) | Sensitivity % (10) | Specificity % (10) | Sensitivity % (15) | Specificity % (15) | Sensitivity % (20) | Specificity % (20) | Sensitivity % (25) | Specificity % (25) |
|---|---|---|---|---|---|---|---|---|---|---|
| DNN | 79.14 | 89.57 | 80.52 | 90.26 | 81.60 | 90.80 | 83.27 | 91.64 | 82.56 | 91.28 |
| CNN | 79.37 | 89.69 | 83.28 | 91.64 | 84.63 | 92.31 | 87.81 | 93.91 | 88.70 | 94.35 |
| LSTM | 80.61 | 90.30 | 82.16 | 91.08 | 83.67 | 91.83 | 85.19 | 92.60 | 86.89 | 93.44 |
| CNN-LSTM | 83.46 | 91.73 | 84.48 | 92.24 | 85.19 | 92.60 | 87.03 | 93.52 | 87.44 | 93.72 |
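Sensitivity is the per-class recall, while specificity requires a one-vs-rest reading of the confusion matrix (all other classes count as negatives). A NumPy sketch with an invented matrix, not the authors' code:

```python
import numpy as np

def sensitivity_specificity(cm):
    """Per-class one-vs-rest sensitivity and specificity.

    cm[i, j] = number of samples of true class i predicted as class j.
    Illustrative reconstruction of the tabulated metrics.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # missed positives per class
    fp = cm.sum(axis=0) - tp          # false alarms per class
    tn = cm.sum() - tp - fn - fp      # everything else
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

cm = [[90, 5, 5],
      [4, 88, 8],
      [6, 7, 87]]
sens, spec = sensitivity_specificity(cm)
```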
Figure 5: Inference time comparison using different window sizes for IMU and MOCAP.
Figure 6: Accuracy and F1-score for IMU and MOCAP using a sliding window of size 20.
Figure 7: IMU vs MOCAP. (1) Inference time. (2) F1-score. (3) Accuracy. (4) Effectiveness. (5) Efficiency.