| Literature DB >> 36236427 |
Nuno Bento1, Joana Rebelo1, Marília Barandas1,2, André V Carreiro1, Andrea Campagner3, Federico Cabitza3,4, Hugo Gamboa1,2.
Abstract
Human Activity Recognition (HAR) has been studied extensively, yet current approaches are not capable of generalizing across different domains (i.e., subjects, devices, or datasets) with acceptable performance. This lack of generalization hinders the applicability of these models in real-world environments. As deep neural networks are becoming increasingly popular in recent work, there is a need for an explicit comparison between handcrafted and deep representations in Out-of-Distribution (OOD) settings. This paper compares both approaches in multiple domains using homogenized public datasets. First, we compare several metrics to validate three different OOD settings. In our main experiments, we then verify that even though deep learning initially outperforms models with handcrafted features, the situation is reversed as the distance from the training distribution increases. These findings support the hypothesis that handcrafted features may generalize better across specific domains.
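The three OOD settings evaluated in the paper hinge on holding out entire domains (subjects, devices, or datasets) at training time. A minimal sketch of such a leave-one-domain-out split, assuming hypothetical window arrays and domain labels (none of these names come from the paper):

```python
import numpy as np

def leave_one_domain_out(X, domains, held_out):
    """Split samples into in-distribution (train) and out-of-distribution
    (test) sets by holding out every sample from one domain, where a
    'domain' may be a subject, a device, or a whole dataset."""
    domains = np.asarray(domains)
    test_mask = domains == held_out
    return X[~test_mask], X[test_mask]

# Toy example: 6 feature windows drawn from 3 source datasets.
X = np.arange(12).reshape(6, 2)
domains = ["PAMAP2", "PAMAP2", "SAD", "SAD", "MHEALTH", "MHEALTH"]
X_id, X_ood = leave_one_domain_out(X, domains, held_out="MHEALTH")
```

Repeating this split once per held-out domain yields one evaluation task per domain, which is how the per-task results in the experiments below can be read.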
Keywords: accelerometer; deep learning; domain generalization; human activity recognition
Year: 2022 PMID: 36236427 PMCID: PMC9572241 DOI: 10.3390/s22197324
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Description of the datasets, including activities, positions, devices, and number of subjects.
| Dataset | Description | Devices |
|---|---|---|
| PAMAP2—Physical Activity Monitoring | 9 subjects | Heart rate monitor |
| Sensors Activity Dataset (SAD) | 10 subjects | 5 smartphones containing an accelerometer, a gyroscope and a magnetometer (50 Hz) |
| DaLiAc—Daily Life Activities | 19 subjects | 4 sensors, each with a triaxial accelerometer and gyroscope (200 Hz) |
| MHEALTH | 10 subjects | 3 wearable sensors containing an accelerometer, a gyroscope and a magnetometer; one of the sensors also provides 2-lead ECG measurements (50 Hz) |
| RealWorld (HAR) | 15 subjects | 6 wearable sensors containing accelerometers, gyroscopes and magnetometers (50 Hz); also includes GPS, light and sound level sensors |
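The datasets above sample at different rates (50–200 Hz), so pooling them requires homogenization. A hedged sketch of one plausible step, resampling each channel to a common rate and segmenting it into fixed-length windows; the 100 Hz target and 5 s window are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def homogenize(signal, fs_in, fs_out=100.0, window_s=5.0):
    """Resample a 1-D sensor channel to fs_out Hz (linear interpolation)
    and cut it into non-overlapping windows of window_s seconds.
    The target rate and window length are illustrative parameters."""
    n_out = int(round(len(signal) * fs_out / fs_in))
    t_in = np.arange(len(signal)) / fs_in
    t_out = np.arange(n_out) / fs_out
    sig = np.interp(t_out, t_in, signal)
    win = int(fs_out * window_s)
    n_win = len(sig) // win
    return sig[: n_win * win].reshape(n_win, win)

# 60 s of a 200 Hz signal (e.g., DaLiAc) -> windows of 500 samples at 100 Hz.
windows = homogenize(np.random.randn(12000), fs_in=200.0)
```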
Distribution of samples and activity labels per dataset. The # symbol represents the number of samples; per-dataset columns are percentages.
| Activity | PAMAP2 | SAD | DaLiAc | MHEALTH | RealWorld | Total (%) | Total (#) |
|---|---|---|---|---|---|---|---|
| Run | 10.5 | 16.9 | 20.0 | 33.3 | 19.1 | 18.3 | 7975 |
| Sit | 19.8 | 16.9 | 10.6 | 16.7 | 17.0 | 16.3 | 7102 |
| Stairs | 23.6 | 32.2 | 12.3 | 16.7 | 30.0 | 26.3 | 11,460 |
| Stand | 20.4 | 16.9 | 10.6 | 16.7 | 16.4 | 16.2 | 7047 |
| Walk | 25.7 | 16.9 | 46.5 | 16.7 | 17.5 | 22.8 | 9927 |
| Share of all samples (%) | 12.7 | 24.4 | 15.3 | 4.96 | 42.6 | - | - |
| Samples (#) | 5541 | 10,620 | 6644 | 2160 | 18,546 | - | 43,511 |
Figure 1. Convolutional neural network architectures. The values above each feature map indicate its shape (signal length × number of channels). Convolutional layers (1D): k = kernel size; nr_f = number of filters; stride = 1; padding = 0. Max-pooling layers: k = kernel size; stride = 1; padding = 0. (a) CNN-simple architecture. (b) CNN-base architecture. (c) ResNet architecture. The convolutional block is depicted in Figure 2.
Figure 2. ResNet convolutional block. The letter k stands for “kernel size”.
Figure 3. Simplified illustration of the hybrid version of CNN-base (excluding the CNN backbone for ease of visualization).
Figure 4. Scheme of the experimental pipeline.
Comparison of metrics over all four domain generalization settings based on the TSFEL feature representations. For each setting, values were averaged over every test set. All metrics are ratios except those marked with (*). (Numeric values were not preserved in this record.)
| Metric | ID | OOD-U | OOD-MD | OOD-SD | Avg. OOD |
|---|---|---|---|---|---|
| Wasserstein |  |  |  |  |  |
| MMD |  |  |  |  |  |
| Euclidean |  |  |  |  |  |
| DC Euclidean * |  |  |  |  |  |
| Cosine |  |  |  |  |  |
| DC Cosine * |  |  |  |  |  |
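For reference, the first metrics in the table can be computed roughly as below; the estimators shown (sorted-sample Wasserstein-1, biased RBF-kernel MMD, centroid cosine distance) are common textbook forms and not necessarily the paper's exact implementations:

```python
import numpy as np

def wasserstein_1d(u, v):
    """Wasserstein-1 distance between equal-sized 1-D samples
    (mean absolute difference of the sorted values)."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

def mmd_rbf(X, Y, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy, RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

def centroid_cosine(X, Y):
    """Cosine distance between the mean feature vectors of two sets."""
    a, b = X.mean(0), Y.mean(0)
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 4))    # in-distribution features
X2 = rng.normal(0.0, 1.0, (200, 4))   # a second in-distribution draw
Y = rng.normal(1.0, 1.0, (200, 4))    # mean-shifted (OOD-like) features
```

A shifted sample such as `Y` yields larger values on all three metrics than a second in-distribution draw, which is the behavior the table summarizes across settings.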
Comparison of metrics over all four domain generalization settings based on the CNN-base representations. For each setting, values were averaged over all the datasets. All metrics are ratios except those marked with (*). (Numeric values were not preserved in this record.)
| Metric | ID | OOD-U | OOD-MD | OOD-SD | Avg. OOD |
|---|---|---|---|---|---|
| Wasserstein |  |  |  |  |  |
| MMD |  |  |  |  |  |
| Euclidean |  |  |  |  |  |
| DC Euclidean * |  |  |  |  |  |
| Cosine |  |  |  |  |  |
| DC Cosine * |  |  |  |  |  |
Figure 5. Evolution of loss by epoch on the SAD dataset in the OOD-U setting. The red dots indicate the minimum loss of each curve.
Figure 6. F1-score vs. log(distance ratio). Each marker represents a different task. Distance ratios are based on the CNN-base embeddings. Error bars represent one standard deviation from the mean. The natural logarithm was applied to the distance ratios to make the regression curves linear.
Figure A2. TSFEL + LR vs. CNN-base. Distance ratios are based on TSFEL features.
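The regression in Figure 6 pairs each task's F1-score with the logarithm of a distance ratio. A minimal sketch under assumed definitions: the centroid-based ratio below and the task scores are hypothetical illustrations, not the paper's exact quantities.

```python
import numpy as np

def distance_ratio(train, test):
    """Illustrative distance ratio: distance from the test centroid to the
    train centroid, normalized by the mean distance of train samples to
    their own centroid (an assumed definition, for sketching only)."""
    c = train.mean(0)
    within = np.linalg.norm(train - c, axis=1).mean()
    return np.linalg.norm(test.mean(0) - c) / within

# Linear fit of F1 against log(distance ratio), as in the regression curves.
ratios = np.array([1.2, 2.0, 3.5, 6.0])   # hypothetical tasks
f1 = np.array([0.90, 0.84, 0.75, 0.66])   # hypothetical scores
slope, intercept = np.polyfit(np.log(ratios), f1, deg=1)
```

A negative slope in such a fit corresponds to the paper's finding that performance degrades as the test distribution moves away from the training distribution.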
Average F1-score in percentage over all the tasks in a given setting. Values in bold indicate the best performance for each setting. (Numeric values were not preserved in this record.)
| Model | ID | OOD-U | OOD-MD | OOD-SD | Avg. OOD |
|---|---|---|---|---|---|
| CNN-simple |  |  |  |  |  |
| CNN-base |  |  |  |  |  |
| ResNet |  |  |  |  |  |
| CNN-simple hybrid |  |  |  |  |  |
| CNN-base hybrid |  |  |  |  |  |
| ResNet hybrid |  |  |  |  |  |
| TSFEL + MLP |  |  |  |  |  |
| TSFEL + LR |  |  |  |  |  |
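The TSFEL + LR baseline couples handcrafted statistical features with logistic regression. A miniature stand-in, using three simple window statistics in place of the full TSFEL feature set and synthetic data in place of real accelerometer windows:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def handcrafted_features(windows):
    """A tiny stand-in for TSFEL: mean, standard deviation, and mean
    absolute first difference per window (TSFEL extracts many more)."""
    return np.column_stack([
        windows.mean(1),
        windows.std(1),
        np.abs(np.diff(windows, axis=1)).mean(1),
    ])

rng = np.random.default_rng(0)
# Synthetic windows: a low-variance 'still' class vs a high-variance 'moving' class.
still = rng.normal(0.0, 0.1, (100, 250))
moving = rng.normal(0.0, 1.0, (100, 250))
X = handcrafted_features(np.vstack([still, moving]))
y = np.array([0] * 100 + [1] * 100)

clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
```

Because the features are low-dimensional, interpretable statistics rather than learned embeddings, such a pipeline is the kind of handcrafted baseline the paper finds more robust far from the training distribution.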