Xiheng Zhang1, Yongkang Wong2, Mohan S Kankanhalli2, Weidong Geng1.
Abstract
Sensor-based human activity recognition aims to detect various physical activities performed by people using ubiquitous sensors. Unlike existing deep learning-based methods, which mainly extract black-box features from the raw sensor data, we propose a hierarchical multi-view aggregation network based on multi-view feature spaces. Specifically, we first construct various views of feature spaces for each individual sensor in terms of white-box features and black-box features. Our model then learns a unified representation for the multi-view features by aggregating views hierarchically at the feature level, the position level, and the modality level. We design three aggregation modules, one for each level. Based on the ideas of non-local operations and attention, our fusion method captures the correlation between features and leverages the relationships across different sensor positions and modalities. We comprehensively evaluate our method on 12 human activity benchmark datasets, and the resulting accuracy outperforms state-of-the-art approaches.
Year: 2019 PMID: 31513592 PMCID: PMC6742398 DOI: 10.1371/journal.pone.0221390
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. An overview of the hierarchical multi-view aggregation network.
We first construct four views of feature spaces for each individual sensor in the bottom layer. We then design three aggregation modules to integrate the views step by step into a unified multi-view representation.
White-box features extracted from the sensor data.
| Domain | Features |
|---|---|
| Time | Interquartile Range, Max, Min, Mean, Median, Amplitude, Mean Crossing, Signal Magnitude Area, Standard Deviation, Skewness, Kurtosis, Zero Crossing |
| Frequency | Largest Frequency Component, Energy, Skewness, Kurtosis, Weighted Average, Sum of the First 5 FFT Coefficients |
| Time-frequency | Standard Deviation, Max, Min, Mean, Median Crossing Rate, Wavelet Energy |
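
To make the time-domain row of the table concrete, the following is a minimal sketch of how several of these white-box features could be computed for a single sensor window with NumPy. The function names, the definition of amplitude as max minus min, and the mean-based normalization of the signal magnitude area are assumptions made for illustration; the paper's exact definitions may differ.

```python
import numpy as np

def time_domain_features(x):
    """Compute a few time-domain features for one 1-D sensor window.

    Illustrative sketch only: feature definitions are common textbook
    choices, not necessarily the ones used by the authors.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()
    centered = x - mean
    q75, q25 = np.percentile(x, [75, 25])
    # Count sign changes of the raw signal and of the mean-centered signal.
    zero_cross = int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:])))
    mean_cross = int(np.sum(np.signbit(centered[:-1]) != np.signbit(centered[1:])))
    # Standardized third and fourth moments; flat windows fall back to 0.
    skew = float(np.mean(centered ** 3) / std ** 3) if std > 0 else 0.0
    kurt = float(np.mean(centered ** 4) / std ** 4) if std > 0 else 0.0
    return {
        "iqr": float(q75 - q25),
        "max": float(x.max()),
        "min": float(x.min()),
        "mean": float(mean),
        "median": float(np.median(x)),
        "amplitude": float(x.max() - x.min()),  # assumed: peak-to-peak range
        "std": float(std),
        "skewness": skew,
        "kurtosis": kurt,
        "zero_crossing": zero_cross,
        "mean_crossing": mean_cross,
    }

def signal_magnitude_area(window):
    """SMA of a (T, 3) tri-axial window: mean over time of |x| + |y| + |z|."""
    w = np.asarray(window, dtype=float)
    return float(np.mean(np.sum(np.abs(w), axis=1)))
```

Each call returns one feature vector per window; in a multi-view setting, the vectors from the time, frequency, and time-frequency domains would each form a separate view.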
Fig 2. Overview of feature level aggregation.
Fig 3. Overview of correlation feature computing.
Public human activity datasets for evaluation.
| Dataset | #Subject | Sample Rate | #Activity | #Sample | Sensor | #Position |
|---|---|---|---|---|---|---|
| OPPORTUNITY | 4 | 32 Hz | 16 | 191564 | A, G, M | 5 |
| PAMAP2 | 9 | 100 Hz | 18 | 64173 | A, G, M | 3 |
| DSA | 8 | 25 Hz | 19 | 75998 | A, G, M | 5 |
| MHEALTH | 10 | 50 Hz | 12 | 40522 | A, G, M | 3 |
| HHAR | 9 | 100-200 Hz | 6 | 366038 | A, G | 3 |
| Skoda | 1 | 96 Hz | 10 | 22000 | A | 4 |
| Daphnet Gait | 10 | 64 Hz | 2 | 49942 | A | 3 |
| UCI Smartphone | 30 | 50 Hz | 6 | 10299 | A, G | 1 |
| USC-HAD | 14 | 100 Hz | 12 | 41998 | A, G | 1 |
| SHO | 10 | 50 Hz | 7 | 20998 | A, G, M | 1 |
| WISDM v1.1 | 29 | 20 Hz | 6 | 91515 | A | 1 |
| WISDM v2.0 | 36 | 20 Hz | 6 | 248653 | A | 1 |
A = accelerometer, G = gyroscope, M = magnetometer
Comparison of the proposed model against the state-of-the-art methods on various human activity benchmark datasets.
| Results from each method | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jiang et al. [ | DeepConvLSTM [ | Hammerla et al. [ | Ravi et al. [ | attrCNN-IMU [ | HMVAN | Simplified HMVAN | ||||||||
| Datasets | Acc. | Acc. | Acc. | Acc. | Acc. | Acc. | Acc. | |||||||
| OPP-Gesture | 0.913 | 0.912 | - | 0.915* | - | 0.927* | 0.922 | 0.921 | - | 0.929* | 0.930 | 0.929 | ||
| OPP-Locomotion | 0.889 | 0.889 | - | 0.895* | 0.892 | 0.891 | 0.897 | 0.896 | - | 0.900* | 0.902 | 0.902 | ||
| PAMAP2 | 0.911 | 0.910 | 0.927 | 0.926 | - | 0.937* | 0.930 | 0.929 | - | 0.9088* | 0.932 | 0.932 | ||
| DSA | 0.863 | 0.862 | 0.877 | 0.872 | 0.892 | 0.891 | 0.884 | 0.883 | 0.865 | 0.865 | 0.885 | 0.885 | ||
| HHAR | 0.954 | 0.954 | 0.977 | 0.976 | 0.959 | 0.958 | 0.940 | 0.940 | 0.947 | 0.947 | 0.953 | 0.952 | ||
| MHEALTH | 0.933 | 0.932 | 0.921 | 0.920 | 0.946 | 0.945 | 0.950 | 0.949 | 0.942 | 0.941 | 0.951 | 0.950 | ||
| Skoda | 0.944 | 0.943 | - | 0.958* | 0.950 | 0.948 | 0.953 | 0.952 | 0.959 | 0.958 | 0.954 | 0.954 | ||
| Daphnet Gait | 0.901 | 0.899 | 0.942 | 0.941 | - | 0.760* | 0.958* | - | 0.933 | 0.932 | 0.960 | 0.959 | ||
| UCI Smartphone | 0.9518* | - | 0.944 | 0.944 | 0.931 | 0.930 | 0.945 | 0.943 | 0.950 | 0.950 | 0.947 | 0.947 | ||
| USC-HAD | 0.9701* | - | 0.957 | 0.957 | 0.954 | 0.953 | 0.961 | 0.959 | 0.967 | 0.965 | 0.964 | 0.963 | ||
| SHO | 0.9993* | - | 0.987 | 0.986 | 0.989 | 0.989 | 0.994 | 0.994 | 0.997 | 0.997 | 0.9958 | 0.9958 | ||
| WISDM v1.1 | 0.955 | 0.954 | 0.948 | 0.947 | 0.933 | 0.933 | 0.986* | - | 0.966 | 0.965 | - | - | ||
| WISDM v2.0 | 0.897 | 0.896 | 0.906 | 0.905 | 0.911 | 0.911 | 0.927* | - | 0.920 | 0.919 | - | - | ||
Results marked with ‘*’ are taken from the original papers.
Results of our model with different structures.
| Methods | Accuracy |
|---|---|
| HMVAN w/o position aggregation layer | 0.874 |
| HMVAN w/o modality layer | 0.882 |
| HMVAN w/o feature aggregation layer | 0.891 |
| HMVAN w/o auxiliary losses | 0.895 |
| HMVAN | 0.907 |
Results of the proposed model with different views.
| Methods | Accuracy |
|---|---|
| HMVAN w/o time domain features view | 0.885 |
| HMVAN w/o frequency domain features view | 0.881 |
| HMVAN w/o time-frequency domain features view | 0.890 |
| HMVAN w/o visual domain features view | 0.886 |
| HMVAN | 0.907 |
Results of different feature level aggregation methods.
| Methods | Accuracy |
|---|---|
| element-wise adding | 0.885 |
| element-wise multiplying | 0.881 |
| element-wise mean | 0.890 |
| element-wise maximum | 0.886 |
| non-local | 0.900 |
| | 0.863 |
| non-local + | 0.907 |
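
The element-wise baselines and the non-local variant compared in this table can be sketched as follows. This is an illustrative NumPy sketch rather than the authors' implementation: it assumes each view has already been encoded as a `D`-dimensional vector, and the non-local step uses a parameter-free dot-product affinity with a residual connection, whereas a trained model would normally apply learned projections first. All function names are hypothetical.

```python
import numpy as np

def elementwise_aggregate(views, mode):
    """Fuse V view vectors of shape (V, D) into one (D,) vector."""
    views = np.asarray(views, dtype=float)
    ops = {"add": np.sum, "multiply": np.prod, "mean": np.mean, "max": np.max}
    return ops[mode](views, axis=0)

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_aggregate(views):
    """Non-local style fusion (assumed, simplified form).

    Every view attends to every other view through a softmax-normalized
    dot-product affinity; the attended features are added back to the
    input (residual) and the refined views are averaged.
    """
    f = np.asarray(views, dtype=float)                 # (V, D)
    affinity = softmax(f @ f.T / np.sqrt(f.shape[1]))  # (V, V), rows sum to 1
    refined = f + affinity @ f                         # (V, D)
    return refined.mean(axis=0)                        # (D,)
```

Unlike the element-wise operators, the non-local variant lets the contribution of each view depend on its similarity to the other views, which is one plausible reading of why it scores higher in the table.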
Results of different position level aggregation methods.
| Methods | Accuracy |
|---|---|
| simple view pooling | 0.875 |
| correlation feature + view pooling | 0.907 |
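
The two position-level variants can be illustrated with a short sketch. View pooling is taken here to be an element-wise maximum across sensor positions. The correlation feature is only a guess at the general idea: each position's vector is extended with a similarity-weighted sum of the other positions' vectors. The cosine-similarity weighting, the concatenation, and the function names are all assumptions and are not taken from the paper.

```python
import numpy as np

def view_pooling(position_feats):
    """Element-wise maximum over P position vectors of shape (P, D')."""
    return np.max(np.asarray(position_feats, dtype=float), axis=0)

def add_correlation_features(position_feats):
    """Append a cross-position context vector to each position (assumed form)."""
    f = np.asarray(position_feats, dtype=float)       # (P, D)
    norms = np.linalg.norm(f, axis=1, keepdims=True)
    unit = f / np.maximum(norms, 1e-12)               # avoid division by zero
    sim = unit @ unit.T                               # (P, P) cosine similarity
    np.fill_diagonal(sim, 0.0)                        # exclude self-similarity
    context = sim @ f                                 # (P, D) weighted sum of others
    return np.concatenate([f, context], axis=1)       # (P, 2D)

def position_level_aggregate(position_feats):
    """Correlation features followed by view pooling."""
    return view_pooling(add_correlation_features(position_feats))
```

With this structure, "simple view pooling" corresponds to calling `view_pooling` directly on the position vectors, while the second row of the table corresponds to `position_level_aggregate`.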
Results of different modality level aggregation methods.
| Methods | Accuracy |
|---|---|
| MLP | 0.894 |
| element-wise adding | 0.899 |
| element-wise multiplying | 0.881 |
| element-wise mean | 0.887 |
| element-wise maximum | 0.901 |
| attention fusion | 0.907 |
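
As a rough illustration of attention fusion at the modality level, the sketch below scores each modality vector against a query vector, normalizes the scores with a softmax, and returns the weighted sum. The query stands in for a learned parameter and is hypothetical, as is the scaled dot-product scoring; the paper's attention module may be defined differently.

```python
import numpy as np

def attention_fusion(modality_feats, query):
    """Fuse M modality vectors of shape (M, D) into one (D,) vector.

    `query` is a (D,) vector that would be learned during training
    (an assumption for this sketch). Returns the fused vector and the
    attention weights.
    """
    f = np.asarray(modality_feats, dtype=float)
    q = np.asarray(query, dtype=float)
    scores = f @ q / np.sqrt(f.shape[1])   # (M,) scaled dot-product scores
    scores = scores - scores.max()         # stabilize the exponentials
    weights = np.exp(scores)
    weights = weights / weights.sum()      # softmax over modalities
    return weights @ f, weights
```

With an all-zero query, the weights are uniform and the result reduces to the element-wise mean baseline; a trained query lets the model weight, for example, accelerometer features more heavily than magnetometer features.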
Results of different attention mechanisms in each aggregation layer.
| Feature layer | Position layer | Modality layer | Accuracy |
|---|---|---|---|
| type #1 | type #1 | type #1 | 0.837 |
| type #2 | type #2 | type #2 | 0.802 |
| type #3 | type #3 | type #3 | 0.849 |
| type #1 | type #2 | type #1 | 0.875 |
| type #1 | type #3 | type #1 | 0.844 |
| type #1 | type #1 | type #2 | 0.852 |
| type #1 | type #1 | type #3 | 0.850 |
| type #1 | type #2 | type #2 | 0.822 |
| type #2 | type #2 | type #2 | 0.784 |
| type #3 | type #2 | type #2 | 0.802 |
| type #2 | type #1 | type #2 | 0.801 |
| type #2 | type #3 | type #2 | 0.823 |
| type #2 | type #2 | type #1 | 0.795 |
| type #2 | type #2 | type #3 | 0.809 |
| type #1 | type #3 | type #3 | 0.867 |
| type #2 | type #3 | type #3 | 0.871 |
| type #3 | type #3 | type #3 | 0.880 |
| type #3 | type #1 | type #3 | 0.865 |
| type #3 | type #2 | type #3 | 0.883 |
| type #3 | type #3 | type #1 | 0.870 |
| type #3 | type #3 | type #2 | 0.842 |
| type #1 | type #2 | type #3 | |
| type #3 | type #2 | type #1 | 0.821 |
| type #2 | type #1 | type #3 | 0.812 |
Fig 4. Comparison of the proposed model against the state-of-the-art methods with different percentages of training samples on the HHAR dataset.
Confusion matrix for OPPORTUNITY dataset using our HMVAN.
| Actual (rows) / Predicted (columns) | Null | Open Door 1 | Open Door 2 | Close Door 1 | Close Door 2 | Open Fridge | Close Fridge | Open Draw Washer | Close Draw Washer | Open Draw 1 | Close Draw 1 | Open Draw 2 | Close Draw 2 | Open Draw 3 | Close Draw 3 | Clean Table | Drink From Cup | Toggle Switch |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Null | 13832 | 6 | 5 | 5 | 3 | 24 | 15 | 5 | 2 | 10 | 13 | 5 | 4 | 22 | 39 | 7 | 58 | 9 |
| Open Door 1 | 10 | 76 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Door 2 | 7 | 0 | 155 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Close Door 1 | 8 | 15 | 0 | 78 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Close Door 2 | 10 | 0 | 0 | 0 | 130 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Fridge | 111 | 0 | 0 | 0 | 0 | 253 | 22 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Close Fridge | 41 | 0 | 0 | 0 | 0 | 19 | 210 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Draw Washer | 61 | 0 | 0 | 0 | 0 | 6 | 0 | 99 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Close Draw Washer | 43 | 0 | 0 | 0 | 0 | 2 | 0 | 10 | 79 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Open Draw 1 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 38 | 6 | 0 | 1 | 3 | 1 | 0 | 0 | 1 |
| Close Draw 1 | 20 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 8 | 46 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Draw 2 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 18 | 2 | 29 | 6 | 1 | 0 | 0 | 0 | 0 |
| Close Draw 2 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 5 | 4 | 25 | 0 | 3 | 0 | 0 | 0 |
| Open Draw 3 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 88 | 3 | 0 | 0 | 0 |
| Close Draw 3 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 9 | 5 | 80 | 0 | 0 | 0 |
| Clean Table | 88 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 81 | 0 | 0 |
| Drink From Cup | 143 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 397 | 0 |
| Toggle Switch | 57 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 122 |
Confusion matrix for OPPORTUNITY dataset using the method in [23].
| Actual (rows) / Predicted (columns) | Null | Open Door 1 | Open Door 2 | Close Door 1 | Close Door 2 | Open Fridge | Close Fridge | Open Draw Washer | Close Draw Washer | Open Draw 1 | Close Draw 1 | Open Draw 2 | Close Draw 2 | Open Draw 3 | Close Draw 3 | Clean Table | Drink From Cup | Toggle Switch |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Null | 13752 | 5 | 8 | 6 | 5 | 39 | 18 | 14 | 29 | 2 | 0 | 1 | 1 | 40 | 20 | 2 | 114 | 8 |
| Open Door 1 | 17 | 51 | 0 | 28 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Door 2 | 15 | 0 | 111 | 0 | 38 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Close Door 1 | 10 | 22 | 0 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Close Door 2 | 9 | 0 | 7 | 0 | 124 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Fridge | 130 | 0 | 0 | 0 | 0 | 220 | 34 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Close Fridge | 49 | 0 | 0 | 0 | 0 | 76 | 146 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Draw Washer | 108 | 0 | 0 | 0 | 0 | 4 | 0 | 45 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Close Draw Washer | 75 | 0 | 0 | 0 | 0 | 4 | 0 | 30 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Draw 1 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 5 | 0 | 0 | 2 | 0 | 0 | 0 | 1 |
| Close Draw 1 | 40 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Open Draw 2 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 1 | 18 | 1 | 6 | 0 | 0 | 0 | 0 |
| Close Draw 2 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 13 | 5 | 9 | 0 | 0 | 0 | 0 |
| Open Draw 3 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 56 | 28 | 0 | 0 | 0 |
| Close Draw 3 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 51 | 42 | 0 | 0 | 0 |
| Clean Table | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 73 | 0 | 0 |
| Drink From Cup | 194 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 349 | 0 |
| Toggle Switch | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 82 |
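
Confusion matrices such as the two above can be summarized with per-class precision, recall, and F1, which makes it easier to see how strongly the dominant Null class influences overall accuracy. The sketch below assumes rows are actual classes and columns are predicted classes, as in the matrices above. The function name and the 2x2 example values are illustrative and are not taken from the paper.

```python
import numpy as np

def confusion_metrics(cm):
    """Return (accuracy, precision, recall, f1) from a square confusion matrix.

    Rows are actual classes, columns are predicted classes. Classes with
    no actual or no predicted samples get a score of 0 instead of NaN.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    actual = cm.sum(axis=1)      # samples per true class
    predicted = cm.sum(axis=0)   # samples per predicted class
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Illustrative 2-class matrix (made-up numbers).
acc, prec, rec, f1 = confusion_metrics([[8, 2], [1, 9]])
```

Applied row by row, the same recall computation gives, for instance, 76 / (10 + 76 + 10) for the "Open Door 1" row of the HMVAN matrix.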