Iveta Dirgová Luptáková, Martin Kubovčík, Jiří Pospíchal.
Abstract
Computing devices that can recognize various human activities or movements can be used to assist people in healthcare, sports, or human-robot interaction. Readily available data for this purpose can be obtained from the accelerometer and the gyroscope built into everyday smartphones. Effective classification of real-time activity data is, therefore, actively pursued using various machine learning methods. In this study, the transformer model, a deep learning neural network model developed primarily for natural language processing and vision tasks, was adapted for a time-series analysis of motion signals. The self-attention mechanism inherent in the transformer, which expresses individual dependencies between signal values within a time series, can match the performance of state-of-the-art convolutional neural networks with long short-term memory. The proposed adapted transformer method was tested on the largest available public dataset of smartphone motion sensor data covering a wide range of activities, and achieved an average identification accuracy of 99.2%, compared with 89.67% achieved on the same data by a conventional machine learning method. The results suggest the expected future relevance of the transformer model for human activity recognition.
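The abstract's key claim is that self-attention can express pairwise dependencies between signal values within a time series. A minimal NumPy sketch of scaled dot-product self-attention applied to one window of accelerometer + gyroscope samples is shown below; the shapes, weight matrices, and window length are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a time series.

    x: (T, d) window of T time steps with d features per step.
    Returns the (T, d) attended features and the (T, T) attention matrix,
    whose entry [i, j] weights how much step j contributes to step i.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # pairwise dependencies
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over time steps
    return attn @ v, attn

rng = np.random.default_rng(0)
T, d = 300, 6   # e.g. 3-axis accelerometer + 3-axis gyroscope window
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)   # (300, 6) (300, 300)
```

The (T, T) attention matrix is what the paper's attention heatmaps (Figures 5 and 6) visualize: each row shows which time steps a given step attends to.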
Keywords: human activity recognition; sequence-to-sequence prediction; time series; transformer
Year: 2022 PMID: 35271058 PMCID: PMC8914677 DOI: 10.3390/s22051911
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Comparison of selected methods, preprocessing, and resulting accuracy over different datasets for HAR based on mobile sensor data.
| Standard Dataset | Paper | Data Structure | Method | Accuracy (%) |
|---|---|---|---|---|
| KU-HAR [ | This study | Standardization | HAR transformer | 99.2 |
| | Sikder et al. [ | Fast Fourier transform | Random forest | 89.67 |
| MHEALTH [ | Qin et al. [ | Gramian angular fields | GAF and ResNet | 98.5 |
| PAMAP2 [ | Li et al. [ | Standardization | 2D Conv + BiLSTM | 97.15 |
| | Gao et al. [ | Standardization | Conv + SKConv | 93.03 |
| WISDM [ | Alemayoh et al. [ | Segmentation into a grayscale image representing the signal time series | SC-CNN | 97.08 |
| | Gupta [ | Raw | CNN-GRU | 96.54 |
| | Alemayoh et al. [ | Heuristic features | J48 decision tree | 90.04 |
| HAPT [ | Wang et al. [ | Spliced into a two-dimensional matrix (image-like) | CNN-LSTM | 95.87 |
| UK Biobank [ | Zebin et al. [ | Raw | LSTM + BN | 92 |
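Several rows in the table above list "Standardization" as the preprocessing step. A short sketch of per-channel z-score standardization over a batch of sensor windows is given below; the array shapes and epsilon are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def standardize(windows):
    """Per-channel z-score: zero mean, unit variance across the dataset.

    windows: (N, T, C) array of N windows, T time steps, C sensor channels.
    Statistics are computed over all windows and time steps per channel.
    """
    mean = windows.mean(axis=(0, 1), keepdims=True)
    std = windows.std(axis=(0, 1), keepdims=True)
    return (windows - mean) / (std + 1e-8)   # epsilon guards against flat channels

rng = np.random.default_rng(1)
data = 5.0 + 2.0 * rng.standard_normal((100, 50, 6))  # synthetic sensor windows
z = standardize(data)
```

In practice the mean and standard deviation would be computed on the training split only and reused for the test split, to avoid leaking test statistics.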
Description of the activity classes in the KU-HAR dataset, amended from [33].
| Class Name | ID | Performed Activity | Duration/Repetitions | No. Subsamples |
|---|---|---|---|---|
| Stand | 0 | Standing still on the floor | 1 min | 1886 |
| Sit | 1 | Sitting still on a chair | 1 min | 1874 |
| Talk-sit | 2 | Talking with hand movements while sitting on a chair | 1 min | 1797 |
| Talk-stand | 3 | Talking with hand movements while standing up or sometimes walking around within a small area | 1 min | 1866 |
| Stand-sit | 4 | Repeatedly standing up and sitting down (transition activity) | 5 times | 2178 |
| Lay | 5 | Laying still on a plain surface (a table) | 1 min | 1813 |
| Lay-stand | 6 | Repeatedly standing up and laying down (transition activity) | 5 times | 1762 |
| Pick | 7 | Picking up an object from the floor by bending down | 10 times | 1333 |
| Jump | 8 | Jumping repeatedly on a spot | 10 times | 666 |
| Push-up | 9 | Performing full push-ups with a wide-hand position | 5 times | 480 |
| Sit-up | 10 | Performing sit-ups with straight legs on a plain surface | 5 times | 1005 |
| Walk | 11 | Walking 20 m at a normal pace | ~12 s | 882 |
| Walk-backward | 12 | Walking backwards for 20 m at a normal pace | ~20 s | 317 |
| Walk-circle | 13 | Walking at a normal pace along a circular path | ~20 s | 259 |
| Run | 14 | Running 20 m at a high speed | ~7 s | 595 |
| Stair-up | 15 | Ascending on a set of stairs at a normal pace | ~1 min | 798 |
| Stair-down | 16 | Descending from a set of stairs at a normal pace | ~50 s | 781 |
| Table-tennis | 17 | Playing table tennis | 1 min | 458 |
| Total | | | | 20,750 |
Figure 1. The transformer model for human activity recognition.
Newly created pairs of activities.
| Stand + Talk-Stand | Sit + Talk-Sit | Talk-Stand + Stand | Pick + Stand | Jump + Stand | Walk + Stand | Walk-Backward + Stand | Walk-Circle + Stand | Run + Stand | Stair-up + Stand | Stair-down + Stand | Table-Tennis + Stand |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Stand + Pick | Talk-sit + sit | Talk-Stand + Pick | Pick + Talk-Stand | Jump + Talk-Stand | Walk + Talk-Stand | Walk-backward + Talk-Stand | Walk-circle + Talk-Stand | Run + Talk-Stand | Stair-up + Talk-Stand | Stair-down + Talk-Stand | Table-tennis + Talk-Stand |
| Stand + Jump | Lay + Sit-up | Talk-Stand + Jump | Pick + Jump | Jump + Pick | Walk + Pick | Walk-backward + Pick | Walk-circle + Pick | Run + Pick | Stair-up + Pick | Stair-down + Pick | Table-tennis + Pick |
| Stand + Walk | Sit-up + Lay | Talk-Stand + Walk | Pick + Walk | Jump + Walk | Walk + Jump | Walk-backward + Jump | Walk-circle + Jump | Run + Jump | Stair-up + Jump | Stair-down + Jump | Table-tennis + Jump |
| Stand + Walk-backward | Talk-Stand + Walk-backward | Pick + Walk-backward | Jump + Walk-backward | Walk + Walk-circle | Walk-backward + Table-tennis | Walk-circle + Walk | Run + Walk | Stair-up + Walk | Stair-down + Walk | Table-tennis + Walk | |
| Stand + Walk-circle | Talk-Stand + Walk-circle | Pick + Walk-circle | Jump + Walk-circle | Walk + Run | Walk-circle + Run | Run + Walk-circle | Stair-up + Walk-circle | Stair-down + Walk-circle | Table-tennis + Walk-backward | ||
| Stand + Run | Talk-Stand + Run | Pick + Run | Jump + Run | Walk + Stair-up | Walk-circle + Stair-up | Run + Stair-up | Stair-up + Run | Stair-down + Run | Table-tennis + Walk-circle | ||
| Stand + Stair-up | Talk-Stand + Stair-up | Pick + Stair-up | Jump + Stair-up | Walk + Stair-down | Walk-circle + Stair-down | Run + Stair-down | Stair-up + Stair-down | Stair-down + Stair-up | Table-tennis + Run | ||
| Stand + Stair-down | Talk-Stand + Stair-down | Pick + Stair-down | Jump + Stair-down | Walk + Table-tennis | Walk-circle + Table-tennis | Run + Table-tennis | |||||
| Stand + Table-tennis | Talk-Stand + Table-tennis | Pick + Table-tennis | Jump + Table-tennis |
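The pairs in the table above combine two single-activity windows into one two-activity sample, which matches the sequence-to-sequence prediction keyword: each time step keeps its own class label. A sketch of this splicing, with hypothetical window lengths and the class IDs from the KU-HAR table, is shown below; the function name and shapes are illustrative, not the authors' code.

```python
import numpy as np

WALK, STAND = 11, 0   # class IDs from the KU-HAR table above

def make_pair(first, second, label_a, label_b):
    """Concatenate two single-activity windows into one spliced sample.

    first, second: (T, C) sensor windows. Returns the (2T, C) signal and a
    per-time-step label vector for sequence-to-sequence training.
    """
    signal = np.concatenate([first, second], axis=0)
    labels = np.concatenate([np.full(len(first), label_a),
                             np.full(len(second), label_b)])
    return signal, labels

rng = np.random.default_rng(2)
walk = rng.standard_normal((150, 6))    # synthetic "Walk" window
stand = rng.standard_normal((150, 6))   # synthetic "Stand" window
signal, labels = make_pair(walk, stand, WALK, STAND)
```

Generating such pairs enlarges the training set and forces the model to detect the transition point between activities, as reflected in the class distributions of Figure 2.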
Figure 2Distribution of examples by classes: (a) Distribution of examples by individual classes before and after the data augmentation process; (b) distribution of examples by individual classes into training, testing, and validation datasets.
Optimized hyperparameter settings.
| Name | Description | Value |
|---|---|---|
| Epochs | Number of training epochs | 50 |
| Attention dropout rate | Dropout applied to the attention matrix | 0.1 |
| Batch size | Number of samples applied during training at once | 64 |
| Dropout rate | Dropout applied between layers | 0.1 |
| Embedding size | Size of features after signal projection, and size of the position embedding | 128 |
| Fully Connected (FC) size | Size of the first layer in the position-wise feed-forward network | 256 |
| Global clipnorm | Clipping applied globally on gradients | 3.0 |
| Label smoothing | Smoothing of the hard one-hot encoded classes | 0.1 |
| Optimizer | Optimizer used during model training | Adam |
| Warmup steps | Number of steps from the start of training until the learning rate reaches its maximum | 10 |
| Learning rate | The maximum value of learning rate after warmup | 0.001 |
| Learning rate scheduler | The scheduler that controls the learning rate during training | Cosine |
| No. Heads | Number of heads in multi-head attention | 6 |
| No. Layers | Number of encoder blocks in the entire model | 3 |
Figure 3. Parameter importance chart.
Figure 4. Parallel coordinates chart, split in half into two rows.
Figure 5. Attention heatmaps of different single activities.
Figure 6. Attention heatmaps of different pairs of activities.
Figure 7. Cosine similarity between the position embeddings of the time steps.
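The similarity map of Figure 7 is typically computed as the pairwise cosine similarity between the learned position-embedding vectors of all time steps. A sketch with an assumed embedding shape (matching the embedding size of 128 from the hyperparameter table) follows; the random embeddings stand in for the trained ones.

```python
import numpy as np

def cosine_similarity_matrix(pos_emb):
    """Pairwise cosine similarity between position embeddings.

    pos_emb: (T, d) one embedding vector per time step; returns a (T, T)
    symmetric matrix with entry [i, j] = cos(angle between steps i and j).
    """
    norms = np.linalg.norm(pos_emb, axis=1, keepdims=True)
    unit = pos_emb / (norms + 1e-8)   # normalize each row to unit length
    return unit @ unit.T

rng = np.random.default_rng(3)
emb = rng.standard_normal((300, 128))   # stand-in for trained embeddings
sim = cosine_similarity_matrix(emb)
```

For a trained model, a bright diagonal band in this matrix indicates that neighboring time steps received similar position embeddings, which is the pattern such figures usually illustrate.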
Figure 8. Visualization of: (a) Attention matrix; (b) position embedding similarity of a randomly initialized transformer.
Figure 9. (a) Confusion matrix; (b) class-wise performance of the transformer model for human activity recognition.