David Burns1,2,3,4, Philip Boyer1,4, Colin Arrowsmith1,3, Cari Whyne1,2,4.
Abstract
A significant challenge for a supervised learning approach to inertial human activity recognition is the heterogeneity of data generated by individual users, which results in very poor performance for some subjects. We present an approach to personalized activity recognition based on deep feature representations derived from a convolutional neural network (CNN). We experiment with both categorical cross-entropy loss and triplet loss for training, and describe a novel loss function based on subject triplets. We evaluate these methods on three publicly available inertial human activity recognition datasets (MHEALTH, WISDM, and SPAR), comparing classification accuracy, out-of-distribution activity detection, and generalization to new activity classes. The proposed triplet algorithm achieved an average 96.7% classification accuracy across the tested datasets versus 87.5% for the baseline CNN algorithm. We demonstrate that personalized algorithms, and in particular the proposed novel triplet loss algorithms, are more robust to inter-subject variability and thus exhibit better performance on classification and out-of-distribution detection tasks.
Keywords: human activity recognition; inertial sensors; machine learning; personalized algorithms; time series; triplet neural network
Year: 2022 PMID: 35890902 PMCID: PMC9324610 DOI: 10.3390/s22145222
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Personalized triplet network (PTN) training and prediction methodology. Beginning from the top left, each dataset is split into 5 groups for 5-fold cross validation, stratifying the groups by subject. Activity classes are distributed uniformly across groups. Colorization indicates activity classes or model layer, as applicable. Sliding window segmentation is then applied to each fold, and the segmented test fold is held back. PTN training (bottom left) is achieved by drawing two segments (an anchor and a positive) from the target activity class and one segment (a negative) from a different class, performing a forward pass through the triplet neural network (TNN) for each of the three segments, and computing the triplet loss. This procedure is then repeated over the set of triplets for each activity class i. The model is then evaluated by temporally splitting the test segments for each class into “reference” and “test” sets, ensuring no temporal overlap between reference and test segments. Reference segments from all classes for a given patient are then passed through the TNN, and the resulting embeddings are used to train a k-NN model (bottom right). Finally, inference is performed by passing test segments through the TNN and performing a k-NN search across the set of reference embeddings.
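The anchor/positive/negative sampling described above optimizes a triplet margin loss. The paper's subject-specific ("patient triplet") variant is not reproduced in this excerpt, but the conventional form used by the PTN † baseline can be sketched in a few lines; the function name and the margin value here are illustrative, not taken from the paper.

```python
from math import dist

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Conventional triplet margin loss on embedding vectors: pull the
    anchor toward the positive (same activity class) and push it at
    least `margin` further away from the negative (different class)."""
    return max(dist(anchor, positive) - dist(anchor, negative) + margin, 0.0)

# Toy 2-D embeddings: the anchor is already much closer to the positive
# than to the negative, so the margin is satisfied and the loss is zero.
a, p, n = (0.0, 0.0), (0.1, 0.0), (3.0, 0.0)
triplet_loss(a, p, n)  # -> 0.0
```

Swapping the positive and negative produces a large loss (3.0 − 0.1 + 1.0 = 3.9), which is the gradient signal that reshapes the embedding during TNN training.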
Experimental inertial datasets.
| Dataset | Sensors | Subjects | Classes | Sampling | Omitted Subjects | Domain | Sensor Placement |
|---|---|---|---|---|---|---|---|
| MHEALTH | 9-axis IMU | 10 | 12 | 100 Hz | 0 | Exercise | Chest, left ankle, right arm |
| WISDM | 6-axis IMU x2 | 51 | 18 | 20 Hz | 4 | ADL, Exercise | Right pant pocket, wrist |
| SPAR | 6-axis IMU x1 | 40 | 7 | 50 Hz | 0 | Physiotherapy | Wrist |
1 The following activities were performed in each dataset. MHEALTH: Standing still, sitting, lying down, walking, climbing stairs, waist bends forward, frontal elevation of arms, knees bending, cycling, jogging, running, jump front and back. WISDM: Walking, jogging, ascending/descending stairs, sitting, standing, kicking a soccer ball, dribbling a basketball, catching a tennis ball, typing, writing, clapping, brushing teeth, folding clothes, eating pasta, eating soup, eating a sandwich, eating chips, drinking from a cup. SPAR: Pendulum, abduction, forward elevation, internal rotation with resistance band, external rotation with resistance band, lower trapezius row with resistance band, bent-over row with 3 lb dumbbell.
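Because the datasets above are sampled at different rates (100 Hz, 20 Hz, 50 Hz), the sliding window segmentation step in Figure 1 derives its window length in samples from the sampling rate. A minimal sketch, assuming an illustrative 4 s window with 50% overlap (the paper's exact window parameters are not given in this excerpt):

```python
def sliding_windows(signal, fs_hz, win_s=4.0, overlap=0.5):
    """Split a sequence of samples into fixed-length windows whose size
    is derived from the sampling rate. `win_s` and `overlap` are
    illustrative defaults, not values taken from the paper."""
    win = int(win_s * fs_hz)          # samples per window
    hop = int(win * (1.0 - overlap))  # step between window starts
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]

# 10 s of samples at the SPAR rate (50 Hz): 200-sample windows, 100-sample hop
segs = sliding_windows(list(range(500)), fs_hz=50)
len(segs)  # -> 4 windows of 200 samples each
```

In practice each sample would be a multi-channel IMU reading rather than a scalar, but the indexing is identical.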
Activity classification performance.
| Model | MHEALTH | WISDM | SPAR |
|---|---|---|---|
| FCN | 0.925 ± 0.049 | 0.754 ± 0.012 | 0.947 ± 0.069 |
| PEF | 0.984 ± 0.029 | 0.852 ± 0.060 | 0.971 ± 0.038 |
| PDF | 0.995 ± 0.016 | 0.889 ± 0.055 | 0.980 ± 0.028 |
| PTN † | 0.993 ± 0.024 | 0.909 ± 0.054 | 0.978 ± 0.035 |
| PTN | | | |
1 Classification performance of the fully-convolutional neural network (FCN), personalized engineered feature model (PEF), personalized deep feature model (PDF), personalized triplet network trained with conventional triplet loss (PTN †), and the personalized triplet model trained with patient-specific triplet loss (PTN). Scores are the cross-validated classification accuracy (mean ± standard deviation) aggregated by subject.
Figure 2. Violin plots showing the distribution of classifier performance by subject using five-fold cross validation. The distributions are cut off at the minimum and maximum accuracy values. The personalized classifiers have better performance and less inter-subject performance variation than the impersonal FCN (fully convolutional network) model.
Figure 3. The effect of embedding size (number of features) on personalized feature classifier accuracy, evaluated on the SPAR dataset. The performance of the PEF model appears to degrade at embedding sizes of 16 and below.
Figure 4. The effect of reference data size (number of reference segments per activity class) on personalized feature classifier accuracy, evaluated on the SPAR dataset. Increasing the reference data size improves performance for the PEF model. A reference size of four segments results in significantly degraded performance in all models.
Figure 5. Violin plots showing the distribution of OOD detection AUROC across subjects, with 30% of activity classes held back from the training set. The displayed distributions are cut off at the minimum and maximum AUROC values for each classifier. The PDF, PTN, and PEF classifiers had the highest mean AUROC scores on the MHEALTH, WISDM, and SPAR datasets, respectively.
Figure 6. Distribution of activity classification performance when generalizing an embedding to novel activity classes, with 30% of activity classes held back from the training set. The PTN model achieved the highest mean accuracy across all three datasets.
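Generalization to novel classes is possible because prediction is a k-NN search over a subject's reference embeddings (Figure 1): adding a new class only requires embedding a few reference segments for it, not retraining the network. A minimal sketch of that search step, with illustrative class names and an illustrative k:

```python
from math import dist
from collections import Counter

def knn_predict(test_emb, ref_embs, ref_labels, k=3):
    """Classify one test embedding by majority vote among the k
    reference embeddings nearest to it in Euclidean distance."""
    order = sorted(range(len(ref_embs)),
                   key=lambda i: dist(ref_embs[i], test_emb))
    return Counter(ref_labels[i] for i in order[:k]).most_common(1)[0][0]

# Two well-separated reference classes in a toy 2-D embedding space.
refs = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels = ["pendulum", "pendulum", "abduction", "abduction"]
knn_predict((0.2, 0.0), refs, labels)  # -> "pendulum"
```

A well-trained embedding clusters segments of the same activity together, so the vote is usually unanimous; the held-back-class experiments above probe whether that clustering extends to activities never seen during training.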
Computational and storage expense.
| Model | Fit Time [s] | Inference Time [s] | Model Size [kB] | Reference Size [kB] |
|---|---|---|---|---|
| FCN | 137 | 0.47 | 4290 | 0 |
| PEF | 3.3 | 0.39 | 3.8 | 112 |
| PDF | 129 | 0.94 | 1095 | 112 |
| PTN | 667 | 1.3 | 1095 | 112 |