Dionicio Neira-Rodado¹, Chris Nugent², Ian Cleland², Javier Velasquez¹, Amelec Viloria¹.
Abstract
Human activity recognition (HAR) is a popular field of study. The outcomes of projects in this area have the potential to impact the quality of life of people with conditions such as dementia. HAR focuses primarily on applying machine learning classifiers to data from low-level sensors such as accelerometers, and the performance of these classifiers can be improved through an adequate training process. To improve that process, multivariate outlier detection was used to raise the quality of the data in the training set and, consequently, the performance of the classifier. The impact of the technique was evaluated with k-nearest neighbors (KNN) and random forest (RF) classifiers. In the case of KNN, the performance of the classifier improved from 55.9% to 63.59%.
Keywords: HAR; dataset quality; machine learning; multivariate analysis
Year: 2020 PMID: 32230844 PMCID: PMC7180455 DOI: 10.3390/s20071858
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Relevant research regarding outlier detection in HAR.
| Authors | Outlier Approach | Scope | Relevant Topics |
|---|---|---|---|
| Meng et al. [ | Heuristic based on statistical hierarchy | Activities of daily living | The model aims to detect abnormal behavior once the classifier is trained, in order to identify changes in patients’ behavior. No description of the data cleaning process. |
| Gopalakrishnan and Krishnan [ | Hybrid approach combining outlier scores | Activities of daily living (ADL) | 6% improvement in F-measure for SVM and RBFN. Database with six activities. |
| Jones et al. [ | Hybrid approach using different outlier detection techniques | Physical activity | 4% improvement in the performance of k-means. |
| Muñoz-Organero [ | ANN | ADL | Targets a specific type of outlier related to the cessation of movement. |
| Sunderland et al. [ | Hybrid approach using simulation and minimum covariance determinant | Database of neurodegenerative disease | Successfully detected errors in the database. |
| Li et al. [ | Hybrid approach based on KNN and kernel-target alignment | Not specific | Slight improvement in classifier performance. Tested in two-class scenarios. |
| Hyndman and Ullah [ | Principal component analysis | Forecasting mortality and fertility rates | Reduction of the forecast error. |
| Penny and Jolliffe [ | Comparison between different techniques | Laboratory safety data | Concludes that the best approaches for high-dimensional data are the Mahalanobis distance and the Hadi method. |
| Zhao et al. [ | Hybrid approach | ADL | Improvement in classifier performance. Tested on data with four activities. |
| Marubini and Orenti [ | Minimum covariance determinant | Linear regression prediction | Reduction of the prediction error. |
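Several entries above (the comparison study that favours the Mahalanobis distance, and the minimum covariance determinant approaches) rely on distance-based multivariate outlier detection. The sketch below shows one common form of the idea: a feature vector is flagged as an outlier when its squared Mahalanobis distance from the sample mean exceeds a chi-square quantile. It is an illustration under stated assumptions, not the authors' procedure. Filtering separately per activity in the hypothetical `clean_per_class` helper, the hypothetical `confidence` parameter, and the Wilson-Hilferty approximation of the chi-square quantile (used only to avoid a SciPy dependency) are all choices made here. The extract does not say how cleaning-level labels such as 95_95 map onto confidence levels.

```python
import math
from statistics import NormalDist

import numpy as np


def chi2_quantile(p: float, df: int) -> float:
    """Approximate chi-square quantile (Wilson-Hilferty); avoids a SciPy dependency."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3


def mahalanobis_inliers(X, confidence: float = 0.95) -> np.ndarray:
    """Boolean mask, True for rows whose squared Mahalanobis distance to the
    sample mean is within the chi-square cutoff at `confidence` (df = n_features)."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    # Pseudo-inverse tolerates a near-singular covariance matrix.
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    # One squared distance per row: diff_i^T * inv_cov * diff_i.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return d2 <= chi2_quantile(confidence, X.shape[1])


def clean_per_class(X, y, confidence: float = 0.95):
    """Hypothetical helper: drop multivariate outliers separately within each label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        keep[idx] = mahalanobis_inliers(X[idx], confidence)
    return X[keep], y[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    X[0] = [25.0, -25.0, 25.0]  # an obviously corrupted feature vector
    y = np.zeros(200, dtype=int)
    X_clean, y_clean = clean_per_class(X, y, confidence=0.99)
    print(len(y_clean), "of", len(y), "instances kept")
```

Filtering within each activity keeps the mean and covariance specific to one movement pattern; a single global fit would treat the differences between activities as spread and flag far fewer genuine errors.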
Scenarios and activities performed [57].
| No. | Scenario | Activities | Number of Files |
|---|---|---|---|
| 1 | Self-care | Hair grooming, washing hands, brushing teeth | 24 (72 files) |
| 2 | Exercise (cardio) | Walking, jogging, stepping-up | 23 (69 files) |
| 3 | House cleaning | Ironing clothes, washing windows, washing dishes | 25 (75 files) |
| 4 | Exercise (weights) | Arm curls, dead lift, lateral arm raise | 21 (63 files) |
| 5 | Sport | Bounce ball, catch ball, pass ball | 25 (75 files) |
| 6 | Food preparation | Mixing food, chopping vegetables, sieving flour | 23 (69 files) |
| Total | | | 141 (423 files) |
Extracted features.
| Feature Number | Feature Name | Feature Description |
|---|---|---|
| 1–3 | Mean acceleration | Mean acceleration in x, y and z in the time window. |
| 4 | Mean | Mean |
| 5–7 | Mean logarithm | Mean logarithm in x, y and z in the time window. |
| 8–10 | Mean exponential | Mean of the exponential function evaluated at the acceleration value on each axis, calculated over the corresponding time window. |
| 11–13 | Mean exponential squared | Mean of the exponential function evaluated at the squared acceleration value on each axis, calculated over the corresponding time window. |
| 14–16 | Mean squared acceleration | Mean of the squared values of the acceleration in x, y and z in the time window. |
| 17–20 | Trapezoidal rule | Trapezoidal rule of acceleration in each axis and |
| 21–24 | Minimum | Minimum value of the acceleration in each axis, and |
| 25–28 | Maximum | Maximum value of the acceleration in each axis, and |
| 29–32 | Range | Range of the values of acceleration in each axis and |
| 33–36 | Standard deviation | Standard deviation of the values of |
| 37–40 | Root mean square (RMS) | Root mean square of the values of |
| 41 | Signal magnitude area (SMA) | Signal magnitude area in the time window. It is calculated with the equation |
| 42–44 | Mean square | Mean of the squared values of individual observations in the time window. |
| 45–48 | Entropy | Fast Fourier transform of |
| 49–51 | Median | Median of the acceleration values across |
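Most of the listed features are simple per-window statistics. A minimal NumPy sketch for one window follows, assuming a window of shape (n_samples, 3) sampled at the 51.2 Hz rate stated in the Figure 2 caption. Because parts of the table were lost in extraction, several choices here are assumptions: the fourth channel of the four-valued features is taken to be the acceleration magnitude, the signal magnitude area uses the common definition mean(|x| + |y| + |z|) rather than the paper's own equation (not preserved here), and the standard deviation uses NumPy's default population form. `window_features` is a hypothetical name.

```python
import numpy as np

FS_HZ = 51.2  # Shimmer sampling rate stated in the Figure 2 caption


def window_features(window, fs: float = FS_HZ) -> dict:
    """Compute a subset of the listed features for one window.

    `window` has shape (n_samples, 3): acceleration along x, y, z in m/s^2.
    Assumption: the fourth channel of the four-valued features is the
    magnitude sqrt(x^2 + y^2 + z^2).
    """
    w = np.asarray(window, dtype=float)
    mag = np.linalg.norm(w, axis=1)
    ch = np.column_stack([w, mag])  # columns: x, y, z, magnitude
    dt = 1.0 / fs
    return {
        "mean_xyz": w.mean(axis=0),                    # features 1-3
        "mean_squared_xyz": (w ** 2).mean(axis=0),     # features 14-16
        # Trapezoidal rule written out explicitly (features 17-20).
        "trapezoid": dt * (ch.sum(axis=0) - 0.5 * (ch[0] + ch[-1])),
        "min": ch.min(axis=0),                         # features 21-24
        "max": ch.max(axis=0),                         # features 25-28
        "range": ch.max(axis=0) - ch.min(axis=0),      # features 29-32
        "std": ch.std(axis=0),                         # features 33-36, population SD assumed
        "rms": np.sqrt((ch ** 2).mean(axis=0)),        # features 37-40
        # Assumed common SMA definition: mean of |x| + |y| + |z| (feature 41).
        "sma": np.abs(w).sum(axis=1).mean(),
    }
```

The logarithm and exponential features are left out: the logarithm is undefined for negative acceleration values, and the sketch does not guess how the paper handled that.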
Figure 1. Representation of the proposed approach. (A) Stage 2 of the process. (B) Stage 3 of the process, phases 1 to 5. (C) Stage 3 of the process, phases 6 to 8.
Figure 2. Manual cleansing of the signal. The y-axis shows acceleration in m/s²; the x-axis shows the observation index. The Shimmer sensor collects 51.2 acceleration samples per second, so a value of 100 on the x-axis corresponds to 100 / 51.2 ≈ 1.95 s. The blue, orange, and grey lines represent the acceleration along the x-, y-, and z-axes, respectively.
Distribution of the data by activity.
| Activity | No. of Files |
|---|---|
| Arm curls | 21 |
| Bounce | 25 |
| Catch | 25 |
| Chopping | 23 |
| Deadlift | 21 |
| Dishwashing | 25 |
| Hair grooming | 25 |
| Handwashing | 24 |
| Ironing | 24 |
| Jogging | 23 |
| Lateral arm raise | 21 |
| Mixing food in bowl | 23 |
| Pass | 25 |
| Sieving flour | 23 |
| Stepping | 23 |
| Teeth brushing | 24 |
| Walking | 23 |
| Window washing | 25 |
| TOTAL | 423 |
F-measure results of the different classifiers at stage 1.
| Classifier | F-Measure |
|---|---|
| KNN | 84.00% |
| SVM | 73.60% |
| Random forest | 89.40% |
| Neural networks | 78.00% |
| Regression | 72.10% |
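All results in this record are reported as F-measures over the 18 activity classes. The extract does not state whether the per-class scores were macro-averaged or weighted by class support, so the minimal pure-Python sketch below computes both. The function name `f_measure` and its `average` parameter are hypothetical.

```python
from collections import Counter


def f_measure(y_true, y_pred, average="macro"):
    """Multi-class F-measure from two equal-length label sequences.

    average="macro": unweighted mean of the per-class F1 scores.
    average="weighted": per-class F1 weighted by each class's true support.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # the true class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    f1 = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    if average == "macro":
        return sum(f1.values()) / len(labels)
    if average == "weighted":
        support = Counter(y_true)
        return sum(f1[c] * support[c] for c in labels) / len(y_true)
    raise ValueError("average must be 'macro' or 'weighted'")


print(f_measure(["walk", "walk", "jog", "jog"], ["walk", "jog", "jog", "jog"]))
```

With roughly balanced classes, as in the activity distribution table, the two averages stay close; they diverge when some activities have many more instances than others.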
F-measure values at different cleansing levels.
| Scenario | KNN F-measure | RF F-measure | Kept Instances (%) |
|---|---|---|---|
| Case 1 | 95.80% | 96.00% | 84.07% |
| Case 2 | 94.70% | 95.70% | 92.99% |
| Case 3 | 94.10% | 93.80% | 88.52% |
| Case 4 | 81.20% | 93.90% | 98.16% |
| Case 5 | 52.20% | 92.10% | 100% |
F-measure values at different cleansing levels and with different numbers of features.
| Scenario | KNN (27 Features) | KNN (3 Features) | Kept Instances |
|---|---|---|---|
| Scenario 1 | 92.90% | 63.80% | 98.5% |
| Scenario 2 | 94.20% | 64.00% | 88.7% |
F-measure values for KNN at different cleaning levels, tested against the corresponding cleaned test set.
| Fold | 95_95 | 95_99 | 99_95 | 99_99 |
|---|---|---|---|---|
| 1 | 0.940 | 0.925 | 0.938 | 0.921 |
| 2 | 0.935 | 0.922 | 0.929 | 0.914 |
| 3 | 0.935 | 0.923 | 0.936 | 0.922 |
| 4 | 0.939 | 0.867 | 0.930 | 0.860 |
| 5 | 0.941 | 0.936 | 0.936 | 0.842 |
| 6 | 0.949 | 0.842 | 0.936 | 0.855 |
| 7 | 0.935 | 0.860 | 0.932 | 0.838 |
| 8 | 0.936 | 0.866 | 0.934 | 0.833 |
| 9 | 0.937 | 0.926 | 0.938 | 0.924 |
| 10 | 0.933 | 0.849 | 0.933 | 0.846 |
F-measure values for RF at different cleaning levels, tested against the corresponding cleaned test set.
| Fold | 95_95 | 95_99 | 99_95 | 99_99 |
|---|---|---|---|---|
| 1 | 0.953 | 0.948 | 0.946 | 0.946 |
| 2 | 0.952 | 0.948 | 0.949 | 0.944 |
| 3 | 0.952 | 0.951 | 0.949 | 0.946 |
| 4 | 0.952 | 0.948 | 0.947 | 0.946 |
| 5 | 0.950 | 0.951 | 0.948 | 0.944 |
| 6 | 0.954 | 0.946 | 0.946 | 0.946 |
| 7 | 0.950 | 0.948 | 0.942 | 0.947 |
| 8 | 0.950 | 0.951 | 0.948 | 0.945 |
| 9 | 0.952 | 0.949 | 0.946 | 0.944 |
| 10 | 0.953 | 0.946 | 0.945 | 0.943 |
F-measure values for KNN and RF trained and tested on raw data.
| Fold | RF | KNN |
|---|---|---|
| 1 | 0.939 | 0.557 |
| 2 | 0.942 | 0.556 |
| 3 | 0.945 | 0.555 |
| 4 | 0.941 | 0.562 |
| 5 | 0.941 | 0.561 |
| 6 | 0.937 | 0.549 |
| 7 | 0.938 | 0.543 |
| 8 | 0.941 | 0.551 |
| 9 | 0.940 | 0.559 |
| 10 | 0.940 | 0.557 |
F-measure values for each KNN and RF model when tested against raw data.
| Repetition | KNN raw | RF raw | 95_95 KNN | 95_99 KNN | 99_95 KNN | 99_99 KNN | 95_95 RF | 95_99 RF | 99_95 RF | 99_99 RF |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 55.91% | 94.35% | 55.61% | 56.34% | 55.63% | 56.17% | 88.15% | 90.80% | 88.34% | 90.92% |
| 2 | 55.04% | 94.47% | 73.13% | 75.26% | 73.01% | 75.38% | 88.20% | 90.30% | 87.80% | 90.73% |
| 3 | 55.72% | 94.09% | 59.51% | 60.17% | 59.06% | 60.31% | 87.54% | 90.14% | 87.35% | 90.47% |
| 4 | 55.56% | 94.44% | 63.39% | 62.56% | 63.43% | 62.65% | 87.51% | 90.71% | 87.91% | 91.30% |
| 5 | 56.67% | 94.73% | 57.43% | 57.99% | 57.40% | 58.07% | 87.73% | 90.87% | 88.03% | 89.97% |
| 6 | 56.27% | 94.61% | 69.16% | 66.34% | 69.51% | 66.86% | 87.89% | 90.82% | 88.17% | 91.75% |
| 7 | 57.05% | 95.15% | 63.86% | 61.28% | 64.71% | 61.24% | 88.91% | 91.56% | 88.58% | 92.05% |
| 8 | 55.56% | 94.44% | 68.00% | 69.09% | 68.28% | 69.11% | 88.10% | 90.42% | 88.48% | 91.23% |
| 9 | 56.17% | 94.47% | 64.69% | 65.45% | 65.28% | 65.71% | 87.63% | 90.78% | 87.80% | 91.11% |
| 10 | 55.87% | 95.10% | 59.67% | 59.67% | 59.56% | 59.44% | 88.08% | 91.11% | 88.55% | 90.78% |
Mean, standard deviation, and Anderson-Darling test p-value for the KNN models tested against cleaned sets.
| Statistic | 95_95 KNN | 95_99 KNN | 99_95 KNN | 99_99 KNN |
|---|---|---|---|---|
| Mean | 93.8% | 89.2% | 93.4% | 87.6% |
| Standard deviation | 0.46% | 3.76% | 0.32% | 3.93% |
| Anderson-Darling p-value | 0.0763 | 0.0189 | 0.4524 | 0.0128 |
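The summary rows can be reproduced from the ten per-fold F-measures in the KNN cleaned-test-set table. The sketch below recomputes the mean and standard deviation for the 95_95 and 95_99 KNN models; with the n - 1 (sample) denominator it gives 93.8% / 0.46% and 89.2% / 3.76%, matching the table, which suggests the sample standard deviation was used. The Anderson-Darling p-values are not recomputed here. The `summarize` helper is a hypothetical name.

```python
from statistics import mean, stdev

# Per-fold F-measures copied from the KNN cleaned-test-set table.
scores = {
    "95_95 KNN": [0.940, 0.935, 0.935, 0.939, 0.941, 0.949, 0.935, 0.936, 0.937, 0.933],
    "95_99 KNN": [0.925, 0.922, 0.923, 0.867, 0.936, 0.842, 0.860, 0.866, 0.926, 0.849],
}


def summarize(values):
    """Mean and sample standard deviation (n - 1 denominator), both in percent."""
    return 100 * mean(values), 100 * stdev(values)


for name, values in scores.items():
    m, s = summarize(values)
    print(f"{name}: mean = {m:.1f}%, SD = {s:.2f}%")
```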
Mean, standard deviation, and Anderson-Darling test p-value for the RF models tested against cleaned sets.
| Statistic | 95_95 RF | 95_99 RF | 99_95 RF | 99_99 RF |
|---|---|---|---|---|
| Mean | 95.2% | 94.9% | 94.7% | 94.5% |
| Standard deviation | 0.14% | 0.19% | 0.21% | 0.13% |
| Anderson-Darling p-value | 0.083 | 0.0789 | 0.3061 | 0.1096 |
Mean, standard deviation, and Anderson-Darling test p-value for the RF and KNN models tested against raw sets.
| Statistic | KNN raw | RF raw | 95_95 KNN | 95_99 KNN | 99_95 KNN | 99_99 KNN | 95_95 RF | 95_99 RF | 99_95 RF | 99_99 RF |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 55.98% | 94.58% | 63.40% | 63.42% | 63.59% | 63.49% | 87.97% | 90.75% | 88.10% | 91.03% |
| Standard deviation | 0.58% | 0.33% | 5.59% | 5.72% | 5.66% | 5.82% | 0.42% | 0.41% | 0.40% | 0.60% |
| Anderson-Darling p-value | 0.8347 | 0.1165 | 0.8579 | 0.5615 | 0.8644 | 0.6167 | 0.2539 | 0.5110 | 0.6290 | 0.9490 |
p-values of the t-test comparisons between KNN models (tested against cleaned data).
| Model 1 | Model 2 | p-value |
|---|---|---|
| 95_95_KNN | 95_99_KNN | 0.003 |
| 95_95_KNN | 99_95_KNN | 0.047 |
| 95_95_KNN | 99_99_KNN | 0.0007 |
| 95_99_KNN | 99_95_KNN | 0.006 |
| 95_99_KNN | 99_99_KNN | 0.361 |
| 99_95_KNN | 99_99_KNN | 0.001 |
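The extract does not say which t-test variant produced these p-values. As one plausible choice, the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom for the 95_95 and 95_99 KNN fold scores; Welch's form suits this pair because their standard deviations (0.46% vs 3.76%) differ by almost an order of magnitude. Treat it as an assumption, not a reproduction of the authors' test; `welch_t` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, variance

# Per-fold F-measures copied from the KNN cleaned-test-set table.
knn_95_95 = [0.940, 0.935, 0.935, 0.939, 0.941, 0.949, 0.935, 0.936, 0.937, 0.933]
knn_95_99 = [0.925, 0.922, 0.923, 0.867, 0.936, 0.842, 0.860, 0.866, 0.926, 0.849]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t(knn_95_95, knn_95_99)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A two-sided p-value follows from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`; SciPy's `ttest_ind(a, b, equal_var=False)` performs the same Welch test, while `ttest_rel` would be the choice if the folds were treated as paired.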
p-values of the t-test comparisons between RF models (tested against cleaned data).
| Model 1 | Model 2 | p-value |
|---|---|---|
| 95_95_RF | 95_99_RF | 0.0 |
| 95_95_RF | 99_95_RF | 0.0 |
| 95_95_RF | 99_99_RF | 0.0 |
| 95_99_RF | 99_95_RF | 0.039 |
| 95_99_RF | 99_99_RF | 0.0 |
| 99_95_RF | 99_99_RF | 0.07 |
p-values of the t-test comparisons between RF models (tested against raw data).
| Model 1 | Model 2 | p-value |
|---|---|---|
| 95_95_RF | 95_99_RF | 0.000 |
| 95_95_RF | 99_95_RF | 0.494 |
| 95_95_RF | 99_99_RF | 0.000 |
| 95_99_RF | 99_95_RF | 0.000 |
| 95_99_RF | 99_99_RF | 0.243 |
| 99_95_RF | 99_99_RF | 0.000 |
p-values of the t-test comparisons between the KNN raw-data model and the KNN cleaned-data models, tested against raw data.
| Raw Data Model | Clean Data Model | p-value |
|---|---|---|
| KNN | 95_95_KNN | 0.002 |
| KNN | 95_99_KNN | 0.003 |
| KNN | 99_95_KNN | 0.002 |
| KNN | 99_99_KNN | 0.003 |
p-values of the t-test comparisons between the RF raw-data model and the RF cleaned-data models, tested against raw data.
| Raw Data Model | Clean Data Model | p-value |
|---|---|---|
| RF | 95_95_RF | 0.000 |
| RF | 95_99_RF | 0.000 |
| RF | 99_95_RF | 0.000 |
| RF | 99_99_RF | 0.000 |