Eftim Zdravevski, Biljana Risteska Stojkoska, Marie Standl, Holger Schulz.
Abstract
BACKGROUND: Assessment of the health benefits associated with physical activity depends on the activity's duration, intensity and frequency; their correct identification is therefore valuable and important in epidemiological and clinical studies. The aims of this study are: to develop an algorithm for automatic identification of intended jogging periods; and to assess whether identification performance is improved when using two accelerometers at the hip and ankle, compared to using only one at either position.
Year: 2017 PMID: 28880923 PMCID: PMC5589162 DOI: 10.1371/journal.pone.0184216
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Exemplary raw accelerometer readings for one hour during which two participants, (a) male and (b) female, had a jogging activity.
The inclinometer and step-count time series are omitted for clarity because they are in a different unit with much smaller values. Jogging 'diary' denotes the jogging period reported by the user; jogging 'golden' denotes the jogging period according to the 'golden standard' labels.
Distribution of participants and Jogging Periods (JP) duration in minutes per dataset.
| Dataset | Male | Female | Days | Number of JP | Sum | Mean | Min | Max |
|---|---|---|---|---|---|---|---|---|
| Train | 7 | 7 | 22 | 40 | 806 | 20.1 | 2 | 90 |
| Validation | 7 | 6 | 19 | 31 | 648 | 22.0 | 3 | 63 |
| Test | 4 | 8 | 14 | 39 | 380 | 9.8 | 1 | 46 |
Male and Female are participant counts per gender; Sum, Mean, Min and Max refer to the JP duration in minutes. Days is the total number of days on which participants reported a jogging period.
Fig 2. Distribution of the duration (in minutes) of jogging periods (a) and of pauses between jogging periods (b) in the training and validation datasets, based on the 'golden standard' labels before application of the post-classification rules.
Number of instances per dataset and segmentation strategy.
| Dataset | Jogging | Non-jogging | JR | Jogging | Non-jogging | JR |
|---|---|---|---|---|---|---|
| Train | 829 | 31421 | 0.0257 | 745 | 31320 | 0.0232 |
| Validation | 707 | 27148 | 0.0254 | 645 | 27060 | 0.0233 |
| Test | 395 | 23437 | 0.0166 | 320 | 22966 | 0.0137 |
The first Jogging/Non-jogging/JR triplet refers to 60s windows without overlap; the second to 180s windows with 120s overlap.
Jogging and Non-jogging give the number of instances (i.e. epochs or episodes) in the dataset, and JR is the Jogging Ratio, i.e. JR = Jogging / (Jogging + Non-jogging).
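The JR values in the table are consistent with the ratio of jogging instances to all instances, and the two segmentation strategies are fixed-length windows with and without overlap. A minimal Python sketch of both (function names and the sampling-rate handling are ours, not the paper's):

```python
import numpy as np

def segment(signal, fs, window_s, overlap_s=0):
    """Split a 1-D signal into fixed-length windows.

    fs        -- samples per second
    window_s  -- window length in seconds
    overlap_s -- overlap between consecutive windows in seconds
    """
    win = int(window_s * fs)
    step = win - int(overlap_s * fs)
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

def jogging_ratio(n_jogging, n_non_jogging):
    """JR as used in the table: jogging instances over all instances."""
    return n_jogging / (n_jogging + n_non_jogging)

# Train dataset, 60s windows without overlap: 829 / (829 + 31421) ~ 0.0257
print(round(jogging_ratio(829, 31421), 4))
```

For a one-hour signal, 60s windows without overlap yield 60 instances, while 180s windows with a 120s overlap (i.e. a 60s step) yield 58, which matches the slightly smaller instance counts in the second triplet of columns.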
Fig 3. Algorithm for feature extraction, selection and classification.
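The feature selection and classification stages of such a pipeline can be illustrated with a generic scikit-learn sketch; the selector (SelectKBest) and the synthetic data here are stand-ins for illustration, not the paper's actual feature-scoring procedure:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))           # stand-in for window-level features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in for jogging labels

pipe = Pipeline([
    # keep the 8 best-scoring features (cf. "Best Ankle (8 feat.)" below)
    ("select", SelectKBest(f_classif, k=8)),
    # ERT = Extremely Randomized Trees, one of the four classifiers compared
    ("clf", ExtraTreesClassifier(n_estimators=100, random_state=0)),
])
pipe.fit(X, y)
```

In a real evaluation the pipeline would be fit on the training dataset, tuned on the validation dataset, and scored on the held-out test dataset, mirroring the three-way split in the tables above.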
Performance of different classifiers on the 4 final feature sets, depending on feature type with the segmentation strategy of 60s windows without overlap.
| Features | Classifier | Acc. | AUC | Prec. | Recall | Spec. | F1 | Time |
|---|---|---|---|---|---|---|---|---|
| Best Ankle (8 feat.) | ERT | 0.9970 | 0.9906 | 0.9339 | 0.8937 | 0.9988 | 0.9133 | 4.0 |
| | RF | 0.9969 | 0.9883 | 0.9579 | 0.8633 | 0.9993 | 0.9081 | 4.7 |
| | Logistic | 0.9967 | 0.9986 | 0.9259 | 0.8861 | 0.9987 | 0.9056 | 0.1 |
| | SVM | 0.9917 | 0.9940 | 0.7137 | 0.8962 | 0.9934 | 0.7946 | 24.3 |
| Best Hip (20 feat.) | ERT | 0.9967 | 0.9951 | 0.9171 | 0.8962 | 0.9985 | 0.9065 | 5.5 |
| | RF | 0.9962 | 0.9906 | 0.8939 | 0.8962 | 0.9981 | 0.8951 | 6.9 |
| | Logistic | 0.9977 | 0.9991 | 0.9528 | 0.9190 | 0.9992 | 0.9356 | 0.4 |
| | SVM | 0.9970 | 0.9988 | 0.9533 | 0.8785 | 0.9992 | 0.9144 | 188.2 |
| All (17 feat.) | ERT | 0.9954 | 0.9977 | 0.8600 | 0.8861 | 0.9974 | 0.8728 | 9.9 |
| | RF | 0.9959 | 0.9959 | 0.8961 | 0.8734 | 0.9981 | 0.8846 | 8.2 |
| | Logistic | 0.9963 | 0.9911 | 0.9023 | 0.8886 | 0.9982 | 0.8954 | 0.9 |
| | SVM | 0.9954 | 0.9987 | 0.8391 | 0.9241 | 0.9968 | 0.8795 | 249.9 |
| Best Ankle + Best Hip (28 feat.) | ERT | 0.9971 | 0.9966 | 0.9321 | 0.9038 | 0.9988 | 0.9177 | 7.8 |
| | RF | 0.9972 | 0.9911 | 0.9514 | 0.8911 | 0.9992 | 0.9203 | 7.7 |
| | Logistic | 0.9968 | 0.9992 | 0.9030 | 0.9190 | 0.9982 | 0.9109 | 0.5 |
| | SVM | 0.9976 | 0.9993 | 0.9525 | 0.9139 | 0.9992 | 0.9328 | 204.7 |
Classifiers: ERT = Extremely Randomized Trees, RF = Random Forest, Logistic = logistic regression, SVM = Support Vector Machine. The SVM parameters for the first feature set are C = 10, γ = 0.01, and for all remaining feature sets C = 0.1, γ = 0.0001. Acc. stands for Accuracy, AUC for Area Under the receiver-operating-characteristic Curve, and Prec. for Precision; Recall is also known as sensitivity, hit rate, or true positive rate; Spec. stands for Specificity (also known as true negative rate), F1 for the F1 score (harmonic mean of precision and recall), and Time for the total time (in seconds) for building a model on the training dataset and making predictions on the test dataset. In the original table, the classifier with a gray background has the highest accuracy for the feature set and was therefore selected as best for that feature set; other gray cells mark the best classifier for the feature set with respect to the given metric (column).
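The tabulated metrics all follow from the confusion-matrix counts; a small helper (ours, for illustration) makes the definitions concrete:

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics used in the tables, from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}
```

Note that with such a low jogging ratio (about 2.5% positives), accuracy is dominated by the non-jogging class, which is why precision, recall, specificity and F1 are reported alongside it.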
Performance of different classifiers on the 4 final feature sets, depending on feature type with the segmentation strategy of 180s windows with 120s overlap.
| Features | Classifier | Acc. | AUC | Prec. | Recall | Spec. | F1 | Time |
|---|---|---|---|---|---|---|---|---|
| Best Ankle (17 feat.) | ERT | 0.9990 | 0.9930 | 0.9571 | 0.9750 | 0.9993 | 0.9659 | 5.0 |
| | RF | 0.9988 | 0.9928 | 0.9509 | 0.9688 | 0.9993 | 0.9598 | 6.1 |
| | Logistic | 0.9989 | 0.9994 | 0.9837 | 0.9406 | 0.9998 | 0.9617 | 0.3 |
| | SVM | 0.9994 | 0.9972 | 1.0000 | 0.9625 | 1.0000 | 0.9809 | 25.2 |
| Best Hip (20 feat.) | ERT | 0.9981 | 0.9976 | 0.9761 | 0.8938 | 0.9997 | 0.9331 | 7.1 |
| | RF | 0.9978 | 0.9975 | 0.9564 | 0.8906 | 0.9994 | 0.9223 | 5.6 |
| | Logistic | 0.9967 | 0.9988 | 0.8739 | 0.9094 | 0.9980 | 0.8913 | 0.4 |
| | SVM | 0.9968 | 0.9994 | 0.8862 | 0.9000 | 0.9983 | 0.8930 | 141.4 |
| All (12 feat.) | ERT | 0.9954 | 0.9977 | 0.8600 | 0.8861 | 0.9974 | 0.8728 | 9.9 |
| | RF | 0.9975 | 0.9935 | 0.9102 | 0.9188 | 0.9987 | 0.9145 | 5.5 |
| | Logistic | 0.9990 | 0.9962 | 1.0000 | 0.9344 | 1.0000 | 0.9661 | 0.3 |
| | SVM | 0.9972 | 0.9964 | 0.8862 | 0.9250 | 0.9982 | 0.9052 | 24.8 |
| Best Ankle + Best Hip (37 feat.) | ERT | 0.9988 | 0.9982 | 0.9399 | 0.9781 | 0.9991 | 0.9587 | 7.6 |
| | RF | 0.9983 | 0.9964 | 0.9354 | 0.9500 | 0.9990 | 0.9426 | 7.6 |
| | Logistic | 0.9985 | 0.9997 | 0.9470 | 0.9500 | 0.9992 | 0.9485 | 1.6 |
| | SVM | 0.9984 | 0.9998 | 0.9613 | 0.9313 | 0.9994 | 0.9460 | 186.9 |
All naming conventions are the same as in Table 3. The SVM parameters for the first and third feature sets are C = 10, γ = 0.001, and for the second and fourth feature sets C = 0.1, γ = 0.0001.
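The reported C and γ pairs imply an RBF-kernel SVM; a minimal scikit-learn sketch with the parameters given here for the first feature set (C = 10, γ = 0.001), on synthetic stand-in data of the same dimensionality as Best Ankle (17 features):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 17))     # stand-in for the 17 Best Ankle features
y = (X[:, 0] > 0).astype(int)      # stand-in labels

# C controls the soft-margin penalty, gamma the RBF kernel width
svm = SVC(kernel="rbf", C=10, gamma=0.001)
svm.fit(X, y)
```

The long Time entries for SVM in the tables (up to a few hundred seconds, versus under a second for logistic regression) reflect the quadratic-in-samples cost of kernel SVM training.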
Fig 4. Distribution of the duration (in minutes) of jogging periods ((a), (b) and (c)) and of pauses between jogging periods ((d), (e) and (f)) in the training and validation datasets, based on the 'golden standard' labels after applying post-classification Rule 1 ((a) and (d)), Rule 2 ((b) and (e)) and Rule 3 ((c) and (f)).
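The post-classification rules act on the durations of predicted jogging periods and of the pauses between them, but their exact thresholds are not spelled out in this excerpt. A generic sketch of the two operations such rules typically perform on a per-window 0/1 prediction sequence, with illustrative thresholds of our own:

```python
def merge_short_pauses(pred, max_pause):
    """Relabel short non-jogging gaps between two jogging runs as jogging."""
    out = list(pred)
    i = 0
    while i < len(out):
        if out[i] == 0:
            j = i
            while j < len(out) and out[j] == 0:
                j += 1
            # merge only if jogging occurs on both sides of the pause
            if 0 < i and j < len(out) and (j - i) <= max_pause:
                out[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return out

def drop_short_periods(pred, min_len):
    """Discard predicted jogging runs shorter than min_len windows."""
    out = list(pred)
    i = 0
    while i < len(out):
        if out[i] == 1:
            j = i
            while j < len(out) and out[j] == 1:
                j += 1
            if (j - i) < min_len:
                out[i:j] = [0] * (j - i)
            i = j
        else:
            i += 1
    return out
```

Applied in that order, the first pass smooths isolated misclassified windows inside a jogging period and the second removes spurious short detections.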
Feature importances estimated by Random Forest with 1000 trees with the Best Ankle + Best Hip feature set and the segmentation strategy of 180s windows with 120s overlap.
| # | Feature name | Score | # | Feature name | Score |
|---|---|---|---|---|---|
| 1 | AnkleSteps auto-corr t = 1 | 0.1495 | 20 | HipAxis1 quad fit c2 | 0.0045 |
| 2 | AnkleSteps auto-corr t = 2 | 0.1239 | 21 | FFT_amp(AnkleMag) IQR | 0.0045 |
| 3 | AnkleMag auto-corr t = 2 | 0.1065 | 22 | 1st_deriv(HipSteps) perc. 70 | 0.0044 |
| 4 | AnkleSteps perc. 80 | 0.0980 | 23 | HipAxis1 lin fit c1 | 0.0036 |
| 5 | AnkleMag energy | 0.0945 | 24 | FFT_mag(AnkleSteps) perc. 90 | 0.0031 |
| 6 | AnkleSteps hist [0, 0.57) | 0.0725 | 25 | FFT_freq(AnkleSteps) IQR | 0.0031 |
| 7 | HipAxis2 median | 0.0716 | 26 | FFT_freq(AnkleMag) max | 0.0030 |
| 8 | HipAxis3 perc. 60 | 0.0474 | 27 | 1st_deriv(AnkleMag) hist [-0.5, 0.5) | 0.0029 |
| 9 | HipAxis2 perc. 10 | 0.0332 | 28 | delta(AnkleSteps) perc. 80 | 0.0028 |
| 10 | HipAxis2 perc. 5 | 0.0324 | 29 | delta(AnkleMag) perc. 80 | 0.0024 |
| 11 | HipAxis2 perc. 70 | 0.0297 | 30 | HipSteps perc. 80 | 0.0023 |
| 12 | HipAxis1 perc. 80 | 0.0291 | 31 | 1st_deriv(HipSteps) perc. 20 | 0.0017 |
| 13 | HipAxis1 perc. 10 | 0.0174 | 32 | HipAxis1 hist [129, 259) | 0.0014 |
| 14 | HipMag median | 0.0106 | 33 | AnkleAxis2 min | 0.0010 |
| 15 | AnkleMag perc. 10 | 0.0101 | 34 | HipAxis3 hist [0, 129) | 0.0010 |
| 16 | HipAxis2 min | 0.0101 | 35 | HipAxis2 hist [0, 129) | 0.0008 |
| 17 | HipAxis2 quad fit c2 | 0.0098 | 36 | 1st_deriv(AnkleSteps) hist [-2.9, -1.7) | 0.0006 |
| 18 | HipAxis1 min | 0.0052 | 37 | delta(HipAxis3) hist [-70, 89) | 0.0004 |
| 19 | delta(AnkleMag) perc. 75 | 0.0050 | | | |
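Importance scores of this kind can be obtained as the impurity-based importances exposed by scikit-learn's Random Forest; a sketch with 1000 trees, as above, on synthetic stand-in data (feature count, names and data are ours):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
# label depends almost entirely on feature 0, so it should rank first
y = (X[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=1000, random_state=0).fit(X, y)
# importances are normalized to sum to 1; sort descending for a ranking
ranking = np.argsort(rf.feature_importances_)[::-1]
```

Because the scores sum to 1, the table can be read as fractional contributions; the top-ranked feature (AnkleSteps auto-correlation at lag 1) alone accounts for about 15% of the total importance.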
Comparison of the baseline and proposed approach for feature extraction per sensor location by the best obtained value per metric.
| Sensor location | Method | Accuracy | AUC | Precision | Recall | Specificity | F1 |
|---|---|---|---|---|---|---|---|
| Ankle | Baseline | 0.9982 | 0.9993 | 0.9557 | 0.9749 | 0.9993 | 0.9393 |
| | Proposed | 0.9994 | 0.9994 | 1.0000 | 0.9750 | 1.0000 | 0.9809 |
| Hip | Baseline | 0.9977 | 0.9993 | 0.9426 | 0.9469 | 0.9991 | 0.9187 |
| | Proposed | 0.9981 | 0.9994 | 0.9761 | 0.9190 | 0.9997 | 0.9356 |
| Ankle + Hip | Baseline | 0.9987 | 0.9997 | 0.9579 | 0.9563 | 0.9993 | 0.9563 |
| | Proposed | 0.9990 | 0.9998 | 1.0000 | 0.9781 | 1.0000 | 0.9661 |
All naming conventions are the same as in Table 3.
Average performance on the balanced datasets of the baseline and proposed approach for feature extraction per sensor location by the best obtained value per metric.
| Sensor location | Method | Accuracy | AUC | Precision | Recall | Specificity | F1 |
|---|---|---|---|---|---|---|---|
| Ankle | Baseline | 0.9825 | 0.9947 | 0.9788 | 0.9851 | 0.9786 | 0.9820 |
| | Proposed | 0.9827 | 0.9980 | 0.9789 | 0.9865 | 0.9787 | 0.9827 |
| Hip | Baseline | 0.9830 | 0.9969 | 0.9803 | 0.9859 | 0.9789 | 0.9830 |
| | Proposed | 0.9847 | 0.9971 | 0.9792 | 0.9904 | 0.9801 | 0.9848 |
| Ankle + Hip | Baseline | 0.9817 | 0.9958 | 0.9780 | 0.9857 | 0.9777 | 0.9818 |
| | Proposed | 0.9841 | 0.9972 | 0.9787 | 0.9899 | 0.9784 | 0.9842 |
All naming conventions are the same as in Table 3.
Fig 5. The jogging period matching ratio per feature set type and applied post-classification rule, for the highest-accuracy classification model obtained with the proposed and baseline feature sets.