| Literature DB >> 35336451 |
Muhammad Awais Shafique, Sergi Saurí Marchán.
Abstract
The accuracy of Human Activity Recognition is noticeably affected by the orientation of smartphones during data collection. This study utilized a public domain dataset that was specifically collected to include variations in smartphone positioning. Although the dataset contained records from various sensors, only accelerometer data were used in this study; thus, the developed methodology would preserve smartphone battery and incur low computation costs. A total of 175 different features were extracted from the pre-processed data. Data stratification was conducted in three ways to investigate the effect of information sharing between the training and testing datasets. After data balancing using only the training dataset, ten-fold and LOSO cross-validation were performed using several algorithms, including Support Vector Machine, XGBoost, Random Forest, Naïve Bayes, KNN, and Neural Network. A very simple post-processing algorithm was developed to improve the accuracy. The results reveal that XGBoost takes the least computation time while providing high prediction accuracy. Although Neural Network outperforms XGBoost, XGBoost demonstrates better accuracy with post-processing. The final detection accuracy ranges from 99.8% to 77.6% depending on the level of information sharing. This strongly suggests that when reporting accuracy values, the associated information sharing levels should be provided as well in order to allow the results to be interpreted in the correct context.
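The first pipeline step the abstract describes, reducing windowed accelerometer data to numeric features, can be sketched as follows. The feature set here (per-axis and magnitude statistics) is illustrative only; the paper's full 175-feature set is not reproduced, and the magnitude signal is included because it is insensitive to smartphone orientation, the problem the study targets.

```python
import numpy as np

def extract_features(window):
    """Compute simple statistical features from one accelerometer
    window of shape (n_samples, 3) holding the x, y, z axes.

    Mean, std, min, and max are computed per axis and for the
    orientation-independent magnitude signal, giving 16 features.
    """
    window = np.asarray(window, dtype=float)
    # Magnitude of the 3-axis vector: robust to phone orientation
    magnitude = np.linalg.norm(window, axis=1)
    feats = []
    for signal in (window[:, 0], window[:, 1], window[:, 2], magnitude):
        feats.extend([signal.mean(), signal.std(), signal.min(), signal.max()])
    return np.array(feats)
```

A full feature set would add, for example, percentiles, correlations between axes, and frequency-domain terms, but the structure (one fixed-length vector per window) stays the same.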
Keywords: human activity recognition; machine learning; oversampling; random forest; smartphone
Year: 2022 PMID: 35336451 PMCID: PMC8948682 DOI: 10.3390/s22062280
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1 Proposed methodology of the study.
Figure 2 Detection of outliers.
Figure 3 Removal of outliers.
Figure 4 Detection accuracy with respect to window size.
Detection accuracy with varying overlap values.
| Window Size | Overlap (%) | Inactive | Active | Walking | Driving | Overall |
|---|---|---|---|---|---|---|
| 28 s | 25 | 97.22 | 92.13 | 77.55 | 84.81 | 91.16 |
| 28 s | 50 | 98.24 | 93.29 | 82.78 | 88.90 | 93.34 |
| 28 s | 75 | 99.02 | 95.81 | 87.24 | 92.92 | 95.47 |
| 24 s | 25 | 98.43 | 90.67 | 80.88 | 86.88 | 92.19 |
| 24 s | 50 | 98.48 | 92.50 | 84.27 | 88.32 | 93.23 |
| 24 s | 75 | 99.07 | 94.61 | 87.60 | 93.66 | 95.37 |
| 20 s | 25 | 98.17 | 93.17 | 79.02 | 87.92 | 92.57 |
| 20 s | 50 | 98.18 | 92.46 | 80.76 | 87.17 | 92.38 |
| 20 s | 75 | 98.56 | 95.45 | 88.11 | 92.91 | 95.35 |
| 16 s | 25 | 97.57 | 91.01 | 77.85 | 85.60 | 91.03 |
| 16 s | 50 | 98.38 | 91.69 | 82.03 | 89.01 | 92.73 |
| 16 s | 75 | 98.66 | 94.23 | 86.29 | 92.51 | 94.77 |
| 12 s | 25 | 97.88 | 89.43 | 78.21 | 85.95 | 90.89 |
| 12 s | 50 | 98.10 | 90.46 | 78.47 | 87.81 | 91.77 |
| 12 s | 75 | 98.63 | 93.61 | 85.48 | 90.45 | 94.17 |
| 8 s | 25 | 97.89 | 87.87 | 76.61 | 84.65 | 90.09 |
| 8 s | 50 | 97.79 | 90.09 | 80.08 | 86.15 | 91.30 |
| 8 s | 75 | 98.60 | 92.23 | 84.60 | 89.86 | 93.49 |
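The window-size and overlap combinations in the table come from standard sliding-window segmentation. A minimal sketch, working in sample counts (converting seconds to samples via the sampling rate is left to the caller):

```python
def sliding_windows(data, window_size, overlap):
    """Split a sequence into fixed-size windows with fractional
    overlap, e.g. overlap=0.75 for the 75% setting that scores
    best in the table above.

    The step between window starts is window_size * (1 - overlap),
    so higher overlap yields more (and more correlated) windows.
    """
    step = max(1, int(window_size * (1 - overlap)))
    windows = []
    for start in range(0, len(data) - window_size + 1, step):
        windows.append(data[start:start + window_size])
    return windows
```

For example, a 10-sample trace with 4-sample windows at 50% overlap yields windows starting at samples 0, 2, 4, and 6.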
Amount of data with respect to activities.
| Activity | No. of Participants | No. of Trips | Amount of Data | Percentage |
|---|---|---|---|---|
| Inactive | 13 | 77 | 40,297 | 45.95 |
| Active | 9 | 65 | 22,516 | 25.67 |
| Walking | 14 | 119 | 12,417 | 14.16 |
| Driving | 12 | 73 | 12,471 | 14.22 |
Figure 5 Distribution of trips among ten folds.
Participant Bands for LOSO cross-validation.
| Band | Participant | Inactive | Active | Walking | Driving |
|---|---|---|---|---|---|
| 1 | 1 | 666 | 0 | 0 | 153 |
| 1 | 6 | 0 | 2008 | 973 | 0 |
| 2 | 2 | 159 | 0 | 675 | 356 |
| 2 | 13 | 1407 | 495 | 86 | 0 |
| 3 | 3 | 20 | 876 | 1014 | 602 |
| 3 | 7 | 1214 | 0 | 906 | 1869 |
| 4 | 4 | 849 | 868 | 63 | 260 |
| 5 | 5 | 331 | 487 | 489 | 3499 |
| 5 | 14 | 152 | 0 | 204 | 0 |
| 6 | 8 | 29,865 | 1403 | 4216 | 1441 |
| 7 | 9 | 1724 | 0 | 2365 | 655 |
| 7 | 11 | 1013 | 12,134 | 618 | 1289 |
| 8 | 10 | 0 | 1859 | 279 | 1885 |
| 8 | 15 | 519 | 0 | 45 | 96 |
| 9 | 12 | 2378 | 2386 | 484 | 366 |
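LOSO cross-validation over the bands above can be generated with a simple split function, equivalent in spirit to scikit-learn's `LeaveOneGroupOut`: each fold holds out every sample belonging to one band (here, one or two participants grouped together), so no subject appears in both training and testing.

```python
import numpy as np

def loso_splits(groups):
    """Yield (train_idx, test_idx) index pairs for leave-one-
    subject-out cross-validation.

    `groups` holds one group label per sample; each fold's test set
    is all samples of one group, and the training set is everything
    else. With the 9 bands in the table above this yields 9 folds.
    """
    groups = np.asarray(groups)
    for g in np.unique(groups):
        test = np.flatnonzero(groups == g)
        train = np.flatnonzero(groups != g)
        yield train, test
```

Pairing participants into bands (rather than one fold per participant) keeps every activity class represented in each training split, which a lone participant with missing activities would not guarantee.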
Example of a forward and backward voting sequence.
| Trip No. | Actual | Predicted | Forward Voting Sequence | Corrected Prediction | Backward Voting Sequence | Corrected Prediction |
|---|---|---|---|---|---|---|
| 1 | Walking | Walking | 0, 0, 1, 0 | Walking | 0, 0, 8, 0 | Walking |
| 1 | Walking | Walking | 0, 0, 2, 0 | Walking | 0, 0, 7, 0 | Walking |
| 1 | Walking | Walking | 0, 0, 3, 0 | Walking | 0, 0, 6, 0 | Walking |
| 1 | Walking |  | 0, 0, 2, 1 | Walking | 0, 0, 5, 0 | Walking |
| 1 | Walking |  | 0, 0, 1, 2 |  | 0, 0, 4, 1 | Walking |
| 1 | Walking | Walking | 0, 0, 2, 1 | Walking | 0, 0, 5, 0 | Walking |
| 1 | Walking | Walking | 0, 0, 3, 0 | Walking | 0, 0, 4, 0 | Walking |
| 1 | Walking | Walking | 0, 0, 4, 0 | Walking | 0, 0, 3, 0 | Walking |
| 1 | Walking | Walking | 0, 0, 5, 0 | Walking | 0, 0, 2, 0 | Walking |
| 1 | Walking |  | 0, 0, 4, 1 | Walking |  | Walking |
| 2 | Inactive |  |  |  | 4, 1, 0, 0 | Inactive |
| 2 | Inactive | Inactive | 1, 0, 0, 0 | Inactive | 5, 0, 0, 0 | Inactive |
| 2 | Inactive | Inactive | 2, 0, 0, 0 | Inactive | 4, 0, 0, 0 | Inactive |
| 2 | Inactive | Inactive | 3, 0, 0, 0 | Inactive | 3, 0, 0, 0 | Inactive |
| 2 | Inactive | Inactive | 4, 0, 0, 0 | Inactive | 2, 0, 0, 0 | Inactive |
| 2 | Inactive |  | 3, 1, 0, 0 | Inactive |  | Inactive |
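The post-processing illustrated in the table can be sketched as a majority vote over neighbouring predictions within the same trip. The look-back length of five used here is an assumption for illustration, and the backward pass is simply the same function applied to the reversed prediction sequence.

```python
from collections import Counter

def forward_vote(predictions, window=5):
    """Smooth a per-window prediction sequence by majority vote
    over the current and up to `window - 1` preceding predictions.

    Ties keep the current label, so only isolated mispredictions
    that are outvoted by their neighbours get overturned.
    """
    corrected = []
    for i, p in enumerate(predictions):
        recent = predictions[max(0, i - window + 1):i + 1]
        counts = Counter(recent)
        best, best_n = p, counts[p]
        for label, n in counts.items():
            if n > best_n:
                best, best_n = label, n
        corrected.append(best)
    return corrected
```

A forward pass alone cannot fix errors at the start of a trip (there are no earlier predictions to outvote them), which is why the table pairs it with a backward voting sequence.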
Figure 6 Confusion matrix.
Classification results for random stratified data.
| Algorithm | Measure | Inactive | Active | Walking | Driving |
|---|---|---|---|---|---|
| XGB | Precision | 0.807 | 0.869 |  |  |
| XGB | Recall | 0.929 |  |  |  |
| XGB | F-Score | 0.95 |  |  |  |
| RF | Precision | 0.975 | 0.771 | 0.825 |  |
| RF | Recall | 0.933 | 0.818 | 0.811 | 0.861 |
| RF | F-Score | 0.951 | 0.788 | 0.836 | 0.835 |
| SVM | Precision | 0.959 | 0.868 | 0.856 |  |
| SVM | Recall | 0.947 | 0.829 | 0.816 | 0.859 |
| SVM | F-Score | 0.951 | 0.814 | 0.834 | 0.853 |
| NB | Precision | 0.934 | 0.707 | 0.703 | 0.588 |
| NB | Recall | 0.476 | 0.705 | 0.815 |  |
| NB | F-Score | 0.555 | 0.685 | 0.674 |  |
| KNN | Precision | 0.943 | 0.788 | 0.781 | 0.812 |
| KNN | Recall | 0.937 | 0.761 | 0.781 | 0.834 |
| KNN | F-Score | 0.937 | 0.766 | 0.777 | 0.819 |
Accuracy and computation time for randomly stratified data.
| Algorithm | Accuracy | Computation Time (min) |
|---|---|---|
| XGB |  | 2.64 |
| RF | 0.876 | 22.86 |
| SVM | 0.886 | 208.13 |
| NB | 0.786 | 5.85 |
| KNN | 0.855 | 295.02 |
Accuracy values after data balancing for randomly stratified data.
| Balancing Method | Accuracy |
|---|---|
| Downsampling | 0.983 |
| Oversampling | |
| Both | 0.995 |
| SMOTE | 0.997 |
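The balancing step behind the accuracies above can be sketched with plain random oversampling, a simpler stand-in for the SMOTE variant the table reports. As in the study, balancing must be applied to the training split only, so no duplicated (or synthetic) rows leak into evaluation.

```python
import numpy as np

def oversample(X, y, rng=None):
    """Duplicate minority-class rows at random until every class
    matches the majority-class count.

    SMOTE would instead interpolate new synthetic points between
    minority-class neighbours; the balancing effect is the same.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        idx.extend(members)
        if n < target:
            # Sample extra minority rows with replacement
            idx.extend(rng.choice(members, size=target - n, replace=True))
    idx = np.array(idx)
    return X[idx], y[idx]
```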
Results for trip-wise stratified data.
| Balancing Method | Measure | Inactive | Active | Walking | Driving | Accuracy |
|---|---|---|---|---|---|---|
| None | Precision | 0.935 | 0.785 | 0.868 |  |  |
| None | Recall | 0.677 | 0.798 |  |  |  |
| None | F-Score | 0.948 | 0.693 | 0.735 |  |  |
| Downsampling | Precision | 0.961 | 0.8 | 0.727 | 0.793 |  |
| Downsampling | Recall | 0.952 | 0.694 | 0.909 |  |  |
| Downsampling | F-Score | 0.956 | 0.709 | 0.841 |  |  |
| Oversampling | Precision | 0.654 | 0.773 | 0.871 |  |  |
| Oversampling | Recall | 0.822 | 0.838 |  |  |  |
| Oversampling | F-Score | 0.712 | 0.775 |  |  |  |
Results for user-wise stratified data.
| Algorithm | Measure | Inactive | Active | Walking | Driving | Accuracy | Time (min) |
|---|---|---|---|---|---|---|---|
| XGB | Precision | 0.878 | 0.68 | 0.554 | 0.635 | 0.695 | 3.03 |
| XGB | Recall | 0.908 | 0.467 | 0.791 | 0.646 |  |  |
| XGB | F-Score | 0.878 | 0.486 | 0.59 | 0.563 |  |  |
| NN | Precision | 0.697 | 0.62 | 0.62 | 0.814 | 0.731 | 14.16 |
| NN | Recall | 0.711 | 0.763 |  |  |  |  |
| NN | F-Score | 0.639 | 0.529 | 0.649 |  |  |  |
| XGB with Postprocessing | Precision |  |  |  |  |  | 3.2 |
| XGB with Postprocessing | Recall | 0.508 | 0.772 |  |  |  |  |
| XGB with Postprocessing | F-Score | 0.744 |  |  |  |  |  |
Figure 7 Information sharing at data level.
Figure 8 Information sharing at trip level.
Figure 9 Information sharing at participant level.