Wesllen Sousa Lima, Hendrio L. de Souza Bragança, Kevin G. Montero Quispe, Eduardo J. Pereira Souto.
Abstract
Mobile sensing has enabled a variety of solutions for monitoring and recognizing human activities (HAR). Such solutions have been implemented on smartphones with the aim of better understanding human behavior. However, they are still constrained by the limited computing resources of smartphones, and so the HAR field has focused on developing solutions with low computational cost. In general, these solutions rely on shallow and deep learning algorithms, but not all of these strategies are feasible on smartphones because of the high computational cost required, mainly, by the data preparation and model training steps. In this context, this article evaluates a new set of alternative strategies based on the Symbolic Aggregate Approximation (SAX) and Symbolic Fourier Approximation (SFA) algorithms, with the purpose of developing solutions with low memory and processing costs. In addition, this article evaluates classification algorithms adapted to handle symbolic data, such as SAX-VSM, BOSS, BOSS-VS, and WEASEL. Experiments were performed on the UCI-HAR, SHOAIB, and WISDM databases, commonly used in the literature to validate smartphone-based HAR solutions. The results show that the symbolic representation algorithms are faster in the feature extraction phase by 84.81% on average, reduce memory consumption by 94.48% on average, and achieve accuracy rates equivalent to those of conventional algorithms.
Keywords: human activity recognition; inertial sensors; symbolic representation algorithms
Year: 2018 PMID: 30463336 PMCID: PMC6263747 DOI: 10.3390/s18114045
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Example of a discretized time series [29].
Figure 2. Example of dimensionality and numerosity reduction in a discretized time series represented by words. In this case, the numbers indicate the indexes and the bold words indicate the repeated words.
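To make the numerosity reduction of Figure 2 concrete, the minimal Python sketch below collapses consecutive repetitions of the same word while keeping the index of the first occurrence. The pair-list input format is an assumption for illustration, not the paper's implementation.

```python
def numerosity_reduction(words):
    """Drop consecutive duplicate words, keeping the first occurrence.

    words: list of (index, word) pairs produced by a sliding window.
    Returns the reduced list, as illustrated in Figure 2.
    """
    reduced = []
    previous = None
    for index, word in words:
        if word != previous:
            reduced.append((index, word))
            previous = word
    return reduced

# Example: repeated neighbors collapse into a single entry.
print(numerosity_reduction([(0, "abc"), (1, "abc"), (2, "acd"), (3, "acd"), (4, "abc")]))
# -> [(0, 'abc'), (2, 'acd'), (4, 'abc')]
```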
Figure 3. Statistical table used in the definition of breakpoints [33].
Figure 4. Example of time series discretization with SAX.
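The discretization of Figure 4 can be sketched in a few lines of Python: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA), and map each segment mean to a letter using the equiprobable Gaussian breakpoints of Figure 3. This is a minimal illustration of standard SAX, not the authors' code; the parameter names are assumptions.

```python
import numpy as np
from scipy.stats import norm

def sax_word(series, word_size=4, alphabet_size=4):
    """Minimal SAX: z-normalize, reduce with PAA, and map each segment
    mean to a letter via Gaussian breakpoints (the table of Figure 3)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-8)            # z-normalization
    paa = np.array([s.mean() for s in np.array_split(x, word_size)])
    # Breakpoints split N(0,1) into alphabet_size equiprobable regions.
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
    letters = "abcdefghij"
    return "".join(letters[np.searchsorted(breakpoints, v)] for v in paa)

print(sax_word([1.0, 1.2, 0.9, 3.1, 3.0, 0.2, 0.1, 0.3]))  # -> 'bdca'
```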
Figure 5. Visual example of the difference between SAX (a) and SFA (b) [29].
Figure 6. Subsequence T is approximated using the DFT and quantized using the MCB, finally generating the word 'DAAC' with word length 4 over an alphabet of size 6 [29].
Figure 7. Example of the generation of the MCB by means of the coefficients of the MFT [29].
Figure 8. Visual illustration of the distribution of bins and breakpoints in an MCB. In this case, the word length is 4 and the alphabet size is 6 [29].
Figure 9. Flow of the SFA steps.
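Following the flow of Figure 9, SFA can be sketched as: approximate each window with its first DFT coefficients (Figure 6), then quantize each coefficient with its own equi-depth breakpoints learned from training data, i.e., Multiple Coefficient Binning (Figures 7 and 8). This is a simplified Python illustration under stated assumptions: it uses a plain FFT rather than the incremental MFT, and the function names are hypothetical.

```python
import numpy as np

def dft_coefficients(window, num_coeffs):
    """First Fourier coefficients of a window, interleaved as
    [real_0, imag_0, real_1, imag_1, ...] (the approximation step)."""
    fft = np.fft.rfft(np.asarray(window, dtype=float))
    coeffs = []
    for c in fft[: (num_coeffs + 1) // 2 + 1]:
        coeffs.extend([c.real, c.imag])
    return np.array(coeffs[:num_coeffs])

def learn_mcb(training_windows, num_coeffs, alphabet_size):
    """Multiple Coefficient Binning: per-coefficient equi-depth
    breakpoints estimated from the training windows."""
    matrix = np.array([dft_coefficients(w, num_coeffs) for w in training_windows])
    quantiles = np.linspace(0, 100, alphabet_size + 1)[1:-1]
    return [np.percentile(matrix[:, j], quantiles) for j in range(num_coeffs)]

def sfa_word(window, breakpoints):
    """Quantize each coefficient with its own breakpoints to form a word."""
    letters = "ABCDEFGHIJ"
    coeffs = dft_coefficients(window, len(breakpoints))
    return "".join(letters[np.searchsorted(b, c)] for b, c in zip(breakpoints, coeffs))

rng = np.random.default_rng(0)
train = [rng.normal(size=32) for _ in range(50)]
mcb = learn_mcb(train, num_coeffs=4, alphabet_size=6)
print(sfa_word(train[0], mcb))  # a 4-letter word over a 6-letter alphabet
```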
Example of the weight matrix generated by TF-IDF.
| Word | Activity 1 | Activity 2 |
|---|---|---|
| accbb | 0.023 | 0.0 |
| cdaaa | 0.14 | 0.0 |
| ... | ... | ... |
| ddbca | 0.0 | 0.010 |
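A weight matrix like the one above can be computed with a standard tf-idf scheme over per-class bags of words. The sketch below uses the common log-tf × log-idf variant at the class level, as in SAX-VSM; the exact weighting used in the paper may differ, and the example bags are invented.

```python
import math
from collections import Counter

def tf_idf_matrix(class_bags):
    """Per-class tf-idf weights: class_bags maps a class name to the
    list of words discretized from all of its training windows."""
    counts = {c: Counter(bag) for c, bag in class_bags.items()}
    vocabulary = set().union(*(bag.keys() for bag in counts.values()))
    n = len(counts)
    matrix = {}
    for word in vocabulary:
        df = sum(1 for c in counts if counts[c][word] > 0)  # class frequency
        idf = math.log(n / df)                  # 0 if the word occurs everywhere
        matrix[word] = {c: math.log(1 + counts[c][word]) * idf for c in counts}
    return matrix

bags = {"Activity 1": ["accbb", "accbb", "cdaaa"], "Activity 2": ["ddbca"]}
for word, row in tf_idf_matrix(bags).items():
    print(word, row)  # words unique to a class get a positive weight there
```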
Figure 10. Example of the data processing steps of the SAX-VSM algorithm.
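Once the class weight matrix exists, SAX-VSM classifies an unlabeled window by discretizing it, building its word histogram, and choosing the class whose tf-idf vector has the highest cosine similarity. A minimal sketch, reusing the toy weights from the table above:

```python
import math

def cosine_similarity(query, weights):
    """Cosine similarity between a query word histogram and one class
    column of the tf-idf matrix; words unknown to the class contribute 0."""
    dot = sum(f * weights.get(w, 0.0) for w, f in query.items())
    nq = math.sqrt(sum(f * f for f in query.values()))
    nw = math.sqrt(sum(v * v for v in weights.values()))
    return dot / (nq * nw) if nq and nw else 0.0

def classify(query, class_weights):
    """SAX-VSM decision rule: the class whose weight vector is most
    similar to the histogram of the discretized query window wins."""
    return max(class_weights, key=lambda c: cosine_similarity(query, class_weights[c]))

class_weights = {"Activity 1": {"accbb": 0.023, "cdaaa": 0.14},
                 "Activity 2": {"ddbca": 0.010}}
print(classify({"cdaaa": 3, "ddbca": 1}, class_weights))  # -> 'Activity 1'
```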
Figure 11. Example of discretization and histogram generation by SFA.
Figure 12. Example of two BOSS histograms.
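BOSS compares histograms like those of Figure 12 with a 1-NN classifier and a non-symmetric variant of the squared Euclidean distance that skips words absent from the query histogram. A minimal sketch, with dictionaries standing in for the histograms:

```python
def boss_distance(query_hist, ref_hist):
    """BOSS's non-symmetric distance: squared Euclidean distance over
    the words that occur in the query histogram; words absent from the
    query are ignored, which makes the measure robust to noise."""
    return sum(
        (freq - ref_hist.get(word, 0)) ** 2
        for word, freq in query_hist.items()
        if freq > 0
    )

h1 = {"DAAC": 3, "DABC": 1}
h2 = {"DAAC": 2, "CBBA": 5}
print(boss_distance(h1, h2))  # (3-2)^2 + (1-0)^2 = 2
print(boss_distance(h2, h1))  # (2-3)^2 + (5-0)^2 = 26 (non-symmetric)
```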
Figure 13. Difference between the histograms generated by SFA (a) and SFA-W (b).
Figure 14. Example of how the SFA-W discretization process works. In this example, two classes, 'A' and 'B', are used, with the real_0 and imag_3 coefficients selected as the best; the resulting best cut-off point was 0.46.
Figure 15. Example of the steps of histogram manipulation based on SFA-W (a), bigrams (b), and Chi-Squared (c). Note: the value 50 indicates the window size [25].
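Step (c) of Figure 15, Chi-squared filtering of the (often very large) unigram/bigram vocabulary, can be sketched with scikit-learn. The count matrix below is invented for illustration; WEASEL's full pipeline additionally varies window lengths and feeds the surviving features to a linear classifier.

```python
import numpy as np
from sklearn.feature_selection import chi2

# Rows: one discretized window each; columns: unigram/bigram counts.
X = np.array([[3, 0, 1, 0],
              [2, 1, 0, 0],
              [0, 4, 0, 2],
              [0, 3, 1, 3]])
y = np.array([0, 0, 1, 1])            # activity labels

scores, p_values = chi2(X, y)         # one Chi-squared score per word/bigram
keep = np.argsort(scores)[::-1][:2]   # retain the 2 most discriminative columns
print(scores.round(2), keep)
```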
List of time and frequency domain features.
| Domain | Features |
|---|---|
| Time | min, max, amplitude, amplitude peak, sum, absolute sum, Euclidean norm, mean, absolute mean, mean square, mean absolute deviation, sum square error, variance, standard deviation, Pearson coefficient, zero crossing rate, correlation, cross-correlation, auto-correlation, skewness, kurtosis, area, absolute area, signal magnitude mean, absolute signal magnitude mean, magnitude difference function |
| Frequency | energy, energy normalized, power, centroid, entropy, DC component, peak, coefficient sum |
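Most of the features listed above are one-liners over a sensor window. The sketch below computes a representative handful in Python/NumPy; the naming and the spectral-entropy normalization are assumptions, since the paper does not prescribe an implementation.

```python
import numpy as np

def time_features(x):
    """A handful of the time-domain features listed above, per window."""
    x = np.asarray(x, dtype=float)
    signs = np.signbit(x).astype(int)
    return {
        "min": x.min(), "max": x.max(), "amplitude": x.max() - x.min(),
        "mean": x.mean(), "std": x.std(),
        "mad": np.mean(np.abs(x - x.mean())),            # mean absolute deviation
        "zcr": np.sum(np.abs(np.diff(signs))) / len(x),  # zero crossing rate
    }

def frequency_features(x):
    """Spectral energy and entropy from the FFT power spectrum."""
    power = np.abs(np.fft.rfft(np.asarray(x, dtype=float))) ** 2
    p = power / (power.sum() + 1e-12)                    # normalized power
    return {"energy": power.sum() / len(x),
            "entropy": -np.sum(p * np.log2(p + 1e-12))}

window = np.sin(np.linspace(0, 4 * np.pi, 64))           # toy sensor window
print(time_features(window))
print(frequency_features(window))
```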
Results regarding the accuracy of the classification models generated by the shallow algorithms combined with the time and frequency domain features.
| Features | Algorithm | UCI-HAR | SHOAIB | WISDM |
|---|---|---|---|---|
| Time Features | Decision Tree | 84.64 | 94.56 | 88.38 |
| | Naive Bayes | 57.36 | 73.74 | 67.34 |
| | SVM | 81.78 | 92.04 | 83.00 |
| | KNN | 87.83 | 96.76 | 87.72 |
| FFT | Decision Tree | 69.48 | 76.02 | 68.05 |
| | Naive Bayes | 27.81 | 57.39 | 35.03 |
| | SVM | 60.12 | 70.76 | 63.51 |
| | KNN | 65.84 | 74.08 | 65.94 |
| Wavelet | Decision Tree | 74.96 | 79.22 | 77.85 |
| | Naive Bayes | 25.78 | 69.00 | 45.95 |
| | SVM | 60.17 | 68.22 | 72.58 |
| | KNN | 61.03 | 63.72 | 73.53 |
Results regarding the accuracy of the classification models generated by the symbolic representation algorithms using the coordinate fusion strategies: histogram stacking (axis fusion), magnitude, and PCA.
| Fusion Strategy | Algorithm | UCI-HAR | SHOAIB | WISDM |
|---|---|---|---|---|
| Axis Fusion | SAX-VSM | 95.7 | 91.3 | 75.1 |
| | BOSS | 100 | 100 | 100 |
| | BOSS-VS | 98.5 | 99.9 | 94.4 |
| | WEASEL | 52.9 | 48.4 | 59.9 |
| Magnitude | SAX-VSM | 87.7 | 81.7 | 60.8 |
| | BOSS | 100 | 100 | 100 |
| | BOSS-VS | 98.9 | 99.6 | 96.0 |
| | WEASEL | 36.6 | 54.09 | 57.9 |
| PCA | SAX-VSM | 69.6 | 80.7 | 36.5 |
| | BOSS | 100 | 100 | 100 |
| | BOSS-VS | 96.9 | 99.7 | 97.2 |
| | WEASEL | 41.0 | 43.6 | 62.4 |
List of processing times (in milliseconds) for the feature extraction step in the time, frequency, and discrete domains.
| Features | UCI-HAR | SHOAIB | WISDM |
|---|---|---|---|
| Time Features | 14,458 | 6434 | 11,240 |
| Frequency Features (FFT) | 17,008 | 8227 | 17,750 |
| Frequency Features (Wavelet) | 13,519 | 5387 | 8394 |
| SAX | 3121 | 1706 | 2636 |
| SFA | 559 | 301 | 418 |
| SFA-W | 4230 | 2858 | 2624 |
List of processing times (in milliseconds) for the training step of the classification models.
| Algorithm | UCI-HAR | SHOAIB | WISDM |
|---|---|---|---|
| Decision Tree | 2534 | 925 | 2507 |
| Naive Bayes | 151 | 66 | 125 |
| SVM | 4073 | 1277 | 9233 |
| KNN | 51 | 45 | 49 |
| SAX-VSM | 76 | 97 | 39 |
| BOSS | 62 | 35 | 46 |
| BOSS-VS | 8895 | 2014 | 1976 |
| WEASEL | 31,036 | 7531 | 18,318 |
Memory consumption, in bytes, before and after the feature extraction process.
| Dataset | All (Raw Data) | Time Features | Frequency Features (FFT) | Discrete Features (SAX) |
|---|---|---|---|---|
| UCI-HAR | 43,409,405 | 9,014,107 | 2,503,597 | 1,779,486 |
| WISDM | 22,035,537 | 7,019,442 | 1,967,915 | 866,972 |
| SHOAIB | 16,175,747 | 4,151,907 | 1,182,459 | 1,377,616 |