Nsikak Pius Owoh, Manmeet Mahinderjit Singh, Zarul Fitri Zaaba.
Abstract
Automatic data annotation eliminates most of the challenges faced with manual methods of annotating sensor data. It significantly improves users’ experience during sensing activities because their active involvement in the labeling process is reduced. An unsupervised learning technique such as clustering can be used to annotate sensor data automatically. However, a lingering issue with clustering is the validation of the generated clusters. In this paper, we adopted the k-means clustering algorithm to annotate unlabeled sensor data for the purpose of detecting sensitive location information of mobile crowd sensing users. Furthermore, we proposed a cluster validation index for the k-means algorithm, based on Multiple Pair-Frequency. Thereafter, we trained three classifiers (Support Vector Machine, K-Nearest Neighbor, and Naïve Bayes) using the cluster labels generated by the k-means clustering algorithm. The accuracy, precision, and recall of these classifiers were evaluated on the classification of “non-sensitive” and “sensitive” data from motion and location sensors. Very high accuracy scores were recorded for the Support Vector Machine and K-Nearest Neighbor classifiers, while a fairly high accuracy score was recorded for the Naïve Bayes classifier. With the hybridized machine learning (unsupervised and supervised) technique presented in this paper, unlabeled sensor data was automatically annotated and then classified.
Keywords: activity recognition; clustering; data security; multivariate data; sensitive data
Year: 2018 PMID: 29970823 PMCID: PMC6069149 DOI: 10.3390/s18072134
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Mathematical notations.
| Symbol | Definition |
|---|---|
| D | Data |
| n | Total number of samples |
| K | Total number of clusters |
| S | Observations |
| F | Features (X, Y, Z, L, V) |
| $\lvert X \rvert$ | Absolute value of X |
| $\lvert Y \rvert$ | Absolute value of Y |
| $\lvert Z \rvert$ | Absolute value of Z |
| $\lvert L \rvert$ | Absolute value of L |
| $\lvert V \rvert$ | Absolute value of V |
| $\cdot$ | Dot operator |
Algorithm for the proposed k-means validation index.
| Step |
|---|
| 1. Choose the number of clusters |
| 7. For |
| 8. |
| 9. |
| 10. |
Figure 1. Three-dimensional axes of an accelerometer.
Figure 2. Three-dimensional axes of a gyroscope.
Figure 3. Centroid initialization of non-sensitive and sensitive clusters.
Figure 4. Converged non-sensitive and sensitive clusters.
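The two stages described in the abstract (k-means annotation followed by supervised classification) and the centroid initialization and convergence shown in Figures 3 and 4 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic two-blob data with five features per sample (mirroring X, Y, Z, L, V), the farthest-point seeding chosen only to make the run deterministic, and the helper names `farthest_point_init`, `kmeans`, and `knn_predict`. It is not the authors' implementation and does not include their validation index.

```python
import math
import random
from collections import Counter


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not the paper's method):
    start at points[0], then add the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic stand-in for unlabeled sensor samples (hypothetical data).
rng = random.Random(42)


def blob(center, n=20):
    return [tuple(rng.gauss(center, 0.5) for _ in range(5)) for _ in range(n)]


points = blob(0.0) + blob(10.0)

# Stage 1: cluster ids serve as automatically generated labels.
centroids, labels = kmeans(points, k=2)

# Stage 2: a classifier trained on those labels predicts a new sample.
prediction = knn_predict(points, labels, query=(9.5,) * 5)
```

Which cluster id corresponds to "sensitive" is not decided by the clustering itself; in this sketch the ids are just 0 and 1, and mapping them to class names is left to the caller.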
Confusion Matrix from SVM, KNN, and NB classifiers.
| Classifier (N = 2000) | Actual class | Predicted Class 0 (Non-sensitive) | Predicted Class 1 (Sensitive) |
|---|---|---|---|
| SVM | Class 0 (Non-sensitive) | TN = 1008 | FN = 6 |
| SVM | Class 1 (Sensitive) | FP = 10 | TP = 976 |
| KNN | Class 0 (Non-sensitive) | TN = 1005 | FN = 9 |
| KNN | Class 1 (Sensitive) | FP = 13 | TP = 973 |
| NB | Class 0 (Non-sensitive) | TN = 909 | FN = 105 |
| NB | Class 1 (Sensitive) | FP = 21 | TP = 965 |
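The counts above can be turned into summary metrics using the standard textbook definitions. The sketch below takes the TN, FN, FP, and TP cell labels exactly as printed in the table. The function name `binary_metrics` and the choice of formulas are assumptions, since the formulas and rounding behind the reported figures are not stated here, so values computed this way need not coincide with them.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "misclassification_rate": (fp + fn) / total,
    }


# Counts copied from the confusion matrix, using its cell labels as printed.
counts = {
    "SVM": dict(tp=976, tn=1008, fp=10, fn=6),
    "KNN": dict(tp=973, tn=1005, fp=13, fn=9),
    "NB": dict(tp=965, tn=909, fp=21, fn=105),
}

results = {name: binary_metrics(**c) for name, c in counts.items()}

for name, metrics in results.items():
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```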
Summary of Results from SVM, KNN, and NB classifiers.
| Classifier | Accuracy | Prediction Mean (0s) | Prediction Mean (1s) | Precision | Recall | False Positive Rate (FPR) | Misclassification Rate |
|---|---|---|---|---|---|---|---|
| SVM | 99.3% | 0.5125 | 0.4875 | 0.9948 | 0.9848 | 0.0050 | 0.0056 |
| KNN | 98% | 0.5070 | 0.4930 | 0.9790 | 0.9700 | 0.0090 | 0.0150 |
| NB | 94% | 0.5070 | 0.4930 | 0.9430 | 0.9019 | 0.1034 | 0.0535 |
Figure 5. Comparison of accuracy, precision, and recall results from evaluated classifiers.
Figure 6. Comparison of FPR and MR results from evaluated classifiers.
Figure 7. ROC curve of SVM, KNN, and NB classifiers.