Muhammad Habib ur Rehman, Chee Sun Liew, Teh Ying Wah, Junaid Shuja, Babak Daghighi.
Abstract
The staggering growth in smartphone and wearable device use has led to the massive-scale generation of personal (user-specific) data. To explore, analyze, and extract useful information and knowledge from this deluge of personal data, these devices must be leveraged as data-mining platforms in ubiquitous, pervasive, and big data environments. This study presents the personal ecosystem, in which all computational resources, communication facilities, storage, and knowledge management systems are available in the user's proximity. An extensive review of the recent literature has been conducted, and a detailed taxonomy is presented. The performance evaluation metrics used in the reviewed studies, together with the empirical evidence reported for them, are identified and organized in this paper. Finally, we highlight some future research directions and potentially emerging application areas for personal data mining using smartphones and wearable devices.
Year: 2015 PMID: 25688592 PMCID: PMC4367420 DOI: 10.3390/s150204430
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Terminology.
| Abbreviation | Definition |
| PSDs | Personal Sensing Devices |
| RCEs | Resource-constrained Environments |
| PerDM | Personal Data Mining |
| PE | Personal Ecosystem |
| KDP | Knowledge Discovery Process |
| IoT | Internet of Things |
| EEU | Efficient Energy Utilization |
| BUC | Bandwidth Utilization Cost |
| BSN | Body Sensors Network |
| BLE | Bluetooth Low Energy |
| FPGA | Field Programmable Gate Array |
| DSPS | Data Stream Processing Systems |
| ID | Integration Device |
| OMM | Open Mobile Miner |
| SMA | Mobile Smart Achieve |
| FPM | Frequent Pattern Mining |
| SL | Supervised Learning |
| UL | Unsupervised Learning |
| SSL | Semi-Supervised Learning |
Figure 1. Data lifecycle in PEs.
Personal Data in PE.
| Source | Category | Data item | Data type |
| Sensors | Physiological | Heart rate monitor | Numeric/Integer |
| | | Blood Glucose Monitor | Numeric/Integer |
| | Physical activity | Accelerometer | Numeric/Floating point |
| | Environmental | Temperature | Numeric/Floating point |
| | | Humidity | Numeric/Integer |
| | | Air pressure monitor | Numeric/Floating point |
| | Navigational | GPS Location | Numeric/Floating point |
| | | Compass | Text |
| User Interaction | Input data | On-screen keyboard | Text/Numeric |
| | | Microphone | Audio |
| | | Camera | Images/Video |
| Device-resident | Application Logs | Web browser logs | Text |
| | | Application specific logs | Text |
| | Communication logs | Bluetooth scans | Text |
| | | Wi-Fi Scans | Text |
| | User data | Contact List | Text |
| | | Call Logs | Text |
| | | SMS data | Text |
Figure 2. Taxonomy of PerDM in RCEs.
Figure 3. Data flow from PSDs to remote DSPS.
Large-scale DSPS.
| Name | Type | Architecture | Language | Query/processing support |
| DataSift […] | Twitter stream analysis | Cluster | Multiple | Filtering, regular expressions |
| Dremel […] | Interactive | Cluster | Dremel query language | |
| Esper […] | Complex event processing | In-memory | Java, .Net | Online queries |
| IBM InfoSphere […] | Stream processing | Cluster | Java | Online/ |
| Kinesis […] | Stream processing | Cluster | Java | Online queries |
| Hadoop Online Prototype […] | Stream processing | Cluster | Java | Not found |
| MOA […] | Machine learning | Various platforms | Java | Not found |
| Microsoft StreamInsight […] | Complex event processing | Server | .Net | Online queries |
| S4 […] | Event processing | Cluster | Java | Online data processing |
| SAMOA […] | Distributed machine learning | Can be integrated with S4 and Storm | Java | Not found |
| Scikits.learn […] | Machine learning | Programming abstraction | Python, C++ | Not found |
| StreamDrill […] | Stream processing | Not found | Not found | Top-K item counting |
| Storm […] | Stream processing | Cluster | Multiple | Online |
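The StreamDrill entry lists Top-K item counting as its processing feature, but the table does not say which algorithm StreamDrill uses. The sketch below therefore swaps in a well-known technique for this task, the Space-Saving heavy-hitters algorithm, written in plain Python. It keeps at most `capacity` counters, and each estimate overcounts an item's true frequency by no more than the recorded error. The function names `space_saving` and `top_k` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def space_saving(stream: Iterable[Hashable], capacity: int) -> Dict[Hashable, Tuple[int, int]]:
    """Space-Saving heavy-hitter summary using at most `capacity` counters.

    Returns {item: (estimated_count, max_overestimate)}; the true count of a
    monitored item lies between estimated_count - max_overestimate and
    estimated_count.
    """
    counts: Dict[Hashable, int] = {}
    errors: Dict[Hashable, int] = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the item with the smallest counter; the newcomer takes over
            # that counter (+1), and the old value bounds its overestimate.
            victim = min(counts, key=counts.__getitem__)
            min_count = counts.pop(victim)
            del errors[victim]
            counts[item] = min_count + 1
            errors[item] = min_count
    return {item: (counts[item], errors[item]) for item in counts}


def top_k(stream: Iterable[Hashable], k: int, capacity: int) -> List[Tuple[Hashable, int]]:
    """Return the k items with the largest estimated counts."""
    summary = space_saving(stream, capacity)
    ranked = sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(item, est) for item, (est, _err) in ranked[:k]]
```

On a PSD the stream items could be, for example, Wi-Fi access point identifiers from scan logs. Memory stays bounded by `capacity` however long the stream runs, which is the property that matters in resource-constrained environments.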
Figure 4. SL and UL algorithm development process.
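Most classifiers reviewed here list "Feature extraction" as their preprocessing step. A minimal sketch of that step for tri-axial accelerometer data follows, assuming fixed-size overlapping windows and a small set of time-domain features: per-axis mean and standard deviation, plus the mean, standard deviation, and range of the acceleration magnitude. The window size, step, feature set, and all function names are illustrative assumptions, not the choices of any reviewed study; frequency-domain features such as FFT or DCT coefficients could be appended in the same way.

```python
import math
from statistics import mean, pstdev
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (x, y, z) accelerometer reading


def magnitude(s: Sample) -> float:
    """Euclidean norm of one reading (independent of device orientation)."""
    return math.sqrt(s[0] ** 2 + s[1] ** 2 + s[2] ** 2)


def sliding_windows(samples: Sequence[Sample], size: int, step: int) -> Iterator[Sequence[Sample]]:
    """Yield windows of `size` readings; consecutive windows start `step` apart."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def window_features(window: Sequence[Sample]) -> List[float]:
    """Per-axis mean/std, then magnitude mean/std/range: 9 features in total."""
    feats: List[float] = []
    for axis in range(3):
        values = [s[axis] for s in window]
        feats += [mean(values), pstdev(values)]
    mags = [magnitude(s) for s in window]
    feats += [mean(mags), pstdev(mags), max(mags) - min(mags)]
    return feats


def extract_features(samples: Sequence[Sample], size: int = 128, step: int = 64) -> List[List[float]]:
    """One feature vector per window (50% overlap with the hypothetical defaults)."""
    return [window_features(w) for w in sliding_windows(samples, size, step)]
```

Each returned row is one instance for a classifier; the label (for example, the activity performed during that window) would be attached separately.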
Figure 5. SSL algorithm development process.
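Semi-supervised learning combines a small labelled set with a larger unlabelled pool. One common SSL scheme is self-training: fit a model on the labelled data, pseudo-label the unlabelled samples it is most confident about, add them to the training set, and repeat. The sketch below assumes a nearest-centroid classifier and uses the gap between the two nearest centroids as the confidence score; both choices, the `min_margin` parameter, and every function name are illustrative assumptions rather than the exact process of Figure 5.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def centroids(points: Sequence[Vector], labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Mean feature vector of each class."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_margin(cents: Dict[Hashable, List[float]], x: Vector) -> Tuple[Hashable, float]:
    """Nearest-centroid label plus the distance gap to the runner-up (confidence)."""
    dists = sorted((math.dist(c, x), y) for y, c in cents.items())
    best_d, best_y = dists[0]
    margin = dists[1][0] - best_d if len(dists) > 1 else math.inf
    return best_y, margin


def self_train(labeled_x, labeled_y, unlabeled, min_margin: float, max_rounds: int = 10):
    """Self-training loop; returns final centroids and the still-unlabelled samples."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(xs, ys)
        confident, rest = [], []
        for x in pool:
            y, margin = predict_with_margin(cents, x)
            (confident if margin >= min_margin else rest).append((x, y))
        if not confident:  # nothing new can be pseudo-labelled safely
            break
        for x, y in confident:  # accept pseudo-labels into the training set
            xs.append(x)
            ys.append(y)
        pool = [x for x, _ in rest]
    return centroids(xs, ys), pool
```

Samples that never pass the confidence threshold are left unlabelled rather than forced into a class, which limits how far early mistakes can propagate.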
Classification algorithms in PSDs.
| Ref. | Algorithm | Preprocessing | Learning mode | Reported performance | Platform/tool |
| […] | MNN | Feature extraction | Offline | Accuracy = 61.6% | Smartphone |
| […] | NB with KDE | Captures RSSI values | Adaptive | Accuracy = 84% | Smartphone |
| […] | J48 | STFT and CWT | Offline | Accuracy = 97.2% | Smartphone |
| […] | LDA | Feature extraction | Offline | Accuracy = 82.8% | MATLAB |
| […] | HT, NB | 80 (training) : 20 (test) split | Online | Accuracy > 50% | WEKA |
| […] | C4.5 | Feature extraction | Offline | Accuracy = 81.9% | Smartphone |
| […] | ANN | Lag and autocorrelation plots, FFT and DCT | Offline | Accuracy with KDA […] | MATLAB, Smartphone |
| […] | SVM | Dimensionality and noise reduction, and feature extraction | Offline/Online | Accuracy with KDA features = 99% | MATLAB, Smartphone |
| […] | SVM | Feature extraction and noise reduction | Offline | Accuracy = 98.85% | Smartphone |
| […] | BN | Context inference module used to adopt the BN | Online | Accuracy = 63% | Smartphone |
| […] | J48 | Feature extraction | Offline | Accuracy = 97.02% | Smartphone, WEKA |
| […] | QDA | Feature extraction | Offline | Accuracy = 95.8% | Smartphone |
| […] | RF | Feature extraction | Offline | Accuracy = 80.3% | WEKA |
| […] | NN | Feature extraction | Offline | Accuracy = 100% | WEKA |
| […] | SVM | Feature extraction | Offline | SVM has the best accuracy in almost all cases | PC, Smartphone |
| […] | NB | Feature extraction | Online/Adaptive | Accuracy = 86% ± 3.9% | ZTE Blade, WEKA |
| […] | kNN | Feature extraction | Offline | Recall = 95% | Smartphone |
| […] | NN | Feature extraction | Offline | Accuracy = 100% | WEKA |
| […] | MLP | Feature extraction | Offline | Accuracy = 50% | WEKA |
| […] | J48, LibSVM, AdaBoost, BN | Feature extraction | Offline | Average accuracy = 77.14% | Smartphone |
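Several classification studies report online or adaptive learning (for example with NB) and an 80:20 training/test split. As a hedged illustration, the sketch below trains a Gaussian naive Bayes classifier one sample at a time using Welford's running mean/variance update, so the raw history need not be stored on the device, and then measures accuracy on the held-out 20%. The class name, the variance floor, and the split helper are assumptions; the reviewed studies used their own implementations (often WEKA).

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


class IncrementalGaussianNB:
    """Gaussian naive Bayes updated one labelled sample at a time (Welford)."""

    def __init__(self, var_floor: float = 1e-6):
        self.var_floor = var_floor     # avoids zero variance for tiny classes
        self.n = defaultdict(int)      # samples seen per class
        self.mean = {}                 # per-class running feature means
        self.m2 = {}                   # per-class sums of squared deviations

    def update(self, x: Sequence[float], y: Hashable) -> None:
        if y not in self.mean:
            self.mean[y] = [0.0] * len(x)
            self.m2[y] = [0.0] * len(x)
        self.n[y] += 1
        n = self.n[y]
        for i, v in enumerate(x):
            delta = v - self.mean[y][i]
            self.mean[y][i] += delta / n
            self.m2[y][i] += delta * (v - self.mean[y][i])

    def predict(self, x: Sequence[float]) -> Hashable:
        total = sum(self.n.values())
        best, best_score = None, -math.inf
        for y, n in self.n.items():
            score = math.log(n / total)  # log prior
            for i, v in enumerate(x):    # add per-feature Gaussian log-likelihoods
                var = max(self.m2[y][i] / n, self.var_floor)
                score += -0.5 * math.log(2 * math.pi * var) - (v - self.mean[y][i]) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = y, score
        return best


def holdout_accuracy(samples: List[Tuple[Sequence[float], Hashable]], train_fraction: float = 0.8) -> float:
    """Train on the first 80% and test on the rest.

    Assumes `samples` is already shuffled (or time-ordered, to test on future data).
    """
    cut = int(len(samples) * train_fraction)
    model = IncrementalGaussianNB()
    for x, y in samples[:cut]:
        model.update(x, y)
    test = samples[cut:]
    return sum(model.predict(x) == y for x, y in test) / len(test)
```

Because `update` is constant-time per sample, the same model could keep learning after deployment, which is what "online" means in the table.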
Clustering algorithms in PSDs.
| Ref. | Algorithm | Learning mode | Reported performance | Platform |
| […] | Light-weight clustering | Online/offline | Energy gain = 300%, bandwidth gain = 17 times | Smartphone |
| […] | | Offline | Accuracy = 82.9% | Smartphone |
| […] | | Offline | Accuracy = 95.31% | Smartphone |
| […] | Adjustable fuzzy clustering with probabilistic NNs | Offline/incremental | Accuracy = 91.3% | BSN |
| […] | Time-based clustering | Online | Data reduction = 11 | Smartphone |
| […] | | Offline | Accuracy = 97.1% | Smartphone |
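The clustering results include energy and bandwidth gains for light-weight clustering. One way such gains can arise is that the device transmits a few cluster summaries instead of every raw reading. The sketch below uses a one-pass, threshold-based ("leader") clustering scheme as an assumed stand-in: each reading joins the nearest centre if it lies within `threshold`, otherwise it starts a new cluster, and centres track the running mean of their members. It is not necessarily the algorithm of the cited study; the function name and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def lightweight_cluster(stream: Sequence[Sequence[float]], threshold: float) -> List[Tuple[List[float], int]]:
    """One-pass leader clustering; returns (centre, member_count) per cluster."""
    centres: List[List[float]] = []
    counts: List[int] = []
    for x in stream:
        if centres:
            j = min(range(len(centres)), key=lambda k: math.dist(centres[k], x))
            if math.dist(centres[j], x) <= threshold:
                counts[j] += 1
                # Incremental mean: shift the centre toward the new member.
                centres[j] = [c + (v - c) / counts[j] for c, v in zip(centres[j], x)]
                continue
        centres.append(list(x))  # too far from every centre: open a new cluster
        counts.append(1)
    return list(zip(centres, counts))
```

Sending the `(centre, count)` pairs instead of the raw stream is the data-reduction step; a larger `threshold` yields fewer clusters and less traffic at the cost of coarser summaries.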
Evaluation criteria for data mining algorithms in PSDs.
| [ | - | - | √ | - | - | √ | √ | √ |
| [ | - | - | √ | - | - | - | - | √ |
| [ | √ | - | √ | - | - | - | - | √ |
| [ | - | - | √ | - | - | - | - | - |
| [ | - | - | √ | - | - | - | - | - |
| [ | √ | - | √ | √ | - | - | - | √ |
| [ | - | - | √ | √ | - | - | - | √ |
| [ | - | - | √ | - | - | - | - | - |
| [ | √ | √ | √ | - | - | - | - | - |
| [ | - | - | √ | - | - | - | √ | - |
| [ | √ | √ | √ | √ | - | - | - | √ |
| [ | - | - | √ | - | - | - | - | - |
| [ | - | - | √ | √ | - | √ | √ | - |
| [ | - | - | √ | - | - | - | - | - |
| [ | √ | - | √ | √ | - | - | - | - |
| [ | - | - | √ | - | √ | √ | √ | - |
| [ | √ | √ | √ | √ | - | - | - | - |
| [ | √ | - | √ | - | √ | √ | √ | - |
| [ | - | - | √ | - | √ | - | - | - |
| [ | - | - | √ | - | √ | √ | √ | √ |
| [ | - | - | √ | - | - | - | - | √ |
| [ | √ | - | √ | - | - | - | - | - |
| [ | - | - | √ | √ | - | - | - | - |
| [ | - | - | √ | - | - | - | - | - |
| [ | - | √ | - | - | - | - | - | - |
| [ | √ | √ | TBC-based algorithm was evaluated by the number of stay points and regions | |||||
| [ | - | - | √ | - | √ | √ | √ | - |
| [ | √ | √ | √ | - | - | - | - | √ |
Note: √ denotes a performance criterion used in the respective study; - denotes a criterion that was not used.
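One of the evaluated studies assesses a time-based clustering (TBC) algorithm by the number of stay points and regions. A stay point is commonly defined as a place where a user remains within a distance threshold for at least a time threshold. The sketch below implements that common distance/time-threshold detection rule over a location trace; whether the cited TBC study uses exactly this rule is an assumption. For simplicity it takes planar coordinates in metres (real GPS fixes would need projection or a haversine distance), and the thresholds and the `Fix`/`StayPoint` types are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class Fix(NamedTuple):
    t: float  # timestamp in seconds
    x: float  # planar easting in metres (assumed already projected)
    y: float  # planar northing in metres


class StayPoint(NamedTuple):
    x: float
    y: float
    arrive: float
    leave: float


def detect_stay_points(fixes: Sequence[Fix], dist_threshold: float, time_threshold: float) -> List[StayPoint]:
    """Distance/time-threshold stay point detection over a time-ordered trace."""
    stays: List[StayPoint] = []
    i, n = 0, len(fixes)
    while i < n:
        j = i + 1
        # Grow the region while fixes stay within dist_threshold of anchor fix i.
        while j < n and math.dist((fixes[i].x, fixes[i].y), (fixes[j].x, fixes[j].y)) <= dist_threshold:
            j += 1
        if fixes[j - 1].t - fixes[i].t >= time_threshold:
            members = fixes[i:j]
            stays.append(StayPoint(
                x=sum(f.x for f in members) / len(members),
                y=sum(f.y for f in members) / len(members),
                arrive=fixes[i].t,
                leave=fixes[j - 1].t))
            i = j  # continue after the stay
        else:
            i += 1  # too short: slide the anchor forward
    return stays
```

Nearby stay points can then be grouped into regions (for example with a density-based clustering step), which corresponds to the "regions" count mentioned in the table.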
Confusion matrix.
| | Class1 | Class2 | Class3 | Class4 |
| Class1 | | | | |
| Class2 | | | | |
| Class3 | | | | |
| Class4 | | | | |
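A confusion matrix counts, for each actual class, how many samples were assigned to each predicted class, and the accuracy, precision, recall, and F-measure figures reported by the reviewed studies can all be derived from it. A minimal sketch, assuming rows are actual classes and columns are predicted classes (the matrix above does not state which axis is which): accuracy is the diagonal sum over the total, precision for class k is the diagonal cell over its column sum, and recall is the diagonal cell over its row sum. Function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def confusion_matrix(actual: Sequence[Hashable], predicted: Sequence[Hashable],
                     classes: Sequence[Hashable]) -> List[List[int]]:
    """Rows = actual class, columns = predicted class."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m


def per_class_metrics(m: List[List[int]], classes: Sequence[Hashable]
                      ) -> Tuple[float, Dict[Hashable, Dict[str, float]]]:
    """Overall accuracy plus per-class precision, recall and F1."""
    total = sum(map(sum, m))
    accuracy = sum(m[k][k] for k in range(len(classes))) / total
    metrics: Dict[Hashable, Dict[str, float]] = {}
    for k, c in enumerate(classes):
        tp = m[k][k]
        fp = sum(m[r][k] for r in range(len(classes))) - tp  # predicted c, actually other
        fn = sum(m[k]) - tp                                  # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, metrics
```

Per-class metrics matter because overall accuracy can look high on an imbalanced activity dataset even when a rare class is almost never recognised.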
Figure 6. PSD-based user-centric personalization process.
Figure 7. Evaluation model for PSD-based user-centric personalization [105].