Victor Vlădăreanu, Valentin-Gabriel Voiculescu, Vlad-Alexandru Grosu, Luige Vlădăreanu, Ana-Maria Travediu, Hao Yan, Hongbo Wang, Laura Ruse.
Abstract
This paper describes the steps involved in obtaining a set of relevant data sources, together with a method that uses software-based sensors and machine-learning classifiers to detect anomalous behavior in modern smartphones. Three classes of models are investigated for classification: logistic regression, shallow neural networks, and support vector machines. The paper details the design, implementation, and comparative evaluation of all three classes. The approach could be extended to other computing devices, provided that appropriate changes are made to the software infrastructure and that the underlying hardware offers the required capabilities.
Keywords: machine-learning classifier; smartphone security; software sensor data
Year: 2020 PMID: 32413952 PMCID: PMC7284384 DOI: 10.3390/s20102768
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. General data collection architecture.
Measurable events and description.
| Event | Intercepted Information |
|---|---|
| SMS | an SMS is sent, destination phone number, message content |
| Call | outgoing phone call takes place, destination phone number |
| WiFi | Wi-Fi state—enabled or not |
| Bluetooth | Bluetooth state—enabled or not |
| App Install | package installation/uninstallation and package name |
| Sensors | value information for registered sensors |
| Camera | Camera state—on or off |
| NFC | NFC state—on or off |
| Activity | current state of the Activity: Create, Start, Stop, Pause, Resume, Destroy |
| Runtime Crash | a runtime crash of the application is happening |
| ANR | Application Not Responding dialog box is being generated |
Example of raw data obtained from software sensors.
| Application Name | sent_bytes_wifi_fore | Application Name | sent_bytes_wifi_fore |
|---|---|---|---|
| air.CandyCatcher | 23461 | air.com.innmenu.free | 7293 |
| air.CandyCatcher | 23617 | air.com.innmenu.free | 9316 |
| air.CandyCatcher | 23825 | air.com.innmenu.free | 12354 |
| air.CandyCatcher | 23825 | air.com.innmenu.free | 13367 |
| air.CandyCatcher | 23825 | air.com.innmenu.free | 19440 |
| air.com.innmenu.free | 7293 | air.com.innmenu.free | 21469 |
Description of sensor data features.
| No. | Feature Name | Description |
|---|---|---|
| 1 | SysCycles_subset | System-wide CPU cycles collected via performance counters |
| 2 | Cycles_subset | Application CPU cycles collected via performance counters |
| 3 | Load1_subset | CPU load for 1 min collected via procfs (cat /proc/loadavg) in user-space |
| 4 | Load5_subset | CPU load for 5 min collected via procfs (cat /proc/loadavg) in user-space |
| 5 | Load15_subset | CPU load for 15 min collected via procfs (cat /proc/loadavg) in user-space |
| 6 | Total_occ_mem_subset | Total memory information collected via procfs (cat /proc/meminfo) in user-space |
| 7 | Memory_subset | VmSize memory information collected via procfs (cat /proc/$pid/status \| grep VmSize) in user-space |
| 8 | RSS_memory_subset | VmRSS memory information collected via procfs (cat /proc/$pid/status \| grep VmRSS) in user-space |
| 9 | Threads_subset | Number of threads collected via procfs (cat /proc/$pid/status \| grep Threads) in user-space |
| 10 | Recv_bytes_wifi_back_subset | Number of bytes, obtained using AOSP events and procfs stats, received via WiFi while the application is in background |
| 11 | Recv_packets_wifi_back_subset | Number of packets, obtained using AOSP events and procfs stats, received via WiFi while the application is in background |
| 12 | Sent_bytes_wifi_back_subset | Number of bytes, obtained using AOSP events and procfs stats, sent via WiFi while the application is in background |
| 13 | Sent_packets_wifi_back_subset | Number of packets, obtained using AOSP events and procfs stats, sent via WiFi while the application is in background |
| 14 | Recv_bytes_wifi_fore_subset | Number of bytes, obtained using AOSP events and procfs stats, received via WiFi while the application is in foreground |
| 15 | Recv_packets_wifi_fore_subset | Number of packets, obtained using AOSP events and procfs stats, received via WiFi while the application is in foreground |
| 16 | Sent_bytes_wifi_fore_subset | Number of bytes, obtained using AOSP events and procfs stats, sent via WiFi while the application is in foreground |
| 17 | Sent_packets_wifi_fore_subset | Number of packets, obtained using AOSP events and procfs stats, sent via WiFi while the application is in foreground |
| 18 | CPU_usage_user_float_subset | User-space CPU usage percentage |
| 19 | CPU_usage_system_float_subset | Total CPU usage percentage |
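Features 3 to 9 in the table above come from plain-text procfs files, so they can be sampled with a few lines of parsing. The following is a minimal sketch, not the authors' collector: the function names (`parse_loadavg`, `parse_status`, `sample_process`) are hypothetical, and it assumes a Linux or Android user-space process that is allowed to read `/proc/loadavg` and `/proc/<pid>/status`.

```python
from pathlib import Path


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def parse_status(text: str, keys=("VmSize", "VmRSS", "Threads")) -> dict[str, int]:
    """Extract selected integer fields from /proc/<pid>/status text.

    VmSize and VmRSS are reported in kB; Threads is a plain count.
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if name in keys and parts:
            values[name] = int(parts[0])
    return values


def sample_process(pid: int) -> dict[str, float]:
    """Read one sample for a process (hypothetical helper; requires procfs)."""
    load1, load5, load15 = parse_loadavg(Path("/proc/loadavg").read_text())
    sample: dict[str, float] = {"Load1": load1, "Load5": load5, "Load15": load15}
    sample.update(parse_status(Path(f"/proc/{pid}/status").read_text()))
    return sample
```

The parsing functions take text rather than paths so they can be exercised on captured samples without a device; `sample_process` is the only part that touches the file system.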
Example of dataset samples.
| | App 1 | App 2 | App 3 | App 4 | App 5 | App 6 | App 7 | App 8 | App 9 |
|---|---|---|---|---|---|---|---|---|---|
| Feat 1 | 2.1260 | −0.2039 | −0.9102 | −1.0380 | −0.7869 | −0.4426 | −0.5818 | 2.0124 | 0.3142 |
| Feat 2 | 1.5776 | 0.2069 | −0.8571 | −0.8805 | −0.7298 | −0.8488 | −0.6119 | 4.5156 | −0.2921 |
| Feat 3 | 1.3070 | 1.5609 | −0.9932 | −0.7135 | 0.8064 | −1.4613 | −0.6617 | −0.0143 | 1.9401 |
| Feat 4 | 2.0614 | 0.5280 | −0.5014 | −0.3219 | 0.4668 | −0.7027 | −0.2998 | −0.3167 | 0.6111 |
| Feat 5 | 4.8613 | −0.0313 | −0.4700 | −0.3211 | 1.4666 | −0.3339 | −0.4885 | −0.2823 | 0.1868 |
| Feat 6 | 1.6025 | −0.2129 | −0.4023 | 0.7248 | 1.3224 | −0.9621 | −1.3607 | 1.0606 | 0.5359 |
| Feat 7 | 1.6277 | 1.7521 | −1.1560 | −1.0896 | 0.8968 | −1.1018 | −1.1545 | −0.7817 | 0.3637 |
| Feat 8 | 1.3134 | 1.5573 | −1.0430 | −0.9086 | −0.0139 | −0.9776 | −0.9513 | −0.9219 | 0.0963 |
| Feat 9 | 0.9160 | 2.1956 | −1.2540 | −1.1672 | 0.9956 | −1.0804 | −1.0804 | −0.8200 | 0.3952 |
| Feat 10 | −0.1558 | −0.1579 | −0.1550 | −0.1579 | −0.0435 | −0.1579 | −0.1579 | −0.0612 | −0.1579 |
| Feat 11 | −0.1674 | −0.1841 | −0.1590 | −0.1841 | 0.0255 | −0.1841 | −0.1841 | 0.0255 | −0.1841 |
| Feat 12 | −0.2925 | −0.3570 | −0.3128 | −0.3803 | 0.6953 | −0.3803 | −0.3803 | 0.5122 | −0.3803 |
| Feat 13 | −0.2251 | −0.2320 | −0.2182 | −0.2667 | 0.2182 | −0.2667 | −0.2667 | 0.3775 | −0.2667 |
| Feat 14 | −0.1042 | −0.1742 | −0.2019 | −0.1976 | −0.1635 | −0.2020 | 0.3298 | −0.1763 | −0.1719 |
| Feat 15 | −0.1062 | −0.1707 | −0.2224 | −0.1955 | −0.1546 | −0.2245 | 0.2975 | −0.1638 | −0.1750 |
| Feat 16 | 0.0372 | −0.1001 | −0.4477 | −0.2964 | 0.1316 | −0.4582 | −0.0927 | 0.0206 | 0.1077 |
| Feat 17 | −0.0349 | −0.2000 | −0.3626 | −0.2479 | −0.1387 | −0.3678 | 0.2105 | −0.0466 | −0.2137 |
| Feat 18 | −0.0548 | −0.0547 | −0.0202 | −0.0548 | −0.0548 | 0.0555 | −0.0548 | −0.0528 | −0.0548 |
| Feat 19 | −0.0660 | −0.0659 | −0.0323 | −0.0660 | −0.0660 | 0.0497 | −0.0660 | 0.0804 | −0.0660 |
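The sample values above look standardized: they are centered near zero, and several features repeat one identical negative value across apps, which is what a raw zero maps to under z-scoring. This excerpt does not state the normalization actually used, so the sketch below simply assumes per-feature z-score standardization; `standardize` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def standardize(column: list[float]) -> list[float]:
    """Z-score one feature column: subtract the mean, divide by the std. dev.

    Assumption: population standard deviation is used; a constant column
    is mapped to zeros to avoid dividing by zero.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]
```

In a train/cross-validation/test split, the mean and standard deviation would normally be computed on the training set only and reused for the other two sets.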
Classifier metrics.
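| Metric | Definition |
|---|---|
| Number of True Positives | $TP$ |
| Number of True Negatives | $TN$ |
| Number of False Positives | $FP$ |
| Number of False Negatives | $FN$ |
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| Precision (Positive Predictive Value, PPV) | $\frac{TP}{TP + FP}$ |
| Recall (True Positive Rate, TPR) | $\frac{TP}{TP + FN}$ |
| F1 Score | $\frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ |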
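The four derived metrics follow directly from the confusion counts. As a small illustration using the standard definitions, the sketch below computes them from the four counts; the `ConfusionCounts` type and the function name are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def classifier_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = c.tp + c.tn + c.fp + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `classifier_metrics(ConfusionCounts(tp=8, tn=9, fp=2, fn=1))` gives an accuracy of 0.85 and a precision of 0.8.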
Parameters, meta-parameters and criteria.
| Model | Training: Optimize | Training: Evaluate | Cross-Validation: Choose/Test | Cross-Validation: Evaluate | Testing: Evaluate |
|---|---|---|---|---|---|
| Logistic Regression | Theta | Cross-Entropy | Lambda, Threshold | F1 score | F1 score |
| Support Vector Machine | Margin | Hinge | Kernel | F1 score | F1 score |
| Pattern net | Weights | Cross-Entropy | Weights | Cross-Entropy | F1 score |
Figure 2. Logistic Regression cost function.
Figure 3. Gradient Descent algorithm.
Figure 4. SVM and Logistic Regression cost functions.
Figure 5. Artificial neural network with one hidden layer.
Figure 6. Logistic Regression Cost Function over algorithm iterations.
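The figures themselves are not reproduced here, so as a rough illustration of what Figures 2, 3 and 6 refer to, the following sketch implements a regularized cross-entropy cost and batch gradient descent that records the cost at each iteration. It assumes the common L2 form in which the intercept is not penalized; the learning rate `alpha`, the iteration count and all function names are hypothetical choices, not values from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(theta, x):
    """P(y = 1 | x); theta[0] is the intercept."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def cost(theta, X, y, lam):
    """Mean cross-entropy plus lam / (2m) * sum of squared non-intercept weights."""
    m = len(y)
    eps = 1e-12  # keeps log() finite if a prediction saturates
    ce = 0.0
    for xi, yi in zip(X, y):
        h = min(max(predict_proba(theta, xi), eps), 1 - eps)
        ce -= yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return ce / m + lam / (2 * m) * sum(t * t for t in theta[1:])


def gradient_descent(X, y, lam=1.0, alpha=0.1, iterations=200):
    """Batch gradient descent; returns theta and the cost after each step."""
    m, n = len(y), len(X[0])
    theta = [0.0] * (n + 1)
    history = []
    for _ in range(iterations):
        errors = [predict_proba(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(errors) / m]  # intercept gradient, not regularized
        for j in range(n):
            g = sum(e * xi[j] for e, xi in zip(errors, X)) / m
            grad.append(g + lam / m * theta[j + 1])
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        history.append(cost(theta, X, y, lam))
    return theta, history
```

Plotting `history` against the iteration index gives the kind of curve that the caption of Figure 6 describes.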
Ranges of values for lambda and threshold.
| Parameter | Values |
|---|---|
| Lambda | 0.1, 0.3, 1, 3, 10, 25, 50, 100 |
| Threshold | 0.1, 0.3, 0.5, 0.7, 0.9 |
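According to the parameters table, lambda and the decision threshold are chosen on the cross-validation set using the F1 score. One way to sketch that selection over the ranges above is a plain grid search. The `fit` callable is a hypothetical stand-in for whatever training routine is used: it is assumed to take the training data and a lambda, and to return a function mapping a feature vector to P(y = 1).

```python
LAMBDAS = [0.1, 0.3, 1, 3, 10, 25, 50, 100]
THRESHOLDS = [0.1, 0.3, 0.5, 0.7, 0.9]


def f1_score(y_true, y_pred) -> float:
    """F1 for binary labels in {0, 1}; 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_hyperparameters(fit, X_train, y_train, X_cv, y_cv,
                           lambdas=LAMBDAS, thresholds=THRESHOLDS):
    """Return (best_f1, best_lambda, best_threshold) on the CV set."""
    best = (-1.0, None, None)
    for lam in lambdas:
        model = fit(X_train, y_train, lam)  # train once per lambda
        probs = [model(x) for x in X_cv]    # reuse scores for every threshold
        for thr in thresholds:
            preds = [1 if p >= thr else 0 for p in probs]
            score = f1_score(y_cv, preds)
            if score > best[0]:             # ties keep the earliest candidate
                best = (score, lam, thr)
    return best
```

The test set is left out of this loop on purpose, so that the final F1 score is measured on data that played no part in choosing either value.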
Logistic Regression results on varying lambda.
| λ | Training Acc | Training Prec | Training Rec | Training F1 | CV Acc | CV Prec | CV Rec | CV F1 | Test Acc | Test Prec | Test Rec | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.91 | 0.92 | 0.87 | 0.89 | 0.87 | 0.83 | 0.87 | 0.85 | 0.81 | 0.80 | 0.86 | 0.83 |
| 0.3 | 0.91 | 0.91 | 0.88 | 0.89 | 0.89 | 0.87 | 0.87 | 0.87 | 0.83 | 0.83 | 0.86 | 0.84 |
| 1 | 0.90 | 0.90 | 0.88 | 0.89 | 0.89 | 0.87 | 0.87 | 0.87 | 0.83 | 0.83 | 0.86 | 0.84 |
| 3 | 0.90 | 0.90 | 0.87 | 0.88 | 0.89 | 0.87 | 0.87 | 0.87 | 0.83 | 0.83 | 0.86 | 0.84 |
| 10 | 0.89 | 0.89 | 0.87 | 0.88 | 0.91 | 0.88 | 0.91 | 0.89 | 0.83 | 0.85 | 0.82 | 0.84 |
| 25 | 0.89 | 0.89 | 0.85 | 0.87 | 0.89 | 0.87 | 0.87 | 0.87 | 0.81 | 0.82 | 0.82 | 0.82 |
| 50 | 0.88 | 0.88 | 0.84 | 0.86 | 0.89 | 0.87 | 0.87 | 0.87 | 0.80 | 0.79 | 0.82 | 0.81 |
| 100 | 0.88 | 0.88 | 0.84 | 0.86 | 0.87 | 0.83 | 0.87 | 0.85 | 0.80 | 0.79 | 0.82 | 0.81 |
Figure 7. Confusion Matrix for Logistic Regression.
Figure 8. PatternNet architecture.
Figure 9. PatternNet performance.
PatternNet results.
| PatternNet | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Training Set | 0.88 | 0.88 | 0.84 | 0.86 |
| Cross-Validation Set | 0.85 | 0.88 | 0.81 | 0.84 |
| Test Set | 0.89 | 0.95 | 0.79 | 0.86 |
Figure 10. PatternNet Confusion Matrix.
SVM results on varying the kernel.
| Kernel | Training Acc | Training Prec | Training Rec | Training F1 | CV Acc | CV Prec | CV Rec | CV F1 | Test Acc | Test Prec | Test Rec | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear | 0.88 | 0.87 | 0.87 | 0.87 | 0.89 | 0.90 | 0.83 | 0.86 | 0.89 | 0.93 | 0.87 | 0.90 |
| Gaussian | 0.98 | 0.98 | 0.97 | 0.98 | 0.80 | 0.93 | 0.57 | 0.70 | 0.83 | 1.00 | 0.70 | 0.82 |
| Polynomial | 1.00 | 1.00 | 1.00 | 1.00 | 0.81 | 0.78 | 0.78 | 0.78 | 0.85 | 0.92 | 0.80 | 0.86 |
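In the table above, the Gaussian and polynomial kernels fit the training set almost perfectly (accuracy 0.98 and 1.00) but fall to 0.80 and 0.81 on the cross-validation set, while the linear kernel stays close to its training figures. To make the three options concrete, the sketch below gives the usual textbook definitions of these kernels. The parameter names and defaults (`sigma`, `degree`, `coef0`) are assumptions for illustration; the excerpt does not report the values used.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, z: Vector) -> float:
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x: Vector, z: Vector) -> float:
    """k(x, z) = x . z"""
    return dot(x, z)


def gaussian_kernel(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def polynomial_kernel(x: Vector, z: Vector, degree: int = 3,
                      coef0: float = 1.0) -> float:
    """k(x, z) = (x . z + coef0)^degree"""
    return (dot(x, z) + coef0) ** degree
```

The Gaussian and polynomial kernels allow much more flexible decision boundaries than the linear one, which is consistent with the gap between their training and cross-validation scores.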
Figure 11. SVM Confusion Matrix.
Overall comparison of model results.
| Metric | Logistic Regression (λ = 10) | PatternNet (20 Neurons) | SVM (Linear Kernel) |
|---|---|---|---|
| Training Set | | | |
| Accuracy | 0.89 | 0.88 | 0.88 |
| Precision | 0.89 | 0.88 | 0.87 |
| Recall | 0.87 | 0.84 | 0.87 |
| F1 score | 0.88 | 0.86 | 0.87 |
| Cross-Validation Set | | | |
| Accuracy | 0.91 | 0.85 | 0.89 |
| Precision | 0.88 | 0.88 | 0.90 |
| Recall | 0.91 | 0.81 | 0.83 |
| F1 score | 0.89 | 0.84 | 0.86 |
| Test Set | | | |
| Accuracy | 0.83 | 0.89 | 0.89 |
| Precision | 0.85 | 0.95 | 0.93 |
| Recall | 0.82 | 0.79 | 0.87 |
| F1 score | 0.84 | 0.86 | 0.90 |
McNemar tests for significance of model performance.
| | Exact–Conditional | Mid-p | Asymptotic |
|---|---|---|---|
| Logistic Regression vs. PatternNet | | | |
| h | 0 | 0 | 0 |
| p | 0.2295 | 0.1147 | 0.1103 |
| e1 | 0.1108 | | |
| e2 | 0.1274 | | |
| Logistic Regression vs. Support Vector Machine | | | |
| h | 1 | 0 | 0 |
| p | 0.0488 | 0.9755 | 0.9761 |
| e1 | 0.1108 | | |
| e2 | 0.0720 | | |
| PatternNet vs. Support Vector Machine | | | |
| h | 1 | 0 | 0 |
| p | 0.0041 | 0.9977 | 0.9976 |
| e1 | 0.1274 | | |
| e2 | 0.0720 | | |
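McNemar's test compares two classifiers on the same samples by looking only at the discordant cases, where exactly one of the two is correct. The sketch below shows the two-sided exact-conditional and asymptotic (no continuity correction) variants in plain Python. It is an illustration of the general technique under those assumptions, not a reconstruction of the tooling or test direction behind the table; the mid-p variant is left out, and all function names are hypothetical.

```python
import math


def discordant_counts(y_true, pred_a, pred_b):
    """Count samples where exactly one of the two classifiers is correct.

    Returns (n10, n01): A right and B wrong, then A wrong and B right.
    """
    n10 = n01 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = a == t, b == t
        if a_ok and not b_ok:
            n10 += 1
        elif b_ok and not a_ok:
            n01 += 1
    return n10, n01


def mcnemar_exact(n10: int, n01: int) -> float:
    """Two-sided exact-conditional p-value: Binomial(n10 + n01, 0.5) tail."""
    n = n10 + n01
    if n == 0:
        return 1.0
    k = min(n10, n01)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_asymptotic(n10: int, n01: int) -> float:
    """Chi-square (1 d.o.f.) p-value without continuity correction."""
    n = n10 + n01
    if n == 0:
        return 1.0
    chi2 = (n10 - n01) ** 2 / n
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

A null hypothesis of equal error rates would be rejected at the 5% level whenever the returned p-value is below 0.05, which is how the h rows are presumably meant to be read.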
Corrected exact McNemar tests for significance of model performance.
| | Test 1: LogReg vs. PatternNet | Test 2: LogReg vs. SVM | Test 3: PatternNet vs. SVM |
|---|---|---|---|
| Corrected | 0.2200 | 0.0800 | 0.0120 |
| Significance | 0 | 0 | 1 |