Chengyun Song1, Weiyi Liu2, Zhining Liu3, Xiaoyang Liu1.
Abstract
With the growing popularity of online services such as online banking and online shopping, an essential research topic is how to build a privacy-preserving recommendation system for abnormal user behavior. A machine-learning-based system, however, faces a dilemma: it requires a large volume of features to pre-train the model, yet it is difficult to design usable features without looking at plaintext private data. In this paper, we propose an unorthodox approach based on graph analysis to resolve this dilemma and build a novel privacy-preserving recommendation system under a multilayer network framework. In experiments, we use a large, state-of-the-art dataset (containing more than 40,000 nodes and 43 million encrypted features) to evaluate how well our system recommends abnormal user behavior, obtaining an overall precision of around 0.9, a recall of 1.0, and an F1-score of around 0.94. We also report linear time complexity for our system. Finally, we deploy our system on the "Wenjuanxing" crowd-sourced system and "Amazon Mechanical Turk" so that other users can evaluate it in all aspects. The results show that almost all feedback reaches up to 85% satisfaction.
Year: 2019 PMID: 31794555 PMCID: PMC6890179 DOI: 10.1371/journal.pone.0224684
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A use case example.
A privacy-preserving system should distinguish real users from imposters even when both use the same login account, and should forbid the imposter from accessing sensitive user data.
Fig 2. Workflow for proposed system.
Fig 3. Multilayer network based on user behavior over time.
Fig 4. Architecture of visualization system.
Fig 5. Visualization system.
Proposed privacy-preserving recommendation system for abnormal user behavior. The system has two major parts: an overview (left) and individual behavior (right), along with device score groups, analysis results, and suggestions.
Evaluation dataset.
Information for all users:
| Users | Devices | Encrypted search keywords and IPs |
|---|---|---|
| 453,029 | 926,578 | 43,786,036 |
Per-user information:
| Avg timeslot groups | Avg devices | Avg encrypted search keywords and IPs |
|---|---|---|
| 5.263 | 2.05 | 96.652 |
Notes: Despite a large number of logs, there are on average too few encrypted features per user to train good models. We use Alg. 1 to group times from each user’s reach_time and data_time logs.
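The per-user averages for devices and encrypted features follow directly from the totals, which makes the point about sparse per-user features easy to check. A minimal sketch of that arithmetic is given here; the variable names are illustrative, and the average number of timeslot groups cannot be derived this way because the total number of groups is not reported.

```python
# Totals taken from the evaluation dataset table.
users = 453_029
devices = 926_578
encrypted_features = 43_786_036  # encrypted search keywords and IPs

avg_devices = devices / users
avg_features = encrypted_features / users

print(f"{avg_devices:.2f} devices per user")              # 2.05
print(f"{avg_features:.3f} encrypted features per user")  # 96.652
```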
Comparison with state-of-the-art algorithms.
| Related properties | Proposed method | Robust deep auto encoder | ML-based algorithms | Asynchronous MTL |
|---|---|---|---|---|
| 1. Privacy-preserving | Embedded | Required | Embedded | Embedded |
| 2. Volume of input features | Few | Large | Large | Large |
| 3. Unsupervised | Yes | Yes | No | No |
| 4. Training-free | Yes | No | No | No |
| 5. Analyzes different sources | Yes | No | No | Yes |
| 6. Noise-resistance | Not affected | High | Not affected | High |
Notes: ML = machine learning, MTL = multi-task learning
Abnormal device detection.
| Injected devices | Precision | Recall | F1-score |
|---|---|---|---|
| 1 | 0.901 | 1.0 | 0.934 |
| 2 | 0.896 | 1.0 | 0.931 |
| 3 | 0.913 | 1.0 | 0.942 |
Notes: The high precision rate demonstrates the proposed system’s high accuracy in distinguishing injected abnormal devices from other normal devices. The consistent, perfect recall rate shows that the system captures all injected abnormal devices.
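For reference, the F1-score is the harmonic mean of precision and recall. The sketch below applies that standard definition to the precision and recall values in the table; the function name `f1_score` is illustrative. Computed this way, the values come out slightly higher than the tabulated F1-scores. That would be consistent with the table averaging per-user scores rather than combining aggregate precision and recall, but this explanation is an assumption and is not stated in the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (injected devices, precision, recall, tabulated F1) from the table.
rows = [(1, 0.901, 1.0, 0.934), (2, 0.896, 1.0, 0.931), (3, 0.913, 1.0, 0.942)]

for injected, p, r, reported in rows:
    print(f"{injected}: harmonic mean = {f1_score(p, r):.3f}, reported = {reported}")
```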
System efficiency.
| Step | Min T (ms) | Avg T (ms) | Max T (ms) |
|---|---|---|---|
| Step 1 Construction | 0.19 | 5.85 | 17.51 |
| Step 2 Recommendation | 0.61 | 2.23 | 4.45 |
Notes: The overall time for the system construction and recommendation steps is around 10 ms; from the system operator’s point of view, this indicates that the proposed system is fast enough for large-scale deployment [1].
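Adding the per-step times gives the end-to-end latency across both steps. A minimal sketch of that arithmetic, using the values from the table (the dictionary layout is illustrative):

```python
# Per-step timings in milliseconds, copied from the system efficiency table.
timings_ms = {
    "construction": {"min": 0.19, "avg": 5.85, "max": 17.51},
    "recommendation": {"min": 0.61, "avg": 2.23, "max": 4.45},
}

# Sum each statistic across the two steps. The min and max totals are bounds:
# the fastest (or slowest) run of each step need not come from the same user.
totals_ms = {
    stat: round(sum(step[stat] for step in timings_ms.values()), 2)
    for stat in ("min", "avg", "max")
}
print(totals_ms)
```

The average total comes to about 8 ms, in line with the figure of around 10 ms quoted in the note.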
User case study questionnaire.
| No. | Question | Score |
|---|---|---|
| Q1 | The overall response speed of the system | 1–5 |
| Q2 | The user-friendliness of the system | 1–5 |
| Q3 | Does “Overview system” show all the device information for the current user? | Yes/no |
| Q4 | Does “Overview system” show the topological behavior among all related devices? | Yes/no |
| Q5 | Does “Individual system” represent the device behavior within a certain timeslot group? | Yes/no |
| Q6 | Does “Individual system” capture the topological relationship within a certain timeslot group? | Yes/no |
| Q7 | Is it possible to clearly identify abnormal nodes in “Individual system”? | Yes/no |
| Q8 | Does “User security score” represent the security status for the related user? | Yes/no |
| Q9 | Does “Suspicious activity reports” identify abnormal devices convincingly enough? | 1–5 |
Notes: Cronbach’s α coefficient is 0.837.
*: 1–5 indicates users choose a score from 1 to 5, where 1 is the worst and 5 the best.
**: For Yes/no questions, we use a score of 1 for yes and 0 for no when calculating the average score for related questions.
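Cronbach's α measures the internal consistency of a questionnaire: for k items, α = k/(k-1) · (1 - sum of item variances / variance of the total score). The sketch below applies this standard formula to a small, made-up response matrix. The responses and the function name `cronbach_alpha` are hypothetical and are not the study's data; yes/no answers are coded as 1/0, as described in the notes.

```python
from statistics import variance

def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for rows = respondents, columns = questionnaire items."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical answers from four respondents to Q1 (1-5), Q3 (yes/no), Q9 (1-5).
responses = [
    [5, 1, 4],
    [4, 1, 4],
    [2, 0, 1],
    [3, 1, 3],
]
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

Values closer to 1 indicate that the items move together across respondents.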
Fig 6. Statistical results for the user case study.