Ikuesan Richard Adeyemi, Shukor Abd Razak, Mazleena Salleh, Hein S Venter.
Abstract
Comprehension of the statistical and structural mechanisms governing human dynamics in online interaction plays a pivotal role in online user identification, online profile development, and recommender systems. However, building a characteristic model of human dynamics on the Internet involves a complete analysis of the variations in human activity patterns, which is a complex process. This complexity is inherent in human dynamics and has not been extensively studied to reveal the structural composition of human behavior. A typical method of anatomizing such a complex system is to examine each of the independent interconnections that constitute the complexity. This paper presents an examination of the various dimensions of human communication patterns in online interactions. The study employed reliable server-side web data from 31 known users to explore characteristics of human-driven communications. Various machine-learning techniques were explored. The results revealed that each individual exhibited a relatively consistent, unique behavioral signature and that the logistic regression model and model tree can be used to accurately distinguish online users. These results are applicable to one-to-one online user identification processes, insider misuse investigation processes, and online profiling in various areas.
Year: 2016 PMID: 27918593 PMCID: PMC5137900 DOI: 10.1371/journal.pone.0166930
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Correlation between Models of Inter-Event and Response Times.
| Event | Recorded exponent (α) | Individual Basis |
|---|---|---|
| | -1 | Slight variation |
| SMS | -1.52 to -1.7 | Correlation between average number of messages per day and α |
| Web browsing | -2.1 to -3 | Variation |
| Movie watching | -2.08 | Slight variation |
Fig 1. Session creation algorithm.
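The figure itself is not reproduced here, so the following is only a minimal sketch of one plausible session-creation procedure. It assumes that a user's requests are sorted by time and that a session is closed once the next request falls more than 30 minutes after the session's first request (the 30-minute delimiter is taken from the feature table; the function name and the grouping rule are assumptions).

```python
from datetime import datetime, timedelta

SESSION_LIMIT = timedelta(minutes=30)  # assumed session delimiter


def create_sessions(timestamps, limit=SESSION_LIMIT):
    """Group one user's request timestamps into sessions.

    Assumption: a new session starts whenever the next request falls
    more than `limit` after the first request of the current session.
    """
    sessions = []
    current = []
    for ts in sorted(timestamps):
        if current and ts - current[0] > limit:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions


requests = [
    datetime(2014, 7, 1, 9, 0),
    datetime(2014, 7, 1, 9, 10),
    datetime(2014, 7, 1, 9, 29),
    datetime(2014, 7, 1, 9, 45),
    datetime(2014, 7, 1, 11, 0),
]
sessions = create_sessions(requests)
print([len(s) for s in sessions])  # [3, 1, 1]
```

An inactivity timeout (closing a session after 30 idle minutes) would be an equally reasonable reading and would only change the comparison inside the loop.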
Descriptive summary of the extracted features.
| Features Used | Label | Brief Description |
|---|---|---|
| Aggregated visitation pattern | {f1} | It is the ratio of the sum of the total URLs visited in a session to the sum of URL-count (URL under observation) in the session. URL-count refers to the sum of the number of times any URL is revisited within the duration of a session. It also shows the sequential/parallel characteristics of the individual. This feature reveals the degree of linearity in online browsing behavior. |
| Rate of visits per session | {f2} | It is the ratio of the sum of the total number of URLs visited in the session to the duration of the session. This feature shows the visit behavior of an individual within a session. |
| Rate of visit-count per session | {f3} | It is the ratio of the sum of URL-count to the duration of the session. This feature shows the re-visitation behavior of an individual within the duration of a session. |
| Total number of requests per session | {f4} | It is the total number of requests made within the duration of a session. This feature shows the behavior of an individual with respect to the amount of request capacity. It also indicates the nature of the task being handled by the individual. |
| Session duration | {f5} | It is the absolute difference between the end time and the start time of the session. This feature reflects the behavior of the user within the 30-minute session limit. |
| Interval and Flight Mean | {f6 & f7} | The mean of a distribution reveals the standard shape parameter of an individual's request pattern over the observed duration. |
| Interval and Flight Standard Deviation | {f8 & f9} | The standard deviation of a distribution reveals the degree of spread of an individual's request pattern within the period of observation. This feature will reveal the inherent work pattern of each user. |
| Interval and Flight Variance | {f10 & f11} | The variance of a distribution is the square of its standard deviation. It measures the degree of proximity of an individual's request pattern over the period of observation. |
| Interval and Flight Skewness | {f12 & f13} | The skewness of Interval and Flight measures the degree of asymmetry of individual request pattern within the period of observation. |
| Interval and Flight Kurtosis | {f14 & f15} | These features show the behavior of individual request pattern based on its peak width and tail weight. They also measure the degree of an outlier in request pattern. |
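The feature definitions above can be sketched in code. This is an illustration under stated assumptions, not the study's implementation: "total URLs visited" is read as the number of distinct URLs, "URL-count" as the total number of visits summed over all URLs, and the distribution features are computed on inter-request intervals only, since the definition of "Flight" is not given here. All function names are hypothetical.

```python
from collections import Counter
from statistics import fmean, pstdev, pvariance


def session_features(requests):
    """Compute the first five session features for one session.

    `requests` is a list of (timestamp_seconds, url) pairs sorted by time.
    Assumptions: "total URLs visited" means distinct URLs, and "URL-count"
    means the total number of visits over all URLs (revisits included).
    """
    times = [t for t, _ in requests]
    visits = Counter(url for _, url in requests)
    distinct_urls = len(visits)
    url_count = sum(visits.values())
    duration = times[-1] - times[0]  # session duration in seconds
    return {
        "f1": distinct_urls / url_count,  # aggregated visitation pattern
        "f2": distinct_urls / duration if duration else 0.0,
        "f3": url_count / duration if duration else 0.0,
        "f4": len(requests),  # total number of requests
        "f5": duration,
    }


def distribution_features(values):
    """Mean, standard deviation, variance, skewness, and kurtosis.

    Population moments are used; kurtosis is the non-excess form,
    so a normal distribution would score about 3.
    """
    mean = fmean(values)
    var = pvariance(values, mean)
    std = pstdev(values, mean)
    if std == 0:
        return {"mean": mean, "std": 0.0, "var": 0.0, "skew": 0.0, "kurt": 0.0}
    n = len(values)
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return {"mean": mean, "std": std, "var": var, "skew": skew, "kurt": kurt}


# Hypothetical session: four requests, one revisit of "/home".
session = [(0, "/home"), (60, "/news"), (120, "/home"), (300, "/mail")]
features = session_features(session)
intervals = [b[0] - a[0] for a, b in zip(session, session[1:])]
interval_stats = distribution_features(intervals)
print(features)
print(interval_stats)
```

Under these assumptions the first feature equals 1 when no URL is revisited and falls toward 0 as revisits dominate, which matches the "degree of linearity" reading in the table. Because each request in this sketch carries exactly one URL, the request total coincides with URL-count; real server logs need not behave that way.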
Seasonal Duration for Consistency Exploration.
| Season | Duration |
|---|---|
| a. Pre-fasting period | April 24 – June 30, 2014 |
| b. Fasting period | July 1 – August 1, 2014 |
| c. Post-fasting period | August 2 – September 2, 2014 |
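A small sketch of how a request date could be mapped to one of the three seasons, assuming both boundary dates of each period are inclusive (the helper name `season_of` is hypothetical):

```python
from datetime import date

# Season boundaries from the table; treated as inclusive (an assumption).
SEASONS = {
    "a": (date(2014, 4, 24), date(2014, 6, 30)),  # pre-fasting
    "b": (date(2014, 7, 1), date(2014, 8, 1)),    # fasting
    "c": (date(2014, 8, 2), date(2014, 9, 2)),    # post-fasting
}


def season_of(day):
    """Return the season label for `day`, or None if outside all periods."""
    for label, (start, end) in SEASONS.items():
        if start <= day <= end:
            return label
    return None


season_lengths = {
    label: (end - start).days + 1 for label, (start, end) in SEASONS.items()
}
print(season_of(date(2014, 7, 15)))
print(season_lengths)
```

With inclusive boundaries, season a spans 68 days while seasons b and c span 32 days each, which may be worth keeping in mind when comparing raw request counts across seasons.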
Summary of User Requests and Sessions.
| Users | Season | Number of Requests | Number of Sessions |
|---|---|---|---|
| 1 | a | 6,544 | 268 |
| 1 | b | 2,408 | 110 |
| 1 | c | 9,944 | 297 |
| 2 | a | 6,107 | 320 |
| 2 | b | 1,437 | 94 |
| 2 | c | 915 | 71 |
| 3 | a | 9,215 | 365 |
| 3 | b | 7,198 | 176 |
| 3 | c | 17,662 | 364 |
| 4 | a | 3,101 | 186 |
| 4 | b | 1,382 | 77 |
| 4 | c | 1,392 | 109 |
| 5 | a | 3,739 | 200 |
| 5 | b | 2,086 | 143 |
| 5 | c | 2,729 | 181 |
| 6 | a | 7,584 | 223 |
| 6 | b | 3,109 | 102 |
| 6 | c | 4,728 | 140 |
| 7 | a | 7,035 | 288 |
| 7 | b | 3,194 | 122 |
| 7 | c | 4,335 | 218 |
| 8 | a | 12,456 | 417 |
| 8 | b | 4,568 | 159 |
| 8 | c | 10,569 | 294 |
| 9 | a | 2,728 | 155 |
| 9 | b | 1,162 | 75 |
| 9 | c | 2,710 | 204 |
| 10 | a | 5,717 | 253 |
| 10 | b | 2,726 | 128 |
| 10 | c | 4,723 | 197 |
| 11 | a | 2,751 | 156 |
| 11 | b | 2,183 | 71 |
| 11 | c | 10,569 | 294 |
Revealed Consistency in Individual Request Patterns.
| User | Rules observed in season a | Rules observed in season b | Rules observed in season c | Total number of common rules in all seasons |
|---|---|---|---|---|
| | 288 | 117 | 472 | 45 |
| | 267 | 68 | 41 | 16 |
| | 398 | 352 | 841 | 109 |
| | 192 | 90 | 70 | 15 |
| | 213 | 128 | 141 | 26 |
| | 390 | 170 | 255 | 55 |
| | 321 | 153 | 203 | 45 |
| | 615 | 246 | 518 | 103 |
| | 124 | 55 | 111 | 9 |
| | 304 | 144 | 239 | 41 |
| | 124 | 116 | 518 | 34 |
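How the "common rules" column relates to the per-season counts can be illustrated with a set intersection. This is a sketch only: the representation of a rule as an (antecedent, consequent) tuple and both helper names are assumptions, since the form of the rules is not described here.

```python
def common_rules(*seasonal_rule_sets):
    """Rules that appear in every season (set intersection)."""
    return set.intersection(*(set(rules) for rules in seasonal_rule_sets))


def persistence(common_count, season_counts):
    """Share of each season's rules that also occur in all other seasons."""
    return [common_count / n for n in season_counts]


# Hypothetical rules, written as (antecedent, consequent) pairs.
season_a = {("short_interval", "many_requests"), ("long_session", "revisits")}
season_b = {("short_interval", "many_requests"), ("few_urls", "short_session")}
season_c = {("short_interval", "many_requests"), ("long_session", "revisits")}
shared = common_rules(season_a, season_b, season_c)
print(shared)

# Counts from the first row of the table: 288, 117, and 472 rules, 45 common.
shares = persistence(45, [288, 117, 472])
print([round(s, 3) for s in shares])  # [0.156, 0.385, 0.095]
```

Read this way, the first row of the table says that roughly 10% to 38% of a season's rules recur in all three seasons, depending on which season is used as the base.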
Fig 2. Cumulative Probability Plot of Session Characteristics for Four Users.
Model of Session Duration for Visitation Pattern Aggregation per Session.
| User | Power law: model | Power law: R² | Polynomial law: model | Polynomial law: R² | Exponential law: model | Exponential law: R² |
|---|---|---|---|---|---|---|
| 1962 | 1 | 4 | 0.461 | 193.81 | 0.798 | |
| 9057 | 1 | 3 | 0.381 | 109.19 | 0.762 | |
| 19981 | 1 | 3 | 0.296 | 0.119 | ||
| 6294 | 1 | 1 | 0.297 | 0.095 | ||
| 9121 | 1 | 2 | 0.366 | 114.85 | 0.820 | |
| 1539 | 1 | 4 | 0.503 | 200.8 | 0.805 | |
| 15275 | 1 | 2 | 0.266 | 0.101 | ||
| 28568 | 1 | 4 | 0.234 | 0.106 | ||
| 7187 | 1 | 2 | 0.183 | 0.084 | ||
| 13792 | 1 | 2 | 0.280 | 0.093 | ||
| 8718 | 1 | 3 | 0.184 | 0.088 | ||
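A minimal sketch of how power-law and exponential fits with an R² score could be obtained. It assumes ordinary least squares on log-transformed data, with R² computed on the original scale; the fitting procedure behind the table is not stated here, the polynomial fit is omitted, and all names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """Coefficient of determination on the original scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def fit_power_law(xs, ys):
    """Fit y = a * x**b by regressing log(y) on log(x)."""
    b, log_a = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    a = math.exp(log_a)
    return a, b, r_squared(ys, [a * x ** b for x in xs])


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing log(y) on x."""
    b, log_a = linear_fit(xs, [math.log(y) for y in ys])
    a = math.exp(log_a)
    return a, b, r_squared(ys, [a * math.exp(b * x) for x in xs])


# Synthetic data that follows y = 3 * x**-1.5 exactly.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** -1.5 for x in xs]
a_pow, b_pow, r2_pow = fit_power_law(xs, ys)
a_exp, b_exp, r2_exp = fit_exponential(xs, ys)
print(round(a_pow, 3), round(b_pow, 3), round(r2_pow, 3))
print(round(r2_exp, 3))
```

Comparing R² across model families on the same data, as the table does, only makes sense when all scores are computed on the same scale, which is why the sketch evaluates every fit against the untransformed values.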
Model of Number of Requests per Session for Aggregated Visitation Patterns.
| User | Power law: model | Power law: R² | Polynomial law: model | Polynomial law: R² | Exponential law: model | Exponential law: R² |
|---|---|---|---|---|---|---|
| 5 | 1 | 4 | 1 | 5 | 0.740 | |
| 1 | 2 | 1 | 7 | 0.747 | ||
| 5 | 1 | 1 | 1 | 4 | 0.763 | |
| 2 | 1 | 1 | 1 | 1.1 | 0.603 | |
| 1 | −4 | 1 | 6 | 0.8034 | ||
| 6 | 1 | −6 | 1 | 5 | 0.7861 | |
| 7 | 1 | 4 | 1 | 4 | 0.7924 | |
| 4 | 1 | 1 | 1 | 3 | 0.7446 | |
| 1 | −5 | 1 | 7 | 0.7696 | ||
| 7 | 1 | 3 | 1 | 4 | 0.8501 | |
| 1 | 2 | 1 | 8 | 0.7148 | ||
Cluster Evaluation on a Class of 11 Users (percentage accuracy of cluster-to-class evaluation).
| Number of classes | EM | Cobweb | Hierarchical | Canopy | SOM | Density-based | k-Means | LVQ |
|---|---|---|---|---|---|---|---|---|
| 11 | 13.70% | 14.05% | 14.31% | 14.39% | 14.82% | 13.67% | 13.70% | 15.07% |
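Cluster-to-class accuracy can be sketched as follows, assuming the simplest mapping in which every cluster is labeled with its majority class. The evaluation tool used in the study may map clusters to classes differently (for example one-to-one), so this is an illustration rather than a reproduction; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def cluster_to_class_accuracy(cluster_ids, class_labels):
    """Label each cluster with its majority class and score the result.

    Assumption: several clusters may map to the same class.
    """
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster[cluster][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return correct / len(class_labels)


# Hypothetical assignment of six sessions to two clusters.
clusters = [0, 0, 0, 1, 1, 1]
users = ["u1", "u1", "u2", "u2", "u2", "u3"]
accuracy = cluster_to_class_accuracy(clusters, users)
print(round(accuracy, 3))  # 0.667
```

For reference, uniform guessing over 11 equally sized classes would score about 9.1%, so the values in the table are only slightly above that level.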
Fig 3. 3-D Plot of Experimental Data.
Fig 4. The Process of Classification.
Performance Evaluation of Explored Classifiers.
| Supervised ML Taxonomy | Scheme | Metrics | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Tree/Logic-based | | 14.42(0.07) | 0.00(0.00) | 0.29(0.00) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00) | 0.50(0.00) |
| | | 81.24(5.90) | 0.79(0.07) | 0.16(0.02) | 0.79(0.08) | 0.64(0.11) | 0.70(0.08) | 0.93(0.03) |
| | | 19.34(0.68) | 0.08(0.01) | 0.28(0.00) | 0.00(0.00) | 0.00(0.00) | 0.00(0.00) | 0.54(0.04) |
| | | 14.41(0.09) | 0.00(0.00) | 0.29(0.00) | 0.00(0.02) | 0.00(0.02) | 0.00(0.02) | 0.50(0.02) |
| | | 86.00(1.39) | 0.84(0.02) | 0.14(0.01) | 0.85(0.05) | 0.79(0.06) | 0.82(0.04) | 0.95(0.02) |
| | | 91.48(2.22) | 0.91(0.02) | 0.11(0.01) | 0.95(0.05) | 0.95(0.07) | 0.95(0.06) | 1.00(0.00) |
| | | 84.38(4.88) | 0.83(0.05) | 0.17(0.01) | 0.72(0.09) | 0.74(0.09) | 0.73(0.08) | 0.92(0.03) |
| | | 73.24(2.05) | 0.70(0.02) | 0.19(0.00) | 0.54(0.06) | 0.57(0.08) | 0.55(0.06) | 0.92(0.02) |
| | | 63.77(4.52) | 0.60(0.05) | 0.26(0.02) | 0.46(0.09) | 0.47(0.09) | 0.46(0.08) | 0.71(0.05) |
| | | 87.33(1.40) | 0.86(0.02) | 0.13(0.01) | 0.83(0.07) | 0.78(0.07) | 0.80(0.05) | 0.98(0.01) |
| Statistics-based | | 59.04(3.66) | 0.55(0.04) | 0.23(0.01) | 0.37(0.06) | 0.49(0.09) | 0.42(0.06) | 0.83(0.04) |
| | | 62.96(3.84) | 0.59(0.04) | 0.22(0.01) | 0.44(0.07) | 0.58(0.08) | 0.50(0.06) | 0.86(0.03) |
| | | 79.86(1.25) | 0.78(0.01) | 0.17(0.00) | 0.70(0.09) | 0.44(0.07) | 0.53(0.07) | 0.96(0.01) |
| | | 99.34(0.97) | 0.99(0.01) | 0.04(0.01) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) |
| | | 18.06(0.99) | 0.06(0.01) | 0.28(0.00) | 0.31(0.26) | 0.03(0.02) | 0.05(0.04) | 0.63(0.04) |
| | | 16.33(0.52) | 0.02(0.01) | 0.39(0.00) | 0.57(0.50) | 0.02(0.02) | 0.03(0.03) | 0.51(0.01) |
| Perceptron-based | | 63.72(5.30) | 0.60(0.06) | 0.21(0.01) | 0.60(0.20) | 0.42(0.18) | 0.44(0.11) | 0.92(0.04) |
| Logic-based | | 73.48(3.20) | 0.71(0.04) | 0.20(0.01) | 0.55(0.14) | 0.66(0.12) | 0.58(0.07) | 0.94(0.02) |
| | | 84.07(2.90) | 0.82(0.03) | 0.16(0.01) | 0.78(0.08) | 0.75(0.07) | 0.76(0.05) | 0.98(0.01) |
| | | 76.46(2.60) | 0.74(0.03) | 0.18(0.01) | 0.90(0.09) | 0.38(0.12) | 0.52(0.12) | 0.90(0.02) |
| | | 86.64(1.49) | 0.85(0.02) | 0.14(0.01) | 0.84(0.05) | 0.80(0.07) | 0.82(0.05) | 0.95(0.03) |
| Instance-based | | 29.98(1.89) | 0.22(0.02) | 0.27(0.00) | 0.25(0.05) | 0.28(0.06) | 0.26(0.06) | 0.74(0.04) |
Number of folds: 10, Type of experiment: cross validation, Number of repetitions: 10, Statistical test: paired T-test (corrected), Confidence: 0.01 (two tailed), SD: standard deviation
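The "paired T-test (corrected)" in the note above is not spelled out. One common reading, assumed in this sketch, is the corrected resampled t-test, which inflates the variance term to account for overlapping training sets in repeated cross-validation. The function name and the synthetic accuracy differences are hypothetical.

```python
import math
from statistics import fmean, variance


def corrected_paired_t(diffs, test_train_ratio):
    """Corrected resampled t statistic for paired score differences.

    `diffs` holds one difference per fold and repetition (k values).
    The variance is scaled by (1/k + n_test/n_train) instead of 1/k.
    """
    k = len(diffs)
    scale = 1 / k + test_train_ratio
    return fmean(diffs) / math.sqrt(scale * variance(diffs))


def uncorrected_paired_t(diffs):
    """Standard paired t statistic, shown for comparison only."""
    k = len(diffs)
    return fmean(diffs) / math.sqrt(variance(diffs) / k)


# 10 folds x 10 repetitions give k = 100 differences; each fold tests on
# one tenth of the data and trains on nine tenths, so the ratio is 1/9.
diffs = [0.02 + 0.01 * ((i % 5) - 2) for i in range(100)]
t_corrected = corrected_paired_t(diffs, 1 / 9)
t_plain = uncorrected_paired_t(diffs)
print(round(t_corrected, 2), round(t_plain, 2))
```

The statistic would be compared against a t distribution with k - 1 = 99 degrees of freedom at the two-tailed 0.01 level given in the note. The correction makes the test more conservative, so the corrected value is always smaller in magnitude than the plain one.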
Fig 5. Comparison of Accuracy.
Weighted Classification of Eleven Users.
| Classifier | Evaluation metrics (weighted average) | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PART | ||||||||||||||
| DTNB | 67.39 | 65.77 | 0.21 | 0.21 | 0.70 | 0.68 | 0.67 | 0.66 | 0.94 | 0.93 | 0.67 | 0.65 | 0.64 | 0.62 |
| LMT | ||||||||||||||
| J48 | ||||||||||||||
| REPTree | 89.17 | 84.37 | 0.12 | 0.14 | 0.89 | 0.85 | 0.89 | 0.84 | 1.00 | 0.98 | 0.89 | 0.85 | 0.88 | 0.83 |
Testing instances: 1,887, Training instances: 4,403
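Two small calculations help when reading this table. The instance counts correspond to a 70/30 train/test split, and a "weighted average" metric is usually the per-class value weighted by class size. The sketch below assumes that definition; the per-user values and class sizes are hypothetical.

```python
def weighted_average(per_class_values, class_sizes):
    """Average a per-class metric, weighting each class by its size."""
    total = sum(class_sizes)
    weighted_sum = sum(v * n for v, n in zip(per_class_values, class_sizes))
    return weighted_sum / total


# Share of instances held out for testing, from the counts in the note.
test_share = 1887 / (1887 + 4403)
print(round(test_share, 3))  # 0.3

# Hypothetical per-user precision values and test-set class sizes.
precision = [0.90, 0.80, 0.60]
sizes = [100, 50, 50]
avg_precision = weighted_average(precision, sizes)
print(round(avg_precision, 3))  # 0.8
```

Weighting by class size means that users with many sessions dominate the average, so a high weighted score can hide poor results for users with few sessions.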
Training and Testing Analysis for Double and Triple Sample Sizes.
| Scheme | Double sample size, training (%) | Double sample size, testing (%) | Triple sample size, training (%) | Triple sample size, testing (%) |
|---|---|---|---|---|
| DTNB | 65.41 | 73.82 | 54.68 | 52.21 |
| PART | 77.60 | 77.79 | 75.87 | 75.77 |
| LMT | 83.61 | 85.38 | 75.08 | 76.29 |
| J48 | 80.69 | 82.44 | 83.32 | 83.81 |
| REPTree | 77.19 | 80.63 | 80.05 | 82.62 |
Double sample size: 2,819 testing instances and 6,577 training instances. Triple sample size: 3,285 testing instances and 7,664 training instances.
Accuracy of Logistic Regression Model for Different Sample Sizes.
| User ID | Initial sample size (%) | ||
|---|---|---|---|
| User-1 | 100 | 100 | 99.52 |
| User-10 | 100 | 100 | 100 |
| User-11 | |||
| User-2 | |||
| User-3 | 100 | 100 | 100 |
| User-4 | 99.62 | 100 | 99.17 |
| User-5 | |||
| User-6 | 100 | 100 | 100 |
| User-7 | 100 | 99.08 | 99.02 |
| User-8 | 100 | 100 | 99.61 |
| User-9 | 100 | 100 | 100 |
| User-19 | | 100 | 100 |
| User-21 | | 97.06 | 96.49 |
| User-15 | | 100 | 100 |
| User-18 | | 100 | 96.59 |
| User-14 | | 100 | 97.22 |
| User-13 | | 97.86 | 100 |
| User-17 | | 94.59 | 100 |
| User-12 | | 100 | 100 |
| User-27 | | | 38.18 |
| User-28 | | | 36.58 |
| User-26 | | | 29.79 |
| User-24 | | | 33.33 |
| User-29 | | | 31.11 |
| User-23 | | | 48.48 |
| User-30 | | | 25 |
| User-22 | | | 56.1 |
| User-31 | | | 22.64 |
| User-25 | | | 32 |
Result of Logistic Model with Ridge Estimate for Each User: Performance Evaluation Metrics.
| Class | Accuracy (%) | Precision | F-measure | AUC | Accuracy (%) | Precision | F-measure | AUC |
|---|---|---|---|---|---|---|---|---|
| User1 | 100 | 1.00 | 1.00 | 1.00 | 100 | 1.00 | 1.00 | 1.00 |
| user10 | 100 | 1.00 | 1.00 | 1.00 | 100 | 1.00 | 1.00 | 1.00 |
| user11 | 100 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | ||
| user2 | 1.00 | 0.959 | 1.00 | |||||
| user3 | 100 | 1.00 | 1.00 | 1.00 | 100 | 1.00 | 1.00 | 1.00 |
| user4 | 100 | 1.00 | 1.00 | 1.00 | 99.12 | 0.991 | 0.99 | 1.00 |
| user5 | 1.00 | 98.2 | 0.982 | 0.98 | 1.00 | |||
| user6 | 100 | 1.00 | 1.00 | 1.00 | 100 | 1.00 | 1.00 | 1.00 |
| user7 | 100 | 1.00 | 1.00 | 1.00 | 100 | 1.00 | 1.00 | 1.00 |
| user8 | 100 | 1.00 | 1.00 | 1.00 | 100 | 1.00 | 1.00 | 1.00 |
| user9 | 100 | 1.00 | 1.00 | 1.00 | 0.99 | 0.98 | 1.00 | |
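A per-user logistic model with a ridge penalty can be sketched as a binary classifier that separates one target user's sessions from everyone else's. This is a minimal illustration assuming plain gradient descent, an L2 penalty on the weights only, and two made-up features; the ridge value, learning rate, and data are hypothetical and do not reproduce the study's setup.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_ridge_logistic(X, y, ridge=1e-3, lr=0.5, epochs=2000):
    """Fit weights by gradient descent on log-loss + (ridge/2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [ridge * wj for wj in w]  # gradient of the ridge term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Return 1 if the session is attributed to the target user, else 0."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical sessions with two scaled features; label 1 = target user.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [1, 1, 1, 0, 0, 0]
model = train_ridge_logistic(X, y)
predictions = [predict(model, x) for x in X]
print(predictions)
```

The ridge term keeps the weights bounded even when the two groups are perfectly separable, which is the situation suggested by the many 100% entries in the table.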
Fig 6. Metric of Evaluation of Testing Dataset.