Jing Xu1, Jie Wang2, Ye Tian3, Jiangpeng Yan1, Xiu Li1, Xin Gao4.
Abstract
Online shopping behavior is characterized by rich granularity dimensions and data sparsity, which makes it challenging to predict in e-commerce. Previous studies on user behavior prediction did not seriously discuss feature selection and ensemble design, both of which are important for improving the performance of machine learning algorithms. In this paper, we propose an SE-stacking model based on information fusion and ensemble learning for user purchase behavior prediction. After using an ensemble feature selection method to screen purchase-related factors, we use the stacking algorithm to predict user purchase behavior. To avoid deviation in the prediction results, we optimize the model by selecting ten different types of models as base learners and tuning the relevant parameters for each of them. Experiments conducted on a publicly available dataset show that the SE-stacking model achieves a 98.40% F1 score, approximately 0.09 percentage points higher than the best base model. The SE-stacking model is not only well suited to predicting user purchase behavior but also has practical value in real e-commerce scenarios. The model is also of significance for academic research and the development of this field.
Year: 2020 PMID: 33237926 PMCID: PMC7688168 DOI: 10.1371/journal.pone.0242629
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overall framework.
Symbol definition.
| Symbol | Definition |
|---|---|
| | Training data set |
| | Base classifier set |
| | Feature subset |
| W | Feature weight sequence |
| | Prediction target |
| | Ensemble classifier |
| | Importance operators |
Fig 2. Framework of SA-EFS ensemble feature selection.
Fig 3. Principle of the stacking algorithm.
SE-stacking algorithm.
| Algorithm SE-Stacking |
|---|
| 1: |
| 2: |
| 16: for classifiers |
| 17: Using cross-validation |
| 18: end for |
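The listing above is truncated, but the surviving steps (looping over classifiers and using cross-validation) match the general stacking recipe. The following is a minimal sketch of that recipe with scikit-learn, not the authors' implementation. The synthetic data, the three base learners (the paper uses ten), the logistic regression meta-learner, and all parameter values are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): the paper's dataset is not reproduced here.
X, y = make_classification(
    n_samples=600, n_features=15, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three illustrative base learners (assumption): the paper selects ten.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("gnb", GaussianNB()),
]

# With cv=5, each base learner produces out-of-fold predictions on the
# training set; those predictions are the meta-learner's input features.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=5,
)
stack.fit(X_train, y_train)

stack_f1 = f1_score(y_test, stack.predict(X_test))
print(f"stacking F1: {stack_f1:.4f}")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, is what keeps the second level from simply memorizing base-learner overfitting.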
Constructed features of the next time first-order difference of operation 5.
| Structural Features | Meaning |
|---|---|
| action_user_onlytype_mean_5 | The mean of the next time first-order difference of user operation 5 |
| action_user_onlytype_median_5 | The median of the next time first-order difference of user operation 5 |
| action_user_onlytype_max_5 | The max of the next time first-order difference of user operation 5 |
| action_user_onlytype_min_5 | The min of the next time first-order difference of user operation 5 |
| action_user_onlytype_std_5 | The standard deviation of the next time first-order difference of user operation 5 |
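Features of this kind can be sketched as summary statistics over the gaps between a user's consecutive actions of one type. The example below is an illustration under stated assumptions, not the paper's feature pipeline: the log layout `(user_id, action_type, timestamp)`, the sample values, and the reading of "first-order difference" as the gap between consecutive timestamps are all hypothetical. Only the feature names are taken from the table.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

# Hypothetical action log: (user_id, action_type, timestamp_in_seconds).
ACTIONS = [
    ("u1", 5, 100), ("u1", 1, 130), ("u1", 5, 160), ("u1", 5, 250),
    ("u2", 5, 10), ("u2", 5, 40),
    ("u3", 5, 70),
]


def first_order_diff_features(log, action_type):
    """Summarize, per user, the gaps between consecutive actions of one type."""
    times_by_user = defaultdict(list)
    for user_id, a_type, timestamp in log:
        if a_type == action_type:
            times_by_user[user_id].append(timestamp)

    prefix = "action_user_onlytype"
    features = {}
    for user_id, times in times_by_user.items():
        times.sort()
        # First-order difference: each timestamp minus the previous one.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if not gaps:
            continue  # a single action has no difference to summarize
        features[user_id] = {
            f"{prefix}_mean_{action_type}": mean(gaps),
            f"{prefix}_median_{action_type}": median(gaps),
            f"{prefix}_max_{action_type}": max(gaps),
            f"{prefix}_min_{action_type}": min(gaps),
            # Population standard deviation, so a single gap yields 0.
            f"{prefix}_std_{action_type}": pstdev(gaps),
        }
    return features


features = first_order_diff_features(ACTIONS, action_type=5)
print(features["u1"])
```

Users with fewer than two actions of the given type are skipped here; a real pipeline would need an explicit policy for them, such as a sentinel value.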
Constructed features of the last time first-order difference of operation 5.
| Structural Features | Meaning |
|---|---|
| gap_action_user_onlytype_mean_5 | The mean of the last time first-order difference of user operation 5 |
| gap_action_user_onlytype_median_5 | The median of the last time first-order difference of user operation 5 |
| gap_action_user_onlytype_max_5 | The max of the last time first-order difference of user operation 5 |
| gap_action_user_onlytype_min_5 | The min of the last time first-order difference of user operation 5 |
| gap_action_user_onlytype_std_5 | The standard deviation of the last time first-order difference of user operation 5 |
Constructed features of user operations.
| Structural Features | Meaning |
|---|---|
| cvr5 | Conversion rate of user operation 5 (likewise for operations 5–9) |
| action_last_1 | Time from the user's latest operation 1 to the present (likewise for operations 2–9) |
| actionType_max_5 | Latest time of operation 5 (likewise for operations 5–9) |
| actionType_min_5 | Earliest time of operation 5 (likewise for operations 5–9) |
| orderTime_userid_max | Latest request time |
| orderTime_userid_min | Earliest request time |
Other constructed features.
| Structural Features | Meaning |
|---|---|
| history_count | Number of user history table occurrences |
| orderType_userid_sum | Total operation times |
| Japan_count | Visits to Japan (same for other countries) |
| rating_userid_min | Minimum score of evaluation |
| rating_userid_count | Number of evaluations |
| not_in_history_userid_sum | Whether the user is new (not in the history table) |
Result of feature selection.
| No. | Feature | Importance | Feature Meaning |
|---|---|---|---|
| 1 | action_last_1 | 1.293146216 | Time from user’s latest operation 1 to present |
| 2 | action_last_5 | 1.243441452 | Time from user’s latest operation 5 to present |
| 3 | province | 1.048388111 | Province |
| 4 | gap_action_user_onlytype_max_5 | 1.033069104 | Maximum value of the last time first-order difference of user operation 5 |
| 5 | action_user_onlytype_max_6 | 0.930298291 | Maximum value of the next time first-order difference in user operation 6 |
| 6 | action_user_onlytype_max_5 | 0.915895998 | Maximum value of the next time first-order difference of user operation 5 |
| 7 | action_user_onlytype_min_6 | 0.841260212 | Minimum value of the next time first-order difference of user operation 6 |
| 8 | cvr9 | 0.832753647 | User action 9 conversion rate |
| 9 | action_user_onlytype_median_5 | 0.829253058 | Median of the next time first-order difference of user operation 5 |
| 10 | gap_action_user_onlytype_min_6 | 0.797105161 | Minimum value of the last time first-order difference of user operation 6 |
| 11 | action_user_onlytype_std_5 | 0.768093864 | Standard deviation of the next time first-order difference of user operation 5 |
| 12 | cvr8 | 0.747993565 | User action 8 conversion rate |
| 13 | actionType_max_7 | 0.687867921 | Last time of operation 7 |
| 14 | action_user_onlytype_min_5 | 0.684920057 | Minimum value of the next time first-order difference of user operation 5 |
| 15 | Japan_count | 0.628384876 | Visits to Japan |
Fig 4. Feature correlation heat map.
Fig 5. Parameter adjustment of n_estimators.
Fig 6. Joint parameter adjustment of max_depth and min_samples_split.
Confusion matrix structure.
| Real Category \ Forecast Category | Positive (Purchase) | Negative (No Purchase) |
|---|---|---|
| Positive (Purchase) | True Positive (TP) | False Negative (FN) |
| Negative (No Purchase) | False Positive (FP) | True Negative (TN) |
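The F1 score reported for the models is computed from these confusion matrix cells using the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of the two. A minimal sketch follows; the label vectors are made-up illustrative values, not data from the paper, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the purchase class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall; 0.0 if undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 1 = purchase, 0 = no purchase.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_from_counts(tp, fp, fn)
print(tp, fp, fn, tn, round(f1, 4))
```

TN does not enter the F1 formula, which is why F1 is often preferred over accuracy when the negative class dominates.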
Model comparison results.
| NO. | Model | F1 score | Training time |
|---|---|---|---|
| 1 | Stacking | 0.9840 | 2.97300 |
| 2 | CatBoost | 0.9831 | 337.953 |
| 3 | XGBoost | 0.9820 | 930.681 |
| 4 | RandomForest | 0.9814 | 53.6011 |
| 5 | LightGBM | 0.9789 | 78.3099 |
| 6 | Logistic regression | 0.9663 | 21.1730 |
| 7 | LinearSVC | 0.9639 | 40.4603 |
| 8 | ExtraTrees | 0.9619 | 15.0000 |
| 9 | AdaBoost | 0.9527 | 22.7302 |
| 10 | K-NN | 0.9122 | 17.7015 |
| 11 | Gaussian Bayesian | 0.8969 | 0.32770 |
Fig 7. Comparison of F1 scores of each model.