Cheng-Ju Liu, Tien-Shou Huang, Ping-Tsan Ho, Jui-Chan Huang, Ching-Tang Hsieh.
Abstract
In recent years, China's e-commerce industry has developed rapidly, and the scale of its various sectors has continued to expand. Service-oriented enterprises, such as those in e-commerce transactions and information technology, have come into being. This paper analyzes the shortcomings and challenges of traditional methods for predicting online shopping behavior and proposes an online shopping behavior analysis and prediction system. Two models are chosen: logistic regression, a linear model, and XGBoost, a decision-tree-based model. After the models are optimized, the nonlinear model is found to make better use of the features and to give better prediction results. The single models are first combined, and a model fusion algorithm is then used to fuse their prediction results. The purpose is to offset the limited fitting ability of the linear model and the over-fitting of the decision tree model. The results show that the fused model improves further on the single models. Finally, two sets of comparison experiments show that the selected algorithm can effectively filter the features, which reduces the complexity of the model to a certain extent and improves the classification accuracy of machine learning. The XGBoost hybrid model based on positive/negative (p/n) samples is simpler than a single model. The machine learning models are less prone to over-fitting and are therefore more robust.
Year: 2020 PMID: 33270714 PMCID: PMC7714352 DOI: 10.1371/journal.pone.0243105
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Experimental environment.
| Component | Specification |
|---|---|
| Processor | Intel(R) Xeon(R) E5640 2.66 GHz × 2 |
| RAM | 4 GB × 6 |
| Operating System | Red Hat Enterprise Linux 6.1 |
Fig 1. AUC of the logistic regression algorithm on the test set for different positive-to-negative sample ratios.
Accuracy of the logistic regression algorithm on the test set for different positive-to-negative sample ratios.
| Positive and Negative Sample Ratio | 1:3 | 1:5 | 1:10 | 1:15 |
|---|---|---|---|---|
| Accuracy | 0.9355 | 0.9384 | 0.9405 | 0.9421 |
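The record does not say how the different positive-to-negative ratios were produced. One common approach, shown here purely as an assumption, is to keep every positive sample and randomly under-sample the negatives. The function name `downsample_negatives`, its parameters, and the toy labels are hypothetical and not taken from the paper.

```python
import random
from typing import List, Sequence


def downsample_negatives(labels: Sequence[int], neg_per_pos: int, seed: int = 0) -> List[int]:
    """Return sorted indices that keep all positives (label 1) and at most
    neg_per_pos randomly chosen negatives (label 0) per positive."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than are available.
    n_keep = min(len(neg_idx), neg_per_pos * len(pos_idx))
    kept_neg = rng.sample(neg_idx, n_keep)
    return sorted(pos_idx + kept_neg)


# Toy labels, invented for illustration: 4 positives and 100 negatives.
toy_labels = [1] * 4 + [0] * 100
for k in (3, 5, 10, 15):
    idx = downsample_negatives(toy_labels, k)
    n_pos = sum(toy_labels[i] for i in idx)
    print(f"ratio 1:{k} -> {n_pos} positives, {len(idx) - n_pos} negatives")
```

A classifier would then be trained on the selected indices and scored on an untouched test set, so that the ratios are compared on the same evaluation data.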
Manual weighting results.
| XGBoost Weight | Logistic Regression Weight | Fusion Model AUC Value |
|---|---|---|
| 0.962 | 0 | 0.70231 |
| 0.921 | 0.03 | 0.69932 |
| 0.905 | 0 | 0.70015 |
| 0.866 | 0 | 0.71136 |
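The record does not give the fusion formula behind the manual weights. A minimal sketch, assuming the fused score is a weighted sum of the two models' predicted probabilities and that it is evaluated by AUC, might look like the following. The function names, the weights, and the toy scores are all hypothetical and are not results from the paper.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fuse(xgb_scores: Sequence[float], lr_scores: Sequence[float],
         w_xgb: float, w_lr: float) -> List[float]:
    """Weighted sum of two models' scores (assumed fusion rule)."""
    if len(xgb_scores) != len(lr_scores):
        raise ValueError("score lists must have the same length")
    return [w_xgb * a + w_lr * b for a, b in zip(xgb_scores, lr_scores)]


# Toy labels and scores, invented for illustration only.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_xgb = [0.9, 0.6, 0.4, 0.3, 0.8, 0.5]
toy_lr = [0.6, 0.2, 0.7, 0.65, 0.3, 0.1]

print("XGBoost AUC:", auc(toy_labels, toy_xgb))
print("Logistic regression AUC:", auc(toy_labels, toy_lr))
print("Fused AUC (0.5 / 0.5):", auc(toy_labels, fuse(toy_xgb, toy_lr, 0.5, 0.5)))
```

In this toy case the two models make errors on different samples, so the fused score ranks the samples better than either model alone. That outcome depends on the data and the weights and should not be read as a general guarantee.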
Fig 2. Single model and fusion model results.
AUC results of the fusion model constructed with the linear model.
| Combined Single Model | Fusion Model AUC Results |
|---|---|
| XGBoost, Logistic Regression | 0.8102 |
| XGBoost | 0.7568 |
| Logistic Regression | 0.7721 |
Fig 3. Performance comparison of sales forecasting models in stable volatility mode.
Fig 4. Comparison of prediction performance at different aggregation levels.
(a) High-frequency customer set. (b) Low-frequency customer set.
Fig 5. Performance comparison of different models before and after feature selection.
Fig 6. Comparison of the F1 oscillation curves of different models.