
SE-stacking: Improving user purchase behavior prediction by information fusion and ensemble learning.

Jing Xu¹, Jie Wang², Ye Tian³, Jiangpeng Yan¹, Xiu Li¹, Xin Gao⁴.

Abstract

Online shopping behavior has the characteristics of rich granularity dimension and data sparsity and presents a challenging task in e-commerce. Previous studies on user behavior prediction did not seriously discuss feature selection and ensemble design, which are important to improving the performance of machine learning algorithms. In this paper, we proposed an SE-stacking model based on information fusion and ensemble learning for user purchase behavior prediction. After successfully using the ensemble feature selection method to screen purchase-related factors, we used the stacking algorithm for user purchase behavior prediction. In our efforts to avoid the deviation of the prediction results, we optimized the model by selecting ten different types of models as base learners and modifying the relevant parameters specifically for them. Experiments conducted on a publicly available dataset show that the SE-stacking model can achieve a 98.40% F1 score, approximately 0.09% higher than the optimal base models. The SE-stacking model not only has a good application in the prediction of user purchase behavior but also has practical value when combined with the actual e-commerce scene. At the same time, this model has important significance in academic research and the development of this field.


Year: 2020; PMID: 33237926; PMCID: PMC7688168; DOI: 10.1371/journal.pone.0242629

Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240

1. Introduction

With the rapid development and popularization of internet technology in recent decades, an increasing number of people have begun to rely on the internet and intelligent devices for daily shopping. It is reported that in 2018, the scale of e-commerce transactions in 28 major countries and regions in the world reached USD 24,716.726 billion, and the total online retail transaction volume was USD 297.46 billion [1]. Specifically, e-commerce transaction volume in the United States reached USD 9.776 billion, representing a growth rate of 10.1%; that of China reached USD 4731.1 billion, a growth rate of 11.6%; and that of Japan reached USD 3.240 billion, a growth rate of 8.9%. A network survey [2] shows that when shopping, more than 70% of users first consider the quality of the goods and the service quality of the store. If an enterprise wants to improve the overall service level of its platform, the first and most important task is to fully understand user preferences and clarify user behavior. Therefore, the most pressing problem for enterprises is how to use technical means to analyze user behavior data effectively. Currently, two main research directions exist for predicting user purchasing behavior on e-commerce platforms. One direction is prediction based on a recommendation system. This line of research mines data on users and their purchase behaviors, predicts the commodities of interest that a user might purchase in the future, and recommends commodities of this type when the user logs in. The other direction is machine learning, which uses a large sample of user data from e-commerce platforms to train user purchase prediction models. In the research on purchasing behavior prediction based on a recommendation system, most of the prediction is based on the relationship between the users and products. 
Even if user behavior is discussed, it covers only a single type of operation, and the overall operation behavior is not discussed. Moreover, this approach can only infer the products that the user may buy and does not ensure that the user will buy in the future. In the study of purchasing behavior prediction based on machine learning, early researchers used traditional machine learning models and recent researchers used ensemble models, but the base models are weak, few model types are included, and the training speed is poor. Moreover, feature engineering is an important component of data mining, and good features can often achieve twice the result with half the effort [3]. With the continuous development of information technology, the storage and computing capacity for big data has greatly improved, and the consumption information of users has been recorded. Through scientific analysis, businesses can discern the purchasing tendency and consumption intention of users. At the same time, many competition platforms are cooperating with governments and enterprises to hold big data competitions that provide desensitized data to scholars and are committed to solving complex big-data problems through the strength of outstanding data scientists. In this paper, we propose a predictive model based on information fusion and ensemble learning to realize effective data analysis of user purchase behavior and verify it on real data sets. Specifically, to improve the effective feature dimension and to consider the overall operation behavior of users, we constructed 82 features related to the prediction target based on the original data. Using the ensemble feature selection method based on sort aggregation (SA-EFS) that we proposed previously, we extracted the 15 features most helpful for predicting purchase behavior to improve the accuracy of prediction. 
Finally, we established a prediction model under the stacking integration framework to integrate the advantages of 10 different types of models for improved prediction effect. The result shows that our SE-stacking algorithm is effective. The remainder of the paper is organized as follows: Section 2 introduces the problems of the traditional recommendation algorithms and the development of buying behavior prediction based on machine learning, Section 3 introduces the proposed model, Section 4 validates the effect of the model on real data sets and analyzes the results, and Section 5 summarizes the full text and looks forward to the future.

2. Related work

In prediction of purchasing behavior based on a recommendation system, the common basic algorithms are content-based recommendation, collaborative filtering, and hybrid recommendation algorithms. Collaborative filtering recommends items that users might be interested in through similar nearest neighbor rating data [4], but it has the disadvantages of a sparse score, inaccurate prediction of new users and new products, and poor scalability of the algorithm [5, 6]. At the same time, relevant personnel found that relying on buyer evaluation of the project can only obtain the prediction result and cannot accurately determine the buyer purchase tendency [7]. Because the purchasing service method is used to explore the buyer characteristics to analyze and compare with the characteristics of the goods, it introduces the products with the highest degree of similarity to the buyers, but a cold start occurs for new buyers. Therefore, it is difficult to distinguish when two different product feature words are the same, only products similar to the products purchased by the buyer can be introduced, and even the recommendation diversity is insufficient [8]. The hybrid recommendation algorithm does not easily define the weight of each recommendation algorithm and the recommendation results. At the same time, the problem of a complex recommendation framework appears [9, 10]. In recent years, the advent of big data era has made it possible to store massive amounts of data. Analysts constantly study the purchase behavior of buyers on selected shopping websites (browsing, clicking, collecting, adding to a shopping cart, paying, and evaluating) to make inferences, analyze the online records of buyers, and predict their purchase behavior. Most of the traditional machine learning algorithms are based on a single tree model. Wang Ying Shuang et al. 
[11] established a prediction model of user purchase behavior based on user information and user purchase behavior data by combining decision trees and association rules. However, the decision tree produced is complex, tall, and narrow, which makes it difficult to interpret. Du Gang et al. [12] introduced the concept of an attribute core and established an improved decision tree model based on the Teradata platform to predict the purchase behavior of users, an approach that solved the defects of the decision tree model constructed by the original ID3 algorithm. Zhang Pengyi et al. [13] established a mapping between log request parameters and user information behavior types and obtained user behavior analysis. After further analysis of the user behavior characteristics, the researchers used logistic binary regression and the C&R decision tree to establish a product payment purchase speculation model and concluded that the prediction accuracy of the C&R decision tree was slightly higher than that of logistic binary regression, but the prediction accuracy rate was only 84.27%. With the development of ensemble learning, researchers have attempted to use ensemble learning to predict purchase behavior. Martínez et al. [14] used the gradient tree boosting algorithm to predict whether users will make a purchase in the near future, using the information of more than 10000 customers and the data of 200000 purchases. Yang Lihong et al. [15] used the unique characteristics of buyers and the characteristics of commodities, as well as the interaction between buyers and commodities, to elaborate on the construction method of quadratic combination statistical characteristics based on the original feature group and also used the XGBoost model to complete the prediction. Ge et al. 
[16] established all-buyers purchase models by constructing user purchase feature engineering and used a deep forest-based user purchase behavior prediction model to achieve an efficient purchasing behavior prediction training effect. Based on the ensemble learning method, Hu et al. [17] also proposed an online purchasing behavior prediction model based on deep forest. However, the above four methods do not integrate different types of models, and the base models are all decision trees. Zhu Xin et al. [18] constructed a purchase prediction model based on the shopping behavior data from the Alibaba e-commerce platform. That model used support vector machine and logistic regression as well as a fusion method of the two. Kong et al. [19] proposed a fusion model based on logistic regression and GBDT to predict the risk of users buying goods. Zhou et al. [20] proposed a multimodel stacked ensemble (MMSE) algorithm to solve the problem of personalized product recommendation. In the stacking framework, RandomForest, AdaBoost, GBDT and XGBoost were selected as base classifiers, and the XGBoost algorithm was selected as the combiner classifier. Although the above three methods integrate different types of models, the base learners are weak and few in number and therefore cannot satisfactorily integrate the advantages of different models. Therefore, based on information fusion and ensemble learning, this paper proposes a prediction model for user purchase behavior. Because the stacking ensemble method can integrate different types of models, this paper selects the ensemble scheme under the stacking framework after feature engineering of user personal information and a series of operational behavior data. Different types of models, such as probability models, linear models, and ensemble models, are selected as the base learners, and their types vary. 
Most of these models are based on a tree structure, and the parameters are much fewer than in deep learning, which eases the parameter adjustment, increases the training speed of the model and improves the accuracy.

3. Methods

In this paper, we establish a prediction model for user purchase behavior by analyzing and preprocessing the existing raw data and constructing features related to user purchase behavior. Using the optimal features obtained by SA-EFS ensemble feature selection, a prediction model is established under the stacking integration framework. First, the optimized base learners are trained by 5-fold cross-validation on the training set, and a new prediction data set is established from their predicted values. Then, the fusion model is obtained by training a meta-learner on this data set. To compare the prediction effects of stacking with those of bagging and boosting, the representative bagging algorithms, namely, RandomForest and ExtraTrees, and the representative boosting algorithms, namely, Catboost, XGBoost, AdaBoost, and LightGBM, are selected as base learners. The other four base learners are the K-nearest neighbor algorithm, logistic regression, the linear support vector machine, and the Gaussian naive Bayes algorithm. Together, these steps form the SE-stacking model of information fusion and ensemble learning, as shown in Fig 1 below:

Fig 1

Overall framework.

The research can be transformed into a binary classification problem in machine learning by judging whether the user purchases goods or not. The classification targets are 0 and 1, where the number 1 means user purchases, and 0 means no purchases. We input the original data set into the SE-stacking model, train the model to obtain the trained ensemble classifier, and use this classifier to predict the classification result. The symbol definition is shown in Table 1.

Table 1

Symbol definition.

Symbol	Definition
D	Training data set
CS	Base classifier set
FS	Feature subset
W	Feature weight sequence
y	Prediction target
H	Ensemble classifier
P	Importance operators

3.1 Ensemble feature selection

The ensemble feature selection based on ranking aggregation is referred to as SA-EFS. First, different feature selection methods are used to obtain candidate sets of multiple optimal feature subsets. Second, according to the rule of arithmetic mean aggregation, the learning results of the candidate sets are aggregated, so that feature selection is based on the information fusion method [21]. The SA-EFS method is described as follows:

(1) Given an algorithm set $A = \{A_1, \dots, A_m\}$ and a data set $D$, define the feature set $F = \{f_1, \dots, f_n\}$.
(2) Define an importance operator $P$; for each $A_i \in A$, calculate the importance of every feature, such that $P(f_j) \in \mathbb{R}^+$.
(3) For each $A_i$, sort the features in decreasing order of importance to obtain a new ordered sequence.
(4) For each $A_i$, normalize the position $j$ of each feature in the ordered sequence into a weight $w_j^i$; Table 2 uses $w_j^i = (n - j)/n$.
(5) For $\forall i \in [1, m]$, $\forall j \in [1, n]$, aggregate the weights of feature $f_j$ by the arithmetic mean, $w_j = \frac{1}{m} \sum_{i=1}^{m} w_j^i$.
(6) Set $t$ as the threshold hyperparameter and, for $\forall j \in [1, n]$, select features by applying $t$ to the aggregated weight sequence.

In this paper, the three best-performing methods, namely the maximum information coefficient (MIC), LightGBM, and XGBoost, participate in feature selection; the overall framework is shown in Fig 2. First, user behavior features are input, and feature selection is performed by the three algorithms to obtain their respective feature sequences and feature weight sequences. Finally, the SA-EFS ensemble method is used to aggregate the multiple feature selection results and obtain the optimal feature subset.
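The rank-aggregation idea can be illustrated with a minimal Python sketch. It assumes the rank weight $w = (n - j)/n$ from Table 2 with a 1-based rank $j$, which is an interpretation rather than a confirmed detail of SA-EFS; the feature names and importance scores are toy values invented for illustration, not data from the paper.

```python
def rank_weights(scores):
    """Turn importance scores into rank weights w = (n - j) / n.

    Assumption: j is the 1-based rank, so the best feature gets (n - 1) / n.
    """
    n = len(scores)
    # Sort feature names by decreasing importance (ties keep input order).
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: (n - j) / n for j, name in enumerate(ordered, start=1)}


def aggregate(weight_dicts):
    """Arithmetic-mean aggregation of the weights from several selectors."""
    t = len(weight_dicts)
    return {name: sum(w[name] for w in weight_dicts) / t
            for name in weight_dicts[0]}


def top_k(aggregated, k):
    """Names of the k features with the largest aggregated weight."""
    return sorted(aggregated, key=aggregated.get, reverse=True)[:k]


# Toy importance scores from three hypothetical selectors (not paper data).
mic_scores = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.3}
lgb_scores = {"a": 120, "b": 300, "c": 10, "d": 50}
xgb_scores = {"a": 0.40, "b": 0.35, "c": 0.05, "d": 0.20}

agg = aggregate([rank_weights(s) for s in (mic_scores, lgb_scores, xgb_scores)])
best = top_k(agg, 2)
print(best)
```

Because only ranks enter the aggregation, selectors whose raw scores live on very different scales (for example split counts versus information coefficients) contribute equally.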

Fig 2

Framework of SA-EFS ensemble feature selection.

3.2 Principle of stacking

Stacking is an ensemble learning scheme. Wolpert [22] introduced the learning framework of stacked generalization in 1992. The base-level models are trained on the complete training set, and the meta-model is trained on the outputs of the base-level models. The principle of the stacking algorithm is shown in Fig 3.

Fig 3

Principle of stacking algorithm.

The outputs of the base learning algorithms serve as the input of the meta-learning algorithm [23], so the meta-learning algorithm can make full use of the low-level learning ability in the high-level induction process and correct the classification bias of the base learning algorithms. We rely on a meta-learning algorithm to determine how to combine the outputs of the base learning algorithms more effectively. Stacking ensures the diversity of base learners through the differences among learning algorithms. At the same time, the meta-learner is used to summarize the prediction results of the different base learners. Compared with bagging and boosting, in which all base learners are generally the same type of model, stacking usually predicts more accurately, and the risk of overfitting is low [24]. Therefore, this paper chooses to build a model based on the stacking ensemble learning method.

3.3 SE-stacking algorithm

If there are m training samples in the training data set D, each sample contains n features, X = {x_1, x_2, …, x_(n−1), y}, where the nth is the prediction target y. In this article, the set of feature selection algorithms is F = {LightGBM, MIC, XGBoost}, and 10 models form the prediction model set CS (classifier set), CS = {ExtraTrees, AdaBoost, logistic regression, Catboost, LightGBM, K-NN, XGBoost, LinearSVC, GaussianNB, RandomForest}. The pseudocode of the SE-stacking algorithm proposed in this paper is shown in Table 2:

Table 2

SE-stacking algorithm.

Algorithm SE-Stacking
Input: Training data set D = {x_i, y_n}, i = 1, …, n−1; k; base classifier set CS; feature selection set F = {FS_1, FS_2, …, FS_t}
Output: Ensemble classifier H
Step 1: Filter feature selection
    for algorithm A_i in F
        perform feature selection on data set D with algorithm A_i
        sort the features by the result of A_i
        return the sorted feature subset FS_i
Step 2: Get the weight sequence of feature selection method FS_i
    for FS_i in {FS_1, FS_2, …, FS_t}
        for f_j in FS_i: w_j^i = (n − j)/n
    return the weight sequences {W_1, W_2, …, W_t} of {FS_1, FS_2, …, FS_t}
Step 3: Arithmetic mean aggregation
    for f_i in original feature set FS
        w_i = (w_i^1 + w_i^2 + ⋯ + w_i^t)/t
    return the aggregated feature weight sequence W
Step 4: Select the first k optimal features
    sort FS according to W
    get the first k features from FS -> FS_best
    return FS_best
Step 5: Learn base-level classifiers
    for classifier C_i in CS do
        using cross-validation, learn new training data set a_i and test data set b_i based on D
    end for
Step 6: Build a prediction data set D_h
    cs_len = len(CS)
    for i = 1 to cs_len do
        construct new data set D_h as the union of Trainingset_h = {a_1, …, a_i, y_n} and Testset_h = {b_1, …, b_i, y_n}, i = 1, …, cs_len
    end for
Step 7: Learn the meta-learner
    learn H based on D_h
    return H
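
Steps 5 to 7 of Table 2 can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the base learner `MeanThresholdStump`, the helper names, and the toy data are all hypothetical, and a stump is swapped in for the logistic regression meta-learner to keep the example dependency-free.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class MeanThresholdStump:
    """Toy learner: thresholds one feature at the midpoint of the class means."""

    def __init__(self, feature):
        self.feature = feature
        self.threshold = 0.0
        self.sign = 1

    def fit(self, X, y):
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.threshold = (mean_pos + mean_neg) / 2
        self.sign = 1 if mean_pos >= mean_neg else -1
        return self

    def predict(self, X):
        return [1 if self.sign * (row[self.feature] - self.threshold) > 0 else 0
                for row in X]


def out_of_fold_features(factories, X, y, k=5):
    """Build the meta data set: one column of held-out predictions per learner."""
    n = len(X)
    meta = [[0] * len(factories) for _ in range(n)]
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train_idx = [i for i in range(n) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for j, make_learner in enumerate(factories):
            model = make_learner().fit(X_train, y_train)
            for i, pred in zip(fold, model.predict([X[i] for i in fold])):
                meta[i][j] = pred
    return meta


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 1.0], [2.1, 0.2], [0.3, 0.8], [1.9, 0.9], [0.2, 0.1],
     [2.2, 0.3], [0.0, 0.7], [1.8, 0.6], [0.4, 0.4], [2.0, 0.5]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

factories = [lambda: MeanThresholdStump(0), lambda: MeanThresholdStump(1)]
meta = out_of_fold_features(factories, X, y, k=5)

# Stand-in meta-learner trained on the out-of-fold predictions.
meta_model = MeanThresholdStump(0).fit(meta, y)
final = meta_model.predict(meta)
```

Each row of `meta` is predicted by models that never saw that row during fitting, which is what lets the meta-learner judge the base learners on held-out behavior rather than on memorized training labels.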

4. Experiments

4.1 Data sources and preprocessing

The experimental data in this paper are derived from the forecasting data set of the HI GUIDES tourism service provided by the DataCastle competition platform. The original data set contains the personal information of 50383 users of the HI GUIDES platform from September 2016 to September 2017, as well as all browsing records, corresponding order records, and comments on historical orders. There are five tables in total: user profile, action, orderHistory, order future, and user comments. The purpose of data preprocessing is to clean the missing data, duplicate data, and irrelevant data in the original data. Additionally, the missing value can be used as a feature of users, and thus the missing value is filled in as "other", mainly for sex and age. The 15 variable names in the original database are coded with labels, the codes are changed into continuous numerical variables according to Label Encoder, and the discontinuous texts are encoded.
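
The two preprocessing steps described above, filling missing values with "other" and label-encoding categorical fields, can be sketched as follows. The column values are invented for illustration, and the helper names are hypothetical; the encoding mirrors the usual label-encoder convention of assigning integer codes to the sorted distinct values.

```python
def fill_missing(values, placeholder="other"):
    """Replace missing entries (None or empty string) with a placeholder category."""
    return [placeholder if v is None or v == "" else v for v in values]


def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    classes = sorted(set(values))
    mapping = {category: code for code, category in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical gender column with missing entries (not taken from the data set).
gender = ["male", None, "female", "", "male"]

filled = fill_missing(gender)
codes, mapping = label_encode(filled)
print(filled)
print(codes)
```

Treating "missing" as its own category keeps the information that a user declined to fill in a field, which the paper uses as a feature in its own right.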

4.2 Feature structure

The fields in the original data can be input into the algorithm as the basic features. However, according to the literature and practical experience, many features still do not exist in the original data and are related to the user purchase behavior, such as the average, median, maximum, minimum, variance, and the number of user historical occurrences for each operation. Therefore, based on the original data, this study constructs 82 features related to the prediction target. In this paper, five tables are associated with user ID. Because the time data are stored in the form of a timestamp, the timestamp is transformed into the format of year, month, day, hour, minute, and second, and the characteristics based on the time dimension are constructed accordingly. Because operations 5–9 are sequential, from filling in the form to submitting the order to the final payment, the first-order difference between all time and the next time can be calculated to construct the statistical dimension characteristics of the five operations with time as the statistical dimension. First, the users are sorted according to the operation type and time, and the first-order difference is discerned in the time dimension. Finally, the statistical characteristics of these times of each operation are calculated, including the average, median, maximum, minimum, and variance. The average shows the average interval of the user operation time, the median shows the median value of the operation interval, the maximum and minimum values are the maximum and minimum time of the operation interval, and the variance shows the amplitude of the operation. By constructing these features, the purchase intention of users is depicted. For example, operation 5 is constructed according to the five features in Table 3, and operations 6–9 are the same.

Table 3

Construction features of the next time first order difference in operation 5.

Structural Features	Meaning
action_user_onlytype_mean_5	The mean of the next time first-order difference of user operation 5
action_user_onlytype_median_5	The median of the next time first-order difference of user operation 5
action_user_onlytype_max_5	The max of the next time first-order difference of user operation 5
action_user_onlytype_min_5	The min of the next time first-order difference of user operation 5
action_user_onlytype_std_5	The standard deviation of the next time first order difference of user operation 5
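
A minimal sketch of how the five features in Table 3 might be computed for one user is given below. It assumes that the first-order difference is the gap between consecutive timestamps of the same operation type and that the standard deviation is the population form; neither detail is stated in the text, and the timestamps are invented.

```python
from statistics import mean, median, pstdev


def interval_features(timestamps, op, prefix="action_user_onlytype"):
    """Statistics of the gaps between consecutive timestamps of one operation."""
    ordered = sorted(timestamps)
    # First-order difference: each time subtracted from the next one.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return {}  # fewer than two events: no interval can be formed
    stats = {
        "mean": mean(gaps),
        "median": median(gaps),
        "max": max(gaps),
        "min": min(gaps),
        "std": pstdev(gaps),  # assumption: population standard deviation
    }
    return {f"{prefix}_{name}_{op}": value for name, value in stats.items()}


# Hypothetical Unix timestamps (seconds) of one user's operation 5 events.
op5_times = [100, 160, 400, 130]
features = interval_features(op5_times, op=5)
print(features)
```

Users with fewer than two events of a given operation yield no intervals, so a real pipeline would need a rule for those rows, for example leaving the features missing.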

Next, we calculate the first-order difference of all the time for the previous time, calculate the statistical characteristics of these five operations, and construct the five groups of features in Table 4 as follows. Operations 6–9 are the same.

Table 4

Construction features of the first order difference in operation 5.

Structural Features	Meaning
gap_action_user_onlytype_mean_5	The mean of the last time first-order difference of user operation 5
gap_action_user_onlytype_median_5	The median of the last time first-order difference of user operation 5
gap_action_user_onlytype_max_5	The max of the last time first-order difference of user operation 5
gap_action_user_onlytype_min_5	The min of the last time first-order difference of user operation 5
gap_action_user_onlytype_std_5	The standard deviation of the last time first-order difference of user operation 5

According to experience, the conversion rate of a user's operation behavior helps predict purchases more accurately. The time information of the user operations can show whether the person intends to purchase in the near future, and different operations reflect different purposes. Only from filling in the form to final payment can the purchase be completed. Therefore, this paper constructs six groups of characteristics: the conversion rate of the user operation behavior, the time from the last operation of the user to the present, the farthest and the latest time from filling in the form to the final payment, the latest request time and the farthest request time, as shown in Table 5.

Table 5

Characteristics of user operation construction.

Structural Features	Meaning
cvr5	User operation 5 behavior conversion rate (6–9 the same)
action_last_1	Time from user’s latest operation 1 to present (2–9 the same)
actionType_max_5	Latest time of operation 5 (6–9 the same)
actionType_min_5	Farthest time of operation 5 (6–9 the same)
orderTime_userid_max	Latest request time
orderTime_userid_min	Farthest request time

This paper also constructs other features to mine the purchase intention of users. For example, the minimum score and the number of user evaluations can be used to gauge satisfaction with the product. The number of places browsed can be used to determine whether the user has considered choosing a boutique tour product. Whether the user is a new user, the number of historical occurrences, and the total number of operations indicate whether the user has experienced, understood, and repurchased the product. The structural characteristics are shown in Table 6.

Table 6

Other characteristic structures.

Structural Features	Meaning
history_count	Number of user history table occurrences
orderType_userid_sum	Total operation times
Japan_count	Visits to Japan (same for other countries)
rating_userid_min	Minimum score of evaluation
rating_userid_count	Number of evaluations
not_in_history_userid_sum	Whether it is a new user, not in the history table

4.3 Feature selection

4.3.1 Ensemble feature selection

After the features are constructed, the SA-EFS method is used to obtain the ranking results of the importance of 97 features, and the top 15 features are obtained to form the optimal feature subset. The results of the feature selection are shown in Table 7.

Table 7

Result of feature selection.

NO.	Feature	importance	Feature Meaning
1	action_last_1	1.293146216	Time from user’s latest operation 1 to present
2	action_last_5	1.243441452	The time from the user’s last operation 5 to now
3	province	1.048388111	Province
4	gap_action_user_onlytype_max_5	1.033069104	Maximum value of the last time first-order difference of user operation 5
5	action_user_onlytype_max_6	0.930298291	Maximum value of the next time first-order difference in user operation 6
6	action_user_onlytype_max_5	0.915895998	Maximum value of the next time first-order difference of user operation 5
7	action_user_onlytype_min_6	0.841260212	Minimum value of the next time first-order difference of user operation 6
8	cvr9	0.832753647	User action 9 conversion rate
9	action_user_onlytype_median_5	0.829253058	Median of the next time first-order difference of user operation 5
10	gap_action_user_onlytype_min_6	0.797105161	Minimum value of the last time first-order difference of user operation 6
11	action_user_onlytype_std_5	0.768093864	Standard deviation of the next time first-order difference of user operation 5
12	cvr8	0.747993565	User action 8 conversion rate
13	actionType_max_7	0.687867921	Last time of operation 7
14	action_user_onlytype_min_5	0.684920057	Minimum value of the next time first-order difference of user operation 5
15	Japan_count	0.628384876	Visits to Japan

4.3.2 Feature correlation test

In this paper, the Pearson correlation coefficient is used to calculate the correlation between features and to construct a correlation matrix that tests the degree of correlation between the selected features. The Pearson correlation coefficient (Cc) is a commonly used measure of feature correlation. Given a pair of variables (X, Y), the Pearson correlation coefficient r(X, Y) is defined as

$$r(X, Y) = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2} \, \sqrt{\sum_{k=1}^{m} (y_k - \bar{y})^2}}$$

where $\bar{x}$ is the mean value of the variable X, $\bar{y}$ is the mean value of the variable Y, and r ∈ [−1, 1]. If X and Y are independent of each other, r = 0. Assuming that m is the sample size in the sample data set D and each sample contains n features (the nth is the prediction target), the Pearson correlation coefficient between every two features is calculated to form the correlation matrix R, whose entry $R_{ij}$ is the Pearson correlation coefficient between features i and j:

$$R_{ij} = r(X_i, X_j)$$

The characteristic correlation heat map drawn by calculation is shown in Fig 4 below:

Fig 4

Characteristic correlation heat map.

As observed from Fig 4, the correlation between the selected 15 feature vectors is weak, the lowest correlation coefficient between cvr8 and action_user_onlytype_min_6 is 0.00079, and the highest correlation coefficient between action_user_onlytype_std_5 and action_user_onlytype_max_5 is 0.69, and thus the selected features are not redundant.
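
The correlation matrix behind a heat map such as Fig 4 can be computed with the standard Pearson formula. The sketch below uses only the Python standard library; the column names and values are toy data chosen so that the expected coefficients are easy to check by hand.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict R[a][b] holding the Pearson coefficient of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy feature columns (invented for illustration).
columns = {
    "f1": [1, 2, 3, 4],
    "f2": [2, 4, 6, 8],    # perfectly correlated with f1
    "f3": [4, 3, 2, 1],    # perfectly anti-correlated with f1
    "f4": [1, -1, -1, 1],  # uncorrelated with f1
}
R = correlation_matrix(columns)
```

A pair with a coefficient close to ±1 carries nearly the same linear information, so one of the two features could be dropped; the 0.69 maximum reported above is why the selected features are judged not redundant.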

4.4 Model training and parameter optimization

4.4.1 Model training

In this study, we use Anaconda3 (64-bit), a Python distribution for scientific computing, as the experimental platform. The machine learning functions in the scikit-learn package are used for model training, which reduces the difficulty of the experiment. The experimental environment consists of a Core i7-10510U processor (4.9 GHz), 8 GB of memory, and the Windows 10 system. The training steps of the prediction model are given as follows:

(1) The training set is divided into five parts, one of which is used as the validation set while the other four are used as the training set. Five-fold cross-validation and training of the 10 base models are carried out, and prediction is performed on the test set, so that each base model yields five sets of predicted values on the training set and one on the test set;
(2) For each base model, the five sets of predicted values obtained from the training set are stacked vertically into one column, and the columns of the 10 base models are merged into 10 "features" to construct a new prediction data set. The logistic regression model is trained on this data set, and the fusion model is established;
(3) The model trained in (2) is applied to the 10 "features" constructed from the test-set predictions of the 10 base models to obtain the final prediction category.

4.4.2 Parameter optimization

Optimizing the parameters can accelerate convergence and can even yield a smaller loss function value. Therefore, in this experiment, the parameters of the 10 base learners are adjusted and optimized to seek the optimal values and achieve a better fusion effect. Due to space limitations, only the parameter adjustment of RandomForest is introduced. Many parameters must be set in the RandomForest model, and the main parameters are n_estimators (number of subtrees), max_depth (maximum depth), and min_samples_split (minimum number of samples required to split a node). Appropriate parameter settings can significantly enhance the prediction accuracy of the model. In this experiment, the parameters of the model are adjusted and optimized using the grid search method. Fig 5 is a line chart of n_estimators values against the predicted F1 scores. The figure shows that, at a depth of 12, the F1 score reaches its maximum when n_estimators reaches approximately 300, and thus we set n_estimators to 300.

Fig 5

Parameter adjustment of n_estimators.

For parameter optimization, because certain parameters interact, it is necessary to adjust them jointly. In this paper, n_estimators is set to 400, and a joint grid search is carried out over the maximum depth of the RandomForest (max_depth) and the minimum number of samples required to split a node (min_samples_split). The experiment produces the results shown in Fig 6. From the figure, we find that, for different tree depths, the F1 score follows a similar trend as min_samples_split increases. The F1 score reaches its maximum when the depth is 12 and min_samples_split is 4, and thus these values are set as the parameters of the model. After adjusting other parameters, we did not find that the performance of the model was significantly improved, and therefore the other parameters of the model were left at their default values.

Fig 6

Joint parameter adjustment of max_depth and min_samples_split.
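
A joint grid search of this kind can be sketched in a few lines of Python. The scoring function `toy_score` is a synthetic stand-in, deliberately shaped to peak at the values reported above; it is not the paper's cross-validated F1 score, and in practice it would be replaced by training and evaluating a RandomForest for each parameter combination. The candidate grid is also hypothetical.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Evaluate every parameter combination and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(max_depth, min_samples_split):
    """Synthetic stand-in for a cross-validated F1 score (not real results)."""
    return -((max_depth - 12) ** 2) - (min_samples_split - 4) ** 2


# Hypothetical candidate values for the two jointly tuned parameters.
grid = {"max_depth": [8, 10, 12, 14], "min_samples_split": [2, 4, 6, 8]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params)
```

Searching the two parameters together costs the product of the grid sizes (16 evaluations here), which is why only interacting parameters are tuned jointly while the others are tuned one at a time or left at their defaults.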

4.5 Model evaluation and comparison

4.5.1 Evaluation indicator

In the binary classification problem considered in this paper, the model output y = 1 represents purchase and y = 0 indicates no purchase. The result can be divided into four categories: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). A true positive is a positive sample that the model infers correctly, a true negative is a negative sample that the model infers correctly, and false positives and false negatives are samples that the model wrongly infers as positive and negative, respectively. The samples used in this paper are positive samples and negative samples. The results of the model can be shown clearly by the confusion matrix, as shown in Table 8.

Table 8

Confusion matrix structure.

Real Category \ Forecast Category	Positive (Purchase)	Negative (No Purchase)
Positive (Purchase)	True Positive (TP)	False Negative (FN)
Negative (No Purchase)	False Positive (FP)	True Negative (TN)

Based on the above four concepts, three KPIs are derived from the confusion matrix: precision, recall, and F1 score. They are calculated as follows.

Precision (Pre): $Pre = \frac{TP}{TP + FP}$. Precision reflects the proportion of samples predicted as positive that are actually positive.

Recall (Rec): $Rec = \frac{TP}{TP + FN}$. Recall relates to the minority (positive) class and represents the proportion of minority-class samples that are classified correctly.

F1 score: $F1 = \frac{2 \times Pre \times Rec}{Pre + Rec}$. The F1 score is often used as the final evaluation measure for classification problems. It is the harmonic mean of precision and recall; its maximum is 1, and its minimum is 0.
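As a small worked example, the three indicators can be computed directly from confusion-matrix counts using their standard definitions. The function name and the counts below are hypothetical and chosen only for illustration; they are not results from the experiment.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 80 true positives, 20 false positives, 10 false negatives.
pre, rec, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={pre:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Note that true negatives do not enter any of the three formulas, which is why these indicators remain informative when non-purchases greatly outnumber purchases.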

4.5.2 Analysis of empirical results

The F1 score and training time of the fusion model are 98.40% and 2.9730 s, respectively. Table 9 compares these results with those of the 10 base models. Apart from the Gaussian Bayesian model, whose F1 score does not exceed 90%, the fused model trains the fastest; it is 334.98 s faster than the best single model (CatBoost).

Table 9

Model comparison results.

NO.	Model	F1 score	Training time (s)
1	Stacking	0.9840	2.97300
2	CatBoost	0.9831	337.953
3	XGBoost	0.9820	930.681
4	RandomForest	0.9814	53.6011
5	LightGBM	0.9789	78.3099
6	Logistic regression	0.9663	21.1730
7	LinearSVC	0.9639	40.4603
8	ExtraTrees	0.9619	15.0000
9	AdaBoost	0.9527	22.7302
10	K-NN	0.9122	17.7015
11	Gaussian Bayesian	0.8969	0.32770

As observed from Table 9, the F1 score of the fusion model is higher than that of each of the 10 base models, indicating that the stacking ensemble improves the accuracy of user purchase behavior prediction. Fig 7 compares the F1 scores of the stacking ensemble model and each base model. Stacking has a better prediction effect than the bagging and boosting ensemble methods. The F1 score of the stacking ensemble model is 0.26 percentage points higher than that of RandomForest, the best model among the bagging methods; 0.09 percentage points higher than that of CatBoost, the best among the boosting methods; and 1.77 percentage points higher than that of logistic regression, the best among the other types of learners.

Fig 7

Comparison of F1 scores of each model.
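The gaps quoted in this subsection follow from the Table 9 values by simple subtraction, as the short check below shows. The dictionary layout and helper name are illustrative; the numbers are copied from Table 9.

```python
# (F1 score, training time in seconds) copied from Table 9.
results = {
    "Stacking": (0.9840, 2.9730),
    "CatBoost": (0.9831, 337.953),
    "RandomForest": (0.9814, 53.6011),
    "Logistic regression": (0.9663, 21.1730),
}

stack_f1, stack_time = results["Stacking"]


def f1_gap_pp(model):
    """F1 advantage of stacking over `model`, in percentage points."""
    return round((stack_f1 - results[model][0]) * 100, 2)


gaps = {name: f1_gap_pp(name) for name in results if name != "Stacking"}
# Training time saved relative to CatBoost, the best single model by F1.
time_saved = round(results["CatBoost"][1] - stack_time, 2)
print(gaps, time_saved)
```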

The above experimental data show that the performance of the ensemble learning model after fusion is notably good. The use of the information fusion and ensemble learning SE-stacking algorithm achieves good results, which verifies the effectiveness of the proposed user purchase behavior prediction model.
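For readers who want to experiment with the general scheme, a stacking ensemble with a logistic regression meta-learner can be sketched as below. This is only a minimal illustration, assuming scikit-learn's StackingClassifier: it uses three base learners rather than ten, synthetic data, and default parameters, and it omits the feature selection step, so it does not reproduce the SE-stacking model or its reported scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the purchase data (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# A small, illustrative subset of base learner types.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("gnb", GaussianNB()),
]

# The meta-learner is fitted on out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=3,
)
stack.fit(X_train, y_train)

stack_f1 = f1_score(y_test, stack.predict(X_test))
print(round(stack_f1, 4))
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, helps keep it from simply trusting whichever base learner overfits most.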

5. Conclusion

The prediction model proposed in this paper predicts purchases from the user operation behavior data generated on an e-commerce platform. We conduct statistical analysis and preprocessing on the original data and construct features, establish the information fusion and ensemble learning SE-stacking model to select features and train the prediction model, and evaluate the fused stacking ensemble model against the comparison models to verify its effect. The main work and research results of this paper are summarized as follows:

The experimental data used in this paper are provided by the DataCastle competition platform, and the amount of data is nearly 1.37 million. To predict the purchase behavior of future users more comprehensively and accurately, we construct 82 features based on the original data, which better depict the purchase intention of users.

To avoid overfitting, improve accuracy, and shorten training time, this paper uses SA-EFS to select features and verifies their distribution and correlation to ensure that the training set is consistent with the test set and to prevent feature redundancy.

To establish a model for prediction of purchase behavior, this paper uses a stacking scheme. To compare the prediction effect of stacking with that of the bagging and boosting ensemble methods, this paper takes three representative algorithms of bagging and four representative algorithms of boosting as components of the base learners. In addition, four base learners of different categories are selected. The meta-learner adopts the stable logistic regression algorithm to obtain the final information fusion and ensemble learning SE-stacking model.

A comprehensive model evaluation index is used to evaluate the model. The F1 score of the fusion model constructed in this paper reaches 98.40%, and the training speed is fast.
Therefore, it can be concluded that the stacking ensemble learning model has a better prediction effect than the base models, and it is well suited to predictive analysis of the purchase behavior of e-commerce platform users. Combining the model with an actual e-commerce scenario has practical value: for example, it can reduce operating and marketing costs, optimize service quality, increase market share, optimize e-commerce warehousing, enable inventory intelligence, provide big data feedback reports, and promote continuous innovation of new brands. The approach can also be applied to other similar research.

Certain deficiencies exist in the research on this topic. Because the data used in this paper come from a single tourism boutique, the relationship between user behavior and different types of products cannot be explored. Therefore, in future research, we can enhance the information dimension of relevant products to correlate user behavior across different types of products and make better predictions of user purchase behavior.
