Abstract
The knowledge-based economy has drawn increasing attention recently, particularly in online shopping applications, where all transactions and consumer opinions are logged. Machine learning methods can be used to extract implicit knowledge from these logs, and industries and businesses use that knowledge to better understand consumer behavior and the corresponding opportunities and threats. The outbreak of the coronavirus (COVID-19) pandemic has had a great impact on many aspects of daily life, in particular on shopping behavior. Predicting online consumer behavior could be of valuable help to managers in government, the supply chain and the retail industry. Although online shopping existed before the coronavirus pandemic, the number of online purchases increased dramatically during it. Because COVID-19 spreads rapidly, people have had to observe personal and public health measures such as social distancing and staying at home, and these measures directly affect consumer behavior in online shopping. In this paper, a prediction model is proposed to anticipate consumer behavior using machine learning methods. Five individual classifiers, and their ensembles with Bagging and Boosting, are examined on a dataset collected from an online shopping site. The results indicate that the model constructed using decision tree ensembles with Bagging achieved the best prediction of consumer behavior, with an accuracy of 95.3%. In addition, correlation analysis is performed to determine the most important features influencing the volume of online purchases during the coronavirus pandemic. © Springer Science+Business Media, LLC, part of Springer Nature 2020.
Keywords: Bagging; Boosting; Consumer behavior; Coronavirus disease (COVID-19); E-commerce; Machine learning; Prediction model
Year: 2020 PMID: 33169049 PMCID: PMC7643087 DOI: 10.1007/s10614-020-10069-3
Source DB: PubMed Journal: Comput Econ ISSN: 0927-7099 Impact factor: 1.741
Fig. 1 Research process
A typical confusion matrix for a binary classification problem
| Actual values (as in the dataset) | Predicted: Positive | Predicted: Negative |
|---|---|---|
| Positive | TP (True Positive) | FN (False Negative) |
| Negative | FP (False Positive) | TN (True Negative) |
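The four cells of the confusion matrix are enough to derive the accuracy figure reported later in this record. The following is a minimal sketch, not the authors' code: the class name `ConfusionMatrix` and the helper `confusion_from_labels` are hypothetical names introduced here for illustration, and precision and recall are included only as the standard companion metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary problem, laid out as in the table above."""
    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        # Share of all records that were classified correctly.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Share of predicted positives that are truly positive.
        # Raises ZeroDivisionError if nothing was predicted positive.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of actual positives that were found.
        # Raises ZeroDivisionError if there are no actual positives.
        return self.tp / (self.tp + self.fn)


def confusion_from_labels(actual, predicted, positive=1) -> ConfusionMatrix:
    """Tally the four cells from paired actual/predicted labels."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionMatrix(tp=tp, fn=fn, fp=fp, tn=tn)
```

For example, `confusion_from_labels([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` yields two true positives and one of each remaining cell, so `accuracy()` is 3/5.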
Description of the dataset records
| No. | Feature | Description |
|---|---|---|
| 1 | Gender | Male or Female |
| 2 | Education level | Defined in five stages: No / Diploma / Bachelor / Master / PhD |
| 3 | Job | Categorized in seven classes: Housewife/Student/Employee/Self-employment/Teacher/Manager/Academic |
| 4 | Age | 18–60 years |
| 5 | Diabetes disorder | Diabetes disorder, labeled No/Yes |
| 6 | Respiratory diseases | Respiratory diseases, labeled No/Yes |
| 7 | Cancer | Cancer (active disease or improved status), labeled No/Yes |
| 8 | MS diseases | Multiple sclerosis (MS), labeled No/Yes |
| 9 | NP-BP | Number of online purchases in the 2 months before the pandemic (20 January 2020 – 20 March 2020) |
| 10 | NP-AP | Number of online purchases in the 2 months after the start of the pandemic (20 March 2020 – 20 May 2020) |
| 11 | Effective | The effect of COVID-19 on the number of online purchases during this period |
Fig. 2 Correlation between all the features
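A pairwise correlation analysis of this kind can be sketched as follows. This record does not state which correlation coefficient the authors used, so the sketch assumes Pearson's coefficient; the function names `pearson` and `correlation_matrix` are hypothetical, and categorical features such as Gender would first need a numeric encoding that is not specified here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("columns must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every ordered pair of feature names to its Pearson coefficient.

    `columns` is a dict of feature name -> list of numeric values.
    """
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy values invented for illustration only; they are NOT from the paper's dataset.
toy = {"NP-BP": [1, 2, 3, 4], "NP-AP": [2, 4, 6, 8], "Age": [40, 30, 35, 20]}
matrix = correlation_matrix(toy)
```

Each feature correlates perfectly with itself, so the diagonal entries such as `matrix[("Age", "Age")]` are 1 up to floating-point rounding.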
Fig. 3 Percentage of classification results obtained from individual classifiers
Fig. 4 Percentage of classification results obtained from each classifier ensemble with Bagging
Fig. 5 Percentage of classification results obtained from each classifier ensemble with Boosting
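To make the Bagging idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation: it trains each ensemble member on a bootstrap sample (drawn with replacement) and combines predictions by majority vote. A one-split decision stump stands in for a full decision tree, and `fit_stump`, `bagging_fit` and `bagging_predict` are hypothetical names. Boosting, which reweights training records between rounds, is not shown.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a depth-1 decision tree: one feature, one threshold."""
    best = None  # (errors, feature index, threshold, left label, right label)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            # `left` is never empty because the threshold is an observed value.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def bagging_fit(rows, labels, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Return the label chosen by the majority of ensemble members."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy data invented for illustration: the first feature separates the classes.
X = [[0, 1], [1, 0], [2, 1], [3, 0], [10, 1], [11, 0], [12, 1], [13, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = bagging_fit(X, y)
```

Because each member sees a slightly different sample, individual members can disagree; the vote averages out that variance, which is the usual motivation for bagging unstable learners such as decision trees.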
The results of individual classifiers and their ensembles with Bagging and Boosting
| Method | Classifier | Accuracy (%) |
|---|---|---|
| Single classifiers | DT | 94.6 |
| Single classifiers | SVM | 90 |
| Single classifiers | SMO | 78 |
| Single classifiers | NB | 83.3 |
| Single classifiers | ANN | 93.3 |
| Bagging | DT | 95.3 |
| Bagging | SVM | 92 |
| Bagging | SMO | 80 |
| Bagging | NB | 82.6 |
| Bagging | ANN | 92.6 |
| Boosting | DT | 94 |
| Boosting | SVM | 92 |
| Boosting | SMO | 80.6 |
| Boosting | NB | 88.6 |
| Boosting | ANN | 93.3 |
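The table can be read more easily as gains or losses relative to each single classifier. The short sketch below only re-expresses the accuracies listed above; the names `ACCURACY`, `gains` and `best_configuration` are hypothetical helpers introduced for illustration.

```python
# Accuracy (%) copied from the results table above.
ACCURACY = {
    "Single": {"DT": 94.6, "SVM": 90.0, "SMO": 78.0, "NB": 83.3, "ANN": 93.3},
    "Bagging": {"DT": 95.3, "SVM": 92.0, "SMO": 80.0, "NB": 82.6, "ANN": 92.6},
    "Boosting": {"DT": 94.0, "SVM": 92.0, "SMO": 80.6, "NB": 88.6, "ANN": 93.3},
}


def gains(method):
    """Accuracy change (percentage points) of an ensemble method vs. the single classifier."""
    single = ACCURACY["Single"]
    return {clf: round(ACCURACY[method][clf] - single[clf], 1) for clf in single}


def best_configuration():
    """Return (method, classifier, accuracy) for the highest accuracy in the table."""
    return max(
        ((method, clf, acc) for method, row in ACCURACY.items() for clf, acc in row.items()),
        key=lambda item: item[2],
    )
```

Read this way, neither ensemble method helps uniformly: Bagging raises DT, SVM and SMO but lowers NB and ANN, while Boosting gives NB its largest improvement and slightly lowers DT.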