Zengyuan Wu, Lingmin Jin, Jiali Zhao, Lizheng Jing, Liang Chen.
Abstract
To address the shortcomings of traditional clustering algorithms in feature selection and clustering quality, an improved Recency, Frequency, and Money (RFM) model is introduced, and an improved K-medoids algorithm is proposed. The model and algorithm are used to segment e-commerce customers. First, the traditional RFM model is improved by adding two features of customer consumption behavior. Second, to overcome the need to set the value of K manually in the traditional K-medoids algorithm, the Calinski-Harabasz (CH) index is introduced to determine the optimal number of clusters. Meanwhile, the K-medoids algorithm is optimized by changing how centroids are selected, to avoid the influence of noise and isolated points. Finally, an empirical study is conducted using a dataset from an e-commerce platform. The results show that the improved K-medoids algorithm can improve the efficiency and accuracy of e-commerce customer segmentation.
Year: 2022 PMID: 35761867 PMCID: PMC9233613 DOI: 10.1155/2022/9930613
Source DB: PubMed Journal: Comput Intell Neurosci
The performance of 4 algorithms working on different datasets.
| Clustering algorithm | Breast cancer: ACC | Breast cancer: Time (ms) | Iris plants: ACC | Iris plants: Time (ms) |
|---|---|---|---|---|
| K-medoids | 0.858 | 33.1 | 0.663 | 26.5 |
| K-means++ | 0.854 | 208.2 | 0.833 | 265.0 |
| Spectral clustering | 0.667 | 103.8 | 0.9 | 118.1 |
| Improved K-medoids | 0.868 | 22.7 | 0.840 | 13.9 |
Figure 1. Line chart of CH value.
Figure 2. Line chart of inflection point method.
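The abstract describes using the CH index to choose the number of clusters for K-medoids. The following is a minimal sketch of that idea, not the authors' implementation: it uses a plain alternating K-medoids with random initialization (the paper's modified centroid selection is not specified here), the standard CH definition (between-cluster dispersion over within-cluster dispersion, each scaled by its degrees of freedom), and synthetic data. The function names `k_medoids`, `calinski_harabasz`, and `choose_k`, and the candidate range of K from 2 to 8, are assumptions made for illustration.

```python
import numpy as np

def k_medoids(X, k, n_iter=100, seed=0):
    """Plain alternating K-medoids (assumed baseline, not the improved variant)."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # New medoid: the member with the smallest total distance to its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids

def calinski_harabasz(X, labels):
    """CH index: (between / (k - 1)) / (within / (n - k)), using cluster means."""
    n = len(X)
    clusters = np.unique(labels)
    k = len(clusters)
    overall_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in clusters:
        pts = X[labels == c]
        center = pts.mean(axis=0)
        between += len(pts) * np.sum((center - overall_mean) ** 2)
        within += np.sum((pts - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))

def choose_k(X, k_range=range(2, 9)):
    """Run K-medoids for each candidate K and keep the K with the highest CH value."""
    scores = {}
    for k in k_range:
        labels, _ = k_medoids(X, k)
        scores[k] = calinski_harabasz(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in data: three Gaussian blobs in two dimensions.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

best_k, scores = choose_k(X)
for k, score in scores.items():
    print(k, round(float(score), 1))
print("selected K:", best_k)
```

Plotting `scores` against K gives a line chart of the kind Figure 1 refers to. Because the initialization is random, a single run per K can land in a local optimum; repeating each K with several seeds and keeping the best score is a common safeguard.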
The fields and descriptions in the dataset.
| Field name | Data type | Field description |
|---|---|---|
| Customer_unique_id | Int | Customer's unique identification |
| Order_id | Int | Order identification |
| Product_id | Int | Product identification |
| Behavior type | String | The type of user behavior towards the product, including browsing, favoriting, adding to cart, purchasing |
| Timestamp | Int | Time of behavior |
Partial data of RFMCV model.
| Customer_unique_id | R | F | M | C | V |
|---|---|---|---|---|---|
| 5 | 1 | 1 | 99 | 13 | 7 |
| 18 | 6 | 2 | 210 | 16 | 0 |
| 22 | 4 | 8 | 84 | 0 | 0 |
| … | … | … | … | … | … |
| 906311 | 7 | 5 | 118 | 7 | 0 |
| 906338 | 3 | 1 | 28 | 5 | 0 |
| 906355 | 5 | 4 | 84 | 9 | 0 |
Partial data of RFMCV model after normalization.
| Customer_unique_id | R | F | M | C | V |
|---|---|---|---|---|---|
| 5 | −0.000902 | 2.068466 | −0.097700 | −0.745080 | −0.397498 |
| 18 | −0.623415 | −0.018191 | 1.465430 | 1.340590 | −0.390554 |
| 22 | 1.247733 | 3.807347 | 0.597041 | −0.390554 | −0.390554 |
| … | … | … | … | … | … |
| 906311 | −0.625219 | −0.365967 | −1.139736 | −0.397498 | −0.390554 |
| 906338 | 0.935574 | 1.025137 | 0.324119 | 0.167400 | −0.390554 |
| 906355 | 0.311257 | −0.677361 | −0.097700 | 0.428109 | −0.390554 |
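The record does not say which normalization was applied. The normalized values are centered near zero and include negatives, which is consistent with z-score standardization, so the sketch below assumes that method. It uses only the six customers visible in the partial RFMCV table, so its output will not reproduce the figures above, which were presumably computed over the full dataset. The helper name `zscore` is hypothetical.

```python
import numpy as np

# The five feature columns, in table order, for the six visible customers.
rfmcv = np.array([
    [1, 1, 99, 13, 7],
    [6, 2, 210, 16, 0],
    [4, 8, 84, 0, 0],
    [7, 5, 118, 7, 0],
    [3, 1, 28, 5, 0],
    [5, 4, 84, 9, 0],
], dtype=float)

def zscore(features):
    """Standardize each column to zero mean and unit variance (assumed method)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return (features - mean) / std

normalized = zscore(rfmcv)
print(np.round(normalized, 6))
```

Standardizing before clustering keeps a feature with a large numeric range, such as the third column, from dominating the distance computation.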
Figure 3. Distribution chart of four groups.