Abstract
Although there are many good collaborative recommendation methods, it is still a challenge to increase their accuracy and diversity so that they fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on the K-means clustering algorithm. During clustering, we use the artificial bee colony (ABC) algorithm to overcome the local-optimum problem of K-means. We then adopt the modified cosine similarity to compute the similarity between users in the same cluster. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on the benchmark MovieLens dataset and on a real-world dataset indicates that our collaborative filtering approach based on user clustering outperforms many other recommendation methods.
Year: 2013 PMID: 24381525 PMCID: PMC3863462 DOI: 10.1155/2013/869658
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
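The similarity step described in the abstract can be illustrated with a minimal sketch. The record above does not give the formula, so this assumes that "modified cosine similarity" means the mean-centered cosine computed over co-rated items, and that the cluster memberships have already been produced by the clustering stage. The function names `modified_cosine` and `neighbours_in_cluster` and the toy ratings are hypothetical.

```python
from math import sqrt


def modified_cosine(ratings_u, ratings_v):
    """Mean-centered cosine similarity between two users (assumed definition).

    ratings_u, ratings_v: dicts mapping item id -> rating.
    Each user's own mean rating is subtracted first, and only items
    rated by both users contribute to the sums.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # a user with constant ratings carries no signal
    return num / (den_u * den_v)


def neighbours_in_cluster(target, cluster_members, ratings, k=2):
    """Return the k users of the target's cluster most similar to the target."""
    scored = [
        (modified_cosine(ratings[target], ratings[u]), u)
        for u in cluster_members
        if u != target
    ]
    scored.sort(reverse=True)
    return scored[:k]


# Toy data: three users assumed to fall in the same cluster.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 1},
    "bob": {"a": 4, "b": 3, "c": 2},
    "carol": {"a": 1, "b": 3, "c": 5},
}
cluster = ["alice", "bob", "carol"]
top = neighbours_in_cluster("alice", cluster, ratings, k=1)
```

Restricting the comparison to one cluster means each target user is compared only with the members of that cluster rather than with every user in the dataset.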
Figure 1: The framework of the recommendation model based on user clustering.
Ratio of K-means using ABC to traditional K-means in the metric D/L.
| Dataset | Number of instances | Number of attributes | Number of classes | Ratio of D/L |
|---|---|---|---|---|
| Balance | 625 | 4 | 3 | 0.873 |
| Cancer | 569 | 30 | 2 | 0.862 |
| Cancer-Int | 699 | 9 | 2 | 0.871 |
| Credit | 690 | 15 | 2 | 0.891 |
| Dermatology | 366 | 34 | 6 | 0.886 |
| Diabetes | 768 | 8 | 2 | 0.879 |
| Ecoli | 327 | 7 | 5 | 0.881 |
| Glass | 214 | 9 | 6 | 0.895 |
| Heart | 303 | 75 | 2 | 0.868 |
| Horse | 364 | 27 | 3 | 0.859 |
| Iris | 150 | 4 | 3 | 0.860 |
| Thyroid | 215 | 5 | 3 | 0.867 |
| Wine | 178 | 13 | 3 | 0.876 |
The ratio lies between 0.859 and 0.895 on all 13 datasets. We conclude from this table that our method performs better than traditional K-means and produces more applicable clustering results.
Algorithmic performance on the MovieLens dataset. The ranking score, precision, recall, intrasimilarity, and Hamming distance are reported for L = 30, 40, and 50. Each number in this table is an average over five runs, each with an independent random division into training and test sets.
| Algorithms | Ranking score | Precision | Recall | Intrasimilarity | Hamming distance |
|---|---|---|---|---|---|
| L = 30 | | | | | |
| CF | 0.148 | 0.077 | 0.321 | 0.330 | 0.704 |
| MCF | 0.131 | 0.087 | 0.360 | 0.306 | 0.751 |
| NN- | 0.124 | 0.085 | 0.352 | 0.311 | 0.744 |
| cluster-based CF | 0.109 | 0.098 | 0.393 | 0.279 | 0.796 |
| L = 40 | | | | | |
| CF | 0.137 | 0.071 | 0.332 | 0.338 | 0.698 |
| MCF | 0.121 | 0.080 | 0.373 | 0.317 | 0.743 |
| NN- | 0.120 | 0.079 | 0.369 | 0.320 | 0.736 |
| cluster-based CF | 0.105 | 0.089 | 0.405 | 0.288 | 0.790 |
| L = 50 | | | | | |
| CF | 0.125 | 0.066 | 0.343 | 0.347 | 0.692 |
| MCF | 0.110 | 0.072 | 0.385 | 0.330 | 0.735 |
| NN- | 0.109 | 0.070 | 0.381 | 0.334 | 0.729 |
| cluster-based CF | 0.097 | 0.078 | 0.417 | 0.301 | 0.776 |
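The precision and recall columns above can be read against a small sketch of how such values are commonly computed. This is an illustration under stated assumptions, not the paper's evaluation code: it assumes L is the length of the recommendation list and that a "hit" is a recommended item found in the user's held-out test set. The function name `precision_recall_at_L` and the toy data are hypothetical.

```python
def precision_recall_at_L(recommended, held_out, L):
    """Precision and recall of the top-L recommendations for one user.

    recommended: items ordered from most to least recommended.
    held_out: set of items the user actually selected in the test set.
    """
    top = recommended[:L]
    hits = sum(1 for item in top if item in held_out)
    precision = hits / L  # share of the L slots that were hits
    recall = hits / len(held_out) if held_out else 0.0  # share of test items found
    return precision, recall


# Toy example: one user's ranked list and held-out items.
ranked = ["a", "b", "c", "d"]
test_items = {"b", "d", "z"}
p2, r2 = precision_recall_at_L(ranked, test_items, L=2)
p4, r4 = precision_recall_at_L(ranked, test_items, L=4)
```

Under these definitions precision divides by L while recall divides by a fixed test-set size, so a longer list can only keep or raise recall and tends to lower precision. The table shows the same pattern from L = 30 to L = 50.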
Algorithmic performance on the real-world dataset. The ranking score, precision, recall, intrasimilarity, and Hamming distance are reported for L = 50, 60, and 70. Each number in this table is an average over five runs, each with an independent random division into training and test sets.
| Algorithms | Ranking score | Precision | Recall | Intrasimilarity | Hamming distance |
|---|---|---|---|---|---|
| L = 50 | | | | | |
| CF | 0.052 | 0.035 | 0.171 | 0.364 | 0.479 |
| MCF | 0.043 | 0.042 | 0.182 | 0.328 | 0.527 |
| NN- | 0.042 | 0.041 | 0.184 | 0.332 | 0.523 |
| cluster-based CF | 0.034 | 0.047 | 0.201 | 0.272 | 0.614 |
| L = 60 | | | | | |
| CF | 0.046 | 0.032 | 0.185 | 0.385 | 0.461 |
| MCF | 0.040 | 0.039 | 0.194 | 0.342 | 0.506 |
| NN- | 0.039 | 0.038 | 0.196 | 0.348 | 0.504 |
| cluster-based CF | 0.031 | 0.043 | 0.209 | 0.281 | 0.603 |
| L = 70 | | | | | |
| CF | 0.041 | 0.030 | 0.202 | 0.403 | 0.457 |
| MCF | 0.035 | 0.036 | 0.216 | 0.362 | 0.502 |
| NN- | 0.034 | 0.035 | 0.217 | 0.364 | 0.499 |
| cluster-based CF | 0.027 | 0.040 | 0.231 | 0.293 | 0.592 |
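The Hamming distance column measures how much the recommendation lists differ between users. The record above does not define it, so the sketch below assumes the form commonly used in the recommendation literature: for two users, one minus the fraction of items shared by their top-L lists, averaged over all user pairs. The function names and the toy lists are hypothetical.

```python
from itertools import combinations


def hamming_distance(list_i, list_j, L):
    """1 - (shared items in the two top-L lists) / L (assumed definition)."""
    overlap = len(set(list_i[:L]) & set(list_j[:L]))
    return 1 - overlap / L


def mean_hamming(lists, L):
    """Average Hamming distance over all unordered pairs of users."""
    pairs = list(combinations(lists.values(), 2))
    return sum(hamming_distance(a, b, L) for a, b in pairs) / len(pairs)


# Toy example: three users with top-3 recommendation lists.
lists = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "d", "e"],
    "u3": ["x", "y", "z"],
}
diversity = mean_hamming(lists, L=3)
```

With this definition a value of 0 means every user receives the same list and a value of 1 means no two lists share an item, so larger values indicate more personalized recommendations.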