Yu Du, Nicolas Sutton-Charani, Sylvie Ranwez, Vincent Ranwez.
Abstract
Recommender systems aim to provide users with a selection of items by predicting their preferences for items they have not yet rated, thus helping them filter out irrelevant options from a large product catalogue. Collaborative filtering is a widely used mechanism that predicts a particular user's interest in a given item from the feedback of neighbour users with similar tastes. The way the user's neighbourhood is identified has a significant impact on prediction accuracy. Most methods estimate user proximity from the ratings they assigned to co-rated items, regardless of how many such items there are. This paper introduces a similarity adjustment that takes the number of co-ratings into account. The proposed method is based on a concordance ratio representing the probability that two users share the same taste for a new item. These probabilities are further adjusted with the Empirical Bayes inference method before being used to weight similarities. The proposed approach improves existing similarity measures without increasing time complexity, and the adjustment can be combined with any existing similarity measure. Experiments conducted on benchmark datasets confirmed that the proposed method systematically improved the recommender system's prediction accuracy for all considered similarity measures.
Year: 2021 PMID: 34370770 PMCID: PMC8351953 DOI: 10.1371/journal.pone.0255929
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
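The abstract's core idea (an Empirical-Bayes-adjusted concordance ratio used to weight similarities) can be sketched as follows. The concordance criterion (both users deviating from their own mean rating in the same direction) and the `prior_strength` parameter are illustrative assumptions, not the paper's exact formulation:

```python
def concordance_ratio(r_u, r_v, common_items, mean_u, mean_v):
    """Fraction of co-rated items on which two users 'agree'
    (here: both rate above or both rate below their own mean --
    a hypothetical concordance criterion for illustration)."""
    concordant = sum(
        1 for i in common_items
        if (r_u[i] - mean_u) * (r_v[i] - mean_v) > 0
    )
    return concordant / len(common_items)

def eb_adjusted_ratio(concordant, n_common, prior_ratio, prior_strength):
    """Empirical-Bayes style shrinkage of the concordance ratio toward
    a global prior: with few co-ratings the prior dominates, with many
    co-ratings the observed ratio dominates."""
    return (concordant + prior_strength * prior_ratio) / (n_common + prior_strength)
```

With only 2 co-rated items and a prior of 0.5, `eb_adjusted_ratio(2, 2, 0.5, 10)` is pulled from a raw ratio of 1.0 down to 7/12 ≈ 0.58, which is the effect the paper exploits to discount similarities computed on few co-ratings.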
Dataset characteristics.
| dataset | #users | #items | #ratings | rating scale | density | domain |
|---|---|---|---|---|---|---|
| MovieLens-100K | 943 | 1,682 | 100K | [1, 5] | 6.30% | Movie |
| MovieLens-1M | 6,040 | 3,900 | 1M | [1, 5] | 4.47% | Movie |
| Jester | 59.1K | 140 | 1.7M | [-10, 10] | 20.53% | Joke |
The dataset density, i.e. the complement of the sparsity, is the percentage of cells in the full user-item matrix that contain a rating.
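As a quick check of the density column, the value is simply the rating count divided by the full matrix size; a minimal sketch (not code from the paper):

```python
def matrix_density(n_ratings, n_users, n_items):
    """Percentage of filled cells in the full user-item rating matrix
    (the density column of the table above)."""
    return 100.0 * n_ratings / (n_users * n_items)

# MovieLens-100K: 100,000 ratings over 943 users x 1,682 items
print(round(matrix_density(100_000, 943, 1_682), 2))  # -> 6.3
```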
Fig 1. Experimental results of EBCR with different a values (cf. Eq (7)) for different neighbourhood sizes (k).
Fig 2. Comparisons of original similarity measures (dashed curves) with their variants integrating EBCR ratios (solid curves of the same colour) on three benchmark datasets: MovieLens-100K, MovieLens-1M and Jester.
Fig 3. Comparisons of the Significance Weighting (SW) with the EBCR approach on three benchmark datasets: MovieLens-100K, MovieLens-1M and Jester.
Comparisons of the Laplace smoothing (LS) with the EBCR approach on three benchmark datasets: MovieLens-100K, MovieLens-1M and Jester.
| Dataset | Similarity measure | Evaluation metric | (LS, EBCR), k = 5 | (LS, EBCR), k = 10 | (LS, EBCR), k = 20 | (LS, EBCR), k = 40 |
|---|---|---|---|---|---|---|
| MovieLens-100K | MSD | MAE | (0.790, | (0.758, | (0.743, | (0.738, |
| | | RMSE | (1.010, | (0.969, | (0.949, | (0.943, |
| | COS | MAE | (0.788, | (0.757, | (0.743, | (0.737, |
| | | RMSE | (1.009, | (0.970, | (0.951, | (0.944, |
| | NormPCC | MAE | (0.786, | (0.756, | (0.743, | (0.739, |
| | | RMSE | (1.008, | (0.969, | (0.951, | (0.945, |
| MovieLens-1M | MSD | MAE | (0.782, | (0.747, | (0.727, | (0.717, |
| | | RMSE | (0.997, | (0.949, | (0.923, | (0.909, |
| | COS | MAE | (0.781, | (0.747, | (0.727, | (0.716, |
| | | RMSE | (0.997, | (0.950, | (0.924, | (0.909, |
| | NormPCC | MAE | (0.775, | (0.742, | (0.724, | (0.714, |
| | | RMSE | (0.990, | (0.945, | (0.922, | (0.910, |
| Jester | MSD | MAE | (3.045, 3.045) | (3.035, 3.035) | (3.074, 3.074) | (3.125, |
| | | RMSE | (4.189, 4.189) | (4.123, | (4.141, | (4.180, 4.180) |
| | COS | MAE | (3.039, | (3.017, | (3.041, | (3.083, 3.083) |
| | | RMSE | (4.181, | (4.101, 4.101) | (4.104, | (4.137, 4.137) |
| | NormPCC | MAE | (3.049, | (3.037, 3.037) | (3.075, 3.075) | (3.126, 3.126) |
| | | RMSE | (4.191, 4.191) | (4.124, 4.124) | (4.142, | (4.181, |
The best value for each (LS, EBCR) comparison is shown in bold characters.
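The LS baseline in the table above smooths the concordance ratio with Laplace smoothing; a minimal add-one sketch (the paper's exact smoothing constants are assumed here, not quoted):

```python
def laplace_smoothed_ratio(concordant, n_common):
    """Add-one (Laplace) smoothing of a concordance ratio: pulls the
    ratio toward 1/2, more strongly when few co-ratings are observed."""
    return (concordant + 1) / (n_common + 2)

# with only 2 co-rated items, 2 concordant: 0.75 instead of a raw 1.0
```

Unlike the Empirical Bayes adjustment, this fixed prior does not adapt to the dataset's global concordance level, which is one plausible reason EBCR compares favourably in the table.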
Comparisons of EBCR vs. model-based collaborative filtering approaches.
| Approach | MovieLens-1M (4.47%) MAE | MovieLens-1M RMSE | MovieLens-100k (6.30%) MAE | MovieLens-100k RMSE | Jester (20.53%) MAE | Jester RMSE |
|---|---|---|---|---|---|---|
| Baseline | 0.7195 | 0.9088 | 0.7484 | 0.944 | 3.3982 | 4.3134 |
| SVD | 0.6863 | | 0.7376 | | 3.3713 | 4.5004 |
| SVD++ | | | | | 3.6209 | 4.9042 |
| NeuMF | | 0.8765 | 0.7437 | 0.9363 | | |
| EBCR | 0.7052 | 0.9016 | | 0.9413 | | |
The best MAE and RMSE values for each dataset are shown in bold characters and the second-ranked ones are underlined. For EBCR, the similarity measure (SM) and the neighbourhood size (NS) used for each dataset are as follows: SM = COS, NS = 60 for MovieLens-1M; SM = COS, NS = 60 for MovieLens-100k; and SM = COS, NS = 10 for Jester.
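The MAE and RMSE values reported throughout these tables are the standard rating-prediction error metrics; a minimal sketch of their computation:

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalises large errors more than MAE."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```

Note that both metrics inherit the rating scale, which is why Jester values (scale [-10, 10]) are not directly comparable to MovieLens values (scale [1, 5]).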
A case study example extracted from the MovieLens-100K dataset.
| Case study: predict the rating of user_903 (target user) for item_106 (the user's actual rating is 2) | | |
|---|---|---|
| Approach | Neighbours (# of co-rated items with the target user) | Predicted rating |
| COS_KNN (k = 5) | user_36 (2); user_33 (2); user_240 (3); user_61 (1); user_173 (2) | 3.195 |
| EBCR_COS_KNN (k = 5) | user_556 (13); user_8 (18); user_898 (4); user_563 (7); user_609 (4) | |
| Baseline | | 3.122 |
| SVD | | 2.925 |
| SVD++ | | |
| NeuMF | | 2.934 |
The best prediction approach is shown in bold and the second-ranked one is underlined.
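The kNN predictions in the case study are typically obtained as a similarity-weighted average of the neighbours' mean-centred ratings; a hypothetical sketch of such a predictor (not necessarily the paper's exact formula):

```python
def predict_rating(target_mean, neighbours):
    """Mean-centred weighted kNN prediction.

    neighbours: list of (similarity, neighbour_rating, neighbour_mean)
    triples for the k most similar users who rated the target item."""
    num = sum(sim * (r - mean) for sim, r, mean in neighbours)
    den = sum(abs(sim) for sim, _, _ in neighbours)
    return target_mean if den == 0 else target_mean + num / den

# one neighbour with similarity 1.0 who rated 0.5 above their own mean
print(predict_rating(3.0, [(1.0, 4.0, 3.5)]))  # -> 3.5
```

The case study illustrates why weighting the similarities matters: the plain COS neighbourhood is built from users sharing only 1-3 co-rated items, whereas the EBCR-weighted neighbourhood favours users with more co-ratings (4-18), whose similarities are more reliable estimates.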