Heng-Ru Zhang, Yuan-Yuan Ma, Xin-Chao Yu, Fan Min.
Abstract
Slope one is a popular recommendation algorithm due to its simplicity and high efficiency on sparse data. However, it often suffers from under-fitting, since the global information of all relevant users/items is considered. In this paper, we propose a new scheme called enhanced slope one recommendation through local information embedding. First, we employ clustering algorithms to obtain user clusters as well as item clusters, which represent local information. Second, we predict ratings using the local information of users and items in the same cluster. The local information can detect strong localized associations shared within clusters. Third, we design different fusion approaches based on the local information embedding. In this way, both under-fitting and over-fitting problems are alleviated. Experimental results on real datasets show that our approaches outperform slope one in terms of both mean absolute error and root mean square error.
Year: 2019 PMID: 31600235 PMCID: PMC6786606 DOI: 10.1371/journal.pone.0222702
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
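The scheme outlined in the abstract (cluster, predict within the cluster, then fuse) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cluster labels are supplied by hand rather than produced by a clustering algorithm, only user clusters are used (the abstract also mentions item clusters), and the weighted-average fusion with a `weight` parameter is a hypothetical stand-in for the fusion approaches the paper designs. The ratings are taken from the example rating matrix (R) listed later in this record, assuming rows are users and columns are items.

```python
import numpy as np

def slope_one(R, u, j, users=None):
    """Weighted Slope One estimate of R[u, j], using only the rows in `users`."""
    rows = np.arange(R.shape[0]) if users is None else np.asarray(users)
    sub = R[rows]
    num, den = 0.0, 0
    for i in range(R.shape[1]):
        if i == j or np.isnan(R[u, i]):
            continue
        # Users (within the chosen rows) who rated both item j and item i.
        both = ~np.isnan(sub[:, j]) & ~np.isnan(sub[:, i])
        n = int(both.sum())
        if n:
            deviation = np.mean(sub[both, j] - sub[both, i])
            num += (deviation + R[u, i]) * n
            den += n
    return num / den if den else np.nan

def fused_predict(R, u, j, user_labels, weight=0.5):
    """Blend the in-cluster and global estimates (hypothetical fusion rule)."""
    user_labels = np.asarray(user_labels)
    same_cluster = np.flatnonzero(user_labels == user_labels[u])
    global_pred = slope_one(R, u, j)
    local_pred = slope_one(R, u, j, users=same_cluster)
    if np.isnan(local_pred):  # no usable co-ratings inside the cluster
        return global_pred
    return weight * local_pred + (1 - weight) * global_pred

# Example ratings; np.nan marks the unrated ("–") entries.
R = np.array([
    [2, 4, np.nan, 5, 1],
    [7, 9, 8, 6, 9],
    [np.nan, 8, 9, 8, 6],
    [6, np.nan, 9, 8, 7],
    [1, 3, 5, 4, 2],
], dtype=float)
labels = np.array([0, 1, 1, 1, 0])  # assumed user clusters, chosen by hand

print(fused_predict(R, u=0, j=2, user_labels=labels))
```

Falling back to the global estimate when the cluster has no co-ratings is a design choice of this sketch, made so that small clusters never leave a prediction undefined.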
Notations.
| Notation | Meaning |
|---|---|
| | The set of all users |
| | The set of all items |
| | The rating matrix |
| | The number of users |
| | The number of items |
| | The rating of |
| | The average rating of |
| | The average rating of |
| | The |
| | The |
| | The set of |
| | The set of |
| | The total number of user clusters |
| | The total number of item clusters |
| | The sub-matrix ratings for |
| | The sub-matrix ratings for |
| | The sub-matrix ratings for |
| | The predicted rating of |
| | The predicted rating of |
| | The predicted rating of |
The rating matrix (R).
| UID/TID | | | | | |
|---|---|---|---|---|---|
| | 2 | 4 | – | 5 | 1 |
| | 7 | 9 | 8 | 6 | 9 |
| | – | 8 | 9 | 8 | 6 |
| | 6 | – | 9 | 8 | 7 |
| | 1 | 3 | 5 | 4 | 2 |
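To make the baseline concrete, the following Python sketch applies the standard weighted Slope One predictor to the rating matrix above and fills in its three missing entries. It assumes rows are users and columns are items, and treats each "–" as an unrated entry; the function name `slope_one_predict` is illustrative and does not come from the paper.

```python
import numpy as np

# Rating matrix from the table above; np.nan marks the "–" entries.
R = np.array([
    [2, 4, np.nan, 5, 1],
    [7, 9, 8, 6, 9],
    [np.nan, 8, 9, 8, 6],
    [6, np.nan, 9, 8, 7],
    [1, 3, 5, 4, 2],
], dtype=float)

def slope_one_predict(R, u, j):
    """Weighted Slope One prediction of user u's rating for item j."""
    num, den = 0.0, 0
    for i in range(R.shape[1]):
        if i == j or np.isnan(R[u, i]):
            continue
        # Users who rated both item j and item i.
        both = ~np.isnan(R[:, j]) & ~np.isnan(R[:, i])
        count = int(both.sum())
        if count == 0:
            continue
        # Average deviation of item j from item i over those users.
        dev = np.mean(R[both, j] - R[both, i])
        num += (dev + R[u, i]) * count
        den += count
    return num / den if den else float("nan")

# Predict every missing entry of R.
for u, j in zip(*np.where(np.isnan(R))):
    print(f"user {u}, item {j}: {slope_one_predict(R, u, j):.3f}")
```

Each item pair contributes its average rating deviation, weighted by the number of users who rated both items, so pairs with more co-ratings influence the prediction more.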
The basic information of four datasets.
| Dataset | # of Users | # of Items | # of Ratings | Density | Average rating |
|---|---|---|---|---|---|
| ML100K | 943 | 1,682 | 100,000 | 6.30% | 3.53 |
| ML1M | 6,040 | 3,900 | 1,000,209 | 4.19% | 3.59 |
| ML10M | 71,567 | 10,681 | 10,000,054 | 1.31% | 3.51 |
| DB | 2,965 | 39,695 | 912,479 | 0.78% | 3.75 |
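The density column can be checked against the other columns with a short Python sketch, assuming density is defined as the number of ratings divided by the number of user-item pairs (the record does not state the definition explicitly).

```python
# (users, items, ratings) as listed in the table above.
datasets = {
    "ML100K": (943, 1_682, 100_000),
    "ML1M": (6_040, 3_900, 1_000_209),
    "ML10M": (71_567, 10_681, 10_000_054),
    "DB": (2_965, 39_695, 912_479),
}

def density(n_users, n_items, n_ratings):
    """Fraction of user-item pairs that carry a rating (assumed definition)."""
    return n_ratings / (n_users * n_items)

for name, counts in datasets.items():
    print(f"{name}: {density(*counts):.2%}")
```

Under this definition the ML100K, ML10M, and DB rows reproduce the listed percentages. With the counts as listed, the ML1M row evaluates to roughly 4.25% rather than 4.19%, so the source may have used a slightly different item count for that dataset.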
Runtime comparison under M-distance clustering (unit: ms).
| Algorithms | ML100K | ML1M | ML10M | DB |
|---|---|---|---|---|
| | 6,295 | 92,499 | 9,278,753 | 78,398 |
| | 4,061 | 53,109 | 5,334,255 | 53,159 |
| | 4,926 | 55,990 | 5,468,273 | 55,923 |
| | 3,617 | 39,643 | 4,073,868 | 43,139 |
| | 7,638 | 98,433 | 9,075,662 | 85,890 |
| | 9,945 | 106,902 | 11,450,463 | 90,936 |
| | 9,839 | 106,844 | 11,336,946 | 92,349 |
| | 19,777 | 255,122 | 20,658,793 | 165,813 |
MAE comparison under M-distance clustering.
| Algorithms | Dataset | |||
|---|---|---|---|---|
| ML100K | ML1M | ML10M | DB | |
| 0.7417±0.0047 | 0.7713±0.0023 | 0.7624±0.0052 | 0.6632±0.0045 | |
| 0.7594±0.0029 | 0.7474±0.0033 | 0.6792±0.0064 | ||
| 0.7751±0.0020 | 0.7641±0.0004 | 0.6634±0.0045 | ||
| 0.7575±0.0029 | 0.7489±0.0035 | 0.7301±0.0050 | ||
| 0.7552±0.0027 | 0.7276±0.0009 | 0.6716±0.0061 | ||
| 0.7470±0.0031 | 0.7516±0.0030 | 0.7349±0.0007 | 0.6659±0.0050 | |
| 0.7452±0.0027 | 0.7532±0.0028 | 0.7361±0.0007 | 0.6585±0.0045 | |
| 0.7433±0.0027 | 0.7520±0.0029 | 0.7351±0.0008 | 0.6660±0.0050 | |
| Lower | 0.21% | 3.11% | 4.60% | 1.66% |
RMSE comparison under M-distance clustering.
| Algorithms | Dataset | |||
|---|---|---|---|---|
| ML100K | ML1M | ML10M | DB | |
| 0.9670±0.0034 | 0.9729±0.0008 | 0.8633±0.0067 | ||
| 0.9804±0.0079 | 0.9134±0.0098 | |||
| 0.9439±0.0073 | 0.9714±0.0031 | 0.9750±0.0007 | 0.8631±0.0063 | |
| 0.9802±0.0090 | 0.9430±0.0043 | 0.9354±0.0067 | 0.8586±0.0083 | |
| 0.9754±0.0079 | 0.9419±0.0044 | 0.9319±0.0008 | 0.8931±0.0089 | |
| 0.9570±0.0069 | 0.9441±0.0043 | 0.9397±0.0008 | 0.8758±0.0076 | |
| 0.9568±0.0074 | 0.9476±0.0051 | 0.9409±0.0007 | ||
| 0.9531±0.0068 | 0.9442±0.0043 | 0.9401±0.0007 | 0.8758±0.0076 | |
| Lower | -0.16% | 2.55% | 4.23% | 1.00% |
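For reference, the two error measures reported in these tables follow their usual definitions, sketched below in Python. The function names and the toy ratings in the usage lines are illustrative only; they are not taken from the paper's experiments.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between two equal-length rating sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error between two equal-length rating sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Toy usage: three held-out ratings and their predictions.
actual = [4, 3, 5]
predicted = [3, 3, 7]
print(mae(actual, predicted), rmse(actual, predicted))
```

RMSE squares each error before averaging, so it penalizes large individual errors more heavily than MAE. This is why an algorithm's ranking can differ between the MAE and RMSE tables.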
MAE comparison under k-means clustering.
| Algorithms | Dataset | |||
|---|---|---|---|---|
| ML100K | ML1M | ML10M | DB | |
| 0.7713±0.0023 | 0.7624±0.0052 | |||
| 0.7694±0.0059 | 0.7709±0.0016 | 0.7614±0.0013 | 0.6708±0.0067 | |
| 0.7431±0.0048 | 0.7766±0.0020 | 0.6980±0.0086 | ||
| 0.7717±0.0059 | 0.7755±0.0011 | 0.7652±0.0049 | 0.6634±0.0040 | |
| 0.7661±0.0056 | 0.7726±0.0013 | 0.7585±0.0008 | 0.6904±0.0080 | |
| 0.7517±0.0060 | 0.7597±0.0010 | 0.6822±0.0068 | ||
| 0.7510±0.0020 | 0.7747±0.0013 | 0.7624±0.0011 | 0.6748±0.0062 | |
| 0.7485±0.0057 | 0.7716±0.0015 | 0.7609±0.0009 | 0.6823±0.0068 | |
| Lower | -0.19% | 0.21% | 0.66% | -0.03% |
RMSE comparison under k-means clustering.
| Algorithms | Dataset | |||
|---|---|---|---|---|
| ML100K | ML1M | ML10M | DB | |
| 0.9670±0.0034 | 0.9729±0.0008 | |||
| 0.9887±0.0079 | 0.9711±0.0007 | 0.9543±0.0181 | ||
| 0.9469±0.0076 | 0.9733±0.0030 | 0.9764±0.0008 | 0.8635±0.0066 | |
| 0.9941±0.0088 | 0.9762±0.0010 | 0.9745±0.0045 | 0.9015±0.0132 | |
| 0.9844±0.0074 | 0.9728±0.0008 | 0.9707±0.0010 | 0.9347±0.0164 | |
| 0.9589±0.0078 | 0.9703±0.0010 | 0.9041±0.0129 | ||
| 0.9626±0.0085 | 0.9724±0.0009 | 0.9736±0.0008 | 0.8836±0.0110 | |
| 0.9553±0.0073 | 0.9688±0.0011 | 0.9718±0.0008 | 0.9042±0.0128 | |
| Lower | -0.49% | 0.03% | 0.37% | -0.02% |