Domokos Kelen, Bálint Daróczy, Frederick Ayala-Gómez, Anna Ország, András Benczúr.
Abstract
Recommendation services are of great importance in e-commerce, shopping, tourism, and social media, as they help users navigate to the items most relevant to their needs. To build recommender systems, organizations log item consumption in user sessions using different sensors. For instance, Web sites use Web data loggers, museums and shopping centers rely on indoor positioning systems to register user movement, and Location-Based Social Networks use the Global Positioning System for outdoor user tracking. Most organizations do not have a detailed history of previous activities or purchases by the user. Hence, in most cases recommenders propose items that are similar to the most recent ones viewed in the current user session. The corresponding task is called session-based recommendation, and when only the last item is considered, it is referred to as item-to-item recommendation. A natural way of building next-item recommendations relies on item-to-item similarities and item-to-item transitions in the form of "people who viewed this, also viewed" lists. Such methods, however, depend on local information for the given item pairs, which can yield unstable results for items with a short transaction history, especially cold-start items that appeared recently and have not yet accumulated a sufficient number of transactions. In this paper, we give new algorithms by defining a global probabilistic similarity model of all the items based on Random Fields. We give a generative model for the item interactions based on arbitrary distance measures over the items, including explicit and implicit ratings and external metadata, to estimate and predict item-to-item transition probabilities. We exploit our new model in two different item similarity algorithms, as well as in a feature representation for a recurrent neural network-based recommender. Our experiments on various publicly available data sets show that our new model outperforms simple similarity baseline methods and combines well with recent item-to-item and deep learning recommenders under several different performance metrics.
Keywords: fisher information; markov random fields; recommender systems; recurrent neural networks
Year: 2019 PMID: 31405108 PMCID: PMC6720552 DOI: 10.3390/s19163498
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
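The "people who viewed this, also viewed" lists mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it only counts direct item-to-item transitions inside sessions, which is the kind of local baseline the abstract contrasts with the global Random Field model. The function names and the toy sessions are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def transition_counts(sessions):
    """Count how often item b directly follows item a within a session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            counts[a][b] += 1
    return counts

def also_viewed(counts, item, k=3):
    """Return up to k next items for `item` with their empirical transition probabilities."""
    followers = counts.get(item, Counter())
    total = sum(followers.values())
    if total == 0:
        return []  # cold-start item: no transitions observed yet
    return [(b, c / total) for b, c in followers.most_common(k)]

# Toy sessions (hypothetical data, not from the paper).
sessions = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"], ["e"]]
counts = transition_counts(sessions)
print(also_viewed(counts, "a"))
print(also_viewed(counts, "e"))
```

Item "e" never appears with a successor, so the sketch returns an empty list for it. This is the cold-start instability that the abstract attributes to methods relying only on local pair information.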
Figure 1. Similarity graph of item i with sample items of distances from i.
Figure 2. Pairwise similarity graph with sample set for a pair of items i and j.
Figure 3. Single and multimodal similarity graph with sample set and modalities.
Figure 4. Expanded Gru4Rec model for Fisher embedding.
Data sets used in the experiments.
| Data Set | Items | Users | Training Pairs | Testing Pairs |
|---|---|---|---|---|
| Netflix | 17,749 | 478,488 | 7,082,109 | 127,756 |
| MovieLens | 3683 | 6040 | 670,220 | 15,425 |
| Yahoo! Music | 433,903 | 497,881 | 27,629,731 | 351,344 |
| Books | 340,536 | 103,723 | 1,017,118 | 37,403 |
Co-occurrence quartiles.
| Data Set | 25% | 50% | 75% | Max |
|---|---|---|---|---|
| Books | 1 | 1 | 2 | 1931 |
| MovieLens | 29 | 107 | 300 | 2941 |
| Netflix | 56 | 217 | 1241 | 144,817 |
| Yahoo! Music | 4 | 9 | 23 | 160,514 |
Figure 5. The Kernel Density Estimation function of the item co-occurrence concentrates at infrequent values.
Figure 6. An example of movies from the MovieLens dataset that shows the relations between the movies using the DBpedia knowledge graph. The black squares show the movie titles, the edges are the properties, and the white nodes are the property values.
Percentiles for the distribution of how many times a property is used in the knowledge graph. 75% of the properties are used at most 42 times. We discard rare movie attributes and focus only on starring, writer, genre, director, and producer.
| Mean | Std. | Min. | 25% | 50% | 75% | Max. |
|---|---|---|---|---|---|---|
| 1 K | 5.3 K | 1 | 1 | 3 | 42 | 70 K |
Top 5 movie features for the selected properties in the knowledge graph.
| Property | Popular Values |
|---|---|
| Starring | Robin_Williams, Robert_De_Niro, Demi_Moore, Whoopi_Goldberg, and Bruce_Willis. |
| Writer | Woody_Allen, John_Hughes_(filmmaker), Robert_Towne, Lowell_Ganz, and Ronald_Bass. |
| Genre | Drama_film, Baroque_pop, Blues, Drama, and Rhythm_and_blues. |
| Director | Alfred_Hitchcock, Woody_Allen, Steven_Spielberg, Barry_Levinson, and Richard_Donner. |
| Producer | Walt_Disney, Arnon_Milchan, Brian_Grazer, Roger_Birnbaum, and Scott_Rudin. |
Statistics of the Jaccard similarity using the 100 most similar movies for each movie.
| Mean | Std. | Min. | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | Max. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1276 | 0.0728 | 0.0365 | 0.0778 | 0.087 | 0.0945 | 0.1005 | 0.1066 | 0.1179 | 0.126 | 0.1429 | 0.1976 | 0.8165 |
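For readers unfamiliar with the measure behind these statistics, the following Python sketch computes the Jaccard similarity between sets and selects the most similar items for a given item. It assumes each movie is described by a set of attribute values. The attribute strings and item identifiers below are placeholders, not data from the paper, and the helper names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(item, features, k=100):
    """Return the k items most similar to `item`, ties broken by item id."""
    scores = [
        (other, jaccard(features[item], feats))
        for other, feats in features.items()
        if other != item
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]

# Placeholder attribute sets (illustrative only).
features = {
    "m1": {"genre:g1", "director:d1"},
    "m2": {"genre:g1", "director:d1", "producer:p1"},
    "m3": {"director:d2"},
}
print(most_similar("m1", features, k=2))
```

With k=100, collecting the returned scores over all movies would give a distribution of the kind summarized in the table, under the assumption that the statistics were computed this way.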
Figure 7. The quality of algorithms FD and FC with Jaccard similarity, as a function of the number of most popular items used as reference in the similarity graphs of Figure 1, Figure 2 and Figure 3 (horizontal axis). Recall (top) and DCG (bottom) increase as we add more items to the sample set (i.e., list of recommended items).
Experiments on MovieLens with DBpedia content, all methods using Jaccard similarity.
| Method | Recall@20 | DCG@20 |
|---|---|---|
| Collaborative baseline | 0.139 | 0.057 |
| Content baseline | 0.131 | 0.056 |
| FC content | 0.239 | 0.108 |
| FD content | 0.214 | 0.093 |
| FC multimodal | 0.275 | 0.123 |
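The two metrics reported in this table can be sketched as follows for the next-item setting, where each test case has exactly one held-out target item. This is a common textbook formulation, assumed here for illustration. The paper's exact definitions (for example, any normalization of DCG) are not given in this section and may differ.

```python
import math

def recall_at_k(ranked, target, k=20):
    """1.0 if the held-out target appears among the top-k recommendations, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0

def dcg_at_k(ranked, target, k=20):
    """Binary-relevance DCG: 1 / log2(position + 1) for a hit at a 1-based position."""
    for pos, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(pos + 1)
    return 0.0

def evaluate(recommendations, targets, k=20):
    """Average Recall@k and DCG@k over all test cases."""
    n = len(targets)
    recall = sum(recall_at_k(r, t, k) for r, t in zip(recommendations, targets)) / n
    dcg = sum(dcg_at_k(r, t, k) for r, t in zip(recommendations, targets)) / n
    return recall, dcg

# Two toy test cases: one hit at position 2, one miss.
recs = [["a", "b", "c"], ["x", "y", "z"]]
targets = ["b", "q"]
recall, dcg = evaluate(recs, targets, k=20)
print(recall, dcg)
```

Under this formulation DCG@20 can never exceed Recall@20, which matches the relationship between the two columns in the table.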
Experiments on MovieLens with different input embeddings in Gru4Rec. Best performing methods are indicated in boldface.
| Method | MPR | DCG@20 | Recall@20 |
|---|---|---|---|
| Random embedding | 0.1642 | 0.296 | 0.582 |
| Neural embedding | 0.0890 | | 0.799 |
| Feedback Jaccard based Fisher embedding | 0.0853 | 0.437 | 0.794 |
| Content based Fisher embedding | 0.0985 | 0.405 | 0.757 |
| Feedback and Content combination | | 0.446 | |
Figure 8. Linear combination weights for Feedback Jaccard and content based Fisher embedding models.
Figure 9. Performance of the different Gru4Rec based models in case of different item supports.
Figure 10. Recall@20 as the function of item support for the Netflix data set.
Experiments with combination of collaborative filtering for the first quantile (based on KDE estimation of 25%) of the MovieLens data. Best performing methods are indicated in boldface.
| Method | MPR | Recall@20 | DCG@20 |
|---|---|---|---|
| Cosine | 0.4978 | 0.0988 | 0.0553 |
| Jaccard | 0.4978 | 0.0988 | 0.0547 |
| ECP | 0.4976 | 0.0940 | 0.0601 |
| EIR | 0.3203 | 0.1291 | 0.0344 |
| FC Cosine | 0.3583 | 0.1020 | 0.0505 |
| FD Cosine | 0.2849 | 0.1578 | 0.0860 |
| FC Jaccard | 0.3354 | 0.1770 | |
| FD Jaccard | | | 0.1010 |
| FC ECP | 0.2504 | 0.0940 | 0.0444 |
| FD ECP | 0.4212 | 0.1626 | 0.0856 |
| FC EIR | 0.4125 | 0.0861 | 0.0434 |
| FD EIR | 0.4529 | 0.1068 | 0.0560 |
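The MPR column is presumably the mean percentile rank, for which lower values are better. A minimal Python sketch under that assumption follows. The normalization used (0.0 for the first position, 1.0 for the last) is one common convention and is not confirmed by this section, and the function names are hypothetical.

```python
def percentile_rank(ranked, target):
    """Position of `target` in the full ranking, scaled to [0, 1] (0.0 = ranked first).

    Raises ValueError if `target` is not in `ranked`.
    """
    if len(ranked) < 2:
        return 0.0
    return ranked.index(target) / (len(ranked) - 1)

def mean_percentile_rank(rankings, targets):
    """Average percentile rank of the held-out target over all test cases."""
    ranks = [percentile_rank(r, t) for r, t in zip(rankings, targets)]
    return sum(ranks) / len(ranks)

# Toy example: the same five-item ranking, with targets at the top, middle and bottom.
ranking = ["a", "b", "c", "d", "e"]
mpr = mean_percentile_rank([ranking] * 3, ["a", "c", "e"])
print(mpr)
```

Under this convention an uninformative ordering averages about 0.5, which is consistent with the plain Cosine, Jaccard and ECP rows in the table sitting just below 0.5.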
Experiments over the first quantile (based on KDE estimations of 25%). Best performing methods are indicated in boldface.
| Metric | Method | MovieLens | Goodreads | Yahoo! Music | Netflix |
|---|---|---|---|---|---|
| MPR | Cosine | 0.5024 | 0.4995 | 0.5 | 0.5028 |
| MPR | Jaccard | 0.5024 | 0.4995 | 0.5 | 0.5028 |
| MPR | ECP | 0.4974 | 0.5004 | 0.4999 | 0.4968 |
| MPR | EIR | 0.3279 | 0.482 | 0.2437 | 0.3395 |
| MPR | FC Jaccard | 0.2665 | 0.3162 | 0.2456 | 0.4193 |
| MPR | FD Jaccard | | | | |
| MPR | FC + FD JC | 0.3652 | 0.2751 | 0.1319 | 0.3792 |
| Recall@20 | Cosine | 0.0988 | 0.0966 | 0.0801 | 0.1254 |
| Recall@20 | Jaccard | 0.0988 | 0.0966 | 0.0801 | 0.1254 |
| Recall@20 | ECP | 0.0893 | 0.0956 | 0.0801 | 0.0954 |
| Recall@20 | EIR | 0.1212 | 0.0996 | 0.1324 | 0.1033 |
| Recall@20 | FC Jaccard | 0.1834 | 0.1084 | 0.1358 | 0.1845 |
| Recall@20 | FD Jaccard | | 0.0917 | | 0.1636 |
| Recall@20 | FC + FD JC | 0.118 | | 0.101 | |
| DCG@20 | Cosine | 0.0518 | 0.0505 | 0.044 | 0.0739 |
| DCG@20 | Jaccard | 0.0518 | 0.0505 | 0.044 | 0.0733 |
| DCG@20 | ECP | 0.0528 | 0.0505 | 0.044 | 0.0772 |
| DCG@20 | EIR | 0.0405 | 0.0635 | 0.05 | 0.1198 |
| DCG@20 | FC Jaccard | 0.1045 | 0.0517 | 0.0663 | 0.106 |
| DCG@20 | FD Jaccard | | 0.0462 | | 0.0971 |
| DCG@20 | FC + FD JC | 0.071 | | 0.0559 | |
Experiments over the first two quantiles (based on KDE estimations of 50%). Best performing methods are indicated in boldface.
| Metric | Method | MovieLens | Goodreads | Yahoo! Music | Netflix |
|---|---|---|---|---|---|
| MPR | Cosine | 0.5145 | 0.4995 | 0.5002 | 0.5017 |
| MPR | Jaccard | 0.5143 | 0.4995 | 0.5002 | 0.5014 |
| MPR | ECP | 0.4836 | 0.5004 | 0.4997 | 0.4953 |
| MPR | EIR | 0.3474 | 0.482 | 0.2495 | 0.3522 |
| MPR | FC Jaccard | 0.3181 | 0.3162 | 0.2452 | 0.4534 |
| MPR | FD Jaccard | | | | |
| MPR | FC + FD JC | 0.3167 | 0.2751 | 0.1357 | 0.3634 |
| Recall@20 | Cosine | 0.1099 | 0.0966 | 0.0958 | 0.1792 |
| Recall@20 | Jaccard | 0.1099 | 0.0966 | 0.0958 | 0.1789 |
| Recall@20 | ECP | 0.1001 | 0.0956 | 0.0958 | 0.0863 |
| Recall@20 | EIR | 0.1066 | 0.0996 | 0.1109 | 0.0914 |
| Recall@20 | FC Jaccard | 0.137 | 0.1084 | 0.121 | 0.1683 |
| Recall@20 | FD Jaccard | | 0.0917 | | 0.1448 |
| Recall@20 | FC + FD JC | 0.0981 | | 0.1034 | |
| DCG@20 | Cosine | 0.0572 | 0.0505 | 0.0532 | 0.0987 |
| DCG@20 | Jaccard | 0.0574 | 0.0505 | 0.0532 | 0.097 |
| DCG@20 | ECP | 0.0541 | 0.0505 | 0.0532 | 0.1104 |
| DCG@20 | EIR | 0.0474 | 0.0635 | 0.0459 | 0.1283 |
| DCG@20 | FC Jaccard | 0.0729 | 0.0517 | 0.0628 | 0.0973 |
| DCG@20 | FD Jaccard | | 0.0462 | | 0.0833 |
| DCG@20 | FC + FD JC | 0.0538 | | 0.0567 | |
Experiments over the first three quantiles (based on KDE estimations of 75%). Best performing methods are indicated in boldface.
| Metric | Method | MovieLens | Goodreads | Yahoo! Music | Netflix |
|---|---|---|---|---|---|
| MPR | Cosine | 0.5223 | 0.4992 | 0.4989 | 0.4912 |
| MPR | Jaccard | 0.5203 | 0.4992 | 0.4989 | 0.4865 |
| MPR | ECP | 0.4668 | 0.5007 | 0.501 | 0.4866 |
| MPR | EIR | 0.3578 | 0.4663 | 0.254 | 0.3775 |
| MPR | FC Jaccard | 0.4406 | 0.3257 | 0.256 | 0.4491 |
| MPR | FD Jaccard | 0.3987 | | | |
| MPR | FC + FD JC | | 0.2871 | 0.1507 | 0.3613 |
| Recall@20 | Cosine | | 0.0979 | 0.0958 | 0.1996 |
| Recall@20 | Jaccard | 0.1226 | 0.0979 | 0.0958 | 0.1588 |
| Recall@20 | ECP | 0.096 | 0.0961 | 0.0927 | 0.0689 |
| Recall@20 | EIR | 0.1052 | 0.1048 | 0.1206 | 0.0724 |
| Recall@20 | FC Jaccard | 0.1225 | 0.1182 | 0.1316 | 0.2023 |
| Recall@20 | FD Jaccard | 0.1133 | 0.0891 | | 0.0983 |
| Recall@20 | FC + FD JC | 0.0969 | | 0.1215 | |
| DCG@20 | Cosine | 0.0655 | 0.0499 | 0.0528 | 0.1127 |
| DCG@20 | Jaccard | 0.0655 | 0.0499 | 0.0528 | 0.0913 |
| DCG@20 | ECP | 0.0588 | 0.0499 | 0.0528 | 0.1655 |
| DCG@20 | EIR | 0.0495 | 0.0584 | 0.0545 | 0.1382 |
| DCG@20 | FC Jaccard | | 0.0582 | 0.0681 | 0.114 |
| DCG@20 | FD Jaccard | 0.0587 | 0.0467 | | 0.0542 |
| DCG@20 | FC + FD JC | 0.0506 | | 0.0686 | |