Takafumi Kanamori, Naoya Osugi.
Abstract
The quality of online services depends heavily on the accuracy of the recommendations they can provide to users. To improve that accuracy, researchers have proposed various similarity measures based on the assumption that similar people like or dislike similar items or people. Additionally, statistical models, such as stochastic block models, have been used to understand network structures. In this paper, we discuss the relationship between similarity-based methods and statistical models using Bernoulli mixture models and the expectation-maximization (EM) algorithm. The Bernoulli mixture model naturally leads to a completely positive matrix as the similarity matrix. We prove that most of the commonly used similarity measures yield completely positive matrices as the similarity matrix. Based on this relationship, we propose an algorithm to transform the similarity matrix into a Bernoulli mixture model. Such a correspondence provides a statistical interpretation of similarity-based methods. Using this algorithm, we conduct numerical experiments on synthetic data and on real-world data from an online dating site, and report the efficiency of the recommendation system based on Bernoulli mixture models.
Keywords: Bernoulli mixture models; completely positive matrix; recommendation; similarity measures
Year: 2019 PMID: 33267416 PMCID: PMC7515218 DOI: 10.3390/e21070702
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
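The abstract refers to fitting Bernoulli mixture models with the EM algorithm. As a rough illustration only, the following is a minimal sketch of textbook EM for a mixture of independent Bernoulli variables applied to binary rows (for example, rows of an adjacency matrix). It is not the authors' model or implementation: the function name `em_bernoulli_mixture`, its parameters, the random initialization, and the clipping constant `eps` are all assumptions made for this example.

```python
import math
import random


def em_bernoulli_mixture(rows, n_components, n_iter=50, seed=0, eps=1e-6):
    """Textbook EM for a mixture of independent Bernoullis (illustrative sketch).

    rows: list of equal-length lists with 0/1 entries.
    Returns (weights, means, resp, log_liks); resp holds the responsibilities
    from the last E-step, log_liks the log-likelihood seen at each E-step.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    weights = [1.0 / n_components] * n_components
    # Hypothetical initialization: random means kept away from 0 and 1.
    means = [[rng.uniform(0.25, 0.75) for _ in range(d)]
             for _ in range(n_components)]
    log_liks, resp = [], []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row,
        # computed in the log domain for numerical stability.
        resp, total = [], 0.0
        for x in rows:
            log_joint = []
            for k in range(n_components):
                lp = math.log(weights[k])
                for j in range(d):
                    p = means[k][j]
                    lp += math.log(p) if x[j] else math.log(1.0 - p)
                log_joint.append(lp)
            m = max(log_joint)
            norm = m + math.log(sum(math.exp(v - m) for v in log_joint))
            total += norm
            resp.append([math.exp(v - norm) for v in log_joint])
        log_liks.append(total)
        # M-step: responsibility-weighted frequencies, clipped to (eps, 1 - eps).
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / n, eps)
            for j in range(d):
                s = sum(r[k] * x[j] for r, x in zip(resp, rows))
                means[k][j] = min(max(s / max(nk, eps), eps), 1.0 - eps)
        z = sum(weights)
        weights = [w / z for w in weights]
    return weights, means, resp, log_liks


# Toy data: two groups of rows with complementary patterns.
toy = [[1, 1, 1, 0, 0, 0]] * 5 + [[0, 0, 0, 1, 1, 1]] * 5
weights, means, resp, log_liks = em_bernoulli_mixture(toy, n_components=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

On this toy input the two groups of rows end up in different components; which component index each group receives depends on the random initialization.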
Definition of similarity measures between two nodes. The right column shows whether the similarity measure is a completely positive similarity kernel (CPSK); see Section 4.
| Similarity | Definition/Condition | CPSK |
|---|---|---|
| Common neighbors | | |
| Parameter-dependent | | |
| Jaccard coefficient | | |
| Sørensen index | | |
| Hub depressed | | |
| Hub promoted | | × |
| SimRank | | |
| Adamic-Adar coefficient | | |
| Resource allocation | | |
| Content-based similarity | | |
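The neighborhood-based measures named in the table can be illustrated with their commonly used textbook definitions. The sketch below is a hedged example, not the paper's notation: the function `similarity_scores`, the `neighbors` mapping, and the exponent `lam` are hypothetical names, and the paper's own definitions and conditions may differ. SimRank and content-based similarity are omitted because they need more than neighbor sets.

```python
import math


def similarity_scores(neighbors, i, j, lam=0.5):
    """Common textbook neighborhood similarities between nodes i and j.

    neighbors: dict mapping each node to the set of nodes it links to.
    lam: exponent of the parameter-dependent index (an assumed default;
         lam=0 gives common neighbors, lam=0.5 the cosine index).
    """
    ni, nj = neighbors[i], neighbors[j]
    common = ni & nj
    c = len(common)
    ki, kj = len(ni), len(nj)

    # Degree of each target z: how many nodes list z as a neighbor.
    deg = {}
    for targets in neighbors.values():
        for z in targets:
            deg[z] = deg.get(z, 0) + 1

    def ratio(num, den):
        # Convention for this sketch: similarity is 0 when the denominator is 0.
        return num / den if den else 0.0

    return {
        "common_neighbors": float(c),
        "parameter_dependent": ratio(c, (ki * kj) ** lam),
        "jaccard": ratio(c, len(ni | nj)),
        "sorensen": ratio(2 * c, ki + kj),
        "hub_depressed": ratio(c, max(ki, kj)),
        "hub_promoted": ratio(c, min(ki, kj)),
        # Targets of degree 1 are skipped because log(1) = 0.
        "adamic_adar": sum(1 / math.log(deg[z]) for z in common if deg[z] > 1),
        "resource_allocation": sum(1 / deg[z] for z in common),
    }


# Hypothetical toy graph: nodes "a", "b", "c" pointing to items 1..4.
toy_neighbors = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3}}
scores = similarity_scores(toy_neighbors, "a", "b")
```

For the common-neighbors measure, the full similarity matrix equals the product of the binary adjacency matrix with its own transpose, which is one direct way to see that it is completely positive.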
Figure 1. Edges from X to Y. The bold edges mean that there are many edges between the connected groups. The broken edges mean that there are few edges between the connected groups.
Mean average precision (MAP) values of similarity-based methods under synthetic data. The bold face indicates the top two MAP scores.
| Similarity | MAP (MAD) |
|---|---|
| Common Neighbors | 1.889 (±0.413) |
| Cosine | 1.907 (±0.431) |
| Jaccard Coefficient | 2.115 (±0.421) |
| Sørensen Index | 2.021 (±0.369) |
| Hub Depressed | 2.231 (±0.376) |
| Hub Promoted | 2.053 (±0.301) |
| SimRank | 2.853 (±0.610) |
| Adamic-Adar coefficient | 2.188 (±0.587) |
| Resource Allocation | 1.950 (±0.516) |
| Bernoulli Mixture ( | |
| Bernoulli Mixture ( | |
| Bernoulli Mixture ( | 2.821 (±1.164) |
| Bernoulli Mixture ( | 1.382 (±0.449) |
| Bernoulli Mixture ( | 1.535 (±0.555) |
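The tables report mean average precision (MAP). For readers unfamiliar with the metric, the following is a minimal sketch of one standard definition: average precision is the mean of precision-at-rank over the ranks where a relevant item appears, and MAP averages this over users. The function names and input layout are assumptions for illustration, and the exact variant and scaling used for the reported values are not stated in this excerpt.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked recommendation list.

    ranked: items in recommended order (best first).
    relevant: collection of items the user actually liked.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(rankings, relevants):
    """Mean of average precision over all users in `rankings`.

    rankings: dict user -> ranked list of items.
    relevants: dict user -> collection of relevant items.
    """
    users = list(rankings)
    if not users:
        return 0.0
    return sum(
        average_precision(rankings[u], relevants.get(u, ())) for u in users
    ) / len(users)


# Hypothetical example with two users.
toy_rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
toy_relevants = {"u1": {"a", "c"}, "u2": {"y"}}
ap_u1 = average_precision(toy_rankings["u1"], toy_relevants["u1"])
map_score = mean_average_precision(toy_rankings, toy_relevants)
```

Here `ap_u1` is (1/1 + 2/3) / 2 and the second user's average precision is 1/2, so `map_score` is their mean.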
MAP values of updated Bernoulli mixture models with the SM-to-BM algorithm under synthetic data. The results of Bernoulli mixture models with and are reported. The bold face indicates the top two MAP scores in each column.
| Similarity | | | |
|---|---|---|---|
| Common Neighbors | |||
| Cosine | 4.725 (±1.119) | 4.549 (±1.119) | 4.549 (±1.119) |
| Jaccard Coefficient | 5.373 (±0.643) | 5.373 (±0.643) | |
| Sørensen Index | 5.013 (±2.223) | 4.964 (±2.223) | 4.964 (±2.223) |
| Hub Depressed | 5.417 (±1.756) | 5.262 (±1.756) | 5.262 (±1.756) |
| Hub Promoted | 5.120 (±0.563) | 5.165 (±0.563) | 5.165 (±0.563) |
| SimRank | 3.848 (±1.630) | 4.377 (±1.279) | 4.379 (±1.264) |
| Adamic-Adar coefficient | 4.348 (±1.170) | 4.404 (±1.170) | 4.404 (±1.170) |
| Resource Allocation | 4.435 (±0.552) | 4.385 (±0.552) | 4.385 (±0.552) |
| BerMix. (Random ini.) | 1.297 (±0.446) | ||

| Similarity | | | |
|---|---|---|---|
| Common Neighbors | 5.059 (±0.939) | ||
| Cosine | 4.442 (±0.901) | 4.070 (±0.901) | 3.948 (±0.901) |
| Jaccard Coefficient | 5.167 (±1.745) | 4.765 (±1.745) | 4.792 (±1.745) |
| Sørensen Index | |||
| Hub Depressed | 4.668 (±1.807) | 4.391 (±1.807) | |
| Hub Promoted | 5.078 (±0.702) | 4.815 (±0.702) | 5.008 (±0.702) |
| SimRank | 4.121 (±1.274) | 3.592 (±1.150) | 3.615 (±1.447) |
| Adamic-Adar coefficient | 5.284 (±1.166) | 4.909 (±1.166) | 5.084 (±1.166) |
| Resource Allocation | 4.884 (±0.751) | 4.499 (±0.751) | 4.263 (±0.751) |
| BerMix. (Random ini.) | 1.080 (±0.446) | 3.705 (±1.925) | 4.810 (±1.268) |
MAP scores for real-world data. The bold face indicates the top two MAP scores in each column.
| Similarity | Recomm. of | Recomm. of |
|---|---|---|
| | MAP (MAD) | MAP (MAD) |
| Common Neighbors:Interest | 6.267 (±0.806) | 2.893 (±0.343) |
| Common Neighbors:Attract | 2.053 (±0.276) | 8.813 (±0.757) |
| Cosine:Interest | 3.496 (±0.324) | 3.699 (±0.309) |
| Cosine:Attract | 2.746 (±0.276) | 6.108 (±0.546) |
| Jaccard Coefficient:Interest | 4.098 (±0.373) | 4.066 (±0.294) |
| Jaccard Coefficient:Attract | 3.288 (±0.362) | 7.777 (±0.724) |
| Sørensen Index:Interest | 4.205 (±0.363) | 3.996 (±0.293) |
| Sørensen Index:Attract | 3.205 (±0.319) | 7.910 (±0.620) |
| Hub Depressed:Interest | 4.370 (±0.369) | 4.106 (±0.291) |
| Hub Depressed:Attract | 3.366 (±0.379) | 8.364 (±0.613) |
| Hub Promoted:Interest | 1.959 (±0.300) | 2.691 (±0.334) |
| Hub Promoted:Attract | 1.662 (±0.262) | 2.641 (±0.263) |
| SimRank:Interest | 2.079 (±0.164) | 6.336 (±0.423) |
| SimRank:Attract | 5.100 (±0.775) | 3.158 (±0.193) |
| Adamic-Adar coefficient:Interest | 6.216 (±0.701) | 2.970 (±0.308) |
| Adamic-Adar coefficient:Attract | 2.209 (±0.267) | 8.300 (±0.632) |
| Resource Allocation:Interest | 5.521 (±0.679) | 3.557 (±0.262) |
| Resource Allocation:Attract | 2.713 (±0.298) | 6.875 (±0.660) |
| Bernoulli Mixture ( | 4.578 (±0.734) | |
| Bernoulli Mixture ( | 12.825 (±2.323) | |
| Bernoulli Mixture ( | 5.055 (±0.813) | |
| Bernoulli Mixture ( | 10.362 (±1.981) | 10.348 (±2.514) |
| Bernoulli Mixture ( | 4.263 (±0.772) | 15.013 (±4.042) |
| Bernoulli Mixture ( | 8.786 (±1.195) | |
| Bernoulli Mixture ( | 5.664 (±1.873) | 14.288 (±6.409) |
| Bernoulli Mixture ( | 10.029 (±3.933) | 8.436 (±3.929) |
| Bernoulli Mixture ( | 2.910 (±0.436) | 8.525 (±1.199) |
| Bernoulli Mixture ( | 5.980 (±1.612) | 5.119 (±0.464) |