Arthur Flexer, Jeff Stevens.
Abstract
This paper is concerned with the impact of hubness, a general problem of machine learning in high-dimensional spaces, on a real-world music recommendation system based on visualisation of a k-nearest neighbour (knn) graph. Due to a problem of measuring distances in high dimensions, hub objects are recommended over and over again while anti-hubs are nonexistent in recommendation lists, resulting in poor reachability of the music catalogue. We present mutual proximity graphs, which are an alternative to knn and mutual knn graphs, and are able to avoid hub vertices having abnormally high connectivity. We show that mutual proximity graphs yield much better graph connectivity, resulting in improved reachability compared to knn graphs, mutual knn graphs and mutual knn graphs enhanced with minimum spanning trees, while simultaneously reducing the negative effects of hubness.
Keywords: Music recommendation; curse of dimensionality; graphs; hubness
Year: 2017 PMID: 29348779 PMCID: PMC5750815 DOI: 10.1080/09298215.2017.1354891
Source DB: PubMed Journal: J New Music Res ISSN: 0929-8215 Impact factor: 1.143
Figure 1. Soundpark web player showing recommendations as a visualisation of the underlying knn graph and as a text list.
Analysis of knn, muknn, muknn+msp and mp graphs.
| | knn | muknn | muknn+msp | mp |
|---|---|---|---|---|
| maxhub | 419 | 5 | 145 | 30 |
| #hub | 291 | 0 | 23 | 2 |
| #anti | 2661 | 4566 | 0 | 641 |
| | 13.97 | 1.43 | 10.15 | 0.91 |
| reach | 65.28% | 40.43% | 100.00% | 91.62% |
| #edges | 38,325 | 5790 | 17,616 | 38,325 |
| | 19.68 | 2.40 | 4.50 | 19.50 |
| scc | 29.11% | 11.89% | 100.00% | 85.26% |
| #scc | 408 | 652 | 0 | 102 |
| | 2.87 | 3.36 | – | 2.87 |
| | 55.78% | 37.79% | 52.10% | 52.20% |
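The hub-related rows of the table (maxhub, #hub, #anti, reach) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the k-occurrence of a vertex is the number of neighbour lists it appears in, that an anti-hub has k-occurrence zero, and that reach is the share of vertices appearing in at least one list. The hub threshold (`hub_factor * k`), the Euclidean distance, the random data, and all function names are assumptions made for illustration only. The sketch also shows the usual mutual knn rule, which keeps an edge only when both endpoints list each other.

```python
import math
import random


def knn_lists(points, k):
    """Return, for each point index, the indices of its k nearest neighbours."""
    lists = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        lists.append(others[:k])
    return lists


def k_occurrence(lists):
    """Count how often each vertex appears in the neighbour lists of others."""
    counts = [0] * len(lists)
    for neighbours in lists:
        for j in neighbours:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardised moment; positive values indicate a long right tail."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def mutual_knn_edges(lists):
    """Keep an undirected edge (i, j), i < j, only if each lists the other."""
    return {(i, j) for i, nb in enumerate(lists) for j in nb
            if i < j and i in lists[j]}


def summarise(lists, hub_factor=5):
    """Hub statistics of a knn graph; hub_factor is an assumed threshold."""
    k = len(lists[0])
    counts = k_occurrence(lists)
    return {
        "maxhub": max(counts),
        "n_hub": sum(c > hub_factor * k for c in counts),
        "n_anti": sum(c == 0 for c in counts),
        "reach": 100.0 * sum(c > 0 for c in counts) / len(counts),
        "skew": skewness(counts),
    }


# Hypothetical data: 200 random points in 50 dimensions, k = 5.
random.seed(0)
points = [[random.gauss(0, 1) for _ in range(50)] for _ in range(200)]
lists = knn_lists(points, k=5)
stats = summarise(lists)
mutual = mutual_knn_edges(lists)
print(stats, len(mutual))
```

Under this reading, reach and the number of anti-hubs are two views of the same count: every anti-hub is a vertex that no recommendation list ever shows.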
Figure 2. Distribution of k-occurrences for knn, muknn, muknn+msp and mp graphs. Notice the strong skewness in knn, muknn, and muknn+msp, while the mp graph is much more evenly dispersed with few hubs and anti-hubs.
Figure 3. Statistics for self-avoiding random walks: black bars for normal vertices, dark grey for hubs, light grey for anti-hubs, white for missing.