Linus W Dietz1, Mete Sertkan2, Saadi Myftija1, Sameera Thimbiri Palage1, Julia Neidhardt2, Wolfgang Wörndl1.
Abstract
Characterizing items for content-based recommender systems is a challenging task in complex domains such as travel and tourism. In the case of destination recommendation, no feature set can be readily used as a similarity ground truth, which makes it hard to evaluate the quality of destination characterization approaches. Furthermore, the process should scale well for many items, be cost-efficient, and most importantly correct. To evaluate which data sources are most suitable, we investigate 18 characterization methods that fall into three categories: venue data, textual data, and factual data. We make these data models comparable using rank agreement metrics and reveal which data sources capture similar underlying concepts. To support choosing more suitable data models, we capture a desired concept using an expert survey and evaluate our characterization methods toward it. We find that the textual models to characterize cities perform best overall, with data models based on factual and venue data being less competitive. However, we show that data models with explicit features can be optimized by learning weights for their features.
Keywords: content-based filtering; data mining; destination characterization; expert evaluation; rank agreement metrics; recommender systems
Year: 2022 PMID: 35464121 PMCID: PMC9022027 DOI: 10.3389/fdata.2022.829939
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
Overview of the data sources for characterizing cities.
| Category | Data source | Description | Entities | Amount | Abbreviation |
|---|---|---|---|---|---|
| Venue data | Foursquare | LBSN | Venues | 2,468,736 | FSQ |
| | OpenStreetMap | Collaborative map | Map entities | 3,106,856 | OSM |
| Textual | Wikipedia | Collaborative encyclopedia | Documents | 1,150,719 words | WP |
| | Wikitravel | Travel-related Wiki | Documents | 984,777 words | WT |
| | Google travel | Travel information | Documents | 56,499 words | GT |
| Factual | Webologen | Travel information provider | City features | 49 tourism facts / city | TF |
| | Nomad list | Collaborative travel information | City features | 8 features / city | Nomadlist |
| | Seven factor model | Scientific characterization | Derived factors | 7 factors / city | 7FM-2018 |
| | Geographic location | Geographic location | Latitude, longitude | 1 coordinate pair / city | GEO |
Figure 1. Geographic distribution of the characterized cities. Map data © OpenStreetMap contributors, see https://www.openstreetmap.org/copyright.
Figure 2. Venue category distribution for a subset of cities.
Three expert opinions on the city of Munich are contrasted with the WP-jaccard ranked lists. The ranking of Expert 1 is closer to the ranked list than the two others.
| Rank | WP-jaccard | Expert 1 | Expert 2 | Expert 3 |
|---|---|---|---|---|
| 1 | Vienna | Salzburg | Vienna | Frankfurt |
| 2 | Dusseldorf | Vienna | Milan | Brussels |
| 3 | Leipzig | Cologne | Dusseldorf | Heidelberg |
| 4 | Berlin | Graz | Paris | Budapest |
| 5 | Frankfurt | Milan | Boston | Hamburg |
| 6 | Heidelberg | Edinburgh | Luxembourg | Barcelona |
| 7 | Cologne | Dusseldorf | Berlin | Vienna |
| 8 | Nuremberg | Hamburg | Cologne | Prague |
| 9 | Salzburg | Amsterdam | Vancouver | Berlin |
| 10 | Copenhagen | Brussels | Dubai | Rome |
| | | 146 | 292 | 263 |
| | | 14 | 15 | 20 |
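To illustrate how a model's ranked list can be compared with an expert's list, the following is a minimal sketch of Spearman's footrule for top-k lists. It assumes the common convention of placing a city that is missing from one list at rank k + 1; the function name `footrule_top_k` and the `location` parameter are illustrative, and this is not necessarily the modified variant used in the study, so its output should not be expected to reproduce the scores in the table. The example further assumes that the first city column above is the WP-jaccard list and the second is one expert's list.

```python
def footrule_top_k(list_a, list_b, location=None):
    """Spearman's footrule between two top-k lists.

    A city missing from one list is treated as if it sat at rank
    `location` (default: k + 1). This is an assumed convention,
    not necessarily the one used in the study.
    """
    k = max(len(list_a), len(list_b))
    if location is None:
        location = k + 1
    rank_a = {city: i for i, city in enumerate(list_a, start=1)}
    rank_b = {city: i for i, city in enumerate(list_b, start=1)}
    cities = set(rank_a) | set(rank_b)
    # Sum of absolute rank differences over all cities seen in either list.
    return sum(
        abs(rank_a.get(city, location) - rank_b.get(city, location))
        for city in cities
    )


# Lists taken from the Munich example (column assignment is an assumption).
wp_jaccard = ["Vienna", "Dusseldorf", "Leipzig", "Berlin", "Frankfurt",
              "Heidelberg", "Cologne", "Nuremberg", "Salzburg", "Copenhagen"]
expert = ["Salzburg", "Vienna", "Cologne", "Graz", "Milan",
          "Edinburgh", "Dusseldorf", "Hamburg", "Amsterdam", "Brussels"]

print(footrule_top_k(wp_jaccard, expert))
```

A lower value means closer agreement: identical lists yield 0, and the value grows as shared cities move apart or as the lists share fewer cities.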
Figure 3. User interface of the expert survey.
Expert annotator behavior: share of cities selected from the shortlist vs. the full alphabetical list, and percentage of the same cities selected.
| City | Share (%) | Share (%) | Same cities selected (%) |
|---|---|---|---|
| Amsterdam | 20.00 | 80.00 | 17.65 |
| Bangkok | 30.43 | 69.57 | 27.78 |
| Barcelona | 40.00 | 60.00 | 12.57 |
| Berlin | 0.00 | 100.00 | 36.51 |
| Brussels | 55.00 | 45.00 | 11.11 |
| Chicago | 30.00 | 70.00 | 17.65 |
| Copenhagen | 22.73 | 77.27 | 15.79 |
| Hamburg | 23.33 | 76.67 | 27.78 |
| Hong Kong | 0.00 | 100.00 | 40.00 |
| London | 8.06 | 91.94 | 21.92 |
| Madrid | 18.00 | 82.00 | 29.50 |
| Miami | 48.00 | 52.00 | 26.59 |
| Moscow | 10.00 | 90.00 | 25.00 |
| Mumbai | 20.00 | 80.00 | 11.11 |
| Munich | 23.33 | 76.67 | 17.92 |
| New York City | 13.33 | 86.67 | 24.39 |
| Nice | 28.85 | 71.15 | 14.41 |
| Osaka | 10.00 | 90.00 | 11.11 |
| Oslo | 0.00 | 100.00 | 37.50 |
| Paris | 9.38 | 90.62 | 31.02 |
| Rome | 20.00 | 80.00 | 21.69 |
| Saint Petersburg | 43.33 | 56.67 | 17.92 |
| San Diego | 35.00 | 65.00 | 53.85 |
| Seville | 3.12 | 96.88 | 46.98 |
| Singapore | 11.63 | 88.37 | 30.72 |
| Stockholm | 12.50 | 87.50 | 32.83 |
| Vancouver | 10.00 | 90.00 | 26.32 |
| Vienna | 18.92 | 81.08 | 37.16 |
| Overall | 20.18 | 79.82 | 25.88 |
Figure 4. Crosswise analysis of all data models using Kendall's Tau method. The entries are sorted using hierarchical clustering; the dendrogram reveals families of data sources. The colors are scaled according to Kendall's Tau with the bright yellow corresponding to 0 (the diagonal) and dark red representing no correlation (Random).
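A crosswise comparison of this kind can be sketched as a pairwise distance matrix over the rankings produced by each data model. The sketch below assumes a normalized Kendall tau distance (the fraction of city pairs that two rankings order differently), which is 0 on the diagonal; the model names and toy rankings are hypothetical, and the hierarchical clustering step that orders the figure is not shown.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of item pairs that the two rankings order differently.

    Both rankings must contain exactly the same items.
    Returns 0.0 for identical rankings and 1.0 for reversed ones.
    """
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same items")
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)


def crosswise_matrix(rankings):
    """Pairwise distances between all models, keyed by (model, model)."""
    names = sorted(rankings)
    return {
        (a, b): kendall_tau_distance(rankings[a], rankings[b])
        for a in names
        for b in names
    }


# Hypothetical toy rankings of the same four cities by three models.
rankings = {
    "model_a": ["Vienna", "Berlin", "Rome", "Paris"],
    "model_b": ["Vienna", "Rome", "Berlin", "Paris"],
    "model_c": ["Paris", "Rome", "Berlin", "Vienna"],
}
matrix = crosswise_matrix(rankings)
for (a, b), distance in sorted(matrix.items()):
    print(a, b, round(distance, 3))
```

Feeding such a matrix into a hierarchical clustering routine would then group models that rank cities similarly, which is what the dendrogram in the figure conveys.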
Ranking of the different data sources using the modified rank agreement methods for top-k lists as well as MRR and Precision.
| Data model | Score | Data model | Score | Data model | Score | Data model | Score | Data model | Score |
|---|---|---|---|---|---|---|---|---|---|
| WP-jaccard | 297.011 | GEO | 18.284 | WP-jaccard | 0.101 | WP-word2vec | 0.186 | WP-jaccard | 0.304 |
| GEO | 304.091 | WP-jaccard | 18.750 | WP-word2vec | 0.101 | WT-word2vec | 0.182 | GEO | 0.302 |
| WP-word2vec | 318.080 | WP-word2vec | 19.068 | WT-word2vec | 0.100 | WP-jaccard | 0.178 | WT-jaccard | 0.297 |
| WT-word2vec | 322.489 | WT-jaccard | 19.227 | FSQ-2nd | 0.094 | GEO | 0.162 | WT-word2vec | 0.297 |
| WT-jaccard | 330.057 | WP-BERT | 19.568 | WP-BERT | 0.093 | WT-jaccard | 0.161 | WP-word2vec | 0.291 |
| FSQ-2nd | 330.420 | GT-word2vec | 19.852 | GEO | 0.093 | FSQ-2nd | 0.154 | WP-BERT | 0.279 |
| WP-BERT | 343.307 | WT-word2vec | 20.011 | WT-jaccard | 0.093 | WP-BERT | 0.147 | FSQ-2nd | 0.264 |
| GT-word2vec | 346.955 | FSQ-2nd | 20.625 | GT-word2vec | 0.088 | GT-word2vec | 0.139 | GT-word2vec | 0.263 |
| TF | 395.841 | WT-BERT | 20.818 | WT-BERT | 0.081 | WT-BERT | 0.139 | WT-BERT | 0.243 |
| GT-BERT | 396.375 | TF | 21.409 | TF | 0.075 | TF | 0.113 | TF | 0.231 |
| WT-BERT | 402.159 | GT-BERT | 21.477 | GT-BERT | 0.074 | GT-BERT | 0.103 | GT-BERT | 0.202 |
| GT-jaccard | 408.943 | GT-jaccard | 21.864 | GT-jaccard | 0.067 | OSM-2nd | 0.099 | OSM-2nd | 0.195 |
| 7FM-2018 | 457.909 | OSM-2nd | 22.375 | 7FM-2018 | 0.065 | Nomadlist | 0.092 | GT-jaccard | 0.187 |
| Nomadlist | 461.830 | Nomadlist | 22.420 | OSM-2nd | 0.065 | OSM-TOP | 0.090 | 7FM-2018 | 0.187 |
| FSQ-TOP | 506.500 | FSQ-TOP | 22.864 | Nomadlist | 0.063 | GT-jaccard | 0.087 | OSM-TOP | 0.180 |
| OSM-TOP | 516.114 | 7FM-2018 | 22.966 | OSM-TOP | 0.063 | 7FM-2018 | 0.086 | Nomadlist | 0.169 |
| OSM-2nd | 521.273 | OSM-TOP | 23.045 | FSQ-TOP | 0.054 | FSQ-TOP | 0.060 | FSQ-TOP | 0.122 |
| RANDOM | 649.398 | RANDOM | 23.341 | RANDOM | 0.039 | RANDOM | 0.033 | RANDOM | 0.073 |
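For readers unfamiliar with the two retrieval-style measures named in the caption, here is a minimal sketch of precision at k and mean reciprocal rank (MRR) in their textbook form. How exactly the study applies them to the expert lists is not stated in this excerpt, so the function names, the treatment of an expert's selection as an unordered set, and the toy data are all assumptions.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended cities that are in `relevant`."""
    top_k = recommended[:k]
    return sum(1 for city in top_k if city in relevant) / k


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant city, or 0.0 if none is found."""
    for rank, city in enumerate(recommended, start=1):
        if city in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (recommended, relevant) pairs."""
    queries = list(queries)
    return sum(reciprocal_rank(rec, rel) for rec, rel in queries) / len(queries)


# Hypothetical toy data: one model list and one expert selection.
model_list = ["Vienna", "Dusseldorf", "Leipzig", "Berlin"]
expert_pick = {"Dusseldorf", "Salzburg", "Berlin"}

p = precision_at_k(model_list, expert_pick, k=4)
rr = reciprocal_rank(model_list, expert_pick)
print(p, rr)
```

Unlike the distance-based rank agreement scores, both measures are higher when a data model agrees more with the experts.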
Optimization toward the Expert Opinion using Spearman's Footrule top-k.
| Data model | Baseline | Optimized | Improvement |
|---|---|---|---|
| Nomadlist | 461.83 | 426.27 | 7.70% |
| FSQ-TOP | 506.50 | 503.33 | 0.63% |
| FSQ-2nd | 330.42 | 312.47 | 5.43% |
| OSM-TOP | 516.11 | 508.38 | 1.50% |
| OSM-2nd | 521.27 | 490.33 | 5.94% |
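The idea of optimizing a feature-based data model toward the expert opinion can be sketched as a search over feature weights that minimizes the footrule distance between the model's ranked list and an expert's list. The sketch below uses an exhaustive grid search as a stand-in, since the optimization procedure used in the study is not described in this excerpt; the weighted Manhattan distance, the k + 1 placement of missing cities, all function names, and the toy feature vectors are assumptions. The last helper shows how the relative improvement in the table follows from the two scores.

```python
from itertools import product


def weighted_distance(u, v, weights):
    """Weighted Manhattan distance between two feature vectors."""
    return sum(w * abs(a - b) for a, b, w in zip(u, v, weights))


def rank_cities(base, features, weights, k):
    """Return the k cities closest to `base` under the weighted distance."""
    others = [city for city in features if city != base]
    others.sort(
        key=lambda city: weighted_distance(features[base], features[city], weights)
    )
    return others[:k]


def footrule_top_k(list_a, list_b):
    """Spearman's footrule; missing cities sit at rank k + 1 (assumed)."""
    location = max(len(list_a), len(list_b)) + 1
    rank_a = {c: i for i, c in enumerate(list_a, start=1)}
    rank_b = {c: i for i, c in enumerate(list_b, start=1)}
    return sum(
        abs(rank_a.get(c, location) - rank_b.get(c, location))
        for c in set(rank_a) | set(rank_b)
    )


def learn_weights(base, features, expert_list, grid=(0.0, 0.5, 1.0)):
    """Grid search for the weights with the lowest footrule distance."""
    k = len(expert_list)
    n_features = len(features[base])
    best_score, best_weights = None, None
    for weights in product(grid, repeat=n_features):
        if not any(weights):
            continue  # skip the all-zero vector, which ranks nothing
        score = footrule_top_k(rank_cities(base, features, weights, k), expert_list)
        if best_score is None or score < best_score:
            best_score, best_weights = score, weights
    return best_score, best_weights


def relative_improvement(baseline, optimized):
    """Percentage by which the distance shrinks after optimization."""
    return 100.0 * (baseline - optimized) / baseline


# Hypothetical toy data: four cities with two features each.
features = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0), "D": (3.0, 3.0)}
expert_list = ["C", "B"]

baseline = footrule_top_k(rank_cities("A", features, (1.0, 1.0), 2), expert_list)
best_score, best_weights = learn_weights("A", features, expert_list)
print(baseline, best_score, best_weights)

# Nomadlist row of the table: 461.83 before, 426.27 after optimization.
print(round(relative_improvement(461.83, 426.27), 2))
```

A grid search scales exponentially with the number of features, so for models with many features a gradient-free optimizer or a coarser grid would be the more practical choice.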