Abstract
The LFM-1b dataset was recently proposed to foster research and evaluation in music retrieval and music recommender systems (Schedl, Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR), New York, 2016). It contains more than one billion music listening events created by more than 120,000 users of Last.fm. Each listening event is characterized by artist, album, and track name, and includes a timestamp. Basic demographic information and a selection of more elaborate listener-specific descriptors are also included for the anonymized users. In this article, we describe LFM-1b's acquisition and content and compare it to existing datasets. We furthermore provide an extensive statistical analysis of the dataset, including basic properties of the item sets, demographic coverage, distribution of listening events (e.g., over artists and users), and aspects related to music preference and consumption behavior (e.g., temporal features and mainstreaminess of listeners). Exploiting country information of users and genre tags of artists, we also create taste profiles for populations and determine similar and dissimilar countries in terms of their populations' music preferences. Finally, we illustrate the dataset's usage in a simple artist recommendation task, whose results are intended to serve as a baseline against which more elaborate techniques can be assessed.
Keywords: Dataset; Experimentation; Music information retrieval; Music recommender systems; Music taste analysis; Statistical analysis
Year: 2017 PMID: 28357190 PMCID: PMC5350199 DOI: 10.1007/s13735-017-0118-y
Source DB: PubMed Journal: Int J Multimed Inf Retr
Description of the files constituting the LFM-1b dataset
| File | Content |
|---|---|
| LFM-1b_users.txt | |
| LFM-1b_users_additional.txt | |
| LFM-1b_artists.txt | |
| LFM-1b_albums.txt | |
| LFM-1b_tracks.txt | |
| LFM-1b_LEs.txt | |
| LFM-1b_LEs.mat | idx_users (vector), idx_artists (vector), LEs (sparse matrix) |
Attributes of same emphasis are connected to each other
Description of the additional user features on preference and consumption behavior
| Attribute | Description |
|---|---|
| user-id | User identifier |
| novelty_artist_avg_month | Novelty score, averaged over time windows of 1 month |
| novelty_artist_avg_6months | Novelty score, averaged over time windows of 6 months |
| novelty_artist_avg_year | Novelty score, averaged over time windows of 12 months |
| mainstreaminess_avg_month | Mainstreaminess score, averaged over time windows of 1 month |
| mainstreaminess_avg_6months | Mainstreaminess score, averaged over time windows of 6 months |
| mainstreaminess_avg_year | Mainstreaminess score, averaged over time windows of 12 months |
| mainstreaminess_global | Mainstreaminess score, computed for the entire period of the user’s activity on Last.fm |
| cnt_listeningevents | Total number of the user’s listening events (playcounts) included in the dataset |
| cnt_distinct_tracks | Number of unique tracks listened to by the user |
| cnt_distinct_artists | Number of unique artists listened to by the user |
| cnt_listeningevents_per_week | Average number of listening events per week |
| relative_le_per_weekday[1–7] | Fraction of listening events for each weekday (starting on Monday) among all weekly plays, averaged over the user’s entire listening history |
| relative_le_per_hour[0–23] | Fraction of listening events for each hour of the day (starting with the time span 0:00–0:59) among all 24 h, averaged over the user’s entire listening history |
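The weekday and hour-of-day features can be illustrated with a minimal sketch. It assumes that timestamps are Unix epoch seconds interpreted in UTC and that each fraction is simply the share of a user's events falling into the slot; the text does not confirm either detail, and the function name and toy timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def relative_le_distributions(timestamps):
    """Return (weekday_fractions, hour_fractions) for one user's listening events.

    timestamps: iterable of Unix epoch seconds (UTC is an assumption).
    weekday_fractions has 7 entries starting on Monday; hour_fractions has
    24 entries starting with the span 0:00-0:59.
    """
    moments = [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in timestamps]
    total = len(moments)
    if total == 0:
        return [0.0] * 7, [0.0] * 24
    weekday_counts = Counter(m.weekday() for m in moments)  # Monday == 0
    hour_counts = Counter(m.hour for m in moments)
    weekdays = [weekday_counts[d] / total for d in range(7)]
    hours = [hour_counts[h] / total for h in range(24)]
    return weekdays, hours


# Toy data, not taken from the dataset: three events on Monday 2014-01-06
# (09:00, 10:00, 18:00 UTC) and one on Tuesday 2014-01-07 (09:00 UTC).
events = [1388998800, 1389002400, 1389031200, 1389085200]
weekday_share, hour_share = relative_le_distributions(events)
```

Both returned lists sum to 1 for any user with at least one listening event, so they can be compared across users with very different playcounts.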
Statistics of items in the dataset
| Item | Number |
|---|---|
| Users | 120,322 |
| Artists | 3,190,371 |
| Albums | 15,991,038 |
| Tracks | 32,291,134 |
| Listening events | 1,088,161,692 |
| Unique <user, artist> pairs | 61,534,450 |
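A few derived quantities follow directly from these counts. The sketch below only performs arithmetic on the figures in the table; the variable names are illustrative.

```python
# Figures copied from the item statistics table.
n_users = 120_322
n_artists = 3_190_371
n_listening_events = 1_088_161_692
n_user_artist_pairs = 61_534_450

# Mean number of listening events per user.
les_per_user = n_listening_events / n_users

# Mean number of distinct artists per user.
artists_per_user = n_user_artist_pairs / n_users

# Share of all possible <user, artist> cells that are non-empty.
density = n_user_artist_pairs / (n_users * n_artists)

print(f"listening events per user: {les_per_user:.1f}")
print(f"distinct artists per user: {artists_per_user:.1f}")
print(f"user-artist matrix density: {density:.6%}")
```

With these figures the user-artist matrix is very sparse: fewer than 0.02% of its cells are filled.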
Statistics on country distribution of users. All countries with more than 1000 users are shown
| Country | No. of users | Pct. in dataset (%) |
|---|---|---|
| US | 10,255 | 18.581 |
| RU | 5024 | 9.103 |
| DE | 4578 | 8.295 |
| UK | 4534 | 8.215 |
| PL | 4408 | 7.987 |
| BR | 3886 | 7.041 |
| FI | 1409 | 2.553 |
| NL | 1375 | 2.491 |
| ES | 1243 | 2.252 |
| SE | 1231 | 2.230 |
| UA | 1143 | 2.071 |
| CA | 1077 | 1.951 |
| FR | 1055 | 1.912 |
| N/A | 65,132 | 54.131 |
Fig. 1 Histogram of age distribution
Statistics on gender distribution of users
| Gender | No. of users | Pct. in dataset (%) |
|---|---|---|
| Male | 39,969 | 71.666 |
| Female | 15,802 | 28.334 |
| N/A | 64,551 | 53.649 |
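The percentages in the country and gender tables appear to use two different denominators: known categories are expressed relative to users with a known value, while the N/A row is expressed relative to all users. This reading is inferred from the published figures rather than stated in the text. A minimal sketch that reproduces the figures under that assumption (the helper `shares` is hypothetical):

```python
def shares(known_counts, n_missing):
    """Percentages under the inferred convention of the demographic tables.

    Known categories are relative to users with a known value; the N/A row
    is relative to all users (an assumption, see the lead-in).
    """
    n_known = sum(known_counts.values())
    n_total = n_known + n_missing
    pct = {k: 100 * v / n_known for k, v in known_counts.items()}
    pct["N/A"] = 100 * n_missing / n_total
    return pct, n_total


# Gender table: all known categories are listed, so the total can be rebuilt.
gender_pct, n_users = shares({"Male": 39_969, "Female": 15_802}, n_missing=64_551)

# Country table: only the largest countries are listed, so the number of
# users with a known country is derived from the total instead.
n_country_known = n_users - 65_132
us_pct = 100 * 10_255 / n_country_known
country_na_pct = 100 * 65_132 / n_users

for label, value in gender_pct.items():
    print(f"{label}: {value:.3f}%")
print(f"US: {us_pct:.3f}%  country N/A: {country_na_pct:.3f}%")
```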
Fig. 2 Distribution of listening events by artist, log–log-scaled
Fig. 3 Distribution of listening events by user, log–log-scaled
Statistics of the distribution of listening events among users and artists
| | Users | Artists |
|---|---|---|
| Playcount (PC) | | |
| Unique artists/users | | |
| Mean PC per artist/user | | |
| Median PC per artist/user | | |
Values after the ± sign indicate standard deviations
Fig. 4 Distribution of listening events over weekdays
Fig. 5 Distribution of listening events over hours of day. Each time range encompasses 0–59 min after the hour indicated on the x-axis
Statistics of novelty and mainstreaminess scores
| | Novelty | Mainstreaminess |
|---|---|---|
| Min. | 0.000 | 0.000 |
| 25-perc. | 0.354 | 0.016 |
| Median | 0.496 | 0.045 |
| 75-perc. | 0.647 | 0.079 |
| Max. | 1.000 | 0.393 |
| Mean | 0.504 | 0.054 |
| Std. | 0.211 | 0.048 |
Number of distinct genres and styles used by populations of different countries (absolute and relative to the 1998 genres in the Freebase list)
| Country | Genres (abs.) | Genres (rel.) (%) |
|---|---|---|
| US | 1111 | 55.55 |
| UK | 1103 | 55.15 |
| DE | 1100 | 55.00 |
| RU | 1097 | 54.85 |
| NL | 1081 | 54.05 |
| PL | 1077 | 53.85 |
| SE | 1062 | 53.10 |
| BR | 1053 | 52.65 |
| ES | 1043 | 52.15 |
| FI | 1042 | 52.10 |
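The relative column is the absolute genre count divided by the size of the genre list. As a quick arithmetic check, the sketch below computes the coverage for the first three rows with the 1998 genres named in the caption and, for comparison, with a denominator of 2000. The listed percentages coincide with the latter, and 1998 gives slightly higher values. The comparison denominator is an observation from the numbers, not something the text states.

```python
FREEBASE_GENRES_STATED = 1998  # size of the Freebase list as given in the caption
COMPARISON_DENOMINATOR = 2000  # assumption: value that reproduces the listed column

# Absolute counts and listed percentages copied from the first three table rows.
table = {"US": (1111, 55.55), "UK": (1103, 55.15), "DE": (1100, 55.00)}


def coverage(n_genres, n_total):
    """Percentage of the genre list used by a country's listeners."""
    return 100 * n_genres / n_total


for country, (n_abs, listed) in table.items():
    with_stated = coverage(n_abs, FREEBASE_GENRES_STATED)
    with_comparison = coverage(n_abs, COMPARISON_DENOMINATOR)
    print(f"{country}: listed {listed:.2f}, /1998 {with_stated:.2f}, /2000 {with_comparison:.2f}")
```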
Fig. 6 Radar plot of genre profiles for the top 47 countries and most important genres in the Allmusic dictionary
Relative amount of listening events of the ten most frequent genres and styles for selected countries, using the Freebase dictionary
| UK | | Japan | | China | | Iran | |
|---|---|---|---|---|---|---|---|
| Genre tag | LEs | Genre tag | LEs | Genre tag | LEs | Genre tag | LEs |
| Rock | 0.037763 | Rock | 0.037155 | Rock | 0.036785 | Rock | 0.042020 |
| Indie | 0.028798 | Alternative | 0.033602 | Alternative | 0.033730 | Alternative | 0.037685 |
| Pop | 0.028575 | Pop | 0.031633 | Pop | 0.032775 | Metal | 0.029204 |
| Alternative rock | 0.025095 | J-pop | 0.028724 | Electronic | 0.026281 | Experimental | 0.026783 |
| Electronic | 0.022812 | Indie | 0.025772 | Indie | 0.025942 | Alternative rock | 0.023297 |
| Indie rock | 0.022592 | Electronic | 0.023923 | Singer-songwriter | 0.021522 | Indie | 0.021951 |
| Experimental | 0.020482 | Alternative rock | 0.020440 | Pop rock | 0.018610 | Progressive | 0.021625 |
| Singer-songwriter | 0.017092 | Experimental | 0.018628 | Alternative rock | 0.018543 | Ambient | 0.020136 |
| Electronica | 0.016494 | Electronica | 0.018051 | Chill out | 0.018081 | Electronic | 0.019818 |
| UK 82 | 0.016274 | Pop rock | 0.016519 | Experimental | 0.016750 | Pop | 0.019022 |
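A genre profile of this kind can be sketched as a normalized tag count over a population's listening events. The sketch assumes that every listening event contributes one count to each genre tag of its artist and that counts are normalized by the total number of tag assignments; the text does not spell out the exact weighting, and all names and data below are hypothetical.

```python
from collections import Counter


def genre_profile(listening_events, artist_genres):
    """Relative amount of listening events per genre tag.

    listening_events: iterable of artist names, one entry per listening event.
    artist_genres: dict mapping artist name -> list of genre tags.
    Each event adds one count to every tag of its artist (an assumption).
    """
    counts = Counter()
    for artist in listening_events:
        counts.update(artist_genres.get(artist, []))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


# Hypothetical toy data, not taken from the dataset.
artist_genres = {"A": ["rock", "indie"], "B": ["pop"], "C": ["rock"]}
events = ["A", "A", "B", "C"]

profile = genre_profile(events, artist_genres)
top = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
```

Sorting the profile by weight and keeping the first ten entries yields a ranking in the same form as the table.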
Fig. 7 Similarities between selected countries according to Freebase genre profiles
Average similarity of genre profiles to other countries. On the left side, countries with highest mainstreaminess; on the right side, countries with lowest mainstreaminess
| Country | Avg. sim. | Country | Avg. sim. |
|---|---|---|---|
| NL | 0.67677 | EE | 0.53230 |
| UK | 0.67371 | BG | 0.52833 |
| BE | 0.66060 | ID | 0.45704 |
| CA | 0.65586 | GR | 0.45162 |
| ES | 0.64279 | HU | 0.45156 |
| PT | 0.64021 | RO | 0.43108 |
| FR | 0.64000 | JP | 0.36894 |
| AU | 0.63917 | CN | 0.36652 |
| NO | 0.63090 | IR | 0.36369 |
| RU | 0.62583 | SK | 0.30447 |
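The average similarity of a country to all others can be sketched as follows. Cosine similarity between genre profiles is used here as an assumption; the similarity measure is not named in the text above, and the profiles are hypothetical toy values.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two sparse genre profiles (dicts tag -> weight)."""
    dot = sum(w * q.get(tag, 0.0) for tag, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def average_similarity(profiles):
    """Mean similarity of each country's profile to all other countries."""
    result = {}
    for country, p in profiles.items():
        others = [cosine(p, q) for other, q in profiles.items() if other != country]
        result[country] = sum(others) / len(others) if others else 0.0
    return result


# Hypothetical toy profiles, not values from the dataset.
profiles = {
    "X": {"rock": 0.6, "pop": 0.4},
    "Y": {"rock": 0.5, "pop": 0.5},
    "Z": {"metal": 1.0},
}
avg_sim = average_similarity(profiles)
```

In this toy example, country Z shares no genre with the others and therefore receives the lowest average similarity, mirroring how countries with unusual profiles end up on the right side of the table.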
Fig. 8 Precision/recall plot of various recommendation algorithms applied to a random subset of 1,100 users from the LFM-1b dataset. Tenfold cross-validation was used. Precision and recall are plotted for various numbers of recommended artists N, ranging from 2 to 148 using a step size of 6
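Precision and recall for a given number of recommended artists N can be sketched for a single user as below. This is a generic formulation, not necessarily the exact evaluation protocol behind the figure; the function name and the toy data are hypothetical.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision and recall of the top-n recommended artists for one user.

    recommended: ranked list of artist identifiers (best first).
    relevant: set of held-out artists the user actually listened to.
    Precision is computed over the artists actually returned, so a list
    shorter than n is not penalized (a design choice of this sketch).
    """
    top_n = recommended[:n]
    hits = sum(1 for artist in top_n if artist in relevant)
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked recommendations and held-out artists for one user.
ranked = ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"]
held_out = {"a2", "a5", "a9"}

# One (N, precision, recall) point per list length.
curve = [(n, *precision_recall_at_n(ranked, held_out, n)) for n in (2, 8)]
```

Averaging these per-user values over all test users and folds, for each N, gives one point of a precision/recall curve.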