Desislava Hristova, Anastasios Noulas, Chloë Brown, Mirco Musolesi, Cecilia Mascolo.
Abstract
Online social systems are multiplex in nature, as multiple links may exist between the same two users across different social media. In this work, we study the geo-social properties of multiplex links that span more than one social network, and we apply their structural and interaction features to the problem of link prediction across social networking services. Exploring the intersection of two popular online platforms, Twitter and the location-based social network Foursquare, we represent the two together as a composite multilayer online social network, where each platform represents a layer in the network. We find that pairs of users connected on both services have greater neighbourhood similarity and are more similar in terms of their social and spatial properties on both platforms than pairs who are connected on just one of the social networks. Our evaluation, which aims to shed light on the implications of multiplexity for the link generation process, shows that we can successfully predict links across social networking services. In addition, we show how combining information from multiple heterogeneous networks in a multilayer configuration can provide new insights into user interactions on online social networks and can significantly improve link prediction systems, with valuable applications to social bootstrapping and friend recommendations. © Hristova et al. 2016.
Keywords: link prediction; media multiplexity; multilayer networks; online social networks
Year: 2016 PMID: 32355599 PMCID: PMC7175673 DOI: 10.1140/epjds/s13688-016-0087-z
Source DB: PubMed Journal: EPJ Data Sci ISSN: 2193-1127 Impact factor: 3.184
Figure 1. Multilayer model. Multilayer model of OSNs (Panel A) with different link types (Panel B): I. Multiplex link; II. Single-layer link on one layer; and III. Single-layer link on the other layer.
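The link types in Figure 1 can be illustrated with a minimal sketch. It assumes undirected links and a plain edge-list representation for each layer; the function name `classify_links` and the toy user names are hypothetical and not taken from the paper.

```python
def classify_links(twitter_edges, foursquare_edges):
    """Label each undirected user pair by the layers on which it is connected."""
    # Assumption: links are undirected, so (u, v) and (v, u) are the same pair.
    t = {frozenset(e) for e in twitter_edges}
    f = {frozenset(e) for e in foursquare_edges}
    return {
        "multiplex": t & f,        # connected on both layers
        "twitter_only": t - f,     # single-layer link on Twitter
        "foursquare_only": f - t,  # single-layer link on Foursquare
    }


# Toy data (hypothetical users).
twitter = [("ana", "bo"), ("bo", "cy"), ("cy", "di")]
foursquare = [("bo", "ana"), ("di", "eve")]
links = classify_links(twitter, foursquare)
```

In this toy example only the pair (ana, bo) appears on both layers, so it is the single multiplex link; every other pair is a single-layer link.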
Dataset properties: number of users (nodes); number of multiplex links (edges); number of Twitter and Foursquare only edges; average global and core degrees; activity and venues per city.
| | | | | |
|---|---|---|---|---|
| Users (nodes) | 6,401 | 2,883 | 1,705 | 10,989 |
| Multiplex links (edges) | 9,101 | 5,486 | 1,517 | 16,104 |
| Twitter-only edges | 13,623 | 7,949 | 1,776 | 23,348 |
| Foursquare-only edges | 6,394 | 4,202 | 863 | 11,459 |
| Average global degree | 4.55 | 6.12 | 2.44 | 4.63 |
| Average core degree | 1.42 | 1.9 | 0.89 | 1.47 |
| | 2,509,802 | 1,288,865 | 632,780 | 4,431,447 |
| | 228,422 | 105,250 | 46,823 | 380,495 |
| | 24,110 | 11,773 | 6,934 | 42,817 |
Figure 2. CCDF of the Adamic/Adar overlap metric. Complementary cumulative distribution function of the log Adamic/Adar index for the different network configurations, grouped by link type: Twitter overlap (A), Foursquare overlap (B), Global overlap (C), Core overlap (D). Each panel shows the fraction of links with a value greater than x.
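As a rough illustration of the quantities plotted in Figure 2, the sketch below computes the standard Adamic/Adar index (the sum of 1/log(degree) over common neighbours) on four network configurations and evaluates an empirical CCDF. It assumes undirected links, and it assumes that the "global" configuration is the union of the two layers' edges and the "core" configuration is their intersection; those definitions, the helper names, and the toy edges are assumptions of this sketch, not details confirmed by the text above.

```python
import math


def neighbourhoods(edges):
    """Build symmetric neighbour sets from a collection of undirected edges."""
    neigh = {}
    for edge in edges:
        u, v = tuple(edge)
        neigh.setdefault(u, set()).add(v)
        neigh.setdefault(v, set()).add(u)
    return neigh


def adamic_adar(neigh, u, v):
    """Sum of 1 / log(degree(z)) over the common neighbours z of u and v."""
    common = neigh.get(u, set()) & neigh.get(v, set())
    # With symmetric neighbour sets and u != v, every common neighbour has
    # degree >= 2, so the logarithm is strictly positive.
    return sum(1.0 / math.log(len(neigh[z])) for z in common)


def ccdf(values, x):
    """Fraction of values strictly greater than x."""
    return sum(1 for val in values if val > x) / len(values)


# Toy layers (hypothetical users), stored as sets of unordered pairs.
twitter = {frozenset(e) for e in [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]}
foursquare = {frozenset(e) for e in [("a", "c"), ("b", "c"), ("a", "e"), ("b", "e")]}

configs = {
    "twitter": twitter,
    "foursquare": foursquare,
    "global": twitter | foursquare,  # assumed: union of the two layers
    "core": twitter & foursquare,    # assumed: intersection of the two layers
}
overlap_ab = {name: adamic_adar(neighbourhoods(edges), "a", "b")
              for name, edges in configs.items()}
```

For the pair (a, b), the core overlap counts only the shared neighbour c, while the global overlap also counts d and e, so the global value is the larger of the two. Taking the logarithm of each value before passing a list of them to `ccdf` would give curves of the kind the caption describes.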
Figure 3. Exponent matrix. Colour gradient indicates the optimal exponents, chosen to maximise the difference between the medians of the multiplex and non-existent link types.
Figure 4. Interaction features' distribution for each link type. Panels A-C show the distributions of Twitter mentions (A), common hashtags (B), and number of colocations (C) in log scale. Panel D shows the distribution of distance in km between the home locations of users according to the type of link they have (the top 10% of distances are excluded for figure readability), and Panels E and F show the distributions of the multilayer similarity and interaction features.
Summary of link features, with separate notation for the Twitter neighbourhood and the Foursquare neighbourhood.
Figure 5. Link prediction results. ROC curves for the Random Forest classifier and Area Under the Curve (AUC) scores for each city dataset. Panel A shows the results for predicting Foursquare links using Twitter features, while Panel B displays the results for the reverse task of predicting Twitter links using the Foursquare geographical features. Panels C-E focus on the second prediction task, predicting multiplex links using Twitter features (C), Foursquare features (D), and multilayer features (E).
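The AUC scores reported in Figure 5 summarise how well a classifier ranks true links above non-links. The sketch below shows one common way to compute such a score from classifier outputs, using the rank interpretation of AUC (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one). It does not reproduce the Random Forest itself; the function name `roc_auc` and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier output: 1 = link exists, 0 = no link.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1]
auc = roc_auc(labels, scores)
```

A score of 1.0 means every true link is ranked above every non-link, and 0.5 corresponds to a ranking no better than chance. The pairwise comparison is quadratic in the number of examples, which is fine for a sketch but slow on datasets of the size listed in the dataset table.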