Daniel Thilo Schroeder, Pedro G Lind, Johannes Langguth, Luk Burchard, Konstantin Pogorelov.
Abstract
Online social networks are ubiquitous, have billions of users, and produce large amounts of data. While platforms like Reddit are based on a forum-like organization where users gather around topics, Facebook and Twitter implement a concept in which individuals represent the primary entity of interest. This makes them natural testbeds for exploring individual behavior in large social networks. Underlying these individual-based platforms is a network whose "friend" or "follower" edges are binary and therefore do not necessarily reflect the level of acquaintance between pairs of users. In this paper, we present the network of acquaintance "strengths" underlying the German Twittersphere. To that end, we make use of the full non-verbal information contained in tweet-retweet actions to uncover the graph of social acquaintances among users, beyond pure binary edges. The social connectivity between pairs of users is weighted by keeping track of the frequency of shared content and the time elapsed between publication and sharing. Moreover, we present a preliminary topological analysis of the German Twitter network. Finally, making the data describing the weighted German Twitter network of acquaintances available, we discuss how to apply this framework as a basis for investigating spreading phenomena of particular contents.
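The weighting idea in the abstract (retweet frequency combined with the time elapsed between publication and sharing) can be illustrated with a minimal sketch. The record format `(author, retweeter, delay_seconds)`, the function name `acquaintance_weights`, the exponential decay, and the time scale `tau` are all assumptions made for illustration; they are not the weight definition of Eq. (3), which is not reproduced in this record.

```python
import math
from collections import defaultdict


def acquaintance_weights(retweets, tau=3600.0):
    """Aggregate retweet records into per-pair statistics.

    retweets: iterable of (author, retweeter, delay_seconds) tuples
              (hypothetical record format).
    tau:      decay time scale in seconds (assumed, not from the paper).

    Returns {(author, retweeter): (retweet_count, decayed_sum)}, where
    decayed_sum adds exp(-delay / tau) for each retweet, so that quick
    retweets contribute more than slow ones.
    """
    counts = defaultdict(int)
    decayed = defaultdict(float)
    for author, retweeter, delay in retweets:
        if delay < 0:
            raise ValueError("a retweet cannot precede the original tweet")
        pair = (author, retweeter)
        counts[pair] += 1
        decayed[pair] += math.exp(-delay / tau)
    return {pair: (counts[pair], decayed[pair]) for pair in counts}
```

Any monotonically decreasing function of the delay could replace the exponential here; the point is only that each directed pair accumulates both a count and a time-sensitive score.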
Year: 2022 PMID: 35260708 PMCID: PMC8902855 DOI: 10.1038/s41598-022-07961-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Twitter lists including the initial accounts from which the data was collected.
| Twitter list | Twitter list |
|---|---|
| kunigkeit/pod-parteien-politiker | wahl_beobachter/mdl-sachsen-anhalt |
| VeHoltz/bundespolitik | wahl_beobachter/mdl-baden-w-rttemberg |
| MarcusSchwarze/fakes | wahl_beobachter/mdl-nrw |
| wahl_beobachter/botschaften | wahl_beobachter/mdl-saarland |
| wahl_beobachter/kandidaten-europawahl | wahl_beobachter/mdl-schleswig-holstein |
| wahl_beobachter/bundesministerien | wahl_beobachter/mdl-mecklenburg-vorpommen |
| wahl_beobachter/bundestagsfraktionen | wahl_beobachter/abgeordnetenhaus-agh |
| wahl_beobachter/mdb-bundestag | wahl_beobachter/bundesregierung |
| wahl_beobachter/mdl-bayern | wahl_beobachter/politikwissenschaftler |
| wahl_beobachter/mdl-hessen | wahl_beobachter/ministeriums-twitterati |
| wahl_beobachter/mdl-niedersachsen | wahl_beobachter/alle-25-parteien-ep2014 |
| wahl_beobachter/mdbb-bremen | wahl_beobachter/deutsche-mep-2019-2024 |
| wahl_beobachter/mdl-brandenburg | wahl_beobachter/open-government-hamburg |
| wahl_beobachter/mdl-sachsen | wahl_beobachter/mdhb-hamburg |
| wahl_beobachter/mdl-rheinland-pfalz1 | AfD/verifizierte-accounts |
| wahl_beobachter/mdl-th-ringen | AfD/bundestagsabgeordnete |
Figure 1. (a) Distribution of the number of tweets that a user j retweets from another user i. (b) Distribution of the time-span between the instant user i tweeted a tweet and user j retweeted it. (c) Distribution of the sum in Eq. (3) for each pair of users. (d) Distribution of the part in Eq. (3) depending only on . (e) Distribution of the values of the weights as defined in Eq. (3).
Figure 2. (a) Comparison between the degree distribution of the main connected component of the network derived from the Twitter dataset and that of the known follower network. The existence of an edge in the Twitter follower network is associated with the -score of the edge in our derived weighted network. (b) Size distribution of the connected components not included in the main component. None of these components is larger than 1000 nodes, which justifies neglecting them and focusing the topological analysis on the main component, which has approximately thirty million nodes (see text).
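A size distribution like the one in panel (b) can be computed along the following lines. This is a sketch under stated assumptions: the caption does not say whether weak or strong connectivity is meant, so the code ignores edge direction (weak connectivity), and the function names `component_sizes` and `minor_component_distribution` are hypothetical.

```python
from collections import Counter, defaultdict, deque


def component_sizes(edges):
    """Return the component sizes, largest first, ignoring edge direction."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search over one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def minor_component_distribution(sizes):
    """Count how often each size occurs, excluding the largest component."""
    return Counter(sizes[1:])
```

For a graph with tens of millions of nodes an array-based union-find would likely be more memory-efficient than Python sets; the version above favors readability.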
Figure 3. Illustration of topological properties characterizing first and second neighborhoods. The figure does not reflect any structural properties of the derived network itself and is for illustration only. Three adjacent nodes , and are shown. The neighborhood of each node is highlighted with the color of the corresponding node, and the node itself is labeled with the respective and (see Eq. 4). It should be noted that the node in the center is connected with an outgoing edge to and an incoming edge to . Summing the incoming and outgoing edges of ’s neighbors gives the value for the corresponding or (see Eq. 6). The same applies to the node but with the outgoing edge from . Therefore, we receive or .
Figure 4. (a) Distribution of the total number of neighbors of a node i. The inset shows the distribution of the size of the influencing neighborhood of each user i. (b) Distribution of the total weighted degree , which sums up the activity and the impact of each user.
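The total weighted degree in panel (b) can be sketched as the sum of a node's incoming and outgoing edge weights. The edge convention used here is an assumption, since Eq. (4) is not reproduced in this record: an edge `(source, target, weight)` is taken to point from the original author to the retweeter, so that outgoing weight stands in for "impact" and incoming weight for "activity". The function name `weighted_degrees` is hypothetical.

```python
from collections import defaultdict


def weighted_degrees(weighted_edges):
    """Return {node: (in_strength, out_strength, total)} for a directed graph.

    weighted_edges: iterable of (source, target, weight) tuples.
    Assumed convention (not confirmed by the source): in_strength plays
    the role of activity and out_strength the role of impact.
    """
    in_strength = defaultdict(float)
    out_strength = defaultdict(float)
    for source, target, weight in weighted_edges:
        out_strength[source] += weight
        in_strength[target] += weight

    nodes = set(in_strength) | set(out_strength)
    return {
        node: (
            in_strength.get(node, 0.0),
            out_strength.get(node, 0.0),
            in_strength.get(node, 0.0) + out_strength.get(node, 0.0),
        )
        for node in nodes
    }
```

If the opposite edge direction is used, the two roles simply swap; the total is unaffected.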
Figure 5. Comparing activity with impact and influencing neighborhoods with influenced neighborhoods: (a) : those who influenced the most are those who are influenced the least; (b) : activity and impact are not strongly correlated; (c) : how much is the “influenced” level of i correlated with its respective “influencing” level?
Figure 6. Relation between first and second neighborhoods, identifying regions with different user behaviors. (a) : how is the size of the influencing neighborhood of neighbors of a certain user i correlated with the size of their influencing neighborhood? There is a critical size beyond which disassortativity is observed. (b) : how is the size of the influenced neighborhood of user i correlated with the size of its neighbors' influencing neighborhood? Here one observes three regions: (I) one with few influencing neighbors and a lot of influenced neighbors (the “stars” of Twitter); (II) one “normal” region, where there is a typical value of the influenced neighborhood; and (III) a region with many influencing neighbors, who basically do not influence anyone. (c) : how much are my neighbors influenced by others? Here one observes complete disassortativity. (d) : how is the size of the influenced neighborhood of neighbors of user i correlated with the size of their influenced neighborhood? The behavior is similar to that in (c).
Figure 7. (a) Clustering coefficient spectrum of the derived network, (b) the respective shortest path length (SPL) spectrum and (c) betweenness centrality. While the SPL, weighted by the values of , shows a unimodal spectrum, with a mode at low values followed by an approximately linear decrease, both the clustering coefficient and the betweenness seem to have a polynomial decay. For performance reasons, only smaller sets of N nodes sampled from the main component were selected. Similar behavior is observed for different sizes of the sampled set, indicating that the results in this figure are representative of the entire network. To create the subgraphs, we chose a random node from the largest component and performed a (directed) breadth-first search from it.
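The sampling procedure described in the caption (a directed breadth-first search from a random node of the largest component, stopped once N nodes are collected) could look roughly like this. The adjacency representation (a dict mapping each node to its successors), the stopping rule, and the names `random_start`, `bfs_sample`, and `induced_edges` are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import deque


def random_start(nodes, seed=None):
    """Pick a random start node; `nodes` should be the largest component."""
    return random.Random(seed).choice(list(nodes))


def bfs_sample(successors, start, n_nodes):
    """Collect up to n_nodes nodes by directed BFS from `start`.

    successors: dict mapping node -> iterable of out-neighbors.
    Returns the nodes in the order they were discovered.
    """
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < n_nodes:
        node = queue.popleft()
        for neighbor in successors.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
                if len(visited) == n_nodes:
                    break
    return visited


def induced_edges(successors, nodes):
    """Return the directed edges whose endpoints both lie in `nodes`."""
    keep = set(nodes)
    return [(u, v) for u in nodes for v in successors.get(u, ()) if v in keep]
```

Repeating the sampling for several values of `n_nodes` and comparing the resulting spectra mirrors the representativeness check mentioned in the caption.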