James Flamino, Bowen Gong, Frederick Buchanan, Boleslaw K Szymanski.
Abstract
Online social media provides massive open-ended platforms for users of a wide variety of backgrounds, interests, and beliefs to interact and debate, facilitating countless discussions across a myriad of subjects. With numerous unique voices being lent to the ever-growing information stream, it is essential to consider how the types of conversations that result from a social media post represent the post itself. We hypothesize that the biases and predispositions of users cause them to react to different topics in different ways, not necessarily as intended by the sender. In this paper, we introduce a set of unique features that capture patterns of discourse, allowing us to empirically explore the relationship between a topic and the conversations it induces. Utilizing "microscopic" trends to describe "macroscopic" phenomena, we set a paradigm for analyzing information dissemination through the user reactions that arise from a topic, eliminating the need to analyze the text of the discussions. Using a Reddit dataset, we find that our features not only enable classifiers to accurately distinguish between content genres, but can also identify more subtle semantic differences in content under a single topic and isolate outliers whose subject matter is substantially different from the norm.
Keywords: information dynamics; network entropy; semantic analysis; topic modeling
Year: 2021 PMID: 34945948 PMCID: PMC8700409 DOI: 10.3390/e23121642
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Representation of a directed tree network T.
Symbols and definitions.
| Symbol | Definition |
|---|---|
| | Some Subreddit within Reddit. |
| | All users subscribed to |
| | A submission within |
| | A tree network representing the structure of hierarchically linked comments made by responders of a submission. |
| | A user-generated comment within |
| | The head node. This is the text submitted to the Subreddit that triggers the comment cascade. |
| | A directed edge within |
| | The set of branches found in |
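The definitions above describe a comment cascade as a directed tree rooted at the head node. A minimal sketch of that structure follows. It rests on two assumptions that the table does not confirm: edges are stored as `(parent, reply)` pairs, and a "branch" is a path from the head node to a leaf comment. The node ids and the helper names `build_tree` and `branches` are hypothetical.

```python
from collections import defaultdict


def build_tree(edges):
    """Map each node id to the list of its direct replies.

    Assumption: each edge is a (parent, reply) pair.
    """
    children = defaultdict(list)
    for parent, reply in edges:
        children[parent].append(reply)
    return children


def branches(children, head):
    """Return every path from the head node down to a leaf comment.

    Assumption: a "branch" is a head-to-leaf path.
    """
    paths = []
    stack = [[head]]
    while stack:
        path = stack.pop()
        replies = children.get(path[-1], [])
        if not replies:
            # No replies: this comment ends a branch.
            paths.append(path)
        for reply in replies:
            stack.append(path + [reply])
    return paths


# Hypothetical cascade: head node "h" with four comments.
edges = [("h", "c1"), ("h", "c2"), ("c1", "c3"), ("c3", "c4")]
tree = build_tree(edges)
found = sorted(branches(tree, "h"))
# Depth of the deepest branch, counted in edges.
max_depth = max(len(path) for path in found) - 1
```

Structural quantities such as the number of branches or the deepest branch can then be read directly from `found`, without touching the comment text.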
Classifier scores for combinations of Subreddit labels.
| | politics, gaming, soccer | politics, gaming | politics, soccer | gaming, soccer | politics, atheism |
|---|---|---|---|---|---|
| | 0.81 | 0.82 | 0.96 | 0.91 | 0.76 |
Figure 2. PCA of response features for 1000 submissions from , clustered by K-means (K = 2). Word clouds of keywords extracted from the text of the submissions in each cluster are shown above the clusters, with the size of each keyword corresponding to the magnitude of its PageRank score.
Comparison of keyword frequency between the two identified topic clusters in .
| | Game Thread | Playoff | Series | Friday | Trash Talk |
|---|---|---|---|---|---|
| | 204 | 76 | 35 | 1 | 1 |
| | 1 | 1 | 22 | 31 | 32 |
Figure 3. RBO score of top 10 keyword lists versus Euclidean distance of response feature clusters for (a) where and (b) where . RBO’s weight parameter p is set to .
Figure 4. Comparison of clustering patterns with response feature K-means (a,c) and LDA (b,d) on the Subreddits (a,b) and (c,d).
Figure 5. PCA of response features for 1000 submissions, clustered with K-means, from (a) where and (b) , where .
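Figure 3 compares keyword lists using rank-biased overlap (RBO). As a rough illustration, the sketch below computes the standard truncated form of RBO: the agreement between the two lists' top-d prefixes, weighted by p^(d-1), summed over depths and scaled by (1 - p). It assumes each list has no repeated items. Whether the paper uses this truncated form or an extrapolated variant is not stated here, and the value of p used below is a placeholder, not the value from the caption. The keyword orderings are invented for the example.

```python
def rbo_truncated(list_a, list_b, p):
    """Truncated rank-biased overlap of two ranked lists, for 0 < p < 1.

    Assumption: items within each list are unique.
    """
    depth = min(len(list_a), len(list_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(list_a[d - 1])
        seen_b.add(list_b[d - 1])
        # Fraction of the top-d prefixes that the two lists share.
        agreement = len(seen_a & seen_b) / d
        score += p ** (d - 1) * agreement
    return (1 - p) * score


# Hypothetical rankings built from keywords in the frequency table.
cluster_one = ["Game Thread", "Playoff", "Series"]
cluster_two = ["Trash Talk", "Friday", "Series"]
p = 0.9  # placeholder weight, not taken from the source

same = rbo_truncated(cluster_one, cluster_one, p)
different = rbo_truncated(cluster_one, cluster_two, p)
```

With truncated lists, two identical rankings of length k score 1 - p^k rather than 1, so scores are only comparable between lists of the same length and the same p.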