Cagri Toraman, Furkan Şahinuç, Eyup Halit Yilmaz, Ibrahim Batuhan Akkaya.
Abstract
Information spreads as individuals engage with other users in the underlying social network. Analysis of social engagements can therefore provide insight into how and why users engage with others in different activities. In this study, we aim to understand the driving factors behind four engagement types on Twitter, namely like, reply, retweet, and quote. We extensively analyze a diverse set of features that reflect user behaviors, as well as tweet attributes and semantics extracted with natural language processing, including the deep learning language model BERT. The performance of these features is assessed in a supervised engagement prediction task by learning social engagements from over 14 million multilingual tweets. Our experimental results suggest that users engage with tweets based on text semantics and content regardless of the tweet author, yet popular and trusted authors could be important for reply and quote. Users who actively liked and retweeted in the past are likely to maintain this behavior in the future, while this trend is not seen in the more complex engagement types, reply and quote. Moreover, users do not necessarily follow the behavior of other users with whom they have previously engaged. We further discuss the social insights obtained from the experimental results to better understand user behavior and social engagements in online social networks. Supplementary Information: The online version contains supplementary material available at 10.1007/s13278-022-00872-1.
Keywords: Natural language processing; Online social network; Social engagement; Text features; Tweet; User features
Year: 2022; PMID: 35378818; PMCID: PMC8968783; DOI: 10.1007/s13278-022-00872-1
Source DB: PubMed; Journal: Soc Netw Anal Min
Fig. 1. An illustration of the phases that we follow to understand social engagements
Fig. 2. An illustration of user and tweet features for supervised social engagement prediction
The list of the features that we extract and analyze in this study
| Group | Subgroup | Feature group | Feature name | Short description | Range |
|---|---|---|---|---|---|
| User | Engagee | Meta | Account Age | How long ago engagee created the account | [0, ∞) |
| | | | Influential | The ratio between followees and followers of engagee | (-∞, ∞) |
| | | | Verified | Verification status of tweet author | True or false |
| | Engager | Meta | Influential | The ratio between followees and followers of engager | (-∞, ∞) |
| | | | Verified | Verification status of engager | True or false |
| | | Activity | Conditional Activity | Prob. of engager having interacted with tweet author | [0, 1] |
| | | | Prior Activity | Prob. of engager having any engagement | [0, 1] |
| | | Social | Conditional Social | Prob. of engagement between the given engager and previous engagers of the given tweet | [0, 1] |
| | | Time | Conditional Time | Prob. of engager having any engagement in time slot | [0, 1] |
| Tweet | Meta | Meta | Media | Media type, if tweet contains any | Photo, video, GIF, or none |
| | | | Language | Language of tweet | {1,..,66} |
| | | | Type | Type of tweet (not true engagement label) | RT, quote, reply, or top |
| | Content | Hashtag | Hashtag Existence | Indication of existence of any hashtag in tweet | True or false |
| | | | Conditional Hashtag | Prob. of engager having engagements with hashtag | [0, 1] |
| | | | Prior Hashtag | Prob. of hashtag being observed | [0, 1] |
| | | URL | URL Existence | Indication of existence of any URL in tweet | True or false |
| | | | Conditional URL | Prob. of engager having engagements with URL | [0, 1] |
| | | | Prior URL | Prob. of URL being observed | [0, 1] |
| | | Text | Length | The total number of BERT tokens in tweet | {0,..,512} |
| | | | Embeddings | Pre-trained BERT sequence embeddings | 768-dim. |
| | | | Sim. (Cos/BOW) | Cosine sim. between tweet and engager profile with BOW | [0, 1] |
| | | | Sim. (Cos/TOK) | Cosine sim. between tweet and engager profile with TOK | [0, 1] |
| | | | Sim. (Dice/BOW) | Dice sim. between tweet and engager profile with BOW | [0, 1] |
| | | | Sim. (Dice/TOK) | Dice sim. between tweet and engager profile with TOK | [0, 1] |
| | | | Sim. (BERT NSP) | NSP task between tweet and engager profile | [0, 1] |
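As a rough illustration of the Sim. (Cos/BOW) and Sim. (Dice/BOW) features listed in the table, the sketch below computes cosine and Dice similarity between a tweet and an engager profile. It is not the authors' implementation: the whitespace tokenizer `bow`, the use of token sets for the Dice coefficient, and the example strings are all assumptions made for illustration. The TOK variants would presumably swap in a different tokenizer.

```python
import math
from collections import Counter


def bow(text):
    """Bag of words from lowercased whitespace tokens (hypothetical tokenizer)."""
    return Counter(text.lower().split())


def cosine_sim(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_sim(a, b):
    """Dice coefficient over the sets of distinct terms (assumed set-based variant)."""
    terms_a, terms_b = set(a), set(b)
    if not terms_a and not terms_b:
        return 0.0
    return 2 * len(terms_a & terms_b) / (len(terms_a) + len(terms_b))


# Illustrative inputs, not taken from the dataset.
tweet = bow("new bert model released")
profile = bow("nlp researcher bert model fan")

cos_value = cosine_sim(tweet, profile)   # 2 / (2 * sqrt(5))
dice_value = dice_sim(tweet, profile)    # 2 * 2 / (4 + 5)
```

Both values fall in [0, 1], matching the range column of the table, because term counts are non-negative.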
Fig. 3. An illustration of a two-layer perceptron
The distribution of tweets, engagers, and engagees across the engagement types (like, retweet, reply, and quote). Negative means items having no engagement at all. Positive means items having at least one engagement
| Item | Total | Negative | Positive | Like | Retweet | Reply | Quote |
|---|---|---|---|---|---|---|---|
| Tweet | 14,115,364 | 6,936,057 | 7,179,307 | 6,183,928 | 1,581,922 | 379,874 | 108,216 |
| Engager | 9,989,359 | 4,738,788 | 5,250,571 | 4,550,366 | 1,250,552 | 363,948 | 105,175 |
| Engagee | 4,077,535 | 1,655,972 | 2,421,563 | 2,118,314 | 722,879 | 277,995 | 82,433 |
The distribution of engagers, hashtags, and URLs across the training and test sets
| Item | All dataset | Only test | Only train | Both train and test |
|---|---|---|---|---|
| Engager | 9,989,359 | 1,599,330 | 7,335,108 | 1,054,921 |
| Hashtag | 747,049 | 120,877 | 507,120 | 119,052 |
| URL | 952,381 | 209,029 | 725,895 | 17,457 |
The RCE scores obtained by the random and replicate predictors for each engagement type
| Method | Like | Retweet | Reply | Quote |
|---|---|---|---|---|
| Random prediction | −2410.713 | −1923.380 | −1350.714 | −1061.782 |
| Replicate prediction | −0.569 | −0.242 | −0.004 | −0.004 |
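RCE is not defined in this section. A commonly used definition of relative cross entropy compares a model's cross entropy with that of a naive predictor that always outputs the positive rate of the evaluated labels, scaled by 100. The sketch below assumes that definition; the function names, the clipping constant `eps`, and the toy labels are illustrative, and the exact baseline used in the study may differ.

```python
import math


def cross_entropy(labels, preds, eps=1e-15):
    """Mean binary cross entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def rce(labels, preds):
    """Relative cross entropy (assumed definition): improvement over a
    constant predictor that outputs the positive rate of `labels`."""
    rate = sum(labels) / len(labels)
    naive = cross_entropy(labels, [rate] * len(labels))
    return (1.0 - cross_entropy(labels, preds) / naive) * 100.0


# Toy labels: 1 = engagement happened, 0 = no engagement.
labels = [1, 0, 0, 1]
rce_baseline = rce(labels, [0.5, 0.5, 0.5, 0.5])   # constant at the positive rate
rce_informed = rce(labels, [0.9, 0.1, 0.2, 0.8])   # mostly correct, confident
rce_inverted = rce(labels, [0.1, 0.9, 0.9, 0.1])   # confidently wrong
```

Under this definition the constant baseline scores zero by construction, positive values indicate an improvement over it, and confidently wrong predictions can push the score far below zero.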
Fig. 4. The top-5 high performing features in terms of RCE for each engagement type using LightGBM (left) and MLP (right)
Fig. 5. RCE heatmap of feature groups for each engagement type using LightGBM (left) and MLP (right). Darker color means better performance
Performance of different combinations of textual features in terms of the RCE score. Improvements over BERT text sequence embeddings are given in bold
| Textual features | Like | Retweet | Reply | Quote |
|---|---|---|---|---|
| Text Sequence Embeddings (BERT) | 5.96 | 5.22 | 1.27 | |
| BERT + Length | 5.83 | 5.04 | 4.74 | 1.23 |
| BERT + Dice BOW | 5.68 | 4.17 | ||
| BERT + Dice TOK | 4.10 | |||
| BERT + Dice BOW + Cos BOW | 3.52 | 0.00 | 0.00 | |
| BERT + Dice BOW + Cos BOW + BERT NSP | 2.77 | −0.43 | 0.02 | |
| BERT + Dice BOW + Cos BOW + Length | 5.60 | 3.28 | 0.37 | |
| BERT + Dice BOW + Cos BOW + BERT NSP + Length | 3.48 | 0.78 | 0.02 | |
| BERT + Dice TOK + Cos TOK | 5.18 | 0.00 | 0.00 | |
| BERT + Dice TOK + Cos TOK + BERT NSP | 3.26 | 0.17 | 0.00 | |
| BERT + Dice TOK + Cos TOK + Length | 3.49 | 0.51 | ||
| BERT + Dice TOK + Cos TOK + BERT NSP + Length | 4.57 | 0.00 | −0.01 |
Important features for each engagement type in terms of RCE. Darker color means better performance. Not only individual features but also feature groups are considered. ER stands for engager, and EE for engagee
Failed features for each engagement type along with their sparsity and cold-start ratio. ER stands for engager, and EE for engagee
| Features | Sparsity | Cold-start |
|---|---|---|
| Tweet Text / Text Length | 0.000 | – |
| ER Social / Conditional | 0.536 | 0.603 |
| Tweet Hashtag / Prior | 0.801 | 0.504 |
| ER Meta / Influential | 0.845 | 0.603 |
| ER Meta / Verified | 0.998 | – |
| Tweet Hashtag / Conditional | 0.999 | 0.504 |
| Tweet URL / Conditional | 0.999 | 0.923 |
| ER Social / Conditional | 0.536 | 0.603 |
| Tweet Hashtag / Existence | 0.801 | – |
| Tweet Hashtag / Prior | 0.801 | 0.504 |
| Tweet URL / Prior | 0.862 | 0.923 |
| ER Meta / Verified | 0.998 | – |
| Tweet URL / Conditional | 0.999 | 0.923 |
| ER Social / Conditional | 0.536 | 0.603 |
| Tweet URL / Prior | 0.862 | 0.923 |
| Tweet Text / Cos BOW | 0.964 | 0.603 |
| ER Activity / Conditional | 0.994 | 0.603 |
| ER Meta / Verified | 0.998 | – |
| Tweet Hashtag / Conditional | 0.999 | 0.504 |
| Tweet Text / Text Length | 0.000 | – |
| EE Meta / Age | 0.000 | – |
| Tweet Meta / Type | 0.000 | – |
| ER Social / Conditional | 0.536 | 0.603 |
| Tweet Hashtag / Existence | 0.801 | – |
| Tweet Hashtag / Prior | 0.801 | 0.504 |
| Tweet URL / Existence | 0.862 | – |
| Tweet URL / Prior | 0.862 | 0.923 |
| Tweet Text / Dice BOW | 0.989 | 0.623 |
| ER Time / Conditional | 0.993 | 0.603 |
| ER Activity / Conditional | 0.994 | 0.603 |
| Tweet Hashtag / Conditional | 0.999 | 0.504 |
| Tweet URL / Conditional | 0.999 | 0.923 |
| ER Meta / Verified | 0.998 | – |
| ER Activity / Prior | 1.000 | 0.603 |
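The cold-start column is not defined in this section, but its recurring values (0.603, 0.504, 0.923) appear consistent with the share of test-set items that never occur in training, computed from the train/test distribution table for engagers, hashtags, and URLs. The sketch below checks that reading; the formula and the function name `cold_start_ratio` are assumptions, not a definition taken from the source.

```python
def cold_start_ratio(only_test, both):
    """Share of items seen in the test set that are absent from training
    (assumed definition of the cold-start ratio)."""
    return only_test / (only_test + both)


# (only test, both train and test) counts from the train/test distribution table.
counts = {
    "Engager": (1_599_330, 1_054_921),
    "Hashtag": (120_877, 119_052),
    "URL": (209_029, 17_457),
}

ratios = {item: round(cold_start_ratio(*c), 3) for item, c in counts.items()}
print(ratios)  # {'Engager': 0.603, 'Hashtag': 0.504, 'URL': 0.923}
```

Under this reading, engager-based (ER) features share the engager ratio, while hashtag and URL features inherit the ratios of their respective items.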