Joshua Uyheng, Kathleen M Carley.
Abstract
Online hate speech represents a serious problem exacerbated by the ongoing COVID-19 pandemic. Although often anchored in real-world social divisions, hate speech in cyberspace may also be fueled inorganically by inauthentic actors like social bots. This work presents and employs a methodological pipeline for assessing the links between hate speech and bot-driven activity through the lens of social cybersecurity. Using a combination of machine learning and network science tools, we empirically characterize Twitter conversations about the pandemic in the United States and the Philippines. Our integrated analysis reveals idiosyncratic relationships between bots and hate speech across datasets, highlighting different network dynamics of racially charged toxicity in the US and political conflicts in the Philippines. Most crucially, we discover that bot activity is linked to higher hate in both countries, especially in communities which are denser and more isolated from others. We discuss several insights for probing issues of online hate speech and coordinated disinformation, especially through a global approach to computational social science.
© Springer Nature Singapore Pte Ltd. 2020.
Keywords: Bots; COVID-19; Hate speech; Information maneuvers; Social cybersecurity
Year: 2020 PMID: 33102925 PMCID: PMC7574676 DOI: 10.1007/s42001-020-00087-4
Source DB: PubMed Journal: J Comput Soc Sci ISSN: 2432-2725
Evaluation of simple classifiers for hate speech using second-order psycholinguistic features
| Classifier | Micro F1 | Weighted F1 |
|---|---|---|
| Random baseline | 0.6262 | 0.6269 |
| Heuristic baseline | 0.6880 | 0.7236 |
| Logistic regression | 0.6342 | 0.6961 |
| Random forest | | |
Numbers in bold indicate the model with the best predictive performance
Subsequent analysis uses the best-performing random forest model
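The two metrics in the table above can be illustrated with a minimal sketch. It assumes single-label predictions; the class names (`hate`, `offensive`, `none`) and the function name `f1_scores` are hypothetical and are not taken from the paper. Micro F1 pools true positives, false positives, and false negatives over all classes, while weighted F1 averages per-class F1 scores weighted by each class's share of the true labels.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, weighted_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    # Micro F1: pool the counts across classes, then compute one F1.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    pooled = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / pooled if pooled else 0.0

    # Weighted F1: per-class F1, weighted by the class's share of true labels.
    support = Counter(y_true)
    weighted = 0.0
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        class_f1 = 2 * tp[label] / denom if denom else 0.0
        weighted += class_f1 * support[label] / len(y_true)
    return micro, weighted


# Hypothetical toy labels, for illustration only.
y_true = ["hate", "hate", "none", "none", "none", "offensive"]
y_pred = ["hate", "none", "none", "none", "offensive", "offensive"]
micro_f1, weighted_f1 = f1_scores(y_true, y_pred)
```

For single-label data, micro F1 equals plain accuracy, which is why it can diverge from weighted F1 when classes are imbalanced.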
Summary of datasets with BotHunter, hate speech, and offensive speech scores
| Dataset | Bot predictions (Tweets) | Bot predictions (Users) | Hate scores (Tweets) | Hate scores (Users) | Offensive scores (Tweets) | Offensive scores (Users) |
|---|---|---|---|---|---|---|
| US | 3.026 M (26.31%) | 237 K (14.91%) | 0.1023 (0.0769) | 0.1073 (0.0656) | 0.2773 (0.1519) | 0.2914 (0.1418) |
| PH | 3.436 M (21.73%) | 150 K (15.70%) | 0.0896 (0.0717) | 0.0924 (0.0600) | 0.2672 (0.1375) | 0.2836 (0.1180) |
For this table, an 80% probability threshold was used to classify a user as a bot. Bot tweets refer to tweets produced by bots predicted by the same threshold. Parentheses provide dataset proportions for BotHunter predictions and standard deviations for hate speech and offensive speech scores
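As a rough sketch of the thresholding described in the note above, the following computes bot counts and dataset proportions from per-user bot probabilities. The input layout, the function name `summarize_bots`, and the choice of `>=` at the threshold are assumptions made for illustration; the source does not specify how BotHunter output is stored or how ties at exactly 80% are handled.

```python
def summarize_bots(user_probs, tweet_authors, threshold=0.8):
    """Count predicted bots and the tweets they authored.

    user_probs: dict mapping user id -> bot probability (hypothetical layout).
    tweet_authors: list with one author id per tweet.
    """
    # A user counts as a bot when the probability meets the threshold (assumed >=).
    bots = {user for user, prob in user_probs.items() if prob >= threshold}
    # Bot tweets are tweets authored by users flagged under the same threshold.
    bot_tweets = sum(1 for author in tweet_authors if author in bots)
    return {
        "bot_users": len(bots),
        "bot_user_share": len(bots) / len(user_probs),
        "bot_tweets": bot_tweets,
        "bot_tweet_share": bot_tweets / len(tweet_authors),
    }


# Hypothetical toy data, not drawn from the paper's datasets.
user_probs = {"a": 0.95, "b": 0.40, "c": 0.81, "d": 0.10}
tweet_authors = ["a", "a", "b", "c", "d", "d", "d", "a"]
summary = summarize_bots(user_probs, tweet_authors)
```

Because bots can tweet more or less than other accounts, the tweet-level share and the user-level share generally differ, as they do in the table.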
Fig. 1. Coefficients of a logistic regression model predicting whether an account is suspended based on its BotHunter probability, its hate speech score, and its offensive speech score. Error bars represent 95% confidence intervals
Tweet- and user-level correlation coefficients between BotHunter scores and hate speech/offensive speech scores
| Dataset | Bot-to-hate (Tweets) | Bot-to-hate (Users) | Bot-to-offensive (Tweets) | Bot-to-offensive (Users) |
|---|---|---|---|---|
| US | | | | |
| PH | | | | |
Fig. 2. Normalized mean centrality of users in the datasets. Users are divided into low, medium, and high hate scores, respectively defined as the bottom quartile, the middle 50%, and the top quartile of hate scores. Bot predictions are based on an 80% probability threshold
Fig. 3. Mean identity score of users in the datasets. Users are divided into low, medium, and high hate scores, respectively defined as the bottom quartile, the middle 50%, and the top quartile of hate scores. Bot predictions are based on an 80% probability threshold
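The low/medium/high grouping used in both figure captions can be sketched as follows. This is an illustration under stated assumptions: the function name `hate_groups`, the input layout, and the treatment of scores exactly at a quartile boundary are hypothetical, and the quartile interpolation is whatever `statistics.quantiles` applies by default, which may differ from the authors' procedure.

```python
import statistics


def hate_groups(user_scores):
    """Assign each user to 'low', 'medium', or 'high' by hate-score quartile."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(user_scores.values(), n=4)
    groups = {}
    for user, score in user_scores.items():
        if score <= q1:
            groups[user] = "low"      # bottom quartile
        elif score >= q3:
            groups[user] = "high"     # top quartile
        else:
            groups[user] = "medium"   # middle 50%
    return groups


# Hypothetical per-user hate scores, for illustration only.
user_scores = {f"u{i}": i / 100 for i in range(1, 9)}
groups = hate_groups(user_scores)
```

Group means of any per-user quantity (centrality, identity score) can then be computed by averaging within each label, separately for predicted bots and non-bots.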
Summary of regression analysis for community-level bot activity and hate
| Variable | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
|
| |||
| Intercept | 0.096 (0.007)*** | 0.136 (0.038)*** | 0.143 (0.013)*** |
| Bot Score | |||
| Size | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) |
| Density | |||
| E/I Index | 0.005 (0.013) | ||
| Cheeger Score | 0.011 (0.008) | 0.025 (0.056) | 0.016 (0.008) |
|
| |||
| Bot | 0.000 (0.000) | – |
| Bot | – | 0.065 (0.019)*** | 0.065 (0.019)*** |
| Bot | – | ||
| Bot | – | – | |
|
| |||
| Intercept | 0.068 (0.007)*** | 0.063 (0.059) | 0.087 (0.008)*** |
| Bot Score | |||
| Size | 0.000 (0.000) | 0.000 (0.000) | – |
| Density | 0.003 (0.005) | ||
| E/I Index | |||
| Cheeger Score | 0.028 (0.007)*** | 0.094 (0.069) | 0.064 (0.025)* |
|
| |||
| Bot | – | 0.000 (0.000) | – |
| Bot | – | 0.169 (0.031)*** | 0.166 (0.031)*** |
| Bot | – | 0.031 (0.099) | – |
| Bot | – | |||
Coefficients reported are unstandardized.
*, **, ***
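Two of the community-level predictors in the table, Density and E/I Index, have standard definitions that can be sketched briefly. The version below assumes an undirected simple graph given as an edge list, which may not match how the authors built their networks; the function name `community_metrics` and the toy graph are hypothetical. The E/I index is (E - I) / (E + I), where I counts ties inside the community and E counts ties crossing its boundary, so values near -1 indicate an isolated community.

```python
def community_metrics(edges, community):
    """Return (density, ei_index) for one community of an undirected graph.

    edges: iterable of (u, v) pairs, each undirected tie listed once.
    community: set of node ids belonging to the community.
    """
    internal = external = 0
    for u, v in edges:
        inside = (u in community) + (v in community)
        if inside == 2:
            internal += 1   # both endpoints inside the community
        elif inside == 1:
            external += 1   # tie crosses the community boundary
    n = len(community)
    # Density: observed internal ties over the n(n-1)/2 possible ones.
    density = 2 * internal / (n * (n - 1)) if n > 1 else 0.0
    total = internal + external
    ei_index = (external - internal) / total if total else 0.0
    return density, ei_index


# Hypothetical toy graph: a triangle {a, b, c} with one outgoing tie.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
density, ei_index = community_metrics(edges, {"a", "b", "c"})
```

The Cheeger Score is omitted here because the record does not state which formulation was used.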