
Topic-driven toxicity: Exploring the relationship between online toxicity and news topics.

Joni Salminen¹,², Sercan Sengün³, Juan Corporan⁴, Soon-Gyo Jung¹, Bernard J Jansen¹.

Abstract

Hateful commenting, also known as 'toxicity', frequently takes place within news stories in social media. Yet, the relationship between toxicity and news topics is poorly understood. To analyze how news topics relate to the toxicity of user comments, we classify topics of 63,886 online news videos of a large news channel using a neural network and topical tags used by journalists to label content. We score 320,246 user comments from those videos for toxicity and compare how the average toxicity of comments varies by topic. Findings show that topics like Racism, Israel-Palestine, and War & Conflict have more toxicity in the comments, and topics such as Science & Technology, Environment & Weather, and Arts & Culture have less toxic commenting. Qualitative analysis reveals five themes: Graphic videos, Humanistic stories, History and historical facts, Media as a manipulator, and Religion. We also observe cases where a typically more toxic topic becomes non-toxic and where a typically less toxic topic becomes "toxicified" when it involves sensitive elements, such as politics and religion. Findings suggest that news comment toxicity can be characterized as topic-driven toxicity that targets topics rather than as vindictive toxicity that targets users or groups. Practical implications suggest that humanistic framing of the news story (i.e., reporting stories through real everyday people) can reduce toxicity in the comments of an otherwise toxic topic.


Year: 2020 PMID: 32084164 PMCID: PMC7034861 DOI: 10.1371/journal.pone.0228723

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Online toxicity, defined as hateful communication that is likely to cause an individual user to leave a discussion [1], can manifest itself in various ways, including cyberbullying [2], trolling [3], and the creation of online firestorms, defined as “rapid discharges of large quantities of negative, often highly emotional posts in the social media environment” [4] (p. 286), in which participants attack other groups or organizations. According to Patton et al. [5], online toxicity may also result in violent actions in the physical world and should, therefore, be treated as a matter of serious social gravity. Online hate speech is arguably as old as the Internet itself: anti-Semitic and racist hate groups were active on Bulletin Board Systems as early as 1984 [6]. Today, some communities are specifically geared towards promoting hate speech and providing avenues for expressing politically incorrect values that may not comfortably be expressed in face-to-face interactions [7,8]. Toxic commenting has also been found to be prevalent in general online discussion forums, news websites, and social media platforms. Existing research deals with multiple aspects of toxicity, such as its detection and classification [9-11], its impact on online communities [12,13], types of toxicity such as cyberbullying and trolling [2,14], and means of defusing it [15]. Researchers have investigated toxicity on multiple social media platforms, such as Twitter, YouTube, Facebook, and Reddit [7,11], as well as in comments in online discussion forums and on news websites [16]. Due to its high prevalence, toxicity has been identified as a key concern for the health of online communities.
Additionally, previous research has identified several risks that new technology poses to news dissemination and journalism, including clickbait journalism [17], fake news [18], manipulation of search rankings and results to alter public opinion [19,20], and “story hijacking”, i.e., repurposing the original story [4]. For example, when the New York Police Department (NYPD) invited the community to share positive experiences, the move backfired, and 70,000 tweets about police brutality were shared with the hashtag #MyNYPD [4]. Despite the large amount of research on these two areas, online toxicity and the negative impact of technology on news, the relationship between news topics and online toxicity remains unexplored. Although prior research suggests an association between news topics and toxic comments, this association has not been empirically established. Previous studies suggest that political topics can cause hateful debates when associated with group polarization [21], i.e., a strong division of online users into opposing groups. Zhang et al. [22] considered topic as a feature in machine learning but did not analyze the relationship between different topics and toxicity. Despite indirect evidence of a relationship between news topics and online hate, the toxicity of comments on online news content has not been systematically analyzed by news topic in previous research. It is this research gap that we aim to address. We specifically investigate a concept that we refer to as online news toxicity, defined as toxic commenting taking place in relation to online news. Our aim is to analyze whether different topics result in varying levels of toxic commenting. For this, we pose the following research questions:
RQ1: How does online news toxicity vary by news topic?
RQ2: What are the key themes characterizing online news toxicity?
Machine learning is valuable for addressing these questions, as it facilitates dealing with large-scale online data [11,23]. We address RQ1 by collecting a large dataset of YouTube news videos and all comments on those videos. We then topically classify the stories using supervised machine learning and score each comment using a publicly available toxicity scoring service that has been trained on millions of social media comments. Using these two variables, toxicity and topic, we quantitatively analyze how toxicity varies by news topic. To address RQ2, we conduct an in-depth qualitative analysis of the relationship between content type and toxicity. We conclude by discussing the implications for journalists and other stakeholders and outlining future research directions. The focus on the online news context is important for several reasons. First, news stories have a strong impact on society by shaping citizens’ worldviews and the quality of public discourse [24]. Second, understanding toxic responses to online news stories matters to many stakeholder groups within the media profession, including online news and media organizations, content producers, journalists, and editors, who struggle to make sense of the impact of their stories on the wider social media sphere. Third, in an era of mischievous strategies for getting public attention, it is becoming increasingly difficult for news media to provide facts without being seen as a manipulator or as a stakeholder in the debate itself. Previous research on online hate suggests that toxicity is especially prevalent in online news media [11]. Today, news channels cannot isolate themselves from audience reactions, and analyzing these reactions is important for understanding the various sources of digital bias and for forming an analytical relationship with the audience.
Finally, the betterment of online experiences by mitigating online toxicity is a matter of societal impact, as toxic conversations impact nearly all online users across social media platforms [10,12,25].

Literature review

Antecedents for online toxicity

In online environments, toxic behavior is often seen as amplified by the fact that participants can typically comment anonymously and are not held accountable for their behavior in the same way as in offline interactions [3]. Online communities for marginalized or vulnerable groups are particularly exposed to online toxicity because discriminatory patterns, including sexism and racism, tend to be perpetuated and exacerbated online [26]. While inclusivity, accessibility, and low barriers to entry have increased individual and citizen participation and the associated public debate on matters of social importance, toxic discussions show the cost of low barriers and little supervision of online participation: because everyone can participate, people with toxic views participate as well. Some studies highlight the democratic nature of online environments as a contributing factor to online controversies [4,27]. Because the Internet brings together people with different backgrounds and provides a space for people who do not normally interact with each other, it creates an environment where contrasting attitudes and points of view collide. Another explanation for online toxicity is that, even though online environments give unprecedented access to differing views and information, people tend to actively filter out information that contradicts their existing views [21] and seek the company of like-minded individuals, forming closed “echo chambers”. In these echo chambers, like-minded people reinforce each other’s views, either without exposure to the views of the opposing side or while treating these views as targets of ridicule from the perspective of the community’s shared narrative [28].
Furthermore, echo chambers may result in group polarization, in which a previously held moderate belief (e.g., “I’m not sure about the motives of the refugees”) takes a more extreme form following the more radical elements of the community (e.g., “refugees are not really escaping violence but to get free social benefits”). A fundamental question that scholars investigating online hate ask is whether online environments lend themselves sui generis to provocative and harassing behavior. Khorasani [29] notes that, like their counterparts in actual social networks, participants in online groups “make friendships and argue with each other and become involved in long and tedious conflicts and controversies” (p. 2). Moule et al. [30] observe, however, that online environments have created new forms of socialization and have forged changes in intra- and inter-group relations. Hardaker [3] argues that the relative anonymity provided in online exchanges “may encourage a sense of impunity and freedom from being held accountable for inappropriate online behaviour” [sic] (p. 215). In a similar vein, Chatzakou et al. [31] observe that, because of the pseudo-anonymity of online platforms, people tend to express their viewpoints with less inhibition than they would in face-to-face interactions. Patton et al. [5] note the reciprocal relationship between online and offline violence; the low barriers to entry of online environments, they argue, have changed how peer-to-peer relationships are managed [5]. In sum, these findings underline the need for research on online toxicity.

Topics and online toxicity

Prior research has found that certain topics are more controversial than others (see Table 1). These include nationalism [29,32], sexism [31], agricultural policies [33], climate change (ibid.), religious differences (ibid.), defense [34], foreign policy (ibid.), intelligence agencies (ibid.), politicians’ characteristics/personality traits (ibid.), energy [35], vaccination [19], fake news [19], and gun control [26,34]. For example, Kittur et al. [27] found that Wikipedia articles on well-known people, religion, and philosophy involved more controversy and conflict.

Table 1

Topics for online toxicity.

Topic for toxicity	Definition / examples	Reference
Consumer firestorms	Consumer criticism toward corporations (e.g., Facebook outcry about a company’s billboard ads; Facebook privacy issues; Korean airlines firestorm; NFL’s CoverGirl ad; notebook brand Moleskine asked designers to submit “free” designs; NYPD and McDonald’s asking consumers to make positive online posts)	[36] [33] [4]
Environment	Polarizing environmental issues (e.g., climate change, agricultural policies, wind energy, biofuels, the Fukushima disaster)	[35] [33] [19]
Health	Health-related commenting (e.g., vaccine controversies, food security)	[19]
Interpersonal	Disagreements between active members of specialized online discussion forums (e.g., petty disputes in a community forum)	[29] [3]
Media	Media and online platforms (fake news; fake reviews of tourist destinations and hospitality businesses)	[37] [19]
People	Personal attacks against public figures and well-known people (e.g., Woody Allen, Trump, attacking memorial pages of deceased people, known as RIP trolling)	[38] [39] [40] [11]
Philosophy	Philosophical debates	[40]
Politics	Political issues (Wikileaks and Edward Snowden, gun rights/gun control, news stories relating to economy, government inefficiency, immigration, defense, foreign policy, intelligence agencies, and politicians’ personality traits)	[33] [19] [26] [34]
Race	Race-related commenting (e.g., racist abuse on Twitter of an FA football player)	[41] [38]
Religion	Religious differences (e.g., Islamophobia)	[33] [40]
Sexism	Gender-related commenting (e.g., the #gamergate controversy related to gaming culture)	[42] [33] [26]

In general, the intersections between users’ commenting behaviour and the topics of news items are not yet well understood, even though some studies on negative user behavior make the link between topics and toxic commenting explicit. It has been found that although controversial political or social topics typically generate more user comments, users often read news comments for their entertainment value rather than in response to the news article itself [43]. Another study found that writers of toxic comments rearticulated the meaning of news items to produce hate against a marginalized group, even if that group was not the topic of the news [44]. Although existing research on negative online behavior has implications for the research questions posed in this study, the relationship between online news topics and the toxicity of user comments has not been studied directly and systematically. The closest study we could locate is by Ksiazek [34], who offers a content analysis of news stories and user comments across twenty news websites with the aim of predicting the volume of comments and their relative quality in terms of civility and hostility. Hostility was defined as comments “intentionally designed to attack someone or something and, in doing so, incite anger or exasperation through the use of name-calling, character assassination, offensive language, profanity, and/or insulting language” [34] (p. 854). The study found that news stories about the economy, government inefficiency, immigration, gun control, defense, foreign policy, intelligence agencies, and politicians’ personality traits are more prone to elicit hostile discussion. Several other studies have treated the relationship between topic and toxicity implicitly. Wang and Liu [45] find support for readers’ varied emotional reactions specifically to news articles, while Salminen et al. [11] analyze the targets of online hate and find that media is targeted frequently in their dataset.
Drawing on sociolinguistics and the social pragmatics of politeness, Zhang et al. [22] study some of the “warning signs” in the context of English Wikipedia that may indicate that an online conversation that started civil is about to derail. However, their study is explicitly topic-agnostic, as it disregards the influence of topic and focuses solely on the presence of rhetorical devices in online comments. Most notably, these earlier studies did not perform a topical analysis of the content. To extend research on online toxicity, we conduct a topical analysis to better understand the audience’s toxic responses to online news content. Although the relationship between news topics and online toxicity has not been systematically investigated, the broader literature on online hate speech suggests that topic sits within a host of other factors, all of which contribute to understanding the phenomenon of toxicity in online commenting. These studies point to the need for a deeper analysis of the intersections of personal values, group membership, and topic. While this study focuses only on the relationship between topic and toxicity, it is conducted with the understanding that the results provide a springboard for further research on the complex nature of toxic online commenting.

Methodology

Research design

We use machine learning to classify the topics of the news videos. For this, we use a fully connected Feed-Forward Neural Network (FFNN), a simple and widely used classification architecture [46]. We then score the toxicity of the comments automatically using a publicly available API service. The use of computational techniques is important because the sheer number of videos and comments makes manual processing infeasible. In this research, we use the website content, tagged for topics, to automatically classify the YouTube videos of the same organization, which lack topic labels. In other words, the FFNN is trained on textual articles from the organization’s website, which are tagged with topic labels, and then used to predict the topics of YouTube videos from their titles and descriptions. To answer our research questions, we need to classify the videos because the videos carry the user comments whose toxicity we are interested in. We then score each comment in each video for toxicity and carry out statistical testing to explore the differences in toxicity between topics. Additionally, we conduct a qualitative analysis to better understand the reasons for toxicity in the comments.

Research context

Our research context is Al Jazeera Media Network (AJ), a large international news and media organization that reports news on its website and on various social media platforms. Overall, AJ is a reputable news organization, internationally recognized for its journalism. However, the channel’s content has a “liberal” undertone that can be associated with political polarization between right and left, which has been especially prominent in social media in the wake of the 2016 US presidential campaign [47]. Previous research on toxicity in the organization’s social media comments [11] has shown that AJ’s content attracts a high volume of toxic comments. This can partly be explained by the fact that the audience consists of viewers from more than 150 countries, forming a diverse mix of ethnicities, cultures, and social and demographic backgrounds. Previous literature implies that such a mix is likely to result in conflicts. At the same time, the organization represents an interesting research context, as it reports news on a wide range of serious topics and is not geographically restricted; for example, AJ covers US politics but also international politics, European affairs, and so on. However, its coverage excludes entertainment and sports (apart from major sports events such as the football World Cup). For this reason, we characterize the content as “serious news” and consider the wide range of topics, the diversity of the audience, and the associated high prevalence of toxic commenting suitable for the purpose of this study.

Data collection

We collect two types of data from the news content (see Table 2): text content from news stories published in English on (a) Al Jazeera’s website (collection script: https://www.dropbox.com/s/keccjwuz0ruyztt/website%20data%20collection%20script.txt?dl=0) and (b) AJ+, one of the organization’s YouTube channels (https://www.youtube.com/channel/UCV3Nm3T-XAgVhKH9jT0ViRg). The website has more than 15M monthly visits, and the YouTube channel has more than 500,000 subscribers (August 2019).

Table 2

Description and purpose of data.

	Description	Content	Purpose
Dataset 1: YouTube	Comments, video titles, and video descriptions	33,996 videos	To analyze the toxicity of videos by topic
Dataset 2: Website	News articles (HTML body text, titles), news keywords (topics)	21,709 webpages	To train the topic classifier for YouTube content

For YouTube data collection, we use the official YouTube Analytics API (https://developers.google.com/youtube/analytics/) with the channel owner’s permission and in compliance with YouTube’s terms of service. From YouTube, we retrieve all 33,996 videos available through September 2018, with their titles, descriptions, and comments. The comments in this channel are not actively moderated, which provides a good dataset of the commenters’ unfiltered reactions. We collect the news stories using a Python script that retrieves the HTML content of news stories from the news organization’s website (see S1 File), including information about each article’s content, title, publication date, and topics. The website data contains 21,709 news articles, of which 13,058 (60.2%) have been manually tagged by AJ’s journalists and editors with topical keywords. Overall, the journalists use 801 topical keywords to categorize the news articles. This tagging is done to improve the search-engine indexing of the news stories: upon publication, the tags are entered in the content management system to characterize the content with topically descriptive tags, such as “racism”, “environment”, “US elections”, and so on.
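The topic tags are described as “news keywords” that can be retrieved from the HTML source. As a minimal sketch of that retrieval step, the parser below assumes the tags sit in a `<meta name="news_keywords" content="...">` element, a common news-publishing convention; the actual markup on the organization’s website, and the authors’ script in S1 File, may differ. `NewsKeywordParser` and `extract_news_keywords` are hypothetical names.

```python
from html.parser import HTMLParser


class NewsKeywordParser(HTMLParser):
    """Collect comma-separated topic tags from <meta name="news_keywords"> elements.

    Assumption: the site exposes its topic tags in this meta tag; adjust the
    tag name if the real markup differs.
    """

    def __init__(self) -> None:
        super().__init__()
        self.keywords: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)  # attribute names arrive lower-cased
        if (attr_map.get("name") or "").lower() == "news_keywords":
            content = attr_map.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def extract_news_keywords(html: str) -> list[str]:
    """Return the topic tags found in one article's HTML (empty list if untagged)."""
    parser = NewsKeywordParser()
    parser.feed(html)
    parser.close()
    return parser.keywords
```

Articles for which the function returns an empty list would correspond to the untagged part of the website data (the roughly 40% of articles without journalist tags).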

Data pre-processing

The HTML content from the website contains information that is unnecessary for the classification task, such as JavaScript functions, file directories, HTML markup, white spaces, non-alphabetical characters, and stop words (i.e., common English words such as ‘and’ and ‘of’ that provide little discriminative value). These add no information for the classifier and are thus removed. As machine learning models take numbers as input [48], we convert the articles into numerical vectors using Term Frequency–Inverse Document Frequency (TF-IDF), which weights each unique word by how often it appears in a given content piece (term frequency) and how rare it is across all content pieces (inverse document frequency) [49]. We then convert the cleaned articles into a TF-IDF matrix, excluding the most common and the rarest words. Finally, we assign training data and ground-truth labels using a topic-count matrix.
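The cleaning and TF-IDF steps can be sketched in plain Python as follows. This is an illustration, not the authors’ code: the stop-word list is a tiny placeholder, and the `min_df` / `max_df_ratio` cut-offs stand in for the unspecified rule for dropping the rarest and most common words. It uses the standard weighting tf(t, d) × log(N / df(t)); in practice a library class such as scikit-learn’s `TfidfVectorizer` (with its `min_df` and `max_df` parameters) would typically be used instead.

```python
import math
import re
from collections import Counter

# Placeholder stop-word list for illustration; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "on", "for"}


def clean_html(html: str) -> list[str]:
    """Strip scripts, HTML tags, and non-alphabetic characters, then drop stop words."""
    text = re.sub(r"<script\b.*?</script>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)             # remove remaining HTML tags
    text = re.sub(r"[^A-Za-z]+", " ", text).lower()  # keep letters only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf_matrix(docs, min_df=1, max_df_ratio=1.0):
    """Return (vocabulary, rows), where rows[i][j] is the TF-IDF of term j in doc i.

    Terms occurring in fewer than `min_df` documents (the rarest) or in more than
    `max_df_ratio` of all documents (the most common) are excluded.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vocab = sorted(t for t, c in df.items()
                   if c >= min_df and c / n_docs <= max_df_ratio)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        # term frequency normalised by document length, times inverse document frequency
        rows.append([counts[t] / len(doc) * idf[t] if doc else 0.0 for t in vocab])
    return vocab, rows
```

Each row of the resulting matrix is one article’s feature vector for the classifier.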

News topic classification

We use the cleaned website text content, along with the topics, to train a neural network classifier that classifies the collected videos by news topic. Note that the contribution of this paper is not to present a novel method but rather to apply well-established machine learning methods to our research problem. To this end, we develop an FFNN model using Keras, a publicly available Python deep learning library (http://keras.io), to create the FFNN architecture (a fully connected two-layer network). Additionally, we create a custom class to cross-validate and evaluate the FFNN, since Keras does not provide support for cross-validation by default. This is needed because cross-validation is an important step for ensuring that machine learning results are reliable [50]. We trained the FFNN on the website data because the journalists have actively labeled the news articles for topics in their content management system, which exposes the topics as “news keywords” that can be automatically retrieved from the HTML source code. The YouTube content is not tagged; it only contains generic categories chosen when uploading the videos to YouTube. The topics created by the journalists are crucial because journalists can be considered subject-matter experts on news, and the use of expert-labeled data generally improves the performance of supervised machine learning [51], as human expertise helps the model detect patterns in the data. From a technical point of view, this is a multilabel classification problem, as one news article is typically labeled with several topics. Note, however, that for statistical testing we only use the highest-ranking topic per news story. More specifically, the output of the FFNN classifier is a matrix of confidence values for each combination of news story and topic.
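The custom cross-validation class is not described in detail, so the sketch below only illustrates the general technique it stands in for: shuffled k-fold splitting plus an averaging loop. `k_fold_indices` and `cross_validate` are hypothetical names, and `train_and_score` is a placeholder callback that would build a fresh model (for example, a Keras FFNN) on each training fold and return its score on the held-out fold.

```python
import random
from typing import Callable, Iterator


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs that partition shuffled indices into k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(train_and_score: Callable[[list[int], list[int]], float],
                   n_samples: int, k: int = 5) -> float:
    """Average the per-fold score returned by the (hypothetical) training callback."""
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Training a fresh model inside the callback for every fold keeps the held-out fold unseen during fitting, which is what makes the averaged score an estimate of generalization.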
Of these, the chosen topics are those exceeding a set confidence threshold; we use the commonly applied value of 0.5 when testing the classifier, and for the statistical analysis we choose the topic with the highest confidence value. In other words, a story has only one “dominant” topic in the statistical analysis. This is done for parsimony, as using all or several topics per story would make the statistical comparison exceedingly complex.
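The two decision rules applied to one row of the confidence matrix can be sketched as follows: every topic above the 0.5 threshold for multilabel evaluation, and the single highest-confidence topic for the statistical analysis. Treating a confidence exactly equal to the threshold as not exceeding it is an assumption, and `assign_topics` is a hypothetical name.

```python
def assign_topics(confidences: dict[str, float],
                  threshold: float = 0.5) -> tuple[list[str], str]:
    """Return (all topics exceeding the threshold, single dominant topic).

    `confidences` maps each topic to the classifier's confidence for one story.
    """
    multilabel = sorted(t for t, c in confidences.items() if c > threshold)
    dominant = max(confidences, key=confidences.get)  # highest-confidence topic
    return multilabel, dominant


# Example with made-up confidences for one video:
labels, dominant = assign_topics({"Politics": 0.82, "US & Canada": 0.61, "Health": 0.07})
```

Here `labels` would be `["Politics", "US & Canada"]`, while only `"Politics"` would enter the statistical comparison.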

Classifier evaluation

Here, we report the key evaluation methods and results of the topic classification. Note that a full evaluation study of the applied FFNN classifier is presented in Salminen et al. [48]. First, to optimize the parameters of the FFNN model, we create a helper class that conducts a random search over both the TF-IDF matrix creation and the FFNN parameters. We then identify the combination of parameters in the search space that provides the highest F1 score (i.e., the harmonic mean of precision and recall). This combination is used to fine-tune the model, and we obtain a solid performance (F1_FFNN = 0.700). By “solid,” we mean that the results are satisfactory for this study, in that the performance of our classifier is considerably higher than the probability of choosing the right topic by random chance (p = 1/799 ≈ 0.1%). The FFNN also clearly outperforms a Random Forest (RF) model tested as a baseline (F1_RF = 0.458). As an alternative to the supervised methods, we also experimented with Latent Dirichlet Allocation (LDA), an unsupervised topic modeling approach [52]. LDA infers latent patterns (“topics”) from the distribution of words in a corpus [53]. For brevity, we exclude the results of these experiments from the manuscript; a manual inspection showed that the automatically inferred LDA topics are less meaningful and interpretable than the news keywords handpicked by the journalists working for the organization whose content we are analyzing. Therefore, we do not use LDA but instead train a supervised classifier on data manually annotated by journalists, who can be considered experts on news topics. The importance of using domain experts for data annotation is widely acknowledged in the machine learning literature [54,55]. Generally, expert taxonomies are considered gold standards for classification [56].
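The evaluation metric and the parameter search can be sketched as below. The text does not say how F1 was averaged over topics, so micro-averaging over topic sets is an assumption here; `random_search` is a generic random-search loop, and the search space in the usage note is hypothetical, not the authors’ actual space.

```python
import random


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 for multilabel predictions, each given as a set of topics."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))  # correct topics
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))  # spurious topics
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))  # missed topics
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)  # harmonic mean


def random_search(evaluate, space, n_trials=20, seed=0):
    """Sample parameter combinations from `space` and keep the best-scoring one.

    `evaluate` maps a parameter dict to a score such as cross-validated F1.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(options) for name, options in space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A hypothetical space could look like `{"max_features": [5000, 10000, 20000], "hidden_units": [64, 128, 256], "dropout": [0.2, 0.5]}`, covering both TF-IDF and network settings as the text describes. Random search samples this grid instead of enumerating it, which keeps the number of costly training runs fixed.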
We apply the model trained on website content (i.e., the cleaned article text) to video content (i.e., the concatenated title and description text). Intuitively, we presume this approach works because the news topics covered in the YouTube channel are highly similar to those published on the website (e.g., both cover many political and international topics). Because we lack ground truth (the videos have no labels), we evaluate the validity of the machine-classified results by having three human coders classify a sample of 500 videos using the same taxonomy that the machine applied. We then measure the simple agreement between the topics chosen by the machine and by the human raters and find that the average agreement between the three human raters and the machine is 70.4%. Considering the high number of classes, we are satisfied with this result. In terms of coverage, the model provided a label for 96.1% of the content (i.e., 32,678 of 33,996 YouTube videos).
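The agreement check can be sketched as follows, assuming “average agreement” means the machine-versus-rater agreement computed separately for each of the three coders and then averaged; the paper does not spell out the exact procedure. The coverage figure reported above is reproduced from the two counts given in the text.

```python
def simple_agreement(machine_labels, rater_labels):
    """Share of sampled videos where a rater chose the same topic as the machine."""
    matches = sum(m == r for m, r in zip(machine_labels, rater_labels))
    return matches / len(machine_labels)


def average_agreement(machine_labels, raters):
    """Mean machine-vs-rater agreement across all human raters."""
    return sum(simple_agreement(machine_labels, r) for r in raters) / len(raters)


# Coverage: share of YouTube videos that received a topic label (counts from the text).
LABELED_VIDEOS, TOTAL_VIDEOS = 32_678, 33_996
coverage = LABELED_VIDEOS / TOTAL_VIDEOS  # about 0.961, i.e., 96.1%
```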

Toxicity scoring

Alphabet, Google’s parent company, has launched an initiative, the Perspective API, aimed at preventing online harassment and providing safer environments for user discussions by detecting toxic comments. Perspective API has been trained on millions of online comments using semi-supervised methods to capture the toxicity of online comments in various contexts [1]. Perspective API (https://perspectiveapi.com) defines a toxic comment as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion” [57]. This definition is relevant to our research, since it specifically focuses on online comments, of which our dataset consists. Note that Perspective API is a publicly available service for predicting the toxicity of social media comments, which makes the scoring process replicable. We use the Perspective API to score the comments collected for this study. After obtaining an access key to the API, we test its performance. The version of the API at the time of the study had two main types of models: (a) alpha models and (b) experimental models. The alpha models include the default toxicity scoring model, while the experimental models include severe toxicity, fast toxicity, attack on author, attack on commenter, incoherent (i.e., difficult to comprehend), inflammatory (provocative), likely to reject (according to New York Times moderation guidelines), obscene, spam, and unsubstantial (i.e., short comments). In this research, we use the alpha category’s default toxicity model, which returns a score between 0 and 1, where 1 is the maximum toxicity. According to the Perspective API’s documentation, the returned scores represent a toxicity probability, i.e., how likely a comment is to be perceived as toxic by online users. To retrieve the toxicity scores, we sent the 320,246 comments to Perspective API; however, the tool returned blank values for some of them.
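A minimal sketch of scoring one comment with the default toxicity model, using only the Python standard library. The endpoint and JSON field names follow the public Perspective API `comments:analyze` method as we understand it (`requestedAttributes`, `attributeScores.TOXICITY.summaryScore.value`); they should be checked against the current API reference, and the function names are hypothetical. Returning `None` on an HTTP error mirrors the blank values mentioned in the text.

```python
import json
import urllib.error
import urllib.request

# Assumed v1alpha1 endpoint; verify against the current Perspective API documentation.
ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key={key}"


def build_request_body(text: str, language: str = "en") -> dict:
    """Request only the default TOXICITY attribute for a single comment."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }


def parse_toxicity(response: dict):
    """Return the summary toxicity probability in [0, 1], or None if it is missing."""
    try:
        return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    except KeyError:
        return None


def score_comment(text: str, api_key: str):
    """Send one comment to the API (network call); None if the API rejects it."""
    data = json.dumps(build_request_body(text)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT.format(key=api_key), data=data,
        headers={"Content-Type": "application/json"}, method="POST")
    try:
        with urllib.request.urlopen(req) as resp:
            return parse_toxicity(json.load(resp))
    except urllib.error.HTTPError:
        # e.g., unsupported language or an over-long comment: record as unscored
        return None
```

For a corpus of several hundred thousand comments, calls would also need to respect the API’s rate limits.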
According to the API documentation, failure to provide scores can be due to non-English content or overly long comments. Overall, we were able to score 240,554 comments, representing 78.2% of the comments in the dataset. A manual inspection showed that Perspective API detected the toxicity of the comments well. To further establish the validity of Perspective API’s automatic scoring, we manually rated a random sample of 150 comments. A trained research assistant determined whether each comment was hateful (yes/no), and we compared this rating to the Perspective API score. We use a threshold of 0.5, so that comments below it are considered non-toxic and comments above it toxic (note that this matches the decision threshold of the classifier, also 0.5). We obtained a percentage agreement of 76.7% between the human annotator and the score given by Perspective API, which we deem reasonable for this study. We also computed Cohen’s kappa, which accounts for the probability of agreeing by chance. In total, there were 135 agreements (90% of the observations), whereas the number of agreements expected by chance would have been 118.5 (79% of the observations). The obtained value of κ = 0.524 indicates “moderate” agreement [58]. While the score would ideally be higher, we consider it acceptable for this study, especially given the evidence that toxicity ratings are highly subjective in the real world [59,60].
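Cohen’s kappa is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. Plugging in the reported figures gives (0.90 − 0.79) / (1 − 0.79) = 0.11 / 0.21 ≈ 0.524, matching the value above. The sketch below shows that arithmetic and a version that computes kappa from two binary label lists. Binarizing API scores with `>=` at exactly 0.5 is an assumption, since the text does not say how a score of exactly 0.5 is treated.

```python
def to_label(score: float, threshold: float = 0.5) -> bool:
    """Binarize a Perspective score (assumption: exactly 0.5 counts as toxic)."""
    return score >= threshold


def kappa_from_rates(p_o: float, p_e: float) -> float:
    """Cohen's kappa from observed (p_o) and chance (p_e) agreement rates."""
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def cohens_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Cohen's kappa for two raters' binary labels (True = toxic)."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a, p_b = sum(rater_a) / n, sum(rater_b) / n  # each rater's share of "toxic"
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)        # chance agreement on either label
    return kappa_from_rates(p_o, p_e)


# Figures from the text: 135 of 150 agreements observed, 118.5 expected by chance.
kappa = kappa_from_rates(135 / 150, 118.5 / 150)   # about 0.524
```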

Obtaining toxicity scores of news topics

After scoring the video comments, we associate each comment with the topic of its video. As the toxicity score of each comment is known, we calculate the average toxicity score of the comments of each video, which yields an average toxicity score per video. Because the topic of each video has been classified using the FFNN, averaging the scores of all videos within a given topic returns the average toxicity score of that topic.
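The two-stage averaging described above can be sketched as follows. Input shapes and the function name are assumptions for illustration; comments without a score (the blank API responses) are skipped.

```python
from collections import defaultdict
from statistics import mean


def topic_toxicity(comment_scores, video_topic):
    """Average comments per video, then average video means per topic.

    comment_scores: iterable of (video_id, toxicity or None) pairs.
    video_topic: mapping from video_id to the video's dominant topic.
    """
    by_video = defaultdict(list)
    for video_id, score in comment_scores:
        if score is not None:  # skip comments the API could not score
            by_video[video_id].append(score)
    video_means = {v: mean(s) for v, s in by_video.items()}

    by_topic = defaultdict(list)
    for video_id, video_mean in video_means.items():
        topic = video_topic.get(video_id)
        if topic is not None:  # skip videos the classifier left unlabeled
            by_topic[topic].append(video_mean)
    return {t: mean(ms) for t, ms in by_topic.items()}
```

Because the topic score is a mean of video means, each video carries equal weight regardless of how many comments it received; pooling all comments of a topic directly would instead let heavily commented videos dominate.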

Quantitative analysis

Data preparation

To simplify the statistical analysis, we reduce the number of classes by grouping similar topics under one theme (“superclass”). Thus, we group people into countries, countries into continents, and similar themes under one topic. In most cases, we kept the original topic names given by the journalists, only joining the names of the merged topics; for example, Environment, Climate SOS, and Weather became Environment & Weather. We grouped country names under continents. The large number of observations for Middle Eastern countries led to the creation of a separate superclass, Middle East. Likewise, Israel, Palestine, and Gaza were grouped into the superclass Israel-Palestine. The superclass grouping was done manually by one of the researchers, who grouped the topics into thematically consistent classes, with another researcher corroborating that the superclasses logically correspond to the original classes. Table 3 shows the superclasses along with the number of topics and news videos in them. S1 Table provides a detailed taxonomy of the grouping.
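Applied to the classified videos, the superclass grouping amounts to a lookup table. The mapping below is a hypothetical excerpt containing only the examples named in the text, with superclass names taken from Table 3; the full taxonomy is in S1 Table. Letting unmapped topics keep their own name is an assumption for this sketch.

```python
from collections import Counter

# Hypothetical excerpt of the topic-to-superclass mapping (full taxonomy: S1 Table).
SUPERCLASS = {
    "Environment": "Environment & Weather",
    "Climate SOS": "Environment & Weather",
    "Weather": "Environment & Weather",
    "Israel": "Israel-Palestine Conflict",
    "Palestine": "Israel-Palestine Conflict",
    "Gaza": "Israel-Palestine Conflict",
}


def to_superclass(topic: str) -> str:
    """Map a journalist topic to its superclass; unmapped topics keep their name."""
    return SUPERCLASS.get(topic, topic)


def count_by_superclass(video_topics) -> Counter:
    """Count videos per superclass, as in the 'Videos in SC' column of Table 3."""
    return Counter(to_superclass(t) for t in video_topics)
```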

Table 3

Superclasses (SC) and sample parameters.

Note that “Israel-Palestine” is considered as a news topic rather than region because the news stories in this category deal with various aspects of the regional conflict.

#	Superclass	Sub-classes in SC	Videos in SC
News Topics
1	Arts & Culture	1	414
2	Business & Economy	1	831
3	Environment & Weather	3	309
4	Health	1	142
5	Human rights	1	287
6	Israel-Palestine Conflict	5	1012
7	Media	2	3054
8	Politics	9	1474
9	Racism	1	61
10	Science & Technology	1	356
11	Sport	2	63
12	War & Conflict	6	741
Countries & Regions
13	Africa	4	4819
14	Asia	12	3338
15	Europe	5	4348
16	Latin America	2	695
17	Middle East	12	5165
18	Russia	1	153
19	US & Canada	4	2258
Total		73	29,520


By creating the superclasses, we reduced 73 topics to 19 superclasses, a 74% decrease in the number of classes to analyze. This increases the statistical power of the analysis by increasing the number of observations per class, and it makes the results easier to interpret.

Results

Comparing the mean toxicity of the superclasses reveals notable differences (see Table 4). For example, among the news topics, Racism has the highest average toxicity (M = 0.484, SE = 0.018), while Science & Technology has the lowest (M = 0.277, SE = 0.007). Among countries and regions, news stories about Russia receive the most toxic responses (M = 0.426, SE = 0.013), while stories about Latin America receive the least toxic ones (M = 0.359, SE = 0.006).

Table 4

Toxicity of superclasses.

Mean indicates average comment toxicity of the videos in the superclass.

Superclass	Mean toxicity	Std. Err.	95% CI (lower)	95% CI (upper)
News topics
Racism	0.484	0.018	0.448	0.521
Israel-Palestine Conflict	0.474	0.004	0.466	0.482
War & Conflict	0.423	0.005	0.412	0.434
Human Rights	0.395	0.009	0.377	0.412
Media	0.374	0.002	0.368	0.380
Politics	0.370	0.004	0.362	0.379
Business & Economy	0.328	0.005	0.317	0.339
Sport	0.313	0.027	0.259	0.367
Health	0.310	0.014	0.281	0.339
Arts & Culture	0.303	0.008	0.286	0.320
Environment & Weather	0.301	0.009	0.283	0.320
Science & Technology	0.277	0.007	0.261	0.292
Countries & regions
Russia	0.426	0.013	0.400	0.451
Middle East	0.416	0.002	0.412	0.421
Europe	0.379	0.002	0.374	0.383
US & Canada	0.376	0.003	0.370	0.382
Asia	0.371	0.002	0.365	0.376
Africa	0.370	0.002	0.365	0.375
Latin America	0.359	0.006	0.345	0.372


While these exploratory results are interesting, we cannot claim that the toxicity of Racism is higher than that of other superclasses without testing whether the difference is statistically significant. We test this by comparing the average comment toxicity between superclasses using regression analysis with dummy variables, as shown in Eq 1:

$$CT_i = \beta_0 + \sum_{j=1,\, j \neq b}^{19} \beta_j D_{ij} + \varepsilon_i \qquad (1)$$

Here, $CT_i$ is the average comment toxicity of news story $i$, which belongs to one of the superclasses $j = 1, \ldots, 19$. $D_{ij}$ is a dummy variable equal to 1 if story $i$ belongs to superclass $j$ and 0 otherwise. The $\beta_j$ are the estimated regression coefficients, $b$ is the base superclass, and $\varepsilon_i$ is the error term. For each pairwise comparison, we exclude one of the dummy variables, which makes it the base category against which all other categories are compared. Because the regression has no other variables, the intercept $\beta_0$ equals the mean toxicity of the base superclass. The coefficient on every dummy variable then equals the difference between the mean toxicity of the respective superclass and that of the base superclass.

Note that the F-test on this regression is equivalent to a one-way analysis of variance (ANOVA) test across all groups, with the following hypotheses:

Null hypothesis: all $\beta_j$ ($j > 0$) are equal to zero.
Alternative hypothesis: at least one $\beta_j$ ($j > 0$) differs from zero.

Rejecting the null hypothesis indicates that at least two superclass means differ. This justifies further pairwise comparisons to clarify the exact pattern of differences. Given the regression specification, the superclass means can be compared pairwise by a t-test on the respective dummy coefficient, that is, by testing whether the difference between the means of two superclasses is statistically significant. However, the consistency and efficiency of the ordinary least squares (OLS) estimates depend on several assumptions. One of the most vulnerable is the equality of variance of the error term $\varepsilon_i$ across observations.
The Cook–Weisberg test for heteroskedasticity [61] shows that this assumption is violated in our dataset. Hence, we apply the Huber–White estimator of variance, which is a heteroskedasticity-robust estimation procedure [62]. Another aspect of validity in pairwise comparisons is adjusting the p-values for multiple comparisons. These adjustments are needed because we perform the tests simultaneously on a single set of data. As a sensitivity analysis, we apply three types of adjustment: Bonferroni, Šidák, and Scheffé [63]. S2 Table shows the pairwise comparisons under each of these adjustments. Because of the large number of pairwise comparisons, Fig 1 summarizes the conclusive and inconclusive results as a matrix, where color indicates the significance of the mean differences.
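Below is a minimal sketch of this estimation strategy in Python with NumPy. It covers three steps:

- building a dummy-variable design matrix as in Eq 1;
- estimating OLS coefficients with Huber–White (sandwich) standard errors;
- applying the Bonferroni and Šidák p-value adjustments.

The function names are hypothetical and the data are toy values. The HC1 small-sample factor n/(n − k) is an assumption, since the text does not state which robust variant was used. The Scheffé adjustment, which relies on the F distribution, is omitted for brevity.

```python
import numpy as np


def dummy_design(superclass, base):
    """Intercept column plus one 0/1 dummy per superclass other than `base` (Eq 1)."""
    labels = np.asarray(superclass)
    others = [c for c in sorted(set(superclass)) if c != base]
    columns = [np.ones(len(labels))] + [(labels == c).astype(float) for c in others]
    return np.column_stack(columns), others


def ols_robust(X, y, hc1=True):
    """OLS coefficients with Huber-White (sandwich) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over observations of e_i^2 * x_i x_i'.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv  # HC0
    if hc1:
        cov = cov * n / (n - k)  # HC1 small-sample correction (assumed variant)
    return beta, np.sqrt(np.diag(cov))


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m simultaneous comparisons."""
    return min(1.0, p * m)


def sidak(p, m):
    """Šidák-adjusted p-value for m simultaneous comparisons."""
    return 1.0 - (1.0 - p) ** m


if __name__ == "__main__":
    # Toy data: hypothetical per-video mean toxicity scores.
    tox = [0.30, 0.50, 0.80, 0.60, 0.20, 0.25]
    sc = ["Science & Technology"] * 2 + ["Racism"] * 2 + ["Health"] * 2
    X, others = dummy_design(sc, base="Science & Technology")
    beta, se = ols_robust(X, tox)
    print("base mean:", beta[0])
    for name, b, s in zip(others, beta[1:], se[1:]):
        print(f"{name}: difference {b:+.3f}, robust SE {s:.3f}")
    m = 19 * 18 // 2  # number of pairwise comparisons among 19 superclasses
    print(bonferroni(0.0002, m), sidak(0.0002, m))
```

In a regression with only dummy variables, the intercept is the base-category mean and each coefficient is a difference in means. The HC0 standard error of such a difference reduces to the square root of var_j/n_j + var_b/n_b, where the within-group variances use denominator n. The Šidák adjustment is always slightly less conservative than Bonferroni for the same p-value and number of comparisons.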

Fig 1

Toxicity differences between topics.


Red indicates differences that are robust across all three applied multiple comparison adjustments. Orange indicates differences for which the adjustments give inconclusive results, and grey cells indicate differences that are not significant at the 0.05 level. The results show that four topics have consistently less toxic responses: Science & Technology, Environment & Weather, Arts & Culture, and Business & Economy. Two other categories, Sport and Health, are also less provocative, although some of the adjustment methods return non-significant results. The main reason for these inconclusive results is likely the smaller number of observations in these two categories. Topics that are more provocative than others are Israel-Palestine, War & Conflict, Middle East, Russia, and Racism. The data, along with the full statistical results, are available in S2 File.

Qualitative analysis

In addition to the quantitative analysis, we perform a qualitative analysis on a smaller subset of videos and their comments. For this analysis, we purposefully sample the 30 videos with the highest and the 30 videos with the lowest toxicity scores from each of 9 video categories. Five of these categories are more toxic (Racism, Israel-Palestine, Russia, War & Conflict, and Middle East) and four are less toxic (Science & Technology, Environment & Weather, Arts & Culture, and Business & Economy). This sampling results in 30 × 2 × 9 = 540 videos. We analyzed these 540 videos and their comments to address three analytical questions (AQs):

AQ1: Why are the comments likely to be toxic in a given superclass?
AQ2: When are the comments in a generally toxic topic non-toxic? (For AQ2, we wanted to compare toxic and non-toxic videos: while a topic can raise a lot of toxicity, can we find cases where the comments are considerably less toxic, and if so, why?)
AQ3: When are the comments “toxicified”? (That is, when and why does a neutral topic like sport become toxic?)

To address these questions, one of the researchers manually browsed all the sampled videos, examining their content and reading the associated comments on YouTube. This researcher identified themes to address the analytical questions. Another researcher reviewed and corroborated the theme taxonomy, after which the first researcher completed the analysis. We also recorded the number of views, duration, number of likes and dislikes, and number of comments for each analyzed video. This manual data collection was performed during the final two weeks of April 2019, and Table 5 gives the statistics for these data.
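The sampling design can be expressed as a short function. In the hypothetical Python sketch below, the names and data structures are ours. It sorts each category's videos by mean comment toxicity and keeps the k lowest and k highest, so 9 categories with k = 30 yield 540 videos. It rejects categories with fewer than 2k videos so that the low and high samples cannot overlap.

```python
def extreme_sample(video_scores, video_category, categories, k=30):
    """Select the k least and k most toxic videos in each category.

    video_scores: {video_id: mean comment toxicity}
    video_category: {video_id: superclass}
    """
    sample = {}
    for cat in categories:
        ranked = sorted(
            (vid for vid, c in video_category.items() if c == cat and vid in video_scores),
            key=lambda vid: video_scores[vid],
        )
        if len(ranked) < 2 * k:
            raise ValueError(f"{cat}: need at least {2 * k} videos, found {len(ranked)}")
        sample[cat] = {"low": ranked[:k], "high": ranked[-k:]}
    return sample
```

For instance, Racism has 61 videos in Table 3, which is just enough for two non-overlapping samples of 30.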

Table 5

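Descriptive statistics (mean x̄, standard deviation s, and range R, given as minimum–maximum) for the number of views, duration, number of likes and dislikes, and number of comments for videos in each category. “Low” and “High” denote the least and most toxic videos sampled in each category.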

The table ignores the missing values of the videos that were removed between the collection of quantitative and qualitative data.

Category	# of Views	Duration (secs)	# of Likes	# of Dislikes	# of Comments
Racism Low	x̄ = 9628.47, s = 16056.94, R = 364–71998	x̄ = 575.47, s = 896.01, R = 116–2871	x̄ = 106.93, s = 203.97, R = 2–982	x̄ = 21.53, s = 34.42, R = 0–162	x̄ = 33.73, s = 38.52, R = 1–153
Racism High (1 missing value)	x̄ = 6426.24, s = 8128.35, R = 683–31889	x̄ = 356.07, s = 666.66, R = 51–2998	x̄ = 61.76, s = 75.37, R = 5–318	x̄ = 17.00, s = 23.67, R = 0–89	x̄ = 59.72, s = 96.30, R = 0–353
Israel-Palestine Low (6 missing values)	x̄ = 2633.30, s = 3701.22, R = 87–16839	x̄ = 338.71, s = 665.48, R = 6–2852	x̄ = 25.71, s = 21.45, R = 1–80	x̄ = 3.67, s = 4.34, R = 0–16	x̄ = 6.71, s = 8.36, R = 1–37
Israel-Palestine High (2 missing values)	x̄ = 1597.36, s = 2884.00, R = 86–15594	x̄ = 189.21, s = 269.48, R = 19–1500	x̄ = 13.43, s = 12.08, R = 1–62	x̄ = 3.86, s = 7.13, R = 0–37	x̄ = 5.71, s = 7.26, R = 0–32
Russia Low	x̄ = 4021.80, s = 4424.29, R = 132–14901	x̄ = 237.67, s = 357.12, R = 30–1582	x̄ = 26.67, s = 23.60, R = 0–127	x̄ = 7.30, s = 10.57, R = 0–42	x̄ = 10.33, s = 12.71, R = 0–61
Russia High (1 missing value)	x̄ = 2191.59, s = 2435.35, R = 137–13096	x̄ = 321.31, s = 697.42, R = 26–3673	x̄ = 15.86, s = 11.09, R = 3–58	x̄ = 6.45, s = 8.33, R = 0–44	x̄ = 9.72, s = 11.38, R = 0–39
War & Conflict Low (1 missing value)	x̄ = 2734.83, s = 2610.78, R = 542–12276	x̄ = 271.48, s = 377.47, R = 57–1500	x̄ = 19.24, s = 16.25, R = 4–69	x̄ = 5.62, s = 6.29, R = 0–21	x̄ = 3.34, s = 3.73, R = 1–16
War & Conflict High (2 missing values)	x̄ = 2638.11, s = 1963.07, R = 606–8245	x̄ = 137.36, s = 109.22, R = 39–658	x̄ = 18.32, s = 28.19, R = 3–159	x̄ = 3.89, s = 3.64, R = 0–17	x̄ = 4.46, s = 5.32, R = 0–25
Middle East Low (6 missing values)	x̄ = 2278.25, s = 2057.82, R = 408–10329	x̄ = 380.67, s = 528.34, R = 30–1616	x̄ = 14.29, s = 13.55, R = 3–64	x̄ = 2.88, s = 3.37, R = 0–14	x̄ = 2.54, s = 3.19, R = 0–16
Middle East High (3 missing values)	x̄ = 4630.85, s = 15611.16, R = 392–82609	x̄ = 202.67, s = 278.34, R = 15–1500	x̄ = 9.63, s = 5.36, R = 4–30	x̄ = 2.82, s = 3.41, R = 0–13	x̄ = 1.04, s = 0.81, R = 0–3
Science & Technology Low (3 missing values)	x̄ = 2585.33, s = 2024.39, R = 546–9105	x̄ = 896.33, s = 942.85, R = 76–2888	x̄ = 23.48, s = 14.73, R = 2–52	x̄ = 2.67, s = 4.69, R = 0–24	x̄ = 2.37, s = 1.88, R = 0–9
Science & Technology High (3 missing values)	x̄ = 6261.59, s = 15223.10, R = 340–73862	x̄ = 1301.74, s = 1002.70, R = 19–2362	x̄ = 48.26, s = 122.39, R = 2–642	x̄ = 12.74, s = 27.25, R = 0–107	x̄ = 37.48, s = 109.88, R = 0–563
Environment & Weather Low (2 missing values)	x̄ = 5316.36, s = 6722.71, R = 277–25445	x̄ = 505.54, s = 533.89, R = 72–1500	x̄ = 42.00, s = 50.45, R = 2–220	x̄ = 2.86, s = 3.62, R = 0–16	x̄ = 4.50, s = 4.26, R = 0–14
Environment & Weather High	x̄ = 1999.97, s = 3101.94, R = 471–14705	x̄ = 188.03, s = 249.00, R = 85–1500	x̄ = 13.47, s = 7.49, R = 2–34	x̄ = 2.43, s = 2.60, R = 0–10	x̄ = 6.37, s = 7.51, R = 1–34
Arts & Culture Low (1 missing value)	x̄ = 1963.66, s = 1710.40, R = 276–6709	x̄ = 289.62, s = 418.68, R = 99–1511	x̄ = 22.41, s = 20.95, R = 4–80	x̄ = 1.07, s = 1.33, R = 0–4	x̄ = 1.93, s = 1.65, R = 1–7
Arts & Culture High	x̄ = 4381.90, s = 13052.42, R = 428–73006	x̄ = 132.63, s = 26.02, R = 53–188	x̄ = 21.43, s = 19.39, R = 1–89	x̄ = 4.73, s = 7.21, R = 0–39	x̄ = 5.10, s = 10.57, R = 0–56
Business & Economy Low (1 missing value)	x̄ = 2372.83, s = 2202.32, R = 515–12374	x̄ = 323.86, s = 484.23, R = 21–1560	x̄ = 17.90, s = 11.42, R = 1–52	x̄ = 2.72, s = 3.38, R = 0–15	x̄ = 2.45, s = 2.18, R = 1–9
Business & Economy High	x̄ = 1798.88, s = 1239.67, R = 465–5625	x̄ = 335.70, s = 477.47, R = 108–1560	x̄ = 12.93, s = 8.01, R = 4–38	x̄ = 2.67, s = 2.77, R = 0–10	x̄ = 5.23, s = 4.56, R = 0–17


To understand whether the number of views, duration, number of likes and dislikes, and number of comments are indicative of a video’s toxicity score, we calculated the Pearson correlation coefficient between each of these values and the toxicity score. Table 6 shows the significant results. Although no single consistent pattern emerges, more dislikes and a greater number of comments appear to correlate with more toxic video discussions, while more likes, more views, and longer videos correlate with less toxicity. The correlations of likes, dislikes, and views with video toxicity are easy to explain, but duration is a surprising factor: the longer the video, the less toxic its discussions tend to be. One possible explanation is that users do not want to comment without watching the entire video, so longer videos may dissuade them from watching the content and commenting.
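For reference, the Pearson coefficient and its significance test can be computed as below. This is a hedged sketch, not the authors’ code. It computes r directly and returns the t statistic with n − 2 degrees of freedom. Converting t to a p-value requires the t distribution, and `scipy.stats.pearsonr` returns both r and the p-value if SciPy is available. The function name and data are hypothetical.

```python
import math


def pearson_with_t(x, y):
    """Pearson's r and the t statistic (df = n - 2) for testing r = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        # Perfect correlation: the t statistic is unbounded.
        return r, math.copysign(math.inf, r)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


if __name__ == "__main__":
    # Hypothetical videos: duration in seconds vs. mean comment toxicity.
    duration = [90, 150, 240, 600, 1500]
    toxicity = [0.52, 0.47, 0.41, 0.38, 0.30]
    r, t = pearson_with_t(duration, toxicity)
    print(f"r = {r:.3f}, t = {t:.2f} (df = {len(duration) - 2})")
```

Each category in Table 6 is tested separately, so the number of observations per test is at most 60. A moderate r can therefore fail to reach significance.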

Table 6

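Pearson correlation tests and direction between the toxicity score of a video and the number of views, duration, number of likes and dislikes, and number of comments. Only significant correlations are reported. “-” denotes a non-significant result, and the sign in parentheses gives the direction of the correlation.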

Category (Toxicity Scores)	# of Views	Duration (secs)	# of Likes	# of Dislikes	# of Comments
Racism	-	-	-	-	-
Israel-Palestine	-	-	p < 0.01 (-)	-	-
Russia	-	-	-	-	-
War & Conflict	-	p < 0.05 (-)	-	-	-
Middle East	-	-	-	-	p < 0.05 (-)
Science & Technology	-	-	-	p < 0.05 (+)	-
Environment & Weather	p < 0.05 (-)	p < 0.01 (-)	p < 0.05 (-)	-	-
Arts & Culture	-	-	-	p < 0.05 (+)	-
Business & Economy	-	-	p < 0.05 (-)	-	p < 0.01 (+)

By reading through and coding the comments and discussions under the videos, we identified several themes in how toxicity emerges in these videos. These themes are discussed in the following.

Graphic videos

Watching the videos revealed that graphic videos spark more passionate and, accordingly, more toxic discussions; these videos typically also have titles and thumbnails signaling possible graphic content. In contrast, videos featuring interviews and in-studio commentary have less toxic discussions. Examples of graphic videos with high toxicity include Palestinians fight with Israeli security forces (BgplkpJrQXg), Clashes follow Palestinian teen’s funeral (E-ypG-hh4qc), and Russian troops enter Crimea airbase (EZzwv2byV6c). When an interview or in-studio commentary does attract toxicity (e.g., UpFront—Headliner: Richard Barrett, ihvq4IlTfFk), it is usually directed at the presenter or the commentator (e.g., “Idiot […] what you suggest”).

Humanistic stories

Humanistic stories, i.e., stories about an individual person, are less likely to attract toxicity, even in generally toxic categories like Middle East and War & Conflict. Examples include Para athletic championship held in Middle East for first time (y0Nr4gr6vZQ), Former Uganda child soldiers return home (oMFk-jNXZEQ), Bomb-rigged homes delay return of Iraqi residents near Mosul (vDN5c7LTb94), and Ugandan families remember lost children (Se5KKIRsGH0). These stories contain political framings that elicit toxicity in other contexts. Even so, stories of war and conflict told through the experiences of civilians seem to attract less toxic comments. This observation is also in line with previous research by Jasperson and El-Kikhia [64], which underlined the importance of the media organization’s role in the humanitarian coverage of the Middle East in American media, especially CNN.

History and historical facts

Another major source of toxicity was discussions around historical events and facts. This trend was even more apparent when combined with coverage of underrepresented communities that appear less often in English news sources. One may surmise that because English content about these issues appears less often in news channels, it attracts more attention and discussion from users who have a stake in the content. Previous research asserts that social media users are more likely to access and share news from international news outlets [65]. It therefore seems likely that users who feel underrepresented in English news content disseminate these stories on social media, attracting more traffic and discussion. The following videos fit this trend:

- Visiting the first free black town of the new world in Colombia (8gaXfr9WNwo)
- Afro-Cubans still at mercy of white wealth (9ycZwyIFDHI)
- Colombia: FARC rebels to disarm at transition zones (CGy9vVJDsmQ)
- Thailand invites crown prince to become new king (eCm1LY3z7Kw)

The discussions under these videos take place predominantly between locals rather than between locals and outsiders. In them, locals try to agree upon the events and facts that led to the situations covered in the news piece. These passionate discussions take place in English rather than in the local language. From the language use, content, and direction of the discussions, we observe that they aim to create a “truthful” representation of events and of the community for international viewers.

Media as a manipulator

A common trend in both less toxic categories like Business & Economy and highly toxic categories like Racism is to frame international media as a tool of manipulation and propaganda. This occurs even when the viewer seems to find the message acceptable (e.g., a comment on the video STUART HALL—Race, Gender, Class in the Media, FWP_N_FoW-I, reads: “Good message but shame it pushes an agenda.”). In particular, coverage of #BlackLivesMatter and related news (e.g., the Ferguson shooting) meets with resistance that frames the organization’s coverage as anti-US propaganda aimed at destabilizing the US public. This creates friction between users (presumably US citizens) who support these causes and those who see the coverage as manipulation regardless of its message. Similar discussions arise around Russia and Ukraine, from both sides depending on the context of the video. In a video about Mosul (Battle for Mosul: Iraqi forces advance on eastern front, ivXrlDpjlB8), users even try to deconstruct the content of the video as well as the political manipulation they believe it aims for (e.g., “the guy on 1.30 is not even Iraqi.”). This trend is particularly apparent in the Business & Economy category. Although the category is generally less toxic, most of its coverage meets resistance from users who have a stake in the content. For example, in the video Lebanon’s economy affected by Syrian conflict (CIcNhnQigvU), self-reported Lebanese users portray the coverage as economic manipulation. Similar discussions appear under the videos Japan braces for rise in sales tax (BvWvrp7VZL8), North Dakota Native Americans feel oil price pinch (VBLgARaM0Dk), Cuban economy faces hard times amid fears of Venezuela fallout (O_BI3p6eNIc), and Crimea vote brings economic uncertainty (GJpo6BVaRw4).

Religion

The final source of toxicity we note is the religious discussions that arise in the comments. These discussions take two forms: (1) discussions between users with different religious beliefs, and (2) discussions between users with and without religious beliefs. An example of the first form is this abbreviated exchange between two viewers of the video Philippines army clashes with rebels in the south (aBmrw5HEu48):

User 1: “Islam is a crime against humanity […] Reject Islam and you might just get a taste of peace one day […]”
User 2: “The Christians wiped out 100M natives in the New World, which is a genocide. A crime against Humanity, The Islamic World never reached that toll, and you say this is a crime against Humanity? How foolish […]”

These discussions are generally framed around perceived past crimes committed by religious institutions and by members of particular religions. The second form is sparked by user comments that are non-toxic in nature and contain a sentimental religious adjuration. Other users frequently respond to these comments with anger. These users may be less religious, have no religious beliefs, hold a different perspective on the particular religion, or belong to other religions, and they point out that religious institutions and beliefs were the culprits of these problems in the first place. Here is an example exchange of this form from the video Fragile truce broken in Syria refugee camp (Tk93DoL67c8):

“Allah bless mujahideens”
“Allah does not say to you to pick up arms […] Allah does not say to you to have 20 children and then fail to educate them.”

Discussion

Positioning findings to prior research

Our findings support previous research highlighting the impact of topics on the emotional level of user comments in social media [33,40]. We extend this connection to the domain of online news media by focusing specifically on the relationship between online news content and the toxicity of social media comments. The topics associated with a higher degree of toxicity can be interpreted as more divisive for the online audience. This divisiveness is accentuated in an online environment whose participants have very different backgrounds, cultures, religions, and so on. In general, topics with political connotations (e.g., War & Conflict, Middle East) arouse more toxicity than non-political topics (e.g., Sport, Science & Technology), which corroborates previous research linking politics and online toxicity [11,21,66]. Regarding the qualitative analysis, the association between graphic content and toxicity is in line with previous research asserting that graphic and/or violent images in news coverage spark higher interest and elicit more passionate reactions, both negative and positive [67,68]. Multimedia news items elicit more user comments, and there is a small positive correlation between multimedia and online hostility [69]. It has also been suggested that carefully framed war imagery, in particular, has the potential to construct narratives within official agendas and discourse [70]. These videos may therefore spark reactions both to their content and to the agendas they seem to be developing. Another aspect worth noting from the qualitative analysis is that toxic comments often focus on the topic (e.g., religion, politics) rather than on other participants or unrelated targets. This characterizes the typical nature of toxicity in the news context as “topic-driven toxicity,” as opposed to other forms of toxicity such as vindictive toxicity [28], where participants attack one another. 
Such personal attacks are more common when participants are interacting, e.g., when editing a Wikipedia page with controversial content [1], but they do not seem highly prevalent in online news toxicity. This suggests that users view news video commenting not as a collaborative effort (e.g., discussion, conversation, or debate) but as “an event to comment upon.” In particular, attacks against marginalized or vulnerable groups (e.g., minorities, women), which some earlier studies report [26], are seldom present in online news toxicity. Again, this highlights that the target of toxicity is the “topic” rather than random individuals or groups. However, we do observe group-related behavior when the topic relates to a specific group; for example, immigration videos attract anti-immigration comments, and religion videos attract anti-Islamic comments. Moreover, the emergence of the “History and historical facts” theme shows how different groups are, in a way, “fighting over the narrative,” i.e., over how the news stories should be framed and interpreted. Interestingly, this contrasts with agenda-setting theory, in that the audience may attack the news channel itself, challenging its agenda-setting authority. This conclusion is supported by the “Media as a manipulator” theme. It may be understood by keeping in mind that online readers fall broadly into “soldiers” (whose online activities are organized and group-based) and “players,” “watchdogs,” and “believers” (who, for various reasons, act on their own initiative) [43]. There are also obvious links to the “fake news” theme, where social media users increasingly question the credibility of news channels [71]. Together, these themes suggest that the audience imposes its own interpretation and views of what happened rather than readily adopting the facts or the story framing of the focal news outlet. 
This has at least two important implications. First, for public policy, these comments provide excellent material for analyzing alternative facts or narratives, as social media commentators clearly voice their (sometimes deviating) interpretations. Second, news outlets can use these comments to segment the audience based on the different worldviews expressed in them. One approach is to create audience personas using social media data [72,73] or other forms of online news analytics [24].

Practical implications

In the era of social media, it is becoming increasingly difficult for news media not to be seen as a manipulator or as a stakeholder in the debate itself. However, news channels cannot isolate themselves from audience reactions in the wild. Analyzing news audiences’ sentiment is important for leveraging the two-directional nature of online social media [74] and for understanding the various sources of “digital bias” in audiences and in the news channels themselves. Our results suggest that news channels have some, but not complete, power over the toxicity of the comments on their stories. This power stems from the fact that both topic selection (i.e., what topics are reported) and topic framing (i.e., how a topic is reported) affect the toxicity of social media commentators’ responses. Both the empirical findings and the theoretical association between agenda setting and online toxicity [75] suggest that content creators, intentionally or unintentionally, have power over the toxicity of online conversations. However, the unpredictable nature of social media commenting can reduce the channel owner’s power to govern comment toxicity. For example, a neutral topic can become “toxicified” when controversial elements, such as religion, are introduced. We observed examples of this in Israel-Palestine and War & Conflict videos, where different political or national allegiances trigger toxicity, much like group polarization behavior [21]. For news channels, avoiding sensitive topics because of likely toxic reactions would mean submitting to a “tyranny of the audience,” i.e., avoiding important topics out of fear of toxic reactions. This is clearly not a good strategy, as responsible editorial decisions should be based on the relevance of news rather than on its controversial nature. 
However, disregarding the news audience’s reactions is not helpful either, as social media comments now represent a major form of public discourse. Therefore, one needs to strike a balance that fosters constructive discussion and debate over topics without sacrificing the coverage of sensitive topics. A useful guideline may be that, during topic selection, content creators should be aware of the topic’s inflammatory nature and use that information to report in ways that mitigate negative responses rather than encourage them. This approach is compatible with the idea of “depoliticizing” suggested by Hamilton [76]. Note that depoliticizing does not mean avoiding political topics. It means covering a controversial topic with a framing style aimed at defusing toxicity. In practice, journalists could use information on previous toxicity for a given topic when framing their news stories, especially for topics with known high toxicity. With an international audience in particular, the diversity of religious and political views is likely to heighten toxicity when stories are reported in a way that seems unfair or unbalanced to a group of participants. Therefore, we suggest that content creators strive for a reporting style that appears objective and balanced, especially for topics with a history of highly toxic commenting. To illustrate, consider a binary choice: given that the journalist knows Topic A is controversial, does their story framing strategy aim to (a) exacerbate controversy or (b) alleviate controversy? This strategic choice, we argue, is important for the toxicity outcome. Our qualitative results suggest that when a story belonging to a topic with high average toxicity receives non-toxic responses, this is often a consequence of how it is reported. 
This is especially visible in videos tagged “Humanistic” or “Humanistic stories,” which report stories focused on real everyday people. A user comment on the story “Ugandan families remember lost children” sheds light on why humanistic stories are likely to be received more positively: “This is a really great video–informative and easy to watch makes you ponder on how grateful you really are.” Overall, toxicity seems less prevalent in these human stories. Note that we do not argue that the human-story angle should be applied to every story. Rather, news reportage can be considered a mixture of framing styles and topics. This mixture can combine topics and framing styles in different proportions, which affects the overall toxicity level of a news organization. At one extreme is a news organization that reports only on controversial topics with a polarizing framing style, a combination that would likely yield maximal toxicity in audience reactions. The opposite extreme, avoiding controversial topics and reporting on everything with a non-polarizing strategy, would mitigate toxicity. The balance could be found somewhere in between, with fair coverage of controversial topics using different story framing styles. Thus far, toxicity has not been a factor in editorial decision-making, but could it be? This question is worth posing. The above guidelines highlight the need for an analytical understanding of the toxic behavior of news audiences and for ways to mitigate it within the boundaries and best practices of responsible news reportage. Our findings are not meant to encourage news media to avoid topics that cause toxicity, nor to blame them for the toxicity. Rather, the findings depict the complex relationship between topics and news audiences. To this end, it is important to note that reading and commenting behavior does not always follow the logic of traditional news standards in deeming whether news is trustworthy or not [77]. 
News values have shifted dramatically since the advent of online news and online commenting. Bae [78], for example, found that readers who accessed the news via social media were markedly more likely to believe political rumors. In one study, news stories that used sources (traditionally a measure of a story’s objectivity) elicited more hostility in the comments sections, while journalist participation in the comments raised the quality of commenting [34].

Limitations of the study

This research naturally has some limitations. First, the research assumes that topics whose comments are more toxic are also more provocative than topics whose comments are less toxic. However, toxicity can also arise for reasons beyond the video itself, e.g., a toxic exchange between commentators. In such a case, the toxicity stems not from the video but from hostile commentators. News topic, although important, is not the only factor inciting toxic comments; individual posting behavior is also a determining factor in predicting the prevalence of online hate. For example, Cheng et al. [79] found that, although the baseline rate of online hate was higher for some topics, user mood and existing trolling behavior by other users in a discussion doubled users’ baseline rates of participating in trolling. As a social phenomenon, toxic online comments are shaped by many contextual factors [77], including individual psychology and group dynamics. Kaakinen et al. [80] found that online hate increased after the November 2015 Paris terrorist attacks and that wider societal phenomena affect the prevalence of online hate at different times. The complexity of the matter is mirrored in how research on user comments is dispersed across disciplines, including journalism studies, communication studies, social psychology, and computer science, which makes an overarching grasp of the field difficult [77]. These distinct characteristics of online comments underline that users’ hateful and toxic responses to certain topics also relate to factors other than the topic itself. Future studies should, therefore, aim to synthesize a conceptual framework of online news toxicity that includes elements of the topic, user-to-user dynamics, and story framing. 
Based on our findings, these three pillars are essential for understanding toxicity in the news context. Second, in this research, we make some assumptions that facilitate the analysis but may introduce a degree of error. We assume that the topic of the comment equals the topic of the video where we collected it from. However, it is possible that some comments are off-topic, i.e., not discussing the topic of the video. In such a case, the comment’s topic would not match the topic of the video. When interpreting the results, it is useful to consider reader comments to online news content as particular type of text. You et al. [81] describe online comments as “communicative”, “parasitic” and “intertextual” (p. 5). Comments share the same platform with the original news item and respond to both the original and to other user comments. Online comments may be generated long after the news item first appeared and may serve user agendas that have very little to do with the original news story. Third, regarding comment authenticity, it is possible that the sample contains some bot comments. Even though YouTube has filtering mechanisms for bots and the comments that we manually reviewed for this research all seemed real user comments, it is possible that there could be some bot comments. In this regard, we depend on the bot detection applied by YouTube, as bot detection in itself is a complicated subject of research [82]. Overall, we have no reason to believe the above issues would systematically affect a given topic on another topic. Rather, on average, it is likely that toxicity is triggered, to a major degree, by the topic of the video and, on average, the comments deal with the video rather than external stimuli. Fourth, our analysis omits factors, such as time and user characteristics, that could contribute to toxicity. 
Unfortunately, as noted in previous research [83], these characteristics are difficult to obtain as social media platforms typically do not expose comment-level user characteristics (e.g., age, gender, country). Here, our focus was on the analysis of topic and toxicity. Regarding generalizability of the findings, toxic commenting may differ across news organizations and geographical locations. However, the sampled news channel that has a diverse, international audience, reports on a variety of topics from politics to international affairs and has substantial commenting activity among its audience. While these features make it an exemplary case of a modern news channel facing online toxicity, replicating the analysis with content from other channels would be desirable in future work. Moreover, the study was only conducted in English, leaving room for replication in other languages.
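The comment-inherits-video-topic assumption feeds directly into how topic toxicity is computed. As the review history for this article summarizes Section 3.8, comment toxicity is first averaged within each video, and the video averages are then averaged within each topic. The Python sketch below illustrates that two-level aggregation; it is not the authors' data collection script, and the function name `topic_toxicity`, the `(video_id, score)` input layout, and the choice to skip comments whose score is `None` (for example, comments the scoring service could not score) are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def topic_toxicity(comments, video_topic):
    """Two-level average: comment scores -> video means -> topic means.

    comments: iterable of (video_id, score) pairs; score is None if unscored.
    video_topic: dict mapping video_id -> the video's single (dominant) topic.
    """
    # Group comment scores by video; each comment inherits its video's topic.
    per_video = defaultdict(list)
    for video_id, score in comments:
        if score is not None:  # assumption: unscored comments are dropped
            per_video[video_id].append(score)

    # Average comment toxicity within each video.
    video_means = {vid: mean(scores) for vid, scores in per_video.items()}

    # Average the video-level means within each topic.
    per_topic = defaultdict(list)
    for vid, video_mean in video_means.items():
        per_topic[video_topic[vid]].append(video_mean)
    return {topic: mean(means) for topic, means in per_topic.items()}


# Toy, made-up data (not from the study).
comments = [("v1", 0.2), ("v1", 0.4), ("v2", 0.9), ("v3", 0.1), ("v3", None)]
video_topic = {"v1": "Science & Technology", "v2": "War & Conflict",
               "v3": "Science & Technology"}
print(topic_toxicity(comments, video_topic))
```

Because video means are averaged rather than all comments being pooled, every video carries equal weight regardless of how many comments it received; in the toy data, Science & Technology scores about 0.2 this way, whereas pooling its three scored comments would give about 0.23.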

Future research avenues

We identify several fruitful directions for future research. First, future research could investigate how various story framing styles (factual, one-sided, human story, etc.), as well as the linguistic style of news reporting, influence toxic commenting within a topic. Here, we investigated differences in toxicity between topics. Because we also observe variation in toxicity within topics, future research could explain this within-topic variation, for example, by analyzing the impact of linguistic patterns on average comment toxicity. Other ideas for future research include analyzing data from additional news channels and comparing the results, providing a deeper analysis beyond the superclass taxonomy used here, and comparing toxicity levels in YouTube comments with those in comments on other social media platforms. Finally, research on channel-to-audience interaction is needed, focusing specifically on whether and how journalist participation in social media can defuse toxicity.

Conclusion

By classifying tens of thousands of online news videos by topic and scoring their comments for toxicity, our empirical analysis reveals an association between online news topics and average comment toxicity. The results highlight the existence of topic-driven toxicity in the online news context and offer some suggestions for news channels that wish to alleviate toxicity in their social media channels.

Python script explaining the data collection.

(TXT)

Data with full statistical results.

(XLSX)

Grouping of data into superclasses.

Note: “religion” was discarded from the analysis because the class contained only 3 videos. (DOCX)

Summary of statistical test results.

(DOCX)

5 Nov 2019

PONE-D-19-19498

Topic-driven Toxicity: Exploring the Relationship between Online Toxicity and News Topics

PLOS ONE

Dear Dr. Salminen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Based on the review comments from two experts in this field, all reviewers agree there are merits in this submission and the contributions are significant. At the same time, the reviewers also raised some concerns that are necessary to be addressed for publication. Based on my own reading, I fully agree with the review comments and hope the authors will address the concerns and revise the submission accordingly.

==============================

We would appreciate receiving your revised manuscript by Dec 20 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,
Pin-Yu Chen, PhD
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Competing Interests section: "The authors have declared that no competing interests exist.". We note that one or more of the authors are employed by a commercial company: 'Banco Santa Cruz'. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study.
If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

* Please also include the following statement within your amended Funding Statement. “The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.

* Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these.

Please note that we cannot proceed with consideration of your article until this information has been declared.
* Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.
Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

* We will update your Data Availability statement to reflect the information you provide in your cover letter.

Additional Editor Comments (if provided):

Based on the review comments from two experts in this field, all reviewers agree there are merits in this submission and the contributions are significant. At the same time, the reviewers also raised some concerns that are necessary to be addressed for publication. Based on my own reading, I fully agree with the review comments and hope the authors will address the concerns and revise the submission accordingly.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper explores the relationship between the topics of news videos posted on Youtube (from the Al Jazeera Media Network), and the toxicity of viewers’ comments. The subject is interesting and timely. The methodology is also interesting, and generally well described. Results show that comment toxicity varies according to the topic presented, which leads the authors to suggest that news comment toxicity is topic-driven (targeting topics) rather than vindictive (targeting people).
While overall I think this is very interesting work, I have some concerns regarding the takeaways (or at least their current presentation) that I believe call for a revision. The paper would also benefit from another pass on typos and grammatical errors.

MAJOR CONCERN

On pages 25, 27, and 28, the authors should absolutely (!!!) remove the names of the persons they quote. Even if they are pseudonyms (which they do not seem to be), the authors should never include any clue that might allow to (re)identify individuals. This is an important breach of research ethics!

REGARDING TAKEAWAYS

In 6.1, the second recommendation seems dangerous to me. Although it follows a trend reported in other work, segmenting news audiences based on their worldview and curating content to comfort those views is precisely how thought echo-chambers are created. I strongly encourage the authors to nuance this point, and to offer some critical reflection on it. For example, while it might be a desirable economic outcome for the outlet, it is very likely not a desirable social outcome.

I am not sure what the very last point in 6.2 adds to the discussion. The authors seem to imply that journalists should not use sources, as they might fuel hostility in comments. This is absurd.

More generally, I am a bit concerned by the tension that exists throughout Section 6.2 between adapting journalistic practices to avoid online toxicity in comments, and, as the authors put it: “obviously, [avoiding sensitive topics] is not a good strategy.” While the authors do try to make it explicit that news outlets should not submit to the “tyranny of the audience”, their main arguments and suggestions seem to do exactly that, e.g., by depoliticizing content, or by applying a “humanistic” formula on every story. I recommend the authors rethink and rewrite this section to reduce this tension—especially in light of the first paragraph of 6.3.

OTHER COMMENTS

I think the beginning of Section 3 could be made clearer.
What I understand from 3.3 is that the Neural Net is trained on (textual) articles from Al Jazeera’s website, which are tagged with topic labels, and then used to predict the topics of Youtube videos, using their titles and descriptions. Is this correct? This was not clear to me in 3.1.

Further, are all the articles scraped from Al Jazeera’s website in English? The authors seem to imply their data collection is limited to English in 3.4, but do not mention it explicitly.

Is each article on Al Jazeera’s website tagged only once, or can the same article have multiple tags? i.e., is the video-topic-prediction problem a multi-class or multi-label classification problem? The latter would then also complexify the study of the relationship between topic and comment toxicity. That said, it seems from the end of 3.6 that the authors’ Neural Net only outputs one label per video.

I appreciate the authors’ attempts to bring external validation to each step of the data collection and (automatic) annotation process.

I wonder how well connected the comments for a given video are to the topic of that video. Do viewers always stick to what they have watched in the video? Or might they write about other, unrelated topics? Admittedly, the authors do mention this in their discussion, but it would be good to state it earlier in the paper. Also, does each video contain only one topic? (this is related to the multi-label problem mentioned above). Assessing this seems important, since topic toxicity is simply calculated by subsequently averaging the comment toxicity for each video, and the video toxicity for each topic (see 3.8).

I am not sure I understand the procedure for the aggregation of topics into superclasses. Did only one author do this? (Which seems to be the case, but the sentence mentioning this is not clear) Or did several? Also, is there overlap between News Topics and Countries and Regions, or are the topics that are combined into each of these superclasses distinct?
Section 4.2 is very unclear to me. I suggest it be entirely rewritten, possibly condensed, and moved after 4.3 (at least after table 5). What is beta in the equation and in the null hypothesis?

Can table 6 be fit into a single page? Also, the description in Section 4.3 mentions color in the matrix, but I do not see it. It would be good to highlight significant differences in the pairwise comparisons.

In Section 5, the authors mention reading through and (manually?) coding the comments and discussions under the videos. How was the coding done? How many coders were there?

In the subsection on Platform’s power, the authors do not prove there is a causal relationship between Google/Youtube’s description of the news outlet and the comments they highlight. I understand the commenters are making statements based on the relationship between Al Jazeera and the Qatari government, but claiming this is directly linked to the phrasing of the Youtube tag is a bit of a stretch—which, again, is not empirically proven. What is the proportion of comments that directly target the relationship between the outlet and the government? Does this targeting not occur for the other outlets? Are there instances where Youtube does not tag a video and these types of comments are not present? This seems very one-way focused, and biased towards defending Al Jazeera. I do not think this has its place in the paper. In addition, it seems the wording of the three labels shown in figure 1 simply follows that of the first sentences of the three outlets’ respective Wikipedia pages. It is likely this wording is automatically derived from those sentences. I suggest the authors highly nuance, or even remove this subsection, as well as the discussion in Section 6 on it, as it seems partisan and weakens the rest of the contributions.

In 6.3, one last, important limitation is that the study was only conducted in English.

MINOR

Add white space below each table.
In 6.2, “it is becoming increasingly difficult for news media to remain [un]biased…”
Also in 6.2, “—intentionally or [un]intentionally—”

Reviewer #2: The theme of the paper is interesting, overall paper is well written and well organized. Moreover, the analysis has been rigorously performed and results are presented appropriately. I have only a few comments that are given below.

In this paper, a third party service is used for toxicity quantification that was unable to compute toxicity on 21.8% of the comments that were likely to be not written in English. To tackle this, language detection of the users’ comments can be performed earlier to toxicity analysis.

Please cite a reference to support the argument regarding manual tagging by Al Jazeera’s journalists and editors for topical keywords, if any.

A few state of the art machine learning (ML) methods can be used for performance evaluation purposes, e.g., it would be interesting to compare the performance of traditional ML methods like decision trees classifier with that of the feed forward neural network. Moreover, performance evaluation can also help in selecting a suitable ML method for analysis.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

10 Dec 2019

Reviewer #1: This paper explores the relationship between the topics of news videos posted on Youtube (from the Al Jazeera Media Network), and the toxicity of viewers’ comments. The subject is interesting and timely. The methodology is also interesting, and generally well described. Results show that comment toxicity varies according to the topic presented, which leads the authors to suggest that news comment toxicity is topic-driven (targeting topics) rather than vindictive (targeting people).

>>Thank you for seeing the value in our research!

While overall I think this is very interesting work, I have some concerns regarding the takeaways (or at least their current presentation) that I believe call for a revision. The paper would also benefit from another pass on typos and grammatical errors.

>>We thank the reviewer for this comment. The manuscript has now undergone another round of proof-reading. Changes made are highlighted in yellow color in the manuscript.

On pages 25, 27, and 28, the authors should absolutely (!!!) remove the names of the persons they quote. Even if they are pseudonyms (which they do not seem to be), the authors should never include any clue that might allow to (re)identify individuals. This is an important breach of research ethics!

>>Thanks for noticing - these have been removed.

In 6.1, the second recommendation seems dangerous to me.
Although it follows a trend reported in other work, segmenting news audiences based on their worldview and curating content to comfort those views is precisely how thought echo-chambers are created. I strongly encourage the authors to nuance this point, and to offer some critical reflection on it. For example, while it might be a desirable economic outcome for the outlet, it is very likely not a desirable social outcome.

>>We have revised Section 6.1 by removing the recommendations the reviewer considers dangerous. We ask the reviewer to take a look and provide suggestions for additional changes if needed.

I am not sure what the very last point in 6.2 adds to the discussion. The authors seem to imply that journalists should not use sources, as they might fuel hostility in comments. This is absurd. More generally, I am a bit concerned by the tension that exists throughout Section 6.2 between adapting journalistic practices to avoid online toxicity in comments, and, as the authors put it: “obviously, [avoiding sensitive topics] is not a good strategy.” While the authors do try to make it explicit that news outlets should not submit to the “tyranny of the audience”, their main arguments and suggestions seem to do exactly that, e.g., by depoliticizing content, or by applying a “humanistic” formula on every story. I recommend the authors rethink and rewrite this section to reduce this tension—especially in light of the first paragraph of 6.3.

>>The reviewer’s comments are insightful.

>>However, this is not an easy problem to address.

>>Actually, it is a useful recommendation to frame stories on sensitive topics using human stories, as our findings suggest this can alleviate the conflict among social media audiences, if this is an organizational objective. Overall, there are very few practical recommendations for journalists in prior literature on how to address toxicity, or even consider it as a factor they could have control over.
Thus, providing suggestions about story framing is useful and, naturally, editors and journalists can freely choose if they want to make use of these suggestions. We would love to hear some constructive ideas from the reviewer on this concerning other suggestions for journalistic practice.

>>Regarding the aspect of tension, we do agree to some extent with the reviewer’s comment. The tension, however, is not of our making, but it follows from the state of the matters in the real world: if topics affect toxicity, then toxicity can be reduced by avoiding topics. This is just logical fact-based argument, not a recommendation of how things “should be” from a moral perspective.

>>How things should be, in an ideal world, is that news channels and journalists would be perceived as objective by the audience, without agendas and biases. If opposite perceptions take place among the audience, as we increasingly see from social media commenting, the result will be more anti-media sentiments, alternative news, and judging traditional news media as “fake news” that only report on certain topics with certain sentiments.

>>So, there *is* a serious conflict taking place in the real world between news audiences and news reporting. To be fair, this tension should not be hidden from the discussion of our results.

>>As a side note, “depoliticizing” content does not mean avoiding political topics as the reviewer might imply. It means defusing a controversial topic by using a framing style that is aimed at defusing toxicity while maintaining high journalistic standards. We have now clarified this in p. 29.

>>To illustrate, consider a binary choice: given the journalist knows Topic A is controversial, does the story framing strategy aim to (a) exacerbate controversy or (b) alleviate controversy. This strategic choice, we argue, is important for the toxicity outcome.

>>Also note that we do not make the argument that human story angle should be applied to “every story”.
Rather, consider news reportage as a mixture of framing styles and topics. This mixture can have topics and framing styles in different proportions to affect the total toxicity levels of a news organization. In one extreme, we have a news organization that is only reporting on controversial topics with a framing style that is polarizing. This combination, obviously, yields maximal toxicity in audience reactions. The opposite extreme, meaning avoiding controversial topics and reporting on everything with a non-polarizing strategy, would mitigate toxicity. The balance could be found somewhere in between, with a fair coverage of controversial topics using different story framing styles. Thus far, toxicity has not been a factor in editorial decision making, but could it be? This question is worth posing.

>>From a philosophical point of view, one can consider this as a choice of worldviews: there can be a choice of worldview that states “toxicity is out there, it is external, and I (as a media agency) have no control over it”. Or, there can be an alternative view: “toxicity is out there, but I (as a media agency) can find ways to mitigate it with my own actions.” According to our experiences with journalists in this media organization, most of them subscribe to the first worldview ---- but, that does not mean that we, as researchers, could not increase their awareness over the potential ways toxicity could be defused. From an ethical point of view, the opposite seems like an imperative; and, if anything, we are proposing too few framing styles here. Future research should pursue these styles in much greater detail.

>>Again, we welcome any suggestions on how to improve the discussion – but we cannot shy away from the tension of media being an actor in online toxicity. Topics and story framing are associated with toxicity – where to go from there is the million-dollar question.

I think the beginning of Section 3 could be made clearer.
What I understand from 3.3 is that the Neural Net is trained on (textual) articles from Al Jazeera’s website, which are tagged with topic labels, and then used to predict the topics of Youtube videos, using their titles and descriptions. Is this correct? This was not clear to me in 3.1.

>>Correct. We have clarified this in the following terms (p. 10): “In other words, the FFNN is trained on textual articles from Al Jazeera’s website, which are tagged with topic labels, and then used to predict the topics of YouTube videos, using their titles and descriptions.”

Further, are all the articles scraped from Al Jazeera’s website in English? The authors seem to imply their data collection is limited to English in 3.4, but do not mention it explicitly.

>>Yes, in English. We have clarified this in the text (see p. 11).

Is each article on Al Jazeera’s website tagged only once, or can the same article have multiple tags? i.e., is the video-topic-prediction problem a multi-class or multi-label classification problem? The latter would then also complexify the study of the relationship between topic and comment toxicity. That said, it seems from the end of 3.6 that the authors’ Neural Net only outputs one label per video.

>>Good question. It is a multilabel classification problem, as one news article can contain many labels. We have modeled the problem as such, and now clarify this in p. 13:

>>“From a technical point of view, this is a multilabel classification problem, as one news article is typically labeled for several topics.”

>>Regarding the end of 3.6, the output of the classifier is confidence value for each news story and each topic (of which, the topics are the ones exceeding a set threshold value for the confidence, in our case the commonly used value of 0.5). For parsimony, we choose the topic with the highest confidence value for the statistical analysis, i.e., a story has only one dominant topic in the statistical analysis.
Using all topics would make the statistical comparison exceedingly complex. We now mention this point explicitly on p. 14.

>>[Reference to other comment: **]

I appreciate the authors’ attempts to bring external validation to each step of the data collection and (automatic) annotation process.

>>Thank you!

I wonder how well connected the comments for a given video are to the topic of that video. Do viewers always stick to what they have watched in the video, or might they write about other, unrelated topics? Admittedly, the authors do mention this in their discussion, but it would be good to state it earlier in the paper. Also, does each video contain only one topic? (This is related to the multi-label problem mentioned above.) Assessing this seems important, since topic toxicity is simply calculated by first averaging the comment toxicity for each video and then averaging the video toxicity for each topic (see 3.8).

>>Regarding the multi-label point, this was addressed in relation to the previous comment (see reference to other comment: **).

>>The “off-topic” factor is treated as random noise in our analysis, in the sense that we assume each comment for each video to be equally likely to contain off-topic elements. This assumption, although somewhat naïve, is required because the alternative would be to explicitly model the topic structure of the comments. Doing so could break the association between the comments and the video they discuss (whose topic is known), unless the comment classification error were much smaller than that of the video topic classification. Given the performance of short-text classification, a comment-specific classification would likely introduce another source of error, which in our opinion could be larger than the error of leaving the off-topic comments in place and treating them as random noise.
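The two-level averaging the reviewer describes for Section 3.8 (comment toxicity averaged per video, then video toxicity averaged per topic) can be sketched as follows. This is a minimal sketch under stated assumptions: scores are taken to be numbers in [0, 1] such as those returned by the Perspective API, comments the service could not score (e.g., the non-English comments discussed later in the review) are represented as `None` and skipped, off-topic comments are left in place as noise in line with the authors' response, and the names `video_toxicity` and `topic_toxicity` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


def video_toxicity(comment_scores: List[Optional[float]]) -> Optional[float]:
    """Mean toxicity of one video's comments.

    None marks a comment the scoring service could not score (an assumption
    of this sketch); such comments are skipped. Off-topic comments are not
    filtered out, i.e., they stay in the average as noise.
    """
    scored = [s for s in comment_scores if s is not None]
    return mean(scored) if scored else None


def topic_toxicity(
    videos: Dict[str, List[Optional[float]]],
    video_topic: Dict[str, str],
) -> Dict[str, float]:
    """Average the per-video means within each video's dominant topic."""
    per_topic: Dict[str, List[float]] = defaultdict(list)
    for video_id, scores in videos.items():
        tox = video_toxicity(scores)
        if tox is not None:  # videos with no scored comments are dropped
            per_topic[video_topic[video_id]].append(tox)
    return {topic: mean(values) for topic, values in per_topic.items()}


if __name__ == "__main__":
    # Hypothetical toy data: video IDs, scores, and topic assignments are invented.
    videos = {
        "v1": [0.2, 0.4],
        "v2": [0.6, None],  # one comment could not be scored
        "v3": [0.1, 0.3, 0.2],
    }
    video_topic = {"v1": "Racism", "v2": "Racism", "v3": "Arts & Culture"}
    print(topic_toxicity(videos, video_topic))
```

Because topic toxicity here is a mean of video means, each video contributes equally to its topic regardless of how many comments it received; weighting by comment count would be a different design choice.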
Most certainly, a comment-level classification would add to the complexity of a research design that, we feel, is already complex enough with several steps of analysis.

>>So, in conclusion, while we agree with the reviewer that in the ideal case each comment would be evaluated for off-topic content, in practice this must remain a limitation of our particular study.

I am not sure I understand the procedure for the aggregation of topics into superclasses. Did only one author do this? (This seems to be the case, but the sentence mentioning it is not clear.) Or did several? Also, is there overlap between News Topics and Countries and Regions, or are the topics combined into each of these superclasses distinct?

>>The aggregation was done by one of the researchers and verified by another researcher, as now clarified on p. 17.

>>Regarding news topics and countries and regions, the taxonomy treats these as equal classes (not parallel or hierarchical). This means that if the classifier gives “Middle East” as the highest and “Science & Technology” as the second highest, in the statistical analysis we use “Middle East” as the class of the story. (Also see reference to other comment: **.)

Section 4.2 is very unclear to me. I suggest it be entirely rewritten, possibly condensed, and moved after 4.3 (at least after Table 5). What is beta in the equation and in the null hypothesis?

>>We have revised this section and moved it after Table 5 (now Table 4). Please refer to p. 18.

>>Beta is the estimated regression coefficient (now mentioned on p. 18).

>>If anything else is unclear, please ask. Happy to clarify!

Can Table 6 be fit into a single page? Also, the description in Section 4.3 mentions color in the matrix, but I do not see it. It would be good to highlight significant differences in the pairwise comparisons.

>>To address this comment, we have replaced the table with Figure 1 (p. 20).
This figure communicates the statistically significant differences better, while also showing the variation in some of the multiple comparison test results. We hope the reviewer finds this solution satisfactory.

In Section 5, the authors mention reading through and (manually?) coding the comments and discussions under the videos. How was the coding done? How many coders were there?

>>Page 21 now explains the method for qualitative analysis in greater detail. Essentially, it was carried out as a collaborative effort between two researchers.

In the subsection on Platform’s power, the authors do not prove there is a causal relationship between Google/YouTube’s description of the news outlet and the comments they highlight. I understand the commenters are making statements based on the relationship between Al Jazeera and the Qatari government, but claiming this is directly linked to the phrasing of the YouTube tag is a bit of a stretch and, again, is not empirically proven. What is the proportion of comments that directly target the relationship between the outlet and the government? Does this targeting not occur for the other outlets? Are there instances where YouTube does not tag a video and these types of comments are not present? This seems very one-way focused and biased towards defending Al Jazeera. I do not think this has its place in the paper. In addition, it seems the wording of the three labels shown in Figure 1 simply follows that of the first sentences of the three outlets’ respective Wikipedia pages. It is likely this wording is automatically derived from those sentences. I suggest the authors heavily nuance, or even remove, this subsection, as well as the discussion of it in Section 6, as it seems partisan and weakens the rest of the contributions.

>>We agree with the reviewer and have removed this subsection from the results and discussion.

In 6.3, one last, important limitation is that the study was only conducted in English.
>>Added; see p. 33:

>>“Moreover, the study was only conducted in English, leaving room for replication in other languages.”

Add white space below each table.

>>Done.

In 6.2, “it is becoming increasingly difficult for news media to remain [un]biased…”

>>We’ve corrected this sentence as follows (p. 5):

>>“it is becoming increasingly difficult for news media to provide facts without being seen as a manipulator or stakeholder in the debate itself.”

Also in 6.2, “intentionally or [un]intentionally”.

>>Fixed (see p. 29).

Reviewer #2: The theme of the paper is interesting, and overall the paper is well written and well organized. Moreover, the analysis has been rigorously performed and the results are presented appropriately.

>>Thank you!

In this paper, a third-party service is used for toxicity quantification, and it was unable to compute toxicity for 21.8% of the comments, which were likely not written in English. To tackle this, language detection of the users’ comments can be performed before the toxicity analysis.

>>Good point! While this could have been done separately using a language detection approach, since the Perspective API outputs an error for non-English comments, a separate analysis was not deemed necessary.

Please cite a reference, if any, to support the argument regarding manual tagging of topical keywords by Al Jazeera’s journalists and editors.

>>Good suggestion! We have justified the use of journalists (i.e., domain experts) for training data annotation as follows (p. 14):

>>“The importance of using domain experts for data annotation is widely acknowledged in machine learning literature [54,55]. Generally, expert taxonomies are considered as gold standards for classification [56].”

A few state-of-the-art machine learning (ML) methods can be used for performance evaluation purposes; e.g., it would be interesting to compare the performance of traditional ML methods, like a decision tree classifier, with that of the feed-forward neural network.
Moreover, performance evaluation can also help in selecting a suitable ML method for analysis.

>>Yes, we did compare the neural network (NN) to Random Forest (RF), which is a tree-based method as suggested by the reviewer. The overall performance was better for NN (F1 = 0.700) than for RF (F1 = 0.458). This is now explained on p. 14:

>>“It also clearly outperforms a Random Forest (RF) model that was tested as a baseline model (F1_RF = 0.458).”

>>We acknowledge there are other algorithms to test as well, such as LightGBM and XGBoost, which have obtained high performance in text classification. However, for the purposes of this study, we consider the performance obtained with our FFNN adequate. The contribution of the paper is not technical but rather focused on the analysis of the relationship between toxicity and news topics.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository.

>>The data and full results (using various correction methods) have been uploaded as a Supporting Information file.

Submitted filename: Response to reviewers.docx

23 Jan 2020

Topic-driven Toxicity: Exploring the Relationship between Online Toxicity and News Topics

PONE-D-19-19498R1

Dear Dr. Salminen,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow.
To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing-related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Pin-Yu Chen, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

The revised version has addressed the reviewers' concerns from the previous round. I thank the authors and the reviewers for making great efforts in improving this submission. I recommend accepting this version for publication as is.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes.
The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters.)

Reviewer #2: (No Response)

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

12 Feb 2020

PONE-D-19-19498R1

Topic-driven Toxicity: Exploring the Relationship between Online Toxicity and News Topics

Dear Dr. Salminen:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Dr. Pin-Yu Chen
Academic Editor
PLOS ONE
